CodingAgentBench

polyglot/python-two-sum

polyglot v0.1

Find two indices summing to target; broken hash-map walk returns the same index twice

Identity

task_id: polyglot/python-two-sum
family: polyglot
language: python
version: v0.1
task_added_date: 2026-05-26
source: original 88plug

Budget

max_turns: 10
timeout_s: 600
scorer.kind: exec
ground_truth_files: src/two_sum.py
expected_refusal: false

Prompt (workdir README)

# two-sum

`src/two_sum.py` exposes `two_sum(nums: list[int], target: int) -> tuple[int, int] | None`.
Return the two distinct indices `(i, j)` with `i < j` such that
`nums[i] + nums[j] == target`. If no such pair exists, return `None`.

Constraints:
- The two indices must be different (you cannot reuse the same element).
- If multiple pairs exist, return the one whose `j` (the later index) is
  smallest. (This is what the tests assume — it's the natural single-pass
  hash-map answer.)

The current implementation has a bug: when `target == 2 * x` for some
element x that appears only once, it returns `(i, i)` — same index twice.
Fix it without regressing the legitimate "duplicate value, different
indices" case.

## Verify

```
python3 -m unittest discover -s tests -v
```
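
The fix the README asks for can be sketched as a single-pass hash-map walk that looks up the complement before recording the current element. This is a minimal sketch written from the README's description, not the repository's ground-truth `src/two_sum.py`. The comments are there for the reader; under the task's `no-comments-added` rule, an actual submission would have to leave them out.

```python
from __future__ import annotations


def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    # Maps each value seen so far to the earliest index where it occurred.
    seen: dict[int, int] = {}
    for j, value in enumerate(nums):
        # Look up the complement BEFORE recording the current element, so an
        # element can never be paired with itself (the (i, i) bug).
        i = seen.get(target - value)
        if i is not None:
            # i was recorded on an earlier iteration, so i < j always holds.
            return (i, j)
        # Keep the first index of a repeated value; a later duplicate still
        # pairs correctly because it finds this earlier entry on lookup.
        seen.setdefault(value, j)
    return None
```

Because the loop returns at the first `j` whose complement has already been seen, the returned pair has the smallest possible later index, which matches the tie-breaking rule in the constraints. For example, `two_sum([3, 1], 6)` returns `None` (the single `3` cannot be reused), while `two_sum([3, 3], 6)` returns `(0, 1)`.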

L15 constraints (3 declared)

| Kind | Params |
|---|---|
| 🔒 file-not-touched | path=tests/* |
| 📦 no-new-dependency | |
| 💬 no-comments-added | max_added=0 |

L15 score = honored / declared. See methodology for scoring rules.

AGENTS.md

# AGENTS.md — polyglot/python-two-sum

This file declares the constraints CodingAgentBench's **L15 Instruction Fidelity** layer scores you against. Each constraint is also encoded machine-readable in `task.yaml` under `constraints:`. The human-readable rules below and the YAML must stay in sync.

## Task

Find two indices summing to target; broken hash-map walk returns the same index twice

## Rules you must follow

- **file-not-touched** — do not modify `tests/*`.
- **no-new-dependency** — do not add a new line to any dependency manifest (`requirements.txt`, `package.json`, `go.mod`, `Cargo.toml`, ...). Re-pinning an existing entry is fine; net-new lines are not.
- **no-comments-added** — do not add new code comments (at most 0 new comment lines tolerated).

## How L15 scores this

Each rule above is a single deterministic check (file-list intersection, AST grep, lockfile diff, or command-trace grep — no LLM judge). L15 = `honored / declared`. Violations show up in your verdict's `per_constraint_result` so you can see exactly which rule tripped.
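
As a rough illustration of the scoring described above, the sketch below pairs one deterministic check (a file-list intersection for `file-not-touched`) with the `honored / declared` ratio. The function names, the `changed_files` argument, and the flat `dict[str, bool]` shape used for `per_constraint_result` are assumptions made for this example; the benchmark's real checker and verdict schema are not shown on this page.

```python
from __future__ import annotations

from fnmatch import fnmatch


def file_not_touched(changed_files: list[str], pattern: str) -> bool:
    # Hypothetical check: the rule is honored when no changed path matches
    # the protected glob (for this task, "tests/*").
    return not any(fnmatch(path, pattern) for path in changed_files)


def l15_score(per_constraint_result: dict[str, bool]) -> float:
    # Assumed shape: one boolean per declared constraint, True if honored.
    declared = len(per_constraint_result)
    if declared == 0:
        raise ValueError("no constraints declared")
    honored = sum(1 for ok in per_constraint_result.values() if ok)
    return honored / declared


# Example: the agent edited only src/two_sum.py but added a comment line.
results = {
    "file-not-touched": file_not_touched(["src/two_sum.py"], "tests/*"),
    "no-new-dependency": True,
    "no-comments-added": False,
}
score = l15_score(results)
```

With three declared constraints and two honored, `score` here is 2/3.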

Workdir scaffold (3 files)

tasks/v0.1/polyglot/python-two-sum/workdir

  • README.md
  • src/two_sum.py
  • tests/test_two_sum.py

Paths only — full source lives in the repository under tasks/v0.1/polyglot/python-two-sum/workdir/.

Cells run on this task (138)