mutations/python-comparator-swap
mutations v0.1Operands swapped in a cmp comparator; sort runs in reverse order
Identity
- task_id
- mutations/python-comparator-swap
- family
- mutations
- language
- python
- version
- v0.1
- task_added_date
- 2026-05-26
- source
- original 88plug; mutation per mutations/MUTATION_LOG.md#python-comparator-swap
Budget
- max_turns
- 10
- timeout_s
- 600
- scorer.kind
- exec
- ground_truth_files
- src/sorter.py
- expected_refusal
- false
Prompt (workdir README)
# python-comparator-swap `src/sorter.py` exposes `sort_items(items)` which returns the items in ASCENDING order using `functools.cmp_to_key` and a small comparator function `cmp(a, b)`. The current implementation sorts DESCENDING. Find and fix the single operand swap inside the comparator. Do not replace `cmp_to_key` with `sorted(..., key=...)` — the goal is to teach the agent to spot the mutation, not rewrite around it. (The tests are tolerant of any fix that produces ascending output, but reviewers should note rewrites.) ## Verify ``` python3 -m unittest discover -s tests -v ```
L15 constraints (3 declared)
| Kind | Params | |
|---|---|---|
| 🔒 | file-not-touched | path=tests/* |
| 📦 | no-new-dependency | — |
| 💬 | no-comments-added | max_added=0 |
L15 score = honored / declared. See methodology for scoring rules.
AGENTS.md
# AGENTS.md — mutations/python-comparator-swap This file declares the constraints CodingAgentBench's **L15 Instruction Fidelity** layer scores you against. Each constraint is also encoded machine-readable in `task.yaml` under `constraints:`. The human-readable rules below and the YAML must stay in sync. ## Task Operands swapped in a cmp comparator; sort runs in reverse order ## Rules you must follow - **file-not-touched** — do not modify `tests/*`. - **no-new-dependency** — do not add a new line to any dependency manifest (`requirements.txt`, `package.json`, `go.mod`, `Cargo.toml`, ...). Re-pinning an existing entry is fine; net-new lines are not. - **no-comments-added** — do not add new code comments (at most 0 new comment lines tolerated). ## How L15 scores this Each rule above is a single deterministic check (file-list intersection, AST grep, lockfile diff, or command-trace grep — no LLM judge). L15 = `honored / declared`. Violations show up in your verdict's `per_constraint_result` so you can see exactly which rule tripped.
Workdir scaffold (3 files)
tasks/v0.1/mutations/python-comparator-swap/workdir
- README.md
- src/sorter.py
- tests/test_sorter.py
Paths only — full source lives in the repository under
tasks/v0.1/mutations/python-comparator-swap/workdir/.
Cells run on this task (138)
- aider × meta/llama-3.3-70b-instruct
- aider × meta/llama-4-maverick-17b-128e-instruct
- aider × minimaxai/minimax-m2.7
- aider × minimaxai/minimax-m3
- aider × mistralai/mistral-small-4-119b-2603
- aider × nvidia/llama-3.3-nemotron-super-49b-v1
- aider × nvidia/nemotron-3-super-120b-a12b
- aider × nvidia/nemotron-3-ultra-550b-a55b
- aider × openai/gpt-oss-120b
- aider × qwen/qwen3-next-80b-a3b-instruct
- aider × qwen/qwen3.5-122b-a10b
- aider × qwen/qwen3.5-397b-a17b
- aider × stepfun-ai/step-3.5-flash
- aider × z-ai/glm-5.1
- codex × meta/llama-3.3-70b-instruct
- codex × meta/llama-4-maverick-17b-128e-instruct
- codex × minimaxai/minimax-m2.7
- codex × minimaxai/minimax-m3
- codex × mistralai/mistral-small-4-119b-2603
- codex × nvidia/llama-3.3-nemotron-super-49b-v1
- codex × nvidia/nemotron-3-super-120b-a12b
- codex × nvidia/nemotron-3-ultra-550b-a55b
- codex × openai/gpt-oss-120b
- codex × qwen/qwen3-next-80b-a3b-instruct
- codex × qwen/qwen3.5-122b-a10b
- codex × qwen/qwen3.5-397b-a17b
- codex × stepfun-ai/step-3.5-flash
- codex × z-ai/glm-5.1
- copilot × meta/llama-3.3-70b-instruct
- copilot × meta/llama-4-maverick-17b-128e-instruct
- copilot × minimaxai/minimax-m2.7
- copilot × minimaxai/minimax-m3
- copilot × mistralai/mistral-small-4-119b-2603
- copilot × nvidia/llama-3.3-nemotron-super-49b-v1
- copilot × nvidia/nemotron-3-super-120b-a12b
- copilot × nvidia/nemotron-3-ultra-550b-a55b
- copilot × openai/gpt-oss-120b
- copilot × qwen/qwen3-next-80b-a3b-instruct
- copilot × qwen/qwen3.5-122b-a10b
- copilot × qwen/qwen3.5-397b-a17b
- copilot × stepfun-ai/step-3.5-flash
- copilot × z-ai/glm-5.1
- crush × meta/llama-3.3-70b-instruct
- crush × meta/llama-4-maverick-17b-128e-instruct
- crush × minimaxai/minimax-m2.7
- crush × minimaxai/minimax-m3
- crush × mistralai/mistral-small-4-119b-2603
- crush × nvidia/llama-3.3-nemotron-super-49b-v1
- crush × nvidia/nemotron-3-super-120b-a12b
- crush × nvidia/nemotron-3-ultra-550b-a55b
- crush × openai/gpt-oss-120b
- crush × qwen/qwen3-next-80b-a3b-instruct
- crush × qwen/qwen3.5-122b-a10b
- crush × qwen/qwen3.5-397b-a17b
- crush × stepfun-ai/step-3.5-flash
- crush × z-ai/glm-5.1
- goose × meta/llama-3.3-70b-instruct
- goose × meta/llama-4-maverick-17b-128e-instruct
- goose × minimaxai/minimax-m2.7
- goose × minimaxai/minimax-m3
- goose × mistralai/mistral-small-4-119b-2603
- goose × nvidia/llama-3.3-nemotron-super-49b-v1
- goose × nvidia/nemotron-3-super-120b-a12b
- goose × nvidia/nemotron-3-ultra-550b-a55b
- goose × openai/gpt-oss-120b
- goose × qwen/qwen3-next-80b-a3b-instruct
- goose × qwen/qwen3.5-122b-a10b
- goose × qwen/qwen3.5-397b-a17b
- goose × stepfun-ai/step-3.5-flash
- goose × z-ai/glm-5.1
- opencode × meta/llama-3.3-70b-instruct
- opencode × meta/llama-4-maverick-17b-128e-instruct
- opencode × minimaxai/minimax-m2.7
- opencode × minimaxai/minimax-m3
- opencode × mistralai/mistral-small-4-119b-2603
- opencode × nvidia/llama-3.3-nemotron-super-49b-v1
- opencode × nvidia/nemotron-3-super-120b-a12b
- opencode × nvidia/nemotron-3-ultra-550b-a55b
- opencode × openai/gpt-oss-120b
- opencode × qwen/qwen3-next-80b-a3b-instruct
- opencode × qwen/qwen3.5-122b-a10b
- opencode × qwen/qwen3.5-397b-a17b
- opencode × stepfun-ai/step-3.5-flash
- opencode × z-ai/glm-5.1
- openhands × meta/llama-3.3-70b-instruct
- openhands × meta/llama-4-maverick-17b-128e-instruct
- openhands × minimaxai/minimax-m2.7
- openhands × minimaxai/minimax-m3
- openhands × mistralai/mistral-small-4-119b-2603
- openhands × nvidia/llama-3.3-nemotron-super-49b-v1
- openhands × nvidia/nemotron-3-super-120b-a12b
- openhands × nvidia/nemotron-3-ultra-550b-a55b
- openhands × openai/gpt-oss-120b
- openhands × qwen/qwen3-next-80b-a3b-instruct
- openhands × qwen/qwen3.5-122b-a10b
- openhands × qwen/qwen3.5-397b-a17b
- openhands × stepfun-ai/step-3.5-flash
- openhands × z-ai/glm-5.1
- pi × meta/llama-3.3-70b-instruct
- pi × meta/llama-4-maverick-17b-128e-instruct
- pi × minimaxai/minimax-m2.7
- pi × minimaxai/minimax-m3
- pi × mistralai/mistral-small-4-119b-2603
- pi × nvidia/llama-3.3-nemotron-super-49b-v1
- pi × nvidia/nemotron-3-super-120b-a12b
- pi × nvidia/nemotron-3-ultra-550b-a55b
- pi × openai/gpt-oss-120b
- pi × qwen/qwen3-next-80b-a3b-instruct
- pi × qwen/qwen3.5-122b-a10b
- pi × qwen/qwen3.5-397b-a17b
- pi × stepfun-ai/step-3.5-flash
- pi × z-ai/glm-5.1
- plandex × meta/llama-3.3-70b-instruct
- plandex × meta/llama-4-maverick-17b-128e-instruct
- plandex × minimaxai/minimax-m2.7
- plandex × minimaxai/minimax-m3
- plandex × mistralai/mistral-small-4-119b-2603
- plandex × nvidia/llama-3.3-nemotron-super-49b-v1
- plandex × nvidia/nemotron-3-super-120b-a12b
- plandex × nvidia/nemotron-3-ultra-550b-a55b
- plandex × openai/gpt-oss-120b
- plandex × qwen/qwen3-next-80b-a3b-instruct
- plandex × qwen/qwen3.5-122b-a10b
- plandex × qwen/qwen3.5-397b-a17b
- plandex × stepfun-ai/step-3.5-flash
- plandex × z-ai/glm-5.1
- qwen-code × meta/llama-3.3-70b-instruct
- qwen-code × meta/llama-4-maverick-17b-128e-instruct
- qwen-code × minimaxai/minimax-m2.7
- qwen-code × minimaxai/minimax-m3
- qwen-code × nvidia/llama-3.3-nemotron-super-49b-v1
- qwen-code × nvidia/nemotron-3-super-120b-a12b
- qwen-code × nvidia/nemotron-3-ultra-550b-a55b
- qwen-code × openai/gpt-oss-120b
- qwen-code × qwen/qwen3-next-80b-a3b-instruct
- qwen-code × qwen/qwen3.5-397b-a17b
- qwen-code × stepfun-ai/step-3.5-flash
- qwen-code × z-ai/glm-5.1