CodingAgentBench

A defensive program

Help us keep the bench honest.

Other benchmarks have rotted into leaderboards for the exploit, not the capability. We invite the attack early, credit researchers, and wire every accepted finding back into the bench.

Defensive disclosure program · v0.1

Help us harden CodingAgentBench.

Every static benchmark eventually rots into a leaderboard for the exploit. We invite the attack and wire every accepted finding back into the bench as a new Reality Engine layer or a task refinement. The catalogue below is the public ledger of that work.

Triage verdict: 5 business days
Acknowledgement: 2 business days
Top tier reward: $2 000
Exploit code on-site: Never

01 Scope

In scope

  • The CodingAgentBench harness (harness/**) and its sandbox enforcement.
  • Reality Engine layers (plugins/reality/**) — refusal and blast-radius accounting.
  • Task definitions (tasks/v*/) — refusal-expected flags, gold answers, prompt sanitisation.
  • The composite score formula and per-axis weightings.
  • The published artefact pipeline (results/published/**, dataset card, content hashes).
  • Leaderboard / cube / notebook renders, where a misrender would materially mislead a reader.
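
As a rough illustration of how the path-based entries above could be checked, here is a minimal Python sketch. It is not part of the bench: the function name is_in_scope, the helper, and the sample paths are hypothetical, and only the four entries that name a path pattern are covered (the score formula and the renders have no path in the list). It also assumes fnmatch-style matching, where * crosses directory separators, which is looser than shell globbing.

```python
from fnmatch import fnmatchcase

# Path patterns copied from the in-scope list above.
IN_SCOPE_PATTERNS = [
    "harness/**",
    "plugins/reality/**",
    "tasks/v*/",
    "results/published/**",
]


def _as_match_pattern(pattern: str) -> str:
    """Assumption: a pattern ending in '/' covers everything under that directory."""
    return pattern + "*" if pattern.endswith("/") else pattern


def is_in_scope(path: str) -> bool:
    """Return True if a repo-relative path matches any in-scope pattern.

    fnmatchcase is used so the result does not depend on the host OS's
    case-folding rules. Note that '*' here also matches '/'.
    """
    return any(
        fnmatchcase(path, _as_match_pattern(pattern))
        for pattern in IN_SCOPE_PATTERNS
    )
```

A call such as is_in_scope("harness/sandbox/enforce.py") would return True under these assumptions, while a path outside the listed trees, such as "docs/exploit-rubric.md", would not. A match says nothing about the out-of-scope exclusions, which still apply.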

Out of scope

  • Attacks requiring physical access to the runner host.
  • DoS against the static site; that's the CDN's problem.
  • Anything needing non-public model weights or paid API access.
  • Vulnerabilities inside upstream TUIs — disclose those to the TUI vendor.
  • Issues already in the public catalogue at /security/exploits.

02 Severity rubric

Four tiers. We grade on reach and magnitude, not on offensive sophistication.

critical $2 000

Lifts ≥10 percent of cells to ceiling, or breaks trace integrity.

Example. A prompt that silently bypasses the refusal gate for an entire task family.

high $750

Single cell trivially passes, or a single TUI gains a measurable metric advantage.

Example. A whitespace canonicalisation gap that flips a strict-equality grader.

medium $200

Measurable bias visible across the bench at small magnitude.

Example. A locale-dependent tokenizer drift that nudges scores by a fraction.

low credit

Effect within published run-to-run variance, or a UX or rendering annoyance.

Example. A leaderboard tooltip that truncates a long model name.

Full rubric with worked examples lives in docs/exploit-rubric.md. Final tier is set by the triage committee.
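
The four tiers above can be read as a top-down decision: check the critical conditions first, then fall through to the lower tiers. The sketch below shows one way to encode that reading. It is an assumption-laden illustration, not the committee's procedure: the Finding fields and the suggest_tier function are hypothetical names, and the output is only a suggested starting tier, since the final tier is set by the triage committee.

```python
from dataclasses import dataclass

# Rewards as listed in the rubric above.
REWARDS = {"critical": "$2 000", "high": "$750", "medium": "$200", "low": "credit"}


@dataclass
class Finding:
    """Hypothetical summary of a finding's reach and magnitude."""
    cells_lifted_fraction: float = 0.0   # share of cells lifted to ceiling, 0.0 to 1.0
    breaks_trace_integrity: bool = False
    cells_trivially_passed: int = 0      # cells that pass trivially
    tui_metric_advantage: bool = False   # a single TUI gains a measurable advantage
    bench_wide_bias: bool = False        # small but measurable bias across the bench


def suggest_tier(finding: Finding) -> str:
    """Suggest a tier by checking the rubric's conditions from most to least severe."""
    if finding.cells_lifted_fraction >= 0.10 or finding.breaks_trace_integrity:
        return "critical"
    if finding.cells_trivially_passed >= 1 or finding.tui_metric_advantage:
        return "high"
    if finding.bench_wide_bias:
        return "medium"
    # Everything else: within run-to-run variance, or a UX or rendering annoyance.
    return "low"
```

Checking from the top down means a finding that satisfies several tiers is suggested at the highest one, which matches grading on reach and magnitude.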

03 How to report

Four steps. The full protocol lives on /security/submit.

  1. Open a GitHub issue

    Use the disclosure template. The issue is the authoritative record for every finding we triage.

  2. Sign your disclosure

    PGP, signed commit, or ssh-keygen -Y sign. Signing protects both sides against impersonation and replay.

  3. Describe in prose, not in code

    We do not host reproducer code. Link to a reproducer in your repo and describe the attack class in prose.

  4. We triage within 5 business days

    Accept, reject, or need-info. Accepted findings land as a Reality Engine layer or task refinement.

04 Response SLA

Within 2 business days: acknowledgement that we received your disclosure.
Within 5 business days: a triage verdict of accept, reject, or need-info (with a 14-day clock).
If accepted: a fix lands in the bench. Writeup and credit are published.
If rejected: a written reason. You may appeal once with new evidence.
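
For reporters who want to estimate when to expect a response, the two business-day windows above can be worked out as in the sketch below. It rests on assumptions the page does not state: both clocks start on the day the disclosure is received, business days are Monday to Friday, and public holidays are ignored. The function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def sla_deadlines(received: date) -> dict:
    """Latest expected dates for acknowledgement and triage verdict.

    Assumption: both windows are counted from the day of receipt.
    """
    return {
        "acknowledgement": add_business_days(received, 2),
        "triage_verdict": add_business_days(received, 5),
    }
```

Under these assumptions, a disclosure received on Friday 1 March 2024 would be acknowledged by Tuesday 5 March and triaged by Friday 8 March.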