CodingAgentBench

CodingAgentBench · how to cite

Cite the cells we ran

Pick the format your venue requires. We list the four common ones below. The Zenodo DOI lands here once the first sweep is tagged; until then the placeholder string is stable enough to drop in a draft.

Citation formats

Each block is exactly what you paste into your manuscript or reference manager.

BibTeX

@misc{codingagentbench2026,
  title   = {{CodingAgentBench}: a multi-{CLI} agent benchmark for open coding models},
  author  = {Mello, Andrew},
  year    = {2026},
  month   = may,
  version = {0.1},
  doi     = {10.5281/zenodo.PENDING},
  url     = {https://codingagentbench.com},
  note    = {Methodology v0.1. Code Apache-2.0, data CC-BY-4.0.}
}

APA 7

Mello, A. (2026). CodingAgentBench: a multi-CLI agent benchmark for open coding models (Version 0.1) [Dataset]. 88plug AI Lab. https://doi.org/10.5281/zenodo.PENDING

Chicago author-date

Mello, Andrew. 2026. "CodingAgentBench: a multi-CLI agent benchmark for open coding models." Version 0.1. 88plug AI Lab. https://doi.org/10.5281/zenodo.PENDING.

IEEE

A. Mello, "CodingAgentBench: a multi-CLI agent benchmark for open coding models," 88plug AI Lab, ver. 0.1, 2026. [Online]. Available: https://doi.org/10.5281/zenodo.PENDING

The DOI is pending; the placeholder 10.5281/zenodo.PENDING resolves to a redirect page until Zenodo cuts the v0.1 record.
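
Once the real DOI is issued, every copy of the placeholder in a draft needs to be swapped. The following is a minimal sketch of one way to do that, assuming your references live in plain text (a .bib file, for example). The function names are hypothetical, and the DOI in the usage lines is a made-up stand-in, not the real record.

```python
PLACEHOLDER_DOI = "10.5281/zenodo.PENDING"


def has_placeholder(text: str) -> bool:
    """Return True if the draft still contains the placeholder DOI."""
    return PLACEHOLDER_DOI in text


def replace_placeholder_doi(text: str, real_doi: str) -> str:
    """Return text with every occurrence of the placeholder DOI replaced.

    This covers both the bare DOI (BibTeX `doi` field) and the
    https://doi.org/... form, since the placeholder is a substring of both.
    """
    if not real_doi.startswith("10."):
        raise ValueError(f"not a DOI: {real_doi!r}")
    return text.replace(PLACEHOLDER_DOI, real_doi)


if __name__ == "__main__":
    draft = "doi = {10.5281/zenodo.PENDING}\nhttps://doi.org/10.5281/zenodo.PENDING"
    # Hypothetical DOI for illustration only; use the one Zenodo actually issues.
    fixed = replace_placeholder_doi(draft, "10.5281/zenodo.0000000")
    print(fixed)
    print("placeholder left:", has_placeholder(fixed))
```

Running `has_placeholder` over a manuscript before submission is a cheap way to catch a reference that was never updated.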

Sample sentences

Three styles for three audiences. Copy as a template, then swap in the cell you ran.

  • News

    CodingAgentBench reports that Aider with Llama 3.3 70B shipped on N of M held-out tasks, ahead of OpenHands on the same model.

  • Academic

    Per [CodingAgentBench, 2026], cross-CLI variance on the integrity tasks exceeds cross-model variance (n tasks, k reruns).

  • Practitioner

    On CodingAgentBench, Crush + GPT-OSS 120B passed P% of the polyglot suite at a measured token cost per cell.

Citation guidelines

  • Cite the cells we ran. Do not cite cells we did not run.
  • Name the methodology version. Scores anchor to a specific version.
  • Include the sha256 receipt when claiming a specific cell result.
  • Link to the run page on codingagentbench.com so readers can rerun the row.
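
The guidelines above can be sketched as a small helper that assembles a claim with its version, receipt, and link. The receipt format is not specified on this page, so the sketch assumes the receipt is the hex SHA-256 digest of the raw result file. The function names, the file name, and the argument values in the usage lines are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def format_cell_claim(cli: str, model: str, result: str,
                      methodology_version: str, receipt: str,
                      run_url: str) -> str:
    """Build a one-line claim naming the cell, version, receipt, and run page."""
    return (
        f"{cli} + {model}: {result} "
        f"(CodingAgentBench methodology v{methodology_version}, "
        f"sha256 {receipt}, {run_url})"
    )


if __name__ == "__main__":
    # Assumption: "cell_result.json" is a result file you downloaded yourself.
    receipt = sha256_of_file("cell_result.json")
    print(format_cell_claim(
        cli="Crush",
        model="GPT-OSS 120B",
        result="P% of the polyglot suite",
        methodology_version="0.1",
        receipt=receipt,
        run_url="<run page URL on codingagentbench.com>",
    ))
```

If the published receipt is computed over something other than the raw file bytes, only `sha256_of_file` needs to change; the claim format stays the same.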