CodingAgentBench

Cost · Latency · Quality · Blast

Cost-vs-quality tradeoffs

Sometimes you'd pay 3× more for 5% better. Sometimes you wouldn't. This page shows you which is which.

What this page is

  • What you'll find: charts of every CLI on four axes (cost, latency, quality, blast radius).
  • Who it's for: people picking under a budget.
  • How to switch: toggle to Nerds mode for the full 4-axis chart with brush-to-filter axes.

4-axis dominance · 3,465 cells

Pareto Frontier

Every cell is plotted on four axes: cost (USD), latency (ms), quality (composite score), and blast radius (collateral file edits, weighted by line count). A cell is on the frontier iff no other cell dominates it, that is, no other cell is at least as good on every axis and strictly better on at least one. Dominated cells fade to 20% opacity so the frontier stands out. Artificial Analysis stops at 2D and syftr at 3D; CodingAgentBench is the first to publish 4D.

Methodology v0.1 | Pinned to image digests as of 2026-06-20
3,465 rows · 10 TUIs · 14 models · 3 categories

How to read it. Toggle between Parallel coordinates (all 4 axes on one chart, one polyline per cell) and 2×2 scatter small-multiples (six pairwise views). Drag along any axis of the parallel-coordinates chart to brush a range; the selected cells highlight across all panels. Frontier cells are solid; dominated cells are dimmed but still visible, because we don't hide honest losers.

Dominance rule. Cell A dominates cell B iff A is at least as good as B on all four axes, cost (lower is better), latency (lower is better), quality (higher is better), and blast radius (lower is better), and strictly better on at least one of them. Implementation: harness/scorers/pareto.py. Per-cell cost roll-up: harness/tracing/cost.py.
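The dominance rule above can be sketched in a few lines of Python. This is a minimal illustration and not the contents of harness/scorers/pareto.py: the `Cell` type, its field names, the helper functions, and the sample values are all hypothetical. It negates quality so that every component is lower-is-better, then keeps each cell that no other cell dominates.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One benchmark cell. Field names are illustrative, not the harness schema."""
    name: str
    cost_usd: float    # lower is better
    latency_ms: float  # lower is better
    quality: float     # higher is better
    blast: float       # lower is better


def _key(c: Cell) -> tuple[float, float, float, float]:
    # Negate quality so "lower is better" holds for every component.
    return (c.cost_usd, c.latency_ms, -c.quality, c.blast)


def dominates(a: Cell, b: Cell) -> bool:
    """True iff a is no worse than b on all axes and strictly better on one."""
    ka, kb = _key(a), _key(b)
    no_worse = all(x <= y for x, y in zip(ka, kb))
    strictly_better = any(x < y for x, y in zip(ka, kb))
    return no_worse and strictly_better


def pareto_frontier(cells: list[Cell]) -> list[Cell]:
    """Cells not dominated by any other cell (simple O(n^2) scan)."""
    return [c for c in cells if not any(dominates(o, c) for o in cells)]


# Made-up sample values, for illustration only.
cells = [
    Cell("cheap-fast", 0.10, 900.0, 0.62, 3.0),
    Cell("pricey-good", 0.30, 1500.0, 0.81, 2.0),
    Cell("dominated", 0.35, 1600.0, 0.80, 2.0),
    Cell("tie-of-cheap-fast", 0.10, 900.0, 0.62, 3.0),
]
frontier_names = [c.name for c in pareto_frontier(cells)]
print(frontier_names)
```

Under this reading of the rule, two cells with identical values on every axis do not dominate each other, so both remain on the frontier. The quadratic scan is the simplest correct approach and should be comfortable at a few thousand cells.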