CodingAgentBench

Pi

badlogic's minimal, token-efficient harness for hackable BYO-model runs.

v0.73.1 (upstream)

First-look recording not yet available.

Version history

Version   Composite   Pass-rate   Cells
v0.73.1   0.558       27.4%       350 rows

Performance is tracked per pinned upstream version. Each new release is automatically re-benchmarked and a fresh "watch it run" recording is captured, so the trend grows here over time.