Field guide
Ten coding agents — nine open-source, one proprietary with BYO-model support. We tested them all on the same tasks, with the same models, and recorded what each one did.
aider.chat
Aider-AI's terse, git-aware editor for clean, diff-by-diff commits.
Asks before it edits, then stages a tidy commit. The diff-first agent that treats your repo like a co-author would.
github.com/sst/opencode
sst's polished TypeScript TUI for interactive multi-file refactors.
A modern terminal cockpit with first-class MCP support and a BYO-model surface. Active maintenance, sensible defaults.
github.com/block/goose
block's MCP-native Rust agent for tool-heavy automation workflows.
Funded by Block. Speaks MCP fluently and treats external tools as first-class citizens, not afterthoughts.
github.com/All-Hands-AI/OpenHands
All-Hands-AI's sandboxed runtime for long-horizon, end-to-end coding tasks.
Runs the agent inside a real sandbox with browser, shell, and editor. Built to finish multi-hour jobs unattended.
github.com/charmbracelet/crush
charmbracelet's slick keyboard-first TUI for quick interactive edits.
The Charm studio's coding agent. Lean, opinionated, and visually delightful. Built for the terminal you already live in.
github.com/plandex-ai/plandex
plandex-ai's plan-first agent for sweeping multi-step changes you can review.
Drafts an explicit plan before touching files. Branches, reviews, and rollbacks are first-class, not bolted on.
github.com/QwenLM/qwen-code
QwenLM's reference CLI for the Qwen coder model family.
Alibaba's own harness for their coder models. A useful calibration baseline when comparing community TUIs head to head.
github.com/openai/codex
openai's open-source agentic CLI for BYO-model autonomous one-shot tasks.
OpenAI's own Apache-2.0 coding agent. BYO any OpenAI-compatible endpoint via config overrides — no OpenAI auth required. Runs an autonomous exec loop that plans, edits, and verifies.
github.com/features/copilot
github's Copilot CLI for BYOK-powered open-weight model testing.
GitHub's official terminal coding agent. As of the April 2026 BYOK release, any OpenAI-compatible provider plugs in via env vars. The largest installed-base terminal assistant, now testable on open-weight models.
github.com/badlogic/pi-mono
badlogic's minimal, token-efficient harness for hackable BYO-model runs.
The smallest, clearest coding-agent harness you can adapt. Open-source (MIT), token-efficient, and built to be forked and hacked on.
Long-form evaluation log of every coding-agent TUI / CLI that CodingAgentBench has considered. Each entry includes the GitHub URL, license, primary implementation language, an approximate star count (as of 2026-05, in ~K notation — we do not claim precision), BYO-model and MCP support flags, the CodingAgentBench decision (MVP / Phase 1 / Phase 2 / Skip), and a one-sentence rationale.
Star counts are point-in-time snapshots and will drift. Where a count is heavily rounded, it is because we do not believe finer precision adds signal.
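A minimal sketch of how a raw star count might be reduced to the "~K" notation described above. The helper name `approx_k` and the round-to-nearest-thousand rule are assumptions for illustration; the catalog does not state its exact rounding rule.

```python
def approx_k(stars: int) -> str:
    """Render a star count in approximate ~K notation (hypothetical helper)."""
    if stars < 0:
        raise ValueError("star count cannot be negative")
    if stars < 1000:
        # Assumption: small counts are shown as-is, still marked approximate.
        return f"~{stars}"
    # Integer arithmetic rounds half up and avoids float rounding surprises.
    return f"~{(stars + 500) // 1000}K"


if __name__ == "__main__":
    for count in (850, 1500, 41237):
        print(count, "->", approx_k(count))
```

Rounding to whole thousands deliberately discards detail, which matches the stated position that finer precision adds no signal.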
For graveyard entries (archived, abandoned, pivoted, vendor-retired) see graveyard.md. For closed-source vendor TUIs, see the FAQ.
Ten TUIs. All support a configurable OpenAI-compatible endpoint and are Dockerizable. Nine are open-source; GitHub Copilot CLI is proprietary but BYO-model capable via its April 2026 BYOK release.
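As one illustration of what a configurable endpoint can look like in practice, the sketch below assembles the environment variables this catalog names for GitHub Copilot CLI's BYOK mode. Only the variable names come from the catalog; the function name `copilot_byok_env` and every value shown (provider type, URL, key, model) are hypothetical placeholders, not documented defaults.

```python
from typing import Dict


def copilot_byok_env(
    provider_type: str,
    base_url: str,
    api_key: str,
    model: str,
    offline: bool = True,
) -> Dict[str, str]:
    """Build the env vars the catalog lists for Copilot CLI BYOK (sketch)."""
    env = {
        "COPILOT_PROVIDER_TYPE": provider_type,
        "COPILOT_PROVIDER_BASE_URL": base_url,
        "COPILOT_PROVIDER_API_KEY": api_key,
        "COPILOT_MODEL": model,
    }
    if offline:
        # Per the catalog: in this mode there is no GitHub auth or telemetry.
        env["COPILOT_OFFLINE"] = "true"
    return env


if __name__ == "__main__":
    # All values below are made-up placeholders for illustration.
    env = copilot_byok_env(
        provider_type="openai",
        base_url="http://localhost:4000/v1",
        api_key="sk-placeholder",
        model="example-model",
    )
    for name, value in sorted(env.items()):
        print(f"{name}={value}")
```

Keeping the mapping in one function makes it easy to pass the same endpoint to a container runtime or a subprocess call without scattering variable names through the harness.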
Aider-AI/aider
openai/codex: -c model_providers.* config overrides; points at any OpenAI-compatible endpoint, including the harness LiteLLM proxy. No OpenAI auth required in BYO mode.
GitHub Copilot CLI (@github/copilot): COPILOT_PROVIDER_TYPE / COPILOT_PROVIDER_BASE_URL / COPILOT_PROVIDER_API_KEY / COPILOT_MODEL. In COPILOT_OFFLINE=true mode, no GitHub auth or telemetry; the harness endpoint is the only egress.
badlogic/pi-mono (@mariozechner/pi-coding-agent): ~/.pi/agent/models.json; openai-completions API wired to any OpenAI Chat-Completions endpoint. No Responses API, no telemetry.
Real projects, viable on paper, that need a smoke pass and Dockerfile before they enter the matrix.
[…] willmcgugan/toad depending on final namespace)
Kilo-Org/kilocode: […] graveyard.md) and continuation of the Cline lineage. Large existing user base. Needs Dockerfile + adapter; will enter at next sweep.
Tools we are aware of but have not yet evaluated in depth. No commitment.
geekan/MetaGPT: Multi-agent role-playing framework. May or may not fit the controlled-matrix model. On the watchlist (see graveyard.md).
cline/cline: Originally a VS Code extension; CLI-mode maturity is uncertain. Lineage continued through Kilo Code (Phase 1).
Closed-source. See Skip section.
The following are vendor-locked, model-bundled, closed-source, or all three. They are out of scope by manifesto commitment #1 and the controlled-experiment design (see manifesto-faq.md).
[…] graveyard.md.
(Note: Codex CLI and GitHub Copilot CLI were originally in this list. Both shipped BYO-model/BYOK support in early 2026; see their MVP entries above.)
These tools may be excellent. They are simply not what CodingAgentBench measures. A separate "vendor reference" column may appear in future versions for context.
See graveyard.md for full entries on Roo Code, Open Interpreter, AutoGPT, GPT-Pilot, Devika, Continue, Smol Developer, and others.
CodingAgentBench v0.2 adopts Edgar Morin's framework of complex thought as its epistemological foundation, with five additional scoring layers (L10–L14) that operationalize the framework's operators. See morin-framework.md for the full exposition and ../METHODOLOGY.md §14 for the methodology-level cross-reference.
TUIs do not need to know any of this. The Morin probes operate at the eval layer, not the harness layer. A TUI's responsibility ends where it always has — accept a workdir, accept a task description, accept an OpenAI-compatible endpoint, produce a final commit. Everything Morin-related happens after the container exits.
Concretely:
[…] codingagentbench:eval notice in the prompt; again, normal task input from the TUI's perspective.
This is by design. The Morin framework is a methodology-side commitment, not a TUI integration requirement. The criterion for inclusion in the MVP sweep remains what it has always been: open-source, OpenAI-compatible endpoint, Dockerizable (see §2 of ../METHODOLOGY.md). No new integration work is asked of any TUI listed in this catalog.
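The TUI-side contract described in this section (a workdir, a task description, and an OpenAI-compatible endpoint go in; a final commit comes out) can be sketched as two small records and one check. The type names, field names, and the rule that "a final commit" means HEAD moved to a new commit are all assumptions for illustration, not the harness's actual interface.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TuiInputs:
    """The three things the text says a TUI accepts (hypothetical shape)."""
    workdir: Path
    task_description: str
    endpoint_base_url: str


@dataclass(frozen=True)
class TuiOutcome:
    """What a harness could observe once the container exits (hypothetical)."""
    head_before: str
    head_after: Optional[str]


def produced_final_commit(outcome: TuiOutcome) -> bool:
    # Assumption: the contract is met when HEAD points at a new commit.
    return (
        outcome.head_after is not None
        and outcome.head_after != outcome.head_before
    )


if __name__ == "__main__":
    inputs = TuiInputs(
        workdir=Path("/work/repo"),  # placeholder path
        task_description="Fix the failing unit test",
        endpoint_base_url="http://localhost:4000/v1",  # placeholder URL
    )
    outcome = TuiOutcome(head_before="abc123", head_after="def456")
    print(inputs.workdir, produced_final_commit(outcome))
```

Nothing in this sketch refers to the Morin probes, which reflects the point made above: any evaluation-layer work would consume the outcome after the run, without the TUI taking part.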
If a future TUI wishes to explicitly cooperate with Morin probes (for example, by exposing its own self-mutation hooks or by structuring its commit history for cleaner L11 continuation), we welcome the cooperation but do not require it. The benchmark is designed to work with TUIs as they are.
If your favorite TUI is not in this catalog and you believe it should be, open an issue with the tui-request label or submit a PR per docs/submitting.md. Nominations are evaluated on a rolling basis and decisions land in this document.
Catalog maintained for methodology v0.1, May 2026. Star counts are as of 2026-05 and will not be silently updated; they are refreshed at each methodology version bump, and the prior values are archived.