TUI Catalog

Long-form evaluation log of every coding-agent TUI / CLI that CodingAgentBench has considered. Each entry includes the GitHub URL, license, primary implementation language, an approximate star count (as of 2026-05, in ~K notation — we do not claim precision), BYO-model and MCP support flags, the CodingAgentBench decision (MVP / Phase 1 / Phase 2 / Skip), and a one-sentence rationale.

Star counts are point-in-time snapshots; they will drift. Where the count is rounded heavily it is because we do not believe finer precision adds signal.
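As an illustration of the entry layout described above, here is a minimal sketch of how one entry could be held as a record, with the approximate star notation turned into a number. The names `CatalogEntry` and `parse_star_count` are hypothetical and are not part of any CodingAgentBench tooling; the field names simply mirror the labels used in this catalog.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog entry (hypothetical record; fields mirror the entry labels)."""
    name: str
    url: str
    license: str
    language: str
    stars_approx: Optional[int]  # None when the entry lists no numeric count
    byo_model: str
    mcp_support: str
    decision: str
    rationale: str


def parse_star_count(text: str) -> Optional[int]:
    """Turn '~50K' into 50000; return None for values such as 'N/A'."""
    match = re.match(r"~?\s*(\d+(?:\.\d+)?)\s*K\b", text.strip(), re.IGNORECASE)
    if match is None:
        return None
    return int(round(float(match.group(1)) * 1000))
```

Returning `None` rather than zero for non-numeric values keeps "no count listed" distinct from "no stars", which matters for entries such as pre-launch projects.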

For graveyard entries (archived, abandoned, pivoted, vendor-retired) see graveyard.md. For closed-source vendor TUIs, see the FAQ.

MVP — Phase 0 sweep (v0.1, May 2026)

Ten TUIs. All support a configurable OpenAI-compatible endpoint and are Dockerizable. Nine are open-source or source-available (Crush ships under the FSL); GitHub Copilot CLI is proprietary but BYO-model capable via its April 2026 BYOK release.

sst/opencode

URL: https://github.com/sst/opencode
License: MIT
Language: TypeScript
Stars (~2026-05): ~50K
BYO-model: Yes (OpenAI-compatible endpoint, configurable model name)
MCP support: Yes
Decision: MVP
Rationale: Most-starred modern open TUI, active maintenance, clean BYO-model surface — the obvious anchor entry.

Aider — `Aider-AI/aider`

URL: https://github.com/Aider-AI/aider
License: Apache-2.0
Language: Python
Stars (~2026-05): ~40K
BYO-model: Yes (any LiteLLM-compatible backend including local OpenAI-compatible)
MCP support: Limited (third-party plugins; native support roadmapped)
Decision: MVP
Rationale: The canonical "edit-in-place" CLI agent and a widely cited reference harness in prior benchmarks (Aider-Polyglot). Mature, well-documented, predictable.

block/goose

URL: https://github.com/block/goose
License: Apache-2.0
Language: Rust
Stars (~2026-05): ~20K
BYO-model: Yes (OpenAI-compatible + Ollama + Anthropic + Bedrock)
MCP support: Yes (first-class)
Decision: MVP
Rationale: Block-funded, MCP-native, distinct architecture from the JS/Python pack — important for axis diversity in the matrix.

All-Hands-AI/OpenHands

URL: https://github.com/All-Hands-AI/OpenHands
License: MIT
Language: Python
Stars (~2026-05): ~50K
BYO-model: Yes (LiteLLM under the hood)
MCP support: Yes
Decision: MVP
Rationale: Heavyweight autonomous-agent category leader, strong SWE-bench results historically — needs to be in the matrix to quantify operational performance at the heavy-agent end.

charmbracelet/crush

URL: https://github.com/charmbracelet/crush
License: FSL-1.1-MIT
Language: Go
Stars (~2026-05): ~10K (and rising fast)
BYO-model: Yes
MCP support: Yes
Decision: MVP
Rationale: From the Bubble Tea / Charm ecosystem, a modern Go TUI with strong UX engineering. Source-available FSL license is acceptable for our purposes; flagged as "source-available, not OSI" in results.

plandex-ai/plandex

URL: https://github.com/plandex-ai/plandex
License: MIT (server components) / AGPL-3.0 (CLI client)
Language: Go
Stars (~2026-05): ~12K
BYO-model: Yes (OpenAI-compatible)
MCP support: Roadmapped
Decision: MVP
Rationale: Multi-step planning specialization is distinct from the edit-in-place pattern. Important for measuring whether planning improves operational performance or wastes tokens.

QwenLM/qwen-code

URL: https://github.com/QwenLM/qwen-code
License: Apache-2.0
Language: TypeScript (fork of Gemini CLI lineage)
Stars (~2026-05): ~12K
BYO-model: Yes (default tuned for Qwen-Coder, accepts arbitrary OpenAI-compatible endpoints)
MCP support: Yes
Decision: MVP
Rationale: Vendor-tuned for Qwen models but model-agnostic in practice — important for a like-for-like test of "does the vendor's matched harness ship better diffs from their own model than competing TUIs do."

openai/codex

URL: https://github.com/openai/codex
License: Apache-2.0
Language: Rust (the original TypeScript implementation is legacy)
Stars (~2026-05): ~30K
BYO-model: Yes — custom model provider injected via -c model_providers.* config overrides; points at any OpenAI-compatible endpoint including the harness LiteLLM proxy. No OpenAI auth required in BYO mode.
MCP support: Yes (tool-use via Responses API; multi-agent / browser / computer-use plugins disabled for the benchmark run)
Decision: MVP (elevated from Skip when Apache-2.0 + BYO-model support shipped in 2026)
Rationale: OpenAI's own reference local coding agent, now fully open-source and model-agnostic. Indispensable for the matrix: if the vendor's own harness underperforms another TUI on an OpenAI-family model, that is a signal worth measuring.
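The BYO-model line in this entry mentions `-c model_providers.*` overrides. A minimal sketch of how a harness might assemble those arguments follows. The helper name `codex_provider_overrides`, the provider id, the display name, and the exact keys used (`model`, `model_provider`, and the `name` and `base_url` sub-keys) are assumptions for illustration and should be checked against the Codex configuration documentation before use.

```python
import json
from typing import List


def codex_provider_overrides(provider_id: str, base_url: str, model: str) -> List[str]:
    """Build a list of `-c key=value` arguments (keys are assumptions, see lead-in)."""
    overrides = {
        "model": model,
        "model_provider": provider_id,
        f"model_providers.{provider_id}.name": "CodingAgentBench proxy",  # hypothetical label
        f"model_providers.{provider_id}.base_url": base_url,
    }
    args: List[str] = []
    for key, value in overrides.items():
        # json.dumps yields a double-quoted string, which also reads as a TOML string
        # for plain ASCII values like these.
        args += ["-c", f"{key}={json.dumps(value)}"]
    return args
```

Building the arguments as a list, rather than one shell string, avoids quoting problems when the list is passed straight to a process launcher.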

github/copilot-cli

URL: https://github.com/github/copilot-cli
License: Proprietary (GitHub / Microsoft)
Language: TypeScript (npm: @github/copilot)
Stars (~2026-05): N/A (distributed as CLI tool, not a starred OSS repo)
BYO-model: Yes — as of the April 7 2026 BYOK release, any OpenAI-compatible provider can be wired in via COPILOT_PROVIDER_TYPE / COPILOT_PROVIDER_BASE_URL / COPILOT_PROVIDER_API_KEY / COPILOT_MODEL. In COPILOT_OFFLINE=true mode, no GitHub auth or telemetry — the harness endpoint is the only egress.
MCP support: No — custom agents / MCP surface emits a no-op warning in the benchmark mode.
Decision: MVP (elevated from Skip when BYO-provider support shipped April 2026)
Rationale: The largest installed-base terminal coding assistant in the world. Evaluating it on the open-weight matrix answers a question many developers have: how does Copilot CLI perform when you swap its hosted model for an open alternative?
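The environment variables named in this entry's BYO-model line can be gathered into one mapping before the container starts. The sketch below is illustrative only: the function name `copilot_byok_env` is hypothetical, and the default provider type value `"openai"` is an assumption, not a documented value. Only the variable names come from the entry above.

```python
from typing import Dict


def copilot_byok_env(
    base_url: str,
    api_key: str,
    model: str,
    provider_type: str = "openai",  # assumed value; verify against the BYOK docs
) -> Dict[str, str]:
    """Return the environment for a BYO-provider, offline Copilot CLI run."""
    return {
        "COPILOT_PROVIDER_TYPE": provider_type,
        "COPILOT_PROVIDER_BASE_URL": base_url,
        "COPILOT_PROVIDER_API_KEY": api_key,
        "COPILOT_MODEL": model,
        # Offline mode: per the entry above, no GitHub auth or telemetry.
        "COPILOT_OFFLINE": "true",
    }
```

A caller would typically merge this mapping into the container environment, for example by passing it as the `env` argument of a process launcher.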

badlogic/pi-mono

URL: https://github.com/badlogic/pi-mono
License: MIT
Language: TypeScript (npm: @mariozechner/pi-coding-agent)
Stars (~2026-05): ~3K
BYO-model: Yes — custom provider block in ~/.pi/agent/models.json; openai-completions API wired to any OpenAI Chat-Completions endpoint. No Responses API, no telemetry.
MCP support: No (minimal by design — "smallest, clearest harness you can adapt")
Decision: MVP
Rationale: Deliberate minimalism is a distinct design axis. Pi's claim is that a stripped-down, token-efficient harness extracts more signal per dollar. Including it lets the matrix test that claim directly.

Phase 1 — candidates for the next sweep

Real projects, viable on paper, that need a smoke pass and Dockerfile before they enter the matrix.

Toad — Will McGugan (Textualize)

URL: https://github.com/Textualize/toad (placeholder; project may live under willmcgugan/toad depending on final namespace)
License: Expected MIT or Apache-2.0
Language: Python
Stars (~2026-05): Early/pre-launch
BYO-model: Designed for universal frontend — pluggable backend.
MCP support: TBD
Decision: Phase 1
Rationale: Will McGugan's positioning of Toad as a "universal frontend" for coding agents is structurally interesting — if it can drive Aider, opencode, or goose underneath, it is a meta-TUI category that CodingAgentBench will need to handle. Adding when first stable release lands.

Kilo Code — `Kilo-Org/kilocode`

URL: https://github.com/Kilo-Org/kilocode
License: Apache-2.0 (verify on submission)
Language: TypeScript
Stars (~2026-05): ~10K (inheriting Cline/Roo lineage)
BYO-model: Yes
MCP support: Yes
Decision: Phase 1
Rationale: Successor to Roo Code (see graveyard.md) and continuation of the Cline lineage. Large existing user base. Needs Dockerfile + adapter; will enter at next sweep.

DeepSeek-Reasonix (single-backend specialist)

URL: Placeholder — project is a specialized harness with cache pre-warming.
License: Expected open-source.
Language: Python/Go (TBD)
Stars (~2026-05): Pre-launch / niche
BYO-model: Single-backend by design (specialist)
MCP support: TBD
Decision: Phase 1, with caveat
Rationale: Single-backend specialists violate the "configurable model endpoint" rule in spirit, but are interesting as an operational-performance upper bound — what is achievable when the harness can pre-warm caches and pre-tune prompts to a specific model? Will be benchmarked but flagged with a "specialist" tag so its numbers are not directly compared to general TUIs.

Phase 2 — future consideration

Tools we are aware of but have not yet evaluated in depth. No commitment.

MetaGPT — `geekan/MetaGPT`

Multi-agent role-playing framework. May or may not fit the controlled-matrix model. On the watchlist (see graveyard.md).

Cline — `cline/cline`

Originally a VS Code extension; CLI-mode maturity is uncertain. Lineage continued through Kilo Code (Phase 1).

Cursor headless mode

Closed-source. See Skip section.

Skip — explicitly out of scope

Closed-source vendor TUIs

The following are vendor-locked, model-bundled, closed-source, or all three. They are out of scope by manifesto commitment #1 and the controlled-experiment design (see manifesto-faq.md).

Claude Code (Anthropic) — closed-source, model-bundled. Not benchmarkable in the controlled cross-product.
Gemini CLI (Google) — closed-source, model-bundled, and retired by vendor 2026-05-19. See graveyard.md.
Cursor headless (Anysphere) — closed-source, model-bundled.

(Note: Codex CLI and GitHub Copilot CLI were originally in this list. Both shipped BYO-model/BYOK support in early 2026 — see their MVP entries above.)

These tools may be excellent. They are simply not what CodingAgentBench measures. A separate "vendor reference" column may appear in future versions for context.

Dead / archived / pivoted

See graveyard.md for full entries on Roo Code, Open Interpreter, AutoGPT, GPT-Pilot, Devika, Continue, Smol Developer, and others.

Sidebar — How TUIs are expected to interact with the Morin framework

CodingAgentBench v0.2 adopts Edgar Morin's framework of complex thought as its epistemological foundation, with five additional scoring layers (L10–L14) that operationalize the framework's operators. See morin-framework.md for the full exposition and ../METHODOLOGY.md §14 for the methodology-level cross-reference.

TUIs do not need to know any of this. The Morin probes operate at the eval layer, not the harness layer. A TUI's responsibility ends where it always has — accept a workdir, accept a task description, accept an OpenAI-compatible endpoint, produce a final commit. Everything Morin-related happens after the container exits.

Concretely:

L10 (test-mutation probe) runs out-of-container against the agent's diff.
L11 (N-run continuation) drives the TUI through additional passes using the same standard harness invocation as the first pass — no new APIs.
L12 (style coherence) is a static analysis over the workdir; the TUI is not involved.
L13 (score-feedback injection) modifies the task prompt between runs; the TUI sees a normal task input.
L14 (observer probe) modifies the codingagentbench:eval notice in the prompt; again, normal task input from the TUI's perspective.
MCP server tier classification (autonomy-dependence operator) is computed from observed MCP call traces in the run JSONL; the TUI does not need to declare its MCP posture.
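To make the last item concrete, here is a minimal sketch of counting MCP calls per server from a run JSONL. The event schema is an assumption: the `type` and `server` field names and the `"mcp_call"` value are hypothetical stand-ins, since the real trace format is defined elsewhere. The tier thresholds themselves are not specified in this catalog, so the sketch stops at the counts.

```python
import json
from collections import Counter


def mcp_calls_per_server(jsonl_text: str) -> Counter:
    """Count MCP call events per server in a run JSONL (schema is hypothetical)."""
    counts: Counter = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if event.get("type") == "mcp_call":
            counts[event.get("server", "unknown")] += 1
    return counts
```

Because the counts come from observed traces, nothing in this step depends on the TUI declaring which MCP servers it intends to use.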

This is by design. The Morin framework is a methodology-side commitment, not a TUI integration requirement. The criteria for inclusion in the MVP sweep remain what they have always been: open-source, OpenAI-compatible endpoint, Dockerizable (see §2 of ../METHODOLOGY.md). No new integration work is asked of any TUI listed in this catalog.

If a future TUI wishes to explicitly cooperate with Morin probes (for example, by exposing its own self-mutation hooks or by structuring its commit history for cleaner L11 continuation), we welcome the cooperation but do not require it. The benchmark is designed to work with TUIs as they are.
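The harness contract described in this sidebar (a workdir, a task description, and an OpenAI-compatible endpoint go in; a final commit comes out) and the L11 continuation that reuses it can be sketched as follows. All names here (`HarnessInvocation`, `run_passes`, the `invoke` callable) are hypothetical and serve only to show that continuation needs no extra API surface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class HarnessInvocation:
    """The three inputs a TUI receives (hypothetical record)."""
    workdir: str
    task_description: str
    endpoint_url: str


def run_passes(
    invoke: Callable[[HarnessInvocation], str],
    invocation: HarnessInvocation,
    passes: int,
) -> List[str]:
    """Call the same invocation `passes` times; collect the commit id from each."""
    if passes < 1:
        raise ValueError("passes must be at least 1")
    # Every pass uses the identical invocation: continuation is repetition,
    # not a new interface the TUI has to implement.
    return [invoke(invocation) for _ in range(passes)]
```

In this sketch `invoke` stands in for whatever runs the container and returns the resulting commit id; a real runner would also handle failures and timeouts.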

How to nominate

If your favorite TUI is not in this catalog and you believe it should be, open an issue with the tui-request label or submit a PR per docs/submitting.md. Nominations are evaluated on a rolling basis and decisions land in this document.

Catalog maintained for methodology v0.1, May 2026. Star counts are as of 2026-05 and will not be silently updated; they are refreshed at each methodology version bump, and the prior values are archived.
