OLAP cube · 3,465 cells
Slice the data any way you want
All 3,465 cells. Pick a TUI, a model, a task family. Get instant answers.
What this page is
- What you'll find: every (CLI × model × task) cell, scored on quality, latency, and cost.
- Who it's for: people picking a CLI and model combo that wins on their workload.
- How to switch: toggle to Nerds mode for the full pivot table, heatmap, and brush filters.
Top three winners
- opencode + openai/gpt-oss-120b wins 8/25 tasks at $0.0000 per task and 14.8s wall time (strongest on polyglot: 7).
- pi + openai/gpt-oss-120b wins 5/25 tasks at $0.0000 per task and 12.6s wall time (strongest on polyglot: 4).
- copilot + mistralai/mistral-small-4-119b-2603 wins 2/25 tasks at $0.0000 per task and 10.7s wall time (strongest on mutations: 1).
A combo wins a task when its composite score (PBS) is the highest of any CLI on that task. Cost is a proxy: tokens-per-correct priced at a flat $1.00 per million tokens (MTok), which we call the flat-rate roll-up. Real per-token prices land via harness/tracing/cost.py.
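The win rule and the flat-rate cost proxy described above can be sketched in a few lines of Python. This is only an illustration: the row layout, the sample scores, the token counts, and the function names are hypothetical and are not taken from the harness, and the tie-breaking choice (first row seen keeps the win) is an assumption the page does not state.

```python
from collections import Counter

USD_PER_MTOK = 1.0  # flat rate stated on the page: $1.0 per million tokens

# Hypothetical rows: (cli, model, task, pbs, tokens_per_correct).
# Task names, scores, and token counts are made up for illustration.
ROWS = [
    ("opencode", "openai/gpt-oss-120b", "task-a", 0.91, 5200),
    ("pi", "openai/gpt-oss-120b", "task-a", 0.84, 4100),
    ("opencode", "openai/gpt-oss-120b", "task-b", 0.70, 6100),
    ("pi", "openai/gpt-oss-120b", "task-b", 0.78, 3900),
    ("opencode", "openai/gpt-oss-120b", "task-c", 0.88, 4800),
    ("pi", "openai/gpt-oss-120b", "task-c", 0.61, 5000),
]


def flat_rate_cost(tokens_per_correct, usd_per_mtok=USD_PER_MTOK):
    """Price tokens-per-correct at a flat per-million-token rate."""
    return tokens_per_correct / 1_000_000 * usd_per_mtok


def winners_by_task(rows):
    """Map each task to the (cli, model) combo with the highest PBS.

    Assumption: on an exact tie, the first row seen keeps the win.
    """
    best = {}
    for cli, model, task, pbs, _tokens in rows:
        if task not in best or pbs > best[task][1]:
            best[task] = ((cli, model), pbs)
    return {task: combo for task, (combo, _pbs) in best.items()}


def win_counts(rows):
    """Count how many tasks each (cli, model) combo wins."""
    return Counter(winners_by_task(rows).values())
```

Calling `win_counts(ROWS)` gives the per-combo tallies that a "wins N/25 tasks" line would report, and `flat_rate_cost` converts a tokens-per-correct figure into the dollar proxy.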
Where the wins land
Heatmap: wins per (CLI, model) cell.
Rows (CLI): Aider, Crush, Goose, opencode, OpenHands, Plandex, Qwen-Code
Columns (model): Qwen3.5, GPT-OSS 120B, Llama 3.3 70B
CLIs: 7 · Models: 3 · Tasks: 25 · Peak: 1
Each cell is one (CLI, model) pair. Darker cells win more tasks. The full cube has 25 tasks across three categories (polyglot, mutations, integrity).
OLAP cube · DuckDB-Wasm · 3,465 cells
Cube Explorer
Slice the benchmark cube on TUI × model × task × plugin-stack. Switch between table, heatmap, scatter, and parallel coordinates. The URL is the source of truth — every view is shareable. Press ? for shortcuts.
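A slice of this kind is essentially a filter plus a group-by over the cube's dimensions. The sketch below shows the idea with Python's built-in sqlite3 module standing in for DuckDB-Wasm. The table name, column names, and sample rows are hypothetical and do not reflect the explorer's actual schema.

```python
import sqlite3


def build_cube(rows):
    """Load (tui, model, task, category, score) rows into an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE cube "
        "(tui TEXT, model TEXT, task TEXT, category TEXT, score REAL)"
    )
    con.executemany("INSERT INTO cube VALUES (?, ?, ?, ?, ?)", rows)
    return con


def slice_cube(con, category):
    """Fix one category, then roll the remaining cells up to (tui, model)."""
    query = """
        SELECT tui, model, COUNT(*) AS cells, AVG(score) AS mean_score
        FROM cube
        WHERE category = ?
        GROUP BY tui, model
        ORDER BY mean_score DESC, tui, model
    """
    return con.execute(query, (category,)).fetchall()


# Hypothetical sample cells; task ids and scores are made up for illustration.
SAMPLE = [
    ("opencode", "GPT-OSS 120B", "p1", "polyglot", 1.0),
    ("opencode", "GPT-OSS 120B", "p2", "polyglot", 0.5),
    ("Aider", "GPT-OSS 120B", "p1", "polyglot", 0.25),
    ("Aider", "GPT-OSS 120B", "m1", "mutations", 0.75),
]

if __name__ == "__main__":
    for row in slice_cube(build_cube(SAMPLE), "polyglot"):
        print(row)
```

Swapping the `WHERE` column or the `GROUP BY` columns yields the other slices (by model, by task, and so on). The same query shape should carry over to other SQL engines with little change.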
Methodology v0.1 | Pinned to image digests as of 2026-06-20
3,465 rows · 10 TUIs · 14 models · 3 categories