
Model Selection

In one line: Match model capability to task — judgment-heavy reviews get the strongest model, mechanical checklists get a fast one.

Do this: Set the model field in each agent's frontmatter by capability tier, not by habit; re-bind the names when models are deprecated.

What: Different tasks need different model capabilities. The methodology matches model to task, trading off judgment quality, speed, and cost.

Why: Using the most capable model for everything is wasteful: a mechanical checklist does not benefit from deep reasoning. Using a fast model on a judgment-heavy task is worse: it produces superficial results that miss critical issues. Formalizing the tradeoff keeps it consistent rather than ad hoc.

Model Selection Rationale:

The tier, not the model name, is the contract. The model names below (Opus / Sonnet / Haiku) are point-in-time bindings of the capability tiers; when a model is deprecated, re-bind the name without changing the methodology.

| Task Type | Capability Tier (current binding) | Rationale |
| --- | --- | --- |
| Security review, compliance review, architecture design | High-capability (Opus) | Requires judgment, nuanced context evaluation, regulatory knowledge. False negatives (missed vulnerabilities, compliance gaps) are costly. The quality premium justifies the cost. |
| API review, migration review, mechanical implementation | Mid-capability (Sonnet) | Pattern matching against checklists, structural verification. The task is well-defined and the correct answer is deterministic. Speed matters more than depth. |
| Simple searches, file lookups, formatting tasks | Fast (Haiku) | Speed over depth. The task has a clear correct answer that does not require reasoning. |
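
As a minimal sketch of the two-level binding in the table, the mapping can be written as two lookups: task type to tier (the durable part) and tier to model name (the part that gets re-bound). The task identifiers, `Tier`, `TIER_TO_MODEL`, `TASK_TO_TIER`, and `model_for` are hypothetical names chosen for illustration; they are not part of the methodology itself.

```python
from enum import Enum


class Tier(Enum):
    """Capability tiers: the durable contract."""
    HIGH = "high-capability"
    MID = "mid-capability"
    FAST = "fast"


# Point-in-time binding of tiers to model names.
# Re-bind here when a model is deprecated; nothing else changes.
TIER_TO_MODEL = {
    Tier.HIGH: "opus",
    Tier.MID: "sonnet",
    Tier.FAST: "haiku",
}

# Task types from the table, keyed by hypothetical identifiers.
TASK_TO_TIER = {
    "security-review": Tier.HIGH,
    "compliance-review": Tier.HIGH,
    "architecture-design": Tier.HIGH,
    "api-review": Tier.MID,
    "migration-review": Tier.MID,
    "mechanical-implementation": Tier.MID,
    "simple-search": Tier.FAST,
    "file-lookup": Tier.FAST,
    "formatting": Tier.FAST,
}


def model_for(task_type: str) -> str:
    """Resolve a task type to its currently bound model name via its tier."""
    return TIER_TO_MODEL[TASK_TO_TIER[task_type]]
```

Keeping the two dictionaries separate means a deprecation touches only `TIER_TO_MODEL`; the task-to-tier assignments stay untouched.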

How this maps to the agent definitions:

Each agent definition includes a model field in its frontmatter that specifies the default model:

---
name: security-reviewer
model: opus
tools: Read, Grep, Glob
---

---
name: api-reviewer
model: sonnet
tools: Read, Grep, Glob
---

The model selection is a default, not a constraint. If a nominally mechanical task turns out to require more reasoning (an API review uncovers a complex authorization pattern), the orchestrator can re-dispatch with a more capable model.
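
One way to picture the re-dispatch is a small escalation helper, sketched below under stated assumptions. The ordering list, `escalate`, and `dispatch_model` are hypothetical; how an orchestrator actually detects that a task needs more reasoning is not specified here, so it is reduced to a boolean flag.

```python
# Assumed ordering of the current bindings, from fastest to most capable.
ESCALATION_ORDER = ["haiku", "sonnet", "opus"]


def escalate(model: str) -> str:
    """Return the next more capable model, or the same one if already at the top."""
    index = ESCALATION_ORDER.index(model)
    return ESCALATION_ORDER[min(index + 1, len(ESCALATION_ORDER) - 1)]


def dispatch_model(agent_default: str, needs_more_reasoning: bool) -> str:
    """Use the agent's default model unless the task turned out to need more reasoning."""
    return escalate(agent_default) if needs_more_reasoning else agent_default
```

For example, an API review that defaults to `sonnet` and uncovers a complex authorization pattern would be re-dispatched with `dispatch_model("sonnet", True)`.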

Evidence: The allocation follows the risk profile: judgment-intensive reviewers (security, compliance) run on the strongest model where a false negative is costly; checklist-based reviewers (API, migration) run on a faster model where the correct answer is deterministic. The binding lives in each agent's model frontmatter field — reproducible by reading the definitions described in Section 5.3.
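
Checking the binding by reading the definitions could look like the sketch below. It assumes each agent definition starts with a `---` delimited frontmatter block of simple `key: value` lines, as in the two examples in this section; `read_frontmatter` is a hypothetical helper, not a full YAML parser.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines between the leading `---` delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Example definition mirroring the security-reviewer frontmatter in this section.
SECURITY_REVIEWER = """---
name: security-reviewer
model: opus
tools: Read, Grep, Glob
---
Agent instructions would follow here.
"""

security_fields = read_frontmatter(SECURITY_REVIEWER)
```

Here `security_fields["model"]` holds the bound model name, which can be compared against the tier table.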

Capability-based tool prescription. Tools are prescribed by what a model can do, not by its name, so the prescription survives re-binding. As with the tiers in this section's table, the capability is the durable contract and the model name is a point-in-time binding, re-bound when a model is deprecated without a methodology change.

| Capability | If Yes | If No |
| --- | --- | --- |
| Context > 200K tokens | Read files directly | Use Serena for symbol navigation |
| Internal reasoning | No external reasoning tool needed | Use Sequential Thinking MCP |
| Reliable multi-step execution | Direct tool use | Skill-guided execution (Superpowers) |
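
The table reads naturally as three independent yes/no decisions. A minimal sketch follows, assuming the capabilities are known up front and recorded in a hypothetical `Capabilities` record; `prescribe_tools` and the field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability profile for whichever model is currently bound."""
    context_window_tokens: int
    internal_reasoning: bool
    reliable_multi_step: bool


def prescribe_tools(caps: Capabilities) -> list[str]:
    """Apply each row of the table: pick the 'If Yes' or 'If No' prescription."""
    return [
        "Read files directly"
        if caps.context_window_tokens > 200_000
        else "Serena (symbol navigation)",
        "No external reasoning tool"
        if caps.internal_reasoning
        else "Sequential Thinking MCP",
        "Direct tool use"
        if caps.reliable_multi_step
        else "Skill-guided execution (Superpowers)",
    ]
```

Because the function takes a capability profile rather than a model name, re-binding a tier to a new model only requires supplying that model's profile.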

Tools are also prescribed per development phase — brainstorming needs no code tools, implementation needs LSP and navigation, review needs multi-agent review plugins. See Section 13 for the complete decision matrix.