Model Selection

In one line: Match model capability to task: judgment-heavy reviews get the strongest model, mechanical checklists get a faster one.

Do this: Set the model field in each agent's frontmatter by capability tier, not by habit; re-bind the names when models are deprecated.

What: Different tasks need different model capabilities. The methodology matches model to task, trading off judgment quality, speed, and cost.

Why: Using the most capable model for everything is wasteful: a mechanical checklist does not benefit from deep reasoning. Using a fast model on a judgment-heavy task is worse: it produces superficial results that miss critical issues. Formalizing the tradeoff keeps it consistent rather than ad hoc.

Model Selection Rationale:

The tier, not the model name, is the contract. The model names below (Opus / Sonnet / Haiku) are point-in-time bindings of the capability tiers; when a model is deprecated, re-bind the name and leave the methodology unchanged.

Task Type	Capability Tier (current binding)	Rationale
Security review, compliance review, architecture design	High-capability (Opus)	Requires judgment, nuanced context evaluation, regulatory knowledge. False negatives (missed vulnerabilities, compliance gaps) are costly. The quality premium justifies the cost.
API review, migration review, mechanical implementation	Mid-capability (Sonnet)	Pattern matching against checklists, structural verification. The task is well-defined and the correct answer is deterministic. Speed matters more than depth.
Simple searches, file lookups, formatting tasks	Fast (Haiku)	Speed over depth. The task has a clear correct answer that does not require reasoning.
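
The separation between the durable tier and its current model binding can be sketched in a few lines of Python. This is an illustrative sketch, not part of the methodology: the names Tier, TASK_TIER, TIER_BINDING and model_for are hypothetical, and the lowercase value "haiku" is assumed by analogy with the "opus" and "sonnet" values used in the agent frontmatter.

```python
from enum import Enum


class Tier(Enum):
    HIGH = "high-capability"
    MID = "mid-capability"
    FAST = "fast"


# Task type -> tier: the durable part of the contract (rows of the table above).
TASK_TIER = {
    "security review": Tier.HIGH,
    "compliance review": Tier.HIGH,
    "architecture design": Tier.HIGH,
    "api review": Tier.MID,
    "migration review": Tier.MID,
    "mechanical implementation": Tier.MID,
    "simple search": Tier.FAST,
    "file lookup": Tier.FAST,
    "formatting": Tier.FAST,
}

# Tier -> model name: the point-in-time binding. Only this mapping changes
# when a model is deprecated. "haiku" is an assumed frontmatter value.
TIER_BINDING = {
    Tier.HIGH: "opus",
    Tier.MID: "sonnet",
    Tier.FAST: "haiku",
}


def model_for(task_type: str) -> str:
    """Resolve a task type to the model currently bound to its tier."""
    return TIER_BINDING[TASK_TIER[task_type.lower()]]
```

Under this layout, re-binding after a deprecation is a one-line change to TIER_BINDING, and TASK_TIER stays untouched.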

How this maps to the agent definitions:

Each agent definition includes a model field in its frontmatter that specifies the default model:

---
name: security-reviewer
model: opus
tools: Read, Grep, Glob
---

---
name: api-reviewer
model: sonnet
tools: Read, Grep, Glob
---

The model selection is a default, not a constraint. If a nominally mechanical task turns out to require more reasoning (for example, an API review uncovers a complex authorization pattern), the orchestrator can re-dispatch with a more capable model.
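
One way the re-dispatch could look is sketched below. The source does not specify how the orchestrator detects that more reasoning is needed, so everything here is an assumption for illustration: the needs_deeper_reasoning flag, the run_agent callable, the ESCALATION_ORDER list, and the choice to escalate one tier at a time.

```python
from dataclasses import dataclass
from typing import Callable

# Current bindings ordered from least to most capable (assumed ordering).
ESCALATION_ORDER = ["haiku", "sonnet", "opus"]


@dataclass
class ReviewResult:
    findings: list[str]
    needs_deeper_reasoning: bool  # hypothetical signal reported by the agent


def dispatch_with_escalation(
    run_agent: Callable[[str, str], ReviewResult],
    agent_name: str,
    default_model: str,
) -> tuple[str, ReviewResult]:
    """Run an agent on its default model; re-dispatch one tier up on request."""
    model = default_model
    result = run_agent(agent_name, model)
    while result.needs_deeper_reasoning:
        idx = ESCALATION_ORDER.index(model)
        if idx == len(ESCALATION_ORDER) - 1:
            break  # already on the strongest model; nothing left to escalate to
        model = ESCALATION_ORDER[idx + 1]
        result = run_agent(agent_name, model)
    return model, result
```

The function returns the model that produced the final result, so a caller can log when a default binding was overridden.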

Evidence: The allocation follows the risk profile: judgment-intensive reviewers (security, compliance) run on the strongest model where a false negative is costly; checklist-based reviewers (API, migration) run on a faster model where the correct answer is deterministic. The binding lives in each agent's model frontmatter field — reproducible by reading the definitions described in Section 5.3.
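
Reading the bindings back out of the definitions can be automated. The sketch below assumes each definition starts with a flat "key: value" frontmatter block delimited by "---", as in the two examples above. The function names and the EXPECTED_MODEL mapping are hypothetical, and the parser handles only this flat form, not full YAML.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a leading '---'-delimited block of flat 'key: value' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Expected bindings, taken from the two definitions shown in this section.
EXPECTED_MODEL = {"security-reviewer": "opus", "api-reviewer": "sonnet"}


def binding_mismatches(definitions: list[str]) -> list[str]:
    """Return names of agents whose model field differs from the expected one."""
    mismatches = []
    for text in definitions:
        fields = parse_frontmatter(text)
        name = fields.get("name", "")
        if name in EXPECTED_MODEL and fields.get("model") != EXPECTED_MODEL[name]:
            mismatches.append(name)
    return mismatches
```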

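Capability-based tool prescription. Tool choices follow the capabilities of the model currently bound to a tier, not its name: where the model has the capability, follow the "If Yes" column; where it does not, use the tool in the "If No" column. As with the tier table above, model names are point-in-time bindings; re-bind them when models are deprecated, without a methodology change.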

Capability	If Yes	If No
Context > 200K tokens	Read files directly	Use Serena for symbol navigation
Internal reasoning	No external reasoning tool needed	Use Sequential Thinking MCP
Reliable multi-step execution	Direct tool use	Skill-guided execution (Superpowers)
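
The table above reads as three independent yes/no decisions, which can be sketched as a small lookup. The ModelCapabilities class, the prescribe_tools function, and the dictionary keys are hypothetical labels introduced for illustration; the returned strings are the table cells verbatim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapabilities:
    context_over_200k: bool    # context window larger than 200K tokens
    internal_reasoning: bool   # reasons internally without an external tool
    reliable_multi_step: bool  # executes multi-step work reliably


def prescribe_tools(caps: ModelCapabilities) -> dict[str, str]:
    """Pick the 'If Yes' or 'If No' cell for each capability row."""
    return {
        "file access": (
            "Read files directly"
            if caps.context_over_200k
            else "Use Serena for symbol navigation"
        ),
        "reasoning": (
            "No external reasoning tool needed"
            if caps.internal_reasoning
            else "Use Sequential Thinking MCP"
        ),
        "execution": (
            "Direct tool use"
            if caps.reliable_multi_step
            else "Skill-guided execution (Superpowers)"
        ),
    }
```

Because the input is a set of capabilities rather than a model name, the same function keeps working after a tier is re-bound to a different model.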

Tools are also prescribed per development phase — brainstorming needs no code tools, implementation needs LSP and navigation, review needs multi-agent review plugins. See Section 13 for the complete decision matrix.