Model Selection
In one line: Match model capability to task — judgment-heavy reviews get the strongest model, mechanical checklists get a fast one.
Do this: Set the model field in each agent's frontmatter by capability tier, not by habit; re-bind the names when models are deprecated.
What: Different tasks need different model capabilities. The methodology matches model to task, trading off judgment quality, speed, and cost.
Why: Using the most capable model for everything is wasteful: a mechanical checklist does not benefit from deep reasoning. Using a fast model on a judgment-heavy task is worse: it produces superficial results that miss critical issues. Formalizing the tradeoff keeps it consistent rather than ad hoc.
Model Selection Rationale:
The tier, not the model name, is the contract. The model names below (Opus / Sonnet / Haiku) are point-in-time bindings of the capability tiers; when a model is deprecated, the name is re-bound and the methodology does not change.
| Task Type | Capability Tier (current binding) | Rationale |
|---|---|---|
| Security review, compliance review, architecture design | High-capability (Opus) | Requires judgment, nuanced context evaluation, regulatory knowledge. False negatives (missed vulnerabilities, compliance gaps) are costly. The quality premium justifies the cost. |
| API review, migration review, mechanical implementation | Mid-capability (Sonnet) | Pattern matching against checklists, structural verification. The task is well-defined and the correct answer is deterministic. Speed matters more than depth. |
| Simple searches, file lookups, formatting tasks | Fast (Haiku) | Speed over depth. The task has a clear correct answer that does not require reasoning. |
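The table above can be read as two separate lookups: task type to tier (the durable part) and tier to model name (the part that is re-bound on deprecation). The following is a minimal sketch under that reading. The names `Tier`, `TASK_TIER`, `TIER_BINDING`, and `resolve_model`, the task-type keys, and the lowercase `haiku` value are illustrative assumptions, not part of the methodology.

```python
from enum import Enum


class Tier(Enum):
    HIGH = "high-capability"
    MID = "mid-capability"
    FAST = "fast"


# Durable contract: task type -> capability tier (rows of the table above).
# The key spellings are hypothetical identifiers chosen for this sketch.
TASK_TIER = {
    "security-review": Tier.HIGH,
    "compliance-review": Tier.HIGH,
    "architecture-design": Tier.HIGH,
    "api-review": Tier.MID,
    "migration-review": Tier.MID,
    "mechanical-implementation": Tier.MID,
    "simple-search": Tier.FAST,
    "file-lookup": Tier.FAST,
    "formatting": Tier.FAST,
}

# Point-in-time binding: tier -> model name.
# Only this dict should need editing when a model is deprecated.
TIER_BINDING = {
    Tier.HIGH: "opus",
    Tier.MID: "sonnet",
    Tier.FAST: "haiku",  # assumed spelling; the source shows only opus and sonnet
}


def resolve_model(task_type: str) -> str:
    """Return the model currently bound to the tier the task type requires."""
    return TIER_BINDING[TASK_TIER[task_type]]
```

Keeping the two dicts separate means a deprecation touches one entry in `TIER_BINDING` and leaves every task-to-tier decision untouched.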
How this maps to the agent definitions:
Each agent definition includes a model field in its frontmatter that specifies the default model:
---
name: security-reviewer
model: opus
tools: Read, Grep, Glob
---
---
name: api-reviewer
model: sonnet
tools: Read, Grep, Glob
---
The model selection is a default, not a constraint. If a nominally mechanical task turns out to require more reasoning (an API review uncovers a complex authorization pattern), the orchestrator can re-dispatch with a more capable model.
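One way an orchestrator might implement this re-dispatch is sketched below. The source does not describe the orchestrator's interface, so `ESCALATION_ORDER`, `escalate`, `dispatch_model`, and the `needs_deeper_reasoning` flag are all hypothetical.

```python
# Model names ordered from least to most capable, mirroring the table's bindings.
# Assumed spellings; "haiku" does not appear in the frontmatter examples.
ESCALATION_ORDER = ["haiku", "sonnet", "opus"]


def escalate(current_model: str) -> str:
    """Return the next more capable model, or the same one if already at the top."""
    index = ESCALATION_ORDER.index(current_model)
    return ESCALATION_ORDER[min(index + 1, len(ESCALATION_ORDER) - 1)]


def dispatch_model(default_model: str, needs_deeper_reasoning: bool) -> str:
    """Use the agent's default unless the task turned out to need more reasoning."""
    return escalate(default_model) if needs_deeper_reasoning else default_model
```

For example, `dispatch_model("sonnet", needs_deeper_reasoning=True)` selects `"opus"`, which corresponds to the API review that uncovers a complex authorization pattern.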
Evidence: The allocation follows the risk profile: judgment-intensive reviewers (security, compliance) run on the strongest model, where a false negative is costly; checklist-based reviewers (API, migration) run on a faster model, where the correct answer is deterministic. The binding lives in each agent's model frontmatter field, so it can be verified by reading the agent definitions described in Section 5.3.
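Checking the binding by reading the definitions could be automated roughly as follows. This is a sketch that assumes the frontmatter is flat `key: value` lines between `---` delimiters, as in the two examples above; `read_frontmatter`, `EXPECTED_MODEL`, and `check_binding` are hypothetical names, and a real setup would likely use a proper YAML parser.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: stop before the agent's body text
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Assumed expectation table, taken from the two example definitions above.
EXPECTED_MODEL = {"security-reviewer": "opus", "api-reviewer": "sonnet"}


def check_binding(definition_text: str) -> bool:
    """True if the agent is known and declares the expected model."""
    fields = read_frontmatter(definition_text)
    name = fields.get("name")
    return name in EXPECTED_MODEL and EXPECTED_MODEL[name] == fields.get("model")
```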
Capability-based tool prescription. Tools are prescribed from what the bound model can do, not from its name. As with the tiers above, the capabilities are the durable contract and the model names are point-in-time bindings: when a model is deprecated, re-evaluate the rows below for the new binding without changing the methodology.
| Capability | If Yes | If No |
|---|---|---|
| Context > 200K tokens | Read files directly | Use Serena for symbol navigation |
| Internal reasoning | No external reasoning tool needed | Use Sequential Thinking MCP |
| Reliable multi-step execution | Direct tool use | Skill-guided execution (Superpowers) |
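The table can be read as three independent yes/no checks. A minimal sketch of that reading follows; the `ModelCapabilities` fields and the `prescribe_tools` function are illustrative assumptions, and the returned strings simply echo the table cells.

```python
from dataclasses import dataclass


@dataclass
class ModelCapabilities:
    """Hypothetical capability record for whichever model a tier is bound to."""
    context_tokens: int
    internal_reasoning: bool
    reliable_multistep: bool


def prescribe_tools(caps: ModelCapabilities) -> list[str]:
    """Apply the three table rows in order; return one prescription per row."""
    return [
        "Read files directly"
        if caps.context_tokens > 200_000
        else "Serena (symbol navigation)",
        "No external reasoning tool"
        if caps.internal_reasoning
        else "Sequential Thinking MCP",
        "Direct tool use"
        if caps.reliable_multistep
        else "Skill-guided execution (Superpowers)",
    ]
```

Because each row is evaluated on its own, re-binding a tier to a new model only requires filling in a new `ModelCapabilities` record.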
Tools are also prescribed per development phase — brainstorming needs no code tools, implementation needs LSP and navigation, review needs multi-agent review plugins. See Section 13 for the complete decision matrix.