I looked at a Fleet user's agent configuration last week and the first thing I noticed was that every single agent on it was running Claude Opus.
The frontend developer agent was on Opus. So was the code reviewer. So was the ticket triage agent. So was the agent whose entire job was to add labels to GitHub issues.
That's like hiring a principal engineer to update README files.
It works. The READMEs get updated. But you're paying for far more intelligence than the task requires, and when you scale that across 10 agents running eight hours a day, you're spending multiples of what you actually need, for no measurable improvement in output.
Not all tasks need the same model
This should be obvious, but the tooling hasn't made it easy to act on. Most AI agent setups use a single model across everything because that's what the tool defaults to. Changing the model means changing config files, restarting agents, and hoping nothing breaks.
In Fleet, each agent has its own model configuration. One line per agent:
agents:
  - name: frontend-dev
    model: claude-opus    # complex implementation work
  - name: code-reviewer
    model: claude-sonnet  # review doesn't need Opus
  - name: ticket-triage
    model: claude-haiku   # classification and labeling
  - name: sre-watcher
    model: claude-haiku   # monitoring and alerting
  - name: product-owner
    model: claude-sonnet  # ticket refinement and routing
Each agent runs the model that fits its job. No global setting, no workarounds.
A practical framework for model selection
I've been running mixed-model fleets for a while now, and this is the rough framework I've landed on.
Your most capable model (Opus) is for implementing features that touch multiple systems, writing code that requires understanding complex business logic, and working through architectural decisions within the scope of a ticket.
A mid-tier model (Sonnet) is for code review, ticket refinement, PR descriptions, documentation updates, and test generation for existing code.
The cheapest model that works (Haiku) is for issue triage and labeling, routing decisions, log monitoring, deployment watching, status checks, and simple formatting tasks.
Most fleets I've seen settle into roughly 20-30 percent Opus, 30-40 percent Sonnet, and 30-40 percent Haiku. The exact split depends on your workload, but the majority of agent tasks don't need your most expensive model.
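The tiering above boils down to a simple mapping from task type to model. Here's a minimal sketch of that idea in Python; the task category names and the default-to-mid-tier fallback are my own illustrative choices, not Fleet's actual configuration schema.

```python
# Hypothetical task-to-model mapping illustrating the framework above.
# Category names and model ids are illustrative, not a Fleet API.
MODEL_FOR_TASK = {
    # Complex, cross-system work -> most capable model
    "feature-implementation": "claude-opus",
    "architecture-decision": "claude-opus",
    # Review and writing tasks -> mid-tier model
    "code-review": "claude-sonnet",
    "pr-description": "claude-sonnet",
    "test-generation": "claude-sonnet",
    # Classification and monitoring -> cheapest model that works
    "issue-triage": "claude-haiku",
    "log-monitoring": "claude-haiku",
    "status-check": "claude-haiku",
}

def model_for(task_type: str) -> str:
    """Pick the cheapest tier that fits the task. Unknown tasks fall back
    to the mid-tier model rather than the most expensive one."""
    return MODEL_FOR_TASK.get(task_type, "claude-sonnet")
```

The useful property is the fallback direction: when you don't know what a task needs, defaulting to the mid-tier keeps an unclassified task from silently running on your most expensive model.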
The cost difference is real
Fleet doesn't track your API spend for you. That's what your AI provider's billing dashboard is for. What Fleet does is make the cost optimization possible by giving you per-agent model control in the first place.
A rough comparison: a 10-agent fleet running all Opus for eight hours a day, five days a week:
- Estimated monthly API cost: $3,000 to $6,000
The same fleet with right-sized models (3 Opus, 3 Sonnet, 4 Haiku):
- Estimated monthly API cost: $800 to $1,800
You'll see the difference in your Anthropic or OpenAI dashboard.
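The ranges above are back-of-the-envelope arithmetic, and you can reproduce them with a few lines. The per-agent hourly costs below are assumed figures I picked so the totals land inside the quoted ranges; plug in your own numbers from your provider's pricing page.

```python
# Back-of-the-envelope cost model. The per-agent-hour costs are assumed
# illustrative figures, not published pricing.
HOURLY_COST = {"opus": 2.50, "sonnet": 0.60, "haiku": 0.15}  # $ per agent-hour
HOURS_PER_MONTH = 8 * 5 * 4.33  # 8 h/day, 5 days/week, ~4.33 weeks/month

def monthly_cost(fleet: dict) -> float:
    """fleet maps a model tier to the number of agents running on it."""
    return sum(HOURLY_COST[tier] * count for tier, count in fleet.items()) * HOURS_PER_MONTH

all_opus = monthly_cost({"opus": 10})                          # ~$4,330
right_sized = monthly_cost({"opus": 3, "sonnet": 3, "haiku": 4})  # ~$1,715
```

With these assumptions the all-Opus fleet lands mid-range of the $3,000-$6,000 estimate and the right-sized fleet just inside $800-$1,800, a roughly 60 percent reduction from moving seven of ten agents off the top tier.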
Speed matters too
Cheaper models are also faster. Haiku responds in a fraction of the time Opus takes. For agents doing simple tasks inside a pipeline (triage, routing, labeling, status checks), that speed difference compounds.
Your triage agent on Haiku processes a ticket in two or three seconds. On Opus, the same task takes 10 to 15 seconds. For one ticket nobody cares. For 30 tickets on Monday morning you care a lot.
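The Monday-morning math is simple but worth making concrete. Assuming the per-ticket times quoted above and strictly sequential processing:

```python
# Rough latency arithmetic for a sequential triage queue, using the
# per-ticket times quoted above.
TICKETS = 30
haiku_seconds = TICKETS * 3   # ~3 s per ticket on the small, fast model
opus_seconds = TICKETS * 15   # ~15 s per ticket on the largest model

print(f"haiku: {haiku_seconds}s ({haiku_seconds / 60:.1f} min)")
print(f"opus:  {opus_seconds}s ({opus_seconds / 60:.1f} min)")
```

The queue drains in a minute and a half instead of seven and a half, and every downstream agent waiting on triage output feels that difference.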
Your SRE agent watching a deployment has to make a fast binary decision about whether the error rate is above threshold. Haiku handles that without breaking a sweat. Running Opus on a threshold check means you're paying for intelligence that goes completely unused.
Fleet's per-agent model config means your pipeline runs at the speed of the fastest appropriate model at each stage, instead of the speed of whatever your default happened to be.