
Stop Running Opus on Everything

Every agent running Claude Opus is like hiring a principal engineer to update README files. A practical framework for matching models to agent tasks, and why the savings show up in your billing dashboard.

April 7, 2026 · 5 min read

I looked at a Fleet user's agent configuration last week and the first thing I noticed was that every single agent on it was running Claude Opus.

The frontend developer agent was on Opus. So was the code reviewer. So was the ticket triage agent. So was the agent whose entire job was to add labels to GitHub issues.

That's like hiring a principal engineer to update README files.

It works. The READMEs get updated. But you're paying for far more intelligence than the task requires, and when you scale that across 10 agents running eight hours a day, you're spending multiples of what you actually need, with no measurable improvement in output.

Not all tasks need the same model

This should be obvious, but the tooling hasn't made it easy to act on. Most AI agent setups use a single model across everything because that's what the tool defaults to. Changing the model means changing config files, restarting agents, and hoping nothing breaks.

In Fleet, each agent has its own model configuration. One line:

agents:
  - name: frontend-dev
    model: claude-opus         # complex implementation work

  - name: code-reviewer
    model: claude-sonnet       # review doesn't need Opus

  - name: ticket-triage
    model: claude-haiku        # classification and labeling

  - name: sre-watcher
    model: claude-haiku        # monitoring and alerting

  - name: product-owner
    model: claude-sonnet       # ticket refinement and routing

Each agent runs the model that fits its job. No global setting, no workarounds.

A practical framework for model selection

I've been running mixed-model fleets for a while now, and this is the rough framework I've landed on.

Your most capable model (Opus) is for implementing features that touch multiple systems, writing code that requires understanding complex business logic, and working through architectural decisions within the scope of a ticket.

A mid-tier model (Sonnet) is for code review, ticket refinement, PR descriptions, documentation updates, and test generation for existing code.

The cheapest model that works (Haiku) is for issue triage and labeling, routing decisions, log monitoring, deployment watching, status checks, and simple formatting tasks.

Most fleets I've seen settle into roughly 20-30 percent Opus, 30-40 percent Sonnet, and 30-40 percent Haiku. The exact split depends on your workload, but the majority of agent tasks don't need your most expensive model.
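As a sketch, that framework boils down to a lookup table. This is illustrative only — the task categories below are my own labels, not anything Fleet defines — but it's roughly how I think about routing work to tiers:

```python
# Illustrative mapping from task category to model tier.
# Category names and model identifiers here are assumptions for the
# sketch, not Fleet's API.
MODEL_FOR_TASK = {
    "feature-implementation": "claude-opus",
    "architecture": "claude-opus",
    "code-review": "claude-sonnet",
    "ticket-refinement": "claude-sonnet",
    "documentation": "claude-sonnet",
    "test-generation": "claude-sonnet",
    "triage": "claude-haiku",
    "routing": "claude-haiku",
    "monitoring": "claude-haiku",
}

def pick_model(task_category: str) -> str:
    """Default to the mid-tier model when a category is unmapped."""
    return MODEL_FOR_TASK.get(task_category, "claude-sonnet")

print(pick_model("triage"))       # claude-haiku
print(pick_model("code-review"))  # claude-sonnet
```

Defaulting unmapped categories to the mid-tier model is a deliberate choice: a wrong guess toward Sonnet costs a little money, while a wrong guess toward Haiku can cost you quality.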

The cost difference is real

Fleet doesn't track your API spend for you. That's what your AI provider's billing dashboard is for. What Fleet does is make the cost optimization possible by giving you per-agent model control in the first place.

A rough comparison. A 10-agent fleet running all Opus for eight hours a day, five days a week:

  • Estimated monthly API cost: $3,000 to $6,000

The same fleet with right-sized models (3 Opus, 3 Sonnet, 4 Haiku):

  • Estimated monthly API cost: $800 to $1,800

You'll see the difference in your Anthropic or OpenAI billing dashboard.
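For the curious, here's the back-of-envelope arithmetic behind those ranges. The per-agent-hour costs are midpoints I've assumed purely to illustrate the math — they are not published pricing, and real spend depends on token volume and your provider's current rates:

```python
# Back-of-envelope monthly cost comparison.
# Per-agent-hour costs are ASSUMED illustrative figures, not published
# pricing; substitute numbers from your own billing dashboard.
HOURS_PER_MONTH = 8 * 5 * 4  # 8 h/day, 5 days/week, ~4 weeks

COST_PER_AGENT_HOUR = {
    "claude-opus": 2.80,
    "claude-sonnet": 0.70,
    "claude-haiku": 0.10,
}

def monthly_cost(fleet: dict[str, int]) -> float:
    """fleet maps model name -> number of agents running that model."""
    return sum(
        count * COST_PER_AGENT_HOUR[model] * HOURS_PER_MONTH
        for model, count in fleet.items()
    )

all_opus = monthly_cost({"claude-opus": 10})
mixed = monthly_cost({"claude-opus": 3, "claude-sonnet": 3, "claude-haiku": 4})
print(f"all Opus: ${all_opus:,.0f}/mo, right-sized: ${mixed:,.0f}/mo")
```

With these assumed rates, the all-Opus fleet lands around $4,500 a month and the right-sized fleet around $1,700 — inside the ranges above, and the ratio between the two is the part that holds up even as the absolute rates change.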

Speed matters too

Cheaper models are also faster. Haiku responds in a fraction of the time Opus takes. For agents doing simple tasks inside a pipeline (triage, routing, labeling, status checks), that speed difference compounds.

Your triage agent on Haiku processes a ticket in two or three seconds. On Opus, the same task takes 10 to 15 seconds. For one ticket nobody cares. For 30 tickets on Monday morning you care a lot.
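The Monday-morning arithmetic, using the midpoints of those figures:

```python
# Rough arithmetic behind the Monday-morning example, using the
# midpoints of the per-ticket latencies quoted above.
tickets = 30
haiku_seconds = 2.5   # midpoint of 2-3 s per ticket
opus_seconds = 12.5   # midpoint of 10-15 s per ticket

haiku_total = tickets * haiku_seconds  # 75 s total
opus_total = tickets * opus_seconds    # 375 s total
print(f"Haiku: {haiku_total / 60:.2f} min, Opus: {opus_total / 60:.2f} min")
```

A bit over a minute versus a bit over six. Multiply that across every cheap task in every pipeline run and the difference stops being cosmetic.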

Your SRE agent watching a deployment has to make a fast binary decision about whether the error rate is above threshold. Haiku handles that without breaking a sweat. Running Opus on a threshold check means you're paying for intelligence that goes completely unused.
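To be concrete about how little intelligence that check needs, the core decision is a single comparison. This is a hypothetical sketch (the function name and 1% threshold are my own, not Fleet's), but it's the entire judgment the agent has to make:

```python
# Hypothetical sketch of the SRE agent's core decision: one comparison.
# The 1% threshold is an assumed example value.
def deployment_healthy(error_rate: float, threshold: float = 0.01) -> bool:
    """Return True if the observed error rate is at or below threshold."""
    return error_rate <= threshold

print(deployment_healthy(0.005))  # True
print(deployment_healthy(0.05))   # False
```

The model's job is to read the metrics, extract the number, and narrate the result. None of that needs a frontier model.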

Fleet's per-agent model config means your pipeline runs at the speed of the fastest appropriate model at each stage, instead of the speed of whatever your default happened to be.

Try Fleet

One binary. Five minutes. See every agent, coordinate every handoff, and keep a full audit trail of what your fleet did.