
Stop Running Opus on Everything

Every agent running Claude Opus is like hiring a principal engineer to update README files. A practical framework for matching models to agent tasks, and why the savings show up in your billing dashboard.

April 7, 2026 · 5 min read

I looked at a Fleet user's agent configuration last week and the first thing I noticed was that every single agent on it was running Claude Opus.

The frontend developer agent was on Opus. So was the code reviewer. So was the ticket triage agent. So was the agent whose entire job was to add labels to GitHub issues.

That's like hiring a principal engineer to update README files.

It works. The READMEs get updated. But you're paying for far more intelligence than the task requires, and when you scale that across 10 agents running eight hours a day, you're spending multiples of what you actually need, with no measurable improvement in output.

Not all tasks need the same model

This should be obvious, but the tooling hasn't made it easy to act on. Most AI agent setups use a single model across everything because that's what the tool defaults to. Changing the model means changing config files, restarting agents, and hoping nothing breaks.

In Fleet, each agent has its own model configuration. One line:

agents:
  - name: frontend-dev
    model: claude-opus         # complex implementation work

  - name: code-reviewer
    model: claude-sonnet       # review doesn't need Opus

  - name: ticket-triage
    model: claude-haiku        # classification and labeling

  - name: sre-watcher
    model: claude-haiku        # monitoring and alerting

  - name: product-owner
    model: claude-sonnet       # ticket refinement and routing

Each agent runs the model that fits its job. No global setting, no workarounds.

A practical framework for model selection

I've been running mixed-model fleets for a while now, and this is the rough framework I've landed on.

Your most capable model (Opus) is for implementing features that touch multiple systems, writing code that requires understanding complex business logic, and working through architectural decisions within the scope of a ticket.

A mid-tier model (Sonnet) is for code review, ticket refinement, PR descriptions, documentation updates, and test generation for existing code.

The cheapest model that works (Haiku) is for issue triage and labeling, routing decisions, log monitoring, deployment watching, status checks, and simple formatting tasks.

Most fleets I've seen settle into roughly 20-30 percent Opus, 30-40 percent Sonnet, and 30-40 percent Haiku. The exact split depends on your workload, but the majority of agent tasks don't need your most expensive model.
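As a sketch, that framework boils down to a lookup table. This is illustrative only — the task categories below are my own labels, not anything Fleet defines — but it's roughly how I think about routing work to tiers:

```python
# Illustrative mapping from task category to model tier.
# Category names and model identifiers here are assumptions for the
# sketch, not Fleet's API.
MODEL_FOR_TASK = {
    "feature-implementation": "claude-opus",
    "architecture": "claude-opus",
    "code-review": "claude-sonnet",
    "ticket-refinement": "claude-sonnet",
    "documentation": "claude-sonnet",
    "test-generation": "claude-sonnet",
    "triage": "claude-haiku",
    "routing": "claude-haiku",
    "monitoring": "claude-haiku",
}

def pick_model(task_category: str) -> str:
    """Default to the mid-tier model when a category is unmapped."""
    return MODEL_FOR_TASK.get(task_category, "claude-sonnet")

print(pick_model("triage"))       # claude-haiku
print(pick_model("code-review"))  # claude-sonnet
```

Defaulting unmapped categories to the mid-tier model is a deliberate choice: a wrong guess toward Sonnet costs a little money, while a wrong guess toward Haiku can cost you quality.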

The cost difference is real

Fleet doesn't track your API spend for you. That's what your AI provider's billing dashboard is for. What Fleet does is make the cost optimization possible by giving you per-agent model control in the first place.

A rough comparison. A 10-agent fleet running all Opus for eight hours a day, five days a week:

  • Estimated monthly API cost: $3,000 to $6,000

The same fleet with right-sized models (3 Opus, 3 Sonnet, 4 Haiku):

  • Estimated monthly API cost: $800 to $1,800

You'll see the difference in your Anthropic or OpenAI billing dashboard.
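For the curious, here's the back-of-envelope arithmetic behind those ranges. The per-agent-hour costs are midpoints I've assumed purely to illustrate the math — they are not published pricing, and real spend depends on token volume and your provider's current rates:

```python
# Back-of-envelope monthly cost comparison.
# Per-agent-hour costs are ASSUMED illustrative figures, not published
# pricing; substitute numbers from your own billing dashboard.
HOURS_PER_MONTH = 8 * 5 * 4  # 8 h/day, 5 days/week, ~4 weeks

COST_PER_AGENT_HOUR = {
    "claude-opus": 2.80,
    "claude-sonnet": 0.70,
    "claude-haiku": 0.10,
}

def monthly_cost(fleet: dict[str, int]) -> float:
    """fleet maps model name -> number of agents running that model."""
    return sum(
        count * COST_PER_AGENT_HOUR[model] * HOURS_PER_MONTH
        for model, count in fleet.items()
    )

all_opus = monthly_cost({"claude-opus": 10})
mixed = monthly_cost({"claude-opus": 3, "claude-sonnet": 3, "claude-haiku": 4})
print(f"all Opus: ${all_opus:,.0f}/mo, right-sized: ${mixed:,.0f}/mo")
```

With these assumed rates, the all-Opus fleet lands around $4,500 a month and the right-sized fleet around $1,700 — inside the ranges above, and the ratio between the two is the part that holds up even as the absolute rates change.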

Speed matters too

Cheaper models are also faster. Haiku responds in a fraction of the time Opus takes. For agents doing simple tasks inside a pipeline (triage, routing, labeling, status checks), that speed difference compounds.

Your triage agent on Haiku processes a ticket in two or three seconds. On Opus, the same task takes 10 to 15 seconds. For one ticket nobody cares. For 30 tickets on Monday morning you care a lot.
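The Monday-morning arithmetic, using the midpoints of those figures:

```python
# Rough arithmetic behind the Monday-morning example, using the
# midpoints of the per-ticket latencies quoted above.
tickets = 30
haiku_seconds = 2.5   # midpoint of 2-3 s per ticket
opus_seconds = 12.5   # midpoint of 10-15 s per ticket

haiku_total = tickets * haiku_seconds  # 75 s total
opus_total = tickets * opus_seconds    # 375 s total
print(f"Haiku: {haiku_total / 60:.2f} min, Opus: {opus_total / 60:.2f} min")
```

A bit over a minute versus a bit over six. Multiply that across every cheap task in every pipeline run and the difference stops being cosmetic.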

Your SRE agent watching a deployment has to make a fast binary decision about whether the error rate is above threshold. Haiku handles that without breaking a sweat. Running Opus on a threshold check means you're paying for intelligence that goes completely unused.
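To be concrete about how little intelligence that check needs, the core decision is a single comparison. This is a hypothetical sketch (the function name and 1% threshold are my own, not Fleet's), but it's the entire judgment the agent has to make:

```python
# Hypothetical sketch of the SRE agent's core decision: one comparison.
# The 1% threshold is an assumed example value.
def deployment_healthy(error_rate: float, threshold: float = 0.01) -> bool:
    """Return True if the observed error rate is at or below threshold."""
    return error_rate <= threshold

print(deployment_healthy(0.005))  # True
print(deployment_healthy(0.05))   # False
```

The model's job is to read the metrics, extract the number, and narrate the result. None of that needs a frontier model.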

Fleet's per-agent model config means your pipeline runs at the speed of the fastest appropriate model at each stage, instead of the speed of whatever your default happened to be.

Try Fleet

One binary. Five minutes. See every agent, coordinate every handoff, and keep a full audit trail of what your fleet did.