AI agent fleet management is the operational discipline of running many AI coding agents as one governed system instead of as a pile of independent sessions. It covers four things: coordinating who does what and in what order, putting guardrails on what each agent is allowed to do, recovering when something goes wrong, and keeping an audit trail of every decision. If you only have one agent, you don't need any of this. The moment you have five, you need all of it.
This is a blunt guide to the category — what it requires, where the hype is, and what the tools actually deliver. Including ours.
Why "fleet management" is a separate problem
Running one agent is a productivity question: does it write good code fast? Running a fleet is an operations question: does the whole thing stay coordinated, bounded, and accountable while nobody is watching?
The failure modes are different in kind, not degree. One agent that goes off-script wastes an hour. A dozen agents with no handoff protocol, no approval gate, and no spend ceiling can open conflicting pull requests, review each other's mistakes, and burn a real bill before anyone notices. Single-agent tools — IDE assistants, terminal pair programmers — are not built to solve this, and bolting more of them together does not produce a fleet. It produces a mess with more moving parts.
So the category exists because the operational layer is the hard part once agents are individually good enough. By 2026, the agents are good enough. Claude Code can write production code. The bottleneck moved to everything around it.
What fleet management actually requires
Strip away the marketing and there are five capabilities that matter. Be skeptical of any tool that claims the category without most of them.
1. Coordination, not just parallelism
Running ten agents at once is parallelism. Fleet management is handoffs: a developer agent opens a pull request, which triggers a reviewer agent, whose approval triggers a release manager. That requires an event bus so agents react to each other and to your repository, not a person manually starting the next step.
Fleet does this with a reactive event chain over a shared event bus called Fabric. A watcher daemon polls GitHub labels (via the gh CLI — no webhooks, no GitHub App required), publishes events, and a subscription processor starts the right agent for each event. This is the real coordination model. It is not a "pipeline mode" you configure step by step; it is agents subscribing to events and reacting.
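A minimal sketch of the poll-publish-subscribe pattern described above, not Fleet's actual code. The `EventBus` class, the `diff_labels` helper, the label names, and the PR number are all hypothetical, and the GitHub polling step is simulated with two hard-coded label snapshots instead of a real `gh` call.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny in-memory publish/subscribe bus (illustrative, not Fabric itself)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Every handler subscribed to this event type reacts to the event.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


def diff_labels(previous: set[str], current: set[str]) -> list[str]:
    """Return the labels that appeared since the last poll, in sorted order."""
    return sorted(current - previous)


started: list[tuple[str, int]] = []  # stands in for "launch this role's agent"
bus = EventBus()
# Hypothetical label names; each subscription maps an event to a role.
bus.subscribe("label:needs-review", lambda e: started.append(("reviewer", e["pr"])))
bus.subscribe("label:approved", lambda e: started.append(("release-manager", e["pr"])))

# Two simulated polls of the same pull request's labels.
seen: set[str] = set()
for labels in ({"needs-review"}, {"needs-review", "approved"}):
    for label in diff_labels(seen, labels):
        bus.publish(f"label:{label}", {"pr": 42})
    seen = set(labels)
```

The point of the sketch is that nothing starts the release manager by hand: it runs because an event it subscribed to was published after a label changed.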
2. Role separation
A reviewer that is also the author is not a reviewer. Fleet management means agents have defined roles with non-overlapping scopes — so the agent that approves a change is not the one that wrote it. Fleet ships 120+ role templates (developer, reviewer, QA, release manager, PM, and more) and compiles a role-specific handbook into each agent at launch. The role is the job description.
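The separation rule can be stated as a single check. This is a sketch under assumed names (`Change`, `can_approve`, and the agent identifiers are hypothetical), not how Fleet implements roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    pr: int
    author_agent: str


def can_approve(change: Change, approver_agent: str, approver_role: str) -> bool:
    """An approval counts only if it comes from a reviewer that did not write the change."""
    return approver_role == "reviewer" and approver_agent != change.author_agent
```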
3. Guardrails that actually bind
This is where claims get loose, so be precise about what a guardrail is.
- Budgets. Fleet enforces a run-time budget — a cumulative ceiling on how long each agent is allowed to run, in seconds. It is not a token budget. Fleet does not cap or track token counts; the enforced limit is time. Any tool advertising a "token budget" as an enforced cap is describing something different from what Fleet does. See controlling AI agent costs for why time is the lever Fleet pulls.
- Evaluation. Fleet scores agents on six dimensions — task output, reliability, output quality, efficiency, collaboration, and cost. That is the evaluation system, and it is separate from risk.
- Risk and quarantine. A separate risk model (logistic regression over operational signals like error rate, restarts, blocked tasks, and silent hours) drives auto-quarantine when an agent's risk hits critical. Evaluation and risk are two separate systems. Anyone who says "six-dimension risk scoring" has conflated them.
- Approval gates. A human or another agent can be required to approve before a merge. Marketing copy ships only after a person signs off — including this page.
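Two of the guardrails above are easy to show in miniature: a cumulative run-time ceiling in seconds, and a logistic score over operational signals. This is a sketch only. The `RunBudget` class, the weights, the bias, and the 0.8 "critical" threshold are invented for illustration and are not Fleet's real parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class RunBudget:
    """Cumulative ceiling on run time, measured in seconds (not tokens)."""

    ceiling_seconds: float
    used_seconds: float = 0.0

    def record_run(self, seconds: float) -> None:
        self.used_seconds += seconds

    @property
    def exhausted(self) -> bool:
        return self.used_seconds >= self.ceiling_seconds


# Hypothetical coefficients; a real model would be fitted to data.
WEIGHTS = {"error_rate": 4.0, "restarts": 0.5, "blocked_tasks": 0.3, "silent_hours": 0.2}
BIAS = -4.0
CRITICAL = 0.8  # assumed threshold for quarantine


def risk_score(signals: dict[str, float]) -> float:
    """Logistic regression: squash a weighted sum of signals into (0, 1)."""
    z = BIAS + sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def should_quarantine(signals: dict[str, float]) -> bool:
    return risk_score(signals) >= CRITICAL
```

Note that the budget and the risk score never touch each other, which mirrors the separation the list above insists on: one bounds run time, the other decides quarantine.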
4. Recovery — and an honest note about what Fleet does not do
Here is a limitation stated plainly: Fleet's watcher does not monitor and respawn crashed agent sessions. There is no agent health-check-and-restart loop. The --supervised flag restarts the watcher daemon itself, not your agents.
What does recover is the work, not the process. If a pull request needs another pass, the reactive chain and the PR reconciler re-dispatch the responsible role against the live state of the PR. Recovery comes from re-triggering the workflow, not from a babysitter relaunching a dead terminal. That is a deliberate design choice, and it is worth understanding before you assume "self-healing" means something it doesn't.
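The idea of recovering work from live state can be sketched as a pure function: given what the pull request looks like right now, decide which role should act next. The `PullRequestState` fields and the role names here are assumptions for illustration, not the PR reconciler's real inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PullRequestState:
    number: int
    merged: bool
    changes_requested: bool
    approved: bool


def next_role(pr: PullRequestState) -> Optional[str]:
    """Pick the role to dispatch from the PR's current state alone."""
    if pr.merged:
        return None  # nothing left to do
    if pr.changes_requested:
        return "developer"  # needs another pass
    if pr.approved:
        return "release-manager"
    return "reviewer"
```

Because the decision depends only on the PR's current state, it does not matter whether the previous agent session finished cleanly or died halfway: the next reconcile produces the same dispatch.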
5. An audit trail you can actually read
Every consequential decision an agent makes should be recorded and queryable. Fleet keeps a unified decision and conversation log (fleet log) and a fabric event history, so you can answer "which agent did this, when, and why" after the fact. For governed or regulated environments, this is not optional. See AI agent audit trail.
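To make "recorded and queryable" concrete, here is a sketch of an append-only decision log with one JSON record per line. The record fields and the `record` and `find_decisions` helpers are hypothetical and say nothing about the format `fleet log` actually uses.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def record(log: list[str], agent: str, action: str, reason: str, at: datetime) -> None:
    """Append one decision as a JSON line; existing entries are never rewritten."""
    entry = {"at": at.isoformat(), "agent": agent, "action": action, "reason": reason}
    log.append(json.dumps(entry))


def find_decisions(log: list[str], agent: Optional[str] = None,
                   action: Optional[str] = None) -> list[dict]:
    """Answer 'which agent did this, when, and why' by filtering the log."""
    entries = [json.loads(line) for line in log]
    return [
        e for e in entries
        if (agent is None or e["agent"] == agent)
        and (action is None or e["action"] == action)
    ]
```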
Where fleet management gets oversold
Three claims show up constantly in this category. None of them is true of Fleet, and we won't pretend otherwise.
"Zero telemetry / nothing leaves your machine." Fleet is local-first and your source code stays private — it goes only to your model backend and GitHub. But Fleet is not silent. When you connect an instance to the dashboard, it reports operational metadata and usage metering — agent status, run counts, and run time — for visibility and billing. The CLI also sends anonymous, opt-out usage analytics (which commands run, version, OS — never code, paths, arguments, or repo names; opt out with fleet config set telemetry off, FLEET_TELEMETRY=0, or DO_NOT_TRACK=1). "Your code stays private" is the honest line. "Air-gapped" or "zero telemetry" is not.
"Air-gapped / fully offline." Not possible here. Agents are Claude Code sessions that must reach a model backend, the watcher polls GitHub, and connected instances report metering. The model backend is configurable — the Anthropic API directly, or Amazon Bedrock / Google Vertex to keep traffic inside your own cloud — but a packet capture will always show outbound requests.
"Works with any agent / any model." Fleet's public, supported path is Claude Code only. You can point Claude Code at different backends (Anthropic, Bedrock, Vertex) and assign different Claude tiers per role — Opus for judgment-heavy work like architecture and review, Sonnet for procedural development — but it is Claude under the hood, not a free-for-all of vendors.
If a fleet management tool can't tell you plainly what leaves your network and which agents it really supports, that is the first thing to pin down.
Fleet management vs. the adjacent categories
"Fleet management" gets confused with three neighbors. They are not the same layer.
- Frameworks (CrewAI, LangGraph) are Python libraries for building multi-agent workflows in code. Fleet is a finished product you configure in YAML; you don't write orchestration logic.
- Observability (LangSmith, AgentOps) watches agents run — traces, token cost, session replays. Fleet runs the agents and governs them. You can use both: Fleet to orchestrate, an observability tool to inspect the model calls.
- Enterprise control planes (Microsoft Agent 365, GitHub Agent HQ) manage agents at company scale or inside a platform. Fleet is narrower and self-hosted: a coding-specific fleet on your own infrastructure, independent of any suite.
The useful test: does the tool run and coordinate the agents, or does it build, observe, or register them? Fleet runs and coordinates. Most of the category does one of the other three.
What you actually get with Fleet
A single self-hosted Go binary — no Docker, no Node.js, no JavaScript. It launches Claude Code agents in tmux sessions, coordinates them through the Fabric event bus, reacts to GitHub label changes, and governs the whole team with run-time budgets, six-dimension evaluation, a separate auto-quarantine risk model, approval gates, and a full audit trail. Pricing is per fleet, not per seat or per role: Free for one fleet with unlimited roles, Team at $299/month per fleet, and Enterprise for multiple fleets. You pay your own model API costs on top.
That is the honest shape of fleet management as Fleet implements it. If you want the deeper definition of the category term, start with What is an AI agent fleet?. If you're ready to run one, run multiple Claude Code agents is the place to begin.
FAQ
What is AI agent fleet management? It is the practice of running many AI coding agents as one governed system — coordinating handoffs between them, enforcing guardrails on what each can do, recovering work when something fails, and keeping an audit trail of every decision. It is an operations discipline, distinct from the productivity question of how good a single agent is.
How is it different from running multiple agents in parallel? Parallelism is just many agents at once. Fleet management adds coordination (agents react to each other through an event bus), role separation, guardrails (run-time budgets, evaluation, a risk model, approval gates), and a shared audit trail. Without those, parallel agents step on each other's work.
Does fleet management control token costs? Fleet controls run time, not tokens. Its enforced budget is a cumulative ceiling in seconds, and it meters agent status, run counts, and run duration — it does not cap or track token counts. If you need token-level cost analytics, pair Fleet with an observability tool like AgentOps or LangSmith.
Does Fleet keep my code private? Yes. Your source code stays private and goes only to your model backend and GitHub. Fleet is not, however, air-gapped or zero-telemetry: connected instances report operational metering, and the CLI sends anonymous, opt-out usage analytics. Code is never reported; some operational metadata is.
Which agents does Fleet manage? Claude Code, as the supported public path. You can run Claude Code against the Anthropic API directly or via Amazon Bedrock / Google Vertex, and assign different Claude model tiers per role.