
Fleet vs. SWE-agent: Production Orchestration vs. Research-Grade Autonomous Agent

SWE-agent is a research project demonstrating autonomous software engineering on benchmarks. Fleet is a production orchestration system for running teams of specialized agents with governance, budget controls, and reactive handoffs.

Developed at Princeton, SWE-agent showed that AI agents can autonomously resolve real GitHub issues, measured on the SWE-bench benchmark. It pioneered the agent-computer interface (ACI) concept and demonstrated what autonomous software engineering could look like in a research setting.

Fleet is built for production use. It coordinates multiple specialized agents, not just one generalist, with role definitions, model assignments, budget caps, risk scoring, and a full audit trail. Where SWE-agent runs a single agent to completion, Fleet runs a team: a developer agent opens a PR, a reviewer agent checks it, and a release-manager agent merges it, with each step governed and logged.
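To make the handoff model more concrete, here is a minimal Python sketch of a governed role pipeline. It is not Fleet's implementation or configuration format. `Role`, `AuditEntry`, `run_pipeline`, the role names and the placeholder model strings are all hypothetical. The sketch illustrates only the idea described above: each role has its own model assignment and run-time budget, work passes from role to role, and every step is written to an audit log.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: Role, AuditEntry and run_pipeline are illustrative
# names, not Fleet's API or configuration format.


@dataclass
class Role:
    name: str                    # e.g. "developer", "reviewer", "release-manager"
    model: str                   # model assigned to this role (placeholder string)
    budget_seconds: float        # per-agent run-time cap
    run: Callable[[dict], dict]  # stand-in for launching the agent on the work item


@dataclass
class AuditEntry:
    role: str
    model: str
    elapsed: float
    status: str  # "ok" or "over_budget"


def run_pipeline(
    roles: List[Role], work: Dict, clock: Callable[[], float] = time.monotonic
) -> Tuple[Dict, List[AuditEntry]]:
    """Hand the work item from role to role, recording each step.

    The chain stops at the first role that exceeds its budget and returns
    the last accepted state. This sketch checks the budget only after a
    step returns; a real system would also enforce it while the agent runs.
    """
    audit: List[AuditEntry] = []
    for role in roles:
        start = clock()
        result = role.run(dict(work))  # each role works on a copy of the state
        elapsed = clock() - start
        if elapsed > role.budget_seconds:
            audit.append(AuditEntry(role.name, role.model, elapsed, "over_budget"))
            return work, audit
        audit.append(AuditEntry(role.name, role.model, elapsed, "ok"))
        work = result
    return work, audit


# Usage: developer opens a PR, reviewer approves it, release manager merges it.
roles = [
    Role("developer", "model-a", 600, lambda w: {**w, "pr": "opened"}),
    Role("reviewer", "model-b", 300, lambda w: {**w, "review": "approved"}),
    Role("release-manager", "model-b", 120, lambda w: {**w, "pr": "merged"}),
]
final, log = run_pipeline(roles, {"issue": 42})
print(final)  # {'issue': 42, 'pr': 'merged', 'review': 'approved'}
print([entry.status for entry in log])  # ['ok', 'ok', 'ok']
```

This sketch stops the chain at the first budget overrun instead of skipping that role. That design keeps a half-reviewed PR from reaching the release step.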

Choose Fleet if

Production engineering teams that want a governed, multi-role agent system with budget controls, risk scoring, and an audit trail, running reliably day to day, not on benchmarks.

Choose SWE-agent if

Researchers and early adopters exploring autonomous software engineering capabilities, building on top of the SWE-agent framework, or running SWE-bench evaluations.

Fleet vs. SWE-agent: side by side

Feature | Fleet | SWE-agent
Intended use | Production multi-agent fleet management | Research / benchmark evaluation
Multi-agent support | Full role-based roster with handoffs | Single agent per run
Governance | Approval gates, budget caps, quarantine, audit log | Not provided
Evaluation & risk | 6-dimension agent evaluation, plus a separate logistic-regression risk model that drives auto-quarantine | Not provided
Deployment | Self-hosted binary, production-ready | Python research codebase, not production-hardened
Agent runner | Claude Code | Configurable, research-oriented
Watcher daemon | Reacts to GitHub labels and fabric events automatically | Manual invocation
Maintenance | Actively maintained, versioned releases | Research codebase, variable maintenance
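The comparison lists a logistic-regression risk model that drives auto-quarantine. The sketch below shows only the general technique: a logistic score, sigmoid(b + Σ wᵢxᵢ), compared against a threshold. The feature names, weights, bias and the 0.7 threshold are invented for illustration. This page does not describe Fleet's actual features or coefficients.

```python
import math
from typing import Dict

# Hypothetical features, weights, bias and threshold, for illustration only.
WEIGHTS: Dict[str, float] = {
    "failed_runs_last_10": 0.8,
    "reverted_prs_last_10": 1.2,
    "budget_overruns_last_10": 0.5,
}
BIAS = -4.0
QUARANTINE_THRESHOLD = 0.7


def risk_probability(features: Dict[str, float]) -> float:
    """Logistic-regression inference: sigmoid(bias + sum of w_i * x_i).

    Features missing from the input are treated as 0.
    """
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def should_quarantine(features: Dict[str, float]) -> bool:
    """Quarantine the agent when its estimated risk reaches the threshold."""
    return risk_probability(features) >= QUARANTINE_THRESHOLD


healthy = {"failed_runs_last_10": 1}
risky = {"failed_runs_last_10": 3, "reverted_prs_last_10": 2, "budget_overruns_last_10": 1}
print(round(risk_probability(healthy), 3), should_quarantine(healthy))  # 0.039 False
print(round(risk_probability(risky), 3), should_quarantine(risky))      # 0.786 True
```

This design keeps the quarantine decision as a threshold on a probability, separate from the evaluation scores. That separation means the cutoff can be tuned without refitting the model.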

Where Fleet is the better fit

  • Production-hardened with versioned releases, a watcher daemon, and 120+ agent prompt templates
  • Multi-role coordination — developer, reviewer, and release manager agents with reactive handoffs, not a single generalist agent
  • Governance stack: per-agent run-time budgets, 6-dimension evaluation, a separate auto-quarantine risk model, and a full audit trail
  • Reactive automation: watcher responds to GitHub label changes and dispatches agents without manual invocation
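Label-driven dispatch can be reduced to a routing function. The function inspects an incoming GitHub event and, if it is a `labeled` action on an issue or pull request, maps the label to an agent role. This is a hypothetical sketch. The `agent:*` label names and the role names are invented, and the page does not say whether Fleet's watcher uses webhooks or polling. The payload fields used (`action`, `label.name`, `issue` / `pull_request`, `number`) follow GitHub's webhook payloads for `issues` and `pull_request` events.

```python
from typing import Optional, Tuple

# Hypothetical label-to-role routing table; label and role names are made up.
LABEL_ROUTES = {
    "agent:develop": "developer",
    "agent:review": "reviewer",
    "agent:release": "release-manager",
}


def route_event(event_name: str, payload: dict) -> Optional[Tuple[str, Optional[int]]]:
    """Map a GitHub 'labeled' event to (role, issue_or_pr_number).

    event_name is the value of the X-GitHub-Event header. Returns None for
    events the watcher should ignore.
    """
    if event_name not in ("issues", "pull_request"):
        return None
    if payload.get("action") != "labeled":
        return None
    label_name = (payload.get("label") or {}).get("name")
    role = LABEL_ROUTES.get(label_name)
    if role is None:
        return None
    item = payload.get("issue") or payload.get("pull_request") or {}
    return role, item.get("number")


event = {"action": "labeled", "label": {"name": "agent:review"}, "pull_request": {"number": 17}}
print(route_event("pull_request", event))  # ('reviewer', 17)
```

Returning a routing decision instead of launching the agent directly keeps the function easy to test. In this sketch, the caller would be responsible for queueing the actual dispatch.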

Where SWE-agent is the better fit

  • Pioneered agent-computer interface design patterns that influenced many subsequent autonomous coding agents
  • Strong benchmark results on SWE-bench provide a quantitative measure of raw task-completion capability
  • Open, extensible Python codebase is easy to modify for research purposes and custom agent interfaces
  • No cost overhead beyond model API usage — no orchestration platform fees

Pricing

SWE-agent is open source with no platform cost; you pay only for model API calls. Fleet has a free tier (1 agent slot), a Team plan at $49 per slot per month, and an Enterprise plan. The price difference covers the production infrastructure, governance, and ongoing maintenance that a research tool does not provide.
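For a rough sense of the Fleet platform fee, the Team plan scales linearly with slots. This back-of-the-envelope sketch makes several assumptions. It treats the $49 per slot per month price as flat, with no volume or annual discounts. It covers only the Team plan, since Enterprise pricing is not listed. It leaves out model API spend, which both tools incur. `team_platform_cost` is a hypothetical helper, not part of any Fleet tooling.

```python
# Team-plan price from the pricing section; assumed flat, with no volume or
# annual discounts. Model API spend is excluded: both tools incur it.
TEAM_PRICE_PER_SLOT_USD = 49


def team_platform_cost(slots: int, months: int = 1) -> int:
    """Fleet Team-plan platform fee in USD for the given slots and months."""
    if slots < 0 or months < 0:
        raise ValueError("slots and months must be non-negative")
    return slots * TEAM_PRICE_PER_SLOT_USD * months


print(team_platform_cost(5))             # 245 (USD per month)
print(team_platform_cost(5, months=12))  # 2940 (USD per year)
```

On the same assumptions, SWE-agent's platform fee is 0 at any scale. The comparison therefore comes down to whether the governance and maintenance listed above are worth that per-slot fee.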

Do they compete, or coexist?

The two have limited direct integration, but Fleet's architecture was influenced by SWE-agent's agent-computer interface research. Teams that start with SWE-agent for benchmarking often move to Fleet to run similar workflows in production.

Frequently asked questions

Is SWE-agent ready for production use?

SWE-agent was designed as a research tool and benchmark harness. It lacks the governance, budget controls, and production-hardening features that Fleet provides. Teams building production agent workflows generally turn to purpose-built orchestration tools.

Does Fleet achieve SWE-bench-level performance?

Fleet's agents run Claude Code, which uses the same Claude models that score well on SWE-bench. Fleet's value is not in raw task-completion benchmarks but in coordinating multiple agents reliably, with governance, in production environments.

Run your first agent fleet

One binary. Five minutes. See every agent, coordinate every handoff, and keep a full audit trail of what your fleet did.