I finally hit my breaking point waiting for the model providers to figure out multi-agent orchestration.
What I wanted was simple on paper. Point several agents at the same task, let them split the work, and have them coordinate what they each needed so the whole thing finished in parallel instead of one agent grinding through it serially. One picks up the schema changes, another takes the migration, a third writes the tests. They talk to each other about interfaces. I get back something that would have taken me a full day in a fraction of the time.
Anthropic had just shipped their teams concept. A lead agent with sub-agents. On the surface, that looked like exactly what I was describing. In practice, it was a black box. I had no idea which sub-agent was working on what. I couldn't tell when one had crashed, and they crashed regularly. The only interface back into the system was asking the lead agent, and the lead agent's answers were lackluster at best. Half-remembered summaries. Sub-agents it had forgotten about. Tasks it was convinced were "done" that had actually silently failed ten minutes earlier.
One afternoon I spent 20 minutes trying to pry a straight status report out of a lead agent that didn't actually know what its own team was doing. At some point it occurred to me that I'm an engineer and I built these workflows. If I can't see what's happening inside a five-agent team working on one task, what's happening at companies where 20 developers are each trying to run their own?
I started asking around. The answer was always some version of "yeah, we don't really know. We just use one agent at a time."
The mess is the same everywhere
Talk to enough engineering leaders and you hear the same complaints, almost word for word.
Most developers are running one agent at a time. Maybe two on a good day. And they're babysitting every minute of it: watching the terminal, waiting for the next prompt, catching it when it veers off, copy-pasting output into the next tool. The agent isn't multiplying anyone's output. It's a faster way to type, with a human stapled to the keyboard the entire time. The minute you try to walk away and let it work, it stalls or produces something useless, so nobody walks away.
That puts the real workload ceiling at one developer plus one agent. Barely better than one developer alone, and nowhere near what the tech is actually capable of. The whole pitch of AI-assisted engineering is supposed to be leverage. At 1x, you don't have leverage. You have a typing aid.
The next thing you hear, over and over, is that none of it is wired together. An agent finishes something and the human takes over: opens the PR, pings a reviewer, kicks off tests, posts in Slack, updates the ticket. Every handoff runs through a person, and that person is the bottleneck. A smarter model doesn't help you here. The waiting happens in the gaps between agents, and nothing inside the agent is going to fix that.
And when something goes wrong, nobody can prove what happened. There's no audit log. No record of which agent made which decision or what context it was looking at when it made it. No way to pull a misbehaving agent offline from a central place. The governance story is "trust each developer to do it right," which is the same governance story we all mocked in the pre-CI days, and for the same reason.
What I built, and why it's shaped the way it is
Fleet started as a tmux hack. I wanted visibility into my sub-agents, and the lead agent couldn't give me a straight answer about what its team was doing, so I cut it out of the observation loop entirely. I set up one orchestration agent that dispatched work, and each sub-agent got its own tmux pane that I could look at directly. Split-pane session, one sub-agent per pane, all of them alive in front of me at the same time. No more asking the lead what was going on. I could just watch them work. That was the whole thing at first: get eyes on the sub-agents without a middleman in the way.
Once I could see the team, the next bottleneck was obvious. They still needed me to connect the dots between them. One agent would finish something, I'd notice, I'd tell the next agent to pick it up. So I built an event bus. Now when one agent opened a PR, another agent could pick it up for review without me doing anything. Review turnaround went from hours to minutes, and I didn't have to be in the room for it.
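The event-bus pattern here is the standard publish/subscribe shape. A minimal sketch, in Python, of the idea (the `EventBus` class, the `"pr.opened"` event name, and the payload fields are all illustrative, not Fleet's actual API):

```python
# Minimal pub/sub sketch: agents subscribe to event types, and one agent
# finishing a task triggers the next agent without a human relaying it.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self) -> None:
        # event type -> list of handlers (one per listening agent)
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event to every agent listening for this type.
        for handler in self._subscribers[event_type]:
            handler(payload)

reviews = []
bus = EventBus()
# The review agent listens for opened PRs...
bus.subscribe("pr.opened", lambda event: reviews.append(event["pr"]))
# ...so when the dev agent opens one, review starts with nobody in the middle.
bus.publish("pr.opened", {"pr": 1234})
```

The point of the design is that the dev agent never needs to know a reviewer exists; it just announces what happened, and whoever cares reacts.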
Then I wanted the front of the pipeline to stop being me too. The idea was to have one agent take a rough task, pull it apart, write out acceptance criteria, and hand the work to whichever agents were actually best suited to execute it.
And "best suited" turns out to matter more than people give it credit for. Not all development agents are created equal. The system prompt is the job description, and a UI agent tuned for your component conventions, accessibility rules, and state libraries writes completely different code than a backend agent tuned for your service layer, database patterns, and error-handling contracts. Point a generalist "write code" prompt at a ticket and you get generalist output: the same agent that did a passable job on an API handler will happily wander into your component tree and start rewriting your design system because nothing told it not to.
Splitting work along those seams isn't really a parallelism trick so much as a quality trick. Each agent actually gets to be good at its slice of the problem instead of mediocre at all of it. So the routing agent's job isn't "pick someone who's free." It's matching the shape of the task to the agent whose prompt was built for that shape. By the time the router was in place, Fleet had stopped being a visibility tool and become an orchestrator.
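Shape-based routing can be sketched in a few lines. This is an illustrative toy, not Fleet's router: the agent names, the `touched_paths` field, and the path prefixes are all hypothetical stand-ins for whatever signals a real router would use.

```python
# Route a task by its shape (here, which part of the tree it touches),
# not by which agent happens to be idle. Falls back to a generalist.
AGENT_PROMPTS = {
    "ui": "prompts/ui.md",            # component conventions, a11y, state libs
    "backend": "prompts/backend.md",  # service layer, DB patterns, error contracts
    "general": "prompts/general.md",  # default when no specialist fits
}

def route(task: dict) -> str:
    """Pick the agent whose system prompt was built for this task's shape."""
    paths = task.get("touched_paths", [])
    if any(p.startswith("src/components/") for p in paths):
        return "ui"
    if any(p.startswith("src/services/") for p in paths):
        return "backend"
    return "general"

print(route({"touched_paths": ["src/components/Button.tsx"]}))  # ui
```

A real router would lean on the refined ticket, not just file paths, but the principle is the same: the match is task shape to prompt, and the fallback is the generalist, not the nearest free worker.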
Cost came next. I was tired of every agent running the most expensive model regardless of what it was actually doing. My triage agent doesn't need Opus. Neither does my deployment watcher. One line in the config and each agent runs the model that fits its job.
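That one line might look something like this. The field names and model identifiers below are illustrative, not Fleet's documented schema:

```yaml
agents:
  triage:
    prompt: prompts/triage.md
    model: claude-haiku      # cheap model for cheap work
  deploy-watcher:
    prompt: prompts/deploy-watch.md
    model: claude-haiku
  backend-dev:
    prompt: prompts/backend.md
    model: claude-opus       # the expensive model only where it earns its cost
```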
Around the same time, the agents started producing better-looking code and I ran into something uncomfortable. The biggest quality problem in my pipeline wasn't the agents. It was the tickets. Vague inputs produce bad code regardless of model, and I'd been blaming the executors for problems that started upstream. So I built support for product owner agents that refine tickets before any development agent touches them.
Every one of these pieces exists because I personally needed it on a specific day for a specific reason. There's nothing in Fleet that got built for a roadmap or a hypothetical future customer. Every feature is something I ran into a wall over, then built my way around.
The result is Fleet. A single binary you download and run. It sits on your own machine or your own server. It doesn't call home, it doesn't need Docker or Kubernetes, and it doesn't need a cloud account. You configure your agents in a YAML file, start them with one command, and from that point on you can see what they're doing, hand work between them, and stop any of them cold when one goes sideways.
Where this goes
The engineering team of 2026 will run roughly 5x the workload of a 2024 team with the same headcount. That isn't a prediction about replacement; it's a prediction about who does the repetitive work: implementing well-scoped tickets, first-pass code review, test suites, release management, deployment monitoring. That work is going to be handled by coordinated agent fleets, and the humans are going to spend their time on architecture, judgment calls, and the problems that actually need a person in the chair.
That future needs operational infrastructure that doesn't exist in most companies today. You cannot run a fleet of 30 agents the way you run 3. You need to see what every agent is doing in real time. You need work to move between agents without a person in the middle. You need guardrails that the system enforces for you instead of rules you're hoping everyone follows. And you need a complete record of every decision every agent makes, so that when someone asks what happened, you have a real answer.
That's what Fleet is. It isn't clever and it isn't doing anything exotic. It's the boring operational layer this moment has been missing.
Try it. Break it. Tell me what's missing. I'm at jason@fleetctl.ai and I read everything.