I finally hit my breaking point waiting for the model providers to figure out multi-agent orchestration.
What I wanted was simple on paper. Point several agents at the same task, let them split the work, and have them coordinate what they each needed so the whole thing finished in parallel instead of one agent grinding through it serially. One picks up the schema changes, another takes the migration, a third writes the tests. They talk to each other about interfaces. I get back something that would have taken me a full day in a fraction of the time.
Anthropic had just shipped their teams concept. A lead agent with sub-agents. On the surface, that looked like exactly what I was describing. In practice, it was a black box. I had no idea which sub-agent was working on what. I couldn't tell when one had crashed, and they crashed regularly. The only interface back into the system was asking the lead agent, and the lead agent's answers were lackluster at best. Half-remembered summaries. Sub-agents it had forgotten about. Tasks it was convinced were "done" that had actually silently failed ten minutes earlier.
One afternoon I spent 20 minutes trying to pry a straight status report out of a lead agent that didn't actually know what its own team was doing. At some point it occurred to me that I'm an engineer and I built these workflows. If I can't see what's happening inside a five-agent team working on one task, what's happening at companies where 20 developers are each trying to run their own?
I started asking around. The answer was always some version of "yeah, we don't really know. We just use one agent at a time."
The mess is the same everywhere
Talk to enough engineering leaders and you hear the same complaints, almost word for word.
Most developers are running one agent at a time. Maybe two on a good day. And they're babysitting every minute of it: watching the terminal, waiting for the next prompt, catching it when it veers off, copy-pasting output into the next tool. The agent isn't multiplying anyone's output. It's a faster way to type, with a human stapled to the keyboard the entire time. The minute you try to walk away and let it work, it stalls or produces something useless, so nobody walks away.
That puts the real workload ceiling at one developer plus one agent. Barely better than one developer alone, and nowhere near what the tech is actually capable of. The whole pitch of AI-assisted engineering is supposed to be leverage. At 1x, you don't have leverage. You have a typing aid.
The next thing you hear, over and over, is that none of it is wired together. An agent finishes something and the human takes over: opens the PR, pings a reviewer, kicks off tests, posts in Slack, updates the ticket. Every handoff runs through a person, and that person is the bottleneck. A smarter model doesn't help you here. The waiting happens in the gaps between agents, and nothing inside the agent is going to fix that.
And when something goes wrong, nobody can prove what happened. There's no audit log. No record of which agent made which decision or what context it was looking at when it made it. No way to pull a misbehaving agent offline from a central place. The governance story is "trust each developer to do it right," which is the same governance story we all mocked in the pre-CI days, and for the same reason.
What I built, and why it's shaped the way it is
Fleet started as a tmux hack. I wanted visibility into my sub-agents, and the lead agent couldn't give me a straight answer about what its team was doing, so I cut it out of the observation loop entirely. I set up one orchestration agent that dispatched work, and each sub-agent got its own tmux pane that I could look at directly. Split-pane session, one sub-agent per pane, all of them alive in front of me at the same time. No more asking the lead what was going on. I could just watch them work. That was the whole thing at first: get eyes on the sub-agents without a middleman in the way.
Once I could see the team, the next bottleneck was obvious. They still needed me to connect the dots between them. One agent would finish something, I'd notice, I'd tell the next agent to pick it up. So I built an event bus. Now when one agent opened a PR, another agent could pick it up for review without me doing anything. Review turnaround went from hours to minutes, and I didn't have to be in the room for it.
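The event-bus pattern here is the standard publish/subscribe shape. A minimal sketch, in Python, of the idea (the `EventBus` class, the `"pr.opened"` event name, and the payload fields are all illustrative, not Fleet's actual API):

```python
# Minimal pub/sub sketch: agents subscribe to event types, and one agent
# finishing a task triggers the next agent without a human relaying it.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self) -> None:
        # event type -> list of handlers (one per listening agent)
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event to every agent listening for this type.
        for handler in self._subscribers[event_type]:
            handler(payload)

reviews = []
bus = EventBus()
# The review agent listens for opened PRs...
bus.subscribe("pr.opened", lambda event: reviews.append(event["pr"]))
# ...so when the dev agent opens one, review starts with nobody in the middle.
bus.publish("pr.opened", {"pr": 1234})
```

The point of the design is that the dev agent never needs to know a reviewer exists; it just announces what happened, and whoever cares reacts.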
Then I wanted the front of the pipeline to stop being me too. The idea was to have one agent take a rough task, pull it apart, write out acceptance criteria, and hand the work to whichever agents were actually best suited to execute it.
And "best suited" turns out to matter more than people give it credit for. Not all development agents are created equal. The system prompt is the job description, and a UI agent tuned for your component conventions, accessibility rules, and state libraries writes completely different code than a backend agent tuned for your service layer, database patterns, and error-handling contracts. Point a generalist "write code" prompt at a ticket and you get generalist output: the same agent that did a passable job on an API handler will happily wander into your component tree and start rewriting your design system because nothing told it not to.
Splitting work along those seams isn't really a parallelism trick so much as a quality trick. Each agent actually gets to be good at its slice of the problem instead of mediocre at all of it. So the routing agent's job isn't "pick someone who's free." It's matching the shape of the task to the agent whose prompt was built for that shape. By the time the router was in place, Fleet had stopped being a visibility tool and become an orchestrator.
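Shape-based routing can be sketched in a few lines. This is an illustrative toy, not Fleet's router: the agent names, the `touched_paths` field, and the path prefixes are all hypothetical stand-ins for whatever signals a real router would use.

```python
# Route a task by its shape (here, which part of the tree it touches),
# not by which agent happens to be idle. Falls back to a generalist.
AGENT_PROMPTS = {
    "ui": "prompts/ui.md",            # component conventions, a11y, state libs
    "backend": "prompts/backend.md",  # service layer, DB patterns, error contracts
    "general": "prompts/general.md",  # default when no specialist fits
}

def route(task: dict) -> str:
    """Pick the agent whose system prompt was built for this task's shape."""
    paths = task.get("touched_paths", [])
    if any(p.startswith("src/components/") for p in paths):
        return "ui"
    if any(p.startswith("src/services/") for p in paths):
        return "backend"
    return "general"

print(route({"touched_paths": ["src/components/Button.tsx"]}))  # ui
```

A real router would lean on the refined ticket, not just file paths, but the principle is the same: the match is task shape to prompt, and the fallback is the generalist, not the nearest free worker.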
Cost came next. I was tired of every agent running the most expensive model regardless of what it was actually doing. My triage agent doesn't need Opus. Neither does my deployment watcher. One line in the config and each agent runs the model that fits its job.
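That one line might look something like this. The field names and model identifiers below are illustrative, not Fleet's documented schema:

```yaml
agents:
  triage:
    prompt: prompts/triage.md
    model: claude-haiku      # cheap model for cheap work
  deploy-watcher:
    prompt: prompts/deploy-watch.md
    model: claude-haiku
  backend-dev:
    prompt: prompts/backend.md
    model: claude-opus       # the expensive model only where it earns its cost
```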
Around the same time, the agents started producing better-looking code and I ran into something uncomfortable. The biggest quality problem in my pipeline wasn't the agents. It was the tickets. Vague inputs produce bad code regardless of model, and I'd been blaming the executors for problems that started upstream. So I built support for product owner agents that refine tickets before any development agent touches them.
Every one of these pieces exists because I personally needed it on a specific day for a specific reason. There's nothing in Fleet that got built for a roadmap or a hypothetical future customer. Every feature is something I ran into a wall over, then built my way around.
The result is Fleet. A single binary you download and run. It sits on your own machine or your own server. It doesn't call home, it doesn't need Docker or Kubernetes, and it doesn't need a cloud account. You configure your agents in a YAML file, start them with one command, and from that point on you can see what they're doing, hand work between them, and stop any of them cold when one goes sideways.
Where this goes
The engineering team of 2026 will run roughly 5x the workload of a 2024 team with the same headcount. That isn't a prediction about replacement; it's a prediction about who does the repetitive work: implementing well-scoped tickets, first-pass code review, test suites, release management, deployment monitoring. That work is going to be handled by coordinated agent fleets, and the humans are going to spend their time on architecture, judgment calls, and the problems that actually need a person in the chair.
That future needs operational infrastructure that doesn't exist in most companies today. You cannot run a fleet of 30 agents the way you run 3. You need to see what every agent is doing in real time. You need work to move between agents without a person in the middle. You need guardrails that the system enforces for you instead of rules you're hoping everyone follows. And you need a complete record of every decision every agent makes, so that when someone asks what happened, you have a real answer.
That's what Fleet is. It isn't clever and it isn't doing anything exotic. It's the boring operational layer this moment has been missing.
Try it. Break it. Tell me what's missing. I'm at jason@fleetctl.ai and I read everything.