Fleet
Leadership

The Hidden Cost of Unmanaged AI Coding Agents

Your AI provider's invoice is the cheap part. The expensive part is the developer sitting next to every agent, doing the supervising. That cost is nowhere on your dashboard, and it's larger than you think.

April 2, 2026 · 6 min read

Every engineering leader I talk to can tell me what their AI provider is charging them per month. Almost none can tell me what their team is actually spending to use it.

The invoice is the cheap part. The expensive part is the one that isn't on any dashboard: the developer sitting next to every agent, doing the supervising.

Here's what that supervision actually looks like, in two scenes that will be familiar to anyone on an engineering team right now.

Scene one. A senior engineer opens Claude Code and pulls up a ticket they're trying to close. They type a prompt. The agent produces something. They read it, find two things they don't like, and write a correction. New output. Mostly better, but now there's a subtle bug in the error handling. Another correction. Another round. Forty-five minutes in, nothing has been committed. The agent isn't broken. This is what driving an agent looks like. You can't walk away from it, because the minute you do, it generates something confidently wrong and you won't catch it until the tests fail an hour later. By the time they actually commit, it's noon. They started at 10. There is one PR open.

Scene two. That same PR goes to the release manager later in the day. There's a change request that needs filling out, a back-out plan to document, a risk classification to set, and a security questionnaire that has to be answered rather than rubber-stamped. Half the time the developer who kicked off the agent that morning skipped most of those fields, so the release manager does it themselves. They read the diff to reconstruct the rationale, run the PR against the team's compliance checklist, and ping the original developer on Slack for the bits they can't figure out from the code alone. That's an hour of work before merge even happens. The deploy runs in late afternoon. Somebody should be watching error rates for the next half hour in case something regressed, but nobody has thirty minutes of uninterrupted attention to stare at a dashboard for a deploy that's probably fine. Two days later on-call gets paged on an endpoint the PR touched, and what should have been a five-minute rollback turns into a forty-five-minute incident because nobody remembered which deploy was the one.

Neither of those scenes is unusual. You've watched both happen, probably this week. And neither is line-itemized anywhere. They show up as general slowness that's hard to diagnose, and as the thing every engineering leader I've talked to says quietly in private: we bought the AI tools, we don't know who uses them, and the team still doesn't feel meaningfully faster.

Of course they don't. The agent is the fast part. The developer supervising the agent is not fast. The reviewer waiting in an inbox is not fast. The person who eventually clicks merge, and the person watching the deploy afterwards, are not fast. Everything around the agent is still a human, and that surrounding work is where most of the elapsed time goes.

The other cost you don't see for a while

There's a quieter problem that doesn't show up for a few months. When every developer on your team is driving their own agent on their own corner of the codebase, the implementations stop cohering. One developer's agent writes the new service layer one way. Another developer's agent writes a parallel piece of infrastructure in a different module, with different conventions and different error handling and different test patterns. Every PR reviews cleanly in isolation. Three months later, you're looking at a codebase that feels like it was written by six strangers who never spoke to each other, because effectively it was.

That cleanup is on the tab too. You just won't see it until later, and when you do, you'll blame yourselves for not writing a better style guide. The style guide isn't the problem. The problem is that a style guide is a document, and the agents reading it are six separate sessions with no shared memory of what the other five decided yesterday.

What changes when the wrapper isn't a person

Fleet isn't a replacement for Codex, Copilot, or Claude Code. Your senior engineers should keep driving agents directly on work that actually benefits from their judgment. What Fleet takes off their plate is the other work: the parts of the pipeline where a person is currently the mechanism by which something moves to the next stage.

A product owner agent refines a ticket before a development agent touches it, so the development agent starts from a well-scoped problem instead of a two-line request. A review agent pulls PRs off an event stream in seconds instead of waiting for someone to check their email. A release agent merges and deploys once the review passes. An SRE agent watches the deploy for thirty minutes and opens a rollback if error rates spike. Each of those agents runs the model appropriate to its job, so you aren't paying Opus rates for work that Haiku handles fine. And because they draw from a shared set of conventions rather than six independent chat sessions, the code they produce agrees with itself across the codebase.

Nothing in that chain is sitting in an inbox. Nothing in it is waiting on a developer to get back from lunch. When something does go wrong, there's a full audit log of which agent did what and why, so you can answer the question when someone asks.

None of this removes humans from your engineering process. It stops spending senior engineer time on the parts of the pipeline where no human judgment is actually required. That time is the most expensive thing on your team, and right now it's being spent on handoffs, queue-waits, and babysitting prompt loops. That's the hidden cost. That's what you stop paying when the wrapper around the agent stops being a person.

Try Fleet

One binary. Five minutes. See every agent, coordinate every handoff, and keep a full audit trail of what your fleet did.