Technical

Autonomous AI PR Review: A Practical Setup

Once agents write code fast, review becomes the bottleneck. Here's how to set up a reviewer agent that picks up every PR in minutes — event-driven, with human gates where they matter.

June 1, 2026 · 7 min read

AI coding agents have gotten fast enough that the bottleneck is no longer writing code. If you run a developer agent on a well-scoped ticket, you will have a pull request in minutes. Then it sits. The developer agent is idle, the ticket is not shipped, and someone has to remember to go look at it.

The bottleneck is review. More precisely, it is the wait between when a PR lands and when something acts on it. Once you have agents generating PRs at volume, that gap compounds quickly: three agents opening PRs at the same time means three PRs waiting in one queue, not three reviews running in parallel.

The good news is that first-pass review — the kind that catches obvious problems, enforces conventions, flags missing tests, and confirms the change matches the ticket — is highly automatable. Not all review, but enough to clear the queue so human reviewers spend time on architecture decisions and security questions rather than formatting and test coverage.
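Some of these checks don't even need a model. As an illustration, here is a hypothetical Python heuristic for one item on that list: flagging a PR that changes source files without touching any tests. The file-naming conventions it recognizes (a tests/ directory, test_*.py, *_test.py, *_test.go) are assumptions; adjust them to your repository.

```python
from pathlib import PurePosixPath

# Hypothetical conventions for what counts as a test file; adjust per repository.
TEST_SUFFIXES = ("_test.py", "_test.go")
SOURCE_SUFFIXES = (".py", ".go")


def is_test_file(path: str) -> bool:
    """True for files under a tests/ directory or named test_*.py, *_test.py, or *_test.go."""
    p = PurePosixPath(path)
    return (
        "tests" in p.parts
        or (p.name.startswith("test_") and p.suffix == ".py")
        or p.name.endswith(TEST_SUFFIXES)
    )


def flags_missing_tests(changed_files: list[str]) -> bool:
    """True when the PR changes source files but no test files."""
    touched_source = any(
        f.endswith(SOURCE_SUFFIXES) and not is_test_file(f) for f in changed_files
    )
    touched_tests = any(is_test_file(f) for f in changed_files)
    return touched_source and not touched_tests


if __name__ == "__main__":
    print(flags_missing_tests(["src/auth/login.py"]))                         # True
    print(flags_missing_tests(["src/auth/login.py", "tests/test_login.py"]))  # False
    print(flags_missing_tests(["README.md"]))                                 # False
```

A heuristic like this is cheap and deterministic, which makes it a reasonable pre-filter; deciding whether the tests that do exist actually cover the new behavior is still the reviewer agent's job.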

Why first-pass review is the bottleneck

When a developer agent opens a PR, it has done its job. It is no longer running. A PR sitting unreviewed is not a running agent stuck on a problem; it is a completed artifact waiting for the next step. That next step requires a different kind of attention than writing the code.

If you are running multiple developer agents — which is the point of an agent fleet — they can open several PRs in the time it takes a human reviewer to finish one. This is not a complaint about AI code quality. It is a structural problem: the ratio of code generation speed to review bandwidth is wildly out of balance, and it gets worse as you add agents.
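To make the imbalance concrete, here is a back-of-the-envelope Python model with hypothetical rates: each agent opens one PR per hour and a single human reviewer clears two. It assumes constant rates and an empty starting queue, so it ignores burstiness and only shows the direction of the trend.

```python
def backlog_after(hours: float, agents: int, prs_per_agent_hour: float,
                  reviews_per_hour: float) -> float:
    """Unreviewed PRs after `hours`, assuming constant rates and an empty starting queue."""
    arrivals_per_hour = agents * prs_per_agent_hour
    net_growth = arrivals_per_hour - reviews_per_hour
    # If review keeps up, the queue stays empty; otherwise it grows linearly with time.
    return max(0.0, net_growth * hours)


if __name__ == "__main__":
    # Hypothetical rates: 1 PR per agent per hour, one reviewer clearing 2 PRs per hour.
    for n in (1, 3, 5):
        print(f"agents={n}: {backlog_after(8, n, 1.0, 2.0):.0f} PRs waiting after 8h")
    # agents=1: 0 PRs waiting after 8h
    # agents=3: 8 PRs waiting after 8h
    # agents=5: 24 PRs waiting after 8h
```

Going from three agents to five does not add two PRs to the backlog; it triples the rate at which the backlog grows.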

Human reviewers help, but they also have other work. They are not watching a queue, waiting for the next PR to arrive. In practice, PRs wait hours, sometimes days, even when the code is fine. The agent that wrote the code is long idle. The ticket is not shipped. The delay is not in the code generation step; it is entirely in the handoff.

An automated reviewer agent does not fix every review problem. It does fix the gap. It picks up the PR in minutes, runs a first pass, and either approves it or sends it back with specific findings. The human reviewer, when they do look at it, is looking at something that has already passed a structured first-pass check.

What a reviewer agent actually does

A reviewer agent is not a generalist agent asked to "review this PR." The role is specific, and the prompt should match the role.

A reviewer agent configured with a tech-lead role has a prompt that reflects tech-lead concerns: does the implementation match the stated intent, are edge cases handled, does the code introduce patterns inconsistent with the codebase, are there performance implications worth flagging? A QA-lead reviewer prompt looks different: is there test coverage for the new behavior, do existing tests still make sense, are there regression risks?

The distinction matters because a generalist prompt produces generalist output. A reviewer prompt tuned to a specific role produces output that is actually useful for that role's decision. The findings are different. The approval criteria are different.
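One way to keep the roles distinct is to store each role's criteria as data and build the prompt from it. The criteria below are taken from the tech-lead and qa-lead questions listed earlier; the dictionary, the function name, and the closing APPROVE / REQUEST_CHANGES line are hypothetical conventions for this sketch, not Fleet's prompt format.

```python
# Role-specific review criteria (hypothetical structure; questions from the roles described above).
ROLE_CRITERIA = {
    "tech-lead": [
        "Does the implementation match the intent stated in the ticket?",
        "Are edge cases handled?",
        "Does the change introduce patterns inconsistent with the codebase?",
        "Are there performance implications worth flagging?",
    ],
    "qa-lead": [
        "Is there test coverage for the new behavior?",
        "Do existing tests still make sense after this change?",
        "Are there regression risks?",
    ],
}


def build_review_prompt(role: str, ticket: str) -> str:
    """Assemble a role-specific review prompt. Raises KeyError for an unknown role."""
    criteria = ROLE_CRITERIA[role]
    lines = [
        f"You are reviewing a pull request as the {role}.",
        f"Ticket: {ticket}",
        "Answer each question with a specific finding, or 'OK' if there is nothing to flag:",
    ]
    lines += [f"{i}. {q}" for i, q in enumerate(criteria, start=1)]
    # Hypothetical convention so the decision is machine-readable.
    lines.append("End with exactly one line: APPROVE or REQUEST_CHANGES.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_review_prompt("qa-lead", "Add rate limiting to the public API"))
```

Failing loudly on an unknown role is deliberate: a typo in the role name should not silently fall back to a generic "review this PR" prompt.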

Mechanically, what the reviewer agent does is straightforward: it checks out the PR branch, runs the review against its role-specific criteria, posts its findings as a comment on the PR, and then publishes a structured outcome — either pr_approved or pr_changes_requested — to the event bus. If changes are requested, it includes specific findings that the developer agent can act on.
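The structured outcome can be sketched as a small Python data type. The event names pr_approved and pr_changes_requested are the ones used in this post; the payload fields and the notion of non-blocking findings (nits that don't hold up approval) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    message: str
    blocking: bool = True  # assumption: non-blocking findings are nits that don't stop approval


@dataclass(frozen=True)
class ReviewOutcome:
    pr_number: int
    role: str
    findings: tuple[Finding, ...] = ()

    @property
    def event_type(self) -> str:
        # Any blocking finding sends the PR back to the developer agent.
        if any(f.blocking for f in self.findings):
            return "pr_changes_requested"
        return "pr_approved"

    def to_event(self) -> dict:
        """Hypothetical payload: specific enough for a developer agent to act on."""
        return {
            "type": self.event_type,
            "pr": self.pr_number,
            "role": self.role,
            "findings": [
                {"path": f.path, "message": f.message, "blocking": f.blocking}
                for f in self.findings
            ],
        }


if __name__ == "__main__":
    outcome = ReviewOutcome(43, "qa-lead", (Finding("api/limits.py", "No test covers the 429 response"),))
    print(outcome.to_event()["type"])  # pr_changes_requested
```

Deriving the event type from the findings, rather than storing it separately, means the two can never disagree.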

One thing worth being direct about: AI review is not a replacement for human judgment on hard problems. A reviewer agent will not catch a fundamentally wrong architectural decision. It will not have the context of a three-year-old system quirk that makes a seemingly clean change dangerous. What it will do is handle the surface area of review that does not require that context — the part that is currently eating hours of human attention and creating the queue.

Wiring it up: event-driven, not manual

In Fleet, this runs through the fabric event bus. No one pings the reviewer. No cron job polls for open PRs. The handoff is triggered by events.

When a developer agent opens a PR, it publishes a pr_created event to fabric and adds a needs-review label to the pull request. Fleet's watcher picks up the label change and publishes a pr_needs_review event. That event matches a subscription on the reviewer agent — say, a tech-lead or qa-lead role. The watcher starts the reviewer agent automatically.
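A toy, in-memory Python model of the label-to-event step and the subscription match described above. Bus, subscribe, publish, and on_label_added are hypothetical names; this is not fabric's API, only the shape of the routing.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class Bus:
    """Toy stand-in for an event bus: each event type maps to a list of subscriber callbacks."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # Events with no subscribers are simply dropped.
        for handler in self._subs[event["type"]]:
            handler(event)


def on_label_added(bus: Bus, pr_number: int, label: str) -> None:
    """Watcher step: translate a needs-review label into a pr_needs_review event."""
    if label == "needs-review":
        bus.publish({"type": "pr_needs_review", "pr": pr_number})


if __name__ == "__main__":
    bus = Bus()
    started = []
    # Stand-in for "start the tech-lead reviewer agent".
    bus.subscribe("pr_needs_review", lambda e: started.append(("tech-lead", e["pr"])))
    on_label_added(bus, 7, "docs")          # unrelated label: ignored
    on_label_added(bus, 7, "needs-review")
    print(started)  # [('tech-lead', 7)]
```

The watcher only knows about labels and the reviewer only knows about pr_needs_review; neither needs to know the other exists.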

The reviewer agent runs its review, posts findings to the PR, and publishes either pr_approved or pr_changes_requested to fabric. If the PR is approved, a release-manager agent picks up the pr_approved event and handles the merge — after verifying the approval gate is satisfied.

The full chain looks like this:

developer agent opens PR
  → publishes pr_created, adds needs-review label
  → watcher sees label, publishes pr_needs_review
  → reviewer agent starts (tech-lead or qa-lead role)
  → reviewer posts findings, publishes pr_approved or pr_changes_requested
  → if approved: release-manager agent merges
  → if changes requested: developer agent re-runs, cycle repeats
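The chain can be walked end to end with a small Python simulation. The reviewer is a stand-in function, and the max_rounds cap with escalation to a human is an assumption added here so a PR that keeps bouncing cannot loop forever; the chain as described simply repeats.

```python
from typing import Callable

# Hypothetical reviewer: takes the PR's revision number, returns an outcome event name.
Reviewer = Callable[[int], str]


def run_chain(review: Reviewer, max_rounds: int = 3) -> list[str]:
    """Simulate developer -> reviewer -> release-manager and return the ordered event log."""
    log = ["pr_created", "pr_needs_review"]  # developer opens PR; watcher sees the label
    for revision in range(1, max_rounds + 1):
        outcome = review(revision)  # "pr_approved" or "pr_changes_requested"
        log.append(outcome)
        if outcome == "pr_approved":
            log.append("merged")  # release-manager acts on pr_approved
            return log
        if revision < max_rounds:
            # Developer agent re-runs on the findings and the PR goes back into review.
            log.append("pr_needs_review")
    log.append("escalated_to_human")  # assumption: stop the loop after max_rounds
    return log


def approve_on_second_try(revision: int) -> str:
    return "pr_approved" if revision >= 2 else "pr_changes_requested"


if __name__ == "__main__":
    print(run_chain(approve_on_second_try))
    # ['pr_created', 'pr_needs_review', 'pr_changes_requested', 'pr_needs_review', 'pr_approved', 'merged']
```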

Nothing in this chain requires a human to notice a PR exists, decide who should review it, or remember to follow up. The waiting happens in the gaps between agents, and nothing inside an individual agent is going to fix that. The fabric event bus is what closes the gaps.

Each step in the chain is logged. The pr_approved event, the pr_changes_requested event, the findings, the merge — all of it goes into Fleet's audit trail. You can see exactly what happened, in what order, and which agent made which decision.
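A minimal Python sketch of an append-only audit log of the kind described. AuditLog, AuditEntry, and the JSON-lines export are hypothetical; Fleet's actual audit trail format is not shown here.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    seq: int        # global order across all PRs
    at: str         # ISO-8601 UTC timestamp
    pr: int
    agent: str
    event: str
    detail: str = ""


class AuditLog:
    """Append-only, ordered record of which agent did what to which PR."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, pr: int, agent: str, event: str, detail: str = "") -> AuditEntry:
        entry = AuditEntry(
            seq=len(self._entries) + 1,
            at=datetime.now(timezone.utc).isoformat(),
            pr=pr, agent=agent, event=event, detail=detail,
        )
        self._entries.append(entry)
        return entry

    def history(self, pr: int) -> list[AuditEntry]:
        """All entries for one PR, in the order they happened."""
        return [e for e in self._entries if e.pr == pr]

    def to_jsonl(self) -> str:
        """One JSON object per line, convenient for shipping to log storage."""
        return "\n".join(json.dumps(asdict(e)) for e in self._entries)


if __name__ == "__main__":
    log = AuditLog()
    log.record(12, "developer", "pr_created")
    log.record(12, "tech-lead", "pr_changes_requested", "missing null check in parse()")
    for e in log.history(12):
        print(e.seq, e.agent, e.event, e.detail)
```

Exposing no update or delete method is the point: entries can only be added, so the history reflects what actually happened.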

Keeping humans in the loop where it matters

Automated review does not mean humans are out of the loop. It means humans are in the loop on the things that benefit from human judgment.

Fleet supports approval gates. A sensitive path — say, changes to authentication logic, database migrations, or anything touching billing — can be configured to require a human sign-off before the release-manager agent proceeds. The reviewer agent still does the first pass. The human sign-off is the gate on merge.
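One way a path-based gate could work, sketched in Python with the standard fnmatch module. The patterns and function names are hypothetical, not Fleet's configuration. Note that fnmatch's * also matches /, so auth/* covers nested files such as auth/session/token.py.

```python
from fnmatch import fnmatchcase

# Hypothetical sensitive-path patterns (auth, migrations, billing).
SENSITIVE_PATTERNS = ("auth/*", "*/migrations/*", "billing/*")


def sensitive_files(changed_files: list[str],
                    patterns: tuple[str, ...] = SENSITIVE_PATTERNS) -> list[str]:
    """Changed files that match any sensitive pattern (empty list: no human gate needed)."""
    return [f for f in changed_files if any(fnmatchcase(f, p) for p in patterns)]


def may_merge(changed_files: list[str], ai_approved: bool, human_approved: bool) -> bool:
    """The reviewer agent always does the first pass; sensitive paths also need a human."""
    if not ai_approved:
        return False
    if sensitive_files(changed_files):
        return human_approved
    return True


if __name__ == "__main__":
    print(sensitive_files(["auth/session/token.py", "docs/readme.md"]))  # ['auth/session/token.py']
    print(may_merge(["billing/invoice.py"], ai_approved=True, human_approved=False))  # False
```

fnmatchcase is used instead of fnmatch so matching is case-sensitive and behaves the same on every operating system.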

This is probably the right model for most teams: AI review handles the queue, humans approve the merge on the things that carry real risk. The AI reviewer catches the obvious problems and reduces the surface area a human needs to examine. The human brings the judgment that catches the non-obvious ones.

One practical note: a pr_approved fabric event from a reviewer agent is a structured, auditable signal. It is not a rubber stamp. The agent checked out the code, reviewed it against its role-specific criteria, and concluded that nothing it found should block the merge. That is a meaningful first pass. It is also not the same as a senior engineer who has worked on the system for three years deciding the architecture is sound.

Be honest with your team about what automated review catches and what it does not. The failure mode is not "AI review misses something" — that happens with human review too. The failure mode is treating AI review as a substitute for human judgment on problems that actually require it.

The audit trail helps here. Every review decision is logged with the agent's findings. If something ships and later turns out to be wrong, you can see exactly what the reviewer agent said, what criteria it applied, and what it flagged. That is more traceable than a lot of human review processes.

Getting started

To set up a reviewer agent with Fleet, you configure a tech-lead or qa-lead agent in your .fleet/config.yaml with a reviewer-specific prompt. The prompt should reflect the actual criteria you want the reviewer to apply — not a generic "review this code" instruction, but specific guidance on what matters for that role.

You wire the subscription so the reviewer agent starts on pr_needs_review events. The developer agent's prompt should publish pr_created and add the needs-review label when it opens a PR. Fleet's watcher handles the rest of the event chain.
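On the developer-agent side, the label can be applied when the PR is opened using the GitHub CLI, whose gh pr create command accepts --title, --body, --base, and --label. The Python wrapper below is a sketch: the label generally has to exist in the repository already, publishing pr_created to fabric is Fleet-specific and not shown, and the run parameter exists only so the call can be faked in tests.

```python
import subprocess


def gh_pr_create_args(title: str, body: str, base: str = "main") -> list[str]:
    """Build a `gh pr create` invocation that applies the needs-review label at creation time."""
    return [
        "gh", "pr", "create",
        "--title", title,
        "--body", body,
        "--base", base,
        "--label", "needs-review",
    ]


def open_pr_for_review(title: str, body: str, run=subprocess.run) -> str:
    """Run gh and return its trimmed stdout (gh prints the new PR's URL on success)."""
    result = run(gh_pr_create_args(title, body), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Applying the label in the same command that creates the PR avoids a window in which the PR exists but the watcher has nothing to react to.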

For the release-manager, configure it to subscribe to pr_approved events and use Fleet's merge gate — fleet release check — to verify the approval before merging. If you want a human gate on specific paths, configure an approval requirement in the release-manager's settings.
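A sketch of the release-manager's pr_approved handler in Python. It assumes, without confirmation from Fleet's documentation, that fleet release check exits 0 when the approval gate is satisfied and non-zero otherwise, and it passes no extra arguments because the command's arguments aren't covered here. The merge uses gh pr merge with --squash; substitute whichever merge strategy your repository uses.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: exit code 0 means the approval gate is satisfied; anything else means "do not merge".
CHECK_CMD = ("fleet", "release", "check")


def handle_pr_approved(pr_number: int,
                       run: Callable = subprocess.run,
                       check_cmd: Sequence[str] = CHECK_CMD) -> bool:
    """On a pr_approved event: verify the gate, merge only if it passes. Returns True if merged."""
    gate = run(list(check_cmd))
    if gate.returncode != 0:
        # Gate not satisfied (for example, a required human sign-off is missing).
        return False
    run(["gh", "pr", "merge", str(pr_number), "--squash"], check=True)
    return True
```

Checking the gate inside the handler, rather than trusting that a pr_approved event implies mergeability, keeps the human sign-off enforceable even if an event arrives out of order.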

Fleet is a single Go binary that runs on your own machine or a server you control. There is no Docker and no cloud orchestration, and your source code stays private. It orchestrates the Claude Code agents you are already running; it does not replace them.

The setup takes an afternoon. The result is a review queue that clears in minutes instead of hours, a traceable audit trail for every decision, and human attention reserved for the problems that actually need it.

More at fleetctl.ai.
