How to Detect and Quarantine a Rogue AI Agent

An agent that is looping, running endlessly, making erratic commits, or otherwise behaving in ways inconsistent with its role needs to be stopped quickly and cleanly. "Rogue" in this context does not mean adversarial. It usually means that a prompt edge case, a malformed task input, or an unexpected environment condition has pushed the agent into a bad state.

Fleet's brain daemon runs two distinct models on every agent: a 6-dimension evaluation that scores quality, and a separate risk model built on operational signals. When an agent's risk level reaches critical, Fleet quarantines it by stopping the session and publishing a quarantine event. This guide covers both automatic and manual quarantine.

Before you start

  • Fleet brain daemon running (`fleet brain start`)
  • Agents defined in `.fleet/config.yaml`
  • Access to `fleet log` and `fleet status` for monitoring

1. Understand what Fleet monitors

The brain daemon runs two separate models. The 6-dimension evaluation scores task output, reliability, output quality, efficiency, collaboration, and cost. A distinct logistic-regression risk model reads operational signals (error rate, restarts, blocked tasks, silent hours, uptime, eval score, and SLA compliance) and drives auto-quarantine when risk reaches critical. The eval score is one input to the risk model, but the 6 dimensions themselves measure quality, not risk, so don't conflate the two.

# Start the brain daemon to enable evaluation and risk monitoring:
fleet brain start

# Surface flagged agents and insights:
fleet brain insights
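
As a rough illustration of how a logistic-regression risk model can turn the operational signals listed above into a risk level, here is a minimal Python sketch. The weights, bias, thresholds, signal scaling, and the level names other than "critical" are all hypothetical, chosen only for the example; it does not reproduce Fleet's actual coefficients or cut-offs.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    """Operational signals named in this guide; units and scaling are assumptions."""
    error_rate: float      # fraction of failed actions, 0..1
    restarts: int          # restarts in the observation window
    blocked_tasks: int     # tasks currently blocked
    silent_hours: float    # hours since the last event
    uptime_hours: float    # hours the session has been up
    eval_score: float      # 0..1, higher is better
    sla_compliance: float  # 0..1, higher is better


# Hypothetical coefficients: positive values raise risk, negative values lower it.
WEIGHTS = {
    "error_rate": 4.0,
    "restarts": 0.5,
    "blocked_tasks": 0.3,
    "silent_hours": 0.2,
    "uptime_hours": -0.01,
    "eval_score": -2.0,
    "sla_compliance": -2.0,
}
BIAS = -1.0


def risk_probability(s: Signals) -> float:
    """Logistic regression: sigmoid of a weighted sum of the signals."""
    z = BIAS + sum(w * getattr(s, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_level(p: float) -> str:
    """Map a probability to a level using hypothetical thresholds."""
    if p >= 0.9:
        return "critical"
    if p >= 0.6:
        return "high"
    if p >= 0.3:
        return "medium"
    return "low"


healthy = Signals(0.0, 0, 0, 0.0, 24.0, 0.9, 1.0)
failing = Signals(0.8, 5, 4, 6.0, 2.0, 0.2, 0.3)
print(risk_level(risk_probability(healthy)))
print(risk_level(risk_probability(failing)))
```

Note that the eval score appears only as one weighted input here, which is why a high-quality agent can still score as high risk if its operational signals degrade.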

2. Review brain insights for risk signals

The brain publishes actionable insights when it detects anomalies. Use `fleet brain insights` to see current risk assessments and any agents flagged for unusual behavior. This is your early warning system before a quarantine is triggered.

fleet brain insights

3. Manually stop a suspect agent

If you observe an agent behaving badly before the brain's automatic detection fires, stop it immediately. Fleet terminates the tmux session cleanly. The agent's Fabric events and audit trail are preserved.

fleet agent stop backend-dev
# Verify it stopped:
fleet status

4. Examine the audit trail before restarting

Before restarting a quarantined or manually stopped agent, review what it was doing. Use `fleet log` to see its last decision events. Look for patterns: was it looping on the same action, making repeated failed GitHub API calls, or producing commits that were reverted by the reviewer?

fleet log --agent backend-dev --since 2h
# Look for repeated identical decision events (loop indicator)
# Look for high volume of events in a short window (runaway indicator)
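
If the log is long, the two indicators above can be checked mechanically. The following Python sketch assumes the log has already been parsed into `(timestamp, action)` tuples; that shape, the function names, and the sample action names are hypothetical, since the output format of `fleet log` is not specified in this guide.

```python
from datetime import datetime, timedelta
from itertools import groupby


def longest_identical_run(events):
    """Return (action, length) of the longest run of consecutive identical actions."""
    best_action, best_len = None, 0
    for action, group in groupby(action for _, action in events):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_action, best_len = action, run_len
    return best_action, best_len


def max_events_in_window(events, window=timedelta(minutes=5)):
    """Return the largest number of events that fall inside any single window."""
    times = sorted(ts for ts, _ in events)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most `window`.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical sample: one normal event, six rapid retries, one later event.
base = datetime(2024, 1, 1, 12, 0)
events = (
    [(base, "open_pr")]
    + [(base + timedelta(seconds=10 * i), "retry_push") for i in range(1, 7)]
    + [(base + timedelta(minutes=30), "comment")]
)

print(longest_identical_run(events))
print(max_events_in_window(events))
```

A long identical run suggests a loop, while a high count in a short window suggests a runaway agent. What counts as "long" or "high" depends on the agent's normal behavior, so pick thresholds from a healthy baseline rather than from fixed numbers.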

5. Fix the underlying cause before restarting

Common causes of rogue behavior: a malformed task title with ambiguous scope, a GitHub API state the agent did not handle (e.g., a closed PR it kept trying to modify), or a stale skill file. Address the root cause, then restart.

# Check if skills are stale:
fleet skills install --dry-run

# Reinstall if needed:
fleet skills install

# Then restart the agent:
fleet agent start backend-dev

Common pitfalls

  • Stopping an agent mid-task can leave a branch in an incomplete or broken state. Check the repository state after stopping an agent and clean up any partial commits or open PRs that the agent left behind.
  • The brain daemon requires additional compute to run its periodic scoring. On resource-constrained machines, the 5-minute heartbeat polling may add latency to agent startup. Monitor system load when running many agents plus the brain.
  • Not all looping behavior is immediately obvious from the decision log. An agent can loop at a low rate (one action per minute) for hours before its run-time budget runs out. Set a tight enough run-time budget to catch slow loops.
  • Publishing a quarantine Fabric event for an agent that has already been manually stopped does not restart or otherwise affect the stopped agent. The event is informational for the rest of the fleet.
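
To make the slow-loop pitfall concrete, this small Python sketch estimates how many actions a steadily looping agent completes before its run-time budget expires, and what budget would cap the waste at a chosen number of actions. The function names and the example budgets are hypothetical, and the model assumes a constant action rate.

```python
def actions_before_budget(actions_per_minute: float, budget_minutes: float) -> int:
    """Actions a steadily looping agent completes before the budget runs out."""
    if actions_per_minute <= 0 or budget_minutes < 0:
        raise ValueError("rate must be positive and budget non-negative")
    return int(actions_per_minute * budget_minutes)


def budget_for_max_actions(actions_per_minute: float, max_wasted_actions: int) -> float:
    """Run-time budget (minutes) that limits a loop to max_wasted_actions."""
    if actions_per_minute <= 0:
        raise ValueError("rate must be positive")
    return max_wasted_actions / actions_per_minute


# One action per minute, as in the pitfall above.
print(actions_before_budget(1, 8 * 60))  # 8-hour budget
print(actions_before_budget(1, 60))      # 1-hour budget
print(budget_for_max_actions(1, 30))     # budget that allows at most 30 actions
```

At one action per minute, an 8-hour budget lets the loop run for 480 actions, while a 1-hour budget stops it at 60.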

When Fleet is the right tool

The brain daemon's automatic quarantine is most valuable in setups where agents run unattended overnight or across multiple days. If you are actively watching agent sessions, manual intervention is often faster than waiting for risk scoring to trigger. Enable the brain when you want a safety net for unattended operation.

Frequently asked questions

Can I see the risk assessment for a specific agent?

Yes. `fleet brain insights` surfaces both the 6-dimension evaluation score and the separate risk-model assessment per agent. They are two different models — the evaluation scores quality, the risk model drives auto-quarantine — so read them independently rather than as one blended number.

If an agent is quarantined automatically, does the watcher try to restart it?

No. A quarantined agent is not automatically restarted. The watcher will not re-dispatch work to it until a human explicitly restarts the agent session. This is intentional — quarantine requires human review before reactivation.
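
The restart rule can be pictured as a small state machine. The Python sketch below is a conceptual model only, not Fleet's implementation; the `State` enum, the `Agent` class, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"          # stopped manually by an operator
    QUARANTINED = "quarantined"  # stopped automatically by the risk model


@dataclass
class Agent:
    name: str
    state: State = State.RUNNING
    events: list = field(default_factory=list)


def quarantine(agent: Agent) -> None:
    """Record a quarantine event; only a running agent changes state."""
    agent.events.append("quarantine")
    if agent.state is State.RUNNING:
        agent.state = State.QUARANTINED


def can_dispatch(agent: Agent) -> bool:
    """Work is dispatched only to running agents."""
    return agent.state is State.RUNNING


def human_restart(agent: Agent) -> None:
    """The only transition back to RUNNING is an explicit restart by a person."""
    agent.state = State.RUNNING


agent = Agent("backend-dev")
quarantine(agent)
print(can_dispatch(agent))  # False until a human restarts it
human_restart(agent)
print(can_dispatch(agent))  # True again
```

In this model nothing other than `human_restart` moves an agent back to the running state, which mirrors the requirement for human review before reactivation.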
