Fleet 1.13:Teams are now shipping 5x more PRs with autonomous pipelines.See what's new →
FleetFleet
CLI reference

fleet eval

fleet eval <list|run> [<agent-name>] [flags]

Runs and lists 6-dimension agent evaluations: task output, reliability, quality, efficiency, collaboration, and cost. fleet eval run <agent> computes and stores a fresh score for one agent; fleet eval list shows stored evaluations.

eval requires a subcommand — there is no bare positional form. The brain daemon uses the same scoring model continuously; eval is the on-demand entry point. A low eval score does not quarantine an agent; that is a separate brain risk-model behavior.

Flags

--agent <name>With list, filter stored evaluations to one agent
--limit <N>With list, maximum results to return (default 20)
--jsonWith list, emit the evaluations as JSON (empty result is `[]`)

Examples

Run an evaluation pass on one agent

fleet eval run frontend-dev

List stored evaluations for an agent

fleet eval list --agent tech-lead --limit 10

List all stored evaluations as JSON

fleet eval list --json

Frequently asked questions

What does eval score on?

Six dimensions: task output, reliability, quality, efficiency, collaboration, and cost. Risk is a separate logistic-regression model — eval does not produce a risk score.

Does a low eval score quarantine an agent?

No. eval only reports scores. Automated quarantine is a brain-daemon behavior driven by the separate risk model, not by eval.

Run your first agent fleet

One binary. Five minutes. See every agent, coordinate every handoff, and keep a full audit trail of what your fleet did.