Runs and lists six-dimension agent evaluations. The dimensions are task output, reliability, quality, efficiency, collaboration, and cost. Use fleet eval run <agent> to compute and store a fresh score for one agent, and fleet eval list to show stored evaluations.
eval requires a subcommand (run or list); there is no bare positional form such as fleet eval <agent>. The brain daemon applies the same scoring model continuously; eval is the on-demand entry point. A low eval score does not quarantine an agent; quarantine is a separate brain risk-model behavior.
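
To make the command shape concrete, here is a minimal Python sketch that models it with the standard argparse module. This is an illustration only, not fleet's actual implementation: the function names build_parser and parse, the dest names, and the agent name agent-a are all hypothetical. Only the subcommands run and list and the <agent> argument come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the `fleet eval` command shape (illustrative, not fleet's real code)."""
    parser = argparse.ArgumentParser(prog="fleet")
    top = parser.add_subparsers(dest="command", required=True)

    eval_parser = top.add_parser("eval", help="run or list agent evaluations")
    # required=True means `fleet eval` with no subcommand is rejected.
    eval_sub = eval_parser.add_subparsers(dest="eval_command", required=True)

    # `fleet eval run <agent>`: takes exactly one agent name.
    run = eval_sub.add_parser("run", help="compute and store a fresh score for one agent")
    run.add_argument("agent")

    # `fleet eval list`: takes no arguments in this sketch.
    eval_sub.add_parser("list", help="show stored evaluations")
    return parser


def parse(argv):
    """Return the parsed namespace, or None when argparse rejects the input."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage error to stderr and exits on invalid input.
        return None


if __name__ == "__main__":
    print(parse(["eval", "run", "agent-a"]))  # hypothetical agent name
    print(parse(["eval", "list"]))
    print(parse(["eval", "agent-a"]))  # rejected: no bare positional form
```

In this sketch, a bare agent name after eval is treated as an invalid subcommand choice, which mirrors the rule that eval always needs run or list.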