
Agent Risk Scoring

Agent risk scoring is the continuous evaluation of an AI agent's current behavior against a set of risk features to produce a numerical score that reflects the probability of the agent causing harm if allowed to continue operating.

Risk scoring applies quantitative methods to agent oversight. Rather than waiting for a human to notice that an agent is doing something problematic, a risk model continuously evaluates observable features (which files are being modified, how fast tokens are being consumed, the current error rate, whether the agent is operating within its defined scope) and produces a score that can trigger alerts or automatic quarantine.

The features used in risk models typically combine static factors (file sensitivity based on path patterns), dynamic factors (rate of change versus the agent's historical baseline), and outcome factors (test pass rate, reviewer feedback). Logistic regression is a common modeling approach because the feature weights are interpretable: you can explain why a score is high in terms of specific feature contributions.
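The interpretability point can be made concrete with a minimal sketch. The feature names, weights, and bias below are hypothetical values chosen for illustration, not taken from any real model; the sketch only shows how a logistic score breaks down into per-feature contributions.

```python
import math

# Hypothetical weights and bias, for illustration only.
WEIGHTS = {
    "file_sensitivity": 2.0,   # static factor, scaled to 0..1
    "rate_deviation": 1.5,     # dynamic factor, deviation from baseline
    "test_failure_rate": 1.0,  # outcome factor, fraction of failing tests
}
BIAS = -3.0


def risk_score(features):
    """Return (score, contributions) for a dict of feature values."""
    # Each feature's contribution to the logit is weight * value.
    contributions = {
        name: WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS
    }
    logit = BIAS + sum(contributions.values())
    # The logistic function maps the logit to a value between 0 and 1.
    score = 1.0 / (1.0 + math.exp(-logit))
    return score, contributions


score, contributions = risk_score(
    {"file_sensitivity": 1.0, "rate_deviation": 0.5, "test_failure_rate": 0.25}
)
# The feature with the largest contribution explains most of the score.
top_feature = max(contributions, key=contributions.get)
```

Because the logit is a plain weighted sum, sorting the contributions gives a direct answer to "why is this score high?" without any additional explanation tooling.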

Risk scoring is not a substitute for review of agent output quality. It is an early warning system for behavioral anomalies that are detectable without reading the agent's code. A high risk score should trigger closer scrutiny; it does not by itself indicate the agent has done something wrong.

How this relates to Fleet

Fleet computes risk scores via its brain daemon using a logistic regression model over operational signals — error rate, restarts, blocked tasks, silent hours, uptime, evaluation score, and SLA compliance. Scores are evaluated continuously during agent execution, and quarantine fires when the risk level reaches critical. The fleet brain insights command reports current risk scores and which features are driving each agent's score.
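As a rough illustration of the score-to-quarantine step, the sketch below maps a score to a discrete risk level and quarantines only at the critical level. The level names other than critical, the threshold values, and the function names are all assumptions made for this example; they are not Fleet's actual implementation or configuration.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical cut-offs, checked from highest to lowest.
THRESHOLDS = [
    (0.9, RiskLevel.CRITICAL),
    (0.7, RiskLevel.HIGH),
    (0.4, RiskLevel.MEDIUM),
]


def classify(score):
    """Map a score in [0, 1] to the first level whose cut-off it reaches."""
    for cutoff, level in THRESHOLDS:
        if score >= cutoff:
            return level
    return RiskLevel.LOW


def should_quarantine(score):
    """Quarantine only when the level reaches critical."""
    return classify(score) >= RiskLevel.CRITICAL
```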

Frequently asked questions

What features are most predictive of agent risk?

Empirically, the strongest predictors are: modifications to high-sensitivity file paths (security, payments, auth), deviation from the agent's normal token consumption rate, repeated failures on the same operation (suggesting a stuck loop), and scope creep (the agent modifying files unrelated to the task description). These four features alone cover most of the high-risk behavioral patterns seen in practice.
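The four predictors above could be extracted from raw run data along these lines. This is a sketch under stated assumptions: the path patterns, the function names, and the choice of a z-score for rate deviation are all hypothetical, not a description of any particular product.

```python
from collections import Counter
from fnmatch import fnmatch
from statistics import mean, pstdev

# Hypothetical patterns for sensitive paths.
SENSITIVE_PATTERNS = ["*/auth/*", "*/payments/*", "*/security/*"]


def sensitive_files(paths):
    """Files whose path matches any sensitive pattern."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in SENSITIVE_PATTERNS)]


def rate_zscore(current, history):
    """Standard deviations between the current rate and the historical mean."""
    sigma = pstdev(history)
    return 0.0 if sigma == 0 else (current - mean(history)) / sigma


def max_repeated_failures(failed_ops):
    """Highest failure count for any single operation (stuck-loop signal)."""
    return max(Counter(failed_ops).values(), default=0)


def scope_creep_ratio(paths, allowed_patterns):
    """Fraction of modified files that fall outside the task's allowed paths."""
    if not paths:
        return 0.0
    outside = [p for p in paths if not any(fnmatch(p, pat) for pat in allowed_patterns)]
    return len(outside) / len(paths)


modified = ["src/billing/invoice.py", "src/auth/session.py", "docs/readme.md"]
flagged = sensitive_files(modified)
creep = scope_creep_ratio(modified, ["src/billing/*"])
zscore = rate_zscore(130, [90, 110])
repeats = max_repeated_failures(["git push", "git push", "pytest", "git push"])
```

Each function returns a plain number or list, so the outputs can be fed directly into a scoring model as feature values.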

Can risk scoring produce false positives?

Yes, and this is a known challenge. An agent legitimately working on an auth refactor will have a high score due to file sensitivity, even if the work is correct and within scope. The score should trigger review, not automatic rejection. Calibrating thresholds requires observing the distribution of scores across known-good and known-bad agent runs.
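One simple way to calibrate a threshold from labeled runs is to pick the lowest threshold whose false positive rate stays under a chosen cap, which maximizes detection of known-bad runs within that budget. The scores and the cap below are made-up illustration data, and the function names are hypothetical.

```python
def rates_at(threshold, good_scores, bad_scores):
    """Return (false positive rate, true positive rate) at a threshold."""
    fpr = sum(s >= threshold for s in good_scores) / len(good_scores)
    tpr = sum(s >= threshold for s in bad_scores) / len(bad_scores)
    return fpr, tpr


def pick_threshold(good_scores, bad_scores, max_fpr):
    """Lowest observed score usable as a threshold with FPR <= max_fpr.

    FPR can only fall as the threshold rises, so the first candidate that
    meets the cap also flags the most known-bad runs. Returns None if no
    observed score meets the cap.
    """
    for t in sorted(set(good_scores) | set(bad_scores)):
        fpr, tpr = rates_at(t, good_scores, bad_scores)
        if fpr <= max_fpr:
            return t, fpr, tpr
    return None


# Made-up scores: the 0.8 among the good runs stands in for a legitimate
# auth refactor that scores high on file sensitivity.
good = [0.1, 0.2, 0.3, 0.4, 0.8]
bad = [0.6, 0.7, 0.9]
result = pick_threshold(good, bad, max_fpr=0.2)
```

With this data the chosen threshold still flags the legitimate auth refactor, which is why a score above the threshold should lead to review rather than automatic rejection.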
