Comparing agent efficiency across different tasks is difficult when raw token counts are the only metric. A task that requires many short tool calls (file reads, test executions) uses fewer tokens than a task requiring long reasoning traces, but the wall-clock time and operational complexity may be similar. ACUs provide a composite metric that weights these dimensions.
The term originates with Devin (Cognition AI), whose ACU is a normalized billing unit — roughly 15 minutes of active autonomous work — combining VM time, model inference, and network use. Beyond Devin, the specific definition of an ACU varies by provider and tooling. Some systems define it as tokens multiplied by a complexity factor; others use a cost-normalized unit tied to billing rates. The important property is consistency: the same task should produce approximately the same ACU count across runs, making variance in ACU consumption a signal worth investigating.
For fleet operators, ACUs are useful for capacity planning (how many tasks can run in parallel given a compute budget), cost allocation (which projects or teams are consuming the most compute), and efficiency benchmarking (is agent B completing similar tasks with fewer ACUs than agent A).