Can the agent decide what to refactor on its own?

Not in the standard configuration. The agent is assigned specific tasks. If you want automated identification of refactoring opportunities, you would write a separate analysis agent that creates tickets, which then get assigned to the developer agent.
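As a rough illustration of what such a separate analysis pass might do, the sketch below scans Python source for overly long functions and proposes one ticket title per finding. The function name find_long_functions, the max_lines threshold, and the ticket wording are assumptions made for this example and are not part of Fleet. Filing the tickets is left to whatever issue tracker you use.

```python
import ast


def find_long_functions(source: str, max_lines: int = 40) -> list[str]:
    """Return one proposed ticket title per function longer than max_lines.

    Hypothetical heuristic for illustration only; a real analysis agent
    would likely combine several signals (complexity, duplication, churn).
    """
    tickets = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                tickets.append(
                    f"Refactor {node.name}: {length} lines, consider extracting helpers"
                )
    return tickets
```

Each returned title names a single function, which keeps the resulting tickets narrow enough for the developer agent to handle one at a time.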

How do you prevent the agent from making too many changes at once?

The task description scopes the work. Write the ticket with a specific target — one function, one module, one pattern — and the agent stays within that scope. The reviewer prompt is also configured to flag PRs that exceed the stated scope.
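The scope flag described above is prompt-based. A mechanical check could complement it, and one possible shape is sketched here. It assumes the ticket names the directories the change is allowed to touch and that the list of changed files is available from the PR. The function name out_of_scope_files and both parameters are hypothetical, not part of Fleet.

```python
from pathlib import PurePosixPath


def out_of_scope_files(changed_files: list[str], allowed_prefixes: list[str]) -> list[str]:
    """Return the changed files that sit outside every allowed path.

    A file is in scope if it equals an allowed path or lives anywhere
    beneath an allowed directory. Matching is done on whole path
    components, so "db" does not accidentally match "dbx/".
    """
    allowed = [PurePosixPath(prefix) for prefix in allowed_prefixes]
    flagged = []
    for path in changed_files:
        p = PurePosixPath(path)
        in_scope = any(p == prefix or prefix in p.parents for prefix in allowed)
        if not in_scope:
            flagged.append(path)
    return flagged
```

A non-empty result would be a reason to send the PR back or to split the extra changes into a new ticket.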

AI Agents for Refactoring

Name: Fleet
Author: Fleet

Refactoring is the work that never makes it onto the sprint because it produces no user-visible output. It is easy to defer indefinitely. The cost of that deferral is slower feature development: engineers spend more time navigating confusing code, more time debugging because the structure obscures what is happening, and more time in PR review because reviewers cannot tell whether a change is correct.

When refactoring does happen, it often creates large diffs that are hard to review safely. A single refactoring PR might touch 30 files. The reviewer cannot hold all of it in their head at once, and the chance of approving something that breaks a non-obvious dependency is high.

How it works with an agent fleet

A refactoring agent works in small, targeted increments. It is assigned a specific refactoring task — extract a function, rename a module, replace a pattern across a package — and opens a focused PR for each unit of work.

# Assign a targeted refactoring task
fleet task assign backend-dev "Extract database retry logic into a reusable middleware (see ADR-12)"

The agent follows the /fleet-dev-task skill: creates a branch, makes the targeted change, ensures tests still pass, and opens a PR with a clear description of what changed and why. Small scope per PR makes review tractable.

The fleet pattern

Refactoring tasks are queued as GitHub issues with the ready label. The watcher daemon dispatches a developer agent for each. Each agent produces one focused PR. A tech lead reviews each PR for correctness and architectural fit. The result is a series of small, reviewable changes rather than one large, risky refactoring.
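The selection step of this pattern can be sketched as follows. This is not the watcher daemon's actual code: the Issue type, the dispatch_ready function, and the in-memory set of already dispatched issue numbers are assumptions made for illustration, and fetching issues from GitHub is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    number: int
    title: str
    labels: tuple[str, ...]


def dispatch_ready(issues: list[Issue], dispatched: set[int]) -> list[Issue]:
    """Pick issues labelled 'ready' that have not been dispatched yet.

    Each selected issue is recorded in `dispatched`, so a later polling
    pass does not start a second agent (and a second PR) for the same issue.
    """
    selected = []
    for issue in issues:
        if "ready" in issue.labels and issue.number not in dispatched:
            dispatched.add(issue.number)
            selected.append(issue)
    return selected
```

Tracking dispatched issue numbers is what preserves the one-issue, one-agent, one-PR relationship across repeated polls.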

Guardrails that matter here

The agent prompt restricts scope to the described task. The agent does not expand into adjacent refactoring without a new ticket.
All tests must pass before the agent opens a PR.
Tech-lead review is required before merge. Refactoring that breaks non-obvious contracts is caught at review, not in production.
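The last two guardrails amount to two gates, which might look roughly like the sketch below. It assumes the project's tests run through a single command whose exit status signals success. The default command (pytest via the current interpreter), the function names, and the "tech-lead" reviewer identifier are illustrative assumptions, not Fleet's implementation.

```python
import subprocess
import sys

# Assumed default; substitute whatever command runs your test suite.
DEFAULT_TEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def tests_pass(command: list[str] = DEFAULT_TEST_COMMAND) -> bool:
    """Run the test command and report whether it exited with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


def may_merge(tests_ok: bool, approved_by: set[str], required_reviewer: str = "tech-lead") -> bool:
    """Allow a merge only if tests passed and the required reviewer approved."""
    return tests_ok and required_reviewer in approved_by
```

In this sketch the agent would open a PR only when tests_pass() returns True, and the merge stays blocked until may_merge() also sees the reviewer's approval.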

Who this is for

Engineering teams carrying a refactoring backlog that never gets addressed during normal sprints. It works best when the refactoring tasks are well specified: the agent needs to know what to change, not figure out what needs changing.