
How to Set Up AI QA Testing Agents

A QA agent is most useful when it runs automatically on every pull request rather than waiting for someone to remember to ask for a test pass. The goal is an agent that checks out the branch, exercises the change, and publishes a clear pass-or-fail decision the rest of the workflow can act on — without a human kicking it off each time.

Fleet wires this up through its reactive event chain. A QA agent subscribes to the pr_needs_review event, runs the built-in /fleet-review-pr skill, and publishes a pr_approved or pr_changes_requested fabric event. The one detail that trips people up: the role string matters. qa-engineer and qa-lead inject the review skill directive into the agent's handbook prompt; a generic role like qa injects nothing. This guide shows how to stand up a working QA agent and explains why fabric — not GitHub's review state — is the source of truth for the decision.

Before you start

  • Fleet installed and initialized in your repository (`fleet init`)
  • Fleet skills installed: `fleet skills install`
  • The Fleet watcher running in your repository (`fleet watcher start`)
  • GitHub CLI (`gh`) authenticated with read access to pull requests
  • At least one developer agent already opening PRs and adding the `needs-review` label
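Before starting, a quick preflight can confirm that the tools listed above are on your PATH. This is a minimal sketch, not a Fleet command: `require_cmd` is a hypothetical helper name. It checks only that `fleet` and `gh` exist and leaves watcher status to Fleet itself.

```shell
# Hypothetical preflight helper (not a Fleet command): report whether a
# required command is on PATH, and fail if it is missing.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Usage: check both CLIs, then confirm gh is authenticated.
#   require_cmd fleet && require_cmd gh && gh auth status
```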

1. Use a QA role that actually triggers the skill

Fleet injects the /fleet-review-pr skill directive into an agent's handbook prompt based on its role. Only specific role strings hit that branch. For QA, use qa-engineer or qa-lead. A generic qa falls through to the default case and injects no skill at all — the agent will start but will not know it is supposed to review the PR. Pick the role string deliberately.

```text
# Triggers the /fleet-review-pr skill directive:
#   qa-engineer
#   qa-lead
#
# Injects NOTHING (default case); do not use for a QA agent:
#   qa
#   tester
#   developer
```
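To make the fall-through behavior concrete, here is a toy model of a role-to-directive lookup. It is an illustration under assumptions, not Fleet's source: the function name `skill_directive_for_role` is hypothetical, and only the QA branch is modeled, so any other roles Fleet handles are left out.

```shell
# Toy model of a role-to-directive lookup (hypothetical, NOT Fleet's source).
# Only the QA branch is modeled; every other role falls to the default case.
skill_directive_for_role() {
  case "$1" in
    qa-engineer|qa-lead)
      # Matching QA roles get the review skill directive.
      echo "/fleet-review-pr"
      ;;
    *)
      # Default case: no directive is emitted, and no error is raised.
      ;;
  esac
}

# Usage:
#   skill_directive_for_role qa-lead   # prints /fleet-review-pr
#   skill_directive_for_role qa        # prints nothing
```

The silent default branch is the point: nothing fails, so a misnamed role only shows up as an agent that never reviews anything.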

2. Define the qa-lead agent in config

Add a QA agent to the agents section of .fleet/config.yaml. Give it the qa-lead role and subscribe it to pr_needs_review so it starts automatically when a PR enters review. QA work is largely procedural (running tests and checking output), so Sonnet is usually the right model. You do not need to mention the skill in any prompt file; the role drives the directive.

```yaml
agents:
  - name: qa-lead
    role: qa-lead
    department: engineering
    reports_to: tech-lead
    model: claude-sonnet-4-5
    subscribes_to: pr_needs_review
```
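Because a wrong role fails silently, checking the config before starting the watcher can catch it early. The sketch below is hypothetical (`check_qa_roles` is not part of Fleet) and uses a naive line-based scan of `role:` entries rather than a real YAML parser, so unusual formatting may slip past it.

```shell
# Hypothetical pre-flight check (not part of Fleet). Naive line-based scan of
# "role:" entries; it is not a YAML parser.
check_qa_roles() {
  local config="${1:-.fleet/config.yaml}" status=0 role
  if [ ! -f "$config" ]; then
    echo "no config file: $config" >&2
    return 2
  fi
  # Extract the value after "role:" (allowing a leading "- "), strip quotes.
  for role in $(sed -n 's/^[[:space:]-]*role:[[:space:]]*//p' "$config" | tr -d "\"'"); do
    case "$role" in
      qa-engineer|qa-lead) echo "ok: $role" ;;
      qa|tester) echo "warn: role '$role' injects no review skill"; status=1 ;;
    esac
  done
  return "$status"
}

# Usage:
#   check_qa_roles .fleet/config.yaml || echo "fix the QA role before starting"
```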

3. Install the Fleet skills

The /fleet-review-pr skill is vendored inside the Fleet binary and must be synced to ~/.claude/skills/fleet/ before any agent can run it. Run fleet skills install once after installing Fleet and again after every Fleet upgrade, since skill fixes ship with newer binaries. Verify the skill is present with fleet skills list.

```shell
fleet skills install
fleet skills list
```
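If you want to script this check, something like the following works under stated assumptions. `skills_dir_ready` relies only on the sync location named above; `review_skill_listed` assumes the skill's name appears verbatim in `fleet skills list` output, which this guide does not document. Both helper names are hypothetical.

```shell
# Hypothetical helpers (not part of Fleet).
# skills_dir_ready: true if the sync directory exists and is non-empty.
skills_dir_ready() {
  local dir="${1:-$HOME/.claude/skills/fleet}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# review_skill_listed: reads `fleet skills list` output on stdin. ASSUMES the
# skill name appears verbatim in that output.
review_skill_listed() {
  grep -q 'fleet-review-pr'
}

# Usage:
#   skills_dir_ready || fleet skills install
#   fleet skills list | review_skill_listed || echo "fleet-review-pr not listed"
```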

4. Trigger a review with the needs-review label

When a developer opens a PR and adds the needs-review label, the watcher's label loop detects it and publishes a pr_needs_review fabric event. The subscription processor matches that event to your qa-lead agent and starts it. The agent runs /fleet-review-pr: it finds the PR, checks out the branch, reviews the change, and publishes its decision. To test the chain, add the label to a PR manually.

```shell
# Add the label to a PR to start the QA agent
gh pr edit 57 --add-label needs-review

# Watch the QA agent's decisions
fleet log --type decision --agent qa-lead --since 1h
```
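The label-to-agent routing can be pictured as a simple match between an event kind and each agent's subscription. This toy model is a sketch under assumptions, not Fleet's subscription processor: `match_subscribers` is a hypothetical name, and subscriptions are encoded as `agent:event` strings purely for illustration.

```shell
# Toy model of subscription matching (hypothetical, not Fleet's processor).
# Each subscription is encoded as "agent:event" purely for illustration.
match_subscribers() {
  local event="$1" sub agent kind
  shift
  for sub in "$@"; do
    agent="${sub%%:*}"   # text before the first colon
    kind="${sub#*:}"     # text after the first colon
    if [ "$kind" = "$event" ]; then
      echo "$agent"      # this agent would be started
    fi
  done
}

# Usage (the second subscription is illustrative only):
#   match_subscribers pr_needs_review "qa-lead:pr_needs_review" "release-manager:pr_approved"
#   # prints: qa-lead
```

If the agent never starts, `gh pr view 57 --json labels --jq '.labels[].name'` shows whether the label actually landed on the PR.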

5. Understand why fabric is the source of truth

Fleet agents all share one GitHub identity, and GitHub does not let a pull request's author approve it, so gh pr review --approve fails on any PR that identity authored. The /fleet-review-pr skill is self-review-safe: it publishes the pr_approved (or pr_changes_requested) fabric event regardless of whether the formal GitHub approval call succeeds. Downstream, the release-manager's merge gate reads the fabric event, not GitHub's reviewDecision. Fabric is the source of truth; GitHub's review state is just one signal.

```shell
# The QA skill always publishes the fabric decision, e.g.:
# fleet fabric publish --kind pr_approved --sender qa-lead \
#   --summary "QA passed PR #57" --payload '{"pr": 57}'
#
# The release-manager merge gate reads this event, not gh's reviewDecision.
```
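The gating rule can be sketched as a function that decides from the fabric event alone and treats GitHub's `reviewDecision` as informational. This is a hypothetical illustration, not the release-manager's implementation; `merge_gate` and its two-argument interface are assumptions.

```shell
# Hypothetical gate logic (not the release-manager's implementation).
# $1: latest fabric decision kind; $2: GitHub reviewDecision (informational).
merge_gate() {
  echo "github reviewDecision (informational only): ${2:-none}"
  case "$1" in
    pr_approved)
      echo "merge: allowed (fabric pr_approved)"
      return 0
      ;;
    *)
      echo "merge: blocked (no fabric approval)"
      return 1
      ;;
  esac
}

# Usage: GitHub's value is shown but never changes the verdict.
#   merge_gate pr_approved "$(gh pr view 57 --json reviewDecision --jq .reviewDecision)"
```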

6. Confirm the QA decision landed

Check the decision log to confirm the QA agent finished and published a result. Look for pr_approved or pr_changes_requested. If neither appears, the agent exited before completing the skill — attach to its tmux session to see the error, and verify skills are installed.

```shell
fleet log --type decision --since 1h

# If nothing published, confirm skills are current:
fleet skills install --dry-run
```
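To turn the log check into a pass/fail step in a script, a small filter over the log output is enough. This sketch assumes each decision's event kind appears verbatim on its line of `fleet log` output, which this guide does not specify; `last_qa_decision` is a hypothetical helper.

```shell
# Hypothetical helper: print the last QA decision kind found on stdin.
# ASSUMES the event kind appears verbatim in `fleet log` output.
last_qa_decision() {
  local decision
  decision=$(grep -oE 'pr_approved|pr_changes_requested' | tail -n 1)
  if [ -n "$decision" ]; then
    echo "$decision"
  else
    echo "no QA decision found; check the agent's tmux session" >&2
    return 1
  fi
}

# Usage:
#   fleet log --type decision --agent qa-lead --since 1h | last_qa_decision
```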

Common pitfalls

  • Using a generic `qa` role is the most common mistake. It is not an error — the agent starts cleanly — but no skill directive is injected, so the agent has no instructions to review the PR. Use `qa-engineer` or `qa-lead`.
  • If skills are not installed or are stale, the agent cannot run `/fleet-review-pr`. When a QA agent starts and then does nothing useful, run `fleet skills install --dry-run` first to confirm the skill files are present and current.
  • A QA agent is a first-pass structural and functional check, not a substitute for human judgment on security-sensitive or architecture-changing PRs. Treat its `pr_approved` as one signal, especially on high-stakes changes.
  • Do not expect `gh pr review --approve` to carry the decision for fleet-authored PRs — it will fail on self-authored PRs. Always read the fabric event in `fleet log` to confirm the real decision.
  • If the QA agent publishes `pr_changes_requested`, the developer needs to be re-dispatched for another round. Confirm the chain actually re-fires the developer in `fleet log` rather than assuming the feedback was picked up.
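The self-approval pitfall can be predicted before calling `gh pr review --approve` by comparing the PR author with the authenticated `gh` user. The `gh` calls in the usage comments are standard GitHub CLI; `approve_will_fail` is a hypothetical helper, and a plain login comparison is only a sketch of GitHub's rule.

```shell
# Hypothetical helper: GitHub rejects approvals from a PR's own author, so
# approval will fail when the two logins match.
# $1: PR author login; $2: login of the authenticated gh user.
approve_will_fail() {
  [ "$1" = "$2" ]
}

# Usage (standard gh CLI calls):
#   author=$(gh pr view 57 --json author --jq .author.login)
#   me=$(gh api user --jq .login)
#   if approve_will_fail "$author" "$me"; then
#     echo "self-authored: confirm the decision with fleet log instead"
#   fi
```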

When Fleet is the right tool

A Fleet QA agent is a good fit when developer agents are opening PRs faster than a person can test them and you want a consistent automated check on every PR before it reaches the merge gate. It is less useful when your QA is mostly exploratory or requires deep product knowledge an agent does not have; in that case, use the QA agent for the mechanical test-run pass and keep a human in the loop for the judgment calls. Start with one qa-lead agent and confirm the decision flows through fabric before adding more.

Frequently asked questions

Why does my QA agent start but never review anything?

Almost always one of two causes: the role is a generic `qa` (which injects no skill directive — use `qa-engineer` or `qa-lead`), or skills are not installed. Run `fleet skills install --dry-run` to check, then verify the role string in `.fleet/config.yaml`.

Can a QA agent approve a PR another fleet agent authored?

Yes, through fabric. GitHub blocks `gh pr review --approve` on self-authored PRs because fleet agents share one identity, but the `/fleet-review-pr` skill publishes the `pr_approved` fabric event regardless. The release-manager merge gate reads that event, so the approval still counts.

What event should the QA agent subscribe to?

`pr_needs_review`. The watcher publishes this when the `needs-review` label is added to a PR. Subscribing the qa-lead agent to it makes the agent auto-start on every PR that enters review.

Do I need a separate model for QA agents?

Not required, but QA work is mostly procedural, so Sonnet usually offers the right balance of cost and capability. Reserve Opus for agents making architectural or complex-reasoning decisions.
