Can it verify steps that touch live systems?

It verifies against the corpus — configs, service docs, alert definitions committed to the repo. Steps whose truth lives only in a live console are flagged as unverifiable, which is itself useful: those are the steps most likely to be stale.

How is this different from a docs linter?

A linter checks links and formatting. This reads the runbook's meaning against the current deploy configs and service docs — 'the failover command references a service that no longer exists' is a semantic catch, then a drafted fix, then a human-approved correction.

AI Workflows for Runbook Maintenance

Name: Fleet
Author: Fleet

A stale runbook is worse than no runbook: at 3am, step four references a dashboard that was renamed two quarters ago, and the on-call engineer burns twenty minutes discovering the docs are lying before falling back to tribal knowledge. Runbooks decay because verifying them is nobody's job — they're only read during incidents, which is exactly when nobody can fix them.

The sources of truth (deploy configs, service docs, alert definitions) keep moving; the runbooks that reference them don't.

How it works with an agent fleet

A scheduled Fleet workflow audits each runbook against the current state of its referenced sources, drafts corrections, and routes them to the on-call lead for approval — so decay gets caught on a Tuesday afternoon instead of during an outage.

Trigger: schedule — Tuesdays

Step	Kind	What it does
`audit`	Document	Runs once per file in `docs/runbooks/.md`. “Audit this runbook against the corpus: flag steps referencing renamed/removed services, dashboards, or commands, and draft the corrected version.” Reads `docs/runbooks/.md`, `deploy/*`, `docs/services/.md`.
`verify`	Review	“Check each correction against the sources. Flag any fix that itself can't be verified from the corpus.”
`oncall-lead-ok`	Approval gate	Pauses the run for a human sign-off; records the approver, timestamp, and note.

You build this once on the dashboard canvas; your self-hosted Fleet worker pulls the definition and runs it against your repository.

The fan-out audits every runbook each week; fingerprinting means runbooks whose sources didn't change skip instantly, so the steady-state run is small. The lead approves corrections with the verification flags in view.

The fleet pattern

Schedule → fan-out audit of runbooks against live config/docs → verification review → on-call lead approval. The runbooks' freshness becomes a property of the system rather than a hope.

Guardrails that matter here

Corrections are verified by a second pass before any human sees them — a correction the corpus can't support gets flagged, not approved by momentum
The lead's approval gates every change to documents people follow under pressure
Weekly cadence + incremental rebuild keeps cost proportional to what actually changed

Who this is for

SRE and platform teams whose incident response depends on runbooks, and who have been burned by following one into a renamed world.