Use case

AI Workflows for Incident Postmortems

The postmortem is due while the incident is still emotionally warm. The engineer who led the response — the only person with the full picture — is also the most exhausted, so the writeup slips a week, the details blur, and the document that ships is a timeline with the insight sanded off. Or worse: it assigns blame in language that teaches everyone to share less next time.

The raw material was all there at resolution time: the channel log, the timeline notes, the fix PRs. What's scarce is the synthesis energy, exactly when it's needed.

How it works with an agent fleet

A Fleet workflow drafts the postmortem from the incident's artifacts while they're fresh, a review step checks it for blamelessness and factual support, and the incident lead approves the final document.

genflows:
  - name: postmortem
    steps:
      - {name: draft, prompt: "Draft the blameless postmortem from the incident corpus: timeline, contributing factors, what worked, action items. Systems language, not people language.", corpus: ["incidents/2026-06-meltdown/*.md"], kind: report, out: postmortem.md}
      - {name: review, prompt: "Flag blame-coded language, claims unsupported by the timeline, and action items with no owner.", depends_on: [draft], kind: review, out: flags.md}
      - {name: lead-ok, depends_on: [draft, review], kind: approval, out: decision.md}
      - {name: publish, depends_on: [lead-ok, draft], kind: publish, out: published.md}

The lead's job shrinks from 'write the document' to 'correct the draft and approve' — an hour of judgment instead of a deferred afternoon of synthesis. The review step's blamelessness check is a real filter: 'engineer X failed to' becomes a flag before it becomes culture damage.

The fleet pattern

Incident corpus → blameless draft → language + factuality review → incident-lead approval → publish. The postmortem ships in days while details are sharp, not weeks later when they've blurred.
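The ordering in this pattern follows from the `depends_on` fields in the workflow above. As a minimal sketch (not Fleet's scheduler, whose internals are not described here), the same step names and dependencies can be resolved into a run order with Python's standard `graphlib`. The `DEPENDS_ON` mapping and the `execution_order` helper are illustrative names introduced for this example.

```python
from graphlib import TopologicalSorter

# Step name -> steps it depends on, copied from the workflow's depends_on fields.
# This mapping is an illustration; it is not read from a Fleet config file.
DEPENDS_ON = {
    "draft": [],
    "review": ["draft"],
    "lead-ok": ["draft", "review"],
    "publish": ["lead-ok", "draft"],
}


def execution_order(depends_on):
    """Return a run order in which every step comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # prints "draft -> review -> lead-ok -> publish"
    print(" -> ".join(execution_order(DEPENDS_ON)))
```

Because `publish` depends on `lead-ok`, there is no valid order in which the document is published before the incident lead has approved it.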

Guardrails that matter here

  • The review step flags unsupported claims against the timeline — the postmortem says what the artifacts support
  • Blame-language filtering is an explicit review concern, protecting the reporting culture postmortems depend on
  • Action items without owners get flagged at the gate, so one of the most common postmortem failures is caught structurally
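In the workflow, these checks are carried out by the review step's prompt. To make two of them concrete, here is a rough deterministic sketch of what "blame-coded language" and "action item with no owner" could look like as line-level checks. The phrase list, the `- [ ] ... (owner: @name)` action-item convention, and the `flag_postmortem` function are all assumptions made for this example, not part of Fleet.

```python
import re

# Hypothetical blame-coded phrasings; a real list would be tuned by the team.
BLAME_PATTERNS = [
    re.compile(r"\bfailed to\b", re.IGNORECASE),
    re.compile(r"\bshould have\b", re.IGNORECASE),
    re.compile(r"\bcareless(ly)?\b", re.IGNORECASE),
]

# Assumed action-item convention: "- [ ] description (owner: @name)".
ACTION_ITEM = re.compile(r"^\s*- \[[ xX]\] (?P<text>.+)$")
OWNER = re.compile(r"\(owner:\s*@[\w-]+\)")


def flag_postmortem(markdown):
    """Return (line_number, reason) pairs for blame language and ownerless action items."""
    flags = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        if any(pattern.search(line) for pattern in BLAME_PATTERNS):
            flags.append((number, "blame-coded language"))
        item = ACTION_ITEM.match(line)
        if item and not OWNER.search(item.group("text")):
            flags.append((number, "action item with no owner"))
    return flags
```

A pattern check like this only catches surface phrasing. Whether a claim is supported by the timeline still needs the review step and the incident lead.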

Who this is for

SRE and platform teams with a postmortem practice that's healthy in principle and chronically late in practice.

Frequently asked questions

Doesn't an AI-drafted postmortem lose the human insight?

The draft is the floor, not the ceiling: it assembles the timeline and the mechanically derivable factors so the lead's energy goes into the genuinely human part — the judgment about why, and what to change. The approval gate means nothing ships without that judgment applied.

What goes in the incident corpus?

Whatever the response produced: exported channel logs, timeline notes, fix PR descriptions, and captions from monitoring screenshots. Teams standardize a per-incident directory; the workflow points at it.
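As an illustration of the per-incident directory idea, the sketch below gathers every Markdown file in one directory, mirroring a glob such as `incidents/2026-06-meltdown/*.md` from the workflow. It makes no claim about how Fleet itself resolves the `corpus` field; `load_corpus` is a hypothetical helper written for this example.

```python
from pathlib import Path


def load_corpus(incident_dir, pattern="*.md"):
    """Read every file matching the pattern in the incident directory, sorted by name."""
    corpus = {}
    for path in sorted(Path(incident_dir).glob(pattern)):
        if path.is_file():
            # Key by file name so each artifact stays traceable to its source.
            corpus[path.name] = path.read_text(encoding="utf-8")
    return corpus
```

Keeping one flat directory per incident means the same pattern works for every incident, and anything that is not exported as Markdown (a raw screenshot, for example) is simply left out of the draft's inputs.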
