Incident Postmortem Template Builder

Generate a blameless engineering postmortem with timeline, root cause analysis, and follow-up actions after a production incident.

Prompt Template

You are a senior SRE facilitating a blameless postmortem. Draft a complete incident review based on the details below.

**Incident title:** [short name]
**Date and duration:** [when it happened and how long it lasted]
**Customer impact:** [who was affected and how]
**Severity level:** [SEV1 / SEV2 / SEV3]
**Systems involved:** [services, databases, providers]
**Detection method:** [alert, user report, dashboard, support ticket]
**Timeline notes:** [key events with times]
**Known root cause:** [what failed]
**Contributing factors:** [process, tooling, staffing, architecture]
**Temporary mitigation:** [what stopped the immediate issue]
**Permanent fix ideas:** [candidate fixes]

Output:
1. Executive summary
2. Customer impact statement
3. Minute-by-minute incident timeline
4. Five Whys root cause analysis
5. What went well / what went poorly / where we got lucky
6. Corrective actions table with owner, priority, and due date
7. Prevention recommendations for monitoring, testing, rollout safety, and communication

Tone: factual, blameless, and useful for engineering leadership.

Example Output

Executive Summary

On 14 May, a configuration deploy to the billing worker caused duplicate invoice retries for 38 minutes. 11.2% of payment attempts failed and 64 customers saw delayed receipts. The issue was detected by support before alerts fired.

Five Whys

1. Why did invoices retry indefinitely? The worker ignored the retry cap after a config parse failure.

2. Why did the parse fail? A new timeout field shipped as a string instead of an integer.

3. Why was that not caught? Config validation only ran in app boot, not in background workers.

4. Why was rollout not halted? The deploy pipeline lacked canary checks for worker-only services.

5. Why did support detect it first? No alert existed for retry queue growth by tenant.

Corrective Actions

|---|---|---|---|

| Add schema validation for worker config | Priya | P0 | 2026-05-20 |

| Create retry queue anomaly alert | Anton | P1 | 2026-05-18 |

Tips for Best Results

💡Keep the timeline precise, it often reveals detection and escalation gaps better than summaries do
💡Separate root cause from contributing factors so remediation stays targeted
💡Assign owners and due dates immediately or the postmortem will turn into shelfware
💡Blameless does not mean vague, be explicit about system and process failures

Try it with

ChatGPT Claude Gemini

Related Prompts

Coding

Kubernetes Deployment Troubleshooting Runbook Builder

Generate a practical Kubernetes runbook for diagnosing failed rollouts, crash loops, image pulls, probes, config errors, and safe rollback paths.

ChatGPTClaudeGemini

Coding

Code Review Assistant

Get a thorough, senior-level code review with actionable feedback on quality, security, performance, and best practices.

ChatGPTClaudeGemini

Coding

Debugging Detective

Systematically debug errors and unexpected behavior with root cause analysis and fix suggestions.

ChatGPTClaudeGemini