Flaky Test Triage and Stabilization Runbook Builder
Create an engineering runbook to identify, classify, quarantine, debug, and permanently stabilize flaky automated tests in CI.
Prompt Template
You are a senior quality engineer. Build a flaky test triage and stabilization runbook for [codebase/team].

Context:
- Tech stack and test tools: [language, framework, Playwright/Cypress/Jest/Pytest/etc.]
- CI provider: [GitHub Actions, GitLab CI, CircleCI, Buildkite, Jenkins]
- Flaky test examples: [test names, failure messages, logs, screenshots]
- Failure pattern: [random, time-based, environment-specific, parallel-only, data-dependent]
- Frequency and impact: [failure rate, blocked deploys, rerun count]
- Recent changes: [dependency upgrades, infrastructure changes, test additions]
- Test environment: [database, network mocks, browser, timezone, containers, seed data]
- Ownership model: [team owners, on-call, QA, platform]
- Constraints: [cannot disable coverage, release deadline, limited CI minutes]

Deliver:
1. Flake severity classification and ownership rules.
2. Triage checklist for reproducing locally and in CI.
3. Common root-cause map with evidence to collect.
4. Quarantine policy that avoids hiding product risk.
5. Stabilization plan with code/test design changes.
6. CI configuration improvements for retries, artifacts, clocks, seeds, and isolation.
7. Metrics dashboard: flake rate, reruns, mean time to fix, top offenders.
8. Example issue template and pull request checklist for fixes.

Be specific to the provided framework and logs. Prefer deterministic fixes over adding blind retries.
Example Output
Flaky Test Runbook: Playwright Checkout Suite
Severity
| Severity | Definition | Action |
|---|---|---|
| P0 | Blocks release or hides payment/auth risk | Owner fixes before release |
| P1 | Fails in more than 5% of CI runs on the main branch | Quarantine only with linked fix issue |
| P2 | Rare failure under known external dependency | Track and monitor |
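As a rough illustration, the table above could be encoded as a small classifier so that a triage bot or issue template applies the same rules every time. The `FlakeReport` fields and the `classifyFlake` name are hypothetical. Falling through to P2 for everything else is a simplifying assumption, since the table reserves P2 for rare failures tied to a known external dependency.

```typescript
type Severity = "P0" | "P1" | "P2";

// Hypothetical shape of the evidence gathered during triage.
interface FlakeReport {
  blocksRelease: boolean;
  touchesPaymentOrAuth: boolean;
  mainBranchFailureRate: number; // fraction of main-branch CI runs that failed, 0..1
}

// Maps a report to a severity using the thresholds from the table.
function classifyFlake(report: FlakeReport): Severity {
  if (report.blocksRelease || report.touchesPaymentOrAuth) {
    return "P0"; // owner fixes before release
  }
  if (report.mainBranchFailureRate > 0.05) {
    return "P1"; // quarantine only with a linked fix issue
  }
  return "P2"; // assumption: everything else is tracked and monitored
}
```

Checking P0 first means a rarely failing payment test still outranks a noisy but low-risk one.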
Investigation Path
1. Re-run with the same seed, browser, timezone, and viewport as CI.
2. Collect trace, video, console logs, network HAR, and DB state.
3. Check for shared test data, order dependence, clock assumptions, animation waits, and external API calls.
4. Replace fixed sleeps with assertion-based waits.
5. Isolate test accounts per worker and reset state after each spec.
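Step 4 is the change that most often removes timing flakes, so a minimal sketch of an assertion-based wait may help. Playwright's own web-first assertions, such as `await expect(locator).toBeVisible()`, already retry until a timeout. The hypothetical `waitFor` helper below only illustrates the same idea for conditions outside the page, for example polling a database or API for the expected state. The default timeout and interval are assumptions.

```typescript
// Re-runs `assertion` until it stops throwing or the timeout elapses,
// then rethrows the last failure so the test report shows the real cause.
async function waitFor(
  assertion: () => void | Promise<void>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown;
  do {
    try {
      await assertion();
      return; // condition met, stop waiting immediately
    } catch (err) {
      lastError = err;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  } while (Date.now() < deadline);
  throw lastError;
}
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails with the underlying assertion message when it never does.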
Likely Fix
For `checkout applies discount`, replace the shared coupon code with a per-test generated code, freeze time at test start, and assert on the server-confirmed order summary instead of a transient toast.
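The data side of this fix might look like the sketch below: each test derives its own coupon code from the run, worker, and test title, and the expiry check takes the current time as a parameter so the test can pin it. All names here (`makeTestCoupon`, `applyCoupon`, the `CPN-` prefix, the one-hour validity) are hypothetical. In a Playwright suite the browser clock could be pinned separately through the `page.clock` API.

```typescript
interface Coupon {
  code: string;
  percentOff: number;
  expiresAt: Date;
}

// Builds a coupon that no other worker or test shares.
// The code is deterministic for a given run, worker, and title, which helps reproduction.
function makeTestCoupon(runId: string, workerIndex: number, testTitle: string, now: Date): Coupon {
  const slug = testTitle.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  return {
    code: `CPN-${runId}-W${workerIndex}-${slug}`.toUpperCase(),
    percentOff: 10,
    expiresAt: new Date(now.getTime() + 60 * 60 * 1000), // assumed: valid for one hour
  };
}

// Applies the discount using an injected `now` instead of reading the wall clock.
function applyCoupon(totalCents: number, coupon: Coupon, now: Date): number {
  if (now.getTime() >= coupon.expiresAt.getTime()) {
    return totalCents; // expired coupons leave the total unchanged
  }
  return Math.round((totalCents * (100 - coupon.percentOff)) / 100);
}
```

Passing the same frozen `Date` to both functions removes the midnight and timezone edge cases that a live clock would introduce.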
CI Metric
Track flake rate per test file each week. Any test whose flake rate stays above 2% for two consecutive weeks gets an owner and a due date.
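A minimal sketch of this rule, assuming CI results have already been aggregated into one row per test file per week with no missing weeks. The `WeeklyFileStats` shape is hypothetical, and so is the definition of a flaky run used here (a run that failed and then passed on retry with no code change).

```typescript
// Hypothetical weekly aggregate exported from CI.
interface WeeklyFileStats {
  file: string;
  week: string; // ISO week label such as "2024-W03", which sorts lexicographically
  runs: number;
  flakyRuns: number; // assumed: failed, then passed on retry without a code change
}

function flakeRate(stats: WeeklyFileStats): number {
  return stats.runs === 0 ? 0 : stats.flakyRuns / stats.runs;
}

// Returns files whose flake rate exceeded the threshold in two adjacent weeks.
function filesNeedingOwner(stats: WeeklyFileStats[], threshold = 0.02): string[] {
  const byFile: Record<string, WeeklyFileStats[]> = {};
  for (const row of stats) {
    (byFile[row.file] = byFile[row.file] || []).push(row);
  }
  const flagged: string[] = [];
  for (const file of Object.keys(byFile)) {
    const weeks = byFile[file]
      .slice()
      .sort((a, b) => (a.week < b.week ? -1 : a.week > b.week ? 1 : 0));
    for (let i = 1; i < weeks.length; i++) {
      if (flakeRate(weeks[i - 1]) > threshold && flakeRate(weeks[i]) > threshold) {
        flagged.push(file);
        break; // one qualifying pair of weeks is enough
      }
    }
  }
  return flagged.sort();
}
```

Requiring two adjacent weeks keeps a single bad week, such as a CI outage, from triggering an assignment.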
Tips for Best Results
- Paste real logs and failure screenshots; flaky tests are solved with evidence, not vibes.
- Ask for quarantine rules so the team does not quietly sweep broken smoke alarms under the rug.
- Separate product bugs from test instability before changing assertions.
- Track flake rate over time: one random red build is annoying, a pattern is engineering debt with a siren.