Flaky Test Triage and Stabilization Runbook Builder

Create an engineering runbook to identify, classify, quarantine, debug, and permanently stabilize flaky automated tests in CI.

Prompt Template

You are a senior quality engineer. Build a flaky test triage and stabilization runbook for [codebase/team].

Context:
- Tech stack and test tools: [language, framework, Playwright/Cypress/Jest/Pytest/etc.]
- CI provider: [GitHub Actions, GitLab CI, CircleCI, Buildkite, Jenkins]
- Flaky test examples: [test names, failure messages, logs, screenshots]
- Failure pattern: [random, time-based, environment-specific, parallel-only, data-dependent]
- Frequency and impact: [failure rate, blocked deploys, rerun count]
- Recent changes: [dependency upgrades, infrastructure changes, test additions]
- Test environment: [database, network mocks, browser, timezone, containers, seed data]
- Ownership model: [team owners, on-call, QA, platform]
- Constraints: [cannot disable coverage, release deadline, limited CI minutes]

Deliver:
1. Flake severity classification and ownership rules.
2. Triage checklist for reproducing locally and in CI.
3. Common root-cause map with evidence to collect.
4. Quarantine policy that avoids hiding product risk.
5. Stabilization plan with code/test design changes.
6. CI configuration improvements for retries, artifacts, clocks, seeds, and isolation.
7. Metrics dashboard: flake rate, reruns, mean time to fix, top offenders.
8. Example issue template and pull request checklist for fixes.

Be specific to the provided framework and logs. Prefer deterministic fixes over adding blind retries.

Example Output

Flaky Test Runbook: Playwright Checkout Suite

Severity

| Severity | Definition | Action |
|---|---|---|
| P0 | Blocks release or hides payment/auth risk | Owner fixes before release |
| P1 | Fails >5% of CI runs in main branch | Quarantine only with linked fix issue |
| P2 | Rare failure under known external dependency | Track and monitor |
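For tooling that labels incoming flake reports, the severity rules can be written as a small decision function. This is a minimal TypeScript sketch, not part of any framework: `FlakeReport`, its field names, and `classifyFlake` are hypothetical, and treating every report that matches neither P0 nor P1 as P2 is an assumption the severity definitions do not state.

```typescript
// Hypothetical shape of a flake report; the field names are illustrative only.
type Severity = "P0" | "P1" | "P2";

interface FlakeReport {
  blocksRelease: boolean;       // the failure is currently blocking a release
  coversPaymentOrAuth: boolean; // the test guards payment or auth behavior
  mainFailureRate: number;      // fraction of main-branch CI runs that failed, 0..1
}

function classifyFlake(report: FlakeReport): Severity {
  // P0: blocks release or could hide payment/auth risk.
  if (report.blocksRelease || report.coversPaymentOrAuth) return "P0";
  // P1: fails in more than 5% of CI runs on main.
  if (report.mainFailureRate > 0.05) return "P1";
  // Assumption: everything else is tracked and monitored as P2.
  return "P2";
}
```

Checking the P0 conditions first means a rare flake in a payment test is never downgraded because its failure rate is low.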

Investigation Path

1. Re-run with the same seed, browser, timezone, and viewport as CI.

2. Collect trace, video, console logs, network HAR, and DB state.

3. Check for shared test data, order dependence, clock assumptions, animation waits, and external API calls.

4. Replace fixed sleeps with assertion-based waits.

5. Isolate test accounts per worker and reset state after each spec.
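Step 4 can be illustrated without any framework. The sketch below is a hypothetical `waitUntil` helper in plain TypeScript that polls a condition and returns as soon as it holds, rather than sleeping for a fixed time. The helper name, option names, and default values are assumptions. In a Playwright suite, the built-in auto-retrying assertions would normally play this role.

```typescript
// Hypothetical options for the polling helper; defaults are illustrative.
type WaitOptions = { timeoutMs?: number; intervalMs?: number };

// Polls `check` until it returns true, or rejects once the timeout has elapsed.
async function waitUntil(
  check: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 }: WaitOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Condition met: return immediately instead of waiting out a fixed sleep.
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Pause briefly before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A fixed sleep is either too short on a slow CI runner or wastes time on a fast one. Polling against a deadline finishes as early as the condition allows and fails with a clear message when it never holds.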

Likely Fix

For `checkout applies discount`, replace the shared coupon code with a per-test generated code, freeze time at test start, and assert on the server-confirmed order summary instead of a transient toast.
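The first two parts of this fix can be sketched in plain TypeScript. The names `makeTestCoupon` and `isCouponActive`, the code format, and the one-hour validity window are all hypothetical. In a real suite the worker index would come from the test runner, and time would be frozen through the framework's own clock controls rather than passed in by hand.

```typescript
interface Coupon {
  code: string;
  expiresAt: Date;
}

// Builds a coupon whose code is unique per run, worker, and test title,
// so parallel workers never redeem or exhaust the same coupon.
function makeTestCoupon(
  workerIndex: number,
  testTitle: string,
  runId: string,
  now: Date, // frozen "current time" captured once at test start
): Coupon {
  const slug = testTitle
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, "-") // collapse spaces and punctuation into dashes
    .replace(/^-+|-+$/g, "")     // trim leading and trailing dashes
    .slice(0, 32);
  return {
    code: `TEST-${runId}-W${workerIndex}-${slug}`,
    // Assumption: coupons created for tests stay valid for one hour.
    expiresAt: new Date(now.getTime() + 60 * 60 * 1000),
  };
}

// Takes `now` as a parameter instead of reading the system clock,
// so the result cannot change if the test straddles an expiry boundary.
function isCouponActive(coupon: Coupon, now: Date): boolean {
  return now.getTime() < coupon.expiresAt.getTime();
}
```

Passing the same frozen `now` to both functions removes the dependency on wall-clock time, and the worker index in the code keeps parallel workers from colliding on shared coupon state.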

CI Metric

Track the flake rate per test file each week. Any file above 2% for two consecutive weeks gets an owner and a due date.
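The 2% rule can be checked mechanically from weekly CI data. The sketch below assumes a hypothetical `WeeklyStats` record per file and week, with weeks written as sortable strings such as `2024-W02`. It also assumes a "flaky run" means a run that failed and then passed on retry with no code change, and that every week has a record, so the two most recent records are consecutive weeks.

```typescript
// Hypothetical weekly record exported from CI for one test file.
interface WeeklyStats {
  file: string;
  week: string;      // zero-padded and sortable, e.g. "2024-W02"
  runs: number;      // total CI runs that executed the file that week
  flakyRuns: number; // runs that failed, then passed on retry
}

function flakeRate(stats: WeeklyStats): number {
  return stats.runs === 0 ? 0 : stats.flakyRuns / stats.runs;
}

// Returns the files whose flake rate exceeded the threshold
// in both of their two most recent weeks.
function filesNeedingOwner(stats: WeeklyStats[], threshold = 0.02): string[] {
  const byFile = new Map<string, WeeklyStats[]>();
  for (const entry of stats) {
    const weeks = byFile.get(entry.file) ?? [];
    weeks.push(entry);
    byFile.set(entry.file, weeks);
  }

  const flagged: string[] = [];
  byFile.forEach((weeks, file) => {
    const lastTwo = [...weeks]
      .sort((a, b) => a.week.localeCompare(b.week))
      .slice(-2);
    if (lastTwo.length === 2 && lastTwo.every((w) => flakeRate(w) > threshold)) {
      flagged.push(file);
    }
  });
  return flagged.sort();
}
```

Requiring two weeks above the threshold keeps a single noisy week, such as one caused by an infrastructure incident, from triggering an ownership assignment.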

Tips for Best Results

  • 💡 Paste real logs and failure screenshots; flaky tests are solved with evidence, not vibes.
  • 💡 Ask for quarantine rules so the team does not quietly sweep broken smoke alarms under the rug.
  • 💡 Separate product bugs from test instability before changing assertions.
  • 💡 Track flake rate over time: one random red build is annoying, but a pattern is engineering debt with a siren.