Data Quality Incident Root Cause Analysis Builder

Investigate metric discrepancies, broken pipelines, missing events, or dashboard errors with a structured data quality incident RCA and prevention plan.

Prompt Template

You are a data reliability lead. Help investigate a data quality incident and produce a root cause analysis with prevention steps.

**Incident summary:** [what looked wrong]
**Metric/table/dashboard affected:** [name and link/description]
**Business impact:** [wrong report, decision risk, customer-facing error, revenue impact]
**Detection source:** [analyst, stakeholder, alert, data test, customer]
**Time window affected:** [start/end or unknown]
**Systems involved:** [source app, warehouse, ETL/ELT, BI tool, reverse ETL, spreadsheet]
**Recent changes:** [deploys, schema changes, tracking changes, backfills, vendor changes]
**Evidence available:** [queries, logs, screenshots, row counts, test failures]
**Known constraints:** [limited lineage, no owner, missing logs, urgent board report]
**Stakeholders:** [data, product, finance, marketing, executives]

Produce:
1. **Incident summary** — plain-English explanation of what happened and why it matters.
2. **Impact assessment** — affected metrics, audiences, reports, date range, confidence level, and decisions at risk.
3. **Investigation plan** — step-by-step checks from source event to final dashboard.
4. **Root cause hypotheses** — ranked causes with evidence needed to confirm or reject each.
5. **SQL/query checks** — sample validation queries or logic to compare counts, freshness, nulls, duplicates, joins, and filters.
6. **RCA narrative** — timeline, root cause, contributing factors, detection gap, and resolution.
7. **Remediation plan** — data fix, stakeholder communication, dashboard annotations, and backfill validation.
8. **Prevention plan** — tests, ownership, lineage, alert thresholds, release checks, and runbook updates.
9. **Stakeholder update draft** — concise message for non-technical stakeholders.

Separate confirmed facts from assumptions. Do not invent data that was not provided.

Example Output

Data Quality Incident RCA — MRR Dashboard Drop

Summary

The April MRR dashboard showed an 18% drop that was not real. The issue was caused by a billing export schema change: `plan_amount` switched from dollars to cents, while the transformation still divided by 100 only for older records.

Impact

| Area | Impact | Confidence |

|---|---|---|

| Executive MRR dashboard | April 14-16 understated MRR | High |

| Finance forecast export | Not affected; uses source billing report | High |

| Marketing cohort report | Affected for paid conversion value | Medium |

Investigation Checks

- Compare raw billing totals by day against transformed `fact_subscription_revenue`.

- Check row freshness and duplicate invoice IDs.

- Inspect schema diff from the billing connector release.

- Validate dashboard filter logic after backfill.

Prevention

1. Add an accepted range test for MRR day-over-day movement above 5%.

2. Add a schema-change alert for billing connector numeric fields.

3. Assign Finance Ops as business owner for revenue metric sign-off.

4. Annotate the dashboard and send a correction note to stakeholders.

Tips for Best Results

  • 💡Start with impact and affected decisions so the investigation does not become a purely technical treasure hunt.
  • 💡Compare source, staging, model, and BI layers separately to isolate where the issue entered.
  • 💡Document assumptions explicitly; data incidents get messy fast when guesses become facts.
  • 💡After fixing the data, update dashboards and stakeholders so stale screenshots do not keep circulating.