Database Backup Restore Drill Runbook Builder

Create a practical database backup restore drill runbook that validates recovery time, recovery point, data integrity, and team readiness.

Prompt Template

You are a senior SRE and database reliability engineer. Create a backup restore drill runbook for:

Database system: [PostgreSQL, MySQL, MongoDB, SQL Server, etc.]
Hosting environment: [AWS RDS, Cloud SQL, self-hosted Kubernetes, bare metal, etc.]
Application context: [brief description of the app and critical data]
Backup strategy: [snapshot, WAL/binlog, logical dump, physical backup, continuous backup]
Target RPO/RTO: [recovery point objective and recovery time objective]
Restore target: [staging environment, isolated VPC, temporary instance]
Team roles: [DBA, backend engineer, SRE, security, product owner]
Constraints: [PII handling, encryption, cost, data residency, downtime, compliance]

Produce:
1. Drill objective and success criteria
2. Preconditions, access checks, and safety boundaries
3. Step-by-step restore procedure with commands/placeholders where useful
4. Data integrity validation queries and application smoke tests
5. Timing log template to compare actual RPO/RTO against targets
6. Rollback/cleanup process for temporary resources
7. Evidence checklist for audit or compliance records
8. Post-drill review questions and improvement backlog

Assume this runbook will be used during a real incident, so make it explicit, calm, and copy-paste friendly.

Example Output

# PostgreSQL Restore Drill Runbook

**Objective:** Restore production snapshot plus WAL into isolated staging and prove the app can read customer, order, and billing tables.

**Target:** RPO <= 15 minutes, RTO <= 60 minutes

Preconditions

- Confirm restore target is isolated from production networks

- Verify encryption key access with the SRE on call

- Freeze outbound email/SMS jobs in the restored app

- Create timing log: start, snapshot selected, restore complete, app smoke test complete

Restore Steps

1. Select latest automated snapshot before drill start.

2. Create temporary restore instance: db-restore-drill-{{date}}.

3. Apply WAL to target timestamp: {{timestamp}}.

4. Attach read-only app credentials.

5. Run schema migration status check.

Integrity Checks

- orders count for previous complete day matches analytics export within 0.5%.

- No orphaned order_items rows:

select count(*) from order_items oi left join orders o on oi.order_id = o.id where o.id is null;

- Latest payment record timestamp is within the expected RPO window.

Smoke Tests

- Login with test admin

- Load customer profile

- Create draft order with payment gateway in sandbox mode

Evidence

Save restore logs, query screenshots, timing log, and post-drill action items in /drills/{{date}}.

Tips for Best Results

  • 💡Run restore drills before you need them; untested backups are just expensive hope with timestamps.
  • 💡Always restore into an isolated environment so production jobs cannot email customers or mutate live systems.
  • 💡Track actual RPO and RTO during every drill; those numbers matter more than the backup vendor dashboard.