Terraform State Drift Reconciliation Runbook Builder

Create a runbook for safely detecting and reconciling Terraform state drift, covering imports, moved resources, provider changes, and plan risk.

Prompt Template

You are a senior platform engineer helping a team reconcile Terraform state drift safely. Build a runbook for the situation below.

Infrastructure scope: [cloud/provider, accounts/projects, environments, modules]
Terraform setup: [backend, workspaces, state storage, module structure, provider versions]
Drift signal: [terraform plan output, cloud console change, failed apply, imported resource, deleted resource, unknown]
Affected resources: [resource types, names, criticality, dependencies]
Environment risk: [dev, staging, production, shared services, regulated system]
Recent changes: [manual hotfix, provider upgrade, module refactor, incident response, account migration]
Team constraints: [change window, approvals, on-call coverage, access limits, remote state locks]
Desired outcome: [adopt manual change, revert cloud change, import resource, move address, recreate resource, investigate only]
Observability/backup: [state snapshots, cloud audit logs, monitoring, cost alerts]
CI/CD process: [plan in PR, apply approvals, Atlantis, Terraform Cloud, GitHub Actions, manual apply]

Create:
1. Drift triage summary with blast radius and urgency.
2. Pre-flight checklist for state backup, lock ownership, credentials, and audit logs.
3. Decision tree: import, state mv, config update, cloud revert, forced replace (-replace, formerly taint), or no-op.
4. Step-by-step reconciliation plan with commands shown as placeholders, not blindly executable.
5. Plan review checklist for deletes, replacements, dependencies, and provider diffs.
6. Rollback and abort criteria before apply.
7. Communication note for stakeholders and change approval.
8. Post-apply validation checks in cloud, Terraform state, monitoring, and cost.
9. Prevention actions: policy, drift detection cadence, permissions, documentation.
10. Incident notes if the drift came from an emergency console change.

Be conservative. Never recommend applying a plan with unexplained deletes or replacements in production.

Example Output

Drift Runbook: Production S3 Bucket Policy

Triage

The Terraform plan would overwrite a bucket policy that was edited in the console during last night's incident. Blast radius is high because the bucket serves customer uploads. Do not apply until the manual change is understood and represented in code.

Decision

Adopt the approved emergency change into Terraform config, then run a plan that shows no bucket replacement and only the intended policy diff.

Pre-Flight

- Download a state snapshot from the remote backend.
- Confirm no active state lock or queued apply.
- Pull CloudTrail/audit logs for the console edit.
- Ask security to approve the final policy document.
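
The first pre-flight item could be scripted along these lines. This is a minimal Python sketch, assuming the `terraform` CLI is on PATH and the working directory is already initialized against the remote backend. `terraform state pull` is a real command and `serial` and `lineage` are real fields of the state file; the function names, the filename pattern, and the backup directory are illustrative assumptions, not part of any standard tooling.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def pull_state(workdir: str) -> str:
    """Return the current remote state as JSON text via `terraform state pull`."""
    result = subprocess.run(
        ["terraform", "state", "pull"],
        cwd=workdir, check=True, capture_output=True, text=True,
    )
    return result.stdout


def snapshot_filename(state_text: str, taken_at: datetime) -> str:
    """Build a backup name that records the state's lineage and serial.

    `taken_at` is assumed to be a UTC datetime; the pattern is hypothetical.
    """
    state = json.loads(state_text)
    lineage = state["lineage"][:8]  # short prefix of the lineage UUID
    stamp = taken_at.strftime("%Y%m%dT%H%M%SZ")
    return f"tfstate-{lineage}-serial{state['serial']}-{stamp}.json"


def save_snapshot(state_text: str, backup_dir: str, taken_at: datetime) -> Path:
    """Write the pulled state, byte for byte, to a timestamped backup file."""
    target = Path(backup_dir) / snapshot_filename(state_text, taken_at)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(state_text)
    return target
```

A call such as `save_snapshot(pull_state("."), "state-backups", datetime.now(timezone.utc))` would store the snapshot before any import or state command runs. Recording the serial in the filename makes it easier to tell later whether the state changed between the backup and the apply.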

Apply Gate

Proceed only if the plan contains zero resource deletes, zero bucket replacement, and a policy diff matching the reviewed JSON.
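
Part of this gate can be checked mechanically. The sketch below, in Python, reads the JSON produced by `terraform show -json` for a saved plan and reports any delete, replacement, or change outside an allow-list. The `resource_changes`, `address`, and `change.actions` fields come from Terraform's JSON plan representation; the function name `gate_plan` and the resource addresses in the usage note are hypothetical. It does not compare the policy body with the reviewed JSON, so that step remains a human review.

```python
import json


def gate_plan(plan_json: str, allowed_updates: set[str]) -> list[str]:
    """Return reasons to stop; an empty list means the mechanical gate passes."""
    plan = json.loads(plan_json)
    problems = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        address = rc["address"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will be modified for this resource
        if "delete" in actions and "create" in actions:
            # Replacement shows up as delete+create (in either order).
            problems.append(f"replace: {address}")
        elif "delete" in actions:
            problems.append(f"delete: {address}")
        elif address not in allowed_updates:
            # Creates or updates that nobody reviewed in advance.
            problems.append(f"unexpected {'/'.join(actions)}: {address}")
    return problems
```

The input would typically come from `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. For this runbook, a call might look like `gate_plan(plan_text, {"aws_s3_bucket_policy.uploads"})`, and any non-empty result is a reason to abort.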

Tips for Best Results

- Paste the plan summary and resource addresses; drift work depends on exact addresses.
- Back up state and read the cloud audit trail before touching import or state commands.
- Treat unexplained replacements as a stop sign, especially in production.
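
One way to produce a plan summary with exact addresses for pasting into the prompt is to group the saved plan's changes by action. This is a small Python sketch over the output of `terraform show -json`; the function name `summarize_plan` and the "replace" label are assumptions made for illustration.

```python
import json
from collections import defaultdict


def summarize_plan(plan_json: str) -> dict[str, list[str]]:
    """Group resource addresses by planned action, skipping no-ops."""
    plan = json.loads(plan_json)
    summary = defaultdict(list)
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue
        if "delete" in actions and "create" in actions:
            label = "replace"  # delete+create pair, in either order
        else:
            label = actions[0]  # create, update, delete, or read
        summary[label].append(rc["address"])
    return {label: sorted(addrs) for label, addrs in summary.items()}
```

The resulting dictionary can be pasted into the "Drift signal" and "Affected resources" fields as a compact list of addresses per action.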