Webhook Retry and Idempotency Design Guide
Design resilient webhook delivery and consumer handling with idempotency keys, retry policies, signature verification, and dead-letter recovery workflows.
Prompt Template
You are a senior distributed systems engineer specializing in webhook reliability. Help me design a production-ready webhook delivery and consumption system.
**Use case:** [e.g., SaaS billing events, marketplace order updates, CRM sync]
**Webhook producer or consumer?:** [producer / consumer / both]
**Expected volume:** [events per minute/day]
**Payload shape:** [briefly describe fields and approximate size]
**Current pain points:** [duplicate deliveries, missing retries, out-of-order events, slow downstream services]
**Security requirements:** [HMAC signature, IP allowlist, mTLS, none yet]
**Infrastructure:** [e.g., Node.js + Postgres + SQS, Laravel + Redis, serverless]
**Downstream dependencies:** [APIs, database writes, third-party services]
**Failure tolerance:** [how long events can wait, acceptable data loss = none/low/medium]
Please provide:
1. **Reference Architecture** for reliable webhook publishing and/or consumption
2. **Idempotency Strategy** including unique event IDs, storage design, TTL, and replay behavior
3. **Retry Policy** with backoff schedule, max attempts, jitter, and terminal failure handling
4. **Security Layer** covering signature verification, timestamp tolerance, replay attack prevention, and secret rotation
5. **Ordering and Concurrency Rules** for events that may arrive out of order or be processed in parallel
6. **Dead-Letter and Replay Workflow** with operator runbook steps
7. **Observability Plan** including logs, metrics, alerts, and dashboard widgets
8. **Implementation Checklist** with common mistakes to avoid
Include pseudocode or code snippets for the stack I specify, plus a sample event table schema.
Example Output
# Webhook Reliability Blueprint
**Use case:** Subscription billing events
**Stack:** Node.js + Postgres + SQS
Architecture
Producer writes each event to an `outbox_events` table inside the same DB transaction as the business action. A relay worker publishes to the delivery queue. Consumers verify the signature, persist `event_id`, process side effects, and mark the event complete.
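The outbox flow described above can be sketched roughly as follows. This is a minimal illustration, not the blueprint's actual implementation: it uses Python's standard library with SQLite standing in for Postgres so it runs without a server, and the `subscriptions` table, the `outbox_events` columns, `change_plan`, `relay_once`, and the `publish` callback are all hypothetical names chosen for the example.

```python
import json
import sqlite3
import uuid

# SQLite stands in for Postgres here (an assumption made for runnability).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE subscriptions (id TEXT PRIMARY KEY, plan TEXT NOT NULL);
    CREATE TABLE outbox_events (
        event_id     TEXT PRIMARY KEY,
        event_type   TEXT NOT NULL,
        payload      TEXT NOT NULL,
        published_at TEXT            -- NULL until the relay has handed it off
    );
""")

def change_plan(sub_id, plan):
    """Write the business change and its outbox row in one transaction."""
    event_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back both writes on any exception
        db.execute(
            "INSERT OR REPLACE INTO subscriptions (id, plan) VALUES (?, ?)",
            (sub_id, plan),
        )
        db.execute(
            "INSERT INTO outbox_events (event_id, event_type, payload) "
            "VALUES (?, ?, ?)",
            (event_id, "subscription.updated",
             json.dumps({"id": sub_id, "plan": plan})),
        )
    return event_id

def relay_once(publish):
    """Publish pending rows; mark each one only after publish() returns."""
    rows = db.execute(
        "SELECT event_id, payload FROM outbox_events "
        "WHERE published_at IS NULL"
    ).fetchall()
    for event_id, payload in rows:
        publish(event_id, payload)  # may raise; the row then stays pending
        with db:
            db.execute(
                "UPDATE outbox_events SET published_at = datetime('now') "
                "WHERE event_id = ?",
                (event_id,),
            )
    return len(rows)
```

Because a row is marked only after `publish` returns, a crash between the two steps republishes the event on the next pass. That gives at-least-once delivery, which is why the consumer side still needs deduplication.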
Retry Policy
| Attempt | Delay | Notes |
|---|---:|---|
| 1 | immediate | initial delivery |
| 2 | 30s | transient failure |
| 3 | 2m | add jitter ±20% |
| 4 | 10m | alert if failure rate spikes |
| 5 | 1h | final automatic retry |
| 6 | manual replay | move to DLQ and page ops |
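One way to encode the schedule in the table is a small lookup with jitter applied on top. The sketch below is illustrative only: `delay_before` is a hypothetical helper, and applying ±20% jitter to every retry (the table mentions it only on attempt 3) is an assumption.

```python
import random

# Base delays in seconds before attempts 2..5, taken from the table above.
BASE_DELAYS = {2: 30, 3: 120, 4: 600, 5: 3600}
JITTER = 0.20               # +/-20%; applied to every retry (assumption)
MAX_AUTOMATIC_ATTEMPTS = 5  # attempt 6 is manual replay from the DLQ

def delay_before(attempt, rng=None):
    """Seconds to wait before `attempt`, or None if the event goes to the DLQ."""
    if attempt <= 1:
        return 0.0  # initial delivery is immediate
    if attempt > MAX_AUTOMATIC_ATTEMPTS:
        return None  # automatic retries exhausted
    uniform = (rng or random).uniform
    return BASE_DELAYS[attempt] * uniform(1 - JITTER, 1 + JITTER)
```

Jitter spreads retries from many failed deliveries over a window, so a recovering endpoint is not hit by all of them at the same instant.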
Idempotency Table
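CREATE TABLE processed_webhooks (
    event_id               TEXT PRIMARY KEY,                    -- sender's unique event ID; the dedup key
    event_type             TEXT NOT NULL,
    received_at            TIMESTAMPTZ NOT NULL DEFAULT now(),
    status                 TEXT NOT NULL,                       -- processing state; marked complete on success
    response_code          INT,                                 -- HTTP status returned to the sender
    idempotency_expires_at TIMESTAMPTZ                          -- TTL: the row may be purged after this time
);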
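To show how a table like this supports deduplication, here is a hedged sketch using Python's standard library, with SQLite standing in for Postgres. The status values (`received`, `failed`, `completed`), the `process_once` function, and the `apply_side_effects` callback are assumptions made for the example, not part of the schema above.

```python
import sqlite3

# SQLite stand-in for the Postgres table above (column types adapted).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE processed_webhooks (
        event_id               TEXT PRIMARY KEY,
        event_type             TEXT NOT NULL,
        received_at            TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        status                 TEXT NOT NULL,
        response_code          INTEGER,
        idempotency_expires_at TEXT
    )
""")

def process_once(event_id, event_type, apply_side_effects):
    """Apply side effects at most once per event_id.

    Returns 'processed' or 'duplicate'; re-raises if the side effects fail.
    """
    with db:
        # INSERT OR IGNORE is SQLite's spelling of ON CONFLICT DO NOTHING.
        db.execute(
            "INSERT OR IGNORE INTO processed_webhooks "
            "(event_id, event_type, status) VALUES (?, ?, 'received')",
            (event_id, event_type),
        )
    status = db.execute(
        "SELECT status FROM processed_webhooks WHERE event_id = ?",
        (event_id,),
    ).fetchone()[0]
    if status == "completed":
        return "duplicate"
    try:
        with db:  # side effects and the 'completed' mark commit together
            apply_side_effects(db)
            db.execute(
                "UPDATE processed_webhooks "
                "SET status = 'completed', response_code = 200 "
                "WHERE event_id = ?",
                (event_id,),
            )
    except Exception:
        with db:  # keep the row, flagged so a later retry reprocesses it
            db.execute(
                "UPDATE processed_webhooks SET status = 'failed' "
                "WHERE event_id = ?",
                (event_id,),
            )
        raise
    return "processed"
```

This single-connection sketch does not guard against two workers picking up the same pending row at once. In Postgres, a row lock such as `SELECT ... FOR UPDATE` inside the processing transaction is one common way to close that gap.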
Consumer Flow
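1. Verify the HMAC signature and reject requests whose timestamp is more than 5 minutes old.
2. Look up `event_id` in `processed_webhooks`.
3. If the event is already completed, return 200 with `duplicate=true` and skip processing.
4. If the event has not been seen, insert the row, process side effects inside a transaction, then mark the row complete. If the insert conflicts on `event_id`, another delivery claimed the event first.
5. On failure, keep the row in a non-complete state so the sender's next retry reprocesses it. The retry is safe because every write is keyed by `event_id`.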
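Step 1 of the flow above might look like the following. The signing scheme is an assumption for illustration (hex HMAC-SHA256 over `<timestamp>.<body>`); real providers each define their own header names and message layout, so `sign` and `verify` here are hypothetical helpers, not any vendor's API.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # the 5-minute window from step 1

def sign(secret, timestamp, body):
    """Hypothetical scheme: hex HMAC-SHA256 over b'<timestamp>.<body>'."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret, timestamp, body, signature, now=None):
    """True only if the timestamp is fresh and the signature matches."""
    now = time.time() if now is None else now
    # abs() also rejects timestamps too far in the future (clock skew).
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Including the timestamp in the signed message matters: if only the body were signed, an attacker could replay an old request with a fresh timestamp header.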
Alerts
- Retry queue depth > 500 for 10 minutes
- Signature failures > 2% of incoming requests
- DLQ count > 0 in production
- P95 processing time > 5s
Tips for Best Results
- Use the outbox pattern when you publish webhooks from your own system. It prevents the classic bug where the DB commit succeeds but the webhook is never sent.
- Never treat webhook delivery as exactly-once. Design for at-least-once delivery and make the consumer idempotent by default.
- Store the raw payload and signature headers for failed events so support can replay them without guessing what the sender actually sent.
- Return 2xx quickly and offload heavy work to a queue when possible. Slow synchronous handlers exceed the sender's timeout, which triggers retries, duplicate deliveries, and timeout storms.
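The last tip, acknowledging quickly and deferring the heavy work, can be sketched with an in-process queue. This is only an illustration: `handle_request`, `worker`, and the status codes chosen are assumptions, and an in-memory queue loses events on a crash, so a production handler would persist the event (for example to SQS or a database) before acknowledging.

```python
import queue
import threading

# Bounded so a stalled worker produces back-pressure, not unbounded memory use.
work_queue = queue.Queue(maxsize=1000)

def handle_request(event):
    """Return an HTTP status immediately; the worker does the real processing."""
    try:
        work_queue.put_nowait(event)
    except queue.Full:
        return 503  # non-2xx asks the sender to retry later
    return 202      # accepted for asynchronous processing

def worker(process):
    """Process queued events until a None sentinel arrives."""
    while True:
        event = work_queue.get()
        try:
            if event is None:
                return
            process(event)
        finally:
            work_queue.task_done()
```

A typical wiring would start `threading.Thread(target=worker, args=(process_event,))` at boot, where `process_event` is whatever idempotent handler the consumer already has.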