API Rate Limit and Quota Design Guide
Design a developer-friendly API rate limiting and quota system with algorithms, headers, errors, storage, and rollout strategy.
Prompt Template
Act as a senior backend architect designing rate limits for a public API. Create an API rate limit and quota design guide for [product/API] used by [developer audience].

API context: [REST/GraphQL/webhooks/streaming/internal API]
Traffic profile: [requests per second, burst patterns, top endpoints, tenant sizes]
Business tiers: [free, pro, enterprise, partner, internal]
Fairness goals: [protect infrastructure, prevent abuse, monetize usage, guarantee enterprise capacity]
Current stack: [language/framework, gateway, cache, database, queue, observability]
Failure tolerance: [strict enforcement vs graceful degradation]

Deliver:
1. **Rate limit policy matrix** by tier, endpoint class, authentication state, and time window
2. **Algorithm recommendation**: token bucket, leaky bucket, fixed window, sliding window, or hybrid, with tradeoffs
3. **Quota model**: monthly usage quotas, burst allowances, overage behavior, and upgrade paths
4. **Response contract**: 429 body, retry guidance, headers, idempotency notes, and SDK behavior
5. **Storage and scaling design**: cache keys, distributed counters, race conditions, and fallback mode
6. **Abuse and exception handling**: suspicious patterns, allowlists, partner overrides, and admin tooling
7. **Observability plan**: metrics, alerts, dashboards, and customer-facing usage reporting
8. **Rollout plan**: shadow mode, communication, migration timeline, and rollback steps

Include concrete examples for [2-3 critical endpoints] and call out edge cases developers commonly miss.
Example Output
API Rate Limit Design — Payments API
Policy Matrix
| Tier | Default | Write endpoints | Report exports | Burst |
|---|---:|---:|---:|---:|
| Free | 60 req/min | 20 req/min | 5/hour | 2x for 30 sec |
| Pro | 600 req/min | 120 req/min | 30/hour | 2x for 60 sec |
| Enterprise | contract-based | contract-based | contract-based | custom |
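One way to make the matrix enforceable is to hold it as data and resolve a limit per request. The sketch below is illustrative only: the `Limit` type, the `POLICY` key names, and the `resolve_limit` helper are hypothetical, and the fallback to the default column for unlisted endpoint classes is an assumption. Burst allowances are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    requests: int
    window_seconds: int


# Values copied from the matrix above; the key names are hypothetical.
POLICY = {
    "free": {
        "default": Limit(60, 60),
        "write": Limit(20, 60),
        "report_export": Limit(5, 3600),
    },
    "pro": {
        "default": Limit(600, 60),
        "write": Limit(120, 60),
        "report_export": Limit(30, 3600),
    },
}


def resolve_limit(tier, endpoint_class, contract_limits=None):
    """Return the Limit for a tier and endpoint class.

    Enterprise limits are contract-based, so the caller must supply them
    via contract_limits (a dict of endpoint class to Limit).
    """
    if tier == "enterprise":
        if not contract_limits or endpoint_class not in contract_limits:
            raise LookupError("enterprise limits come from the customer contract")
        return contract_limits[endpoint_class]
    tier_policy = POLICY[tier]
    # Assumption: endpoint classes without their own column use the default.
    return tier_policy.get(endpoint_class, tier_policy["default"])
```

Keeping enterprise out of the static table means a missing contract fails loudly instead of silently applying a self-serve limit.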
Recommended Algorithm
Use a token bucket for per-minute limits: customers need short bursts during sync jobs, and a bucket permits them while still capping the sustained rate. Add a separate monthly quota counter for billing and abuse management.
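A minimal single-process sketch of the token bucket idea follows. It is not a production design: a real deployment would keep the state in a shared store and update it atomically. The class name, the `now` parameter (included so the behavior can be checked deterministically), and the choice of a capacity equal to one minute of allowance are all assumptions. How the burst column maps onto capacity is a policy decision the guide leaves open.

```python
import time


class TokenBucket:
    """Token bucket: capacity bounds the burst, refill_rate the sustained rate."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.tokens = self.capacity            # start full
        self.updated_at = time.monotonic() if now is None else now

    def _refill(self, now):
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now

    def allow(self, cost=1.0, now=None):
        """Consume `cost` tokens if available; return True when allowed."""
        self._refill(time.monotonic() if now is None else now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost=1.0):
        """Seconds until `cost` tokens will be available (0 if already there)."""
        missing = max(0.0, cost - self.tokens)
        return missing / self.refill_rate


# Free-tier write limit from the matrix: 20 requests per minute.
bucket = TokenBucket(capacity=20, refill_rate=20 / 60, now=0.0)
```

The `retry_after` value is what a 429 response would surface to the client as retry guidance.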
429 Response Contract
Status: 429 Too Many Requests
Headers: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, Retry-After
Body: {"error":"rate_limit_exceeded","message":"Write endpoint limit exceeded. Retry after 18 seconds.","upgrade_url":"..."}
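The contract above could be assembled by a small helper like the sketch below. The function name and parameters are hypothetical. It assumes `RateLimit-Reset` carries seconds until the window resets, whereas some APIs send an epoch timestamp instead, so this is a choice to document for SDK authors. `upgrade_url` is passed in by the caller because the guide does not specify it.

```python
import json
import math


def build_429_response(limit, seconds_until_reset, endpoint_class, upgrade_url):
    """Return (status, headers, body) for a rate-limited request."""
    # Round up so clients never retry a fraction of a second too early.
    retry_after = max(1, math.ceil(seconds_until_reset))
    headers = {
        "Content-Type": "application/json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(retry_after),  # assumption: delta seconds
        "Retry-After": str(retry_after),
    }
    body = {
        "error": "rate_limit_exceeded",
        "message": (
            f"{endpoint_class.capitalize()} endpoint limit exceeded. "
            f"Retry after {retry_after} seconds."
        ),
        "upgrade_url": upgrade_url,
    }
    return 429, headers, json.dumps(body)
```

Sending the same value in `Retry-After` and in the message text keeps SDK retry logic and human-readable guidance consistent.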
Rollout
Run shadow mode for 14 days, email developers who would have exceeded limits, publish docs, then begin enforcement with free-tier write endpoints.
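Shadow mode and phased enforcement can share one decision path, sketched below under assumptions: the limiter always computes whether a request is over its limit, but only blocks in scopes that have been switched on. The `decide` function and the `SHADOW` and `PHASE_1` sets are hypothetical names.

```python
def decide(tier, endpoint_class, over_limit, enforced_scopes):
    """Return (allow, would_block) for one request.

    enforced_scopes is a set of (tier, endpoint_class) pairs where limits are
    live; everywhere else the limiter only records what it would have done.
    """
    would_block = over_limit
    enforced = (tier, endpoint_class) in enforced_scopes
    allow = not (would_block and enforced)
    return allow, would_block


# Shadow phase: nothing is enforced, but would_block is still recorded.
SHADOW = set()
# First enforcement phase from the rollout plan: free-tier writes only.
PHASE_1 = {("free", "write")}
```

Logging `would_block` per API key during the shadow phase yields the list of developers to email before enforcement starts. Rolling back amounts to removing a scope from the enforced set.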
Tips for Best Results
- Provide real traffic patterns if you have them; rate limit design is much better with burst and endpoint data.
- Ask for both the policy and the developer-facing error contract so the system is usable, not just protective.
- Include business tiers early because pricing and abuse controls often shape the technical design.
- Request a shadow-mode rollout to avoid surprising legitimate customers.