API Rate Limit and Quota Design Guide
Design a developer-friendly API rate limiting and quota system with algorithms, headers, errors, storage, and rollout strategy.
Prompt Template
Act as a senior backend architect designing rate limits for a public API. Create an API rate limit and quota design guide for [product/API] used by [developer audience].

API context: [REST/GraphQL/webhooks/streaming/internal API]
Traffic profile: [requests per second, burst patterns, top endpoints, tenant sizes]
Business tiers: [free, pro, enterprise, partner, internal]
Fairness goals: [protect infrastructure, prevent abuse, monetize usage, guarantee enterprise capacity]
Current stack: [language/framework, gateway, cache, database, queue, observability]
Failure tolerance: [strict enforcement vs graceful degradation]

Deliver:
1. **Rate limit policy matrix** by tier, endpoint class, authentication state, and time window
2. **Algorithm recommendation**: token bucket, leaky bucket, fixed window, sliding window, or hybrid, with tradeoffs
3. **Quota model**: monthly usage quotas, burst allowances, overage behavior, and upgrade paths
4. **Response contract**: 429 body, retry guidance, headers, idempotency notes, and SDK behavior
5. **Storage and scaling design**: cache keys, distributed counters, race conditions, and fallback mode
6. **Abuse and exception handling**: suspicious patterns, allowlists, partner overrides, and admin tooling
7. **Observability plan**: metrics, alerts, dashboards, and customer-facing usage reporting
8. **Rollout plan**: shadow mode, communication, migration timeline, and rollback steps

Include concrete examples for [2-3 critical endpoints] and call out edge cases developers commonly miss.
Example Output
API Rate Limit Design — Payments API
Policy Matrix
| Tier | Default | Write endpoints | Report exports | Burst |
|---|---:|---:|---:|---:|
| Free | 60 req/min | 20 req/min | 5/hour | 2x for 30 sec |
| Pro | 600 req/min | 120 req/min | 30/hour | 2x for 60 sec |
| Enterprise | contract-based | contract-based | contract-based | custom |
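One way to make the matrix above machine-readable is a small lookup table. The sketch below is illustrative only: the names `Limit`, `POLICIES`, `BURSTS`, and `limit_for`, and the endpoint class keys, are hypothetical and not part of any real gateway. Enterprise limits are contract-based, so the lookup returns `None` for them and leaves resolution to whatever contract store the system uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    max_requests: int      # requests allowed per window
    window_seconds: int    # length of the window in seconds


# Values copied from the policy matrix; keys are hypothetical endpoint classes.
POLICIES = {
    "free": {
        "default": Limit(60, 60),
        "write": Limit(20, 60),
        "report_export": Limit(5, 3600),
    },
    "pro": {
        "default": Limit(600, 60),
        "write": Limit(120, 60),
        "report_export": Limit(30, 3600),
    },
}

# Burst allowance per tier: (multiplier, duration in seconds).
BURSTS = {"free": (2.0, 30), "pro": (2.0, 60)}


def limit_for(tier: str, endpoint_class: str) -> Optional[Limit]:
    """Return the configured limit, or None when the tier is contract-based."""
    tier_policy = POLICIES.get(tier)
    if tier_policy is None:
        return None  # e.g. enterprise: look up the customer's contract instead
    return tier_policy.get(endpoint_class, tier_policy["default"])
```

For example, `limit_for("free", "write")` yields the 20 requests per 60 seconds row, and an unknown endpoint class falls back to the tier's default limit.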
Recommended Algorithm
Use a token bucket for per-minute limits because customers need short bursts during sync jobs. Track monthly usage in a separate quota counter for billing and abuse management.
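A minimal in-memory sketch of the token bucket idea follows. It assumes a single process and a hypothetical `TokenBucket` class; a production system would more likely keep the counters in a shared cache and update them atomically. How the "2x for 30 sec" burst column maps onto bucket capacity is a design choice the example output does not specify, so the capacity used in the usage note is an assumption.

```python
import time


class TokenBucket:
    """Refills at a steady rate and allows bursts up to `capacity` tokens."""

    def __init__(self, rate_per_sec, capacity, now=None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an idle client can burst
        self.updated = time.monotonic() if now is None else now

    def _refill(self, now):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def allow(self, cost=1.0, now=None):
        """Consume `cost` tokens if available; return whether the request passes."""
        self._refill(time.monotonic() if now is None else now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def seconds_until_available(self, cost=1.0):
        """Seconds until `cost` tokens exist, based on the last refill; useful for Retry-After."""
        return max(0.0, (cost - self.tokens) / self.rate)
```

For the free-tier write limit of 20 req/min, one possible configuration is `TokenBucket(rate_per_sec=20 / 60, capacity=20)`: a client can spend 20 requests at once, then regains one token every 3 seconds. The `now` parameter exists only to make the class testable with a fake clock.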
429 Response Contract
Status: 429 Too Many Requests
Headers: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, Retry-After
Body: {"error":"rate_limit_exceeded","message":"Write endpoint limit exceeded. Retry after 18 seconds.","upgrade_url":"..."}
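The contract above could be assembled by a small helper like the one sketched here. The function name `build_429` and its parameters are hypothetical. The sketch also assumes that `RateLimit-Reset` carries the number of seconds until the limit resets and that `Retry-After` uses the delta-seconds form; the example output does not say which format it intends, so adjust if the API uses timestamps instead.

```python
import json
import math


def build_429(limit, reset_after_seconds, endpoint_class, upgrade_url):
    """Return (status, headers, body) for a rate-limited request."""
    # Round up so clients never retry a fraction of a second too early.
    retry_after = max(1, math.ceil(reset_after_seconds))
    headers = {
        "Content-Type": "application/json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(retry_after),  # assumed: seconds until reset
        "Retry-After": str(retry_after),      # delta-seconds form
    }
    body = {
        "error": "rate_limit_exceeded",
        "message": (
            f"{endpoint_class.capitalize()} endpoint limit exceeded. "
            f"Retry after {retry_after} seconds."
        ),
        "upgrade_url": upgrade_url,
    }
    return 429, headers, json.dumps(body)
```

Keeping the retry delay identical in the header and the message avoids SDKs and humans seeing two different numbers. Calling `build_429(20, 17.2, "write", upgrade_url)` with a placeholder URL produces the "Retry after 18 seconds." message shown in the contract.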
Rollout
Run shadow mode for 14 days, email developers who would have exceeded their limits, publish the docs, then begin enforcement with free-tier write endpoints.
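Shadow mode can be as simple as a switch in front of the limiter's verdict: over-limit requests are logged but still served. The sketch below assumes a hypothetical `Mode` enum and `decide` function; the logger name and log fields are illustrative, and a real rollout would likely set the mode per tier and endpoint class so that free-tier write endpoints can be enforced first.

```python
import logging
from enum import Enum

logger = logging.getLogger("ratelimit")


class Mode(Enum):
    OFF = "off"          # limiter disabled
    SHADOW = "shadow"    # record would-be rejections, still serve the request
    ENFORCE = "enforce"  # reject over-limit requests with 429


def decide(mode, within_limit, api_key, endpoint_class):
    """Return True if the request should be served, False if it should get a 429."""
    if mode is Mode.OFF or within_limit:
        return True
    if mode is Mode.SHADOW:
        # These log lines feed the list of developers to contact before enforcement.
        logger.warning(
            "shadow_rate_limit_exceeded key=%s class=%s", api_key, endpoint_class
        )
        return True
    return False
```

Aggregating the shadow-mode log by API key gives the set of developers who would have been limited, which is the audience for the pre-enforcement emails.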
Tips for Best Results
- Provide real traffic patterns if you have them; rate limit design is much better with burst and endpoint data.
- Ask for both the policy and the developer-facing error contract so the system is usable, not just protective.
- Include business tiers early because pricing and abuse controls often shape the technical design.
- Request a shadow-mode rollout to avoid surprising legitimate customers.