API Rate Limit and Quota Design Guide

Design a developer-friendly API rate limiting and quota system with algorithms, headers, errors, storage, and rollout strategy.

Prompt Template

Act as a senior backend architect designing rate limits for a public API. Create an API rate limit and quota design guide for [product/API] used by [developer audience].

API context: [REST/GraphQL/webhooks/streaming/internal API]
Traffic profile: [requests per second, burst patterns, top endpoints, tenant sizes]
Business tiers: [free, pro, enterprise, partner, internal]
Fairness goals: [protect infrastructure, prevent abuse, monetize usage, guarantee enterprise capacity]
Current stack: [language/framework, gateway, cache, database, queue, observability]
Failure tolerance: [strict enforcement vs graceful degradation]

Deliver:
1. **Rate limit policy matrix** by tier, endpoint class, authentication state, and time window
2. **Algorithm recommendation** — token bucket, leaky bucket, fixed window, sliding window, or hybrid, with tradeoffs
3. **Quota model** — monthly usage quotas, burst allowances, overage behavior, and upgrade paths
4. **Response contract** — 429 body, retry guidance, headers, idempotency notes, and SDK behavior
5. **Storage and scaling design** — cache keys, distributed counters, race conditions, and fallback mode
6. **Abuse and exception handling** — suspicious patterns, allowlists, partner overrides, and admin tooling
7. **Observability plan** — metrics, alerts, dashboards, and customer-facing usage reporting
8. **Rollout plan** — shadow mode, communication, migration timeline, and rollback steps

Include concrete examples for [2-3 critical endpoints] and call out edge cases developers commonly miss.

Example Output

API Rate Limit Design — Payments API

Policy Matrix

| Tier | Default | Write endpoints | Report exports | Burst |
|---|---:|---:|---:|---:|
| Free | 60 req/min | 20 req/min | 5/hour | 2x for 30 sec |
| Pro | 600 req/min | 120 req/min | 30/hour | 2x for 60 sec |
| Enterprise | contract-based | contract-based | contract-based | custom |
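
A policy matrix like this is often easiest to enforce when it lives in one lookup structure. The following is a minimal sketch, assuming the matrix is held in application config; the names `Limit`, `POLICY`, and `lookup_limit` are hypothetical, and the burst column is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    requests: int        # allowed requests per window
    window_seconds: int  # window length in seconds


# Values copied from the policy matrix; enterprise is contract-based,
# so it has no static entry here (an assumption of this sketch).
POLICY = {
    "free": {
        "default": Limit(60, 60),
        "write": Limit(20, 60),
        "export": Limit(5, 3600),
    },
    "pro": {
        "default": Limit(600, 60),
        "write": Limit(120, 60),
        "export": Limit(30, 3600),
    },
}


def lookup_limit(tier: str, endpoint_class: str) -> Optional[Limit]:
    """Return the static limit, or None when the tier is contract-based."""
    tier_policy = POLICY.get(tier)
    if tier_policy is None:
        return None  # caller resolves the limit from the customer's contract
    # Unknown endpoint classes fall back to the tier's default limit.
    return tier_policy.get(endpoint_class, tier_policy["default"])
```

Returning `None` for contract-based tiers keeps per-customer overrides out of the static table, which is one possible design choice rather than a requirement.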

Recommended Algorithm

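Use a token bucket for per-minute limits because customers need short bursts during sync jobs: the bucket permits bursts up to its capacity while holding the long-run average at the refill rate. Add a separate monthly quota counter for billing and abuse management.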
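
To make the recommendation concrete, here is a minimal in-memory token bucket sketch. It assumes a single process and no shared store; the class name `TokenBucket`, its parameters, and the choice of capacity are illustrative assumptions rather than part of the design above. A production version would keep this state in a shared cache and update it atomically.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    capacity: float     # maximum tokens the bucket holds (the burst size)
    refill_rate: float  # tokens added per second (the sustained rate)
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so a new client can burst immediately.
        self.tokens = self.capacity
        self.updated_at = time.monotonic()

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical mapping of the free-tier write limit (20 req/min).
# Setting capacity to 20 is an assumption; the burst column in the
# matrix could be modeled with a larger capacity instead.
free_write_bucket = TokenBucket(capacity=20, refill_rate=20 / 60)
```

The optional `now` argument exists only so the refill logic can be tested with fixed timestamps.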

429 Response Contract

Status: 429 Too Many Requests

Headers: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, Retry-After

Body: {"error":"rate_limit_exceeded","message":"Write endpoint limit exceeded. Retry after 18 seconds.","upgrade_url":"..."}
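
The contract above could be assembled by a small helper such as the one below. This is a sketch under stated assumptions: the function name `build_rate_limit_response` and its parameters are hypothetical, `RateLimit-Reset` is treated as seconds until the window resets, and `Retry-After` uses the delta-seconds form.

```python
import json
import math


def build_rate_limit_response(limit: int, seconds_until_reset: float,
                              scope: str, upgrade_url: str):
    """Return (status, headers, body) for a rate-limited request."""
    # Round up so clients never retry before the window has reset.
    retry_after = max(1, math.ceil(seconds_until_reset))
    headers = {
        "Content-Type": "application/json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(retry_after),  # assumed: seconds, not a timestamp
        "Retry-After": str(retry_after),
    }
    body = {
        "error": "rate_limit_exceeded",
        "message": f"{scope} limit exceeded. Retry after {retry_after} seconds.",
        "upgrade_url": upgrade_url,
    }
    return 429, headers, json.dumps(body)
```

Keeping the retry value identical in the header and in the message avoids SDKs and humans reading two different numbers.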

Rollout

Run shadow mode for 14 days, email developers who would exceed limits, publish docs, then enforce free-tier write endpoints first.
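
Shadow mode can be sketched as a thin wrapper around the limiter's decision: the limiter still evaluates every request, but a rejection is only recorded, not enforced. The names `would_block` and `should_proceed` are hypothetical, and the in-memory counter stands in for whatever metrics store is actually used.

```python
from collections import Counter

# Count of requests per API key that would have been rejected.
# After the shadow period, this identifies which developers to email.
would_block: Counter = Counter()


def should_proceed(api_key: str, limiter_allowed: bool, shadow_mode: bool) -> bool:
    """Decide whether the request continues, given the limiter's verdict."""
    if limiter_allowed:
        return True
    if shadow_mode:
        # Record the would-be rejection but let the request through.
        would_block[api_key] += 1
        return True
    return False  # enforcement is on: the caller should return 429
```

Switching `shadow_mode` per tier and endpoint class would allow enforcing free-tier write endpoints first while the rest stay in shadow mode.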

Tips for Best Results

  • Provide real traffic patterns if you have them; rate limit design improves considerably with burst and per-endpoint data.
  • Ask for both the policy and the developer-facing error contract so the system is usable, not just protective.
  • Include business tiers early because pricing and abuse controls often shape the technical design.
  • Request a shadow-mode rollout to avoid surprising legitimate customers.