System Design Series

Rate Limiter Simulator

Watch API requests get accepted or rejected in real time. Experiment with Token Bucket, Fixed Window, and Sliding Window algorithms.


Token Bucket: Tokens refill at a steady rate (1 token every 800ms). Each request consumes one token. Bucket capacity: 8. Requests are rejected when the bucket is empty — but short bursts are allowed.

Visual Guide

  • White packet: Request in transit to the rate limiter.
  • Green packet: Allowed — forwarded to the API server.
  • Red packet: Rate limited — dropped with HTTP 429.
  • Amber dots: Remaining tokens in the bucket.

How to use

1. Click Start Traffic to begin sending API requests.
2. Switch Algorithms to compare Token Bucket vs Window strategies.
3. Increase Traffic Intensity to Burst to trigger rejections.
4. Watch the Accept rate drop — and the tokens empty out.

Quick Guide: Rate Limiting

Understanding the basics in 30 seconds

How It Works

  • Client sends a request to the API
  • Rate limiter checks the current token count (or window counter)
  • If within the limit: request is forwarded to the server
  • If over the limit: HTTP 429 is returned immediately
  • Tokens refill (or window resets) over time, restoring capacity
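The flow above can be sketched as a single check function. A minimal fixed-window variant follows (the limit and window size are illustrative, not from any particular API):

```python
import time

# Minimal fixed-window check (illustrative limits): at most LIMIT
# requests per WINDOW-second interval; the counter resets each window.
LIMIT, WINDOW = 5, 1.0
count, window_start = 0, time.monotonic()

def allow():
    global count, window_start
    now = time.monotonic()
    if now - window_start >= WINDOW:      # window elapsed: reset the counter
        count, window_start = 0, now
    if count < LIMIT:
        count += 1
        return True                       # forward to the server
    return False                          # caller responds with HTTP 429

results = [allow() for _ in range(7)]     # 7 back-to-back requests
print(results)  # first 5 accepted, last 2 rejected
```

The same `allow()` shape applies to every algorithm in this article; only the bookkeeping inside it changes.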

Key Benefits

  • Protects APIs from DDoS and abusive clients
  • Ensures fair usage across all consumers
  • Prevents a single client from starving others
  • Reduces infrastructure costs during traffic spikes
  • Enables tiered pricing (free vs paid API limits)

Real-World Uses

  • Stripe: 100 req/s per API key (Token Bucket)
  • GitHub API: 5,000 req/hour for authenticated users
  • Twitter/X API: Rate limits per endpoint per 15-min window
  • AWS API Gateway: Configurable per-route throttling
  • NGINX: limit_req_zone directive for web server throttling

Rate Limiting in Production

How real-world APIs protect themselves from abuse, traffic spikes, and runaway clients.

Token Bucket

Tokens refill at a fixed rate and are consumed one per request. An empty bucket means the request is rejected — but bursts up to the full bucket size are always allowed.

  • Smooth, predictable refill rate
  • Naturally absorbs short bursts
  • Used by: Stripe API, AWS API Gateway

Fixed & Sliding Window

Count requests inside a time window. Fixed windows reset sharply — a client can fire double the limit right at the boundary. Sliding windows prevent this with a rolling counter.

  • Fixed: Simple, low memory — boundary-burst risk
  • Sliding: Fairer, prevents edge spikes
  • Used by: GitHub API, Twitter/X API

HTTP 429 — Too Many Requests

When a rate limiter rejects a request, the server responds with HTTP 429 and a Retry-After header. Well-behaved clients implement exponential backoff — doubling the wait time between retries — to avoid a thundering herd where all rejected clients retry simultaneously and immediately.

Rate Limiting Algorithms Explained

Token Bucket

The most widely used algorithm. A bucket holds up to N tokens. One token is consumed per request. Tokens refill at a fixed rate. If the bucket is empty, the request is rejected — but short bursts are naturally absorbed as long as tokens are available.

Example: Stripe API

  • Bucket size: 100 tokens
  • Refill rate: 100 tokens / second
  • A client can burst 100 requests instantly, then sustain 100 req/s
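A lazy-refill token bucket is a few lines of code. This sketch uses the Stripe-style numbers above; the class and parameter names are illustrative, not Stripe's implementation:

```python
import time

class TokenBucket:
    """Token bucket: `rate` tokens/sec refill, bursts up to `capacity`."""
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)      # start full: full burst available
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=100, rate=100.0)
burst = sum(bucket.allow() for _ in range(150))  # 150 back-to-back requests
print(burst)  # ~100 accepted: the full burst, plus any tokens refilled mid-loop
```

The "lazy" refill is the key trick: instead of a background timer adding tokens, the bucket computes how many tokens accrued since the last request. This keeps the limiter a pure function of timestamps, which is why it distributes well.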

Fixed Window vs Sliding Window

Both count requests inside a time window, but differ at the boundaries:

Fixed Window

Counter resets sharply every N seconds. A client can fire 2× the limit by sending at the end of one window and the start of the next.

Sliding Window

Uses a rolling counter weighted across the current and previous window. Eliminates boundary bursts. Fairer but slightly more memory-intensive.
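The weighted rolling counter can be sketched as follows. This is one common approximation (two counters, previous window weighted by its overlap with the rolling window); names and the reset logic are illustrative:

```python
import time

class SlidingWindow:
    """Sliding-window counter: the previous window's count is weighted by
    the fraction of it still inside the rolling window."""
    def __init__(self, limit: int, window: float = 1.0):
        self.limit, self.window = limit, window
        self.curr_start = time.monotonic()
        self.curr = 0        # requests in the current window
        self.prev = 0        # requests in the previous window

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:                 # roll the windows forward
            # If more than one full window passed, the old count is stale
            self.prev = self.curr if elapsed < 2 * self.window else 0
            self.curr = 0
            self.curr_start = now
            elapsed = 0.0
        # Weighted estimate: part of the previous window still "counts"
        weight = (self.window - elapsed) / self.window
        if self.prev * weight + self.curr < self.limit:
            self.curr += 1
            return True
        return False

limiter = SlidingWindow(limit=5)
accepted = sum(limiter.allow() for _ in range(10))
print(accepted)  # 5 of 10 back-to-back requests accepted
```

Only two integers per client are stored, which is why this approximation is popular: it gets most of the fairness of a true sliding log at nearly fixed-window memory cost.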

HTTP 429 & Retry-After

When a request is rate limited, the correct HTTP response is 429 Too Many Requests. The server should include a Retry-After header telling the client when to try again.

  • Retry-After: 30 — wait 30 seconds before retrying
  • X-RateLimit-Limit: total allowed requests
  • X-RateLimit-Remaining: requests left in current window
  • X-RateLimit-Reset: Unix timestamp when the window resets
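A server-side 429 response carrying these headers might be assembled like this (header names as listed above; the function and values are illustrative):

```python
import time

def rate_limited_response(limit: int, remaining: int, reset_at: int) -> dict:
    """Build a 429 response with the conventional rate-limit headers."""
    return {
        "status": 429,
        "headers": {
            # Seconds until capacity returns, never negative
            "Retry-After": str(max(0, reset_at - int(time.time()))),
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(remaining),
            "X-RateLimit-Reset": str(reset_at),
        },
    }

resp = rate_limited_response(limit=100, remaining=0,
                             reset_at=int(time.time()) + 30)
print(resp["status"], resp["headers"]["Retry-After"])
```

Note that the `X-RateLimit-*` names are a widely used convention rather than a formal standard, so exact header names vary between APIs.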

💡 Pro Tip: Clients should implement exponential backoff — doubling the wait time on each retry — to avoid a thundering herd effect when limits reset.
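A client-side sketch of that advice: honor `Retry-After` when the server sends it, otherwise double the delay each attempt, and add jitter so rejected clients don't all retry in lockstep. The `send` callback and parameter names are illustrative:

```python
import random
import time

def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0):
    """Retry a rate-limited call with exponential backoff and full jitter.
    `send` returns (status, retry_after_seconds_or_None)."""
    for attempt in range(max_retries):
        status, retry_after = send()
        if status != 429:
            return status
        # Prefer the server's Retry-After hint; otherwise double each attempt
        delay = retry_after if retry_after is not None else min(cap, base * 2 ** attempt)
        # Full jitter spreads retries out, avoiding a thundering herd
        time.sleep(random.uniform(0, delay))
    return 429  # retries exhausted; surface the error to the caller

# Fake endpoint: rejects twice with a short Retry-After, then succeeds
responses = iter([(429, 0.01), (429, 0.01), (200, None)])
result = call_with_backoff(lambda: next(responses))
print(result)  # 200
```

Jitter matters as much as the doubling: without it, every client that was rejected at the same moment sleeps the same amount and retries at the same moment again.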