System Design Series

Rate Limiter Simulator

Watch API requests get accepted or rejected in real time. Experiment with Token Bucket, Fixed Window, and Sliding Window algorithms.


Token Bucket: Tokens refill at a steady rate (1 token every 800ms). Each request consumes one token. Bucket capacity: 8. Requests are rejected when the bucket is empty — but short bursts are allowed.

Visual Guide

  • White packet: Request in transit to the rate limiter.
  • Green packet: Allowed — forwarded to the API server.
  • Red packet: Rate limited — dropped with HTTP 429.
  • Amber dots: Remaining tokens in the bucket.

How to use

1. Click Start Traffic to begin sending API requests.
2. Switch Algorithms to compare Token Bucket vs Window strategies.
3. Increase Traffic Intensity to Burst to trigger rejections.
4. Watch the Accept rate drop — and the tokens empty out.

Quick Guide: Rate Limiting

Understanding the basics in 30 seconds

How It Works

  • Client sends a request to the API
  • Rate limiter checks the current token count (or window counter)
  • If within the limit: request is forwarded to the server
  • If over the limit: HTTP 429 is returned immediately
  • Tokens refill (or window resets) over time, restoring capacity
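The flow above can be sketched as a single check function. A minimal fixed-window variant follows (the limit and window size are illustrative, not from any particular API):

```python
import time

# Minimal fixed-window check (illustrative limits): at most LIMIT
# requests per WINDOW-second interval; the counter resets each window.
LIMIT, WINDOW = 5, 1.0
count, window_start = 0, time.monotonic()

def allow():
    global count, window_start
    now = time.monotonic()
    if now - window_start >= WINDOW:      # window elapsed: reset the counter
        count, window_start = 0, now
    if count < LIMIT:
        count += 1
        return True                       # forward to the server
    return False                          # caller responds with HTTP 429

results = [allow() for _ in range(7)]     # 7 back-to-back requests
print(results)  # first 5 accepted, last 2 rejected
```

The same `allow()` shape applies to every algorithm in this article; only the bookkeeping inside it changes.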

Key Benefits

  • Protects APIs from DDoS and abusive clients
  • Ensures fair usage across all consumers
  • Prevents a single client from starving others
  • Reduces infrastructure costs during traffic spikes
  • Enables tiered pricing (free vs paid API limits)

Real-World Uses

  • Stripe: 100 req/s per API key (Token Bucket)
  • GitHub API: 5,000 req/hour for authenticated users
  • Twitter/X API: Rate limits per endpoint per 15-min window
  • AWS API Gateway: Configurable per-route throttling
  • NGINX: limit_req_zone directive for web server throttling

Rate Limiting in Production

How real-world APIs protect themselves from abuse, traffic spikes, and runaway clients.

Token Bucket

Tokens refill at a fixed rate and are consumed one per request. An empty bucket means the request is rejected — but bursts up to the full bucket size are always allowed.

  • Smooth, predictable refill rate
  • Naturally absorbs short bursts
  • Used by: Stripe API, AWS API Gateway

Fixed & Sliding Window

Count requests inside a time window. Fixed windows reset sharply — a client can fire double the limit right at the boundary. Sliding windows prevent this with a rolling counter.

  • Fixed: Simple, low memory — boundary-burst risk
  • Sliding: Fairer, prevents edge spikes
  • Used by: GitHub API, Twitter/X API

HTTP 429 — Too Many Requests

When a rate limiter rejects a request, the server responds with HTTP 429 and a Retry-After header. Well-behaved clients implement exponential backoff — doubling the wait time between retries — to avoid a thundering herd where all rejected clients retry simultaneously and immediately.

Rate Limiting Algorithms Explained

Token Bucket

The most widely used algorithm. A bucket holds up to N tokens. One token is consumed per request. Tokens refill at a fixed rate. If the bucket is empty, the request is rejected — but short bursts are naturally absorbed as long as tokens are available.

Example: Stripe API

  • Bucket size: 100 tokens
  • Refill rate: 100 tokens / second
  • A client can burst 100 requests instantly, then sustain 100 req/s
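A lazy-refill token bucket is a few lines of code. This sketch uses the Stripe-style numbers above; the class and parameter names are illustrative, not Stripe's implementation:

```python
import time

class TokenBucket:
    """Token bucket: `rate` tokens/sec refill, bursts up to `capacity`."""
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)      # start full: full burst available
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=100, rate=100.0)
burst = sum(bucket.allow() for _ in range(150))  # 150 back-to-back requests
print(burst)  # ~100 accepted: the full burst, plus any tokens refilled mid-loop
```

The "lazy" refill is the key trick: instead of a background timer adding tokens, the bucket computes how many tokens accrued since the last request. This keeps the limiter a pure function of timestamps, which is why it distributes well.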

Fixed Window vs Sliding Window

Both count requests inside a time window, but differ at the boundaries:

Fixed Window

Counter resets sharply every N seconds. A client can fire 2× the limit by sending at the end of one window and the start of the next.

Sliding Window

Uses a rolling counter weighted across the current and previous window. Eliminates boundary bursts. Fairer but slightly more memory-intensive.
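The weighted rolling counter can be sketched as follows. This is one common approximation (two counters, previous window weighted by its overlap with the rolling window); names and the reset logic are illustrative:

```python
import time

class SlidingWindow:
    """Sliding-window counter: the previous window's count is weighted by
    the fraction of it still inside the rolling window."""
    def __init__(self, limit: int, window: float = 1.0):
        self.limit, self.window = limit, window
        self.curr_start = time.monotonic()
        self.curr = 0        # requests in the current window
        self.prev = 0        # requests in the previous window

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:                 # roll the windows forward
            # If more than one full window passed, the old count is stale
            self.prev = self.curr if elapsed < 2 * self.window else 0
            self.curr = 0
            self.curr_start = now
            elapsed = 0.0
        # Weighted estimate: part of the previous window still "counts"
        weight = (self.window - elapsed) / self.window
        if self.prev * weight + self.curr < self.limit:
            self.curr += 1
            return True
        return False

limiter = SlidingWindow(limit=5)
accepted = sum(limiter.allow() for _ in range(10))
print(accepted)  # 5 of 10 back-to-back requests accepted
```

Only two integers per client are stored, which is why this approximation is popular: it gets most of the fairness of a true sliding log at nearly fixed-window memory cost.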

HTTP 429 & Retry-After

When a request is rate limited, the correct HTTP response is 429 Too Many Requests. The server should include a Retry-After header telling the client when to try again.

  • Retry-After: 30 — wait 30 seconds before retrying
  • X-RateLimit-Limit: total allowed requests
  • X-RateLimit-Remaining: requests left in current window
  • X-RateLimit-Reset: Unix timestamp when the window resets
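A server-side 429 response carrying these headers might be assembled like this (header names as listed above; the function and values are illustrative):

```python
import time

def rate_limited_response(limit: int, remaining: int, reset_at: int) -> dict:
    """Build a 429 response with the conventional rate-limit headers."""
    return {
        "status": 429,
        "headers": {
            # Seconds until capacity returns, never negative
            "Retry-After": str(max(0, reset_at - int(time.time()))),
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(remaining),
            "X-RateLimit-Reset": str(reset_at),
        },
    }

resp = rate_limited_response(limit=100, remaining=0,
                             reset_at=int(time.time()) + 30)
print(resp["status"], resp["headers"]["Retry-After"])
```

Note that the `X-RateLimit-*` names are a widely used convention rather than a formal standard, so exact header names vary between APIs.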

💡 Pro Tip: Clients should implement exponential backoff — doubling the wait time on each retry — to avoid a thundering herd effect when limits reset.
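A client-side sketch of that advice: honor `Retry-After` when the server sends it, otherwise double the delay each attempt, and add jitter so rejected clients don't all retry in lockstep. The `send` callback and parameter names are illustrative:

```python
import random
import time

def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0):
    """Retry a rate-limited call with exponential backoff and full jitter.
    `send` returns (status, retry_after_seconds_or_None)."""
    for attempt in range(max_retries):
        status, retry_after = send()
        if status != 429:
            return status
        # Prefer the server's Retry-After hint; otherwise double each attempt
        delay = retry_after if retry_after is not None else min(cap, base * 2 ** attempt)
        # Full jitter spreads retries out, avoiding a thundering herd
        time.sleep(random.uniform(0, delay))
    return 429  # retries exhausted; surface the error to the caller

# Fake endpoint: rejects twice with a short Retry-After, then succeeds
responses = iter([(429, 0.01), (429, 0.01), (200, None)])
result = call_with_backoff(lambda: next(responses))
print(result)  # 200
```

Jitter matters as much as the doubling: without it, every client that was rejected at the same moment sleeps the same amount and retries at the same moment again.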