Question

Design a rate limiter that can be used as middleware for an API. It should support different rate limits per user and per endpoint.

Accepted Answer

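A rate limiter controls how many requests a client can make within a time window. The most common algorithms are token bucket (a bucket holds up to a fixed number of tokens that refill at a constant rate, and each request consumes one), sliding window log (store the timestamp of every request and count those that fall inside the window), and sliding window counter (a hybrid that keeps counts for the current and previous fixed windows and weights the previous count by how much of it still overlaps the rolling window, approximating the log with constant memory per key).

Token bucket is one of the most widely used because it handles bursty traffic naturally: the bucket accumulates tokens during quiet periods, up to its capacity, so short bursts are allowed while the refill rate still caps the long-run average.

To support different limits per user and per endpoint, look up the limit (capacity and refill rate) for each request from configuration, letting a per-user rule override the per-endpoint default, and keep a separate bucket for every user/endpoint pair.

For a distributed system, you need shared state, and Redis is a common choice. Store the token count and last refill timestamp under a per-user/endpoint key. The read-refill-consume step must be atomic to prevent race conditions between servers: a Lua script runs atomically on the Redis server, whereas MULTI/EXEC alone cannot branch on a value read inside the transaction, so it has to be paired with WATCH and retried when the watched key changes.

Return 429 Too Many Requests when the limit is exceeded, and include X-RateLimit-Remaining and Retry-After headers so clients can back off gracefully.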
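The token bucket described above can be sketched in a few lines of Python. This is a minimal, single-process sketch, not a prescribed API: the class name `TokenBucket`, the injectable `clock` parameter (added so time can be faked in tests), and starting the bucket full are illustrative assumptions. Tokens are refilled lazily from the elapsed time on each call rather than by a background timer.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_rate` tokens/second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start full: a new client may burst immediately
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True, or return False to reject the request."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens will be available (0.0 if they already are)."""
        self._refill()
        return max(0.0, cost - self.tokens) / self.refill_rate


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    bucket = TokenBucket(capacity=3, refill_rate=1.0, clock=lambda: now[0])
    print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
    now[0] += 1.0                               # one second later: one token refilled
    print(bucket.allow())                       # True
```

Because refill is computed from the stored timestamp, the only per-key state is the token count and the last refill time, which is exactly the pair the answer suggests keeping in Redis.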
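For comparison, the sliding window log can be sketched the same way. `SlidingWindowLog` and its `clock` parameter are hypothetical names; this single-process sketch records only accepted requests and evicts timestamps older than the window before counting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLog:
    """Allows at most `limit` requests in any rolling window of `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.log: Deque[float] = deque()  # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Evict timestamps that have fallen out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


if __name__ == "__main__":
    t = [0.0]  # fake clock for the demo
    limiter = SlidingWindowLog(limit=2, window=10.0, clock=lambda: t[0])
    print(limiter.allow(), limiter.allow(), limiter.allow())  # True True False
    t[0] = 10.0  # the two requests at t=0 have left the 10-second window
    print(limiter.allow())                                    # True
```

The deque can hold up to `limit` timestamps per key, which is the memory cost the sliding window counter is designed to avoid.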
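The sliding window counter replaces the per-request log with two counters. The sketch below assumes one common formulation of the estimate, `previous_count * (fraction of the previous window still inside the rolling window) + current_count`; the class and attribute names are hypothetical.

```python
import time
from typing import Callable


class SlidingWindowCounter:
    """Approximates a rolling window of `window` seconds with two fixed-window counters.

    Memory is O(1) per key: a window index and two counts, instead of one
    timestamp per request as in the sliding window log.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.current_index = int(clock() // window)  # which fixed window we are in
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window)
        if index != self.current_index:
            # Roll over. If more than one window has passed, the previous window was empty.
            self.previous_count = self.current_count if index == self.current_index + 1 else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the previous fixed window still covered by the rolling window.
        elapsed_in_current = now - index * self.window
        weight = 1.0 - elapsed_in_current / self.window
        estimated = self.previous_count * weight + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False


if __name__ == "__main__":
    t = [0.0]  # fake clock for the demo
    limiter = SlidingWindowCounter(limit=10, window=60.0, clock=lambda: t[0])
    print(sum(limiter.allow() for _ in range(12)))  # 10
    t[0] = 90.0  # halfway through the next window: previous count weighted by 0.5
    print(sum(limiter.allow() for _ in range(12)))  # 5
```

At t = 90 the rolling window covers the second half of the first fixed window, so its 10 requests count as 5 and only 5 more are admitted. The estimate assumes requests in the previous window were evenly spread, which is why this approach is an approximation.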
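Putting the pieces together, a framework-agnostic middleware can resolve a limit per user and endpoint, keep one token bucket per (user, endpoint) key, and return 429 with the headers mentioned above. Everything here is a hypothetical sketch: the policy tables (`ENDPOINT_LIMITS`, `USER_OVERRIDES`), the dict-based request and tuple-based response types, the `ratelimit:{user}:{path}` key format, and the `rate_limit` wrapper are illustrative, and the in-process dict stands in for the shared Redis state a multi-server deployment would need.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Limit:
    capacity: int          # maximum burst size (bucket capacity)
    refill_per_sec: float  # sustained rate (tokens added per second)


# Hypothetical policy: per-endpoint defaults, optionally overridden per user.
DEFAULT_LIMIT = Limit(capacity=60, refill_per_sec=1.0)
ENDPOINT_LIMITS: Dict[str, Limit] = {
    "/search": Limit(capacity=10, refill_per_sec=1.0),
    "/upload": Limit(capacity=2, refill_per_sec=0.1),
}
USER_OVERRIDES: Dict[Tuple[str, str], Limit] = {
    ("partner-42", "/search"): Limit(capacity=100, refill_per_sec=10.0),
}

# Hypothetical framework-agnostic types: a request is {"user": ..., "path": ...}
# and a response is (status, headers, body).
Request = Dict[str, str]
Response = Tuple[int, Dict[str, str], str]
Handler = Callable[[Request], Response]


def resolve_limit(user: str, path: str) -> Limit:
    """Most specific rule wins: per-user override, then per-endpoint, then global default."""
    override = USER_OVERRIDES.get((user, path))
    if override is not None:
        return override
    return ENDPOINT_LIMITS.get(path, DEFAULT_LIMIT)


def rate_limit(handler: Handler, clock: Callable[[], float] = time.monotonic) -> Handler:
    """Wrap `handler` so every (user, endpoint) pair gets its own token bucket."""
    # key -> (tokens, last_refill). An in-process dict standing in for shared
    # storage such as Redis: not shared across servers and not thread-safe.
    buckets: Dict[str, Tuple[float, float]] = {}

    def middleware(request: Request) -> Response:
        user, path = request["user"], request["path"]
        limit = resolve_limit(user, path)
        key = f"ratelimit:{user}:{path}"
        now = clock()
        tokens, last = buckets.get(key, (float(limit.capacity), now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(limit.capacity, tokens + (now - last) * limit.refill_per_sec)
        if tokens < 1:
            buckets[key] = (tokens, now)
            # Retry-After takes whole seconds, so round the wait up.
            retry = math.ceil((1 - tokens) / limit.refill_per_sec)
            headers = {"X-RateLimit-Remaining": "0", "Retry-After": str(retry)}
            return 429, headers, "Too Many Requests"
        buckets[key] = (tokens - 1, now)
        status, headers, body = handler(request)
        return status, {**headers, "X-RateLimit-Remaining": str(int(tokens - 1))}, body

    return middleware


if __name__ == "__main__":
    t = [0.0]  # fake clock for the demo
    app = rate_limit(lambda req: (200, {}, "ok"), clock=lambda: t[0])
    for _ in range(3):
        status, headers, _body = app({"user": "alice", "path": "/upload"})
        print(status, headers)
    # 200 {'X-RateLimit-Remaining': '1'}
    # 200 {'X-RateLimit-Remaining': '0'}
    # 429 {'X-RateLimit-Remaining': '0', 'Retry-After': '10'}
```

In a real web framework the same logic would sit in its middleware or decorator hook. Across multiple servers, the read-refill-consume step inside `middleware` is the part that must run atomically in Redis, for example as a Lua script.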

Design a Rate Limiter (System Design Interview)
