Design a Rate Limiter (System Design Interview)
The Interview Question
Design a rate limiter that can be used as middleware for an API. It should support different rate limits per user and per endpoint.
Expert Answer
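A rate limiter controls how many requests a client can make within a time window. Three algorithms come up most often:
- Token bucket: tokens refill at a fixed rate up to a maximum capacity, and each request consumes one token.
- Sliding window log: store the timestamp of each request and count the ones that fall inside the window. This is exact, but memory grows with the limit for every client.
- Sliding window counter: a hybrid that keeps only the counts for the current and previous fixed windows and weights the previous count by how much of it still overlaps the sliding window. It approximates the log with far less memory.
Token bucket is the most widely used because it naturally handles bursty traffic: the bucket accumulates tokens during quiet periods (up to its capacity), allowing short bursts.
For a distributed system you need shared state, and Redis is the usual choice. Store the token count and last refill timestamp per user/endpoint key. The read, check, and update must happen atomically, or two servers can read the same count and both admit a request. Run the logic as a Lua script (Redis executes each script atomically) or use WATCH with MULTI/EXEC for optimistic locking; MULTI/EXEC on its own cannot read a value and act on it inside the transaction. Return 429 Too Many Requests when the limit is exceeded, and include X-RateLimit-Remaining and Retry-After headers so clients can back off gracefully.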
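The sliding window log can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions: the class name `SlidingWindowLog` is hypothetical, the caller supplies the `now` timestamp, only accepted requests are logged (some variants log rejected ones too), and a distributed version would keep each log in shared storage rather than a local dict.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """Exact sliding-window limiter: one stored timestamp per accepted request."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        log = self.logs[key]
        # Evict timestamps outside the window (now - window, now].
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False  # rejected requests are not logged
```

Each key can hold up to `limit` timestamps, which is exactly the memory cost that the sliding window counter trades away for an approximation.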
Key Points to Hit in Your Answer
- Token bucket is a common industry choice (Stripe and AWS API Gateway, for example, describe token-bucket throttling)
- Redis for distributed rate limiting with atomic Lua scripts
- Include headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (a widely followed convention rather than a formal standard), plus Retry-After on 429 responses
- Race conditions are the main challenge in distributed environments: the read, check, and update of a counter must happen atomically
- Sliding window counter balances accuracy with memory efficiency
- Consider rate limiting by: user ID, IP address, API key, or endpoint
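A sketch of the sliding window counter, under the same assumptions as a single-process illustration: `SlidingWindowCounter` is a hypothetical name, timestamps are caller-supplied and non-decreasing per key, and state lives in a local dict. It stores two counts per key and treats the previous window's requests as evenly spread, which is where the approximation comes from. The key can combine whichever identifiers you limit on, for example a hypothetical `f"{user_id}:{endpoint}"`.

```python
import math


class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters per key."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # key -> (index of current fixed window, current count, previous count)
        self.state = {}

    def allow(self, key: str, now: float) -> bool:
        index = math.floor(now / self.window)  # fixed window that `now` falls in
        cur_index, cur, prev = self.state.get(key, (index, 0, 0))
        if index == cur_index + 1:
            # Rolled into the next window: the old current count becomes "previous".
            cur_index, cur, prev = index, 0, cur
        elif index > cur_index + 1:
            # Idle for at least one full window: nothing recent to count.
            cur_index, cur, prev = index, 0, 0
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now - index * self.window) / self.window
        # Weight the previous window by the part still inside the sliding window.
        estimate = prev * (1 - elapsed) + cur
        if estimate + 1 > self.limit:
            self.state[key] = (cur_index, cur, prev)
            return False
        self.state[key] = (cur_index, cur + 1, prev)
        return True
```

With a limit of 10 per 60 seconds, 9 requests at t=50 followed by a burst at t=70 admit only 2 more, because 9 × 5/6 = 7.5 of the previous window's requests still count. A plain fixed-window counter would reset at t=60 and admit 10.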
Code Example
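-- Redis Lua script for token bucket (Redis runs each script atomically)
-- KEYS[1]: bucket key, e.g. one per user and endpoint
-- ARGV[1]: bucket capacity; ARGV[2]: refill rate in tokens per second;
-- ARGV[3]: current time in seconds (callers must share a consistent clock)
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
-- Missing fields come back as false, so a new bucket starts full
local tokens = tonumber(data[1]) or max_tokens
local last_refill = tonumber(data[2]) or now
-- Refill tokens for the elapsed time (clamped at 0 in case of clock skew)
local elapsed = math.max(0, now - last_refill)
tokens = math.min(max_tokens, tokens + elapsed * refill_rate)
if tokens >= 1 then
  tokens = tokens - 1
  -- HSET with several field/value pairs replaces the deprecated HMSET
  redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
  -- EXPIRE needs a whole number of seconds: keep about twice the full-refill time
  redis.call('EXPIRE', key, math.max(1, math.ceil(max_tokens / refill_rate * 2)))
  return 1 -- allowed
else
  -- Nothing is written: the next call recomputes the refill from the stored state
  return 0 -- rate limited
end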
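The script needs a running Redis server, so one way to sanity-check its arithmetic is to mirror the same update rule in plain Python and replay a request pattern. This is a hypothetical harness (`token_bucket_step` and `run` are illustrative names), not a substitute for the atomic script: nothing in it is shared between processes.

```python
def token_bucket_step(state, max_tokens, refill_rate, now):
    """Mirror of the Lua script's logic; returns (allowed, new_state).

    `state` is (tokens, last_refill), or None for a missing key, standing in
    for the Redis hash. Unlike the script, this is neither shared nor atomic.
    """
    if state is None:
        tokens, last_refill = float(max_tokens), now  # new bucket starts full
    else:
        tokens, last_refill = state
    elapsed = max(0.0, now - last_refill)
    tokens = min(max_tokens, tokens + elapsed * refill_rate)
    if tokens >= 1:
        return True, (tokens - 1, now)
    return False, state  # denied: like the script, leave the stored state as-is


def run(times, max_tokens=5, refill_rate=1.0):
    """Replay requests at the given timestamps against one bucket."""
    state, results = None, []
    for t in times:
        allowed, state = token_bucket_step(state, max_tokens, refill_rate, t)
        results.append(allowed)
    return results
```

With a capacity of 5 and a refill rate of 1 token per second, a burst of 7 requests at t=0 admits 5 and rejects 2; by t=2.5, 2.5 tokens have refilled, so two more requests pass and the third is rejected. To call the real script from Python, redis-py's `register_script` returns a callable that accepts `keys=` and `args=`.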
What Interviewers Are Really Looking For
Compare at least two algorithms and explain the tradeoff: token bucket for burstiness, sliding window for strict limits. The Redis race condition discussion is critical; if you mention using Lua scripts (or WATCH with MULTI/EXEC) for atomicity, that's a strong signal. Bonus: discuss where the rate limiter sits (API gateway vs. application middleware).
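To make the placement question concrete, here is a framework-free sketch of application middleware that picks a limit per endpoint, keeps a separate bucket per user, and sets the 429 status and rate-limit headers. Everything here is an assumption for illustration: `InMemoryTokenBucket` and `rate_limited` are hypothetical names, requests are plain dicts with `user_id` and `path`, handlers return `(status, headers, body)`, and the in-memory state only works within a single process. In a distributed deployment the bucket would be replaced by a call to the Redis script, and the same check could run in an API gateway instead.

```python
import math


class InMemoryTokenBucket:
    """Per-key token buckets kept in process memory (single instance only)."""

    def __init__(self, capacity: int, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.buckets = {}  # key -> (tokens, last_refill)

    def take(self, key: str, now: float) -> tuple[bool, float]:
        """Try to consume one token; return (allowed, tokens left afterwards)."""
        tokens, last = self.buckets.get(key, (float(self.capacity), now))
        elapsed = max(0.0, now - last)
        tokens = min(float(self.capacity), tokens + elapsed * self.refill_rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[key] = (tokens, now)
        return allowed, tokens


def rate_limited(handler, limiters, default, clock):
    """Wrap `handler` so each request is checked against its endpoint's limit."""

    def wrapped(request):
        limiter = limiters.get(request["path"], default)  # per-endpoint limit
        key = f"{request['user_id']}:{request['path']}"   # per-user bucket
        allowed, tokens = limiter.take(key, clock())
        headers = {
            "X-RateLimit-Limit": str(limiter.capacity),
            "X-RateLimit-Remaining": str(math.floor(tokens)),
        }
        if not allowed:
            # Whole seconds until one token has refilled.
            headers["Retry-After"] = str(math.ceil((1 - tokens) / limiter.refill_rate))
            return 429, headers, "Too Many Requests"
        status, handler_headers, body = handler(request)
        return status, {**handler_headers, **headers}, body

    return wrapped
```

With a `/search` limit of 2 requests and 1 token per second, a third immediate request from the same user gets a 429 with `Retry-After: 1`, while other users and other endpoints keep their own buckets.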
Practice This Question with AI Grading
Reading about interview questions is a start — but practicing with real-time AI feedback is how you actually get better. Goliath Prep grades your answers instantly and tells you exactly what you're missing.