Design a Distributed Key-Value Store (System Design Interview)
The Interview Question
Design a distributed key-value store that supports get(key) and put(key, value) with high availability and partition tolerance.
Expert Answer
This question is essentially asking you to design a simplified DynamoDB or Cassandra. Start with the CAP theorem tradeoff — you need to choose between consistency and availability during network partitions. Most modern systems choose AP (available + partition-tolerant) with eventual consistency. The core components: data partitioning uses consistent hashing with virtual nodes to distribute keys across servers evenly and minimize redistribution when nodes join or leave. Replication copies each key to N nodes (typically 3) walking clockwise on the hash ring. For writes, use a quorum-based approach: W replicas must acknowledge a write before returning success. For reads, R replicas are queried and the latest version wins (using vector clocks or timestamps for conflict resolution). The formula: if W + R > N, you get strong consistency; if W + R <= N, you get better availability with eventual consistency. Handle node failures with hinted handoff (temporarily store writes for a downed node on its neighbor) and anti-entropy using Merkle trees (detect and repair inconsistencies between replicas in the background).
Key Points to Hit in Your Answer
- CAP theorem: choose AP (eventual consistency) or CP (strong consistency) during partitions
- Consistent hashing with virtual nodes for even distribution
- Replication factor N, write quorum W, read quorum R — W+R>N means strong consistency
- Vector clocks or timestamps for conflict detection/resolution
- Hinted handoff for temporary node failures
- Merkle trees for anti-entropy and replica synchronization
- Gossip protocol for cluster membership and failure detection
Code Example
// Consistent hashing pseudocode
class ConsistentHash {
constructor(nodes, virtualNodesPerNode = 150) {
this.ring = new SortedMap();
for (const node of nodes) {
for (let i = 0; i < virtualNodesPerNode; i++) {
const hash = md5(`${node}:${i}`);
this.ring.set(hash, node);
}
}
}
getNode(key) {
const hash = md5(key);
// Find first node clockwise from key's position
const entry = this.ring.ceiling(hash) || this.ring.first();
return entry.value;
}
getReplicaNodes(key, n = 3) {
// Walk clockwise, collect N distinct physical nodes
// Skip virtual nodes that map to same physical node
}
}
What Interviewers Are Really Looking For
This is an advanced question that tests distributed systems knowledge. The interviewer wants you to discuss tradeoffs explicitly: consistency vs. availability, latency vs. durability. Mention the quorum formula (W+R>N). Explaining vector clocks or hinted handoff demonstrates deep knowledge. Structure matters — start with single-node, then distribute.
Practice This Question with AI Grading
Reading about interview questions is a start — but practicing with real-time AI feedback is how you actually get better. Goliath Prep grades your answers instantly and tells you exactly what you're missing.
Start Practicing Free →