Distributed Systems
Consistent Hashing
A way to spread keys across servers so that adding or removing one moves only a small fraction of them — not all of them.
Updated 2026-06-09
Consistent Hashing
When you split data across N servers, the obvious scheme is server = hash(key) % N.
It's simple, it balances well — and it falls apart the moment N changes.
The problem with modulo hashing
Say you have 4 cache servers and use hash(key) % 4. Add a fifth server and the
formula becomes % 5. Almost every key now maps to a different server:
| key hash | % 4 | % 5 |
|---|---|---|
| 10 | 2 | 0 |
| 11 | 3 | 1 |
| 12 | 0 | 2 |
For a cache, that means a near-total miss storm — every client suddenly looks in the wrong place, all the load slams the origin at once, and you can take down the very database the cache was protecting. Removing a node is just as bad.
Goal: when the number of servers changes, only the keys that have to move should move — ideally about
k/nof them, wherekis the number of keys andnthe number of servers.
The ring
Consistent hashing puts both keys and servers on the same circular hash space
(imagine 0 … 2³²-1 wrapped into a ring):
- Hash each server to a point on the ring.
- Hash each key to a point on the ring.
- A key belongs to the first server you reach going clockwise from the key's position.
[S1]
k3 k1
┌───────────────┐
[S3] ring [S2]
└───────────────┘
k2
k1 → S2 k2 → S3 k3 → S1 (first server clockwise)
Why it survives changes
- Add a server: drop its point on the ring. It only steals the keys sitting between it and the previous server, clockwise. Every other key stays put.
- Remove a server (or it crashes): its keys fall to the next server clockwise. Nobody else is touched.
Either way, on average only k/n keys move — not all of them.
Virtual nodes (the practical fix)
Placing each server at a single point makes the ring lumpy: by chance one server can own a huge arc and get overloaded, and when a node dies all its load lands on one neighbor. The fix is virtual nodes — hash each physical server to many points on the ring (e.g. 100–200 each):
- Load spreads more evenly across physical servers.
- When a node leaves, its share is redistributed across many others, not dumped on one.
- You can give a more powerful machine more virtual nodes to send it proportionally more traffic.
Where it shows up
Consistent hashing is the backbone of partitioning in Dynamo-style and wide-column stores (DynamoDB, Cassandra, Riak), distributed caches (memcached client rings), CDNs, and sharded load balancers — anywhere nodes come and go and you can't afford to reshuffle everything when they do.
