Load Balancers
A traffic cop that spreads requests across many servers so none gets overwhelmed — the component that turns one box into a scalable, fault-tolerant pool.
Updated 2026-06-09
A load balancer sits in front of a pool of servers and distributes incoming traffic across them, so no single server crashes under load. It also detects unhealthy servers and routes around them. Almost every distributed architecture has one — NGINX, HAProxy, AWS ELB. Designing one tests your grasp of networking, high availability, and scale.
1. Requirements
A general-purpose balancer typically needs to:
- Distribute traffic across backends using configurable algorithms.
- Health-check backends and remove unhealthy ones automatically.
- Support sticky sessions (optional), SSL termination, and both Layer 4 (TCP/UDP) and Layer 7 (HTTP) balancing.
Non-functionally: 99.99% uptime with no single point of failure, sub-millisecond added latency, ~1M requests/sec at peak, horizontal scalability.
2. What the numbers tell us
At 1M RPS, ~500K concurrent connections, and ~500 bytes of state per connection:
- Network bandwidth is the real constraint — assuming a few KB per request and response, 1M RPS works out to tens of Gbps, which means multiple NICs or multiple LB nodes. A single 10 Gbps card won't do.
- Memory is cheap — 500K × 500B ≈ 250 MB. Connection tracking is not the bottleneck (SSL session caches are the bigger consumer).
- Health checks are negligible — a few hundred probes/sec against 1M RPS.
- Horizontal scaling is mandatory — design for multiple LB nodes from the start.
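The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch: the connection and state figures come from the text, but the average bytes per request/response exchange (`ASSUMED_BYTES_PER_EXCHANGE`) is an assumption chosen only for illustration.

```python
# Back-of-envelope sizing for the figures quoted above.
PEAK_RPS = 1_000_000
CONCURRENT_CONNECTIONS = 500_000
STATE_BYTES_PER_CONNECTION = 500
# Assumption (not from the text): average request + response bytes on the wire.
ASSUMED_BYTES_PER_EXCHANGE = 4_000


def connection_memory_mb(connections: int, bytes_each: int) -> float:
    """Memory needed for connection tracking, in megabytes."""
    return connections * bytes_each / 1_000_000


def bandwidth_gbps(rps: int, bytes_per_exchange: int) -> float:
    """Sustained throughput in gigabits per second."""
    return rps * bytes_per_exchange * 8 / 1_000_000_000


memory_mb = connection_memory_mb(CONCURRENT_CONNECTIONS, STATE_BYTES_PER_CONNECTION)
gbps = bandwidth_gbps(PEAK_RPS, ASSUMED_BYTES_PER_EXCHANGE)
print(f"connection table: {memory_mb:.0f} MB")
print(f"bandwidth: {gbps:.0f} Gbps")
```

With these inputs the connection table is 250 MB while the traffic is 32 Gbps, which is why bandwidth, not memory, drives the node count.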
3. Architecture: data plane vs control plane
┌──────────────── Control plane (slow) ────────────────┐
│ Config API → Config Manager → Config Store │
│ Health Checker │
└───────────────────┬──────────────────────────────────┘
push config / health
┌───────────────────▼──────────────── Data plane (fast) ┐
Clients ─▶ Frontend Listener ─▶ Routing Engine ─▶ Backend 1/2/3 │
└───────────────────────────────────────────────────────┘
The data plane forwards every request, so it must be fast and mostly in-memory. The control plane (config changes, health checks) runs at a slower cadence.
- Frontend listener — accepts TCP connections, parses HTTP for L7 routing, and hands off to the routing engine (using `epoll`/`kqueue` to handle hundreds of thousands of connections).
- Routing engine — the brain. Holds the backend list, each backend's health and connection counts, and the configured algorithm; picks a healthy backend per request.
- Backend pool — a logical group of interchangeable servers (you might have one pool per service).
4. Health monitoring
A background health checker probes each backend every ~5s and flips its state only after several consecutive results agree, never on a single failure (which could be a network blip). This prevents flapping.
| Type | How | Use |
|---|---|---|
| TCP | Open/close a connection | Basic reachability — doesn't prove the app works |
| HTTP | GET /health, expect 2xx | Web apps — verifies the app can serve |
| Custom | Run a script | Complex checks (DB connectivity, queue depth) |
A typical config: check every 5s, 3s timeout, 3 consecutive fails → unhealthy, 2 passes → healthy again. When a backend fails, the engine stops sending it new traffic but lets in-flight requests finish — connection draining — before fully removing it.
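The threshold logic in that config can be sketched as a small state machine. The class and function names here (`HealthState`, `tcp_probe`) are hypothetical, and the sketch leaves out scheduling and connection draining.

```python
import socket
from dataclasses import dataclass


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP check: succeeds if a connection opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


@dataclass
class HealthState:
    """Tracks one backend's health using consecutive-result thresholds."""
    fail_threshold: int = 3   # consecutive failures before marking unhealthy
    pass_threshold: int = 2   # consecutive passes before marking healthy again
    healthy: bool = True
    streak: int = 0           # consecutive results that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the (possibly updated) health flag."""
        if probe_ok == self.healthy:
            # The result agrees with the current state, so any streak is broken.
            self.streak = 0
            return self.healthy
        self.streak += 1
        threshold = self.pass_threshold if probe_ok else self.fail_threshold
        if self.streak >= threshold:
            self.healthy = probe_ok
            self.streak = 0
        return self.healthy
```

A single failed probe followed by a success leaves the backend healthy; only three failures in a row take it out of rotation, and two passes in a row bring it back.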
5. Load balancing algorithms
The single most important decision. A good one spreads load evenly, avoids overloading any server, fits the traffic shape, and runs fast (it executes on every request).
| Algorithm | Stateful? | Adapts? | Best for |
|---|---|---|---|
| Round robin | No | No | Identical backends, uniform requests |
| Weighted round robin | No | No | Known capacity differences |
| Least connections | Per-node | Yes | Variable request costs |
| Weighted least connections | Per-node | Yes | Mixed servers in production |
| IP hash | No | No | Basic session stickiness |
| Consistent hash | No | No | Dynamic pools, caching |
- Round robin — `counter % n`. Trivial and stateless, but blind: it ignores server power, request cost, and slow backends.
- Weighted round robin — assign weights so a beefier box gets proportionally more traffic. Smooth implementations interleave (`1,3,1,2,1,3`) rather than bursting all of a backend's share at once. Still static — won't react if a backend slows down.
- Least connections — send each request to whoever has the fewest open connections. Self-balancing: a backend stuck on slow requests accumulates connections and naturally gets fewer. Needs per-node connection counts.
- Weighted least connections — pick the lowest connections ÷ weight ratio. Combines capacity-awareness with adaptivity; a common production default.
- IP hash — `hash(client_ip) % n` gives stickiness with no stored state. But NAT funnels thousands of users to one backend, distribution can be lumpy, and changing the backend count reshuffles almost everyone.
- Consistent hashing — put backends and requests on a ring; a request goes to the first backend clockwise. Adding/removing a backend remaps only ~`1/n` of requests instead of most of them. Virtual nodes even out the distribution. Ideal when backends scale up/down often or when cache locality matters. (See consistent hashing.)
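Two of the algorithms in the list are small enough to sketch directly. The `Backend` type and function names are hypothetical. The weighted round robin variant shown is the "smooth" scheme popularised by NGINX, offered here as one possible implementation; the exact interleaving it produces depends on the implementation and on tie-breaking.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1
    active_connections: int = 0
    current_weight: int = 0  # scratch score used only by smooth weighted round robin


def pick_smooth_wrr(backends: list[Backend]) -> Backend:
    """Raise every score by its weight, pick the highest score,
    then lower the winner's score by the total weight."""
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current_weight += b.weight
    chosen = max(backends, key=lambda b: b.current_weight)  # ties go to the earliest
    chosen.current_weight -= total
    return chosen


def pick_weighted_least_conn(backends: list[Backend]) -> Backend:
    """Pick the backend with the lowest connections-to-weight ratio."""
    return min(backends, key=lambda b: b.active_connections / b.weight)


pool = [Backend("b1", weight=3), Backend("b2", weight=1), Backend("b3", weight=2)]
order = [pick_smooth_wrr(pool).name for _ in range(6)]
print(order)  # each backend appears in proportion to its weight, interleaved
```

Over one full cycle of six picks, `b1` is chosen three times, `b3` twice, and `b2` once, without any backend receiving its whole share in a burst. For weighted least connections, a backend with weight 4 and 20 open connections (ratio 5) is preferred over one with weight 1 and 10 open connections (ratio 10).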
Recommendation: start with weighted least connections for HTTP. Use consistent hashing for caches or auto-scaling pools.
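For the consistent hashing case, a minimal ring with virtual nodes might look like the following. `HashRing`, the `vnodes` count of 100, and the key format are illustrative assumptions; SHA-256 is used only for its even spread, not for security.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, backends, vnodes: int = 100):
        # Each backend is placed on the ring many times to even out its share.
        self._points = sorted(
            (ring_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        # First point clockwise from the key's position, wrapping past the end.
        idx = bisect.bisect_right(self._hashes, ring_hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"user-{i}" for i in range(10_000)]
before = HashRing(["b1", "b2", "b3", "b4"])
after = HashRing(["b1", "b2", "b3", "b4", "b5"])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
print(f"remapped: {len(moved) / len(keys):.1%}")
```

Adding a fifth backend should remap roughly a fifth of the keys, and every remapped key lands on the new backend; with `hash(key) % n` most keys would move instead.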
6. Session persistence (sticky sessions)
If a backend keeps session state in local memory, a user's later requests must reach the same backend.
- Cookie-based (best for HTTP) — the LB sets `SERVERID=B1`; subsequent requests carry it and route straight to B1. Survives IP changes and LB restarts, needs no shared state.
- Source IP — route by client IP. Works for non-HTTP, but breaks under corporate NAT and mobile IP changes.
- External session store (best long-term) — make backends stateless by storing sessions in Redis. Any backend can serve any request; no stickiness needed. Requires app changes.
Prefer stateless backends with an external store. If you must be sticky, use cookies — avoid IP-based persistence unless you have no choice.
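Cookie-based stickiness can be sketched with the standard library's cookie parser. The `route` function and its `pick` parameter are hypothetical; `pick` stands in for whichever balancing algorithm is configured, and defaults to `min` only to keep the example deterministic.

```python
from http.cookies import SimpleCookie
from typing import Callable, Optional


def route(
    cookie_header: str,
    healthy: set[str],
    pick: Callable[[set[str]], str] = min,
) -> tuple[str, Optional[str]]:
    """Return (backend_id, Set-Cookie value or None).

    A request pinned to a healthy backend goes straight there. Otherwise a
    backend is picked and a new SERVERID cookie is issued.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    pinned = cookies["SERVERID"].value if "SERVERID" in cookies else None
    if pinned in healthy:
        return pinned, None

    chosen = pick(healthy)
    out = SimpleCookie()
    out["SERVERID"] = chosen
    out["SERVERID"]["path"] = "/"
    out["SERVERID"]["httponly"] = True
    return chosen, out["SERVERID"].OutputString()
```

Because the pin travels in the cookie, no per-client table is kept on the balancer, and a client whose pinned backend has gone unhealthy is simply re-pinned on the next request.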
7. Layer 4 vs Layer 7
| | Layer 4 (TCP/UDP) | Layer 7 (HTTP) |
|---|---|---|
| Sees | IPs, ports, protocol | URL, headers, cookies, body |
| Speed | Fastest (packet forwarding) | Slower (parses HTTP) |
| SSL | Pass-through only | Full termination + inspection |
| Routing | By port | By path, host, method |
| Stickiness | IP-based only | Cookie/header/URL |
Use L4 for raw speed or non-HTTP protocols (databases, game servers). Use L7 for content-based routing, SSL termination, header rewriting, or caching — most web apps. A common pattern is an L4 balancer fronting a tier of L7 balancers.
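The difference in what each layer can route on is easy to see in code. In this sketch the pool names, ports, and hostnames are all hypothetical: the L4 selector sees only the destination port, while the L7 selector matches on the parsed host and path.

```python
from dataclasses import dataclass
from typing import Optional

# Layer 4: only addresses and ports are visible, so routing keys on the port.
L4_POOLS = {5432: "postgres-pool", 7777: "game-pool"}  # hypothetical ports and pools


def select_pool_l4(dst_port: int) -> Optional[str]:
    return L4_POOLS.get(dst_port)


@dataclass(frozen=True)
class L7Rule:
    pool: str
    host: Optional[str] = None   # None matches any Host header
    path_prefix: str = "/"       # "/" matches any path


# Layer 7: the parsed request is visible, so rules can match host and path.
L7_RULES = [
    L7Rule("api-pool", host="api.example.com"),
    L7Rule("static-pool", path_prefix="/static/"),
    L7Rule("web-pool"),  # catch-all
]


def select_pool_l7(host: str, path: str) -> Optional[str]:
    for rule in L7_RULES:  # first matching rule wins
        if rule.host not in (None, host):
            continue
        if path.startswith(rule.path_prefix):
            return rule.pool
    return None
```

Two requests arriving on the same port, such as `/static/app.js` and `/checkout`, look identical to the L4 selector but reach different pools through the L7 rules.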
8. SSL/TLS termination
Terminate TLS at the load balancer: clients connect over HTTPS, the LB decrypts, routes on the plaintext request, and forwards plain HTTP to backends.
- Why: certificates live in one place (renew once, not per server), backends spend CPU on app logic not crypto, and the LB can read the request for L7 routing.
- Trade-off: LB↔backend traffic is unencrypted — fine inside a trusted VPC. If you need end-to-end encryption, either re-encrypt (decrypt, inspect, re-encrypt — double the crypto) or use SSL passthrough (route on the TLS SNI, no content routing).
- Best practice: TLS 1.2+, strong ciphers (ECDHE + AES-GCM), HTTP/2, OCSP stapling, automated renewal.
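Part of the best-practice list maps directly onto Python's standard `ssl` module. The helper name and its parameters are hypothetical. It covers only the protocol floor, the cipher selection, and ALPN advertisement; OCSP stapling and certificate renewal are outside what this sketch handles, and actually speaking HTTP/2 needs a separate library.

```python
import ssl
from typing import Optional


def build_server_tls_context(
    cert_file: Optional[str] = None, key_file: Optional[str] = None
) -> ssl.SSLContext:
    """Server-side TLS context with a TLS 1.2 floor and ECDHE + AES-GCM suites."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restricts the TLS 1.2 suites; TLS 1.3 suites are managed separately by OpenSSL.
    ctx.set_ciphers("ECDHE+AESGCM")
    # Advertise HTTP/2 first, falling back to HTTP/1.1.
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    if cert_file:
        # A real listener needs a certificate; omitted here so the sketch runs as-is.
        ctx.load_cert_chain(cert_file, key_file)
    return ctx


ctx = build_server_tls_context()
print(sorted(c["name"] for c in ctx.get_ciphers()))
```

The printed list shows which suites the local OpenSSL build actually enabled, which is worth checking because the result varies by platform.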
9. The load balancer must not be the SPOF
Run multiple LB nodes. Two patterns:
Active-Passive (failover): one node owns a Virtual IP; a standby watches via heartbeats and claims the VIP (gratuitous ARP) if the primary goes silent. Simple, no state sync — but half your capacity sits idle. Failover in ~1–3s (VRRP).
Clients ─▶ VIP 10.0.0.1 ─▶ Active LB ──▶ backends
▲ heartbeat
Standby LB (claims VIP on failure)
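The standby's decision logic reduces to a timer over missed heartbeats. The sketch below is illustrative only: `StandbyMonitor`, the 0.5s interval, and the limit of three missed beats are assumptions, and in practice this job is usually delegated to a VRRP daemon rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    heartbeat_interval: float = 0.5  # seconds between expected heartbeats (assumed)
    missed_limit: int = 3            # silent intervals tolerated before takeover (assumed)
    last_heartbeat: float = 0.0
    owns_vip: bool = False

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Called periodically; returns True once this node owns the VIP."""
        silence = now - self.last_heartbeat
        if not self.owns_vip and silence > self.heartbeat_interval * self.missed_limit:
            # A real node would bind the VIP and send a gratuitous ARP here.
            self.owns_vip = True
        return self.owns_vip


monitor = StandbyMonitor()
monitor.on_heartbeat(now=10.0)
print(monitor.tick(now=11.0))  # primary has been silent for 1.0s
print(monitor.tick(now=12.0))  # primary has been silent for 2.0s
```

With these settings the takeover threshold is 1.5s of silence, so the first check leaves the VIP alone and the second claims it. The sketch keeps the VIP once claimed; whether a recovered primary should take it back is a separate policy choice.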
Active-Active: all nodes serve traffic simultaneously, via DNS round-robin, an upstream L4 balancer, or anycast (all nodes share one IP; the network routes to the nearest — how Cloudflare works). Full utilisation and near-instant failover, but more complex and needs a shared session store if sticky.
| | Active-Passive | Active-Active |
|---|---|---|
| Utilisation | 50% | 100% |
| Failover | 1–3s | Near-instant |
| Sticky sessions | Easy (one node) | Needs shared store |
| Complexity | Simpler | Higher |
For most high-traffic systems, active-active wins despite the complexity.
On connections during failover: you can't move an open TCP connection between machines, so clients see resets. Don't fight it; design for stateless failover with fast detection, client retries, and an external session store.
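On the client side, retrying after a reset can be as simple as the wrapper below. The function name and defaults are hypothetical, and the sketch assumes the request is idempotent, since a reset can arrive after the server has already acted on it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `send` on connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:  # includes ConnectionResetError
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting; production code would usually add jitter to avoid synchronized retries from many clients.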
Where does data live?
A load balancer barely needs a database. The hot path is all in-memory; only the control plane persists:
| Data | Lives in | Why |
|---|---|---|
| Active connections, health, counters | In-memory per LB node | Microsecond-fast |
| Sticky session mappings | Redis (shared) | Shared across nodes for active-active |
| Backend & pool config | etcd / Consul | Persistent, versioned, survives restarts |
The key insight: the request path should never touch disk or cross a network for a routing decision — config is loaded into memory and updated via push.
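One way to keep routing lookups in memory while still accepting pushed config is to swap an immutable snapshot. `RoutingTable` and its methods are hypothetical names; the sketch assumes the control plane pushes complete, versioned snapshots rather than deltas.

```python
import threading
from types import MappingProxyType


class RoutingTable:
    def __init__(self) -> None:
        self._snapshot = MappingProxyType({})
        self._version = 0
        self._write_lock = threading.Lock()  # serialises writers only

    def apply_push(self, version: int, pools: dict) -> bool:
        """Control plane: install a full config snapshot if it is newer."""
        with self._write_lock:
            if version <= self._version:
                return False  # stale or duplicate push, ignore it
            # Build a read-only copy first, then swap the reference in one step,
            # so readers see either the old snapshot or the new one.
            self._snapshot = MappingProxyType(
                {name: tuple(backends) for name, backends in pools.items()}
            )
            self._version = version
            return True

    def backends_for(self, pool: str) -> tuple:
        """Data plane: in-memory lookup with no lock, disk, or network access."""
        return self._snapshot.get(pool, ())


table = RoutingTable()
table.apply_push(1, {"web": ["10.0.1.1:8080", "10.0.1.2:8080"]})
print(table.backends_for("web"))
```

The persistent store (etcd or Consul in the table above) stays on the control-plane side of `apply_push`; the per-request call is only `backends_for`.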
