What is a load balancer?

A load balancer sits in front of a pool of servers and distributes incoming traffic across them so no single server is overwhelmed. It also health-checks backends and routes around unhealthy ones. Examples include NGINX, HAProxy, and AWS ELB. It turns a single server into a scalable, fault-tolerant pool.

What is the difference between Layer 4 and Layer 7 load balancing?

A Layer 4 load balancer routes on IPs, ports, and protocol by forwarding packets — it is fastest and works for non-HTTP protocols, but can only do IP-based stickiness and TLS pass-through. A Layer 7 load balancer parses HTTP, so it can route by URL, host, method, headers, or cookies, terminate SSL, and inspect content; it is slower but suits most web apps. A common pattern is an L4 balancer fronting a tier of L7 balancers.

Which load balancing algorithm should I use?

For HTTP, start with weighted least connections: it sends each request to the backend with the lowest connections-to-weight ratio, combining capacity-awareness with self-balancing adaptivity. Use round robin for identical backends and uniform requests, and consistent hashing for caches or auto-scaling pools where adding or removing a backend should remap only about 1/n of requests.

How do you stop the load balancer from being a single point of failure?

Run multiple load balancer nodes. Active-passive uses a Virtual IP owned by one node with a standby that claims it on failure (simple, but half the capacity is idle). Active-active has all nodes serve traffic at once via DNS round-robin, an upstream L4 balancer, or anycast (full utilization and near-instant failover, but more complex and needing a shared session store if sticky). Active-active usually wins for high-traffic systems.

Building Blocks

Load Balancers

A traffic cop that spreads requests across many servers so none gets overwhelmed — the component that turns one box into a scalable, fault-tolerant pool.

Updated 2026-06-09

Load Balancers

A load balancer sits in front of a pool of servers and distributes incoming traffic across them, so no single server crashes under load. It also detects unhealthy servers and routes around them. Almost every distributed architecture has one — NGINX, HAProxy, AWS ELB. Designing one tests your grasp of networking, high availability, and scale.

1. Requirements

A general-purpose balancer typically needs to:

Distribute traffic across backends using configurable algorithms.
Health-check backends and remove unhealthy ones automatically.
Support sticky sessions (optional), SSL termination, and both Layer 4 (TCP/UDP) and Layer 7 (HTTP) balancing.

Non-functionally: 99.99% uptime with no single point of failure, sub-millisecond added latency, ~1M requests/sec at peak, horizontal scalability.

2. What the numbers tell us

At 1M RPS, ~500K concurrent connections, and ~500 bytes of state per connection:

Network bandwidth is the real constraint — tens of Gbps means multiple NICs or multiple LB nodes. A single 10 Gbps card won't do.
Memory is cheap — 500K × 500B ≈ 250 MB. Connection tracking is not the bottleneck (SSL session caches are the bigger consumer).
Health checks are negligible — a few hundred probes/sec against 1M RPS.
Horizontal scaling is mandatory — design for multiple LB nodes from the start.

3. Architecture: data plane vs control plane

        ┌──────────────── Control plane (slow) ─────────────────┐
        │   Config API ─▶ Config Manager ─▶ Config Store        │
        │   Health Checker                                      │
        └───────────────────┬───────────────────────────────────┘
                  push config / health
        ┌───────────────────▾──────────────── Data plane (fast) ┐
Clients ─▶ Frontend Listener ─▶ Routing Engine ─▶ Backend 1/2/3 │
        └───────────────────────────────────────────────────────┘

The data plane forwards every request, so it must be fast and mostly in-memory. The control plane (config changes, health checks) runs at a slower cadence.

Frontend listener — accepts TCP connections, parses HTTP for L7 routing, hands off to the routing engine (using epoll/kqueue to handle hundreds of thousands of connections).
Routing engine — the brain. Holds the backend list, each backend's health and connection counts, and the configured algorithm; picks a healthy backend per request.
Backend pool — a logical group of interchangeable servers (you might have one pool per service).

4. Health monitoring

A background health checker probes each backend every ~5s and flips its state after a few consecutive results — never on a single failure (could be a network blip), which prevents flapping.

Type	How	Use
TCP	Open/close a connection	Basic reachability — doesn't prove the app works
HTTP	`GET /health`, expect 2xx	Web apps — verifies the app can serve
Custom	Run a script	Complex checks (DB connectivity, queue depth)

A typical config: check every 5s, 3s timeout, 3 consecutive fails → unhealthy, 2 passes → healthy again. When a backend fails, the engine stops sending it new traffic but lets in-flight requests finish — connection draining — before fully removing it.

5. Load balancing algorithms

The single most important decision. A good one spreads load evenly, avoids overloading any server, fits the traffic shape, and runs fast (it executes on every request).

Algorithm	Stateful?	Adapts?	Best for
Round robin	No	No	Identical backends, uniform requests
Weighted round robin	No	No	Known capacity differences
Least connections	Per-node	Yes	Variable request costs
Weighted least connections	Per-node	Yes	Mixed servers in production
IP hash	No	No	Basic session stickiness
Consistent hash	No	No	Dynamic pools, caching

Round robin — counter % n. Trivial and stateless, but blind: it ignores server power, request cost, and slow backends.
Weighted round robin — assign weights so a beefier box gets proportionally more traffic. Smooth implementations interleave (1,3,1,2,1,3) rather than bursting all of a backend's share at once. Still static — won't react if a backend slows down.
Least connections — send each request to whoever has the fewest open connections. Self-balancing: a backend stuck on slow requests accumulates connections and naturally gets fewer. Needs per-node connection counts.
Weighted least connections — pick the lowest connections ÷ weight ratio. Combines capacity-awareness with adaptivity; a common production default.
IP hash — hash(client_ip) % n gives stickiness with no stored state. But NAT funnels thousands of users to one backend, distribution can be lumpy, and changing the backend count reshuffles almost everyone.
Consistent hashing — put backends and requests on a ring; a request goes to the first backend clockwise. Adding/removing a backend remaps only ~1/n of requests instead of most of them. Virtual nodes even out the distribution. Ideal when backends scale up/down often or when cache locality matters. (See consistent hashing.)

Recommendation: start with weighted least connections for HTTP. Use consistent hashing for caches or auto-scaling pools.

6. Session persistence (sticky sessions)

If a backend keeps session state in local memory, a user's later requests must reach the same backend.

Cookie-based (best for HTTP) — the LB sets SERVERID=B1; subsequent requests carry it and route straight to B1. Survives IP changes and LB restarts, needs no shared state.
Source IP — route by client IP. Works for non-HTTP, but breaks under corporate NAT and mobile IP changes.
External session store (best long-term) — make backends stateless by storing sessions in Redis. Any backend can serve any request; no stickiness needed. Requires app changes.

Prefer stateless backends with an external store. If you must be sticky, use cookies — avoid IP-based persistence unless you have no choice.

7. Layer 4 vs Layer 7

	Layer 4 (TCP/UDP)	Layer 7 (HTTP)
Sees	IPs, ports, protocol	URL, headers, cookies, body
Speed	Fastest (packet forwarding)	Slower (parses HTTP)
SSL	Pass-through only	Full termination + inspection
Routing	By port	By path, host, method
Stickiness	IP-based only	Cookie/header/URL

Use L4 for raw speed or non-HTTP protocols (databases, game servers). Use L7 for content-based routing, SSL termination, header rewriting, or caching — most web apps. A common pattern is an L4 balancer fronting a tier of L7 balancers.

8. SSL/TLS termination

Terminate TLS at the load balancer: clients connect over HTTPS, the LB decrypts, routes on the plaintext request, and forwards plain HTTP to backends.

Why: certificates live in one place (renew once, not per server), backends spend CPU on app logic not crypto, and the LB can read the request for L7 routing.
Trade-off: LB↔backend traffic is unencrypted — fine inside a trusted VPC. If you need end-to-end encryption, either re-encrypt (decrypt, inspect, re-encrypt — double the crypto) or use SSL passthrough (route on the TLS SNI, no content routing).
Best practice: TLS 1.2+, strong ciphers (ECDHE + AES-GCM), HTTP/2, OCSP stapling, automated renewal.

9. The load balancer must not be the SPOF

Run multiple LB nodes. Two patterns:

Active-Passive (failover): one node owns a Virtual IP; a standby watches via heartbeats and claims the VIP (gratuitous ARP) if the primary goes silent. Simple, no state sync — but half your capacity sits idle. Failover in ~1–3s (VRRP).

Clients ─▶ VIP 10.0.0.1 ─▶ Active LB ──▶ backends
                              ▴ heartbeat
                           Standby LB   (claims VIP on failure)

Active-Active: all nodes serve traffic simultaneously, via DNS round-robin, an upstream L4 balancer, or anycast (all nodes share one IP; the network routes to the nearest — how Cloudflare works). Full utilisation and near-instant failover, but more complex and needs a shared session store if sticky.

	Active-Passive	Active-Active
Utilisation	50%	100%
Failover	1–3s	Near-instant
Sticky sessions	Easy (one node)	Needs shared store
Complexity	Simpler	Higher

For most high-traffic systems, active-active wins despite the complexity.

On connections during failover: you can't move an open TCP connection between machines — clients see resets. Don't fight it; design for stateless failover with fast failover, client retries, and an external session store.

Where does data live?

A load balancer barely needs a database. The hot path is all in-memory; only the control plane persists:

Data	Lives in	Why
Active connections, health, counters	In-memory per LB node	Microsecond-fast
Sticky session mappings	Redis (shared)	Shared across nodes for active-active
Backend & pool config	etcd / Consul	Persistent, versioned, survives restarts

The key insight: the request path should never touch disk or cross a network for a routing decision — config is loaded into memory and updated via push.

Building Blocks

Load Balancers

A traffic cop that spreads requests across many servers so none gets overwhelmed — the component that turns one box into a scalable, fault-tolerant pool.

Updated 2026-06-09

Load Balancers

1. Requirements

A general-purpose balancer typically needs to:

Distribute traffic across backends using configurable algorithms.
Health-check backends and remove unhealthy ones automatically.
Support sticky sessions (optional), SSL termination, and both Layer 4 (TCP/UDP) and Layer 7 (HTTP) balancing.

Non-functionally: 99.99% uptime with no single point of failure, sub-millisecond added latency, ~1M requests/sec at peak, horizontal scalability.

2. What the numbers tell us

At 1M RPS, ~500K concurrent connections, and ~500 bytes of state per connection:

Network bandwidth is the real constraint — tens of Gbps means multiple NICs or multiple LB nodes. A single 10 Gbps card won't do.
Memory is cheap — 500K × 500B ≈ 250 MB. Connection tracking is not the bottleneck (SSL session caches are the bigger consumer).
Health checks are negligible — a few hundred probes/sec against 1M RPS.
Horizontal scaling is mandatory — design for multiple LB nodes from the start.

3. Architecture: data plane vs control plane

        ┌──────────────── Control plane (slow) ─────────────────┐
        │   Config API ─▶ Config Manager ─▶ Config Store        │
        │   Health Checker                                      │
        └───────────────────┬───────────────────────────────────┘
                  push config / health
        ┌───────────────────▾──────────────── Data plane (fast) ┐
Clients ─▶ Frontend Listener ─▶ Routing Engine ─▶ Backend 1/2/3 │
        └───────────────────────────────────────────────────────┘

The data plane forwards every request, so it must be fast and mostly in-memory. The control plane (config changes, health checks) runs at a slower cadence.

Frontend listener — accepts TCP connections, parses HTTP for L7 routing, hands off to the routing engine (using epoll/kqueue to handle hundreds of thousands of connections).
Routing engine — the brain. Holds the backend list, each backend's health and connection counts, and the configured algorithm; picks a healthy backend per request.
Backend pool — a logical group of interchangeable servers (you might have one pool per service).

4. Health monitoring

Type	How	Use
TCP	Open/close a connection	Basic reachability — doesn't prove the app works
HTTP	`GET /health`, expect 2xx	Web apps — verifies the app can serve
Custom	Run a script	Complex checks (DB connectivity, queue depth)

5. Load balancing algorithms

The single most important decision. A good one spreads load evenly, avoids overloading any server, fits the traffic shape, and runs fast (it executes on every request).

Algorithm	Stateful?	Adapts?	Best for
Round robin	No	No	Identical backends, uniform requests
Weighted round robin	No	No	Known capacity differences
Least connections	Per-node	Yes	Variable request costs
Weighted least connections	Per-node	Yes	Mixed servers in production
IP hash	No	No	Basic session stickiness
Consistent hash	No	No	Dynamic pools, caching

Round robin — counter % n. Trivial and stateless, but blind: it ignores server power, request cost, and slow backends.
Weighted round robin — assign weights so a beefier box gets proportionally more traffic. Smooth implementations interleave (1,3,1,2,1,3) rather than bursting all of a backend's share at once. Still static — won't react if a backend slows down.
Least connections — send each request to whoever has the fewest open connections. Self-balancing: a backend stuck on slow requests accumulates connections and naturally gets fewer. Needs per-node connection counts.
Weighted least connections — pick the lowest connections ÷ weight ratio. Combines capacity-awareness with adaptivity; a common production default.
IP hash — hash(client_ip) % n gives stickiness with no stored state. But NAT funnels thousands of users to one backend, distribution can be lumpy, and changing the backend count reshuffles almost everyone.
Consistent hashing — put backends and requests on a ring; a request goes to the first backend clockwise. Adding/removing a backend remaps only ~1/n of requests instead of most of them. Virtual nodes even out the distribution. Ideal when backends scale up/down often or when cache locality matters. (See consistent hashing.)

Recommendation: start with weighted least connections for HTTP. Use consistent hashing for caches or auto-scaling pools.

6. Session persistence (sticky sessions)

If a backend keeps session state in local memory, a user's later requests must reach the same backend.

Cookie-based (best for HTTP) — the LB sets SERVERID=B1; subsequent requests carry it and route straight to B1. Survives IP changes and LB restarts, needs no shared state.
Source IP — route by client IP. Works for non-HTTP, but breaks under corporate NAT and mobile IP changes.
External session store (best long-term) — make backends stateless by storing sessions in Redis. Any backend can serve any request; no stickiness needed. Requires app changes.

Prefer stateless backends with an external store. If you must be sticky, use cookies — avoid IP-based persistence unless you have no choice.

7. Layer 4 vs Layer 7

	Layer 4 (TCP/UDP)	Layer 7 (HTTP)
Sees	IPs, ports, protocol	URL, headers, cookies, body
Speed	Fastest (packet forwarding)	Slower (parses HTTP)
SSL	Pass-through only	Full termination + inspection
Routing	By port	By path, host, method
Stickiness	IP-based only	Cookie/header/URL

8. SSL/TLS termination

Terminate TLS at the load balancer: clients connect over HTTPS, the LB decrypts, routes on the plaintext request, and forwards plain HTTP to backends.

Why: certificates live in one place (renew once, not per server), backends spend CPU on app logic not crypto, and the LB can read the request for L7 routing.
Trade-off: LB↔backend traffic is unencrypted — fine inside a trusted VPC. If you need end-to-end encryption, either re-encrypt (decrypt, inspect, re-encrypt — double the crypto) or use SSL passthrough (route on the TLS SNI, no content routing).
Best practice: TLS 1.2+, strong ciphers (ECDHE + AES-GCM), HTTP/2, OCSP stapling, automated renewal.

9. The load balancer must not be the SPOF

Run multiple LB nodes. Two patterns:

Clients ─▶ VIP 10.0.0.1 ─▶ Active LB ──▶ backends
                              ▴ heartbeat
                           Standby LB   (claims VIP on failure)

	Active-Passive	Active-Active
Utilisation	50%	100%
Failover	1–3s	Near-instant
Sticky sessions	Easy (one node)	Needs shared store
Complexity	Simpler	Higher

For most high-traffic systems, active-active wins despite the complexity.

Where does data live?

A load balancer barely needs a database. The hot path is all in-memory; only the control plane persists:

Data	Lives in	Why
Active connections, health, counters	In-memory per LB node	Microsecond-fast
Sticky session mappings	Redis (shared)	Shared across nodes for active-active
Backend & pool config	etcd / Consul	Persistent, versioned, survives restarts

The key insight: the request path should never touch disk or cross a network for a routing decision — config is loaded into memory and updated via push.

Load Balancers

Load Balancers

1. Requirements

2. What the numbers tell us

3. Architecture: data plane vs control plane

4. Health monitoring

5. Load balancing algorithms

6. Session persistence (sticky sessions)

7. Layer 4 vs Layer 7

8. SSL/TLS termination

9. The load balancer must not be the SPOF

Where does data live?

Practice this

Load Balancers

Load Balancers

1. Requirements

2. What the numbers tell us

3. Architecture: data plane vs control plane

4. Health monitoring

5. Load balancing algorithms

6. Session persistence (sticky sessions)

7. Layer 4 vs Layer 7

8. SSL/TLS termination

9. The load balancer must not be the SPOF

Where does data live?

Practice this