L = λ × W
The Most Useful Equation in Capacity Planning
For any stable queue, regardless of its internal structure: L = λ × W, where:
- L = average number of items in the system (in progress or waiting)
- λ (lambda) = average arrival rate (items per unit time)
- W = average time each item spends in the system
The geometric reading: plot λ on one axis & W on the other. The product L is the area of the rectangle they form. Capacity planning lives inside this rectangle.
Why it matters: any two of the three quantities determine the third. Measure throughput & latency, you know occupancy. Measure occupancy & throughput, you know latency. The law is robust: it applies to web requests, restaurant tables, supermarket queues, & CPU pipelines without modification.
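A minimal Python sketch of the "any two determine the third" point. It assumes consistent units (a rate per second paired with a time in seconds). The helper name `solve_littles_law` & its keyword arguments are illustrative, not from any library.

```python
from typing import Optional


def solve_littles_law(
    L: Optional[float] = None,
    lam: Optional[float] = None,
    W: Optional[float] = None,
) -> float:
    """Given exactly two of L, lam (λ) & W, return the missing one via L = λ × W.

    Units must agree: if lam is items per second, W must be in seconds.
    """
    if sum(x is not None for x in (L, lam, W)) != 2:
        raise ValueError("provide exactly two of L, lam, W")
    if L is None:
        return lam * W   # occupancy from throughput & latency
    if W is None:
        return L / lam   # latency from occupancy & throughput
    return L / W         # throughput from occupancy & latency


if __name__ == "__main__":
    # Web-service figures: 200 req/s at 0.05 s average latency.
    print(solve_littles_law(lam=200, W=0.05))  # occupancy, about 10 in flight
    print(solve_littles_law(L=10, lam=200))    # latency, about 0.05 s
```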
Three concrete examples:
- A web service handles 200 req/s with average latency 50 ms (0.05 s). L = 200 × 0.05 = 10 in flight.
- A coffee shop serves 60 customers/hour with average dwell time 15 minutes (0.25 h). L = 60 × 0.25 = 15 customers inside.
- A backend pool handles 1500 req/s with average latency 200 ms (0.2 s). L = 1500 × 0.2 = 300 in flight.
Sizing implication: the worker count / thread count / connection count of a tier must be at least L (rounded up) to keep up. With fewer slots, arrivals outpace completions & the queue grows.
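A small sketch of that sizing floor, applied to the three examples above. `min_concurrency` is a hypothetical helper; it rounds L up to a whole number of slots, with a tiny tolerance so floating-point noise (e.g. 300.0000000001) does not add a spurious extra slot.

```python
import math


def min_concurrency(arrival_rate: float, avg_latency: float) -> int:
    """Smallest whole number of workers/threads/connections >= L = λ × W.

    This is a floor, not a target: a tier sized exactly at L has no headroom.
    """
    # Subtract a tiny tolerance so float noise just above an integer doesn't round up.
    return math.ceil(arrival_rate * avg_latency - 1e-9)


examples = {
    "web service": (200, 0.05),     # req/s, seconds
    "coffee shop": (60, 0.25),      # customers/hour, hours
    "backend pool": (1500, 0.2),    # req/s, seconds
}

for name, (lam, w) in examples.items():
    print(f"{name}: at least {min_concurrency(lam, w)} concurrent slots")
```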
Why Latency Explodes Past 80% Utilization
The Most Important Curve in Operations
Plot utilization on the x-axis (0% to 100%) & average wait time on the y-axis. The shape is one of the most consequential curves in capacity planning.
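The M/M/1 queueing model: for a single-server system with Poisson (random) arrivals & exponentially distributed (random) service times, the average time an item waits in queue before its service starts is:
W_q = ρ / (μ × (1 - ρ))
where ρ = λ / μ is utilization (0 to 1), μ is the service rate (items per unit time), & 1/μ is the average service time.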
The curve's shape:
- At ρ = 0.5 (50% util), wait time is small (1 service time).
- At ρ = 0.7 (70% util), wait time is ~2.3 service times.
- At ρ = 0.8 (80% util), wait time is ~4 service times.
- At ρ = 0.9 (90% util), wait time is ~9 service times.
- At ρ = 0.95 (95% util), wait time is ~19 service times.
- At ρ = 1.0 (100% util), wait time is unbounded: arrivals match capacity, so the queue grows without limit.
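The list above can be reproduced directly from the M/M/1 formula. This sketch assumes μ = 1 (one service per time unit), so W_q reads directly in service times; `mm1_queue_wait` is an illustrative name. ρ = 1.0 is rejected rather than returned as infinity, since the formula divides by zero there.

```python
def mm1_queue_wait(rho: float, mu: float) -> float:
    """Average time waiting in queue, W_q = ρ / (μ × (1 - ρ)), for an M/M/1 queue.

    rho is utilization λ/μ & must satisfy 0 <= rho < 1; mu is the service rate.
    """
    if not 0 <= rho < 1:
        raise ValueError("M/M/1 is only stable for 0 <= rho < 1")
    return rho / (mu * (1 - rho))


if __name__ == "__main__":
    mu = 1.0  # one service per time unit, so W_q is expressed in service times
    for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
        print(f"rho={rho:.2f}  W_q={mm1_queue_wait(rho, mu):.1f} service times")
```

The extra ρ = 0.99 row shows the climb continuing: roughly 99 service times.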
The knee: around 80% utilization, the curve bends sharply. Below the knee, capacity is comfortable; above, latency climbs faster than utilization does.
Practical reading: target 70% utilization in steady state, never 100%. The 30% 'headroom' is not waste; it is the price of bounded latency.
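A sketch of turning the 70% target into a replica count. It assumes each replica's service rate is measured at 100% busy & that traffic spreads evenly across replicas; `replicas_for_target` & its parameters are hypothetical.

```python
import math


def replicas_for_target(
    arrival_rate: float,
    per_replica_service_rate: float,
    target_util: float = 0.7,
) -> int:
    """Fewest replicas that keep per-replica utilization at or below target_util.

    Per-replica utilization = arrival_rate / (replicas * per_replica_service_rate),
    so replicas >= arrival_rate / (per_replica_service_rate * target_util).
    """
    if not 0 < target_util < 1:
        raise ValueError("target_util must be strictly between 0 and 1")
    needed = arrival_rate / (per_replica_service_rate * target_util)
    return math.ceil(needed - 1e-9)  # small tolerance against float noise
```

For example (hypothetical numbers), 700 req/s against replicas that each saturate at 100 req/s needs 10 replicas at a 0.7 target; 7 replicas would only cover it at 100% utilization, where queueing delay is unbounded.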
Sizing Across the Knee
Two scenarios:
Scenario A: 10 replicas running at 60% CPU. Latency p99 = 100 ms.
Scenario B: same fleet running at 90% CPU due to traffic growth. p99 = 600 ms.
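Same fleet, same code; only utilization changed. Utilization rose 1.5x (60% to 90%) while p99 rose 6x (100 ms to 600 ms): Scenario B sits well past the knee.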
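A rough consistency check, assuming each replica behaves like an M/M/1 queue with CPU utilization standing in for ρ. That is a strong simplification, & the scenario's p99 figures are not derived from it. Under that model, mean queueing delay grows by the same 6x factor between 60% & 90%, while mean time in system (queueing plus one service time) grows 4x. The helper names are illustrative.

```python
def mm1_queue_wait_in_service_times(rho: float) -> float:
    """M/M/1 mean queueing delay W_q, in multiples of the mean service time."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    return rho / (1 - rho)


def mm1_time_in_system_in_service_times(rho: float) -> float:
    """M/M/1 mean time in system: W_q plus one service time, i.e. 1 / (1 - rho)."""
    return mm1_queue_wait_in_service_times(rho) + 1.0


before, after = 0.6, 0.9  # Scenario A & Scenario B CPU utilization
queue_ratio = mm1_queue_wait_in_service_times(after) / mm1_queue_wait_in_service_times(before)
total_ratio = mm1_time_in_system_in_service_times(after) / mm1_time_in_system_in_service_times(before)
print(f"utilization grew {after / before:.1f}x")
print(f"queueing delay grew {queue_ratio:.1f}x, time in system grew {total_ratio:.1f}x")
```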
Size & Trigger Together
Synthesis
You can now apply Little's Law as a rectangle, read the queueing curve & its knee, & connect both to capacity decisions.
Apply both.
A backend tier handles 2,000 req/s with average latency 50 ms. Per-replica capacity is 80 req/s at 70% CPU. The surge factor is 2x, & you want to survive 3 simultaneous replica failures.
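One possible worked reading of this exercise, as a sketch rather than the intended answer. It assumes the 80 req/s per-replica figure already bakes in the 70% CPU target, that latency stays at 50 ms under surge, & that the surge can coincide with the 3 failures (so spares are added on top of surge capacity). `size_tier` is a hypothetical helper.

```python
import math


def size_tier(
    base_rate: float,
    avg_latency: float,
    per_replica_rate_at_target: float,
    surge_factor: float,
    failures_to_survive: int,
) -> dict:
    """Size for surge traffic at the utilization target, then add spare replicas."""
    peak_rate = base_rate * surge_factor
    in_flight_base = base_rate * avg_latency   # Little's Law at normal load
    in_flight_peak = peak_rate * avg_latency   # Little's Law at surge, if latency holds
    replicas_for_peak = math.ceil(peak_rate / per_replica_rate_at_target - 1e-9)
    return {
        "in_flight_base": in_flight_base,
        "in_flight_peak": in_flight_peak,
        "replicas_for_peak": replicas_for_peak,
        "total_replicas": replicas_for_peak + failures_to_survive,
        "in_flight_per_replica_at_peak": in_flight_peak / replicas_for_peak,
    }


if __name__ == "__main__":
    plan = size_tier(2000, 0.05, 80, 2.0, 3)
    for key, value in plan.items():
        print(f"{key}: {value}")
```

Under these assumptions the tier carries about 100 requests in flight normally & 200 at surge, needs 50 replicas for the surge plus 3 spares (53 in total), & holds about 4 requests in flight per replica. If the surge pushes replicas past the knee, W rises & so does L, so the 50 ms assumption is the one to check first.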
Companion Notes
This geometry-of companion recasts the Stateless Horizontal Scaling main lesson as quantitative geometry.
The next companion, geometry_of_ingress_egress_separation, recasts the network-boundary split as a bipartite graph with a cut vertex that the split removes.
Well done.