un — Geometry of Stateless Horizontal Scaling

L = λ × W

The Most Useful Equation in Capacity Planning

For any stable queue, regardless of its internal structure: L = λ × W, where:

- L = average number of items in the system (in progress or waiting)

- λ (lambda) = average arrival rate (items per unit time)

- W = average time each item spends in the system

The geometric reading: plot λ on one axis & W on the other. The product L is the area of the rectangle they form. Capacity planning lives inside this rectangle.

Why it matters: any two of the three quantities determine the third. Measure throughput & latency, you know occupancy. Measure occupancy & throughput, you know latency. The law is robust: it applies to web requests, restaurant tables, supermarket queues, & CPU pipelines without modification.

Three concrete examples:

- A web service handles 200 req/s with average latency 50 ms (0.05 s). L = 200 × 0.05 = 10 in flight.

- A coffee shop serves 60 customers/hour with average dwell time 15 minutes (0.25 h). L = 60 × 0.25 = 15 customers inside.

- A backend pool handles 1500 req/s with average latency 200 ms (0.2 s). L = 1500 × 0.2 = 300 in flight.
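The three examples above can be reproduced with a small helper. This is a minimal sketch: the function name `littles_law` & its parameter names are hypothetical, chosen for illustration. It solves L = λ × W for whichever of the three quantities is left out, which is the "any two determine the third" property in code form.

```python
def littles_law(L=None, arrival_rate=None, W=None):
    """Solve L = arrival_rate * W for the one argument left as None.

    Units must agree: req/s pairs with seconds, customers/hour with hours.
    """
    given = (L, arrival_rate, W)
    if sum(v is None for v in given) != 1:
        raise ValueError("provide exactly two of L, arrival_rate, W")
    if L is None:
        return arrival_rate * W
    if arrival_rate is None:
        return L / W
    return L / arrival_rate


# The three examples from the text.
web = littles_law(arrival_rate=200, W=0.05)      # web service, seconds
coffee = littles_law(arrival_rate=60, W=0.25)    # coffee shop, hours
backend = littles_law(arrival_rate=1500, W=0.2)  # backend pool, seconds
print(web, coffee, backend)
```

Running the same helper in the other direction, for instance `littles_law(L=10, arrival_rate=200)`, recovers the 0.05 s latency of the first example.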

Sizing implication: a tier needs at least L concurrent slots (workers, threads, or connections) to hold everything that is in flight at once. With fewer slots than L, the excess requests sit in a queue instead of being served.

Little's Law as area: λ on x, W on y, L = area

Your API tier handles 1,200 req/s with average latency 80 ms. Apply Little's Law to compute L. Then explain what changes (a) if traffic doubles to 2,400 req/s with latency unchanged, & (b) if traffic stays at 1,200 but latency rises to 160 ms. Which scenario produces a larger L, & what does that mean operationally?

Why Latency Explodes Past 80% Utilization

The Most Important Curve in Operations

Plot utilization on the x-axis (0% to 100%) & average wait time on the y-axis. The shape is one of the most consequential curves in capacity planning.

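The M/M/1 queueing model: for a single server with Poisson (random) arrivals & exponentially distributed (random) service times, the average time a request waits in the queue before service starts is:

W_q = ρ / (μ × (1 - ρ))

where ρ is utilization (0 to 1) & μ is the service rate. Because the mean service time is 1/μ, the wait measured in service times is ρ / (1 - ρ).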

The curve's shape:

- At ρ = 0.5 (50% util), wait time is small (1 service time).

- At ρ = 0.7 (70% util), wait time is ~2.3 service times.

- At ρ = 0.8 (80% util), wait time is ~4 service times.

- At ρ = 0.9 (90% util), wait time is ~9 service times.

- At ρ = 0.95 (95% util), wait time is ~19 service times.

- At ρ = 1.0 (100% util), wait time is unbounded: the queue grows without limit.
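The values in the list above come from expressing the M/M/1 wait in multiples of the mean service time 1/μ, which reduces the formula to ρ / (1 - ρ). A minimal sketch that tabulates them follows; the function name `mm1_wait_in_service_times` is hypothetical.

```python
def mm1_wait_in_service_times(rho):
    """Mean M/M/1 queue wait in multiples of the mean service time.

    W_q = rho / (mu * (1 - rho)); multiplying by mu gives rho / (1 - rho).
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1) for a stable queue")
    return rho / (1 - rho)


# Tabulate the utilization levels used in the text.
for rho in (0.5, 0.7, 0.8, 0.9, 0.95):
    wait = mm1_wait_in_service_times(rho)
    print(f"rho={rho:.2f}  wait={wait:.1f} service times")
```

The function rejects ρ = 1 rather than returning infinity, since the model has no stable average at that point.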

The knee: around 80% utilization, the curve bends sharply. Below the knee, capacity is comfortable; above, latency climbs faster than utilization does.
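One way to make "climbs faster than utilization does" concrete is the slope of the M/M/1 curve. Measured in service times, the wait is ρ / (1 - ρ), & its derivative with respect to ρ is 1 / (1 - ρ)². The sketch below (the function name `mm1_wait_slope` is hypothetical) shows how much wait one extra percentage point of utilization adds at different operating points.

```python
def mm1_wait_slope(rho):
    """Slope of rho / (1 - rho) with respect to rho: 1 / (1 - rho)^2."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1) for a stable queue")
    return 1.0 / (1.0 - rho) ** 2


# Added wait (in service times) per extra percentage point of utilization.
for rho in (0.5, 0.8, 0.9):
    per_point = mm1_wait_slope(rho) * 0.01
    print(f"rho={rho:.2f}  slope={mm1_wait_slope(rho):.0f}  "
          f"+1 point adds ~{per_point:.2f} service times")
```

Under this model the slope is about 4 at 50% utilization, 25 at 80%, & 100 at 90%, so the same one-point increase costs roughly 25 times more wait at 90% than at 50%.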

Practical reading: target 70% utilization for steady-state, never 100%. The 30% 'headroom' is not waste; it is the price of bounded latency.

Queueing curve with knee at 80% utilization

Sizing Across the Knee

Two scenarios:

Scenario A: 10 replicas running at 60% CPU. Latency p99 = 100 ms.

Scenario B: same fleet running at 90% CPU due to traffic growth. p99 = 600 ms.

Same fleet, same code, only utilization changed.

Explain why scenario B's latency is 6x worse despite only a 1.5x utilization increase, using the geometric shape of the queueing curve. Then propose: at what utilization should the team add capacity, & why that threshold instead of waiting for actual SLO violation?

Size & Trigger Together

Synthesis

You can now apply Little's Law as a rectangle, read the queueing curve & its knee, & connect both to capacity decisions.

Apply both.

A backend tier handles 2,000 req/s with average latency 50 ms. Per-replica capacity is 80 req/s at 70% CPU. Surge factor is 2x, & you want to survive 3 simultaneous replica failures.

Compute: (1) L using Little's Law at baseline; (2) replica count using the lesson formula (peak × surge / per-replica) + headroom; (3) at what observed utilization across the fleet should autoscaling trigger, & justify the threshold using the queueing curve.
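The replica formula quoted above can be sketched as a small function. This is one reading of it, under two assumptions that the text does not confirm: the division is rounded up to a whole replica, & headroom is a count of spare replicas (for example, the number of simultaneous failures to tolerate). The function & parameter names are hypothetical, & the example uses different numbers from the exercise.

```python
import math


def replica_count(peak_rps, surge_factor, per_replica_rps, spare_replicas):
    """(peak * surge / per-replica), rounded up, plus spare replicas.

    Assumption: headroom is expressed as whole spare replicas.
    """
    if per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    base = math.ceil(peak_rps * surge_factor / per_replica_rps)
    return base + spare_replicas


# Illustrative numbers only (not the exercise values).
example = replica_count(peak_rps=900, surge_factor=1.5,
                        per_replica_rps=50, spare_replicas=2)
print(example)
```

Rounding up before adding the spares keeps the base fleet able to carry the surge on its own, so the spares are pure failure budget.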

Companion Notes

This geometry-of lesson recasts the Stateless Horizontal Scaling main lesson as quantitative geometry.

The next companion, geometry_of_ingress_egress_separation, recasts the network-boundary split as a bipartite graph with a cut vertex that the split removes.
