Geometry of Capacity Planning

L = λ × W: A Rectangle

Little's Law: The Most Useful Equation in Capacity Planning

John Little proved in 1961 that for any stable queue, regardless of its internal structure: L = λ × W, where:

- L = average number of items in the system (queue + in service)

- λ (lambda) = average arrival rate of items per unit time

- W = average time each item spends in the system

The geometric reading: plot arrival rate λ on one axis and residence time W on the other. The product L is the area of the rectangle they form. Capacity planning lives inside this rectangle.

Why it matters: any two of the three quantities determine the third. If you measure throughput and latency, you know occupancy. If you measure occupancy and throughput, you know latency. The law is robust: it applies to web requests, restaurant tables, supermarket queues, and CPU pipelines without modification.

Three concrete examples:

- A web service handles 200 requests/second with average latency 50 ms (0.05 s). L = 200 × 0.05 = 10 requests in flight at any time.

- A coffee shop serves 60 customers/hour with average dwell time 15 minutes (0.25 h). L = 60 × 0.25 = 15 customers inside on average.

- A factory line produces 100 widgets/hour with each widget taking 2 hours end-to-end. L = 100 × 2 = 200 widgets in process.
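The three examples above can be reproduced with a few lines of Python. This is a minimal sketch; the function names (little_l, little_w, little_lam) are made up for illustration, and the only requirement is that rate and time use the same time unit.

```python
def little_l(lam: float, w: float) -> float:
    """Average occupancy L from arrival rate lam and residence time w."""
    return lam * w


def little_w(l: float, lam: float) -> float:
    """Average residence time W from occupancy l and arrival rate lam."""
    return l / lam


def little_lam(l: float, w: float) -> float:
    """Average arrival rate from occupancy l and residence time w."""
    return l / w


# Units must match: requests/second with seconds, customers/hour with hours.
web = little_l(200, 0.05)      # about 10 requests in flight
shop = little_l(60, 0.25)      # about 15 customers inside
factory = little_l(100, 2)     # about 200 widgets in process

print(web, shop, factory)
```

Solving for W or λ instead of L uses the same relation rearranged, which is the "any two determine the third" property in code form.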

Provisioning implication: L is the average number of concurrent in-flight items, so sizing for L is the baseline for sizing the system. The number of worker threads, database connections, or queue slots all derive from L, with headroom added on top because the average understates the peaks.
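One way to turn L into a pool size is to divide it by the utilization you are willing to run at. The sketch below assumes each in-flight item occupies exactly one worker; the function name pool_size, the 0.8 default, and the example numbers are illustrative assumptions, not values from this lesson.

```python
import math


def pool_size(arrival_rate: float, service_time: float,
              target_utilization: float = 0.8) -> int:
    """Workers needed so that average busy workers / pool size <= target.

    arrival_rate and service_time must share a time unit.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy = arrival_rate * service_time          # Little's Law: average busy workers
    # round() guards against float noise such as 10.000000000000002 before ceil.
    return math.ceil(round(busy / target_utilization, 9))


# Hypothetical service: 120 requests/second, 0.25 s per request.
print(pool_size(120, 0.25))         # sized with 20% headroom
print(pool_size(120, 0.25, 1.0))    # bare minimum, no headroom
```

Sizing at a target utilization of 1.0 gives a pool that is only adequate on average; any burst above the average immediately queues.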

Little's Law as a rectangle: λ on x, W on y, area = L

Sizing a Worker Pool

Your video transcoding service is sized for an average arrival rate of 30 transcoding jobs per minute, each taking 90 seconds end-to-end. The current worker pool has 30 workers.

Apply Little's Law to determine whether the current pool is adequately sized. Show your work. Then explain what changes if the arrival rate doubles, and what changes if individual transcode time doubles. Which scenario stresses the system more?

Why Latency Explodes Past 80% Utilization

The Most Important Curve in Capacity Planning

Plot utilization on the x-axis (0% to 100%) and average latency on the y-axis. The shape that emerges is one of the most consequential curves in operations: it explains why teams target utilization well below 100%, why reserved headroom is not waste, and why systems running 'efficiently' at high utilization fall over without warning.

The M/M/1 queueing curve: for a system with Poisson arrivals (random) and exponential service times (random), the average waiting time follows:

W_q = ρ / (μ(1-ρ))

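where ρ (rho) is the utilization (ρ = λ/μ, between 0 and 1) and μ is the service rate. The denominator (1-ρ) is the punchline: as ρ approaches 1, the denominator approaches 0, and waiting time approaches infinity.

Numerical examples (latency multiplier vs ρ for M/M/1). The multiplier is ρ / (1-ρ), the queue wait measured in mean service times (1/μ); it equals 1.0 at ρ = 0.5, which serves as the baseline: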

- ρ = 0.5: latency ratio 1.0 (baseline)

- ρ = 0.7: latency ratio ~2.3

- ρ = 0.8: latency ratio ~4.0

- ρ = 0.9: latency ratio ~9.0

- ρ = 0.95: latency ratio ~19.0

- ρ = 0.99: latency ratio ~99.0
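The table above can be regenerated directly from the formula. A minimal sketch, with the function name wait_in_service_times chosen here for illustration; it returns W_q multiplied by μ, that is, the queue wait counted in mean service times.

```python
def wait_in_service_times(rho: float) -> float:
    """M/M/1 mean queue wait W_q in units of the mean service time 1/mu."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1) for a stable queue")
    return rho / (1 - rho)


for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"rho={rho:.2f}  wait = {wait_in_service_times(rho):.1f} service times")
```

To get an absolute wait, multiply the result by the mean service time of the system being modeled.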

The elbow sits around 70-80% utilization. Below the elbow, adding load increases latency slowly. Above the elbow, latency explodes nonlinearly. This is why a common SRE rule of thumb is: target steady-state utilization below 80%, never run sustained above 90%.

Why traditional ops teams underestimate this: a server at 60% CPU 'looks busy' but has comfortable latency headroom. A server at 90% CPU 'looks productive' but is one workload bump away from latency catastrophe. The geometric truth: the curve's slope is the actual threat, not its current y-value.
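The claim that the slope is the threat can be checked numerically. Differentiating ρ / (1-ρ) with respect to ρ gives 1 / (1-ρ)^2, again in units of mean service time. The sketch below compares the same 5-point utilization bump at 60% and at 90%; the function names are illustrative.

```python
def wait(rho: float) -> float:
    """M/M/1 queue wait in units of mean service time."""
    return rho / (1 - rho)


def slope(rho: float) -> float:
    """Derivative of wait() with respect to rho: 1 / (1 - rho)^2."""
    return 1 / (1 - rho) ** 2


for rho in (0.6, 0.9):
    bump = wait(rho + 0.05) - wait(rho)   # cost of 5 more points of utilization
    print(f"rho={rho:.2f}  slope={slope(rho):.2f}  +5 points adds {bump:.2f} service times")
```

At 60% the bump adds a fraction of one service time; at 90% the same bump adds roughly ten service times.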

M/M/1 queueing curve: x = utilization, y = latency, elbow at ~80%

Reading the Curve

A team runs a service at 85% CPU utilization steady state. Current p99 latency is 200 ms. They are considering adding 30% more traffic to consolidate workload from another service that is being deprecated.

Using the queueing curve, predict what happens to latency as utilization moves from 85% to a nominal 110% (over capacity). Why can CPU utilization above 100% not be sustained, and what visible symptom replaces it? Recommend a target utilization for the consolidated workload and justify the headroom you are leaving.

Slope, Intercept, & the Forecast Cone

Reading Growth From a Slope

Forecasting demand reduces (in many cases) to drawing the right line through historical data. The geometric properties of that line (slope, intercept, and uncertainty cone) encode the entire forecast.

Linear trend (y = mx + b): appropriate for short windows or genuinely linear processes. Slope m is the growth rate per time unit. Intercept b is the starting value. Useful when growth is steady. Tends to underestimate when the process is actually compounding.

Exponential trend (y = b × e^(mx)): appropriate for compound growth such as viral adoption, user-network effects, and multiplicative seasonality. On a log-scale y-axis, exponential growth becomes linear, which makes slope estimation easier. Slope m on a natural-log scale is the continuous growth rate per time unit; the per-period growth is e^m - 1.
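The log-scale trick can be sketched in plain Python: fit an ordinary least-squares line to log(y) and exponentiate the intercept. The helper names and the synthetic 10%-per-month series below are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_exponential(xs, ys):
    """Fit y = b * e^(m*x) by fitting a line to log(y); returns (m, b)."""
    m, log_b = fit_line(xs, [math.log(y) for y in ys])
    return m, math.exp(log_b)


# Hypothetical series: starts at 1000 and grows 10% per month.
months = list(range(12))
volume = [1000 * 1.10 ** t for t in months]

m, b = fit_exponential(months, volume)
print(f"continuous rate m={m:.4f}, per-month growth={math.exp(m) - 1:.2%}, start b={b:.0f}")
```

Running fit_line on the raw volumes instead gives a single average slope, which is the linear model's view of the same data and will lag behind the later months.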

Piecewise linear: appropriate when growth has distinct regimes. A startup might grow slowly for 18 months, then have a viral inflection that produces 6 months of explosive growth, then plateau. Three linear segments fit this better than any single curve.
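A quick way to see distinct regimes is to compute the average slope of each segment. The sketch below assumes the breakpoints are already known (for example from a launch date); the function name and the synthetic 30-month series are hypothetical.

```python
def segment_slopes(xs, ys, breakpoints):
    """Average slope between consecutive breakpoints, given as indices into xs."""
    edges = [0, *breakpoints, len(xs) - 1]
    return [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in zip(edges, edges[1:])
    ]


# Hypothetical series: slow growth for 18 months, explosive for 6, then a plateau.
months = list(range(31))
volume = []
for t in months:
    if t <= 18:
        volume.append(100 + 5 * t)
    elif t <= 24:
        volume.append(190 + 50 * (t - 18))
    else:
        volume.append(490 + 2 * (t - 24))

print(segment_slopes(months, volume, breakpoints=[18, 24]))
```

Slopes that differ by an order of magnitude between segments are the signal that a single line or a single exponential will fit poorly.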

Forecast cone: the central estimate plus upper and lower bounds, drawn as a widening cone into the future. The cone's width grows with time because uncertainty compounds. A 4-week forecast might have ±10% bounds; a 12-month forecast often has ±50% or more.
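A cone can be sketched by attaching a relative band to a central projection and letting the band widen with the horizon. The widening rule below (proportional to the square root of the horizon, calibrated to ±10% at 4 weeks) is an assumption chosen for illustration; real bounds should come from the observed errors of past forecasts.

```python
import math


def forecast_cone(last_value, growth_per_week, horizon_weeks, width_at_4_weeks=0.10):
    """Return (week, lower, central, upper) tuples for weeks 1..horizon_weeks.

    Assumption: the relative half-width grows with sqrt(horizon), which is a
    modeling choice for this sketch and not a rule from the lesson.
    """
    points = []
    for h in range(1, horizon_weeks + 1):
        central = last_value * (1 + growth_per_week) ** h
        half_width = width_at_4_weeks * math.sqrt(h / 4)
        lower = max(0.0, central * (1 - half_width))
        upper = central * (1 + half_width)
        points.append((h, lower, central, upper))
    return points


for week, lower, central, upper in forecast_cone(1000, 0.02, 12):
    print(f"week {week:2d}: {lower:7.0f}  {central:7.0f}  {upper:7.0f}")
```

Whatever rule generates the band, the property to preserve is that the gap between upper and lower bounds never shrinks as the horizon grows.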

Seasonality decomposition: real demand combines trend + seasonal cycle + noise. Statistical libraries (statsmodels, Prophet) decompose a series into these three components, allowing the trend to be projected separately from the seasonal pattern. Geometrically, the trend is the underlying drift, the seasonality is the periodic ripple on top, and the noise is the residual jitter.
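To make the three components concrete, here is a hand-rolled additive decomposition in plain Python. It is a simplified sketch, not the algorithm of any particular library: it assumes an additive model and an odd period so that a simple centered moving average works, and it leaves the trend undefined near the edges. In practice a library routine such as statsmodels' seasonal_decompose is the more robust choice.

```python
def decompose_additive(series, period):
    """Split series into (trend, seasonal, residual) lists.

    Assumes series = trend + seasonal + noise and an odd `period`.
    Trend and residual are None where the moving-average window does not fit.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(series)

    # Trend: centered moving average over one full cycle.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period

    # Seasonal: mean detrended value at each position in the cycle, re-centered to sum to 0.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Hypothetical daily series: linear drift plus a weekly ripple, no noise.
weekly = [-3, -2, -1, 0, 1, 2, 3]
series = [100 + 2 * t + weekly[t % 7] for t in range(28)]
trend, seasonal, residual = decompose_additive(series, period=7)
print(trend[10], seasonal[:7])
```

On this noise-free input the recovered trend is the drift, the seasonal component is the weekly ripple, and the residual is zero wherever it is defined.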

Forecast cone: trend line, seasonal ripples, widening uncertainty bounds

Choosing a Trend Model

You have 24 months of monthly request volumes. Months 1-12 grew from 1M to 2M (linear-looking, +83K/month). Months 13-18 grew from 2M to 4M (steeper, +330K/month). Months 19-24 grew from 4M to 12M (much steeper still). Marketing confirms that a viral product feature launched in month 13, driving the inflection.

Which trend model fits best: pure linear, pure exponential, or piecewise linear? Justify your choice using the slope behavior. Then propose how to forecast months 25-30: explicit central estimate, upper bound, and lower bound. What real-world event could break either bound?

Capacity vs Demand as 2D Geometry

The Plot Every Capacity Team Lives Inside

Plot time on the x-axis. Plot demand and capacity on the y-axis as two separate lines. The vertical gap between them at any point in time is the headroom. The 2D area between the curves is the headroom envelope.

Three reference shapes:

- Healthy envelope: capacity line stays comfortably above demand line. The gap may narrow during peaks but never vanishes. The envelope is a band of safety.

- Closing envelope: capacity grows slower than demand. The gap narrows over time. The intersection point in the future is when the system runs out of headroom: the date the team must add capacity by.

- Inverted envelope: demand exceeds capacity. The system is in incident territory. The vertical magnitude of the inversion is the deficit that must be served somehow (queue overflow, error rates, customer impact).
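The three reference shapes can be told apart from the gap series alone. The classifier below is a deliberately crude heuristic written for illustration (it only compares the first and last gap to detect narrowing); the function names and sample numbers are assumptions.

```python
def headroom(demand, capacity):
    """Vertical gap between capacity and demand at each time step."""
    return [c - d for d, c in zip(demand, capacity)]


def classify_envelope(demand, capacity):
    """Label the envelope as 'inverted', 'closing', or 'healthy'.

    Heuristic: any negative gap means inverted; otherwise a final gap
    smaller than the initial gap means closing.
    """
    gaps = headroom(demand, capacity)
    if min(gaps) < 0:
        return "inverted"
    if gaps[-1] < gaps[0]:
        return "closing"
    return "healthy"


print(classify_envelope([100, 110, 120], [200, 220, 240]))  # capacity keeps pace
print(classify_envelope([100, 110, 120], [200, 200, 200]))  # flat capacity
print(classify_envelope([100, 150, 210], [200, 200, 200]))  # demand overtakes
```

A production version would look at the trend of the gap rather than two endpoints, so that a single peak does not flip the label.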

The standard capacity planning chart plots:

- Recent demand history (solid blue line)

- Forecast demand with bounds (dashed line + shaded cone)

- Current capacity (solid green line)

- Planned capacity additions with delivery dates (step function)

- The intersection date where forecast demand crosses current capacity: this is the deadline for the next provisioning

The visual decision rule: keep the capacity step function above the upper bound of the forecast cone at all times. Do not provision to the central estimate; provision to the upper bound. The cost of over-provisioning is finite (some idle capacity); the cost of under-provisioning is unbounded (lost users, cascade failure, reputation damage).
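The decision rule can be checked mechanically: walk forward week by week and report the first week in which the upper bound of demand reaches the capacity step function. The sketch assumes compound weekly growth and a constant relative upper margin, both simplifications; all names and numbers are hypothetical.

```python
def capacity_at(week, base, additions):
    """Step function: base capacity plus every addition delivered by `week`.

    additions is a list of (delivery_week, amount) pairs.
    """
    return base + sum(amount for delivery_week, amount in additions
                      if delivery_week <= week)


def first_breach_week(demand_now, weekly_growth, upper_margin,
                      base, additions, horizon=52):
    """First week where upper-bound demand >= capacity, or None within horizon."""
    for week in range(horizon + 1):
        upper = demand_now * (1 + weekly_growth) ** week * (1 + upper_margin)
        if upper >= capacity_at(week, base, additions):
            return week
    return None


# Hypothetical plan: 800 RPS today, 5% weekly growth, +15% upper bound,
# 1200 RPS of capacity now, +600 RPS delivered in week 6.
on_time = first_breach_week(800, 0.05, 0.15, 1200, [(6, 600)])
slipped = first_breach_week(800, 0.05, 0.15, 1200, [(7, 600)])
print(on_time, slipped)
```

In this example a one-week delivery slip moves the breach from well after the delivery to before it, which is why the delivery date belongs on the chart next to the forecast cone.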

Headroom envelope: demand line, capacity step function, forecast cone, intersection date

Reading the Envelope

Your capacity chart shows: current demand is 1,500 RPS growing 20% per month. Current capacity is 2,500 RPS. A new server batch (+1,500 RPS capacity) arrives in 8 weeks. The forecast cone has ±15% bounds at the 8-week horizon.

Compute the dates when forecast demand (central estimate and upper bound) hits current capacity. Will the new server batch arrive in time? What is the visual shape of the envelope between now and the new batch arrival, and what action would you take if upper-bound demand intersects current capacity before the new batch arrives?

Geometry of Capacity: Wrapping Up

Shapes That Predict the Future

You have walked through four geometric structures that run beneath capacity planning:

- Little's Law (L = λ × W) as the area of a rectangle defining steady-state occupancy

- The queueing curve with its elbow around 70-80% utilization, encoding the nonlinear cost of running hot

- Trend slopes and forecast cones that turn historical data into actionable projections

- Headroom envelopes as 2D plots of capacity versus demand, with intersection dates marking provisioning deadlines

Capacity planning is, at its visual core, the discipline of keeping one curve safely above another over time. The numbers are dressing; the shapes carry the truth. A capacity engineer who reads the queueing curve correctly will catch problems that a CPU dashboard hides until the system is already burning.

The companion lesson on capacity planning covered the practices: measurement, forecasting, ceiling tests, headroom, and scaling. This lesson covered the geometry beneath them. Together they form the visual and analytical scaffolding of running services that scale without surprise.