Capacity Planning: Art & Science


What Allspaw's Book Solves

The Discipline of Not Running Out

John Allspaw wrote 'The Art of Capacity Planning' (O'Reilly, 2008; second edition, co-authored with Arun Kejariwal, 2017) after running operations at Flickr through years of explosive growth. His thesis: capacity planning is not a one-time spreadsheet exercise. It is a continuous discipline that combines measurement, forecasting, & engineering judgment. Skip any of those three, & you either run out of capacity in production or burn money on hardware that idles.

Capacity planning sits between two failure modes:

- Underprovisioning: services run hot, latency spikes, error rates climb, customers leave. The fastest way to lose users in a growth phase.

- Overprovisioning: hardware sits at 10% utilization, finance asks why the budget keeps growing without revenue keeping pace. The fastest way to lose your headcount in a budget review.

The art lies in finding the corridor between those two cliffs and staying inside it as the workload changes.

Three core questions drive every capacity exercise:

- What do we have? Current capacity in concrete units: requests per second, queries per second, gigabytes of storage, concurrent connections.

- What do we need? Forecasted demand at a future date with explicit uncertainty bounds.

- When must we act? Lead time for procurement, provisioning, or scaling. Cloud reduces this to minutes; on-premise can mean months.

Capacity planning corridor: under, optimal, over

Why It Cannot Be a Spreadsheet

An e-commerce company plans capacity once a year, in November, by extrapolating the previous 12 months of traffic linearly. They run on dedicated servers with a 6-week procurement lead time. Their traffic shows strong weekly seasonality (3x weekend peak), strong yearly seasonality (5x Black Friday), & has been growing 40% year-over-year for three years.

List at least three specific failure modes this once-a-year linear-projection approach is likely to produce. For each failure, name the specific part of the company's reality the spreadsheet ignores, & propose a more frequent measurement or planning cadence that addresses it.

Workload versus Utilization

Two Different Numbers, Both Required

Capacity planning fails when teams measure only one of the two essential dimensions.

Workload: the demand on the system from the outside. Requests per second, transactions per minute, megabytes per second, concurrent users. Workload describes what the world is asking of you.

Utilization: how full the system runs while serving that demand. CPU percent, memory used, queue depth, network bandwidth, disk IOPS. Utilization describes how the system feels under that demand.

Workload alone tells you what is coming but not whether you can serve it. Utilization alone tells you how full you are but not what to expect tomorrow. You need both, plotted side by side, to make capacity decisions.

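Capacity ratio = workload / utilization. If one server handles 1,000 requests per second at 50% CPU, its capacity ratio is 1,000 / 0.50 = 2,000 RPS per server at 100% CPU. This conversion factor lets you translate forecasted workload into required server count. Treat it as an optimistic upper bound: it assumes utilization rises linearly with workload, which holds only well below saturation, so plan against a target utilization rather than 100%.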
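A minimal Python sketch of that conversion, assuming CPU is the bottleneck resource and scales linearly with request rate below saturation. The function names and the `target_cpu` planning parameter are illustrative, not taken from Allspaw's book.

```python
import math


def capacity_ratio(observed_rps: float, observed_cpu: float) -> float:
    """RPS one server could serve at 100% CPU, assuming CPU grows linearly with load."""
    return observed_rps / observed_cpu


def servers_needed(forecast_rps: float, ratio: float, target_cpu: float) -> int:
    """Smallest server count that keeps every server at or below target_cpu."""
    usable_rps_per_server = ratio * target_cpu
    return math.ceil(forecast_rps / usable_rps_per_server)


ratio = capacity_ratio(observed_rps=1_000, observed_cpu=0.50)
print(ratio)  # 2000.0

# Hypothetical forecast of 9,500 RPS, planned at 50% CPU per server.
print(servers_needed(forecast_rps=9_500, ratio=ratio, target_cpu=0.50))  # 10
```

Planning against 50% rather than 100% CPU roughly doubles the server count; that gap is the headroom discussed later in this lesson.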

Allspaw stresses measuring at the right granularity. One sample per minute hides 30-second peaks. One sample per hour hides everything. Real capacity work needs sub-minute resolution for peak events and minute resolution for trending. Anything coarser produces dangerous false confidence.
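To see how averaging erases peaks, the sketch below builds a synthetic hour of per-second traffic (all numbers invented for illustration) and rolls it up to minute and hour resolution with a mean, then with a max.

```python
# Hypothetical hour of per-second RPS: steady 500 RPS with a 30-second burst to 2,000 RPS.
per_second = [500.0] * 3600
for t in range(1800, 1830):
    per_second[t] = 2000.0


def downsample(series, window, agg):
    """Aggregate consecutive, non-overlapping windows of `window` samples with `agg`."""
    return [agg(series[i:i + window]) for i in range(0, len(series), window)]


def mean(xs):
    return sum(xs) / len(xs)


print(max(downsample(per_second, 60, mean)))    # 1250.0: per-minute mean hides the burst
print(max(downsample(per_second, 3600, mean)))  # 512.5: per-hour mean hides nearly all of it
print(max(downsample(per_second, 60, max)))     # 2000.0: per-minute max keeps the peak
```

If storage limits force coarser retention, keeping a per-window maximum (or a high percentile) alongside the mean is one common way to preserve peak information.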

Workload + utilization plotted together over time

What to Instrument

Your team is adding capacity instrumentation to a newly launched product (a video transcoding service). You can track 8 metrics at sub-minute resolution. The service ingests video uploads, queues them, transcodes to multiple formats, & writes outputs to object storage.

Pick exactly 8 metrics. For each, label whether it captures workload or utilization, & justify why each metric earns inclusion versus a metric you left out. Identify one metric that, if you only had one, would be the most predictive of capacity exhaustion.

Trend, Seasonality, Uncertainty

Three Layers of Every Forecast

Allspaw and the Google SRE book agree on the structure of a useful forecast: trend, seasonality, & uncertainty bounds. Skip any one and the forecast becomes misleading.

Trend: the slope of demand over months or years. Often modeled with linear regression for short windows, exponential or piecewise-linear for compounding growth. The trend line answers 'where is demand headed in general?'

Seasonality: the cyclic patterns at multiple time scales. Daily (peak afternoon traffic), weekly (weekend spikes), yearly (Black Friday, tax season, school year). Multiplicative seasonality scales with the trend, so a +50% weekend bump grows in absolute terms as traffic grows; additive seasonality adds a swing of fixed absolute size regardless of the trend level.
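The two seasonality models can agree today and still disagree a year out. A small sketch with invented numbers, assuming a linear trend and a single weekend effect:

```python
# Hypothetical trend: 1,000 RPS today, growing by 100 RPS per month.
def trend(month: int) -> float:
    return 1_000 + 100 * month


WEEKEND_UPLIFT = 0.5   # multiplicative: weekends run 50% above trend
WEEKEND_OFFSET = 500   # additive: weekends run 500 RPS above trend


def weekend_peak(month: int, model: str) -> float:
    base = trend(month)
    if model == "multiplicative":
        return base * (1 + WEEKEND_UPLIFT)
    if model == "additive":
        return base + WEEKEND_OFFSET
    raise ValueError(f"unknown model: {model}")


for month in (0, 12):
    print(month, weekend_peak(month, "multiplicative"), weekend_peak(month, "additive"))
# 0 1500.0 1500
# 12 3300.0 2700
```

Both models are calibrated to the same 1,500 RPS weekend peak today, but if the real pattern is multiplicative, the additive model under-forecasts the month-12 weekend peak by 600 RPS. Checking whether peak-to-trough swings grow with the trend is a quick way to decide which model fits.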

Uncertainty bounds: the forecast cone. A forecast without bounds is a guess. Real forecasts publish a central estimate with explicit upper and lower bounds, typically at 90% or 95% confidence. The cone widens as you project further into the future. A 4-week forecast might have ±10% bounds; a 12-month forecast often has ±50%.
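One simple way to build the cone, sketched here as an assumption rather than the method from either book: record how wrong past forecasts were at each horizon, then apply the 5th and 95th percentiles of those errors to the new central estimate. The error history is invented, and with only ten samples per horizon the tail percentiles are rough. Statistical models such as Prophet produce intervals directly instead.

```python
from statistics import quantiles

# Hypothetical history of past forecast misses, (actual - forecast) / forecast,
# grouped by how many weeks ahead each forecast was made.
PAST_RELATIVE_ERRORS = {
    4: [-0.08, -0.05, -0.03, -0.01, 0.00, 0.02, 0.04, 0.06, 0.09, 0.10],
    52: [-0.40, -0.30, -0.20, -0.10, 0.00, 0.10, 0.20, 0.35, 0.45, 0.50],
}


def forecast_band(central: float, horizon_weeks: int) -> tuple[float, float, float]:
    """Return (lower, central, upper) spanning the 5th to 95th percentile of past errors."""
    # n=20 yields 19 cut points: the 5th, 10th, ..., 95th percentiles.
    cuts = quantiles(PAST_RELATIVE_ERRORS[horizon_weeks], n=20)
    low_err, high_err = cuts[0], cuts[-1]
    return central * (1 + low_err), central, central * (1 + high_err)


for horizon in (4, 52):
    low, mid, high = forecast_band(10_000, horizon)
    print(f"{horizon:>2} weeks ahead: {low:,.0f} to {high:,.0f} RPS (central {mid:,.0f})")
```

Because long-horizon forecasts have historically missed by more, the same central estimate gets a much wider band at 52 weeks than at 4, which is the widening cone described above.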

Decoupling business growth from technical demand: capacity planning forecasts technical workload, but business teams forecast revenue, signups, or campaigns. The capacity planner's job is to translate business forecasts into technical demand: 30% growth in signups might mean 30% more API calls, but it might mean 80% more if new users use the system more heavily, or only 15% if they convert at lower rates. The conversion ratio matters as much as the underlying business forecast.
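The translation step can be made explicit. If today's U users make c calls each and g·U new users make r·c calls each, total calls become U·c·(1 + g·r), so call volume grows by g·r. A minimal sketch, assuming existing users' behaviour is unchanged; `relative_usage` (r) is a hypothetical parameter you would estimate from cohort data. The three values reproduce the three cases above.

```python
def api_call_growth(signup_growth: float, relative_usage: float) -> float:
    """Fractional growth in API calls from new signups.

    signup_growth: new users as a fraction of today's base (0.30 means +30%).
    relative_usage: average calls per new signup divided by calls per existing user.
    Assumes existing users' behaviour is unchanged.
    """
    return signup_growth * relative_usage


for relative_usage in (1.0, 8 / 3, 0.5):
    growth = api_call_growth(0.30, relative_usage)
    print(f"relative usage {relative_usage:.2f}: +{growth:.0%} API calls")
# relative usage 1.00: +30% API calls
# relative usage 2.67: +80% API calls
# relative usage 0.50: +15% API calls
```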

Forecast: trend line, seasonal ripples, widening cone

Forecasting Holiday Traffic

Your service serves an e-commerce site. Last year's Black Friday traffic was 5x the November average, sustained over 12 hours. The business has grown 40% year-over-year. Marketing is launching a paid promotion expected to add an additional 20% to Black Friday traffic this year.

Estimate this year's Black Friday peak as a multiple of the current monthly average. Show your work. Then propose specific upper and lower bounds for the forecast & explain what real-world events could push actual demand outside those bounds.

Knowing Your Ceiling

Find the Ceiling Before Production Does

Forecasting tells you what is coming. Ceiling tests tell you whether the system can serve it. Allspaw treats ceiling testing as a non-negotiable input to capacity planning: you do not know your real capacity until you have tested it under controlled load.

Three types of ceiling tests:

- Synthetic load test: a load generator (k6, Locust, JMeter, vegeta) drives traffic at a target service in staging. Increase load until something breaks. The breaking point is the ceiling. Best for isolated service testing.

- Production fire drill: deliberately reduce capacity in production (drain a percentage of servers, kill a region) and observe how the remaining capacity handles real traffic. Tests true production behavior including unexpected interactions. Highest confidence but highest risk.

- Shadow load: replay real production traffic at a target service running parallel to production. Captures real workload patterns (rare query mix, weird user agents) without affecting users. Strong middle ground.
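The "increase load until something breaks" loop of a synthetic test can be sketched as a step ramp. The `measure` callback stands in for one load-generator run at a fixed rate (k6, Locust, JMeter, or vegeta would do the real work); here it is a hypothetical simulated service so the sketch runs on its own. The p99 and error-rate thresholds are assumptions to replace with your own service-level objectives, and in a real test each step is held long enough for caches and connection pools to settle.

```python
from typing import Callable, Optional, Tuple

Measurement = Tuple[float, float]  # (p99 latency in ms, error rate as a fraction)


def find_ceiling(measure: Callable[[float], Measurement], start_rps: float,
                 step_rps: float, max_rps: float, p99_slo_ms: float,
                 max_error_rate: float) -> Optional[float]:
    """Raise load step by step; return the highest rate that stayed within both limits.

    Returns None if even start_rps breaks, and the last rate tested if nothing
    broke (in which case the true ceiling lies somewhere above max_rps).
    """
    last_good = None
    rps = start_rps
    while rps <= max_rps:
        p99_ms, error_rate = measure(rps)
        if p99_ms > p99_slo_ms or error_rate > max_error_rate:
            return last_good
        last_good = rps
        rps += step_rps
    return last_good


def simulated_service(rps: float) -> Measurement:
    """Hypothetical stand-in for a real load-test step: saturates at 2,500 RPS."""
    capacity_rps = 2_500.0
    if rps >= capacity_rps:
        return float("inf"), 0.5
    return 20.0 / (1 - rps / capacity_rps), 0.0  # latency climbs steeply near saturation


ceiling = find_ceiling(simulated_service, start_rps=500, step_rps=250, max_rps=4_000,
                       p99_slo_ms=150, max_error_rate=0.01)
print(ceiling)  # 2000
```

Note that the usable ceiling (2,000 RPS) sits well below the point of total saturation (2,500 RPS): the latency objective breaks first.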

Headroom is the buffer between current load and the ceiling. SRE rules of thumb:

- 50% headroom per region in steady state for a service running across two regions (so a region failure does not exhaust the surviving region)

- 30% headroom for a multi-region service with N+2 redundancy

- 100%+ headroom approaching known peak events (Black Friday, sports finals)
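The failover-driven numbers follow from simple arithmetic. A sketch, assuming load is spread evenly across regions and redistributes evenly when regions fail; the region counts are illustrative. It covers failover headroom only: spike headroom for known peak events stacks on top.

```python
def max_safe_utilization(regions: int, tolerated_failures: int) -> float:
    """Highest steady-state utilization per region that survives losing
    `tolerated_failures` regions, if traffic redistributes evenly."""
    surviving = regions - tolerated_failures
    if surviving <= 0:
        raise ValueError("cannot tolerate losing every region")
    return surviving / regions


for regions, failures in [(2, 1), (3, 1), (7, 2)]:
    u = max_safe_utilization(regions, failures)
    print(f"{regions} regions, survive {failures} lost: run at <= {u:.0%}, headroom >= {1 - u:.0%}")
# 2 regions, survive 1 lost: run at <= 50%, headroom >= 50%
# 3 regions, survive 1 lost: run at <= 67%, headroom >= 33%
# 7 regions, survive 2 lost: run at <= 71%, headroom >= 29%
```

The two-region case gives the 50% figure, and a seven-region fleet that must survive two losses lands near the 30% figure; the exact number depends on how many regions you run.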

Headroom is not waste. It is the cost of not paging engineers at 3 AM, not losing customers during a spike, and not suffering a cascade failure when one region fails. Finance teams sometimes push to reduce headroom; capacity engineers must articulate the cost of running tight to make that conversation factual rather than emotional.
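One way to make the cost of running tight concrete is the M/M/1 queueing model listed among the skills later in this lesson. With Poisson arrivals, exponentially distributed service times, and a single server, mean response time is W = S / (1 - ρ), where S is the mean service time and ρ is utilization. Real services are not M/M/1, so read the numbers below (for a hypothetical 10 ms service time) as the shape of the curve, not a prediction.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1 - utilization)


for rho in (0.5, 0.7, 0.9, 0.95):
    print(f"{rho:.0%} busy -> {mm1_response_time(10, rho):.0f} ms mean response time")
# 50% busy -> 20 ms mean response time
# 70% busy -> 33 ms mean response time
# 90% busy -> 100 ms mean response time
# 95% busy -> 200 ms mean response time
```

Going from 50% to 90% busy multiplies mean latency by five even though the server is not yet "full". That knee in the curve is what headroom keeps you away from.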

Headroom buffer: current load, ceiling, and the gap between

Designing a Ceiling Test

You inherit a service with no documented capacity ceiling. Current production load is 800 requests per second across 12 servers, average CPU 35%. Marketing is announcing a campaign in 6 weeks expected to drive traffic to 3,000 RPS at peak.

Design a ceiling test program in the next 4 weeks. Specify the test type(s), the metrics that define 'broken', the headroom target you would set, & the action you take depending on whether the test reveals enough capacity. Be concrete about what you do if the ceiling test shows the current 12 servers cannot handle 3,000 RPS.

Up, Out, or Diagonal

When to Add Power, Add Boxes, or Both

Three core scaling strategies, each with distinct cost and reliability profiles:

Vertical scaling (scaling up): bigger machines. Replace 8-core servers with 32-core servers. Simplest path; works until you hit single-machine limits. Single point of failure remains. Cost can grow non-linearly, especially for dedicated hardware: a 32-core server often costs more than 4x an 8-core one.

Horizontal scaling (scaling out): more machines. Add servers behind a load balancer. Capacity scales roughly linearly with server count, as long as shared dependencies such as databases and caches keep up. Failure modes shift: you must handle distributed coordination, but a single server failure no longer destroys the service. Operational complexity increases.

Diagonal scaling (Allspaw's term): scale up first to a comfortable per-server size, then scale out from there. Combines simpler operations of large servers with the redundancy of multiple servers. Most production services live in diagonal scaling territory.

Reserved versus on-demand pricing: cloud providers reward predictability. Reserved capacity is 30-60% cheaper than on-demand but requires a 1-3 year commitment. Capacity planners typically lock in steady-state demand with reserved capacity and burst into on-demand for peaks. Misjudging this split can either waste money (over-reserved) or expose budget to surprise (under-reserved during peaks).
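The reserved/on-demand split is a small optimization problem. The sketch below brute-forces the reservation level for one hypothetical day of hourly demand, assuming reserved instances are billed every hour whether used or not, at 60 cents against 100 cents on-demand (a 40% discount, inside the 30-60% range above). Real planning would use months of hourly data and the provider's actual price sheets.

```python
def blended_cost(demand: list[int], reserved: int, on_demand_rate: int,
                 reserved_rate: int) -> int:
    """Total cost (in cents) of serving `demand` (instances needed per hour)."""
    reserved_cost = reserved * reserved_rate * len(demand)  # paid every hour, used or not
    burst_cost = sum(max(0, d - reserved) for d in demand) * on_demand_rate
    return reserved_cost + burst_cost


def best_reservation(demand: list[int], on_demand_rate: int, reserved_rate: int) -> int:
    """Reservation level with the lowest blended cost (exhaustive search)."""
    return min(range(max(demand) + 1),
               key=lambda r: blended_cost(demand, r, on_demand_rate, reserved_rate))


# Hypothetical day: 10 instances for 16 hours, 16 for 6 hours, 30 for a 2-hour peak.
demand = [10] * 16 + [16] * 6 + [30] * 2
level = best_reservation(demand, on_demand_rate=100, reserved_rate=60)
print(level, blended_cost(demand, level, 100, 60))  # 10 22000
print(blended_cost(demand, 0, 100, 60))             # 31600 (all on-demand)
```

The optimum lands where a reserved unit stops paying for itself: each extra reserved instance costs 60% of the on-demand rate for every hour, so it is worth buying only if demand reaches that level in at least 60% of hours. Here the 11th instance is needed in only 8 of 24 hours, so it stays on-demand.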

Spot instances and preemptible workloads: 60-90% cheaper than on-demand but can be reclaimed at short notice (two minutes on AWS, 30 seconds on Google Cloud). Suitable for batch jobs, analytics, training workloads, or any service designed for graceful interruption. Production user-facing traffic typically avoids spot.
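Why spot suits interruptible work: when an interruption throws away an in-flight job, the lost work must be paid for again. A pessimistic sketch, assuming each attempt is interrupted independently with a fixed probability, an interrupted job restarts from scratch with no checkpointing, and every lost attempt is billed as a full job. Prices are hypothetical units, with spot at a quarter of on-demand.

```python
def expected_cost_per_job(job_minutes: float, price_per_minute: float,
                          interruption_prob: float) -> float:
    """Pessimistic expected cost of one completed job with restart-from-scratch retries.

    Attempts until success follow a geometric distribution, so the expected
    number of attempts is 1 / (1 - interruption_prob).
    """
    if not 0 <= interruption_prob < 1:
        raise ValueError("interruption_prob must be in [0, 1)")
    return job_minutes * price_per_minute / (1 - interruption_prob)


ON_DEMAND_PRICE, SPOT_PRICE = 4.0, 1.0  # hypothetical price units per minute
print(expected_cost_per_job(10, ON_DEMAND_PRICE, 0.0))  # 40.0
for p in (0.05, 0.5, 0.8):
    print(p, round(expected_cost_per_job(10, SPOT_PRICE, p), 1))
# 0.05 10.5
# 0.5 20.0
# 0.8 50.0
```

Under these assumptions spot stays cheaper while the interruption probability is below 1 minus the spot-to-on-demand price ratio (75% here). Short jobs and checkpointing keep real losses well under this worst case.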

Diagonal scaling path: small to medium boxes then horizontal scale-out

Choosing a Scaling Path

Your video transcoding service runs on 8 medium-sized cloud instances (8 cores each). You expect 3x growth over the next 6 months. The workload is CPU-bound, parallelizable per video, & each video transcode takes 90 seconds end-to-end. Reserved instances cost 50% of on-demand. Spot instances cost 30% of on-demand but can be terminated on 2 minutes' notice.

Recommend a scaling strategy for the next 6 months. Specify which instance sizes you choose, the mix of reserved/on-demand/spot, & justify each piece of the mix against the workload characteristics. Identify the single biggest risk in your plan & propose one mitigation.

Capacity Planning Careers

Where Capacity Planning Skills Pay

Capacity planning is rarely a job title on its own. The skills appear under several roles:

Site Reliability Engineer: capacity planning is a core SRE responsibility. Most SRE teams have one or two engineers who specialize in capacity, owning the forecast models, ceiling tests, and provisioning automation.

Cloud Cost / FinOps Engineer: a newer role focused on cloud spend optimization. Combines capacity planning with financial modeling, contract negotiation, and reserved-instance portfolio management. Pays extremely well at large cloud-native companies because cloud bills are often the second-largest expense after payroll.

Performance Engineer: focuses on per-node efficiency and ceiling testing. The job: extract more capacity from the same hardware through profiling, optimization, and architectural changes. Heavy systems and language-runtime knowledge.

Capacity Planning Specialist: at very large companies (Google, Meta, Amazon, Netflix), dedicated capacity planning teams exist. They own forecast models across the entire fleet, negotiate procurement at scale, and coordinate with finance on multi-year hardware roadmaps.

Skills that compound: time-series analysis (R, Python statsmodels, Prophet), queueing theory (M/M/1, M/M/c, Little's Law), at least one configuration management tool, at least one cloud cost dashboard, and the ability to write a forecast report that a CFO can understand and act on. The technical skills get you the interview; the communication skills get you the budget.

Capacity careers: SRE, FinOps, Performance, Specialist

Wrapping Up

What You Now Know

Capacity planning is a continuous discipline, not an annual exercise. You have covered:

- The corridor between underprovisioning and overprovisioning

- Workload versus utilization as the two dimensions of measurement

- Trend, seasonality, and uncertainty bounds as the three layers of every forecast

- Ceiling tests (synthetic, shadow, fire drill) as the only way to know real capacity

- Headroom buffers and why they are not waste

- Diagonal scaling and the reserved/on-demand/spot pricing decision

- Career paths where these skills earn budget authority

Two ideas matter most. Forecast with bounds, never with single points. And measure your ceiling before production does. Carry those two forward and the rest follows.

Recommended reading: Allspaw's 'The Art of Capacity Planning' (O'Reilly, 2017 second edition), the relevant chapters in Google's SRE Book (free at sre.google/books/), and Brendan Gregg's 'Systems Performance' for the underlying systems work. The geometry-of companion lesson goes deeper on the visual structure: Little's Law as area, queueing curves, trend slopes, and headroom envelopes.