The Bottleneck Node Identified Before Traffic Arrives
Betweenness Centrality
For every pair of nodes in a graph, there is a shortest path between them. Betweenness centrality of a node N = the fraction of all shortest paths that pass through N.
A node with high betweenness is on the path between many other pairs. If it slows down, many flows slow down. If it fails, many flows break.
Architectural reading: high-betweenness nodes are the ones every architecture review should pay extra attention to. They are bottlenecks, SPOFs, & capacity-critical components in one. They tend to be:
- The DNS provider (between every client & every service)
- The ingress proxy (between every client & every backend)
- The database primary (between every backend & every read)
- The authentication service (between every user & every authorized action)
Detection without measurement: graph topology alone identifies high-betweenness nodes. You don't need traffic data; you need the architecture diagram. A node that sits between many pairs of others is structurally critical.
Operational consequence: high-betweenness nodes deserve disproportionate investment in (1) capacity headroom, (2) redundancy, (3) observability, & (4) incident response playbooks.
The Smallest Cut Disconnects the Smallest Slice
Min-Cut Theorem in Plain Terms
The min-cut between two nodes in a graph = the smallest number of edges (or nodes) you must remove to disconnect them.
Operational reading: min-cut bounds the worst-case blast radius. If the min-cut between 'clients' & 'database' is 1 edge (a single proxy), then losing that edge disconnects all clients from the database. If the min-cut is 5, you need to lose 5 components simultaneously to fully disconnect; bad luck, but bounded.
Designing for blast radius: increase the min-cut at every important boundary. Multiple proxies; multiple cache nodes; multiple network paths between DCs. Each addition raises the min-cut by 1.
The bulkhead pattern in graph terms: partition resources into separate sub-graphs that share no min-cut with each other. A failure within one sub-graph cannot propagate to the others because the edges don't exist.
Diameter Sets Failure Propagation Distance
Graph diameter = the longest shortest path between any two nodes.
Failure propagation: when a node fails & retries flow back, they touch upstream nodes up to the diameter distance away. A diameter-3 system (client -> proxy -> backend -> DB) means a DB failure affects 3 upstream layers in a retry storm.
Implication: shorter diameter = faster failure containment but also more node concentration. Each design has its tradeoff.
Compute Min-Cut for a Real Architecture
An architecture: 1 DNS, 1 CDN, 3 reverse proxies, 12 backend replicas, 1 DB primary.
Failure-Mode Audit via Graph Metrics
Synthesis
You can now identify high-betweenness nodes, compute min-cut at every boundary, & estimate failure-propagation distance via diameter.
Apply all three.
A system: 50 customer endpoints -> 1 DNS -> 2 CDN POPs -> 4 reverse proxies -> 16 backend replicas -> { DB cluster (1 primary + 2 standbys), Redis cluster (5 nodes), 3 external APIs }.
Companion Notes
Companion Notes
This geometry-of lesson recasts the Failure Modes & Blast Radius main lesson through graph metrics (betweenness, min-cut, diameter).
The final companion, geometry_of_observability_and_capacity, treats Voronoi cells for CDN PoP catchments, the latency triangle's speed-of-light floor, & the queueing curve revisited at the proxy tier.
Well done.