Logarithmic Scale of Factorials
Stirling's approximation converts a product into a sum, which is the fundamental move that makes large-n mathematics tractable:
ln(n!) ≈ n·ln(n) − n + 0.5·ln(2πn)
This formula arises from approximating the sum Σ ln(k) for k=1..n by the integral of ln(x); the trapezoid correction supplies the 0.5·ln(n) term, while the constant 0.5·ln(2π) requires a finer argument (e.g., the Wallis product).
Why It Matters Geometrically
The n-dimensional sphere volume formula involves Γ(n/2 + 1), which for integer n equals (n/2)! or products of half-integers. Stirling lets us estimate these for large n without computing each value directly.
In base 10, the leading terms read log₁₀(n!) ≈ n·log₁₀(n) − n·log₁₀(e), with log₁₀(e) ≈ 0.4343 — useful for order-of-magnitude estimates.
For n = 10: ln(10!) ≈ 10·2.303 − 10 + 0.5·ln(62.83) ≈ 23.03 − 10 + 2.07 = 15.10 (true: 15.104).
For n = 100: ln(100!) ≈ 100·4.605 − 100 + 0.5·ln(628.3) ≈ 460.5 − 100 + 3.22 = 363.7 (true: 363.74).
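These worked values can be checked numerically. A minimal sketch, using the standard library's `math.lgamma` for the exact value of ln(n!):

```python
import math

def stirling_ln_factorial(n):
    """Stirling: ln(n!) ≈ n·ln(n) − n + 0.5·ln(2πn)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

def exact_ln_factorial(n):
    """Exact ln(n!) via the log-gamma function: ln(n!) = ln Γ(n+1)."""
    return math.lgamma(n + 1)

for n in (10, 20, 100):
    approx = stirling_ln_factorial(n)
    exact = exact_ln_factorial(n)
    print(f"n={n:3d}  Stirling={approx:10.4f}  exact={exact:10.4f}  error={exact - approx:.1e}")
```

The error of this three-term approximation is about 1/(12n), which is why even n = 10 already agrees to two decimal places.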
Stirling at n=20
A direct computation: ln(20) ≈ 2.996 and ln(2π·20) = ln(125.66) ≈ 4.833, so ln(20!) ≈ 20·2.996 − 20 + 0.5·4.833 = 59.92 − 20 + 2.42 = 42.34 (true: 42.336).
The Volume Formula
The volume of an n-dimensional sphere of radius r:
V_n(r) = C_n · r^n where C_n = π^(n/2) / Γ(n/2 + 1)
The C_n values for small n follow a pattern using Γ(1/2) = √π and the reduction formula:
- n=1: C_1 = π^(1/2)/Γ(3/2) = √π/(√π/2) = 2
- n=2: C_2 = π^1/Γ(2) = π/1 = π
- n=3: C_3 = π^(3/2)/Γ(5/2) = π^(3/2)/(3√π/4) = 4π/3
- n=4: C_4 = π²/Γ(3) = π²/2
- n=5: C_5 = π^(5/2)/Γ(7/2) = π^(5/2)/(15√π/8) = 8π²/15
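The closed forms above can be cross-checked against the Γ-function definition directly. A sketch using the standard library's `math.gamma`:

```python
import math

def unit_ball_volume(n):
    """C_n = π^(n/2) / Γ(n/2 + 1), the volume of the unit n-ball."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

# Closed forms derived above: 2, π, 4π/3, π²/2, 8π²/15
expected = [2, math.pi, 4 * math.pi / 3, math.pi ** 2 / 2, 8 * math.pi ** 2 / 15]
for n, c in zip(range(1, 6), expected):
    print(f"n={n}: C_n = {unit_ball_volume(n):.4f} (closed form {c:.4f})")
```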
Notice: C_n peaks near n=5 (≈ 5.264) then decreases. For large n, C_n → 0.
Maximum at n=5
C_5 = 8π²/15. With π² ≈ 9.870:
C_5 = 8·9.870/15 = 78.96/15 ≈ 5.264
To verify this is a maximum, check both neighbors: C_4 = π²/2 ≈ 4.935 and C_6 = π³/6 ≈ 31.006/6 ≈ 5.168. Both are smaller than C_5 ≈ 5.264, so among integer n the peak occurs at n=5.
Fraction of Volume in Corners
The corner paradox quantified: what fraction of an n-dimensional unit hypercube [−1,1]^n lies outside the inscribed sphere of radius 1?
Corner fraction = 1 − C_n / 2^n
| n  | C_n  | 2^n  | Sphere fraction | Corner fraction |
|----|------|------|-----------------|-----------------|
| 2  | 3.14 | 4    | 78.5%           | 21.5%           |
| 3  | 4.19 | 8    | 52.4%           | 47.6%           |
| 4  | 4.93 | 16   | 30.8%           | 69.2%           |
| 5  | 5.26 | 32   | 16.4%           | 83.6%           |
| 6  | 5.17 | 64   | 8.1%            | 91.9%           |
| 10 | 2.55 | 1024 | 0.25%           | 99.75%          |
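The table can be regenerated from the volume formula. A sketch:

```python
import math

def corner_fraction(n):
    """Fraction of the cube [−1,1]^n lying outside the inscribed unit sphere."""
    c_n = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return 1 - c_n / 2 ** n

for n in (2, 3, 4, 5, 6, 10):
    print(f"n={n:2d}  corner fraction = {corner_fraction(n):7.2%}")
```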
Implications for Optimization
The corner paradox has direct consequences for optimization in high-dimensional spaces:
Random search fails. A random point in n-dimensional parameter space almost certainly lands in a corner — far from the origin, with extreme parameter values. If good solutions cluster near moderate parameter values, random search will almost never find them.
Gradient descent succeeds. By following the local gradient, you navigate the geometry systematically rather than sampling it blindly. The curse of dimensionality hits random methods; structured methods adapt.
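The random-search claim can be checked by Monte Carlo: sample uniform points in the cube and count how many land inside the inscribed sphere. A sketch (the function name and trial count are illustrative):

```python
import math
import random

def sphere_hit_rate(n, trials=50_000, seed=0):
    """Estimate P(a uniform point in [−1,1]^n lies inside the unit sphere)."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if sum(rng.uniform(-1, 1) ** 2 for _ in range(n)) <= 1
    )
    return hits / trials

for n in (2, 5, 10, 20):
    print(f"n={n:2d}  P(inside sphere) ≈ {sphere_hit_rate(n):.4f}")
```

By n = 20 the exact probability is C_20/2^20 ≈ 2.5 × 10⁻⁸, so 50,000 samples will almost certainly find no point inside the sphere at all.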
Distance concentrates. In high dimensions, all pairwise distances between random points concentrate around the same value: for points uniform in the cube [−1,1]^n, the typical distance is √(2n/3), with relative fluctuations shrinking as n grows. Nearest-neighbor methods break down because 'nearest' and 'farthest' become indistinguishable.
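Distance concentration is easy to observe empirically. A sketch sampling random pairs in [−1,1]^n and comparing against the √(2n/3) prediction (using the relative spread, (max − min)/mean, as a rough concentration measure):

```python
import math
import random

def distance_sample(n, pairs=2000, seed=0):
    """Distances between random pairs of uniform points in [−1,1]^n."""
    rng = random.Random(seed)
    point = lambda: [rng.uniform(-1, 1) for _ in range(n)]
    return [math.dist(point(), point()) for _ in range(pairs)]

for n in (2, 10, 100, 1000):
    d = distance_sample(n)
    mean = sum(d) / len(d)
    spread = (max(d) - min(d)) / mean  # shrinks as n grows
    print(f"n={n:4d}  mean={mean:6.2f}  sqrt(2n/3)={math.sqrt(2 * n / 3):6.2f}  spread={spread:.2f}")
```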
Hamming's prescription: understand the geometry before trusting your intuition. In high-dimensional spaces, the geometry is counterintuitive, and the math is the only reliable guide.