Distance in Binary Space
Richard Hamming's most famous technical contribution: error-correcting codes. The geometric idea behind them runs deeper than any specific code.
Hamming Distance
Given two binary strings of equal length, the Hamming distance d(u, v) counts positions where they differ:
```
u = 1 0 1 1 0
v = 1 1 1 0 0
      ↑   ↑
d(u,v) = 2
```
This satisfies the metric axioms: d(u,v) ≥ 0, with d(u,v) = 0 iff u = v; d(u,v) = d(v,u); and d(u,w) ≤ d(u,v) + d(v,w). Binary n-space with Hamming distance is therefore a valid metric space.
The geometry visualizes clearly in low dimensions. All 3-bit strings live at the 8 vertices of a cube. Edge-adjacent vertices differ in exactly 1 bit; face-diagonal vertices differ in 2; antipodal vertices (e.g. 000 and 111) differ in 3.
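The cube picture can be checked by brute force. A small sketch listing every 3-bit vertex with its distance from 000 (which equals its Hamming weight, the number of 1 bits):

```python
from itertools import product

# The 8 vertices of the 3-cube, each labeled with its distance from 000.
# Distance from 000 is just the Hamming weight: the count of 1 bits.
distance_from_000 = {
    "".join(map(str, bits)): sum(bits) for bits in product((0, 1), repeat=3)
}
print(distance_from_000)  # 000 -> 0, edge neighbors -> 1, ..., 111 -> 3
```

Three vertices sit at distance 1 (the edge neighbors), three at distance 2 (face diagonals), and one at distance 3 (the antipode 111).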
Calculating Hamming Distance
Hamming weight wt(v) counts the number of 1s in v. Distance relates to weight via XOR:
d(u, v) = wt(u XOR v)
Example: u = 10110, v = 11100. XOR = 01010. Weight = 2. So d(u,v) = 2.
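The XOR identity maps directly to code. A minimal sketch, treating the bit strings as Python integers (the helper name `hamming_distance` is illustrative):

```python
def hamming_distance(u: int, v: int) -> int:
    """d(u, v) = wt(u XOR v): XOR marks the differing positions, then count 1s."""
    return bin(u ^ v).count("1")  # popcount of the XOR

# The example from the text: u = 10110, v = 11100
u, v = 0b10110, 0b11100
print(bin(u ^ v))              # 0b1010 (the differing positions)
print(hamming_distance(u, v))  # 2
```

On Python 3.10+ the popcount can also be written as `(u ^ v).bit_count()`.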
Error Correction via Sphere Packing
A binary code C ⊆ {0,1}^n consists of chosen codewords. When a codeword transmits over a noisy channel, some bits may flip. The receiver gets a corrupted string and must recover the original.
Define a Hamming ball of radius t centered at codeword c:
B(c, t) = { v ∈ {0,1}^n : d(c, v) ≤ t }
To correct up to t errors, the balls B(c, t) around distinct codewords must be pairwise disjoint. If two balls overlap, a received word in the intersection is ambiguous: the decoder cannot determine which codeword was sent.
The minimum distance d_min of a code governs everything:
- Detect up to d_min − 1 errors
- Correct up to ⌊(d_min − 1) / 2⌋ errors
Hamming (7,4) code: n = 7 bits, k = 4 data bits, d_min = 3. Corrects 1 error. Detects 2.
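Single-error correction in the (7,4) code can be sketched directly. The version below uses the classic positional layout (parity bits at positions 1, 2, and 4, 1-indexed; data bits at 3, 5, 6, 7), so the syndrome spells out the error position; the `encode`/`decode` helpers are illustrative, not Hamming's original presentation:

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    bits = [0] * 8  # index 0 unused; codeword occupies positions 1..7
    bits[3], bits[5], bits[6], bits[7] = data
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        bits[p] = sum(bits[i] for i in range(1, 8) if i & p and i != p) % 2
    return bits[1:]

def decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    bits = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(bits[i] for i in range(1, 8) if i & p) % 2:
            syndrome += p
    if syndrome:             # nonzero syndrome names the error position
        bits[syndrome] ^= 1
    return [bits[3], bits[5], bits[6], bits[7]]

codeword = encode([1, 0, 1, 1])
corrupted = codeword[:]
corrupted[4] ^= 1            # flip one bit during "transmission"
print(decode(corrupted))     # [1, 0, 1, 1]: the original data, recovered
```

Because d_min = 3, any single flip leaves the received word closer to the transmitted codeword than to any other, which is exactly what the syndrome exploits.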
How Many Codewords Fit?
How many codewords can a length-n code contain while still correcting t errors? Each codeword 'owns' a ball of radius t. All balls together must fit inside {0,1}^n, which has 2^n points.
Volume of a Hamming ball of radius t in {0,1}^n:
Vol(n,t) = Σ_{i=0}^{t} C(n, i)
The Hamming bound (sphere-packing bound) follows directly: a code correcting t errors satisfies
M · Vol(n,t) ≤ 2^n
where M = number of codewords. So M ≤ 2^n / Vol(n,t).
For the Hamming (7,4) code: n=7, t=1. Vol(7,1) = C(7,0) + C(7,1) = 1 + 7 = 8. Bound: M ≤ 128 / 8 = 16. The (7,4) code achieves M = 2^4 = 16: a perfect code, meaning the balls tile {0,1}^7 with no gaps.
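The bound is a one-line computation. A sketch (helper names are illustrative):

```python
from math import comb

def hamming_ball_volume(n: int, t: int) -> int:
    """Vol(n, t) = sum_{i=0}^{t} C(n, i): points within distance t of a center."""
    return sum(comb(n, i) for i in range(t + 1))

def hamming_bound(n: int, t: int) -> int:
    """Max codewords M satisfying M * Vol(n, t) <= 2^n (sphere packing)."""
    return 2**n // hamming_ball_volume(n, t)

print(hamming_ball_volume(7, 1))  # 8
print(hamming_bound(7, 1))        # 16, met exactly by the (7,4) code
```

Equality in the bound is precisely the "perfect code" condition: 16 balls of 8 points each tile all 128 points of {0,1}^7.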
√n vs n
Hamming used a random-walk argument to make the value of long-range vision precise. The argument converts a vague claim — 'vision helps' — into a geometric fact about distances.
Symmetric Random Walk on ℤ
At each step, move +1 or −1 with equal probability. After n steps, the typical displacement from the origin is on the order of √n.
This follows from the variance: the steps are independent, each with variance 1, so Var(X_n) = n and the root-mean-square displacement is exactly √n. (The mean absolute displacement E[|X_n|] ≈ √(2n/π) ≈ 0.8√n for large n; the √n scaling is what matters.)
Directed Walk
At each step, move +1 with probability p > 1/2 (bias toward a goal). After n steps, expected position: E[X_n] = n(2p−1). For p = 1 (fully directed): E[X_n] = n.
The contrast: random drift scales as √n; directed progress scales as n.
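A quick simulation makes the contrast concrete. This is a sketch, not part of Hamming's argument; the helper name and trial counts are arbitrary:

```python
import random
from math import sqrt

random.seed(0)  # reproducible runs

def mean_abs_position(p: float, n: int, trials: int = 400) -> float:
    """Average |X_n| over many walks; each step is +1 with prob p, else -1."""
    total = 0
    for _ in range(trials):
        x = sum(1 if random.random() < p else -1 for _ in range(n))
        total += abs(x)
    return total / trials

n = 2_500
print(mean_abs_position(0.5, n), "vs sqrt(n) =", sqrt(n))  # drift: ~sqrt(n) scale
print(mean_abs_position(0.6, n), "vs n(2p-1) =", n * 0.2)  # bias: ~n scale
```

With n = 2500, the unbiased walk ends a few dozen steps from the origin while even a modest bias (p = 0.6) covers hundreds of steps of net distance.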
Hamming's Translation
In a research career, each working day represents one step. Without a clear vision of what matters, work drifts in many directions: after n days, net progress ≈ √n. With a coherent long-range vision, effort aligns: after n days, net progress ≈ n. The ratio n / √n = √n grows without bound.
The √n Ratio
The directed walk does not require perfect aim. Any persistent bias toward a goal converts √n drift into something closer to n progress.
Limits of the Model
A model that predicts 100x output from vision deserves scrutiny. Several things it omits:
1. Dimensionality: careers operate in high-dimensional space, not ℤ. The geometry of a random walk in ℝ^d changes significantly with d.
2. Correlation: research steps correlate — today's work builds on yesterday's. Correlated walks behave differently from i.i.d. steps.
3. The vision itself may be wrong: a directed walk toward the wrong attractor is worse than drift.
Doubling Time
Hamming opened his course with the claim that technical knowledge doubles roughly every 17 years. That claim has a precise mathematical structure: exponential growth.
Exponential Growth Model
y(t) = a · e^(b·t)
where a = initial quantity at t = 0, b = growth rate (per unit time), e ≈ 2.718.
Doubling time D: the time for y to double.
2a = a · e^(b·D) → 2 = e^(b·D) → ln(2) = b·D → D = ln(2) / b
ln(2) ≈ 0.693. For b = 0.693/17 ≈ 0.0408 per year, doubling time = 17 years.
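The derivation translates into two inverse one-liners. A sketch with illustrative helper names:

```python
from math import log

def doubling_time(b: float) -> float:
    """D = ln(2) / b for growth y(t) = a * exp(b * t)."""
    return log(2) / b

def growth_rate(D: float) -> float:
    """Inverse: the rate b that yields doubling time D."""
    return log(2) / D

b = growth_rate(17)                 # knowledge doubling every 17 years
print(round(b, 4))                  # 0.0408 per year
print(round(doubling_time(b), 1))   # 17.0
```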
Half-Life
Half-life H: time for y to decay to half its value (b < 0).
H = ln(2) / |b|
The same formula in both directions. A skill with half-life 5 years: after 5 years, half its market value gone. After 10 years: one quarter remains. After 20 years: less than 7% remains.
Knowledge Doubling
If technical knowledge doubles every 17 years, a student graduating at age 22 faces a transformed knowledge landscape by age 56 — a 34-year career spans two full doublings.
Half-Life of Expertise
The same exponential model applies to decay. A specific skill (e.g. mastery of a particular chip architecture, a deprecated API, a superseded algorithm) loses value over time as the field moves on.
If the half-life of a specialized skill H = 5 years, then after t years the fraction of original value remaining: f(t) = (1/2)^(t/H) = 2^(−t/H).
After one half-life (5 years): 50% remains. Two half-lives (10 years): 25%. Three (15 years): 12.5%. Four (20 years): 6.25%.
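The decay table falls out of f(t) = 2^(−t/H) directly. A sketch using the text's hypothetical 5-year half-life:

```python
def remaining_fraction(t: float, half_life: float) -> float:
    """f(t) = 2^(-t / H): fraction of original value left after t years."""
    return 2 ** (-t / half_life)

H = 5  # hypothetical skill half-life from the text, in years
for years in (5, 10, 15, 20):
    print(years, remaining_fraction(years, H))  # 0.5, 0.25, 0.125, 0.0625
```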
Hamming's implication: learning how to learn compounds on the same exponential time scale at which specialized knowledge decays. Investing in learning strategy, problem-framing, & transferable reasoning preserves value across half-life cycles.
Geometry, Error Correction, & Career
The three geometric structures from this lesson appear disconnected. They connect.
Hamming distance formalizes the cost of error and the redundancy needed to recover from it. Every communication system, every codebase, every body of knowledge needs enough redundancy that single errors do not corrupt the whole.
The √n vs n argument translates vision into a geometric fact: drift scales as distance from a starting point, directed motion scales as displacement toward a goal. Redundancy in career strategy — keeping multiple lines of inquiry open — buffers against the occasional wrong turn.
Exponential growth & decay govern both the expanding frontier and the half-life of what you know today. The only stable investment: learning how to learn, which compounds on the same time scale that specialized knowledge decays.