un

guest
1 / ?
back to lessons

Distance in Binary Space

Richard Hamming's most famous technical contribution: error-correcting codes. The geometric idea behind them runs deeper than any specific code.

Hamming Distance

Given two binary strings of equal length, the Hamming distance d(u, v) counts positions where they differ:

`` u = 1 0 1 1 0 v = 1 1 1 0 0 ↑ ↑ d(u,v) = 2 ``

This satisfies all three metric axioms: d(u,v) ≥ 0; d(u,v) = 0 iff u = v; d(u,v) = d(v,u); d(u,w) ≤ d(u,v) + d(v,w). Binary n-space with Hamming distance forms a valid metric space.

The geometry visualizes clearly in low dimensions. All 3-bit strings live at the 8 vertices of a cube. Edge-adjacent vertices differ in exactly 1 bit; face-diagonal vertices differ in 2; antipodal vertices (e.g. 000 and 111) differ in 3.

3-bit Hypercube: Hamming Distance

Calculating Hamming Distance

Hamming weight wt(v) counts the number of 1s in v. Distance relates to weight via XOR:

d(u, v) = wt(u XOR v)

Example: u = 10110, v = 11100. XOR = 01010. Weight = 2. So d(u,v) = 2.

Compute d(u, v) for u = 10011101 and v = 11010100. Show the XOR step, then count differing bits.

Error Correction via Sphere Packing

A binary code C ⊆ {0,1}^n consists of chosen codewords. When a codeword transmits over a noisy channel, some bits may flip. The receiver gets a corrupted string and must recover the original.

Define a Hamming ball of radius t centered at codeword c:

B(c, t) = { v ∈ {0,1}^n : d(c, v) ≤ t }

To correct up to t errors, the balls B(c, t) around every pair of codewords must not overlap. If they overlap, a received word could lie in two balls and the decoder cannot determine which codeword was sent.

The minimum distance d_min of a code governs everything:

- Detect up to d_min − 1 errors - Correct up to ⌊(d_min − 1) / 2⌋ errors

Hamming (7,4) code: n = 7 bits, k = 4 data bits, d_min = 3. Corrects 1 error. Detects 2.

Error Correction: Sphere Packing

A code has minimum distance 5. How many errors can it detect? How many can it correct? Show the formulas, then compute both values.

How Many Codewords Fit?

How many codewords can a length-n code contain while still correcting t errors? Each codeword 'owns' a ball of radius t. All balls together must fit inside {0,1}^n, which has 2^n points.

Volume of a Hamming ball of radius t in {0,1}^n:

Vol(n,t) = Σ_{i=0}^{t} C(n, i)

The Hamming bound (sphere-packing bound) follows directly: a code correcting t errors satisfies

M · Vol(n,t) ≤ 2^n

where M = number of codewords. So M ≤ 2^n / Vol(n,t).

For the Hamming (7,4) code: n=7, t=1. Vol(7,1) = C(7,0) + C(7,1) = 1 + 7 = 8. Bound: M ≤ 128 / 8 = 16. The (7,4) code achieves M = 2^4 = 16: a perfect code, meaning the balls tile {0,1}^7 with no gaps.

For n = 15 and t = 1 (single-error correction), compute Vol(15, 1) and the Hamming bound on the number of codewords M. Is M = 2^11 achievable given the bound?

√n vs n

Hamming used a random-walk argument to make the value of long-range vision precise. The argument converts a vague claim — 'vision helps' — into a geometric fact about distances.

Symmetric Random Walk on ℤ

At each step, move +1 or −1 with equal probability. After n steps, expected displacement from origin: E[|X_n|] ≈ √n.

This follows from the variance: Var(X_n) = n (steps independent, each ±1 variance 1). Standard deviation = √n.

Directed Walk

At each step, move +1 with probability p > 1/2 (bias toward a goal). After n steps, expected position: E[X_n] = n(2p−1). For p = 1 (fully directed): E[X_n] = n.

The contrast: random drift scales as √n; directed progress scales as n.

Random Walk vs Directed Walk

Hamming's Translation

In a research career, each working day represents one step. Without a clear vision of what matters, work drifts in many directions: after n days, net progress ≈ √n. With a coherent long-range vision, effort aligns: after n days, net progress ≈ n. The ratio n / √n = √n grows without bound.

The √n Ratio

The directed walk does not require perfect aim. Any persistent bias toward a goal converts √n drift into something closer to n progress.

Hamming said that a person with clear vision accomplishes roughly √n times as much over a career as one without it, where n is the number of working days. If a career spans 10,000 working days, what ratio does this predict? What does the number suggest about the practical value of sustained vision?

Limits of the Model

A model that predicts 100x output from vision deserves scrutiny. Several things it omits:

1. Dimensionality: careers operate in high-dimensional space, not ℤ. The geometry of a random walk in ℝ^d changes significantly with d.

2. Correlation: research steps correlate — today's work builds on yesterday's. Correlated walks behave differently from i.i.d. steps.

3. The vision itself may be wrong: a directed walk toward the wrong attractor is worse than drift.

Identify one assumption the √n vs n argument depends on that you consider most suspect in real research careers. Explain why that assumption matters for the 100x prediction.

Doubling Time

Hamming opened his course with the claim that technical knowledge doubles roughly every 17 years. That claim has a precise mathematical structure: exponential growth.

Exponential Growth Model

y(t) = a · e^(b·t)

where a = initial quantity at t = 0, b = growth rate (per unit time), e ≈ 2.718.

Doubling time D: the time for y to double.

2a = a · e^(b·D) → 2 = e^(b·D) → ln(2) = b·D → D = ln(2) / b

ln(2) ≈ 0.693. For b = 0.693/17 ≈ 0.0408 per year, doubling time = 17 years.

Half-Life

Half-life H: time for y to decay to half its value (b < 0).

H = ln(2) / |b|

The same formula in both directions. A skill with half-life 5 years: after 5 years, half its market value gone. After 10 years: one quarter remains. After 20 years: less than 7% remains.

Knowledge Doubling

If technical knowledge doubles every 17 years, a student graduating at age 22 faces a transformed knowledge landscape by age 56 — a 34-year career spans two full doublings.

Using D = ln(2) / b, compute the annual growth rate b implied by a 17-year doubling time. Then use y(t) = e^(b·t) to find the factor by which the knowledge base multiplies over a 34-year career. Show your work.

Half-Life of Expertise

The same exponential model applies to decay. A specific skill (e.g. mastery of a particular chip architecture, a deprecated API, a superseded algorithm) loses value over time as the field moves on.

If the half-life of a specialized skill H = 5 years, then after t years the fraction of original value remaining: f(t) = (1/2)^(t/H) = 2^(−t/H).

After one half-life (5 years): 50% remains. Two half-lives (10 years): 25%. Three (15 years): 12.5%. Four (20 years): 6.25%.

Hamming's implication: the value of learning how to learn compounds positively with the same exponent that specialized knowledge decays. Investing in learning strategy, problem-framing, & transferable reasoning preserves value across half-life cycles.

A software engineer's expertise in a specific framework has a half-life of 4 years. She has 12 years until retirement. What fraction of that expertise's value remains at retirement? Then interpret: what does this suggest about how she should allocate learning time between deep specialization and transferable skills?

Geometry, Error Correction, & Career

The three geometric structures from this lesson appear disconnected. They connect.

Hamming distance formalizes the cost of error and the redundancy needed to recover from it. Every communication system, every codebase, every body of knowledge needs enough redundancy that single errors do not corrupt the whole.

The √n vs n argument translates vision into a geometric fact: drift scales as distance from a starting point, directed motion scales as displacement toward a goal. Redundancy in career strategy — keeping multiple lines of inquiry open — buffers against the occasional wrong turn.

Exponential growth & decay govern both the expanding frontier and the half-life of what you know today. The only stable investment: learning how to learn, which compounds on the same time scale that specialized knowledge decays.

Connect at least two of these three geometric ideas to a single concrete decision you face in your field or career. The connection should be specific: name the decision, name the geometric structure, and explain what the geometry tells you to do differently than you would without it.