How Huffman Builds the Optimal Code
Hamming presents Huffman coding as a greedy algorithm that constructs a minimum-average-length prefix-free code. The key idea: build the tree bottom-up, always merging the two least probable symbols.
The Forward Pass (Build)
1. Sort all symbols by probability, highest to lowest.
2. Take the two lowest-probability symbols. Combine them into a new meta-symbol with probability = sum of the two.
3. Insert the meta-symbol in its sorted position.
4. Repeat until only two symbols remain.
5. Assign 0 and 1 to the two remaining symbols.
The Backward Pass (Assign Codes)
Undo the merges in reverse: at each split, the two symbols that were merged receive the same prefix bits as their combined parent, plus an additional 0 (one child) or 1 (other child). The 0/1 assignment is arbitrary — either is valid.
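Both passes can be sketched in a few lines of Python. The name `huffman_code` is a hypothetical helper, and a heap stands in for the re-sorted list Hamming describes; this is a sketch of the procedure above, not a production encoder.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from {symbol: probability}.

    Forward pass: repeatedly merge the two least probable nodes.
    Backward pass: walk the tree, appending '0'/'1' at each split.
    """
    tie = count()  # tie-breaker so heapq never compares tree nodes
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # two lowest probabilities
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))
    code = {}
    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            assign(node[0], prefix + "0")    # the 0/1 choice is arbitrary
            assign(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"       # one-symbol alphabet edge case
    assign(heap[0][2], "")
    return code
```

Which symbol gets 0 and which gets 1 at each split depends on pop order, so the exact bits vary; the code lengths, and hence L, do not.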
Why this is optimal: each merge adds the combined probability to the average length, so a code is optimal for the original alphabet exactly when it is optimal for the reduced one. If some other code had smaller average length L', carrying that advantage down through the same reduction would eventually yield a two-symbol code with average length below 1 bit, which is impossible. Contradiction.
Tracing the Algorithm
Example from Hamming: four symbols A, B, C, D with p(A)=0.5, p(B)=0.25, p(C)=0.125, p(D)=0.125.
Forward pass:
Step 1: Sorted: A(0.5), B(0.25), C(0.125), D(0.125). Lowest two: C, D.
Step 2: Merge C and D into CD with p=0.25. New list: A(0.5), B(0.25), CD(0.25).
Step 3: Lowest two: B(0.25), CD(0.25). Merge into BCD with p=0.5.
Step 4: Two symbols remain: A(0.5), BCD(0.5). Assign A=0, BCD=1.
Backward pass:
BCD → B gets 10, CD gets 11. CD → C gets 110, D gets 111.
Final code: A=0 (l=1), B=10 (l=2), C=110 (l=3), D=111 (l=3).
L = 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits.
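Because no codeword is a prefix of another, this code decodes left to right with no separators. A minimal sketch (the error handling a real decoder would need for truncated input is omitted):

```python
def decode(bits, code):
    """Decode a bit string with a prefix-free code by scanning left to right."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:      # prefix-free: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode("010110111", code))  # prints ABCD
```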
Multiple Valid Huffman Codes
Hamming notes two sources of non-uniqueness:
1. Arbitrary 0/1 assignment. At each split, you can give 0 to either child. Swapping 0 and 1 throughout gives a different code with identical L.
2. Tie-breaking. When two nodes have equal probability, either can be placed 'higher' (combined first). Different tie-breaking choices can produce structurally different trees — 'long' vs 'bushy' — with the same L but different code-length distributions.
Long vs Bushy
Long tree (skewed): one symbol at each depth level (Fibonacci-like structure). Code lengths vary widely: one symbol gets a short code, others cascade to longer codes.
Bushy tree (balanced): symbols spread evenly across depth levels. Code lengths cluster near the average. Lower variance.
Hamming's recommendation: when given a choice, prefer the bushy tree. Same L, but smaller variance in code lengths → more uniform decoding time → smaller buffer requirements in real-time applications.
Computing Code-Length Variance
Variance of code lengths: Var = E[l²] − (E[l])²
For the code {A=0 l=1, B=10 l=2, C=110 l=3, D=111 l=3} with p=[0.5, 0.25, 0.125, 0.125]:
E[l] = L = 1.75
E[l²] = 0.5·1² + 0.25·2² + 0.125·3² + 0.125·3² = 0.5 + 1.0 + 1.125 + 1.125 = 3.75
Var = 3.75 − 1.75² = 3.75 − 3.0625 = 0.6875
By contrast, for four equal-probability symbols (p = 1/4 each) the Huffman tree is bushy and every code has length 2: L = 2, Var = 0.
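The arithmetic above can be reproduced with a short snippet; `length_stats` is a hypothetical helper name:

```python
def length_stats(probs, lengths):
    """Mean and variance of code lengths: Var = E[l^2] - (E[l])^2."""
    mean = sum(p * l for p, l in zip(probs, lengths))
    second = sum(p * l * l for p, l in zip(probs, lengths))
    return mean, second - mean * mean

# The long code from the trace: lengths 1, 2, 3, 3
print(length_stats([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))  # (1.75, 0.6875)

# Four equal-probability symbols, all length-2 codes
print(length_stats([0.25] * 4, [2, 2, 2, 2]))                 # (2.0, 0.0)
```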
Compression Gains vs Symbol Distribution
Hamming's rule: Huffman coding pays off only when symbol probabilities differ substantially.
Equal probabilities. If all 2^k symbols have equal probability 1/2^k, Huffman produces a block code: every symbol gets length k. L = H = k. No gain over the simple block code.
Skewed probabilities. If one symbol dominates (p >> 1/2^k for others), Huffman assigns it a very short code (length 1 or 2), dramatically reducing L.
The comma code (Hamming's term). When each probability exceeds 2/3 of the remaining total, Huffman naturally produces s1→0, s2→10, s3→110, ..., down to two equal-length codes at the end. This is a 'comma code': the trailing 0 after a run of 1s acts like a comma.
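Both regimes can be checked by computing code lengths directly from the merge rule. Here `code_lengths` is a hypothetical helper, and the skewed distribution is an illustrative one (not from Hamming) chosen so each probability exceeds 2/3 of the remaining total:

```python
import heapq
from itertools import count

def code_lengths(probs):
    """Huffman code lengths via the two-lowest merge; no codes assigned."""
    tie = count()
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    depth = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:
            depth[i] += 1        # every symbol under a merge gains one bit
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return depth

# Equal probabilities: a block code, every length is k = 3
print(code_lengths([1/8] * 8))                  # [3, 3, 3, 3, 3, 3, 3, 3]

# Skewed: the comma-code staircase 1, 2, 3, 3
print(code_lengths([0.6, 0.25, 0.1, 0.05]))     # [1, 2, 3, 3]
```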
Hamming notes: real-world data compression (backup, file archiving) can cut storage by more than half when the source has skewed probabilities — English text being a prime example.
Huffman vs Block Code: Numeric Comparison
Hamming's second example: p(s1)=1/3, p(s2)=1/5, p(s3)=1/6, p(s4)=1/10, p(s5)=1/12, p(s6)=1/20, p(s7)=1/30, p(s8)=1/30.
Block code: 8 symbols → 3 bits each → L_block = 3.
Working through the Huffman construction gives code lengths (2, 2, 3, 3, 3, 4, 5, 5) and L_Huffman = 159/60 = 2.65 bits.
Savings: (3 − 2.65)/3 ≈ 12% compression over block coding.
When symbol probabilities differ greatly (1/3 vs 1/30 here, a 10:1 ratio), variable-length coding pays substantially.
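The average length for this example can be verified exactly with fractions, using the fact that E[l] equals the sum of the merged-node weights (each merge adds one bit to every symbol beneath it):

```python
import heapq
from fractions import Fraction as F
from itertools import count

def average_length(probs):
    """Huffman average code length E[l], computed as the sum of all
    merged-node weights: each merge adds one bit to its whole subtree."""
    tie = count()
    heap = [(p, next(tie)) for p in probs]
    heapq.heapify(heap)
    total = F(0)
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        total += p1 + p2
        heapq.heappush(heap, (p1 + p2, next(tie)))
    return total

probs = [F(1, 3), F(1, 5), F(1, 6), F(1, 10),
         F(1, 12), F(1, 20), F(1, 30), F(1, 30)]
print(average_length(probs))  # 53/20, i.e. 2.65 bits
```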
Self-Compiling Programs
Hamming ends Chapter 11 with a striking idea: a Huffman encoder can encode itself. The decoding tree (which tells the decoder how to read the compressed message) gets transmitted along with the compressed data. The receiver needs no prior knowledge of the code.
The encoder: samples the data, estimates probabilities, constructs the Huffman code, encodes the decoding tree as a header, then encodes the data.
The decoder: reads the header to reconstruct the tree, then decodes the data using it.
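A toy version of this pipeline, with the code table serialized as a JSON header (an illustrative stand-in for the far more compact tree descriptions real formats transmit):

```python
import heapq
import json
from collections import Counter
from itertools import count

def build_code(freqs):
    """Huffman code from observed symbol counts (forward + backward pass)."""
    tie = count()
    heap = [(n, next(tie), s) for s, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), (a, b)))
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"
    walk(heap[0][2], "")
    return code

def compress(text):
    """Sample the data, build the code, ship header (code table) + payload."""
    code = build_code(Counter(text))
    return json.dumps(code), "".join(code[c] for c in text)

def decompress(header, bits):
    """Reconstruct the code from the header, then decode prefix-free."""
    inverse = {v: k for k, v in json.loads(header).items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

header, bits = compress("abracadabra")
assert decompress(header, bits) == "abracadabra"
```

The receiver needs nothing but the header: the intelligence travels with the message, which is exactly Hamming's point.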
Hamming's point: this entire pipeline can run as a library function with no human involvement. You call it, it compresses and decompresses automatically. Modern archiving formats (ZIP, gzip, bzip2) implement exactly this pattern.
The deeper principle: the intelligence is in the algorithm, not in a fixed code table. The algorithm adapts to whatever data it receives. This is the pattern Hamming sees across all great engineering systems: adaptability built in, not bolted on.