Why the Greedy Strategy is Optimal

The Huffman algorithm is a greedy algorithm: at each step, it makes the locally optimal choice (merge the two cheapest nodes) without looking ahead. Greedy algorithms are not always globally optimal — but here they are.

The Optimality Argument

Suppose a code C has minimum average length L. Consider the two symbols with the lowest probabilities, say a and b with probabilities p_a and p_b. In any optimal code, these two symbols can be taken to be sibling leaves at the deepest level of the tree. Why?

In an optimal code the tree is full (every internal node has two children), so the deepest leaf has a sibling leaf. If either of those two deepest leaves were not one of the two least probable symbols, you could swap it with a less probable symbol sitting at a shallower depth, reducing L and contradicting optimality. Therefore the two least probable symbols can be placed as siblings at the greatest depth.

Now combine a and b into a meta-symbol ab (with p_ab = p_a + p_b), giving a reduced alphabet with one fewer symbol. An optimal code for the reduced alphabet, with the codeword for ab extended by one bit each for a and b, yields an optimal code for the original alphabet; the two costs differ by exactly p_a + p_b. The Huffman merge step is precisely this reduction, so induction on the alphabet size completes the argument.

Geometric View

The Huffman algorithm constructs the binary tree bottom-up, placing the least probable symbols at the greatest depth. This minimizes Σ p_i · depth(i) = L.

An equivalent view: you are assigning labels to tree leaves to minimize the weighted path length, where each leaf's weight is its probability.
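This view can be sanity-checked by exhaustive search on a tiny alphabet. The Python sketch below (an illustration, not part of the lesson; the helper name min_weighted_path is mine) uses the identity that the weighted path length Σ p_i · depth(i) equals the sum of the weights of all merged internal nodes, and tries every possible merge order, i.e., every binary tree shape and leaf assignment:

```python
from itertools import combinations

def min_weighted_path(probs):
    """Minimum weighted path length over ALL binary trees on these leaves,
    found by brute force (feasible only for tiny alphabets).
    Identity used: weighted path length = sum of merged-node weights."""
    if len(probs) == 1:
        return 0.0
    best = float("inf")
    for i, j in combinations(range(len(probs)), 2):
        merged = probs[i] + probs[j]
        rest = [p for k, p in enumerate(probs) if k not in (i, j)]
        # cost contributed by this merge, plus the best completion of the rest
        best = min(best, merged + min_weighted_path(rest + [merged]))
    return best

# Distribution from the worked example in the next section.
print(min_weighted_path([0.4, 0.3, 0.2, 0.1]))
```

The exhaustive minimum agrees with the greedy Huffman result (L = 1.9 for this source), which is exactly what the optimality argument predicts.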

Executing the Greedy Steps

Probabilities: p(A)=0.4, p(B)=0.3, p(C)=0.2, p(D)=0.1. Sum = 1.0 ✓

Step 1: Lowest two: C(0.2), D(0.1). Merge → CD(0.3). List: A(0.4), B(0.3), CD(0.3).

Step 2: Lowest two: B(0.3), CD(0.3) (tie — either valid). Merge → BCD(0.6). List: A(0.4), BCD(0.6).

Step 3: Merge A(0.4), BCD(0.6) → root(1.0). Assign A=0, BCD=1.

Unwinding the merges backward: BCD splits into B=10 (l=2) and CD=11; CD splits into C=110 (l=3) and D=111 (l=3).

L = 0.4·1 + 0.3·2 + 0.2·3 + 0.1·3 = 0.4+0.6+0.6+0.3 = 1.9 bits.
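The merge steps above can be automated with a binary min-heap. A minimal Python sketch (the function name huffman_lengths is mine; because of the step-2 tie, other implementations may emit different codewords, but the lengths, and hence L, are forced):

```python
import heapq

def huffman_lengths(probs):
    """Greedy Huffman build: repeatedly merge the two cheapest nodes.
    Returns {symbol: codeword length}. A running counter breaks ties in
    the heap, so exact codewords may differ between implementations,
    but the average length L is always optimal."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # cheapest node
        p2, _, syms2 = heapq.heappop(heap)  # second cheapest
        for s in syms1 + syms2:             # every symbol under the merge gains one bit
            depth[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return depth

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
lengths = huffman_lengths(probs)  # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
L = sum(probs[s] * lengths[s] for s in probs)
print(lengths, L)
```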

H for p={0.4, 0.3, 0.2, 0.1}: compute H = −Σ p_i·log₂(p_i). Use log₂(0.4)≈−1.322, log₂(0.3)≈−1.737, log₂(0.2)≈−2.322, log₂(0.1)≈−3.322. Verify that L = 1.9 ≥ H, and compute L/H.
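The verification is easy to script. A Python sketch using exact log₂ rather than the rounded table values above:

```python
from math import log2

p = [0.4, 0.3, 0.2, 0.1]
H = -sum(pi * log2(pi) for pi in p)  # entropy of the source
L = 1.9                              # Huffman average length from above
print(round(H, 4), round(L / H, 4))  # H ≈ 1.8464, L/H ≈ 1.029
```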

Code-Length Variance

Two Huffman codes may achieve the same L but have different code-length distributions. The variance of code lengths measures this spread:

Var(l) = E[l²] − (E[l])²

where E[l] = L (average code length) and E[l²] = Σ p_i · l_i².

Long vs Bushy Tree Comparison

Why variance matters:

1. Buffer requirements. In real-time decoding, each symbol takes a variable number of bits. High variance means some symbols take many bits — you need a larger input buffer to guarantee you can always read a complete symbol.

2. Decoding latency. High-variance codes have long worst-case decoding paths. Low-variance (bushy) codes bound the worst-case more tightly.

3. Robustness. A lost bit in a high-variance code causes more synchronization damage because long codewords must be fully re-read.

Hamming's recommendation: when multiple valid Huffman codes exist (tie-breaking choices), prefer the one with lower variance — the bushy tree.

Computing Variance for Two Trees

Using p(A)=0.4, p(B)=0.3, p(C)=0.2, p(D)=0.1 and the code A=0, B=10, C=110, D=111:

E[l] = L = 1.9

E[l²] = 0.4·1² + 0.3·2² + 0.2·3² + 0.1·3² = 0.4+1.2+1.8+0.9 = 4.3

Var = 4.3 − 1.9² = 4.3 − 3.61 = 0.69
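The same arithmetic as a small Python sketch, so it is easy to re-run for other codes:

```python
p      = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
length = {"A": 1, "B": 2, "C": 3, "D": 3}  # the Huffman code above

EL  = sum(p[s] * length[s]      for s in p)  # E[l]   = L = 1.9
EL2 = sum(p[s] * length[s] ** 2 for s in p)  # E[l^2] = 4.3
var = EL2 - EL ** 2                          # Var    = 0.69
print(EL, EL2, var)
```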

Now consider the bushy alternative: B=00, C=01, A=10, D=11 (all lengths = 2, L = 2). Note this is NOT a Huffman code: its L = 2 exceeds the Huffman optimum L = 1.9, but it serves as a comparison. Compute Var for this bushy length-2 code. Then explain: even though the bushy code has higher L than Huffman, what property makes it preferable in real-time applications?

3-Symbol Huffman Code End-to-End

A complete end-to-end example: build Huffman code, compute L, compute H, verify L ≥ H, compute Var.

Source: p(X)=0.6, p(Y)=0.3, p(Z)=0.1.

Huffman build:

Step 1: Sorted: X(0.6), Y(0.3), Z(0.1). Lowest two: Y(0.3), Z(0.1).

Merge → YZ(0.4). List: X(0.6), YZ(0.4).

Step 2: Merge X(0.6), YZ(0.4) → root(1.0). Assign X=0, YZ=1.

YZ → Y=10, Z=11.

Code: X=0 (l=1), Y=10 (l=2), Z=11 (l=2).

L = 0.6·1 + 0.3·2 + 0.1·2 = 0.6 + 0.6 + 0.2 = 1.4 bits.

Entropy: H = −0.6·log₂(0.6) − 0.3·log₂(0.3) − 0.1·log₂(0.1)

log₂(0.6) ≈ −0.737, log₂(0.3) ≈ −1.737, log₂(0.1) ≈ −3.322

H = 0.6·0.737 + 0.3·1.737 + 0.1·3.322 = 0.442 + 0.521 + 0.332 = 1.295 bits.

L = 1.4 ≥ H = 1.295 ✓. L/H = 1.4/1.295 ≈ 1.081. 8.1% above entropy.

Variance: E[l²] = 0.6·1 + 0.3·4 + 0.1·4 = 0.6+1.2+0.4 = 2.2. Var = 2.2 − 1.4² = 2.2 − 1.96 = 0.24.
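The whole pipeline condenses to a few lines. A Python sketch (the helper name code_stats is mine, not from the lesson):

```python
from math import log2

def code_stats(p, length):
    """Average length L, entropy H, and code-length variance for a
    source p (symbol -> probability) and a code (symbol -> length)."""
    L = sum(p[s] * length[s] for s in p)
    H = -sum(pi * log2(pi) for pi in p.values())
    var = sum(p[s] * length[s] ** 2 for s in p) - L ** 2
    return L, H, var

L, H, var = code_stats({"X": 0.6, "Y": 0.3, "Z": 0.1},
                       {"X": 1, "Y": 2, "Z": 2})
print(round(L, 2), round(H, 4), round(var, 2))  # matches the values above: 1.4, ≈1.295, 0.24
```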

Your Turn: A Full Pipeline

log₂ values for reference: log₂(0.5)=−1.000, log₂(0.25)=−2.000, log₂(0.125)=−3.000, log₂(0.375)≈−1.415, log₂(0.625)≈−0.678.

Build a Huffman code for p(A)=0.5, p(B)=0.375, p(C)=0.125. Show merge steps. Compute L, H, L/H, and Var. State one insight from comparing L to H for this source.