
Source → Channel: Two-Stage Encoding

Hamming's Chapter 10 opens with Shannon's communication model, which applies to every system that moves information: telephone calls, radio, hard drives, DNA replication, computer memory.

Shannon Communication Model

The Two-Stage Architecture

Stage 1: Source encoding (compression). Convert the source message to a compact representation. Goal: remove redundancy native to the source. Morse code does this: common letters get short codes, rare letters get long ones.

Stage 2: Channel encoding (error protection). Add controlled redundancy adapted to the channel's noise characteristics. A noisy telephone line needs more redundancy than a fiber optic cable. The two stages decouple: optimize each independently for its own task.

The common interface between the two stages — a standardized bit stream — means any source can pair with any channel. This modularity is Shannon's key architectural insight.

Storage as communication. Hamming notes that sending a message through space (transmission) and sending it through time (storage) use the same model. A backup drive is a noisy channel in time.
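The decoupling can be sketched with toy stand-ins — a tiny variable-length source code and a 3× repetition channel code. Both are illustrative choices, not examples from the chapter; the point is only that the two stages meet at a bare bit stream:

```python
# Toy two-stage pipeline (illustrative stand-ins, not Hamming's examples).
# Stage 1 compresses with a variable-length code; stage 2 adds controlled
# redundancy with a 3x repetition code. The stages share only a bit stream.
SOURCE_CODE = {"A": "0", "B": "10", "C": "11"}   # hypothetical source code

def source_encode(message: str) -> str:
    """Stage 1: strip source redundancy (here, just table lookup)."""
    return "".join(SOURCE_CODE[s] for s in message)

def channel_encode(bits: str) -> str:
    """Stage 2: add redundancy matched to the channel (repeat each bit 3x)."""
    return "".join(b * 3 for b in bits)

def channel_decode(bits: str) -> str:
    """Majority vote over each triple corrects any single flipped bit."""
    triples = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join("1" if t.count("1") >= 2 else "0" for t in triples)

sent = channel_encode(source_encode("ABC"))
noisy = "1" + sent[1:]                # one bit flipped in transit
print(channel_decode(noisy))          # source bits recovered despite the flip
```

Swapping in a different source code or a stronger channel code changes neither interface — that is the modularity the text describes.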

The Role of Noise

Shannon's model makes a radical assumption: noise is inevitable. Every channel, every storage medium, every switching circuit introduces some probability of error. The question is not 'how do we eliminate noise?' but 'how do we recover the original message despite noise?'

This contrasts with classical physics, where noise enters as an afterthought, a small correction to an otherwise exact description (quantum mechanics, with its uncertainty principle, being the exception). Shannon treats noise as the foundational condition — all the theory builds from it.

For now, Chapter 10 focuses on the noiseless case: source coding without errors. The channel coding problem (error correction) waits for later chapters.

Hamming says sending through space and sending through time use the same communication model. Give one concrete example of each and explain what plays the role of the 'channel' in your time-storage example.

When Can You Decode Uniquely?

For a variable-length code to be useful, the receiver must decode a stream of bits unambiguously. Hamming illustrates with a 4-symbol code that fails this test, then introduces the fix: prefix-free codes.

Prefix-Free Condition

A code is prefix-free (or instantaneous) if no codeword is a prefix of any other codeword. Given a received bit stream, you know a symbol has ended the moment you reach a leaf in the decoding tree — no lookahead required.

Example prefix-free code for {s1, s2, s3, s4}: s1 = 0, s2 = 10, s3 = 110, s4 = 111.

Verify: 0 is not a prefix of 10, 110, or 111 (each begins with 1). 10 is not a prefix of 110 or 111 (both begin with 11). Since no codeword is a prefix of another, the code qualifies.

The decoding procedure: follow the tree, branch on each incoming bit, emit the symbol when you hit a leaf, return to the root.
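A minimal sketch of this procedure in Python, using the example code above. Instead of building an explicit tree, the decoder accumulates bits until the buffer matches a codeword — for a prefix-free code this is equivalent to reaching a leaf:

```python
# Instantaneous decoding of the prefix-free code from the text:
# s1=0, s2=10, s3=110, s4=111. Because no codeword is a prefix of
# another, a symbol can be emitted the moment the buffer matches one.
CODE = {"0": "s1", "10": "s2", "110": "s3", "111": "s4"}

def decode(bits: str) -> list[str]:
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in CODE:            # reached a leaf: emit, return to root
            symbols.append(CODE[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"stream ended mid-codeword: {buffer!r}")
    return symbols

print(decode("0101101110"))  # ['s1', 's2', 's3', 's4', 's1']
```

No lookahead is ever needed: each emitted symbol depends only on bits already seen, which is exactly what "instantaneous" means.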

The Kraft Inequality

For any prefix-free binary code with codeword lengths l_1, l_2, ..., l_n:

Σ 2^(−l_i) ≤ 1

Equality holds when the code is complete (leaves tile the full binary tree with no gaps). This is a necessary condition: no prefix-free code can violate it. Conversely, for any set of lengths satisfying Kraft's inequality, a prefix-free code exists.
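The converse direction is constructive. One standard construction (canonical coding, a common technique rather than anything specific to Hamming's text): sort the lengths and assign codewords by counting upward in binary, left-shifting whenever the length grows:

```python
# Constructive half of Kraft: given lengths with sum(2**-l) <= 1, build a
# prefix-free code. Standard canonical construction (an assumption here,
# not Hamming's own procedure): count upward in binary, widening as needed.
def codewords_from_lengths(lengths: list[int]) -> list[str]:
    code, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev)                 # widen the counter to length l
        code.append(format(value, f"0{l}b"))
        value += 1
        prev = l
    return code

print(codewords_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

Kraft's inequality is exactly what guarantees the counter never overflows its current width, so every assigned codeword stays distinct and prefix-free.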

Applying Kraft's Inequality

Given code lengths, verify Kraft: Σ 2^(−l_i) ≤ 1.

For {s1=0, s2=10, s3=110, s4=111}: lengths are 1, 2, 3, 3.

Σ = 2^(−1) + 2^(−2) + 2^(−3) + 2^(−3) = 1/2 + 1/4 + 1/8 + 1/8 = 1. ✓

A 5-symbol source proposes the code: s1=0, s2=10, s3=110, s4=1110, s5=1111. Verify or disprove prefix-free decodability, then compute the Kraft sum. What does Kraft = 1 tell you about this code?

Shannon Entropy

The average code length of a variable-length code: L = Σ p_i · l_i, where p_i is the probability of symbol s_i and l_i is its code length.

How short can L get? Shannon's answer: not below the source entropy.

Shannon entropy: H = −Σ p_i · log₂(p_i) (units: bits/symbol)

Entropy measures the average information per symbol. High entropy means symbols are roughly equally likely (maximum unpredictability). Low entropy means some symbols dominate (high predictability, more compressible).

Noiseless Coding Theorem

Every prefix-free code satisfies L ≥ H(source), and a prefix-free code with L < H(source) + 1 always exists — so an optimal code obeys H(source) ≤ L < H(source) + 1.

No code can beat entropy. Huffman coding (next chapter) is optimal among prefix-free codes, so its average length satisfies L < H + 1. Encoding blocks of n symbols together tightens the bound: H ≤ L/n < H + 1/n, so the per-symbol overhead shrinks as blocks grow.

Entropy is also the theoretical compression limit: you cannot compress a source below H bits per symbol without losing information.

Computing Entropy

Example: 4 symbols with probabilities p = [1/2, 1/4, 1/8, 1/8].

H = −(1/2)·log₂(1/2) − (1/4)·log₂(1/4) − (1/8)·log₂(1/8) − (1/8)·log₂(1/8)

= (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3

= 0.5 + 0.5 + 0.375 + 0.375 = 1.75 bits/symbol

And the Huffman code {0, 10, 110, 111} achieves L = 1.75 = H exactly.
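The arithmetic can be verified in a few lines, straight from the formulas above — H from the probabilities, L from the code lengths:

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/8]
lengths = [1, 2, 3, 3]               # lengths of the code 0, 10, 110, 111
H = entropy(probs)
L = sum(p * l for p, l in zip(probs, lengths))
print(H, L)                          # 1.75 1.75 — the code meets the bound exactly
```

Equality L = H is no accident here: every probability is a power of 1/2, so each optimal length l_i = −log₂(p_i) is an integer.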

Compute H for the 3-symbol source: p(A) = 1/2, p(B) = 1/3, p(C) = 1/6. Show all terms. Then propose a prefix-free code and verify L ≥ H.

Why Entropy Is a Lower Bound

The noiseless coding theorem's lower bound is not just a formula — it reflects something deep about information: you cannot compress what is already maximally unpredictable.

When all symbols are equally likely (uniform distribution), entropy is H = log₂(n) for n symbols. When n is a power of 2, a fixed-length block code of log₂(n) bits achieves exactly H. No code can do better.

When one symbol dominates (say, p(A) = 0.99, p(B) = 0.01), H is small — close to 0. A variable-length code can assign A a very short code, encoding the stream very efficiently.

The practical takeaway: large compression gains only exist when symbol probabilities differ substantially. If the source is already 'uniform' (random-looking), Huffman coding gains nothing.
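This takeaway can be checked numerically. The distributions below are chosen here for contrast (deliberately different from the exercise values):

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform over 4 symbols: maximal entropy, no room for compression.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol
# Heavily skewed: low entropy, large compression potential.
print(round(entropy([0.9, 0.1]), 3))       # ~0.469 bits/symbol
```

The skewed source needs less than half a bit per symbol on average, which is why variable-length codes pay off there and not on the uniform one.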

Two sources: Source X has p = [0.5, 0.5] (two equiprobable symbols). Source Y has p = [0.99, 0.01] (one dominant symbol). Compute H for each. What does this tell you about the compression potential of each source?