
What the Course Covered & What It Skipped

Hamming's course covered: analog-to-digital conversion, error correction, simulation, statistics, numerical analysis, & n-dimensional geometry. He thought computationally — signal processing, coding theory, digital filtering all require computational reasoning.

He never systematically taught algorithmic complexity.

Algorithmic Complexity Growth Curves: O(1) flat, O(log N) gentle, O(N) diagonal, O(N²) parabola

Why the Gap Existed

Hamming's mental model formed in an era when hardware bottlenecks dominated. The question: how many transistors per chip? The algorithm ran on the hardware available. At N=100, an O(N²) algorithm costs 10,000 operations. At N=1,000, it costs 1,000,000. At N=10,000 (routine today in dependency graphs, social networks, & package managers): it costs 100,000,000. The difference between O(N) & O(N²) was invisible at the scales Hamming worked with, & catastrophic at the scales his students would inhabit.

Donald Knuth published The Art of Computer Programming starting in 1968 — the same year Hamming was at full research productivity. Big O notation became standard algorithmic vocabulary in the 1970s & 1980s. Hamming's course dates to 1995, but his mental model of computation formed earlier. He never updated this chapter.

A Fundamental by Hamming's Own Test

Hamming's test for a fundamental: (1) it has lasted a long time, (2) the rest of the field can be derived from it using standard methods.

Big O passes both tests. Growth-rate analysis has lasted since Knuth. From it, you derive algorithm selection, data structure choice, & performance prediction — most of practical computer science flows from asking 'how does cost grow as N grows?' Hamming missed the most durable fundamental of his own field.

Big O as a Fundamental

Hamming taught that fundamentals outlast specific technologies. He used calculus as an example: invented once, applicable across physics, engineering, economics, & biology for centuries. Peripheral tools come & go; fundamentals compound.

Does algorithmic complexity count as a fundamental by Hamming's own test? Apply both of his criteria: (1) longevity, & (2) derivability — can the rest of the field be derived from it? Argue your position concretely.

How Cost Grows

Big O measures how cost grows as input grows. Not the constant factor — the rate. Two algorithms that both run in 'a few milliseconds' at N=100 may diverge by a factor of 10,000 at N=10,000, if one runs in O(N) time & the other in O(N²).

The Common Complexity Classes

O(1) — Constant. Same cost regardless of N. Examples: hash table lookup by key, array access by index, stack push/pop. Doubling N changes nothing.

O(1) growth curve: flat horizontal line
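A minimal sketch of these O(1) operations in Python (the `prices` dictionary is a made-up example, not from the text):

```python
# O(1): the cost of each operation below is independent of N.
prices = {f"item{i}": i for i in range(1_000_000)}  # N = 1,000,000 keys

print(prices["item999999"])  # hash lookup by key: one bucket probe, not a million comparisons

stack = []
stack.append(42)    # push: O(1)
print(stack.pop())  # pop: O(1)
```

Doubling the size of `prices` leaves the lookup cost unchanged, which is the defining property of the class.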

O(log N) — Logarithmic. Each step halves the remaining work. Examples: binary search in a sorted array, BVH spatial query in a game engine, balanced binary tree operations. At N=1,000,000: log₂(1,000,000) ≈ 20 steps.

O(log N) growth curve: rapid rise then flattening
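The halving behavior is easy to verify with an instrumented binary search (a sketch; the step counter is added purely for illustration):

```python
def binary_search(sorted_list, target):
    """Return (index, steps) for target in sorted_list, or (-1, steps) if absent.

    Each iteration halves the remaining range, so steps grow as log2(N).
    """
    lo, hi = 0, len(sorted_list) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid, steps
        if sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

# At N = 1,000,000 the step count stays near log2(1,000,000) ≈ 20.
data = list(range(1_000_000))
index, steps = binary_search(data, 765_432)
print(index, steps)
```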

O(N) — Linear. Cost grows with input. Examples: linear scan of a list, reading a file, counting items in a collection. At N=10,000: 10,000 operations.

O(N) growth curve: straight diagonal line

O(N log N) — Linearithmic. Nearly linear, slightly worse. Examples: merge sort, efficient shortest-path algorithms (Dijkstra with a heap). At N=10,000: ~133,000 operations.

O(N log N) growth curve: slightly steeper than linear
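The ~133,000 figure falls straight out of the formula N × log₂(N):

```python
import math

# Where "~133,000 operations" at N = 10,000 comes from: N * log2(N).
n = 10_000
ops = n * math.log2(n)
print(f"{ops:,.0f}")  # ≈ 132,877
```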

O(N²) — Quadratic. Cost explodes. Examples: list.contains() called inside a loop over the same list, bubble sort, naive all-pairs comparison. At N=1,000: 1,000,000 operations. At N=10,000: 100,000,000 operations.

O(N²) growth curve: parabolic explosion

O(2^N) — Exponential. Unusable at any meaningful scale. Examples: brute-force combinatorics, finding all subsets. At N=30: over 1,000,000,000 operations.

O(2^N) growth curve: exponential explosion
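Counting all subsets shows the blow-up directly (a sketch using itertools; the counting function is ours, written for this illustration):

```python
from itertools import chain, combinations

def count_subsets(items):
    """Count every subset of items by enumerating them: 2^N subsets, hence O(2^N)."""
    items = list(items)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return sum(1 for _ in subsets)

print(count_subsets(range(5)))   # 2^5 = 32
print(count_subsets(range(20)))  # 2^20 = 1,048,576: already a million subsets at N = 20
```

Each additional item doubles the count, which is why exponential algorithms fail long before the input looks large.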

The Scale That Makes It Visible

N=10 comparisons:
  O(N):   10
  O(N²):  100
  ratio:  10x

N=1,000 comparisons:
  O(N):   1,000
  O(N²):  1,000,000
  ratio:  1,000x

N=10,000 comparisons:
  O(N):   10,000
  O(N²):  100,000,000
  ratio:  10,000x

At N=10, the difference barely registers. At N=10,000, the O(N²) algorithm runs 10,000 times slower. This is why MOAD-0001 stayed invisible for decades: the graphs it ran on in 1993 had 50 nodes. By 2020, the same code ran on dependency graphs with 50,000 nodes.
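The comparison above can be regenerated in a few lines (idealized operation counts, ignoring constant factors):

```python
# Reproduce the O(N) vs O(N^2) comparison at three scales.
for n in (10, 1_000, 10_000):
    linear = n
    quadratic = n * n
    print(f"N={n:>6,}  O(N)={linear:>7,}  O(N^2)={quadratic:>12,}  ratio={quadratic // linear:,}x")
```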

Classify Three Operations

Apply the complexity classes to concrete operations. The key question for each: how many operations does it take as N grows?

For each operation below, give the Big O complexity & explain in one sentence why:
(a) Looking up a word in a dictionary by page number.
(b) Searching every page of a dictionary for a word.
(c) Sorting a shuffled deck of cards by trying every possible ordering.

Correct Code, Wrong Growth Curve

Correct code that runs in O(N²) time produces identical results to O(N) code. The tests pass. Output matches. No exception fires. The defect hides in the growth curve.

This property makes O(N²) defects sedimentary: they calcify inside working code & only become visible when N grows past a threshold. The code did not change. The world around it changed.

MOAD-0001: The Sedimentary Defect

The most widespread instance: visited = [] inside a graph traversal loop.

visited = []
for node in graph:
    if node not in visited:  # O(N) scan each call
        visited.append(node)
        process(node)

Each not in visited call scans up to len(visited) items. N calls × N/2 items scanned on average = O(N²) total. The fix: one line.

visited = set()  # O(1) membership test
for node in graph:
    if node not in visited:  # O(1) hash lookup
        visited.add(node)
        process(node)

Same behavior. Same output. Same tests passing. The growth rate changes from O(N²) to O(N). At N=1,000: 1,000× faster. At N=10,000: 10,000× faster.
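The hidden scans can be made visible with a small counting wrapper (instrumentation written for this illustration, not part of the original code):

```python
class CountingList(list):
    """A list whose `in` operator counts every element it scans."""
    scans = 0

    def __contains__(self, item):
        for x in self:
            CountingList.scans += 1
            if x == item:
                return True
        return False

# 1,000 distinct nodes, each encountered twice (like revisits in a graph).
nodes = list(range(1_000)) * 2
visited = CountingList()
for node in nodes:
    if node not in visited:
        visited.append(node)

# 2,000 membership tests trigger 1,000,000 element scans: O(N^2) in action.
print(f"{CountingList.scans:,}")
```

Swapping `CountingList()` for a plain `set()` drops the work to one hash probe per membership test, about 2,000 operations total.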

Why Hamming's Era Seeded This

In the early mainstream languages, the dynamic array (Java's ArrayList, Python's list, C's plain array) was the default sequential container. Sets existed but were not idiomatic; developers reached for what was familiar. A 1993 graph traversal at N=50 ran in microseconds with visited = []. That code entered version control, got copied, got wrapped in libraries, got shipped in package managers. By 2020, the same pattern ran on dependency graphs with 50,000 nodes.

The code was correct in 1993. It became expensive when the world around it changed. Hamming's era seeded this class of defect by never naming the growth-rate question.

Diagnose & Fix

Apply the diagnosis: find the O(N²) pattern, identify the fix.

You find this code in a production codebase: `if item not in visited_list: visited_list.append(item)` inside a loop over 50,000 items. How many operations does the membership test perform on average across the full loop, & what would you replace `visited_list` with to fix it?

What Naming Changes

Before Big O had a name, programmers noticed 'this runs slow.' They profiled. They rewrote. But they could not teach the pattern — they could only point at instances. A pattern without a name cannot propagate.

After Big O had a name, a senior engineer could say 'this is O(N²), replace it with a set' & a junior engineer could understand, apply, & teach the pattern in turn. The name made the pattern a transferable unit of knowledge.

Hamming knew this dynamic in other domains. He described how naming 'spaghetti code' in the 1960s allowed the community to recognize, discuss, & eventually eradicate a class of defect that everyone had experienced but no one had named. The same mechanism applies to complexity classes.

Unhamming adds the vocabulary Hamming omitted: O(1), O(log N), O(N), O(N log N), O(N²), O(2^N). These names let you see the growth curve in code you read, predict performance at scale, & communicate fixes precisely. They satisfy Hamming's own test for a fundamental: durable, & generative of most of the rest of the field.

From Number Theory to Computer Science

Big O notation did not originate in computer science. It originated in pure mathematics — specifically in number theory — 74 years before Donald Knuth adapted it for algorithms.

Paul Bachmann, 1894

Paul Bachmann, a German mathematician, introduced the O notation in his 1894 book Die Analytische Zahlentheorie (Analytic Number Theory). He needed a compact way to express that one quantity grows no faster than another, up to a constant factor. Writing 'f(n) = O(g(n))' said: f grows at most as fast as g. This let number theorists reason about error terms in approximations without tracking exact constants.

Bachmann was not thinking about loops, data structures, or execution time. He was thinking about how prime numbers distribute — specifically about the error terms in the Prime Number Theorem.

Edmund Landau, 1909

Edmund Landau, another German mathematician, popularized the notation in his 1909 Handbuch der Lehre von der Verteilung der Primzahlen (Handbook on the Distribution of Prime Numbers). Landau also introduced the related little-o notation and used the asymptotic symbols so extensively that the family became known as Bachmann-Landau notation, or simply Landau notation.

For six decades, O notation lived entirely in mathematics. No programmer thought about it.

Donald Knuth, 1968 & 1976

Donald Knuth translated the notation into computer science. In The Art of Computer Programming (Vol. 1, 1968), Knuth used O notation to describe the running time of algorithms — a direct transplant of Bachmann's tool into a new domain. He was the first to apply it systematically to algorithm analysis.

In 1976, Knuth published a short paper in SIGACT News titled 'Big Omicron and Big Omega and Big Theta.' He introduced Ω (Big Omega) for lower bounds and Θ (Big Theta) for tight bounds, completing the three-symbol vocabulary that computer science uses today. He also argued for standardizing the use of these symbols across the field — an act of deliberate standardization that accelerated adoption.

By the early 1980s, Big O appeared in every algorithms textbook. By the 1990s, it was standard interview vocabulary.

A 74-Year Journey

1894 — Bachmann introduces O in number theory
1909 — Landau popularizes O, o, and the full notation family
1953 — Hamming begins research at Bell Labs (never learns Big O as CS tool)
1968 — Knuth applies O to algorithm analysis in TAOCP Vol. 1
1976 — Knuth adds Omega and Theta; argues for standardization
1980s — Big O enters all CS curricula
1993 — MOAD-0001 sediment layer forms in production code
1995 — Hamming teaches final version of his course; never covers Big O
2026 — MOAD-0001 found in 1,000+ open-source projects

Bachmann's tool spent 74 years in pure mathematics, visible to every mathematician but invisible to every programmer. Knuth saw the transplant. Hamming — working in the same era, collaborating with the same computing community — never made it part of his course.

Why Knuth's Standardization Mattered

Knuth's 1976 paper did something beyond introducing Omega and Theta. It argued that the field needed consistent notation, and that inconsistent notation was actively harmful. Different textbooks used O to mean different things — sometimes as an upper bound, sometimes as an approximation, sometimes without specifying whether constants mattered. Knuth cleaned this up.

This is Hamming's pattern-naming dynamic applied to notation itself. Before Knuth standardized the symbols, engineers could not communicate complexity claims precisely across textbooks, papers, or teams. After standardization, 'this runs in O(N log N)' carried exactly one meaning regardless of who said it.

Knuth also contributed the informal translation: 'O(f(n)) means the running time is at most a constant times f(n), for all sufficiently large n.' This interpretation — upper bound, up to a constant factor, for large inputs — became the standard intuition every programmer learns.
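In the standard formal statement, that intuition reads:

```latex
f(n) = O\bigl(g(n)\bigr)
\iff
\exists\, c > 0,\ \exists\, n_0:\ \forall n \ge n_0,\quad |f(n)| \le c \cdot g(n)
```

Ω flips the inequality to a lower bound, and Θ requires both bounds at once.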

Bachmann invented the O notation for number theory in 1894. Knuth adopted it for algorithm analysis in 1968. What had to change — in computing, in scale, or in how programmers worked — for a pure mathematician's tool to become essential vocabulary for software engineers? Give at least two factors.