
Grammar as Graph Structure

A compiler translates source code by building a parse tree — a rooted tree in which each internal node represents the application of a grammar rule, & each leaf represents a terminal symbol (variable name, literal, operator).

Tree Geometry

A tree with n nodes has n−1 edges. Depth d is the maximum distance from the root to any leaf. A balanced binary tree of depth d has up to 2^d leaves & 2^(d+1)−1 total nodes.
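These closed forms can be checked by building each level explicitly. A minimal sketch (the function name is illustrative):

```python
# Count leaves and total nodes of a perfect binary tree of depth d,
# level by level, and compare against the closed-form formulas.

def perfect_tree_counts(d):
    """Return (leaves, total_nodes) for a perfect binary tree of depth d."""
    leaves = 1          # a lone root (depth 0) is a single leaf
    total = 1
    for _ in range(d):
        leaves *= 2     # each level doubles the leaf count
        total += leaves # and adds that many new nodes
    return leaves, total

for d in range(6):
    leaves, total = perfect_tree_counts(d)
    assert leaves == 2 ** d
    assert total == 2 ** (d + 1) - 1
```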

Parse trees for typical expressions are not balanced: operator precedence creates right-skewed or left-skewed trees. A + B * C produces a tree in which * sits deeper than +, encoding that * binds more tightly.

BNF & ALGOL

Backus-Naur Form (BNF), invented to specify ALGOL, formalizes grammar as production rules:

```
<expr>   ::= <term> | <expr> + <term>
<term>   ::= <factor> | <term> * <factor>
<factor> ::= id | ( <expr> )
```

Each production rule defines one level of the tree. The grammar is a directed graph on non-terminal symbols; parsing a sentence traces a path through that graph from start symbol to leaves.

Software Geometry: Complexity & Redundancy

Parse Tree Depth & Expression Complexity

For the expression (A + B) * (C + D), the parse tree has a specific structure under standard operator precedence.

The depth of a parse tree affects compiler performance: deeper trees require more stack frames during recursive descent parsing.

Draw or describe the parse tree for `(A + B) * (C + D)` using the grammar above. How many internal nodes does it have? What is the tree depth (root = depth 0)? Then: a recursive-descent parser uses O(depth) stack space. For a fully nested expression of n operators, whose tree depth grows proportionally to n, what is the Big-O stack space?
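One way to check your counts is to build the tree explicitly. A minimal sketch using a simplified operator tree — the single-child chain nodes the grammar produces (such as <expr> → <term> → <factor>) are collapsed, so your counts under the full grammar will be larger:

```python
# Simplified operator tree for (A + B) * (C + D): only operators and
# identifiers appear; grammar chain nodes are collapsed.

class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = children

tree = Node('*',
            Node('+', Node('A'), Node('B')),
            Node('+', Node('C'), Node('D')))

def depth(node):
    # Root is depth 0; each edge adds 1.
    if not node.children:
        return 0
    return 1 + max(depth(c) for c in node.children)

def internal_nodes(node):
    if not node.children:
        return 0
    return 1 + sum(internal_nodes(c) for c in node.children)

print(depth(tree))           # 2
print(internal_nodes(tree))  # 3
```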

Shannon Entropy & Redundancy

Hamming noted that spoken language is ~60% redundant, written language ~40%. This has a precise information-theoretic meaning.

Shannon Entropy

For a source generating symbols from alphabet A with probabilities {p₁, p₂, ..., pₙ}:

H = −Σ pᵢ log₂(pᵢ) (bits per symbol)

Maximum entropy: uniform distribution (all symbols equally likely). H_max = log₂(n) for n symbols.

Redundancy R: fraction of bits that carry no information (could be removed without losing content):

R = 1 − H / H_max

If R = 0.4 (40% redundant): 40% of bits can be predicted from context. A channel carrying English text at 8 bits/character is using only 60% of its capacity for information; 40% is structure the receiver already knows.
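The definitions of H and R translate directly into code. A minimal sketch using a made-up four-symbol source (the distributions are illustrative, not measurements of any real language):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(probs):
    """R = 1 - H / H_max, where H_max = log2(alphabet size)."""
    return 1 - entropy(probs) / math.log2(len(probs))

uniform = [0.25] * 4            # maximum entropy: H = 2 bits, R = 0
skewed  = [0.7, 0.1, 0.1, 0.1]  # predictable source: H < 2, R > 0

print(entropy(uniform))      # 2.0
print(redundancy(uniform))   # 0.0
print(redundancy(skewed))    # positive: some bits are predictable
```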

Error detection uses redundancy: if 40% of bits are predictable, a transmission error may produce a sequence that violates the redundancy structure — detectable even without error-correcting codes.

APL sits near zero redundancy: a single-character change is almost never detectable from context alone. English's high redundancy means you can often reconstruct a word from surrounding context even if a letter is missing or wrong.

Computing Redundancy

English letter frequency (approximate): 'e' appears ~12.7% of the time; 'z' appears ~0.07%. The actual entropy of English is approximately H ≈ 1.0 bit/character (accounting for word-level context). Maximum entropy for 26-letter alphabet: H_max = log₂(26) ≈ 4.70 bits/character.

Compute redundancy R = 1 − H/H_max for English using H = 1.0 bit/char and H_max = log₂(26). Then compute R for a language with H = 3.5 bits/char and the same 26-symbol alphabet. Which language has more error-detection capacity, and by what factor?
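A few lines suffice to check your arithmetic (the variable names are illustrative):

```python
import math

H_MAX = math.log2(26)        # ≈ 4.70 bits/char for a 26-letter alphabet

def redundancy(h, h_max=H_MAX):
    """R = 1 - H / H_max."""
    return 1 - h / h_max

r_english = redundancy(1.0)  # English, H = 1.0 bit/char
r_dense   = redundancy(3.5)  # hypothetical denser language

print(round(r_english, 3))             # ≈ 0.787
print(round(r_dense, 3))               # ≈ 0.255
print(round(r_english / r_dense, 2))   # ratio of redundancies
```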

Growth Curves as Geometry

Compiler algorithms process programs of size n (tokens, lines, or nodes). The choice of algorithm determines how compile time scales with program size.

Common Complexity Classes

| Class | Runtime | Example |
|---|---|---|
| O(n) | linear | lexical scanning |
| O(n log n) | quasi-linear | optimal comparison sorting |
| O(n²) | quadratic | naive duplicate detection |
| O(n³) | cubic | matrix multiply, CYK parsing |
| O(2ⁿ) | exponential | naive theorem proving |

The geometric picture: plot runtime vs n. O(n) is a line. O(n²) is a parabola. O(2ⁿ) is an exponential curve that looks flat near n=0 and nearly vertical near n=50.

General context-free grammar parsing (CYK algorithm) runs in O(n³). For most practical programming languages, the grammar is designed to be LR(1)-parseable, allowing O(n) parsing. ALGOL's grammar was more complex than FORTRAN's, contributing to slower compilers — a practical friction that mattered in 1958 when compiling took hours.

Crossover Points

A naive symbol-table lookup uses O(n²) total operations for a program of n identifiers (linear scan for each of n lookups). A hash-table symbol table uses O(n) expected total operations.

Suppose: O(n²) algorithm runs at 10⁶ operations/sec on 1958 hardware. O(n) algorithm runs at the same speed but requires 100n setup operations (hash table construction).

For a program with n = 1000 identifiers: compute total time for both algorithms (in seconds). At what value of n do the two algorithms take equal time? Show the algebra.
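A quick numeric check, under the reading that the hash-table total is n lookup operations plus the 100n setup (101n in all) — adjust if you model the setup differently:

```python
OPS_PER_SEC = 1e6   # 1958-era machine, per the problem statement

def t_quadratic(n):
    """Linear-scan symbol table: n^2 total operations."""
    return n * n / OPS_PER_SEC

def t_hash(n):
    """Hash-table symbol table: 100n setup + n expected lookups."""
    return (100 * n + n) / OPS_PER_SEC

print(t_quadratic(1000))  # 1.0 second
print(t_hash(1000))       # 0.101 seconds

# Crossover: n^2 = 101n  =>  n = 101
assert t_quadratic(101) == t_hash(101)
```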