un — Geometry of CS Algorithms

un

guest

1 / ?

back to lessons

Every Complexity Class Draws a Curve

Plot Cost Against Input Size

Put input size N on the x-axis. Put cost (operations, time) on the y-axis. Each complexity class produces a distinct curve. You can recognize an algorithm's growth rate from the shape of its performance curve.

Complexity Growth Shapes

O(1) — flat horizontal line. The function f(N) = 1. No slope. Cost never changes regardless of N. Hash table lookup: whether the table holds 10 or 10,000,000 elements, one hash computation jumps to the right bucket.

O(log N) — gentle upward curve, slope decreasing. At N = 1,000,000: cost ≈ 20 operations. The curve rises steeply at small N, then flattens. Each doubling of N adds exactly one unit of cost.

O(N) — diagonal straight line. Slope = 1 (in the asymptotic sense). Cost grows at the same rate as N. If N triples, cost triples.

O(N log N) — steeper diagonal with slight upward curvature. Sits above O(N) but below O(N²). The log factor adds a slowly growing multiplier to the linear slope.

O(N²) — parabola opening upward. Slope increases as N grows. At N = 10: cost = 100. At N = 100: cost = 10,000. At N = 1,000: cost = 1,000,000.

The Growing Gap

The ratio O(N²) / O(N) = N. The gap between the parabola and the diagonal widens as N grows. At N = 10: 10× gap. At N = 100: 100× gap. At N = 1,000: 1,000× gap. At N = 50,000: 50,000× gap. The gap equals N — always.

Calculating the Scale Gap

A large dependency graph contains 50,000 packages (N = 50,000). One algorithm runs in O(N) time. A second runs in O(N²). At this N, the ratio O(N²)/O(N) = N = 50,000.

If the O(N) algorithm takes 10 seconds at N = 50,000, how long does the O(N²) algorithm take? Express your answer in hours. Show the calculation.

Bisecting a Line Segment

Binary Search as Repeated Bisection

A sorted array of N elements forms a line segment of length N. Binary search repeatedly bisects this segment:

1. Point to the midpoint of the remaining segment.

2. If target < midpoint: the right half disappears. Recurse on the left half.

3. If target > midpoint: the left half disappears. Recurse on the right half.

4. If target = midpoint: done.

Binary Search Halving

After k bisections, the remaining segment has length N / 2^k. The search ends when that length equals 1:

N / 2^k = 1 → 2^k = N → k = log₂N

So binary search on N elements requires at most log₂N comparisons.

Halving & Doubling: Two Sides of the Same Curve

Halving and doubling are geometric inverses. The exponential curve 2^k and the logarithmic curve log₂N are reflections of each other across the line y = x.

Consider paper folding: a sheet starts at 0.1 mm thick. Each fold doubles thickness. After 42 folds: 0.1 mm × 2^42 ≈ 439,804 km — past the moon (384,400 km). Binary search runs the inverse: start at N elements, each step halves the count, reach 1 element in log₂N steps.

The geometry: bisection is a self-similar operation. Each bisection produces two halves that look identical in structure to the whole. Recursion and logarithms share this self-similarity.

Comparisons & Doublings

A sorted array contains 1,048,576 elements. Note: 1,048,576 = 2^20.

Binary search finds the target in at most how many comparisons? Show the logarithm calculation. Then describe what changes geometrically if the array doubles to 2,097,152 elements (2^21).

A Hash Function as a Geometric Map

Hash Function as Function

A hash table maps a key to a bucket using a hash function:

bucket = hash(key) mod table_size

This is a function in the strict mathematical sense: it maps a domain (all possible keys) to a range (bucket indices 0 to table_size − 1). The geometric picture: keys occupy one space; buckets occupy another. The hash function projects keys onto bucket indices.

Hash Table Geometry

O(1) lookup — direct jump, no search. The hash function computes the bucket index in constant time. Jump directly to that bucket. No traversal, no comparison loop.

Load factor. At load factor 0.75, 75% of buckets contain at least one element. Above ~0.9, collisions increase: two keys hash to the same bucket, forming a chain of elements inside that bucket. Lookup in a long chain degrades to O(N) in the worst case.

Distribution Quality as Geometry

A well-distributed hash function spreads keys uniformly across all buckets. Geometrically: the function's range covers its codomain evenly. Every bucket receives approximately N / table_size keys.

A poor hash function clusters keys into a few buckets. Geometrically: the function's range collapses to a subset of the codomain — most keys map to the same few points. The remaining buckets sit empty.

Connection to MOAD-0001

Replacing a list with a set fixes MOAD-0001 because a set uses a hash table internally. O(N) list scan → O(1) hash table lookup. Geometrically: linear traversal along a sequence transforms into a direct projection via a function. The sequence disappears; the function replaces it.

Analyzing a Poorly Distributed Hash

A hash table has 1,000 buckets and 900 elements (load factor 0.9). A poor hash function sends 500 of those elements to the same bucket.

Analyze the performance impact: (a) What is the average lookup time for elements in the crowded bucket? (b) What is the average lookup time across all 900 elements? (c) How does this explain why choosing a good hash function matters as much as choosing a hash table?