Practice & the Power Law
Across a remarkable range of skills — typing, reading, solving arithmetic problems, assembling equipment — performance improves according to a power law:
y = a · x^(−b)
where y = errors per trial (or time per trial), x = cumulative practice trials, a = initial performance level, b = learning rate exponent (b > 0 for improvement).
The power law has a clean property: in log-log space, it becomes a straight line.
ln y = ln a − b · ln x
Slope of the line in log-log space: −b. Steeper slope = faster learning. The same exponent b describes the learning rate regardless of the initial performance level a.
Why log-log? Early practice produces large gains; later practice produces diminishing returns. A linear plot shows a dramatic initial drop then a flat tail. Log-log reveals the self-similar structure: each doubling of practice reduces errors by the same fraction 2^(−b).
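The self-similarity is easy to verify numerically. A minimal sketch (the values a = 100 and b = 0.333 are illustrative, not empirical constants):

```python
def errors(trial, a=100.0, b=0.333):
    """Power-law performance: errors on a given practice trial.
    a and b are illustrative values, not fitted constants."""
    return a * trial ** (-b)

# Each doubling of practice cuts errors by the same fraction 2**(-b),
# no matter where on the curve you start.
ratios = [errors(2 * x) / errors(x) for x in (1, 4, 16, 64)]
print(ratios)  # every ratio equals 2**(-0.333), about 0.794
```

The constant ratio is exactly the 2^(−b) factor described above: (2x)^(−b) / x^(−b) = 2^(−b), independent of x.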
Computing Learning Rate
If a learner makes 100 errors on trial 1 and 50 errors on trial 8, what is b?
y₁ = a · 1^(−b) = a = 100
y₈ = a · 8^(−b) = 100 · 8^(−b) = 50
8^(−b) = 0.5 → −b · ln(8) = ln(0.5) = −0.693 → b = 0.693 / ln(8) = 0.693 / 2.079 ≈ 0.333
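The same algebra generalizes to any two observations: taking the ratio of y = a · x^(−b) at two trial counts cancels a, leaving b = ln(y₁/y₂) / ln(x₂/x₁). A small sketch of that calculation:

```python
import math

def learning_exponent(y1, x1, y2, x2):
    """Solve y = a * x**(-b) for b given two (trial, errors) observations.
    Dividing the two equations cancels a: b = ln(y1/y2) / ln(x2/x1)."""
    return math.log(y1 / y2) / math.log(x2 / x1)

b = learning_exponent(100, 1, 50, 8)
print(round(b, 3))  # 0.333, matching the worked example
```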
Ebbinghaus & Exponential Forgetting
Hermann Ebbinghaus (1885) measured his own retention of nonsense syllables over time and found that retention follows an exponential decay:
r(t) = e^(−t/S)
where r(t) = fraction retained at time t, S = memory strength (increases with each review). At t = 0: r = 1 (100% retained). At t = S: r = 1/e ≈ 37%.
The spacing effect: reviewing material once retention has decayed appreciably (when r ≈ 0.8 or lower) produces a larger increase in S than reviewing immediately after learning, when the material is still fresh.
Optimal review timing: if S grows by a fixed factor k with each review, the optimal intervals form a geometric sequence. After learning with S₀, review at times S₀, k·S₀, k²·S₀, .... Each interval is k times longer than the previous.
Typical k values from empirical data: 2.0–2.5. A student who reviews at days 1, 2, 4, 8, 16 follows this geometric spacing pattern.
Computing Optimal Review Intervals
A student learns material with initial memory strength S₀ = 2 days. Each review multiplies S by k = 2.5. The student reviews just before retention drops to 80% (r ≥ 0.80 threshold).
At the threshold: e^(−t/S) = 0.80, so t = −S · ln(0.80) ≈ 0.223 · S. With S₀ = 2 days, the first review falls at 0.223 · 2 ≈ 0.45 days. That review raises S to 2.5 · 2 = 5 days, so the second interval is ≈ 1.12 days; then S = 12.5 days and the third interval is ≈ 2.79 days. Each interval is k = 2.5 times the previous, the geometric pattern described above.
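The schedule above can be generated mechanically. A minimal sketch using the example's parameters (S₀ = 2 days, k = 2.5, threshold r = 0.80):

```python
import math

def review_schedule(s0, k, threshold, n_reviews):
    """Intervals (measured from the previous review) at which retention
    e^(-t/S) first drops to the threshold. Parameters follow the worked
    example; the values are illustrative."""
    intervals, strength = [], s0
    for _ in range(n_reviews):
        intervals.append(-strength * math.log(threshold))  # t = -S * ln(r)
        strength *= k  # each review multiplies memory strength by k
    return intervals

print([round(t, 2) for t in review_schedule(2.0, 2.5, 0.80, 4)])
# [0.45, 1.12, 2.79, 6.97] days; each interval is 2.5x the previous
```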
The Curriculum as a Graph
A branching program defines a directed graph G = (V, E) where:
- Vertices V: instructional nodes (content blocks, questions, feedback)
- Edges E: transitions labeled by student response classifications (correct, partial, incorrect, clarification)
Each student traces a path through G from an entry vertex to an exit vertex. The path depends entirely on which edges activate at each step.
Properties the graph structure determines:
1. Reachability: can every vertex be reached from the entry? An unreachable vertex is dead content — the student can never see it.
2. Cycle detection: does the graph contain cycles? A cycle means a student can loop indefinitely. Adaptive programs use cycles deliberately (retry loops) but must guarantee eventual exit (a max-attempts edge that forces progress).
3. Path length distribution: how many steps does the typical student take? A good branching program lets advanced students take short paths; struggling students take longer remedial paths.
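Reachability and cycle detection are standard graph traversals. A sketch on a small hypothetical branching program (the node names and adjacency list here are invented for illustration; edge labels are omitted):

```python
from collections import deque

# Hypothetical branching program as an adjacency list.
graph = {
    "entry": ["Q1"],
    "Q1": ["Q2", "R1"],
    "R1": ["Q1"],       # deliberate retry loop (cycle)
    "Q2": ["exit"],
    "R9": ["exit"],     # dead content: no edge points here
    "exit": [],
}

def reachable(g, start):
    """BFS: all vertices reachable from the entry vertex."""
    seen, frontier = {start}, deque([start])
    while frontier:
        for nxt in g[frontier.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def has_cycle(g):
    """DFS with three-color marking: a gray (in-progress) vertex
    seen again means a cycle exists."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in g}
    def dfs(v):
        color[v] = GRAY
        for w in g[v]:
            if color[w] == GRAY or (color[w] == WHITE and dfs(w)):
                return True
        color[v] = BLACK
        return False
    return any(color[v] == WHITE and dfs(v) for v in g)

dead = set(graph) - reachable(graph, "entry")
print(dead)              # {'R9'}: unreachable, dead content
print(has_cycle(graph))  # True: the Q1/R1 retry loop
```

The cycle here is intentional (a retry loop); the analysis flags it so the author can confirm a max-attempts edge guarantees eventual exit.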
Analyzing a Branching Program's Properties
Consider a branching program with 5 question nodes (Q1–Q5) and 3 remedial nodes (R1–R3). An advanced student path: Q1 → Q2 → Q3 → Q4 → Q5. A struggling student path: Q1 → R1 → Q1 → Q2 → R2 → Q2 → Q3 → Q4 → Q5.
The graph guarantees progress via max-attempts edges: after 3 failed attempts at any Qn, the student advances to Qn+1 regardless of performance.
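A small simulation makes the guarantee concrete. This sketch assumes a hypothetical `answers(q, attempt)` callback classifying each response as correct or not; remedial routing and the max-attempts edge follow the description above:

```python
def run_student(answers, questions=5, max_attempts=3):
    """Trace a student's path: a wrong answer at Qn routes through Rn and
    back to Qn; after max_attempts failures the student advances anyway
    (the max-attempts edge). `answers(q, attempt)` is a hypothetical
    callback returning True for a correct response."""
    path = []
    for q in range(1, questions + 1):
        for attempt in range(1, max_attempts + 1):
            path.append(f"Q{q}")
            if answers(q, attempt):
                break
            if attempt < max_attempts:
                path.append(f"R{q}")  # remedial detour, then retry
        # after max_attempts, advance regardless of performance
    return path

# Advanced student: always correct, shortest path Q1..Q5
print(run_student(lambda q, a: True))
# Student who fails Q2 once: Q1, Q2, R2, Q2, Q3, Q4, Q5
print(run_student(lambda q, a: not (q == 2 and a == 1)))
```

Because the inner loop is bounded by max_attempts, every path through the simulated graph terminates, which is the progress guarantee the max-attempts edges provide.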