
Practice & the Power Law

Across a remarkable range of skills — typing, reading, solving arithmetic problems, assembling equipment — performance improves according to a power law:

y = a · x^(−b)

where y = errors per trial (or time per trial), x = cumulative practice trials, a = initial performance level, b = learning rate exponent (b > 0 for improvement).

The power law has a clean property: in log-log space, it becomes a straight line.

ln y = ln a − b · ln x

Slope of the line in log-log space: −b. Steeper slope = faster learning. The same exponent b describes the learning rate regardless of the initial performance level a.
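This straight-line relationship makes the exponent easy to estimate: a least-squares fit in log-log space recovers b as the negative of the slope. A minimal sketch (the function name fit_power_law and the synthetic data are illustrative, not from the lesson):

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of ln y = ln a - b * ln x; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    return math.exp(my - slope * mx), -slope  # a from the intercept, b = -slope

# Synthetic data following y = 100 * x^(-0.5) exactly:
xs = [1, 2, 4, 8, 16]
ys = [100.0 * x ** (-0.5) for x in xs]
a, b = fit_power_law(xs, ys)  # recovers a close to 100, b close to 0.5
```

With noisy real data the same fit gives a best-estimate b rather than an exact recovery.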

Why log-log? Early practice produces large gains; later practice produces diminishing returns. A linear plot shows a dramatic initial drop then a flat tail. Log-log reveals the self-similar structure: each doubling of practice multiplies the error rate by the same factor 2^(−b).
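The doubling claim can be checked directly: the ratio y(2x)/y(x) equals 2^(−b) no matter where x starts. A small sketch (errors is a hypothetical helper with illustrative parameter values):

```python
def errors(x, a=100.0, b=0.333):
    """Power-law error count after x cumulative practice trials."""
    return a * x ** (-b)

# Every doubling multiplies errors by the same factor, wherever it occurs:
early = errors(2) / errors(1)    # doubling from trial 1 to trial 2
late = errors(16) / errors(8)    # doubling from trial 8 to trial 16
# both ratios equal 2^(-b)
```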

Computing Learning Rate

If a learner makes 100 errors on trial 1 and 50 errors on trial 8, what is b?

y₁ = a · 1^(−b) = a = 100

y₈ = a · 8^(−b) = 100 · 8^(−b) = 50

8^(−b) = 0.5 → −b · ln(8) = ln(0.5) = −0.693 → b = 0.693 / ln(8) = 0.693 / 2.079 ≈ 0.333
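The same arithmetic in code (variable names are illustrative):

```python
import math

a = 100.0    # errors on trial 1
y8 = 50.0    # errors on trial 8
b = math.log(a / y8) / math.log(8)  # = ln(2) / ln(8) = 1/3
```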

A typist makes 80 errors per 100 words on day 1 and 20 errors on day 16. Assuming a power law y = a · x^(−b), find b. Show the algebraic steps. Then predict the error rate on day 64.

Ebbinghaus & Exponential Forgetting

Hermann Ebbinghaus (1885) measured his own retention of nonsense syllables over time and found that retention follows an exponential decay:

r(t) = e^(−t/S)

where r(t) = fraction retained at time t, S = memory strength (increases with each review). At t = 0: r = 1 (100% retained). At t = S: r = 1/e ≈ 37%.
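A minimal sketch of the retention curve (the retention helper is a name chosen here, not from Ebbinghaus), confirming the two anchor points above:

```python
import math

def retention(t, S):
    """Fraction retained after time t given memory strength S: r(t) = e^(-t/S)."""
    return math.exp(-t / S)

r_start = retention(0, 5)   # 1.0: everything retained at t = 0
r_at_S = retention(5, 5)    # 1/e, about 0.37, at t = S
```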

The spacing effect: reviewing material at the moment of near-forgetting (when r ≈ 0.8 or lower) produces a larger increase in S than reviewing immediately after learning.

Optimal review timing: if S grows by a fixed factor k with each review, the optimal intervals form a geometric sequence. After learning with S₀, review at times S₀, k·S₀, k²·S₀, .... Each interval is k times longer than the previous.

Typical k values from empirical data: 2.0–2.5. A student who reviews at days 1, 2, 4, 8, 16 follows this geometric spacing pattern.

Computing Optimal Review Intervals

A student learns material with initial memory strength S₀ = 2 days. Each review multiplies S by k = 2.5. The student reviews just before retention drops to 80% (r ≥ 0.80 threshold).

At the threshold: e^(−t/S) = 0.80, so t = −S · ln(0.80) ≈ S · 0.223.

Compute the first four review times using the spacing model above: S₀ = 2, k = 2.5, review at t = 0.223 · S_n after each review. Round to one decimal place. Then find the total calendar time elapsed at the fourth review.
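For checking answers, the model can be sketched as follows (review_schedule is a hypothetical helper; it treats 0.223 · S_n as the interval after each review and accumulates calendar time):

```python
import math

def review_schedule(S0, k, r_threshold, n_reviews):
    """Cumulative review times: each review occurs -S * ln(r_threshold) after
    the previous one, and each review multiplies memory strength S by k."""
    S, t, times = S0, 0.0, []
    for _ in range(n_reviews):
        t += -S * math.log(r_threshold)  # roughly 0.223 * S when r_threshold = 0.80
        times.append(t)
        S *= k
    return times

times = review_schedule(2.0, 2.5, 0.80, 4)
```

Note how the intervals themselves grow geometrically by the factor k, matching the spacing model above.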

The Curriculum as a Graph

A branching program defines a directed graph G = (V, E) where:

- Vertices V: instructional nodes (content blocks, questions, feedback)

- Edges E: transitions labeled by student response classifications (correct, partial, incorrect, clarification)

Each student traces a path through G from an entry vertex to an exit vertex. The path depends entirely on which edges activate at each step.

Properties the graph structure determines:

1. Reachability: can every vertex be reached from the entry? An unreachable vertex is dead content — the student can never see it.

2. Cycle detection: does the graph contain cycles? A cycle means a student can loop indefinitely. Adaptive programs use cycles deliberately (retry loops) but must guarantee eventual exit (a max-attempts edge that forces progress).

3. Path length distribution: how many steps does the typical student take? A good branching program lets advanced students take short paths; struggling students take longer remedial paths.
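The first two properties are standard graph checks: breadth-first search answers reachability, and depth-first search with coloring detects cycles. A sketch in Python (the lesson graph and its node names are illustrative, not from the text):

```python
from collections import deque

def reachable(graph, entry):
    """Breadth-first search: the set of vertices reachable from the entry."""
    seen, queue = {entry}, deque([entry])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def has_cycle(graph):
    """Depth-first search with white/gray/black coloring: detect a directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph.get(v, []):
            if color.get(w, WHITE) == GRAY:      # back edge: cycle found
                return True
            if color.get(w, WHITE) == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in graph)

# Illustrative lesson graph: a deliberate retry loop through R1,
# plus an unreachable remedial node R3 (dead content).
lesson = {
    "Q1": ["Q2", "R1"],   # correct -> Q2, incorrect -> R1
    "R1": ["Q1"],         # remediation loops back to the question
    "Q2": ["Q3"],
    "Q3": [],
    "R3": ["Q3"],         # dead content: no edge leads here
}
```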

Analyzing a Branching Program's Properties

Consider a branching program with 5 question nodes (Q1–Q5) and 3 remedial nodes (R1–R3). An advanced student path: Q1 → Q2 → Q3 → Q4 → Q5. A struggling student path: Q1 → R1 → Q1 → Q2 → R2 → Q2 → Q3 → Q4 → Q5.

The graph guarantees progress via max-attempts edges: after 3 failed attempts at any Qn, the student advances to Qn+1 regardless of performance.
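The max-attempts guarantee can be simulated: even a student who never answers correctly visits each question at most max_attempts times and then advances. A hypothetical sketch (run_lesson and its parameters are names chosen here, not from the lesson):

```python
def run_lesson(questions, is_correct, max_attempts=3):
    """Walk the question sequence; retry each question up to max_attempts,
    then advance regardless (the max-attempts edge forces progress)."""
    path = []
    for q in questions:
        for attempt in range(1, max_attempts + 1):
            path.append(q)
            if is_correct(q, attempt):
                break
    return path

# A student who answers every question incorrectly still finishes:
worst_case = run_lesson(["Q1", "Q2", "Q3", "Q4", "Q5"], lambda q, a: False)
# An advanced student who is always correct takes the shortest path:
best_case = run_lesson(["Q1", "Q2", "Q3", "Q4", "Q5"], lambda q, a: True)
```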

In the branching program above, what graph property guarantees that every student eventually finishes the lesson, even if they answer every question incorrectly? Name the property, describe how it is implemented via max-attempts edges, and explain why a branching program without this property could trap a student permanently.