Value Space vs Measurement Space
Model the world as two spaces:
Value space V: the set of states of the world with respect to what you actually care about. Points in V represent different levels of the true underlying quantity (student learning, military progress, economic wellbeing).
Measurement space M: the set of values the metric can take. A metric is a function f: V → M — a mapping from value space to measurement space.
A valid metric is one where f is close to an isometry in the relevant region: equal changes in M correspond to equal changes in V. Nearby points in M correspond to nearby points in V.
A distorted metric is one where f is non-isometric: the metric compresses some regions of V (making large changes invisible) and expands others (making small changes appear large). The IQ calibration is a designed distortion: it maps the raw score distribution to a Gaussian in M, regardless of the true distribution of intelligence in V.
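The IQ-style calibration can be sketched as a quantile map: each raw score is replaced by the Gaussian value at the same percentile rank. The raw distribution below is hypothetical; the point is that equal gaps in raw score do not map to equal gaps in the calibrated score — the tail of V gets compressed in M.

```python
from statistics import NormalDist

def iq_calibrate(raw_scores, mean=100, sd=15):
    """Map raw scores to an IQ-style Gaussian scale by percentile rank.

    Whatever the true distribution of raw scores, the output is forced
    toward N(mean, sd): a designed distortion of value space.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    gauss = NormalDist(mean, sd)
    out = [0.0] * n
    for rank, i in enumerate(order):
        pct = (rank + 0.5) / n  # midpoint percentile, avoids 0 and 1
        out[i] = gauss.inv_cdf(pct)
    return out

# Hypothetical skewed raw scores with a long right tail.
raw = [10, 11, 12, 13, 14, 15, 20, 30, 50, 90]
iq = iq_calibrate(raw)
# The tiny raw gap 10->11 and the huge raw gap 50->90 land on
# identically sized IQ gaps: the tail has been compressed.
```

After calibration the mean sits at 100 by construction, and the spacing of scores reflects the Gaussian target, not the true distribution of the underlying quantity.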
Goodhart's law in mapping terms: when M becomes a target, agents apply gradient ascent in M. Because f is a distortion, gradient ascent in M does not correspond to gradient ascent in V. The agent moves in M without moving (or moving backward) in V.
Testing Metric Validity
A company evaluates employee performance on a 1-5 star scale. The scale is calibrated so that 80% of employees receive 3 or higher. The performance review system is used for both compensation decisions (where rank-order matters) and improvement plans (where absolute level matters).
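A sketch of the tension, using assumed (hypothetical) percentile cutoffs that put 80% of employees at 3 stars or above: rank order survives the mapping, so the scale works for compensation, but absolute level is destroyed, so it cannot drive improvement plans.

```python
def assign_stars(scores):
    """Assign 1-5 stars by percentile rank.

    Assumed cutoffs (hypothetical): bottom 10% -> 1, next 10% -> 2,
    next 40% -> 3, next 30% -> 4, top 10% -> 5, so 80% land at 3+.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    stars = [0] * n
    for rank, i in enumerate(order):
        pct = rank / n
        if pct < 0.10:   stars[i] = 1
        elif pct < 0.20: stars[i] = 2
        elif pct < 0.60: stars[i] = 3
        elif pct < 0.90: stars[i] = 4
        else:            stars[i] = 5
    return stars

# Two cohorts with very different absolute performance produce the
# identical star distribution: fine for rank-order pay decisions,
# useless for absolute-level improvement plans.
weak = list(range(10))            # true performance 0..9
strong = [x + 100 for x in weak]  # uniformly 100 points better
assert assign_stars(weak) == assign_stars(strong)
```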
Gradient Ascent in the Wrong Space
Model the optimization problem geometrically. Let V = value space (true student learning, military progress, etc.) and M = measurement space (test scores, body counts, etc.).
The gradient of true value: ∇_V(value) points in the direction in V that increases the underlying quantity you care about.
The gradient of the metric: ∇_M(metric) points in the direction in M that increases the metric.
Because f: V → M is not an isometry, the metric's gradient pulled back into value space, f*(∇_M), is not aligned with ∇_V. The angle between them, θ = arccos(∇_V · f*(∇_M) / (|∇_V| |f*(∇_M)|)), measures the severity of the Goodhart failure.
If θ = 0: the metric gradient and value gradient point the same direction. Optimizing the metric optimizes value. No Goodhart corruption.
If θ = 90°: the metric gradient is orthogonal to value. Optimizing the metric moves in M without moving in V at all.
If θ = 180°: the metric gradient points opposite to value. Optimizing the metric actively degrades value.
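The three regimes can be computed directly. The vectors below stand in for ∇_V and the pulled-back metric gradient f*(∇_M) expressed in a shared coordinate system (an assumption made for illustration):

```python
import math

def divergence_angle(grad_value, grad_metric):
    """Angle in degrees between the value gradient and the (pulled-back)
    metric gradient: 0 = aligned, 90 = orthogonal, 180 = opposed."""
    dot = sum(a * b for a, b in zip(grad_value, grad_metric))
    nv = math.sqrt(sum(a * a for a in grad_value))
    nm = math.sqrt(sum(b * b for b in grad_metric))
    cos_theta = max(-1.0, min(1.0, dot / (nv * nm)))  # clamp for float safety
    return math.degrees(math.acos(cos_theta))

divergence_angle((1, 0), (1, 0))   # 0: optimizing the metric optimizes value
divergence_angle((1, 0), (0, 1))   # 90: the metric moves, value does not
divergence_angle((1, 0), (-1, 0))  # 180: optimizing the metric destroys value
```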
When the metric becomes a target and agents apply gradient ascent on the metric, they follow f*(∇_M), not ∇_V. The divergence angle θ grows over time as the metric is gamed — the mapping f becomes less isometric as agents find the regions where ∇_M and ∇_V diverge most, because those are the most efficient paths for gaming.
Measuring the Divergence
Consider a simple two-dimensional value space V = (skill, compliance) where skill = student's actual understanding, compliance = student's ability to follow test-taking procedures.
A test metric M = 0.3 × skill + 0.7 × compliance: a specific linear combination in which compliance carries 70% of the weight.
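Because f is linear here, the pulled-back metric gradient in V is just the coefficient vector (0.3, 0.7). If the value gradient points purely along skill, ∇_V = (1, 0), the divergence angle works out to roughly 67° — most of the way to orthogonal:

```python
import math

grad_value = (1.0, 0.0)   # value = skill only
grad_metric = (0.3, 0.7)  # gradient of 0.3*skill + 0.7*compliance

dot = sum(a * b for a, b in zip(grad_value, grad_metric))
theta = math.degrees(math.acos(
    dot / (math.hypot(*grad_value) * math.hypot(*grad_metric))))
# theta is about 66.8 degrees: ascending the metric is mostly
# ascending compliance, not skill.
```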
Multi-Objective Optimization as Defense Against Goodhart
Hamming's defense: use multiple metrics simultaneously. The geometric interpretation: instead of maximizing a single objective function f(x), optimize over a vector of objectives F(x) = (f₁(x), f₂(x), ..., fₖ(x)).
For a vector objective, the solution concept is the Pareto frontier: the set of solutions where no objective can be improved without degrading another. The Pareto frontier replaces the single optimum.
Why this defends against Goodhart: to game the metrics, a rational agent must find a direction in value space that increases all fᵢ simultaneously (or at least the metrics they are being judged on). If the metrics are sufficiently independent — their gradient directions are sufficiently non-parallel — there is no such direction. Gaming one metric degrades another.
The degree of defense: linear independence of the metric gradients is not enough on its own — any two gradients that are not directly opposed still share a common ascent direction. By Gordan's theorem, no direction improves all metrics simultaneously exactly when the zero vector is a nontrivial convex combination of the gradient vectors, i.e. when the metrics pull in genuinely opposing directions. Full Pareto defense requires that no such common ascent direction — no gaming direction that improves all metrics — exists.
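A brute-force sketch of the no-gaming condition in two dimensions: sample unit directions in value space and ask whether any of them is an ascent direction for every metric gradient at once. Nearly parallel metrics leave a gaming corridor; opposed metrics close it.

```python
import math

def gaming_direction_exists(gradients, samples=3600):
    """Return True if some unit direction increases every metric at once.

    Brute-force search over 2D unit directions; no such direction exists
    exactly when 0 is a convex combination of the gradients (Gordan).
    """
    for k in range(samples):
        a = 2 * math.pi * k / samples
        d = (math.cos(a), math.sin(a))
        if all(d[0] * gx + d[1] * gy > 1e-9 for gx, gy in gradients):
            return True
    return False

# Nearly parallel metrics: easy to game both at once.
assert gaming_direction_exists([(1.0, 0.1), (1.0, -0.1)])
# Directly opposed metrics: improving one necessarily degrades the other.
assert not gaming_direction_exists([(1.0, 0.0), (-1.0, 0.0)])
```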
Measurement invariance: a metric M is invariant with respect to an irrelevant attribute α if M(x + δα) = M(x) for every change δ in α. The IQ metric is not invariant with respect to test-taking practice: IQ changes when students practice the test without genuine gains in the underlying construct.
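Invariance can be checked numerically: perturb the irrelevant attribute and see whether the metric moves. The toy metric below, with a hypothetical practice-effect term, fails the check the way IQ does.

```python
def is_invariant(metric, x, attr_index, delta=1.0, tol=1e-9):
    """Check M(x + delta along one attribute) == M(x):
    does the metric ignore that attribute?"""
    perturbed = list(x)
    perturbed[attr_index] += delta
    return abs(metric(perturbed) - metric(x)) < tol

# state = [ability, practice]; practice is the irrelevant attribute.
true_iq = lambda s: s[0]                   # depends on ability only
measured_iq = lambda s: s[0] + 0.5 * s[1]  # hypothetical practice effect

assert is_invariant(true_iq, [100, 0], attr_index=1)
assert not is_invariant(measured_iq, [100, 0], attr_index=1)
```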
Design a Pareto-Defended Metric System
Consider evaluating a research scientist on a two-metric system: M₁ = publications per year, M₂ = citations per paper.
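A minimal Pareto check for this two-metric system, with hypothetical strategies and numbers: salami-slicing pumps M₁ but depresses M₂, so it does not dominate careful work — and careful work does not dominate it either. Neither strategy dominating the other is exactly the Pareto defense at work.

```python
def dominates(a, b):
    """Strategy a Pareto-dominates b: at least as good on every metric,
    strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

# (M1: publications/year, M2: citations/paper) -- hypothetical numbers.
strategies = {
    "salami_slicing": (12, 2),   # many thin papers, weak citations
    "careful":        (3, 40),   # few papers, strong citations
    "lazy":           (2, 30),   # fewer papers AND weaker citations
}

# Gaming M1 does not dominate; both sit on the Pareto frontier.
assert not dominates(strategies["salami_slicing"], strategies["careful"])
assert not dominates(strategies["careful"], strategies["salami_slicing"])
# A strategy worse on both metrics is dominated and off the frontier.
assert dominates(strategies["careful"], strategies["lazy"])
```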