Value Space vs Measurement Space

Model the world as two spaces:

Value space V: the set of states of the world with respect to what you actually care about. Points in V represent different levels of the true underlying quantity (student learning, military progress, economic wellbeing).

Measurement space M: the set of values the metric can take. A metric is a function f: V → M — a mapping from value space to measurement space.

A valid metric is one where f is close to an isometry in the relevant region: equal changes in M correspond to equal changes in V. Nearby points in M correspond to nearby points in V.
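One way to probe how close a mapping is to an isometry is to compare a small change in M against the change in V that produced it. The sketch below is illustrative only: `local_scale`, `faithful`, and `saturating` are hypothetical names, and the tanh map is an assumed stand-in for a metric that flattens out at high values.

```python
import math

def local_scale(f, v, dv=1e-3):
    """Approximate |change in M| / |change in V| near the point v."""
    return abs(f(v + dv) - f(v)) / dv

def faithful(v):
    # Hypothetical metric that preserves distances everywhere.
    return v

def saturating(v):
    # Hypothetical metric that compresses high regions of V.
    return math.tanh(v)

low_region = local_scale(saturating, 0.0)   # close to 1: changes are visible
high_region = local_scale(saturating, 3.0)  # close to 0: changes are nearly invisible
reference = local_scale(faithful, 3.0)      # close to 1 at every point
```

A ratio near 1 across the relevant region suggests the metric is behaving like an isometry there; a ratio that collapses toward 0 marks a region where real changes in V barely register in M.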

A distorted metric is one where f is non-isometric: the metric compresses some regions of V (making large changes invisible) and expands others (making small changes appear large). The IQ calibration is a designed distortion: it maps the raw score distribution to a Gaussian in M, regardless of the true distribution of intelligence in V.
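The forced-Gaussian idea can be sketched as a rank-based rescaling. This is a minimal illustration, not the procedure any real test publisher uses: `gaussianize` is a hypothetical helper, the mean of 100 and standard deviation of 15 are assumed scale parameters, and ties in the raw scores are not handled.

```python
from statistics import NormalDist

def gaussianize(raw_scores, mean=100.0, sd=15.0):
    """Map raw scores onto a normal scale using only their rank order."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    target = NormalDist(mean, sd)
    scaled = [0.0] * n
    for rank, i in enumerate(order):
        # Mid-rank percentile keeps inv_cdf strictly inside (0, 1).
        scaled[i] = target.inv_cdf((rank + 0.5) / n)
    return scaled

# Assumed raw scores: four close together, one far above the rest.
raw = [1, 2, 3, 4, 100]
scaled = gaussianize(raw)

gap_low = scaled[1] - scaled[0]   # raw gap of 1
gap_high = scaled[4] - scaled[3]  # raw gap of 96
```

In this example the raw gap of 96 and the raw gap of 1 end up as the same gap on the scaled axis, because only rank survives the mapping: the large gap is compressed and the small one is expanded.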

Goodhart's law in mapping terms: when M becomes a target, agents apply gradient ascent in M. Because f is a distortion, gradient ascent in M need not correspond to gradient ascent in V. The agent moves in M while standing still, or even moving backward, in V.
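A toy simulation can make this concrete. Everything here is assumed for illustration: the two coordinates (real work, gaming effort), the weights in `metric`, and the step size are hypothetical, not taken from any real evaluation system.

```python
import math

def metric(x):
    # Hypothetical metric: rewards real work x[0] and gaming effort x[1].
    return 1.0 * x[0] + 3.0 * x[1]

def value(x):
    # Hypothetical true value: only real work counts.
    return x[0]

METRIC_GRAD = (1.0, 3.0)  # gradient of the linear metric above

def ascend_metric(x, steps, step_size=0.1):
    """Take fixed-length steps along the metric gradient."""
    norm = math.hypot(*METRIC_GRAD)
    for _ in range(steps):
        x = (x[0] + step_size * METRIC_GRAD[0] / norm,
             x[1] + step_size * METRIC_GRAD[1] / norm)
    return x

end = ascend_metric((0.0, 0.0), steps=100)
metric_gain = metric(end)
value_gain = value(end)
```

Under these assumptions the metric gain is ten times the value gain, since most of each step goes into the coordinate the metric rewards most rather than the one that carries value.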

[Figure: Metric Distortion: Value Space vs Measurement Space]

Testing Metric Validity

A company evaluates employee performance on a 1-5 star scale. The scale is calibrated so that 80% of employees receive 3 or higher. The performance review system is used for both compensation decisions (where rank-order matters) and improvement plans (where absolute level matters).

Is this metric closer to an isometric mapping or a distorted mapping of true performance? Explain using the concepts of compression and expansion. Then: for which use case (compensation or improvement plans) does the distortion matter more, and why?

Gradient Ascent in the Wrong Space

Model the optimization problem geometrically. Let V = value space (true student learning, military progress, etc.) and M = measurement space (test scores, body counts, etc.).

The gradient of true value: ∇_V(value) points in the direction in V that increases the underlying quantity you care about.

The gradient of the metric: ∇_M(metric) points in the direction in M that increases the metric.

Because f: V → M is not an isometry, the gradient of the metric expressed in value space, f*(∇_M), is generally not aligned with ∇_V. The angle between them, θ = arccos((∇_V · f*(∇_M)) / (|∇_V| |f*(∇_M)|)), measures the severity of the Goodhart failure.

If θ = 0: the metric gradient and value gradient point the same direction. Optimizing the metric optimizes value. No Goodhart corruption.

If θ = 90°: the metric gradient is orthogonal to value. Optimizing the metric moves in M without moving in V at all.

If θ = 180°: the metric gradient points opposite to value. Optimizing the metric actively degrades value.
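The three cases above can be checked numerically with a small helper. This is a sketch under the assumption that both gradients are already expressed as plain coordinate tuples in value space; `divergence_angle_deg` is a hypothetical name.

```python
import math

def divergence_angle_deg(value_grad, metric_grad):
    """Angle in degrees between the value gradient and the metric gradient."""
    dot = sum(a * b for a, b in zip(value_grad, metric_grad))
    norm = math.hypot(*value_grad) * math.hypot(*metric_grad)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))

aligned = divergence_angle_deg((1.0, 0.0), (2.0, 0.0))      # same direction
orthogonal = divergence_angle_deg((1.0, 0.0), (0.0, 1.0))   # no shared component
opposed = divergence_angle_deg((1.0, 0.0), (-1.0, 0.0))     # opposite direction
```

Only the directions matter: scaling either gradient leaves the angle unchanged, which is why the aligned case uses vectors of different lengths.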

When the metric becomes a target and agents apply gradient ascent on the metric, they follow f*(∇_M), not ∇_V. The divergence angle θ tends to grow over time as the metric is gamed: agents seek out the regions where f*(∇_M) and ∇_V diverge most, because those are the most efficient paths for gaming, and the mapping f becomes less isometric as a result.

Measuring the Divergence

Consider a simple two-dimensional value space V = (skill, compliance) where skill = student's actual understanding, compliance = student's ability to follow test-taking procedures.

Suppose the test metric is M = 0.3 × skill + 0.7 × compliance, a linear combination in which compliance carries 70% of the weight.

In this 2D model, the gradient of the metric is the vector (0.3, 0.7) in (skill, compliance) space. A student optimizes the metric by improving compliance only (moving in the (0, 1) direction in value space). Calculate the cosine of the angle between the metric gradient and the pure-skill direction (1, 0). Explain: is the metric gradient well-aligned with 'increasing skill' (θ small) or poorly aligned (θ large)? What does this predict about what happens when students optimize for this metric?

Multi-Objective Optimization as Defense Against Goodhart

Hamming's defense: use multiple metrics simultaneously. The geometric interpretation: instead of maximizing a single objective function f(x), optimize over a vector of objectives F(x) = (f₁(x), f₂(x), ..., fₖ(x)).

For a vector objective, the solution concept is the Pareto frontier: the set of solutions where no objective can be improved without degrading another. The Pareto frontier replaces the single optimum.
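For a finite set of candidate solutions, the Pareto frontier can be computed by discarding every point that some other point dominates. The sketch below assumes every objective is to be maximized; the function names and sample scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_frontier(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Assumed scores on two objectives (f1, f2) for six candidate solutions.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1)]
frontier = pareto_frontier(candidates)
# frontier == [(1, 5), (2, 4), (3, 3), (4, 1)]
```

The quadratic scan is fine for small sets; larger problems would need a sorted sweep or a dedicated library.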

Why this defends against Goodhart: to game the metrics, a rational agent must find a direction in value space that increases all fᵢ simultaneously (or at least the metrics they are being judged on) without increasing true value. If the metric gradients point in sufficiently different directions, such a direction is hard to find or absent: the cheap move that games one metric tends to degrade another.

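The degree of defense: the directions that improve every metric at once form a cone, the intersection of the half-spaces ∇fᵢ · d > 0. The more the k metric gradients diverge, the narrower this cone, and the more likely it is that a direction which games one metric degrades another. Linear independence of the gradients alone does not close the cone, since linearly independent gradients always admit some direction that improves all of them. Full Pareto defense requires that no gaming direction exists that improves all metrics.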
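Whether a given direction improves every metric reduces to a sign check on dot products with the metric gradients. The example below is a sketch with assumed numbers: the coordinates (real work, gaming) and both gradients are hypothetical, chosen so that the second metric penalizes the gaming the first one rewards.

```python
def directional_change(gradient, direction):
    """First-order change in a metric when moving along direction."""
    return sum(g * d for g, d in zip(gradient, direction))

def improves_all(gradients, direction):
    """True if the direction raises every metric to first order."""
    return all(directional_change(g, direction) > 0 for g in gradients)

# Hypothetical gradients in (real_work, gaming) coordinates.
GRAD_M1 = (1.0, 2.0)    # metric 1 rewards gaming
GRAD_M2 = (1.0, -2.0)   # metric 2 penalizes the same gaming

gaming = (0.0, 1.0)
real_work = (1.0, 0.0)

gaming_alone = improves_all([GRAD_M1], gaming)            # True
gaming_pair = improves_all([GRAD_M1, GRAD_M2], gaming)    # False
real_pair = improves_all([GRAD_M1, GRAD_M2], real_work)   # True
```

With one metric the gaming direction passes; with both metrics it fails while real work still passes, which is the defense in miniature.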

Measurement invariance: a metric M is invariant with respect to an irrelevant attribute α if M(x + δα) = M(x) for every change δ in α. The IQ metric is not invariant with respect to test-taking practice: measured IQ changes when test-takers practice the test, even without genuine gains in the underlying construct.
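The invariance condition can be spot-checked numerically by shifting x along the irrelevant attribute and comparing metric values. This is a sketch, not a proof of invariance: it only tests the listed shifts. `is_invariant` and `skill_only_test` are hypothetical, and `mixed_test` reuses the 0.3 and 0.7 weights from the two-dimensional example earlier in the lesson.

```python
def is_invariant(metric, x, attribute_direction, deltas=(0.5, 1.0, 2.0), tol=1e-9):
    """Check M(x + delta * alpha) == M(x) for a few sample shifts."""
    base = metric(x)
    for delta in deltas:
        shifted = tuple(xi + delta * ai for xi, ai in zip(x, attribute_direction))
        if abs(metric(shifted) - base) > tol:
            return False
    return True

def mixed_test(x):
    # x = (skill, compliance); weights from the earlier linear example.
    return 0.3 * x[0] + 0.7 * x[1]

def skill_only_test(x):
    # Hypothetical metric that ignores compliance entirely.
    return x[0]

COMPLIANCE = (0.0, 1.0)   # the irrelevant attribute direction
student = (2.0, 1.0)      # assumed starting point

mixed_ok = is_invariant(mixed_test, student, COMPLIANCE)       # False
skill_ok = is_invariant(skill_only_test, student, COMPLIANCE)  # True
```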

Design a Pareto-Defended Metric System

Consider evaluating a research scientist on a two-metric system: M₁ = publications per year, M₂ = citations per paper.

Explain geometrically why these two metrics together are harder to game than either metric alone. Specifically: describe a strategy for maximizing M₁ alone, a strategy for maximizing M₂ alone, and then show that each of those strategies degrades the other metric. Then: is there any residual gaming strategy that increases both simultaneously without producing genuine research value, and if so, what is it?