The Statistician's Briefcase

Hamming opens Chapter 27 with a story. A statistician friend at Bell Labs suspected that measurements in a study were inaccurate. He argued with the department head, who refused remeasurement — 'the instruments have brass labels on them saying they were that accurate, and my people are reliable.'

On Monday, the statistician arrived and said he had left his briefcase on the train and lost all his data. There was nothing to do but remeasure. When the new measurements arrived, the statistician produced the original records — showing how far off they had been. He was not popular for the move, but the inaccuracy was now undeniable.

Hamming draws a harder lesson from another case: a study of phone call patterns, recorded by the same central-office equipment that placed the calls. One day the statistician noticed a call billed to a non-existent central office. Looking further, he found that a large percentage of calls were connecting, for minutes at a time, to non-existent offices. The machine was generating bad data about its own operation. You cannot trust a machine to gather data about itself correctly.

His third example: his brother at the Los Angeles Air Pollution department, who found it necessary to disassemble, reassemble, and recalibrate every new instrument they received, regardless of the manufacturer's claims.

Hamming's rule: always examine data carefully before processing it. Plot it. Look for patterns that should not be there. Check for inconsistencies. No matter how urgent the answer, pretest the data first.
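A minimal sketch of that rule in code, assuming a hypothetical list of (timestamp, value) records; the specific checks and thresholds here are illustrative choices, not Hamming's:

```python
# Cheap sanity checks to run before any real analysis, per Hamming's
# "pretest the data" rule. Records are assumed (timestamp, value) pairs.
import statistics

def pretest(records):
    """Return a list of human-readable issues found in the records."""
    values = [v for _, v in records]
    issues = []
    # 1. Impossible values (here: assuming readings must be non-negative).
    issues += [f"negative value {v}" for v in values if v < 0]
    # 2. Exact consecutive duplicates often signal a stuck instrument
    #    or a transcription error.
    issues += ["stuck/duplicated reading"
               for a, b in zip(records, records[1:]) if a == b]
    # 3. Gross outliers: flag (do not delete) anything more than
    #    5 sample standard deviations from the mean.
    if len(values) > 2:
        mu, sd = statistics.mean(values), statistics.stdev(values)
        if sd > 0:
            issues += [f"outlier {v}" for v in values if abs(v - mu) > 5 * sd]
    return issues

data = [(0, 1.0), (1, 1.1), (2, 1.1), (2, 1.1), (3, -4.0)]
print(pretest(data))
```

The point is not the particular checks but the habit: every dataset gets a cheap automated screening, and every flagged record gets a human look, before any processing begins.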

Random Error, Systematic Error & the Calibration Chain

Pre-Testing Data

Hamming's inventory study: he received 18 months of inventory records for ~100 items and naively believed the supplier's assurance that inconsistencies had been removed. Late in the project, he found residual inconsistencies — entries that could not have occurred without error (e.g., withdrawals from empty inventory).

He concluded: 'I had first to find them, then eliminate them, and then run the data all over again. From that experience I learned never to process any data until I had first examined it carefully for errors.'
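The specific inconsistency Hamming names, a withdrawal from empty inventory, can be caught mechanically by replaying the transaction log. A hedged sketch, with invented item levels and amounts:

```python
# Replay inventory transactions and flag any withdrawal that would
# drive stock below zero -- the "could not have occurred without error"
# pattern Hamming found. Opening level and transactions are invented.
def find_impossible_withdrawals(opening, transactions):
    """Return indices of transactions that withdraw from empty inventory."""
    level, bad = opening, []
    for i, delta in enumerate(transactions):  # +receipt, -withdrawal
        level += delta
        if level < 0:
            bad.append(i)
            level = 0  # resync so one error does not flag everything after
    return bad

# Item starts at 5 units; the third transaction withdraws 8 from a stock of 4.
print(find_impossible_withdrawals(5, [-3, +2, -8, +1]))
```

Note that the check can only prove the records inconsistent, never correct; finding which entry is wrong still takes the investigative work Hamming describes.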

Describe three specific consistency checks you would apply to a new dataset before trusting it for analysis. For each check, explain what type of error it would catch — and why that type of error might exist in the data despite the supplier's assurances.

Two Kinds of Error

Every physical measurement carries two types of error:

Random error: unpredictable variation around the true value. It follows a distribution (often approximately Gaussian) centered on the true value. Random errors cancel with averaging: take enough measurements and the mean approaches the true value.

Systematic error (bias): a consistent offset in one direction. All your measurements are shifted by the same amount. No amount of averaging removes it, because the mean of many biased measurements is still biased.
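The asymmetry between the two error types is easy to see in a small simulation, here assuming Gaussian noise and an invented instrument that reads 3 units high:

```python
# Averaging shrinks random error but leaves a fixed bias untouched.
# TRUE_VALUE, BIAS, and the noise level are all invented for illustration.
import random

random.seed(0)
TRUE_VALUE, BIAS = 100.0, 3.0  # instrument consistently reads 3 units high
NOISE_SD = 5.0                 # random error per measurement

def mean_of_measurements(n):
    """Mean of n biased, noisy measurements."""
    return sum(TRUE_VALUE + BIAS + random.gauss(0, NOISE_SD)
               for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, round(mean_of_measurements(n) - TRUE_VALUE, 2))
# As n grows, the offset from the true value converges to BIAS, not to 0.
```

More data drives the random component toward zero at a rate of roughly NOISE_SD / sqrt(n); the 3-unit bias survives any n, which is exactly why calibration, not repetition, is the remedy.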

Hamming's example from physics: a table of the 10 fundamental constants (speed of light, Avogadro's number, charge of the electron, etc.) was compiled, and then recompiled 24 years later with improved instruments. On average, the new values differed from the old ones by 5.267 times the old stated errors, far outside the old error bars. This is not plausible from random error alone: if the stated errors had captured the full uncertainty, shifts that large would be vanishingly improbable. The explanation: the old instruments had systematic errors not reflected in the stated uncertainty, and the measurement techniques shared flaws that propagated through the community.
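The test behind that observation can be sketched directly: express each shift in units of the old stated error. The constants and numbers below are invented stand-ins, not the historical values:

```python
# If the old error bars were honest, |new - old| / old_error should
# mostly fall below about 2-3. All numbers here are invented.
def shifts_in_error_units(pairs):
    """pairs: (old_value, old_stated_error, new_value) -> list of shifts."""
    return [abs(new - old) / err for old, err, new in pairs]

pairs = [(10.00, 0.02, 10.11),   # shifted ~5.5 stated errors: suspicious
         (3.000, 0.010, 3.004)]  # shifted ~0.4 stated errors: consistent
print(shifts_in_error_units(pairs))
```

A distribution of shifts centered well above 1 is the signature of the effect Hamming describes: the stated errors measured only the random component, not the systematic one.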

Shannon's remark: 'Calibration is the most important thing in measurement.' Calibration addresses systematic error. If your instrument is consistently reading 3% too high, no amount of repeated measurement fixes that — you must calibrate.

Identifying Systematic Error

The Hubble constant: the rate at which the universe expands, measured from the redshift-distance relationship of galaxies. Multiple independent groups have measured it over the past 50 years. Historically, many of the published values fell outside the error bars of other published values — meaning the disagreements were larger than the stated uncertainties predicted.
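One standard way to quantify "disagreements larger than the stated uncertainties predict" is a chi-square statistic about the weighted mean. A sketch with invented numbers, not real Hubble measurements:

```python
# Chi-square per degree of freedom about the weighted mean.
# Values near 1 mean the scatter matches the stated errors; values well
# above 1 suggest unstated systematic error. All data here is invented.
def reduced_chi_square(values, errors):
    weights = [1 / e**2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    chi2 = sum(((v - mean) / e) ** 2 for v, e in zip(values, errors))
    return chi2 / (len(values) - 1)

# Three hypothetical "groups", each quoting sub-2% errors but
# disagreeing by several percent:
print(round(reduced_chi_square([67.4, 73.0, 69.8], [0.5, 1.0, 0.8]), 1))
```

A reduced chi-square of order 10 says the stated error bars cannot all be honest: either some group's random-error estimate is badly wrong, or (more likely) there are systematic offsets between the methods.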

Explain why independent measurements of the Hubble constant could each have small stated random errors but still disagree by amounts larger than those errors. What type of error causes this pattern, and how would you distinguish it from random error experimentally?

How Do You Test What You Cannot Test?

Hamming poses a problem with no clean solution, but which every practicing engineer eventually faces: How do you test a device for reliability when the testing itself takes longer than you have, and your test equipment is less reliable than the device you are testing?

The scenario: a device must last 20 years in the field (about 175,000 hours). Your life-test laboratory is rated for 10,000 hours of operation. Your test period budget is 3 months (about 2,000 hours). The device is expected to face operating temperatures of up to 85°C in the field.

Accelerated testing: run the device at 105°C and assume failures occur 10× faster than at 85°C (a rule-of-thumb acceleration factor whose validity depends on the failure mechanism). Then 2,000 hours at 105°C 'represents' 20,000 hours at 85°C. But does it?
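The arithmetic is trivial; the value of writing it down is that it makes the acceleration factor an explicit, challengeable assumption rather than a hidden one:

```python
# Field hours "represented" by an accelerated test. The factor of 10
# is the assumed acceleration from the text, valid only if the failure
# mode at the test temperature matches the field failure mode.
def equivalent_field_hours(test_hours, acceleration_factor):
    return test_hours * acceleration_factor

# 2,000 test hours at 105 C, assumed 10x acceleration over 85 C:
print(equivalent_field_hours(2_000, 10))  # 20000
# Still nearly an order of magnitude short of the 20-year requirement.
```

Even granting the assumed factor, the test covers only a fraction of the required lifetime, which is why the failure-mode question below is not a technicality but the whole problem.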

The problem: the failure mode at 105°C may be different from the failure mode at 85°C. If solder joints fail by thermal fatigue at 85°C but by oxidation at 105°C, the accelerated test tells you nothing useful about field lifetime.

Shannon's advice applies: calibration — understanding what your measurement actually measures — is the critical step. Accelerated testing calibrates temperature against failure rate only if the failure mode is the same. Verifying this requires a separate study.

Design a Life Test

You are a reliability engineer for a medical device implanted in the human body. It must last 10 years (87,600 hours). Your laboratory budget allows for 6 months of testing (4,380 hours). The device operates at body temperature (37°C).

What is the fundamental problem with simply running accelerated tests at 50°C or 60°C and extrapolating to predict 10-year reliability? Describe at least two specific failure modes the accelerated test might miss or mischaracterize, and explain what additional evidence you would gather to validate the extrapolation.