Regression to the Mean¶

Prime #: 439
Origin domain: Statistics & Experimental Design
Aliases: Regression Toward the Mean, RTM, Statistical Regression, Galton Regression
Related primes: Selection Bias, Confounding, Randomization, Sampling (Representativeness), Hypothesis Testing (Null vs. Alternative), Effect Size, Reproducibility & Replicability

Core Idea¶

Regression to the mean is the tendency for extremely high or low measurements to be followed by values closer to the average on subsequent measurements; the shift is often misattributed to an intervention or other external factor.

How would you explain it like I'm…

Lucky Streaks Don't Last

Regression to the mean is when something extreme gets less extreme the next time. If you have the best round of mini-golf you have ever played, your next round probably will not be as amazing, even if you do not do anything different. That is not because you got worse. It is because your best round had a few lucky bounces in it, and luck does not stick around.

Extremes Drift Back to Normal

Regression to the mean is the rule that after an extreme measurement, the next measurement tends to be closer to average. Test scores, sports performances, and even how sick someone feels usually have a stable part and a random lucky-or-unlucky part. When you pick the highest or lowest cases, you've also picked the ones where luck swung hardest. Next time, the luck doesn't swing the same way, so the value drifts back toward normal, even if nothing was done.

Regression to the Mean

Regression to the mean is the principle that observations selected for being extreme on an initial measurement will, on a subsequent measurement, tend to be less extreme — closer to the overall average. The reason is that any extreme value usually combines a stable underlying component with a transient random one, and the random part is unlikely to be as extreme the second time. The effect is purely statistical, not causal. It's a major source of false 'improvement' stories: extreme conditions targeted for intervention often improve on their own, fooling people into crediting the intervention.

Regression to the mean is the principle that observations selected for having extreme values on an initial measurement will, on a subsequent measurement of the same or a related variable, tend to be less extreme, that is, closer to the overall mean of the distribution. The reason is that the initial extreme value typically combined a stable underlying component with a transient random component, and the random component is unlikely to reach the same extreme again. When the two measurements share the same mean and spread, the expected size of the effect is (1 minus r) times the initial deviation from the mean, where r is the correlation between the two measurements: perfect correlation (r = 1) means no regression, zero correlation means complete regression to the population mean on average, and typical real-world correlations of 0.3 to 0.8 produce substantial but partial regression. For example, with r = 0.6 a first score 30 points above the mean is expected to be followed by one about 18 points above it. The phenomenon was described by Francis Galton in 1886, who observed that tall parents tended to have children shorter than themselves and short parents tended to have taller children. He initially read this as a causal pull toward mediocrity; later work clarified it as a purely statistical consequence of imperfect correlation. The canonical mistake, often called the regression fallacy or Galton's fallacy, is to select cases on an extreme baseline (low-performing students, peak-symptom patients, slumping athletes), apply an intervention, observe the natural drift back toward average, and credit the intervention. The canonical defense is a control group selected on the same criteria, so that the regression effect cancels out of the comparison.
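The (1 minus r) rule can be checked with a small simulation. The sketch below is illustrative only: the function name `simulate_regression`, the normal "stable part plus noise" model, and every default value are assumptions made for the example, not anything taken from Galton's data. Under that model the expected second measurement for a unit first observed at x1 is mu + r * (x1 - mu).

```python
import random
import statistics


def simulate_regression(n=20000, mu=100.0, sd_stable=12.0, sd_noise=9.0,
                        top_frac=0.10, seed=0):
    """Measure the same units twice and follow the top scorers of round one.

    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)

    # Each unit has a stable component; each measurement adds fresh noise.
    stable = [rng.gauss(mu, sd_stable) for _ in range(n)]
    first = [s + rng.gauss(0.0, sd_noise) for s in stable]
    second = [s + rng.gauss(0.0, sd_noise) for s in stable]

    # Under this model the correlation between the two measurements equals
    # the share of total variance that comes from the stable component.
    r = sd_stable**2 / (sd_stable**2 + sd_noise**2)

    # Select units on an extreme FIRST measurement only.
    k = int(n * top_frac)
    top = sorted(range(n), key=lambda i: first[i], reverse=True)[:k]

    mean_first = statistics.fmean(first[i] for i in top)
    mean_second = statistics.fmean(second[i] for i in top)

    # The (1 - r) rule: the group keeps a fraction r of its deviation.
    predicted_second = mu + r * (mean_first - mu)

    return {
        "r": r,
        "mean_first": mean_first,
        "mean_second": mean_second,
        "predicted_second": predicted_second,
    }


if __name__ == "__main__":
    for name, value in simulate_regression().items():
        print(f"{name}: {value:.2f}")
```

With the assumed defaults r works out to 0.64, so the selected group should keep roughly two thirds of its initial deviation on the second measurement even though nothing was done to it between the two rounds.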

Broad Use¶

Education Interventions: Students scoring extremely poorly on a test typically improve next time just by natural variation, even without special tutoring.
Medicine & Health: Patients tend to seek treatment when their symptoms are at their worst, so later measurements often look better through natural fluctuation alone, creating an illusion of treatment efficacy.
Sports & Performance: An athlete's outstanding performance season is often followed by a more average one, a phenomenon sometimes mislabeled as a "slump."
Business Analytics: Sales teams that had an exceptionally bad quarter might look better next quarter by random fluctuation, not necessarily from new strategies.
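The education case above can be sketched in code to show why a control group selected on the same criterion separates a real effect from regression. This is a minimal illustration under assumed conditions: the function `tutoring_study`, the score scale, the noise levels, and the effect sizes are all hypothetical.

```python
import random
import statistics


def tutoring_study(true_effect, n=20000, bottom_frac=0.10, seed=1):
    """Pick the lowest pre-test scorers, tutor a random half, re-test everyone.

    Returns (mean gain of tutored, mean gain of control, difference).
    All numbers are hypothetical illustration values.
    """
    rng = random.Random(seed)

    # Stable ability plus independent test-day noise on the pre-test.
    ability = [rng.gauss(70.0, 8.0) for _ in range(n)]
    pre = [a + rng.gauss(0.0, 6.0) for a in ability]

    # Select on an extreme baseline: the lowest pre-test scores.
    k = int(n * bottom_frac)
    lowest = sorted(range(n), key=lambda i: pre[i])[:k]

    # Randomly split the selected students into tutored and control halves.
    rng.shuffle(lowest)
    treated, control = lowest[: k // 2], lowest[k // 2:]

    def mean_gain(group, effect):
        # Post-test = ability + any real effect + fresh noise; gain = post - pre.
        return statistics.fmean(
            ability[i] + effect + rng.gauss(0.0, 6.0) - pre[i] for i in group
        )

    gain_treated = mean_gain(treated, true_effect)
    gain_control = mean_gain(control, 0.0)
    return gain_treated, gain_control, gain_treated - gain_control


if __name__ == "__main__":
    for effect in (0.0, 5.0):
        treated, control, diff = tutoring_study(effect)
        print(f"true effect {effect}: tutored gain {treated:.2f}, "
              f"control gain {control:.2f}, difference {diff:.2f}")
```

Even with a true effect of zero, both groups show a positive gain, because both were selected for a low first score. Only the difference between the two gains estimates the tutoring effect, since the regression component is common to both groups.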

Clarity¶

Warns that any extreme observation or group tends to move closer to typical performance, so attributing that move to a particular cause can be erroneous if regression to the mean is not accounted for.

Manages Complexity¶

By recognizing natural fluctuations around a mean, one avoids over-interpreting random highs and lows as evidence of strong external influences or "miracle cures."

Abstract Reasoning¶

Shows that, absent a consistent external cause, cases selected for an extreme value are followed on average by less extreme values, a general property of any repeated measurement whose successive values are imperfectly correlated (time series, test-retest data, repeated sampling).

Knowledge Transfer¶

Policy Evaluation: If a region was chosen because it had unusually high crime one year, the next year's dip might not result from the new policy but simply from regression to the mean.
Coaching & Performance: Performance tends to decline after praise for an unusually good showing and to improve after a scolding for an unusually bad one; both shifts may be mostly statistical bounce-back rather than effects of the feedback.

Example¶

A car insurance company identifying "risky drivers" based on one extremely bad month sees many appear "safer" next month—some of that improvement is mere regression to the mean, not policy changes or driver training.
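The insurance example can be simulated to show the effect with no intervention at all. The sketch below makes several assumptions purely for illustration: each driver has a fixed daily incident probability drawn from a uniform range, a month has 30 independent days, and "risky" means three or more incidents in the first month. The function names are hypothetical.

```python
import random
import statistics


def monthly_incidents(rng, daily_risk, days=30):
    """Count incidents in one month, one independent chance per day."""
    return sum(rng.random() < daily_risk for _ in range(days))


def flag_risky_drivers(n=20000, threshold=3, seed=2):
    """Flag drivers on one bad month and compare their following month.

    Nothing changes between the two months: no training, no policy.
    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)

    # Stable component: each driver's own (unchanging) daily risk.
    daily_risk = [rng.uniform(0.005, 0.05) for _ in range(n)]

    # Two months drawn independently from the same per-driver risk.
    month1 = [monthly_incidents(rng, p) for p in daily_risk]
    month2 = [monthly_incidents(rng, p) for p in daily_risk]

    # Selection on an extreme first observation.
    flagged = [i for i in range(n) if month1[i] >= threshold]

    return {
        "n_flagged": len(flagged),
        "flagged_month1": statistics.fmean(month1[i] for i in flagged),
        "flagged_month2": statistics.fmean(month2[i] for i in flagged),
        "everyone_month2": statistics.fmean(month2),
    }


if __name__ == "__main__":
    for name, value in flag_risky_drivers().items():
        print(f"{name}: {value:.2f}")
```

In this setup the flagged drivers' second-month average falls well below their first-month average, yet typically stays above the fleet-wide average. The regression is partial: the flagged group really is somewhat riskier, just not as risky as its worst month suggested.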

Relationships to Other Abstractions¶

Parents (2) — more general patterns this builds on

Regression to the Mean is a kind of Probability Prime

Regression to the mean is a kind of probability phenomenon in which extreme observations tend to fall closer to the population mean when re-measured, because the transient noise that made them extreme does not repeat.

Regression to the Mean presupposes Bias Prime

Regression to the mean presupposes bias because uncorrected use of extreme-selected observations yields a systematic offset away from the underlying mean.

Hierarchy paths (3) — routes to 3 parentless roots

Regression to the Mean → Probability → Measure → Aggregation → Micro Macro Linkage


Not to Be Confused With¶

Regression to the Mean is not Variability because regression to the mean is a statistical artifact arising from imperfect correlation between repeated measurements, while variability is the observable spread or dispersion in a collection of values—regression is about why extreme values become less extreme on re-measurement; variability is about quantifying the range of fluctuation as a property of data itself.
Regression to the Mean is not Statistical Inference because regression to the mean is a specific artifact that arises when subjects are selected at a baseline extreme and then re-measured, while statistical inference is the broader reasoning process of drawing conclusions about populations from samples. Regression to the mean is a pitfall within inference, most dangerous when an intervention happens to fall between the selection and the re-measurement; statistical inference is the methodology that must defend against it.
Regression to the Mean is not Calibration because regression to the mean is a measurement artifact (imperfect correlation producing spurious improvements), while calibration is the active process of aligning a system's outputs to a trusted standard—regression occurs passively in re-measurement; calibration requires intentional adjustment and verification.