Regression to the Mean

Concept Overview

Regression to the mean is a statistical phenomenon that occurs when repeated measurements are taken on the same subject or group. It states that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement—and if it is extreme on its second measurement, it will tend to have been closer to the average on its first. This is a natural consequence of random error present in almost all measurements.

Definition

To formally define regression to the mean, we start with the classical test theory model. A measured score (Observed Score) consists of two components: a True Score and Measurement Error.

X = T + E

Where:

  • X is the observed score.
  • T is the true score.
  • E is the random measurement error with mean 0.
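
As a rough illustration of this model, the sketch below draws one true score per subject and adds fresh random error for each of two measurements. The normal distributions, the mean of 100, the standard deviations, and the helper name simulate_scores are all assumptions chosen for the example, not part of the definition.

```python
import random
from statistics import fmean

def simulate_scores(n, mean=100.0, true_sd=10.0, error_sd=10.0, seed=0):
    """Simulate two observed scores per subject under X = T + E (assumed normal)."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        t = rng.gauss(mean, true_sd)                 # true score T
        first.append(t + rng.gauss(0.0, error_sd))   # X1 = T + E1
        second.append(t + rng.gauss(0.0, error_sd))  # X2 = T + E2 (fresh error)
    return first, second

x1, x2 = simulate_scores(20_000)

# Select the subjects whose first score falls in the top 10%.
cutoff = sorted(x1)[int(0.9 * len(x1))]
top = [(a, b) for a, b in zip(x1, x2) if a >= cutoff]

top_mean_first = fmean(a for a, _ in top)
top_mean_second = fmean(b for _, b in top)
print(f"top group, first measurement:  {top_mean_first:.1f}")
print(f"top group, second measurement: {top_mean_second:.1f}")
```

The group selected for extreme first scores still averages above 100 on the second measurement, but noticeably less so, because part of what made the first scores extreme was error that does not repeat.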

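When we take two measurements X1 and X2 on the same population, and both have the same mean μ and the same variance, the expected value of the second measurement given the first (assuming a linear relationship, as in the bivariate normal case) is expressed using the correlation coefficient (ρ) between the two measurements: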

E(X2 | X1 = x) = μ + ρ(x - μ)

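Whenever the correlation is imperfect (|ρ| < 1), the term ρ(x - μ) is smaller in magnitude than (x - μ) for any x ≠ μ. This means the expected value of X2 is closer to the mean μ than the observed value x of X1 was. Only in the limiting case of perfect correlation (|ρ| = 1) does the shrinkage vanish.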
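
A small worked example may make the formula concrete. The function name expected_second and the numbers (μ = 100, ρ = 0.6) are hypothetical, picked only for illustration.

```python
def expected_second(x, mu, rho):
    """E(X2 | X1 = x) = mu + rho * (x - mu), assuming equal means and variances."""
    return mu + rho * (x - mu)

mu, rho = 100.0, 0.6

for x in (130.0, 100.0, 70.0):
    predicted = expected_second(x, mu, rho)
    # The predicted deviation from mu is rho times the observed deviation.
    print(f"first = {x:6.1f}  expected second = {predicted:6.1f}")
```

A first score 30 points above the mean leads to an expected second score about 18 points above it, and the same shrinkage applies symmetrically to scores below the mean. A score exactly at the mean is expected to stay there.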

Key Concepts

Correlation and Regression

The strength of regression to the mean is inversely related to the correlation between the two measurements: on average, a fraction 1 - ρ of the first measurement's deviation from the mean disappears on the second. If the two measurements are perfectly correlated (ρ = 1), there is no regression to the mean. If they are completely uncorrelated (ρ = 0), the expected value of the second measurement is simply the population mean, representing complete regression to the mean.
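
The sketch below links this back to measurement error. Under the X = T + E model, with the added assumption that errors are independent of each other and of the true score, the correlation between two measurements works out to Var(T) / (Var(T) + Var(E)), so noisier measurements have lower ρ and regress more. The helper names and parameter values are illustrative assumptions.

```python
import math
import random
from statistics import fmean

def pearson(xs, ys):
    """Sample Pearson correlation, written out to keep the example self-contained."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def retest_correlation(error_sd, n=20_000, true_sd=10.0, seed=1):
    """Correlation between two noisy measurements of the same true scores."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        t = rng.gauss(100.0, true_sd)
        first.append(t + rng.gauss(0.0, error_sd))
        second.append(t + rng.gauss(0.0, error_sd))
    return pearson(first, second)

correlations = {sd: retest_correlation(sd) for sd in (0.0, 5.0, 10.0, 20.0)}
for sd, rho in correlations.items():
    theory = 10.0**2 / (10.0**2 + sd**2)  # Var(T) / (Var(T) + Var(E))
    print(f"error sd = {sd:4.1f}  rho = {rho:.3f}  theory = {theory:.3f}  "
          f"fraction regressed = {1 - rho:.3f}")
```

With no measurement error the two scores are identical and nothing regresses. As the error grows relative to the spread of true scores, the correlation falls and a larger share of any extreme first score is expected to vanish on retest.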

The Illusion of Causality

Regression to the mean is frequently mistaken for a causal effect. For example, a patient who feels exceptionally ill (an extreme state) might seek treatment. When they naturally regress toward their average state of health, the improvement is often falsely attributed to the treatment rather than to the statistical tendency to return toward baseline.

Historical Context

The concept was first identified by Sir Francis Galton in the late 19th century. While studying the heights of parents and their adult children, Galton noticed that extremely tall parents tended to have children who were shorter than they were, while extremely short parents tended to have children taller than themselves. He originally termed this "regression towards mediocrity." Galton realized this was a fundamental mathematical property of bivariate normal distributions rather than a biological force shrinking the human race.

Applications

  • Medical Trials: Clinical trials often select participants based on extreme baseline measurements (e.g., high blood pressure). Without a control group, natural regression to the mean can make an ineffective drug appear successful.
  • Sports Analytics: The "Sports Illustrated Cover Jinx"—where athletes perform poorly after appearing on the cover—is typically regression to the mean. Athletes are chosen for the cover after an exceptionally good, outlier performance, making a subsequent drop in performance statistically probable.
  • Education and Policy: Interventions targeted at the lowest-performing schools or students will often appear to work simply because those extreme low scores were partly due to negative random error (e.g., a bad testing day), which doesn't repeat on the second test.
  • Traffic Safety: Speed cameras are often installed at locations that recently experienced a spike in accidents. A subsequent drop in accidents is often credited to the cameras, when it is frequently just a return to the average rate.
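
The medical-trial case above can be sketched as a simulation. Subjects are enrolled only if a noisy baseline reading exceeds a threshold, then split at random into a treated group and a control group, and the "drug" is given a true effect of zero. All numbers here (a population mean of 130, a threshold of 150, the standard deviations) are hypothetical values chosen for the example.

```python
import random
from statistics import fmean

rng = random.Random(42)
TRUE_MEAN, TRUE_SD, NOISE_SD = 130.0, 10.0, 10.0  # hypothetical blood pressure model
THRESHOLD = 150.0                                  # enrol only high baseline readings
DRUG_EFFECT = 0.0                                  # the drug is inert by construction

def measure(true_value):
    """One noisy reading of a subject's true value."""
    return true_value + rng.gauss(0.0, NOISE_SD)

# Screen a population and keep (true value, baseline reading) for enrolled subjects.
enrolled = []
for _ in range(50_000):
    true_bp = rng.gauss(TRUE_MEAN, TRUE_SD)
    baseline = measure(true_bp)
    if baseline >= THRESHOLD:
        enrolled.append((true_bp, baseline))

# Randomise enrolled subjects into two arms.
rng.shuffle(enrolled)
half = len(enrolled) // 2
treated, control = enrolled[:half], enrolled[half:]

def mean_change(group, effect):
    """Average of (follow-up reading minus baseline reading) within a group."""
    return fmean(measure(true_bp + effect) - baseline for true_bp, baseline in group)

treated_change = mean_change(treated, DRUG_EFFECT)
control_change = mean_change(control, 0.0)
print(f"treated arm change: {treated_change:+.1f}")
print(f"control arm change: {control_change:+.1f}")
print(f"difference:         {treated_change - control_change:+.1f}")
```

Both arms show a sizeable drop from baseline even though nothing was done to either, which is what a before-and-after comparison without a control group would misread as a treatment effect. The difference between the arms stays close to zero, which is the comparison that isolates the drug's actual effect.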

Related Concepts

  • Probability Distributions — The foundation for understanding true scores and measurement error distributions.
  • Hypothesis Testing — Accounting for regression to the mean is crucial for designing valid experiments, so that a statistically significant change is not mistaken for a treatment effect.
  • Central Limit Theorem — Explains why measurement errors that arise as the sum of many small independent effects are often approximately normally distributed.
