
Central Limit Theorem

Sample means converge to a normal distribution regardless of the population shape.


Concept Overview

The Central Limit Theorem (CLT) is arguably the most important result in probability theory. It states that the average of many independent random variables, regardless of their original distribution, tends toward a normal (Gaussian) distribution as the sample size grows. This explains why the bell curve appears so frequently in nature — many real-world quantities are the sum of numerous small, independent effects.

Mathematical Definition

Let X1, X2, …, Xn be independent and identically distributed (i.i.d.) random variables with mean μ and finite variance σ². Define the sample mean:

X̄n = (1/n) · Σᵢ₌₁ⁿ Xᵢ

Then as n → ∞:

√n · (X̄n − μ) / σ → N(0, 1)

Equivalently:

X̄n ~ N(μ, σ²/n) approximately, for large n

The convergence is in distribution — the histogram of sample means becomes indistinguishable from the normal curve as n increases. The interactive visualization demonstrates this directly: try increasing the sample size and watch the histogram conform to the overlaid Gaussian.
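The same experiment is easy to run in code. Below is a minimal sketch using NumPy (assumed available; the seed and sample sizes are arbitrary) that draws many samples from a skewed population and checks that the sample means center on μ with spread σ/√n:

```python
# Sketch: simulate the CLT empirically (assumes NumPy is available).
import numpy as np

rng = np.random.default_rng(0)

def sample_means(draw, n, trials=10_000):
    """Draw `trials` samples of size n and return their means."""
    return draw((trials, n)).mean(axis=1)

# Exponential(1) population: mu = 1, sigma^2 = 1 (heavily right-skewed).
means = sample_means(lambda size: rng.exponential(1.0, size), n=50)

# By the CLT, the means should center on mu = 1 with spread
# sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141, and look roughly Gaussian.
print(means.mean())
print(means.std())
```

Plotting a histogram of `means` against the N(1, 1/50) density makes the convergence visible, just as in the interactive.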

Key Concepts

Why It Works for Any Distribution

The CLT requires only two conditions: finite mean and finite variance. It doesn't matter if the population is uniform, exponential, Bernoulli, or any other shape. The interactive lets you switch between these distributions to see the theorem in action:

  • Uniform [0, 1]: Flat distribution with μ = 0.5, σ² = 1/12. Even with small n, the sample mean distribution looks bell-shaped.
  • Exponential (λ = 1): Highly right-skewed with μ = 1, σ² = 1. Requires larger n to approach normality.
  • Bernoulli (p): Discrete distribution taking values 0 or 1. The sample mean is the sample proportion. Normality requires np ≥ 5 and n(1−p) ≥ 5 as a rough rule of thumb.
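One way to quantify "how normal" the sampling distribution has become is its skewness, which is near zero for a Gaussian. The sketch below (NumPy assumed; seed, n, and p = 0.3 are illustrative choices) compares the three populations above at the same sample size:

```python
# Sketch: skewness of the sample-mean distribution for three populations.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 30, 20_000

populations = {
    "uniform":     lambda size: rng.uniform(0.0, 1.0, size),
    "exponential": lambda size: rng.exponential(1.0, size),
    "bernoulli":   lambda size: rng.binomial(1, 0.3, size).astype(float),
}

skews = {}
for name, draw in populations.items():
    means = draw((trials, n)).mean(axis=1)
    # Standardize, then estimate skewness as E[z^3]; ~0 when near normal.
    z = (means - means.mean()) / means.std()
    skews[name] = float(np.mean(z ** 3))
    print(f"{name:12s} skewness of sample means: {skews[name]:+.3f}")
```

The uniform population's sample means are already nearly symmetric at n = 30, while the exponential's retain noticeable positive skew, matching the ordering described above.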

Rate of Convergence

How quickly the CLT kicks in depends on the population's skewness and kurtosis. The Berry–Esseen theorem provides a bound:

supₓ |Fn(x) − Φ(x)| ≤ C · ρ / (σ³ · √n)
where ρ = E[|X − μ|³], C ≤ 0.4748

Symmetric distributions (like uniform) converge faster. Highly skewed distributions (like exponential) converge more slowly. In the interactive, notice how the exponential distribution needs n ≈ 30+ to look Gaussian, while uniform looks normal by n ≈ 10.

Standard Error

The standard deviation of the sampling distribution is σ/√n, called the standard error. This shrinks as n grows — larger samples yield more precise estimates of the mean. Doubling the sample size reduces the standard error by a factor of √2, not 2. This diminishing-returns relationship governs sample size decisions in experimental design.
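The √n scaling is worth seeing numerically. A tiny sketch (σ = 2 is an arbitrary illustrative value):

```python
# Sketch: the standard error sigma / sqrt(n) shrinks with sample size.
import math

sigma = 2.0
ses = {n: sigma / math.sqrt(n) for n in (25, 50, 100, 200)}
for n, se in ses.items():
    print(f"n={n:4d}: standard error = {se:.4f}")
# Each doubling of n divides the SE by sqrt(2) ≈ 1.414, not by 2.
```

So to halve the standard error you must quadruple the sample size: the diminishing returns mentioned above.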

When the CLT Does Not Apply

  • Infinite variance: Distributions like the Cauchy distribution have no finite variance. The sum of Cauchy random variables is still Cauchy — the CLT does not apply.
  • Strong dependence: The CLT assumes independence (or weak dependence). Strongly correlated variables can produce non-normal sums.
  • Heavy tails: Distributions like Pareto with α ≤ 2 have infinite variance. Generalized versions of the CLT (stable distributions) apply instead.
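The Cauchy failure mode is easy to demonstrate. In the sketch below (NumPy assumed; seed and trial counts are arbitrary), the spread of the sample means refuses to shrink as n grows, because the mean of n standard Cauchy variables is itself standard Cauchy:

```python
# Sketch: sample means of Cauchy draws do NOT concentrate as n grows.
import numpy as np

rng = np.random.default_rng(3)
iqrs = []
for n in (10, 1_000, 100_000):
    means = rng.standard_cauchy((200, n)).mean(axis=1)
    # Use the interquartile range as a spread measure (the variance is
    # undefined for Cauchy). For a standard Cauchy the IQR is exactly 2.
    q75, q25 = np.percentile(means, [75, 25])
    iqrs.append(q75 - q25)
    print(f"n={n:6d}: IQR of sample means = {iqrs[-1]:.2f}")
```

For any distribution with finite variance, the IQR of the sample means would fall like 1/√n; here it stays roughly constant near 2 no matter how large n gets.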

Historical Context

Abraham de Moivre first discovered a special case of the CLT in 1733, showing that the binomial distribution approaches a normal curve. Pierre-Simon Laplace generalized this in 1812. The theorem was rigorously proven in increasing generality by Chebyshev (1887), Markov, Lyapunov (1901), and finally Lindeberg (1922) and Feller (1935), who established necessary and sufficient conditions.

The CLT's universality is why the normal distribution earned its name — it was considered the "normal" state of affairs when averaging measurements. Today, it underpins all of frequentist statistics: confidence intervals, hypothesis tests, and polling margins all rely on the CLT.

Real-world Applications

  • Polling and surveys: The margin of error in political polls comes directly from σ/√n — the CLT guarantees that the sample proportion is approximately normal.
  • Quality control: Manufacturing processes use X̄ (x-bar) control charts (monitoring sample means) that rely on the CLT to set alarm thresholds.
  • Finance: Portfolio returns (averages of many asset returns) are modeled as normal under the CLT, forming the basis of modern portfolio theory and the Black-Scholes model.
  • A/B testing: Comparing conversion rates between groups uses the CLT to construct confidence intervals and compute p-values.
  • Physics: Measurement errors arise from many small independent sources; by the CLT, total error is normally distributed, justifying least-squares fitting.
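The polling application above is a one-liner in practice. This sketch uses hypothetical numbers (a 52% observed proportion from n = 1,000 respondents) and the normal approximation the CLT provides:

```python
# Sketch: CLT-based 95% margin of error for a poll (hypothetical numbers).
import math

p_hat, n = 0.52, 1_000               # observed proportion, sample size
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * se                   # 95% confidence, normal approximation
print(f"95% CI: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
```

This yields a margin of about ±3 percentage points, the figure routinely quoted for polls of around 1,000 people.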

Related Concepts

  • Probability Distributions — the CLT explains why the normal distribution dominates: it is the limiting distribution for sums of independent random variables
  • K-Means Clustering — cluster centroids are sample means, and the CLT governs their sampling variability
  • Gradient Descent — mini-batch gradient estimates are sample means of per-example gradients; the CLT justifies treating them as approximately normal

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Central Limit Theorem module.

Try Central Limit Theorem on Riano →
