Probability & Statistics

Probability Distributions

Exploring continuous probability density functions and their parameters.

Concept Overview

A probability distribution describes how the values of a random variable are spread across possible outcomes. For continuous random variables, the distribution is characterized by a probability density function (PDF), which gives the relative likelihood of the variable taking on a given value. The area under the PDF over any interval equals the probability of the variable falling within that interval, and the total area under the entire curve is always 1.

Mathematical Definition

A continuous probability distribution is defined by its probability density function f(x), which satisfies two properties:

1. f(x) ≥ 0 for all x
2. ∫_{−∞}^{+∞} f(x) dx = 1

The probability of X falling in [a, b]:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Key summary statistics:

E[X] = ∫_{−∞}^{+∞} x · f(x) dx   (expected value / mean)
Var(X) = E[(X − μ)²] = E[X²] − (E[X])²   (variance)
σ = √Var(X)   (standard deviation)
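These properties can be checked numerically. The sketch below, using only the Python standard library, approximates the integrals for a standard normal density with a midpoint Riemann sum (the interval [−10, 10] and step count are arbitrary choices; the tails beyond ±10 contribute negligibly):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of the normal distribution, used here as the example f(x).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Midpoint Riemann sum over [-10, 10].
a, b, n = -10.0, 10.0, 100_000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]

total = sum(normal_pdf(x) for x in xs) * h      # ∫ f(x) dx       ≈ 1
mean = sum(x * normal_pdf(x) for x in xs) * h   # E[X]            ≈ 0 (= μ)
var = sum(x * x * normal_pdf(x) for x in xs) * h - mean ** 2
                                                # E[X²] − (E[X])² ≈ 1 (= σ²)
print(total, mean, var)
```

The same check works for any density: if the total area is not 1, the function is not a valid PDF.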

Key Concepts

Normal (Gaussian) Distribution

The normal distribution is the most important distribution in statistics. Its bell-shaped curve is parameterized by the mean μ (center) and standard deviation σ (spread). The famous 68-95-99.7 rule states that approximately 68% of values fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ. By the Central Limit Theorem, the sum of many independent random variables tends toward a normal distribution, which explains its ubiquity in nature and measurement.

f(x) = (1/(σ√(2π))) · exp(−(x−μ)²/(2σ²))
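The 68-95-99.7 rule can be verified directly from the normal CDF, which has a closed form in terms of the error function. A standard-library sketch, not tied to any particular statistics package:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = (1 + erf((x − μ) / (σ√2))) / 2
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

for k in (1, 2, 3):
    p = normal_cdf(k) - normal_cdf(-k)   # P(μ − kσ ≤ X ≤ μ + kσ)
    print(f"within {k}σ: {p:.4f}")       # 0.6827, 0.9545, 0.9973
```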

Uniform Distribution

The uniform distribution on [a, b] assigns equal probability density to every point in the interval. It represents complete uncertainty between two bounds — no value is more likely than any other. It serves as the baseline "maximum ignorance" distribution for bounded variables and is fundamental to random number generation algorithms.

f(x) = 1/(b−a) for a ≤ x ≤ b, 0 otherwise
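Because the density is constant, interval probabilities reduce to ratios of lengths: P(c ≤ X ≤ d) = (length of [c, d] ∩ [a, b]) / (b − a). A minimal sketch (the function names are illustrative):

```python
def uniform_pdf(x, a, b):
    # Constant density 1/(b − a) on [a, b], zero elsewhere.
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    # P(c ≤ X ≤ d) for X ~ Uniform(a, b): clip [c, d] to the support,
    # then divide the remaining width by the total width b − a.
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)

print(uniform_pdf(5, 0, 10))      # 0.1
print(uniform_prob(2, 5, 0, 10))  # 0.3
```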

Exponential Distribution

The exponential distribution models the time between events in a Poisson process — events that occur continuously and independently at a constant average rate λ. It is the only continuous distribution with the "memoryless" property: the probability of waiting another t units is the same regardless of how long you've already waited. This makes it ideal for modeling lifetimes, service times, and radioactive decay.

f(x) = λe^(−λx) for x ≥ 0, 0 otherwise
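The memoryless property follows from the survival function S(t) = P(X > t) = e^(−λt): the conditional probability P(X > s + t | X > s) = S(s + t)/S(s) collapses to S(t), independent of s. A small sketch (λ, s, and t are arbitrary example values):

```python
import math

lam = 0.5   # example rate λ (events per unit time)

def survival(t, lam):
    # P(X > t) = e^(−λt) for an exponential distribution with rate λ.
    return math.exp(-lam * t)

s, t = 3.0, 2.0
conditional = survival(s + t, lam) / survival(s, lam)  # P(X > s + t | X > s)
print(conditional, survival(t, lam))  # equal: the wait so far doesn't matter
```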

Cumulative Distribution Function (CDF)

The CDF, F(x) = P(X ≤ x), gives the probability that the random variable takes a value less than or equal to x. It is the integral of the PDF from −∞ to x. Every CDF is monotonically non-decreasing, approaching 0 as x → −∞ and 1 as x → +∞. The CDF provides a direct way to compute probabilities for intervals: P(a < X ≤ b) = F(b) − F(a).
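For the exponential distribution, the CDF has the closed form F(x) = 1 − e^(−λx), which makes interval probabilities a one-liner. A sketch with an arbitrary example λ:

```python
import math

def exp_cdf(x, lam):
    # F(x) = P(X ≤ x) = 1 − e^(−λx) for x ≥ 0, and 0 for x < 0.
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam, a, b = 1.0, 1.0, 2.0
p = exp_cdf(b, lam) - exp_cdf(a, lam)   # P(a < X ≤ b) = F(b) − F(a)
print(round(p, 4))                      # 0.2325
```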

Historical Context

The normal distribution was first described by Abraham de Moivre in 1733 as an approximation to the binomial distribution. Carl Friedrich Gauss later used it extensively in his work on astronomical observations and measurement error, giving it the alternate name "Gaussian distribution." Pierre-Simon Laplace proved early versions of the Central Limit Theorem, explaining why the bell curve appears so frequently in nature.

The exponential distribution emerged from Siméon Denis Poisson's work on rare events in the 1830s, while the formal axiomatization of probability theory by Andrey Kolmogorov in 1933 provided the rigorous mathematical foundation that unified discrete and continuous distributions into a single framework based on measure theory.

Real-world Applications

  • Quality control: Manufacturing tolerances are modeled with normal distributions. Six Sigma methodology uses the standard deviation to define acceptable defect rates.
  • Finance: Stock prices are often modeled as log-normal (equivalently, log returns as normal). Value-at-Risk (VaR) and options pricing (Black-Scholes) rely heavily on normal distribution assumptions.
  • Reliability engineering: The exponential distribution models component lifetimes and failure rates. Mean Time Between Failures (MTBF) is the reciprocal of the failure rate λ.
  • Machine learning: Gaussian distributions underpin Bayesian inference, Gaussian processes, mixture models, and the reparameterization trick used in variational autoencoders.
  • Simulation: Uniform random variables are the foundation of Monte Carlo methods. Other distributions are generated by transforming uniform samples via inverse CDF or rejection sampling.
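The inverse-CDF technique in the last bullet can be sketched for the exponential distribution: inverting F(x) = 1 − e^(−λx) gives x = −ln(1 − u)/λ, so each uniform sample u maps to an exponential one (the seed and λ below are arbitrary choices):

```python
import math
import random

def sample_exponential(lam, rng):
    # Inverse-CDF method: if U ~ Uniform(0, 1), then −ln(1 − U)/λ ~ Exponential(λ).
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(42)   # fixed seed so the run is reproducible
lam = 2.0
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to the theoretical mean 1/λ = 0.5
```

The same recipe works for any distribution whose CDF can be inverted; when it cannot, rejection sampling is the usual fallback.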

Related Concepts

  • Gradient Descent — stochastic gradient descent samples mini-batches, and learning rate schedules often follow exponential decay
  • Logistic Growth — the logistic function is the CDF of the logistic distribution, a close relative of the normal
  • Taylor Series — moment-generating functions use power series to encode all moments of a distribution
  • Linear Transformations — covariance matrices describe multivariate normal distributions and are transformed by linear maps

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Probability Distributions module.

Try Probability Distributions on Riano →
