Fisher Information

Visualize Fisher Information and the Cramér-Rao bound for normal distribution parameter estimation.

Concept Overview

Fisher Information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ upon which the probability of X depends. Informally, it measures the sharpness or curvature of the log-likelihood function near its maximum. Higher Fisher Information implies that the parameter can be estimated with greater precision from the given data.

Mathematical Definition

Let f(X; θ) be the probability density function (or probability mass function) for a random variable X conditioned on the parameter θ. The score function is the partial derivative with respect to θ of the natural logarithm of the likelihood function. Fisher Information, denoted as I(θ), is defined as the variance of the score function:

I(θ) = E[ (∂/∂θ ln f(X; θ))² ]
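This definition can be checked numerically. The sketch below (an illustrative Monte Carlo estimate, assuming NumPy; the choice of a normal distribution with known σ is just an example) draws samples from N(μ, σ²), evaluates the score for μ, and estimates the Fisher Information as the variance of the score. For this model the analytic value is I(μ) = 1/σ².

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 1.0, 2.0                # example parameters of N(mu, sigma^2)
x = rng.normal(mu, sigma, size=200_000)

# Score for mu: d/dmu ln f(x; mu) = (x - mu) / sigma^2
score = (x - mu) / sigma**2

# The score has mean ~0; its variance is the Fisher Information
I_hat = np.var(score)
print(np.mean(score))               # close to 0
print(I_hat)                        # close to 1 / sigma^2 = 0.25
```

With 200,000 draws the Monte Carlo estimate lands within a fraction of a percent of the analytic value 1/σ².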

Under certain regularity conditions, the Fisher Information can also be expressed equivalently as the negative expected value of the second derivative of the log-likelihood (in the multiparameter case, the Hessian):

I(θ) = −E[ ∂²/∂θ² ln f(X; θ) ]
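The two forms agree under the stated regularity conditions, and that agreement is easy to see by simulation. The sketch below (illustrative only; the Exponential(λ) model is an assumed example, for which both forms equal 1/λ²) compares the variance-of-score estimate with the negative expected second derivative.

```python
import numpy as np

rng = np.random.default_rng(1)

lam = 2.0                                # rate of an Exponential(lam) example model
x = rng.exponential(1 / lam, size=200_000)

# ln f(x; lam) = ln(lam) - lam * x
score = 1 / lam - x                      # first derivative of ln f w.r.t. lam
second = np.full_like(x, -1 / lam**2)    # second derivative (constant for this model)

print(np.var(score))                     # variance-of-score form, close to 1/lam^2 = 0.25
print(-np.mean(second))                  # negative-curvature form, exactly 0.25 here
```

For this model the second derivative is constant, so the curvature form is exact while the score-variance form carries only Monte Carlo noise; both converge to 1/λ².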

For an independent and identically distributed (i.i.d.) random sample of size n, the total Fisher Information is simply n times the Fisher Information of a single observation: Iₙ(θ) = n · I(θ).
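The scaling follows because the log-likelihood of an i.i.d. sample is a sum, so the sample's score is the sum of n independent per-observation scores, and variances of independent terms add. A minimal sketch of this (assuming NumPy; the standard normal model is an illustrative choice with I(μ) = 1/σ²):

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma, n = 0.0, 1.0, 10          # example model N(mu, sigma^2), samples of size n
reps = 100_000

# Each row is one i.i.d. sample; the sample's score is the sum of
# the per-observation scores.
x = rng.normal(mu, sigma, size=(reps, n))
sample_score = ((x - mu) / sigma**2).sum(axis=1)

print(np.var(sample_score))          # close to n * I(mu) = n / sigma^2 = 10
```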

Key Concepts

  • Score Function: The gradient of the log-likelihood function. Its expected value is zero, and its variance is the Fisher Information.
  • Cramér-Rao Lower Bound (CRLB): A fundamental theorem in statistics stating that the variance of any unbiased estimator θ̂ is bounded below by the reciprocal of the Fisher Information: Var(θ̂) ≥ 1 / Iₙ(θ). Higher Fisher Information therefore permits a lower variance for the best unbiased estimator.
  • Maximum Likelihood Estimation (MLE): Maximum likelihood estimators are asymptotically efficient, meaning that as the sample size n → ∞, the variance of the MLE approaches the Cramér-Rao bound.
  • Curvature of Log-Likelihood: Visually, Fisher Information corresponds to the expected curvature (the second derivative) of the log-likelihood function around the true parameter value. A sharper peak indicates high information and lower uncertainty.
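The concepts above can be tied together in one simulation. The sketch below (illustrative, assuming NumPy; the N(μ, σ²) model with known σ is an assumed example in which the sample mean is the MLE for μ) compares the empirical variance of the estimator against the Cramér-Rao bound 1 / Iₙ(μ) = σ²/n. For this model the sample mean attains the bound exactly at every sample size, not just asymptotically.

```python
import numpy as np

rng = np.random.default_rng(3)

mu, sigma, n = 0.0, 2.0, 25          # example model N(mu, sigma^2), known sigma
reps = 200_000

# Simulate many samples of size n and estimate mu by the sample mean (the MLE here)
x = rng.normal(mu, sigma, size=(reps, n))
mu_hat = x.mean(axis=1)

crlb = 1 / (n * (1 / sigma**2))      # 1 / I_n(mu) = sigma^2 / n = 0.16
print(np.var(mu_hat))                # close to crlb: the sample mean attains the bound
print(crlb)
```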

Historical Context

Fisher Information was introduced by the statistician and geneticist Ronald A. Fisher in a series of papers starting in 1922. Fisher developed the concept while formalizing the foundations of statistical estimation, introducing concepts like sufficiency, efficiency, and maximum likelihood. The concept formalized what it means for statistics to extract all available "information" from data about an unknown parameter.

Real-world Applications

  • Statistical Inference: Calculating confidence intervals and performing hypothesis tests (e.g., Wald tests) rely on Fisher Information to estimate standard errors.
  • Experimental Design: Optimal experimental design seeks to maximize the Fisher Information matrix to yield the most precise parameter estimates for a given cost.
  • Information Geometry: Fisher Information defines a Riemannian metric (the Fisher information metric) on statistical manifolds, connecting statistics to differential geometry.
  • Machine Learning: Concepts like Natural Gradient Descent use the Fisher Information matrix to adjust gradient updates in neural networks, accounting for the geometry of the parameter space.

Related Concepts

  • Maximum Likelihood Estimation (MLE) — A method of estimating parameters of a statistical model by maximizing the likelihood function.
  • Shannon Entropy — A measure of information content in information theory, intimately connected to Fisher Information via de Bruijn's identity.
  • Confidence Intervals — Ranges of values derived from sample statistics that are likely to contain the value of an unknown population parameter.

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Fisher Information module.

Try Fisher Information on Riano →
