Benford's Law

Visualize how the leading digits of many real-life numerical datasets follow a logarithmic distribution.

Concept Overview

Benford's Law, also known as the Newcomb-Benford Law or the law of anomalous numbers, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. Counterintuitively, the leading digit is more likely to be small than large. For example, the digit 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time.

Mathematical Definition

A set of numbers satisfies Benford's Law if the leading digit d (where d ∈ {1, ..., 9}) occurs with probability:

P(d) = log10(d + 1) - log10(d) = log10(1 + 1/d)

The formula gives probabilities that decrease logarithmically as d grows, rather than a uniform 1/9 (about 11.1%) for every digit. In practice, the law tends to hold only when the numbers are spread across several orders of magnitude.
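
As a quick check of the formula, the following Python sketch tabulates P(d) for each leading digit. The function name `benford_probability` is illustrative only and is not part of any library.

```python
import math

def benford_probability(d):
    """Probability that the leading decimal digit equals d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Tabulate the distribution for all nine possible leading digits.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.3f}")

# The terms telescope, so the probabilities sum to log10(10) = 1.
print("total:", round(sum(probabilities.values()), 12))
```

The first row prints 0.301 and the last prints 0.046, matching the "about 30%" and "less than 5%" figures quoted in the overview.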

Key Concepts

  • Scale Invariance: If a dataset satisfies Benford's Law, multiplying all values by a non-zero constant does not change the distribution of the leading digits. This means the law works regardless of whether you measure lengths in meters or feet.
  • Base Invariance: Benford's Law can be generalized to any base b ≥ 2. For leading digits d ∈ {1, ..., b − 1}, the formula becomes P(d) = log_b(1 + 1/d).
  • Multiple Orders of Magnitude: Datasets that follow Benford's Law typically span multiple orders of magnitude. Datasets constrained within a tight range (e.g., human heights) usually do not follow the law.
  • Beyond the First Digit: Benford's Law can also be applied to the second and later digits, or to combinations of digits. As you move to later digits, the distribution approaches a uniform distribution.
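
The properties above can be explored numerically. The sketch below is an illustration under stated assumptions, not a proof: it uses the first 5000 powers of 2 as a stand-in for a Benford-like dataset, multiplies them by an arbitrary constant (7) to mimic a change of units, and evaluates both the generalized-base formula and the standard second-digit formula, which sums log10(1 + 1/(10·d1 + d2)) over all first digits d1. All function names are hypothetical.

```python
import math

def first_digit_probs(base=10):
    """Benford probabilities for leading digits 1..base-1 in the given base."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

def second_digit_probs():
    """Base-10 probabilities for the second significant digit (0 through 9)."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }

def leading_digit_freqs(values):
    """Observed share of each leading digit among positive integers."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[int(str(v)[0])] += 1
    return {d: c / len(values) for d, c in counts.items()}

def max_gap(freqs, expected):
    """Largest absolute difference between observed and expected shares."""
    return max(abs(freqs[d] - expected[d]) for d in expected)

# Assumption: powers of 2 serve as illustrative Benford-like data.
data = [2**n for n in range(5000)]
rescaled = [7 * v for v in data]  # 7 is an arbitrary stand-in for a unit change

expected = first_digit_probs(10)
gap_original = max_gap(leading_digit_freqs(data), expected)
gap_rescaled = max_gap(leading_digit_freqs(rescaled), expected)
print(f"max gap before rescaling: {gap_original:.4f}")
print(f"max gap after rescaling:  {gap_rescaled:.4f}")

# Second-digit probabilities are much flatter than first-digit ones.
second = second_digit_probs()
print({d2: round(p, 4) for d2, p in second.items()})

# The same first-digit formula in base 8 covers digits 1 through 7.
octal = first_digit_probs(8)
print({d: round(p, 4) for d, p in octal.items()})
```

Both gaps should be small, which is consistent with scale invariance, while the second-digit probabilities range only from roughly 0.12 (for 0) down to roughly 0.085 (for 9).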

Historical Context

The phenomenon was first noted in 1881 by astronomer Simon Newcomb, who noticed that the earlier pages of logarithm tables (those for numbers starting with 1) were much more worn than the later pages. However, his observation went largely unnoticed. In 1938, physicist Frank Benford independently rediscovered the phenomenon and tested it empirically across various datasets, including the surface areas of 335 rivers, 3259 US population figures, and 104 physical constants. Benford's paper drew far more attention, and the law came to bear his name.

Real-world Applications

  • Fraud Detection: Forensic accountants and auditors use Benford's Law to detect anomalies in financial statements, tax returns, and expense reports. Fabricated figures often have leading digits that are spread more evenly than Benford's Law predicts, so a marked deviation can flag records for closer review, although it is not proof of fraud on its own.
  • Election Forensics: Analyzing the leading digits of vote counts across precincts can sometimes help identify potential election fraud, though the applicability here is a subject of debate among statisticians.
  • Data Quality Control: Benford's Law acts as a sanity check for large scientific or socio-economic datasets to ensure the data was not artificially manipulated or generated.
  • Computer Science: Knowing the expected distribution of leading digits can inform data compression algorithms or estimates of memory allocation needs.
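
One common way to put the fraud-detection and data-quality ideas into practice is a chi-square goodness-of-fit test on leading-digit counts. The sketch below is a minimal illustration, not an audit procedure: the helper names are hypothetical, the inputs are assumed to be non-zero integers (for example, amounts in cents), and both sample datasets are synthetic. The threshold 15.507 is the standard 5% critical value of the chi-square distribution with 8 degrees of freedom (nine digit categories minus one).

```python
import math

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF8_5PCT = 15.507

def benford_chi_square(values):
    """Chi-square statistic comparing leading-digit counts to Benford's Law.

    Assumes `values` are non-zero integers.
    """
    counts = [0] * 10  # index 0 is unused; digits 1..9 are counted
    for v in values:
        counts[int(str(abs(v))[0])] += 1
    n = len(values)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Synthetic example 1: powers of 2, a classic Benford-like sequence.
benford_like = [2**n for n in range(1000)]

# Synthetic example 2: every three-digit number once, so each leading
# digit appears equally often (the pattern expected from naive fabrication).
evenly_spread = list(range(100, 1000))

for name, dataset in [("powers of 2", benford_like),
                      ("evenly spread", evenly_spread)]:
    stat = benford_chi_square(dataset)
    verdict = "deviates" if stat > CHI2_CRITICAL_DF8_5PCT else "consistent"
    print(f"{name}: chi-square = {stat:.2f} ({verdict} at the 5% level)")
```

A statistic above the threshold only indicates that the leading digits deviate from Benford's Law; it does not by itself explain why, so flagged data still needs manual review.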

Related Concepts

  • Probability Distributions: Benford's Law is a specific discrete probability distribution over the digits 1 through 9.
  • Law of Large Numbers: For values drawn independently from a distribution whose leading digits follow Benford's Law, the empirical leading-digit frequencies converge to the Benford probabilities as the sample grows.
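
The link to the Law of Large Numbers can be seen in a small simulation. The sketch below assumes a specific source distribution chosen for convenience: values of the form 10**U, with U uniform over six whole decades, have leading digits that follow Benford's Law exactly. The function name, seed, and sample sizes are arbitrary choices for illustration.

```python
import math
import random

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def max_deviation(sample_size, rng):
    """Largest gap between observed and Benford leading-digit frequencies."""
    counts = {d: 0 for d in range(1, 10)}
    for _ in range(sample_size):
        # Assumption: 10**U with U uniform on [0, 6] is the data source;
        # its leading digits are Benford-distributed.
        value = 10 ** rng.uniform(0, 6)
        # Values lie in [1, 1e6], so str() never uses exponent notation
        # and the first character is the leading digit.
        counts[int(str(value)[0])] += 1
    return max(abs(counts[d] / sample_size - BENFORD[d]) for d in BENFORD)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for size in (100, 1_000, 10_000, 100_000):
    print(f"n = {size:>7}: max deviation = {max_deviation(size, rng):.4f}")
```

The deviation fluctuates from run to run for small samples but tends to shrink roughly in proportion to 1/sqrt(n) as the sample size grows.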
