Chi-Squared Test

Visualize the Chi-Squared goodness-of-fit test by comparing observed and expected frequencies.

Concept Overview

The Chi-Squared (χ²) test is a statistical method for deciding whether the frequencies observed in a set of categories differ significantly from the frequencies expected under a hypothesis. It measures how well a theoretical distribution fits the empirical data, helping us decide whether the discrepancies are small enough to attribute to chance or large enough to indicate a real departure from the expected distribution.

Mathematical Definition

The test statistic is calculated by summing the squared differences between observed and expected frequencies, divided by the expected frequencies:

χ² = Σ_{i=1}^{k} [ (O_i - E_i)² / E_i ]

Where:

O_i = Observed frequency for category i

E_i = Expected frequency for category i

k = Number of categories

Degrees of Freedom:

df = k - 1 (for a goodness-of-fit test in which no parameters are estimated from the data)

We compare the calculated χ² statistic to a critical value from the Chi-Squared distribution table, chosen according to the degrees of freedom and the significance level (α). If the statistic is larger than the critical value, we reject the null hypothesis (H₀) that the observed frequencies follow the expected distribution; otherwise we fail to reject it.
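
As a minimal sketch of the calculation described above, the following Python snippet computes the statistic for a hypothetical six-sided die rolled 60 times. The counts, the function name chi_squared_statistic, and the choice α = 0.05 are assumptions made for illustration. The critical value 11.070 is the standard table entry for df = 5 at α = 0.05 and is hard-coded rather than computed.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, with H0 = "the die is fair".
observed = [8, 9, 12, 11, 6, 14]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face

statistic = chi_squared_statistic(observed, expected)
df = len(observed) - 1

# Table value for df = 5 at alpha = 0.05 (looked up, not computed here).
CRITICAL_VALUE = 11.070

reject_h0 = statistic > CRITICAL_VALUE
print(f"chi2 = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up counts the statistic is 4.2, well below 11.070, so the sketch fails to reject the hypothesis of a fair die.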

Key Concepts

Goodness-of-Fit vs. Test of Independence

The Chi-Squared test has two main variations:

Goodness-of-Fit: Tests if a single categorical variable matches a specific expected distribution (like the one in the visualization).
Test of Independence: Tests whether two categorical variables are independent of each other, usually organized in a contingency table.
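
For the test of independence, the expected count of each cell is usually derived from the table margins as (row total × column total) / N, and the degrees of freedom are (rows - 1) × (columns - 1). The sketch below illustrates this under stated assumptions: the 2×2 table is hypothetical, the function names are invented for this example, and no continuity correction is applied.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Chi-squared statistic summed over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical contingency table:
# rows = page version (A, B), columns = (clicked, did not click).
table = [
    [30, 70],
    [45, 55],
]

expected = expected_counts(table)
statistic = independence_statistic(table)
df = (len(table) - 1) * (len(table[0]) - 1)
print(expected, round(statistic, 3), df)
```

Here every expected count is 37.5 or 62.5, the statistic is about 4.8, and df = 1. Since the table critical value for df = 1 at α = 0.05 is 3.841, this made-up table would lead to rejecting independence at that level.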

Assumptions and Limitations

For the test to be valid, certain assumptions must be met:

The data must be raw counts (frequencies), not percentages or proportions.
Observations must be independent.
Categories must be mutually exclusive.
The expected frequency (E_i) in each category should ideally be at least 5, because the Chi-Squared approximation behind the p-value becomes unreliable when expected counts are small.
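
Two of these assumptions can be checked mechanically before running the test. The helper below is a rough sketch: the name check_assumptions and the min_expected parameter are hypothetical. Independence of observations and mutual exclusivity of categories depend on how the data were collected, so they cannot be verified from the counts alone.

```python
def check_assumptions(observed, expected, min_expected=5):
    """Return a list of warnings; an empty list means no problem was detected."""
    problems = []
    # Raw counts should be non-negative integers, not proportions.
    if any(not isinstance(o, int) or o < 0 for o in observed):
        problems.append("observed values must be non-negative integer counts")
    # Flag categories whose expected frequency falls below the rule of thumb.
    small = [i for i, e in enumerate(expected) if e < min_expected]
    if small:
        problems.append(
            f"expected frequency below {min_expected} in categories {small}"
        )
    return problems


# Hypothetical inputs: the third category has an expected count of only 2.
print(check_assumptions([12, 7, 1], [10.0, 8.0, 2.0]))
# Proportions instead of counts trigger both warnings.
print(check_assumptions([0.6, 0.4], [0.5, 0.5]))
```

A common remedy when a category is flagged is to merge it with a neighbouring category so that the combined expected count reaches the threshold.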

Historical Context

The Chi-Squared test was introduced by Karl Pearson in 1900 as a formal mathematical tool to assess "goodness of fit." It arose during the foundational period of modern statistics when researchers needed objective methods to evaluate how well their theoretical probability models described real-world data, such as biological traits and physical measurements.

Real-world Applications

Genetics: Verifying if the observed traits in offspring match the expected Mendelian inheritance ratios.
Market Research: Analyzing survey data to see if consumer preferences are uniform or if certain demographics prefer specific products.
Quality Control: Checking if the distribution of defects in a manufacturing process matches the expected historical distribution.
A/B Testing: Evaluating if user behavior (like clicking or not clicking) is dependent on the version of a webpage shown.

Related Concepts

Hypothesis Testing — the overarching framework that uses test statistics and p-values to make decisions about population parameters.
Probability Distributions — specifically the Chi-Squared distribution, which models the sum of squared independent standard normal variables.
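
The description of the Chi-Squared distribution as a sum of squared independent standard normal variables can be explored with a small simulation. This is only an illustrative sketch: the function name, the seed, and the sample size are arbitrary choices. It relies on the fact that a Chi-Squared distribution with df degrees of freedom has mean df.

```python
import random


def simulate_chi_squared(df, n_samples, seed=0):
    """Draw samples by summing df squared standard normal variates."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    ]


samples = simulate_chi_squared(df=3, n_samples=20000)
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f} (theoretical mean: 3)")
```

The sample mean should land close to 3, and increasing n_samples tightens the agreement.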
