Correlation & Covariance
Learn how covariance and correlation measure relationships between random variables and how they differ.
Concept Overview
Covariance is a measure of how much two random variables vary together. If greater values of one variable mainly correspond with greater values of the other variable, the covariance is positive. Correlation is a normalized version of covariance that standardizes the measure to a scale between -1 and 1, making it easier to interpret the strength of the linear relationship regardless of the variables' units.
Mathematical Definition
For two jointly distributed real-valued random variables X and Y, the covariance is defined as the expected value of the product of their deviations from their individual expected values:
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y
The Pearson correlation coefficient (ρ) is obtained by dividing the covariance by the product of the standard deviations of X and Y:

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)
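The definitions above can be checked numerically. The sketch below (using NumPy with arbitrary simulated data) computes the covariance directly from the deviation form and then normalizes it into the Pearson coefficient, cross-checking both against NumPy's built-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)  # linearly related, plus noise

# Covariance from the definition: E[(X - mu_X)(Y - mu_Y)]
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Pearson correlation: covariance divided by the product of std deviations
rho = cov_xy / (x.std() * y.std())

# Cross-check against NumPy (ddof=0 matches the E[...] average above)
assert np.isclose(cov_xy, np.cov(x, y, ddof=0)[0, 1])
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])
```

Note that `rho` stays between −1 and 1 regardless of how `x` or `y` are scaled, while `cov_xy` changes with the units of the data.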
Key Concepts
- Direction of Relationship: A positive covariance or correlation means that as X increases, Y tends to increase. A negative value means that as X increases, Y tends to decrease.
- Scale of Measurement: Covariance is unbounded and depends on the units of X and Y. Correlation is always between -1 and 1, where 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship.
- Correlation does not imply causation: Just because two variables move together does not mean one causes the other. They could both be influenced by a confounding third variable.
- Linearity constraint: Pearson correlation only measures the strength of linear relationships. Two variables can have zero correlation but still have a strong non-linear dependency (e.g., Y = X² for a symmetric distribution of X).
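The linearity constraint is easy to demonstrate. In this sketch, Y = X² is fully determined by X, yet for X drawn from a symmetric distribution the covariance Cov(X, X²) = E[X³] = 0, so the Pearson correlation comes out near zero:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)  # symmetric around zero
y = x ** 2                    # deterministic, but non-linear, dependence

# Cov(X, X^2) = E[X^3] = 0 for a symmetric distribution, so Pearson
# correlation is (approximately) zero despite complete dependence.
rho = np.corrcoef(x, y)[0, 1]
print(abs(rho) < 0.05)  # True: near-zero correlation, strong dependency
```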
Historical Context
The concepts of correlation and regression were first conceptualized by Sir Francis Galton in the late 19th century, while studying the relationship between the heights of parents and their adult children. He observed that extreme characteristics in parents tended to "regress" towards the mean in their offspring.
Karl Pearson, building upon Galton's work, formalized the mathematical definition of correlation in 1895, creating the Pearson product-moment correlation coefficient that is still widely used today.
Real-world Applications
- Finance: Calculating the covariance matrix of different assets is crucial for Modern Portfolio Theory to optimize returns while minimizing risk through diversification.
- Machine Learning: Feature selection algorithms often use correlation to identify redundant features (highly correlated independent variables) or informative features (highly correlated with the target variable).
- Genetics: Understanding the correlation between specific genetic markers and phenotypic traits.
- Meteorology: Weather forecasting models rely heavily on analyzing the correlations among atmospheric pressure, humidity, and temperature.
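The feature-selection use case can be sketched with a correlation matrix. The feature names and data below are hypothetical: `f2` is constructed as a near-copy of `f1` (redundant), `f3` is pure noise, and the target depends mostly on `f1`:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical features: f2 is nearly a rescaled copy of f1,
# f3 is independent noise, and the target depends mostly on f1.
f1 = rng.normal(size=n)
f2 = 3.0 * f1 + rng.normal(scale=0.1, size=n)
f3 = rng.normal(size=n)
target = f1 + 0.5 * rng.normal(size=n)

# Rows/columns: f1, f2, f3, target
corr = np.corrcoef([f1, f2, f3, target])

# corr[0, 1] near 1 -> f1 and f2 are redundant; keep only one of them
# corr[2, 3] near 0 -> f3 carries little linear signal about the target
```

In practice a selection rule might drop one feature from any pair whose pairwise correlation exceeds a chosen threshold, and discard features whose correlation with the target is negligible.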
Related Concepts
- Regression to the Mean — How extreme values relate to expected values on subsequent measurements.
- Linear Regression — Modeling the relationship between a dependent variable and one or more independent variables.
- Principal Component Analysis (PCA) — A dimensionality reduction technique that relies heavily on the covariance matrix.
Experience it interactively
Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Correlation & Covariance module.
Try Correlation & Covariance on Riano →