PCA (Dimensionality Reduction)
Find the principal axes that maximize variance and project data into a lower-dimensional space.
Principal Component Analysis (PCA)
Concept Overview
Principal Component Analysis (PCA) is a foundational technique in machine learning and statistics used for dimensionality reduction. It works by transforming a dataset with many possibly correlated variables into a smaller set of uncorrelated variables, called principal components. These components are chosen such that the first principal component captures the maximum possible variance in the data, the second captures the maximum remaining variance orthogonal to the first, and so on. This allows complex datasets to be visualized, compressed, and modeled efficiently while retaining most of the important structural information.
Mathematical Definition
Given a centered dataset X (an n × d matrix where n is the number of samples and d is the number of features), PCA seeks a set of orthogonal unit vectors (weights) w that maximize the variance of the projected data Xw. The first component solves w₁ = argmax over unit vectors w of wᵀΣw, where Σ = XᵀX / (n − 1) is the sample covariance matrix.
The solution to this optimization problem relies on finding the eigenvectors and eigenvalues of the covariance matrix Σ. The eigenvectors correspond to the principal components (directions), and the eigenvalues represent the variance of the data along those respective directions.
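The eigendecomposition described above can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data (all variable names are ours, not from the text): center the data, form the covariance matrix, take its eigenvectors sorted by eigenvalue, and project onto the top components.

```python
import numpy as np

# Synthetic dataset with correlated features (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

Xc = X - X.mean(axis=0)                  # 1. center the data
cov = (Xc.T @ Xc) / (len(Xc) - 1)        # 2. covariance matrix Sigma (d x d)
eigvals, eigvecs = np.linalg.eigh(cov)   # 3. eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]        # 4. sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
Z = Xc @ eigvecs[:, :k]                  # 5. project onto top-k components
```

The eigenvalues now give the variance captured along each component, so `eigvals[:k].sum() / eigvals.sum()` is the fraction of total variance retained by the projection.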
Key Concepts
Variance and Information
In the context of PCA, variance is synonymous with information. A direction with high variance means the data points are spread out along that axis, making it easier to distinguish between different observations. A direction with low variance contains less discriminative information and is often dominated by noise.
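A quick numerical check of this idea, using NumPy on synthetic data with deliberately different spreads per axis: the variance of the data projected onto a unit direction w is exactly wᵀΣw, so high-variance directions are precisely the ones PCA ranks first.

```python
import numpy as np

# Synthetic data: one axis with large spread, one with almost none
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

w = np.array([1.0, 0.0, 0.0])            # unit direction along the wide axis
proj_var = np.var(Xc @ w, ddof=1)        # variance of the 1-D projection
assert np.isclose(proj_var, w @ cov @ w) # identity: Var(Xw) = w' Sigma w
```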
Orthogonality
Each principal component is constrained to be orthogonal (perpendicular) to all previous components. This ensures that each new component captures a completely independent (uncorrelated) dimension of the data's variance, preventing redundancy.
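This uncorrelatedness can be verified directly: projecting centered data onto all eigenvectors of its covariance matrix yields scores whose covariance matrix is diagonal (to numerical precision). A NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs                     # project onto all components
score_cov = np.cov(scores, rowvar=False)  # covariance of the projected data

# Off-diagonal entries vanish: the components are uncorrelated
off_diag = score_cov - np.diag(np.diag(score_cov))
```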
Dimensionality Reduction vs. Feature Selection
Unlike feature selection techniques which simply discard original features, PCA creates entirely new features that are linear combinations of all original features. This makes it a feature extraction technique. While it reduces the number of dimensions, the resulting components often lack direct physical interpretation.
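The "linear combinations of all original features" point can be illustrated with scikit-learn, assuming it is available (the dataset is synthetic): each PCA score can be reproduced by hand as a weighted sum of every centered original feature, with the weights taken from the fitted components.

```python
import numpy as np
from sklearn.decomposition import PCA  # assumes scikit-learn is installed

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)

# Each new feature mixes ALL original features, unlike feature selection,
# which would keep a subset of columns of X unchanged:
Z_manual = (X - pca.mean_) @ pca.components_.T
assert np.allclose(Z, Z_manual)
```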
Historical Context
PCA was invented by Karl Pearson in 1901 and independently developed by Harold Hotelling in the 1930s. Pearson described it as finding "lines and planes of closest fit to systems of points in space." Hotelling coined the term "principal components" while working on educational psychology and test scoring, aiming to define a smaller set of fundamental psychological traits from numerous test scores.
Despite predating modern computing by decades, PCA remains one of the most widely used algorithms today, forming the basis for techniques like Eigenfaces in early computer vision and Latent Semantic Analysis in natural language processing.
Real-world Applications
- Data Visualization: Reducing high-dimensional datasets (like gene expression data or word embeddings) down to 2D or 3D for human inspection.
- Image Compression: Representing images using only the most significant principal components, dramatically reducing storage size while maintaining recognizable visual features.
- Noise Filtering: By discarding principal components associated with very small variances, random noise in sensor data can be effectively filtered out.
- Preprocessing for Machine Learning: Speeding up training times and reducing overfitting by feeding models a smaller, uncorrelated set of features instead of the raw data.
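The noise-filtering application can be sketched with NumPy: a synthetic rank-one signal is corrupted with noise, and keeping only the top principal component (here computed via SVD of the centered data) yields a reconstruction closer to the clean signal than the noisy input. This is an illustrative sketch on made-up data, not any particular library's denoising routine.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 100)
clean = np.outer(rng.normal(size=50), np.sin(2 * np.pi * t))  # rank-1 signal
noisy = clean + 0.1 * rng.normal(size=clean.shape)            # add noise

Xc = noisy - noisy.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 1                                                          # keep top component
denoised = (U[:, :k] * S[:k]) @ Vt[:k] + noisy.mean(axis=0)

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

Discarding the low-variance components removes most of the noise because the noise is spread evenly across all directions, while the signal is concentrated in the first.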
Related Concepts
- Linear Transformations — PCA relies heavily on matrix operations, eigenvectors, and singular value decomposition (SVD).
- K-Means Clustering — PCA is often used as a preprocessing step before applying K-Means to alleviate the curse of dimensionality.
- Autoencoders — In neural networks, a linear autoencoder trained to minimize reconstruction error learns the same subspace as PCA, though not necessarily the orthonormal components themselves.
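The SVD connection mentioned above can be checked numerically: the right singular vectors of the centered data matrix equal the eigenvectors of its covariance matrix (up to sign), and the squared singular values divided by n − 1 equal the eigenvalues. A NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Squared singular values / (n - 1) are the eigenvalues...
assert np.allclose(S**2 / (len(Xc) - 1), eigvals)
# ...and right singular vectors match the eigenvectors up to sign:
for i in range(3):
    assert np.allclose(Vt[i], eigvecs[:, i]) or np.allclose(Vt[i], -eigvecs[:, i])
```

In practice, libraries typically use the SVD route because it is numerically more stable than explicitly forming the covariance matrix.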
Experience it interactively
Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive PCA (Dimensionality Reduction) module.
Try PCA (Dimensionality Reduction) on Riano →