Least Squares Approximation
Explore how least squares approximation fits a line to data points by minimizing the sum of squared residuals.
Concept Overview
Least Squares Approximation is a standard approach in regression analysis used to find the best-fitting curve (often a line) for a given set of data points. It works by minimizing the sum of the squares of the offsets or "residuals" (the differences between the observed values and the corresponding fitted values). In linear algebra, it provides a way to solve overdetermined systems of equations where no exact solution exists.
Mathematical Definition
Given a system of linear equations Ax = b where A is an m × n matrix (with m > n, meaning more equations than unknowns), there is typically no exact solution. Instead, we seek a vector x that minimizes the length of the error vector e = b - Ax.
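As a minimal sketch (using NumPy and made-up data points), fitting a line y = c0 + c1·x to four observations gives an overdetermined 4 × 2 system that np.linalg.lstsq solves in the least squares sense:

```python
import numpy as np

# Four data points (m = 4 equations) but only two unknowns (n = 2):
# the intercept c0 and slope c1 of the line y = c0 + c1 * x.
x = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.1, 2.9, 4.2])  # observed values (illustrative)

# Design matrix A: row i is [1, x_i], so A @ [c0, c1] should approximate b.
A = np.column_stack([np.ones_like(x), x])

# lstsq returns the x that minimizes ||b - A x||^2.
coeffs, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
c0, c1 = coeffs
print(c0, c1)  # fitted intercept and slope
```

No single line passes through all four points exactly; lstsq returns the line whose total squared vertical error is smallest.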
Key Concepts
Orthogonal Projection
Geometrically, the least squares solution x̂ is the vector for which Ax̂ is the orthogonal projection of b onto the column space of A. The error vector e = b - Ax̂ is orthogonal to that column space. This orthogonality leads directly to the normal equations: Aᵀe = 0, which expands to AᵀAx̂ = Aᵀb.
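This orthogonality is easy to verify numerically. A sketch (NumPy, illustrative data): solve the normal equations AᵀAx̂ = Aᵀb directly, then check that Aᵀe vanishes.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.1, 2.9, 4.2])

# Normal equations: (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The error vector e = b - A x_hat is orthogonal to Col(A),
# so A^T e should be (numerically) the zero vector.
e = b - A @ x_hat
print(A.T @ e)  # approximately [0, 0]
```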
Residuals
The difference between an observed value and the value predicted by the model is called a residual. The method minimizes the sum of squared residuals (SSE). Squaring the residuals ensures that positive and negative errors don't cancel out, and it strongly penalizes large errors (outliers).
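With the same illustrative data, the residuals and their sum of squares can be computed directly (the predicted values here come from a hypothetical fitted line y = 0.99 + 1.04x):

```python
import numpy as np

observed = np.array([1.0, 2.1, 2.9, 4.2])
predicted = np.array([0.99, 2.03, 3.07, 4.11])  # from y = 0.99 + 1.04 x (illustrative)

# Residual = observed - predicted, for each data point.
residuals = observed - predicted

# Squaring prevents positive and negative residuals from canceling.
sse = np.sum(residuals ** 2)
print(residuals, sse)
```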
R-squared (Coefficient of Determination)
R² is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model. An R² of 1 indicates that the regression predictions perfectly fit the data.
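One common way to compute R² is 1 − SSE/SST, where SST is the total sum of squares about the mean of the observations. A sketch with the same illustrative numbers as above:

```python
import numpy as np

observed = np.array([1.0, 2.1, 2.9, 4.2])
predicted = np.array([0.99, 2.03, 3.07, 4.11])  # illustrative fitted values

sse = np.sum((observed - predicted) ** 2)        # residual sum of squares
sst = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
r_squared = 1.0 - sse / sst
print(r_squared)  # close to 1, so the line explains most of the variance
```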
Historical Context
The method of least squares was first described by Adrien-Marie Legendre in 1805, who introduced it as an algebraic procedure for fitting linear equations to data. However, Carl Friedrich Gauss claimed to have developed the method in 1795, though he did not publish it until 1809.
Gauss used the method to successfully predict the location of the asteroid Ceres after it emerged from behind the sun in 1801. This spectacular success cemented the method's reputation and led to its widespread adoption in astronomy and geodesy.
Real-world Applications
- Machine Learning: Linear regression, which relies on least squares, is a foundational algorithm for predicting continuous values based on historical data.
- Economics: Econometricians use least squares to model relationships between economic variables, such as the effect of interest rates on inflation.
- Signal Processing: Filtering and system identification often involve minimizing the mean square error between a true signal and an estimate.
- Surveying and Geodesy: Adjusting observations to find the most probable locations of points on the Earth's surface.
Related Concepts
- Linear Transformations — Linear regression is a transformation that maps inputs to predicted outputs.
- Gram-Schmidt Process — Orthogonalization methods can provide numerically stable ways to solve least squares problems (e.g., QR decomposition).
- Vector Spaces — Understanding column spaces is essential for grasping the geometric meaning of least squares.
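The QR decomposition mentioned above gives a numerically stable alternative to the normal equations: writing A = QR with Q having orthonormal columns, minimizing ||b − Ax|| reduces to the triangular system Rx̂ = Qᵀb. A sketch with the same illustrative data:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.1, 2.9, 4.2])

# Reduced QR factorization: Q has orthonormal columns, R is upper triangular.
Q, R = np.linalg.qr(A)

# Since Q^T Q = I, the problem reduces to R x_hat = Q^T b,
# avoiding the explicit (and ill-conditioned) product A^T A.
x_hat = np.linalg.solve(R, Q.T @ b)
print(x_hat)  # same solution as the normal equations
```

Avoiding the product AᵀA matters because its condition number is the square of A's, so QR loses far less precision on nearly collinear data.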
Experience it interactively
Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Least Squares Approximation module.
Try Least Squares Approximation on Riano →