Diagonalization
Visualize matrix diagonalization as changing to an eigenbasis to scale coordinates.
Concept Overview
Diagonalization is the process of representing a linear transformation in its simplest possible form—a diagonal matrix—by choosing the right coordinate system. This special coordinate system is formed by the matrix's eigenvectors. In this basis, the transformation acts merely by scaling along each axis independently, without any cross-talk between the coordinates. Not all matrices can be diagonalized, but those that can provide deep insights into the behavior of the linear system they represent.
Mathematical Definition
A square matrix A is called diagonalizable if there exists an invertible matrix P and a diagonal matrix D such that:

A = P D P⁻¹

Here the columns of P are the eigenvectors of A, and the diagonal entries of D are the corresponding eigenvalues.
This formula can be interpreted as a three-step process:
- Apply P⁻¹ to change from the standard basis to the eigenvector basis.
- Apply D to scale the vectors along the new axes.
- Apply P to change the coordinates back to the standard basis.
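This three-step factorization can be verified numerically. The matrix below is a hypothetical example chosen for illustration (it has distinct real eigenvalues, so it is diagonalizable):

```python
import numpy as np

# Illustrative example matrix with distinct real eigenvalues (5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns
# are the corresponding eigenvectors -- this column matrix is P.
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

# Reconstruct A by the three-step process: P @ D @ P^-1.
A_reconstructed = P @ D @ np.linalg.inv(P)
print(np.allclose(A, A_reconstructed))  # True
```

Reading the product right to left reproduces the steps above: change into the eigenbasis, scale along each axis, and change back.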
Key Concepts
Conditions for Diagonalizability
An n×n matrix A is diagonalizable over the real numbers if and only if it has n linearly independent eigenvectors. This happens when:
- The matrix has n distinct real eigenvalues.
- For any repeated eigenvalue (an eigenvalue with algebraic multiplicity greater than 1), the dimension of its eigenspace (geometric multiplicity) is equal to its algebraic multiplicity.
- The matrix is symmetric (A = Aᵀ). The Spectral Theorem guarantees that all symmetric matrices are real-diagonalizable and their eigenvectors are orthogonal.
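The symmetric case can be seen directly with a small illustrative matrix (chosen here as an assumption, not taken from the text): `np.linalg.eigh` is specialized for symmetric matrices and returns real eigenvalues with an orthonormal eigenvector matrix.

```python
import numpy as np

# A symmetric matrix: S == S.T, so the Spectral Theorem applies.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and an orthonormal
# eigenvector matrix Q, so Q^-1 is simply Q^T.
eigenvalues, Q = np.linalg.eigh(S)
print(eigenvalues)                                      # [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))                  # True: orthonormal columns
print(np.allclose(S, Q @ np.diag(eigenvalues) @ Q.T))   # True: S = Q D Q^T
```

Because Q is orthogonal, no explicit matrix inversion is needed to undo the change of basis.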
Defective Matrices
If a matrix cannot be diagonalized, it is called a defective matrix. This occurs when an eigenvalue has a geometric multiplicity strictly less than its algebraic multiplicity, meaning there are not enough independent eigenvectors to form a basis for the space. An example is the shear matrix:

A = [ 1  1 ]
    [ 0  1 ]

It has a repeated eigenvalue λ = 1, but only one independent eigenvector, [1, 0]ᵀ. Instead of being diagonalized, such matrices can only be reduced to the Jordan Normal Form.
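The multiplicity mismatch for the shear matrix can be checked numerically, using the fact that the geometric multiplicity of λ equals n minus the rank of A − λI:

```python
import numpy as np

# The shear matrix discussed above: eigenvalue 1 has algebraic
# multiplicity 2 but a one-dimensional eigenspace.
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

eigenvalues, _ = np.linalg.eig(shear)
print(eigenvalues)  # [1. 1.] -- repeated eigenvalue

# Geometric multiplicity = dim null(A - lambda*I) = n - rank(A - lambda*I).
geometric = 2 - np.linalg.matrix_rank(shear - np.eye(2))
print(geometric)    # 1, which is less than 2: the matrix is defective
```

Since only one independent eigenvector exists, no invertible P of eigenvectors can be formed, and the factorization A = P D P⁻¹ is impossible.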
Matrix Powers
One of the most powerful applications of diagonalization is computing high powers of a matrix. Because P⁻¹P = I, the inner terms cancel out when computing Aᵏ:

Aᵏ = (P D P⁻¹)(P D P⁻¹)⋯(P D P⁻¹) = P Dᵏ P⁻¹
Since D is a diagonal matrix, Dᵏ is computed simply by raising each diagonal entry to the k-th power. Computing Dᵏ itself therefore takes only O(n log k) time (using fast exponentiation on each of the n diagonal entries), although forming the full matrix Aᵏ = P Dᵏ P⁻¹ still requires dense matrix multiplications, which are on the order of O(n³).
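A minimal sketch of this technique (the helper name and example matrix are illustrative, and it assumes A is diagonalizable):

```python
import numpy as np

def matrix_power_diag(A, k):
    """Compute A^k via the factorization A^k = P D^k P^-1.

    A sketch only: assumes A is diagonalizable and well-conditioned.
    """
    eigenvalues, P = np.linalg.eig(A)
    # Powering D only requires powering each diagonal entry --
    # n scalar exponentiations instead of k matrix multiplications.
    Dk = np.diag(eigenvalues ** k)
    return P @ Dk @ np.linalg.inv(P)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
result = matrix_power_diag(A, 10)
print(np.allclose(result, np.linalg.matrix_power(A, 10)))  # True
```

For repeated powers of the same matrix, the eigendecomposition is computed once and reused, which is where the real savings come from.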
Historical Context
The roots of diagonalization are deeply tied to the development of matrix theory and eigenvalue problems in the 19th century. Augustin-Louis Cauchy (1829) proved that symmetric matrices have real eigenvalues and orthogonal eigenvectors, effectively discovering the orthogonal diagonalization theorem.
Later, Camille Jordan (1870) generalized these concepts by introducing the Jordan Canonical Form, which demonstrated how non-diagonalizable matrices could still be factored into a nearly diagonal block structure. The explicit formulation A = PDP⁻¹ became standard in the 20th century as linear algebra was formalized and applied extensively to quantum mechanics and functional analysis.
Real-world Applications
- Dynamical Systems: Diagonalization decouples systems of linear differential equations. A system dx/dt = Ax becomes dy/dt = Dy (where y = P⁻¹x), which is trivial to solve as each variable evolves independently.
- Markov Chains: Finding the long-term steady-state probabilities of stochastic processes requires computing the limit of Tⁿ as n goes to infinity, where T is the transition matrix; this is drastically simplified by diagonalizing T.
- Quantum Mechanics: In quantum theory, diagonalizing the Hamiltonian operator matrix of a system yields its stationary states (eigenvectors) and corresponding energy levels (eigenvalues).
- Data Compression and PCA: Principal Component Analysis involves computing the covariance matrix of data and diagonalizing it. The eigenvectors become the principal axes, enabling dimensionality reduction by discarding axes with small eigenvalues (variances).
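The PCA application can be sketched concretely. The data below is synthetic (a hypothetical 2-D dataset generated for illustration): it is nearly one-dimensional, so one eigenvector of the covariance matrix captures almost all of the variance.

```python
import numpy as np

# Synthetic 2-D data lying close to a line: y ~ 2x plus small noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=500)])

# Diagonalize the (symmetric) covariance matrix; the eigenvectors
# are the principal axes and the eigenvalues are the variances.
cov = np.cov(data, rowvar=False)
eigenvalues, axes = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The largest-eigenvalue axis captures nearly all the variance, so the
# data can be compressed to 1-D by projecting onto that single axis.
ratio = eigenvalues[-1] / eigenvalues.sum()
print(ratio > 0.99)  # True: one principal axis explains >99% of variance
```

Discarding the small-eigenvalue axis is exactly the dimensionality reduction described above: coordinates along it carry almost no information.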
Related Concepts
- Eigenvalues and Eigenvectors — the fundamental mathematical objects that enable diagonalization
- Change of Basis — interpreting the matrix P and its inverse geometrically as transitioning between the standard and eigenvector coordinate systems
- Markov Chains — an application area where computing matrix powers (Aⁿ) via diagonalization reveals steady-state behavior
Experience it interactively
Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Diagonalization module.
Try Diagonalization on Riano →