Chapter 19
SVD (Singular Value Decomposition)
Every matrix -- every transformation -- can be broken into three simple steps: rotate, scale along the axes, rotate again. That's the SVD, and it's the most powerful decomposition in linear algebra.
We've built up a rich vocabulary for understanding transformations. We know how matrices stretch, rotate, shear, and collapse space. We've seen how eigenvalues reveal the scaling along special directions, how orthogonal matrices preserve shape, and how rank tells us what survives. But eigenvectors have a limitation: not every matrix has enough of them, and when the matrix isn't square, eigenvalues don't even apply.
The Singular Value Decomposition (SVD) has no such limitation. It works for every matrix -- square or rectangular, full rank or rank-deficient, symmetric or not. And it answers the most fundamental geometric question you can ask about a transformation: what does it do to the unit circle?
The unit circle becomes an ellipse
Take any 2x2 matrix and apply it to every point on the unit circle. The result is always an ellipse (which could be degenerate -- collapsed to a line segment or a point, but let's start with the full case).
Consider the matrix:

$$A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$$
Every point $\mathbf{x}$ on the unit circle ($\|\mathbf{x}\| = 1$) gets mapped to a new point $A\mathbf{x}$. The collection of all those output points forms an ellipse. The SVD tells you everything about that ellipse: how long its axes are, and which directions they point.
The dashed circle is the unit circle -- all vectors of length 1. The blue ellipse is where those vectors land after applying $A$. Dashed lines connect a few sample points to their images. The ellipse is tilted and stretched: some directions got amplified more than others. The SVD explains exactly which directions and by exactly how much.
The shape of this ellipse encodes everything about the transformation. The length of the longest axis tells you the maximum stretching. The length of the shortest axis tells you the minimum. The directions of the axes tell you where the stretching happens. These are precisely the singular values and singular vectors.
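You can see this ellipse numerically by sampling the unit circle and applying the matrix to each point. A minimal sketch, assuming the example matrix is A = [[3, 1], [1, 2]] (any 2x2 matrix works): the largest and smallest distances of the image points from the origin approximate the semi-axis lengths of the ellipse.

```javascript
// Sample the unit circle, apply A, and record the extreme stretch factors.
// A is an assumed concrete example; substitute any 2x2 matrix.
const A = [[3, 1], [1, 2]];

function apply(M, x, y) {
  return [M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y];
}

let maxStretch = 0;
let minStretch = Infinity;
const SAMPLES = 3600;
for (let i = 0; i < SAMPLES; i++) {
  const t = (2 * Math.PI * i) / SAMPLES;
  const [px, py] = apply(A, Math.cos(t), Math.sin(t));
  const r = Math.hypot(px, py); // distance of the image point from the origin
  if (r > maxStretch) maxStretch = r;
  if (r < minStretch) minStretch = r;
}
// maxStretch approximates the largest singular value (about 3.62);
// minStretch approximates the smallest (about 1.38)
```

No SVD routine is needed here: brute-force sampling recovers the semi-axis lengths directly, which is exactly what the singular values are.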
The three-step decomposition
The SVD says that any matrix $A$ can be factored as:

$$A = U \Sigma V^T$$
where $U$ is a rotation (or reflection), $\Sigma$ is a diagonal scaling, and $V^T$ is another rotation (or reflection). Each step is simple on its own. Together, they produce the full transformation.
For our matrix $A$, the decomposition works out to singular values $\sigma_1 \approx 3.62$ and $\sigma_2 \approx 1.38$. Watch what happens to the unit circle at each stage:
Three steps, each simple. Step 1 ($V^T$): rotate the input space -- the circle stays a circle, but coordinates are aligned to special directions. Step 2 ($\Sigma$): scale along the axes -- the circle stretches into an axis-aligned ellipse, with semi-axis lengths $\sigma_1$ and $\sigma_2$. Step 3 ($U$): rotate the ellipse into its final orientation. The result is the same ellipse as applying $A$ directly.
This is the deep insight: no matter how complicated a transformation looks, it's secretly just rotate-scale-rotate. The rotations are rigid (they preserve shape), so all the "interesting" geometry -- the stretching and compressing -- happens in the middle step. The singular values $\sigma_1$ and $\sigma_2$ are the stretching factors, and they capture the transformation's essential character.
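You can also run the factorization in the forward direction: pick two rotations and a diagonal scaling, multiply them, and out comes a general matrix. The angles and scale factors below are arbitrary illustrative choices; one quick sanity check is the determinant, which must equal the product of the scales, since rotations have determinant 1.

```javascript
// Build a matrix as rotate-scale-rotate: A = U * Sigma * V^T.
// The angles and singular values here are arbitrary illustrative choices.
function rot(t) {
  return [[Math.cos(t), -Math.sin(t)], [Math.sin(t), Math.cos(t)]];
}
function matmul(X, Y) {
  return [
    [X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
    [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]],
  ];
}

const U = rot(0.7);               // output rotation
const Sigma = [[3, 0], [0, 0.5]]; // scale along the axes
const Vt = rot(-1.2);             // input rotation (already transposed)

const A = matmul(matmul(U, Sigma), Vt);

// Rotations preserve determinant 1, so det(A) = 3 * 0.5 = 1.5
const detA = A[0][0] * A[1][1] - A[0][1] * A[1][0];
```

The product A looks like a generic, tilted transformation, but by construction its singular values are exactly 3 and 0.5.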
Singular values as semi-axes
The singular values $\sigma_1$ and $\sigma_2$ are the semi-axis lengths of the output ellipse. The directions of those axes come from the columns of $U$ (the "output" singular vectors). The directions on the input circle that map to those axes come from the columns of $V$ (the "input" singular vectors).
For our matrix, $\sigma_1 \approx 3.62$ is the semi-major axis (the most stretching), and $\sigma_2 \approx 1.38$ is the semi-minor axis (the least stretching). The first column of $U$, $\mathbf{u}_1$, points along the major axis. The second column, $\mathbf{u}_2$, points along the minor axis. On the input side, $\mathbf{v}_1$ is the direction that gets stretched the most, and $\mathbf{v}_2$ gets stretched the least.
Left: the unit circle with the input singular vectors $\mathbf{v}_1$ (green) and $\mathbf{v}_2$ (purple) -- the directions on the input that get mapped to the ellipse axes. Right: the output ellipse with the output singular vectors $\mathbf{u}_1$ (blue) and $\mathbf{u}_2$ (purple), scaled by $\sigma_1$ and $\sigma_2$. The transformation maps $\mathbf{v}_1$ to $\sigma_1 \mathbf{u}_1$ and $\mathbf{v}_2$ to $\sigma_2 \mathbf{u}_2$. The singular values are the semi-axis lengths of the ellipse.
This is the cleanest summary of what any matrix does: it picks out special input directions (the columns of $V$), maps them to special output directions (the columns of $U$), and scales each by its singular value. Everything else follows from linearity.
Low-rank approximation
Here's where SVD becomes a practical superpower. If some singular values are small, you can set them to zero and get a matrix that's close to the original but has lower rank. That lower-rank matrix is the best rank-$k$ approximation of $A$ -- no other rank-$k$ matrix is closer (in the Frobenius norm or the operator norm).
Imagine a matrix with singular values $\sigma_1 = 10$, $\sigma_2 = 5$, and $\sigma_3 = 0.1$. The third singular value is tiny -- the transformation barely does anything in that direction. If you drop it, setting $\sigma_3 = 0$, you get a rank-2 approximation that captures 99.99% of the matrix's "energy."
Left: all three singular values of a matrix. The third ($\sigma_3$) is tiny compared to the others. Right: set $\sigma_3 = 0$, reducing the rank from 3 to 2. The approximation captures 99.99% of the matrix's energy. For data compression, this means massive storage savings with minimal quality loss.
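The "energy" here is the sum of squared singular values. A quick sketch with illustrative singular values 10, 5, and 0.1 (hypothetical numbers, chosen so the third is negligible) shows how little the smallest one contributes:

```javascript
// Energy = sum of squared singular values. The values are illustrative:
// two dominant directions and one near-negligible one.
const sigma = [10, 5, 0.1];

const energy = (values) => values.reduce((sum, s) => sum + s * s, 0);

const total = energy(sigma);            // 100 + 25 + 0.01 = 125.01
const kept = energy(sigma.slice(0, 2)); // 125
const captured = kept / total;          // ~0.99992, i.e. 99.99%
```

Because energy is quadratic in the singular values, a singular value 100x smaller than the leader contributes 10,000x less energy -- which is why truncation is so forgiving.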
This is why Netflix uses SVD for recommendations (users and movies form a matrix; most of the structure lives in a few singular values). It's why image compression works (natural images have rapidly decaying singular values). It's why principal component analysis (PCA) is SVD in disguise -- the principal components are the leading singular vectors.
The formal bit
The Singular Value Decomposition of any $m \times n$ matrix $A$ is:

$$A = U \Sigma V^T$$
where:
- $U$ is an $m \times m$ orthogonal matrix. Its columns $\mathbf{u}_i$ are the left singular vectors.
- $V$ is an $n \times n$ orthogonal matrix. Its columns $\mathbf{v}_i$ are the right singular vectors.
- $\Sigma$ is an $m \times n$ diagonal matrix with non-negative entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$ on the diagonal. These are the singular values.
The key relationship:

$$A \mathbf{v}_i = \sigma_i \mathbf{u}_i$$

This says: the matrix maps the right singular vector $\mathbf{v}_i$ to the left singular vector $\mathbf{u}_i$, scaled by $\sigma_i$. That's the entire content of the decomposition in one equation.
Connection to eigenvalues. The singular values are the square roots of the eigenvalues of $A^T A$ (or equivalently, of $A A^T$):

$$\sigma_i = \sqrt{\lambda_i(A^T A)}$$
The matrix $A^T A$ is symmetric and positive semi-definite, so it always has non-negative eigenvalues and an orthogonal eigenvector basis. The eigenvalues of $A^T A$ are $\sigma_i^2$, and the eigenvectors are the columns of $V$. Similarly, the nonzero eigenvalues of $A A^T$ are the same $\sigma_i^2$, with eigenvectors being the columns of $U$.
Rank and singular values. The rank of $A$ equals the number of nonzero singular values. A rank-$r$ matrix has exactly $r$ nonzero singular values. This gives us the best rank-$k$ approximation: truncate the SVD to the $k$ largest singular values:

$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
The Eckart-Young theorem guarantees that $A_k$ is the closest rank-$k$ matrix to $A$ in both the Frobenius norm and the operator norm. No other rank-$k$ matrix does better.
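A diagonal matrix makes this easy to check by hand, since its SVD is in plain sight. A tiny sketch with made-up singular values 3 and 0.5: truncating the smaller one leaves a Frobenius error exactly equal to the dropped value.

```javascript
// A diagonal matrix wears its SVD on its sleeve: singular values 3 and 0.5.
const A = [[3, 0], [0, 0.5]];
const A1 = [[3, 0], [0, 0]]; // best rank-1 approximation: keep only sigma1

// Frobenius norm of the difference = sqrt(sum of squared entry differences)
let sumSq = 0;
for (let i = 0; i < 2; i++) {
  for (let j = 0; j < 2; j++) {
    sumSq += (A[i][j] - A1[i][j]) ** 2;
  }
}
const frobError = Math.sqrt(sumSq); // equals the dropped sigma2 = 0.5
```

In general the Frobenius error of the rank-$k$ truncation is $\sqrt{\sigma_{k+1}^2 + \cdots}$, the energy of everything dropped.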
Let's verify with our example, $A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$.

First, compute $A^T A$:

$$A^T A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}$$

The eigenvalues of $A^T A$ are found from the characteristic equation:

$$\det(A^T A - \lambda I) = \lambda^2 - 15\lambda + 25 = 0 \quad\Rightarrow\quad \lambda = \frac{15 \pm 5\sqrt{5}}{2} \approx 13.09,\ 1.91$$

So $\sigma_1 = \sqrt{13.09} \approx 3.62$ and $\sigma_2 = \sqrt{1.91} \approx 1.38$. The maximum stretch is about 3.6x, and the minimum is about 1.4x. The transformation stretches space in every direction -- nothing gets compressed below the original size.
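For any 2x2 matrix, this whole computation can be done in closed form from the symmetric eigenproblem for AᵀA. The sketch below assumes the concrete example A = [[3, 1], [1, 2]] (chosen to match the stretch factors of roughly 3.6 and 1.4) and checks the defining relation that A maps the first right singular vector to a vector of length sigma1:

```javascript
// Closed-form 2x2 singular values via the symmetric eigenproblem for AtA.
// A is an assumed concrete example consistent with the chapter's numbers.
const A = [[3, 1], [1, 2]];
const [[a, b], [c, d]] = A;

// AtA = [[e, f], [f, g]]
const e = a * a + c * c;
const f = a * b + c * d;
const g = b * b + d * d;

// Eigenvalues of the symmetric matrix AtA: (e+g)/2 +/- sqrt(((e-g)/2)^2 + f^2)
const half = (e + g) / 2;
const disc = Math.sqrt(((e - g) / 2) ** 2 + f * f);
const sigma1 = Math.sqrt(half + disc);
const sigma2 = Math.sqrt(Math.max(half - disc, 0));

// Right singular vector v1: eigenvector of AtA for the larger eigenvalue
const theta = 0.5 * Math.atan2(2 * f, e - g);
const v1 = [Math.cos(theta), Math.sin(theta)];

// A v1 should have length sigma1 (the relation A v_i = sigma_i u_i)
const Av1 = [a * v1[0] + b * v1[1], c * v1[0] + d * v1[1]];
const stretch = Math.hypot(Av1[0], Av1[1]);
// sigma1 ~ 3.62, sigma2 ~ 1.38, stretch ~ sigma1
```

Dividing Av1 by sigma1 gives the left singular vector u1; for this symmetric positive-definite example, u1 and v1 coincide.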
Worked example: image compression with SVD
A grayscale image is just a matrix -- each entry is a pixel intensity from 0 (black) to 255 (white). An 800x600 image is an 800x600 matrix with 480,000 values. The SVD can compress this dramatically.
Here's the idea. Compute the SVD of the image matrix: $A = U \Sigma V^T$. The matrix has at most 600 singular values (the minimum of the two dimensions). But natural images have rapidly decaying singular values -- the first few capture the broad shapes, the next few add detail, and most of the rest are near zero.
Instead of storing all 480,000 pixel values, store only the $k$ largest singular values and their corresponding singular vectors:

$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
Each term $\sigma_i \mathbf{u}_i \mathbf{v}_i^T$ is a rank-1 matrix: an 800-dimensional column times a 600-dimensional row, which costs $800 + 600 + 1 = 1{,}401$ values to store. So a rank-$k$ approximation costs $k(m + n + 1)$ values.
| Rank | Storage | Compression | Quality |
|---|---|---|---|
| 1 | 1,401 | 0.3% of original | Blobby -- a single weighted average of the image's dominant gradient |
| 5 | 7,005 | 1.5% of original | Recognizable -- major shapes and contrast visible |
| 20 | 28,020 | 5.8% of original | Sharp -- most details recovered, minor artifacts |
| 50 | 70,050 | 14.6% of original | Nearly indistinguishable from original |
| 600 (full) | 480,000 | 100% | Exact reproduction |
```javascript
// computeSVD is a stand-in for a library SVD routine; it is assumed to
// return U (m x m), sigma (singular values in descending order), and V (n x n).
function svdCompress(imageMatrix, k) {
  const m = imageMatrix.length, n = imageMatrix[0].length;
  const { U, sigma, V } = computeSVD(imageMatrix);

  // Start from an all-zero m x n matrix
  const compressed = Array.from({ length: m }, () => new Array(n).fill(0));

  // Keep only the k largest singular values
  for (let i = 0; i < k; i++) {
    // Add the rank-1 contribution sigma[i] * U[:,i] * V[:,i]^T
    for (let row = 0; row < m; row++) {
      for (let col = 0; col < n; col++) {
        compressed[row][col] += sigma[i] * U[row][i] * V[col][i];
      }
    }
  }
  return compressed;
}

// Storage comparison
const m = 800, n = 600;
const fullStorage = m * n;                            // 480,000
const rank50Storage = 50 * (m + n + 1);               // 70,050
const compressionRatio = rank50Storage / fullStorage; // ~14.6%
```
What's happening geometrically? Each rank-1 term $\sigma_i \mathbf{u}_i \mathbf{v}_i^T$ is a "layer" of the image. The first layer ($\sigma_1 \mathbf{u}_1 \mathbf{v}_1^T$, with the largest $\sigma_1$) captures the coarsest structure -- the overall brightness gradient. The next layers add progressively finer detail. Dropping the layers with small $\sigma_i$ means dropping the barely-visible fine details.
This same principle powers dimensionality reduction in machine learning (PCA), latent semantic analysis in search engines (treating documents as vectors), and noise reduction (noise contributes to small singular values, so truncating the SVD filters it out).
Key Takeaway: SVD reveals the skeleton of any transformation: its fundamental directions and magnitudes. Every matrix is rotate-scale-rotate. It's the Swiss Army knife of linear algebra -- it works for any matrix (square or not), finds the best low-rank approximation, and decomposes the transformation into its simplest possible pieces.
Where to go from here
We started this book with a single arrow in space -- a vector. We learned to combine vectors, transform them with matrices, and measure the effect of those transformations with determinants, rank, and null spaces. We discovered that dot products reveal projections and angles, cross products give us perpendicular directions and areas, and eigenvalues expose the directions where a transformation acts most simply. We built orthonormal bases with Gram-Schmidt, and finally arrived here: the SVD, which unifies all of it into a single factorization that works for every matrix.
That's the visual core of linear algebra. You can now look at any matrix and see what it does: the stretching, the rotation, the collapsing, the rank. Every concept connects to a geometric picture.
If you want to keep going, the trail branches in several directions. Abstract vector spaces generalize these ideas beyond arrows in $\mathbb{R}^n$ -- function spaces, polynomial spaces, and infinite-dimensional spaces all obey the same rules. Tensors extend matrices to higher dimensions, essential for physics and deep learning. Differential forms bring linear algebra into calculus, connecting determinants to integration and cross products to curl. And numerical linear algebra -- the study of how to compute all of this efficiently and stably on a real computer -- is a field unto itself, where the SVD is the undisputed champion algorithm.
Every one of those paths starts from what you've learned here. The geometry doesn't change -- it just gets richer.