Chapter 5

Matrix Multiplication

Apply one transformation, then another. The combined effect is a single transformation. That's matrix multiplication -- and it explains why the order matters.

The last chapter showed that a matrix encodes a transformation: its columns are the new basis vectors. This chapter asks the obvious next question: what happens when you chain two transformations together? If you rotate and then shear, you get some combined result. That combined result is itself a linear transformation, so it must have its own matrix. Finding that matrix is what we call matrix multiplication.

Two transformations in sequence

Let's start with two familiar transformations. First, a rotation of about 30 degrees:

R = \begin{bmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{bmatrix}

Then a horizontal shear:

S = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}

Watch what happens when we apply them one after the other. The original grid (dashed) rotates first, then the rotated grid gets sheared. The final result (solid) is a grid that has been both rotated and sheared.

[Figure: Step 1 rotates the grid 30°; Step 2 shears it. After both steps, î'' = (1.12, 0.5) and ĵ'' = (-0.07, 0.87); the combined result is shown solid.]

The dashed vectors are the original basis. Purple dashed shows the intermediate state after rotation. The solid vectors are the final result after both transformations.

The key insight: no matter how many transformations you chain together, the final result is always a linear transformation. Grid lines stay straight, parallel, and evenly spaced. The origin stays fixed. So the combined effect must itself be some matrix.

Composition equals a single matrix

How do we find that combined matrix? We track where the basis vectors end up after both transformations. Those final landing spots become the columns of the product matrix.

Start with î = (1, 0). First, R maps it to (0.87, 0.5). Then S maps that result to:

S \begin{bmatrix} 0.87 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 1 \cdot 0.87 + 0.5 \cdot 0.5 \\ 0 \cdot 0.87 + 1 \cdot 0.5 \end{bmatrix} = \begin{bmatrix} 1.12 \\ 0.5 \end{bmatrix}

Same for ĵ = (0, 1). First, R maps it to (-0.5, 0.87). Then S maps that to:

S \begin{bmatrix} -0.5 \\ 0.87 \end{bmatrix} = \begin{bmatrix} 1 \cdot (-0.5) + 0.5 \cdot 0.87 \\ 0 \cdot (-0.5) + 1 \cdot 0.87 \end{bmatrix} = \begin{bmatrix} -0.07 \\ 0.87 \end{bmatrix}

Pack those into columns and you get the product:

SR = \begin{bmatrix} 1.12 & -0.07 \\ 0.5 & 0.87 \end{bmatrix}

[Figure: î = (1, 0) → R(î) = (0.87, 0.5) → SR(î) = (1.12, 0.5), and ĵ = (0, 1) → R(ĵ) → SR(ĵ) = (-0.07, 0.87). Column 1 of SR is where î lands; column 2 is where ĵ lands.]

Track each basis vector through both transformations. Where they land becomes the columns of the product matrix.

This is the fundamental idea: matrix multiplication is function composition. The product SR is the matrix whose transformation equals "first do R, then do S." You read it right to left, just like function composition: f(g(x)) means "first g, then f."
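The product above can be checked numerically. A minimal sketch with NumPy, using the rounded two-decimal entries of the rotation and shear from this section:

```python
import numpy as np

# Rotation by ~30 degrees and horizontal shear, as defined above
R = np.array([[0.87, -0.5],
              [0.5,  0.87]])
S = np.array([[1.0, 0.5],
              [0.0, 1.0]])

# S @ R composes the two transformations: first R, then S
SR = S @ R
print(SR)  # columns are where î and ĵ land after both transforms
# Note: SR[0, 1] comes out as -0.065, which the text rounds to -0.07

# Composition check: (SR)v equals S applied to (Rv) for any vector
v = np.array([2.0, 3.0])
assert np.allclose(SR @ v, S @ (R @ v))
```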

Order matters: AB ≠ BA

Here's where things get interesting. Does it matter which transformation you do first? Absolutely. Let's see the proof visually.

Consider the same rotation R and shear S from before. On the left we rotate then shear (computing SR). On the right we shear then rotate (computing RS). We'll transform an L-shape through each pipeline.

[Figure: left panel rotates then shears (SR); right panel shears then rotates (RS). The dashed L is the original; the solid shape is the result.]

Same two transformations, different order, different result. The L-shape ends up in a different position and orientation depending on which operation comes first.

This shouldn't be surprising if you think about it physically. Rotating a square and then shearing it gives you a different parallelogram than shearing first and then rotating. The order in which you apply geometric operations matters.

In code, this is the difference between:

# These produce different results!
result_1 = shear(rotate(point))   # SR
result_2 = rotate(shear(point))   # RS

This is why matrix multiplication is not commutative: AB ≠ BA in general. The product AB means "first B, then A," and swapping that order changes the outcome.
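With matrices instead of functions, the same check takes two lines. A sketch with NumPy, reusing the rotation and shear from earlier in the chapter:

```python
import numpy as np

R = np.array([[0.87, -0.5],
              [0.5,  0.87]])   # rotation (~30 degrees)
S = np.array([[1.0, 0.5],
              [0.0, 1.0]])     # horizontal shear

# S @ R = rotate then shear; R @ S = shear then rotate
print(S @ R)
print(R @ S)

# The two orders genuinely disagree
assert not np.allclose(S @ R, R @ S)
```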

Computing the product

Now that we understand what matrix multiplication means (composition), let's see how to compute it. The rule is surprisingly simple: each column of the product AB equals A times the corresponding column of B.

Think of it this way. The columns of B are the new basis vectors under transformation B. To find AB, we need to know where those basis vectors end up after we also apply A. So we just run each column of B through the matrix A.

[Figure: column-by-column multiplication. Col 1 of AB = A × col 1 of B, giving (ae+bf, ce+df); col 2 of AB = A × col 2 of B, giving (ag+bh, cg+dh).]

Each column of the product AB is found by multiplying A by the corresponding column of B. The blue column of B becomes the blue column of AB. The green column of B becomes the green column of AB.

This column-by-column view is the clearest way to think about matrix multiplication. You're applying the left matrix to each column of the right matrix individually. Each column of B represents what happens to a basis vector under transformation B. Multiplying by A tells you what happens to that result under transformation A. The columns of AB are the final destinations of the basis vectors after both transformations.
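The column rule translates directly into code. A sketch assuming NumPy arrays, checked against NumPy's built-in product:

```python
import numpy as np

def matmul_by_columns(A, B):
    """Build AB one column at a time: col_k(AB) = A @ col_k(B)."""
    columns = [A @ B[:, k] for k in range(B.shape[1])]
    return np.column_stack(columns)

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Matches NumPy's built-in matrix product
assert np.allclose(matmul_by_columns(A, B), A @ B)
```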

The formal bit

Let's write this down precisely. If A and B are 2x2 matrices:

AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} e & g \\ f & h \end{bmatrix} = \begin{bmatrix} ae+bf & ag+bh \\ ce+df & cg+dh \end{bmatrix}

The meaning behind the formula:

Composition: (AB)v = A(Bv) for every vector v. First apply B, then apply A.

Column-by-column: The k-th column of AB equals A times the k-th column of B.

\text{col}_k(AB) = A \cdot \text{col}_k(B)

Index formula: For the general case, entry (i, j) of the product is:

(AB)_{ij} = \sum_k A_{ik} B_{kj}

This is the "row times column" rule you might have memorized. Row i of A is dotted with column j of B. It gives the right answer, but the column-by-column view tells you why it's the right answer.
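The index formula is exactly the triple loop you would write by hand. A sketch in plain Python, using nested lists for the matrices:

```python
def matmul(A, B):
    """(AB)[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```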

Some important properties:

Associative: (AB)C = A(BC). The grouping of the products doesn't matter.

Not commutative: AB ≠ BA in general, as we just saw.

Identity: IA = AI = A, where I is the identity matrix, the transformation that leaves every vector where it is.

The associativity one is worth pausing on. Whether you group the product as (AB)C or as A(BC), you get the same final transformation. That's because function composition is always associative -- the grouping of parentheses never affects the result.
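Associativity is easy to spot-check numerically. A sketch with seeded random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((2, 2)) for _ in range(3))

# Grouping never changes the result: (AB)C == A(BC)
assert np.allclose((A @ B) @ C, A @ (B @ C))

# ...but order still does: AB != BA for these matrices
assert not np.allclose(A @ B, B @ A)
```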

Worked example: 2D graphics pipeline

Let's put this to work. Suppose you're building a 2D game engine and you need to:

  1. Scale a sprite by 2x horizontally (double its width)
  2. Rotate it 90 degrees counterclockwise

The scale matrix:

S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}

The 90-degree rotation matrix (counterclockwise):

R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}

Pipeline A: Scale first, then rotate (RS)

RS = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 \cdot 2 + (-1) \cdot 0 & 0 \cdot 0 + (-1) \cdot 1 \\ 1 \cdot 2 + 0 \cdot 0 & 1 \cdot 0 + 0 \cdot 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 2 & 0 \end{bmatrix}

Pipeline B: Rotate first, then scale (SR)

SR = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 \cdot 0 + 0 \cdot 1 & 2 \cdot (-1) + 0 \cdot 0 \\ 0 \cdot 0 + 1 \cdot 1 & 0 \cdot (-1) + 1 \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix}
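Both products can be verified in a few lines of NumPy:

```python
import numpy as np

S = np.array([[2, 0],
              [0, 1]])   # scale 2x horizontally
R = np.array([[0, -1],
              [1,  0]])  # rotate 90 degrees counterclockwise

print(R @ S)  # scale first, then rotate: [[0, -1], [2, 0]]
print(S @ R)  # rotate first, then scale: [[0, -2], [1, 0]]
```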

These are different matrices. Let's see what they do to a simple house-shaped sprite.

[Figure: left panel, scale 2x then rotate 90° (RS = [[0, -1], [2, 0]]), producing a tall house; right panel, rotate 90° then scale 2x (SR = [[0, -2], [1, 0]]), producing a wide house.]

Scaling then rotating versus rotating then scaling. In the left panel, the 2x horizontal scale happens first, making the house wide, then rotation turns that width into height. In the right panel, rotation happens first, then the 2x horizontal scale stretches the already-rotated house sideways.

In a real graphics engine, you compose many transformations into a single matrix before applying it to each vertex. This is far more efficient than running each vertex through multiple separate transformations. You multiply the matrices once, then multiply each vertex by the combined matrix. That's why understanding matrix multiplication matters for performance: one matrix-vector multiply instead of five.

# Inefficient: apply each transform separately
for vertex in sprite.vertices:
    v = scale(vertex)
    v = rotate(v)
    v = translate(v)  # (with homogeneous coordinates)
    draw(v)

# Efficient: combine transforms, apply once
M = translate_matrix @ rotate_matrix @ scale_matrix
for vertex in sprite.vertices:
    draw(M @ vertex)

Notice the order in the code: the matrix on the right is applied first. scale_matrix is rightmost because we scale first. This matches the right-to-left reading of matrix multiplication.
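Here is a runnable version of that pipeline, sketched with 3×3 homogeneous-coordinate matrices so translation composes like the other transforms. The specific translation offset (3, 5) is made up for illustration:

```python
import numpy as np

# Homogeneous 3x3 matrices so translation composes like the others
scale_matrix = np.array([[2.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])   # scale 2x horizontally
rotate_matrix = np.array([[0.0, -1.0, 0.0],
                          [1.0,  0.0, 0.0],
                          [0.0,  0.0, 1.0]])  # rotate 90 degrees CCW
translate_matrix = np.array([[1.0, 0.0, 3.0],
                             [0.0, 1.0, 5.0],
                             [0.0, 0.0, 1.0]])  # shift by (3, 5)

# Rightmost matrix is applied first: scale, then rotate, then translate
M = translate_matrix @ rotate_matrix @ scale_matrix

v = np.array([1.0, 2.0, 1.0])  # point (1, 2) in homogeneous form
step_by_step = translate_matrix @ (rotate_matrix @ (scale_matrix @ v))
assert np.allclose(M @ v, step_by_step)
```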

Thinking about it as a programmer

If you're used to function composition, matrix multiplication is natural. A matrix is a function (a linear one). Matrix multiplication is composing those functions. The "formula" for matrix multiplication isn't arbitrary -- it's the only way to compute the composition.

Here's another way to see why the formula works. Suppose A and B are both 2x2 matrices. The product AB needs to satisfy:

(AB)\mathbf{v} = A(B\mathbf{v})

for every vector v. Let's compute A(Bv) step by step:

B\mathbf{v} = \begin{bmatrix} e & g \\ f & h \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ex + gy \\ fx + hy \end{bmatrix}

A(B\mathbf{v}) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} ex+gy \\ fx+hy \end{bmatrix} = \begin{bmatrix} a(ex+gy) + b(fx+hy) \\ c(ex+gy) + d(fx+hy) \end{bmatrix}

Expanding:

= \begin{bmatrix} (ae+bf)x + (ag+bh)y \\ (ce+df)x + (cg+dh)y \end{bmatrix} = \begin{bmatrix} ae+bf & ag+bh \\ ce+df & cg+dh \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}

The matrix on the right is exactly AB. The formula isn't a definition -- it's a consequence of what composition means. The entries of AB are forced by the requirement that (AB)v = A(Bv).
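You can confirm the derivation numerically: build the matrix from the expanded entries and compare it to the NumPy product. A sketch with arbitrary concrete values for a through h:

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 4.0   # entries of A
e, f, g, h = 5.0, 6.0, 7.0, 8.0   # entries of B (columns (e, f) and (g, h))

A = np.array([[a, b], [c, d]])
B = np.array([[e, g], [f, h]])

# The entries forced by (AB)v = A(Bv), as derived above
AB_derived = np.array([[a*e + b*f, a*g + b*h],
                       [c*e + d*f, c*g + d*h]])

assert np.allclose(A @ B, AB_derived)

# And the defining property holds for an arbitrary vector
v = np.array([2.0, -1.0])
assert np.allclose((A @ B) @ v, A @ (B @ v))
```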

Key Takeaway: Matrix multiplication is function composition. The product AB means "first apply B, then apply A." Order matters because transformations don't commute. To compute AB, apply A to each column of B -- the results become the columns of the product.

What's next

We can compose transformations. But how much does a transformation stretch or squish space? If you scale up in one direction and scale down in another, does the area increase or decrease? That measurement is the determinant -- a single number that captures how a transformation changes area and orientation.