Chapter 5

Matrix Multiplication

Apply one transformation, then another. The combined effect is a single transformation. That's matrix multiplication -- and it explains why the order matters.

The last chapter showed that a matrix encodes a transformation: its columns are the new basis vectors. This chapter asks the obvious next question: what happens when you chain two transformations together? If you rotate and then shear, you get some combined result. That combined result is itself a linear transformation, so it must have its own matrix. Finding that matrix is what we call matrix multiplication.

Two transformations in sequence

Let's start with two familiar transformations. First, a rotation of about 30 degrees:

R = \begin{bmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{bmatrix}

Then a horizontal shear:

S = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}

Watch what happens when we apply them one after the other. The original grid (dashed) rotates first, then the rotated grid gets sheared. The final result (solid) is a grid that has been both rotated and sheared.

[Figure: Step 1 rotates the grid 30°; Step 2 shears it. After both steps, î'' = (1.12, 0.5) and ĵ'' = (-0.07, 0.87); the combined result is shown solid.]

The dashed vectors are the original basis. Purple dashed shows the intermediate state after rotation. The solid vectors are the final result after both transformations.

The key insight: no matter how many transformations you chain together, the final result is always a linear transformation. Grid lines stay straight, parallel, and evenly spaced. The origin stays fixed. So the combined effect must itself be some matrix.

Composition equals a single matrix

How do we find that combined matrix? We track where the basis vectors end up after both transformations. Those final landing spots become the columns of the product matrix.

Start with î = (1, 0). First, R maps it to (0.87, 0.5). Then S maps that result to:

S \begin{bmatrix} 0.87 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 1 \cdot 0.87 + 0.5 \cdot 0.5 \\ 0 \cdot 0.87 + 1 \cdot 0.5 \end{bmatrix} = \begin{bmatrix} 1.12 \\ 0.5 \end{bmatrix}

Same for ĵ = (0, 1). First, R maps it to (-0.5, 0.87). Then S maps that to:

S \begin{bmatrix} -0.5 \\ 0.87 \end{bmatrix} = \begin{bmatrix} 1 \cdot (-0.5) + 0.5 \cdot 0.87 \\ 0 \cdot (-0.5) + 1 \cdot 0.87 \end{bmatrix} = \begin{bmatrix} -0.07 \\ 0.87 \end{bmatrix}

Pack those into columns and you get the product:

SR = \begin{bmatrix} 1.12 & -0.07 \\ 0.5 & 0.87 \end{bmatrix}

[Figure: î = (1, 0) → R(î) = (0.87, 0.5) → SR(î) = (1.12, 0.5), and ĵ = (0, 1) → R(ĵ) → SR(ĵ) = (-0.07, 0.87). Column 1 of SR is where î lands; column 2 is where ĵ lands.]

Track each basis vector through both transformations. Where they land becomes the columns of the product matrix.

This is the fundamental idea: matrix multiplication is function composition. The product SR is the matrix whose transformation equals "first do R, then do S." You read it right to left, just like function composition: f(g(x)) means "first g, then f."
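The product above can be checked numerically. A minimal sketch with NumPy, using the rounded two-decimal entries of the rotation and shear from this section:

```python
import numpy as np

# Rotation by ~30 degrees and horizontal shear, as defined above
R = np.array([[0.87, -0.5],
              [0.5,  0.87]])
S = np.array([[1.0, 0.5],
              [0.0, 1.0]])

# S @ R composes the two transformations: first R, then S
SR = S @ R
print(SR)  # columns are where î and ĵ land after both transforms
# Note: SR[0, 1] comes out as -0.065, which the text rounds to -0.07

# Composition check: (SR)v equals S applied to (Rv) for any vector
v = np.array([2.0, 3.0])
assert np.allclose(SR @ v, S @ (R @ v))
```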

Order matters: AB ≠ BA

Here's where things get interesting. Does it matter which transformation you do first? Absolutely. Let's see the proof visually.

Consider the same rotation R and shear S from before. On the left we rotate then shear (computing SR). On the right we shear then rotate (computing RS). We'll transform an L-shape through each pipeline.

[Figure: left panel rotates then shears (SR); right panel shears then rotates (RS). The dashed L is the original; the solid shape is the result.]

Same two transformations, different order, different result. The L-shape ends up in a different position and orientation depending on which operation comes first.

This shouldn't be surprising if you think about it physically. Rotating a square and then shearing it gives you a different parallelogram than shearing first and then rotating. The order in which you apply geometric operations matters.

In code, this is the difference between:

# These produce different results!
result_1 = shear(rotate(point))   # SR
result_2 = rotate(shear(point))   # RS

This is why matrix multiplication is not commutative: AB ≠ BA in general. The product AB means "first B, then A," and swapping that order changes the outcome.
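With matrices instead of functions, the same check takes two lines. A sketch with NumPy, reusing the rotation and shear from earlier in the chapter:

```python
import numpy as np

R = np.array([[0.87, -0.5],
              [0.5,  0.87]])   # rotation (~30 degrees)
S = np.array([[1.0, 0.5],
              [0.0, 1.0]])     # horizontal shear

# S @ R = rotate then shear; R @ S = shear then rotate
print(S @ R)
print(R @ S)

# The two orders genuinely disagree
assert not np.allclose(S @ R, R @ S)
```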

Computing the product

Now that we understand what matrix multiplication means (composition), let's see how to compute it. The rule is surprisingly simple: each column of the product AB equals A times the corresponding column of B.

Think of it this way. The columns of B are the new basis vectors under transformation B. To find AB, we need to know where those basis vectors end up after we also apply A. So we just run each column of B through the matrix A.

[Figure: column-by-column multiplication. Col 1 of AB = A × col 1 of B, giving (ae+bf, ce+df); col 2 of AB = A × col 2 of B, giving (ag+bh, cg+dh).]

Each column of the product AB is found by multiplying A by the corresponding column of B. The blue column of B becomes the blue column of AB. The green column of B becomes the green column of AB.

This column-by-column view is the clearest way to think about matrix multiplication. You're applying the left matrix to each column of the right matrix individually. Each column of B represents what happens to a basis vector under transformation B. Multiplying by A tells you what happens to that result under transformation A. The columns of AB are the final destinations of the basis vectors after both transformations.
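The column rule translates directly into code. A sketch assuming NumPy arrays, checked against NumPy's built-in product:

```python
import numpy as np

def matmul_by_columns(A, B):
    """Build AB one column at a time: col_k(AB) = A @ col_k(B)."""
    columns = [A @ B[:, k] for k in range(B.shape[1])]
    return np.column_stack(columns)

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Matches NumPy's built-in matrix product
assert np.allclose(matmul_by_columns(A, B), A @ B)
```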

The formal bit

Let's write this down precisely. If A and B are 2x2 matrices:

AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} e & g \\ f & h \end{bmatrix} = \begin{bmatrix} ae+bf & ag+bh \\ ce+df & cg+dh \end{bmatrix}

The meaning behind the formula:

Composition: (AB)v = A(Bv) for every vector v. First apply B, then apply A.

Column-by-column: The k-th column of AB equals A times the k-th column of B.

\text{col}_k(AB) = A \cdot \text{col}_k(B)

Index formula: For the general case, entry (i, j) of the product is:

(AB)_{ij} = \sum_k A_{ik} B_{kj}

This is the "row times column" rule you might have memorized. Row i of A is dotted with column j of B. It gives the right answer, but the column-by-column view tells you why it's the right answer.
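The index formula is exactly the triple loop you would write by hand. A sketch in plain Python, using nested lists for the matrices:

```python
def matmul(A, B):
    """(AB)[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```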

Some important properties:

Associative: (AB)C = A(BC). The grouping of the products doesn't matter.

Not commutative: AB ≠ BA in general, as we just saw.

Identity: IA = AI = A, where I is the identity matrix, the transformation that leaves every vector where it is.

The associativity one is worth pausing on. Whether you group the product as (AB)C or as A(BC), you get the same final transformation. That's because function composition is always associative -- the grouping of parentheses never affects the result.
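Associativity is easy to spot-check numerically. A sketch with seeded random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((2, 2)) for _ in range(3))

# Grouping never changes the result: (AB)C == A(BC)
assert np.allclose((A @ B) @ C, A @ (B @ C))

# ...but order still does: AB != BA for these matrices
assert not np.allclose(A @ B, B @ A)
```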

Worked example: 2D graphics pipeline

Let's put this to work. Suppose you're building a 2D game engine and you need to:

  1. Scale a sprite by 2x horizontally (double its width)
  2. Rotate it 90 degrees counterclockwise

The scale matrix:

S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}

The 90-degree rotation matrix (counterclockwise):

R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}

Pipeline A: Scale first, then rotate (RS)

RS = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 \cdot 2 + (-1) \cdot 0 & 0 \cdot 0 + (-1) \cdot 1 \\ 1 \cdot 2 + 0 \cdot 0 & 1 \cdot 0 + 0 \cdot 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 2 & 0 \end{bmatrix}

Pipeline B: Rotate first, then scale (SR)

SR = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 \cdot 0 + 0 \cdot 1 & 2 \cdot (-1) + 0 \cdot 0 \\ 0 \cdot 0 + 1 \cdot 1 & 0 \cdot (-1) + 1 \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix}
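Both products can be verified in a few lines of NumPy:

```python
import numpy as np

S = np.array([[2, 0],
              [0, 1]])   # scale 2x horizontally
R = np.array([[0, -1],
              [1,  0]])  # rotate 90 degrees counterclockwise

print(R @ S)  # scale first, then rotate: [[0, -1], [2, 0]]
print(S @ R)  # rotate first, then scale: [[0, -2], [1, 0]]
```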

These are different matrices. Let's see what they do to a simple house-shaped sprite.

[Figure: left panel, scale 2x then rotate 90° (RS = [[0, -1], [2, 0]]), producing a tall house; right panel, rotate 90° then scale 2x (SR = [[0, -2], [1, 0]]), producing a wide house.]

Scaling then rotating versus rotating then scaling. In the left panel, the 2x horizontal scale happens first, making the house wide, then rotation turns that width into height. In the right panel, rotation happens first, then the 2x horizontal scale stretches the already-rotated house sideways.

In a real graphics engine, you compose many transformations into a single matrix before applying it to each vertex. This is far more efficient than running each vertex through multiple separate transformations. You multiply the matrices once, then multiply each vertex by the combined matrix. That's why understanding matrix multiplication matters for performance: one matrix-vector multiply instead of five.

# Inefficient: apply each transform separately
for vertex in sprite.vertices:
    v = scale(vertex)
    v = rotate(v)
    v = translate(v)  # (with homogeneous coordinates)
    draw(v)

# Efficient: combine transforms, apply once
M = translate_matrix @ rotate_matrix @ scale_matrix
for vertex in sprite.vertices:
    draw(M @ vertex)

Notice the order in the code: the matrix on the right is applied first. scale_matrix is rightmost because we scale first. This matches the right-to-left reading of matrix multiplication.
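Here is a runnable version of that pipeline, sketched with 3×3 homogeneous-coordinate matrices so translation composes like the other transforms. The specific translation offset (3, 5) is made up for illustration:

```python
import numpy as np

# Homogeneous 3x3 matrices so translation composes like the others
scale_matrix = np.array([[2.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])   # scale 2x horizontally
rotate_matrix = np.array([[0.0, -1.0, 0.0],
                          [1.0,  0.0, 0.0],
                          [0.0,  0.0, 1.0]])  # rotate 90 degrees CCW
translate_matrix = np.array([[1.0, 0.0, 3.0],
                             [0.0, 1.0, 5.0],
                             [0.0, 0.0, 1.0]])  # shift by (3, 5)

# Rightmost matrix is applied first: scale, then rotate, then translate
M = translate_matrix @ rotate_matrix @ scale_matrix

v = np.array([1.0, 2.0, 1.0])  # point (1, 2) in homogeneous form
step_by_step = translate_matrix @ (rotate_matrix @ (scale_matrix @ v))
assert np.allclose(M @ v, step_by_step)
```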

Thinking about it as a programmer

If you're used to function composition, matrix multiplication is natural. A matrix is a function (a linear one). Matrix multiplication is composing those functions. The "formula" for matrix multiplication isn't arbitrary -- it's the only way to compute the composition.

Here's another way to see why the formula works. Suppose A and B are both 2x2 matrices. The product AB needs to satisfy:

(AB)\mathbf{v} = A(B\mathbf{v})

for every vector v. Let's compute A(Bv) step by step:

B\mathbf{v} = \begin{bmatrix} e & g \\ f & h \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ex + gy \\ fx + hy \end{bmatrix}

A(B\mathbf{v}) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} ex+gy \\ fx+hy \end{bmatrix} = \begin{bmatrix} a(ex+gy) + b(fx+hy) \\ c(ex+gy) + d(fx+hy) \end{bmatrix}

Expanding:

= \begin{bmatrix} (ae+bf)x + (ag+bh)y \\ (ce+df)x + (cg+dh)y \end{bmatrix} = \begin{bmatrix} ae+bf & ag+bh \\ ce+df & cg+dh \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}

The matrix on the right is exactly AB. The formula isn't a definition -- it's a consequence of what composition means. The entries of AB are forced by the requirement that (AB)v = A(Bv).
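You can confirm the derivation numerically: build the matrix from the expanded entries and compare it to the NumPy product. A sketch with arbitrary concrete values for a through h:

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 4.0   # entries of A
e, f, g, h = 5.0, 6.0, 7.0, 8.0   # entries of B (columns (e, f) and (g, h))

A = np.array([[a, b], [c, d]])
B = np.array([[e, g], [f, h]])

# The entries forced by (AB)v = A(Bv), as derived above
AB_derived = np.array([[a*e + b*f, a*g + b*h],
                       [c*e + d*f, c*g + d*h]])

assert np.allclose(A @ B, AB_derived)

# And the defining property holds for an arbitrary vector
v = np.array([2.0, -1.0])
assert np.allclose((A @ B) @ v, A @ (B @ v))
```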

Key Takeaway: Matrix multiplication is function composition. The product AB means "first apply B, then apply A." Order matters because transformations don't commute. To compute AB, apply A to each column of B -- the results become the columns of the product.

What's next

We can compose transformations. But how much does a transformation stretch or squish space? If you scale up in one direction and scale down in another, does the area increase or decrease? That measurement is the determinant -- a single number that captures how a transformation changes area and orientation.