Chapter 17

Homogeneous Coordinates

Translation isn't a linear transformation -- it breaks the rule that the origin stays fixed. So how do game engines handle translation with matrices? A clever dimensional trick.

We've spent many chapters building up the theory of linear transformations: matrices that rotate, scale, shear, and reflect. But we've been quietly avoiding a transformation that every game engine, every graphics pipeline, and every robotics system needs constantly: moving things around. Translation -- shifting every point by some offset -- is the most basic geometric operation, and it's the one thing a 2x2 matrix cannot do.

The reason is fundamental. Linear transformations must map the origin to the origin. That's baked into the definition: $A\vec{0} = \vec{0}$ for any matrix $A$ . But translation moves the origin somewhere else. If you want to shift everything right by 3 and up by 2, the origin goes to $(3, 2)$ . No 2x2 matrix can do that.

The solution is one of the most elegant tricks in all of computer graphics: embed 2D space into 3D, perform the translation as a linear transformation in the higher dimension, and project back down. This is the idea behind homogeneous coordinates, and it's the reason a modern graphics engine can express nearly every transform as a matrix multiply.

Translation can't be a 2x2 matrix

Let's see the problem concretely. Take a small triangle at the origin and try to translate it to position $(3, 2)$ . Every vertex needs to shift right by 3 and up by 2.

A 2x2 matrix maps $\vec{v}$ to $A\vec{v}$ . When $\vec{v} = \vec{0}$ , the result is always $\vec{0}$ . The origin is pinned. But we need the origin to move to $(3, 2)$ . No 2x2 matrix can accomplish this.

The blue triangle sits at the origin. We want to move it to where the orange triangle is -- shifted by $(3, 2)$ . But any 2x2 matrix maps $(0, 0)$ to $(0, 0)$ . The origin is immovable. Translation is not a linear transformation in 2D.

You could handle this outside the matrix framework -- just add a translation vector separately:

\vec{v}' = A\vec{v} + \vec{t}

And many systems do exactly that. But this breaks the elegant composability of matrices. You can't combine a rotation and a translation into a single matrix multiply. You lose the ability to represent your entire transform pipeline as one matrix. Game engines need a better solution.
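To make the cost concrete, here is a minimal sketch of what composition looks like when a transform is stored as a separate (matrix, offset) pair. The helper names `mul2`, `applyAffine`, and `composeAffine` are made up for illustration and don't come from any particular engine. Notice that combining two transforms needs its own rule, $A_2 A_1$ for the linear part and $A_2\vec{t}_1 + \vec{t}_2$ for the offset, rather than a plain matrix product.

```javascript
// Sketch: an affine transform stored as a 2x2 matrix A plus an offset t.
// All helper names here are hypothetical, chosen for illustration.
function mul2(A, B) {
  return [
    [A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
    [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]
  ];
}

// v' = A v + t
function applyAffine({ A, t }, [x, y]) {
  return [
    A[0][0] * x + A[0][1] * y + t[0],
    A[1][0] * x + A[1][1] * y + t[1]
  ];
}

// "second after first": v -> A2 (A1 v + t1) + t2
function composeAffine(second, first) {
  const [tx, ty] = first.t;
  return {
    A: mul2(second.A, first.A),
    t: [
      second.A[0][0] * tx + second.A[0][1] * ty + second.t[0],
      second.A[1][0] * tx + second.A[1][1] * ty + second.t[1]
    ]
  };
}

const rotate90 = { A: [[0, -1], [1, 0]], t: [0, 0] };
const shift = { A: [[1, 0], [0, 1]], t: [3, 2] };
const both = composeAffine(shift, rotate90); // rotate first, then shift
const p = applyAffine(both, [1, 0]);         // (1, 0) -> (0, 1) -> (3, 3)
```

It works, but every place that combines transforms has to carry this special-case bookkeeping around.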

The trick: add a dimension

The trick is deceptively simple. Instead of representing a 2D point as $(x, y)$ , represent it as $(x, y, 1)$ . This extra coordinate -- always set to 1 for points -- lifts 2D space into a slice of 3D space. Now a 3x3 matrix can encode translation in its third column, because that column gets multiplied by the 1.

\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ 1 \end{bmatrix}

The third column $(t_x, t_y, 1)$ is what makes translation work. When multiplied by the 1 in the coordinate vector, it adds the translation directly to $x$ and $y$ . The bottom row $[0\;0\;1]$ preserves the 1, keeping us in the homogeneous "slice."

A 2D point $(x, y)$ gets lifted to $(x, y, 1)$ in homogeneous coordinates. The 3x3 matrix has a clear structure: the top-left 2x2 block handles rotation and scaling (the linear part), the third column handles translation, and the bottom row is always $[0\;0\;1]$ to preserve the homogeneous coordinate.

The name "homogeneous coordinates" comes from projective geometry, but the practical idea is straightforward. By embedding 2D space as the $w = 1$ plane in 3D, we turn translation (which is affine, not linear) into a linear operation in the higher-dimensional space. The 3x3 matrix is a genuine linear transformation in 3D -- it's just that we only care about the $w = 1$ slice.
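Here is a small sketch of the lift-and-multiply idea. The names `matVec3` and `toHomogeneous` are illustrative, and matrices are assumed to be stored as arrays of rows.

```javascript
// Multiply a 3x3 matrix (array of rows) by a 3-component column vector.
function matVec3(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2]);
}

// Lift a 2D point onto the w = 1 plane.
function toHomogeneous(x, y) {
  return [x, y, 1];
}

const tx = 3, ty = 2;
const T = [
  [1, 0, tx],
  [0, 1, ty],
  [0, 0, 1]
];

// The third column is multiplied by the trailing 1, so it gets added to x and y.
const moved = matVec3(T, toHomogeneous(5, 7)); // [8, 9, 1]
```

The result still ends in 1, so it is still on the $w = 1$ slice and can be read back as the 2D point $(8, 9)$.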

Translation as a 3x3 matrix

Now let's see this in action. The homogeneous translation matrix that moves every point by $(t_x, t_y)$ is:

T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}

The top-left block is the identity (no rotation or scaling), and the translation sits in the third column. Let's translate a triangle by $(3, 2)$, starting with the vertex that sits at the origin:

\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}

The origin moved to $(3, 2)$ . Translation is now a matrix multiply.

The translation matrix shifts every vertex by $(3, 2)$ . The dashed blue triangle is the original, and the solid orange triangle is the translated result. Each vertex moves by exactly the same offset -- that's what translation does. And now it's a matrix multiply.

Look more closely at what happened to the origin in homogeneous coordinates:

T \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}

The origin moved to $(3, 2)$ . In 3D, this is perfectly valid -- the origin of 3D space $(0, 0, 0)$ isn't being mapped to itself. We're operating on the point $(0, 0, 1)$ , which represents 2D point $(0, 0)$ , and it maps to $(3, 2, 1)$ , representing 2D point $(3, 2)$ . The linearity rules are satisfied in 3D. We've sidestepped the constraint by working in a higher dimension.
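We can check that claim numerically. This is only a sketch with an illustrative helper `matVec3`: it confirms that the translation matrix fixes the true 3D origin and respects addition of 3D vectors, even though it moves the 2D origin.

```javascript
// Multiply a 3x3 matrix (array of rows) by a 3-component column vector.
function matVec3(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2]);
}

const add3 = (u, v) => [u[0] + v[0], u[1] + v[1], u[2] + v[2]];

const T = [
  [1, 0, 3],
  [0, 1, 2],
  [0, 0, 1]
];

// The 3D origin stays put, as linearity requires.
const zero = matVec3(T, [0, 0, 0]);        // [0, 0, 0]

// The 2D origin, lifted to (0, 0, 1), moves.
const origin2D = matVec3(T, [0, 0, 1]);    // [3, 2, 1]

// Additivity in 3D: T(a + b) equals T(a) + T(b).
const a = [1, 4, 1];
const b = [2, -1, 1];
const lhs = matVec3(T, add3(a, b));              // [9, 7, 2]
const rhs = add3(matVec3(T, a), matVec3(T, b));  // [9, 7, 2]
```

The sum `a + b` has third coordinate 2, so it has left the $w = 1$ slice. That's fine: linearity is a statement about all of 3D, not just the slice we care about.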

Full transform: scale, rotate, translate

The real power of homogeneous coordinates isn't just translation -- it's that you can combine scale, rotation, and translation into a single 3x3 matrix. One matrix multiply does everything.

Each transformation type embeds into 3x3 form:

Scale by $(s_x, s_y)$ : put the scale in the top-left 2x2 block.

S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}

Rotate by angle $\theta$ : the rotation goes in the top-left 2x2 block.

R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}

Translate by $(t_x, t_y)$ : translation sits in the third column.

T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}

To apply all three, you multiply the matrices together. The standard order is TRS (translate $\times$ rotate $\times$ scale), applied right to left. This means: first scale the object in its local space, then rotate it, then translate it to its final position.

The pipeline: start with the unit square (blue dashed), scale it by 1.5 (green dashed), rotate 30 degrees (purple dashed), then translate to $(2, 1)$ (solid orange). The composition $M = TRS$ is a single 3x3 matrix that does all three steps in one multiply.

This is how most 2D game engines work. Each game object stores a position, rotation angle, and scale. At render time, those values get packed into a single 3x3 matrix $M = TRS$. Then every vertex of the object gets multiplied by $M$. One matrix-vector multiply per vertex, no matter how complex the transformation.
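The sketch below builds the three matrices for the pipeline described above (scale 1.5, rotate 30 degrees, translate to $(2, 1)$) and multiplies them in both orders. The helper names `mul3`, `scale`, `rotate`, and `translate` are illustrative, not taken from a specific library.

```javascript
// 3x3 matrix product, matrices stored as arrays of rows.
function mul3(A, B) {
  return A.map(row =>
    [0, 1, 2].map(j => row[0] * B[0][j] + row[1] * B[1][j] + row[2] * B[2][j])
  );
}

const scale = (sx, sy) => [[sx, 0, 0], [0, sy, 0], [0, 0, 1]];
const translate = (tx, ty) => [[1, 0, tx], [0, 1, ty], [0, 0, 1]];
const rotate = (rad) => {
  const c = Math.cos(rad), s = Math.sin(rad);
  return [[c, -s, 0], [s, c, 0], [0, 0, 1]];
};

const S = scale(1.5, 1.5);
const R = rotate(Math.PI / 6); // 30 degrees
const T = translate(2, 1);

// TRS: scale first, then rotate, then translate (read right to left).
const TRS = mul3(T, mul3(R, S));

// SRT: translate first, so the offset itself gets rotated and scaled.
const SRT = mul3(S, mul3(R, T));

const trsOffset = [TRS[0][2], TRS[1][2]]; // [2, 1]
const srtOffset = [SRT[0][2], SRT[1][2]]; // roughly [1.85, 2.80]
```

With TRS the object lands exactly at the requested position. With the order reversed, the translation is applied in the object's unrotated, unscaled space and then distorted by the later steps, which is rarely what you want for placing an object in the world.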

The formal bit

Homogeneous coordinates represent a 2D point $(x, y)$ as the 3D vector $(x, y, 1)$ . More generally, any scalar multiple $(wx, wy, w)$ with $w \neq 0$ represents the same 2D point $(x, y)$ -- you recover the 2D point by dividing by $w$ . The case $w = 1$ is the standard representation.
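As a sketch, recovering the 2D point is a single divide. The function name `fromHomogeneous` is illustrative, and throwing on $w = 0$ is one possible choice, not the only one.

```javascript
// Recover the 2D point represented by a homogeneous triple (hx, hy, w).
function fromHomogeneous([hx, hy, w]) {
  if (w === 0) {
    throw new Error("w = 0 does not represent a finite 2D point");
  }
  return [hx / w, hy / w];
}

// Three different triples, one 2D point.
const p1 = fromHomogeneous([3, 2, 1]);         // [3, 2]
const p2 = fromHomogeneous([6, 4, 2]);         // [3, 2]
const p3 = fromHomogeneous([-1.5, -1, -0.5]);  // [3, 2]
```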

The translation matrix in homogeneous coordinates:

T(t_x, t_y) = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}

Any 2D linear transformation $A$ embeds into 3x3 form by placing $A$ in the top-left 2x2 block:

\begin{bmatrix} A & \vec{0} \\ \vec{0}^T & 1 \end{bmatrix} = \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ 0 & 0 & 1 \end{bmatrix}

The composition order for a typical game transform is:

M = T \cdot R \cdot S

Applied right to left: first scale (in local space), then rotate (around the origin), then translate (to the final position). The combined matrix is:

M = \begin{bmatrix} s_x\cos\theta & -s_y\sin\theta & t_x \\ s_x\sin\theta & s_y\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}
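The intermediate product shows where each entry comes from. Multiplying $R$ by $S$ scales the first column of the rotation block by $s_x$ and the second by $s_y$:

RS = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} s_x\cos\theta & -s_y\sin\theta & 0 \\ s_x\sin\theta & s_y\cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}

Left-multiplying by $T$ leaves the top-left block untouched and writes $(t_x, t_y)$ into the third column, which gives the matrix $M$ above.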

This single matrix encodes scale, rotation, and translation. To transform a point, you do one matrix-vector multiply. To compose two transforms (parent and child in a scene graph), you do one matrix-matrix multiply. The entire transform hierarchy reduces to matrix arithmetic.
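Here is a minimal sketch of the parent-and-child case. It assumes the common convention that a child's world matrix is the parent's world matrix times the child's local matrix; the helper names `trs`, `mul3`, and `transformPoint` are illustrative.

```javascript
// 3x3 matrix product, matrices stored as arrays of rows.
function mul3(A, B) {
  return A.map(row =>
    [0, 1, 2].map(j => row[0] * B[0][j] + row[1] * B[1][j] + row[2] * B[2][j])
  );
}

// Combined M = T * R * S, written out directly.
function trs(tx, ty, angleDeg, sx, sy) {
  const rad = angleDeg * Math.PI / 180;
  const c = Math.cos(rad), s = Math.sin(rad);
  return [
    [sx * c, -sy * s, tx],
    [sx * s,  sy * c, ty],
    [0,       0,      1]
  ];
}

// Apply a 3x3 affine matrix to the 2D point (x, y), with w = 1 implied.
function transformPoint(m, x, y) {
  return [
    m[0][0] * x + m[0][1] * y + m[0][2],
    m[1][0] * x + m[1][1] * y + m[1][2]
  ];
}

// Parent sits at (10, 5), rotated 90 degrees.
const parentWorld = trs(10, 5, 90, 1, 1);
// Child is offset by (2, 0) in the parent's local space.
const childLocal = trs(2, 0, 0, 1, 1);

// Assumed convention: world = parentWorld * childLocal.
const childWorld = mul3(parentWorld, childLocal);

// The child's own origin in world space: the offset (2, 0) is rotated to
// (0, 2) by the parent and then shifted, giving approximately (10, 7).
const childOrigin = transformPoint(childWorld, 0, 0);
```

Deeper hierarchies repeat the same step: each node multiplies its parent's world matrix by its own local matrix.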

Key properties of homogeneous transform matrices:

The inverse of a translation is a translation by $(-t_x, -t_y)$
The inverse of $M = TRS$ is $M^{-1} = S^{-1}R^{-1}T^{-1}$ (reverse order, invert each)
The bottom row is always $[0\;0\;1]$ for affine transforms in 2D
Points use $w = 1$ : they get translated. Direction vectors use $w = 0$ : they don't. This distinction is built into the coordinate system.

That last point is subtle and powerful. If you set the third coordinate to 0 instead of 1:

\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}

The translation has no effect. This is exactly right for direction vectors -- a velocity or a surface normal shouldn't change when you move an object. Homogeneous coordinates automatically handle the distinction between points (which translate) and directions (which don't).
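The same matrix treats the two kinds of vector differently, as this sketch shows. The helper `matVec3` is an illustrative name, and the matrix is a 90-degree rotation followed by a translation of $(3, 2)$, written with exact entries so the output is easy to read.

```javascript
// Multiply a 3x3 matrix (array of rows) by a 3-component column vector.
function matVec3(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2]);
}

// Rotate 90 degrees counterclockwise, then translate by (3, 2).
const M = [
  [0, -1, 3],
  [1,  0, 2],
  [0,  0, 1]
];

// A point (w = 1) is rotated and then translated.
const point = matVec3(M, [1, 0, 1]);      // [3, 3, 1]

// A direction (w = 0) is rotated only; the third column never contributes.
const direction = matVec3(M, [1, 0, 0]);  // [0, 1, 0]
```

The direction still picks up the rotation from the top-left block, which is what you want: a velocity should turn with the object but not shift with it.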

Worked example: a 2D game sprite transform

Let's build a complete 2D game object transform from scratch. We have a sprite that needs to be:

Scaled by 2 (doubled in size)
Rotated 30 degrees counterclockwise
Translated to position $(100, 50)$ in world coordinates

We'll work in world units (not pixels) and build each 3x3 matrix.

Step 1: Scale matrix.

S = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Step 2: Rotation matrix (30 degrees, $\cos 30° \approx 0.866$ , $\sin 30° = 0.5$ ).

R = \begin{bmatrix} 0.866 & -0.5 & 0 \\ 0.5 & 0.866 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Step 3: Translation matrix.

T = \begin{bmatrix} 1 & 0 & 100 \\ 0 & 1 & 50 \\ 0 & 0 & 1 \end{bmatrix}

Step 4: Compose $M = T \cdot R \cdot S$ (right to left).

First, compute $RS$ :

RS = \begin{bmatrix} 0.866 & -0.5 & 0 \\ 0.5 & 0.866 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1.732 & -1.0 & 0 \\ 1.0 & 1.732 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Then multiply by $T$ :

M = T \cdot RS = \begin{bmatrix} 1 & 0 & 100 \\ 0 & 1 & 50 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1.732 & -1.0 & 0 \\ 1.0 & 1.732 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1.732 & -1.0 & 100 \\ 1.0 & 1.732 & 50 \\ 0 & 0 & 1 \end{bmatrix}

The final matrix $M$ encodes the entire transform. Let's verify by applying it to a vertex, say the corner at $(1, 0)$ :

M \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.732 \cdot 1 + (-1.0) \cdot 0 + 100 \cdot 1 \\ 1.0 \cdot 1 + 1.732 \cdot 0 + 50 \cdot 1 \\ 0 + 0 + 1 \end{bmatrix} = \begin{bmatrix} 101.732 \\ 51.0 \\ 1 \end{bmatrix}

The vertex $(1, 0)$ ended up at approximately $(101.7, 51.0)$ in world coordinates. That's exactly what we'd expect: the point was scaled to $(2, 0)$ , rotated 30 degrees to $(1.732, 1.0)$ , then translated by $(100, 50)$ to land at $(101.732, 51.0)$ .

In code, this looks like:

function makeTransform(tx, ty, angleDeg, sx, sy) {
  const rad = angleDeg * Math.PI / 180;
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);

  // M = T * R * S, combined into one matrix
  return [
    [sx * cos, -sy * sin, tx],
    [sx * sin,  sy * cos, ty],
    [0,         0,         1]
  ];
}

function transformPoint(matrix, x, y) {
  return [
    matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
    matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
  ];
}

const M = makeTransform(100, 50, 30, 2, 2);
transformPoint(M, 1, 0);  // approximately [101.732, 51.0]
transformPoint(M, 0, 0);  // [100, 50] (the origin moved to the object's position)
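Going the other way, from world coordinates back to the sprite's local coordinates, uses the inverse $M^{-1} = S^{-1}R^{-1}T^{-1}$ from the list of key properties. The sketch below builds it from the individual inverses for the worked example's values; the helper names are illustrative, and it assumes the scale factors are non-zero.

```javascript
// 3x3 matrix product, matrices stored as arrays of rows.
function mul3(A, B) {
  return A.map(row =>
    [0, 1, 2].map(j => row[0] * B[0][j] + row[1] * B[1][j] + row[2] * B[2][j])
  );
}

const scale = (sx, sy) => [[sx, 0, 0], [0, sy, 0], [0, 0, 1]];
const translate = (tx, ty) => [[1, 0, tx], [0, 1, ty], [0, 0, 1]];
const rotate = (angleDeg) => {
  const rad = angleDeg * Math.PI / 180;
  const c = Math.cos(rad), s = Math.sin(rad);
  return [[c, -s, 0], [s, c, 0], [0, 0, 1]];
};

// Apply a 3x3 affine matrix to the 2D point (x, y), with w = 1 implied.
function transformPoint(m, x, y) {
  return [
    m[0][0] * x + m[0][1] * y + m[0][2],
    m[1][0] * x + m[1][1] * y + m[1][2]
  ];
}

// Forward: M = T * R * S.
const forward = mul3(translate(100, 50), mul3(rotate(30), scale(2, 2)));

// Inverse: reverse the order and invert each factor.
const inverse = mul3(scale(1 / 2, 1 / 2), mul3(rotate(-30), translate(-100, -50)));

const world = transformPoint(forward, 1, 0);                // approximately [101.732, 51]
const local = transformPoint(inverse, world[0], world[1]);  // approximately [1, 0]
```

A typical use is hit testing: take a click position in world space, map it into the sprite's local space, and test it against the untransformed shape.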

Most game engines and UI frameworks have some version of this. Unity exposes it through its Transform component, CSS through the matrix() transform function, and SVG through the transform attribute. Under the hood, it's the same idea: a 3x3 (or 4x4 in 3D) homogeneous matrix.
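As one concrete case, CSS `matrix(a, b, c, d, tx, ty)` lists the six free entries of the 3x3 matrix column by column and leaves the bottom row $[0\;0\;1]$ implied. The sketch below converts a row-major 3x3 array into that string; `toCssMatrix` is a hypothetical helper name.

```javascript
// Convert a row-major 3x3 affine matrix into a CSS matrix() string.
// CSS orders the arguments by column: (a, b) is the first column,
// (c, d) the second, (tx, ty) the third.
function toCssMatrix(m) {
  const [a, c, tx] = m[0];
  const [b, d, ty] = m[1];
  return `matrix(${a}, ${b}, ${c}, ${d}, ${tx}, ${ty})`;
}

// A pure translation by (100, 50).
const css = toCssMatrix([
  [1, 0, 100],
  [0, 1, 50],
  [0, 0, 1]
]); // "matrix(1, 0, 0, 1, 100, 50)"
```

One thing to keep in mind: the CSS y axis points down, so a rotation that is counterclockwise on paper appears clockwise on screen.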

Key Takeaway: Homogeneous coordinates let you represent translation as a matrix multiply by working in one dimension higher. A 2D point $(x, y)$ becomes $(x, y, 1)$, and a 3x3 matrix encodes scale, rotation, and translation all at once. One matrix per game object, one multiply per vertex. The TRS composition order -- translate $\times$ rotate $\times$ scale, applied right to left -- is the conventional pipeline in most 2D (and 3D) graphics systems.

What's next

Everything we've done in 2D extends to 3D. The 3D graphics pipeline adds one more transformation: perspective projection -- turning a 3D scene into a flat image. That requires 4x4 homogeneous matrices, and the same principles apply: embed 3D space into 4D with a $w$ coordinate, and suddenly projection is just another matrix multiply.