Perspective Projection Matrix

One of the first things to do when starting to render anything is to set up a projection matrix: a transformation matrix that, in a nutshell, maps a 3D point onto a 2D plane (the projection plane, which represents our display). Perspective projection is the most widely used kind, since it models the lens-like behaviour of the human eye.

The perspective projection matrix differs depending on the "handedness" of the coordinate system you build it for. In the convention used here, both systems have the X axis pointing to the right and the Y axis pointing up; the Z axis points towards the viewer in a right-handed system and away from the viewer in a left-handed one. The general structure of the projection matrix is as follows:

Left-Handed:
$$\left[ {\begin{array}{cccc} C & 0 & 0 & 0 \\ 0 & D & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & 1 & 0 \\ \end{array} } \right]$$

Right-Handed:
$$\left[ {\begin{array}{cccc} C & 0 & 0 & 0 \\ 0 & D & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \\ \end{array} } \right]$$

C and D determine the X and Y coordinates after transformation. Perspective projection in gaming applications models the behavior of a camera lens with a focal length of 1, and C and D are determined by the vertical field-of-view angle α and the aspect ratio R (width divided by height):

$$\begin{aligned} C &= \frac{1}{R \cdot \tan(\frac{\alpha}{2})} \\ D &= \frac{1}{\tan(\frac{\alpha}{2})} \end{aligned}$$

The denominators in the expressions for C and D give the visible ranges of X and Y at a distance of 1 from the camera: each coordinate ranges from -<denominator> to <denominator>. For example, if we have α = 90° and R = 1.3, then Y ranges from -1 to 1, and X from -1.3 to 1.3 (since the tangent of 45° is 1).
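
To make the example concrete, here is a minimal Python sketch that computes C and D and the corresponding visible half-extents. The function names `xy_scale` and `half_extents` are made up for this illustration, and the field of view is assumed to be given in degrees.

```python
import math

def xy_scale(fov_y_deg, aspect):
    """Return (C, D) for a vertical field of view in degrees and aspect = width / height."""
    t = math.tan(math.radians(fov_y_deg) / 2.0)
    return 1.0 / (aspect * t), 1.0 / t

def half_extents(fov_y_deg, aspect):
    """Return the visible half-width and half-height at a distance of 1 from the camera."""
    c, d = xy_scale(fov_y_deg, aspect)
    # These are exactly the denominators of C and D.
    return 1.0 / c, 1.0 / d

# The example from the text: alpha = 90 degrees, R = 1.3.
c, d = xy_scale(90.0, 1.3)
half_w, half_h = half_extents(90.0, 1.3)
print(c, d, half_w, half_h)
```

Up to floating-point rounding, the half-extents come out as 1.3 and 1, matching the ranges quoted above.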

To determine the value of the Z coordinate, two additional parameters are necessary: the near and far plane distances, N and F. These distances determine the visibility of a point in space: if a point is closer than N units to the camera, or more than F units away, then it is not visible to our camera. Important note: N and F are absolute (positive) distances along the camera's line of sight. Further below I will illustrate the significance of that.

Matrix components A and B map the Z coordinate of a point so that, after division by the last component of the clip-space vector (the perspective divide), it falls into the range required by the underlying API (NDC space). When using Direct3D or Vulkan, that means a range from 0 to 1; OpenGL uses -1 to 1 by default. I will focus on the former case, since modern OpenGL can switch to the same behavior using the GL_ARB_clip_control extension. The smaller NDC Z range also allows for better precision (which can be improved further, more on this later).

For a better understanding, let us walk through the process of transforming a vector (X, Y, Z, 1) with a projection matrix. First, the vector is transformed into clip space by matrix-vector multiplication (in this example the matrix is written in row-major notation and the vector is a column vector):

Left-Handed:
$$ \left[ {\begin{array}{cccc} C & 0 & 0 & 0 \\ 0 & D & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & 1 & 0 \\ \end{array} } \right] \cdot \left[ {\begin{array}{c} X \\ Y \\ Z \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{c} X\cdot C \\ Y\cdot D \\ Z\cdot A + B \\ Z \\ \end{array} } \right] $$

Right-Handed:
$$ \left[ {\begin{array}{cccc} C & 0 & 0 & 0 \\ 0 & D & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \\ \end{array} } \right] \cdot \left[ {\begin{array}{c} X \\ Y \\ Z \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{c} X\cdot C \\ Y\cdot D \\ Z\cdot A + B \\ -Z \\ \end{array} } \right] $$

Next, the perspective divide is performed to transform the vector into NDC space:

Left-Handed:
$$\begin{aligned} X' &= \frac{X\cdot C}{Z} \\ Y' &= \frac{Y\cdot D}{Z} \\ Z' &= \frac{Z\cdot A + B}{Z} = A + \frac B {Z} \end{aligned}$$

Right-Handed:
$$\begin{aligned} X' &= \frac{X\cdot C}{-Z} \\ Y' &= \frac{Y\cdot D}{-Z} \\ Z' &= \frac{Z\cdot A + B}{-Z} = -A + \frac B {-Z} \end{aligned}$$
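
The two steps (multiplication, then perspective divide) can be sketched in a few lines of Python. The helper names `mat_vec` and `to_ndc` are hypothetical, and the values of C, D, A and B below are arbitrary placeholders chosen to keep the arithmetic easy to follow; they are not derived from any field of view or near/far planes.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (a list of rows) by a 4-component column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]

def to_ndc(m, point):
    """Transform (X, Y, Z) into clip space, then divide by w to get NDC."""
    x, y, z, w = mat_vec(m, [point[0], point[1], point[2], 1.0])
    return (x / w, y / w, z / w)

# Placeholder coefficients (assumption: illustration only, not a real camera).
C, D, A, B = 0.5, 1.0, 2.0, -2.0

left_handed = [[C, 0, 0, 0], [0, D, 0, 0], [0, 0, A, B], [0, 0, 1, 0]]
right_handed = [[C, 0, 0, 0], [0, D, 0, 0], [0, 0, A, B], [0, 0, -1, 0]]

# A point 4 units in front of the camera: +Z when left-handed, -Z when right-handed.
ndc_lh = to_ndc(left_handed, (2.0, 1.0, 4.0))
ndc_rh = to_ndc(right_handed, (2.0, 1.0, -4.0))

print(ndc_lh)  # (0.25, 0.25, 1.5), i.e. Z' = A + B / Z
print(ndc_rh)  # (0.25, 0.25, -2.5), i.e. Z' = -A + B / (-Z)
```

With placeholder values for A and B, Z' does not land in the 0 to 1 range; choosing A and B so that it does is the remaining step.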

With that in mind, we can derive expressions for A and B. First, we define boundary conditions for Z':

  • For a left-handed system: Z' = 0 for Z = N, and Z' = 1 for Z = F;

  • For a right-handed system: Z' = 0 for Z = -N, and Z' = 1 for Z = -F (remember, N and F are absolute distances along the line of sight, and in a right-handed system the line of sight is along -Z).

Left-Handed:
$$\begin{equation*} \left\{ \begin{aligned} & A + \frac B N = 0 \\ & A + \frac B F = 1 \end{aligned} \right. \end{equation*}$$
$$\begin{equation*} \left\{ \begin{aligned} & B = -A \cdot N \\ & B = F - A \cdot F \end{aligned} \right. \end{equation*}$$
$$\begin{equation*} -A \cdot N = F - A \cdot F \end{equation*}$$
$$\begin{equation*} A \cdot (F - N) = F \end{equation*}$$
$$\begin{equation*} A = \frac F {F - N} \end{equation*}$$
$$\begin{equation*} B = -A \cdot N = - \frac {F \cdot N} {F - N} \end{equation*}$$

Right-Handed (substituting Z = -N and Z = -F, so that -Z equals N and F respectively):
$$\begin{equation*} \left\{ \begin{aligned} & -A + \frac B N = 0 \\ & -A + \frac B F = 1 \end{aligned} \right. \end{equation*}$$
$$\begin{equation*} \left\{ \begin{aligned} & B = A \cdot N \\ & B = F + A \cdot F \end{aligned} \right. \end{equation*}$$
$$\begin{equation*} A \cdot N = F + A \cdot F \end{equation*}$$
$$\begin{equation*} A \cdot (F - N) = -F \end{equation*}$$
$$\begin{equation*} A = -\frac F {F - N} \end{equation*}$$
$$\begin{equation*} B = A \cdot N = - \frac {F \cdot N} {F - N} \end{equation*}$$
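
As a sanity check, the boundary conditions can be verified numerically. The sketch below is illustrative only: `depth_coefficients` and `ndc_depth` are hypothetical names, and the right-handed pair uses A = -F/(F - N) and B = -F·N/(F - N), which are the values that give Z' = 0 at Z = -N and Z' = 1 at Z = -F.

```python
def depth_coefficients(near, far, right_handed=False):
    """Return (A, B) mapping the near plane to Z' = 0 and the far plane to Z' = 1."""
    a = far / (far - near)
    b = -(far * near) / (far - near)
    # In the right-handed case only A changes sign; B stays negative.
    return (-a, b) if right_handed else (a, b)

def ndc_depth(z, a, b, right_handed=False):
    """Z' after the perspective divide; w is Z (left-handed) or -Z (right-handed)."""
    w = -z if right_handed else z
    return (z * a + b) / w

near, far = 1.0, 5.0
a_lh, b_lh = depth_coefficients(near, far)
a_rh, b_rh = depth_coefficients(near, far, right_handed=True)

for z in (near, 3.0, far):
    # The right-handed camera looks down -Z, so the same distance is at -z.
    print(z, ndc_depth(z, a_lh, b_lh), ndc_depth(-z, a_rh, b_rh, right_handed=True))
```

Both conventions yield the same Z' for the same distance from the camera. The value at the midpoint between the planes is well above 0.5, which shows that the mapping is not linear in Z.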

And that is how the matrix takes its final shape. Given four parameters (vertical field-of-view angle α, viewport aspect ratio R, near plane distance N and far plane distance F), the perspective projection matrix is defined as follows (row-major notation):

Left-Handed:
$$\left[ {\begin{array}{cccc} \frac{1}{R \cdot \tan(\frac{\alpha}{2})} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\frac{\alpha}{2})} & 0 & 0 \\ 0 & 0 & \frac F {F - N} & - \frac {F \cdot N} {F - N} \\ 0 & 0 & 1 & 0 \end{array} } \right]$$

Right-Handed:
$$\left[ {\begin{array}{cccc} \frac{1}{R \cdot \tan(\frac{\alpha}{2})} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\frac{\alpha}{2})} & 0 & 0 \\ 0 & 0 & - \frac F {F - N} & - \frac {F \cdot N} {F - N} \\ 0 & 0 & -1 & 0 \end{array} } \right]$$
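
Putting the pieces together, a builder for both matrices might look like the following Python sketch. The names `perspective` and `project` are invented for this example, the matrix is stored as a list of rows exactly as written above, and the field of view is assumed to be in degrees.

```python
import math

def perspective(fov_y_deg, aspect, near, far, right_handed=False):
    """Build the 0-to-1 depth perspective matrix as a list of rows."""
    t = math.tan(math.radians(fov_y_deg) / 2.0)
    c = 1.0 / (aspect * t)
    d = 1.0 / t
    a = far / (far - near)
    b = -(far * near) / (far - near)
    if right_handed:
        # The camera looks down -Z: A changes sign and w becomes -Z.
        return [[c, 0.0, 0.0, 0.0],
                [0.0, d, 0.0, 0.0],
                [0.0, 0.0, -a, b],
                [0.0, 0.0, -1.0, 0.0]]
    return [[c, 0.0, 0.0, 0.0],
            [0.0, d, 0.0, 0.0],
            [0.0, 0.0, a, b],
            [0.0, 0.0, 1.0, 0.0]]

def project(m, point):
    """Multiply by the matrix and apply the perspective divide."""
    v = [point[0], point[1], point[2], 1.0]
    x, y, z, w = (sum(m[r][k] * v[k] for k in range(4)) for r in range(4))
    return (x / w, y / w, z / w)

lh = perspective(90.0, 2.0, 1.0, 10.0)
rh = perspective(90.0, 2.0, 1.0, 10.0, right_handed=True)

# Top-right corner of the far plane: expected to land near (1, 1, 1) in NDC.
far_corner_lh = project(lh, (20.0, 10.0, 10.0))
far_corner_rh = project(rh, (20.0, 10.0, -10.0))
# Centre of the near plane: expected to land near (0, 0, 0).
near_center_rh = project(rh, (0.0, 0.0, -1.0))
print(far_corner_lh, far_corner_rh, near_center_rh)
```

Checking a few known points such as the frustum corners is a cheap way to catch a sign error in A or B.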

All that remains is to store the result in the layout that your application/engine prefers. By default, graphics APIs expect matrix data to be laid out column-major, so if your final result is in row-major layout (as shown above), the matrix needs to be transposed before being sent down the GPU pipeline.
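
As an illustration of that last step, the Python sketch below flattens a matrix written as a list of rows into a column-major array, which is the same as transposing it and then writing it out row by row. The helper names are hypothetical, and the entries are kept as symbols so the resulting order is easy to read; how the flat array is actually uploaded depends on your API and is not shown.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def flatten_column_major(m):
    """Flatten a list-of-rows matrix into one list, column after column."""
    return [m[r][c] for c in range(4) for r in range(4)]

# Left-handed matrix from the text, with symbolic entries for readability.
rows = [["C", 0, 0, 0],
        [0, "D", 0, 0],
        [0, 0, "A", "B"],
        [0, 0, 1, 0]]

flat = flatten_column_major(rows)
print(flat)
# ['C', 0, 0, 0, 0, 'D', 0, 0, 0, 0, 'A', 1, 0, 0, 'B', 0]

# Equivalent: transpose first, then write the rows out one after another.
flat_via_transpose = [value for row in transpose(rows) for value in row]
```

Note how B ends up in the second-to-last position of the flat array, while the 1 from the bottom row moves up next to A.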