What is the purpose of the "magic" last row in a 4x4 matrix for 3D graphics?

Question

When I read a book about WebGL, I saw the following matrix description:

The book (WebGL Beginner's Guide, by Diego Cantor and Brandon Jones) says the following about the last row:

The Mysterious Fourth Row: The fourth row does not bear any special meaning. Elements m4, m8, m12 are always zero. Element m16 (the homogeneous coordinate) will always be 1.

So, if the last row is always [0, 0, 0, 1], I don't understand the following:

Why does it have to be exactly [0, 0, 0, 1]? Why not all zeros, or some other values?

But if you look at the source code of the glMatrix JavaScript library, specifically the translate() method of mat4: https://github.com/toji/gl-matrix/blob/master/src/gl-matrix/mat4.js

You can see the following:

 /**
  * Translate a mat4 by the given vector not using SIMD
  *
  * @param {mat4} out the receiving matrix
  * @param {mat4} a the matrix to translate
  * @param {vec3} v vector to translate by
  * @returns {mat4} out
  */
 mat4.scalar.translate = function (out, a, v) {
     var x = v[0], y = v[1], z = v[2],
         a00, a01, a02, a03,
         a10, a11, a12, a13,
         a20, a21, a22, a23;

     if (a === out) {
         out[12] = a[0] * x + a[4] * y + a[8] * z + a[12];
         out[13] = a[1] * x + a[5] * y + a[9] * z + a[13];
         out[14] = a[2] * x + a[6] * y + a[10] * z + a[14];
         out[15] = a[3] * x + a[7] * y + a[11] * z + a[15];
     } else {
         a00 = a[0]; a01 = a[1]; a02 = a[2]; a03 = a[3];
         a10 = a[4]; a11 = a[5]; a12 = a[6]; a13 = a[7];
         a20 = a[8]; a21 = a[9]; a22 = a[10]; a23 = a[11];

         out[0] = a00; out[1] = a01; out[2] = a02; out[3] = a03;
         out[4] = a10; out[5] = a11; out[6] = a12; out[7] = a13;
         out[8] = a20; out[9] = a21; out[10] = a22; out[11] = a23;

         out[12] = a00 * x + a10 * y + a20 * z + a[12];
         out[13] = a01 * x + a11 * y + a21 * z + a[13];
         out[14] = a02 * x + a12 * y + a22 * z + a[14];
         out[15] = a03 * x + a13 * y + a23 * z + a[15];
     }

     return out;
 };

I will highlight the line:

 out[15] = a03 * x + a13 * y + a23 * z + a[15];

The last element (the homogeneous coordinate) is modified here, so it may not be equal to 1.0?

So I don't really understand ...

I see that the inner 3x3 matrix represents rotations and that [ m13, m14, m15 ] is a translation vector that changes the initial position of the camera, but what about the last row, and why do I sometimes see calculations on it in libraries?

PS

I also assume that for 3x3 matrices there is a similar "magic" third row that is used for two-dimensional transformations, am I right?

matrix linear-algebra camera opengl webgl

user4959035 Sep 14 '15 at 13:15

1 answer

Bdl · Accepted Answer · 2015-09-14T14:31:49+0000

Let's start with the theory:

In general, all transformations in OpenGL are mappings between different vector spaces. This means that the transformation t takes an element from the space V and maps it to the corresponding element in the space W, which can be written as

 t: V ---> W

One of the simplest kinds of mapping is a linear map, which can (under certain assumptions **) always be represented by a matrix. The dimensions of the matrix are determined by the dimensions of the vector spaces we work in, so a mapping from R^N to R^M always looks like this:

 t: R^N ---> R^M
 t(x) = A * x,   A ∈ R^(M×N)

where A is a matrix with M rows and N columns.

In OpenGL, we usually need mappings from R^3 to R^3, which means that linear mappings will always be represented by a 3x3 matrix. With this, at least rotation and scaling (and combinations of them ***) can be expressed. But when we look at translations, for example, we see that there is no way to represent them with a 3x3 matrix, so we need to extend our transformations to support these operations as well.
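The reason a 3x3 matrix cannot translate is that a linear map always sends the origin to the origin. The following is a minimal sketch of that fact; the helper `mulMat3Vec3` and the row-major nested-array layout are assumptions made for this illustration, not part of glMatrix.

```javascript
// Multiply a 3x3 matrix (array of rows) by a 3-component vector.
// Hypothetical helper for illustration only.
function mulMat3Vec3(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2]);
}

// Rotation by 90 degrees around the z axis combined with a uniform scale of 2.
const rotateAndScale = [
  [0, -2, 0],
  [2,  0, 0],
  [0,  0, 2],
];

// Every term of the product contains a zero component of the input,
// so the origin cannot be moved by any 3x3 matrix.
const mappedOrigin = mulMat3Vec3(rotateAndScale, [0, 0, 0]);
console.log(mappedOrigin.every(c => c === 0)); // true

// A translation by (1, 0, 0) would have to move the origin to (1, 0, 0),
// which no choice of the nine matrix entries can achieve.
```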

This can be achieved by using affine mappings instead of linear ones, which are defined as

 t: R^N ---> R^M
 t(x) = A * x + b

where A ∈ R^(M×N) is a linear transformation and b ∈ R^M.

Using this, we can express rotations, scaling, and translations from R^3 to R^3 by defining a 3x3 matrix plus a three-dimensional vector. Since this formulation is not very convenient (both a matrix and a vector are required, and it is hard to combine several transformations), the operation is usually stored in a single matrix of dimension N + 1, which is called an augmented matrix; the vectors are extended by one extra component that is set to 1:

 t: R^N ---> R^M

        [ A  b ]   [ x ]
 t(x) = [      ] * [   ]
        [ 0  1 ]   [ 1 ]

As you can see, the last row of the matrix is always zero, except for the rightmost element, which is one. This also ensures that the last component of the result t(x) is always 1.
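As a small sketch of the augmented form, the code below builds a 4x4 matrix from a 3x3 matrix A and a vector b and applies it to a point. The helpers `augment` and `mulMat4Vec4` and the row-major nested-array layout are assumptions chosen for readability; glMatrix itself stores matrices as flat column-major arrays.

```javascript
// Multiply a 4x4 matrix (array of rows) by a 4-component vector.
function mulMat4Vec4(m, v) {
  return m.map(row => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

// Build the augmented matrix [A b; 0 1] from a 3x3 matrix A and a vector b.
function augment(A, b) {
  return [
    [...A[0], b[0]],
    [...A[1], b[1]],
    [...A[2], b[2]],
    [0, 0, 0, 1],
  ];
}

const A = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]; // uniform scale by 2
const b = [10, 20, 30];                      // translation
const M = augment(A, b);

// Extend the point x = (1, 2, 3) with a fourth component equal to 1.
const result = mulMat4Vec4(M, [1, 2, 3, 1]);
console.log(result); // [12, 24, 36, 1], i.e. A * x + b followed by 1
```

The first three components equal A * x + b, and the row [0, 0, 0, 1] copies the trailing 1 into the result unchanged.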

Why does it have to be exactly [0, 0, 0, 1]? Why not all zeros, or some other values?

If we did not restrict the last row to exactly [0,0,0,1], we would no longer have an augmented affine mapping in R^3 but a general linear mapping in R^4. Since R^4 is not really relevant in OpenGL and we want translations to be included, the last row is fixed. In addition, the product of two matrices whose last row is [0,0,0,1] again has that last row, so affine mappings can be combined by matrix multiplication; with a different last row, the product no longer represents an affine mapping in general.
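The closure property can be checked with a short sketch. The helper `mulMat4` and the example matrices are hypothetical and use row-major nested arrays for readability.

```javascript
// Multiply two 4x4 matrices given as arrays of rows.
function mulMat4(a, b) {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, value, k) => sum + value * b[k][j], 0))
  );
}

// Two affine matrices: a translation by (5, 0, 0) and a uniform scale by 2.
const translation = [
  [1, 0, 0, 5],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1],
];
const scale = [
  [2, 0, 0, 0],
  [0, 2, 0, 0],
  [0, 0, 2, 0],
  [0, 0, 0, 1],
];

// The product of two affine matrices keeps the last row [0, 0, 0, 1].
const affineProduct = mulMat4(translation, scale);
console.log(affineProduct[3]); // [0, 0, 0, 1]

// A matrix with a different last row is not affine, and neither is its product.
const general = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 1, 0],
];
const generalProduct = mulMat4(general, translation);
console.log(generalProduct[3]); // [0, 0, 1, 0]
```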

One problem remains: we still cannot express (perspective) projections using affine mappings. If you look at the perspective projection matrix in OpenGL, you will notice that its last row is not [0,0,0,1], but the theory behind that is a completely different story.
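To illustrate what a non-affine last row does, here is a sketch that assumes the conventional OpenGL-style perspective matrix, whose last row is [0, 0, -1, 0]. The function names and the row-major nested-array layout are assumptions for this example, not the glMatrix API.

```javascript
// Conventional OpenGL-style perspective matrix, written as an array of rows.
// fovy is the vertical field of view in radians.
function perspective(fovy, aspect, near, far) {
  const f = 1 / Math.tan(fovy / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0], // not [0, 0, 0, 1]: copies -z into the fourth component
  ];
}

function mulMat4Vec4(m, v) {
  return m.map(row => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

const projection = perspective(Math.PI / 2, 1, 1, 100);

// A point 5 units in front of the camera (the camera looks down -z).
const clip = mulMat4Vec4(projection, [0, 0, -5, 1]);
console.log(clip[3]); // 5: the fourth component is no longer 1

// Dividing by the fourth component is what makes distant points shrink.
const ndc = clip.slice(0, 3).map(c => c / clip[3]);
```

Because the fourth component now depends on the depth of the point, the subsequent division by it cannot be expressed by any matrix with the last row [0, 0, 0, 1].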

What about the last row, and why do I sometimes see calculations on it in libraries? The last element (the homogeneous coordinate) is modified, so it may not be equal to 1.0?

As already mentioned, the last row is [0,0,0,1] only for affine mappings, not for projective ones. But sometimes it makes sense to apply transformations after a projection (for example, moving the projected image around on the screen), and then the last row of the matrix has to be taken into account. That is why most matrix libraries implement all operations in a way that also works for general matrices. The line

 out[15] = a03 * x + a13 * y + a23 * z + a[15];

evaluates to 1 if the last row (a03, a13, a23, a[15]) is [0,0,0,1].
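This can be verified in isolation. The sketch below copies only the expression for the last element from the translate routine quoted in the question; the function name `translatedLastElement` and the sample matrices are hypothetical. The arrays are flat and column-major, as in glMatrix, so the last row lives at indices 3, 7, 11 and 15.

```javascript
// The value that the quoted translate routine writes to out[15].
function translatedLastElement(a, v) {
  return a[3] * v[0] + a[7] * v[1] + a[11] * v[2] + a[15];
}

const v = [3, 4, 5];

// Affine matrix (identity): last row is [0, 0, 0, 1].
const affine = [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 0, 0, 1,
];
console.log(translatedLastElement(affine, v)); // 1

// Simplified projective matrix: last row is [0, 0, -1, 0].
const projective = [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, -1,
  0, 0, 0, 0,
];
console.log(translatedLastElement(projective, v)); // -5
```

For the affine matrix the three products vanish and only a[15] = 1 remains; for the projective matrix the result depends on the translation vector, which is why the library cannot simply hard-code 1.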

Since this post has already become a lot longer than I expected, I had better stop here, but if you have additional questions, just ask and I will try to add something to the answer.

Footnotes:

** This works when both spaces are finite-dimensional vector spaces and a basis is defined for each of them.

*** Combinations, because the composition of linear transformations over finite-dimensional spaces is also linear. For example, if t: R^N → R^M and u: R^M → R^K are both linear, then u(t(x)) is linear.
