Composition of linear transformations: matrix multiplication

Matrix multiplication is a funny operation: it's more universally important, and perhaps more fundamental, than matrix addition. This is often not emphasised in linear algebra textbooks, which treat matrices primarily as arrays of numbers with interesting properties rather than as something motivated directly by the study of linear transformations, but it should become pretty clear once we understand where matrix multiplication actually comes from.

The first and most important property one needs to recognise in order to understand the significance of matrix multiplication is that the composition of two linear transformations is itself a linear transformation. This will turn out to be quite significant when we discuss group theory (it will actually be one of the fundamental requirements for identifying something as a group), but let's try to make it clearer right now.

The statement can be proven pretty easily as follows: suppose $M$ and $N$ are linear transformations and let $L = M \circ N$. Then, for any vectors $x, y$ and scalars $a, b$:

$$\begin{aligned}L(ax + by) &= M(N(ax + by))\\ &= M(aN(x) + bN(y)) && \text{(by linearity of } N\text{)}\\ &= aM(N(x)) + bM(N(y)) && \text{(by linearity of } M\text{)}\\ &= aL(x) + bL(y)\end{aligned}$$
It is trivial from this point (by induction) that the composition of any number of linear transformations is also a linear transformation.

The reason this is so significant is that it lets us talk about compositions of transformations without reference to any specific vector. Much like how the linearity of a single transformation allowed us to write it in matrix form (remember, the matrix form is just the image of the basis), the linearity of a composition allows us to do the same thing with it. In other words, because every vector can be written as a linear combination of the basis vectors, a composition is completely determined by what it does to the basis.
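To spell this out in two dimensions (writing $e_1 = (1, 0)$ and $e_2 = (0, 1)$ for the basis vectors), take any vector $v = v_1 e_1 + v_2 e_2$ and the composition $L = M \circ N$ from before. Then

$$L(v) = M(N(v)) = v_1 M(N(e_1)) + v_2 M(N(e_2))$$

so the two vectors $M(N(e_1))$ and $M(N(e_2))$ tell us everything about the composition; they will be exactly the columns of the matrix we construct below.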

When thinking about a matrix as a set of vectors, it's important to realise that we're talking about an ordered set of vectors, i.e. it's possible to distinguish between the $n$ basis vectors even after the transformation. Transforming (1, 0) to (2, 0) and (0, 1) to (3, 1) is not the same as transforming (1, 0) to (3, 1) and (0, 1) to (2, 0), i.e. $\left[ {\begin{array}{*{20}{c}}2&3\\0&1\end{array}} \right] \ne \left[ {\begin{array}{*{20}{c}}3&2\\1&0\end{array}} \right]$.
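To see concretely that these really are different transformations, apply both to the same vector, say $(1, 2)$:

$$\left[ {\begin{array}{*{20}{c}}2&3\\0&1\end{array}} \right]\left[ {\begin{array}{*{20}{c}}1\\2\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}8\\2\end{array}} \right],\qquad \left[ {\begin{array}{*{20}{c}}3&2\\1&0\end{array}} \right]\left[ {\begin{array}{*{20}{c}}1\\2\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}7\\1\end{array}} \right]$$

The two matrices contain the same columns, but in a different order, and that order matters.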

You might be thinking: aren't the axes symmetric? Isn't there no way to distinguish between the unit vector in the x-direction and the unit vector in the y-direction? This is true -- however, once you label them, they can be distinguished, i.e. you can pick out the image of the x-unit vector under the transformation, and the x-component of a vector being transformed, and point out that they are related. Another way of putting this is that the symmetry between the axes means that switching x and y wherever they appear leaves a correct statement correct.

For example, we know that $\left[ {\begin{array}{*{20}{c}}3&2\\8&4\end{array}} \right]\left[ {\begin{array}{*{20}{c}}6\\5\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{28}\\{68}\end{array}} \right]$. Now suppose we switch the x- and y-axes.

Then first, we would have to switch the images of the x-basis vector and the y-basis vector, giving $\left[ {\begin{array}{*{20}{c}}2&3\\4&8\end{array}} \right]$. That's not enough, however, because we also need to switch the x- and y-components of these images, so our matrix looks like this: $\left[ {\begin{array}{*{20}{c}}4&8\\2&3\end{array}} \right]$. Similarly, we must switch the x- and y-components of the input vector to make it look like this: $\left[ {\begin{array}{*{20}{c}}5\\6\end{array}} \right]$.

Now, if we compute the matrix-vector product, we get $\left[ {\begin{array}{*{20}{c}}{68}\\{28}\end{array}} \right]$, which is precisely the same result with the x- and y-components swapped. This tells us that the output transforms in the same way as the input when x and y are switched, or that the matrix-vector product is invariant under such a relabelling.
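Just to verify the arithmetic on both sides of this relabelling:

$$\left[ {\begin{array}{*{20}{c}}3&2\\8&4\end{array}} \right]\left[ {\begin{array}{*{20}{c}}6\\5\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}3 \cdot 6 + 2 \cdot 5\\8 \cdot 6 + 4 \cdot 5\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{28}\\{68}\end{array}} \right],\qquad \left[ {\begin{array}{*{20}{c}}4&8\\2&3\end{array}} \right]\left[ {\begin{array}{*{20}{c}}5\\6\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}4 \cdot 5 + 8 \cdot 6\\2 \cdot 5 + 3 \cdot 6\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{68}\\{28}\end{array}} \right]$$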

More on this at Introduction to symmetry (1201-001), under "Symmetry on the complex plane".

The point of this detour was to introduce an example of an idea involving the composition of transformations and how it affects the output.

Now, let's try to find a general form for the matrix representing a composition of transformations. Once again, we start with two dimensions, and once again, we note that it is sufficient to calculate the images of the basis vectors after the two transformations and put them together as a matrix.

The first transformation can be represented by the matrix $\left[ {\begin{array}{*{20}{c}}a&c\\b&d\end{array}} \right]$, which means that after the first transformation, the basis vectors have been transformed to its columns: $(a, b)$ and $(c, d)$.

To see what the second transformation (i.e. the transformation on the left), say $\left[ {\begin{array}{*{20}{c}}p&r\\q&s\end{array}} \right]$, does to this transformed basis, just multiply this matrix by each column of $\left[ {\begin{array}{*{20}{c}}a&c\\b&d\end{array}} \right]$. In other words, to multiply two matrices, just multiply the matrix on the left (the transformation applied later) by each column of the matrix on the right (the transformation applied first).

The result should look like this:

$$\left[ {\begin{array}{*{20}{c}}{ap + br}&{cp + dr}\\{aq + bs}&{cq + ds}\end{array}} \right]$$
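Each column here is just the left matrix applied to the corresponding column of the right matrix:

$$\left[ {\begin{array}{*{20}{c}}p&r\\q&s\end{array}} \right]\left[ {\begin{array}{*{20}{c}}a\\b\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{ap + br}\\{aq + bs}\end{array}} \right],\qquad \left[ {\begin{array}{*{20}{c}}p&r\\q&s\end{array}} \right]\left[ {\begin{array}{*{20}{c}}c\\d\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{cp + dr}\\{cq + ds}\end{array}} \right]$$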
The same, of course, can be extended to more than two dimensions.
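If you prefer index notation (writing $M_{ik}$ for the entry of $M$ in row $i$ and column $k$), the same column-by-column recipe gives the usual entry-wise formula for the product of two compatible matrices:

$$(MN)_{ij} = \sum_k M_{ik} N_{kj}$$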

You should be referring to the computational reference guide as you read this -- often, linear algebra is learnt as if it were all about computation. While this is not true, it is useful to have a good command of the computation once you understand the insights.

However, I would advise you to stay away from computational techniques for proving things in linear algebra -- they become tedious pretty quickly. For example, try proving that $(AB)C=A(BC)$ in $\mathbb{R}^n$, i.e. that matrix multiplication is associative. It's possible to prove this pretty easily and intuitively from just the idea of a linear transformation, whereas it's unnecessarily tedious via computation.
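To sketch the intuitive argument: since matrix multiplication represents composition of linear transformations, both $(AB)C$ and $A(BC)$ describe the same transformation, because for any vector $x$,

$$((A \circ B) \circ C)(x) = A(B(C(x))) = (A \circ (B \circ C))(x)$$

so the two products must have the same matrix.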

Linear algebra is not limited to $\mathbb{R}^n$, which is why this "trivial" proof is really more general, and in that sense more rigorous, than the computational one. Mathematics is really all about such heuristics, because it's ultimately these heuristics that formal proofs make precise, not unnecessarily complicated computation. Heuristic proofs deliver understanding much better than computational ones, and you'll find plenty of examples of this within linear algebra.
