Why matrices?

This post is a bit more tedious than necessary, but I suggest you stick with it rather than skipping ahead to the simpler derivations at the end, so that you can appreciate the elegance of linear algebra. We will start by picking a few standard linear transformations in $\mathbb{R}^2$ and writing the components of the transformed vector in terms of the components of the initial vector.

Reflection
The most elementary reflection in $\mathbb{R}^2$ is a reflection across the origin. This is, in fact, the only point across which a reflection is a linear transformation, as reflecting about any other point violates the condition that the origin must remain fixed. The reflection of some vector $\left[ {\begin{array}{*{20}{c}}x\\y\end{array}} \right]$ across the origin is simply $\left[ {\begin{array}{*{20}{c}}-x\\-y\end{array}} \right]$.

Next, we consider the reflection of a vector across a specific axis. It is clear that a reflection across the x-axis maps the given vector to $\left[ {\begin{array}{*{20}{c}}x\\-y\end{array}} \right]$, while a reflection across the y-axis maps it to $\left[ {\begin{array}{*{20}{c}}-x\\y\end{array}} \right]$. In addition, we observe that the reflection of the vector about the line $y=x$ is $\left[ {\begin{array}{*{20}{c}}y\\x\end{array}} \right]$, and that reflection about the line $y=-x$ maps it to $\left[ {\begin{array}{*{20}{c}}-y\\-x\end{array}} \right]$.

Now that we're done with the trivial ones, let's consider a reflection in the generic line $y=mx$ (having a non-zero y-intercept would mean the transformation isn't linear, although it would still be affine).

[Figure: visualisation of the reflection across the line $y=mx$]

We consider the problem in polar co-ordinates -- the original co-ordinates $(x,y)$ in $(r,\theta)$ form are $r=\sqrt{x^2+y^2}$, $\theta=\arctan(y/x)$. $(x',y')$ has the same value of $r'=r$ (prove it -- hint and [SPOILER ALERT]: congruent triangles), while $\theta'=2\arctan(m)-\arctan(y/x)$. Applying the identity for the sum of three inverse tangents, $\arctan a +\arctan b +\arctan c=\arctan\frac{a+b+c-abc}{1-ab-ac-bc}$, with $a=b=m$ and $c=-y/x$, this turns into $\theta'=\arctan\frac{2m-(y/x)+m^2y/x}{1-m^2+2my/x}=\arctan\frac{2mx-y+m^2y}{2my+x-m^2x}$.

Converting back into Cartesian co-ordinates, $x'=r\cos\theta'=r\cos\arctan\frac{2mx-y+m^2y}{2my+x-m^2x}$ and $y'=r\sin\theta'=r\sin\arctan\frac{2mx-y+m^2y}{2my+x-m^2x}$. Recall that $\cos\arctan (Y/X)=X/R$ and $\sin\arctan (Y/X)=Y/R$ where $R=\sqrt{X^2+Y^2}$. In this case, it can easily be shown that $R=(m^2+1)\sqrt{x^2+y^2}=(m^2+1)r$, thus $x' = r\frac{2my+x-m^2x}{(m^2+1)r}$ and $y'=r\frac{2mx-y+m^2y}{(m^2+1)r}$. Hence

$$\left[ {\begin{array}{*{20}{c}}{x'}\\{y'}\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{ - \frac{{{m^2} - 1}}{{{m^2} + 1}}x + \frac{{2m}}{{{m^2} + 1}}y}\\{\frac{{2m}}{{{m^2} + 1}}x + \frac{{{m^2} - 1}}{{{m^2} + 1}}y}\end{array}} \right]
$$
(You might notice some vague similarity to the half-angle tangent formula and complex numbers -- consider the corresponding complex number $x'+y'i$ and prove that it can be reduced to $x' + y'i = \frac{{1 + im}}{{1 - im}}x - \frac{{1 + im}}{{1-im}}yi$, i.e. $z'=\frac{1+im}{1-im}\bar{z}$)
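
As a sanity check on this formula, here is a minimal NumPy sketch (the function name and test values are mine, not part of the derivation): the $m=0$ and $m=1$ cases should reproduce the reflections across the x-axis and across $y=x$ from earlier, and reflecting twice should do nothing.

```python
import numpy as np

def reflection_matrix(m):
    """Matrix of the reflection across the line y = m*x, as derived above."""
    return np.array([[1 - m**2, 2 * m],
                     [2 * m, m**2 - 1]]) / (1 + m**2)

v = np.array([3.0, 2.0])
print(reflection_matrix(0) @ v)  # [ 3. -2.] -- reflection across the x-axis
print(reflection_matrix(1) @ v)  # [ 2.  3.] -- reflection across y = x

# A reflection applied twice should give back the identity:
M = reflection_matrix(0.7)
print(np.allclose(M @ M, np.eye(2)))  # True
```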

You might be starting to notice a similarity among these transformations -- the resulting $x'$ and $y'$ are each pure linear forms in $x$ and $y$, i.e. of the form $ax+by$. How might this make our job easier?

Rotation
We know that a counter-clockwise rotation by an angle of $\pi/2$ maps the vector $\left[ {\begin{array}{*{20}{c}}x\\y\end{array}} \right]$ to $\left[ {\begin{array}{*{20}{c}}-y\\x\end{array}} \right]$. Similarly, a rotation by $\pi$ maps it to $\left[ {\begin{array}{*{20}{c}}-x\\-y\end{array}} \right]$ (which can also be obtained by applying the $\pi/2$ transformation twice) and a rotation by $3\pi/2$ yields a transformed vector of $\left[ {\begin{array}{*{20}{c}}y\\-x\end{array}} \right]$.

We now consider a general rotation by $\phi$. Again, the problem becomes much simpler in polar co-ordinates, where it is clear that the rotation maps $(r,\theta)$ to $(r',\theta')=(r,\theta+\phi)$. Then

$$\begin{array}{c}\left[ {\begin{array}{*{20}{c}}{x'}\\{y'}\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}{r\cos (\theta + \phi )}\\{r\sin (\theta + \phi )}\end{array}} \right]\\ = \left[ {\begin{array}{*{20}{c}}{r(\cos \theta \cos \phi - \sin \theta \sin \phi )}\\{r(\cos \theta \sin \phi + \sin \theta \cos \phi )}\end{array}} \right]\\ = \left[ {\begin{array}{*{20}{c}}{\frac{x}{{\cos \theta }}\cos \theta \cos \phi - \frac{y}{{\sin \theta }}\sin \theta \sin \phi }\\{\frac{x}{{\cos \theta }}\cos \theta \sin \phi + \frac{y}{{\sin \theta }}\sin \theta \cos \phi }\end{array}} \right]\\ = \left[ {\begin{array}{*{20}{c}}{x\cos \phi - y\sin \phi }\\{x\sin \phi + y\cos \phi }\end{array}} \right]\end{array}
$$ 
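Again, a quick numerical check of the derivation, assuming NumPy (names are mine): the matrix should send $(1,0)$ to $(0,1)$ for $\phi=\pi/2$, and composing two quarter-turns should give the half-turn.

```python
import numpy as np

def rotation_matrix(phi):
    """Counter-clockwise rotation by phi -- the matrix just derived."""
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi), np.cos(phi)]])

print(rotation_matrix(np.pi / 2) @ np.array([1.0, 0.0]))  # ~[0. 1.]

# Composing two quarter-turns gives the half-turn:
R = rotation_matrix(np.pi / 2)
print(np.allclose(R @ R, rotation_matrix(np.pi)))  # True
```
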
Matrices
Similarly, one could write dilation/scaling as $(x',y') = (ax,by)$, a shear parallel to the x-axis could be written as $(x',y')=(x+\lambda y,y)$ and a shear parallel to the y-axis could be written as $(x',y')=(x,y+\lambda x)$.
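
These are just as easy to check numerically; a short sketch with illustrative values of $a$, $b$ and $\lambda$ of my own choosing:

```python
import numpy as np

scale = np.array([[2.0, 0.0],    # a = 2: stretch x by 2
                  [0.0, 3.0]])   # b = 3: stretch y by 3
shear_x = np.array([[1.0, 0.5],  # lambda = 0.5: shear parallel to the x-axis
                    [0.0, 1.0]])

v = np.array([1.0, 1.0])
print(scale @ v)    # [2. 3.]
print(shear_x @ v)  # [1.5 1. ] -- i.e. (x + 0.5*y, y)
```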

It seems that one needs four numbers to define any linear transformation from $\mathbb{R}^2\rightarrow\mathbb{R}^2$: the coefficient of $x$ in $x'$, the coefficient of $y$ in $x'$, the coefficient of $x$ in $y'$ and the coefficient of $y$ in $y'$. In general, a linear transformation $\mathbb{R}^n\rightarrow\mathbb{R}^n$ could be represented with $n^2$ real numbers.

$n^2$ real numbers can, of course, be arranged into an $n\times n$ array of real numbers, which is precisely what we mean by a "matrix".

The following array shows what each component of a matrix represents:

$$\left[ {\begin{array}{*{20}{c}}{x\backslash x}&{x\backslash y}&{x\backslash z}\\{y\backslash x}&{y\backslash y}&{y\backslash z}\\{z\backslash x}&{z\backslash y}&{z\backslash z}\end{array}} \right]$$
where the notation $x \backslash y$ refers to the contribution of $y$ to $x'$, i.e. the coefficient of $y$ when $x'$ is written in terms of $x,y,z$. This can, of course, be generalised to any number of dimensions.

With matrices, we now have a way of writing down linear transformations in a clear and explicit way (a co-ordinate-dependent one -- an idea we will eventually come back to). Instead of writing the components of the transformed entity (called the image) in terms of the components of the input, we can represent transformations on their own as matrices, which helps us greatly in algebraic manipulation.

Restricting ourselves to two dimensions for simplicity, let's compute the effect of a linear transformation on (1,0) and (0,1). We see that

$$\begin{array}{l}\left[ {\begin{array}{*{20}{c}}a&b\\c&d\end{array}} \right]\left[ {\begin{array}{*{20}{c}}1\\0\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}a\\c\end{array}} \right]\\\left[ {\begin{array}{*{20}{c}}a&b\\c&d\end{array}} \right]\left[ {\begin{array}{*{20}{c}}0\\1\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}b\\d\end{array}} \right]\end{array}$$
This is very important -- you can confirm this for more than two dimensions -- it tells you that the matrix is just made up of columns, each column being the image of one basis vector (arranged in the same order as the components of a vector) under the transformation. In other words, the matrix is the image of the basis under the transformation.
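
You can watch this happen directly; here is a three-line NumPy check, with an arbitrary matrix of my own choosing:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

print(M @ e1)  # [1. 3.] -- the first column of M
print(M @ e2)  # [2. 4.] -- the second column of M
print(np.array_equal(np.column_stack([M @ e1, M @ e2]), M))  # True
```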

One may notice that a linear transformation on a general vector takes the following form:
$$\left[ {\begin{array}{*{20}{c}}a&b\\c&d\end{array}} \right]\left[ {\begin{array}{*{20}{c}}x\\y\end{array}} \right] = x\left[ {\begin{array}{*{20}{c}}a\\c\end{array}} \right] + y\left[ {\begin{array}{*{20}{c}}b\\d\end{array}} \right]$$
But since (a,c) and (b,d) are simply the images of the basis vectors under the transformation, the result is just a linear combination of the transformed basis vectors -- in fact, the same linear combination as the original vector was of the un-transformed basis vectors. In other words, the transformation $\left[ {\begin{array}{*{20}{c}}a&b\\c&d\end{array}} \right]$ does the following:

$$x\left[ {\begin{array}{*{20}{c}}1\\0\end{array}} \right] + y\left[ {\begin{array}{*{20}{c}}0\\1\end{array}} \right] \to x\left[ {\begin{array}{*{20}{c}}a\\c\end{array}} \right] + y\left[ {\begin{array}{*{20}{c}}b\\d\end{array}} \right]$$
This is, if you haven't recognised it yet, exactly why matrices work. It is why the observation we made earlier -- that all linear transformations result in components that are linear combinations of the original components -- is true.

This means that to study any linear transformation -- write down its matrix form, etc. -- we only need to look at how it transforms the $n$ basis vectors. Since any vector can be written as a linear combination of these basis vectors (that's why they form a basis), this allows us to encode the entire transformation and decide what impact it has on any other vector. This follows directly from the definition of a linear transformation -- the image of a linear combination of (basis) vectors (i.e. the original vector itself) is equal to the same linear combination of the images of each (basis) vector.
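
To see that sampling the basis really does capture the whole transformation, here is a sketch: $f$ below is a stand-in linear map of my own choosing (any linear function would do), and the matrix rebuilt from its action on the basis agrees with $f$ on an arbitrary vector.

```python
import numpy as np

def f(v):
    """A stand-in linear map: rotation by 0.3 rad followed by scaling by 2."""
    c, s = np.cos(0.3), np.sin(0.3)
    return 2.0 * np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

# Sample f on the basis and stack the images as columns:
M = np.column_stack([f(e) for e in np.eye(2)])

v = np.array([0.7, -1.9])
print(np.allclose(M @ v, f(v)))  # True -- the matrix encodes all of f
```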

Why can't you divide vectors?
The question "what does $\vec{a}/\vec{b}$ equal?" is equivalent to asking "What do you multiply with $\vec b$ to get $\vec a$?" -- the answer is a matrix, assuming the multiplication involves some contraction afterwards. Equivalently, you multiply and contract a (1, 0) tensor $b^\mu$ by a (1,1) tensor $A_\mu^\nu$ to get a (0,1) tensor $b^\nu$.

But there are multiple matrices you can multiply $\vec b$ by to get $\vec a$. In two dimensions, you need *two* sets of "this maps to this" (and the knowledge that the mapping is linear) to pin down what the linear mapping is. In general, in $n$ dimensions, you need $n$ such pairs with linearly independent inputs -- so instead of dividing vectors, you divide *sets of vectors* -- these are called matrices.
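
One way to make "dividing sets of vectors" concrete, as a sketch with made-up numbers: stack the $n$ independent inputs as columns of a matrix $B$ and their images as columns of $A$; the unique linear map sending each input to its image is then $M = AB^{-1}$, which really is a division of $A$ by $B$.

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # two independent input vectors, as columns
A = np.array([[2.0, 3.0],
              [0.0, 1.0]])  # their prescribed images, as columns

M = A @ np.linalg.inv(B)      # "A divided by B"
print(M)                      # [[2. 1.], [0. 1.]]
print(np.allclose(M @ B, A))  # True: M sends each input to its image
```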

(Taken from my answer to "Why is division not defined for vectors?")
