Triangular matrices: Schur, QR, Cholesky and LU/LPU decompositions

Upper-triangular matrices seem like they should be something interesting. They seem far more "controllable" in their behavior than general matrices -- the first basis direction is just stretched, and each successive basis vector is sent into the span of itself and the earlier ones, so its behavior doesn't depend on the higher dimensions. And there are of course practical computational reasons why they're simpler in a certain sense.

It's sensible to wonder if a general matrix can be understood in this way -- yes, yes, of course you can with the Jordan normal form, but perhaps there's something more interesting to be said.

Some properties of triangular matrices (a quick numerical check follows the list):
  • The determinant of a triangular matrix is the product of its diagonal entries: this generalizes the fact that the area of a parallelogram is its base times its perpendicular height.
  • The eigenvalues of a triangular matrix are its diagonal entries, counted with (algebraic) multiplicity: follows from the above, since $A-\lambda I$ remains triangular.
  • Normal triangular matrices are diagonal: the $(k+1)$-th eigenvector must lie in the orthogonal complement of $\mathbb{R}^k$ in $\mathbb{R}^{k+1}$, which is one-dimensional -- just the $x_{k+1}$-axis.
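To see the first two facts concretely, here's a throwaway numpy check (the matrix below is just an arbitrary example I made up):

```python
import numpy as np

# An arbitrary upper-triangular example.
T = np.array([[2.0, 1.0,  5.0],
              [0.0, 3.0, -1.0],
              [0.0, 0.0,  4.0]])

print(np.isclose(np.linalg.det(T), 2.0 * 3.0 * 4.0))          # det = product of diagonal
print(np.allclose(np.sort(np.linalg.eigvals(T)), [2, 3, 4]))  # eigenvalues = diagonal entries
```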
I tend to think that all linear transformations already "geometrically" look like triangular matrices, though -- like you should be able to tilt your head some way so that the transformation looks upper-triangular. What this means is that we believe all linear transformations are unitarily triangularizable.

Is this so?

We can certainly identify the first vector of the change-of-basis matrix -- an actual eigenvector of the matrix, whose eigenvalue appears as the top-left entry of the triangular matrix.

Now we want a vector whose image is a linear combination of this eigenvector and itself. Here's an idea: compress the matrix onto the orthogonal complement of the first eigenvector, i.e. restrict it to that subspace and project the result back into it. This compression has an eigenvector of its own, and the action of the full matrix on that vector is the vector scaled, plus some component orthogonal to the subspace -- i.e. proportional to the first eigenvector. Repeating this process on smaller and smaller subspaces yields the Schur decomposition of $A$:

$$A=UTU^{*}$$
For unitary $U$, triangular $T$.

(By the way, a nested sequence of subspaces of consecutively increasing dimension is known as a flag. For this reason, the Schur decomposition is often phrased as saying that every linear transformation stabilizes a flag -- here, the flag of subspaces spanned by the first $k$ columns of $U$.)
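Here's the construction above transcribed into code -- a rough numpy sketch (the function name `my_schur` is mine, it leans on numpy's general eigensolver at every step, and it makes no claim to numerical robustness), just to watch the recursion produce a unitary $U$ and upper-triangular $T$:

```python
import numpy as np

def my_schur(A):
    # Rough sketch of the recursive construction above (function name is mine,
    # and this is not a numerically robust algorithm): returns (U, T) with
    # A = U T U*, U unitary, T upper triangular.
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A.copy()
    # 1. pick any eigenvector v of A
    _, vecs = np.linalg.eig(A)
    v = vecs[:, 0]
    # 2. complete v to a unitary Q whose first column is (proportional to) v,
    #    by QR-factorizing [v, e_2, ..., e_n]
    M = np.eye(n, dtype=complex)
    M[:, 0] = v
    Q, _ = np.linalg.qr(M)
    # 3. in the basis given by Q's columns, A maps span(q_1) into itself,
    #    so Q* A Q has the block form [[lam, b], [0, A2]]
    B = Q.conj().T @ A @ Q
    lam, b, A2 = B[0, 0], B[0, 1:], B[1:, 1:]
    # 4. recurse on A2, the compression of A to the orthogonal complement of v
    U2, T2 = my_schur(A2)
    U = Q @ np.block([[np.eye(1), np.zeros((1, n - 1))],
                      [np.zeros((n - 1, 1)), U2]])
    T = np.block([[np.array([[lam]]), b[None, :] @ U2],
                  [np.zeros((n - 1, 1)), T2]])
    return U, T

A = np.random.randn(5, 5)
U, T = my_schur(A)
print(np.allclose(U @ T @ U.conj().T, A))   # A = U T U*
print(np.allclose(np.tril(T, -1), 0))       # T is upper triangular
```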



The Gram-Schmidt process might remind you of triangular matrices, given the nature of the transformations it applies -- each vector is adjusted only by the vectors that came before it.

$$\begin{array}{ll}
  v_1 = x_1 & u_1 = \frac{v_1}{\left| v_1 \right|} \\
  v_2 = x_2 - (x_2 \cdot u_1)\,u_1 & u_2 = \frac{v_2}{\left| v_2 \right|} \\
  \;\;\vdots & \;\;\vdots \\
  v_k = x_k - \sum_{i<k} (x_k \cdot u_i)\,u_i & u_k = \frac{v_k}{\left| v_k \right|} \\
  \;\;\vdots & \;\;\vdots
\end{array}$$
Indeed, one may check that this is equivalent to writing $A=QR$, where $A$ is the matrix with columns $x_1,\dots,x_n$, $Q$ is the unitary matrix with columns $u_1,\dots,u_n$, and $R$ is upper-triangular (its entries are exactly the Gram-Schmidt coefficients). This factorization is called the QR factorization.
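In code the correspondence is direct: the coefficients $(x_k \cdot u_i)$ and the norms $|v_k|$ are the entries of $R$. A numpy sketch (the function name `gram_schmidt_qr` is mine; classical Gram-Schmidt is fine for illustration but numerically poor, and modified Gram-Schmidt or Householder reflections are used in practice):

```python
import numpy as np

def gram_schmidt_qr(A):
    # Classical Gram-Schmidt, transcribed from the equations above (function
    # name is mine).  The coefficients x_k . u_i and the norms |v_k| are
    # recorded in R, so A = Q R with Q orthonormal and R upper triangular.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].astype(float).copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]   # (x_k . u_i)
            v -= R[i, k] * Q[:, i]        # subtract the component along u_i
        R[k, k] = np.linalg.norm(v)       # |v_k|
        Q[:, k] = v / R[k, k]             # u_k = v_k / |v_k|
    return Q, R

A = np.random.randn(4, 4)
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))
```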

But there's another way to think about this, right? What $A=QR$ says is that the transformation $A$ arises from first performing a triangular transformation, then rotating it into the desired orientation.

Imagine this in three dimensions, which I couldn't be bothered to draw.
And one can easily check that the algorithm for doing so is precisely the Gram-Schmidt algorithm.

But there's yet another way to see this factorization. Instead of just calculating the orthonormal vectors at each step, one could actually try to transform our column vectors into those of the triangular matrix through a bunch of orthogonal transformations.

One specific way of doing this is through reflections -- specifically reflections in $(n-1)$-dimensional hyperplanes, known as Householder reflections.
  1. The first reflection is in the plane midway between $a_1$ and the $e_1$ axis, and brings $a_1$ onto the $e_1$ axis.
  2. The next reflection is in the plane that contains the $e_1$ axis and is midway between $a_2$ and the $(e_1,e_2)$ plane, and brings $a_2$ onto the $(e_1,e_2)$ plane.
  3. The next reflection is in the plane that contains the $(e_1,e_2)$ plane and is midway between $a_3$ and the $(e_1,e_2,e_3)$ plane, and brings $a_3$ onto the $(e_1,e_2,e_3)$ plane.
  4. ...
I.e. in step $i$ (between 1 and $n$), all the columns of $A$ are reflected in the plane that contains the $(e_1,\dots,e_{i-1})$ plane and is midway between $a_i$ and the $(e_1,\dots,e_i)$ plane -- where $a_i$ here means the current $i$-th column, after the earlier reflections have been applied. That column is then never moved again, as it lies in all the subsequent planes of reflection.

The reflection matrix $Q_i$ can be seen to be of the form (do you see why?):

$${Q_i} = \left[ \begin{array}{cc}
  I & 0 \\
  0 & F_i
\end{array} \right]$$
Where the identity block is $(i-1)$-dimensional and $F_i$ is $(n-i+1)$-dimensional.

Here's how we can calculate $F_i$: we know that $F_i$ transforms the last $n-i+1$ elements of $a_i$ (elements $i$ through $n$, written $a_i^{i:n}$) like this:

$$a_i^{i:n} \mapsto \left[ \begin{array}{c}
  \left| a_i^{i:n} \right| \\
  0 \\
  \vdots \\
  0
\end{array} \right]$$
Although this generally isn't enough to pin down a transformation, we already know that the transformation is a reflection in a hyperplane. Such a reflection can be written as $I-2hh^T$, where $h$ is the unit normal to the plane of reflection, and we can calculate this normal (up to normalization) as $F_i a_i^{i:n} - a_i^{i:n}$, i.e. the image of the vector minus the vector itself.
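Assembling these reflections gives a complete QR algorithm. Here's a numpy sketch (the function name `householder_qr` is mine, indices are 0-based in code, and the sign of the target vector is chosen to avoid cancellation -- a standard numerical tweak not mentioned above):

```python
import numpy as np

def householder_qr(A):
    # Householder QR along the lines above (function name is mine; 0-based
    # indices).  At step i we reflect the trailing part of the current i-th
    # column onto the first axis of the trailing subspace.  Returns Q, R with
    # A = Q R.  Sketch only: no handling of rank deficiency.
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(min(m, n)):
        x = R[i:, i]                              # the trailing part of column i
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue
        # Target is (+/-)|x| e_1; the sign is chosen opposite to x[0] to avoid
        # cancellation when forming the normal h.
        target = np.zeros_like(x)
        target[0] = -np.sign(x[0]) * norm_x if x[0] != 0 else norm_x
        h = x - target                            # normal to the reflecting hyperplane
        h /= np.linalg.norm(h)
        F = np.eye(m - i) - 2.0 * np.outer(h, h)  # F_i = I - 2 h h^T
        R[i:, :] = F @ R[i:, :]                   # apply Q_i = blkdiag(I, F_i) to every column
        Q[:, i:] = Q[:, i:] @ F                   # accumulate Q = Q_1 Q_2 ...
    return Q, R

A = np.random.randn(4, 4)
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))
```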

The QR decomposition is basically just another "all matrices are basically ____" theorem -- here the blank is "triangular", just as the polar decomposition says "all matrices are basically positive-definite" -- they're just a rotation away.

In particular, this gives us a new "square root" of a positive-definite matrix: much like the polar decomposition $A=UP$ gives the unique positive-definite square root of $M=A^TA$ (namely $P$, with $P^2=M$), the QR decomposition $A=QR$ gives a unique triangular "square root" of $M$ (once we fix the diagonal of $R$ to be positive), since $M=A^TA=R^TQ^TQR=R^TR$.

This is known as the Cholesky decomposition (often written with the lower-triangular factor, $M=LL^T$ where $L=R^T$).
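Concretely, here's a numpy sketch of that connection: QR-factorize $A$, and numpy's Cholesky routine recovers $R^T$ up to the signs of $R$'s diagonal:

```python
import numpy as np

A = np.random.randn(5, 5)
M = A.T @ A                        # positive-definite (for invertible A)

Q, R = np.linalg.qr(A)             # A = Q R  =>  M = A^T A = R^T R
L = np.linalg.cholesky(M)          # M = L L^T, L lower triangular, positive diagonal

# R^T agrees with L once the signs of R's diagonal are fixed to be positive
signs = np.sign(np.diag(R))
print(np.allclose(R.T * signs, L))
print(np.allclose(L @ L.T, M))
```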

(BTW, we can QR-decompose an $m$ by $n$ matrix ($m\ge n$) too. There are two ways to think about this, as sketched in code below:

  • Thinking in the Gram-Schmidt sense: we're building an orthonormal basis for the image (column space) of $A$, so $Q$ is an $m$ by $n$ matrix with orthonormal columns and $R$ is an $n$ by $n$ triangular matrix.
  • Thinking in the Householder sense, we're obliged to end up with an $m$ by $m$ $Q$ and an $m$ by $n$ $R$ (with the extra bottom rows being zeroes). The extra columns of $Q$ are then just some orthonormal basis for the orthogonal complement of the image of $A$ (i.e. the left null space of $A$).)
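numpy exposes both conventions directly; a quick sketch:

```python
import numpy as np

A = np.random.randn(6, 3)                       # m = 6, n = 3

# Gram-Schmidt-style ("reduced") QR: Q is 6x3 with orthonormal columns, R is 3x3.
Qr, Rr = np.linalg.qr(A, mode='reduced')

# Householder-style ("complete") QR: Q is 6x6 orthogonal, R is 6x3 with zero bottom rows.
Qc, Rc = np.linalg.qr(A, mode='complete')

print(Qr.shape, Rr.shape, Qc.shape, Rc.shape)   # (6, 3) (3, 3) (6, 6) (6, 3)
print(np.allclose(Rc[3:, :], 0))                # the extra rows of R are zero
print(np.allclose(Qc[:, 3:].T @ A, 0))          # extra columns of Q are orthogonal to im(A)
```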



We've all solved linear equations $Ax=b$ through elimination, but let's get a consistent general algorithm (Gaussian elimination) to do so.

The idea is to get $A$ down to a triangular form, isn't it? And we can do that by zeroing out each column below the diagonal, one column at a time -- and the row operations that do this assemble into a lower-triangular matrix. That's the LU decomposition, $A=LU$. But sometimes the pivot entry -- the one you'd use to zero out the rest of its column -- is itself zero, and you can't fix that by multiplying by infinity, so you just swap two rows ("pivoting") and continue. That's the LPU decomposition, where $P$ is a permutation matrix (the $P$ can come before or after the $LU$, too -- the fact that you can factor out the permutation matrix this way is left as an exercise).
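A numpy sketch of the column-by-column elimination with row swaps (the function name `lu_pivot` is mine; this produces the "P in front" variant, $PA=LU$, i.e. $A=P^{-1}LU$, and it pivots on the largest available entry, which is what one does in practice for stability):

```python
import numpy as np

def lu_pivot(A):
    # Gaussian elimination with row swaps, as described above (function name is
    # mine).  Returns P, L, U with P A = L U.  Sketch only: no handling of
    # singular matrices.
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        # swap in the row with the largest entry in column k ("pivoting")
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p], :] = U[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]
        # zero out column k below the diagonal, recording the multipliers in L
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, :] -= L[i, k] * U[k, :]
    return P, L, U

A = np.random.randn(4, 4)
P, L, U = lu_pivot(A)
print(np.allclose(P @ A, L @ U))
```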
