Matrix decompositions

Some decompositions

The algorithms for the QR and LU decompositions are fairly self-explanatory from their definitions: Gram-Schmidt orthogonalization computes QR, and Gaussian elimination (recording the multipliers) computes LU.
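
For concreteness, here is a minimal QR sketch via classical Gram-Schmidt (the function name is mine; this variant is clear but numerically naive -- serious code would use Householder reflections or at least modified Gram-Schmidt):

import numpy as np

def qr_gram_schmidt(A):
    """QR factorization via classical Gram-Schmidt; assumes full column rank."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # component along earlier basis vector
            v -= R[i, j] * Q[:, i]        # subtract it off
        R[j, j] = np.linalg.norm(v)       # what remains is the new direction
        Q[:, j] = v / R[j, j]
    return Q, R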

The Cholesky decomposition can be computed for a nonnegative-definite matrix through a simultaneous LU decomposition from both sides. Starting with the matrix written in block form:

\[A = \begin{bmatrix} a_1 & v^T \\ v & A_2 \end{bmatrix}\]
We perform an LU decomposition first on the columns, from the left, then on the rows, from the right:

\[\begin{aligned}
\begin{bmatrix} a_1 & v^T \\ v & A_2 \end{bmatrix}
&= \begin{bmatrix} \sqrt{a_1} & 0 \\ v/\sqrt{a_1} & 1 \end{bmatrix}
\begin{bmatrix} \sqrt{a_1} & v^T/\sqrt{a_1} \\ 0 & A_2 - vv^T/a_1 \end{bmatrix} \\
&= \begin{bmatrix} \sqrt{a_1} & 0 \\ v/\sqrt{a_1} & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & A_2 - vv^T/a_1 \end{bmatrix}
\begin{bmatrix} \sqrt{a_1} & v^T/\sqrt{a_1} \\ 0 & 1 \end{bmatrix}
\end{aligned}\]
The process can then be continued recursively on the middle block $A_2 - vv^T/a_1$ (the Schur complement), giving an LU decomposition in which $L$ and $U$ are transposes of each other -- that is, the Cholesky decomposition $A = LL^T$.
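
Here is a minimal sketch of this elimination as a loop, assuming a symmetric positive-definite input so that every pivot $a_1$ is strictly positive (names are illustrative):

import numpy as np

def cholesky(A):
    """Return lower-triangular L with A = L @ L.T, via the block elimination above."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for k in range(n):
        pivot = np.sqrt(A[k, k])          # sqrt(a_1); needs a_1 > 0
        L[k, k] = pivot
        L[k+1:, k] = A[k+1:, k] / pivot   # v / sqrt(a_1)
        # Replace the trailing block with the Schur complement A_2 - v v^T / a_1
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k+1:, k]) / A[k, k]
    return L

# Illustrative check:
A = np.array([[4., 2.], [2., 3.]])
L = cholesky(A)
assert np.allclose(L @ L.T, A)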

(Why does the matrix have to be nonnegative-definite?)

Eigenvalue algorithm

A lot of factorizations are fundamentally related to the "eigen"-stuff: this includes the change-of-basis factorizations -- diagonalization, SVD, Schur -- and also the polar decomposition. Eigenstuff is fundamentally analytic: eigenvalues are the roots of the characteristic polynomial, and by Abel-Ruffini the roots of a general polynomial of degree five or more admit no finite formula in arithmetic operations and radicals. So terminating algorithms won't work; we need asymptotic (iterative) algorithms.

A simple such algorithm to calculate the principal eigenvalue and eigenvector is power iteration. It is related to the flows of systems of linear differential equations, in which the eigenspace with the largest eigenvalue provides the only stable equilibrium direction of the system. Unless the initial vector $v$ is unluckily chosen to have no component in the principal eigenspace of $A$, the direction of $A^nv$ approaches that eigenspace -- more precisely, $A^nv/\lambda_1^n$ approaches the projection of $v$ onto it, so in practice one simply renormalizes each iterate.
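
A minimal sketch, assuming the largest-magnitude eigenvalue is simple (a random start almost surely has a component along the principal eigenvector):

import numpy as np

def power_iteration(A, num_iters=1000, seed=0):
    """Estimate the largest-magnitude eigenvalue of A and a unit eigenvector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])   # generic starting vector
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)            # renormalize to avoid overflow/underflow
    return v @ A @ v, v                   # Rayleigh quotient as eigenvalue estimate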

So that gives you the largest-magnitude eigenvalue. There are many ways to get the remaining ones -- one way I thought of is deflation: subtract off the rank-one part corresponding to the stretching of the primary eigenvector (for symmetric $A$, replace $A$ with $A - \lambda_1 v_1 v_1^T$) and run power iteration again. But repeating this is inefficient and numerically unstable, since errors in each computed eigenpair contaminate all the later ones.

An actually sensible approach is as follows: consider the matrix $(A-\mu I)^{-1}$, which has the same eigenvectors as $A$ but with eigenvalues $1/(\lambda-\mu)$. Its largest-magnitude eigenvalue therefore corresponds to the eigenvalue of $A$ closest to $\mu$, so power iteration on $(A-\mu I)^{-1}$ -- inverse iteration -- finds exactly that eigenpair. By varying $\mu$ across the real line, one can discover all the (real) eigenvalues of $A$.
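
A sketch of this shift-and-invert idea, solving a linear system at each step rather than forming the inverse explicitly (in practice one would factorize $A-\mu I$ once and reuse it):

import numpy as np

def inverse_iteration(A, mu, num_iters=100, seed=0):
    """Find the eigenpair of A whose eigenvalue is closest to the shift mu."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    v = rng.standard_normal(n)
    shifted = A - mu * np.eye(n)
    for _ in range(num_iters):
        v = np.linalg.solve(shifted, v)   # apply (A - mu I)^{-1} to v
        v /= np.linalg.norm(v)
    return v @ A @ v, v                   # Rayleigh quotient as eigenvalue estimate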

Schur algorithm

You can use the eigenvalue algorithm to calculate the Schur decomposition directly via its definition -- however, there is a more efficient method.

A triangular matrix is invariant under conjugation by the $Q$ of its own QR decomposition (because that $Q$ is the identity matrix), so triangular matrices are fixed points of the map $A = QR \mapsto RQ$. Moreover, this map is a similarity transformation, since $RQ = Q^TAQ$. So, similarly to power iteration, one may iteratively QR-factorize $A$ and replace it with $RQ$; under suitable conditions the iterates converge to an upper triangular matrix with the same eigenvalues, and the accumulated $Q$s form the orthogonal change of basis. This is called the QR algorithm.
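
A minimal sketch of the unshifted iteration (it behaves well for, say, symmetric matrices; real matrices with complex eigenvalues only reach the block-triangular "real Schur" form):

import numpy as np

def qr_algorithm(A, num_iters=500):
    """Unshifted QR iteration: returns (U, T) with A = U @ T @ U.T (approximately)."""
    T = np.array(A, dtype=float)
    U = np.eye(T.shape[0])
    for _ in range(num_iters):
        Q, R = np.linalg.qr(T)
        T = R @ Q              # similarity transform: Q^T T Q
        U = U @ Q              # accumulate the change of basis
    return U, T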

This is expensive in its crude form -- each QR factorization costs $O(n^3)$ -- so you instead first bring the matrix into "upper Hessenberg form" (upper triangular but with a nonzero subdiagonal), which the QR iteration preserves, and on which each QR step costs only $O(n^2)$. The mechanism to bring a matrix into upper Hessenberg form is based on Householder reflections: you reflect the part of each column below the subdiagonal to zero, and the corresponding Householder reflections on the rows (required, since the Schur decomposition is a change-of-basis transformation) do not touch the column of interest, so they don't screw up all your zeroes. It's not very interesting.
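
A sketch of the reduction, with the two-sided reflections applied explicitly (illustrative, not optimized):

import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form by Householder similarity transforms."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        x = H[k+1:, k]
        if np.allclose(x[1:], 0):
            continue                      # already zero below the subdiagonal
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Reflect the rows below row k (zeroes column k below the subdiagonal) ...
        H[k+1:, :] -= 2.0 * np.outer(v, v @ H[k+1:, :])
        # ... and the matching columns, making this a similarity transform; the
        # right-multiplication only touches columns k+1 onward, leaving the zeroes alone.
        H[:, k+1:] -= 2.0 * np.outer(H[:, k+1:] @ v, v)
    return H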

Least-squares

Suppose you wanted to solve $Ax\approx b$, i.e. find the $x$ that minimizes $\|Ax-b\|$. This is relevant when $A$ is not surjective, so that $b$ need not lie in its image.

The idea is that $\|Ax-b\|$ is minimized when $Ax$ is the projection $b_A$ of $b$ onto the image of $A$, i.e. exactly when $Ax-b$ is perpendicular to the column space of $A$. One can write this condition as $A^{T}(Ax-b)=0$, so it suffices to solve the normal equations $A^TAx=A^Tb$.

If $A$ has full column rank (i.e. is injective -- this does not mean surjective), then $A^TA$ is invertible and we actually have a unique $x$, given by $x=(A^TA)^{-1}A^Tb$; the matrix $A^+=(A^TA)^{-1}A^T$ is called the "Moore-Penrose pseudoinverse" of $A$.

An alternative, more numerically stable algorithm is to use the QR factorization (the thin one, with the non-square $Q$) to construct the projector as $QQ^T$ (which works, as the columns of $Q$ definitionally provide an orthonormal basis for the image, and $QQ^T$ satisfies the defining property of a Hermitian projector). Then one can solve $QRx=QQ^Tb$, or equivalently $Rx=Q^Tb$. In the full-rank case, $R^{-1}Q^T$ is the Moore-Penrose pseudoinverse.
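
A sketch of the QR route, assuming $A$ has full column rank so that $R$ is invertible:

import numpy as np

def least_squares_qr(A, b):
    """Solve min ||Ax - b|| via the thin QR factorization: R x = Q^T b."""
    Q, R = np.linalg.qr(A)               # thin QR: Q has orthonormal columns
    return np.linalg.solve(R, Q.T @ b)

# Illustrative use: fit a line through three points.
A = np.array([[1., 0.], [1., 1.], [1., 2.]])   # columns: intercept, slope
b = np.array([1., 2., 2.])
x = least_squares_qr(A, b)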

To add:
  • SVD algorithm
  • Algorithm complexity for each thing
