SVD, polar decomposition, normal matrices; a re-look at transposes and FTLA

Back in Null, row spaces, transpose, fundamental theorem of algebra, we first introduced some hand-wavy intuition for the transpose and the orthogonality of the row space and the null space (and the following fundamental theorem of linear algebra). Here, we solidify this intuition a bit more clearly.

Consider the "symmetric collapse" discussed in the above article. Our study of the transformation relied specifically on looking at it in a specific basis -- an orthogonal basis -- comprised of the column space and the null space. In this basis, the transformation is a scaling on both axes. In the more general case of an asymmetric collapse -- in which we rotated our space before collapsing, we looked at a basis formed by the row space and the null space -- the basis got rotated and scaled into the new basis, that was the column space and an arbitrary other vector (that could be perpendicular to the column space).

A sensible question to ask is if any transformation can be written in this fashion -- as a transformation of an orthogonal basis into another orthogonal basis. Analogous to how an eigenvalue decomposition of a matrix writes it as scalings in some basis, we're looking to represent the matrix as a spiral (i.e. a scaling combined with a rotation) in some basis. But let's stick with the first formulation of our question -- for any linear transformation $A: \mathbb{R}^n \to \mathbb{R}^m$, can we find an orthogonal basis on $\mathbb{R}^n$ that is mapped to an orthogonal basis $\mathbb{R}^m$?

One could, e.g. consider the images under $A$ of the angle of each orthonormal basis in $\mathbb{R}^2$ (i.e. look at the function $AU(\theta)\vec{e}_1 \cdot AU(\theta)\vec {e}_2$ for varying $\theta$ where $U(\theta)$ is the rotation matrix by angle $\theta$) and apply the intermediate value theorem, etc. And such a proof could in principle be extended to $\mathbb{R}^n$.

(See here for a thorough explanation.)

Here's another, more insightful way you might come to prove this -- we've been visualising linear transformations so far by looking at the image of the basis vectors, but another way to do visualise these transformations is by looking at the image of the unit circle under the transformation.

Why does this make sense? Well, one can find an ellipse passing through any two vectors centered at the origin. Elaborate on this argument. Is the resulting ellipse unique? (Hint: no, unless you mark points on the circumference)

Specifically, consider the axes of the image ellipse $\sigma_1 u_1$, $\sigma_2 u_2$ where $u_1$, $u_2$ are unit vectors. In the original unit circle, any pair of orthogonal vectors on the circle can be axes, so consider the pre-image of the axes of the image ellipse $v_1$, $v_2$.  So we have:
$$ A v_1 = \sigma_1 u_1\\
A v_2 = \sigma_2 u_2 $$
Or in general:
$$AV = U\Sigma\\
A = U\Sigma V^*$$
Where $\Sigma$ is diagonal and positive-definite, while $U$ and $V$ are orthogonal/unitary. This is called the Singular-Value Decomposition (SVD) of $A$.

In a sense, one can view this as an alternative to the eigen-decomposition. In the eigendecomposition, one looks for a single basis in which the transformation is a scaling. In the singular value decomposition, one looks at scaling and then "re-interpreting" in another basis, but requires that the bases be orthogonal, and that the diagonal matrix be positive and real-valued.

The entries $\Sigma$ are called the singular values of $A$, the columns of $U$ and $V$ respectively are the left-singular vectors and the right-singular vectors of $A$ respectively.

(Exercise: You know that $\Sigma$ is the scaling of the orthogonal basis, i.e. of the right-singular basis. Convince yourself that the rotation of the basis is given by $UV^*$.)

This gives us a much better intuition for the transpose. The SVD of $A^*$ is clearly:

$$A^* = V\Sigma U^*$$
I.e. the transpose has precisely the opposite rotational effect as $A$ and the same scaling. This is as opposed to the inverse matrix, which has both the opposite rotational and scaling effect as the matrix. For a rotation matrix (or generally an orthogonal matrix $A^*A=I$), the transpose equals the inverse, analogous to how the conjugate equals the inverse for a unit complex number $\bar{z}z=1$. A Hermitian matrix, $A=A^*$, by contrast, is one for which is irrotational, $UV^*=1$, i.e. for which the SVD equals the eigendecomposition.

Use the SVD to get some intuition for transpose identities like $(AB)^*=B^*A^*$

It's instructive to consider the SVD in the case of our original motivating example -- an asymmetric matrix representing a collapse of $\mathbb{R}^2$ into a line. What are the singular bases of this transformation? Well, it maps the orthogonal basis formed by row space and the null space into the orthogonal basis formed by the column space and the left-nullspace.

Think about its transpose.

Arranging the singular values from largest to smallest, we then have the following relation between the SVD and the items in the fundamental theorem of linear algebra, where $n$ is the dimension of the domain, $m$ is the dimension of the codomain, and $r$ is the dimension of the image/column space:
  • The last $n - r$ singular values are zeroes
  • The first $r$ singular values are positive.
  • The last $n - r$ right-singular vectors span the null space.
  • The first $r$ right-singular vectors span the row space.
  • The last $m-r$ left-singular vectors span the left-null space.
  • The first $r$ left-singular vectors span the column space.
Note that the terms kernel, coimage, cokernel and image are also used for the null space, row space, left-null space and column space, sometimes in a more general setting.

This is the full form of the Fundamental Theorem of Linear Algebra, which generalizes the rank-nullity theorem, row rank equals column rank and row space being perpendicular to null space.

Spend some time thinking about the SVD of non-square matrices, relating them to square collapse matrices. Think about their transposes.

Show that the right-singular vectors $V$ of a matrix $A$ are given by the eigenvectors of $A^*A$ (hint: start by considering the two-dimensional case, relating the right-singular vectors to a maximisation/minimisation problem, and extend the idea to more dimensions).

From this, it is clear that the left-singular eigenvectors $U$ (which are the right-singular eigenvectors of $A^*$) are given by the eigenvectors of $AA^*$ and the singular values are the square roots of the singular values of $A^*A$. Well, some of them are (which ones?).

We observed earlier that the rotational effect of the matrix $A = U \Sigma V^*$ can be given by $UV^*$. The scaling effect is given by $\Sigma$ on the basis of $V$. Hence we can write:

$$A = (UV^*)(V\Sigma V^*)$$
Letting $W=UV^*$ and $R = V\Sigma V^*$, this gives us a representation of $A$ as:

Where $W$ is orthogonal and $R$ is positive-semidefinite. This is known as the right-polar decomposition of $A$. Analogously, one may consider the left-polar decomposition:

$$A = (U\Sigma U^*)(UV^*) = R'W$$
Interpret the above decomposition like we did the right decomposition. Note how $R' = WRW^*$, and how a right-polar decomposition leads to a left-polar decomposition of the transpose, and vice versa.

When is the polar decomposition unique? Compare the situation to the polar decomposition of complex numbers.

Here's a question: when is $R=R'$? One way of putting it is that $WRW^* = R$, i.e. $R$ commutes with $W$. This is a bit difficult to work with. Instead, one may show that the matrices $R$ and $R'$ can be given by $(A^*A)^{1/2}$ and $(AA^*)^{1/2}$ respectively (prove it!) -- noting that a principal square root can uniquely be defined for positive semidefinite matrices. So $R=R'\Leftrightarrow A^*A=AA^*$. These are known as normal matrices. Below is an exercise that provides more intuition into the behaviour of normal matrices.

What does it mean for a matrix to commute with its transpose? Earlier, when discussing commuting matrices, we referred to it as "matrices that do not disturb each other" -- specifically, they preserve each other's (generalised) eigenspaces. In the case of commuting with its transpose, it's easy to show that this means having the same eigenvectors (prove this!).

Here's a fact about the eigenvectors of the Hermitian transpose: the eigenvector of $A$ corresponding to the eigenvalue $\lambda$ is orthogonal to all eigenvectors of $A^*$ corresponding to any eigenvalue other than $\lambda^*$ (prove this!).

From these two facts, it follows that the following are equivalent (write down the proofs clearly!):

  • $A$ is normal.
  • $A$ commutes with $A^*$.
  • $R$ commutes with $W$.
  • $A$ is unitarily diagonalisable.
This is known as the spectral theorem.

(Anyone ever thought about how weird the word "normal" is? Sometimes, it means "perpendicular", sometimes -- as in "orthonormal" and "normalisation" -- it means "unit length", because "norm". What does it mean in this context? It's probably just referring to its eigenvectors being normal/orthogonal, but I like to think it's referring to the fact that the two alternative Hermitian-valued "norms" of the matrix, $A^*A$ and $AA^*$, are equal, so the matrix has a single "norm".)

Fundamentally, normal matrices are "analogous" to complex numbers, or they "generalise" complex numbers, in the sense that each of its eigenvalues acts as a complex number, transforming, acting as spirals on each of the orthogonal eigenvectors within its own copy of $\mathbb{C}$. One may construct the following table of analogies between normal matrices and complex numbers:

Complex numbers Normal matrices
Zero (sorta) Singular
Non-zero Invertible
Real Hermitian
Positive real Positive-definite
Nonnegative real Positive-semidefinite
Imaginary Anti-Hermitian
Unit Unitary
Conjugate Hermitian transpose
Norm-squared Gram matrix $A^*A$
Magnitude $(A^*A)^{1/2}=R=V\Sigma V^*$
Argument $AR^{-1}=W=UV^*$
Real Part $\frac12(A+A^*)$
Imaginary Part times $i$ $\frac12(A-A^*)$

Exercise: an EP matrix or range-Hermitian matrix is a weakened version of a Hermitian matrix -- the row space of the matrix equals the column space. Although this was a bit hard to understand in the first article and was only briefly mentioned towards the end, we now have the intuition to comprehend them. Explain why a matrix is range-Hermitian if and only if it is unitarily similar to a matrix of the form of a block matrix:

$$\left[ {\begin{array}{*{20}{c}}C&0\\0&0\end{array}} \right]$$
Where $C$ is a non-singular square matrix and the zeroes are zero block matrices. This decomposition is called the core-nilpotent decomposition. Hence, show that being range-Hermitian is a weakened form of being normal.

No comments:

Post a Comment