### Invariant and generalised eigenspaces; Jordan normal form

We defined "eigenvectors" -- or really "eigenlines" -- in order to understand the behaviour of linear transformations as scalings across certain axes (which may be complex, and the scalings may be complex too). But simply thinking of eigenlines as 1-dimensional spaces that a transformation leaves invariant (the fancy phrase here is: "the 1-dimensional subspaces on which it is an endomorphism"), it is natural to wonder about higher-dimensional invariant spaces -- subspaces on which some transformation $A$ acts as an endomorphism.

The problem is that any transformation has all sorts of useless invariant subspaces -- for instance any transformation $A:F^n\to F^m$ (where $m \le n$) has the entirety of $F^n$ as an invariant subspace (for any $F$), and rotations -- although fully described by their eigenvectors and eigenvalues -- have a bunch of real and complex planes as unnecessary invariant subspaces. And if $A$ has an eigenvalue with geometric multiplicity $>1$, there are an infinite number of useless invariant subspaces.

Specifically, if the goal is to find useful representations of defective matrices (non-diagonalisable), invariant subspaces seem completely useless -- they certainly have no hope of giving us any sort of unique representation. Perhaps more on the point, our "eigenlines" have corresponding eigenvalues that tell us how the transformation behaves within an eigenline. Our invariant subspaces currently have nothing of the sort -- the transformation can have any sort of behaviour on the invariant subspace -- rotation, skewing/scaling, shearing, skewering -- and we'd have no idea. We need a convenient way to write down the behaviour of the transformation on an invariant subspace.

Here's something we can start to think about: ordinary eigenvectors satisfy $(A-\lambda I)v=0$, which gives us a one-dimensional solution space. In analogy with solutions to linear differential equations (linear homogenous if you use the conventional terminology, but I reserve "affine" for linear non-homogenous), an equation like

$$(A-\lambda_1 I)(A-\lambda_2 I)v=0$$
(where ${\lambda _1},{\lambda _2}$ are both eigenvalues of $A$) would have a 2-dimensional solution space, etc.

Note that when $\lambda_1, \lambda_2$ are not eigenvalues, we don't have a 2-dimensional solution space (what does the solution space look like then?). Why does it work with differential equations for any $\lambda_1, \lambda_2$? (hint: what do the eigenvalues look like?)

It's sensible to ask: are these solution spaces the same as our invariant subspaces? I.e. is every member of a $k$-dimensional invariant subspace a solution to an equation of the form

$$(A - {\lambda _1}I)...(A - {\lambda _k}I)v = 0$$
for eigenvalues $\lambda_1,...\lambda_k$?

The answer is yes. I encourage you to try and prove it for yourself -- it is instructive to first consider special cases: (i) rotation in a plane, where indeed $(A-iI)(A+iI)=A^2+1=0$ is the minimal polynomial of $A$ (ii) more generally, $F^n$ is an invariant subspace for all isomorphisms, and indeed for all $v$ in this subspace (i.e. $\forall v \in F^n$), $p(A)v=0$ where $p$ is the characteristic polynomial of $A$ by the Cayley-Hamilton theorem.

The key to proving that every invariant subspace is given by solutions to an equation of the form $(A - {\lambda _1}I)...(A - {\lambda _k}I)v = 0$ (and vice versa) lies in recognising that on any $k$-dimensional invariant subspace, $A$ is acts as an endomorphism, and therefore the Cayley-Hamilton theorem applies to it, with a $k$-order characteristic polynomial.

I encourage you to spend some time thinking about this -- try relating it to differential equations. Come up with another proof of the statement -- an inductive one. See if this results in a better intuition for the Cayley-Hamilton theorem.

Is it true that there $2^n$ invariant subspaces of any transformation on an $n$-dimensional linear space? What about the identity transformation?

Now, let's discard the invariant subspaces we don't want. We already know how to handle cases with distinct eigenvalues -- i.e. we have distinct eigenvalues in $\lambda_1...\lambda_k$ -- we just get an eigenvector for each eigenvalue. So we're really just concerned with subspaces of the form ${(A - \lambda I)^k}v = 0$. This is analogous to linear differential equations with repeated roots being weirder than ones with distinct roots.

Note that we still know how to handle ${(A - \lambda I)^k}v = 0$-like equations when the algebraic multiplicity is accounted for by geometric multiplicity -- when this is the case, you can reduce the power from $k$ (by subtracting from it the geometric multiplicity). This nuance doesn't exist with differential equations, because distinct eigenvectors have distinct eigenvalues.

Vectors satisfying such an equation are called generalised eigenvectors of order $k$ where $k$ is the minimum value for which it does satisfy the equation, and the invariant subspaces formed by generalised eigenvectors of the same eigenvalue are called generalised eigenspaces. The dimension of the generalised eigenspace always equals the algebraic multiplicity, unlike the eigenspace, whose dimension equals the geometric multiplicity.

Check this, and that $k$ is the difference between algebraic and geometric multiplicity.

What kind of transformations precisely do generalised eigenvectors with degree greater than 1 correspond to? Clearly, skews and rotations are out of the question. But some insight can be gained from looking at the nature of skews and rotations on a plane.

In two dimensions, a characteristic polynomial with a positive discriminant yields a skew along some axis, a negative discriminant yields a rotation, and the case we're interested -- the presence of repeated roots -- corresponds to the point "between" skews and rotations (speaking hand-wavily), shears.

In a more general setting, if one has: ${(A - \lambda I)^k}v_k = 0$  for generalised eigenvector  $v_k$of degree $k$, then one can extract generalised eigenvectors of each degree lower:

$$\begin{array}{l}{(A - \lambda )^i}{(A - \lambda )^{k - i}}{v_k} = 0 \\ \Rightarrow {v_i} = {(A - \lambda I)^{k - i}}{v_k}\end{array}$$
Implying that the generalised eigenvectors with the same eigenvalue (can) form a basis for the corresponding generalised eigenspace.

$$\begin{array}{*{20}{r}}{(A - \lambda I){v_1} = 0 \Leftrightarrow A{v_1} = \lambda {v_1}}\\{{{(A - \lambda I)}^2}{v_2} = 0 \Leftarrow (A - \lambda I){v_2} = {c_1}{v_1} \Leftrightarrow A{v_2} = {c_1}{v_1} + \lambda {v_2}}\\{{{(A - \lambda I)}^3}{v_3} = 0 \Leftarrow (A - \lambda I){v_3} = {c_2}{v_2} \Leftrightarrow A{v_3} = {c_2}{v_2} + \lambda {v_3}}\\{ \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,}\\{{{(A - \lambda I)}^k}{v_k} = 0 \Leftarrow (A - \lambda I){v_k} = {c_{k - 1}}{v_{k - 1}} \Leftrightarrow A{v_k} = {c_{k - 1}}{v_{k - 1}} + \lambda {v_k}}\end{array}$$
This gives us a very clear picture of how these general "sheary" transformations look in the basis of the generalised eigenvectors -- there is a shear in each plane $\left\langle {{v_{k - 1}},{v_k}} \right\rangle$. Suitable scalings could of course be chosen to make all the $c_i=1$.

It's hard to overstate the significance of this -- what we've just found is that any defective matrix can be decomposed into shears on each of its generalised eigenspaces. This completely classifies the diagonalisability of matrices. If it's defective, it's a shear.

Draw out some of these transformations in three or more dimensions.

Notice the directions of the implication signs -- can we make them double-sided? What if we have some geometric multiplicity? How would our shears look like then? How many dimensions do you need to visualise this?

Think about why this characterisation of defective matrices makes sense. What effect does adding a 1 to the subdiagonal have on the determinant? Why? (hint: area of a parallelogram) What about the other coefficients of the characteristic polynomial? (hint: think of these in terms of traces of some matrix). So these matrices are precisely those which have the same characteristic polynomial as a diagonalisable matrix without actually being similar to one.

Clearly, the generalised eigenspaces of a transformation are pairwise disjoint (i.e. intersect only at the origin). Since all eigenvalues are being considered, their union is all of $F^n$. Thus the union of their bases forms a basis for $F^n$. This gives us a representation of the transformation $A$ in this basis.

From the last section, it is clear that the effective transformation on a $k$-dimensional eigenspace with eigenvalue $\lambda$ (called a "Jordan block") is given by:

$$\left[ {\begin{array}{*{20}{c}}\lambda &0&0& \cdots &0\\1&\lambda &0& \cdots &0\\0&1&\lambda & \cdots &0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\0&0&0& \cdots &\lambda \end{array}} \right]$$
The Jordan normal form of a transformation is then the matrix formed by putting all Jordan normal blocks along the diagonal -- i.e. the representation of $A$ in its generalised eigenbasis.

Some writers define the Jordan normal form with the 1's on the superdiagonal. It should be clear to you from the work we've done that this form is obtained by taking the basis vectors in reverse order (i.e. changing the basis to $\langle e_n,...,e_1 \rangle$. If this isn't clear to you, go back and work through the previous section once more. Loop.

(Stay tuned for the next article to see how we can -- instead of defining generalised eigenspaces whose dimension is defined by the algebraic, rather than geometric multiplicity -- "force" algebraic multiplicity to equal geometric multiplicity, i.e. diagonalise any matrix -- with a ring extension.)