### All matrices can be diagonalised over R[X]/(X^n)

This post follows from my answer to the Math Stack Exchange question "What kind of matrices are non-diagonalisable?"

Non-diagonalisable 2 by 2 matrices can be diagonalised over the dual numbers -- and the "weird cases" like the Galilean transformation are not fundamentally different from the nilpotent matrices.

The intuition here is that the Galilean transformation is sort of a "boundary case" between real-diagonalisability (skews) and complex-diagonalisability (rotations), which you can sort of think of in terms of discriminants. The Galilean transformation $\left[\begin{array}{*{20}{c}}{1}&{v}\\{0}&{1}\end{array}\right]$ is a small perturbation away from being diagonalisable, i.e. it sort of has "repeated eigenvectors" (you can visualise this with MatVis). So one may imagine that the two eigenvectors are only an "epsilon" away, where $\varepsilon$ is the unit dual satisfying $\varepsilon^2=0$ (called the "soul"). Indeed, its characteristic polynomial is:

$$(\lambda-1)^2=0$$
whose solutions among the dual numbers are $\lambda=1+k\varepsilon$ for real $k$, since $(k\varepsilon)^2=0$. So one may "diagonalise" the Galilean transformation over the dual numbers as e.g.:

$$\left[\begin{array}{*{20}{c}}{1}&{0}\\{0}&{1+v\varepsilon}\end{array}\right]$$
Granted, this is not unique: it is formed from the change-of-basis matrix $\left[\begin{array}{*{20}{c}}{1}&{1}\\{0}&{\varepsilon}\end{array}\right]$, but any vector of the form $(1,k\varepsilon)$ is a valid eigenvector (with eigenvalue $1+vk\varepsilon$). You could, if you like, consider this a canonical or "principal value" of the diagonalisation, and in general each diagonalisation corresponds to a limit you can take of real/complex-diagonalisable transformations. Another way of thinking about this is that there is an entire eigenspace spanned by $(1,0)$ and $(1,\varepsilon)$ in that little gap of multiplicity. In this sense, the geometric multiplicity is forced to be equal to the algebraic multiplicity*.
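
As a sanity check, here is a minimal Python sketch that verifies $AP=PD$ for the matrices above. The names `A`, `P`, `D`, the pair representation of $a+b\varepsilon$, and the sample value `v = 3` are all assumptions made for the demo, not part of any library.

```python
# Dual numbers a + b*eps (with eps**2 == 0) stored as pairs (a, b).
def dmul(x, y):
    # (a + b eps)(c + d eps) = ac + (ad + bc) eps, the eps**2 term vanishes
    return (x[0] * y[0], x[0] * y[1] + x[1] * y[0])

def dadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def matmul(M, K):
    """Product of two square matrices with dual-number entries."""
    size = len(M)
    out = [[(0, 0)] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                out[i][j] = dadd(out[i][j], dmul(M[i][k], K[k][j]))
    return out

v = 3  # arbitrary sample velocity (assumption for the demo)
one, zero, eps = (1, 0), (0, 0), (0, 1)

A = [[one, (v, 0)], [zero, one]]    # Galilean transformation
P = [[one, one], [zero, eps]]       # columns are the eigenvectors (1,0) and (1,eps)
D = [[one, zero], [zero, (1, v)]]   # diag(1, 1 + v*eps)

AP = matmul(A, P)
PD = matmul(P, D)
print(AP == PD)  # True
```

Only the relation $AP=PD$ is checked here; nothing in the sketch inverts `P`.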

Then for a nilpotent matrix, the characteristic polynomial $\lambda^2=0$ has solutions $\lambda=k\varepsilon$, and the matrix is simply diagonalised as:

$$\left[\begin{array}{*{20}{c}}{0}&{0}\\{0}&{\varepsilon}\end{array}\right]$$
(Think about this.) Indeed, the resulting matrix has minimal polynomial $\lambda^2=0$, and the eigenvectors are as before.

What about higher dimensional matrices? Consider:

$$\left[ {\begin{array}{*{20}{c}}0&v&0\\0&0&w\\0&0&0\end{array}} \right]$$
This is a nilpotent matrix $A$ satisfying $A^3=0$ (but not $A^2=0$). The characteristic polynomial is $\lambda^3=0$. Although $\varepsilon$ might seem like a sensible choice, it doesn't really do the trick -- if you try a diagonalisation of the form $D=\mathrm{diag}(0,v\varepsilon,w\varepsilon)$, it satisfies $D^2=0$, which is wrong, since $A^2\neq 0$. Indeed, you won't be able to find three linearly independent eigenvectors to diagonalise the matrix this way -- over the dual numbers they all take the form $(a+b\varepsilon,c\varepsilon,0)$, with vanishing third component.

Instead, you need to consider a generalisation of the dual numbers, sometimes called (in computing mathematics and non-standard analysis) the "hyperdual numbers", with the soul satisfying $\epsilon^n=0$ (here $n=3$, matching $A^3=0$). Then the diagonalisation takes, for instance, the form:

$$\left[ {\begin{array}{*{20}{c}}0&0&0\\0&{v\epsilon}&0\\0&0&{w\epsilon}\end{array}} \right]$$
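
To see that this form is consistent, the sketch below checks $AP=PD$ with $\epsilon^3=0$. The eigenvector columns of `P` are not given in the post: they are one choice worked out for this example, namely $(1,0,0)$, $(w,w\epsilon,v\epsilon^2)$ and $(v,w\epsilon,w\epsilon^2)$, and the values `v = 2`, `w = 5` are arbitrary. Elements $a+b\epsilon+c\epsilon^2$ are stored as triples `(a, b, c)`.

```python
N = 3  # work in R[X]/(X^3), i.e. eps**3 == 0

def tmul(x, y):
    """Multiply two truncated polynomials, dropping eps**N and above."""
    out = [0] * N
    for i in range(N):
        for j in range(N - i):
            out[i + j] += x[i] * y[j]
    return tuple(out)

def tadd(x, y):
    return tuple(a + b for a, b in zip(x, y))

def matmul(M, K):
    size = len(M)
    out = [[(0,) * N] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                out[i][j] = tadd(out[i][j], tmul(M[i][k], K[k][j]))
    return out

v, w = 2, 5  # arbitrary nonzero sample values (assumption)
zero = (0, 0, 0)

def real(a):
    return (a, 0, 0)

A = [[zero, real(v), zero],
     [zero, zero, real(w)],
     [zero, zero, zero]]
D = [[zero, zero, zero],
     [zero, (0, v, 0), zero],
     [zero, zero, (0, w, 0)]]       # diag(0, v*eps, w*eps)
P = [[real(1), real(w), real(v)],   # columns: assumed eigenvectors
     [zero, (0, w, 0), (0, w, 0)],
     [zero, (0, 0, v), (0, 0, w)]]

Z = [[zero] * 3 for _ in range(3)]
D2 = matmul(D, D)
D3 = matmul(D2, D)

print(matmul(A, P) == matmul(P, D))  # True
print(D2 != Z, D3 == Z)              # True True
```

Unlike the dual-number attempt, here $D^2\neq 0$ while $D^3=0$, which matches the behaviour of $A$.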

*Over the reals and complexes, when one defines algebraic multiplicity (as "the multiplicity of the corresponding factor in the characteristic polynomial"), there is a single eigenvalue corresponding to that factor. This is of course no longer true over the hyperdual numbers, because they are not a field, and $ab=0$ no longer implies "$a=0$ or $b=0$".

In general, if you want to prove things about these numbers, the way to formalise them is by constructing them as the quotient $\mathbb{R}[X]/(X^n)$, so you actually have something clear to work with.
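
A rough sketch of that construction in Python: an element of $\mathbb{R}[X]/(X^n)$ is just a polynomial whose terms of degree $n$ and above are thrown away. The class name `TruncPoly` and its methods are hypothetical, chosen for illustration only.

```python
class TruncPoly:
    """Element of R[X]/(X^n), stored as its n lowest-degree coefficients."""

    def __init__(self, coeffs, n):
        self.n = n
        padded = list(coeffs) + [0] * n
        self.c = tuple(padded[:n])  # discard X^n and higher

    def __add__(self, other):
        return TruncPoly([a + b for a, b in zip(self.c, other.c)], self.n)

    def __mul__(self, other):
        out = [0] * self.n
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                if i + j < self.n:  # terms of degree >= n are zero in the quotient
                    out[i + j] += a * b
        return TruncPoly(out, self.n)

    def is_zero(self):
        return all(a == 0 for a in self.c)

n = 3
eps = TruncPoly([0, 1], n)  # the class of X, playing the role of the soul
eps2 = eps * eps
eps3 = eps2 * eps

print(eps2.is_zero())  # False
print(eps3.is_zero())  # True
```

The last line also exhibits the zero divisors mentioned in the footnote: `eps` and `eps2` are both nonzero, yet their product vanishes.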

(Perhaps relevant: Grassmann numbers as eigenvalues of nilpotent operators -- the hyperdual numbers are not the same as the Grassmann numbers, and the algebra of the Grassmann numbers is definitely different from that of nilpotent and shear matrices, but go see if you can make sense of it.)

Something important to note is that the diagonalisation is not of the form $D=P^{-1}AP$, as the eigenvector matrices are not invertible. However, it is still true that $PD=AP$. Nonetheless, as far as I can see, this limitation prevents the formalism from being any good for e.g. dealing with polynomial-ish differential equations with repeated roots. The infinitesimal-perturbation/"take a limit" approach we talked about in Limiting Cases II: repeated roots of a differential equation is still the right approach for that.
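
The non-invertibility can be made concrete for the 2 by 2 change-of-basis matrix used earlier. The sketch below assumes the standard fact that a dual number $a+b\varepsilon$ has an inverse exactly when $a\neq 0$, and shows that $\det P=\varepsilon$ fails this test. The helper names `dmul`, `dsub` and `dinv` are made up for the example.

```python
from fractions import Fraction

# Dual numbers a + b*eps stored as pairs (a, b), with eps**2 == 0.
def dmul(x, y):
    return (x[0] * y[0], x[0] * y[1] + x[1] * y[0])

def dsub(x, y):
    return (x[0] - y[0], x[1] - y[1])

def dinv(x):
    """Inverse of a + b*eps, or None when the real part a is zero."""
    a, b = x
    if a == 0:
        return None
    # (a + b eps)^(-1) = 1/a - (b/a^2) eps
    return (Fraction(1, 1) / a, -Fraction(b, 1) / (a * a))

one, zero, eps = (1, 0), (0, 0), (0, 1)
P = [[one, one], [zero, eps]]

# det P = P00*P11 - P01*P10
det_P = dsub(dmul(P[0][0], P[1][1]), dmul(P[0][1], P[1][0]))

print(det_P == eps)       # True: the determinant is eps itself
print(dinv(det_P))        # None: eps has no inverse
print(dmul(det_P, eps))   # (0, 0): det P is a zero divisor
```

Since the determinant is not a unit, $P^{-1}$ does not exist over the dual numbers, which is why only $PD=AP$ survives.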