Geometry, positive definiteness, and Sylvester's law of inertia

Something I found absurdly dissatisfying when first studying linear algebra was the idea of a positive-definite matrix (or a nonnegative-definite one). Here are some explanations of the idea I found online:
  • it's a generalisation of a positive real number -- correct, but why? In what sense? Sure, you could say "all the eigenvalues are positive real numbers in an orthogonal eigenbasis" -- but why is this really important? It just doesn't feel complete. This will pretty much be our second motivation, but with more backstory.
  • it keeps vectors "roughly" near where they started -- this is by far the worst explanation I've found of the idea -- the condition that $x\cdot Ax > 0$ means that any vector remains within a right angle from where it started. Not only is this terrible because it seems absurdly arbitrary (to choose $\pi/2$ as our special angle), but it also fails to make clear why we're interested only in positive-definite symmetric matrices, analogous to positive real numbers. On the reals, certainly, there are plenty of transformations that achieve this with real vectors (rotations of less than $\pi/2$, for instance), but we don't care about them. The most serious problem with this explanation, though, is that it tries to reinterpret the condition $x^TAx>0$ in terms of the conventional Euclidean dot product, while the whole point is to look at generalised dot products, with bilinear forms other than the Euclidean one.
In this article, I'll motivate the idea of a positive-definite matrix by considering "generalised geometries" and generalised inner products -- through a series of exercises (well, I'll try to keep them exercises, but maybe I won't be able to stop myself from answering them myself).



"Definite geometries"

  1. First, we come up with a definition of geometry (no pun intended). Much of the linear algebra we've dealt with -- specifically the dot product -- was with Euclidean geometry in mind, and it's interesting to think about what kind of linear algebra we would come up with if we considered other sorts of geometries.
    1. The first image in your head when hearing of geometry is that of a space, or manifold -- perhaps $\mathbb{R}^n$. But a space is just a set of points. Most geometric properties you've dealt with deal with properties like length and angles and shapes. These properties don't depend on e.g. where you place an object on the manifold, i.e. translations -- as well as some other transformations. Can you characterise all the linear transformations under which geometric properties (like the ones we mentioned) are invariant under? -- i.e. the symmetries of Euclidean geometry.
    2. These transformations are known as "rigid transformations" and form a group (prove it if you want, but come on -- they're symmetries, of course they form a group). Can you identify this group (discard translations if you prefer)?
    3. So Euclidean geometry can be defined as a the symmetries of $\mathbb{R}^n$ under the group $O(n)$ acting on it. It is then natural to generally define a geometry as the symmetries of a manifold under a group acting on it
  2. Now that we have generalised our definition of a geometry, let's specialise to a specific sort of geometry somewhat "analogous" to the traditional orthogonal-group (i.e. Euclidean) geometry. We will let our space be $\mathbb{R}^n$ or $\mathbb{C}^n$ but experiment with our group. By analogous, we mean not that the geometries are identical, but at least that the same notions -- like lengths and angles and shapes -- can be defined for them, that the ideas in the geometry aren't completely foreign to us.
    1. Much like the Orthogonal group can be defined by the invariance of the identity bilinear form $\mathrm{diag}(1,1,1,1)$ under a "bilinear form similarity transformation", more commonly known as a congruence (i.e. $A^T I A = I$), we can consider groups that are defined by some general $A^T M A = M$. 
    2. Obviously, not all matrix groups can be written this way -- for example, any subgroup of $O(n)$ cannot be. But groups of this form define in some sense geometries not very different from Euclidean geometry -- why? Because the form preserved by the Orthogonal group -- the identity form -- is the dot product on Euclidean geometry. Preservation of the identity form is equivalent to the preservation of the Euclidean dot product -- prove this -- which also means lengths and angles are preserved. 
    3. As any dot product is necessarily a bilinear form, it can be represented by a bilinear form $M$ called the metric as $v^T M w$, and its preservation is equivalent to the preservation of $M$ under bilinear form conjugation, i.e. $A^T M A = M$ -- prove this! (the proofs are absurdly trivial).
    4. Examples of such groups include: the "indefinite orthogonal group", which you may know as the Lorentz group $O(1,3)$ from special relativity, the group of linear transformations that preserve the bilinear form $\mathrm{diag}(-1,1,1,1)$ (called the Minkowski metric). Indeed, Minkowskian geometry has notions of length (the spacetime interval) and angles (some combination of rapidity and angles).
  3. Next, we are interested in thinking about when two geometries are "basically the same".
    1. Try to write down some simple two-dimensional geometries -- consider e.g. $M = \left[ {\begin{array}{*{20}{c}}0&{ - 1}\\1&0\end{array}} \right]$. Study some of its properties. (this is "symplectic geometry", by the way) Do you think this is the "same" as Euclidean geometry?
    2. Think about what kind of properties you looked at to establish the answer as "No". Use them to come up with a simple and snappy definition of two geometries being the same, or isomorphic. 
    3. You should have come to the conclusion -- two geometries on the same manifold are isomorphic iff their groups are isomorphic (If you read this before figuring out the answer for yourself, bleach your brain and try again.) (For legal reasons, that's a joke.)
    4. Let's study some examples of such an isomorphism. The trivial case is where the groups are equal, e.g. if $M = kN$ or $M = N^T$ (prove these). What about some non-trivial isomorphism? Here's an idea: groups defined by congruent metrics are isomorphic. I.e. if $M=P^TNP$ for some change of basis matrix $P$, the groups $\{A^TMA=M\}$ and $\{A^TNA=N\}$ are isomorphic. Prove this (again, the proof is trivial) -- you will see that the isomorphism is a similarity relation $A \leftrightarrow P^{-1}AP$.
    5. Is this an iff statement? If two groups bilinear form-preserving groups are isomorphic, is there a way to write them as $\{A^HMA=M\}$, $\{A^HNA=N\}$ such that $M$ and $N$ congruent? I'm not sure. It would suffice to prove that all isomorphisms of a matrix group are similarity transformations. Perhaps this is implied by Isomorphic iff potentially conjugate, but how do we know the conjugacy isn't in some weird group that $GL_{\mathbb{R}}(n)$ is homomorphic to?
    6. One can also consider the case of non-invertible $P$ -- do we have an isomorphism between $M$-preservers and $N$-preservers if $M=P^TNP$ for non-invertible $P$? No? What about a homomorphism? In which direction?
  4. So which geometries are isomorphic to Euclidean geometry? What matrices are congruent to the identity form?
    1. No prizes for saying "metric tensors of the form $P^TP$ (or more generally $P^H P$)" for invertible $P$. These matrices -- those that are congruent to the identity matrix -- are called positive-definite matrices. Along the lines of part f above, one can also consider the case of non-invertible $P$ -- these are called positive-semidefinite or nonnegative-definite matrices, and Euclidean geometry is homomorphic to positive-semidefinite geometries.
    2. To be fair, these aren't the only geometries isomorphic to Euclidean geometry -- remember the trivial isomorphisms? So for instance, negative-definite geometries are also isomorphic to Euclidean geometry.
    3. A neat way to visualise these "isomorphisms and homomorphisms of geometries" is by looking at the contours of the geometries, i.e. the set $x^TMx = C$. Positive definite (and negative definite) matrices correspond to elliptical contours (while positive semidefinite matrices correspond to the degenerate cases -- a degenerate ellipse has all the symmetries of a non-degenerate one, but not vice versa), which can easily be stretched into a Euclidean circle. Other matrices, on the other hand, may have hyperbolic contours which cannot be similarly deformed into a Euclidean circle.
      any symmetry of the ellipse/circle can be mapped homomorphically to a symmetry of the straight lines, but not vice versa.
    4. Recall that the equation of an ellipse takes the form $\mathop \sum \limits_{i = 1}^n {a_i}{x_i}^2 = C$ for positive $a_i$. So our interpretation above is equivalent to stating that a matrix of the form $P^T P$ or $P^H P$ has positive (for invertible $P$) or non-negative (degenerate ellipse) real eigenvalues(this should be pretty easy to prove).
    5. More generally, any two congruent matrices have the same numbers of positive, negative and zero eigenvalues (called the positive, negative and zero indices of inertia respectively). This is known as Sylvester's law of inertia (prove it!), and shows that all real-eigenvalued matrices are congruent to a diagonal matrix with some number of 1's, -1's and 0's (and arrangement doesn't matter) -- see also the metric signature. This gives us a condition to tell if any two matrices are congruent, or any two form-preserving geometries are isomorphic/homomorphic.
  5. Is it really true, though -- that any geometry with elliptical contours is isomorphic to Euclidean geometry? Come up with a counter-example (and how did you come up with it?)
    1. You might consider, e.g. $M=\left[\begin{array}{*{20}{c}}1&{ - 1}\\1&1\end{array}\right]$ -- this produces exactly the same contours as Euclidean geometry -- same unit circle, same everything. But it's not symmetric, and all positive-definite matrices are symmetric/Hermitian (proof is trivial). In fact, any $M$ produces the same contours as $\frac12 (M+M^T)$ -- its "symmetric part" (why?). What's going on? Does the norm (quadratic form) not completely define the dot product (bilinear form)?
    2. If you think about this for a while, you might get an idea of what's going on -- the symmetric/Hermitian part of a matrix defines the contours on the real part of the vector space, but the antisymmetric/anti-Hermitian part begins to matter in $\mathbb{C}^n$. The contours of the quadratic form in $\mathbb{C}^n$ completely determine the dot product, i.e. if $v^HMv=v^HNv$ for all complex vectors $v$, then $v^HMw=v^HNw$ for all complex vectors $v$ and $w$, i.e. $M=N$. The proof is trivial.
  6. Next, let's consider some properties and alternate characterisations of positive-definite matrices.
    1. From the ellipse depiction, it's reasonable to wonder if a matrix is positive-definite if the norm it induces is positive for all non-zero real vectors, i.e. $v^TMv>0$ (certainly the forward implication -- only if -- is clear, from the $P^TP$ factorisation). As it turns out, though, there are other matrices -- such as a rotation by less than $\pi/2$, that also satisfy this condition. It is certainly clear that the condition $v^TMv>0$ combined with $A$ being symmetric imply a matrix is positive-definite -- by completing the square on a symmetric bilinear form -- but any matrix $A$ for which $(M+M^T)/2$ is positive-definite also satisfies $v^TMv>0$ by the same argument (if this seems anything but completely obvious, think about the corresponding quadratic expressions).
    2. The reason for this annoyance is that in $\mathbb{R}^n$, you have matrices that are pure rotations, with no eigenvectors, so the non-positiveness of your eigenvalues don't have to matter. On the other hand, if you extend our domain to $\mathbb{C}^n$ -- i.e. $v^HMv>0$ for all complex vectors $v$, all the eigenvalues are "accounted for". Find a way to write this idea down precisely.
    3. Here's another way to think about it: if $v^HMv\in\mathbb{R}$ for all complex vectors $v$, the matrix $M$ is Hermitian. The proof is basically just algebraic simplification, considering the imaginary part of $(v+iw)^HA(v+iw)$, which is $v^HAw-w^HAv$ -- which is the "symplectic part" of the general complex dot product. See this math stackexchange answer for a fuller explanation.
To be completely honest, I'm being disingenous in claiming that this is "the" motivation for positive-definite matrices. There is a completely independent motivation arising from optimisation in multivariable calculus (the second-derivative/Hessian test), a completely independent motivation arising from systems of differential equations, a completely independent motivation arising from covariance matrices, and so on.

Perhaps a way of thinking of this is that a positive-definite matrix is simply a normal matrix with positive eigenvalues, and much of this article is really a justification for why positive-definite metric tensors are (un-)interesting.

No comments:

Post a Comment