The Winding Number: lie theory

The Killing form; factorising non-Abelian Lie groups

It could be fun to try and define a "dot product" on a Lie algebra.

You might've already realised that the cross product is a Lie bracket of sorts -- given its antisymmetry and the whole $a^\mu b^\nu - a^\nu b^\mu$ representation of the wedge product and all that. It's a short exercise to verify that the Lie algebra $\mathfrak{so}(3)$ of $SO(3)$ is the algebra of $3\times 3$ skew-symmetric matrices, and that with the Lie bracket $XY-YX$ it is isomorphic to $\mathbb{R}^3$ with the cross product.
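If you'd rather check the isomorphism numerically before doing the exercise, here is a minimal sketch. The names `hat` and `bracket` are my own illustrative helpers (not from any library), and the sign convention for `hat` is one common choice, assumed here.

```python
import numpy as np

def hat(a):
    """Map a vector a in R^3 to the skew-symmetric matrix acting as v -> a x v."""
    x, y, z = a
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def bracket(X, Y):
    """Matrix commutator XY - YX."""
    return X @ Y - Y @ X

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)

# hat(a) really does act as "cross product with a".
acts_as_cross = np.allclose(hat(a) @ b, np.cross(a, b))

# The commutator of the matrices matches the cross product of the vectors.
lhs = bracket(hat(a), hat(b))
rhs = hat(np.cross(a, b))
print(acts_as_cross, np.allclose(lhs, rhs))
```

A numerical check on random vectors is of course not a proof, but it does tell you which sign convention you need before you start writing things out.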

Well, the dot product on $\mathbb{R}^3$ has an interesting connection to $SO(3)$ -- it is precisely the form that is invariant under the action of $SO(3)$. Well, but that's $SO(3)$ acting on $\mathbb{R}^3$ -- what is that action in the notation of $\mathfrak{so}(3)$? As it turns out (and you can work this out), it is precisely the adjoint map $\mathrm{Ad}_gX:=gXg^{-1}$ which corresponds to this "rotating $X$ by $g$". It's not really that unexpected, if you ask me -- conjugation is always the natural way to transform matrices in linear algebra when vectors are multiplied on the left.

So the "dot product" is an $\mathrm{Ad}$-invariant bilinear form. In fact, adding a symmetry requirement allows us to just bother with norms (as a symmetric inner product can be determined from the norm, through the cosine rule, i.e. the polarisation identity). Conjugation basically allows you to determine the "contours" of this norm or inner product. The question is: can we determine the bilinear form -- up to scaling -- just from "$\mathrm{Ad}$-invariant symmetric bilinear form" alone?

This is equivalent to asking "is the orbit of some non-zero $X$ under conjugation by $G$ equal to $\mathfrak{g}$?" (so that the norm of that $X$ would suffice to determine all norms -- do you see why?) Well, this is equivalent to asking "is $X$ contained in some non-trivial ideal?" (prove that these are equivalent!), and this is equivalent to asking "does $\mathfrak{g}$ have any non-trivial ideals?" (do you see why?)

A Lie algebra without nontrivial ideals is called a simple Lie algebra. Our demonstration above shows that a simple Lie algebra has an $\mathrm{Ad}$-invariant symmetric bilinear form that is unique up to scaling, i.e. determined by the value of $\langle X, X\rangle$ for some non-zero $X$.

Even before we actually derive what this form must look like, we can derive one important consequence of automorphism invariance: $\langle X, [X, Y]\rangle = 0$ (prove it!), i.e. the tangent to an automorphism curve is perpendicular to the position vector at every point. The understanding of the group as acting as a "rotation group" on its Lie algebra in the adjoint representation really makes sense!

Someone tell me if they know how one may "derive" the trace-form formula from this characterisation rather than pulling it out of the blue and then proving it is the unique $\mathrm{Ad}$-invariant symmetric bilinear form. Here's something I started to write:

Here's an idea for the base length (i.e. to define the scaling): $X$ has length 1 iff the length of $[X,Y]$ equals the length of $Y$ for all $Y$ perpendicular to $X$ -- equivalently: $\forall V\in\mathfrak{g}, |[X,[X,V]]|=|[X,V]|$. We need to check that this condition is well-defined, i.e. that:

Given an $X$, $|[X,[X,U]]|=|[X,U]|$ for some $U$ not a multiple of $X$ implies that $|[X,[X,V]]|=|[X,V]|$ for all $V$.
$X$ satisfying $|[X,[X,V]]|=|[X,V]|$ implies that all conjugates $gXg^{-1}$ of it satisfy it too. This is trivial from considering $V=gV'g^{-1}$ (since the identity is true for all $V$).

Is the first one even true outside $\mathfrak{so}(3)$ -- for all simple Lie algebras?

One may come up with the idea of defining a form $\langle X, Y\rangle = \mathrm{tr}[X,[Y,\cdot]]$ (example of some weak motivation -- the vector triple product $x\times(x\times v)$ has as eigenvectors the vectors $v$ perpendicular to $x$ and the eigenvalues depend on the length of $x$) and check that this is indeed an $\mathrm{Ad}$-invariant symmetric bilinear form, and is thus unique up to scaling for simple Lie algebras. This form is called the Killing form.
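Here is a small numerical sketch of this form on $\mathfrak{so}(3)$, checking the three properties we've talked about: that it is a multiple of the dot product, that it is $\mathrm{Ad}$-invariant, and that $\langle X,[X,Y]\rangle=0$. The helpers `hat`, `vee`, `ad` and `killing` are illustrative names I've made up, and the basis of $\mathfrak{so}(3)$ used to write $\mathrm{ad}$ as a matrix is an assumption of the sketch.

```python
import numpy as np

def hat(a):
    """Vector in R^3 -> skew-symmetric matrix acting as v -> a x v."""
    x, y, z = a
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def vee(X):
    """Inverse of hat: read the vector back off a skew-symmetric matrix."""
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

def bracket(X, Y):
    return X @ Y - Y @ X

BASIS = [hat(e) for e in np.eye(3)]

def ad(X):
    """Matrix of Y -> [X, Y] in the basis hat(e1), hat(e2), hat(e3)."""
    return np.column_stack([vee(bracket(X, E)) for E in BASIS])

def killing(X, Y):
    """The trace form tr(ad X ad Y)."""
    return np.trace(ad(X) @ ad(Y))

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
X, Y = hat(a), hat(b)

# A random rotation g, built from a QR factorisation (flip a column if det = -1).
g, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(g) < 0:
    g[:, 0] *= -1

multiple_of_dot = np.isclose(killing(X, Y), -2 * (a @ b))
ad_invariant = np.isclose(killing(g @ X @ g.T, g @ Y @ g.T), killing(X, Y))
perpendicular = np.isclose(killing(X, bracket(X, Y)), 0.0)
print(multiple_of_dot, ad_invariant, perpendicular)
```

Note the factor of $-2$: on $\mathfrak{so}(3)$ the trace form is negative definite, so it is the dot product only up to a (negative) scaling, which is all that uniqueness promised anyway.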

Factorisation of Lie groups

We have seen the classification of connected Abelian Lie groups: they are products of circles and lines. We wonder if such a classification is possible for more general Lie groups.

The natural way to "factorise" groups is by taking quotients over normal subgroups -- we wonder if this means that all Lie groups can be written as direct products of simple Lie groups (groups that don't have a nontrivial connected normal subgroup -- can you see why "connected" matters?). Well, not really -- the quotients need not be subgroups at all, after all. Instead, the "factorisation" takes the form of what is known as a group extension. A group for which it is a direct product is called a reductive Lie group -- and its Lie algebra is the direct sum of simple Lie algebras, or a reductive Lie algebra.

It is more conventional in the literature to define a simple Lie algebra excluding the one-dimensional/abelian case. In this definition, direct sums of simple Lie algebras are semisimple Lie algebras, and reductive Lie algebras are direct sums of semisimple and abelian Lie algebras.

TBC: Cartan's criterion, solvability, nilpotency

Lie group topology

I'll assume you have a basic understanding of general topology -- if not, consult the topology articles here. Most of the abstract stuff and "weird" cases are not really important, because it is easy to see that Lie groups are manifolds.

We need to be careful while studying the topology of Lie groups, because we already have an intuitive picture of a Lie group, and we need to be careful to prove all the things we just "believe" to be true.

The main point of the topology of a Lie group is that the group elements define the "flows" on the manifold. What this means is that left-multiplication is a homeomorphism, and it's not absurd to say that inversion is a homeomorphism, because it represents a "reflection" of the manifold. That these conditions make sense is confirmed by looking at the proofs of the following "obvious" facts.

(1) In a connected group, a neighbourhood of the identity generates the entire group, i.e. $H\le G\land H\in N(1)\implies H=G$ for connected $G$.

Let's think about why this is true. Why does $H$ need to be a neighbourhood -- why must it contain an open set containing the identity? Suppose instead we just knew it contained a set $Q$ that looked like this:

Well, $H$ still contains the orange point, but we cannot say it contains the purple point, because it's perfectly happy not containing it -- it's not like we have some vertical element in the Lie group that if you multiplied to some point in $Q$, you'd get the purple point. But instead if $Q$ was an open neighbourhood of the identity:

Then the purple point has to be in $H$, because $Q$ contains flows in "all directions" on the group. To actually prove that every point will be contained in $H$ -- well, we know that the point is (will eventually be) that $H$ is the connected component of $G$ (and since $G$ is connected, $H=G$) -- let's just show that $H$ is both open and closed, i.e. nothing in $H$ touches its exterior, and nothing in its exterior touches $H$. Here's the proof:

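Nothing in $H$ touches its exterior -- Suppose $\exists x\in H, x\in\mathrm{cl}(H')$, where $H'$ is the complement of $H$. Then the open neighbourhood $xQ$ of $x$ contains a point in $H'$. But $xQ\subseteq H$ because $x\in H$ and $Q\subseteq H$, a contradiction.
Nothing outside $H$ touches it -- Suppose $\exists x\in H', x\in\mathrm{cl}(H)$. Then $xQ$ contains a point $h\in H$, say $h=xq$ with $q\in Q$, so $x=hq^{-1}$ must be in $H$, a contradiction.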

We're really just formalising the notion of "translating $Q$ to its edges to extend $H$ further and further". The key fact we've used here is, of course, that left-multiplication is a homeomorphism, so $xQ$ is still an open set.

(2) The connected component of the identity is a subgroup.

The idea is that taking two elements $g,h$ of the connected component, their product should remain in the connected component. Once again, this follows from the continuity of left-multiplication -- considering the action of left-multiplication by $g$ on the connected component, its continuity implies that the image must remain connected.

(3) If a subgroup contains a neighbourhood of the identity, it contains the connected component of the identity.

Corollary to (1) and (2).

(4) The connected component of the identity is a normal subgroup.

Conjugation is a continuous map.

(5) Open subgroups are closed.

Corollary to (3). Alternate proof: the complement is the union of some cosets, which are open sets too. A weaker theorem can be made of closed sets -- closed subgroups with finite index are open.

What this means: any open subgroup is a union of connected components.

(6) Intuition for compact subgroups

How can a Lie group possibly "close in on itself"? Surely we keep "extending" an open neighbourhood $W$ of the identity by observing that $xW$ must be in the subgroup? The idea is that these translations of $W$ form an open cover of the group; if this cover has a finite subcover, then it makes sense for the group to close in on itself. By playing around with different open neighbourhoods $W$ and taking some suitable unions, one can see that this is equivalent to the condition that every open cover has a finite subcover, i.e. the group is compact.

(7) A compact, connected Abelian Lie group is a torus.

This is a generalisation of "a finite Abelian group is the direct product of cyclic groups".

The idea behind the proof is that in the Abelian case, the exponential map is a homomorphism from the Lie algebra to the Lie group, but the Lie algebra cannot detect compactness in the Lie group -- the kernel of the exponential map can. We know from our study of the exponential map that it has a discrete kernel, and in the Abelian case is surjective -- thus the Lie group is homeomorphic to $\mathbb{R}^n/\mathbb{Z}^n$, which is an $n$-torus.

(8) A connected Abelian Lie group is a cylinder (direct product of a torus and an affine space)

Analogous to above, except $\mathbb{R}^m/\mathbb{Z}^n$ where $m\ge n$.

Lie group homomorphisms

Because a Lie group is fundamentally a group that is also a manifold, we'd like to define a Lie group homomorphism as one that is both a group homomorphism, and smooth. For this, though, we need to define what it means to differentiate a group homomorphism.

Recall that the general notion of a derivative is the idea of "how does the map work locally"? Letting a general function $f:G\to H$ map a curve $\gamma(t)$, it should be easy to see that $\gamma'(t)$ transforms as $(f\circ\gamma)'(t)$ (make sure that this makes sense -- think in terms of the chain rule, or write it out in limit form, or just in terms of the image of the curve).

Consequently this leads to the differential $df:dG\to dH$ (where $dG$ is the Lie Algebra of $G$) defined as $df(\gamma'(0))=(f\circ\gamma)'(0)$. Some short exercises:

Confirm that this is equivalent to saying that $df(X)$ is the directional derivative of $f$ in the $X$ direction.
Differentiate $f(xyx^{-1}y^{-1})$ with respect to $x$ in the $X$ direction at $x=1$ (hint: this is a direct application of the definition of the differential in reverse).
Convince yourself that any derivative operator commutes with $df$, i.e. $D(df(X))=df(D(X))$.

It should be intuitively clear that if $f$ is a homomorphism, its local effect should be to act as a homomorphism of the Lie algebra as it should preserve all local structure. We can easily show that:

Since $df$ is a derivative of $f$, its value must be a linear map (like the Jacobian). This applies to the derivative as an operator on the tangent space of any manifold -- $f$ doesn't need to be a group homomorphism at all.
It preserves the Lie bracket. Take $f(xyx^{-1}y^{-1})=f(x)f(y)f(x)^{-1}f(y)^{-1}$ and differentiate it once with respect to $x$ in the $X$ direction at $x=1$, obtaining: $df(X-yXy^{-1})=df(X)-f(y)df(X)f(y)^{-1}$, simplify and differentiate it with respect to $y$ in the $Y$ direction at $y=1$ to get: $df([Y,X])=[df(Y),df(X)]$.

The adjoint map

The Lie Bracket $[Y,X]$ is not the derivative of conjugation $gxg^{-1}$, so you don't have to worry -- the Lie Bracket is not a Lie algebra homomorphism (it doesn't preserve Lie Brackets), the derivative of conjugation at the identity is zero. That's unfortunate -- our explanation of the Jacobi identity ("a derivation acts through the Lie Bracket as a derivation on the space of derivations where multiplication is given by the Lie Bracket") really indicated that it has something to do with it.

The Lie Bracket is the derivative of conjugation $xgx^{-1}$. OK, so?

Here's the idea: $\mathrm{Ad}(x)(y)=xyx^{-1}$ defines a homomorphism $\mathrm{Ad}:G\to\mathrm{Aut}(G)$. Its differential $\mathrm{ad}:dG\to d\mathrm{Aut}(G)$ can be confirmed to be the Lie Bracket $\mathrm{ad}(X)(Y)=[X,Y]$. So preservation of the Lie Bracket means:

$$\mathrm{ad}([X,Y])=[\mathrm{ad}(X),\mathrm{ad}(Y)]$$
This is precisely the Jacobi identity! So the Lie bracket is a Lie algebra homomorphism, from a Lie algebra to the Lie algebra of half-filled Lie brackets.
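To see the identity $\mathrm{ad}([X,Y])=[\mathrm{ad}(X),\mathrm{ad}(Y)]$ concretely, here is a sketch for $n\times n$ matrices, where $\mathrm{ad}(X)$ is written out as an $n^2\times n^2$ matrix acting on flattened matrices. The Kronecker-product formula in `ad` assumes NumPy's row-major flattening; the function names are my own.

```python
import numpy as np

def bracket(X, Y):
    return X @ Y - Y @ X

def ad(X):
    """Matrix of Z -> [X, Z] acting on row-major flattened n x n matrices."""
    n = X.shape[0]
    I = np.eye(n)
    return np.kron(X, I) - np.kron(I, X.T)

rng = np.random.default_rng(3)
X, Y, Z = (rng.normal(size=(3, 3)) for _ in range(3))

# Sanity check: ad(X) really is the "half-filled bracket" [X, .].
is_half_filled_bracket = np.allclose(ad(X) @ Z.flatten(), bracket(X, Z).flatten())

# ad([X, Y]) = [ad(X), ad(Y)], which is the Jacobi identity in disguise.
is_homomorphism = np.allclose(ad(bracket(X, Y)), bracket(ad(X), ad(Y)))
print(is_half_filled_bracket, is_homomorphism)
```

The bracket on the right-hand side is the ordinary commutator of the big $n^2\times n^2$ matrices, which is exactly what "the Lie algebra of half-filled Lie brackets" means here.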

There is indeed a relationship between this "homomorphism" understanding of the Jacobi identity and the "derivation" understanding. In general, given a curve $\phi:\mathbb{R}\to\mathrm{Aut}(G)$, differentiating $\phi(t)(gh)=\phi(t)(g)\phi(t)(h)$ at $t=0$ we see that its derivative $d\phi$ satisfies the product rule, i.e. is a derivation (in fact this is true even when $G$ is not a group -- often a Lie group arises this way, as the automorphism group of some object and these derivations then form its Lie algebra). This implies

$$d\mathrm{Aut}(G)\subseteq\mathrm{Der}(dG)$$
So $[X,\cdot]$ is a derivation, and the map from $X$ to $[X,\cdot]$ is a Lie algebra homomorphism $dG\to\mathrm{Der}(dG)$. This really does give us a much more general way to look at everything we talked about in the last article.

Wait -- shouldn't it be an equality? I thought all derivations were part of the Lie Algebra? Ah, but there the derivations on $M$ formed the Lie Algebra of $\mathrm{Aut}(M)$, i.e. $d\mathrm{Aut}(M)=\mathrm{Der}(M)$. So indeed $d\mathrm{Aut}(dG)=\mathrm{Der}(dG)$. This makes sense, indeed $\mathrm{Aut}(G)\subseteq \mathrm{Aut}(dG)$. It's interesting to think about when it is that the Lie algebra has "more" automorphisms than the Lie group.

One may wonder if all automorphisms of a group are a conjugation by something -- or equivalently, if all automorphisms of a Lie algebra are a derivation of some kind. We will later see a special classification of Lie group for which this is true -- in general, the conjugation automorphisms are called the inner automorphisms of the group and are denoted as $\mathrm{Inn}(G)$. The group of all invertible linear transformations $dG\to dG$ of a Lie algebra, meanwhile, is denoted $GL(dG)$ (it sits inside the algebra $\mathrm{End}(dG)$ of all linear maps $dG\to dG$), and it's easy to see that every element of it is an automorphism iff the Lie algebra is Abelian.

Exercise: Show that the map $\mathrm{Ad}:G\to \mathrm{Aut}(G)$ is injective iff $G$ has a trivial center.

So if $G$ has trivial center and all its automorphisms are inner, it is isomorphic to $\mathrm{Aut}(G)$ and is called complete.

The determinant map

The determinant is a homomorphism $\det:GL_F(n)\to F^\times$ (the multiplicative group of non-zero elements of $F$), and it restricts to a homomorphism on any matrix group. The first thing we'd like to do with this is find its differential $\det'$ (which will be an $F$-valued function on $M_F(n)$). By definition of the differential:

$$\det' A = \lim_{\varepsilon\to 0}\frac{\det (I+\varepsilon A)-1}{\varepsilon}$$
It's easy to prove by writing out the entries of the matrix as $\delta_{ij}+\lambda_{ij}\varepsilon$ and performing induction on the dimension of the matrix that this is equivalent to:

$$\det'A=\mathrm{tr} A$$
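Before doing the induction, you can convince yourself of this with a finite-difference sketch of the limit above. The function name `det_derivative` and the step size `eps` are arbitrary choices of mine, and the agreement is only up to an error of order `eps`.

```python
import numpy as np

def det_derivative(A, eps=1e-6):
    """One-sided difference quotient (det(I + eps*A) - 1) / eps."""
    n = A.shape[0]
    return (np.linalg.det(np.eye(n) + eps * A) - 1.0) / eps

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))

approx = det_derivative(A)
exact = np.trace(A)
print(approx, exact)
```

The two printed numbers should agree to several decimal places; the leftover discrepancy is the $\varepsilon^2$ term of $\det(I+\varepsilon A)$ divided by $\varepsilon$.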

Lie algebra homomorphisms in detail: ideals

Well, Lie algebra homomorphisms are a specific category of vector space homomorphisms, aren't they? It's not enough that they preserve the linear structure, they must preserve the Lie bracket too. Well, let's study them in more detail -- like a crash course through linear algebra, but with Lie algebra instead.

What does the kernel of a Lie algebra homomorphism $A$ look like? Well, because the homomorphism preserves linear combinations, the kernel must be a linear subspace -- similarly because the homomorphism preserves the Lie bracket, we must have that $Av=0\implies \forall w\in\mathfrak{g}, A[v,w]=0$, i.e. the kernel must be closed under derivations from $\mathfrak{g}$: $[\mathfrak{g},\mathfrak{i}]\subseteq\mathfrak{i}$. Such a subalgebra is called an ideal.

Exercise: Show that the Lie algebra of a normal subgroup is an ideal (careful -- it's not as obvious as you might think -- but still pretty obvious).

Orthogonal group, indefinite orthogonal group, orthochronous stuff

This post appears in the Linear Algebra and Special Relativity courses.

There are several ways to see that the matrices satisfying $A^*A=I$ are related to rotations in some way, other than just expanding out the components like a dumb pygmy chimp -- no, we are the normal chimp:

Write it as $A^TIA=I$ -- i.e. the set of matrices that preserve the identity quadratic form. The identity quadratic form corresponds to the $n$-sphere (e.g. a circle), so we're looking for transformations that preserve the $n$-sphere. A clearer way to see this is that preserving the quadratic form $I$ is equivalent to preserving the valuation $x^TIy$ for all $x, y$, i.e. $(Ax)^TI(Ay)=x^Ty$, so it preserves the value of each contour.
With the same logic as above, $(Ax)^T(Ay)=x^Ty$, i.e. the preservation of the Euclidean dot product means that all lengths and angles are preserved. These are called "rigid rotations", and are basically the kind of stuff we can do to a sheet of paper without compressing or stretching it in any way -- i.e. if we nudge a vector by a certain angle, every other vector should also be nudged by the same angle.

What kind of transformations preserve the unit sphere?

The reason this is a good way of understanding things is that there are plenty of other such "dot products" you can define in mathematics, corresponding to different geometries -- each can be based on the bilinear form it preserves, see this later linear algebra article for more details, relating to isomorphisms of such geometries etc.

As for discriminating between rotations and reflections, suppose we define rotations in a completely geometric way -- for a matrix to be a rotation, all its eigenvalues are either 1 or in pairs of unit complex conjugates.

What do the eigenvalues of orthogonal matrices look like? For each eigenvalue, you need $\overline{\lambda}\lambda=1$, i.e. all the eigenvalues are unit complex numbers. If a complex eigenvalue isn't paired with a corresponding conjugate, you will not get a real-valued transformation on $\mathbb{R}^n$. Meanwhile if an eigenvalue of -1 isn't paired with another -1 -- i.e. if there are an odd number of reflections -- you get a reflection. In this sense, the "conjugate eigenvalues" property of rotations can be seen as a generalisation of the "$s_1s_2=r$" property which you may have learned from plane geometry or dihedral groups. The orthogonal (or rather unitary) transformations that do not behave this way are precisely the rotations.
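A quick numerical sketch of the eigenvalue picture: take a random orthogonal matrix (here obtained from a QR factorisation, which is just a convenient way to get one and is my assumption, not part of the argument), and compare the parity of the number of $-1$ eigenvalues with the determinant.

```python
import numpy as np

rng = np.random.default_rng(2)
# The Q factor of a QR factorisation is an orthogonal matrix.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))

eigenvalues = np.linalg.eigvals(Q)

# Every eigenvalue is a unit complex number.
on_unit_circle = np.allclose(np.abs(eigenvalues), 1.0)

# Count the eigenvalues equal to -1; an odd count means a reflection.
n_minus_one = int(np.sum(np.isclose(eigenvalues, -1.0)))
det_Q = np.linalg.det(Q)
is_rotation = n_minus_one % 2 == 0

print(on_unit_circle, n_minus_one, round(det_Q), is_rotation)
```

The complex eigenvalues come in conjugate pairs whose products are $1$, so the determinant only ever sees the unpaired $-1$'s.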

The similarity between unpaired unit complex eigenvalues and unpaired -1's is interesting, by the way -- when thinking about reflections, you might have gotten the idea that reflections are $\pi$-angle rotations in a higher-dimensional space -- like the vector was rotated through a higher-dimensional space and then landed on its reflection -- like it was a discrete snapshot of a process as smooth as any rotation.

Well, now you know what this higher-dimensional space is -- precisely $\mathbb{C}^n$. And the determinant of a unitary matrix also takes a continuous spectrum -- the entire unit circle. In this sense (among other senses) complex linear algebra is more "complete" than real linear algebra. In fact, you will see in Lie theory that the group $SO(n)$ is connected but $O(n)$ is not, while $SU(n)$ and $U(n)$ are both connected. Can you see why?

(original version of above originally posted to math stackexchange)

Well, here, we benefited from the fact that the product of two reflections is a rotation -- so we could just enforce the "even number of flips", i.e. that $\det A=1$, to specify rotations. But what if we're dealing with one of the "generalised geometries" we discussed? What if instead of preserving $I$, we wanted the group $O(m\mid n)$, i.e. that preserves some $\mathrm{diag}(m\mid n)$ with $m$ 1's and $n$ -1's along the diagonal?

Well, then we don't have rotations between the "1"-labeled (spatial) axes and the "-1"-labeled (temporal) axes, only boosts. But compositions between such reflections form rotations! So simply restricting that $\det A = 1$ will -- while still forming a group $SO(m \mid n)$ -- retain all these rotations which can only be understood as compositions of reflections.

So how do we extract the transformations we want? (What transformations do we want? The ones that correspond to changes of reference frame, in special relativity language -- well, in the sense of Lie theory, this means we're looking for the "component connected to the identity" -- do you see why?)

Let's think about this more clearly. Start by noting that not all reflections in spacetime preserve the Minkowski metric $\mathrm{diag}(m\mid n)$ -- only those that preserve the invariant hyperboloids. In the case of 3+1-spacetime, this means infinitely many spatial reflections and one time-reversal -- in the case of a general $m+n$-spacetime, this means infinitely many spatial reflections and infinitely many temporal reflections (in any $m+n-1$-plane whose normal vector is temporal, not to be confused with time-like). When you multiply an odd temporal reflection with an odd spatial reflection, you get an even time-space rotation, which is in $SO(3\mid 1)$.

$$A = \begin{bmatrix} A_t & B^T \\ C & A_s \end{bmatrix}$$
(Note on notation: we'll use $A_T = \begin{bmatrix} A_t & 0 \\ 0 & I \end{bmatrix}$ and analogously $A_S = \begin{bmatrix} I & 0 \\ 0 & A_s \end{bmatrix}$, where $A_t$ and $A_T$ are "basically the same thing", and analogously for $A_s$ and $A_S$ -- in particular $\det A_t=\det A_T$ and $\det A_s=\det A_S$.)

We see the problem: instead of just mandating $\det A=1$, we must mandate that the temporal minor and the spatial minor of the matrix both have determinant 1, $\det A_t=\det A_s = 1$. But this isn't right -- if you have a boost, i.e. some mixing between the space and time co-ordinates, then $A\ne A_TA_S$ and the component determinants are multiplied by a Lorentz factor (even though still $\det A = 1$). So we mandate instead that $\det A_t>0$, $\det A_s>0$ (equivalently $\ge 1$). Such transformations are called the proper orthochronous Lorentz transformations, because in the context of special relativity they are proper Lorentz transformations that do not flip time:

$$SO^{+}(3 \mid 1)=\{A\in O(3 \mid 1) \mid \det A_t >0, \det A_s >0\}$$
OK, how do we show $SO^{+}(m\mid n)$ is a subgroup? You might get the notion that because of the "two sheets hyperbola" topology of the group, the sheet connected to the identity must be a subgroup (and the other sheet a coset) because moving about on the sheet keeps you on the sheet (and that's what group multiplication is -- moving about on the sheet). The formal way to say this is to say that the map $A\mapsto \mathrm{sgn}(\det A_t )$ is a group homomorphism to the cyclic group $\{1,-1\}$, so its kernel is necessarily a normal subgroup (do you see how these are the same thing?).
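For the relativity case $SO^+(3\mid 1)$ we can at least poke at this numerically. The sketch below puts the time coordinate first (so $A_t$ is the $1\times 1$ top-left block), builds a few Lorentz transformations, and checks the signs of $\det A_t$ and $\det A_s$ for their products. All function names are illustrative, and the choice of $\eta=\mathrm{diag}(-1,1,1,1)$ is an assumed sign convention.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # time coordinate first (assumed convention)

def boost_x(rapidity):
    """Boost mixing the time axis with the first spatial axis."""
    A = np.eye(4)
    A[0, 0] = A[1, 1] = np.cosh(rapidity)
    A[0, 1] = A[1, 0] = np.sinh(rapidity)
    return A

def rotation_z(theta):
    """Spatial rotation in the plane of the first two spatial axes."""
    A = np.eye(4)
    A[1, 1] = A[2, 2] = np.cos(theta)
    A[1, 2], A[2, 1] = -np.sin(theta), np.sin(theta)
    return A

def preserves_metric(A):
    return np.allclose(A.T @ ETA @ A, ETA)

def sign_class(A):
    """(sgn det A_t, sgn det A_s) for the temporal and spatial diagonal blocks."""
    return (np.sign(A[0, 0]), np.sign(np.linalg.det(A[1:, 1:])))

TIME_REVERSAL = np.diag([-1.0, 1.0, 1.0, 1.0])
SPATIAL_REFLECTION = np.diag([1.0, -1.0, 1.0, 1.0])

# A product of boosts and rotations stays on the identity sheet ...
A = boost_x(0.7) @ rotation_z(0.3) @ boost_x(-1.2)
print(preserves_metric(A), sign_class(A))

# ... while multiplying by a reflection moves it to another coset.
print(sign_class(A @ TIME_REVERSAL), sign_class(A @ SPATIAL_REFLECTION))
```

This is only evidence on a handful of examples, not the proof we're after, but it shows the sign pair behaving like a homomorphism to $\{1,-1\}\times\{1,-1\}$ on them.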

So the key is to prove that for two matrices satisfying $\mathrm{sgn}(\det A_t )>0$, their product does too. A proof of the $SO^+(m\mid 1)$ case (relevant for relativity) can be found here -- I'm not sure how that proof can be appropriately generalised to $SO^+(m\mid n)$. I've written out the first few steps here:

Multiply the two matrices $A$ and $\tilde{A}$ to show $(A\tilde{A})_t=A_t\tilde{A}_t+B^T\tilde{C}$. We want to show the determinant of this is positive.
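From multiplying out $A^T\eta A=\eta$ and $A\eta A^T=\eta$, we see that $A_t^TA_t-C^TC=A_tA_t^T-B^TB=I$ (which reads $A_t^2-C^TC=A_t^2-B^TB=1$ when $A_t$ is a $1\times 1$ block), and analogously for $\tilde{A}$.
So $|\det((A\tilde{A})_t-A_t\tilde{A}_t)|=|\det(B^T\tilde{C})|\le\sqrt{\det(B^TB)\det(\tilde{C}^T\tilde{C})}=\sqrt{\det(A_tA_t^T-I)\det(\tilde{A}_t^T\tilde{A}_t-I)}$, where the inequality follows from the Cauchy-Binet formula and Cauchy-Schwarz (it is not an equality in general).
Well, I'm not sure how to proceed at this point. Does $|\det(X-PQ)|\le\det((PP^T-I)(Q^TQ-I))^{1/2}$ imply that $\det P\ge1\land\det Q\ge1\Rightarrow \det X>0$?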

Well, I can't think of a way to continue -- and certainly one can think of a much wider category of problems like this, where we have a much simpler topological picture in our heads than rubbish algebra like the above would betray. So we need a topological way of looking at Lie groups.

You might think of just considering something like the orbit of a vector -- e.g. the unit time vector -- under the group for the topology, but this does not fully describe the topology of the group. As an illustration, in the above example, for $n>1$, the orbit of the time vector under $O^+(m\mid n)$ is actually connected (prove this -- you need to count the number of sheets a general hyperbola has), while the entire topology of the group is actually disconnected, as we will see. A simple way to see that these are two different topologies is that spatial rotations/reflections leave the unit time vector unchanged and therefore all correspond to a single point on the orbit.

This will be our starting point to motivate the study of the topology of a Lie group in the Lie theory articles.

Derivations and the Jacobi Identity

Let's consider a new way to think of the Lie algebra to a group -- instead of just considering the tangent vector to be at the identity, we could smear it across the group to form a vector field, resolving questions of whether our tangent space "really needs to be" at the identity (the exponential map in matrix representation only exists in the traditional form if we're talking about tangent vectors at the identity, but we're free to write down the Lie algebra in this way).

But not every vector field is a valid element of the Lie algebra. We need the vector field to be "constant" across the manifold in some sense so that that constant vector it equals is the tangent-space-at-the-identity element it corresponds to. But what exactly do we mean by "constant" on a Lie Group?

In the case of the unit circle in the complex plane, we have an idea of what we want -- the vector field $T(M)$ is constant over the group if it is determined by the value at the identity as $T(M)=MT(1)$.

Is this preserved in the matrix representation of the group? Well, yes, because the correspondence between complex numbers and spiral matrices is a homomorphism. We can use this as a motivation to define the condition for a vector field to be an element of the Lie algebra on a matrix Lie group -- it needs to be a left-invariant vector field, i.e. we need the value of the vector field to be determined as $T(M)=MT(I)$.

Why left-invariant? Why not right-invariant? Why matrix multiplication at all? The choices made here are certainly arbitrary to some extent. When we study abstract Lie algebra, we'll just have "left-multiplication by $M$" being replaced by a group action and the usage of matrix multiplication is a choice of representation. In the context of abstract Lie algebra, the "left-multiplication by $M$" we're interested in is really the derivative of the left-translation map $M:G\to G$ (a diffeomorphism of the group, though not a group homomorphism), which is a linear map between the tangent spaces at $I$ and $M$. You can show that this map is represented by matrix left-multiplication given a matrix representation (i.e. letting the group be $GL(n,\mathbb{C})$).

Ok, why did we just do that? Why did we upgrade our tangent vectors to vector fields? If it wasn't obvious already, the noncommutativity of a Lie group is "the" feature of importance in a Lie group, at least in some neighbourhood of the identity (we will later find out exactly the kind of features that aren't determined by just the Lie bracket -- the important keywords here are connected and compact) -- if the Lie group is commutative, then the Lie algebra is just a vector space with no additional structure, and the Lie group is a "basically unique" choice.

In our discussions of noncommutativity in the last article, we repeatedly referred to flowing along a vector -- the nature of noncommutativity is inherently "dynamical" in this sense. So we need to talk about differentiating along the corresponding vector field to a tangent vector.

So let's upgrade our vector fields to derivative operators, or derivations $D$. These are operators on functions $f:G\to \mathbb{R}$ that tell you the derivative of $f$ in the direction of the vector field -- the left-invariant ones are a certain generalisation of the directional derivative operators.

Well, what exactly is a derivation? On Euclidean space, directional derivatives can be imagined as stuff of the form $f\mapsto\vec{v}\cdot\nabla f$ -- but this requires the concept of a dot product which is quite weird within the context of matrix groups. But if you try to work this out on the unit circle (do it!), you might get an idea: we can define a curve $\gamma:\mathbb{R}\to G$ passing through a point and consider:

$$f\mapsto(f\circ \gamma)'(t)$$
at the point, and you get precisely the directional derivative in the direction $\gamma'(t)$ (show that this is right in Euclidean space, and make sure you understand why it is right/makes sense -- it's the chain rule, and a certain analogy exists to projecting matrices onto subspaces in linear algebra). And if we just want tangent vectors at the identity, we can just consider the operation $f\mapsto(f\circ \gamma)'(0)$.

OK. Let's try to "abstract out" the properties of a derivation $D$, i.e. something that just allows us to define what a derivation is, abstractly, that is equivalent to being an operator of the above form.

What makes an operator a directional derivative? Certainly it must be a linear operator -- but not every linear operator is a directional derivative. The key idea behind a directional derivative is that $D(f(x))$ is determined in a specific way by $D(x)$, the rate at which $x$ changes in the specified direction.

How do we use this? Well, if you think about it a little bit, we can restrict $f$ to be analytic -- so we need:

$D(x)$ predicts $D(x^n)$ in the right way -- this is ensured by the product rule -- $D(fg)=f\ Dg + g\ Df$.
$D(x^n)$ for all $n$ predicts $D(a_0+a_1x+a_2x^2+\ldots)$ in the right way -- this is ensured by linearity.

If anyone can motivate the definition of a derivation without restricting to analytic functions, tell me.

An operator that satisfies these two properties is called a derivation -- one can prove additional properties from these axioms fairly easily, e.g. $D(c)=0$ for constant $c$, etc.

Let's think about why this whole construction above makes sense.

Let $G$ be the group of translations of $\mathbb{R}$ -- one can parameterise them by the translated distance as $\Delta(p)$ with composition given by $\Delta(p)\Delta(q)=\Delta(p+q)$. Well, this is isomorphic to the additive group on the reals, and in turn to the multiplicative positive real numbers. We can consider the group to be acting on real analytic functions by translations of the domain: $\Delta(p)f(x):=f(x+p)$. The Lie algebra is just spanned by the derivative of $\Delta(p)$ at the identity, that is:

$$\Delta '(0) = \lim\limits_{h \to 0} \frac{{\Delta (h) - 1}}{h} = \frac{d}{{dx}}$$
And our Lie algebra members are all real multiples of $d/dx$ -- these are precisely the directional derivatives on $\mathbb{R}$. Similar constructions can be made on $\mathbb{R}^n$, or a general automorphism group.

So we see that the "derivations" in this construction of the Lie algebra actually are the tangent vectors on the Lie group, identified as the automorphism group of some object. If you've ever done some differential geometry, this gives you the motivation for treating partial derivatives as basis vectors.

Our discussion of derivations so far works both for derivations (general vector fields on the manifold) and point-derivations (basically tangent vectors at a specific point). Under the first interpretation, though, we're not actually interested in all derivations, only the left-invariant ones. For example, in the example above, an operator of the form $p(x)\frac{d}{dx}$ is linear and satisfies the product rule:

$$p\frac{d(f\cdot g)}{dx}=g\cdot p\frac{df}{dx}+f\cdot p\frac{dg}{dx}$$
And why shouldn't it? It corresponds to a vector field all right -- $p(x)e_x$ (for instance $xe_x$ when $p(x)=x$). But unless $p$ is constant, this is not a left-invariant vector field.

Interpret the Taylor series as the exponential map from the Lie algebra to the Lie group! Make the "similar construction" in the multivariate case ($\mathbb{R}^n$) and interpret the multivariate Taylor series as an exponential map -- i.e. that $\Delta=\exp\nabla$.
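
If you'd like to see the one-variable case numerically before doing the exercise, here's a minimal sketch. It assumes we restrict to polynomials (stored as coefficient lists, so the series terminates), and the helper names `derivative`, `evaluate` and `translate_via_exp` are made up for this illustration. It applies $\exp(p\,d/dx)$ term by term and compares with the translated function $f(x+p)$.

```python
from math import factorial

def derivative(coeffs):
    """d/dx of the polynomial a0 + a1 x + a2 x^2 + ... given as [a0, a1, a2, ...]."""
    return [k * a for k, a in enumerate(coeffs)][1:] or [0.0]

def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(a * x**k for k, a in enumerate(coeffs))

def translate_via_exp(coeffs, p, x):
    """Apply exp(p d/dx) = sum_k (p^k / k!) (d/dx)^k to the polynomial, evaluated at x."""
    total, term, k = 0.0, list(coeffs), 0
    while any(term):              # the series stops once the derivatives vanish
        total += p**k / factorial(k) * evaluate(term, x)
        term = derivative(term)
        k += 1
    return total

f = [1.0, -2.0, 0.0, 3.0]         # f(x) = 1 - 2x + 3x^3
x, p = 0.5, 1.25
lhs = translate_via_exp(f, p, x)  # (exp(p d/dx) f)(x)
rhs = evaluate(f, x + p)          # (Delta(p) f)(x) = f(x + p)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding, which is just Taylor's theorem read as "translation is the exponential of differentiation".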

The first thing that we can do with our formalism of point-derivations is give another proof of closure under the Lie Bracket:

$$[D_1,D_2](fg)=f[D_1,D_2]g+g[D_1,D_2]f$$

I.e. that the Lie Bracket of two derivations is also a derivation. Check that the above is correct by expanding stuff out and using the product rule for $D_1$ and $D_2$.
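
In case you want something to check your expansion against, here is one way it can go (a sketch, using only the product rule for $D_1$ and $D_2$ and the fact that functions commute under multiplication):

$$\begin{align}
D_1D_2(fg) &= D_1(f\,D_2g+g\,D_2f) = D_1f\,D_2g + f\,D_1D_2g + D_1g\,D_2f + g\,D_1D_2f \\
D_2D_1(fg) &= D_2(f\,D_1g+g\,D_1f) = D_2f\,D_1g + f\,D_2D_1g + D_2g\,D_1f + g\,D_2D_1f \\
\Rightarrow (D_1D_2-D_2D_1)(fg) &= f\,(D_1D_2-D_2D_1)g + g\,(D_1D_2-D_2D_1)f
\end{align}$$

The cross terms $D_1f\,D_2g + D_1g\,D_2f$ appear in both lines and cancel in the difference, which is exactly why $D_1D_2$ alone is not a derivation but the bracket is.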

There's another way that derivations can be used to show closure under the Lie Bracket, which shows more closely the connection to the product rule for the second derivative discussed in the previous article.

One might wonder if, like the directional derivative at the identity in the $c'(0)$ direction is given by $(f\circ c)'(0)$, the directional derivative at the identity in the $c''(0)$ direction may be given as $(f\circ c)''(0)$. Well, in general:

$$(f\circ c)''(t)=c''(t)\cdot\nabla f(c(t))+c'(t)\cdot\frac{d}{dt}\nabla f(c(t))$$
Which since $c'(0)=0$, at $t=0$ is simply equal to the first term, the directional derivative in the $c''(0)$ direction. So we just need to show that $f\mapsto (f\circ c)''(0)$ is a derivation. This follows from the Leibniz rule for the second derivative, and the fact that the first derivative of $c$ is zero.

OK, one more thing before we actually do something useful -- something we haven't done before in other ways.

This is an extended pitfall prevention, because I fell into this pit myself. When thinking about left-invariance of a vector field $D$, I formulated the idea in my head this way: the idea is that under $D$, we should get the same result if we differentiate (derivate?) $f$ at 0 or if we translate it forward by $x$ and derivate it at $x$. i.e. where $\phi^h$ represents the translation $f(x)\mapsto f(x-h)$, we want:

$$D=\phi^{h}D\phi^{-h}$$

(THIS IS WRONG! This is a pitfall prevention, not an actual result!) And I looked at some simple Abelian cases, like the additive real group and the circle group and thought this was clearly true.

But it's wrong. How do we know that? Well, let's consider the conjugation $\phi^{h}D\phi^{-h}$ -- certainly at $h=0$, it's just $D$, so let's differentiate it (against $h$) at 0. We get, where $d\phi_0$ is the derivative of $\phi^h$ at $h=0$:

$$[d\phi_0, D]$$
Which isn't zero. So my argument must be wrong -- I must have assumed abelian-ness somehow.
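
To spell out where that commutator comes from (a first-order sketch, writing $\phi^h = 1 + h\,d\phi_0 + O(h^2)$ and conjugating in the order $\phi^{h}D\phi^{-h}$):

$$\phi^{h}D\phi^{-h} = (1+h\,d\phi_0)\,D\,(1-h\,d\phi_0) + O(h^2) = D + h\,(d\phi_0 D - D\,d\phi_0) + O(h^2)$$

So the $h$-derivative at 0 is $d\phi_0 D - D\,d\phi_0 = [d\phi_0, D]$, which vanishes only when the two directions commute.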

Here's the problem: the final left-multiplication by $\phi^h$ is fine -- it just brings the derived function back to the origin, but "translating the function forward and then differentiating it" messes things up when the direction you're differentiating in doesn't commute with the direction of translation. Draw some pictures of curved surfaces to convince yourselves of this.

So left-multiplication determines a sort of "parallel transport" on the Lie Group, while right-multiplication is an "alternative" way to compare vectors in different tangent spaces, and its disagreement with left-multiplication determines the non-commutativity of the group. Well, this choice of left-multiplication vs right-multiplication is really a convention, arising from the choice of representation.

OK, the useful thing: Suppose we're interested in "nested Lie brackets" $[X,[Y,Z]]$. We're talking about conjugating $[Y,Z]$ as $\phi^p[Y,Z]\phi^{-p}$ where $d\phi_0=X$ so that to first-order in $p$:

$$\phi^p[Y,Z]\phi^{-p}=[Y,Z]+p[X,[Y,Z]]$$
Since conjugation is a homomorphism, we can also write:
$$\begin{align}
\phi^p[Y,Z]\phi^{-p} &= [\phi^pY\phi^{-p},\phi^pZ\phi^{-p}] \\
&= [Y+p[X,Y],Z+p[X,Z]] \\
&= [Y,Z] + p([Y,[X,Z]]+[[X,Y],Z])\\
\Rightarrow [X,[Y,Z]]&=[Y,[X,Z]]+[[X,Y],Z]
\end{align}$$
Now, couldn't we have just proven this by expanding everything out as commutators? Sure, but this provides more insight as to what's going on -- you might notice the resemblance to the product rule. Indeed, this identity -- the Jacobi identity -- is perhaps best stated as:

"A derivation $X$ acts through the Lie Bracket as a derivation on the space of derivations where "multiplication" is given by the Lie Bracket."

In this sense, it's actually quite expected -- it results from the fact that the Lie Bracket is a bilinear operator obtained from differentiating a group symmetry, conjugation -- this mandates that it is a derivation.
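
Here's a small sanity check of the product-rule form of the Jacobi identity, as a sketch under the assumption that we model the Lie algebra by square matrices with the commutator $XY-YX$ as the bracket. The matrices and helper names are arbitrary choices for illustration; integer entries keep the arithmetic exact.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return sub(matmul(A, B), matmul(B, A))

# Arbitrary integer matrices, so the comparison below is exact.
X = [[0, 1, 2], [3, 0, 1], [1, 4, 0]]
Y = [[1, 0, 2], [0, 1, 1], [2, 3, 0]]
Z = [[2, 1, 0], [1, 0, 3], [0, 2, 1]]

# [X, [Y, Z]] = [Y, [X, Z]] + [[X, Y], Z]: "X differentiates the bracket".
lhs = bracket(X, bracket(Y, Z))
rhs = add(bracket(Y, bracket(X, Z)), bracket(bracket(X, Y), Z))
print(lhs == rhs)
```

Of course, this is the "expand everything out as commutators" proof done by a machine for one example, so it confirms the identity without explaining it.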

As it turns out, the Jacobi identity, along with the antisymmetry and the bilinearity, determines the Lie Algebra -- it is enough to "abstract out" the properties of a Lie Algebra. Why? This is something we will see over several articles, which will then allow us to motivate abstract Lie algebra.

Lie Bracket, closure under the Lie Bracket

(If you're just here for the easy way to see closure, skip ahead to Closure under the Lie Bracket)

In the previous article, I introduced Lie Groups and Lie Algebras by talking about Lie Algebras as a parameterisation for the Lie Group -- we said that the elements of the Lie Group could be written as exponentials of these parameters (not uniquely, sure, but they can be written in this way). Some things to note here:

What we've called "Lie Groups" refers only to connected Lie Groups, as motivation. In general, the theory of Lie groups considers any group that is also a manifold -- for instance, the non-zero real numbers are also a Lie Group (even though their Lie Algebra is identical to that of the positive real numbers -- can you see why?). We will hereby use this more general definition.
It's not really true that any Lie group can be parameterised in this fashion by writing each element as an exponential of a Lie Algebra element -- even for connected groups. This shouldn't be surprising -- given a term of the form $\exp X$ and a term $\exp Y$, their product $\exp X\exp Y$ is in the group by closure, but it isn't necessarily equivalent to $\exp(X+Y)$ on a non-Abelian group (could it be the exponential of something else? We'll find out later).
A parameterisation of this form is not the same as a co-ordinate system.

The last point is what we will concentrate on in this article, because not being described fully by the Lie algebra is what makes things interesting, right?

What is a co-ordinate system on a manifold? Well, the key point is that any element of the manifold can be decomposed in terms of its components along the co-ordinates. On a Lie Group, this means that there should exist a "basis" for the Lie Group $\exp(X_1),\ldots\exp(X_n)$ corresponding to the basis $X_1,\ldots X_n$ for the Lie Algebra vector space such that every element of the Lie Group can be written as products of powers of these elements, and any rearrangement of the terms in the product should leave it invariant (i.e. the elements should commute with each other).

Note that it is possible to decompose elements of a connected Lie Group as a product of some exponentials, but this is different from there being specifically $n$ elements that one can write any Lie group element as products of.

But clearly, this can only be possible if the group is Abelian, commutative. This is a special case of the more general fact that only a holonomic basis gives rise to a co-ordinate system on a manifold. The idea is -- a closed loop should produce no overall group action. If you flow $\varepsilon$ in the $X$ direction, then flow $\varepsilon$ in the $Y$ direction, then flow $\varepsilon$ back in the $X$ direction and flow $\varepsilon$ back in the $Y$ direction, you should end up back where you started. If you don't, then the resulting difference is the infinitesimal "group commutator" of the Lie Group:

$$e^{\varepsilon X}e^{\varepsilon Y}e^{-\varepsilon X}e^{-\varepsilon Y}$$
One can check via a Taylor expansion that this is equal, to second order, to:

$$1+\varepsilon^2(XY-YX)$$
The first thing to note about this is that the $\varepsilon^1$ term is zero -- this may seem like a surprising coincidence, but perhaps it isn't that surprising (I mean, there's nothing else it could be, right? If the commutator was to first-order $1+\varepsilon z$, $\exp z$ would be equal to 1, and so it would give no characterisation at all of the amount of non-commutativity of the flows $X$ and $Y$) -- it's analogous to vector calculus, where the curl of a vector field is proportional to $\varepsilon^2$ (i.e. a line integral along the curve is proportional to its area, so you divide it by this area in the definition of curl, etc.).
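
If you'd rather check the second-order claim numerically than by hand, here is a minimal sketch. It assumes a matrix group, computes the exponential from its truncated Taylor series, and uses two simple $2\times 2$ matrices picked purely for illustration; all helper names are hypothetical.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, terms=30):
    """Matrix exponential via the truncated series 1 + A + A^2/2! + ..."""
    result, term = identity(len(A)), identity(len(A))
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, A))   # term is now A^k / k!
        result = add(result, term)
    return result

def group_commutator(X, Y, eps):
    """e^{eps X} e^{eps Y} e^{-eps X} e^{-eps Y}."""
    out = identity(len(X))
    for F in (scale(eps, X), scale(eps, Y), scale(-eps, X), scale(-eps, Y)):
        out = matmul(out, expm(F))
    return out

def second_order(X, Y, eps):
    """1 + eps^2 (XY - YX)."""
    bracket = add(matmul(X, Y), scale(-1.0, matmul(Y, X)))
    return add(identity(len(X)), scale(eps**2, bracket))

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[0.0, 1.0], [0.0, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.0]]
for eps in (1e-1, 1e-2):
    err = max_diff(group_commutator(X, Y, eps), second_order(X, Y, eps))
    print(eps, err)   # the discrepancy shrinks like eps^3
```

Shrinking $\varepsilon$ by a factor of 10 should shrink the discrepancy by roughly a factor of 1000, consistent with there being no first-order term and the second-order term being $XY-YX$.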

The second-order term, $XY-YX$, is more interesting. This may seem weird because so far, we've been considering the Lie algebra purely as a vector space, with addition and scalar multiplication being the only things going on. But clearly, this cannot be the entire picture, or a connected Lie group would be characterised entirely by the dimension of its Lie algebra. This operation -- the Lie Bracket or Lie Algebra commutator represented by $[X,Y]$ -- as we will see, gives some additional structure to the Lie Algebra, and in fact characterises it (we'll see what this means).

So far, we've obtained no motivation for why this operation $XY-YX$ is actually of any significance. Sure, it appeared in our second-order approximation for the group commutator, but is the group commutator we defined really so great? Surely there could be other ways one could measure the non-commutativity of a group. And the $\varepsilon^2$ business is weird. Things that arise proportional to $\varepsilon$ live in the tangent space, in the Lie Algebra. Where does $[X,Y]$ even live?

Two facts will convince us that the Lie Bracket is indeed the "right" measure of non-commutativity of a Lie Group:

The Lie Algebra is closed under the Lie Bracket -- we will see that in fact, $[X,Y]$ lives in the Lie Algebra, so it is in fact a binary operation on the Lie Algebra, and really does add structure to the Lie Algebra.
It characterises the entire Lie Algebra -- not only is it part of the structure of the Lie Algebra, it characterises the entire structure of the Lie Algebra. What this means is that defining the Lie Bracket on the vector space allows a full characterisation of the part of the group connected to the identity (the "connected part" of the group), so we can say that any Lie Algebras with the same dimension and Lie Bracket are isomorphic.

Closure under the Lie Bracket

If you're like me, you might've thought of several analogous situations to our $1+\varepsilon^2(XY-YX)$ expression -- e.g. in (complex) analysis, at a point where the derivative of a function is zero, the function is characterised by its second derivative (consult Needham's Complex Analysis, p. 205-207 for an explanation). Another example is -- if the first derivative of a function is zero, the second derivative satisfies the product rule (this is actually directly related, in a way we won't go into now).

Here's an idea you might think of: as we discussed earlier, the infinitesimal group commutator is $e^{\varepsilon X}e^{\varepsilon Y}e^{-\varepsilon X}e^{-\varepsilon Y}= 1+\varepsilon^2 (XY - YX) + O(\varepsilon^3)\in G$. But for a moment let $\varepsilon$ not be infinitesimal. So $\varepsilon (XY - YX) + O(\varepsilon^2)\in \mathfrak{g}$, the Lie Algebra corresponding to Lie Group $G$, so by scaling $XY-YX+O(\varepsilon)\in\mathfrak{g}$ and by connectedness of the vector space $XY-YX\in\mathfrak{g}$.

But this argument is incorrect -- this becomes obvious if you try to formally write it down -- In general, $1+\varepsilon T\in G$ does not imply $T\in\mathfrak{g}$ for non-infinitesimal $\varepsilon$. It's close to an element in $\mathfrak{g}$ (for small $\varepsilon$), but how close? You might get the feeling that it is "sufficiently close", in that the limit $\varepsilon\to0$ of the sequence $\left(c_\varepsilon(X,Y)-1\right)/\varepsilon^2$ (where $c_\varepsilon(X,Y)$ is the group commutator) indeed ends up in the Lie Algebra.

To make this feeling formal, consider instead the curve parameterised differently as $\gamma(\varepsilon)=e^{\sqrt\varepsilon X}e^{\sqrt\varepsilon Y}e^{-\sqrt\varepsilon X}e^{-\sqrt\varepsilon Y}$. Then $\gamma'(0)=XY-YX$, and we're done.

Think about the Taylor expansion of this new curve for a while.

Introduction to Lie groups

When you first learned about cyclic groups, the picture in your head was that of the unit circle (complex numbers with norm one). Sure, the unit circle isn't actually a cyclic group, but it really feels like one. When I first motivate group theory, I even base the motivation on the close similarities between the circle group and the modular addition group $\mathbb{Z}/p\mathbb{Z}$. Indeed, the circle group is just the group of real numbers mod $2\pi$.

The solution to this problem can be seen from the quickest proof that the unit circle isn't cyclic -- the fact that it isn't countable (while the integers are). Well, what if we discard the centrality of the integers to our definition of a cyclic group and admit real powers on groups?

Ok, but how? It's easy to construct integer powers on an arbitrary group -- in terms of repeated multiplication (which defines natural powers) and inverses. But the real numbers are a wholly different beast -- they require a nice and connected "smooth" structure, a geometry on the group. We can certainly visualise this geometry on the unit circle or the positive real numbers (which is also "real-power cyclic"), but it's interesting to think about how one might introduce such a geometry on other groups (groups that admit such a geometry are called Lie groups).

Well, if you think for a while, you might get the idea of defining a group via a real-number parameterisation $\mathbb{R}\to G$. The unit circle can be parameterised as $g(\theta)=\exp i\theta$, the positive real numbers can be parameterised as $g(\xi)=\exp\xi$, etc. This parameterisation would then give $\exp rt =(\exp t)^r$ for real powers $r$ of elements in the group.

But here's the thing -- we could have introduced any sort of ugly and terrible parameterisation for our group. We knew how the parameterisation should look for the unit circle, but we could just as well have created something definitely not smooth -- like mapping $\pi i$ to $-1$ and mapping $(\pi + \varepsilon)i$ to $i$ (sorry, you can't use too much dramatic hyperbole on the unit circle... fine, let's map it to 30 gazillion, which isn't on the unit circle, but whatever), and the real-power would look ridiculous, not at all what we want, and we may not even have a "real-power cyclic" structure.

What exactly do we want from our parameterisation?

Let's think about what a generator looks like with real powers on the unit circle. Well, really any non-identity element $e^{i\theta_0}$ can generate the group (take it to the power of $\theta/\theta_0$), but if we want to emulate the case of the integers under addition $\{...a^{-2},a^{-1},1,a,a^2,a^3,...\}$, we'd like to call the element really close to 1 the generator. Well, there's no element that's really close to 1, so we're talking about some kind of an infinitesimal thing. This is called an infinitesimal generator of the Lie group.

In the first-order approximation, such an element would be of the form $1+i\varepsilon$. By making $\varepsilon$ sufficiently small, the element will be sufficiently close to being "on the unit circle", with an arc length of $\varepsilon$ away from the identity, and the real power $r$ of the element will have an arc length of $r\varepsilon$ away from the identity. So to generate the element with parameter $\theta$, we need to take $1+i\varepsilon$ to a real power of $r=\theta/\varepsilon$. I.e.

$$g(\theta) = \lim\limits_{\varepsilon\to 0} (1+i\varepsilon)^{\theta/\varepsilon}=\lim\limits_{r\to\infty} \left(1+\frac{i\theta}{r}\right)^{r}=\exp i\theta$$
If you were studying calculus for the first time, this is really solid intuition for Euler's formula. Conversely, you can go in the other direction and say it's solid intuition for the compound-interest limit.
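
To watch this limit converge, here's a quick sketch using Python's built-in complex numbers (the function name `compound` and the choice $\theta=2$ are just for illustration):

```python
import cmath

def compound(theta, r):
    """The finite-r approximation (1 + i*theta/r)^r to exp(i*theta)."""
    return (1 + 1j * theta / r) ** r

theta = 2.0
target = cmath.exp(1j * theta)      # the point at arc length theta on the unit circle
for r in (10, 1000, 100000):
    print(r, abs(compound(theta, r) - target))   # distance to the limit shrinks as r grows
```

The printed distances should fall off roughly like $1/r$: the approximation sits slightly outside the unit circle and spirals in onto it as the generator $1+i\theta/r$ gets closer to 1.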

$$\lim\limits_{\varepsilon\to 0} (1+\varepsilon\theta t)^{1/\varepsilon} = \exp(t\theta)$$
But here, we can view it in a more general light, and say this is the definition of the exponential map to a Lie group. What exactly is it a map from? I.e. what is the parameterising space? Well, as you can see, it maps an element $i\theta$ to the group element parameterised by $\theta$ -- what is $i\theta$? It is

$$\lim\limits_{\varepsilon \to 0} \frac{{(1 + i\varepsilon \theta ) - 1}}{\varepsilon }$$
I.e. these are the elements that span the tangent line to the group at 1. In general, one may have more dimensions to this group, i.e. more parameters to put in the smooth parameterisation -- in this case we have:

$$ g(\theta ) = \lim\limits_{\varepsilon \to 0} {\left( {1 + \varepsilon ({t_1}{\theta _1} + \ldots {t_n}{\theta _n})} \right)^{1/\varepsilon }} = \exp \vec \theta $$
Where $\vec\theta \in V$, which is a vector space with basis $\langle t_1 \dots t_n \rangle$ -- the tangent space to the group at the identity. This vector space is called the Lie algebra of the Lie group.

Take a moment to appreciate the significance of this -- smoothness tells us (sorta) that a function or structure can be determined by the values of all its derivatives at a point. But when you add the group structure -- when you require an exponential structure for the parameterisation, i.e. (1) $g(\theta_1+\theta_2) = g(\theta_1)g(\theta_2)$; (2) $g(r\theta)=g(\theta)^r$; (3) $g(0)=1$ -- just the first derivative, the tangent plane, determines the entire parameterisation. This is precisely analogous to how given that a given smooth function has an exponential structure $e^{tx}$, it can be determined from its first derivative alone. The structure of a Lie group is "fundamentally exponential".

Here's another way to see how the additivity-multiplicativity condition allows the first derivative to determine the entire parameterisation. The Taylor series of the parameterisation is given by:

$$g(\theta)=\sum\limits_{k=0}^\infty \frac{g^{(k)}(0)}{k!}\theta^k$$
Meanwhile the exponential map is:

$$\exp \left(\theta g'(0)\right) =\sum\limits_{k=0}^\infty \frac{\left(\theta g'(0)\right)^k}{k!}$$
So a sufficient condition for the two to be equal is:

$$g^{(k)}(0)=g'(0)^k$$
This is something that is true for exponential functions, of course, but what's the condition for it to be true in general? Writing both sides in limit form (as $h\to0$, with $n$ for the order of the derivative and $k$ as the summation index) and using the Binomial theorem on the right,

$$\frac{1}{h^n}\sum\limits_{k=0}^n \binom{n}{k}(-1)^k g\left((n-k)h\right) = \frac{1}{h^n}\sum\limits_{k=0}^n \binom{n}{k}(-1)^k g(h)^{n-k}g(0)^k$$
Which is true since $g((n-k)h) = g(h)^{n-k}$ and $g(0)=1$.

(something to note: the "official" word for real-power cyclic is "one-parameter group" or "one-dimensional Lie group". Higher dimensional groups have more generators, i.e. more dimensions)

Show, from the $(1+X/r)^r$ definition of the exponential map, that it can be given by the standard Taylor expansion:

$$\exp X = 1 + X + \frac{X^2}{2!} + \ldots $$
You can't really assume the Binomial theorem (as it is only true on commutative rings, and the ring of $n\times n$ matrices -- which is the ring that we embed Lie groups and their Lie algebras in -- isn't commutative), but perhaps a weaker result holds? What kind of elements still commute on general rings?
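
Before proving it, you might like a numerical sanity check that the two definitions agree for matrices. The sketch below is only an illustration, not the proof: it assumes the generator of rotations in the plane as the test matrix (whose exponential is the usual rotation matrix), and the helper names are made up.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matpow(A, r):
    """A^r for a non-negative integer r, by repeated squaring."""
    result = identity(len(A))
    while r > 0:
        if r & 1:
            result = matmul(result, A)
        A = matmul(A, A)
        r >>= 1
    return result

def exp_limit(X, r):
    """(1 + X/r)^r, the compound-interest form of the exponential."""
    return matpow(add(identity(len(X)), scale(1.0 / r, X)), r)

def exp_taylor(X, terms=25):
    """1 + X + X^2/2! + ..., truncated after `terms` terms."""
    result, term = identity(len(X)), identity(len(X))
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, X))
        result = add(result, term)
    return result

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

theta = 1.0
X = [[0.0, -theta], [theta, 0.0]]              # generator of plane rotations
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
print(max_diff(exp_limit(X, 2**20), rotation))  # small, shrinking as r grows
print(max_diff(exp_taylor(X), rotation))        # at the level of rounding error
```

Both forms land on the same rotation matrix, with the limit form converging much more slowly than the series.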