Showing posts with label covectors. Show all posts

Curvature is just the Hessian

If you recall some basic calculus, the gradient of a scalar function $f(x_1,\dots x_n)$ is just the generalization of the derivative: 

$$f'(x_1,\dots x_n) =\left[\begin{array}{}\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{array} \right] $$

And the Hessian of a scalar function $f(x_1,\dots x_n)$ is just the generalization of the second derivative:

$$f''(x_1,\dots x_n) =\left[\begin{array}{}\frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{array} \right] $$

Why is this interesting? Consider a quadratic $f$. Just as in one dimension, where a quadratic can be written purely in terms of its value, derivative and second derivative at 0, in several dimensions $f$ can be written purely in terms of its value, gradient and Hessian at 0.

$$  \begin{align} f(x,y) &= c + (c_1x+c_2y) + (c_{11}x^2 + c_{12}xy + c_{22}y^2) \\ &= f(0) + \left(f_x(0)x+f_y(0)y\right) + \frac12 \left(f_{xx}(0)x^2+2f_{xy}(0)xy+ f_{yy}(0)y^2\right) \\ f(\mathbf{x}) &= f(0) + f'(0)\cdot \mathbf{x} + \frac12\, \mathbf{x}\cdot f''(0) \mathbf{x} \end{align}$$
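If you'd like to see the expansion at work, here is a small sketch in plain Python. The particular quadratic and its coefficients are made up for illustration, and the gradient and Hessian at 0 are worked out by hand rather than computed. Note the factor of $\frac12$ in front of the Hessian term, as in the second line above.

```python
# Hypothetical quadratic, chosen only for illustration.
def f(x, y):
    return 3.0 + 2.0 * x - 1.0 * y + 4.0 * x * x + 5.0 * x * y - 2.0 * y * y

# Value, gradient and Hessian at the origin, worked out by hand for this f.
f0 = 3.0
grad0 = (2.0, -1.0)                 # (f_x, f_y) at 0
hess0 = ((8.0, 5.0), (5.0, -4.0))   # ((f_xx, f_xy), (f_yx, f_yy)) at 0

def taylor(x, y):
    """f(0) + f'(0).x + (1/2) x.f''(0)x"""
    v = (x, y)
    linear = sum(grad0[i] * v[i] for i in range(2))
    quadratic = sum(v[i] * hess0[i][j] * v[j] for i in range(2) for j in range(2))
    return f0 + linear + 0.5 * quadratic

# The expansion reproduces f everywhere, not just near 0, because f is quadratic.
for point in [(1.0, 0.0), (0.5, -2.0), (-3.0, 1.5)]:
    assert abs(f(*point) - taylor(*point)) < 1e-12
```

Dropping the `0.5` makes the assertions fail, which is a quick way to convince yourself the factor belongs there.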

What this tells us is:

  • The gradient is naturally thought of as a linear form.
  • The Hessian is naturally thought of as a quadratic form.

A what and a what?

There are two ways of thinking of a thing like $\left[\begin{array}{}a \\ b \end{array} \right]$ -- a vector $a\mathbf{e}_1+b\mathbf{e}_2$, or a linear expression $ax_1+bx_2$, a function on $x_1,x_2$. The former is an object in the space $\mathbb{R}^n$, while the latter is a function $\mathbb{R}^n\to\mathbb{R}$ (do you see why?).  

Similarly, there are two ways of thinking of a matrix $\left[\begin{array}{}a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right]$ -- a linear transformation $\mathbb{R}^n\to\mathbb{R}^n$, or a quadratic expression $a_{11}x_1^2+(a_{12}+a_{21})x_1x_2+a_{22}x_2^2$, which is a function of $x_1,x_2$. More generally, if you feed it two different vectors, it is a function $\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ (do you see why?).
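The two readings of a matrix can be put side by side in code. This is only a sketch: the matrix entries and the function names are hypothetical, and the matrix is deliberately not symmetric so that the $(a_{12}+a_{21})$ term is visible.

```python
# Hypothetical 2x2 matrix; deliberately not symmetric.
A = ((1.0, 4.0), (-2.0, 3.0))

def as_linear_map(v):
    """The matrix as a transformation R^2 -> R^2."""
    return tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))

def as_bilinear_form(u, v):
    """The matrix as a function of two vectors, R^2 x R^2 -> R."""
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def as_quadratic_form(v):
    """Feed the same vector into both slots."""
    return as_bilinear_form(v, v)

x1, x2 = 2.0, -1.0
expanded = A[0][0] * x1**2 + (A[0][1] + A[1][0]) * x1 * x2 + A[1][1] * x2**2
assert abs(as_quadratic_form((x1, x2)) - expanded) < 1e-12
```

Same array of numbers, two different kinds of object: one returns a vector, the other returns a scalar.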

This is what duality is in linear algebra. In tensor notation, vectors are $v^i$ while linear forms are $v_i$; linear transformations are $A_i^j$, while quadratic forms are $A_{ij}$. Don't bother with this if you don't want to.

So e.g. the gradient should naturally be thought of as a function that, given some vector as input, gives you the directional derivative in the direction of that vector.

(Make sure you understand this very clearly.)
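One way to make sure: check it numerically. The sketch below uses a made-up function with a hand-computed gradient, and compares "gradient applied to $v$" against a finite-difference derivative of $f$ along $v$. The function names and the step size `h` are my own choices.

```python
import math

def f(x, y):
    # Hypothetical smooth function, used only as a test case.
    return math.sin(x) * y + x * x

def grad_f(x, y):
    # Partial derivatives of the f above, computed by hand.
    return (math.cos(x) * y + 2.0 * x, math.sin(x))

def gradient_as_linear_form(point, v):
    """Apply the gradient at `point` to the vector v."""
    g = grad_f(*point)
    return g[0] * v[0] + g[1] * v[1]

def directional_derivative(point, v, h=1e-6):
    """Central finite difference of f along v."""
    x, y = point
    forward = f(x + h * v[0], y + h * v[1])
    backward = f(x - h * v[0], y - h * v[1])
    return (forward - backward) / (2.0 * h)

p = (0.3, -1.2)
for v in [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0)]:
    assert abs(gradient_as_linear_form(p, v) - directional_derivative(p, v)) < 1e-6
```

Note that $v$ is not normalised here: the linear form scales with its input, so doubling $v$ doubles the output.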

Similarly, the Hessian should be thought of as a function that, given two vectors as input, gives the second derivative along them: differentiate once in the direction of each. Feeding it $\mathbf{e}_x$ and $\mathbf{e}_y$ gives $f_{xy}$.

(Make sure you understand this VERY clearly.)
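Again, a numerical check may help. The sketch below takes a made-up function with a hand-computed Hessian and compares "Hessian applied to $u$ and $v$" against a finite-difference mixed derivative of $f$, once along $u$ and once along $v$. The names and the step size are assumptions of mine.

```python
import math

def f(x, y):
    # Hypothetical smooth function, used only as a test case.
    return math.sin(x) * y + x * x

def hess_f(x, y):
    # Second partial derivatives of the f above, computed by hand.
    return ((-math.sin(x) * y + 2.0, math.cos(x)),
            (math.cos(x), 0.0))

def hessian_as_bilinear_form(point, u, v):
    """Apply the Hessian at `point` to the pair of vectors (u, v)."""
    H = hess_f(*point)
    return sum(u[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

def mixed_second_derivative(point, u, v, h=1e-4):
    """Finite-difference derivative of f, once along u and once along v."""
    x, y = point
    def shifted(s, t):
        return f(x + s * u[0] + t * v[0], y + s * u[1] + t * v[1])
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)

p = (0.3, -1.2)
for u, v in [((1.0, 0.0), (0.0, 1.0)),   # this pair gives f_xy
             ((1.0, 0.0), (1.0, 0.0)),   # this pair gives f_xx
             ((1.0, 2.0), (2.0, -1.0))]:
    assert abs(hessian_as_bilinear_form(p, u, v) - mixed_second_derivative(p, u, v)) < 1e-5
```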

Now suppose we wanted to talk about the curvature of a surface.

We know that the curvature of a unit-speed curve $\phi(t)$ at the point $t=0$ is $\phi''(0)$. Naturally, we'd like the "curvature of a surface" to be some function that gives you the curvature in each direction -- that gives you the second derivative in each direction. So naturally, you'd want something like the Hessian.

I'm not sure if the cross-derivative $f_{xy}$, i.e. $A(X, Y)$, has any natural geometric interpretation. Does this have anything to do with torsion? Does $A(X, Y)$ ever come of use?

So we'd like to define some quadratic form $A$ such that $\phi'(0)\cdot A \phi'(0)$ is the curvature $\phi''(0)$. Actually, it should just be the normal curvature, the component of $\phi''(0)$ normal to the surface, the sort of curvature that can be attributed entirely to the surface, rather than to the curve wiggling around on the surface.
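Here is a sketch of the simplest case, under a strong assumption: the surface is a graph $z=\frac12\,\mathbf{v}\cdot A\mathbf{v}$ whose tangent plane at the origin is horizontal, so the unit normal there is $(0,0,1)$ and $A$ is literally the Hessian of the height function. The matrix entries are made up. At a general point of a general surface there are extra normalisation factors, which this example avoids on purpose.

```python
import math

# Hypothetical symmetric matrix: the Hessian of the height function at the origin.
A = ((2.0, 0.5), (0.5, -1.0))

def height(x, y):
    """Surface z = (1/2) v.A v, with a horizontal tangent plane at the origin."""
    return 0.5 * (A[0][0] * x * x + 2.0 * A[0][1] * x * y + A[1][1] * y * y)

def curve(t, theta):
    """Curve on the surface through the origin, leaving in direction theta."""
    x, y = t * math.cos(theta), t * math.sin(theta)
    return (x, y, height(x, y))

def second_derivative_at_zero(theta, h=1e-3):
    """Finite-difference phi''(0), component by component."""
    p_plus, p_zero, p_minus = curve(h, theta), curve(0.0, theta), curve(-h, theta)
    return tuple((p_plus[k] - 2.0 * p_zero[k] + p_minus[k]) / (h * h) for k in range(3))

def quadratic_form(theta):
    """phi'(0) . A phi'(0), where phi'(0) = (cos theta, sin theta)."""
    v = (math.cos(theta), math.sin(theta))
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

for theta in [0.0, 0.7, math.pi / 2, 2.5]:
    acc = second_derivative_at_zero(theta)
    # phi''(0) is purely normal here, and its normal component is the quadratic form.
    assert abs(acc[0]) < 1e-9 and abs(acc[1]) < 1e-9
    assert abs(acc[2] - quadratic_form(theta)) < 1e-9
```

These particular curves have no tangential acceleration at $t=0$, so all of $\phi''(0)$ is attributable to the surface.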

[For whomsoever it may concern, Theorem 10.4 in your notes is what computes this quadratic form $A$ as the differential of the Gauss map, and is what motivates the Gauss map in the first place. This is why you should start with the last chapter and read backwards.]

Covectors, conjugates, and the metric tensor

The fact -- as is often introduced in an introductory general relativity or tensor calculus course -- that the gradient is a covector seems rather bizarre to someone who's always seen the gradient as the "steepest ascent vector". Surely, the direction of steepest ascent is, you know, a direction -- an arrow. And what even is a covector, anyway?

Let's think about differentiating with respect to vectors. The idea we have is that $\frac{\partial f}{\partial \vec x}$ needs to contain all the information -- each of the $\frac{\partial f}{\partial x_i}$. And analogously for derivatives with respect to tensors. You might think we could just create an array with the same dimensions containing each derivative -- much like the gradient, Hessian, etc. that we're used to -- i.e.

$$\nabla f=\left[ {{\partial ^i}f} \right]$$
$$\nabla^2f = \left[ {{\partial ^i}{\partial ^j}f} \right]$$
and so on. (I'm using $\nabla^2$ for the Hessian, and will do so in the rest of the article. The symbol is all too widely used for the Hessian's trace, the Laplacian, which should really be written $|\nabla|^2$.) But you might get the sense that this feels just fundamentally wrong -- like you're giving the "division by tensor" object the structure of the same tensor, but you should somehow be giving it an "inverse" structure.

We want to construct a situation showing that the idea above -- making the gradient ("derivative with respect to a vector") a vector and the Hessian ("derivative with respect to a rank-2 tensor") a rank-2 tensor -- doesn't work. Such a situation can arise when we multiply the gradient by a vector, or the Hessian by a rank-2 tensor. For instance, for linear $f$:

$$f(\vec{x})-f(0)=\vec{x}\cdot\nabla f$$
But this is wrong -- for any non-Euclidean manifold. For instance, if the metric tensor is something like $\rm{diag}(-1,1)$, this dot product gives:

$$f(\vec{x})-f(0)= - x\frac{{\partial f}}{{\partial x}} + y\frac{{\partial f}}{{\partial y}}$$
Which is just wrong. So instead, the gradient is a covector, which we represent in Einstein notation using subscripts instead of superscripts:

$$f(\vec x) - f(0) = {x^i}{\partial _i}f$$
(As you can see, I omitted Einstein notation when I was writing the wrong equations -- seeing repeated indices on the same vertical alignment is physically painful.) If we want the vector gradient -- for the direction of steepest ascent or whatever -- we need to multiply by the inverse metric tensor.
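The three computations can be compared directly for the metric $\mathrm{diag}(-1,1)$ used above. This is a sketch with a made-up linear $f$ and made-up components. The helper names are mine.

```python
# Linear function f(x, y) = a*x + b*y with hypothetical coefficients.
a, b = 2.0, 5.0

def f(x, y):
    return a * x + b * y

g = ((-1.0, 0.0), (0.0, 1.0))      # the metric from the text, diag(-1, 1)
g_inv = ((-1.0, 0.0), (0.0, 1.0))  # its inverse (this metric is its own inverse)

x_vec = (3.0, 4.0)   # vector components x^i
d_f = (a, b)         # covector components, the partial derivatives of f

def contract(covector, vector):
    """Pair a covector with a vector. No metric involved."""
    return sum(covector[i] * vector[i] for i in range(2))

def metric_dot(u, v):
    """Dot product of two vectors, g_ij u^i v^j."""
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

def raise_index(covector):
    """Turn covector components into vector components using the inverse metric."""
    return tuple(sum(g_inv[i][j] * covector[j] for j in range(2)) for i in range(2))

exact = f(*x_vec) - f(0.0, 0.0)

# Right: contract the covector of partial derivatives with x.
assert contract(d_f, x_vec) == exact

# Wrong: pretend the partial derivatives are vector components and take the dot product.
assert metric_dot(x_vec, d_f) != exact

# Right again: raise the index first, then take the dot product.
grad_vector = raise_index(d_f)
assert metric_dot(x_vec, grad_vector) == exact
```

With these numbers the exact increment is 26, while the naive dot product gives 14: precisely the stray minus sign on the $x$ term.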

This also motivates the picture of seeing covectors as parallel surfaces whose normals are their vector versions -- in Euclidean geometry, it doesn't make a difference, but in a general setting, this normality is a bit weird. Think about this.


But I haven't really given a motivation for the metric tensor or how it comes up here -- for this, read on.



Let's talk about something completely different -- let's think about the derivative of functions from $\mathbb{C}\to\mathbb{R}$, $df/dz$. I don't know about you, but I like the complex numbers, and prefer them to $\mathbb{R}^2$, because pretty much anything I write with the complex numbers is well-defined, and easily so -- so I don't need to worry about whether $df/d \vec{x}$ makes any sense or not. Well, we can write:

$$\begin{align} \frac{df}{dz}&=\frac{\partial f}{\partial x}\frac{\partial x}{\partial z}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial z}\\ \Rightarrow \frac{df}{dz}&=\frac{\partial f}{\partial x}-\frac{\partial f}{\partial y}i \end{align}$$
This $df/dz$ above is exactly the analog of the gradient for real-valued functions defined on the complex plane -- analogous to scalar multivariable functions.
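One way to sanity-check the sign (my own observation, not part of the argument above): for a linear $f(x,y)=ax+by$, the increment $f(z)-f(0)$ is recovered as the real part of $\frac{df}{dz}\,z$, and this only works with the minus sign. The coefficients below are made up.

```python
a, b = 2.0, 5.0   # hypothetical coefficients of a linear f

def f(z):
    """Real-valued linear function of a complex argument z = x + iy."""
    return a * z.real + b * z.imag

df_dz = complex(a, -b)          # f_x - i f_y, as derived above
wrong_sign = complex(a, b)      # f_x + i f_y, for comparison

for z in [1 + 0j, 1j, 3 - 4j]:
    assert abs((df_dz * z).real - (f(z) - f(0j))) < 1e-12

# With the other sign, the imaginary direction comes out negated.
assert abs((wrong_sign * 1j).real - (f(1j) - f(0j))) > 1.0
```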

What's the expression for the complex derivative of a complex function? Compute it -- it may look a bit different from the analogous tensor derivative -- think of traces and commutators.

Note: In actual complex calculus, complex differentiability is defined more restrictively: a function must satisfy the Cauchy-Riemann equations, which makes complex functions fundamentally more special than multivariable functions. There, something like $dx/dz$ is not even defined, and what we've written above isn't really relevant. It is, however, the "Wirtinger derivative", up to a conventional factor of $\frac12$.

Something interesting happened here, though -- we got a negative sign on the imaginary component of the derivative. The derivative got conjugated, or something -- and the reason this occurred is that $i^2=-1$ (so $1/i=-i$), and this leaves some sort of signature in our derivative.

Now let's (non-rigorous alert!) think about how an analogous argument may be written for vectors.

$$\frac{{df}}{{d\vec x}} = \frac{{\partial f}}{{\partial x}}\frac{{\partial x}}{{\partial \vec x}} + \frac{{\partial f}}{{\partial y}}\frac{{\partial y}}{{\partial \vec x}}$$
What really is $\frac{{\partial x}}{{\partial \vec x}}$, though? We know that $\frac{\partial \vec x}{\partial x}=\vec{e_x}$. But what's the "inverse" of a vector? What does that even mean?

So we want to define some sort of product, or multiplication, with vectors -- a thing that, when multiplied by a vector, gives a scalar. It sounds like we're talking about a dot product, but the dot product lacks a property we need for division: it's not injective. I.e. $\vec{a}\cdot\vec{b}=c$ for fixed $\vec{a}$ and $c$ defines a whole plane of vectors $\vec{b}$, not a unique one. But if we add an additional component to our product, the cross product (or in more than three dimensions, the wedge product), then the "dot product and cross product combined" is injective.

This combination, of course, is the tensor product. Specifically, when we're talking about something like $1/\vec{e_x}$, we want a thing whose tensor product with $\vec{e_x}$ has trace (dot product) 1 and commutator (wedge/cross product) 0, i.e. $\mathrm{tr}(\vec{e_x}'\vec{e_x})=1$ and $(\vec{e_x}'\vec{e_x})-(\vec{e_x}'\vec{e_x})^T=0$.

If all you've ever done in your life is Euclidean geometry, you'd probably think the answer to this question is $\vec{e_x}$ itself -- indeed, its dot product with $\vec{e_x}$ is 1 and its cross product with $\vec{e_x}$ is 0. But if you've ever done relativity and dealt with -- forget curved manifolds! -- the Minkowski manifold, you know that this is not necessarily true -- it depends on the metric tensor.

Could we define a vector in a general co-ordinate system that is the inverse of $\vec{e_x}$? Yes, we can. But let's not do that (yet*) -- it just seems like there should be something more natural, or elegant, like we had with complex numbers.

So we define a space of "covectors" as "scalars divided by vectors" (informally speaking), and call their basis $\tilde{e^i}$, chosen to have the required dot and cross products with the $\vec{e_i}$. In Euclidean space -- and only in Euclidean space -- these look exactly the same as vectors, and have exactly the same components. I like to call the conjugation here "metric conjugation", and the gradient is naturally a covector.

*As for the question of writing the gradient as a vector instead, this follows naturally using the metric tensor -- as an exercise, show, by considering the required vector corresponding to the covector $\tilde{e^x}$ (i.e. that has the right dot and cross products with $\vec{e_x}$) that the vector gradient can be given as the product of the inverse metric tensor and the covector gradient:

$${\partial ^\mu }f = {g^{\mu \nu }}{\partial _\nu }f$$
(Do this exercise! It is the motivation for the metric tensor, and why it determines your co-ordinate system!)
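If you'd like to check your answer numerically, here is a sketch in the ordinary Euclidean plane, but described in a skewed (non-orthonormal) basis, so that the metric $g_{ij}=\vec{e_i}\cdot\vec{e_j}$ is not the identity. The basis, the function and all the names are hypothetical choices of mine.

```python
# Hypothetical skewed basis for the plane, written in Cartesian components.
e = ((1.0, 0.0), (1.0, 1.0))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Metric components g_ij = e_i . e_j, and the inverse metric.
g = [[dot(e[i], e[j]) for j in range(2)] for i in range(2)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

g_inv = inverse_2x2(g)

# f = 3X - 2Y in Cartesian coordinates, so the steepest-ascent arrow is (3, -2).
cartesian_gradient = (3.0, -2.0)

# Covector components in the skewed basis: the derivative of f along each e_i.
d_f = [dot(cartesian_gradient, e[i]) for i in range(2)]

# Raise the index with the inverse metric to get vector components ...
raised = [sum(g_inv[i][j] * d_f[j] for j in range(2)) for i in range(2)]

# ... and rebuild the arrow from those components and the basis vectors.
arrow = tuple(sum(raised[i] * e[i][k] for i in range(2)) for k in range(2))
assert abs(arrow[0] - 3.0) < 1e-12 and abs(arrow[1] + 2.0) < 1e-12

# Using the covector components directly as vector components gives the wrong arrow.
naive = tuple(sum(d_f[i] * e[i][k] for i in range(2)) for k in range(2))
assert naive != arrow
```

In an orthonormal basis `g` would be the identity and the two arrows would coincide, which is why none of this shows up in Euclidean coordinates.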



I've been talking about the covector $\tilde{e^x}$ as being equal to the quotient "$1/\vec{e_x}$" but as I mentioned, this isn't really accurate -- the "1" in the quotient is a (1,1) tensor with trace 1 and commutator 0. Think about this tensor. Can you find this tensor in Clifford algebra? Maybe not. Can you find it as a linear transformation? Yes? Find it. And can you think of the covector alternatively as a quotient of a bivector and a trivector? Will you get $(e_y\wedge e_z)/(e_x\wedge e_y\wedge e_z)$?