The Winding Number: tensor algebra

The fact -- as is often introduced in an introductory general relativity or tensor calculus course -- that the gradient is a covector seems rather bizarre to someone who's always seen the gradient as the "steepest ascent vector". Surely, the direction of steepest ascent is, you know, a direction -- an arrow. And what even is a covector, anyway?

Let's think about differentiating with respect to vectors. The idea we have is that $\frac{\partial f}{\partial \vec x}$ needs to contain all the information -- each of the $\frac{\partial f}{\partial x_i}$. And analogously for derivatives with respect to tensors. You might think we could just create an array with the same dimensions containing each derivative -- much like the gradient, Hessian, etc. that we're used to -- i.e.

$$\nabla f=\left[ {{\partial ^i}f} \right]$$
$$\nabla^2f = \left[ {{\partial ^i}{\partial ^j}f} \right]$$
(I'm using $\nabla^2$ for the Hessian -- and will do so in the rest of the article -- but it's too widely used for its trace the Laplacian, which should be represented as $|\nabla|^2$) etc. But you might get the sense that this feels just fundamentally wrong -- like you're giving the "division by tensor" object the structure of the same tensor, but you should somehow be giving it an "inverse" structure.

We want to construct a situation to see that the idea above -- of making the gradient ("derivative with respect to a vector") and Hessian ("derivative with respect to a rank-2 tensor") -- a vector and a rank-tensor doesn't work. We know such a situation can arrive when we have multiplication between the gradient and a vector, or the Hessian and a rank-2 tensor. For instance, for linear $f$:

$$f(\vec{x})-f(0)=\vec{x}\cdot\nabla f$$
But this is wrong -- for any non-Euclidean manifold. For instance, if the metric tensor is something like $\rm{diag}(-1,1)$, this dot product gives:

$$f(\vec{x})-f(0)= - x\frac{{\partial f}}{{\partial x}} + y\frac{{\partial f}}{{\partial y}}$$
Which is just wrong. So instead, the gradient is a covector, which we represent in Einstein notation using subscripts instead of superscripts:

$$f(\vec x) - f(0) = {x^i}{\partial _i}f$$
(As you can see, I omitted Einstein notation when I was writing the wrong equations -- seeing repeated indices on the same vertical alignment is physically painful.) If we want the vector gradient -- for direction of steepest ascent or whatever -- you need to multiply by the metric tensor.

This also motivates the picture of seeing covectors as parallel surfaces whose normals are their vector versions -- in Euclidean geometry, it doesn't make a difference, but on a general setting, this normality is a bit weird. Think about this.

But I haven't really given a motivation for the metric tensor or how it comes up here -- for this, read on.

Let's talk about something completely different -- let's think about the derivative of functions from $\mathbb{C}\to\mathbb{R}$, $df/dz$. I don't know about you, but I like the complex numbers, and prefer them to $\mathbb{R}^2$, because pretty much anything I write with the complex numbers is well-defined, and easily so -- so I don't need to worry about whether $df/d \vec{x}$ makes any sense or not. Well, we can write:

$$\frac{df}{dz}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial z}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial z}\\\Rightarrow \frac{df}{dz}=\frac{\partial f}{\partial x}-\frac{\partial f}{\partial y}i$$

This $df/dz$ above is exactly the analog of the gradient for real-valued functions defined on the complex plane -- analogous to scalar multivariable functions.

What's the expression for the complex derivative of a complex function? Compute it -- it may look a bit different from the analogous tensor derivative -- think of traces and commutators.

Note: In actual complex calculus, complex differentiability is defined in a more restrictive way -- specifically one needs to satisfy the Cauchy-Riemann equations, which makes the structure of complex functions fundamentally more special than that of multivariable functions, stuff like $dx/dz$ is even undefined, and the stuff we've written above isn't really relevant in complex analysis. It is, however, the "Wirtinger derivative".

Something interesting happened here, though -- we got a negative sign on the imaginary component of the derivative. The derivative got conjugated, or something -- and the reason this occurred is that $i^2=-1$ (so $1/i=-i$), and this leaves some sort of signature in our derivative.

Now let's (non-rigorous alert!) think about how an analogous argument may be written for vectors.

$$\frac{{df}}{{d\vec x}} = \frac{{\partial f}}{{\partial x}}\frac{{\partial x}}{{\partial \vec x}} + \frac{{\partial f}}{{\partial y}}\frac{{\partial y}}{{\partial \vec x}}$$
What really is $\frac{{\partial x}}{{\partial \vec x}}$, though? We know that $\frac{\partial \vec x}{\partial x}=\vec{e_x}$. But what's the "inverse" of a vector? What does that even mean?

So we want to define some sort of a product, or multiplication, with vectors -- we want to define a thing that when multiplied by a vector gives a scalar. It sounds like we're talking about a dot product -- but the dot product lacks an important property we need to have division, it's not injective. I.e. $\vec{a}\cdot\vec{b}=c$ for fixed $\vec{a}$ and $c$ defines a whole plane of vectors $\vec{b}$, not a unique one. But if we added an additional component to our product, the cross product (or in more than three dimensions, the wedge product), then the "dot product and cross product combined" is injective.

This combination, of course, is the tensor product. Specifically, when we're talking about something like $1/\vec{e_x}$, we want a thing whose tensor product with $\vec{e_x}$ has trace (dot product) 1 and commutator (wedge/cross product) 0, i.e. $\mathrm{tr}(\vec{e_x}'\vec{e_x})=1$ and $(\vec{e_x}'\vec{e_x})-(\vec{e_x}'\vec{e_x})^T=0$.

If all you've ever done in your life is Euclidean geometry, you'd probably think the answer to this question is $\vec{e_x}$ itself -- indeed, its dot product with $\vec{e_x}$ is 1 and its cross product with $\vec{e_x}$ is 0. But if you've ever done relativity and dealt with -- forget curved manifolds! -- the Minkowski manifold, you know that this is not necessarily true -- it depends on the metric tensor.

Could we define a vector in a general co-ordinate system that is the inverse of $\vec{e_x}$? Yes, we can. But let's not do that (yet*) -- it just seems like there should be something more natural, or elegant, like we had with complex numbers.

So we define a space of "covectors", as "scalars divided by vectors" (informally speaking), call their basis $\tilde{e^i}$ which have the required dot and cross products. In Euclidean space -- and only in Euclidean space, these look exactly the same as vectors, and have exactly the same components. I like to call the conjugation here "metric conjugation", and the gradient is naturally a covector.

*As for the question of writing the gradient as a vector instead, this follows naturally using the metric tensor -- as an exercise, show, by considering the required vector corresponding to the covector $\tilde{e^x}$ (i.e. that has the right dot and cross products with $\vec{e_x}$) that the vector gradient can be given as the product of the inverse metric tensor and the covector gradient:

$${\partial ^\mu }f = {g^{\mu \nu }}{\partial _\nu }f$$
(Do this exercise! It is the motivation for the metric tensor, and why it determines your co-ordinate system!)

I've been talking about the covector $\tilde{e^x}$ as being equal to the quotient "$1/\vec{e_x}$" but as I mentioned, this isn't really accurate -- the "1" in the quotient is a (1,1) tensor with trace 1 and commutator 0. Think about this tensor. Can you find this tensor in Clifford algebra? Maybe not. Can you find it as a linear transformation? Yes? Find it. And can you think of the covector alternatively as a quotient of a bivector and a trivector? Will you get $(e_y\wedge e_z)/(e_x\wedge e_y\wedge e_z)$?

When you learned linear algebra, you learned that under a passive transformation $B$, a scalar $c$ remained $c$, vector $v$ transformed as $B^{-1}v$ and a matrix transformed as $B^{-1}AB$. That was all nice and simple, until you learned about quadratic forms. Matrices can be used to represent quadratic forms too, in the form

$$v^TAv=c$$
Now under the passive transformation $B$, $c\to c$, $v\to B^{-1}v$ and $v^T\to v^T\left(B^T\right)^{-1}$. Let $A'$ be the matrix such that $A\to A'$. Then

$${v^T}{\left( {{B^T}} \right)^{ - 1}}A'{B^{ - 1}}v = c = {v^T}Av$$
As this must be true for all vectors $v$, this means ${\left( {{B^T}} \right)^{ - 1}}A'{B^{ - 1}} = A$. Hence

$$A' = {B^T}AB$$
This is rather bizarre. Why would the same object -- a matrix -- transform differently based on how it's used?

The answer is that these are really two different objects that just correspond to the same matrix in a particular co-ordinate representation. The first, the object that transformed as $B^{-1}AB$, is an object that maps vectors to other vectors. The second is an object that maps two vectors to a scalar.

In an introductory linear algebra course, this problem is avoided, because quadratic forms $A$ are required to be symmetric (so the matrix $B$ is orthogonal, and $B^{-1}=B^T$. But this is not true of tensors in general (here's an idea I haven't thought much on -- could you define quadratic forms on a "non-commutative field" (i.e. division ring)? -- then you would have no choice but to entertain asymmetric quadratic forms).

These objects we're talking about are tensors. A tensor representing a quadratic form is not the same as a tensor representing a standard vector transformation, because they only have the same representation (i.e. the same matrix) in a specific co-ordinate basis. Change your basis, and voila! The representation has transformed away, into something entirely different.

There's a convenient notation used to distinguish between these kinds of tensors, called index notation. Representing vectors as ${v^i}$ for index $i$ that runs between 1 and the dimension of the vector space, we write

$$A_i^j{v^i} = {w^j}$$
For the vanilla linear transformation tensor -- $j$ can take on new indices, if we're dealing with non-square matrices, but this is not why we use a different index -- we use a different index because $i$ and $j$ can independently take on distinct value. Meanwhile,

$$\sum\limits_{i,j}^{} {{A_{ij}}{v^i}{v^j}} = c$$
A few observations to define our notation:

Note how we really just treat $v^i$, etc. as the $i$th component of the vector $v$, as the notation suggests. This is very useful, because it means we don't need to remember the meanings of fancy new products, etc. -- just write stuff down in terms of components. This is also why order no longer matters in this notation -- the fancy rules regarding matrix multiplication are now irrelevant, our multiplication is all scalar, and the rules are embedded into the way we calculate these products.
An index, if repeated once on top and once at the bottom anywhere throughout the expression, ends up cancelling out. This is the point of choosing stuff to go on top and stuff to go below. E.g.
If you remove the summation signs, things look a lot more like the expressions with vectors directly (i.e. not component-wise).

(1) cannot be emphasised enough -- when we do this product, ${v^i}{w_j} = A_j^i$, what we're really doing is multiplying two vectors to get a rank-2 tensor. When we multiply $v_iw^i=c$, we're multiplying a covector by a vector, and get a rank-0 tensor (a scalar). The row vector/column vector notation and multiplication rules are just notation that helps us yield the same result -- we represent the first as a column vector multiplied by a row vector, and the second as a row vector multiplied by a column vector. Note that this does not really correspond to the positioning of the indices -- $v_iw^j$ also gives you a rank 2 tensor, since you can swap around the order of multiplication in tensor notation -- this is because here we're really operating with the scalar components of $v$, $w$ and $A$, and scalar multiplication commutes.

If we were to use standard matrix form and notation to denote $A_j^i$, would $j$ denote which column you're in or which row you're in?

A demonstration for (3) is the dot product between vectors $v^i$ and $w^i$, $\sum\limits_i {{v_i}{w^i}} $ where writing $i$ is a subscript represents a covector (typically represented as a row vector). This certainly looks a lot nicer just written as ${{a_i}{b^i}}$ -- like you're just multiplying the vectors together.

This -- omitting the summation sign when you have repeated indices -- half of them on top and the other half at the bottom -- is called the Einstein summation convention.

An important terminology to mention here -- you can see that the summation convention introduces two different kind of indices, unsummed and summed -- the first is called a "free index", because you can vary the index within some range (typically 1 to the dimension of the space, which it will mean throughout this article set unless stated otherwise, but sometimes the equation might hold only for a small range of the index), and the second is called a dummy index (because it gets summed over anyway and holds no relevance to the result).

Question 1

Represent the following in tensor index notation, with or without the summation convention.

$\vec a + \vec b$
$\vec v \cdot \vec w$
$|v{|^2}$
$AB=C$
$\vec{v}=v^1\hat\imath+v^2\hat\jmath+v^3\hat{k}$
$B = {A^T}$
$\mathrm{tr}A$
The pointwise product of two vectors, e.g. $\left[ {\begin{array}{*{20}{c}}a\\b\end{array}} \right]\wp \left[ {\begin{array}{*{20}{c}}c\\d\end{array}} \right] = \left[ \begin{array}{l}ac\\bd\end{array} \right]$
${v^T}Qv = q$

Feel free to define your own tensors if necessary to solve any of these problems.

Question 2

The fact that the components of a vector and its corresponding covector are identical, i.e. that ${v_i} = {v^i}$, has been a feature of Euclidean geometry, which is the geometry we've studied so far. The point of defining things in this way is that the value of ${w_i}{v^i}$, the Euclidean dot product, is then invariant under rotations, which are a very important kind of linear transformation.

However in relativity, Lorentz transformations, which are combinations of skews between the t and spatial axes and rotations of the spatial axes, are the important kinds of transformations. This is OK, because rotations are really just complex skews. The invariant under this Lorentz transformation is also called a dot product, but defined slightly differently:

$$\left[ \begin{array}{l}{t}\\{x}\\{y}\\{z}\end{array} \right] \cdot \left[ \begin{array}{l}{E}\\{p_x}\\{p_y}\\{p_z}\end{array} \right] = - {E}{t} + {p_x}{x} + {p_y}{y} + {p_z}{z}$$

Therefore we define covectors in a way that negates their zeroth (called "time-like" -- e.g. $t$) component. I.e. ${v_0} = - {v^0}$. For instance if the vector in question is

$$\left[ \begin{array}{l}t\\x\\y\\z\end{array} \right]$$
Then the covector is

$$\left[ \begin{array}{l} - t\\x\\y\\z\end{array} \right]$$
These are called the covariant and contravariant components of a vector respectively.

The dot product is then calculated normally as ${v_i}{w^i}$, and is invariant under Lorentz transformations like the Euclidean dot product is invariant under spatial rotations. Similarly, the norm (called the Minkowski norm) is calculated as $(v_iv^i)^{1/2}$.

But what if we wished, for some reason, to calculate a Euclidean norm or Euclidean dot product? How would we represent that in index notation?

(More on dot products on Minkowski geometry)

Answers to Question 1

${a_i} + {b_i}$
${a_i}{b^i}$
$a_ia^i$
$C_j^i = A_k^iB_j^k$
${v^i} = {v^j}\delta _j^i$
$B^i_j=A^j_i$ (or alternatively $B_{ij}=A_{ji}$, etc.)
$A_i^i$
$z_k = \wp_{ijk}x^iy^j$ where $\wp_{ijk}$ is a rank-3 tensor which is 0 unless $i=j=k$, in which case it's 1.
${v^i}{Q_{ij}}{v^j} = q$

Another rank-3 tensor, the Le Cevita symbol. Let's not call it a
"3-dimensional tensor", since that just means the indices all
range from 1 to 3 (or any other three integer values)

Answer to Question 2

Euclidean dot product: $\eta _i^i{v_i}{w^i}$
Euclidean norm: $\eta _i^i{v_i}{v^i}$

Where

$$\eta _i^j = \left[ {\begin{array}{*{20}{c}}{ - 1} & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{array}} \right]$$

We really want its inverse in the above two formulae, but they happen to be equal in the basis we're using, where $c=1$.

...is the Minkowski metric tensor, the Minkowski analog of the Dirac Delta function, and contains the dot products of the basis vectors as its components.

For this reason, we actually call dot products, cross products, pointwise products, etc. tensors themselves. For instance, the Euclidean dot product is $\delta_{ij}$, the Minkowski dot product is $\eta_{ij}$, the pointwise product we mentioned earlier is a rank 3 tensor $\wp_{ijk}$, and as we will see, the cross product is also a rank 3 tensor $\epsilon_{ijk}$. In fact, it is conventional to define different dot products based on what transformations are important, so the dot product is invariant under this transformation. If rotations are important, use the circular, Euclidean dot product. If skews are important for one dimension and rotations for the other three, as it is in relativity, use the hyperbolic, Minkowski dot product.

Relabeling of indices

In solving things with standard summation-y notation, you might've often noticed it to be useful to group certain terms together. For instance, if you have

$$\sum\limits_{i = 1}^n {x_i^2} + \sum\limits_{j = 1}^n {y_j^2} = \sum\limits_{k = 1}^n {2{x_k}{y_k}} $$
It might be useful to rewrite this as

$$\sum\limits_{i = 1}^n {(x_i^2 + y_i^2 - 2{x_i}{y_i})} = 0$$
What we did here, implicitly, was change the indices $j$ and $k$ to $i$. This is possible, because the summed indices vary between the same limits. In Einstein notation, the first sum would have been

$${x_i}{x^i} + {y_j}{y^j} = 2{x_k}{y^k}$$
And the relabelling was ${x_i}{x^i} + {y_i}{y^i} = 2{x_i}{y^i}$. We will do this all the time, so get used to it.

Even when the ranges of the indices are not the same, you can add or subtract a few terms to make the indices the same. E.g. if in ${x_i}{x^i} + {y_j}{y^j} = 2{x_k}{y^k}$, $k$ ranges between 1 and 3 while $i$ and $j$ range between 0 and 3, then we can write

$${x_i}{x^i} + {y_j}{y^j} = 2{x_m}{y^m} - 2{x_0}{y^0}$$
And then relabel, where $m$ ranges between 0 and 3.

Covectors, conjugates, and the metric tensor

Introduction to tensors and index notation