### Introduction to tensors and index notation

When you learned linear algebra, you learned that under a passive transformation $B$, a scalar $c$ remained $c$, a vector $v$ transformed as $B^{-1}v$, and a matrix $A$ transformed as $B^{-1}AB$. That was all nice and simple, until you learned about quadratic forms. Matrices can be used to represent quadratic forms too, in the form

$$v^TAv=c$$
Now under the passive transformation $B$, $c\to c$, $v\to B^{-1}v$ and $v^T\to v^T\left(B^T\right)^{-1}$. Let $A'$ be the matrix such that $A\to A'$. Then

$${v^T}{\left( {{B^T}} \right)^{ - 1}}A'{B^{ - 1}}v = c = {v^T}Av$$
As this must be true for all vectors $v$, this means ${\left( {{B^T}} \right)^{ - 1}}A'{B^{ - 1}} = A$. Hence

$$A' = {B^T}AB$$
This is rather bizarre. Why would the same object -- a matrix -- transform differently based on how it's used?

The answer is that these are really two different objects that just correspond to the same matrix in a particular co-ordinate representation. The first, the object that transformed as $B^{-1}AB$, is an object that maps vectors to other vectors. The second is an object that maps two vectors to a scalar.
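You can check the two transformation laws numerically. Below is a quick NumPy sketch (the variable names are mine): the same matrix $A$, transformed as $B^{-1}AB$, keeps working as a linear map in the new basis, while transformed as $B^TAB$ it keeps producing the same scalar $v^TAv$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))   # one matrix, used two ways
v = rng.standard_normal(n)
B = rng.standard_normal((n, n))   # change-of-basis matrix (invertible with probability 1)
B_inv = np.linalg.inv(B)

# New components of the vector under the passive transformation
v_new = B_inv @ v

# Used as a linear map: A' = B^{-1} A B sends v' to the transform of Av
A_map = B_inv @ A @ B
assert np.allclose(A_map @ v_new, B_inv @ (A @ v))

# Used as a quadratic form: A' = B^T A B keeps v^T A v invariant
A_form = B.T @ A @ B
assert np.allclose(v_new @ A_form @ v_new, v @ A @ v)
```

The two transformed matrices `A_map` and `A_form` are genuinely different arrays of numbers, even though both came from the same $A$.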

In an introductory linear algebra course, this problem is avoided, because quadratic forms $A$ are required to be symmetric (so they can be diagonalised by an orthogonal matrix $B$, for which $B^{-1}=B^T$ and the two transformation laws coincide). But this is not true of tensors in general. (Here's an idea I haven't thought much on -- could you define quadratic forms over a "non-commutative field", i.e. a division ring? Then you would have no choice but to entertain asymmetric quadratic forms.)
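Over the reals, restricting to symmetric quadratic forms loses nothing, since only the symmetric part of $A$ contributes to $v^TAv$. A quick NumPy check of this:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))   # generic, asymmetric matrix
v = rng.standard_normal(4)

sym = (A + A.T) / 2               # symmetric part of A

# The antisymmetric part of A contributes nothing to v^T A v,
# which is why intro courses can take quadratic forms to be symmetric
assert np.allclose(v @ A @ v, v @ sym @ v)
```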

These objects we're talking about are tensors. A tensor representing a quadratic form is not the same as a tensor representing a standard vector transformation, because they only have the same representation (i.e. the same matrix) in a specific co-ordinate basis. Change your basis, and voila! The representation has transformed away, into something entirely different.

There's a convenient notation used to distinguish between these kinds of tensors, called index notation. Representing vectors as ${v^i}$ for index $i$ that runs between 1 and the dimension of the vector space, we write

$$A_i^j{v^i} = {w^j}$$
For the vanilla linear transformation tensor -- $j$ can range over a different set of values if we're dealing with non-square matrices, but this is not why we use a different index -- we use a different index because $i$ and $j$ can independently take on distinct values. Meanwhile,

$$\sum\limits_{i,j}^{} {{A_{ij}}{v^i}{v^j}} = c$$
A few observations to define our notation:
1. Note how we really just treat $v^i$, etc. as the $i$th component of the vector $v$, as the notation suggests. This is very useful, because it means we don't need to remember the meanings of fancy new products, etc. -- just write stuff down in terms of components. This is also why order no longer matters in this notation -- the fancy rules regarding matrix multiplication are now irrelevant, our multiplication is all scalar, and the rules are embedded into the way we calculate these products.
2. An index, if repeated once on top and once at the bottom anywhere throughout the expression, gets summed over and drops out of the result. This is the point of choosing what goes on top and what goes below. E.g. in $A_i^j{v^i} = {w^j}$, the repeated index $i$ is summed away, and only the free index $j$ survives on both sides.
3. If you remove the summation signs, things look a lot more like the expressions with vectors directly (i.e. not component-wise).

(1) cannot be emphasised enough -- when we do this product, ${v^i}{w_j} = A_j^i$, what we're really doing is multiplying two vectors to get a rank-2 tensor. When we multiply ${v_i}{w^i}=c$, we're multiplying a covector by a vector, and get a rank-0 tensor (a scalar). The row vector/column vector notation and multiplication rules are just notation that helps us yield the same result -- we represent the first as a column vector multiplied by a row vector, and the second as a row vector multiplied by a column vector. Note that this does not really correspond to the positioning of the indices -- $w_jv^i$ also gives you a rank-2 tensor, since you can swap around the order of multiplication in tensor notation -- this is because here we're really operating with the scalar components of $v$, $w$ and $A$, and scalar multiplication commutes.
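NumPy's `einsum` implements exactly this component-wise bookkeeping, so it makes a good sandbox for the point above (upper vs. lower indices don't matter here, since in the Euclidean setting the components coincide):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# v^i w_j: two independent indices survive, giving a rank-2 tensor
# (the outer product)
A = np.einsum('i,j->ij', v, w)
assert A.shape == (3, 3)

# v_i w^i: the repeated index is summed, giving a rank-0 tensor (a scalar)
c = np.einsum('i,i->', v, w)
assert np.isclose(c, v @ w)

# Order of the factors is irrelevant -- the components are just scalars
assert np.allclose(np.einsum('i,j->ij', v, w), np.einsum('j,i->ij', w, v))
```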

If we were to use standard matrix form and notation to denote $A_j^i$, would $j$ denote which column you're in or which row you're in?

A demonstration for (3) is the dot product between vectors $v^i$ and $w^i$, $\sum\limits_i {{v_i}{w^i}}$, where writing $i$ as a subscript represents a covector (typically represented as a row vector). This certainly looks a lot nicer just written as ${{v_i}{w^i}}$ -- like you're just multiplying the vectors together.

This -- omitting the summation sign when you have repeated indices, half of them on top and the other half at the bottom -- is called the Einstein summation convention.

Some important terminology to mention here -- you can see that the summation convention introduces two different kinds of indices, unsummed and summed. The first is called a "free index", because you can vary the index within some range (typically 1 to the dimension of the space, which is what it will mean throughout this article set unless stated otherwise, though sometimes an equation might hold only for a smaller range of the index). The second is called a "dummy index", because it gets summed over anyway and holds no relevance to the result.

Question 1

Represent the following in tensor index notation, with or without the summation convention.
1. $\vec a + \vec b$
2. $\vec v \cdot \vec w$
3. $|\vec v|^2$
4. $AB=C$
5. $\vec{v}=v^1\hat\imath+v^2\hat\jmath+v^3\hat{k}$
6. $B = {A^T}$
7. $\mathrm{tr}A$
8. The pointwise product of two vectors, e.g. $\left[ {\begin{array}{*{20}{c}}a\\b\end{array}} \right]\wp \left[ {\begin{array}{*{20}{c}}c\\d\end{array}} \right] = \left[ \begin{array}{l}ac\\bd\end{array} \right]$
9. ${v^T}Qv = q$

Feel free to define your own tensors if necessary to solve any of these problems.

Question 2

The fact that the components of a vector and its corresponding covector are identical, i.e. that ${v_i} = {v^i}$, has been a feature of Euclidean geometry, which is the geometry we've studied so far. The point of defining things in this way is that the value of ${w_i}{v^i}$, the Euclidean dot product, is then invariant under rotations, which are a very important kind of linear transformation.

However in relativity, Lorentz transformations, which are combinations of skews between the t and spatial axes and rotations of the spatial axes, are the important kinds of transformations. This is OK, because rotations are really just complex skews. The invariant under this Lorentz transformation is also called a dot product, but defined slightly differently:

$$\left[ \begin{array}{l}{t}\\{x}\\{y}\\{z}\end{array} \right] \cdot \left[ \begin{array}{l}{E}\\{p_x}\\{p_y}\\{p_z}\end{array} \right] = - {E}{t} + {p_x}{x} + {p_y}{y} + {p_z}{z}$$

Therefore we define covectors in a way that negates their zeroth (called "time-like" -- e.g. $t$) component. I.e. ${v_0} = - {v^0}$. For instance if the vector in question is

$$\left[ \begin{array}{l}t\\x\\y\\z\end{array} \right]$$
Then the covector is

$$\left[ \begin{array}{l} - t\\x\\y\\z\end{array} \right]$$
These are called the contravariant and covariant components of the vector, respectively.

The dot product is then calculated normally as ${v_i}{w^i}$, and is invariant under Lorentz transformations like the Euclidean dot product is invariant under spatial rotations. Similarly, the norm  (called the Minkowski norm) is calculated as $(v_iv^i)^{1/2}$.
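Here is a numerical illustration of that invariance (a NumPy sketch with a boost along $x$, in units where $c=1$; the numbers are arbitrary):

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # lowers indices: v_i = eta_ij v^j

def minkowski_dot(v, w):
    # v_i w^i, with the covariant components built by negating v^0
    return (eta @ v) @ w

# A Lorentz boost along x with velocity beta (units where c = 1)
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
L = np.array([[gamma, -gamma * beta, 0, 0],
              [-gamma * beta, gamma, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])

v = np.array([2.0, 1.0, 0.5, -0.3])   # (t, x, y, z)
w = np.array([1.0, -1.0, 0.2, 0.7])   # (E, px, py, pz)

# The Minkowski dot product is invariant under the boost...
assert np.isclose(minkowski_dot(L @ v, L @ w), minkowski_dot(v, w))
# ...but the Euclidean dot product is not
assert not np.isclose((L @ v) @ (L @ w), v @ w)
```

This is the precise sense in which the Minkowski dot product "belongs to" Lorentz transformations the way the Euclidean one belongs to rotations.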

But what if we wished, for some reason, to calculate a Euclidean norm or Euclidean dot product? How would we represent that in index notation?

(More on dot products in Minkowski geometry)

1. ${a^i} + {b^i}$
2. ${a_i}{b^i}$
3. $a_ia^i$
4. $C_j^i = A_k^iB_j^k$
5. ${v^i} = {v^j}\delta _j^i$
6. $B^i_j=A^j_i$ (or alternatively $B_{ij}=A_{ji}$, etc.)
7. $A_i^i$
8. $z_k = \wp_{ijk}x^iy^j$ where $\wp_{ijk}$ is a rank-3 tensor which is 0 unless $i=j=k$, in which case it's 1.
9. ${v^i}{Q_{ij}}{v^j} = q$
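Answer 8 is easy to sanity-check numerically -- building the rank-3 tensor explicitly (called `wp` below, standing in for $\wp$) and contracting it with `einsum` reproduces the elementwise product:

```python
import numpy as np

n = 2
# The rank-3 "pointwise product" tensor from answer 8:
# 1 when i = j = k, 0 otherwise
wp = np.zeros((n, n, n))
for i in range(n):
    wp[i, i, i] = 1.0

x = np.array([3.0, 4.0])
y = np.array([5.0, 6.0])

# z_k = wp_ijk x^i y^j reproduces the elementwise product
z = np.einsum('ijk,i,j->k', wp, x, y)
assert np.allclose(z, x * y)
```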

Another rank-3 tensor is the Levi-Civita symbol, which we'll meet when we get to cross products. Let's not call it a "3-dimensional tensor", since that just means the indices all range from 1 to 3 (or any other three integer values).

Euclidean dot product: $\eta _j^i{v_i}{w^j}$
Euclidean norm: $(\eta _j^i{v_i}{v^j})^{1/2}$

Where

$$\eta _i^j = \left[ {\begin{array}{*{20}{c}}{ - 1} & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{array}} \right]$$

We really want its inverse in the above two formulae, but they happen to be equal in the basis we're using, where $c=1$.

...is the Minkowski metric tensor, the Minkowski analog of the Kronecker delta $\delta^i_j$, and contains the dot products of the basis vectors as its components.
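Numerically, applying $\eta$ (strictly, its inverse -- equal here) to the covariant components undoes the lowering and recovers the Euclidean dot product, which is what the formulae above express. A NumPy check:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # its own inverse in units where c = 1

v = np.array([2.0, 1.0, 0.5, -0.3])
w = np.array([1.0, -1.0, 0.2, 0.7])

v_lower = eta @ v                      # covariant components: v_0 = -v^0

# Contracting eta with the covariant components flips the sign of the
# time-like term back, recovering the Euclidean dot product
assert np.allclose(np.einsum('ij,i,j->', eta, v_lower, w), v @ w)
```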

For this reason, we actually call dot products, cross products, pointwise products, etc. tensors themselves. For instance, the Euclidean dot product is $\delta_{ij}$, the Minkowski dot product is $\eta_{ij}$, the pointwise product we mentioned earlier is a rank-3 tensor $\wp_{ijk}$, and as we will see, the cross product is also a rank-3 tensor $\epsilon_{ijk}$. In fact, it is conventional to define the dot product based on which transformations are important, so that the dot product is invariant under those transformations. If rotations are important, use the circular, Euclidean dot product. If skews are important for one dimension and rotations for the other three, as in relativity, use the hyperbolic, Minkowski dot product.

### Relabeling of indices

In solving things with standard summation-y notation, you might often have found it useful to group certain terms together. For instance, if you have

$$\sum\limits_{i = 1}^n {x_i^2} + \sum\limits_{j = 1}^n {y_j^2} = \sum\limits_{k = 1}^n {2{x_k}{y_k}}$$
It might be useful to rewrite this as

$$\sum\limits_{i = 1}^n {(x_i^2 + y_i^2 - 2{x_i}{y_i})} = 0$$
What we did here, implicitly, was change the indices $j$ and $k$ to $i$. This is possible because the summed indices vary between the same limits. In Einstein notation, the original equation would have been

$${x_i}{x^i} + {y_j}{y^j} = 2{x_k}{y^k}$$
And the relabelling was ${x_i}{x^i} + {y_i}{y^i} = 2{x_i}{y^i}$. We will do this all the time, so get used to it.

Even when the ranges of the indices are not the same, you can add or subtract a few terms to make the ranges match. E.g. if in ${x_i}{x^i} + {y_j}{y^j} = 2{x_k}{y^k}$, $k$ ranges between 1 and 3 while $i$ and $j$ range between 0 and 3, then we can write

$${x_i}{x^i} + {y_j}{y^j} = 2{x_m}{y^m} - 2{x_0}{y^0}$$
And then relabel, where $m$ ranges between 0 and 3.
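The range-extension trick is plain arithmetic, so it's easy to verify with NumPy (working in the Euclidean setting, where $x_i = x^i$):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(4)   # components x^0 .. x^3 (Euclidean, so x_i = x^i)
y = rng.standard_normal(4)

# 2 x_k y^k with k ranging over 1..3 only
partial = 2 * np.sum(x[1:] * y[1:])

# Extending the dummy index to 0..3 and subtracting the extra term changes
# nothing -- and now every dummy index in the equation shares the same
# range, so they can all be relabelled to a single letter
full = 2 * np.sum(x * y) - 2 * x[0] * y[0]
assert np.isclose(partial, full)
```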