
Curvature is just the Hessian

If you recall some basic calculus, the gradient of a scalar function $f(x_1,\dots,x_n)$ is just the generalization of the derivative: 

$$f'(x_1,\dots,x_n) =\left[\begin{array}{c}\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{array} \right] $$

And the Hessian of a scalar function $f(x_1,\dots,x_n)$ is just the generalization of the second derivative:

$$f''(x_1,\dots,x_n) =\left[\begin{array}{ccc}\frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{array} \right] $$

Why is this interesting? Suppose $f$ is quadratic. Then, just as in one dimension a quadratic can be written entirely in terms of its value, derivative and second derivative at 0, here $f$ can be written entirely in terms of its value, gradient and Hessian at 0:

$$  \begin{align} f(x,y) &= c + (c_1x+c_2y) + (c_{11}x^2 + c_{12}xy + c_{22}y^2) \\ &= f(0) + \left(f_x(0)x+f_y(0)y\right) + \frac12 \left(f_{xx}(0)x^2+2f_{xy}(0)xy+ f_{yy}(0)y^2\right) \\ f(\mathbf{x}) &= f(0) + f'(0)\cdot \mathbf{x} + \frac12\,\mathbf{x}\cdot f''(0) \mathbf{x} \end{align}$$

What this tells us is:

  • The gradient is naturally thought of as a linear form.
  • The Hessian is naturally thought of as a quadratic form.
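A quick numerical check of the quadratic expansion above, including the factor of $\frac12$ on the Hessian term. This is a minimal Python/NumPy sketch: the coefficients are arbitrary hypothetical values, and the gradient and Hessian at 0 are worked out by hand from them.

```python
import numpy as np

# Hypothetical quadratic f(x, y) = c + c1*x + c2*y + c11*x^2 + c12*x*y + c22*y^2
c, c1, c2, c11, c12, c22 = 1.0, 2.0, -3.0, 0.5, 4.0, -1.5

def f(v):
    x, y = v
    return c + c1 * x + c2 * y + c11 * x**2 + c12 * x * y + c22 * y**2

# Value, gradient f'(0) and Hessian f''(0) at the origin, computed by hand
f0 = c
grad0 = np.array([c1, c2])           # [f_x(0), f_y(0)]
hess0 = np.array([[2 * c11, c12],    # [[f_xx, f_xy],
                  [c12, 2 * c22]])   #  [f_yx, f_yy]]

def taylor2(v):
    # f(0) + f'(0).x + (1/2) x.f''(0)x
    return f0 + grad0 @ v + 0.5 * v @ hess0 @ v

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
errors = [abs(f(p) - taylor2(p)) for p in points]
print(max(errors))  # zero up to floating-point rounding
```

Dropping the `0.5` in `taylor2` makes the errors clearly nonzero, which is a quick way to see why the factor belongs there.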

A what and a what?

There are two ways of thinking of a thing like $\left[\begin{array}{c}a \\ b \end{array} \right]$ -- a vector $a\mathbf{e}_1+b\mathbf{e}_2$, or a linear expression $ax_1+bx_2$, a function of $x_1,x_2$. The former is an object in the space $\mathbb{R}^n$ (here $n=2$), while the latter is a function $\mathbb{R}^n\to\mathbb{R}$ (do you see why?).

Similarly, there are two ways of thinking of a matrix $\left[\begin{array}{cc}a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right]$ -- a linear transformation $\mathbb{R}^n\to\mathbb{R}^n$, or a quadratic expression $a_{11}x_1^2+(a_{12}+a_{21})x_1x_2+a_{22}x_2^2$, a function of $x_1,x_2$. That quadratic expression is the bilinear form $\mathbf{x}\cdot A\mathbf{y}$ evaluated at $\mathbf{y}=\mathbf{x}$, and the bilinear form is a function $\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ (do you see why?).

This is what duality is in linear algebra. In tensor notation, vectors are written $v^i$ while linear forms are written $v_i$; linear transformations are $A_i^j$, quadratic forms are $A_{ij}$ -- see also. Don't bother with this if you don't want to.

So e.g. the gradient should naturally be thought of as a function that, given some vector as input, gives you the directional derivative in the direction of that vector.

(Make sure you understand this very clearly.)

Similarly, the Hessian should be thought of as a function that, given two vectors as input, gives the mixed second derivative along those two directions (e.g. $f_{xy}$ for the inputs $\mathbf{e}_x$ and $\mathbf{e}_y$).

(Make sure you understand this VERY clearly.)
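To make the "two vectors in, one number out" picture concrete, here is a small Python/NumPy sketch; the test function, the point and the two directions are all hypothetical choices. It compares $\mathbf{u}\cdot f''(\mathbf{p})\,\mathbf{w}$ with a finite-difference estimate of $\partial_s\partial_t f(\mathbf{p}+s\mathbf{u}+t\mathbf{w})$ at $s=t=0$. Taking $\mathbf{u}=\mathbf{w}$ gives the ordinary second derivative along $\mathbf{u}$.

```python
import numpy as np

def f(v):
    # Hypothetical test function f(x, y) = sin(x) y + x^2 y^3
    x, y = v
    return np.sin(x) * y + x**2 * y**3

def hessian(v):
    # Hessian of f, computed by hand
    x, y = v
    f_xx = -np.sin(x) * y + 2 * y**3
    f_xy = np.cos(x) + 6 * x * y**2
    f_yy = 6 * x**2 * y
    return np.array([[f_xx, f_xy],
                     [f_xy, f_yy]])

def mixed_directional(f, p, u, w, h=1e-4):
    # Central-difference estimate of d^2/(ds dt) f(p + s*u + t*w) at s = t = 0
    return (f(p + h * u + h * w) - f(p + h * u - h * w)
            - f(p - h * u + h * w) + f(p - h * u - h * w)) / (4 * h**2)

p = np.array([0.3, -0.7])
u = np.array([1.0, 2.0])
w = np.array([-0.5, 1.5])

bilinear = u @ hessian(p) @ w          # the Hessian fed two vectors
numeric = mixed_directional(f, p, u, w)
print(bilinear, numeric)               # these should agree closely
```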

Now suppose we wanted to talk about the curvature of a surface.

We know that the curvature (vector) of a unit-speed curve $\phi(t)$ at the point $t=0$ is $\phi''(0)$. Naturally, we'd like the "curvature of a surface" to be some function that gives you the curvature in each direction, i.e. the second derivative in each direction. So naturally, you'd want something like the Hessian.

I'm not sure if the cross-derivative $f_{xy}$, i.e. $A(X, Y)$, has any natural geometric interpretation. Does this have anything to do with torsion? Does $A(X, Y)$ ever come in useful?

So we'd like to define some quadratic form $A$ such that $\phi'(0)\cdot A \phi'(0)$ gives the curvature $\phi''(0)$. Actually, it should only give the normal curvature: the component of $\phi''(0)$ normal to the surface, the sort of curvature that can be attributed entirely to the surface rather than to the curve wiggling around on the surface.

[For whomsoever it may concern, Theorem 10.4 in your notes is what computes this quadratic form $A$ as the differential of the Gauss map, and is what motivates the Gauss map in the first place. This is why you should start with the last chapter and read backwards.]
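A hedged numerical illustration of normal curvature as a quadratic form, in the simplest case only: a graph $z=f(x,y)$ whose tangent plane at the origin is horizontal ($f(0)=0$, $\nabla f(0)=0$), so the unit normal there is $\mathbf{e}_z$ and the quadratic form is just the Hessian of $f$ at 0. The surface and direction below are hypothetical, and a general surface needs the Gauss-map machinery mentioned above rather than this shortcut. For a unit vector $\mathbf{u}$, the curve $\phi(t)=(t\mathbf{u}, f(t\mathbf{u}))$ has unit speed at $t=0$ and its speed is stationary there, so $\phi''(0)$ is its curvature vector.

```python
import numpy as np

# Hypothetical surface z = f(x, y) with f(0) = 0 and grad f(0) = 0:
# the tangent plane at the origin is the xy-plane, the unit normal is e_z.
a, b, c = 2.0, 0.5, -1.0

def f(x, y):
    return 0.5 * (a * x**2 + 2 * b * x * y + c * y**2) + x**3 * y

H = np.array([[a, b], [b, c]])  # Hessian of f at the origin (the x^3 y term adds nothing)

def curve(t, u):
    # A curve on the surface through the origin with phi'(0) = (u, 0)
    return np.array([t * u[0], t * u[1], f(t * u[0], t * u[1])])

def second_derivative(g, h=1e-4):
    # Central-difference estimate of g''(0) for a vector-valued g
    return (g(h) - 2 * g(0.0) + g(-h)) / h**2

theta = 0.8
u = np.array([np.cos(theta), np.sin(theta)])  # unit tangent direction
accel = second_derivative(lambda t: curve(t, u))
normal_curvature = accel[2]  # component of phi''(0) along the normal e_z
print(normal_curvature, u @ H @ u)
```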

Moments as tensors

We discussed the second multivariate moment a bit haphazardly in the last article. In general, we'd like a nice way of expressing the general moment (i.e. multivariate cross-moment).

Let $X=(X^1,\ldots,X^n)$ be a vector of random variables, and consider their $p$th order moments -- these form a rank-$p$ tensor of dimension $n$, the moment tensor, given by:

$$M_p[X]^{j_1\ldots j_p}=\mathrm{E}(X^{j_1}\cdots X^{j_p})$$
(e.g. $p=1$ gives you the mean vector, $p=2$ gives you the badly-named auto-"correlation" matrix). The central moments form a similar tensor, the central moment tensor, given by:

$$m_p[X]^{j_1\ldots j_p}=\mathrm{E}\left((X^{j_1}-\mathrm{E}X^{j_1})\cdots (X^{j_p}-\mathrm{E}X^{j_p})\right)$$
(e.g. $p=1$ gives you zero, annoyingly, but $p=2$ gives you the covariance matrix, aka the autocovariance matrix). But, well, each random variable $X^i$ can also be understood as a vector, remember? Let's write $X^i=(X^i_\alpha)$, where $\alpha$ is a pseudo-index that represents the idea that $X^i$ is a vector (I guess this is really Penrose (abstract index) notation rather than Einstein notation).

Actually, let's also make the following extension to tensor notation: every Greek index is summed over, regardless of whether, where, and how many times it's repeated -- and we take the expectation instead of the sum (which is like a normalized sum, or some sort of a trace). So we write:

$$M_p[X]^{j_1\ldots j_p}=X^{j_1}_\alpha\cdots X^{j_p}_\alpha$$
$$m_p[X]^{j_1\ldots j_p}=(X^{j_1}_\alpha-X^{j_1}_{\alpha_1})\cdots (X^{j_p}_\alpha-X^{j_p}_{\alpha_p})$$
where we use different dummy indices $\alpha_1,\ldots,\alpha_p$ to indicate that these are summed over first (since they're not repeated anywhere else in the expression). These changes to index notation are all an artifact of the fact that random variables are not really "fundamentally quadratic", but rather "fundamentally $p$-normed".
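The extended convention maps quite directly onto `numpy.einsum`, which sums over any index that is absent from the output, however many operands it appears in. In the minimal Python/NumPy sketch below, the sample axis `z` plays the role of $\alpha$, and the expectation is replaced by a sample average over a hypothetical correlated Gaussian vector; that substitution is an assumption of the illustration, not part of the definition. Subtracting the sample mean first corresponds to the inner $\alpha_k$ expectations in $m_p$. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n = 200_000, 3
# Hypothetical random vector X = (X^1, X^2, X^3): correlated Gaussians with nonzero mean
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.2, -0.3, 1.0]])
mean = np.array([1.0, -2.0, 0.5])
samples = rng.normal(size=(n_samples, n)) @ mixing.T + mean  # row z = one outcome

def moment_tensor(samples, p):
    # M_p[X]^{j1...jp} = E(X^{j1} ... X^{jp}), estimated by averaging over samples.
    # For p = 3 the spec is "za,zb,zc->abc": z appears three times and is summed.
    letters = "abcdefgh"[:p]
    spec = ",".join("z" + l for l in letters) + "->" + letters
    return np.einsum(spec, *([samples] * p)) / len(samples)

def central_moment_tensor(samples, p):
    # m_p: the same, after subtracting the (sample) mean from every X^j
    return moment_tensor(samples - samples.mean(axis=0), p)

M1 = moment_tensor(samples, 1)          # mean vector
m2 = central_moment_tensor(samples, 2)  # covariance matrix
m3 = central_moment_tensor(samples, 3)  # rank-3 tensor of third central moments
print(M1)
print(np.allclose(m2, np.cov(samples, rowvar=False, bias=True)))
```

`np.cov(..., bias=True)` divides by the number of samples, matching the plain average used here.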



OK -- so that's the univariate cross-moment -- it can also be considered a multivariate moment, the moment of the random vector $X$ -- its mean is the mean vector, its variance is the covariance matrix, etc. What about cross moments between random vectors? And you can imagine that once we have that, we'll call it a moment of a random rank-2 tensor, and so on.

What we're really looking for is the moment of a random tensor. This is a rank $pq$ tensor where $p$ is the degree of the moment and $q$ is the rank of the random tensor. As an example, when $p=2$ and $q=2$, one gets a rank 4 tensor consisting of cross-covariance (and autocovariance) matrices.

Note that this is not at all some unnecessary generalisation -- measuring the correlation between random vectors has very significant practical implications.

For example, a time series is a random vector -- its covariance matrix represents its internal correlations (how well its current value predicts a future value), but often we're interested in looking at correlations between time series -- how does the price of gold correlate with the S&P 500, etc. Then the cross-covariance matrix will be a bivariate function of $(t_1,t_2)$, called the cross-correlation function.
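As a sketch of the $p=2$, $q=2$ case, the Python/NumPy code below stacks two hypothetical simulated series (not real gold or S&P 500 data) into a random $2\times T$ tensor and estimates its rank-4 central moment tensor with sample averages. Fixing the two series indices picks out an autocovariance or cross-covariance matrix as a function of $(t_1,t_2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, T = 50_000, 6
# Hypothetical pair of series observed at T times: Y lags X by one step, plus noise
X = rng.normal(size=(n_samples, T)).cumsum(axis=1)  # random-walk-like series, Var(X_t) = t + 1
Y = np.roll(X, 1, axis=1) + 0.5 * rng.normal(size=(n_samples, T))
Y[:, 0] = 0.5 * rng.normal(size=n_samples)           # overwrite the wrapped-around first column

Z = np.stack([X, Y], axis=1)        # shape (samples, 2, T): a random 2 x T tensor
Zc = Z - Z.mean(axis=0)             # subtract the sample mean of every entry
C = np.einsum("zas,zbt->asbt", Zc, Zc) / n_samples  # rank-4 tensor, p = 2, q = 2

auto_X = C[0, :, 0, :]    # autocovariance matrix of X, indexed by (t1, t2)
cross_XY = C[0, :, 1, :]  # cross-covariance of X and Y, indexed by (t1, t2)
print(np.round(cross_XY, 2))
```

Because $Y_{t}$ copies $X_{t-1}$ here, the large entries of `cross_XY` sit just above the diagonal.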

Covectors, conjugates, and the metric tensor

The fact -- as is often introduced in an introductory general relativity or tensor calculus course -- that the gradient is a covector seems rather bizarre to someone who's always seen the gradient as the "steepest ascent vector". Surely, the direction of steepest ascent is, you know, a direction -- an arrow. And what even is a covector, anyway?

Let's think about differentiating with respect to vectors. The idea we have is that $\frac{\partial f}{\partial \vec x}$ needs to contain all the information -- each of the $\frac{\partial f}{\partial x_i}$. And analogously for derivatives with respect to tensors. You might think we could just create an array with the same dimensions containing each derivative -- much like the gradient, Hessian, etc. that we're used to -- i.e.

$$\nabla f=\left[ {{\partial ^i}f} \right]$$
$$\nabla^2f = \left[ {{\partial ^i}{\partial ^j}f} \right]$$
and so on. (I'm using $\nabla^2$ for the Hessian -- and will do so in the rest of the article -- even though it's more widely used for its trace, the Laplacian, which should really be written $|\nabla|^2$.) But you might get the sense that this is fundamentally wrong -- that you're giving the "division by a tensor" object the structure of that same tensor, when you should somehow be giving it an "inverse" structure.

We want to construct a situation showing that the idea above -- making the gradient ("derivative with respect to a vector") a vector and the Hessian ("second derivative with respect to a vector") a rank-2 tensor of the same kind -- doesn't work. Such a situation arises whenever we multiply the gradient by a vector, or the Hessian by a rank-2 tensor. For instance, for linear $f$:

$$f(\vec{x})-f(0)=\vec{x}\cdot\nabla f$$
But this is wrong for any non-Euclidean metric. For instance, if the metric tensor is $\mathrm{diag}(-1,1)$, this dot product gives:

$$f(\vec{x})-f(0)= - x\frac{{\partial f}}{{\partial x}} + y\frac{{\partial f}}{{\partial y}}$$
Which is just wrong. So instead, the gradient is a covector, which we represent in Einstein notation using subscripts instead of superscripts:

$$f(\vec x) - f(0) = {x^i}{\partial _i}f$$
(As you can see, I omitted Einstein notation when I was writing the wrong equations -- seeing repeated indices on the same vertical alignment is physically painful.) If we want the vector gradient -- for the direction of steepest ascent or whatever -- we need to multiply by the inverse metric tensor.

This also motivates the picture of covectors as stacks of parallel surfaces whose normals are their vector versions -- in Euclidean geometry it doesn't make a difference, but in a general setting this normality is a bit weird. Think about this.
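The Minkowski example above can be checked numerically. The minimal Python/NumPy sketch below uses the metric $\mathrm{diag}(-1,1)$ from the text and a hypothetical linear $f(t,x)=3t+5x$. It contrasts the naive metric dot product with $x^i\partial_i f$, and then raises the index with the inverse metric, $\partial^\mu f=g^{\mu\nu}\partial_\nu f$, before taking the metric dot product.

```python
import numpy as np

eta = np.diag([-1.0, 1.0])     # metric diag(-1, 1), as in the text
eta_inv = np.linalg.inv(eta)   # inverse metric g^{mu nu} (equal to eta here)

k = np.array([3.0, 5.0])       # hypothetical linear f(t, x) = 3t + 5x

def f(v):
    return k @ v

grad_co = k                    # covector components d_i f = (df/dt, df/dx)
x = np.array([2.0, -1.0])      # a displacement vector x^i

naive = x @ eta @ grad_co      # "x . grad f" with the metric: -t f_t + x f_x, wrong
correct = x @ grad_co          # x^i d_i f: no metric needed
grad_vec = eta_inv @ grad_co   # d^mu f = g^{mu nu} d_nu f
raised = x @ eta @ grad_vec    # metric dot product with the *vector* gradient

print(f(x) - f(np.zeros(2)), naive, correct, raised)  # → 1.0 -11.0 1.0 1.0
```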


But I haven't really given a motivation for the metric tensor or how it comes up here -- for this, read on.



Let's talk about something completely different -- let's think about the derivative of functions from $\mathbb{C}\to\mathbb{R}$, $df/dz$. I don't know about you, but I like the complex numbers, and prefer them to $\mathbb{R}^2$, because pretty much anything I write with the complex numbers is well-defined, and easily so -- so I don't need to worry about whether $df/d \vec{x}$ makes any sense or not. Well, we can write:

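$$\begin{aligned}\frac{df}{dz}&=\frac{\partial f}{\partial x}\frac{\partial x}{\partial z}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial z}=\frac{\partial f}{\partial x}\cdot 1+\frac{\partial f}{\partial y}\cdot\frac{1}{i}\\ \Rightarrow \frac{df}{dz}&=\frac{\partial f}{\partial x}-\frac{\partial f}{\partial y}i\end{aligned}$$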
This $df/dz$ above is exactly the analog of the gradient for real-valued functions defined on the complex plane -- analogous to scalar multivariable functions.

What's the expression for the complex derivative of a complex function? Compute it -- it may look a bit different from the analogous tensor derivative -- think of traces and commutators.

Note: In actual complex calculus, complex differentiability is defined in a more restrictive way -- specifically, one needs to satisfy the Cauchy-Riemann equations, which makes the structure of complex functions fundamentally more special than that of multivariable functions; stuff like $dx/dz$ isn't even defined, and what we've written above isn't really relevant in complex analysis. It is, however, twice the "Wirtinger derivative" $\frac{\partial f}{\partial z}=\frac12\left(\frac{\partial f}{\partial x}-i\frac{\partial f}{\partial y}\right)$.

Something interesting happened here, though -- we got a negative sign on the imaginary component of the derivative. The derivative got conjugated, or something -- and the reason this occurred is that $i^2=-1$ (so $1/i=-i$), and this leaves some sort of signature in our derivative.
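One way to see what the conjugation buys: multiplying $df/dz$ by a direction $h=h_x+ih_y$ and taking the real part gives $f_xh_x+f_yh_y$, the directional derivative, much as a covector eats a vector. The Python/NumPy sketch below checks this numerically for a hypothetical $f(x,y)=x^2y+3y$; `df_dz` is an illustrative name implementing the formula derived above.

```python
import numpy as np

def f(z):
    # Hypothetical real-valued function on the complex plane: f = x^2 y + 3y
    x, y = z.real, z.imag
    return x**2 * y + 3 * y

def df_dz(z):
    # df/dz as derived above: f_x - i f_y (twice the Wirtinger derivative)
    x, y = z.real, z.imag
    f_x, f_y = 2 * x * y, x**2 + 3
    return f_x - 1j * f_y

z0 = 0.4 + 1.1j
h = 0.6 - 0.8j   # a direction in the plane, written as a complex number
eps = 1e-6
directional = (f(z0 + eps * h) - f(z0 - eps * h)) / (2 * eps)

print(directional, (df_dz(z0) * h).real)  # both approximately -2
```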

Now let's (non-rigorous alert!) think about how an analogous argument may be written for vectors.

$$\frac{{df}}{{d\vec x}} = \frac{{\partial f}}{{\partial x}}\frac{{\partial x}}{{\partial \vec x}} + \frac{{\partial f}}{{\partial y}}\frac{{\partial y}}{{\partial \vec x}}$$
What really is $\frac{{\partial x}}{{\partial \vec x}}$, though? We know that $\frac{\partial \vec x}{\partial x}=\vec{e_x}$. But what's the "inverse" of a vector? What does that even mean?

So we want to define some sort of a product, or multiplication, with vectors -- we want to define a thing that when multiplied by a vector gives a scalar. It sounds like we're talking about a dot product -- but the dot product lacks a property we need for division: it's not injective. I.e. $\vec{a}\cdot\vec{b}=c$ for fixed $\vec{a}$ and $c$ defines a whole plane of vectors $\vec{b}$, not a unique one. But if we add another component to our product, the cross product (or in more than three dimensions, the wedge product), then the "dot product and cross product combined" is injective.
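The injectivity claim can be made explicit in three dimensions with the identity $\vec a\times(\vec a\times\vec b)=\vec a(\vec a\cdot\vec b)-\vec b|\vec a|^2$, which, given $c=\vec a\cdot\vec b$ and $\vec d=\vec a\times\vec b$, solves to $\vec b=(c\,\vec a-\vec a\times\vec d)/|\vec a|^2$. A minimal Python/NumPy sketch (the helper name `divide` is hypothetical):

```python
import numpy as np

def divide(a, c, d):
    # Recover the unique b with a . b = c and a x b = d (a nonzero),
    # using a x (a x b) = a (a . b) - b |a|^2.
    return (c * a - np.cross(a, d)) / (a @ a)

rng = np.random.default_rng(3)
a, b = rng.normal(size=3), rng.normal(size=3)
c, d = a @ b, np.cross(a, b)  # "dot product and cross product combined"

print(np.allclose(divide(a, c, d), b))  # True: b is pinned down

# The dot product alone is not enough: adding any vector perpendicular to a
# leaves a . b unchanged.
perp = np.cross(a, rng.normal(size=3))
print(np.isclose(a @ (b + perp), c))    # True, even though b + perp != b
```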

This combination, of course, is the tensor product. Specifically, when we're talking about something like $1/\vec{e_x}$, we want a thing $\vec{e_x}'$ whose tensor product with $\vec{e_x}$ has trace (dot product) 1 and commutator (wedge/cross product) 0, i.e. $\mathrm{tr}(\vec{e_x}'\otimes\vec{e_x})=1$ and $(\vec{e_x}'\otimes\vec{e_x})-(\vec{e_x}'\otimes\vec{e_x})^T=0$.

If all you've ever done in your life is Euclidean geometry, you'd probably think the answer to this question is $\vec{e_x}$ itself -- indeed, its dot product with $\vec{e_x}$ is 1 and its cross product with $\vec{e_x}$ is 0. But if you've ever done relativity and dealt with -- forget curved manifolds! -- the Minkowski manifold, you know that this is not necessarily true -- it depends on the metric tensor.

Could we define a vector in a general co-ordinate system that is the inverse of $\vec{e_x}$? Yes, we can. But let's not do that (yet*) -- it just seems like there should be something more natural, or elegant, like we had with complex numbers.

So we define a space of "covectors", as "scalars divided by vectors" (informally speaking), and call their basis $\tilde{e^i}$, which have the required dot and cross products. In Euclidean space with Cartesian co-ordinates -- and only there -- these look exactly the same as vectors, and have exactly the same components. I like to call the conjugation here "metric conjugation", and the gradient is naturally a covector.

*As for the question of writing the gradient as a vector instead, this follows naturally using the metric tensor. As an exercise, show, by considering the required vector corresponding to the covector $\tilde{e^x}$ (i.e. the one that has the right dot and cross products with $\vec{e_x}$), that the vector gradient can be given as the product of the inverse metric tensor and the covector gradient:

$${\partial ^\mu }f = {g^{\mu \nu }}{\partial _\nu }f$$
(Do this exercise! It is the motivation for the metric tensor, and why it determines your co-ordinate system!)
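Not a substitute for the exercise, but here is a numerical check of where it lands, in a hypothetical skewed basis of the ordinary Euclidean plane (Python/NumPy). With $g_{ij}=\vec e_i\cdot\vec e_j$, the vectors $g^{ij}\vec e_j$ have dot product $\delta^i_j$ with $\vec e_j$, and raising the index of the covector gradient recovers the familiar steepest-ascent arrow. Because this basis is not orthonormal, the covector components and vector components of the gradient differ here.

```python
import numpy as np

# Hypothetical skewed basis of the Euclidean plane: columns are e_1, e_2 in Cartesian terms
E = np.array([[1.0, 1.0],
              [0.0, 2.0]])
g = E.T @ E                # metric g_ij = e_i . e_j
g_inv = np.linalg.inv(g)   # inverse metric g^{ij}

dual = E @ g_inv           # column i is e^i = g^{ij} e_j (g_inv is symmetric)
print(np.allclose(dual.T @ E, np.eye(2)))     # e^i . e_j = delta^i_j

grad_cart = np.array([3.0, -1.0])  # Cartesian gradient of a hypothetical linear f
grad_co = E.T @ grad_cart          # covector components d_i f = grad f . e_i
grad_vec = g_inv @ grad_co         # vector components d^i f = g^{ij} d_j f
print(np.allclose(E @ grad_vec, grad_cart))   # d^i f e_i is the steepest-ascent arrow
print(grad_co, grad_vec)                      # different numbers in this basis
```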



I've been talking about the covector $\tilde{e^x}$ as being equal to the quotient "$1/\vec{e_x}$" but as I mentioned, this isn't really accurate -- the "1" in the quotient is a (1,1) tensor with trace 1 and commutator 0. Think about this tensor. Can you find this tensor in Clifford algebra? Maybe not. Can you find it as a linear transformation? Yes? Find it. And can you think of the covector alternatively as a quotient of a bivector and a trivector? Will you get $(e_y\wedge e_z)/(e_x\wedge e_y\wedge e_z)$?

Introduction to tensors and index notation

When you learned linear algebra, you learned that under a passive transformation $B$, a scalar $c$ remained $c$, vector $v$ transformed as $B^{-1}v$ and a matrix transformed as $B^{-1}AB$. That was all nice and simple, until you learned about quadratic forms. Matrices can be used to represent quadratic forms too, in the form

$$v^TAv=c$$
Now under the passive transformation $B$, $c\to c$, $v\to B^{-1}v$ and $v^T\to v^T\left(B^T\right)^{-1}$. Let $A'$ be the matrix such that $A\to A'$. Then

$${v^T}{\left( {{B^T}} \right)^{ - 1}}A'{B^{ - 1}}v = c = {v^T}Av$$
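As this must be true for all vectors $v$ (strictly, for all pairs $v, w$ in the bilinear form $v^TAw$, since $v^TAv$ alone only pins down the symmetric part of $A$), this means ${\left( {{B^T}} \right)^{ - 1}}A'{B^{ - 1}} = A$. Hence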

$$A' = {B^T}AB$$
This is rather bizarre. Why would the same object -- a matrix -- transform differently based on how it's used?

The answer is that these are really two different objects that just correspond to the same matrix in a particular co-ordinate representation. The first, the object that transformed as $B^{-1}AB$, is an object that maps vectors to other vectors. The second is an object that maps two vectors to a scalar.
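A numerical illustration of the two transformation laws (Python/NumPy; the matrices, vectors and random seed are arbitrary). The same array $A$ transforms as $B^{-1}AB$ when it is used as a linear map and as $B^TAB$ when it is used as a bilinear form, and each law is exactly the one that keeps its own kind of output consistent.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))   # one array of numbers, used two ways
B = rng.normal(size=(3, 3))   # a (generically invertible) change of basis
B_inv = np.linalg.inv(B)
v, w = rng.normal(size=3), rng.normal(size=3)

v_new, w_new = B_inv @ v, B_inv @ w   # vector components transform with B^{-1}

# As a linear map v -> Av: new matrix B^{-1} A B, and the output is again a vector
A_map = B_inv @ A @ B
print(np.allclose(A_map @ v_new, B_inv @ (A @ v)))

# As a bilinear form (v, w) -> v^T A w: new matrix B^T A B, and the output is a scalar
A_form = B.T @ A @ B
print(np.isclose(v_new @ A_form @ w_new, v @ A @ w))

print(np.allclose(A_map, A_form))     # generally False: two different objects
```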

In an introductory linear algebra course, this problem is avoided, because quadratic forms $A$ are required to be symmetric, and symmetric matrices are diagonalized by orthogonal changes of basis $B$, for which $B^{-1}=B^T$, so the two transformation laws coincide. But tensors are not symmetric in general (here's an idea I haven't thought much about -- could you define quadratic forms over a "non-commutative field" (i.e. a division ring)? Then you would have no choice but to entertain asymmetric quadratic forms).

These objects we're talking about are tensors. A tensor representing a quadratic form is not the same as a tensor representing a standard vector transformation, because they only have the same representation (i.e. the same matrix) in a specific co-ordinate basis. Change your basis, and voila! The representation has transformed away, into something entirely different.

There's a convenient notation used to distinguish between these kinds of tensors, called index notation. Representing vectors as ${v^i}$ for index $i$ that runs between 1 and the dimension of the vector space, we write

$$A_i^j{v^i} = {w^j}$$
for the vanilla linear transformation tensor. $j$ could range over a different set of values if we were dealing with non-square matrices, but that is not why we use a different index -- we use a different index because $i$ and $j$ can independently take on distinct values. Meanwhile,

$$\sum\limits_{i,j}^{} {{A_{ij}}{v^i}{v^j}}  = c$$
A few observations to define our notation:
  1. Note how we really just treat $v^i$, etc. as the $i$th component of the vector $v$, as the notation suggests. This is very useful, because it means we don't need to remember the meanings of fancy new products, etc. -- just write stuff down in terms of components. This is also why order no longer matters in this notation -- the fancy rules regarding matrix multiplication are now irrelevant, our multiplication is all scalar, and the rules are embedded into the way we calculate these products.
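  2. An index, if repeated once on top and once at the bottom anywhere throughout the expression, ends up cancelling out. This is the point of choosing stuff to go on top and stuff to go below. E.g. in $A_i^j{v^i} = {w^j}$, the $i$ cancels, leaving only the upper index $j$, so the result is a vector.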
  3. If you remove the summation signs, things look a lot more like the expressions with vectors directly (i.e. not component-wise).

(1) cannot be emphasised enough -- when we do the product ${v^i}{w_j} = A_j^i$, what we're really doing is multiplying a vector by a covector to get a rank-2 tensor. When we multiply $v_iw^i=c$, we're multiplying a covector by a vector and getting a rank-0 tensor (a scalar). The row vector/column vector notation and multiplication rules are just notation that helps us get the same results -- we represent the first as a column vector multiplied by a row vector, and the second as a row vector multiplied by a column vector. Note that this does not really correspond to the order in which the symbols are written -- $w_jv^i$ is the same rank-2 tensor as $v^iw_j$, since you can swap around the order of multiplication in tensor notation -- because here we're really operating with the scalar components of $v$, $w$ and $A$, and scalar multiplication commutes.

If we were to use standard matrix form and notation to denote $A_j^i$, would $j$ denote which column you're in or which row you're in?

A demonstration of (3) is the dot product between vectors $v^i$ and $w^i$, $\sum\limits_i {{v_i}{w^i}}$, where writing $i$ as a subscript represents a covector (typically represented as a row vector). This certainly looks a lot nicer written just as ${{v_i}{w^i}}$ -- like you're just multiplying the vectors together.

This -- omitting the summation sign when you have repeated indices -- half of them on top and the other half at the bottom -- is called the Einstein summation convention.

An important piece of terminology to mention here -- you can see that the summation convention introduces two different kinds of indices, unsummed and summed. The first is called a "free index", because you can vary the index within some range (typically 1 to the dimension of the space, which it will mean throughout this article set unless stated otherwise, but sometimes the equation might hold only for a smaller range of the index), and the second is called a dummy index (because it gets summed over anyway and holds no relevance to the result).
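`numpy.einsum` is close to a direct transcription of the summation convention: a letter repeated across operands and absent from the output is a dummy index, and the letters after `->` are the free indices. One caveat: it does not track whether an index is up or down, so that bookkeeping stays with you. In the sketch below, storing $A_j^i$ as `A[i, j]` is an assumption about layout.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))  # components A^i_j, stored as A[i, j]
B = rng.normal(size=(3, 3))
v = rng.normal(size=3)       # vector components v^i
w = rng.normal(size=3)       # covector components w_i

# Repeated letters are dummy (summed) indices; letters after "->" are free indices.
Av = np.einsum("ij,j->i", A, v)     # (Av)^i = A^i_j v^j
pairing = np.einsum("i,i->", w, v)  # w_i v^i, a scalar
outer = np.einsum("i,j->ij", v, w)  # v^i w_j, a rank-2 tensor (nothing summed)
AB = np.einsum("ik,kj->ij", A, B)   # C^i_j = A^i_k B^k_j
trace = np.einsum("ii->", A)        # A^i_i

print(np.allclose(Av, A @ v), np.isclose(pairing, w @ v),
      np.allclose(outer, np.outer(v, w)), np.allclose(AB, A @ B),
      np.isclose(trace, np.trace(A)))
```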

Question 1

Represent the following in tensor index notation, with or without the summation convention.
  1. $\vec a + \vec b$
  2. $\vec v \cdot \vec w$
  3. $|v{|^2}$
  4. $AB=C$ 
  5. $\vec{v}=v^1\hat\imath+v^2\hat\jmath+v^3\hat{k}$
  6. $B = {A^T}$
  7. $\mathrm{tr}A$
  8. The pointwise product of two vectors, e.g. $\left[ {\begin{array}{*{20}{c}}a\\b\end{array}} \right]\wp \left[ {\begin{array}{*{20}{c}}c\\d\end{array}} \right] = \left[ \begin{array}{l}ac\\bd\end{array} \right]$
  9. ${v^T}Qv = q$

Feel free to define your own tensors if necessary to solve any of these problems.

Question 2

The fact that the components of a vector and its corresponding covector are identical, i.e. that ${v_i} = {v^i}$, has been a feature of Euclidean geometry, which is the geometry we've studied so far. The point of defining things in this way is that the value of ${w_i}{v^i}$, the Euclidean dot product, is then invariant under rotations, which are a very important kind of linear transformation.

However in relativity, Lorentz transformations, which are combinations of skews between the t and spatial axes and rotations of the spatial axes, are the important kinds of transformations. This is OK, because rotations are really just complex skews. The invariant under this Lorentz transformation is also called a dot product, but defined slightly differently:

$$\left[ \begin{array}{l}{t}\\{x}\\{y}\\{z}\end{array} \right] \cdot \left[ \begin{array}{l}{E}\\{p_x}\\{p_y}\\{p_z}\end{array} \right] =  - {E}{t} + {p_x}{x} + {p_y}{y} + {p_z}{z}$$

Therefore we define covectors in a way that negates their zeroth (called "time-like" -- e.g. $t$) component. I.e. ${v_0} = - {v^0}$. For instance if the vector in question is

$$\left[ \begin{array}{l}t\\x\\y\\z\end{array} \right]$$
Then the covector is

$$\left[ \begin{array}{l} - t\\x\\y\\z\end{array} \right]$$
These are called the contravariant and covariant components of the vector respectively.

The dot product is then calculated normally as ${v_i}{w^i}$, and is invariant under Lorentz transformations just as the Euclidean dot product is invariant under spatial rotations. Similarly, the norm (called the Minkowski norm) is calculated as $(v_iv^i)^{1/2}$.
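A quick check of the invariance claim (Python/NumPy, with $c=1$ and hypothetical components): lowering an index flips the sign of the time-like component, and the resulting $v_iw^i$ survives a Lorentz boost along $x$, while the plain sum of products of components does not. The boost matrix is the standard one for speed $\beta$ along $x$; the helper names are illustrative.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])  # signature (-, +, +, +), c = 1

def boost_x(beta):
    # Lorentz boost along x with speed beta (|beta| < 1)
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

def lower(v):
    # Covector components v_i = eta_ij v^j: negate the time-like component
    return eta @ v

v = np.array([2.0, 1.0, 0.5, -0.3])   # hypothetical (t, x, y, z)
w = np.array([3.0, 0.4, -1.0, 2.0])   # hypothetical (E, p_x, p_y, p_z)
L = boost_x(0.6)

minkowski = lower(v) @ w
print(np.isclose(lower(L @ v) @ (L @ w), minkowski))  # True: invariant under the boost
print(np.isclose((L @ v) @ (L @ w), v @ w))           # False: the plain sum is not
```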

But what if we wished, for some reason, to calculate a Euclidean norm or Euclidean dot product? How would we represent that in index notation?

(More on dot products on Minkowski geometry)

Answers to Question 1
  1. ${a^i} + {b^i}$
  2. ${v_i}{w^i}$
  3. $v_iv^i$
  4. $C_j^i = A_k^iB_j^k$
  5. ${v^i} = {v^j}\delta _j^i$
  6. $B^i_j=A^j_i$ (or alternatively $B_{ij}=A_{ji}$, etc.)
  7. $A_i^i$
  8. $z_k = \wp_{ijk}x^iy^j$ where $\wp_{ijk}$ is a rank-3 tensor which is 0 unless $i=j=k$, in which case it's 1.
  9. ${v^i}{Q_{ij}}{v^j} = q$

Another rank-3 tensor, the Levi-Civita symbol. Let's not call it a "3-dimensional tensor", since that just means the indices all range from 1 to 3 (or over any other three integer values).

Answer to Question 2

Euclidean dot product: $\sum_i \eta_{ii}{v_i}{w^i}$
Euclidean norm: $\left(\sum_i \eta_{ii}{v_i}{v^i}\right)^{1/2}$

Where

$$\eta_{ij} = \left[ {\begin{array}{*{20}{c}}{ - 1} & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{array}} \right]$$

(The summation signs are written out because $i$ appears three times, so the Einstein convention doesn't apply. We really want the inverse metric $\eta^{ij}$ in the above two formulae, but the two happen to be equal in the basis we're using, where $c=1$.)

...is the Minkowski metric tensor, the Minkowski analog of the Kronecker delta, and contains the dot products of the basis vectors as its components.

For this reason, we actually call dot products, cross products, pointwise products, etc. tensors themselves. For instance, the Euclidean dot product is $\delta_{ij}$, the Minkowski dot product is $\eta_{ij}$, the pointwise product we mentioned earlier is a rank-3 tensor $\wp_{ijk}$, and as we will see, the cross product is also a rank-3 tensor $\epsilon_{ijk}$. In fact, it is conventional to choose the dot product according to which transformations are important, so that the dot product is invariant under those transformations. If rotations are important, use the circular, Euclidean dot product. If skews are important for one dimension and rotations for the other three, as they are in relativity, use the hyperbolic, Minkowski dot product.
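The same einsum notation can express each of these products as a contraction with a fixed tensor. In the sketch below (Python/NumPy, indices running 0 to 2 rather than 1 to 3, vectors hypothetical), `wp` is the pointwise-product tensor $\wp_{ijk}$ from Question 1, `eps` is the Levi-Civita symbol $\epsilon_{ijk}$, and the results are compared with NumPy's own dot, elementwise and cross products.

```python
import numpy as np

n = 3
# Pointwise-product tensor: 1 if i = j = k, else 0
wp = np.zeros((n, n, n))
for i in range(n):
    wp[i, i, i] = 1.0

# Levi-Civita symbol: +1 on even permutations of (0, 1, 2), -1 on odd ones, 0 otherwise
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0   # swapping two indices gives an odd permutation

delta = np.eye(3)         # Euclidean dot-product tensor
x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 4.0])

dot = np.einsum("ij,i,j->", delta, x, y)       # delta_ij x^i y^j
pointwise = np.einsum("ijk,i,j->k", wp, x, y)  # z_k = wp_ijk x^i y^j
cross = np.einsum("ijk,j,k->i", eps, x, y)     # (x cross y)_i = eps_ijk x^j y^k

print(np.isclose(dot, x @ y), np.allclose(pointwise, x * y),
      np.allclose(cross, np.cross(x, y)))
```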

Relabeling of indices

In solving things with standard summation-y notation, you might've often noticed it to be useful to group certain terms together. For instance, if you have

$$\sum\limits_{i = 1}^n {x_i^2}  + \sum\limits_{j = 1}^n {y_j^2}  = \sum\limits_{k = 1}^n {2{x_k}{y_k}} $$
It might be useful to rewrite this as

$$\sum\limits_{i = 1}^n {(x_i^2 + y_i^2 - 2{x_i}{y_i})}  = 0$$
What we did here, implicitly, was change the indices $j$ and $k$ to $i$. This is possible, because the summed indices vary between the same limits. In Einstein notation, the first sum would have been

$${x_i}{x^i} + {y_j}{y^j} = 2{x_k}{y^k}$$
After relabelling, this becomes ${x_i}{x^i} + {y_i}{y^i} = 2{x_i}{y^i}$. We will do this all the time, so get used to it.

Even when the ranges of the indices are not the same, you can add or subtract a few terms to make the indices the same. E.g. if in ${x_i}{x^i} + {y_j}{y^j} = 2{x_k}{y^k}$, $k$ ranges between 1 and 3 while $i$ and $j$ range between 0 and 3, then we can write

$${x_i}{x^i} + {y_j}{y^j} = 2{x_m}{y^m} - 2{x_0}{y^0}$$
and then relabel $m$ as $i$, since $m$ now ranges between 0 and 3, just like $i$ and $j$.