### Introduction to special relativity

Often, one wonders why some major paradigm shift took so long to occur. We ponder this in the context of political economy, for instance -- with regard to the Neolithic Revolution (the invention of agriculture, circa 9000 BC) and the Industrial Revolution (which one may trace as the ultimate conclusion of a series of events that began with the end of feudalism in Europe in the 1400s).

We also often ponder this in the context of scientific achievements. Why, for instance, did it take till Einstein for an insight as key as relativity to be discovered? Why was it not discovered by Archimedes, or by some Rashtrakuta or Vijayanagara mathematician, or at least in the time of Newton?

While this kind of question is sometimes tricky to answer, the answer is very clear in the context of relativity.

Special Relativity was developed as a resolution to the failure of Galilean Relativity to accommodate the predictions of Maxwell's electromagnetism. It turns out that while Maxwell's electromagnetism was fine (it is "Lorentz invariant"), mechanics itself needed to be fixed. A key insight from relativity with regard to electromagnetism is, in fact, that magnetism is a relativistic effect. Magnetism is what you get when electricity undergoes a Lorentz transformation, i.e. when the charge starts moving. It is just like the effect of velocity on mass, for instance, or distance, or duration.

(Source: XKCD - 1489)

The precise contradiction was as follows: Maxwell gives an absolute value for the speed of light, but Galileo says that no absolute speed exists -- it depends on the reference frame. For instance, if a train (travelling at v with respect to the ground) blares light at speed c (in its own reference frame), then according to the ground's reference frame, the light should be travelling at c + v.

Light is not important! Even though the initial result that spurred relativity came from the theory of electromagnetism and light, relativity itself produces the same predictions for anything that travels at the same speed that light does -- any massless particle, in general. I.e. the curious case of light in relativity is not a result of its particle-physics-y properties, but its kinematic properties.

This prediction (by Galilean relativity) is fundamentally a result of the nature of the Galilean transformation (and this is the transformation that Einstein sought to change). This is the transformation that tells you how to transform co-ordinates (or really anything) between inertial reference frames with the same origin.

Suppose Observer $O'$ is moving at speed $v_{O'}$ with respect to Observer $O$. Now consider the time and position of some event $P$ to be $(t, x)$ in reference frame $O$. If we're dealing with four dimensions, then $x$ and $v$ are of course, three-vectors. Then what's the position of event $P$ according to $O'$? Well, at time $t$, $O'$ would be $v_{O'}t$ to the right of $O$, hence the position of $P$ would be measured as $(t,x-v_{O'}t)$.

This is the Galilean transformation:

$$G(w):\,\,\left[ {\begin{array}{*{20}{c}}t\\x\end{array}} \right] \to \left[ {\begin{array}{*{20}{c}}t\\{x - wt}\end{array}} \right]$$
One may also write this as the matrix:

$$G(w) = \left[ {\begin{array}{*{20}{c}}1&0\\{ - w}&1\end{array}} \right]$$
Perhaps the asymmetry of the matrix bothers you. It bothers me too. And fortunately for us, it bothered Einstein too, and he actually did something about it, rather than rant about it on a blog. In fact, the asymmetry of this matrix corresponds directly to the asymmetry of time and space in Galilean relativity. This is pretty clear from the form of the Galilean transformation, and should also be obvious from your knowledge of linear algebra (if it's not, you should go and read up the first few chapters of the linear algebra course). As we will see, the symmetry between space and time will arise quite neatly from our postulates.
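To make the train example concrete, here is a minimal sketch of $G(w)$ acting on events, assuming natural units and an arbitrarily chosen train speed. The helper names `galilean` and `speed` are made up for illustration. It takes two events on a light ray as recorded on the train, maps them to the ground frame, and recovers the Galilean prediction of $c + v$.

```python
def galilean(event, w):
    """Apply G(w) to an event (t, x): time is untouched, position shifts by -w*t."""
    t, x = event
    return (t, x - w * t)


def speed(event_a, event_b):
    """Average speed dx/dt between two events on the same worldline."""
    (t0, x0), (t1, x1) = event_a, event_b
    return (x1 - x0) / (t1 - t0)


c = 1.0   # natural units (an assumption of this sketch)
v = 0.25  # train speed relative to the ground, chosen arbitrarily

# Two events on a light ray as recorded in the train's frame: x = c*t.
ray_train = [(0.0, 0.0), (2.0, 2.0 * c)]

# The ground moves at -v relative to the train, so we apply G(-v).
ray_ground = [galilean(e, -v) for e in ray_train]

speed_train = speed(*ray_train)
speed_ground = speed(*ray_ground)
print(speed_train, speed_ground)  # 1.0 1.25
```

Note that the time entries of the events never change, which is exactly the asymmetry of the matrix.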

One may plot this transformation on a spacetime diagram. The figure below shows a spacetime diagram viewed from the perspective of $O$, where the transformed reference frame $O'$ is shown as well.

A spacetime diagram is essentially a displacement-time graph, where the displacement function is considered a transformation of the t-axis. We make the following observations:
• The t' curve is the worldline of observer $O'$, i.e. the path taken by $O'$ in spacetime. The $x$-axis is not transformed (this is the asymmetry we were talking about earlier).
• The x-axis is essentially the set of all events in spacetime such that $t=0$, i.e. are simultaneous to "the present". Surely, these points must be the same to all observers, since whether $t=0$ or $t=1$, or whatever value t holds, is independent of the observer?
• Within the reference frame $O$, $O'$'s reference frame seems squished up. But since no reference frame is special, within $O'$'s reference frame, $O'$ will look normal, and $O$ will look transformed, specifically by the velocity $-v$ (so the axis t is tilted from the "normal" $t'$ by the same angle in the opposite direction). This is just the inverse transformation.

So those were the Galilean transformations, which we know are incorrect (differentiate $x' = x-wt$ so you get $\dot{x}'=\dot{x}-w$ -- we know from Maxwell that this is incorrect with regard to light). Before we derive the correct transformations (called the "Lorentz transformations"), we'll first take a detour to prove some significant results in special relativity, which will also give us an idea of how powerfully predictive our two axioms are.

(A note on notation: we will use units of distance and time such that the value of c is unity. For instance, lightseconds and seconds, etc. This is useful, because it eliminates c from our formulae and helps expose the symmetry between space and time.)

1. Nothing can travel faster than light

We consider a thought experiment, where an object $O'$ travelling at speed v (in reference frame $O$) releases light in the positive x-direction. The speed of this light is, of course, c in all reference frames. According to $O$ (i.e. an observer in $O$), the speed of this light is $c$, but the speed of light relative to the object is $c-v$.

That's okay. But now consider if $v>c$. Then $c-v$ is negative, i.e. $O$ observes the light ray being emitted at some speed in the other direction with respect to the object, i.e. it sees $O'$ farting out the light ray, rather than vomiting it up. Whereas according to $O'$, he is stationary, and the velocity of light is still $c$ in the positive x-direction.

(Figure: simulation of how one would observe a "tachyon" -- a hypothetical particle that can move faster than light.)

Why is this inconsistency problematic? Well, suppose there is some hi-tech wall somewhere further down the positive x-direction, which functions in the following way:
• If light is shone on it, the hi-tech wall stops working.
• If the object collides into the wall while it is working, it sets off blaring alarms and sends planes flying into buildings so everyone knows about the event.

According to Observer $O$, the object collides with the wall first, before the wall stops working. His children die in a plane crash and he ends up drunk and homeless.

According to Observer $O'$, however, he can never catch up with the light ray, and he bangs into a dysfunctional wall, and nothing happens. He turns around and waves at $O$, who is sober.

We have ended up at a logical contradiction, and our only solution is to say that such an object that travels faster than light, $O'$, does not exist.

When I narrate this proof to people, they are quick to ask "If we're talking about the response of the wall, shouldn't we only care about the wall's reference frame?" Well, no, because that privileges a reference frame. The laws of physics -- including the laws of the wall -- are valid in all reference frames, and an external observer shouldn't see the wall giving a response inconsistent with the laws of the wall. I mean, it's possible to have a wall that functions in the way demonstrated in the question in all reference frames.

This is a rather surprising result. If you're travelling at 0.99999999c, can't you just supply a bit of energy to go 4 m/s faster? As we will see, it turns out the laws of dynamics also change in special relativity, and this "bit" of energy is infinite.

"Aha!" you say, "Maybe you can never choose an inertial reference frame travelling faster than light with respect to your reference frame, but what if you choose a reference frame travelling at 0.6c, then another reference frame travelling at 0.6c with respect to that reference frame. Then wouldn't this third reference frame be travelling at 1.2c with respect to us?" Again, it turns out that even the velocity addition formula is changed in special relativity.

When we say "velocity addition formula", we mean "co-ordinate transformation between reference frames in relative motion to each other", i.e. if an observer on the train moving at speed v relative to the ground measures the speed of something to be w, then what's the speed of that thing with respect to the ground? That's the velocity addition we're talking about.

We're not talking about velocity addition within the same reference frame. If we see two light beams shining towards each other, we do see the space closing at speed 2c.

2. Relativity of simultaneity

We know from linear algebra that a linear transformation in $\mathbb{R}^n$ can be described fully by the images of $n$ linearly independent vectors. These images, written next to each other, form the matrix of the transformation in the basis comprised of these vectors. One such set of linearly independent vectors is the standard basis.

We're working towards an expression for the Lorentz transformation. To find out how the unit vectors along the t, x, y and z axes transform, it is sufficient to find out how the axes themselves transform, and what the scale is on these transformed axes (e.g. the identity transformation and a scaling by two both leave the axes unchanged, but the scales on the transformed axes are different).

(For readers with a reasonable knowledge of linear algebra: we know two sets of eigenvectors of the Lorentz transformation. The fact that these are not eigenvectors of the Galilean transformation is equivalent to the problem of the Galilean transformation not respecting the invariance of the speed of light. The problem of special relativity is therefore equivalent to finding a matrix with these eigenvectors, with eigenvalues that respect some symmetry properties we will see later.)

We consider a general 1+1-dimensional spacetime diagram in the reference frame of $O$. Obviously, the t-axis is the worldline of our observer/the origin of our observer.

What, exactly, is the x-axis? Well, any P-axis is essentially the set of points such that all co-ordinates except P are 0. In our 1+1-dimensional diagram, the x-axis is therefore the set of points such that t = 0.

In other words, the x-axis is what the observer regards as the present. If the x-axis were transformed in any way, then it would mean that the idea of what's the present and what's the past also depends on the observer. In general, any line parallel to the x-axis is a line of simultaneity (i.e. events that occur at the same point in time), and if the x-axis is transformed, the conception of simultaneity depends on the observer.

So it makes sense to study simultaneity in our quest to find the Lorentz transformation.

The relativity of simultaneity can be illustrated with the following thought experiment: suppose we have two sources of light, $S_1$ and $S_2$, which (in the reference frame of Observer O) release a pulse of light at the same instant $t=0$. How does Observer O know this? Well, he is situated at the midpoint of the two sources, and knows the distance between him and each source to be s, so when he sees the two pulses simultaneously at $t=s/c$, he knows that each pulse was released $s/c$ time earlier, i.e. at $t=0$.

Now consider another observer $O'$, moving parallel to the light coming from $S_1$ at speed $v$. It happens to be that at the instant the two pulses collide at the origin of $O$, $O'$ also crosses this point.

It is important to note that he observes the collision of the two pulses at the same instant as $O$ does. This occurs at a single event (a single point in spacetime), and all observers must agree on what happens at this event (this is from the principle of relativity).

However, what $O'$ disagrees with $O$ on is the simultaneity of the release of the light pulses itself. Observer $O'$ considers himself to be an ordinary, stationary observer. He has seen the point of intersection running away from the light emanating from $S_2$ and towards the light emanating from $S_1$. Therefore light -- whose speed is $c$ -- emanating from $S_2$ has to catch up with the intersection point, the distance closing at a speed of $c-v$, while light emanating from $S_1$ meets the intersection point with the distance closing at a speed of $c+v$.

So for them to meet at the intersection point the same distance away from each source, $S_2$ must have released its pulse earlier than $S_1$. How much earlier? Well, let's not go there too fast -- we still don't know if distances themselves change in the reference frame, i.e. what the scale on the transformed x-axis is.

Being simultaneous with an event doesn't mean "seeing" it. To see something, light (or anything else) must travel from that event to the observer's worldline. For instance, if we were to see Betelgeuse go supernova today, we would not be simultaneous with the event, which we would calculate to have happened about 600 years ago.

You might wonder, then: what if the two events are causally connected? I.e. what if there is a link that ensures $S_2$ turns on in response to $S_1$ turning on? Well, it turns out that in such a case, the order of the two events is in fact preserved. We will see why later -- the reason has to do with the connection between causality and light cones.

3. Transformation of the x-axis

Let's think about how one would actually determine some event to be simultaneous to us right now. Well, obviously one must observe the event, for which we must detect the light coming from that event. Suppose we just observed the light we get from that event right now, at $t=0$. Could we say that gives us an event simultaneous with the present? Well, of course not. We know that the light traveled some distance to get here, so we're observing an event some time into the past. We would figure out how much into the past by determining the distance of the event from us. How would we do this? Well, we would reflect light off the object and see how long it takes to return.

So suppose we use this method to determine which event is simultaneous to us. Releasing a light ray now would be too late -- we would get an event in the future in the reflected ray. Instead, we should have released a light ray d/c seconds ago, and if the light ray returns d/c seconds into the future, the object is d away from us, and the reflected ray shows the event simultaneous to us right now.

For instance, if we shot a light ray at Betelgeuse in 1400, then the reflection we get in 2600 will be an image of how Betelgeuse looks today, in the year 2000 (on the scale of the universe, 17 years is no big deal -- for example, it is clearly an insufficient age to learn the difference between 2000 and 2017), because the star is 600 lightyears away.

(Note that the slope of a light ray on a spacetime diagram is always 1/c in some direction -- since we're using natural units, this is just a slope of 1.)

So here we have a general property -- and in fact a defining property -- of the x-axis: it is the set of all points such that if you sent a light ray $a$ seconds ago to bounce off the point, it will return to us $a$ seconds from now.
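This bounce-a-light-ray recipe is easy to turn into arithmetic. The sketch below uses a hypothetical helper, `radar_coordinates`, which is our own naming. It assumes only that light takes equally long on the way out and on the way back, and it reproduces the Betelgeuse numbers from above.

```python
def radar_coordinates(t_sent, t_received, c=1.0):
    """Locate a reflection event from emission and return times on our own clock.

    The light covers the same distance out and back at speed c, so the
    reflection happens halfway between the two clock readings.
    """
    t_event = (t_sent + t_received) / 2
    distance = c * (t_received - t_sent) / 2
    return t_event, distance


# Betelgeuse example from the text, in years and lightyears (so c = 1).
t_event, distance = radar_coordinates(1400, 2600)
print(t_event, distance)  # 2000.0 600.0

# A ray sent a seconds ago and returning a seconds from now hits the x-axis (t = 0).
print(radar_coordinates(-3, 3))  # (0.0, 3.0)
```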

Why is this useful? Well, if this reference frame were viewed in some other observer's reference frame, it would still be true (by the principle of relativity).

What do we mean?

Label the axes of this co-ordinate system as $t'$ and $x'$:

Then how would the points of spacetime in this reference frame map into another reference frame? Well, perhaps something like this:

What do we want to know about this diagram? Well, the direction of the x' axis relative to the x-axis, i.e. the angle between them. Here is what we already know:
• The slopes of the blue lines (the paths of the light rays) are of magnitude one (because the speed of light is the same in this reference frame, too).
• AO = OD.
• The angle between the t and t' axes is simply a function of the velocity (you should be able to calculate this angle with respect to the velocity by now -- remember, it's just a distance-time graph).

How would you calculate angle BOC?

Note: this is simply a geometric problem at this point. I encourage you to try it out on your own.

Well, if you look at the diagram hard enough, you might have noticed that ABD is a right-angled triangle with right angle B. Additionally, AO = OD. Well, any triangle can be inscribed in a circle, and in the case of a right-angled triangle, AD becomes the diameter. Thus AO = OD = OB is the radius.

Then ODB is an isosceles triangle, and angle ODB = angle OBD. Meanwhile angles OCE and OEC are both pi/4, thus OED and OCB are equal as well (they are 3pi/4). Triangle OBC is thus congruent to triangle ODE (since two angles and a side are equal), and angle BOC = angle DOE. Since angle DOE = angle FOA, this means angle BOC = angle FOA.

This conclusion is tremendously significant: the x-axis is rotated by precisely the same angle as the t-axis is, towards each other. This creates a brilliant symmetry between space and time in relativity. We are also very close to our final expression for the Lorentz transformation.
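If you would rather check the geometry with coordinates, here is a rough sketch in natural units. It assumes that A and D are the emission and reception events on the worldline of $O'$ and that B is the reflection event, which is our reading of the labels in the figure. The function name `reflection_event` is made up for this example. It intersects the two light rays and compares the tilt of the x'-axis with the tilt of the t'-axis.

```python
import math


def reflection_event(v, T):
    """Find where a light signal sent and received by O' must have bounced.

    O' moves along x = v*t in O's frame (natural units, c = 1).
    A = emission event on the worldline of O' at O-time -T,
    D = reception event on the worldline of O' at O-time +T.
    The outgoing ray satisfies x - x_A = t - t_A and the returning ray
    satisfies x - x_D = -(t - t_D); B is their intersection.
    """
    t_a, x_a = -T, -v * T
    t_d, x_d = T, v * T
    # Solve the two linear equations for (t_B, x_B).
    t_b = ((t_a + t_d) + (x_d - x_a)) / 2
    x_b = ((x_a + x_d) + (t_d - t_a)) / 2
    return t_b, x_b


v, T = 0.5, 2.0  # arbitrary illustrative values
t_b, x_b = reflection_event(v, T)

angle_t_axis = math.atan2(v * T, T)  # tilt of the t' axis away from the t axis
angle_x_axis = math.atan2(t_b, x_b)  # tilt of the x' axis away from the x axis
print(t_b, x_b)                      # 1.0 2.0
print(math.isclose(angle_t_axis, angle_x_axis))  # True
```

Since A and D sit symmetrically about the origin on the worldline of $O'$, B is an event that $O'$ regards as simultaneous with the origin, so the line OB is the x'-axis.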

By the way, this proof also illustrates the beauty of natural units: by choosing a system of units such that the slope of light's path is one, the angle ABD became a right angle, and we were able to exploit the property of an angle subtended by the diameter being 90 degrees.

Think about what happens in a reference frame moving at the speed of light. The axes then coincide.

To be fair, we already expected this. Since the speed of light is constant, the null vectors (vectors pointing along the path of light in spacetime, i.e. along the diagonals) are eigenvectors of the Lorentz transformation. The only way for this to be true when you have a linear transformation is for the x-axis to be tilted inwards by the same angle.
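This can be made concrete with a short calculation. As a sketch, suppose the transformation $M$ sends the x-unit vector to $a\,(1, m)$ and the t-unit vector to $b\,(v, 1)$, written in the order $(x, t)$. This parametrisation is an assumption introduced here for illustration: $m$ is the as-yet-unknown slope of the transformed x-axis, and $a$, $b$ are nonzero scale factors. Demanding that the null vectors $(1, 1)$ and $(1, -1)$ keep their directions gives

$$\begin{gathered} M\left[ {\begin{array}{*{20}{c}} 1 \\ 1 \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {a + bv} \\ {am + b} \end{array}} \right] \propto \left[ {\begin{array}{*{20}{c}} 1 \\ 1 \end{array}} \right] \Rightarrow a + bv = am + b \hfill \\ M\left[ {\begin{array}{*{20}{c}} 1 \\ { - 1} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {a - bv} \\ {am - b} \end{array}} \right] \propto \left[ {\begin{array}{*{20}{c}} 1 \\ { - 1} \end{array}} \right] \Rightarrow a - bv = b - am \hfill \\ \text{adding: } 2a = 2b \Rightarrow a = b, \quad \text{then } a + av = am + a \Rightarrow m = v \hfill \\ \end{gathered}$$

So the transformed x-axis has slope $v$, i.e. it is tilted by the same angle as the t'-axis, in agreement with the geometric proof.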

4. Scale on the transformed axes

We now know how the axes transform, and must determine the scale on each axis.

First of all, we may assume that the Lorentz transformation is linear. Why? Well, a linear transformation is one which ensures that all straight lines remain straight lines, and the origin remains fixed. The origin remains fixed in the Lorentz transformation by definition (since the observer is at the same spot -- translations are not considered), and lines must not turn into curves, since curves represent non-inertial reference frames and an inertial reference frame must be seen as inertial in all reference frames.

So what do our unit vectors look like? Well, writing vectors in the order $(x, t)$, we know the image of the x-unit vector is a multiple of the vector $\left[ {\begin{array}{*{20}{c}} 1 \\ v \end{array}} \right]$, where we're of course using natural units. The image of the t-unit vector, meanwhile, is a multiple of the vector $\left[ {\begin{array}{*{20}{c}} v \\ 1 \end{array}} \right]$.

So the transformation matrix, which is itself a function of $v$, takes the form

$$L(v)=\left[ {\begin{array}{*{20}{c}} \alpha &{\beta v} \\ {\alpha v}&\beta \end{array}} \right]$$
For some constants $\alpha$ and $\beta$. Note that this is the transformation matrix which maps the original co-ordinate system to the new one -- the actual Lorentz transformation is a co-ordinate transformation, and thus the inverse of this matrix.

How would we find the values of $\alpha$ and $\beta$? Well, one way would be to consider the product $L(v)L(-v)$. Since you are simply boosting by a velocity of $v$ then boosting back by $-v$, this product must equal the identity matrix $I$. This is "Einstein's principle of velocity reciprocity". We impose this condition:

$$\begin{gathered} \left[ {\begin{array}{*{20}{c}} 1&0 \\ 0&1 \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} \alpha &{\beta v} \\ {\alpha v}&\beta \end{array}} \right]\left[ {\begin{array}{*{20}{c}} \alpha &{ - \beta v} \\ { - \alpha v}&\beta \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {{\alpha ^2} - \alpha \beta {v^2}}&{{\beta ^2}v - \alpha \beta v} \\ {{\alpha ^2}v - \alpha \beta v}&{{\beta ^2} - \alpha \beta {v^2}} \end{array}} \right] \hfill \\ {\alpha ^2}v - \alpha \beta v = 0 = {\beta ^2}v - \alpha \beta v \Rightarrow {\alpha ^2} = \alpha \beta = {\beta ^2} \Rightarrow \alpha = \beta \hfill \\ {\alpha ^2} - \alpha \beta {v^2} = 1 = {\beta ^2} - \alpha \beta {v^2} \Rightarrow {\alpha ^2} = 1 + \alpha \beta {v^2} = {\beta ^2} \Rightarrow {\alpha ^2} = 1 + {\alpha ^2}{v^2} \hfill \\ \Rightarrow \alpha = \beta = \frac{1}{{\sqrt {1 - {v^2}} }} \hfill \\ \end{gathered}$$
We call this coefficient the "Lorentz factor", and denote it by $\gamma$. From linear algebra, we know then that the co-ordinates of any point can then be transformed into the reference frame $O'$ as follows:

$$\begin{gathered} \left[ {\begin{array}{*{20}{c}} {x'} \\ {t'} \end{array}} \right] = {L^{ - 1}}\left[ {\begin{array}{*{20}{c}} x \\ t \end{array}} \right] = {\gamma ^{ - 1}}{\left[ {\begin{array}{*{20}{c}} 1&v \\ v&1 \end{array}} \right]^{ - 1}}\left[ {\begin{array}{*{20}{c}} x \\ t \end{array}} \right] \\ = \sqrt {1 - {v^2}} \cdot \frac{1}{{1 - {v^2}}}\left[ {\begin{array}{*{20}{c}} 1&{ - v} \\ { - v}&1 \end{array}} \right]\left[ {\begin{array}{*{20}{c}} x \\ t \end{array}} \right] \\ = \frac{1}{{\sqrt {1 - {v^2}} }}\left[ {\begin{array}{*{20}{c}} 1&{ - v} \\ { - v}&1 \end{array}} \right]\left[ {\begin{array}{*{20}{c}} x \\ t \end{array}} \right] \\ = \gamma \left[ {\begin{array}{*{20}{c}} 1&{ - v} \\ { - v}&1 \end{array}} \right]\left[ {\begin{array}{*{20}{c}} x \\ t \end{array}} \right] \\ \end{gathered}$$
We may write this without matrices as:

$$\begin{gathered} x' = \gamma \left( {x - vt} \right) \\ t' = \gamma \left( {t - vx} \right) \\ \end{gathered}$$
Which updates the Galilean transformation discussed previously, which was $x'=x-vt,\ \ t' = t$.
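As a quick sanity check, here is a minimal sketch that applies these two formulas to the thought experiment from the section on simultaneity. It assumes natural units, and it assumes that $O'$ moves in the positive x-direction, so that $S_1$ sits at $x=-s$ and $S_2$ at $x=+s$. The names `gamma` and `boost` are our own.

```python
import math


def gamma(v):
    """Lorentz factor in natural units (c = 1); requires |v| < 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def boost(v, x, t):
    """Coordinates (x', t') of the event (x, t) in a frame moving at velocity v."""
    g = gamma(v)
    return g * (x - v * t), g * (t - v * x)


v = 0.6  # speed of O' relative to O (arbitrary choice)
s = 1.0  # distance from O to each source (arbitrary choice)

# Emission events in O's frame: both at t = 0, S1 at x = -s, S2 at x = +s.
x1, t1 = boost(v, -s, 0.0)
x2, t2 = boost(v, +s, 0.0)

# Only the difference t2 - t1 matters here; shifting the origin moves both equally.
print(t1, t2)   # t2 < t1: according to O', S2 fired first
print(t2 - t1)  # equals -2 * gamma * v * s

# A point on a light ray (x = t) stays on a light ray (x' = t').
x_ray, t_ray = boost(v, 2.0, 2.0)
print(x_ray, t_ray)
```

The two emissions are simultaneous for $O$ but not for $O'$, while the light ray keeps its slope of one, which is what the Galilean version failed to do.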

How does this look without natural units? Well, first of all,

$$\gamma = \frac{1}{{\sqrt {1 - \frac{{{v^2}}}{{{c^2}}}} }}$$
And

$$\begin{gathered} x' = \gamma \left( {x - \frac{v}{c}ct} \right) \hfill \\ ct' = \gamma \left( {ct - \frac{v}{c}x} \right) \hfill \\ \end{gathered}$$
You can see why we prefer to set $c=1$, but this is also instructive -- it presents a symmetry between $x$ and $ct$, and $v/c$ is the important "ratio factor" between these dimensions.
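For anyone who wants to plug in everyday numbers, here is a rough sketch of the same formulas in SI units. The function names `gamma_si` and `boost_si` are hypothetical, and the speeds are picked arbitrarily.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)


def gamma_si(v):
    """Lorentz factor for a speed v given in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def boost_si(v, x, t):
    """Boost an event (x in metres, t in seconds) into a frame moving at v m/s."""
    g = gamma_si(v)
    beta = v / C  # the "ratio factor" between x and ct
    x_prime = g * (x - beta * C * t)
    ct_prime = g * (C * t - beta * x)
    return x_prime, ct_prime / C


print(gamma_si(0.6 * C))      # close to 1.25
print(gamma_si(300.0) - 1.0)  # a tiny positive number for an everyday speed
```

At 300 m/s the correction to $\gamma$ is of the order of $10^{-13}$, which is why Galilean relativity works so well at everyday speeds.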

The transformations we've been calling "Lorentz transformations" are actually Lorentz boosts. Lorentz transformations are a broader set of transformations which includes boosts as well as spatial rotations -- essentially all linear transformations under which special relativity is invariant. An even broader set, called the Poincaré transformations, is the set of all affine transformations under which special relativity is invariant, i.e. it includes translations. As we will learn, General Relativity is only invariant under Lorentz transformations, not translations.

We imposed the condition $L(v)L(-v)=L(0)$. Do you think one may impose, in general, that $L(v)L(w)=L(v+w)$? Why or why not? ... Answer is "no", because the velocity addition formula is not, in general, $v+w$.
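To see this concretely, here is a small sketch that multiplies two boost matrices of the form $L(v)=\gamma\left[ {\begin{array}{*{20}{c}} 1&v \\ v&1 \end{array}} \right]$ and reads the combined velocity off the result. The helper names are our own. The number it produces should agree with the standard relativistic velocity-addition formula $(v+w)/(1+vw)$, which is quoted here rather than derived.

```python
import math


def boost_matrix(v):
    """L(v) = gamma * [[1, v], [v, 1]] in natural units, acting on (x, t) columns."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g * v], [g * v, g]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def combined_velocity(v, w):
    """Velocity of L(v) L(w): the ratio of an off-diagonal entry to a diagonal one."""
    m = matmul(boost_matrix(v), boost_matrix(w))
    return m[0][1] / m[0][0]


u = combined_velocity(0.6, 0.6)
print(u)  # about 0.882, not 1.2

# Boosting by v and then by -v gives back the identity, as imposed above.
identity = matmul(boost_matrix(0.6), boost_matrix(-0.6))
print(identity)
```

So two successive boosts of 0.6c compose into a single boost of roughly 0.88c, which also answers the "0.6c plus 0.6c" objection raised earlier.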

5. Zero orthogonal action of the Lorentz transformation

Something we haven't considered so far is how a Lorentz boost treats spatial directions orthogonal to the direction of the boost. We've been considering a Lorentz boost in the x-direction -- what happens to the y- and z- coordinates under this boost?

Well, turns out, the answer is nothing. The explanation for this is pretty simple: attach a paintbrush to a train and let it paint the walls of the tunnel as the train drives through. Now send another train in the opposite direction and attach a paintbrush to it at the same height. Neither paintbrush can be "higher" than the other -- the paintbrushes must overlap in all reference frames.