Special relativity I: spacetime diagrams and the invariance of the interval

In this post I will explain, to my liking, (parts of) two papers which I highly recommend:
  • Mermin, N. D. An introduction to space–time diagrams. Am. J. Phys. 65, 476 (1997).
  • Brill, D., Jacobson, T. Spacetime and Euclidean geometry. Gen Relativ Gravit 38, 643–651 (2006).

Both of them aim to give purely geometric derivations of the essential facts of special relativity, by the use of spacetime diagrams. Mermin's paper is particularly helpful to elucidate the meaning of spacetime diagrams, and that is where we start.

The main postulate of relativity is that all laws of physics look the same in every inertial frame of reference. A useful, but not entirely necessary, second postulate is that there is a thing called 'light' which is easy to produce and which travels in every (inertial) frame of reference at exactly the same speed $c$.

An event is just a name for a point in spacetime. It is something, or perhaps nothing, that happens at a particular location at a particular moment in time. Two different observers might assign different coordinates to events, but they will agree on what the events are. For instance: they may disagree on whether two beams of light reach two different points at the same time (the relativity of simultaneity), but they cannot disagree on whether two beams of light converge onto the same point at the same time! For the latter constitutes an event, which is an absolute notion.

A spacetime diagram is a useful graphical representation of events. To make our lives easier, we will think of a 2-dimensional spacetime (so that we have only one dimension of space), as opposed to our familiar 4-dimensional one. This is not quite enough to describe all interesting relativistic phenomena (see e.g. Thomas precession), but it's plenty enough for a good start. Let us agree, henceforth, that all observers use a particular system of units in which the speed of light is precisely $c = 1$. We will imagine our hypothetical universe has two observers, Alice and Bob, who are moving at uniform speed $v$ with respect to one another.

Alice's point of view

Imagine a spacetime diagram as an infinite sheet of paper. Alice will use this paper to depict events, so that points on the sheet correspond to events in spacetime; there is much freedom in choosing how this representation works, as we will see. The set of all events which happen at a particular location in space will correspond, in the diagram, to a straight (but not necessarily vertical) line. Why a straight line and not something curved? This is a convention, the possibility of which is justified by the translational symmetry of spacetime. This straight line is called a line of constant position. Two different lines of constant position must be parallel, for a point of intersection would correspond to an event that happens at two different locations in space, contradicting the nature of events. Alice will space these lines according to some scale factor $\lambda$, such that two lines of constant position whose distance within the diagram is $d$ correspond to events that happen at locations of spatial distance $d/\lambda$ (in her frame of reference).
Each line of constant position represents all events that happen at a particular location in space. The spatial distance between these locations, for the lines shown in the diagram, is $1$

Analogously, the set of all events which occur at a particular moment in time will correspond, in the diagram, to a straight line called a line of constant time. Alice is free to choose the angle $\theta$ these lines make with the lines of constant position, and will (by convention) use the same factor $\lambda$ to convert actual distances in time to distances within the diagram (so that two lines of constant time whose distance within the diagram is $d$ correspond to events that happen at moments $d/\lambda$ apart in time). Each line of constant time crosses each line of constant position at precisely one event, which happens at that time and that location.
A portion of Alice's grid of unit spatial and temporal distances

Now suppose there are two points $p, q$ in a line of constant time which are at distance $d$ in the diagram. They correspond to two events that happen at the same time and different locations; what is the actual distance $D$ between these locations? To see this, draw lines of constant position through $p$ and $q$:
Then $D = d'/\lambda$ where $d'$ is the distance in the diagram between the two lines. Since $d'/d = \sin(\theta)$, it follows that \[ D = \frac{d \sin(\theta)}{\lambda} = \frac{d}{\mu} \] where $\mu = \lambda/\sin(\theta)$ is another scaling factor. This same scaling factor also works for lines of constant position: if $p, q$ are at distance $d$ in a line of constant position, they represent events (at the same location) which are a distance $d/\mu$ in time.
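
As a quick sanity check of the relation $D = d/\mu$, here is a minimal Python sketch (mine, not from Mermin's paper). It assumes the line of constant time is drawn horizontally, which is an arbitrary choice, and the names `line_distance` and `spatial_distance` as well as the numerical values are purely illustrative.

```python
import math

def line_distance(p, q, direction):
    """Perpendicular distance between the parallel lines through p and q
    that share the given unit direction vector."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    # |cross product| of the separation with a unit vector along the lines
    return abs(dx * direction[1] - dy * direction[0])

def spatial_distance(d, lam, theta):
    """Spatial distance D between two events drawn a distance d apart on a
    line of constant time (drawn horizontally here, an arbitrary choice)."""
    p, q = (0.0, 0.0), (d, 0.0)
    # Lines of constant position make the angle theta with the horizontal.
    e_t = (math.cos(theta), math.sin(theta))
    d_prime = line_distance(p, q, e_t)   # diagram distance between the lines
    return d_prime / lam                 # Alice's convention: D = d'/lambda

lam, theta, d = 2.0, math.radians(60), 3.0   # illustrative values only
mu = lam / math.sin(theta)
print(spatial_distance(d, lam, theta), d / mu)   # the two numbers agree
```

The geometric computation (distance between the two lines of constant position, divided by $\lambda$) and the shortcut $d/\mu$ give the same number for any $\theta$ strictly between $0$ and $\pi$.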

Let us now bring in light. Suppose a beam of light (which we draw as a dashed line) travels between points $p, q$ in one unit of time. Draw lines of constant position and time through $p$ and $q$:
This is what Mermin calls a unit rhombus

Since these events are one unit of time and one unit of space apart (due to $c = 1$), the figure in the diagram is a rhombus (i.e. a quadrilateral with all sides of equal length) whose sides have length $\mu$. Therefore the light line bisects the angle between the lines of constant position and constant time. As a consequence, light lines meet each other at right angles: indeed, the other light line through $p$ (not drawn) bisects the angle $\pi - \theta$, so that the angle between the light lines is \[ \frac{\pi - \theta}{2} + \frac{\theta}{2} = \frac{\pi}{2} \]
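
The perpendicularity of the light lines can also be checked numerically. The sketch below is my own illustration, assuming lines of constant time are drawn horizontally; the function names are hypothetical. It builds the two diagonals of the unit rhombus from the side vectors and measures the angles.

```python
import math

def unit_rhombus_diagonals(lam, theta):
    """Diagonals of the unit rhombus, i.e. the two light directions through p.
    Lines of constant time are drawn horizontally (an arbitrary choice)."""
    mu = lam / math.sin(theta)                 # side length of the unit rhombus
    e_x = (1.0, 0.0)                           # along a line of constant time
    e_t = (math.cos(theta), math.sin(theta))   # along a line of constant position
    right = (mu * (e_t[0] + e_x[0]), mu * (e_t[1] + e_x[1]))
    left = (mu * (e_t[0] - e_x[0]), mu * (e_t[1] - e_x[1]))
    return right, left

def angle_between(a, b):
    """Unsigned angle (radians) between two plane vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    return math.acos(dot / (math.hypot(*a) * math.hypot(*b)))

right, left = unit_rhombus_diagonals(1.5, math.radians(70))
print(math.degrees(angle_between(right, left)))        # close to 90
print(math.degrees(angle_between(right, (1.0, 0.0))))  # close to 35, i.e. theta/2
```

The right angle comes out regardless of the values chosen for $\lambda$ and $\theta$, because the two sides of the rhombus have equal length.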

Bob's point of view

Now Bob, who is moving at uniform speed $v$ with respect to Alice, (metaphorically) comes in and looks over her diagram. Since each point in Alice's diagram already corresponds to a unique event in spacetime, Bob has no freedom in how to relabel this diagram according to his notions of time and space. Since he moves at uniform velocity with respect to Alice, for a given position (in Bob's frame) the set of all events occurring at that position in space will form a straight line in the diagram. These are Bob's lines of constant position, which are tilted with respect to Alice's lines of constant position by an angle $\alpha$; in the case where Alice's lines of constant position and constant time are perpendicular, one has $\tan(\alpha) = v$.

It may be tempting to immediately draw Bob's lines of constant time so as to make the angle with the lines of constant position be bisected by light lines. However, recall by this point Bob has no freedom! Instead, we must prove his lines of constant time satisfy this property. This is done via a very standard argument in relativity, which goes as follows. Suppose Bob is standing in the middle of a train and at point $p$ flashes beams of light towards each end $q, r$ of the train, which are then reflected back to him at point $s$. Let $t$ be the intersection of his line of constant position with the line $qr$.
The vertical-ish lines are Bob's lines of constant position. The outer lines represent the two ends of the train, and the middle line represents Bob's position.

Since he is standing at the middle of the train and the speed of light is the same in both directions, he considers $q$ and $r$ to be simultaneous events, so that the line $qr$ is a line of constant time. Note that $pqsr$ is a rectangle with $qr$ and $ps$ its diagonals. The angle that the leftmost line of constant position makes with the light line is the same as $\angle tpr$, which is the same as $\angle tqs$ due to these being congruent triangles. Thus indeed the light line $qs$ bisects the angle between Bob's constant position and constant time lines.
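
The train argument can be replayed with coordinates. This is only a sketch under extra assumptions of mine: Alice draws perpendicular axes with light lines at $45^\circ$, events are written as pairs `(x, t)` in her frame, the flash happens at the origin, and `h` is a hypothetical half-length of the train as measured by Alice.

```python
def train_events(v, h):
    """Events q, r, s of the train argument in Alice's coordinates (x, t).
    Bob's worldline is x = v*t, the train ends are x = v*t - h and x = v*t + h,
    and the flash p is at the origin. Requires 0 <= v < 1 and h > 0."""
    # Left-going light x = -t meets the rear end x = v*t - h.
    q = (-h / (1 + v), h / (1 + v))
    # Right-going light x = t meets the front end x = v*t + h.
    r = (h / (1 - v), h / (1 - v))
    # pqsr is a parallelogram with p at the origin, so s = q + r.
    s = (q[0] + r[0], q[1] + r[1])
    return q, r, s

v = 0.6
q, r, s = train_events(v, h=1.0)
slope_qr = (r[1] - q[1]) / (r[0] - q[0])   # dt/dx along Bob's line of constant time
print(slope_qr, s[0] / s[1])               # both equal v up to rounding
```

Bob's worldline has $dx/dt = v$ while the line $qr$ has $dt/dx = v$, so the two lines are mirror images of each other across the light line $x = t$, which is the bisection property. The reflected beams also meet at a point $s$ with $x/t = v$, that is, back on Bob's worldline.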

Note then that Bob's lines of constant time are not the same as Alice's (unless Bob is stationary with respect to Alice). This means that events which Alice considers to happen simultaneously are considered by Bob to happen at different moments in time. This phenomenon is known as the relativity of simultaneity, and lies at the heart of many of special relativity's apparent paradoxes, a famous example being the Ladder paradox.

Let now $\lambda_B$ be Bob's scaling factor for distances between lines of constant position (which is forced upon him). This scaling factor need not equal Alice's $\lambda_A$. If $\lambda'_B$ is Bob's scaling factor for distances between lines of constant time, we must show that $\lambda'_B = \lambda_B$ (as holds for Alice by her own convention). Let a beam of light be emitted at a point $p$ and absorbed one unit of Bob's time later at point $q$. We draw lines of constant position and time through $p$ and $q$ to form a parallelogram:

If $\theta_B$ is the angle between Bob's lines of constant position and constant time, then the vertical-ish sides of the parallelogram are equal to $\mu'_B = \lambda'_B/\sin(\theta_B)$, and the horizontal-ish sides are equal to $\mu_B = \lambda_B/\sin(\theta_B)$. Since the diagonal of this parallelogram bisects the angle between its sides, it follows that it must be a rhombus, and hence $\mu_B = \mu'_B$, so that $\lambda'_B = \lambda_B$ as was to be shown.

We thus conclude that Bob's labelings of Alice's diagram will work exactly the same way as if he had drawn the diagram himself to begin with, so that the procedure for drawing spacetime diagrams is correctly observer-independent. In his paper, Mermin goes on to establish a relationship between the scaling factors: \[ \lambda_A \mu_A = \lambda_B \mu_B \] From this relationship he extracts the invariance of the spacetime interval $(\Delta t)^2 - (\Delta x)^2$. I find his proofs a bit technical and confusing, so now is a good time to switch over to Brill and Jacobson's paper.

The invariance of the interval

Let us agree to orient our spacetime diagrams in such a way so that the light lines make an angle of $45^\circ$ with the vertical direction, and so that the forward direction in time is the upwards direction in the diagram (as we have been doing). Any line whose angle with the vertical direction is less than $45^\circ$ may represent a line of constant position for some inertial observer (e.g. an observer who travels along such a line). Such lines are said to be timelike.

Reciprocally, any line whose angle with the horizontal direction is less than $45^\circ$ may represent a line of constant time for some inertial observer (e.g. the observer who travels along a line symmetric to the given one with respect to the diagonal). Such lines are said to be spacelike. The diagonal lines, which represent possible trajectories of light, are said to be lightlike (or sometimes null).

Given some timelike segment $pq$, its proper time $(pq)_m$ (the 'm' being for Minkowski) is defined to be the distance in time between $p$ and $q$ with respect to an observer for whom the line $pq$ is a line of constant position (e.g. an observer who travels along $pq$). In other words, it is the temporal distance measured by an observer for whom both events happen at the same point in space. Since all such observers are stationary with respect to one another, this concept is well-defined. We denote by $(pq)_e$ the usual Euclidean length of the segment (within the diagram). Therefore \[ (pq)_e = \mu (pq)_m \] where $\mu$ is the scaling factor for the aforementioned observer.

Similarly, given some spacelike segment $p'q'$, its proper length is defined to be the distance in space between $p'$ and $q'$ with respect to an observer for whom the line $p'q'$ is a line of constant time (i.e. for whom both events happen at the same moment in time). Again one has $(p'q')_e = \mu (p'q')_m$ for such an observer.

The interval between events $p$ and $q$ is defined as $(\Delta t)^2 - (\Delta x)^2$, where $\Delta t$ (resp. $\Delta x$) is the temporal (resp. spatial) distance between $p$ and $q$, in some reference frame; we will show that this is the same for all reference frames. Note that, with respect to an observer for whom $pq$ is a line of constant position, one has $(\Delta t)^2 - (\Delta x)^2 = (pq)_m^2$. Similarly, with respect to an observer for whom $pq$ is a line of constant time, one has $(\Delta t)^2 - (\Delta x)^2 = -(pq)_m^2$.

Let now $pq$ be some timelike segment. We may construct upon this segment a rhombus whose diagonals are light lines; Brill and Jacobson call this a Minkowski square. There are actually two possibilities for how to do this (depending on whether the light beams from $pq$ go to the left or to the right), but they are congruent.
If Alice is an observer for which the line $pq$ is a line of constant position (and thus the line $pp'$ is of constant time), then the (Euclidean) area of this rhombus, within the diagram, is \[ (pq)_e^2 \sin(\theta_A) = (pq)_m^2 \mu_A^2 \sin(\theta_A) = (pq)_m^2 \frac{\lambda_A^2}{\sin(\theta_A)} = (pq)_m^2 \lambda_A \mu_A \] Hence the fact that the square of the proper time $(pq)_m^2$ is proportional to the area of the Minkowski square built upon $pq$ is equivalent to Mermin's proposition that the product of the scaling factors $\lambda \mu$ is equal for all observers. We will prove this fact following Brill and Jacobson. But first, a lemma.

Suppose Alice and Bob meet at a point $p$ while traveling at some uniform speed with respect to one another. Afterwards, at point $q$, Alice emits a light signal which Bob receives at $r'$; similarly, at point $q'$ Bob emits a light signal which Alice receives at $r$.
The light lines are supposed to be perpendicular
If $(pq)_m = (pq')_m$, so that Alice and Bob each wait the same amount of time (with respect to their individual clocks) before sending the signal, then we must have $(pr)_m = (pr')_m$ (i.e. both will take equally long to receive their signal), since the situation is completely symmetric. If, instead, one had $(pq)_m = 2(pq')_m$, then we must have $(pr')_m = 2(pr)_m$. Indeed, we can imagine that Alice and Bob actually send two signals, at equally spaced time intervals of $(pq')_m$. Then they each also receive signals at equally spaced intervals of $(pr)_m$, again by symmetry, so that Bob waits $2(pr)_m$ before getting the second signal, which is the true one. In general (and this is our lemma), we have \[ \frac{(pr')_m}{(pr)_m} = \frac{(pq)_m}{(pq')_m} \]
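
Here is a small numerical consistency check of the lemma. Note the caveat: to compute proper times it uses the interval formula $(\Delta t)^2 - (\Delta x)^2$, which is only established later in this post, so it illustrates the lemma rather than supporting the proof. It also assumes Alice is at rest at $x = 0$ and Bob moves along $x = vt$; events are pairs `(t, x)` and all names are hypothetical.

```python
import math

def proper_time(event):
    """Proper time from the origin to a timelike event (t, x), using the
    interval formula that the post derives later (assumed here)."""
    t, x = event
    return math.sqrt(t * t - x * x)

def signal_exchange(v, tau_alice, tau_bob):
    """Alice rests at x = 0; Bob moves along x = v*t; both pass through p = (0, 0).
    Returns q, r' (Alice's signal) and q', r (Bob's signal) as (t, x) pairs.
    Requires 0 <= v < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    q = (tau_alice, 0.0)                     # Alice emits after tau_alice on her clock
    t_rp = tau_alice / (1.0 - v)             # solve v*t = t - tau_alice
    r_prime = (t_rp, v * t_rp)               # Bob receives Alice's signal
    q_prime = (gamma * tau_bob, gamma * v * tau_bob)   # Bob emits after tau_bob on his clock
    r = (q_prime[0] + q_prime[1], 0.0)       # light travels back to x = 0
    return q, r_prime, q_prime, r

q, r_prime, q_prime, r = signal_exchange(v=0.6, tau_alice=2.0, tau_bob=1.0)
lhs = proper_time(r_prime) / proper_time(r)   # (pr')_m / (pr)_m
rhs = proper_time(q) / proper_time(q_prime)   # (pq)_m / (pq')_m
print(lhs, rhs)                               # the two ratios agree
```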

We now show that the squared proper time $(pq)_m^2$ of a given timelike segment $pq$ is proportional to the area of the Minkowski square built upon it (with proportionality constant $\lambda \mu$). First we observe that one may construct upon $pq$ a triangle $pqr$ with the two other sides being lightlike (again, there are two congruent possibilities for this triangle):
The triangle $pqr$ is a null triangle


The Minkowski square on $pq$ is composed of four congruent null triangles, so that its area is four times that of the null triangle $pqr$. Therefore it suffices to show that $(pq)_m^2$ is proportional to the area of the null triangle built upon $pq$.

Consider then two timelike segments (which, by translation invariance, we may suppose have a common origin) $pq$ and $pq'$, and let us suppose at first that there is a light line connecting $q'$ to $q$:
The triangle $pqs$ is a null triangle for $pq$, and $pq's$ a null triangle for $pq'$. Let $A$ be the area of $pqs$, and $A'$ the area of $pq's$. We wish to show that \[ \frac{A}{A'} = \frac{(pq)_m^2}{(pq')_m^2} \] Note that $A = (1/2) (ps)_e (qs)_e$, and $A' = (1/2) (ps)_e (q's)_e$, since the light lines through $s$ are perpendicular. Let $r$ be the point on $pq$ from which a light signal reaches $q'$, so that $rq'$ is parallel to $ps$. By Thales's theorem we know (since the light lines are parallel) that \[ \frac{A}{A'} = \frac{(qs)_e}{(q's)_e} = \frac{(pq)_e}{(pr)_e} = \frac{(pq)_m}{(pr)_m} \] where the last equality holds because both segments lie on the line $pq$, so that the scaling factor $\mu$ corresponding to that line cancels between top and bottom. By the previous lemma we know \[ \frac{(pq)_m}{(pq')_m} = \frac{(pq')_m}{(pr)_m} \] so that $(pq')_m^2 = (pq)_m (pr)_m$. Hence we conclude \[ \frac{(pq)_m^2}{(pq')_m^2} = \frac{(pq)_m}{(pr)_m} = \frac{A}{A'} \] as was to be shown. In general it may not be the case that $q'$ can be joined to $q$ by a light line. However, we may always scale $pq'$ by a factor $\alpha$ for this to be so; that will scale both $(pq')_m^2$ and $A'$ by $\alpha^2$, so that the proportionality is maintained.
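
The Thales step is purely Euclidean, so it is easy to check with coordinates. In the sketch below (my own, with hypothetical names) the diagram axes are taken along the two perpendicular light directions, with $p$ at the origin and $s$ on the first axis. I am assuming that $r$ denotes the point of $pq$ from which a light signal reaches $q'$, which is what the lemma requires.

```python
import math

def null_triangle_check(a, b, b_prime):
    """Diagram coordinates along the two perpendicular light directions.
    p = (0, 0), s = (a, 0), q = (a, b), q' = (a, b_prime), with a, b, b_prime > 0.
    Returns (A / A', (pq)_e / (pr)_e)."""
    q = (a, b)
    area = 0.5 * a * b                 # null triangle pqs, right angle at s
    area_prime = 0.5 * a * b_prime     # null triangle pq's, right angle at s
    # r lies on pq, on the light line through q' parallel to ps
    # (assumed meaning of r): its second coordinate equals b_prime.
    fraction = b_prime / b
    r = (fraction * a, fraction * b)
    return area / area_prime, math.hypot(*q) / math.hypot(*r)

print(null_triangle_check(3.0, 4.0, 1.0))   # both entries are 4.0
```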

We have thus shown that the product of the scaling factors $k = \lambda \mu$ is observer-independent, and that for a timelike segment $pq$ one has $A = k (pq)_m^2$ where $A$ is the area of the Minkowski square built upon $pq$. The same result is true for spacelike $pq$, with completely analogous proofs. Let us suppose, from now on, that the observers agree on choosing scaling factors such that $k = 1$.

We now come to our main point, which is to establish the formula $(pq)_m^2 = (\Delta t)^2 - (\Delta x)^2$ for a timelike segment $pq$. This fact is evocative of the Pythagorean theorem, and, indeed, Brill and Jacobson give a beautiful proof that mimics the common proof of the Pythagorean theorem by rearranging triangles within a square. Let us see how that goes.
Through the point $p$ Alice draws a line of constant position; the point $q'$ on that line is chosen so that the line $q'q$ is a line of constant time. This defines the triangle $1$, and we draw congruent triangles $2,3,4$ on the other sides of the Minkowski square on $pq$. We denote by $A$ the area of the Minkowski square; recall that we've shown $(pq)_m^2 = A$.

The temporal distance $\Delta t$ between $p$ and $q$ in Alice's frame of reference is given by \[ \mu_A \Delta t = (pq')_e \] Similarly the spatial distance $\Delta x$ between $p$ and $q$ in her frame is given by \[ \mu_A \Delta x = (q'q)_e \] Therefore the area $A_x$ of the Minkowski square on $q'q$ is \[ A_x = (q'q)_e^2 \sin(\theta_A) = (\Delta x)^2 \mu_A^2 \sin(\theta_A) = (\Delta x)^2 \] where we used $\mu_A^2 \sin(\theta_A) = \lambda_A \mu_A = 1$.

By sliding triangles $1$ and $3$ the former picture can be transformed into the following:
The bigger leg of the triangles has length $(pq')_e$, so that the area of the resulting Minkowski square is \[ A_t = (pq')_e^2 \sin(\theta_A) = (\Delta t)^2 \mu_A^2 \sin(\theta_A) = (\Delta t)^2 \] Since $A_t = A + A_x$, we can finally conclude \[ (pq)_m^2 = A = A_t - A_x = (\Delta t)^2 - (\Delta x)^2 \] as was to be shown.
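
To see the three areas concretely, here is a sketch in the simplest diagram: I assume Alice draws perpendicular axes with $\lambda_A = \mu_A = 1$ (so $k = 1$) and light lines at $45^\circ$, with points written as `(x, t)`. In that diagram the mirror image of a segment across the light line $x = t$ is obtained by swapping coordinates. The function names are hypothetical.

```python
def shoelace_area(vertices):
    """Euclidean area of a simple polygon given by its vertices in order."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def minkowski_square(dx, dt):
    """Vertices of the Minkowski square built on the segment from the origin
    to (dx, dt): the second side is its mirror image (dt, dx) across the
    light line x = t, so both diagonals are light lines."""
    return [(0.0, 0.0), (dx, dt), (dx + dt, dt + dx), (dt, dx)]

dx, dt = 3.0, 5.0
A = shoelace_area(minkowski_square(dx, dt))      # square on pq
A_t = shoelace_area(minkowski_square(0.0, dt))   # square on the leg pq'
A_x = shoelace_area(minkowski_square(dx, 0.0))   # square on the leg q'q
print(A, A_t - A_x)                              # 16.0 16.0
```

With these conventions the squares on the two legs are ordinary squares of areas $(\Delta t)^2$ and $(\Delta x)^2$, and the rhombus on $pq$ has area $(\Delta t)^2 - (\Delta x)^2$.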

The proof that, for spacelike segments $pq$, one has $(pq)_m^2 = (\Delta x)^2 - (\Delta t)^2$ is completely analogous.

Conclusion

Consider $M^2$ to be the real vector space $\mathbb{R}^2$, but equipped with the Minkowski metric \[ \eta(v, w) = v_1 w_1 - v_2 w_2 \] $M^2$ is called the $(1+1)$-dimensional Minkowski spacetime. By fixing an origin (and an orientation of time and space), Alice can identify spacetime with $M^2$ using her coordinates $t, x$. We may characterize vectors $v \in M^2$ as:
  • timelike, if $\eta(v, v) > 0$
  • spacelike, if $\eta(v, v) < 0$
  • lightlike, if $\eta(v, v) = 0$

This matches our previous understanding: $v$ is timelike precisely when the segment from the origin to (the endpoint of) $v$ is timelike, and so on. If Bob fixes the same origin as Alice's, obtaining his own identification of spacetime with $M^2$ by his coordinates $t', x'$, these will differ by a linear transformation $L : M^2 \to M^2$ such that \[ \eta(L(v), L(v)) = \eta(v, v) \] for all $v \in M^2$, since that is the invariance of the interval we've just shown. It follows by the polarization identity that $L$ preserves the metric $\eta$. A linear transformation that preserves $\eta$ is called a Lorentz transformation, and the group of all such transformations, the Lorentz group, is denoted $O(1, 1)$ (there are natural generalizations to an $(n+1)$-dimensional spacetime, with $n$ dimensions of space, for which one has the Lorentz group $O(n, 1)$).
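
For concreteness, here is a minimal sketch of $\eta$, the classification of vectors, and the polarization identity used above. Vectors are pairs `(t, x)`; the function names and the tolerance `tol` are my own choices, not part of any standard library.

```python
def eta(v, w):
    """Minkowski metric on M^2 for vectors given as (t, x) pairs."""
    return v[0] * w[0] - v[1] * w[1]

def classify(v, tol=1e-12):
    """Classify a vector by the sign of eta(v, v); tol absorbs rounding."""
    s = eta(v, v)
    if s > tol:
        return "timelike"
    if s < -tol:
        return "spacelike"
    return "lightlike"

def eta_from_polarization(v, w):
    """Recover eta(v, w) from the quadratic form v -> eta(v, v) alone."""
    plus = (v[0] + w[0], v[1] + w[1])
    minus = (v[0] - w[0], v[1] - w[1])
    return (eta(plus, plus) - eta(minus, minus)) / 4.0

print(classify((2.0, 1.0)), classify((1.0, 2.0)), classify((1.0, -1.0)))
print(eta((2.0, 1.0), (3.0, 5.0)), eta_from_polarization((2.0, 1.0), (3.0, 5.0)))
```

Since `eta_from_polarization` only ever evaluates $\eta(u, u)$, any linear map that preserves $\eta(u, u)$ for all $u$ automatically preserves $\eta(v, w)$ for all pairs.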

To obtain the familiar formula for Lorentz transformations, note $L$ must preserve the lines $t + x = 0$ and $t - x = 0$, since these are the lightlike vectors. Also, since the area of the unit rhombus (introduced in the beginning) is the product of the scaling factors, which is observer-independent (and in our convention equal to $1$) it follows $L$ must be area-preserving. Supposing Alice and Bob fixed the same orientation of space and time, we must have \[ \left\{ \begin{aligned} t - x & = e^{\beta} (t' - x') \\ t + x & = e^{-\beta} (t' + x') \end{aligned} \right. \] (To wit: each factor must be positive in order to maintain the orientations of space and time, so that we write them as exponentials. They must multiply to $1$ in order for $L$ to be area-preserving.) Hence we can solve to get \[ \left\{ \begin{aligned} t & = \cosh(\beta)t' - \sinh(\beta)x' \\ x & = \cosh(\beta)x' - \sinh(\beta) t' \end{aligned} \right. \] (see hyperbolic functions). This can be interpreted as a hyperbolic rotation, which I might explain in a future post; but enough for now.
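
The final formulas are easy to test numerically. The sketch below (function name and sample values are mine) applies the transformation to Bob's coordinates and checks the light-cone relations and the invariance of the interval.

```python
import math

def lorentz(beta, t_prime, x_prime):
    """Alice's coordinates (t, x) from Bob's (t', x'), using the formulas above."""
    t = math.cosh(beta) * t_prime - math.sinh(beta) * x_prime
    x = math.cosh(beta) * x_prime - math.sinh(beta) * t_prime
    return t, x

beta, tp, xp = 0.7, 2.0, 0.5           # illustrative values only
t, x = lorentz(beta, tp, xp)

print(t - x, math.exp(beta) * (tp - xp))     # light-cone coordinate scaled by e^beta
print(t + x, math.exp(-beta) * (tp + xp))    # the other one scaled by e^(-beta)
print(t * t - x * x, tp * tp - xp * xp)      # the interval is unchanged

# Bob's own worldline x' = 0 is the line x/t = -tanh(beta) for Alice.
t0, x0 = lorentz(beta, 1.0, 0.0)
print(x0 / t0, -math.tanh(beta))
```

With the sign convention used in the formulas above, the last line shows how $\beta$ relates to the relative velocity: Bob's line of constant position $x' = 0$ has slope $x/t = -\tanh(\beta)$ in Alice's coordinates.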
