The joy of quaternions

Quaternions are a lovely topic that usually doesn't get much attention in the standard undergraduate and graduate mathematics textbooks. In this post I'll try to go over the basic ideas and motivations of the theory in a hopefully illuminating way; let's get started.

The complex numbers $\mathbb{C}$ are a wonderful tool for describing geometry in the plane, since (among other things) complex multiplication describes both rotations and scalings at once, in a neatly concise way. Hamilton's plan, which eventually led him to the discovery of quaternions, was to develop an analogous number system that would help describe geometry in three-dimensional space. So how might you do that? Naively, you might try to just extend the complex numbers and take numbers of the form \[ a + bi + cj \] where $a,b,c \in \mathbb{R}$, $i$ is the usual imaginary unit and $j$ is a new quantity which also satisfies $j^2 = -1$.

To define a multiplication of these new numbers, we essentially just need to decide what $ij$ (and $ji$) should be. For that, we draw some inspiration from the complex numbers: one of the great things about them is that each nonzero complex number $z = a + bi$ has an inverse $z^{-1}$ such that $z z^{-1} = 1$. This inverse is found by defining the complex conjugate $\overline{z} = a - bi$ and noticing that \[ z \overline{z} = (a+bi)(a-bi) = a^2 + b^2 \] which (if $z \neq 0$) is a nonzero real number. Therefore \[ z^{-1} = \frac{\overline{z}}{a^2 + b^2} \]

Attempting the same for one of our would-be numbers $q = a + bi + cj$, we might define $\overline{q} = a - bi - cj$ and hope that $q \overline{q} = a^2 + b^2 + c^2$. Well, let's try: \[ q \overline{q} = (a + bi + cj)(a - bi - cj) = a^2 + b^2 + c^2 - bc(ij+ji) \] So it seems we need to set $ij + ji = 0$, i.e. $ij = -ji$. While that's fine and dandy, it still doesn't tell us what $ij$ actually is as an expression of the form $a + bi + cj$. But here's something cool: what's $(ij)^2$? Using $ji = -ij$ to swap the middle two factors, we compute easily: \[(ij)^2 = ijij = -i^2 j^2 = -1 \] What Hamilton eventually realized is that $ij$ needs to be an entirely new quantity, independent of $1$, $i$ and $j$: let's call it $k = ij$. We've just seen that $k^2 = -1$ as well.
So the quaternions are expressions of the form: \[ \mathbb{H} = \{a + bi + cj + dk\ |\ a,b,c,d \in \mathbb{R} \} \] To multiply these numbers, all we need to know is that $i^2 = j^2 = k^2 = -1$ and $ij = -ji = k$. For example, we can deduce that \[ ik = i(ij) = i^2 j = -j \] and similarly $ki = j$, $kj = -i$ and $jk = i$. You might worry it'll be a pain to memorize all of that, but worry not! It's easy if you picture $i$, $j$, $k$ placed around a circle, with arrows pointing $i \to j \to k \to i$:
If you go along the arrows, there is no sign: $ij = k$, $jk = i$ and $ki = j$. If you go against the arrows, you pick up a minus sign: $ji = -k$, $ik = -j$ and $kj = -i$.

Exercise: Convince yourself that the multiplication of quaternions is entirely determined by $i^2 = j^2 = k^2 = ijk = -1$ (the way it is usually presented).
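To see these rules in action, here is a minimal Python sketch, assuming quaternions are stored as plain 4-tuples `(a, b, c, d)` standing for $a + bi + cj + dk$ (the helper names `qmul` and `show` are just illustrative). The product below is expanded using only $i^2 = j^2 = k^2 = -1$ and $ij = -ji = k$ together with its cyclic versions; the script prints the multiplication table of $i, j, k$ and checks that $ijk = -1$.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk,
    expanded using i^2 = j^2 = k^2 = -1, ij = -ji = k, jk = -kj = i, ki = -ik = j."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
LABELS = {ONE: "1", I: "i", J: "j", K: "k"}


def show(q):
    """Name a quaternion of the form +-1, +-i, +-j, +-k."""
    for unit, label in LABELS.items():
        if q == unit:
            return label
        if q == tuple(-x for x in unit):
            return "-" + label
    return str(q)


# Row x, column y holds the product xy.
print("    " + "".join(f"{LABELS[y]:>4}" for y in (I, J, K)))
for x in (I, J, K):
    print(f"{LABELS[x]:>4}" + "".join(f"{show(qmul(x, y)):>4}" for y in (I, J, K)))

assert qmul(qmul(I, J), K) == (-1, 0, 0, 0)  # ijk = -1
```

Running it prints

```
       i   j   k
   i  -1   k  -j
   j  -k  -1   i
   k   j  -i  -1
```

which matches the arrow rule: products along the cycle $i \to j \to k \to i$ carry no sign, products against it carry a minus sign.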

Alright, so now we have quaternions (technically you should check things like associativity; I'll leave those to the very diligent reader). Crucially, we do obtain our sought-after dream: for a given quaternion $q = a + bi + cj + dk$, defining $\overline{q} = a - bi - cj - dk$ we have \[ q \overline{q} = a^2 + b^2 + c^2 + d^2 \] Therefore nonzero quaternions can be inverted using the same trick we used for complex numbers: $q^{-1} = \overline{q}/(a^2 + b^2 + c^2 + d^2)$. Note, though, that multiplication of quaternions is not commutative ($ij = k$ but $ji = -k$)! Because of that quirk we call the quaternions a division ring rather than a field (more precisely, a real division algebra, since we can also multiply quaternions by real numbers).
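A minimal Python sketch of the inversion trick, assuming quaternions are stored as 4-tuples `(a, b, c, d)` for $a + bi + cj + dk$; `qmul`, `conj`, `norm2` and `inverse` are hypothetical helper names, not a library API. It checks that $q\overline{q}$ is real, that $\overline{q}/|q|^2$ really is an inverse, and that $ij \neq ji$.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def conj(q):
    """Quaternion conjugate: flip the signs of the i, j, k parts."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def norm2(q):
    """Squared norm a^2 + b^2 + c^2 + d^2, i.e. the real number q * conj(q)."""
    return sum(x * x for x in q)


def inverse(q):
    """q^{-1} = conj(q) / |q|^2, mirroring the complex-number trick."""
    n2 = norm2(q)
    if n2 == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    return tuple(x / n2 for x in conj(q))


q = (1.0, 2.0, 3.0, 4.0)
print(qmul(q, conj(q)))     # (30.0, 0.0, 0.0, 0.0): a real number, as promised
print(qmul(q, inverse(q)))  # (1, 0, 0, 0) up to floating-point rounding

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(qmul(i, j), qmul(j, i))  # (0, 0, 0, 1) (0, 0, 0, -1): ij = k but ji = -k
```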

Exercise: Check that quaternion multiplication plays well with the Euclidean norm: \[ |qq'| = |q||q'| \] for any $q, q' \in \mathbb{H}$. As a hint, recall that $q\overline{q} = |q|^2$, and check that $\overline{qq'} = \overline{q'}\,\overline{q}$.
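Before proving it, you can gain some confidence numerically. This is a quick random test in Python, not a proof, assuming quaternions stored as 4-tuples `(a, b, c, d)`; `qmul` and `norm` are illustrative helper names.

```python
import math
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    """Euclidean norm |q| = sqrt(a^2 + b^2 + c^2 + d^2)."""
    return math.sqrt(sum(x * x for x in q))


rng = random.Random(0)  # fixed seed, so the run is reproducible
worst = 0.0
for _ in range(1000):
    q = tuple(rng.uniform(-5, 5) for _ in range(4))
    r = tuple(rng.uniform(-5, 5) for _ in range(4))
    # Relative discrepancy between |q r| and |q| |r|.
    err = abs(norm(qmul(q, r)) - norm(q) * norm(r)) / (norm(q) * norm(r))
    worst = max(worst, err)
print(f"largest relative discrepancy: {worst:.1e}")  # only floating-point rounding
```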

By this point you might feel you've been swindled: weren't we supposed to get something to help us do three-dimensional geometry? How is a four-dimensional algebra of quaternions going to help? Rather amazingly, though, it does! For that, we introduce the notion of a pure quaternion (also called a vector quaternion), which is just a quaternion with no real part: something of the form $ai + bj + ck$. As a vector space, we can identify pure quaternions with $\mathbb{R}^3$.

Exercise: Given pure quaternions $u = ai + bj + ck$, $v = di + ej + fk$, check that \[uv = -((a, b, c) \cdot (d, e, f)) + ((a,b,c) \times (d,e,f)) \cdot (i,j,k) \] Thus pure quaternion multiplication encodes both the dot and the cross product of three-dimensional vectors. Quaternions, discovered in 1843, actually significantly predate the dot and cross product, which Gibbs introduced in the 1880s (they became widely known through the 1901 textbook based on his lectures).
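A quick numerical check of this identity in Python, assuming 4-tuples `(a, b, c, d)` for $a + bi + cj + dk$ and 3-tuples for vectors of $\mathbb{R}^3$; the helpers `qmul`, `dot`, `cross` and `pure` are illustrative, and random vectors stand in for the symbolic computation the exercise asks for.

```python
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def pure(v):
    """Identify (a, b, c) in R^3 with the pure quaternion ai + bj + ck."""
    return (0.0, *v)


rng = random.Random(1)
u = tuple(rng.uniform(-1, 1) for _ in range(3))
v = tuple(rng.uniform(-1, 1) for _ in range(3))

product = qmul(pure(u), pure(v))
expected = (-dot(u, v), *cross(u, v))  # real part -u.v, vector part u x v
print(max(abs(x - y) for x, y in zip(product, expected)))  # zero up to rounding
```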
Exercise: When is a pure quaternion a square root of $-1$?

Rotations

Of course, what we're really here for is the fact that quaternions can represent rotations. So how does that work? Let us introduce the first major player: \[ \text{Sp}(1) = \{q \in \mathbb{H}\ |\ q \overline{q} = 1\} \] This is the group of unit quaternions; it is a Lie group which is topologically the sphere $S^3$. Its Lie algebra is actually the space of pure quaternions: differentiating $q(t)\overline{q(t)} = 1$ at $t = 0$ along a curve with $q(0) = 1$ gives $q'(0) + \overline{q'(0)} = 0$, so every tangent vector at $1$ is a pure quaternion, and both spaces are $3$-dimensional. So we denote the space of pure quaternions by $\text{sp}(1)$ (if you don't understand these Lie theory remarks, just ignore them; they won't be essential for what follows).

Exercise: Check that if $q \in \text{Sp}(1)$ is a unit quaternion and $u \in \text{sp}(1)$ is a pure quaternion then $\text{Ad}_q(u) = quq^{-1}$ is also a pure quaternion (this is the adjoint representation).

Therefore a unit quaternion $q \in \text{Sp}(1)$ can act on $\mathbb{R}^3$ (identified with the space of pure quaternions via the basis $i,j,k$) by conjugation: $u \mapsto quq^{-1}$. For a fixed $q$, this is a linear map in $u$, and we easily check it preserves the Euclidean norm: \[ |quq^{-1}| = |q| |u| |q|^{-1} = |u| \] We see then that $\text{Ad}_q(u) = quq^{-1}$ is an orthogonal linear map of $\mathbb{R}^3$. Since $\text{Ad}_{qq'} = \text{Ad}_q \circ \text{Ad}_{q'}$, the assignment $q \mapsto \text{Ad}_q$ is a (continuous) group homomorphism $\Phi : \text{Sp}(1) \to O(3)$. Since $\text{Sp}(1)$ is connected (being a sphere), its image is a connected subset of $O(3)$ containing the identity, so it lies in the identity component $\text{SO}(3)$. Thus we actually have a homomorphism \[\Phi : \text{Sp}(1) \to \text{SO}(3)\] Rotations! The first question is: what is $\text{ker}\ \Phi$? Well, if conjugation by $q$ is the identity on pure quaternions, then $q$ commutes with $i, j, k$. Since $q$ obviously commutes with $1$ as well, it commutes with all quaternions. You can check as an exercise that the center of the quaternion algebra is $\mathbb{R}$ (seen as the multiples of $1$). Hence $q$ is real; since furthermore $q$ has norm 1, it follows that \[ \text{ker}\ \Phi = \{\pm 1\} \] The harder part is to show that $\Phi$ is surjective, so that it can indeed represent all rotations. Since it is such an important and fascinating fact, I will offer you a couple of different proofs.
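Here is a numerical sketch of $\Phi$ in Python, assuming quaternions stored as 4-tuples `(a, b, c, d)`; the names `qmul`, `conj`, `ad_matrix`, `det3` and `is_orthogonal` are hypothetical helpers. It builds the $3 \times 3$ matrix of $\text{Ad}_q$ in the basis $i, j, k$ for one arbitrarily chosen unit quaternion, then checks that the matrix is orthogonal with determinant $1$ and that $q$ and $-q$ give the same matrix, as the kernel computation predicts.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)


def ad_matrix(q):
    """Matrix of u -> q u q^{-1} on pure quaternions, in the basis i, j, k.
    Assumes q is a unit quaternion, so that q^{-1} = conj(q)."""
    basis = [(0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
    # images[c] is Ad_q of the c-th basis vector, with its (zero) real part dropped.
    images = [qmul(qmul(q, e), conj(q))[1:] for e in basis]
    # The images are the columns of the matrix.
    return [[images[c][r] for c in range(3)] for r in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_orthogonal(m, tol=1e-12):
    """Check that M^T M is the identity, entry by entry."""
    return all(
        abs(sum(m[k][r] * m[k][c] for k in range(3)) - (1.0 if r == c else 0.0)) < tol
        for r in range(3)
        for c in range(3)
    )


raw = (0.3, -1.2, 0.5, 2.0)
length = math.sqrt(sum(x * x for x in raw))
q = tuple(x / length for x in raw)  # a unit quaternion
R = ad_matrix(q)
R_minus = ad_matrix(tuple(-x for x in q))

print(is_orthogonal(R), abs(det3(R) - 1.0) < 1e-12)  # True True
# q and -q induce the same rotation:
print(max(abs(R[r][c] - R_minus[r][c]) for r in range(3) for c in range(3)) < 1e-12)  # True
```

For instance, `ad_matrix((0, 1, 0, 0))` is the diagonal matrix with entries $1, -1, -1$: conjugation by $i$ fixes $i$ and negates $j$ and $k$.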

The first proof is probably my favorite, because it yields a really cool and useful formula to determine which quaternion represents a desired rotation. We begin by observing that, given a pure quaternion $u$, it makes perfect sense to take the exponential $e^u$. You can view this as being defined by the usual power series for $\text{exp}$, or alternatively view it as the exponential map $\text{sp}(1) \to \text{Sp}(1)$ from a Lie algebra to its Lie group. If $u$ is a unit pure quaternion, then $u^2 = -1$ (giving away the answer to a previous exercise); the same power series argument as usual will prove that, for any $\theta \in \mathbb{R}$, \[ e^{\theta u} = \cos(\theta) + u \sin(\theta) \] Fact: Interpreting the unit pure quaternion $u$ as a unit vector in $\mathbb{R}^3$ (thus specifying an oriented axis), $\Phi(e^{\theta u})$ is a rotation of angle $2\theta$ about the axis $u$.
Proof: You can do it by just cranking out the computation; it's done here. $\square$
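If you'd rather see the Fact numerically than crank out the algebra, here is a small Python check, assuming quaternions as 4-tuples `(a, b, c, d)` and vectors as 3-tuples; `qexp_pure`, `rotate` and `rodrigues` are hypothetical helper names. It compares $\text{Ad}_{e^{\theta u}}$ with Rodrigues' rotation formula (a standard, independent description of a rotation, swapped in here as the reference) for the angles $2\theta$ and $\theta$.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)


def qexp_pure(theta, axis):
    """e^{theta u} = cos(theta) + u sin(theta), where u is the unit pure quaternion given by axis."""
    return (math.cos(theta), *(math.sin(theta) * x for x in axis))


def rotate(q, v):
    """Ad_q(v) = q v q^{-1} for a unit quaternion q and a vector v in R^3."""
    return qmul(qmul(q, (0.0, *v)), conj(q))[1:]


def rodrigues(v, n, phi):
    """Rotate v by angle phi about the unit axis n (right-hand rule)."""
    c, s = math.cos(phi), math.sin(phi)
    n_dot_v = sum(a * b for a, b in zip(n, v))
    n_cross_v = (n[1] * v[2] - n[2] * v[1],
                 n[2] * v[0] - n[0] * v[2],
                 n[0] * v[1] - n[1] * v[0])
    return tuple(v[t] * c + n_cross_v[t] * s + n[t] * n_dot_v * (1 - c) for t in range(3))


def distance(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


axis = (1 / 3, 2 / 3, 2 / 3)  # a unit vector
v = (0.5, -1.0, 3.0)
theta = 0.7

w = rotate(qexp_pure(theta, axis), v)
print(distance(w, rodrigues(v, axis, 2 * theta)))  # tiny (rounding only): rotation by 2*theta
print(distance(w, rodrigues(v, axis, theta)))      # clearly nonzero: not a rotation by theta
```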

Undoubtedly the reader is disappointed by this argument: surely there must be a conceptual proof, one that explains why the rotation angle gets doubled. So let me sketch a more conceptual (but more complicated) argument, for those familiar with some basic Lie theory. Denote by $l_x, l_y, l_z$ the usual generators of the Lie algebra $\text{so}(3)$. The rotation about $u = (u_x, u_y, u_z)$ of angle $\omega$ is given by $\exp(\omega (u_x l_x + u_y l_y + u_z l_z))$. So the whole thing follows from the fact that the exponential map commutes with Lie group morphisms (that is, $\Phi(e^{X}) = e^{\phi(X)}$), provided we can explain why the induced map \[ \phi : \text{sp}(1) \to \text{so}(3) \] maps $i, j, k$ respectively to $2l_x, 2l_y, 2l_z$.

Let me offer a plausibility argument: the commutation relations in $\text{sp}(1)$ are $[i, j] = 2k$ and cyclic permutations thereof. However, in $\text{so}(3)$ we have the relations $[l_x, l_y] = l_z$ and cyclic permutations thereof. Thus if $\phi : \text{sp}(1) \to \text{so}(3)$ is going to preserve brackets, it'll need to have a doubling effect of this sort.
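The bracket computations behind this argument are easy to check by machine. A minimal Python sketch, assuming quaternions as integer 4-tuples and $3 \times 3$ matrices as nested lists, with $l_x, l_y, l_z$ taken to be the standard infinitesimal rotations about the coordinate axes; `phi_matrix` (the matrix of $v \mapsto uv - vu$, which is the derivative of $\text{Ad}$ at the identity) and the other helpers are illustrative names. It confirms $[i, j] = 2k$ and its cyclic versions, $[l_x, l_y] = l_z$, and that $\phi$ sends $i, j, k$ to $2l_x, 2l_y, 2l_z$.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def bracket(p, q):
    """Quaternion commutator [p, q] = pq - qp."""
    return tuple(x - y for x, y in zip(qmul(p, q), qmul(q, p)))


I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)


def phi_matrix(u):
    """Matrix of v -> [u, v] on pure quaternions in the basis i, j, k
    (the derivative of Ad at the identity, evaluated at u)."""
    images = [bracket(u, e)[1:] for e in (I, J, K)]
    return [[images[c][r] for c in range(3)] for r in range(3)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(3)] for r in range(3)]


def scale(s, m):
    return [[s * x for x in row] for row in m]


# Standard generators of so(3): infinitesimal rotations about the x, y, z axes.
LX = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
LY = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
LZ = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

print(bracket(I, J), bracket(J, K), bracket(K, I))  # (0, 0, 0, 2) (0, 2, 0, 0) (0, 0, 2, 0)
print(commutator(LX, LY) == LZ)                     # True: [l_x, l_y] = l_z
print([phi_matrix(u) == scale(2, L) for u, L in ((I, LX), (J, LY), (K, LZ))])  # [True, True, True]
```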

Exercise (harder than the previous ones): Recalling that the adjoint action of $\text{SO}(3)$ on $\text{so}(3)$ corresponds, under the usual identification $\text{so}(3) = \mathbb{R}^3$, to the defining representation of $\text{SO}(3)$ on $\mathbb{R}^3$ (via rotations), elevate the above plausibility argument to a formal proof. (See here for my solution.)

Let me now give a more geometrically vivid proof (of the fact that $\Phi : \text{Sp}(1) \to \text{SO}(3)$ is surjective). Let $u \in \text{sp}(1)$ be a unit pure quaternion (recall that $u^2 = -1$; since $|u| = 1$, $u$ also lies in $\text{Sp}(1)$). We ask: what rotation is $\Phi(u)$? First, observe that \[ \text{Ad}_u(u) = uuu^{-1} = u \] Since $u$ is fixed, this must be a rotation about $u$. What angle, though? Well, if we rotate some pure quaternion $v \in \text{sp}(1)$ twice, we get \[ v \mapsto u(uvu^{-1})u^{-1} = u^2 v (u^2)^{-1} = v\] since $u^2 = -1$. Hence applying this rotation twice gives the identity, so $\Phi(u)$ is a 180° rotation about $u$ (well, it could also be the identity, but it's not; why?).

Therefore quaternions can represent 180° rotations about any axis. Now here's a wonderful fact:

Geometrical fact: If you compose two 180° rotations about two different axes, you get a rotation about the axis perpendicular to both, by twice the angle between the original axes.
Proof: Try it! For instance, take a book and rotate it 180° about the x axis; then rotate it 180° about the y axis. You'll see this is equivalent to just rotating it 180° (which is twice the 90° between the x and y axes) about the z axis. $\square$
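The book experiment can also be explained with the quaternion tools already at hand. What follows is a derivation sketch (an addition to the argument above, relying on the pure-quaternion product formula and on the Fact that $\Phi(e^{\theta n})$ is a rotation of angle $2\theta$ about $n$, so it explains the geometrical fact rather than giving an independent proof of surjectivity). Let $u, v$ be unit pure quaternions whose axes make an angle $\alpha \in (0, \pi)$. Rotating 180° about $u$ and then 180° about $v$ is $\Phi(v)\Phi(u) = \Phi(vu)$, and

\[ \begin{aligned} vu &= -\,v \cdot u + v \times u = -\cos\alpha + n \sin\alpha, \qquad n = \frac{v \times u}{|v \times u|} \\ &= \cos(\pi - \alpha) + n \sin(\pi - \alpha) = e^{(\pi - \alpha) n} \end{aligned} \]

So the composite is a rotation of angle $2\pi - 2\alpha$ about $n$, which is the same as a rotation of angle $2\alpha$ about $-n$, the direction of $u \times v$: an axis perpendicular to both original axes. For the book ($u = i$, $v = j$, $\alpha = 90°$) this gives a 180° rotation about $i \times j = k$, the z axis.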

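This fact implies that any rotation at all can be expressed as a product of two 180° rotations: for a rotation by an angle $\omega$ about an axis $n$, use two axes perpendicular to $n$ that make an angle $\omega/2$ with each other (suitably oriented). Since quaternions can represent 180° rotations and $\Phi$ is a homomorphism, they can represent all rotations. This is quite similar to the general proof that $\text{Spin}(n) \to \text{SO}(n)$ is surjective, if you know what that means.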

Exercise: Come up with a third proof (and tell me about it!).

The isomorphism $\text{Sp}(1) \cong \text{SU}(2)$

To end our story, let's relate this to $\text{SU}(2)$, the group of complex $2 \times 2$ unitary matrices of determinant 1. It turns out that $\text{Sp}(1) \cong \text{SU}(2)$. Why is that? This is related to how you can encode a complex number $a + bi$ as the $2 \times 2$ real matrix \[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \] Note how neatly the determinant of this matrix, $a^2 + b^2$, is the norm squared of our complex number. Also, transposition of the matrix corresponds to complex conjugation. What we are doing is representing $1$ as the identity matrix, and $i$ as the antisymmetric matrix \[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]

We can do a very analogous thing for quaternions! We now represent them as $2 \times 2$ complex matrices. We'll again represent $1$ as the identity matrix; but now $i, j, k$ will be represented by anti-Hermitian matrices. Well, the Pauli matrices $\sigma_x, \sigma_y, \sigma_z$ form (along with the identity) a basis for the real vector space of Hermitian matrices. To get anti-Hermitian matrices, we just multiply by $-i$ (we could use $i$ too; the minus sign is a convention), and so represent $i,j,k$ as $-i \sigma_x, -i \sigma_y, -i \sigma_z$ respectively. Explicitly, this means \[ a + bi + cj + dk \mapsto \begin{bmatrix} a-di & -c-bi \\ c-bi & a+di \end{bmatrix} \] (We couldn't have picked just any set of 3 anti-Hermitian matrices to represent $i, j, k$, of course; the Pauli matrices happen to satisfy the right relations for this to work. I'm not sure I have a deep, non-circular explanation for this.) Note that again the determinant is the norm squared of the quaternion. Also, the conjugate transpose of this matrix corresponds to quaternion conjugation. With that in mind, it's not hard to check that $\text{Sp}(1)$ corresponds exactly to $\text{SU}(2)$: a unit quaternion $q$ goes to a matrix $M$ with $\det M = 1$ and $M^\dagger M = I$ (the image of $\overline{q} q = 1$), and conversely every matrix in $\text{SU}(2)$ has the form $\begin{bmatrix} \alpha & -\overline{\beta} \\ \beta & \overline{\alpha} \end{bmatrix}$ with $|\alpha|^2 + |\beta|^2 = 1$, which is the image of the unit quaternion with $a - di = \alpha$ and $c - bi = \beta$. So $\text{Sp}(1) \cong \text{SU}(2)$.
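Here is a numerical sketch of this matrix representation in Python, using built-in complex numbers and nested lists for $2 \times 2$ matrices; `to_matrix`, `matmul2`, `det2`, `dagger` and `close` are hypothetical helper names. For two random quaternions it checks that the map respects multiplication, that the determinant is the norm squared, and that quaternion conjugation becomes the conjugate transpose.

```python
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def to_matrix(q):
    """a + bi + cj + dk -> [[a - di, -c - bi], [c - bi, a + di]], as in the text."""
    a, b, c, d = q
    return [[complex(a, -d), complex(-c, -b)],
            [complex(c, -b), complex(a, d)]]


def matmul2(m, n):
    return [[sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def dagger(m):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[m[c][r].conjugate() for c in range(2)] for r in range(2)]


def close(m, n, tol=1e-12):
    return all(abs(m[r][c] - n[r][c]) < tol for r in range(2) for c in range(2))


rng = random.Random(2)
p = tuple(rng.uniform(-1, 1) for _ in range(4))
q = tuple(rng.uniform(-1, 1) for _ in range(4))

# Multiplication is preserved: M(pq) = M(p) M(q).
print(close(to_matrix(qmul(p, q)), matmul2(to_matrix(p), to_matrix(q))))
# The determinant is |q|^2.
print(abs(det2(to_matrix(q)) - sum(x * x for x in q)) < 1e-12)
# Quaternion conjugation corresponds to the conjugate transpose.
print(close(to_matrix((q[0], -q[1], -q[2], -q[3])), dagger(to_matrix(q))))
```

All three lines should print `True`.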

This is all just the beginning of a beautiful story, involving the Dirac belt trick, spinors, and the difference between fermions and bosons. But that's a story for another time.
