Classical field theory and the Euler-Lagrange equations

In this post I'll try to briefly give a more-precise-than-you-usually-find explanation of the Euler-Lagrange equations for classical fields. This is a point where textbooks tend to be sloppy, so hopefully this post can clear things up somewhat, even though I will ignore the analytical details of convergence and smoothness. In a follow-up post I plan to explain Noether's theorem for fields.

Let's begin with the simplest example: a real scalar field. That is simply a function $\phi : \mathbb{R}^{n+1} \to \mathbb{R}$ (assumed to be sufficiently smooth and to decay sufficiently fast at infinity). Here $\mathbb{R}^{n+1}$ stands for one dimension of time and $n$ dimensions of space. We write the coordinates on $\mathbb{R}^{n+1}$ as $x^\mu$, where $\mu$ ranges from $0$ to $n$, with $\mu = 0$ being the time dimension.

One obtains the equations of motion for $\phi$ from a Lagrangian density $\mathcal{L}$. In this case, the Lagrangian density is a function[1] $\mathcal{L} : \mathbb{R} \times \mathbb{R}^{n+1} \to \mathbb{R}$, which we think of as $\mathcal{L}(\phi, \partial_{\mu} \phi)$. When we write $\mathcal{L}$ as a function of $\phi$ and $\partial_\mu \phi$, these are thought of as simply labels for the arguments of $\mathcal{L}$. There is some notational abuse because $\phi$ and $\partial_\mu \phi$ can show up several times in a formula, sometimes being a label for an argument of $\mathcal{L}$, and other times being an actual function $\phi$ and the actual derivatives $\partial_\mu \phi$ of that function. This is analogous to how we're used to writing $f = f(x, y)$ and then referring to objects like $\partial f/\partial x$. Here $x$ can stand for an actual number, or it can stand for a label for the first argument of $f$. Both usages frequently appear in formulas, such as if we write $\partial f/\partial x = x^2 + y$, where on the left $x$ is merely a label, and on the right it is an actual number.
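The label-versus-value distinction can be made concrete in code. The following is a minimal sketch, assuming the simplest case $n = 0$ (time only, so $\mathcal{L}$ has just two slots) and a sample density and field chosen purely for illustration; the names `lagrangian`, `partial`, `phi` and `dphi` are hypothetical. The point is that $\mathcal{L}$ is an ordinary function of numbers, that "differentiating with respect to $\phi$" means differentiating in the first slot, and that the actual field is only plugged in afterwards.

```python
import math

def lagrangian(u, v):
    # u and v are plain numbers: the slots labeled "phi" and "d_t phi".
    # Sample density (an assumption for illustration): L = v^2/2 - u^2/2.
    return 0.5 * v ** 2 - 0.5 * u ** 2

def partial(f, slot, args, h=1e-6):
    """Central-difference derivative of f in the argument at position `slot`."""
    up = list(args)
    down = list(args)
    up[slot] += h
    down[slot] -= h
    return (f(*up) - f(*down)) / (2 * h)

# An actual field and its actual derivative (sample choice).
def phi(t):
    return math.sin(t)

def dphi(t):
    return math.cos(t)

t = 0.3
point = (phi(t), dphi(t))  # the point in R x R^(n+1) at which L is evaluated

# "dL/d(phi)": differentiate in slot 0, evaluated at the point given by the field.
dL_dphi = partial(lagrangian, 0, point)
# "dL/d(d_t phi)": differentiate in slot 1, evaluated at the same point.
dL_ddphi = partial(lagrangian, 1, point)

print(dL_dphi, -phi(t))   # slot-0 derivative of this density is -u
print(dL_ddphi, dphi(t))  # slot-1 derivative of this density is v
```

Nothing in `partial` knows that the second slot is "the derivative of" the first; that relationship only enters when the pair `(phi(t), dphi(t))` is passed in.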

Given a Lagrangian density $\mathcal{L}$ (which we assume sufficiently reasonable in terms of smoothness and decay) we can define the action functional \[ S[\phi] = \int_{\mathbb{R}^{n+1}} \mathcal{L}(\phi(x), \partial_\mu \phi(x))\ d^{n+1} x \] In this formula $\phi$ is an actual function $\phi : \mathbb{R}^{n+1} \to \mathbb{R}$, and we are evaluating $\mathcal{L}$ at the point \[ (\phi(x), (\partial_0 \phi)(x), \ldots, (\partial_n \phi)(x)) \in \mathbb{R} \times \mathbb{R}^{n+1}\] We also assume that the integral converges, so $S$ is well-defined (on some space of well-behaved real scalar fields which we won't specify).

Given this functional $S$ we can compute its functional derivative. The functional derivative (with respect to some field $\phi$) is defined to be a function $\delta S/\delta \phi : \mathbb{R}^{n+1} \to \mathbb{R}$ which satisfies \[ \int_{\mathbb{R}^{n+1}} \frac{\delta S}{\delta \phi}(x) \psi(x)\ d^{n+1}x = \frac{d}{d\epsilon}\Bigr|_{\epsilon=0} S[\phi + \epsilon \psi] \] for all (well-behaved) $\psi : \mathbb{R}^{n+1} \to \mathbb{R}$. We will assume this exists; see here for some more information. The above quantity is denoted $\delta S[\phi; \psi]$. In practice we will assume the variation $\psi$ has compact support.

The idea is that if $\phi$ is to describe the behavior of a field in nature, then it should be a stationary point of the action, which means that $\delta S[\phi; \psi] = 0$ for all (well-behaved, and compactly supported) $\psi$. We now calculate what condition this imposes on $\phi$. To begin, we have \[ \delta S[\phi; \psi] = \int_{\mathbb{R}^{n+1}} \frac{d}{d\epsilon}\Bigr|_{\epsilon=0} \mathcal{L}(\phi + \epsilon \psi, \partial_\mu \phi + \epsilon \partial_\mu \psi)\ d^{n+1}x \] Here we moved the derivative inside the integral (valid because $\mathcal{L}$ is reasonable) and we used the linearity of $\partial_\mu$ to re-write $\partial_\mu(\phi + \epsilon \psi) = \partial_\mu \phi + \epsilon \partial_\mu \psi$ (recall that now $\partial_\mu$ is an actual derivative, and not a label). We now compute the derivative using the chain rule: \[ \frac{d}{d\epsilon}\Bigr|_{\epsilon=0} \mathcal{L}(\phi + \epsilon \psi, \partial_\mu \phi + \epsilon \partial_\mu \psi) = \frac{\partial \mathcal{L}}{\partial \phi} \psi + \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \partial_\mu \psi \] Here $\partial \mathcal{L}/\partial \phi$ is the derivative of $\mathcal{L}$ with respect to its first argument (so $\phi$ is just a label), and similarly for each $\partial \mathcal{L}/\partial(\partial_\mu \phi)$. The derivatives of $\mathcal{L}$ are to be evaluated at the point $(\phi(x), (\partial_\mu \phi)(x))$ in $\mathbb{R} \times \mathbb{R}^{n+1}$ (where now $\phi$ is the actual function). We are also using the Einstein summation convention where repeated indices are summed over (i.e. the second term on the right-hand side is summed over all $\mu$).

The product rule (or, more precisely, summing a product rule for each $\mu$) tells us that [2] \[ \partial_\mu \left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \psi \right) = \partial_\mu \left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right) \psi + \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \partial_\mu \psi \] If we integrate the left-hand side of this equation over a large enough ball, by the divergence theorem it can be re-written as an integral of (the vector field with $\mu$ component) $\partial\mathcal{L}/\partial(\partial_\mu \phi)\ \psi$ over the boundary, which will vanish since $\psi$ is compactly supported. Therefore, integrating both sides over all of spacetime $\mathbb{R}^{n+1}$, the left-hand side vanishes and we find \[ \int_{\mathbb{R}^{n+1}} \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \partial_\mu \psi\ d^{n+1}x = \int_{\mathbb{R}^{n+1}} - \partial_\mu \left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right) \psi\ d^{n+1}x\]
Therefore we can conclude \[ \delta S[\phi; \psi] = \int_{\mathbb{R}^{n+1}} \left( \frac{\partial \mathcal{L}}{\partial \phi} - \partial_\mu \left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right) \right) \psi\ d^{n+1}x \] By the fundamental lemma of the calculus of variations, this is zero for all smooth compactly supported $\psi$ if and only if \[ \frac{\partial \mathcal{L}}{\partial \phi} = \partial_\mu \left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right) \] which is the Euler-Lagrange equation for a real scalar field (in this case it is a single equation).
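The formula for $\delta S[\phi; \psi]$ can be sanity-checked numerically. Below is a rough sketch under several assumptions that are mine rather than part of the derivation: the case $n = 0$ (time only), a sample density $\mathcal{L} = \frac{1}{2}(\partial_t \phi)^2 - \frac{1}{2}\phi^2 - \frac{1}{4}\phi^4$, a Gaussian for $\phi$, and a bump function supported in $[-1, 1]$ for $\psi$. All function names are hypothetical. It compares a finite-difference estimate of $\frac{d}{d\epsilon} S[\phi + \epsilon\psi]$ with the integral of $(\partial\mathcal{L}/\partial\phi - \partial_t(\partial\mathcal{L}/\partial(\partial_t\phi)))\,\psi$. Since $\psi$ vanishes outside $[-1, 1]$, the part of the action outside that interval cancels in the difference, so only $[-1, 1]$ is integrated.

```python
import math

def phi(t):
    # Sample field: a Gaussian, with its first two derivatives below.
    return math.exp(-t * t)

def dphi(t):
    return -2.0 * t * math.exp(-t * t)

def d2phi(t):
    return (4.0 * t * t - 2.0) * math.exp(-t * t)

def psi(t):
    # Smooth variation with compact support in [-1, 1].
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def dpsi(t):
    if abs(t) >= 1.0:
        return 0.0
    return psi(t) * (-2.0 * t) / (1.0 - t * t) ** 2

def lagrangian(u, v):
    # Slots: u labels "phi", v labels "d_t phi".
    return 0.5 * v * v - 0.5 * u * u - 0.25 * u ** 4

def midpoint(f, a=-1.0, b=1.0, steps=2000):
    # Midpoint-rule quadrature of f over [a, b].
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def action(eps):
    # Action of phi + eps * psi, restricted to the support of psi.
    return midpoint(
        lambda t: lagrangian(phi(t) + eps * psi(t), dphi(t) + eps * dpsi(t))
    )

eps = 1e-4
# Left side: central-difference estimate of d/d(eps) S[phi + eps * psi] at eps = 0.
lhs = (action(eps) - action(-eps)) / (2 * eps)
# Right side: integral of (dL/dphi - d_t(dL/d(d_t phi))) * psi,
# which for this density is (-phi - phi^3 - phi'') * psi.
rhs = midpoint(lambda t: (-phi(t) - phi(t) ** 3 - d2phi(t)) * psi(t))

print(lhs, rhs)
```

The two printed numbers should agree closely, and both are non-zero here because the Gaussian is not a solution of the Euler-Lagrange equation for this density.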

For a simple example, we take $n = 3$, and write the spacetime coordinates as $t, x, y, z$. We take the Lagrangian density \[ \mathcal{L}(\phi, \partial_t \phi, \partial_x \phi, \partial_y \phi, \partial_z \phi) = \frac{1}{2} \left( \left( \frac{\partial \phi}{\partial t} \right)^2 - \left( \frac{\partial \phi}{\partial x} \right)^2 - \left( \frac{\partial \phi}{\partial y} \right)^2 - \left( \frac{\partial \phi}{\partial z} \right)^2 \right) \] The Euler-Lagrange equation then reads \[ \frac{\partial^2 \phi}{\partial t^2} - \frac{\partial^2 \phi}{\partial x^2} - \frac{\partial^2 \phi}{\partial y^2} - \frac{\partial^2 \phi}{\partial z^2} = 0 \] i.e. $\phi$ satisfies the wave equation. Note that by using the Minkowski metric $\eta$ (with signature $(+,-,-,-)$) the Lagrangian density can be written as $(1/2) \eta^{\mu \nu} \partial_\mu \phi \partial_\nu \phi$. Subtracting $(m^2/2) \phi^2$ from this Lagrangian, where $m$ is a mass parameter, leads one to the Klein-Gordon equation instead.
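To spell out the computation for this example (a worked sketch, treating $\partial_t \phi, \ldots, \partial_z \phi$ as labels for the last four arguments of $\mathcal{L}$): the density does not depend on its first argument and is quadratic in the others, so \[ \frac{\partial \mathcal{L}}{\partial \phi} = 0, \qquad \frac{\partial \mathcal{L}}{\partial(\partial_t \phi)} = \partial_t \phi, \qquad \frac{\partial \mathcal{L}}{\partial(\partial_x \phi)} = -\partial_x \phi \] and similarly for $y$ and $z$. Plugging in the actual function $\phi$ and applying $\partial_\mu$, summed over $\mu$, the Euler-Lagrange equation becomes \[ 0 = \partial_t (\partial_t \phi) - \partial_x (\partial_x \phi) - \partial_y (\partial_y \phi) - \partial_z (\partial_z \phi) \] which is the wave equation. If one subtracts a term $V(\phi)$ from the density (here $V$ is a hypothetical smooth potential, introduced only for illustration), the only change is in the first slot, $\partial \mathcal{L}/\partial \phi = -V'(\phi)$, and the equation becomes \[ \partial_t^2 \phi - \partial_x^2 \phi - \partial_y^2 \phi - \partial_z^2 \phi = -V'(\phi) \] A quadratic $V$ gives the Klein-Gordon case.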

And now for a general field  

This discussion can be generalized to tensor fields, where instead of a function $\phi : \mathbb{R}^{n+1} \to \mathbb{R}$ we have some tensor (field) on $\mathbb{R}^{n+1}$. For instance, suppose $A$ is a covariant tensor field of rank 1 (also known as a $1$-form) which we write in coordinates as $A = A_\mu dx^\mu$. We can think of each component $A_\mu$ as a real scalar field, and now we have a Lagrangian density $\mathcal{L}(A_\mu, \partial_\nu A_\mu)$ which takes $(n+1) + (n+1)(n+1)$ inputs and produces a real number as output.

To derive the Euler-Lagrange equations, we vary $A$ in the direction of a 1-form $\psi$ and find the conditions on $A$ under which $\delta S[A; \psi]$ vanishes for all smooth compactly supported 1-forms $\psi$. Since $\delta S[A; \psi]$ is linear in $\psi$, it suffices to consider $\psi$ with only one component $\psi_\mu$ non-zero. Then everything will go through as before and we will end up with one Euler-Lagrange equation for each component $\mu$: \[ \frac{\partial \mathcal{L}}{\partial A_\mu} = \partial_\nu \left( \frac{\partial \mathcal{L}}{\partial (\partial_\nu A_\mu)} \right) \] The situation is analogous for an arbitrary tensor: we get one Euler-Lagrange equation for each component.

One important example is that of a covariant rank $1$ tensor field $A_{\mu}$. We define the anti-symmetric covariant $2$-tensor $F_{\mu \nu}$ by \[ F_{\mu \nu} = \partial_\mu A_\nu - \partial_\nu A_\mu \] In other words, $F = dA$. Then taking the Lagrangian density $\mathcal{L} = (-1/4) F_{\mu \nu} F^{\mu \nu}$ (here we are using the Minkowski metric to raise indices), one obtains (from the Euler-Lagrange equations, together with the identity $dF = 0$ that follows from $F = dA$) exactly the vacuum Maxwell equations of electromagnetism. I might do this in detail in a future post, but, for now, enough.
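As a brief sketch of how that computation goes (an outline only, with $F_{\mu \nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ and $\alpha, \beta$ as dummy summation indices): the density depends on $A$ only through its derivatives, and $F_{\alpha \beta}$ is linear in the labels $\partial_\nu A_\mu$, so \[ \frac{\partial \mathcal{L}}{\partial A_\mu} = 0, \qquad \frac{\partial \mathcal{L}}{\partial(\partial_\nu A_\mu)} = -\frac{1}{2} F^{\alpha \beta} \frac{\partial F_{\alpha \beta}}{\partial(\partial_\nu A_\mu)} = -\frac{1}{2} \left( F^{\nu \mu} - F^{\mu \nu} \right) = -F^{\nu \mu} \] where the last step uses the anti-symmetry of $F$. The Euler-Lagrange equation for the component $\mu$ is therefore \[ \partial_\nu F^{\nu \mu} = 0 \] These are the source-free Gauss and Ampère-Maxwell laws; the remaining two Maxwell equations are the components of $dF = 0$.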

Notes

[1] What I've defined is really the expression of the Lagrangian density in the given coordinates, so that if we change the coordinate system on spacetime, we'd also have to change the function $\mathcal{L}$. It's philosophically better (but much more complicated) to define $\mathcal{L}$ as a function on a jet bundle to handle this issue; see this paper for instance.

[2] Note that, to compute (for $\mu$ fixed) $\partial_\mu (\partial \mathcal{L}/\partial (\partial_\mu \phi))$, one symbolically differentiates $\mathcal{L}$ with respect to the label $\partial_\mu \phi$, then plugs in the actual function $\phi$ with its derivatives $\partial_\mu \phi$ into that formula (since we use the same notation for labels and functions, this happens seamlessly), and then differentiates the resulting expression with respect to $x^{\mu}$.
