Functional derivatives are very useful in physics, and they are not too difficult to manipulate; indeed, the rules are very similar to those for the "usual" derivative (of a function with respect to a variable). However, the latter has a straightforward visual interpretation, unlike the functional derivative, which might partly explain the difficulty of the concept. Another source of confusion is precisely the acquired reflex of taking the derivative "with respect to the variable".
We will consider a typical application, i.e. finding the configuration of a system via energy minimization. The system is described by the value of a field \(f(x) \) in all points \(x\) of a certain domain \(\mathcal{D}\). \(\mathcal{D}\) may be \(n\)-dimensional, in which case we will write simply \(x\) instead of \((x_1, x_2, \ldots, x_n)\). The energy \(U\) is a functional¹ of \(f\). For instance: \begin{equation} U = \int_{\mathcal{D}} \text{d}x \, f^2(x) \, . \label{eq:defU}\end{equation} In analogy with functions, an extremum of \(U\) is a configuration \(f_0(x)\) such that "small variations with respect to this position leave the energy unchanged".
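As a concrete illustration of \eqref{eq:defU}, the functional \(U\) can be approximated numerically by discretizing \(\mathcal{D}\). The sketch below is not from the text: the grid, the trapezoid rule, and the choice of \(f\) are all illustrative assumptions.

```python
import numpy as np

# Discretize the domain D = [0, 1] on a uniform grid and approximate
# U[f] = \int_D f(x)^2 dx by the trapezoid rule.
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

def U(f):
    """Trapezoid-rule approximation of the functional U[f] = \int_D f^2 dx."""
    y = f**2
    return float(np.sum(y[1:] + y[:-1]) * dx / 2.0)

f = np.sin(np.pi * x)   # an arbitrary configuration f(x) (our choice)
print(U(f))             # exact value: \int_0^1 sin^2(pi x) dx = 1/2
```

Note that \(U\) takes an entire array of field values and returns a single number, which is precisely the "function in, scalar out" behavior of a functional.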
These are not variations in the variable \(x\), but rather changes of \(f(x) \) over the entire domain \(\mathcal{D}\). Let us write \(f(x) = f_0(x) + \delta f (x)\) and limit ourselves to terms linear in \(\delta f\). More rigorously, we can write \(\delta f (x) =\epsilon g(x) \) and work in the limit \(\epsilon \to 0\). \(\delta f \) and \(g\) are functions defined on \(\mathcal{D}\) and can be subject to certain constraints (on the boundary, in particular). The resulting change in \(U\) is:\begin{equation} \delta U = \int_{\mathcal{D}} \text{d}x \, [f_0(x)+\delta f (x)]^2 - \int_{\mathcal{D}} \text{d}x \, f_0^2(x) = \int_{\mathcal{D}} \text{d}x \, \delta f (x) \, 2 f_0(x) \, , \label{eq:deriv}\end{equation}to first order in \(\delta f\). We say that \(2 f_0\) is the functional derivative of \(U\), denoted \(\frac{\delta U}{\delta f}\), and write:\begin{equation} \delta U = \int_{\mathcal{D}} \text{d}x \, \delta f (x) \frac{\delta U}{\delta f} \quad \text{which is similar to} \quad \text{d}f = \text{d}x \cdot \nabla f \label{eq:diff}\end{equation}with the identification: \(\delta f \equiv \text{d}x\), \(\frac{\delta U}{\delta f} \equiv \nabla f\), and \(\int_{\mathcal{D}} \text{d}x \equiv \, \cdot \, \) plays the role of the scalar product. Ensuring that \(\delta U = 0\) for any variation \(\delta f\) requires that \(\frac{\delta U}{\delta f} = 0 \Rightarrow f_0(x) =0\) for our example \eqref{eq:defU}, exactly as \( \text{d}f = 0\) for all \( \text{d}x\) implies \( \nabla f =0\).
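The first-order relation \eqref{eq:deriv} can be checked numerically: for \(U[f] = \int f^2 \,\text{d}x\), the change \(U[f_0 + \epsilon g] - U[f_0]\) should match \(\epsilon \int 2 f_0 g \,\text{d}x\) up to a residual of order \(\epsilon^2\). This is a minimal sketch; the grid and the particular \(f_0\) and \(g\) are our own illustrative choices, not taken from the text.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

def integrate(y):
    """Trapezoid rule on the uniform grid covering D = [0, 1]."""
    return float(np.sum(y[1:] + y[:-1]) * dx / 2.0)

def U(f):
    """The functional U[f] = \int_D f^2 dx."""
    return integrate(f**2)

f0 = np.cos(np.pi * x)   # a reference configuration f0(x) (our choice)
g = x * (1.0 - x)        # perturbation direction g(x), vanishing on the boundary

for eps in (1e-2, 1e-3, 1e-4):
    exact = U(f0 + eps * g) - U(f0)          # delta U computed directly
    linear = eps * integrate(2.0 * f0 * g)   # eps * \int (dU/df) g dx, with dU/df = 2 f0
    # since U is quadratic, the residual is exactly eps^2 * \int g^2 dx
    print(f"eps={eps:.0e}  residual={exact - linear:.3e}")
```

The residual shrinks by a factor of 100 each time \(\epsilon\) shrinks by 10, confirming that \(2 f_0\) captures the entire first-order variation.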
To develop the similarity: \(\text{d}x \) and \(\nabla f\) are \(n\)-dimensional vectors, and their product yields the variation of \(f\) along \(\text{d}x \). \(\delta f\) and \(\frac{\delta U}{\delta f}\) are (infinite-dimensional) vectors, and their "product" yields the variation of \(U\) "in the direction of" \(\delta f\). To belabor the point, this "direction" is defined in the function space, and not in the domain \(\mathcal{D}\) of the integration variable \(x\).
We glossed here over the fact that \(\text{d}x \) and \(\nabla f\) do not belong to the same vector space (the contravariant/covariant distinction). In the same vein, \(\frac{\delta U}{\delta f}\) need not belong to the same family of functions as \(\delta f\) (or \(g\)) and must more generally be defined as a distribution.
1. A functional is a mathematical object that acts upon a function and yields a scalar. We assume that it can always be written as an integral over \(\mathcal{D}\) (as in \eqref{eq:defU}), although sometimes this integral will symbolize operating with a distribution upon \(f\).