# The Legendre transform

Jordan Bell
April 25, 2014

## 1 Convexity

Write $\overline{\mathbb{R}}=[-\infty,\infty]$. We define $\infty+\infty=\infty$, $-\infty-\infty=-\infty$, and $\infty-\infty$ is nonsense. If $a\in\mathbb{R}$, we define $a+\infty=\infty$ and $a-\infty=-\infty$.

If $X$ is a vector space and $f:X\to\overline{\mathbb{R}}$ is a function, we define the epigraph of $f$ to be the set

 $\mathrm{epi}\,f=\{(x,\alpha)\in X\times\mathbb{R}:\alpha\geq f(x)\},$

and if $\mathrm{epi}\,f$ is a convex subset of the vector space $X\times\mathbb{R}$, we say that $f$ is convex. We define the effective domain of a convex function $f$ to be

 $\mathrm{dom}\,f=\{x\in X:f(x)<\infty\}.$

We say that a convex function $f:X\to\overline{\mathbb{R}}$ is proper if $\mathrm{dom}\,f\neq\emptyset$ and $f$ does not take the value $-\infty$. If $C$ is a convex subset of $X$ and $f:C\to\mathbb{R}$ is a function, we extend $f$ to $X$ by defining $f(x)=\infty$ for $x\in X\setminus C$.

###### Lemma 1.

If $X$ is a vector space, $C$ is a convex subset of $X$, and $f:C\to\mathbb{R}$ satisfies

 $f((1-t)x_{1}+tx_{2})\leq(1-t)f(x_{1})+tf(x_{2}),\qquad x_{1},x_{2}\in C,\quad 0\leq t\leq 1,$

then $f:X\to\overline{\mathbb{R}}$ is convex.

###### Proof.

Let $(x_{1},\alpha_{1}),(x_{2},\alpha_{2})\in\mathrm{epi}\,f$ and $0\leq t\leq 1$. The fact that the pairs $(x_{i},\alpha_{i})$ belong to $\mathrm{epi}\,f$ means in particular that $f(x_{i})<\infty$, and hence that $x_{i}\in C$, as otherwise we would have $f(x_{i})=\infty$. But

 $(1-t)(x_{1},\alpha_{1})+t(x_{2},\alpha_{2})=((1-t)x_{1}+tx_{2},(1-t)\alpha_{1}+t\alpha_{2}),$

and, as $x_{1},x_{2}\in C$,

 $f((1-t)x_{1}+tx_{2})\leq(1-t)f(x_{1})+tf(x_{2})\leq(1-t)\alpha_{1}+t\alpha_{2},$

showing that $(1-t)(x_{1},\alpha_{1})+t(x_{2},\alpha_{2})\in\mathrm{epi}\,f$, and hence that $f:X\to\overline{\mathbb{R}}$ is convex. ∎

## 2 Definition of the Legendre transform

Let $V$ be a locally convex space and let $V^{*}$ be the dual space of $V$, i.e. the set of all continuous linear maps $V\to\mathbb{R}$. With the weak-* topology, $V^{*}$ is itself a locally convex space and $V=(V^{*})^{*}$, with the isomorphism of locally convex spaces $x\mapsto(\lambda\mapsto\lambda x)$. If $f:V\to\overline{\mathbb{R}}$ is a function, the Legendre transform or convex conjugate of $f$ is the function $f^{*}:V^{*}\to\overline{\mathbb{R}}$ defined by

 $f^{*}(\lambda)=\sup\{\lambda v-f(v):v\in V\}=\sup\{\lambda v-f(v):v\in\mathrm{dom}\,f\}.$

Just as the Fourier transform of a function from a locally compact abelian group to $\mathbb{C}$ is itself a function from the Pontryagin dual of the group to $\mathbb{C}$, the Legendre transform of a function from a locally convex space to $\overline{\mathbb{R}}$ is itself a function from the dual space to $\overline{\mathbb{R}}$.

###### Theorem 2.

If $V$ is a locally convex space and $f:V\to\overline{\mathbb{R}}$ is convex, then its Legendre transform $f^{*}:V^{*}\to\overline{\mathbb{R}}$ is convex.

###### Proof.

Let $(\lambda_{1},\alpha_{1}),(\lambda_{2},\alpha_{2})\in\mathrm{epi}\,f^{*}$, and let $0\leq t\leq 1$. We have

 $\begin{aligned}f^{*}((1-t)\lambda_{1}+t\lambda_{2})&=\sup\{((1-t)\lambda_{1}+t\lambda_{2})v-f(v):v\in\mathrm{dom}\,f\}\\ &=\sup\{(1-t)(\lambda_{1}v-f(v))+t(\lambda_{2}v-f(v)):v\in\mathrm{dom}\,f\}\\ &\leq\sup\{(1-t)(\lambda_{1}v-f(v)):v\in\mathrm{dom}\,f\}+\sup\{t(\lambda_{2}v-f(v)):v\in\mathrm{dom}\,f\}\\ &=(1-t)f^{*}(\lambda_{1})+tf^{*}(\lambda_{2})\\ &\leq(1-t)\alpha_{1}+t\alpha_{2},\end{aligned}$

which means that $(1-t)(\lambda_{1},\alpha_{1})+t(\lambda_{2},\alpha_{2})\in\mathrm{epi}\,f^{*}$, and hence that $f^{*}$ is convex. ∎
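As a numerical sanity check of the definition and of Theorem 2, the following Python sketch approximates the supremum by a maximum over a finite grid in the one-dimensional case $V=\mathbb{R}$. The helper `legendre_transform`, the grids, and the test function $f(v)=v^{2}/2$ are illustrative assumptions rather than part of the text; for this $f$ the supremum is attained at $v=\lambda$, so $f^{*}(\lambda)=\lambda^{2}/2$.

```python
def legendre_transform(f, xs, lambdas):
    """Approximate f*(lam) = sup_x (lam*x - f(x)) by a maximum over the sample points xs."""
    return [max(lam * x - f(x) for x in xs) for lam in lambdas]


# Hypothetical grids: v ranges over [-5, 5], lambda over [-2, 2].
xs = [i / 100 for i in range(-500, 501)]
lambdas = [j / 10 for j in range(-20, 21)]


def f(x):
    return 0.5 * x * x


f_star = legendre_transform(f, xs, lambdas)

# Compare with the closed form f*(lambda) = lambda^2 / 2.
max_error = max(abs(fs - 0.5 * lam * lam) for fs, lam in zip(f_star, lambdas))

# Midpoint convexity of the computed conjugate on the equally spaced dual grid.
midpoint_convex = all(
    f_star[k] <= 0.5 * (f_star[k - 1] + f_star[k + 1]) + 1e-12
    for k in range(1, len(f_star) - 1)
)

print(max_error, midpoint_convex)
```

The discrete conjugate is a maximum of finitely many affine functions of $\lambda$, which is why the midpoint test is expected to pass for any choice of $f$ on the grid, not only for a convex one.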

## 3 Lower semicontinuity

If $X$ is a topological vector space and $f:X\to\overline{\mathbb{R}}$ is a function, we say that $f$ is lower semicontinuous if $\mathrm{epi}\,f$ is a closed subset of $X\times\mathbb{R}$.

Theorem 2 shows that the Legendre transform of a convex function is itself convex. The following lemma states that if a proper convex function is lower semicontinuous, then its Legendre transform is proper; one proves the lemma using the Hahn-Banach separation theorem (see Viorel Barbu and Teodor Precupanu, Convexity and Optimization in Banach Spaces, fourth ed., p. 78, Corollary 2.21). We use this lemma in the proof of the theorem that comes after.

###### Lemma 3.

If $V$ is a locally convex space and $f:V\to\overline{\mathbb{R}}$ is a lower semicontinuous proper convex function, then its Legendre transform $f^{*}:V^{*}\to\overline{\mathbb{R}}$ is proper.

###### Theorem 4.

If $V$ is a locally convex space and $f:V\to\overline{\mathbb{R}}$ is a lower semicontinuous proper convex function, then $f^{**}=f$.

###### Proof.

For any $\lambda\in V^{*}$ we have $f^{*}(\lambda)=\sup\{\lambda v-f(v):v\in V\}$, and hence for any $v\in V$ we have $f^{*}(\lambda)\geq\lambda v-f(v)$. Thus, for any $v\in V$ and $\lambda\in V^{*}$ we have

 $\lambda v-f^{*}(\lambda)\leq f(v).$

Using this, for any $v\in V$ we have

 $f^{**}(v)=\sup\{v\lambda-f^{*}(\lambda):\lambda\in\mathrm{dom}\,f^{*}\}=\sup\{\lambda v-f^{*}(\lambda):\lambda\in\mathrm{dom}\,f^{*}\}\leq f(v).$

Suppose by contradiction that there were some $v_{0}\in V$ for which $f^{**}(v_{0})<f(v_{0})$. First, by Lemma 3 we have that $f^{*}$ is proper, so in particular $\mathrm{dom}\,f^{*}\neq\emptyset$, and this tells us that $f^{**}$ does not take the value $-\infty$. Hence $-\infty<f^{**}(v_{0})<f(v_{0})$, which tells us that $(v_{0},f^{**}(v_{0}))\in(V\times\mathbb{R})\setminus\mathrm{epi}\,f$. Therefore, $\mathrm{epi}\,f$ and the singleton $\{(v_{0},f^{**}(v_{0}))\}$ are disjoint closed convex sets ($\mathrm{epi}\,f$ is closed because $f$ is lower semicontinuous), and the singleton is compact, so we can apply the Hahn-Banach separation theorem: there is some $\Lambda\in(V\times\mathbb{R})^{*}$ and some $\gamma\in\mathbb{R}$ for which

 $\Lambda(v,\alpha)<\gamma<\Lambda(v_{0},f^{**}(v_{0})),\qquad(v,\alpha)\in\mathrm{epi}\,f.$

As $\Lambda\in(V\times\mathbb{R})^{*}$, there is some $\lambda\in V^{*}$ and some $\beta\in\mathbb{R}^{*}=\mathbb{R}$ for which $\Lambda(v,\alpha)=\lambda v+\beta\alpha$, and so

 $\lambda v+\beta\alpha<\gamma<\lambda v_{0}+\beta f^{**}(v_{0}),\qquad(v,\alpha)\in\mathrm{epi}\,f.$ (1)

If $\beta>0$ then we get a contradiction because for a fixed $v\in\mathrm{dom}\,f$ there are arbitrarily large positive $\alpha$ for which $(v,\alpha)\in\mathrm{epi}\,f$. Hence, $\beta\leq 0$. Assume by contradiction that $\beta=0$, and hence that

 $\lambda v<\gamma<\lambda v_{0},\qquad v\in\mathrm{dom}\,f.$ (2)

If $v_{0}\in\mathrm{dom}\,f$ then we get $\lambda v_{0}<\gamma<\lambda v_{0}$, a contradiction. If $v_{0}\not\in\mathrm{dom}\,f$, we shall still obtain a contradiction. Let $\mu\in\mathrm{dom}\,f^{*}$. For any $h>0$,

 $\begin{aligned}f^{*}(\mu+h\lambda)&=\sup\{(\mu+h\lambda)v-f(v):v\in\mathrm{dom}\,f\}\\ &=\sup\{\mu v-f(v)+h\lambda v:v\in\mathrm{dom}\,f\}\\ &\leq\sup\{\mu v-f(v):v\in\mathrm{dom}\,f\}+\sup\{h\lambda v:v\in\mathrm{dom}\,f\}\\ &=f^{*}(\mu)+h\sup\{\lambda v:v\in\mathrm{dom}\,f\}.\end{aligned}$

Therefore,

 $\begin{aligned}f^{**}(v_{0})&\geq(\mu+h\lambda)v_{0}-f^{*}(\mu+h\lambda)\\ &\geq(\mu+h\lambda)v_{0}-f^{*}(\mu)-h\sup\{\lambda v:v\in\mathrm{dom}\,f\}\\ &=\mu v_{0}-f^{*}(\mu)+h\big(\lambda v_{0}-\sup\{\lambda v:v\in\mathrm{dom}\,f\}\big).\end{aligned}$

But by (2) we have $\lambda v_{0}-\sup\{\lambda v:v\in\mathrm{dom}\,f\}>0$, and therefore the right-hand side of

 $f^{**}(v_{0})\geq\mu v_{0}-f^{*}(\mu)+h\big(\lambda v_{0}-\sup\{\lambda v:v\in\mathrm{dom}\,f\}\big)$

can be an arbitrarily large positive number (as $f^{*}(\mu)<\infty$), contradicting that $f^{**}(v_{0})<\infty$. Therefore, $\beta<0$, and dividing (1) by $\beta$ then yields

 $\frac{1}{\beta}\lambda v+\alpha>\frac{\gamma}{\beta}>\frac{1}{\beta}\lambda v_{0}+f^{**}(v_{0}),\qquad(v,\alpha)\in\mathrm{epi}\,f.$

Hence,

 $\begin{aligned}\left(-\frac{1}{\beta}\lambda\right)v_{0}-f^{**}(v_{0})&>\sup\left\{-\frac{1}{\beta}\lambda v-\alpha:(v,\alpha)\in\mathrm{epi}\,f\right\}\\ &=\sup\left\{-\frac{1}{\beta}\lambda v-f(v):v\in\mathrm{dom}\,f\right\}\\ &=f^{*}\left(-\frac{1}{\beta}\lambda\right).\end{aligned}$

On the other hand, the definition of $f^{**}$ gives $f^{**}(v_{0})\geq\mu v_{0}-f^{*}(\mu)$ for every $\mu\in V^{*}$, and taking $\mu=-\frac{1}{\beta}\lambda$ yields

 $\left(-\frac{1}{\beta}\lambda\right)v_{0}-f^{**}(v_{0})\leq f^{*}\left(-\frac{1}{\beta}\lambda\right),$

a contradiction.

Therefore, there is no $v_{0}\in V$ for which $f^{**}(v_{0})<f(v_{0})$, i.e., for all $v\in V$ we have

 $f(v)\leq f^{**}(v).$

Together with the inequality $f^{**}\leq f$ established above, this shows that $f^{**}=f$. ∎
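The identity $f^{**}=f$ can be illustrated numerically in one dimension. The sketch below is only an illustration under stated assumptions: it takes $f(v)=|v|$ on $[-2,2]$ (and $+\infty$ elsewhere, which is what restricting the maximum to the grid amounts to), uses a hypothetical helper `conjugate`, and replaces each supremum by a maximum over a finite grid. Because $\lambda v-|v|$ is piecewise linear in $v$ with its maximum at a grid point, the discrete conjugates agree with the continuous ones at the grid points.

```python
def conjugate(values, points, dual_points):
    """Discrete conjugate: for each y in dual_points, the max over points x of y*x - value(x)."""
    return [max(y * x - fx for x, fx in zip(points, values)) for y in dual_points]


xs = [i / 10 for i in range(-20, 21)]        # grid for v in [-2, 2]
lambdas = [j / 10 for j in range(-30, 31)]   # grid for lambda in [-3, 3]

f_vals = [abs(x) for x in xs]                 # f(v) = |v| on the grid
f_star = conjugate(f_vals, xs, lambdas)       # f* sampled on the dual grid
f_star_star = conjugate(f_star, lambdas, xs)  # f** sampled back on the original grid

# Largest difference between f** and f over the grid.
biconjugate_gap = max(abs(a - b) for a, b in zip(f_star_star, f_vals))
print(biconjugate_gap)
```

Here $f^{*}(\lambda)=0$ for $|\lambda|\leq 1$ and $f^{*}(\lambda)=2(|\lambda|-1)$ for $|\lambda|>1$, and the second transform recovers $|v|$ on the grid up to rounding.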

## 4 Example in $\mathbb{R}^{n}$

Let $V:\mathbb{R}^{n}\to\mathbb{R}$ be a function, let $A$ be an $n\times n$ symmetric positive definite matrix, and define $L:T\mathbb{R}^{n}\to\mathbb{R}$ by

 $L(q,v)=\frac{1}{2}\left\langle v,Av\right\rangle-V(q).$

Fix any $q\in\mathbb{R}^{n}$, let $X=T_{q}\mathbb{R}^{n}=\mathbb{R}^{n}$, and write $L_{q}(v)=L(q,v)$, for which $L_{q}:X\to\mathbb{R}$. The Legendre transform of $L_{q}$ is $L_{q}^{*}:X^{*}\to\overline{\mathbb{R}}$, defined by

 $L_{q}^{*}(\lambda)=\sup\{\lambda v-L_{q}(v):v\in X\}.$

Because $A$ is symmetric, for any $v\in X$ we obtain

 $D(\lambda-L_{q})(v)=\lambda-Av.$

Hence, $D(\lambda-L_{q})(v)=0$ is equivalent to

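 $v=A^{-1}\lambda.$

Because $A$ is positive definite, $v\mapsto\lambda v-L_{q}(v)$ is strictly concave, so this critical point is its unique maximizer. With this,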

 $L_{q}(A^{-1}\lambda)=\frac{1}{2}\left\langle A^{-1}\lambda,AA^{-1}\lambda\right\rangle-V(q)=\frac{1}{2}\left\langle A^{-1}\lambda,\lambda\right\rangle-V(q),$

and therefore

 $L_{q}^{*}(\lambda)=\lambda(A^{-1}\lambda)-\frac{1}{2}\left\langle A^{-1}\lambda,\lambda\right\rangle+V(q)=\frac{1}{2}\left\langle A^{-1}\lambda,\lambda\right\rangle+V(q).$
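The closed form above can be checked numerically for $n=2$. In the following sketch the matrix `A`, the number `V_q` standing for $V(q)$, and the covector `lam` are arbitrary illustrative choices, not values from the text. It evaluates $\lambda v-L_{q}(v)$ at $v=A^{-1}\lambda$, compares the result with $\frac{1}{2}\langle A^{-1}\lambda,\lambda\rangle+V(q)$, and confirms that randomly sampled $v$ do not give a larger value.

```python
import random

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite (hypothetical choice)
V_q = 0.7                      # value of the potential V at the fixed q (hypothetical)


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Inverse of a 2x2 matrix via the adjugate formula.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def L_q(v):
    return 0.5 * dot(v, matvec(A, v)) - V_q


def objective(lam, v):
    """The quantity lam v - L_q(v) whose supremum over v defines L_q^*(lam)."""
    return dot(lam, v) - L_q(v)


lam = [1.5, -0.5]
v_star = matvec(A_inv, lam)                       # critical point v = A^{-1} lam
closed_form = 0.5 * dot(matvec(A_inv, lam), lam) + V_q
value_at_critical_point = objective(lam, v_star)

# Random samples should never beat the value at the critical point.
random.seed(0)
samples = [[random.uniform(-10, 10), random.uniform(-10, 10)] for _ in range(10000)]
best_sampled = max(objective(lam, v) for v in samples)

print(closed_form, value_at_critical_point, best_sampled)
```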

## 5 Derivatives

Let $\Omega$ be a domain in $\mathbb{R}^{n}$ and let $f\in C^{s}(\Omega)$ for some $s\geq 2$. We define $\phi:\Omega\to\mathbb{R}^{n}$ by $\phi(x)=(Df)(x)$; we have $\phi\in C^{s-1}(\Omega,\mathbb{R}^{n})$. Following Giaquinta and Hildebrandt (Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 6), we call $\phi$ a gradient mapping. The following theorem gives conditions under which $\phi:\Omega\to\phi(\Omega)$ is invertible (Giaquinta and Hildebrandt, Calculus of Variations II, p. 6, Lemma 1). To be locally invertible means that for each point there is an open neighborhood such that the restriction of $\phi$ to that neighborhood is invertible.

###### Theorem 5.

If

 $\det(D^{2}f)(x)\neq 0,\qquad x\in\Omega,$

then $\phi$ is locally invertible on $\Omega$. If $\Omega$ is convex and for all $x\in\Omega$ the matrix $(D^{2}f)(x)$ is positive definite, then $\phi:\Omega\to\phi(\Omega)$ is a $C^{s-1}$ diffeomorphism.

###### Proof.

Because $\phi\in C^{s-1}(\Omega,\mathbb{R}^{n})$ and $\det(D\phi)(x)=\det(D^{2}f)(x)\neq 0$ for all $x\in\Omega$, by the inverse function theorem (Jerrold E. Marsden and Michael J. Hoffman, Elementary Classical Analysis, second ed., p. 393, Theorem 7.1.1) we have that $\phi:\Omega\to\phi(\Omega)$ is a local $C^{s-1}$ diffeomorphism.

Suppose that $\phi(x_{1})=\phi(x_{2})$ for some distinct $x_{1},x_{2}\in\Omega$. Put $x=x_{2}-x_{1}$. Because $x_{1},x_{1}+x=x_{2}\in\Omega$ and $\Omega$ is convex, for any $0\leq t\leq 1$ we have $x_{1}+tx\in\Omega$. Now define

 $A(t)=(D^{2}f)(x_{1}+tx),\qquad 0\leq t\leq 1;$

because $f\in C^{s}(\Omega)$ with $s\geq 2$, we have that $A$ is continuous. Because

 $\frac{d}{dt}\phi(x_{1}+tx)=(D\phi)(x_{1}+tx)x=(D^{2}f)(x_{1}+tx)x=A(t)x,$

we have

 $\begin{aligned}\left\langle x,\phi(x_{2})-\phi(x_{1})\right\rangle&=\left\langle x,\int_{0}^{1}\frac{d}{dt}\phi(x_{1}+tx)\,dt\right\rangle\\ &=\left\langle x,\int_{0}^{1}A(t)x\,dt\right\rangle\\ &=\int_{0}^{1}\left\langle x,A(t)x\right\rangle dt.\end{aligned}$

For each $0\leq t\leq 1$ we have that $A(t)$ is a positive definite matrix, and because $x\neq 0$, this gives us that $\left\langle x,A(t)x\right\rangle>0$. Moreover, $t\mapsto\left\langle x,A(t)x\right\rangle$ is continuous, so it follows that

 $\int_{0}^{1}\left\langle x,A(t)x\right\rangle dt>0.$

But $\left\langle x,\phi(x_{2})-\phi(x_{1})\right\rangle=0$, a contradiction. Therefore, $\phi$ is one-to-one. It is a fact that a local diffeomorphism that is one-to-one is a diffeomorphism, thus $\phi:\Omega\to\phi(\Omega)$ is a $C^{s-1}$ diffeomorphism. ∎
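The key inequality in the proof, $\left\langle x_{2}-x_{1},\phi(x_{2})-\phi(x_{1})\right\rangle>0$ for distinct points, can be spot-checked numerically. The sketch below assumes the illustrative function $f(x_{1},x_{2})=x_{1}^{2}+x_{1}x_{2}+x_{2}^{2}+e^{x_{1}}$ on $\Omega=\mathbb{R}^{2}$, which is not taken from the text; its Hessian has entries $2+e^{x_{1}}$, $1$, $1$, $2$ and determinant $3+2e^{x_{1}}>0$, so it is positive definite everywhere.

```python
import math
import random


def phi(x):
    """Gradient of the illustrative f(x1, x2) = x1^2 + x1*x2 + x2^2 + exp(x1)."""
    return (2 * x[0] + x[1] + math.exp(x[0]), x[0] + 2 * x[1])


random.seed(1)
pairings = []
for _ in range(1000):
    a = (random.uniform(-3, 3), random.uniform(-3, 3))
    b = (random.uniform(-3, 3), random.uniform(-3, 3))
    d = (b[0] - a[0], b[1] - a[1])
    pa, pb = phi(a), phi(b)
    # <b - a, phi(b) - phi(a)>, which the proof shows is positive when a != b.
    pairings.append(d[0] * (pb[0] - pa[0]) + d[1] * (pb[1] - pa[1]))

all_positive = all(value > 0 for value in pairings)
print(all_positive, min(pairings))
```

A strictly positive pairing rules out $\phi(a)=\phi(b)$ for $a\neq b$, which is exactly the injectivity used in the second half of the proof.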

Suppose that the gradient mapping $\phi:\Omega\to\phi(\Omega)$ is a $C^{s-1}$ diffeomorphism. We write $\psi=\phi^{-1}$, so $\psi:\phi(\Omega)\to\Omega$ is a $C^{s-1}$ diffeomorphism. The following theorem gives an explicit expression for the Legendre transform of certain functions (Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 9).

###### Theorem 6.

If $\Omega$ is a convex domain in $\mathbb{R}^{n}$, $f\in C^{2}(\Omega)$, and for all $x\in\Omega$ the matrix $(D^{2}f)(x)$ is positive definite, then

 $f^{*}(\xi)=\xi\psi(\xi)-f(\psi(\xi)),\qquad\xi\in\phi(\Omega).$

###### Proof.

Fix $\xi\in\phi(\Omega)$ and define $g:\Omega\to\mathbb{R}$ by

 $g(x)=\xi x-f(x).$

We have $g\in C^{2}(\Omega)$, with $(Dg)(x)=\xi-(Df)(x)$ and $(D^{2}g)(x)=-(D^{2}f)(x)$. Thus, for each $x\in\Omega$, the matrix $(D^{2}g)(x)$ is negative definite, so $g$ is strictly concave on the convex set $\Omega$. It follows that if there is a point $x_{0}\in\Omega$ at which $(Dg)(x_{0})=0$, then for all other $x\in\Omega$ we have $g(x)<g(x_{0})$. To have $(Dg)(x_{0})=0$ is equivalent to $(Df)(x_{0})=\xi$, and because $\phi:\Omega\to\phi(\Omega)$ is a bijection, there is indeed a unique $x_{0}\in\Omega$ for which $(Df)(x_{0})=\phi(x_{0})=\xi$. Therefore,

 $\begin{aligned}f^{*}(\xi)&=\sup\{\xi x-f(x):x\in\Omega\}\\ &=\sup\{g(x):x\in\Omega\}\\ &=g(x_{0})\\ &=\xi x_{0}-f(x_{0})\\ &=\xi\psi(\xi)-f(\psi(\xi)).\end{aligned}$ ∎

Using the above expression for the Legendre transform of a $C^{2}$ function with positive definite Hessian, we show in the following theorem that the Legendre transform of a $C^{s}$ function with positive definite Hessian is itself a $C^{s}$ function (Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 7, Lemma 2).

###### Theorem 7.

If $\Omega$ is a convex domain in $\mathbb{R}^{n}$, $f\in C^{s}(\Omega)$ with $s\geq 2$, and for all $x\in\Omega$ the matrix $(D^{2}f)(x)$ is positive definite, then $Df^{*}=\psi$ and $f^{*}\in C^{s}(\phi(\Omega))$.

###### Proof.

For all $\xi\in\phi(\Omega)$ we have by Theorem 6,

 $f^{*}(\xi)=\xi\psi(\xi)-f(\psi(\xi)).$

Thus,

 $\begin{aligned}(Df^{*})(\xi)&=\psi(\xi)+\xi(D\psi)(\xi)-(Df)(\psi(\xi))(D\psi)(\xi)\\ &=\psi(\xi)+\xi(D\psi)(\xi)-\phi(\psi(\xi))(D\psi)(\xi)\\ &=\psi(\xi)+\xi(D\psi)(\xi)-\xi(D\psi)(\xi)\\ &=\psi(\xi).\end{aligned}$

Hence $Df^{*}=\psi$. Because $\psi\in C^{s-1}(\phi(\Omega))$, it follows that $f^{*}\in C^{s}(\phi(\Omega))$. ∎

For all $x\in\Omega$, because $\psi(\phi(x))=x$ we have $(D\psi)(\phi(x))(D\phi)(x)=I$, i.e. $(D^{2}f^{*})(\phi(x))(D^{2}f)(x)=I$, so

 $(D^{2}f)(x)=((D^{2}f^{*})(\phi(x)))^{-1}.$

For all $\xi\in\phi(\Omega)$, because $\phi(\psi(\xi))=\xi$, we have $(D\phi)(\psi(\xi))(D\psi)(\xi)=I$, i.e. $(D^{2}f)(\psi(\xi))(D^{2}f^{*})(\xi)=I$, so

 $(D^{2}f^{*})(\xi)=((D^{2}f)(\psi(\xi)))^{-1}.$
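A one-dimensional example makes these identities concrete. The sketch below assumes the illustrative choice $f(x)=e^{x}$ on $\Omega=\mathbb{R}$ (not an example from the text), for which $\phi(x)=e^{x}$, $\psi(\xi)=\log\xi$ on $\phi(\Omega)=(0,\infty)$, and the formula of Theorem 6 gives $f^{*}(\xi)=\xi\log\xi-\xi$. It uses finite differences, with hypothetical step sizes, to compare $Df^{*}$ with $\psi$ and $D^{2}f^{*}(\xi)$ with $1/(D^{2}f)(\psi(\xi))$.

```python
import math

f = math.exp      # f(x) = e^x, so f''(x) = e^x > 0 on all of R
psi = math.log    # inverse of the gradient mapping phi(x) = e^x


def f_star(xi):
    """Formula of Theorem 6: f*(xi) = xi * psi(xi) - f(psi(xi))."""
    return xi * psi(xi) - f(psi(xi))


def central_difference(g, x, h=1e-5):
    """Second-order finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def second_difference(g, x, h=1e-4):
    """Second-order finite-difference approximation of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)


xi = 2.5
# Df* should agree with psi.
derivative_error = abs(central_difference(f_star, xi) - psi(xi))
# D^2 f*(xi) should agree with the inverse of D^2 f at psi(xi); here f'' = f.
hessian_error = abs(second_difference(f_star, xi) - 1.0 / f(psi(xi)))

print(derivative_error, hessian_error)
```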

## 6 Example in $\mathbb{R}^{2}$

Suppose that $\Omega$ is a convex domain in $\mathbb{R}^{2}$, that $f\in C^{2}(\Omega)$, and that for all $x\in\Omega$, the matrix $(D^{2}f)(x)$ is positive definite. Write $\rho(x)=\det(D^{2}f)(x)$; $\rho(x)>0$ for all $x\in\Omega$. Because $(D^{2}f)(x)=((D^{2}f^{*})(\phi(x)))^{-1}$ for all $x\in\Omega$, we have

 $\begin{aligned}\begin{pmatrix}f_{11}(x)&f_{12}(x)\\ f_{21}(x)&f_{22}(x)\end{pmatrix}&=\begin{pmatrix}f^{*}_{11}(\phi(x))&f^{*}_{12}(\phi(x))\\ f^{*}_{21}(\phi(x))&f^{*}_{22}(\phi(x))\end{pmatrix}^{-1}\\ &=\frac{1}{\det(D^{2}f^{*})(\phi(x))}\begin{pmatrix}f^{*}_{22}(\phi(x))&-f^{*}_{12}(\phi(x))\\ -f^{*}_{21}(\phi(x))&f^{*}_{11}(\phi(x))\end{pmatrix}\\ &=\rho(x)\begin{pmatrix}f^{*}_{22}(\phi(x))&-f^{*}_{12}(\phi(x))\\ -f^{*}_{21}(\phi(x))&f^{*}_{11}(\phi(x))\end{pmatrix}.\end{aligned}$

Giaquinta and Hildebrandt (Calculus of Variations II, p. 14) give the following consequence of what we have just written out. If $f$ satisfies the above conditions and satisfies the equation

 $(1+f_{2}^{2})f_{11}-2f_{1}f_{2}f_{12}+(1+f_{1}^{2})f_{22}=2H(1+f_{1}^{2}+f_{2}^{2})^{3/2},$

on $\Omega$, where $H$ is some constant, then

 $\begin{aligned}&(1+f_{2}(x)^{2})\rho(x)f^{*}_{22}(\phi(x))+2f_{1}(x)f_{2}(x)\rho(x)f^{*}_{12}(\phi(x))+(1+f_{1}(x)^{2})\rho(x)f^{*}_{11}(\phi(x))\\ &\qquad=2H(1+f_{1}(x)^{2}+f_{2}(x)^{2})^{3/2}\end{aligned}$

for all $x\in\Omega$. Therefore,

 $\begin{aligned}&(1+\xi_{2}^{2})\rho(\psi(\xi))f^{*}_{22}(\xi)+2\xi_{1}\xi_{2}\rho(\psi(\xi))f^{*}_{12}(\xi)+(1+\xi_{1}^{2})\rho(\psi(\xi))f^{*}_{11}(\xi)\\ &\qquad=2H(1+\xi_{1}^{2}+\xi_{2}^{2})^{3/2}\end{aligned}$

for all $\xi\in\phi(\Omega)$, $\xi=(\xi_{1},\xi_{2})$. In the case $H=0$, dividing by $\rho(\psi(\xi))>0$, we obtain

 $(1+\xi_{2}^{2})f^{*}_{22}(\xi)+2\xi_{1}\xi_{2}f^{*}_{12}(\xi)+(1+\xi_{1}^{2})f^{*}_{11}(\xi)=0.$

In the case $H=0$, the equation satisfied by $f$ is called the minimal surface equation, and we have thus found a partial differential equation satisfied by the Legendre transform of a solution of the minimal surface equation that satisfies the conditions we imposed at the start of the example. Writing the equation satisfied by $f^{*}$ in the form

 $Af^{*}_{11}+2Bf^{*}_{12}+Cf^{*}_{22}=0,$

we have $A=(1+\xi_{1}^{2})$, $B=\xi_{1}\xi_{2}$, $C=(1+\xi_{2}^{2})$, with which

 $B^{2}-AC=\xi_{1}^{2}\xi_{2}^{2}-(1+\xi_{1}^{2})(1+\xi_{2}^{2})=-1-\xi_{1}^{2}-\xi_{2}^{2}\leq-1,$

which means that the partial differential equation satisfied by $f^{*}$ is elliptic.

## 7 Lagrangians and Hamiltonians

Theorem 6 states that if $\Omega$ is a convex domain in $\mathbb{R}^{n}$, $f\in C^{2}(\Omega)$, and for all $x\in\Omega$ the matrix $(D^{2}f)(x)$ is positive definite, then

 $f^{*}(\xi)=\xi\psi(\xi)-f(\psi(\xi)),\qquad\xi\in\phi(\Omega).$

Suppose that $L:\mathbb{R}^{n}\times\mathbb{R}^{n}\times\mathbb{R}\to\mathbb{R}$ is a function such that for each $q\in\mathbb{R}^{n}$ and $t\in\mathbb{R}$, the function $v\mapsto L(q,v,t)$ satisfies the above conditions. Fix $q\in\mathbb{R}^{n}$ and $t\in\mathbb{R}$. With $DL=(\frac{\partial L}{\partial q},\frac{\partial L}{\partial v},\frac{\partial L}{\partial t})$ and $\phi(v)=\frac{\partial L}{\partial v}(q,v,t)$, $\psi=\phi^{-1}$, we have

 $L^{*}(q,p,t)=p\psi(p)-L(q,\psi(p),t),\qquad p\in\phi(\mathbb{R}^{n}),$

or with $H=L^{*}$,

 $H(q,p,t)=p\psi(p)-L(q,\psi(p),t),\qquad p\in\phi(\mathbb{R}^{n}).$

We have

 $\frac{\partial H}{\partial q}(q,p,t)=-\frac{\partial L}{\partial q}(q,\psi(p),t),$

and

 $\begin{aligned}\frac{\partial H}{\partial p}(q,p,t)&=\psi(p)+p(D\psi)(p)-\frac{\partial L}{\partial v}(q,\psi(p),t)(D\psi)(p)\\ &=\psi(p)+p(D\psi)(p)-\phi(\psi(p))(D\psi)(p)\\ &=\psi(p)+p(D\psi)(p)-p(D\psi)(p)\\ &=\psi(p),\end{aligned}$

and

 $\frac{\partial H}{\partial t}(q,p,t)=-\frac{\partial L}{\partial t}(q,\psi(p),t).$

For a path $(q(t),v(t),t)$ to satisfy the Euler-Lagrange equation means that

 $\frac{d}{dt}\left(\frac{\partial L}{\partial v}(q(t),v(t),t)\right)=\frac{\partial L}{\partial q}(q(t),v(t),t).$

With $p(t)=\phi(v(t))$, this yields

 $\frac{dp}{dt}(t)=\frac{\partial L}{\partial q}(q(t),\psi(p(t)),t),$

and hence

 $\frac{dp}{dt}(t)=-\frac{\partial H}{\partial q}(q(t),p(t),t).$
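The relations between the partial derivatives of $H$ and $L$ can be checked on a simple case. The sketch below assumes a one-dimensional harmonic oscillator Lagrangian $L(q,v)=\frac{1}{2}mv^{2}-\frac{1}{2}kq^{2}$ with no explicit time dependence; the constants `m` and `k`, the sample point, and the helper `partial` are illustrative choices, not part of the text. Here $\phi(v)=mv$ and $\psi(p)=p/m$, and the partial derivatives are approximated by central differences.

```python
m, k = 2.0, 3.0   # hypothetical mass and spring constant


def L(q, v):
    """Illustrative Lagrangian L(q, v) = m v^2 / 2 - k q^2 / 2."""
    return 0.5 * m * v * v - 0.5 * k * q * q


def psi(p):
    """Inverse of phi(v) = dL/dv = m v."""
    return p / m


def H(q, p):
    """Hamiltonian as the Legendre transform in v: H = p psi(p) - L(q, psi(p))."""
    return p * psi(p) - L(q, psi(p))


def partial(g, args, index, h=1e-6):
    """Central-difference approximation of the partial derivative of g in slot `index`."""
    up = list(args)
    up[index] += h
    down = list(args)
    down[index] -= h
    return (g(*up) - g(*down)) / (2 * h)


q0, p0 = 0.4, 1.7
v0 = psi(p0)

dH_dq = partial(H, (q0, p0), 0)
dH_dp = partial(H, (q0, p0), 1)
dL_dq = partial(L, (q0, v0), 0)

# Expect dH/dq = -dL/dq evaluated at v = psi(p), and dH/dp = psi(p).
print(dH_dq + dL_dq, dH_dp - v0)
```

For this Lagrangian the Hamiltonian reduces to $p^{2}/(2m)+\frac{1}{2}kq^{2}$, so the equation $\frac{dp}{dt}=-\frac{\partial H}{\partial q}$ becomes $\frac{dp}{dt}=-kq$.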

## 8 Physical units

First, if a Lagrangian $L(q,v,t)$ has units $\mathrm{J}$, then the Hamiltonian $H=L^{*}$, $H(q,p,t)$, has the same units $\mathrm{J}$, and it follows that $p\psi(p)$ has units $\mathrm{J}$. Second, $[\psi(p)]=[v]$, and so $[H]=[p][\psi(p)]=[p][v]$. Therefore, $[p][v]=\mathrm{J}$. If we take $[v]=\mathrm{m/s}$, then, because $\mathrm{J}=\mathrm{kg\,m^{2}/s^{2}}$, this implies that $[p]=\mathrm{kg\,m/s}$.

## 9 More books

V. I. Arnold, Mathematical Methods of Classical Mechanics, second ed., p. 61, §14; Ralph Abraham and Jerrold E. Marsden, Foundations of Mechanics, second ed., p. 218, §3.6; Jerrold E. Marsden and Tudor S. Ratiu, Introduction to Mechanics and Symmetry, second ed., p. 183, §7.2; Jürgen Jost and Xianqing Li-Jost, Calculus of Variations, chapter 4; David Yang Gao, Duality Principles in Nonconvex Systems: Theory, Methods and Applications.

## 10 History

As best as I can tell, the thing we call the Legendre transform is named after Legendre because of the following paper: Adrien-Marie Legendre, Mémoire sur l’intégration de quelques Équations aux différences partielles, Histoire de l’Académie royale des sciences (1787), 309–351. The following is a partial bibliography of works that refer to this paper of Legendre’s. No historical summary of the Legendre transform exists in the literature, and the following is presented as an aid to the preparation of one. To properly tell the story of the Legendre transform, one would be well served by carefully digging through sources and attentively reading Legendre’s original paper, and also by making oneself comfortable with how it appears in convex analysis, minimal surfaces, contact geometry, thermodynamics, etc. Such a comprehensive history would require meticulously scanning through Legendre’s monumental Traité on elliptic integrals in case relevant material is included there. The best biography of Legendre that exists is the one by Itard in the Dictionary of Scientific Biography, who mentions that something relevant to the Legendre transform appears in volume II of the 1826 Traité, concerning arc lengths. One should also scan through the work of Lagrange, including his 1788 Méchanique analitique, and the work of Euler on the calculus of variations.

Correspondance de Leonhard Euler avec A. C. Clairaut, J. d’Alembert et J. L. Lagrange, pp. 440–441, Note 6; S. S. Demidov, The study of partial differential equations of the first order in the 18th and 19th centuries, Arch. Hist. Exact Sci. 26 (1982), no. 4, 325–350; Erwin Kreyszig, On the Theory of Minimal Surfaces, The Problem of Plateau (Themistocles M. Rassias, ed.), 1992, 138–164, p. 145; Julian Lowell Coolidge, A History of Geometrical Methods, p. 377; Alfred Enneper, Bemerkungen über einige Flächen von constantem Krümmungsmaaß, Nachrichten von der Königl. Gesellschaft der Wissenschaften und der Georg-Augusts-Universität zu Göttingen (1876), 597–619, p. 614; Alfred Enneper, Ueber Flächen mit besonderen Meridiancurven, Abhandlungen der Königlichen Gesellschaft der Wissenschaften in Göttingen 29 (1882), 3–87, p. 6; Gaston Darboux, Leçons sur la théorie générale des surfaces, vol. 1, p. 271, §177; Édouard Goursat, Leçons sur l’intégration des équations aux dérivées partielles du second ordre, à deux variables indépendantes, tome 2, p. 32, chapter V, §113; René Taton, L’œuvre scientifique de Monge, p. 262; Karin Reich, Die Geschichte der Differentialgeometrie von Gauß bis Riemann (1828–1868), Arch. Hist. Exact Sci. 11 (1973), no. 4, 273–376, p. 315; Ivor Grattan-Guinness, Convolutions in French Mathematics, 1800-1840, vol. I, p. 152; Morris Kline, Mathematical Thought From Ancient to Modern Times, chapter 22; João Caramalho Domingues, Lacroix and the Calculus, p. 223; Ernst Hairer, Syvert Paul Nørsett and Gerhard Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, p. 32; Paul Mansion, Théorie des équations aux dérivées partielles du premier ordre, p. 76; A. R. Forsyth, A Treatise on Differential Equations, sixth ed., pp. 418, 476; Lagrange, Méchanique analitique (1788), tome 1, partie 2, §IV; Ernesto Pascal, Die Variationsrechnung, p. 125; Bernhard Riemann, Ueber die Fläche vom kleinsten Inhalt bei gegebener Begrenzung; Courant and Hilbert, vol. 
II; Cornelius Lanczos, The Variational Principles of Mechanics, fourth ed., §VI.1; Ed. Combescure, Remarques sur un Mémoire de Legendre, Comptes rendus hebdomadaires des séances de l’Académie des sciences 74 (1872), 798–802; Johannes C. C. Nitsche, Vorlesungen über Minimalflächen, p. 147; A. W. Conway and J. L. Synge (ed.), The Mathematical Papers of Sir William Rowan Hamilton, vol. I, (1931), p. 474.