The Legendre transform

Jordan Bell
April 25, 2014

1 Convexity

Write ¯=[-,]. We define +=, --=-, and - is nonsense. If a, we define a+= and a-=-.

If X is a vector space and f:X¯ is a function, we define the epigraph of f to be the set

epif={(x,α)X×:αf(x)},

and if epif is a convex subset of the vector space X×, we say that f is convex. We define the effective domain of a convex function f to be

domf={xX:f(x)<}.

We say that a convex function f:X¯ is proper if domf and f does not take the value -. If C is a convex subset of X and f:C is a function, we extend f to X by defining f(x)= for xXC.

Lemma 1.

If X is a vector space, C is a convex subset of X, and f:CR satisfies

f((1-t)x1+tx2)(1-t)f(x1)+tf(x2),x1,x2C,0t1,

then f:XR¯ is convex.

Proof.

Let (x1,α1),(x2,α2)epif and 0t1. The fact that the pairs (xi,αi) belong to epif means in particular that f(xi)<, and hence that xiC, as otherwise we would have f(xi)=. But

(1-t)(x1,α1)+t(x2,α2)=((1-t)x1+tx2,(1-t)α1+tα2),

and, as x1,x2C,

f((1-t)x1+tx2)(1-t)f(x1)+tf(x2)(1-t)α1+tα2,

showing that (1-t)(x1,α1)+t(x2,α2)epif, and hence that f:X¯ is convex. ∎

2 Definition of the Legendre transform

Let V be a locally convex space and let V* be the dual space of V, i.e. the set of all continuous linear maps V. With the weak-* topology, V* is itself a locally convex space and V=(V*)*, with the isomorphism of locally convex spaces x(λλx). If f:V¯ is a function, the Legendre transform or convex conjugate of f is the function f*:V*¯ defined by

f*(λ)=sup{λv-f(v):vV}=sup{λv-f(v):vdomf}.

Like how the Fourier transform of a function from a locally compact abelian group to is itself a function from the Pontryagin dual of the group to , the Legendre transform of a function from a locally convex space to ¯ is itself a function from the dual space to ¯.

Theorem 2.

If V is a locally convex space and f:VR¯ is convex, then its Legendre transform f*:V*R¯ is convex.

Proof.

Let (λ1,α1),(λ2,α2)epif*, and let 0t1. We have

f*((1-t)λ1+tλ2) = sup{((1-t)λ1+tλ2)v-f(v):vdomf}
= sup{(1-t)(λ1v-f(v))+t(λ2v-f(v)):vdomf}
sup{(1-t)(λ1v-f(v)):vdomf}
+sup{t(λ2v-f(v)):vdomf}
= (1-t)f*(λ1)+tf*(λ2)
(1-t)α1+tα2,

which means that (1-t)(λ1,α1)+t(λ2,α2)epif*, and hence that f* is convex. ∎

3 Lower semicontinuity

If X is a topological vector space and f:X¯ is a function, we say that f is lower semicontinuous if epif is a closed subset of X×.

Theorem 2 shows that the Legendre transform of a convex function is itself convex. The following lemma states that if a proper convex function is lower semicontinuous, then its Legendre transform is proper; one proves the lemma using the Hahn-Banach separation theorem.11 1 Viorel Barbu and Teodor Precupanu, Convexity and Optimization in Banach Spaces, fourth ed., p. 78, Corollary 2.21. We use this lemma in the proof of the theorem that comes after.

Lemma 3.

If V is a locally convex space and f:VR¯ is a lower semicontinuous proper convex function, then its Legendre transform f*:V*R¯ is proper.

Theorem 4.

If V is a locally convex space and f:VR¯ is a lower semicontinuous proper convex convex, then f**=f.

Proof.

For any λV* we have f*(λ)=sup{λv-f(v):vV}, and hence for any vV we have f*(λ)λv-f(v). Thus, for any vV and λV* we have

λv-f*(λ)f(v).

Using this, for any vV we have

f**(v)=sup{vλ-f*(λ):λdomf*}=sup{λv-f*(λ):λdomf*}f(v).

Suppose by contradiction that there were some v0V for which f**(v0)<f(v0). First, by Lemma 3 we have that f* is proper, so in particular domf*, and this tells us that f** does not take the value -. Hence -<f**(v0)<f(v0), which tells us that (v0,f**(v0))V×epif. Therefore, epif and the singleton {(v0,f**(v0))} are disjoint closed convex sets (epif is closed because f is lower semicontinuous), and so we can apply the Hahn-Banach separation theorem: there is some Λ(V×)* and some γ for which

Λ(v,α)<γ<Λ(v0,f**(v0)),(v,α)epif.

As Λ(V×)*, there is some λV* and some β*= for which Λ(v,α)=λv+βα, and so

λv+βα<γ<λv0+βf**(v0),(v,α)epif. (1)

If β>0 then we get a contradiction because for a fixed vdomf there are arbitrarily large positive α for which (v,α)epif. Hence, β0. Assume by contradiction that β=0, and hence that

λv<γ<λv0,vdomf. (2)

If v0domf then we get λv0<γ<λv0, a contradiction. If v0domf, we shall still obtain a contradiction. Let μdomf*. For any h>0,

f*(μ+hλ) = sup{(μ+hλ)v-f(v):vdomf}
= sup{μv-f(v)+hλv:vdomf}
sup{μv-f(v):vdomf}+sup{hλv:vdomf}
= f*(μ)+hsup{λv:vdomf}.

Therefore,

f**(v0) (μ+hλ)v0-f*(μ+hλ)
(μ+hλ)v0-f*(μ)-hsup{λv:vdomf}
= μv0-f*(μ)+h(λv0-sup{λv:vdomf}).

But by (2) we have λv0-sup{λv:vdomf}>0, and therefore the right-hand side of

f**(v0)μv0-f*(μ)+h(λv0-sup{λv:vdomf})

can be an arbitrarily large positive number (as f*(μ)<), contradicting that f**(v0)<. Therefore, β<0, and dividing (1) by β then yields

1βλv+α>γβ>1βλv0+f**(v0),(v,α)epif.

Hence,

(-1βλ)v0-f**(v0) > sup{-1βλv-α:(v,α)epif}
= sup{-1βλv-f(v):vdomf}
= f*(-1βλ),

which contradicts

(-1βλ)v0-f**(v0)f*(-1βλ).

Therefore, there is no v0V for which f**(v0)<f(v0), i.e., for all vV we have

f(v0)f**(v0).

4 Example in Rn

Let V:n be a function, let A be an n×n symmetric positive definite matrix, and define L:Tn by

L(q,v)=12v,Av-V(q).

Fix any qn, let X=Tqn=n, and write Lq(v)=L(q,v), for which Lq:X. The Legendre transform of Lq is Lq*:X*¯, defined by

Lq*(λ)=sup{λv-Lq(v):vX}.

Because A is symmetric, for any vX we obtain

D(λ-Lq)(v)=λ-Av.

Hence, D(λ-Lq)(v)=0 is equivalent to

v=A-1λ,

and with this,

Lq(A-1λ)=12A-1λ,AA-1λ-V(q)=12A-1λ,λ-V(q),

and therefore

Lq*(λ)=λ(A-1λ)-12A-1λ,λ+V(q)=12A-1λ,λ+V(q).

5 Derivatives

Let Ω be a domain in n and let fCs(Ω) for some s2. We define ϕ:Ωn by ϕ(x)=(Df)(x); we have ϕCs-1(Ω,n). Following Giaquinta and Hildebrandt,22 2 Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 6. we call ϕ a gradient mapping. The following theorem gives conditions under which ϕ:Ωϕ(Ω) is invertible.33 3 Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 6, Lemma 1. (To be locally invertible means that for each point there is an open neighborhood such that the restriction of ϕ to that neighborhood is invertible.)

Theorem 5.

If

det(D2f)(x)0,xΩ,

then ϕ is locally invertible on Ω. If Ω is convex and for all xΩ the matrix (D2f)(x) is positive definite, then ϕ:Ωϕ(Ω) is a Cs-1 diffeomorphism.

Proof.

Because ϕCs-1(Ω,n) and det(Dϕ)(x)=det(D2f)(x)0 for all xΩ, by the inverse function theorem44 4 Jerrold E. Marsden and Michael J. Hoffman, Elementary Classical Analysis, second ed., p. 393, Theorem 7.1.1. we have that ϕ:Ωϕ(Ω) is a local Cs-1 diffeomorphism.

Suppose that ϕ(x1)=ϕ(x2) for some distinct x1,x2Ω. Put x=x2-x1. Because x1,x1+x=x2Ω and Ω is convex, for any 0t1 we have x1+txΩ. Now define

A(t)=(D2f)(x1+tx),0t1;

because fCs(Ω) with s2, we have that A is continuous. Because

ddtϕ(x1+tx)=(Dϕ)(x1+tx)x=(D2f)(x1+tx)x=A(t)x,

we have

x,ϕ(x2)-ϕ(x1) = x,01ddtϕ(x1+tx)𝑑t
= x,01A(t)𝑑t
= 01x,A(t)x𝑑t.

For each 0t1 we have that A(t) is a positive definite matrix, and because x0, this gives us that x,A(t)x>0. Moreover, tx,A(t)x is continuous, so it follows that

01x,A(t)x𝑑t>0.

But x,ϕ(x2)-ϕ(x1)=0, a contradiction. Therefore, ϕ is one-to-one. It is a fact that a local diffeomorphism that is one-to-one is a diffeomorphism, thus ϕ:Ωϕ(Ω) is a Cs-1 diffeomorphism. ∎

Suppose that the gradient mapping ϕ:Ωϕ(Ω) is a Cs-1 diffeomorphism. We write ψ=ϕ-1, so ψ:ϕ(Ω)Ω is a Cs-1 diffeomorphism. The following theorem gives an explicit expression for the Legendre transform of certain functions.55 5 Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 9.

Theorem 6.

If Ω is a convex domain in Rn, fC2(Ω), and for all xΩ the matrix (D2f)(x) is positive definite, then

f*(ξ)=ξψ(ξ)-f(ψ(ξ)),ξϕ(Ω).
Proof.

Fix ξϕ(Ω) and define g:Ω by

g(x)=ξx-f(x).

We have gC2(Ω), and we have (Dg)(x)=ξ-(Df)(x) and (D2g)(x)=-(D2f)(x). Thus, for each xΩ, the matrix (D2g)(x) is negative definite. It follows that if there is a point x0Ω at which (Dg)(x0)=0, then for all other xΩ we have g(x)<g(x0). To have (Dg)(x0)=0 is equivalent (Df)(x0)=ξ, and because ϕ:Ωϕ(Ω) is a bijection, there is indeed a unique x0Ω for which (Df)(x0)=ϕ(x0)=ξ. Therefore,

f*(ξ) = sup{ξx-f(x):xΩ}
= sup{g(x):xΩ}
= g(x0)
= ξx0-f(x0)
= ξψ(ξ)-f(ψ(ξ)).

Using the above expression for the Legendre transform of a C2 function with positive definite Hessian, we show in the following theorem that the Legendre transform of a Cs function with positive definite Hessian is itself a C2 function.66 6 Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 7, Lemma 2.

Theorem 7.

If Ω is a convex domain in Rn, fCs(Ω) with s2, and for all xΩ the matrix (D2f)(x) is positive definite, then Df*=ψ and f*Cs(ϕ(Ω)).

Proof.

For all ξϕ(Ω) we have by Theorem 6,

f*(ξ)=ξψ(ξ)-f(ψ(ξ)).

Thus,

(Df*)(ξ) = ψ(ξ)+ξ(Dψ)(ξ)-(Df)(ψ(ξ))(Dψ)(ξ)
= ψ(ξ)+ξ(Dψ)(ξ)-ϕ(ψ(ξ))(Dψ)(ξ)
= ψ(ξ)+ξ(Dψ)(ξ)-ξ(Dψ)(ξ)
= ψ(ξ).

Hence Df*=ψ. Because ψCs-1(ϕ(Ω)), it follows that fCs(ϕ(Ω)). ∎

For all xΩ, because ψ(ϕ(x))=x we have (Dψ)(ϕ(x))(Dϕ)(x)=I, i.e. (D2f*)(ϕ(x))(D2f)(x)=I, so

(D2f)(x)=((D2f*)(ϕ(x)))-1.

For all ξϕ(Ω), because ϕ(ψ(ξ))=ξ, we have (Dϕ)(ψ(ξ))(Dψ)(ξ)=I, i.e. (D2f)(ψ(ξ))(D2f*)(ξ)=I, so

(D2f*)(ξ)=((D2f)(ψ(ξ)))-1.

6 Example in R2

Suppose that Ω is a convex domain in 2, that fC2(Ω), and that for all xΩ, the matrix (D2f)(x) is positive definite. Write ρ(x)=det(D2f)(x); ρ(x)>0 for all xΩ. Because (D2f)(x)=((D2f*)(ϕ(x)))-1 for all xΩ, we have

(f11(x)f12(x)f21(x)f22(x)) = (f11*(ϕ(x))f12*(ϕ(x))f21*(ϕ(x))f22*(ϕ(x)))-1
= 1det(D2f*)(ϕ(x))(f22*(ϕ(x))-f12*(ϕ(x))-f21*(ϕ(x))f11*(ϕ(x)))
= ρ(x)(f22*(ϕ(x))-f12*(ϕ(x))-f21*(ϕ(x))f11*(ϕ(x)).)

Giaquinta and Hildebrandt77 7 Mariano Giaquinta and Stefan Hildebrandt, Calculus of Variations II, p. 14. give the following consequence of what we have just written out. If f satisfies the above conditions and satisfies the equation

(1+f22)f11-2f1f2f12+(1+f12)f22=2H(1+f12+f22)3/2,

on Ω, where H is some constant, then

(1+f2(x)2)ρ(x)f22*(ϕ(x))+2f1(x)f2(x)ρ(x)f12*(ϕ(x))+(1+f1(x)2)ρ(x)f11*(ϕ(x))=2H(1+f1(x)2+f2(x)2)3/2

for all xΩ. Therefore,

(1+ξ22)ρ(ψ(ξ))f22*(ξ)+2ξ1ξ2ρ(ψ(ξ))f12*(ξ)+(1+ξ12)ρ(ψ(ξ))f11*(ξ)=2H(1+ξ12+ξ22)3/2

for all ξϕ(Ω), ξ=(ξ1,ξ2). In the case where H=0, then, dividing by ρ(ψ(ξ)), which is >0, we obtain

(1+ξ22)f22*(ξ)+2ξ1ξ2f12*(ξ)+(1+ξ12)f11*(ξ)=0.

In the case H=0, the equation satisfied by f is called the minimal surface equation, and we have thus found a partial differential equation satisfied by the Legendre transform of a solution of the minimal surface equation that satisfies the conditions we imposed at the start of the example. Writing the equation satisfied by f* in the form

Af11*+2Bf12*+Cf22*=0,

we have A=(1+ξ12), B=ξ1ξ2, C=(1+ξ22), with which

B2-AC=ξ12ξ22-(1+ξ12)(1+ξ22)=-1-ξ12-ξ22-1,

which means that partial differential equation satisfied by f* is elliptic.

7 Lagrangians and Hamiltonians

Theorem 6 states that if Ω is a convex domain in n, fC2(Ω), and for all xΩ the matrix (D2f)(x) is positive definite, then

f*(ξ)=ξψ(ξ)-f(ψ(ξ)),ξϕ(Ω).

Suppose that L:n×n× is a function such that for each qn and t, the function vL(q,v,t) satisfies the above conditions. Fix qn and t. With DL=(Lq,Lv,Lt) and ϕ(v)=Lv(q,v,t), ψ=ϕ-1, we have

L*(q,p,t)=pψ(p)-L(q,ψ(p),t),pϕ(n),

or with H=L*,

H(q,p,t)=pψ(p)-L(q,ψ(p),t),pϕ(n).

We have

Hq(q,p,t)=-Lq(q,ψ(p),t),

and

Hp(q,p,t) = ψ(p)+p(Dψ)(p)-Lv(q,ψ(p),t)(Dψ)(p)
= ψ(p)+p(Dψ)(p)-ϕ(ψ(p))(Dψ)(p)
= ψ(p)+p(Dψ)(p)-p(Dψ)(p)
= ψ(p),

and

Ht(q,p,t)=-Lt(q,ψ(p),t).

For a path (q(t),v(t),t) to satisfy the Euler-Lagrange equation means that

ddt(Lv(q(t),v(t),t))=Lq(q(t),v(t),t).

With p(t)=ϕ(v(t)), this yields

dpdt(t)=Lq(q(t),ψ(p(t)),t),

and hence

dpdt(t)=-Hq(q(t),p(t),t).

8 Physical units

First, if a Lagrangian L, L(q,v,t), has units J, then the Hamiltonian H=L*, H(q,p,t), has the same units J, and it follows that pψ(p) has units J. Second, [ψ(p)]=[v], and so [H]=[p][ψ(p)]=[p][v]. Therefore, [p][v]=J. If we take [v]=m/s, then this implies that [p]=kgm/s.

9 More books

V. I. Arnold, Mathematical Methods of Classical Mechanics, second ed., p. 61, §14; Ralph Abraham and Jerrold E. Marsden, Foundations of Mechanics, second ed., p. 218, §3.6; Jerrold E. Marsden and Tudor S. Ratiu, Introduction to Mechanics and Symmetry, second ed., p. 183, §7.2; Jürgen Jost and Xianqing Li-Jost, Calculus of Variations, chapter 4; David Yang Gao, Duality Principles in Nonconvex Systems: Theory, Methods and Applications.

10 History

As best as I can tell, the thing we call the Legendre transform is named after Legendre because of the following paper: Adrien-Marie Legendre, Mémoire sur l’intégration de quelques Équations aux différences partielles, Histoire de l’Académie royale des sciences (1787), 309–351. The following is a partial bibliography of works that refer to this paper of Legendre’s. No historical summary of the Legendre transform exists in the literature, and the following is presented as an aid to the preparation of one. To properly tell the story of the Legendre transform, one would be well served by carefully digging through sources and attentively reading Legendre’s original paper, and also by making oneself comfortable with how it appears in convex analysis, minimal surfaces, contact geometry, thermodynamics, etc. Such a comprehensive history would require meticulously scanning through Legendre’s monumental Traite on elliptic integrals lest relevant material is included there. The best biography of Legendre that exists is the one by Itard in the Dictionary of Scientific Biography, who mentions that something relevant to the Legendre transform appears in volume II of the 1826 Traite, concerning arc lengths. One should also scan through the work of Lagrange, including his 1788 Méchanique analitique, and the work of Euler on the calculus of variations.

Correspondance de Leonhard Euler avec A. C. Clairaut, J. d’Alembert et J. L. Lagrange, pp. 440–441, Note 6; S. S. Demidov, The study of partial differential equations of the first order in the 18th and 19th centuries, Arch. Hist. Exact Sci. 26 (1982), no. 4, 325–350; Erwin Kreyszig, On the Theory of Minimal Surfaces, The Problem of Plateau (Themistocles M. Rassias, ed.), 1992, 138–164, p. 145; Julian Lowell Coolidge, A History of Geometrical Methods, p. 377; Alfred Enneper, Bemerkungen über einige Flächen von constantem Krümmungsmaaß, Nachrichten von der Königl. Gesellschaft der Wissenschaften und der Georg-Augusts-Universität zu Göttingen (1876), 597–619, p. 614; Alfred Enneper, Ueber Flächen mit besonderen Meridiancurven, Abhandlungen der Königlichen Gesellschaft der Wissenschaften in Göttingen 29 (1882), 3–87, p. 6; Gaston Darboux, Leçons sur la théorie générale des surfaces, vol. 1, p. 271, §177; Édouard Goursat, Leçons sur l’intégration des équations aux dérivées partielles du second ordre, à deux variables indépendantes, tome 2, p. 32, chapter V, §113; René Taton, L’œuvre scientifique de Monge, p. 262; Karin Reich, Die Geschichte der Differentialgeometrie von Gauß bis Riemann (1828–1868), Arch. Hist. Exact Sci. 11 (1973), no. 4, 273–376, p. 315; Ivor Grattan-Guinness, Convolutions in French Mathematics, 1800-1840, vol. I, p. 152; Morris Kline, Mathematical Thought From Ancient to Modern Times, chapter 22; João Caramalho Domingues, Lacroix and the Calculus, p. 223; Ernst Hairer, Syvert Paul Nørsett and Gerhard Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, p. 32; Paul Mansion, Théorie des équations aux dérivées partielles du premier ordre, p. 76; A. R. Forsyth, A Treatise on Differential Equations, sixth ed., pp. 418, 476; Lagrange, Méchanique analitique (1788), tome 1, partie 2, §IV; Ernesto Pascal, Die Variationsrechnung, p. 125; Bernhard Riemann, Ueber die Fläche vom kleinsten Inhalt bei gegebener Begrenzung; Courant and Hilbert, vol. II; Cornelius Lanczos, The Variational Principles of Mechanics, fourth ed., §VI.1; Ed. Combescure, Remarques sur un Mémoire de Legendre, Comptes rendus hebdomadaires des séances de l’Académie des sciences 74 (1872), 798–802; Johannes C. C. Nitsche, Vorlesungen über Minimalflächen, p. 147; A. W. Conway and J. L. Synge (ed.), The Mathematical Papers of Sir William Rowan Hamilton, vol. I, (1931), p. 474.