Gradients and Hessians in Hilbert spaces
1 Gradients
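Let $H$ be a real Hilbert space. The Riesz representation theorem says that the mapping $\Phi:H\to H^*$ defined by
\[ \Phi(x)(y)=\langle y,x\rangle, \qquad x,y\in H, \]
is an isometric isomorphism. Let $U$ be a nonempty open subset of $H$ and let $f:U\to\mathbb{R}$ be differentiable, with derivative $f':U\to H^*$. The gradient of $f$ is the function $\nabla f:U\to H$ defined by
\[ \nabla f=\Phi^{-1}\circ f'. \]
Thus, for $x\in U$, $\nabla f(x)$ is the unique element of $H$ satisfying
\[ \langle y,\nabla f(x)\rangle=f'(x)(y), \qquad y\in H. \tag{1} \]
Because $\Phi^{-1}$ is continuous, if $f\in C^1(U;\mathbb{R})$ then $\nabla f\in C(U;H)$, being a composition of two continuous functions.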
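For example, let $T$ be a bounded self-adjoint operator on $H$ and define $f:H\to\mathbb{R}$ by
\[ f(x)=\frac12\langle Tx,x\rangle. \]
For $x,h\in H$,
\[ f(x+h)-f(x)=\frac12\langle Tx,h\rangle+\frac12\langle Th,x\rangle+\frac12\langle Th,h\rangle=\langle Tx,h\rangle+\frac12\langle Th,h\rangle. \]
Thus
\[ |f(x+h)-f(x)-\langle Tx,h\rangle|=\frac12|\langle Th,h\rangle|\le\frac12\|T\|\,\|h\|^2, \]
which shows that $f$ is differentiable at $x$, with $f'(x)(h)=\langle h,Tx\rangle$. Thus by (1), $\nabla f(x)=Tx$.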
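The identity $\nabla f(x)=Tx$ for $f(x)=\frac12\langle Tx,x\rangle$ can be sanity-checked numerically. The sketch below is only a finite-dimensional illustration: it assumes $H=\mathbb{R}^3$ with the dot product, and the symmetric matrix `T`, the point `x`, and the helper names are hypothetical choices, not part of the text.

```python
# Finite-dimensional sketch: H = R^3 with the dot product (an assumption).
# T is a hypothetical symmetric matrix and f(x) = 0.5 * <T x, x>.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(T, x):
    return [dot(row, x) for row in T]

def f(T, x):
    return 0.5 * dot(matvec(T, x), x)

def numerical_gradient(T, x, eps=1e-6):
    """Central finite differences of f along the coordinate directions."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(T, xp) - f(T, xm)) / (2 * eps))
    return grad

T = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 1.0]]
x = [1.0, -2.0, 0.5]

exact = matvec(T, x)               # the claimed gradient T x
approx = numerical_gradient(T, x)  # finite-difference approximation
max_error = max(abs(a - b) for a, b in zip(exact, approx))
print(max_error)
```

Because $f$ is quadratic, the central difference agrees with $Tx$ up to rounding error, so `max_error` should be very small.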
For example, let , let , and define by
We calculate that
For , define
It is proved (cf. J. W. Neuberger, A Sequence of Problems on Semigroups, p. 51, Problem 195) that satisfies
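For a function $F:H\to H$ and $L>0$, we say that $F$ is $L$-Lipschitz if
\[ \|F(x)-F(y)\|\le L\|x-y\|, \qquad x,y\in H. \]
The following is a useful inequality for functions whose gradients are Lipschitz (Juan Peypouquet, Convex Optimization in Normed Spaces: Theory, Methods and Examples, p. 15, Lemma 1.30).
Lemma 1.
If $f:H\to\mathbb{R}$ is differentiable and $\nabla f$ is $L$-Lipschitz, then
\[ f(y)\le f(x)+\langle\nabla f(x),y-x\rangle+\frac{L}{2}\|y-x\|^2, \qquad x,y\in H. \]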
Proof.
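Let $x,y\in H$ and define $g:[0,1]\to\mathbb{R}$ by $g(t)=f(x+t(y-x))$. By the chain rule, for $0<t<1$,
\[ g'(t)=f'(x+t(y-x))(y-x)=\langle\nabla f(x+t(y-x)),y-x\rangle. \]
Thus by the fundamental theorem of calculus,
\[ f(y)-f(x)=g(1)-g(0)=\int_0^1\langle\nabla f(x+t(y-x)),y-x\rangle\,dt, \]
and so, using the Cauchy-Schwarz inequality and the fact that $\nabla f$ is $L$-Lipschitz,
\[ \begin{aligned} f(y)&=f(x)+\langle\nabla f(x),y-x\rangle+\int_0^1\langle\nabla f(x+t(y-x))-\nabla f(x),y-x\rangle\,dt\\ &\le f(x)+\langle\nabla f(x),y-x\rangle+\int_0^1 tL\|y-x\|^2\,dt\\ &=f(x)+\langle\nabla f(x),y-x\rangle+\frac{L}{2}\|y-x\|^2, \end{aligned} \]
proving the claim. ∎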
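As an illustration of Lemma 1, the following sketch tests the inequality $f(y)\le f(x)+\langle\nabla f(x),y-x\rangle+\frac{L}{2}\|y-x\|^2$ on random points. It assumes $H=\mathbb{R}^3$ and the quadratic $f(x)=\frac12\langle Tx,x\rangle$ with a hypothetical symmetric matrix `T`, whose gradient $x\mapsto Tx$ is $L$-Lipschitz for any $L\ge\|T\|$; the Frobenius norm is used as a convenient upper bound for $\|T\|$. All names are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(T, x):
    return [dot(row, x) for row in T]

def f(T, x):
    return 0.5 * dot(matvec(T, x), x)

def descent_gap(T, L, x, y):
    """Right-hand side minus left-hand side of the inequality in Lemma 1."""
    h = [b - a for a, b in zip(x, y)]  # h = y - x
    upper = f(T, x) + dot(matvec(T, x), h) + 0.5 * L * dot(h, h)
    return upper - f(T, y)

T = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 1.0]]

# The Frobenius norm bounds the operator norm, so it is a valid
# (not necessarily optimal) Lipschitz constant for x -> T x.
L = math.sqrt(sum(a * a for row in T for a in row))

rng = random.Random(0)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(3)]
    y = [rng.uniform(-5, 5) for _ in range(3)]
    gaps.append(descent_gap(T, L, x, y))

smallest_gap = min(gaps)
print(smallest_gap >= -1e-9)
```

A small negative tolerance is allowed only to absorb floating-point rounding; the lemma itself asserts that the gap is nonnegative.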
2 Hessians
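Let $U$ be a nonempty open subset of $H$. We prove that if a function is $C^2$ then its gradient is $C^1$ (Rodney Coleman, Calculus on Normed Vector Spaces, p. 139, Theorem 6.5).
Theorem 2.
Let $U$ be an open subset of $H$. If $f\in C^2(U;\mathbb{R})$, then $\nabla f\in C^1(U;H)$, and
\[ f''(x)(u)(v)=\langle v,(\nabla f)'(x)(u)\rangle, \qquad x\in U,\ u,v\in H. \tag{2} \]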
Proof.
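That $f$ is $C^2$ means that $f':U\to H^*$ is $C^1$. That is, for all $x_0\in U$, the map $f'$ is continuous at $x_0$ and there is $f''(x_0)\in\mathscr{L}(H;H^*)$ such that
\[ \|f'(x_0+h)-f'(x_0)-f''(x_0)(h)\|=o(\|h\|) \tag{3} \]
as $h\to0$, and the map $x\mapsto f''(x)$ is continuous $U\to\mathscr{L}(H;H^*)$.
Let $x_0\in U$ and let $A=f''(x_0)\in\mathscr{L}(H;H^*)$. Define $T:H\to H$ by
\[ T=\Phi^{-1}\circ A. \]
Define $B(u,v)=A(u)(v)$ for $u,v\in H$, thus
\[ B(u,v)=\langle v,Tu\rangle. \]
It is straightforward that $T$ is linear. Because $\Phi$ is an isometric isomorphism,
\[ \|Tu\|=\|A(u)\|=\sup_{\|v\|\le1}|B(u,v)|, \]
where $B:H\times H\to\mathbb{R}$ is a bilinear form, with
\[ \sup_{\|u\|\le1}\|Tu\|=\sup_{\|u\|\le1,\ \|v\|\le1}|B(u,v)|=\|A\|<\infty, \]
showing that $T$ is a bounded linear operator with $\|T\|=\|f''(x_0)\|$. For $h\in H$ such that $x_0+h\in U$ and for $v\in H$,
\[ \langle v,\nabla f(x_0+h)-\nabla f(x_0)-Th\rangle=f'(x_0+h)(v)-f'(x_0)(v)-f''(x_0)(h)(v), \]
so
\[ \|\nabla f(x_0+h)-\nabla f(x_0)-Th\|=\|f'(x_0+h)-f'(x_0)-f''(x_0)(h)\|. \]
Thus by (3),
\[ \|\nabla f(x_0+h)-\nabla f(x_0)-Th\|=o(\|h\|) \]
as $h\to0$, and because $T\in\mathscr{L}(H;H)$, this means that $\nabla f$ is differentiable at $x_0$, with $(\nabla f)'(x_0)=T=\Phi^{-1}\circ f''(x_0)$, which is (2). It remains to prove that $\nabla f$ is $C^1$, namely that $x\mapsto(\nabla f)'(x)$ is continuous $U\to\mathscr{L}(H;H)$. For $x,y\in U$ and for $u\in H$ with $\|u\|\le1$,
\[ \|(\nabla f)'(x)(u)-(\nabla f)'(y)(u)\|=\|f''(x)(u)-f''(y)(u)\|\le\|f''(x)-f''(y)\|, \]
and because $f''$ is continuous on $U$ we get that $(\nabla f)'$ is continuous on $U$, completing the proof. ∎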
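If $f\in C^2(U;\mathbb{R})$, we proved in the above theorem that $\nabla f\in C^1(U;H)$. We call the derivative of $\nabla f$ the Hessian of $f$ (cf. R. A. Tapia, The differentiation and integration of nonlinear operators, pp. 45–101, in Nonlinear Functional Analysis and Applications, Louis B. Rall, ed.),
\[ \operatorname{Hess}f=(\nabla f)':U\to\mathscr{L}(H;H), \]
and (2) then reads
\[ f''(x)(u)(v)=\langle v,\operatorname{Hess}f(x)(u)\rangle, \qquad x\in U,\ u,v\in H. \]
Furthermore, it is a fact that if $f\in C^2(U;\mathbb{R})$, then for each $x\in U$, the bilinear form
\[ (u,v)\mapsto f''(x)(u)(v) \]
is symmetric (Serge Lang, Real and Functional Analysis, third ed., p. 344, Theorem 5.3). Thus, for $x\in U$ and $u,v\in H$,
\[ \langle v,\operatorname{Hess}f(x)(u)\rangle=f''(x)(u)(v)=f''(x)(v)(u)=\langle u,\operatorname{Hess}f(x)(v)\rangle. \]
Now, using that $\langle\cdot,\cdot\rangle$ is symmetric as $H$ is a real Hilbert space, the adjoint $\operatorname{Hess}f(x)^*$ satisfies
\[ \langle u,\operatorname{Hess}f(x)(v)\rangle=\langle\operatorname{Hess}f(x)^*(u),v\rangle=\langle v,\operatorname{Hess}f(x)^*(u)\rangle, \]
so
\[ \langle v,\operatorname{Hess}f(x)(u)\rangle=\langle v,\operatorname{Hess}f(x)^*(u)\rangle. \]
Because this is true for all $v\in H$ we have $\operatorname{Hess}f(x)(u)=\operatorname{Hess}f(x)^*(u)$, and because this is true for all $u\in H$ we have $\operatorname{Hess}f(x)=\operatorname{Hess}f(x)^*$, i.e. $\operatorname{Hess}f(x)$ is self-adjoint.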
Theorem 3.
If $U$ is an open subset of $H$ and $f\in C^2(U;\mathbb{R})$, then for each $x\in U$ it is the case that $\operatorname{Hess}f(x)\in\mathscr{L}(H;H)$ is self-adjoint.
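In finite dimensions, self-adjointness of the Hessian is the familiar symmetry of the matrix of second partial derivatives. The sketch below illustrates this for a hypothetical smooth function on $\mathbb{R}^2$, chosen only for the example: the Hessian is approximated by differentiating the analytic gradient with central differences, so the symmetry of the result is not built into the formula.

```python
import math

def grad_f(x):
    """Analytic gradient of the example f(x0, x1) = sin(x0) * exp(x1) + x0**2 * x1."""
    x0, x1 = x
    return [math.cos(x0) * math.exp(x1) + 2.0 * x0 * x1,
            math.sin(x0) * math.exp(x1) + x0 ** 2]

def numerical_hessian(x, eps=1e-5):
    """Entry [i][j] approximates the i-th component of (grad f)'(x) applied to e_j."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        gp, gm = grad_f(xp), grad_f(xm)
        for i in range(n):
            H[i][j] = (gp[i] - gm[i]) / (2.0 * eps)
    return H

point = [0.3, -0.7]
H = numerical_hessian(point)
asymmetry = abs(H[0][1] - H[1][0])
print(asymmetry)
```

Both off-diagonal entries approximate $\cos(x_0)e^{x_1}+2x_0$, so `asymmetry` should be at the level of the discretization and rounding error.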
3 Critical points
For an open set $U$ in $H$, for $f\in C^1(U;\mathbb{R})$, and for $x_0\in U$, we say that $x_0$ is a critical point of $f$ if $\nabla f(x_0)=0$. If $f\in C^2(U;\mathbb{R})$ and $x_0$ is a critical point of $f$, we say that $x_0$ is a nondegenerate critical point of $f$ if $\operatorname{Hess}f(x_0)$ is invertible. The Morse-Palais lemma (Serge Lang, Differential and Riemannian Manifolds, p. 182, chapter VII, Theorem 5.1; Kung-ching Chang, Infinite Dimensional Morse Theory and Multiple Solution Problems, p. 33, Theorem 4.1; André Avez, Calcul différentiel, p. 87, §3; N. A. Bobylev, S. V. Emel’yanov, and S. K. Korovin, Geometrical Methods in Variational Problems, p. 360, Theorem 5.5.2; Hajime Urakawa, Calculus of Variations and Harmonic Maps, p. 87, chapter 3, §1, Theorem 1.10; Jean-Pierre Aubin and Ivar Ekeland, Applied Nonlinear Analysis, p. 52, Theorem 8; Melvyn S. Berger, Nonlinearity and Functional Analysis: Lectures on Nonlinear Problems in Mathematical Analysis, p. 355, Theorem 6.5.4) states that if $f\in C^{k+2}(U;\mathbb{R})$ with $k\ge1$, $f(0)=0$, and $0$ is a nondegenerate critical point of $f$, then there is some open subset $V$ of $U$ with $0\in V$ and a $C^k$ diffeomorphism $\phi:V\to\phi(V)$, $\phi(0)=0$, such that
\[ f(x)=\frac12\langle\operatorname{Hess}f(0)(\phi(x)),\phi(x)\rangle, \qquad x\in V. \]
If $x_0$ is a critical point of a differentiable function $f$, we call $f(x_0)$ a critical value of $f$. If $f\in C^k(\mathbb{R}^n;\mathbb{R})$ and $k\ge n$, Sard’s theorem tells us that the set of critical values of $f$ has Lebesgue measure $0$ and is meager.
For Banach spaces $X$ and $Y$, a Fredholm operator (Martin Schechter, Principles of Functional Analysis, second ed., chapter 5) is a bounded linear operator $T:X\to Y$ such that (i) $\dim\ker T<\infty$, (ii) $T(X)$ is a closed subset of $Y$, and (iii) $\dim(Y/T(X))<\infty$. The index of a Fredholm operator $T$ is
\[ \operatorname{ind}T=\dim\ker T-\dim(Y/T(X)). \]
For a differentiable function , an open subset of , and for , . is a Fredholm operator if and only if . For a connected open subset $U$ of $X$ and for $f\in C^1(U;Y)$, we call $f$ a Fredholm map if $f'(x)$ is a Fredholm operator for each $x\in U$. It is a fact that $\operatorname{ind}f'(x)=\operatorname{ind}f'(y)$ for all $x,y\in U$, using that $U$ is connected. We denote this common value by $\operatorname{ind}f$. A generalization of Sard’s theorem by Smale tells us that if $X$ is separable, $U$ is a connected open subset of $X$, $f\in C^k(U;Y)$ is a Fredholm map, and
\[ k>\max(\operatorname{ind}f,0), \]
then the set of critical values of $f$ is meager (Eberhard Zeidler, Nonlinear Functional Analysis and its Applications, IV: Applications to Mathematical Physics, p. 829, Theorem 78.A; Melvyn S. Berger, Nonlinearity and Functional Analysis: Lectures on Nonlinear Problems in Mathematical Analysis, p. 125, Theorem 3.1.45).
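A function $f\in C^1(H;\mathbb{R})$ is said to satisfy the Palais-Smale condition if, whenever $(x_n)$ is a sequence in $H$ such that (i) $\{f(x_n)\}$ is a bounded subset of $\mathbb{R}$ and (ii) $\nabla f(x_n)\to0$, the set $\{x_n\}$ is a precompact subset of $H$: every subsequence of $(x_n)$ itself has a Cauchy subsequence.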
Often when speaking about ordinary differential equations in $\mathbb{R}^n$, we deal with differentiable functions whose derivatives are locally Lipschitz. $\mathbb{R}^n$ has the Heine-Borel property: a subset $K$ of $\mathbb{R}^n$ is compact if and only if $K$ is closed and bounded. In fact no infinite dimensional Banach space has the Heine-Borel property. (Some Fréchet spaces have the Heine-Borel property, like the space of holomorphic functions on the open unit disc, which is what Montel’s theorem says.) Thus a locally Lipschitz function need not be Lipschitz on a bounded subset of $H$. (On a compact set, a locally Lipschitz function is Lipschitz: the set is covered by finitely many balls on which the function is Lipschitz, points closer together than a Lebesgue number of this cover lie in a common ball, and for points farther apart one uses that the function is bounded on the compact set.) We denote by $\mathcal{C}$ the set of functions $f:H\to\mathbb{R}$ that are differentiable and such that for each bounded subset $A$ of $H$, the restriction of $\nabla f$ to $A$ is Lipschitz.
The mountain pass theorem (Lawrence C. Evans, Partial Differential Equations, p. 480, Theorem 2; Antonio Ambrosetti and David Arcoya Álvarez, An Introduction to Nonlinear Functional Analysis and Elliptic Problems, p. 48, §5.3) states that if (i) $f\in\mathcal{C}$, (ii) $f$ satisfies the Palais-Smale condition, (iii) $f(0)=0$, (iv) there are $r,a>0$ such that $f(x)\ge a$ when $\|x\|=r$, and (v) there is some $v\in H$ satisfying $\|v\|>r$ and $f(v)\le0$, then
\[ c=\inf_{g\in\Gamma}\max_{0\le t\le1}f(g(t)) \]
is a critical value of $f$, where
\[ \Gamma=\{g\in C([0,1];H):g(0)=0,\ g(1)=v\}. \]
4 Convexity
We prove that a critical point of a differentiable convex function on an open convex set is a minimum (N. A. Bobylev, S. V. Emel’yanov, and S. K. Korovin, Geometrical Methods in Variational Problems, p. 39, Theorem 2.1.4).
Theorem 4.
If $U\subset H$ is an open convex set, $f:U\to\mathbb{R}$ is differentiable and convex, and $x_0\in U$ is a critical point of $f$, then $f(x_0)\le f(x)$ for all $x\in U$.
Proof.
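Because $f$ is convex, for $x\in U$ and $0<t<1$,
\[ f(x_0+t(x-x_0))\le(1-t)f(x_0)+tf(x), \]
i.e.
\[ \frac{f(x_0+t(x-x_0))-f(x_0)}{t}\le f(x)-f(x_0). \]
Taking $t\to0$,
\[ f'(x_0)(x-x_0)\le f(x)-f(x_0), \]
and because $x_0$ is a critical point,
\[ 0\le f(x)-f(x_0), \]
i.e. $f(x_0)\le f(x)$. ∎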
We establish equivalent conditions for a differentiable function to be convex (Juan Peypouquet, Convex Optimization in Normed Spaces: Theory, Methods and Examples, p. 38, Proposition 3.10).
Theorem 5.
If $U$ is an open convex subset of $H$ and $f:U\to\mathbb{R}$ is differentiable, then the following are equivalent:
1. $f$ is convex.
2. $f(y)\ge f(x)+\langle\nabla f(x),y-x\rangle$, $x,y\in U$.
3. $\langle\nabla f(x)-\nabla f(y),x-y\rangle\ge0$, $x,y\in U$.
Proof.
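Suppose (1). For $x,y\in U$ and $0<t<1$, that $f$ is convex means $f(x+t(y-x))\le(1-t)f(x)+tf(y)$, i.e.
\[ \frac{f(x+t(y-x))-f(x)}{t}\le f(y)-f(x), \]
and taking $t\to0$ yields
\[ f'(x)(y-x)\le f(y)-f(x), \]
i.e.
\[ f(y)\ge f(x)+\langle\nabla f(x),y-x\rangle. \]
Suppose (2) and let $x,y\in U$, for which
\[ f(y)\ge f(x)+\langle\nabla f(x),y-x\rangle, \qquad f(x)\ge f(y)+\langle\nabla f(y),x-y\rangle. \]
Adding these inequalities,
\[ \langle\nabla f(x)-\nabla f(y),x-y\rangle\ge0. \]
Suppose (3), let $x,y\in U$, and define $\phi:[0,1]\to\mathbb{R}$ by
\[ \phi(t)=f(tx+(1-t)y)-tf(x)-(1-t)f(y). \]
Then $\phi(0)=0$ and $\phi(1)=0$, and for $0<t<1$, using the chain rule gives
\[ \phi'(t)=\langle\nabla f(tx+(1-t)y),x-y\rangle-f(x)+f(y). \]
Let $0<s<t<1$, let $u=sx+(1-s)y$ and $v=tx+(1-t)y$, which both belong to $U$ because $U$ is convex, and so the above reads
\[ \phi'(s)=\langle\nabla f(u),x-y\rangle-f(x)+f(y), \qquad \phi'(t)=\langle\nabla f(v),x-y\rangle-f(x)+f(y), \]
so
\[ \phi'(t)-\phi'(s)=\langle\nabla f(v)-\nabla f(u),x-y\rangle. \]
And
\[ v-u=(t-s)(x-y), \]
so
\[ \phi'(t)-\phi'(s)=\frac{1}{t-s}\langle\nabla f(v)-\nabla f(u),v-u\rangle. \]
But (3) tells us
\[ \langle\nabla f(v)-\nabla f(u),v-u\rangle\ge0, \]
so, as $t-s>0$,
\[ \phi'(t)-\phi'(s)\ge0, \]
showing that $\phi'$ is nondecreasing. On the other hand, because $\phi(0)=0$ and $\phi(1)=0$, by the mean value theorem there is some $0<t_0<1$ for which $\phi'(t_0)=0$. Therefore, because $\phi'$ is nondecreasing it holds that
\[ \phi'(t)\le0, \qquad 0<t\le t_0, \]
and
\[ \phi'(t)\ge0, \qquad t_0\le t<1. \]
That is, $\phi$ is nonincreasing on $[0,t_0]$, and with $\phi(0)=0$ this yields $\phi(t)\le0$ for $t\in[0,t_0]$, and $\phi$ is nondecreasing on $[t_0,1]$, and with $\phi(1)=0$ this yields $\phi(t)\le0$ for $t\in[t_0,1]$. Therefore $\phi(t)\le0$ for $t\in[0,1]$, which means that
\[ f(tx+(1-t)y)\le tf(x)+(1-t)f(y), \qquad 0\le t\le1, \]
showing that $f$ is convex. ∎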
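Conditions (2) and (3) of Theorem 5, namely $f(y)\ge f(x)+\langle\nabla f(x),y-x\rangle$ and $\langle\nabla f(x)-\nabla f(y),x-y\rangle\ge0$, can be observed numerically for a concrete convex function. The sketch below assumes $H=\mathbb{R}^3$ and uses log-sum-exp, a standard smooth convex function whose gradient is the softmax; this choice of function and all helper names are illustrative and not taken from the text.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(x):
    """Log-sum-exp, a smooth convex function on R^n (computed stably)."""
    m = max(x)
    return m + math.log(sum(math.exp(a - m) for a in x))

def grad_f(x):
    """Gradient of log-sum-exp: the softmax of x."""
    m = max(x)
    w = [math.exp(a - m) for a in x]
    s = sum(w)
    return [a / s for a in w]

def first_order_gap(x, y):
    """f(y) - f(x) - <grad f(x), y - x>; nonnegative by condition (2)."""
    h = [b - a for a, b in zip(x, y)]
    return f(y) - f(x) - dot(grad_f(x), h)

def monotonicity_gap(x, y):
    """<grad f(x) - grad f(y), x - y>; nonnegative by condition (3)."""
    d = [a - b for a, b in zip(grad_f(x), grad_f(y))]
    h = [a - b for a, b in zip(x, y)]
    return dot(d, h)

rng = random.Random(1)
pairs = [([rng.uniform(-5, 5) for _ in range(3)],
          [rng.uniform(-5, 5) for _ in range(3)]) for _ in range(1000)]

min_first_order = min(first_order_gap(x, y) for x, y in pairs)
min_monotone = min(monotonicity_gap(x, y) for x, y in pairs)
print(min_first_order >= -1e-9, min_monotone >= -1e-9)
```

The tolerance `-1e-9` only accounts for rounding; a random test of this kind illustrates the equivalence but does not prove it.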
Theorem 6.
If $U$ is an open convex subset of $H$ and $f:U\to\mathbb{R}$ is twice differentiable, then the following are equivalent:
1. $f$ is convex.
2. $f''(x)(h)(h)\ge0$, $x\in U$, $h\in H$.
Proof.
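Suppose (1), let $x\in U$ and $h\in H$. For $t>0$ small enough that $x+th\in U$, Theorem 5 gives $(f'(x+th)-f'(x))(th)\ge0$. Because $f'(x+th)-f'(x)=tf''(x)(h)+o(t)$ in $H^*$ as $t\to0$, dividing by $t^2$ and taking $t\to0$ yields $f''(x)(h)(h)\ge0$.
Suppose (2), let $x,y\in U$ and define $\phi:[0,1]\to\mathbb{R}$ by
\[ \phi(t)=f(tx+(1-t)y)-tf(x)-(1-t)f(y). \]
Applying the chain rule, for $0<t<1$,
\[ \phi''(t)=f''(tx+(1-t)y)(x-y)(x-y), \]
i.e.
\[ \phi''(t)\ge0, \]
showing that $\phi'$ is nondecreasing. In the proof of Theorem 5 we deduced from $\phi'$ being nondecreasing and $\phi$ satisfying $\phi(0)=0$, $\phi(1)=0$, that $f$ is convex, and the same reasoning yields here that $f$ is convex. ∎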
We call a function $F:H\to H$ $\beta$-co-coercive, $\beta>0$, if
\[ \langle F(x)-F(y),x-y\rangle\ge\beta\|F(x)-F(y)\|^2, \qquad x,y\in H. \]
We prove conditions under which the gradient of a differentiable convex function is co-coercive (Juan Peypouquet, Convex Optimization in Normed Spaces: Theory, Methods and Examples, p. 40, Theorem 3.13).
Theorem 7 (Baillon-Haddad theorem).
Let $f:H\to\mathbb{R}$ be differentiable and convex and let $L>0$. Then $\nabla f$ is $L$-Lipschitz if and only if $\nabla f$ is $\frac1L$-co-coercive.
Proof.
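Suppose that $\nabla f$ is $L$-Lipschitz and for $x\in H$, define $h_x:H\to\mathbb{R}$ by
\[ h_x(y)=f(y)-\langle\nabla f(x),y\rangle. \]
For $y,z\in H$ and $0\le t\le1$, because $f$ is convex,
\[ \begin{aligned} h_x(ty+(1-t)z)&=f(ty+(1-t)z)-\langle\nabla f(x),ty+(1-t)z\rangle\\ &\le tf(y)+(1-t)f(z)-t\langle\nabla f(x),y\rangle-(1-t)\langle\nabla f(x),z\rangle\\ &=th_x(y)+(1-t)h_x(z), \end{aligned} \]
showing that $h_x$ is convex. For $y\in H$ (Henri Cartan, Differential Calculus, p. 29, Proposition 2.4.2),
\[ \nabla h_x(y)=\nabla f(y)-\nabla f(x), \]
and in particular $\nabla h_x(x)=0$. Thus by Theorem 4,
\[ h_x(x)\le h_x(y), \qquad y\in H. \tag{4} \]
For $y,z\in H$, by Lemma 1,
\[ f(z)\le f(y)+\langle\nabla f(y),z-y\rangle+\frac{L}{2}\|z-y\|^2, \]
so
\[ f(z)-\langle\nabla f(x),z\rangle\le f(y)-\langle\nabla f(x),y\rangle+\langle\nabla f(y)-\nabla f(x),z-y\rangle+\frac{L}{2}\|z-y\|^2, \]
i.e.
\[ h_x(z)\le h_x(y)+\langle\nabla f(y)-\nabla f(x),z-y\rangle+\frac{L}{2}\|z-y\|^2, \]
and applying (4),
\[ h_x(x)\le h_x(y)+\langle\nabla f(y)-\nabla f(x),z-y\rangle+\frac{L}{2}\|z-y\|^2. \tag{5} \]
Now,
\[ \|\nabla f(y)-\nabla f(x)\|=\sup_{\|v\|=1}\langle\nabla f(y)-\nabla f(x),v\rangle, \]
so for each $\epsilon>0$ there is some $v_\epsilon\in H$ with $\|v_\epsilon\|=1$ and
\[ \langle\nabla f(y)-\nabla f(x),v_\epsilon\rangle\ge\|\nabla f(y)-\nabla f(x)\|-\epsilon. \]
Let $R=\frac1L\|\nabla f(x)-\nabla f(y)\|$, and applying (5) with $z=y-Rv_\epsilon$ yields
\[ \begin{aligned} h_x(x)&\le h_x(y)-R\langle\nabla f(y)-\nabla f(x),v_\epsilon\rangle+\frac{L}{2}R^2\\ &\le h_x(y)-\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^2+R\epsilon. \end{aligned} \]
Likewise, because $R$ does not change when $x$ and $y$ are switched,
\[ h_y(y)\le h_y(x)-\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^2+R\epsilon. \]
Adding these inequalities,
\[ h_x(x)+h_y(y)\le h_x(y)+h_y(x)-\frac1L\|\nabla f(x)-\nabla f(y)\|^2+2R\epsilon, \]
i.e.
\[ \frac1L\|\nabla f(x)-\nabla f(y)\|^2\le\langle\nabla f(x)-\nabla f(y),x-y\rangle+2R\epsilon. \]
This is true for all $\epsilon>0$, so
\[ \langle\nabla f(x)-\nabla f(y),x-y\rangle\ge\frac1L\|\nabla f(x)-\nabla f(y)\|^2, \]
showing that $\nabla f$ is $\frac1L$-co-coercive.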
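Suppose that $\nabla f$ is $\frac1L$-co-coercive and let $x,y\in H$. Then applying the Cauchy-Schwarz inequality,
\[ \frac1L\|\nabla f(x)-\nabla f(y)\|^2\le\langle\nabla f(x)-\nabla f(y),x-y\rangle\le\|\nabla f(x)-\nabla f(y)\|\,\|x-y\|. \]
If $\nabla f(x)=\nabla f(y)$ then certainly $\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|$. Otherwise, dividing by $\|\nabla f(x)-\nabla f(y)\|$ gives
\[ \|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|, \]
showing that $\nabla f$ is $L$-Lipschitz. ∎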
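The two properties related by the Baillon-Haddad theorem can be compared on a concrete example. The sketch below assumes $H=\mathbb{R}^3$ and the convex function $f(x)=\sum_i\log(1+e^{x_i})$, whose gradient is the componentwise logistic sigmoid; since the sigmoid has derivative at most $\frac14$, the gradient is $L$-Lipschitz with $L=\frac14$, and the theorem then predicts $\langle\nabla f(x)-\nabla f(y),x-y\rangle\ge4\|\nabla f(x)-\nabla f(y)\|^2$. The function and the helper names are illustrative choices, not part of the text.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_f(x):
    """Gradient of the example f(x) = sum_i log(1 + exp(x_i))."""
    return [sigmoid(a) for a in x]

L = 0.25  # Lipschitz constant of grad_f, since sigmoid' <= 1/4

def lipschitz_gap(x, y):
    """L * ||x - y|| - ||grad f(x) - grad f(y)||; nonnegative if grad f is L-Lipschitz."""
    d = [a - b for a, b in zip(grad_f(x), grad_f(y))]
    h = [a - b for a, b in zip(x, y)]
    return L * math.sqrt(dot(h, h)) - math.sqrt(dot(d, d))

def cocoercivity_gap(x, y):
    """<grad f(x) - grad f(y), x - y> - (1/L) * ||grad f(x) - grad f(y)||^2."""
    d = [a - b for a, b in zip(grad_f(x), grad_f(y))]
    h = [a - b for a, b in zip(x, y)]
    return dot(d, h) - (1.0 / L) * dot(d, d)

rng = random.Random(2)
pairs = [([rng.uniform(-5, 5) for _ in range(3)],
          [rng.uniform(-5, 5) for _ in range(3)]) for _ in range(1000)]

min_lipschitz = min(lipschitz_gap(x, y) for x, y in pairs)
min_cocoercive = min(cocoercivity_gap(x, y) for x, y in pairs)
print(min_lipschitz >= -1e-9, min_cocoercive >= -1e-9)
```

Both gaps should be nonnegative up to rounding on every sampled pair, consistent with the equivalence in Theorem 7.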