# Fréchet derivatives and Gâteaux derivatives

Jordan Bell
April 3, 2014

## 1 Introduction

In this note all vector spaces are real. If $X$ and $Y$ are normed spaces, we denote by $\mathscr{B}(X,Y)$ the set of bounded linear maps $X\to Y$, and write $\mathscr{B}(X)=\mathscr{B}(X,X)$. $\mathscr{B}(X,Y)$ is a normed space with the operator norm.

## 2 Remainders

If $X$ and $Y$ are normed spaces, let $o(X,Y)$ be the set of all maps $r:X\to Y$ for which there is some map $\alpha:X\to Y$ satisfying:

• $r(x)=\left\|x\right\|\alpha(x)$ for all $x\in X$,

• $\alpha(0)=0$,

• $\alpha$ is continuous at $0$.

Following Penot (Jean-Paul Penot, Calculus Without Derivatives, p. 133, §2.4), we call elements of $o(X,Y)$ remainders. It is immediate that $o(X,Y)$ is a vector space.

If $X$ and $Y$ are normed spaces, if $f:X\to Y$ is a function, and if $x_{0}\in X$, we say that $f$ is stable at $x_{0}$ if there is some $\epsilon>0$ and some $c>0$ such that $\left\|x-x_{0}\right\|\leq\epsilon$ implies that $\left\|f(x-x_{0})\right\|\leq c\left\|x-x_{0}\right\|$. If $T:X\to Y$ is a bounded linear map, then $\left\|Tx\right\|\leq\left\|T\right\|\left\|x\right\|$ for all $x\in X$, and thus a bounded linear map is stable at $0$. The following lemma shows that the composition of a remainder with a function that is stable at $0$ is a remainder (Jean-Paul Penot, Calculus Without Derivatives, p. 134, Lemma 2.41).

###### Lemma 1.

Let $X,Y$ be normed spaces and let $r\in o(X,Y)$. If $W$ is a normed space and $f:W\to X$ is stable at 0, then $r\circ f\in o(W,Y)$. If $Z$ is a normed space and $g:Y\to Z$ is stable at 0, then $g\circ r\in o(X,Z)$.

###### Proof.

$r\in o(X,Y)$ means that there is some $\alpha:X\to Y$ satisfying $r(x)=\left\|x\right\|\alpha(x)$ for all $x\in X$, that takes the value $0$ at $0$, and that is continuous at $0$. As $f$ is stable at $0$, there is some $\epsilon>0$ and some $c>0$ for which $\left\|w\right\|\leq\epsilon$ implies that $\left\|f(w)\right\|\leq c\left\|w\right\|$. Define $\beta:W\to Y$ by

 $\beta(w)=\begin{cases}\frac{\left\|f(w)\right\|}{\left\|w\right\|}\alpha(f(w))&w\neq 0\\ 0&w=0,\end{cases}$

for which we have

 $(r\circ f)(w)=\left\|w\right\|\beta(w),\qquad w\in W.$

If $\left\|w\right\|\leq\epsilon$, then $\left\|\beta(w)\right\|\leq c\left\|\alpha(f(w))\right\|$. But $f(w)\to 0$ as $w\to 0$, and because $\alpha$ is continuous at $0$ we get that $\alpha(f(w))\to\alpha(0)=0$ as $w\to 0$. So the above inequality gives us $\beta(w)\to 0$ as $w\to 0$. As $\beta(0)=0$, the function $\beta:W\to Y$ is continuous at $0$, and therefore $r\circ f$ is a remainder.

As $g$ is stable at $0$, there is some $\epsilon>0$ and some $c>0$ for which $\left\|y\right\|\leq\epsilon$ implies that $\left\|g(y)\right\|\leq c\left\|y\right\|$. Define $\gamma:X\to Z$ by

 $\gamma(x)=\begin{cases}\frac{g(\left\|x\right\|\alpha(x))}{\left\|x\right\|}&x\neq 0\\ 0&x=0.\end{cases}$

For all $x\in X$,

 $(g\circ r)(x)=g(\left\|x\right\|\alpha(x))=\left\|x\right\|\gamma(x).$

Since $\alpha(0)=0$ and $\alpha$ is continuous at $0$, there is some $\delta>0$ such that $\left\|x\right\|\leq\delta$ implies that $\left\|\alpha(x)\right\|\leq\epsilon$. Therefore, if $\left\|x\right\|\leq\delta\wedge 1$ then

 $\left\|g(\left\|x\right\|\alpha(x))\right\|\leq c\left\|x\right\|\left\|\alpha(x)\right\|\leq c\left\|x\right\|\epsilon,$

and hence if $\left\|x\right\|\leq\delta\wedge 1$ then $\left\|\gamma(x)\right\|\leq c\epsilon$. This shows that $\gamma(x)\to 0$ as $x\to 0$, and since $\gamma(0)=0$ the function $\gamma:X\to Z$ is continuous at $0$, showing that $g\circ r$ is a remainder. ∎

If $Y_{1},\ldots,Y_{n}$ are normed spaces where $Y_{k}$ has norm $\left\|\cdot\right\|_{k}$, then $\left\|(y_{1},\ldots,y_{n})\right\|=\max_{1\leq k\leq n}\left\|y_{k}\right\|_{k}$ is a norm on $\prod_{k=1}^{n}Y_{k}$, and one can prove that the topology induced by this norm is the product topology.

###### Lemma 2.

If $X$ and $Y_{1},\ldots,Y_{n}$ are normed spaces, then a function $r:X\to\prod_{k=1}^{n}Y_{k}$ is a remainder if and only if each of the functions $r_{k}:X\to Y_{k}$, $1\leq k\leq n$, is a remainder, where $r(x)=(r_{1}(x),\ldots,r_{n}(x))$ for all $x\in X$.

###### Proof.

Suppose that there is some function $\alpha:X\to\prod_{k=1}^{n}Y_{k}$ such that $r(x)=\left\|x\right\|\alpha(x)$ for all $x\in X$. With $\alpha(x)=(\alpha_{1}(x),\ldots,\alpha_{n}(x))$, we have

 $r_{k}(x)=\left\|x\right\|\alpha_{k}(x),\qquad x\in X.$

Because $\alpha(x)\to 0$ as $x\to 0$, for each $k$ we have $\alpha_{k}(x)\to 0$ as $x\to 0$, which shows that $r_{k}$ is a remainder.

Suppose that each $r_{k}$ is a remainder. Thus, for each $k$ there is a function $\alpha_{k}:X\to Y_{k}$ satisfying $r_{k}(x)=\left\|x\right\|\alpha_{k}(x)$ for all $x\in X$ and $\alpha_{k}(x)\to 0$ as $x\to 0$. Then the function $\alpha:X\to\prod_{k=1}^{n}Y_{k}$ defined by $\alpha(x)=(\alpha_{1}(x),\ldots,\alpha_{n}(x))$ satisfies $r(x)=\left\|x\right\|\alpha(x)$. Because $\alpha_{k}(x)\to 0$ as $x\to 0$ for each of the finitely many $k$, $1\leq k\leq n$, we have $\alpha(x)\to 0$ as $x\to 0$. ∎

## 3 Definition and uniqueness of Fréchet derivative

Suppose that $X$ and $Y$ are normed spaces, that $U$ is an open subset of $X$, and that $x_{0}\in U$. A function $f:U\to Y$ is said to be Fréchet differentiable at $x_{0}$ if there is some $L\in\mathscr{B}(X,Y)$ and some $r\in o(X,Y)$ such that

 $f(x)=f(x_{0})+L(x-x_{0})+r(x-x_{0}),\qquad x\in U.$ (1)

Suppose there are bounded linear maps $L_{1},L_{2}$ and remainders $r_{1},r_{2}$ that satisfy the above. Writing $r_{1}(x)=\left\|x\right\|\alpha_{1}(x)$ and $r_{2}(x)=\left\|x\right\|\alpha_{2}(x)$ for all $x\in X$, we have

 $L_{1}(x-x_{0})+\left\|x-x_{0}\right\|\alpha_{1}(x-x_{0})=L_{2}(x-x_{0})+\left\|x-x_{0}\right\|\alpha_{2}(x-x_{0}),\qquad x\in U,$

i.e.,

 $L_{1}(x-x_{0})-L_{2}(x-x_{0})=\left\|x-x_{0}\right\|(\alpha_{2}(x-x_{0})-\alpha_{1}(x-x_{0})),\qquad x\in U.$

For $x\in X$, there is some $h>0$ such that for all $|t|\leq h$ we have $x_{0}+tx\in U$, and then

 $L_{1}(tx)-L_{2}(tx)=\left\|tx\right\|(\alpha_{2}(tx)-\alpha_{1}(tx)),$

hence, dividing by $t$, for $0<t\leq h$,

 $L_{1}(x)-L_{2}(x)=\left\|x\right\|(\alpha_{2}(tx)-\alpha_{1}(tx)).$

But $\alpha_{2}(tx)-\alpha_{1}(tx)\to 0$ as $t\to 0$, which implies that $L_{1}(x)-L_{2}(x)=0$. As this is true for all $x\in X$, we have $L_{1}=L_{2}$ and then $r_{1}=r_{2}$. If $f$ is Fréchet differentiable at $x_{0}$, the bounded linear map $L$ in (1) is called the Fréchet derivative of $f$ at $x_{0}$, and we define $Df(x_{0})=L$. Thus,

 $f(x)=f(x_{0})+Df(x_{0})(x-x_{0})+r(x-x_{0}),\qquad x\in U.$

If $U_{0}$ is the set of those points in $U$ at which $f$ is Fréchet differentiable, then $Df:U_{0}\to\mathscr{B}(X,Y)$.
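As a numerical illustration of (1), not taken from the text, the remainder $r(h)=f(x_{0}+h)-f(x_{0})-Df(x_{0})h$ should satisfy $\left\|r(h)\right\|/\left\|h\right\|\to 0$ as $h\to 0$; the map $f$, the base point, and the direction below are arbitrary choices for the sketch.

```python
import numpy as np

# Demo map f : R^2 -> R^2 with its hand-computed Frechet derivative (Jacobian).
def f(p):
    x, y = p
    return np.array([x**2 * y, np.sin(x)])

def Df(p):
    x, y = p
    return np.array([[2*x*y, x**2],
                     [np.cos(x), 0.0]])

x0 = np.array([1.0, 2.0])
direction = np.array([0.3, -0.7])

# The remainder r(h) = f(x0+h) - f(x0) - Df(x0)h should be o(||h||),
# i.e. ||r(h)|| / ||h|| -> 0 as h -> 0.
ratios = []
for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * direction
    r = f(x0 + h) - f(x0) - Df(x0) @ h
    ratios.append(np.linalg.norm(r) / np.linalg.norm(h))
print(ratios)  # decreasing toward 0
```

For a twice-differentiable map the ratios shrink roughly linearly with the scale of $h$, consistent with a second-order Taylor remainder.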

Suppose that $X$ and $Y$ are normed spaces and that $U$ is an open subset of $X$. We denote by $C^{1}(U,Y)$ the set of functions $f:U\to Y$ that are Fréchet differentiable at each point in $U$ and for which the function $Df:U\to\mathscr{B}(X,Y)$ is continuous. We say that an element of $C^{1}(U,Y)$ is continuously differentiable. We denote by $C^{2}(U,Y)$ those elements $f$ of $C^{1}(U,Y)$ such that

 $Df\in C^{1}(U,\mathscr{B}(X,Y));$

that is, $C^{2}(U,Y)$ are those $f\in C^{1}(U,Y)$ such that the function $Df:U\to\mathscr{B}(X,Y)$ is Fréchet differentiable at each point in $U$ and such that the function

 $D(Df):U\to\mathscr{B}(X,\mathscr{B}(X,Y))$

is continuous (see Henri Cartan, Differential Calculus, p. 58, §5.1, and Jean Dieudonné, Foundations of Modern Analysis, enlarged and corrected printing, p. 179, Chapter VIII, §12).

The following theorem characterizes continuously differentiable functions $\mathbb{R}^{n}\to\mathbb{R}^{m}$ (Henri Cartan, Differential Calculus, p. 36, §2.7).

###### Theorem 3.

Suppose that $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is Fréchet differentiable at each point in $\mathbb{R}^{n}$, and write

 $f=(f_{1},\ldots,f_{m}).$

$f\in C^{1}(\mathbb{R}^{n},\mathbb{R}^{m})$ if and only if for each $1\leq i\leq m$ and $1\leq j\leq n$ the function

 $\frac{\partial f_{i}}{\partial x_{j}}:\mathbb{R}^{n}\to\mathbb{R}$

is continuous.
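In coordinates, Theorem 3 identifies $Df$ with the matrix of partial derivatives $\partial f_{i}/\partial x_{j}$. A small sketch (the map and the point are illustrative assumptions) assembles that matrix by central differences and compares it with the hand-computed partials:

```python
import numpy as np

# Demo map f : R^3 -> R^2 and its matrix of partials df_i/dx_j.
def f(p):
    x, y, z = p
    return np.array([x*y + z, np.exp(x) * z])

def jacobian_analytic(p):
    x, y, z = p
    return np.array([[y, x, 1.0],
                     [np.exp(x)*z, 0.0, np.exp(x)]])

def jacobian_fd(f, p, h=1e-6):
    """Entry (i, j) is a central-difference estimate of df_i/dx_j at p."""
    p = np.asarray(p, dtype=float)
    m, n = f(p).size, p.size
    J = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(p + e) - f(p - e)) / (2 * h)
    return J

p0 = np.array([0.5, -1.0, 2.0])
err = np.max(np.abs(jacobian_fd(f, p0) - jacobian_analytic(p0)))
print(err)
```

The agreement is limited only by the finite-difference step and floating-point roundoff.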

## 4 Properties of the Fréchet derivative

If $f:X\to Y$ is Fréchet differentiable at $x_{0}$, then, because a bounded linear map is continuous (in particular continuous at $0$) and a remainder is continuous at $0$, $f$ is continuous at $x_{0}$.

We now prove that Fréchet differentiation at a point is linear.

###### Lemma 4 (Linearity).

Let $X$ and $Y$ be normed spaces, let $U$ be an open subset of $X$ and let $x_{0}\in U$. If $f_{1},f_{2}:U\to Y$ are both Fréchet differentiable at $x_{0}$ and if $\alpha\in\mathbb{R}$, then $\alpha f_{1}+f_{2}$ is Fréchet differentiable at $x_{0}$ and

 $D(\alpha f_{1}+f_{2})(x_{0})=\alpha Df_{1}(x_{0})+Df_{2}(x_{0}).$
###### Proof.

There are remainders $r_{1},r_{2}\in o(X,Y)$ such that

 $f_{1}(x)=f_{1}(x_{0})+Df_{1}(x_{0})(x-x_{0})+r_{1}(x-x_{0}),\qquad x\in U,$

and

 $f_{2}(x)=f_{2}(x_{0})+Df_{2}(x_{0})(x-x_{0})+r_{2}(x-x_{0}),\qquad x\in U.$

Then for all $x\in U$,

 $\begin{aligned}(\alpha f_{1}+f_{2})(x)-(\alpha f_{1}+f_{2})(x_{0})&=\alpha f_{1}(x)-\alpha f_{1}(x_{0})+f_{2}(x)-f_{2}(x_{0})\\&=\alpha Df_{1}(x_{0})(x-x_{0})+\alpha r_{1}(x-x_{0})+Df_{2}(x_{0})(x-x_{0})+r_{2}(x-x_{0})\\&=(\alpha Df_{1}(x_{0})+Df_{2}(x_{0}))(x-x_{0})+(\alpha r_{1}+r_{2})(x-x_{0}),\end{aligned}$

and $\alpha r_{1}+r_{2}\in o(X,Y)$. ∎

The following lemma gives an alternate characterization of a function being Fréchet differentiable at a point (Jean-Paul Penot, Calculus Without Derivatives, p. 136, Lemma 2.46).

###### Lemma 5.

Suppose that $X$ and $Y$ are normed spaces, that $U$ is an open subset of $X$, and that $x_{0}\in U$. A function $f:U\to Y$ is Fréchet differentiable at $x_{0}$ if and only if there is some function $F:U\to\mathscr{B}(X,Y)$ that is continuous at $x_{0}$ and for which

 $f(x)-f(x_{0})=F(x)(x-x_{0}),\qquad x\in U.$
###### Proof.

Suppose that there is a function $F:U\to\mathscr{B}(X,Y)$ that is continuous at $x_{0}$ and that satisfies $f(x)-f(x_{0})=F(x)(x-x_{0})$ for all $x\in U$. Then, for $x\in U$,

 $\begin{aligned}f(x)-f(x_{0})&=F(x)(x-x_{0})-F(x_{0})(x-x_{0})+F(x_{0})(x-x_{0})\\&=F(x_{0})(x-x_{0})+r(x-x_{0}),\end{aligned}$

where $r:X\to Y$ is defined by

 $r(x)=\begin{cases}(F(x+x_{0})-F(x_{0}))(x)&x+x_{0}\in U\\ 0&x+x_{0}\not\in U.\end{cases}$

We further define

 $\alpha(x)=\begin{cases}\frac{(F(x+x_{0})-F(x_{0}))(x)}{\left\|x\right\|}&x+x_{0}\in U,x\neq 0\\ 0&x+x_{0}\not\in U\\ 0&x=0,\end{cases}$

with which $r(x)=\left\|x\right\|\alpha(x)$ for all $x\in X$. To prove that $r$ is a remainder it suffices to prove that $\alpha(x)\to 0$ as $x\to 0$. Let $\epsilon>0$. That $F:U\to\mathscr{B}(X,Y)$ is continuous at $x_{0}$ tells us that there is some $\delta>0$ for which $\left\|x\right\|<\delta$ implies that $\left\|F(x+x_{0})-F(x_{0})\right\|<\epsilon$ and hence

 $\left\|(F(x+x_{0})-F(x_{0}))(x)\right\|\leq\left\|F(x+x_{0})-F(x_{0})\right\|\left\|x\right\|<\epsilon\left\|x\right\|.$

Therefore, if $\left\|x\right\|<\delta$ then $\left\|\alpha(x)\right\|<\epsilon$, which establishes that $r$ is a remainder and therefore that $f$ is Fréchet differentiable at $x_{0}$, with Fréchet derivative $Df(x_{0})=F(x_{0})$.

Suppose that $f$ is Fréchet differentiable at $x_{0}$: there is some $r\in o(X,Y)$ such that

 $f(x)=f(x_{0})+Df(x_{0})(x-x_{0})+r(x-x_{0}),\qquad x\in U,$

where $Df(x_{0})\in\mathscr{B}(X,Y)$. As $r$ is a remainder, there is some $\alpha:X\to Y$ satisfying $r(x)=\left\|x\right\|\alpha(x)$ for all $x\in X$, and such that $\alpha(0)=0$ and $\alpha(x)\to 0$ as $x\to 0$. For each $x\in X$, by the Hahn-Banach extension theorem (Walter Rudin, Functional Analysis, second ed., p. 59, Corollary to Theorem 3.3) there is some $\lambda_{x}\in X^{*}$ such that $\lambda_{x}x=\left\|x\right\|$ and $|\lambda_{x}v|\leq\left\|v\right\|$ for all $v\in X$. Thus,

 $r(x)=(\lambda_{x}x)\alpha(x),\qquad x\in X.$

Define $F:U\to\mathscr{B}(X,Y)$ by

 $F(x)=Df(x_{0})+(\lambda_{x-x_{0}})\alpha(x-x_{0}),$

i.e. for $x\in U$ and $v\in X$,

 $F(x)(v)=Df(x_{0})(v)+(\lambda_{x-x_{0}}v)\alpha(x-x_{0})\in Y.$

Then for $x\in U$,

 $r(x-x_{0})=(\lambda_{x-x_{0}}(x-x_{0}))\alpha(x-x_{0})=F(x)(x-x_{0})-Df(x_{0})(x-x_{0}),$

and hence

 $f(x)=f(x_{0})+F(x)(x-x_{0}),\qquad x\in U.$

To complete the proof it suffices to prove that $F$ is continuous at $x_{0}$. Because $\alpha(0)=0$ we have $F(x_{0})=Df(x_{0})$, and for $x\in U$ and $v\in X$,

 $\begin{aligned}\left\|(F(x)-F(x_{0}))(v)\right\|&=\left\|(\lambda_{x-x_{0}}v)\alpha(x-x_{0})\right\|\\&=|\lambda_{x-x_{0}}v|\left\|\alpha(x-x_{0})\right\|\\&\leq\left\|v\right\|\left\|\alpha(x-x_{0})\right\|,\end{aligned}$

so $\left\|F(x)-F(x_{0})\right\|\leq\left\|\alpha(x-x_{0})\right\|$. From this and the fact that $\alpha(0)=0$ and $\alpha(x)\to 0$ as $x\to 0$ we get that $F$ is continuous at $x_{0}$, completing the proof. ∎

We now prove the chain rule for Fréchet derivatives (Jean-Paul Penot, Calculus Without Derivatives, p. 136, Theorem 2.47).

###### Theorem 6 (Chain rule).

Suppose that $X,Y,Z$ are normed spaces and that $U$ and $V$ are open subsets of $X$ and $Y$ respectively. If $f:U\to Y$ satisfies $f(U)\subseteq V$ and is Fréchet differentiable at $x_{0}$ and if $g:V\to Z$ is Fréchet differentiable at $f(x_{0})$, then $g\circ f:U\to Z$ is Fréchet differentiable at $x_{0}$, and its Fréchet derivative at $x_{0}$ is

 $D(g\circ f)(x_{0})=Dg(f(x_{0}))\circ Df(x_{0}).$
###### Proof.

Write $y_{0}=f(x_{0})$, $L_{1}=Df(x_{0})$, and $L_{2}=Dg(y_{0})$. Because $f$ is Fréchet differentiable at $x_{0}$, there is some $r_{1}\in o(X,Y)$ such that

 $f(x)=f(x_{0})+L_{1}(x-x_{0})+r_{1}(x-x_{0}),\qquad x\in U,$

and because $g$ is Fréchet differentiable at $y_{0}$ there is some $r_{2}\in o(Y,Z)$ such that

 $g(y)=g(y_{0})+L_{2}(y-y_{0})+r_{2}(y-y_{0}),\qquad y\in V.$

For all $x\in U$ we have $f(x)\in V$, and using the above formulas,

 $\begin{aligned}g(f(x))&=g(y_{0})+L_{2}(f(x)-y_{0})+r_{2}(f(x)-y_{0})\\&=g(y_{0})+L_{2}\big(L_{1}(x-x_{0})+r_{1}(x-x_{0})\big)+r_{2}\big(L_{1}(x-x_{0})+r_{1}(x-x_{0})\big)\\&=g(y_{0})+L_{2}(L_{1}(x-x_{0}))+L_{2}(r_{1}(x-x_{0}))+r_{2}\big(L_{1}(x-x_{0})+r_{1}(x-x_{0})\big).\end{aligned}$

Define $r_{3}:X\to Z$ by $r_{3}(x)=r_{2}(L_{1}x+r_{1}(x))$, and fix any $c>\left\|L_{1}\right\|$. Writing $r_{1}(x)=\left\|x\right\|\alpha_{1}(x)$, the fact that $\alpha_{1}(0)=0$ and that $\alpha_{1}$ is continuous at $0$ gives us that there is some $\delta>0$ such that if $\left\|x\right\|<\delta$ then $\left\|\alpha_{1}(x)\right\|\leq c-\left\|L_{1}\right\|$, and hence if $\left\|x\right\|<\delta$ then $\left\|r_{1}(x)\right\|\leq(c-\left\|L_{1}\right\|)\left\|x\right\|$. Then, $\left\|x\right\|<\delta$ implies that

 $\left\|L_{1}x+r_{1}(x)\right\|\leq\left\|L_{1}x\right\|+\left\|r_{1}(x)\right\|\leq\left\|L_{1}\right\|\left\|x\right\|+(c-\left\|L_{1}\right\|)\left\|x\right\|=c\left\|x\right\|.$

This shows that $x\mapsto L_{1}x+r_{1}(x)$ is stable at $0$ and so by Lemma 1 that $r_{3}\in o(X,Z)$. Then $r:X\to Z$ defined by $r=L_{2}\circ r_{1}+r_{3}$ is a remainder: $L_{2}\circ r_{1}\in o(X,Z)$ by Lemma 1, because the bounded linear map $L_{2}$ is stable at $0$, and a sum of two remainders is a remainder. We have

 $g\circ f(x)=g\circ f(x_{0})+L_{2}\circ L_{1}(x-x_{0})+r(x-x_{0}),\qquad x\in U.$

But $L_{1}\in\mathscr{B}(X,Y)$ and $L_{2}\in\mathscr{B}(Y,Z)$, so $L_{2}\circ L_{1}\in\mathscr{B}(X,Z)$. This shows that $g\circ f$ is Fréchet differentiable at $x_{0}$ and that its Fréchet derivative at $x_{0}$ is

 $L_{2}\circ L_{1}=Dg(y_{0})\circ Df(x_{0})=Dg(f(x_{0}))\circ Df(x_{0}).$ ∎
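In finite dimensions the chain rule can be checked numerically: the Fréchet derivatives are Jacobian matrices and composition of the linear maps is matrix multiplication. The maps $f$, $g$ and the point below are choices made for this sketch, not from the text.

```python
import numpy as np

# Demo maps f : R^2 -> R^2 and g : R^2 -> R, with hand-computed Jacobians
# standing in for the bounded linear maps Df(x0) and Dg(f(x0)).
def f(p):
    x, y = p
    return np.array([x*y, x + y**2])

def Df(p):
    x, y = p
    return np.array([[y, x],
                     [1.0, 2*y]])

def g(q):
    u, v = q
    return np.sin(u) + v**2

def Dg(q):
    u, v = q
    return np.array([np.cos(u), 2*v])

x0 = np.array([0.7, -0.3])

# Chain rule: D(g o f)(x0) = Dg(f(x0)) o Df(x0), i.e. a matrix product.
chain = Dg(f(x0)) @ Df(x0)

# Central-difference gradient of g o f for comparison.
h = 1e-6
fd = np.array([(g(f(x0 + h*e)) - g(f(x0 - h*e))) / (2*h) for e in np.eye(2)])

err = np.max(np.abs(chain - fd))
print(err)
```

The two length-2 row vectors agree to roughly finite-difference accuracy.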

The following is the product rule for Fréchet derivatives. By $f_{1}\cdot f_{2}$ we mean the function $x\mapsto f_{1}(x)f_{2}(x)$.

###### Theorem 7 (Product rule).

Suppose that $X$ is a normed space, that $U$ is an open subset of $X$, that $f_{1},f_{2}:U\to\mathbb{R}$ are functions, and that $x_{0}\in U$. If $f_{1}$ and $f_{2}$ are both Fréchet differentiable at $x_{0}$, then $f_{1}\cdot f_{2}$ is Fréchet differentiable at $x_{0}$, and its Fréchet derivative at $x_{0}$ is

 $D(f_{1}\cdot f_{2})(x_{0})=f_{2}(x_{0})Df_{1}(x_{0})+f_{1}(x_{0})Df_{2}(x_{0}).$
###### Proof.

There are $r_{1},r_{2}\in o(X,\mathbb{R})$ with which

 $f_{1}(x)=f_{1}(x_{0})+Df_{1}(x_{0})(x-x_{0})+r_{1}(x-x_{0}),\qquad x\in U$

and

 $f_{2}(x)=f_{2}(x_{0})+Df_{2}(x_{0})(x-x_{0})+r_{2}(x-x_{0}),\qquad x\in U.$

Multiplying the above two formulas,

 $\begin{aligned}f_{1}(x)f_{2}(x)&=f_{1}(x_{0})f_{2}(x_{0})+f_{2}(x_{0})Df_{1}(x_{0})(x-x_{0})+f_{1}(x_{0})Df_{2}(x_{0})(x-x_{0})\\&\quad+Df_{1}(x_{0})(x-x_{0})Df_{2}(x_{0})(x-x_{0})+r_{1}(x-x_{0})r_{2}(x-x_{0})\\&\quad+f_{1}(x_{0})r_{2}(x-x_{0})+r_{2}(x-x_{0})Df_{1}(x_{0})(x-x_{0})\\&\quad+f_{2}(x_{0})r_{1}(x-x_{0})+r_{1}(x-x_{0})Df_{2}(x_{0})(x-x_{0}).\end{aligned}$

Define $r:X\to\mathbb{R}$ by

 $\begin{aligned}r(x)&=Df_{1}(x_{0})x\,Df_{2}(x_{0})x+r_{1}(x)r_{2}(x)+f_{1}(x_{0})r_{2}(x)+r_{2}(x)Df_{1}(x_{0})x\\&\quad+f_{2}(x_{0})r_{1}(x)+r_{1}(x)Df_{2}(x_{0})x,\end{aligned}$

for which we have, for $x\in U$,

 $f_{1}(x)f_{2}(x)=f_{1}(x_{0})f_{2}(x_{0})+f_{2}(x_{0})Df_{1}(x_{0})(x-x_{0})+f_{1}(x_{0})Df_{2}(x_{0})(x-x_{0})+r(x-x_{0}).$

Therefore, to prove the claim it suffices to prove that $r\in o(X,\mathbb{R})$. Define $\alpha:X\to\mathbb{R}$ by $\alpha(0)=0$ and $\alpha(x)=\frac{Df_{1}(x_{0})xDf_{2}(x_{0})x}{\left\|x\right\|}$ for $x\neq 0$. For $x\neq 0$,

 $\begin{aligned}|\alpha(x)|&=\frac{|Df_{1}(x_{0})x||Df_{2}(x_{0})x|}{\left\|x\right\|}\\&\leq\frac{\left\|Df_{1}(x_{0})\right\|\left\|x\right\|\left\|Df_{2}(x_{0})\right\|\left\|x\right\|}{\left\|x\right\|}\\&=\left\|Df_{1}(x_{0})\right\|\left\|Df_{2}(x_{0})\right\|\left\|x\right\|.\end{aligned}$

Thus $\alpha(x)\to 0$ as $x\to 0$, showing that the first term in the expression for $r$ belongs to $o(X,\mathbb{R})$. Likewise, each of the other five terms in the expression for $r$ belongs to $o(X,\mathbb{R})$, and hence $r\in o(X,\mathbb{R})$, completing the proof. ∎

## 5 Dual spaces

If $X$ is a normed space, we denote by $X^{*}$ the set of bounded linear maps $X\to\mathbb{R}$, i.e. $X^{*}=\mathscr{B}(X,\mathbb{R})$. $X^{*}$ is itself a normed space with the operator norm. If $X$ is a normed space, the dual pairing $\left\langle\cdot,\cdot\right\rangle:X\times X^{*}\to\mathbb{R}$ is

 $\left\langle x,\psi\right\rangle=\psi(x),\qquad x\in X,\psi\in X^{*}.$

If $U$ is an open subset of $X$ and if a function $f:U\to\mathbb{R}$ is Fréchet differentiable at $x_{0}\in U$, then $Df(x_{0})$ is a bounded linear map $X\to\mathbb{R}$, and so belongs to $X^{*}$. If $U_{0}$ are those points in $U$ at which $f:U\to\mathbb{R}$ is Fréchet differentiable, then

 $Df:U_{0}\to X^{*}.$

In the case that $X$ is a Hilbert space with inner product $\left\langle\cdot,\cdot\right\rangle$, the Riesz representation theorem shows that $R:X\to X^{*}$ defined by $R(x)(y)=\left\langle y,x\right\rangle$ is an isometric isomorphism. If $f:U\to\mathbb{R}$ is Fréchet differentiable at $x_{0}\in U$, then we define

 $\nabla f(x_{0})=R^{-1}(Df(x_{0})),$

and call $\nabla f(x_{0})\in X$ the gradient of $f$ at $x_{0}$. With $U_{0}$ denoting the set of those points in $U$ at which $f$ is Fréchet differentiable,

 $\nabla f:U_{0}\to X.$

(To define the gradient we merely used that $R$ is a bijection, but to prove properties of the gradient one uses that $R$ is an isometric isomorphism.)

Example. Let $X$ be a Hilbert space, $A\in\mathscr{B}(X)$, $v\in X$, and define

 $f(x)=\left\langle Ax,x\right\rangle-\left\langle x,v\right\rangle,\qquad x\in X.$

For all $x_{0},x\in X$ we have, because the inner product of a real Hilbert space is symmetric,

 $\begin{aligned}f(x)-f(x_{0})&=\left\langle Ax,x\right\rangle-\left\langle x,v\right\rangle-\left\langle Ax_{0},x_{0}\right\rangle+\left\langle x_{0},v\right\rangle\\&=\left\langle Ax,x\right\rangle-\left\langle Ax_{0},x\right\rangle+\left\langle Ax_{0},x\right\rangle-\left\langle Ax_{0},x_{0}\right\rangle-\left\langle x-x_{0},v\right\rangle\\&=\left\langle A(x-x_{0}),x\right\rangle+\left\langle Ax_{0},x-x_{0}\right\rangle-\left\langle x-x_{0},v\right\rangle\\&=\left\langle x-x_{0},A^{*}x\right\rangle+\left\langle x-x_{0},Ax_{0}\right\rangle-\left\langle x-x_{0},v\right\rangle\\&=\left\langle x-x_{0},A^{*}x+Ax_{0}-v\right\rangle\\&=\left\langle x-x_{0},A^{*}x-A^{*}x_{0}+A^{*}x_{0}+Ax_{0}-v\right\rangle\\&=\left\langle x-x_{0},(A^{*}+A)x_{0}-v\right\rangle+\left\langle x-x_{0},A^{*}(x-x_{0})\right\rangle.\end{aligned}$

The last term satisfies $|\left\langle x-x_{0},A^{*}(x-x_{0})\right\rangle|\leq\left\|A^{*}\right\|\left\|x-x_{0}\right\|^{2}$ and so is a remainder. Therefore, with $Df(x_{0})(x-x_{0})=\left\langle x-x_{0},(A^{*}+A)x_{0}-v\right\rangle$, or $Df(x_{0})(x)=\left\langle x,(A^{*}+A)x_{0}-v\right\rangle$, we have that $f$ is Fréchet differentiable at each $x_{0}\in X$. Furthermore, its gradient at $x_{0}$ is

 $\nabla f(x_{0})=(A^{*}+A)x_{0}-v.$

For each $x_{0}\in X$, the function $f:X\to\mathbb{R}$ is Fréchet differentiable at $x_{0}$, and thus

 $Df:X\to X^{*},$

and we can ask at what points $Df$ has a Fréchet derivative. For $x_{0},x,y\in X$,

 $\begin{aligned}(Df(x)-Df(x_{0}))(y)&=\left\langle y,(A^{*}+A)x-v\right\rangle-\left\langle y,(A^{*}+A)x_{0}-v\right\rangle\\&=\left\langle y,(A^{*}+A)(x-x_{0})\right\rangle.\end{aligned}$

Thus, with $D(Df)(x_{0})(x-x_{0})(y)=\left\langle y,(A^{*}+A)(x-x_{0})\right\rangle$, in other words with

 $D^{2}f(x_{0})(x)(y)=D(Df)(x_{0})(x)(y)=\left\langle y,(A^{*}+A)x\right\rangle,$

we have that $Df$ is Fréchet differentiable at each $x_{0}\in X$. Thus

 $D^{2}f:X\to\mathscr{B}(X,X^{*}).$

Because $D^{2}f(x_{0})$ does not depend on $x_{0}$, it is Fréchet differentiable at each point in $X$, with $D^{3}f(x_{0})=0$ for all $x_{0}\in X$. Here $D^{3}f:X\to\mathscr{B}(X,\mathscr{B}(X,X^{*}))$.
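Taking $X=\mathbb{R}^{n}$ with the dot product as a finite-dimensional stand-in for the Hilbert space (so $A^{*}$ is the transpose), the gradient formula $\nabla f(x_{0})=(A^{*}+A)x_{0}-v$ can be checked against finite differences; the matrix, vector, and point below are arbitrary choices for the sketch.

```python
import numpy as np

# X = R^n with the dot product; the adjoint A* is the transpose A.T.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
x0 = rng.standard_normal(n)

def f(x):
    return (A @ x) @ x - x @ v        # <Ax, x> - <x, v>

grad_formula = (A.T + A) @ x0 - v

# Central differences are exact for quadratics up to roundoff.
h = 1e-6
grad_fd = np.array([(f(x0 + h*e) - f(x0 - h*e)) / (2*h) for e in np.eye(n)])

err = np.max(np.abs(grad_formula - grad_fd))
print(err)
```

Since $f$ is quadratic, the central-difference quotient equals the directional derivative up to floating-point roundoff.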

## 6 Gâteaux derivatives

Let $X$ and $Y$ be normed spaces, let $U$ be an open subset of $X$, let $f:U\to Y$ be a function, and let $x_{0}\in U$. If there is some $T\in\mathscr{B}(X,Y)$ such that for all $v\in X$ we have

 $\lim_{t\to 0}\frac{f(x_{0}+tv)-f(x_{0})}{t}=Tv,$ (2)

then we say that $f$ is Gâteaux differentiable at $x_{0}$ and call $T$ the Gâteaux derivative of $f$ at $x_{0}$ (our definition of the Gâteaux derivative follows Jean-Paul Penot, Calculus Without Derivatives, p. 127, Definition 2.23). It is apparent that there is at most one $T\in\mathscr{B}(X,Y)$ that satisfies (2) for all $v\in X$. We write $f^{\prime}(x_{0})=T$. Thus, $f^{\prime}$ is a map from the set of points in $U$ at which $f$ is Gâteaux differentiable to $\mathscr{B}(X,Y)$. If $V\subseteq U$ and $f$ is Gâteaux differentiable at each element of $V$, we say that $f$ is Gâteaux differentiable on $V$.

Example. Define $f:\mathbb{R}^{2}\to\mathbb{R}$ by $f(x_{1},x_{2})=\frac{x_{1}^{4}x_{2}}{x_{1}^{6}+x_{2}^{3}}$ for $(x_{1},x_{2})\neq(0,0)$ and $f(0,0)=0$. For $v=(v_{1},v_{2})\in\mathbb{R}^{2}$ and $t\neq 0$,

 $\frac{f(0+tv)-f(0)}{t}=\frac{f(tv_{1},tv_{2})}{t}=\begin{cases}\frac{1}{t}\cdot\frac{t^{5}v_{1}^{4}v_{2}}{t^{6}v_{1}^{6}+t^{3}v_{2}^{3}}&v\neq(0,0)\\ 0&v=(0,0)\end{cases}=\begin{cases}\frac{tv_{1}^{4}v_{2}}{t^{3}v_{1}^{6}+v_{2}^{3}}&v\neq(0,0)\\ 0&v=(0,0).\end{cases}$

Hence, for any $v\in\mathbb{R}^{2}$, we have $\frac{f(0+tv)-f(0)}{t}\to 0$ as $t\to 0$. Therefore, $f$ is Gâteaux differentiable at $(0,0)$ and $f^{\prime}(0,0)v=0\in\mathbb{R}$ for all $v\in\mathbb{R}^{2}$, i.e. $f^{\prime}(0,0)=0$. However, for $(x_{1},x_{2})\neq(0,0)$,

 $f(x_{1},x_{1}^{2})=\frac{x_{1}^{6}}{x_{1}^{6}+x_{1}^{6}}=\frac{1}{2},$

from which it follows that $f$ is not continuous at $(0,0)$. We stated in §4 that if a function is Fréchet differentiable at a point then it is continuous at that point, and so $f$ is not Fréchet differentiable at $(0,0)$. Thus, a function that is Gâteaux differentiable at a point need not be Fréchet differentiable at that point.
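The example can be replayed numerically: difference quotients through the origin all tend to $0$, while along the curve $x_{2}=x_{1}^{2}$ the function is identically $\frac{1}{2}$, witnessing the discontinuity at $(0,0)$.

```python
# The example's function, with f(0,0) = 0.
def f(x1, x2):
    if (x1, x2) == (0.0, 0.0):
        return 0.0
    return x1**4 * x2 / (x1**6 + x2**3)

# Difference quotients (f(tv) - f(0))/t along several directions:
t = 1e-4
quotients = [(f(t*v1, t*v2) - f(0.0, 0.0)) / t
             for v1, v2 in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -3.0)]]
print(quotients)        # all close to 0, as f'(0,0) = 0

# But along x2 = x1^2 the value stays near 1/2, so f is not continuous at (0,0):
print(f(1e-4, 1e-8))
```

Shrinking $t$ further drives the quotients toward $0$, while points on the curve $x_{2}=x_{1}^{2}$ arbitrarily close to the origin keep the value $\frac{1}{2}$.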

We prove that being Fréchet differentiable at a point implies being Gâteaux differentiable at the point, and that in this case the Gâteaux derivative is equal to the Fréchet derivative.

###### Theorem 8.

Suppose that $X$ and $Y$ are normed spaces, that $U$ is an open subset of $X$, that $f:U\to Y$ is a function, and that $x_{0}\in U$. If $f$ is Fréchet differentiable at $x_{0}$, then $f$ is Gâteaux differentiable at $x_{0}$ and $f^{\prime}(x_{0})=Df(x_{0})$.

###### Proof.

Because $f$ is Fréchet differentiable at $x_{0}$, there is some $r\in o(X,Y)$ for which

 $f(x)=f(x_{0})+Df(x_{0})(x-x_{0})+r(x-x_{0}),\qquad x\in U.$

For $v\in X$ and nonzero $t$ small enough that $x_{0}+tv\in U$,

 $\frac{f(x_{0}+tv)-f(x_{0})}{t}=\frac{Df(x_{0})(x_{0}+tv-x_{0})+r(x_{0}+tv-x_{0})}{t}=\frac{tDf(x_{0})v+r(tv)}{t}.$

Writing $r(x)=\left\|x\right\|\alpha(x)$,

 $\frac{f(x_{0}+tv)-f(x_{0})}{t}=\frac{tDf(x_{0})v+\left\|tv\right\|\alpha(tv)}{t}=Df(x_{0})v+\frac{\left\|tv\right\|}{t}\alpha(tv),$

and $\left|\frac{\left\|tv\right\|}{t}\right|=\left\|v\right\|$ while $\alpha(tv)\to 0$ as $t\to 0$.

Hence,

 $\lim_{t\to 0}\frac{f(x_{0}+tv)-f(x_{0})}{t}=Df(x_{0})v.$

This holds for all $v\in X$, and as $Df(x_{0})\in\mathscr{B}(X,Y)$ we get that $f$ is Gâteaux differentiable at $x_{0}$ and that $f^{\prime}(x_{0})=Df(x_{0})$. ∎

If $X$ is a vector space and $u,v\in X$, let

 $[u,v]=\{(1-t)u+tv:0\leq t\leq 1\},$

namely, the line segment joining $u$ and $v$. The following is a mean value theorem for Gâteaux derivatives (Antonio Ambrosetti and Giovanni Prodi, A Primer of Nonlinear Analysis, p. 13, Theorem 1.8).

###### Theorem 9 (Mean value theorem).

Let $X$ and $Y$ be normed spaces, let $U$ be an open subset of $X$, and let $f:U\to Y$ be Gâteaux differentiable on $U$. If $u,v\in U$ and $[u,v]\subset U$, then

 $\left\|f(u)-f(v)\right\|\leq\sup_{w\in[u,v]}\left\|f^{\prime}(w)\right\|\cdot\left\|u-v\right\|.$
###### Proof.

If $f(u)=f(v)$ then immediately the claim is true. Otherwise, $f(v)-f(u)\neq 0$, and so by the Hahn-Banach extension theorem (Walter Rudin, Functional Analysis, second ed., p. 59, Corollary) there is some $\psi\in Y^{*}$ satisfying $\psi(f(v)-f(u))=\left\|f(v)-f(u)\right\|$ and $\left\|\psi\right\|=1$. Define $h:[0,1]\to\mathbb{R}$ by

 $h(t)=\left\langle f((1-t)u+tv),\psi\right\rangle.$

For $0<t<1$ and $\tau\neq 0$ satisfying $t+\tau\in[0,1]$, we have

 $\begin{aligned}\frac{h(t+\tau)-h(t)}{\tau}&=\frac{1}{\tau}\left\langle f((1-t-\tau)u+(t+\tau)v),\psi\right\rangle-\frac{1}{\tau}\left\langle f((1-t)u+tv),\psi\right\rangle\\&=\left\langle\frac{f((1-t)u+tv+(v-u)\tau)-f((1-t)u+tv)}{\tau},\psi\right\rangle.\end{aligned}$

Because $f$ is Gâteaux differentiable at $(1-t)u+tv$,

 $\lim_{\tau\to 0}\frac{f((1-t)u+tv+(v-u)\tau)-f((1-t)u+tv)}{\tau}=f^{\prime}((1-t)u+tv)(v-u),$

so because $\psi$ is continuous,

 $\lim_{\tau\to 0}\frac{h(t+\tau)-h(t)}{\tau}=\left\langle f^{\prime}((1-t)u+tv)(v-u),\psi\right\rangle,$

which shows that $h$ is differentiable at $t$ and that

 $h^{\prime}(t)=\left\langle f^{\prime}((1-t)u+tv)(v-u),\psi\right\rangle.$

$h:[0,1]\to\mathbb{R}$ is a composition of continuous functions so it is continuous. Applying the mean value theorem, there is some $\theta$, $0<\theta<1$, for which

 $h^{\prime}(\theta)=h(1)-h(0).$

On the one hand,

 $h^{\prime}(\theta)=\left\langle f^{\prime}((1-\theta)u+\theta v)(v-u),\psi\right\rangle.$

On the other hand,

 $h(1)-h(0)=\left\langle f(v),\psi\right\rangle-\left\langle f(u),\psi\right\rangle=\left\langle f(v)-f(u),\psi\right\rangle=\left\|f(v)-f(u)\right\|.$

Therefore

 $\begin{aligned}\left\|f(v)-f(u)\right\|&=|\left\langle f^{\prime}((1-\theta)u+\theta v)(v-u),\psi\right\rangle|\\&\leq\left\|\psi\right\|\left\|f^{\prime}((1-\theta)u+\theta v)(v-u)\right\|\\&=\left\|f^{\prime}((1-\theta)u+\theta v)(v-u)\right\|\\&\leq\left\|f^{\prime}((1-\theta)u+\theta v)\right\|\left\|v-u\right\|\\&\leq\sup_{w\in[u,v]}\left\|f^{\prime}(w)\right\|\left\|v-u\right\|.\end{aligned}$ ∎
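Theorem 9 can be illustrated numerically in $\mathbb{R}^{2}$; with Euclidean norms the operator norm of the Jacobian is its largest singular value. The map, endpoints, and sampling density below are choices for the sketch.

```python
import numpy as np

# Demo map f : R^2 -> R^2 and its Jacobian.
def f(p):
    x, y = p
    return np.array([np.sin(x*y), x**2 - y])

def Df(p):
    x, y = p
    return np.array([[y*np.cos(x*y), x*np.cos(x*y)],
                     [2*x, -1.0]])

u = np.array([0.2, 1.5])
v = np.array([1.1, -0.4])

# Approximate sup_{w in [u,v]} ||Df(w)|| by sampling the segment;
# ord=2 gives the operator norm (largest singular value).
sup_norm = max(np.linalg.norm(Df((1-t)*u + t*v), ord=2)
               for t in np.linspace(0.0, 1.0, 201))

lhs = np.linalg.norm(f(u) - f(v))
rhs = sup_norm * np.linalg.norm(u - v)
print(lhs, rhs)   # lhs <= rhs
```

The sampled supremum slightly underestimates the true one, but the inequality is far from tight here and holds comfortably.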

## 7 Antiderivatives

Suppose that $X$ is a Banach space and that $f:[a,b]\to X$ is continuous. Define $F:[a,b]\to X$ by

 $F(x)=\int_{a}^{x}f.$

Let $x_{0}\in(a,b)$. For $x\in(a,b)$, we have

 $F(x)-F(x_{0})=\int_{a}^{x}f-\int_{a}^{x_{0}}f=\int_{x_{0}}^{x}f=f(x_{0})(x-x_{0})+\int_{x_{0}}^{x}(f-f(x_{0})),$

from which it follows that $F$ is Fréchet differentiable at $x_{0}$, and that

 $DF(x_{0})(x-x_{0})=f(x_{0})(x-x_{0}).$

If we identify $f(x_{0})\in X$ with the map $x\mapsto f(x_{0})x$, namely if we say that $X=\mathscr{B}(\mathbb{R},X)$, then $DF(x_{0})=f(x_{0})$.

Let $X$ be a normed space, let $Y$ be a Banach space, let $U$ be an open subset of $X$, and let $f\in C^{1}(U,Y)$. Suppose that $u,v\in U$ satisfy $[u,v]\subset U$. Write $I=(0,1)$ and define $\gamma:I\to U$ by $\gamma(t)=(1-t)u+tv$. We have

 $D\gamma(t)=v-u,\qquad t\in I,$

and thus by Theorem 6,

 $D(f\circ\gamma)(t)=Df(\gamma(t))\circ D\gamma(t),\qquad t\in I,$

that is,

 $D(f\circ\gamma)(t)=Df(\gamma(t))\circ(v-u),\qquad t\in I,$

i.e.

 $D(f\circ\gamma)(t)=Df(\gamma(t))(v-u),\qquad t\in I.$

If $t\in I$ and $t+h\in I$, then

 $\begin{aligned}D(f\circ\gamma)(t+h)-D(f\circ\gamma)(t)&=Df(\gamma(t+h))(v-u)-Df(\gamma(t))(v-u)\\&=\left(Df(\gamma(t+h))-Df(\gamma(t))\right)(v-u),\end{aligned}$

and hence

 $\left\|D(f\circ\gamma)(t+h)-D(f\circ\gamma)(t)\right\|\leq\left\|Df(\gamma(t+h))-Df(\gamma(t))\right\|\left\|v-u\right\|.$

Because $Df:U\to\mathscr{B}(X,Y)$ is continuous, it follows that

 $\left\|D(f\circ\gamma)(t+h)-D(f\circ\gamma)(t)\right\|\to 0$

as $h\to 0$, i.e. that $D(f\circ\gamma)$ is continuous at $t$, and thus that

 $D(f\circ\gamma):I\to\mathscr{B}(\mathbb{R},Y)$

is continuous. If we identify $\mathscr{B}(\mathbb{R},Y)$ with $Y$, then

 $D(f\circ\gamma):I\to Y.$

On the one hand,

 $\int_{0}^{1}D(f\circ\gamma)=(f\circ\gamma)(1)-(f\circ\gamma)(0)=f(v)-f(u).$

On the other hand,

 $\int_{0}^{1}D(f\circ\gamma)=\int_{0}^{1}Df(\gamma(t))(v-u)dt=\left(\int_{0}^{1}Df((1-t)u+tv)dt\right)(v-u);$

here,

 $\int_{0}^{1}Df((1-t)u+tv)dt\in\mathscr{B}(X,Y).$

Therefore

 $f(v)-f(u)=\left(\int_{0}^{1}Df((1-t)u+tv)dt\right)(v-u).$
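The closing identity can be checked numerically for a smooth map, approximating the operator-valued integral entrywise by the trapezoid rule; the map, endpoints, and step count are illustrative choices.

```python
import numpy as np

# Demo map f : R^2 -> R^2 and its Jacobian.
def f(p):
    x, y = p
    return np.array([np.exp(x) * y, x**3 + y**2])

def Df(p):
    x, y = p
    return np.array([[np.exp(x)*y, np.exp(x)],
                     [3*x**2, 2*y]])

u = np.array([0.1, 0.8])
v = np.array([0.9, -0.2])

# Entrywise trapezoid rule for J = int_0^1 Df((1-t)u + tv) dt.
ts = np.linspace(0.0, 1.0, 2001)
Js = np.array([Df((1-t)*u + t*v) for t in ts])   # shape (2001, 2, 2)
dt = ts[1] - ts[0]
J = (0.5*Js[0] + Js[1:-1].sum(axis=0) + 0.5*Js[-1]) * dt

lhs = f(v) - f(u)                                # left side of the identity
rhs = J @ (v - u)                                # averaged derivative applied to v - u
err = np.max(np.abs(lhs - rhs))
print(err)
```

The discrepancy is governed by the quadrature error of the trapezoid rule, which shrinks quadratically in the step size.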