Fréchet derivatives and Gâteaux derivatives
1 Introduction
In this note all vector spaces are real. If $X$ and $Y$ are normed spaces, we denote by $\mathcal{B}(X,Y)$ the set of bounded linear maps $X\to Y$, and write $\mathcal{B}(X)=\mathcal{B}(X,X)$. $\mathcal{B}(X,Y)$ is a normed space with the operator norm.
2 Remainders
If $X$ and $Y$ are normed spaces, let $o(X,Y)$ be the set of all maps $r:X\to Y$ for which there is some map $\alpha :X\to Y$ satisfying:

• $r(x)=\parallel x\parallel \alpha (x)$ for all $x\in X$,
• $\alpha (0)=0$,
• $\alpha $ is continuous at $0$.
Following Penot (Jean-Paul Penot, Calculus Without Derivatives, p. 133, §2.4), we call elements of $o(X,Y)$ remainders. It is immediate that $o(X,Y)$ is a vector space.
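The definition can be illustrated numerically. The following is a minimal sketch, assuming $X=Y={\mathbb{R}}^{2}$ with the Euclidean norm; the maps `r_good` and `r_bad` are hypothetical examples chosen for illustration and are not taken from Penot. For $r(x)=\parallel x\parallel x$ we have $\alpha (x)=x\to 0$, so $r$ is a remainder, while for $r(x)=x$ the function $\alpha $ has norm $1$ away from the origin and so cannot be continuous at $0$ with $\alpha (0)=0$.

```python
import math

def norm(x):
    """Euclidean norm on R^2."""
    return math.hypot(x[0], x[1])

def r_good(x):
    """Hypothetical remainder: r(x) = ||x|| x, so alpha(x) = x."""
    n = norm(x)
    return (n * x[0], n * x[1])

def r_bad(x):
    """Hypothetical non-remainder: r(x) = x, so ||alpha(x)|| = 1 for x != 0."""
    return x

def alpha_norm(r, x):
    """Return ||alpha(x)|| = ||r(x)|| / ||x|| for x != 0."""
    return norm(r(x)) / norm(x)

# Points approaching the origin along a fixed direction.
points = [(10.0 ** -k, 2.0 * 10.0 ** -k) for k in range(1, 8)]
good = [alpha_norm(r_good, x) for x in points]  # shrinks with ||x||
bad = [alpha_norm(r_bad, x) for x in points]    # stays at 1
```

Sampling along one direction only suggests the limit; the argument that $\alpha (x)\to 0$ is the one given in the text.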
If $X$ and $Y$ are normed spaces, if $f:X\to Y$ is a function, and if ${x}_{0}\in X$, we say that $f$ is stable at ${x}_{0}$ if there is some $\epsilon >0$ and some $c>0$ such that $\parallel x-{x}_{0}\parallel \le \epsilon $ implies that $\parallel f(x-{x}_{0})\parallel \le c\parallel x-{x}_{0}\parallel $. If $T:X\to Y$ is a bounded linear map, then $\parallel Tx\parallel \le \parallel T\parallel \parallel x\parallel $ for all $x\in X$, and thus a bounded linear map is stable at $0$. The following lemma shows that the composition of a remainder with a function that is stable at $0$ is a remainder (Jean-Paul Penot, Calculus Without Derivatives, p. 134, Lemma 2.41).
Lemma 1.
Let $X,Y$ be normed spaces and let $r\in o(X,Y)$. If $W$ is a normed space and $f:W\to X$ is stable at 0, then $r\circ f\in o(W,Y)$. If $Z$ is a normed space and $g:Y\to Z$ is stable at 0, then $g\circ r\in o(X,Z)$.
Proof.
$r\in o(X,Y)$ means that there is some $\alpha :X\to Y$ satisfying $r(x)=\parallel x\parallel \alpha (x)$ for all $x\in X$, that takes the value $0$ at $0$, and that is continuous at $0$. As $f$ is stable at $0$, there is some $\epsilon >0$ and some $c>0$ for which $\parallel w\parallel \le \epsilon $ implies that $\parallel f(w)\parallel \le c\parallel w\parallel $. Define $\beta :W\to Y$ by
$$\beta (w)=\begin{cases}\frac{\parallel f(w)\parallel}{\parallel w\parallel}\alpha (f(w)) & w\ne 0\\ 0 & w=0,\end{cases}$$
for which we have
$$(r\circ f)(w)=\parallel w\parallel \beta (w),\qquad w\in W.$$
If $0<\parallel w\parallel \le \epsilon $, then $\parallel \beta (w)\parallel \le c\parallel \alpha (f(w))\parallel $. But $f(w)\to 0$ as $w\to 0$, and because $\alpha $ is continuous at $0$ we get that $\alpha (f(w))\to \alpha (0)=0$ as $w\to 0$. So the above inequality gives us $\beta (w)\to 0$ as $w\to 0$. As $\beta (0)=0$, the function $\beta :W\to Y$ is continuous at $0$, and therefore $r\circ f$ is a remainder.
As $g$ is stable at $0$, there is some $\epsilon >0$ and some $c>0$ for which $\parallel y\parallel \le \epsilon $ implies that $\parallel g(y)\parallel \le c\parallel y\parallel $. Define $\gamma :X\to Z$ by
$$\gamma (x)=\begin{cases}\frac{g(\parallel x\parallel \alpha (x))}{\parallel x\parallel} & x\ne 0\\ 0 & x=0.\end{cases}$$
For all $x\in X$,
$$(g\circ r)(x)=g(\parallel x\parallel \alpha (x))=\parallel x\parallel \gamma (x).$$
Since $\alpha (0)=0$ and $\alpha $ is continuous at $0$, there is some $\delta >0$ such that $\parallel x\parallel \le \delta $ implies that $\parallel \alpha (x)\parallel \le \epsilon $. Therefore, if $\parallel x\parallel \le \delta \wedge 1$ then $\parallel \parallel x\parallel \alpha (x)\parallel \le \epsilon $, and so
$$\parallel g(\parallel x\parallel \alpha (x))\parallel \le c\parallel x\parallel \parallel \alpha (x)\parallel ,$$
and hence if $0<\parallel x\parallel \le \delta \wedge 1$ then $\parallel \gamma (x)\parallel \le c\parallel \alpha (x)\parallel $. Since $\alpha (x)\to 0$ as $x\to 0$, this shows that $\gamma (x)\to 0$ as $x\to 0$, and since $\gamma (0)=0$ the function $\gamma :X\to Z$ is continuous at $0$, showing that $g\circ r$ is a remainder. ∎
If ${Y}_{1},\mathrm{\dots},{Y}_{n}$ are normed spaces where ${Y}_{k}$ has norm $\parallel \cdot {\parallel}_{k}$, then $\parallel ({y}_{1},\mathrm{\dots},{y}_{n})\parallel ={\mathrm{max}}_{1\le k\le n}{\parallel {y}_{k}\parallel}_{k}$ is a norm on ${\prod}_{k=1}^{n}{Y}_{k}$, and one can prove that the topology induced by this norm is the product topology.
Lemma 2.
If $X$ and ${Y}_{1},\mathrm{\dots},{Y}_{n}$ are normed spaces, then a function $r:X\to {\prod}_{k=1}^{n}{Y}_{k}$ is a remainder if and only if each ${r}_{k}:X\to {Y}_{k}$, $1\le k\le n$, is a remainder, where $r(x)=({r}_{1}(x),\mathrm{\dots},{r}_{n}(x))$ for all $x\in X$.
Proof.
Suppose that $r$ is a remainder, so that there is some function $\alpha :X\to {\prod}_{k=1}^{n}{Y}_{k}$ such that $r(x)=\parallel x\parallel \alpha (x)$ for all $x\in X$, $\alpha (0)=0$, and $\alpha (x)\to 0$ as $x\to 0$. With $\alpha (x)=({\alpha}_{1}(x),\mathrm{\dots},{\alpha}_{n}(x))$, we have
$${r}_{k}(x)=\parallel x\parallel {\alpha}_{k}(x),x\in X.$$ 
Because $\alpha (x)\to 0$ as $x\to 0$, for each $k$ we have ${\alpha}_{k}(x)\to 0$ as $x\to 0$, which shows that ${r}_{k}$ is a remainder.
Suppose that each ${r}_{k}$ is a remainder. Thus, for each $k$ there is a function ${\alpha}_{k}:X\to {Y}_{k}$ satisfying ${r}_{k}(x)=\parallel x\parallel {\alpha}_{k}(x)$ for all $x\in X$ and ${\alpha}_{k}(x)\to 0$ as $x\to 0$. Then the function $\alpha :X\to {\prod}_{k=1}^{n}{Y}_{k}$ defined by $\alpha (x)=({\alpha}_{1}(x),\mathrm{\dots},{\alpha}_{n}(x))$ satisfies $r(x)=\parallel x\parallel \alpha (x)$. Because ${\alpha}_{k}(x)\to 0$ as $x\to 0$ for each of the finitely many $k$, $1\le k\le n$, we have $\alpha (x)\to 0$ as $x\to 0$. ∎
3 Definition and uniqueness of Fréchet derivative
Suppose that $X$ and $Y$ are normed spaces, that $U$ is an open subset of $X$, and that ${x}_{0}\in U$. A function $f:U\to Y$ is said to be Fréchet differentiable at ${x}_{\mathrm{0}}$ if there is some $L\in \mathcal{B}(X,Y)$ and some $r\in o(X,Y)$ such that
$$f(x)=f({x}_{0})+L(x-{x}_{0})+r(x-{x}_{0}),\qquad x\in U.$$  (1)
Suppose there are bounded linear maps ${L}_{1},{L}_{2}$ and remainders ${r}_{1},{r}_{2}$ that satisfy the above. Writing ${r}_{1}(x)=\parallel x\parallel {\alpha}_{1}(x)$ and ${r}_{2}(x)=\parallel x\parallel {\alpha}_{2}(x)$ for all $x\in X$, we have
$${L}_{1}(x-{x}_{0})+\parallel x-{x}_{0}\parallel {\alpha}_{1}(x-{x}_{0})={L}_{2}(x-{x}_{0})+\parallel x-{x}_{0}\parallel {\alpha}_{2}(x-{x}_{0}),\qquad x\in U,$$
i.e.,
$${L}_{1}(x-{x}_{0})-{L}_{2}(x-{x}_{0})=\parallel x-{x}_{0}\parallel ({\alpha}_{2}(x-{x}_{0})-{\alpha}_{1}(x-{x}_{0})),\qquad x\in U.$$
For $x\in X$, there is some $h>0$ such that for all $|t|\le h$ we have ${x}_{0}+tx\in U$, and then
$${L}_{1}(tx)-{L}_{2}(tx)=\parallel tx\parallel ({\alpha}_{2}(tx)-{\alpha}_{1}(tx)),$$
hence, for $0<t\le h$,
$${L}_{1}(x)-{L}_{2}(x)=\parallel x\parallel ({\alpha}_{2}(tx)-{\alpha}_{1}(tx)).$$
But ${\alpha}_{2}(tx)-{\alpha}_{1}(tx)\to 0$ as $t\to 0$, which implies that ${L}_{1}(x)-{L}_{2}(x)=0$. As this is true for all $x\in X$, we have ${L}_{1}={L}_{2}$ and then ${r}_{1}={r}_{2}$. If $f$ is Fréchet differentiable at ${x}_{0}$, the bounded linear map $L$ in (1) is called the Fréchet derivative of $f$ at ${x}_{0}$, and we define $Df({x}_{0})=L$. Thus,
$$f(x)=f({x}_{0})+Df({x}_{0})(x-{x}_{0})+r(x-{x}_{0}),\qquad x\in U.$$
If ${U}_{0}$ is the set of those points in $U$ at which $f$ is Fréchet differentiable, then $Df:{U}_{0}\to \mathcal{B}(X,Y)$.
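The definition can be checked numerically on a concrete map. The following is a minimal sketch, assuming $X=Y={\mathbb{R}}^{2}$ with the max norm; the map `f`, the point `x0`, and all helper names are illustrative choices that do not come from the note. It evaluates $\parallel r(h)\parallel /\parallel h\parallel $ for shrinking $h$, which should tend to $0$ when `Df` is the Fréchet derivative.

```python
def f(x):
    """Illustrative map R^2 -> R^2 (an assumption, not from the note)."""
    x1, x2 = x
    return (x1 * x1, x1 * x2)

def Df(x0):
    """Candidate Frechet derivative at x0: the Jacobian matrix, as rows."""
    a, b = x0
    return ((2.0 * a, 0.0), (b, a))

def apply(L, h):
    """Apply a 2x2 matrix L (tuple of rows) to the vector h."""
    return tuple(row[0] * h[0] + row[1] * h[1] for row in L)

def max_norm(y):
    """Max norm on R^2."""
    return max(abs(c) for c in y)

def remainder_ratio(x0, h):
    """Return ||f(x0+h) - f(x0) - Df(x0)h|| / ||h||, i.e. ||alpha(h)||."""
    x = (x0[0] + h[0], x0[1] + h[1])
    fx, fx0, Lh = f(x), f(x0), apply(Df(x0), h)
    r = tuple(fx[i] - fx0[i] - Lh[i] for i in range(2))
    return max_norm(r) / max_norm(h)

x0 = (1.0, -2.0)
# Increments h = (t, -t) with t = 10^-1, ..., 10^-5.
ratios = [remainder_ratio(x0, (10.0 ** -k, -(10.0 ** -k))) for k in range(1, 6)]
```

For this particular `f` and this direction the remainder is $r(h)=({t}^{2},-{t}^{2})$, so the ratio equals $\parallel h\parallel $ up to rounding.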
Suppose that $X$ and $Y$ are normed spaces and that $U$ is an open subset of $X$. We denote by ${C}^{1}(U,Y)$ the set of functions $f:U\to Y$ that are Fréchet differentiable at each point in $U$ and for which the function $Df:U\to \mathcal{B}(X,Y)$ is continuous. We say that an element of ${C}^{1}(U,Y)$ is continuously differentiable. We denote by ${C}^{2}(U,Y)$ those elements $f$ of ${C}^{1}(U,Y)$ such that
$$Df\in {C}^{1}(U,\mathcal{B}(X,Y));$$ 
that is, ${C}^{2}(U,Y)$ are those $f\in {C}^{1}(U,Y)$ such that the function $Df:U\to \mathcal{B}(X,Y)$ is Fréchet differentiable at each point in $U$ and such that the function
$$D(Df):U\to \mathcal{B}(X,\mathcal{B}(X,Y))$$ 
is continuous (see Henri Cartan, Differential Calculus, p. 58, §5.1, and Jean Dieudonné, Foundations of Modern Analysis, enlarged and corrected printing, p. 179, Chapter VIII, §12).
The following theorem characterizes continuously differentiable functions ${\mathbb{R}}^{n}\to {\mathbb{R}}^{m}$ (Henri Cartan, Differential Calculus, p. 36, §2.7).
Theorem 3.
Suppose that $f:{\mathbb{R}}^{n}\to {\mathbb{R}}^{m}$ is Fréchet differentiable at each point in ${\mathbb{R}}^{n}$, and write
$$f=({f}_{1},\mathrm{\dots},{f}_{m}).$$ 
$f\in {C}^{1}({\mathbb{R}}^{n},{\mathbb{R}}^{m})$ if and only if for each $1\le i\le m$ and $1\le j\le n$ the function
$$\frac{\partial {f}_{i}}{\partial {x}_{j}}:{\mathbb{R}}^{n}\to \mathbb{R}$$ 
is continuous.
4 Properties of the Fréchet derivative
If $f:X\to Y$ is Fréchet differentiable at ${x}_{0}$, then because a bounded linear map is continuous and in particular continuous at $0$, and because a remainder is continuous at $0$, we get that $f$ is continuous at ${x}_{0}$.
We now prove that Fréchet differentiation at a point is linear.
Lemma 4 (Linearity).
Let $X$ and $Y$ be normed spaces, let $U$ be an open subset of $X$ and let ${x}_{0}\in U$. If ${f}_{1},{f}_{2}:U\to Y$ are both Fréchet differentiable at ${x}_{0}$ and if $\alpha \in \mathbb{R}$, then $\alpha {f}_{1}+{f}_{2}$ is Fréchet differentiable at ${x}_{0}$ and
$$D(\alpha {f}_{1}+{f}_{2})({x}_{0})=\alpha D{f}_{1}({x}_{0})+D{f}_{2}({x}_{0}).$$ 
Proof.
There are remainders ${r}_{1},{r}_{2}\in o(X,Y)$ such that
$${f}_{1}(x)={f}_{1}({x}_{0})+D{f}_{1}({x}_{0})(x-{x}_{0})+{r}_{1}(x-{x}_{0}),\qquad x\in U,$$
and
$${f}_{2}(x)={f}_{2}({x}_{0})+D{f}_{2}({x}_{0})(x-{x}_{0})+{r}_{2}(x-{x}_{0}),\qquad x\in U.$$
Then for all $x\in U$,
$$\begin{aligned}
(\alpha {f}_{1}+{f}_{2})(x)-(\alpha {f}_{1}+{f}_{2})({x}_{0}) &= \alpha {f}_{1}(x)-\alpha {f}_{1}({x}_{0})+{f}_{2}(x)-{f}_{2}({x}_{0})\\
&= \alpha D{f}_{1}({x}_{0})(x-{x}_{0})+\alpha {r}_{1}(x-{x}_{0})\\
&\quad +D{f}_{2}({x}_{0})(x-{x}_{0})+{r}_{2}(x-{x}_{0})\\
&= (\alpha D{f}_{1}({x}_{0})+D{f}_{2}({x}_{0}))(x-{x}_{0})\\
&\quad +(\alpha {r}_{1}+{r}_{2})(x-{x}_{0}),
\end{aligned}$$
and $\alpha {r}_{1}+{r}_{2}\in o(X,Y)$. ∎
The following lemma gives an alternate characterization of a function being Fréchet differentiable at a point (Jean-Paul Penot, Calculus Without Derivatives, p. 136, Lemma 2.46).
Lemma 5.
Suppose that $X$ and $Y$ are normed spaces, that $U$ is an open subset of $X$, and that ${x}_{0}\in U$. A function $f:U\to Y$ is Fréchet differentiable at ${x}_{0}$ if and only if there is some function $F:U\to \mathcal{B}(X,Y)$ that is continuous at ${x}_{0}$ and for which
$$f(x)-f({x}_{0})=F(x)(x-{x}_{0}),\qquad x\in U.$$
Proof.
Suppose that there is a function $F:U\to \mathcal{B}(X,Y)$ that is continuous at ${x}_{0}$ and that satisfies $f(x)-f({x}_{0})=F(x)(x-{x}_{0})$ for all $x\in U$. Then, for $x\in U$,
$$\begin{aligned}
f(x)-f({x}_{0}) &= F(x)(x-{x}_{0})-F({x}_{0})(x-{x}_{0})+F({x}_{0})(x-{x}_{0})\\
&= F({x}_{0})(x-{x}_{0})+r(x-{x}_{0}),
\end{aligned}$$
where $r:X\to Y$ is defined by
$$r(x)=\begin{cases}(F(x+{x}_{0})-F({x}_{0}))(x) & x+{x}_{0}\in U\\ 0 & x+{x}_{0}\notin U.\end{cases}$$
We further define
$$\alpha (x)=\begin{cases}\frac{(F(x+{x}_{0})-F({x}_{0}))(x)}{\parallel x\parallel} & x+{x}_{0}\in U,\ x\ne 0\\ 0 & x+{x}_{0}\notin U\\ 0 & x=0,\end{cases}$$
with which $r(x)=\parallel x\parallel \alpha (x)$ for all $x\in X$. To prove that $r$ is a remainder it suffices to prove that $\alpha (x)\to 0$ as $x\to 0$. Let $\epsilon >0$. That $U$ is open and that $F:U\to \mathcal{B}(X,Y)$ is continuous at ${x}_{0}$ tell us that there is some $\delta >0$ for which $\parallel x\parallel <\delta $ implies that $x+{x}_{0}\in U$ and $\parallel F(x+{x}_{0})-F({x}_{0})\parallel <\epsilon $, and hence
$$\parallel (F(x+{x}_{0})-F({x}_{0}))(x)\parallel \le \parallel F(x+{x}_{0})-F({x}_{0})\parallel \parallel x\parallel \le \epsilon \parallel x\parallel .$$
Therefore, if $\parallel x\parallel <\delta $ then $\parallel \alpha (x)\parallel \le \epsilon $, which establishes that $r$ is a remainder and therefore that $f$ is Fréchet differentiable at ${x}_{0}$, with Fréchet derivative $Df({x}_{0})=F({x}_{0})$.
Suppose that $f$ is Fréchet differentiable at ${x}_{0}$: there is some $r\in o(X,Y)$ such that
$$f(x)=f({x}_{0})+Df({x}_{0})(x-{x}_{0})+r(x-{x}_{0}),\qquad x\in U,$$
where $Df({x}_{0})\in \mathcal{B}(X,Y)$. As $r$ is a remainder, there is some $\alpha :X\to Y$ satisfying $r(x)=\parallel x\parallel \alpha (x)$ for all $x\in X$, and such that $\alpha (0)=0$ and $\alpha (x)\to 0$ as $x\to 0$. For each $x\in X$, by the Hahn-Banach extension theorem (Walter Rudin, Functional Analysis, second ed., p. 59, Corollary to Theorem 3.3) there is some ${\lambda}_{x}\in {X}^{*}$ such that ${\lambda}_{x}x=\parallel x\parallel $ and $|{\lambda}_{x}v|\le \parallel v\parallel $ for all $v\in X$. Thus,
$$r(x)=({\lambda}_{x}x)\alpha (x),\qquad x\in X.$$
Define $F:U\to \mathcal{B}(X,Y)$ by
$$F(x)=Df({x}_{0})+({\lambda}_{x-{x}_{0}})\alpha (x-{x}_{0}),$$
i.e. for $x\in U$ and $v\in X$,
$$F(x)(v)=Df({x}_{0})(v)+({\lambda}_{x-{x}_{0}}v)\alpha (x-{x}_{0})\in Y.$$
Then for $x\in U$,
$$r(x-{x}_{0})=({\lambda}_{x-{x}_{0}}(x-{x}_{0}))\alpha (x-{x}_{0})=F(x)(x-{x}_{0})-Df({x}_{0})(x-{x}_{0}),$$
and hence
$$f(x)=f({x}_{0})+F(x)(x-{x}_{0}),\qquad x\in U.$$
To complete the proof it suffices to prove that $F$ is continuous at ${x}_{0}$. But both ${\lambda}_{0}=0$ and $\alpha (0)=0$ so $F({x}_{0})=Df({x}_{0})$, and for $x\in U$ and $v\in X$,
$$\begin{aligned}
\parallel (F(x)-F({x}_{0}))(v)\parallel &= \parallel ({\lambda}_{x-{x}_{0}}v)\alpha (x-{x}_{0})\parallel \\
&= |{\lambda}_{x-{x}_{0}}v|\parallel \alpha (x-{x}_{0})\parallel \\
&\le \parallel v\parallel \parallel \alpha (x-{x}_{0})\parallel ,
\end{aligned}$$
so $\parallel F(x)-F({x}_{0})\parallel \le \parallel \alpha (x-{x}_{0})\parallel $. From this and the fact that $\alpha (0)=0$ and $\alpha (x)\to 0$ as $x\to 0$ we get that $F$ is continuous at ${x}_{0}$, completing the proof. ∎
We now prove the chain rule for Fréchet derivatives (Jean-Paul Penot, Calculus Without Derivatives, p. 136, Theorem 2.47).
Theorem 6 (Chain rule).
Suppose that $X,Y,Z$ are normed spaces and that $U$ and $V$ are open subsets of $X$ and $Y$ respectively. If $f:U\to Y$ satisfies $f(U)\subseteq V$ and is Fréchet differentiable at ${x}_{0}$ and if $g:V\to Z$ is Fréchet differentiable at $f({x}_{0})$, then $g\circ f:U\to Z$ is Fréchet differentiable at ${x}_{0}$, and its Fréchet derivative at ${x}_{0}$ is
$$D(g\circ f)({x}_{0})=Dg(f({x}_{0}))\circ Df({x}_{0}).$$ 
Proof.
Write ${y}_{0}=f({x}_{0})$, ${L}_{1}=Df({x}_{0})$, and ${L}_{2}=Dg({y}_{0})$. Because $f$ is Fréchet differentiable at ${x}_{0}$, there is some ${r}_{1}\in o(X,Y)$ such that
$$f(x)=f({x}_{0})+{L}_{1}(x-{x}_{0})+{r}_{1}(x-{x}_{0}),\qquad x\in U,$$
and because $g$ is Fréchet differentiable at ${y}_{0}$ there is some ${r}_{2}\in o(Y,Z)$ such that
$$g(y)=g({y}_{0})+{L}_{2}(y-{y}_{0})+{r}_{2}(y-{y}_{0}),\qquad y\in V.$$
For all $x\in U$ we have $f(x)\in V$, and using the above formulas,
$$\begin{aligned}
g(f(x)) &= g({y}_{0})+{L}_{2}(f(x)-{y}_{0})+{r}_{2}(f(x)-{y}_{0})\\
&= g({y}_{0})+{L}_{2}\left({L}_{1}(x-{x}_{0})+{r}_{1}(x-{x}_{0})\right)+{r}_{2}\left({L}_{1}(x-{x}_{0})+{r}_{1}(x-{x}_{0})\right)\\
&= g({y}_{0})+{L}_{2}({L}_{1}(x-{x}_{0}))+{L}_{2}({r}_{1}(x-{x}_{0}))+{r}_{2}\left({L}_{1}(x-{x}_{0})+{r}_{1}(x-{x}_{0})\right).
\end{aligned}$$
Define ${r}_{3}:X\to Z$ by ${r}_{3}(x)={r}_{2}({L}_{1}x+{r}_{1}(x))$, and fix any $c>\parallel {L}_{1}\parallel $. Writing ${r}_{1}(x)=\parallel x\parallel {\alpha}_{1}(x)$, the fact that ${\alpha}_{1}(0)=0$ and that ${\alpha}_{1}$ is continuous at $0$ gives us that there is some $\delta >0$ such that if $\parallel x\parallel <\delta $ then $\parallel {\alpha}_{1}(x)\parallel <c-\parallel {L}_{1}\parallel $, and hence if $\parallel x\parallel <\delta $ then $\parallel {r}_{1}(x)\parallel \le (c-\parallel {L}_{1}\parallel )\parallel x\parallel $. Then, $\parallel x\parallel <\delta $ implies that
$$\parallel {L}_{1}x+{r}_{1}(x)\parallel \le \parallel {L}_{1}x\parallel +\parallel {r}_{1}(x)\parallel \le \parallel {L}_{1}\parallel \parallel x\parallel +(c-\parallel {L}_{1}\parallel )\parallel x\parallel =c\parallel x\parallel .$$
This shows that $x\mapsto {L}_{1}x+{r}_{1}(x)$ is stable at $0$ and so by Lemma 1 that ${r}_{3}\in o(X,Z)$. Since the bounded linear map ${L}_{2}$ is stable at $0$, Lemma 1 also gives ${L}_{2}\circ {r}_{1}\in o(X,Z)$. Then, $r:X\to Z$ defined by $r={L}_{2}\circ {r}_{1}+{r}_{3}$ is a sum of two remainders and so is itself a remainder, and we have
$$g\circ f(x)=g\circ f({x}_{0})+{L}_{2}\circ {L}_{1}(x-{x}_{0})+r(x-{x}_{0}),\qquad x\in U.$$
But ${L}_{1}\in \mathcal{B}(X,Y)$ and ${L}_{2}\in \mathcal{B}(Y,Z)$, so ${L}_{2}\circ {L}_{1}\in \mathcal{B}(X,Z)$. This shows that $g\circ f$ is Fréchet differentiable at ${x}_{0}$ and that its Fréchet derivative at ${x}_{0}$ is
$${L}_{2}\circ {L}_{1}=Dg({y}_{0})\circ Df({x}_{0})=Dg(f({x}_{0}))\circ Df({x}_{0}).$$ 
∎
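The chain rule can be sanity-checked with difference quotients. The following is a minimal sketch, assuming $X=Y={\mathbb{R}}^{2}$ and $Z=\mathbb{R}$; the maps `f` and `g`, the point `x0`, and the direction `v` are hypothetical choices made only for illustration. It compares $Dg(f({x}_{0}))\circ Df({x}_{0})$ applied to $v$ with a central difference quotient of $g\circ f$.

```python
import math

def f(x):
    """Illustrative f: R^2 -> R^2 (an assumption, not from the note)."""
    return (x[0] * x[1], x[0] + math.sin(x[1]))

def g(y):
    """Illustrative g: R^2 -> R (an assumption, not from the note)."""
    return y[0] ** 2 + 3.0 * y[1]

def Df(x):
    """Jacobian of f at x, as a tuple of rows."""
    return ((x[1], x[0]), (1.0, math.cos(x[1])))

def Dg(y):
    """Jacobian of g at y: a single row."""
    return (2.0 * y[0], 3.0)

def chain_rule_derivative(x0, v):
    """Evaluate Dg(f(x0)) applied to Df(x0) v."""
    J = Df(x0)
    Jv = (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])
    row = Dg(f(x0))
    return row[0] * Jv[0] + row[1] * Jv[1]

def difference_quotient(x0, v, t=1e-6):
    """Central difference quotient of g(f(.)) at x0 in the direction v."""
    xp = (x0[0] + t * v[0], x0[1] + t * v[1])
    xm = (x0[0] - t * v[0], x0[1] - t * v[1])
    return (g(f(xp)) - g(f(xm))) / (2.0 * t)

x0, v = (0.5, 1.2), (1.0, -0.7)
exact = chain_rule_derivative(x0, v)
approx = difference_quotient(x0, v)
```

The two values agree only up to discretization and rounding error; agreement in one direction is a consistency check, not a proof.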
The following is the product rule for Fréchet derivatives. By ${f}_{1}\cdot {f}_{2}$ we mean the function $x\mapsto {f}_{1}(x){f}_{2}(x)$.
Theorem 7 (Product rule).
Suppose that $X$ is a normed space, that $U$ is an open subset of $X$, that ${f}_{1},{f}_{2}:U\to \mathbb{R}$ are functions, and that ${x}_{0}\in U$. If ${f}_{1}$ and ${f}_{2}$ are both Fréchet differentiable at ${x}_{0}$, then ${f}_{1}\cdot {f}_{2}$ is Fréchet differentiable at ${x}_{0}$, and its Fréchet derivative at ${x}_{0}$ is
$$D({f}_{1}\cdot {f}_{2})({x}_{0})={f}_{2}({x}_{0})D{f}_{1}({x}_{0})+{f}_{1}({x}_{0})D{f}_{2}({x}_{0}).$$ 
Proof.
There are ${r}_{1},{r}_{2}\in o(X,\mathbb{R})$ with which
$${f}_{1}(x)={f}_{1}({x}_{0})+D{f}_{1}({x}_{0})(x-{x}_{0})+{r}_{1}(x-{x}_{0}),\qquad x\in U$$
and
$${f}_{2}(x)={f}_{2}({x}_{0})+D{f}_{2}({x}_{0})(x-{x}_{0})+{r}_{2}(x-{x}_{0}),\qquad x\in U.$$
Multiplying the above two formulas,
$$\begin{aligned}
{f}_{1}(x){f}_{2}(x) &= {f}_{1}({x}_{0}){f}_{2}({x}_{0})+{f}_{2}({x}_{0})D{f}_{1}({x}_{0})(x-{x}_{0})+{f}_{1}({x}_{0})D{f}_{2}({x}_{0})(x-{x}_{0})\\
&\quad +D{f}_{1}({x}_{0})(x-{x}_{0})\cdot D{f}_{2}({x}_{0})(x-{x}_{0})+{r}_{1}(x-{x}_{0}){r}_{2}(x-{x}_{0})\\
&\quad +{f}_{1}({x}_{0}){r}_{2}(x-{x}_{0})+{r}_{2}(x-{x}_{0})D{f}_{1}({x}_{0})(x-{x}_{0})\\
&\quad +{f}_{2}({x}_{0}){r}_{1}(x-{x}_{0})+{r}_{1}(x-{x}_{0})D{f}_{2}({x}_{0})(x-{x}_{0}).
\end{aligned}$$
Define $r:X\to \mathbb{R}$ by
$$\begin{aligned}
r(x) &= D{f}_{1}({x}_{0})x\cdot D{f}_{2}({x}_{0})x+{r}_{1}(x){r}_{2}(x)+{f}_{1}({x}_{0}){r}_{2}(x)+{r}_{2}(x)D{f}_{1}({x}_{0})x\\
&\quad +{f}_{2}({x}_{0}){r}_{1}(x)+{r}_{1}(x)D{f}_{2}({x}_{0})x,
\end{aligned}$$
for which we have, for $x\in U$,
$${f}_{1}(x){f}_{2}(x)={f}_{1}({x}_{0}){f}_{2}({x}_{0})+{f}_{2}({x}_{0})D{f}_{1}({x}_{0})(x-{x}_{0})+{f}_{1}({x}_{0})D{f}_{2}({x}_{0})(x-{x}_{0})+r(x-{x}_{0}).$$
Therefore, to prove the claim it suffices to prove that $r\in o(X,\mathbb{R})$. Define $\alpha :X\to \mathbb{R}$ by $\alpha (0)=0$ and $\alpha (x)=\frac{D{f}_{1}({x}_{0})x\cdot D{f}_{2}({x}_{0})x}{\parallel x\parallel}$ for $x\ne 0$. For $x\ne 0$,
$$\begin{aligned}
|\alpha (x)| &= \frac{|D{f}_{1}({x}_{0})x\cdot D{f}_{2}({x}_{0})x|}{\parallel x\parallel}\\
&\le \frac{\parallel D{f}_{1}({x}_{0})\parallel \parallel x\parallel \parallel D{f}_{2}({x}_{0})\parallel \parallel x\parallel}{\parallel x\parallel}\\
&= \parallel D{f}_{1}({x}_{0})\parallel \parallel D{f}_{2}({x}_{0})\parallel \parallel x\parallel .
\end{aligned}$$
Thus $\alpha (x)\to 0$ as $x\to 0$, showing that the first term in the expression for $r$ belongs to $o(X,\mathbb{R})$. Likewise, each of the other five terms in the expression for $r$ belongs to $o(X,\mathbb{R})$, and hence $r\in o(X,\mathbb{R})$, completing the proof. ∎
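The product rule admits the same kind of numerical check. The following is a minimal sketch, assuming $X={\mathbb{R}}^{2}$; the functions `f1` and `f2` and the sample point are hypothetical and chosen only for illustration. It compares ${f}_{2}({x}_{0})D{f}_{1}({x}_{0})v+{f}_{1}({x}_{0})D{f}_{2}({x}_{0})v$ with a central difference quotient of ${f}_{1}\cdot {f}_{2}$.

```python
import math

def f1(x):
    """Illustrative f1: R^2 -> R (an assumption)."""
    return x[0] + x[1] ** 2

def f2(x):
    """Illustrative f2: R^2 -> R (an assumption)."""
    return math.sin(x[0]) * x[1]

def Df1(x):
    """Derivative of f1 at x, as a row vector."""
    return (1.0, 2.0 * x[1])

def Df2(x):
    """Derivative of f2 at x, as a row vector."""
    return (math.cos(x[0]) * x[1], math.sin(x[0]))

def product_rule(x0, v):
    """Evaluate f2(x0) Df1(x0) v + f1(x0) Df2(x0) v."""
    a = sum(c * vi for c, vi in zip(Df1(x0), v))
    b = sum(c * vi for c, vi in zip(Df2(x0), v))
    return f2(x0) * a + f1(x0) * b

def difference_quotient(x0, v, t=1e-6):
    """Central difference quotient of f1 * f2 at x0 in the direction v."""
    xp = tuple(a + t * b for a, b in zip(x0, v))
    xm = tuple(a - t * b for a, b in zip(x0, v))
    return (f1(xp) * f2(xp) - f1(xm) * f2(xm)) / (2.0 * t)

x0, v = (0.3, -1.1), (0.8, 0.6)
exact = product_rule(x0, v)
approx = difference_quotient(x0, v)
```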
5 Dual spaces
If $X$ is a normed space, we denote by ${X}^{*}$ the set of bounded linear maps $X\to \mathbb{R}$, i.e. ${X}^{*}=\mathcal{B}(X,\mathbb{R})$. ${X}^{*}$ is itself a normed space with the operator norm. If $X$ is a normed space, the dual pairing $\langle \cdot ,\cdot \rangle :X\times {X}^{*}\to \mathbb{R}$ is
$$\langle x,\psi \rangle =\psi (x),\qquad x\in X,\ \psi \in {X}^{*}.$$
If $U$ is an open subset of $X$ and if a function $f:U\to \mathbb{R}$ is Fréchet differentiable at ${x}_{0}\in U$, then $Df({x}_{0})$ is a bounded linear map $X\to \mathbb{R}$, and so belongs to ${X}^{*}$. If ${U}_{0}$ are those points in $U$ at which $f:U\to \mathbb{R}$ is Fréchet differentiable, then
$$Df:{U}_{0}\to {X}^{*}.$$ 
In the case that $X$ is a Hilbert space with inner product $\langle \cdot ,\cdot \rangle $, the Riesz representation theorem shows that $R:X\to {X}^{*}$ defined by $R(x)(y)=\langle y,x\rangle $ is an isometric isomorphism. If $f:U\to \mathbb{R}$ is Fréchet differentiable at ${x}_{0}\in U$, then we define
$$\nabla f({x}_{0})={R}^{-1}(Df({x}_{0})),$$
and call $\nabla f({x}_{0})\in X$ the gradient of $f$ at ${x}_{\mathrm{0}}$. With ${U}_{0}$ denoting the set of those points in $U$ at which $f$ is Fréchet differentiable,
$$\nabla f:{U}_{0}\to X.$$ 
(To define the gradient we merely used that $R$ is a bijection, but to prove properties of the gradient one uses that $R$ is an isometric isomorphism.)
Example. Let $X$ be a Hilbert space, $A\in \mathcal{B}(X)$, $v\in X$, and define
$$f(x)=\langle Ax,x\rangle -\langle x,v\rangle ,\qquad x\in X.$$
For all ${x}_{0},x\in X$ we have, because the inner product of a real Hilbert space is symmetric,
$$\begin{aligned}
f(x)-f({x}_{0}) &= \langle Ax,x\rangle -\langle x,v\rangle -\langle A{x}_{0},{x}_{0}\rangle +\langle {x}_{0},v\rangle \\
&= \langle Ax,x\rangle -\langle A{x}_{0},x\rangle +\langle A{x}_{0},x\rangle -\langle A{x}_{0},{x}_{0}\rangle -\langle x-{x}_{0},v\rangle \\
&= \langle A(x-{x}_{0}),x\rangle +\langle A{x}_{0},x-{x}_{0}\rangle -\langle x-{x}_{0},v\rangle \\
&= \langle x-{x}_{0},{A}^{*}x\rangle +\langle x-{x}_{0},A{x}_{0}\rangle -\langle x-{x}_{0},v\rangle \\
&= \langle x-{x}_{0},{A}^{*}x+A{x}_{0}-v\rangle \\
&= \langle x-{x}_{0},{A}^{*}x-{A}^{*}{x}_{0}+{A}^{*}{x}_{0}+A{x}_{0}-v\rangle \\
&= \langle x-{x}_{0},({A}^{*}+A){x}_{0}-v\rangle +\langle x-{x}_{0},{A}^{*}(x-{x}_{0})\rangle .
\end{aligned}$$
The last term satisfies $|\langle x-{x}_{0},{A}^{*}(x-{x}_{0})\rangle |\le \parallel {A}^{*}\parallel \parallel x-{x}_{0}{\parallel}^{2}$, so it is a remainder in $x-{x}_{0}$. With $Df({x}_{0})(x-{x}_{0})=\langle x-{x}_{0},({A}^{*}+A){x}_{0}-v\rangle $, or $Df({x}_{0})(x)=\langle x,({A}^{*}+A){x}_{0}-v\rangle $, we have that $f$ is Fréchet differentiable at each ${x}_{0}\in X$. Furthermore, its gradient at ${x}_{0}$ is
$$\nabla f({x}_{0})=({A}^{*}+A){x}_{0}-v.$$
For each ${x}_{0}\in X$, the function $f:X\to \mathbb{R}$ is Fréchet differentiable at ${x}_{0}$, and thus
$$Df:X\to {X}^{*},$$ 
and we can ask at what points $Df$ has a Fréchet derivative. For ${x}_{0},x,y\in X$,
$$\begin{aligned}
(Df(x)-Df({x}_{0}))(y) &= \langle y,({A}^{*}+A)x-v\rangle -\langle y,({A}^{*}+A){x}_{0}-v\rangle \\
&= \langle y,({A}^{*}+A)(x-{x}_{0})\rangle .
\end{aligned}$$
With $D(Df)({x}_{0})(x-{x}_{0})(y)=\langle y,({A}^{*}+A)(x-{x}_{0})\rangle $, in other words with
$${D}^{2}f({x}_{0})(x)(y)=D(Df)({x}_{0})(x)(y)=\langle y,({A}^{*}+A)x\rangle ,$$
we have that $Df$ is Fréchet differentiable at each ${x}_{0}\in X$. Thus
$${D}^{2}f:X\to \mathcal{B}(X,{X}^{*}).$$ 
Because ${D}^{2}f({x}_{0})$ does not depend on ${x}_{0}$, it is Fréchet differentiable at each point in $X$, with ${D}^{3}f({x}_{0})=0$ for all ${x}_{0}\in X$. Here ${D}^{3}f:X\to \mathcal{B}(X,\mathcal{B}(X,{X}^{*}))$.
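In finite dimensions the gradient formula can be compared with difference quotients. The following is a minimal sketch, assuming $X={\mathbb{R}}^{3}$ with the standard inner product, so that ${A}^{*}$ is the transpose of the matrix $A$; the entries of `A`, `v`, and `x0` are arbitrary illustrative values and are not from the note.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

# Illustrative data (assumptions); A is deliberately not symmetric.
A = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0], [4.0, 0.0, -2.0]]
v = [1.0, -2.0, 0.5]

def f(x):
    """f(x) = <Ax, x> - <x, v>."""
    return dot(matvec(A, x), x) - dot(x, v)

def gradient(x0):
    """Gradient from the text: (A^* + A) x0 - v, with A^* the transpose."""
    At = transpose(A)
    S = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(At, A)]
    return [s - vi for s, vi in zip(matvec(S, x0), v)]

def numerical_gradient(x0, t=1e-4):
    """Central difference quotients of f along the coordinate directions."""
    grad = []
    for j in range(len(x0)):
        xp = list(x0)
        xm = list(x0)
        xp[j] += t
        xm[j] -= t
        grad.append((f(xp) - f(xm)) / (2.0 * t))
    return grad

x0 = [0.7, -0.3, 1.5]
exact = gradient(x0)
approx = numerical_gradient(x0)
```

Because $f$ is quadratic, the central difference quotient has no truncation error here, so any discrepancy between `exact` and `approx` is rounding error.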
6 Gâteaux derivatives
Let $X$ and $Y$ be normed spaces, let $U$ be an open subset of $X$, let $f:U\to Y$ be a function, and let ${x}_{0}\in U$. If there is some $T\in \mathcal{B}(X,Y)$ such that for all $v\in X$ we have
$$\lim_{t\to 0}\frac{f({x}_{0}+tv)-f({x}_{0})}{t}=Tv,$$  (2)
then we say that $f$ is Gâteaux differentiable at ${x}_{0}$ and call $T$ the Gâteaux derivative of $f$ at ${x}_{0}$ (our definition of the Gâteaux derivative follows Jean-Paul Penot, Calculus Without Derivatives, p. 127, Definition 2.23). It is apparent that there is at most one $T\in \mathcal{B}(X,Y)$ that satisfies (2) for all $v\in X$. We write ${f}^{\prime}({x}_{0})=T$. Thus, ${f}^{\prime}$ is a map from the set of points in $U$ at which $f$ is Gâteaux differentiable to $\mathcal{B}(X,Y)$. If $V\subseteq U$ and $f$ is Gâteaux differentiable at each element of $V$, we say that $f$ is Gâteaux differentiable on $V$.
Example. Define $f:{\mathbb{R}}^{2}\to \mathbb{R}$ by $f({x}_{1},{x}_{2})=\frac{{x}_{1}^{4}{x}_{2}}{{x}_{1}^{6}+{x}_{2}^{3}}$ for $({x}_{1},{x}_{2})\ne (0,0)$ and $f(0,0)=0$. For $v=({v}_{1},{v}_{2})\in {\mathbb{R}}^{2}$ and $t\ne 0$,
$$\frac{f(0+tv)-f(0)}{t}=\frac{f(t{v}_{1},t{v}_{2})}{t}=\begin{cases}\frac{1}{t}\cdot \frac{{t}^{5}{v}_{1}^{4}{v}_{2}}{{t}^{6}{v}_{1}^{6}+{t}^{3}{v}_{2}^{3}} & v\ne (0,0)\\ 0 & v=(0,0)\end{cases}=\begin{cases}\frac{t{v}_{1}^{4}{v}_{2}}{{t}^{3}{v}_{1}^{6}+{v}_{2}^{3}} & v\ne (0,0)\\ 0 & v=(0,0).\end{cases}$$
Hence, for any $v\in {\mathbb{R}}^{2}$, we have $\frac{f(0+tv)-f(0)}{t}\to 0$ as $t\to 0$. Therefore, $f$ is Gâteaux differentiable at $(0,0)$ and ${f}^{\prime}(0,0)v=0\in \mathbb{R}$ for all $v\in {\mathbb{R}}^{2}$, i.e. ${f}^{\prime}(0,0)=0$. However, for ${x}_{1}\ne 0$,
$$f({x}_{1},{x}_{1}^{2})=\frac{{x}_{1}^{6}}{{x}_{1}^{6}+{x}_{1}^{6}}=\frac{1}{2},$$ 
from which it follows that $f$ is not continuous at $(0,0)$. We stated in §4 that if a function is Fréchet differentiable at a point then it is continuous at that point, and so $f$ is not Fréchet differentiable at $(0,0)$. Thus, a function that is Gâteaux differentiable at a point need not be Fréchet differentiable at that point.
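The two behaviours in this example can be observed numerically. The following is a minimal sketch; the direction `v` and the sample points are arbitrary illustrative choices. It evaluates the difference quotient at the origin along one direction, and the values of $f$ along the parabola ${x}_{2}={x}_{1}^{2}$.

```python
def f(x1, x2):
    """The function from the example, with f(0, 0) = 0."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 ** 4 * x2 / (x1 ** 6 + x2 ** 3)

def quotient(v, t):
    """Difference quotient (f(0 + t v) - f(0)) / t at the origin."""
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

v = (1.0, 2.0)  # an arbitrary direction with v2 != 0
# The quotients shrink toward 0 as t -> 0 ...
quotients = [quotient(v, 10.0 ** -k) for k in range(1, 7)]
# ... while f stays near 1/2 along the parabola x2 = x1^2.
along_parabola = [f(10.0 ** -k, 10.0 ** (-2 * k)) for k in range(1, 7)]
```

A single direction is sampled here, so the first list only illustrates the limit computed in the text for every $v$.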
We prove that being Fréchet differentiable at a point implies being Gâteaux differentiable at the point, and that in this case the Gâteaux derivative is equal to the Fréchet derivative.
Theorem 8.
Suppose that $X$ and $Y$ are normed spaces, that $U$ is an open subset of $X$, that $f\in {Y}^{U}$, and that ${x}_{0}\in U$. If $f$ is Fréchet differentiable at ${x}_{0}$, then $f$ is Gâteaux differentiable at ${x}_{0}$ and ${f}^{\prime}({x}_{0})=Df({x}_{0})$.
Proof.
Because $f$ is Fréchet differentiable at ${x}_{0}$, there is some $r\in o(X,Y)$ for which
$$f(x)=f({x}_{0})+Df({x}_{0})(x-{x}_{0})+r(x-{x}_{0}),\qquad x\in U.$$
For $v\in X$ and nonzero $t$ small enough that ${x}_{0}+tv\in U$,
$$\frac{f({x}_{0}+tv)-f({x}_{0})}{t}=\frac{Df({x}_{0})({x}_{0}+tv-{x}_{0})+r({x}_{0}+tv-{x}_{0})}{t}=\frac{tDf({x}_{0})v+r(tv)}{t}.$$
Writing $r(x)=\parallel x\parallel \alpha (x)$,
$$\frac{f({x}_{0}+tv)-f({x}_{0})}{t}=\frac{tDf({x}_{0})v+\parallel tv\parallel \alpha (tv)}{t}=Df({x}_{0})v+\frac{|t|}{t}\parallel v\parallel \alpha (tv).$$
Hence, since $\alpha (tv)\to 0$ as $t\to 0$ and $\frac{|t|}{t}$ is bounded,
$$\lim_{t\to 0}\frac{f({x}_{0}+tv)-f({x}_{0})}{t}=Df({x}_{0})v.$$
This holds for all $v\in X$, and as $Df({x}_{0})\in \mathcal{B}(X,Y)$ we get that $f$ is Gâteaux differentiable at ${x}_{0}$ and that ${f}^{\prime}({x}_{0})=Df({x}_{0})$. ∎
If $X$ is a vector space and $u,v\in X$, let
$$[u,v]=\{(1-t)u+tv:0\le t\le 1\},$$
namely, the line segment joining $u$ and $v$. The following is a mean value theorem for Gâteaux derivatives (Antonio Ambrosetti and Giovanni Prodi, A Primer of Nonlinear Analysis, p. 13, Theorem 1.8).
Theorem 9 (Mean value theorem).
Let $X$ and $Y$ be normed spaces, let $U$ be an open subset of $X$, and let $f:U\to Y$ be Gâteaux differentiable on $U$. If $u,v\in U$ and $[u,v]\subset U$, then
$$\parallel f(u)-f(v)\parallel \le \sup_{w\in [u,v]}\parallel {f}^{\prime}(w)\parallel \cdot \parallel u-v\parallel .$$
Proof.
If $f(u)=f(v)$ then immediately the claim is true. Otherwise, $f(v)-f(u)\ne 0$, and so by the Hahn-Banach extension theorem (Walter Rudin, Functional Analysis, second ed., p. 59, Corollary) there is some $\psi \in {Y}^{*}$ satisfying $\psi (f(v)-f(u))=\parallel f(v)-f(u)\parallel $ and $\parallel \psi \parallel =1$. Define $h:[0,1]\to \mathbb{R}$ by
$$h(t)=\langle f((1-t)u+tv),\psi \rangle .$$
For $0<t<1$ and $\tau \ne 0$ satisfying $t+\tau \in [0,1]$, we have
$$\begin{aligned}
\frac{h(t+\tau )-h(t)}{\tau} &= \frac{1}{\tau}\langle f((1-t-\tau )u+(t+\tau )v),\psi \rangle -\frac{1}{\tau}\langle f((1-t)u+tv),\psi \rangle \\
&= \left\langle \frac{f((1-t)u+tv+(v-u)\tau )-f((1-t)u+tv)}{\tau},\psi \right\rangle .
\end{aligned}$$
Because $f$ is Gâteaux differentiable at $(1-t)u+tv$,
$$\lim_{\tau \to 0}\frac{f((1-t)u+tv+(v-u)\tau )-f((1-t)u+tv)}{\tau}={f}^{\prime}((1-t)u+tv)(v-u),$$
so because $\psi $ is continuous,
$$\lim_{\tau \to 0}\frac{h(t+\tau )-h(t)}{\tau}=\langle {f}^{\prime}((1-t)u+tv)(v-u),\psi \rangle ,$$
which shows that $h$ is differentiable at $t$ and that
$${h}^{\prime}(t)=\langle {f}^{\prime}((1-t)u+tv)(v-u),\psi \rangle .$$
The same computation at $t=0$ and $t=1$, with one-sided limits in $\tau $, shows that $h$ has one-sided derivatives at the endpoints, so $h:[0,1]\to \mathbb{R}$ is continuous on $[0,1]$ and differentiable on $(0,1)$. Applying the mean value theorem, there is some $\theta $, $0<\theta <1$, for which
$${h}^{\prime}(\theta )=h(1)-h(0).$$
On the one hand,
$${h}^{\prime}(\theta )=\langle {f}^{\prime}((1-\theta )u+\theta v)(v-u),\psi \rangle .$$
On the other hand,
$$h(1)-h(0)=\langle f(v),\psi \rangle -\langle f(u),\psi \rangle =\langle f(v)-f(u),\psi \rangle =\parallel f(v)-f(u)\parallel .$$
Therefore
$$\begin{aligned}
\parallel f(v)-f(u)\parallel &= \langle {f}^{\prime}((1-\theta )u+\theta v)(v-u),\psi \rangle \\
&\le \parallel \psi \parallel \parallel {f}^{\prime}((1-\theta )u+\theta v)(v-u)\parallel \\
&= \parallel {f}^{\prime}((1-\theta )u+\theta v)(v-u)\parallel \\
&\le \parallel {f}^{\prime}((1-\theta )u+\theta v)\parallel \parallel v-u\parallel \\
&\le \sup_{w\in [u,v]}\parallel {f}^{\prime}(w)\parallel \parallel v-u\parallel .
\end{aligned}$$
∎
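The inequality can be tested on a concrete map. The following is a minimal sketch, assuming $X=Y={\mathbb{R}}^{2}$ with the max norm, for which the operator norm of a matrix is its largest absolute row sum; the map `f` and the endpoints `u`, `v` are hypothetical choices. The supremum over $[u,v]$ is replaced by a maximum over sample points, which can only underestimate it, so the check below is an approximation and not a verification of the theorem.

```python
import math

def f(x):
    """Illustrative smooth map R^2 -> R^2 (an assumption)."""
    return (math.sin(x[0]) + x[1] ** 2, x[0] * x[1])

def derivative_norm(x):
    """Operator norm of the Jacobian of f at x for the max norm:
    the largest absolute row sum."""
    rows = ((math.cos(x[0]), 2.0 * x[1]), (x[1], x[0]))
    return max(abs(a) + abs(b) for a, b in rows)

def max_norm(y):
    """Max norm on R^2."""
    return max(abs(c) for c in y)

def segment_point(u, v, t):
    """The point (1 - t) u + t v of the segment [u, v]."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(u, v))

u, v = (0.0, 0.0), (1.0, 2.0)
lhs = max_norm(tuple(a - b for a, b in zip(f(u), f(v))))
# Sampled maximum: a lower estimate of the supremum over [u, v].
sup_estimate = max(derivative_norm(segment_point(u, v, k / 1000.0))
                   for k in range(1001))
rhs = sup_estimate * max_norm(tuple(a - b for a, b in zip(u, v)))
```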
7 Antiderivatives
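Suppose that $X$ is a Banach space and that $f:[a,b]\to X$ is continuous. Define $F:[a,b]\to X$ by
$$F(x)={\int}_{a}^{x}f.$$
Let ${x}_{0}\in (a,b)$. For $x\in (a,b)$, we have
$$F(x)-F({x}_{0})={\int}_{a}^{x}f-{\int}_{a}^{{x}_{0}}f={\int}_{{x}_{0}}^{x}f=f({x}_{0})(x-{x}_{0})+{\int}_{{x}_{0}}^{x}(f-f({x}_{0})).$$
Because $f$ is continuous at ${x}_{0}$, the last integral has norm at most $|x-{x}_{0}|\sup \parallel f(s)-f({x}_{0})\parallel $, the supremum being over $s$ between ${x}_{0}$ and $x$, and this supremum tends to $0$ as $x\to {x}_{0}$. It follows that $F$ is Fréchet differentiable at ${x}_{0}$, and that
$$DF({x}_{0})(x-{x}_{0})=f({x}_{0})(x-{x}_{0}).$$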
If we identify $f({x}_{0})\in X$ with the map $x\mapsto f({x}_{0})x$, namely if we say that $X=\mathcal{B}(\mathbb{R},X)$, then $DF({x}_{0})=f({x}_{0})$.
Let $X$ be a normed space, let $Y$ be a Banach space, let $U$ be an open subset of $X$, and let $f\in {C}^{1}(U,Y)$. Suppose that $u,v\in U$ satisfy $[u,v]\subset U$. Write $I=(0,1)$ and define $\gamma :I\to U$ by $\gamma (t)=(1-t)u+tv$. We have
$$D\gamma (t)=v-u,\qquad t\in I,$$
and thus by Theorem 6,
$$D(f\circ \gamma )(t)=Df(\gamma (t))\circ D\gamma (t),\qquad t\in I,$$
that is,
$$D(f\circ \gamma )(t)=Df(\gamma (t))\circ (v-u),\qquad t\in I,$$
i.e.
$$D(f\circ \gamma )(t)=Df(\gamma (t))(v-u),\qquad t\in I.$$
If $t\in I$ and $t+h\in I$, then
$$\begin{aligned}
D(f\circ \gamma )(t+h)-D(f\circ \gamma )(t) &= Df(\gamma (t+h))(v-u)-Df(\gamma (t))(v-u)\\
&= \left(Df(\gamma (t+h))-Df(\gamma (t))\right)(v-u),
\end{aligned}$$
and hence
$$\parallel D(f\circ \gamma )(t+h)-D(f\circ \gamma )(t)\parallel \le \parallel Df(\gamma (t+h))-Df(\gamma (t))\parallel \parallel v-u\parallel .$$
Because $Df:U\to \mathcal{B}(X,Y)$ is continuous, it follows that
$$\parallel D(f\circ \gamma )(t+h)-D(f\circ \gamma )(t)\parallel \to 0$$
as $h\to 0$, i.e. that $D(f\circ \gamma )$ is continuous at $t$, and thus that
$$D(f\circ \gamma ):I\to \mathcal{B}(\mathbb{R},Y)$$ 
is continuous. If we identify $\mathcal{B}(\mathbb{R},Y)$ with $Y$, then
$$D(f\circ \gamma ):I\to Y.$$ 
On the one hand,
$${\int}_{0}^{1}D(f\circ \gamma )=(f\circ \gamma )(1)-(f\circ \gamma )(0)=f(v)-f(u).$$
On the other hand,
$${\int}_{0}^{1}D(f\circ \gamma )={\int}_{0}^{1}Df(\gamma (t))(v-u)\,dt=\left({\int}_{0}^{1}Df((1-t)u+tv)\,dt\right)(v-u);$$
here,
$${\int}_{0}^{1}Df((1-t)u+tv)\,dt\in \mathcal{B}(X,Y).$$
Therefore
$$f(v)-f(u)=\left({\int}_{0}^{1}Df((1-t)u+tv)\,dt\right)(v-u).$$
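This integral formula can be checked numerically. The following is a minimal sketch, assuming $X=Y={\mathbb{R}}^{2}$, so that $Df$ is a $2\times 2$ Jacobian matrix and the $\mathcal{B}(X,Y)$-valued integral can be computed entrywise; the map `f`, the endpoints, and the use of composite Simpson quadrature are illustrative choices and not part of the note.

```python
import math

def f(x):
    """Illustrative C^1 map R^2 -> R^2 (an assumption)."""
    return (x[0] ** 2 * x[1], math.sin(x[0]) + x[1])

def Df(x):
    """Jacobian of f at x, as a tuple of rows."""
    return ((2.0 * x[0] * x[1], x[0] ** 2), (math.cos(x[0]), 1.0))

def averaged_derivative(u, v, n=200):
    """Composite Simpson approximation, entry by entry, of the integral
    of Df((1-t)u + tv) over [0, 1]; n must be even."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        t = k / n
        # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1.
        w = 1.0 if k in (0, n) else (4.0 if k % 2 == 1 else 2.0)
        J = Df(tuple((1.0 - t) * a + t * b for a, b in zip(u, v)))
        for i in range(2):
            for j in range(2):
                total[i][j] += w * J[i][j]
    return [[entry / (3.0 * n) for entry in row] for row in total]

u, v = (0.2, -1.0), (1.5, 0.5)
M = averaged_derivative(u, v)
d = (v[0] - u[0], v[1] - u[1])
rhs = tuple(M[i][0] * d[0] + M[i][1] * d[1] for i in range(2))
lhs = tuple(a - b for a, b in zip(f(v), f(u)))
```

Here `lhs` is $f(v)-f(u)$ and `rhs` is the averaged derivative applied to $v-u$; they agree up to quadrature and rounding error.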