# Gaussian measures and Bochner’s theorem

Jordan Bell
April 30, 2015

## 1 Fourier transforms of measures

Let $m_{n}$ be normalized Lebesgue measure on $\mathbb{R}^{n}$: $dm_{n}(x)=(2\pi)^{-n/2}dx$. If $\mu$ is a finite positive Borel measure on $\mathbb{R}^{n}$, the Fourier transform of $\mu$ is the function $\hat{\mu}:\mathbb{R}^{n}\to\mathbb{C}$ defined by

 $\hat{\mu}(\xi)=\int_{\mathbb{R}^{n}}e^{-i\xi\cdot x}d\mu(x),\qquad\xi\in\mathbb{R}^{n}.$

One proves using the dominated convergence theorem that $\hat{\mu}$ is continuous. If $f\in L^{1}(\mathbb{R}^{n})$, the Fourier transform of $f$ is the function $\hat{f}:\mathbb{R}^{n}\to\mathbb{C}$ defined by

 $\hat{f}(\xi)=\int_{\mathbb{R}^{n}}e^{-i\xi\cdot x}f(x)dm_{n}(x),\qquad\xi\in\mathbb{R}^{n}.$

Likewise, using the dominated convergence theorem, $\hat{f}$ is continuous. One proves that if $f\in L^{1}(\mathbb{R}^{n})$ and $\hat{f}\in L^{1}(\mathbb{R}^{n})$ then, for almost all $x\in\mathbb{R}^{n}$,

 $f(x)=\int_{\mathbb{R}^{n}}e^{ix\cdot\xi}\hat{f}(\xi)dm_{n}(\xi).$

As

 $\hat{\mu}(0)=\int_{\mathbb{R}^{n}}d\mu(x)=\mu(\mathbb{R}^{n}),$

$\mu$ is a probability measure if and only if $\hat{\mu}(0)=1$. (By a probability measure we mean a positive measure with mass $1$.)
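For a concrete illustration (a sketch not in the text; the atoms and weights below are arbitrary choices), take a discrete measure $\mu=\sum_{k}w_{k}\delta_{x_{k}}$ on $\mathbb{R}$: the defining integral becomes a finite sum, and $\hat{\mu}(0)=\mu(\mathbb{R})$ can be checked directly.

```python
import numpy as np

# Fourier transform of a finite discrete measure mu = sum_k w_k delta_{x_k}:
# mu_hat(xi) = sum_k w_k exp(-i xi x_k).  Atoms and weights are arbitrary.
x = np.array([-1.0, 0.5, 2.0])   # atoms x_k
w = np.array([0.2, 0.3, 0.5])    # positive weights, total mass 1

def mu_hat(xi):
    return np.sum(w * np.exp(-1j * xi * x))

total_mass = mu_hat(0.0).real    # equals mu(R), the total mass
```

Since the weights sum to $1$, $\hat{\mu}(0)=1$ and $\mu$ is a probability measure; moreover $|\hat{\mu}(\xi)|\leq\mu(\mathbb{R})$ for every $\xi$.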

If $\phi\in L^{1}(\mathbb{R}^{n})$ and $\hat{\phi}\in L^{1}(\mathbb{R}^{n})$, then, inverting the Fourier transform,

 $\begin{aligned}\left\langle\phi,\mu\right\rangle&=\int_{\mathbb{R}^{n}}\phi(x)d\mu(x)\\&=\int_{\mathbb{R}^{n}}\left(\int_{\mathbb{R}^{n}}\hat{\phi}(\xi)e^{ix\cdot\xi}dm_{n}(\xi)\right)d\mu(x)\\&=\int_{\mathbb{R}^{n}}\hat{\phi}(\xi)\int_{\mathbb{R}^{n}}e^{i\xi\cdot x}d\mu(x)dm_{n}(\xi)\\&=\int_{\mathbb{R}^{n}}\hat{\phi}(\xi)\hat{\mu}(-\xi)dm_{n}(\xi)\\&=\int_{\mathbb{R}^{n}}\hat{\phi}(-\xi)\hat{\mu}(\xi)dm_{n}(\xi).\end{aligned}$
###### Theorem 1.

If $\mu$ and $\nu$ are finite Borel measures on $\mathbb{R}^{n}$ and $\hat{\mu}=\hat{\nu}$, then $\mu=\nu$.

###### Proof.

To prove that $\mu=\nu$ it suffices to prove that $\mu(B)=\nu(B)$ for every ball $B$ in $\mathbb{R}^{n}$. Let $\phi_{n}\in C_{c}^{\infty}(\mathbb{R}^{n})$ be a uniformly bounded sequence that converges pointwise to $\chi_{B}$. On the one hand, by the dominated convergence theorem, $\left\langle\phi_{n},\mu\right\rangle\to\mu(B)$ and $\left\langle\phi_{n},\nu\right\rangle\to\nu(B)$ as $n\to\infty$. On the other hand, because $\hat{\mu}=\hat{\nu}$ we have

 $\left\langle\phi_{n},\mu\right\rangle=\int_{\mathbb{R}^{n}}\hat{\phi}_{n}(-\xi)\hat{\mu}(\xi)dm_{n}(\xi)=\int_{\mathbb{R}^{n}}\hat{\phi}_{n}(-\xi)\hat{\nu}(\xi)dm_{n}(\xi)=\left\langle\phi_{n},\nu\right\rangle.$

Therefore $\mu(B)=\nu(B)$, and it follows that $\mu=\nu$. ∎

## 2 Gaussian measures

Let $\lambda_{1},\ldots,\lambda_{n}>0$, and let $\Lambda:\mathbb{R}^{n}\to\mathbb{R}^{n}$ be the linear map defined by $\Lambda e_{i}=\lambda_{i}e_{i}$. Define

 $d\mu(x)=\sqrt{\det\Lambda}\exp\left(-\frac{1}{2}x\cdot\Lambda x\right)dm_{n}(x),$

the measure $\mu$ thus defined is called a Gaussian measure.

###### Theorem 2.
 $\hat{\mu}(\xi)=\exp\left(-\frac{1}{2}\xi\cdot\Lambda^{-1}\xi\right),\qquad\xi\in\mathbb{R}^{n}.$
###### Proof.

We have

 $\begin{aligned}\hat{\mu}(\xi)&=\int_{\mathbb{R}^{n}}e^{-i\xi\cdot x}\sqrt{\det\Lambda}\exp\left(-\frac{1}{2}x\cdot\Lambda x\right)dm_{n}(x)\\&=\int_{\mathbb{R}^{n}}e^{-i\xi_{1}x_{1}-\cdots-i\xi_{n}x_{n}}\sqrt{\lambda_{1}\cdots\lambda_{n}}\exp\left(-\frac{1}{2}\lambda_{1}x_{1}^{2}-\cdots-\frac{1}{2}\lambda_{n}x_{n}^{2}\right)dm_{n}(x)\\&=\prod_{j=1}^{n}I_{j},\end{aligned}$

where

 $I_{j}=\int_{\mathbb{R}}e^{-i\xi_{j}x_{j}}\sqrt{\lambda_{j}}\exp\left(-\frac{1}{2}\lambda_{j}x_{j}^{2}\right)dm_{1}(x_{j}).$

Using

 $-i\xi_{j}x_{j}-\frac{1}{2}\lambda_{j}x_{j}^{2}=-\frac{\lambda_{j}}{2}\left(\left(x_{j}+\frac{i\xi_{j}}{\lambda_{j}}\right)^{2}+\frac{\xi_{j}^{2}}{\lambda_{j}^{2}}\right)=-\frac{\lambda_{j}}{2}\left(x_{j}+\frac{i\xi_{j}}{\lambda_{j}}\right)^{2}-\frac{\xi_{j}^{2}}{2\lambda_{j}},$

we get, doing contour integration,

 $\begin{aligned}I_{j}&=\int_{\mathbb{R}}\sqrt{\lambda_{j}}\exp\left(-\frac{\lambda_{j}}{2}\left(x_{j}+\frac{i\xi_{j}}{\lambda_{j}}\right)^{2}\right)\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right)dm_{1}(x_{j})\\&=\int_{\mathbb{R}}\sqrt{\lambda_{j}}\exp\left(-\frac{\lambda_{j}x_{j}^{2}}{2}\right)\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right)dm_{1}(x_{j})\\&=\int_{\mathbb{R}}\sqrt{\lambda_{j}}\exp(-y_{j}^{2})\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right)\sqrt{\frac{2}{\lambda_{j}}}dm_{1}(y_{j})\\&=\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right)\int_{\mathbb{R}}\sqrt{2}\exp(-y_{j}^{2})dm_{1}(y_{j})\\&=\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right)\int_{\mathbb{R}}\frac{1}{\sqrt{\pi}}\exp(-y_{j}^{2})dy_{j}\\&=\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right).\end{aligned}$

Therefore, as $\Lambda^{-1}\xi=\sum_{j=1}^{n}\frac{\xi_{j}}{\lambda_{j}}e_{j}$ and $\xi\cdot\Lambda^{-1}\xi=\sum_{j=1}^{n}\frac{\xi_{j}^{2}}{\lambda_{j}}$,

 $\begin{aligned}\hat{\mu}(\xi)&=\prod_{j=1}^{n}\exp\left(-\frac{\xi_{j}^{2}}{2\lambda_{j}}\right)\\&=\exp\left(-\frac{1}{2}\sum_{j=1}^{n}\frac{\xi_{j}^{2}}{\lambda_{j}}\right)\\&=\exp\left(-\frac{1}{2}\xi\cdot\Lambda^{-1}\xi\right).\end{aligned}$ ∎

From the above theorem we get

 $\hat{\mu}(0)=1,$

and hence a Gaussian measure is a probability measure.
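Theorem 2 can be checked numerically in one dimension; the following sketch (with arbitrary test values `lam` and `xi`) approximates the defining integral by a Riemann sum.

```python
import numpy as np

# One-dimensional check of Theorem 2: with dmu = sqrt(lam) exp(-lam x^2/2) dm_1,
# mu_hat(xi) should equal exp(-xi^2 / (2 lam)).  lam and xi are test values.
lam, xi = 2.0, 1.5
dx = 1e-4
xs = np.arange(-10.0, 10.0, dx)
density = np.sqrt(lam) * np.exp(-lam * xs**2 / 2) / np.sqrt(2 * np.pi)  # w.r.t. dx
mu_hat = np.sum(np.exp(-1j * xi * xs) * density) * dx
predicted = np.exp(-xi**2 / (2 * lam))
```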

For $h\in\mathbb{R}^{n}$, define $T_{h}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ by $T_{h}(x)=x-h$. If $E$ is a Borel subset of $\mathbb{R}^{n}$, because $\chi_{T_{-h}(E)}=\chi_{E}\circ T_{h}$,

 $((T_{h})_{*}\mu)(E)=\mu(T_{h}^{-1}(E))=\mu(T_{-h}(E))=\int_{\mathbb{R}^{n}}\chi_{T_{-h}(E)}d\mu=\int_{\mathbb{R}^{n}}\chi_{E}\circ T_{h}d\mu.$

Then, because $T_{h}\circ T_{-h}=\mathrm{id}_{\mathbb{R}^{n}}$,

 $\begin{aligned}\int_{\mathbb{R}^{n}}\chi_{E}\circ T_{h}d\mu&=\int_{\mathbb{R}^{n}}\chi_{E}\circ T_{h}(x)\sqrt{\det\Lambda}\exp\left(-\frac{1}{2}x\cdot\Lambda x\right)dm_{n}(x)\\&=\int_{\mathbb{R}^{n}}\chi_{E}(x)\sqrt{\det\Lambda}\exp\left(-\frac{1}{2}(T_{-h}x)\cdot(\Lambda T_{-h}x)\right)d((T_{-h})_{*}m_{n})(x)\\&=\int_{\mathbb{R}^{n}}\chi_{E}(x)\sqrt{\det\Lambda}\exp\left(-\frac{1}{2}(T_{-h}x)\cdot(\Lambda T_{-h}x)\right)dm_{n}(x).\end{aligned}$

As $\Lambda$ is self-adjoint, $\Lambda x\cdot h=x\cdot\Lambda h$, and thus

 $\begin{aligned}(T_{-h}x)\cdot(\Lambda T_{-h}x)&=(x+h)\cdot(\Lambda(x+h))\\&=(x+h)\cdot(\Lambda x+\Lambda h)\\&=x\cdot\Lambda x+x\cdot\Lambda h+h\cdot\Lambda x+h\cdot\Lambda h\\&=x\cdot\Lambda x+2x\cdot\Lambda h+h\cdot\Lambda h.\end{aligned}$

Therefore,

 $\begin{aligned}((T_{h})_{*}\mu)(E)&=\int_{\mathbb{R}^{n}}\chi_{E}(x)\exp\left(-\frac{1}{2}\left(2x\cdot\Lambda h+h\cdot\Lambda h\right)\right)d\mu(x)\\&=\int_{\mathbb{R}^{n}}\chi_{E}(x)\exp\left(-x\cdot\Lambda h-\frac{1}{2}h\cdot\Lambda h\right)d\mu(x).\end{aligned}$

This shows that the Radon-Nikodym derivative of $(T_{h})_{*}\mu$ with respect to $\mu$ is

 $\frac{d(T_{h})_{*}\mu}{d\mu}(x)=\exp\left(-x\cdot\Lambda h-\frac{1}{2}h\cdot\Lambda h\right).$
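The formula for the Radon-Nikodym derivative can be verified numerically in one dimension; this sketch (with arbitrary `lam`, `h`, and the interval $E=[0,1]$) compares $((T_{h})_{*}\mu)(E)=\mu(E+h)$ with the integral of the claimed density over $E$.

```python
import numpy as np

# 1-d check: ((T_h)_* mu)(E) = mu(E + h) should equal
# int_E exp(-x lam h - lam h^2 / 2) dmu(x).  lam, h, and E are test choices.
lam, h = 2.0, 0.7
dx = 1e-4
xs = np.arange(0.0, 1.0, dx)           # grid on E = [0, 1]

def gauss_density(x):                  # density of mu with respect to dx
    return np.sqrt(lam) * np.exp(-lam * x**2 / 2) / np.sqrt(2 * np.pi)

lhs = np.sum(gauss_density(xs + h)) * dx                                  # mu(E + h)
rhs = np.sum(np.exp(-xs * lam * h - lam * h**2 / 2) * gauss_density(xs)) * dx
```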

## 3 Positive-definite functions

We say that a function $\phi:\mathbb{R}^{n}\to\mathbb{C}$ is positive-definite if for all $x_{1},\ldots,x_{r}\in\mathbb{R}^{n}$ and all $c_{1},\ldots,c_{r}\in\mathbb{C}$,

 $\sum_{i,j=1}^{r}c_{i}\overline{c_{j}}\phi(x_{i}-x_{j})\geq 0;$

in particular, the left-hand side is real.

Using $r=1$, $c_{1}=1$, we have for any $x_{1}\in\mathbb{R}^{n}$ that $\phi(x_{1}-x_{1})\geq 0$, i.e. $\phi(0)\geq 0$. For $x\in\mathbb{R}^{n}$, using $r=2$, $x_{1}=x$, $x_{2}=0$, and choosing suitable $c_{1},c_{2}\in\mathbb{C}$ gives

 $\phi(-x)=\overline{\phi(x)},$

and using this with $c_{2}=1$ and for appropriate $c_{1}$ gives

 $|\phi(x)|\leq\phi(0).$
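Equivalently, $\phi$ is positive-definite when every matrix $[\phi(x_{i}-x_{j})]_{i,j}$ is positive semidefinite. A quick numerical sketch (the points are arbitrary test choices, and $\phi(x)=e^{-x^{2}/2}$ is positive-definite by Theorem 2, being the Fourier transform of a Gaussian measure):

```python
import numpy as np

# The matrix [phi(x_i - x_j)] built from a positive-definite phi is positive
# semidefinite; check its smallest eigenvalue.  Test points are arbitrary.
phi = lambda x: np.exp(-x**2 / 2)     # Fourier transform of a Gaussian measure
pts = np.array([-2.0, -0.3, 0.0, 1.1, 2.5])
G = phi(pts[:, None] - pts[None, :])  # Gram-type matrix [phi(x_i - x_j)]
min_eig = np.linalg.eigvalsh(G).min()
```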

For $f,g\in L^{1}(\mathbb{R}^{n})$, the convolution of $f$ and $g$ is the function $f*g:\mathbb{R}^{n}\to\mathbb{C}$ defined by

 $(f*g)(x)=\int_{\mathbb{R}^{n}}f(y)g(x-y)dm_{n}(y),\qquad x\in\mathbb{R}^{n},$

and $\left\|f*g\right\|_{L^{1}}\leq\left\|f\right\|_{L^{1}}\left\|g\right\|_{L^{1}}$, a case of Young’s inequality. For $f:\mathbb{R}^{n}\to\mathbb{C}$, we denote by $\mathrm{supp}\,f$ the essential support of $f$; if $f$ is continuous, then $\mathrm{supp}\,f$ is the closure of the set $\{x\in\mathbb{R}^{n}:f(x)\neq 0\}$. A fact that we will use later (Gerald B. Folland, Real Analysis: Modern Techniques and their Applications, second ed., p. 240, Proposition 8.6) is

 $\mathrm{supp}\,(f*g)\subseteq\overline{\mathrm{supp}\,f+\mathrm{supp}\,g}.$

We denote by $f^{*}$ the function defined by $f^{*}(x)=\overline{f(-x)}$.
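Young’s inequality survives discretization verbatim; the following sketch checks $\left\|f*g\right\|_{L^{1}}\leq\left\|f\right\|_{L^{1}}\left\|g\right\|_{L^{1}}$ for Riemann-sum approximations (the particular $f$ and $g$ are arbitrary test choices).

```python
import numpy as np

# Discrete check of Young's inequality ||f*g||_1 <= ||f||_1 ||g||_1
# via Riemann sums; f and g below are arbitrary integrable test functions.
dx = 0.01
xs = np.arange(-5.0, 5.0, dx)
f = np.exp(-xs**2) * np.sin(3 * xs)       # sign-changing integrable function
g = np.where(np.abs(xs) < 1, 1.0, 0.0)    # indicator of (-1, 1)
conv = np.convolve(f, g) * dx             # Riemann sum for (f * g)
lhs = np.sum(np.abs(conv)) * dx           # ~ ||f * g||_1
rhs = (np.sum(np.abs(f)) * dx) * (np.sum(np.abs(g)) * dx)   # ~ ||f||_1 ||g||_1
```

The discrete inequality holds exactly, by the triangle inequality applied to the inner sum.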

$C_{c}(\mathbb{R}^{n})$ is the set of all $f\in C(\mathbb{R}^{n})$ for which $\mathrm{supp}\,f$ is compact. $C_{c}(\mathbb{R}^{n})$ is dense in the Banach space $C_{0}(\mathbb{R}^{n})$ and also in the Banach space $L^{1}(\mathbb{R}^{n})$. Although $C_{c}(\mathbb{R}^{n})$ is not itself a Banach space, or even a Fréchet space, it is often easier to prove statements first for $C_{c}(\mathbb{R}^{n})$ and then extend them to spaces in which it is dense. The proof of the following theorem follows Folland (Gerald B. Folland, A Course in Abstract Harmonic Analysis, p. 85, Proposition 3.35).

###### Theorem 3.

If $\phi:\mathbb{R}^{n}\to\mathbb{C}$ is positive-definite and continuous and $f\in C_{c}(\mathbb{R}^{n})$, then

 $\int(f^{*}*f)\phi\geq 0.$
###### Proof.

Write $K=\mathrm{supp}\,f$, and define $F:\mathbb{R}^{n}\times\mathbb{R}^{n}\to\mathbb{C}$ by

 $F(x,y)=f(x)\overline{f(y)}\phi(x-y).$

$F$ is continuous, and $\mathrm{supp}\,F\subseteq K\times K$, hence $\mathrm{supp}\,F$ is compact. Thus $F\in C_{c}(\mathbb{R}^{n}\times\mathbb{R}^{n})$; in particular $F$ is uniformly continuous, and it follows that for each $\epsilon>0$ there is some $\delta>0$ such that if $|x-a|<\delta$ and $|y-b|<\delta$ then $|F(x,y)-F(a,b)|<\epsilon$. The collection $\{B_{\delta}(x):x\in K\}$ covers $K$, and hence there are finitely many distinct $x_{i}\in K$ such that the collection $\{B_{\delta}(x_{i}):i\}$ covers $K$. Then $\{B_{\delta}(x_{i})\times B_{\delta}(x_{j}):i,j\}$ covers $K\times K$. Let $E_{i}$ be pairwise disjoint measurable sets with $E_{i}\subseteq B_{\delta}(x_{i})$ whose union is $K$; then the collection $\{E_{i}\times E_{j}:i,j\}$ covers $K\times K$.

Define

 $R=\sum_{i,j}\int_{E_{i}\times E_{j}}(F(x,y)-F(x_{i},x_{j}))dm_{n}(x)dm_{n}(y).$

$R$ satisfies

 $\begin{aligned}|R|&\leq\sum_{i,j}\int_{E_{i}\times E_{j}}|F(x,y)-F(x_{i},x_{j})|dm_{n}(x)dm_{n}(y)\\&\leq\sum_{i,j}\int_{E_{i}\times E_{j}}\epsilon\,dm_{n}(x)dm_{n}(y)\\&=\epsilon\sum_{i,j}m_{n}(E_{i})m_{n}(E_{j})\\&=\epsilon m_{n}(K)^{2}.\end{aligned}$

We obtain

 $\begin{aligned}\int_{K\times K}F(x,y)dm_{n}(x)dm_{n}(y)&=\sum_{i,j}\int_{E_{i}\times E_{j}}F(x,y)dm_{n}(x)dm_{n}(y)\\&=\sum_{i,j}F(x_{i},x_{j})m_{n}(E_{i})m_{n}(E_{j})+R\\&=\sum_{i,j}f(x_{i})\overline{f(x_{j})}\phi(x_{i}-x_{j})m_{n}(E_{i})m_{n}(E_{j})+R.\end{aligned}$

Using $c_{i}=f(x_{i})m_{n}(E_{i})$, the fact that $\phi$ is positive-definite means that the sum is $\geq 0$. Therefore

 $\int_{K\times K}F(x,y)dm_{n}(x)dm_{n}(y)\geq-|R|\geq-\epsilon m_{n}(K)^{2}.$

This is true for all $\epsilon>0$, hence

 $\int_{\mathbb{R}^{n}}\int_{\mathbb{R}^{n}}f(x)\overline{f(y)}\phi(x-y)dm_{n}(x)dm_{n}(y)=\int_{K\times K}F(x,y)dm_{n}(x)dm_{n}(y)\geq 0.$

But

 $\begin{aligned}\int_{\mathbb{R}^{n}}(f^{*}*f)(x)\phi(x)dm_{n}(x)&=\int_{\mathbb{R}^{n}}\left(\int_{\mathbb{R}^{n}}f^{*}(y)f(x-y)dm_{n}(y)\right)\phi(x)dm_{n}(x)\\&=\int_{\mathbb{R}^{n}}\int_{\mathbb{R}^{n}}\overline{f(-y)}f(x-y)\phi(x)dm_{n}(x)dm_{n}(y)\\&=\int_{\mathbb{R}^{n}}\int_{\mathbb{R}^{n}}\overline{f(-y)}f(x)\phi(x+y)dm_{n}(x)dm_{n}(y)\\&=\int_{\mathbb{R}^{n}}\int_{\mathbb{R}^{n}}\overline{f(y)}f(x)\phi(x-y)dm_{n}(x)dm_{n}(y),\end{aligned}$

which is nonnegative by the above. ∎
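The quadratic-form structure in this proof can be seen in a Riemann-sum sketch: discretizing $\int\int f(x)\overline{f(y)}\phi(x-y)$ gives exactly a sum $\sum_{i,j}c_{i}\overline{c_{j}}\phi(x_{i}-x_{j})$ with $c_{i}=f(x_{i})\,\Delta x$ (the test $f$ and $\phi$ below are arbitrary choices).

```python
import numpy as np

# Riemann-sum version of int int f(x) conj(f(y)) phi(x-y) dx dy; this is a
# positive-definite quadratic form, hence real and nonnegative.
phi = lambda x: np.exp(-x**2 / 2)          # positive-definite (Theorem 2)
dx = 0.05
xs = np.arange(-3.0, 3.0, dx)
f = (xs + 1j) * np.exp(-xs**2)             # complex-valued test function
G = phi(xs[:, None] - xs[None, :])         # matrix [phi(x_i - x_j)]
quad = np.einsum('i,j,ij->', f, np.conj(f), G) * dx * dx
```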

###### Corollary 4.

If $\phi:\mathbb{R}^{n}\to\mathbb{C}$ is positive-definite and continuous and $f\in L^{1}(\mathbb{R}^{n})$, then

 $\int(f^{*}*f)\phi\geq 0.$
###### Proof.

Let $f_{n}\in C_{c}(\mathbb{R}^{n})$ converge to $f$ in $L^{1}(\mathbb{R}^{n})$ as $n\to\infty$; such a sequence exists because $C_{c}(\mathbb{R}^{n})$ is dense in $L^{1}(\mathbb{R}^{n})$. Using

 $\begin{aligned}f_{n}^{*}*f_{n}-f^{*}*f&=f_{n}^{*}*f_{n}-f_{n}^{*}*f+f_{n}^{*}*f-f^{*}*f\\&=f_{n}^{*}*(f_{n}-f)+(f_{n}^{*}-f^{*})*f\\&=f_{n}^{*}*(f_{n}-f)+(f_{n}-f)^{*}*f,\end{aligned}$

and $\left\|g^{*}\right\|_{L^{1}}=\left\|g\right\|_{L^{1}}$, we get

 $\begin{aligned}\left\|f_{n}^{*}*f_{n}-f^{*}*f\right\|_{L^{1}}&\leq\left\|f_{n}^{*}*(f_{n}-f)\right\|_{L^{1}}+\left\|(f_{n}-f)^{*}*f\right\|_{L^{1}}\\&\leq\left\|f_{n}^{*}\right\|_{L^{1}}\left\|f_{n}-f\right\|_{L^{1}}+\left\|(f_{n}-f)^{*}\right\|_{L^{1}}\left\|f\right\|_{L^{1}}\\&=\left\|f_{n}\right\|_{L^{1}}\left\|f_{n}-f\right\|_{L^{1}}+\left\|f_{n}-f\right\|_{L^{1}}\left\|f\right\|_{L^{1}},\end{aligned}$

which converges to $0$ because $\left\|f_{n}-f\right\|_{L^{1}}\to 0$. Therefore, because $\phi$ is bounded,

 $\int_{\mathbb{R}^{n}}(f_{n}^{*}*f_{n})\phi dm_{n}\to\int_{\mathbb{R}^{n}}(f^{*}*f)\phi dm_{n}.$

As $\int_{\mathbb{R}^{n}}(f_{n}^{*}*f_{n})\phi dm_{n}\geq 0$ for each $n$, this implies that $\int_{\mathbb{R}^{n}}(f^{*}*f)\phi dm_{n}\geq 0$. ∎

It is straightforward to prove that the Fourier transform of a finite positive Borel measure is a positive-definite function: for points $\xi_{1},\ldots,\xi_{r}\in\mathbb{R}^{n}$ and $c_{1},\ldots,c_{r}\in\mathbb{C}$, one ends up with the expression

 $\int_{\mathbb{R}^{n}}\left|\sum_{j=1}^{r}c_{j}e^{-i\xi_{j}\cdot x}\right|^{2}d\mu(x),$

which is finite and nonnegative because $\mu$ is finite and positive. We have established already that the Fourier transform of a Borel probability measure $\mu$ on $\mathbb{R}^{n}$ is continuous and satisfies $\hat{\mu}(0)=1$. Bochner’s theorem is the statement that a function with these three properties (continuous, positive-definite, and equal to $1$ at the origin) is indeed the Fourier transform of a Borel probability measure. Our proof of the following theorem follows Folland (Gerald B. Folland, A Course in Abstract Harmonic Analysis, p. 95, Theorem 4.18).

###### Theorem 5 (Bochner).

If $\phi:\mathbb{R}^{n}\to\mathbb{C}$ is positive-definite, continuous, and satisfies $\phi(0)=1$, then there is some Borel probability measure $\mu$ on $\mathbb{R}^{n}$ such that $\phi=\hat{\mu}$.

###### Proof.

Let $\{\psi_{U}\}$ be an approximate identity. That is, for each neighborhood $U$ of $0$, $\psi_{U}$ is a function such that $\mathrm{supp}\,\psi_{U}$ is compact and contained in $U$, $\psi_{U}\geq 0$, $\psi_{U}(-x)=\psi_{U}(x)$, and $\int_{\mathbb{R}^{n}}\psi_{U}dm_{n}=1$. For every $f\in L^{1}(\mathbb{R}^{n})$, an approximate identity satisfies $\left\|f*\psi_{U}-f\right\|_{L^{1}}\to 0$ as $U\to\{0\}$ (Gerald B. Folland, A Course in Abstract Harmonic Analysis, p. 53, Proposition 2.42).

We have $\psi_{U}^{*}=\psi_{-U}$, so

 $\mathrm{supp}\,(\psi_{U}^{*}*\psi_{U})\subseteq\overline{\mathrm{supp}\,\psi_{-U}+\mathrm{supp}\,\psi_{U}}=\mathrm{supp}\,\psi_{-U}+\mathrm{supp}\,\psi_{U}\subseteq-U+U,$

and, as always, $\int_{\mathbb{R}^{n}}f*g\,dm_{n}=\int_{\mathbb{R}^{n}}f\,dm_{n}\int_{\mathbb{R}^{n}}g\,dm_{n}$. Therefore $\{\psi_{U}^{*}*\psi_{U}\}$ is an approximate identity.
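The approximate-identity property $\left\|f*\psi_{U}-f\right\|_{L^{1}}\to 0$ can be seen numerically with shrinking box kernels (a sketch; the kernel shape and the test function are arbitrary choices).

```python
import numpy as np

# ||f * psi - f||_1 shrinks as the support of the normalized box kernel psi
# shrinks; everything is approximated by Riemann sums on a grid.
dx = 0.001
xs = np.linspace(-4.0, 4.0, 8001)          # odd length keeps kernels centered
f = np.exp(-xs**2)

def l1_error(width):
    box = np.where(np.abs(xs) < width, 1.0, 0.0)
    psi = box / (np.sum(box) * dx)         # normalize: integral of psi is 1
    smoothed = np.convolve(f, psi, mode='same') * dx
    return np.sum(np.abs(smoothed - f)) * dx

errs = [l1_error(w) for w in (0.5, 0.1, 0.02)]   # shrinking supports
```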

For $f,g\in L^{1}(\mathbb{R}^{n})$, define

 $\left\langle f,g\right\rangle_{\phi}=\int_{\mathbb{R}^{n}}(g^{*}*f)\phi dm_{n}.$

One checks that this is a positive Hermitian form; positive means that $\left\langle f,f\right\rangle_{\phi}\geq 0$ for all $f\in L^{1}(\mathbb{R}^{n})$, and this is given to us by Corollary 4. Using the Cauchy-Schwarz inequality (Jean Dieudonné, Foundations of Modern Analysis, 1969, p. 117, Theorem 6.2.1),

 $|\left\langle f,g\right\rangle_{\phi}|^{2}\leq\left\langle f,f\right\rangle_{\phi}\left\langle g,g\right\rangle_{\phi}.$

We have laid out the tools that we will use. Let $f\in L^{1}(\mathbb{R}^{n})$. As $U\to\{0\}$ we have $\psi_{U}*f\to f$ in $L^{1}$, and as $\phi$ is bounded this gives $\int_{\mathbb{R}^{n}}(\psi_{U}^{*}*f)\phi dm_{n}\to\int_{\mathbb{R}^{n}}f\phi dm_{n}$. Because $\{\psi_{U}^{*}*\psi_{U}\}$ is an approximate identity and $\phi$ is continuous, $\int_{\mathbb{R}^{n}}(\psi_{U}^{*}*\psi_{U})\phi dm_{n}\to\phi(0)$ as $U\to\{0\}$. That is, we have $\left\langle f,\psi_{U}\right\rangle_{\phi}\to\int_{\mathbb{R}^{n}}f\phi dm_{n}$ and $\left\langle\psi_{U},\psi_{U}\right\rangle_{\phi}\to\phi(0)$ as $U\to\{0\}$, and as $\phi(0)=1$, the above statement of the Cauchy-Schwarz inequality produces

 $\left|\int_{\mathbb{R}^{n}}f\phi dm_{n}\right|^{2}\leq\int_{\mathbb{R}^{n}}(f^{*}*f)\phi dm_{n}.$ (1)

With $h=f^{*}*f$, the inequality (1) reads

 $\left|\int_{\mathbb{R}^{n}}f\phi dm_{n}\right|^{2}\leq\int_{\mathbb{R}^{n}}h\phi dm_{n}.$

Defining $h^{(1)}=h$, $h^{(2)}=h*h$, $h^{(3)}=h*h*h$, etc., applying (1) to $h$ gives, because $h^{*}=h$,

 $\left|\int_{\mathbb{R}^{n}}h\phi dm_{n}\right|^{2}\leq\int_{\mathbb{R}^{n}}h^{(2)}\phi dm_{n}.$

Then applying (1) to $h^{(2)}$, which satisfies $(h^{(2)})^{*}=h^{(2)}$,

 $\left|\int_{\mathbb{R}^{n}}h^{(2)}\phi dm_{n}\right|^{2}\leq\int_{\mathbb{R}^{n}}h^{(4)}\phi dm_{n}.$

Thus, for any $m\geq 0$ we have

 $\begin{aligned}\left|\int_{\mathbb{R}^{n}}f\phi dm_{n}\right|&\leq\left|\int_{\mathbb{R}^{n}}h^{\left(2^{m}\right)}\phi dm_{n}\right|^{2^{-(m+1)}}\\&\leq\left\|h^{\left(2^{m}\right)}\right\|_{L^{1}}^{2^{-(m+1)}}\\&=\left(\left\|h^{\left(2^{m}\right)}\right\|_{L^{1}}^{2^{-m}}\right)^{1/2},\end{aligned}$

since $\left\|\phi\right\|_{\infty}=\phi(0)=1$.

With convolution as multiplication, $L^{1}(\mathbb{R}^{n})$ is a commutative Banach algebra, and the Gelfand transform is an algebra homomorphism $L^{1}(\mathbb{R}^{n})\to C_{0}(\mathbb{R}^{n})$ that satisfies the spectral radius formula (Gerald B. Folland, A Course in Abstract Harmonic Analysis, p. 15, Theorem 1.30):

 $\left\|\hat{g}\right\|_{\infty}=\lim_{k\to\infty}\left\|g^{(k)}\right\|_{L^{1}}^{1/k},\qquad g\in L^{1}(\mathbb{R}^{n});$

for $L^{1}(\mathbb{R}^{n})$, the Gelfand transform is the Fourier transform. Write the Fourier transform as $\mathscr{F}:L^{1}(\mathbb{R}^{n})\to C_{0}(\mathbb{R}^{n})$. Stating that the Gelfand transform is a homomorphism means that $\mathscr{F}(g_{1}*g_{2})=\mathscr{F}(g_{1})\mathscr{F}(g_{2})$, because multiplication in the Banach algebra $C_{0}(\mathbb{R}^{n})$ is pointwise multiplication. Then, since a subsequence of a convergent sequence converges to the same limit,

 $\lim_{m\to\infty}\left(\left\|h^{\left(2^{m}\right)}\right\|_{L^{1}}^{2^{-m}}\right)^{1/2}=\left(\left\|\hat{h}\right\|_{\infty}\right)^{1/2}.$

But

 $\hat{h}=\mathscr{F}(f^{*}*f)=\mathscr{F}(f^{*})\mathscr{F}(f)=\overline{\mathscr{F}(f)}\mathscr{F}(f)=\left|\mathscr{F}(f)\right|^{2},$

so

 $\left(\left\|\hat{h}\right\|_{\infty}\right)^{1/2}=\left(\left\||\hat{f}|^{2}\right\|_{\infty}\right)^{1/2}=\left\|\hat{f}\right\|_{\infty}.$
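The identity $\mathscr{F}(f^{*}*f)=|\mathscr{F}(f)|^{2}$ has an exact discrete analogue on the cyclic group $\mathbb{Z}/N$, with the DFT in the role of $\mathscr{F}$ and $f^{*}[k]=\overline{f[-k\bmod N]}$; a sketch (the random test vector is an arbitrary choice):

```python
import numpy as np

# On Z/N: the DFT of f^* is the conjugate of the DFT of f, so the DFT of the
# circular convolution f^* * f is |DFT(f)|^2.
rng = np.random.default_rng(0)
N = 64
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # test vector f
a_star = np.conj(np.roll(a[::-1], 1))                      # f^*[k] = conj(f[-k mod N])
conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(a_star))     # circular f^* * f
lhs = np.fft.fft(conv)                                     # F(f^* * f)
rhs = np.abs(np.fft.fft(a))**2                             # |F(f)|^2
```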

Putting things together, we have that for any $f\in L^{1}(\mathbb{R}^{n})$,

 $\left|\int_{\mathbb{R}^{n}}f\phi dm_{n}\right|\leq\left\|\hat{f}\right\|_{\infty}.$

Therefore $\hat{f}\mapsto\int_{\mathbb{R}^{n}}f\phi dm_{n}$ is a bounded linear functional $\mathscr{F}(L^{1}(\mathbb{R}^{n}))\to\mathbb{C}$, of norm $\leq 1$. Using $\phi(0)=1$, one proves that this functional has norm $1$. (If we could apply this inequality to $\mathscr{F}(\delta)$ the two sides would be equal, thus to prove that the operator norm is 1, one applies the inequality to a sequence of functions that converge weakly to $\delta$.) We take as known that $\mathscr{F}(L^{1}(\mathbb{R}^{n}))$ is dense in the Banach space $C_{0}(\mathbb{R}^{n})$, so there is a bounded linear functional $\Phi:C_{0}(\mathbb{R}^{n})\to\mathbb{C}$ whose restriction to $\mathscr{F}(L^{1}(\mathbb{R}^{n}))$ is equal to $\hat{f}\mapsto\int_{\mathbb{R}^{n}}f\phi dm_{n}$, and $\left\|\Phi\right\|=1$.

Using the Riesz-Markov theorem (Walter Rudin, Real and Complex Analysis, third ed., p. 130, Theorem 6.19), there is a regular complex Borel measure $\mu$ on $\mathbb{R}^{n}$ such that

 $\Phi(g)=\int_{\mathbb{R}^{n}}gd\mu,\qquad g\in C_{0}(\mathbb{R}^{n}),$

and $\left\|\mu\right\|=\left\|\Phi\right\|$; $\left\|\mu\right\|$ is the total variation norm of $\mu$, $\left\|\mu\right\|=|\mu|(\mathbb{R}^{n})$. Then for $f\in L^{1}(\mathbb{R}^{n})$ we have

 $\begin{aligned}\int_{\mathbb{R}^{n}}f\phi dm_{n}&=\Phi(\hat{f})\\&=\int_{\mathbb{R}^{n}}\hat{f}d\mu\\&=\int_{\mathbb{R}^{n}}\left(\int_{\mathbb{R}^{n}}e^{-i\xi\cdot x}f(x)dm_{n}(x)\right)d\mu(\xi)\\&=\int_{\mathbb{R}^{n}}f(x)\left(\int_{\mathbb{R}^{n}}e^{-ix\cdot\xi}d\mu(\xi)\right)dm_{n}(x)\\&=\int_{\mathbb{R}^{n}}f(x)\hat{\mu}(x)dm_{n}(x).\end{aligned}$

That this is true for all $f\in L^{1}(\mathbb{R}^{n})$ implies that $\phi=\hat{\mu}$. As $\mu(\mathbb{R}^{n})=\hat{\mu}(0)=\phi(0)=1$ and $\left\|\mu\right\|=\left\|\Phi\right\|=1$, we have $\mu(\mathbb{R}^{n})=\left\|\mu\right\|$, and this implies that $\mu$ is a positive measure, hence, as $\mu(\mathbb{R}^{n})=1$, a probability measure. ∎
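As a closing numerical sketch (not part of the proof), take $\phi(\xi)=e^{-\xi^{2}/2}$ on $\mathbb{R}$, which is continuous, positive-definite, and equals $1$ at $0$; inverting the Fourier transform numerically recovers a nonnegative density of total mass $1$, as Bochner’s theorem predicts. Grids and truncations below are arbitrary choices.

```python
import numpy as np

# Invert phi(xi) = exp(-xi^2/2): density(x) = (1/2pi) int e^{i x xi} phi(xi) dxi,
# approximated by a Riemann sum; the result should be the standard normal pdf.
dxi, dxs = 0.01, 0.01
xi = np.arange(-12.0, 12.0, dxi)           # phi is negligible beyond the cutoff
x = np.arange(-5.0, 5.0, dxs)
phi = np.exp(-xi**2 / 2)
dens = (np.exp(1j * np.outer(x, xi)) * phi).sum(axis=1).real * dxi / (2 * np.pi)
mass = dens.sum() * dxs                    # should be close to 1
```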