# The Glivenko-Cantelli theorem

Jordan Bell
April 12, 2015

## 1 Narrow topology

Let $X$ be a metrizable space and let $C_{b}(X)$ be the Banach space of bounded continuous functions $X\to\mathbb{R}$, with the norm $\left\|f\right\|=\sup_{x\in X}|f(x)|$. If $X$ is metrizable with the metric $d$, let $U_{d}(X)$ be the collection of bounded $d$-uniformly continuous functions $X\to\mathbb{R}$. This is a vector space and is a closed subset of $C_{b}(X)$, thus is itself a Banach space.

Let $X$ be a metrizable space and denote by $\mathscr{P}(X)$ the collection of Borel probability measures on $X$. The narrow topology on $\mathscr{P}(X)$ is the coarsest topology on $\mathscr{P}(X)$ such that for every $f\in C_{b}(X)$, the mapping $\mu\mapsto\int_{X}fd\mu$ is continuous $\mathscr{P}(X)\to\mathbb{R}$. It can be proved that if $X$ is metrizable with a metric $d$ and $D$ is a dense subset of $U_{d}(X)$, then the narrow topology is equal to the coarsest topology such that for each $f\in D$, the mapping $\mu\mapsto\int_{X}fd\mu$ is continuous $\mathscr{P}(X)\to\mathbb{R}$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 507, Theorem 15.2).

If $X$ is a separable metrizable space, then it is metrizable by a metric $d$ such that the metric space $(X,d)$ is totally bounded. It is a fact that if $(X,d)$ is a totally bounded metric space, then $U_{d}(X)$ is separable (Daniel W. Stroock, Probability Theory: An Analytic View, p. 371, Lemma 9.1.4).

###### Theorem 1.

If $X$ is a separable metrizable space, then $X$ is metrizable by a metric $d$ for which there is a countable dense subset $D$ of $U_{d}(X)$ such that a sequence $\mu_{n}$ in $\mathscr{P}(X)$ converges narrowly to $\mu\in\mathscr{P}(X)$ if and only if

 $\int_{X}fd\mu_{n}\to\int_{X}fd\mu,\qquad f\in D.$

## 2 Independent and identically distributed random variables

Let $(\Omega,S,P)$ be a probability space and let $X$ be a separable metric space, with the Borel $\sigma$-algebra $\mathscr{B}_{X}$. We say that a finite collection of measurable functions $\xi_{i}:\Omega\to X$, $1\leq i\leq n$, is independent if

 $P\left(\bigcap_{i=1}^{n}\xi_{i}^{-1}(A_{i})\right)=\prod_{i=1}^{n}P(\xi_{i}^{-1}(A_{i})),\qquad A_{1},\ldots,A_{n}\in\mathscr{B}_{X},$

i.e.

 $P(\xi_{1}\in A_{1},\ldots,\xi_{n}\in A_{n})=P(\xi_{1}\in A_{1})\cdots P(\xi_{n}\in A_{n}),\qquad A_{1},\ldots,A_{n}\in\mathscr{B}_{X}.$

We say that a family of measurable functions is independent if every finite subset of it is independent.

We say that two measurable functions $f,g:\Omega\to X$ are identically distributed if the pushforward $f_{*}P$ of $P$ by $f$ is equal to the pushforward $g_{*}P$ of $P$ by $g$, i.e. $P(f^{-1}(A))=P(g^{-1}(A))$ for every $A\in\mathscr{B}_{X}$. We say that a family of measurable functions is identically distributed if any two of them are identically distributed.
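The two definitions above can be checked exactly on a small finite model. The following is a minimal sketch, assuming a hypothetical sample space $\Omega=\{0,1\}^{2}$ with the uniform probability and taking $\xi_{1},\xi_{2}$ to be the coordinate projections into the discrete space $X=\{0,1\}$; none of these names come from the text, and exact rational arithmetic is used so that the product formula is tested without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite model: Omega = {0,1}^2 with the uniform probability P.
omega_space = list(product([0, 1], repeat=2))
P = {w: Fraction(1, len(omega_space)) for w in omega_space}

def xi(i):
    """Coordinate projection Omega -> {0, 1} onto coordinate i (i = 0 or 1)."""
    return lambda w: w[i]

def prob(event):
    """P of the set of outcomes w for which event(w) is true."""
    return sum((P[w] for w in omega_space if event(w)), Fraction(0))

# All subsets of X = {0, 1}; for a discrete space these are all the Borel sets.
subsets = [set(), {0}, {1}, {0, 1}]

def is_independent(f, g):
    """Check P(f in A, g in B) = P(f in A) * P(g in B) for all Borel A, B."""
    return all(
        prob(lambda w: f(w) in A and g(w) in B)
        == prob(lambda w: f(w) in A) * prob(lambda w: g(w) in B)
        for A in subsets
        for B in subsets
    )

def pushforward(f):
    """The pushforward f_* P, recorded by its values on every subset of X."""
    return tuple(prob(lambda w: f(w) in A) for A in subsets)

xi1, xi2 = xi(0), xi(1)
independent = is_independent(xi1, xi2)
identically_distributed = pushforward(xi1) == pushforward(xi2)
print(independent, identically_distributed)
```

In this model a function is identically distributed with itself but is not independent of itself, which `is_independent(xi1, xi1)` detects by taking $A=\{0\}$ and $B=\{1\}$.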

## 3 Strong law of large numbers

If $\zeta\in L^{1}(P)$, the expectation of $\zeta$ is

 $E(\zeta)=\int_{\Omega}\zeta dP,$

and by the change of variables theorem,

 $\int_{\Omega}\zeta(\omega)dP(\omega)=\int_{\mathbb{R}}xd(\zeta_{*}P)(x).$

The strong law of large numbers (M. Loève, Probability Theory I, 4th ed., p. 251, 17.B) states that if $\zeta_{1},\zeta_{2},\ldots\in L^{1}(P)$ are independent and identically distributed, with common expectation $E_{0}$, then

 $P\left(\left\{\omega\in\Omega:\sum_{i=1}^{n}\frac{\zeta_{i}(\omega)}{n}\to E_{0}\right\}\right)=1.$
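As a numerical illustration of this statement, the sketch below simulates one outcome $\omega$ and tracks the averages $\sum_{i=1}^{n}\zeta_{i}(\omega)/n$. It assumes, purely for illustration, that the $\zeta_{i}$ are uniform on $[0,1]$, so that $E_{0}=1/2$; the helper name `running_means` and the fixed seed are hypothetical choices. A simulation of one path cannot prove almost sure convergence; it only shows the behaviour the theorem predicts.

```python
import random

def running_means(samples):
    """Return the averages (1/n) * (samples[0] + ... + samples[n-1]) for n = 1, 2, ..."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(0)  # fixed seed: plays the role of one outcome omega
zetas = [rng.random() for _ in range(100_000)]  # i.i.d. uniform on [0, 1], E_0 = 1/2
means = running_means(zetas)

# Print the distance of the n-th average from E_0 for a few values of n.
for n in (10, 1_000, 100_000):
    print(n, abs(means[n - 1] - 0.5))
```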

## 4 Sample distributions

Let $X$ be a separable metrizable space and let $\xi_{1},\xi_{2},\ldots$ be independent and identically distributed measurable functions $\Omega\to X$. For $\omega\in\Omega$, define $\mu_{n}^{\omega}$ on $\mathscr{B}_{X}$ by

 $\mu_{n}^{\omega}=\sum_{i=1}^{n}\frac{1}{n}\delta_{\xi_{i}(\omega)},$

which is a probability measure. We call the sequence $\mu_{n}^{\omega}$ the sample distribution of $\omega$.
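For a fixed $\omega$, integrating against $\mu_{n}^{\omega}$ is just averaging over the observed points, since $\int_{X}gd\delta_{x}=g(x)$. The following is a minimal sketch, assuming $X=\mathbb{R}$ and a hypothetical list of four observed values; the function names are illustrative and a Borel set is represented by its indicator function.

```python
def empirical_integral(g, sample):
    """Integral of g against (1/n) * (sum of Dirac masses at the sample points)."""
    return sum(g(x) for x in sample) / len(sample)

def empirical_mass(indicator, sample):
    """Measure of the set A described by `indicator` (a True/False function)."""
    return empirical_integral(lambda x: 1.0 if indicator(x) else 0.0, sample)

# Hypothetical observed values xi_1(omega), ..., xi_4(omega).
sample = [0.0, 1.0, 1.0, 3.0]

mean_value = empirical_integral(lambda x: x, sample)       # (0 + 1 + 1 + 3) / 4
mass_ge_one = empirical_mass(lambda x: x >= 1.0, sample)   # three of the four points
total_mass = empirical_mass(lambda x: True, sample)        # total mass of the measure
print(mean_value, mass_ge_one, total_mass)
```

Repeated values in the sample contribute their multiplicity, which matches the sum of Dirac masses in the definition.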

The following is the Glivenko-Cantelli theorem, which shows that the sample distributions of a sequence of independent and identically distributed measurable functions converge narrowly almost everywhere to the common pushforward measure (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 53, Theorem 7.1).

###### Theorem 2 (Glivenko-Cantelli theorem).

Let $(\Omega,S,P)$ be a probability space, let $X$ be a separable metrizable space and let $\xi_{1},\xi_{2},\ldots$ be independent and identically distributed measurable functions $\Omega\to X$, with common pushforward measure $\mu$. Then

 $P\left(\left\{\omega\in\Omega:\mu_{n}^{\omega}\to\mu\textnormal{ narrowly}\right\}\right)=1.$

###### Proof.

For $g\in C_{b}(X)$, $g\circ\xi_{i}:\Omega\to\mathbb{R}$ is measurable and bounded, hence belongs to $L^{1}(P)$. Also, $(g\circ\xi_{i})_{*}P=g_{*}\mu$ for every $i$, so the functions $g\circ\xi_{i}$ are identically distributed. We now check that they are independent. Let $A_{1},\ldots,A_{n}\in\mathscr{B}_{\mathbb{R}}$. Then $g^{-1}(A_{1}),\ldots,g^{-1}(A_{n})\in\mathscr{B}_{X}$, and because $\xi_{1},\xi_{2},\ldots$ are independent,

 $P\left(\bigcap_{i=1}^{n}\xi_{i}^{-1}(g^{-1}(A_{i}))\right)=\prod_{i=1}^{n}P(\xi_{i}^{-1}(g^{-1}(A_{i}))),$

i.e.,

 $P\left(\bigcap_{i=1}^{n}(g\circ\xi_{i})^{-1}(A_{i})\right)=\prod_{i=1}^{n}P((g\circ\xi_{i})^{-1}(A_{i})),$

showing that $(g\circ\xi_{1}),(g\circ\xi_{2}),\ldots$ are independent. For any $i$, by the change of variables theorem

 $E(g\circ\xi_{i})=\int_{\Omega}g\circ\xi_{i}dP=\int_{X}gd((\xi_{i})_{*}P)=\int_{X}gd\mu,$

so the strong law of large numbers tells us that there is a set $N_{g}\in S$ with $P(N_{g})=0$ such that for all $\omega\in\Omega\setminus N_{g}$,

 $\sum_{i=1}^{n}\frac{(g\circ\xi_{i})(\omega)}{n}\to\int_{X}gd\mu.$

But

 $\sum_{i=1}^{n}\frac{(g\circ\xi_{i})(\omega)}{n}=\sum_{i=1}^{n}\frac{1}{n}\int_{X}gd\delta_{\xi_{i}(\omega)}=\int_{X}gd\mu_{n}^{\omega},$

so for all $\omega\in\Omega\setminus N_{g}$,

 $\int_{X}gd\mu_{n}^{\omega}\to\int_{X}gd\mu.$

Because $X$ is separable, Theorem 1 tells us that there is a metric $d$ that induces the topology of $X$ and some countable dense subset $G$ of $U_{d}(X)$ such that a sequence $\nu_{n}$ in $\mathscr{P}(X)$ converges narrowly to $\nu$ if and only if

 $\int_{X}gd\nu_{n}\to\int_{X}gd\nu,\qquad g\in G.$

Now let $N=\bigcup_{g\in G}N_{g}$. Because $G$ is countable, $N$ is a countable union of sets of measure $0$, so $N\in S$ and $P(N)=0$. If $\omega\in\Omega\setminus N$, then for each $g\in G$,

 $\int_{X}gd\mu_{n}^{\omega}\to\int_{X}gd\mu.$

By the choice of $G$, this implies that for all $\omega\in\Omega\setminus N$, $\mu_{n}^{\omega}$ converges narrowly to $\mu$. That is, there is a set $N\in S$ with $P(N)=0$ such that for all $\omega\in\Omega\setminus N$, the sample distribution $\mu_{n}^{\omega}$ converges narrowly to the common pushforward measure $\mu$, proving the claim. ∎
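The mechanism of the proof, convergence of $\int_{X}gd\mu_{n}^{\omega}$ for each test function $g$ in a fixed family, can be observed numerically. The sketch below assumes, for illustration only, that $X=[0,1]$, that $\mu$ is the uniform distribution, and that the test functions are $g_{k}(x)=\cos(kx)$ for $k=1,\ldots,5$, for which $\int_{0}^{1}\cos(kx)dx=\sin(k)/k$. This finite family is a hypothetical stand-in and is not the countable dense set $G$ of the proof, so the output illustrates the theorem rather than verifying it.

```python
import math
import random

def empirical_integral(g, sample):
    """Integral of g against the sample distribution of the given points."""
    return sum(g(x) for x in sample) / len(sample)

def max_test_error(sample, ks=(1, 2, 3, 4, 5)):
    """Largest gap between the empirical and exact integrals of cos(k x), k in ks.

    The exact integrals are taken against the uniform distribution on [0, 1].
    """
    errors = []
    for k in ks:
        exact = math.sin(k) / k  # integral of cos(k x) over [0, 1]
        approx = empirical_integral(lambda x: math.cos(k * x), sample)
        errors.append(abs(approx - exact))
    return max(errors)

rng = random.Random(1)  # fixed seed: plays the role of one outcome omega
xs = [rng.random() for _ in range(50_000)]  # i.i.d. uniform on [0, 1]

# The error over the test family for the first n points, for a few values of n.
for n in (100, 5_000, 50_000):
    print(n, max_test_error(xs[:n]))
```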