The Glivenko-Cantelli theorem

Jordan Bell
April 12, 2015

1 Narrow topology

Let $X$ be a metrizable space and let $C_b(X)$ be the Banach space of bounded continuous functions $X \to \mathbb{R}$, with the norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$. If $X$ is metrizable with the metric $d$, let $U_d(X)$ be the collection of bounded $d$-uniformly continuous functions $X \to \mathbb{R}$. This is a vector space and is a closed subset of $C_b(X)$, thus is itself a Banach space.

Let $X$ be a metrizable space and denote by $\mathcal{P}(X)$ the collection of Borel probability measures on $X$. The narrow topology on $\mathcal{P}(X)$ is the coarsest topology on $\mathcal{P}(X)$ such that for every $f \in C_b(X)$, the mapping $\mu \mapsto \int_X f\,d\mu$ is continuous $\mathcal{P}(X) \to \mathbb{R}$. It can be proved that if $X$ is metrizable with a metric $d$ and $D$ is a dense subset of $U_d(X)$, then the narrow topology is equal to the coarsest topology such that for each $f \in D$, the mapping $\mu \mapsto \int_X f\,d\mu$ is continuous $\mathcal{P}(X) \to \mathbb{R}$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker's Guide, third ed., p. 507, Theorem 15.2).

If $X$ is a separable metrizable space, then it is metrizable by a metric $d$ such that the metric space $(X,d)$ is totally bounded. It is a fact that if $(X,d)$ is a totally bounded metric space, then $U_d(X)$ is separable (Daniel W. Stroock, Probability Theory: An Analytic View, p. 371, Lemma 9.1.4).
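As a concrete illustration (an example chosen here, not taken from the cited references), take $X = \mathbb{R}$ with its usual topology. The usual metric $|x-y|$ is not totally bounded, but since $\arctan$ is a homeomorphism of $\mathbb{R}$ onto the bounded interval $(-\pi/2, \pi/2)$, the following metric induces the same topology and is totally bounded:

```latex
\[
  d(x,y) = |\arctan x - \arctan y|, \qquad x, y \in \mathbb{R}.
\]
% Given \epsilon > 0, choose t_1 < \cdots < t_m in (-\pi/2, \pi/2) such that
% every point of (-\pi/2, \pi/2) lies within \epsilon of some t_j. Then
\[
  \mathbb{R} = \bigcup_{j=1}^{m} \{ x \in \mathbb{R} : d(x, \tan t_j) < \epsilon \},
\]
% because d(x, \tan t_j) = |\arctan x - t_j|.
```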

Theorem 1.

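If $X$ is a separable metrizable space, then $X$ is metrizable by a metric $d$ for which there is a countable dense subset $D$ of $U_d(X)$ such that a sequence $\mu_n$ in $\mathcal{P}(X)$ converges narrowly to $\mu \in \mathcal{P}(X)$ if and only if

\[ \int_X f\,d\mu_n \to \int_X f\,d\mu, \qquad f \in D. \]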

2 Independent and identically distributed random variables

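Let $(\Omega, \mathcal{S}, P)$ be a probability space and let $X$ be a separable metric space, with the Borel $\sigma$-algebra $\mathcal{B}_X$. We say that a finite collection of measurable functions $\xi_i : \Omega \to X$, $1 \le i \le n$, is independent if

\[ P\left(\bigcap_{i=1}^n \xi_i^{-1}(A_i)\right) = \prod_{i=1}^n P(\xi_i^{-1}(A_i)), \qquad A_1, \ldots, A_n \in \mathcal{B}_X, \]

i.e.

\[ P(\xi_1 \in A_1, \ldots, \xi_n \in A_n) = P(\xi_1 \in A_1) \cdots P(\xi_n \in A_n), \qquad A_1, \ldots, A_n \in \mathcal{B}_X. \]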

We say that a family of measurable functions is independent if every finite subset of it is independent.

We say that two measurable functions $f, g : \Omega \to X$ are identically distributed if the pushforward $f_*P$ of $P$ by $f$ is equal to the pushforward $g_*P$ of $P$ by $g$, i.e. $P(f^{-1}(A)) = P(g^{-1}(A))$ for every Borel set $A \subset X$. We say that a family of measurable functions is identically distributed if any two of them are identically distributed.
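On a finite probability space both definitions can be checked exhaustively. The sketch below is an illustration only: the two-coin-flip model, the names `omega_space`, `xi`, `is_independent` and `pushforward` are all hypothetical choices made here, and exact rational arithmetic is used so that equalities are tested without rounding.

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Hypothetical model: Omega = {0,1}^2 with the uniform measure (two fair coin flips).
omega_space = list(product([0, 1], repeat=2))
prob = {w: Fraction(1, 4) for w in omega_space}

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(prob[w] for w in omega_space if event(w))

# Coordinate maps xi_1, xi_2 : Omega -> X, with X = {0, 1}.
xi = [lambda w: w[0], lambda w: w[1]]

def subsets(points):
    """All subsets of a finite set, i.e. its whole Borel sigma-algebra."""
    pts = list(points)
    return [set(c) for c in chain.from_iterable(
        combinations(pts, r) for r in range(len(pts) + 1))]

def is_independent(maps, values):
    """Check P(xi_1 in A_1, ..., xi_n in A_n) = prod_i P(xi_i in A_i) for all A_i."""
    for sets in product(subsets(values), repeat=len(maps)):
        joint = P(lambda w: all(f(w) in A for f, A in zip(maps, sets)))
        prod_marginals = Fraction(1)
        for f, A in zip(maps, sets):
            prod_marginals *= P(lambda w, f=f, A=A: f(w) in A)
        if joint != prod_marginals:
            return False
    return True

def pushforward(f):
    """The pushforward f_*P as a dict from values to probabilities."""
    law = {}
    for w in omega_space:
        law[f(w)] = law.get(f(w), Fraction(0)) + prob[w]
    return law

independent = is_independent(xi, {0, 1})
same_law = pushforward(xi[0]) == pushforward(xi[1])
```

In this model the two coordinate maps are independent and identically distributed, whereas the pair consisting of the first coordinate taken twice is identically distributed but not independent.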

3 Strong law of large numbers

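If $\zeta \in L^1(P)$, the expectation of $\zeta$ is

\[ E(\zeta) = \int_\Omega \zeta\,dP, \]

and by the change of variables theorem,

\[ \int_\Omega \zeta(\omega)\,dP(\omega) = \int_{\mathbb{R}} x\,d(\zeta_*P)(x). \]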
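The change of variables identity can be verified by hand on a finite probability space, where both integrals are finite sums. The following is a minimal sketch under assumptions made here for illustration: a fair six-sided die as the probability space and a hypothetical function `zeta`.

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair six-sided die.
prob = {w: Fraction(1, 6) for w in range(1, 7)}

def zeta(w):
    """An integrable function Omega -> R: 10 on odd faces, 0 on even faces."""
    return 10 * (w % 2)

# Left side: integrate zeta over Omega with respect to P.
lhs = sum(zeta(w) * p for w, p in prob.items())

# Right side: build the pushforward zeta_*P on R, then integrate x against it.
law = {}
for w, p in prob.items():
    law[zeta(w)] = law.get(zeta(w), Fraction(0)) + p
rhs = sum(x * q for x, q in law.items())
```

Here the pushforward puts mass $1/2$ on each of the values $0$ and $10$, and both sides equal $5$.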

The strong law of large numbers (M. Loève, Probability Theory I, 4th ed., p. 251, 17.B) states that if $\zeta_1, \zeta_2, \ldots \in L^1(P)$ are independent and identically distributed, with common expectation $E_0$, then

\[ P\left(\left\{\omega \in \Omega : \frac{\sum_{i=1}^n \zeta_i(\omega)}{n} \to E_0\right\}\right) = 1. \]
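A simulation cannot prove an almost sure statement, but it can show what the strong law describes along a single sample path. The sketch below assumes, purely for illustration, that the $\zeta_i$ are exponential with rate $2$ (so $E_0 = 1/2$) and that a seeded pseudorandom generator stands in for one fixed $\omega$; the helper `running_means` is a name introduced here.

```python
import random

def running_means(draw, n_max, checkpoints):
    """Return {n: (zeta_1 + ... + zeta_n)/n} along one simulated sample path."""
    total = 0.0
    means = {}
    for n in range(1, n_max + 1):
        total += draw()
        if n in checkpoints:
            means[n] = total / n
    return means

# Assumption for illustration: zeta_i are i.i.d. exponential with rate 2, so E_0 = 1/2.
rng = random.Random(0)  # fixed seed, playing the role of one particular omega
means = running_means(lambda: rng.expovariate(2.0), 100_000, {100, 10_000, 100_000})

for n, m in sorted(means.items()):
    print(n, round(m, 4))
```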

4 Sample distributions

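Let $X$ be a separable metrizable space and let $\xi_1, \xi_2, \ldots$ be independent and identically distributed measurable functions $\Omega \to X$. For $\omega \in \Omega$, define $\mu_n^\omega$ on the Borel $\sigma$-algebra of $X$ by

\[ \mu_n^\omega = \sum_{i=1}^n \frac{1}{n}\delta_{\xi_i(\omega)}, \]

which is a probability measure. We call the sequence $\mu_n^\omega$ the sample distribution of $\omega$.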
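Since $\mu_n^\omega$ is finitely supported, it can be represented directly as a table of point masses, and integrating a function against it reduces to averaging over the observed values. The sketch below uses hypothetical observed values and helper names (`sample_distribution`, `integrate`) introduced only for this illustration.

```python
from fractions import Fraction

def sample_distribution(samples):
    """Return mu_n^omega as a dict {point: mass}, given xi_1(omega), ..., xi_n(omega)."""
    n = len(samples)
    mu = {}
    for x in samples:
        mu[x] = mu.get(x, Fraction(0)) + Fraction(1, n)
    return mu

def integrate(g, mu):
    """Integral of g against a finitely supported measure mu."""
    return sum(g(x) * mass for x, mass in mu.items())

# Hypothetical observed values xi_1(omega), ..., xi_4(omega) in X = R.
samples = [0, 1, 1, 3]
mu = sample_distribution(samples)

total_mass = sum(mu.values())               # total mass of mu
integral = integrate(lambda x: x * x, mu)   # integral of g(x) = x^2 against mu
average = Fraction(sum(x * x for x in samples), len(samples))  # plain average of g(xi_i)
```

The repeated value $1$ receives mass $2/4$, the total mass is $1$, and the integral of $g$ against $\mu_n^\omega$ coincides with the average of the values $g(\xi_i(\omega))$.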

The following is the Glivenko-Cantelli theorem, which shows that the sample distributions of a sequence of independent and identically distributed measurable functions converge narrowly almost everywhere to the common pushforward measure (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 53, Theorem 7.1).

Theorem 2 (Glivenko-Cantelli theorem).

Let $(\Omega, \mathcal{S}, P)$ be a probability space, let $X$ be a separable metrizable space and let $\xi_1, \xi_2, \ldots$ be independent and identically distributed measurable functions $\Omega \to X$, with common pushforward measure $\mu$. Then

\[ P(\{\omega \in \Omega : \mu_n^\omega \to \mu \text{ narrowly}\}) = 1. \]
Proof.

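For $g \in C_b(X)$, $g \circ \xi_i : \Omega \to \mathbb{R}$ is measurable and bounded, hence belongs to $L^1(P)$. Also, $(g \circ \xi_i)_*P = g_*((\xi_i)_*P) = g_*\mu$, so the functions $g \circ \xi_i$ are identically distributed. We now check that the sequence is independent. Let $A_1, \ldots, A_n$ be Borel subsets of $\mathbb{R}$. Then $g^{-1}(A_1), \ldots, g^{-1}(A_n)$ are Borel subsets of $X$, and because $\xi_1, \xi_2, \ldots$ are independent,

\[ P\left(\bigcap_{i=1}^n \xi_i^{-1}(g^{-1}(A_i))\right) = \prod_{i=1}^n P(\xi_i^{-1}(g^{-1}(A_i))), \]

i.e.,

\[ P\left(\bigcap_{i=1}^n (g \circ \xi_i)^{-1}(A_i)\right) = \prod_{i=1}^n P((g \circ \xi_i)^{-1}(A_i)), \]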

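showing that $g \circ \xi_1, g \circ \xi_2, \ldots$ are independent. For any $i$, by the change of variables theorem,

\[ E(g \circ \xi_i) = \int_\Omega g \circ \xi_i\,dP = \int_X g\,d((\xi_i)_*P) = \int_X g\,d\mu, \]

so the strong law of large numbers tells us that there is a set $N_g \in \mathcal{S}$ with $P(N_g) = 0$ such that for all $\omega \in \Omega \setminus N_g$,

\[ \frac{\sum_{i=1}^n (g \circ \xi_i)(\omega)}{n} \to \int_X g\,d\mu. \]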

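But

\[ \frac{\sum_{i=1}^n (g \circ \xi_i)(\omega)}{n} = \sum_{i=1}^n \frac{1}{n}\int_X g\,d\delta_{\xi_i(\omega)} = \int_X g\,d\mu_n^\omega, \]

so for all $\omega \in \Omega \setminus N_g$,

\[ \int_X g\,d\mu_n^\omega \to \int_X g\,d\mu. \]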

Because $X$ is a separable metrizable space, Theorem 1 tells us that there is a metric $d$ that induces the topology of $X$ and some countable dense subset $G$ of $U_d(X)$ such that a sequence $\nu_n$ in $\mathcal{P}(X)$ converges narrowly to $\nu$ if and only if

\[ \int_X g\,d\nu_n \to \int_X g\,d\nu, \qquad g \in G. \]

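Now let $N = \bigcup_{g \in G} N_g$, which satisfies $P(N) = 0$ because it is a countable union of sets of measure zero, and if $\omega \in \Omega \setminus N$ then for each $g \in G$,

\[ \int_X g\,d\mu_n^\omega \to \int_X g\,d\mu. \]

This implies that $\mu_n^\omega$ converges narrowly to $\mu$. That is, there is a set $N \in \mathcal{S}$ with $P(N) = 0$ such that for all $\omega \in \Omega \setminus N$, the sample distribution $\mu_n^\omega$ converges narrowly to the common pushforward measure $\mu$, proving the claim. ∎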
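The mechanism of the proof, testing the sample distributions against a fixed family of bounded uniformly continuous functions, can be sketched numerically. Everything specific below is an assumption made for illustration: $X = [0,1]$, $\mu$ the uniform distribution, a three-element family standing in for the countable set $G$, and a seeded pseudorandom generator standing in for one $\omega$. It is a sanity check along one path, not a verification of the theorem.

```python
import math
import random

# Assumption for illustration: X = [0, 1] and mu is the uniform distribution on X.
# A finite stand-in for the countable family G, paired with exact integrals against mu.
test_functions = [
    (lambda x: x, 0.5),
    (lambda x: math.cos(x), math.sin(1.0)),
    (lambda x: min(x, 0.5), 0.375),
]

def max_error(samples):
    """Largest gap, over the family, between the integrals against mu_n^omega and mu."""
    n = len(samples)
    return max(abs(sum(g(x) for x in samples) / n - exact)
               for g, exact in test_functions)

rng = random.Random(1)  # fixed seed, playing the role of one particular omega
path = [rng.random() for _ in range(100_000)]

# Errors for the sample distributions mu_n^omega at a few values of n.
errors = {n: max_error(path[:n]) for n in (100, 10_000, 100_000)}
for n, err in sorted(errors.items()):
    print(n, err)
```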