The narrow topology on the set of Borel probability measures on a metrizable space

Jordan Bell

March 29, 2015

1 Introduction

In this note we talk about the total variation metric and the narrow topology (also called the weak-* topology) on the set of Borel probability measures on a metrizable topological space. In the course of the note we define all the terms used in the previous sentence.

The reason for talking about metrizable spaces rather than metric spaces in this note is to make explicit that the objects we define do not depend on a metric but only on the topological properties of a space. On the other hand, the Prokhorov metric, which we discuss only in the final section of this note, depends on the metric, not merely on the fact that a topological space is metrizable.

2 Preliminaries

If $\mathfrak{M}$ is a $\sigma$ -algebra on a set $X$ , a positive measure is a function $\mu:\mathfrak{M}\to[0,\infty]$ such that $\{A_{i}\}$ being a countable subset of $\mathfrak{M}$ with pairwise disjoint members implies that

\mu\left(\bigcup_{i=1}^{\infty}A_{i}\right)=\sum_{i=1}^{\infty}\mu(A_{i}).

A positive measure $\mu$ is called finite if $X$ has finite measure, a probability measure if $\mu(X)=1$ , and $\sigma$ -finite if there is a countable collection $\{E_{i}\}\subset\mathfrak{M}$ each member of which has finite measure and that satisfies $X=\bigcup_{i=1}^{\infty}E_{i}$ .

Suppose that $\mathfrak{M}$ is a $\sigma$ -algebra on a set and that $E\in\mathfrak{M}$ . A countable collection $\{E_{i}\}\subset\mathfrak{M}$ whose members are pairwise disjoint and that satisfies $E=\bigcup_{i=1}^{\infty}E_{i}$ is called a partition of $E$ . A signed measure is a function $\mu:\mathfrak{M}\to\mathbb{R}$ such that if $E\in\mathfrak{M}$ , then

\mu(E)=\sum_{i=1}^{\infty}\mu(E_{i})

(1)

for every partition $\{E_{i}\}$ of $E$ .

Any statement we make about a measure on a $\sigma$ -algebra without specifying whether it is a positive measure or a signed measure applies to both classes.

3 Total variation

If $\mathfrak{M}$ is a $\sigma$ -algebra on a set $X$ and $\mu$ is a signed measure on $\mathfrak{M}$ , for $E\in\mathfrak{M}$ we define

|\mu|(E)=\sup\left\{\sum_{i=1}^{\infty}|\mu(E_{i})|:\textrm{$\{E_{i}\}$ is a partition of $E$}\right\}.

We call $|\mu|$ the total variation of $\mu$ . One proves that if $\mu$ is a signed measure on $\mathfrak{M}$ , then the total variation $|\mu|$ is a finite positive measure on $\mathfrak{M}$ (Walter Rudin, Real and Complex Analysis, third ed., pp. 117-118, Theorem 6.2 and Theorem 6.4).
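As a sanity check of this definition, consider a finite set $X$ with the power set as $\sigma$ -algebra. A signed measure is then determined by its point weights, and the supremum over partitions of $E$ is attained by the partition into singletons, so $|\mu|(E)=\sum_{x\in E}|\mu(\{x\})|$ . The following Python sketch verifies this by brute force over all partitions of a four-point set. The function names and the example weights are illustrative assumptions and are not part of the note.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def measure(weights, block):
    """Signed measure of a finite set, given its point weights."""
    return sum(weights[x] for x in block)


def total_variation(weights, E):
    """|mu|(E): supremum over partitions of E of the sum of |mu(E_i)|."""
    return max(
        sum(abs(measure(weights, block)) for block in partition)
        for partition in set_partitions(list(E))
    )


# Hypothetical point weights of a signed measure on {a, b, c, d}.
weights = {"a": 0.5, "b": -1.25, "c": 0.75, "d": -0.5}
E = ["a", "b", "c", "d"]

tv_brute = total_variation(weights, E)
tv_formula = sum(abs(weights[x]) for x in E)
print(tv_brute, tv_formula, abs(measure(weights, E)))
```

For these weights the brute-force supremum agrees with the sum of absolute point weights, and it is strictly larger than $|\mu(E)|$ , in line with the inequality $|\mu|(E)\geq|\mu(E)|$ .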

If $\mathfrak{M}$ is a $\sigma$ -algebra on a set, we denote by $ca(\mathfrak{M})$ the set of signed measures on $\mathfrak{M}$ ; the notation $ca$ stands for “countably additive”. For $\mu,\lambda\in ca(\mathfrak{M})$ and $a\in\mathbb{R}$ , we define

(a\mu+\lambda)(E)=a\mu(E)+\lambda(E),\qquad E\in\mathfrak{M}.

It is straightforward to check that $a\mu+\lambda\in ca(\mathfrak{M})$ , so that $ca(\mathfrak{M})$ is a vector space. We check that

\left\|\mu\right\|=|\mu|(X)

is a norm on $ca(\mathfrak{M})$ , called the total variation norm.

Theorem 1.

Suppose that $\mathfrak{M}$ is a $\sigma$ -algebra on a set. Then $ca(\mathfrak{M})$ is a real Banach space with the total variation norm.

Suppose that $\mu$ is a positive measure on a $\sigma$ -algebra $\mathfrak{M}$ and that $\lambda$ is a measure on $\mathfrak{M}$ . We say that $\lambda$ is absolutely continuous with respect to $\mu$ , denoted

\lambda\ll\mu,

if $E\in\mathfrak{M}$ and $\mu(E)=0$ imply that $\lambda(E)=0$ .

If $A\in\mathfrak{M}$ and $\lambda(E)=\lambda(A\cap E)$ for every $E\in\mathfrak{M}$ , we say that $\lambda$ is concentrated on $A$ . If $\lambda_{1},\lambda_{2}$ are measures on $\mathfrak{M}$ and there are disjoint sets $A_{1},A_{2}\in\mathfrak{M}$ such that $\lambda_{1}$ is concentrated on $A_{1}$ and $\lambda_{2}$ is concentrated on $A_{2}$ , we say that $\lambda_{1}$ and $\lambda_{2}$ are mutually singular and write

\lambda_{1}\perp\lambda_{2}.

The following theorem states both the Lebesgue decomposition and the Radon-Nikodym theorem (Walter Rudin, Real and Complex Analysis, third ed., p. 121, Theorem 6.10).

Theorem 2 (The Lebesgue decomposition and the Radon-Nikodym theorem).

Suppose that $\mathfrak{M}$ is a $\sigma$ -algebra on a set $X$ and that $\mu$ is a $\sigma$ -finite positive measure on $\mathfrak{M}$ . If $\lambda\in ca(\mathfrak{M})$ , then:

1. There is a unique pair $(\lambda_{a},\lambda_{s})$ of elements of $ca(\mathfrak{M})$ such that

$\lambda=\lambda_{a}+\lambda_{s},\qquad\lambda_{a}\ll\mu,\qquad\lambda_{s}\perp\mu.$

If $\lambda$ is a finite positive measure, then so are $\lambda_{a}$ and $\lambda_{s}$ .

2. There is a unique $h\in L^{1}(\mu)$ such that

$\lambda_{a}(E)=\int_{E}hd\mu,\qquad E\in\mathfrak{M}.$

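The function $h\in L^{1}(\mu)$ in the above theorem is called the Radon-Nikodym derivative of $\lambda_{a}$ with respect to $\mu$ . When $\lambda\ll\mu$ , so that $\lambda=\lambda_{a}$ , we call $h$ the Radon-Nikodym derivative of $\lambda$ with respect to $\mu$ and write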

d\lambda=hd\mu.
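On a finite set with the power set as $\sigma$ -algebra, both parts of Theorem 2 can be written down explicitly: the absolutely continuous part keeps the weights of $\lambda$ at points of positive $\mu$ -mass, the singular part keeps the weights at $\mu$ -null points, and $h$ is the pointwise ratio of weights. The Python sketch below illustrates this under those assumptions; the function name and the example weights are hypothetical, and the value of $h$ on the $\mu$ -null set is an arbitrary choice because $h$ is only determined $\mu$ -almost everywhere.

```python
def lebesgue_decomposition(lam, mu):
    """Return (lam_a, lam_s, h) for point weights on a finite set.

    `lam` holds the weights of a signed measure and `mu` the weights of a
    positive measure, both as dicts over the same finite set of points.
    """
    lam_a = {x: (lam[x] if mu[x] > 0 else 0.0) for x in lam}   # lam_a << mu
    lam_s = {x: (lam[x] if mu[x] == 0 else 0.0) for x in lam}  # lam_s is singular to mu
    # h = d(lam_a)/d(mu); set to 0 on the mu-null set (an arbitrary choice).
    h = {x: (lam[x] / mu[x] if mu[x] > 0 else 0.0) for x in lam}
    return lam_a, lam_s, h


# Hypothetical example: mu gives no mass to the point "c".
mu = {"a": 0.5, "b": 0.25, "c": 0.0}
lam = {"a": 1.0, "b": -0.5, "c": 2.0}

lam_a, lam_s, h = lebesgue_decomposition(lam, mu)
print(lam_a, lam_s, h)
```

Here $\lambda_{s}$ is concentrated on the single point where $\mu$ vanishes, while $\mu$ is concentrated on the other two points, which is the mutual singularity required in part 1.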

Suppose that $\mu$ is a signed measure on a $\sigma$ -algebra $\mathfrak{M}$ . Then $|\mu|$ is a finite positive measure on $\mathfrak{M}$ . For any $E\in\mathfrak{M}$ , $|\mu|(E)\geq|\mu(E)|$ , so $\mu$ is absolutely continuous with respect to $|\mu|$ and by the Radon-Nikodym theorem there is some $h\in L^{1}(|\mu|)$ such that

d\mu=hd|\mu|.

One proves (Walter Rudin, Real and Complex Analysis, third ed., p. 124, Theorem 6.12) that $|h|=1$ , that is, that for $|\mu|$ -almost all $x$ , $h(x)\in\{-1,1\}$ . Using this function $h$ , we define

\int fd\mu=\int fhd|\mu|.

The right-hand side is the integral of a real valued function with respect to a finite positive measure.

If one wishes to speak only about probability measures rather than signed measures, one might choose to use the expression in the following lemma to define a metric on the set of probability measures.

Lemma 3.

If $\mu$ and $\nu$ are probability measures on a $\sigma$ -algebra $\mathfrak{M}$ on a set $X$ , then

\left\|\mu-\nu\right\|=2\sup_{E\in\mathfrak{M}}|\mu(E)-\nu(E)|.

Proof.

Let $m=\mu+\nu$ . Then $\mu$ and $\nu$ are each absolutely continuous with respect to $m$ , so by the Radon-Nikodym theorem there are $f,g\in L^{1}(m)$ such that $d\mu=fdm$ and $d\nu=gdm$ . Let $\lambda=\mu-\nu$ , which satisfies $d\lambda=(f-g)dm$ , and write $h=f-g$ . For any $E\in\mathfrak{M}$ we have

\lambda(E)+\lambda(X\setminus E)=\lambda(X)=\mu(X)-\nu(X)=0,

hence $|\lambda(E)|=|\lambda(X\setminus E)|$ and therefore

\begin{aligned}
|\lambda(E)| &= \frac{1}{2}\left(|\lambda(E)|+|\lambda(X\setminus E)|\right)\\
&= \frac{1}{2}\left(\left|\int_{E}hdm\right|+\left|\int_{X\setminus E}hdm\right|\right)\\
&\leq \frac{1}{2}\left(\int_{E}|h|dm+\int_{X\setminus E}|h|dm\right)\\
&= \frac{1}{2}\int_{X}|h|dm.
\end{aligned}

This shows that

\sup_{E\in\mathfrak{M}}|\lambda(E)|\leq\frac{1}{2}\int_{X}|h|dm.

Let $E=\{x\in X:h(x)>0\}\in\mathfrak{M}$ . For this set $E$ ,

\begin{aligned}
|\lambda(E)| &= \frac{1}{2}\left(\left|\int_{E}hdm\right|+\left|\int_{X\setminus E}hdm\right|\right)\\
&= \frac{1}{2}\left(\int_{E}hdm-\int_{X\setminus E}hdm\right)\\
&= \frac{1}{2}\left(\int_{E}|h|dm+\int_{X\setminus E}|h|dm\right)\\
&= \frac{1}{2}\int_{X}|h|dm.
\end{aligned}

Therefore the previous inequality is in fact an equality:

\sup_{E\in\mathfrak{M}}|\lambda(E)|=\frac{1}{2}\int_{X}|h|dm.

But $d\lambda=hdm$ implies that

\left\|\lambda\right\|=\int_{X}|h|dm;

this is proved in Rudin (Walter Rudin, Real and Complex Analysis, third ed., p. 125, Theorem 6.13). Therefore

\left\|\mu-\nu\right\|=\left\|\lambda\right\|=2\sup_{E\in\mathfrak{M}}|\lambda(E)|=2\sup_{E\in\mathfrak{M}}|\mu(E)-\nu(E)|.

∎
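Lemma 3 can be checked numerically on a finite set, where the total variation norm of $\mu-\nu$ is the sum of the absolute differences of the point weights and the supremum runs over finitely many subsets. The following Python sketch does this by brute force. The helper names and the two example probability vectors are assumptions made for illustration; the weights are dyadic so that the floating-point arithmetic is exact.

```python
from itertools import chain, combinations


def all_subsets(points):
    """Iterate over every subset of `points`, including the empty set."""
    return chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )


def tv_norm_of_difference(mu, nu):
    """||mu - nu|| for measures on a finite set: sum of |mu({x}) - nu({x})|."""
    return sum(abs(mu[x] - nu[x]) for x in mu)


def sup_set_difference(mu, nu):
    """sup over all subsets E of |mu(E) - nu(E)|, by exhaustive search."""
    return max(
        abs(sum(mu[x] - nu[x] for x in E)) for E in all_subsets(list(mu))
    )


# Hypothetical probability measures on a four-point set.
mu = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 0.0}
nu = {"a": 0.125, "b": 0.375, "c": 0.0, "d": 0.5}

norm = tv_norm_of_difference(mu, nu)
sup = sup_set_difference(mu, nu)
print(norm, 2 * sup)
```

The supremum is attained at the set where $\mu$ has larger point weight than $\nu$ , which is the finite analogue of the set $\{x\in X:h(x)>0\}$ used in the proof.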

4 Borel measures

Suppose that $X$ is a topological space. The smallest $\sigma$ -algebra containing the topology of $X$ is called the Borel $\sigma$ -algebra of $X$ . The Borel $\sigma$ -algebra of $X$ is denoted $\mathscr{B}_{X}$ and elements of $\mathscr{B}_{X}$ are called Borel sets. A measure on $\mathscr{B}_{X}$ is called a Borel measure on $X$ .

Suppose that $\mu$ is a positive Borel measure on $X$ . We say that $\mu$ is inner regular if for every Borel set $E$ we have

\mu(E)=\sup\{\mu(C):\textrm{$C\subset E$ and $C$ is closed}\}

and outer regular if for every Borel set $E$ we have

\mu(E)=\inf\{\mu(V):\textrm{$E\subset V$ and $V$ is open}\}.

If $X$ is Hausdorff, we say that $\mu$ is tight if for every $\epsilon>0$ there is some compact subset $K$ of $X$ such that $\mu(X\setminus K)<\epsilon$ . It is straightforward to check that a finite positive Borel measure is tight if and only if for every Borel set $E$ ,

\mu(E)=\sup\{\mu(K):\textrm{$K\subset E$ and $K$ is compact}\}.

We specify that $X$ be Hausdorff to ensure that any compact subset of $X$ is closed and hence a Borel set.

It can be proved that if $X$ is a metrizable space, then any finite positive Borel measure on $X$ is inner regular and outer regular (Alexander S. Kechris, Classical Descriptive Set Theory, p. 107, Theorem 17.10), and that if $X$ is a Polish space then any finite positive Borel measure on $X$ is tight (Alexander S. Kechris, Classical Descriptive Set Theory, p. 107, Theorem 17.11). (A Polish space is a separable topological space whose topology is induced by a complete metric.)

For a topological space $X$ , we define $C(X)$ to be the set of continuous functions $X\to\mathbb{R}$ and $C_{b}(X)$ to be the set of bounded continuous functions $X\to\mathbb{R}$ . One checks that $C_{b}(X)$ with the supremum norm is a Banach space. We remind ourselves that if $X$ is compact then $C(X)=C_{b}(X)$ .

For a topological space $X$ , we write $ca(X)=ca(\mathscr{B}_{X})$ . For a compact metrizable space, the Riesz representation theorem states that $\Lambda:ca(X)\to C(X)^{*}$ defined by

\Lambda_{\mu}f=\int_{X}fd\mu,\qquad\mu\in ca(X),\quad f\in C(X),

is an isomorphism of real Banach spaces, where $C(X)^{*}$ has the operator norm (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 498, Corollary 14.15).

5 The narrow topology on $\mathscr{P}(X)$

Suppose that $X$ is a topological space. The narrow topology on $ca(X)$ is the coarsest topology such that for each $f\in C_{b}(X)$ , the map $\mu\mapsto\int_{X}fd\mu$ is continuous $ca(X)\to\mathbb{R}$ . The narrow topology is also called the weak-* topology.

Suppose that $X$ is a metrizable topological space. We denote by $\mathscr{P}(X)$ the set of Borel probability measures on $X$ . $\mathscr{P}(X)$ is a convex subset of $ca(X)$ . We shall be interested in the narrow topology on $\mathscr{P}(X)$ . This is the coarsest topology such that for each $f\in C_{b}(X)$ , the map $\mu\mapsto\int_{X}fd\mu$ is continuous $\mathscr{P}(X)\to\mathbb{R}$ , or equivalently, the subspace topology on $\mathscr{P}(X)$ inherited from $ca(X)$ with the narrow topology.

Lemma 4.

Suppose that $X$ is a metrizable topological space. Then $\mathscr{P}(X)$ is a closed subset of $ca(X)$ with the narrow topology.

It is a fact that if $\mu,\nu$ are Borel probability measures on a metrizable topological space $X$ , then $\mu=\nu$ if and only if $\int_{X}fd\mu=\int_{X}fd\nu$ for all $f\in C_{b}(X)$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 506, Theorem 15.1). Thus, the map $\mathscr{P}(X)\to\mathbb{R}^{C_{b}(X)}$ defined by $\mu\mapsto(f\mapsto\int_{X}fd\mu)$ is one-to-one.

A useful characterization of the narrow topology on $\mathscr{P}(X)$ is the following, called the portmanteau theorem (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 508, Theorem 15.3).

Theorem 5.

Suppose that $X$ is a metrizable topological space, that $\mu\in\mathscr{P}(X)$ , and that $\mu_{i}$ is a net in $\mathscr{P}(X)$ . The net $\mu_{i}$ narrowly converges to $\mu$ if and only if for every closed set $C$ in $X$ , $\limsup_{i}\mu_{i}(C)\leq\mu(C)$ .
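To see why the criterion in Theorem 5 is stated for closed sets and with an inequality, consider on $\mathbb{R}$ the point masses at $1/n$ , which converge narrowly to the point mass at $0$ . The Python sketch below evaluates these measures on a few sets. It is only an illustration under stated assumptions: sets are represented by membership predicates, the helper names are hypothetical, and the limit superior is approximated by a supremum over a late tail of the sequence.

```python
def dirac(x):
    """Point mass at x, evaluated on a set given by its membership predicate."""
    return lambda contains: 1.0 if contains(x) else 0.0


def tail_sup(measures, contains):
    """Supremum of the measures of one set over a tail of the sequence."""
    return max(m(contains) for m in measures)


mu = dirac(0.0)
# Late terms of the sequence of point masses at 1/n (tail chosen arbitrarily).
tail = [dirac(1.0 / n) for n in range(1000, 2000)]

closed_sets = {
    "{0}": lambda t: t == 0.0,
    "[0.5, 1]": lambda t: 0.5 <= t <= 1.0,
    "(-inf, 0]": lambda t: t <= 0.0,
}
open_interval = lambda t: 0.0 < t < 1.0  # the open set (0, 1)

# For each closed set C, the tail supremum of mu_n(C) is at most mu(C).
closed_ok = all(tail_sup(tail, C) <= mu(C) for C in closed_sets.values())
# For the open set (0, 1) the same inequality fails.
open_fails = tail_sup(tail, open_interval) > mu(open_interval)
print(closed_ok, open_fails)
```

For the closed set $\{0\}$ the inequality is strict, since every term gives it mass $0$ while the limit gives it mass $1$ ; for the open set $(0,1)$ the inequality fails, so closedness cannot be dropped.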

For $x\in X$ , define $\delta_{x}$ on $\mathscr{B}_{X}$ by

\delta_{x}(E)=\begin{cases}1&x\in E\\ 0&x\not\in E,\end{cases}\qquad E\in\mathscr{B}_{X}.

Then $\delta_{x}\in\mathscr{P}(X)$ . We prove that the mapping $x\mapsto\delta_{x}$ is an embedding of $X$ into $\mathscr{P}(X)$ with the narrow topology, and that if $X$ is separable then its image is closed (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 512, Theorem 15.8).

Theorem 6.

Suppose that $X$ is a metrizable topological space and assign $\mathscr{P}(X)$ the narrow topology. Then $x\mapsto\delta_{x}$ is a homeomorphism onto its image, and if $X$ is separable then the image is a closed subset of $\mathscr{P}(X)$ .

Proof.

It is a fact that because $X$ is metrizable, a net $x_{i}$ converges to $x$ if and only if $f(x_{i})\to f(x)$ for every $f\in C_{b}(X)$ ; for these to be equivalent, it suffices that $X$ be a completely regular topological space (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 50, Corollary 2.57). A topological space is called completely regular if any closed set and any point not in this closed set can be separated by a continuous function. The statement $f(x_{i})\to f(x)$ can be written

\int_{X}fd\delta_{x_{i}}\to\int_{X}fd\delta_{x}.

This being true for all $f\in C_{b}(X)$ means that $\delta_{x_{i}}\to\delta_{x}$ . This shows that $x\mapsto\delta_{x}$ is a homeomorphism between $X$ and $\{\delta_{x}:x\in X\}$ .

We remind ourselves that the support of a positive Borel measure $\mu$ on a topological space $(Y,\tau)$ is the set of all $y\in Y$ such that if $N$ is an open neighborhood of $y$ then $\mu(N)>0$ . The support of $\mu$ is denoted $\mathrm{supp}\,\mu$ , and can be written

\mathrm{supp}\,\mu=Y\setminus\bigcup_{V\in\tau,\mu(V)=0}V,

which makes it apparent that $\mathrm{supp}\,\mu$ is a closed set. One checks that $G\in\tau$ and $G\cap\mathrm{supp}\,\mu\neq\emptyset$ imply that $\mu(G\cap\mathrm{supp}\,\mu)>0$ ; $\mathrm{supp}\,\mu$ is in fact the largest closed set with this property. One can prove that if $Y$ is second-countable, then $\mu(Y\setminus\mathrm{supp}\,\mu)=0$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 442, Theorem 12.14).

Suppose that $X$ is separable and suppose that $\delta_{x_{i}}\to\mu$ in $\mathscr{P}(X)$ . Since $X$ is separable and metrizable it is second-countable, so $\mu(X\setminus\mathrm{supp}\,\mu)=0$ , which because $\mu$ is a finite measure can be written $\mu(\mathrm{supp}\,\mu)=\mu(X)=1$ . Let $x\in\mathrm{supp}\,\mu$ and let $N$ be an open neighborhood of $x$ . A metrizable space is completely regular, so there is some continuous function $f:X\to[0,1]$ such that $f(x)=1$ and $f(y)=0$ for all $y\in X\setminus N$ . This function $f$ belongs to $C_{b}(X)$ as its range is contained in $[0,1]$ . Let $G=\{w\in X:f(w)>\frac{1}{2}\}$ , which is an open set containing $x$ and hence $\mu(G)>0$ . Then,

\int_{X}fd\mu\geq\int_{G}fd\mu\geq\frac{1}{2}\int_{G}d\mu=\frac{1}{2}\mu(G)>0.

$\delta_{x_{i}}\to\mu$ implies

f(x_{i})=\int_{X}fd\delta_{x_{i}}\to\int_{X}fd\mu.

Thus $f(x_{i})$ has a positive limit, and so there is some $i_{0}$ such that $i\geq i_{0}$ implies that $f(x_{i})>0$ . But from the definition of $f$ , if $y\in X\setminus N$ then $f(y)=0$ , so $f(y)>0$ implies that $y\in N$ . Thus $x_{i}\in N$ for $i\geq i_{0}$ . This shows that for every open neighborhood $N$ of $x$ there is some $i_{0}$ such that $i\geq i_{0}$ implies that $x_{i}\in N$ , which is what it means to say that $x_{i}\to x$ . Hence, if $x\in\mathrm{supp}\,\mu$ then the net $x_{i}$ converges to $x$ . But $X$ is Hausdorff and we know that $\mathrm{supp}\,\mu\neq\emptyset$ , so $\mathrm{supp}\,\mu$ has exactly one element. Write $\mathrm{supp}\,\mu=\{x\}$ , and check that $\mu=\delta_{x}$ , which completes the proof. ∎
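The embedding in Theorem 6 is an embedding into the narrow topology and not into the topology of the total variation norm. As a small numerical illustration, the Python sketch below takes $X=\mathbb{R}$ and $x_{i}=1/i$ : the integrals $\int fd\delta_{x_{i}}=f(x_{i})$ approach $f(0)$ for a bounded continuous $f$ , while by Lemma 3 the total variation distance between two distinct point masses is always $2$ . The helper names and the choice $f=\arctan$ are assumptions made for illustration, and testing a single $f$ does not by itself establish narrow convergence.

```python
import math


def integrate_against_dirac(f, x):
    """Integral of f with respect to the point mass at x, which equals f(x)."""
    return f(x)


def tv_distance_between_diracs(x, y):
    """||delta_x - delta_y||: 0 if x == y, otherwise 2 (by Lemma 3 with E = {x})."""
    return 0.0 if x == y else 2.0


f = math.atan  # one bounded continuous test function (illustrative choice)
xs = [1.0 / i for i in range(1, 10001)]

# Gap between the integrals of f against delta_{x_i} and against delta_0.
gaps = [
    abs(integrate_against_dirac(f, x) - integrate_against_dirac(f, 0.0))
    for x in xs
]
tv = [tv_distance_between_diracs(x, 0.0) for x in xs]
print(gaps[0], gaps[-1], set(tv))
```

The gaps shrink toward $0$ while the total variation distances stay equal to $2$ , so $x\mapsto\delta_{x}$ is not continuous for the norm topology on $ca(X)$ unless $X$ is discrete.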

One can further prove that if $X$ is a separable metrizable topological space, then (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 512, Theorem 15.19)

\mathrm{ext}\,\mathscr{P}(X)=\{\delta_{x}:x\in X\}.

We remind ourselves that if $C$ is a subset of a vector space $V$ then $x\in C$ is called an extreme point of $C$ if $x=(1-t)y+tz$ with $y,z\in C$ and $0<t<1$ implies that $y=x$ and $z=x$ , and $\mathrm{ext}\,C$ denotes the set of extreme points of $C$ . Here the vector space is $ca(X)$ , of which $\mathscr{P}(X)$ is a convex subset.

We only speak about $\mathscr{P}(X)$ when $X$ is a metrizable topological space, but $\mathscr{P}(X)$ need not itself be metrizable. However, the following theorem shows that $X$ is compact if and only if $\mathscr{P}(X)$ is both compact and metrizable (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 513, Theorem 15.11).

Theorem 7.

Suppose that $X$ is a metrizable topological space. Then $X$ is compact if and only if $\mathscr{P}(X)$ with the narrow topology is compact and metrizable.

Proof.

Suppose that $X$ is compact. Then $C(X)$ is a separable Banach space. Denote the closed unit ball in $C(X)^{*}$ by $B$ and assign $B$ the subspace topology inherited from $C(X)^{*}$ with the weak-* topology. Because $C(X)$ is a normed space the Banach-Alaoglu theorem tells us that $B$ is compact, and because $C(X)$ is a separable normed space, $B$ is metrizable (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 239, Theorem 6.30). Earlier in this note we gave a statement of the Riesz representation theorem for the case of compact metrizable topological spaces: the map $\Lambda:ca(X)\to C(X)^{*}$ defined by

\Lambda_{\mu}f=\int_{X}fd\mu,\qquad\mu\in ca(X),\quad f\in C(X),

is an isomorphism of real Banach spaces. Check that $\Lambda\mathscr{P}(X)$ is a closed subset of $B$ and hence is compact, and therefore is a weak-* compact subset of $C(X)^{*}$ . Because $B$ is metrizable and $\Lambda\mathscr{P}(X)$ is a subset of $B$ , the subspace topology on $\Lambda\mathscr{P}(X)$ inherited from $B$ is metrizable, and this topology is the same as the subspace topology on $\Lambda\mathscr{P}(X)$ inherited from $C(X)^{*}$ with the weak-* topology. Check that $\Lambda$ is a homeomorphism when $ca(X)$ has the narrow topology and $C(X)^{*}$ has the weak-* topology. Then, because the subspace topology on $\Lambda\mathscr{P}(X)$ inherited from $C(X)^{*}$ with the weak-* topology is compact and metrizable, the subspace topology on $\mathscr{P}(X)$ inherited from $ca(X)$ with the narrow topology is compact and metrizable.

Suppose that $\mathscr{P}(X)$ with the narrow topology is compact and metrizable. Because $\mathscr{P}(X)$ is compact and metrizable it is separable (any compact metrizable topological space is separable), and so $\{\delta_{x}:x\in X\}$ with the subspace topology inherited from $\mathscr{P}(X)$ is separable. But Theorem 6 tells us that there is a homeomorphism between $X$ and $\{\delta_{x}:x\in X\}$ , so $X$ is separable too. We now know that $X$ is a separable metrizable space, so by Theorem 6 we get that $\{\delta_{x}:x\in X\}$ is a closed subset of $\mathscr{P}(X)$ , and is therefore compact. Finally, again using that $X$ and $\{\delta_{x}:x\in X\}$ are homeomorphic, we get that $X$ is compact. ∎

The following theorem shows the same type of result as above for separable spaces (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 513, Theorem 15.12).

Theorem 8.

Suppose that $X$ is a metrizable topological space. Then $X$ is separable if and only if $\mathscr{P}(X)$ with the narrow topology is both separable and metrizable.

Proof.

It is a fact that if $Y$ is a separable metrizable topological space then there is a compatible metric $d$ on $Y$ (a metric on a topological space is called compatible when it induces the topology) such that $(Y,d)$ is a totally bounded metric space: for every $\epsilon>0$ , there are $y_{1},\ldots,y_{n}\in Y$ such that for every $y\in Y$ there is some $j$ such that $d(y,y_{j})<\epsilon$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 91, Corollary 3.41).

Another fact we shall use is the following. If $(Y,d)$ is a metric space, $A$ is a nonempty subset of $Y$ , and $f:A\to\mathbb{R}$ is uniformly continuous, then there is a unique uniformly continuous $\hat{f}:\overline{A}\to\mathbb{R}$ whose restriction to $A$ is equal to $f$ , and this extension satisfies $\left\|\hat{f}\right\|_{\infty}=\left\|f\right\|_{\infty}$ , and finally $\widehat{f+g}=\hat{f}+\hat{g}$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 77, Lemma 3.11).

Suppose that $X$ is separable, let $d$ be a compatible metric such that $(X,d)$ is a totally bounded metric space, and let $(\hat{X},\hat{d})$ be the completion of $(X,d)$ . $(\hat{X},\hat{d})$ is compact because a metric space is totally bounded if and only if its completion is compact. Let $U_{d}(X)$ denote the set of bounded uniformly continuous functions $X\to\mathbb{R}$ . It is apparent that with the supremum norm $U_{d}(X)$ is a real normed space, and one proves that it is a closed subset of $C_{b}(X)$ and hence itself a Banach space. Define $\phi:U_{d}(X)\to U_{\hat{d}}(\hat{X})=C(\hat{X})$ by $\phi(f)=\hat{f}$ , where $\hat{f}$ is the extension of $f$ to $\overline{X}=\hat{X}$ explained in the previous paragraph; the equality $U_{\hat{d}}(\hat{X})=C(\hat{X})$ is because $\hat{X}$ is compact. What we have said so far makes it apparent that $\phi$ is a linear isometry. Because $\hat{X}$ is compact, the Banach space $C(\hat{X})$ is separable and hence the subspace $\phi(U_{d}(X))$ is separable. Then, because $\phi$ is an isometry, $U_{d}(X)$ is separable, say with a countable dense subset $D$ .

It is a fact that if $Y$ is a set and $\mathscr{F}$ is a family of functions $Y\to\mathbb{R}$ that separates points in $Y$ (if $x\neq y$ then there is some $f\in\mathscr{F}$ such that $f(x)\neq f(y)$ ) then the initial topology on $Y$ induced by $\mathscr{F}$ is equal to the subspace topology on $Y$ inherited from $\mathbb{R}^{\mathscr{F}}$ with the product topology (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 53, Lemma 2.63). One proves that $D$ being dense in $U_{d}(X)$ implies that it separates points in $\mathscr{P}(X)$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 506, Theorem 15.1), and hence that the initial topology on $\mathscr{P}(X)$ induced by $D$ is equal to the subspace topology on $\mathscr{P}(X)$ inherited from $\mathbb{R}^{D}$ . But because $D$ is countable and $\mathbb{R}$ is separable and metrizable, $\mathbb{R}^{D}$ with the product topology is separable and metrizable. As $\mathbb{R}^{D}$ is a separable metrizable topological space, the subspace topology on $\mathscr{P}(X)$ inherited from $\mathbb{R}^{D}$ is separable and metrizable. (If $Y$ is a metrizable topological space and $A$ is a subset of $Y$ , then the subspace topology on $A$ inherited from $Y$ is metrizable. If $Y$ is a separable metrizable topological space and $A$ is a subset of $Y$ , then the subspace topology on $A$ inherited from $Y$ is separable, but this need not be true if $Y$ is not metrizable.) This shows that the initial topology on $\mathscr{P}(X)$ induced by $D$ is separable and metrizable. But because $D$ is a dense subset of $U_{d}(X)$ , it can be proved that the initial topology on $\mathscr{P}(X)$ induced by $D$ is equal to the initial topology on $\mathscr{P}(X)$ induced by $C_{b}(X)$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 507, Theorem 15.2), and the initial topology on $\mathscr{P}(X)$ induced by $C_{b}(X)$ is precisely the narrow topology on $\mathscr{P}(X)$ . This shows that the narrow topology on $\mathscr{P}(X)$ is separable and metrizable.

Suppose that the narrow topology on $\mathscr{P}(X)$ is separable and metrizable. Then, $\{\delta_{x}:x\in X\}$ with the subspace topology inherited from $\mathscr{P}(X)$ is separable, and by Theorem 6 there is a homeomorphism $X\to\{\delta_{x}:x\in X\}$ , so $X$ is separable too. ∎

The same type of result is true for Polish spaces and Borel spaces: if $X$ is a metrizable topological space, then $X$ is a Polish space if and only if $\mathscr{P}(X)$ with the narrow topology is a Polish space, and $X$ is a Borel space if and only if $\mathscr{P}(X)$ with the narrow topology is a Borel space (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 515, Theorem 15.15 and p. 517, Theorem 15.18). A Borel space is a topological space that is homeomorphic to a Borel subset of a Polish space.

Suppose that $X$ is a Hausdorff space and that $\mathscr{F}$ is a family of finite positive Borel measures on $X$ . We say that $\mathscr{F}$ is tight if for every $\epsilon>0$ there is a compact subset $K$ of $X$ such that each $\mu\in\mathscr{F}$ satisfies $\mu(X\setminus K)<\epsilon$ , i.e. $\mu(K)>\mu(X)-\epsilon$ . We specified that $X$ be Hausdorff to ensure that any compact subset of $X$ is a Borel set. Earlier in this note we defined the notion of a tight measure, and saying that a measure $\mu$ is tight is equivalent to saying that the family $\{\mu\}$ is tight.

A subset $A$ of a topological space $Y$ is said to be relatively compact if its closure is a compact set. We prove in the following theorem that every tight family of Borel probability measures on a separable metrizable topological space $X$ is relatively compact in $\mathscr{P}(X)$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 518, Lemma 15.21).

Theorem 9.

Suppose that $X$ is a separable metrizable topological space. Then any tight subset of $\mathscr{P}(X)$ is relatively compact.

Proof.

The Urysohn metrization theorem states that for a Hausdorff space $Y$ the following three statements are equivalent: (i) $Y$ is separable and metrizable, (ii) $Y$ is second-countable and regular, (iii) $Y$ is a subspace of a topological space $\mathscr{H}$ that is homeomorphic with $[0,1]^{\mathbb{N}}$ . Thus, $X$ is a subspace of a topological space $\mathscr{H}$ that is homeomorphic with $[0,1]^{\mathbb{N}}$ . Except for the last paragraph of the proof, this is the only time where we invoke that $X$ is separable. In the course of the proof we shall use that $\mathscr{H}$ is compact and metrizable.

Suppose that $\mathscr{F}$ is a tight family of Borel probability measures on $X$ . If $\mathscr{F}=\emptyset$ then it is immediate that the claim is true. Otherwise, for each $n\in\mathbb{N}$ let $\mu_{n}\in\mathscr{F}$ . For each $m\in\mathbb{N}$ , because $\mathscr{F}$ is tight there is a compact subset $K_{m}$ of $X$ such that $\mu_{n}(K_{m})>1-\frac{1}{m}$ for all $n\in\mathbb{N}$ . Because $K_{m}$ is a compact subset of $X$ and $X$ has the subspace topology inherited from $\mathscr{H}$ , $K_{m}$ is a compact subset of $\mathscr{H}$ and hence is a Borel set in $\mathscr{H}$ . (This is worth pointing out, because $X$ need not be a Borel set in $\mathscr{H}$ .) Let

E=\bigcup_{m\in\mathbb{N}}K_{m},

which is a Borel set in $\mathscr{H}$ . We assign $E$ the subspace topology inherited from $\mathscr{H}$ . For $n,m\in\mathbb{N}$ , because $K_{m}\subset E$ ,

1\geq\mu_{n}(E)\geq\mu_{n}(K_{m})>1-\frac{1}{m}.

This is true for all $m$ , so $\mu_{n}(E)=1$ , and therefore $\mu_{n}\in\mathscr{P}(E)$ for all $n\in\mathbb{N}$ . For each $n\in\mathbb{N}$ , define $\lambda_{n}:\mathscr{B}_{\mathscr{H}}\to[0,1]$ by

\lambda_{n}(B)=\mu_{n}(E\cap B),\qquad B\in\mathscr{B}_{\mathscr{H}}.

One checks that $\lambda_{n}$ is a positive measure. Because $\lambda_{n}(\mathscr{H})=\mu_{n}(E)=1$ , we have $\lambda_{n}\in\mathscr{P}(\mathscr{H})$ . The topological space $\mathscr{H}$ is compact and metrizable, so by Theorem 7, the space $\mathscr{P}(\mathscr{H})$ with the narrow topology is compact and metrizable, and since $\lambda_{n}$ is a sequence in $\mathscr{P}(\mathscr{H})$ , it has a subsequence $\lambda_{a(n)}$ that converges to some $\lambda\in\mathscr{P}(\mathscr{H})$ . For each $m\in\mathbb{N}$ , $K_{m}$ is a compact subset of $\mathscr{H}$ and hence closed, so using $\lambda_{a(n)}\to\lambda$ in $\mathscr{P}(\mathscr{H})$ we have by Theorem 5 that

\begin{aligned}
\lambda(E) &\geq \lambda(K_{m})\\
&\geq \limsup_{n\to\infty}\lambda_{a(n)}(K_{m})\\
&= \limsup_{n\to\infty}\mu_{a(n)}(E\cap K_{m})\\
&= \limsup_{n\to\infty}\mu_{a(n)}(K_{m})\\
&\geq \limsup_{n\to\infty}\left(1-\frac{1}{m}\right)\\
&= 1-\frac{1}{m}.
\end{aligned}

This is true for all $m$ , so $\lambda(E)=1$ , and therefore $\lambda\in\mathscr{P}(E)$ .

It is a fact that if $Z$ is a Borel subset of a metrizable topological space $Y$ , then the narrow topology on $\mathscr{P}(Z)$ is equal to the subspace topology on $\mathscr{P}(Z)$ inherited from $\mathscr{P}(Y)$ with the narrow topology (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 510, Lemma 15.4). As $E$ is a Borel set in $\mathscr{H}$ , the narrow topology on $\mathscr{P}(E)$ is equal to the subspace topology on $\mathscr{P}(E)$ inherited from $\mathscr{P}(\mathscr{H})$ with the narrow topology. We know that $\lambda_{a(n)}\to\lambda$ in $\mathscr{P}(\mathscr{H})$ , and since the members of the sequence and the limit $\lambda$ belong to $\mathscr{P}(E)$ , $\lambda_{a(n)}\to\lambda$ in $\mathscr{P}(E)$ . ( $\lambda_{n}\in\mathscr{P}(E)$ because $\lambda_{n}(E)=1$ for each $n$ .) But as elements of $\mathscr{P}(E)$ , $\lambda_{n}=\mu_{n}$ : for any $B\in\mathscr{B}_{E}$ , $\lambda_{n}(B)=\mu_{n}(E\cap B)=\mu_{n}(B)$ . So $\mu_{a(n)}\to\lambda$ in $\mathscr{P}(E)$ . Because $E$ is a Borel set in $X$ , the narrow topology on $\mathscr{P}(E)$ is equal to the subspace topology on $\mathscr{P}(E)$ inherited from $\mathscr{P}(X)$ with the narrow topology. We have established that $\mu_{a(n)}\to\lambda$ in $\mathscr{P}(E)$ with the narrow topology, and therefore $\mu_{a(n)}\to\lambda$ in $\mathscr{P}(X)$ with the narrow topology. (If $A$ is a subset of a topological space $Y$ and a sequence converges in $A$ with the subspace topology, then the sequence converges in $Y$ to the same limit.)

Because $X$ is separable, by Theorem 8 we have that $\mathscr{P}(X)$ is metrizable (and separable, but we don’t care about that here). We have established that any sequence of elements of $\mathscr{F}$ has a subsequence that converges to some element of $\mathscr{P}(X)$ , and because $\mathscr{P}(X)$ is metrizable this suffices to show that $\mathscr{F}$ is relatively compact, completing the proof. ∎

6 The Prokhorov metric on $\mathscr{P}(X)$

Let $(X,d)$ be a metric space. If $A\subset X$ and $x\in X$ , we define

d(x,A)=\inf_{a\in A}d(x,a);

if $A=\emptyset$ then $d(x,A)=\infty$ . For $A\subset X$ and $\alpha>0$ , we define

A_{\alpha}=\{x\in X:d(x,A)<\alpha\}.

One checks that for any $A\subset X$ , $\lim_{\epsilon\to 0}A_{\epsilon}=\overline{A}$ , the closure of $A$ .

For $\mu,\nu\in\mathscr{P}(X)$ we define $d_{P}(\mu,\nu)$ to be

\inf\{\alpha>0:\textrm{for all $E\in\mathscr{B}_{X}$, $\mu(E)\leq\nu(E_{\alpha})+\alpha$ and $\nu(E)\leq\mu(E_{\alpha})+\alpha$}\}.

(2)

One proves that $d_{P}$ is a metric on $\mathscr{P}(X)$ , called the Prokhorov metric. It is a bounded metric: for any $E\in\mathscr{B}_{X}$ , $\mu(E),\nu(E)\leq 1$ , so $\mu(E)\leq\nu(E_{1})+1$ and $\nu(E)\leq\mu(E_{1})+1$ , hence $d_{P}(\mu,\nu)\leq 1$ .
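On a finite metric space the definition (2) can be evaluated by brute force, because there are only finitely many sets $E$ and the set of admissible $\alpha$ is closed under taking larger values. The Python sketch below approximates the infimum by bisection on $[0,1]$ . It is a minimal sketch under stated assumptions and not an efficient algorithm: the function names, the tolerance `tol`, the iteration count, and the three-point example space are all illustrative choices, and the returned value is accurate only up to roughly that tolerance.

```python
from itertools import chain, combinations


def all_subsets(points):
    """Iterate over every subset of `points`, including the empty set."""
    return chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )


def enlargement(E, alpha, points, dist):
    """E_alpha = {x : d(x, E) < alpha}; empty when E is empty."""
    return [x for x in points if any(dist(x, a) < alpha for a in E)]


def satisfies(alpha, mu, nu, points, dist, tol=1e-12):
    """Check the two inequalities in (2) for every subset E (up to `tol`)."""
    for E in all_subsets(points):
        E_alpha = enlargement(E, alpha, points, dist)
        mu_E, nu_E = sum(mu[x] for x in E), sum(nu[x] for x in E)
        mu_Ea, nu_Ea = sum(mu[x] for x in E_alpha), sum(nu[x] for x in E_alpha)
        if mu_E > nu_Ea + alpha + tol or nu_E > mu_Ea + alpha + tol:
            return False
    return True


def prokhorov(mu, nu, points, dist, iterations=50):
    """Approximate d_P(mu, nu) by bisection; alpha = 1 is always admissible."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if satisfies(mid, mu, nu, points, dist):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical three-point subspace of the real line with the usual metric.
points = [0.0, 0.3, 2.0]
dist = lambda x, y: abs(x - y)
delta = lambda p: {x: (1.0 if x == p else 0.0) for x in points}

print(prokhorov(delta(0.0), delta(0.3), points, dist))
print(prokhorov(delta(0.0), delta(2.0), points, dist))
```

For point masses the computed values match $d_{P}(\delta_{x},\delta_{y})=\min\{d(x,y),1\}$ , which one can also read off from (2) by taking $E=\{x\}$ ; in particular the bound $d_{P}\leq 1$ is attained.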

Theorem 10.

If $\mu,\mu_{1},\mu_{2},\ldots\in\mathscr{P}(X)$ and $d_{P}(\mu_{i},\mu)\to 0$ , then $\mu_{i}$ converges narrowly to $\mu$ .

Proof.

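For each $i$ , $d_{P}(\mu_{i},\mu)$ is defined in (2) as an infimum. The set of which it is the infimum contains every number larger than one of its elements, because $E_{\alpha}\subset E_{\alpha^{\prime}}$ when $\alpha\leq\alpha^{\prime}$ . We choose a sequence $\alpha_{i}>0$ such that (i) $\alpha_{i}$ is an element of the set of which $d_{P}(\mu_{i},\mu)$ is the infimum, (ii) $\alpha_{i+1}<\alpha_{i}$ for each $i$ , and (iii) $\alpha_{i}\to 0$ ; for example, $\alpha_{i}=\sup_{j\geq i}d_{P}(\mu_{j},\mu)+\frac{1}{i}$ satisfies all three conditions because $d_{P}(\mu_{i},\mu)\to 0$ . Because $\alpha_{i}$ decreases to $0$ , the sets $E_{\alpha_{i}}$ decrease to $\overline{E}$ , and as $\mu$ is finite, $\mu(E_{\alpha_{i}})\to\mu(\overline{E})$ . Then for any $E\in\mathscr{B}_{X}$ ,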

\begin{aligned}
\limsup_{i\to\infty}\mu_{i}(E)&\leq\limsup_{i\to\infty}\left(\mu(E_{\alpha_{i}})+\alpha_{i}\right)\\
&\leq\limsup_{i\to\infty}\mu(E_{\alpha_{i}})+\limsup_{i\to\infty}\alpha_{i}\\
&=\limsup_{i\to\infty}\mu(E_{\alpha_{i}})\\
&=\mu(\overline{E}).
\end{aligned}

Hence if $C$ is a closed subset of $X$ , then, as $C\in\mathscr{B}_{X}$ and $\overline{C}=C$ , the above tells us

\limsup_{i\to\infty}\mu_{i}(C)\leq\mu(C),

which shows by Theorem 5 that $\mu_{i}$ converges narrowly to $\mu$ . ∎
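
The inequality for closed sets can be strict, as the following sketch indicates. It uses Dirac measures, with notation assumed here only for illustration: $\delta_{x}(E)=1$ if $x\in E$ and $\delta_{x}(E)=0$ otherwise. Suppose that $x_{i}\to x$ in $X$ with $x_{i}\neq x$ for all $i$ . If $\alpha>d(x_{i},x)$ and $E\in\mathscr{B}_{X}$ , then $x_{i}\in E$ implies $x\in E_{\alpha}$ and $x\in E$ implies $x_{i}\in E_{\alpha}$ , so both inequalities in (2) hold and $d_{P}(\delta_{x_{i}},\delta_{x})\leq d(x_{i},x)\to 0$ . Theorem 10 then gives that $\delta_{x_{i}}$ converges narrowly to $\delta_{x}$ , while for the closed set $C=\{x\}$ ,

\limsup_{i\to\infty}\delta_{x_{i}}(C)=0<1=\delta_{x}(C).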

It follows from the above theorem that the topology on $\mathscr{P}(X)$ induced by the Prokhorov metric is finer than the narrow topology on $\mathscr{P}(X)$ .

It can be proved that if $X$ is a separable metric space then the Prokhorov metric on $\mathscr{P}(X)$ induces the narrow topology. This is proved in notes by van Gaans. (Onno van Gaans, Probability measures on metric spaces, http://www.math.leidenuniv.nl/~vangaans/jancol1.pdf) Van Gaans also proves Prokhorov’s theorem in his notes, which states that if $X$ is a Polish space, then a subset of $\mathscr{P}(X)$ is tight if and only if it is relatively compact. We proved one of these implications in Theorem 9 (which is the implication that takes more work to prove), assuming only that $X$ is a separable metrizable space rather than a Polish space. Another result proved in those notes is that if $X$ is a separable complete metric space, then so is $\mathscr{P}(X)$ with the Prokhorov metric. It is worth reminding ourselves that this is not immediate from the fact mentioned earlier that if $X$ is Polish then $\mathscr{P}(X)$ with the narrow topology is Polish: even though in this case the narrow topology on $\mathscr{P}(X)$ is induced by the Prokhorov metric, a metric that induces a completely metrizable topology need not itself be a complete metric.
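
To illustrate the last point with a standard example that has nothing specifically to do with $\mathscr{P}(X)$ : the open interval $(0,1)$ with the metric $d(s,t)=|s-t|$ is homeomorphic to $\mathbb{R}$ , for instance by $t\mapsto\tan\left(\pi\left(t-\tfrac{1}{2}\right)\right)$ , so its topology is completely metrizable. One complete metric inducing this topology is

\rho(s,t)=\left|\tan\left(\pi\left(s-\tfrac{1}{2}\right)\right)-\tan\left(\pi\left(t-\tfrac{1}{2}\right)\right)\right|.

But $d$ itself is not complete: the sequence $\frac{1}{n+1}$ is Cauchy for $d$ and has no limit in $(0,1)$ .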

7 Supports of positive Borel measures

If $X$ is a topological space and $\mu$ is a positive Borel measure on $X$ , the support of $\mu$ is a subset $S$ of $X$ such that (i) $S$ is closed, (ii) $\mu(X\setminus S)=0$ , and (iii) if $U$ is an open set with $U\cap S\neq\emptyset$ then $\mu(U\cap S)>0$ . It is straightforward to check that $\mu$ has at most one support. We now prove conditions under which a support exists. (See Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 442, Theorem 12.14.)
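
For instance, on $X=\mathbb{R}$ take the Dirac measure $\delta_{0}$ , where $\delta_{0}(E)=1$ if $0\in E$ and $\delta_{0}(E)=0$ otherwise (notation used for this example only). The set $S=\{0\}$ is closed, $\delta_{0}(\mathbb{R}\setminus S)=0$ , and if $U$ is open with $U\cap S\neq\emptyset$ then $U\cap S=\{0\}$ , so $\delta_{0}(U\cap S)=1>0$ ; thus $\{0\}$ is the support of $\delta_{0}$ . By contrast, the closed set $[-1,1]$ satisfies (i) and (ii) but not (iii), since the open set $U=\left(\tfrac{1}{2},1\right)$ meets it and

\delta_{0}\left(U\cap[-1,1]\right)=\delta_{0}\left(\left(\tfrac{1}{2},1\right)\right)=0.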

Theorem 11.

Suppose that $X$ is a topological space and that $\mu$ is a positive Borel measure on $X$ . If $X$ is second-countable or $\mu$ is a tight measure, then $\mu$ has a support.

Proof.

Assume that $(X,\tau)$ is second-countable, and let $\beta$ be a countable base for $X$ . Let

G=\bigcup_{V\in\beta,\mu(V)=0}V.

$G$ is an open set, and because $\beta$ is countable,

\mu(G)\leq\sum_{V\in\beta,\mu(V)=0}\mu(V)=0.

Let $S=X\setminus G$ , which is closed and whose complement $G$ has measure $0$ . Suppose that $U$ is open, that $U\cap S\neq\emptyset$ , and that $\mu(U\cap S)=0$ . Then

\mu(U)=\mu(U\cap S)+\mu(U\cap G)=\mu(U\cap G)\leq\mu(G)=0.

On the other hand, there are $V_{1},V_{2},\ldots\in\beta$ such that $U=\bigcup_{i}V_{i}$ , and thus for each $i$ , $\mu(V_{i})\leq\mu(U)=0$ , which implies that each $V_{i}\subset G$ and hence $U\subset G$ . But this contradicts that $U\cap S\neq\emptyset$ , hence $\mu(U\cap S)>0$ . Therefore $S$ is the support of $\mu$ .

Assume that $\mu$ is tight and let

G=\bigcup_{V\in\tau,\mu(V)=0}V.

Then $G$ is open and $S=X\setminus G$ is closed. If $K$ is a compact subset of $G$ , there are $V_{1},\ldots,V_{n}\in\tau$ with $\mu(V_{i})=0$ such that $K\subset\bigcup_{i=1}^{n}V_{i}$ , and so $\mu(K)\leq\sum_{i=1}^{n}\mu(V_{i})=0$ . Because $\mu$ is tight,

\mu(X\setminus S)=\mu(G)=\sup\{\mu(K):\textrm{$K$ is compact and $K\subset G$}\},

and this supremum is equal to $0$ . If $U\in\tau$ and $U\cap S\neq\emptyset$ , then because $U$ is not contained in $G$ , $\mu(U)>0$ , and as $\mu(U\cap G)\leq\mu(G)=0$ we get $\mu(U\cap S)=\mu(U)>0$ . This shows that $S$ is the support of $\mu$ . ∎
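
The construction in the first part of the proof can be traced in a concrete case. The following is only a sketch, under assumptions made for illustration: $X=\mathbb{R}$ , $\beta$ is the collection of open intervals with rational endpoints, $\ell$ denotes Lebesgue measure, $\delta_{0}$ is the Dirac measure at $0$ , and $\mu(E)=\delta_{0}(E)+\ell(E\cap[1,2])$ . An interval $(a,b)\in\beta$ has $\mu((a,b))=0$ exactly when $0\notin(a,b)$ and $(a,b)\cap[1,2]=\emptyset$ , that is, when $0\notin(a,b)$ and either $b\leq 1$ or $a\geq 2$ . Taking the union of these intervals gives

G=(-\infty,0)\cup(0,1)\cup(2,\infty),\qquad S=X\setminus G=\{0\}\cup[1,2].

One checks directly that $S$ is closed, that $\mu(X\setminus S)=0$ , and that every open set meeting $S$ either contains $0$ or contains a subinterval of $[1,2]$ of positive length, so $S$ satisfies (i), (ii) and (iii).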

\begin{aligned}
\|\lambda(E)\|&=\frac{1}{2}(\|\lambda(E)\|+\|\lambda(X\setminus E)\|)\\
&=\frac{1}{2}\left(\left\|\int_{E}hdm\right\|+\left\|\int_{X\setminus E}hdm\right\|\right)\\
&\leq\frac{1}{2}\left(\int_{E}\|h\|dm+\int_{X\setminus E}\|h\|dm\right)\\
&=\frac{1}{2}\int_{X}\|h\|dm.
\end{aligned}