Martingales, Lévy’s continuity theorem, and the martingale central limit theorem

Jordan Bell
May 29, 2015

1 Introduction

In this note, any statement we make about filtrations and martingales is about filtrations and martingales indexed by the positive integers, rather than the nonnegative real numbers.

We take

 $\inf\emptyset=\infty,$

and for $m>n$, we take

 $\sum_{k=m}^{n}=0.$

(Defined rightly, these are not merely convenient ad hoc definitions.)

2 Conditional expectation

Let $(\Omega,\mathscr{A},P)$ be a probability space and let $\mathscr{B}$ be a sub-$\sigma$-algebra of $\mathscr{A}$. For each $f\in L^{1}(\Omega,\mathscr{A},P)$, there is some $g:\Omega\to\mathbb{R}$ such that (i) $g$ is $\mathscr{B}$-measurable and (ii) for each $B\in\mathscr{B}$, $\int_{B}gdP=\int_{B}fdP$, and if $h:\Omega\to\mathbb{R}$ satisfies (i) and (ii) then $h(\omega)=g(\omega)$ for almost all $\omega\in\Omega$.11 1 Manfred Einsiedler and Thomas Ward, Ergodic Theory: with a view towards Number Theory, p. 121, Theorem 5.1. We denote some $g:\Omega\to\mathbb{R}$ satisfying (i) and (ii) by $E(f|\mathscr{B})$, called the conditional expectation of $f$ with respect to $\mathscr{B}$. In other words, $E(f|\mathscr{B})$ is the unique element of $L^{1}(\Omega,\mathscr{B},P)$ such that for each $B\in\mathscr{B}$,

 $\int_{B}E(f|\mathscr{B})dP=\int_{B}fdP.$

The map $f\mapsto E(f|\mathscr{B})$ satisfies the following:

1.

$f\mapsto E(f|\mathscr{B})$ is a positive linear operator $L^{1}(\Omega,\mathscr{A},P)\to L^{1}(\Omega,\mathscr{B},P)$ with norm $1$.

2.

If $f\in L^{1}(\Omega,\mathscr{A},P)$ and $g\in L^{\infty}(\Omega,\mathscr{B},P)$, then for almost all $\omega\in\Omega$,

 $E(gf|\mathscr{B})(\omega)=g(\omega)E(f|\mathscr{B})(\omega).$
3.

If $\mathscr{C}$ is a sub-$\sigma$-algebra of $\mathscr{B}$, then for almost all $\omega\in\Omega$,

 $E(E(f|\mathscr{B})|\mathscr{C})(\omega)=E(f|\mathscr{C})(\omega).$
4.

If $f\in L^{1}(\Omega,\mathscr{B},P)$ then for almost all $\omega\in\Omega$,

 $E(f|\mathscr{B})(\omega)=f(\omega).$
5.

If $f\in L^{1}(\Omega,\mathscr{A},P)$, then for almost all $\omega\in\Omega$,

 $|E(f|\mathscr{B})(\omega)|\leq E(|f||\mathscr{B})(\omega).$
6.

If $f\in L^{1}(\Omega,\mathscr{A},P)$ is independent of $\mathscr{B}$, then for almost all $\omega\in\Omega$,

 $E(f|\mathscr{B})(\omega)=E(f).$
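When $\mathscr{B}$ is generated by a finite partition of $\Omega$, $E(f|\mathscr{B})$ is simply the blockwise average of $f$, and the defining property (ii) can be checked by hand. The following Python sketch (a toy computation; the names `cond_exp` and `blocks` are ours, not standard) does this for two fair coin flips, with $\mathscr{B}$ generated by the first flip:

```python
from itertools import product

# Toy space: Omega = {H,T}^2 with the uniform measure P.
# B is the sigma-algebra generated by the first coin, i.e. by the
# partition {HH, HT} | {TH, TT}; E(f|B) is constant on each block.
Omega = list(product("HT", repeat=2))
P = {w: 0.25 for w in Omega}

def cond_exp(f, blocks):
    # E(f|B) for B generated by the partition `blocks`: on each block,
    # the P-weighted average of f over that block.
    g = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(f(w) * P[w] for w in B) / pB
        for w in B:
            g[w] = avg
    return g

f = lambda w: w.count("H")  # number of heads
blocks = [[w for w in Omega if w[0] == "H"],
          [w for w in Omega if w[0] == "T"]]
g = cond_exp(f, blocks)

# Defining property (ii): the integrals of g and f over each B agree.
for B in blocks:
    assert sum(g[w] * P[w] for w in B) == sum(f(w) * P[w] for w in B)
print(g[("H", "H")], g[("T", "T")])  # 1.5 0.5
```

If the first coin is heads, the block average of the number of heads is $1+\frac{1}{2}$; otherwise it is $\frac{1}{2}$.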

3 Filtrations

A filtration of a $\sigma$-algebra $\mathscr{A}$ is a sequence $\mathscr{F}_{n}$, $n\geq 1$, of sub-$\sigma$-algebras of $\mathscr{A}$ such that $\mathscr{F}_{m}\subset\mathscr{F}_{n}$ if $m\leq n$. We set $\mathscr{F}_{0}=\{\emptyset,\Omega\}$.

A sequence of random variables $\xi_{n}:(\Omega,\mathscr{A},P)\to\mathbb{R}$ is said to be adapted to the filtration $\mathscr{F}_{n}$ if for each $n$, $\xi_{n}$ is $\mathscr{F}_{n}$-measurable.

Let $\xi_{n}:(\Omega,\mathscr{A},P)\to\mathbb{R}$, $n\geq 1$, be a sequence of random variables. The natural filtration of $\mathscr{A}$ corresponding to $\xi_{n}$ is

 $\mathscr{F}_{n}=\sigma(\xi_{1},\ldots,\xi_{n}).$

It is apparent that $\mathscr{F}_{n}$ is a filtration and that the sequence $\xi_{n}$ is adapted to $\mathscr{F}_{n}$.

4 Martingales

Let $\mathscr{F}_{n}$ be a filtration of a $\sigma$-algebra $\mathscr{A}$ and let $\xi_{n}:(\Omega,\mathscr{A},P)\to\mathbb{R}$ be a sequence of random variables. We say that $\xi_{n}$ is a martingale with respect to $\mathscr{F}_{n}$ if (i) the sequence $\xi_{n}$ is adapted to the filtration $\mathscr{F}_{n}$, (ii) for each $n$, $\xi_{n}\in L^{1}(P)$, and (iii) for each $n$, for almost all $\omega\in\Omega$,

 $E(\xi_{n+1}|\mathscr{F}_{n})(\omega)=\xi_{n}(\omega).$

In particular,

 $E(\xi_{1})=E(\xi_{2})=\cdots,$

i.e.

 $E(\xi_{m})=E(\xi_{n}),\qquad m\leq n.$

We say that $\xi_{n}$ is a submartingale with respect to $\mathscr{F}_{n}$ if (i) and (ii) above are true, and if for each $n$, for almost all $\omega\in\Omega$,

 $E(\xi_{n+1}|\mathscr{F}_{n})(\omega)\geq\xi_{n}(\omega).$

In particular,

 $E(\xi_{1})\leq E(\xi_{2})\leq\cdots,$

i.e.

 $E(\xi_{m})\leq E(\xi_{n}),\qquad m\leq n.$

We say that $\xi_{n}$ is a supermartingale with respect to $\mathscr{F}_{n}$ if (i) and (ii) above are true, and if for each $n$, for almost all $\omega\in\Omega$,

 $\xi_{n}(\omega)\geq E(\xi_{n+1}|\mathscr{F}_{n})(\omega).$

In particular,

 $E(\xi_{1})\geq E(\xi_{2})\geq\cdots,$

i.e.

 $E(\xi_{m})\geq E(\xi_{n}),\qquad m\leq n.$

If we speak about a martingale without specifying a filtration, we mean a martingale with respect to the natural filtration corresponding to the sequence of random variables.
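For a concrete instance of the definition, take $\eta_{k}$ independent with $P(\eta_{k}=\pm 1)=\frac{1}{2}$ and $\xi_{n}=\eta_{1}+\cdots+\eta_{n}$. On the finite space $\{-1,1\}^{N}$, an atom of $\mathscr{F}_{n}=\sigma(\eta_{1},\ldots,\eta_{n})$ is a fixed prefix of signs, and $E(\xi_{n+1}|\mathscr{F}_{n})$ is the average of $\xi_{n+1}$ over each atom, so the martingale property can be verified exactly (a Python sketch, no libraries beyond the standard library assumed):

```python
from itertools import product

# Exact check that the simple random walk xi_n = eta_1 + ... + eta_n
# (fair independent +-1 steps) satisfies E(xi_{n+1} | F_n) = xi_n:
# an atom of F_n = sigma(eta_1,...,eta_n) is a fixed prefix of signs,
# and conditioning on it means averaging over all completions.
N = 4
Omega = list(product([1, -1], repeat=N))  # uniform: P(w) = 2**-N

def xi(n, w):
    return sum(w[:n])

for n in range(1, N):
    for prefix in product([1, -1], repeat=n):
        atom = [w for w in Omega if w[:n] == prefix]
        cond = sum(xi(n + 1, w) for w in atom) / len(atom)
        assert cond == xi(n, atom[0])  # martingale property, exactly
print("martingale property verified on {-1,1}^%d" % N)
```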

5 Stopping times

Let $\mathscr{F}_{n}$ be a filtration of a $\sigma$-algebra $\mathscr{A}$. A stopping time with respect to $\mathscr{F}_{n}$ is a function $\tau:\Omega\to\{1,2,\ldots\}\cup\{\infty\}$ such that for each $n\geq 1$,

 $\{\omega\in\Omega:\tau(\omega)=n\}\in\mathscr{F}_{n}.$

It is straightforward to check that a function $\tau:\Omega\to\{1,2,\ldots\}\cup\{\infty\}$ is a stopping time with respect to $\mathscr{F}_{n}$ if and only if for each $n\geq 1$,

 $\{\omega\in\Omega:\tau(\omega)\leq n\}\in\mathscr{F}_{n}.$

The following lemma shows that the time of first entry into a Borel subset of $\mathbb{R}$ of a sequence of random variables adapted to a filtration is a stopping time.22 2 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 55, Exercise 3.9.

Lemma 1.

Let $\xi_{n}$ be a sequence of random variables adapted to a filtration $\mathscr{F}_{n}$ and let $B\in\mathscr{B}_{\mathbb{R}}$. Then

 $\tau(\omega)=\inf\{n\geq 1:\xi_{n}(\omega)\in B\}$

is a stopping time with respect to $\mathscr{F}_{n}$.

Proof.

Let $n\geq 1$. Then

 $\displaystyle\{\omega\in\Omega:\tau(\omega)=n\}$ $\displaystyle=\left(\bigcap_{k=1}^{n-1}\{\omega\in\Omega:\xi_{k}(\omega)\not\in B\}\right)\cap\{\omega\in\Omega:\xi_{n}(\omega)\in B\}$ $\displaystyle=\left(\bigcap_{k=1}^{n-1}A_{k}^{c}\right)\cap A_{n},$

where

 $A_{k}=\{\omega\in\Omega:\xi_{k}(\omega)\in B\}.$

Because the sequence $\xi_{k}$ is adapted to the filtration $\mathscr{F}_{k}$, $A_{k}^{c}\in\mathscr{F}_{k}$ and $A_{n}\in\mathscr{F}_{n}$, and because $\mathscr{F}_{k}$ is a filtration, the right-hand side of the above belongs to $\mathscr{F}_{n}$. ∎
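Along a single sample path, the first-entry time of Lemma 1 is computed by scanning the path, with the convention $\inf\emptyset=\infty$ from the introduction. A minimal sketch (the function name `first_entry` is ours):

```python
import math

# First-entry time of Lemma 1 along one sample path, with the
# convention inf(emptyset) = infinity from the introduction.
def first_entry(path, B):
    # path[n-1] plays the role of xi_n(omega); B is a subset of R.
    for n, x in enumerate(path, start=1):
        if x in B:
            return n
    return math.inf

assert first_entry([0, 1, -2, 3], {-2, 2}) == 3    # enters {-2,2} at n = 3
assert first_entry([0, 1, 1, 1], {5}) == math.inf  # never enters {5}
```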

If $\xi_{n}$ is a sequence of random variables adapted to a filtration $\mathscr{F}_{n}$ and $\tau$ is a stopping time with respect to $\mathscr{F}_{n}$, then for $n\geq 1$ we define $\xi_{\tau\wedge n}:\Omega\to\mathbb{R}$ by

 $\xi_{\tau\wedge n}(\omega)=\xi_{\tau(\omega)\wedge n}(\omega),\qquad\omega\in\Omega.$

$\xi_{\tau\wedge n}$ is called the sequence $\xi_{n}$ stopped at $\tau$.33 3 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 55, Exercise 3.10.

Lemma 2.

$\xi_{\tau\wedge n}:(\Omega,\mathscr{A},P)\to\mathbb{R}$ is a sequence of random variables adapted to the filtration $\mathscr{F}_{n}$.

Proof.

Let $n\geq 1$ and let $B\in\mathscr{B}_{\mathbb{R}}$. Because

 $\{\omega:\xi_{\tau\wedge n}(\omega)\in B,\tau(\omega)>n\}=\{\omega:\xi_{n}(\omega)\in B,\tau(\omega)>n\}$

and for any $k$,

 $\{\omega:\xi_{\tau\wedge n}(\omega)\in B,\tau(\omega)=k\}=\{\omega:\xi_{k}(\omega)\in B,\tau(\omega)=k\},$

we get

 $\{\omega:\xi_{\tau\wedge n}(\omega)\in B\}=\{\omega:\xi_{n}(\omega)\in B,\tau(\omega)>n\}\cup\bigcup_{k=1}^{n}\{\omega:\xi_{k}(\omega)\in B,\tau(\omega)=k\}.$

But

 $\{\xi_{n}\in B,\tau>n\}=\{\xi_{n}\in B\}\cap\{\tau>n\}\in\mathscr{F}_{n}$

and

 $\{\xi_{k}\in B,\tau=k\}=\{\xi_{k}\in B\}\cap\{\tau=k\}\in\mathscr{F}_{k},$

and therefore

 $\{\xi_{\tau\wedge n}\in B\}\in\mathscr{F}_{n}.$

In particular, $\{\xi_{\tau\wedge n}\in B\}\in\mathscr{A}$, namely, $\xi_{\tau\wedge n}$ is a random variable, and the above shows that this sequence is adapted to the filtration $\mathscr{F}_{n}$. ∎

We now prove that a stopped martingale is itself a martingale with respect to the same filtration.44 4 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 56, Proposition 3.2.

Theorem 3.

Let $\mathscr{F}_{n}$ be a filtration of a $\sigma$-algebra $\mathscr{A}$ and let $\tau$ be a stopping time with respect to $\mathscr{F}_{n}$.

1.

If $\xi_{n}$ is a submartingale with respect to $\mathscr{F}_{n}$ then so is $\xi_{\tau\wedge n}$.

2.

If $\xi_{n}$ is a supermartingale with respect to $\mathscr{F}_{n}$ then so is $\xi_{\tau\wedge n}$.

3.

If $\xi_{n}$ is a martingale with respect to $\mathscr{F}_{n}$ then so is $\xi_{\tau\wedge n}$.

Proof.

For $n\geq 1$, define

 $\alpha_{n}(\omega)=\begin{cases}1&\tau(\omega)\geq n\\ 0&\tau(\omega)<n;\end{cases}$

we remark that $\tau(\omega)\geq n$ if and only if $\tau(\omega)>n-1$, and $\tau(\omega)<n$ if and only if $\tau(\omega)\leq n-1$. For $B\in\mathscr{B}_{\mathbb{R}}$, (i) if $0,1\not\in B$ then

 $\{\omega\in\Omega:\alpha_{n}(\omega)\in B\}=\emptyset\in\mathscr{F}_{n-1},$

(ii) if $0,1\in B$ then

 $\{\omega\in\Omega:\alpha_{n}(\omega)\in B\}=\Omega\in\mathscr{F}_{n-1},$

(iii) if $0\in B$ and $1\not\in B$ then

 $\{\omega\in\Omega:\alpha_{n}(\omega)\in B\}=\{\omega\in\Omega:\alpha_{n}(% \omega)=0\}=\{\omega\in\Omega:\tau(\omega)\leq n-1\}\in\mathscr{F}_{n-1},$

and (iv) if $1\in B$ and $0\not\in B$ then

 $\{\omega\in\Omega:\alpha_{n}(\omega)\in B\}=\{\omega\in\Omega:\alpha_{n}(% \omega)=1\}=\{\omega\in\Omega:\tau(\omega)>n-1\}\in\mathscr{F}_{n-1},$

Therefore $\{\alpha_{n}\in B\}\in\mathscr{F}_{n-1}$.

Set $\xi_{0}=0$. One checks that

 $\xi_{\tau\wedge n}=\sum_{k=1}^{n}\alpha_{k}(\xi_{k}-\xi_{k-1}).$

It is apparent from this expression that if $\xi_{n}$ is adapted to $\mathscr{F}_{n}$ then $\xi_{\tau\wedge n}$ is adapted to $\mathscr{F}_{n}$, and that if each $\xi_{n}$ belongs to $L^{1}(P)$ then each $\xi_{\tau\wedge n}$ belongs to $L^{1}(P)$. As each of $\alpha_{1},\ldots,\alpha_{n+1}$ is $\mathscr{F}_{n}$-measurable and is bounded,

 $E(\xi_{\tau\wedge(n+1)}|\mathscr{F}_{n})=\sum_{k=1}^{n+1}E(\alpha_{k}(\xi_{k}-\xi_{k-1})|\mathscr{F}_{n})=\sum_{k=1}^{n+1}\alpha_{k}E(\xi_{k}-\xi_{k-1}|\mathscr{F}_{n}).$ (1)

Suppose that $\xi_{n}$ is a submartingale. By (1),

 $\displaystyle E(\xi_{\tau\wedge(n+1)}|\mathscr{F}_{n})$ $\displaystyle=\sum_{k=1}^{n}\alpha_{k}(\xi_{k}-\xi_{k-1})+\alpha_{n+1}E(\xi_{n+1}|\mathscr{F}_{n})-\alpha_{n+1}\xi_{n}$ $\displaystyle\geq\xi_{\tau\wedge n}+\alpha_{n+1}\xi_{n}-\alpha_{n+1}\xi_{n}$ $\displaystyle=\xi_{\tau\wedge n},$

which shows that $\xi_{\tau\wedge n}$ is a submartingale; the statement that $E(\xi_{\tau\wedge(n+1)}|\mathscr{F}_{n})\geq\xi_{\tau\wedge n}$ means that $E(\xi_{\tau\wedge(n+1)}|\mathscr{F}_{n})(\omega)\geq\xi_{\tau\wedge n}(\omega)$ for almost all $\omega\in\Omega$. The supermartingale case is proved in the same way with the inequality reversed, and combining the two cases gives the martingale case. ∎
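The key identity in the proof, $\xi_{\tau\wedge n}=\sum_{k=1}^{n}\alpha_{k}(\xi_{k}-\xi_{k-1})$ with $\xi_{0}=0$, is a deterministic statement about each sample path, so it can be checked pathwise. A small sketch (the names `stopped` and `telescoped` are ours):

```python
# Pathwise check of the identity used in the proof of Theorem 3:
# xi_{tau ^ n} = sum_{k=1}^n alpha_k (xi_k - xi_{k-1}), with xi_0 = 0
# and alpha_k = 1_{tau >= k}.  Both sides depend only on the sample
# path and the value of tau, so we compare them directly.
def stopped(xi, tau, n):
    # xi[k] is xi_k, with xi[0] = xi_0 = 0
    return xi[min(tau, n)]

def telescoped(xi, tau, n):
    return sum((1 if tau >= k else 0) * (xi[k] - xi[k - 1])
               for k in range(1, n + 1))

xi = [0, 1, -1, 2, 5, 3]  # xi_0, ..., xi_5: an arbitrary path
for tau in range(1, 8):
    for n in range(1, 6):
        assert stopped(xi, tau, n) == telescoped(xi, tau, n)
print("identity holds for all tau, n tested")
```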

We now prove the optional stopping theorem.55 5 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 58, Theorem 3.1.

Theorem 4 (Optional stopping theorem).

Let $\mathscr{F}_{n}$ be a filtration of a $\sigma$-algebra $\mathscr{A}$, let $\xi_{n}$ be a martingale with respect to $\mathscr{F}_{n}$, and let $\tau$ be a stopping time with respect to $\mathscr{F}_{n}$. Suppose that:

1.

For almost all $\omega\in\Omega$, $\tau(\omega)<\infty$.

2.

$\xi_{\tau}\in L^{1}(\Omega,\mathscr{A},P)$.

3.

$E(\xi_{n}1_{\{\tau>n\}})\to 0$ as $n\to\infty$.

Then

 $E(\xi_{\tau})=E(\xi_{1}).$
Proof.

For each $n$, $\Omega=\{\tau\leq n\}\cup\{\tau>n\}$, and therefore

 $\xi_{\tau}=\xi_{\tau\wedge n}+\xi_{\tau}1_{\{\tau>n\}}-\xi_{n}1_{\{\tau>n\}}=\xi_{\tau\wedge n}+\sum_{k=n+1}^{\infty}\xi_{k}1_{\{\tau=k\}}-\xi_{n}1_{\{\tau>n\}}.$

Theorem 3 tells us that $\xi_{\tau\wedge n}$ is a martingale with respect to $\mathscr{F}_{n}$, and hence

 $E(\xi_{\tau\wedge n})=E(\xi_{\tau\wedge 1})=E(\xi_{1}),$

so

 $E(\xi_{\tau})=E(\xi_{1})+\sum_{k=n+1}^{\infty}E(\xi_{k}1_{\{\tau=k\}})-E(\xi_{n}1_{\{\tau>n\}}).$ (2)

But as $\xi_{\tau}\in L^{1}(P)$,

 $\int_{\Omega}\xi_{\tau}(\omega)dP(\omega)=\sum_{k=1}^{\infty}\int_{\{\tau=k\}}\xi_{k}(\omega)dP(\omega)=\sum_{k=1}^{\infty}E(\xi_{k}1_{\{\tau=k\}}),$

and the fact that this series converges means that $\sum_{k=n+1}^{\infty}E(\xi_{k}1_{\{\tau=k\}})\to 0$. With the hypothesis $E(\xi_{n}1_{\{\tau>n\}})\to 0$, as $n\to\infty$ we have

 $E(\xi_{1})+\sum_{k=n+1}^{\infty}E(\xi_{k}1_{\{\tau=k\}})-E(\xi_{n}1_{\{\tau>n\}})\to E(\xi_{1}).$

But (2) is true for each $n$, so we get $E(\xi_{\tau})=E(\xi_{1})$, proving the claim. ∎

Suppose that $\eta_{n}$ is a sequence of independent random variables each with the Rademacher distribution:

 $P(\eta_{n}=1)=\frac{1}{2},\qquad P(\eta_{n}=-1)=\frac{1}{2}.$

Let $\xi_{n}=\sum_{k=1}^{n}\eta_{k}$ and let $\mathscr{F}_{n}=\sigma(\eta_{1},\ldots,\eta_{n})$. Because

 $\xi_{n+1}^{2}=(\xi_{n}+\eta_{n+1})^{2}=\eta_{n+1}^{2}+2\eta_{n+1}\xi_{n}+\xi_{% n}^{2},$

we have, as $\xi_{n}$ is $\mathscr{F}_{n}$-measurable and belongs to $L^{\infty}(P)$ and as $\eta_{n+1}$ is independent of the $\sigma$-algebra $\mathscr{F}_{n}$,

 $\displaystyle E(\xi_{n+1}^{2}-(n+1)|\mathscr{F}_{n})$ $\displaystyle=E(\eta_{n+1}^{2}+2\eta_{n+1}\xi_{n}+\xi_{n}^{2}-(n+1)|\mathscr{F}_{n})$ $\displaystyle=E(\eta_{n+1}^{2})+2\xi_{n}E(\eta_{n+1})+\xi_{n}^{2}-(n+1)$ $\displaystyle=1+0+\xi_{n}^{2}-(n+1)$ $\displaystyle=\xi_{n}^{2}-n.$

Therefore, $\xi_{n}^{2}-n$ is a martingale with respect to $\mathscr{F}_{n}$.

Let $K$ be a positive integer and let

 $\tau=\inf\{n\geq 1:|\xi_{n}|=K\}.$

Namely, $\tau$ is the time of first entry into the Borel subset $\{-K,K\}$ of $\mathbb{R}$, hence by Lemma 1 is a stopping time with respect to the filtration $\mathscr{F}_{n}$. With some work,66 6 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 59, Example 3.7. one shows that (i) $P(\tau>2Kn)\to 0$ as $n\to\infty$, (ii) $E(|\xi_{\tau}^{2}-\tau|)<\infty$, and (iii) $E((\xi_{n}^{2}-n)1_{\{\tau>n\}})\to 0$ as $n\to\infty$. Then we can apply the optional stopping theorem to the martingale $\xi_{n}^{2}-n$: we get that

 $E(\xi_{\tau}^{2}-\tau)=E(\xi_{1}^{2}-1)=E(\xi_{1}^{2})-1=E(\eta_{1}^{2})-1=0.$

Hence

 $E(\tau)=E(\xi_{\tau}^{2}).$

But $|\xi_{\tau}|=K$, so $\xi_{\tau}^{2}=K^{2}$, hence

 $E(\tau)=E(K^{2})=K^{2}.$
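The conclusion $E(\tau)=K^{2}$ can be sanity-checked by Monte Carlo simulation (a sketch, not a proof; the seed and trial count below are arbitrary choices):

```python
import random

# Monte Carlo sanity check (a sketch, not a proof) of E(tau) = K^2,
# where tau is the first time the simple random walk hits -K or K.
random.seed(0)
K, trials = 3, 20000
total = 0
for _ in range(trials):
    pos, n = 0, 0
    while abs(pos) < K:
        pos += random.choice((1, -1))
        n += 1
    total += n
est = total / trials
print("estimated E(tau) =", est, "vs K^2 =", K ** 2)
assert abs(est - K ** 2) < 0.5  # roughly 10 standard errors at 20000 trials
```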

6 Maximal inequalities

We now prove Doob’s maximal inequality.77 7 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 68, Proposition 4.1.

Theorem 5 (Doob’s maximal inequality).

Suppose that $\mathscr{F}_{n}$ is a filtration of a $\sigma$-algebra $\mathscr{A}$, that $\xi_{n}$ is a submartingale with respect to $\mathscr{F}_{n}$, and that for each $n$, $\xi_{n}\geq 0$. Then for each $n\geq 1$ and $\lambda>0$,

 $\lambda P\left(\max_{1\leq k\leq n}\xi_{k}\geq\lambda\right)\leq E\left(\xi_{n}1_{\{\max_{1\leq k\leq n}\xi_{k}\geq\lambda\}}\right).$
Proof.

Define $\zeta_{n}(\omega)=\max_{1\leq k\leq n}\xi_{k}(\omega)$, which is $\mathscr{F}_{n}$-measurable, and define $\tau:\Omega\to\{1,\ldots,n\}$ by

 $\tau(\omega)=\min\{1\leq k\leq n:\xi_{k}(\omega)\geq\lambda\}$

if there is some $1\leq k\leq n$ for which $\xi_{k}(\omega)\geq\lambda$, and $\tau(\omega)=n$ otherwise. For $1\leq k\leq n-1$,

 $\{\tau=k\}=\left(\bigcap_{j=1}^{k-1}\{\xi_{j}<\lambda\}\right)\cap\{\xi_{k}\geq\lambda\}\in\mathscr{F}_{k},$

while $\{\tau=n\}=\bigcap_{j=1}^{n-1}\{\xi_{j}<\lambda\}\in\mathscr{F}_{n}$, because $\tau(\omega)=n$ both when the first entry happens at time $n$ and when there is no entry at all, and for $k>n$,

 $\{\tau=k\}=\emptyset\in\mathscr{F}_{k},$

showing that $\tau$ is a stopping time with respect to the filtration $\mathscr{F}_{k}$.

For $k\geq 1$,

 $\displaystyle\xi_{k+1}-\xi_{\tau\wedge(k+1)}$ $\displaystyle=\sum_{j=1}^{k}1_{\{\tau=j\}}(\xi_{k+1}-\xi_{\tau\wedge(k+1)})=\sum_{j=1}^{k}1_{\{\tau=j\}}(\xi_{k+1}-\xi_{j}),$

hence, because $\tau$ is a stopping time with respect to the filtration $\mathscr{F}_{k}$ and because $\xi_{k}$ is a submartingale with respect to this filtration,

 $\displaystyle E(\xi_{k+1}-\xi_{\tau\wedge(k+1)}|\mathscr{F}_{k})$ $\displaystyle=\sum_{j=1}^{k}1_{\{\tau=j\}}E((\xi_{k+1}-\xi_{j})|\mathscr{F}_{k})$ $\displaystyle=\sum_{j=1}^{k}1_{\{\tau=j\}}(E(\xi_{k+1}|\mathscr{F}_{k})-\xi_{j})$ $\displaystyle\geq\sum_{j=1}^{k}1_{\{\tau=j\}}(\xi_{k}-\xi_{j})$ $\displaystyle=\sum_{j=1}^{k-1}1_{\{\tau=j\}}(\xi_{k}-\xi_{j})$ $\displaystyle=\xi_{k}-\xi_{\tau\wedge k},$

from which we have that the sequence $\xi_{k}-\xi_{\tau\wedge k}$ is a submartingale with respect to the filtration $\mathscr{F}_{k}$. Therefore

 $E(\xi_{k}-\xi_{\tau\wedge k})\geq E(\xi_{1}-\xi_{\tau\wedge 1})=E(\xi_{1})-E(\xi_{\tau\wedge 1})=E(\xi_{1})-E(\xi_{1})=0,$

and so $E(\xi_{\tau\wedge k})\leq E(\xi_{k})$. Because $\tau\wedge n=\tau$, this yields

 $E(\xi_{\tau})\leq E(\xi_{n}).$

We have

 $E(\xi_{\tau})=E(\xi_{\tau}1_{\{\zeta_{n}\geq\lambda\}})+E(\xi_{\tau}1_{\{\zeta_{n}<\lambda\}}).$

If $\omega\in\{\zeta_{n}\geq\lambda\}$ then $(\xi_{\tau})(\omega)\geq\lambda$, and if $\omega\in\{\zeta_{n}<\lambda\}$ then $\tau(\omega)=n$ and so $(\xi_{\tau})(\omega)=\xi_{n}(\omega)$. Therefore

 $E(\xi_{\tau})\geq E(\lambda\cdot 1_{\{\zeta_{n}\geq\lambda\}})+E(\xi_{n}1_{\{\zeta_{n}<\lambda\}})=\lambda P(\zeta_{n}\geq\lambda)+E(\xi_{n}1_{\{\zeta_{n}<\lambda\}}).$

Therefore

 $\lambda P(\zeta_{n}\geq\lambda)+E(\xi_{n}1_{\{\zeta_{n}<\lambda\}})\leq E(\xi_{n}).$

But $\xi_{n}=\xi_{n}1_{\{\zeta_{n}<\lambda\}}+\xi_{n}1_{\{\zeta_{n}\geq\lambda\}}$, hence

 $\lambda P(\zeta_{n}\geq\lambda)\leq E(\xi_{n}1_{\{\zeta_{n}\geq\lambda\}}),$

which proves the claim. ∎
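Doob's maximal inequality can be verified exactly on a small example by summing over all sample paths. Here we use $\xi_{k}=|\eta_{1}+\cdots+\eta_{k}|$ with fair independent $\pm 1$ steps, which is a nonnegative submartingale (the absolute value of a martingale); the choices $N=6$ and $\lambda=2$ below are arbitrary:

```python
from itertools import product

# Exact verification of Doob's maximal inequality on a small example:
# xi_k = |eta_1 + ... + eta_k| with fair +-1 steps is a nonnegative
# submartingale, and with N = 6 steps we can sum over all 2^6 paths.
N, lam = 6, 2.0
Omega = list(product([1, -1], repeat=N))
p = 1.0 / len(Omega)
lhs = rhs = 0.0
for w in Omega:
    path = [abs(sum(w[:k])) for k in range(1, N + 1)]
    if max(path) >= lam:
        lhs += lam * p          # contributes to lam * P(max >= lam)
        rhs += path[-1] * p     # contributes to E(xi_N 1_{max >= lam})
assert lhs <= rhs + 1e-12
print(lhs, "<=", rhs)
```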

The following is Doob’s $L^{2}$ maximal inequality, which we prove using Doob’s maximal inequality.88 8 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 68, Theorem 4.1.

Theorem 6 (Doob’s $L^{2}$ maximal inequality).

Suppose that $\mathscr{F}_{n}$ is a filtration of a $\sigma$-algebra $\mathscr{A}$ and that $\xi_{n}$ is a submartingale with respect to $\mathscr{F}_{n}$ such that for each $n\geq 1$, $\xi_{n}\geq 0$ and $\xi_{n}\in L^{2}(P)$. Then for each $n\geq 1$,

 $E\left(\left|\max_{1\leq k\leq n}\xi_{k}\right|^{2}\right)\leq 4E(\xi_{n}^{2}).$
Proof.

Define $\zeta_{n}(\omega)=\max_{1\leq k\leq n}\xi_{k}(\omega)$. It is a fact that if $\eta\in L^{2}(P)$ and $\eta\geq 0$ then

 $E(\eta^{2})=2\int_{0}^{\infty}tP(\eta\geq t)dt.$

Using this, Doob’s maximal inequality, Fubini’s theorem, and the Cauchy-Schwarz inequality,

 $\displaystyle E(\zeta_{n}^{2})$ $\displaystyle=2\int_{0}^{\infty}tP(\zeta_{n}>t)dt$ $\displaystyle\leq 2\int_{0}^{\infty}E(\xi_{n}1_{\{\zeta_{n}\geq t\}})dt$ $\displaystyle=2\int_{0}^{\infty}\left(\int_{\{\zeta_{n}\geq t\}}\xi_{n}(\omega)dP(\omega)\right)dt$ $\displaystyle=2\int_{\Omega}\left(\int_{0}^{\zeta_{n}(\omega)}dt\right)\xi_{n}(\omega)dP(\omega)$ $\displaystyle=2\int_{\Omega}\zeta_{n}(\omega)\xi_{n}(\omega)dP(\omega)$ $\displaystyle\leq 2(E(\zeta_{n}^{2}))^{1/2}(E(\xi_{n}^{2}))^{1/2}.$

If $E(\zeta_{n}^{2})=0$ the claim is immediate. Otherwise, we divide this inequality by $(E(\zeta_{n}^{2}))^{1/2}$ and obtain

 $(E(\zeta_{n}^{2}))^{1/2}\leq 2(E(\xi_{n}^{2}))^{1/2},$

and so

 $E(\zeta_{n}^{2})\leq 4E(\xi_{n}^{2}),$

proving the claim. ∎
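Similarly, the $L^{2}$ maximal inequality admits an exact finite check for the same nonnegative submartingale $\xi_{k}=|\eta_{1}+\cdots+\eta_{k}|$ (a sketch; the choice $N=6$ is arbitrary):

```python
from itertools import product

# Exact check of Doob's L^2 maximal inequality for the nonnegative
# submartingale xi_k = |eta_1 + ... + eta_k|, summing over all paths.
N = 6
Omega = list(product([1, -1], repeat=N))
p = 1.0 / len(Omega)
E_max_sq = sum(max(abs(sum(w[:k])) for k in range(1, N + 1)) ** 2
               for w in Omega) * p
E_last_sq = sum(sum(w) ** 2 for w in Omega) * p  # = E(xi_N^2) = N
assert E_max_sq <= 4 * E_last_sq
print(E_max_sq, "<=", 4 * E_last_sq)
```

Here $E(\xi_{N}^{2})=N$ exactly, since the $\eta_{k}$ are independent with mean $0$ and variance $1$.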

7 Upcrossings

Suppose that $\xi_{n}$ is a sequence of random variables adapted to a filtration $\mathscr{F}_{n}$ and let $a<b$ be real numbers. Define

 $\tau_{0}=0,$

and by induction for $m\geq 1$,

 $\sigma_{m}(\omega)=\inf\{k\geq\tau_{m-1}(\omega):\xi_{k}(\omega)\leq a\}$

and

 $\tau_{m}(\omega)=\inf\{k\geq\sigma_{m}(\omega):\xi_{k}(\omega)\geq b\},$

where $\inf\emptyset=\infty$. For each $m$, $\sigma_{m}$ and $\tau_{m}$ are stopping times with respect to the filtration $\mathscr{F}_{k}$. For $n\geq 0$ we define

 $U_{n}[a,b](\omega)=\sup\{m\geq 0:\tau_{m}(\omega)\leq n\}.$

For $x\in\mathbb{R}$, we write

 $x^{-}=\max\{0,-x\}=-\min\{0,x\},$

namely, the negative part of $x$.

We now prove the upcrossings inequality.

Theorem 7 (Upcrossings inequality).

If $\xi_{n}$, $n\geq 1$, is a supermartingale with respect to a filtration $\mathscr{F}_{n}$ and $a<b$, then for each $n\geq 1$,

 $(b-a)E(U_{n}[a,b])\leq E((\xi_{n}-a)^{-}).$
Proof.

For $n\geq 1$ and $\omega\in\Omega$, and writing $N=U_{n}[a,b](\omega)$, for which $N\leq n$, we have

 $\begin{split}&\displaystyle\sum_{m=1}^{n}(\xi_{\tau_{m}\wedge n}(\omega)-\xi_{\sigma_{m}\wedge n}(\omega))\\ \displaystyle=&\displaystyle\sum_{m=1}^{N}(\xi_{\tau_{m}\wedge n}(\omega)-\xi_{\sigma_{m}\wedge n}(\omega))+\xi_{\tau_{N+1}\wedge n}(\omega)-\xi_{\sigma_{N+1}\wedge n}(\omega)\\ &\displaystyle+\sum_{m=N+2}^{n}(\xi_{\tau_{m}\wedge n}(\omega)-\xi_{\sigma_{m}\wedge n}(\omega))\\ \displaystyle=&\displaystyle\sum_{m=1}^{N}(\xi_{\tau_{m}}(\omega)-\xi_{\sigma_{m}}(\omega))+\xi_{n}(\omega)-\xi_{\sigma_{N+1}\wedge n}(\omega)+\sum_{m=N+1}^{n}(\xi_{n}(\omega)-\xi_{n}(\omega))\\ \displaystyle=&\displaystyle\sum_{m=1}^{N}(\xi_{\tau_{m}}(\omega)-\xi_{\sigma_{m}}(\omega))+1_{\{\sigma_{N+1}\leq n\}}(\omega)(\xi_{n}(\omega)-\xi_{\sigma_{N+1}}(\omega))\\ \displaystyle\geq&\displaystyle\sum_{m=1}^{N}(b-a)+1_{\{\sigma_{N+1}\leq n\}}(\omega)(\xi_{n}(\omega)-\xi_{\sigma_{N+1}}(\omega)).\end{split}$

Because $\xi_{\sigma_{N+1}}(\omega)\leq a$, we have

 $(b-a)N\leq 1_{\{\sigma_{N+1}\leq n\}}(\omega)(a-\xi_{n}(\omega))+\sum_{m=1}^{n}(\xi_{\tau_{m}\wedge n}(\omega)-\xi_{\sigma_{m}\wedge n}(\omega)).$

Because $1_{\{\sigma_{N+1}\leq n\}}(\omega)(a-\xi_{n}(\omega))$ is equal either to $0$ or to $a-\xi_{n}(\omega)$, in both cases it is bounded above by $\max\{0,a-\xi_{n}(\omega)\}$, i.e.,

 $1_{\{\sigma_{N+1}\leq n\}}(\omega)(a-\xi_{n}(\omega))\leq\max\{0,a-\xi_{n}(\omega)\}=(\xi_{n}(\omega)-a)^{-}.$

Thus

 $(b-a)E(U_{n}[a,b])\leq E((\xi_{n}-a)^{-})+\sum_{m=1}^{n}E(\xi_{\tau_{m}\wedge n}-\xi_{\sigma_{m}\wedge n}).$

Using that $\xi_{n}$ is a supermartingale, for each $1\leq m\leq n$,

 $E(\xi_{\tau_{m}\wedge n}-\xi_{\sigma_{m}\wedge n})\leq 0.$

Therefore

 $(b-a)E(U_{n}[a,b])\leq E((\xi_{n}-a)^{-}).$ ∎
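The number of upcrossings $U_{n}[a,b]$ along a path can be computed by a simple state machine that, following the $\sigma_{m},\tau_{m}$ recursion, alternately waits for the path to drop to $\leq a$ and then rise to $\geq b$. The sketch below counts upcrossings and verifies the upcrossings inequality exactly for the simple random walk (a martingale, hence a supermartingale); all parameter choices are arbitrary:

```python
from itertools import product

# U_n[a,b] along a path via the sigma/tau recursion: alternately wait
# for the path to drop to <= a, then to rise to >= b; each completed
# down-up pair is one upcrossing.  Then check the upcrossings
# inequality exactly for the simple random walk.
def upcrossings(path, a, b):
    count, looking_for_low = 0, True
    for x in path:
        if looking_for_low and x <= a:
            looking_for_low = False       # sigma_m has occurred
        elif not looking_for_low and x >= b:
            looking_for_low = True        # tau_m has occurred
            count += 1
    return count

N, a, b = 8, -1.0, 1.0
Omega = list(product([1, -1], repeat=N))
p = 1.0 / len(Omega)
EU = sum(upcrossings([sum(w[:k]) for k in range(1, N + 1)], a, b)
         for w in Omega) * p
Eneg = sum(max(0.0, a - sum(w)) for w in Omega) * p  # E((xi_N - a)^-)
assert (b - a) * EU <= Eneg + 1e-12
print((b - a) * EU, "<=", Eneg)
```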

8 Doob’s martingale convergence theorem

We now use the upcrossings inequality to prove Doob’s martingale convergence theorem.1010 10 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 71, Theorem 4.2.

Theorem 8 (Doob’s martingale convergence theorem).

Suppose that $\xi_{n}$, $n\geq 1$, is a supermartingale with respect to a filtration $\mathscr{F}_{n}$ and that

 $M=\sup_{n}E(|\xi_{n}|)<\infty.$

Then there is some $\xi\in L^{1}(\Omega,\mathscr{A},P)$ such that for almost all $\omega\in\Omega$,

 $\lim_{n\to\infty}\xi_{n}(\omega)=\xi(\omega)$

and with $E(|\xi|)\leq M$.

Proof.

For any $a<b$ and $n\geq 1$, the upcrossings inequality tells us that

 $E(U_{n}[a,b])\leq\frac{E((\xi_{n}-a)^{-})}{b-a}\leq\frac{E(|\xi_{n}-a|)}{b-a}\leq\frac{E(|\xi_{n}|+|a|)}{b-a}\leq\frac{M+|a|}{b-a}.$

For each $\omega\in\Omega$, the sequence $U_{n}[a,b](\omega)\in[0,\infty)$ is nondecreasing, so by the monotone convergence theorem,

 $E\left(\lim_{n\to\infty}U_{n}[a,b]\right)=\lim_{n\to\infty}E(U_{n}[a,b])\leq% \frac{M+|a|}{b-a}.$

This implies that

 $P\left(\omega\in\Omega:\lim_{n\to\infty}U_{n}[a,b](\omega)<\infty\right)=1.$

Let

 $A=\bigcap_{a,b\in\mathbb{Q},\,a<b}\left\{\omega\in\Omega:\lim_{n\to\infty}U_{n}[a,b](\omega)<\infty\right\}.$

This is an intersection of countably many sets each with measure $1$, so $P(A)=1$.

Let

 $B=\{\omega\in\Omega:\liminf_{n}\xi_{n}(\omega)<\limsup_{n}\xi_{n}(\omega)\}.$

If $\omega\in B$, then there are $a,b\in\mathbb{Q}$, $a<b$, such that

 $\liminf_{n}\xi_{n}(\omega)<a<b<\limsup_{n}\xi_{n}(\omega).$

It follows from this that $\lim_{n\to\infty}U_{n}[a,b](\omega)=\infty$. Thus $\omega\not\in A$, so $B\cap A=\emptyset$, and because $P(A)=1$ we get $P(B)=0$.

We define $\xi:\Omega\to\mathbb{R}$ by

 $\xi(\omega)=\begin{cases}\lim_{n\to\infty}\xi_{n}(\omega)&\omega\not\in B\\ 0&\omega\in B,\end{cases}$

which is Borel measurable. Furthermore, since $|\xi|=\liminf_{n}|\xi_{n}|$ almost everywhere, by Fatou’s lemma we obtain

 $\displaystyle E(|\xi|)$ $\displaystyle=E(\liminf_{n}|\xi_{n}|)$ $\displaystyle\leq\liminf_{n}E(|\xi_{n}|)$ $\displaystyle\leq\sup_{n}E(|\xi_{n}|)$ $\displaystyle=M.$ ∎
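A standard example illustrating the theorem: with $U_{k}$ independent and equal to $\frac{1}{2}$ or $\frac{3}{2}$ with probability $\frac{1}{2}$ each, the products $\xi_{n}=\prod_{k=1}^{n}U_{k}$ form a nonnegative martingale with $E(|\xi_{n}|)=1$ for every $n$, so the theorem applies; since $E(\log U_{k})<0$, the almost sure limit is $0$. A simulation sketch (seed and path length are arbitrary choices):

```python
import random

# A nonnegative martingale with E(|xi_n|) = 1 for all n: products
# xi_n = U_1 * ... * U_n with U_k = 1/2 or 3/2, each with probability
# 1/2, independently.  E(U_k) = 1 gives the martingale property, and
# E(log U_k) < 0 forces xi_n -> 0 almost surely; each simulated path
# should be essentially at its limit 0 after 2000 steps.
random.seed(1)
for _ in range(5):
    xi = 1.0
    for _ in range(2000):
        xi *= random.choice((0.5, 1.5))
    assert xi < 1e-50
print("all simulated paths near their a.s. limit 0")
```

Note that $E(\xi_{n})=1$ for every $n$ while the limit is $0$, so the convergence here is almost sure but not in $L^{1}$.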

9 Uniform integrability

Let $\xi:(\Omega,\mathscr{A},P)\to\mathbb{R}$ be a random variable. It is a fact1111 11 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 73, Exercise 4.3. that $\xi\in L^{1}$ if and only if for each $\epsilon>0$ there is some $M$ such that

 $\int_{\{|\xi|>M\}}|\xi|dP<\epsilon.$

(One’s instinct might be to try to use the Cauchy-Schwarz inequality to prove this. This doesn’t work.) Thus, if $\xi_{n}$ is a sequence in $L^{1}(\Omega,\mathscr{A},P)$ then for each $\epsilon>0$ there are $M_{n}$ such that, for each $n$,

 $\int_{\{|\xi_{n}|>M_{n}\}}|\xi_{n}|dP<\epsilon.$

A sequence of random variables $\xi_{n}$ is said to be uniformly integrable if for each $\epsilon>0$ there is some $M$ such that, for each $n$,

 $\int_{\{|\xi_{n}|>M\}}|\xi_{n}|dP<\epsilon.$

If a sequence $\xi_{n}$ is uniformly integrable, then there is some $M$ such that for each $n$,

 $\int_{\{|\xi_{n}|>M\}}|\xi_{n}|dP<1,$

and so

 $E(|\xi_{n}|)=\int_{\{|\xi_{n}|\leq M\}}|\xi_{n}|dP+\int_{\{|\xi_{n}|>M\}}|\xi_{n}|dP<\int_{\{|\xi_{n}|\leq M\}}MdP+1\leq M+1.$

The following lemma states that the conditional expectations of an integrable random variable with respect to a filtration form a uniformly integrable martingale with respect to that filtration.1212 12 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 75, Exercise 4.5.

Lemma 9.

Suppose that $\xi\in L^{1}(\Omega,\mathscr{A},P)$ and that $\mathscr{F}_{n}$ is a filtration of $\mathscr{A}$. Then $E(\xi|\mathscr{F}_{n})$ is a martingale with respect to $\mathscr{F}_{n}$ and is uniformly integrable.

We now prove that a uniformly integrable supermartingale converges in $L^{1}$.1313 13 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 76, Theorem 4.3.

Theorem 10.

Suppose that $\xi_{n}$ is a supermartingale with respect to a filtration $\mathscr{F}_{n}$, and that the sequence $\xi_{n}$ is uniformly integrable. Then there is some $\xi\in L^{1}(\Omega,\mathscr{A},P)$ such that $\xi_{n}\to\xi$ in $L^{1}$.

Proof.

Because the sequence $\xi_{n}$ is uniformly integrable, there is some $M$ such that for each $n\geq 1$,

 $E(|\xi_{n}|)\leq M+1.$

Thus, because $\xi_{n}$ is a supermartingale, Doob’s martingale convergence theorem tells us that there is some $\xi\in L^{1}(\Omega,\mathscr{A},P)$ such that for almost all $\omega\in\Omega$,

 $\lim_{n\to\infty}\xi_{n}(\omega)=\xi(\omega).$

Because $\xi_{n}$ is uniformly integrable and converges almost surely to $\xi$, the Vitali convergence theorem1414 14 V. I. Bogachev, Measure Theory, volume I, p. 268, Theorem 4.5.4; http://individual.utoronto.ca/jordanbell/notes/L0.pdf, p. 8, Theorem 9. tells us that $\xi_{n}\to\xi$ in $L^{1}$. ∎

The above theorem shows in particular that a uniformly integrable martingale converges to some limit in $L^{1}$. The following theorem shows that the terms of the sequence are equal to the conditional expectations of this limit with respect to the natural filtration.1515 15 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 77, Theorem 4.4.

Theorem 11.

Suppose that a sequence of random variables $\xi_{n}$ is uniformly integrable and is a martingale with respect to its natural filtration

 $\mathscr{F}_{n}=\sigma(\xi_{1},\ldots,\xi_{n}).$

Then there is some $\xi\in L^{1}(\Omega,\mathscr{A},P)$ such that $\xi_{n}\to\xi$ in $L^{1}$ and such that for each $n\geq 1$, for almost all $\omega\in\Omega$,

 $\xi_{n}(\omega)=E(\xi|\mathscr{F}_{n})(\omega).$
Proof.

By Theorem 10, there is some $\xi\in L^{1}(\Omega,\mathscr{A},P)$ such that $\xi_{n}\to\xi$ in $L^{1}$. The hypothesis that the sequence $\xi_{n}$ is a martingale with respect to $\mathscr{F}_{n}$ tells us that for $n\geq 1$ and for any $m\geq n$,

 $E(\xi_{m}|\mathscr{F}_{n})=\xi_{n},$

and so for $A\in\mathscr{F}_{n}$,

 $\int_{A}\xi_{m}dP=\int_{A}E(\xi_{m}|\mathscr{F}_{n})dP=\int_{A}\xi_{n}dP.$

Thus

 $\displaystyle\left|\int_{A}(\xi_{n}-\xi)dP\right|$ $\displaystyle=\left|\int_{A}(\xi_{m}-\xi)dP\right|$ $\displaystyle\leq\int_{A}|\xi_{m}-\xi|dP$ $\displaystyle\leq E(|\xi_{m}-\xi|).$

But $E(|\xi_{m}-\xi|)\to 0$ as $m\to\infty$. Since $m$ does not appear in the left-hand side, we have

 $\left|\int_{A}(\xi_{n}-\xi)dP\right|=0,$

and thus

 $\int_{A}\xi_{n}dP=\int_{A}\xi dP.$

But $E(\xi|\mathscr{F}_{n})$ is the unique element of $L^{1}(\Omega,\mathscr{F}_{n},P)$ such that for each $A\in\mathscr{F}_{n}$,

 $\int_{A}E(\xi|\mathscr{F}_{n})dP=\int_{A}\xi dP,$

and because $\xi_{n}$ satisfies this, we get that $\xi_{n}=E(\xi|\mathscr{F}_{n})$ in $L^{1}$, i.e., for almost all $\omega\in\Omega$,

 $\xi_{n}(\omega)=E(\xi|\mathscr{F}_{n})(\omega),$

proving the claim. ∎

10 Lévy’s continuity theorem

For a metrizable topological space $X$, we denote by $\mathscr{P}(X)$ the set of Borel probability measures on $X$. The narrow topology on $\mathscr{P}(X)$ is the coarsest topology such that for each $f\in C_{b}(X)$, the map

 $\mu\mapsto\int_{X}fd\mu$

is continuous $\mathscr{P}(X)\to\mathbb{C}$.

A subset $\mathscr{H}$ of $\mathscr{P}(X)$ is called tight if for each $\epsilon>0$ there is a compact subset $K_{\epsilon}$ of $X$ such that if $\mu\in\mathscr{H}$ then $\mu(X\setminus K_{\epsilon})<\epsilon$, i.e. $\mu(K_{\epsilon})>1-\epsilon$. (An element $\mu$ of $\mathscr{P}(X)$ is called tight when $\{\mu\}$ is a tight subset of $\mathscr{P}(X)$.)

For a Borel probability measure $\mu$ on $\mathbb{R}^{d}$, we define its characteristic function $\tilde{\mu}:\mathbb{R}^{d}\to\mathbb{C}$ by

 $\tilde{\mu}(u)=\int_{\mathbb{R}^{d}}e^{ix\cdot u}d\mu(x),\qquad u\in\mathbb{R}^{d}.$

$\tilde{\mu}$ is bounded by $1$ and is uniformly continuous. Because $\mu(\mathbb{R}^{d})=1$,

 $\tilde{\mu}(0)=1.$
Lemma 12.

Let $\mu\in\mathscr{P}(\mathbb{R})$. For $\delta>0$,

 $\mu\left(\left\{x\in\mathbb{R}:|x|\geq\frac{2}{\delta}\right\}\right)\leq\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\tilde{\mu}(u))du;$

in particular, the right-hand side of this inequality is real.

Proof.

Using Fubini’s theorem and the fact that for all real $t$, $1-\frac{\sin t}{t}\geq 0$,

 $\displaystyle\int_{-\delta}^{\delta}(1-\tilde{\mu}(u))du$ $\displaystyle=\int_{-\delta}^{\delta}\left(\int_{\mathbb{R}}(1-e^{ixu})d\mu(x)\right)du$ $\displaystyle=\int_{\mathbb{R}}\left(\int_{-\delta}^{\delta}(1-e^{iux})du\right)d\mu(x)$ $\displaystyle=\int_{\mathbb{R}}\left(u-\frac{e^{iux}}{ix}\right)_{-\delta}^{\delta}d\mu(x)$ $\displaystyle=\int_{\mathbb{R}}\left(2\delta-\frac{e^{i\delta x}}{ix}+\frac{e^{-i\delta x}}{ix}\right)d\mu(x)$ $\displaystyle=2\delta\int_{\mathbb{R}}\left(1-\frac{\sin(\delta x)}{\delta x}\right)d\mu(x)$ $\displaystyle\geq 2\delta\int_{|\delta x|\geq 2}\left(1-\frac{\sin(\delta x)}{\delta x}\right)d\mu(x)$ $\displaystyle\geq 2\delta\int_{|\delta x|\geq 2}\left(1-\frac{1}{|\delta x|}\right)d\mu(x)$ $\displaystyle\geq 2\delta\int_{|\delta x|\geq 2}\frac{1}{2}d\mu(x)$ $\displaystyle=\delta\mu(\{x\in\mathbb{R}:|\delta x|\geq 2\}).$ ∎
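Lemma 12 can be checked numerically for a concrete measure. For the standard Gaussian, $\tilde{\mu}(u)=e^{-u^{2}/2}$ and $\mu(\{|x|\geq 2/\delta\})=2(1-\Phi(2/\delta))$, so both sides of the inequality are computable (a numerical sketch; $\delta$ and the quadrature step are arbitrary choices):

```python
import math

# Numerical check of Lemma 12 for the standard Gaussian, whose
# characteristic function is mu~(u) = exp(-u**2 / 2).
delta = 0.5
# Right-hand side: (1/delta) * integral over [-delta, delta] of
# (1 - exp(-u^2/2)) du, by the midpoint rule.
m = 100000
h = 2 * delta / m
rhs = sum(1 - math.exp(-((-delta + (j + 0.5) * h) ** 2) / 2)
          for j in range(m)) * h / delta
# Left-hand side: P(|X| >= 2/delta) = 2 * (1 - Phi(2/delta)).
lhs = 2 * (1 - 0.5 * (1 + math.erf((2 / delta) / math.sqrt(2))))
assert lhs <= rhs
print(lhs, "<=", rhs)
```

With $\delta=\frac{1}{2}$ the left-hand side is the tiny Gaussian tail probability $P(|X|\geq 4)$, far below the right-hand side.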

The following lemma gives a condition on the characteristic functions of a sequence of Borel probability measures on $\mathbb{R}$ under which the sequence is tight.1616 16 Krishna B. Athreya and Soumendra N. Lahiri, Measure Theory and Probability Theory, p. 329, Lemma 10.3.3.

Lemma 13.

Suppose that $\mu_{n}\in\mathscr{P}(\mathbb{R})$ and that $\tilde{\mu}_{n}$ converges pointwise to a function $\phi:\mathbb{R}\to\mathbb{C}$ that is continuous at $0$. Then the sequence $\mu_{n}$ is tight.

Proof.

Write $\phi_{n}=\tilde{\mu}_{n}$. Because $|\phi_{n}|\leq 1$, for each $\delta>0$, by the dominated convergence theorem we have

 $\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi_{n}(t))dt\to\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi(t))dt.$

On the other hand, that $\phi$ is continuous at $0$ implies that for any $\epsilon>0$ there is some $\eta>0$ such that when $|t|<\eta$, $|\phi(t)-1|<\epsilon$, and hence for $\delta<\eta$,

 $\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi(t))dt\leq 2\sup_{|t|\leq\delta}|1-\phi(t)|\leq 2\epsilon,$

thus

 $\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi(t))dt\to 0,\qquad\delta\to 0.$

Let $\epsilon>0$. There is some $\delta>0$ for which

 $\left|\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi(t))dt\right|<\epsilon.$

Then there is some $n_{\delta}$ such that when $n\geq n_{\delta}$,

 $\left|\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi_{n}(t))dt-\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi(t))dt\right|<\epsilon,$

whence

 $\left|\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi_{n}(t))dt\right|<2\epsilon.$

Lemma 12 then says

 $\mu_{n}\left(\left\{x\in\mathbb{R}:|x|\geq\frac{2}{\delta}\right\}\right)\leq\frac{1}{\delta}\int_{-\delta}^{\delta}(1-\phi_{n}(t))dt<2\epsilon.$

Furthermore, any Borel probability measure on a Polish space is tight (Ulam’s theorem).1717 17 Alexander S. Kechris, Classical Descriptive Set Theory, p. 107, Theorem 17.11. Thus, for each $1\leq n<n_{\delta}$, there is a compact set $K_{n}$ for which $\mu_{n}(\mathbb{R}\setminus K_{n})<\epsilon$. Let

 $K_{\epsilon}=K_{1}\cup\cdots\cup K_{n_{\delta}-1}\cup\left\{x\in\mathbb{R}:|x|\leq\frac{2}{\delta}\right\},$

which is a compact set, and for any $n\geq 1$,

 $\mu_{n}(\mathbb{R}\setminus K_{\epsilon})<2\epsilon,$

showing that the sequence $\mu_{n}$ is tight. ∎

For metrizable spaces $X_{1},\ldots,X_{d}$, let $X=\prod_{i=1}^{d}X_{i}$ and let $\pi_{i}:X\to X_{i}$ be the projection map. We establish that if $\mathscr{H}$ is a subset of $\mathscr{P}(X)$ such that for each $1\leq i\leq d$ the family of $i$th marginals of $\mathscr{H}$ is tight, then $\mathscr{H}$ itself is tight.1818 18 Luigi Ambrosio, Nicola Gigli, and Giuseppe Savare, Gradient Flows: In Metric Spaces and in the Space of Probability Measures, p. 119, Lemma 5.2.2; V. I. Bogachev, Measure Theory, volume II, p. 94, Lemma 7.6.6.

Lemma 14.

Let $X_{1},\ldots,X_{d}$ be metrizable topological spaces, let $X=\prod_{i=1}^{d}X_{i}$, and let $\mathscr{H}\subset\mathscr{P}(X)$. Suppose that for each $1\leq i\leq d$,

 $\mathscr{H}_{i}=\{{\pi_{i}}_{*}\mu:\mu\in\mathscr{H}\}$

is a tight set in $\mathscr{P}(X_{i})$. Then $\mathscr{H}$ is a tight set in $\mathscr{P}(X)$.

Proof.

For $\mu\in\mathscr{H}$, write $\mu_{i}={\pi_{i}}_{*}\mu$. Let $\epsilon>0$ and take $1\leq i\leq d$. Because $\mathscr{H}_{i}$ is tight, there is a compact subset $K_{i}$ of $X_{i}$ such that for all $\mu_{i}\in\mathscr{H}_{i}$,

 $\mu_{i}(X_{i}\setminus K_{i})<\frac{\epsilon}{d}.$

Let

 $K=\prod_{i=1}^{d}K_{i}=\bigcap_{i=1}^{d}\pi_{i}^{-1}(K_{i}).$

Then for any $\mu\in\mathscr{H}$,

 $\displaystyle\mu(X\setminus K)$ $\displaystyle=\mu\left(X\setminus\bigcap_{i=1}^{d}\pi_{i}^{-1}(K_{i})\right)$ $\displaystyle=\mu\left(\bigcup_{i=1}^{d}\pi_{i}^{-1}(K_{i})^{c}\right)$ $\displaystyle=\mu\left(\bigcup_{i=1}^{d}\pi_{i}^{-1}(X_{i}\setminus K_{i})\right)$ $\displaystyle\leq\sum_{i=1}^{d}\mu(\pi_{i}^{-1}(X_{i}\setminus K_{i}))$ $\displaystyle=\sum_{i=1}^{d}\mu_{i}(X_{i}\setminus K_{i})$ $\displaystyle<\sum_{i=1}^{d}\frac{\epsilon}{d}$ $\displaystyle=\epsilon,$

which shows that $\mathscr{H}$ is tight. ∎
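The union bound at the heart of this proof can be illustrated numerically. The following sketch (a toy setup assumed for illustration, not from the text) takes $d=2$ and a product of two standard Gaussians: the mass outside the box $K=[-a,a]^{2}$ is at most the sum of the two marginal tail masses.

```python
import math

# Illustration (assumed toy setup): for a product measure on R^2 with
# standard Gaussian marginals, mu(X \ K) <= sum of marginal tails, where
# K = [-a, a]^2.

def marginal_tail(a):
    # P(|X_i| > a) for X_i ~ N(0, 1)
    return math.erfc(a / math.sqrt(2.0))

def joint_outside_box(a):
    # P((X_1, X_2) not in [-a, a]^2) = 1 - P(|X_1| <= a) * P(|X_2| <= a)
    inside = 1.0 - marginal_tail(a)
    return 1.0 - inside * inside

for a in (1.0, 2.0, 3.0):
    assert joint_outside_box(a) <= 2.0 * marginal_tail(a) + 1e-12
```

Here the inequality is exact algebra: $1-(1-t)^{2}=2t-t^{2}\leq 2t$, matching the proof's bound $\mu(X\setminus K)\leq\sum_{i}\mu_{i}(X_{i}\setminus K_{i})$.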

We now prove Lévy’s continuity theorem, which we shall use to prove the martingale central limit theorem.1919 19 cf. Jean Jacod and Philip Protter, Probability Essentials, second ed., p. 167, Theorem 19.1.

Theorem 15 (Lévy’s continuity theorem).

Suppose that $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$, $n\geq 1$.

1. 1.

If $\mu\in\mathscr{P}(\mathbb{R}^{d})$ and $\mu_{n}\to\mu$ narrowly, then for any $u\in\mathbb{R}^{d}$,

 $\tilde{\mu}_{n}(u)\to\tilde{\mu}(u),\qquad n\to\infty.$
2. 2.

If there is some $\phi:\mathbb{R}^{d}\to\mathbb{C}$ to which $\tilde{\mu}_{n}$ converges pointwise and $\phi$ is continuous at $0$, then there is some $\mu\in\mathscr{P}(\mathbb{R}^{d})$ such that $\phi=\tilde{\mu}$ and such that $\mu_{n}\to\mu$ narrowly.

Proof.

Suppose that $\mu_{n}\to\mu$ narrowly. For each $u\in\mathbb{R}^{d}$, the function $x\mapsto e^{ix\cdot u}$ is continuous $\mathbb{R}^{d}\to\mathbb{C}$ and is bounded, so

 $\tilde{\mu}_{n}(u)=\int_{\mathbb{R}^{d}}e^{ix\cdot u}d\mu_{n}(x)\to\int_{\mathbb{R}^{d}}e^{ix\cdot u}d\mu(x)=\tilde{\mu}(u).$

Suppose that $\tilde{\mu}_{n}$ converges pointwise to $\phi$ and that $\phi$ is continuous at $0$. For $1\leq i\leq d$, let $\pi_{i}:\mathbb{R}^{d}\to\mathbb{R}$ be the projection map and define $\iota_{i}:\mathbb{R}\to\mathbb{R}^{d}$ by taking the $i$th entry of $\iota_{i}(t)$ to be $t$ and the other entries to be $0$. Fix $1\leq i\leq d$ and write $\nu_{n}={\pi_{i}}_{*}\mu_{n}\in\mathscr{P}(\mathbb{R})$, and for $t\in\mathbb{R}$ we calculate

 $\displaystyle\tilde{\nu}_{n}(t)$ $\displaystyle=\int_{\mathbb{R}}e^{ist}d\nu_{n}(s)$ $\displaystyle=\int_{\mathbb{R}^{d}}e^{i\pi_{i}(x)t}d\mu_{n}(x)$ $\displaystyle=\int_{\mathbb{R}^{d}}e^{ix\cdot\iota_{i}(t)}d\mu_{n}(x)$ $\displaystyle=\tilde{\mu}_{n}(\iota_{i}(t)),$

so $\tilde{\nu}_{n}=\tilde{\mu}_{n}\circ\iota_{i}$. By hypothesis, $\tilde{\nu}_{n}$ converges pointwise to $\phi\circ\iota_{i}$. Because $\phi$ is continuous at $0\in\mathbb{R}^{d}$, the function $\phi\circ\iota_{i}$ is continuous at $0\in\mathbb{R}$. Then Lemma 13 tells us that the sequence $\nu_{n}$ is tight. That is, for each $1\leq i\leq d$, the set

 $\{{\pi_{i}}_{*}\mu_{n}:n\geq 1\}$

is tight in $\mathscr{P}(\mathbb{R})$. Thus Lemma 14 tells us that the set

 $\{\mu_{n}:n\geq 1\}$

is tight in $\mathscr{P}(\mathbb{R}^{d})$.

Prokhorov’s theorem2020 20 V. I. Bogachev, Measure Theory, volume II, p. 202, Theorem 8.6.2. states that if $X$ is a Polish space, then a subset $\mathscr{H}$ of $\mathscr{P}(X)$ is tight if and only if each sequence of elements of $\mathscr{H}$ has a subsequence that converges narrowly to some element of $\mathscr{P}(X)$. Thus, there is a subsequence $\mu_{a(n)}$ of $\mu_{n}$ and some $\mu\in\mathscr{P}(\mathbb{R}^{d})$ such that $\mu_{a(n)}$ converges narrowly to $\mu$. By the first part of the theorem, we get that $\tilde{\mu}_{a(n)}$ converges pointwise to $\tilde{\mu}$. But by hypothesis $\tilde{\mu}_{n}$ converges pointwise to $\phi$, so $\phi=\tilde{\mu}$.

Finally we prove that $\mu_{n}\to\mu$ narrowly. Let $\mu_{b(n)}$ be a subsequence of $\mu_{n}$. Because $\{\mu_{n}:n\geq 1\}$ is tight, Prokhorov’s theorem tells us that there is a subsequence $\mu_{c(n)}$ of $\mu_{b(n)}$ that converges narrowly to some $\lambda\in\mathscr{P}(\mathbb{R}^{d})$. By the first part of the theorem, $\tilde{\mu}_{c(n)}$ converges pointwise to $\tilde{\lambda}$. By hypothesis $\tilde{\mu}_{c(n)}$ converges pointwise to $\phi$, so $\tilde{\lambda}=\phi=\tilde{\mu}$. Then $\lambda=\mu$. That is, any subsequence of $\mu_{n}$ itself has a subsequence that converges narrowly to $\mu$, which implies that the sequence $\mu_{n}$ converges narrowly to $\mu$. (For a sequence $x_{n}$ in a topological space $X$ and $x\in X$, $x_{n}\to x$ if and only if each subsequence of $x_{n}$ has a subsequence that converges to $x$.) ∎
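The continuity of $\phi$ at $0$ in part 2 is essential. A standard counterexample (checked numerically below; an illustration, not part of the text) is $\mu_{n}=N(0,n)$: the characteristic functions $e^{-nu^{2}/2}$ converge pointwise to the indicator of $\{0\}$, which is discontinuous at $0$, and indeed the sequence is not tight, so no narrow limit exists.

```python
import math

# Why continuity at 0 matters: for mu_n = N(0, n), the characteristic
# functions exp(-n u^2 / 2) converge pointwise to the indicator of {0},
# which is discontinuous at 0, and the mass of mu_n escapes every compact
# set, so the sequence has no narrow limit.

def char_fn(n, u):
    return math.exp(-n * u * u / 2.0)

def mass_in_interval(n, m):
    # mu_n([-m, m]) for mu_n = N(0, n)
    return math.erf(m / math.sqrt(2.0 * n))

# pointwise limit of the characteristic functions: 1 at u = 0, 0 elsewhere
assert char_fn(10_000, 0.0) == 1.0
assert char_fn(10_000, 0.1) < 1e-9
# the sequence is not tight: mu_n([-10, 10]) -> 0 as n grows
assert mass_in_interval(10_000, 10.0) < 0.1
```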

11 Martingale central limit theorem

Let $\gamma_{d}$ be the standard Gaussian measure on $\mathbb{R}^{d}$: $\gamma_{d}$ has density

 $\frac{1}{\sqrt{(2\pi)^{d}}}e^{-\frac{1}{2}|x|^{2}}$

with respect to Lebesgue measure on $\mathbb{R}^{d}$.
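The characteristic function of $\gamma_{1}$ is $e^{-u^{2}/2}$, a fact used at the end of the proof below. The following quadrature sketch (an illustration only) confirms it numerically; by symmetry only the real part of $\int e^{iux}d\gamma_{1}(x)$ is nonzero.

```python
import math

# Quadrature sketch (illustration only): the characteristic function of the
# standard Gaussian measure gamma_1 is exp(-u^2/2).

def gaussian_char(u, m=100_000, cutoff=10.0):
    # midpoint rule on [-cutoff, cutoff]; the truncated tail mass is negligible
    h = 2.0 * cutoff / m
    total = 0.0
    for k in range(m):
        x = -cutoff + (k + 0.5) * h
        density = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        total += math.cos(u * x) * density * h
    return total

for u in (0.0, 1.0, 2.0):
    assert abs(gaussian_char(u) - math.exp(-u * u / 2.0)) < 1e-6
```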

We now prove the martingale central limit theorem.2121 21 Jean Jacod and Philip Protter, Probability Essentials, second ed., p. 235, Theorem 27.7.

Theorem 16 (Martingale central limit theorem).

Suppose that $X_{j}$ is a sequence in $L^{3}(\Omega,\mathscr{A},P)$ satisfying the following, with $\mathscr{F}_{0}=\{\emptyset,\Omega\}$ and $\mathscr{F}_{k}=\sigma(X_{1},\ldots,X_{k})$ for $k\geq 1$:

1. 1.

$E(X_{j}|\mathscr{F}_{j-1})=0$.

2. 2.

$E(X_{j}^{2}|\mathscr{F}_{j-1})=1$.

3. 3.

There is some $K$ for which $E(|X_{j}|^{3}|\mathscr{F}_{j-1})\leq K$.

Then $\frac{S_{n}}{\sqrt{n}}$ converges in distribution to some random variable $Z:\Omega\to\mathbb{R}$ with $Z_{*}P=\gamma_{1}$, where

 $S_{n}=\sum_{j=1}^{n}X_{j}.$
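Before the proof, a Monte Carlo sketch (an illustration only, not part of the proof): i.i.d. Rademacher variables $X_{j}=\pm 1$ form the simplest sequence satisfying hypotheses 1–3, with $K=1$, and $\frac{S_{n}}{\sqrt{n}}$ is indeed approximately standard Gaussian.

```python
import math
import random

# Monte Carlo sketch (illustration only): i.i.d. Rademacher variables
# X_j = +/-1 satisfy the theorem's hypotheses with K = 1:
#   E(X_j | F_{j-1}) = 0,  E(X_j^2 | F_{j-1}) = 1,  E(|X_j|^3 | F_{j-1}) = 1.
# We check empirically that S_n / sqrt(n) is approximately N(0, 1).

random.seed(0)
n, trials = 400, 50_000
samples = []
for _ in range(trials):
    # S_n = 2 * (#heads) - n, with #heads ~ Binomial(n, 1/2)
    heads = bin(random.getrandbits(n)).count("1")
    samples.append((2 * heads - n) / math.sqrt(n))

mean = sum(samples) / trials
empirical_cdf = sum(1 for z in samples if z <= 0.45) / trials
gaussian_cdf = 0.5 * (1.0 + math.erf(0.45 / math.sqrt(2.0)))

assert abs(mean) < 0.03
assert abs(empirical_cdf - gaussian_cdf) < 0.02
```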
Proof.

For positive integers $n$ and $j$, define

 $\phi_{n,j}(u)=E(e^{iu\frac{1}{\sqrt{n}}X_{j}}|\mathscr{F}_{j-1}).$

For each $\omega\in\Omega$, by Taylor’s theorem there is some $\xi_{n,j}(\omega)$ between $0$ and $X_{j}(\omega)$ such that

 $e^{iu\frac{1}{\sqrt{n}}X_{j}(\omega)}=1+iu\frac{1}{\sqrt{n}}X_{j}(\omega)-% \frac{u^{2}}{2n}X_{j}(\omega)^{2}-\frac{iu^{3}}{6n^{3/2}}\xi_{n,j}(\omega)^{3}.$
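The expansion above is controlled by the standard third-order Taylor estimate $|e^{i\theta}-(1+i\theta-\frac{\theta^{2}}{2})|\leq\frac{|\theta|^{3}}{6}$, valid for all real $\theta$. A quick numerical check (illustration only):

```python
import cmath

# Numerical check of the classical third-order estimate
#   |e^{i theta} - (1 + i*theta - theta^2/2)| <= |theta|^3 / 6
# for real theta.

def remainder(theta):
    return abs(cmath.exp(1j * theta) - (1.0 + 1j * theta - theta * theta / 2.0))

for k in range(-100, 101):
    theta = k / 10.0  # theta ranges over [-10, 10]
    assert remainder(theta) <= abs(theta) ** 3 / 6.0 + 1e-12
```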

Because $f\mapsto E(f|\mathscr{F}_{j-1})$ is a positive operator and $|\xi_{n,j}|^{3}\leq|X_{j}|^{3}$, we have, by the last hypothesis of the theorem,

 $E(|\xi_{n,j}|^{3}|\mathscr{F}_{j-1})\leq E(|X_{j}|^{3}|\mathscr{F}_{j-1})\leq K$ (3)

We use this inequality later in the proof. Now, using that $E(X_{j}|\mathscr{F}_{j-1})=0$ and $E(X_{j}^{2}|\mathscr{F}_{j-1})=1$,

 $\displaystyle\phi_{n,j}(u)$ $\displaystyle=1+iu\frac{1}{\sqrt{n}}E(X_{j}|\mathscr{F}_{j-1})-\frac{u^{2}}{2n}E(X_{j}^{2}|\mathscr{F}_{j-1})-\frac{iu^{3}}{6n^{3/2}}E(\xi_{n,j}^{3}|\mathscr{F}_{j-1})$ $\displaystyle=1-\frac{u^{2}}{2n}-\frac{iu^{3}}{6n^{3/2}}E(\xi_{n,j}^{3}|\mathscr{F}_{j-1}).$

For $p\geq 1$,

 $\displaystyle E(e^{iu\frac{1}{\sqrt{n}}S_{p}})$ $\displaystyle=E(e^{iu\frac{1}{\sqrt{n}}S_{p-1}}e^{iu\frac{1}{\sqrt{n}}X_{p}})$ $\displaystyle=E(E(e^{iu\frac{1}{\sqrt{n}}S_{p-1}}e^{iu\frac{1}{\sqrt{n}}X_{p}}|\mathscr{F}_{p-1}))$ $\displaystyle=E(e^{iu\frac{1}{\sqrt{n}}S_{p-1}}E(e^{iu\frac{1}{\sqrt{n}}X_{p}}|\mathscr{F}_{p-1}))$ $\displaystyle=E(e^{iu\frac{1}{\sqrt{n}}S_{p-1}}\phi_{n,p}(u))$ $\displaystyle=E\left(e^{iu\frac{1}{\sqrt{n}}S_{p-1}}\left(1-\frac{u^{2}}{2n}-\frac{iu^{3}}{6n^{3/2}}E\left(\xi_{n,p}^{3}|\mathscr{F}_{p-1}\right)\right)\right),$

which we write as

 $E\left(e^{i\frac{u}{\sqrt{n}}S_{p}}-\left(1-\frac{u^{2}}{2n}\right)e^{i\frac{u}{\sqrt{n}}S_{p-1}}\right)=-E\left(e^{i\frac{u}{\sqrt{n}}S_{p-1}}\frac{iu^{3}}{6n^{3/2}}E\left(\xi_{n,p}^{3}|\mathscr{F}_{p-1}\right)\right).$

Now using (3) we get

 $\begin{split}&\displaystyle\left|E\left(e^{i\frac{u}{\sqrt{n}}S_{p}}-\left(1-\frac{u^{2}}{2n}\right)e^{i\frac{u}{\sqrt{n}}S_{p-1}}\right)\right|\\ \displaystyle\leq&\displaystyle E\left(\left|e^{i\frac{u}{\sqrt{n}}S_{p-1}}\frac{iu^{3}}{6n^{3/2}}E\left(\xi_{n,p}^{3}|\mathscr{F}_{p-1}\right)\right|\right)\\ \displaystyle=&\displaystyle E\left(\frac{|u|^{3}}{6n^{3/2}}\left|E\left(\xi_{n,p}^{3}|\mathscr{F}_{p-1}\right)\right|\right)\\ &\displaystyle\leq\frac{|u|^{3}}{6n^{3/2}}\cdot K.\end{split}$

Let $u\in\mathbb{R}$ and let $n=n(u)$ be large enough so that $0\leq 1-\frac{u^{2}}{2n}\leq 1$. For $1\leq p\leq n$, multiplying the above inequality by $\left(1-\frac{u^{2}}{2n}\right)^{n-p}$ yields

 $\left|\left(1-\frac{u^{2}}{2n}\right)^{n-p}E(e^{iu\frac{1}{\sqrt{n}}S_{p}})-\left(1-\frac{u^{2}}{2n}\right)^{n-p+1}E(e^{iu\frac{1}{\sqrt{n}}S_{p-1}})\right|\leq K\frac{|u|^{3}}{6n^{3/2}}.$ (4)

Now, because $\sum_{p=1}^{n}(a_{p}-a_{p-1})=a_{n}-a_{0}$ and $S_{0}=0$,

 $\begin{split}&\displaystyle\sum_{p=1}^{n}\left(\left(1-\frac{u^{2}}{2n}\right)^{n-p}E(e^{iu\frac{1}{\sqrt{n}}S_{p}})-\left(1-\frac{u^{2}}{2n}\right)^{n-(p-1)}E(e^{iu\frac{1}{\sqrt{n}}S_{p-1}})\right)\\ \displaystyle=&\displaystyle E(e^{iu\frac{1}{\sqrt{n}}S_{n}})-\left(1-\frac{u^{2}}{2n}\right)^{n}E(e^{iu\frac{1}{\sqrt{n}}S_{0}})\\ \displaystyle=&\displaystyle E(e^{iu\frac{1}{\sqrt{n}}S_{n}})-\left(1-\frac{u^{2}}{2n}\right)^{n}.\end{split}$

Using this with (4) gives

 $\left|E(e^{iu\frac{1}{\sqrt{n}}S_{n}})-\left(1-\frac{u^{2}}{2n}\right)^{n}\right|\leq n\cdot K\frac{|u|^{3}}{6n^{3/2}}=K\frac{|u|^{3}}{6n^{1/2}}.$
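This bound can be verified directly in the i.i.d. Rademacher special case (an assumed illustration, not the general setting), where $K=1$ and $E(e^{iu\frac{1}{\sqrt{n}}S_{n}})=\cos(u/\sqrt{n})^{n}$ exactly:

```python
import math

# For i.i.d. Rademacher X_j (so K = 1), the characteristic function of
# S_n/sqrt(n) is exactly cos(u/sqrt(n))^n, so the bound
#   |cos(u/sqrt(n))^n - (1 - u^2/(2n))^n| <= |u|^3 / (6 sqrt(n))
# can be checked directly in this special case.

def check(u, n):
    exact = math.cos(u / math.sqrt(n)) ** n
    approx = (1.0 - u * u / (2.0 * n)) ** n
    return abs(exact - approx) <= abs(u) ** 3 / (6.0 * math.sqrt(n)) + 1e-12

for n in (10, 100, 1000):
    for u in (0.5, 1.0, 2.0):
        assert check(u, n)
```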

But if $|a_{n}-b_{n}|\leq c_{n}$, $c_{n}\to 0$, and $b_{n}\to b$, then $a_{n}\to b$. As

 $\lim_{n\to\infty}\left(1-\frac{u^{2}}{2n}\right)^{n}=e^{-\frac{u^{2}}{2}}$

and $K\frac{|u|^{3}}{6n^{1/2}}\to 0$, we therefore get that

 $E(e^{iu\frac{1}{\sqrt{n}}S_{n}})\to e^{-\frac{u^{2}}{2}}$

as $n\to\infty$.
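The limit $\left(1-\frac{u^{2}}{2n}\right)^{n}\to e^{-\frac{u^{2}}{2}}$ used in this step can be observed numerically (illustration only):

```python
import math

# Numerical illustration of (1 - u^2/(2n))^n -> exp(-u^2/2) as n -> infinity.

u = 1.5
target = math.exp(-u * u / 2.0)
errors = [abs((1.0 - u * u / (2.0 * n)) ** n - target)
          for n in (10, 100, 1000, 10_000)]

# the approximation error decreases toward 0 as n grows
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))
assert errors[-1] < 1e-4
```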

Let $\mu_{n}=\left(\frac{S_{n}}{\sqrt{n}}\right)_{*}P$ and let $\phi(u)=e^{-\frac{u^{2}}{2}}$. We have just established that $\tilde{\mu}_{n}\to\phi$ pointwise. The function $\phi$ is continuous at $0$, so Lévy’s continuity theorem tells us that there is a Borel probability measure $\mu$ on $\mathbb{R}$ such that $\phi=\tilde{\mu}$ and such that $\mu_{n}$ converges narrowly to $\mu$. But $\phi(u)=e^{-\frac{u^{2}}{2}}$ is the characteristic function of $\gamma_{1}$, so $\mu_{n}$ converges narrowly to $\gamma_{1}$; that is, $\frac{S_{n}}{\sqrt{n}}$ converges in distribution to a random variable $Z$ with $Z_{*}P=\gamma_{1}$. ∎