Martingales, Lévy’s continuity theorem, and the martingale central limit theorem

Jordan Bell
May 29, 2015

1 Introduction

In this note, any statement we make about filtrations and martingales is about filtrations and martingales indexed by the positive integers, rather than the nonnegative real numbers.

We take

inf ∅ = ∞,

and for m>n, we take

∑_{k=m}^{n} = 0.

(Defined rightly, these are not merely convenient ad hoc definitions.)

2 Conditional expectation

Let (Ω,𝒜,P) be a probability space and let ℬ be a sub-σ-algebra of 𝒜. For each f ∈ L¹(Ω,𝒜,P), there is some g:Ω→ℝ such that (i) g is ℬ-measurable and (ii) for each B∈ℬ, ∫_B g dP = ∫_B f dP; and if h:Ω→ℝ satisfies (i) and (ii) then h(ω) = g(ω) for almost all ω∈Ω.11 1 Manfred Einsiedler and Thomas Ward, Ergodic Theory: with a view towards Number Theory, p. 121, Theorem 5.1. We denote any g:Ω→ℝ satisfying (i) and (ii) by E(f|ℬ), called the conditional expectation of f with respect to ℬ. In other words, E(f|ℬ) is the unique element of L¹(Ω,ℬ,P) such that for each B∈ℬ,

BE(f|)𝑑P=Bf𝑑P.

The map f ↦ E(f|ℬ) satisfies the following:

1. f ↦ E(f|ℬ) is a positive linear operator L¹(Ω,𝒜,P) → L¹(Ω,ℬ,P) with norm 1.

2. If f ∈ L¹(Ω,𝒜,P) and g ∈ L^∞(Ω,ℬ,P), then for almost all ω∈Ω,

E(gf|ℬ)(ω) = g(ω)E(f|ℬ)(ω).

3. If 𝒞 is a sub-σ-algebra of ℬ, then for almost all ω∈Ω,

E(E(f|ℬ)|𝒞)(ω) = E(f|𝒞)(ω).

4. If f ∈ L¹(Ω,ℬ,P) then for almost all ω∈Ω,

E(f|ℬ)(ω) = f(ω).

5. If f ∈ L¹(Ω,𝒜,P), then for almost all ω∈Ω,

|E(f|ℬ)(ω)| ≤ E(|f| | ℬ)(ω).

6. If f ∈ L¹(Ω,𝒜,P) is independent of ℬ, then for almost all ω∈Ω,

E(f|ℬ)(ω) = E(f).
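When ℬ is generated by a finite partition of Ω, the conditional expectation can be computed explicitly: on each block of the partition, E(f|ℬ) is the P-weighted average of f over that block. The following Python sketch (the function name `cond_exp` and the four-point sample space are our own illustrative choices, not from the text) computes it and exhibits the defining property ∫_B g dP = ∫_B f dP.

```python
import numpy as np

def cond_exp(f, P, partition):
    """Conditional expectation E(f | B) when B is generated by a finite
    partition: on each block, E(f|B) equals the P-weighted average of f
    over that block."""
    g = np.empty_like(f, dtype=float)
    for block in partition:
        block = np.asarray(block)
        w = P[block].sum()
        g[block] = (f[block] * P[block]).sum() / w
    return g

# Ω = {0,1,2,3} with uniform P; B generated by the partition {{0,1},{2,3}}.
P = np.full(4, 0.25)
f = np.array([1.0, 3.0, 5.0, 9.0])
g = cond_exp(f, P, [[0, 1], [2, 3]])
# g is constant on each block, and integrates like f over each block.
```

Since g averages f block by block, integrating g over any B∈ℬ gives the same value as integrating f, which is exactly condition (ii) above.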

3 Filtrations

A filtration of a σ-algebra 𝒜 is a sequence ℱn, n≥1, of sub-σ-algebras of 𝒜 such that ℱm ⊂ ℱn if m≤n. We set ℱ0 = {∅,Ω}.

A sequence of random variables ξn:(Ω,𝒜,P)→ℝ is said to be adapted to the filtration ℱn if for each n, ξn is ℱn-measurable.

Let ξn:(Ω,𝒜,P)→ℝ, n≥1, be a sequence of random variables. The natural filtration of 𝒜 corresponding to ξn is

ℱn = σ(ξ1,…,ξn).

It is apparent that ℱn is a filtration and that the sequence ξn is adapted to ℱn.

4 Martingales

Let ℱn be a filtration of a σ-algebra 𝒜 and let ξn:(Ω,𝒜,P)→ℝ be a sequence of random variables. We say that ξn is a martingale with respect to ℱn if (i) the sequence ξn is adapted to the filtration ℱn, (ii) for each n, ξn ∈ L¹(P), and (iii) for each n, for almost all ω∈Ω,

E(ξ_{n+1}|ℱn)(ω) = ξn(ω).

In particular,

E(ξ1) = E(ξ2) = ⋯,

i.e.

E(ξm) = E(ξn),  m≤n.

We say that ξn is a submartingale with respect to ℱn if (i) and (ii) above are true, and if for each n, for almost all ω∈Ω,

E(ξ_{n+1}|ℱn)(ω) ≥ ξn(ω).

In particular,

E(ξ1) ≤ E(ξ2) ≤ ⋯,

i.e.

E(ξm) ≤ E(ξn),  m≤n.

We say that ξn is a supermartingale with respect to ℱn if (i) and (ii) above are true, and if for each n, for almost all ω∈Ω,

E(ξ_{n+1}|ℱn)(ω) ≤ ξn(ω).

In particular,

E(ξ1) ≥ E(ξ2) ≥ ⋯,

i.e.

E(ξm) ≥ E(ξn),  m≤n.

If we speak about a martingale without specifying a filtration, we mean a martingale with respect to the natural filtration corresponding to the sequence of random variables.

5 Stopping times

Let ℱn be a filtration of a σ-algebra 𝒜. A stopping time with respect to ℱn is a function τ:Ω → {1,2,…}∪{∞} such that for each n≥1,

{ω∈Ω : τ(ω) = n} ∈ ℱn.

It is straightforward to check that a function τ:Ω → {1,2,…}∪{∞} is a stopping time with respect to ℱn if and only if for each n≥1,

{ω∈Ω : τ(ω) ≤ n} ∈ ℱn.

The following lemma shows that the time of first entry into a Borel subset of ℝ of a sequence of random variables adapted to a filtration is a stopping time.22 2 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 55, Exercise 3.9.

Lemma 1.

Let ξn be a sequence of random variables adapted to a filtration ℱn and let B ∈ ℬ_ℝ. Then

τ(ω) = inf{n≥1 : ξn(ω) ∈ B}

is a stopping time with respect to ℱn.

Proof.

Let n1. Then

{ω∈Ω : τ(ω)=n} = (⋂_{k=1}^{n−1} {ω∈Ω : ξk(ω) ∉ B}) ∩ {ω∈Ω : ξn(ω) ∈ B} = (⋂_{k=1}^{n−1} A_k^c) ∩ A_n,

where

A_k = {ω∈Ω : ξk(ω) ∈ B}.

Because the sequence ξk is adapted to the filtration ℱk, A_k^c ∈ ℱk and A_n ∈ ℱn, and because ℱk is a filtration, the right-hand side of the above belongs to ℱn. ∎

If ξn is a sequence of random variables adapted to a filtration ℱn and τ is a stopping time with respect to ℱn, for n≥1 we define ξ_{τ∧n}:Ω→ℝ by

ξ_{τ∧n}(ω) = ξ_{τ(ω)∧n}(ω),  ω∈Ω.

ξ_{τ∧n} is called the sequence ξn stopped at τ.33 3 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 55, Exercise 3.10.
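To make the definition concrete, here is a small Python sketch (the particular path and the level K are our own illustrative choices): it computes the first entry time τ into {−K, K} for one walk path, and the stopped sequence ξ_{τ∧n}, which follows the path until τ and is frozen afterwards.

```python
import numpy as np

# One path of a random walk xi_1, ..., xi_8 (an illustrative choice).
xi = np.cumsum([1, 1, -1, 1, 1, -1, 1, 1])   # path: 1, 2, 1, 2, 3, 2, 3, 4
K = 3

# tau = inf{n >= 1 : |xi_n| = K}, with inf of the empty set taken as infinity.
hits = np.where(np.abs(xi) == K)[0]
tau = int(hits[0]) + 1 if hits.size else float("inf")   # 1-based time

# The stopped sequence xi_{tau ∧ n}: equal to xi_n until tau, constant after.
n_idx = np.arange(1, len(xi) + 1)
stopped = xi[np.minimum(n_idx, tau) - 1]
```

On this path the walk first reaches level 3 at time 5, so the stopped sequence agrees with the walk up to time 5 and stays at 3 thereafter.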

Lemma 2.

ξ_{τ∧n}:(Ω,𝒜,P)→ℝ is a sequence of random variables adapted to the filtration ℱn.

Proof.

Let n≥1 and let B ∈ ℬ_ℝ. Because

{ω : ξ_{τ∧n}(ω) ∈ B, τ(ω) > n} = {ω : ξn(ω) ∈ B, τ(ω) > n}

and for any k≤n,

{ω : ξ_{τ∧n}(ω) ∈ B, τ(ω) = k} = {ω : ξk(ω) ∈ B, τ(ω) = k},

we get

{ω : ξ_{τ∧n}(ω) ∈ B} = {ω : ξn(ω) ∈ B, τ(ω) > n} ∪ ⋃_{k=1}^n {ω : ξk(ω) ∈ B, τ(ω) = k}.

But

{ξn ∈ B, τ > n} = {ξn ∈ B} ∩ {τ > n} ∈ ℱn

and

{ξk ∈ B, τ = k} = {ξk ∈ B} ∩ {τ = k} ∈ ℱk,

and therefore

{ξ_{τ∧n} ∈ B} ∈ ℱn.

In particular, {ξ_{τ∧n} ∈ B} ∈ 𝒜, namely, ξ_{τ∧n} is a random variable, and the above shows that this sequence is adapted to the filtration ℱn. ∎

We now prove that a stopped martingale is itself a martingale with respect to the same filtration.44 4 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 56, Proposition 3.2.

Theorem 3.

Let Fn be a filtration of a σ-algebra A and let τ be a stopping time with respect to Fn.

1. If ξn is a submartingale with respect to ℱn then so is ξ_{τ∧n}.

2. If ξn is a supermartingale with respect to ℱn then so is ξ_{τ∧n}.

3. If ξn is a martingale with respect to ℱn then so is ξ_{τ∧n}.

Proof.

For n1, define

αn(ω) = 1 if τ(ω) ≥ n, and αn(ω) = 0 if τ(ω) < n;

we remark that τ(ω) ≥ n if and only if τ(ω) > n−1, and τ(ω) < n if and only if τ(ω) ≤ n−1. For B ∈ ℬ_ℝ, (i) if 0 ∉ B and 1 ∉ B then

{ω∈Ω : αn(ω) ∈ B} = ∅ ∈ ℱ_{n−1},

(ii) if 0 ∈ B and 1 ∈ B then

{ω∈Ω : αn(ω) ∈ B} = Ω ∈ ℱ_{n−1},

(iii) if 0 ∈ B and 1 ∉ B then

{ω∈Ω : αn(ω) ∈ B} = {ω∈Ω : αn(ω) = 0} = {ω∈Ω : τ(ω) ≤ n−1} ∈ ℱ_{n−1},

and (iv) if 1 ∈ B and 0 ∉ B then

{ω∈Ω : αn(ω) ∈ B} = {ω∈Ω : αn(ω) = 1} = {ω∈Ω : τ(ω) > n−1} ∈ ℱ_{n−1}.

Therefore {αn ∈ B} ∈ ℱ_{n−1}.

Set ξ0 = 0, and we check that

ξ_{τ∧n} = ∑_{k=1}^n αk(ξk − ξ_{k−1}).

It is apparent from this expression that if ξn is adapted to ℱn then ξ_{τ∧n} is adapted to ℱn, and that if each ξn belongs to L¹(P) then each ξ_{τ∧n} belongs to L¹(P). As each of α1,…,α_{n+1} is ℱn-measurable and bounded,

E(ξ_{τ∧(n+1)}|ℱn) = ∑_{k=1}^{n+1} E(αk(ξk − ξ_{k−1})|ℱn) = ∑_{k=1}^{n+1} αk E(ξk − ξ_{k−1}|ℱn). (1)

Suppose that ξn is a submartingale. By (1),

E(ξ_{τ∧(n+1)}|ℱn) = ∑_{k=1}^n αk(ξk − ξ_{k−1}) + α_{n+1}E(ξ_{n+1}|ℱn) − α_{n+1}ξn ≥ ξ_{τ∧n} + α_{n+1}ξn − α_{n+1}ξn = ξ_{τ∧n},

which shows that ξ_{τ∧n} is a submartingale; the statement that E(ξ_{τ∧(n+1)}|ℱn) ≥ ξ_{τ∧n} means that E(ξ_{τ∧(n+1)}|ℱn)(ω) ≥ ξ_{τ∧n}(ω) for almost all ω∈Ω. The supermartingale and martingale cases are proved in the same way. ∎

We now prove the optional stopping theorem.55 5 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 58, Theorem 3.1.

Theorem 4 (Optional stopping theorem).

Let Fn be a filtration of a σ-algebra A, let ξn be a martingale with respect to Fn, and let τ be a stopping time with respect to Fn. Suppose that:

1. For almost all ω∈Ω, τ(ω) < ∞.

2. ξτ ∈ L¹(Ω,𝒜,P).

3. E(ξn 1_{τ>n}) → 0 as n→∞.

Then

E(ξτ) = E(ξ1).
Proof.

For each n, Ω = {τ≤n} ∪ {τ>n}, and therefore

ξτ = ξ_{τ∧n} + ξτ 1_{τ>n} − ξn 1_{τ>n} = ξ_{τ∧n} + ∑_{k=n+1}^∞ ξk 1_{τ=k} − ξn 1_{τ>n}.

Theorem 3 tells us that ξ_{τ∧n} is a martingale with respect to ℱn, and hence

E(ξ_{τ∧n}) = E(ξ_{τ∧1}) = E(ξ1),

so

E(ξτ) = E(ξ1) + ∑_{k=n+1}^∞ E(ξk 1_{τ=k}) − E(ξn 1_{τ>n}). (2)

But as ξτ ∈ L¹(P),

∫_Ω ξτ(ω) dP(ω) = ∑_{k=1}^∞ ∫_{τ=k} ξk(ω) dP(ω) = ∑_{k=1}^∞ E(ξk 1_{τ=k}),

and the fact that this series converges means that ∑_{k=n+1}^∞ E(ξk 1_{τ=k}) → 0. With the hypothesis E(ξn 1_{τ>n}) → 0, as n→∞ we have

E(ξ1) + ∑_{k=n+1}^∞ E(ξk 1_{τ=k}) − E(ξn 1_{τ>n}) → E(ξ1).

But (2) is true for each n, so we get E(ξτ) = E(ξ1), proving the claim. ∎

Suppose that ηn is a sequence of independent random variables each with the Rademacher distribution:

P(ηn=1)=12,P(ηn=-1)=12.

Let ξn = ∑_{k=1}^n ηk and let ℱn = σ(η1,…,ηn). Because

ξ_{n+1}² = (ξn + η_{n+1})² = η_{n+1}² + 2η_{n+1}ξn + ξn²,

we have, as ξn is ℱn-measurable and belongs to L^∞(P) and as η_{n+1} is independent of the σ-algebra ℱn,

E(ξ_{n+1}² − (n+1) | ℱn) = E(η_{n+1}² + 2η_{n+1}ξn + ξn² − (n+1) | ℱn) = E(η_{n+1}²) + 2ξn E(η_{n+1}) + ξn² − (n+1) = 1 + 0 + ξn² − (n+1) = ξn² − n.

Therefore, ξn² − n is a martingale with respect to ℱn.

Let K be a positive integer and let

τ = inf{n≥1 : |ξn| = K}.

Namely, τ is the time of first entry into the Borel subset {−K,K} of ℝ, hence by Lemma 1 is a stopping time with respect to the filtration ℱn. With some work,66 6 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 59, Example 3.7. one shows that (i) P(τ>2Kn)→0 as n→∞, (ii) E(|ξτ²−τ|)<∞, and (iii) E((ξn²−n)1_{τ>n})→0 as n→∞. Then we can apply the optional stopping theorem to the martingale ξn² − n: we get that

E(ξτ2-τ)=E(ξ12-1)=E(ξ12)-1=E(η12)-1=0.

Hence

E(τ)=E(ξτ2).

But |ξτ|=K, so ξτ2=K2, hence

E(τ)=E(K2)=K2.
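The conclusion E(τ) = K² is easy to check by simulation. The following Python sketch (the level K, sample size, and seed are our arbitrary choices) averages the exit time of the ±1 walk from (−K, K):

```python
import numpy as np

rng = np.random.default_rng(1)
K, trials = 3, 20000
taus = np.empty(trials)
for i in range(trials):
    pos, n = 0, 0
    while abs(pos) < K:                        # walk until |xi_n| = K
        pos += 1 if rng.random() < 0.5 else -1
        n += 1
    taus[i] = n
est = taus.mean()   # should be close to K**2 = 9
```

With 20000 trials the sample mean of τ lands within a few hundredths of 9, in line with the optional stopping computation.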

6 Maximal inequalities

We now prove Doob’s maximal inequality.77 7 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 68, Proposition 4.1.

Theorem 5 (Doob’s maximal inequality).

Suppose that ℱn is a filtration of a σ-algebra 𝒜, that ξn is a submartingale with respect to ℱn, and that for each n, ξn ≥ 0. Then for each n≥1 and λ>0,

λ P(max_{1≤k≤n} ξk ≥ λ) ≤ E(ξn 1_{max_{1≤k≤n} ξk ≥ λ}).
Proof.

Define ζn(ω) = max_{1≤k≤n} ξk(ω), which is ℱn-measurable, and define τ:Ω→{1,…,n} by

τ(ω) = min{1≤k≤n : ξk(ω) ≥ λ}

if there is some 1≤k≤n for which ξk(ω) ≥ λ, and τ(ω) = n otherwise. For 1≤k<n,

{τ=k} = (⋂_{j=1}^{k−1} {ξj < λ}) ∩ {ξk ≥ λ} ∈ ℱk,

and {τ=n} is the union of a set of this form with ⋂_{j=1}^{n} {ξj < λ} ∈ ℱn, while for k>n,

{τ=k} = ∅ ∈ ℱk,

showing that τ is a stopping time with respect to the filtration ℱk.

For k1,

ξ_{k+1} − ξ_{τ∧(k+1)} = ∑_{j=1}^k 1_{τ=j}(ξ_{k+1} − ξ_{τ∧(k+1)}) = ∑_{j=1}^k 1_{τ=j}(ξ_{k+1} − ξj),

hence, because τ is a stopping time with respect to the filtration ℱk and because ξk is a submartingale with respect to this filtration,

E(ξ_{k+1} − ξ_{τ∧(k+1)} | ℱk) = ∑_{j=1}^k 1_{τ=j} E(ξ_{k+1} − ξj | ℱk) = ∑_{j=1}^k 1_{τ=j}(E(ξ_{k+1}|ℱk) − ξj) ≥ ∑_{j=1}^k 1_{τ=j}(ξk − ξj) = ∑_{j=1}^{k−1} 1_{τ=j}(ξk − ξj) = ξk − ξ_{τ∧k},

from which we have that the sequence ξk − ξ_{τ∧k} is a submartingale with respect to the filtration ℱk. Therefore

E(ξk − ξ_{τ∧k}) ≥ E(ξ1 − ξ_{τ∧1}) = E(ξ1) − E(ξ_{τ∧1}) = E(ξ1) − E(ξ1) = 0,

and so E(ξ_{τ∧k}) ≤ E(ξk). Because τ∧n = τ, this yields

E(ξτ) ≤ E(ξn).

We have

E(ξτ) = E(ξτ 1_{ζn≥λ}) + E(ξτ 1_{ζn<λ}).

If ω ∈ {ζn≥λ} then ξτ(ω) ≥ λ, and if ω ∈ {ζn<λ} then τ(ω) = n and so ξτ(ω) = ξn(ω). Therefore

E(ξτ) ≥ E(λ 1_{ζn≥λ}) + E(ξn 1_{ζn<λ}) = λP(ζn≥λ) + E(ξn 1_{ζn<λ}).

Combining this with E(ξτ) ≤ E(ξn),

λP(ζn≥λ) + E(ξn 1_{ζn<λ}) ≤ E(ξn).

But ξn = ξn 1_{ζn<λ} + ξn 1_{ζn≥λ}, hence

λP(ζn≥λ) ≤ E(ξn 1_{ζn≥λ}),

which proves the claim. ∎

The following is Doob’s L2 maximal inequality, which we prove using Doob’s maximal inequality.88 8 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 68, Theorem 4.1.

Theorem 6 (Doob’s L2 maximal inequality).

Suppose that ℱn is a filtration of a σ-algebra 𝒜 and that ξn is a submartingale with respect to ℱn such that for each n≥1, ξn ≥ 0 and ξn ∈ L²(P). Then for each n≥1,

E((max_{1≤k≤n} ξk)²) ≤ 4E(ξn²).
Proof.

Define ζn(ω) = max_{1≤k≤n} ξk(ω). It is a fact that if η ∈ L²(P) and η ≥ 0 then

E(η²) = 2∫₀^∞ t P(η ≥ t) dt.

Using this, Doob's maximal inequality, Fubini's theorem, and the Cauchy-Schwarz inequality,

E(ζn²) = 2∫₀^∞ t P(ζn ≥ t) dt ≤ 2∫₀^∞ E(ξn 1_{ζn≥t}) dt = 2∫₀^∞ (∫_{ζn≥t} ξn(ω) dP(ω)) dt = 2∫_Ω (∫₀^{ζn(ω)} dt) ξn(ω) dP(ω) = 2∫_Ω ζn(ω) ξn(ω) dP(ω) ≤ 2(E(ζn²))^{1/2}(E(ξn²))^{1/2}.

If E(ζn2)=0 the claim is immediate. Otherwise, we divide this inequality by (E(ζn2))1/2 and obtain

(E(ζn2))1/22(E(ξn2))1/2,

and so

E(ζn2)4E(ξn2),

proving the claim. ∎
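As a numerical sanity check on Theorem 6, this Python sketch (walk length, sample size, and seed are our arbitrary choices) uses the nonnegative submartingale ξk = |Sk|, where Sk is a ±1 random walk, and compares Monte Carlo estimates of both sides of the inequality:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 5000
steps = rng.choice([-1, 1], size=(trials, n))
S = np.cumsum(steps, axis=1)        # random walk paths S_1, ..., S_n
xi = np.abs(S)                      # |S_k| is a nonnegative submartingale
lhs = (xi.max(axis=1) ** 2).mean()  # estimates E((max_{k<=n} xi_k)^2)
rhs = 4 * (xi[:, -1] ** 2).mean()   # estimates 4 E(xi_n^2) = 4n
```

In fact the left-hand side here is well below the bound: the constant 4 in Doob's L² inequality is sharp in general, but this particular submartingale does not attain it.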

7 Upcrossings

Suppose that ξn is a sequence of random variables that is adapted to a filtration ℱn and let a<b be real numbers. Define

τ0 = 0,

and by induction for m≥1,

σm(ω) = inf{k ≥ τ_{m−1}(ω) : ξk(ω) ≤ a}

and

τm(ω) = inf{k ≥ σm(ω) : ξk(ω) ≥ b},

where inf ∅ = ∞. For each m, τm and σm are each stopping times with respect to the filtration ℱk. For n≥0 we define the number of upcrossings of [a,b] by time n,

Un[a,b](ω) = sup{m≥0 : τm(ω) ≤ n}.

For x∈ℝ, we write

x⁻ = max{0,−x} = −min{0,x},

namely, the negative part of x.
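For a finite path, Un[a,b] is computed by a simple scan: wait until the path reaches ≤ a, then count when it subsequently reaches ≥ b, and repeat. A Python sketch (the function name and the example path are our own illustrative choices):

```python
def upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by the finite path:
    times the path goes from a value <= a to a later value >= b."""
    count = 0
    below = False          # True once the path has dipped to <= a
    for x in path:
        if not below:
            if x <= a:
                below = True
        else:
            if x >= b:
                count += 1
                below = False
    return count

path = [0.5, -0.2, 1.3, 0.1, -0.4, 1.1, 0.9]
count = upcrossings(path, 0.0, 1.0)   # two completed upcrossings of [0, 1]
```

Here the path completes two upcrossings of [0,1]: from −0.2 up to 1.3, and from −0.4 up to 1.1.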

We now prove the upcrossings inequality.

Theorem 7 (Upcrossings inequality).

If ξn, n≥1, is a supermartingale with respect to a filtration ℱn and a<b, then for each n≥1,

(b−a)E(Un[a,b]) ≤ E((ξn−a)⁻).
Proof.

For n≥1 and ω∈Ω, and writing N = Un[a,b](ω), for which N≤n, we have

∑_{m=1}^n (ξ_{τm∧n}(ω) − ξ_{σm∧n}(ω)) = ∑_{m=1}^N (ξ_{τm∧n}(ω) − ξ_{σm∧n}(ω)) + ξ_{τ_{N+1}∧n}(ω) − ξ_{σ_{N+1}∧n}(ω) + ∑_{m=N+2}^n (ξ_{τm∧n}(ω) − ξ_{σm∧n}(ω)) = ∑_{m=1}^N (ξ_{τm}(ω) − ξ_{σm}(ω)) + ξn(ω) − ξ_{σ_{N+1}∧n}(ω) + ∑_{m=N+2}^n (ξn(ω) − ξn(ω)) = ∑_{m=1}^N (ξ_{τm}(ω) − ξ_{σm}(ω)) + 1_{σ_{N+1}≤n}(ω)(ξn(ω) − ξ_{σ_{N+1}}(ω)) ≥ ∑_{m=1}^N (b−a) + 1_{σ_{N+1}≤n}(ω)(ξn(ω) − ξ_{σ_{N+1}}(ω)).

Because ξ_{σ_{N+1}}(ω) ≤ a, we have

(b−a)N ≤ 1_{σ_{N+1}≤n}(ω)(a − ξn(ω)) + ∑_{m=1}^n (ξ_{τm∧n}(ω) − ξ_{σm∧n}(ω)).

Moreover, pointwise

1_{σ_{N+1}≤n}(ω)(a − ξn(ω)) ≤ max{0, a − ξn(ω)} = (ξn(ω) − a)⁻,

since the left-hand side is either 0 or a − ξn(ω), and each of these is at most max{0, a − ξn(ω)}. Thus

(b−a)E(Un[a,b]) ≤ E((ξn−a)⁻) + ∑_{m=1}^n E(ξ_{τm∧n} − ξ_{σm∧n}).

Using that ξn is a supermartingale and that σm ≤ τm, for each 1≤m≤n,

E(ξ_{τm∧n} − ξ_{σm∧n}) ≤ 0.

Therefore

(b−a)E(Un[a,b]) ≤ E((ξn−a)⁻). ∎

8 Doob’s martingale convergence theorem

We now use the upcrossings inequality to prove Doob's martingale convergence theorem.1010 10 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 71, Theorem 4.2.

Theorem 8 (Doob’s martingale convergence theorem).

Suppose that ξn, n≥1, is a supermartingale with respect to a filtration ℱn and that

M = sup_n E(|ξn|) < ∞.

Then there is some ξ ∈ L¹(Ω,𝒜,P) such that for almost all ω∈Ω,

lim_n ξn(ω) = ξ(ω)

and with E(|ξ|) ≤ M.

Proof.

For any a<b and n≥1, the upcrossings inequality tells us that

E(Un[a,b]) ≤ E((ξn−a)⁻)/(b−a) ≤ E(|ξn−a|)/(b−a) ≤ (E(|ξn|)+|a|)/(b−a) ≤ (M+|a|)/(b−a).

For each ω∈Ω, the sequence Un[a,b](ω) ∈ [0,∞) is nondecreasing in n, so by the monotone convergence theorem,

E(lim_n Un[a,b]) = lim_n E(Un[a,b]) ≤ (M+|a|)/(b−a).

This implies that

P(ω∈Ω : lim_n Un[a,b](ω) < ∞) = 1.

Let

A = ⋂_{a,b∈ℚ, a<b} {ω∈Ω : lim_n Un[a,b](ω) < ∞}.

This is an intersection of countably many sets each with measure 1, so P(A) = 1.

Let

B = {ω∈Ω : liminf_n ξn(ω) < limsup_n ξn(ω)}.

If ω∈B, then there are a,b∈ℚ, a<b, such that

liminf_n ξn(ω) < a < b < limsup_n ξn(ω).

It follows from this that lim_n Un[a,b](ω) = ∞. Thus ω∉A, so B∩A = ∅, and because P(A) = 1 we get P(B) = 0.

We define ξ:Ω→ℝ by

ξ(ω) = lim_n ξn(ω) for ω∉B, and ξ(ω) = 0 for ω∈B,

which is Borel measurable. Furthermore, since |ξ| = liminf_n |ξn| almost everywhere, by Fatou's lemma we obtain

E(|ξ|) =E(lim infn|ξn|)
lim infnE(|ξn|)
supnE(|ξn|)
=M.
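Doob's theorem can be watched in action with a Polya urn (our choice of example, not from the text): draw a ball, return it together with another of the same colour. The proportion of red balls is a martingale with values in [0,1], hence bounded in L¹, so it converges almost surely. A Python sketch (urn composition, horizon, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
red, total = 1, 2          # start with 1 red and 1 black ball
props = []
for _ in range(20000):
    if rng.random() < red / total:   # a red ball is drawn
        red += 1                     # return it plus one more red
    total += 1
    props.append(red / total)

tail = np.array(props[-1000:])
spread = tail.max() - tail.min()     # small: the path has settled down
```

Each step changes the proportion by at most 1/total, and the martingale property kills any drift, so late in the run the path barely moves; the limit itself is random (uniform on [0,1] for this starting composition).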

9 Uniform integrability

Let ξ:(Ω,𝒜,P)→ℝ be a random variable. It is a fact1111 11 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 73, Exercise 4.3. that ξ ∈ L¹ if and only if for each ϵ>0 there is some M such that

∫_{|ξ|>M} |ξ| dP < ϵ.

(One's instinct might be to try to use the Cauchy-Schwarz inequality to prove this. This doesn't work.) Thus, if ξn is a sequence in L¹(Ω,𝒜,P) then for each ϵ>0 there are Mn such that, for each n,

∫_{|ξn|>Mn} |ξn| dP < ϵ.

A sequence of random variables ξn is said to be uniformly integrable if for each ϵ>0 there is some M such that, for each n,

∫_{|ξn|>M} |ξn| dP < ϵ.

If a sequence ξn is uniformly integrable, then there is some M such that for each n,

∫_{|ξn|>M} |ξn| dP < 1,

and so

E(|ξn|) = ∫_{|ξn|≤M} |ξn| dP + ∫_{|ξn|>M} |ξn| dP < ∫_{|ξn|≤M} M dP + 1 ≤ M + 1.

The following lemma states that the conditional expectations of an integrable random variable with respect to a filtration form a uniformly integrable martingale with respect to that filtration.1212 12 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 75, Exercise 4.5.

Lemma 9.

Suppose that ξ ∈ L¹(Ω,𝒜,P) and that ℱn is a filtration of 𝒜. Then E(ξ|ℱn) is a martingale with respect to ℱn and is uniformly integrable.

We now prove that a uniformly integrable supermartingale converges in L1.1313 13 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 76, Theorem 4.3.

Theorem 10.

Suppose that ξn is a supermartingale with respect to a filtration ℱn, and that the sequence ξn is uniformly integrable. Then there is some ξ ∈ L¹(Ω,𝒜,P) such that ξn → ξ in L¹.

Proof.

Because the sequence ξn is uniformly integrable, there is some M such that for each n1,

E(|ξn|) ≤ M + 1.

Thus, because ξn is a supermartingale, Doob’s martingale convergence theorem tells us that there is some ξL1(Ω,𝒜,P) such that for almost all ωΩ,

limnξn(ω)=ξ(ω).

Because ξn is uniformly integrable and converges almost surely to ξ, the Vitali convergence theorem1414 14 V. I. Bogachev, Measure Theory, volume I, p. 268, Theorem 4.5.4; http://individual.utoronto.ca/jordanbell/notes/L0.pdf, p. 8, Theorem 9. tells us that ξnξ in L1. ∎

The above theorem shows in particular that a uniformly integrable martingale converges to some limit in L1. The following theorem shows that the terms of the sequence are equal to the conditional expectations of this limit with respect to the natural filtration.1515 15 Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, p. 77, Theorem 4.4.

Theorem 11.

Suppose that a sequence of random variables ξn is uniformly integrable and is a martingale with respect to its natural filtration

ℱn = σ(ξ1,…,ξn).

Then there is some ξ ∈ L¹(Ω,𝒜,P) such that ξn → ξ in L¹ and such that for each n≥1, for almost all ω∈Ω,

ξn(ω) = E(ξ|ℱn)(ω).
Proof.

By Theorem 10, there is some ξ ∈ L¹(Ω,𝒜,P) such that ξn → ξ in L¹. The hypothesis that the sequence ξn is a martingale with respect to ℱn tells us that for n≥1 and for any m≥n,

E(ξm|ℱn) = ξn,

and so for A∈ℱn,

∫_A ξm dP = ∫_A E(ξm|ℱn) dP = ∫_A ξn dP.

Thus

|∫_A (ξn − ξ) dP| = |∫_A (ξm − ξ) dP| ≤ ∫_A |ξm − ξ| dP ≤ E(|ξm − ξ|).

But E(|ξm − ξ|) → 0 as m→∞. Since m does not appear in the left-hand side, we have

|∫_A (ξn − ξ) dP| = 0,

and thus

∫_A ξn dP = ∫_A ξ dP.

But E(ξ|ℱn) is the unique element of L¹(Ω,ℱn,P) such that for each A∈ℱn,

∫_A E(ξ|ℱn) dP = ∫_A ξ dP,

and because ξn satisfies this, we get that ξn = E(ξ|ℱn) in L¹, i.e., for almost all ω∈Ω,

ξn(ω) = E(ξ|ℱn)(ω),

proving the claim. ∎

10 Lévy’s continuity theorem

For a metrizable topological space X, we denote by 𝒫(X) the set of Borel probability measures on X. The narrow topology on 𝒫(X) is the coarsest topology such that for each f ∈ Cb(X), the map

μ ↦ ∫_X f dμ

is continuous 𝒫(X) → ℝ.

A subset ℋ of 𝒫(X) is called tight if for each ϵ>0 there is a compact subset Kϵ of X such that if μ ∈ ℋ then μ(X∖Kϵ) < ϵ, i.e. μ(Kϵ) > 1−ϵ. (An element μ of 𝒫(X) is called tight when {μ} is a tight subset of 𝒫(X).)

For a Borel probability measure μ on ℝ^d, we define its characteristic function μ̃:ℝ^d → ℂ by

μ̃(u) = ∫_{ℝ^d} e^{ix·u} dμ(x),  u ∈ ℝ^d.

μ̃ is bounded by 1 and is uniformly continuous. Because μ(ℝ^d) = 1,

μ̃(0) = 1.
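Since the characteristic function is an expectation, it can be estimated by Monte Carlo. This Python sketch (sample size, seed, and the evaluation point u are our own choices) compares the empirical characteristic function of a standard Gaussian sample with the closed form e^{−u²/2}, and checks μ̃(0) = 1:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(200000)          # sample from the standard Gaussian
u = 1.7
emp = np.exp(1j * u * x).mean()          # empirical characteristic function
exact = np.exp(-u**2 / 2)                # Gaussian characteristic function
at_zero = np.exp(1j * 0.0 * x).mean()    # mu~(0), always exactly 1
```

The empirical value agrees with the formula to within Monte Carlo error of order 1/√N.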
Lemma 12.

Let μ ∈ 𝒫(ℝ). For δ>0,

μ({x : |x| ≥ 2/δ}) ≤ (1/δ) ∫_{−δ}^{δ} (1 − μ̃(u)) du;

in particular, the right-hand side of this inequality is real.

Proof.

Using Fubini's theorem and the fact that for all real t, 1 − (sin t)/t ≥ 0,

∫_{−δ}^{δ} (1 − μ̃(u)) du = ∫_{−δ}^{δ} (∫_ℝ (1 − e^{ixu}) dμ(x)) du = ∫_ℝ (∫_{−δ}^{δ} (1 − e^{iux}) du) dμ(x) = ∫_ℝ (u − e^{iux}/(ix))|_{−δ}^{δ} dμ(x) = ∫_ℝ (2δ − e^{iδx}/(ix) + e^{−iδx}/(ix)) dμ(x) = 2δ ∫_ℝ (1 − sin(δx)/(δx)) dμ(x) ≥ 2δ ∫_{|δx|≥2} (1 − sin(δx)/(δx)) dμ(x) ≥ 2δ ∫_{|δx|≥2} (1 − 1/|δx|) dμ(x) ≥ 2δ ∫_{|δx|≥2} (1/2) dμ(x) = δ μ({x : |δx| ≥ 2}).

The following lemma gives a condition on the characteristic functions of a sequence of Borel probability measures on under which the sequence is tight.1616 16 Krishna B. Athreya and Soumendra N. Lahiri, Measure Theory and Probability Theory, p. 329, Lemma 10.3.3.

Lemma 13.

Suppose that μn ∈ 𝒫(ℝ) and that μ̃n converges pointwise to a function ϕ:ℝ→ℂ that is continuous at 0. Then the sequence μn is tight.

Proof.

Write ϕn = μ̃n. Because |ϕn| ≤ 1, for each δ>0, by the dominated convergence theorem we have

(1/δ) ∫_{−δ}^{δ} (1 − ϕn(t)) dt → (1/δ) ∫_{−δ}^{δ} (1 − ϕ(t)) dt.

On the other hand, that ϕ is continuous at 0 implies that for any ϵ>0 there is some η>0 such that when |t|<η, |ϕ(t)−1|<ϵ, and hence for δ<η,

|(1/δ) ∫_{−δ}^{δ} (1 − ϕ(t)) dt| ≤ 2 sup_{|t|≤δ} |1 − ϕ(t)| ≤ 2ϵ,

thus

(1/δ) ∫_{−δ}^{δ} (1 − ϕ(t)) dt → 0,  δ→0.

Let ϵ>0. There is some δ>0 for which

|(1/δ) ∫_{−δ}^{δ} (1 − ϕ(t)) dt| < ϵ.

Then there is some nδ such that when n ≥ nδ,

|(1/δ) ∫_{−δ}^{δ} (1 − ϕn(t)) dt − (1/δ) ∫_{−δ}^{δ} (1 − ϕ(t)) dt| < ϵ,

whence

|(1/δ) ∫_{−δ}^{δ} (1 − ϕn(t)) dt| < 2ϵ.

Lemma 12 then says

μn({x : |x| ≥ 2/δ}) ≤ (1/δ) ∫_{−δ}^{δ} (1 − ϕn(t)) dt < 2ϵ.

Furthermore, any Borel probability measure on a Polish space is tight (Ulam's theorem).1717 17 Alexander S. Kechris, Classical Descriptive Set Theory, p. 107, Theorem 17.11. Thus, for each 1 ≤ n < nδ, there is a compact set Kn for which μn(ℝ∖Kn) < ϵ. Let

Kϵ = K1 ∪ ⋯ ∪ K_{nδ−1} ∪ {x : |x| ≤ 2/δ},

which is a compact set, and for any n≥1,

μn(ℝ∖Kϵ) < 2ϵ,

showing that the sequence μn is tight. ∎

showing that the sequence μn is tight. ∎

For metrizable spaces X1,…,Xd, let X = ∏_{i=1}^d Xi and let πi:X→Xi be the projection map. We establish that if ℋ is a subset of 𝒫(X) such that for each 1≤i≤d the family of ith marginals of ℋ is tight, then ℋ itself is tight.1818 18 Luigi Ambrosio, Nicola Gigli, and Giuseppe Savare, Gradient Flows: In Metric Spaces and in the Space of Probability Measures, p. 119, Lemma 5.2.2; V. I. Bogachev, Measure Theory, volume II, p. 94, Lemma 7.6.6.

Lemma 14.

Let X1,…,Xd be metrizable topological spaces, let X = ∏_{i=1}^d Xi, and let ℋ ⊂ 𝒫(X). Suppose that for each 1≤i≤d,

ℋi = {πi∗μ : μ ∈ ℋ}

is a tight set in 𝒫(Xi). Then ℋ is a tight set in 𝒫(X).

Proof.

For μ ∈ ℋ, write μi = πi∗μ. Let ϵ>0 and take 1≤i≤d. Because ℋi is tight, there is a compact subset Ki of Xi such that for all μi ∈ ℋi,

μi(Xi∖Ki) < ϵ/d.

Let

K = ∏_{i=1}^d Ki = ⋂_{i=1}^d πi^{−1}(Ki).

Then for any μ ∈ ℋ,

μ(X∖K) = μ(X ∖ ⋂_{i=1}^d πi^{−1}(Ki)) = μ(⋃_{i=1}^d πi^{−1}(Ki)^c) = μ(⋃_{i=1}^d πi^{−1}(Xi∖Ki)) ≤ ∑_{i=1}^d μ(πi^{−1}(Xi∖Ki)) = ∑_{i=1}^d μi(Xi∖Ki) < ∑_{i=1}^d ϵ/d = ϵ,

which shows that ℋ is tight. ∎

We now prove Lévy’s continuity theorem, which we shall use to prove the martingale central limit theorem.1919 19 cf. Jean Jacod and Philip Protter, Probability Essentials, second ed., p. 167, Theorem 19.1.

Theorem 15 (Lévy’s continuity theorem).

Suppose that μn ∈ 𝒫(ℝ^d), n≥1.

1. If μ ∈ 𝒫(ℝ^d) and μn → μ narrowly, then for any u ∈ ℝ^d,

μ̃n(u) → μ̃(u),  n→∞.

2. If there is some ϕ:ℝ^d → ℂ to which μ̃n converges pointwise and ϕ is continuous at 0, then there is some μ ∈ 𝒫(ℝ^d) such that ϕ = μ̃ and such that μn → μ narrowly.

Proof.

Suppose that μn → μ narrowly. For each u ∈ ℝ^d, the function x ↦ e^{ix·u} is continuous ℝ^d → ℂ and is bounded, so

μ̃n(u) = ∫_{ℝ^d} e^{ix·u} dμn(x) → ∫_{ℝ^d} e^{ix·u} dμ(x) = μ̃(u).

Suppose that μ̃n converges pointwise to ϕ and that ϕ is continuous at 0. For 1≤i≤d, let πi:ℝ^d → ℝ be the projection map and define ιi:ℝ → ℝ^d by taking the ith entry of ιi(t) to be t and the other entries to be 0. Fix 1≤i≤d and write νn = πi∗μn ∈ 𝒫(ℝ), and for t ∈ ℝ we calculate

ν̃n(t) = ∫_ℝ e^{ist} dνn(s) = ∫_{ℝ^d} e^{iπi(x)t} dμn(x) = ∫_{ℝ^d} e^{ix·ιi(t)} dμn(x) = μ̃n(ιi(t)),

so ν̃n = μ̃n∘ιi. By hypothesis, ν̃n converges pointwise to ϕ∘ιi. Because ϕ is continuous at 0 ∈ ℝ^d, the function ϕ∘ιi is continuous at 0. Then Lemma 13 tells us that the sequence νn is tight. That is, for each 1≤i≤d, the set

{πi∗μn : n≥1}

is tight in 𝒫(ℝ). Thus Lemma 14 tells us that the set

{μn : n≥1}

is tight in 𝒫(ℝ^d).

Prokhorov's theorem2020 20 V. I. Bogachev, Measure Theory, volume II, p. 202, Theorem 8.6.2. states that if X is a Polish space, then a subset ℋ of 𝒫(X) is tight if and only if each sequence of elements of ℋ has a subsequence that converges narrowly to some element of 𝒫(X). Thus, there is a subsequence μ_{a(n)} of μn and some μ ∈ 𝒫(ℝ^d) such that μ_{a(n)} converges narrowly to μ. By the first part of the theorem, we get that μ̃_{a(n)} converges pointwise to μ̃. But by hypothesis μ̃n converges pointwise to ϕ, so ϕ = μ̃.

Finally we prove that μn → μ narrowly. Let μ_{b(n)} be a subsequence of μn. Because {μn : n≥1} is tight, Prokhorov's theorem tells us that there is a subsequence μ_{c(n)} of μ_{b(n)} that converges narrowly to some λ ∈ 𝒫(ℝ^d). By the first part of the theorem, μ̃_{c(n)} converges pointwise to λ̃. By hypothesis μ̃_{c(n)} converges pointwise to ϕ, so λ̃ = ϕ = μ̃. Because a Borel probability measure on ℝ^d is determined by its characteristic function, λ = μ. That is, any subsequence of μn itself has a subsequence that converges narrowly to μ, which implies that the sequence μn converges narrowly to μ. (For a sequence xn in a topological space X and x ∈ X, xn → x if and only if each subsequence of xn has a subsequence that converges to x.) ∎

11 Martingale central limit theorem

Let γd be the standard Gaussian measure on ℝ^d: γd has density

(2π)^{−d/2} e^{−|x|²/2}

with respect to Lebesgue measure on ℝ^d.

We now prove the martingale central limit theorem.2121 21 Jean Jacod and Philip Protter, Probability Essentials, second ed., p. 235, Theorem 27.7.

Theorem 16 (Martingale central limit theorem).

Suppose Xj is a sequence in L³(Ω,𝒜,P) satisfying the following, with ℱk = σ(X1,…,Xk):

1. E(Xj|ℱ_{j−1}) = 0.

2. E(Xj²|ℱ_{j−1}) = 1.

3. There is some K for which E(|Xj|³|ℱ_{j−1}) ≤ K.

Then Sn/√n converges in distribution to some random variable Z:Ω→ℝ with Z∗P = γ1, where

Sn = ∑_{j=1}^n Xj.
Proof.

For positive integers n and j, define

ϕ_{n,j}(u) = E(e^{iuXj/√n} | ℱ_{j−1}).

For each ω∈Ω, by Taylor's theorem there is some ξ_{n,j}(ω) between 0 and Xj(ω) such that

e^{iuXj(ω)/√n} = 1 + (iu/√n)Xj(ω) − (u²/2n)Xj(ω)² − (iu³/6n^{3/2})ξ_{n,j}(ω)³.

Because f ↦ E(f|ℱ_{j−1}) is a positive operator and |ξ_{n,j}|³ ≤ |Xj|³, we have, by the last hypothesis of the theorem,

E(|ξ_{n,j}|³ | ℱ_{j−1}) ≤ E(|Xj|³ | ℱ_{j−1}) ≤ K; (3)

we use this inequality later in the proof. Now, using that E(Xj|ℱ_{j−1}) = 0 and E(Xj²|ℱ_{j−1}) = 1,

ϕ_{n,j}(u) = 1 + (iu/√n)E(Xj|ℱ_{j−1}) − (u²/2n)E(Xj²|ℱ_{j−1}) − (iu³/6n^{3/2})E(ξ_{n,j}³|ℱ_{j−1}) = 1 − u²/2n − (iu³/6n^{3/2})E(ξ_{n,j}³|ℱ_{j−1}).

For p≥1,

E(e^{iuSp/√n}) = E(e^{iuS_{p−1}/√n} e^{iuXp/√n}) = E(E(e^{iuS_{p−1}/√n} e^{iuXp/√n} | ℱ_{p−1})) = E(e^{iuS_{p−1}/√n} E(e^{iuXp/√n} | ℱ_{p−1})) = E(e^{iuS_{p−1}/√n} ϕ_{n,p}(u)) = E(e^{iuS_{p−1}/√n} (1 − u²/2n − (iu³/6n^{3/2})E(ξ_{n,p}³|ℱ_{p−1}))),

which we write as

E(e^{iuSp/√n} − (1 − u²/2n) e^{iuS_{p−1}/√n}) = −E(e^{iuS_{p−1}/√n} (iu³/6n^{3/2}) E(ξ_{n,p}³|ℱ_{p−1})).

Now using (3) we get

|E(e^{iuSp/√n} − (1 − u²/2n) e^{iuS_{p−1}/√n})| ≤ E(|e^{iuS_{p−1}/√n} (iu³/6n^{3/2}) E(ξ_{n,p}³|ℱ_{p−1})|) = (|u|³/6n^{3/2}) E(|E(ξ_{n,p}³|ℱ_{p−1})|) ≤ (|u|³/6n^{3/2}) K.

Let u ∈ ℝ and let n = n(u) be large enough so that 0 ≤ 1 − u²/2n ≤ 1. For 1≤p≤n, multiplying the above inequality by (1 − u²/2n)^{n−p} yields

|(1 − u²/2n)^{n−p} E(e^{iuSp/√n}) − (1 − u²/2n)^{n−p+1} E(e^{iuS_{p−1}/√n})| ≤ K|u|³/6n^{3/2}. (4)

Now, because ∑_{p=1}^n (a_p − a_{p−1}) = a_n − a_0,

∑_{p=1}^n ((1 − u²/2n)^{n−p} E(e^{iuSp/√n}) − (1 − u²/2n)^{n−(p−1)} E(e^{iuS_{p−1}/√n})) = E(e^{iuSn/√n}) − (1 − u²/2n)^n E(e^{iuS0/√n}) = E(e^{iuSn/√n}) − (1 − u²/2n)^n.

Using this with (4) gives

|E(e^{iuSn/√n}) − (1 − u²/2n)^n| ≤ n · K|u|³/6n^{3/2} = K|u|³/6n^{1/2}.

But if |a_n − b_n| ≤ c_n, c_n → 0, and b_n → b, then a_n → b. As

lim_n (1 − u²/2n)^n = e^{−u²/2}

and K|u|³/6n^{1/2} → 0, we therefore get that

E(e^{iuSn/√n}) → e^{−u²/2}

as n→∞.

Let μn = (Sn/√n)∗P and let ϕ(u) = e^{−u²/2}. We have just established that μ̃n → ϕ pointwise. The function ϕ is continuous at 0, so Lévy's continuity theorem tells us that there is a Borel probability measure μ on ℝ such that ϕ = μ̃ and such that μn converges narrowly to μ. But ϕ(u) = e^{−u²/2} is the characteristic function of γ1, so we have that μn converges narrowly to γ1. ∎
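The Rademacher sequence of Section 5 satisfies the three hypotheses (E(ηj|ℱ_{j−1}) = 0, E(ηj²|ℱ_{j−1}) = 1, E(|ηj|³|ℱ_{j−1}) = 1), so the theorem predicts that Sn/√n is approximately standard normal for large n. A Python sketch checking this by simulation (n, sample size, and seed are our arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 400, 20000
steps = rng.choice([-1.0, 1.0], size=(trials, n))  # Rademacher X_j
Z = steps.sum(axis=1) / np.sqrt(n)                 # samples of S_n / sqrt(n)

mean, var = Z.mean(), Z.var()   # should be near 0 and 1
frac = (Z <= 1.0).mean()        # should be near Phi(1), about 0.8413
```

The empirical mean, variance, and distribution function agree with the standard Gaussian up to sampling error and the lattice discretization of Sn.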