The Lindeberg central limit theorem

Jordan Bell
May 29, 2015

1 Convergence in distribution

We denote by 𝒫(ℝ^d) the collection of Borel probability measures on ℝ^d. Unless we say otherwise, we use the narrow topology on 𝒫(ℝ^d): the coarsest topology such that for each f ∈ C_b(ℝ^d), the map

μ ↦ ∫_{ℝ^d} f dμ

is continuous 𝒫(ℝ^d) → ℝ. Because ℝ^d is a Polish space it follows that 𝒫(ℝ^d) is a Polish space.¹ (In fact, its topology is induced by the Prokhorov metric.²)

¹ Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker's Guide, third ed., p. 515, Theorem 15.15; http://individual.utoronto.ca/jordanbell/notes/narrow.pdf
² Onno van Gaans, Probability measures on metric spaces, http://www.math.leidenuniv.nl/~vangaans/jancol1.pdf; Bert Fristedt and Lawrence Gray, A Modern Approach to Probability Theory, p. 365, Theorem 25.

2 Characteristic functions

For μ ∈ 𝒫(ℝ^d), we define its characteristic function μ̃ : ℝ^d → ℂ by

μ̃(u) = ∫_{ℝ^d} e^{iu·x} dμ(x).
Theorem 1.

If μ ∈ 𝒫(ℝ) has finite kth moment, k ≥ 0, then, writing ϕ = μ̃:

  1. ϕ ∈ C^k(ℝ).

  2. ϕ^{(k)}(v) = i^k ∫ x^k e^{ivx} dμ(x).

  3. ϕ^{(k)} is uniformly continuous.

  4. |ϕ^{(k)}(v)| ≤ ∫ |x|^k dμ(x).

Proof.

For 0 ≤ l ≤ k, define f_l : ℝ → ℂ by

f_l(v) = ∫ x^l e^{ivx} dμ(x).

For h ≠ 0,

|x^l e^{ivx} (e^{ihx} − 1)/h| ≤ |x^l · x| = |x|^{l+1},

so by the dominated convergence theorem we have for 0 ≤ l ≤ k−1,

lim_{h→0} (f_l(v+h) − f_l(v))/h = lim_{h→0} ∫ x^l e^{ivx} (e^{ihx} − 1)/h dμ(x)
 = ∫ x^l e^{ivx} (lim_{h→0} (e^{ihx} − 1)/h) dμ(x)
 = i ∫ x^{l+1} e^{ivx} dμ(x).

That is,

f_l′ = i f_{l+1}.

And, by the dominated convergence theorem, for ε > 0 there is some δ > 0 such that if |w| < δ then

∫ |x|^k |e^{iwx} − 1| dμ(x) < ε,

hence if |v − u| < δ then

|f_k(v) − f_k(u)| = |∫ x^k e^{iux} (e^{i(v−u)x} − 1) dμ(x)|
 ≤ ∫ |x|^k |e^{i(v−u)x} − 1| dμ(x)
 < ε,

showing that f_k is uniformly continuous. As well,

|f_k(v)| ≤ ∫ |x|^k dμ(x).

But ϕ = f_0, i.e. ϕ^{(0)} = f_0, so

ϕ^{(1)} = f_0′ = i f_1, ϕ^{(2)} = (i f_1)′ = i² f_2, …, ϕ^{(k)} = i^k f_k. ∎
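As a sanity check, the identity ϕ^{(1)} = i f_1 can be verified numerically for a simple discrete measure; the two-point distribution below is a hypothetical choice made only for illustration, and the derivative of ϕ is approximated by a central finite difference.

```python
import cmath

# A hypothetical two-point measure mu with mu({1}) = 0.3, mu({-2}) = 0.7.
support = [(1.0, 0.3), (-2.0, 0.7)]

def char_fn(v):
    """phi(v) = integral of e^{ivx} dmu(x)."""
    return sum(p * cmath.exp(1j * v * x) for x, p in support)

def f(l, v):
    """f_l(v) = integral of x^l e^{ivx} dmu(x)."""
    return sum(p * x**l * cmath.exp(1j * v * x) for x, p in support)

# Central finite difference approximation of phi'(v).
h = 1e-6
v = 0.4
numeric = (char_fn(v + h) - char_fn(v - h)) / (2 * h)
exact = 1j * f(1, v)   # Theorem 1: phi' = i f_1
assert abs(numeric - exact) < 1e-8
```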

If ϕ ∈ C^k(ℝ), Taylor's theorem tells us that for each x ∈ ℝ,

ϕ(x) = Σ_{l=0}^{k−1} (ϕ^{(l)}(0)/l!) x^l + ∫_0^x ((x−t)^{k−1}/(k−1)!) ϕ^{(k)}(t) dt
 = Σ_{l=0}^{k} (ϕ^{(l)}(0)/l!) x^l + ∫_0^x ((x−t)^{k−1}/(k−1)!) (ϕ^{(k)}(t) − ϕ^{(k)}(0)) dt
 = Σ_{l=0}^{k} (ϕ^{(l)}(0)/l!) x^l + R_k(x),

and R_k(x) satisfies

|R_k(x)| ≤ (sup_{0≤u≤1} |ϕ^{(k)}(ux) − ϕ^{(k)}(0)|) |x|^k / k!.

Define θ_k : ℝ → ℂ by θ_k(0) = 0 and, for x ≠ 0,

θ_k(x) = (k!/x^k) R_k(x),

with which, for all x ∈ ℝ,

ϕ(x) = Σ_{l=0}^{k} (ϕ^{(l)}(0)/l!) x^l + (1/k!) θ_k(x) x^k.

Because R_k is continuous on ℝ, θ_k is continuous at each x ≠ 0. Moreover,

|θ_k(x)| ≤ sup_{0≤u≤1} |ϕ^{(k)}(ux) − ϕ^{(k)}(0)|,

and as ϕ^{(k)} is continuous it follows that θ_k is continuous at 0. Thus θ_k is continuous on ℝ.

Lemma 2.

If μ ∈ 𝒫(ℝ) has finite kth moment, k ≥ 0, and for 0 ≤ l ≤ k,

M_l = ∫ x^l dμ(x),

then there is a continuous function θ : ℝ → ℂ for which

μ̃(x) = Σ_{l=0}^{k} (i^l M_l / l!) x^l + (1/k!) θ(x) x^k.

The function θ satisfies

|θ(x)| ≤ sup_{0≤u≤1} |μ̃^{(k)}(ux) − μ̃^{(k)}(0)|.
Proof.

From Theorem 1, μ̃ ∈ C^k(ℝ) and

μ̃^{(l)}(0) = i^l ∫ x^l dμ(x) = i^l M_l.

Thus from what we worked out above with Taylor's theorem,

μ̃(x) = Σ_{l=0}^{k} (i^l M_l / l!) x^l + (1/k!) θ_k(x) x^k,

for which

|θ_k(x)| ≤ sup_{0≤u≤1} |μ̃^{(k)}(ux) − μ̃^{(k)}(0)|. ∎

For a ∈ ℝ and σ > 0, let

p(t, a, σ²) = (1/(σ√(2π))) exp(−(t−a)²/(2σ²)), t ∈ ℝ.

Let γ_{a,σ²} be the measure on ℝ whose density with respect to Lebesgue measure is p(·, a, σ²). We call γ_{a,σ²} a Gaussian measure. We calculate that the first moment of γ_{a,σ²} is a and that its variance is σ². We also calculate that

γ̃_{a,σ²}(x) = exp(iax − σ²x²/2).
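The formula for γ̃_{a,σ²} can be checked by Monte Carlo, averaging e^{ixt} over Gaussian samples; the parameters a, σ, the evaluation point x, and the sample size below are arbitrary choices for this sketch.

```python
import cmath
import random

random.seed(0)

a, sigma = 1.5, 2.0   # parameters of the Gaussian measure (arbitrary choice)
x = 0.7               # point at which to evaluate the characteristic function

# Monte Carlo estimate of the characteristic function of gamma_{a, sigma^2}:
# (1/N) * sum of e^{i x T_k} over N samples T_k ~ N(a, sigma^2).
N = 200_000
estimate = sum(cmath.exp(1j * x * random.gauss(a, sigma)) for _ in range(N)) / N

exact = cmath.exp(1j * a * x - 0.5 * sigma**2 * x**2)
assert abs(estimate - exact) < 0.02
```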

Lévy’s continuity theorem is the following.³

³ http://individual.utoronto.ca/jordanbell/notes/martingaleCLT.pdf, p. 19, Theorem 15.

Theorem 3 (Lévy’s continuity theorem).

Let μ_n be a sequence in 𝒫(ℝ^d).

  1. If μ ∈ 𝒫(ℝ^d) and μ_n → μ, then μ̃_n converges to μ̃ pointwise.

  2. If there is some function ϕ : ℝ^d → ℂ to which μ̃_n converges pointwise and ϕ is continuous at 0, then there is some μ ∈ 𝒫(ℝ^d) such that ϕ = μ̃ and such that μ_n → μ.

3 The Lindeberg condition, the Lyapunov condition, the Feller condition, and asymptotic negligibility

Let (Ω, ℱ, P) be a probability space and let X_n, n ≥ 1, be independent L² random variables. We specify when we impose other hypotheses on them; in particular, we specify if we suppose them to be identically distributed or to belong to L^p for p > 2.

For a random variable X, write

σ(X) = (Var(X))^{1/2} = (E(|X − E(X)|²))^{1/2}.

Write

σ_n = σ(X_n),

and, using that the X_n are independent,

s_n = σ(Σ_{j=1}^n X_j) = (Σ_{j=1}^n σ_j²)^{1/2}

and

η_n = E(X_n).

For n ≥ 1 and ε > 0, define

L_n(ε) = (1/s_n²) Σ_{j=1}^n E((X_j − η_j)² · 1_{|X_j − η_j| ≥ ε s_n})
 = (1/s_n²) Σ_{j=1}^n ∫_{|x − η_j| ≥ ε s_n} (x − η_j)² d((X_j)_*P)(x).

We say that the sequence X_n satisfies the Lindeberg condition if for each ε > 0,

lim_{n→∞} L_n(ε) = 0.

For example, if the sequence X_n is identically distributed, then s_n² = nσ₁², so

L_n(ε) = (1/(nσ₁²)) Σ_{j=1}^n ∫_{|x − η₁| ≥ ε n^{1/2} σ₁} (x − η₁)² d((X_1)_*P)(x)
 = (1/σ₁²) ∫_{|x − η₁| ≥ ε n^{1/2} σ₁} (x − η₁)² d((X_1)_*P)(x).

But if μ is a Borel probability measure on ℝ and f ∈ L¹(μ) and K_n is a sequence of compact sets that exhaust ℝ, then⁴

∫_{ℝ∖K_n} |f| dμ → 0, n → ∞.

⁴ V. I. Bogachev, Measure Theory, volume I, p. 125, Proposition 2.6.2.

Hence L_n(ε) → 0 as n → ∞, showing that X_n satisfies the Lindeberg condition.
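In the identically distributed case, L_n(ε) can be computed exactly when X₁ has finite support. The sketch below uses a hypothetical three-point distribution, chosen only for illustration, and shows L_n(ε) vanishing once ε n^{1/2} σ₁ exceeds the diameter of the support.

```python
import math

# An iid sequence with a hypothetical three-point distribution.
support = [(-1.0, 0.25), (0.0, 0.5), (2.0, 0.25)]
eta = sum(p * x for x, p in support)             # first moment eta_1
var = sum(p * (x - eta)**2 for x, p in support)  # variance sigma_1^2

def lindeberg(n, eps):
    """L_n(eps) in the iid case: (1/sigma_1^2) times the integral of
    (x - eta_1)^2 over |x - eta_1| >= eps * sqrt(n) * sigma_1."""
    cut = eps * math.sqrt(n) * math.sqrt(var)
    return sum(p * (x - eta)**2 for x, p in support
               if abs(x - eta) >= cut) / var

# L_n(eps) is eventually 0: once eps*sqrt(n)*sigma_1 exceeds the
# bounded support, the region of integration contains no mass.
assert lindeberg(1, 0.5) > 0
assert lindeberg(100, 0.5) == 0.0
```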

We say that the sequence X_n satisfies the Lyapunov condition if there is some δ > 0 such that the X_n are L^{2+δ} and

lim_{n→∞} (1/s_n^{2+δ}) Σ_{j=1}^n E(|X_j − η_j|^{2+δ}) = 0.

In this case, for ε > 0, |x − η_j| ≥ ε s_n implies |x − η_j|^{2+δ} ≥ |x − η_j|² (ε s_n)^δ and hence

L_n(ε) ≤ (1/s_n²) Σ_{j=1}^n ∫_{|x − η_j| ≥ ε s_n} (|x − η_j|^{2+δ}/(ε s_n)^δ) d((X_j)_*P)(x)
 = (1/(ε^δ s_n^{2+δ})) Σ_{j=1}^n ∫_{|x − η_j| ≥ ε s_n} |x − η_j|^{2+δ} d((X_j)_*P)(x)
 = (1/(ε^δ s_n^{2+δ})) Σ_{j=1}^n ∫_{|X_j − η_j| ≥ ε s_n} |X_j − η_j|^{2+δ} dP
 ≤ (1/(ε^δ s_n^{2+δ})) Σ_{j=1}^n E(|X_j − η_j|^{2+δ})
 → 0.

This is true for each ε > 0, showing that if X_n satisfies the Lyapunov condition then it satisfies the Lindeberg condition.

For example, if the X_n are identically distributed and L^{2+δ}, then

(1/s_n^{2+δ}) Σ_{j=1}^n E(|X_j − η_j|^{2+δ}) = (1/(n^{δ/2} σ₁^{2+δ})) E(|X₁ − η₁|^{2+δ}) → 0,

showing that X_n satisfies the Lyapunov condition.
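For instance, with δ = 1 and Rademacher variables (a hypothetical choice of iid distribution), the Lyapunov ratio works out to exactly n^{−1/2}, which the following sketch confirms.

```python
import math

# iid case with delta = 1: the Lyapunov ratio is
#   n * E|X_1 - eta_1|^3 / s_n^3 = E|X_1 - eta_1|^3 / (n^{1/2} * sigma_1^3).
support = [(-1.0, 0.5), (1.0, 0.5)]   # hypothetical Rademacher distribution
eta = sum(p * x for x, p in support)
sigma = math.sqrt(sum(p * (x - eta)**2 for x, p in support))
third = sum(p * abs(x - eta)**3 for x, p in support)

def lyapunov_ratio(n):
    s_n = math.sqrt(n) * sigma
    return n * third / s_n**3

# For Rademacher variables sigma = 1 and E|X|^3 = 1, so the ratio is 1/sqrt(n).
assert abs(lyapunov_ratio(100) - 0.1) < 1e-15
assert lyapunov_ratio(10_000) < lyapunov_ratio(100)
```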

Another example: Suppose that the sequence X_n is bounded by M almost surely and that s_n → ∞. |X_n| ≤ M almost surely implies that

|η_n| = |E(X_n)| ≤ E(|X_n|) ≤ M.

Therefore |X_n − η_n| ≤ |X_n| + |η_n| ≤ 2M almost surely. Let δ > 0. Then, as Σ_{j=1}^n E(|X_j − η_j|²) = s_n²,

(1/s_n^{2+δ}) Σ_{j=1}^n E(|X_j − η_j|^{2+δ}) ≤ (1/s_n^{2+δ}) Σ_{j=1}^n E(|X_j − η_j|²) (2M)^δ
 = (2M)^δ / s_n^δ
 → 0,

showing that X_n satisfies the Lyapunov condition.

We say that a sequence of random variables X_n satisfies the Feller condition when

lim_{n→∞} max_{1≤j≤n} σ_j/s_n = 0,

where σ_j = σ(X_j) = (Var(X_j))^{1/2} and

s_n = (Σ_{j=1}^n σ_j²)^{1/2}.

We prove that if a sequence satisfies the Lindeberg condition then it satisfies the Feller condition.⁵

⁵ Heinz Bauer, Probability Theory, p. 235, Lemma 28.2.

Lemma 4.

If a sequence of random variables Xn satisfies the Lindeberg condition, then it satisfies the Feller condition.

Proof.

Let ε > 0. For n ≥ 1 and 1 ≤ k ≤ n, we calculate

σ_k² = ∫ (x − η_k)² d((X_k)_*P)(x)
 = ∫_{|x − η_k| < ε s_n} (x − η_k)² d((X_k)_*P)(x) + ∫_{|x − η_k| ≥ ε s_n} (x − η_k)² d((X_k)_*P)(x)
 ≤ (ε s_n)² + Σ_{j=1}^n ∫_{|x − η_j| ≥ ε s_n} (x − η_j)² d((X_j)_*P)(x)
 = ε² s_n² + s_n² L_n(ε).

Hence

max_{1≤k≤n} (σ_k/s_n)² ≤ ε² + L_n(ε),

and so, because the X_n satisfy the Lindeberg condition,

lim sup_{n→∞} max_{1≤k≤n} (σ_k/s_n)² ≤ ε².

This is true for all ε > 0, which yields

lim_{n→∞} max_{1≤k≤n} (σ_k/s_n)² = 0,

namely, that the X_n satisfy the Feller condition. ∎

We do not use the following idea of an asymptotically negligible family of random variables elsewhere, and merely take this as an excuse to write out what it means. A family of random variables X_{n,j}, n ≥ 1, 1 ≤ j ≤ k_n, is called asymptotically negligible⁶ if for each ε > 0,

lim_{n→∞} max_{1≤j≤k_n} P(|X_{n,j}| ≥ ε) = 0.

⁶ Heinz Bauer, Probability Theory, p. 225, §27.2.

A sequence of random variables X_n converges in probability to 0 if and only if the family X_n, with k_n = 1 for each n, is asymptotically negligible.

For example, suppose that the X_{n,j} are L² random variables each with E(X_{n,j}) = 0 and that they satisfy

lim_{n→∞} max_{1≤j≤k_n} Var(X_{n,j}) = 0.

For ε > 0, by Chebyshev's inequality,

P(|X_{n,j}| ≥ ε) ≤ (1/ε²) E(|X_{n,j}|²) = (1/ε²) Var(X_{n,j}),

whence

lim sup_{n→∞} max_{1≤j≤k_n} P(|X_{n,j}| ≥ ε) ≤ lim sup_{n→∞} max_{1≤j≤k_n} (1/ε²) Var(X_{n,j}) = 0,

and so the random variables X_{n,j} are asymptotically negligible.

Another example: Suppose that the random variables X_{n,j} are identically distributed, with μ = (X_{n,j})_*P. For ε > 0,

P(|X_{n,j}/n| ≥ ε) = P(|X_{n,j}| ≥ nε) = μ(A_n),

where A_n = {x ∈ ℝ : |x| ≥ nε}. As A_n ↓ ∅, lim_{n→∞} μ(A_n) = 0. Hence the random variables X_{n,j}/n are asymptotically negligible.
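Taking μ to be, say, the exponential distribution with rate 1 (a hypothetical concrete choice, not one made in the text), the tail μ(A_n) = e^{−nε} is available in closed form, and its decay to 0 can be tabulated directly.

```python
import math

# With mu the exponential distribution of rate 1 (supported on [0, inf)),
# P(|X_{n,j}/n| >= eps) = mu({x >= n*eps}) = exp(-n*eps).
def tail(n, eps):
    return math.exp(-n * eps)

eps = 0.1
probs = [tail(n, eps) for n in (1, 10, 100)]
assert probs[0] > probs[1] > probs[2]   # strictly decreasing in n
assert probs[2] < 1e-4                  # exp(-10) is already tiny
```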

The following is a statement about the characteristic functions of an asymptotically negligible family of random variables.⁷

⁷ Heinz Bauer, Probability Theory, p. 227, Lemma 27.3.

Lemma 5.

Suppose that a family X_{n,j}, n ≥ 1, 1 ≤ j ≤ k_n, of random variables is asymptotically negligible, and write μ_{n,j} = (X_{n,j})_*P and ϕ_{n,j} = μ̃_{n,j}. For each x ∈ ℝ,

lim_{n→∞} max_{1≤j≤k_n} |ϕ_{n,j}(x) − 1| = 0.
Proof.

For any real t, |e^{it} − 1| ≤ |t|. For x ∈ ℝ, ε > 0, n ≥ 1, and 1 ≤ j ≤ k_n,

|ϕ_{n,j}(x) − 1| = |∫ (e^{ixy} − 1) dμ_{n,j}(y)|
 ≤ ∫_{|y|<ε} |e^{ixy} − 1| dμ_{n,j}(y) + ∫_{|y|≥ε} |e^{ixy} − 1| dμ_{n,j}(y)
 ≤ ∫_{|y|<ε} |xy| dμ_{n,j}(y) + ∫_{|y|≥ε} 2 dμ_{n,j}(y)
 ≤ ε|x| + 2P(|X_{n,j}| ≥ ε).

Hence

max_{1≤j≤k_n} |ϕ_{n,j}(x) − 1| ≤ ε|x| + 2 max_{1≤j≤k_n} P(|X_{n,j}| ≥ ε).

Using that the family X_{n,j} is asymptotically negligible,

lim sup_{n→∞} max_{1≤j≤k_n} |ϕ_{n,j}(x) − 1| ≤ ε|x|.

But this is true for all ε > 0, so

lim sup_{n→∞} max_{1≤j≤k_n} |ϕ_{n,j}(x) − 1| = 0,

proving the claim. ∎

4 The Lindeberg central limit theorem

We now prove the Lindeberg central limit theorem.⁸

⁸ Heinz Bauer, Probability Theory, p. 235, Theorem 28.3.

Theorem 6 (Lindeberg central limit theorem).

If X_n is a sequence of independent L² random variables that satisfy the Lindeberg condition, then

(S_n)_*P → γ_{0,1},

where

S_n = (1/s_n) Σ_{j=1}^n (X_j − η_j) = (Σ_{j=1}^n (X_j − E(X_j))) / σ(X₁ + ⋯ + X_n).
Proof.

The random variables Y_n = X_n − E(X_n) are independent L² random variables that satisfy the Lindeberg condition, and σ(Y_n) = σ(X_n). Proving the claim for the sequence Y_n will prove the claim for the sequence X_n, and thus it suffices to prove the claim when E(X_n) = 0, i.e. η_n = 0.

For n ≥ 1 and 1 ≤ j ≤ n, let

μ_{n,j} = (X_j/s_n)_*P  and  τ_{n,j} = σ_j/s_n.

The first moment of μ_{n,j} is

∫ x d((X_j/s_n)_*P)(x) = ∫_Ω (X_j/s_n) dP = (1/s_n) E(X_j) = 0,

and the second moment of μ_{n,j} is

∫ x² d((X_j/s_n)_*P)(x) = ∫_Ω (X_j/s_n)² dP = (1/s_n²) E(X_j²) = σ_j²/s_n² = τ_{n,j}²,

for which

Σ_{j=1}^n τ_{n,j}² = (1/s_n²) Σ_{j=1}^n σ_j² = 1.

For μ ∈ 𝒫(ℝ) with first moment ∫ x dμ(x) = 0 and second moment ∫ x² dμ(x) = σ² < ∞, Lemma 2 with k = 2 tells us that

μ̃(x) = M₀ + iM₁x − (M₂/2)x² + (1/2)θ(x)x² = 1 − (σ²/2)x² + (1/2)θ(x)x²,

with

|θ(x)| ≤ sup_{0≤u≤1} |μ̃″(ux) − μ̃″(0)|.

But by Theorem 1,

μ̃″(ux) = −∫ y² e^{iuxy} dμ(y),

so

|θ(x)| ≤ sup_{0≤u≤1} |∫ y² (−e^{iuxy} + 1) dμ(y)|
 ≤ sup_{0≤u≤1} ∫ y² |e^{iuxy} − 1| dμ(y).

For 0 ≤ u ≤ 1, |e^{iuxy} − 1| ≤ |uxy| ≤ |xy|, so for x ∈ ℝ and ε > 0, with δ = min{ε, ε/|x|}, when |y| < δ and 0 ≤ u ≤ 1 we have |e^{iuxy} − 1| < ε. Thus

|θ(x)| ≤ sup_{0≤u≤1} ∫_{|y|<δ} y² |e^{iuxy} − 1| dμ(y) + sup_{0≤u≤1} ∫_{|y|≥δ} y² |e^{iuxy} − 1| dμ(y)
 ≤ ε ∫_{|y|<δ} y² dμ(y) + 2 ∫_{|y|≥δ} y² dμ(y)
 ≤ εσ² + 2 ∫_{|y|≥δ} y² dμ(y).

Let x ∈ ℝ and ε > 0, and take δ = min{ε, ε/|x|}. On the one hand, for n ≥ 1 and 1 ≤ j ≤ n, because the first moment of μ_{n,j} is 0 and its second moment is τ_{n,j}²,

μ̃_{n,j}(x) = 1 − (τ_{n,j}²/2)x² + (1/2)θ_{n,j}(x)x²,

with, from the above,

|θ_{n,j}(x)| ≤ ετ_{n,j}² + 2∫_{|y|≥δ} y² dμ_{n,j}(y).

On the other hand, the first moment of the Gaussian measure γ_{0,τ_{n,j}²} is 0 and its second moment is τ_{n,j}². Its characteristic function is

γ̃_{0,τ_{n,j}²}(x) = exp(−(τ_{n,j}²/2)x²) = 1 − (τ_{n,j}²/2)x² + (1/2)ψ_{n,j}(x)x²,

with, from the above,

|ψ_{n,j}(x)| ≤ ετ_{n,j}² + 2∫_{|y|≥δ} y² dγ_{0,τ_{n,j}²}(y).

In particular, for all x ∈ ℝ,

μ̃_{n,j}(x) − γ̃_{0,τ_{n,j}²}(x) = (x²/2)(θ_{n,j}(x) − ψ_{n,j}(x)).

For k ≥ 1 and for a_l, b_l ∈ ℂ, 1 ≤ l ≤ k,

Π_{l=1}^k a_l − Π_{l=1}^k b_l = Σ_{l=1}^k b₁⋯b_{l−1}(a_l − b_l)a_{l+1}⋯a_k.

If further |a_l| ≤ 1, |b_l| ≤ 1, then

|Π_{l=1}^k a_l − Π_{l=1}^k b_l| ≤ Σ_{l=1}^k |a_l − b_l|. (1)
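Inequality (1) is easy to exercise numerically on random points of the closed unit disk; the sampling scheme below is an arbitrary choice for this sketch.

```python
import random
from functools import reduce
from operator import mul

random.seed(1)

def rand_unit_disk():
    """A random complex number z with |z| <= 1, by rejection sampling."""
    while True:
        z = complex(random.uniform(-1, 1), random.uniform(-1, 1))
        if abs(z) <= 1:
            return z

# Check |prod(a) - prod(b)| <= sum |a_l - b_l| on random inputs.
for _ in range(1000):
    k = random.randint(1, 8)
    a = [rand_unit_disk() for _ in range(k)]
    b = [rand_unit_disk() for _ in range(k)]
    lhs = abs(reduce(mul, a) - reduce(mul, b))
    rhs = sum(abs(x - y) for x, y in zip(a, b))
    assert lhs <= rhs + 1e-12   # small slack for rounding
```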

Because the X_n are independent, the distribution of

S_n = Σ_{j=1}^n X_j/s_n

is the convolution of the distributions of the summands:

μ_{n,1} ∗ ⋯ ∗ μ_{n,n},

whose characteristic function is

ϕ_n = Π_{j=1}^n μ̃_{n,j},

since the characteristic function of a convolution of measures is the product of the characteristic functions of the measures. Using Σ_{j=1}^n τ_{n,j}² = 1 and (1), for x ∈ ℝ we have

|ϕ_n(x) − e^{−x²/2}| = |Π_{j=1}^n μ̃_{n,j}(x) − Π_{j=1}^n e^{−τ_{n,j}²x²/2}|
 ≤ Σ_{j=1}^n |μ̃_{n,j}(x) − e^{−τ_{n,j}²x²/2}|
 = Σ_{j=1}^n |μ̃_{n,j}(x) − γ̃_{0,τ_{n,j}²}(x)|
 = (x²/2) Σ_{j=1}^n |θ_{n,j}(x) − ψ_{n,j}(x)|.

Therefore, for x ∈ ℝ, ε > 0, and δ = min{ε, ε/|x|},

|ϕ_n(x) − e^{−x²/2}| ≤ (x²/2) Σ_{j=1}^n (ετ_{n,j}² + 2∫_{|y|≥δ} y² dμ_{n,j}(y) + ετ_{n,j}² + 2∫_{|y|≥δ} y² dγ_{0,τ_{n,j}²}(y))
 = εx² + x² Σ_{j=1}^n ∫_{|y|≥δ} y² dμ_{n,j}(y) + x² Σ_{j=1}^n ∫_{|y|≥δ} y² dγ_{0,τ_{n,j}²}(y).

We calculate

L_n(δ) = (1/s_n²) Σ_{j=1}^n ∫_{|y|≥δs_n} y² d((X_j)_*P)(y)
 = (1/s_n²) Σ_{j=1}^n ∫_{|X_j|≥δs_n} X_j² dP
 = Σ_{j=1}^n ∫_{|X_j/s_n|≥δ} (X_j/s_n)² dP
 = Σ_{j=1}^n ∫_{|y|≥δ} y² d((X_j/s_n)_*P)(y)
 = Σ_{j=1}^n ∫_{|y|≥δ} y² dμ_{n,j}(y).

Hence, the fact that the X_n satisfy the Lindeberg condition yields

lim sup_{n→∞} |ϕ_n(x) − e^{−x²/2}| ≤ εx² + x² lim sup_{n→∞} Σ_{j=1}^n ∫_{|y|≥δ} y² dγ_{0,τ_{n,j}²}(y). (2)

Write

α_n = max_{1≤j≤n} τ_{n,j} = max_{1≤j≤n} σ_j/s_n.

We calculate

Σ_{j=1}^n ∫_{|y|≥δ} y² dγ_{0,τ_{n,j}²}(y) = Σ_{j=1}^n ∫_{|y|≥δ} y² (1/(τ_{n,j}√(2π))) exp(−y²/(2τ_{n,j}²)) dy
 = Σ_{j=1}^n τ_{n,j}² ∫_{|u|≥δ/τ_{n,j}} u² dγ_{0,1}(u)
 ≤ Σ_{j=1}^n τ_{n,j}² ∫_{|u|≥δ/α_n} u² dγ_{0,1}(u)
 = ∫_{|u|≥δ/α_n} u² dγ_{0,1}(u).

Because the sequence X_n satisfies the Lindeberg condition, by Lemma 4 it satisfies the Feller condition, which means that α_n → 0 as n → ∞. Hence δ/α_n → ∞ as n → ∞, and so

∫_{|u|≥δ/α_n} u² dγ_{0,1}(u) → 0

as n → ∞. Thus we get

Σ_{j=1}^n ∫_{|y|≥δ} y² dγ_{0,τ_{n,j}²}(y) → 0

as n → ∞. Using this with (2) yields

lim sup_{n→∞} |ϕ_n(x) − e^{−x²/2}| ≤ εx².

This is true for all ε > 0, so

lim_{n→∞} |ϕ_n(x) − e^{−x²/2}| = 0,

namely, ϕ_n (the characteristic function of (S_n)_*P) converges pointwise to x ↦ e^{−x²/2}. Moreover, e^{−x²/2} is indeed continuous at 0, and e^{−x²/2} = γ̃_{0,1}(x). Therefore, Lévy’s continuity theorem (Theorem 3) tells us that (S_n)_*P converges narrowly to γ_{0,1}, which is the claim. ∎
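The theorem can be illustrated by simulation. The sketch below draws S_n for independent but not identically distributed uniform variables; they are bounded with s_n → ∞, so by the example in Section 3 they satisfy the Lindeberg condition. The scales c_j, the value of n, and the number of draws are arbitrary choices, and the check is only that the empirical law of S_n is close to γ_{0,1}.

```python
import math
import random
import statistics

random.seed(2)

# Independent, non-identically distributed: X_j uniform on [-c_j, c_j]
# with bounded c_j, so the Lindeberg condition holds (bounded example).
n = 200
c = [1.0 + 0.5 * math.sin(j) for j in range(1, n + 1)]   # arbitrary bounded scales
s_n = math.sqrt(sum(cj**2 / 3 for cj in c))              # Var(U[-c,c]) = c^2/3

def sample_S_n():
    """One draw of S_n = (X_1 + ... + X_n)/s_n (each E(X_j) = 0)."""
    return sum(random.uniform(-cj, cj) for cj in c) / s_n

draws = [sample_S_n() for _ in range(10_000)]

# S_n should be approximately standard normal.
assert abs(statistics.mean(draws)) < 0.05
assert abs(statistics.pstdev(draws) - 1.0) < 0.05
frac = sum(1 for t in draws if abs(t) <= 1.96) / len(draws)
assert abs(frac - 0.95) < 0.015
```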