The weak and strong laws of large numbers
1 Introduction
Using Egorov’s theorem, one proves that if a sequence of random variables $X_n$ converges almost surely to a random variable $X$, then $X_n$ converges in probability to $X$.¹ [V. I. Bogachev, Measure Theory, volume I, p. 111, Theorem 2.2.3; http://individual.utoronto.ca/jordanbell/notes/L0.pdf, p. 3, Theorem 3.]
Let $X_n$, $n \ge 1$, be a sequence of random variables, and write $S_n = \sum_{k=1}^n X_k$. A weak law of large numbers is a statement that

$$\frac{S_n}{n} - \frac{E(S_n)}{n} \tag{1}$$

converges in probability to $0$. A strong law of large numbers is a statement that (1) converges almost surely to $0$. Thus, if the hypotheses assumed on the sequence of random variables are the same, a strong law implies a weak law, because almost sure convergence implies convergence in probability.
We shall prove the weak law of large numbers for a sequence of independent identically distributed random variables, and the strong law of large numbers under the same hypotheses. We give separate proofs for these theorems as an occasion to inspect different machinery, although to establish the weak law it thus suffices to prove the strong law. One reason to distinguish these laws is for cases when we impose different hypotheses.² [cf. Jordan M. Stoyanov, Counterexamples in Probability, third ed., p. 163, §15.3; Dieter Landers and Lothar Rogge, Identically distributed uncorrelated random variables not fulfilling the WLLN, Bull. Korean Math. Soc. 38 (2001), no. 3, 605–610.]
We also prove Markov’s weak law of large numbers, which states that if $X_n$ is a sequence of random variables in $L^2(P)$ that are pairwise uncorrelated and

$$\frac{1}{n^2} \sum_{k=1}^n \mathrm{Var}(X_k) \to 0, \qquad n \to \infty,$$

then $\frac{S_n}{n} - \frac{E(S_n)}{n}$ converges to $0$ in $L^2(P)$, from which it follows using Chebyshev’s inequality that it converges in probability to $0$. (We remark that a sequence of random variables converging in $L^2(P)$ to $0$ does not imply that it converges almost surely to $0$, although there is indeed a subsequence that converges almost surely to $0$.³ [http://individual.utoronto.ca/jordanbell/notes/L0.pdf, p. 4, Theorem 5.])
If $(\Omega, \mathcal{F}, P)$ is a probability space, $(Y, \mathcal{A})$ is a measurable space, and $X : \Omega \to Y$ is measurable, the pushforward measure of $P$ by $X$ is

$$(X_* P)(A) = P(X^{-1}(A)), \qquad A \in \mathcal{A}.$$

Then $(Y, \mathcal{A}, X_* P)$ is a probability space. We remind ourselves of the change of variables theorem, which we shall use.⁴ [Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 485, Theorem 13.46.] Let $g : Y \to \mathbb{R}$ be a function. On the one hand, if $g \in L^1(X_* P)$ then $g \circ X \in L^1(P)$ and

$$\int_Y g \, d(X_* P) = \int_\Omega (g \circ X) \, dP. \tag{2}$$

On the other hand, if $g$ is $\mathcal{A}$-measurable and $g \circ X \in L^1(P)$, then $g \in L^1(X_* P)$ and (2) holds.
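On a finite probability space the pushforward measure and the change of variables identity (2) can be checked by direct summation. The following Python sketch is ours, not part of the text: $\Omega$ is a four-point space, $Y = \{0, 1, 2\}$, and the names (`pushforward`, `g`) are illustrative.

```python
from fractions import Fraction

# A four-point probability space (Omega, P) and a measurable map X into Y = {0, 1, 2}.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
X = {"a": 0, "b": 1, "c": 1, "d": 2}

# Pushforward measure: (X_*P)({y}) = P(X^{-1}({y})).
pushforward = {}
for omega, p in P.items():
    pushforward[X[omega]] = pushforward.get(X[omega], Fraction(0)) + p

g = lambda y: y * y + 1  # an arbitrary function on Y

# Change of variables: the integral of g d(X_*P) equals the integral of g∘X dP.
lhs = sum(g(y) * q for y, q in pushforward.items())
rhs = sum(g(X[omega]) * p for omega, p in P.items())
assert lhs == rhs
print(lhs)  # → 15/8
```

Working with `Fraction` keeps the arithmetic exact, so the two integrals agree identically rather than up to rounding.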
2 The weak law of large numbers
Suppose that $X_n$, $n \ge 1$, are independent identically distributed random variables, and write

$$S_n = \sum_{k=1}^n X_k,$$

for which $E(S_n) = n E(X_1)$.
Lemma 1.
If $X_1, X_2, \ldots$ are $L^2(P)$ and independent, then for any $n \ge 1$ and for any $\varepsilon > 0$,

$$P(|S_n - E(S_n)| \ge n\varepsilon) \le \frac{1}{n^2 \varepsilon^2} \sum_{k=1}^n \mathrm{Var}(X_k).$$
Proof.
Using

$$E(X_i X_j) = E(X_i) E(X_j), \qquad i \ne j,$$

which holds because the $X_i$ are independent, we have

$$E(S_n^2) = \sum_{k=1}^n E(X_k^2) + \sum_{i \ne j} E(X_i) E(X_j).$$

On the other hand,

$$E(S_n)^2 = \left( \sum_{k=1}^n E(X_k) \right)^2 = \sum_{k=1}^n E(X_k)^2 + \sum_{i \ne j} E(X_i) E(X_j).$$

So

$$\mathrm{Var}(S_n) = E(S_n^2) - E(S_n)^2 = \sum_{k=1}^n \left( E(X_k^2) - E(X_k)^2 \right) = \sum_{k=1}^n \mathrm{Var}(X_k).$$

Using this and Chebyshev’s inequality,

$$P(|S_n - E(S_n)| \ge n\varepsilon) \le \frac{\mathrm{Var}(S_n)}{n^2 \varepsilon^2} = \frac{1}{n^2 \varepsilon^2} \sum_{k=1}^n \mathrm{Var}(X_k),$$

proving the claim. ∎
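As a numerical sanity check of the inequality in Lemma 1 (the Bernoulli example and the constants are our choices, not from the text): for $S_n$ a sum of $n$ independent Bernoulli($p$) variables, $\mathrm{Var}(X_k) = p(1-p)$, and the exact tail $P(|S_n - E(S_n)| \ge n\varepsilon)$, computed from the binomial distribution, sits below the bound $p(1-p)/(n\varepsilon^2)$.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(S_n = k) for S_n a sum of n independent Bernoulli(p) variables."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, eps = 200, 0.3, 0.1
mean = n * p

# Exact tail probability P(|S_n - E(S_n)| >= n*eps).
tail = sum(binom_pmf(n, p, k) for k in range(n + 1) if abs(k - mean) >= n * eps)

# The bound from Lemma 1: sum of variances over (n*eps)^2, i.e. p(1-p)/(n*eps^2).
bound = p * (1 - p) / (n * eps**2)

assert tail <= bound
print(tail, bound)
```

Here the exact tail is far smaller than the Chebyshev-type bound, which is typical: the bound is crude but suffices for the limit arguments below.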
We now prove the weak law of large numbers, which states that if $X_n$ are independent identically distributed random variables each with mean $\mu$, then $\frac{S_n}{n}$ converges in probability to $\mu$. If $X_n$ are independent identically distributed random variables with mean $\mu$, then $X_n - \mu$ are independent identically distributed random variables with mean $0$, and if $\frac{1}{n} \sum_{k=1}^n (X_k - \mu)$ converges in probability to $0$ then $\frac{S_n}{n}$ converges in probability to $\mu$, showing that it suffices to prove the theorem when $\mu = 0$.⁵ [Allan Gut, Probability: A Graduate Course, second ed., p. 270, Theorem 6.3.1, and p. 121, Theorem 3.1.5.]
Theorem 2 (Weak law of large numbers).
Suppose that $E(X_1) = 0$. For each $\varepsilon > 0$,

$$P\left( \left| \frac{S_n}{n} \right| \ge \varepsilon \right) \to 0, \qquad n \to \infty.$$
Proof.
For $n \ge 1$ and $1 \le j \le n$, let

$$Y_{j,n} = X_j 1_{\{|X_j| \le n\}},$$

where $1_{\{|X_j| \le n\}}$ is the indicator function of the event $\{|X_j| \le n\}$. Because $X_1, \ldots, X_n$ are independent and identically distributed, so are $Y_{1,n}, \ldots, Y_{n,n}$.⁶ [Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 316, Proposition 10.2.] Moreover, $|Y_{j,n}| \le n$, so each $Y_{j,n}$ belongs to $L^2(P)$. Let

$$T_n = \sum_{j=1}^n Y_{j,n},$$

for which $E(T_n) = n E(Y_{1,n})$.

If $1 \le j \le n$, then

$$\mathrm{Var}(Y_{j,n}) \le E(Y_{j,n}^2) = E(X_1^2 1_{\{|X_1| \le n\}}),$$

and using this and Lemma 1, for $\varepsilon > 0$ we have

$$P(|T_n - E(T_n)| \ge n\varepsilon) \le \frac{1}{n^2 \varepsilon^2} \sum_{j=1}^n \mathrm{Var}(Y_{j,n}) \le \frac{1}{n \varepsilon^2} E(X_1^2 1_{\{|X_1| \le n\}}).$$

For fixed $\varepsilon$ this is

$$\frac{1}{\varepsilon^2} E\left( \frac{X_1^2}{n} 1_{\{|X_1| \le n\}} \right).$$

Now,

$$\frac{X_1^2}{n} 1_{\{|X_1| \le n\}} \le |X_1|, \qquad \frac{X_1^2}{n} 1_{\{|X_1| \le n\}} \to 0 \text{ pointwise},$$

so $E\left( \frac{X_1^2}{n} 1_{\{|X_1| \le n\}} \right)$ tends to $0$ as $n \to \infty$ because $|X_1| \in L^1(P)$ and the dominated convergence theorem applies. Therefore,

$$P(|T_n - E(T_n)| \ge n\varepsilon) \to 0, \qquad n \to \infty,$$

that is,

$$P\left( \left| \frac{T_n}{n} - \frac{E(T_n)}{n} \right| \ge \varepsilon \right) \to 0, \qquad n \to \infty,$$

which implies that $\frac{T_n}{n} - \frac{E(T_n)}{n}$ converges in probability to $0$.

Because $E(X_1) = 0$,

$$E(Y_{1,n}) = E(X_1 1_{\{|X_1| \le n\}}) = -E(X_1 1_{\{|X_1| > n\}}),$$

and hence, by the dominated convergence theorem,

$$\left| \frac{E(T_n)}{n} \right| = |E(Y_{1,n})| \le E(|X_1| 1_{\{|X_1| > n\}}) \to 0, \qquad n \to \infty,$$

thus $\frac{T_n}{n}$ converges in probability to $0$. Finally,

$$P\left( \frac{S_n}{n} \ne \frac{T_n}{n} \right) \le \sum_{j=1}^n P(X_j \ne Y_{j,n}) = n P(|X_1| > n) \le E(|X_1| 1_{\{|X_1| > n\}}),$$

which tends to $0$ as $n \to \infty$, because $n 1_{\{|X_1| > n\}} \le |X_1| 1_{\{|X_1| > n\}}$. Thus $\frac{S_n}{n} - \frac{T_n}{n}$ converges in probability to $0$, and therefore $\frac{S_n}{n}$ converges in probability to $0$, completing the proof. ∎
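A simulation illustrating the conclusion of Theorem 2 (the distribution, sample sizes, and thresholds are our choices): for i.i.d. variables uniform on $[-1, 1]$, which have mean $0$, the fraction of sample paths with $|S_n/n| \ge \varepsilon$ shrinks as $n$ grows.

```python
import random

random.seed(0)

def deviation_fraction(n, eps, trials=2000):
    """Fraction of trials in which |S_n/n| >= eps, for X_k uniform on [-1, 1] (mean 0)."""
    count = 0
    for _ in range(trials):
        s = sum(random.uniform(-1, 1) for _ in range(n))
        if abs(s / n) >= eps:
            count += 1
    return count / trials

f_small = deviation_fraction(10, eps=0.2)
f_large = deviation_fraction(1000, eps=0.2)
print(f_small, f_large)

# Var(S_n/n) = 1/(3n), so Chebyshev gives P(|S_n/n| >= 0.2) <= 1/(3n * 0.04);
# at n = 1000 this is about 0.008, and the observed fraction is far smaller still.
assert f_large < 0.01
```

The deviation fraction at $n = 10$ is substantial while at $n = 1000$ it is essentially zero, which is exactly the convergence in probability the theorem asserts.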
Lemma 3 (Bienaymé’s formula).
If $X_1, \ldots, X_n \in L^2(P)$ are random variables that are pairwise uncorrelated, then

$$\mathrm{Var}\left( \sum_{k=1}^n X_k \right) = \sum_{k=1}^n \mathrm{Var}(X_k).$$
Proof.
Let $Y_k = X_k - E(X_k)$. Using that the $X_k$ are pairwise uncorrelated, for $j \ne k$ we have

$$E(Y_j Y_k) = E(X_j X_k) - E(X_j) E(X_k) = 0,$$

showing that the $Y_k$ are pairwise uncorrelated. Then, because $E(Y_j Y_k) = 0$ for $j \ne k$,

$$E\left( \left( \sum_{k=1}^n Y_k \right)^2 \right) = \sum_{k=1}^n E(Y_k^2) + \sum_{j \ne k} E(Y_j Y_k) = \sum_{k=1}^n E(Y_k^2),$$

and as $\sum_{k=1}^n Y_k = \sum_{k=1}^n X_k - E\left( \sum_{k=1}^n X_k \right)$, we have

$$\mathrm{Var}\left( \sum_{k=1}^n X_k \right) = E\left( \left( \sum_{k=1}^n Y_k \right)^2 \right) = \sum_{k=1}^n E(Y_k^2) = \sum_{k=1}^n \mathrm{Var}(X_k).$$
∎
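Bienaymé’s formula can be verified exactly on a finite probability space; in the following sketch (the example is ours) $\Omega$ consists of triples of fair coin flips, and the coordinate variables are independent, hence in particular pairwise uncorrelated.

```python
from fractions import Fraction
from itertools import product

# Omega: all triples of fair coin flips, each outcome with probability 1/8.
outcomes = list(product([0, 1], repeat=3))
prob = Fraction(1, len(outcomes))

def expectation(f):
    return sum(f(w) * prob for w in outcomes)

def variance(f):
    m = expectation(f)
    return expectation(lambda w: (f(w) - m) ** 2)

# X_k(w) = w[k]. Compare Var(X_1 + X_2 + X_3) with Var(X_1) + Var(X_2) + Var(X_3).
var_sum = variance(lambda w: sum(w))
sum_var = sum(variance(lambda w, k=k: w[k]) for k in range(3))
assert var_sum == sum_var
print(var_sum)  # → 3/4
```

Each fair flip has variance $1/4$, so both sides equal $3/4$, matching the variance of a Binomial($3$, $1/2$) variable.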
We now prove a weak law of large numbers, sometimes attributed to Markov, that neither supposes that the random variables are independent nor supposes that they are identically distributed. We remind ourselves that if a sequence of random variables converges to $0$ in $L^2(P)$ then it converges in probability to $0$; this is proved using Chebyshev’s inequality.
Theorem 4 (Markov’s weak law of large numbers).
If $X_n$, $n \ge 1$, are random variables in $L^2(P)$ that are pairwise uncorrelated and which satisfy

$$\frac{1}{n^2} \sum_{k=1}^n \mathrm{Var}(X_k) \to 0, \qquad n \to \infty,$$

then $\frac{S_n}{n} - \frac{E(S_n)}{n}$ converges to $0$ in $L^2(P)$ and thus in probability.
Proof.
Because $E\left( \frac{S_n}{n} - \frac{E(S_n)}{n} \right) = 0$ and the $X_k$ are pairwise uncorrelated, using Bienaymé’s formula we get

$$E\left( \left( \frac{S_n}{n} - \frac{E(S_n)}{n} \right)^2 \right) = \frac{1}{n^2} \mathrm{Var}(S_n) = \frac{1}{n^2} \sum_{k=1}^n \mathrm{Var}(X_k).$$

Thus

$$\left\| \frac{S_n}{n} - \frac{E(S_n)}{n} \right\|_{L^2(P)}^2 = \frac{1}{n^2} \sum_{k=1}^n \mathrm{Var}(X_k) \to 0$$

as $n \to \infty$, namely, $\frac{S_n}{n} - \frac{E(S_n)}{n}$ converges to $0$ in $L^2(P)$ as $n \to \infty$, proving the claim. ∎
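The hypothesis of Theorem 4 permits variances that grow with $k$, provided $\frac{1}{n^2}\sum_{k=1}^n \mathrm{Var}(X_k) \to 0$. A quick computation (the growth rate $\mathrm{Var}(X_k) = \sqrt{k}$ is our choice, not from the text) shows the squared $L^2$ norm appearing in the proof tending to $0$.

```python
from math import sqrt

def l2_norm_sq(n):
    """(1/n^2) * sum_{k=1}^n Var(X_k) with Var(X_k) = sqrt(k): by Bienaymé's
    formula, the squared L^2 norm of S_n/n - E(S_n)/n."""
    return sum(sqrt(k) for k in range(1, n + 1)) / n**2

values = [l2_norm_sq(n) for n in (10, 100, 1000, 10000)]
print(values)

# sum_{k<=n} sqrt(k) ~ (2/3) n^{3/2}, so the squared norm decays like (2/3) n^{-1/2}.
assert all(a > b for a, b in zip(values, values[1:]))
assert values[-1] < 0.01
```

So even though the individual variances are unbounded, the averaged quantity goes to $0$, and the theorem applies.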
3 Ergodic theory
We here assemble machinery and results that we will use to prove the strong law of large numbers. For a probability space $(\Omega, \mathcal{F}, P)$, a function $T : \Omega \to \Omega$ is said to be a measure-preserving transformation if (i) it is measurable, and (ii) $T_* P = P$. To say that $T_* P = P$ means that for any $A \in \mathcal{F}$, $(T_* P)(A) = P(A)$, i.e. $P(T^{-1}(A)) = P(A)$.
A collection $\mathcal{A}$ of subsets of $\Omega$ is called an algebra of sets if (i) $\emptyset \in \mathcal{A}$, (ii) $A, B \in \mathcal{A}$ implies $A \cup B \in \mathcal{A}$, (iii) if $A \in \mathcal{A}$ then $\Omega \setminus A \in \mathcal{A}$. If $\mathcal{S}$ is a nonempty collection of subsets of $\Omega$, it is a fact that there is a unique algebra of sets $\alpha(\mathcal{S})$ that (i) contains $\mathcal{S}$ and (ii) is contained in any algebra of sets that contains $\mathcal{S}$. We call $\alpha(\mathcal{S})$ the algebra of sets generated by $\mathcal{S}$.
A collection $\mathcal{S}$ of subsets of $\Omega$ is called a semialgebra of sets if (i) $\emptyset \in \mathcal{S}$, (ii) $\Omega \in \mathcal{S}$, (iii) if $A, B \in \mathcal{S}$ then $A \cap B \in \mathcal{S}$, (iv) if $A \in \mathcal{S}$ then there are pairwise disjoint $A_1, \ldots, A_n \in \mathcal{S}$ such that

$$\Omega \setminus A = \bigcup_{k=1}^n A_k.$$

It is a fact that $\alpha(\mathcal{S})$ is equal to the collection of all unions of finitely many pairwise disjoint elements of $\mathcal{S}$.⁷ [V. I. Bogachev, Measure Theory, volume I, p. 8, Lemma 1.2.14.]
A nonempty collection $\mathcal{M}$ of subsets of $\Omega$ is called a monotone class if $A_n \in \mathcal{M}$, $A_1 \subseteq A_2 \subseteq \cdots$ (an increasing sequence of sets), implies that $\bigcup_n A_n \in \mathcal{M}$, and $A_n \in \mathcal{M}$, $A_1 \supseteq A_2 \supseteq \cdots$ (a decreasing sequence of sets), implies that $\bigcap_n A_n \in \mathcal{M}$. In other words, a monotone class is a nonempty collection $\mathcal{M}$ of subsets of $\Omega$ such that if $A_n$ is a monotone sequence in $\mathcal{M}$ then $\lim_n A_n \in \mathcal{M}$. If $\mathcal{S}$ is a nonempty collection of subsets of $\Omega$, it is a fact that there is a unique monotone class $\mu(\mathcal{S})$ that (i) contains $\mathcal{S}$ and (ii) is contained in any monotone class that contains $\mathcal{S}$. We call $\mu(\mathcal{S})$ the monotone class generated by $\mathcal{S}$. The monotone class theorem states that if $\mathcal{A}$ is an algebra of sets then $\mu(\mathcal{A}) = \sigma(\mathcal{A})$.⁸ [V. I. Bogachev, Measure Theory, volume I, p. 33, Theorem 1.9.3.]
The following lemma gives conditions under which we can establish that a function is measure-preserving.⁹ [Peter Walters, An Introduction to Ergodic Theory, p. 20, Theorem 1.1.]
Lemma 5.
Let $T : \Omega \to \Omega$ be a function and suppose that $\mathcal{S}$ is a semialgebra of sets for which $\mathcal{F}$ is the $\sigma$-algebra generated by $\mathcal{S}$. If (i) $T^{-1}(A) \in \mathcal{F}$ for each $A \in \mathcal{S}$ and (ii) $P(T^{-1}(A)) = P(A)$ for each $A \in \mathcal{S}$, then $T$ is measure-preserving.
Proof.
Let

$$\mathcal{G} = \{ A \in \mathcal{F} : T^{-1}(A) \in \mathcal{F} \text{ and } P(T^{-1}(A)) = P(A) \};$$

we wish to prove that $\mathcal{G} = \mathcal{F}$.

If $A_n \in \mathcal{G}$ is an increasing sequence, let $A = \bigcup_n A_n$. Then, as $T^{-1}(A_n) \in \mathcal{F}$ for each $n$,

$$T^{-1}(A) = \bigcup_n T^{-1}(A_n) \in \mathcal{F},$$

and as (i) $T^{-1}(A_n)$ is an increasing sequence, (ii) $P$ is continuous from below,¹⁰ [V. I. Bogachev, Measure Theory, volume I, p. 9, Proposition 1.3.3.] and (iii) $P(T^{-1}(A_n)) = P(A_n)$,

$$P(T^{-1}(A)) = \lim_{n \to \infty} P(T^{-1}(A_n)) = \lim_{n \to \infty} P(A_n) = P(A),$$

and hence $A \in \mathcal{G}$. If $A_n \in \mathcal{G}$ is a decreasing sequence, let $A = \bigcap_n A_n$. Because $T^{-1}(A_n) \in \mathcal{F}$,

$$T^{-1}(A) = \bigcap_n T^{-1}(A_n) \in \mathcal{F},$$

and as (i) $T^{-1}(A_n)$ is a decreasing sequence, (ii) $P$ is continuous from above, and (iii) $P(T^{-1}(A_n)) = P(A_n)$,

$$P(T^{-1}(A)) = \lim_{n \to \infty} P(T^{-1}(A_n)) = \lim_{n \to \infty} P(A_n) = P(A),$$

and hence $A \in \mathcal{G}$. Therefore, $\mathcal{G}$ is a monotone class.

By hypothesis, $\mathcal{S} \subseteq \mathcal{G}$. If $A \in \alpha(\mathcal{S})$, then there are pairwise disjoint $A_1, \ldots, A_n \in \mathcal{S}$ with $A = \bigcup_{k=1}^n A_k$. As $T^{-1}(A_k) \in \mathcal{F}$,

$$T^{-1}(A) = \bigcup_{k=1}^n T^{-1}(A_k) \in \mathcal{F}.$$

As the $A_k$ are pairwise disjoint, so are the sets $T^{-1}(A_k)$, so, because $P(T^{-1}(A_k)) = P(A_k)$,

$$P(T^{-1}(A)) = \sum_{k=1}^n P(T^{-1}(A_k)) = \sum_{k=1}^n P(A_k) = P(A).$$

Therefore $A \in \mathcal{G}$. This shows that $\alpha(\mathcal{S}) \subseteq \mathcal{G}$.

The monotone class theorem tells us $\mu(\alpha(\mathcal{S})) = \sigma(\alpha(\mathcal{S}))$. On the one hand, $\mathcal{S} \subseteq \alpha(\mathcal{S})$ and hence $\mathcal{F} = \sigma(\mathcal{S}) \subseteq \sigma(\alpha(\mathcal{S}))$. On the other hand, $\alpha(\mathcal{S}) \subseteq \mathcal{G}$ and the fact that $\mathcal{G}$ is a monotone class yield

$$\mu(\alpha(\mathcal{S})) \subseteq \mathcal{G}.$$

Therefore

$$\mathcal{F} \subseteq \sigma(\alpha(\mathcal{S})) = \mu(\alpha(\mathcal{S})) \subseteq \mathcal{G}.$$

Of course $\mathcal{G} \subseteq \mathcal{F}$, so $\mathcal{G} = \mathcal{F}$, proving the claim. ∎
Let $(\Omega, \mathcal{F}, P)$ be a probability space. A measure-preserving transformation $T : \Omega \to \Omega$ is called ergodic if $A \in \mathcal{F}$ and $T^{-1}(A) = A$ implies that $P(A) = 0$ or $P(A) = 1$. It is proved¹¹ [Peter Walters, An Introduction to Ergodic Theory, p. 37, Corollary 1.14.2.] using the Birkhoff ergodic theorem that for a measure-preserving transformation $T$, $T$ is ergodic if and only if for all $A, B \in \mathcal{F}$,

$$\frac{1}{n} \sum_{k=0}^{n-1} P(T^{-k}(A) \cap B) \to P(A) P(B), \qquad n \to \infty.$$
A measure-preserving transformation $T$ is called weak-mixing if for all $A, B \in \mathcal{F}$,

$$\frac{1}{n} \sum_{k=0}^{n-1} \left| P(T^{-k}(A) \cap B) - P(A) P(B) \right| \to 0, \qquad n \to \infty.$$

It is immediate that a weak-mixing transformation is ergodic.
A measure-preserving transformation $T$ is called strong-mixing if for all $A, B \in \mathcal{F}$,

$$P(T^{-n}(A) \cap B) \to P(A) P(B), \qquad n \to \infty.$$

If a sequence of real numbers $a_n$ tends to $0$, then

$$\frac{1}{n} \sum_{k=0}^{n-1} |a_k| \to 0, \qquad n \to \infty,$$

and using this we check that a strong-mixing transformation is weak-mixing.
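The Cesàro averaging fact just used, that $a_n \to 0$ implies $\frac{1}{n}\sum_{k=0}^{n-1} |a_k| \to 0$, can be observed numerically; the slowly decaying sequence $a_k = 1/\sqrt{k+1}$ below is our choice of example.

```python
def cesaro_average(n):
    """(1/n) * sum_{k=0}^{n-1} |a_k| for a_k = 1/sqrt(k+1), a sequence tending to 0."""
    return sum(1 / (k + 1) ** 0.5 for k in range(n)) / n

averages = [cesaro_average(n) for n in (10, 1000, 100000)]
print(averages)

# (1/n) * sum_{k<n} 1/sqrt(k+1) ~ 2/sqrt(n) -> 0, even though a_k decays slowly.
assert averages[0] > averages[1] > averages[2]
assert averages[-1] < 0.01
```

The averages decay like $2/\sqrt{n}$, slower than many individual terms, which is why weak-mixing is a genuinely weaker notion than strong-mixing.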
The following statement gives conditions under which a measure-preserving transformation is ergodic, weak-mixing, or strong-mixing.¹² [Peter Walters, An Introduction to Ergodic Theory, p. 41, Theorem 1.17.]
Theorem 6.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $\mathcal{S}$ be a semialgebra that generates $\mathcal{F}$, and let $T : \Omega \to \Omega$ be a measure-preserving transformation.

1. $T$ is ergodic if and only if for all $A, B \in \mathcal{S}$,

$$\frac{1}{n} \sum_{k=0}^{n-1} P(T^{-k}(A) \cap B) \to P(A) P(B), \qquad n \to \infty.$$

2. $T$ is weak-mixing if and only if for all $A, B \in \mathcal{S}$,

$$\frac{1}{n} \sum_{k=0}^{n-1} \left| P(T^{-k}(A) \cap B) - P(A) P(B) \right| \to 0, \qquad n \to \infty.$$

3. $T$ is strong-mixing if and only if for all $A, B \in \mathcal{S}$,

$$P(T^{-n}(A) \cap B) \to P(A) P(B), \qquad n \to \infty.$$
4 The strong law of large numbers
Let $\mu$ be a Borel probability measure on $\mathbb{R}$ with finite first moment:

$$\int_{\mathbb{R}} |x| \, d\mu(x) < \infty.$$
We shall specify when we use the hypothesis that $\mu$ has finite first moment; until we say so, what we say merely supposes that $\mu$ is a Borel probability measure on $\mathbb{R}$.
For $n \ge 1$, let $\Omega_n = \mathbb{R}$, let $\mathcal{F}_n = \mathcal{B}_{\mathbb{R}}$, the Borel $\sigma$-algebra of $\mathbb{R}$, and let $P_n = \mu$, for which $(\Omega_n, \mathcal{F}_n, P_n)$ is a probability space. Let $\Omega = \prod_{n=1}^\infty \Omega_n$. A cylinder set is a subset of $\Omega$ of the form

$$\prod_{n=1}^\infty A_n,$$

where $A_n \in \mathcal{F}_n$ for each $n$ and where $\{ n : A_n \ne \Omega_n \}$ is finite. We denote by $\mathcal{C}$ the collection of all cylinder sets. It is a fact that $\mathcal{C}$ is a semialgebra of sets.¹³ [S. J. Taylor, Introduction to Measure Theory and Integration, p. 136, §6.1; http://individual.utoronto.ca/jordanbell/notes/productmeasure.pdf]

The product $\sigma$-algebra is $\mathcal{F} = \sigma(\mathcal{C})$, the $\sigma$-algebra generated by the collection of all cylinder sets. The product measure,¹⁴ [http://individual.utoronto.ca/jordanbell/notes/productmeasure.pdf] which we denote by $P$, is the unique probability measure on $\mathcal{F}$ such that for any cylinder set $\prod_{n=1}^\infty A_n$,

$$P\left( \prod_{n=1}^\infty A_n \right) = \prod_{n=1}^\infty \mu(A_n).$$
Let $\pi_n : \Omega \to \mathbb{R}$, $n \ge 1$, be the projection map $\pi_n(\omega) = \omega_n$. Define $\sigma : \Omega \to \Omega$ by

$$\sigma(\omega_1, \omega_2, \omega_3, \ldots) = (\omega_2, \omega_3, \omega_4, \ldots),$$

the left shift map. In other words, for $n \ge 1$, $\sigma$ satisfies $\pi_n \circ \sigma = \pi_{n+1}$, and so $\sigma(\omega)_n = \omega_{n+1}$.
Lemma 7.
$\sigma : \Omega \to \Omega$ is measure-preserving.
Proof.
For $A = \prod_{n=1}^\infty A_n \in \mathcal{C}$,

$$\sigma^{-1}(A) = \prod_{n=1}^\infty B_n,$$

where $B_1 = \mathbb{R}$ and for $n \ge 2$, $B_n = A_{n-1}$. $\sigma^{-1}(A)$ is a cylinder set so a fortiori belongs to $\mathcal{F}$, and

$$P(\sigma^{-1}(A)) = \mu(\mathbb{R}) \prod_{n=2}^\infty \mu(A_{n-1}) = \prod_{n=1}^\infty \mu(A_n),$$

so

$$P(\sigma^{-1}(A)) = P(A).$$

Therefore by Lemma 5, because $\mathcal{C}$ is a semialgebra that generates $\mathcal{F}$, it follows that $\sigma$ is measure-preserving. ∎
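The computation in Lemma 7 can be sketched for a Bernoulli base measure; the encoding below is ours, not from the text: a cylinder set over $\{0, 1\}^{\mathbb{N}}$ is represented by a dict mapping its finitely many constrained coordinates to required values, and pulling back by $\sigma$ moves each constraint one coordinate to the right, leaving the product measure unchanged.

```python
from fractions import Fraction

# Base measure mu on {0, 1}: Bernoulli with mu({1}) = 1/3.
mu = {1: Fraction(1, 3), 0: Fraction(2, 3)}

def cylinder_measure(constraints):
    """P(A) for the cylinder A = {omega : omega_n = v for (n, v) in constraints};
    each unconstrained coordinate contributes a factor mu of the whole space, i.e. 1."""
    prod = Fraction(1)
    for value in constraints.values():
        prod *= mu[value]
    return prod

def shift_preimage(constraints):
    """sigma^{-1}(A): since sigma(omega)_n = omega_{n+1}, a constraint on
    coordinate n becomes a constraint on coordinate n + 1."""
    return {n + 1: v for n, v in constraints.items()}

A = {1: 1, 3: 0}  # the cylinder {omega : omega_1 = 1, omega_3 = 0}
assert cylinder_measure(shift_preimage(A)) == cylinder_measure(A)
print(cylinder_measure(A))  # → 2/9
```

The first coordinate of the preimage is unconstrained, contributing the factor $\mu(\{0,1\}) = 1$, which is exactly why $P(\sigma^{-1}(A)) = P(A)$ in the proof above.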
Lemma 8.
$\sigma$ is strong-mixing.
Proof.
Let $A = \prod_{n=1}^\infty A_n$ and $B = \prod_{n=1}^\infty B_n$ be cylinder sets. For $k \ge 1$,

$$\sigma^{-k}(A) = \prod_{n=1}^\infty C_n,$$

where $C_n = \mathbb{R}$ for $1 \le n \le k$ and $C_n = A_{n-k}$ for $n > k$. Because $A$ and $B$ are cylinder sets, there is some $N$ such that $A_n = \mathbb{R}$ and $B_n = \mathbb{R}$ when $n \ge N$. Thus for $k \ge N$,

$$\sigma^{-k}(A) \cap B = \prod_{n=1}^\infty D_n, \qquad D_n = \begin{cases} B_n, & 1 \le n \le k, \\ A_{n-k}, & n > k. \end{cases}$$

Hence

$$P(\sigma^{-k}(A) \cap B) = \prod_{n=1}^k \mu(B_n) \cdot \prod_{n>k} \mu(A_{n-k}) = P(B) P(A).$$

That is, there is some $N$ such that when $k \ge N$,

$$P(\sigma^{-k}(A) \cap B) = P(A) P(B),$$

and so a fortiori,

$$P(\sigma^{-k}(A) \cap B) \to P(A) P(B), \qquad k \to \infty.$$

Therefore, because the cylinder sets generate the $\sigma$-algebra $\mathcal{F}$, by Theorem 6 we get that $\sigma$ is strong-mixing. ∎
We now use the hypothesis that $\mu$ has finite first moment.¹⁵ [Elias M. Stein and Rami Shakarchi, Functional Analysis, Princeton Lectures in Analysis, volume IV, p. 208, chapter 5, §2.1.]
Lemma 9.
$\pi_1 \in L^1(P)$.
Proof.
$\pi_1 : \Omega \to \mathbb{R}$ is measurable, and $(\pi_1)_* P = \mu$. The statement that $\mu$ has finite first moment means that $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x$ belongs to $L^1(\mu)$. Therefore by the change of variables theorem (2), we have $\pi_1 = g \circ \pi_1 \in L^1(P)$. ∎
Lemma 10.
For almost all $\omega \in \Omega$,

$$\frac{1}{n} \sum_{k=1}^n \omega_k \to \int_{\mathbb{R}} x \, d\mu(x), \qquad n \to \infty.$$
Proof.
Because $\sigma$ is ergodic and $\pi_1 \in L^1(P)$, the Birkhoff ergodic theorem¹⁶ [Peter Walters, An Introduction to Ergodic Theory, p. 34, Theorem 1.14.] tells us that for almost all $\omega \in \Omega$,

$$\frac{1}{n} \sum_{k=0}^{n-1} \pi_1(\sigma^k \omega) \to \int_\Omega \pi_1 \, dP = \int_{\mathbb{R}} x \, d\mu(x), \qquad n \to \infty,$$

i.e.,

$$\frac{1}{n} \sum_{k=1}^n \omega_k \to \int_{\mathbb{R}} x \, d\mu(x), \qquad n \to \infty,$$

proving the claim. ∎
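Lemma 10 can be watched along a single sample path (the base measure $\mu$, here an exponential distribution with mean $1$, and the sample size are our choices): the average of the first $n$ coordinates of an i.i.d. draw from $\mu$ approaches $\int x \, d\mu(x)$.

```python
import random

random.seed(1)

# mu = exponential distribution with rate 1, whose first moment is 1.
n = 200_000
omega = [random.expovariate(1.0) for _ in range(n)]

avg = sum(omega) / n
print(avg)

# Almost sure convergence of the coordinate averages to the mean of mu;
# the tolerance 0.02 is generous (the standard error here is about 0.002).
assert abs(avg - 1.0) < 0.02
```

One path suffices to illustrate the lemma because the convergence is almost sure, not merely in probability.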
We will use Lemma 10 to prove the strong law of large numbers. First we prove two lemmas about joint distributions.¹⁷ [Elias M. Stein and Rami Shakarchi, Functional Analysis, Princeton Lectures in Analysis, volume IV, p. 208, Lemma 2.2, Lemma 2.3.]
Lemma 11.
If $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$ are random variables, defined respectively on probability spaces $(\Omega, \mathcal{F}, P)$ and $(\Omega', \mathcal{F}', Q)$, with the same joint distribution, and for each $1 \le j \le m$, $f_j : \mathbb{R}^n \to \mathbb{R}$ is a continuous function, then for $U = (f_1 \circ X, \ldots, f_m \circ X)$ and $V = (f_1 \circ Y, \ldots, f_m \circ Y)$, the random variables $U$ and $V$ have the same joint distribution.
Proof.
Write $X = (X_1, \ldots, X_n)$, $Y = (Y_1, \ldots, Y_n)$, $U = (f_1 \circ X, \ldots, f_m \circ X)$, and $V = (f_1 \circ Y, \ldots, f_m \circ Y)$, which are Borel measurable. Let $F = (f_1, \ldots, f_m) : \mathbb{R}^n \to \mathbb{R}^m$, which is continuous, and for which

$$U = F \circ X, \qquad V = F \circ Y.$$

To show that $U$ and $V$ have the same joint distribution means to show that $U_* P = V_* Q$. Let $A \in \mathcal{B}_{\mathbb{R}^m}$, for which $F^{-1}(A) \in \mathcal{B}_{\mathbb{R}^n}$ and

$$(U_* P)(A) = P(X^{-1}(F^{-1}(A))) = (X_* P)(F^{-1}(A)),$$

where, because $X$ and $Y$ have the same joint distribution,

$$(X_* P)(F^{-1}(A)) = (Y_* Q)(F^{-1}(A)) = Q(Y^{-1}(F^{-1}(A))) = (V_* Q)(A).$$
∎
Lemma 12.
If sequences of random variables $X_n$ and $Y_n$, $n \ge 1$, defined respectively on probability spaces $(\Omega, \mathcal{F}, P)$ and $(\Omega', \mathcal{F}', Q)$, have the same finite-dimensional distributions and there is some $c \in \mathbb{R}$ such that $X_n$ converges to $c$ almost surely, then $Y_n$ converges to $c$ almost surely.
Proof.
That the sequences $X_n$ and $Y_n$ have the same finite-dimensional distributions means that for each $n \ge 1$,

$$(X_1, \ldots, X_n)_* P = (Y_1, \ldots, Y_n)_* Q,$$

i.e., for each $n \ge 1$ and for each $A \in \mathcal{B}_{\mathbb{R}^n}$,

$$P((X_1, \ldots, X_n) \in A) = Q((Y_1, \ldots, Y_n) \in A).$$

Define

$$A_{k,N,M} = \bigcap_{n=N}^{M} \left\{ |X_n - c| \le \frac{1}{k} \right\}, \qquad B_{k,N,M} = \bigcap_{n=N}^{M} \left\{ |Y_n - c| \le \frac{1}{k} \right\}.$$

Then define

$$A_{k,N} = \bigcap_{M \ge N} A_{k,N,M} = \bigcap_{n \ge N} \left\{ |X_n - c| \le \frac{1}{k} \right\}, \qquad A = \bigcap_{k \ge 1} \bigcup_{N \ge 1} A_{k,N},$$

and

$$B_{k,N} = \bigcap_{M \ge N} B_{k,N,M}, \qquad B = \bigcap_{k \ge 1} \bigcup_{N \ge 1} B_{k,N}.$$

Because $(X_1, \ldots, X_M)$ and $(Y_1, \ldots, Y_M)$ have the same distribution,

$$P(A_{k,N,M}) = Q(B_{k,N,M}).$$

But $A_{k,N,M}$, $M \ge N$, is a decreasing sequence with intersection $A_{k,N}$, and likewise for $B_{k,N,M}$, so, by continuity from above,

$$P(A_{k,N}) = \lim_{M \to \infty} P(A_{k,N,M}) = \lim_{M \to \infty} Q(B_{k,N,M}) = Q(B_{k,N}).$$

Then, as $A_{k,N}$, $N \ge 1$, is an increasing sequence, by continuity from below,

$$P\left( \bigcup_{N \ge 1} A_{k,N} \right) = \lim_{N \to \infty} P(A_{k,N}) = \lim_{N \to \infty} Q(B_{k,N}) = Q\left( \bigcup_{N \ge 1} B_{k,N} \right).$$

Then, as $\bigcup_{N \ge 1} A_{k,N}$, $k \ge 1$, is a decreasing sequence, by continuity from above,

$$P(A) = \lim_{k \to \infty} P\left( \bigcup_{N \ge 1} A_{k,N} \right) = \lim_{k \to \infty} Q\left( \bigcup_{N \ge 1} B_{k,N} \right) = Q(B).$$

That the sequence $X_n$ converges almost surely to $c$ means that $P(A) = 1$, and therefore $Q(B) = 1$, i.e. the sequence $Y_n$ converges almost surely to $c$. ∎
We now use what we have established to prove the strong law of large numbers.¹⁸ [Elias M. Stein and Rami Shakarchi, Functional Analysis, Princeton Lectures in Analysis, volume IV, p. 206, Theorem 2.1.] (We write $(\Omega', \mathcal{F}', P')$ for a probability space because $(\Omega, \mathcal{F}, P)$ denotes the product probability space constructed already.)
Theorem 13.
If $X_n \in L^1(P')$, $n \ge 1$, are independent identically distributed random variables, then for almost all $\omega' \in \Omega'$,

$$\frac{1}{n} \sum_{k=1}^n X_k(\omega') \to E(X_1), \qquad n \to \infty.$$
Proof.
For $n \ge 1$, let $Y_n = \pi_n$; we do this to make the index set the same as for $X_n$. Let $\mu = (X_1)_* P'$ and set $S_n = \sum_{k=1}^n X_k$. Because the $X_n$ are identically distributed, $(X_n)_* P' = \mu$ for each $n$.

Because $X_1$ is $L^1(P')$, $\mu$ has finite first moment. As $(X_1)_* P' = \mu$, applying the change of variables theorem (2),

$$\int_{\mathbb{R}} x \, d\mu(x) = \int_{\Omega'} X_1 \, dP' = E(X_1).$$

With this, Lemma 10 says that for almost all $\omega \in \Omega$,

$$\frac{1}{n} \sum_{k=1}^n \omega_k \to E(X_1), \qquad n \to \infty. \tag{3}$$

For each $n$,

$$(Y_1, \ldots, Y_n)_* P = \mu \times \cdots \times \mu,$$

and because the $X_n$ are independent,

$$(X_1, \ldots, X_n)_* P' = \mu \times \cdots \times \mu,$$

so the sequences $X_n$ and $Y_n$ have the same finite-dimensional distributions. For $n \ge 1$, define $g_n : \mathbb{R}^n \to \mathbb{R}$ by

$$g_n(x_1, \ldots, x_n) = \frac{1}{n} \sum_{k=1}^n x_k.$$

Lemma 11 then tells us that the sequences

$$g_n(X_1, \ldots, X_n) = \frac{S_n}{n}$$

and

$$g_n(Y_1, \ldots, Y_n) = \frac{1}{n} \sum_{k=1}^n \pi_k,$$

$n \ge 1$, have the same finite-dimensional distributions.