The Berry-Esseen theorem
1 Cumulative distribution functions
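For a random variable $X:(\Omega,\mathscr{F},P)\to\mathbb{R}$, we define its cumulative distribution function $F_X:\mathbb{R}\to[0,1]$ by
$$F_X(x)=P(X\le x),\qquad x\in\mathbb{R}.$$
A distribution function is a function $F:\mathbb{R}\to[0,1]$ such that (i) $\lim_{x\to-\infty}F(x)=0$, (ii) $\lim_{x\to\infty}F(x)=1$, (iii) $F$ is nondecreasing, (iv) $F$ is right-continuous: for each $x\in\mathbb{R}$,
$$\lim_{y\downarrow x}F(y)=F(x).$$
It is a fact that the cumulative distribution function of a random variable is a distribution function and that for any distribution function $F$ there is a random variable $X$ for which $F=F_X$.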
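The second half of this fact is usually shown with the quantile transform: if $U$ is uniform on $(0,1)$, then $X=\inf\{x : F(x)\ge U\}$ has cumulative distribution function $F$. The following is a minimal numerical sketch of that construction, not part of the original argument. The exponential law, the bisection bracket $[-50,50]$, and all function names are illustrative assumptions.

```python
import math
import random


def exponential_cdf(x, rate=1.0):
    """Distribution function of an exponential law (an arbitrary test case)."""
    return 1.0 - math.exp(-rate * x) if x >= 0.0 else 0.0


def generalized_inverse(F, u, lo=-50.0, hi=50.0, tol=1e-10):
    """Approximate inf{x : F(x) >= u} by bisection on the assumed bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= u:
            hi = mid   # mid already satisfies F(mid) >= u, so the infimum is <= mid
        else:
            lo = mid   # F(mid) < u, so the infimum is > mid
    return hi


def sample(F, size, seed=0):
    """Draw `size` values X = F^{-1}(U) with U uniform on [0, 1)."""
    rng = random.Random(seed)
    return [generalized_inverse(F, rng.random()) for _ in range(size)]


def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)
```

Comparing `empirical_cdf(sample(exponential_cdf, 20000), 1.0)` with `exponential_cdf(1.0)` gives a quick sanity check that the constructed variable has the prescribed distribution function.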
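Let $\gamma$ be the standard Gaussian measure on $\mathbb{R}$: $\gamma$ has density
$$p(t)=\frac{1}{\sqrt{2\pi}}e^{-t^2/2}$$
with respect to Lebesgue measure on $\mathbb{R}$. Let $\Phi$ be the cumulative distribution function of $\gamma$:
$$\Phi(x)=\int_{-\infty}^{x}p(t)\,dt,\qquad x\in\mathbb{R}.$$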
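For numerical work, the Gaussian distribution function can be written with the error function as $\Phi(x)=\tfrac12\bigl(1+\operatorname{erf}(x/\sqrt{2})\bigr)$. The sketch below evaluates it that way and cross-checks it against a midpoint-rule integral of the density. The function names and the truncation of the integral at $-10$ are assumptions made for the example.

```python
import math


def gaussian_density(t):
    """Density of the standard Gaussian measure with respect to Lebesgue measure."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard Gaussian distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Phi_by_quadrature(x, lower=-10.0, steps=20000):
    """Midpoint-rule approximation of the integral of the density over [lower, x]."""
    h = (x - lower) / steps
    return h * sum(gaussian_density(lower + (k + 0.5) * h) for k in range(steps))
```

The two evaluations should agree to several decimal places, for instance at `x = 1.0`.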
We first prove the following lemma about distribution functions.¹
¹ Kai Lai Chung, A Course in Probability Theory, third ed., p. 236, Lemma 1; cf. Allan Gut, Probability: A Graduate Course, second ed., p. 358, Lemma 6.1.
Lemma 1.
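Suppose that $F$ is a distribution function, that $G:\mathbb{R}\to\mathbb{R}$ satisfies
$$\lim_{x\to-\infty}G(x)=0,\qquad \lim_{x\to\infty}G(x)=1,$$
and that $G$ is differentiable and its derivative satisfies
$$M=\sup_{x\in\mathbb{R}}|G'(x)|<\infty.\tag{1}$$
Writing
$$\Delta=\frac{1}{2M}\sup_{x\in\mathbb{R}}|F(x)-G(x)|,$$
there is some $a\in\mathbb{R}$ such that for all $T>0$,
$$2MT\Delta\left(3\int_0^{T\Delta}\frac{1-\cos x}{x^2}\,dx-\pi\right)\le\left|\int_{\mathbb{R}}\frac{1-\cos Tx}{x^2}\bigl(F(x+a)-G(x+a)\bigr)\,dx\right|.$$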
Proof.
Because and , there is some compact interval such that for . Then, because is continuous it is bounded on , showing that is bounded on , and because we get .
Write . Because and , there is a compact interval for which
By the Bolzano-Weierstrass theorem, either there is a sequence increasing to some such that or there is a sequence decreasing to some such that .²
² In the proof in Chung there are merely two cases but it is not explained why those are exhaustive.
In the first case, either there is a subsequence of such that or there is a subsequence of such that . In the first subcase we get , thus
(2) |
In the second subcase we get , thus
(3) |
In the second case, either there is a subsequence of such that or there is a subsequence of such that . In the first subcase we get , thus
(4) |
In the second subcase we get , thus
(5) |
We now deal with the subcase (3). Let . For , by (1) we have
whence
Because and as is nondecreasing and using (3),
Then, because is an odd function,
On the other hand,
Thus
which yields the claim of the lemma, for the subcase (3). ∎
We now prove a lemma that gives an inequality for characteristic functions.³
³ Kai Lai Chung, A Course in Probability Theory, third ed., p. 237, Lemma 2; Zhengyan Lin and Zhidong Bai, Probability Inequalities, p. 29, Theorem 4.1.a.
We remark that because $F$ is a distribution function, it makes sense to speak about the measure induced by $F$, and because $G$ is of bounded variation and is continuous, its variation function $V_G$ is continuous and the functions $V_G$ and $V_G-G$ are nondecreasing, and it thus makes sense to speak about the signed measure induced by $G$, which is equal to the difference of the measures induced by $V_G$ and $V_G-G$.
Lemma 2.
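Suppose that $F$ is a distribution function, that $G:\mathbb{R}\to\mathbb{R}$ satisfies
$$\lim_{x\to-\infty}G(x)=0,\qquad \lim_{x\to\infty}G(x)=1,$$
that $G$ is differentiable and of bounded variation and that its derivative satisfies
$$M=\sup_{x\in\mathbb{R}}|G'(x)|<\infty,$$
and that
$$\int_{\mathbb{R}}|F(x)-G(x)|\,dx<\infty.$$
Write
$$f(t)=\int_{\mathbb{R}}e^{itx}\,dF(x)$$
and
$$g(t)=\int_{\mathbb{R}}e^{itx}\,dG(x).$$
Then for all $T>0$,
$$\sup_{x\in\mathbb{R}}|F(x)-G(x)|\le\frac{2}{\pi}\int_0^T\frac{|f(t)-g(t)|}{t}\,dt+\frac{24M}{\pi T}.$$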
2 Berry-Esseen theorem
Let $X_{nj}$, $n\ge 1$, $1\le j\le k_n$, be random variables, with $X_{nj}\in L^3(P)$, such that for each $n$, the random variables $X_{nj}$, $1\le j\le k_n$, are independent, and such that for all $n$ and $j$,
$$E(X_{nj})=0.$$
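Let $F_{nj}$ be the cumulative distribution function of $X_{nj}$:
$$F_{nj}(x)=P(X_{nj}\le x).$$
Let $f_{nj}$ be the characteristic function of $F_{nj}$ (equivalently, the characteristic function of $X_{nj}$):
$$f_{nj}(t)=\int_{\mathbb{R}}e^{itx}\,dF_{nj}(x)=E\bigl(e^{itX_{nj}}\bigr).$$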
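Write, for $n\ge 1$,
$$S_n=\sum_{j=1}^{k_n}X_{nj},$$
and let $F_n$ be the cumulative distribution function of $S_n$:
$$F_n(x)=P(S_n\le x).$$
Also, let $f_n$ be the characteristic function of $F_n$ (equivalently, the characteristic function of $S_n$). Because $X_{nj}$, $1\le j\le k_n$, are independent, we have $E\bigl(e^{itS_n}\bigr)=\prod_{j=1}^{k_n}E\bigl(e^{itX_{nj}}\bigr)$ and hence
$$f_n(t)=\prod_{j=1}^{k_n}f_{nj}(t).$$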
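The product rule for characteristic functions of independent summands can be checked directly on finitely supported laws. The following sketch computes the law of a sum by convolving probability mass functions and compares its characteristic function with the product of the individual ones. The dictionary representation of a law and the function names are assumptions made for this illustration.

```python
import cmath
from itertools import product as cartesian


def char_fn(pmf, t):
    """Characteristic function E(exp(itX)) of a finitely supported law {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


def pmf_of_sum(pmf_a, pmf_b):
    """Law of X + Y for independent X ~ pmf_a and Y ~ pmf_b (discrete convolution)."""
    out = {}
    for (x, p), (y, q) in cartesian(pmf_a.items(), pmf_b.items()):
        out[x + y] = out.get(x + y, 0.0) + p * q
    return out


# Two arbitrary centered laws used only as test data.
law_a = {-1.0: 0.5, 1.0: 0.5}
law_b = {-2.0: 0.2, 0.5: 0.8}
```

For any real `t`, `char_fn(pmf_of_sum(law_a, law_b), t)` and `char_fn(law_a, t) * char_fn(law_b, t)` should agree up to rounding error.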
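For $n\ge 1$ and $1\le j\le k_n$, write
$$\sigma_{nj}^2=E\bigl(X_{nj}^2\bigr),\qquad \gamma_{nj}=E\bigl(|X_{nj}|^3\bigr),$$
and
$$\Gamma_n=\sum_{j=1}^{k_n}\gamma_{nj}.$$
We further assume that for each $n$,
$$\sum_{j=1}^{k_n}\sigma_{nj}^2=1.\tag{6}$$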
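As a concrete instance of this setup, take $k_n=n$ and $X_{nj}=(B_j-p)/\sqrt{np(1-p)}$ with $B_1,\dots,B_n$ independent Bernoulli($p$) variables. The sketch below computes the mean, the sum of the second moments, and the sum of the third absolute moments for this array, assuming the sum of third absolute moments is the quantity written $\Gamma_n$ as in Chung. The Bernoulli array and the function name are illustrative choices, not taken from the text.

```python
import math


def bernoulli_array_moments(n, p):
    """Return (mean of one summand, sum of variances, sum of third absolute moments)
    for X_nj = (B_j - p) / sqrt(n p (1 - p)), j = 1..n, with B_j ~ Bernoulli(p)."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    # Each summand takes the value q/s with probability p and -p/s with probability q.
    atoms = [(q / s, p), (-p / s, q)]
    mean = sum(x * w for x, w in atoms)
    second = sum(x * x * w for x, w in atoms)
    third_abs = sum(abs(x) ** 3 * w for x, w in atoms)
    return mean, n * second, n * third_abs
```

For this array the normalization of the variances holds by construction, and the sum of third absolute moments has the closed form $(p^2+(1-p)^2)/\sqrt{np(1-p)}$, which tends to $0$ like $n^{-1/2}$.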
We will use the following inequality which we state separately because it is of general use.
Lemma 3.
For and ,
We now prove an inequality for $f_n$, the characteristic function of $S_n$.⁴
⁴ Kai Lai Chung, A Course in Probability Theory, third ed., p. 239, Lemma 3.
Lemma 4.
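For $n\ge 1$, if $|t|<\frac{1}{2\Gamma_n^{1/3}}$ then
$$\left|f_n(t)-e^{-t^2/2}\right|\le\Gamma_n|t|^3e^{-t^2/2}.$$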
Proof.
For and and ,
Thus
and
Then by Taylor’s theorem, there is some between and such that
Put
for which
Because the norm is upper bounded by the norm and because ,
and hence
Lemma 3 and the inequality then tell us that
Because and ,
Combining this with ,
Because this is true for each and because, according to (6), ,
For any it is true that , so the above yields
But , so
which completes the proof. ∎
The next lemma gives a different bound on the characteristic function of $S_n$.⁵
⁵ Kai Lai Chung, A Course in Probability Theory, third ed., p. 240, Lemma 4.
Lemma 5.
For $n\ge 1$, if $|t|<\frac{1}{4\Gamma_n}$ then
$$|f_n(t)|\le e^{-t^2/3}.$$
Proof.
First, for a distribution function with characteristic function ,
Because is real it follows that
Using
we have
and then
Using this for , and using that ,
Because for all ,
Then, by (6),
As ,
proving the claim. ∎
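Both bounds on the characteristic function can be spot-checked numerically. The sketch below assumes they read as in Chung's Lemmas 3 and 4: $|f_n(t)-e^{-t^2/2}|\le\Gamma_n|t|^3e^{-t^2/2}$ for $|t|<1/(2\Gamma_n^{1/3})$, and $|f_n(t)|\le e^{-t^2/3}$ for $|t|<1/(4\Gamma_n)$. It tests them on a grid for a normalized Bernoulli array, which is an illustrative choice, as are the function names, the grid size, and the rounding tolerance.

```python
import cmath
import math


def bernoulli_char_fn(n, p, t):
    """Characteristic function of S_n = sum_j (B_j - p)/sqrt(n p q), B_j ~ Bernoulli(p)."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    single = p * cmath.exp(1j * t * q / s) + q * cmath.exp(-1j * t * p / s)
    return single ** n


def big_gamma(n, p):
    """Sum of third absolute moments of the normalized Bernoulli array."""
    q = 1.0 - p
    return (p * p + q * q) / math.sqrt(n * p * q)


def check_bounds(n, p, grid=200, tol=1e-12):
    """Return (first bound holds on its grid, second bound holds on its grid)."""
    g = big_gamma(n, p)
    t_first = 1.0 / (2.0 * g ** (1.0 / 3.0))
    t_second = 1.0 / (4.0 * g)
    ok_first = all(
        abs(bernoulli_char_fn(n, p, t) - math.exp(-t * t / 2.0))
        <= g * t ** 3 * math.exp(-t * t / 2.0) + tol
        for t in (t_first * k / grid for k in range(1, grid))
    )
    ok_second = all(
        abs(bernoulli_char_fn(n, p, t)) <= math.exp(-t * t / 3.0) + tol
        for t in (t_second * k / grid for k in range(1, grid))
    )
    return ok_first, ok_second
```

A grid check of this kind is only a sanity test on one family of examples. It does not replace the proofs above.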
We now combine Lemma 4 and Lemma 5.⁶
⁶ Kai Lai Chung, A Course in Probability Theory, third ed., p. 240, Lemma 5.
Lemma 6.
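For $n\ge 1$, if $|t|<\frac{1}{4\Gamma_n}$ then
$$\left|f_n(t)-e^{-t^2/2}\right|\le 16\,\Gamma_n|t|^3e^{-t^2/3}.$$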
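Proof.
Let $|t|<\frac{1}{4\Gamma_n}$. If $|t|<\frac{1}{2\Gamma_n^{1/3}}$, then Lemma 4 gives
$$\left|f_n(t)-e^{-t^2/2}\right|\le\Gamma_n|t|^3e^{-t^2/2}\le 16\,\Gamma_n|t|^3e^{-t^2/3}.$$
If instead $|t|\ge\frac{1}{2\Gamma_n^{1/3}}$, then $8\Gamma_n|t|^3\ge 1$, and Lemma 5 gives
$$\left|f_n(t)-e^{-t^2/2}\right|\le|f_n(t)|+e^{-t^2/2}\le 2e^{-t^2/3}\le 16\,\Gamma_n|t|^3e^{-t^2/3}.$$
∎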
We finally prove the Berry-Esseen theorem.⁷
⁷ Kai Lai Chung, A Course in Probability Theory, third ed., p. 235, Theorem 7.4.1; cf. Allan Gut, Probability: A Graduate Course, second ed., p. 356, Theorem 6.2; John E. Kolassa, Series Approximation Methods in Statistics, p. 25, Theorem 2.6.1; Alexandr A. Borovkov, Probability Theory, p. 659, Theorem A5.1; Ivan Nourdin and Giovanni Peccati, Normal Approximations with Malliavin Calculus: From Stein’s Method to Universality, p. 71, Theorem 3.7.1.
Theorem 7 (Berry-Esseen theorem).
There is some $A_0$ such that for each $n\ge 1$,
$$\sup_{x\in\mathbb{R}}|F_n(x)-\Phi(x)|\le A_0\Gamma_n.$$
Proof.
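Let $Z$ be a random variable with $Z_*P=\gamma$, i.e. whose cumulative distribution function is $\Phi$. By (6) and because $X_{nj}$, $1\le j\le k_n$, are independent and satisfy $E(X_{nj})=0$,
$$E\bigl(S_n^2\bigr)=\sum_{j=1}^{k_n}\sigma_{nj}^2=1.$$
If $x<0$ then by Chebyshev’s inequality
$$F_n(x)=P(S_n\le x)\le P(|S_n|\ge|x|)\le\frac{1}{x^2}$$
and
$$\Phi(x)=P(Z\le x)\le P(|Z|\ge|x|)\le\frac{1}{x^2}.$$
If $x>0$ then also by Chebyshev’s inequality
$$1-F_n(x)=P(S_n>x)\le P(|S_n|\ge|x|)\le\frac{1}{x^2}$$
and
$$1-\Phi(x)=P(Z>x)\le P(|Z|\ge|x|)\le\frac{1}{x^2}.$$
Therefore, because $F_n$ and $\Phi$ are nonnegative and $1-F_n$ and $1-\Phi$ are nonnegative, for all $x\ne 0$ we have
$$|F_n(x)-\Phi(x)|\le\frac{1}{x^2}.$$
Then, because $|F_n-\Phi|\le 1$ and $\int_{|x|\ge 1}x^{-2}\,dx<\infty$,
$$\int_{\mathbb{R}}|F_n(x)-\Phi(x)|\,dx<\infty.$$
We apply Lemma 2 with $F=F_n$, $G=\Phi$, and $T=\frac{1}{4\Gamma_n}$, for which $M=\sup_x\Phi'(x)=\frac{1}{\sqrt{2\pi}}$, and because the characteristic function of $\Phi$ is $e^{-t^2/2}$, we obtain for $n\ge 1$,
$$\sup_{x\in\mathbb{R}}|F_n(x)-\Phi(x)|\le\frac{2}{\pi}\int_0^{1/(4\Gamma_n)}\frac{\left|f_n(t)-e^{-t^2/2}\right|}{t}\,dt+\frac{96\,\Gamma_n}{\sqrt{2\pi^3}}.$$
Then applying Lemma 6,
$$\sup_{x\in\mathbb{R}}|F_n(x)-\Phi(x)|\le\frac{32\,\Gamma_n}{\pi}\int_0^{\infty}t^2e^{-t^2/3}\,dt+\frac{96\,\Gamma_n}{\sqrt{2\pi^3}}.$$
This proves the claim with
$$A_0=\frac{32}{\pi}\int_0^{\infty}t^2e^{-t^2/3}\,dt+\frac{96}{\sqrt{2\pi^3}}.$$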
∎
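To see the theorem at work on an example, one can compute the exact uniform distance between the distribution function of a standardized binomial sum and $\Phi$, then compare it with the sum of third absolute moments, assumed here to be the quantity $\Gamma_n$ of Section 2. The sketch below does this for $X_{nj}=(B_j-p)/\sqrt{np(1-p)}$ with independent Bernoulli($p$) variables $B_j$. The Bernoulli array and all function names are illustrative assumptions.

```python
import math


def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def big_gamma(n, p):
    """Sum of third absolute moments of the normalized Bernoulli array."""
    q = 1.0 - p
    return (p * p + q * q) / math.sqrt(n * p * q)


def kolmogorov_distance(n, p):
    """Exact sup_x |F_n(x) - Phi(x)| for the standardized Binomial(n, p) sum.

    F_n is a step function and Phi is increasing, so the supremum is attained
    at a jump point, either at the value of F_n there or at its left limit.
    """
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    cdf_left = 0.0
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / s
        cdf_right = cdf_left + math.comb(n, k) * p ** k * q ** (n - k)
        phi = Phi(x)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst
```

For example, `kolmogorov_distance(n, 0.5) / big_gamma(n, 0.5)` can be tabulated for increasing `n`. The theorem asserts only that such ratios are bounded by a constant independent of the array, and the computation is consistent with that on this family.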