# The Berry-Esseen theorem

Jordan Bell
June 3, 2015

## 1 Cumulative distribution functions

For a random variable $X:(\Omega,\mathscr{F},P)\to\mathbb{R}$, we define its cumulative distribution function $F_{X}:\mathbb{R}\to[0,1]$ by

 $F_{X}(x)=P(X\leq x)=\int_{\{X\leq x\}}dP=\int_{t\leq x}d(X_{*}P)(t)=(X_{*}P)((-\infty,x]).$

A distribution function is a function $F:\mathbb{R}\to[0,1]$ such that (i) $F(-\infty)=\lim_{x\to-\infty}F(x)=0$, (ii) $F(\infty)=\lim_{x\to\infty}F(x)=1$, (iii) $F$ is nondecreasing, (iv) $F$ is right-continuous: for each $x\in\mathbb{R}$,

 $F(x+)=\lim_{t\downarrow x}F(t)=F(x).$

It is a fact that the cumulative distribution function of a random variable is a distribution function and that for any distribution function $F$ there is a random variable $X$ for which $F=F_{X}$.

Let $\gamma_{1}$ be the standard Gaussian measure on $\mathbb{R}$: $\gamma_{1}$ has density

 $p(t,0,1)=\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}$

with respect to Lebesgue measure on $\mathbb{R}$. Let $\Phi$ be the cumulative distribution function of $\gamma_{1}$:

 $\Phi(x)=\gamma_{1}((-\infty,x])=\int_{-\infty}^{x}d\gamma_{1}(t)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt.$
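For numerical experiments, $\Phi$ can be evaluated through the error function, using the identity $\Phi(x)=\frac{1}{2}\left(1+\operatorname{erf}(x/\sqrt{2})\right)$. The following Python sketch is only an illustration: the names `gaussian_density`, `Phi` and `Phi_by_quadrature` are ours, and the truncation point `lower=-12.0` is an assumption chosen so that the neglected tail is negligible in double precision.

```python
import math

def gaussian_density(t):
    """Density p(t, 0, 1) of the standard Gaussian measure gamma_1."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Cumulative distribution function of gamma_1, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_by_quadrature(x, lower=-12.0, steps=200_000):
    """Midpoint-rule approximation of the integral defining Phi(x).

    The integral over (-infinity, lower) is dropped (an assumption of this sketch).
    """
    h = (x - lower) / steps
    return h * sum(gaussian_density(lower + (k + 0.5) * h) for k in range(steps))
```

Comparing `Phi(x)` with `Phi_by_quadrature(x)` at a few points gives a quick sanity check that the closed form and the defining integral agree.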

We first prove the following lemma about distribution functions (Kai Lai Chung, A Course in Probability Theory, third ed., p. 236, Lemma 1; cf. Allan Gut, Probability: A Graduate Course, second ed., p. 358, Lemma 6.1).

###### Lemma 1.

Suppose that $F$ is a distribution function, that $G:\mathbb{R}\to\mathbb{R}$ satisfies

 $G(-\infty)=\lim_{x\to-\infty}G(x)=0,\qquad G(\infty)=\lim_{x\to\infty}G(x)=1,$

and that $G$ is differentiable and its derivative satisfies

 $M=\sup_{x\in\mathbb{R}}|G^{\prime}(x)|<\infty.$ (1)

Writing

 $\Delta=\frac{1}{2M}\sup_{x\in\mathbb{R}}|F(x)-G(x)|,$

there is some $a\in\mathbb{R}$ such that for all $T>0$,

 $\begin{split}&2MT\Delta\left(3\int_{0}^{T\Delta}\frac{1-\cos x}{x^{2}}dx-\pi\right)\\ \leq{}&\left|\int_{\mathbb{R}}\frac{1-\cos Tx}{x^{2}}(F(x+a)-G(x+a))dx\right|.\end{split}$
###### Proof.

Because $G(-\infty)=0$ and $G(\infty)=1$, there is some compact interval $K$ such that $-1<G(x)<2$ for $x\in\mathbb{R}\setminus K$. Then, because $G$ is continuous it is bounded on $K$, showing that $G$ is bounded on $\mathbb{R}$. As $F$ is bounded too and $M>0$, we get $\Delta<\infty$.

Write $H=F-G$. Because $H(\infty)=0$ and $H(-\infty)=0$, there is a compact interval $K$ for which

 $2M\Delta=\sup_{x\in\mathbb{R}}|H(x)|=\sup_{x\in K}|H(x)|.$

By the Bolzano-Weierstrass theorem, either there is a sequence $u_{n}\in K$ increasing to some $u\in K$ such that $|H(u_{n})|\uparrow 2M\Delta$ or there is a sequence $u_{n}\in K$ decreasing to some $u\in K$ such that $|H(u_{n})|\uparrow 2M\Delta$. (In the proof in Chung there are merely two cases but it is not explained why those are exhaustive.) In the first case, either there is a subsequence $v_{n}$ of $u_{n}$ such that $|H(v_{n})|=H(v_{n})$ or there is a subsequence $v_{n}$ of $u_{n}$ such that $|H(v_{n})|=-H(v_{n})$. In the first subcase we get $H(u-)=2M\Delta$, thus

 $F(u-)-G(u)=2M\Delta.$ (2)

In the second subcase we get $H(u-)=-2M\Delta$, thus

 $F(u-)-G(u)=-2M\Delta.$ (3)

In the second case, either there is a subsequence $v_{n}$ of $u_{n}$ such that $|H(v_{n})|=H(v_{n})$ or there is a subsequence $v_{n}$ of $u_{n}$ such that $|H(v_{n})|=-H(v_{n})$. In the first subcase we get $H(u+)=2M\Delta$, thus

 $F(u)-G(u)=2M\Delta.$ (4)

In the second subcase we get $H(u+)=-2M\Delta$, thus

 $F(u)-G(u)=-2M\Delta.$ (5)

We now deal with the subcase (3). Let $a=u-\Delta$. For $|x|<\Delta$, by (1) we have

 $|G(x+a)-G(u)|=\left|\int_{u}^{u+x-\Delta}G^{\prime}(y)dy\right|\leq|x-\Delta|M=(\Delta-x)M,$

whence

 $G(x+a)\geq G(u)+(x-\Delta)M.$

Because $x+a=x+u-\Delta<u$, as $F$ is nondecreasing, and using (3),

 $\begin{split}F(x+a)-G(x+a)&\leq F(u-)-G(x+a)\\ &\leq F(u-)-(G(u)+(x-\Delta)M)\\ &=-M(x+\Delta).\end{split}$

Then, because $x\mapsto\frac{1-\cos Tx}{x^{2}}x$ is an odd function,

 $\begin{split}\int_{-\Delta}^{\Delta}\frac{1-\cos Tx}{x^{2}}(F(x+a)-G(x+a))dx&\leq-M\int_{-\Delta}^{\Delta}\frac{1-\cos Tx}{x^{2}}(x+\Delta)dx\\ &=-2M\Delta\int_{0}^{\Delta}\frac{1-\cos Tx}{x^{2}}dx.\end{split}$

On the other hand,

 $\begin{split}&\left|\int_{(-\infty,-\Delta)\cup(\Delta,\infty)}\frac{1-\cos Tx}{x^{2}}(F(x+a)-G(x+a))dx\right|\\ \leq{}&2M\Delta\int_{(-\infty,-\Delta)\cup(\Delta,\infty)}\frac{1-\cos Tx}{x^{2}}dx\\ ={}&4M\Delta\int_{\Delta}^{\infty}\frac{1-\cos Tx}{x^{2}}dx.\end{split}$

Thus

 $\begin{split}&\int_{\mathbb{R}}\frac{1-\cos Tx}{x^{2}}(F(x+a)-G(x+a))dx\\ \leq{}&-2M\Delta\int_{0}^{\Delta}\frac{1-\cos Tx}{x^{2}}dx+4M\Delta\int_{\Delta}^{\infty}\frac{1-\cos Tx}{x^{2}}dx\\ ={}&2M\Delta\left(-3\int_{0}^{\Delta}\frac{1-\cos Tx}{x^{2}}dx+2\int_{0}^{\infty}\frac{1-\cos Tx}{x^{2}}dx\right)\\ ={}&2M\Delta\left(-3\int_{0}^{\Delta}\frac{1-\cos Tx}{x^{2}}dx+2\cdot\frac{\pi T}{2}\right)\\ ={}&2MT\Delta\left(-3\int_{0}^{T\Delta}\frac{1-\cos x}{x^{2}}dx+\pi\right),\end{split}$

which yields the claim of the lemma, for the subcase (3). ∎

We now prove a lemma that gives an inequality for characteristic functions (Kai Lai Chung, A Course in Probability Theory, third ed., p. 237, Lemma 2; Zhengyan Lin and Zhidong Bai, Probability Inequalities, p. 29, Theorem 4.1.a). We remark that because $F$ is a distribution function, it makes sense to speak about the measure induced by $F$, and because $G$ is of bounded variation and is continuous, its variation function $V_{G}$ is continuous and the functions $V_{G}-G$ and $V_{G}$ are nondecreasing, and it thus makes sense to speak about the signed measure induced by $G=V_{G}-(V_{G}-G)$, which is equal to the difference of the measures induced by $V_{G}$ and $V_{G}-G$.

###### Lemma 2.

Suppose that $F$ is a distribution function, that $G:\mathbb{R}\to\mathbb{R}$ satisfies

 $G(-\infty)=\lim_{x\to-\infty}G(x)=0,\qquad G(\infty)=\lim_{x\to\infty}G(x)=1,$

that $G$ is differentiable and of bounded variation and that its derivative satisfies

 $M=\sup_{x\in\mathbb{R}}|G^{\prime}(x)|<\infty,$

and that

 $\int_{\mathbb{R}}|F-G|dx<\infty.$

Write

 $\Delta=\frac{1}{2M}\sup_{x\in\mathbb{R}}|F(x)-G(x)|$

and

 $f(t)=\int_{\mathbb{R}}e^{itx}dF(x),\qquad g(t)=\int_{\mathbb{R}}e^{itx}dG(x).$

Then for all $T>0$,

 $\Delta\leq\frac{1}{\pi M}\int_{0}^{T}\frac{|f(t)-g(t)|}{t}dt+\frac{12}{\pi T}.$
###### Proof.

For any $t\in\mathbb{R}$, because $(F-G)(-\infty)=0$ and $(F-G)(\infty)=0$ and because $\int_{\mathbb{R}}|F-G|dx<\infty$, integrating by parts gives

 $\begin{split}f(t)-g(t)&=\int_{\mathbb{R}}e^{itx}dF(x)-\int_{\mathbb{R}}e^{itx}dG(x)\\ &=\int_{\mathbb{R}}e^{itx}d(F-G)(x)\\ &=-it\int_{\mathbb{R}}(F-G)(x)e^{itx}dx.\end{split}$

Take $a$ to be the real number that Lemma 1 yields. As

 $\frac{f(t)-g(t)}{-it}e^{-ita}(T-|t|)=(T-|t|)\int_{\mathbb{R}}(F(x+a)-G(x+a))e^{itx}dx,$

we obtain, using Fubini’s theorem,

 $\begin{split}&\int_{-T}^{T}\frac{f(t)-g(t)}{-it}e^{-ita}(T-|t|)dt\\ ={}&\int_{-T}^{T}\left((T-|t|)\int_{\mathbb{R}}(F(x+a)-G(x+a))e^{itx}dx\right)dt\\ ={}&\int_{\mathbb{R}}(F(x+a)-G(x+a))\left(\int_{-T}^{T}(T-|t|)e^{itx}dt\right)dx\\ ={}&2\int_{\mathbb{R}}(F(x+a)-G(x+a))\frac{1-\cos Tx}{x^{2}}dx.\end{split}$

Therefore, because $F$ and $G$ are real valued and thus $|f(-t)-g(-t)|=|\overline{f(t)-g(t)}|=|f(t)-g(t)|$,

 $\begin{split}\left|\int_{\mathbb{R}}(F(x+a)-G(x+a))\frac{1-\cos Tx}{x^{2}}dx\right|&\leq\frac{1}{2}\int_{-T}^{T}\frac{|f(t)-g(t)|}{|t|}(T-|t|)dt\\ &=\int_{0}^{T}\frac{|f(t)-g(t)|}{t}(T-t)dt\\ &\leq T\int_{0}^{T}\frac{|f(t)-g(t)|}{t}dt.\end{split}$

Using this with Lemma 1,

 $2MT\Delta\left(3\int_{0}^{T\Delta}\frac{1-\cos x}{x^{2}}dx-\pi\right)\leq T\int_{0}^{T}\frac{|f(t)-g(t)|}{t}dt.$

But

 $\begin{split}3\int_{0}^{T\Delta}\frac{1-\cos x}{x^{2}}dx-\pi&=3\int_{0}^{\infty}\frac{1-\cos x}{x^{2}}dx-3\int_{T\Delta}^{\infty}\frac{1-\cos x}{x^{2}}dx-\pi\\ &\geq 3\int_{0}^{\infty}\frac{1-\cos x}{x^{2}}dx-6\int_{T\Delta}^{\infty}\frac{1}{x^{2}}dx-\pi\\ &=3\cdot\frac{\pi}{2}-\frac{6}{T\Delta}-\pi\\ &=\frac{\pi}{2}-\frac{6}{T\Delta},\end{split}$

with which we have

 $2MT\Delta\left(3\int_{0}^{T\Delta}\frac{1-\cos x}{x^{2}}dx-\pi\right)\geq 2MT\Delta\cdot\left(\frac{\pi}{2}-\frac{6}{T\Delta}\right)=MT\Delta\pi-12M,$

and hence

 $MT\Delta\pi-12M\leq T\int_{0}^{T}\frac{|f(t)-g(t)|}{t}dt,$

i.e.

 $\Delta\leq\frac{12}{\pi T}+\frac{1}{\pi M}\int_{0}^{T}\frac{|f(t)-g(t)|}{t}dt,$

proving the claim. ∎
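Two integral identities carry the proofs of Lemma 1 and Lemma 2: $\int_{-T}^{T}(T-|t|)e^{itx}dt=2\frac{1-\cos Tx}{x^{2}}$ and $\int_{0}^{\infty}\frac{1-\cos x}{x^{2}}dx=\frac{\pi}{2}$. The Python sketch below checks both numerically with a plain midpoint rule. It is an illustration only: the function names, the step counts, and the cutoff `L` are our assumptions, and the tail beyond `L` is replaced by $1/L$, which integration by parts shows is off by at most $2/L^{2}$.

```python
import math

def triangle_kernel_transform(x, T, steps=20_000):
    """Midpoint-rule value of the integral of (T-|t|) e^{itx} over [-T, T].

    The imaginary part vanishes by symmetry, so only cos(tx) is integrated.
    """
    h = 2.0 * T / steps
    total = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * h
        total += (T - abs(t)) * math.cos(t * x)
    return h * total

def closed_form(x, T):
    """Right-hand side 2 (1 - cos Tx) / x^2 used in the proof of Lemma 2."""
    return 2.0 * (1.0 - math.cos(T * x)) / (x * x)

def one_minus_cos_integral(L=200.0, steps=200_000):
    """Approximate the integral of (1 - cos x)/x^2 over (0, infinity).

    The part over (0, L] uses the midpoint rule; the tail beyond L is
    replaced by 1/L (assumption of this sketch, error at most 2/L^2).
    """
    h = L / steps
    head = sum((1.0 - math.cos(x)) / (x * x)
               for x in ((k + 0.5) * h for k in range(steps)))
    return h * head + 1.0 / L
```

For instance, `triangle_kernel_transform(1.7, 3.0)` can be compared with `closed_form(1.7, 3.0)`, and `one_minus_cos_integral()` with `math.pi / 2`.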

## 2 Berry-Esseen theorem

Let $X_{n,j}$, $n\geq 1$, $1\leq j\leq k_{n}$, be $L^{3}$ random variables, with $k_{n}\to\infty$, such that for each $n$, the random variables $X_{n,j}$, $1\leq j\leq k_{n}$, are independent, and such that for all $n$ and $j$,

 $E(X_{n,j})=0.$

Let $F_{n,j}$ be the cumulative distribution function of $X_{n,j}$:

 $F_{n,j}(x)=P(X_{n,j}\leq x).$

Let $f_{n,j}$ be the characteristic function of $X_{n,j}$ (equivalently, the characteristic function of $F_{n,j}$):

 $f_{n,j}(t)=\int_{\mathbb{R}}e^{itx}d({X_{n,j}}_{*}P)(x)=\int_{\mathbb{R}}e^{itx}dF_{n,j}(x).$

Write, for $n\geq 1$,

 $S_{n}=\sum_{j=1}^{k_{n}}X_{n,j},$

and let $F_{n}$ be the cumulative distribution function of $S_{n}$:

 $F_{n}(x)=P(S_{n}\leq x).$

Also, let $f_{n}$ be the characteristic function of $S_{n}$ (equivalently, the characteristic function of $F_{n}$). Because $X_{n,j}$, $1\leq j\leq k_{n}$, are independent, we have ${S_{n}}_{*}P=({X_{n,1}}_{*}P)*\cdots*({X_{n,k_{n}}}_{*}P)$ and hence

 $f_{n}(t)=\int_{\mathbb{R}}e^{itx}d({S_{n}}_{*}P)(x)=\prod_{j=1}^{k_{n}}f_{n,j}(t).$

For $n\geq 1$ and $1\leq j\leq k_{n}$, write

 $\sigma_{n,j}^{2}=E(X_{n,j}^{2}),\qquad s_{n}^{2}=\sum_{j=1}^{k_{n}}\sigma_{n,j}^{2}$

and

 $\gamma_{n,j}=E(|X_{n,j}|^{3}),\qquad\Gamma_{n}=\sum_{j=1}^{k_{n}}\gamma_{n,j}.$

We further assume that for each $n$,

 $s_{n}^{2}=\sum_{j=1}^{k_{n}}\sigma_{n,j}^{2}=1.$ (6)

We will use the following inequality which we state separately because it is of general use.

###### Lemma 3.

For $n\geq 1$ and $|z|<1$,

 $\left|\log(1+z)-\sum_{m=1}^{n-1}\frac{(-1)^{m-1}z^{m}}{m}\right|\leq\frac{|z|^{n}}{n(1-|z|)}.$
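The inequality of Lemma 3 can be spot-checked numerically. The Python sketch below samples points $z$ in the open unit disc and compares both sides, with $\log$ taken as the principal branch (which agrees with the power series for $|z|<1$). The function names, the sampled radii and angles, and the small floating-point tolerance are our assumptions; passing the check on a finite sample is of course not a proof.

```python
import cmath

def log_remainder(z, n):
    """|log(1+z) - sum_{m=1}^{n-1} (-1)^{m-1} z^m / m| for the principal log."""
    partial = sum((-1) ** (m - 1) * z ** m / m for m in range(1, n))
    return abs(cmath.log(1 + z) - partial)

def lemma3_bound(z, n):
    """Right-hand side |z|^n / (n (1 - |z|)) of Lemma 3."""
    return abs(z) ** n / (n * (1 - abs(z)))

def check_lemma3(radii=(0.1, 0.5, 0.9), angles=12, max_n=6):
    """Return True if the inequality holds at every sampled point."""
    for r in radii:
        for k in range(angles):
            z = cmath.rect(r, 2 * cmath.pi * k / angles)
            for n in range(1, max_n + 1):
                # 1e-12 absorbs rounding error in the floating-point evaluation.
                if log_remainder(z, n) > lemma3_bound(z, n) + 1e-12:
                    return False
    return True
```

The case $n=2$ is the one used later, where it bounds $|\log(1+z)-z|$ by $\frac{|z|^{2}}{2(1-|z|)}$.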

We now prove an inequality for $f_{n}$, the characteristic function of $S_{n}$ (Kai Lai Chung, A Course in Probability Theory, third ed., p. 239, Lemma 3).

###### Lemma 4.

For $n\geq 1$, if $|t|<\frac{1}{2\Gamma_{n}^{1/3}}$ then

 $|f_{n}(t)-e^{-t^{2}/2}|\leq\Gamma_{n}|t|^{3}e^{-t^{2}/2}.$
###### Proof.

For $1\leq j\leq k_{n}$ and $l\geq 0$ and $v\in\mathbb{R}$,

 $f_{n,j}^{(l)}(v)=(i)^{l}E(X_{n,j}^{l}e^{ivX_{n,j}}).$

Thus

 $f_{n,j}(0)=1,\quad f_{n,j}^{\prime}(0)=iE(X_{n,j})=0,\quad f_{n,j}^{\prime\prime}(0)=-E(X_{n,j}^{2})=-\sigma_{n,j}^{2},$

and

 $f_{n,j}^{\prime\prime\prime}(v)=-iE(X_{n,j}^{3}e^{ivX_{n,j}}).$

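Then by Taylor’s theorem with the integral form of the remainder (the mean value form is not available for complex-valued functions),

 $f_{n,j}(t)=1-\frac{\sigma_{n,j}^{2}}{2}t^{2}+\frac{t^{3}}{2}\int_{0}^{1}(1-u)^{2}f_{n,j}^{\prime\prime\prime}(ut)du.$

Put

 $\theta\gamma_{n,j}=3\int_{0}^{1}(1-u)^{2}f_{n,j}^{\prime\prime\prime}(ut)du,$

so that $f_{n,j}(t)=1-\frac{\sigma_{n,j}^{2}}{2}t^{2}+\frac{\theta\gamma_{n,j}}{6}t^{3}$, and for which, because $|f_{n,j}^{\prime\prime\prime}(v)|\leq E(|X_{n,j}|^{3})=\gamma_{n,j}$ for all $v$,

 $|\theta|\leq\frac{3}{\gamma_{n,j}}\int_{0}^{1}(1-u)^{2}\gamma_{n,j}du=1.$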

Because the $L^{2}$ norm is upper bounded by the $L^{3}$ norm and because $|t|<\frac{1}{2\Gamma_{n}^{1/3}}$,

 $|\sigma_{n,j}t|\leq|\gamma_{n,j}^{1/3}t|\leq|\Gamma_{n}^{1/3}t|<\frac{1}{2},$

and hence

 $\begin{split}|f_{n,j}(t)-1|&=\left|-\frac{\sigma_{n,j}^{2}}{2}t^{2}+\frac{\theta\gamma_{n,j}t^{3}}{6}\right|\\ &\leq\frac{1}{2}|\sigma_{n,j}t|^{2}+\frac{\gamma_{n,j}}{48\Gamma_{n}}\\ &<\frac{1}{8}+\frac{1}{48}\\ &<\frac{1}{4}.\end{split}$

Lemma 3 and the inequality $|a+b|^{2}\leq 2(|a|^{2}+|b|^{2})$ then tell us that

 $\begin{split}\left|\log f_{n,j}(t)-(f_{n,j}(t)-1)\right|&\leq\frac{|f_{n,j}(t)-1|^{2}}{2(1-|f_{n,j}(t)-1|)}\\ &<\frac{2}{3}|f_{n,j}(t)-1|^{2}\\ &=\frac{2}{3}\left|-\frac{\sigma_{n,j}^{2}}{2}t^{2}+\frac{\theta\gamma_{n,j}}{6}t^{3}\right|^{2}\\ &\leq\frac{4}{3}\left(\frac{\sigma_{n,j}^{4}}{4}t^{4}+\frac{|\theta|^{2}\gamma_{n,j}^{2}}{36}t^{6}\right).\end{split}$

Because $\sigma_{n,j}\leq\gamma_{n,j}^{1/3}$ and $|\theta|\leq 1$,

 $\begin{split}\left|\log f_{n,j}(t)-(f_{n,j}(t)-1)\right|&\leq\frac{4}{3}\left(\frac{\sigma_{n,j}\gamma_{n,j}}{4}t^{4}+\frac{\gamma_{n,j}^{2}}{36}t^{6}\right)\\ &=\frac{4}{3}\left(\frac{|\sigma_{n,j}t|}{4}+\frac{|\gamma_{n,j}^{1/3}t|^{3}}{36}\right)\gamma_{n,j}|t|^{3}\\ &\leq\frac{4}{3}\left(\frac{1}{2\cdot 4}+\frac{1}{8\cdot 36}\right)\gamma_{n,j}|t|^{3}\\ &=\frac{37}{216}\gamma_{n,j}|t|^{3}\\ &<\frac{1}{5}\gamma_{n,j}|t|^{3}.\end{split}$

Combining this with $f_{n,j}(t)=1-\frac{\sigma_{n,j}^{2}}{2}t^{2}+\frac{\theta\gamma_{n,j}}{6}t^{3}$,

 $\left|\log f_{n,j}(t)+\frac{\sigma_{n,j}^{2}}{2}t^{2}\right|\leq\left|\frac{\theta\gamma_{n,j}}{6}t^{3}\right|+\frac{1}{5}\gamma_{n,j}|t|^{3}\leq\frac{1}{6}\gamma_{n,j}|t|^{3}+\frac{1}{5}\gamma_{n,j}|t|^{3}\leq\frac{1}{2}\gamma_{n,j}|t|^{3}.$

Because this is true for each $1\leq j\leq k_{n}$ and because, according to (6), $\sum_{j=1}^{k_{n}}\sigma_{n,j}^{2}=1$,

 $\left|\log f_{n}(t)+\frac{t^{2}}{2}\right|\leq\frac{|t|^{3}}{2}\sum_{j=1}^{k_{n}}\gamma_{n,j}=\frac{|t|^{3}}{2}\Gamma_{n}.$

For any $z\in\mathbb{C}$ it is true that $|e^{z}-1|\leq|z|e^{|z|}$, so the above yields

 $\begin{split}|f_{n}(t)e^{t^{2}/2}-1|&=|\exp(\log(f_{n}(t)e^{t^{2}/2}))-1|\\ &\leq|\log(f_{n}(t)e^{t^{2}/2})|\exp\left(\left|\log(f_{n}(t)e^{t^{2}/2})\right|\right)\\ &=\left|\log f_{n}(t)+\frac{t^{2}}{2}\right|\exp\left(\left|\log(f_{n}(t)e^{t^{2}/2})\right|\right)\\ &\leq\frac{|t|^{3}}{2}\Gamma_{n}\exp\left(\frac{|t|^{3}}{2}\Gamma_{n}\right).\end{split}$

But $|t|^{3}<\frac{1}{8\Gamma_{n}}$, so

 $|f_{n}(t)e^{t^{2}/2}-1|\leq\frac{|t|^{3}}{2}\Gamma_{n}e^{1/16}\leq|t|^{3}\Gamma_{n},$

which completes the proof. ∎

The next lemma gives a different bound on the characteristic function of $S_{n}$ (Kai Lai Chung, A Course in Probability Theory, third ed., p. 240, Lemma 4).

###### Lemma 5.

For $n\geq 1$, if $|t|<\frac{1}{4\Gamma_{n}}$ then

 $|f_{n}(t)|\leq e^{-t^{2}/3}.$
###### Proof.

First, for a distribution function $F$ with characteristic function $f$,

 $\begin{split}|f(t)|^{2}&=f(t)\overline{f(t)}\\ &=\int_{\mathbb{R}}e^{itx}dF(x)\cdot\int_{\mathbb{R}}e^{-ity}dF(y)\\ &=\int_{\mathbb{R}}\left(\int_{\mathbb{R}}e^{it(x-y)}dF(x)\right)dF(y)\\ &=\int_{\mathbb{R}}\left(\int_{\mathbb{R}}\cos t(x-y)+i\sin t(x-y)dF(x)\right)dF(y).\end{split}$

Because $|f(t)|^{2}$ is real it follows that

 $|f(t)|^{2}=\int_{\mathbb{R}}\left(\int_{\mathbb{R}}\cos t(x-y)dF(x)\right)dF(y).$

Using

 $\left|\cos u-\left(1-\frac{u^{2}}{2}\right)\right|\leq\frac{|u|^{3}}{6},\qquad|a+b|^{p}\leq 2^{p-1}(|a|^{p}+|b|^{p}),$

we have

 $\left|\cos t(x-y)-\left(1-\frac{(t(x-y))^{2}}{2}\right)\right|\leq\frac{2}{3}\left(|tx|^{3}+|ty|^{3}\right)=\frac{2|t|^{3}}{3}(|x|^{3}+|y|^{3})$

and then

 $|f(t)|^{2}\leq\int_{\mathbb{R}}\left(\int_{\mathbb{R}}1-\frac{(t(x-y))^{2}}{2}+\frac{2|t|^{3}}{3}(|x|^{3}+|y|^{3})dF(x)\right)dF(y).$

Using this for $f_{n,j}$, and using that $E(X_{n,j})=0$,

 $\begin{split}|f_{n,j}(t)|^{2}&\leq\int_{\mathbb{R}}\left(\int_{\mathbb{R}}1-\frac{(t(x-y))^{2}}{2}+\frac{2|t|^{3}}{3}(|x|^{3}+|y|^{3})dF_{n,j}(x)\right)dF_{n,j}(y)\\ &=\int_{\mathbb{R}}1-\frac{t^{2}\sigma_{n,j}^{2}}{2}-\frac{t^{2}y^{2}}{2}+\frac{2|t|^{3}\gamma_{n,j}}{3}+\frac{2|t|^{3}|y|^{3}}{3}dF_{n,j}(y)\\ &=1-\frac{t^{2}\sigma_{n,j}^{2}}{2}-\frac{t^{2}\sigma_{n,j}^{2}}{2}+\frac{2|t|^{3}\gamma_{n,j}}{3}+\frac{2|t|^{3}\gamma_{n,j}}{3}\\ &=1-t^{2}\sigma_{n,j}^{2}+\frac{4|t|^{3}\gamma_{n,j}}{3}.\end{split}$

Because $1+u\leq e^{u}$ for all $u\in\mathbb{R}$,

 $|f_{n,j}(t)|^{2}\leq\exp\left(-t^{2}\sigma_{n,j}^{2}+\frac{4|t|^{3}\gamma_{n,j}}{3}\right).$

Then, by (6),

 $\begin{split}|f_{n}(t)|^{2}&=\prod_{j=1}^{k_{n}}|f_{n,j}(t)|^{2}\\ &\leq\prod_{j=1}^{k_{n}}\exp\left(-t^{2}\sigma_{n,j}^{2}+\frac{4|t|^{3}\gamma_{n,j}}{3}\right)\\ &=\exp\left(-t^{2}\sum_{j=1}^{k_{n}}\sigma_{n,j}^{2}+\frac{4|t|^{3}}{3}\sum_{j=1}^{k_{n}}\gamma_{n,j}\right)\\ &=\exp\left(-t^{2}+\frac{4|t|^{3}}{3}\Gamma_{n}\right).\end{split}$

As $|t|<\frac{1}{4\Gamma_{n}}$,

 $|f_{n}(t)|\leq\exp\left(-\frac{t^{2}}{2}+\frac{2|t|^{3}}{3}\Gamma_{n}\right)\leq\exp\left(-\frac{t^{2}}{2}+\frac{2|t|^{2}}{12}\right)=e^{-t^{2}/3},$

proving the claim. ∎

We now combine Lemma 4 and Lemma 5 (Kai Lai Chung, A Course in Probability Theory, third ed., p. 240, Lemma 5).

###### Lemma 6.

For $n\geq 1$, if $|t|<\frac{1}{4\Gamma_{n}}$ then

 $|f_{n}(t)-e^{-t^{2}/2}|\leq 16\Gamma_{n}|t|^{3}e^{-t^{2}/3}.$
###### Proof.

Either $|t|<\frac{1}{2\Gamma_{n}^{1/3}}$ or $\frac{1}{2\Gamma_{n}^{1/3}}\leq|t|<\frac{1}{4\Gamma_{n}}$. In the first case, Lemma 4 tells us

 $|f_{n}(t)-e^{-t^{2}/2}|\leq\Gamma_{n}|t|^{3}e^{-t^{2}/2}\leq\Gamma_{n}|t|^{3}e^{-t^{2}/3}\leq 16\Gamma_{n}|t|^{3}e^{-t^{2}/3}.$

In the second case, Lemma 5 tells us

 $|f_{n}(t)|\leq e^{-t^{2}/3},$

and so, as in this case we have $1\leq 8\Gamma_{n}|t|^{3}$,

 $|f_{n}(t)-e^{-t^{2}/2}|\leq|f_{n}(t)|+e^{-t^{2}/2}\leq e^{-t^{2}/3}+e^{-t^{2}/2}\leq 2e^{-t^{2}/3}\leq 16\Gamma_{n}|t|^{3}e^{-t^{2}/3},$

showing that the claim is true in both cases. ∎
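Lemmas 4, 5 and 6 can be illustrated on a concrete triangular array. As an assumption made only for this example, take $k_{n}=n$ and $X_{n,j}=\varepsilon_{j}/\sqrt{n}$ with $\varepsilon_{j}$ independent fair signs, so that $\sigma_{n,j}^{2}=1/n$, $\gamma_{n,j}=n^{-3/2}$, $\Gamma_{n}=n^{-1/2}$ and $f_{n}(t)=\cos(t/\sqrt{n})^{n}$. The Python sketch below checks the three inequalities on a grid of $t$; the function names, the grid size and the floating-point tolerance are ours.

```python
import math

def rademacher_setup(n):
    """Return (f_n, Gamma_n) for X_{n,j} = eps_j / sqrt(n), eps_j fair signs.

    Here sigma_{n,j}^2 = 1/n, so (6) holds, and gamma_{n,j} = n^{-3/2}.
    """
    gamma_n = n * n ** -1.5  # Gamma_n = n * n^{-3/2} = n^{-1/2}

    def f_n(t):
        # Characteristic function of S_n: n-fold product of cos(t / sqrt(n)).
        return math.cos(t / math.sqrt(n)) ** n

    return f_n, gamma_n

def check_lemmas(n, grid=2000, tol=1e-12):
    """Check Lemmas 4, 5 and 6 on a grid of t in (0, 1/(4 Gamma_n))."""
    f_n, G = rademacher_setup(n)
    t_max = 1.0 / (4.0 * G)                  # range of Lemmas 5 and 6
    t_small = 1.0 / (2.0 * G ** (1.0 / 3.0))  # range of Lemma 4
    for k in range(1, grid):
        t = t_max * k / grid
        gauss = math.exp(-t * t / 2.0)
        err = abs(f_n(t) - gauss)
        if t < t_small and err > G * t ** 3 * gauss + tol:
            return False  # Lemma 4 violated
        if abs(f_n(t)) > math.exp(-t * t / 3.0) + tol:
            return False  # Lemma 5 violated
        if err > 16.0 * G * t ** 3 * math.exp(-t * t / 3.0) + tol:
            return False  # Lemma 6 violated
    return True
```

By symmetry of $f_{n}$ in this example it suffices to test $t>0$; calling `check_lemmas(n)` for a few values of `n` exercises both cases in the proof of Lemma 6 once $n$ is large enough that $\frac{1}{2\Gamma_{n}^{1/3}}<\frac{1}{4\Gamma_{n}}$.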

We finally prove the Berry-Esseen theorem (Kai Lai Chung, A Course in Probability Theory, third ed., p. 235, Theorem 7.4.1; cf. Allan Gut, Probability: A Graduate Course, second ed., p. 356, Theorem 6.2; John E. Kolassa, Series Approximation Methods in Statistics, p. 25, Theorem 2.6.1; Alexandr A. Borovkov, Probability Theory, p. 659, Theorem A5.1; Ivan Nourdin and Giovanni Peccati, Normal Approximations with Malliavin Calculus: From Stein’s Method to Universality, p. 71, Theorem 3.7.1).

###### Theorem 7 (Berry-Esseen theorem).

There is some $A_{0}<36$ such that for each $n\geq 1$,

 $\sup_{x\in\mathbb{R}}|F_{n}(x)-\Phi(x)|\leq A_{0}\Gamma_{n}.$
###### Proof.

Let $Z$ be a random variable with $Z_{*}P=\gamma_{1}$, i.e. whose cumulative distribution function is $\Phi$. By (6) and because $X_{n,j}$, $1\leq j\leq k_{n}$, are independent and satisfy $E(X_{n,j})=0$,

 $E(S_{n}^{2})=\sum_{j=1}^{k_{n}}E(X_{n,j}^{2})=\sum_{j=1}^{k_{n}}\sigma_{n,j}^{2}=1.$

If $x<0$ then by Chebyshev’s inequality

 $F_{n}(x)=P(S_{n}\leq x)=P(-S_{n}\geq-x)\leq\frac{1}{x^{2}}E(|S_{n}|^{2})=\frac{1}{x^{2}}$

and

 $\Phi(x)=P(Z\leq x)=P(-Z\geq-x)\leq\frac{1}{x^{2}}E(|Z|^{2})=\frac{1}{x^{2}}.$

If $x>0$ then also by Chebyshev’s inequality

 $1-F_{n}(x)=1-P(S_{n}\leq x)=P(S_{n}>x)\leq\frac{1}{x^{2}}$

and

 $1-\Phi(x)=1-P(Z\leq x)=P(Z>x)\leq\frac{1}{x^{2}}.$

Therefore, because $F_{n}$ and $\Phi$ are nonnegative and $1-F_{n}$ and $1-\Phi$ are nonnegative, for all $x\neq 0$ we have

 $|F_{n}(x)-\Phi(x)|\leq\frac{1}{x^{2}}.$

Then, because $|F_{n}|\leq 1$ and $|\Phi|\leq 1$,

 $\int_{\mathbb{R}}|F_{n}(x)-\Phi(x)|dx\leq\int_{|x|\leq 1}2dx+\int_{|x|>1}\frac{1}{x^{2}}dx=6<\infty.$

We have $\Phi^{\prime}(x)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}\leq\frac{1}{\sqrt{2\pi}}$ for all $x$. We apply Lemma 2 with $F=F_{n}$, $G=\Phi$, and $M=\frac{1}{\sqrt{2\pi}}$, and because the characteristic function of $\Phi$ is $\phi(t)=e^{-t^{2}/2}$, we obtain for $T=\frac{1}{4\Gamma_{n}}$,

 $\begin{split}\sup_{x\in\mathbb{R}}|F_{n}(x)-\Phi(x)|&\leq\frac{2}{\pi}\int_{0}^{\frac{1}{4\Gamma_{n}}}\frac{|f_{n}(t)-\phi(t)|}{t}dt+\frac{96M\Gamma_{n}}{\pi}\\ &=\frac{2}{\pi}\int_{0}^{\frac{1}{4\Gamma_{n}}}\frac{|f_{n}(t)-e^{-t^{2}/2}|}{t}dt+\frac{96\Gamma_{n}}{\pi\sqrt{2\pi}}.\end{split}$

Then applying Lemma 6,

 $\begin{split}\sup_{x\in\mathbb{R}}|F_{n}(x)-\Phi(x)|&\leq\frac{2}{\pi}\int_{0}^{\frac{1}{4\Gamma_{n}}}\frac{16\Gamma_{n}t^{3}e^{-t^{2}/3}}{t}dt+\frac{96\Gamma_{n}}{\pi\sqrt{2\pi}}\\ &=\Gamma_{n}\left(\frac{32}{\pi}\int_{0}^{\frac{1}{4\Gamma_{n}}}t^{2}e^{-t^{2}/3}dt+\frac{96}{\pi\sqrt{2\pi}}\right).\end{split}$

This proves the claim with

 $A_{0}=\frac{32}{\pi}\int_{0}^{\infty}t^{2}e^{-t^{2}/3}dt+\frac{96}{\pi\sqrt{2\pi}}=\frac{32}{\pi}\cdot\frac{3\sqrt{3\pi}}{4}+\frac{96}{\pi\sqrt{2\pi}}=35.64\ldots.$
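The value of $A_{0}$ and the statement of Theorem 7 can be illustrated numerically. The Python sketch below evaluates the closed form of $A_{0}$ and, for the example $X_{n,j}=\varepsilon_{j}/\sqrt{n}$ with independent fair signs $\varepsilon_{j}$ (an assumption made only for illustration, for which $\Gamma_{n}=n^{-1/2}$), computes $\sup_{x}|F_{n}(x)-\Phi(x)|$ exactly from binomial probabilities. The function names are ours.

```python
import math

def Phi(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def A0():
    """Closed form of the constant obtained at the end of the proof."""
    first = (32.0 / math.pi) * (3.0 * math.sqrt(3.0 * math.pi) / 4.0)
    second = 96.0 / (math.pi * math.sqrt(2.0 * math.pi))
    return first + second

def kolmogorov_distance(n):
    """Exact sup_x |F_n(x) - Phi(x)| for S_n = (eps_1 + ... + eps_n)/sqrt(n).

    S_n takes the values (2k - n)/sqrt(n) with probability C(n, k)/2^n.
    F_n is a step function and Phi is continuous and increasing, so the
    supremum is found by comparing Phi at each jump with the value of F_n
    just before the jump and at the jump.
    """
    cdf_left = 0.0
    worst = 0.0
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)
        cdf_right = cdf_left + math.comb(n, k) / 2 ** n
        worst = max(worst, abs(cdf_left - Phi(x)), abs(cdf_right - Phi(x)))
        cdf_left = cdf_right
    return worst
```

In this example the computed distance is far below $A_{0}\Gamma_{n}=A_{0}/\sqrt{n}$, which reflects that the constant obtained from this proof is not sharp.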