# Subgaussian random variables, Hoeffding’s inequality, and Cramér’s large deviation theorem

Jordan Bell
June 4, 2014

## 1 Subgaussian random variables

For a random variable $X$, let $\Lambda_{X}(t)=\log E(e^{tX})$, the cumulant generating function of $X$. A $b$-subgaussian random variable, $b>0$, is a random variable $X$ such that

 $\Lambda_{X}(t)\leq\frac{b^{2}t^{2}}{2},\qquad t\in\mathbb{R}.$

We remark that for $\gamma_{a,\sigma^{2}}$ a Gaussian measure, whose density with respect to Lebesgue measure on $\mathbb{R}$ is

 $p(x,a,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-a)^{2}}{2\sigma^{2}}},$

we have

 $\int_{\mathbb{R}}e^{tx}d\gamma_{0,b^{2}}(x)=\int_{\mathbb{R}}e^{bty}\frac{1}{\sqrt{2\pi}}e^{-\frac{y^{2}}{2}}dy=\int_{\mathbb{R}}e^{\frac{b^{2}t^{2}}{2}}\frac{1}{\sqrt{2\pi}}e^{-\frac{(y-bt)^{2}}{2}}dy=e^{\frac{b^{2}t^{2}}{2}}.$
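This identity can be sanity-checked numerically. The following sketch approximates the integral by a Riemann sum; the parameter values and the grid bounds are arbitrary choices, wide enough that the integrand's tails are negligible.

```python
import numpy as np

# Numerical check (a sketch) that the MGF of a centered Gaussian with
# standard deviation b equals exp(b^2 t^2 / 2).
b, t = 1.5, 0.7
x = np.linspace(-40.0, 40.0, 800_001)
dx = x[1] - x[0]
density = np.exp(-x**2 / (2 * b**2)) / np.sqrt(2 * np.pi * b**2)
mgf = np.sum(np.exp(t * x) * density) * dx  # Riemann sum of e^{tx} p(x) dx
print(mgf, np.exp(b**2 * t**2 / 2))         # the two values agree closely
```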

We prove that a $b$-subgaussian random variable is centered and has variance $\leq b^{2}$ (Karl R. Stromberg, *Probability for Analysts*, p. 293, Proposition 9.8).

###### Theorem 1.

If $X$ is $b$-subgaussian then $E(X)=0$ and $\mathrm{Var}(X)\leq b^{2}$.

###### Proof.

For each $\omega\in\Omega$, $\sum_{k=0}^{n}\frac{t^{k}X(\omega)^{k}}{k!}\to e^{tX(\omega)}$, and the partial sums are dominated by $e^{|tX|}\leq e^{tX}+e^{-tX}$, which is integrable because $X$ is subgaussian. By the dominated convergence theorem,

 $\sum_{k=0}^{n}\frac{t^{k}E(X^{k})}{k!}\to E(e^{tX})\leq e^{\frac{b^{2}t^{2}}{2}}=\sum_{k=0}^{\infty}\left(\frac{b^{2}t^{2}}{2}\right)^{k}\frac{1}{k!}.$

Therefore

 $1+tE(X)+\frac{t^{2}}{2}E(X^{2})+O(t^{3})\leq 1+\frac{b^{2}t^{2}}{2}+O(t^{4}),$

whence

 $tE(X)+\frac{t^{2}}{2}E(X^{2})\leq\frac{b^{2}t^{2}}{2}+o(t^{2}),$

and so, for $t>0$,

 $E(X)+\frac{t}{2}E(X^{2})\leq\frac{b^{2}t}{2}+o(t).$

First, letting $t\to 0^{+}$ yields $E(X)\leq 0$; applying the same argument to $-X$, which is also $b$-subgaussian, yields $E(X)\geq 0$, hence $E(X)=0$. Next, since $E(X)=0$,

 $\frac{t}{2}E(X^{2})\leq\frac{b^{2}t}{2}+o(t),$

and then

 $E(X^{2})\leq b^{2}+o(1),$

which means that $\mathrm{Var}(X)=E(X^{2})\leq b^{2}$. ∎

Stromberg attributes the following theorem to Saeki; further, it is proved in Stromberg that if for some $t\neq 0$ the inequality in the theorem is an equality then the random variable has the Rademacher distribution (Karl R. Stromberg, *Probability for Analysts*, p. 293, Proposition 9.9; Omar Rivasplata, Subgaussian random variables: An expository note, http://www.math.ualberta.ca/~orivasplata/publications/subgaussians.pdf).

###### Theorem 2.

If $X$ is a random variable satisfying $E(X)=0$ and $P(X\in[-1,1])=1$, then

 $E(e^{tX})\leq\cosh t,\qquad t\in\mathbb{R}.$

###### Proof.

Define $f:\mathbb{R}\to\mathbb{R}$ by

 $f(t)=e^{t}\left(\cosh t-E(e^{tX})\right)=\frac{e^{2t}}{2}+\frac{1}{2}-e^{t}E(e^{tX}).$

Then

 $f^{\prime}(t)=e^{2t}-e^{t}E(e^{tX})-e^{t}E(Xe^{tX});$

the derivative of $E(e^{tX})$ with respect to $t$ is obtained using the dominated convergence theorem. Let $Y=1+X$, with which

 $f^{\prime}(t)=e^{2t}-E(e^{tY})-E(Xe^{tY})=e^{2t}-E(e^{tY})-E((Y-1)e^{tY})=e^{2t}-E(Ye^{tY}).$

$E(X)=0$, so $E(Y)=1$, hence

 $f^{\prime}(t)=E(e^{2t}Y)-E(Ye^{tY})=E(Y(e^{2t}-e^{tY})).$

Because $P(Y\in[0,2])=1$, for $t\geq 0$, we have almost surely $e^{2t}-e^{tY}\geq 0$, and therefore almost surely $Y(e^{2t}-e^{tY})\geq 0$. Therefore, for $t\geq 0$,

 $f^{\prime}(t)=E(Y(e^{2t}-e^{tY}))\geq 0,$

which tells us that for $t\geq 0$,

 $f(0)\leq f(t).$

As $f(0)=0$, for $t\geq 0$,

 $0\leq e^{t}\left(\cosh t-E(e^{tX})\right),$

and so

 $E(e^{tX})\leq\cosh t.$

For $t<0$, note that $-X$ also satisfies the hypotheses, so $E(e^{tX})=E(e^{(-t)(-X)})\leq\cosh(-t)=\cosh t$. ∎

###### Corollary 3.

If a random variable $X$ satisfies $E(X)=0$ and $P(|X|\leq b)=1$, then $X$ is $b$-subgaussian.

###### Proof.

$X/b$ satisfies the hypotheses of Theorem 2, so $E(e^{tX})=E(e^{(tb)(X/b)})\leq\cosh(tb)$. Comparing Taylor series term by term, $\cosh s=\sum_{k=0}^{\infty}\frac{s^{2k}}{(2k)!}\leq\sum_{k=0}^{\infty}\frac{s^{2k}}{2^{k}k!}=e^{\frac{s^{2}}{2}}$ because $(2k)!\geq 2^{k}k!$, and hence $E(e^{tX})\leq e^{\frac{b^{2}t^{2}}{2}}$. ∎
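As a numerical sanity check of the corollary (a sketch; the uniform distribution and the grid of $t$ values are arbitrary choices), for $X$ uniform on $[-b,b]$ the moment generating function is $\sinh(bt)/(bt)$, and it should stay below $e^{b^{2}t^{2}/2}$:

```python
import numpy as np

# X ~ Uniform[-b, b] is centered and bounded by b, so by Corollary 3 its MGF
# sinh(bt)/(bt) should be at most exp(b^2 t^2 / 2) for every t.
b = 2.0
t = np.linspace(-3.0, 3.0, 601)
t = t[t != 0]                   # drop t = 0 to avoid dividing by zero
mgf = np.sinh(b * t) / (b * t)  # E(e^{tX}) for the uniform distribution
bound = np.exp(b**2 * t**2 / 2)
print(bool(np.all(mgf <= bound)))
```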

## 2 Hoeffding’s inequality

We first prove Hoeffding’s lemma (Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, *Concentration Inequalities: A Nonasymptotic Theory of Independence*, p. 27, Lemma 2.2).

###### Lemma 4 (Hoeffding’s lemma).

If a random variable $X$ satisfies $E(X)=0$ and $P(X\in[a,b])=1$, then $X$ is $\frac{b-a}{2}$-subgaussian.

###### Proof.

Because $P(X\in[a,b])=1$, it follows that

 $\mathrm{Var}(X)\leq\frac{(b-a)^{2}}{4};$

this is Popoviciu’s inequality on variances, which does not use the hypothesis $E(X)=0$.

Write $\mu=X_{*}P$, abbreviate $\Lambda=\Lambda_{X}$, and for $\lambda\in\mathbb{R}$ define

 $d\nu_{\lambda}(t)=\frac{e^{\lambda t}}{e^{\Lambda(\lambda)}}d\mu(t).$

We check

 $\int_{\mathbb{R}}d\nu_{\lambda}(t)=\frac{1}{e^{\Lambda(\lambda)}}\int_{\mathbb{R}}e^{\lambda t}d(X_{*}P)(t)=\frac{1}{e^{\Lambda(\lambda)}}\int_{\Omega}e^{\lambda X}dP=1.$

There is a random variable $X_{\lambda}:(\Omega_{\lambda},\mathscr{F}_{\lambda},P_{\lambda})\to\mathbb{R}$ for which ${X_{\lambda}}_{*}P_{\lambda}=\nu_{\lambda}$. $X_{\lambda}$ satisfies $P_{\lambda}(X_{\lambda}\in[a,b])=1$, and so

 $\mathrm{Var}(X_{\lambda})\leq\frac{(b-a)^{2}}{4}.$

We calculate

 $\Lambda_{X}^{\prime}(t)=\frac{E(Xe^{tX})}{E(e^{tX})}$

and

 $\Lambda_{X}^{\prime\prime}(t)=\frac{E(X^{2}e^{tX})E(e^{tX})-E(Xe^{tX})^{2}}{E(e^{tX})^{2}}.$

But

 $E(X_{\lambda})=\int_{\mathbb{R}}td\nu_{\lambda}(t)=\int_{\mathbb{R}}t\frac{e^{\lambda t}}{e^{\Lambda(\lambda)}}d\mu(t)=\frac{1}{e^{\Lambda(\lambda)}}E(Xe^{\lambda X})$

and

 $E(X_{\lambda}^{2})=\int_{\mathbb{R}}t^{2}d\nu_{\lambda}(t)=\frac{1}{e^{\Lambda(\lambda)}}E(X^{2}e^{\lambda X}),$

and so

 $\displaystyle\mathrm{Var}(X_{\lambda})$ $\displaystyle=E(X_{\lambda}^{2})-E(X_{\lambda})^{2}$ $\displaystyle=\frac{E(X^{2}e^{\lambda X})}{e^{\Lambda(\lambda)}}-\frac{E(Xe^{\lambda X})^{2}}{e^{2\Lambda(\lambda)}}$ $\displaystyle=\Lambda_{X}^{\prime\prime}(\lambda).$

For $\lambda\in\mathbb{R}$, Taylor’s theorem tells us that there is some $\theta$ between $0$ and $\lambda$ such that

 $\Lambda_{X}(\lambda)=\Lambda_{X}(0)+\lambda\Lambda_{X}^{\prime}(0)+\frac{\lambda^{2}}{2}\Lambda_{X}^{\prime\prime}(\theta)=\frac{\lambda^{2}}{2}\Lambda_{X}^{\prime\prime}(\theta);$

here we have used that $E(X)=0$. But from what we have shown, $\mathrm{Var}(X_{\theta})=\Lambda_{X}^{\prime\prime}(\theta)$ and $\mathrm{Var}(X_{\theta})\leq\frac{(b-a)^{2}}{4}$, so

 $\Lambda_{X}(\lambda)=\frac{\lambda^{2}}{2}\mathrm{Var}(X_{\theta})\leq\frac{\lambda^{2}}{2}\cdot\frac{(b-a)^{2}}{4},$

which shows that $X$ is $\frac{b-a}{2}$-subgaussian. ∎
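The lemma can be sanity-checked numerically. The following sketch uses a concrete two-point distribution of our choosing: $X=-1$ with probability $2/3$ and $X=2$ with probability $1/3$, so that $E(X)=0$, $[a,b]=[-1,2]$, and the bound is $\frac{(b-a)^{2}t^{2}}{8}=\frac{9t^{2}}{8}$.

```python
import numpy as np

# Check Hoeffding's lemma on a grid of t values for the two-point
# distribution P(X=-1)=2/3, P(X=2)=1/3 (mean zero, supported in [-1, 2]).
t = np.linspace(-4.0, 4.0, 801)
log_mgf = np.log(2/3 * np.exp(-t) + 1/3 * np.exp(2 * t))
bound = 9 * t**2 / 8            # (b - a)^2 t^2 / 8 with b - a = 3
print(bool(np.all(log_mgf <= bound)))
```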

We now prove Hoeffding’s inequality (Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, *Concentration Inequalities: A Nonasymptotic Theory of Independence*, p. 34, Theorem 2.8).

###### Theorem 5 (Hoeffding’s inequality).

Let $X_{1},\ldots,X_{n}$ be independent random variables such that for each $1\leq k\leq n$, $P(X_{k}\in[a_{k},b_{k}])=1$, and write $S_{n}=\sum_{k=1}^{n}X_{k}$. For any $a>0$,

 $P(S_{n}-E(S_{n})\geq a)\leq\exp\left(-\frac{2a^{2}}{\sum_{k=1}^{n}(b_{k}-a_{k})^{2}}\right).$

###### Proof.

For $\lambda>0$ and $\phi(t)=e^{\lambda t}$, because $\phi$ is nonnegative and nondecreasing, for $X$ a random variable we have

 $1_{X\geq a}\phi(a)\leq\phi(X),$

and so $E(1_{X\geq a}\phi(a))\leq E(\phi(X))$, i.e.

 $P(X\geq a)\leq\frac{E(e^{\lambda X})}{e^{\lambda a}}.$

Using this with $X=S_{n}-E(S_{n})$ and because the $X_{k}$ are independent,

 $P(S_{n}-E(S_{n})\geq a)\leq\frac{1}{e^{\lambda a}}E(e^{\lambda(S_{n}-E(S_{n}))})=e^{-\lambda a}\prod_{k=1}^{n}E(e^{\lambda(X_{k}-E(X_{k}))}).$

Because $P(X_{k}\in[a_{k},b_{k}])=1$, we have $P(X_{k}-E(X_{k})\in[a_{k}-E(X_{k}),b_{k}-E(X_{k})])=1$, and as $(b_{k}-E(X_{k}))-(a_{k}-E(X_{k}))=b_{k}-a_{k}$, Hoeffding’s lemma tells us

 $\log E(e^{\lambda(X_{k}-E(X_{k}))})\leq\frac{(b_{k}-a_{k})^{2}\lambda^{2}}{8},$

and thus

 $\displaystyle P(S_{n}-E(S_{n})\geq a)$ $\displaystyle\leq e^{-\lambda a}\exp\left(\sum_{k=1}^{n}\frac{(b_{k}-a_{k})^{2}\lambda^{2}}{8}\right)$ $\displaystyle=\exp\left(-\lambda a+\frac{\lambda^{2}}{8}\sum_{k=1}^{n}(b_{k}-a_{k})^{2}\right).$

The left-hand side does not involve $\lambda$, so we may minimize the right-hand side over $\lambda>0$. Define

 $g(\lambda)=-\lambda a+\frac{\lambda^{2}}{8}\sum_{k=1}^{n}(b_{k}-a_{k})^{2},$

for which

 $g^{\prime}(\lambda)=-a+\frac{\lambda}{4}\sum_{k=1}^{n}(b_{k}-a_{k})^{2}.$

Then $g^{\prime}(\lambda)=0$ if and only if

 $\lambda=\frac{4a}{\sum_{k=1}^{n}(b_{k}-a_{k})^{2}},$

at which the strictly convex function $g$ assumes its infimum; note that this $\lambda$ is positive, as required. Then

 $\displaystyle P(S_{n}-E(S_{n})\geq a)$ $\displaystyle\leq\exp\left(-\frac{4a^{2}}{\sum_{k=1}^{n}(b_{k}-a_{k})^{2}}+\frac{2a^{2}}{\sum_{k=1}^{n}(b_{k}-a_{k})^{2}}\right)$ $\displaystyle=\exp\left(-\frac{2a^{2}}{\sum_{k=1}^{n}(b_{k}-a_{k})^{2}}\right),$

proving the claim. ∎
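A Monte Carlo sanity check of the inequality (a sketch; $n$, $a$, the seed, and the uniform distribution are arbitrary choices): with $X_{k}$ uniform on $[0,1]$ we have $a_{k}=0$, $b_{k}=1$, $E(S_{n})=n/2$, and the bound is $\exp(-2a^{2}/n)$.

```python
import numpy as np

# Estimate P(S_n - E(S_n) >= a) by simulation and compare with the
# Hoeffding bound exp(-2 a^2 / n), since sum (b_k - a_k)^2 = n here.
rng = np.random.default_rng(0)
n, trials, a = 50, 200_000, 5.0
S = rng.random((trials, n)).sum(axis=1)  # one realization of S_n per trial
empirical = np.mean(S - n / 2 >= a)      # empirical tail probability
bound = np.exp(-2 * a**2 / n)
print(empirical, bound)
```

The empirical tail is well below the bound, as it must be; Hoeffding's bound is distribution-free and therefore not tight for any particular distribution.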

## 3 Cramér’s large deviation theorem

The following is Cramér’s large deviation theorem (Achim Klenke, *Probability Theory: A Comprehensive Course*, p. 508, Theorem 23.3).

###### Theorem 6 (Cramér’s large deviation theorem).

Suppose that $X_{n}:(\Omega,\mathscr{F},P)\to\mathbb{R}$, $n\geq 1$, are independent identically distributed random variables such that for all $t\in\mathbb{R}$,

 $\Lambda(t)=\log E(e^{tX_{1}})<\infty.$

For $x\in\mathbb{R}$ define

 $\Lambda^{*}(x)=\sup_{t\in\mathbb{R}}(tx-\Lambda(t)).$

If $a>E(X_{1})$, then

 $\lim_{n\to\infty}\frac{1}{n}\log P(S_{n}\geq an)=-\Lambda^{*}(a),$

where $S_{n}=\sum_{k=1}^{n}X_{k}$.

###### Proof.

For $a>E(X_{1})$, let $Y_{n}=X_{n}-a$, let

 $L(t)=\log E(e^{tY_{1}})=\log E(e^{tX_{1}}e^{-ta})=-ta+\Lambda(t)$

and let

 $L^{*}(x)=\sup_{t\in\mathbb{R}}(tx-L(t))=\sup_{t\in\mathbb{R}}(t(x+a)-\Lambda(t))=\Lambda^{*}(x+a).$

Lastly, let $T_{n}=\sum_{k=1}^{n}Y_{k}=S_{n}-na$, with which

 $P(T_{n}\geq bn)=P(S_{n}\geq(b+a)n).$

Thus, if we have

 $\lim_{n\to\infty}\frac{1}{n}\log P(T_{n}\geq 0)=-L^{*}(0),$

then

 $\lim_{n\to\infty}\frac{1}{n}\log P(S_{n}\geq an)=-L^{*}(0)=-\Lambda^{*}(a).$

Therefore it suffices to prove the theorem for when $E(X_{1})<0$ and $a=0$.

Define

 $\phi(t)=e^{\Lambda(t)}=E(e^{tX_{1}})=\int_{\Omega}e^{tX_{1}}dP=\int_{\mathbb{R}}e^{tx}d({X_{1}}_{*}P)(x),\qquad t\in\mathbb{R},$

the moment generating function of $X_{1}$, and define

 $\rho=e^{-\Lambda^{*}(0)}=\exp\left(-\sup_{t\in\mathbb{R}}(-\Lambda(t))\right)=\exp\left(\inf_{t\in\mathbb{R}}\Lambda(t)\right)=\inf_{t\in\mathbb{R}}\phi(t),$

using that $x\mapsto e^{x}$ is increasing.

Using the dominated convergence theorem, for $k\geq 0$ we obtain

 $\phi^{(k)}(t)=\int_{\mathbb{R}}x^{k}e^{tx}d({X_{1}}_{*}P)(x)=E(X_{1}^{k}e^{tX_{1}}).$

In particular, $\phi^{\prime}(t)=E(X_{1}e^{tX_{1}})$, for which $\phi^{\prime}(0)=E(X_{1})<0$, and $\phi^{\prime\prime}(t)=E(X_{1}^{2}e^{tX_{1}})>0$ for all $t$: the expectation is nonnegative, and if it were $0$ then $X_{1}^{2}e^{tX_{1}}$ would be $0$ almost everywhere, so $X_{1}=0$ almost surely, contradicting $E(X_{1})<0$.

Either $P(X_{1}\leq 0)=1$ or $P(X_{1}\leq 0)<1$. In the first case,

 $\phi^{\prime}(t)=\int_{\Omega}X_{1}e^{tX_{1}}dP=\int_{X_{1}\leq 0}X_{1}e^{tX_{1}}dP\leq 0,$

so $\phi$ is nonincreasing and its infimum is its limit at $\infty$; using the dominated convergence theorem,

 $\rho=\inf_{t\in\mathbb{R}}\phi(t)=\lim_{t\to\infty}\phi(t)=\int_{X_{1}\leq 0}\left(\lim_{t\to\infty}e^{tX_{1}}\right)dP=\int_{X_{1}=0}dP=P(X_{1}=0).$

Then

 $P(S_{n}\geq 0)=P(X_{1}=0,\ldots,X_{n}=0)=P(X_{1}=0)\cdots P(X_{n}=0)=\rho^{n};$

indeed, almost surely every $X_{k}\leq 0$, so $S_{n}\geq 0$ exactly when $X_{1}=\cdots=X_{n}=0$. That is, as $a=0$,

 $P(S_{n}\geq an)=\rho^{n}=e^{-n\Lambda^{*}(a)},$

so $\frac{1}{n}\log P(S_{n}\geq an)=-\Lambda^{*}(a)$ for every $n$ (with the convention $\log 0=-\infty$ when $\rho=0$), and the claim is immediate in this case.

In the second case, $P(X_{1}\leq 0)<1$, i.e. $P(X_{1}>0)>0$, so $\phi(t)\to\infty$ as $t\to\infty$; and $E(X_{1})<0$ forces $P(X_{1}<0)>0$, so $\phi(t)\to\infty$ as $t\to-\infty$. Because $\phi$ is strictly convex ($\phi^{\prime\prime}(t)>0$ for all $t$) and tends to $\infty$ in both directions, there is a unique $\tau\in\mathbb{R}$ at which $\phi$ attains its global minimum: $\phi(\tau)<\phi(t)$ for all $t\neq\tau$. Thus,

 $\phi(\tau)=\rho,\qquad\phi^{\prime}(\tau)=0.$

And $\phi^{\prime}(0)=E(X_{1})<0$ while $\phi^{\prime}(\tau)=0$, and $\phi^{\prime}$ is strictly increasing, so $\tau>0$. Because $\tau>0$, $S_{n}(\omega)\geq 0$ if and only if $\tau S_{n}(\omega)\geq 0$ if and only if $e^{\tau S_{n}(\omega)}\geq 1$. Applying Markov’s inequality, and because the $X_{n}$ are independent,

 $P(S_{n}\geq 0)=P(e^{\tau S_{n}}\geq 1)\leq E(e^{\tau S_{n}})=E(e^{\tau X_{1}})\cdots E(e^{\tau X_{n}})=\phi(\tau)^{n}=\rho^{n},$

thus $\log P(S_{n}\geq 0)\leq n\log\rho$ and then

 $\limsup_{n\to\infty}\frac{1}{n}\log P(S_{n}\geq 0)\leq\limsup_{n\to\infty}\log\rho=\log\rho=\log e^{-\Lambda^{*}(0)}=-\Lambda^{*}(0).$

To prove the claim, it now suffices to prove that, in the case $P(X_{1}\leq 0)<1$,

 $\liminf_{n\to\infty}\frac{1}{n}\log P(S_{n}\geq 0)\geq\log\rho.$ (1)

Let $\mu={X_{1}}_{*}P$, and let

 $d\nu(x)=\frac{e^{\tau x}}{\rho}d\mu(x).$

$\nu$ is a Borel probability measure: it is apparent that it is a Borel measure, and

 $\nu(\mathbb{R})=\int_{\mathbb{R}}d\nu(x)=\int_{\mathbb{R}}\frac{e^{\tau x}}{\rho}d\mu(x)=\frac{1}{\rho}\int_{\mathbb{R}}e^{\tau x}d\mu(x)=\frac{\phi(\tau)}{\rho}=1.$

There are independent identically distributed random variables $Y_{n}$, $n\geq 1$, defined on some probability space whose measure we denote by $Q$, each with ${Y_{n}}_{*}Q=\nu$ (Gerald B. Folland, *Real Analysis: Modern Techniques and Their Applications*, p. 329, Corollary 10.19). Define

 $\psi(t)=E(e^{tY_{1}})=\int_{\mathbb{R}}e^{tx}d\nu(x)=\int_{\mathbb{R}}e^{tx}\frac{e^{\tau x}}{\rho}d\mu(x)=\frac{1}{\rho}\int_{\mathbb{R}}e^{(t+\tau)x}d\mu(x)=\frac{\phi(t+\tau)}{\rho},$

the moment generating function of $Y_{1}$. As $\phi^{\prime}(\tau)=0$,

 $E(Y_{1})=\psi^{\prime}(0)=\frac{\phi^{\prime}(\tau)}{\rho}=0.$

As $\rho>0$ and $\phi^{\prime\prime}(t)>0$ for all $t$,

 $\mathrm{Var}(Y_{1})=E(Y_{1}^{2})=\psi^{\prime\prime}(0)=\frac{\phi^{\prime\prime}(\tau)}{\rho}\in(0,\infty).$
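The tilting construction can be illustrated numerically. The following sketch uses a concrete choice of our own, $\mu=N(-1,1)$: there $\phi(t)=e^{-t+t^{2}/2}$, $\phi^{\prime}(\tau)=0$ at $\tau=1$, $\rho=\phi(1)=e^{-1/2}$, and the tilted measure $d\nu=\frac{e^{\tau x}}{\rho}d\mu$ works out to be $N(0,1)$, so it should have total mass $1$ and mean $0$.

```python
import numpy as np

# Tilt mu = N(-1, 1) by e^{tau x} / rho with tau = 1, rho = exp(-1/2), and
# check by Riemann sums that the result is a probability measure with mean 0.
x = np.linspace(-30.0, 30.0, 600_001)
dx = x[1] - x[0]
mu_density = np.exp(-(x + 1)**2 / 2) / np.sqrt(2 * np.pi)
tau, rho = 1.0, np.exp(-0.5)
nu_density = np.exp(tau * x) / rho * mu_density
mass = np.sum(nu_density) * dx       # total mass of nu
mean = np.sum(x * nu_density) * dx   # first moment of nu
print(mass, mean)                    # ≈ 1 and ≈ 0
```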

For $T_{n}=\sum_{k=1}^{n}Y_{k}$, using that the $X_{n}$ are independent and that the $Y_{n}$ are independent,

 $\displaystyle P(S_{n}\geq 0)$ $\displaystyle=\int_{x_{1}+\cdots+x_{n}\geq 0}d\mu(x_{1})\cdots d\mu(x_{n})$ $\displaystyle=\int_{x_{1}+\cdots+x_{n}\geq 0}\left(\frac{\rho}{e^{\tau x_{1}}}d\nu(x_{1})\right)\cdots\left(\frac{\rho}{e^{\tau x_{n}}}d\nu(x_{n})\right)$ $\displaystyle=\rho^{n}\int_{x_{1}+\cdots+x_{n}\geq 0}e^{-\tau(x_{1}+\cdots+x_{n})}d\nu(x_{1})\cdots d\nu(x_{n}).$

But

 $\displaystyle\int_{x_{1}+\cdots+x_{n}\geq 0}e^{-\tau(x_{1}+\cdots+x_{n})}d\nu(x_{1})\cdots d\nu(x_{n})$ $\displaystyle=\int_{T_{n}\geq 0}e^{-\tau T_{n}}dQ=E(1_{\{T_{n}\geq 0\}}\cdot e^{-\tau T_{n}}),$

hence

 $P(S_{n}\geq 0)=\rho^{n}E(1_{\{T_{n}\geq 0\}}\cdot e^{-\tau T_{n}}).$

Thus, (1) is equivalent to

 $\liminf_{n\to\infty}\frac{1}{n}\log\left(\rho^{n}E(1_{\{T_{n}\geq 0\}}\cdot e^{-\tau T_{n}})\right)\geq\log\rho,$

so, to prove the claim it suffices to prove that

 $\liminf_{n\to\infty}\frac{1}{n}\log\left(E(1_{\{T_{n}\geq 0\}}\cdot e^{-\tau T_{n}})\right)\geq 0.$

For any $c>0$,

 $\displaystyle\log\left(E(1_{\{T_{n}\geq 0\}}\cdot e^{-\tau T_{n}})\right)$ $\displaystyle\geq\log E\left(1_{\{0\leq T_{n}\leq c\sqrt{n}\}}\cdot e^{-\tau T_{n}}\right)$ $\displaystyle\geq\log\left(e^{-\tau c\sqrt{n}}\cdot Q\left(0\leq T_{n}\leq c\sqrt{n}\right)\right)$ $\displaystyle=-\tau c\sqrt{n}+\log Q\left(\frac{T_{n}}{\sqrt{n}}\in[0,c]\right).$

Because the $Y_{n}$ are independent identically distributed $L^{2}$ random variables with mean $0$ and variance $\sigma^{2}=\mathrm{Var}(Y_{1})=\frac{\phi^{\prime\prime}(\tau)}{\rho}$, the central limit theorem tells us that as $n\to\infty$,

 $Q\left(\frac{T_{n}}{\sqrt{n}}\in[0,c]\right)\to\gamma_{0,\sigma^{2}}([0,c]),$

where $\gamma_{a,\sigma^{2}}$ is the Gaussian measure, whose density with respect to Lebesgue measure on $\mathbb{R}$ is

 $p(t,a,\sigma^{2})=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(t-a)^{2}}{2\sigma^{2}}}.$

Thus, because for $c>0$ we have $\gamma_{0,\sigma^{2}}([0,c])>0$,

 $\displaystyle\liminf_{n\to\infty}\frac{1}{n}\log\left(E(1_{\{T_{n}\geq 0\}}\cdot e^{-\tau T_{n}})\right)$ $\displaystyle\geq\liminf_{n\to\infty}\left(-\frac{\tau c}{\sqrt{n}}+\frac{1}{n}\log Q\left(\frac{T_{n}}{\sqrt{n}}\in[0,c]\right)\right)$ $\displaystyle=\lim_{n\to\infty}\left(-\frac{\tau c}{\sqrt{n}}\right)+\lim_{n\to\infty}\frac{1}{n}\log Q\left(\frac{T_{n}}{\sqrt{n}}\in[0,c]\right)$ $\displaystyle=0+0=0,$

where the second limit is $0$ because $Q\left(\frac{T_{n}}{\sqrt{n}}\in[0,c]\right)$ converges to the positive constant $\gamma_{0,\sigma^{2}}([0,c])$. This completes the proof. ∎

For example, say that $X_{n}$ are independent identically distributed random variables with ${X_{1}}_{*}P=\gamma_{0,1}$. We calculate that the cumulant generating function $\Lambda(t)=\log E(e^{tX_{1}})$ is

 $\displaystyle\Lambda(t)$ $\displaystyle=\log\left(\int_{\mathbb{R}}e^{tx}d\gamma_{0,1}(x)\right)$ $\displaystyle=\log\left(\int_{\mathbb{R}}e^{tx}\frac{e^{-\frac{x^{2}}{2}}}{\sqrt{2\pi}}dx\right)$ $\displaystyle=\log\left(\int_{\mathbb{R}}\frac{e^{-\frac{1}{2}(x-t)^{2}}}{\sqrt{2\pi}}e^{\frac{t^{2}}{2}}dx\right)$ $\displaystyle=\log e^{\frac{t^{2}}{2}}$ $\displaystyle=\frac{t^{2}}{2},$

thus $\Lambda(t)<\infty$ for all $t$. Then

 $\Lambda^{*}(x)=\sup_{t\in\mathbb{R}}(tx-\Lambda(t))=\sup_{t\in\mathbb{R}}\left(tx-\frac{t^{2}}{2}\right)=\frac{x^{2}}{2}.$

Now applying Cramér’s theorem we get that for $a>E(X_{1})=0$, for $S_{n}=\sum_{k=1}^{n}X_{k}$ we have

 $\lim_{n\to\infty}\frac{1}{n}\log P(S_{n}\geq an)=-\frac{a^{2}}{2}.$
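For this Gaussian example the limit can be checked by direct arithmetic (a sketch; the value of $a$ and the choices of $n$ are arbitrary): here $S_{n}\sim N(0,n)$, so $P(S_{n}\geq an)=P(Z\geq a\sqrt{n})$ is an exact standard normal tail.

```python
import math

# Compare (1/n) log P(S_n >= a n) with the Cramér limit -a^2/2, using the
# exact normal tail P(Z >= z) = erfc(z / sqrt(2)) / 2.
a = 0.5
for n in (10, 100, 1000):
    tail = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2))
    print(n, math.log(tail) / n)
print(-a**2 / 2)  # the limit predicted by Cramér's theorem
```

The convergence is slow (the error is of order $\frac{\log n}{n}$, from the polynomial prefactor in the Gaussian tail), but visible.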

Another example: let $X_{n}$ be independent identically distributed random variables with the Rademacher distribution,

 ${X_{n}}_{*}P=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1}.$

Then

 $E(e^{tX_{1}})=\int_{\mathbb{R}}e^{tx}d\left(\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1}\right)(x)=\frac{1}{2}e^{-t}+\frac{1}{2}e^{t}=\cosh t,$

so the cumulant generating function of $X_{1}$ is

 $\Lambda(t)=\log\cosh t,$

and indeed $\Lambda(t)<\infty$ for all $t\in\mathbb{R}$. As $\frac{d}{dt}(tx-\log\cosh t)=x-\tanh t$, for $x\in(-1,1)$ the supremum is attained at $t=\mathrm{arctanh}\,x$, so

 $\Lambda^{*}(x)=\sup_{t\in\mathbb{R}}\left(tx-\log\cosh t\right)=x\,\mathrm{arctanh}\,x-\log\cosh\mathrm{arctanh}\,x.$

For $x\in(-1,1)$,

 $\mathrm{arctanh}\,x=\frac{1}{2}\log\frac{1+x}{1-x}.$

Then

 $\cosh\mathrm{arctanh}\,x=\frac{1}{2}\left(e^{\mathrm{arctanh}\,x}+e^{-\mathrm{arctanh}\,x}\right)=\frac{1}{2}\sqrt{\frac{1+x}{1-x}}+\frac{1}{2}\sqrt{\frac{1-x}{1+x}}=\frac{1}{\sqrt{1-x^{2}}}.$

With these identities,

 $\displaystyle\Lambda^{*}(x)$ $\displaystyle=\frac{x}{2}\log\frac{1+x}{1-x}+\frac{1}{2}\log(1-x^{2})$ $\displaystyle=\frac{x}{2}\log(1+x)-\frac{x}{2}\log(1-x)+\frac{1}{2}\log(1+x)+\frac{1}{2}\log(1-x)$ $\displaystyle=\frac{1+x}{2}\log(1+x)+\frac{1-x}{2}\log(1-x).$

With $S_{n}=\sum_{k=1}^{n}X_{k}$, applying Cramér’s theorem, we get that for any $a\in(0,1)$ (so in particular $a>E(X_{1})=0$),

 $\lim_{n\to\infty}\frac{1}{n}\log P(S_{n}\geq an)=-\frac{1+a}{2}\log(1+a)-\frac{1-a}{2}\log(1-a).$
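The Rademacher example is also checkable by exact arithmetic (a sketch; $a=0.4$ and the values of $n$ are arbitrary choices): writing $S_{n}=2K-n$ with $K\sim\mathrm{Binomial}(n,\frac{1}{2})$, the event $S_{n}\geq an$ is $K\geq n\frac{1+a}{2}$, an exact binomial tail.

```python
import math

# Compare (1/n) log P(S_n >= a n), computed from the exact binomial tail via
# log-binomial coefficients (lgamma), with the predicted Cramér rate.
a = 0.4
rate = (1 + a) / 2 * math.log(1 + a) + (1 - a) / 2 * math.log(1 - a)
for n in (100, 1000):
    k0 = math.ceil(n * (1 + a) / 2)  # S_n >= a n  iff  K >= n (1 + a) / 2
    terms = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             - n * math.log(2) for k in range(k0, n + 1)]
    m = max(terms)                   # log-sum-exp for numerical stability
    log_p = m + math.log(sum(math.exp(u - m) for u in terms))
    print(n, log_p / n)
print(-rate)  # the limit predicted by Cramér's theorem
```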

For a Borel probability measure $\mu$ on $\mathbb{R}$, we define its Laplace transform $\check{\mu}:\mathbb{R}\to(0,\infty]$ by

 $\check{\mu}(t)=\int_{\mathbb{R}}e^{ty}d\mu(y).$

Suppose that $\int_{\mathbb{R}}|y|d\mu(y)<\infty$ and let $M_{1}=\int_{\mathbb{R}}yd\mu(y)$, the first moment of $\mu$. For any $t$ the function $x\mapsto e^{tx}$ is convex, so by Jensen’s inequality,

 $e^{tM_{1}}\leq\int_{\mathbb{R}}e^{ty}d\mu(y)=\check{\mu}(t).$

Thus for all $t\in\mathbb{R}$,

 $tM_{1}-\log\check{\mu}(t)\leq 0.$

For a Borel probability measure $\mu$ with finite first moment, we define its Cramér transform $I_{\mu}:\mathbb{R}\to[0,\infty]$ (Heinz Bauer, *Probability Theory*, pp. 89–90, §12) by

 $I_{\mu}(x)=\sup_{t\in\mathbb{R}}(tx-\log\check{\mu}(t)).$

For $t=0$, $tx-\log\check{\mu}(t)=-\log\check{\mu}(0)=-\log 1=0$, which shows that $I_{\mu}(x)\geq 0$ for all $x\in\mathbb{R}$. Combined with $tM_{1}-\log\check{\mu}(t)\leq 0$ for all $t$, this yields

 $I_{\mu}(M_{1})=0.$
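As a sketch of this fact on a concrete distribution of our choosing, take $\mu$ exponential with rate $\lambda$, for which $\check{\mu}(t)=\frac{\lambda}{\lambda-t}$ for $t<\lambda$ (and $\infty$ otherwise) and $M_{1}=\frac{1}{\lambda}$; a grid search over $t$ should find that the supremum defining $I_{\mu}(M_{1})$ is $0$, attained at $t=0$.

```python
import math

# Approximate I_mu(M1) = sup_t (t M1 - log(mu-check)(t)) for the exponential
# distribution with rate lam, where log(mu-check)(t) = -log(1 - t/lam).
lam = 2.0
M1 = 1 / lam
grid = [i / 1000 * (lam - 1e-6) for i in range(-5000, 1000)]  # t < lam
sup = max(t * M1 + math.log(1 - t / lam) for t in grid)
print(sup)  # the maximum over the grid is 0, at t = 0
```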