Subgaussian random variables, Hoeffding’s inequality, and Cramér’s large deviation theorem
1 Subgaussian random variables
For a random variable $X$, let $\Lambda_X(t) = \log E(e^{tX})$, $t \in \mathbb{R}$, the cumulant generating function of $X$. For $b > 0$, a $b$-subgaussian random variable is a random variable $X$ such that
$$\Lambda_X(t) \le \frac{b^2 t^2}{2}, \qquad t \in \mathbb{R}.$$
We remark that for $\gamma_{\mu,\sigma^2}$ a Gaussian measure, whose density with respect to Lebesgue measure on $\mathbb{R}$ is
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)},$$
we have
$$\Lambda_{\gamma_{\mu,\sigma^2}}(t) = \log \int_{\mathbb{R}} e^{tx}\, d\gamma_{\mu,\sigma^2}(x) = \mu t + \frac{\sigma^2 t^2}{2};$$
in particular, a random variable with distribution $\gamma_{0,b^2}$ is $b$-subgaussian.
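As a quick numerical illustration (a sketch added here, not part of the cited sources; the variance $b^2 = 4$, the sample size, and the grid of values of $t$ are arbitrary choices), the empirical cumulant generating function of centered Gaussian samples should track $b^2t^2/2$:

```python
import numpy as np

rng = np.random.default_rng(0)
b = 2.0
samples = rng.normal(0.0, b, size=10**6)  # distribution gamma_{0, b^2}

for t in [-1.0, -0.5, 0.5, 1.0]:
    # empirical cumulant generating function Lambda_X(t) = log E(e^{tX})
    emp = np.log(np.mean(np.exp(t * samples)))
    print(f"t={t:5.2f}  empirical={emp:.4f}  b^2 t^2 / 2={b**2 * t**2 / 2:.4f}")
```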
We prove that a $b$-subgaussian random variable is centered and has variance at most $b^2$ (Karl R. Stromberg, Probability for Analysts, p. 293, Proposition 9.8).
Theorem 1.
If $X$ is $b$-subgaussian then $E(X) = 0$ and $V(X) \le b^2$.
Proof.
For each $t \in \mathbb{R}$, $E(e^{tX}) \le e^{b^2t^2/2}$, and by the dominated convergence theorem,
$$E(e^{tX}) = \sum_{k=0}^{\infty} \frac{t^k E(X^k)}{k!}.$$
Therefore
$$\sum_{k=0}^{\infty} \frac{t^k E(X^k)}{k!} \le e^{b^2t^2/2} = \sum_{k=0}^{\infty} \frac{1}{k!}\left(\frac{b^2t^2}{2}\right)^k,$$
whence
$$1 + tE(X) + \frac{t^2}{2}E(X^2) + o(t^2) \le 1 + \frac{b^2t^2}{2} + o(t^2), \qquad t \to 0,$$
and so, for $t \to 0$,
$$tE(X) + \frac{t^2}{2}E(X^2) \le \frac{b^2t^2}{2} + o(t^2).$$
First, dividing by $t > 0$ and letting $t \to 0^+$ yields $E(X) \le 0$, while dividing by $t < 0$ and letting $t \to 0^-$ yields $E(X) \ge 0$, which means that $E(X) = 0$. Second, since $E(X) = 0$,
$$\frac{t^2}{2}E(X^2) \le \frac{b^2t^2}{2} + o(t^2),$$
and then dividing by $t^2$ and letting $t \to 0$ gives $E(X^2) \le b^2$, which means that $V(X) = E(X^2) \le b^2$. ∎
Stromberg attributes the following theorem to Saeki; further, it is proved in Stromberg that if for some $t \ne 0$ the inequality in the theorem is an equality then the random variable has the Rademacher distribution (Karl R. Stromberg, Probability for Analysts, p. 293, Proposition 9.9; see also Omar Rivasplata, Subgaussian random variables: An expository note, http://www.math.ualberta.ca/~orivasplata/publications/subgaussians.pdf).
Theorem 2.
If $X$ is a random variable satisfying $E(X) = 0$ and $|X| \le 1$ almost surely, then
$$E(e^{tX}) \le \cosh t \le e^{t^2/2}, \qquad t \in \mathbb{R}.$$
Proof.
Define $F : \mathbb{R} \to \mathbb{R}$ by
$$F(t) = E(e^{tX}).$$
Then $F(0) = 1$ and
$$F'(t) = E(Xe^{tX}), \qquad F''(t) = E(X^2e^{tX});$$
the derivative of $F$ with respect to $t$ is obtained using the dominated convergence theorem, for which $|X| \le 1$ supplies the dominating functions. In particular $F'(0) = E(X) = 0$. Let
$$W(t) = F'(t)\cosh t - F(t)\sinh t,$$
with which $W(0) = F'(0) = 0$, so
$$W'(t) = (F''(t) - F(t))\cosh t,$$
hence $W'$ has the same sign as $F'' - F$. Because $|X| \le 1$ almost surely, for each $t$ we have almost surely $X^2 \le 1$, and therefore almost surely $X^2e^{tX} \le e^{tX}$. Therefore, for all $t$,
$$F''(t) = E(X^2e^{tX}) \le E(e^{tX}) = F(t),$$
which tells us that $W' \le 0$ on $\mathbb{R}$: $W$ is nonincreasing, and, as $W(0) = 0$, we get $W \le 0$ on $[0,\infty)$ and $W \ge 0$ on $(-\infty,0]$. As
$$\frac{d}{dt}\frac{F(t)}{\cosh t} = \frac{W(t)}{\cosh^2 t},$$
the function $F/\cosh$ is nonincreasing on $[0,\infty)$ and nondecreasing on $(-\infty,0]$; since $F(0)/\cosh 0 = 1$, this gives $F(t) \le \cosh t$ for all $t \in \mathbb{R}$. Finally, because $(2k)! \ge 2^kk!$ for all $k \ge 0$,
$$\cosh t = \sum_{k=0}^{\infty} \frac{t^{2k}}{(2k)!} \le \sum_{k=0}^{\infty} \frac{1}{k!}\left(\frac{t^2}{2}\right)^k = e^{t^2/2},$$
and so $E(e^{tX}) \le \cosh t \le e^{t^2/2}$.
∎
Corollary 3.
If a random variable $X$ satisfies $E(X) = 0$ and $|X| \le 1$ almost surely, then $X$ is $1$-subgaussian.
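A numerical sanity check of Theorem 2 and Corollary 3 (a sketch added here; the choice of $X$ uniform on $[-1,1]$, for which $E(e^{tX}) = \sinh t/t$, is an arbitrary example):

```python
import numpy as np

# X uniform on [-1, 1]: E(X) = 0, |X| <= 1, and E(e^{tX}) = sinh(t)/t for t != 0.
for t in [0.5, 1.0, 2.0, 4.0]:
    mgf = np.sinh(t) / t
    assert mgf <= np.cosh(t) <= np.exp(t**2 / 2)
    print(f"t={t}  E e^(tX)={mgf:.4f}  cosh t={np.cosh(t):.4f}  e^(t^2/2)={np.exp(t**2 / 2):.4f}")
```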
2 Hoeffding’s inequality
We first prove Hoeffding’s lemma (Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, p. 27, Lemma 2.2).
Lemma 4 (Hoeffding’s lemma).
If a random variable $X$ satisfies $E(X) = 0$ and $a \le X \le b$ almost surely, then $X$ is $\frac{b-a}{2}$-subgaussian.
Proof.
Because $a \le X \le b$ almost surely, it follows that
$$V(X) \le E\left(\left(X - \frac{a+b}{2}\right)^2\right) \le \left(\frac{b-a}{2}\right)^2,$$
not using that $E(X) = 0$. (Namely, Popoviciu’s inequality.)
Write $\psi = \Lambda_X$ and for $t \in \mathbb{R}$ define the measure $P_t$ by
$$dP_t = \frac{e^{tX}}{E(e^{tX})}\,dP.$$
We check
$$P_t(\Omega) = \frac{E(e^{tX})}{E(e^{tX})} = 1,$$
so each $P_t$ is a probability measure. There is a random variable $X_t$ for which $X_t(P) = X(P_t)$, i.e. whose distribution is the distribution of $X$ under $P_t$. $X_t$ satisfies $a \le X_t \le b$ almost surely, and so
$$V(X_t) \le \left(\frac{b-a}{2}\right)^2.$$
We calculate
$$\psi'(t) = \frac{E(Xe^{tX})}{E(e^{tX})}$$
and
$$\psi''(t) = \frac{E(X^2e^{tX})}{E(e^{tX})} - \left(\frac{E(Xe^{tX})}{E(e^{tX})}\right)^2.$$
But
$$E(X_t) = \int_\Omega X\,dP_t = \frac{E(Xe^{tX})}{E(e^{tX})}$$
and
$$E(X_t^2) = \int_\Omega X^2\,dP_t = \frac{E(X^2e^{tX})}{E(e^{tX})},$$
and so
$$\psi''(t) = E(X_t^2) - E(X_t)^2 = V(X_t) \le \left(\frac{b-a}{2}\right)^2.$$
For $t \in \mathbb{R}$, Taylor’s theorem tells us that there is some $\theta$ between $0$ and $t$ such that
$$\psi(t) = \psi(0) + t\psi'(0) + \frac{t^2}{2}\psi''(\theta) = \frac{t^2}{2}\psi''(\theta);$$
here we have used that $\psi(0) = 0$ and $\psi'(0) = E(X) = 0$. But from what we have shown, $\psi''(\theta) = V(X_\theta)$ and $V(X_\theta) \le \left(\frac{b-a}{2}\right)^2$, so
$$\Lambda_X(t) = \psi(t) \le \frac{t^2}{2}\left(\frac{b-a}{2}\right)^2,$$
which shows that $X$ is $\frac{b-a}{2}$-subgaussian. ∎
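To see the lemma numerically (a sketch added here; the centered Bernoulli variable below, taking the values $1-p$ and $-p$ so that $b - a = 1$, is an arbitrary example), one can compare $\psi(t)$ with $t^2(b-a)^2/8 = t^2/8$:

```python
import numpy as np

p = 0.2
# X takes the value 1 - p with probability p and -p with probability 1 - p,
# so E(X) = 0 and -p <= X <= 1 - p, i.e. a = -p, b = 1 - p, b - a = 1.
for t in np.linspace(-4.0, 4.0, 9):
    psi = np.log(p * np.exp(t * (1 - p)) + (1 - p) * np.exp(-t * p))
    assert psi <= t**2 / 8 + 1e-12
    print(f"t={t:5.2f}  psi(t)={psi:.4f}  t^2/8={t**2 / 8:.4f}")
```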
We now prove Hoeffding’s inequality (Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, p. 34, Theorem 2.8).
Theorem 5 (Hoeffding’s inequality).
Let $X_1, \ldots, X_n$ be independent random variables such that for each $i$, $a_i \le X_i \le b_i$ almost surely, and write $S_n = \sum_{i=1}^n (X_i - E(X_i))$. For any $t > 0$,
$$P(S_n \ge t) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i-a_i)^2}\right).$$
Proof.
For $s > 0$ and $t > 0$, because $x \mapsto e^{sx}$ is nonnegative and nondecreasing, for a random variable $Y$ we have
$$e^{st}1_{\{Y \ge t\}} \le e^{sY},$$
and so $e^{st}P(Y \ge t) \le E(e^{sY})$, i.e.
$$P(Y \ge t) \le e^{-st}E(e^{sY}).$$
Using this with $Y = S_n$ and because the $X_i$ are independent,
$$P(S_n \ge t) \le e^{-st}E(e^{sS_n}) = e^{-st}\prod_{i=1}^n E(e^{s(X_i - E(X_i))}).$$
Because $a_i \le X_i \le b_i$ almost surely, we have $a_i - E(X_i) \le X_i - E(X_i) \le b_i - E(X_i)$, and as $(b_i - E(X_i)) - (a_i - E(X_i)) = b_i - a_i$, Hoeffding’s lemma tells us
$$E(e^{s(X_i - E(X_i))}) \le e^{s^2(b_i-a_i)^2/8},$$
and thus
$$P(S_n \ge t) \le \exp\left(-st + \frac{s^2}{8}\sum_{i=1}^n (b_i-a_i)^2\right).$$
We remark that $s$ does not appear in the left-hand side. Define
$$g(s) = -st + \frac{s^2}{8}\sum_{i=1}^n (b_i-a_i)^2, \qquad s > 0,$$
for which
$$g'(s) = -t + \frac{s}{4}\sum_{i=1}^n (b_i-a_i)^2.$$
Then $g'(s) = 0$ if and only if
$$s = \frac{4t}{\sum_{i=1}^n (b_i-a_i)^2},$$
at which $g$ assumes its infimum. Then
$$P(S_n \ge t) \le \exp\left(g\left(\frac{4t}{\sum_{i=1}^n (b_i-a_i)^2}\right)\right) = \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i-a_i)^2}\right),$$
proving the claim. ∎
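A Monte Carlo check of the inequality (a sketch added here; taking the $X_i$ uniform on $[0,1]$, with $n = 50$ and $10^5$ trials, is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 10**5
a, b = 0.0, 1.0  # X_i uniform on [0, 1], so a_i = 0, b_i = 1

# S_n = sum_i (X_i - E(X_i)); estimate P(S_n >= t) and compare with the bound.
X = rng.uniform(a, b, size=(trials, n))
S = (X - 0.5).sum(axis=1)
for t in [2.0, 4.0, 6.0]:
    empirical = np.mean(S >= t)
    bound = np.exp(-2 * t**2 / (n * (b - a)**2))
    print(f"t={t}  empirical={empirical:.5f}  Hoeffding bound={bound:.5f}")
```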
3 Cramér’s large deviation theorem
The following is Cramér’s large deviation theorem (Achim Klenke, Probability Theory: A Comprehensive Course, p. 508, Theorem 23.3).
Theorem 6 (Cramér’s large deviation theorem).
Suppose that $X_n$, $n \ge 1$, are independent identically distributed random variables such that for all $t \in \mathbb{R}$,
$$\Lambda(t) = \log E(e^{tX_1}) < \infty.$$
For $x \in \mathbb{R}$ define
$$\Lambda^*(x) = \sup_{t \in \mathbb{R}} (tx - \Lambda(t)).$$
If $x > E(X_1)$, then
$$\lim_{n\to\infty} \frac{1}{n}\log P(S_n \ge nx) = -\Lambda^*(x),$$
where $S_n = X_1 + \cdots + X_n$.
Proof.
For $n \ge 1$, let $Y_n = X_n - x$, let
$$\Lambda_Y(t) = \log E(e^{tY_1}) = \Lambda(t) - tx,$$
and let
$$\Lambda_Y^*(y) = \sup_{t\in\mathbb{R}}(ty - \Lambda_Y(t)).$$
Lastly, let $T_n = Y_1 + \cdots + Y_n = S_n - nx$, with which
$$P(T_n \ge 0) = P(S_n \ge nx), \qquad \Lambda_Y^*(0) = \sup_{t\in\mathbb{R}}(tx - \Lambda(t)) = \Lambda^*(x), \qquad E(Y_1) = E(X_1) - x < 0.$$
Thus, if we have
$$\lim_{n\to\infty}\frac{1}{n}\log P(T_n \ge 0) = -\Lambda_Y^*(0),$$
then
$$\lim_{n\to\infty}\frac{1}{n}\log P(S_n \ge nx) = -\Lambda^*(x).$$
Therefore it suffices to prove the theorem for when $x = 0$ and $E(X_1) < 0$.
Define
$$\varphi(t) = E(e^{tX_1}) = e^{\Lambda(t)}, \qquad t \in \mathbb{R},$$
the moment generating function of $X_1$, and define
$$\rho = \inf_{t\in\mathbb{R}}\varphi(t) = e^{-\Lambda^*(0)},$$
using that $\exp$ is increasing: $\Lambda^*(0) = \sup_{t\in\mathbb{R}}(-\Lambda(t)) = -\inf_{t\in\mathbb{R}}\Lambda(t)$. The claim to prove is thus that $\frac{1}{n}\log P(S_n \ge 0) \to \log\rho$.

Using the dominated convergence theorem, for $t \in \mathbb{R}$ we obtain
$$\varphi'(t) = E(X_1e^{tX_1}), \qquad \varphi''(t) = E(X_1^2e^{tX_1}).$$
In particular, $\varphi'(0) = E(X_1) < 0$, for which $\varphi(t) < \varphi(0) = 1$ for sufficiently small $t > 0$ and hence $\rho < 1$, and $\varphi''(t) > 0$ for all $t$ (either the expectation is $0$ or positive, and if it is $0$ then $X_1^2e^{tX_1} = 0$ almost everywhere, i.e. $X_1 = 0$ almost everywhere, which contradicts $E(X_1) < 0$). In particular, $\varphi$ is strictly convex.
Either $P(X_1 \le 0) = 1$ or $P(X_1 > 0) > 0$. In the first case,
$$e^{tX_1} \downarrow 1_{\{X_1 = 0\}} \quad \text{almost surely as } t \to \infty,$$
so, using the dominated convergence theorem,
$$\lim_{t\to\infty}\varphi(t) = P(X_1 = 0).$$
Then, as $\varphi'(t) = E(X_1e^{tX_1}) \le 0$ for all $t$, $\varphi$ is nonincreasing, and
$$\rho = \inf_{t\in\mathbb{R}}\varphi(t) = \lim_{t\to\infty}\varphi(t) = P(X_1 = 0).$$
That is, as $S_n \ge 0$ if and only if $X_1 = \cdots = X_n = 0$, by independence,
$$\frac{1}{n}\log P(S_n \ge 0) = \frac{1}{n}\log P(X_1 = 0)^n = \log\rho,$$
and the claim is immediate in this case.

In the second case, $P(X_1 > 0) > 0$, so there is some $\epsilon > 0$ with $P(X_1 \ge \epsilon) > 0$, whence $\varphi(t) \ge e^{t\epsilon}P(X_1 \ge \epsilon) \to \infty$ as $t \to \infty$. As $\varphi''(t) > 0$ for all $t$, there is some $\tau$ at which $\varphi(\tau) \le \varphi(t)$ for all $t$ (namely, a unique global minimum). Thus,
$$\varphi'(\tau) = E(X_1e^{\tau X_1}) = 0.$$
And $\varphi(\tau) = \rho$, which with the above yields $0 < \rho < 1$. Because $\varphi'$ is strictly increasing, $\varphi'(t) = 0$ if and only if $t = \tau$ if and only if $\varphi(t) = \rho$; as $\varphi'(0) < 0$, we get $\tau > 0$. Applying Chebyshev’s inequality, and because $X_1, \ldots, X_n$ are independent,
$$P(S_n \ge 0) = P(e^{\tau S_n} \ge 1) \le E(e^{\tau S_n}) = \varphi(\tau)^n = \rho^n,$$
thus $\frac{1}{n}\log P(S_n \ge 0) \le \log\rho$ and then
$$\limsup_{n\to\infty}\frac{1}{n}\log P(S_n \ge 0) \le \log\rho.$$
To prove the claim, it now suffices to prove that, in the case $P(X_1 > 0) > 0$,
$$\liminf_{n\to\infty}\frac{1}{n}\log P(S_n \ge 0) \ge \log\rho. \qquad (1)$$
Let $\mu = X_1(P)$, the distribution of $X_1$, and let $\hat\mu$ be the Borel measure on $\mathbb{R}$ with
$$d\hat\mu(x) = \frac{e^{\tau x}}{\rho}\,d\mu(x).$$
$\hat\mu$ is a Borel probability measure: it is apparent that it is a Borel measure, and
$$\hat\mu(\mathbb{R}) = \frac{1}{\rho}\int_{\mathbb{R}}e^{\tau x}\,d\mu(x) = \frac{\varphi(\tau)}{\rho} = 1.$$
There are independent identically distributed random variables $\hat X_n$, $n \ge 1$, each with $\hat X_n(P) = \hat\mu$ (Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, p. 329, Corollary 10.19). Define
$$\hat\varphi(t) = E(e^{t\hat X_1}),$$
the moment generating function of $\hat X_1$. As $d\hat\mu = \rho^{-1}e^{\tau x}\,d\mu$,
$$\hat\varphi(t) = \int_{\mathbb{R}}e^{tx}\,d\hat\mu(x) = \frac{1}{\rho}\int_{\mathbb{R}}e^{(t+\tau)x}\,d\mu(x) = \frac{\varphi(t+\tau)}{\rho}.$$
As $\varphi'(\tau) = 0$ and $\varphi''(t) > 0$ for all $t$,
$$E(\hat X_1) = \hat\varphi'(0) = \frac{\varphi'(\tau)}{\rho} = 0, \qquad \sigma^2 := V(\hat X_1) = \hat\varphi''(0) = \frac{\varphi''(\tau)}{\rho} \in (0,\infty).$$
For $n \ge 1$, using that the $X_i$ are independent and that the $\hat X_i$ are independent, the distribution of $(X_1,\ldots,X_n)$ is $\mu^{\otimes n}$ and the distribution of $(\hat X_1,\ldots,\hat X_n)$ is $\hat\mu^{\otimes n}$, so, writing $\hat S_n = \hat X_1 + \cdots + \hat X_n$,
$$E\left(e^{-\tau\hat S_n}1_{\{\hat S_n \ge 0\}}\right) = \int_{\{x_1+\cdots+x_n \ge 0\}}e^{-\tau(x_1+\cdots+x_n)}\,d\hat\mu^{\otimes n}(x_1,\ldots,x_n).$$
But
$$d\hat\mu^{\otimes n}(x_1,\ldots,x_n) = \rho^{-n}e^{\tau(x_1+\cdots+x_n)}\,d\mu^{\otimes n}(x_1,\ldots,x_n),$$
hence
$$E\left(e^{-\tau\hat S_n}1_{\{\hat S_n \ge 0\}}\right) = \rho^{-n}\int_{\{x_1+\cdots+x_n \ge 0\}}d\mu^{\otimes n} = \rho^{-n}P(S_n \ge 0).$$
Thus, (1) is equivalent to the statement
$$\liminf_{n\to\infty}\frac{1}{n}\log E\left(e^{-\tau\hat S_n}1_{\{\hat S_n \ge 0\}}\right) \ge 0,$$
so, to prove the claim it suffices to prove this inequality, which we now do.
For any $c > 0$,
$$E\left(e^{-\tau\hat S_n}1_{\{\hat S_n \ge 0\}}\right) \ge E\left(e^{-\tau\hat S_n}1_{\{0 \le \hat S_n \le c\sqrt n\}}\right) \ge e^{-\tau c\sqrt n}P(0 \le \hat S_n \le c\sqrt n).$$
Because the $\hat X_i$ are independent identically distributed random variables with mean $0$ and variance $\sigma^2 \in (0,\infty)$, the central limit theorem tells us that as $n \to \infty$,
$$P(0 \le \hat S_n \le c\sqrt n) = P\left(0 \le \frac{\hat S_n}{\sigma\sqrt n} \le \frac{c}{\sigma}\right) \to \gamma_{0,1}([0, c/\sigma]) > 0,$$
where $\gamma_{0,1}$ is the standard Gaussian measure, whose density with respect to Lebesgue measure on $\mathbb{R}$ is
$$p(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$$
Thus, because for each $n$ we have
$$\frac{1}{n}\log E\left(e^{-\tau\hat S_n}1_{\{\hat S_n \ge 0\}}\right) \ge -\frac{\tau c}{\sqrt n} + \frac{1}{n}\log P(0 \le \hat S_n \le c\sqrt n),$$
whose right-hand side tends to $0$ as $n \to \infty$,
$$\liminf_{n\to\infty}\frac{1}{n}\log E\left(e^{-\tau\hat S_n}1_{\{\hat S_n \ge 0\}}\right) \ge 0,$$
which completes the proof. ∎
For example, say that $X_n$ are independent identically distributed random variables with $X_n(P) = \gamma_{\mu,\sigma^2}$. We calculate that the cumulant generating function is
$$\Lambda(t) = \log E(e^{tX_1}) = \mu t + \frac{\sigma^2t^2}{2},$$
thus $\Lambda(t) < \infty$ for all $t \in \mathbb{R}$. Then
$$\Lambda^*(x) = \sup_{t\in\mathbb{R}}\left((x-\mu)t - \frac{\sigma^2t^2}{2}\right) = \frac{(x-\mu)^2}{2\sigma^2},$$
the supremum being attained at $t = (x-\mu)/\sigma^2$. Now applying Cramér’s theorem we get that for $x > \mu$, for $S_n = X_1 + \cdots + X_n$ we have
$$\lim_{n\to\infty}\frac{1}{n}\log P(S_n \ge nx) = -\frac{(x-\mu)^2}{2\sigma^2}.$$
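Since here $S_n \sim \gamma_{n\mu,n\sigma^2}$ exactly, this limit can be checked directly (a sketch added here using scipy; the parameter values are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 0.5
rate = (x - mu)**2 / (2 * sigma**2)  # Lambda*(x)

for n in [10, 100, 1000, 10000]:
    # P(S_n >= n x) for S_n ~ N(n mu, n sigma^2), via the log survival function
    log_p = norm.logsf(n * x, loc=n * mu, scale=sigma * np.sqrt(n))
    print(f"n={n:6d}  (1/n) log P = {log_p / n:.5f}  -Lambda*(x) = {-rate:.5f}")
```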
Another example: suppose that $X_n$ are independent identically distributed random variables with the Rademacher distribution:
$$P(X_n = 1) = P(X_n = -1) = \frac{1}{2}.$$
Then
$$E(e^{tX_1}) = \frac{e^t + e^{-t}}{2} = \cosh t,$$
so the cumulant generating function of $X_1$ is
$$\Lambda(t) = \log\cosh t,$$
and indeed $\Lambda(t) < \infty$ for all $t \in \mathbb{R}$. Then, as $\frac{d}{dt}(tx - \log\cosh t) = x - \tanh t$,
$$\Lambda^*(x) = \sup_{t\in\mathbb{R}}(tx - \log\cosh t),$$
the supremum being attained where $\tanh t = x$. For $-1 < x < 1$,
$$\tanh t = x \iff t = \frac{1}{2}\log\frac{1+x}{1-x}.$$
Then
$$\cosh\left(\frac{1}{2}\log\frac{1+x}{1-x}\right) = \frac{1}{\sqrt{1-x^2}}.$$
With these identities,
$$\Lambda^*(x) = \frac{x}{2}\log\frac{1+x}{1-x} + \frac{1}{2}\log(1-x^2) = \frac{1+x}{2}\log(1+x) + \frac{1-x}{2}\log(1-x).$$
With $E(X_1) = 0$, applying Cramér’s theorem, we get that for any $0 < x < 1$,
$$\lim_{n\to\infty}\frac{1}{n}\log P(S_n \ge nx) = -\frac{1+x}{2}\log(1+x) - \frac{1-x}{2}\log(1-x).$$
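Here $P(S_n \ge nx)$ is an exact binomial tail, since $S_n \ge nx$ if and only if at least $\lceil n(1+x)/2\rceil$ of the signs are $+1$, so the convergence can be checked exactly (a sketch added here using scipy; $x = 1/2$ is an arbitrary choice):

```python
import numpy as np
from scipy.stats import binom

x = 0.5
rate = (1 + x) / 2 * np.log(1 + x) + (1 - x) / 2 * np.log(1 - x)  # Lambda*(x)

for n in [10, 100, 1000, 10000]:
    # S_n >= n x  iff  #{i : X_i = +1} >= n (1 + x) / 2
    k = int(np.ceil(n * (1 + x) / 2))
    log_p = binom.logsf(k - 1, n, 0.5)  # log P(#(+1 signs) >= k)
    print(f"n={n:6d}  (1/n) log P = {log_p / n:.5f}  -Lambda*(x) = {-rate:.5f}")
```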
For a Borel probability measure $\mu$ on $\mathbb{R}$, we define its Laplace transform $L_\mu : \mathbb{R} \to (0,\infty]$ by
$$L_\mu(t) = \int_{\mathbb{R}}e^{tx}\,d\mu(x).$$
Suppose that $\int_{\mathbb{R}}|x|\,d\mu(x) < \infty$ and let $m = \int_{\mathbb{R}}x\,d\mu(x)$, the first moment of $\mu$. For any $t \in \mathbb{R}$ the function $x \mapsto e^{tx}$ is convex, so by Jensen’s inequality,
$$L_\mu(t) = \int_{\mathbb{R}}e^{tx}\,d\mu(x) \ge e^{t\int_{\mathbb{R}}x\,d\mu(x)} = e^{tm}.$$
Thus for all $t \in \mathbb{R}$,
$$tm - \log L_\mu(t) \le 0.$$
For a Borel probability measure $\mu$ with finite first moment, we define its Cramér transform $\Lambda_\mu^* : \mathbb{R} \to [0,\infty]$ by (Heinz Bauer, Probability Theory, pp. 89–90, §12)
$$\Lambda_\mu^*(x) = \sup_{t\in\mathbb{R}}(tx - \log L_\mu(t)).$$
For $t = 0$, $tx - \log L_\mu(t) = -\log\mu(\mathbb{R}) = 0$, which shows that indeed $\Lambda_\mu^*(x) \ge 0$ for all $x \in \mathbb{R}$. But $tm - \log L_\mu(t) \le 0$ for all $t \in \mathbb{R}$ yields
$$\Lambda_\mu^*(m) = 0.$$
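A crude numerical Legendre transform illustrates that $\Lambda_\mu^*(m) = 0$ (a sketch added here; taking $\mu$ the exponential distribution with rate $1$, for which $L_\mu(t) = 1/(1-t)$ for $t < 1$ and $m = 1$, together with the grid below, are arbitrary choices):

```python
import numpy as np

# mu = exponential distribution with rate 1: L_mu(t) = 1/(1 - t) for t < 1, m = 1.
ts = np.linspace(-20.0, 0.999, 20001)
log_L = -np.log(1.0 - ts)

def cramer(x):
    # Lambda*_mu(x) = sup_t (t x - log L_mu(t)), approximated on a grid
    return np.max(ts * x - log_L)

print(cramer(1.0))  # approximately 0: the Cramer transform vanishes at m = 1
print(cramer(2.0))  # approximately 1 - log 2 > 0, away from the first moment
```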