# Wiener measure and Donsker’s theorem

Jordan Bell
September 4, 2015

## 1 Relatively compact sets of Borel probability measures on C[0,1]

Let $E=C[0,1]$, let $\mathscr{B}_{E}$ be the Borel $\sigma$-algebra of $E$, and let $\mathscr{P}_{E}$ be the collection of Borel probability measures on $E$. We assign $\mathscr{P}_{E}$ the narrow topology, the coarsest topology on $\mathscr{P}_{E}$ such that for each $F\in C_{b}(E)$ the map $\mu\mapsto\int_{E}F\,d\mu$ is continuous.

For $f\in E$ and $\delta>0$ we define

 $\omega_{f}(\delta)=\sup_{s,t\in[0,1],|s-t|\leq\delta}|f(s)-f(t)|.$

For $f\in E$, $\omega_{f}(\delta)\downarrow 0$ as $\delta\downarrow 0$, and for $\delta>0$, $f\mapsto\omega_{f}(\delta)$ is continuous. We shall use the following characterization of a relatively compact subset $A$ of $E$, which is proved using the Arzelà-Ascoli theorem.
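As a numerical aside (not used in the proofs), $\omega_{f}(\delta)$ can be approximated by restricting $s,t$ to a finite grid; the function name and grid size in the following sketch are ad hoc choices.

```python
def modulus_of_continuity(f, delta, grid=1000):
    # Approximate omega_f(delta) = sup_{s,t in [0,1], |s-t| <= delta} |f(s) - f(t)|
    # by restricting s and t to the grid points i / grid.
    vals = [f(i / grid) for i in range(grid + 1)]
    step = int(delta * grid)  # number of grid steps corresponding to delta
    w = 0.0
    for i in range(grid + 1):
        for j in range(i + 1, min(i + step, grid) + 1):
            w = max(w, abs(vals[i] - vals[j]))
    return w

# For f(t) = t^2 the supremum is attained at s = 1 - delta, t = 1,
# giving omega_f(delta) = 1 - (1 - delta)^2 = 2*delta - delta^2.
w = modulus_of_continuity(lambda t: t * t, 0.1)
```

For $f(t)=t^{2}$ and $\delta=0.1$ this returns approximately $0.19=2\delta-\delta^{2}$, and for a constant function it returns $0$, consistent with $\omega_{f}(\delta)\downarrow 0$.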

###### Lemma 1.

Let $A$ be a subset of $E$. $\overline{A}$ is compact if and only if

 $\sup_{f\in A}|f(0)|<\infty$

and

 $\sup_{f\in A}\omega_{f}(\delta)\downarrow 0,\qquad\delta\downarrow 0.$

We shall use Prokhorov’s theorem (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 47, Chapter II, Theorem 6.7): for $X$ a Polish space and for $\Gamma\subset\mathscr{P}_{X}$, $\overline{\Gamma}$ is compact if and only if for each $\epsilon>0$ there is a compact subset $K_{\epsilon}$ of $X$ such that $\mu(K_{\epsilon})\geq 1-\epsilon$ for all $\mu\in\Gamma$. Namely, a subset of $\mathscr{P}_{X}$ is relatively compact if and only if it is tight. We use Prokhorov’s theorem to prove a characterization of relatively compact subsets of $\mathscr{P}_{E}$ (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 213, Chapter VII, Lemma 2.2), which we then use to prove the characterization in Theorem 3.

###### Lemma 2.

Let $\Gamma$ be a subset of $\mathscr{P}_{E}$. $\overline{\Gamma}$ is compact if and only if for each $\epsilon>0$ there is some $M_{\epsilon}<\infty$ and a function $\delta\mapsto\omega_{\epsilon}(\delta)$ satisfying $\omega_{\epsilon}(\delta)\downarrow 0$ as $\delta\downarrow 0$ and such that for all $\mu\in\Gamma$,

 $\mu(A_{\epsilon})\geq 1-\frac{\epsilon}{2},\qquad\mu(B_{\epsilon})\geq 1-\frac{\epsilon}{2},$

where

 $A_{\epsilon}=\{f\in E:|f(0)|\leq M_{\epsilon}\},\qquad B_{\epsilon}=\{f\in E:\omega_{f}(\delta)\leq\omega_{\epsilon}(\delta)\textrm{ for all }\delta>0\}.$
###### Proof.

Suppose that $\Gamma$ satisfies the above conditions. Because $f\mapsto|f(0)|$ is continuous, $A_{\epsilon}$ is closed. For $\delta>0$, suppose that $f_{n}$ is a sequence in $B_{\epsilon}$ tending to some $f\in E$. Because $g\mapsto\omega_{g}(\delta)$ is continuous, $\omega_{f_{n}}(\delta)\to\omega_{f}(\delta)$, and because $\omega_{f_{n}}(\delta)\leq\omega_{\epsilon}(\delta)$ for each $n$, we get $\omega_{f}(\delta)\leq\omega_{\epsilon}(\delta)$ and hence $f\in B_{\epsilon}$, showing that $B_{\epsilon}$ is closed. Therefore $K_{\epsilon}=A_{\epsilon}\cap B_{\epsilon}$ is closed, i.e. $K_{\epsilon}=\overline{K_{\epsilon}}$. The set $K_{\epsilon}$ satisfies

 $\sup_{f\in K_{\epsilon}}|f(0)|\leq M_{\epsilon}$

and

 $\limsup_{\delta\downarrow 0}\sup_{f\in K_{\epsilon}}\omega_{f}(\delta)\leq\limsup_{\delta\downarrow 0}\omega_{\epsilon}(\delta)=0,$

thus by Lemma 1, $K_{\epsilon}$ is compact. For $\mu\in\Gamma$,

 $\mu(K_{\epsilon})\geq 1-\frac{\epsilon}{2},$

and because $K_{\epsilon}$ is compact, this means that $\Gamma$ is tight, so by Prokhorov’s theorem, $\Gamma$ is relatively compact.

Now suppose that $\Gamma$ is relatively compact and let $\epsilon>0$. By Prokhorov’s theorem, there is a compact set $K_{\epsilon}$ in $E$ such that $\mu(K_{\epsilon})\geq 1-\frac{\epsilon}{2}$ for all $\mu\in\Gamma$. Define

 $M_{\epsilon}=\sup_{f\in K_{\epsilon}}|f(0)|,\qquad\omega_{\epsilon}(\delta)=\sup_{f\in K_{\epsilon}}\omega_{f}(\delta),\qquad\delta>0.$

Because $K_{\epsilon}$ is compact, by Lemma 1 we get that $M_{\epsilon}<\infty$ and $\omega_{\epsilon}(\delta)\downarrow 0$ as $\delta\downarrow 0$. For $\mu\in\Gamma$,

 $\mu(A_{\epsilon})\geq\mu(K_{\epsilon})\geq 1-\frac{\epsilon}{2},\qquad\mu(B_{\epsilon})\geq\mu(K_{\epsilon})\geq 1-\frac{\epsilon}{2},$

showing that $\Gamma$ satisfies the conditions of the lemma. ∎

We now prove the characterization of relatively compact subsets of $\mathscr{P}_{E}$ that we shall use in our proof of Donsker’s theorem (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 214, Chapter VII, Theorem 2.2).

###### Theorem 3 (Relatively compact sets in $\mathscr{P}_{E}$).

Let $\Gamma$ be a subset of $\mathscr{P}_{E}$. $\overline{\Gamma}$ is compact if and only if the following conditions are satisfied:

1. For each $\epsilon>0$ there is some $M_{\epsilon}<\infty$ such that

 $\mu(f:|f(0)|\leq M_{\epsilon})\geq 1-\frac{\epsilon}{2},\qquad\mu\in\Gamma.$
2. For each $\epsilon>0$ and $\delta>0$ there is some $\eta=\eta(\epsilon,\delta)>0$ such that

 $\mu(f:\omega_{f}(\eta)\leq\delta)\geq 1-\frac{\epsilon}{2},\qquad\mu\in\Gamma.$
###### Proof.

Suppose that $\overline{\Gamma}$ is compact and let $\epsilon>0$. By Lemma 2, there is some $M_{\epsilon}<\infty$ and a function $\eta\mapsto\omega_{\epsilon}(\eta)$ satisfying $\omega_{\epsilon}(\eta)\downarrow 0$ as $\eta\downarrow 0$ and

 $\mu(A_{\epsilon})\geq 1-\frac{\epsilon}{2},\qquad\mu(B_{\epsilon})\geq 1-\frac{\epsilon}{2},\qquad\mu\in\Gamma.$

For $\delta>0$, there is some $\eta=\eta(\epsilon,\delta)$ with $\omega_{\epsilon}(\eta)\leq\delta$. Then for $\mu\in\Gamma$,

 $\mu(f:\omega_{f}(\eta)\leq\delta)\geq\mu(f:\omega_{f}(\eta)\leq\omega_{\epsilon}(\eta))\geq\mu(B_{\epsilon})\geq 1-\frac{\epsilon}{2}.$

Now suppose that the conditions of the theorem hold. For each $\epsilon>0$ and $n\geq 1$ there is some $\eta_{\epsilon,n}>0$ such that

 $\mu(F_{\epsilon,n})\geq 1-\frac{\epsilon}{2^{n+1}},\qquad\mu\in\Gamma,$

where

 $F_{\epsilon,n}=\left\{f:\omega_{f}(\eta_{\epsilon,n})\leq\frac{1}{n}\right\}.$

Let

 $K_{\epsilon}=\{f:|f(0)|\leq M_{\epsilon}\}\cap\bigcap_{n=1}^{\infty}F_{\epsilon,n},$

for which

 $\mu(K_{\epsilon})\geq\mu(f:|f(0)|\leq M_{\epsilon})-\sum_{n=1}^{\infty}\frac{\epsilon}{2^{n+1}}\geq 1-\frac{\epsilon}{2}-\frac{\epsilon}{2}\geq 1-\epsilon,\qquad\mu\in\Gamma.$

If $f\in K_{\epsilon}$, then for each $n\geq 1$ we have $f\in F_{\epsilon,n}$, which means that $\omega_{f}(\eta_{\epsilon,n})\leq\frac{1}{n}$, and therefore

 $\sup_{f\in K_{\epsilon}}\omega_{f}(\eta_{\epsilon,n})\leq\frac{1}{n}.$

Thus for $n\geq 1$, if $0<\eta\leq\eta_{\epsilon,n}$ then

 $\sup_{f\in K_{\epsilon}}\omega_{f}(\eta)\leq\frac{1}{n},$

which shows $\sup_{f\in K_{\epsilon}}\omega_{f}(\eta)\downarrow 0$ as $\eta\downarrow 0$. Then because

 $\sup_{f\in K_{\epsilon}}|f(0)|\leq M_{\epsilon},$

applying Lemma 1 we get that $\overline{K_{\epsilon}}$ is compact. The map $f\mapsto\omega_{f}(\eta_{\epsilon,n})$ is continuous, so the set $F_{\epsilon,n}$ is closed, and therefore the set $K_{\epsilon}$ is closed, hence $K_{\epsilon}=\overline{K_{\epsilon}}$ is compact. Because $K_{\epsilon}$ is compact and $\mu(K_{\epsilon})\geq 1-\epsilon$ for all $\mu\in\Gamma$, it follows from Prokhorov’s theorem that $\Gamma$ is relatively compact. ∎

## 2 Wiener measure

For $t_{1},\ldots,t_{d}\in[0,1]$ with $t_{1}<\cdots<t_{d}$, define $\pi_{t_{1},\ldots,t_{d}}:E\to\mathbb{R}^{d}$ by

 $\pi_{t_{1},\ldots,t_{d}}(f)=(f(t_{1}),\ldots,f(t_{d})),\qquad f\in E,$

which is continuous. We state the following results, which we will use later (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 212, Chapter VII, Theorem 2.1).

###### Theorem 4 (The Borel $\sigma$-algebra of $E$).

$\mathscr{B}_{E}$ is equal to the $\sigma$-algebra generated by $\{\pi_{t}:t\in[0,1]\}$.

Two elements $\mu$ and $\nu$ of $\mathscr{P}_{E}$ are equal if and only if for any $d$ and any $t_{1}<\cdots<t_{d}$, the pushforward measures

 $\mu_{t_{1},\ldots,t_{d}}=(\pi_{t_{1},\ldots,t_{d}})_{*}\mu,\qquad\nu_{t_{1},\ldots,t_{d}}=(\pi_{t_{1},\ldots,t_{d}})_{*}\nu$

are equal.

Let $(\xi_{t})_{t\in[0,1]}$ be a stochastic process with state space $\mathbb{R}$ and sample space $(\Omega,\mathscr{F},P)$. For $t_{1}<\cdots<t_{d}$, let $\xi_{t_{1},\ldots,t_{d}}=\xi_{t_{1}}\otimes\cdots\otimes\xi_{t_{d}}$ and let $P_{t_{1},\ldots,t_{d}}=(\xi_{t_{1},\ldots,t_{d}})_{*}P$: for $B\in\mathscr{B}_{\mathbb{R}}^{d}$,

 $P_{t_{1},\ldots,t_{d}}(B)=((\xi_{t_{1},\ldots,t_{d}})_{*}P)(B)=P(\xi_{t_{1},\ldots,t_{d}}^{-1}(B))=P((\xi_{t_{1}},\ldots,\xi_{t_{d}})\in B).$

$P_{t_{1},\ldots,t_{d}}$ is a Borel probability measure on $\mathbb{R}^{d}$ and is called a finite-dimensional distribution of the stochastic process.

The Kolmogorov continuity theorem (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 216, Chapter VII, Theorem 3.1) tells us that if there are $\alpha,\beta,K>0$ such that for all $s,t\in[0,1]$,

 $E|\xi_{t}-\xi_{s}|^{\alpha}\leq K|t-s|^{1+\beta},$

then there is a unique $\mu\in\mathscr{P}_{E}$ such that for all $d$ and for all $t_{1}<\cdots<t_{d}$,

 $\mu_{t_{1},\ldots,t_{d}}=P_{t_{1},\ldots,t_{d}}.$

We now define and prove the existence of Wiener measure (K. R. Parthasarathy, Probability Measures on Metric Spaces, p. 218, Chapter VII, Theorem 3.2).

###### Theorem 5 (Wiener measure).

There is a unique Borel probability measure $W$ on $E$ satisfying:

1. $W(f\in E:f(0)=0)=1$.

2. For $0\leq t_{0}<t_{1}<\cdots<t_{d}\leq 1$, the random variables

 $\pi_{t_{1}}-\pi_{t_{0}},\quad\pi_{t_{2}}-\pi_{t_{1}},\quad\pi_{t_{3}}-\pi_{t_{2}},\quad\ldots,\quad\pi_{t_{d}}-\pi_{t_{d-1}}$

are independent $(E,\mathscr{B}_{E},W)\to(\mathbb{R},\mathscr{B}_{\mathbb{R}})$.

3. If $0\leq s<t\leq 1$, the random variable $\pi_{t}-\pi_{s}:(E,\mathscr{B}_{E},W)\to(\mathbb{R},\mathscr{B}_{\mathbb{R}})$ is normal with mean $0$ and variance $t-s$.

###### Proof.

There is a stochastic process $(\xi_{t})_{t\in[0,1]}$ with state space $\mathbb{R}$ and some sample space $(\Omega,\mathscr{F},P)$, such that (i) $P(\xi_{0}=0)=1$, (ii) $(\xi_{t})_{t\in[0,1]}$ has independent increments, and (iii) for $s<t$, $\xi_{t}-\xi_{s}$ is a normal random variable with mean $0$ and variance $t-s$. (Namely, Brownian motion with starting point $0$.) Because $\xi_{t}-\xi_{s}$ is normal with mean $0$ and variance $t-s$, we calculate (cf. Isserlis’s theorem)

 $E|\xi_{t}-\xi_{s}|^{4}=3|t-s|^{2}.$

Thus using the Kolmogorov continuity theorem with $\alpha=4$, $\beta=1$, $K=3$, there is a unique $W\in\mathscr{P}_{E}$ such that for all $t_{1}<\cdots<t_{d}$,

 $W_{t_{1},\ldots,t_{d}}=P_{t_{1},\ldots,t_{d}},$

i.e. for $B\in\mathscr{B}_{\mathbb{R}}^{d}$,

 $W(\pi_{t_{1}}\otimes\cdots\otimes\pi_{t_{d}}\in B)=P(\xi_{t_{1}}\otimes\cdots\otimes\xi_{t_{d}}\in B).$

For $t_{1}<\cdots<t_{d}$ and $B\in\mathscr{B}_{\mathbb{R}}^{d}$, with $T:\mathbb{R}^{d}\to\mathbb{R}^{d}$ defined by $T(x_{1},\ldots,x_{d})=(x_{1},x_{2}-x_{1},\ldots,x_{d}-x_{d-1})$,

 $\begin{split}&W(\pi_{t_{1}}\otimes(\pi_{t_{2}}-\pi_{t_{1}})\otimes\cdots\otimes(\pi_{t_{d}}-\pi_{t_{d-1}})\in B)\\ =&W(T\circ(\pi_{t_{1}}\otimes\pi_{t_{2}}\otimes\cdots\otimes\pi_{t_{d}})\in B)\\ =&W(\pi_{t_{1}}\otimes\pi_{t_{2}}\otimes\cdots\otimes\pi_{t_{d}}\in T^{-1}(B))\\ =&P(\xi_{t_{1}}\otimes\xi_{t_{2}}\otimes\cdots\otimes\xi_{t_{d}}\in T^{-1}(B))\\ =&P(T\circ(\xi_{t_{1}}\otimes\xi_{t_{2}}\otimes\cdots\otimes\xi_{t_{d}})\in B)\\ =&P(\xi_{t_{1}}\otimes(\xi_{t_{2}}-\xi_{t_{1}})\otimes\cdots\otimes(\xi_{t_{d}}-\xi_{t_{d-1}})\in B).\end{split}$

Hence, because $\xi_{t_{1}},\xi_{t_{2}}-\xi_{t_{1}},\ldots,\xi_{t_{d}}-\xi_{t_{d-1}}$ are independent,

 $\begin{split}&(\pi_{t_{1}}\otimes(\pi_{t_{2}}-\pi_{t_{1}})\otimes\cdots\otimes(\pi_{t_{d}}-\pi_{t_{d-1}}))_{*}W\\ =&(\xi_{t_{1}}\otimes(\xi_{t_{2}}-\xi_{t_{1}})\otimes\cdots\otimes(\xi_{t_{d}}-\xi_{t_{d-1}}))_{*}P\\ =&(\xi_{t_{1}})_{*}P\otimes(\xi_{t_{2}}-\xi_{t_{1}})_{*}P\otimes\cdots\otimes(\xi_{t_{d}}-\xi_{t_{d-1}})_{*}P\\ =&(\pi_{t_{1}})_{*}W\otimes(\pi_{t_{2}}-\pi_{t_{1}})_{*}W\otimes\cdots\otimes(\pi_{t_{d}}-\pi_{t_{d-1}})_{*}W,\end{split}$

which means that the random variables $\pi_{t_{1}},\pi_{t_{2}}-\pi_{t_{1}},\ldots,\pi_{t_{d}}-\pi_{t_{d-1}}$ are independent.

If $s<t$ and $B_{1},B_{2}\in\mathscr{B}_{\mathbb{R}}$, then with $T:\mathbb{R}^{2}\to\mathbb{R}^{2}$ defined by $T(x,y)=(x,y-x)$,

 $\displaystyle W((\pi_{s},\pi_{t}-\pi_{s})\in(B_{1},B_{2}))$ $\displaystyle=W(T\circ(\pi_{s},\pi_{t})\in(B_{1},B_{2}))$ $\displaystyle=P((\xi_{s},\xi_{t})\in T^{-1}(B_{1},B_{2}))$ $\displaystyle=P((\xi_{s},\xi_{t}-\xi_{s})\in(B_{1},B_{2})),$

which implies that $(\pi_{t}-\pi_{s})_{*}W=(\xi_{t}-\xi_{s})_{*}P$, and because $\xi_{t}-\xi_{s}$ is a normal random variable with mean $0$ and variance $t-s$, so is $\pi_{t}-\pi_{s}$.

Finally,

 $W(f:f(0)=0)=W(\pi_{0}=0)=P(\xi_{0}=0)=1.$ ∎

$(E,\mathscr{B}_{E},W)$ is a probability space, and the stochastic process $(\pi_{t})_{t\in[0,1]}$ is a Brownian motion.
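The moment identity $E|\xi_{t}-\xi_{s}|^{4}=3|t-s|^{2}$ used in the proof above can be checked by Monte Carlo simulation; this is an illustrative sketch, with the sample size and seed chosen arbitrarily.

```python
import random

random.seed(0)
var = 0.3        # plays the role of t - s
n = 200_000
# xi_t - xi_s is N(0, t - s); its fourth moment should be 3 (t - s)^2.
fourth = sum(random.gauss(0.0, var ** 0.5) ** 4 for _ in range(n)) / n
exact = 3 * var ** 2  # = 0.27
```

With this sample size the empirical fourth moment agrees with $3(t-s)^{2}$ to within a couple of standard errors.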

## 3 Interpolation and continuous stochastic processes

Let $(\xi_{t})_{t\in[0,1]}$ be a continuous stochastic process with state space $\mathbb{R}$ and sample space $(\Omega,\mathscr{F},P)$. To say that the stochastic process is continuous means that for each $\omega\in\Omega$ the map $t\mapsto\xi_{t}(\omega)$ is continuous $[0,1]\to\mathbb{R}$. Define $\xi:\Omega\to E$ by

 $\xi(\omega)=(t\mapsto\xi_{t}(\omega)),\qquad\omega\in\Omega.$

For $t\in[0,1]$ and $B$ a Borel set in $\mathbb{R}$,

 $\xi^{-1}\pi_{t}^{-1}B=\{\omega\in\Omega:\xi_{t}(\omega)\in B\}=\xi_{t}^{-1}B,$

and because $\xi_{t}:(\Omega,\mathscr{F})\to(\mathbb{R},\mathscr{B}_{\mathbb{R}})$ is measurable this belongs to $\mathscr{F}$. But by Theorem 4, $\mathscr{B}_{E}$ is generated by the collection $\{\pi_{t}^{-1}B:t\in[0,1],B\in\mathscr{B}_{\mathbb{R}}\}$. Now, for $f:X\to Y$ and for a nonempty collection $\mathscr{G}$ of subsets of $Y$ (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 140, Lemma 4.23),

 $\sigma(f^{-1}(\mathscr{G}))=f^{-1}(\sigma(\mathscr{G})).$

Therefore $\xi^{-1}(\mathscr{B}_{E})\subset\mathscr{F}$, which means that $\xi:(\Omega,\mathscr{F})\to(E,\mathscr{B}_{E})$ is measurable. This means that a continuous stochastic process with index set $[0,1]$ induces a random variable with state space $E$. Then the pushforward measure of $P$ by $\xi$ is a Borel probability measure on $E$. We shall end up constructing a sequence of pushforward measures, induced by a sequence of continuous stochastic processes, that converges in $\mathscr{P}_{E}$ to Wiener measure $W$.

Let $(X_{n})_{n\geq 1}$ be a sequence of independent identically distributed random variables on a sample space $(\Omega,\mathscr{F},P)$ with $E(X_{n})=0$ and $V(X_{n})=1$, and let $S_{0}=0$ and

 $S_{k}=\sum_{i=1}^{k}X_{i}.$

Then $E(S_{k})=0$ and $V(S_{k})=k$. For $t\geq 0$ let

 $Y_{t}=S_{[t]}+(t-[t])X_{[t]+1}.$

Thus, for $k\geq 0$ and $k\leq t\leq k+1$,

 $\displaystyle Y_{t}$ $\displaystyle=S_{k}+(t-k)X_{k+1}$ $\displaystyle=S_{k}+(t-k)(S_{k+1}-S_{k})$ $\displaystyle=(1-t+k)S_{k}+(t-k)S_{k+1}.$

For each $\omega\in\Omega$, the map $t\mapsto Y_{t}(\omega)$ is piecewise linear, equal to $S_{k}(\omega)$ when $t=k$, and in particular it is continuous. For $n\geq 1$, define

 $X_{t}^{(n)}=n^{-1/2}Y_{nt}=n^{-1/2}S_{[nt]}+n^{-1/2}(nt-[nt])X_{[nt]+1},\qquad t\in[0,1].$ (1)

For $0\leq k\leq n$,

 $X_{k/n}^{(n)}=n^{-1/2}S_{k}.$

For each $n\geq 1$, $(X^{(n)}_{t})_{t\in[0,1]}$ is a continuous stochastic process on the sample space $(\Omega,\mathscr{F},P)$, and we denote by $P_{n}\in\mathscr{P}_{E}$ the pushforward measure of $P$ by $X^{(n)}$.
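The interpolation (1) is straightforward to implement. The following sketch (the function name is an ad hoc choice) builds $X^{(n)}$ from a finite sample $X_{1},\ldots,X_{n}$ and illustrates that $X^{(n)}_{k/n}=n^{-1/2}S_{k}$.

```python
import random

def scaled_walk(n, steps):
    # Return t -> X^{(n)}_t as in (1):
    # X^{(n)}_t = n^{-1/2} S_[nt] + n^{-1/2} (nt - [nt]) X_{[nt]+1},
    # where steps[k-1] plays the role of X_k.
    S = [0.0]
    for x in steps:
        S.append(S[-1] + x)
    def X(t):
        k = min(int(n * t), n - 1)  # [nt], clamped so that X_{[nt]+1} exists at t = 1
        return (S[k] + (n * t - k) * steps[k]) / n ** 0.5
    return X

random.seed(1)
n = 8
steps = [random.choice([-1.0, 1.0]) for _ in range(n)]  # mean 0, variance 1
X = scaled_walk(n, steps)
```

At the grid points $t=k/n$ the interpolation term vanishes and $X(k/n)$ equals $n^{-1/2}S_{k}$, and between grid points $X$ is affine, so each sample path is continuous.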

## 4 Donsker’s theorem

###### Lemma 6.

If $Z_{n}$ and $U_{n}$ are random variables with state space $\mathbb{R}^{d}$ such that $Z_{n}\to Z$ in distribution and $U_{n}\to 0$ in distribution, then $Z_{n}+U_{n}\to Z$ in distribution.

If $Z_{n}$ are random variables with state space $\mathbb{R}$ that converge in distribution to some random variable $Z$ and $c_{n}$ are real numbers that converge to some real number $c$, then $c_{n}Z_{n}\to cZ$ in distribution.

For $\sigma\geq 0$, let $\nu_{\sigma^{2}}$ be the Gaussian measure on $\mathbb{R}$ with mean $0$ and variance $\sigma^{2}$. The characteristic function of $\nu_{\sigma^{2}}$ is, for $\sigma>0$,

 $\widetilde{\nu}_{\sigma^{2}}(\xi)=\int_{\mathbb{R}}e^{i\xi x}d\nu_{\sigma^{2}}% (x)=\int_{\mathbb{R}}e^{i\xi x}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{x^{2}}{2% \sigma^{2}}}dx=e^{-\frac{1}{2}\sigma^{2}\xi^{2}},$

and $\widetilde{\nu}_{0}(\xi)=1$. One checks that $c_{*}\nu_{1}=\nu_{c^{2}}$ for $c\geq 0$.
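The formula for $\widetilde{\nu}_{\sigma^{2}}$ can likewise be checked by simulation, using the empirical characteristic function; the parameters and seed below are arbitrary choices.

```python
import cmath
import math
import random

random.seed(3)
sigma, xi, n = 0.8, 1.5, 200_000
# Empirical characteristic function of N(0, sigma^2) at the point xi:
# the average of e^{i xi x} over n Gaussian samples x.
ecf = sum(cmath.exp(1j * xi * random.gauss(0.0, sigma)) for _ in range(n)) / n
exact = math.exp(-0.5 * sigma ** 2 * xi ** 2)  # e^{-sigma^2 xi^2 / 2}
```

The real part of the average approximates $e^{-\frac{1}{2}\sigma^{2}\xi^{2}}$ and the imaginary part approximates $0$, as the symmetry of the Gaussian requires.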

In the following theorem and in what follows, $X^{(n)}$ is the piecewise linear stochastic process defined in (1). We prove that a sequence of finite-dimensional distributions converges to a Gaussian measure (Bert Fristedt and Lawrence Gray, A Modern Approach to Probability Theory, p. 368, §19.1, Lemma 1).

###### Theorem 7.

For $0\leq t_{0}<t_{1}<\cdots<t_{d}\leq 1$, the random vectors

 $(X^{(n)}_{t_{1}}-X^{(n)}_{t_{0}},\ldots,X^{(n)}_{t_{d}}-X^{(n)}_{t_{d-1}}),\qquad(\Omega,\mathscr{F},P)\to(\mathbb{R}^{d},\mathscr{B}_{\mathbb{R}}^{d}),$

converge in distribution to $\nu_{t_{1}-t_{0}}\otimes\cdots\otimes\nu_{t_{d}-t_{d-1}}$ as $n\to\infty$.

###### Proof.

For $0<j\leq d$ and $n\geq 1$ let

 $r_{j,n}=\frac{[nt_{j}]}{n},\qquad U_{j,n}=X^{(n)}_{t_{j}}-X^{(n)}_{r_{j,n}},$

and for $0\leq j<d$ and $n\geq 1$ let

 $s_{j,n}=\frac{\lceil nt_{j}\rceil}{n},\qquad V_{j,n}=X^{(n)}_{s_{j,n}}-X^{(n)}_{t_{j}},$

with which

 $\begin{split}&(X^{(n)}_{t_{1}}-X^{(n)}_{t_{0}},\ldots,X^{(n)}_{t_{d}}-X^{(n)}_{t_{d-1}})\\ &=(X^{(n)}_{r_{1,n}}-X^{(n)}_{s_{0,n}},\ldots,X^{(n)}_{r_{d,n}}-X^{(n)}_{s_{d-1,n}})+(U_{1,n},\ldots,U_{d,n})+(V_{0,n},\ldots,V_{d-1,n}).\end{split}$

Because $E(X^{(n)}_{t})=0$,

 $E(U_{j,n})=0,\qquad E(V_{j,n})=0.$

Furthermore,

 $\begin{split}V(U_{j,n})&=V(X^{(n)}_{t_{j}}-X^{(n)}_{r_{j,n}})\\ &=n^{-1}V(S_{[nt_{j}]}+(nt_{j}-[nt_{j}])X_{[nt_{j}]+1}-S_{[nr_{j,n}]}-(nr_{j,n}-[nr_{j,n}])X_{[nr_{j,n}]+1})\\ &=n^{-1}V((nt_{j}-[nt_{j}])X_{[nt_{j}]+1})\\ &=n^{-1}(nt_{j}-[nt_{j}])^{2}V(X_{[nt_{j}]+1})\\ &=n^{-1}(nt_{j}-[nt_{j}])^{2},\end{split}$

and because $0\leq nt_{j}-[nt_{j}]<1$ this tends to $0$ as $n\to\infty$. Likewise, $V(V_{j,n})\to 0$ as $n\to\infty$.

For $1\leq j\leq d$,

 $\begin{split}X^{(n)}_{r_{j,n}}-X^{(n)}_{s_{j-1,n}}&=n^{-1/2}S_{[nr_{j,n}]}+n^{-1/2}(nr_{j,n}-[nr_{j,n}])X_{[nr_{j,n}]+1}\\ &\quad-n^{-1/2}S_{[ns_{j-1,n}]}-n^{-1/2}(ns_{j-1,n}-[ns_{j-1,n}])X_{[ns_{j-1,n}]+1}\\ &=n^{-1/2}S_{[nt_{j}]}-n^{-1/2}S_{\lceil nt_{j-1}\rceil}\\ &=n^{-1/2}([nt_{j}]-\lceil nt_{j-1}\rceil-1)^{1/2}\cdot([nt_{j}]-\lceil nt_{j-1}\rceil-1)^{-1/2}\sum_{i=\lceil nt_{j-1}\rceil+1}^{[nt_{j}]}X_{i}.\end{split}$

By the central limit theorem,

 $([nt_{j}]-\lceil nt_{j-1}\rceil-1)^{-1/2}\sum_{i=\lceil nt_{j-1}\rceil+1}^{[nt_{j}]}X_{i}\to\nu_{1}$

in distribution as $n\to\infty$. But

 $n^{-1/2}([nt_{j}]-\lceil nt_{j-1}\rceil-1)^{1/2}\to(t_{j}-t_{j-1})^{1/2}$

as $n\to\infty$, and $(t_{j}-t_{j-1})^{1/2}_{*}\nu_{1}=\nu_{t_{j}-t_{j-1}}$, so by Lemma 6,

 $X^{(n)}_{r_{j,n}}-X^{(n)}_{s_{j-1,n}}\to\nu_{t_{j}-t_{j-1}}$

in distribution as $n\to\infty$.

For sufficiently large $n$, depending on $t_{0},\ldots,t_{d}$,

 $t_{0}\leq s_{0,n}<r_{1,n}\leq t_{1}\leq s_{1,n}<r_{2,n}\leq t_{2}\leq\cdots\leq s_{d-1,n}<r_{d,n}\leq t_{d}.$

Since $E(U_{j,n})=0$ and $V(U_{j,n})\to 0$, Chebyshev’s inequality shows that $(U_{1,n},\ldots,U_{d,n})\to 0$ in probability, and likewise $(V_{0,n},\ldots,V_{d-1,n})\to 0$ in probability; hence these random vectors converge to $0$ in distribution as $n\to\infty$. The random variables $X^{(n)}_{r_{1,n}}-X^{(n)}_{s_{0,n}},\ldots,X^{(n)}_{r_{d,n}}-X^{(n)}_{s_{d-1,n}}$ are independent, and therefore their joint distribution is equal to the product of their distributions. Now, if $\mu_{n}=\mu_{n}^{1}\otimes\cdots\otimes\mu_{n}^{d}$ and $\mu_{n}^{j}\to\mu^{j}$ as $n\to\infty$, $1\leq j\leq d$, then for $\xi\in\mathbb{R}^{d}$,

 $\displaystyle\widetilde{\mu}_{n}(\xi)$ $\displaystyle=\widetilde{\mu}_{n}^{1}(\xi_{1})\cdots\widetilde{\mu}_{n}^{d}(\xi_{d})$ $\displaystyle\to\widetilde{\mu}^{1}(\xi_{1})\cdots\widetilde{\mu}^{d}(\xi_{d})$ $\displaystyle=(\mu^{1}\otimes\cdots\otimes\mu^{d})^{\widetilde{\;}}(\xi)$

as $n\to\infty$, and therefore by Lévy’s continuity theorem, $\mu_{n}\to\mu^{1}\otimes\cdots\otimes\mu^{d}$ as $n\to\infty$. This means that the joint distribution of $X^{(n)}_{r_{1,n}}-X^{(n)}_{s_{0,n}},\ldots,X^{(n)}_{r_{d,n}}-X^{(n)}_{s_{d-1,n}}$ converges to

 $\nu_{t_{1}-t_{0}}\otimes\cdots\otimes\nu_{t_{d}-t_{d-1}}$

as $n\to\infty$. Because $(U_{1,n},\ldots,U_{d,n})\to 0$ in distribution as $n\to\infty$ and $(V_{0,n},\ldots,V_{d-1,n})\to 0$ in distribution as $n\to\infty$, applying Lemma 6 we get that

 $(X^{(n)}_{t_{1}}-X^{(n)}_{t_{0}},\ldots,X^{(n)}_{t_{d}}-X^{(n)}_{t_{d-1}})\to% \nu_{t_{1}-t_{0}}\otimes\cdots\otimes\nu_{t_{d}-t_{d-1}}$

in distribution as $n\to\infty$, completing the proof. ∎

Let $t_{0}=0$ and let $0<t_{1}<\cdots<t_{d}\leq 1$. As $X^{(n)}_{0}=0$, the above theorem tells us that

 $(X^{(n)}_{t_{1}},X^{(n)}_{t_{2}}-X^{(n)}_{t_{1}},\ldots,X^{(n)}_{t_{d}}-X^{(n)}_{t_{d-1}})\to\nu_{t_{1}}\otimes\nu_{t_{2}-t_{1}}\otimes\cdots\otimes\nu_{t_{d}-t_{d-1}}$

in distribution as $n\to\infty$. Define $g:\mathbb{R}^{d}\to\mathbb{R}^{d}$ by

 $g(x_{1},x_{2},\ldots,x_{d})=(x_{1},x_{1}+x_{2},\ldots,x_{1}+x_{2}+\cdots+x_{d}).$

The function $g$ is continuous and satisfies

 $g\circ(X^{(n)}_{t_{1}}-X^{(n)}_{t_{0}},\ldots,X^{(n)}_{t_{d}}-X^{(n)}_{t_{d-1}})=(X^{(n)}_{t_{1}},X^{(n)}_{t_{2}},\ldots,X^{(n)}_{t_{d}}).$

Then by the continuous mapping theorem,

 $(X^{(n)}_{t_{1}},X^{(n)}_{t_{2}},\ldots,X^{(n)}_{t_{d}})\to g_{*}(\nu_{t_{1}}\otimes\nu_{t_{2}-t_{1}}\otimes\cdots\otimes\nu_{t_{d}-t_{d-1}})$ (2)

in distribution as $n\to\infty$ (Allan Gut, Probability: A Graduate Course, second ed., p. 245, Chapter 5, Theorem 10.4).

We prove a result that we use to prove the next lemma, and that lemma is used in the proof of Donsker’s theorem (Ioannis Karatzas and Steven E. Shreve, Brownian Motion and Stochastic Calculus, second ed., p. 68, Lemma 4.18).

###### Lemma 8.

For $\epsilon>0$,

 $\lim_{\delta\downarrow 0}\limsup_{n\to\infty}\frac{1}{\delta}P\left(\max_{1\leq j\leq[n\delta]+1}|S_{j}|>\epsilon n^{1/2}\right)=0.$
###### Proof.

For each $\delta>0$, by the central limit theorem,

 $([n\delta]+1)^{-1/2}S_{[n\delta]+1}\to Z$

in distribution as $n\to\infty$, where $Z_{*}P=\nu_{1}$. Because $\frac{([n\delta]+1)^{1/2}}{(n\delta)^{1/2}}\to 1$ as $n\to\infty$, by Lemma 6 we then get that

 $(n\delta)^{-1/2}S_{[n\delta]+1}\to Z$

in distribution as $n\to\infty$. Now let $\lambda>0$. There is a sequence $\phi_{k}$ in $C_{b}(\mathbb{R})$ such that $\phi_{k}\downarrow 1_{(-\infty,-\lambda]\cup[\lambda,\infty)}=\chi_{\lambda}$ pointwise as $k\to\infty$. For each $k$, writing $X=S_{[n\delta]+1}$ and using the change of variables formula,

 $\displaystyle P(|X|\geq\lambda(n\delta)^{1/2})$ $\displaystyle=\int_{\Omega}\chi_{\lambda(n\delta)^{1/2}}(X(\omega))dP(\omega)$ $\displaystyle=\int_{\Omega}\chi_{\lambda}((n\delta)^{-1/2}X(\omega))dP(\omega)$ $\displaystyle\leq\int_{\Omega}\phi_{k}((n\delta)^{-1/2}X(\omega))dP(\omega)$ $\displaystyle=E(\phi_{k}((n\delta)^{-1/2}X)).$

Therefore, by the continuous mapping theorem,

 $\displaystyle\limsup_{n\to\infty}P(|S_{[n\delta]+1}|\geq\lambda(n\delta)^{1/2})$ $\displaystyle\leq\lim_{n\to\infty}E(\phi_{k}((n\delta)^{-1/2}S_{[n\delta]+1}))$ $\displaystyle=E(\phi_{k}\circ Z).$

Because $\phi_{k}\downarrow\chi_{\lambda}$ pointwise as $k\to\infty$, using the monotone convergence theorem and then using Chebyshev’s inequality,

 $E(\phi_{k}\circ Z)\to E(\chi_{\lambda}\circ Z)=P(|Z|\geq\lambda)\leq\lambda^{-3}E|Z|^{3}.$

We have established that for each $\lambda>0$,

 $\limsup_{n\to\infty}P(|S_{[n\delta]+1}|\geq\lambda(n\delta)^{1/2})\leq\lambda^{-3}E|Z|^{3}.$ (3)

Define

 $\tau=\min\{j\geq 1:|S_{j}|>n^{1/2}\epsilon\}.$

For $0<\delta<\epsilon^{2}/2$, it is a fact that

 $\begin{split}&P\left(\max_{0\leq j\leq[n\delta]+1}|S_{j}|>n^{1/2}\epsilon\right)\\ \leq&P(|S_{[n\delta]+1}|\geq n^{1/2}(\epsilon-(2\delta)^{1/2}))+\sum_{j=1}^{[n\delta]}P(|S_{[n\delta]+1}|<n^{1/2}(\epsilon-(2\delta)^{1/2}),\;\tau=j).\end{split}$

If $\tau(\omega)=j$ and $|S_{[n\delta]+1}(\omega)|<n^{1/2}(\epsilon-(2\delta)^{1/2})$, then

 $|S_{j}(\omega)-S_{[n\delta]+1}(\omega)|\geq|S_{j}(\omega)|-|S_{[n\delta]+1}(\omega)|>n^{1/2}\epsilon-n^{1/2}(\epsilon-(2\delta)^{1/2})=(2n\delta)^{1/2}.$

But by Chebyshev’s inequality and the fact that the random variables $X_{1},X_{2},\ldots$ are independent with mean $0$ and variance $1$,

 $P(|S_{j}-S_{[n\delta]+1}|>(2n\delta)^{1/2})\leq\frac{1}{2n\delta}E((S_{j}-S_{[% n\delta]+1})^{2})=\frac{1}{2n\delta}([n\delta]-j)\leq\frac{1}{2},$

so

 $P(|S_{[n\delta]+1}|<n^{1/2}(\epsilon-(2\delta)^{1/2}),\;\tau=j)\leq P(|S_{j}-S_{[n\delta]+1}|>(2n\delta)^{1/2},\;\tau=j)\leq\frac{1}{2}P(\tau=j).$

Therefore,

 $\begin{split}&P\left(\max_{0\leq j\leq[n\delta]+1}|S_{j}|>n^{1/2}\epsilon\right)\\ \leq&P(|S_{[n\delta]+1}|\geq n^{1/2}(\epsilon-(2\delta)^{1/2}))+\sum_{j=1}^{[n\delta]}\frac{1}{2}P(\tau=j)\\ =&P(|S_{[n\delta]+1}|\geq n^{1/2}(\epsilon-(2\delta)^{1/2}))+\frac{1}{2}P(\tau\leq[n\delta])\\ \leq&P(|S_{[n\delta]+1}|\geq n^{1/2}(\epsilon-(2\delta)^{1/2}))+\frac{1}{2}P\left(\max_{0\leq j\leq[n\delta]+1}|S_{j}|>n^{1/2}\epsilon\right),\end{split}$

so

 $P\left(\max_{0\leq j\leq[n\delta]+1}|S_{j}|>n^{1/2}\epsilon\right)\leq 2P(|S_{[n\delta]+1}|\geq n^{1/2}(\epsilon-(2\delta)^{1/2})).$

Now using (3) with $\lambda=(\epsilon-(2\delta)^{1/2})\delta^{-1/2}$,

 $\limsup_{n\to\infty}P(|S_{[n\delta]+1}|\geq(\epsilon-(2\delta)^{1/2})\delta^{-1/2}(n\delta)^{1/2})\leq(\epsilon-(2\delta)^{1/2})^{-3}\delta^{3/2}E|Z|^{3},$

hence

 $\limsup_{n\to\infty}P\left(\max_{0\leq j\leq[n\delta]+1}|S_{j}|>n^{1/2}\epsilon\right)\leq 2(\epsilon-(2\delta)^{1/2})^{-3}\delta^{3/2}E|Z|^{3}.$

Dividing both sides by $\delta$ and then taking $\delta\downarrow 0$ we obtain the claim. ∎

We prove one more result that we use to prove Donsker’s theorem (Ioannis Karatzas and Steven E. Shreve, Brownian Motion and Stochastic Calculus, second ed., p. 69, Lemma 4.19).

###### Lemma 9.

For $T>0$ and $\epsilon>0$,

 $\lim_{\delta\downarrow 0}\limsup_{n\to\infty}P\left(\max_{0\leq k\leq[nT]+1}\max_{1\leq j\leq[n\delta]+1}|S_{j+k}-S_{k}|>n^{1/2}\epsilon\right)=0.$
###### Proof.

For $0<\delta\leq T$, let $m=[T/\delta]+1$, so that $T/\delta<m$. Then

 $\lim_{n\to\infty}\frac{[nT]+1}{[n\delta]+1}=\frac{T}{\delta}<m,$

so there is some $n_{\delta}$ such that for all $n\geq n_{\delta}$ it is the case that $[nT]+1<([n\delta]+1)m$. Suppose that $\omega\in\Omega$ is such that there are $1\leq j\leq[n\delta]+1$ and $0\leq k\leq[nT]+1$ satisfying

 $|S_{j+k}(\omega)-S_{k}(\omega)|>n^{1/2}\epsilon,$

and then let $p=[k/([n\delta]+1)]$, which satisfies $0\leq p\leq m-1$ and

 $([n\delta]+1)p\leq k<([n\delta]+1)(p+1).$

Because $1\leq j\leq[n\delta]+1$, either

 $([n\delta]+1)p\leq k<j+k\leq([n\delta]+1)(p+1)$

or

 $([n\delta]+1)(p+1)<j+k<([n\delta]+1)(p+2).$

We separate the first case into the cases

 $|S_{k}(\omega)-S_{([n\delta]+1)p}(\omega)|>\frac{1}{2}n^{1/2}\epsilon$

and

 $|S_{j+k}(\omega)-S_{([n\delta]+1)p}(\omega)|>\frac{1}{2}n^{1/2}\epsilon,$

and we separate the second case into the cases

 $|S_{k}(\omega)-S_{([n\delta]+1)p}(\omega)|>\frac{1}{3}n^{1/2}\epsilon,$

and

 $|S_{([n\delta]+1)p}(\omega)-S_{([n\delta]+1)(p+1)}(\omega)|>\frac{1}{3}n^{1/2}\epsilon,$

and

 $|S_{([n\delta]+1)(p+1)}(\omega)-S_{j+k}(\omega)|>\frac{1}{3}n^{1/2}\epsilon.$

It follows that (this should be worked out more carefully; in Karatzas and Shreve there is $m+1$ where I have $m$)

 $\begin{split}&\left\{\max_{1\leq j\leq[n\delta]+1}\max_{0\leq k\leq[nT]+1}|S_{j+k}-S_{k}|>n^{1/2}\epsilon\right\}\\ \subset&\bigcup_{p=0}^{m-1}\left\{\max_{1\leq j\leq[n\delta]+1}|S_{j+([n\delta]+1)p}-S_{([n\delta]+1)p}|>\frac{1}{3}n^{1/2}\epsilon\right\}.\end{split}$

For $0\leq p\leq m-1$,

 $\begin{split}&P\left(\max_{1\leq j\leq[n\delta]+1}|S_{j+([n\delta]+1)p}-S_{([n\delta]+1)p}|>\frac{1}{3}n^{1/2}\epsilon\right)\\ \leq&P\left(\max_{1\leq j\leq[n\delta]+1}|S_{j}|>\frac{1}{3}n^{1/2}\epsilon\right),\end{split}$

so

 $\begin{split}&P\left(\max_{1\leq j\leq[n\delta]+1}\max_{0\leq k\leq[nT]+1}|S_{j+k}-S_{k}|>n^{1/2}\epsilon\right)\\ \leq&\sum_{p=0}^{m-1}P\left(\max_{1\leq j\leq[n\delta]+1}|S_{j}|>\frac{1}{3}n^{1/2}\epsilon\right)\\ =&mP\left(\max_{1\leq j\leq[n\delta]+1}|S_{j}|>\frac{1}{3}n^{1/2}\epsilon\right).\end{split}$

Lemma 8 tells us

 $\lim_{\delta\downarrow 0}\limsup_{n\to\infty}\frac{1}{\delta}P\left(\max_{1\leq j\leq[n\delta]+1}|S_{j}|>\frac{1}{3}n^{1/2}\epsilon\right)=0,$

and because $m\leq\frac{T}{\delta}+1=\frac{T+\delta}{\delta}$,

 $\lim_{\delta\downarrow 0}\limsup_{n\to\infty}P\left(\max_{1\leq j\leq[n\delta]+1}\max_{0\leq k\leq[nT]+1}|S_{j+k}-S_{k}|>n^{1/2}\epsilon\right)=0,$

proving the claim. ∎

In the following, $P_{n}\in\mathscr{P}_{E}$ denotes the pushforward measure of $P$ by $X^{(n)}$, for $X^{(n)}$ defined in (1). We now prove Donsker’s theorem (Ioannis Karatzas and Steven E. Shreve, Brownian Motion and Stochastic Calculus, second ed., p. 70, Theorem 4.20).

###### Theorem 10 (Donsker’s theorem).

$P_{n}\to W$.

###### Proof.

We shall use Theorem 3 to prove that $\Gamma=\{P_{n}:n\geq 1\}$ is relatively compact in $\mathscr{P}_{E}$. For $n\geq 1$,

 $P_{n}(f\in E:|f(0)|=0)=P(\omega\in\Omega:|X_{0}^{(n)}(\omega)|=0)=1,$

thus the first condition of Theorem 3 is satisfied with $M_{\epsilon}=0$. For the second condition of Theorem 3 to be satisfied it suffices that for each $\epsilon>0$,

 $\lim_{\delta\downarrow 0}\limsup_{n\to\infty}P\left(\sup_{0\leq s,t\leq 1,|s-t|\leq\delta}|X^{(n)}_{s}-X^{(n)}_{t}|>\epsilon\right)=0.$

Now,

 $P\left(\sup_{0\leq s,t\leq 1,|s-t|\leq\delta}|X^{(n)}_{s}-X^{(n)}_{t}|>\epsilon\right)=P\left(\sup_{0\leq s,t\leq n,|s-t|\leq n\delta}|Y_{s}-Y_{t}|>n^{1/2}\epsilon\right).$

Also,

 $\sup_{0\leq s,t\leq n,|s-t|\leq n\delta}|Y_{s}-Y_{t}|\leq\max_{1\leq j\leq[n\delta]+1}\max_{0\leq k\leq n+1}|S_{j+k}-S_{k}|,$

so applying Lemma 9,

 $\begin{split}&\lim_{\delta\downarrow 0}\limsup_{n\to\infty}P\left(\sup_{0\leq s,t\leq 1,|s-t|\leq\delta}|X^{(n)}_{s}-X^{(n)}_{t}|>\epsilon\right)\\ \leq&\lim_{\delta\downarrow 0}\limsup_{n\to\infty}P\left(\max_{1\leq j\leq[n\delta]+1}\max_{0\leq k\leq n+1}|S_{j+k}-S_{k}|>n^{1/2}\epsilon\right)\\ =&0,\end{split}$

from which we get that $\Gamma$ satisfies the second condition of Theorem 3, and hence that $\Gamma$ is relatively compact in $\mathscr{P}_{E}$. By (2) and Theorem 5, the finite-dimensional distributions of $P_{n}$ converge to the finite-dimensional distributions of $W$. Hence any narrow limit of a subsequence of $P_{n}$ has the same finite-dimensional distributions as $W$, and so is equal to $W$. Because $\{P_{n}:n\geq 1\}$ is relatively compact, it follows that $P_{n}\to W$. ∎
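As an illustration of Donsker’s theorem (a numerical sketch, with walk length, trial count, and tolerance chosen arbitrarily): applying the continuous mapping theorem to $f\mapsto\sup_{t\in[0,1]}f(t)$, the maximum of the scaled walk converges in distribution to $\sup_{t\in[0,1]}W_{t}$, whose law is given by the reflection principle, $P(\sup_{t\leq 1}W_{t}>a)=2P(Z>a)$ for $Z$ standard normal. Note that discretization biases the simulated maximum slightly downward.

```python
import math
import random

random.seed(2)

def max_scaled_walk(n):
    # max_{0 <= k <= n} S_k / sqrt(n), with standard normal steps X_i
    s = m = 0.0
    for _ in range(n):
        s += random.gauss(0.0, 1.0)
        m = max(m, s)
    return m / math.sqrt(n)

a, n, trials = 1.0, 200, 10_000
estimate = sum(max_scaled_walk(n) > a for _ in range(trials)) / trials
# Reflection principle: P(sup_{t<=1} W_t > a) = 2 P(Z > a) = 2 (1 - Phi(a)).
exact = 2 * (1 - 0.5 * (1 + math.erf(a / math.sqrt(2))))
```

For $a=1$ the exact probability is about $0.317$, and the empirical frequency agrees up to Monte Carlo error plus the small downward discretization bias.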