Markov kernels, convolution semigroups, and projective families of probability measures

Jordan Bell

June 12, 2015

1 Transition kernels

For a measurable space $(E,\mathscr{E})$, we denote by $\mathscr{E}_{+}$ the set of functions $E\to[0,\infty]$ that are $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$ measurable. It can be proved that if $I:\mathscr{E}_{+}\to[0,\infty]$ is a function such that (i) $f=0$ implies that $I(f)=0$, (ii) if $f,g\in\mathscr{E}_{+}$ and $a,b\geq 0$ then $I(af+bg)=aI(f)+bI(g)$, and (iii) if $f_{n}$ is a sequence in $\mathscr{E}_{+}$ that increases pointwise to an element $f$ of $\mathscr{E}_{+}$ then $I(f_{n})$ increases to $I(f)$, then there is a unique measure $\mu$ on $\mathscr{E}$ such that $I(f)=\mu f=\int_{E}fd\mu$ for each $f\in\mathscr{E}_{+}$ (Erhan Çinlar, Probability and Stochastics, p. 28, Theorem 4.21).

Let $(E,\mathscr{E})$ and $(F,\mathscr{F})$ be measurable spaces. A transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ is a function

K:E\times\mathscr{F}\to[0,\infty]

such that (i) for each $x\in E$ , the function $K_{x}:\mathscr{F}\to[0,\infty]$ defined by

B\mapsto K(x,B)

is a measure on $\mathscr{F}$ , and (ii) for each $B\in\mathscr{F}$ , the map

x\mapsto K(x,B)

is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$ .

If $\mu$ is a measure on $\mathscr{E}$ , define

(K_{*}\mu)(B)=\int_{E}K(x,B)d\mu(x),\qquad B\in\mathscr{F}.

If $B_{n}$ are pairwise disjoint elements of $\mathscr{F}$ , then using that $B\mapsto K(x,B)$ is a measure and the monotone convergence theorem,

	$\displaystyle(K_{*}\mu)\left(\bigcup_{n}B_{n}\right)$	$\displaystyle=\int_{E}K\left(x,\bigcup_{n}B_{n}\right)d\mu(x)$
		$\displaystyle=\int_{E}\sum_{n}K(x,B_{n})d\mu(x)$
		$\displaystyle=\sum_{n}\int_{E}K(x,B_{n})d\mu(x)$
		$\displaystyle=\sum_{n}(K_{*}\mu)(B_{n}),$

showing that $K_{*}\mu$ is a measure on $\mathscr{F}$ .

If $f\in\mathscr{F}_{+}$ , define $K^{*}f:E\to[0,\infty]$ by

(K^{*}f)(x)=\int_{F}f(y)dK_{x}(y),\qquad x\in E.

(1)

For $\phi=\sum_{j=1}^{k}b_{j}1_{B_{j}}$ with $b_{j}\geq 0$ and $B_{j}\in\mathscr{F}$ , because $x\mapsto K(x,B_{j})$ is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$ for each $j$ ,

(K^{*}\phi)(x)=\int_{F}\sum_{j=1}^{k}b_{j}1_{B_{j}}(y)dK_{x}(y)=\sum_{j=1}^{k}b_{j}K_{x}(B_{j})=\sum_{j=1}^{k}b_{j}K(x,B_{j}),

is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$. For $f\in\mathscr{F}_{+}$, there is a sequence of simple functions $\phi_{n}$ with $0\leq\phi_{1}\leq\phi_{2}\leq\cdots$ that converges pointwise to $f$ (Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 47, Theorem 2.10), and then by the monotone convergence theorem, for each $x\in E$ we have

(K^{*}\phi_{n})(x)=\int_{F}\phi_{n}(y)dK_{x}(y)\to\int_{F}f(y)dK_{x}(y)=(K^{*}f)(x),

showing that $K^{*}\phi_{n}$ converges pointwise to $K^{*}f$, and because each $K^{*}\phi_{n}$ is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$, $K^{*}f$ is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$ (Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 45, Proposition 2.7). Therefore, if $f\in\mathscr{F}_{+}$ then $K^{*}f\in\mathscr{E}_{+}$. In particular, if $K$ is a transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$,

(K^{*}1_{B})(x)=\int_{F}1_{B}(y)dK_{x}(y)=K_{x}(B)=K(x,B),\qquad x\in E,\quad B\in\mathscr{F}.

(2)
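As an illustration that is not part of the source, the definitions of $K_{*}\mu$ and $K^{*}f$ can be checked on finite sets, where a transition kernel is just a table of nonnegative numbers. The sketch below assumes $E=\{0,1\}$ and $F=\{0,1,2\}$ with their power sets as $\sigma$-algebras; the function names and the numerical values are hypothetical.

```python
# Finite sketch: K[x][y] stands for K(x, {y}); rows need not sum to 1
# for a general transition kernel.
K = [[0.5, 0.25, 0.25],
     [0.0, 1.0, 2.0]]


def kernel_of_set(K, x, B):
    """K(x, B) for a subset B of F, by additivity of the measure K_x."""
    return sum(K[x][y] for y in B)


def push_measure(K, mu):
    """(K_* mu)({y}) = sum over x of K(x, {y}) mu({x})."""
    return [sum(mu[x] * K[x][y] for x in range(len(K)))
            for y in range(len(K[0]))]


def pull_function(K, f):
    """(K^* f)(x) = sum over y of f(y) K(x, {y})."""
    return [sum(K[x][y] * f[y] for y in range(len(f)))
            for x in range(len(K))]


mu = [2.0, 3.0]                      # a finite measure on E
B = {0, 2}                           # a subset of F
indicator_B = [1.0 if y in B else 0.0 for y in range(3)]

nu = push_measure(K, mu)             # the measure K_* mu on F
K_indicator = pull_function(K, indicator_B)   # x -> K(x, B), as in (2)
```

In this finite setting $K^{*}1_{B}$ agrees with $x\mapsto K(x,B)$, and integrating $1_{B}$ against $K_{*}\mu$ gives the same number as integrating $K^{*}1_{B}$ against $\mu$.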

The following lemma gives conditions under which (2) defines a transition kernel (Heinz Bauer, Probability Theory, p. 308, Lemma 36.2).

Lemma 1.

Suppose that $N:\mathscr{F}_{+}\to\mathscr{E}_{+}$ satisfies the following properties:

1. $N(0)=0$.

2. $N(af+bg)=aN(f)+bN(g)$ for $f,g\in\mathscr{F}_{+}$ and $a,b\geq 0$.

3. If $f_{n}$ is a sequence in $\mathscr{F}_{+}$ increasing to $f\in\mathscr{F}_{+}$, then $N(f_{n})\uparrow N(f)$.

Then

K(x,B)=(N(1_{B}))(x),\qquad x\in E,\quad B\in\mathscr{F},

is a transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ . $K$ is the unique transition kernel satisfying

K^{*}f=N(f),\qquad f\in\mathscr{F}_{+}.

If $K$ is a transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ and $L$ is a transition kernel from $(F,\mathscr{F})$ to $(G,\mathscr{G})$ , the function $K^{*}\circ L^{*}:\mathscr{G}_{+}\to\mathscr{E}_{+}$ satisfies (i) $(K^{*}\circ L^{*})(0)=K^{*}(0)=0$ , (ii) if $f,g\in\mathscr{G}_{+}$ and $a,b\geq 0$ ,

	$\displaystyle(K^{*}\circ L^{*})(af+bg)$	$\displaystyle=K^{*}(aL^{*}(f)+bL^{*}(g))$
		$\displaystyle=aK^{*}(L^{*}(f))+bK^{*}(L^{*}(g))$
		$\displaystyle=a(K^{*}\circ L^{*})(f)+b(K^{*}\circ L^{*})(g),$

and (iii) if $f_{n}\uparrow f$ in $\mathscr{G}_{+}$ , then by the monotone convergence theorem, $L^{*}(f_{n})\uparrow L^{*}(f)$ , and then again applying the monotone convergence theorem, $K^{*}(L^{*}(f_{n}))\uparrow K^{*}(L^{*}(f))$ , i.e.

(K^{*}\circ L^{*})(f_{n})\uparrow(K^{*}\circ L^{*})(f).

Therefore, from Lemma 1 we get that there is a unique transition kernel from $(E,\mathscr{E})$ to $(G,\mathscr{G})$ , denoted $K L$ and called the product of $K$ and $L$ , such that

(KL)^{*}f=(K^{*}\circ L^{*})(f),\qquad f\in\mathscr{G}_{+}.

For $f\in\mathscr{G}_{+}$ and $x\in E$ ,

	$\displaystyle(KL)^{*}(f)(x)$	$\displaystyle=(K^{*}(L^{*}f))(x)$
		$\displaystyle=\int_{F}(L^{*}f)(y)dK_{x}(y)$
		$\displaystyle=\int_{F}\left(\int_{G}f(z)dL_{y}(z)\right)dK_{x}(y).$

In particular, for $C\in\mathscr{G}$ ,

(KL)^{*}(1_{C})(x)=\int_{F}L_{y}(C)dK_{x}(y)=\int_{F}L(y,C)dK_{x}(y).

(3)

2 Markov kernels

A Markov kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ is a transition kernel $K$ such that for each $x\in E$ , $K_{x}$ is a probability measure on $\mathscr{F}$ . The unit kernel from $(E,\mathscr{E})$ to $(E,\mathscr{E})$ is

I(x,A)=\delta_{x}(A).

(4)

It is apparent that the unit kernel is a Markov kernel.

If $K$ is a Markov kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ and $L$ is a Markov kernel from $(F,\mathscr{F})$ to $(G,\mathscr{G})$ , then for $x\in E$ , by (3) we have

(KL)^{*}(1_{G})(x)=\int_{F}L(y,G)dK_{x}(y)=\int_{F}dK_{x}(y)=K_{x}(F)=K(x,F)=1,

and thus by (2),

(KL)_{x}(G)=(KL)(x,G)=1,

showing that for each $x\in E$ , $(KL)_{x}$ is a probability measure. Therefore, the product of two Markov kernels is a Markov kernel.
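For finite sets, formula (3) is ordinary matrix multiplication, which gives a quick way to see that the product of Markov kernels is a Markov kernel. The following is a minimal sketch under that finiteness assumption; `kernel_product` and `unit_kernel` are hypothetical helper names and the matrices are made-up examples.

```python
def kernel_product(K, L):
    """(KL)(x, {z}) = sum over y of L(y, {z}) K(x, {y}), the finite form of (3)."""
    return [[sum(K[x][y] * L[y][z] for y in range(len(L)))
             for z in range(len(L[0]))]
            for x in range(len(K))]


def unit_kernel(n):
    """I(x, {y}) = delta_x({y}) on an n-point set, as in (4)."""
    return [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]


# Markov kernels: every row is a probability vector.
K = [[0.5, 0.5],
     [0.25, 0.75]]                 # from a 2-point set E to a 2-point set F
L = [[0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]              # from F to a 3-point set G

KL = kernel_product(K, L)
row_masses = [sum(row) for row in KL]   # (KL)(x, G) for each x in E
```

Each entry of `row_masses` equals 1, and multiplying by the unit kernel leaves `L` unchanged.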

Let $(E,\mathscr{E})$ be a measurable space and let

B_{b}(\mathscr{E})

be the set of bounded functions $E\to\mathbb{R}$ that are measurable $\mathscr{E}\to\mathscr{B}_{\mathbb{R}}$ . $B_{b}(\mathscr{E})$ is a Banach space with the uniform norm

\left\|f\right\|_{u}=\sup_{x\in E}|f(x)|.

For $K$ a Markov kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ and for $f\in B_{b}(\mathscr{F})$ , define $K^{*}f:E\to\mathbb{R}$ by

(K^{*}f)(x)=\int_{F}f(y)dK_{x}(y),\qquad x\in E,

for which

|(K^{*}f)(x)|\leq\int_{F}|f(y)|dK_{x}(y)\leq\left\|f\right\|_{u}K_{x}(F)=\left\|f\right\|_{u},

showing that $\left\|K^{*}f\right\|_{u}\leq\left\|f\right\|_{u}$. Furthermore, there is a sequence of simple functions $\phi_{n}\in B_{b}(\mathscr{F})$ that converges to $f$ in the norm $\left\|\cdot\right\|_{u}$ (V. I. Bogachev, Measure Theory, p. 108, Lemma 2.1.8). For $x\in E$, by the dominated convergence theorem we get that

(K^{*}\phi_{n})(x)=\int_{F}\phi_{n}(y)dK_{x}(y)\to\int_{F}f(y)dK_{x}(y)=(K^{*}f)(x).

Each $K^{*}\phi_{n}$ is measurable $\mathscr{E}\to\mathscr{B}_{\mathbb{R}}$ , hence $K^{*}f$ is measurable $\mathscr{E}\to\mathscr{B}_{\mathbb{R}}$ and so belongs to $B_{b}(\mathscr{E})$ .
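The contraction estimate $\left\|K^{*}f\right\|_{u}\leq\left\|f\right\|_{u}$ can be seen concretely on finite sets, where $K^{*}f$ is a weighted average of the values of $f$. This is only a sketch with a made-up Markov kernel and a made-up bounded function; the helper names are hypothetical.

```python
def apply_kernel(K, f):
    """(K^* f)(x) = sum over y of f(y) K(x, {y})."""
    return [sum(K[x][y] * f[y] for y in range(len(f)))
            for x in range(len(K))]


def sup_norm(f):
    """Uniform norm of a function on a finite set."""
    return max(abs(value) for value in f)


K = [[0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]        # Markov kernel: rows sum to 1
f = [3.0, -4.0, 1.0]         # bounded (signed) function on F

Kf = apply_kernel(K, f)
```

Here `sup_norm(Kf)` is 1.5 while `sup_norm(f)` is 4.0, consistent with the estimate.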

3 Markov semigroups

Let $(E,\mathscr{E})$ be a measurable space and for each $t\geq 0$ , let $P_{t}$ be a Markov kernel from $(E,\mathscr{E})$ to $(E,\mathscr{E})$ . We say that the family $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a Markov semigroup if

P_{s+t}=P_{s}P_{t},\qquad s,t\in\mathbb{R}_{\geq 0}.

For $x\in E$ and $A\in\mathscr{E}$ and for $s,t\geq 0$ , by (2) and (3),

(P_{s}P_{t})(x,A)=((P_{s}P_{t})^{*}1_{A})(x)=\int_{E}P_{t}(y,A)d(P_{s})_{x}(y).

Thus

P_{s+t}(x,A)=\int_{E}P_{t}(y,A)d(P_{s})_{x}(y),

(5)

called the Chapman-Kolmogorov equation.
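A concrete Markov semigroup on which (5) can be checked numerically is the two-state continuous-time Markov chain, whose kernels have a closed form. This example is an assumption made for illustration and does not come from the source; the jump rates `a` and `b` and the function names are hypothetical.

```python
import math


def two_state_kernel(t, a=1.0, b=2.0):
    """P_t for the chain on {0, 1} with jump rates a (0 -> 1) and b (1 -> 0)."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]


def compose(P, Q):
    """(PQ)(x, {z}) = sum over y of Q(y, {z}) P(x, {y})."""
    return [[sum(P[x][y] * Q[y][z] for y in range(2)) for z in range(2)]
            for x in range(2)]


s, t = 0.3, 1.1
lhs = two_state_kernel(s + t)                            # P_{s+t}
rhs = compose(two_state_kernel(s), two_state_kernel(t))  # P_s P_t
max_gap = max(abs(lhs[x][z] - rhs[x][z])
              for x in range(2) for z in range(2))
```

The gap is at the level of floating-point rounding, and `two_state_kernel(0.0)` is the unit kernel.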

4 Infinitely divisible distributions

Let $\mathscr{P}(\mathbb{R}^{d})$ be the collection of Borel probability measures on $\mathbb{R}^{d}$ . For $\mu\in\mathscr{P}(\mathbb{R}^{d})$ , its characteristic function $\tilde{\mu}:\mathbb{R}^{d}\to\mathbb{C}$ is defined by

\tilde{\mu}(x)=\int_{\mathbb{R}^{d}}e^{i\left\langle x,y\right\rangle}d\mu(y).

$\tilde{\mu}$ is uniformly continuous on $\mathbb{R}^{d}$ and $|\tilde{\mu}(x)|\leq\tilde{\mu}(0)=1$ for all $x\in\mathbb{R}^{d}$ (Heinz Bauer, Probability Theory, p. 183, Theorem 22.3). For $\mu_{1},\ldots,\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$, let $\mu$ be their convolution:

\mu=\mu_{1}*\cdots*\mu_{n},

which for $A$ a Borel set in $\mathbb{R}^{d}$ is defined by

\mu(A)=\int_{(\mathbb{R}^{d})^{n}}1_{A}(x_{1}+\cdots+x_{n})d(\mu_{1}\times\cdots\times\mu_{n})(x_{1},\ldots,x_{n}).

One computes that (Heinz Bauer, Probability Theory, p. 184, Theorem 22.4)

\tilde{\mu}=\tilde{\mu}_{1}\cdots\tilde{\mu}_{n}.

An element $\mu$ of $\mathscr{P}(\mathbb{R}^{d})$ is called infinitely divisible if for each $n\geq 1$ , there is some $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$ such that

\mu=\underbrace{\mu_{n}*\cdots*\mu_{n}}_{n}.

(6)

Thus,

\tilde{\mu}=(\tilde{\mu}_{n})^{n}.

(7)

On the other hand, if $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$ is such that (7) is true, then the characteristic function of $\mu_{n}*\cdots*\mu_{n}$ is $(\tilde{\mu}_{n})^{n}=\tilde{\mu}$, and because a Borel probability measure on $\mathbb{R}^{d}$ is determined by its characteristic function, it follows that $\mu_{n}*\cdots*\mu_{n}$ and $\mu$ are equal.
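A standard example of an infinitely divisible distribution on $\mathbb{R}$, chosen here for illustration and not discussed in the source, is the Poisson distribution: the Poisson distribution with mean $\lambda$ is the $n$-fold convolution of the Poisson distribution with mean $\lambda/n$. The sketch below checks (6) numerically on the first few atoms; all names and parameter values are hypothetical.

```python
import math


def poisson_pmf(lam, size):
    """First `size` values of the Poisson(lam) probability mass function."""
    pmf = [math.exp(-lam)]
    for k in range(1, size):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def convolve_truncated(p, q):
    """First len(p) terms of the convolution of two sequences on {0, 1, 2, ...}.

    Term k only involves terms 0..k of each factor, so truncation is exact.
    """
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]


lam, n, size = 3.0, 4, 25
root = poisson_pmf(lam / n, size)   # candidate n-th convolution root mu_n
power = root
for _ in range(n - 1):              # n-fold convolution of mu_n
    power = convolve_truncated(power, root)

target = poisson_pmf(lam, size)     # mu itself
max_gap = max(abs(u - v) for u, v in zip(power, target))
```

The two mass functions agree up to floating-point rounding.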

The following theorem is useful for doing calculations with the characteristic function of an infinitely divisible distribution (Heinz Bauer, Probability Theory, p. 246, Theorem 29.2).

Theorem 2.

Suppose that $\mu$ is an infinitely divisible distribution on $\mathbb{R}^{d}$ . First,

\tilde{\mu}(x)\neq 0,\qquad x\in\mathbb{R}^{d}.

Second, there is a unique continuous function $\phi:\mathbb{R}^{d}\to\mathbb{R}$ satisfying $\phi(0)=0$ and

\tilde{\mu}=|\tilde{\mu}|e^{i\phi}.

Third, for each $n\geq 1$ , there is a unique $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$ for which $\mu=\mu_{n}*\cdots*\mu_{n}$ . The characteristic function of this unique $\mu_{n}$ is

\tilde{\mu}_{n}=|\tilde{\mu}|^{\frac{1}{n}}e^{i\frac{\phi}{n}}.

A convolution semigroup is a family $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ of elements of $\mathscr{P}(\mathbb{R}^{d})$ such that for $s,t\in\mathbb{R}_{\geq 0}$ ,

\mu_{s+t}=\mu_{s}*\mu_{t}.

The convolution semigroup is called continuous when $t\mapsto\mu_{t}$ is continuous $\mathbb{R}_{\geq 0}\to\mathscr{P}(\mathbb{R}^{d})$ , where $\mathscr{P}(\mathbb{R}^{d})$ has the narrow topology.

The following theorem connects convolution semigroups and infinitely divisible distributions (Heinz Bauer, Probability Theory, p. 248, Theorem 29.6).

Theorem 3.

If $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup on $\mathscr{B}_{\mathbb{R}^{d}}$ , then for each $t$ , the measure $\mu_{t}$ is infinitely divisible.

If $\mu\in\mathscr{P}(\mathbb{R}^{d})$ is infinitely divisible and $t_{0}>0$ , then there is a unique continuous convolution semigroup $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ such that $\mu_{t_{0}}=\mu$ .

It follows from the above theorem that for a convolution semigroup $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ on $\mathscr{B}_{\mathbb{R}^{d}}$, $\mu_{1}$ is infinitely divisible and therefore by Theorem 2, $\tilde{\mu}_{1}(x)\neq 0$ for all $x$. But $\mu_{0}*\mu_{1}=\mu_{1}$, so $\tilde{\mu}_{0}\tilde{\mu}_{1}=\tilde{\mu}_{1}$, and dividing by $\tilde{\mu}_{1}(x)\neq 0$ gives $\tilde{\mu}_{0}(x)=1$ for each $x$. Since $\tilde{\delta}_{0}(x)=1$ for all $x$ and a probability measure is determined by its characteristic function,

\mu_{0}=\delta_{0}.

(8)

5 Translation-invariant semigroups

Let $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ be a Markov semigroup on $(\mathbb{R}^{d},\mathscr{B}_{\mathbb{R}^{d}})$. We say that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is translation-invariant if for all $x,y\in\mathbb{R}^{d}$, $A\in\mathscr{B}_{\mathbb{R}^{d}}$, and $t\in\mathbb{R}_{\geq 0}$,

P_{t}(x,A)=P_{t}(x+y,A+y).

In this case, for $t\geq 0$ and for $A\in\mathscr{B}_{\mathbb{R}^{d}}$ , define

\mu_{t}(A)=P_{t}(0,A).

Each $\mu_{t}$ is a probability measure on $\mathscr{B}_{\mathbb{R}^{d}}$ , and

\mu_{t}(A-x)=P_{t}(0,A-x)=P_{t}(x,(A-x)+x)=P_{t}(x,A).

Using the Chapman-Kolmogorov equation (5) and the fact that $(P_{s})_{0}(B)=P_{s}(0,B)=\mu_{s}(B)$,

	$\displaystyle\mu_{s+t}(A)$	$\displaystyle=P_{s+t}(0,A)$
		$\displaystyle=\int_{\mathbb{R}^{d}}P_{t}(y,A)d(P_{s})_{0}(y)$
		$\displaystyle=\int_{\mathbb{R}^{d}}\mu_{t}(A-y)d\mu_{s}(y)$
		$\displaystyle=(\mu_{t}*\mu_{s})(A),$

showing that $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup on $\mathscr{B}_{\mathbb{R}^{d}}$ .

On the other hand, if $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup of probability measures on $\mathscr{B}_{\mathbb{R}^{d}}$ , for $t\geq 0$ , $x\in\mathbb{R}^{d}$ , and $A\in\mathscr{B}_{\mathbb{R}^{d}}$ define

P_{t}(x,A)=\mu_{t}(A-x).

Let $t\geq 0$. For $x\in\mathbb{R}^{d}$, the map $A\mapsto P_{t}(x,A)=\mu_{t}(A-x)$ is a probability measure on $\mathscr{B}_{\mathbb{R}^{d}}$. The map $(x,y)\mapsto x+y$ is continuous $\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}^{d}$, and for $A\in\mathscr{B}_{\mathbb{R}^{d}}$, the map $1_{A}:\mathbb{R}^{d}\to\mathbb{R}$ is measurable $\mathscr{B}_{\mathbb{R}^{d}}\to\mathscr{B}_{\mathbb{R}}$. Hence, as $\mathscr{B}_{\mathbb{R}^{d}\times\mathbb{R}^{d}}=\mathscr{B}_{\mathbb{R}^{d}}\otimes\mathscr{B}_{\mathbb{R}^{d}}$, the map $(x,y)\mapsto 1_{A}(x+y)$ is measurable $\mathscr{B}_{\mathbb{R}^{d}}\otimes\mathscr{B}_{\mathbb{R}^{d}}\to\mathscr{B}_{\mathbb{R}}$. Thus by Fubini’s theorem,

x\mapsto\int_{\mathbb{R}^{d}}1_{A}(x+y)d\mu_{t}(y)=\int_{\mathbb{R}^{d}}1_{A-x}(y)d\mu_{t}(y)=\mu_{t}(A-x)

is measurable $\mathscr{B}_{\mathbb{R}^{d}}\to\mathscr{B}_{\mathbb{R}}$. Hence $P_{t}$ is a Markov kernel. Moreover, $P_{t}(x+y,A+y)=\mu_{t}((A+y)-(x+y))=\mu_{t}(A-x)=P_{t}(x,A)$, so once we show that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a Markov semigroup, it is translation-invariant.

Define $S:\mathbb{R}^{d}\to\mathbb{R}^{d}$ by $S(x)=-x$ . For $\mu,\nu\in\mathscr{P}(\mathbb{R}^{d})$ ,

	$\displaystyle S_{*}(\mu*\nu)(A)$	$\displaystyle=(\mu*\nu)(-A)$
		$\displaystyle=\int_{\mathbb{R}^{d}}\mu(-A-y)d\nu(y)$
		$\displaystyle=\int_{\mathbb{R}^{d}}\mu(-A+y)d\overline{\nu}(y)$
		$\displaystyle=\int_{\mathbb{R}^{d}}\overline{\mu}(A-y)d\overline{\nu}(y)$
		$\displaystyle=(\overline{\mu}*\overline{\nu})(A),$

thus

S_{*}(\mu*\nu)=(S_{*}\mu)*(S_{*}\nu).

(9)

Here, for $\mu\in\mathscr{P}(\mathbb{R}^{d})$, we have written

\overline{\mu}=S_{*}\mu\in\mathscr{P}(\mathbb{R}^{d}),

i.e.,

\overline{\mu}(A)=\mu(S^{-1}(A))=\mu(S(A))=\mu(-A).

We calculate

(P_{t}^{*}1_{A})(x)=P_{t}(x,A)=\mu_{t}(A-x)=\int_{\mathbb{R}^{d}}1_{A}(x+y)d\mu_{t}(y).

Then if $f$ is a simple function, $f=\sum_{k}a_{k}1_{A_{k}}$,

(P_{t}^{*}f)(x)=\sum_{k}a_{k}\int_{\mathbb{R}^{d}}1_{A_{k}}(x+y)d\mu_{t}(y)=\int_{\mathbb{R}^{d}}f(x+y)d\mu_{t}(y).

For $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$, there is a sequence of simple functions $f_{n}$ that converges to $f$ in the uniform norm, and then by the dominated convergence theorem we get

(P_{t}^{*}f)(x)=\int_{\mathbb{R}^{d}}f(x+y)d\mu_{t}(y).

But

	$\displaystyle\int_{\mathbb{R}^{d}}f(x+y)d\mu_{t}(y)$	$\displaystyle=\int_{\mathbb{R}^{d}}f(x+S(S(y)))d\mu_{t}(y)$
		$\displaystyle=\int_{\mathbb{R}^{d}}f(x+S(y))d(S_{*}\mu_{t})(y)$
		$\displaystyle=\int_{\mathbb{R}^{d}}f(x-y)d\overline{\mu}_{t}(y)$
		$\displaystyle=(f*\overline{\mu}_{t})(x).$

Therefore for $t\geq 0$ and $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$ ,

P_{t}^{*}f=f*\overline{\mu}_{t}.

(10)
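Identity (10) has an exact discrete analogue that is easy to check by hand. The sketch below replaces $\mathbb{R}^{d}$ by the cyclic group $\mathbb{Z}/4\mathbb{Z}$, which is an assumption made only for illustration; the measure, the function, and the helper names are hypothetical.

```python
def kernel_from_measure(mu):
    """P(x, {z}) = mu({z - x}) on the cyclic group Z/nZ, with n = len(mu)."""
    n = len(mu)
    return [[mu[(z - x) % n] for z in range(n)] for x in range(n)]


def reflect(mu):
    """The reflected measure: mu_bar({y}) = mu({-y})."""
    n = len(mu)
    return [mu[(-y) % n] for y in range(n)]


def convolve(f, nu):
    """(f * nu)(x) = sum over y of f(x - y) nu({y})."""
    n = len(nu)
    return [sum(f[(x - y) % n] * nu[y] for y in range(n)) for x in range(n)]


def apply_kernel(P, f):
    """(P^* f)(x) = sum over z of f(z) P(x, {z})."""
    return [sum(P[x][z] * f[z] for z in range(len(f))) for x in range(len(P))]


mu = [0.5, 0.25, 0.125, 0.125]     # a probability measure on Z/4Z
f = [1.0, -2.0, 4.0, 0.5]          # a bounded function on Z/4Z

lhs = apply_kernel(kernel_from_measure(mu), f)   # P^* f
rhs = convolve(f, reflect(mu))                   # f * mu_bar
```

The lists `lhs` and `rhs` coincide, which is the finite counterpart of $P_{t}^{*}f=f*\overline{\mu}_{t}$.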

For $s,t\geq 0$ and $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$ , by (10), the fact that $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup, and (9), we get

	$\displaystyle P_{s+t}^{*}f$	$\displaystyle=f*(S_{*}\mu_{s+t})$
		$\displaystyle=f*(S_{*}(\mu_{s}*\mu_{t}))$
		$\displaystyle=f*((S_{*}\mu_{s})*(S_{*}\mu_{t}))$
		$\displaystyle=(f*(S_{*}\mu_{s}))*(S_{*}\mu_{t})$
		$\displaystyle=(P_{s}^{*}f)*(S_{*}\mu_{t})$
		$\displaystyle=P_{t}^{*}(P_{s}^{*}f).$

This shows that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a Markov semigroup. Moreover, by (8) it holds that $\mu_{0}=\delta_{0}$ , and hence

P_{0}(x,A)=\mu_{0}(A-x)=\delta_{0}(A-x)=\delta_{x}(A).

Namely, $P_{0}$ is the unit kernel (4).

If $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup and some $\mu_{t}$ has density $q_{t}$ with respect to Lebesgue measure $\lambda_{d}$ on $\mathbb{R}^{d}$ ,

\mu_{t}=q_{t}\lambda_{d},

then writing $\overline{q}_{t}(x)=q_{t}(-x)$ , for $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$ by (10) we have

(P_{t}^{*}f)(x)=(f*\overline{\mu}_{t})(x)=\int_{\mathbb{R}^{d}}f(x-y)d\overline{\mu}_{t}(y)=\int_{\mathbb{R}^{d}}f(x+y)q_{t}(y)d\lambda_{d}(y)=\int_{\mathbb{R}^{d}}f(x-y)\overline{q}_{t}(y)d\lambda_{d}(y),

that is,

P_{t}^{*}f=f*\overline{q}_{t}.

(11)

6 The Brownian semigroup

For $a\in\mathbb{R}$ and $\sigma>0$ , let $\gamma_{a,\sigma^{2}}$ be the Gaussian measure on $\mathbb{R}$ , the probability measure on $\mathbb{R}$ whose density with respect to Lebesgue measure is

p(x,a,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-a)^{2}}{2\sigma^{2}}\right).

For $\sigma=0$ , let

\gamma_{a,0}=\delta_{a}.

Define for $t\in\mathbb{R}_{\geq 0}$ ,

\mu_{t}=\prod_{k=1}^{d}\gamma_{0,t},

which is an element of $\mathscr{P}(\mathbb{R}^{d})$ . For $s,t\in\mathbb{R}_{\geq 0}$ , we calculate

\mu_{s}*\mu_{t}=\left(\prod_{k=1}^{d}\gamma_{0,s}\right)*\left(\prod_{k=1}^{d}\gamma_{0,t}\right)=\prod_{k=1}^{d}(\gamma_{0,s}*\gamma_{0,t})=\prod_{k=1}^{d}\gamma_{0,s+t}=\mu_{s+t}.
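The one-dimensional identity $\gamma_{0,s}*\gamma_{0,t}=\gamma_{0,s+t}$ used in this calculation can be checked numerically at the level of densities. The following is a rough sketch using trapezoidal quadrature on a truncated interval; the truncation width, the step count, and the function names are assumptions of the sketch, not anything prescribed by the source.

```python
import math


def gaussian_density(x, var):
    """Density of the centered Gaussian measure with variance var > 0."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def convolve_at(x, s, t, half_width=12.0, steps=4800):
    """Trapezoidal approximation of the convolution of the two densities at x."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gaussian_density(x - y, s) * gaussian_density(y, t)
    return total * h


s, t = 0.5, 1.5
gaps = [abs(convolve_at(x, s, t) - gaussian_density(x, s + t))
        for x in (-2.0, 0.0, 0.7, 3.0)]
```

At each test point the numerical convolution matches the density of $\gamma_{0,s+t}$ to within quadrature error.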

Lévy’s continuity theorem states that if $\nu_{n}$ is a sequence in $\mathscr{P}(\mathbb{R}^{d})$ and there is some $\phi:\mathbb{R}^{d}\to\mathbb{C}$ that is continuous at $0$ and to which $\tilde{\nu}_{n}$ converges pointwise, then there is some $\nu\in\mathscr{P}(\mathbb{R}^{d})$ such that $\phi=\tilde{\nu}$ and such that $\nu_{n}\to\nu$ narrowly. But for $t\in\mathbb{R}_{\geq 0}$ and $x\in\mathbb{R}^{d}$ , we calculate

\tilde{\mu}_{t}(x)=\int_{\mathbb{R}^{d}}e^{i\left\langle x,y\right\rangle}d\mu_{t}(y)=\exp\left(-\frac{t|x|^{2}}{2}\right).

(12)

Let $\phi(x)=1$ for all $x$, for which $\tilde{\delta}_{0}=\phi$. For $t_{n}\in\mathbb{R}_{\geq 0}$ tending to $0$, let $\nu_{n}=\mu_{t_{n}}$. Then by (12), $\tilde{\nu}_{n}$ converges pointwise to $\phi$, so by Lévy’s continuity theorem, $\nu_{n}$ converges narrowly to $\delta_{0}$. Moreover, because $\mathbb{R}^{d}$ is a Polish space, $\mathscr{P}(\mathbb{R}^{d})$ is a Polish space, and in particular is metrizable, so narrow convergence along every sequence $t_{n}\to 0$ implies that $\mu_{t}$ converges narrowly to $\delta_{0}=\mu_{0}$ as $t\to 0$. Applying the same argument to a sequence $t_{n}\to t$, for which $\tilde{\mu}_{t_{n}}\to\tilde{\mu}_{t}$ pointwise by (12), shows that $t\mapsto\mu_{t}$ is continuous $\mathbb{R}_{\geq 0}\to\mathscr{P}(\mathbb{R}^{d})$. Summarizing, $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a continuous convolution semigroup.

For $t>0$ , $\mu_{t}$ has density

g_{t}(x)=\prod_{j=1}^{d}(2\pi t)^{-1/2}e^{-\frac{x_{j}^{2}}{2t}}=(2\pi t)^{-d/2}e^{-\frac{|x|^{2}}{2t}}

with respect to Lebesgue measure $\lambda_{d}$ on $\mathbb{R}^{d}$ . For $t\geq 0$ , let

P_{t}(x,A)=\mu_{t}(A-x).

We have established that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a translation-invariant Markov semigroup for which $P_{0}(x,A)=\delta_{x}(A)$ . We call $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ the Brownian semigroup. For $t>0$ and $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$ , because $\overline{g}_{t}=g_{t}$ we have by (11),

(P_{t}^{*}f)(x)=(f*g_{t})(x)=(2\pi t)^{-d/2}\int_{\mathbb{R}^{d}}f(x-y)e^{-\frac{|y|^{2}}{2t}}d\lambda_{d}(y).
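For $d=1$ and $f=\cos$, the convolution formula for the Brownian semigroup can be compared with a closed form: taking the real part of $e^{ix}\tilde{\mu}_{t}(1)$ in (12) gives $\int_{\mathbb{R}}\cos(x+y)d\mu_{t}(y)=e^{-t/2}\cos x$. The sketch below evaluates the integral by trapezoidal quadrature; the quadrature parameters and the names `heat_density` and `brownian_apply` are hypothetical choices made for the illustration.

```python
import math


def heat_density(y, t):
    """The density g_t in dimension d = 1, for t > 0."""
    return (2.0 * math.pi * t) ** -0.5 * math.exp(-y * y / (2.0 * t))


def brownian_apply(f, x, t, steps=4000):
    """Approximate (f * g_t)(x) by the trapezoidal rule on [-12 sqrt(t), 12 sqrt(t)]."""
    half_width = 12.0 * math.sqrt(t)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(x - y) * heat_density(y, t)
    return total * h


x, t = 0.4, 0.8
numeric = brownian_apply(math.cos, x, t)
closed_form = math.exp(-t / 2.0) * math.cos(x)
```

The two values agree to within quadrature error, and applying `brownian_apply` to the constant function 1 returns a value close to 1, as expected for a Markov kernel.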

7 Projective families

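For a nonempty set $I$, let $\mathscr{K}(I)$ denote the family of finite nonempty subsets of $I$. Let $(E,\mathscr{E})$ be a measurable space, and for $J\subset K$ in $\mathscr{K}(I)$ let $\pi_{K,J}:E^{K}\to E^{J}$ be the projection map. A family $(P_{J})_{J\in\mathscr{K}(I)}$, where each $P_{J}$ is a probability measure on the product $\sigma$-algebra $\mathscr{E}^{J}$, is called a projective family of probability measures if $(\pi_{K,J})_{*}P_{K}=P_{J}$ whenever $J\subset K$.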

The following theorem shows how to construct a projective family from a Markov semigroup on a measurable space and a probability measure on this measurable space (Heinz Bauer, Probability Theory, p. 314, Theorem 36.4).

Theorem 4.

Let $I=\mathbb{R}_{\geq 0}$ , let $(E,\mathscr{E})$ be a measurable space, let $(P_{t})_{t\in I}$ be a Markov semigroup on $\mathscr{E}$ , and let $\mu$ be a probability measure on $\mathscr{E}$ . For $J\in\mathscr{K}(I)$ , with elements $t_{1}<\cdots<t_{n}$ , and for $A\in\mathscr{E}^{J}$ , let

P_{J}(A)=\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A}(x_{1},\ldots,x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0}).

Then $(P_{J})_{J\in\mathscr{K}(I)}$ is a projective family of probability measures.

Proof.

Let $A_{k}$ be pairwise disjoint elements of $\mathscr{E}^{J}$ , and call their union $A$ . Then $1_{A}=\sum_{k}1_{A_{k}}$ , and applying the monotone convergence theorem $n+1$ times,

\begin{split}
&\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A}(x_{1},\ldots,x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0})\\
=&\sum_{k}\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A_{k}}(x_{1},\ldots,x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0}),
\end{split}

i.e.

P_{J}(A)=\sum_{k}P_{J}(A_{k}).

Furthermore, because $(P_{t})_{x}$ is a probability measure for each $t$ and for each $x$ and $\mu$ is a probability measure, we calculate that

P_{J}(E^{J})=1.

Thus, $P_{J}$ is a probability measure on $\mathscr{E}^{J}$ .

To prove that $(P_{J})_{J\in\mathscr{K}(I)}$ is a projective family, it suffices to prove that when $J,K\in\mathscr{K}(I)$, $J\subset K$, and $K\setminus J$ is a singleton, then $(\pi_{K,J})_{*}P_{K}=P_{J}$. Moreover, because (i) the product $\sigma$-algebra $\mathscr{E}^{J}$ is generated by the collection of cylinder sets, i.e. sets of the form $\prod_{t\in J}A_{t}$ for $A_{t}\in\mathscr{E}$, and (ii) the intersection of finitely many cylinder sets is a cylinder set, it is proved using the monotone class theorem that if two probability measures on $\mathscr{E}^{J}$ coincide on the cylinder sets, then they are equal (V. I. Bogachev, Measure Theory, volume I, p. 35, Lemma 1.9.4). Let $t_{1}<\cdots<t_{n}$ be the elements of $J$. To prove that $(\pi_{K,J})_{*}P_{K}$ and $P_{J}$ are equal, it suffices to prove that for any $A_{1},\ldots,A_{n}\in\mathscr{E}$,

(\pi_{K,J})_{*}P_{K}\left(\prod_{j=1}^{n}A_{j}\right)=P_{J}\left(\prod_{j=1}^{n}A_{j}\right).

Moreover, for $A=\prod_{j=1}^{n}A_{j}$ ,

1_{A}=1_{A_{1}}\otimes\cdots\otimes 1_{A_{n}},

thus

\begin{split}
&P_{J}\left(\prod_{j=1}^{n}A_{j}\right)\\
=&\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A_{1}}(x_{1})\cdots 1_{A_{n}}(x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0})\\
=&\int_{E}\int_{A_{1}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0}).
\end{split}

Let $K\setminus J=\{t^{\prime}\}$ . Either $t^{\prime}<t_{1}$ , or $t^{\prime}>t_{n}$ , or there is some $1\leq j\leq n-1$ for which $t_{j}<t^{\prime}<t_{j+1}$ . Take the case $t^{\prime}<t_{1}$ . Then

\pi_{K,J}^{-1}\left(\prod_{j=1}^{n}A_{j}\right)=\prod_{k=0}^{n}B_{k},

where $B_{0}=E$ and $B_{j}=A_{j}$ for $1\leq j\leq n$ . Then

\begin{split}
&(\pi_{K,J})_{*}P_{K}\left(\prod_{j=1}^{n}A_{j}\right)\\
=&P_{K}\left(\prod_{k=0}^{n}B_{k}\right)\\
=&\int_{E}\int_{E}\int_{A_{1}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})d\mu(x_{0})\\
=&\int_{E}\int_{E}\int_{A_{1}}f(x_{1})d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})d\mu(x_{0}),
\end{split}

for

f(x_{1})=\int_{A_{2}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{2}-t_{1}})_{x_{1}}(x_{2}).

By (1) and because $(P_{t})_{t\in I}$ is a Markov semigroup,

\begin{split}
&\int_{E}\int_{A_{1}}f(x_{1})d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})\\
=&\int_{E}\int_{E}f(x_{1})1_{A_{1}}(x_{1})d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})\\
=&\int_{E}P_{t_{1}-t^{\prime}}^{*}(f1_{A_{1}})(x^{\prime})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})\\
=&P_{t^{\prime}}^{*}(P_{t_{1}-t^{\prime}}^{*}(f1_{A_{1}}))(x_{0})\\
=&P_{t_{1}}^{*}(f1_{A_{1}})(x_{0})\\
=&\int_{E}f(x_{1})1_{A_{1}}(x_{1})d(P_{t_{1}})_{x_{0}}(x_{1})\\
=&\int_{A_{1}}f(x_{1})d(P_{t_{1}})_{x_{0}}(x_{1})\\
=&\int_{A_{1}}\int_{A_{2}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{2}-t_{1}})_{x_{1}}(x_{2})d(P_{t_{1}})_{x_{0}}(x_{1}).
\end{split}

Thus

\begin{split}
&(\pi_{K,J})_{*}P_{K}\left(\prod_{j=1}^{n}A_{j}\right)\\
=&\int_{E}\int_{A_{1}}\int_{A_{2}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{2}-t_{1}})_{x_{1}}(x_{2})d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0})\\
=&P_{J}\left(\prod_{j=1}^{n}A_{j}\right).
\end{split}

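This shows that the claim is true in the case $t^{\prime}<t_{1}$. The cases $t^{\prime}>t_{n}$ and $t_{j}<t^{\prime}<t_{j+1}$ are handled similarly: the first uses that $(P_{t^{\prime}-t_{n}})_{x_{n}}(E)=1$, and the second uses the semigroup property as above. ∎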

Now let $E$ be a Polish space with Borel $\sigma$-algebra $\mathscr{E}$, let $I=\mathbb{R}_{\geq 0}$, let $(P_{t})_{t\in I}$ be a Markov semigroup on $\mathscr{E}$, and let $\mu$ be a probability measure on $\mathscr{E}$. The above theorem tells us that $(P_{J})_{J\in\mathscr{K}(I)}$ is a projective family, and then the Kolmogorov extension theorem tells us that there is a probability measure $P^{\mu}$ on $\mathscr{E}^{I}$ such that for any $J\in\mathscr{K}(I)$, $(\pi_{J})_{*}P^{\mu}=P_{J}$. (We write $P^{\mu}$ to indicate that this measure involves $\mu$; it also involves the Markov semigroup, which we do not indicate.) This implies that there is a stochastic process $(X_{t})_{t\in I}$ whose finite-dimensional distributions are equal to the probability measures $P_{J}$ defined in Theorem 4 using the Markov semigroup $(P_{t})_{t\in I}$ and the probability measure $\mu$.
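The consistency condition $(\pi_{K,J})_{*}P_{K}=P_{J}$ of Theorem 4 can be verified by direct enumeration when $E$ is finite. The sketch below reuses the two-state chain as an assumed example of a Markov semigroup; the rates, the initial distribution, and all function names are hypothetical, and the code only illustrates the statement rather than reproducing the proof.

```python
import math
from itertools import product


def two_state_kernel(t, a=1.0, b=2.0):
    """P_t for the chain on {0, 1} with jump rates a (0 -> 1) and b (1 -> 0)."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]


def finite_dim_law(times, mu):
    """P_J({(x_1, ..., x_n)}) for J = {t_1 < ... < t_n}, as a dict keyed by paths."""
    times = sorted(times)
    # Kernels P_{t_1}, P_{t_2 - t_1}, ..., P_{t_n - t_{n-1}}.
    steps = [two_state_kernel(times[0])]
    steps += [two_state_kernel(u - v) for v, u in zip(times, times[1:])]
    law = {}
    for path in product(range(2), repeat=len(times)):
        total = 0.0
        for x0 in range(2):              # integrate over the initial state
            weight, prev = mu[x0], x0
            for P, x in zip(steps, path):
                weight *= P[prev][x]
                prev = x
            total += weight
        law[path] = total
    return law


def marginal(law, keep):
    """Push a law forward under the projection onto the coordinates in `keep`."""
    out = {}
    for path, p in law.items():
        key = tuple(path[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


mu = [0.25, 0.75]
P_K = finite_dim_law([0.5, 1.2, 2.0], mu)   # K = {0.5, 1.2, 2.0}
P_J = finite_dim_law([0.5, 2.0], mu)        # J = {0.5, 2.0}
projected = marginal(P_K, keep=(0, 2))      # (pi_{K,J})_* P_K
max_gap = max(abs(projected[path] - P_J[path]) for path in P_J)
```

Here the extra time 1.2 lies between the two elements of $J$; dropping a time that is smaller than every element of $J$ can be checked the same way with a different `keep` tuple.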
