# Markov kernels, convolution semigroups, and projective families of probability measures

Jordan Bell
June 12, 2015

## 1 Transition kernels

For a measurable space $(E,\mathscr{E})$, we denote by $\mathscr{E}_{+}$ the set of functions $E\to[0,\infty]$ that are $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$ measurable. It can be proved that if $I:\mathscr{E}_{+}\to[0,\infty]$ is a function such that (i) $f=0$ implies that $I(f)=0$, (ii) if $f,g\in\mathscr{E}_{+}$ and $a,b\geq 0$ then $I(af+bg)=aI(f)+bI(g)$, and (iii) if $f_{n}$ is a sequence in $\mathscr{E}_{+}$ that increases pointwise to an element $f$ of $\mathscr{E}_{+}$ then $I(f_{n})$ increases to $I(f)$, then there is a unique measure $\mu$ on $\mathscr{E}$ such that $I(f)=\mu f$ for each $f\in\mathscr{E}_{+}$ (Erhan Çinlar, Probability and Stochastics, p. 28, Theorem 4.21).

Let $(E,\mathscr{E})$ and $(F,\mathscr{F})$ be measurable spaces. A transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ is a function

 $K:E\times\mathscr{F}\to[0,\infty]$

such that (i) for each $x\in E$, the function $K_{x}:\mathscr{F}\to[0,\infty]$ defined by

 $B\mapsto K(x,B)$

is a measure on $\mathscr{F}$, and (ii) for each $B\in\mathscr{F}$, the map

 $x\mapsto K(x,B)$

is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$.

If $\mu$ is a measure on $\mathscr{E}$, define

 $(K_{*}\mu)(B)=\int_{E}K(x,B)d\mu(x),\qquad B\in\mathscr{F}.$

If $B_{n}$ are pairwise disjoint elements of $\mathscr{F}$, then using that $B\mapsto K(x,B)$ is a measure and the monotone convergence theorem,

 $\displaystyle(K_{*}\mu)\left(\bigcup_{n}B_{n}\right)$ $\displaystyle=\int_{E}K\left(x,\bigcup_{n}B_{n}\right)d\mu(x)$ $\displaystyle=\int_{E}\sum_{n}K(x,B_{n})d\mu(x)$ $\displaystyle=\sum_{n}\int_{E}K(x,B_{n})d\mu(x)$ $\displaystyle=\sum_{n}(K_{*}\mu)(B_{n}),$

showing that $K_{*}\mu$ is a measure on $\mathscr{F}$.
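On a finite state space the definitions above become finite sums, which makes them easy to check numerically. The following is a minimal sketch (the three-point spaces, the kernel matrix `K` with $K[x][y]=K(x,\{y\})$, and the measure `mu` are all illustrative choices, not from the text):

```python
# Illustrative finite-state kernel: K[x][y] = K(x, {y}) on E = F = {0, 1, 2}.
K = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
mu = [0.2, 0.5, 0.3]  # a probability measure on E

def kernel(x, B):
    """K(x, B): the measure K_x evaluated on the set B."""
    return sum(K[x][y] for y in B)

def pushforward(mu, B):
    """(K_* mu)(B) = integral over E of K(x, B) dmu(x); a finite sum here."""
    return sum(kernel(x, B) * mu[x] for x in range(len(mu)))

# Countable additivity over the disjoint sets {0}, {1}, {2}:
parts = pushforward(mu, {0}) + pushforward(mu, {1}) + pushforward(mu, {2})
whole = pushforward(mu, {0, 1, 2})
```

Since each row of `K` and the vector `mu` sum to $1$, `whole` equals $1$, and additivity makes `parts` agree with it.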

If $f\in\mathscr{F}_{+}$, define $K^{*}f:E\to[0,\infty]$ by

 $(K^{*}f)(x)=\int_{F}f(y)dK_{x}(y),\qquad x\in E.$ (1)

For $\phi=\sum_{j=1}^{k}b_{j}1_{B_{j}}$ with $b_{j}\geq 0$ and $B_{j}\in\mathscr{F}$, because $x\mapsto K(x,B_{j})$ is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$ for each $j$,

 $(K^{*}\phi)(x)=\int_{F}\sum_{j=1}^{k}b_{j}1_{B_{j}}(y)dK_{x}(y)=\sum_{j=1}^{k}b_{j}K_{x}(B_{j})=\sum_{j=1}^{k}b_{j}K(x,B_{j}),$

is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$. For $f\in\mathscr{F}_{+}$, there is a sequence of simple functions $\phi_{n}$ with $0\leq\phi_{1}\leq\phi_{2}\leq\cdots$ that converges pointwise to $f$ (Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 47, Theorem 2.10), and then by the monotone convergence theorem, for each $x\in E$ we have

 $(K^{*}\phi_{n})(x)=\int_{F}\phi_{n}(y)dK_{x}(y)\to\int_{F}f(y)dK_{x}(y)=(K^{*}f)(x),$

showing $K^{*}\phi_{n}$ converges pointwise to $K^{*}f$, and because each $K^{*}\phi_{n}$ is measurable $\mathscr{E}\to\mathscr{B}_{[0,\infty]}$, so is $K^{*}f$ (Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 45, Proposition 2.7). Therefore, if $f\in\mathscr{F}_{+}$ then $K^{*}f\in\mathscr{E}_{+}$. In particular, if $K$ is a transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$,

 $(K^{*}1_{B})(x)=\int_{F}1_{B}(y)dK_{x}(y)=K_{x}(B)=K(x,B),\qquad x\in E,\quad B\in\mathscr{F}.$ (2)
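Identity (2) can also be checked on a finite state space, where the integral $(K^{*}f)(x)=\int_{F}f\,dK_{x}$ is a finite sum (the matrix `K` and the set `B` below are illustrative):

```python
# Illustrative finite-state kernel on E = F = {0, 1, 2}: K[x][y] = K(x, {y}).
K = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]

def K_star(f, x):
    """(K^* f)(x) = integral of f(y) dK_x(y); a finite sum here."""
    return sum(f(y) * K[x][y] for y in range(3))

B = {1, 2}
indicator = lambda y: 1.0 if y in B else 0.0
lhs = K_star(indicator, 0)   # (K^* 1_B)(0)
rhs = K[0][1] + K[0][2]      # K(0, B)
```

As (2) asserts, applying $K^{*}$ to the indicator of $B$ recovers $K(x,B)$.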

The following gives conditions under which (2) defines a transition kernel (Heinz Bauer, Probability Theory, p. 308, Lemma 36.2).

###### Lemma 1.

Suppose that $N:\mathscr{F}_{+}\to\mathscr{E}_{+}$ satisfies the following properties:

1. $N(0)=0$.

2. $N(af+bg)=aN(f)+bN(g)$ for $f,g\in\mathscr{F}_{+}$ and $a,b\geq 0$.

3. If $f_{n}$ is a sequence in $\mathscr{F}_{+}$ increasing to $f\in\mathscr{F}_{+}$, then $N(f_{n})\uparrow N(f)$.

Then

 $K(x,B)=(N(1_{B}))(x),\qquad x\in E,\quad B\in\mathscr{F},$

is a transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$. $K$ is the unique transition kernel satisfying

 $K^{*}f=N(f),\qquad f\in\mathscr{F}_{+}.$

If $K$ is a transition kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ and $L$ is a transition kernel from $(F,\mathscr{F})$ to $(G,\mathscr{G})$, the function $K^{*}\circ L^{*}:\mathscr{G}_{+}\to\mathscr{E}_{+}$ satisfies (i) $(K^{*}\circ L^{*})(0)=K^{*}(0)=0$, (ii) if $f,g\in\mathscr{G}_{+}$ and $a,b\geq 0$,

 $\displaystyle(K^{*}\circ L^{*})(af+bg)$ $\displaystyle=K^{*}(aL^{*}(f)+bL^{*}(g))$ $\displaystyle=aK^{*}(L^{*}(f))+bK^{*}(L^{*}(g))$ $\displaystyle=a(K^{*}\circ L^{*})(f)+b(K^{*}\circ L^{*})(g),$

and (iii) if $f_{n}\uparrow f$ in $\mathscr{G}_{+}$, then by the monotone convergence theorem, $L^{*}(f_{n})\uparrow L^{*}(f)$, and then again applying the monotone convergence theorem, $K^{*}(L^{*}(f_{n}))\uparrow K^{*}(L^{*}(f))$, i.e.

 $(K^{*}\circ L^{*})(f_{n})\uparrow(K^{*}\circ L^{*})(f).$

Therefore, from Lemma 1 we get that there is a unique transition kernel from $(E,\mathscr{E})$ to $(G,\mathscr{G})$, denoted $KL$ and called the product of $K$ and $L$, such that

 $(KL)^{*}f=(K^{*}\circ L^{*})(f),\qquad f\in\mathscr{G}_{+}.$

For $f\in\mathscr{G}_{+}$ and $x\in E$,

 $\displaystyle(KL)^{*}(f)(x)$ $\displaystyle=(K^{*}(L^{*}f))(x)$ $\displaystyle=\int_{F}(L^{*}f)(y)dK_{x}(y)$ $\displaystyle=\int_{F}\left(\int_{G}f(z)dL_{y}(z)\right)dK_{x}(y).$

In particular, for $C\in\mathscr{G}$,

 $(KL)^{*}(1_{C})(x)=\int_{F}L_{y}(C)dK_{x}(y)=\int_{F}L(y,C)dK_{x}(y).$ (3)

## 2 Markov kernels

A Markov kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ is a transition kernel $K$ such that for each $x\in E$, $K_{x}$ is a probability measure on $\mathscr{F}$. The unit kernel from $(E,\mathscr{E})$ to $(E,\mathscr{E})$ is

 $I(x,A)=\delta_{x}(A).$ (4)

It is apparent that the unit kernel is a Markov kernel.

If $K$ is a Markov kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ and $L$ is a Markov kernel from $(F,\mathscr{F})$ to $(G,\mathscr{G})$, then for $x\in E$, by (3) we have

 $(KL)^{*}(1_{G})(x)=\int_{F}L(y,G)dK_{x}(y)=\int_{F}dK_{x}(y)=K_{x}(F)=K(x,F)=1,$

and thus by (2),

 $(KL)_{x}(G)=(KL)(x,G)=1,$

showing that for each $x\in E$, $(KL)_{x}$ is a probability measure. Therefore, the product of two Markov kernels is a Markov kernel.
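On a finite state space, (3) says that the product kernel is the matrix product, and the computation above says that a product of row-stochastic matrices is row-stochastic. A minimal sketch (the two-state matrices `K` and `L` are illustrative):

```python
# Illustrative Markov kernels on two-point spaces: K[x][y] = K(x, {y}).
K = [[0.5, 0.5], [0.2, 0.8]]
L = [[0.9, 0.1], [0.3, 0.7]]

def product_kernel(x, C):
    """(KL)(x, C) = integral of L(y, C) dK_x(y), eq. (3); a finite sum here."""
    return sum(sum(L[y][z] for z in C) * K[x][y] for y in range(2))

KL = [[product_kernel(x, {z}) for z in range(2)] for x in range(2)]
# On a finite space this is exactly the matrix product of K and L:
matmul = [[sum(K[x][y] * L[y][z] for y in range(2)) for z in range(2)]
          for x in range(2)]
row_sums = [sum(row) for row in KL]  # each 1: a product of Markov kernels is Markov
```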

Let $(E,\mathscr{E})$ be a measurable space and let

 $B_{b}(\mathscr{E})$

be the set of bounded functions $E\to\mathbb{R}$ that are measurable $\mathscr{E}\to\mathscr{B}_{\mathbb{R}}$. $B_{b}(\mathscr{E})$ is a Banach space with the uniform norm

 $\left\|f\right\|_{u}=\sup_{x\in E}|f(x)|.$

For $K$ a Markov kernel from $(E,\mathscr{E})$ to $(F,\mathscr{F})$ and for $f\in B_{b}(\mathscr{F})$, define $K^{*}f:E\to\mathbb{R}$ by

 $(K^{*}f)(x)=\int_{F}f(y)dK_{x}(y),\qquad x\in E,$

for which

 $|(K^{*}f)(x)|\leq\int_{F}|f(y)|dK_{x}(y)\leq\left\|f\right\|_{u}K_{x}(F)=\left\|f\right\|_{u},$

showing that $\left\|K^{*}f\right\|_{u}\leq\left\|f\right\|_{u}$. Furthermore, there is a sequence of simple functions $\phi_{n}\in B_{b}(\mathscr{F})$ that converges to $f$ in the norm $\left\|\cdot\right\|_{u}$ (V. I. Bogachev, Measure Theory, p. 108, Lemma 2.1.8). For $x\in E$, by the dominated convergence theorem we get that

 $(K^{*}\phi_{n})(x)=\int_{F}\phi_{n}(y)dK_{x}(y)\to\int_{F}f(y)dK_{x}(y)=(K^{*}f)(x).$

Each $K^{*}\phi_{n}$ is measurable $\mathscr{E}\to\mathscr{B}_{\mathbb{R}}$, hence $K^{*}f$ is measurable $\mathscr{E}\to\mathscr{B}_{\mathbb{R}}$ and so belongs to $B_{b}(\mathscr{E})$.
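The contraction bound $\left\|K^{*}f\right\|_{u}\leq\left\|f\right\|_{u}$ is easy to observe on a finite space, where the sup norm is a maximum (the kernel and the function values below are illustrative):

```python
# Illustrative Markov kernel and bounded f on a two-point space.
K = [[0.5, 0.5], [0.2, 0.8]]
f = [3.0, -1.5]

# (K^* f)(x) is an average of the values of f under the probability measure K_x,
# so it can never exceed the sup norm of f.
Kf = [sum(f[y] * K[x][y] for y in range(2)) for x in range(2)]
norm_f = max(abs(v) for v in f)
norm_Kf = max(abs(v) for v in Kf)
```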

## 3 Markov semigroups

Let $(E,\mathscr{E})$ be a measurable space and for each $t\geq 0$, let $P_{t}$ be a Markov kernel from $(E,\mathscr{E})$ to $(E,\mathscr{E})$. We say that the family $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a Markov semigroup if

 $P_{s+t}=P_{s}P_{t},\qquad s,t\in\mathbb{R}_{\geq 0}.$

For $x\in E$ and $A\in\mathscr{E}$ and for $s,t\geq 0$, by (2) and (3),

 $(P_{s}P_{t})(x,A)=((P_{s}P_{t})^{*}1_{A})(x)=\int_{E}P_{t}(y,A)d(P_{s})_{x}(y).$

Thus

 $P_{s+t}(x,A)=\int_{E}P_{t}(y,A)d(P_{s})_{x}(y),$ (5)

called the Chapman-Kolmogorov equation.
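A discrete-time analogue makes (5) concrete: with index set $\mathbb{Z}_{\geq 0}$ and $P_{n}$ the $n$-step kernel of a finite chain, the semigroup law holds automatically, and the Chapman-Kolmogorov integral is a matrix product (the two-state matrix `P` below is illustrative):

```python
# Illustrative one-step Markov kernel on a two-point space.
P = [[0.7, 0.3], [0.4, 0.6]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def P_n(n):
    """The n-step kernel P_n = P^n, with P_0 the unit kernel (identity)."""
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        out = matmul(out, P)
    return out

s, t = 2, 3
lhs = P_n(s + t)
# Chapman-Kolmogorov: P_{s+t}(x, A) = sum over y of P_t(y, A) P_s(x, {y})
rhs = matmul(P_n(s), P_n(t))
```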

## 4 Infinitely divisible distributions

Let $\mathscr{P}(\mathbb{R}^{d})$ be the collection of Borel probability measures on $\mathbb{R}^{d}$. For $\mu\in\mathscr{P}(\mathbb{R}^{d})$, its characteristic function $\tilde{\mu}:\mathbb{R}^{d}\to\mathbb{C}$ is defined by

 $\tilde{\mu}(x)=\int_{\mathbb{R}^{d}}e^{i\left\langle x,y\right\rangle}d\mu(y).$

$\tilde{\mu}$ is uniformly continuous on $\mathbb{R}^{d}$ and $|\tilde{\mu}(x)|\leq\tilde{\mu}(0)=1$ for all $x\in\mathbb{R}^{d}$ (Heinz Bauer, Probability Theory, p. 183, Theorem 22.3). For $\mu_{1},\ldots,\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$, let $\mu$ be their convolution:

 $\mu=\mu_{1}*\cdots*\mu_{n},$

which for $A$ a Borel set in $\mathbb{R}^{d}$ is defined by

 $\mu(A)=\int_{(\mathbb{R}^{d})^{n}}1_{A}(x_{1}+\cdots+x_{n})d(\mu_{1}\times\cdots\times\mu_{n})(x_{1},\ldots,x_{n}).$

One computes (Heinz Bauer, Probability Theory, p. 184, Theorem 22.4) that

 $\tilde{\mu}=\tilde{\mu}_{1}\cdots\tilde{\mu}_{n}.$

An element $\mu$ of $\mathscr{P}(\mathbb{R}^{d})$ is called infinitely divisible if for each $n\geq 1$, there is some $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$ such that

 $\mu=\underbrace{\mu_{n}*\cdots*\mu_{n}}_{n}.$ (6)

Thus,

 $\tilde{\mu}=(\tilde{\mu}_{n})^{n}.$ (7)

On the other hand, if $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$ satisfies (7), then the characteristic function of $\mu_{n}*\cdots*\mu_{n}$ is $(\tilde{\mu}_{n})^{n}=\tilde{\mu}$, and because a measure in $\mathscr{P}(\mathbb{R}^{d})$ is determined by its characteristic function, $\mu_{n}*\cdots*\mu_{n}=\mu$, i.e. (6) holds.
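A standard example of (6) and (7) is the Gaussian: $N(0,1)$ is the $n$-fold convolution of $N(0,1/n)$, so the empirical characteristic function of a sum of $n$ independent $N(0,1/n)$ draws should approach $e^{-x^{2}/2}=\left(e^{-x^{2}/(2n)}\right)^{n}$. A Monte Carlo sketch (the values of `n`, `x`, the sample size, and the tolerance are illustrative):

```python
import math
import random

random.seed(0)
n, x, N = 4, 1.3, 200_000
sigma = math.sqrt(1.0 / n)  # each summand has law N(0, 1/n)

# Sum of n independent N(0, 1/n) draws has law N(0, 1): eq. (6) with mu_n = N(0, 1/n).
samples = (sum(random.gauss(0.0, sigma) for _ in range(n)) for _ in range(N))
emp = sum(math.cos(x * s) for s in samples) / N  # Re of empirical char. function
exact = math.exp(-x * x / 2)                     # char. function of N(0, 1) at x
```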

The following theorem is useful for doing calculations with the characteristic function of an infinitely divisible distribution (Heinz Bauer, Probability Theory, p. 246, Theorem 29.2).

###### Theorem 2.

Suppose that $\mu$ is an infinitely divisible distribution on $\mathbb{R}^{d}$. First,

 $\tilde{\mu}(x)\neq 0,\qquad x\in\mathbb{R}^{d}.$

Second, there is a unique continuous function $\phi:\mathbb{R}^{d}\to\mathbb{R}$ satisfying $\phi(0)=0$ and

 $\tilde{\mu}=|\tilde{\mu}|e^{i\phi}.$

Third, for each $n\geq 1$, there is a unique $\mu_{n}\in\mathscr{P}(\mathbb{R}^{d})$ for which $\mu=\mu_{n}*\cdots*\mu_{n}$. The characteristic function of this unique $\mu_{n}$ is

 $\tilde{\mu}_{n}=|\tilde{\mu}|^{\frac{1}{n}}e^{i\frac{\phi}{n}}.$
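For a Gaussian $N(a,s)$ on $\mathbb{R}$ (an illustrative choice, with hypothetical numerical values below), the objects in Theorem 2 are explicit: $\tilde{\mu}(x)=e^{iax-sx^{2}/2}$, so $|\tilde{\mu}(x)|=e^{-sx^{2}/2}$, the continuous phase is $\phi(x)=ax$, and the $n$-th root is the characteristic function of $N(a/n,s/n)$:

```python
import cmath

a, s, x, n = 0.7, 1.2, 0.9, 3                    # illustrative parameters
char_mu = cmath.exp(1j * a * x - s * x * x / 2)  # char. function of N(a, s)
# Theorem 2's n-th root |mu~|^(1/n) e^{i phi/n}: the char. function of N(a/n, s/n).
char_mu_n = cmath.exp(1j * a * x / n - s * x * x / (2 * n))
# char_mu_n ** n recovers char_mu, matching (7) with mu_n = N(a/n, s/n).
```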

A convolution semigroup is a family $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ of elements of $\mathscr{P}(\mathbb{R}^{d})$ such that for $s,t\in\mathbb{R}_{\geq 0}$,

 $\mu_{s+t}=\mu_{s}*\mu_{t}.$

The convolution semigroup is called continuous when $t\mapsto\mu_{t}$ is continuous $\mathbb{R}_{\geq 0}\to\mathscr{P}(\mathbb{R}^{d})$, where $\mathscr{P}(\mathbb{R}^{d})$ has the narrow topology.

The following theorem connects convolution semigroups and infinitely divisible distributions (Heinz Bauer, Probability Theory, p. 248, Theorem 29.6).

###### Theorem 3.

If $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup on $\mathscr{B}_{\mathbb{R}^{d}}$, then for each $t$, the measure $\mu_{t}$ is infinitely divisible.

If $\mu\in\mathscr{P}(\mathbb{R}^{d})$ is infinitely divisible and $t_{0}>0$, then there is a unique continuous convolution semigroup $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ such that $\mu_{t_{0}}=\mu$.

It follows from the above theorem that for a convolution semigroup $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ on $\mathscr{B}_{\mathbb{R}^{d}}$, $\mu_{1}$ is infinitely divisible and therefore by Theorem 2, $\tilde{\mu}_{1}(x)\neq 0$ for all $x$. But $\mu_{0}*\mu_{1}=\mu_{1}$, so $\tilde{\mu}_{0}\tilde{\mu}_{1}=\tilde{\mu}_{1}$, and $\tilde{\mu}_{0}(x)=1$ for each $x$. But $\tilde{\delta}_{0}(x)=1$ for all $x$, so

 $\mu_{0}=\delta_{0}.$ (8)

## 5 Translation-invariant semigroups

Let $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ be a Markov semigroup on $(\mathbb{R}^{d},\mathscr{B}_{\mathbb{R}^{d}})$. We say that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is translation-invariant if for all $x,y\in\mathbb{R}^{d}$, $A\in\mathscr{B}_{\mathbb{R}^{d}}$, and $t\in\mathbb{R}_{\geq 0}$,

 $P_{t}(x,A)=P_{t}(x+y,A+y).$

In this case, for $t\geq 0$ and for $A\in\mathscr{B}_{\mathbb{R}^{d}}$, define

 $\mu_{t}(A)=P_{t}(0,A).$

Each $\mu_{t}$ is a probability measure on $\mathscr{B}_{\mathbb{R}^{d}}$, and

 $\mu_{t}(A-x)=P_{t}(0,A-x)=P_{t}(x,(A-x)+x)=P_{t}(x,A).$

Using the Chapman-Kolmogorov equation (5) and the fact that $(P_{s})_{0}(B)=P_{s}(0,B)=\mu_{s}(B)$,

 $\displaystyle\mu_{s+t}(A)$ $\displaystyle=P_{s+t}(0,A)$ $\displaystyle=\int_{\mathbb{R}^{d}}P_{t}(y,A)d(P_{s})_{0}(y)$ $\displaystyle=\int_{\mathbb{R}^{d}}\mu_{t}(A-y)d\mu_{s}(y)$ $\displaystyle=(\mu_{t}*\mu_{s})(A),$

showing that $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup on $\mathscr{B}_{\mathbb{R}^{d}}$.

On the other hand, if $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup of probability measures on $\mathscr{B}_{\mathbb{R}^{d}}$, for $t\geq 0$, $x\in\mathbb{R}^{d}$, and $A\in\mathscr{B}_{\mathbb{R}^{d}}$ define

 $P_{t}(x,A)=\mu_{t}(A-x).$

Let $t\geq 0$. For $x\in\mathbb{R}^{d}$, the map $A\mapsto P_{t}(x,A)=\mu_{t}(A-x)$ is a probability measure on $\mathscr{B}_{\mathbb{R}^{d}}$. The map $(x,y)\mapsto x+y$ is continuous $\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}^{d}$, and for $A\in\mathscr{B}_{\mathbb{R}^{d}}$, the map $1_{A}:\mathbb{R}^{d}\to\mathbb{R}$ is measurable $\mathscr{B}_{\mathbb{R}^{d}}\to\mathscr{B}_{\mathbb{R}}$. Hence, as $\mathscr{B}_{\mathbb{R}^{d}\times\mathbb{R}^{d}}=\mathscr{B}_{\mathbb{R}^{d}}\otimes\mathscr{B}_{\mathbb{R}^{d}}$, the map $(x,y)\mapsto 1_{A}(x+y)$ is measurable $\mathscr{B}_{\mathbb{R}^{d}}\otimes\mathscr{B}_{\mathbb{R}^{d}}\to\mathscr{B}_{\mathbb{R}}$. Thus by Fubini’s theorem,

 $x\mapsto\int_{\mathbb{R}^{d}}1_{A}(x+y)d\mu_{t}(y)=\int_{\mathbb{R}^{d}}1_{A-x}(y)d\mu_{t}(y)=\mu_{t}(A-x)$

is measurable $\mathscr{B}_{\mathbb{R}^{d}}\to\mathscr{B}_{\mathbb{R}}$. Hence $P_{t}$ is a Markov kernel, and thus $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a translation-invariant Markov semigroup.

Define $S:\mathbb{R}^{d}\to\mathbb{R}^{d}$ by $S(x)=-x$, and for $\mu\in\mathscr{P}(\mathbb{R}^{d})$ write

 $\overline{\mu}=S_{*}\mu\in\mathscr{P}(\mathbb{R}^{d}),$

i.e.,

 $\overline{\mu}(A)=\mu(S^{-1}(A))=\mu(S(A))=\mu(-A).$

For $\mu,\nu\in\mathscr{P}(\mathbb{R}^{d})$,

 $\displaystyle S_{*}(\mu*\nu)(A)$ $\displaystyle=(\mu*\nu)(-A)$ $\displaystyle=\int_{\mathbb{R}^{d}}\mu(-A-y)d\nu(y)$ $\displaystyle=\int_{\mathbb{R}^{d}}\mu(-A+y)d\overline{\nu}(y)$ $\displaystyle=\int_{\mathbb{R}^{d}}\overline{\mu}(A-y)d\overline{\nu}(y)$ $\displaystyle=(\overline{\mu}*\overline{\nu})(A),$

thus

 $S_{*}(\mu*\nu)=(S_{*}\mu)*(S_{*}\nu).$ (9)

We calculate

 $(P_{t}^{*}1_{A})(x)=P_{t}(x,A)=\mu_{t}(A-x)=\int_{\mathbb{R}^{d}}1_{A}(x+y)d\mu_{t}(y).$

Then if $f$ is a simple function, $f=\sum_{k}a_{k}1_{A_{k}}$,

 $(P_{t}^{*}f)(x)=\sum_{k}a_{k}\int_{\mathbb{R}^{d}}1_{A_{k}}(x+y)d\mu_{t}(y)=\int_{\mathbb{R}^{d}}f(x+y)d\mu_{t}(y).$

For $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$, there is a sequence of simple functions $f_{n}$ that converge to $f$ in the uniform norm, and then by the dominated convergence theorem we get

 $(P_{t}^{*}f)(x)=\int_{\mathbb{R}^{d}}f(x+y)d\mu_{t}(y).$

But

 $\displaystyle\int_{\mathbb{R}^{d}}f(x+y)d\mu_{t}(y)$ $\displaystyle=\int_{\mathbb{R}^{d}}f(x+S(S(y)))d\mu_{t}(y)$ $\displaystyle=\int_{\mathbb{R}^{d}}f(x+S(y))d(S_{*}\mu_{t})(y)$ $\displaystyle=\int_{\mathbb{R}^{d}}f(x-y)d\overline{\mu}_{t}(y)$ $\displaystyle=(f*\overline{\mu}_{t})(x).$

Therefore for $t\geq 0$ and $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$,

 $P_{t}^{*}f=f*\overline{\mu}_{t}.$ (10)

For $s,t\geq 0$ and $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$, by (10), the fact that $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup, and (9), we get

 $\displaystyle P_{s+t}^{*}f$ $\displaystyle=f*(S_{*}\mu_{s+t})$ $\displaystyle=f*(S_{*}(\mu_{s}*\mu_{t}))$ $\displaystyle=f*((S_{*}\mu_{s})*(S_{*}\mu_{t}))$ $\displaystyle=(f*(S_{*}\mu_{s}))*(S_{*}\mu_{t})$ $\displaystyle=(P_{s}^{*}f)*(S_{*}\mu_{t})$ $\displaystyle=P_{t}^{*}(P_{s}^{*}f).$

This shows that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a Markov semigroup. Moreover, by (8) it holds that $\mu_{0}=\delta_{0}$, and hence

 $P_{0}(x,A)=\mu_{0}(A-x)=\delta_{0}(A-x)=\delta_{x}(A).$

Namely, $P_{0}$ is the unit kernel (4).

If $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a convolution semigroup and some $\mu_{t}$ has density $q_{t}$ with respect to Lebesgue measure $\lambda_{d}$ on $\mathbb{R}^{d}$,

 $\mu_{t}=q_{t}\lambda_{d},$

then writing $\overline{q}_{t}(x)=q_{t}(-x)$, for $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$ by (10) we have

 $(P_{t}^{*}f)(x)=(f*\overline{\mu}_{t})(x)=\int_{\mathbb{R}^{d}}f(x-y)d\overline{\mu}_{t}(y)=\int_{\mathbb{R}^{d}}f(x+y)q_{t}(y)d\lambda_{d}(y),$

so

 $P_{t}^{*}f=f*\overline{q}_{t}.$ (11)

## 6 The Brownian semigroup

For $a\in\mathbb{R}$ and $\sigma>0$, let $\gamma_{a,\sigma^{2}}$ be the Gaussian measure on $\mathbb{R}$, the probability measure on $\mathbb{R}$ whose density with respect to Lebesgue measure is

 $p(x,a,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-a)^{2}}{2% \sigma^{2}}\right).$

For $\sigma=0$, let

 $\gamma_{a,0}=\delta_{a}.$

Define for $t\in\mathbb{R}_{\geq 0}$,

 $\mu_{t}=\prod_{k=1}^{d}\gamma_{0,t},$

which is an element of $\mathscr{P}(\mathbb{R}^{d})$. For $s,t\in\mathbb{R}_{\geq 0}$, we calculate

 $\mu_{s}*\mu_{t}=\left(\prod_{k=1}^{d}\gamma_{0,s}\right)*\left(\prod_{k=1}^{d}\gamma_{0,t}\right)=\prod_{k=1}^{d}(\gamma_{0,s}*\gamma_{0,t})=\prod_{k=1}^{d}\gamma_{0,s+t}=\mu_{s+t}.$

Lévy’s continuity theorem states that if $\nu_{n}$ is a sequence in $\mathscr{P}(\mathbb{R}^{d})$ and there is some $\phi:\mathbb{R}^{d}\to\mathbb{C}$ that is continuous at $0$ and to which $\tilde{\nu}_{n}$ converges pointwise, then there is some $\nu\in\mathscr{P}(\mathbb{R}^{d})$ such that $\phi=\tilde{\nu}$ and such that $\nu_{n}\to\nu$ narrowly. But for $t\in\mathbb{R}_{\geq 0}$ and $x\in\mathbb{R}^{d}$, we calculate

 $\tilde{\mu}_{t}(x)=\int_{\mathbb{R}^{d}}e^{i\left\langle x,y\right\rangle}d\mu_{t}(y)=\exp\left(-\frac{t|x|^{2}}{2}\right).$ (12)

Let $\phi(x)=1$ for all $x$, for which $\tilde{\delta}_{0}=\phi$. For $t_{n}\in\mathbb{R}_{\geq 0}$ tending to $0$, let $\nu_{n}=\mu_{t_{n}}$. Then by (12), $\tilde{\nu}_{n}$ converges pointwise to $\phi$, so by Lévy’s continuity theorem, $\nu_{n}$ converges narrowly to $\delta_{0}$. Moreover, because $\mathbb{R}^{d}$ is a Polish space, $\mathscr{P}(\mathbb{R}^{d})$ is a Polish space, and in particular is metrizable. It thus follows that $\mu_{t}$ converges narrowly to $\delta_{0}$ as $t\to 0$. It then follows that $t\mapsto\mu_{t}$ is continuous $\mathbb{R}_{\geq 0}\to\mathscr{P}(\mathbb{R}^{d})$. Summarizing, $(\mu_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a continuous convolution semigroup.
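The key identity $\gamma_{0,s}*\gamma_{0,t}=\gamma_{0,s+t}$ can be checked by sampling in $d=1$: a draw from $\mu_{s}$ plus an independent draw from $\mu_{t}$ has law $\mu_{s}*\mu_{t}$, which should be centered Gaussian with variance $s+t$. A Monte Carlo sketch (the values of `s`, `t`, the sample size, and the tolerance are illustrative):

```python
import math
import random

random.seed(1)
s, t, N = 0.5, 1.5, 100_000

# X ~ N(0, s) independent of Y ~ N(0, t); X + Y has law mu_s * mu_t = mu_{s+t}.
z = [random.gauss(0.0, math.sqrt(s)) + random.gauss(0.0, math.sqrt(t))
     for _ in range(N)]
emp_var = sum(v * v for v in z) / N  # should be close to s + t = 2.0
```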

For $t>0$, $\mu_{t}$ has density

 $g_{t}(x)=\prod_{j=1}^{d}(2\pi t)^{-1/2}e^{-\frac{x_{j}^{2}}{2t}}=(2\pi t)^{-d/2}e^{-\frac{|x|^{2}}{2t}}$

with respect to Lebesgue measure $\lambda_{d}$ on $\mathbb{R}^{d}$. For $t\geq 0$, let

 $P_{t}(x,A)=\mu_{t}(A-x).$

We have established that $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ is a translation-invariant Markov semigroup for which $P_{0}(x,A)=\delta_{x}(A)$. We call $(P_{t})_{t\in\mathbb{R}_{\geq 0}}$ the Brownian semigroup. For $t>0$ and $f\in B_{b}(\mathscr{B}_{\mathbb{R}^{d}})$, because $\overline{g}_{t}=g_{t}$ we have by (11),

 $(P_{t}^{*}f)(x)=(f*g_{t})(x)=(2\pi t)^{-d/2}\int_{\mathbb{R}^{d}}f(x-y)e^{-\frac{|y|^{2}}{2t}}d\lambda_{d}(y).$

## 7 Projective families

For a nonempty set $I$, let $\mathscr{K}(I)$ denote the family of finite nonempty subsets of $I$. We speak in this section about projective families of probability measures.

The following theorem shows how to construct a projective family from a Markov semigroup on a measurable space and a probability measure on this measurable space (Heinz Bauer, Probability Theory, p. 314, Theorem 36.4).

###### Theorem 4.

Let $I=\mathbb{R}_{\geq 0}$, let $(E,\mathscr{E})$ be a measurable space, let $(P_{t})_{t\in I}$ be a Markov semigroup on $\mathscr{E}$, and let $\mu$ be a probability measure on $\mathscr{E}$. For $J\in\mathscr{K}(I)$ with elements $t_{1}<\cdots<t_{n}$, and for $A\in\mathscr{E}^{J}$, let

 $P_{J}(A)=\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A}(x_{1},\ldots,x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0}).$

Then $(P_{J})_{J\in\mathscr{K}(I)}$ is a projective family of probability measures.

###### Proof.

Let $A_{k}$ be pairwise disjoint elements of $\mathscr{E}^{J}$, and call their union $A$. Then $1_{A}=\sum_{k}1_{A_{k}}$, and applying the monotone convergence theorem $n+1$ times,

 $\begin{split}&\displaystyle\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A}(x_{1},\ldots,x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0})\\ \displaystyle=&\displaystyle\sum_{k}\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A_{k}}(x_{1},\ldots,x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0}),\end{split}$

i.e.

 $P_{J}(A)=\sum_{k}P_{J}(A_{k}).$

Furthermore, because $(P_{t})_{x}$ is a probability measure for each $t$ and for each $x$ and $\mu$ is a probability measure, we calculate that

 $P_{J}(E^{J})=1.$

Thus, $P_{J}$ is a probability measure on $\mathscr{E}^{J}$.

To prove that $(P_{J})_{J\in\mathscr{K}(I)}$ is a projective family, it suffices to prove that when $J,K\in\mathscr{K}(I)$, $J\subset K$, and $K\setminus J$ is a singleton, then $(\pi_{K,J})_{*}P_{K}=P_{J}$. Moreover, because (i) the product $\sigma$-algebra $\mathscr{E}^{J}$ is generated by the collection of cylinder sets, i.e. sets of the form $\prod_{t\in J}A_{t}$ for $A_{t}\in\mathscr{E}$, and (ii) the intersection of finitely many cylinder sets is a cylinder set, it is proved using the monotone class theorem that if two probability measures on $\mathscr{E}^{J}$ coincide on the cylinder sets, then they are equal (V. I. Bogachev, Measure Theory, volume I, p. 35, Lemma 1.9.4). Let $t_{1}<\cdots<t_{n}$ be the elements of $J$. To prove that $(\pi_{K,J})_{*}P_{K}$ and $P_{J}$ are equal, it suffices to prove that for any $A_{1},\ldots,A_{n}\in\mathscr{E}$,

 $(\pi_{K,J})_{*}P_{K}\left(\prod_{j=1}^{n}A_{j}\right)=P_{J}\left(\prod_{j=1}^{n}A_{j}\right).$

Moreover, for $A=\prod_{j=1}^{n}A_{j}$,

 $1_{A}=1_{A_{1}}\otimes\cdots\otimes 1_{A_{n}},$

thus

 $\begin{split}&\displaystyle P_{J}\left(\prod_{j=1}^{n}A_{j}\right)\\ \displaystyle=&\displaystyle\underbrace{\int_{E}\int_{E}\cdots\int_{E}}_{n+1}1_{A_{1}}(x_{1})\cdots 1_{A_{n}}(x_{n})d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0})\\ \displaystyle=&\displaystyle\int_{E}\int_{A_{1}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0}).\end{split}$

Let $K\setminus J=\{t^{\prime}\}$. Either $t^{\prime}<t_{1}$, or $t^{\prime}>t_{n}$, or there is some $1\leq j\leq n-1$ for which $t_{j}<t^{\prime}<t_{j+1}$. Take the case $t^{\prime}<t_{1}$. Then

 $\pi_{K,J}^{-1}\left(\prod_{j=1}^{n}A_{j}\right)=\prod_{k=0}^{n}B_{k},$

where $B_{0}=E$ and $B_{j}=A_{j}$ for $1\leq j\leq n$. Then

 $\begin{split}&\displaystyle(\pi_{K,J})_{*}P_{K}\left(\prod_{j=1}^{n}A_{j}\right)\\ \displaystyle=&\displaystyle P_{K}\left(\prod_{k=0}^{n}B_{k}\right)\\ \displaystyle=&\displaystyle\int_{E}\int_{E}\int_{A_{1}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})d\mu(x_{0})\\ \displaystyle=&\displaystyle\int_{E}\int_{E}\int_{A_{1}}f(x_{1})d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})d\mu(x_{0}),\end{split}$

for

 $f(x_{1})=\int_{A_{2}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{2}-t_{1}})_{x_{1}}(x_{2}).$

By (1) and because $(P_{t})_{t\in I}$ is a Markov semigroup,

 $\begin{split}&\displaystyle\int_{E}\int_{A_{1}}f(x_{1})d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})\\ \displaystyle=&\displaystyle\int_{E}\int_{E}f(x_{1})1_{A_{1}}(x_{1})d(P_{t_{1}-t^{\prime}})_{x^{\prime}}(x_{1})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})\\ \displaystyle=&\displaystyle\int_{E}P_{t_{1}-t^{\prime}}^{*}(f1_{A_{1}})(x^{\prime})d(P_{t^{\prime}})_{x_{0}}(x^{\prime})\\ \displaystyle=&\displaystyle P_{t^{\prime}}^{*}(P_{t_{1}-t^{\prime}}^{*}(f1_{A_{1}}))(x_{0})\\ \displaystyle=&\displaystyle P_{t_{1}}^{*}(f1_{A_{1}})(x_{0})\\ \displaystyle=&\displaystyle\int_{E}f(x_{1})1_{A_{1}}(x_{1})d(P_{t_{1}})_{x_{0}}(x_{1})\\ \displaystyle=&\displaystyle\int_{A_{1}}f(x_{1})d(P_{t_{1}})_{x_{0}}(x_{1})\\ \displaystyle=&\displaystyle\int_{A_{1}}\int_{A_{2}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{2}-t_{1}})_{x_{1}}(x_{2})d(P_{t_{1}})_{x_{0}}(x_{1}).\end{split}$

Thus

 $\begin{split}&\displaystyle(\pi_{K,J})_{*}P_{K}\left(\prod_{j=1}^{n}A_{j}\right)\\ \displaystyle=&\displaystyle\int_{E}\int_{A_{1}}\int_{A_{2}}\cdots\int_{A_{n}}d(P_{t_{n}-t_{n-1}})_{x_{n-1}}(x_{n})\cdots d(P_{t_{2}-t_{1}})_{x_{1}}(x_{2})d(P_{t_{1}})_{x_{0}}(x_{1})d\mu(x_{0})\\ \displaystyle=&\displaystyle P_{J}\left(\prod_{j=1}^{n}A_{j}\right).\end{split}$

This shows that the claim is true in the case $t^{\prime}<t_{1}$; the cases $t^{\prime}>t_{n}$ and $t_{j}<t^{\prime}<t_{j+1}$ are handled similarly. ∎

Thus, suppose that $E$ is a Polish space with Borel $\sigma$-algebra $\mathscr{E}$, let $I=\mathbb{R}_{\geq 0}$, let $(P_{t})_{t\in I}$ be a Markov semigroup on $\mathscr{E}$, and let $\mu$ be a probability measure on $\mathscr{E}$. The above theorem tells us that $(P_{J})_{J\in\mathscr{K}(I)}$ is a projective family, and then the Kolmogorov extension theorem tells us that there is a probability measure $P^{\mu}$ on $\mathscr{E}^{I}$ (we write $P^{\mu}$ to indicate that this measure involves $\mu$; it also involves the Markov semigroup, which we do not indicate) such that for any $J\in\mathscr{K}(I)$, ${\pi_{J}}_{*}P^{\mu}=P_{J}$. This implies that there is a stochastic process $(X_{t})_{t\in I}$ whose finite-dimensional distributions are equal to the probability measures $P_{J}$ defined in Theorem 4 using the Markov semigroup $(P_{t})_{t\in I}$ and the probability measure $\mu$.
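The iterated integral in Theorem 4 has a direct sampling interpretation: draw $x_{0}$ from $\mu$, then $x_{1}$ from $(P_{t_{1}})_{x_{0}}$, then $x_{2}$ from $(P_{t_{2}-t_{1}})_{x_{1}}$, and so on. A Monte Carlo sketch for the Brownian semigroup with $\mu=\delta_{0}$ and two times (the times, sample size, and tolerances are illustrative); here $\mathrm{Cov}(X_{t_{1}},X_{t_{2}})=t_{1}$ and $\mathrm{Var}(X_{t_{2}})=t_{2}$:

```python
import math
import random

random.seed(2)
t1, t2, N = 0.5, 2.0, 100_000

pairs = []
for _ in range(N):
    x0 = 0.0                                         # mu = delta_0
    x1 = x0 + random.gauss(0.0, math.sqrt(t1))       # draw from (P_{t1})_{x0} = N(x0, t1)
    x2 = x1 + random.gauss(0.0, math.sqrt(t2 - t1))  # draw from (P_{t2-t1})_{x1}
    pairs.append((x1, x2))

emp_cov = sum(a * b for a, b in pairs) / N  # should be close to t1
emp_var = sum(b * b for _, b in pairs) / N  # should be close to t2
```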