Markov kernels, convolution semigroups, and projective families of probability measures

Jordan Bell
June 12, 2015

1 Transition kernels

For a measurable space $(E,\mathcal{E})$, we denote by $\mathcal{E}_+$ the set of functions $E \to [0,\infty]$ that are $\mathcal{E} \to \mathcal{B}_{[0,\infty]}$ measurable. It can be proved that if $I : \mathcal{E}_+ \to [0,\infty]$ is a function such that (i) $f=0$ implies that $I(f)=0$, (ii) if $f,g \in \mathcal{E}_+$ and $a,b \geq 0$ then $I(af+bg)=aI(f)+bI(g)$, and (iii) if $f_n$ is a sequence in $\mathcal{E}_+$ that increases pointwise to an element $f$ of $\mathcal{E}_+$ then $I(f_n)$ increases to $I(f)$, then there is a unique measure $\mu$ on $\mathcal{E}$ such that $I(f)=\mu f$ for each $f \in \mathcal{E}_+$. [Footnote 1: Erhan Çinlar, Probability and Stochastics, p. 28, Theorem 4.21.]

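Let $(E,\mathcal{E})$ and $(F,\mathcal{F})$ be measurable spaces. A transition kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$ is a function

\[ K : E \times \mathcal{F} \to [0,\infty] \]

such that (i) for each $x \in E$, the function $K_x : \mathcal{F} \to [0,\infty]$ defined by

\[ B \mapsto K(x,B) \]

is a measure on $\mathcal{F}$, and (ii) for each $B \in \mathcal{F}$, the map

\[ x \mapsto K(x,B) \]

is measurable $\mathcal{E} \to \mathcal{B}_{[0,\infty]}$.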

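If $\mu$ is a measure on $\mathcal{E}$, define

\[ (K_*\mu)(B) = \int_E K(x,B)\,d\mu(x), \qquad B \in \mathcal{F}. \]

If $B_n$ are pairwise disjoint elements of $\mathcal{F}$, then using that $B \mapsto K(x,B)$ is a measure and the monotone convergence theorem,

\begin{align*}
(K_*\mu)\Big(\bigcup_n B_n\Big) &= \int_E K\Big(x,\bigcup_n B_n\Big)\,d\mu(x)\\
&= \int_E \sum_n K(x,B_n)\,d\mu(x)\\
&= \sum_n \int_E K(x,B_n)\,d\mu(x)\\
&= \sum_n (K_*\mu)(B_n),
\end{align*}

showing that $K_*\mu$ is a measure on $\mathcal{F}$.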

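If $f \in \mathcal{F}_+$, define $K^*f : E \to [0,\infty]$ by

\[ (K^*f)(x) = \int_F f(y)\,dK_x(y), \qquad x \in E. \tag{1} \]

For $\phi = \sum_{j=1}^k b_j 1_{B_j}$ with $b_j \geq 0$ and $B_j \in \mathcal{F}$, because $x \mapsto K(x,B_j)$ is measurable $\mathcal{E} \to \mathcal{B}_{[0,\infty]}$ for each $j$, the function

\[ (K^*\phi)(x) = \int_F \sum_{j=1}^k b_j 1_{B_j}(y)\,dK_x(y) = \sum_{j=1}^k b_j K_x(B_j) = \sum_{j=1}^k b_j K(x,B_j) \]

is measurable $\mathcal{E} \to \mathcal{B}_{[0,\infty]}$. For $f \in \mathcal{F}_+$, there is a sequence of simple functions $\phi_n$ with $0 \leq \phi_1 \leq \phi_2 \leq \cdots$ that converges pointwise to $f$, [Footnote 2: Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 47, Theorem 2.10.] and then by the monotone convergence theorem, for each $x \in E$ we have

\[ (K^*\phi_n)(x) = \int_F \phi_n(y)\,dK_x(y) \uparrow \int_F f(y)\,dK_x(y) = (K^*f)(x), \]

showing that $K^*\phi_n$ converges pointwise to $K^*f$, and because each $K^*\phi_n$ is measurable $\mathcal{E} \to \mathcal{B}_{[0,\infty]}$, so is $K^*f$. [Footnote 3: Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 45, Proposition 2.7.] Therefore, if $f \in \mathcal{F}_+$ then $K^*f \in \mathcal{E}_+$. In particular, if $K$ is a transition kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$,

\[ (K^*1_B)(x) = \int_F 1_B(y)\,dK_x(y) = K_x(B) = K(x,B), \qquad x \in E,\ B \in \mathcal{F}. \tag{2} \]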

The following gives conditions under which (2) defines a transition kernel. [Footnote 4: Heinz Bauer, Probability Theory, p. 308, Lemma 36.2.]

Lemma 1.

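Suppose that $N : \mathcal{F}_+ \to \mathcal{E}_+$ satisfies the following properties:

  1. $N(0)=0$.

  2. $N(af+bg)=aN(f)+bN(g)$ for $f,g \in \mathcal{F}_+$ and $a,b \geq 0$.

  3. If $f_n$ is a sequence in $\mathcal{F}_+$ increasing to $f \in \mathcal{F}_+$, then $N(f_n) \uparrow N(f)$.

Then

\[ K(x,B) = (N(1_B))(x), \qquad x \in E,\ B \in \mathcal{F}, \]

is a transition kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$. $K$ is the unique transition kernel satisfying

\[ K^*f = N(f), \qquad f \in \mathcal{F}_+. \]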

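If $K$ is a transition kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$ and $L$ is a transition kernel from $(F,\mathcal{F})$ to $(G,\mathcal{G})$, the function $K^* \circ L^* : \mathcal{G}_+ \to \mathcal{E}_+$ satisfies (i) $(K^* \circ L^*)(0) = K^*(0) = 0$, (ii) if $f,g \in \mathcal{G}_+$ and $a,b \geq 0$,

\begin{align*}
(K^* \circ L^*)(af+bg) &= K^*(aL^*(f)+bL^*(g))\\
&= aK^*(L^*(f)) + bK^*(L^*(g))\\
&= a(K^* \circ L^*)(f) + b(K^* \circ L^*)(g),
\end{align*}

and (iii) if $f_n \uparrow f$ in $\mathcal{G}_+$, then by the monotone convergence theorem, $L^*(f_n) \uparrow L^*(f)$, and then again applying the monotone convergence theorem, $K^*(L^*(f_n)) \uparrow K^*(L^*(f))$, i.e.

\[ (K^* \circ L^*)(f_n) \uparrow (K^* \circ L^*)(f). \]

Therefore, from Lemma 1 we get that there is a unique transition kernel from $(E,\mathcal{E})$ to $(G,\mathcal{G})$, denoted $KL$ and called the product of $K$ and $L$, such that

\[ (KL)^*f = (K^* \circ L^*)(f), \qquad f \in \mathcal{G}_+. \]

For $f \in \mathcal{G}_+$ and $x \in E$,

\begin{align*}
(KL)^*(f)(x) &= (K^*(L^*f))(x)\\
&= \int_F (L^*f)(y)\,dK_x(y)\\
&= \int_F \left( \int_G f(z)\,dL_y(z) \right) dK_x(y).
\end{align*}

In particular, for $C \in \mathcal{G}$,

\[ (KL)^*(1_C)(x) = \int_F L_y(C)\,dK_x(y) = \int_F L(y,C)\,dK_x(y). \tag{3} \]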
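When $E$, $F$, and $G$ are finite sets, a transition kernel is determined by a nonnegative matrix and the constructions of this section become matrix operations. The following is a minimal sketch under that assumption; the function names and the example matrices are hypothetical and chosen only for illustration.

```python
# Finite-state sketch: a transition kernel from E to F is stored as a
# nonnegative matrix with K[x][y] = K(x, {y}), so K(x, B) is the sum of
# K[x][y] over y in B. All data below are hypothetical.

def pull_back(K, f):
    """(K^* f)(x) = sum_y f(y) K(x, {y}), the finite analogue of equation (1)."""
    return [sum(K[x][y] * f[y] for y in range(len(f))) for x in range(len(K))]

def push_forward(K, mu):
    """(K_* mu)({y}) = sum_x K(x, {y}) mu({x})."""
    return [sum(K[x][y] * mu[x] for x in range(len(mu))) for y in range(len(K[0]))]

def product(K, L):
    """(KL)(x, {z}) = sum_y L(y, {z}) K(x, {y}), the finite analogue of equation (3)."""
    return [[sum(K[x][y] * L[y][z] for y in range(len(L))) for z in range(len(L[0]))]
            for x in range(len(K))]

# E has 2 points, F has 3 points, G has 2 points.
K = [[0.5, 1.0, 0.0],
     [2.0, 0.0, 0.25]]
L = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 4.0]]
f = [3.0, 1.0]                        # a nonnegative function on G

lhs = pull_back(product(K, L), f)     # (KL)^* f
rhs = pull_back(K, pull_back(L, f))   # K^*(L^* f)
```

In this finite setting the identity $(KL)^*f = K^*(L^*f)$ is associativity of matrix multiplication, so `lhs` and `rhs` agree up to rounding.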

2 Markov kernels

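A Markov kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$ is a transition kernel $K$ such that for each $x \in E$, $K_x$ is a probability measure on $\mathcal{F}$. The unit kernel from $(E,\mathcal{E})$ to $(E,\mathcal{E})$ is

\[ I(x,A) = \delta_x(A). \tag{4} \]

It is apparent that the unit kernel is a Markov kernel.

If $K$ is a Markov kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$ and $L$ is a Markov kernel from $(F,\mathcal{F})$ to $(G,\mathcal{G})$, then for $x \in E$, by (3) and because $L_y(G)=1$ for each $y \in F$, we have

\[ (KL)^*(1_G)(x) = \int_F L_y(G)\,dK_x(y) = \int_F dK_x(y) = K_x(F) = K(x,F) = 1, \]

and thus by (2),

\[ (KL)_x(G) = (KL)(x,G) = 1, \]

showing that for each $x \in E$, $(KL)_x$ is a probability measure. Therefore, the product of two Markov kernels is a Markov kernel.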

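Let $(E,\mathcal{E})$ be a measurable space and let

\[ B_b(\mathcal{E}) \]

be the set of bounded functions $E \to \mathbb{R}$ that are measurable $\mathcal{E} \to \mathcal{B}_{\mathbb{R}}$. $B_b(\mathcal{E})$ is a Banach space with the uniform norm

\[ \|f\|_u = \sup_{x \in E} |f(x)|. \]

For $K$ a Markov kernel from $(E,\mathcal{E})$ to $(F,\mathcal{F})$ and for $f \in B_b(\mathcal{F})$, define $K^*f : E \to \mathbb{R}$ by

\[ (K^*f)(x) = \int_F f(y)\,dK_x(y), \qquad x \in E, \]

for which

\[ |(K^*f)(x)| \leq \int_F |f(y)|\,dK_x(y) \leq \|f\|_u K_x(F) = \|f\|_u, \]

showing that $\|K^*f\|_u \leq \|f\|_u$. Furthermore, there is a sequence of simple functions $\phi_n \in B_b(\mathcal{F})$ that converges to $f$ in the norm $\|\cdot\|_u$. [Footnote 5: V. I. Bogachev, Measure Theory, p. 108, Lemma 2.1.8.] For $x \in E$, by the dominated convergence theorem we get that

\[ (K^*\phi_n)(x) = \int_F \phi_n(y)\,dK_x(y) \to \int_F f(y)\,dK_x(y) = (K^*f)(x). \]

Each $K^*\phi_n$ is measurable $\mathcal{E} \to \mathcal{B}_{\mathbb{R}}$, hence $K^*f$ is measurable $\mathcal{E} \to \mathcal{B}_{\mathbb{R}}$ and so belongs to $B_b(\mathcal{E})$.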
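For finite sets, a Markov kernel is a row-stochastic matrix, and both facts of this section (the product of Markov kernels is a Markov kernel, and $\|K^*f\|_u \leq \|f\|_u$) can be checked directly. The sketch below assumes finite state spaces; the matrices, the function `f`, and the helper names are hypothetical.

```python
# Hypothetical Markov kernels on finite sets: each row is a probability vector.
K = [[0.2, 0.8, 0.0],
     [0.5, 0.25, 0.25]]     # kernel from a 2-point set E to a 3-point set F
L = [[1.0, 0.0],
     [0.3, 0.7],
     [0.5, 0.5]]            # kernel from F to a 2-point set G

def product(K, L):
    """(KL)(x, {z}) = sum_y L(y, {z}) K(x, {y})."""
    return [[sum(K[x][y] * L[y][z] for y in range(len(L))) for z in range(len(L[0]))]
            for x in range(len(K))]

def pull_back(K, f):
    """(K^* f)(x) = sum_y f(y) K(x, {y})."""
    return [sum(K[x][y] * f[y] for y in range(len(f))) for x in range(len(K))]

def sup_norm(f):
    """Uniform norm of a function on a finite set."""
    return max(abs(v) for v in f)

row_sums = [sum(row) for row in product(K, L)]   # (KL)(x, G) for each x
f = [-2.0, 0.5, 3.0]                             # a bounded function on F
contracts = sup_norm(pull_back(K, f)) <= sup_norm(f)
```

Each entry of `row_sums` equals 1 up to rounding, and `contracts` is `True`, as the estimate above predicts.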

3 Markov semigroups

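Let $(E,\mathcal{E})$ be a measurable space and for each $t \geq 0$, let $P_t$ be a Markov kernel from $(E,\mathcal{E})$ to $(E,\mathcal{E})$. We say that the family $(P_t)_{t \geq 0}$ is a Markov semigroup if

\[ P_{s+t} = P_s P_t, \qquad s,t \geq 0. \]

For $x \in E$ and $A \in \mathcal{E}$ and for $s,t \geq 0$, by (2) and (3),

\[ (P_sP_t)(x,A) = ((P_sP_t)^*1_A)(x) = \int_E P_t(y,A)\,d(P_s)_x(y). \]

Thus

\[ P_{s+t}(x,A) = \int_E P_t(y,A)\,d(P_s)_x(y), \tag{5} \]

called the Chapman-Kolmogorov equation.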
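As a concrete instance of (5), consider a two-point state space and the family $P_t = \Pi + e^{-ct}(I - \Pi)$, where $c = a+b$ and $\Pi$ has both rows equal to $(b/c, a/c)$; since $\Pi^2 = \Pi$, this family satisfies $P_{s+t} = P_sP_t$. The sketch below checks this numerically. The rates `A_RATE` and `B_RATE` and the function names are hypothetical choices for illustration.

```python
import math

A_RATE, B_RATE = 0.8, 0.3   # hypothetical positive parameters a and b

def P(t):
    """Two-state Markov kernel P_t as a row-stochastic matrix; P(0) is the unit kernel."""
    c = A_RATE + B_RATE
    e = math.exp(-c * t)
    return [[(B_RATE + A_RATE * e) / c, (A_RATE - A_RATE * e) / c],
            [(B_RATE - B_RATE * e) / c, (A_RATE + B_RATE * e) / c]]

def chapman_kolmogorov_gap(s, t):
    """Largest value of |P_{s+t}(x,{z}) - sum_y P_t(y,{z}) P_s(x,{y})| over x, z."""
    Ps, Pt, Pst = P(s), P(t), P(s + t)
    return max(abs(Pst[x][z] - sum(Ps[x][y] * Pt[y][z] for y in range(2)))
               for x in range(2) for z in range(2))

gap = chapman_kolmogorov_gap(0.4, 1.7)   # should be of the order of rounding error
```

On a finite state space the integral in (5) is a finite sum, so the Chapman-Kolmogorov equation is the matrix identity $P_{s+t} = P_sP_t$.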

4 Infinitely divisible distributions

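Let $\mathscr{P}(\mathbb{R}^d)$ be the collection of Borel probability measures on $\mathbb{R}^d$. For $\mu \in \mathscr{P}(\mathbb{R}^d)$, its characteristic function $\widetilde{\mu} : \mathbb{R}^d \to \mathbb{C}$ is defined by

\[ \widetilde{\mu}(x) = \int_{\mathbb{R}^d} e^{i\langle x,y \rangle}\,d\mu(y). \]

$\widetilde{\mu}$ is uniformly continuous on $\mathbb{R}^d$ and $|\widetilde{\mu}(x)| \leq \widetilde{\mu}(0) = 1$ for all $x \in \mathbb{R}^d$. [Footnote 6: Heinz Bauer, Probability Theory, p. 183, Theorem 22.3.] For $\mu_1,\ldots,\mu_n \in \mathscr{P}(\mathbb{R}^d)$, let $\mu$ be their convolution:

\[ \mu = \mu_1 * \cdots * \mu_n, \]

which for $A$ a Borel set in $\mathbb{R}^d$ is defined by

\[ \mu(A) = \int_{(\mathbb{R}^d)^n} 1_A(x_1 + \cdots + x_n)\,d(\mu_1 \times \cdots \times \mu_n)(x_1,\ldots,x_n). \]

One computes that [Footnote 7: Heinz Bauer, Probability Theory, p. 184, Theorem 22.4.]

\[ \widetilde{\mu} = \widetilde{\mu}_1 \cdots \widetilde{\mu}_n. \]

An element $\mu$ of $\mathscr{P}(\mathbb{R}^d)$ is called infinitely divisible if for each $n \geq 1$, there is some $\mu_n \in \mathscr{P}(\mathbb{R}^d)$ such that

\[ \mu = \underbrace{\mu_n * \cdots * \mu_n}_{n}. \tag{6} \]

Thus,

\[ \widetilde{\mu} = (\widetilde{\mu}_n)^n. \tag{7} \]

On the other hand, if $\mu_n \in \mathscr{P}(\mathbb{R}^d)$ is such that (7) is true, then the characteristic function of $\mu_n * \cdots * \mu_n$ is $(\widetilde{\mu}_n)^n$ and the characteristic function of $\mu$ is $\widetilde{\mu}$, and these are equal. Because a Borel probability measure on $\mathbb{R}^d$ is determined by its characteristic function, it follows that $\mu_n * \cdots * \mu_n$ and $\mu$ are equal.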
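A standard example (not taken from the text above) is the Poisson distribution with mean $\lambda$ on $\mathbb{R}$, whose characteristic function is $\exp(\lambda(e^{ix}-1))$: taking $\mu_n$ to be the Poisson distribution with mean $\lambda/n$ gives (7). The sketch below checks (7) numerically on a grid; the parameter values and the grid are arbitrary choices.

```python
import cmath

def poisson_cf(lam, x):
    """Characteristic function of the Poisson distribution with mean lam, at real x."""
    return cmath.exp(lam * (cmath.exp(1j * x) - 1.0))

lam, n = 2.5, 7                               # hypothetical parameter values
grid = [k / 10.0 for k in range(-50, 51)]     # points x in [-5, 5]

# Largest deviation between mu~(x) and (mu~_n(x))^n over the grid.
max_gap = max(abs(poisson_cf(lam, x) - poisson_cf(lam / n, x) ** n) for x in grid)
```

Here `max_gap` is of the order of floating-point rounding error, consistent with the Poisson distribution being infinitely divisible.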

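The following theorem is useful for doing calculations with the characteristic function of an infinitely divisible distribution. [Footnote 8: Heinz Bauer, Probability Theory, p. 246, Theorem 29.2.]

Theorem 2.

Suppose that $\mu$ is an infinitely divisible distribution on $\mathbb{R}^d$. First,

\[ \widetilde{\mu}(x) \neq 0, \qquad x \in \mathbb{R}^d. \]

Second, there is a unique continuous function $\phi : \mathbb{R}^d \to \mathbb{R}$ satisfying $\phi(0)=0$ and

\[ \widetilde{\mu} = |\widetilde{\mu}| e^{i\phi}. \]

Third, for each $n \geq 1$, there is a unique $\mu_n \in \mathscr{P}(\mathbb{R}^d)$ for which $\mu = \mu_n * \cdots * \mu_n$. The characteristic function of this unique $\mu_n$ is

\[ \widetilde{\mu}_n = |\widetilde{\mu}|^{1/n} e^{i\phi/n}. \]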

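A convolution semigroup is a family $(\mu_t)_{t \geq 0}$ of elements of $\mathscr{P}(\mathbb{R}^d)$ such that for $s,t \geq 0$,

\[ \mu_{s+t} = \mu_s * \mu_t. \]

The convolution semigroup is called continuous when $t \mapsto \mu_t$ is continuous $\mathbb{R}_{\geq 0} \to \mathscr{P}(\mathbb{R}^d)$, where $\mathscr{P}(\mathbb{R}^d)$ has the narrow topology.

The following theorem connects convolution semigroups and infinitely divisible distributions. [Footnote 9: Heinz Bauer, Probability Theory, p. 248, Theorem 29.6.]

Theorem 3.

If $(\mu_t)_{t \geq 0}$ is a convolution semigroup on $\mathbb{R}^d$, then for each $t$, the measure $\mu_t$ is infinitely divisible.

If $\mu \in \mathscr{P}(\mathbb{R}^d)$ is infinitely divisible and $t_0 > 0$, then there is a unique continuous convolution semigroup $(\mu_t)_{t \geq 0}$ such that $\mu_{t_0} = \mu$.

It follows from the above theorem that for a convolution semigroup $(\mu_t)_{t \geq 0}$ on $\mathbb{R}^d$, $\mu_1$ is infinitely divisible and therefore by Theorem 2, $\widetilde{\mu}_1(x) \neq 0$ for all $x$. But $\mu_0 * \mu_1 = \mu_1$, so $\widetilde{\mu}_0 \widetilde{\mu}_1 = \widetilde{\mu}_1$, and dividing by $\widetilde{\mu}_1(x) \neq 0$ gives $\widetilde{\mu}_0(x) = 1$ for each $x$. But $\widetilde{\delta}_0(x) = 1$ for all $x$, so, as a probability measure is determined by its characteristic function,

\[ \mu_0 = \delta_0. \tag{8} \]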

5 Translation-invariant semigroups

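Let $(P_t)_{t \geq 0}$ be a Markov semigroup on $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d})$. We say that $(P_t)_{t \geq 0}$ is translation-invariant if for all $x,y \in \mathbb{R}^d$, $A \in \mathcal{B}_{\mathbb{R}^d}$, and $t \geq 0$,

\[ P_t(x,A) = P_t(x+y, A+y). \]

In this case, for $t \geq 0$ and for $A \in \mathcal{B}_{\mathbb{R}^d}$, define

\[ \mu_t(A) = P_t(0,A). \]

Each $\mu_t$ is a probability measure on $\mathcal{B}_{\mathbb{R}^d}$, and

\[ \mu_t(A-x) = P_t(0,A-x) = P_t(x,(A-x)+x) = P_t(x,A). \]

Using the Chapman-Kolmogorov equation (5) and the fact that $(P_s)_0(B) = P_s(0,B) = \mu_s(B)$,

\begin{align*}
\mu_{s+t}(A) &= P_{s+t}(0,A)\\
&= \int_{\mathbb{R}^d} P_t(y,A)\,d(P_s)_0(y)\\
&= \int_{\mathbb{R}^d} \mu_t(A-y)\,d\mu_s(y)\\
&= (\mu_t * \mu_s)(A),
\end{align*}

showing that $(\mu_t)_{t \geq 0}$ is a convolution semigroup on $\mathbb{R}^d$.

On the other hand, if $(\mu_t)_{t \geq 0}$ is a convolution semigroup of probability measures on $\mathbb{R}^d$, for $t \geq 0$, $x \in \mathbb{R}^d$, and $A \in \mathcal{B}_{\mathbb{R}^d}$ define

\[ P_t(x,A) = \mu_t(A-x). \]

Let $t \geq 0$. For $x \in \mathbb{R}^d$, the map $A \mapsto P_t(x,A) = \mu_t(A-x)$ is a probability measure on $\mathcal{B}_{\mathbb{R}^d}$. The map $(x,y) \mapsto x+y$ is continuous $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$, and for $A \in \mathcal{B}_{\mathbb{R}^d}$, the map $1_A : \mathbb{R}^d \to \mathbb{R}$ is measurable $\mathcal{B}_{\mathbb{R}^d} \to \mathcal{B}_{\mathbb{R}}$. Hence, as $\mathcal{B}_{\mathbb{R}^d \times \mathbb{R}^d} = \mathcal{B}_{\mathbb{R}^d} \otimes \mathcal{B}_{\mathbb{R}^d}$, the map $(x,y) \mapsto 1_A(x+y)$ is measurable $\mathcal{B}_{\mathbb{R}^d} \otimes \mathcal{B}_{\mathbb{R}^d} \to \mathcal{B}_{\mathbb{R}}$. Thus by Fubini's theorem,

\[ x \mapsto \int_{\mathbb{R}^d} 1_A(x+y)\,d\mu_t(y) = \int_{\mathbb{R}^d} 1_{A-x}(y)\,d\mu_t(y) = \mu_t(A-x) \]

is measurable $\mathcal{B}_{\mathbb{R}^d} \to \mathcal{B}_{\mathbb{R}}$. Hence each $P_t$ is a Markov kernel. It is translation-invariant because $P_t(x+y,A+y) = \mu_t((A+y)-(x+y)) = \mu_t(A-x) = P_t(x,A)$. That $(P_t)_{t \geq 0}$ satisfies the semigroup property is verified below.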

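Define $S : \mathbb{R}^d \to \mathbb{R}^d$ by $S(x) = -x$. For $\mu \in \mathscr{P}(\mathbb{R}^d)$, write

\[ \overline{\mu} = S_*\mu \in \mathscr{P}(\mathbb{R}^d), \]

i.e.,

\[ \overline{\mu}(A) = \mu(S^{-1}(A)) = \mu(S(A)) = \mu(-A). \]

For $\mu,\nu \in \mathscr{P}(\mathbb{R}^d)$,

\begin{align*}
S_*(\mu * \nu)(A) &= (\mu * \nu)(-A)\\
&= \int_{\mathbb{R}^d} \mu(-A-y)\,d\nu(y)\\
&= \int_{\mathbb{R}^d} \mu(-A+y)\,d\overline{\nu}(y)\\
&= \int_{\mathbb{R}^d} \overline{\mu}(A-y)\,d\overline{\nu}(y)\\
&= (\overline{\mu} * \overline{\nu})(A),
\end{align*}

thus

\[ S_*(\mu * \nu) = (S_*\mu) * (S_*\nu). \tag{9} \]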

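We calculate

\[ (P_t^*1_A)(x) = P_t(x,A) = \mu_t(A-x) = \int_{\mathbb{R}^d} 1_A(x+y)\,d\mu_t(y). \]

Then if $f$ is a simple function, $f = \sum_k a_k 1_{A_k}$,

\[ (P_t^*f)(x) = \sum_k a_k \int_{\mathbb{R}^d} 1_{A_k}(x+y)\,d\mu_t(y) = \int_{\mathbb{R}^d} f(x+y)\,d\mu_t(y). \]

For $f \in B_b(\mathcal{B}_{\mathbb{R}^d})$, there is a sequence of simple functions $f_n$ that converges to $f$ in the uniform norm, and then by the dominated convergence theorem we get

\[ (P_t^*f)(x) = \int_{\mathbb{R}^d} f(x+y)\,d\mu_t(y). \]

But

\begin{align*}
\int_{\mathbb{R}^d} f(x+y)\,d\mu_t(y) &= \int_{\mathbb{R}^d} f(x+S(S(y)))\,d\mu_t(y)\\
&= \int_{\mathbb{R}^d} f(x+S(y))\,d(S_*\mu_t)(y)\\
&= \int_{\mathbb{R}^d} f(x-y)\,d\overline{\mu}_t(y)\\
&= (f * \overline{\mu}_t)(x).
\end{align*}

Therefore for $t \geq 0$ and $f \in B_b(\mathcal{B}_{\mathbb{R}^d})$,

\[ P_t^*f = f * \overline{\mu}_t. \tag{10} \]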
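The identity (10) can be illustrated with a finitely supported measure on the integers in place of a Borel measure on $\mathbb{R}^d$, so that integrals become finite sums. This is a sketch under that simplifying assumption; the measure `mu`, the function `f`, and all helper names are hypothetical.

```python
# Hypothetical measure mu on the integers with finite support, stored as {point: mass}.
mu = {-1: 0.2, 0: 0.5, 2: 0.3}
mu_bar = {-y: m for y, m in mu.items()}     # reflected measure: mu_bar(A) = mu(-A)

def kernel(x, A):
    """P(x, A) = mu(A - x) for a finite set A of integers."""
    return sum(m for y, m in mu.items() if y + x in A)

def pull_back(f, x):
    """(P^* f)(x) = integral of f(x + y) d mu(y)."""
    return sum(f(x + y) * m for y, m in mu.items())

def convolve(f, nu, x):
    """(f * nu)(x) = integral of f(x - y) d nu(y)."""
    return sum(f(x - y) * m for y, m in nu.items())

def f(z):
    """A bounded test function on the integers."""
    return 1.0 / (1.0 + z * z)

# Largest deviation between P^* f and f * mu_bar on a range of points.
gap = max(abs(pull_back(f, x) - convolve(f, mu_bar, x)) for x in range(-5, 6))

# Translation invariance: P(x, A) = P(x + y, A + y), here with x = 0 and y = 3.
A = {-1, 0}
shifted = {a + 3 for a in A}
invariance_gap = abs(kernel(0, A) - kernel(3, shifted))
```

Both `gap` and `invariance_gap` vanish up to rounding, reflecting $P^*f = f * \overline{\mu}$ and $P(x,A) = P(x+y,A+y)$ in this discrete setting.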

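For $s,t \geq 0$ and $f \in B_b(\mathcal{B}_{\mathbb{R}^d})$, by (10), the fact that $(\mu_t)_{t \geq 0}$ is a convolution semigroup, and (9), we get

\begin{align*}
P_{s+t}^*f &= f * (S_*\mu_{s+t})\\
&= f * (S_*(\mu_s * \mu_t))\\
&= f * ((S_*\mu_s) * (S_*\mu_t))\\
&= (f * (S_*\mu_s)) * (S_*\mu_t)\\
&= (P_s^*f) * (S_*\mu_t)\\
&= P_t^*(P_s^*f).
\end{align*}

This shows that $(P_t)_{t \geq 0}$ is a Markov semigroup. Moreover, by (8) it holds that $\mu_0 = \delta_0$, and hence

\[ P_0(x,A) = \mu_0(A-x) = \delta_0(A-x) = \delta_x(A). \]

Namely, $P_0$ is the unit kernel (4).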

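If $(\mu_t)_{t \geq 0}$ is a convolution semigroup and some $\mu_t$ has density $q_t$ with respect to Lebesgue measure $\lambda_d$ on $\mathbb{R}^d$,

\[ \mu_t = q_t \lambda_d, \]

then writing $\overline{q}_t(x) = q_t(-x)$, for $f \in B_b(\mathcal{B}_{\mathbb{R}^d})$ by (10) we have

\[ (P_t^*f)(x) = (f * \overline{\mu}_t)(x) = \int_{\mathbb{R}^d} f(x-y)\,d\overline{\mu}_t(y) = \int_{\mathbb{R}^d} f(x+y) q_t(y)\,d\lambda_d(y) = \int_{\mathbb{R}^d} f(x-y) \overline{q}_t(y)\,d\lambda_d(y), \]

where the last equality uses that Lebesgue measure is invariant under $y \mapsto -y$, so

\[ P_t^*f = f * \overline{q}_t. \tag{11} \]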

6 The Brownian semigroup

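For $a \in \mathbb{R}$ and $\sigma > 0$, let $\gamma_{a,\sigma^2}$ be the Gaussian measure on $\mathbb{R}$, the probability measure on $\mathbb{R}$ whose density with respect to Lebesgue measure is

\[ p(x,a,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-a)^2}{2\sigma^2} \right). \]

For $\sigma = 0$, let

\[ \gamma_{a,0} = \delta_a. \]

Define for $t \geq 0$,

\[ \mu_t = \bigotimes_{k=1}^d \gamma_{0,t}, \]

which is an element of $\mathscr{P}(\mathbb{R}^d)$. For $s,t \geq 0$, we calculate

\[ \mu_s * \mu_t = \left( \bigotimes_{k=1}^d \gamma_{0,s} \right) * \left( \bigotimes_{k=1}^d \gamma_{0,t} \right) = \bigotimes_{k=1}^d (\gamma_{0,s} * \gamma_{0,t}) = \bigotimes_{k=1}^d \gamma_{0,s+t} = \mu_{s+t}. \]

Lévy's continuity theorem states that if $\nu_n$ is a sequence in $\mathscr{P}(\mathbb{R}^d)$ and there is some $\phi : \mathbb{R}^d \to \mathbb{C}$ that is continuous at $0$ and to which $\widetilde{\nu}_n$ converges pointwise, then there is some $\nu \in \mathscr{P}(\mathbb{R}^d)$ such that $\phi = \widetilde{\nu}$ and such that $\nu_n \to \nu$ narrowly. But for $t \geq 0$ and $x \in \mathbb{R}^d$, we calculate

\[ \widetilde{\mu}_t(x) = \int_{\mathbb{R}^d} e^{i\langle x,y \rangle}\,d\mu_t(y) = \exp\left( -\frac{t|x|^2}{2} \right). \tag{12} \]

Let $\phi(x) = 1$ for all $x$, for which $\widetilde{\delta}_0 = \phi$. For $t_n \geq 0$ tending to $0$, let $\nu_n = \mu_{t_n}$. Then by (12), $\widetilde{\nu}_n$ converges pointwise to $\phi$, so by Lévy's continuity theorem, $\nu_n$ converges narrowly to $\delta_0$. Moreover, because $\mathbb{R}^d$ is a Polish space, $\mathscr{P}(\mathbb{R}^d)$ is a Polish space, and in particular is metrizable. It thus follows that $\mu_t$ converges narrowly to $\delta_0 = \mu_0$ as $t \to 0$. The same argument with $\phi = \widetilde{\mu}_{t_0}$ and $t_n \to t_0$, using that (12) is continuous in $t$ for each fixed $x$, shows that $\mu_t \to \mu_{t_0}$ narrowly as $t \to t_0$ for any $t_0 \geq 0$, so $t \mapsto \mu_t$ is continuous $\mathbb{R}_{\geq 0} \to \mathscr{P}(\mathbb{R}^d)$. Summarizing, $(\mu_t)_{t \geq 0}$ is a continuous convolution semigroup.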

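For $t > 0$, $\mu_t$ has density

\[ g_t(x) = \prod_{j=1}^d (2\pi t)^{-1/2} e^{-\frac{x_j^2}{2t}} = (2\pi t)^{-d/2} e^{-\frac{|x|^2}{2t}} \]

with respect to Lebesgue measure $\lambda_d$ on $\mathbb{R}^d$. For $t \geq 0$, let

\[ P_t(x,A) = \mu_t(A-x). \]

We have established that $(P_t)_{t \geq 0}$ is a translation-invariant Markov semigroup for which $P_0(x,A) = \delta_x(A)$. We call $(P_t)_{t \geq 0}$ the Brownian semigroup. For $t > 0$ and $f \in B_b(\mathcal{B}_{\mathbb{R}^d})$, because $\overline{g}_t = g_t$ we have by (11),

\[ (P_t^*f)(x) = (f * g_t)(x) = (2\pi t)^{-d/2} \int_{\mathbb{R}^d} f(x-y) e^{-\frac{|y|^2}{2t}}\,d\lambda_d(y). \]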
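The semigroup property $\mu_s * \mu_t = \mu_{s+t}$ says, in terms of densities, that $g_s * g_t = g_{s+t}$. The sketch below checks this numerically in dimension $d=1$ with the trapezoidal rule. The truncation width, step count, times, and evaluation points are arbitrary choices made for illustration.

```python
import math

def g(t, x):
    """Density of mu_t in dimension d = 1: (2 pi t)^(-1/2) exp(-x^2 / (2 t))."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def convolve_densities(s, t, z, half_width=12.0, steps=4800):
    """Trapezoidal approximation of the integral of g_t(z - y) g_s(y) over y."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0   # endpoint weights of the rule
        total += weight * g(t, z - y) * g(s, y)
    return total * h

s, t = 0.7, 1.3   # hypothetical times
points = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Largest deviation between (g_s * g_t)(z) and g_{s+t}(z) at the chosen points.
gap = max(abs(convolve_densities(s, t, z) - g(s + t, z)) for z in points)
```

The value of `gap` is small because the integrand is a rapidly decaying smooth function, for which the trapezoidal rule on a wide interval is very accurate.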

7 Projective families

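For a nonempty set $I$, let $\mathscr{K}(I)$ denote the family of finite nonempty subsets of $I$. We speak in this section about projective families of probability measures: families $(P_J)_{J \in \mathscr{K}(I)}$, where $P_J$ is a probability measure on the product space $(E^J, \mathcal{E}^J)$ of a measurable space $(E,\mathcal{E})$, such that $(\pi_{K,J})_*P_K = P_J$ whenever $J \subset K$, where $\pi_{K,J} : E^K \to E^J$ is the coordinate projection.

The following theorem shows how to construct a projective family from a Markov semigroup on a measurable space and a probability measure on this measurable space. [Footnote 10: Heinz Bauer, Probability Theory, p. 314, Theorem 36.4.]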

Theorem 4.

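Let $I = \mathbb{R}_{\geq 0}$, let $(E,\mathcal{E})$ be a measurable space, let $(P_t)_{t \in I}$ be a Markov semigroup on $\mathcal{E}$, and let $\mu$ be a probability measure on $\mathcal{E}$. For $J \in \mathscr{K}(I)$, with elements $t_1 < \cdots < t_n$, and for $A \in \mathcal{E}^J$, let

\[ P_J(A) = \underbrace{\int_E \int_E \cdots \int_E}_{n+1} 1_A(x_1,\ldots,x_n)\,d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_1})_{x_0}(x_1)\,d\mu(x_0). \]

Then $(P_J)_{J \in \mathscr{K}(I)}$ is a projective family of probability measures.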

Proof.

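Let $A_k$ be pairwise disjoint elements of $\mathcal{E}^J$, and call their union $A$. Then $1_A = \sum_k 1_{A_k}$, and applying the monotone convergence theorem $n+1$ times,

\begin{align*}
&\underbrace{\int_E \int_E \cdots \int_E}_{n+1} 1_A(x_1,\ldots,x_n)\,d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_1})_{x_0}(x_1)\,d\mu(x_0)\\
&\qquad = \sum_k \underbrace{\int_E \int_E \cdots \int_E}_{n+1} 1_{A_k}(x_1,\ldots,x_n)\,d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_1})_{x_0}(x_1)\,d\mu(x_0),
\end{align*}

i.e.

\[ P_J(A) = \sum_k P_J(A_k). \]

Furthermore, because $(P_t)_x$ is a probability measure for each $t$ and for each $x$ and $\mu$ is a probability measure, we calculate that

\[ P_J(E^J) = 1. \]

Thus, $P_J$ is a probability measure on $\mathcal{E}^J$.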

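To prove that $(P_J)_{J \in \mathscr{K}(I)}$ is a projective family, it suffices to prove that when $J,K \in \mathscr{K}(I)$, $J \subset K$, and $K \setminus J$ is a singleton, then $(\pi_{K,J})_*P_K = P_J$. Moreover, because (i) the product $\sigma$-algebra $\mathcal{E}^J$ is generated by the collection of cylinder sets, i.e. sets of the form $\prod_{t \in J} A_t$ for $A_t \in \mathcal{E}$, and (ii) the intersection of finitely many cylinder sets is a cylinder set, it is proved using the monotone class theorem that if two probability measures on $\mathcal{E}^J$ coincide on the cylinder sets, then they are equal. [Footnote 11: V. I. Bogachev, Measure Theory, volume I, p. 35, Lemma 1.9.4.] Let $t_1 < \cdots < t_n$ be the elements of $J$. To prove that $(\pi_{K,J})_*P_K$ and $P_J$ are equal, it suffices to prove that for any $A_1,\ldots,A_n \in \mathcal{E}$,

\[ (\pi_{K,J})_*P_K\Big( \prod_{j=1}^n A_j \Big) = P_J\Big( \prod_{j=1}^n A_j \Big). \]

Moreover, for $A = \prod_{j=1}^n A_j$,

\[ 1_A = 1_{A_1} \otimes \cdots \otimes 1_{A_n}, \]

thus

\begin{align*}
P_J\Big( \prod_{j=1}^n A_j \Big) &= \underbrace{\int_E \int_E \cdots \int_E}_{n+1} 1_{A_1}(x_1) \cdots 1_{A_n}(x_n)\,d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_1})_{x_0}(x_1)\,d\mu(x_0)\\
&= \int_E \int_{A_1} \cdots \int_{A_n} d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_1})_{x_0}(x_1)\,d\mu(x_0).
\end{align*}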

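Let $K \setminus J = \{t\}$. Either $t < t_1$, or $t > t_n$, or there is some $1 \leq j \leq n-1$ for which $t_j < t < t_{j+1}$. Take the case $t < t_1$. Then

\[ \pi_{K,J}^{-1}\Big( \prod_{j=1}^n A_j \Big) = \prod_{k=0}^n B_k, \]

where $B_0 = E$ and $B_j = A_j$ for $1 \leq j \leq n$. Then

\begin{align*}
(\pi_{K,J})_*P_K\Big( \prod_{j=1}^n A_j \Big) &= P_K\Big( \prod_{k=0}^n B_k \Big)\\
&= \int_E \int_E \int_{A_1} \cdots \int_{A_n} d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_1 - t})_x(x_1)\,d(P_t)_{x_0}(x)\,d\mu(x_0)\\
&= \int_E \int_E \int_{A_1} f(x_1)\,d(P_{t_1 - t})_x(x_1)\,d(P_t)_{x_0}(x)\,d\mu(x_0),
\end{align*}

for

\[ f(x_1) = \int_{A_2} \cdots \int_{A_n} d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_2 - t_1})_{x_1}(x_2). \]

By (1) and because $(P_t)_{t \in I}$ is a Markov semigroup,

\begin{align*}
\int_E \int_{A_1} f(x_1)\,d(P_{t_1 - t})_x(x_1)\,d(P_t)_{x_0}(x) &= \int_E \int_E f(x_1) 1_{A_1}(x_1)\,d(P_{t_1 - t})_x(x_1)\,d(P_t)_{x_0}(x)\\
&= \int_E P_{t_1 - t}^*(f 1_{A_1})(x)\,d(P_t)_{x_0}(x)\\
&= P_t^*(P_{t_1 - t}^*(f 1_{A_1}))(x_0)\\
&= P_{t_1}^*(f 1_{A_1})(x_0)\\
&= \int_E f(x_1) 1_{A_1}(x_1)\,d(P_{t_1})_{x_0}(x_1)\\
&= \int_{A_1} f(x_1)\,d(P_{t_1})_{x_0}(x_1)\\
&= \int_{A_1} \int_{A_2} \cdots \int_{A_n} d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_2 - t_1})_{x_1}(x_2)\,d(P_{t_1})_{x_0}(x_1).
\end{align*}

Thus

\begin{align*}
(\pi_{K,J})_*P_K\Big( \prod_{j=1}^n A_j \Big) &= \int_E \int_{A_1} \int_{A_2} \cdots \int_{A_n} d(P_{t_n - t_{n-1}})_{x_{n-1}}(x_n) \cdots d(P_{t_2 - t_1})_{x_1}(x_2)\,d(P_{t_1})_{x_0}(x_1)\,d\mu(x_0)\\
&= P_J\Big( \prod_{j=1}^n A_j \Big).
\end{align*}

This shows that the claim is true in the case $t < t_1$. The other two cases are handled similarly: for $t > t_n$ one uses that $(P_{t - t_n})_{x_n}$ is a probability measure, and for $t_j < t < t_{j+1}$ one uses the semigroup property $P_{t - t_j} P_{t_{j+1} - t} = P_{t_{j+1} - t_j}$. ∎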
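The case $t < t_1$ of the proof can be checked numerically in a simplified setting. The sketch below assumes a two-point state space and integer times, so that $P_t$ is the $t$-th power of a single stochastic matrix (a discrete-time stand-in for the semigroup indexed by $\mathbb{R}_{\geq 0}$ in Theorem 4). The matrix `P`, the initial distribution `mu`, and all function names are hypothetical.

```python
from itertools import product as tuples

# Hypothetical two-state chain; times are nonnegative integers, so P_t = P^t.
P = [[0.9, 0.1],
     [0.4, 0.6]]
mu = [0.3, 0.7]            # initial distribution
states = range(2)

def mat_mul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in states) for j in states] for i in states]

def P_t(t):
    """t-step kernel, with P_0 the unit kernel."""
    R = [[1.0 if i == j else 0.0 for j in states] for i in states]
    for _ in range(t):
        R = mat_mul(R, P)
    return R

def P_J(times):
    """Finite-dimensional distribution on E^J for J = {t_1 < ... < t_n}."""
    dist = {}
    for path in tuples(states, repeat=len(times)):
        total = 0.0
        for x0 in states:
            weight, prev_state, prev_time = mu[x0], x0, 0
            for state, time in zip(path, times):
                weight *= P_t(time - prev_time)[prev_state][state]
                prev_state, prev_time = state, time
            total += weight
        dist[path] = total
    return dist

def marginal(dist, keep):
    """Push a distribution forward under the projection onto the positions in `keep`."""
    out = {}
    for path, p in dist.items():
        key = tuple(path[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

P_K = P_J([1, 2, 4])       # K = {1, 2, 4}
P_small = P_J([2, 4])      # J = {2, 4}, so K \ J = {1} and 1 < t_1 = 2
gap = max(abs(marginal(P_K, keep=(1, 2))[path] - P_small[path]) for path in P_small)
```

Here `gap` is zero up to rounding, which is the finite-state form of $(\pi_{K,J})_*P_K = P_J$.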

Now let $E$ be a Polish space with Borel $\sigma$-algebra $\mathcal{E}$, let $I = \mathbb{R}_{\geq 0}$, let $(P_t)_{t \in I}$ be a Markov semigroup on $\mathcal{E}$, and let $\mu$ be a probability measure on $\mathcal{E}$. The above theorem tells us that $(P_J)_{J \in \mathscr{K}(I)}$ is a projective family, and then the Kolmogorov extension theorem tells us that there is a probability measure $P^\mu$ on $\mathcal{E}^I$ [Footnote 12: We write $P^\mu$ to indicate that this measure involves $\mu$; it also involves the Markov semigroup, which we do not indicate.] such that for any $J \in \mathscr{K}(I)$, $(\pi_J)_*P^\mu = P_J^\mu$, where $P_J^\mu$ denotes the measure $P_J$ of Theorem 4. This implies that there is a stochastic process $(X_t)_{t \in I}$ whose finite-dimensional distributions are equal to the probability measures $P_J$ defined in Theorem 4 using the Markov semigroup $(P_t)_{t \in I}$ and the probability measure $\mu$.