

Probability (Martingale Theory)∗

Dr. Robert Philipowski

Recommended books:

H. Bauer, Wahrscheinlichkeitstheorie (5th edition). Walter de Gruyter, Berlin, 2002. English translation: Probability Theory. Walter de Gruyter, Berlin, 1995.

D. Williams, Probability with Martingales. Cambridge University Press, 1991.

Contents

1 Introduction
2 Measure-theoretic probability
3 Important inequalities
4 Independence
   4.1 Independent events and random variables
   4.2 Existence of independent random variables with given distributions, product measure
   4.3 Products of independent random variables
5 Convergence of random variables and distributions
6 Laws of large numbers
7 Conditional expectation
8 Stochastic processes and martingales
9 Stopping times
10 Maximal inequalities
11 Martingale convergence theorems
   11.1 Doob's downcrossing inequality
   11.2 Doob's convergence theorem
   11.3 Uniformly integrable martingales
12 Backward martingales, sub- and supermartingales
13 Applications of the martingale convergence theorems
14 Construction of stochastic processes with prescribed properties
15 Markov kernels, semigroups and processes
16 Brownian motion and Kolmogorov's continuity theorem
   16.1 Brownian motion
   16.2 Modifications and indistinguishable processes
   16.3 Kolmogorov's continuity theorem
   16.4 Brownian motion is a martingale
17 Gaussian processes
18 Path properties of Brownian motion

∗ Notes of a course given at the University of Luxembourg during the winter semesters 2013/2014, 2014/2015 and 2015/2016.

1 Introduction

The word “martingale” has many different meanings, see e.g. http://en.wikipedia.org/wiki/Martingale. In mathematics, and more precisely in probability theory, it stands for a mathematical model of a “fair game”. What does that mean?

Example 1.1 (Coin tossing). We repeatedly toss a fair coin. We win 1 EUR if the result is heads, and lose 1 EUR if it is tails. Mathematically, let ξi = 1 if the result of the i-th toss is heads, and ξi = −1 if it is tails. Then the random variables ξi are independent and identically distributed with P [ξi = 1] = P [ξi = −1] = 1/2. We are interested in the total gain after the n-th toss,

Xn := ∑_{i=1}^n ξi.

Intuitively, the sequence (Xn)n∈N is an example of a “fair game”. What does that mean? First observation: E(Xn) = ∑_{i=1}^n E(ξi) = 0, so that the expectation of Xn is constant.

This property however is not enough to call a game “fair”. If for example we define Yn := nξ1, we also have E(Yn) = nE(ξ1) = 0, but once we know Y1 we would not consider the game to be fair any more.

Preliminary definition: We say that a sequence (Xn)n∈N of random variables is a fair game if for all n ∈ N and all a1, . . . , an ∈ R such that P [X1 = a1, . . . , Xn = an] > 0 we have

E [Xn+1 |X1 = a1, . . . , Xn = an] = an.

Let us check whether our two sequences (Xn)n∈N and (Yn)n∈N are fair games in the sense of this definition: we first observe that

E[Y2 |Y1 = −1] = −2 < −1,

so that the sequence (Yn)n∈N is not fair. (Once we observe that Y1 = −1, we would like to stop the game.) On the other hand,

E [Xn+1 |X1 = a1, . . . , Xn = an] = E [Xn + ξn+1 |X1 = a1, . . . , Xn = an]

= an + E [ξn+1 |X1 = a1, . . . , Xn = an]

= an + E [ξn+1]

= an,

so that the sequence (Xn)n∈N is fair.
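This can also be checked by simulation. The following Python sketch (illustrative only; the horizon n = 5, the number of paths and the random seed are arbitrary choices) estimates E[Xn+1 | Xn = a] by averaging over the simulated paths with Xn = a, and does the same for the unfair game Yn = nξ1:

    import numpy as np

    rng = np.random.default_rng(0)
    n, paths = 5, 200_000

    # xi[i, j] = outcome of toss j+1 on path i (+1 or -1); X_{j+1} = xi[:, 0] + ... + xi[:, j]
    xi = rng.choice([-1, 1], size=(paths, n + 1))
    X = xi.cumsum(axis=1)

    # Estimate E[X_{n+1} | X_n = a] for every value a occurring in the sample
    for a in np.unique(X[:, n - 1]):
        sel = X[:, n - 1] == a
        print(f"E[X_{n+1} | X_{n} = {a:+d}] ~ {X[sel, n].mean():+.3f}   (fair game predicts {a:+d})")

    # The game Y_n := n * xi_1 has E(Y_n) = 0 but is not fair:
    sel = xi[:, 0] == -1
    print("E[Y_2 | Y_1 = -1] ~", (2 * xi[:, 0])[sel].mean())   # equals -2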

Problem: In many cases (for instance if the random variables Xn have continuous distributions), the probability P [X1 = a1, . . . , Xn = an] is 0 for all a1, . . . , an ∈ R. Here we arrive at the limits of elementary probability, and therefore we introduce measure-theoretic probability.

2 Measure-theoretic probability

In elementary probability theory a probability space is a pair (Ω, p) where Ω is a countable set and p : Ω → R+ a function with ∑_{ω∈Ω} p(ω) = 1; p(ω) is the probability of the elementary event ω ∈ Ω. The probability of a general event A ⊂ Ω is then P(A) = ∑_{ω∈A} p(ω). The function P : P(Ω) → [0, 1] has the following properties:

1. For all singletons, P({ω}) = p(ω).

2. P (∅) = 0.

3. Whenever (Ai)i∈I is a collection of pairwise disjoint subsets of Ω, we have

   P(⋃_{i∈I} Ai) = ∑_{i∈I} P(Ai).     (1)

4. P (Ω) = 1.

Example 2.1. We choose a point “uniformly” from the unit cube [0, 1]^3, in the sense that

P(A) = vol(A)

for all A ⊆ [0, 1]^3. Two problems arise:

1. On the one hand we have P({ω}) = 0 for all ω ∈ [0, 1]^3, and consequently

   ∑_{ω∈[0,1]^3} P({ω}) = 0;

   on the other hand

   P(⋃_{ω∈[0,1]^3} {ω}) = P([0, 1]^3) = 1 ≠ 0;

   consequently (1) is not satisfied.

2. The Banach–Tarski paradox¹ shows that it is not possible to define the volume of all subsets of [0, 1]^3 in a sensible way.

The solution to this sort of problem is to use measure theory.

Definition 2.2. A probability space is a measure space (Ω, A, P) with P(Ω) = 1. This means:

1. (Ω, A) is a measurable space, i.e. Ω is a set (not necessarily countable) and A a σ-field on Ω, i.e. a collection of subsets of Ω satisfying

   (a) ∅ ∈ A,

   (b) for every countable collection (Ai)i∈I of elements of A we have ⋃_{i∈I} Ai ∈ A,

   (c) for every A ∈ A we have A^c ∈ A.

2. P is a measure on (Ω, A), i.e. a function P : A → R+ satisfying

   (a) P(∅) = 0,

   (b) whenever (Ai)i∈I is a countable collection of pairwise disjoint elements of A, we have P(⋃_{i∈I} Ai) = ∑_{i∈I} P(Ai).

3. P (Ω) = 1.

¹ S. Banach, A. Tarski, Sur la décomposition des ensembles de points en parties respectivement congruentes. Fund. Math. 6 (1924), 244–277. (Stefan Banach (1892–1945), Polish mathematician; Alfred Tarski (1901–1983), Polish mathematician and philosopher)

Definition 2.3. Let (Ω, A, P) be a probability space and (E, B) a measurable space.

1. An (E, B)-valued random variable on (Ω, A, P) is a map X : Ω → E which is measurable with respect to A and B.

2. The distribution of X is the image measure of P under X, i.e. the probability measure PX on (E, B) defined by

   PX(B) := P{X ∈ B}.

Remark 2.4.

1. If E is a topological space, we always use its Borel² σ-field B = B(E) (i.e. the σ-field generated by the topology of E) unless otherwise specified.

2. If E is not specified, the word “random variable” always means “real-valued random variable”, i.e. E = R.

Definition 2.5. Let X be a random variable defined on a probability space (Ω, A, P). The expectation of X is its Lebesgue integral³:

E(X) := ∫_Ω X dP,

provided that this integral exists. (If ∫_Ω X^+ dP and ∫_Ω X^− dP are both infinite, the Lebesgue integral and hence the expectation of X is not defined.)

Remark 2.6. The Lebesgue integral of a random variable X is defined in three steps:

1. If X is an elementary or simple random variable, i.e. of the form X = ∑_{i=1}^k αi 1_{Ai} with αi ∈ R and Ai ∈ A, one sets

   ∫_Ω X dP := ∑_{i=1}^k αi P(Ai).

   One has to check that this is well-defined, i.e. independent of the particular representation of X.

2. If X is nonnegative, there exists a non-decreasing sequence (Xn)n∈N of nonnegative elementary random variables converging pointwise to X, i.e. 0 ≤ X1(ω) ≤ X2(ω) ≤ . . . and Xn(ω) → X(ω) for all ω ∈ Ω. One then sets

   ∫_Ω X dP := lim_{n→∞} ∫_Ω Xn dP

   and must again show that this is well-defined, i.e. independent of the choice of the sequence (Xn)n∈N.

3. In the general case one sets

   ∫_Ω X dP := ∫_Ω X^+ dP − ∫_Ω X^− dP,

   provided that at least one of these two integrals is finite. If they are both infinite, the Lebesgue integral of X is not defined.

² Émile Borel (1871–1956), French mathematician
³ H. Lebesgue, Leçons sur l'intégration et la recherche des fonctions primitives. Gauthier-Villars, Paris, 1904. (Henri Lebesgue (1875–1941), French mathematician)

Definition 2.7. Let X be a random variable defined on a probability space (Ω,A, P ).

1. We say that X is quasi-integrable if E(X) exists, i.e. if E(X^+) or E(X^−) is finite.

2. We say that X is integrable if E(X) exists and is finite. Clearly, this holds if and only if E(X^+) and E(X^−) are both finite or, equivalently, if E(|X|) < ∞.

3. Let p ≥ 1. We say that X belongs to L^p if E(|X|^p) < ∞. If X belongs to L^2 we say that X is square-integrable.

Remark 2.8. An often used sufficient criterion for the existence of E(X) is that X is non-negative or integrable. Such random variables are sometimes called admissible.

Remark 2.9. In general the pointwise convergence of a sequence (Xn)n∈N to another random variable X does NOT imply the convergence of E(Xn) to E(X).

Example 2.10. Let Ω := [0, 1], A its Borel σ-field, P the Lebesgue measure and Xn := n² · 1_{(0,1/n)}. Then Xn(ω) → 0 for all ω ∈ Ω, but E(Xn) = n → ∞.
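A quick numerical illustration of this example (an illustrative sketch; sample size and seed are arbitrary): for ω drawn uniformly from [0, 1], Xn(ω) is eventually 0 for every fixed ω, while E(Xn) = n² · (1/n) = n grows without bound.

    import numpy as np

    rng = np.random.default_rng(1)
    omega = rng.uniform(0, 1, size=100_000)      # sample points of Ω = [0, 1]

    def X(n, w):
        # X_n = n^2 * 1_{(0, 1/n)}
        return n**2 * ((w > 0) & (w < 1 / n))

    for n in (10, 100, 1000):
        vals = X(n, omega)
        print(f"n = {n:4d}: Monte-Carlo E(X_n) ~ {vals.mean():7.2f} (exact: {n}),",
              f"P(X_n != 0) ~ {np.mean(vals != 0):.4f}")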

Proposition 2.11 (Beppo Levi's⁴ monotone convergence theorem). Let (Xn)n∈N be a sequence of random variables and X another random variable, all defined on the same probability space (Ω, A, P). If 0 ≤ Xn ↑ X, then E(Xn) → E(X).

Proposition 2.12 (Lebesgue's dominated convergence theorem). Let (Xn)n∈N be a sequence of random variables and X another random variable, all defined on the same probability space (Ω, A, P). If the sequence (Xn)n∈N converges pointwise to X and if moreover there exists an integrable random variable X′ on (Ω, A, P) such that |Xn| ≤ X′ for all n ∈ N, then E(Xn) → E(X).

Proposition 2.13 (Fatou's lemma⁵). Let (Xn)n∈N be a sequence of nonnegative random variables. Then

E(lim inf_{n→∞} Xn) ≤ lim inf_{n→∞} E(Xn).

Definition 2.14. The variance of X is

Var(X) := E[(X − E(X))²]  if E(|X|) < ∞,
Var(X) := +∞              if E(|X|) = ∞.

Remark 2.15. If E(|X|) < ∞, one clearly has Var(X) = E(X²) − E(X)².

Definition 2.16. The square root of the variance is called the standard deviation of X:

σ(X) := √Var(X).

Definition 2.17. Let X and Y be two integrable random variables such that their product XY is also integrable. Then their covariance is

Cov(X, Y) := E(XY) − E(X)E(Y).

If Cov(X, Y) = 0, X and Y are called uncorrelated.

⁴ Beppo Levi (1875–1961), Italian mathematician
⁵ P. Fatou, Séries trigonométriques et séries de Taylor. Acta Math. 30 (1906), 335–400. (Pierre Fatou (1878–1929), French mathematician and astronomer)

Lemma 2.18 (Transformation lemma). Let (Ω, A, P) be a probability space, (E, B) a measurable space, X an (E, B)-valued random variable on (Ω, A, P) and f : E → R a measurable function. Then the expectation of the random variable f(X) exists if and only if the integral of f with respect to the measure PX exists, and in this case one has

E(f(X)) = ∫_E f dPX.

In particular, for E = R and f(x) = x or f(x) = x² we obtain

E(X) = ∫_R x dPX   and   E(X²) = ∫_R x² dPX.

Consequently, the expectation and the variance of X can be computed without knowing the probability space (Ω, A, P), provided one knows the distribution of X.

Proof. We first assume that f is elementary, i.e. of the form f = ∑_{i=1}^k αi 1_{Bi} with αi ∈ R and Bi ∈ B. Then f(X) = ∑_{i=1}^k αi 1_{X∈Bi} and consequently

E(f(X)) = ∑_{i=1}^k αi P{X ∈ Bi} = ∑_{i=1}^k αi PX(Bi) = ∫_E f dPX.

Now let f ≥ 0. Then we know that there exists a sequence (fn)n∈N of elementary functions such that 0 ≤ fn ↑ f. It follows that 0 ≤ fn(X) ↑ f(X), and consequently

E(f(X)) = lim_{n→∞} E(fn(X)) = lim_{n→∞} ∫_E fn dPX = ∫_E f dPX.

Finally let f be arbitrary. Then we have

E(f^+(X)) = ∫_E f^+ dPX   and   E(f^−(X)) = ∫_E f^− dPX,

and the claim follows.

Remark 2.19. The proof used a typical technique called measure theoretic induction.


3 Important inequalities

Proposition 3.1 (Chebyshev's inequality⁶). For all random variables X and all α, p > 0 one has

P{|X| ≥ α} ≤ (1/α^p) E(|X|^p).

In particular, replacing X with X − E(X) and choosing p = 2, one obtains

P{|X − E(X)| ≥ α} ≤ (1/α²) Var(X).

Proof.

E(|X|^p) = ∫_Ω |X|^p dP ≥ ∫_{{|X|≥α}} |X|^p dP ≥ α^p P{|X| ≥ α}.
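As a numerical sanity check (an illustrative sketch using standard normal samples; sample size and seed are arbitrary), one can compare the probability on the left with the bound obtained for p = 2:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal(1_000_000)            # E(X) = 0, Var(X) = 1

    for alpha in (1.0, 2.0, 3.0):
        prob = np.mean(np.abs(X - X.mean()) >= alpha)   # P{|X - E(X)| >= alpha}
        bound = X.var() / alpha**2                      # Chebyshev bound with p = 2
        print(f"alpha = {alpha}: P ~ {prob:.4f} <= bound {bound:.4f}")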

Proposition 3.2 (Hölder's inequality⁷). Let X and Y be two random variables defined on the same probability space (Ω, A, P), and let p, q > 1 be such that 1/p + 1/q = 1. Then

E(|XY|) ≤ E(|X|^p)^{1/p} E(|Y|^q)^{1/q}.

Proof. See any book on measure and integration theory.

Corollary 3.3. Let X be a random variable and q ≥ p > 0. Then

E(|X|^p) ≤ E(|X|^q)^{p/q}.

In particular, if X belongs to L^q, then it also belongs to L^p.

Proof. We may assume that q > p, so that r := q/p > 1. Let s > 1 be such that 1/r + 1/s = 1. Then Hölder's inequality implies that

E(|X|^p) = E(|X|^p · 1) ≤ E(|X|^{pr})^{1/r} E(1^s)^{1/s} = E(|X|^q)^{p/q}.

Proposition 3.4 (Jensen's inequality⁸). Let X be an integrable random variable and q : R → R a convex function. Then E(q(X)) exists and

E(q(X)) ≥ q(E(X)).

⁶ P.-L. Tchebychef, Sur les valeurs moyennes. J. Math. Pures Appl. 12 (1867), 177–184. (Pafnuty Lvovich Chebyshev (1821–1894), Russian mathematician)
⁷ O. Hölder, Ueber einen Mittelwerthssatz. Nachr. Königl. Ges. Wiss. Georg-Augusts-Univ. Göttingen 1889, 38–47. (Otto Hölder (1859–1937), German mathematician)
⁸ J. L. W. V. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Math. 30 (1906), 175–193. (Johan Ludwig William Valdemar Jensen (1859–1925), Danish mathematician and engineer)

Proof. Since q is convex, we have for all x, y ∈ R

q(y) ≥ q(x) + q′+(x)(y − x), (2)

where q′+(x) is the right-hand derivative of q at x (which exists since q is convex). (The same statement is true if we replace q′+(x) with q′−(x), the left-hand derivative of q at x.) By choosing x = E(X) and y = X in (2) it follows that

q(X) ≥ q(E(X)) + q′+(E(X))(X − E(X)). (3)

By assumption, the right-hand side of this inequality is integrable, so that in particular the integral of its negative part is finite. Therefore E(q(X)^−) is finite as well, so that E(q(X)) exists. Taking expectations on both sides of (3) yields the claim.

4 Independence

“Probability theory = measure theory + independence.”

4.1 Independent events and random variables

Definition 4.1. Let (Ω, A, P) be a probability space and I an arbitrary index set. A collection (Ai)i∈I of events (i.e. Ai ∈ A for all i ∈ I) is said to be independent if for every finite subset J of I one has

P(⋂_{i∈J} Ai) = ∏_{i∈J} P(Ai).     (4)

Remark 4.2. One cannot ask (4) to hold for all subsets J of I because for general J the set ⋂_{i∈J} Ai need not belong to A. One may however replace the word “finite” with “countable”.

Definition 4.3.

1. A collection (Ei)i∈I of sets of events (i.e. Ei ⊆ A for all i ∈ I) is said to be independent if for every finite subset J of I and every choice of events Ai ∈ Ei (i ∈ J) property (4) holds.

2. A collection (Xi)i∈I of random variables on (Ω, A, P) (Xi taking values in a measurable space (Ei, Bi)) is independent if the family (σ(Xi))i∈I is independent, where σ(Xi) ⊆ A is the σ-field generated by Xi.

Remark 4.4. Since σ(Xi) = {{Xi ∈ Bi} | Bi ∈ Bi}, the family (Xi)i∈I is independent if and only if for all choices Bi ∈ Bi (i ∈ I) the family of events ({Xi ∈ Bi})i∈I is independent, i.e. if for every finite subset J of I and every choice of sets (Bi ∈ Bi)i∈J,

P(⋂_{i∈J} {Xi ∈ Bi}) = ∏_{i∈J} P{Xi ∈ Bi}.     (5)

4.2 Existence of independent random variables with given distributions, product measure

Let (Ei, Bi)i∈I be a family of measurable spaces, and for each i ∈ I let µi be a probability measure on (Ei, Bi).

Question: Does there exist a probability space (Ω, A, P) and random variables (Xi)i∈I on it (Xi taking values in (Ei, Bi)) such that

1. the family (Xi)i∈I is independent, and

2. the law of Xi equals µi for all i ∈ I?

If I is finite, the answer is yes. Namely one can take:

• Ω := ∏_{i∈I} Ei (the Cartesian⁹ product),

• Xi = πi, where πi : Ω → Ei is the i-th projection, defined by πi((ωj)j∈I) := ωi,

• A := ⊗_{i∈I} Bi, the product σ-field, i.e. the smallest σ-field on Ω such that all projections πi : Ω → Ei are measurable,

• P := ⊗_{i∈I} µi, the product measure, i.e. the unique measure P on Ω such that for every choice of sets (Bi ∈ Bi)i∈I

  P(∏_{i∈I} Bi) = ∏_{i∈I} µi(Bi).

In the general case one has to be more careful because then ∏_{i∈I} Bi need not belong to A. Therefore we define:

Definition 4.5. A probability measure P on (Ω, A) is called a product measure of the probability measures (µi)i∈I if for every non-empty finite subset J of I and every choice of sets (Bi ∈ Bi)i∈J, one has

P(∏_{i∈J} Bi × ∏_{i∈I\J} Ei) = ∏_{i∈J} µi(Bi).

Theorem 4.6. There is always a unique product measure.

Proof. The basic idea is the same as for the case of finite I, namely to apply Carathéodory's¹⁰ extension theorem. To do so one has to show that

1. the set S of sets of the form ∏_{i∈J} Bi × ∏_{i∈I\J} Ei is a semi-ring, i.e.

   (a) ∅ ∈ S,

   (b) for all A, B ∈ S, we have A ∩ B ∈ S, and

   (c) for all A, B ∈ S, there exist finitely many pairwise disjoint sets K1, . . . , Kn ∈ S such that A \ B = ⋃_{i=1}^n Ki, and

2. the function P : S → R+ is a pre-measure, i.e.

   (a) P(∅) = 0,

   (b) for every countable collection (Ai)i∈I of pairwise disjoint elements of S whose union ⋃_{i∈I} Ai also belongs to S, one has

       P(⋃_{i∈I} Ai) = ∑_{i∈I} P(Ai).

Once one has shown this, the result follows immediately from Carathéodory's extension theorem. Since the details are very technical, they are omitted. They can be found e.g. in the book Wahrscheinlichkeitstheorie by Bauer or the book Probability with Martingales by Williams.

⁹ René Descartes (1596–1650), French philosopher and mathematician
¹⁰ Constantin Carathéodory (1873–1950), Greek mathematician

Remark 4.7. A family (Xi)i∈I of random variables (each Xi taking values in a measurable space (Ei, Bi)) can be regarded as one random variable X with values in the product space (∏_{i∈I} Ei, ⊗_{i∈I} Bi). (Check that X is indeed measurable with respect to A and ⊗_{i∈I} Bi!) The joint distribution of the family (Xi)i∈I is then the image measure of P under X (hence a probability measure on (∏_{i∈I} Ei, ⊗_{i∈I} Bi)).

Proposition 4.8. A family (Xi)i∈I of random variables (each Xi taking values in a measurable space (Ei, Bi)) is independent if and only if their joint distribution is equal to the product of their distributions.

Proof. Let us first assume that the variables (Xi)i∈I are independent. To show that P_{(Xi)i∈I} = ⊗_{i∈I} PXi, it suffices to show that these two measures agree on every set of the form ∏_{i∈J} Bi × ∏_{i∈I\J} Ei, where J ⊆ I is finite and Bi ∈ Bi for every i ∈ J. Using the independence of the variables Xi (in the form of (5)) and the definition of the product measure, one obtains

P_{(Xi)i∈I}(∏_{i∈J} Bi × ∏_{i∈I\J} Ei) = P{Xi ∈ Bi ∀ i ∈ J}
                                       = ∏_{i∈J} P{Xi ∈ Bi}
                                       = ∏_{i∈J} PXi(Bi)
                                       = (⊗_{i∈I} PXi)(∏_{i∈J} Bi × ∏_{i∈I\J} Ei).

Conversely, let us now assume that P_{(Xi)i∈I} = ⊗_{i∈I} PXi. Then we have to show that

P(⋂_{i∈J} {Xi ∈ Bi}) = ∏_{i∈J} P{Xi ∈ Bi}

for every finite subset J of I and every choice of sets (Bi ∈ Bi)i∈J:

P(⋂_{i∈J} {Xi ∈ Bi}) = P_{(Xi)i∈I}(∏_{i∈J} Bi × ∏_{i∈I\J} Ei)
                     = (⊗_{i∈I} PXi)(∏_{i∈J} Bi × ∏_{i∈I\J} Ei)
                     = ∏_{i∈J} PXi(Bi)
                     = ∏_{i∈J} P{Xi ∈ Bi}.

4.3 Products of independent random variables

Proposition 4.9. Let X1, . . . , Xn be independent random variables on the same probability space (Ω, A, P) which are all nonnegative or all integrable. Then

E(∏_{i=1}^n Xi) = ∏_{i=1}^n E(Xi).

Moreover, in the second case, the product ∏_{i=1}^n Xi is integrable as well.

Proof. Using the transformation lemma, the independence of the random variables Xi and Tonelli's theorem¹¹ we obtain

E(∏_{i=1}^n |Xi|) = ∫_{R^n} ∏_{i=1}^n |xi| dP_{(X1,...,Xn)}
                  = ∫_{R^n} ∏_{i=1}^n |xi| d(⊗_{i=1}^n PXi)
                  = ∏_{i=1}^n ∫_R |xi| dPXi
                  = ∏_{i=1}^n E(|Xi|),

which shows the first statement for nonnegative variables as well as the second statement. Consequently, if the Xi are all integrable, we may drop the absolute value signs, and the third step in the above computation is now justified by Fubini's¹² theorem instead of Tonelli's.

Corollary 4.10. Independent integrable random variables are pairwise uncorrelated.

¹¹ L. Tonelli, Sull'integrazione per parti. Atti Accad. Naz. Lincei 18 (1909), 246–253. (Leonida Tonelli (1885–1946), Italian mathematician)
¹² G. Fubini, Sugli integrali multipli. Rend. Acad. Lincei 16 (1907), 608–614. (Guido Fubini (1879–1943), Italian mathematician)

5 Convergence of random variables and distributions

Definition 5.1. Let (Xn)n∈N be a sequence of R^d-valued random variables and X another R^d-valued random variable, all defined on the same probability space (Ω, A, P). We say that the sequence (Xn)n∈N converges to X

1. almost surely if P{Xn → X} = 1,

2. in probability if for all ε > 0 we have lim_{n→∞} P{|Xn − X| > ε} = 0,

3. in L^p (p ≥ 1) if E(|Xn − X|^p) → 0.

Proposition 5.2. Let p ≤ q. Then convergence in L^q implies convergence in L^p.

Proof. By Corollary 3.3 we have

E(|Xn − X|^p) ≤ E(|Xn − X|^q)^{p/q} → 0.

Proposition 5.3. Convergence in Lp for some p ≥ 1 implies convergence in probability.

Proof. Using Chebyshev's inequality we obtain

P{|Xn − X| > ε} ≤ (1/ε^p) E(|Xn − X|^p) → 0.

Lemma 5.4. The sequence (Xn)n∈N converges almost surely to X if and only if for all ε > 0

lim_{n→∞} P{sup_{m≥n} |Xm − X| > ε} = 0.

Proof.

Xn → X a.s. ⇔ P{Xn → X} = 1
            ⇔ P[∀ k ∈ N ∃ n ∈ N ∀ m ≥ n : |Xm − X| ≤ 1/k] = 1
            ⇔ P(⋂_{k∈N} ⋃_{n∈N} ⋂_{m≥n} {|Xm − X| ≤ 1/k}) = 1
            ⇔ ∀ k ∈ N : P(⋃_{n∈N} ⋂_{m≥n} {|Xm − X| ≤ 1/k}) = 1
            ⇔ ∀ ε > 0 : P(⋃_{n∈N} ⋂_{m≥n} {|Xm − X| ≤ ε}) = 1
            ⇔ ∀ ε > 0 : P(⋂_{n∈N} ⋃_{m≥n} {|Xm − X| > ε}) = 0
            ⇔ ∀ ε > 0 : lim_{n→∞} P(⋃_{m≥n} {|Xm − X| > ε}) = 0
            ⇔ ∀ ε > 0 : lim_{n→∞} P[∃ m ≥ n : |Xm − X| > ε] = 0
            ⇔ ∀ ε > 0 : lim_{n→∞} P[sup_{m≥n} |Xm − X| > ε] = 0.

Corollary 5.5. Almost sure convergence implies convergence in probability.

Proof. Use the preceding lemma and note that

|Xn − X| ≤ sup_{m≥n} |Xm − X|.

Warning. Almost sure convergence does not imply convergence in L^p, nor does convergence in L^p imply almost sure convergence.

Example 5.6. Let Ω := [0, 1], A its Borel σ-field, P the Lebesgue measure and Xn := n² · 1_{(0,1/n)}. Then Xn → 0 a.s., but E(|Xn|^p) = n^{2p−1} → ∞, so that Xn does not converge to 0 in L^p.

Example 5.7. Convergence in L^p does not imply almost sure convergence. Take Ω := [0, 1] with the Lebesgue measure and let (Xn)n∈N be the “moving blocks” sequence of indicators of dyadic intervals, X1 := 1_{[0,1]}, X2 := 1_{[0,1/2]}, X3 := 1_{[1/2,1]}, X4 := 1_{[0,1/4]}, X5 := 1_{[1/4,1/2]}, . . ., where within each generation the intervals [(j − 1)/2^k, j/2^k], j = 1, . . . , 2^k, are listed from left to right. Then E(|Xn|^p) equals the length of the corresponding interval and tends to 0, so Xn → 0 in L^p; but every ω ∈ [0, 1] lies in one interval of every generation, so Xn(ω) = 1 for infinitely many n, and the sequence converges at no point ω.
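A small sketch of this moving-blocks example (illustrative; block is a helper introduced only here, returning the generation k and the position j of the n-th interval in the chosen 0-based enumeration):

    def block(n):
        """Generation k and position j of the n-th interval in the enumeration
        [0,1], [0,1/2], [1/2,1], [0,1/4], [1/4,1/2], ..."""
        k = 0
        while n >= 2 ** (k + 1) - 1:     # generations 0, 1, 2, ... contain 1, 2, 4, ... intervals
            k += 1
        return k, n - (2 ** k - 1)

    def X(n, w):
        k, j = block(n)
        return 1.0 if j / 2 ** k <= w <= (j + 1) / 2 ** k else 0.0

    omega = 0.3                                       # any fixed point of [0, 1]
    print("n with X_n(0.3) = 1:", [n for n in range(63) if X(n, omega) == 1.0])
    print("E(X_n) = interval length:", [1 / 2 ** block(n)[0] for n in (0, 1, 3, 7, 15, 31)])
    # The expectations tend to 0 (convergence in L^p), but every omega is hit in every generation.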

Definition 5.8. Let E be a topological space equipped with its Borel σ-field B(E).

1. A sequence (µn)n∈N of probability measures on E converges weakly to another probability measure µ if

   ∫_E f dµn → ∫_E f dµ

   for all bounded continuous functions f : E → R.

2. A sequence (Xn)n∈N of E-valued random variables (not necessarily defined on the same probability space) converges in distribution to another E-valued random variable X if the sequence of their distributions converges weakly to the distribution of X. By the transformation lemma this holds if and only if

   E(f(Xn)) → E(f(X))

   for all bounded continuous functions f : E → R.

Proposition 5.9. Let E = R^d. Then convergence in probability implies convergence in distribution.

Proof. We have to show that

E(f(Xn))→ E(f(X))

for all f ∈ Cb(Rd). Choose f ∈ Cb(Rd) and ε > 0. We will show that

|E(f(Xn))− E(f(X))| ≤ (4‖f‖∞ + 1)ε

for all sufficiently large n ∈ N:

Since B_R(0) ↑ R^d as R → ∞, we have P{X ∈ B_R(0)} → 1 as R → ∞. Consequently, there exists R > 0 such that

P(|X| > R) ≤ ε.

Since f is uniformly continuous on the compact set B_{2R}(0), there exists δ ∈ (0, R] such that

|f(x) − f(y)| ≤ ε

for all x, y ∈ B_{2R}(0) such that |x − y| ≤ δ. Moreover, the convergence in probability of Xn to X implies the existence of n0 ∈ N such that

P[|Xn − X| > δ] ≤ ε

for all n ≥ n0. It follows that

|E(f(Xn)) − E(f(X))| ≤ E(|f(Xn) − f(X)|)
    = ∫_Ω |f(Xn) − f(X)| dP
    = ∫_{{|X|>R}} |f(Xn) − f(X)| dP + ∫_{{|X|≤R, |Xn−X|≤δ}} |f(Xn) − f(X)| dP
      + ∫_{{|X|≤R, |Xn−X|>δ}} |f(Xn) − f(X)| dP
    ≤ 2‖f‖∞ P{|X| > R} + ε + 2‖f‖∞ P{|Xn − X| > δ}
    ≤ (4‖f‖∞ + 1)ε

for all n ≥ n0.

6 Laws of large numbers

Definition 6.1 (Law of large numbers). Let (Xn)n∈N be a sequence of integrable random variables with the same expectation, E(Xn) = µ for all n ∈ N. We say that the sequence satisfies the strong / weak / L^p (p ≥ 1) law of large numbers if

(1/n) ∑_{i=1}^n Xi

converges to µ almost surely / in probability / in L^p, respectively.

Remark 6.2. The results of the preceding section imply that both the strong and the L^p law of large numbers imply the weak law.

Example 6.3. Let (Xn)n∈N be a sequence of independent and identically distributed square-integrable random variables. Then it is very easy to show that it satisfies the L^2 and consequently also the weak law of large numbers (exercise!). To prove that it satisfies the strong law is much harder (see e.g. Theorem 12.1 in Bauer's book). We will later see a proof based on martingale theory.

7 Conditional expectation

Let us come back to the fair game example of the beginning of this course. There we gave the preliminary definition that a sequence (Xn)n∈N of random variables is a fair game if for all n ∈ N and all a1, . . . , an ∈ R such that P [X1 = a1, . . . , Xn = an] > 0 we have

E [Xn+1 |X1 = a1, . . . , Xn = an] = an.

Here the elementary conditional expectation of a random variable X given an event B of positive probability is defined as

E(X|B) := E(X · 1_B) / P(B).

Clearly this definition only makes sense if indeed P(B) > 0. Consequently, the above fair game definition works well if the variables (Xn)n∈N are all discrete, but not in general, because it can happen that P [X1 = a1, . . . , Xn = an] = 0 for all a1, . . . , an ∈ R. We therefore need a more general concept of conditional expectation.

Let X be a quasi-integrable random variable on a probability space (Ω, A, P), and let (Bi)i∈I be a countable collection of pairwise disjoint events with P(Bi) > 0 for all i ∈ I and such that ⋃_{i∈I} Bi = Ω. Then the conditional expectation of X given Bi is

E(X|Bi) = (1/P(Bi)) E(X · 1_{Bi}).

Let now

X0 := ∑_{i∈I} E(X|Bi) 1_{Bi}.

Then X0 is a random variable on (Ω, A, P) satisfying

X0(ω) = E(X|Bi)  if ω ∈ Bi.

In particular, X0 is constant on each Bi and therefore measurable with respect to C := σ((Bi)i∈I) = {⋃_{i∈J} Bi | J ⊂ I}. Moreover, for each i ∈ I,

∫_{Bi} X0 dP = E(X|Bi) P(Bi) = E(X · 1_{Bi}) = ∫_{Bi} X dP,

and consequently

∫_C X0 dP = ∫_C X dP

for all C ∈ C.
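Numerically, the partition case is just block-wise averaging. The following sketch (illustrative; the choice of X, of the partition and of the sample size is arbitrary) estimates E(X|C) for the partition Bi = {i ≤ ω < i + 1} of Ω = [0, 4):

    import numpy as np

    rng = np.random.default_rng(4)
    omega = rng.uniform(0, 4, size=100_000)        # sample points of Ω = [0, 4)
    X = np.sin(omega)                              # the random variable X(ω)
    label = np.floor(omega).astype(int)            # index i of the block B_i containing ω

    # E(X|C) is constant on each B_i, equal to E(X · 1_{B_i}) / P(B_i) (a block average)
    block_mean = {i: X[label == i].mean() for i in np.unique(label)}
    X0 = np.array([block_mean[i] for i in label])

    print("E(X|C) on the blocks:", {i: round(m, 4) for i, m in block_mean.items()})
    print("E(E(X|C)) ~", round(X0.mean(), 4), "  E(X) ~", round(X.mean(), 4))  # the two agree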

Definition 7.1. Let X be a quasi-integrable random variable on a probability space (Ω, A, P), and let C be a sub-σ-field of A. A quasi-integrable random variable X0 on (Ω, A, P) is called a conditional expectation of X given C if

1. X0 is measurable with respect to C, and

2. ∫_C X0 dP = ∫_C X dP for all C ∈ C.

To prove existence and uniqueness of the conditional expectation we need the Radon–Nikodym theorem:¹³

Theorem 7.2 (Radon–Nikodym). Let (Ω, A) be a measurable space, µ a σ-finite measure on it and ν a signed measure which is absolutely continuous with respect to µ, i.e. ν(A) = 0 whenever µ(A) = 0. Then ν has a density with respect to µ, i.e. there exists a quasi-integrable function f : Ω → R such that

ν(A) = ∫_A f dµ

for all A ∈ A. Moreover, f is unique µ-almost everywhere.

Theorem 7.3. A conditional expectation always exists. Moreover, it is almost surely unique (i.e. if X0 and X0′ are two conditional expectations, then X0 = X0′ almost surely).

¹³ J. Radon, Theorie und Anwendung der absolut additiven Mengenfunktionen. Sitzungsber. Kais. Akad. Wiss. Wien, Math.-Naturwiss. Kl. 112 (1913), 1295–1438. O. Nikodym, Sur une généralisation des intégrales de M. J. Radon. Fund. Math. 15 (1930), 131–179. (Johann Radon (1887–1956), Austrian mathematician; Otto Nikodym (1887–1974), Polish mathematician)

Proof. Let Q be the signed measure with density X with respect to P, i.e.

Q(A) := ∫_A X dP.

Then a C-measurable quasi-integrable random variable X0 is a conditional expectation of X given C if and only if

∫_C X0 dP = Q(C)

for all C ∈ C, i.e. if and only if X0 is a density of Q|_C with respect to P|_C. Since Q|_C is absolutely continuous with respect to P|_C (because Q is absolutely continuous with respect to P), we can apply the Radon–Nikodym theorem, which tells us that such a density exists and is almost surely unique.

Notation. The conditional expectation of X with respect to C is denoted by E^C(X) or E(X|C).

Warning. Contrary to the usual expectation and the elementary conditional expectation, the conditional expectation of a random variable with respect to a sub-σ-field is NOT a real number, but again a random variable. Moreover, it is only almost surely uniquely determined.

Proposition 7.4. The conditional expectation has the following properties:

1. If X is integrable, then so is E(X|C) (in particular it is almost surely finite).

2. If C = {∅, Ω}, then E(X|C) = E(X).

3. If C = A, then E(X|C) = X.

4. Let (Bi)i∈I be a countable collection of pairwise disjoint events with P(Bi) > 0 for all i ∈ I and such that ⋃_{i∈I} Bi = Ω, and let C := σ((Bi)i∈I) = {⋃_{i∈J} Bi | J ⊂ I}. Then

   E(X|C) = ∑_{i∈I} E(X|Bi) 1_{Bi}.

5. E(E(X|C)) = E(X) (often useful!).

6. If X is C-measurable, then E(X|C) = X.

7. If X and Y are integrable and α, β ∈ R, then E(αX+βY | C) = αE(X|C)+βE(Y |C).

8. If X ≤ Y , then E(X|C) ≤ E(Y |C).

9. |E(X|C)| ≤ E(|X| | C).

10. Whenever 0 ≤ Xn ↑ X, one has E(Xn|C) ↑ E(X|C) (conditional Beppo Levi).

11. If Xn → X a.e. and there exists an integrable random variable Y with |Xn| ≤ Y for all n ∈ N, then E(Xn|C) → E(X|C) (conditional Lebesgue).

12. If X is integrable and q : R → R is convex, then q(X) is quasi-integrable (as we already know), and

    E(q(X)|C) ≥ q(E(X|C))

    (conditional Jensen). Since both sides of this inequality are quasi-integrable, it follows that

    E(q(E(X|C))) ≤ E(q(X)).

    In particular, for p ≥ 1,

    E(|E(X|C)|^p) ≤ E(|X|^p).

    In particular, if X belongs to L^p, then so does E(X|C).

13. Let C1 ⊆ C2 ⊆ A. Then

E (E(X|C2) | C1) = E(X|C1).

Proposition 7.5. Let X and Y both be nonnegative or such that X, Y and XY are all integrable. If X is C-measurable, then

E(XY|C) = X E(Y|C).

“Measurable factors can be taken out of the conditional expectation.”

Proof. Let Y0 := E(Y|C). By the definition of conditional expectation we have to show that

∫_C XY dP = ∫_C XY0 dP

for all C ∈ C. We proceed in several steps:

1. X is the indicator function of some C-measurable set C̃, i.e. X = 1_{C̃}. Then

   ∫_C 1_{C̃} Y dP = ∫_{C∩C̃} Y dP = ∫_{C∩C̃} Y0 dP = ∫_C 1_{C̃} Y0 dP.

2. X is elementary: This follows from 1. by linearity.

3. X and Y are both nonnegative: Let (Xn)n∈N be a sequence of nonnegative elementary C-measurable random variables such that Xn ↑ X. Then XnY ↑ XY and XnY0 ↑ XY0, so that by 2.

   ∫_C XY dP = ∫_C lim_{n→∞} XnY dP
             = lim_{n→∞} ∫_C XnY dP
             = lim_{n→∞} ∫_C XnY0 dP
             = ∫_C XY0 dP.

4. General case: Write XY = X^+Y^+ + X^−Y^− − X^+Y^− − X^−Y^+. Then

   ∫_C XY dP = ∫_C X^+Y^+ dP + ∫_C X^−Y^− dP − ∫_C X^+Y^− dP − ∫_C X^−Y^+ dP
             = ∫_C X^+Y0^+ dP + ∫_C X^−Y0^− dP − ∫_C X^+Y0^− dP − ∫_C X^−Y0^+ dP
             = ∫_C XY0 dP.

Corollary 7.6. Let X and Y both be nonnegative or such that X, Y and XY are all integrable. If X is C-measurable, then

E(XY) = E(X E(Y|C)).

Proof. Take expectation on both sides of the previous proposition.

Proposition 7.7. If X is independent of C, then

E(X|C) = E(X).

Proof. Let C ∈ C. Then

∫_C X dP = E(X · 1_C) = E(X) P(C) = ∫_C E(X) dP.

Lemma 7.8. Let X, Y ∈ L^2, and let Y be C-measurable. Write X0 := E(X|C). Then

E((X − Y)²) = E((X − X0)²) + E((X0 − Y)²).

Remark 7.9. In the case C = {∅, Ω}, this simplifies to the well-known formula

E((X − a)²) = Var(X) + (E(X) − a)²

for all a ∈ R. “Mean squared error = variance + squared bias.”

Proof of Lemma 7.8. By Corollary 7.6 we have

E(XY) = E(X0Y)   and   E(XX0) = E(X0²).

Consequently

E((X − X0)²) + E((X0 − Y)²) = E(X²) − 2E(XX0) + 2E(X0²) − 2E(X0Y) + E(Y²)
                            = E(X²) − 2E(XY) + E(Y²)
                            = E((X − Y)²).

Corollary 7.10. If X ∈ L^2(Ω, A, P), E(X|C) is the unique (up to almost sure equality) best approximation of X in the subspace L^2(Ω, C, P). In this sense E(X|C) is the best possible prediction of X given the information contained in C.

Remark 7.11. This observation leads to an alternative proof of existence for the conditional expectation using the projection theorem of Hilbert space¹⁴ theory.

Theorem 7.12 (Projection theorem). Let H be a Hilbert space (in our case H = L^2(Ω, A, P)), x ∈ H (in our case x = X) and S a non-empty convex closed subset of H (in our case S = L^2(Ω, C, P); being itself a Hilbert space it is complete and therefore closed).

1. There is a unique best approximation x0 to x in S, i.e. an element x0 of S satisfying

‖x− x0‖ ≤ ‖x− y‖

for all y ∈ S.

2. If S is a linear subspace of H, x0 is characterized by the property that x − x0 is orthogonal to S, i.e. 〈x − x0, y〉 = 0 for all y ∈ S. Picture!!!

¹⁴ D. Hilbert, J. von Neumann, L. Nordheim, Über die Grundlagen der Quantenmechanik, Math. Ann. 98 (1928), 1–30. (David Hilbert (1862–1943), German mathematician)

Alternative proof of Theorem 7.3.

1. Assume first that X ∈ L^2(Ω, A, P), and let X0 be the best approximation to X in L^2(Ω, C, P). It satisfies E(XY) = E(X0Y) for all Y ∈ L^2(Ω, C, P). By choosing Y = 1_C we obtain

   ∫_C X dP = ∫_C X0 dP

   for all C ∈ C, so that X0 is a conditional expectation of X given C.

2. If X is nonnegative, choose a sequence (Xn)n∈N in L^2(Ω, A, P) such that 0 ≤ Xn ↑ X (use the fact that all elementary random variables belong to L^2). Then the sequence (X_{n,0})n∈N of their best approximations is non-decreasing, so that X0 := lim_{n→∞} X_{n,0} exists, and

   ∫_C X dP = lim_{n→∞} ∫_C Xn dP = lim_{n→∞} ∫_C X_{n,0} dP = ∫_C X0 dP

   for all C ∈ C.

3. If X is quasi-integrable, choose X0 := (X^+)_0 − (X^−)_0, where (X^+)_0 and (X^−)_0 are the conditional expectations of X^+ and X^− constructed in 2.

8 Stochastic processes and martingales

Definition 8.1. Let (Ω, A, P) be a probability space, (E, B) a measurable space and I a set. A stochastic process on (Ω, A, P) with state space (E, B) and index set I is a family (Xt)t∈I of (E, B)-valued random variables on (Ω, A, P).

Remark 8.2. Usually I ⊆ R, and then t ∈ I is interpreted as “time”. The two most important cases are

1. I = N0, “discrete time”, and

2. I = R+, “continuous time”.

Definition 8.3. Let I be an ordered set, i.e. a set equipped with a binary relation ≤ which is

1. reflexive, i.e. t ≤ t for all t ∈ I,

2. transitive, i.e. (s ≤ t and t ≤ u)⇒ s ≤ u, and

3. antisymmetric, i.e. (s ≤ t and t ≤ s)⇒ s = t.

(As above, usually I ⊆ R, in particular I = N0 or I = R+.) In this context we define:

1. A filtration on (Ω, A, P) indexed by I is a family (Ft)t∈I of sub-σ-fields of A satisfying Fs ⊆ Ft for all s ≤ t.

2. A filtered probability space is a quadruple (Ω, A, P, (Ft)t∈I) where (Ω, A, P) is a probability space and (Ft)t∈I a filtration on it.

3. A stochastic process (Xt)t∈I on (Ω, A, P) is adapted to the filtration (Ft)t∈I if Xt is Ft-measurable for each t ∈ I.

4. The canonical filtration of a stochastic process (Xt)t∈I is given by

   F_t^X := σ(Xs, s ≤ t).

   It is the smallest filtration with respect to which the process (Xt)t∈I is adapted.

Remark 8.4. Ft is often interpreted as “the information available at time t”. Adaptedness of a process (Xt)t∈I means that the information available at time t contains at least the information on the values of Xs for s ≤ t. The canonical filtration of a process (Xt)t∈I contains exactly the information on the values of Xs for s ≤ t. If a stochastic process, but no filtration, is specified, we work with the canonical filtration.

Definition 8.5. Let (Ω, A, P, (Ft)t∈I) be a filtered probability space. A real-valued stochastic process (Mt)t∈I on (Ω, A, P) is called a martingale, submartingale, supermartingale with respect to the filtration (Ft)t∈I if

1. it is adapted to (Ft)t∈I,

2. E(|Mt|) < ∞ for all t ∈ I, and

3. E(Mt|Fs) =, ≥, ≤ Ms for all s ≤ t.

Remark 8.6.

1. Martingales describe fair games, supermartingales describe “real” games (i.e. games you can play in a casino). Submartingales describe games from the point of view of the casino.

2. The third condition in the previous definition can also be formulated as

   ∫_A Mt dP =, ≥, ≤ ∫_A Ms dP     (6)

   for all s ≤ t and all A ∈ Fs (hence without explicitly using the notion of conditional expectation).

3. Let (F̃t)t∈I be a smaller filtration (i.e. F̃t ⊆ Ft for all t ∈ I). If (Mt)t∈I is a martingale (submartingale, supermartingale) with respect to (Ft)t∈I and adapted to (F̃t)t∈I, then it is also a martingale (submartingale, supermartingale) with respect to (F̃t)t∈I. (Use (6).)

In the sequel we will be mainly interested in the case of discrete time, i.e. I = N0. (The study of martingales in the case of continuous time, i.e. I = R+, is the topic of stochastic analysis, see next semester.)

Proposition 8.7. Let (Mn)n∈N0 be a martingale (submartingale, supermartingale) with respect to a filtration (Fn)n∈N0, and let (Xn)n∈N0 be a bounded adapted process which we assume to be non-negative if (Mn)n∈N0 is only a sub- or supermartingale. Then the (sub-/super-)martingale transform of M by X defined by

(X · M)n := ∑_{k=1}^n X_{k−1}(Mk − M_{k−1})

is also a martingale (submartingale, supermartingale).

Proof.

E((X · M)_{n+1} − (X · M)n | Fn) = E(Xn(M_{n+1} − Mn) | Fn)
                                 = Xn E(M_{n+1} − Mn | Fn)
                                 =, ≥, ≤ 0.

Example 8.8. Let Mn be the price of a financial asset at time n, and let Xn be the number of shares you hold from time n to time n + 1 (since Xn has to be known at time n, the process (Xn)n∈N0 should be adapted). Then (X · M)n is the amount of money you have won (or lost) between time 0 and time n, and the proposition says that if the original asset is a martingale (submartingale, supermartingale), then so is your own value process. (If the original asset is only a sub- or supermartingale, one has to assume that Xn ≥ 0 for all n ∈ N0, i.e. that no short-selling (Leerverkauf, vente à découvert) occurs.)
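A minimal sketch of the martingale transform (illustrative; M is a simple random walk and X the bounded adapted strategy “hold one share while M is nonnegative”; all sizes and the seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)
    paths, n = 100_000, 50

    steps = rng.choice([-1, 1], size=(paths, n))
    M = np.concatenate([np.zeros((paths, 1)), steps.cumsum(axis=1)], axis=1)   # M_0, ..., M_n

    # adapted strategy: X_k is a function of M_0, ..., M_k only
    X = (M[:, :-1] >= 0).astype(float)                                         # X_0, ..., X_{n-1}

    # (X · M)_n = sum_{k=1}^n X_{k-1} (M_k - M_{k-1})
    transform = np.cumsum(X * (M[:, 1:] - M[:, :-1]), axis=1)

    print("E((X·M)_n) for n = 1, 10, 50:",
          [round(transform[:, k].mean(), 4) for k in (0, 9, 49)])   # all close to 0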

Definition 8.9. Let (Ω, A, P, (Fn)n∈N0) be a filtered probability space. A stochastic process (Xn)n∈N0 is predictable if

1. X0 is constant, and

2. Xn is Fn−1-measurable for all n ≥ 1.

“The value of Xn can be predicted given the information available at time n− 1.”

Lemma 8.10. Let (Mn)n∈N0 be a predictable martingale. Then it is constant, i.e. there exists c ∈ R such that Mn = c for all n ∈ N0.

Proof. By induction: since (Mn)n∈N0 is predictable, there exists a constant c ∈ R such that M0 = c. Suppose we have proved that Mn = c. Then predictability and the martingale property imply that

M_{n+1} = E(M_{n+1}|Fn) = Mn = c.

Proposition 8.11 (Doob–Meyer decomposition). Let (Ω, A, P, (Fn)n∈N0) be a filtered probability space and (Xn)n∈N0 an adapted integrable process. Then there exist a martingale (Mn)n∈N0 and a predictable process (An)n∈N0 such that

Xn = Mn + An

for all n ∈ N0. Moreover,

1. this decomposition is unique up to an additive constant (if Xn = M̃n + Ãn is another decomposition of this type, there exists a constant c ∈ R such that for all n ∈ N0 we have M̃n = Mn + c and Ãn = An − c).

2. If the process (Xn)n∈N0 is a submartingale (supermartingale), the process (An)n∈N0 is increasing (decreasing), i.e. A_{n+1} ≥ An (A_{n+1} ≤ An) for all n ∈ N0.

Proof. Uniqueness up to an additive constant is clear from the preceding lemma (the process M̃n − Mn = An − Ãn is a predictable martingale and hence constant). For existence let

An := ∑_{k=1}^n E(Xk − X_{k−1} | F_{k−1})

and

Mn := Xn − An.

Clearly, A0 = 0 (in particular it is constant), and An is F_{n−1}-measurable for all n ∈ N, so that the process (An)n∈N0 is predictable. Moreover, the process (Mn)n∈N0 is clearly adapted and integrable, and for n ≥ 1

E(Mn − M_{n−1} | F_{n−1}) = E(Xn − X_{n−1} − (An − A_{n−1}) | F_{n−1})
                          = E(Xn − X_{n−1} − E(Xn − X_{n−1} | F_{n−1}) | F_{n−1})
                          = 0,

so that the process (Mn)n∈N0 is indeed a martingale. Finally, if the process (Xn)n∈N0 is a submartingale (supermartingale), we have An − A_{n−1} = E(Xn − X_{n−1} | F_{n−1}) ≥ 0 (≤ 0) for all n ≥ 1.

9 Stopping times

Definition 9.1. Let (Ω, A, P, (Fn)n∈N0) be a filtered probability space. A random variable T on (Ω, A, P, (Fn)n∈N0) with values in N0 ∪ {∞} is called a stopping time if for all n ∈ N0 we have

{T ≤ n} ∈ Fn.

Interpretation. At any time n we know whether T has occurred.

Remark 9.2. An equivalent condition is that

{T = n} ∈ Fn

for all n ∈ N0.

Proof. Use {T = n} = {T ≤ n} \ {T ≤ n − 1} and {T ≤ n} = ⋃_{k=0}^n {T = k}.

Example 9.3. Let (E, B) be a measurable space, (Xn)n∈N0 an adapted (E, B)-valued stochastic process, B ∈ B, and let

TB := inf{n ∈ N0 | Xn ∈ B}  if such an n exists, and TB := ∞ otherwise,

be the first hitting time of B. TB is a stopping time because

{TB ≤ n} = {∃ k ≤ n : Xk ∈ B} = ⋃_{k=0}^n {Xk ∈ B} ∈ Fn.

On the other hand, the last hitting time of B is not in general a stopping time.

Notation. Let T be a stopping time on a filtered probability space (Ω, A, P, (Fn)n∈N0) and (Xn)n∈N0 an adapted (E, B)-valued stochastic process. We write

FT := {A ∈ A | A ∩ {T ≤ n} ∈ Fn for all n ∈ N0}

and

XT(ω) := X_{T(ω)}(ω)  if T(ω) < ∞.

Exercise. Show that FT is a σ-field and that T and XT : {T < ∞} → E are FT-measurable.

Proposition 9.4 (Optional stopping). Let (Mn)n∈N0 be a martingale (submartingale, supermartingale), and let T be a stopping time. Then the stopped process (M_{T∧n})n∈N0 is a martingale (submartingale, supermartingale) as well.

Proof. Let Xn := 1_{{T ≥ n+1}}. The process (Xn)n∈N0 is adapted because {T ≥ n + 1} = {T ≤ n}^c ∈ Fn. Moreover, the martingale transform of M by X equals

(X · M)n = ∑_{k=1}^n X_{k−1}(Mk − M_{k−1})
         = ∑_{k=1}^n 1_{{T ≥ k}}(Mk − M_{k−1})
         = ∑_{k=1}^{T∧n} (Mk − M_{k−1})
         = M_{T∧n} − M0,

so that M_{T∧n} − M0 equals the martingale transform of M by X and is therefore a martingale (submartingale, supermartingale) as well (note that the process (Xn)n∈N0 is non-negative).
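A simulation of the stopped process (illustrative; here T is the first exit time of the interval (−a, b), and all parameters and the seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(7)
    paths, n, a, b = 50_000, 100, 3, 5

    steps = rng.choice([-1, 1], size=(paths, n))
    M = np.concatenate([np.zeros((paths, 1)), steps.cumsum(axis=1)], axis=1)

    hit = (M <= -a) | (M >= b)
    T = np.where(hit.any(axis=1), hit.argmax(axis=1), n)       # first exit time (capped at n)

    idx = np.arange(n + 1)
    stopped = np.where(idx[None, :] <= T[:, None], M, M[np.arange(paths), T][:, None])

    print("E(M_{T∧k}) for k = 10, 50, 100:",
          [round(stopped[:, k].mean(), 4) for k in (10, 50, 100)])   # ~ 0 for every k
    print("P(exit at +b) ~", round(np.mean(stopped[:, -1] >= b), 4),
          " (gambler's ruin value a/(a+b) =", a / (a + b), ")")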

Theorem 9.5 (Optional sampling theorem). Let (Xn)n∈N0 be a martingale (submartingale, supermartingale), and let S ≤ T be two bounded stopping times. Then

E(XT | FS) =, ≥, ≤ XS.

Proof (for supermartingales only). Let k be an upper bound for T. We have to show that

∫_A XT dP ≤ ∫_A XS dP

for all A ∈ FS.

Let us first assume that T ≤ S + 1. Then

∫_A XT dP = ∫_{A∩{S=T}} XT dP + ∫_{A∩{S<T}} XT dP
          = ∫_{A∩{S=T}} XS dP + ∑_{i=0}^k ∫_{A∩{S=i}∩{T>i}} X_{i+1} dP
          ≤ ∫_{A∩{S=T}} XS dP + ∑_{i=0}^k ∫_{A∩{S=i}∩{T>i}} Xi dP
          = ∫_A XS dP.

For the general case, let Si := (S + i) ∧ T. Then S = S0 ≤ . . . ≤ Sk = T and S_{i+1} ≤ Si + 1 for all i ∈ {0, . . . , k − 1}. Consequently,

∫_A XT dP = ∫_A X_{Sk} dP ≤ . . . ≤ ∫_A X_{S0} dP = ∫_A XS dP.

Corollary 9.6. Let (Xn)n∈N0 be a martingale (submartingale, supermartingale), and let T1 ≤ . . . ≤ Tk be k bounded stopping times. Then the sequence (X_{Ti})_{i=1}^k is a martingale (submartingale, supermartingale) with respect to the filtration (F_{Ti})_{i=1}^k.

Remark 9.7. If T is almost surely finite, but not bounded, the result is wrong.

Example 9.8. Let (ξi)i∈N be a sequence of independent random variables with P(ξi = 1) = P(ξi = −1) = 1/2, and let Xn := ∑_{i=1}^n ξi. Then the process (Xn)n∈N is a martingale (exercise!). Moreover, let S := 0 and T := inf{n ∈ N0 : Xn = 1}. T is almost surely finite because the process (Xn)n∈N is a one-dimensional simple random walk, which is known to be recurrent. We have XT = 1 and consequently E(XT | F0) = 1 ≠ X0 = 0.

10 Maximal inequalities

Proposition 10.1. Let (Xn)n∈N0 be a submartingale on a filtered probability space (Ω, A, P, (Fn)n∈N0). Then for all n ∈ N0 and all c > 0

P(max_{0≤k≤n} Xk ≥ c) ≤ (1/c) E(Xn · 1_{{max_{0≤k≤n} Xk ≥ c}}) ≤ (1/c) E(Xn^+).

Proof. Fix n ∈ N0, and let X*_n := max{Xk : k ≤ n} and

T := n ∧ inf{k ∈ N0 : Xk ≥ c}.

T is a stopping time and T ≤ n. Therefore the optional sampling theorem implies that

E(Xn | FT) ≥ XT,

so that

∫_A Xn dP ≥ ∫_A XT dP

for all A ∈ FT. We want to apply this to the event A := {X*_n ≥ c}, so we have to check that A ∈ FT, i.e. A ∩ {T ≤ k} ∈ Fk for all k ∈ N0. For k ≤ n we have

A ∩ {T ≤ k} = {X*_n ≥ c} ∩ {X*_k ≥ c} = {X*_k ≥ c} ∈ Fk ⊆ Fn,

and for k > n we have

A ∩ {T ≤ k} = {X*_n ≥ c} ∈ Fn.

Consequently A ∈ FT, so that

c P{X*_n ≥ c} ≤ ∫_{{X*_n ≥ c}} XT dP ≤ ∫_{{X*_n ≥ c}} Xn dP ≤ E(Xn^+).

Proposition 10.2. Let (Xn)n∈N0 be a martingale on a filtered probability space (Ω, A, P, (Fn)n∈N0). Then for all p > 1 and all n ∈ N0

E(max_{0≤k≤n} |Xk|^p) ≤ (p/(p − 1))^p E(|Xn|^p).

Proof. Let Y := max_{0≤k≤n} |Xk| and q := p/(p − 1) (so that 1/p + 1/q = 1). Since for n ≥ m we have

E(|Xn| | Fm) ≥ |E(Xn | Fm)| = |Xm|,

the process (|Xn|)n∈N0 is a submartingale. Consequently the previous proposition implies that for all c > 0

P(|Y| ≥ c) ≤ (1/c) E(|Xn| · 1_{{|Y| ≥ c}}).

Consequently

E(Y^p) = E(∫_0^Y p t^{p−1} dt)
       = E(∫_0^∞ p t^{p−1} 1_{{Y ≥ t}} dt)
       = ∫_0^∞ E(p t^{p−1} 1_{{Y ≥ t}}) dt
       = ∫_0^∞ p t^{p−1} P{Y ≥ t} dt
       ≤ ∫_0^∞ p t^{p−2} E(|Xn| 1_{{Y ≥ t}}) dt
       = E(∫_0^∞ p t^{p−2} |Xn| 1_{{Y ≥ t}} dt)
       = E(|Xn| ∫_0^Y p t^{p−2} dt)
       = (p/(p − 1)) E(|Xn| Y^{p−1})
       ≤ (p/(p − 1)) E(|Xn|^p)^{1/p} E(Y^{(p−1)q})^{1/q}
       = (p/(p − 1)) E(|Xn|^p)^{1/p} E(Y^p)^{(p−1)/p},

so that

E(Y^p)^p ≤ (p/(p − 1))^p E(|Xn|^p) E(Y^p)^{p−1}

and therefore

E(Y^p) ≤ (p/(p − 1))^p E(|Xn|^p).

11 Martingale convergence theorems

Key observation: “A sequence of real numbers converges in R̄ = [−∞, +∞] if and only if it does not cross any nontrivial interval infinitely often.”

11.1 Doob’s downcrossing inequality

Definition 11.1. Let (xn)n∈N0 be a sequence of real numbers, and let a < b. The number of downcrossings of the interval [a, b] by the sequence (xn)n∈N0 is

γ_{a,b}((xn)n∈N0) := sup{n ∈ N0 : ∃ k1 < . . . < k_{2n} ∈ N0 : x_{k1} > b, x_{k2} < a, . . . , x_{k_{2n−1}} > b, x_{k_{2n}} < a}.
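The counting in this definition can be made concrete by a greedy scan (an illustrative helper, not part of the formal development): wait for a value above b, then for a value below a, and count each completed pair as one downcrossing.

    def downcrossings(xs, a, b):
        """Number of downcrossings of [a, b] by the finite sequence xs (a < b)."""
        count, above = 0, False
        for x in xs:
            if not above and x > b:      # start of a potential downcrossing
                above = True
            elif above and x < a:        # downcrossing completed
                count += 1
                above = False
        return count

    print(downcrossings([0, 3, 1, -1, 4, 0, -2, 5], a=0, b=2))   # prints 2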

Lemma 11.2. Let (xn)n∈N0 be a sequence of real numbers. Then the following are equivalent:

1. The sequence converges in R̄ = [−∞, +∞].

2. γ_{a,b}((xn)n∈N0) is finite for all a < b ∈ R.

3. γ_{a,b}((xn)n∈N0) is finite for all a < b ∈ Q.

Proof. 1. ⇒ 2. Suppose that γ_{a,b}((xn)n∈N0) = ∞ for some a < b. Then lim inf_{n→∞} xn ≤ a and lim sup_{n→∞} xn ≥ b, so that the sequence (xn)n∈N0 does not converge in R̄.

2. ⇒ 3. This is trivial.

3. ⇒ 1. Suppose that the sequence (xn)n∈N0 does not converge in R̄. Then lim inf_{n→∞} xn < lim sup_{n→∞} xn, so that there exist a, b ∈ Q such that lim inf_{n→∞} xn < a < b < lim sup_{n→∞} xn. Consequently γ_{a,b}((xn)n∈N0) = ∞.

Lemma 11.3. Let X be a random variable with values in N0. Then

E(X) = ∑_{k=1}^∞ P(X ≥ k).

Proof.

∑_{k=1}^∞ P(X ≥ k) = ∑_{k=1}^∞ ∑_{l=k}^∞ P(X = l)
                   = ∑_{k=1}^∞ ∑_{l=1}^∞ 1_{{l≥k}} P(X = l)
                   = ∑_{l=1}^∞ ∑_{k=1}^∞ 1_{{l≥k}} P(X = l)
                   = ∑_{l=1}^∞ ∑_{k=1}^l P(X = l)
                   = ∑_{l=1}^∞ l P(X = l)
                   = E(X).

Theorem 11.4 (Doob's¹⁵ downcrossing inequality). Let (Xn)n∈N0 be a submartingale on a filtered probability space (Ω, A, P, (Fn)n∈N0), and let a < b. Then

E(γ_{a,b}((Xn)n∈N0)) ≤ (1/(b − a)) sup_{n∈N} E((Xn − b)^+).

Proof. For k ∈ N let

γ^k_{a,b}((Xn)n∈N0) := sup{n ∈ N0 : ∃ k1 < . . . < k_{2n} ∈ {0, . . . , k} : X_{k1} > b, X_{k2} < a, . . . , X_{k_{2n−1}} > b, X_{k_{2n}} < a}

be the number of downcrossings of the interval [a, b] by the first k + 1 terms of the sequence (Xn)n∈N0. Clearly

0 ≤ γ^k_{a,b}((Xn)n∈N0) ↑ γ_{a,b}((Xn)n∈N0),

so that by Beppo Levi's theorem

E(γ_{a,b}((Xn)n∈N0)) = lim_{k→∞} E(γ^k_{a,b}((Xn)n∈N0)).

¹⁵ Joseph L. Doob (1910–2004), American mathematician

Therefore it suffices to show that for each k ∈ N we have

E(γ^k_{a,b}((Xn)n∈N0)) ≤ (1/(b − a)) sup_{n∈N} E((Xn − b)^+).

We now fix k ∈ N and let

T1 := (k + 1) ∧ inf{n ∈ N0 | Xn > b},
T2 := (k + 1) ∧ inf{n > T1 | Xn < a},
T3 := (k + 1) ∧ inf{n > T2 | Xn > b}, etc.

Each Ti is a bounded stopping time (exercise!). Now let Ai := {Ti ≤ k} ∈ F_{Ti} (because Ti is F_{Ti}-measurable). Then we have:

1. A1 ⊇ A2 ⊇ . . .,

2. T_{2l} ≤ k ⇔ γ^k_{a,b}((Xn)n∈N0) ≥ l, so that P(γ^k_{a,b}((Xn)n∈N0) ≥ l) = P(A_{2l}),

3. X_{T_{2l−1}} > b on A_{2l−1} = {T_{2l−1} ≤ k},

4. and X_{T_{2l}} < a on A_{2l} = {T_{2l} ≤ k}.

By the optional sampling theorem we have

E(X_{T_{2l}} − b | F_{T_{2l−1}}) ≥ X_{T_{2l−1}} − b

and therefore

∫_{A_{2l−1}} (X_{T_{2l−1}} − b) dP ≤ ∫_{A_{2l−1}} (X_{T_{2l}} − b) dP,

and

E(X_{k+1} − b | F_{T_{2l}}) ≥ X_{T_{2l}} − b

and therefore

∫_{A_{2l−1}\A_{2l}} (X_{T_{2l}} − b) dP ≤ ∫_{A_{2l−1}\A_{2l}} (X_{k+1} − b) dP.

It follows that

0 ≤ ∫_{A_{2l−1}} (X_{T_{2l−1}} − b) dP ≤ ∫_{A_{2l−1}} (X_{T_{2l}} − b) dP
  = ∫_{A_{2l−1}\A_{2l}} (X_{T_{2l}} − b) dP + ∫_{A_{2l}} (X_{T_{2l}} − b) dP
  ≤ ∫_{A_{2l−1}\A_{2l}} (X_{k+1} − b) dP + ∫_{A_{2l}} (a − b) dP
  = ∫_{A_{2l−1}\A_{2l}} (X_{k+1} − b) dP − (b − a) P(A_{2l}),

so that

(b − a) P(A_{2l}) ≤ ∫_{A_{2l−1}\A_{2l}} (X_{k+1} − b) dP.

Consequently

E(γ^k_{a,b}((Xn)n∈N0)) = ∑_{l=1}^∞ P(γ^k_{a,b}((Xn)n∈N0) ≥ l)
                       = ∑_{l=1}^∞ P(T_{2l} ≤ k)
                       = ∑_{l=1}^∞ P(A_{2l})
                       ≤ (1/(b − a)) ∑_{l=1}^∞ ∫_{A_{2l−1}\A_{2l}} (X_{k+1} − b)^+ dP.

Since A1 ⊇ A2 ⊇ . . ., the sets (A_{2l−1} \ A_{2l})_{l∈N} are pairwise disjoint, so that

∑_{l=1}^∞ ∫_{A_{2l−1}\A_{2l}} (X_{k+1} − b)^+ dP = ∫_{⋃_{l∈N}(A_{2l−1}\A_{2l})} (X_{k+1} − b)^+ dP
                                                 ≤ E((X_{k+1} − b)^+)
                                                 ≤ sup_{n∈N} E((Xn − b)^+).

11.2 Doob’s convergence theorem

Lemma 11.5. For a submartingale (Xn)n∈N0 the following conditions are equivalent:

1. sup_{n∈N0} E(Xn^+) < ∞,

2. sup_{n∈N0} E(|Xn|) < ∞.

Proof. Since Xn^+ ≤ |Xn|, it is clear that the second condition implies the first one. Conversely, since

|Xn| = Xn^+ + Xn^− = 2Xn^+ − Xn,

we have

E(|Xn|) ≤ 2E(Xn^+) − E(X0),

so that the first condition implies the second one.

Theorem 11.6 (Doob's convergence theorem). Let (Xn)n∈N0 be a submartingale satisfying sup_{n∈N0} E(Xn^+) < ∞ (or, equivalently, sup_{n∈N0} E(|Xn|) < ∞). Then the sequence (Xn)n∈N0 converges almost surely to an integrable random variable X∞.

Proof. Since (Xn − b)^+ ≤ Xn^+ + b^−, Doob's downcrossing inequality implies that E(γ_{a,b}((Xn)n∈N0)) is finite for all a < b ∈ R. Consequently, γ_{a,b}((Xn)n∈N0) is finite almost surely for all a < b ∈ R. More precisely, letting Ω_{a,b} := {γ_{a,b}((Xn)n∈N0) < ∞}, we have P(Ω_{a,b}) = 1 for all a < b ∈ R. Now let

Ω* := ⋂_{a<b∈Q} Ω_{a,b}.

Then P(Ω*) = 1, and on Ω* the sequence (Xn)n∈N0 converges to an R̄-valued random variable X∞. By Fatou's lemma we have

E(|X∞|) ≤ lim inf_{n→∞} E(|Xn|) < ∞,

so that X∞ is in particular almost surely finite.

Now the following two questions naturally arise:

1. Does the convergence Xn → X∞ also take place in L^1?

2. Letting F∞ := σ(⋃_{n∈N0} Fn), is the process (Xn)_{n∈N0∪{∞}} a submartingale with respect to the filtration (Fn)_{n∈N0∪{∞}}?

Example 11.7. Let (ξi)i∈N be a sequence of independent random variables with P(ξi = 1) = P(ξi = −1) = 1/2, and let Xn := ∑_{i=1}^n ξi. Then the process (Xn)n∈N is a martingale (exercise!). Now fix k > 0, let Tk := inf{n ∈ N0 : Xn = k} and Yn := X_{n∧Tk}. By the optional stopping theorem the process (Yn)n∈N0 is a martingale as well. Moreover, Yn ≤ k for all n ∈ N0, so that we can apply Doob's convergence theorem:

∃ Y∞ : Yn → Y∞ a.s. (n → ∞).

Obviously, Y∞ = k. Since E(Yn) = 0 for all n ∈ N, but E(Y∞) = k > 0, we neither have Yn → Y∞ in L^1, nor is the process (Yn)_{n∈N0∪{∞}} a martingale with respect to the filtration (Fn)_{n∈N0∪{∞}}.
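A simulation of this example (illustrative, with k = 5 and arbitrary horizon, sample size and seed; in a finite-horizon simulation some paths have not yet reached k, which is exactly why E(Yn) stays at 0 although Yn → k almost surely):

    import numpy as np

    rng = np.random.default_rng(9)
    paths, horizon, k = 20_000, 500, 5

    steps = rng.choice([-1, 1], size=(paths, horizon))
    X = steps.cumsum(axis=1)                     # X_1, ..., X_horizon

    hit = X >= k
    T = np.where(hit.any(axis=1), hit.argmax(axis=1), horizon - 1)
    Y = X[np.arange(paths), T]                   # stopped walk Y_n at n = horizon

    print("fraction of paths with T_k <= horizon:", round(hit.any(axis=1).mean(), 4))
    print("E(Y_n) ~", round(Y.mean(), 4), "  almost-sure limit Y_inf =", k)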

However, the answer to the above questions is YES for uniformly integrable martingales.

11.3 Uniformly integrable martingales

Lemma 11.8. A random variable X is integrable if and only if

lim_{M→∞} ∫_{{|X|>M}} |X| dP = 0.     (7)

Proof. Assume first that (7) holds. Then ∫_{{|X|>M}} |X| dP is finite for some M, and consequently

E(|X|) = ∫_{{|X|≤M}} |X| dP + ∫_{{|X|>M}} |X| dP ≤ M + ∫_{{|X|>M}} |X| dP < ∞.

Conversely, assume that X is integrable. We have 0 ≤ |X| · 1_{{|X|≤M}} ↑ |X| as M → ∞, so that by Beppo Levi's theorem

∫_{{|X|≤M}} |X| dP ↑ E(|X|).

Consequently, since by assumption E(|X|) is finite,

∫_{{|X|>M}} |X| dP = E(|X|) − ∫_{{|X|≤M}} |X| dP → 0

as M → ∞.

Definition 11.9. A family (Xi)i∈I of random variables is said to be uniformly integrable if it satisfies (7) uniformly in i, i.e. if

lim_{M→∞} sup_{i∈I} ∫_{{|Xi|>M}} |Xi| dP = 0.     (8)

Remark 11.10. Any finite family of integrable random variables is clearly uniformly integrable.

Proposition 11.11. Let A(ε) := {A ∈ A | P(A) ≤ ε}. A family (X_i)_{i∈I} of random variables is uniformly integrable if and only if

1. it is bounded in L1, i.e. sup_{i∈I} E(|X_i|) < ∞, and

2. lim_{ε→0} sup_{A∈A(ε), i∈I} ∫_A |X_i| dP = 0.    (9)

Proof of Proposition 11.11. Assume first that the family (X_i)_{i∈I} is uniformly integrable. Then there exists M > 0 such that ∫_{|X_i|>M} |X_i| dP ≤ 1 for all i ∈ I. Consequently,

    E(|X_i|) = ∫_{|X_i|≤M} |X_i| dP + ∫_{|X_i|>M} |X_i| dP ≤ M + 1,

so that the family (X_i)_{i∈I} is indeed bounded in L1. To prove (9), we have to show that for any given δ > 0 there exists ε > 0 such that

    sup_{A∈A(ε), i∈I} ∫_A |X_i| dP ≤ δ.

To do so, we first choose M > 0 in such a way that

    sup_{i∈I} ∫_{|X_i|>M} |X_i| dP ≤ δ/2,

and then choose ε := δ/(2M). Then for all A ∈ A(ε) and all i ∈ I we have

    ∫_A |X_i| dP = ∫_{A∩{|X_i|≤M}} |X_i| dP + ∫_{A∩{|X_i|>M}} |X_i| dP ≤ M·P(A) + δ/2 ≤ δ,

which finishes the proof of the first implication.

Now assume that the family (X_i)_{i∈I} is bounded in L1 (say E(|X_i|) ≤ C for all i ∈ I) and satisfies (9). Then by Chebyshev's inequality, for each i ∈ I,

    P(|X_i| > M) ≤ (1/M) E(|X_i|) ≤ C/M,

i.e. {|X_i| > M} ∈ A(C/M). Consequently

    sup_{i∈I} ∫_{|X_i|>M} |X_i| dP ≤ sup_{A∈A(C/M), i∈I} ∫_A |X_i| dP → 0 (M → ∞).

Example 11.12. Let X be an integrable random variable on a probability space (Ω, A, P) and A* the set of all sub-σ-fields of A. Then the family (E(X|F))_{F∈A*} is uniformly integrable.


Proof. We have to show that

    sup_{F∈A*} ∫_{|E(X|F)|>M} |E(X|F)| dP → 0

as M → ∞. Chebyshev's inequality implies that for each sub-σ-field F of A we have

    P(|E(X|F)| > M) ≤ (1/M) E(|X|),

so that {|E(X|F)| > M} ∈ A(E(|X|)/M). Consequently

    sup_{F∈A*} ∫_{|E(X|F)|>M} |E(X|F)| dP ≤ sup_{F∈A*} ∫_{|E(X|F)|>M} E(|X| | F) dP
                                          = sup_{F∈A*} ∫_{|E(X|F)|>M} |X| dP
                                          ≤ sup_{A∈A(E(|X|)/M)} ∫_A |X| dP
                                          → 0 (M → ∞).

Proposition 11.13 (Sufficient criteria for uniform integrability). Let (X_i)_{i∈I} be a family of random variables. Each of the following two conditions implies that it is uniformly integrable:

1. It is dominated in L1, i.e. there exists an integrable random variable X such that |X_i| ≤ X for all i ∈ I.

2. It is bounded in Lp for some p > 1, i.e. sup_{i∈I} E(|X_i|^p) < ∞.

Proof. If the first condition is satisfied, the family (X_i)_{i∈I} is clearly bounded in L1 (by E(|X|)). Moreover we have

    sup_{A∈A(ε), i∈I} ∫_A |X_i| dP ≤ sup_{A∈A(ε)} ∫_A |X| dP → 0 (ε → 0),

because every single integrable random variable is uniformly integrable.

Now assume that the second condition holds. Then

    sup_{i∈I} ∫_{|X_i|>M} |X_i| dP ≤ sup_{i∈I} ∫_{|X_i|>M} (|X_i|^p / M^{p−1}) dP ≤ (1/M^{p−1}) sup_{i∈I} E(|X_i|^p) → 0 (M → ∞).

Theorem 11.14 (Vitali's16 convergence theorem). Let (X_n)_{n∈N} be a sequence of integrable random variables, and let X be another integrable random variable. Then the following are equivalent:

1. X_n → X in L1.

2. The sequence (X_n)_{n∈N} is uniformly integrable, and X_n → X in probability.

16Giuseppe Vitali (1875–1932), Italian mathematician


Remark 11.15. Compare with Lebesgue’s dominated convergence theorem.

Proof of Theorem 11.14. 1. ⇒ 2. We already know that convergence in L1 implies convergence in probability. To show uniform integrability of the sequence (X_n)_{n∈N}, we first show uniform integrability of the sequence (X_n − X)_{n∈N}: for given ε > 0 let n_0(ε) ∈ N be such that E(|X_n − X|) ≤ ε for n > n_0(ε). Now write

    sup_{n∈N} ∫_{|X_n−X|>M} |X_n − X| dP
        = max( sup_{n≤n_0(ε)} ∫_{|X_n−X|>M} |X_n − X| dP,  sup_{n>n_0(ε)} ∫_{|X_n−X|>M} |X_n − X| dP ).

Since any finite family of integrable random variables is uniformly integrable, the first supremum is ≤ ε for M large enough. Moreover, the second supremum is bounded by sup_{n>n_0(ε)} E(|X_n − X|) ≤ ε.

Uniform integrability of the sequence (X_n − X)_{n∈N} implies that

    lim_{ε→0} sup_{A∈A(ε)} sup_{n∈N} ∫_A |X_n − X| dP = 0,

and consequently

    sup_{A∈A(ε)} sup_{n∈N} ∫_A |X_n| dP ≤ sup_{A∈A(ε)} sup_{n∈N} ∫_A |X_n − X| dP + sup_{A∈A(ε)} ∫_A |X| dP → 0 (ε → 0),

so that the sequence (X_n)_{n∈N} is uniformly integrable as well.

2. ⇒ 1. We first show that the sequence (X_n − X)_{n∈N} is uniformly integrable as well:

    sup_{A∈A(ε)} sup_{n∈N} ∫_A |X_n − X| dP ≤ sup_{A∈A(ε)} sup_{n∈N} ∫_A |X_n| dP + sup_{A∈A(ε)} ∫_A |X| dP → 0 (ε → 0).

To prove that X_n → X in L1 we have to show that for each ε > 0 we have

    E(|X_n − X|) ≤ ε

for all large enough n ∈ N. For each δ > 0 choose n_0(δ) ∈ N in such a way that

    P(|X_n − X| > δ) ≤ δ

for all n ≥ n_0(δ) (this is possible because of the convergence in probability of X_n → X). Then we have {|X_n − X| > δ} ∈ A(δ) for all n ≥ n_0(δ), so that

    sup_{n≥n_0(δ)} ∫_{|X_n−X|>δ} |X_n − X| dP ≤ sup_{A∈A(δ)} sup_{n∈N} ∫_A |X_n − X| dP → 0 (δ → 0).

Now choose δ > 0 in such a way that δ ≤ ε/2 and

    sup_{n≥n_0(δ)} ∫_{|X_n−X|>δ} |X_n − X| dP ≤ ε/2.

Then we have for all n ≥ n_0(δ):

    E(|X_n − X|) ≤ ∫_{|X_n−X|≤δ} |X_n − X| dP + ∫_{|X_n−X|>δ} |X_n − X| dP ≤ δ + ε/2 ≤ ε.


Theorem 11.16. Let (X_n)_{n∈N0} be a submartingale on a filtered probability space (Ω, A, P, (F_n)_{n∈N0}), and let F_∞ := σ(F_n, n ∈ N0).

1. It converges in L1 if and only if it is uniformly integrable.

2. In this case the process (X_n)_{n∈N0∪{∞}} (where X_∞ := lim_{n→∞} X_n) is a submartingale with respect to the filtration (F_n)_{n∈N0∪{∞}}.

Proof. Convergence in L1 implies uniform integrability by Vitali's convergence theorem. Conversely, assume that the submartingale is uniformly integrable. Then it is bounded in L1, so that by Doob's convergence theorem it converges almost surely to an integrable random variable X_∞. Together with uniform integrability this implies convergence in L1 by Vitali's convergence theorem.

Finally, to show that the process (X_n)_{n∈N0∪{∞}} is a submartingale, we have to show that

    ∫_A X_∞ dP ≥ ∫_A X_n dP

for all n ∈ N0 and all A ∈ F_n. Since X_n → X_∞ in L1 we have

    ∫_A X_∞ dP = lim_{m→∞} ∫_A X_m dP ≥ ∫_A X_n dP.

Theorem 11.17. Let (X_n)_{n∈N0} be a martingale on a filtered probability space (Ω, A, P, (F_n)_{n∈N0}), and let F_∞ := σ(F_n, n ∈ N0). Then the following conditions are equivalent:

1. The martingale (X_n)_{n∈N0} is uniformly integrable.

2. It converges in L1 to some limit X*.

3. It is closable, i.e. there exists an F_∞-measurable random variable X_∞ such that the process (X_n)_{n∈N0∪{∞}} is a martingale with respect to the filtration (F_n)_{n∈N0∪{∞}}.

Moreover, in this case X_∞ = X*.

Proof. 1. ⇔ 2. ⇒ 3. These implications follow from the previous theorem.

3. ⇒ 1. Since the family (E(X_∞|F))_{F sub-σ-field of A} is uniformly integrable, the sequence (X_n)_{n∈N0} is uniformly integrable as well.

To prove that X_∞ = X*, note that for all A ∈ ⋃_{m∈N0} F_m we have

    ∫_A X* dP = lim_{n→∞} ∫_A X_n dP = ∫_A X_∞ dP.    (10)

Since {A ∈ A | ∫_A X* dP = ∫_A X_∞ dP} is a Dynkin system and ⋃_{m∈N0} F_m is stable under intersection, it follows that (10) holds for all A ∈ F_∞. Since X* and X_∞ are both F_∞-measurable (X* being a limit of F_∞-measurable random variables and X_∞ by assumption), it follows that X_∞ = X*.

Theorem 11.18 (Lp-convergence theorem). Let (X_n)_{n∈N0} be a martingale satisfying sup_{n∈N0} E(|X_n|^p) < ∞ for some p > 1. Then it converges almost surely and in Lp.


Proof. Since (X_n)_{n∈N0} is bounded in Lp, it is uniformly integrable and therefore converges almost surely to some limit X_∞. To prove convergence in Lp, let Y := sup_{n∈N0} |X_n|. Clearly,

    |X_n − X_∞|^p ≤ (2Y)^p.

By the second maximal inequality,

    E(Y^p) = E(sup_{n∈N0} |X_n|^p) ≤ (p/(p−1))^p sup_{n∈N0} E(|X_n|^p) < ∞,

so that Y^p ∈ L1. Consequently, Lebesgue's dominated convergence theorem implies that

    lim_{n→∞} E(|X_n − X_∞|^p) = 0.

Remark 11.19. This is wrong for p = 1.

Example 11.20. Let (ξ_i)_{i∈N} be a sequence of independent random variables such that P(ξ_i = 1) = P(ξ_i = −1) = 1/2 for all i ∈ N. Let X_n := 1 + ∑_{i=1}^n ξ_i and Y_n := X_{T_0∧n}, where T_0 := inf{n ∈ N0 : X_n = 0}. Then the process (Y_n)_{n∈N0} is a non-negative martingale, so that

    E(|Y_n|) = E(Y_n) = E(Y_0) = 1.

Moreover, it converges almost surely to 0, but not in L1 (because E(Y_n) = 1 ≠ 0 for all n ∈ N0).

12 Backward martingales, sub- and supermartingales

So far we have mainly studied (sub-/super-)martingales with index set N0 = {0, 1, 2, …}. In this section we study “backward (sub-/super-)martingales”, i.e. (sub-/super-)martingales with index set −N0 = {…, −2, −1, 0}. In contrast to “usual” (sub)martingales, backward (sub)martingales always have the following nice properties:

1. Because of the convexity and monotonicity of the function x ↦ x^+ we have E(X_0^+ | F_n) ≥ (E(X_0 | F_n))^+ ≥ X_n^+ and therefore E(X_n^+) ≤ E(X_0^+) for all n ∈ −N0, so that every backward submartingale satisfies

    sup_{n∈−N0} E(X_n^+) = E(X_0^+) < ∞.

2. Every backward martingale is uniformly integrable (because the family (X_n)_{n∈−N0} is contained in the uniformly integrable family (E(X_0 | C))_{C∈A*}).

Theorem 12.1 (Downcrossing inequality for backward submartingales). Let (X_n)_{n∈−N0} be a backward submartingale on a filtered probability space (Ω, A, P, (F_n)_{n∈−N0}), and let a < b. Then

    E(γ_{a,b}((X_n)_{n∈−N0})) ≤ (1/(b − a)) sup_{n∈−N0} E((X_n − b)^+) = (1/(b − a)) E((X_0 − b)^+).

Proof. The first inequality can be obtained in the same way as the usual downcrossing inequality. For the second inequality note that convexity and monotonicity of the function x ↦ x^+ imply that

    E((X_0 − b)^+ | F_n) ≥ (E(X_0 − b | F_n))^+ ≥ (X_n − b)^+.


Corollary 12.2 (Convergence theorem for backward submartingales). Any backward submartingale (X_n)_{n∈−N0} converges almost surely as n → −∞.

Warning. The limit need not be finite.

Example 12.3. Let Xn := n. Then (Xn)n∈−N0 is a submartingale which converges to −∞as n→ −∞.

Theorem 12.4 (Convergence theorem for backward martingales). Any backward martingale (X_n)_{n∈−N0} converges almost surely and in L1 to E(X_0 | F_{−∞}) as n → −∞, where F_{−∞} := ⋂_{n∈−N0} F_n.

Proof. By the convergence theorem for backward submartingales the sequence (X_n)_{n∈−N0} converges almost surely to a (not necessarily finite) random variable X_{−∞} as n → −∞. Since the process (|X_n|)_{n∈−N0} is a submartingale, Fatou's lemma implies that

    E(|X_{−∞}|) ≤ lim inf_{n→−∞} E(|X_n|) ≤ E(|X_0|),

so that X_{−∞} is integrable. Since the sequence (X_n)_{n∈−N0} is uniformly integrable (because X_n = E(X_0 | F_n)), the convergence X_n → X_{−∞} also takes place in L1 (by Vitali's theorem). To show that X_{−∞} = E(X_0 | F_{−∞}), note that X_{−∞} is F_{−∞}-measurable, so that it suffices to check that

    ∫_A X_{−∞} dP = ∫_A E(X_0 | F_{−∞}) dP

for all A ∈ F_{−∞}: thanks to L1-convergence we have

    ∫_A X_{−∞} dP = lim_{n→−∞} ∫_A X_n dP = lim_{n→−∞} ∫_A E(X_n | F_{−∞}) dP = ∫_A E(X_0 | F_{−∞}) dP.

13 Applications of the martingale convergence theorems

Definition 13.1. Let (Ω, A, P) be a probability space and (A_n)_{n∈N0} a sequence of sub-σ-fields of A. Then the tail σ-field of the sequence (A_n)_{n∈N0} is defined as

    ⋂_{n∈N0} σ(A_m, m ≥ n).

Theorem 13.2 (Kolmogorov's17 zero-one law). Let (Ω, A, P) be a probability space and (A_n)_{n∈N0} an independent sequence of sub-σ-fields of A. Then every event A belonging to the tail σ-field of this sequence satisfies P(A) ∈ {0, 1}.

Proof. For each n ∈ N0 let F_n := σ(A_0, …, A_n), let F_∞ := σ(A_n, n ∈ N0), and let X_n := E(1_A | F_n) for n ∈ N0 ∪ {∞}. Then the sequence (X_n)_{n∈N0∪{∞}} is a closed martingale, so that

    X_n → X_∞

almost surely and in L1. Since A is independent of F_n for each n ∈ N0, we have

    X_n = E(1_A | F_n) = E(1_A) = P(A)

for each n ∈ N0. On the other hand, since A is measurable with respect to F_∞, we have

    X_∞ = E(1_A | F_∞) = 1_A.

Consequently, 1_A = P(A) almost surely, so that P(A) ∈ {0, 1}.

17 Andrey Nikolaevich Kolmogorov (1903–1987), Russian mathematician


Corollary 13.3. Let (X_n)_{n∈N} be a sequence of independent random variables. Then lim sup_{n→∞} (1/n) ∑_{i=1}^n X_i and lim inf_{n→∞} (1/n) ∑_{i=1}^n X_i are almost surely constant.

Proof. Let T be the tail σ-field of the sequence (σ(X_n))_{n∈N}. Since for every m ∈ N

    lim sup_{n→∞} (1/n) ∑_{i=1}^n X_i = lim sup_{n→∞} (1/n) ∑_{i=m}^n X_i,

this lim sup is measurable with respect to T. The claim now follows from Kolmogorov's zero-one law.

Theorem 13.4 (Kolmogorov's strong law of large numbers). Let (X_n)_{n∈N} be a sequence of independent, identically distributed and integrable random variables, and let S_n := ∑_{i=1}^n X_i. Then

    S_n/n → E(X_1) almost surely and in L1.

For the proof we need the following lemma:

Lemma 13.5. For each n ∈ N we have

    E(X_1 | σ(S_k, k ≥ n)) = S_n/n.

Proof. For reasons of symmetry

    E(X_i | σ(S_k, k ≥ n)) = E(X_1 | σ(S_k, k ≥ n))

for all i ∈ {1, …, n}. Therefore

    E(X_1 | σ(S_k, k ≥ n)) = (1/n) ∑_{i=1}^n E(X_i | σ(S_k, k ≥ n)) = E((1/n) ∑_{i=1}^n X_i | σ(S_k, k ≥ n)) = E(S_n/n | σ(S_k, k ≥ n)) = S_n/n.

Proof of Theorem 13.4. For each n ∈ N let F_{−n} := σ(S_k, k ≥ n) and Y_{−n} := S_n/n. Then (F_m)_{m∈−N} is a filtration, and the process (Y_m)_{m∈−N} is adapted to it. Moreover, by the lemma,

    Y_{−n} = S_n/n = E(X_1 | F_{−n}),

so that (Y_m)_{m∈−N} is a martingale with respect to (F_m)_{m∈−N}. By the convergence theorem for backward martingales it converges almost surely and in L1 to E(X_1 | F_{−∞}) as m → −∞. By the previous corollary

    E(X_1 | F_{−∞}) = lim_{n→∞} S_n/n

is almost surely constant and hence equal to E(X_1).
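As a quick numerical illustration of Theorem 13.4 (not part of the notes; NumPy assumed, distribution and sample sizes are arbitrary choices), the running averages S_n/n of i.i.d. samples settle near E(X_1):

```python
import numpy as np

rng = np.random.default_rng(2)

# i.i.d. exponential variables with E(X_1) = 2
X = rng.exponential(scale=2.0, size=(5, 100_000))           # 5 independent paths
running_avg = np.cumsum(X, axis=1) / np.arange(1, X.shape[1] + 1)

for n in [100, 10_000, 100_000]:
    print(f"n = {n:>7}: S_n/n per path =", np.round(running_avg[:, n - 1], 3))
# Each path's running average approaches E(X_1) = 2, as the strong law predicts.
```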


14 Construction of stochastic processes with prescribed properties

Recall the definition of a stochastic process:

Definition 14.1. Let (Ω, A, P) be a probability space, (E, B) a measurable space and I a set. A stochastic process on (Ω, A, P) with state space (E, B) and index set I is a family (X_t)_{t∈I} of (E, B)-valued random variables on (Ω, A, P).

One is often interested in the following question: Given a set I and a measurable space(E,B), do there exist a probability space (Ω,A, P ) and an (E,B)-valued stochastic process(Xt)t∈I on (Ω,A, P ) having certain prescribed properties? Often these properties can beformulated in terms of the finite-dimensional distributions of the process.

Definition 14.2. Let (E, B) be a measurable space and (X_t)_{t∈I} an (E, B)-valued stochastic process on a probability space (Ω, A, P). Let I* be the set of non-empty finite subsets of I. For each J ∈ I* we denote by X_J the E^J-valued random variable X_J := (X_t)_{t∈J} and by P_J(X) its distribution, i.e. the joint distribution of the random variables (X_t)_{t∈J}. We then define the family of finite-dimensional distributions of the stochastic process (X_t)_{t∈I} to be the family (P_J(X))_{J∈I*}.

Remark 14.3.

1. If J ⊆ J ′, then PJ ′(X) determines PJ(X).

2. If J = {t_0, …, t_n}, P_J(X) can be specified by specifying E(f(X_{t_0}, …, X_{t_n})) for all bounded measurable functions f : R^{n+1} → R.

Example 14.4 (Brownian motion18). A one-dimensional standard Brownian motion (or Wiener19 process) is a real-valued stochastic process (X_t)_{t∈R+} with the following properties:

1. X0 = 0 almost surely.

2. It has independent increments, i.e. for all n ∈ N and all 0 < t1 < . . . < tn the randomvariables Xt1 , Xt2 −Xt1 , . . . , Xtn −Xtn−1 are independent.

3. For all 0 ≤ t1 < t2 the increment Xt2 −Xt1 is normally distributed with mean 0 andvariance t2 − t1.

4. For almost all ω ∈ Ω the map t→ Xt(ω) is continuous.

We will now see that the first three properties of Brownian motion can be formulatedin terms of the finite-dimensional distributions:

Lemma 14.5. A real-valued stochastic process (X_t)_{t∈R+} satisfies the first three properties of Brownian motion if and only if for all 0 = t_0 < t_1 < … < t_n and every bounded measurable function f : R^{n+1} → R,

    E(f(X_{t_0}, …, X_{t_n})) = ∫_{R^n} f(0, x_1, …, x_n) ∏_{i=1}^n ν_{t_i−t_{i−1}}(x_i − x_{i−1}) dx_1 … dx_n.    (11)

18R. Brown, A brief account of microscopical observations made in the months of June, July and August,1827, on the particles contained in the pollen of plants; and on the general existence of active moleculesin organic and inorganic bodies. Phil. Mag. 4 (1828), 161–173. (Robert Brown (1773–1858), Scottishbotanist)

19Norbert Wiener (1894–1964), American mathematician and philosopher


Here x_0 := 0, and ν_t(x) denotes the density of the normal distribution with mean 0 and variance t, i.e.

    ν_t(x) = (1/√(2πt)) e^{−x²/(2t)}.

Remark 14.6. Note that (11) specifies P_J(X) for J = {0, t_1, …, t_n}.

Proof of Lemma 14.5. Suppose first that (X_t)_{t∈R+} satisfies the first three properties of Brownian motion. Let

    f̃(y_1, …, y_n) := f(0, y_1, y_1 + y_2, …, y_1 + … + y_n),

so that

    f(0, x_1, …, x_n) = f̃(x_1, x_2 − x_1, …, x_n − x_{n−1}).

Then it follows that

    E(f(X_{t_0}, …, X_{t_n})) = E(f(0, X_{t_1}, …, X_{t_n}))
        = E(f̃(X_{t_1}, X_{t_2} − X_{t_1}, …, X_{t_n} − X_{t_{n−1}}))
        = ∫_{R^n} f̃(y_1, …, y_n) P_{(X_{t_1}, X_{t_2}−X_{t_1}, …, X_{t_n}−X_{t_{n−1}})}(dy_1, …, dy_n)
        = ∫_{R^n} f̃(y_1, …, y_n) ⊗_{i=1}^n P_{X_{t_i}−X_{t_{i−1}}}(dy_1, …, dy_n)
        = ∫_{R^n} f̃(y_1, …, y_n) ∏_{i=1}^n ν_{t_i−t_{i−1}}(y_i) dy_1 … dy_n
        = ∫_{R^n} f̃(x_1, x_2 − x_1, …, x_n − x_{n−1}) ∏_{i=1}^n ν_{t_i−t_{i−1}}(x_i − x_{i−1}) dx_1 … dx_n
        = ∫_{R^n} f(0, x_1, …, x_n) ∏_{i=1}^n ν_{t_i−t_{i−1}}(x_i − x_{i−1}) dx_1 … dx_n,

where in the second-to-last step we substituted x_i = y_1 + … + y_i.

Now assume that (11) holds. Clearly, (11) implies that E (f(X0)) = f(0), so thatX0 = 0 almost surely.

To prove that Xt1 , Xt2 −Xt1 , . . . , Xtn −Xtn−1 are independent and that Xti −Xti−1 isnormally distributed with mean 0 and variance ti − ti−1, we have to show that

    E(∏_{i=1}^n g_i(X_{t_i} − X_{t_{i−1}})) = ∏_{i=1}^n ∫_R g_i(x) ν_{t_i−t_{i−1}}(x) dx

for all bounded measurable functions g_1, …, g_n : R → R. Let

    f(x_1, …, x_n) := ∏_{i=1}^n g_i(x_i − x_{i−1})


(with x_0 := 0). Then it follows that

    E(∏_{i=1}^n g_i(X_{t_i} − X_{t_{i−1}})) = E(f(X_{t_1}, …, X_{t_n}))
        = ∫_{R^n} f(x_1, …, x_n) ∏_{i=1}^n ν_{t_i−t_{i−1}}(x_i − x_{i−1}) dx_1 … dx_n
        = ∫_{R^n} ∏_{i=1}^n g_i(x_i − x_{i−1}) ∏_{i=1}^n ν_{t_i−t_{i−1}}(x_i − x_{i−1}) dx_1 … dx_n
        = ∫_{R^n} ∏_{i=1}^n g_i(y_i) ∏_{i=1}^n ν_{t_i−t_{i−1}}(y_i) dy_1 … dy_n
        = ∏_{i=1}^n ∫_R g_i(y_i) ν_{t_i−t_{i−1}}(y_i) dy_i.
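Formula (11) can also be checked numerically for a small example. The sketch below (not from the notes; it assumes NumPy and SciPy, and the test function f and the times t_1, t_2 are arbitrary choices) compares a Monte Carlo estimate of E(f(X_{t_1}, X_{t_2})), built from independent Gaussian increments, with a direct numerical integration of the right-hand side of (11).

```python
import numpy as np
from scipy import integrate

t1, t2 = 0.5, 1.3
f = lambda x1, x2: np.cos(x1) * np.exp(-x2**2)   # a bounded test function

# Left-hand side: simulate X_{t1} and X_{t2} = X_{t1} + independent increment.
rng = np.random.default_rng(3)
inc1 = rng.normal(0.0, np.sqrt(t1), size=1_000_000)
inc2 = rng.normal(0.0, np.sqrt(t2 - t1), size=1_000_000)
mc = np.mean(f(inc1, inc1 + inc2))

# Right-hand side of (11): integrate f(x1, x2) ν_{t1}(x1) ν_{t2−t1}(x2 − x1).
nu = lambda t, x: np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
rhs, _ = integrate.dblquad(
    lambda x2, x1: f(x1, x2) * nu(t1, x1) * nu(t2 - t1, x2 - x1),
    -8, 8, lambda x1: -8, lambda x1: 8)

print(f"Monte Carlo ≈ {mc:.4f},  integral ≈ {rhs:.4f}")  # the two values agree closely
```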

Let now J and J′ be two non-empty finite subsets of I such that J ⊆ J′, and let π^{J′}_J be the projection from E^{J′} to E^J, defined by

    π^{J′}_J((x_t)_{t∈J′}) := (x_t)_{t∈J}.

Then it is clear that the image measure of P_{J′}(X) under π^{J′}_J equals P_J(X). It follows that the family of finite-dimensional distributions of any stochastic process is consistent in the following sense:

Definition 14.7. Let (E, B) be a measurable space and I a set. For each non-empty finite subset J of I let P_J be a probability measure on (E^J, B^J), where B^J := ⊗_{t∈J} B.

1. The family (P_J)_{J∈I*} is consistent if the image measure of P_{J′} under π^{J′}_J equals P_J whenever J and J′ are two non-empty finite subsets of I such that J ⊆ J′.

2. A probability measure P_I on (E^I, B^I) is called a projective limit of the family (P_J)_{J∈I*} if for every non-empty finite subset J of I the image of P_I under the projection π_J := π^I_J equals P_J.

Remark 14.8. Clearly a projective limit can only exist if the family (PJ)J∈I∗ is consistent.

Proposition 14.9. A consistent family (P_J)_{J∈I*} of probability measures is the family of finite-dimensional distributions of a stochastic process if and only if it has a projective limit.

Proof. Suppose first that (Xt)t∈I is a stochastic process such that PJ(X) = PJ for allJ ∈ I∗. Then its distribution is a projective limit of the family (PJ)J∈I∗ .

Conversely, let P_I be a projective limit of the family (P_J)_{J∈I*}. Then choose

• Ω := E^I,

• A := B^I,

• P := P_I,

• X_t := π_t, where π_t : Ω → E is the t-th projection, defined by π_t((ω_i)_{i∈I}) := ω_t.


It turns out that in many cases consistency of a family (P_J)_{J∈I*} is sufficient for the existence of a projective limit and hence of a stochastic process (X_t)_{t∈I} with finite-dimensional distributions (P_J)_{J∈I*}.

Definition 14.10. A topological space E is Polish if

1. it is completely metrizable, i.e. if there exists a metric d on E such that

(a) d induces the topology of E and

(b) the metric space (E, d) is complete, and

2. it is separable, i.e. it contains a countable dense set.

Example 14.11. Rn, separable Banach spaces.

Remark 14.12. If E and F are both Polish, then so is E × F (choose d((e, f), (e′, f′)) := d_E(e, e′) + d_F(f, f′), where d_E and d_F are complete metrics inducing the topologies of E and F).

Theorem 14.13. Every finite measure µ on a Polish space E is regular, i.e. for every B ∈ B(E),

    µ(B) = sup{µ(K) | K ⊆ B, K compact}    (12)
         = inf{µ(U) | U ⊇ B, U open}.    (13)

Proof. See books on measure theory, e.g. Maß- und Integrationstheorie by Bauer or Maß- und Integrationstheorie by Elstrodt.

Theorem 14.14 (Kolmogorov's consistency theorem, measure-theoretic version). Let E be a Polish space (equipped with its Borel σ-field B), I a set and (P_J)_{J∈I*} a consistent family of probability measures. Then it has a unique projective limit.

Corollary 14.15 (Probabilistic version of Kolmogorov's consistency theorem). Let E be a Polish space (equipped with its Borel σ-field B), I a set and (P_J)_{J∈I*} a consistent family of probability measures. Then there exists a stochastic process with finite-dimensional distributions (P_J)_{J∈I*}.

Proof of Theorem 14.14. For each J ∈ I* let

    B̂_J := {B × E^{I\J} | B ∈ B^J}.

Clearly, B̂_J is a σ-field on E^I, and B̂_J ⊆ B̂_{J′} whenever J ⊆ J′. Consequently, B̂_I := ⋃_{J∈I*} B̂_J is an algebra, i.e.

1. ∅ ∈ B̂_I,

2. for all A, B ∈ B̂_I, we have A ∪ B ∈ B̂_I, and

3. for all A ∈ B̂_I, we have A^c ∈ B̂_I.

Moreover, B^I is the σ-field generated by B̂_I. We now define P_I on B̂_I in the following way: for any A ∈ B̂_I there exist J ∈ I* and B ∈ B^J such that A = B × E^{I\J}. We then set

    P_I(A) := P_J(B).


Consistency of the family (P_J)_{J∈I*} precisely means that this definition does not depend on the choice of J. Since each P_J is a probability measure, it is clear that P_I is a finitely additive set function on B̂_I satisfying P_I(E^I) = 1.

In order to apply Carathéodory's extension theorem, we have to show that P_I is σ-additive. From measure theory we know that this is the case if and only if for every sequence (A_n)_{n∈N} of elements of B̂_I satisfying A_n ↓ ∅ (i.e. A_n ⊆ A_m for n ≥ m and ⋂_{n∈N} A_n = ∅) we have

    lim_{n→∞} P_I(A_n) = 0.

Suppose that this is not true. Then there exist ε > 0 and a sequence (A_n)_{n∈N} of elements of B̂_I satisfying A_n ↓ ∅ and P_I(A_n) ≥ ε for all n ∈ N. Let A_n = B_n × E^{I\J_n} with J_n ∈ I* and B_n ∈ B^{J_n}. Without loss of generality we may assume that J_n ⊆ J_{n+1} for all n ∈ N. By Remark 14.12, E^{J_n} is Polish, so that by Theorem 14.13, P_{J_n} is regular. Therefore there exists a compact subset K_n of B_n such that

    P_{J_n}(B_n \ K_n) ≤ 2^{−n} ε.

Let K̂_n := K_n × E^{I\J_n} and L_n := ⋂_{m=1}^n K̂_m. Then the sequence (L_n)_{n∈N} is decreasing. Moreover, we have L_n ⊆ K̂_n ⊆ A_n and therefore

    L_n ↓ ∅.    (14)

Moreover,

    P_I(A_n \ K̂_n) = P_{J_n}(B_n \ K_n) ≤ 2^{−n} ε,

and consequently

    P_I(A_n \ L_n) = P_I(A_n \ ⋂_{m=1}^n K̂_m) = P_I(⋃_{m=1}^n (A_n \ K̂_m)) ≤ P_I(⋃_{m=1}^n (A_m \ K̂_m))
                   ≤ ∑_{m=1}^n P_I(A_m \ K̂_m) ≤ ∑_{m=1}^n 2^{−m} ε < ε.

Since P_I(A_n) ≥ ε, this implies that P_I(L_n) > 0; in particular L_n is non-empty. For each n ∈ N let ω_n be an arbitrary element of L_n. Since the sequence (L_n)_{n∈N} is decreasing, we have ω_m ∈ L_n ⊆ K̂_n for all m ≥ n, and consequently π_{J_n}(ω_m) ∈ K_n. This implies that for all t ∈ ⋃_{n∈N} J_n all but a finite number of terms of the sequence (π_t(ω_m))_{m∈N} belong to a compact set. Namely, if t ∈ J_n we have

    π_t(ω_m) ∈ π_t(K_n)    (15)

for all m ≥ n. Let now (t_1, t_2, …) be an enumeration of the countable set ⋃_{n∈N} J_n. Because of (15) there exists a subsequence (ω^1_m)_{m∈N} of the sequence (ω_m)_{m∈N} such that π_{t_1}(ω^1_m) converges as m → ∞, a further subsequence (ω^2_m)_{m∈N} such that also π_{t_2}(ω^2_m)


converges, etc. Then the diagonal sequence (ω^m_m)_{m∈N} is such that π_t(ω^m_m) converges for all t ∈ ⋃_{n∈N} J_n. Let

    z_t := lim_{m→∞} π_t(ω^m_m)

for t ∈ ⋃_{n∈N} J_n. Since π_{J_n}(ω^m_m) ∈ K_n for all but finitely many m and since K_n is compact (and therefore closed), we also have

    (z_t)_{t∈J_n} = lim_{m→∞} π_{J_n}(ω^m_m) ∈ K_n.

Let now z be an arbitrary element of E, and define ω ∈ E^I by

    π_t(ω) := z_t if t ∈ ⋃_{n∈N} J_n, and π_t(ω) := z otherwise.

Then π_{J_n}(ω) ∈ K_n for all n ∈ N, hence ω ∈ K̂_n and therefore ω ∈ L_n for all n ∈ N. Hence ⋂_{n∈N} L_n is not empty, in contradiction to (14). We have therefore shown that indeed

    lim_{n→∞} P_I(A_n) = 0

for every sequence (A_n)_{n∈N} of elements of B̂_I satisfying A_n ↓ ∅.

15 Markov kernels, semigroups and processes

Definition 15.1. Let (E,B) be a measurable space. A map K : E × B → [0, 1] is calleda Markov20 kernel on (E,B) if

1. For every B ∈ B the map x 7→ K(x,B) is measurable with respect to B.

2. For every x ∈ E the map B 7→ K(x,B) is a probability measure on (E,B).

Remark 15.2. The notion of a Markov kernel is a generalization of the notion of a stochastic matrix. Namely, let E be a countable set, and let K : E × P(E) → [0, 1] be a Markov kernel on (E, P(E)). Then the matrix P ∈ R^{E×E} defined by

    P(x, y) := K(x, {y})

is stochastic, i.e.

1. P(x, y) ≥ 0 for all x, y ∈ E,

2. ∑_{y∈E} P(x, y) = 1 for all x ∈ E.

Conversely, let P ∈ R^{E×E} be a stochastic matrix on E. Then the map K : E × P(E) → [0, 1] defined by

    K(x, B) := ∑_{y∈B} P(x, y)

is a Markov kernel on (E, P(E)).

Definition 15.3. Let (E,B) be a measurable space.

20Andrey Andreyevich Markov (1856–1922), Russian mathematician


1. The identity kernel on (E, B) is defined by

    I(x, B) := 1_{x∈B}.

2. Let µ be a probability measure and K a Markov kernel on (E, B). Then the probability measure µK on (E, B) is defined by

    (µK)(B) := ∫_E K(x, B) µ(dx).

3. Let f : E → R be bounded and measurable and K a Markov kernel on (E, B). Then the function Kf : E → R is defined by

    (Kf)(x) := ∫_E f(y) K(x, dy).

4. Let K_1 and K_2 be two Markov kernels on (E, B). Their composition K_1K_2 is defined by

    (K_1K_2)(x, B) := ∫_E K_2(y, B) K_1(x, dy).

Remark 15.4. If E is countable, the identity kernel corresponds to the identity matrix, and the operations 2.–4. are given by matrix multiplications. To highlight this analogy, one sometimes writes

    (µK)(B) = ∫_E µ(dx) K(x, B)

instead of (µK)(B) = ∫_E K(x, B) µ(dx), etc.
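For a finite state space the correspondence of Remarks 15.2 and 15.4 is literally matrix algebra: µK is a row vector times a matrix, Kf a matrix times a column vector, and K_1K_2 a matrix product. A small sketch (not from the notes; NumPy assumed, the matrix entries are arbitrary illustrative choices):

```python
import numpy as np

# A stochastic matrix P on E = {0, 1, 2}: rows sum to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

mu = np.array([1.0, 0.0, 0.0])       # initial distribution, as a row vector
f = np.array([0.0, 1.0, 4.0])        # a bounded function on E, as a column vector

mu_K = mu @ P                         # (µK)(y) = Σ_x µ(x) P(x, y)
K_f = P @ f                           # (Kf)(x) = Σ_y P(x, y) f(y)
K1_K2 = P @ P                         # composition of the kernel with itself

print(mu_K, K_f, K1_K2.sum(axis=1))   # rows of a product of stochastic matrices still sum to 1
```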

Definition 15.5. Let (E, B) be a measurable space, and I = N0 (discrete time) or I = R+ (continuous time). A family (P_t)_{t∈I} of Markov kernels on (E, B) is called a Markov semigroup if

1. P_0 = I, and

2. P_{s+t} = P_s P_t for all s, t ∈ I, i.e.

    P_{s+t}(x, B) = ∫_E P_t(y, B) P_s(x, dy)

for all s, t ∈ I, all x ∈ E and all B ∈ B (Chapman–Kolmogorov21 equations).

Interpretation. We interpret P_t(x, B) as the probability that a particle initially at x will be in B at time t.

Definition 15.6. Let (E, B) be a measurable space, (P_t)_{t∈I} a Markov semigroup on (E, B) and (F_t)_{t∈I} a filtration on a probability space (Ω, A, P). We say that an (E, B)-valued stochastic process (X_t)_{t∈I} on (Ω, A, P) is a Markov process with transition semigroup (P_t)_{t∈I} with respect to the filtration (F_t)_{t∈I} if

1. it is adapted to the filtration (Ft)t∈I , and

21Sydney Chapman (1888–1970), English mathematician and physicist; Andrey Nikolaevich Kolmogorov(1903–1987), Russian mathematician


2. P(X_t ∈ B | F_s) = P_{t−s}(X_s, B)    (16)

for all s ≤ t ∈ I and all B ∈ B. Here the conditional probability of an event A given a σ-field C is defined as

    P(A | C) := E(1_A | C).

(Note that P(A | C) is NOT a number, but a random variable, and that it is only determined almost surely.)

Remark 15.7. The second condition of the definition is equivalent to

    E(f(X_t) | F_s) = (P_{t−s} f)(X_s)    (17)

for all s ≤ t ∈ I and all bounded measurable functions f : E → R.

Proof. (16) follows from (17) by choosing f = 1B. Conversely, (17) follows from (16) bymeasure-theoretic induction.

Remark 15.8. In this case we have

    E(f(X_t) | F_s) = E(f(X_t) | σ(X_s))

for all s ≤ t ∈ I and all bounded measurable functions f : E → R, and in particular

    P(X_t ∈ B | F_s) = P(X_t ∈ B | σ(X_s))

for all B ∈ B. (Markov property: the future (X_t) depends on the past (F_s) only via the present (X_s).)

Proof. We will show that (P_{t−s} f)(X_s) has the defining properties of the conditional expectation E(f(X_t) | σ(X_s)). It is clearly measurable with respect to σ(X_s), and for all A ∈ σ(X_s) ⊆ F_s we have

    ∫_A (P_{t−s} f)(X_s) dP = ∫_A E(f(X_t) | F_s) dP = ∫_A f(X_t) dP.

Proposition 15.9. Let (E, B) be a measurable space, (P_t)_{t∈I} a Markov semigroup, µ a probability measure on (E, B) and (X_t)_{t∈I} an (E, B)-valued Markov process with transition semigroup (P_t)_{t∈I} (with respect to any filtration) and initial distribution µ. Then its finite-dimensional distributions are given as follows: for all n ∈ N, all t_1 < … < t_n ∈ I and all B_1, …, B_n ∈ B we have

    P(X_{t_1} ∈ B_1, …, X_{t_n} ∈ B_n) = ∫_E ∫_{B_1} … ∫_{B_n} P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ(dx_0).    (18)

Conversely, if the finite-dimensional distributions of a stochastic process (X_t)_{t∈I} are given by (18), it is a Markov process with transition semigroup (P_t)_{t∈I} with respect to its canonical filtration and with initial distribution µ.


Proof. First note that an equivalent formulation of (18) is that for all bounded measurable functions f_1, …, f_n : E → R we have

    E(∏_{i=1}^n f_i(X_{t_i})) = ∫_E … ∫_E ∏_{i=1}^n f_i(x_i) P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ(dx_0).    (19)

((18) follows from (19) by choosing f_i = 1_{B_i}. Conversely, (19) follows from (18) by measure-theoretic induction.)

Suppose now that (X_t)_{t∈I} is an (E, B)-valued Markov process with transition semigroup (P_t)_{t∈I} and initial distribution µ. To prove (19), we proceed by induction, starting with the trivial case n = 0. Assuming that (19) is true for the value n, we obtain

    E(∏_{i=1}^{n+1} f_i(X_{t_i})) = E(E(∏_{i=1}^{n+1} f_i(X_{t_i}) | F_{t_n}))
        = E(∏_{i=1}^n f_i(X_{t_i}) E(f_{n+1}(X_{t_{n+1}}) | F_{t_n}))
        = E(∏_{i=1}^n f_i(X_{t_i}) (P_{t_{n+1}−t_n} f_{n+1})(X_{t_n}))
        = ∫_E … ∫_E ∏_{i=1}^n f_i(x_i) (P_{t_{n+1}−t_n} f_{n+1})(x_n) P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ(dx_0)
        = ∫_E … ∫_E ∏_{i=1}^n f_i(x_i) ∫_E f_{n+1}(x_{n+1}) P_{t_{n+1}−t_n}(x_n, dx_{n+1}) P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ(dx_0)
        = ∫_E … ∫_E ∏_{i=1}^{n+1} f_i(x_i) P_{t_{n+1}−t_n}(x_n, dx_{n+1}) P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ(dx_0),

i.e. (19) with n replaced by n + 1.

Now suppose that the finite-dimensional distributions of (X_t)_{t∈I} are given by (19). Fix n ∈ N, 0 ≤ s_1 < … < s_n ≤ t and a bounded measurable function f : E → R. To show that

    E(f(X_t) | F_s) = (P_{t−s} f)(X_s),

we have to show that

    ∫_A f(X_t) dP = ∫_A (P_{t−s} f)(X_s) dP    (20)

for all A ∈ F_s. Suppose first that A is of the form

    A = {X_{s_1} ∈ B_1, …, X_{s_n} ∈ B_n}


with 0 ≤ s_1 < … < s_n = s. Letting s_{n+1} := t, f_i := 1_{B_i} for i ∈ {1, …, n} and f_{n+1} := f, we have on the one hand

    ∫_A f(X_t) dP = E(∏_{i=1}^n 1_{B_i}(X_{s_i}) f(X_t))
        = ∫_E … ∫_E ∏_{i=1}^n 1_{B_i}(x_i) f(x_{n+1}) P_{s_{n+1}−s_n}(x_n, dx_{n+1}) … P_{s_1}(x_0, dx_1) µ(dx_0),    (21)

and on the other hand

    ∫_A (P_{t−s} f)(X_s) dP = E(∏_{i=1}^n 1_{B_i}(X_{s_i}) (P_{t−s} f)(X_s))
        = ∫_E … ∫_E ∏_{i=1}^n 1_{B_i}(x_i) (P_{s_{n+1}−s_n} f)(x_n) P_{s_n−s_{n−1}}(x_{n−1}, dx_n) … P_{s_1}(x_0, dx_1) µ(dx_0)
        = ∫_E … ∫_E ∏_{i=1}^n 1_{B_i}(x_i) ∫_E f(y) P_{s_{n+1}−s_n}(x_n, dy) P_{s_n−s_{n−1}}(x_{n−1}, dx_n) … P_{s_1}(x_0, dx_1) µ(dx_0),

and this coincides with (21).

We have hence shown that (20) holds for all A belonging to

    S := ⋃_{n∈N} ⋃_{0≤s_1<…<s_n=s} σ(X_{s_1}, …, X_{s_n}).

Since in particular Ω ∈ S, it is clear that the set D of events for which (20) holds is a Dynkin system, i.e.

1. ∅ ∈ D,

2. A ∈ D ⇒ A^c ∈ D,

3. ⋃_{i∈N} A_i ∈ D for every sequence (A_i)_{i∈N} of pairwise disjoint elements of D.

Since moreover S is stable under intersection, we have

    D ⊇ σ(S) = F_s,

i.e. (20) holds for all A ∈ F_s.
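Formula (18) can be checked by simulation for a finite state space, where the kernels are stochastic matrices. The following sketch (not from the notes; NumPy assumed, the matrix, initial distribution and times are arbitrary choices) compares the empirical joint distribution of (X_1, X_3) for a simulated chain with the expression obtained from (18):

```python
import numpy as np

rng = np.random.default_rng(4)

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
mu = np.array([0.2, 0.5, 0.3])

def simulate_chain(n_steps, n_paths):
    """Simulate a chain with initial distribution mu and transition matrix P."""
    X = np.empty((n_paths, n_steps + 1), dtype=int)
    X[:, 0] = rng.choice(3, size=n_paths, p=mu)
    for t in range(n_steps):
        for x in range(3):
            idx = X[:, t] == x
            X[idx, t + 1] = rng.choice(3, size=idx.sum(), p=P[x])
    return X

X = simulate_chain(n_steps=3, n_paths=200_000)

# Formula (18) for J = {1, 3} and singletons B_1 = {i}, B_3 = {j}:
# P(X_1 = i, X_3 = j) = Σ_{x_0} µ(x_0) P(x_0, i) (P²)(i, j).
theory = (mu @ P)[:, None] * np.linalg.matrix_power(P, 2)
empirical = np.array([[np.mean((X[:, 1] == i) & (X[:, 3] == j)) for j in range(3)]
                      for i in range(3)])
print(np.round(theory, 3))
print(np.round(empirical, 3))   # the two 3×3 tables agree up to Monte Carlo error
```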

Theorem 15.10. Let (E, B) be a measurable space, and let µ be a probability measure and (P_t)_{t∈I} a Markov semigroup on it (I = N0 or I = R+). For any non-empty finite subset J = {t_1 < … < t_n} of I and any B ∈ B^J we set

    P_J(B) := ∫_E ∫_E … ∫_E 1_{(x_1,…,x_n)∈B} P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ(dx_0).    (22)

Then


1. each PJ is a probability measure, and

2. the family (PJ)J∈I∗ is consistent.

Corollary 15.11. If E is Polish, then for any Markov semigroup (P_t)_{t∈I} and any probability measure µ on E there exists an E-valued Markov process (X_t)_{t∈I} with transition semigroup (P_t)_{t∈I} and initial distribution µ.

Proof of Theorem 15.10. The first statement is clear. To prove that the family (P_J)_{J∈I*} is consistent, one has to show that for all J, J′ ∈ I* with J ⊆ J′ the image measure of P_{J′} under the projection π^{J′}_J equals P_J. Clearly it suffices to do so in the case when J′ \ J contains exactly one element. In order to keep notation simple, we restrict ourselves to the case when J′ contains three elements, J′ = {t_1, t_2, t_3} with t_1 < t_2 < t_3, and J = {t_1, t_3}. Then we have to show that for all B_1, B_3 ∈ B we have P_{J′}(B_1 × E × B_3) = P_J(B_1 × B_3):

    P_{J′}(B_1 × E × B_3) = ∫_E ∫_{B_1} ∫_E ∫_{B_3} P_{t_3−t_2}(x_2, dx_3) P_{t_2−t_1}(x_1, dx_2) P_{t_1}(x_0, dx_1) µ(dx_0)
                          = ∫_E ∫_{B_1} ∫_E P_{t_3−t_2}(x_2, B_3) P_{t_2−t_1}(x_1, dx_2) P_{t_1}(x_0, dx_1) µ(dx_0).

The semigroup property implies that the inner integral ∫_E P_{t_3−t_2}(x_2, B_3) P_{t_2−t_1}(x_1, dx_2) equals P_{t_3−t_1}(x_1, B_3), so that we get

    P_{J′}(B_1 × E × B_3) = ∫_E ∫_{B_1} P_{t_3−t_1}(x_1, B_3) P_{t_1}(x_0, dx_1) µ(dx_0)
                          = ∫_E ∫_{B_1} ∫_{B_3} P_{t_3−t_1}(x_1, dx_3) P_{t_1}(x_0, dx_1) µ(dx_0)
                          = P_J(B_1 × B_3).

16 Brownian motion and Kolmogorov’s continuity theorem

16.1 Brownian motion

Definition 16.1 (Heat kernel). For t > 0 and x, y ∈ R^d let

    p_t(x, y) := (2πt)^{−d/2} e^{−|x−y|²/(2t)}

be the d-dimensional heat kernel.

Remark 16.2. The heat kernel is of fundamental importance in both PDE and probability theory:

1. It is the fundamental solution of the heat equation

    ∂p/∂t = (1/2) Δp,

i.e. for each x ∈ R^d

(a) the function (t, y) ↦ p_t(x, y) satisfies the heat equation in the domain R*_+ × R^d, and


(b) as t → 0, the measure with density y ↦ p_t(x, y) with respect to Lebesgue measure converges weakly to the Dirac measure at x.

2. For each x ∈ R^d and each t > 0 the function y ↦ p_t(x, y) is the density of the d-dimensional normal distribution with mean x and covariance matrix tI, i.e. of the joint distribution of d independent normally distributed random variables with means x_i (i = 1, …, d) and variance t. To see this, note that

    p_t(x, y) = ∏_{i=1}^d (1/√(2πt)) e^{−(x_i−y_i)²/(2t)}.

Definition 16.3. The Brownian semigroup (P_t)_{t∈R+} on R^d is defined as follows: P_0 = I, and

    P_t(x, B) := ∫_B p_t(x, y) dy

for t > 0 and B ∈ B(R^d).

Proposition 16.4. The family (P_t)_{t∈R+} defined above is indeed a Markov semigroup on R^d.

Proof. We first show that

    |x−y|²/(2s) + |y−z|²/(2t) = |x−z|²/(2(s+t)) + ((s+t)/(2st)) |x − y − s(x−z)/(s+t)|²

for x, y, z ∈ R^d and s, t > 0:

    |x−z|²/(2(s+t)) + ((s+t)/(2st)) |x − y − s(x−z)/(s+t)|²
        = |x−z|²/(2(s+t)) + ((s+t)/(2st)) ( |x−y|² − 2s(x−y)·(x−z)/(s+t) + s²|x−z|²/(s+t)² )
        = |x−z|²/(2(s+t)) + (s+t)|x−y|²/(2st) − (x−y)·(x−z)/t + s|x−z|²/(2t(s+t))
        = |x−z|²/(2t) + ( (s+t)|x−y|² − 2s(x−y)·(x−z) )/(2st)
        = ( s|x−z|² + (s+t)|x−y|² − 2s(x−y)·(x−z) )/(2st)
        = ( s(|x−z|² − 2(x−y)·(x−z) + |x−y|²) + t|x−y|² )/(2st)
        = ( s|y−z|² + t|x−y|² )/(2st)
        = |x−y|²/(2s) + |y−z|²/(2t).


We then obtain (for s, t > 0)

    (P_s P_t)(x, B) = ∫_{R^d} P_t(y, B) P_s(x, dy)
        = ∫_{R^d} ( (2πt)^{−d/2} ∫_B e^{−|y−z|²/(2t)} dz ) (2πs)^{−d/2} e^{−|x−y|²/(2s)} dy
        = (4π²st)^{−d/2} ∫_B ∫_{R^d} exp( −|y−z|²/(2t) − |x−y|²/(2s) ) dy dz
        = (4π²st)^{−d/2} ∫_B ∫_{R^d} exp( −|x−z|²/(2(s+t)) − ((s+t)/(2st)) |x − y − s(x−z)/(s+t)|² ) dy dz
        = (4π²st)^{−d/2} ∫_B exp( −|x−z|²/(2(s+t)) ) ∫_{R^d} exp( −((s+t)/(2st)) |y|² ) dy dz.

Since moreover

    ∫_{R^d} exp( −((s+t)/(2st)) |y|² ) dy = ( ∫_R exp( −((s+t)/(2st)) y² ) dy )^d = ( 2πst/(s+t) )^{d/2},

it follows that

    (P_s P_t)(x, B) = (2π(s+t))^{−d/2} ∫_B exp( −|x−z|²/(2(s+t)) ) dz = P_{s+t}(x, B).
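The Chapman–Kolmogorov relation for the heat kernels can also be verified numerically in dimension one by computing the convolution integral directly. A minimal sketch (not from the notes; NumPy and SciPy assumed, the times and points are arbitrary choices):

```python
import numpy as np
from scipy import integrate

def p(t, x, y):
    """One-dimensional heat kernel p_t(x, y)."""
    return np.exp(-(x - y)**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

s, t, x, z = 0.7, 1.1, 0.3, -1.2

lhs, _ = integrate.quad(lambda y: p(s, x, y) * p(t, y, z), -np.inf, np.inf)
rhs = p(s + t, x, z)
print(lhs, rhs)   # the two values coincide (Chapman–Kolmogorov)
```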

Proposition 16.5. Let (P_t)_{t∈R+} be the Brownian semigroup on R^d. For an R^d-valued stochastic process (X_t)_{t∈R+} the following properties are equivalent:

1. It has independent increments (i.e. for all n ∈ N and all 0 < t_1 < … < t_n the random variables X_0, X_{t_1} − X_0, X_{t_2} − X_{t_1}, …, X_{t_n} − X_{t_{n−1}} are independent), and for all 0 ≤ s < t the increment X_t − X_s is normally distributed with mean 0 and covariance matrix (t − s)I_d, i.e. the distribution of X_t − X_s has density

    p_{t−s}(0, x) = (2π(t − s))^{−d/2} e^{−|x|²/(2(t−s))}

with respect to Lebesgue measure.

2. It is a Markov process with transition semigroup (P_t)_{t∈R+} (with respect to its canonical filtration).

Proof. Let µ_0 be the distribution of X_0. Then both properties are equivalent to the following statement about the finite-dimensional distributions of the process: for all 0 ≤ t_1 < … < t_n and all B_1, …, B_n ∈ B(R^d) we have

    P(X_{t_1} ∈ B_1, …, X_{t_n} ∈ B_n) = ∫_{R^d} ∫_{B_1} … ∫_{B_n} P_{t_n−t_{n−1}}(x_{n−1}, dx_n) … P_{t_1}(x_0, dx_1) µ_0(dx_0).

Definition 16.6. An R^d-valued stochastic process (X_t)_{t∈R+} on a probability space (Ω, A, P) is called a d-dimensional Brownian motion if


1. it has the equivalent properties of the last proposition, and

2. almost all its paths are continuous, i.e. for almost all ω ∈ Ω the map t 7→ Xt(ω) iscontinuous (from R+ to Rd).

If moreover X0 = 0 a.s., then it is called a standard Brownian motion.

Remark 16.7. The existence theorem for Markov processes assures the existence of astochastic process with the first property. The existence of a process which also has thesecond property will follow from Kolmogorov’s continuity theorem.

16.2 Modifications and indistinguishable processes

Definition 16.8. Let (E,B) be a measurable space and I a set.

1. Two (E,B)-valued stochastic processes (Xt)t∈I and (X ′t)t∈I defined on the sameprobability space (Ω,A, P ) are said to be modifications of each other if for each t ∈ I

Xt = X ′t a.s.

2. They are called indistinguishable if almost surely

Xt = X ′t for all t ∈ I.

Remark 16.9.

1. If two stochastic processes are indistinguishable, they are modifications of each other.

2. If two stochastic processes are modifications of each other, they have the same finite-dimensional distributions.

“Indistinguishable ⇒ modifications of each other ⇒ same finite-dimensional distributions.” The converse implications are not true in general.

Example 16.10. Let Ω := [0, 1], equipped with its Borel σ-field and Lebesgue measure, and let

    X_t(ω) := 1 if t = ω, and X_t(ω) := 0 otherwise.

Then for each t ∈ [0, 1] we have X_t = 0 a.s., so that the process (X_t)_{t∈[0,1]} is a modification of the process which is identically 0. On the other hand, there is not a single ω ∈ Ω for which the path t ↦ X_t(ω) is identically 0. Consequently, the process (X_t)_{t∈[0,1]} is not indistinguishable from the process which is identically 0.

Proposition 16.11. Let E be a Hausdorff22 topological space, and let (Xt)t∈R+ and(X ′t)t∈R+ be two continuous E-valued stochastic processes on the same probability space(Ω,A, P ) (i.e. almost all paths of both processes are continuous). These two processes areindistinguishable if and only if they are modifications of each other.

Proof. We already know that indistinguishable processes are modifications of each other, so let us assume that (X_t)_{t∈R+} and (X′_t)_{t∈R+} are modifications of each other. Let Ω* be an event of probability 1 such that on Ω* all paths of both processes are continuous, and for each t ∈ R+ let Ω_t := {X_t = X′_t}. Then Ω* ∩ ⋂_{t∈Q+} Ω_t has probability 1. Moreover,

22Felix Hausdorff (1868–1942), German mathematician


choosing for each t ∈ R+ a sequence (t_n)_{n∈N} in Q+ converging to t, we obtain on that event:

    X′_t = lim_{n→∞} X′_{t_n} = lim_{n→∞} X_{t_n} = X_t,

where in the second equality we used the fact that in Hausdorff spaces limits are unique.

16.3 Kolmogorov’s continuity theorem

Lemma 16.12 (Borel–Cantelli lemma23). Let (A_n)_{n∈N} be a sequence of events such that the series ∑_{n=1}^∞ P(A_n) converges. Then

    P(⋂_{n∈N} ⋃_{m≥n} A_m) = 0.

Proof. Convergence of the series ∑_{n=1}^∞ P(A_n) is equivalent to the property that lim_{n→∞} ∑_{m≥n} P(A_m) = 0. Consequently

    P(⋂_{n∈N} ⋃_{m≥n} A_m) = lim_{n→∞} P(⋃_{m≥n} A_m) ≤ lim inf_{n→∞} ∑_{m≥n} P(A_m) = 0.

Remark 16.13. We have ω ∈ ⋂_{n∈N} ⋃_{m≥n} A_m if and only if ω ∈ A_m for infinitely many m. Consequently, the Borel–Cantelli lemma can be reformulated as follows: if the series ∑_{n=1}^∞ P(A_n) converges, the probability that infinitely many of the events A_m occur is 0.

Theorem 16.14 (Kolmogorov's continuity theorem). Let (E, d) be a complete metric space (e.g. E = R^d), and let (X_t)_{t∈R+} be an E-valued stochastic process with the following property: there exist constants α, β, C > 0 such that

    E(d(X_s, X_t)^α) ≤ C |t − s|^{1+β}

for all s, t ∈ R+. Then this process has a continuous modification (X′_t)_{t∈R+}. Moreover, this (and consequently any) continuous modification has the following property: almost every path is locally Hölder continuous of any exponent γ ∈ (0, β/α), i.e. there exists an event Ω* of probability 1 such that for each ω ∈ Ω*, each γ ∈ (0, β/α) and each k ∈ N there exists a constant C(k, γ, ω) such that

    d(X′_s(ω), X′_t(ω)) ≤ C(k, γ, ω) |t − s|^γ

for all s, t ∈ [0, k].

Proof. Fix γ ∈ (0, β/α). For m, k ∈ N let S_{m,k} := {i · 2^{−m} | i ∈ {0, …, k · 2^m}} ⊂ [0, k], D_k := ⋃_{m∈N} S_{m,k} and D := ⋃_{k∈N} D_k. D_k is the set of dyadic numbers belonging to [0, k], and D is the set of all dyadic numbers. The crucial step of the proof is the following lemma:

23 E. Borel, Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo 27 (1909), 247–271. F. P. Cantelli, Sulla probabilità come limite della frequenza. Atti Accad. Naz. Lincei 26 (1917), 39–45. (Émile Borel (1871–1956), French mathematician; Francesco Paolo Cantelli (1875–1966), Italian mathematician)


Lemma 16.15. Let

    K(γ) := 2/(1 − 2^{−γ}) + 2^γ.

Then there exist an event Ω*(γ) of probability 1 and, for each k ∈ N and each ω ∈ Ω*(γ), a number m(k, γ, ω) ∈ N such that

    d(X_s(ω), X_t(ω)) ≤ K(γ) |t − s|^γ

for all ω ∈ Ω*(γ) and all s, t ∈ D ∩ [0, k] satisfying |t − s| ≤ 2^{−m(k,γ,ω)}.

Proof. Fix k ∈ N and let δ := β − αγ > 0. For each m ∈ N let

    A_m := ⋃_{t∈S_{m,k}\{0}} { d(X_t, X_{t−2^{−m}}) ≥ 2^{−γm} }.

For all t ∈ S_{m,k} \ {0} we have, using Chebyshev's inequality and the fact that αγ < β,

    P(d(X_t, X_{t−2^{−m}}) ≥ 2^{−γm}) ≤ 2^{αγm} E(d(X_t, X_{t−2^{−m}})^α) ≤ 2^{αγm} C · 2^{−m(1+β)} ≤ C · 2^{−(δ+1)m},

so that

    P(A_m) ≤ k · 2^m · C · 2^{−(δ+1)m} = kC · 2^{−δm}.

Since δ > 0 we have 2^{−δ} < 1, so that the series ∑_{m=1}^∞ P(A_m) converges. Let

    Ω_k := ( ⋂_{n∈N} ⋃_{m≥n} A_m )^c.

By the Borel–Cantelli lemma P(Ω_k) = 1. Moreover, for each ω ∈ Ω_k there exists m(k, γ, ω) such that

    d(X_t(ω), X_{t−2^{−m}}(ω)) ≤ 2^{−γm}

for all m ≥ m(k, γ, ω) and all t ∈ S_{m,k} \ {0}.

Fix now ω ∈ Ω_k, and let s, t ∈ D_k with 0 < |s − t| ≤ 2^{−m(k,γ,ω)}. Choose m ∈ N in such a way that

    2^{−(m+1)} < |s − t| ≤ 2^{−m}    (23)

(clearly this implies that m ≥ m(k, γ, ω)). Write s and t in the form

    s = A · 2^{−m} + ∑_{i=m+1}^∞ σ_i 2^{−i},    t = B · 2^{−m} + ∑_{j=m+1}^∞ τ_j 2^{−j}

with A, B ∈ {0, …, k · 2^m} and σ_i, τ_j ∈ {0, 1} and such that only finitely many of the numbers σ_i and τ_j are non-zero. Moreover, let

    s_n := A · 2^{−m} + ∑_{i=m+1}^{m+n} σ_i 2^{−i},    t_n := B · 2^{−m} + ∑_{j=m+1}^{m+n} τ_j 2^{−j}

(so that s_n = s and t_n = t for all sufficiently large n). Then (23) implies that either s_0 = t_0 or |s_0 − t_0| = 2^{−m}, and consequently

    d(X_{s_0}(ω), X_{t_0}(ω)) ≤ 2^{−γm}.


Moreover,

    d(X_{s_0}(ω), X_s(ω)) ≤ ∑_{n=1}^∞ d(X_{s_n}(ω), X_{s_{n−1}}(ω)) ≤ ∑_{n=1}^∞ 2^{−γ(m+n)} = 2^{−γm} ∑_{n=1}^∞ 2^{−γn} = 2^{−γm} · 2^{−γ}/(1 − 2^{−γ}) = 2^{−γ(m+1)}/(1 − 2^{−γ}),

and, by the same argument, also

    d(X_{t_0}(ω), X_t(ω)) ≤ 2^{−γ(m+1)}/(1 − 2^{−γ}).

Combining these three estimates we obtain

    d(X_s(ω), X_t(ω)) ≤ d(X_s(ω), X_{s_0}(ω)) + d(X_{s_0}(ω), X_{t_0}(ω)) + d(X_{t_0}(ω), X_t(ω))
                     ≤ 2 · 2^{−γ(m+1)}/(1 − 2^{−γ}) + 2^{−γm} = K(γ) (2^{−(m+1)})^γ ≤ K(γ) |s − t|^γ.

We conclude the proof of the lemma by choosing Ω∗(γ) :=⋂k∈N Ωk.

Lemma 16.16. Let t ∈ R+ and (t_n)_{n∈N} a sequence in D converging to t. Then for all ω ∈ Ω*(γ) the limit

    X′_t(ω) := lim_{n→∞} X_{t_n}(ω)

exists and does not depend on the choice of the sequence (t_n)_{n∈N}.

Proof. Fix ω ∈ Ω*(γ). We first show that the sequence (X_{t_n}(ω))_{n∈N} is a Cauchy sequence: to do so choose k ∈ N and n_0 ∈ N such that t_n ≤ k and |t_n − t_{n′}| ≤ 2^{−m(k,γ,ω)} for all n, n′ ≥ n_0. For all such n, n′ we then obtain

    d(X_{t_n}(ω), X_{t_{n′}}(ω)) ≤ K(γ) |t_n − t_{n′}|^γ → 0

as n, n′ → ∞. Since the metric space E is complete (by assumption), this implies the existence of the limit lim_{n→∞} X_{t_n}(ω).

To show that the limit does not depend on the choice of the sequence (t_n)_{n∈N}, let (s_n)_{n∈N} be another sequence in D also converging to t. By the same argument as above we obtain for all sufficiently large n

    d(X_{s_n}(ω), X_{t_n}(ω)) ≤ K(γ) |s_n − t_n|^γ → 0

as n → ∞, and therefore

    lim_{n→∞} X_{s_n}(ω) = lim_{n→∞} X_{t_n}(ω).


Let now

    Ω* := ⋂_{γ∈Q∩(0,β/α)} Ω*(γ).

To conclude the proof of the theorem we have to show that:

1. For each ω ∈ Ω* the map t ↦ X′_t(ω) is locally Hölder continuous of any exponent γ ∈ (0, β/α), i.e. for each ω ∈ Ω*, each γ ∈ (0, β/α) and each k ∈ N there exists a constant C(k, γ, ω) such that

    d(X′_s(ω), X′_t(ω)) ≤ C(k, γ, ω) |t − s|^γ

for all s, t ∈ [0, k].

2. For each t ∈ R+ we have X′_t = X_t a.s.

We start with the first property. Fix k ∈ N and γ ∈ (0, β/α). We know that on Ω* we have

    d(X′_s(ω), X′_t(ω)) ≤ K(γ) |t − s|^γ

for all γ ∈ Q ∩ (0, β/α) and all s, t ∈ D ∩ [0, k] satisfying |t − s| ≤ 2^{−m(k,γ,ω)}. By continuity this extends to all γ ∈ (0, β/α) and all s, t ∈ [0, k] satisfying |t − s| ≤ 2^{−m(k,γ,ω)}. Consequently, for all s, t ∈ [0, k] (not necessarily satisfying |t − s| ≤ 2^{−m(k,γ,ω)}) we obtain

    d(X′_s(ω), X′_t(ω)) ≤ 2^{m(k,γ,ω)} K(γ) |t − s|^γ.

To prove the second property let (t_n)_{n∈N} be a sequence in D converging to t. Then we have

    X_{t_n} → X′_t

almost surely and hence in probability, and on the other hand, by Chebyshev's inequality,

    X_{t_n} → X_t

in probability. Consequently X′_t = X_t almost surely.

We now want to apply this result to show existence of Brownian motion.

Lemma 16.17. Let (X_t)_{t∈R+} be an R^d-valued stochastic process whose increments are normally distributed with mean 0 and covariance matrix (t − s)I_d. Then for each α > 0 there is a constant C_α > 0 such that

    E(|X_t − X_s|^α) = C_α (t − s)^{α/2}.

Proof. Let

    Z_{s,t} := (X_t − X_s)/√(t − s).

Z_{s,t} is normally distributed with mean 0 and covariance matrix I_d, so that C_α := E(|Z_{s,t}|^α) does not depend on s and t. It follows that

    E(|X_t − X_s|^α) = (t − s)^{α/2} E(|Z_{s,t}|^α) = C_α (t − s)^{α/2}.

Theorem 16.18. For any probability measure µ on R^d there exists a Brownian motion with initial distribution µ. Moreover, almost all its paths are locally Hölder continuous of any exponent γ < 1/2.

Proof. For any α > 0 let β := α/2 − 1. Then β is positive as soon as α > 2. Moreover β/α → 1/2 as α → ∞. The result therefore follows from Lemma 16.17 and Kolmogorov's continuity theorem.
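To make the statement concrete, one can simulate an approximate Brownian path by cumulative sums of independent N(0, Δt) increments and inspect its modulus of continuity. The following sketch (not from the notes; NumPy assumed, the grid size and lags are arbitrary choices) compares the maximal increments at several lags h with h^{1/2}:

```python
import numpy as np

rng = np.random.default_rng(5)

n, T = 2**18, 1.0
dt = T / n
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))])

# Maximal increment over all pairs at lag h = k*dt, for several dyadic lags.
for k in [2**4, 2**8, 2**12]:
    h = k * dt
    max_inc = np.max(np.abs(B[k:] - B[:-k]))
    print(f"h = {h:.2e}:  max |B_(t+h) - B_t| ≈ {max_inc:.3f},  h**0.5 = {np.sqrt(h):.3f}")
# The maximal increment shrinks roughly like h^γ with γ slightly below 1/2,
# in line with local Hölder continuity of every exponent γ < 1/2.
```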


16.4 Brownian motion is a martingale

Lemma 16.19. A real-valued stochastic process (Xt)t∈R+ has independent increments ifand only if for all s ≤ t ∈ R+ the increment Xt −Xs is independent of FXs . (Recall that(FXt )t≥0 denotes the canonical filtration of the process (Xt)t∈R+.)

Proof. Suppose first that the process has independent increments, and let s ≤ t. Let D be the set of events which are independent of X_t − X_s. On the one hand, D is clearly a Dynkin system. On the other hand, independence of increments implies that for all 0 ≤ s_0 < … < s_n ≤ s the increment X_t − X_s is independent of σ(X_{s_0}, X_{s_1} − X_{s_0}, …, X_{s_n} − X_{s_{n−1}}). Since this σ-field coincides with σ(X_{s_0}, …, X_{s_n}), it follows that X_t − X_s is independent of S := ⋃_{0≤s_0<…<s_n≤s} σ(X_{s_0}, …, X_{s_n}). Since S is stable under intersection and D is a Dynkin system, it follows that D ⊇ σ(S) = F^X_s, i.e. that X_t − X_s is independent of F^X_s.

Suppose now that for all s ≤ t ∈ R+ the increment X_t − X_s is independent of F^X_s. We will show by induction that for all n ∈ N0 and all 0 = t_0 < t_1 < … < t_n the random variables X_{t_0}, X_{t_1} − X_{t_0}, …, X_{t_n} − X_{t_{n−1}} are independent. For n = 0 there is nothing to prove. So let 0 = t_0 < t_1 < … < t_{n+1} and suppose that X_{t_0}, X_{t_1} − X_{t_0}, …, X_{t_n} − X_{t_{n−1}} are independent. Since by assumption X_{t_{n+1}} − X_{t_n} is independent of F^X_{t_n}, it follows that the family X_{t_0}, X_{t_1} − X_{t_0}, …, X_{t_n} − X_{t_{n−1}}, X_{t_{n+1}} − X_{t_n} is independent as well.

Corollary 16.20. An R^d-valued stochastic process (B_t)_{t∈R+} is a Brownian motion if and only if

1. for all s ≤ t ∈ R+ the increment B_t − B_s is

(a) normally distributed with mean 0 and covariance matrix (t − s)I_d, and

(b) independent of F^B_s, and

2. almost all its paths are continuous, i.e. for almost all ω ∈ Ω the map t ↦ B_t(ω) is continuous.

This result motivates the following definition:

Definition 16.21. Let (F_t)_{t∈R+} be a filtration. An R^d-valued stochastic process (B_t)_{t∈R+} is called a Brownian motion with respect to the filtration (F_t)_{t∈R+} if and only if it is adapted to (F_t)_{t∈R+} and has the properties of the above corollary, but with the given filtration (F_t)_{t∈R+} instead of the canonical filtration (F^B_t)_{t∈R+}.

Remark 16.22. Let (Bt)t∈R+ be a one-dimensional Brownian motion with respect toa filtration (Ft)t∈R+ . If B0 is integrable, then (Bt)t∈R+ is a martingale with respect to(Ft)t∈R+ .

Proof. Since the increment Bt − B0 is normally distributed, integrability of B0 impliesintegrability of Bt. Moreover, since Bt−Bs is independent of Fs and normally distributedwith mean 0, it follows that

E(Bt −Bs | Fs) = E(Bt −Bs) = 0.

Remark 16.23. In a similar way one can also show that the processes (B_t² − t)_{t∈R+} and (e^{αB_t − α²t/2})_{t∈R+} are martingales, provided their initial values are integrable.
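A quick numerical sanity check of Remarks 16.22 and 16.23 (not from the notes; NumPy assumed, the times and α are arbitrary choices): each of the martingale properties implies that the expectation of the difference between the value at time t and at time s is 0, which Monte Carlo averages reproduce.

```python
import numpy as np

rng = np.random.default_rng(6)
s, t, alpha = 1.0, 2.5, 0.8
n = 1_000_000

B_s = rng.normal(0.0, np.sqrt(s), size=n)               # standard BM at time s
B_t = B_s + rng.normal(0.0, np.sqrt(t - s), size=n)      # add an independent increment

# Each difference has expectation 0 if the corresponding process is a martingale:
print(np.mean(B_t - B_s))                                 # ≈ 0  (B_t)
print(np.mean((B_t**2 - t) - (B_s**2 - s)))               # ≈ 0  (B_t² − t)
print(np.mean(np.exp(alpha*B_t - alpha**2*t/2)
              - np.exp(alpha*B_s - alpha**2*s/2)))        # ≈ 0  (exponential martingale)
```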


17 Gaussian processes

“An R^d-valued stochastic process is Gaussian24 if all its finite-dimensional distributions are Gaussian measures.” Example: any Brownian motion with Gaussian initial distribution.

Definition 17.1 (One-dimensional Gaussian measure). Let m ∈ R and σ ≥ 0. The Gaussian measure (or normal distribution) with mean m and variance σ², denoted N_{m,σ²}, is

a) the Dirac measure at m if σ = 0, and

b) the measure with density (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} with respect to Lebesgue measure if σ > 0.

Remark 17.2. Let a ∈ R. Then the image measure of N_{m,σ²} under the map x ↦ ax equals N_{am,a²σ²}. (In probabilistic terms this means that if X is normally distributed with expectation m and variance σ², then aX is normally distributed with expectation am and variance a²σ².)

This observation motivates the following definition:

Definition 17.3 (Multidimensional Gaussian measure).

1. A probability measure µ on Rn is a Gaussian measure if for every vector λ ∈ Rn theimage measure of µ under the map x 7→ λ ·x is a one-dimensional Gaussian measure.

2. An Rn-valued random variable X is normally distributed if its distribution is aGaussian measure.

Remark 17.4. From the definition it follows immediately that an Rn-valued randomvariable X is Gaussian if and only if for every vector λ ∈ Rn the real-valued randomvariable λ ·X is Gaussian, i.e. if and only if every linear combination of the componentsof X is Gaussian.

Remark 17.5. For every matrix A ∈ Rm×n, every Rn-valued Gaussian random variable Xand every vector b ∈ Rm the Rm-valued random variable AX + b is Gaussian as well.

Proof. Let λ ∈ Rm. Then λ · (AX + b) = (ATλ) ·X + λ · b is Gaussian.

Remark 17.6. Let X and Y be independent Gaussian random variables, where X is Rm-valued and Y is Rn-valued. Then the Rm+n-valued random variable (X,Y ) is Gaussianas well.

Proof. Let λ = (λ1, λ2) ∈ Rm+n. Then λ · (X,Y ) = λ1 · X + λ2 · Y is the sum of twoindependent one-dimensional Gaussian random variables and hence itself Gaussian.

Remark 17.7. Let µ be a Gaussian measure on R^n. Then its mean m ∈ R^n, defined by

    m_i := ∫_{R^n} x_i µ(dx),

and its covariance matrix C ∈ R^{n×n}, defined by

    C_{ij} := ∫_{R^n} (x_i − m_i)(x_j − m_j) µ(dx),

both exist.

24 Carl Friedrich Gauss (1777–1855), German mathematician


Proof. Let X be a random variable with distribution µ. Then its components X_i, i = 1, …, n, are real-valued normally distributed random variables and therefore square-integrable. Consequently, m_i = E(X_i) and C_{ij} = Cov(X_i, X_j) exist.

Remark 17.8. The covariance matrix C is symmetric (i.e. C_{ij} = C_{ji}) and positive semi-definite: for all λ ∈ R^n we have

    ∑_{i,j=1}^n C_{ij} λ_i λ_j = ∑_{i,j=1}^n λ_i λ_j Cov(X_i, X_j) = Var(∑_{i=1}^n λ_i X_i) ≥ 0.

Theorem 17.9. For every vector m ∈ R^n and every symmetric and positive semi-definite matrix C ∈ R^{n×n} there exists exactly one Gaussian measure on R^n with mean m and covariance matrix C.

Proof. To prove existence, let R ∈ R^{n×n} be such that RR^T = C (such a matrix R exists because C is symmetric and positive semi-definite), X = (X_1, …, X_n) a vector of independent standard normally distributed random variables and Y := RX + m. Then we know that Y is Gaussian. Moreover, clearly E(Y_i) = m_i, and

    Cov(Y_i, Y_j) = Cov(∑_{α=1}^n R_{iα} X_α, ∑_{β=1}^n R_{jβ} X_β) = ∑_{α,β=1}^n R_{iα} R_{jβ} Cov(X_α, X_β) = (RR^T)_{ij} = C_{ij}.

To prove uniqueness we show that the Fourier transform25 of a Gaussian measure depends only on its mean and its covariance matrix.

Remark 17.10. The Fourier transform of a finite measure µ on R^d is the function µ̂ : R^d → C defined by

    µ̂(ξ) := ∫_{R^d} e^{iξ·y} µ(dy).

By the uniqueness theorem, two finite measures on R^d coincide if and only if their Fourier transforms coincide.

Lemma 17.11. Let µ be a Gaussian measure on R^n with mean m and covariance matrix C. Then its Fourier transform is given by

    µ̂(ξ) = exp( i m·ξ − (1/2) ξ^T C ξ ).

Proof. Let X be a random variable with distribution µ, so that µ̂(ξ) = E(e^{iξ·X}). The random variable ξ·X is normally distributed with expectation ξ·m and variance

    Var(ξ·X) = Var(∑_{i=1}^n ξ_i X_i) = ∑_{i,j=1}^n ξ_i ξ_j Cov(X_i, X_j) = ξ^T C ξ.

25 J. Fourier, Théorie Analytique de la Chaleur, Firmin Didot, Paris, 1822. (Joseph Fourier (1768–1830), French mathematician and physicist)


Consequently

    µ̂(ξ) = E(e^{iξ·X}) = ∫_R e^{ix} (1/√(2π ξ^T C ξ)) e^{−(x−ξ·m)²/(2 ξ^T C ξ)} dx
          = (1/√(2π ξ^T C ξ)) e^{iξ·m} ∫_R e^{ix} e^{−x²/(2 ξ^T C ξ)} dx.

Note that this integral cannot be computed by completing the square and applying the translation invariance of Lebesgue measure, because here we would need translation by a complex vector, while the translation invariance of Lebesgue measure holds only for translations by real vectors. Instead, the integral can be computed as follows: let

    Φ(t) := ∫_R e^{itx} e^{−x²/(2 ξ^T C ξ)} dx,

so that Φ(1) is the integral we are interested in, and Φ(0) = ∫_R e^{−x²/(2 ξ^T C ξ)} dx = √(2π ξ^T C ξ). Moreover,

    Φ′(t) = i ∫_R x e^{itx} e^{−x²/(2 ξ^T C ξ)} dx,

and

    t Φ(t) = −i ∫_R i t e^{itx} e^{−x²/(2 ξ^T C ξ)} dx
           = −i [ e^{itx} e^{−x²/(2 ξ^T C ξ)} ]_{x=−∞}^{x=∞} − (i/(ξ^T C ξ)) ∫_R x e^{itx} e^{−x²/(2 ξ^T C ξ)} dx
           = − Φ′(t)/(ξ^T C ξ)

(the boundary term being 0), so that

    Φ′(t) = − ξ^T C ξ · t Φ(t).

Since the function Ψ(t) := e−t2ξT Cξ

2 solves the same linear ODE, Ψ′(t) = −ξTCξtΨ(t), itfollows that

Φ(t) =Φ(0)

Ψ(0)Ψ(t) =

√2πξTCξ e−

ξT Cξt2

2

and therefore

µ(ξ) =1√

2πξTCξeiξ·mΦ(1) = exp

(iξm− 1

2ξTCξ

).

Definition 17.12. An $\mathbb{R}^d$-valued stochastic process $(X_t)_{t \in I}$ is called Gaussian if all its finite-dimensional distributions are Gaussian measures.

To keep notation simple we consider only real-valued Gaussian processes in the sequel. However, all definitions and results can easily be extended to $\mathbb{R}^d$-valued processes.

Definition 17.13. Let $(X_t)_{t \in I}$ be a real-valued Gaussian process. Its expectation function $m : I \to \mathbb{R}$ and its covariance function $\Gamma : I \times I \to \mathbb{R}$ are defined by
\[
m(t) := E(X_t) \qquad \text{and} \qquad \Gamma(s,t) := \mathrm{Cov}(X_s, X_t).
\]

Remark 17.14. The covariance function is symmetric and positive semi-definite, i.e. for all $t_1, \ldots, t_n \in I$ and all $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$ we have
\[
\sum_{i,j=1}^n \Gamma(t_i, t_j) \lambda_i \lambda_j \ge 0.
\]

Proof.
\[
\sum_{i,j=1}^n \Gamma(t_i, t_j) \lambda_i \lambda_j
= \sum_{i,j=1}^n \mathrm{Cov}(X_{t_i}, X_{t_j}) \lambda_i \lambda_j
= \mathrm{Var}\Bigl(\sum_{i=1}^n \lambda_i X_{t_i}\Bigr) \ge 0.
\]

Remark 17.15. Since every Gaussian measure is determined by its mean and its covariance matrix, the finite-dimensional distributions of a Gaussian process are determined by its expectation and covariance functions (i.e. two Gaussian processes with the same expectation and covariance functions have the same finite-dimensional distributions).

Theorem 17.16. Let $I$ be any set. For every function $m : I \to \mathbb{R}$ and every symmetric and positive semi-definite function $\Gamma : I \times I \to \mathbb{R}$ there exists a real-valued Gaussian process with expectation function $m$ and covariance function $\Gamma$.

Proof. For each non-empty finite subset $J$ of $I$ let $P_J$ be the Gaussian measure with mean $m|_J$ and covariance matrix $\Gamma|_{J \times J}$. We have to check that the family $(P_J)_{J \in I^*}$ is projective, i.e. that for all $J \subseteq J' \in I^*$ the image measure of $P_{J'}$ under $\pi^{J'}_J$ equals $P_J$. Let $Y = (Y_t)_{t \in J'}$ be a random variable with distribution $P_{J'}$. Then the distribution of the variable $(Y_t)_{t \in J}$ is the image measure of $P_{J'}$ under $\pi^{J'}_J$. Moreover, the variable $(Y_t)_{t \in J}$ is clearly Gaussian with mean $m|_J$ and covariance matrix $\Gamma|_{J \times J}$, hence has distribution $P_J$. The existence of the desired process now follows from Kolmogorov's extension theorem.

Example 17.17 (Standard Brownian motion). A real-valued stochastic process $(B_t)_{t \in \mathbb{R}_+}$ with continuous paths is a standard Brownian motion if and only if it is a Gaussian process with expectation function $m \equiv 0$ and covariance function $\Gamma$ given by $\Gamma(s,t) = s \wedge t$.

Proof. Let $(B_t)_{t \in \mathbb{R}_+}$ be a standard Brownian motion, and let $0 \le t_1 < \ldots < t_n$. To check that the random variable $(B_{t_1}, \ldots, B_{t_n})$ is Gaussian, let $Y_i := B_{t_i} - B_{t_{i-1}}$ for $i = 1, \ldots, n$, where $t_0 := 0$. By definition of Brownian motion, the increments $Y_1, \ldots, Y_n$ are independent and normally distributed, so that the vector $(Y_1, \ldots, Y_n)$ is Gaussian. Since $B_{t_i} = \sum_{j=1}^i Y_j$, it follows that the vector $(B_{t_1}, \ldots, B_{t_n})$ is Gaussian as well. Moreover, $E(B_t) = 0$, and for $s \le t$ we obtain
\[
\mathrm{Cov}(B_s, B_t) = E(B_s B_t)
= E(B_s^2) + E\bigl( B_s (B_t - B_s) \bigr)
= s + E\bigl( E( B_s (B_t - B_s) \mid \mathcal{F}_s ) \bigr)
= s + E\bigl( B_s \, E(B_t - B_s) \bigr)
= s,
\]
because $E(B_t - B_s) = 0$. The converse implication follows from the fact that the finite-dimensional distributions of a Gaussian process are determined by its expectation and covariance functions.
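As an illustration of this characterization (a sketch assuming numpy and a fixed time grid; the grid size and seed are arbitrary choices), one can build Brownian paths from independent $N(0, \Delta t)$ increments and check that the empirical covariance of $(B_s, B_t)$ is close to $s \wedge t$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 1000, 20_000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

# B_{t_k} = sum of the first k independent N(0, dt) increments, B_0 = 0.
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

# The empirical covariance of (B_s, B_t) should be close to s ∧ t.
s_idx, t_idx = 300, 800                      # s = 0.3, t = 0.8
emp = np.cov(B[:, s_idx], B[:, t_idx])[0, 1]
print("empirical Cov(B_0.3, B_0.8):", emp, " exact s ∧ t:", min(t[s_idx], t[t_idx]))
```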

Remark 17.18. It is not always easy to prove that a given symmetric function $\Gamma : I \times I \to \mathbb{R}$ is positive semi-definite.

Exercise. Give an elementary proof that the covariance function of standard Brownian motion, $\Gamma(s,t) = s \wedge t$, is positive semi-definite. (We already know that it is positive semi-definite because we know that it is the covariance function of a stochastic process.)

Solution. Note that
\[
s \wedge t = \int_0^\infty \mathbf{1}_{[0,s]}(r) \, \mathbf{1}_{[0,t]}(r) \, dr.
\]
Consequently,
\[
\sum_{i,j=1}^n (t_i \wedge t_j) \lambda_i \lambda_j
= \sum_{i,j=1}^n \lambda_i \lambda_j \int_0^\infty \mathbf{1}_{[0,t_i]}(r) \, \mathbf{1}_{[0,t_j]}(r) \, dr
= \int_0^\infty \sum_{i,j=1}^n \lambda_i \lambda_j \, \mathbf{1}_{[0,t_i]}(r) \, \mathbf{1}_{[0,t_j]}(r) \, dr
= \int_0^\infty \Bigl( \sum_{i=1}^n \lambda_i \mathbf{1}_{[0,t_i]}(r) \Bigr)^2 dr
\ge 0.
\]
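A purely numerical sanity check of the same fact (a sketch assuming numpy; it is of course no substitute for the proof above): the matrix $(t_i \wedge t_j)_{i,j}$ should have no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 10.0, size=50))   # arbitrary time points t_1, ..., t_n
G = np.minimum.outer(t, t)                     # G[i, j] = t_i ∧ t_j
print("smallest eigenvalue:", np.linalg.eigvalsh(G).min())   # nonnegative up to rounding
```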

Example 17.19 (Fractional Brownian motion). Let $H > 0$. A real-valued stochastic process $(B_t)_{t \in \mathbb{R}_+}$ is called a fractional Brownian motion of Hurst index $H$ if

1. it is Gaussian with expectation function $m \equiv 0$ and covariance function
\[
\Gamma(s,t) = \frac{1}{2} \bigl( s^{2H} + t^{2H} - |s-t|^{2H} \bigr),
\]
and

2. almost all its paths are continuous.

Remark 17.20. If $H = 1/2$, we get $\Gamma(s,t) = s \wedge t$. Hence a fractional Brownian motion of Hurst index $1/2$ is the same as a standard Brownian motion.

Proposition 17.21. The function $\Gamma(s,t) = \frac{1}{2}\bigl( s^{2H} + t^{2H} - |s-t|^{2H} \bigr)$ is positive semi-definite if and only if $H \le 1$. Consequently, a stochastic process with the first property of fractional Brownian motion with Hurst index $H$ exists if and only if $H \le 1$.

Proof. Assume first that $H > 1$. Choosing $n = 2$, $t_1 = 1$, $t_2 = 2$, $\lambda_1 = -2$ and $\lambda_2 = 1$ we obtain
\[
\sum_{i,j=1}^n \Gamma(t_i, t_j) \lambda_i \lambda_j
= 1 \cdot (-2) \cdot (-2) + 2 \cdot \tfrac{1}{2} \cdot 2^{2H} \cdot (-2) \cdot 1 + 2^{2H} \cdot 1 \cdot 1
= 4 - 2 \cdot 2^{2H} + 2^{2H}
= 4 - 2^{2H}
< 0
\]
because $H > 1$. Hence $\Gamma$ is not positive semi-definite.

The case $H = 1$ is easy. In this case
\[
\Gamma(s,t) = \frac{1}{2} \bigl( s^2 + t^2 - (s-t)^2 \bigr) = st
\]
and consequently
\[
\sum_{i,j=1}^n \Gamma(t_i, t_j) \lambda_i \lambda_j
= \sum_{i,j=1}^n t_i t_j \lambda_i \lambda_j
= \Bigl( \sum_{i=1}^n \lambda_i t_i \Bigr)^2 \ge 0.
\]

26 H. E. Hurst, Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civil Eng. 116 (1951), 400–410. (Harold Edwin Hurst (1880–1978), English hydrologist)

Assume now that $H \in (0,1)$ and consider for $t > 0$ the integral
\[
\int_0^\infty \frac{1 - e^{-t^2 x^2}}{x^{1+2H}} \, dx.
\]
To check that this integral is finite we have to study the integrand as $x \to \infty$ and as $x \to 0$. As $x \to \infty$ the integrand behaves like $x^{-1-2H}$ and, since $H > 0$, is therefore unproblematic. As $x \to 0$ the series expansion of the exponential function tells us that the integrand behaves like $t^2 x^{1-2H}$ which, since $H < 1$, is unproblematic as well. Consequently, the integral is finite. Moreover, the substitution $x = y/t$ implies that
\[
\int_0^\infty \frac{1 - e^{-t^2 x^2}}{x^{1+2H}} \, dx
= \frac{1}{t} \int_0^\infty \frac{1 - e^{-y^2}}{(y/t)^{1+2H}} \, dy
= t^{2H} \underbrace{\int_0^\infty \frac{1 - e^{-y^2}}{y^{1+2H}} \, dy}_{=:C}.
\]
Consequently,
\[
t^{2H} = \frac{1}{C} \int_0^\infty \frac{1 - e^{-t^2 x^2}}{x^{1+2H}} \, dx.
\]

It follows that
\begin{align*}
C \bigl( s^{2H} + t^{2H} - |s-t|^{2H} \bigr)
&= \int_0^\infty \frac{1 - e^{-s^2 x^2}}{x^{1+2H}} \, dx
+ \int_0^\infty \frac{1 - e^{-t^2 x^2}}{x^{1+2H}} \, dx
- \int_0^\infty \frac{1 - e^{-(s-t)^2 x^2}}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{1 + e^{-(s-t)^2 x^2} - e^{-s^2 x^2} - e^{-t^2 x^2}}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{1 + e^{-(s^2+t^2) x^2} - e^{-s^2 x^2} - e^{-t^2 x^2}}{x^{1+2H}} \, dx
+ \int_0^\infty \frac{e^{-(s-t)^2 x^2} - e^{-(s^2+t^2) x^2}}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{(1 - e^{-s^2 x^2})(1 - e^{-t^2 x^2})}{x^{1+2H}} \, dx
+ \int_0^\infty \frac{e^{-s^2 x^2} e^{2 s t x^2} e^{-t^2 x^2} - e^{-s^2 x^2} e^{-t^2 x^2}}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{(1 - e^{-s^2 x^2})(1 - e^{-t^2 x^2})}{x^{1+2H}} \, dx
+ \int_0^\infty \frac{e^{-s^2 x^2} e^{-t^2 x^2} \bigl( e^{2 s t x^2} - 1 \bigr)}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{(1 - e^{-s^2 x^2})(1 - e^{-t^2 x^2})}{x^{1+2H}} \, dx
+ \int_0^\infty \frac{e^{-s^2 x^2} e^{-t^2 x^2} \sum_{n=1}^\infty \frac{(2 s t x^2)^n}{n!}}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{(1 - e^{-s^2 x^2})(1 - e^{-t^2 x^2})}{x^{1+2H}} \, dx
+ \sum_{n=1}^\infty \frac{2^n}{n!} \int_0^\infty \frac{e^{-s^2 x^2} e^{-t^2 x^2} (s t x^2)^n}{x^{1+2H}} \, dx \\
&= \int_0^\infty \frac{(1 - e^{-s^2 x^2})(1 - e^{-t^2 x^2})}{x^{1+2H}} \, dx
+ \sum_{n=1}^\infty \frac{2^n}{n!} \int_0^\infty \frac{\bigl( s^n e^{-s^2 x^2} \bigr)\bigl( t^n e^{-t^2 x^2} \bigr)}{x^{1+2H-2n}} \, dx.
\end{align*}

We have hence succeeded in representing $\Gamma(s,t)$ in the form
\[
\Gamma(s,t) = \int f(s,\xi) \, f(t,\xi) \, \mu(d\xi)
\]

(a sort of “diagonalization” with a nonnegative diagonal matrix). It follows that
\[
\sum_{i,j=1}^n \Gamma(t_i, t_j) \lambda_i \lambda_j
= \sum_{i,j=1}^n \lambda_i \lambda_j \int f(t_i, \xi) \, f(t_j, \xi) \, \mu(d\xi)
= \int \sum_{i,j=1}^n \lambda_i \lambda_j \, f(t_i, \xi) \, f(t_j, \xi) \, \mu(d\xi)
= \int \Bigl( \sum_{i=1}^n \lambda_i f(t_i, \xi) \Bigr)^2 \mu(d\xi)
\ge 0.
\]
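The dichotomy of Proposition 17.21 can also be observed numerically (again only a sketch assuming numpy, with an arbitrary grid): for $H \le 1$ the matrix $(\Gamma(t_i,t_j))_{i,j}$ has no negative eigenvalues up to rounding, while for $H > 1$ clearly negative eigenvalues appear.

```python
import numpy as np

def fbm_cov(t, H):
    """Matrix of Γ(t_i, t_j) = (t_i^{2H} + t_j^{2H} - |t_i - t_j|^{2H}) / 2."""
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))

t = np.linspace(0.1, 2.0, 40)
for H in (0.3, 0.5, 0.9, 1.0, 1.2):
    lam_min = np.linalg.eigvalsh(fbm_cov(t, H)).min()
    print(f"H = {H:3.1f}: smallest eigenvalue = {lam_min:+.3e}")
# Expected: nonnegative (up to rounding) for H <= 1, clearly negative for H = 1.2.
```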

Proposition 17.22. Fractional Brownian motion of Hurst index $H$ exists if and only if $H \le 1$. In this case, almost all its paths are locally Hölder continuous of any exponent $\gamma < H$.

Proof. We have already seen that a process with the first property of fractional Brownian motion of Hurst index $H$ exists if and only if $H \le 1$. It remains to show that such a process $(X_t)_{t \in \mathbb{R}_+}$ admits a continuous modification. To do so, observe that for all $s, t \in \mathbb{R}_+$ the variable $X_t - X_s$ is Gaussian with mean $0$ and variance
\[
\mathrm{Var}(X_t - X_s) = \mathrm{Var}(X_t) + \mathrm{Var}(X_s) - 2\,\mathrm{Cov}(X_s, X_t)
= \frac{1}{2} \bigl( 2 t^{2H} + 2 s^{2H} - 2 ( s^{2H} + t^{2H} - |s-t|^{2H} ) \bigr)
= |s-t|^{2H}
\]
(note the analogy with usual Brownian motion). Let now
\[
Z_{s,t} := \frac{X_t - X_s}{|t-s|^H}.
\]
$Z_{s,t}$ is normally distributed with mean $0$ and variance $1$, so that $C_{\alpha,H} := E(|Z_{s,t}|^\alpha)$ does not depend on $s$ and $t$. It follows that
\[
E(|X_t - X_s|^\alpha) = |t-s|^{\alpha H} E(|Z_{s,t}|^\alpha) = C_{\alpha,H} \, |t-s|^{\alpha H}.
\]
For any $\alpha > 0$ let $\beta := \alpha H - 1$. Then $\beta$ is positive as soon as $\alpha > 1/H$. Moreover $\beta/\alpha \to H$ as $\alpha \to \infty$. The result therefore follows from Kolmogorov's continuity theorem.
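On a finite grid the first property of fractional Brownian motion can be simulated directly from the covariance function, which makes the dependence of path roughness on $H$ visible (a sketch assuming numpy; the grid, the seed and the helper name sample_fbm are ad hoc choices, and no claim about Hölder exponents is checked rigorously).

```python
import numpy as np

def sample_fbm(t, H, n_paths, rng):
    """Sample a centered Gaussian vector with the fBm covariance on the grid t."""
    s, u = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    w, U = np.linalg.eigh(cov)                        # spectral square root (PSD-safe)
    R = U @ np.diag(np.sqrt(np.clip(w, 0.0, None)))
    return rng.standard_normal((n_paths, len(t))) @ R.T

rng = np.random.default_rng(3)
t = np.linspace(0.01, 1.0, 500)
rough = sample_fbm(t, H=0.2, n_paths=3, rng=rng)      # visibly very irregular paths
smooth = sample_fbm(t, H=0.9, n_paths=3, rng=rng)     # much smoother paths
print(rough.shape, smooth.shape)
```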

The characterization of standard Brownian motion as a Gaussian process with expectation function $m \equiv 0$ and covariance function $\Gamma(s,t) = s \wedge t$ allows us to prove the following remarkable fact:

Proposition 17.23 (Time inversion of Brownian motion). Let $(B_t)_{t \in \mathbb{R}_+}$ be a one-dimensional standard Brownian motion. Then the process $(Y_t)_{t \in \mathbb{R}_+}$ defined by
\[
Y_t := \begin{cases} t B_{1/t} & \text{if } t > 0, \\ 0 & \text{if } t = 0 \end{cases}
\]
is a standard Brownian motion as well.

Proof. The process $(Y_t)_{t \in \mathbb{R}_+}$ is clearly a Gaussian process with $E(Y_t) = t E(B_{1/t}) = 0$ and, for $0 < s \le t$,
\[
\mathrm{Cov}(Y_s, Y_t) = s t \,\mathrm{Cov}(B_{1/s}, B_{1/t}) = s t \cdot \frac{1}{t} = s.
\]
Hence it has the finite-dimensional distributions of Brownian motion and consequently satisfies
\[
E(|Y_t - Y_s|^\alpha) = C_\alpha (t-s)^{\alpha/2}.
\]
Therefore, by Kolmogorov's continuity theorem it has a continuous modification $(\tilde Y_t)_{t \in \mathbb{R}_+}$. Hence for each $t \in \mathbb{R}_+$ we have $Y_t = \tilde Y_t$ almost surely; in particular, there exists an event $\Omega_1$ with $P(\Omega_1) = 1$ such that on $\Omega_1$ we have $Y_t = \tilde Y_t$ for all $t \in \mathbb{Q}_+$. Moreover there exists an event $\Omega_2$ with $P(\Omega_2) = 1$ such that on $\Omega_2$ the process $(Y_t)_{t \in \mathbb{R}_+}$ is continuous everywhere except possibly at $0$. It follows that on $\Omega_1 \cap \Omega_2$ we have $Y_t = \tilde Y_t$ for all $t \in \mathbb{R}_+$, hence almost all paths of the process $(Y_t)_{t \in \mathbb{R}_+}$ are continuous even at $0$.

18 Path properties of Brownian motion

We already know that almost every path of a Brownian motion is locally Hölder continuous of any exponent $\gamma < 1/2$. A natural question is whether Brownian paths have even more regularity, in particular whether they are differentiable. This question is answered by the law of the iterated logarithm. For its proof we will need the second part of the Borel–Cantelli lemma:

Proposition 18.1. Let $(A_n)_{n \in \mathbb{N}}$ be a sequence of independent events such that $\sum_{n=1}^\infty P(A_n) = \infty$. Then
\[
P\Bigl( \bigcap_{n \in \mathbb{N}} \bigcup_{m \ge n} A_m \Bigr) = 1,
\]
i.e. with probability $1$ infinitely many of the events $A_n$ occur.

Proof. It suffices to prove that for each $n \in \mathbb{N}$ we have $P\bigl( \bigcup_{m \ge n} A_m \bigr) = 1$ or, in other words, $P\bigl( \bigcap_{m \ge n} A_m^c \bigr) = 0$. Using the independence of the events $(A_n)_{n \in \mathbb{N}}$ and the inequality $1 - x \le e^{-x}$ we obtain
\[
P\Bigl( \bigcap_{m \ge n} A_m^c \Bigr)
= \lim_{N \to \infty} P\Bigl( \bigcap_{m=n}^N A_m^c \Bigr)
= \lim_{N \to \infty} \prod_{m=n}^N P(A_m^c)
= \lim_{N \to \infty} \prod_{m=n}^N \bigl( 1 - P(A_m) \bigr)
\le \liminf_{N \to \infty} \prod_{m=n}^N e^{-P(A_m)}
= \liminf_{N \to \infty} \exp\Bigl( - \sum_{m=n}^N P(A_m) \Bigr)
= 0.
\]
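A small simulation may illustrate the statement (a sketch assuming numpy; the choices $P(A_n) = 1/n$ and $P(A_n) = 1/n^2$ are just examples): when the probabilities are not summable, occurrences keep appearing arbitrarily late, in contrast to the summable case covered by the first Borel–Cantelli lemma.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
n = np.arange(1, N + 1)

# Independent events A_n with P(A_n) = p_n, simulated once each.
for label, p in (("p_n = 1/n   (sum diverges)", 1.0 / n),
                 ("p_n = 1/n^2 (sum converges)", 1.0 / n**2)):
    occurred = rng.random(N) < p
    last = n[occurred].max() if occurred.any() else 0
    print(f"{label}: {occurred.sum()} occurrences, last at n = {last}")
# Typically the last occurrence is close to N in the divergent case and very
# small in the convergent case, matching the two Borel-Cantelli lemmas.
```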

Moreover, we need the following lemma:

Lemma 18.2. For all $\alpha > 0$ and $t > 0$,
\[
P\Bigl( \sup_{s \in [0,t]} B_s \ge \alpha t \Bigr) \le e^{-\alpha^2 t / 2}.
\]

Proof. Since $\alpha s / 2 \le \alpha t / 2$ for $s \in [0,t]$, we clearly have
\[
\Bigl\{ \sup_{s \in [0,t]} B_s \ge \alpha t \Bigr\}
= \Bigl\{ \sup_{s \in [0,t]} \bigl( B_s - \alpha t / 2 \bigr) \ge \alpha t / 2 \Bigr\}
\subseteq \Bigl\{ \sup_{s \in [0,t]} \bigl( B_s - \alpha s / 2 \bigr) \ge \alpha t / 2 \Bigr\}
= \Bigl\{ \sup_{s \in [0,t]} e^{\alpha B_s - \alpha^2 s / 2} \ge e^{\alpha^2 t / 2} \Bigr\}.
\]
We now apply the first maximal inequality to the martingale $(e^{\alpha B_t - \alpha^2 t / 2})_{t \in \mathbb{R}_+}$ and obtain
\[
P\Bigl( \sup_{s \in [0,t]} e^{\alpha B_s - \alpha^2 s / 2} \ge e^{\alpha^2 t / 2} \Bigr)
\le e^{-\alpha^2 t / 2} \, E\bigl[ e^{\alpha B_t - \alpha^2 t / 2} \bigr]
= e^{-\alpha^2 t / 2}.
\]
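A crude Monte Carlo check of this bound (a sketch assuming numpy; the discretized supremum slightly underestimates the true one, and $\alpha$, $t$ and the step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
t, alpha = 2.0, 1.5
n_steps, n_paths = 2000, 100_000
dt = t / n_steps

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
sup_B = np.cumsum(increments, axis=1).max(axis=1)     # sup of B over the grid

empirical = (sup_B >= alpha * t).mean()
bound = np.exp(-alpha**2 * t / 2)
print("P(sup B_s >= alpha t) approx.", empirical, " bound:", bound)
# The empirical probability should stay below the bound e^{-alpha^2 t / 2}.
```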

Theorem 18.3 (Law of the iterated logarithm for the short-time behavior of Brownian motion). Let $(B_t)_{t \in \mathbb{R}_+}$ be a one-dimensional standard Brownian motion. Then almost surely
\[
\limsup_{t \to 0} \frac{B_t}{\sqrt{2 t \log\log(1/t)}} = 1,
\qquad
\liminf_{t \to 0} \frac{B_t}{\sqrt{2 t \log\log(1/t)}} = -1.
\]

Remark 18.4.

1. For $t \in (0, 1/e)$ we have $1/t > e$, hence $\log(1/t) > 1$, and consequently $\log\log(1/t) > 0$. Consequently, the functions
\[
\psi(t) := t \log\log(1/t)
\qquad \text{and} \qquad
\varphi(t) := \sqrt{2 t \log\log(1/t)}
\]
are defined for all $t \in (0, 1/e)$. Moreover,
\[
\psi'(t) = \log\log(1/t) + t \cdot \frac{1}{\log(1/t)} \cdot t \cdot \Bigl( -\frac{1}{t^2} \Bigr) = \log\log(1/t) - \frac{1}{\log(1/t)},
\]
so that for $t \le e^{-e}$
\[
\psi'(t) \ge 1 - 1/e > 0.
\]
Hence $\varphi$ is strictly increasing on the interval $(0, e^{-e})$. In the sequel we will only work on this interval.

2. As $t \to 0$, $\log\log(1/t) \to \infty$, but only extremely slowly. For example, $\log\log 10^3 \approx 1.9$, $\log\log 10^6 \approx 2.6$, $\log\log 10^{100} \approx 5.4$ (see the short numerical check after this list). In particular, $t \log\log(1/t) \to 0$ as $t \to 0$.

3. Picture!!

4. The result implies that almost every path of Brownian motion is not even pointwise Hölder continuous of exponent $1/2$ at $t = 0$ and in particular not differentiable at $0$.

5. Since the process $(-B_t)_{t \in \mathbb{R}_+}$ is a standard Brownian motion as well, it suffices to prove the first statement.

6. There is also a law of the iterated logarithm for the long-time behavior of Brownian motion, which can be obtained from the short-time law by time inversion.
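The values quoted in item 2 can be reproduced with natural logarithms (a minimal sketch assuming numpy):

```python
import numpy as np

for x in (1e3, 1e6, 1e100):
    print(f"log log {x:g} = {np.log(np.log(x)):.2f}")
# Approximately 1.93, 2.63 and 5.44, as quoted above.
```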

Proof of Theorem 18.3. We will show that for every $\varepsilon > 0$,
\[
P\Bigl( \limsup_{t \to 0} \frac{B_t}{\varphi(t)} > 1 + \varepsilon \Bigr) = 0,
\qquad
P\Bigl( \limsup_{t \to 0} \frac{B_t}{\varphi(t)} < 1 - \varepsilon \Bigr) = 0.
\]

To prove the first of these statements let $t_n := (1+\varepsilon)^{-n}$. Since $t_n \downarrow 0$, there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$ we have $t_n < e^{-e}$. For $n \ge n_0$ let
\[
C_n := \Bigl\{ \sup_{t \in [t_{n+1}, t_n]} \frac{B_t}{\varphi(t)} > 1 + \varepsilon \Bigr\}.
\]
Then each $\omega \in \bigl\{ \limsup_{t \to 0} \frac{B_t}{\varphi(t)} > 1 + \varepsilon \bigr\}$ is contained in infinitely many of the events $C_n$, i.e.
\[
\Bigl\{ \limsup_{t \to 0} \frac{B_t}{\varphi(t)} > 1 + \varepsilon \Bigr\} \subseteq \bigcap_{n \ge n_0} \bigcup_{m \ge n} C_m.
\]

In order to apply the Borel–Cantelli lemma, we have to show that
\[
\sum_{n=n_0}^\infty P(C_n) < \infty.
\]
To do so, note that since $\varphi$ is increasing on the interval $(0, e^{-e})$,
\[
C_n \subseteq \Bigl\{ \sup_{t \in [0, t_n]} \frac{B_t}{\varphi(t_{n+1})} > 1 + \varepsilon \Bigr\}
= \Bigl\{ \sup_{t \in [0, t_n]} B_t > (1+\varepsilon)\,\varphi(t_{n+1}) \Bigr\},
\]
so that, by Lemma 18.2 applied with $\alpha = (1+\varepsilon)\varphi(t_{n+1})/t_n$,
\begin{align*}
P(C_n) &\le P\Bigl( \sup_{t \in [0, t_n]} B_t > (1+\varepsilon)\,\varphi(t_{n+1}) \Bigr)
= P\Bigl( \sup_{t \in [0, t_n]} B_t > \frac{(1+\varepsilon)\varphi(t_{n+1})}{t_n} \, t_n \Bigr) \\
&\le \exp\Bigl( - \frac{(1+\varepsilon)^2 \varphi(t_{n+1})^2}{2 t_n} \Bigr)
= \exp\Bigl( - (1+\varepsilon)^2 \, \frac{2 (1+\varepsilon)^{-(n+1)} \log\log\bigl( (1+\varepsilon)^{n+1} \bigr)}{2 (1+\varepsilon)^{-n}} \Bigr) \\
&= \exp\Bigl( - (1+\varepsilon) \log\log\bigl( (1+\varepsilon)^{n+1} \bigr) \Bigr)
= \exp\Bigl( - (1+\varepsilon) \log\bigl( (n+1) \log(1+\varepsilon) \bigr) \Bigr) \\
&= \bigl( (n+1) \log(1+\varepsilon) \bigr)^{-(1+\varepsilon)}
= \bigl( \log(1+\varepsilon) \bigr)^{-(1+\varepsilon)} \, (n+1)^{-(1+\varepsilon)},
\end{align*}

and consequently
\[
\sum_{n=n_0}^\infty P(C_n) < \infty,
\]
so that
\[
P\Bigl( \limsup_{t \to 0} \frac{B_t}{\varphi(t)} > 1 + \varepsilon \Bigr)
\le P\Bigl( \bigcap_{n \ge n_0} \bigcup_{m \ge n} C_m \Bigr) = 0.
\]

To prove that
\[
P\Bigl( \limsup_{t \to 0} \frac{B_t}{\varphi(t)} < 1 - \varepsilon \Bigr) = 0,
\]
let $\delta := \varepsilon/2$ and $q := \varepsilon^2/64$, so that
\[
\delta + 2(1+\delta)\sqrt{q} = \frac{\varepsilon}{2} + \frac{(2+\varepsilon)\varepsilon}{8} < \varepsilon
\]
if $\varepsilon \le 1$, and
\[
\alpha := \frac{2(1-\delta)^2}{1-q} = \frac{2(1-\varepsilon/2)^2}{1 - \varepsilon^2/64}
= \frac{2(1-\varepsilon/2)^2}{(1-\varepsilon/8)(1+\varepsilon/8)}
< \frac{2(1-\varepsilon/2)}{1-\varepsilon/8} < 2.
\]

Since $q < 1$, the sequence $(t_n)_{n \in \mathbb{N}}$ defined by $t_n := q^n$ is strictly decreasing and converges to $0$. The random variables $Z_n := B_{t_n} - B_{t_{n+1}}$ are independent. Since $Z_n$ is normally distributed with mean $0$ and variance $t_n - t_{n+1}$, the variable $Z_n / \sqrt{t_n - t_{n+1}}$ is standard normally distributed, so that for each $x \ge 1$ we obtain
\begin{align*}
P\bigl( Z_n > x \sqrt{t_n - t_{n+1}} \bigr)
&= \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\frac{y^2}{2}} \, dy \\
&> \frac{1}{\sqrt{2\pi}} \int_x^\infty \frac{1 + y^{-2}}{1 + x^{-2}} \, e^{-\frac{y^2}{2}} \, dy \\
&= \frac{1}{\sqrt{2\pi}\,(1 + x^{-2})} \Bigl( \int_x^\infty e^{-\frac{y^2}{2}} \, dy + \int_x^\infty \frac{1}{y^2} \, e^{-\frac{y^2}{2}} \, dy \Bigr) \\
&= \frac{1}{\sqrt{2\pi}\,(1 + x^{-2})} \Bigl( \int_x^\infty e^{-\frac{y^2}{2}} \, dy
+ \Bigl[ -\frac{1}{y} \, e^{-\frac{y^2}{2}} \Bigr]_{y=x}^{y=\infty}
- \int_x^\infty e^{-\frac{y^2}{2}} \, dy \Bigr) \\
&= \frac{1}{\sqrt{2\pi}} \, \frac{1}{x (1 + x^{-2})} \, e^{-\frac{x^2}{2}} \\
&\ge \frac{1}{\sqrt{2\pi}} \, \frac{1}{2x} \, e^{-\frac{x^2}{2}}.
\end{align*}

We apply this inequality to
\[
x_n := (1-\delta) \, \frac{\varphi(t_n)}{\sqrt{t_n - t_{n+1}}}
= \frac{(1-\delta) \sqrt{2 q^n \log\log q^{-n}}}{\sqrt{q^n - q^{n+1}}}
= \frac{(1-\delta)\sqrt{2}}{\sqrt{1-q}} \, \sqrt{\log\bigl( n \log(1/q) \bigr)}
= \sqrt{\alpha} \, \sqrt{\log\bigl( n \log(1/q) \bigr)}.
\]

Clearly $x_n \to \infty$ as $n \to \infty$; in particular $x_n > 1$ for all sufficiently large $n$. It follows that for all sufficiently large $n$ (so large that $x_n \ge 1$ and $n \ge \log(1/q)$) we have
\begin{align*}
P\bigl( Z_n > (1-\delta)\varphi(t_n) \bigr)
&= P\bigl( Z_n > x_n \sqrt{t_n - t_{n+1}} \bigr)
> \frac{1}{\sqrt{2\pi}} \, \frac{1}{2 x_n} \, e^{-\frac{x_n^2}{2}} \\
&= \frac{1}{2\sqrt{2\pi}} \, \frac{\exp\bigl( -\frac{\alpha}{2} \log( n \log(1/q) ) \bigr)}{\sqrt{\alpha \log( n \log(1/q) )}} \\
&\ge \frac{1}{2\sqrt{2\pi}} \, \frac{\exp\bigl( -\frac{\alpha}{2} ( \log n + \log(1/q) ) \bigr)}{\sqrt{\alpha \log(n^2)}} \\
&= \underbrace{\frac{\exp\bigl( -\frac{\alpha}{2} \log(1/q) \bigr)}{2\sqrt{2\pi}\,\sqrt{2\alpha}}}_{=:C} \; n^{-\alpha/2} \, \frac{1}{\sqrt{\log n}}.
\end{align*}

Since $\alpha < 2$, the series $\sum_{n=2}^\infty n^{-\alpha/2} \frac{1}{\sqrt{\log n}}$ diverges, so that the second part of the Borel–Cantelli lemma implies that
\[
P\Bigl( \bigcap_{n \in \mathbb{N}} \bigcup_{m \ge n} \bigl\{ Z_m > (1-\delta)\varphi(t_m) \bigr\} \Bigr) = 1,
\]
i.e. with probability $1$ we have
\[
Z_n > (1-\delta)\varphi(t_n) \quad \text{for infinitely many } n.
\]

Consequently,
\[
\frac{B_{t_n}}{\varphi(t_n)} = \frac{B_{t_{n+1}} + Z_n}{\varphi(t_n)} > 1 - \delta + \frac{B_{t_{n+1}}}{\varphi(t_n)} \quad \text{for infinitely many } n.
\]

Since by the upper estimate (applied to the Brownian motion $(-B_t)_{t \in \mathbb{R}_+}$) we have with probability $1$
\[
\limsup_{t \to 0} \frac{-B_t}{\varphi(t)} < 1 + \delta,
\]
we have
\[
B_{t_{n+1}} > -(1+\delta)\,\varphi(t_{n+1}) \quad \text{for all sufficiently large } n,
\]
and therefore
\[
\frac{B_{t_n}}{\varphi(t_n)} > 1 - \delta - (1+\delta) \, \frac{\varphi(t_{n+1})}{\varphi(t_n)} \quad \text{for infinitely many } n.
\]

Since
\[
\frac{\varphi(t_{n+1})}{\varphi(t_n)}
= \frac{\sqrt{2 q^{n+1} \log\log(1/q^{n+1})}}{\sqrt{2 q^n \log\log(1/q^n)}}
= \sqrt{q} \, \underbrace{\sqrt{\frac{\log\log(1/q^{n+1})}{\log\log(1/q^n)}}}_{\to 1}
\to \sqrt{q},
\]
we have $\varphi(t_{n+1})/\varphi(t_n) \le 2\sqrt{q}$ for all sufficiently large $n$, and it follows that
\[
\frac{B_{t_n}}{\varphi(t_n)} > 1 - \delta - 2(1+\delta)\sqrt{q} > 1 - \varepsilon \quad \text{for infinitely many } n
\]
(because $\delta$ and $q$ were chosen in such a way that $\delta + 2(1+\delta)\sqrt{q} < \varepsilon$). Hence with probability $1$ we have $\limsup_{t \to 0} B_t / \varphi(t) \ge 1 - \varepsilon$, which proves the second statement.

Corollary 18.5 (Law of the iterated logarithm for the long-time behavior of Brownian motion). Let $(B_t)_{t \in \mathbb{R}_+}$ be a one-dimensional standard Brownian motion. Then almost surely
\[
\limsup_{t \to \infty} \frac{B_t}{\sqrt{2 t \log\log t}} = 1,
\qquad
\liminf_{t \to \infty} \frac{B_t}{\sqrt{2 t \log\log t}} = -1.
\]

Proof. We know that the process $(Y_t)_{t \in \mathbb{R}_+}$ defined by
\[
Y_t := \begin{cases} t B_{1/t} & \text{if } t > 0, \\ 0 & \text{if } t = 0 \end{cases}
\]
is a standard Brownian motion as well. Consequently,
\[
1 = \limsup_{t \to 0} \frac{Y_t}{\sqrt{2 t \log\log(1/t)}}
= \limsup_{t \to 0} \frac{t B_{1/t}}{\sqrt{2 t \log\log(1/t)}}
= \limsup_{t \to 0} \frac{B_{1/t}}{\sqrt{(2/t) \log\log(1/t)}}
= \limsup_{t \to \infty} \frac{B_t}{\sqrt{2 t \log\log t}}.
\]
The statement about the $\liminf$ follows in the same way, or by applying this result to the Brownian motion $(-B_t)_{t \in \mathbb{R}_+}$.
