m5a42 applied stochastic processes - imperial college …pavl/m5a42lec1.pdf · course overview...

Course Overview Course Outline Bibliography Introduction Elements of Probability Theory

M5A42 APPLIED STOCHASTIC

PROCESSES

Professor G.A. Pavliotis

Department of Mathematics Imperial College London, UK

LECTURE 1 06/10/2016


Lectures: Thursdays 14:00-15:00, Huxley 140, Fridays

10:00-12:00, Huxley 130.

Office Hours: Thursdays 15:00-16:00, Fridays 13:00-14:00

or by appointment.

Course webpage:

http://www.ma.imperial.ac.uk/~pavl/M4A42.htm

Text: Lecture notes, available from the course webpage.

Also, recommended reading from various textbooks.

http://www.ma.imperial.ac.uk/~pavl/M4A42.htm


This is an introductory course on stochastic processes and

their applications, aimed towards students in applied

mathematics.

The emphasis of the course will be on the presentation of

analytical tools that are useful in the study of stochastic

models that appear in various problems in applied

mathematics, physics, chemistry and biology.

Numerical methods for stochastic processes are presented

in the course M5A44 Computational Stochastic Processes

that is offered in Term 2.

This is a year-long introductory graduate level course

on stochastic processes: the analytical techniques that

will be presented in Term I (M5A42) will provide the

necessary theoretical background for the development of

the computational techniques for studying stochastic

processes that will be developed in Term II (M5A44).


Prerequisites

Elementary probability theory.

Ordinary and partial differential equations.Linear algebra.

Some familiarity with analysis (measure theory, linearfunctional analysis) is desirable but not necessary.

Course Objectives

By the end of the course you are expected to be familiar

with the basic concepts of the theory of stochasticprocesses in continuous time and to be able to use various

analytical techniques to study stochastic models that

appear in applications.


Course assessment

Final exam (May/June 2017).There will be no assessed coursework.

Problem Sheets–Feedback

Problem sheets and solutions are already available from

the course webpage.

Problem classes/office hours.

Please do contact me, come to my office during office

hours.

Student Evaluation Forms.


Probability theory and random variables (2 lectures).

Basic definitions, probability spaces, probability measures

etc. Random variables, conditional expectation,

characteristic functions, limits theorems.

Stochastic processes (6 lectures). Basic definitions.

Brownian motion. Stationary processes. Other examples

of stationary processes. The Karhunen-Loeve expansion.

Markov processes (4 lectures). Introduction and

examples. Basic definitions. The Chapman-Kolmogorov

equation. The generator of a Markov process and its

adjoint. Ergodic and stationary Markov processes.

Diffusion processes (4 lectures). Basic definitions and

examples. The backward and forward (Fokker-Planck)

Kolmogorov equations. Connection between diffusion

processes and stochastic differential equations.


Stochastic Differential Equations (6 lectures). Basic

properties of SDEs. Itô’s formula. Linear SDEs. SDEs with

multiplicative noise.

The Fokker-Planck equation (6 lectures). Basic

properties of the FP equation. Examples of diffusion

processes and of the FP equation. The

Ornstein-Uhlenbeck process. Gradient flows and

eigenfunction expansions.

Exit problems for diffusion processes The mean first

passage time. One dimensional examples. Escape from a

potential well. Stochastic resonance.


Lecture notes will be provided for all the material that we

will cover in this course. The notes will be available from

the course webpage.

The notes are based on my book Stochastic processes

and applications : diffusion processes, the Fokker-Planck

and Langevin equations. It is available from the Central

Library, 519.23 PAV. The material relevant for this course

will be available from the course webpage.

There are many excellent textbooks/review articles on

applied stochastic processes, at a level and style similar to

that of this course.

Standard textbooks that cover the material on probability

theory, Markov chains and stochastic processes are:

Grimmett and Stirzaker: Probability and Random

Processes.Karlin and Taylor: A First Course in Stochastic Processes.

Lawler: Introduction to Stochastic Processes.Resnick: Adventures in Stochastic Processes.


Books on stochastic processes with a view towardsapplications, mostly to physics, are:

Horsthemke and Lefever: Noise induced transitions.

Risken: The Fokker-Planck equation.Gardiner: Handbook of stochastic methods.

van Kampen: Stochastic processes in physics and

chemistry.Mazo: Brownian motion: fluctuations, dynamics and

applications.

Chorin and Hald: Stochastic tools for mathematics andscience.

Gillespie; Markov Processes.


The rigorous mathematical theory of probability andstochastic processes is presented in

Koralov and Sinai: Theory of probability and random

processes.Karatzas and Shreeve: Brownian motion and stochastic

calculus.Revuz and Yor: Continuous martingales and Brownian

motion.

Stroock: Probability theory, an analytic view.

Books on stochastic differential equations and theirnumerical solution are

Oksendal: Stochastic differential equations.

Kloeden and Platen, Numerical Solution of StochasticDifferential Equations.

An excellent book on the theory and the applications ofstochastic processes is

Bhatthacharya and Waymire: Stochastic processes andapplications.


A stochastic process is used to model systems thatevolve in time and whose laws of evolution are probabilisticin nature.

The state of the system evolves in time and can bedescribed through a state variable x(t).The evolution of the state of the system depends on the

outcome of an experiment. We can write x = x(t , ω), whereω denotes the outcome of the experiment.

Examples:

The random walk in one dimension.

Brownian motion.The exchange rate between the British sterling and the US

dollar.Photon emission.

The spread of the SARS epidemic.


The One-Dimensional Random Walk

We let time be discrete, i.e. t = 0, 1, . . . . Consider the following

stochastic process Sn:

S0 = 0;

at each time step it moves to ±1 with equal probability 12.

In other words, at each time step we flip a fair coin. If the

outcome is heads, we move one unit to the right. If the outcome

is tails, we move one unit to the left.

Alternatively, we can think of the random walk as a sum of

independent random variables:

Sn =

n∑

j=1

Xj ,

where Xj ∈ −1,1 with P(Xj = ±1) = 12 .


We can simulate the random walk on a computer:

We need a (pseudo)random number generator to

generate n independent random variables which are

uniformly distributed in the interval [0,1].

If the value of the random variable is >12

then the particle

moves to the left, otherwise it moves to the right.

We then take the sum of all these random moves.

The sequence SnNn=1 indexed by the discrete time

T = 1, 2, . . .N is the path of the random walk. We use a

linear interpolation (i.e. connect the points n,Sn by

straight lines) to generate a continuous path.


0 5 10 15 20 25 30 35 40 45 50

−6

−4

−2

0

2

4

6

8

50−step random walk

Figure: Three paths of the random walk of length N = 50.


0 100 200 300 400 500 600 700 800 900 1000

−50

−40

−30

−20

−10

0

10

20

1000−step random walk

Figure: Three paths of the random walk of length N = 1000.


Every path of the random walk is different: it depends on

the outcome of a sequence of independent random

experiments.

We can compute statistics by generating a large number of

paths and computing averages. For example,

E(Sn) = 0, E(S2n) = n.

The paths of the random walk (without the linear

interpolation) are not continuous: the random walk has a

jump of size 1 at each time step.

This is an example of a discrete time, discrete space

stochastic processes.

The random walk is a time-homogeneous (the

probabilistic law of evolution is independent of time)

Markov (the future depends only on the present and not

on the past) process.

If we take a large number of steps, the random walk starts

looking like a continuous time process with continuous

paths.


Consider the sequence of continuous time stochastic

processes

Z nt :=

1√n

Snt .

In the limit as n → ∞, the sequence Z nt converges (in

some appropriate sense) to a Brownian motion with

diffusion coefficient D = ∆x2

2∆t = 12 .


0 0.2 0.4 0.6 0.8 1−1.5

−1

−0.5

0

0.5

1

1.5

2

t

U(t)

mean of 1000 paths5 individual paths

Figure: Sample Brownian paths.


Brownian motion W (t) is a continuous time stochastic

processes with continuous paths that starts at 0

(W (0) = 0) and has independent, normally. distributed

Gaussian increments.

We can simulate the Brownian motion on a computer using

a random number generator that generates normally

distributed, independent random variables.


We can write an equation for the evolution of the paths of a

Brownian motion Xt with diffusion coefficient D starting at

x:

dXt =√

2DdWt , X0 = x .

This is an example of a stochastic differential equation.

The probability of finding Xt at y at time t , given that it was

at x at time t = 0, the transition probability density

ρ(y , t) satisfies the PDE

∂ρ

∂t= D

∂2ρ

∂y2, ρ(y ,0) = δ(y − x).

This is an example of the Fokker-Planck equation.

The connection between Brownian motion and the

diffusion equation was made by Einstein in 1905.


Why introduce randomness in the description of physical

systems?

To describe outcomes of a repeated set of experiments.

Think of tossing a coin repeatedly or of throwing a dice.

To describe a deterministic system for which we haveincomplete information: we have imprecise knowledge ofinitial and boundary conditions or of model parameters.

ODEs with random initial conditions are equivalent tostochastic processes that can be described using

stochastic differential equations.

To describe systems for which we are not confident about

the validity of our mathematical model (uncertainty

quantification).


To describe a dynamical system exhibiting very

complicated behavior (chaotic dynamical systems).

Determinism versus predictability.

To describe a high dimensional deterministic system using

a simpler, low dimensional stochastic system. Think of the

physical model for Brownian motion (a heavy particle

colliding with many small particles).

To describe a system that is inherently random. Think of

quantum mechanics.


ELEMENTS OF PROBABILITY THEORY


Definition

The set of all possible outcomes of an experiment is called the

sample space and is denoted by Ω.

Example

The possible outcomes of the experiment of tossing a coin

are H and T . The sample space is Ω =

H, T

.

The possible outcomes of the experiment of throwing a die

are 1, 2, 3, 4, 5 and 6. The sample space is

Ω =

1, 2, 3, 4, 5, 6

.


Definition

A collection F of Ω is called a field on Ω if

1 ∅ ∈ F ;

2 if A ∈ F then Ac ∈ F ;

3 If A, B ∈ F then A ∪ B ∈ F .

From the definition of a field we immediately deduce that F is

closed under finite unions and finite intersections:

A1, . . .An ∈ F ⇒ ∪ni=1Ai ∈ F , ∩n

i=1Ai ∈ F .


When Ω is infinite dimensional then the above definition is not

appropriate since we need to consider countable unions of

events.

Definition

A collection F of Ω is called a σ-field or σ-algebra on Ω if

1 ∅ ∈ F ;

2 if A ∈ F then Ac ∈ F ;

3 If A1, A2, · · · ∈ F then ∪∞i=1Ai ∈ F .

A σ-algebra is closed under the operation of taking countable

intersections.

Example

F =

∅, Ω

.

F =

∅, A, Ac, Ω

where A is a subset of Ω.

The power set of Ω, denoted by 0,1Ω which contains all

subsets of Ω.


Let F be a collection of subsets of Ω. It can be extended to

a σ−algebra (take for example the power set of Ω).

Consider all the σ−algebras that contain F and take their

intersection, denoted by σ(F), i.e. A ⊂ Ω if and only if it is

in every σ−algebra containing F . σ(F) is a σ−algebra. It

is the smallest algebra containing F and it is called the

σ−algebra generated by F .

Example

Let Ω = Rn. The σ-algebra generated by the open subsets of

Rn (or, equivalently, by the open balls of Rn) is called the Borel

σ-algebra of Rn and is denoted by B(Rn).


Let X be a closed subset of Rn. Similarly, we can define

the Borel σ-algebra of X , denoted by B(X ).

A sub-σ–algebra is a collection of subsets of a σ–algebra

which satisfies the axioms of a σ–algebra.

The σ−field F of a sample space Ω contains all possible

outcomes of the experiment that we want to study.

Intuitively, the σ−field contains all the information about the

random experiment that is available to us.


Definition

A probability measure P on the measurable space (Ω, F) is

a function P : F 7→ [0,1] satisfying

1 P(∅) = 0, P(Ω) = 1;

2 For A1, A2, . . . with Ai ∩ Aj = ∅, i 6= j then

P(∪∞i=1Ai) =

∞∑

i=1

P(Ai).

Definition

The triple(

Ω, F , P)

comprising a set Ω, a σ-algebra F of

subsets of Ω and a probability measure P on (Ω, F) is a called

a probability space.


Example

A biased coin is tossed once:

Ω = H, T, F = ∅, H, T , Ω = 0,1, P : F 7→ [0,1] such

that P(∅) = 0, P(H) = p ∈ [0,1], P(T ) = 1 − p, P(Ω) = 1.

Example

Take Ω = [0,1], F = B([0,1]), P = Leb([0,1]). Then (Ω,F ,P)is a probability space.


Definition

A family Ai : i ∈ I of events is called independent if

P(

∩j∈J Aj

)

= Πj∈JP(Aj)

for all finite subsets J of I.

When two events A, B are dependent it is important to

know the probability that the event A will occur, given that

B has already happened. We define this to be conditional

probability, denoted by P(A|B). We know from elementary

probability that

P(A|B) =P(A ∩ B)

P(B).


Definition

A family of events Bi : i ∈ I is called a partition of Ω if

Bi ∩ Bj = ∅, i 6= j and ∪i∈I Bi = Ω.

Theorem

Law of total probability. For any event A and any partition

Bi : i ∈ I we have

P(A) =∑

i∈I

P(A|Bi)P(Bi).


Let (Ω,F ,P) be a probability space and fix B ∈ F . Then

P(·|B) defines a probability measure on F :

P(∅|B) = 0, P(Ω|B) = 1

and (since Ai ∩ Aj = ∅ implies that (Ai ∩ B) ∩ (Aj ∩ B) = ∅)

P(∪∞j=1Ai |B) =

∞∑

j=1

P(Ai |B),

for a countable family of pairwise disjoint sets Aj+∞j=1 .

Consequently, (Ω,F ,P(·|B)) is a probability space for

every B ∈ F .


The function of the outcome of an experiment is a random

variable, that is, a map from Ω to R.

Definition

A sample space Ω equipped with a σ−field of subsets F is

called a measurable space.

Definition

Let (Ω,F) and (E ,G) be two measurable spaces. A function

X : Ω → E such that the event

ω ∈ Ω : X (ω) ∈ A =: X ∈ A (1)

belongs to F for arbitrary A ∈ G is called a measurable function

or random variable.


When E is R equipped with its Borel σ-algebra, then (1) can by

replaced with

X 6 x ∈ F ∀x ∈ R.

Let X be a random variable (measurable function) from

(Ω,F , µ) to (E ,G). If E is a metric space then we may define

expectation with respect to the measure µ by

E[X ] =

∫

ΩX (ω)dµ(ω).

More generally, let f : E 7→ R be G–measurable. Then,

E[f (X )] =

∫

Ωf (X (ω))dµ(ω).


Let U be a topological space. We will use the notation B(U) to

denote the Borel σ–algebra of U: the smallest σ–algebra

containing all open sets of U. Every random variable from a

probability space (Ω,F , µ) to a measurable space (E ,B(E))induces a probability measure on E :

µX (B) = PX−1(B) = µ(ω ∈ Ω;X (ω) ∈ B), B ∈ B(E). (2)

The measure µX is called the distribution (or sometimes the

law) of X .


Example

Let I denote a subset of the positive integers. A vector

ρ0 = ρ0,i , i ∈ I is a distribution on I if it has nonnegative

entries and its total mass equals 1:∑

i∈I ρ0,i = 1.


Consider the case where E = R equipped with the Borel

σ−algebra. In this case a random variable is defined to be a

function X : Ω → R such that

ω ∈ Ω : X (ω) 6 x ⊂ F ∀x ∈ R.

We can now define the probability distribution function of X ,

FX : R → [0,1] as

FX (x) = P(

ω ∈ Ω∣

∣X (ω) 6 x)

=: P(X 6 x). (3)

In this case, (R,B(R),FX ) becomes a probability space.


The distribution function FX (x) of a random variable has the

properties that limx→−∞ FX (x) = 0, limx→+∞ F (x) = 1 and is

right continuous.

Definition

A random variable X with values on R is called discrete if it

takes values in some countable subset x0, x1, x2, . . . of R.

i.e.: P(X = x) 6= x only for x = x0, x1, . . . .


With a random variable we can associate the probability mass

function pk = P(X = xk ). We will consider nonnegative integer

valued discrete random variables. In this case

pk = P(X = k), k = 0,1,2, . . . .

Example

The Poisson random variable is the nonnegative integer valued

random variable with probability mass function

pk = P(X = k) =λk

k!e−λ, k = 0,1,2, . . . ,

where λ > 0.


Example

The binomial random variable is the nonnegative integer valued

random variable with probability mass function

pk = P(X = k) =N!

n!(N − n)!pnqN−n k = 0,1,2, . . .N,

where p ∈ (0,1), q = 1 − p.


Definition

A random variable X with values on R is called continuous if

P(X = x) = 0 ∀x ∈ R.

Let (Ω,F ,P) be a probability space and let X : Ω → R be a

random variable with distribution FX . This is a probability

measure on B(R). We will assume that it is absolutely

continuous with respect to the Lebesgue measure with density

ρX : FX (dx) = ρ(x)dx . We will call the density ρ(x) the

probability density function (PDF) of the random variable X .


Example

1 The exponential random variable has PDF

f (x) =

λe−λx x > 0,0 x < 0,

with λ > 0.

2 The uniform random variable has PDF

f (x) =

1b−a a < x < b,

0 x /∈ (a,b),

with a < b.


Definition

Two random variables X and Y are independent if the events

ω ∈ Ω |X (ω) 6 x and ω ∈ Ω |Y (ω) 6 y are independent for

all x , y ∈ R.

Let X , Y be two continuous random variables. We can view

them as a random vector, i.e. a random variable from Ω to R2.

We can then define the joint distribution function

F (x , y) = P(X 6 x , Y 6 y).

The mixed derivative of the distribution function

fX ,Y (x , y) :=∂2F∂x∂y (x , y), if it exists, is called the joint PDF of the

random vector X , Y:

FX ,Y (x , y) =

∫ x

−∞

∫ y

−∞fX ,Y (x , y)dxdy .


If the random variables X and Y are independent, then

FX ,Y (x , y) = FX (x)FY (y)

and

fX ,Y (x , y) = fX (x)fY (y).

The joint distribution function has the properties

FX ,Y (x , y) = FY ,X (y , x),

FX ,Y (+∞, y) = FY (y), fY (y) =

∫ +∞

−∞fX ,Y (x , y)dx .


We can extend the above definition to random vectors of

arbitrary finite dimensions. Let X be a random variable from

(Ω,F , µ) to (Rd ,B(Rd )). The (joint) distribution function

FXRd → [0,1] is defined as

FX (x) = P(X 6 x).

Let X be a random variable in Rd with distribution function

f (xN) where xN = x1, . . . xN. We define the marginal or

reduced distribution function f N−1(xN−1) by

f N−1(xN−1) =

∫

R

f N(xN)dxN .

We can define other reduced distribution functions:

f N−2(xN−2) =

∫

R

f N−1(xN−1)dxN−1 =

∫

R

∫

R

f (xN)dxN−1dxN .


We can use the distribution of a random variable to compute

expectations and probabilities:

E[f (X )] =

∫

R

f (x)dFX (x) (4)

and

P[X ∈ G] =

∫

G

dFX (x), G ∈ B(E). (5)

The above formulas apply to both discrete and continuous

random variables, provided that we define the integrals in (4)

and (5) appropriately.

When E = Rd and a PDF exists, dFX (x) = fX (x)dx , we have

FX (x) := P(X 6 x) =

∫ x1

−∞. . .

∫ xd

−∞fX (x)dx ..


Example (Normal Random Variables)

Consider the random variable X : Ω 7→ R with pdf

γσ,m(x) := (2πσ)−12 exp

(

−(x − m)2

2σ

)

.

Such an X is termed a Gaussian or normal random

variable. The mean is

EX =

∫

R

xγσ,m(x)dx = m

and the variance is

E(X − m)2 =

∫

R

(x − m)2γσ,m(x)dx = σ.


Example (Normal Random Variables contd.)

Let m ∈ Rd and Σ ∈ R

d×d be symmetric and positive

definite. The random variable X : Ω 7→ Rd with pdf

γΣ,m(x) :=(

(2π)d detΣ)− 1

2exp

(

−1

2〈Σ−1(x − m), (x − m)〉

)

is termed a multivariate Gaussian or normal random

variable. The mean is

E(X ) = m (6)

and the covariance matrix is

E

(

(X − m)⊗ (X − m))

= Σ. (7)


Let X , Y be random variables we want to know whether they

are correlated and, if they are, to calculate how correlated they

are. We define the covariance of the two random variables as

cov(X ,Y ) = E[

(X − EX )(Y − EY )]

= E(XY )− EXEY .

The correlation coefficient is

ρ(X ,Y ) =cov(X ,Y )

√

var(X )√

var(X )(8)

The Cauchy-Schwarz inequality yields that ρ(X ,Y ) ∈ [−1,1].We will say that two random variables X and Y are

uncorrelated provided that ρ(X ,Y ) = 0. It is not true in general

that two uncorrelated random variables are independent. This

is true, however, for Gaussian random variables.


Assume that E|X | < ∞ and let G be a sub–σ–algebra of F . The

conditional expectation of X with respect to G is defined to be

the function E[X |G] : Ω 7→ E which is G–measurable and

satisfies∫

G

E[X |G]dµ =

∫

G

X dµ ∀G ∈ G.

We can define E[f (X )|G] and the conditional probability

P[X ∈ F |G] = E[IF (X )|G], where IF is the indicator function of F ,

in a similar manner.


φ(t) =

∫

R

eitλ dF (λ) = E(eitX ). (9)

For a continuous random variable for which the distribution

function F has a density, dF (λ) = p(λ)dλ, (9) gives

φ(t) =

∫

R

eitλp(λ)dλ.

For a discrete random variable for which P(X = λk ) = αk , (9)

gives

φ(t) =

∞∑

k=0

eitλk ak .


The characteristic function determines uniquely the distribution

function of the random variable, in the sense that there is a

one-to-one correspondance between F (λ) and φ(t).

Lemma

Let X1,X2, . . .Xn be independent random variables with

characteristic functions φj(t), j = 1, . . . n and let Y =∑n

j=1 Xj

with characteristic function φY (t). Then

φY (t) = Πnj=1φj(t).

Lemma

Let X be a random variable with characteristic function φ(t) and

assume that it has finite moments. Then

E(X k) =1

ikφ(k)(0).


Theorem

Let b ∈ Rn and Σ ∈ R

n×n a symmetric and positive definite

matrix. Let X be the multivariate Gaussian random variable with

probability density function

γ(x) =1

Zexp

(

−1

2〈Σ−1(x − b),x − b〉

)

.

Then

1 The normalization constant is Z = (2π)n/2√

det(Σ).

2 The mean vector and covariance matrix of X are given by

EX = b and E((X − EX)⊗ (X − EX)) = Σ.

3 The characteristic function of X is

φ(t) = ei〈b,t〉− 12〈t,Σt〉.


One of the most important aspects of the theory of random

variables is the study of limit theorems for sums of random

variables.

The most well known limit theorems in probability theory

are the law of large numbers and the central limit

theorem.

There are various different types of convergence for

sequences or random variables.


Definition

Let Zn∞n=1 be a sequence of random variables. We will say

that

(a) Zn converges to Z with probability one if

P(

limn→+∞ Zn = Z)

= 1.

(b) Zn converges to Z in probability if for every ε > 0

limn→+∞ P(

|Zn − Z | > ε)

= 0.

(c) Zn converges to Z in Lp if limn→+∞ E[∣

∣Zn − Z∣

∣

p]= 0.

(d) Let Fn(λ),n = 1, · · ·+∞, F (λ) be the distribution functions

of Zn n = 1, · · ·+∞ and Z , respectively. Then Zn converges

to Z in distribution if limn→+∞ Fn(λ) = F (λ) for all λ ∈ R at

which F is continuous.


The distribution function FX of a random variable from a

probability space (Ω,F ,P) to R induces a probability measure

on R and that (R,B(R),FX ) is a probability space. We can

show that the convergence in distribution is equivalent to the

weak convergence of the probability measures induced by the

distribution functions.

Definition

Let (E ,d) be a metric space, B(E) the σ−algebra of its Borel

sets, Pn a sequence of probability measures on (E ,B(E)) and

let Cb(E) denote the space of bounded continuous functions on

E . We will say that the sequence of Pn converges weakly to the

probability measure P if, for each f ∈ Cb(E),

limn→+∞

∫

E

f (x)dPn(x) =

∫

E

f (x)dP(x).


Theorem

Let Fn(λ),n = 1, · · ·+∞, F (λ) be the distribution functions of

Zn n = 1, · · · +∞ and Z , respectively. Then Zn converges to Z

in distribution if and only if, for all g ∈ Cb(R)

limn→+∞

∫

X

g(x)dFn(x) =

∫

X

g(x)dF (x). (10)

Remark

(10) is equivalent to

En(g) = E(g),

where En and E denote the expectations with respect to Fn and

F, respectively.


When the sequence of random variables whose

convergence we are interested in takes values in Rn or,

more generally, a metric space space (E ,d) then we can

use weak convergence of the sequence of probability

measures induced by the sequence of random variables to

define convergence in distribution.

Definition

A sequence of real valued random variables Xn defined on a

probability spaces (Ωn,Fn,Pn) and taking values on a metric

space (E ,d) is said to converge in distribution if the indued

measures Fn(B) = Pn(Xn ∈ B) for B ∈ B(E) converge weakly

to a probability measure P.


Let Xn∞n=1 be iid random variables with EXn = V . Then,

the strong law of large numbers states that average of

the sum of the iid converges to V with probability one:

P(

limn→+∞

1

N

N∑

n=1

Xn = V)

= 1.

The strong law of large numbers provides us with

information about the behavior of a sum of random

variables (or, a large number or repetitions of the same

experiment) on average.

We can also study fluctuations around the average

behavior.


let E(Xn − V )2 = σ2. Define the centered iid random

variables Yn = Xn − V . Then, the sequence of random

variables 1

σ√

N

∑Nn=1 Yn converges in distribution to a

N (0,1) random variable:

limn→+∞

P

(

1

σ√

N

N∑

n=1

Yn 6 a

)

=

∫ a

−∞

1√2π

e− 12

x2

dx .

This is the central limit theorem.

m5a42 applied stochastic processes - imperial college …pavl/m5a42lec1.pdf · course overview...

Documents