
A Review of the Discrete Time Markov Chains on a Finite State Space

Pejman Mahboubi

December 12, 2011

Conditional Probability

- A man observes a race between three horses a, b and c. He feels that a and b have the same chance of winning, but c is twice as likely to win as a. Assume the man learns horse b is not going to run. What is the probability of winning for a and c? I.e., we are looking for P(a|bᶜ) and P(c|bᶜ).

- The original probability space is Ω = {a, b, c}, with P(a) = P(b) = 1/4 and P(c) = 1/2. After learning that b won't run, we want to change the probability space.

- Our new sample space is Ω′ = {a, c}. We don't want to divide P(b) equally between P(a) and P(c).

- We do this by rescaling P(a) and P(c). We want α s.t.

  αP(a) + αP(c) = 1

- Then α = 1/P(bᶜ). Then the new probability P(·|bᶜ) is defined by P(a|bᶜ) = P(a)/P(bᶜ).
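The rescaling above can be checked exactly with rational arithmetic. The sketch below (the helper name `condition_on` is mine, not from the slides) conditions the horse-race prior on bᶜ = {a, c}:

```python
from fractions import Fraction

# Prior probabilities on Ω = {a, b, c} from the slides.
prior = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

def condition_on(prior, allowed):
    """Restrict to the event `allowed` and rescale by α = 1/P(allowed)."""
    p_event = sum(prior[s] for s in allowed)
    return {s: prior[s] / p_event for s in allowed}

posterior = condition_on(prior, {"a", "c"})  # condition on bᶜ = {a, c}
print(posterior["a"], posterior["c"])        # 1/3 and 2/3
```

Note that c remains twice as likely as a after conditioning, since rescaling by a common α preserves ratios.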


- Sometimes, by removing an event like b (i.e., conditioning on bᶜ), parts of the sets a or c are removed too, and we are left with a ∩ bᶜ and c ∩ bᶜ. Therefore we should rescale the probabilities of these two sets.

- This leads us to the general conditioning formula

  P(A|B) = P(A ∩ B) / P(B).

- The expectation with respect to P(·|B) is called the conditional expectation and denoted by E[·|B].

- If the distribution pX(·) of a discrete r.v. X with range {xn}_{n=1}^∞ is given, then E[X|B] = ∑_{n=1}^∞ xn pX|B(xn), where

  pX|B(xn) = P(X = xn|B)


Conditional independence

- Coins a and b show heads with probabilities pa and pb. We choose one coin at random and toss it twice. What is Ω?

  Ω = {a, b} × {H, T}²

- To define P on Ω, we assume that given the coin, the first and the second toss are independent, i.e.

  P(a, T, H) = P(T, H|a)P(a) = pa(1 − pa) × 1/2

- Under P, are the events A = {first toss is H} and B = {second toss is T} independent?

- The answer is no. (Check it for yourself!)

- But, by definition, given the coin, A and B are independent.
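The "check it for yourself" can be done by exact enumeration of the eight outcomes. A sketch, with hypothetical values pa = 0.3 and pb = 0.8 chosen by me for illustration:

```python
from itertools import product

p = {"a": 0.3, "b": 0.8}   # hypothetical head probabilities for the two coins

# Ω = {a,b} × {H,T}² with P(coin, t1, t2) = (1/2)·P(t1|coin)·P(t2|coin)
def prob(coin, t1, t2):
    def toss(t):
        return p[coin] if t == "H" else 1 - p[coin]
    return 0.5 * toss(t1) * toss(t2)

outcomes = list(product("ab", "HT", "HT"))
P = lambda pred: sum(prob(*w) for w in outcomes if pred(*w))

pA  = P(lambda c, t1, t2: t1 == "H")                  # A = first toss is H
pB  = P(lambda c, t1, t2: t2 == "T")                  # B = second toss is T
pAB = P(lambda c, t1, t2: t1 == "H" and t2 == "T")

print(pAB, pA * pB)   # 0.185 vs 0.2475: NOT independent unconditionally

# But given the coin, A and B are independent by construction:
for c in "ab":
    joint_given_c = P(lambda cc, t1, t2: cc == c and t1 == "H" and t2 == "T") / 0.5
    assert abs(joint_given_c - p[c] * (1 - p[c])) < 1e-12
```

Mixing over the two coins is what breaks the unconditional independence: the first toss carries information about which coin was chosen.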


Simple Random Walk (SRW)

- Let {Xn}n≥1 be iid Bernoulli: P(Xn = 1) = 1/2 = P(Xn = −1).

- Define S0 = X0 and Sn = Sn−1 + Xn = X0 + · · · + Xn.

- X0 is a fixed integer, and specifies the starting point.

- This is not a MC on a FINITE space; the state space is ℤ.

- If X0 = 0, then P(S8 = 8) = P(X1 = 1, · · · , X8 = 1) = 1/2⁸.

- Clearly, S6 is not independent of S4: S6 = S4 + X5 + X6.

- Intuitively, given S4, S6 is independent of S2.

- Mathematically, S6 and S2 are indep. w.r.t. P(·|S4 = i) for any i ∈ S:

  P(S6 = 2, S2 = 0|S4 = i) = P(S6 = 2|S4 = i)P(S2 = 0|S4 = i)

Let us prove this for i = 2.


- Proof.

  P(S6 = 2, S2 = 0|S4 = 2) = [P(S6 = 2, S2 = 0, S4 = 2) / P(S2 = 0, S4 = 2)] · P(S2 = 0|S4 = 2)

  P(S6 = 2, S2 = 0, S4 = 2) / P(S2 = 0, S4 = 2) = P(S6 − S2 = 2, S4 − S2 = 2, S2 = 0) / P(S4 − S2 = 2, S2 = 0)

Notice that {S6 − S2 = 2, S4 − S2 = 2} = {X5 + X6 = 0, X3 + X4 = 2}. Therefore the set on the left is expressible by the r.v.'s X3, · · · , X6. Therefore it is independent of S2, and the last fraction is equal to:


  = P(S6 − S2 = 2, S4 − S2 = 2) / P(S4 − S2 = 2) = P(S6 − S4 = 0, S4 − S2 = 2) / P(S4 − S2 = 2)

Similarly, S6 − S4 = 0 and S4 − S2 = 2 are independent, so this

  = P(S6 − S4 = 0) = P(S2 = 0).

Since P(S6 − S4 = 0) = P(X5 + X6 = 0) = P(S6 = 2|S4 = 2), this gives the claimed factorization.
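The identity can also be verified exactly by brute-force enumeration of all 2⁶ paths (a sketch with X0 = 0; the helper names are mine):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 64 paths X1..X6 in {-1, 1}, each with probability 1/64.
half = Fraction(1, 2)
paths = list(product([-1, 1], repeat=6))

def P(pred):
    return sum(half**6 for xs in paths if pred(xs))

S = lambda xs, n: sum(xs[:n])  # S_n for the path xs (starting from 0)

P4   = P(lambda xs: S(xs, 4) == 2)
both = P(lambda xs: S(xs, 6) == 2 and S(xs, 2) == 0 and S(xs, 4) == 2)
p6   = P(lambda xs: S(xs, 6) == 2 and S(xs, 4) == 2)
p2   = P(lambda xs: S(xs, 2) == 0 and S(xs, 4) == 2)

# P(S6=2, S2=0 | S4=2) = P(S6=2 | S4=2) · P(S2=0 | S4=2) = 1/2 · 1/2
assert both / P4 == (p6 / P4) * (p2 / P4) == Fraction(1, 4)
```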


[Figure: two sample paths of Sn plotted against n = 1, . . . , 8 (vertical axis −2 to 3). Top panel: X1 = −1, X2 = 1, X3 = 1, X4 = 1, X5 = 1, X6 = −1. Bottom panel: X1 = 1, X2 = −1, X3 = 1, X4 = 1, X5 = 1, X6 = −1. A blue bullet marks a point the paths are conditioned to pass through.]


- The blue bullet indicates the "conditioning", i.e., we restrict the sample space to all paths passing through this blue bullet.

- This obviously changes the probability law P on the space.

- Instead of re-writing Ω, we let P be concentrated on the restricted area.

- So far we have

  P(S6 = 2, S2 = 0, S4 = 2) / P(S2 = 0, S4 = 2) = P(S4 = 2, S2 = 2) / P(S2 = 2),

  where the walk on the right-hand side starts from 0: (S4 − S2, S6 − S2) has the same law as (S2, S4) started at 0.


Markov Chains (MC) - Transition probability matrix

- Let the state space S be a finite set, and P(Xn ∈ S) = 1 ∀n ≥ 0.

- Define pi,j = P(Xn+1 = j|Xn = i); note the time-homogeneity (the right-hand side does not depend on n).

        1   · · ·  n
  1   ( p11 · · · p1n )
  ...  ( ...  · · ·  ... )
  n   ( pn1 · · · pnn )

  Rows are indexed by the current state i, columns by the next state j.

- Each row of the TPM totals 1:

  p11 + p12 + · · · + p1n = P(Xn+1 = 1|Xn = 1) + P(Xn+1 = 2|Xn = 1) + · · · + P(Xn+1 = n|Xn = 1) = P(Xn+1 ∈ S|Xn = 1) = 1

- Elements of the TPM are nonnegative numbers: pi,j ≥ 0 for all i, j.
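The two defining conditions (nonnegative entries, rows summing to 1) are easy to validate numerically; a small sketch (the function name `is_tpm` is mine):

```python
import numpy as np

def is_tpm(P, tol=1e-12):
    """Check that P is a valid transition probability matrix:
    nonnegative entries and each row summing to 1."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

print(is_tpm([[0.6, 0.4], [0.7, 0.3]]))   # True
print(is_tpm([[0.6, 0.5], [0.7, 0.3]]))   # False: first row sums to 1.1
```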


Markov Property (MP)

- The transition matrix P is not sufficient for describing a Markov chain. We also need the Markov property:

  P(Xn+1 = j|Xn = i, Xn−1 = in−1, · · · , X0 = i0) = P(Xn+1 = j|Xn = i) = pij

- The MP says: the probabilities pij apply whenever state i is visited, no matter what happened in the past, and no matter how state i was reached. OR

- Given the present state, the future is independent of the past.


The distribution after one step

- Assume S = {1, 2}, and we are given

  P = (  α    1 − α )
      ( 1 − β   β   )

- Assume X0 = 1. What is the distribution of X1?

- By the distribution of X1 we mean P(X1 = 1) and P(X1 = 2). Here P(X1 = 1|X0 = 1) = α, P(X1 = 2|X0 = 1) = 1 − α.

- Let φ1 denote the distribution of X1: φ1 = (α, 1 − α) ∈ ℝ²₊.

- When S = {1, 2}, the dist. of Xn is a vector in ℝ²₊.

- Let φn denote this vector, i.e., φn has two coordinates:

  φn = (φn(1), φn(2)), where φn(i) = P(Xn = i).

- If we start from 1, i.e., X0 = 1, i.e., φ0 = (1, 0), then P(X1 = 1) = P(X1 = 1|X0 = 1) = p11, and P(X1 = 2) = P(X1 = 2|X0 = 1) = p12.

- If φ0 = (1, 0), then φ1 = (p11, p12).


Initial Distribution

- A and B are 2 cities. Assume that every year 40% of the population of A moves to B, while 70% of B's popl. moves to A. What is the popl. after 1 year, if A and B initially have .6 and 2.4 mil. popl.? The vector of initial population fractions, (.6/3, 2.4/3) = (.2, .8), is denoted by φ0 ∈ ℝ²₊.

- Let {Xn}n≥1 denote the location of one individual: Xn ∈ {A, B}. Assume initially all the population resides in A. What is the distribution of the population after one year?

- With this assumption, φ0 = (1, 0).

[Diagram: two states A and B; A stays with probability .6 and moves to B with probability .4; B moves to A with probability .7 and stays with probability .3.]

  P = ( .6 .4 )
      ( .7 .3 )

  φ0 = (P(X0 ∈ A), P(X0 ∈ B)) = (1, 0)

  φ1 = (.6, .4) = φ0P

- Then, if the initial dist. is φ0 = (1, 0),

  φ1 = (P(X1 = 1|X0 = 1), P(X1 = 2|X0 = 1)) = (p11, p12) = (1, 0) [ .6 .4 ; .7 .3 ] = φ0P.

- Similarly, if the initial dist. is φ0 = (0, 1), then

  φ1 = (P(X1 = 1|X0 = 2), P(X1 = 2|X0 = 2)) = (p21, p22) = (0, 1) [ .6 .4 ; .7 .3 ] = φ0P.

- If φ0 = (.2, .8), then φ1 = φ0 [ .6 .4 ; .7 .3 ] = (.68, .32).
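All three one-step computations above are the same row-vector-times-matrix product φ1 = φ0P; a quick numerical check:

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.7, 0.3]])   # the two-city chain from the slides

for phi0, expected in [((1, 0), (0.6, 0.4)),
                       ((0, 1), (0.7, 0.3)),
                       ((0.2, 0.8), (0.68, 0.32))]:
    phi1 = np.array(phi0) @ P      # φ1 = φ0 P (row vector on the left)
    assert np.allclose(phi1, expected)
```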


- What is the distribution of the popl. after 10 years, when φ0 = (.2, .8)?

- We are looking for the dist. of X10, given we started from φ0:

  φ10 = φ0P¹⁰ = (.2, .8) [ 0.636364 0.363636 ; 0.636364 0.363636 ] = (0.636364, 0.363636)

- Therefore Pⁿ gives the distribution of Xn: φn = φ0Pⁿ.

- p^n_{ij} denotes the ij element of Pⁿ.

- Do you think that the population is reaching an equilibrium?
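The equilibrium question can be answered directly: for P = [ .6 .4 ; .7 .3 ] the rows of Pⁿ converge to the stationary distribution π = (7/11, 4/11), the left eigenvector of P for eigenvalue 1 normalized to sum to 1 (the second eigenvalue is −0.1, so convergence is fast). A sketch:

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.7, 0.3]])

P10 = np.linalg.matrix_power(P, 10)
phi10 = np.array([0.2, 0.8]) @ P10
print(phi10)            # ≈ [0.636364, 0.363636]

# Both rows of P^10 already agree with π = (7/11, 4/11) to ~1e-10,
# and π is stationary: πP = π.
pi = np.array([7 / 11, 4 / 11])
assert np.allclose(P10, np.vstack([pi, pi]))
assert np.allclose(pi @ P, pi)
```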


Probability of the paths

- We choose a person at random; what is the probability that this person is from A, stays in A for 3 years, and then moves to B?

- {X0 = X1 = X2 = A, X3 = B} is a path with probability:

  P(X0 = X1 = X2 = A, X3 = B)
  = P(X3 = B|X0 = X1 = X2 = A)P(X0 = X1 = X2 = A)
  = p12 P(X2 = A|X0 = X1 = A)P(X0 = X1 = A)
  = p12 p11 P(X1 = A|X0 = A)P(X0 = A)
  = p12 p11 p11 P(X0 = A) = .4 × .6 × .6 × .2.

- If all the factors are nonzero, then the path has positive probability.
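The telescoping above says a path probability is the initial mass times a product of one-step transition probabilities; a sketch (the helper name `path_prob` is mine):

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.7, 0.3]])
A, B = 0, 1                     # encode city A as state 0, B as state 1
phi0 = np.array([0.2, 0.8])

def path_prob(phi0, P, states):
    """P(X0 = states[0], ..., Xk = states[k]) = φ0(states[0]) · Π p_{states[i], states[i+1]}."""
    p = phi0[states[0]]
    for i, j in zip(states, states[1:]):
        p *= P[i, j]
    return p

# From A, stay in A for 3 years, then move to B: the path A, A, A, B
print(path_prob(phi0, P, [A, A, A, B]))   # 0.2 · 0.6 · 0.6 · 0.4 = 0.0288
```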


Probability of the paths

- We choose an individual from city A. What is the probability that this person, over the next 4 years, moves every 2 years?

- We want to find P(X1 = A, X2 = B, X3 = B, X4 = A|X0 = A)

  = P(X4 = A|X3 = X2 = B, X1 = X0 = A) × P(X3 = X2 = B, X1 = A|X0 = A)
  = p21 P(X3 = B|X2 = B, X1 = X0 = A) × P(X2 = B, X1 = A|X0 = A)
  = p21 p22 P(X2 = B|X1 = X0 = A) × P(X1 = A|X0 = A)
  = p21 p22 p12 p11 = (.7)(.3)(.4)(.6)

- What happens if we choose a person at random, and try to find the probability that they move every 2 years? I.e.,

  P(X0 = X1, X1 ≠ X2, X2 = X3, X3 ≠ X4)


n-step probability, Chapman-Kolmogorov

- Nasim lives in A; what is the probability that she will be in B in the 10th year? P(X10 = B|X0 = A):

  φ10 = (1, 0)P¹⁰ = (1, 0) [ 0.636364 0.363636 ; 0.636364 0.363636 ] = (0.636364, 0.363636)

- We have to find all the paths from X0 = A to X10 = B, and add their probabilities.

- We can put all the paths in two categories:
  - paths which pass through A at time 8: X8 = A,
  - paths which pass through B at time 8: X8 = B.


  P(X10 = B|X0 = A)
  = ∑_{i∈S} P(X10 = B|X8 = i, X0 = A) P(X8 = i|X0 = A)
  = ∑_{i∈S} P(X10 = B|X8 = i) P(X8 = i|X0 = A)

- Let Pi(·) denote P(·|X0 = i) (don't mistake it for p^n_{ij}); then

  p^10_{A,B} = ∑_{i∈S} p^8_{A,i} p^2_{i,B},   i ∈ {A, B}

In general, if x, y ∈ S are two states, then

  p^{m+n}_{x,y} = ∑_{z∈S} p^m_{x,z} p^n_{z,y}


- Similarly we have

  Pi(Xn = j) = ∑_{k∈S} Pi(Xn−1 = k) pk,j

- By repeating,

  Pi(Xn = j) = ∑_{k∈S} Pi(Xn−1 = k) pk,j = ∑_{k∈S} ∑_{l∈S} Pi(Xn−2 = l) pl,k pk,j

- If we continue we will get

  Pi(Xn = j) =: p^n_{ij} = ∑_{i1,··· ,in−1∈S} p_{i i1} p_{i1 i2} · · · p_{in−1 j}

- The right-hand side is the element ij of the matrix Pⁿ.
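In matrix form the Chapman-Kolmogorov equation is just P^{m+n} = Pᵐ Pⁿ, which is easy to spot-check numerically for the two-city chain:

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.7, 0.3]])

# Chapman-Kolmogorov: p^{m+n}_{x,y} = Σ_z p^m_{x,z} p^n_{z,y},
# i.e. P^{m+n} = P^m · P^n as matrices.
m, n = 8, 2
lhs = np.linalg.matrix_power(P, m + n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
assert np.allclose(lhs, rhs)
print(lhs[0, 1])   # p^10_{A,B}
```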

Pejman Mahboubi A Review of the Discrete Time Markov Chains on a Finite State Space

I

I

Pejman Mahboubi A Review of the Discrete Time Markov Chains on a Finite State Space

Review

Consider a game in which two players A and B take turns tossing a coin, and the player who comes up with heads first wins the game. Supposing that player A starts off, let us consider the following problems:

1. What is the probability that A wins the game?

2. How many tosses are required on average to end the game?

  Ω = {ω1 = H, ω2 = TH, ω3 = TTH, · · · , ωn = TT· · ·TH, · · · , ω∞ = TTT· · · },

and P on Ω is defined by P(ωn) = 1/2ⁿ.

A wins when ω1, ω3, ω5, · · · happens and B wins otherwise.


- Let WA and WB denote the events that A and B win respectively, e.g., WA = {ω1, ω3, · · · }.

  P(WA) = 1/2 + (1/2)³ + (1/2)⁵ + · · · ,   P(WA) + P(WB) = 1

  P(WB) = (1/2)² + (1/2)⁴ + · · · ,   P(WB) = 2P(WA) − 1

- Let N denote the length of the game. N : Ω → ℕ is defined by

  N(ωn) = n,   P(N = n) = P(ωn) = (1/2)ⁿ

- Therefore, EN = ∑_{n=1}^∞ n P(N = n) = ∑_{n=1}^∞ n (1/2)ⁿ.

- We know how to compute the last sum: EN = 2.
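The geometric series give P(WA) = (1/2)/(1 − 1/4) = 2/3, P(WB) = 1/3, and EN = 2. A numerical check (truncating the sums at n = 200, where the tail is far below machine precision):

```python
# P(W_A): odd n; P(W_B): even n; EN = Σ n (1/2)^n
pA = sum(0.5**n for n in range(1, 200, 2))
pB = sum(0.5**n for n in range(2, 200, 2))
EN = sum(n * 0.5**n for n in range(1, 200))

print(pA, pB, EN)   # ≈ 0.6667, 0.3333, 2.0
assert abs(pA - 2 / 3) < 1e-12
assert abs(pB - 1 / 3) < 1e-12
assert abs(pA + pB - 1) < 1e-12    # ω∞ = TTT··· has probability 0
assert abs(EN - 2) < 1e-10
```

Note also that pB == 2 * pA − 1 up to rounding, matching the relation on the slide.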


Markov Chain

- We can model this game as a Markov chain with 3 states {S, T, H}, as follows:

[Diagram: from S, move to T or H, each with probability 1/2; from T, move to T or H, each with probability 1/2; H is absorbing.]

  With states ordered (S, T, H),

  P = ( 0  .5  .5 )
      ( 0  .5  .5 )
      ( 0   0   1 )

- ESN = ES[N|X1 = T]P(X1 = T) + ES[N|X1 = H]P(X1 = H) = 1 + ETN · 1/2 + EHN · 1/2 = 1 + ETN · 1/2 (since EHN = 0)

- ETN = ET[N|X1 = T] · 1/2 + ET[N|X1 = H] · 1/2 = 1 + ETN · 1/2 ⇒ ETN = 2

- ESN = 2.
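The first-step equations above are a small linear system. In general, for an absorbing chain the expected absorption times e from the transient states solve (I − Q)e = 1, where Q is the transient-to-transient block of P; a sketch for this chain:

```python
import numpy as np

# States ordered (S, T, H); H is absorbing.
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

# First-step analysis: e = 1 + Q e, i.e. (I - Q) e = 1,
# with Q the transient block (rows/columns S and T).
Q = P[:2, :2]
e = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(e)    # [2. 2.]  ->  E_S N = E_T N = 2
```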


Probability of path, and n-step probability

- Assume we start from S. What is the probability of TTTH?

- {X1 = T, X2 = T, X3 = T, X4 = H} is a path. We want

  P(X1 = T, X2 = T, X3 = T, X4 = H|X0 = S)
  = P(X1 = T, X2 = T, X3 = T, X4 = H, X0 = S) / P(X0 = S)
  = P(X4 = H|X3 = T, X2 = T, X1 = T, X0 = S) × P(X1 = T, X2 = T, X3 = T, X0 = S) / P(X0 = S)
  = P(X4 = H|X3 = T) P(X1 = T, X2 = T, X3 = T|X0 = S)
  = pT,T... wait, = pT,H P(X1 = T, X2 = T, X3 = T|X0 = S)
  = · · · = pS,T pT,T pT,T pT,H = (.5)⁴


Markov Chain

- Let {Xn}n≥0 be a sequence of random variables taking values in the finite set

  S = {1, · · · , N}

- Let 0 < m < n. {Xn}n≥0 is called a Markov Chain if

  P(Xn = in|X0 = i0, · · · , Xm = im) = P(Xn = in|Xm = im)   (1)

- Equation (1) is called the Markov Property (MP).

- (1) states that if k < l < m, then Xk and Xm are independent with respect to P(·|Xl).


Ω = {H, T}^ℕ

- The Cartesian product A × B of the sets A and B:

  A × B = {(x, y) : x ∈ A, y ∈ B}

- If A = {H, T}, then A² = {(H, H), (H, T), (T, H), (T, T)}.

- A² is the set of all sequences of length 2 comprised of elements in A.

- Aⁿ is the set of all sequences of length n, comprised of elements in A.

  TTTHTHHTHHHHTTTT ∈ {H, T}¹⁶

- By A^ℕ, where ℕ is the set of natural numbers, we mean the set of all infinite sequences comprised of elements of A.


- If the players in the game above continue forever, then Ω = {H, T}^ℕ.

- If ω ∈ Ω, then ω is an infinite sequence

  ω = ω1, ω2, ω3, · · ·

- We define a probability P on Ω by P(ω) = qⁿ⁻¹p, where p is the probability of heads, q = 1 − p, and

  n = inf{k : ωk = H}

- Define a random variable T on Ω by

  T(ω) = inf{k : ωk = H}

- We can translate the previous problem to this space as follows:

- {A wins} = {T is odd}. Average time of the game = ET.

- P(A wins) = P(T is odd).


Constructing iid Bernoulli (Kolmogorov's Consistency Theorem)

- Let Ω = {−1, 1}^ℕ, i.e., if ω ∈ Ω, then ω is a sequence of −1s and 1s, like ω = 1, 1, −1, 1, −1, −1, −1, · · · .

- Let A1 denote the set of all ωs starting with 1. Define P(A1) = 1/2.

- Let A2 denote the set of all ωs starting with 1, 1. Define P(A2) = (1/2)².

- In general, let An denote the set of all ωs starting with a1, · · · , an, where ai ∈ {−1, 1}. Define P(An) = (1/2)ⁿ.

- We claim that this specifies P(A) for "any" A ⊂ Ω.

- Define Xn : Ω → ℝ by Xn(ω) = ωn.

- You can check that {Xn}_{n=1}^∞ are iid and Bernoulli.

- Do this procedure for producing 2 iid Bernoulli r.v.'s.

- How can we construct infinitely many iid Gaussian random variables?
