
COLLAPSING OF NON HOMOGENEOUS MARKOV CHAINS

By

AGNISH DEY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017


© 2017 Agnish Dey


To ma and baba


ACKNOWLEDGMENTS

I am grateful to Prof. Murali Rao for his encouragement and guidance throughout my

study in the graduate school. I am also indebted to Prof. L. Block, Prof. M. Jury, Prof. S.

McCullough and to Prof. A. Rosalsky for their interest in my dissertation. I would also like to

thank all the professors in my department for their hospitality.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTER

1 INTRODUCTION AND OPENING REMARKS

2 MARKOV PROPERTY

3 MOTIVATION

4 RESULTS

4.1 Collapsibility of NHMCs
4.2 Lumpability
4.3 Application

5 SUMMARY AND CONCLUSIONS

REFERENCES

BIOGRAPHICAL SKETCH


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

COLLAPSING OF NON HOMOGENEOUS MARKOV CHAINS

By

Agnish Dey

August 2017

Chair: Murali Rao
Major: Mathematics

Let X(n) be a (homogeneous) Markov chain with a finite state space S = {1, 2, ..., m}.
Let S be the union of disjoint sets S1, S2, ..., Sk which form a partition of S. Define Y(n) = i
if and only if X(n) ∈ Si for i = 1, 2, ..., k. Is the collapsed chain Y(n) Markov? This problem
was considered by Burke and Rosenblatt in 1958, and in this dissertation it is studied
when the X(n) chain is non-homogeneous and Markov.


CHAPTER 1
INTRODUCTION AND OPENING REMARKS

This thesis is essentially concerned with the Markov property of collapsed non-homogeneous
Markov chains. Let us assume that X(n) is a finite Markov chain with state space {1, 2, ..., m}.
Suppose that the pairwise disjoint sets S1, S2, ..., Sr form a partition of the state space of
X(n); clearly r ≤ m. We define the collapsed chain Y(n) as follows: Y(n) = i if and only if
X(n) ∈ Si, where i = 1, 2, ..., r. The question that we explore in this thesis is whether the
collapsed chain Y(n) is again Markov. We will present an example in Chapter 3 showing that
this is not the case in general. It is well known (see (3)) that significant work has been done
on this problem under the assumption that the original Markov chain is homogeneous. In this
thesis we examine conditions under which the collapsed chain is Markovian given that the
original chain is a non-homogeneous Markov chain.

The thesis is organized as follows: in Chapter 2 we introduce the Markov property and
discuss some of its basic properties. In Chapter 3 we discuss the motivation and relevance
of the problem. There we present an example that clearly shows that collapsed Markov chains
fail to be Markov in general; hence the problem (looking for conditions under which a collapsed
Markov chain regains the Markov property) makes sense. In Chapter 4 we present several
theorems in this context. The theorems are accompanied by examples showing that they
are in the best possible form. There we also discuss strong and weak lumpability of a
Markov chain. Finally, we define reverse Markov chains and conclude with a result for
reverse Markov chains in the context of weak lumpability.


CHAPTER 2
MARKOV PROPERTY

Stochastic Process

A stochastic process is a family of random variables defined on some sample space Ω. If

there are countably many members of the family, then the process is denoted by X1, X2, X3, ....
If there are uncountably many members of the family, then the process is denoted by
{X(t) : t ≥ 0}. In the first case the process is called a discrete-time process, while in the
second case it is called a continuous-time process.

State Space

The set of distinct values assumed by a stochastic process is called the state space of

the process. If the state space of a stochastic process is countable or finite then the process is

called a chain.

Markov Property

Let us assume that we have a circular track with three landmarks v1, v2, v3, in
anti-clockwise order.

At time 0 a man stands at v1. At time 1, he flips a fair coin and moves immediately to v2
or v3 according to whether the coin comes up heads or tails. At time 2 he flips the coin
again to decide which of the two adjacent landmarks to move to, with the decision rule
that if the coin comes up heads, he moves one step clockwise, and anti-clockwise if it
comes up tails.

For each n, let X(n) denote the landmark at which the walker stands at time n. Hence
X(0), X(1), ... is a random process taking values in {v1, v2, v3}. Since the man starts at
time 0 at v1, we have P(X(0) = v1) = 1. Next he will move to v2 or v3 with probability
1/2 each, so that P(X(1) = v2) = 1/2 and P(X(1) = v3) = 1/2. To compute the
distribution of X(n) for n ≥ 2 requires a little more thought. Suppose at time n the man


stands at v2. Then we have

P(X(n+1) = v1 | X(n) = v2) = 1/2,

P(X(n+1) = v3 | X(n) = v2) = 1/2.

Therefore,

P(X(n+1) = v1 | X(n) = v2, X(n-1) = i_{n-1}, ..., X(1) = i_1, X(0) = i_0) = 1/2,

P(X(n+1) = v3 | X(n) = v2, X(n-1) = i_{n-1}, ..., X(1) = i_1, X(0) = i_0) = 1/2

for any choice of i_0, i_1, ..., i_{n-1} for which the conditional probabilities are defined. This

property is known as the Markov property.

Let us now assume that at a particular instant the man moves along the clockwise
direction with probability p and along the anti-clockwise direction with probability (1 - p), i.e.,
we have

P(X(n+1) = v2 | X(n) = v1) = p,

P(X(n+1) = v3 | X(n) = v1) = 1 - p.

In this Markov chain the probabilities at different time instants are computed as follows: We
have P(X(0) = v1) = 1. Then clearly, P(X(1) = v2) = p and P(X(1) = v3) = 1 - p. Then
P(X(2) = v3 | X(1) = v2) = p and P(X(2) = v1 | X(1) = v2) = 1 - p. Then,

P(X(2) = v3) = P(X(2) = v3 | X(1) = v2) × P(X(1) = v2) = p^2.


Again,

P(X(2) = v1) = P(X(2) = v1 | X(1) = v2) × P(X(1) = v2) + P(X(2) = v1 | X(1) = v3) × P(X(1) = v3)
= (1 - p)p + p(1 - p) = 2p(1 - p).

And

P(X(2) = v2) = P(X(2) = v2 | X(1) = v3) × P(X(1) = v3) = (1 - p)^2.

Similarly, P(X(3) = v1) = p^3 + (1 - p)^3, P(X(3) = v2) = 3p^2(1 - p), and
P(X(3) = v3) = 3p(1 - p)^2. In general one can find the distribution of X(n) for any n.
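The step-by-step computation above is just repeated multiplication of the distribution vector by the transition matrix. A minimal sketch with exact rational arithmetic, using the illustrative choice p = 1/3; the matrix T is assembled here from the movement rules described above:

```python
from fractions import Fraction as F

p = F(1, 3)                        # illustrative clockwise probability
# States in the order v1, v2, v3; heads (probability p) moves one step clockwise.
T = [[F(0),  p,     1 - p],
     [1 - p, F(0),  p    ],
     [p,     1 - p, F(0) ]]

def step(w):
    """One step of the recursion w(n+1) = w(n) T."""
    return [sum(w[i] * T[i][j] for i in range(3)) for j in range(3)]

w = [F(1), F(0), F(0)]             # the walker starts at v1
for _ in range(3):
    w = step(w)

# Agrees with the closed forms P(X(3) = v1) = p^3 + (1-p)^3, etc., derived above.
assert w[0] == p**3 + (1 - p)**3
assert w[1] == 3 * p**2 * (1 - p)
assert w[2] == 3 * p * (1 - p)**2
```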

Formal Definition

A stochastic process X(k), k = 0, 1, 2, ..., with state space S = {1, 2, 3, ...} is said to
satisfy the Markov property if for every n and all states i_0, i_1, i_2, ..., i_n it is true that

P(X(n) = i_n | X(n-1) = i_{n-1}, ..., X(1) = i_1, X(0) = i_0) = P(X(n) = i_n | X(n-1) = i_{n-1}).

In the above random walk example, the conditional distribution of X (n + 1) given that

X (n) = v2 (say) is the same for all n. This property is known as homogeneity.


Markov Chains (Homogeneous and Non-Homogeneous)

A discrete-time Markov chain is said to be stationary or homogeneous in time if the
probability of going from one state to another is independent of the time at which the step is
being made. That is, for all states i and j,

P(X(n) = j | X(n-1) = i) = P(X(n+k) = j | X(n+k-1) = i)

for all integers k with n + k ≥ 1. The Markov chain is said to be non-homogeneous if the
condition for homogeneity fails.

Examples

(A) Assume a machine is producing items independently at the rate of one a minute. Let

X (n) denote the number of defectives produced by time n. If the probability of producing a

defective item remains constant throughout the life of the machine, then X (n) would be a

stationary Markov chain. However, if the probability of producing a defective item changes as
the machine grows older, then the Markov chain will be non-homogeneous.

(B) Consider a drunk man walking along a long straight road and let X (n) denote his

position at time n relative to some fixed position O on the road. Let

P(X (n + 1) = i + 1|X (n) = i) = an > 0,

P(X (n + 1) = i − 1|X (n) = i) = 1− an > 0.

Here we observe that the transition probability at time n depends upon n. Hence
this X(n) is a non-homogeneous Markov chain.

Transition Probability Matrix

Let X(k) denote a discrete-time homogeneous Markov chain with a finite state space
S = {1, 2, ..., n}. For this chain there are n^2 transition probabilities pij, i = 1, 2, ..., n,
j = 1, 2, ..., n. The most convenient way of recording these values is in the form of a matrix
P. Associate the ith row, jth column element of P with the transition probability
pij = P(X(n+1) = j | X(n) = i). Then we have,


P =

[ p11  p12  ...  p1n ]
[ p21  p22  ...  p2n ]
[  :    :    :    :  ]
[ pn1  pn2  ...  pnn ]

This matrix is called the transition probability matrix corresponding to the Markov chain
X(k). Every transition matrix has the following properties:

(1) All the entries are non-negative.

(2) The sum of the entries in each row is one.

Both properties follow easily. Any square matrix with these two properties is called a
stochastic matrix. Thus the transition probability matrix of a homogeneous Markov chain is
always a stochastic matrix. Note that the product of stochastic matrices is again a stochastic
matrix.

Initial Distribution

The initial distribution describes how the Markov chain commences. It is represented as
a row vector w^(0) given by

w^(0) = (w^(0)_1, w^(0)_2, ..., w^(0)_m) = (P(X(0) = 1), P(X(0) = 2), ..., P(X(0) = m)),

where {1, 2, ..., m} is the state space of the Markov chain. Since w^(0) represents a
probability distribution, we have Σ_{i=1}^m w^(0)_i = 1 and w^(0)_i ≥ 0 for all i = 1, 2, ..., m.

Chapman-Kolmogorov Equation

For a homogeneous chain, write P^(n)_ij = P(X(n) = j | X(0) = i) for the n-step transition
probability. Then

P^(n)_ij = Σ_{k=1}^m P^(r)_ik P^(n-r)_kj for all 0 < r < n, where m is the total number of states.

This is a fundamental result in the study of Markov chains.
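In matrix form the Chapman-Kolmogorov equation says P^(n) = P^(r) P^(n-r), i.e., the n-step matrix is the n-th power of P. A small sketch with a hypothetical 2-state stochastic matrix, using exact fractions; it also checks the earlier remark that a product of stochastic matrices is again stochastic:

```python
from fractions import Fraction as F

P = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]          # a hypothetical 2-state transition matrix

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)                  # two-step transition probabilities
P3 = matmul(P2, P)                 # three-step transition probabilities

# Chapman-Kolmogorov: splitting three steps as 1+2 or 2+1 gives the same matrix.
assert matmul(P, P2) == P3 == matmul(P2, P)
# The product of stochastic matrices is again stochastic: each row still sums to 1.
assert all(sum(row) == 1 for row in P3)
```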


Consequences of Markov Property

(1) Given the past X(0) = i_0, ..., X(n) = i_n, the Markov property implies that the
current state X(n) = i_n is enough to determine all distributions of the future. To see this, the
chain rule of conditional probability yields

P[X(n+m) = i_{n+m}, ..., X(n+1) = i_{n+1} | X(n) = i_n, ..., X(0) = i_0]

= P[X(n+m) = i_{n+m} | X(n+m-1) = i_{n+m-1}, ..., X(0) = i_0]

× P[X(n+m-1) = i_{n+m-1} | X(n+m-2) = i_{n+m-2}, ..., X(0) = i_0]

× ... × P[X(n+1) = i_{n+1} | X(n) = i_n, ..., X(0) = i_0]

for all m = 1, 2, .... But by the Markov property the right-hand side becomes

P[X(n+m) = i_{n+m} | X(n+m-1) = i_{n+m-1}]

× P[X(n+m-1) = i_{n+m-1} | X(n+m-2) = i_{n+m-2}]

× ... × P[X(n+1) = i_{n+1} | X(n) = i_n].

Therefore we have

P[X(n+m) = i_{n+m}, ..., X(n+1) = i_{n+1} | X(n) = i_n, ..., X(0) = i_0]

= P[X(n+m) = i_{n+m}, ..., X(n+1) = i_{n+1} | X(n) = i_n].

From here it follows that

P(X(n+m) = i_{n+m} | X(n) = i_n, ..., X(0) = i_0) = P(X(n+m) = i_{n+m} | X(n) = i_n)

for all m ≥ 1.

Thus, for a Markov chain, once the present state is known, prediction of future
distributions cannot be improved by adding any knowledge of the past.


(2) The process is completely determined once the initial probability distribution and the
transition probabilities are known. We shall now prove this fact. For a homogeneous chain,

P(X(n) = i_n, X(n-1) = i_{n-1}, ..., X(1) = i_1, X(0) = i_0)

= P(X(n) = i_n | X(n-1) = i_{n-1}, ..., X(0) = i_0) × P(X(n-1) = i_{n-1}, ..., X(0) = i_0)

= P_{i_{n-1}, i_n} × P(X(n-1) = i_{n-1}, ..., X(0) = i_0),

where the last equality holds because of the Markov property. Iterating,

P(X(n) = i_n, X(n-1) = i_{n-1}, ..., X(0) = i_0) = P(X(0) = i_0) P_{i_0,i_1} P_{i_1,i_2} ... P_{i_{n-1},i_n}.

Notice that for a non-homogeneous Markov chain X(n), n = 0, 1, 2, ..., if we write

P(X(n+1) = j | X(n) = i) = P_{n+1}(i, j),

where P_{n+1} is an m × m stochastic matrix, namely the transition probability matrix
governing the step of the chain X(n) from time n to time n+1, then

P(X(n) = i_n, X(n-1) = i_{n-1}, ..., X(0) = i_0) = P(X(0) = i_0) P_1(i_0, i_1) P_2(i_1, i_2) ... P_n(i_{n-1}, i_n).
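The product formula for a non-homogeneous chain can be sketched directly; the initial distribution and the two matrices below are hypothetical placeholders, not taken from the text:

```python
from fractions import Fraction as F
from itertools import product

p0 = [F(1, 2), F(1, 2)]                             # hypothetical initial distribution
P = {1: [[F(1, 3), F(2, 3)], [F(1, 2), F(1, 2)]],   # hypothetical matrix P1
     2: [[F(1, 4), F(3, 4)], [F(2, 3), F(1, 3)]]}   # hypothetical matrix P2

def path_probability(path):
    """P(X(0)=i0, ..., X(n)=in) = P(X(0)=i0) P1(i0,i1) ... Pn(i_{n-1},i_n)."""
    pr = p0[path[0]]
    for n in range(1, len(path)):
        pr *= P[n][path[n - 1]][path[n]]
    return pr

# Sanity check: the probabilities of all length-3 paths X(0), X(1), X(2) sum to 1.
assert sum(path_probability(w) for w in product([0, 1], repeat=3)) == 1
```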


CHAPTER 3
MOTIVATION

In this thesis we study non-homogeneous Markov chains (NHMCs) X(n), n ≥ 0,
with finite state space S = {1, 2, ..., m}, an initial distribution p = (p1, p2, ..., pm),
P(X(0) = i) = pi, and the transition probability matrices Pn, n ≥ 1, given by

(Pn)ij = P(X(n) = j | X(n-1) = i),

in the context of collapsibility.

The problem can be stated as follows. Let S1, S2, ..., Sr be r, 1 ≤ r ≤ m, pairwise disjoint
subsets of S, each containing more than one state, so that S = S1 ∪ S2 ∪ ... ∪ Sr ∪ A, where
A = S - ∪_{i=1}^r Si. Then the partition of S given by S1, S2, ..., Sr and the singletons in A
defines a collapsed chain Y(n) given by

Y(n) = i if and only if X(n) ∈ Si,

and Y(n) = u if and only if X(n) = u,

where n ≥ 0, 1 ≤ i ≤ r, and u ∈ A. The problem we study here is when the collapsed chain
Y(n) is Markov. We now present an example showing that the collapsed chain is not Markov
in general:

Let X(n) be a NHMC with state space S = {1, 2, 3} and uniform initial vector
p = (1/3, 1/3, 1/3). Let S1, S2 be a partition of S such that S1 = {1, 2} and S2 = {3}. We
define the collapsed chain Y(n) such that Y(n) = i if and only if X(n) ∈ Si, i = 1, 2.

We define the transition matrices of X (n) in the following manner:


P1 =

[ 1/8  1/4  5/8 ]
[ 1/4  1/2  1/4 ]
[ 5/8  1/4  1/8 ]

and P2 =

[ 1/2  1/3  1/6 ]
[ 1/6  1/2  1/3 ]
[ 1/3  1/6  1/2 ]

and Pn = P2 for all n ≥ 2. Now,

P(Y(3) = 2 | Y(2) = 1)

= P(Y(3) = 2, Y(2) = 1) / P(Y(2) = 1) = P(X(3) ∈ S2, X(2) ∈ S1) / P(X(2) ∈ S1)

= [P(X(3) = 3 | X(2) = 1) P(X(2) = 1) + P(X(3) = 3 | X(2) = 2) P(X(2) = 2)] / [P(X(2) = 1) + P(X(2) = 2)]

= [P3(1, 3) · (1/3) + P3(2, 3) · (1/3)] / (2/3)

= 1/4.

(Here P(X(2) = i) = 1/3 for each i, because the uniform p is preserved by the bi-stochastic
matrices P1 and P2.)

Also,

P(Y(3) = 2 | Y(2) = 1, Y(1) = 2)

= P(Y(3) = 2, Y(2) = 1, Y(1) = 2) / P(Y(2) = 1, Y(1) = 2)

= P(X(3) ∈ S2, X(2) ∈ S1, X(1) ∈ S2) / P(X(2) ∈ S1, X(1) ∈ S2).

For the numerator,

P(X(3) = 3, X(2) = 1, X(1) = 3) + P(X(3) = 3, X(2) = 2, X(1) = 3)

= P(X(3) = 3 | X(2) = 1, X(1) = 3) P(X(2) = 1, X(1) = 3)

+ P(X(3) = 3 | X(2) = 2, X(1) = 3) P(X(2) = 2, X(1) = 3)

= P3(1, 3) P2(3, 1) · (1/3) + P3(2, 3) P2(3, 2) · (1/3) = 1/27.

For the denominator,

P(X(2) ∈ {1, 2}, X(1) = 3) = P(X(2) ∈ {1, 2} | X(1) = 3) P(X(1) = 3) = 1/6.

Thus P(Y(3) = 2 | Y(2) = 1, Y(1) = 2) = 2/9 ≠ 1/4 = P(Y(3) = 2 | Y(2) = 1). Hence the
collapsed chain is not Markov, and looking for conditions under which a function of a Markov
chain regains the Markov property makes sense.
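The two conditional probabilities above can be confirmed by brute force over all paths X(0), ..., X(3), using the matrices P1 and P2 defined earlier (states are 0-indexed here, so S1 = {0, 1} and S2 = {2}):

```python
from fractions import Fraction as F
from itertools import product

P1 = [[F(1, 8), F(1, 4), F(5, 8)],
      [F(1, 4), F(1, 2), F(1, 4)],
      [F(5, 8), F(1, 4), F(1, 8)]]
P2 = [[F(1, 2), F(1, 3), F(1, 6)],
      [F(1, 6), F(1, 2), F(1, 3)],
      [F(1, 3), F(1, 6), F(1, 2)]]
p0 = [F(1, 3)] * 3                 # uniform initial distribution
mats = [P1, P2, P2]                # transition matrices for steps 1, 2, 3

block = {1: {0, 1}, 2: {2}}        # Y = 1 on S1 = {1, 2}, Y = 2 on S2 = {3}

def prob(constraints):
    """P(X(n) lies in the block prescribed for Y(n), for each constrained n)."""
    total = F(0)
    for path in product(range(3), repeat=4):
        if all(path[n] in block[y] for n, y in constraints.items()):
            pr = p0[path[0]]
            for n, (i, j) in enumerate(zip(path, path[1:])):
                pr *= mats[n][i][j]
            total += pr
    return total

lhs = prob({3: 2, 2: 1}) / prob({2: 1})                  # P(Y(3)=2 | Y(2)=1)
rhs = prob({3: 2, 2: 1, 1: 2}) / prob({2: 1, 1: 2})      # ... given also Y(1)=2
assert lhs == F(1, 4) and rhs == F(2, 9) and lhs != rhs  # Y(n) is not Markov
```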


Here we present some motivating examples:

(A) An experimenter leads a guinea pig into a maze consisting of a straight line path OA,
forking at A into three different paths AB, AC, and AD; each path takes the guinea pig back
to OA. A positive stimulus (an appealing food) is left on AB, while a negative stimulus (a
mild electric shock) is left on AC, and another negative stimulus (a slightly less mild shock)
on AD. After observing the guinea pig's journey along this maze a large number of times, the
experimenter decides on a 3-state Markov chain model to describe the "learning" behavior
of the guinea pig. To simplify the model even further, he now has the problem of deciding if
collapsing the paths with negative stimuli can give him a 2-state Markov chain.

(B) The Google search engine is modeled using Markov chains (PageRank). As the
number of web pages on the web is already huge, manipulating such a model becomes a
daunting task. When the dimension of a Markov chain exceeds the limits of the available
computational resources, lumping together states of the chain is a reasonable idea. We can
use stochastic factorization to analyze lumping of states. When the transition matrix of a
finite Markov chain is decomposed into the product of two stochastic matrices, the factors
of the multiplication can be swapped to obtain another model, one much smaller than the
original. Importantly, the reduced model retains many characteristics of the original chain,
such as reducibility and the number of closed sets. Most importantly, the stationary
distributions of the two Markov chains are related through a simple linear transformation.
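The swap property mentioned here can be illustrated concretely. In the sketch below, D and K are hypothetical stochastic factors (not taken from PageRank data); if pi is stationary for P = DK, then pi D is stationary for the reduced chain Q = KD, since (pi D)(KD) = pi (DK) D = pi D:

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical stochastic factors: D is 3x2, K is 2x3, each with row sums 1.
D = [[F(1), F(0)],
     [F(0), F(1)],
     [F(1, 2), F(1, 2)]]
K = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 2)]]

P = matmul(D, K)        # original 3-state chain
Q = matmul(K, D)        # swapped factors: a reduced 2-state chain

pi = [F(11, 28), F(1, 4), F(5, 14)]        # stationary distribution of P
assert [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)] == pi

# pi D is then stationary for the reduced chain Q.
piD = [sum(pi[i] * D[i][j] for i in range(3)) for j in range(2)]
assert [sum(piD[i] * Q[i][j] for i in range(2)) for j in range(2)] == piD
```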

In the case of Google’s Markov chain (PageRank), the lumping of states is very effective

because the transition matrix used to describe the dynamics of the web has many identical

rows. Most of these correspond to web pages with no links to other pages, called "dangling
nodes". Since the dangling nodes represent approximately half of the pages on the web,
lumping them into a single state results in a substantial reduction of computation.

This idea of lumping states of Markov chains has found applications in many other areas:

genetics, statistical mechanics, and networking, to name a few.


Many papers on functions of Markov chains are available in printed literature. Some

of them are included in the references here. However, all such articles are on homogeneous

Markov chains. Similar interesting and non-trivial questions can be asked for NHMCs. In this

thesis, we address some of them.


CHAPTER 4
RESULTS

This chapter is divided into three sections. In the first section, we present four theorems
and a number of examples: let X(n), n ≥ 0, be a NHMC with state space S = {1, 2, ..., m}.
Theorem 1 provides a sufficient condition for the collapsed chain Y(n), n ≥ 0, to be
Markov for all possible initial distributions p of X(n). Example 1 shows that this condition
is not necessary. Theorem 2 explores additional conditions that are required for the sufficient
condition of Theorem 1 to become necessary as well. Theorem 3 is our main result. It
introduces various other natural conditions for the Y(n) chain to be Markov. Theorem 4
characterizes the transition matrices Pn of X(n) in the case when Y(n) ≡ f(X(n)) is Markov,
where f is any given function from S into S. Our last example shows that the reversibility
condition cannot be removed in Theorem 4.

In the following section, we discuss strong and weak lumpability and present three

theorems in this context. Finally, we discuss an application of this idea of collapsing states
of a Markov chain.

4.1 Collapsibility of NHMCs

Theorem 1

Let X(n), n ≥ 0, be a NHMC with any given initial distribution vector p and state
space S = {1, 2, ..., m}. Let 1 ≤ r ≤ m, and let S1, S2, ..., Sr be r pairwise disjoint subsets
of S such that S = ∪_{i=1}^r Si. Let Y(n), n ≥ 0, be the collapsed chain defined by Y(n) = i
if and only if X(n) ∈ Si. Then a sufficient condition for Y(n) to be Markov is that

Pn(k, Sj) ≡ P(X(n) ∈ Sj | X(n-1) = k)

is independent of k in Si for 1 ≤ i ≤ r, 1 ≤ j ≤ r, i ≠ j, n ≥ 2.


Proof. Let i_0, i_1, ..., i_n be any (n+1) elements chosen from {1, 2, ..., r}. Then we must
show that

P(Y(n) = i_n | Y(n-1) = i_{n-1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n-1) = i_{n-1}) for n ≥ 2.

To show this, let us write:

P(Y(n) = i_n, Y(n-1) = i_{n-1}, ..., Y(0) = i_0)

= P(X(n) ∈ S_{i_n}, X(n-1) ∈ S_{i_{n-1}}, ..., X(0) ∈ S_{i_0})

= Σ_{x_{n-1} ∈ S_{i_{n-1}}} P(X(n) ∈ S_{i_n} | X(n-1) = x_{n-1}) × P(X(n-1) = x_{n-1}, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0})

= Σ_{x_{n-1} ∈ S_{i_{n-1}}} Pn(x_{n-1}, S_{i_n}) × P(X(n-1) = x_{n-1}, ..., X(0) ∈ S_{i_0}).

When the sufficient condition holds, Pn(x_{n-1}, S_{i_n}) is independent of x_{n-1} in
S_{i_{n-1}}, even when i_n = i_{n-1}, because then Pn(x_{n-1}, S_{i_n}) = 1 - Σ_{S_j ⊂ S - S_{i_n}} Pn(x_{n-1}, S_j).
Thus we have:

P(Y(n) = i_n, Y(n-1) = i_{n-1}, ..., Y(0) = i_0)

= Pn(x_{n-1}, S_{i_n}) × P(X(n-1) ∈ S_{i_{n-1}}, ..., X(0) ∈ S_{i_0})

= Pn(x_{n-1}, S_{i_n}) × P(Y(n-1) = i_{n-1}, ..., Y(0) = i_0),


and this means that

P(Y(n) = i_n | Y(n-1) = i_{n-1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n-1) = i_{n-1}),

which is given by the quantity Pn(x_{n-1}, S_{i_n}) for any x_{n-1} in S_{i_{n-1}}.
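The sufficient condition of Theorem 1 is easy to check mechanically for a given transition matrix and partition. A small sketch (states 0-indexed; the first matrix is a hypothetical one built to satisfy the condition, while the second is the matrix P2 of the Chapter 3 counterexample, which violates it):

```python
from fractions import Fraction as F

def satisfies_condition(P, blocks):
    """Theorem 1's condition: for distinct blocks Si, Sj, the block mass
    Pn(k, Sj) = sum over l in Sj of P[k][l] is the same for every k in Si."""
    for Si in blocks:
        for Sj in blocks:
            if Si is not Sj:
                masses = {sum(P[k][l] for l in Sj) for k in Si}
                if len(masses) > 1:
                    return False
    return True

blocks = [[0, 1], [2]]                   # S1 = {1, 2}, S2 = {3}, 0-indexed

good = [[F(1, 2), F(1, 4), F(1, 4)],     # rows 0 and 1 both give S2 mass 1/4
        [F(1, 8), F(5, 8), F(1, 4)],
        [F(1, 3), F(1, 3), F(1, 3)]]
bad = [[F(1, 2), F(1, 3), F(1, 6)],      # P2 from Chapter 3: masses 1/6 vs 1/3
       [F(1, 6), F(1, 2), F(1, 3)],
       [F(1, 3), F(1, 6), F(1, 2)]]

assert satisfies_condition(good, blocks)
assert not satisfies_condition(bad, blocks)
```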

The following example shows that in Theorem 1, the sufficient condition may not be
necessary even when Y(n) is Markov for all possible initial distributions p of a given NHMC
X(n).

Example 1. Let X(n), n ≥ 0, be a NHMC with state space S = {1, 2, 3} and any given
initial distribution p, such that for n ≥ 1, the second column of each Pn is a positive multiple
of its third column and, furthermore, (Pn)21 ≠ (Pn)31.

Let S1 = {1}, S2 = {2, 3}, and let Y(n) be the chain with states 1 and 2 such that for
n ≥ 0, Y(n) = 1 if and only if X(n) = 1, and Y(n) = 2 if and only if X(n) ∈ {2, 3}. Then
Y(n) is Markov.

Proof. To show this, let i_0, i_1, ..., i_n be any (n+1) states from {1, 2}. We consider

P(Y(n) = i_n, Y(n-1) = i_{n-1}, ..., Y(0) = i_0) = P(X(n) ∈ S_{i_n}, X(n-1) ∈ S_{i_{n-1}}, ..., X(0) ∈ S_{i_0}).   (4-1)

We need to show that the right-hand side of Equation 4-1 is equal to

P(X(n) ∈ S_{i_n} | X(n-1) ∈ S_{i_{n-1}}) × P(X(n-1) ∈ S_{i_{n-1}}, ..., X(0) ∈ S_{i_0}).


If i_{n-1} = 1 (that is, S_{i_{n-1}} = {1}), this is immediate. Thus we consider the case when
i_{n-1} = 2 (that is, S_{i_{n-1}} = {2, 3}). In this case,

P(X(n) ∈ S_{i_n}, X(n-1) ∈ {2, 3}, ..., X(0) ∈ S_{i_0})

= P(X(n) ∈ S_{i_n} | X(n-1) = 2) × P(X(n-1) = 2, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0})

+ P(X(n) ∈ S_{i_n} | X(n-1) = 3) × P(X(n-1) = 3, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0})

= (Pn)21 An + (Pn)31 Bn when i_n = 1, that is, S_{i_n} = {1}; and

= [1 - (Pn)21] An + [1 - (Pn)31] Bn when i_n = 2, that is, S_{i_n} = {2, 3},

where An = P(X(n-1) = 2, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0}) and Bn = P(X(n-1) = 3,
X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0}). For Y(n) to be Markov, we must have

(Pn)21 An + (Pn)31 Bn (when i_n = 1)

= P(X(n) ∈ S_{i_n} | X(n-1) ∈ {2, 3}) × P(X(n-1) ∈ {2, 3}, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0})

= {[(Pn)21 A + (Pn)31 B] / (A + B)} × [An + Bn],

where we have taken A = P(X(n-1) = 2) and B = P(X(n-1) = 3), suppressing n for the
moment. Simplifying the above equation (whether i_n = 1 or 2), we see that Y(n) will be
Markov if and only if for n ≥ 2 we have

((Pn)21 - (Pn)31) [An/Bn - P(X(n-1) = 2)/P(X(n-1) = 3)] = 0.   (4-2)

Notice that

An/Bn = P(X(n-1) = 2, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0}) / P(X(n-1) = 3, X(n-2) ∈ S_{i_{n-2}}, ..., X(0) ∈ S_{i_0}).


When S_{i_{n-2}} = {1},

An/Bn = [(P_{n-1})12 P(X(n-2) = 1, ..., X(0) ∈ S_{i_0})] / [(P_{n-1})13 P(X(n-2) = 1, ..., X(0) ∈ S_{i_0})]
= (P_{n-1})12 / (P_{n-1})13 ≡ tn, say,

and when S_{i_{n-2}} = {2, 3},

An/Bn = [(P_{n-1})22 P(X(n-2) = 2, ..., X(0) ∈ S_{i_0}) + (P_{n-1})32 P(X(n-2) = 3, ..., X(0) ∈ S_{i_0})]
/ [(P_{n-1})23 P(X(n-2) = 2, ..., X(0) ∈ S_{i_0}) + (P_{n-1})33 P(X(n-2) = 3, ..., X(0) ∈ S_{i_0})] = tn (as above),

since the second column of P_{n-1} is tn times its third column. Notice that this also means that

P(X(n-1) = 2)

= P(X(n-2) = 1)(P_{n-1})12 + P(X(n-2) = 2)(P_{n-1})22 + P(X(n-2) = 3)(P_{n-1})32

= tn [P(X(n-2) = 1)(P_{n-1})13 + P(X(n-2) = 2)(P_{n-1})23 + P(X(n-2) = 3)(P_{n-1})33]

= tn P(X(n-1) = 3),

so that tn = P(X(n-1) = 2)/P(X(n-1) = 3) (= A/B) for all possible distributions of X(n-2).
Thus, Equation 4-2 holds even when (Pn)21 ≠ (Pn)31, and thus Y(n) is Markov. Here the
sufficient condition of Theorem 1 is not satisfied, because (Pn)21 ≠ (Pn)31.

Before we present our second theorem, we introduce a few notions.

Definition 1. The initial distribution vector p = (p1, p2, ..., pm), Σ_{i=1}^m pi = 1,
0 ≤ pi ≤ 1 for each i, P(X(0) = i) = pi, where X(n) is a NHMC, is called left invariant if
for each n ≥ 1, pPn = p.


Note that for 1 ≤ i ≤ m, 1 ≤ j ≤ m,

(Pn)ij = P(X(n) = j | X(n-1) = i), n ≥ 1.

Thus if p is left invariant, then for n ≥ 1, p(n) = p, where p(n)(i) = P(X(n) = i). Notice
that the uniform distribution vector p = (1/m, 1/m, ..., 1/m) is left invariant if and only if
each Pn is an m × m bi-stochastic matrix (that is, a matrix for which each row sum is 1 and
each column sum is 1).

Consider the m × m diagonal matrix D such that Dii = pi > 0, 1 ≤ i ≤ m.

Definition 2. The NHMC X(n) is called reversible if and only if D Pn = Pn^T D for each
n ≥ 1.

If a NHMC X(n), n ≥ 0, is reversible, then its initial distribution p must be left invariant.
The reason is the following: reversibility implies that for all i, j, 1 ≤ i ≤ m, 1 ≤ j ≤ m, and
n ≥ 1,

(D Pn)ij = (Pn^T D)ij, or pi (Pn)ij = pj (Pn)ji.

Thus,

Σ_{j=1}^m pi (Pn)ij = Σ_{j=1}^m pj (Pn)ji, or pi = (p Pn)_i, 1 ≤ i ≤ m,

implying p = p Pn for n ≥ 1.

Thus, left invariance is relevant when we deal with reversibility, and recall that p(n) = p

for n ≥ 1, if p is left invariant.

We should also mention that it follows immediately that for a reversible NHMC X (n),

P(X (n) = i ,X (n − 1) = j) = P(X (n) = j ,X (n − 1) = i)

for n ≥ 1, 1 ≤ i ≤ m, 1 ≤ j ≤ m. The converse also holds.


Let us also mention that there may not exist any left invariant distribution vector for a
NHMC X(n). A very simple example is the following: suppose that for a particular 2-state
NHMC,

Pn =

[ 1/3  2/3 ]
[ 1/3  2/3 ]

for n odd, and Pn =

[ 1/3  2/3 ]
[ 2/3  1/3 ]

for n even. Let p = (p1, p2), 0 ≤ p1 ≤ 1, 0 ≤ p2 ≤ 1, p1 + p2 = 1, be such that pPn = p for
all n ≥ 1. Then since lim_{k→∞} Pn^k = Pn for n odd, and

lim_{k→∞} Pn^k =

[ 1/2  1/2 ]
[ 1/2  1/2 ]

for n even, it follows that for n odd, p = lim_{k→∞} p Pn^k = (1/3, 2/3), and for n even,
similarly, p = (1/2, 1/2). Thus p cannot exist.

Now we present our second theorem:

Theorem 2

Let X(n), n ≥ 0, be a reversible NHMC with state space S = {1, 2, ..., m} having
Pn = P_{n+1} for each odd n, and let Si, 1 ≤ i ≤ r, be r pairwise disjoint subsets of S
forming a partition of S, so that S = ∪_{i=1}^r Si. We form the collapsed chain Y(n), n ≥ 0,
such that Y(n) = i if and only if X(n) ∈ Si, 1 ≤ i ≤ r. We assume that Y(n) is Markov and
each pi > 0, 1 ≤ i ≤ m. Then the following is true: for any j, k, 1 ≤ j ≤ r, 1 ≤ k ≤ r, and
any n ≥ 1, Pn(i, Sj) is independent of i in Sk.

Proof. Suppose that Y(n), n ≥ 0, is Markov. Let

(Qn)ij = P(Y(n) = j | Y(n-1) = i) = P(X(n) ∈ Sj | X(n-1) ∈ Si) = Σ_{k ∈ Si} [pk / p(Si)] Pn(k, Sj),


where p(Si) = P(X(0) ∈ Si) and Pn(k, Sj) = P(X(n) ∈ Sj | X(n-1) = k). Notice that if we
define the m × r matrix B by Bkj = 1 if k ∈ Sj, and Bkj = 0 otherwise, then we have

(Pn B)kj = Pn(k, Sj).   (4-3)

If we define the r × m matrix A by

Aik = pk / p(Si) if k ∈ Si; Aik = 0 otherwise,   (4-4)

then we have (A Pn B)ij = Σ_{k ∈ Si} Aik (Pn B)kj = (Qn)ij. Thus, we have

Qn = A Pn B, n ≥ 1,   (4-5)

and similarly, using the Markov property of Y (n),

QnQn+1 = APnPn+1B, n ≥ 1. (4–6)

From Equation 4− 5 and Equation 4− 6, we have:

Vn = APn(I − BA)Pn+1B = 0, n ≥ 1, (4–7)

where I is the m × m identity matrix and the 0 on the right is the r × r zero matrix. Let C be the r × r diagonal matrix with C_{ii} = p(S_i) = P(X(0) ∈ S_i), 1 ≤ i ≤ r. Then it follows from Equation 4–4 that (CA)_{ik} = p_k if k ∈ S_i, and (CA)_{ik} = 0 otherwise, so that CA = B^T D. It follows from Equation 4–7, after using the reversibility condition D P_n = P_n^T D, that

B^T P_n^T [D(I − BA)] P_{n+1} B = 0. (4–8)

Also, the m × m matrix D(I − BA) is positive semi-definite, since it is easily verified that for any non-zero 1 × m vector x,

x[D(I − BA)]x^T = ∑_{k=1}^r (1/p(S_k)) ∑_{i<j, i∈S_k, j∈S_k} p_i p_j (x_i − x_j)^2 ≥ 0.


Thus, using N. Higham (1990), we can write D(I − BA) = R^T R for some m × m matrix R, and then it follows from Equation 4–8 that for each n odd,

(R P_n B)^T (R P_n B) = 0, (4–9)

since by our assumption in the theorem, P_n = P_{n+1}. This means that R P_n B = 0, and therefore

R^T R P_n B = 0, that is, D(I − BA) P_n B = 0; since D is invertible (each p_i > 0), (I − BA) P_n B = 0,

which implies that

P_n B = B Q_n. (4–10)

It follows immediately from Equation 4–3 that for any k ∈ S_i, and 1 ≤ i ≤ r, 1 ≤ j ≤ r,

P_n(k, S_j) = (Q_n)_{ij} for all n ≥ 1.
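The matrices A and B from the proof, and the identity P_nB = BQ_n of Equation 4–10, can be checked numerically. In the sketch below the 3-state symmetric matrix P and the partition are illustrative choices of ours, with the uniform initial distribution (so A simply averages within each block):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.5, 0.2],
              [0.2, 0.2, 0.6]])    # symmetric, hence reversible w.r.t. uniform p
blocks = [[0, 1], [2]]             # S_1 = {0, 1}, S_2 = {2} in 0-indexed states
m, r = P.shape[0], len(blocks)

B = np.zeros((m, r))               # B_{kj} = 1 iff k is in S_j
A = np.zeros((r, m))               # A_{ik} = p_k / p(S_i) = 1/|S_i| for uniform p
for j, S in enumerate(blocks):
    B[S, j] = 1.0
    A[j, S] = 1.0 / len(S)

Q = A @ P @ B                      # transition matrix Q_n of the collapsed chain
assert np.allclose(Q.sum(axis=1), 1)
assert np.allclose(P @ B, B @ Q)   # Equation 4-10: P_n(k, S_j) = (Q_n)_{ij}, k in S_i
```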

Here we present an example of a NHMC X(n) and a corresponding collapsed chain Y(n) such that the sufficient condition in Theorem 1 for Y(n) to be Markov is shown not to be necessary, even when the initial distribution p is uniform and left invariant. This example also shows that the reversibility condition cannot be removed from Theorem 2.

Example 2. Let X(n), n ≥ 0, be a NHMC with 3 states such that S = {1, 2, 3}, p, the distribution of X(0), is equal to (1/3, 1/3, 1/3), and each P_n, n ≥ 1, is bi-stochastic and has the property

(P_n)_{21} ≠ (P_n)_{31}, (P_n)_{12} = (P_n)_{13}. (4–11)

Notice that in this case the initial distribution is left invariant (since pP_n = p for n ≥ 1). Also, by Equation 4–11, P_n is not symmetric for n ≥ 1, and that implies that in this case X(n) is not reversible. Let S_1 = {1} and S_2 = {2, 3}. We define, for n ≥ 1: Y(n) = 1 if and only if X(n) = 1, and Y(n) = 2 if and only if X(n) ∈ {2, 3}. Clearly P_n(i, S_1) = P(X(n) = 1 | X(n−1) = i) = (P_n)_{i1} is not independent of i because of Equation 4–11. Thus the sufficient condition of Theorem 1 does not hold here. Let us show that Y(n) is Markov.


Proof. To this end, let i_0, i_1, ..., i_n be any (n + 1) states of Y(n). Then we must show that

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n−1) = i_{n−1}). (4–12)

Case 1: Suppose i_{n−1} = 1. In this case,

P(Y(n) = i_n, Y(n−1) = 1, Y(n−2) = i_{n−2}, ..., Y(0) = i_0)
= P(X(n) ∈ S_{i_n}, X(n−1) = 1, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})
= P(X(n) ∈ S_{i_n} | X(n−1) = 1) P(X(n−1) = 1, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})
= P(Y(n) = i_n | Y(n−1) = 1) P(Y(n−1) = 1, Y(n−2) = i_{n−2}, ..., Y(0) = i_0),

which is possible in this case because X(n) is Markov and S_{i_{n−1}} is a singleton.

Case 2: Suppose i_{n−1} = 2, so that S_{i_{n−1}} = {2, 3}. In this case we write:

P(Y(n) = i_n, Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= P(X(n) ∈ S_{i_n} | X(n−1) = 2) P(X(n−1) = 2, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})
+ P(X(n) ∈ S_{i_n} | X(n−1) = 3) P(X(n−1) = 3, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0}).

Let us write A_{n−1} = P(X(n−1) = 2, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0}) and B_{n−1} = P(X(n−1) = 3, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0}). Thus if i_n = 1, then

P(Y(n) = 1, Y(n−1) = 2, Y(n−2) = i_{n−2}, ..., Y(0) = i_0) = (P_n)_{21} A_{n−1} + (P_n)_{31} B_{n−1}.

Also,

P(Y(n) = 1 | Y(n−1) = 2) = P(X(n) = 1 | X(n−1) ∈ {2, 3}) = [(P_n)_{21} + (P_n)_{31}] · (1/3)/(1/3 + 1/3) = (1/2)[(P_n)_{21} + (P_n)_{31}],

since P(X(n−1) = 2) = P(X(n−1) = 3) = 1/3, as the initial distribution is left invariant. Thus Equation 4–12 holds when i_n = 1 and i_{n−1} = 2 if and only if

(1/2)[(P_n)_{21} + (P_n)_{31}] = [(P_n)_{21} A_{n−1} + (P_n)_{31} B_{n−1}] / (A_{n−1} + B_{n−1}),


or

((P_n)_{21} − (P_n)_{31})(A_{n−1} − B_{n−1}) = 0. (4–13)

Note that Equation 4–13 was obtained above by assuming i_n = 1 and i_{n−1} = 2. It is also clear that we would get the same equation even if we assumed that i_n = 2 and i_{n−1} = 2. Thus, in order to establish Equation 4–12 in this case when i_{n−1} = 2, we must prove that for all n ≥ 2, A_{n−1} = B_{n−1} (since (P_n)_{21} ≠ (P_n)_{31} for all n ≥ 1). In other words, we must prove that for n ≥ 2,

P(X(n−1) = 2, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0}) = P(X(n−1) = 3, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0}). (4–14)

When n = 2, Equation 4–14 reduces to P(X(1) = 2, X(0) ∈ S_{i_0}) = P(X(1) = 3, X(0) ∈ S_{i_0}). When i_0 = 2, this means that (P_1)_{22} + (P_1)_{32} = (P_1)_{23} + (P_1)_{33}. Since these last two equations hold, as (P_n)_{12} = (P_n)_{13} and P_n is bi-stochastic for n ≥ 1, it is clear that Equation 4–14 holds when n = 2. The proof will be complete by an induction argument once we show that Equation 4–14 follows for any n > 2, that is, if we can prove that Equation 4–14 follows if we assume that

P(X(n−2) = 2, X(n−3) ∈ S_{i_{n−3}}, ..., X(0) ∈ S_{i_0}) = P(X(n−2) = 3, X(n−3) ∈ S_{i_{n−3}}, ..., X(0) ∈ S_{i_0}). (4–15)

Thus, we assume that Equation 4–15 holds and n > 2. We are done once we show that Equation 4–15 implies Equation 4–14. To this end, first suppose that i_{n−2} = 1. In this case, Equation 4–14 reduces to (P_{n−1})_{12} P(X(n−2) = 1, X(n−3) ∈ S_{i_{n−3}}, ..., X(0) ∈ S_{i_0}) = (P_{n−1})_{13} P(X(n−2) = 1, X(n−3) ∈ S_{i_{n−3}}, ..., X(0) ∈ S_{i_0}), which holds trivially since (P_{n−1})_{12} = (P_{n−1})_{13} for n > 1. In the case i_{n−2} = 2, Equation 4–14 reduces to

P(X(n−1) = 2, X(n−2) = 2, ..., X(0) ∈ S_{i_0}) + P(X(n−1) = 2, X(n−2) = 3, ..., X(0) ∈ S_{i_0})
= P(X(n−1) = 3, X(n−2) = 2, ..., X(0) ∈ S_{i_0}) + P(X(n−1) = 3, X(n−2) = 3, ..., X(0) ∈ S_{i_0}).


This last equation is equivalent to

(P_{n−1})_{22} P(X(n−2) = 2, ..., X(0) ∈ S_{i_0}) + (P_{n−1})_{32} P(X(n−2) = 3, ..., X(0) ∈ S_{i_0})
= (P_{n−1})_{23} P(X(n−2) = 2, ..., X(0) ∈ S_{i_0}) + (P_{n−1})_{33} P(X(n−2) = 3, ..., X(0) ∈ S_{i_0}).

This last equation, of course, holds by Equation 4–15 and the fact that (P_{n−1})_{22} + (P_{n−1})_{32} = (P_{n−1})_{23} + (P_{n−1})_{33}. Thus we have established Equation 4–12, and consequently Y(n) is Markov, despite (P_n)_{21} ≠ (P_n)_{31}.

Finally, we remark that in this example, if the initial distribution is uniform and left invariant and (P_n)_{21} ≠ (P_n)_{31}, then the same Y(n) chain is Markov if and only if (P_n)_{12} = (P_n)_{13} for all n ≥ 1.
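Example 2 can be spot-checked numerically. The matrix below is one bi-stochastic choice of ours satisfying Equation 4–11 ((P)_{12} = (P)_{13} but (P)_{21} ≠ (P)_{31}), used for every n; the check verifies the Markov identity of Equation 4–12 at n = 2 under the uniform initial distribution:

```python
import numpy as np
from itertools import product

P = np.array([[0.50, 0.25, 0.25],    # row 1: (P)_{12} == (P)_{13}
              [0.30, 0.40, 0.30],
              [0.20, 0.35, 0.45]])   # (P)_{21} = 0.3 != 0.2 = (P)_{31}
p0 = np.ones(3) / 3                  # uniform initial distribution
blocks = {1: [0], 2: [1, 2]}         # S_1 = {1}, S_2 = {2, 3} (0-indexed states)

def path_prob(xs):
    """P(X(0)=x0, ..., X(n)=xn) for the homogeneous choice P_n = P."""
    pr = p0[xs[0]]
    for a, b in zip(xs, xs[1:]):
        pr *= P[a, b]
    return pr

def y_prob(ys):
    """P(Y(0)=y0, ..., Y(n)=yn), summing over the compatible X-paths."""
    return sum(path_prob(xs) for xs in product(*(blocks[y] for y in ys)))

# P(Y(2)=1 | Y(1)=2, Y(0)=i0) must not depend on i0 and must equal
# P(Y(2)=1 | Y(1)=2): the Markov identity (4-12) at n = 2.
cond = {i0: y_prob((i0, 2, 1)) / y_prob((i0, 2)) for i0 in (1, 2)}
marg = sum(y_prob((i0, 2, 1)) for i0 in (1, 2)) / sum(y_prob((i0, 2)) for i0 in (1, 2))
assert abs(cond[1] - cond[2]) < 1e-12 and abs(cond[1] - marg) < 1e-12
```

This only checks one step of one homogeneous instance, of course; the proof above establishes the property for all n and all sequences P_n satisfying 4–11.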

The following theorem is our main result. It is a general theorem presenting conditions for

the collapsed chain Y (n) to be Markov.

Theorem 3

Let X(n), n ≥ 0, be a NHMC with finite state space S, |S| = m. Let S_1, S_2, ..., S_r, 1 ≤ r < m, be pairwise disjoint subsets of S such that |S_i| > 1 for each i, and such that S_1, S_2, ..., S_r and the singletons in A form a partition of S, where A = S − ∪_{i=1}^r S_i. The collapsed chain Y(n) is defined as follows: Y(n) = j if and only if X(n) ∈ S_j, 1 ≤ j ≤ r; Y(n) = u if and only if X(n) = u, for all n ≥ 0. Then the following results hold:

(i) The condition

P_n(k, S_i) ≡ P(X(n) ∈ S_i | X(n−1) = k) = 0, k ∉ S_i, 1 ≤ i ≤ r, (4–16)

is a sufficient condition for Y(n) to be Markov. It is, however, not necessary for Y(n) to be Markov for any initial distribution p of X(0).


(ii) Let v_1 and v_2 be any two elements from the state space of Y(n). If Y(n) is Markov, then we have for n ≥ 1 and given j, 1 ≤ j ≤ r:

∑_{l∈S_j} P(X(n) = l | X(n−1) ∈ S_{v_1}) P(X(n+1) ∈ S_{v_2} | X(n) = l)
= P(X(n) ∈ S_j | X(n−1) ∈ S_{v_1}) P(X(n+1) ∈ S_{v_2} | X(n) ∈ S_j). (4–17)

(iii) Let k be any state in S and v_2 be as in (ii) above. Then the condition

∑_{l∈S_j} P(X(n) = l | X(n−1) = k) P(X(n+1) ∈ S_{v_2} | X(n) = l)
= P(X(n) ∈ S_j | X(n−1) = k) P(X(n+1) ∈ S_{v_2} | X(n) ∈ S_j), (4–18)

whenever 1 ≤ j ≤ r and n ≥ 1, is a sufficient condition for Y(n) to be Markov for all possible initial distributions p of X(0).

In general, Equation 4–18 is not necessary for Y(n) to be Markov for a given initial distribution p. However, it is necessary for Y(n) to be Markov for a given initial distribution p (necessarily left invariant) with respect to which the X(n) chain is reversible, assuming, of course, that for each n odd, P_n = P_{n+1}.

Proof. (i) Assume that Equation 4–16 holds. Let i_0, i_1, ..., i_n be any (n+1) states for Y(n). If i_{n−1} is a singleton in A, it is easy to verify that

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= P(X(n) ∈ S_{i_n} | X(n−1) = i_{n−1}, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})
= P(X(n) ∈ S_{i_n} | X(n−1) = i_{n−1}) = P(Y(n) = i_n | Y(n−1) = i_{n−1}).

If i_{n−1} is not a singleton in A, then i_{n−1} = j, 1 ≤ j ≤ r. In this case, Equation 4–16 implies that for

P(Y(n−1) = i_{n−1}, Y(n−2) = i_{n−2}, ..., Y(0) = i_0) = P(X(n−1) ∈ S_j, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})


to be positive, it is necessary that i_0 = i_1 = ... = i_{n−1} = j. This means that

P(Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(X(n−1) ∈ S_j, ..., X(1) ∈ S_j, X(0) ∈ S_j) = P(X(n−1) ∈ S_j, ..., X(1) ∈ S_j),

since P(X(1) ∈ S_j | X(0) ∉ S_j) = 0 by Equation 4–16, when P(X(0) ∉ S_j) > 0. Repeating this argument, we have P(X(n−1) ∈ S_j, X(n−2) ∈ S_j, ..., X(0) ∈ S_j) = P(X(n−1) ∈ S_j). Thus it is clear that Y(n) is Markov in this case, because both sides of Equation 4–12 become P(Y(n) = j)/P(Y(n−1) = j).

Example 1 shows that Y(n) in that example is Markov for all initial distributions when S = {1, 2, 3}, S_1 = {1} and S_2 = {2, 3}, and yet, for any n ≥ 1, P_n(1, {2, 3}) need not be zero. Thus Equation 4–16 may not be necessary for Y(n) to be Markov.
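Condition 4–16 is mechanical to check: no state outside a collapsed block may enter it in one step. A small sketch (the function name and the matrices are our own illustrations, not from the text):

```python
import numpy as np

def satisfies_4_16(P_seq, blocks):
    """True iff P_n(k, S_i) = 0 for every n, every block S_i, and every k outside S_i."""
    for P in P_seq:
        for block in blocks:
            outside = [k for k in range(P.shape[0]) if k not in block]
            # P_n(k, S_i) is the sum of row k over the columns in S_i.
            if np.any(P[np.ix_(outside, block)].sum(axis=1) > 0):
                return False
    return True

# With S_1 = {0, 1} (0-indexed) closed to one-step entry from outside,
# the condition holds; letting state 2 enter S_1 breaks it.
P_good = np.array([[0.5, 0.5, 0.0],
                   [0.3, 0.7, 0.0],
                   [0.0, 0.0, 1.0]])
P_bad = np.array([[0.5, 0.5, 0.0],
                  [0.3, 0.7, 0.0],
                  [0.2, 0.0, 0.8]])
assert satisfies_4_16([P_good], [[0, 1]])
assert not satisfies_4_16([P_bad], [[0, 1]])
```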

(ii) Suppose that Y(n) is Markov. Here each of S_{v_1} and S_{v_2} is either some S_k, 1 ≤ k ≤ r, or is one of the singletons in A. Then for a given j, 1 ≤ j ≤ r, we have for n ≥ 1:

P(X(n) ∈ S_j | X(n−1) ∈ S_{v_1}) P(X(n+1) ∈ S_{v_2} | X(n) ∈ S_j)
= P(Y(n) = j | Y(n−1) = v_1) P(Y(n+1) = v_2 | Y(n) = j)
= P(Y(n) = j | Y(n−1) = v_1) P(Y(n+1) = v_2 | Y(n) = j, Y(n−1) = v_1)
= P(Y(n+1) = v_2, Y(n) = j | Y(n−1) = v_1)
= ∑_{l∈S_j} P(X(n+1) ∈ S_{v_2}, X(n) = l | X(n−1) ∈ S_{v_1})
= ∑_{l∈S_j} P(X(n+1) ∈ S_{v_2} | X(n) = l) P(X(n) = l | X(n−1) ∈ S_{v_1}).

(iii) We assume Equation 4–18 holds. We show that Y(n) must then be Markov for all possible initial distributions. Let i_0, i_1, ..., i_n be any (n+1) states for the Y(n) chain. Again, as before,


when i_{n−1} is a singleton in A, it is easy to check that

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= P(X(n) ∈ S_{i_n} | X(n−1) = i_{n−1}, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})
= P(X(n) ∈ S_{i_n} | X(n−1) = i_{n−1}) = P(Y(n) = i_n | Y(n−1) = i_{n−1}).

When i_{n−1} ∉ A, then i_{n−1} = j for some j, 1 ≤ j ≤ r. In this case we have:

P(Y(n) = i_n, Y(n−1) = j, Y(n−2) = i_{n−2}, ..., Y(0) = i_0)
= P(X(n) ∈ S_{i_n}, X(n−1) ∈ S_j, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0})
= ∑_{k∈S_{i_{n−2}}} ∑_{l∈S_j} P(X(n) ∈ S_{i_n} | X(n−1) = l) P(X(n−1) = l | X(n−2) = k) × P(X(n−2) = k, X(n−3) ∈ S_{i_{n−3}}, ..., X(0) ∈ S_{i_0})
= ∑_{k∈S_{i_{n−2}}} P(X(n) ∈ S_{i_n} | X(n−1) ∈ S_j) P(X(n−1) ∈ S_j | X(n−2) = k) × P(X(n−2) = k, X(n−3) ∈ S_{i_{n−3}}, ..., X(0) ∈ S_{i_0}),

where we have used Equation 4–18 and the Markov property of X(n). Notice that the last expression above is equal to P(X(n) ∈ S_{i_n} | X(n−1) ∈ S_j) P(X(n−1) ∈ S_j, X(n−2) ∈ S_{i_{n−2}}, ..., X(0) ∈ S_{i_0}). It follows that P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n−1) = i_{n−1}). Thus, the Y(n) chain is Markov.

Let us now justify the final assertions in (iii). First, note that when X(n) is a reversible NHMC with respect to a given (left invariant) initial distribution, it follows from Theorem 2 that when Y(n) is Markov, the sufficient condition of Theorem 1 must hold, and this sufficient condition immediately implies Equation 4–18. Finally, we remark that for the Markov chain X(n) considered in our Example 2, the Y(n) chain corresponding to the partition {1}, {2, 3} is Markov. However, it can be easily verified that Equation 4–18 does not hold in this case.


Now we would like to present an alternative (inductive) proof of the main result above, and in the process we will address the case in which there are only two sets that collapse more than one state of the original Markov chain (this is interesting enough in its own right). For convenience, we restate the main result here in a slightly different form:

Let X(n) be a non homogeneous Markov chain (NHMC) with a finite state space S. Suppose there are r pairwise disjoint subsets S_1, S_2, ..., S_r of S such that they are the only subsets of S collapsing more than one state. Consider the following conditions:

(1) ∑_{l∈S_i} P_n(k, l) P_{n+1}(l, u) = P_n(k, S_i) C^{(i)}_{n+1}(u), where C^{(i)}_{n+1}(u) = P(X(n+1) = u | X(n) ∈ S_i), i = 1, 2, ..., r;

(2) For k ∉ S_i, n ≥ 1, P_n(k, S_i) = 0, i = 1, 2, ..., r.

Then when Y(n) is Markov, for each quadruple (k, n, i, u), where k ∉ ∪_{j=1}^r S_j, n ≥ 1, 1 ≤ i ≤ r, and u ∉ ∪_{j=1}^r S_j, either Condition 1 or Condition 2 holds. Conversely, if Condition 2 holds, or if for all k and for u ∉ ∪_{i=1}^r S_i Condition 1 holds, then Y(n) is Markov.

Proof. We prove this in several steps; first we prove the case r = 2 of the result stated above.

Step 1: Here we assume that the Y(n) chain is Markov and show that for each triple (k, n, S_1) and (k, n, S_2), k ∉ S_1 ∪ S_2, n ≥ 1, we must have either P_n(k, S_i) = 0 or

∑_{l∈S_i} P_n(k, l) P_{n+1}(l, u) = P_n(k, S_i) C^i_{n+1}(u), (4–19)


where C^i_{n+1}(u) = P(X(n+1) = u | X(n) ∈ S_i), i = 1, 2, and u ∉ S_1 ∪ S_2. Let us now assume that P_n(k, S_1) > 0. Then we have, for u ∉ S_1 ∪ S_2:

∑_{l∈S_1} P_n(k, l) P_{n+1}(l, u) / P_n(k, S_1)
= P(X(n−1) = k) ∑_{l∈S_1} P_n(k, l) P_{n+1}(l, u) / [P(X(n−1) = k) P_n(k, S_1)]
= ∑_{l∈S_1} P(X(n+1) = u | X(n) = l) P(X(n) = l, X(n−1) = k) / P(X(n) ∈ S_1, X(n−1) = k)
= ∑_{l∈S_1} P(X(n+1) = u | X(n) = l, X(n−1) = k) P(X(n) = l, X(n−1) = k) / P(X(n) ∈ S_1, X(n−1) = k)
= P(X(n+1) = u | X(n) ∈ S_1, X(n−1) = k)
= P(X(n+1) = u | X(n) ∈ S_1) = C^1_{n+1}(u).

Equation 4–19 follows for S_1. Similarly, when P_n(k, S_2) > 0, Equation 4–19 follows for S_2. This completes Step 1.

Step 2: We now assume that Condition 2 holds. In this step we show that the Y(n) chain must then be Markov. Let us consider the states i_0, ..., i_n of the Y(n) chain. Often, while switching from Y(n) to X(n), we will use X(n) ∈ i_n to mean Y(n) = i_n, when i_n may not be a single state. Suppose that i_k = S_1 only when k = 0, and for k > 0, i_k is a singleton. Then we have:

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= [∑_{α∈S_1} P(X(n) = i_n | X(n−1) = i_{n−1}, ..., X(0) = α) P(X(n−1) = i_{n−1}, ..., X(0) = α)] / [∑_{α∈S_1} P(X(n−1) = i_{n−1}, ..., X(0) = α)]
= P(X(n) = i_n | X(n−1) = i_{n−1})
= P(Y(n) = i_n | Y(n−1) = i_{n−1}).

The situation is similar when i_k = S_2 only when k = 0. Now let us consider the possibility m = max{k | 0 ≤ k ≤ n, i_k = S_1 or S_2} > 0. If i_n = S_2, then all the states i_{n−1}, i_{n−2}, ..., i_0 are necessarily equal to S_2, as otherwise we shall have P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = 0 = P(Y(n) = i_n | Y(n−1) = i_{n−1}). Also, since P(Y(n) = S_2, Y(n−1) = S_2, ..., Y(1) = S_2, Y(0) ∉ S_2) = 0 = P(Y(n−1) = S_2, ..., Y(1) = S_2, Y(0) ∉ S_2), we have, when i_n = S_2, the following:

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(Y(n) = S_2 | Y(n−1) = S_2, ..., Y(0) = S_2)
= P(Y(n) = S_2 | Y(n−1) = S_2, ..., Y(1) = S_2)
= P(Y(n) = S_2 | Y(n−1) = S_2) = P(Y(n) = i_n | Y(n−1) = i_{n−1}).

The situation is similar when i_n = S_1.

In case m = n−1 and i_{n−1} = S_2, again, as before, i_0 = i_1 = ... = i_{n−2} = S_2, as otherwise P(Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = 0. Thus it follows, as before, that P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n−1) = i_{n−1}). Finally, let us consider the possibility 0 < m < n−1 and i_m = S_2. In this case, as before, we have necessarily i_0 = i_1 = ... = i_{m−1} = S_2, and i_{n−1} is either S_2 or a singleton element not in S_1 ∪ S_2. Thus, in this case, when i_m = i_{n−1} = S_2, as before, we have:

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(m) = i_m, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n−1) = i_{n−1}).

In case i_m = S_2 and i_{n−1} ∉ S_1 ∪ S_2, i_{n−1} is a singleton, and so is each i_k, m < k < n−1, and as before,

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(m) = i_m, ..., Y(0) = i_0)
= P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(m+1) = i_{m+1}, Y(m) = S_2, ..., Y(0) = S_2)
= P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(m+1) = i_{m+1})
= P(X(n) ∈ i_n | X(n−1) = i_{n−1}, ..., X(m+1) = i_{m+1}) = P(X(n) ∈ i_n | X(n−1) = i_{n−1})
= P(Y(n) = i_n | Y(n−1) = i_{n−1}),

using the fact that X(n) is Markov and that each of the elements i_{m+1}, ..., i_{n−1} is a singleton. Step 2 is now complete.

Step 3: In this step we assume that Condition 1 holds for all k and n, and show that then Y(n) must be Markov. We consider again the states i_0, i_1, ..., i_n of the Y(n) chain. The state i_k of Y(n) may be either S_1 or S_2, or simply a singleton element not in S_1 ∪ S_2. Thus i_k = {i_k} when it is a singleton, and i_k denotes the set S_1 or S_2 otherwise. We have the following possibilities:

(i) i_{n−1} = S_j, i_n ≠ S_j, j = 1, 2; (ii) i_n = S_j, i_{n−1} ≠ S_j, j = 1, 2 (note that i_n in (i) and i_{n−1} in (ii) are both singleton elements, as no transition is possible from S_1 to S_2); (iii) i_n = i_{n−1} = S_j, j = 1, 2. Let us now assume (i), with i_{n−1} = S_1 and i_n a singleton not in S_1 ∪ S_2. Then we have:

P(Y(n) = i_n, Y(n−1) = S_1, Y(n−2) = i_{n−2}, ..., Y(0) = i_0)
= P(X(n) = i_n, X(n−1) ∈ S_1, X(n−2) ∈ i_{n−2}, ..., X(0) ∈ i_0)
= ∑_{x_{n−2}∈i_{n−2}} ∑_{x_{n−1}∈S_1} P_{n−1}(x_{n−2}, x_{n−1}) P_n(x_{n−1}, i_n) P(X(n−2) = x_{n−2}, ..., X(0) ∈ i_0)
= C^1_n(i_n) ∑_{x_{n−2}∈i_{n−2}} P_{n−1}(x_{n−2}, S_1) P(X(n−2) = x_{n−2}, ..., X(0) ∈ i_0) (using Condition 1)
= C^1_n(i_n) P(X(n−1) ∈ S_1, X(n−2) ∈ i_{n−2}, ..., X(0) ∈ i_0).

Thus it is clear that

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = C^1_n(i_n) = P(X(n) = i_n | X(n−1) ∈ S_1)
= P(Y(n) = i_n | Y(n−1) = i_{n−1}).

Now let us observe that case (ii) is simple, and here we do not have to use Condition 1 to prove the Markov property of Y(n).

The last case is similar to case (i). Using Condition 1, as in case (i), we can show here that

P(Y(n) = S_1, Y(n−1) = S_1, Y(n−2) = i_{n−2}, ..., Y(0) = i_0)
= ∑_{x_n∈S_1} C^1_n(x_n) P(X(n−1) ∈ S_1, X(n−2) ∈ i_{n−2}, ..., X(0) ∈ i_0).

It follows that

P(Y(n) = S_1 | Y(n−1) = S_1, Y(n−2) = i_{n−2}, ..., Y(0) = i_0) = ∑_{x_n∈S_1} C^1_n(x_n) = P(X(n) ∈ S_1 | X(n−1) ∈ S_1)
= P(Y(n) = S_1 | Y(n−1) = S_1).

Step 3 is thus taken care of. This concludes the proof.

Now we are going to use induction.


Let r be a positive integer and X(n), n ≥ 0, be a NHMC with finite state space S. Let S_1, S_2, ..., S_r be r pairwise disjoint subsets of S such that each of these contains more than one state. Define the chain Z(n) such that

Z(n) = i iff X(n) ∈ S_i, 1 ≤ i ≤ r;
Z(n) = x iff X(n) = x, when x ∉ ∪_{i=1}^r S_i.

Note that the main result holds for r = 2. We assume r ≥ 2 and that the main result holds for all chains Y(n) which collapse p (≤ r) pairwise disjoint subsets of S, each containing more than one state. To prove the result by induction, we need to prove it for a chain Y(n) such that

Y(n) = i iff X(n) ∈ S_i, 1 ≤ i ≤ r + 1;
Y(n) = x iff X(n) = x, when x ∉ ∪_{i=1}^{r+1} S_i,

where S_1, S_2, ..., S_{r+1} are pairwise disjoint subsets of S. To this end, let us assume first that the Y(n) chain is Markov. We must prove that either Condition 2 or Condition 1 must then hold, for all k and u in (∪_{i=1}^{r+1} S_i)^c. Let k ∉ ∪_{i=1}^{r+1} S_i. If for this k and for some n ≥ 1, P_n(k, S_i) > 0 for some i, 1 ≤ i ≤ r + 1, then just as in Step 1 of the above result, we can show that Condition 1 holds for each quadruple (k, n, i, u) as required. We do not need to use induction for this part.

Conversely, let us assume that Condition 2 holds. We need to show that the Y(n) chain is Markov. Let i_0, i_1, ..., i_n be (n + 1) states of the Y(n) chain. We assume that i_0 = S_{r+1} and, for 0 < j ≤ n, i_j ∩ S_{r+1} = ∅. By the induction hypothesis, the Z(n) chain (defined earlier) is Markov. Thus we can write:

P(Y(n) = i_n, Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= ∑_{x_0∈S_{r+1}} P(Z(n) = i_n, Z(n−1) = i_{n−1}, ..., Z(0) = x_0)
= P(Y(n) = i_n | Y(n−1) = i_{n−1}) P(Y(n−1) = i_{n−1}, ..., Y(0) = i_0).

Now let m = max{j | 0 ≤ j ≤ n, i_j = S_{r+1}} > 0. If i_n = S_{r+1} and i_{n−1} ∩ S_{r+1} = ∅, then we have P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n−1) = i_{n−1}) = 0.


If i_n = i_{n−1} = S_{r+1}, then we must have, by Condition 2, i_0 = i_1 = ... = i_{n−1} = S_{r+1}; otherwise P(Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = 0. Thus, when i_n = i_{n−1} = S_{r+1},

P(Y(n) = S_{r+1}, Y(n−1) = S_{r+1}, ..., Y(0) = S_{r+1})
= P(Y(n) = S_{r+1}, Y(n−1) = S_{r+1}, ..., Y(1) = S_{r+1})
= P(Y(n) = S_{r+1}, Y(n−1) = S_{r+1}, ..., Y(2) = S_{r+1}) = ... = P(Y(n) = S_{r+1}),

since

P(Y(n) = S_{r+1}, Y(n−1) = S_{r+1}, ..., Y(n−s+1) = S_{r+1})
= P(Y(n) = S_{r+1}, ..., Y(n−s+1) = S_{r+1}, Y(n−s) = S_{r+1}) + P(Y(n) = S_{r+1}, ..., Y(n−s+1) = S_{r+1}, Y(n−s) ∈ S^c_{r+1})
= P(Y(n) = S_{r+1}, ..., Y(n−s) = S_{r+1}),

the second term being zero by Condition 2. Now suppose n−1 = m, so that i_{n−1} = S_{r+1} and i_n ∩ S_{r+1} = ∅. Then, as before, as no transition is possible from S_i to S_j (i ≠ j), we must have i_0 = i_1 = ... = i_{n−1} = S_{r+1}, and, as before, P(Y(n) = i_n | Y(n−1) = S_{r+1}, ..., Y(0) = S_{r+1}) = P(Y(n) = i_n | Y(n−1) = i_{n−1}). Finally,

we assume that 0 < m < n − 1. Then again we must have i_0 = i_1 = ... = i_m = S_{r+1}. It is also clear, as we showed earlier, that

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(m+1) = i_{m+1}, Y(m) = S_{r+1}, ..., Y(0) = S_{r+1})
= P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(m+1) = i_{m+1}),

where i_{m+1} can only be either S_{r+1} or a singleton not in ∪_{i=1}^{r+1} S_i. If i_{m+1} = S_{r+1}, then, continuing the same reasoning, we must have i_{n−1} equal to either S_{r+1} or a singleton not in ∪_{i=1}^{r+1} S_i. Thus it is no loss of generality to consider each i_j, 0 ≤ j ≤ n, as a singleton not in ∪_{i=1}^{r+1} S_i. Then the Markov property follows immediately if we replace Y(j) by Z(j), 0 ≤ j ≤ n, and observe that the Z(n) chain is Markov by the induction hypothesis.

Finally, we assume that Condition 1 holds for the Y(n) chain, and then we need to show that the Y(n) chain is Markov. By the induction hypothesis, the Z(n) chain is Markov. Let i_0, i_1, ..., i_n be (n + 1) states of the Y(n) chain. Then each i_j is either a set S_i, 1 ≤ i ≤ r + 1, or just a singleton element not in ∪_{m=1}^{r+1} S_m. There are five possibilities: (i) i_{n−1} = S_t, i_n = S_j, t ≠ j, 1 ≤ t ≤ r + 1, 1 ≤ j ≤ r + 1; (ii) i_{n−1} ∉ ∪_{t=1}^{r+1} S_t, i_n = S_j, 1 ≤ j ≤ r + 1; (iii) i_{n−1} = i_n = S_j, 1 ≤ j ≤ r + 1; (iv) i_{n−1} = S_j, 1 ≤ j ≤ r + 1, i_n ∉ ∪_{t=1}^{r+1} S_t; (v) i_{n−1} ∉ ∪_{t=1}^{r+1} S_t, i_n ∉ ∪_{t=1}^{r+1} S_t.


Note that we must prove: P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0) = P(Y(n) = i_n | Y(n−1) = i_{n−1}).

When (v) occurs, this holds since then, on both sides of the above equation, Y(m) can be replaced by Z(m), 0 ≤ m ≤ n, and Z(n) is Markov by the induction hypothesis. Also, both sides are equal to zero when (i) occurs, since, by hypothesis in the statement of the main result, transition from S_i to S_j is not possible when i ≠ j.

We now assume (ii). In this case, i_{n−1} is a singleton and i_n = S_j for some j, 1 ≤ j ≤ r + 1. Thus we can write:

P(Y(n) = i_n | Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= P(X(n) ∈ S_j | X(n−1) = i_{n−1}) P(Y(n−1) = i_{n−1}, ..., Y(0) = i_0) / P(Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= P(Y(n) = i_n | Y(n−1) = i_{n−1}).

Let us now assume (iii). Let i_{n−1} = i_n = S_j, 1 ≤ j ≤ r + 1. If 1 ≤ j ≤ r, then, replacing each Y(m) by Z(m) and noting that the Z(n) chain is Markov, it is easy to see that

P(Z(n) = i_n, Z(n−1) = i_{n−1}, ..., Z(0) = i_0)
= P(Z(n) = j | Z(n−1) = j) P(Z(n−1) = i_{n−1}, ..., Z(0) = i_0),

and this implies that the Markov property holds.


Let us now assume that i_{n−1} = i_n = S_{r+1}. In this case we need to use Condition 1. Note that

P(Y(n) = i_n, Y(n−1) = i_{n−1}, ..., Y(0) = i_0)
= ∑_{x_{n−2}∈i_{n−2}} ∑_{x_{n−1}∈S_{r+1}} P_{n−1}(x_{n−2}, x_{n−1}) P_n(x_{n−1}, S_{r+1}) × P(X(n−2) = x_{n−2}, X(n−3) ∈ i_{n−3}, ..., X(0) ∈ i_0)
= P(X(n) ∈ S_{r+1} | X(n−1) ∈ S_{r+1}) × ∑_{x_{n−2}∈i_{n−2}} P_{n−1}(x_{n−2}, S_{r+1}) P(X(n−2) = x_{n−2}, X(n−3) ∈ i_{n−3}, ..., X(0) ∈ i_0)
= P(Y(n) = i_n | Y(n−1) = i_{n−1}) P(Y(n−1) = i_{n−1}, Y(n−2) = i_{n−2}, ..., Y(0) = i_0).

This implies that Y(n) is Markov. Case (iv) can be taken care of in the same way as (iii).

Theorem 4

Let X(n), n ≥ 0, be a reversible Markov chain with finite state space S, let p = (p_1, p_2, ..., p_m) be its initial distribution vector (with each p_i > 0), and suppose that for each n ≥ 1, n odd, P_n = P_{n+1}. Then for any function f : S → S, the collapsed chain Y(n) ≡ f(X(n)) is Markov if and only if for each n ≥ 1,

(1) P_n = α_n I + (1 − α_n) U,

where |α_n| ≤ 1 (α_n possibly negative), and U has all its rows identical and equal to p. Also, any NHMC having the Form 1 is reversible.

Proof. Suppose first that S has only two elements. When P_n ≠ I, choose α_n = (P_n)_{11} − (P_n)_{21} = (P_n)_{22} − (P_n)_{12}, and define U_n by (U_n)_{11} = (U_n)_{21} = (P_n)_{21}/(1 − α_n), (U_n)_{12} = (U_n)_{22} = (P_n)_{12}/(1 − α_n); choose α_n = 1 when P_n = I. Then for n ≥ 1, P_n = α_n I + (1 − α_n) U_n, |α_n| ≤ 1, where U_n is a stochastic matrix of rank 1. Since X(n) is reversible (by assumption), p is left invariant, and as such, for each n ≥ 1, p = pP_n = α_n p + (1 − α_n) pU_n = α_n p + (1 − α_n) U^{(1)}_n, or p = U^{(1)}_n, where U^{(1)}_n is the first row of U_n. Thus U_n is independent of n, and for n ≥ 1, P_n = α_n I + (1 − α_n) U, |α_n| ≤ 1, where each row of U is p.

Now we assume X(n) is reversible and S has three or more elements (that is, m ≥ 3). In this case, whenever f is a function from S into S, by Theorem 2 the chain Y(n) ≡ f(X(n)) is Markov if for each n ≥ 1 and three distinct states i, j and k, (P_n)_{ik} = (P_n)_{jk} and (P_n)_{ii} + (P_n)_{ij} = (P_n)_{ji} + (P_n)_{jj} whenever f(i) = f(j) and f^{-1}(f(k)) = {k}. In other words, for each n ≥ 1, each column of P_n has all of its non-diagonal entries equal, and furthermore, for any two columns of P_n (say the ith and the jth), the differences (P_n)_{ii} − (P_n)_{ji} and (P_n)_{jj} − (P_n)_{ij} are the same. Thus, as in the m = 2 case, we can again choose α_n = (P_n)_{ii} − (P_n)_{ji} (so that α_n = 1 if and only if P_n = I), and write

P_n = α_n I + (1 − α_n) U_n,

where |α_n| ≤ 1, α_n = (P_n)_{ii} − (P_n)_{ji}, and (U_n)_{ii} = (U_n)_{ji} = (P_n)_{ji}/(1 − α_n) for all i, j (i ≠ j) in S. Since pP_n = p for each n ≥ 1, as in the m = 2 case, each row of U_n is equal to p, and U_n = U (a rank-one stochastic matrix each of whose rows equals p).

The "converse" part can be easily verified.

The following example shows that the conclusion of Theorem 4 may not hold for a

non-reversible NHMC X (n) with three states for which Y (n) ≡ f (X (n)) is Markov for any

function f from the state space to itself.

Example 3. Let us consider the NHMC X(n), n ≥ 1, where the initial distribution is uniform and the P_n's are given by:

P_1 = U = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, P_2 = \begin{pmatrix} 1/2 & 1/3 & 1/6 \\ 1/6 & 1/2 & 1/3 \\ 1/3 & 1/6 & 1/2 \end{pmatrix},

and P_n = α_n I + (1 − α_n) U, n ≥ 3, where −1/2 < α_n < 1 for all n. Note that here the left invariant initial distribution vector is (1/3, 1/3, 1/3), and the reversibility condition fails for P_2, since P_2 is bi-stochastic but not symmetric. However, it can be easily verified that the collapsed chain Y(n) ≡ f(X(n)) is Markov for any function f from the state space of X(n) to itself.
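Two ingredients of Example 3 are easy to verify numerically: matrices of the form α_nI + (1 − α_n)U are symmetric (hence reversible for the uniform vector) and remain stochastic for −1/2 < α_n < 1, while P_2 is bi-stochastic but not symmetric. The α values sampled below are illustrative choices of ours:

```python
import numpy as np

m = 3
U = np.full((m, m), 1.0 / m)
for a in (-0.49, 0.0, 0.5, 0.99):                 # sample values in (-1/2, 1)
    Pn = a * np.eye(m) + (1 - a) * U
    assert np.all(Pn >= 0) and np.allclose(Pn.sum(axis=1), 1)  # stochastic
    assert np.allclose(Pn, Pn.T)   # symmetric: reversible w.r.t. the uniform vector

P2 = np.array([[1/2, 1/3, 1/6],
               [1/6, 1/2, 1/3],
               [1/3, 1/6, 1/2]])
assert np.allclose(P2.sum(axis=0), 1) and np.allclose(P2.sum(axis=1), 1)  # bi-stochastic
assert not np.allclose(P2, P2.T)   # but not symmetric: reversibility fails for P_2
```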


4.2 Lumpability

Strong Lumpability: We say that a Markov chain is strongly lumpable with respect to a partition A = {A_1, A_2, ..., A_r} of the chain's state space if for every starting vector π the collapsed chain corresponding to this partition is a Markov chain and its transition probabilities do not depend upon the choice of π.

Weak Lumpability: A Markov chain is weakly lumpable with respect to a certain partition whenever the Markov property of the corresponding collapsed chain depends upon the choice of the initial vector π. That is, in this case the collapsed chain is Markov only when certain particular initial vectors (possibly even just one) are chosen.

Theorem 1

Let X(n) be a non homogeneous Markov chain with

1) state space S = {1, 2, ..., m};

2) symmetric transition probability matrices P_n, with P(X(n) = j | X(n−1) = i) = P_n(i, j) ∀ i, j ∈ S, n = 1, 2, ...;

3) P_n = P_{n+1} for every odd n.

Let S_1, S_2, ..., S_r be a partition of S, r ≤ m. We define Y(n) = i if and only if X(n) ∈ S_i. Then weak lumpability of X(n) with respect to the uniform initial probability vector (P(X(0) = i) = 1/m) implies strong lumpability.

Proof. Since X(n) is weakly lumpable with respect to the uniform initial probability distribution, from Theorem 2 we have

Q_n Q_{n+1} = A P_n P_{n+1} B = A P_n B A P_{n+1} B ⟹ A P_n (I − BA) P_{n+1} B = 0.

Therefore, arguing as in the proof of Theorem 2 (using the symmetry of the P_n and the fact that P_n = P_{n+1} for odd n),

P_n B = B A P_n B. (4–20)


Now,

(P_n B)_{ij} = ∑_{k∈S_j} P_n(i, k). (4–21)

Again, for i ∈ S_s,

(B A P_n B)_{ij} = Q_n(s, j). (4–22)

Hence from the above three equations we have ∑_{k∈S_j} P_n(i, k) = Q_n(s, j) for all n and all i ∈ S_s, which is exactly the sufficient condition from Theorem 1 for X(n) to be strongly lumpable.

Here we define reversibility in a slightly different manner (as per Kemeny and Snell). Let π be a left invariant probability vector and X(n) be a non-homogeneous Markov chain. We say that X(n) is a reversible Markov chain if

π_i P(X(n+1) = j | X(n) = i) = π_j P(X(n+1) = i | X(n) = j).

Suppose π is the initial probability vector. Then we observe that

π_i P(X(n+1) = j, X(n) = i) / P(X(n) = i) = π_j P(X(n+1) = i, X(n) = j) / P(X(n) = j)

⟹ P(X(n+1) = j, X(n) = i) = P(X(n+1) = i, X(n) = j)

⟹ P(X(n+1) = j, X(n) = i) / P(X(n) = i) = P(X(n) = j, X(n+1) = i) / P(X(n+1) = i)

⟹ P(X(n+1) = j | X(n) = i) = P(X(n) = j | X(n+1) = i).

Here we used the fact that P(X(n) = i) = π_n(i) = π_{n+1}(i) = P(X(n+1) = i) because of the left invariance of π.
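The detailed-balance condition above is easy to verify numerically: the "flow" matrix F with F[i, j] = π_i P(i, j) must be symmetric. A minimal sketch (the example matrix is ours):

```python
import numpy as np

def is_reversible(P, pi, tol=1e-12):
    """Detailed balance: pi_i * P[i, j] == pi_j * P[j, i] for all i, j."""
    F = pi[:, None] * P          # flow matrix, F[i, j] = pi_i P(i, j)
    return bool(np.max(np.abs(F - F.T)) < tol)

# Any symmetric stochastic matrix is reversible w.r.t. the uniform vector,
# which is also left invariant for it.
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
pi = np.full(3, 1 / 3)
rev = is_reversible(P, pi)   # True
```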

Theorem 2

Let X(n) be a reversible non-homogeneous Markov chain with state space S = {1, 2, ..., m} and uniform initial probability distribution, i.e. P(X(0) = i) = 1/m for all i ∈ S. Let S_1, S_2, ..., S_r be a partition of S, r ≤ m. Then the lumped chain with respect to this partition is also reversible. (We assume here that the collapsed chain is Markov.)


Proof. By reversibility we have P_n = D_n P_n^T D_n^{-1} for all n, where (D_n)_{ii} = 1/m for all i and n.

From Theorem 2, we know that the transition probability matrix of Y(n) is of the form Q_n = A P_n B for all n. Thus Q_n = A D_n P_n^T D_n^{-1} B.

We define an r × r diagonal matrix D̃_n by (D̃_n)_{ii} = (D_n)_{jj} / |S_i| for j ∈ S_i. Then we have

(D̃_n B^T)_{ij} = (D̃_n)_{ii} B^T_{ij} = (D̃_n)_{ii} B_{ji} = (D̃_n)_{ii} for j ∈ S_i = (D_n)_{jj} / |S_i| for j ∈ S_i.

Again,

(A D_n)_{ij} = A_{ij} (D_n)_{jj} = (1 / |S_i|) (D_n)_{jj} for j ∈ S_i.

Hence A D_n = D̃_n B^T.

Again we observe that

(D_n^{-1} B)_{ij} = (D_n^{-1})_{ii} B_{ij} = 1 / (D_n)_{ii} for i ∈ S_j.

And, using the fact that all diagonal entries of D_n are equal,

(A^T D̃_n^{-1})_{ij} = A^T_{ij} (D̃_n^{-1})_{jj} = A_{ji} (D̃_n^{-1})_{jj} = (1 / |S_j|) (D̃_n^{-1})_{jj} for i ∈ S_j

= (1 / |S_j|) · |S_j| / (D_n)_{ii} for i ∈ S_j = 1 / (D_n)_{ii} for i ∈ S_j.

Hence D_n^{-1} B = A^T D̃_n^{-1}. Therefore we have

Q_n = D̃_n B^T P_n^T A^T D̃_n^{-1} = D̃_n (A P_n B)^T D̃_n^{-1} = D̃_n Q_n^T D̃_n^{-1}.

Hence the lumped chain is also reversible.
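The conclusion can be checked on a small example (matrix and partition are ours): take a symmetric 3-state P, collapse with respect to {1}, {2, 3}, and verify Q = D̃ Q^T D̃^{-1}. The diagonal matrix below matches the theorem's D̃_n up to the constant factor 1/m, which cancels in the conjugation.

```python
import numpy as np

# Symmetric P (reversible w.r.t. the uniform vector);
# partition S_1 = {0}, S_2 = {1, 2} in 0-based indexing.
P = np.array([[0.5, 0.2, 0.3],
              [0.2, 0.6, 0.2],
              [0.3, 0.2, 0.5]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
sizes = B.sum(axis=0)                 # block sizes |S_1|, |S_2|
A = (B / sizes).T                     # A_ij = 1/|S_i| for j in S_i, else 0
Q = A @ P @ B                         # collapsed transition matrix

Dt = np.diag(1.0 / sizes)             # theorem's D-tilde, up to a factor 1/m
rev_ok = bool(np.allclose(Q, Dt @ Q.T @ np.linalg.inv(Dt)))
```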


Reverse Markov Chain

A Markov chain observed in the reverse order is also Markov because of the following:

P(X(n−1) = i_{n−1} | X(n) = i_n, X(n+1) = i_{n+1}, ..., X(n+p) = i_{n+p})

= P(X(n+p) = i_{n+p}, ..., X(n−1) = i_{n−1}) / P(X(n+p) = i_{n+p}, ..., X(n) = i_n)

= [P(X(n+p) = i_{n+p} | X(n+p−1) = i_{n+p−1}) ⋯ P(X(n) = i_n | X(n−1) = i_{n−1}) P(X(n−1) = i_{n−1})] / [P(X(n+p) = i_{n+p} | X(n+p−1) = i_{n+p−1}) ⋯ P(X(n+1) = i_{n+1} | X(n) = i_n) P(X(n) = i_n)]

= P(X(n) = i_n, X(n−1) = i_{n−1}) / P(X(n) = i_n)

= P(X(n−1) = i_{n−1} | X(n) = i_n).
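Concretely, for a chain started in a left invariant vector π, the reversed chain has transition matrix P̂(i, j) = π_j P(j, i) / π_i, the standard construction. A small sketch (the example matrix is ours), using a deterministic 3-cycle for which the reversal is simply the transpose:

```python
import numpy as np

# Deterministic 3-cycle; the uniform vector is left invariant for it.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
pi = np.full(3, 1 / 3)

# Reversed kernel: Phat[i, j] = pi[j] * P[j, i] / pi[i].
Phat = (pi[None, :] * P.T) / pi[:, None]
# Phat is again stochastic, so the reversed process is Markov; here it
# is the cycle run backwards, i.e. Phat equals P.T.
```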

Theorem 3

If a given non-homogeneous Markov chain is weakly lumpable with respect to a partition A = {A_1, ..., A_n}, then so is the reverse chain.

Proof. Let X(n) be a non-homogeneous Markov chain which is weakly lumpable with respect to the partition A = {A_1, ..., A_n}. We must prove that all probabilities of the form P_β(X(1) ∈ A_i | X(2) ∈ A_j, ..., X(n) ∈ A_t) depend only upon A_i and A_j, where β is an initial vector with respect to which the collapsed chain is Markov. Indeed,

P(Y(1) = i | Y(2) = j, Y(3) = h, ..., Y(n) = t)

= P(Y(1) = i | Y(2) = j)

= P_β(X(1) ∈ A_i | X(2) ∈ A_j),

where the first equality follows from the above discussion of reverse Markov chains (under β the collapsed chain Y is Markov, hence so is its reversal) and the second is the definition of Y. Hence the theorem follows.


4.3 Application

In the third chapter we mentioned the PageRank example as an application of collapsed Markov chains (see (2)). Here we present the idea in detail.

Definition: Given a stochastic matrix P ∈ R^{n×p}, the relation P = DK is called a stochastic factorization of P if D ∈ R^{n×m} and K ∈ R^{m×p} are also stochastic matrices. The integer m > 0 is the order of the factorization.

It is well known that the stationary distribution gives us valuable information about a Markov chain. Therefore, in many practical applications, computation of the stationary distribution becomes very important. The computational cost depends directly on the number of states of the Markov chain, and as n approaches infinity, the number of arithmetic operations necessary to compute the stationary distribution becomes prohibitive. One way to overcome this obstacle is to use stochastic factorization to reduce the number of states in a Markov chain. This way, one can find the stationary distribution of the reduced version of the chain and then recover the corresponding distribution of the original model. Now we explain how this is done.

Proposition 1: Let P ∈ R^{n×n} be a transition probability matrix and let P = DK be a stochastic factorization of order m. Let P′ = KD be the m × m transition matrix resulting from the inversion of the factorization terms. Then

(i) P and P′ have the same non-zero eigenvalues (counting multiplicity), and

(ii) if x′ ∈ R^{1×m} is a left eigenvector of P′ with eigenvalue λ ≠ 0, then x′K is a left eigenvector of P associated with the same eigenvalue.

Proof. (i) From matrix analysis, we know that if A ∈ R^{n×m} and B ∈ R^{m×n}, then the spectra of AB and BA can differ only by zeros.

(ii) Let λ′ be a nonzero eigenvalue of P′. Then there exists a vector x′ ≠ 0 in R^{1×m} such that λ′x′ = x′P′ = x′KD. Right multiplying both sides by K, we have λ′x′K = x′KDK = x′KP.


We observe that x′K ∈ R^{1×n} cannot be the vector 0 ∈ R^{1×n}, since this would imply x′KD = x′P′ = λ′x′ = 0 ∈ R^{1×m}, which clearly contradicts the assumptions that λ′ ≠ 0 and x′ ≠ 0. Hence, x′K is a left eigenvector of P.

Thus, as a consequence of Proposition 1, if π′ is a stationary distribution of P′, then π′K is a stationary distribution of P. In practice, we are usually interested in computing the stationary distribution of irreducible chains, as the distribution is unique in that case. Thus an important question is whether a stochastic factorization of an irreducible transition matrix P results in a matrix P′ which is also irreducible. From (2), we know that the answer is affirmative.
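Proposition 1 suggests a simple computational recipe: factor P = DK, form the smaller matrix P′ = KD, find its stationary vector π′, and push it through K. A toy sketch (the factors D and K below are made up for illustration, not taken from the text):

```python
import numpy as np

# A toy stochastic factorization P = D @ K (order m = 2) of a 3x3 chain.
D = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])             # 3x2 stochastic
K = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])        # 2x3 stochastic
P = D @ K                              # 3x3 stochastic
Pp = K @ D                             # 2x2 "inverted" factorization

# Stationary distribution of Pp (left eigenvector for eigenvalue 1),
# pushed through K, is stationary for P by Proposition 1(ii).
w, V = np.linalg.eig(Pp.T)
pi_p = np.real(V[:, np.argmin(np.abs(w - 1))])
pi_p /= pi_p.sum()
pi = pi_p @ K
```

The eigenvalue computation is done on the m × m matrix Pp, which is the whole point when m is much smaller than n.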

Now let X(n) be a homogeneous reversible Markov chain with p_i > 0 for all i and state space S = {1, 2, ..., m}, and let S_1, S_2, ..., S_r be a partition of S. We define the collapsed chain Y(n) in the usual manner. Let P and Q be the transition probability matrices of X(n) and Y(n) respectively. Now we define matrices A and B in the following manner:

A = (B^T D B)^{-1} B^T D

and

B_{ij} = 1 if i ∈ S_j, 0 if i ∉ S_j,

where (D)_{ii} = p_i for all i.

Now we will establish a relationship between P and Q. First we write the matrix A element-wise:

A_{ik} = (B^T D B)^{-1}_{ii} B^T_{ik} D_{kk} (since B^T D B is diagonal), and

(B^T D B)_{ii} = ∑_u B^T_{iu} D_{uu} B_{ui} = ∑_{u∈S_i} D_{uu} = ∑_{u∈S_i} p_u.

Hence A_{ik} = p_k / ∑_{u∈S_i} p_u (where k ∈ S_i).

Now,

(APB)_{ij} = ∑_{k,l} A_{ik} P_{kl} B_{lj} = (∑_{k∈S_i} ∑_{l∈S_j} p_k P_{kl}) / (∑_{u∈S_i} p_u).


Again,

Q_{ij} = P(Y(n) = j | Y(n−1) = i) = ∑_{u∈S_i} P(X(n) ∈ S_j, X(n−1) = u) / ∑_{u∈S_i} P(X(n−1) = u).

Let p_n(j) = P(X(n) = j) = ∑_k P(X(n) = j | X(n−1) = k) P(X(n−1) = k). Let n = 1. Then we have

p_1(j) = ∑_k p_0(k) P(k, j) = p_0(j) (since p is left invariant for the transition matrix P).

Clearly, by induction we have p_n(j) = p_0(j) for all j and all n. Then, writing P(k, S_j) = ∑_{l∈S_j} P(k, l),

Q_{ij} = ∑_{k∈S_i} P(k, S_j) p_k / ∑_{u∈S_i} p_u = (1 / ∑_{u∈S_i} p_u) ∑_{k∈S_i} ∑_{l∈S_j} p_k P(k, l).

Hence we have Q = APB.
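The identity Q = APB can be verified numerically. In the sketch below, the symmetric "flow" matrix F is our own device for manufacturing a reversible chain: P(i, j) = F(i, j) / ∑_l F(i, l), with p_i proportional to ∑_l F(i, l).

```python
import numpy as np

# Manufacture a reversible 4-state chain from a symmetric flow matrix F.
F = np.array([[3.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0],
              [1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 2.0]])
p = F.sum(axis=1) / F.sum()                # left invariant vector
P = F / F.sum(axis=1, keepdims=True)

# Partition S_1 = {0, 1}, S_2 = {2, 3}; B_ij = 1 iff i is in S_j.
B = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
D = np.diag(p)
A = np.linalg.inv(B.T @ D @ B) @ B.T @ D   # A_ik = p_k / sum_{u in S_i} p_u
Q = A @ P @ B                              # collapsed transition matrix

# Collapsed stationary vector (block masses of p):
pQ = np.array([p[:2].sum(), p[2:].sum()])
```

Here Q is 2 × 2 and stochastic, and pQ satisfies detailed balance with respect to Q, in line with the reversibility result above.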

Here we shall prove another equality, namely Q^(2) = A P^2 B, where Q^(2) denotes the two-step transition probability matrix of Y(n):

Q^(2)_{ij} = P(Y(2) = j | Y(0) = i)

= P(X(2) ∈ S_j, X(0) ∈ S_i) / P(X(0) ∈ S_i)

= [∑_{k∈S_i} ∑_l P(X(2) ∈ S_j, X(1) = l, X(0) = k)] / [∑_{k∈S_i} P(X(0) = k)]

= (1 / ∑_{k∈S_i} p_k) ∑_{k∈S_i} ∑_l p_k P(k, l) P(l, S_j)

= [∑_{k∈S_i} ∑_l ∑_{t∈S_j} p_k P(k, l) P(l, t) B_{tj}] / ∑_{k∈S_i} p_k = (A P^2 B)_{ij}.


Let Y(n) be Markovian; then the n-step transition probability matrix Q^(n) of Y(n) must satisfy the Chapman–Kolmogorov equation. Therefore we have

Q^(2) = [Q^(1)]^2

⟹ A P^2 B = A P B A P B

⟹ A P^2 B − A P B A P B = 0

⟹ A P (I − BA) P B = 0

⟹ (B^T D B)^{-1} B^T D P (I − BA) P B = 0

⟹ B^T P^T D (I − BA) P B = 0,

where the last implication holds because of reversibility (D P = P^T D).

One can verify that D(I − BA) is positive semi-definite. Hence there exists an m × m matrix R such that D(I − BA) = R^T R. Therefore we have

B^T P^T D (I − BA) P B = 0

⟹ B^T P^T R^T R P B = 0

⟹ (R P B)^T R P B = 0

⟹ R P B = 0

⟹ R^T R P B = 0

⟹ D (I − BA) P B = 0

⟹ P B = B A P B (since D is invertible, as p_i > 0 for all i).

Thus this is a necessary condition for the collapsed chain to be Markov.

Since Q = APB and A, P, B are stochastic matrices, the expression for the collapsed transition matrix can be seen as a special case of stochastic factorization.
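The chain of implications can be traced numerically on a small symmetric example (the matrix and partition are ours): with the uniform invariant vector, D(I − BA) is positive semi-definite and the necessary condition PB = BAPB holds.

```python
import numpy as np

# Symmetric 4-state P, uniform invariant vector, partition {0,1}, {2,3}.
P = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.2, 0.1, 0.4, 0.3],
              [0.3, 0.4, 0.1, 0.2],
              [0.4, 0.3, 0.2, 0.1]])
p = np.full(4, 0.25)
D = np.diag(p)
B = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A = np.linalg.inv(B.T @ D @ B) @ B.T @ D

M = D @ (np.eye(4) - B @ A)
eigs = np.linalg.eigvalsh((M + M.T) / 2)    # symmetric part of M
psd = bool(eigs.min() > -1e-12)             # positive semi-definite
necessary = bool(np.allclose(P @ B, B @ A @ P @ B))   # PB = BAPB
```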


CHAPTER 5
SUMMARY AND CONCLUSIONS

In this thesis we have considered the quite old problem of collapsing the states of a Markov chain and then looking for conditions under which the collapsed chain is again Markovian. Over the years, a lot of work has been done in the context of homogeneous Markov chains; in this thesis we have tried to address the problem in the non-homogeneous, finite state-space context. In the near future, we would like to investigate the following:

[1]: How our results generalize to continuous-time Markov chains.

[2]: Preserving the entropy rate in lumping.

[3]: Using lumped Markov chains to understand networks.

[4]: Expanding a Markov chain: this is exactly the opposite of what we have done in this thesis; it is possible to obtain from a Markov chain a larger chain which gives more detailed information about the process being considered.


REFERENCES

[1] A. Abdel-Moniem and F. Leysieffer, "Weak lumpability in finite Markov chains," Journal of Applied Probability, vol. 19, no. 3, pp. 685–691, Sep. 1982.

[2] A. M. Barreto and M. D. Fragoso, "Lumping the states of a finite Markov chain through stochastic factorization," in Proceedings of the 18th IFAC World Congress, Milano, Italy, 2011, pp. 4206–4211.

[3] C. Burke and M. Rosenblatt, "A Markovian function of a Markov chain," Annals of Mathematical Statistics, vol. 29, no. 4, pp. 1112–1122, Mar. 1958.

[4] J. Franklin, Matrix Theory, Prentice-Hall, 1968.

[5] N. Higham, "Analysis of the Cholesky decomposition of a semi-definite matrix," in Reliable Numerical Computation, pp. 161–185, 1990.

[6] S. E. Bensley and B. Aazhang, Finite Markov Processes and Their Applications, John Wiley and Sons, 1979.

[7] D. Isaacson and R. Madsen, Markov Chains: Theory and Applications, John Wiley and Sons, 1976.

[8] S. Karlin and H. Taylor, A First Course in Stochastic Processes, Academic Press, 1975.

[9] J. Kemeny and J. L. Snell, Finite Markov Chains, Springer-Verlag, 1960.

[10] M. Kijima, Markov Processes for Stochastic Modeling, Chapman and Hall, 1997.

[11] M. Rosenblatt, Markov Processes: Structure and Asymptotic Behavior, Springer-Verlag, 1971.

[12] G. Rubino and B. Sericola, "On weak lumpability in Markov chains," Journal of Applied Probability, vol. 26, no. 3, pp. 446–457, Sep. 1989.


BIOGRAPHICAL SKETCH

Agnish Dey was born in 1986 in Kolkata, West Bengal, India. He attended Bidhan Nagar High School in Kolkata. In his bachelor's program, he studied information technology engineering at West Bengal University of Technology from 2004 to 2008. He then turned to mathematics and earned his master's degree in mathematics from UT Pan American (UTRGV) in August 2011.

From 2011 to 2017, Agnish pursued his PhD in the Department of Mathematics, University of Florida, under the guidance of Prof. Murali Rao.
