Read Me at Last After Markov Chain (Summary)


OR / STAT 645: Stochastic Processes

Lecture 1: Probability Review, Exponential Distribution (Given: 8/31/2006)

Lecture by John Shortle, partially transcribed by James LaBelle, based on the class textbook: Ross, S. M., 2003, Introduction to Probability Models, 8th ed., Academic Press.

Why You Need to Know about Stochastic Processes

- Life is stochastic
  - Commute to work
  - Wait in line for lunch
  - Even deterministic things are stochastic (e.g., Metro buses)
- Stochastic problems are directly relevant to your life
  - Why do bad things happen in groups?
  - How do I increase the page rank of my web site in Google?
  - How should stock options be priced?
  - When should I replace my aging car?
  - Why do I have to wait so long for a bus at Dulles airport? And why do the buses get clumped up in groups?

Probability Review

Notation

$f(x)$ is the Probability Density Function (PDF).

$F(x)$ is the Cumulative Distribution Function (CDF): $F(x) = \int_{-\infty}^{x} f(u)\,du$.

$F^c(x) = 1 - F(x)$ is the Complement of the CDF (or CCDF).

Relationships

$f(x) = \frac{d}{dx}F(x) = -\frac{d}{dx}F^c(x)$

Exponential Distribution

$f(x) = \lambda e^{-\lambda x}$, $x \ge 0$

$F(x) = 1 - e^{-\lambda x}$, $x \ge 0$

$F^c(x) = e^{-\lambda x}$, $x \ge 0$

Memorize These Formulas!

Gamma Distribution

PDF: $f(x) = \dfrac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}$, $x > 0$, where $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$.

Note: $\Gamma(\alpha) = (\alpha - 1)!$ when $\alpha$ is a positive integer.

CDF: When $\alpha$ is a positive integer,

$F(x) = 1 - e^{-x/\beta} \sum_{j=0}^{\alpha-1} \dfrac{(x/\beta)^j}{j!}$, $x > 0$.


    We will derive this property later.

Other properties
1. When $\alpha$ is a positive integer, a gamma random variable (RV) is equivalent to the sum of $\alpha$ independent exponential RVs with mean $\beta$.
2. When $\alpha = 1$, a gamma RV is an exponential RV with mean $\beta$.

Check: $f(x) = \dfrac{x^{1-1} e^{-x/\beta}}{\Gamma(1)\,\beta} = \dfrac{e^{-x/\beta}}{\beta} = \lambda e^{-\lambda x}$, where $\lambda = 1/\beta$.
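As a sanity check, the integer-shape CDF above can be compared numerically against a library gamma CDF and against property 1 (a sum of exponentials). A minimal sketch using numpy/scipy; the shape and scale values are arbitrary illustrations:

```python
import numpy as np
from math import exp, factorial
from scipy import stats

alpha, beta, x = 4, 2.0, 5.0

# Closed-form CDF for integer alpha: 1 - exp(-x/beta) * sum_{j<alpha} (x/beta)^j / j!
erlang_cdf = 1 - exp(-x / beta) * sum((x / beta) ** j / factorial(j) for j in range(alpha))

# Library gamma CDF with the same shape/scale parameterization
lib_cdf = stats.gamma.cdf(x, a=alpha, scale=beta)

# Property 1: empirical CDF of a sum of alpha independent exponentials with mean beta
rng = np.random.default_rng(0)
sums = rng.exponential(scale=beta, size=(100_000, alpha)).sum(axis=1)
mc_cdf = (sums <= x).mean()

print(erlang_cdf, lib_cdf, mc_cdf)  # all approximately equal
```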

Mean of a Random Variable

(1) Typical method

Discrete case: $E[X] = \sum_i x_i p_i$, where $p_i = P(X = x_i)$.

Continuous case: $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$.

(2) When $X$ is a non-negative random variable,

$E[X] = \int_0^{\infty} F^c(x)\,dx$

Proof: Suppose $X$ is discrete with $p_i = P(X = x_i)$. Plot $F^c(x)$ as a function of $x$: it is a step function that drops by $p_i$ at each $x_i$. Calculate the area under the curve two ways:

First way: $\int_0^{\infty} F^c(x)\,dx$.

Second way: Add up the areas of the horizontal rectangles. This gives $\sum_{i=1}^{\infty} x_i p_i$, which is $E[X]$.
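The tail-integral formula is easy to check numerically, here for an exponential RV (a minimal sketch; the rate is an arbitrary illustration):

```python
import numpy as np
from scipy import integrate

lam = 0.5

# E[X] = integral over [0, infinity) of the CCDF exp(-lam * x)
tail_integral, _ = integrate.quad(lambda x: np.exp(-lam * x), 0, np.inf)
print(tail_integral, 1 / lam)  # both 2.0
```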

(3) Using the Moment Generating Function

The moment generating function of a random variable $X$ is $\phi(t) = E[e^{tX}]$:



$\phi(t) = \begin{cases} \sum_k e^{tk}\,p_k & \text{discrete case} \\ \int e^{tx} f(x)\,dx & \text{continuous case} \end{cases}$

Then $E[X^n] = \phi^{(n)}(t)\big|_{t=0}$.

For example, $E[X] = \phi'(t)\big|_{t=0}$ and $E[X^2] = \phi''(t)\big|_{t=0}$.

Note: The moment generating function is closely related to the Laplace transform:

$f^*(s) = E[e^{-sX}] = \int_0^{\infty} e^{-sx} f(x)\,dx$

Example: Exponential Distribution

$\phi(t) = E[e^{tX}] = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{-(\lambda - t)x}\,dx = \left.\frac{\lambda e^{-(\lambda - t)x}}{-(\lambda - t)}\right|_{x=0}^{\infty} = \frac{\lambda}{\lambda - t}$ (for $t < \lambda$).

Therefore,

$E[X] = \phi'(t)\big|_{t=0} = \left.\dfrac{\lambda}{(\lambda - t)^2}\right|_{t=0} = \dfrac{1}{\lambda}$

$E[X^2] = \phi''(t)\big|_{t=0} = \left.\dfrac{2\lambda}{(\lambda - t)^3}\right|_{t=0} = \dfrac{2}{\lambda^2}$
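The two moments can be checked symbolically; a minimal sketch with sympy, starting from the MGF just derived:

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)

phi = lam / (lam - t)                # MGF of exp(lam), valid for t < lam

EX  = sp.diff(phi, t, 1).subs(t, 0)  # first moment: 1/lam
EX2 = sp.diff(phi, t, 2).subs(t, 0)  # second moment: 2/lam**2
var = sp.simplify(EX2 - EX**2)       # variance: 1/lam**2 (used in the next section)
print(EX, EX2, var)
```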

Variance of a Random Variable

Variance: $\text{var}[X] = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$, where $E[X^2] = \int_{-\infty}^{\infty} x^2 f(x)\,dx$.

Standard deviation (std. dev.) $= \sqrt{\text{var}[X]}$

Coefficient of variation (CV) $= \dfrac{\text{std. dev.}}{E[X]}$

Example: Exponential Random Variable

$\text{var}[X] = E[X^2] - (E[X])^2 = \dfrac{2}{\lambda^2} - \dfrac{1}{\lambda^2} = \dfrac{1}{\lambda^2}$

std. dev. $= \dfrac{1}{\lambda}$

CV $= \dfrac{1/\lambda}{1/\lambda} = 1$


Memoryless Property

Def. 1. A random variable $X$ has the memoryless property if:

$P(X > t + s \mid X > s) = P(X > t)$

Intuition: Suppose $X$ represents the time that you wait for a bus. Given that you have already been waiting $s$ time units ($X > s$), the probability that you wait an additional $t$ units, $P(X > t + s \mid X > s)$, is the same as the probability of waiting $t$ units in the first place, $P(X > t)$.

We now formulate an alternate definition. If $X$ has the memoryless property, then

$P(X > t) = P(X > t + s \mid X > s) = \dfrac{P(X > t + s \text{ and } X > s)}{P(X > s)} = \dfrac{P(X > t + s)}{P(X > s)}$

Def. 2. A random variable $X$ has the memoryless property if: $P(X > t + s) = P(X > t)\,P(X > s)$.

The exponential distribution is the only continuous distribution that has the memoryless property.

Check that the exponential has this property: $P(X > t + s) = e^{-\lambda(t+s)} = e^{-\lambda t}\,e^{-\lambda s} = P(X > t)\,P(X > s)$.
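The memoryless property is also easy to see in simulation; a minimal sketch (the rate and the values of $s$ and $t$ are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s, t = 0.5, 2.0, 3.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

p_uncond = (x > t).mean()                  # P(X > t)
p_cond = (x[x > s] > s + t).mean()         # P(X > t + s | X > s)
print(p_uncond, p_cond, np.exp(-lam * t))  # all approximately equal
```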

Useful Properties of the Exponential Distribution

Suppose that

$X_1 \sim \exp(\lambda_1)$ (time until event 1 happens)
$X_2 \sim \exp(\lambda_2)$ (time until event 2 happens)
...
$X_n \sim \exp(\lambda_n)$ (time until event $n$ happens)

and all $X_i$ are independent.

1. First occurrence among events

What is the probability that $X_1 < X_2$?

$P(X_1 < X_2) = \int_0^{\infty} \int_{x_1}^{\infty} \lambda_1 e^{-\lambda_1 x_1}\,\lambda_2 e^{-\lambda_2 x_2}\,dx_2\,dx_1 = \int_0^{\infty} \lambda_1 e^{-\lambda_1 x_1}\,e^{-\lambda_2 x_1}\,dx_1 = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$

The double integration is over the region $x_2 > x_1$ in the $(x_1, x_2)$ plane. [Figure: that region shaded.] The second-to-last equality uses the known CCDF for the exponential distribution.

For the opposite relationship,

$P(X_2 < X_1) = 1 - P(X_1 < X_2) = 1 - \dfrac{\lambda_1}{\lambda_1 + \lambda_2} = \dfrac{\lambda_2}{\lambda_1 + \lambda_2}$,

as expected from symmetry.

More generally,

$P(X_i = \min(X_1, \ldots, X_n)) = \dfrac{\lambda_i}{\sum_j \lambda_j}$

To derive the general result from the 2-variable case, build up inductively.

3-variable example: $P(X_1 = \min(X_1, X_2, X_3)) = P(X_1 < \min(X_2, X_3)) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3}$
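A Monte Carlo check of this "race" probability (a minimal sketch; the rates are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([1.0, 2.0, 3.0])                      # rates lambda_1..lambda_3
x = rng.exponential(scale=1 / lam, size=(1_000_000, 3))

winner = x.argmin(axis=1)                            # which event happens first
for i in range(3):
    print((winner == i).mean(), lam[i] / lam.sum())  # simulated vs lambda_i / sum
```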

2. Distribution of time of first event (minimum)

$P(\min(X_1, X_2) > x) = P(X_1 > x,\ X_2 > x) = P(X_1 > x)\,P(X_2 > x) = F_1^c(x)\,F_2^c(x)$

For exponential RVs,

$P(\min(X_1, X_2) > x) = F_1^c(x)\,F_2^c(x) = e^{-\lambda_1 x}\,e^{-\lambda_2 x} = e^{-(\lambda_1 + \lambda_2)x}$

This is the CCDF of an exponential RV with rate $\lambda_1 + \lambda_2$; therefore

$\min(X_1, X_2) \sim \exp(\lambda_1 + \lambda_2)$.

More generally,

$\min(X_1, X_2, \ldots, X_n) \sim \exp(\lambda_1 + \cdots + \lambda_n)$.
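The same simulation idea checks the distribution of the minimum (a minimal sketch; the empirical CCDF is compared against $e^{-(\lambda_1+\lambda_2+\lambda_3)x}$ at one point):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.array([1.0, 2.0, 3.0])
m = rng.exponential(scale=1 / lam, size=(1_000_000, 3)).min(axis=1)

rate = lam.sum()                          # claimed rate of the minimum
print(m.mean(), 1 / rate)                 # sample mean vs 1/6
x = 0.2
print((m > x).mean(), np.exp(-rate * x))  # empirical CCDF vs exp(-rate * x)
```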


Key intuition: Think of exponential RVs as times until something happens, and the $\lambda$'s as rates.

3. Independence property (stated without proof). The time of the first occurrence of an event is independent of the ordering of the events. That is, $\min(X_1, \ldots, X_n)$ is independent of the event that $X_1 < X_2 < \cdots < X_n$ (or any other ordering).

4. Distribution of time of last event (maximum)

For exponential RVs,

$P(\max(X_1, X_2) > x) = 1 - F_1(x)\,F_2(x) = 1 - (1 - e^{-\lambda_1 x})(1 - e^{-\lambda_2 x}) = e^{-\lambda_1 x} + e^{-\lambda_2 x} - e^{-(\lambda_1 + \lambda_2)x}$

Note: This could also have been derived from Venn diagram principles:

$P(\max(X_1, X_2) > x) = P(X_1 > x) + P(X_2 > x) - P(X_1 > x,\ X_2 > x)$

    5. Sum of exponentials (with same rate) is a gamma (stated earlier)

Example (Prob. 5.28)

Consider $n$ components with independent lifetimes. Component $i$ functions for an exponential time with rate $\lambda_i$. All components are initially in use and remain so until they fail.

a. Find the probability that component 1 is the second component to fail.
b. Find the expected time of failure of the second component.

Possible orderings for component 1 to be the second component to fail:

a. 2 fails first, then 1, then some other component fails.
b. 3 fails first, then 1, then some other component fails.
c. ...
d. $n$ fails first, then 1, then some other component fails.


$P(\text{2 fails first, then 1, then another}) = P(\text{2 fails before all others}) \cdot P(\text{1 fails before all except 2}) = \dfrac{\lambda_2}{\sum_{i=1}^{n} \lambda_i} \cdot \dfrac{\lambda_1}{\sum_{i \ne 2} \lambda_i}$

Likewise, the probability of event (b) is: $\dfrac{\lambda_3}{\sum_{i=1}^{n} \lambda_i} \cdot \dfrac{\lambda_1}{\sum_{i \ne 3} \lambda_i}$.

Events (a), (b), ..., (d) are mutually exclusive, therefore P(component 1 is second to fail) is the sum of all the above probabilities:

$P(\text{component 1 is second to fail}) = \sum_{k=2}^{n} \dfrac{\lambda_k}{\sum_{i=1}^{n} \lambda_i} \cdot \dfrac{\lambda_1}{\sum_{i \ne k} \lambda_i}$

(b)

The expected time of the first failure is $\dfrac{1}{\sum_{i=1}^{n} \lambda_i}$.

The probability that the first failure is type $k$ is: $\dfrac{\lambda_k}{\sum_{i=1}^{n} \lambda_i}$.

The expected time from the first failure to the second failure, given the first failure is type $k$, is: $\dfrac{1}{\sum_{i \ne k} \lambda_i}$. (By memorylessness, the remaining lifetimes after the first failure are still exponential with their original rates.)

Thus, the total expected time until the second failure is

$\dfrac{1}{\sum_{i=1}^{n} \lambda_i} + \sum_{k=1}^{n} \dfrac{\lambda_k}{\sum_{i=1}^{n} \lambda_i} \cdot \dfrac{1}{\sum_{i \ne k} \lambda_i}$
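Both formulas are easy to evaluate and to check by simulation; a minimal sketch with made-up rates:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = np.array([1.0, 0.5, 2.0, 1.5])   # illustrative rates lambda_1..lambda_4
n, total = len(lam), lam.sum()

# (a) P(component 1 is second to fail)
p1_second = sum(lam[k] / total * lam[0] / (total - lam[k]) for k in range(1, n))

# (b) expected time of the second failure
et_second = 1 / total + sum(lam[k] / total / (total - lam[k]) for k in range(n))

# Monte Carlo check
x = rng.exponential(scale=1 / lam, size=(500_000, n))
order = np.argsort(x, axis=1)                      # failure order per replication
print(p1_second, (order[:, 1] == 0).mean())        # formula vs simulation
print(et_second, np.sort(x, axis=1)[:, 1].mean())  # formula vs simulation
```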

Computing Expectations by Conditioning

Basic idea: Compute the expectation or variance of a (complicated) random variable by conditioning on another random variable.

In stochastic processes, it is often useful to condition on the first event.

Use the formulas

$E(X) = E(E(X \mid Y))$

$V(X) = V(E(X \mid Y)) + E(V(X \mid Y))$


Example

The probability of an accident on I-66 during my morning commute is 0.1.

If there is an accident, commute time $\sim N(50, 6^2)$.
If there is no accident, commute time $\sim N(30, 4^2)$.

What is the average time to get to work? What is the variance of the time to get to work?

Average time to get to work (easy): $0.1 \cdot 50 + 0.9 \cdot 30 = 32$.

But let's work it out carefully in the language of conditional expectation:

$X$ = time to get to work
$Y$ = accident or no accident

$E(X \mid Y) = 50$ if accident, $30$ if no accident.

Note: $E(X \mid Y)$ is a random variable (call it $Z$). In other words,

$Z = E(X \mid Y) = 50$ w.p. 0.1, $= 30$ w.p. 0.9.

Finally,

$E(X) = E(E(X \mid Y)) = E(Z) = 0.1 \cdot 50 + 0.9 \cdot 30 = 32$.

To compute $V(X)$, first evaluate $V(E(X \mid Y))$. We already know $E(X \mid Y)$ (which we called $Z$).

$E(Z^2) = 0.1 \cdot 2500 + 0.9 \cdot 900 = 1060$

$V(E(X \mid Y)) = V(Z) = E(Z^2) - (E(Z))^2 = 1060 - 32^2 = 36$

Now, evaluate $E(V(X \mid Y))$:

$V(X \mid Y) = 36$ w.p. 0.1, $= 16$ w.p. 0.9.

Note: $V(X \mid Y)$ is a random variable.

$E(V(X \mid Y)) = 0.1 \cdot 36 + 0.9 \cdot 16 = 18$

In summary, $V(X) = V(E(X \mid Y)) + E(V(X \mid Y)) = 36 + 18 = 54$. (Note: the variance is bigger than the variances of either conditioned normal variable.)
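A minimal simulation sketch confirming both numbers:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
accident = rng.random(n) < 0.1
x = np.where(accident,
             rng.normal(50, 6, n),   # commute time given an accident
             rng.normal(30, 4, n))   # commute time given no accident

print(x.mean(), x.var())  # approximately 32 and 54
```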

OR / STAT 645: Stochastic Models

Lecture 2: The Poisson Process (Given: 9/7/2006)

The Poisson Distribution

Def. A Poisson random variable with mean $A$ has probability mass function:

$P(X = i) = e^{-A}\dfrac{A^i}{i!}$, where $i = 0, 1, 2, \ldots$

Note: $e^{-A}$ is a normalization constant.

The mean of the distribution is $A$. The variance of the distribution is $A$.

Note: For an exponential RV, the mean and std. dev. are equal. Here, the mean and variance are equal.

Historical Background

Ladislaus Bortkiewicz. Born 1868 in St. Petersburg, Russia, into Russian nobility. He was a military man and an instructor teaching artillery and mathematics. After being awarded a doctorate, he led a career in statistics and actuarial science. Some have argued that the Poisson distribution should be named the von Bortkiewicz distribution.

Bortkiewicz observed that events with a low frequency in a large population follow a Poisson distribution, even when the probabilities of the events vary. The classical example is the following data set (Bortkewicz L von. Das Gesetz der Kleinen Zahlen. Leipzig: Teubner; 1898):

- 14 (out of 16 total) Prussian army corps units observed over 20 years (1875-1894).
- A count of men killed by a horse kick, each year, for each unit (280 data points).
- Total deaths = 196.
- Average deaths per unit per year = 196 / 280 = 0.70.

Assume the number of deaths (for one unit in one year) is a Poisson RV with mean 0.70. Then the predicted and actual distributions are as follows:

Deaths | Theoretical # of Units | Observed # of Units
0      | 139.04                 | 144
1      | 97.33                  | 91
2      | 34.07                  | 32
3      | 7.95                   | 11
4      | 1.39                   | 2
5+     | 0.22                   | 0
Total  | 280                    | 280

Some sources give an alternate account of the data:


- 10 Prussian army corps units observed over 20 years (1875-1894).
- Total deaths = 122.
- Average deaths per unit per year = 122 / 200 = 0.61.

Deaths | Theoretical # of Units | Observed # of Units
0      | 108.67                 | 109
1      | 66.29                  | 65
2      | 20.22                  | 22
3      | 4.11                   | 3
4      | 0.63                   | 1
5      | 0.08                   | 0
6      | 0.01                   | 0
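The theoretical columns in both tables are just $N \cdot P(X = k)$ for a Poisson pmf; a minimal scipy sketch reproduces them:

```python
from scipy import stats

for n_units, mean in [(280, 0.70), (200, 0.61)]:
    expected = [round(n_units * stats.poisson.pmf(k, mean), 2) for k in range(5)]
    tail = round(n_units * stats.poisson.sf(4, mean), 2)  # 5 or more deaths
    print(n_units, expected, tail)
```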

Another Example

During World War II, the Germans attacked London with V-2 flying bombs. It was observed that the impacts of the bombs tended to be grouped in clusters, rather than showing a random distribution.

A possible explanation was that (a) specific areas were targeted and (b) the precision of the bombs was very high. However, the bombs were launched from across Europe, so this explanation seemed implausible.

The following data were taken:
- 144 square kilometers of south London were divided into 576 squares of 1/4 square kilometer each. A count was made of the number of bombs in each square.
- Total bombs observed: 537.
- Average bombs per square: 537 / 576 = 0.932.

Assume the number of bombs in a square is a Poisson RV with mean 0.932. Then the predicted and actual distributions are as follows:

Bombs per Square | Theoretical # of Squares | Observed # of Squares
0      | 226.74 | 229
1      | 211.39 | 211
2      | 98.54  | 93
3      | 30.62  | 35
4      | 7.14   | 7
5+     | 1.57   | 1
Total  | 576.00 | 576

Conclusion: When rare events are randomly distributed, there tend to be gaps in which no events occur and then periods in which events appear in clusters. Mentally, we tend to forget about the gaps and focus on the unusual occurrence of multiple rare events in the same space, giving an inflated illusion of rare-event clustering. It would actually be quite unusual to see rare events evenly distributed throughout time or space in a grid-like fashion.

    Poisson Convergence

    Why does the Poisson distribution work so well?


Roughly speaking, one way to think of a Poisson RV is as the sum of a large number of independent rare events (not necessarily identical). We motivate this with an example:

Let $X_i = 1$ if person $i$ enters Giant between 12:05 and 12:10 pm on 9/6/05, and $X_i = 0$ otherwise.

Let $X = \sum_{i=1}^{N} X_i$, where the summation is over all the people in Fairfax County.

Check conditions:
- Large number of events? Yes.
- Independent events? Mostly.
  - Counter-example: Many customers arrive in a short time period. Subsequent customers see a full parking lot and decide not to enter.
  - Counter-example: One car comes with multiple people.
- Rare events? Yes.
- Identical probabilities of events? No.

For the moment, we assume all events are identical, and we relax the assumption later. Specifically, we suppose $P(X_i = 1) = A/N$ for all $i$. Note: $E[X] = A$.

Based on these assumptions, we have a binomial distribution:

$P\left(\sum_{i=1}^{N} X_i = k\right) = \binom{N}{k}\left(\frac{A}{N}\right)^k \left(1 - \frac{A}{N}\right)^{N-k} = \frac{N!}{(N-k)!\,k!}\left(\frac{A}{N}\right)^k \left(1 - \frac{A}{N}\right)^{N-k}$

$= \frac{N(N-1)\cdots(N-k+1)}{N^k} \cdot \frac{A^k}{k!} \cdot \frac{(1 - A/N)^N}{(1 - A/N)^k} \to 1 \cdot \frac{A^k}{k!} \cdot e^{-A} = e^{-A}\frac{A^k}{k!}$

... a Poisson random variable!

In other words, Bin($N$, $p$) is approximately Poisson($Np$) under the previous assumptions.
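A minimal numeric sketch of this approximation:

```python
from scipy import stats

N, A = 10_000, 3.0
for k in range(6):
    # binomial pmf vs Poisson pmf: nearly identical for large N, small p = A/N
    print(k, stats.binom.pmf(k, N, A / N), stats.poisson.pmf(k, A))
```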

Now, we eliminate the identically distributed assumption:

Theorem. Let $X_{n,m}$ ($1 \le m \le n$) be a sequence of RVs where, for each $n$:

- The $X_{n,m}$ are independent.
- $X_{n,m} = 1$ w.p. $p_{n,m}$, and $0$ otherwise (we are counting events).
- $p_{n,1} + \cdots + p_{n,n} \to A \in (0, \infty)$ (collectively, the events are rare, since $n$ is large).
- $\max_{1 \le m \le n} p_{n,m} \to 0$ (all events are rare; no one event hogs the probability).

Then $X_{n,1} + \cdots + X_{n,n} \to \text{Poisson}(A)$.


Note: This looks similar to the Central Limit Theorem. However, in the CLT, condition 3 is replaced with $p_{n,1} + \cdots + p_{n,n} \approx nA$ (in other words, the means of the random variables $X_{n,m}$ are approximately constant in $n$, so the mean of the sum grows linearly in $n$). Here, the means of the random variables are shrinking in $n$, so the mean of the sum stays roughly constant.

Example

400 students are in a calculus class. Let $X$ be the number of students who have a birthday on the day of the final. What is the probability that there are 2 or more birthdays on the final?

Let $X_i = 1$ if student $i$ has a birthday on the final, and $0$ otherwise.

$P(X_i = 1) = 1/365$

$X = \sum_{i=1}^{400} X_i$ is approximately Poisson with mean $A = 400/365$.

Then, $P(X \ge 2) = 1 - P(X < 2) = 1 - e^{-A} - A e^{-A}$, where $A = 400/365$. (Answer: 0.2995)
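A minimal sketch computing the Poisson approximation and, for comparison, the exact binomial answer:

```python
import math
from scipy import stats

A = 400 / 365
poisson_approx = 1 - math.exp(-A) - A * math.exp(-A)  # 1 - P(0) - P(1)
exact_binomial = stats.binom.sf(1, 400, 1 / 365)      # P(X >= 2) exactly
print(poisson_approx, exact_binomial)                 # 0.2995 vs nearly the same
```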

Preliminary Definitions

Def. A stochastic process is a collection of random variables (RVs) indexed by time: $\{X(t), t \in T\}$.

- If $T$ is a continuous set, the process is a continuous-time stochastic process (e.g., the Poisson process).
- If $T$ is countable, the process is a discrete-time stochastic process (e.g., a Markov chain).

Def. A counting process is a stochastic process $\{N(t); t \ge 0\}$ such that

- $N(t) \in \{0, 1, 2, \ldots\}$ (that is, $N(t)$ is a non-negative integer).
- If $s < t$ then $N(s) \le N(t)$ (that is, $N(t)$ is non-decreasing in $t$).
- For $s < t$, $N(t) - N(s)$ is the number of events occurring in the time interval $(s, t]$.

Interpretation: $N(t)$ is the number of events that have occurred by time $t$.

Def. A counting process has independent increments if the numbers of events in disjoint (non-overlapping) intervals are independent.

Def. A counting process has stationary increments if the distribution of the number of events in an interval depends on the length of the interval, but not on the starting point of the interval. That is, $P(N(s+t) - N(s) = n)$ does not depend on $s$. Intuitively, the interval can be slid around without changing its stochastic nature.

Def. A function $f(\cdot)$ is $o(h)$ if $\lim_{h \to 0} \frac{f(h)}{h} = 0$. That is, $f(\cdot)$ goes to zero faster than $h$ goes to zero.

Example: Which functions are $o(h)$?

$f(x) = x^2$: yes
$f(x) = 0.01x$: no
$f(x) = x^{1.5}$: yes
$f(x) = x^2 + x$: no


Definitions of the Poisson Process

Definition 1: A Poisson process is a counting process $\{N(t); t \ge 0\}$ with rate $\lambda > 0$, if:

1. $N(0) = 0$.
2. The process has independent increments.
3. The number of events in any interval of length $t$ is a Poisson RV with mean $\lambda t$. That is, for all $s, t \ge 0$ and $n = 0, 1, 2, \ldots$,

$P(N(s+t) - N(s) = n) = e^{-\lambda t}\dfrac{(\lambda t)^n}{n!}$

Example: Consider people entering a McDonald's over a short period of time, say 20 minutes.

Q: How do you verify these conditions?
A: Condition 1 holds. Condition 2 may hold if people do not come in batches. It is hard to verify condition 3 without collecting data.

[Note: Cinlar (1975), Introduction to Stochastic Processes, gives a similar definition, without assuming independent increments. Assumption 3 is changed to: the number of events on any finite union of disjoint intervals is a Poisson RV with mean $\lambda b$, where $b$ is the length of the union.]

Is it possible to use the physics of the situation to derive a Poisson process, similar to the rare-event law given previously?

Definition 2: A Poisson process is a counting process $\{N(t); t \ge 0\}$ with rate $\lambda > 0$, if:

1. $N(0) = 0$.
2. The process has stationary increments.
3. The process has independent increments.
4. $P(N(h) = 1) = \lambda h + o(h)$. (# of events approximately proportional to the length of the interval)
5. $P(N(h) \ge 2) = o(h)$. (can't have 2 or more events at the same time: "orderliness")

This is a more fundamental, qualitative definition of the Poisson process.

Theorem: Definitions 1 and 2 are equivalent.


Q: Can these conditions be verified for the McDonald's example?
A: Stationarity over small intervals is OK; independent increments are not valid if external events occur.

[Note: Cinlar (1975), Introduction to Stochastic Processes, gives a similar definition, without assumptions 4 and 5, instead assuming that the process has only unit jumps. Assumption 4 can actually be eliminated: in Cinlar (1975), Lemma 1.8 derives (4) from (1), (2), (3), (5), and the fact that the process is a counting process.]

Eliminating individual assumptions yields variations on the Poisson process:
- Eliminate Assumption 2: non-stationary Poisson process.
- Eliminate Assumption 3: mixture of Poisson processes (choose $\lambda$ randomly, then run a Poisson process).
- Eliminate Assumption 5: compound Poisson process.

Def. 2 Implies Def. 1

Assume a Poisson process under definition 2. Consider a time horizon $[0, T]$ divided up into $n$ bins (where $n$ is large):

[Figure: interval $[0, T]$ divided into $n$ bins.]

- Suppose on average $\lambda T$ events arrive in the time period $[0, T]$.
- By orderliness (Property 5), there is (loosely speaking) at most 1 event in each bin.
- By Property 4 and stationarity (Property 2), $P(\text{1 event in a given bin}) \approx \dfrac{\lambda T}{n}$.
- By independent increments (Property 3), the numbers in each bin are independent.

Therefore, the total number of events is approximately a binomial distribution bin($n$, $p = \lambda T/n$). By the previous discussion on Poisson convergence, the total number of events in the interval is approximately Poisson with mean $np = \lambda T$.

Additional Poisson Properties

Let $T_1, T_2, \ldots, T_n$ be the inter-event times for a Poisson process ($T_n$ is the time between events $n-1$ and $n$). Let $S_1, S_2, \ldots, S_n$ be the times of each event (ordered in time). Then

$T_N = S_N - S_{N-1}$, with $T_1 = S_1 - S_0 = S_1$ and $S_N = \sum_{i=1}^{N} T_i$,

and the following are equivalent:

$\sum_{i=1}^{N} T_i \le t \iff S_N \le t \iff N(t) \ge N$


Inter-event Times

First, we derive the distribution of the time $T_1$ of the first event. $P(T_1 > t)$ is the probability that no events occur in $[0, t]$. The number of events in $[0, t]$ is a Poisson RV with mean $\lambda t$. So,

$P(T_1 > t) = P(N(t) = 0) = e^{-\lambda t}\dfrac{(\lambda t)^0}{0!} = e^{-\lambda t}$.

This is the CCDF of an exponential random variable. So, $T_1 \sim \exp(\lambda)$.

Now, we derive the distribution of the second inter-event time $T_2$. First, we condition on the time of the first event:

$P(T_2 > t \mid T_1 = s)$
$= P(N(s+t) - N(s) = 0 \mid T_1 = s)$  [0 events in $(s, s+t]$, 1 event in $[0, s]$]
$= P(N(s+t) - N(s) = 0)$  [by independent increments]
$= P(N(t) - N(0) = 0)$  [by stationary increments]
$= P(N(t) = 0)$  [since $N(0) = 0$]
$= e^{-\lambda t}$

Since $P(T_2 > t \mid T_1 = s) = e^{-\lambda t}$ does not depend on $s$, $P(T_2 > t) = P(T_2 > t \mid T_1 = s) = e^{-\lambda t}$. So, $T_2 \sim \exp(\lambda)$ and $T_2$ is independent of $T_1$.

We can continue with the same logic for $T_3, T_4, \ldots$

Definition 3: A Poisson process with rate $\lambda$ is a counting process such that the times between events are i.i.d. with distribution $\exp(\lambda)$.
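Definition 3 is also the standard recipe for simulating a Poisson process: generate i.i.d. exponential inter-event times and take cumulative sums. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, t_max = 2.0, 100.0

# i.i.d. exp(lam) inter-event times; cumulative sums give the event times S_n
gaps = rng.exponential(scale=1 / lam, size=int(3 * lam * t_max))
times = np.cumsum(gaps)
times = times[times <= t_max]

print(len(times), lam * t_max)  # N(t_max) has mean lam * t_max
```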

Conditional Distribution of Event Times

Given one event in $[0, t]$, what is the distribution of $T_1$?

$P(T_1 \le s \mid N(t) = 1) = \dfrac{P(T_1 \le s,\; N(t) = 1)}{P(N(t) = 1)} = \dfrac{P(\text{1 event in } [0, s],\ \text{0 events in } (s, t])}{P(N(t) = 1)}$

$= \dfrac{P(\text{1 event in } [0, s]) \cdot P(\text{0 events in } (s, t])}{P(N(t) = 1)}$  (by independent increments)

$= \dfrac{\frac{\lambda s}{1!} e^{-\lambda s} \cdot e^{-\lambda(t-s)}}{\frac{\lambda t}{1!} e^{-\lambda t}} = \dfrac{s}{t}$.

So, $P(T_1 \le s \mid N(t) = 1) = \dfrac{s}{t}$.

This is the CDF of a uniform distribution on $[0, t]$. Thus, given one event in $[0, t]$, its location is uniformly distributed on $[0, t]$.


The general result (not proven here) is:

Theorem 5.2: Given $n$ events in $[0, t]$ (i.e., $N(t) = n$), the un-ordered event times $S_1, S_2, \ldots, S_n$ are distributed as i.i.d. uniform random variables on $[0, t]$.

"Un-ordered" means that the event times $S_1, S_2, \ldots, S_n$ are not listed in their order of occurrence (that is, we do not require $S_1 < S_2 < \cdots < S_n$).

OR / STAT 645: Stochastic Processes

Lecture 3: The Poisson Process: Further Properties, Generalizations, and Applications (Given: 9/14/2006)

Splitting a Poisson Process

Problem set-up / assumptions:
- Let $N(t)$ be a Poisson process with rate $\lambda$.
- Each event is labeled:
  - Type-I with probability $p$,
  - Type-II with probability $1 - p$.
- Assignments of event types are i.i.d.
- Split Poisson process:
  - Let $N_I(t)$ be the number of Type-I events by time $t$.
  - Let $N_{II}(t)$ be the number of Type-II events by time $t$.

Proposition (5.2, p. 296). $N_I(t)$ and $N_{II}(t)$ are independent Poisson processes with rates $\lambda p$ and $\lambda(1-p)$, respectively.

Proof.
- $N_I(0) = 0$.
- $N_I(t)$ has stationary and independent increments.
- $P(N_I(h) \ge 2) \le P(N(h) \ge 2) = o(h)$.
- $P(N_I(h) = 1) = P(N_I(h) = 1 \mid N(h) = 1)\,P(N(h) = 1) + P(N_I(h) = 1 \mid N(h) \ge 2)\,P(N(h) \ge 2)$
  $= p\,(\lambda h + o(h)) + P(N_I(h) = 1 \mid N(h) \ge 2)\,o(h) = p\lambda h + o(h)$.

Proposition (5.3, p. 303): Same assumptions as above, except:

- An event at time $t$ is a type-$i$ event with probability $p_i(t)$, $i = 1, 2, \ldots, n$, where $\sum_{i=1}^{n} p_i(t) = 1$ (independent of all else). Note: the splitting probability may depend on time.
- Let $N_i(t)$ be the number of type-$i$ events by time $t$.

Then the $N_i(t)$ are independent Poisson random variables, with $E[N_i(t)] = \lambda \int_0^t p_i(s)\,ds$.

Note: the split processes are not technically Poisson processes. Why?

Corollary. If the splitting probabilities have no time dependence, then the split processes are independent Poisson processes with rate $\lambda p_i$ (or mean $\lambda p_i t$).

Example

Calls to a central office arrive according to a Poisson process with rate $\lambda = 20$ per minute.


The probability that an arriving call is a voice call is 80%; the probability of a data call is 20%, independent of all else.

What is the probability that 100 or more voice calls and 50 or more data calls arrive in a 5-minute period?

- Voice calls have a Poisson distribution with mean $20 \cdot 0.8 \cdot 5 = 80$.
- Data calls have a Poisson distribution with mean $20 \cdot 0.2 \cdot 5 = 20$.
- The two random variables are independent.

The answer is

$\left(1 - \sum_{i=0}^{99} e^{-80}\frac{80^i}{i!}\right)\left(1 - \sum_{i=0}^{49} e^{-20}\frac{20^i}{i!}\right)$.

What is the probability that there are more data calls than voice calls in a 5-minute period?

$\sum_{i=0}^{\infty} e^{-80}\frac{80^i}{i!} \sum_{j=i+1}^{\infty} e^{-20}\frac{20^j}{j!}$
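Both quantities are straightforward to compute; a minimal scipy sketch (the second sum is truncated far out in the tail):

```python
from scipy import stats

voice = stats.poisson(80)   # mean 20 * 0.8 * 5
data = stats.poisson(20)    # mean 20 * 0.2 * 5

# P(voice >= 100) * P(data >= 50), using independence
p_both = voice.sf(99) * data.sf(49)

# P(data > voice) = sum_i P(voice = i) * P(data > i)
p_more_data = sum(voice.pmf(i) * data.sf(i) for i in range(400))
print(p_both, p_more_data)
```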

Example: Minimizing # of Encounters (Optional)

Assumptions:
- Cars enter the highway according to a Poisson process with rate $\lambda$.
- The velocity of each car is constant, but chosen according to distribution $G$.
- Cars pass each other with no loss of time.

Q: What speed should you travel to minimize the number of encounters?

Solution
Consider a section of highway with length $d$ and the following variables:

            Time Enter Highway | Time on Highway | Velocity
You         0                  | $t = d/v$       | $v$
Other car   $s$                | $T = d/V$       | $V$

The decision variable is $v$ (or equivalently $t$).

An encounter with this car occurs if:
- $s < 0$ and $T + s > t$ (you pass the car), or
- $s > 0$ and $T + s < t$ (the car passes you).

We classify all other cars into those involving an encounter with you and those not. A car arriving at time $s$ is involved in an encounter with probability $p(s)$:

$p(s) = \begin{cases} F^c(t - s) & \text{if } s < 0 \\ F(t - s) & \text{if } s > 0 \end{cases}$

where $F(t) = P(T \le t) = P(d/V \le t) = P(V \ge d/t) = G^c(d/t)$ is the CDF of the time spent by cars on this section of highway. (Note: $F(t - s) = 0$ when $t - s < 0$.)


Think of other cars arriving as a Poisson process, and classify cars by whether or not they have an encounter with you. By Poisson splitting, the number of cars (over all time) involved in an encounter with you is a Poisson random variable with mean:

$\lambda \int_{-\infty}^{\infty} p(s)\,ds = \lambda \int_{-\infty}^{0} F^c(t - s)\,ds + \lambda \int_0^{\infty} F(t - s)\,ds$

(Note: we start counting time, for the Poisson splitting, at $-\infty$, rather than at 0.)

$= \lambda \int_t^{\infty} F^c(s)\,ds + \lambda \int_{-\infty}^{t} F(s)\,ds$  (change of variables)

To minimize this mean, take the derivative with respect to $t$ and set it equal to 0:

$-F^c(t) + F(t) = 0$

This implies that $F(t) = F^c(t)$. In other words, $t$ is the median of the travel times on the road. Equivalently, you should travel at the median velocity of all cars on the road.

Application: M/G/∞ Queue

Notation:
- M: Markovian or memoryless arrival process (i.e., a Poisson process).
- G: General service time (not necessarily exponential).
- ∞: Infinite number of servers.

Let
- $X(t)$ be the number of customers who have completed service by time $t$,
- $Y(t)$ be the number of customers who are being served at time $t$,
- $N(t)$ be the total number of customers who have arrived by time $t$.

Then $N(t) = X(t) + Y(t)$.

Splitting the arrival process:
- Fix a reference time $T$. Consider the process of customers arriving prior to time $T$ (i.e., assume $t \le T$). Note: the notation is slightly different than the book, p. 304.
- A customer arriving at $t \le T$ is
  - Type-I if service is completed before $T$; this occurs with probability $G(T - t)$.
  - Type-II if the customer is still in service at $T$; this occurs with probability $G^c(T - t)$.

Since arrival times and service times are all independent, the type assignments are independent. Therefore, we can apply Proposition 5.3:

- $X(T)$ is a Poisson random variable with mean $\lambda \int_0^T G(T - t)\,dt = \lambda \int_0^T G(t)\,dt$.
- $Y(T)$ is a Poisson random variable with mean $\lambda \int_0^T G^c(T - t)\,dt = \lambda \int_0^T G^c(t)\,dt$.
- $X(T)$ and $Y(T)$ are independent.


What happens when $T \to \infty$?

- $G(t) \to 1$ for large $t$. Therefore, $X(T)$ is approximately a Poisson random variable with mean $\lambda T$.
- $Y(T)$ is a Poisson random variable with mean $\lambda \int_0^T G^c(t)\,dt \to \lambda E[G]$ (why does the last equality hold?)

Summary: The number of customers in service in an M/G/∞ queue in steady state is a Poisson random variable with mean $\lambda E[G]$.

Note: If $\rho \equiv \lambda/\mu$ and $1/\mu = E[G]$, then the steady-state number in service is $\sim \text{Poisson}(\rho)$.

Example

Suppose insurance claims arrive according to a Poisson process with rate 5 per day. (Q: What types of insurance claims can be modeled this way? Hurricane claims? Auto-accident claims?) Suppose the time it takes to process an insurance claim is uniformly distributed on [1 day, 7 days]. What is the probability that there are no insurance claims being processed at a given moment?

Solution
The process can be modeled as an M/G/∞ queue. Assumptions made:
- Service times are independent.
- There are a large number of agents, so that effectively the number of servers is infinite (i.e., no claim ever waits for service).
- The system is in steady state.

Under these assumptions, the number $X$ of customers in service is a Poisson random variable with mean $\lambda E[G] = 5 \cdot 4 = 20$. Thus, $P(X = 0) = e^{-20}$.

Combining Poisson Processes

If $N_I(t)$ and $N_{II}(t)$ are independent Poisson processes with rates $\lambda_I$ and $\lambda_{II}$, respectively, and if $N(t)$ counts the number of events in both processes, then $N(t)$ is a Poisson process with rate $\lambda_I + \lambda_{II}$.

Why? Inter-event times in $N(t)$ are the minimum of inter-event times in $N_I(t)$ and $N_{II}(t)$. Hence, inter-event times in $N(t)$ are exponential with rate $\lambda_I + \lambda_{II}$ (using properties of the exponential distribution). Hence, $N(t)$ is a Poisson process with rate $\lambda_I + \lambda_{II}$.


Non-Homogeneous Poisson Process (NHPP)

Properties:
1. $N(0) = 0$.
2. $N(t)$ has independent increments.
3. $P[N(t+h) - N(t) = 1] = \lambda(t)h + o(h)$.
4. $P[N(t+h) - N(t) \ge 2] = o(h)$.

Notes:
- This is like a Poisson process, without the stationarity assumption.
- In property 3, if we had just a constant $\lambda$, then we would have a regular Poisson process (stationarity is implied by properties 3 and 4).

A process with the above properties is a NHPP with intensity (or rate) function $\lambda(t)$.

Def. The mean value function (for a NHPP) is

$m(t) = \int_0^t \lambda(u)\,du$

Note: If $\lambda(t) = \lambda$, then $m(t) = \lambda t$.

Key Property
For a NHPP, $N(t+s) - N(s)$ (the number of events between $s$ and $s+t$) is a Poisson random variable with mean $m(s+t) - m(s)$.

Proof (p. 316)

Divide the interval $[s, s+t]$ into $n$ bins. Let $N_i$ be the number of events in interval $i$.

- Index $i$ corresponds to the interval $\left(s + \frac{(i-1)t}{n},\, s + \frac{it}{n}\right]$.
- The bin width is $t/n$.

[Figure: interval from $s$ to $s+t$ divided into $n$ bins; $N(t+s) - N(s)$ counts events across the bins.]

Using the assumed properties:
- $P(N_i \ge 2) \approx 0$. (Property 4)


- $P(N_i = 1) \approx \lambda\!\left(s + \frac{it}{n}\right)\frac{t}{n}$. (Property 3)
- The $N_i$ are independent. (Property 2)

Then $N(s+t) - N(s) = \sum_{i=1}^{n} N_i$. For $n$ large, $N(s+t) - N(s)$ is the sum of a large number of independent, rare events. Thus, $N(s+t) - N(s)$ is approximately a Poisson random variable with mean:

$E[N(s+t) - N(s)] = E\left[\sum_{i=1}^{n} N_i\right] = \sum_{i=1}^{n} E[N_i]$

Now, $E[N_i] \approx P(N_i = 1) \approx \lambda\!\left(s + \frac{it}{n}\right)\frac{t}{n}$, so

$\sum_{i=1}^{n} E[N_i] \approx \sum_{i=1}^{n} \lambda\!\left(s + \frac{it}{n}\right)\frac{t}{n} \to \int_s^{s+t} \lambda(u)\,du = m(s+t) - m(s)$

Graphically, this is a Riemann sum: rectangles of height $\lambda(s + it/n)$ and width $t/n$ approximating the area under $\lambda(\cdot)$. [Figure omitted.]

Example

Consider a NHPP with rate

$\lambda(t) = 10$ for $0 \le t \le 0.5$, and $\lambda(t) = 20$ for $0.5 < t \le 1$.

Then

$m(t) = 10t$ for $0 \le t \le 0.5$, and $m(t) = 20t - 5$ for $0.5 < t \le 1$.


(Recall the compound Poisson process $X(t) = \sum_{i=1}^{N(t)} Y_i$, where $N(t)$ is a Poisson process with rate $\lambda$ and the $Y_i$ are i.i.d.)

$V[X(t)] = V\big[E[X(t) \mid N(t)]\big] + E\big[V[X(t) \mid N(t)]\big]$

Now,

$V[X(t) \mid N(t) = n] = V\left[\sum_{i=1}^{n} Y_i\right] = n\,V[Y_i]$

So,

$V\big[E[X(t) \mid N(t)]\big] + E\big[V[X(t) \mid N(t)]\big] = V\big[N(t)\,E[Y_i]\big] + E\big[N(t)\,V[Y_i]\big] = \lambda t\,(E[Y_i])^2 + \lambda t\,V[Y_i] = \lambda t\,(E[Y_i])^2 + \lambda t\,\big(E[Y_i^2] - (E[Y_i])^2\big)$

$V[X(t)] = \lambda t\,E[Y_i^2]$

Example (similar to 5.26)

People call Ticketmaster according to a Poisson process with rate $\lambda = 2$ per minute. The number of tickets ordered per call is 1, 2, 3, or 4 with probabilities 1/6, 1/3, 1/3, and 1/6, respectively.

What is the probability that at least 240 tickets are sold in the next 50 minutes?

Let $N(t)$ be the number of calls by time $t$. Let $Y_i$ be the number of tickets sold for call $i$. Let $X(t)$ be the number of tickets sold by time $t$.

Then $X(t)$ is a compound Poisson process, $X(t) = \sum_{i=1}^{N(t)} Y_i$, with:

$E(Y_i) = \dfrac{1 \cdot 1 + 2 \cdot 2 + 3 \cdot 2 + 4 \cdot 1}{6} = \dfrac{15}{6} = \dfrac{5}{2}$

$E(Y_i^2) = \dfrac{1^2 \cdot 1 + 2^2 \cdot 2 + 3^2 \cdot 2 + 4^2 \cdot 1}{6} = \dfrac{43}{6}$

$E(X(t)) = \lambda t\,E(Y_i) = (2)(50)(5/2) = 250$

$V(X(t)) = \lambda t\,E(Y_i^2) = (2)(50)(43/6) = \dfrac{2150}{3}$

Since $N(t)$ is relatively large, $X(t)$ is approximately a normal random variable. Thus,

$P(X(50) \ge 240) = P\left(\dfrac{X(50) - 250}{\sqrt{2150/3}} \ge \dfrac{240 - 250}{\sqrt{2150/3}}\right) \approx 1 - \Phi(-0.3735) = \Phi(0.3735) \approx 0.6456$
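A minimal simulation sketch of this compound Poisson probability alongside the normal approximation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
lam, t = 2.0, 50.0
tickets, probs = [1, 2, 3, 4], [1/6, 1/3, 1/3, 1/6]

# Simulate X(50) = sum of ticket counts over a Poisson number of calls
n_calls = rng.poisson(lam * t, size=50_000)
x = np.array([rng.choice(tickets, size=n, p=probs).sum() for n in n_calls])

normal_approx = stats.norm.sf((240 - 250) / np.sqrt(2150 / 3))
print((x >= 240).mean(), normal_approx)   # both approximately 0.65
```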

OR / STAT 645: Stochastic Processes

Lecture 4: Markov Chains, Discrete & Continuous Time (Given: 9/21/2006)

Discrete-Time Markov Chain (DTMC)

Let $X_n$ ($n = 0, 1, 2, \ldots$) be a stochastic process taking on a finite or countable number of values (generally, assume that $X_n \in \{0, 1, 2, \ldots\}$).

$X_n$ is a DTMC if it has the Markov property: given the present, the future is independent of the past:

$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$

In this class, we assume that $X_n$ is stationary. That is, $P(X_{n+1} = j \mid X_n = i)$ does not depend on $n$; we write $P(X_{n+1} = j \mid X_n = i) \equiv p_{ij}$. The DTMC is said to have stationary transition probabilities.

Transition probabilities must satisfy $\sum_j p_{ij} = 1$.

We often write the transition probabilities as a matrix $P$. Q: Do columns or rows sum to 1? (Rows, by the constraint above.)

Continuous-Time Markov Chain (CTMC)

Let $X(t)$ ($t \ge 0$) be a stochastic process taking on a finite or countable number of values (generally, assume that $X(t) \in \{0, 1, 2, \ldots\}$).

$X(t)$ is a CTMC if it has the Markov property: given the present, the future is independent of the past:

$P(X(t+s) = j \mid X(s) = i,\ X(u) = x(u) \text{ for } 0 \le u < s) = P(X(t+s) = j \mid X(s) = i)$

In this class, we assume that $X(t)$ is stationary. That is, $P(X(t+s) = j \mid X(s) = i)$ does not depend on $s$, only on $t$. The CTMC is said to have stationary transition probabilities.

Distribution of Time in a State

Let $T_i$ be the time spent in state $i$ (before a transition). Suppose
- the MC enters state $i$ at time 0,
- the MC remains in state $i$ through time $s$.

What is the probability that the MC remains in state $i$ for at least an additional $t$ time units?

$P(T_i > s + t \mid T_i > s)$


$= P(T_i > s + t \mid X(s) = i)$  (by the Markov property)
$= P(T_i > t \mid X(0) = i)$  (by stationarity)
$= P(T_i > t)$

Thus, $T_i$ has the memoryless property, so $T_i \sim \exp(v_i)$.

CTMC: Alternate Definition

This gives an alternate definition for a CTMC. $X(t)$ is a CTMC if:

1. The amount of time spent in state $i$ (before a transition) is exponentially distributed with rate $v_i$: $T_i \sim \exp(v_i)$.
2. When the process leaves state $i$, it enters state $j$ w.p. $p_{ij}$.
3. All transitions and times are independent (in particular, the transition probability out of a state is independent of the time spent in the state).

Summary: the process moves from state to state according to a DTMC, and the time spent in each state is exponentially distributed.

- The transition probabilities $p_{ij}$ define the embedded DTMC.
- As before, $\sum_j p_{ij} = 1$.
- But now, we require that $p_{ii} = 0$ (otherwise, the time spent in state $i$ is not exponential).

Def. The instantaneous transition rate from state $i$ to $j$ is $q_{ij} \equiv v_i p_{ij}$, where $v_i$ is the instantaneous transition rate out of state $i$.

Note:

$\sum_j q_{ij} = v_i \sum_j p_{ij} = v_i$

$p_{ij} = \dfrac{q_{ij}}{\sum_j q_{ij}} = \dfrac{q_{ij}}{v_i}$

Thus, you can specify a CTMC with either $\{p_{ij}, v_i\}$ or $\{q_{ij}\}$.

Example

A company has 4 machines.
- The time until each machine breaks is exponentially distributed with mean 6 days.
- The repair time of each machine is exponentially distributed with mean 2 days.
- There is only one repair person.
- All random variables are independent.


Let $X(t)$ be the number of working machines at time $t$.

The transition rates out of each state are (why?):

$v_0 = 1/2$
$v_1 = 1/6 + 1/2 = 2/3$
$v_2 = 2/6 + 1/2 = 5/6$
$v_3 = 3/6 + 1/2 = 1$
$v_4 = 4/6 = 2/3$

The transition probabilities for the embedded DTMC are (why?):

P =
[ 0    1    0    0    0   ]
[ 1/4  0    3/4  0    0   ]
[ 0    2/5  0    3/5  0   ]
[ 0    0    1/2  0    1/2 ]
[ 0    0    0    1    0   ]

Or, define the Markov chain using the transition rates (off-diagonal entries $q_{ij}$):

Q =
[ .    1/2  0    0    0   ]
[ 1/6  .    1/2  0    0   ]
[ 0    2/6  .    1/2  0   ]
[ 0    0    3/6  .    1/2 ]
[ 0    0    0    4/6  .   ]

(In a moment, we will define the rate matrix Q with non-zero elements on the diagonal.)

Note: It is often easier to construct Q first and then construct P and the $v_i$.
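That workflow is easy to mechanize; a minimal numpy sketch that builds the $v_i$ and the embedded $P$ from the off-diagonal rates above:

```python
import numpy as np

# Off-diagonal rates q_ij for the 4-machine example (states 0..4 working machines)
Q = np.array([
    [0,   1/2, 0,   0,   0  ],
    [1/6, 0,   1/2, 0,   0  ],
    [0,   2/6, 0,   1/2, 0  ],
    [0,   0,   3/6, 0,   1/2],
    [0,   0,   0,   4/6, 0  ],
])

v = Q.sum(axis=1)     # v_i = sum_j q_ij, the rate out of each state
P = Q / v[:, None]    # embedded DTMC: p_ij = q_ij / v_i
print(v)              # [1/2, 2/3, 5/6, 1, 2/3]
print(P)              # matches the matrix above
```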

DTMC: n-Step Transition Probabilities

Def. n-step transition probability. Let $P^n_{ij}$ be the probability that the system is in state $j$ in $n$ steps, given the system is in state $i$ now:

$P^n_{ij} = P(X_{n+k} = j \mid X_k = i)$.

By stationarity,

$P^n_{ij} = P(X_n = j \mid X_0 = i)$

Note: $P^1_{ij} = p_{ij}$ (using our original notation).

Chapman-Kolmogorov equations:

$P^{n+m}_{ij} = P(X_{n+m} = j \mid X_0 = i)$

We must be in one of the possible states at time $n$:

$= \sum_k P(X_{n+m} = j,\ X_n = k \mid X_0 = i)$


Apply Bayes' rule (easier to see if we ignore $X_0 = i$):

$= \sum_k P(X_{n+m} = j \mid X_n = k,\ X_0 = i)\,P(X_n = k \mid X_0 = i)$

By the Markov property:

$= \sum_k P(X_{n+m} = j \mid X_n = k)\,P(X_n = k \mid X_0 = i)$

Thus,

(*) $P^{n+m}_{ij} = \sum_k P^m_{kj}\,P^n_{ik}$.

If $P^{(i)}$ is the matrix of $i$-step transition probabilities, then (*) is matrix multiplication:

$P^{(n+m)} = P^{(n)} P^{(m)}$

Also, $P^{(1)} = P$, so $P^{(n)} = P^{(1)} P^{(1)} \cdots P^{(1)} = P^n$. In other words, the $n$-step transition probabilities are the elements of the matrix obtained by raising $P$ to the $n$th power.

Example

[Figure: three-state chain on {0, 1, 2}: state 0 goes to 1 w.p. 1; state 1 goes to 0 w.p. 0.3 and to 2 w.p. 0.7; state 2 goes to 1 w.p. 1.]

P =
[ 0    1    0   ]
[ 0.3  0    0.7 ]
[ 0    1    0   ]

What is $P^2_{01}$? It should be 0.
What is $P^2_{02}$? It should be 0.7.

Check:

P^2 =
[ 0    1    0   ] [ 0    1    0   ]   [ 0.3  0    0.7 ]
[ 0.3  0    0.7 ] [ 0.3  0    0.7 ] = [ 0    1    0   ]
[ 0    1    0   ] [ 0    1    0   ]   [ 0.3  0    0.7 ]
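A one-liner confirms this with a matrix power:

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.3, 0.0, 0.7],
              [0.0, 1.0, 0.0]])

P2 = np.linalg.matrix_power(P, 2)
print(P2[0, 1], P2[0, 2])   # 0.0 and 0.7, as argued above
```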

CTMC: t-Time Transition Probabilities

Def. t-time transition probability. Let $P_{ij}(t)$ be the probability that the system is in state $j$ in $t$ time units, given the system is in state $i$ now:

$P_{ij}(t) = P(X(t+s) = j \mid X(s) = i) = P(X(t) = j \mid X(0) = i)$  (by stationarity)

Lemma 6.2

1. $\lim_{h \to 0} \dfrac{1 - P_{ii}(h)}{h} = v_i$  (the rate at which the process leaves $i$)


Proof. $P_{ii}(h) \ge P(\text{0 transitions in time } h) = P(T_i > h) = e^{-v_i h}$. Thus,

$\dfrac{1 - P_{ii}(h)}{h} \le \dfrac{1 - \left[1 - v_i h + \frac{(v_i h)^2}{2!} - \cdots\right]}{h} = v_i + \dfrac{o(h)}{h}$

2. $\lim_{h \to 0} \dfrac{P_{ij}(h)}{h} = q_{ij} = v_i p_{ij}$  (the rate at which the process goes from $i$ to $j$)

Proof. $P_{ij}(h) \approx P(\text{transition before time } h \text{ and transition is to state } j) = [1 - e^{-v_i h}]\,p_{ij} \approx [1 - (1 - h v_i)]\,p_{ij} = h v_i p_{ij}$

Lemma 6.3

$P_{ij}(t+s) = P(X(t+s) = j \mid X(0) = i) = \sum_k P(X(t+s) = j,\ X(t) = k \mid X(0) = i)$

Apply Bayes' rule (easier to see if we ignore $X(0) = i$):

$= \sum_k P(X(t+s) = j \mid X(t) = k,\ X(0) = i)\,P(X(t) = k \mid X(0) = i)$

By the Markov property:

$= \sum_k P(X(t+s) = j \mid X(t) = k)\,P(X(t) = k \mid X(0) = i)$

Thus,

(*) $P_{ij}(t+s) = \sum_k P_{ik}(t)\,P_{kj}(s)$.

Forward Chapman-Kolmogorov Equations

Basic idea: Apply (*) with a small time step $h$ (and use the rates from Lemma 6.2):

$P_{ij}(t+h) = \sum_k P_{ik}(t)\,P_{kj}(h)$

$P_{ij}(t+h) - P_{ij}(t) = \sum_{k \ne j} P_{ik}(t)\,P_{kj}(h) - [1 - P_{jj}(h)]\,P_{ij}(t)$

So,

$P'_{ij}(t) = \lim_{h \to 0} \dfrac{P_{ij}(t+h) - P_{ij}(t)}{h} = \lim_{h \to 0} \dfrac{\sum_{k \ne j} P_{ik}(t)\,P_{kj}(h) - [1 - P_{jj}(h)]\,P_{ij}(t)}{h}$

$P'_{ij}(t) = \sum_{k \ne j} q_{kj}\,P_{ik}(t) - v_j\,P_{ij}(t)$

Now, let us define $q_{jj} \equiv -v_j$. Then the previous expression becomes

$P'_{ij}(t) = \sum_k P_{ik}(t)\,q_{kj}$

This is just the matrix multiplication $P'(t) = P(t)\,Q$ with


$P(t) = \begin{pmatrix} P_{00}(t) & P_{01}(t) & \cdots \\ P_{10}(t) & P_{11}(t) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$,  $Q = \begin{pmatrix} -v_0 & q_{01} & \cdots \\ q_{10} & -v_1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$.

Thus, we usually define the transition rate matrix $Q$ with the negative diagonal elements as described.

The solution to the differential equation is

$P(t) = e^{Qt} = I + Qt + \dfrac{(Qt)^2}{2!} + \dfrac{(Qt)^3}{3!} + \cdots$

(This is the matrix analog of solving $x'(t) = ax \Rightarrow x(t) = Ce^{at}$.)

This solution is valid provided the $v_i$ are bounded. In particular, it works when the number of states is finite.
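For a finite chain, the matrix exponential is one library call; a minimal sketch for the 4-machine example (diagonal entries are $q_{jj} = -v_j$):

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([
    [-1/2,  1/2,  0,    0,    0   ],
    [ 1/6, -2/3,  1/2,  0,    0   ],
    [ 0,    2/6, -5/6,  1/2,  0   ],
    [ 0,    0,    3/6, -1,    1/2 ],
    [ 0,    0,    0,    4/6, -2/3 ],
])

P10 = expm(Q * 10.0)      # P(t) = e^{Qt} at t = 10 days
print(P10[4])             # state distribution at t = 10, starting from 4 working
print(P10.sum(axis=1))    # each row sums to 1
```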

OR / STAT 645: Stochastic Processes

Lecture 5: Markov Chains, Discrete & Continuous Time (Given: 9/28/2006)

Classifications of States

Def. A path is a sequence of states, where each transition has a positive probability of occurring.

Def. State $j$ is reachable from state $i$ (or $i \to j$) if there is a path from $i$ to $j$; equivalently, $P^n_{ij} > 0$ for some $n \ge 0$.

Def. States $i$ and $j$ communicate ($i \leftrightarrow j$) if $i$ is reachable from $j$ and $j$ is reachable from $i$. (Note: a state $i$ always communicates with itself.)

Def. A MC is irreducible if all states are in the same communication class.

Def. State $i$ is an absorbing state if $p_{ii} = 1$.

Def. A set of states $S$ is a closed set if no state outside of $S$ is reachable from any state in $S$ (like an absorbing state, but with multiple states).

Def. State $i$ is a transient state if there exists a state $j$ such that $j$ is reachable from $i$ but $i$ is not reachable from $j$.

Def. A state that is not transient is recurrent. There are two types of recurrent states:
1. Positive recurrent, if the expected time to return to the state is finite.
2. Null recurrent (less common), if the expected time to return to the state is infinite (this requires an infinite number of states).

Def. A state $i$ is periodic with period $k > 1$ if $k$ is the smallest number such that all paths leading from state $i$ back to state $i$ have a multiple of $k$ transitions.

Def. A state is aperiodic if it has period $k = 1$.

Def. A state is ergodic if it is positive recurrent and aperiodic.


Examples

[Figures: three small chains.]
- A two-state chain 0 ↔ 1: period = 2.
- A three-state cycle 0 → 1 → 2 → 0: period = 3.
- A three-state chain with return paths of two different lengths from state 1: period = 1 ($P^1_{11} = 0$, $P^2_{11} = 0$, $P^3_{11} > 0$, $P^4_{11} > 0$, and gcd(3, 4) = 1).

Communication Classes

Properties of communication:
1. $i \leftrightarrow i$ (reflexivity)
2. $i \leftrightarrow j \Rightarrow j \leftrightarrow i$ (symmetry)
3. $i \leftrightarrow j$ and $j \leftrightarrow k \Rightarrow i \leftrightarrow k$ (transitivity)

These three properties partition the set of states into communication classes. The classes are disjoint, and every state is contained in exactly one class. Each class contains states that communicate with each other. If there is only one class, the MC is irreducible.

Example
Gambler's Ruin: You win $1 with probability $p$ and lose $1 with probability $1-p$. You stop when you reach $0 or $N. For example, for $N = 4$:

[Figure: chain on {0, 1, 2, 3, 4}; from states 1-3, up w.p. $p$ and down w.p. $1-p$; states 0 and 4 are absorbing.]

Communication classes are:
- {0}: recurrent
- {1, 2, 3}: transient
- {4}: recurrent

Example

[Figure: a four-state chain on {0, 1, 2, 3}.]


Communication classes are:
- {0, 1}: transient
- {2, 3}: recurrent

Transient and Recurrent Classes

Let $I_n = 1$ if $X_n = i$, and $0$ if $X_n \ne i$. Then $\sum_{n=0}^{\infty} I_n$ is the total number of visits to state $i$. The expected number of visits to state $i$ (given the MC starts in state $i$) is:

$E\left[\sum_{n=0}^{\infty} I_n \,\middle|\, X_0 = i\right] = \sum_{n=0}^{\infty} E[I_n \mid X_0 = i] = \sum_{n=0}^{\infty} P(X_n = i \mid X_0 = i) = \sum_{n=0}^{\infty} P^n_{ii}$

Therefore, the state is recurrent if $\sum_{n=0}^{\infty} P^n_{ii} = \infty$ and transient if $\sum_{n=0}^{\infty} P^n_{ii} < \infty$.

Technical note: Switching the expectation and the infinite sum is allowed by the monotone convergence theorem (e.g., Durrett, Probability Theory and Examples, p. 14): if $Y_j \ge 0$ and $Y_j \uparrow Y$, then $E(Y_j) \to E(Y)$. The proof is as follows. (For notational simplicity, assume all random variables are conditioned on $X_0 = i$.)

Let $Y_j = \sum_{n=0}^{j} I_n$ and $Y = \sum_{n=0}^{\infty} I_n$. Then $Y_j \ge 0$ and $Y_j \uparrow Y$, so the MCT can be used. The switching of the expectation and infinite sum is proved by:

$\sum_{n=0}^{\infty} E[I_n] = \lim_{j \to \infty} \sum_{n=0}^{j} E[I_n] = \lim_{j \to \infty} E\left[\sum_{n=0}^{j} I_n\right] = \lim_{j \to \infty} E(Y_j) = E(Y) = E\left[\sum_{n=0}^{\infty} I_n\right]$

Random Walk

With probability $p$, we move up 1 step; with probability $1-p$, we move down 1 step:

[Figure: chain on the integers; from each state, up w.p. $p$ and down w.p. $1-p$.]

Is this chain recurrent or transient?

The probability of returning to state 0 in $2n$ steps is:

$P^{2n}_{00} = \binom{2n}{n} p^n (1-p)^n = \dfrac{(2n)!}{n!\,n!}\,p^n (1-p)^n$

Use Stirling's approximation: $n! \approx n^{n+1/2} e^{-n} \sqrt{2\pi}$.


    ( )

    2 1/ 2 22

    00 21/ 2

    (2 ) 2(1 )

    2

    n nn n n

    n n

    n eP p p

    n e

    +

    + =

    2 1/ 22(1 )

    2

    nn n

    p pn

    +

    =

    4 (1 )n n np p

    n

    =

    State 0 is transient if 2001

    n

    n

    P

    =

    < or1

    4 (1 )n n n

    n

    p p

    n

    =

    < .

    If 1/ 2p= , then 2001 1

    1n

    n n

    Pn

    = =

    = = , so state 0 is recurrent.

    If 1/ 2p , then 2001 1 1

    nn n

    n n n

    aP a

    n

    = = =

    = < < , where 1a < , so state 0 is transient.

Note: A 2-dimensional symmetric random walk is recurrent. However, a 3-dimensional (or higher) symmetric random walk is transient.

    Limiting Probabilities (DTMC)

Theorem. For an irreducible, ergodic MC, π_j = lim_{n→∞} P_ij^n exists and is independent of the starting state i. The π_j are then the unique solution of π_j = ∑_i π_i P_ij and ∑_j π_j = 1.

Proof. Using the law of total probability:

P(X_{n+1} = j) = ∑_i P(X_{n+1} = j | X_n = i) P(X_n = i).

Taking limits of both sides as n → ∞:

lim_{n→∞} P(X_{n+1} = j) = ∑_i P(X_{n+1} = j | X_n = i) lim_{n→∞} P(X_n = i),

so π_j = ∑_i P_ij π_i.

In matrix form, this theorem can be stated: π = πP.

Two interpretations for π_i:

1. The probability of being in state i a long time into the future (large n).
2. The long-run fraction of time in state i.

If the MC is irreducible and ergodic, then interpretations 1 and 2 are equivalent. Otherwise, π_i is still the solution to π = πP, but only interpretation 2 is valid.


Example 1

P = | 0  1 |
    | 1  0 |

[π_0  π_1] = [π_0  π_1] P gives π_0 = π_1, and with π_0 + π_1 = 1:

π_0 = π_1 = 0.5

The chain is irreducible and positive recurrent, but not aperiodic. Thus, interpretation 1 is not valid. In particular, P_00^{2n} = 1 and P_00^{2n+1} = 0 for integer n.

Example 2: Planes arrive at Dulles airport.

Three types: Heavy (H), Large (L), Small (S). Assume the sequence of airplanes follows a MC with transition matrix:

         H    L    S
    H | 0.2  0.8  0   |
P = L | 0.3  0.3  0.4 |
    S | 0    0.7  0.3 |

Solving π = πP:

π_H = 0.2 π_H + 0.3 π_L + 0 π_S
π_L = 0.8 π_H + 0.3 π_L + 0.7 π_S
π_S = 0 π_H + 0.4 π_L + 0.3 π_S
π_H + π_L + π_S = 1

One equation is redundant; eliminate the most complicated equation: π_L = 0.8 π_H + 0.3 π_L + 0.7 π_S.


From the remaining equations:

π_H = (3/8) π_L
π_S = (4/7) π_L

So (3/8 + 1 + 4/7) π_L = 1, giving

π_H = 0.193
π_L = 0.514
π_S = 0.294
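As a numeric check of this example (a sketch; state order H, L, S follows the matrix above), we can solve π = πP by replacing one redundant balance equation with the normalization condition:

```python
import numpy as np

# Transition matrix of the airplane example (rows/columns ordered H, L, S)
P = np.array([[0.2, 0.8, 0.0],
              [0.3, 0.3, 0.4],
              [0.0, 0.7, 0.3]])

# pi P = pi  <=>  (P^T - I) pi^T = 0; replace one equation by sum(pi) = 1
A = P.T - np.eye(3)
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)   # approx [0.193, 0.514, 0.294]
```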

Example 3: Google

The following Markov chain is motivated by the Google search engine. Consider the following MC:

- States are web pages.
- Randomly choose a new page from the available links (w.p. 1/n, where n is the number of links on the current page).
- Page rank is determined by π_j, the overall fraction of visits to page j.

Note: Page rank is boosted by
- many links to the site;
- having the pages which link to the site have a high page rank themselves.

Some issues:
- Web pages with no links (absorbing states)
- Web pages with circular links (absorbing communication class)

Solution: At each site,
- with probability p, choose a random web page from all web pages;
- with probability 1 − p, choose a random web page from the existing links.
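A minimal sketch of this scheme on a hypothetical 4-page web graph (the link structure and the value of p here are made up for illustration):

```python
import numpy as np

# Hypothetical link structure: links[i] = pages that page i links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
N, p = 4, 0.15          # p = probability of jumping to a uniformly random page

# Modified chain: random page w.p. p, random existing link w.p. 1 - p
P = np.zeros((N, N))
for i, outs in links.items():
    P[i, :] = p / N
    for j in outs:
        P[i, j] += (1 - p) / len(outs)

pi = np.full(N, 1.0 / N)
for _ in range(200):     # power iteration converges to pi = pi P
    pi = pi @ P
print(pi)                # page ranks = long-run fractions of visits
```

Pages with more incoming links (pages 0 and 2 in this toy graph) end up with the larger values of π.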

    Limiting Probabilities (CTMC)

Let P_j = lim_{t→∞} P_ij(t) (assume no dependence on i).

Using the Chapman-Kolmogorov forward equations (recall, we defined q_jj = −v_j):

P′_ij(t) = ∑_k P_ik(t) q_kj    (in matrix form: P′(t) = P(t) Q)

lim_{t→∞} P′_ij(t) = ∑_k lim_{t→∞} P_ik(t) q_kj

Now, assuming that the limit exists, P′_ij(t) must go to zero, since probabilities are bounded by 0 and 1. Also, P_ik(t) → P_k (assuming the limit does not depend on the initial state i). Therefore:


0 = ∑_k P_k q_kj

In matrix notation, this is 0 = P Q, where P = [P_0  P_1  P_2  ⋯] and

    | −v_0   q_01   q_02   ⋯ |
Q = | q_10   −v_1   q_12   ⋯ |
    | q_20   q_21   −v_2   ⋯ |
    |  ⋮      ⋮      ⋮        |

Remarks: We have assumed that the limiting probabilities P_i exist (and do not depend on the initial condition). A sufficient condition for this is: the MC is positive recurrent and irreducible (note: we don't need aperiodicity here, as we did in the DTMC case).

Interpretation of this equation: writing out 0 = P Q for column j,

0 = ∑_{k≠j} P_k q_kj − P_j v_j,   i.e.,   P_j v_j = ∑_{k≠j} P_k q_kj.

The left-hand side is the rate of transitions out of state j. The right-hand side is the rate of transitions into state j.

    Example

- 3 machines, time to failure ~ exp(1)
- 2 service workers, time to repair ~ exp(8)

Take the state to be the number of working machines (so, e.g., in state 0 both workers are repairing, for a total repair rate of 16). Then 0 = [P_0 P_1 P_2 P_3] Q, with

    | −16   16     0    0 |
Q = |   1  −17    16    0 |
    |   0    2   −10    8 |
    |   0    0     3   −3 |

Column by column:

−16 P_0 + P_1 = 0             →  P_1 = 16 P_0
16 P_0 − 17 P_1 + 2 P_2 = 0   →  P_2 = 8 P_1 = 128 P_0
8 P_2 − 3 P_3 = 0             →  P_3 = (8/3) P_2 = (1024/3) P_0


Normalizing:

P_0 + P_1 + P_2 + P_3 = P_0 (1 + 16 + 128 + 1024/3) = 1

P_0 = 3/1459 = 0.00206
P_1 = 48/1459 = 0.03290
P_2 = 384/1459 = 0.26319
P_3 = 1024/1459 = 0.70185
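A numeric check of this machine-repair example (a sketch; I read the states as the number of working machines, matching the Q above):

```python
import numpy as np

# Rate matrix Q of the machine-repair example (state = number working)
Q = np.array([[-16.0,  16.0,   0.0,  0.0],
              [  1.0, -17.0,  16.0,  0.0],
              [  0.0,   2.0, -10.0,  8.0],
              [  0.0,   0.0,   3.0, -3.0]])

# 0 = P Q  <=>  Q^T P^T = 0; replace one equation by sum(P) = 1
A = Q.T.copy()
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 0.0, 1.0])
P = np.linalg.solve(A, b)
print(P)   # approx [0.00206, 0.0329, 0.2632, 0.7019] = [3, 48, 384, 1024]/1459
```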

    Example: M/M/1 Queue

[Rate diagram: states 0, 1, 2, 3, 4, …; arrivals move up at rate λ, services move down at rate μ.]

    | −λ      λ        0        0      ⋯ |
Q = |  μ   −(λ+μ)      λ        0      ⋯ |
    |  0      μ     −(λ+μ)      λ      ⋯ |
    |  ⋮                                  |

Writing out 0 = [P_0 P_1 P_2 ⋯] Q column by column (together with ∑_i P_i = 1):

−λ P_0 + μ P_1 = 0              →  P_1 = (λ/μ) P_0
λ P_0 − (λ+μ) P_1 + μ P_2 = 0   →  P_2 = (λ/μ)² P_0
λ P_1 − (λ+μ) P_2 + μ P_3 = 0   →  P_3 = (λ/μ)³ P_0
⋯

Generalizing this we get:

P_i = (λ/μ)^i P_0


Applying ∑_{i=0}^∞ P_i = 1:

P_0 ∑_{i=0}^∞ (λ/μ)^i = 1  →  P_0 = 1 − λ/μ   (for λ < μ)


    OR / STAT 645: Stochastic Processes

    Lecture 6: Markov Chains, Applications, Branching Processes

    Limiting Probabilities (Review)

DTMC:

π = πP and ∑_i π_i = 1.

Sufficient conditions for the limiting probabilities and a unique solution to exist: irreducible and ergodic.

CTMC:

0 = PQ and ∑_i P_i = 1.

Sufficient conditions for the limiting probabilities and a unique solution to exist: irreducible and positive recurrent.

Since (under the given assumptions) the solution is unique, if you can guess π_i or P_i and then verify the above equations, then the π_i or P_i are the limiting probabilities.

    CTMC Example: Tandem Queue

[Diagram: customers arrive at rate λ to Queue #1 (service rate μ_1); upon completing service at Queue #1 they move to Queue #2 (service rate μ_2), then depart.]

Assumptions:
- Exponential inter-arrival times
- Exponential service times
- All times independent

To model this as a CTMC, choose a 2-dimensional state space: X(t) = (a, b), where
- a is the number at queue 1 (including any customer in service);
- b is the number at queue 2 (including any customer in service).

For a Markov chain like this, it is hard to write out the transition matrix Q, because the state space is 2-dimensional. Instead, we write out the rate balance equations for each state.

Let P_{a,b} be the limiting probability of being in state (a, b). The rate balance equations are:

1. Node (a, b), where a, b ≥ 1:

(λ + μ_1 + μ_2) P_{a,b} = λ P_{a−1,b} + μ_1 P_{a+1,b−1} + μ_2 P_{a,b+1}

2. Node (0, b), where b ≥ 1:

(λ + μ_2) P_{0,b} = μ_1 P_{1,b−1} + μ_2 P_{0,b+1}

3. Node (a, 0), where a ≥ 1:


(λ + μ_1) P_{a,0} = λ P_{a−1,0} + μ_2 P_{a,1}

4. Node (0, 0):

λ P_{0,0} = μ_2 P_{0,1}

These equations are based on the following rate diagram.

[Rate diagram: grid of states (a, b), with the number in Queue 1 on the horizontal axis and the number in Queue 2 on the vertical axis; from each state, an arrival arrow at rate λ (a increases by 1), a Queue-1 completion arrow at rate μ_1 (a decreases by 1, b increases by 1), and a Queue-2 completion arrow at rate μ_2 (b decreases by 1). Equations 1-4 correspond to interior, left-edge, bottom-edge, and corner nodes.]

Now, we guess the form of P_{a,b} and show that it satisfies all of the equations above.

Clearly, the first queue operates as an M/M/1 queue. Recall, the limiting probabilities for an M/M/1 queue are:

P_n = (1 − λ/μ) (λ/μ)^n

Conjecture that the second queue operates as an independent M/M/1 queue:

Queue 1: arrival rate = λ, service rate = μ_1
Queue 2: arrival rate = λ, service rate = μ_2

Thus, the joint distribution is:

P_{a,b} = (1 − λ/μ_1)(λ/μ_1)^a (1 − λ/μ_2)(λ/μ_2)^b

We can regard the terms that do not depend on a or b as normalizing constants:

P_{a,b} = C (λ/μ_1)^a (λ/μ_2)^b

    Check that this solves the above equations.

Equation (1): (λ + μ_1 + μ_2) P_{a,b} = λ P_{a−1,b} + μ_1 P_{a+1,b−1} + μ_2 P_{a,b+1}



Plugging in:

(λ + μ_1 + μ_2) C (λ/μ_1)^a (λ/μ_2)^b
   = λ C (λ/μ_1)^{a−1} (λ/μ_2)^b + μ_1 C (λ/μ_1)^{a+1} (λ/μ_2)^{b−1} + μ_2 C (λ/μ_1)^a (λ/μ_2)^{b+1}

Dividing by C (λ/μ_1)^a (λ/μ_2)^b:

λ + μ_1 + μ_2 = λ (μ_1/λ) + μ_1 (λ/μ_1)(μ_2/λ) + μ_2 (λ/μ_2)

λ + μ_1 + μ_2 = μ_1 + μ_2 + λ ✓

    Also need to check for other equations, but we omit that here.

Summary: The steady-state distribution for the number in each queue is as if the two queues are independent M/M/1 queues. But the second queue is not really independent of the first…
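The product-form guess can also be spot-checked numerically (a sketch with arbitrary rates λ = 1, μ_1 = 2, μ_2 = 3; it verifies the interior balance equation (1) at a few states):

```python
lam, mu1, mu2 = 1.0, 2.0, 3.0      # arbitrary rates with lam < mu1, mu2

def P(a, b):
    """Product-form guess: two independent M/M/1 queues."""
    return (1 - lam/mu1) * (lam/mu1)**a * (1 - lam/mu2) * (lam/mu2)**b

for a, b in [(1, 1), (2, 3), (5, 2)]:
    lhs = (lam + mu1 + mu2) * P(a, b)
    rhs = lam*P(a-1, b) + mu1*P(a+1, b-1) + mu2*P(a, b+1)
    print(abs(lhs - rhs) < 1e-12)   # True: equation (1) holds
```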

    DTMC Example: Family Genetics

Consider left- and right-handed people. The book What to Expect the Toddler Years provides the following probabilities of having left-handed children based on the handedness of the parents:

Parents   Prob(Left-handed Child)
LL        a = 0.50
LR        b = 0.17
RR        c = 0.02

Using this information, what is the fraction p of left-handed people implied by this data?

Let X_n be the handedness of the first-born of the nth generation (X_n ∈ {L, R}).

A left-handed child:
- Marries a left-handed spouse with probability p
  o Has a left-handed kid with probability a
  o Has a right-handed kid with probability 1 − a
- Marries a right-handed spouse with probability 1 − p
  o Has a left-handed kid with probability b
  o Has a right-handed kid with probability 1 − b

A right-handed child:
- Marries a left-handed spouse with probability p
  o Has a left-handed kid with probability b
  o Has a right-handed kid with probability 1 − b
- Marries a right-handed spouse with probability 1 − p
  o Has a left-handed kid with probability c
  o Has a right-handed kid with probability 1 − c

    Thus, the transition matrix


       L                     R
L | pa + (1−p)b      1 − [pa + (1−p)b] |
R | pb + (1−p)c      1 − [pb + (1−p)c] |

Now, solve π = πP:

π_L = π_L [pa + (1−p)b] + π_R [pb + (1−p)c],

where π_L represents the probability of being left-handed for large n, which by definition is p. Also, we have π_R = 1 − π_L. So:

p = p [pa + (1−p)b] + (1−p) [pb + (1−p)c]
p = p² a + 2p(1−p) b + (1−p)² c
0 = (a − 2b + c) p² + (2b − 2c − 1) p + c
0 = 0.18 p² − 0.70 p + 0.02

p = [ 0.70 ± √(0.49 − 4(0.02)(0.18)) ] / 0.36

p = 3.86 or 0.03

Choose the value of p that is a probability, so p = 0.03.
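The same roots can be obtained numerically (a sketch using numpy):

```python
import numpy as np

a, b, c = 0.50, 0.17, 0.02
# 0 = (a - 2b + c) p^2 + (2b - 2c - 1) p + c
print(np.roots([a - 2*b + c, 2*b - 2*c - 1, c]))   # approx [3.86, 0.03]
```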

The actual percentage of left-handed people is about 10%. What explains the incorrect value of p?

- It is not really a Markov chain: the next state may also depend on the handedness of grandparents, great-grandparents, etc.
- There are other factors influencing the next state.
- The probability that two left-handed parents have a left-handed child may not equal 50% (the value presented in the book was approximate).

    Branching Process

Consider a population.

- Each individual produces j new offspring each period with probability p_j, j ≥ 0.
- Assume that p_j < 1 for all j (i.e., the problem is not deterministic).
- Let X_n be the size of the population at period n.

The process is usually modeled as a DTMC with sample path X_0, X_1, X_2, …


[State diagram: states 0, 1, 2, 3, 4, …]

How many communication classes?

- State 0 is absorbing.
- All other states are transient (assuming p_0 > 0), since one can get to 0 from any state, but one cannot get back from 0. In other words, it is possible that all individuals fail to produce any offspring during the same time period.
- Nevertheless, it is possible that the MC has an infinite positive drift to the right (in other words, every state is transient, but you don't have to end up in state 0).

Let μ be the average number of offspring per individual. That is, μ = ∑_{j=0}^∞ j p_j.

- If μ ≤ 1, the system will always end up in state 0 (the population dies out).
- If μ > 1, the system may end up in state 0 or may grow to infinity.

    Fundamental question: What is the probability the population survives indefinitely?

Let Z_i be the number of offspring of the ith individual from generation n − 1. Then,

X_n = ∑_{i=1}^{X_{n−1}} Z_i

E[X_n] = ∑_{k=0}^∞ E[X_n | X_{n−1} = k] P(X_{n−1} = k)
       = ∑_{k=0}^∞ E[ ∑_{i=1}^k Z_i ] P(X_{n−1} = k)
       = ∑_{k=0}^∞ kμ P(X_{n−1} = k)
       = μ E[X_{n−1}]

Thus (if we start with one individual):

E[X_0] = 1
E[X_1] = μ
E[X_2] = μ²
⋮
E[X_n] = μ^n
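A quick simulation illustrating E[X_n] = μ^n (a sketch; the offspring distribution p_0 = 0.3, p_1 = 0.3, p_2 = 0.4 from the example below gives μ = 1.1):

```python
import random

def mean_generation_sizes(n_gens, runs=20000):
    """Average population size per generation, starting from one individual."""
    totals = [0.0] * (n_gens + 1)
    for _ in range(runs):
        x = 1
        totals[0] += x
        for g in range(1, n_gens + 1):
            # each of the x individuals has 0, 1, or 2 offspring
            x = sum(random.choices([0, 1, 2], [0.3, 0.3, 0.4])[0]
                    for _ in range(x))
            totals[g] += x
    return [t / runs for t in totals]

print(mean_generation_sizes(5))   # approx [1, 1.1, 1.21, 1.331, ...] = 1.1**n
```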


Let π_0 be the probability that the population dies out. Condition on X_1:

π_0 = ∑_{j=0}^∞ P(population dies out | X_1 = j) P(X_1 = j)

(*)  π_0 = ∑_{j=0}^∞ π_0^j p_j

When μ > 1, it can be shown that π_0 is the smallest positive number satisfying (*).

Example
Suppose:

p_0 = 0.3
p_1 = 0.3
p_2 = 0.4

From this data, μ = 1.1.

π_0 = 0.3 + 0.3 π_0 + 0.4 π_0²
0 = 0.3 − 0.7 π_0 + 0.4 π_0²

π_0 = [ 0.7 ± √(0.49 − 0.48) ] / 0.8 = (0.7 − 0.1)/0.8 = 3/4   (taking the smaller root)
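π_0 can also be found by iterating the fixed-point equation (*) (a sketch; starting the iteration at 0 converges to the smallest nonnegative root):

```python
def extinction_prob(p, iters=200):
    """Iterate pi <- sum_j p_j * pi^j, starting from pi = 0."""
    pi = 0.0
    for _ in range(iters):
        pi = sum(pj * pi**j for j, pj in enumerate(p))
    return pi

print(extinction_prob([0.3, 0.3, 0.4]))   # 0.75 = 3/4
```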


    OR 645: Stochastic Models II

    Lecture 7: Markov Chains, Birth/Death Processes, Reversible Chains

Birth-Death Process

DTMC Birth-Death Process: Let X_n be a DTMC whose only possible transitions are up (+1) and down (−1), like a random walk. Births are represented by +1, deaths by −1. Births and deaths occur one at a time.

CTMC Birth-Death Process: Let X(t) be a CTMC representing the population size at time t (p. 353). With i people present, births (arrivals) occur at rate λ_i and deaths (departures) occur at rate μ_i. CTMC characteristics: (a) time in each state, (b) embedded DTMC.

Probability matrix of the embedded DTMC:

    |       0             1              0             0       ⋯ |
P = | μ_1/(λ_1+μ_1)       0        λ_1/(λ_1+μ_1)       0       ⋯ |
    |       0       μ_2/(λ_2+μ_2)        0       λ_2/(λ_2+μ_2) ⋯ |
    |       ⋮                                                     |

Transition rate matrix:

    | −λ_0      λ_0         0         0     ⋯ |
Q = |  μ_1   −(λ_1+μ_1)    λ_1        0     ⋯ |
    |   0       μ_2     −(λ_2+μ_2)   λ_2    ⋯ |
    |   ⋮                                      |

[Rate diagram: states 0, 1, 2, 3, 4, …; up-arrows at rates λ_i, down-arrows at rates μ_i.]

For an M/M/3 queue:

λ_0 = λ_1 = λ_2 = ⋯ = λ
μ_1 = μ, μ_2 = 2μ, μ_3 = μ_4 = μ_5 = ⋯ = 3μ

Expected Time to State n: Let T_i be the time to first get to i+1, starting at i. Condition on the 1st step. Let

I_i = 1 if the 1st step is i → i+1; I_i = 0 if the 1st step is i → i−1.

If we move up (e.g., 2 → 3):

E[T_i | I_i = 1] = 1/(λ_i + μ_i),

which occurs with probability

P[I_i = 1] = λ_i / (λ_i + μ_i).

If we move down first (e.g., 2 → 1 → … → 3):


E[T_i | I_i = 0] = 1/(λ_i + μ_i) + E[T_{i−1}] + E[T_i],

which occurs with probability

P[I_i = 0] = μ_i / (λ_i + μ_i).

Unconditionally, E[T_i] = E[ E[T_i | I_i] ]:

E[T_i] = [λ_i/(λ_i+μ_i)] · 1/(λ_i+μ_i) + [μ_i/(λ_i+μ_i)] · [ 1/(λ_i+μ_i) + E[T_{i−1}] + E[T_i] ]

Solving for E[T_i]:

E[T_i] = 1/λ_i + (μ_i/λ_i) E[T_{i−1}],

where the initial condition is E[T_0] = 1/λ_0.

Example (HW 6.20): There are two machines, one of which is used as a spare. A working machine will function for an exponential time with rate λ and will then fail. Upon failure, it is immediately replaced by the other machine if that one is in working order, and it goes to the repair facility. The repair facility consists of a single person who takes an exponential time with rate μ to repair a failed machine. At the repair facility, the newly failed machine enters service if the repairperson is free. If the repairperson is busy, it waits until the other machine is fixed; at that time, the newly repaired machine is put in service and repair begins on the other one. Starting with both machines working, find the expected value and variance of the time until both are in the repair facility. In the long run, what proportion of time is there a working machine?

[Rate diagram: states 0, 1, 2 = number of machines in repair; failures move right at rate λ, repairs move left at rate μ.]

Here the state is the number in repair, with λ_0 = λ_1 = λ and μ_1 = μ, so:

E[T_0] = 1/λ

E[T_1] = 1/λ + (μ/λ) E[T_0] = 1/λ + μ/λ²

E[time until both machines are in repair] = E[T_0] + E[T_1] = 2/λ + μ/λ²
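A sketch of the E[T_i] recursion applied to this example (λ = μ = 1 chosen arbitrarily for the check):

```python
def expected_passage_times(lams, mus):
    """E[T_i] = 1/lam_i + (mu_i/lam_i) E[T_{i-1}], with E[T_0] = 1/lam_0."""
    ET = [1.0 / lams[0]]
    for i in range(1, len(lams)):
        ET.append(1.0 / lams[i] + mus[i] / lams[i] * ET[i - 1])
    return ET

lam, mu = 1.0, 1.0
ET = expected_passage_times([lam, lam], [0.0, mu])   # mus[0] is unused
print(ET, sum(ET))   # [1/lam, 1/lam + mu/lam**2]; total = 2/lam + mu/lam**2 = 3
```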


Variance in Time to State n:

By the conditional variance formula V[X] = V[ E[X|Y] ] + E[ V[X|Y] ]:

V[T_i] = V[ E[T_i | I_i] ] + E[ V[T_i | I_i] ]

where

E[T_i | I_i = 1] = 1/(λ_i+μ_i)
E[T_i | I_i = 0] = 1/(λ_i+μ_i) + E[T_{i−1}] + E[T_i]

with P[I_i = 1] = λ_i/(λ_i+μ_i) = p and P[I_i = 0] = μ_i/(λ_i+μ_i) = q = 1 − p.

Now note that, apart from the constant 1/(λ_i+μ_i), E[T_i | I_i] is a Bernoulli-type variable taking the value A = E[T_{i−1}] + E[T_i] with probability q and the value 0 with probability p. For such a variable, V[X] = E[X²] − E[X]² = A²q − A²q² = A²pq. Also note that the constant term 1/(λ_i+μ_i) contributes 0 to the variance. So:

V[ E[T_i | I_i] ] = ( E[T_{i−1}] + E[T_i] )² pq

For the second term,

V[T_i | I_i = 1] = 1/(λ_i+μ_i)²   (the variance of the exponential first-step time)
V[T_i | I_i = 0] = 1/(λ_i+μ_i)² + V[T_{i−1}] + V[T_i]   (first step, plus time back to state i, plus time from i to i+1)

so

E[ V[T_i | I_i] ] = 1/(λ_i+μ_i)² + q ( V[T_{i−1}] + V[T_i] )

Combining,

V[T_i] = ( E[T_{i−1}] + E[T_i] )² pq + 1/(λ_i+μ_i)² + q ( V[T_{i−1}] + V[T_i] ),

and the initial condition is V[T_0] = 1/λ_0².
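The variance recursion can be implemented the same way (a sketch; V[T_i] is isolated by moving the q·V[T_i] term to the left side and dividing by 1 − q = p):

```python
def passage_time_moments(lams, mus):
    """E[T_i] and V[T_i] for a birth-death chain via the recursions above."""
    ET = [1.0 / lams[0]]
    VT = [1.0 / lams[0] ** 2]
    for i in range(1, len(lams)):
        p = lams[i] / (lams[i] + mus[i])      # P(first step is up)
        q = 1.0 - p
        ET.append(1.0 / lams[i] + mus[i] / lams[i] * ET[i - 1])
        A = ET[i - 1] + ET[i]
        # V = A^2 p q + 1/(lam+mu)^2 + q (V[i-1] + V)  =>  solve for V
        VT.append((A**2 * p * q + (lams[i] + mus[i]) ** -2
                   + q * VT[i - 1]) / p)
    return ET, VT

print(passage_time_moments([1.0, 1.0], [0.0, 1.0]))   # V[T_1] = 6 when lam = mu = 1
```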

Note: In a regular Markov chain, we had P_ij = P[X_{n+1} = j | X_n = i]. Using Bayes' formula, define the reversed-chain transition probabilities:

Q_ij = P[X_n = j | X_{n+1} = i] = P[X_{n+1} = i | X_n = j] P[X_n = j] / P[X_{n+1} = i]

(Note: Don't confuse this with the CTMC's q_ij.) Assume that this is an ergodic, irreducible Markov chain, and let n → ∞. Then:


a. Q_ij = π_j P_ji / π_i. This is always true.

b. A chain is time reversible if Q_ij = P_ij. This is only true for a time-reversible MC.

Given the following sequence:

2 1 2 3 1 2 2 1 3 1 2 1 3 2 1 3

What if we had to estimate the occurrence of the 1 → 2 pattern? P_12 ≈ 3/6 = 1/2. Reading the same sequence in reverse, we get Q_12 ≈ 4/6 = 2/3. In other words, NOT EVERY CHAIN is time reversible. A Markov chain is time reversible if the number of transitions from i → j equals the number of transitions going from j → i. Expressing

Q_ij = π_j P_ji / π_i   as   π_i P_ij = π_j P_ji,

we can see that the transition rate i → j must equal the transition rate j → i.
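These counts can be reproduced directly (a sketch; it tallies the fraction of transitions out of state 1 that go to state 2, for the sequence and for its reverse):

```python
def frac_1_to_2(seq):
    """Fraction of transitions leaving state 1 that go to state 2."""
    moves_from_1 = [b for a, b in zip(seq, seq[1:]) if a == 1]
    return sum(b == 2 for b in moves_from_1) / len(moves_from_1)

seq = [2, 1, 2, 3, 1, 2, 2, 1, 3, 1, 2, 1, 3, 2, 1, 3]
print(frac_1_to_2(seq))         # 3/6 = 0.5, estimating P_12
print(frac_1_to_2(seq[::-1]))   # 4/6 = 0.667, estimating Q_12
```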

Example: The following chain is not time reversible; we can go 1 → 3 but we cannot go 3 → 1.

[Diagram: states 1, 2, 3 with transitions in one direction only around a cycle.]

Example: Consider a chain on the states 0, 1, 2 with a sample path such as:

0 1 2 1 0 1 0 1 2 1 0

Along any such path, the number of i → i+1 transitions is always within one of the number of i+1 → i transitions, so the forward and reverse transition rates across each link must be equal in the long run.

Conclusion: a birth-death process is time reversible.

Theorem: If you can find π_i that satisfy π_i P_ij = π_j P_ji and ∑_i π_i = 1, then the chain is time reversible and the π_i are the limiting probabilities. In other words, if you can guess a solution to π_i P_ij = π_j P_ji, then the Markov chain is time reversible.

Example: Given the following random walk:

[Diagram: states 0, 1, 2, 3; up-probability p and down-probability q at the interior states, with an up-probability of 1 at state 0.]

Is this a time-reversible Markov chain? YES. The typical approach (global balance) is:

1. π_0 = q π_1
2. (p + q) π_1 = π_0 + q π_2
3. (p + q) π_2 = p π_1 + q π_3

Time-reversible (guess) approach: let i = 0 and j = 1, and use π_i P_ij = π_j P_ji for each adjacent pair. We get:

1. π_0 (1) = π_1 q


2. π_1 p = π_2 q
3. π_2 p = π_3 q

Cut Method: Using a midterm exam problem to illustrate. Given the following Markov chain:

[Diagram: states 0, 1, 2, 3; upward probabilities 3/10, 2/10, 1/10 (for 0 → 1, 1 → 2, 2 → 3) and downward probabilities 1/2, 1, 1 (for 1 → 0, 2 → 1, 3 → 2), as used in the balance equations below.]

Cut across both paths between two neighboring states. The rate at which you cross from 1 → 2 must equal the rate of crossings going 2 → 1. The chain is reversible by inspection; this balance has to be true for every possible cut. Key point: π_i P_ij = π_j P_ji and ∑_i π_i = 1.

Example: Consider the midterm problem represented in the above figure.

Go from 0 → 1 and 1 → 0. Using π_i P_ij = π_j P_ji:

(3/10) π_0 = (1/2) π_1   →   π_1 = (3/5) π_0

Go from 1 → 2 and 2 → 1:

(2/10) π_1 = (1) π_2   →   π_2 = (2/10)(3/5) π_0 = (3/25) π_0

Go from 2 → 3 and 3 → 2:

(1/10) π_2 = (1) π_3   →   π_3 = (1/10)(3/25) π_0 = (3/250) π_0

Normalizing, π_0 + π_1 + π_2 + π_3 = π_0 (1 + 3/5 + 3/25 + 3/250) = 1, so

(π_0, π_1, π_2, π_3) = (250/433, 150/433, 30/433, 3/433)

Birth-death processes are time reversible, but a time-reversible process need not be a birth-death process (i.e., there are other Markov chains that are reversible besides birth-death chains).

Example: Consider the M/M/3 queue represented in the following figure.

[Rate diagram: states 0, 1, 2, 3, 4, 5, …; up-rate λ; down-rates μ, 2μ, 3μ, 3μ, 3μ, ….]

Assume: arrival rate = λ and service rate = μ for each individual server.

Go from 0 → 1 and 1 → 0. Using detailed balance, P_i q_ij = P_j q_ji:

λ P_0 = μ P_1   →   P_1 = (λ/μ) P_0


λ P_1 = 2μ P_2   →   P_2 = (λ/2μ) P_1 = [λ²/(2μ²)] P_0

λ P_2 = 3μ P_3   →   P_3 = (λ/3μ) P_2 = [λ³/(3! μ³)] P_0

and for any n > 3,

P_n = [λ^n / (3^{n−3} 3! μ^n)] P_0

Proposition 6.8 (p. 381): A time-reversible chain with limiting probabilities P_j, j ∈ S, that is truncated to the set A ⊆ S and remains irreducible is also time reversible and has limiting probabilities P_j^A given by

P_j^A = P_j / ∑_{i∈A} P_i,   j ∈ A

[Diagram: states 0, 1, 2, 3, 4, 5; states 4 and 5 are eliminated to create the truncated Markov chain on A = {0, 1, 2, 3}.]

This is a renormalization: having thrown away some states (see the figure above), the probabilities of the new truncated Markov chain again satisfy ∑_i P_i^A = 1, while the individual probabilities keep the same relationship relative to each other. The renormalization constant is ∑_{i∈A} P_i.


Example: M/M/3/3 (truncate the previous example to A = {0, 1, 2, 3})

P_3^A = [λ³/(3! μ³)] P_0 / [ ∑_{j=0}^3 (1/j!) (λ/μ)^j P_0 ]

where the renormalization constant is ∑_{j=0}^3 (1/j!) (λ/μ)^j P_0.
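A sketch computing these truncated probabilities (λ = 2, μ = 1 chosen arbitrarily; the last entry, P_3^A, is the fraction of time all three servers are busy):

```python
import math

def mmcc_probs(lam, mu, c=3):
    """Truncated (M/M/c/c) limiting probabilities:
    P_j^A proportional to (lam/mu)^j / j!, for j = 0..c."""
    r = lam / mu
    w = [r**j / math.factorial(j) for j in range(c + 1)]
    s = sum(w)
    return [x / s for x in w]

probs = mmcc_probs(2.0, 1.0)
print(probs, probs[-1])   # last entry = P_3^A, the blocking probability
```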

    In general, the Blocking Probability for M/M/C/C Queue is