
    Lectures on Stochastic Stability

    Sergey FOSS

    Heriot-Watt University

This mini-course presents an overview of stochastic stability methods, mostly motivated by (but not limited to) stochastic network applications. We work with stochastic recursive sequences and, in particular, Markov chains, in a general Polish state space. We discuss and compare methods based on (i) Lyapunov functions and fluid limits, (ii) explicit coupling (renovating events and Harris chains), (iii) monotonicity, and some others. We also discuss instability methods and perfect simulation methods.

Lectures are based on handouts of my lecture notes (Colorado State Uni, 1996; Novosibirsk State Uni, 1997–2000; Kazakh National University, 2007), on the joint overview paper with Takis Konstantopoulos (2004), on notes written by us for a Short LMS/EPSRC Course for PhD students (September 2006), and on some (more-or-less) recent publications.


    Table of Topics

    1. Introduction.

2. Lyapunov Techniques. Criteria for Positive Recurrence and for Instability.

3. Fluid Approximation Approach.

    4. Coupling and Harris Chains.

    5. Monotonicity and Saturation Rule.

    6. Renovation Theory, Perfect Simulation.

    7. Some intriguing open problems.


    1 Lecture 1. Basic Tools.

    1.1 Notation, Acronyms, and Basic Concepts

r.v. random variable

i.i.d. independent identically distributed

X, Y, Z, ξ, η, ζ, . . . notation for r.v.s

F, G distribution functions; f density function

P probability and probability measure; E expectation; D variance

ξ ∼ F means P(ξ ≤ x) = F(x) for all x

ξ ∼ P means P(ξ ∈ B) = P(B), ∀B ∈ B

I(A), or 1(A), is the indicator function of the event A: I(A) = 1 if A occurs, and I(A) = 0 otherwise.

Here are standard families of distributions:

U[a, b] (uniform), G(p) (geometric), E(λ) (exponential), B(m, p) (binomial), N(a, σ²) (normal), Π(λ) (Poisson).

Convergence:

ξn →a.s. ξ means P(lim ξn = ξ) = 1 or, equivalently, ∀ε > 0, P(sup_{m≥n} |ξm − ξ| > ε) → 0 as n → ∞.

ξn →p ξ means: ∀ε > 0, P(|ξn − ξ| > ε) → 0 as n → ∞.

The same for random vectors.


Key Properties of Convergence. Let → mean either →a.s. or →p.

(1) If ξn → ξ and ηn → η, then (ξn, ηn) → (ξ, η).

(2) If ξn → ξ and if g is a continuous function, then g(ξn) → g(ξ).

(3) More generally, assume that g is not continuous everywhere and denote by Dg the set of its discontinuity points. If ξn → ξ and if P(ξ ∈ Dg) = 0, then g(ξn) → g(ξ).

Weak convergence of distribution functions: Fn ⇒ F if, for each x at which F is continuous,

Fn(x) → F(x).

Equivalent form: Fn ⇒ F if, for any bounded and continuous function g, ∫ g(x) dFn(x) → ∫ g(x) dF(x).

Comment on terminology: weak convergence is the most common term. Other terms are convergence of/in distribution(s) and convergence in law.

Weak convergence of random variables: ξn ⇒ ξ. It means: ξn ∼ Fn, ξ ∼ F, and Fn ⇒ F.

Note that ξn ⇒ ξ is just a convenient notation! There need be no actual convergence of the random variables on sample paths.

Relations between convergence types:

ξn →a.s. ξ implies ξn →p ξ, and ξn →p ξ implies ξn ⇒ ξ.

Both converse statements are false. Here are two examples:

Example 1. Weak convergence does not imply convergence in probability. Let P(ξ1 = 1) = P(ξ1 = −1) = 1/2 and ξn+1 = −ξn, n = 1, 2, . . ..

Example 2. Convergence in probability does not imply a.s. convergence. Let (Ω, F, P) = ((0, 1], B(0,1], λ), where λ is the Lebesgue measure. Let ξ0 ≡ 1. For m = 1, 2, . . ., for n such that 1 + 2 + . . . + 2^{m−1} < n ≤ 1 + 2 + . . . + 2^{m−1} + 2^m, and for i = n − (1 + 2 + . . . + 2^{m−1}), set

ξn(ω) = 1 if ω ∈ ((i − 1)/2^m, i/2^m], and ξn(ω) = 0 otherwise.

Then P(ξn ≠ 0) = 2^{−m} → 0, so ξn →p 0, while, along every sample path ω, ξn(ω) = 1 for exactly one n in each block of length 2^m, so there is no a.s. convergence.
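Here is a minimal Python sketch of this "typewriter" example (the block range and variable names are ours, purely for illustration): along one sample path ω, each block of length 2^m contains exactly one index n with ξn(ω) = 1, even though P(ξn ≠ 0) → 0.

    # Typewriter sequence: xi_n -> 0 in probability but not a.s.
    import numpy as np

    rng = np.random.default_rng(0)
    omega = rng.uniform(0.0, 1.0)        # one point of Omega = (0, 1]

    xi = []
    for m in range(1, 11):               # blocks m = 1, ..., 10
        for i in range(1, 2**m + 1):     # i-th dyadic interval of level m
            xi.append(1 if (i - 1) / 2**m < omega <= i / 2**m else 0)

    # Exactly one '1' per block: xi_n(omega) = 1 infinitely often,
    # while P(xi_n != 0) = 2**(-m) -> 0.
    print(sum(xi), "ones among", len(xi), "terms (one per block)")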


Laws of Large Numbers.

If ξ, ξ1, ξ2, . . . are i.i.d. random variables with a finite mean, say a = Eξ, and Sn = ξ1 + . . . + ξn, then the Weak Law of Large Numbers (WLLN) says:

Sn/n →p a as n → ∞,

and the Strong Law of Large Numbers (SLLN) says:

Sn/n →a.s. a as n → ∞.

Lebesgue and Beppo Levi Theorems.

Theorem (Beppo Levi). If {ξn} is an a.s. non-negative and non-decreasing sequence of random variables, then

E lim_n ξn = lim_n Eξn,

where both sides are either finite or infinite simultaneously.


Coupling.

ξ′ is a copy of ξ if they have the same distribution: ξ′ =D ξ. In general, ξ and ξ′ may be defined on different probability spaces.

(a) Coupling of distribution functions (d.f.s) or of probability measures.

For two d.f.s F1 and F2, their coupling is a construction of a bivariate distribution function F(x1, x2) such that F(x1, ∞) = F1(x1) and F(∞, x2) = F2(x2).

Similarly, for two probability measures P1 and P2 on the real line, their coupling is a probability measure P on the plane R² such that its projections are P1 and P2.

The same definitions of coupling may be introduced for any number of distributions (distribution functions, probability measures).

Such a coupling may also be viewed as follows: we define a probability space (Ω, F, P) and two random variables η1 and η2 on this space such that η1 ∼ F1 and η2 ∼ F2 (or, in other notation, η1 ∼ P1 and η2 ∼ P2). Then their joint distribution, say F, has marginals F1 and F2 (or, equivalently, the probability measure P̃(B) = P((η1, η2) ∈ B) has marginals P1 and P2).

(b) Coupling of two random variables.

Let ξ1 be defined on (Ω1, F1, P1) and ξ2 be defined on (Ω2, F2, P2). A coupling of these two r.v.s is defined by, first, an introduction of a new probability space, say (Ω, F, P), and, then, by defining a pair of r.v.s η1, η2 on this space such that η1 =D ξ1, η2 =D ξ2.

Examples:

(1) F1 = U(0, 1), F2 = U(0, 1);

(2) F1 = U(0, 1), F2 = E(1);

(3) F1 = U(0, 1), F2 = Π(1);

(4) F1 = B(n, p), F2 = Π(np);

(5) F1 has a density 2x · I(x ∈ (0, 1)) and F2 a density 2(1 − x) · I(x ∈ (0, 1)).

In each example, there are many couplings!
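As a small illustration, here is a Python sketch of two different couplings for example (2), F1 = U(0, 1) and F2 = E(1) (the construction via a common uniform is standard; the code itself is ours): both couplings have the same marginals but very different joint distributions.

    # Two couplings of U(0,1) and E(1) with identical marginals
    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.uniform(size=100_000)

    # Coupling A: comonotone, eta2 = F2^{-1}(eta1) = -log(1 - eta1)
    eta1_a, eta2_a = u, -np.log(1.0 - u)

    # Coupling B: independent components
    eta1_b, eta2_b = u, -np.log(1.0 - rng.uniform(size=u.size))

    print(eta2_a.mean(), eta2_b.mean())       # both close to 1 = E eta2
    print(np.corrcoef(eta1_a, eta2_a)[0, 1])  # strongly positive
    print(np.corrcoef(eta1_b, eta2_b)[0, 1])  # close to 0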


    1.2 Weak and strong convergence

Lemma 0. If Fn ⇒ F (all Fn and F are d.f.s), then there exists a coupling of {Fn} and F such that

ηn →a.s. η.

Proof. For a d.f. F, define its inverse F^{−1} by

F^{−1}(z) = inf{x : F(x) ≥ z}, z ∈ (0, 1).

Let Ω = (0, 1), F be the σ-algebra of Borel subsets of (0, 1), and P the Lebesgue measure on (0, 1).

Set U(ω) = ω, ∀ω ∈ Ω. Then U ∼ U(0, 1).

Let ηn = Fn^{−1}(U), η = F^{−1}(U), and show that ηn →a.s. η. Note that ηn ∼ Fn, η ∼ F.

In order to avoid some technicalities, assume, for simplicity, that all d.f.s are continuous. Let

η̲n = inf_{m≥n} ηm, η̄n = sup_{m≥n} ηm, F̄n = sup_{m≥n} Fm, F̲n = inf_{m≥n} Fm.

Then η̲n ∼ F̄n and η̄n ∼ F̲n. Indeed,

P(η̲n ≤ x) = P(η̲n < x) = P(∃m ≥ n : ηm < x) = P(∃m ≥ n : Fm^{−1}(U) < x) = P(∃m ≥ n : U < Fm(x)) = P(U < sup_{m≥n} Fm(x)) = F̄n(x).

Similarly, P(η̄n > x) = . . . = 1 − F̲n(x).

Since F̄n ↓ F and F̲n ↑ F pointwise (because Fn(x) → F(x) for every x), it is sufficient to show that, for instance, η̲n →a.s. η.

But both {F̄n} and {η̲n} are monotone as functions of n! Then η̲n ↑ a.s. and, therefore, there exists η̃ such that η̲n →a.s. η̃. Moreover, η̲n = F̄n^{−1}(U) ≤ F^{−1}(U) = η (the infimum of the inverses is the inverse of the supremum, and F̄n ≥ F pointwise), so η̃ ≤ η a.s.

If P(η̃ ≠ η) > 0, then there exists x such that

P(η̃ ≤ x) > P(η ≤ x).

But P(η̃ ≤ x) = lim F̄n(x) = F(x) = lim Fn(x) = P(η ≤ x)!

Thus, we get a contradiction, and η̲n →a.s. η. By similar arguments, η̄n →a.s. η.

Therefore, ηn →a.s. η.

Problem No 1. Prove this lemma without the additional assumption that all d.f.s are continuous.

Exercises: What is F^{−1} for the following distribution functions: U(0, 1), E(λ), N(0, 1), B(1, p), B(n, p), Π(λ), . . .?
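A numeric sketch of the quantile coupling used in the proof of Lemma 0, assuming for concreteness Fn = E(1 + 1/n) and F = E(1), for which the inverses are explicit (the model choice is ours): the same uniform U drives every ηn, and the convergence holds along each sample path.

    # Quantile coupling: eta_n = F_n^{-1}(U) -> eta = F^{-1}(U) a.s.
    import numpy as np

    rng = np.random.default_rng(2)
    U = rng.uniform(size=5)                  # common uniform, 5 sample paths

    def F_inv(z, lam):                       # inverse of the E(lam) d.f.
        return -np.log(1.0 - z) / lam

    eta = F_inv(U, 1.0)
    for n in (1, 10, 100, 1000):
        eta_n = F_inv(U, 1.0 + 1.0 / n)      # same U on every path
        print(n, np.max(np.abs(eta_n - eta)))  # -> 0 pathwise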


    1.3 Uniform integrability

Let {ξn}n≥1 be a sequence of real-valued r.v.s.

Definition 1. {ξn} are uniformly integrable (UI) if E|ξn| < ∞ ∀n and, moreover,

sup_n E{|ξn| · I(|ξn| ≥ x)} ≡ h(x) → 0 as x → ∞.

Comments:

Actually, we can put > instead of ≥ in the definition above. But I prefer to keep ≥ since I want the upper bound h(x) to be monotone non-increasing and right-continuous.

Clearly, if {ξn} are UI, then sup_n E|ξn| is finite.

Examples: (1) ξn ∼ E(λn), n = 1, 2, . . ., are UI if and only if inf_n λn > 0.

(2) Let ξn take the values 2n, −2n, 0 with probabilities

P(ξn = 2n) = P(ξn = −2n) = 1/(2n), P(ξn = 0) = 1 − 1/n.

Then E|ξn| = 2, Eξn = 0, and ξn →p 0, but {ξn} are not UI! (See the sketch below.)
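A quick exact computation for example (2), assuming the reconstruction of the distribution above: the truncated means do not vanish as x grows, because the atoms at ±2n escape past any fixed level x.

    # h(x) = sup_n E{|xi_n| I(|xi_n| >= x)} stays equal to 2 for all x
    def tail_mean(n, x):
        # E{|xi_n| I(|xi_n| >= x)} for xi_n = +-2n w.p. 1/(2n) each
        return 2.0 if 2 * n >= x else 0.0

    for x in (10, 100, 1000):
        print(x, max(tail_mean(n, x) for n in range(1, 10_000)))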

Lemma 1. The following are equivalent:

(i) {ξn} are UI;

(ii) there exists a function g : [0, ∞) → [0, ∞) such that:

(a) g(0) > 0, g is non-decreasing, and lim_{x→∞} g(x) = ∞;

(b) sup_n E{|ξn| · g(|ξn|)} < ∞.

Note: the condition g(0) > 0 is not essential!

Proof.

(ii) ⟹ (i). For each n,

E{|ξn| · I(|ξn| ≥ x)} ≤ E{|ξn| · (g(|ξn|)/g(x)) · I(|ξn| ≥ x)} ≤ (1/g(x)) · sup_n E{|ξn| · g(|ξn|)} → 0 as x → ∞.

(i) ⟹ (ii). Assume that h(x) > 0 ∀x (otherwise the statement is trivial).

For m ∈ Z, let

Am = {x : 1/2^{2(m+1)} < h(x) ≤ 1/2^{2m}}

and, for x ∈ Am, let g(x) = 2^m. Since h(0) ≤ sup_n E|ξn| < ∞, the sets Am are empty for all sufficiently small m, so m is bounded from below, say m ≥ m0.

Note that Am is an interval which is closed from the left and open from the right. Denote by zm its left boundary point, zm ∈ Am. Then

E{|ξn| · g(|ξn|)} = Σ_m E{|ξn| · g(|ξn|) · I(|ξn| ∈ Am)}


= Σ_m 2^m · E{|ξn| · I(|ξn| ∈ Am)} ≤ Σ_m 2^m · E{|ξn| · I(|ξn| ≥ zm)} ≤ Σ_m 2^m · h(zm) ≤ Σ_{m≥m0} 2^m · (1/2^{2m}) = Σ_{m≥m0} 2^{−m} < ∞,

which completes the proof.

Lemma 2. Let ξn →a.s. ξ. Then:

(1) if {ξn} are UI, then E|ξ| < ∞ and Eξn → Eξ;

(2) if the ξn are a.s. non-negative and Eξn → Eξ < ∞, then {ξn} are UI.

Proof of (1).

(a) Assume first that all the r.v.s take values in a common bounded interval, say [−x, x]. Then, for any ε > 0, |Eξn − Eξ| ≤ ε + 2x · P(|ξn − ξ| > ε), and P(|ξn − ξ| > ε) → 0; since ε > 0 is arbitrary, Eξn → Eξ (and, similarly, E|ξn| → E|ξ|).

(b) Assume now that at least one of the distributions of the r.v.s has an unbounded support, that is, P(|ξn| ≥ N) > 0 ∀N for some n.

(b1) Fix any x > 0 such that P(|ξ| = x) = 0. Since ξn →a.s. ξ,

ξ′n ≡ ξn · I(|ξn| < x) →a.s. ξ · I(|ξ| < x) ≡ ξ′.

Then

∀n, P(|ξ′n| ≤ x) = P(|ξ′| ≤ x) = 1 ⟹ Eξ′n → Eξ′ (see (a));

and

|ξ′n| ≤ |ξn| a.s. ⟹ E|ξ′n| ≤ E|ξn| ≤ sup_n E|ξn| ≡ K ∀n ⟹ E|ξ′| ≤ K.


(b2) Show first that E|ξ| < ∞. Indeed,

E|ξ| = lim_{N→∞} E{|ξ| · I(|ξ| ≤ N)} ≤ K < ∞.

Now, given ε > 0, choose x such that P(|ξ| = x) = 0, h(x) ≤ ε, and E{|ξ| · I(|ξ| ≥ x)} ≤ ε.

Let

δn = E{|ξn| · I(|ξn| ≥ x)} and δ = E{|ξ| · I(|ξ| ≥ x)}.

Then

E|ξn| = E{|ξn| · I(|ξn| < x)} + δn,

E|ξ| = E{|ξ| · I(|ξ| < x)} + δ.

Since δn ≤ h(x) ≤ ε ∀n and δ ≤ ε, and since E{|ξn| · I(|ξn| < x)} → E{|ξ| · I(|ξ| < x)} by (b1),

lim sup (E|ξn| − E|ξ|) ≤ 2ε and lim inf (E|ξn| − E|ξ|) ≥ −2ε for any ε.

Letting ε tend to 0, we obtain the first statement of the lemma.

Prove now the second statement. First, from Eξ < ∞: given ε > 0, choose x0 = x0(ε) such that P(ξ = x0) = 0 and

E{ξ · I(ξ ≥ x0)} ≤ ε/2.

Then we may use part (b1) from the proof of (1): for the given x0,

Eξn → Eξ ⟹ E{ξn · I(ξn ≥ x0)} = Eξn − E{ξn · I(ξn < x0)} → Eξ − E{ξ · I(ξ < x0)} = E{ξ · I(ξ ≥ x0)} ≤ ε/2.

Therefore, there exists n(ε) such that

E{ξn · I(ξn ≥ x0)} ≤ ε ∀n > n(ε).

Now, for each n = 1, 2, . . . , n(ε),

Eξn < ∞ ⟹ ∃xn : E{ξn · I(ξn ≥ xn)} ≤ ε.

Let x = max(x1, . . . , x_{n(ε)}, x0). Then

E{ξn · I(ξn ≥ x)} ≤ ε ∀n.

Thus,

sup_n E{ξn · I(ξn ≥ x)} → 0 as x → ∞.


    1.4 Some useful properties of UI

Property 1. If {ξn} are UI and if {ηn} are such that |ηn| ≤ |ξn| a.s. ∀n, then {ηn} are UI.

Indeed, let h(x) be from Definition 1. Then, ∀x > 0,

E{|ηn| · I(|ηn| > x)} ≤ E{|ξn| · I(|ηn| > x)} ≤ E{|ξn| · I(|ξn| > x)} ≤ h(x).

Property 2. If {ξn} is an i.i.d. sequence with finite mean, E|ξ1| < ∞, then {ξn} are UI.

Property 3. More generally, it is enough to have an integrable stochastic majorant: if there exists an r.v. η such that P(|ξn| > x) ≤ P(|η| > x) ∀n, ∀x (that is, |ξn| ≤st |η| ∀n), and if E|η| < ∞, then {ξn} are UI.


The UI notion and the statements above extend to families of r.v.s indexed by a continuous parameter t ∈ T:

(a) The statement and the proof of Lemma 1 stay the same if we replace n = 1, 2, . . . by t ∈ T.

(b) Similarly, the statement and the proof of Lemma 2 stay unchanged if we replace n = 1, 2, . . . by t ∈ T = [0, ∞).

(c) Properties 1 and 3 still hold if we replace n = 1, 2, . . . by t ∈ T.


1.5 Coupling inequality. Maximal coupling. Dobrushin's theorem.

In this section, we assume that random variables are not necessarily real-valued and may take values in a general measurable space (X, BX), which is assumed to be a complete separable metric space.

The Coupling Inequality

Let ξ1, ξ2 : (Ω, F, P) → (X, BX) be two X-valued r.v.s. Let

P1(B) = P(ξ1 ∈ B), P2(B) = P(ξ2 ∈ B), B ∈ BX.

Then, for B ∈ BX,

P1(B) − P2(B) = P(ξ1 ∈ B, ξ1 = ξ2) + P(ξ1 ∈ B, ξ1 ≠ ξ2) − P(ξ2 ∈ B, ξ1 = ξ2) − P(ξ2 ∈ B, ξ1 ≠ ξ2) = P(ξ1 ∈ B, ξ1 ≠ ξ2) − P(ξ2 ∈ B, ξ1 ≠ ξ2) ≤ P(ξ1 ≠ ξ2).

Therefore, for any B ∈ BX, |P1(B) − P2(B)| ≤ P(ξ1 ≠ ξ2), that is,

(∗) sup_{B∈BX} |P1(B) − P2(B)| ≤ P(ξ1 ≠ ξ2).

The Maximal Coupling

Now we reformulate the result obtained. Note that the LHS of inequality (∗) depends on the marginal distributions P1 and P2 only and does not depend on the joint distribution of ξ1 and ξ2. Therefore, we get the following: for any coupling of the marginal distributions P1 and P2, inequality (∗) holds. Equivalently,

(∗∗) sup_{B∈BX} |P1(B) − P2(B)| ≤ inf over all couplings of P(ξ1 ≠ ξ2).

The following questions seem to be natural:

(?) Is there equality in (∗∗)?

(??) If the answer is yes, then does there exist a coupling such that

sup_{B∈BX} |P1(B) − P2(B)| = P(ξ1 ≠ ξ2)?


The answers to both questions are positive! And this is the content of Dobrushin's theorem.

Theorem 1. Let P1 and P2 be two probability measures on a complete separable metric space (X, BX). There exists a coupling of these probability measures such that, for ξi ∼ Pi, i = 1, 2,

sup_{B∈BX} |P1(B) − P2(B)| = P(ξ1 ≠ ξ2).

Proof. ν(B) = P1(B) − P2(B) is a signed measure. Then the Banach theorem states that there exists a subset C ⊆ X such that

(a) ν(B) ≤ 0 ∀B ⊆ C;

(b) ν(B) ≥ 0 ∀B ⊆ X \ C ≡ C̄.

Note:

1) if ν(C̄) = 0, then P1 = P2 and the coupling is obvious;

2) ν(C) = −ν(C̄).

Assume ν(C̄) > 0. Introduce 4 distributions (probability measures):

Q1,1 is defined by Q1,1 = U(C) if P1(C) = 0, and by

Q1,1(B) = P1(C ∩ B)/P1(C), B ∈ BX, otherwise;

Q2,1 is defined by

Q2,1(B) = (P2(C ∩ B) − P1(C ∩ B))/ν(C̄), B ∈ BX.

Similarly, Q2,2 is defined by Q2,2 = U(C̄) if P2(C̄) = 0, and by

Q2,2(B) = P2(C̄ ∩ B)/P2(C̄), B ∈ BX, otherwise;

Q1,2 is defined by

Q1,2(B) = (P1(C̄ ∩ B) − P2(C̄ ∩ B))/ν(C̄), B ∈ BX.

Then introduce 5 mutually independent r.v.s:

η1,1 ∼ Q1,1, η1,2 ∼ Q1,2, η2,1 ∼ Q2,1, η2,2 ∼ Q2,2,

and β taking the values 1, 2, 0 with probabilities

P(β = 1) = P1(C), P(β = 2) = P2(C̄), P(β = 0) = ν(C̄).


Now we can define ξ1 and ξ2 as follows:

ξ1 = η1,1 · I(β = 1) + η2,2 · I(β = 2) + η1,2 · I(β = 0),

ξ2 = η1,1 · I(β = 1) + η2,2 · I(β = 2) + η2,1 · I(β = 0).

Simple calculations show that ξi ∼ Pi, i = 1, 2. This is Problem No 3 for you.

Then, since Q1,2 is concentrated on C̄ and Q2,1 on C,

P(ξ1 ≠ ξ2) = P(β = 0) = ν(C̄) ≤ sup_{B∈BX} |P1(B) − P2(B)|.

So, by (∗),

P(ξ1 ≠ ξ2) = sup_{B∈BX} |P1(B) − P2(B)|.

Comment. The Banach theorem and the Radon-Nikodym theorem are two equivalent statements formulated in slightly different ways.

There is (formally!) another proof (see, e.g., T. Lindvall's book on the coupling method) based on the Radon-Nikodym theorem:

Consider the new probability measure P(·) = (P1(·) + P2(·))/2. Let fi = dPi/dP be the corresponding densities. Then

sup_{B∈BX} |P1(B) − P2(B)| = 1 − ∫ min(f1(x), f2(x)) P(dx),

and we may repeat the previous construction using densities.

What is the maximal coupling in the following examples:

(1) Two discrete two-point distributions.

(2) Two absolutely continuous distributions on (0, 1) with densities f1 and f2.

(3) Bernoulli and Poisson distributions.

(4) Normal and exponential distributions.

(This is another exercise for you.)
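In the spirit of these examples, here is a Python sketch of the maximal coupling of two discrete distributions on a common finite set, following the density form of the proof (the sampling scheme and names are ours): with probability α = Σ min(f1, f2) the two coordinates coincide; otherwise they are drawn from the normalized residuals, which have disjoint supports.

    # Maximal coupling of two discrete distributions on {0, 1, 2}
    import numpy as np

    rng = np.random.default_rng(3)
    p1 = np.array([0.5, 0.3, 0.2])
    p2 = np.array([0.2, 0.3, 0.5])

    common = np.minimum(p1, p2)          # min(f1, f2)
    alpha = common.sum()                 # = 1 - sup_B |P1(B) - P2(B)|
    tv = 0.5 * np.abs(p1 - p2).sum()     # total variation distance

    def sample_pair():
        if rng.uniform() < alpha:        # coupled part: xi1 = xi2
            x = rng.choice(3, p=common / alpha)
            return x, x
        x1 = rng.choice(3, p=(p1 - common) / (1 - alpha))  # residual of P1
        x2 = rng.choice(3, p=(p2 - common) / (1 - alpha))  # residual of P2
        return x1, x2                    # disjoint supports: x1 != x2

    pairs = [sample_pair() for _ in range(100_000)]
    print(np.mean([a != b for a, b in pairs]), "~", tv)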


    1.6 Probabilistic Metrics

    Dobrushins theorem provides a positive solution to one of important problems in the

    theory of Probabilistic Metrics. We will discuss briefly basic concepts of this theory.Again, consider a complete separable metric space (X,BX)and introduce the following

    notation:

    1)X2 = X X,

    2) B2X

    = BX BX is a-algebra in X2 generated by all sets B1 B2,B1, B2 BX,

    3)diag(X2) ={(x, x), x X}.

    Problem No 4. Prove thatdiag(X2) B2X

    . (Actually, there is no need to assume

    that the state space is complete separable metric, and the minimal requirement for

    diag(X2) B2X

    to hold is that the sigma-algebra BX is countably generated).

Let P be any probability distribution on (X², B²X). Denote by Pi, i = 1, 2, its marginal distributions:

P1(B) = P(B × X), P2(B) = P(X × B), B ∈ BX.

Let 𝒫 be the set of all probability distributions (measures) on (X², B²X).

Definition 3. A function d : 𝒫 → [0, ∞) is called a probabilistic metric if it satisfies the following conditions:

(1) P(diag(X²)) = 1 ⟹ d(P) = 0;

(2) d(P) = 0 ⟹ P1 = P2;

(3) the triangle inequality: if P(1) has marginals P1 and P2, P(2) has marginals P1 and P3, and P(3) has marginals P3 and P2, then

d(P(1)) ≤ d(P(2)) + d(P(3)).

Definition 4. A probabilistic metric d is simple if it depends on the marginal distributions only (i.e. if P(1) and P(2) have the same marginals, then d(P(1)) = d(P(2))), and complex otherwise.

For a simple metric, it is reasonable to write d(P1, P2) instead of d(P), so d has the meaning of a distance between P1 and P2.

For a complex metric, we may also write d(ξ1, ξ2) instead of d(P), where (ξ1, ξ2) is a coupling of two r.v.s with joint distribution P,

P(B) = P((ξ1, ξ2) ∈ B), B ∈ B²X.

So, d(ξ1, ξ2) may be considered as a distance between r.v.s.


We can also write d(ξ1, ξ2) for simple metrics. In this case, d(ξ1, ξ2) = d(F1, F2) = d(P1, P2).

Examples.

Simple metrics:

1) sup_{B∈B} |P1(B) − P2(B)| (total variation norm, T.V.N.)

and, for real-valued r.v.s:

3) sup_x |F1(x) − F2(x)| (uniform metric, U.M.)

4) inf{ε > 0 : F1(x − ε) − ε ≤ F2(x) ≤ F1(x + ε) + ε ∀x} (Levy metric, L.M.)

Complex metrics:

2) P(ξ1 ≠ ξ2) ≡ P(X² \ diag(X²)) (indicator metric, I.M.)

5) inf{ε > 0 : P(|ξ1 − ξ2| > ε) < ε} (Ki Fan metric, K.F.M.)
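A crude numeric sketch of metrics 3)-5), assuming F1 = U(0, 1) and F2 = E(1); the grid search below is only an illustration of the definitions, not an efficient algorithm.

    # Uniform metric, Levy metric, and the Ki Fan metric of one coupling
    import numpy as np

    xs = np.linspace(-1.0, 3.0, 4001)
    F1 = np.clip(xs, 0.0, 1.0)                     # d.f. of U(0,1)
    F2 = np.where(xs > 0, 1.0 - np.exp(-xs), 0.0)  # d.f. of E(1)

    print("U.M. ~", np.max(np.abs(F1 - F2)))       # sup_x |F1 - F2|

    def levy_ok(eps):  # F1(x-eps)-eps <= F2(x) <= F1(x+eps)+eps for all x?
        lo = np.clip(xs - eps, 0.0, 1.0) - eps
        hi = np.clip(xs + eps, 0.0, 1.0) + eps
        return np.all((lo <= F2) & (F2 <= hi))

    print("L.M. ~", min(e for e in np.arange(0.0, 1.0, 1e-3) if levy_ok(e)))

    # K.F.M. of the comonotone coupling (xi1, xi2) = (U, -log(1 - U))
    u = np.random.default_rng(4).uniform(size=200_000)
    d = np.abs(u + np.log(1.0 - u))
    print("K.F.M. ~", min(e for e in np.arange(1e-3, 1.0, 1e-3)
                          if np.mean(d > e) < e))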

One of the key problems in the theory of probabilistic metrics is to find answers to the following questions.

Assume a simple metric d(P1, P2) is given. Does there exist a complex metric d̂ such that

(a) the following coupling inequality holds:

d(P1, P2) ≤ inf over all couplings of d̂(ξ1, ξ2)? (Compare with (∗∗).)

(b) If yes, then is it possible to replace ≤ by = in (a)?

(c) Does there exist a coupling such that d(P1, P2) = d̂(ξ1, ξ2)?

The following result holds:

Theorem 2. The answers to the above questions are positive for the metrics:

(1) d = T.V.N., d̂ = I.M.;

(2) d = L.M., d̂ = K.F.M.

Comment. Statement (1) is Dobrushin's theorem. Statement (2) is Strassen's theorem (its proof is omitted).


    1.7 Stopping times

    Let ,F, Pbe a probability space and {n}n1 a sequence of r.v.s, n : R.

    Denote byFn a -algebra, generated byn:Fn F; Fn = {

    1n (B), B B},

    whereB is a -algebra of Borel sets in R.

    Then, for 1 k n, F[k,n] is a -algebra generated byk, . . . , n; i.e.

    F F[k,n] is a minimal -algebra such that

    F[k,n] Fl for all l = k, . . . , n.

    Another way to describe F[k,n] is:

    let k,n := (k, . . . , n)be a random vector; k,n: Rnk+1. Then

    F[k,n]= {1k,n(B), B Bnk+1},

    whereBnk+1 is a -algebra of Borel sets in Rnk+1.

    Finally,F[1,) is a -algebra generated by the whole sequence {n}n1.

Good Property:

∀A ∈ F[1,∞), there exists a sequence of events {An}n≥1, An ∈ F[1,n], such that

P(A \ An) + P(An \ A) → 0 as n → ∞.

Let now τ : Ω → {1, 2, . . . , n, . . .} be an integer-valued r.v. (we say it is a counting r.v.).

Definition 5. τ is a stopping time (ST) with respect to {ξn} if, ∀n ≥ 1,

{τ = n} ∈ F[1,n]

(or, equivalently, {τ ≤ n} ∈ F[1,n]).

Another variant of a definition of a stopping time is:

Definition 6. τ is an ST if there exists a family of functions hn : R^n → {0, 1} such that:

∀n ≥ 1, I(τ = n) = hn(ξ1, . . . , ξn) a.s.

(or, equivalently, I(τ ≤ n) = h̃n(ξ1, . . . , ξn) a.s.).

Examples of STs (see also the sketch below):

(1) τ = min{n ≥ 1 : ξn > x};

(2) τ = min{n ≥ 1 : Σ_{i=1}^n ξi > x};

(3) More examples. . . .
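A tiny Python sketch of example (2) (the names are ours): deciding whether τ = n inspects only ξ1, . . . , ξn, which is exactly the stopping-time property.

    # First-passage time tau = min{n >= 1 : xi_1 + ... + xi_n > x}
    import numpy as np

    def tau(xi, x):
        s = 0.0
        for n, v in enumerate(xi, start=1):   # uses xi_1..xi_n only
            s += v
            if s > x:
                return n
        return None                           # not reached on this horizon

    rng = np.random.default_rng(5)
    print(tau(rng.exponential(size=1000), 10.0))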

Assume now that {ξn} is an i.i.d. sequence and τ is an ST with P(τ < ∞) = 1. Define ηi = ξτ+i, i = 1, 2, . . ..


Lemma 3. The following statements hold:

1) {ηi} is an i.i.d. sequence;

2) ηi =D ξ1;

3) {ηi}i≥1 and the random vector (τ, ξ1, . . . , ξτ) are mutually independent.

Corollary 1. {ηi}i≥1 and Sτ ≡ ξ1 + . . . + ξτ are mutually independent.

Proof of Lemma 3. It is sufficient to show that,

∀k ≥ 1, ∀m ≥ 1, ∀ Borel sets B1, . . . , Bk and C1, . . . , Cm,

(∗) P({τ = k; ξ1 ∈ B1, . . . , ξk ∈ Bk} ∩ {η1 ∈ C1, . . . , ηm ∈ Cm}) = P(τ = k; ξ1 ∈ B1, . . . , ξk ∈ Bk) · P(ξ1 ∈ C1, . . . , ξm ∈ Cm).

Indeed, (∗) ⟹ 1), 2), and 3).

First, take B1 = . . . = Bk = R. Then, ∀m, by the total probability formula,

(∗∗) P(η1 ∈ C1, . . . , ηm ∈ Cm) = Σ_{k=1}^∞ P(τ = k; η1 ∈ C1, . . . , ηm ∈ Cm) = [by (∗)] = Σ_{k=1}^∞ P(τ = k) · Π_{i=1}^m P(ξ1 ∈ Ci) = Π_{i=1}^m P(ξ1 ∈ Ci).

In particular, ∀j and ∀Cj, we can take m ≥ j and Ci = R for i ≠ j. Then

the LHS of (∗∗) = P(ηj ∈ Cj), the RHS of (∗∗) = P(ξ1 ∈ Cj)

⟹ 2).

Now, take any C1, . . . , Cm and replace in (∗∗) Π_{i=1}^m P(ξ1 ∈ Ci) by Π_{i=1}^m P(ηi ∈ Ci)

⟹ 1).

Finally, take any B1, . . . , Bk and C1, . . . , Cm and replace in (∗) Π_{i=1}^m P(ξ1 ∈ Ci) by Π_{i=1}^m P(ηi ∈ Ci)

⟹ 3).

So, we will prove (∗) now:

P({τ = k; ξ1 ∈ B1, . . . , ξk ∈ Bk} ∩ {η1 ∈ C1, . . . , ηm ∈ Cm}) =

P({hk(ξ1, . . . , ξk) = 1; ξ1 ∈ B1, . . . , ξk ∈ Bk} ∩ {ξk+1 ∈ C1, . . . , ξk+m ∈ Cm}),

where the first event belongs to F[1,k] and the second to F[k+1,k+m]. These two σ-algebras are independent, so the probability factorizes:

= P(. . .) · P(. . .) = P(. . .) · Π_{i=1}^m P(ξk+i ∈ Ci) = P(. . .) · Π_{i=1}^m P(ξ1 ∈ Ci).


Lemma 4. (Wald identity)

Assume that E|ξ1| < ∞ and Eτ < ∞. Then

E Sτ = Eτ · Eξ1, where Sτ = ξ1 + . . . + ξτ.
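A Monte Carlo check of the Wald identity under an assumed concrete model (ξi ∼ E(1) and the first-passage time τ of Section 1.7); the identity is the lemma's, the simulation is ours.

    # Check E S_tau = E tau * E xi_1 for xi ~ E(1), tau = min{n : S_n > x}
    import numpy as np

    rng = np.random.default_rng(6)
    x, runs = 5.0, 20_000
    sum_S, sum_tau = 0.0, 0
    for _ in range(runs):
        s, n = 0.0, 0
        while s <= x:                 # here P(tau < infinity) = 1
            s += rng.exponential()
            n += 1
        sum_S += s
        sum_tau += n
    print(sum_S / runs, "~", (sum_tau / runs) * 1.0)   # E xi_1 = 1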


Now let us change the notation: write ξ(1)i instead of ξi and τ(1) instead of τ, and set ξ(2)i = ξ(1)τ(1)+i; then, given an ST τ(2) w.r.t. {ξ(2)i}, set ξ(3)i = ξ(2)τ(2)+i, and so on.

Lemma 6. If τ(j) is an ST w.r.t. {ξ(j)i}i≥1 for j = 1, . . . , J, and if {ξ(j+1)i} = {ξ(j)τ(j)+i}, then τ(1) + . . . + τ(J) is an ST w.r.t. {ξi}i≥1.

Problem No 5. Prove Lemma 6.


    1.8 Two-dimensional stopping times

Let {ξn,1}n≥1 and {ξn,2}n≥1 be two sequences of r.v.s and F[k1,n1]×[k2,n2] the σ-algebra generated by ξk1,1, ξk1+1,1, . . . , ξn1,1; ξk2,2, ξk2+1,2, . . . , ξn2,2.

Definition 7. A pair of r.v.s τ1, τ2 : Ω → {1, 2, . . .} is an ST w.r.t. {ξn,1} and {ξn,2} if,

∀n1 ≥ 1, n2 ≥ 1, {τ1 = n1, τ2 = n2} ∈ F[1,n1]×[1,n2].

Lemma 7. If {ξn,1}n≥1 and {ξn,2}n≥1 are two mutually independent i.i.d. sequences and if (τ1, τ2) is an ST, then

1) each of the sequences

{ηi,1} ≡ {ξτ1+i,1} and {ηi,2} ≡ {ξτ2+i,2}

is i.i.d., and these sequences are mutually independent;

2) ηi,1 =D ξ1,1, ηi,2 =D ξ1,2;

3) {{ηi,1}i≥1; {ηi,2}i≥1} and the random vector

(τ1, τ2; ξ1,1, . . . , ξτ1,1; ξ1,2, . . . , ξτ2,2)

are mutually independent.

Proof is omitted.

Lemma 8. In the conditions of Lemma 7, assume, in addition, that ξ1,1 =D ξ1,2. Then the sequence {ζn}n≥1,

ζn = ξn,1 if n ≤ τ1, and ζn = ξ_{n−τ1+τ2,2} if n > τ1,

is i.i.d., and ζn =D ξ1,1.

Proof. We have to show that, ∀n = 1, 2, . . . and ∀B1, . . . , Bn,

P(ζ1 ∈ B1, . . . , ζn ∈ Bn) = Π_{i=1}^n P(ξ1,1 ∈ Bi).

1) ∀n, ∀B,

P(ζn ∈ B) = P(ξn,1 ∈ B; n ≤ τ1) + P(ξ_{n−τ1+τ2,2} ∈ B; n > τ1).


Here,

P(ξn,1 ∈ B; n ≤ τ1) = P(ξ1,1 ∈ B) − P(ξ1,1 ∈ B) · P(τ1 < n) = P(ξn,1 ∈ B) · P(n ≤ τ1),

and

P(ξ_{n−τ1+τ2,2} ∈ B; n > τ1) = Σ_{l=1}^{n−1} P(ξ_{τ2+n−l,2} ∈ B; τ1 = l) = Σ_{l=1}^{n−1} P(η_{n−l,2} ∈ B; τ1 = l) = . . . = P(ξ1,2 ∈ B) · P(τ1 < n).

Since ξ1,1 =D ξ1,2, summing up gives P(ζn ∈ B) = P(ξ1,1 ∈ B).

2) Problem No 6. Prove the statement for joint distributions. Use induction arguments.

Here is another variant of a two-dimensional analogue of Lemma 3.

Lemma 9. Assume that

(i) ηn = (ξn,1, ξn,2) is a sequence (n = 1, 2, . . .) of independent random vectors;

(ii) each of {ξn,1}n≥1 and {ξn,2}n≥1 is an i.i.d. sequence;

(iii) ξ1,1 =D ξ1,2;

(iv) (τ1, τ2) is an ST and τ1 = τ2 ≡ τ.

Then

ζn = ξn,1 if n ≤ τ, and ζn = ξn,2 if n > τ,

is an i.i.d. sequence; ζn =D ξ1,1.

Proof is very similar to that of Lemma 8 (omitted).

Finally, here is a further generalization of Lemma 9.

Lemma 10. In the statement of Lemma 9, replace (i) by

(i′): ∃m1 ≥ 1, m2 ≥ 1 such that ηn = (ξ_{(n−1)m1+1,1}, . . . , ξ_{nm1,1}; ξ_{(n−1)m2+1,2}, . . . , ξ_{nm2,2}) is an i.i.d. sequence;

and (iv) by

(iv′): (τ1, τ2) is an ST, P(τ1 ∈ {m1, 2m1, . . .}) = P(τ2 ∈ {m2, 2m2, . . .}) = 1, and τ1/m1 = τ2/m2.

Then

ζn = ξn,1 if n ≤ τ1, and ζn = ξ_{n−τ1+τ2,2} if n > τ1,

is an i.i.d. sequence; ζn =D ξ1,1.

Problem No 7. Prove Lemma 10.


    1.9 Stationary Sequences and Processes

Discrete Time

Definition 8.

(a) Let {ξn}n≥0 be a sequence of r.v.s. It is stationary if, ∀l = 1, 2, . . ., ∀0 ≤ i1 < i2 < . . . < il, ∀B1, . . . , Bl ∈ B, ∀m = 1, 2, . . .,

P(ξi1 ∈ B1, . . . , ξil ∈ Bl) = P(ξi1+m ∈ B1, . . . , ξil+m ∈ Bl). (1)

(b) Similarly, a double-infinite sequence {ξn}, −∞ < n < ∞, is stationary if (1) holds ∀m ∈ Z and ∀B1, . . . , Bl ∈ B.

Continuous Time

Definition 8′.

(a) Let {ξt}t≥0 be a family of r.v.s. It is stationary if, ∀l = 1, 2, . . ., ∀0 ≤ t1 < t2 < . . . < tl, ∀B1, . . . , Bl ∈ B, ∀u ≥ 0,

P(ξt1 ∈ B1, . . . , ξtl ∈ Bl) = P(ξt1+u ∈ B1, . . . , ξtl+u ∈ Bl).

(b) Similarly, {ξt}, −∞ < t < ∞, is stationary if the above equality holds ∀u ∈ R and ∀B1, . . . , Bl ∈ B.

Definition 9. A sequence of events {An}, −∞ < n < ∞, is stationary if the sequence of random variables {I(An)}, −∞ < n < ∞, is stationary.

Assume that {An}, −∞ < n < ∞, is a stationary sequence of events such that P(A0) > 0 and P(∪_{n=0}^∞ An) = 1.

Introduce the following r.v.s:

ψ+ = min{n ≥ 1 : I(An) = 1} ≡ min{n ≥ 1 : ω ∈ An},

ψ− = min{n ≥ 1 : I(A−n) = 1},

τ+ : P(τ+ > n) = P(Ā1 ∩ . . . ∩ Ān | A0),

τ− : P(τ− > n) = P(Ā−1 ∩ . . . ∩ Ā−n | A0).

Lemma 11.

(a) ψ+ =D ψ−;

(b) τ+ =D τ−;

(c) P(ψ = n) = P(A0) · P(τ ≥ n), ∀n = 1, 2, . . ., where ψ ≡ ψ+ and τ ≡ τ+.

Remark 4. The statement of the lemma is not obvious, in general.


Examples: Let {ξn} be an i.i.d. sequence with P(ξn > 0) > 0. Then we can take a) An = {ξn > 0}; b) An = {ξn + ξn−1 > 0}. A simulation check of statement (c) in example a) is sketched after the proof below.

Proof of Lemma 11.

(a) By stationarity (shift by m, then take m = −n − 1),

P(ψ+ > n) = P(Ā1 ∩ . . . ∩ Ān) = P(Ā1+m ∩ . . . ∩ Ān+m) = P(Ā−n ∩ . . . ∩ Ā−1) = P(ψ− > n).

(b)

P(τ+ = n) = P(A0 ∩ Ā1 ∩ . . . ∩ Ān−1 ∩ An)/P(A0) = P(A−n ∩ Ā−n+1 ∩ . . . ∩ Ā−1 ∩ A0)/P(A0) = P(τ− = n).

(c)

P(ψ ≥ n) = P(Ā1 ∩ . . . ∩ Ān−1) = P(A0 ∩ Ā1 ∩ . . . ∩ Ān−1) + P(Ā0 ∩ Ā1 ∩ . . . ∩ Ān−1)

= P(A0) · P(τ ≥ n) + P(Ā1 ∩ . . . ∩ Ān) = P(A0) · P(τ ≥ n) + P(ψ ≥ n + 1)

⟹ P(ψ = n) = P(ψ ≥ n) − P(ψ ≥ n + 1) = P(A0) · P(τ ≥ n).
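A quick Monte Carlo check of statement (c) in example a), assuming An = {ξn > 0} with i.i.d. ξn and p = P(A0); in this independent case both ψ and τ are geometric, so P(ψ = n) = p(1 − p)^{n−1} = P(A0) · P(τ ≥ n).

    # Check P(psi = n) = P(A0) P(tau >= n) for A_n = {xi_n > 0}
    import numpy as np

    rng = np.random.default_rng(7)
    p, runs = 0.3, 200_000
    ind = rng.uniform(size=(runs, 60)) < p    # I(A_n), n = 1..60
    psi = ind.argmax(axis=1) + 1              # first n with A_n
    # (paths with no A_n within 60 steps have probability 0.7**60, negligible)
    for n in (1, 2, 5):
        print(n, np.mean(psi == n), "~", p * (1 - p) ** (n - 1))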

Corollary 2. ∀k > 0, Eψ^k < ∞ if and only if Eτ^{k+1} < ∞.

Indeed, by (c), Eψ^k = P(A0) · Σ_{j≥1} P(τ = j) · Σ_{n=1}^j n^k, and j^{k+1}/(k + 1) ≤ Σ_{n=1}^j n^k ≤ j^{k+1}, so that, using similar arguments with the lower bound,

(P(A0)/(k + 1)) · Eτ^{k+1} ≤ Eψ^k ≤ P(A0) · Eτ^{k+1}

⟹ Eψ^k and Eτ^{k+1} are either finite or infinite simultaneously.


1.10 On σ-algebras generated by a sequence of r.v.s

(1). Let (Ω, F, P) be a probability space and ξn : Ω → R, n = 1, 2, . . ., a sequence of r.v.s. Let F[k,n] = σ(ξk, . . . , ξn) and F[k,∞) = σ(ξk, ξk+1, . . .).

For A, B ∈ F, introduce a distance

d(A, B) = P(A \ B) + P(B \ A).

(A) Recall basic properties of σ-algebras.

1) If F(1), F(2) are σ-algebras on Ω, then F(1) ∩ F(2) is a σ-algebra, too, but F(1) ∪ F(2) may be not, in general.

2) More generally, let T be any parameter set and F(t), t ∈ T, σ-algebras on Ω; then ∩_{t∈T} F(t) is a σ-algebra, too.

By definition, F[1,∞) is the minimal σ-algebra which contains all σ-algebras F[1,n], n = 1, 2, . . .; it is the intersection of all σ-algebras that contain every F[1,n], n = 1, 2, . . ..

Since F ⊇ F[1,n] ∀n, we have F[1,∞) ⊆ F.

(B) Now we study properties of the distance d:

(1) Clearly, d(A, B) = d(B, A) ≥ 0.

(2) d(A, C) ≤ d(A, B) + d(B, C) (the triangle inequality). Indeed,

A \ C ⊆ (A \ B) ∪ (A ∩ (B \ C)) ⊆ (A \ B) ∪ (B \ C) ⟹ P(A \ C) ≤ P(A \ B) + P(B \ C),

and, similarly, P(C \ A) ≤ P(B \ A) + P(C \ B).

(3) d(Ā, B̄) = d(A, B) (since Ā \ B̄ = B \ A).

(4) |P(A) − P(B)| = |P(A \ B) − P(B \ A)| ≤ d(A, B).

(5) d(A1 ∪ A2, B1 ∪ B2) ≤ d(A1, B1) + d(A2, B2). Indeed,

(A1 ∪ A2) \ (B1 ∪ B2) = (A1 \ (B1 ∪ B2)) ∪ (A2 \ (B1 ∪ B2)) ⊆ (A1 \ B1) ∪ (A2 \ B2)

⟹ P((A1 ∪ A2) \ (B1 ∪ B2)) ≤ P(A1 \ B1) + P(A2 \ B2).

Lemma 12. ∀A ∈ F[1,∞), ∃{An}n≥1, An ∈ F[1,n] : d(A, An) → 0.


Proof. Let U be the set of events A ∈ F such that ∃{An}n≥1, An ∈ F[1,n] : d(A, An) → 0.

1) One can easily see that U ⊇ F[1,m] ∀m = 1, 2, . . .. Indeed, ∀m and ∀A ∈ F[1,m], let

An = ∅ if n < m; An = A if n ≥ m.

Therefore, A ∈ U.

2) Thus, it is sufficient to show that U is a σ-algebra: then, with necessity, U ⊇ F[1,∞), which completes the proof.

2.1) First we prove that U is an algebra, i.e.

(i) Ω ∈ U;

(ii) A ∈ U ⟹ Ā ∈ U;

(iii) ∀k, A(1), . . . , A(k) ∈ U ⟹ A(1) ∪ . . . ∪ A(k) ∈ U.

(i) is obvious, (ii) follows from property (3), and (iii) follows from (5):

d(A(1) ∪ . . . ∪ A(k), A(1)n ∪ . . . ∪ A(k)n) ≤ Σ_{j=1}^k d(A(j), A(j)n) → 0.

2.2) Now we prove that U is a σ-algebra:

(iii′) A(1), A(2), . . . ∈ U ⟹ A ≡ ∪_{j=1}^∞ A(j) ∈ U.

Let B(k) = ∪_{j=1}^k A(j). Then B(k) ↑ A and P(B(k)) → P(A), so d(A, B(k)) = P(A \ B(k)) → 0.

For each k, ∃{B(k)n} : B(k)n ∈ F[1,n], d(B(k), B(k)n) → 0 as n → ∞.

Choose

n(1) = min{n ≥ 1 : d(B(1), B(1)l) ≤ 1/2 ∀l ≥ n}

and, for k ≥ 1,

n(k + 1) = min{n ≥ n(k) : d(B(k+1), B(k+1)l) ≤ 1/2^{k+1} ∀l ≥ n}.

Then let

An = ∅ if n < n(1); An = B(k)n if n(k) ≤ n < n(k + 1).

Clearly, An ∈ F[1,n]. Then d(A, An) ≤ d(A, B(k)) + 1/2^k for n(k) ≤ n < n(k + 1).

Since k → ∞ as n → ∞, d(A, An) → 0.


Lemma 13. Let {ξn}, −∞ < n < ∞, be a double-infinite sequence of r.v.s and

F(−∞,∞) = σ{. . . , ξ−2, ξ−1, ξ0, ξ1, ξ2, . . .}.

Then ∀A ∈ F(−∞,∞), ∃{An}, An ∈ F[−n,n] : d(A, An) → 0.

    Problem No 8. Prove Lemma 13.

(2). Sigma-algebras generated by sequences of independent r.v.s.

Definition 10. For a sequence {ξn}n≥1 of r.v.s, its tail σ-algebra is

F∞ = ∩_{k=1}^∞ F[k,∞).

Note: since F[k+1,∞) ⊆ F[k,∞), we have F∞ = ∩_{k=l}^∞ F[k,∞) ∀l.

Definition 11. For a double-infinite sequence {ξn}, −∞ < n < ∞, the two tail σ-algebras are

F∞ = ∩_{k=1}^∞ F[k,∞) and F−∞ = ∩_{k=1}^∞ F(−∞,−k], where F(−∞,−k] = σ(. . . , ξ−k−1, ξ−k).


Lemma 15. If {ξn}, −∞ < n < ∞, is a sequence of independent r.v.s, then both F∞ and F−∞ are trivial (i.e. contain only events of probability 0 or 1).

Problem No 9. Prove Lemma 15.

(3). A stationary sequence of r.v.s.

Definition 12. A sequence {ξn}n≥1 (or {ξn}, −∞ < n < ∞) is stationary if, ∀l ≥ 1, ∀1 ≤ n1 < n2 < . . . < nl (or without the restriction n1 ≥ 1), ∀k ≥ 1 (or ∀ −∞ < k < ∞),

(ξn1, . . . , ξnl) =D (ξn1+k, . . . , ξnl+k).

In what follows, θ denotes the shift transformation, (θξ)n = ξn+1, and, for an event A, θA denotes the corresponding shifted event.


Definition 13. An F[1,∞)-measurable (or F(−∞,∞)-measurable) r.v. η is invariant (w.r.t. θ) if

η ∘ θ = η a.s. (i.e. P(η ∘ θ = η) = 1).

An event A ∈ F[1,∞) (or A ∈ F(−∞,∞)) is invariant (w.r.t. θ) if

P(A ∩ θA) = P(A).

Note that η ∘ θ = η a.s. ⟺ ∀x,

P({η ≤ x} ∩ θ{η ≤ x}) = P(η ≤ x).

    Comments, examples...

Definition 14. A stationary sequence {ξn} is ergodic (w.r.t. θ) if, ∀A ∈ F[1,∞) (or A ∈ F(−∞,∞)),

A is invariant ⟹ P(A) = 0 or 1

(or: η is invariant ⟹ η = const a.s.).

Remark 5. All invariant events (sets) form a σ-algebra F(inv) (the invariant σ-algebra).

Lemma 16.

(1) ∀A ∈ F[1,∞) (or A ∈ F(−∞,∞)), the sequence of events {θ^n A, n ≥ 0} (or {θ^n A, −∞ < n < ∞}) is stationary;

(2) if {ξn} is stationary ergodic, then ∀A ∈ F[1,∞) (or A ∈ F(−∞,∞)) with P(A) > 0,

P(∪_{n=l}^∞ θ^n A) = 1 ∀l (and P(∪_{n=−∞}^l θ^n A) = 1 ∀l).

Proof. (1) follows from the definitions.

(2) Let B = ∪_{n=l}^∞ θ^n A. Then

θB = θ(∪_{n=l}^∞ θ^n A) = ∪_{n=l+1}^∞ θ^n A,

and θB ⊆ B

⟹ P(B ∩ θB) = P(θB) = P(B) ⟹ B is invariant

⟹ P(B) = 0 or 1.

But P(B) ≥ P(θ^l A) = P(A) > 0 ⟹ P(B) = 1.

Lemma 17. If A is invariant, then ∃B ∈ F∞ such that d(A, B) = 0.


Proof. There are two cases: (a) F[1,∞); (b) F(−∞,∞). Here we give a proof in the first case.

Problem No 10. Prove the lemma in case (b).

1) Let B0,m = A ∪ θA ∪ θ²A ∪ . . . ∪ θ^m A and B0 = ∪_{n=0}^∞ θ^n A. Then

A = B0,0 ⊆ B0,1 ⊆ . . . ⊆ B0,m ⊆ B0,m+1 ⊆ . . . ⊆ B0

and P(B0,m) ↑ P(B0). But

P(B0,m) = P(A) ∀m (since A is invariant, each θ^n A differs from A by a null event) ⟹ P(B0) = P(A) and d(B0, A) = 0.

2) For k ≥ 1, put

Bk = θ^k B0 = ∪_{n=k}^∞ θ^n A.

Note that Bk+1 ⊆ Bk and Bk ∈ F[k,∞),

P(Bk) = P(B0) = P(A) and d(Bk, A) = 0.

Let

B = lim_k Bk ≡ ∩_k Bk ⟹ P(B) = P(A) and d(B, A) = 0.

Since B ∈ F[k,∞) ∀k ⟹ B ∈ F∞.

Remark 6. In the case F(−∞,∞), the symmetric statement is true, too: if A is invariant, then ∃B ∈ F−∞ such that d(A, B) = 0.

Corollary 3. Any i.i.d. sequence is stationary ergodic. Indeed,

F∞ is trivial ⟹ if A is invariant, then ∃B ∈ F∞ with P(B) = 0 or 1 and d(A, B) = 0 ⟹ P(A) = 0 or 1.

Remark 7. There exists a number of weaker conditions that imply the triviality of the tail σ-algebra F∞ and, as a corollary, the ergodicity of a stationary sequence.

For instance, we can introduce the following mixing coefficients:

dk = sup_{B∈F[k,∞), A∈F(−∞,0]} |P(A ∩ B) − P(A) · P(B)|,

and then show that if dk → 0 as k → ∞, then F∞ is trivial.

In general, there are examples when F∞ is not trivial, but F(inv) is (i.e. the sequence is ergodic).

Example: ξn+1 = −ξn ∀n, where

ξ1 = 1 w.pr. 1/2 and ξ1 = −1 w.pr. 1/2.

Then the sequence {ξn} is stationary and ergodic, while F∞ is not trivial.