Concentration inequalities and martingale inequalities: a survey


  • 8/9/2019 Concentration inequalities and martingale inequalities a survey

    1/43

    Concentration inequalities and martingale

    inequalities a survey

    Fan Chung Linyuan Lu

    May 28, 2006

    Abstract

We examine a number of generalized and extended versions of concentration inequalities and martingale inequalities. These inequalities are effective for analyzing processes with quite general conditions, as illustrated in an example of an infinite Pólya process and webgraphs.

    1 Introduction

One of the main tools in probabilistic analysis is the concentration inequality. Basically, concentration inequalities give a sharp prediction of the actual value of a random variable by bounding the error term (from the expected value) with an associated probability. The classical concentration inequalities, such as those for the binomial distribution, have the best possible error estimates with exponentially small probabilistic bounds. Such concentration inequalities usually require certain independence assumptions (i.e., the random variable can be decomposed as a sum of independent random variables).

When the independence assumptions do not hold, it is still desirable to have similar, albeit slightly weaker, inequalities at our disposal. One approach is the martingale method. If the random variable and the associated probability space can be organized into a chain of events with modified probability spaces, and if the incremental changes of the value of the event are small, then the martingale inequalities provide very good error estimates. The reader is referred to numerous textbooks [5, 17, 20] on this subject.

In the past few years, there has been a great deal of research on analyzing general random graph models for realistic massive graphs which have uneven degree distributions, such as the power law [1, 2, 3, 4, 6]. The usual concentration inequalities and martingale inequalities have often been found to be inadequate and in many cases not feasible. The reasons are multi-fold: Due to the uneven degree distribution, the error bounds for the very large degrees offset the delicate

Fan Chung, University of California, San Diego, [email protected]. Research supported in part by NSF Grants DMS 0100472 and ITR 0205061.
Linyuan Lu, University of South Carolina, [email protected].


analysis in the sparse part of the graph. For the setup of the martingales, a uniform upper bound for the incremental changes is often too poor to be of any use. Furthermore, the graph is dynamically evolving, and therefore the probability space is changing at each tick of time.

In spite of these difficulties, it is highly desirable to extend the classical concentration inequalities and martingale inequalities so that rigorous analysis for random graphs with general degree distributions can be carried out. Indeed, in the course of studying general random graphs, a number of variations and generalizations of concentration inequalities and martingale inequalities have been scattered around. It is the goal of this survey to put together these extensions and generalizations and to present a more complete picture. We will examine and compare these inequalities, and complete proofs will be given. Needless to say, this survey is far from complete, since all the work is quite recent and the selection is heavily influenced by our personal learning experience on this topic. Indeed, many of these inequalities have been included in our previous papers [9, 10, 11, 12].

In addition to numerous variations of the inequalities, we also include an example of an application to a generalization of Pólya's urn problem. Due to the fundamental nature of these concentration inequalities and martingale inequalities, they may be useful for many other problems as well.

This paper is organized as follows:

1. Introduction: overview, recent developments, and summary.

2. Binomial distribution and its asymptotic behavior: the normalized binomial distribution and the Poisson distribution.

3. General Chernoff inequalities: sums of independent random variables in five different concentration inequalities.

4. More concentration inequalities: five more variations of the concentration inequalities.

5. Martingales and Azuma's inequality: basics for martingales and proofs for Azuma's inequality.

6. General martingale inequalities: four general versions of martingale inequalities with proofs.

7. Supermartingales and submartingales: modifying the definitions for martingales while still preserving the effectiveness of the martingale inequalities.

8. The decision tree and relaxed concentration inequalities: instead of the worst-case incremental bound (the Lipschitz condition), only certain local conditions are required.

9. A generalized Pólya's urn problem: an application to an infinite Pólya process using these general concentration and martingale inequalities.


For webgraphs generated by the preferential attachment scheme, the concentration for the power law degree distribution can be derived in a similar way.

2 The binomial distribution and its asymptotic behavior

Bernoulli trials, named after James Bernoulli, can be thought of as a sequence of coin flips. For some fixed value p, where 0 ≤ p ≤ 1, the outcome of the coin tossing process has probability p of getting a head. Let S_n denote the number of heads after n tosses. We can write S_n as a sum of independent random variables X_i as follows:

S_n = X_1 + X_2 + ··· + X_n,

where, for each i, the random variable X_i satisfies

Pr(X_i = 1) = p,  Pr(X_i = 0) = 1 − p.    (1)

A classical question is to determine the distribution of S_n. It is not too difficult to see that S_n has the binomial distribution B(n, p):

Pr(S_n = k) = (n choose k) p^k (1 − p)^{n−k},  for k = 0, 1, 2, . . . , n.

The expectation and variance of B(n, p) are

E(S_n) = np,  Var(S_n) = np(1 − p).
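As a quick numerical sanity check, the mean and variance formulas can be recovered directly from the probability mass function. This is a small illustrative sketch; the values n = 20 and p = 0.3 are our own choice:

```python
from math import comb

def binom_pmf(n, k, p):
    # Pr(S_n = k) = (n choose k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# illustrative parameters (our own choice)
n, p = 20, 0.3
mean = sum(k * binom_pmf(n, k, p) for k in range(n + 1))
var = sum(k**2 * binom_pmf(n, k, p) for k in range(n + 1)) - mean**2

assert abs(mean - n * p) < 1e-9              # E(S_n) = np
assert abs(var - n * p * (1 - p)) < 1e-9     # Var(S_n) = np(1 - p)
```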

To better understand the asymptotic behavior of the binomial distribution, we compare it with the normal distribution N(μ, σ), whose density function is given by

f(x) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)},  −∞ < x < ∞,

where μ denotes the expectation and σ² is the variance. The case N(0, 1) is called the standard normal distribution, whose density function is given by

f(x) = (1/√(2π)) e^{−x²/2},  −∞ < x < ∞.

3 General Chernoff inequalities

Theorem 5 Let X_1, X_2, . . . , X_n be independent random variables with Pr(X_i = 1) = p_i and Pr(X_i = 0) = 1 − p_i. For X = ∑_{i=1}^n a_i X_i with all a_i > 0, we have E(X) = ∑_{i=1}^n a_i p_i, and we define ν = ∑_{i=1}^n a_i² p_i. Then we have

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2ν)},    (2)
Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(ν + aλ/3))},    (3)

where a = max{a_1, a_2, . . . , a_n}.
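The two bounds (2) and (3) can be checked numerically by exact enumeration. In the sketch below, the weights a_i and probabilities p_i are our own illustrative choices; all 2^n outcomes of a small weighted sum are enumerated and the exact tails are compared with the bounds:

```python
from itertools import product
from math import exp

# a small weighted sum X = sum a_i X_i with Pr(X_i = 1) = p_i
# (weights and probabilities below are our own illustrative choices)
a = [1.0, 2.0, 3.0, 4.0]
p = [0.2, 0.5, 0.3, 0.7]
n = len(a)

EX = sum(ai * pi for ai, pi in zip(a, p))        # E(X) = sum a_i p_i
nu = sum(ai**2 * pi for ai, pi in zip(a, p))     # nu  = sum a_i^2 p_i
amax = max(a)

def tail(lo=None, hi=None):
    # exact Pr(X <= lo), or Pr(X >= hi), by enumerating all 2^n outcomes
    total = 0.0
    for bits in product([0, 1], repeat=n):
        pr = 1.0
        for xi, pi in zip(bits, p):
            pr *= pi if xi else 1 - pi
        x = sum(ai * xi for ai, xi in zip(a, bits))
        if (lo is not None and x <= lo) or (hi is not None and x >= hi):
            total += pr
    return total

for lam in [1.0, 2.0, 3.0]:
    assert tail(lo=EX - lam) <= exp(-lam**2 / (2 * nu))                     # (2)
    assert tail(hi=EX + lam) <= exp(-lam**2 / (2 * (nu + amax * lam / 3)))  # (3)
```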


Figure 7: The flowchart for theorems on the sum of independent variables (upper tails and lower tails).

To compare inequalities (2) and (3), we consider an example in Figure 8. The cumulative distribution is the function Pr(X ≤ x). The dotted curve in Figure 8 illustrates the cumulative distribution of the binomial distribution B(1000, 0.1), with the value ranging from 0 to 1 as x goes from −∞ to ∞. The solid curve at the lower-left corner is the bound e^{−λ²/(2ν)} for the lower tail. The solid curve at the upper-right corner is the bound 1 − e^{−λ²/(2(ν + aλ/3))} for the upper tail.

Figure 8: Chernoff inequalities (cumulative probability against the value x, plotted for 70 ≤ x ≤ 130).
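The comparison in Figure 8 can be reproduced numerically. The following sketch (our own; it evaluates the binomial pmf in log space to avoid underflow) checks that for B(1000, 0.1), where ν = np and a = 1, the exact tails are indeed dominated by the two Chernoff bounds at λ = 30:

```python
from math import exp, lgamma, log

# exact binomial tails for B(1000, 0.1) versus the bounds (2) and (3),
# with a_i = 1 and p_i = p, so nu = np and a = 1
n, p = 1000, 0.1
EX = n * p
nu = n * p

def binom_cdf(m):
    # Pr(S_n <= m), with each pmf term evaluated in log space to avoid underflow
    total = 0.0
    for k in range(m + 1):
        lp = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
              + k * log(p) + (n - k) * log(1 - p))
        total += exp(lp)
    return total

lam = 30.0
lower_exact = binom_cdf(int(EX - lam))             # Pr(X <= 70)
upper_exact = 1.0 - binom_cdf(int(EX + lam) - 1)   # Pr(X >= 130)
lower_bound = exp(-lam**2 / (2 * nu))              # inequality (2)
upper_bound = exp(-lam**2 / (2 * (nu + lam / 3)))  # inequality (3), a = 1

assert lower_exact <= lower_bound
assert upper_exact <= upper_bound
```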

The inequality (3) in the above theorem is a corollary of the following general concentration inequality (see also Theorem 2.7 in the survey paper by McDiarmid [20]).


Theorem 6 [20] Let X_i (1 ≤ i ≤ n) be independent random variables satisfying X_i ≤ E(X_i) + M, for 1 ≤ i ≤ n. We consider the sum X = ∑_{i=1}^n X_i with expectation E(X) = ∑_{i=1}^n E(X_i) and variance Var(X) = ∑_{i=1}^n Var(X_i). Then we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(Var(X) + Mλ/3))}.

    In the other direction, we have the following inequality.

Theorem 7 If X_1, X_2, . . . , X_n are non-negative independent random variables, we have the following bound for the sum X = ∑_{i=1}^n X_i:

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2 ∑_{i=1}^n E(X_i²))}.

    A strengthened version of the above theorem is as follows:

Theorem 8 Suppose the X_i are independent random variables satisfying X_i ≤ M, for 1 ≤ i ≤ n. Let X = ∑_{i=1}^n X_i and ‖X‖ = √(∑_{i=1}^n E(X_i²)). Then we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(‖X‖² + Mλ/3))}.

Replacing X by −X in the proof of Theorem 8, we have the following theorem for the lower tail.

Theorem 9 Let X_i be independent random variables satisfying X_i ≥ −M, for 1 ≤ i ≤ n. Let X = ∑_{i=1}^n X_i and ‖X‖ = √(∑_{i=1}^n E(X_i²)). Then we have

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2(‖X‖² + Mλ/3))}.

Before we give the proof of Theorem 8, we will first show the implications of Theorems 8 and 9. Namely, we will show that the other concentration inequalities can be derived from Theorems 8 and 9.

Fact: Theorem 8 ⟹ Theorem 6.

Proof: Let X_i' = X_i − E(X_i) and X' = ∑_{i=1}^n X_i' = X − E(X). We have

X_i' ≤ M for 1 ≤ i ≤ n.

We also have

‖X'‖² = ∑_{i=1}^n E(X_i'²) = ∑_{i=1}^n Var(X_i) = Var(X).


Applying Theorem 8, we get

Pr(X ≥ E(X) + λ) = Pr(X' ≥ λ) ≤ e^{−λ²/(2(‖X'‖² + Mλ/3))} ≤ e^{−λ²/(2(Var(X) + Mλ/3))}.

Fact: Theorem 9 ⟹ Theorem 7.

The proof is straightforward by choosing M = 0.

Fact: Theorems 6 and 7 ⟹ Theorem 5.

Proof: We define Y_i = a_i X_i. Note that

‖X‖² = ∑_{i=1}^n E(Y_i²) = ∑_{i=1}^n a_i² p_i = ν.

Equation (2) follows from Theorem 7, since the Y_i's are non-negative. For the other direction, we have

Y_i ≤ a_i ≤ a ≤ E(Y_i) + a.

Equation (3) follows from Theorem 6 (applied with M = a).

Fact: Theorems 8 and 9 ⟹ Theorem 3.

The proof is by choosing Y = X − E(X) and M = 1, and applying Theorems 8 and 9 to Y.

Fact: Theorem 5 ⟹ Theorem 4.

The proof follows by choosing a_1 = a_2 = ··· = a_n = 1.

Finally, we give the complete proof of Theorem 8 and thus finish the proofs for all the above theorems on Chernoff inequalities.

Proof of Theorem 8: We consider

E(e^{tX}) = E(e^{t ∑_i X_i}) = ∏_{i=1}^n E(e^{tX_i}),

since the X_i's are independent.

We define g(y) = 2 ∑_{k=2}^∞ y^{k−2}/k! = 2(e^y − 1 − y)/y², and use the following facts about g: g(0) = 1; g(y) ≤ 1 for y < 0; g(y) is monotone increasing for y ≥ 0; and g(y) ≤ g(b) < 1/(1 − b/3) for all y ≤ b with 0 ≤ b < 3.
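These facts about g can be spot-checked numerically from its closed form. The sketch below is our own; the grid of test points is an arbitrary choice:

```python
from math import exp

def g(y):
    # g(y) = 2 * sum_{k >= 2} y^(k-2) / k! = 2 (e^y - 1 - y) / y^2
    if abs(y) < 1e-8:
        return 1.0           # removable singularity: g(0) = 1
    return 2 * (exp(y) - 1 - y) / y**2

assert abs(g(0) - 1) < 1e-9                       # g(0) = 1
assert all(g(y) <= 1 for y in [-5, -2, -0.5])     # g(y) <= 1 for y < 0
ys = [i / 10 for i in range(-30, 31)]             # g is monotone increasing
assert all(g(ys[i]) <= g(ys[i + 1]) + 1e-12 for i in range(len(ys) - 1))
assert all(g(b) < 1 / (1 - b / 3) for b in [0.1, 1.0, 2.0, 2.9])
```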


4 More concentration inequalities

Theorem 10 Let X_i denote independent random variables satisfying X_i ≤ E(X_i) + a_i + M, for 1 ≤ i ≤ n. For X = ∑_{i=1}^n X_i, we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(Var(X) + ∑_{i=1}^n a_i² + Mλ/3))}.

Proof: Let X_i' = X_i − E(X_i) − a_i and X' = ∑_{i=1}^n X_i'. We have

X_i' ≤ M for 1 ≤ i ≤ n,

and

X' − E(X') = ∑_{i=1}^n (X_i' − E(X_i')) = ∑_{i=1}^n (X_i' + a_i) = ∑_{i=1}^n (X_i − E(X_i)) = X − E(X).

Thus,

‖X'‖² = ∑_{i=1}^n E(X_i'²) = ∑_{i=1}^n E((X_i − E(X_i) − a_i)²) = ∑_{i=1}^n (E((X_i − E(X_i))²) + a_i²) = Var(X) + ∑_{i=1}^n a_i².

By applying Theorem 8, the proof is finished.

Theorem 11 Suppose the X_i are independent random variables satisfying X_i ≤ E(X_i) + M_i, for 1 ≤ i ≤ n. We order the X_i's so that the M_i are in increasing order. Let X = ∑_{i=1}^n X_i. Then for any 1 ≤ k ≤ n, we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(Var(X) + ∑_{i=k}^n (M_i − M_k)² + M_kλ/3))}.

Proof: For fixed k, we choose M = M_k and

a_i = 0 if 1 ≤ i ≤ k,  a_i = M_i − M_k if k ≤ i ≤ n.

We have

X_i − E(X_i) ≤ M_i ≤ a_i + M_k for 1 ≤ i ≤ n,


and

∑_{i=1}^n a_i² = ∑_{i=k}^n (M_i − M_k)².

Using Theorem 10, we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(Var(X) + ∑_{i=k}^n (M_i − M_k)² + M_kλ/3))}.

Example 12 Let X_1, X_2, . . . , X_n be independent random variables. For 1 ≤ i ≤ n − 1, suppose the X_i follow the same distribution with

Pr(X_i = 0) = 1 − p and Pr(X_i = 1) = p,

and X_n follows the distribution with

Pr(X_n = 0) = 1 − p and Pr(X_n = √n) = p.

Consider the sum X = ∑_{i=1}^n X_i. We have

E(X) = ∑_{i=1}^n E(X_i) = (n − 1)p + √n p

and

Var(X) = ∑_{i=1}^n Var(X_i) = (n − 1)p(1 − p) + np(1 − p) = (2n − 1)p(1 − p).

Apply Theorem 6 with M = (1 − p)√n. We have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2((2n−1)p(1−p) + (1−p)√n λ/3))}.

In particular, for constant p ∈ (0, 1) and λ = Θ(n^{1/2+ϵ}), we have

Pr(X ≥ E(X) + λ) ≤ e^{−Θ(n^ϵ)}.

Now we apply Theorem 11 with M_1 = ··· = M_{n−1} = (1 − p) and M_n = √n(1 − p). Choosing k = n − 1, we have

Var(X) + (M_n − M_{n−1})² = (2n − 1)p(1 − p) + (1 − p)²(√n − 1)² ≤ (2n − 1)p(1 − p) + (1 − p)²n ≤ (1 − p²)n.


Thus,

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2((1−p²)n + (1−p)λ/3))}.

For constant p ∈ (0, 1) and λ = Θ(n^{1/2+ϵ}), we have

Pr(X ≥ E(X) + λ) ≤ e^{−Θ(n^{2ϵ})}.

From the above example, we note that Theorem 11 gives a significantly better bound than that in Theorem 6 if the random variables X_i have very different upper bounds.
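The two upper-tail bounds in this example can be evaluated side by side. In the sketch below, the values n = 10^6, p = 1/2, and ϵ = 0.1 are our own illustrative choices, reading the bounds with M = (1 − p)√n for Theorem 6 and M_k = 1 − p, Var(X) + (M_n − M_{n−1})² ≤ (1 − p²)n for Theorem 11:

```python
from math import exp, sqrt

# illustrative parameters (our own choice)
n, p = 10**6, 0.5
lam = n ** 0.6                       # lam = n^(1/2 + eps) with eps = 0.1
var = (2 * n - 1) * p * (1 - p)      # Var(X) = (2n - 1) p (1 - p)

# Theorem 6 with M = (1 - p) sqrt(n)
bound6 = exp(-lam**2 / (2 * (var + (1 - p) * sqrt(n) * lam / 3)))
# Theorem 11 with k = n - 1, using Var(X) + (M_n - M_{n-1})^2 <= (1 - p^2) n
bound11 = exp(-lam**2 / (2 * ((1 - p**2) * n + (1 - p) * lam / 3)))

assert bound11 < bound6    # Theorem 11 is sharper here
```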

For completeness, we also list the corresponding theorems for the lower tails. (These can be derived by replacing X by −X.)

Theorem 13 Let X_i denote independent random variables satisfying X_i ≥ E(X_i) − a_i − M, for 1 ≤ i ≤ n. For X = ∑_{i=1}^n X_i, we have

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2(Var(X) + ∑_{i=1}^n a_i² + Mλ/3))}.

Theorem 14 Let X_i denote independent random variables satisfying X_i ≥ E(X_i) − M_i, for 1 ≤ i ≤ n. We order the X_i's so that the M_i are in increasing order. Let X = ∑_{i=1}^n X_i. Then for any 1 ≤ k ≤ n, we have

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2(Var(X) + ∑_{i=k}^n (M_i − M_k)² + M_kλ/3))}.

Continuing the above example, we choose M_1 = M_2 = ··· = M_{n−1} = p and M_n = √n p. We choose k = n − 1, so we have

Var(X) + (M_n − M_{n−1})² = (2n − 1)p(1 − p) + p²(√n − 1)² ≤ (2n − 1)p(1 − p) + p²n ≤ p(2 − p)n.

Using Theorem 14, we have

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2(p(2−p)n + pλ/3))}.

For a constant p ∈ (0, 1) and λ = Θ(n^{1/2+ϵ}), we have

Pr(X ≤ E(X) − λ) ≤ e^{−Θ(n^{2ϵ})}.

5 Martingales and Azuma's inequality

A martingale is a sequence of random variables X_0, X_1, . . . with finite means such that the conditional expectation of X_{n+1} given X_0, X_1, . . . , X_n is equal to X_n.

The above definition is given in the classical book of Feller [15], p. 210. However, the conditional expectation depends on the random variables under


consideration and can be difficult to deal with in various cases. In this survey we will use the following definition, which is concise and basically equivalent for the finite cases.

Suppose that Ω is a probability space with a probability distribution p. Let F denote a σ-field on Ω. (A σ-field on Ω is a collection of subsets of Ω which contains ∅ and Ω, and is closed under unions, intersections, and complementation.) In a σ-field F of Ω, the smallest set in F containing an element x is the intersection of all sets in F containing x. A function f : Ω → R is said to be F-measurable if f(x) = f(y) for any y in the smallest set of F containing x. (For more terminology on martingales, the reader is referred to [17].)

If f : Ω → R is a function, we define the expectation E(f) = E(f(x) | x ∈ Ω) by

E(f) = E(f(x) | x ∈ Ω) := ∑_{x∈Ω} f(x) p(x).

If F is a σ-field on Ω, we define the conditional expectation E(f | F) : Ω → R by the formula

E(f | F)(x) := (1/∑_{y∈F(x)} p(y)) ∑_{y∈F(x)} f(y) p(y),

where F(x) is the smallest element of F which contains x.

A filter F is an increasing chain of σ-subfields

{∅, Ω} = F_0 ⊂ F_1 ⊂ ··· ⊂ F_n = F.

A martingale (obtained from X) is associated with a filter F and a sequence of random variables X_0, X_1, . . . , X_n satisfying X_i = E(X | F_i) and, in particular, X_0 = E(X) and X_n = X.
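On a small finite space the conditional-expectation formula can be computed directly. The sketch below is our own toy example; the σ-field is encoded by its generating partition, and we also check the averaging (tower) identity E(E(f | F)) = E(f):

```python
# conditional expectation on a small finite space, following the formula above
# (space, distribution, values, and partition are our own toy choices)
omega = ['a', 'b', 'c', 'd']
prob = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
f = {'a': 1.0, 'b': 3.0, 'c': 5.0, 'd': 7.0}

partition = [{'a', 'b'}, {'c', 'd'}]   # F is generated by these blocks

def block(x):
    # F(x): the smallest element of F containing x
    return next(B for B in partition if x in B)

def cond_exp(x):
    # E(f | F)(x) = (1 / sum_{y in F(x)} p(y)) * sum_{y in F(x)} f(y) p(y)
    B = block(x)
    mass = sum(prob[y] for y in B)
    return sum(f[y] * prob[y] for y in B) / mass

Ef = sum(f[x] * prob[x] for x in omega)

# E(f | F) is constant on each block, and averaging it recovers E(f)
assert abs(cond_exp('a') - cond_exp('b')) < 1e-12
assert abs(sum(cond_exp(x) * prob[x] for x in omega) - Ef) < 1e-9
```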

Example 15 For given independent random variables Y_1, Y_2, . . . , Y_n, we can define a martingale X = Y_1 + Y_2 + ··· + Y_n as follows. Let F_i be the σ-field generated by Y_1, . . . , Y_i. (In other words, F_i is the minimal σ-field so that Y_1, . . . , Y_i are F_i-measurable.) We have a natural filter F:

{∅, Ω} = F_0 ⊂ F_1 ⊂ ··· ⊂ F_n = F.

Let X_i = ∑_{j=1}^i Y_j + ∑_{j=i+1}^n E(Y_j). Then X_0, X_1, X_2, . . . , X_n forms a martingale corresponding to the filter F.
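Example 15 can be verified exactly for small instances. The sketch below (with 0/1-valued Y_j and probabilities p_j of our own choosing) enumerates all histories and checks that E(X_{i+1} | Y_1, . . . , Y_i) = X_i:

```python
from itertools import product

# X_i = sum_{j <= i} Y_j + sum_{j > i} E(Y_j) for independent 0/1 variables
# Y_j with Pr(Y_j = 1) = p_j (the p_j below are our own choices)
p = [0.2, 0.5, 0.7]
n = len(p)
E = p[:]                      # E(Y_j) = p_j

def X(i, ys):
    return sum(ys[:i]) + sum(E[i:])

for i in range(n):
    for prefix in product([0, 1], repeat=i):
        # E(X_{i+1} | Y_1, ..., Y_i) should equal X_i
        cond = (p[i] * X(i + 1, list(prefix) + [1])
                + (1 - p[i]) * X(i + 1, list(prefix) + [0]))
        assert abs(cond - X(i, list(prefix))) < 1e-9
```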

For c = (c_1, c_2, . . . , c_n) a vector with positive entries, the martingale X is said to be c-Lipschitz if

|X_i − X_{i−1}| ≤ c_i    (4)

for i = 1, 2, . . . , n. A powerful tool for controlling martingales is the following:


Theorem 16 (Azuma's inequality) If a martingale X is c-Lipschitz, then

Pr(|X − E(X)| ≥ λ) ≤ 2e^{−λ²/(2 ∑_{i=1}^n c_i²)},    (5)

where c = (c_1, . . . , c_n).

Theorem 17 Let X_1, X_2, . . . , X_n be independent random variables satisfying

|X_i − E(X_i)| ≤ c_i for 1 ≤ i ≤ n.

Then we have the following bound for the sum X = ∑_{i=1}^n X_i:

Pr(|X − E(X)| ≥ λ) ≤ 2e^{−λ²/(2 ∑_{i=1}^n c_i²)}.
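For a fair ±1 random walk, which is c-Lipschitz with c_i = 1 for all i, the two-sided tail in (5) can be computed exactly and compared with Azuma's bound. This is a small sketch of our own with n = 100:

```python
from math import comb, exp

# Azuma's inequality for a fair +-1 random walk (c_i = 1 for all i)
n = 100   # walk length (our choice)

def walk_tail(lam):
    # Pr(|S_n| >= lam), where S_n = 2B - n and B ~ Binomial(n, 1/2)
    hits = sum(comb(n, b) for b in range(n + 1) if abs(2 * b - n) >= lam)
    return hits / 2**n

for lam in [10, 20, 30]:
    assert walk_tail(lam) <= 2 * exp(-lam**2 / (2 * n))
```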

Proof of Azuma's inequality: For a fixed t, we consider the convex function f(x) = e^{tx}. For any |x| ≤ c, f(x) is below the line segment from (−c, f(−c)) to (c, f(c)). In other words, we have

e^{tx} ≤ (1/(2c))(e^{tc} − e^{−tc}) x + (1/2)(e^{tc} + e^{−tc}).

Therefore, we can write

E(e^{t(X_i−X_{i−1})} | F_{i−1}) ≤ E((1/(2c_i))(e^{tc_i} − e^{−tc_i})(X_i − X_{i−1}) + (1/2)(e^{tc_i} + e^{−tc_i}) | F_{i−1})
= (1/2)(e^{tc_i} + e^{−tc_i})
≤ e^{t²c_i²/2}.

Here we apply the conditions E(X_i − X_{i−1} | F_{i−1}) = 0 and |X_i − X_{i−1}| ≤ c_i.

Hence, E(e^{tX_i} | F_{i−1}) ≤ e^{t²c_i²/2} e^{tX_{i−1}}. Inductively, we have

E(e^{tX}) = E(E(e^{tX_n} | F_{n−1})) ≤ e^{t²c_n²/2} E(e^{tX_{n−1}}) ≤ ··· ≤ ∏_{i=1}^n e^{t²c_i²/2} E(e^{tX_0}) = e^{(t²/2) ∑_{i=1}^n c_i²} e^{tE(X)}.

Therefore,

Pr(X ≥ E(X) + λ) = Pr(e^{t(X−E(X))} ≥ e^{tλ}) ≤ e^{−tλ} E(e^{t(X−E(X))}) ≤ e^{−tλ} e^{(t²/2) ∑_{i=1}^n c_i²} = e^{−tλ + (t²/2) ∑_{i=1}^n c_i²}.


We choose t = λ/∑_{i=1}^n c_i² (in order to minimize the above expression). We have

Pr(X ≥ E(X) + λ) ≤ e^{−tλ + (t²/2) ∑_{i=1}^n c_i²} = e^{−λ²/(2 ∑_{i=1}^n c_i²)}.

To derive a similar lower bound, we consider −X_i instead of X_i in the preceding proof. Then we obtain the following bound for the lower tail:

Pr(X ≤ E(X) − λ) ≤ e^{−λ²/(2 ∑_{i=1}^n c_i²)}.

    6 General martingale inequalities

Many problems which can be set up as martingales do not satisfy the Lipschitz condition. It is desirable to be able to use tools similar to the Azuma inequality in such cases. In this section, we will first state and then prove several extensions of the Azuma inequality (see Figure 9).

Figure 9: The flowchart for theorems on martingales (upper tails and lower tails).

Our starting point is the following well-known concentration inequality (see [20]):

Theorem 18 Let X be the martingale associated with a filter F satisfying

1. Var(X_i | F_{i−1}) ≤ σ_i², for 1 ≤ i ≤ n;
2. |X_i − X_{i−1}| ≤ M, for 1 ≤ i ≤ n.

Then we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(∑_{i=1}^n σ_i² + Mλ/3))}.

Since the sum of independent random variables can be viewed as a martingale (see Example 15), Theorem 18 implies Theorem 6. In a similar way, the following theorem is associated with Theorem 10.


Theorem 19 Let X be the martingale associated with a filter F satisfying

1. Var(X_i | F_{i−1}) ≤ σ_i², for 1 ≤ i ≤ n;
2. X_i − X_{i−1} ≤ M_i, for 1 ≤ i ≤ n.

Then we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2 ∑_{i=1}^n (σ_i² + M_i²))}.

    The above theorem can be further generalized:

Theorem 20 Let X be the martingale associated with a filter F satisfying

1. Var(X_i | F_{i−1}) ≤ σ_i², for 1 ≤ i ≤ n;
2. X_i − X_{i−1} ≤ a_i + M, for 1 ≤ i ≤ n.

Then we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(∑_{i=1}^n (σ_i² + a_i²) + Mλ/3))}.

Theorem 20 implies Theorem 18 by choosing a_1 = a_2 = ··· = a_n = 0. We also have the following theorem corresponding to Theorem 11.

Theorem 21 Let X be the martingale associated with a filter F satisfying

1. Var(X_i | F_{i−1}) ≤ σ_i², for 1 ≤ i ≤ n;
2. X_i − X_{i−1} ≤ M_i, for 1 ≤ i ≤ n.

Then, for any M ≥ 0, we have

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(∑_{i=1}^n σ_i² + ∑_{M_i>M} (M_i − M)² + Mλ/3))}.

Theorem 20 implies Theorem 21 by choosing

a_i = 0 if M_i ≤ M,  a_i = M_i − M if M_i > M.

It suffices to prove Theorem 20 so that all the above stated theorems hold.

Proof of Theorem 20: Recall that g(y) = 2 ∑_{k=2}^∞ y^{k−2}/k! satisfies the following properties: g(y) ≤ 1 for y < 0, g(y) is monotone increasing for y ≥ 0, and g(y) ≤ g(b) < 1/(1 − b/3) for y ≤ b and 0 ≤ b < 3.


Since E(X_i | F_{i−1}) = X_{i−1} and X_i − X_{i−1} − a_i ≤ M, we have

E(e^{t(X_i−X_{i−1}−a_i)} | F_{i−1})
= E(∑_{k=0}^∞ (t^k/k!)(X_i − X_{i−1} − a_i)^k | F_{i−1})
= 1 − ta_i + E(∑_{k=2}^∞ (t^k/k!)(X_i − X_{i−1} − a_i)^k | F_{i−1})
≤ 1 − ta_i + E((t²/2)(X_i − X_{i−1} − a_i)² g(tM) | F_{i−1})
= 1 − ta_i + (t²/2) g(tM) E((X_i − X_{i−1} − a_i)² | F_{i−1})
= 1 − ta_i + (t²/2) g(tM) (E((X_i − X_{i−1})² | F_{i−1}) + a_i²)
≤ 1 − ta_i + (t²/2) g(tM) (σ_i² + a_i²)
≤ e^{−ta_i + (t²/2) g(tM)(σ_i² + a_i²)}.

Thus,

E(e^{tX_i} | F_{i−1}) = E(e^{t(X_i−X_{i−1}−a_i)} | F_{i−1}) e^{tX_{i−1}+ta_i}
≤ e^{−ta_i + (t²/2) g(tM)(σ_i² + a_i²)} e^{tX_{i−1}+ta_i}
= e^{(t²/2) g(tM)(σ_i² + a_i²)} e^{tX_{i−1}}.

Inductively, we have

E(e^{tX}) = E(E(e^{tX_n} | F_{n−1}))
≤ e^{(t²/2) g(tM)(σ_n² + a_n²)} E(e^{tX_{n−1}})
≤ ··· ≤ ∏_{i=1}^n e^{(t²/2) g(tM)(σ_i² + a_i²)} E(e^{tX_0})
= e^{(t²/2) g(tM) ∑_{i=1}^n (σ_i² + a_i²)} e^{tE(X)}.

Then, for t satisfying tM < 3, we have

Pr(X ≥ E(X) + λ) ≤ e^{−tλ} E(e^{t(X−E(X))}) ≤ e^{−tλ + (t²/2) g(tM) ∑_{i=1}^n (σ_i² + a_i²)} ≤ e^{−tλ + t²/(2(1−tM/3)) ∑_{i=1}^n (σ_i² + a_i²)}.

We choose t = λ/(∑_{i=1}^n (σ_i² + a_i²) + Mλ/3). Clearly tM < 3, and we obtain

Pr(X ≥ E(X) + λ) ≤ e^{−λ²/(2(∑_{i=1}^n (σ_i² + a_i²) + Mλ/3))}.

This completes the proof of Theorem 20.

    7 Supermartingales and Submartingales

In this section, we consider further strengthened versions of the martingale inequalities that were mentioned so far. Instead of a fixed upper bound for the variance, we will assume that the variance Var(X_i | F_{i−1}) is upper bounded by a linear function of X_{i−1}. Here we assume this linear function is non-negative for all values that X_{i−1} takes. We first need some terminology.

For a filter F:

{∅, Ω} = F_0 ⊂ F_1 ⊂ ··· ⊂ F_n = F,


a sequence of random variables X_0, X_1, . . . , X_n is called a submartingale if X_i is F_i-measurable (i.e., X_i(a) = X_i(b) if all elements of F_i containing a also contain b, and vice versa) and E(X_i | F_{i−1}) ≤ X_{i−1}, for 1 ≤ i ≤ n.

A sequence of random variables X_0, X_1, . . . , X_n is said to be a supermartingale if X_i is F_i-measurable and E(X_i | F_{i−1}) ≥ X_{i−1}, for 1 ≤ i ≤ n.

To avoid repetition, we will first state a number of useful inequalities for submartingales and supermartingales. Then we will give the proof for the general inequalities in Theorem 27 for submartingales and in Theorem 29 for supermartingales. Furthermore, we will show that all the stated theorems follow from Theorems 27 and 29 (see Figure 10). Note that the inequalities for submartingales and supermartingales are not quite symmetric.

Figure 10: The flowchart for theorems on submartingales and supermartingales.

Theorem 25 Suppose that a submartingale X, associated with a filter F, satisfies

Var(X_i | F_{i−1}) ≤ φ_i X_{i−1}

and

X_i − E(X_i | F_{i−1}) ≤ M

for 1 ≤ i ≤ n. Then we have

Pr(X_n ≥ X_0 + λ) ≤ e^{−λ²/(2((X_0+λ)(∑_{i=1}^n φ_i) + Mλ/3))}.

Theorem 26 Suppose that a supermartingale X, associated with a filter F, satisfies, for 1 ≤ i ≤ n,

Var(X_i | F_{i−1}) ≤ φ_i X_{i−1}

and

E(X_i | F_{i−1}) − X_i ≤ M.

Then we have

Pr(X_n ≤ X_0 − λ) ≤ e^{−λ²/(2(X_0(∑_{i=1}^n φ_i) + Mλ/3))},

for any λ ≤ X_0.


Theorem 27 Suppose that a submartingale X, associated with a filter F, satisfies

Var(X_i | F_{i−1}) ≤ σ_i² + φ_i X_{i−1}

and

X_i − E(X_i | F_{i−1}) ≤ a_i + M

for 1 ≤ i ≤ n. Here σ_i, a_i, φ_i, and M are non-negative constants. Then we have

Pr(X_n ≥ X_0 + λ) ≤ e^{−λ²/(2(∑_{i=1}^n (σ_i² + a_i²) + (X_0+λ)(∑_{i=1}^n φ_i) + Mλ/3))}.

Remark 28 Theorem 27 implies Theorem 25 by setting all σ_i's and a_i's to zero. Theorem 27 also implies Theorem 20 by choosing φ_1 = ··· = φ_n = 0.

    The theorem for a supermartingale is slightly different due to the asymmetryof the condition on the variance.

Theorem 29 Suppose a supermartingale X, associated with a filter F, satisfies, for 1 ≤ i ≤ n,

Var(X_i | F_{i−1}) ≤ σ_i² + φ_i X_{i−1}

and

E(X_i | F_{i−1}) − X_i ≤ a_i + M,

where M, the a_i's, the σ_i's, and the φ_i's are non-negative constants. Then we have

Pr(X_n ≤ X_0 − λ) ≤ e^{−λ²/(2(∑_{i=1}^n (σ_i² + a_i²) + X_0(∑_{i=1}^n φ_i) + Mλ/3))},

for any λ ≤ 2X_0 + (∑_{i=1}^n (σ_i² + a_i²))/(∑_{i=1}^n φ_i).

Remark 30 Theorem 29 implies Theorem 26 by setting all σ_i's and a_i's to zero. Theorem 29 also implies Theorem 22 by choosing φ_1 = ··· = φ_n = 0.

Proof of Theorem 27: For a positive t (to be chosen later), we consider

E(e^{tX_i} | F_{i−1}) = e^{tE(X_i|F_{i−1}) + ta_i} E(e^{t(X_i − E(X_i|F_{i−1}) − a_i)} | F_{i−1})
= e^{tE(X_i|F_{i−1}) + ta_i} ∑_{k=0}^∞ (t^k/k!) E((X_i − E(X_i|F_{i−1}) − a_i)^k | F_{i−1})
≤ e^{tE(X_i|F_{i−1}) + ∑_{k=2}^∞ (t^k/k!) E((X_i − E(X_i|F_{i−1}) − a_i)^k | F_{i−1})}.

Recall that g(y) = 2 ∑_{k=2}^∞ y^{k−2}/k! satisfies

g(y) ≤ g(b) < 1/(1 − b/3)

for all y ≤ b and 0 ≤ b < 3.


Since X_i − E(X_i | F_{i−1}) − a_i ≤ M, we have

∑_{k=2}^∞ (t^k/k!) E((X_i − E(X_i|F_{i−1}) − a_i)^k | F_{i−1})
≤ (g(tM)/2) t² E((X_i − E(X_i|F_{i−1}) − a_i)² | F_{i−1})
= (g(tM)/2) t² (Var(X_i | F_{i−1}) + a_i²)
≤ (g(tM)/2) t² (σ_i² + φ_i X_{i−1} + a_i²).

Since E(X_i | F_{i−1}) ≤ X_{i−1}, we have

E(e^{tX_i} | F_{i−1}) ≤ e^{tE(X_i|F_{i−1}) + ∑_{k=2}^∞ (t^k/k!) E((X_i − E(X_i|F_{i−1}) − a_i)^k | F_{i−1})}
≤ e^{tX_{i−1} + (g(tM)/2) t² (σ_i² + φ_i X_{i−1} + a_i²)}
= e^{(t + (g(tM)/2) φ_i t²) X_{i−1}} e^{(t²/2) g(tM)(σ_i² + a_i²)}.

We define t_i ≥ 0, for 0 < i ≤ n, satisfying

t_{i−1} = t_i + (g(t_0M)/2) φ_i t_i²,

while t_0 will be chosen later. Then

t_n ≤ t_{n−1} ≤ ··· ≤ t_0,

and

E(e^{t_iX_i} | F_{i−1}) ≤ e^{(t_i + (g(t_iM)/2) φ_i t_i²) X_{i−1}} e^{(t_i²/2) g(t_iM)(σ_i² + a_i²)}
≤ e^{(t_i + (g(t_0M)/2) φ_i t_i²) X_{i−1}} e^{(t_i²/2) g(t_iM)(σ_i² + a_i²)}
= e^{t_{i−1}X_{i−1}} e^{(t_i²/2) g(t_iM)(σ_i² + a_i²)},

since g(y) is increasing for y > 0 and t_i ≤ t_0.

By Markov's inequality, we have

Pr(X_n ≥ X_0 + λ) ≤ e^{−t_n(X_0+λ)} E(e^{t_nX_n})
= e^{−t_n(X_0+λ)} E(E(e^{t_nX_n} | F_{n−1}))
≤ e^{−t_n(X_0+λ)} E(e^{t_{n−1}X_{n−1}}) e^{(t_n²/2) g(t_nM)(σ_n² + a_n²)}
≤ ··· ≤ e^{−t_n(X_0+λ)} E(e^{t_0X_0}) e^{∑_{i=1}^n (t_i²/2) g(t_iM)(σ_i² + a_i²)}
≤ e^{−t_n(X_0+λ) + t_0X_0 + (t_0²/2) g(t_0M) ∑_{i=1}^n (σ_i² + a_i²)}.


Note that

t_n = t_0 − ∑_{i=1}^n (t_{i−1} − t_i) = t_0 − ∑_{i=1}^n (g(t_0M)/2) φ_i t_i² ≥ t_0 − (g(t_0M)/2) t_0² ∑_{i=1}^n φ_i.

Hence

Pr(X_n ≥ X_0 + λ) ≤ e^{−t_n(X_0+λ) + t_0X_0 + (t_0²/2) g(t_0M) ∑_{i=1}^n (σ_i² + a_i²)}
≤ e^{−(t_0 − (g(t_0M)/2) t_0² ∑_{i=1}^n φ_i)(X_0+λ) + t_0X_0 + (t_0²/2) g(t_0M) ∑_{i=1}^n (σ_i² + a_i²)}
= e^{−t_0λ + (g(t_0M)/2) t_0² (∑_{i=1}^n (σ_i² + a_i²) + (X_0+λ) ∑_{i=1}^n φ_i)}.

Now we choose t_0 = λ/(∑_{i=1}^n (σ_i² + a_i²) + (X_0+λ)(∑_{i=1}^n φ_i) + Mλ/3). Using the fact that t_0M < 3 and g(t_0M) < 1/(1 − t_0M/3), we obtain

Pr(X_n ≥ X_0 + λ) ≤ e^{−λ²/(2(∑_{i=1}^n (σ_i² + a_i²) + (X_0+λ)(∑_{i=1}^n φ_i) + Mλ/3))}.

The proof of Theorem 27 is complete.


Proof of Theorem 29: The proof is parallel to that of Theorem 27, applied to e^{−tX_i}. We define t_i ≥ 0, for 0 ≤ i < n, satisfying

t_{i−1} = t_i − (g(t_nM)/2) φ_i t_i²,

while t_n will be chosen later, so that t_0 ≤ t_1 ≤ ··· ≤ t_n, and

E(e^{−t_iX_i} | F_{i−1}) ≤ e^{−(t_i − (g(t_iM)/2) φ_i t_i²) X_{i−1}} e^{(g(t_iM)/2) t_i² (σ_i² + a_i²)}
≤ e^{−(t_i − (g(t_nM)/2) φ_i t_i²) X_{i−1}} e^{(g(t_nM)/2) t_i² (σ_i² + a_i²)}
= e^{−t_{i−1}X_{i−1}} e^{(g(t_nM)/2) t_i² (σ_i² + a_i²)},

since g(y) is increasing and t_i ≤ t_n.

By Markov's inequality, we have

Pr(X_n ≤ X_0 − λ) = Pr(−t_nX_n ≥ −t_n(X_0 − λ))
≤ e^{t_n(X_0−λ)} E(e^{−t_nX_n})
= e^{t_n(X_0−λ)} E(E(e^{−t_nX_n} | F_{n−1}))
≤ e^{t_n(X_0−λ)} E(e^{−t_{n−1}X_{n−1}}) e^{(g(t_nM)/2) t_n² (σ_n² + a_n²)}
≤ ··· ≤ e^{t_n(X_0−λ)} E(e^{−t_0X_0}) e^{∑_{i=1}^n (g(t_nM)/2) t_i² (σ_i² + a_i²)}
≤ e^{t_n(X_0−λ) − t_0X_0 + (t_n²/2) g(t_nM) ∑_{i=1}^n (σ_i² + a_i²)}.

We note that

t_0 = t_n + ∑_{i=1}^n (t_{i−1} − t_i) = t_n − ∑_{i=1}^n (g(t_nM)/2) φ_i t_i² ≥ t_n − (g(t_nM)/2) t_n² ∑_{i=1}^n φ_i.

Thus, we have

Pr(X_n ≤ X_0 − λ) ≤ e^{t_n(X_0−λ) − t_0X_0 + (t_n²/2) g(t_nM) ∑_{i=1}^n (σ_i² + a_i²)}
≤ e^{t_n(X_0−λ) − (t_n − (g(t_nM)/2) t_n² ∑_{i=1}^n φ_i) X_0 + (t_n²/2) g(t_nM) ∑_{i=1}^n (σ_i² + a_i²)}
= e^{−t_nλ + (g(t_nM)/2) t_n² (∑_{i=1}^n (σ_i² + a_i²) + (∑_{i=1}^n φ_i) X_0)}.

We choose t_n = λ/(∑_{i=1}^n (σ_i² + a_i²) + (∑_{i=1}^n φ_i) X_0 + Mλ/3). We have t_nM < 3, and g(t_nM) < 1/(1 − t_nM/3), so that

Pr(X_n ≤ X_0 − λ) ≤ e^{−λ²/(2(∑_{i=1}^n (σ_i² + a_i²) + X_0(∑_{i=1}^n φ_i) + Mλ/3))}.


It remains to verify that all t_i's are non-negative. Indeed,

t_i ≥ t_0 ≥ t_n − (g(t_nM)/2) t_n² ∑_{i=1}^n φ_i
≥ t_n (1 − (t_n/(2(1 − t_nM/3))) ∑_{i=1}^n φ_i)
= t_n (1 − λ ∑_{i=1}^n φ_i / (2(∑_{i=1}^n (σ_i² + a_i²) + X_0 ∑_{i=1}^n φ_i)))
≥ 0,

where the last inequality follows from the assumption λ ≤ 2X_0 + (∑_{i=1}^n (σ_i² + a_i²))/(∑_{i=1}^n φ_i).

    The proof of the theorem is complete.

8 The decision tree and relaxed concentration inequalities

In this section, we will extend and generalize the previous theorems to martingales which are not strictly Lipschitz but nearly Lipschitz. Namely, the (Lipschitz-like) assumptions are allowed to fail on relatively small subsets of the probability space, and we can still have similar but slightly weaker concentration inequalities. Similar techniques have been introduced by Kim and Vu [19] in their important work on deriving concentration inequalities for multivariate polynomials. The basic setup for decision trees can be found in [5] and has been used in the work of Alon, Kim, and Spencer [7]. Wormald [22] considers martingales with a stopping time that has a similar flavor. Here we use a rather general setting, and we shall give complete proofs.

We are only interested in finite probability spaces, and we use the following

computational model. The random variable X can be evaluated by a sequence of decisions Y_1, Y_2, . . . , Y_n. Each decision has finitely many outputs. The probability that an output is chosen depends on the previous history. We can describe the process by a decision tree T, a complete rooted tree with depth n. Each edge uv of T is associated with a probability p_uv depending on the decision made from u to v. Note that for any node u, we have

∑_v p_uv = 1.

We allow p_uv to be zero, and thus we include the case of having fewer than r outputs, for some fixed r. Let Ω_i denote the probability space obtained after the first i decisions. Suppose Ω = Ω_n and X is the random variable on Ω. Let π_i : Ω → Ω_i be the projection mapping each point to the subset of points with the same first i decisions. Let F_i be the σ-field generated by Y_1, Y_2, . . . , Y_i. (In fact, F_i = π_i^{−1}(2^{Ω_i}) is the full σ-field via the projection π_i.) The F_i form a natural


filter:

{∅, Ω} = F_0 ⊂ F_1 ⊂ ··· ⊂ F_n = F.

The leaves of the decision tree are exactly the elements of Ω. Let X_0, X_1, . . . , X_n = X denote the sequence of random variables obtained in evaluating X. Note that X_i is F_i-measurable and can be interpreted as a labeling on the nodes at depth i.
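On a concrete decision tree, the martingale X_i = E(X | F_i) is obtained by averaging leaf values upward through the transition probabilities. The sketch below is our own toy instance (tree shape, probabilities, and leaf values are all our choices), and it checks that the resulting vertex labeling averages correctly at every internal node:

```python
# a tiny decision tree: each internal node stores (child, transition probability)
# pairs; X labels the leaves (structure and numbers are our own toy choices)
tree = {
    'root': [('u', 0.4), ('v', 0.6)],
    'u':    [('uu', 0.5), ('uv', 0.5)],
    'v':    [('vu', 0.1), ('vv', 0.9)],
}
leaf_value = {'uu': 1.0, 'uv': 3.0, 'vu': 0.0, 'vv': 10.0}

def expectation(node):
    # E(X | the process has reached `node`), by averaging leaf values upward
    if node in leaf_value:
        return leaf_value[node]
    return sum(p * expectation(child) for child, p in tree[node])

# X_0 = E(X) at the root; the labeling f(u) = expectation(u) satisfies
# f(u) = sum_v p_uv f(v) at every internal node (the martingale property)
assert abs(expectation('root') - (0.4 * 2.0 + 0.6 * 9.0)) < 1e-9
for node, children in tree.items():
    assert abs(expectation(node)
               - sum(p * expectation(c) for c, p in children)) < 1e-9
```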

There is a one-to-one correspondence between the following:

• a sequence of random variables X_0, X_1, . . . , X_n such that X_i is F_i-measurable, for i = 0, 1, . . . , n;

• a vertex labeling of the decision tree T, f : V(T) → R.

In order to simplify and unify the proofs for various general types of martingales, here we introduce a definition for a function f : V(T) → R. We say f satisfies an admissible condition P if P = {P_v} holds for every vertex v.

Examples of admissible conditions:

1. Supermartingale: For 1 ≤ i ≤ n, we have

E(X_i | F_{i−1}) ≥ X_{i−1}.

Thus the admissible condition P_u holds if

f(u) ≤ ∑_{v∈C(u)} p_uv f(v),

where C(u) is the set of all children of the node u and p_uv is the transition probability at the edge uv.

2. Submartingale: For 1 ≤ i ≤ n, we have

E(X_i | F_{i−1}) ≤ X_{i−1}.

In this case, the admissible condition of the submartingale is

f(u) ≥ ∑_{v∈C(u)} p_uv f(v).

3. Martingale: For 1 ≤ i ≤ n, we have

E(X_i | F_{i−1}) = X_{i−1}.

The admissible condition of the martingale is then

f(u) = ∑_{v∈C(u)} p_uv f(v).

    26

  • 8/9/2019 Concentration inequalities and martingale inequalities a survey

    27/43

4. c-Lipschitz: For 1 ≤ i ≤ n, we have

|X_i − X_{i−1}| ≤ c_i.

The admissible condition of the c-Lipschitz property can be described as follows:

|f(u) − f(v)| ≤ c_i, for any child v ∈ C(u),

where the children of the node u are at depth i of the decision tree.

5. Bounded variance: For 1 ≤ i ≤ n, we have

Var(X_i | F_{i−1}) ≤ σ_i²

for some constants σ_i. The admissible condition of the bounded variance property can be described as

∑_{v∈C(u)} p_uv f(v)² − (∑_{v∈C(u)} p_uv f(v))² ≤ σ_i².

    6. General Bounded Variance: For 1 i n, we haveVar(Xi|Fi1) 2i + iXi1

    where i, i are non-negative constants, and Xi 0. The admissiblecondition of the general bounded variance property can be described asfollows:vC(u)

    puvf2(v) (

    vC(u)

    puvf(v))2 2i +if(u), andf(u) 0

    wherei is the depth of the node u.

    7. Upper-bound: For 1 i n, we haveXi E(Xi|Fi1) ai+ M

    whereais, and M are non-negative constants. The admissible conditionof the upper bounded property can be described as follows:

    f(v) vC(u)

    puvf(v) ai+ M, for any childv C(u)

    wherei is the depth of the node u.

    8. Lower-bound: For 1 i n, we haveE(Xi|Fi1) Xi ai+ M

    whereais, and M are non-negative constants. The admissible conditionof the lower bounded property can be described as follows:

    (vC(u)

    puvf(v)) f(v) ai+ M, for any childv C(u)

    wherei is the depth of the node u.
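To make the admissible conditions concrete, the following sketch (our own, not from the survey; the toy node and all numbers are made up) checks the martingale, $c$-Lipschitz, and bounded-variance conditions $P_u$ at a single node $u$ of a weighted decision tree.

```python
# Sketch: checking admissible conditions P_u at one node u of a decision tree.
# children_f maps each child v to f(v); p maps each child v to the transition
# probability p_uv (the p_uv sum to 1).  All values here are illustrative.

def check_conditions(f_u, children_f, p, c_i, sigma2_i, eps=1e-12):
    m = sum(p[v] * children_f[v] for v in children_f)       # sum_v p_uv f(v)
    s = sum(p[v] * children_f[v] ** 2 for v in children_f)  # sum_v p_uv f(v)^2
    return {
        "martingale": abs(f_u - m) < eps,                   # f(u) = sum_v p_uv f(v)
        "c-Lipschitz": all(abs(f_u - fv) <= c_i + eps for fv in children_f.values()),
        "bounded-variance": s - m * m <= sigma2_i + eps,    # conditional variance at u
    }

# A fair +-1 step: two children, each reached with probability 1/2.
res = check_conditions(f_u=0.0,
                       children_f={"L": -1.0, "R": 1.0},
                       p={"L": 0.5, "R": 0.5},
                       c_i=1.0, sigma2_i=1.0)
```

Here the fair $\pm 1$ step satisfies all three conditions with $c_i = 1$ and $\sigma_i^2 = 1$.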


For any labeling $f$ on $T$ and a fixed vertex $r$, we can define a new labeling $f_r$ as follows:
$$f_r(u) = \begin{cases} f(r) & \text{if } u \text{ is a descendant of } r,\\ f(u) & \text{otherwise.}\end{cases}$$

A property $P$ is said to be invariant under subtree-unification if, for any tree labeling $f$ satisfying $P$ and any vertex $r$, the labeling $f_r$ also satisfies $P$.

We have the following theorem.

Theorem 31 The eight properties stated in the preceding examples (the supermartingale, submartingale, martingale, $c$-Lipschitz, bounded variance, general bounded variance, upper-bound, and lower-bound properties) are all invariant under subtree-unification.

Proof: We note that each of these properties is an admissible condition; let $P$ denote any one of them. For any node $u$ that is not a descendant of $r$, the labelings $f_r$ and $f$ take the same values on $u$ and its children, so $P_u$ holds for $f_r$ since $P_u$ holds for $f$.

If $u$ is a descendant of $r$, then $f_r$ takes the value $f(r)$ at $u$ as well as at its children. We verify $P_u$ in each case. Assume that $u$ is at level $i$ of the decision tree $T$.

1. For the supermartingale, submartingale, and martingale properties, we have
$$\sum_{v \in C(u)} p_{uv} f_r(v) = \sum_{v \in C(u)} p_{uv} f(r) = f(r) \sum_{v \in C(u)} p_{uv} = f(r) = f_r(u).$$
Hence, $P_u$ holds for $f_r$.

2. For the $c$-Lipschitz property, we have
$$|f_r(u) - f_r(v)| = 0 \le c_i, \quad \text{for any child } v \in C(u).$$
Again, $P_u$ holds for $f_r$.

3. For the bounded variance property, we have
$$\sum_{v \in C(u)} p_{uv} f_r^2(v) - \Big(\sum_{v \in C(u)} p_{uv} f_r(v)\Big)^2 = \sum_{v \in C(u)} p_{uv} f^2(r) - \Big(\sum_{v \in C(u)} p_{uv} f(r)\Big)^2 = f^2(r) - f^2(r) = 0 \le \sigma_i^2.$$


4. For the general bounded variance property, we have $f_r(u) = f(r) \ge 0$ and
$$\sum_{v \in C(u)} p_{uv} f_r^2(v) - \Big(\sum_{v \in C(u)} p_{uv} f_r(v)\Big)^2 = \sum_{v \in C(u)} p_{uv} f^2(r) - \Big(\sum_{v \in C(u)} p_{uv} f(r)\Big)^2 = f^2(r) - f^2(r) = 0 \le \sigma_i^2 + \phi_i f_r(u).$$

5. For the upper-bound property, we have
$$f_r(v) - \sum_{w \in C(u)} p_{uw} f_r(w) = f(r) - \sum_{w \in C(u)} p_{uw} f(r) = f(r) - f(r) = 0 \le a_i + M,$$
for any child $v$ of $u$.

6. For the lower-bound property, we have
$$\sum_{w \in C(u)} p_{uw} f_r(w) - f_r(v) = \sum_{w \in C(u)} p_{uw} f(r) - f(r) = f(r) - f(r) = 0 \le a_i + M,$$
for any child $v$ of $u$.

Therefore, $P_u$ holds for $f_r$ at every vertex $u$. $\square$

For two admissible conditions $P$ and $Q$, we define $P \wedge Q$ to be the property which is true exactly when both $P$ and $Q$ are true. If both admissible conditions $P$ and $Q$ are invariant under subtree-unification, then $P \wedge Q$ is also invariant under subtree-unification.

For any vertex $u$ of the tree $T$, an ancestor of $u$ is a vertex lying on the unique path from the root to $u$. For an admissible condition $P$, the associated bad set $B_i$ over $X_i$ is defined to be
$$B_i = \{v \mid \text{the depth of } v \text{ is } i, \text{ and } P_u \text{ does not hold for some ancestor } u \text{ of } v\}.$$
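As a toy illustration (our own construction, not from the survey), the following sketch performs a subtree-unification $f_r$ at a vertex where the martingale condition fails, and checks that the associated bad set at the leaf level becomes empty afterwards.

```python
# Sketch: subtree-unification f_r and bad sets B_i on a toy binary tree.
# The tree, transition probabilities, and labels are illustrative only.

children = {                       # u -> list of (child, p_uv); leaves absent
    "root": [("a", 0.5), ("b", 0.5)],
    "a": [("a0", 0.5), ("a1", 0.5)],
    "b": [("b0", 0.5), ("b1", 0.5)],
}
ancestors = {"a0": ["root", "a", "a0"], "a1": ["root", "a", "a1"],
             "b0": ["root", "b", "b0"], "b1": ["root", "b", "b1"]}
f = {"root": 0.0, "a": -1.0, "b": 1.0, "a0": -2.0, "a1": 1.0, "b0": 0.0, "b1": 2.0}

def martingale_ok(g, u):
    # P_u: g(u) equals the p_uv-weighted mean of its children (vacuous at leaves).
    kids = children.get(u, [])
    return not kids or abs(g[u] - sum(pv * g[v] for v, pv in kids)) < 1e-12

def unify(g, r):
    # g_r: r and all of its descendants receive the value g(r).
    gr, stack = dict(g), [r]
    while stack:
        u = stack.pop()
        gr[u] = g[r]
        stack.extend(v for v, _ in children.get(u, []))
    return gr

def bad_set(g, leaves):
    # B_i at leaf depth: leaves having some ancestor u at which P_u fails.
    return {v for v in leaves if any(not martingale_ok(g, u) for u in ancestors[v])}

leaves = ["a0", "a1", "b0", "b1"]
f_a = unify(f, "a")                # unify the subtree rooted at "a"
```

In this toy labeling $P_a$ fails (the children of $a$ average to $-0.5 \ne -1$), so the bad set is $\{a_0, a_1\}$; after the subtree-unification at $a$ the martingale condition holds everywhere and the bad set is empty.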

Lemma 1 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose each random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. For any admissible condition $P$, let $B_i$ be the associated bad set of $P$ over $X_i$. Then there are random variables $Y_0, \ldots, Y_n$ satisfying:


1. $Y_i$ is $\mathcal{F}_i$-measurable.

2. $Y_0, \ldots, Y_n$ satisfy the condition $P$.

3. $\{x \colon Y_i(x) \ne X_i(x)\} \subset B_i$, for $0 \le i \le n$.

Proof: We modify $f$ and define a labeling $f'$ on $T$ as follows. For any vertex $u$,
$$f'(u) = \begin{cases} f(u) & \text{if } f \text{ satisfies } P_v \text{ for every ancestor } v \text{ of } u \text{ (including } u \text{ itself)},\\ f(v) & \text{otherwise, where } v \text{ is the ancestor of } u \text{ of smallest depth at which } f \text{ fails } P_v.\end{cases}$$

Let $S$ be the set of vertices $u$ such that $f$ fails $P_u$ while $f$ satisfies $P_v$ for every proper ancestor $v$ of $u$.

It is clear that $f'$ can be obtained from $f$ by a sequence of subtree-unifications, where $S$ is the set of roots of the unified subtrees; furthermore, the order of the subtree-unifications does not matter. Since $P$ is invariant under subtree-unifications, the number of vertices at which $P$ fails decreases. Now we show that $f'$ satisfies $P$.

Suppose to the contrary that $f'$ fails $P_u$ for some vertex $u$. Since $P$ is invariant under subtree-unifications, $f$ also fails $P_u$. By the definition, $u$ then has an ancestor $v$ in $S$. After the subtree-unification on the subtree rooted at $v$, the condition $P_u$ is satisfied. This is a contradiction.

Let $Y_0, Y_1, \ldots, Y_n$ be the random variables corresponding to the labeling $f'$. The $Y_i$'s satisfy the desired properties (1)-(3). $\square$

The following theorem generalizes Azuma's inequality. A similar but more restricted version can be found in [19].

Theorem 32 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose the random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B = B_n$ denote the bad set associated with the following admissible condition:
$$E(X_i \mid \mathcal{F}_{i-1}) = X_{i-1}, \qquad |X_i - X_{i-1}| \le c_i,$$
for $1 \le i \le n$, where $c_1, c_2, \ldots, c_n$ are non-negative numbers. Then we have
$$\Pr(|X_n - X_0| \ge \lambda) \le 2 e^{-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}} + \Pr(B).$$

Proof: We use Lemma 1, which gives random variables $Y_0, Y_1, \ldots, Y_n$ satisfying properties (1)-(3) in the statement of Lemma 1. In particular,
$$E(Y_i \mid \mathcal{F}_{i-1}) = Y_{i-1}, \qquad |Y_i - Y_{i-1}| \le c_i.$$


In other words, $Y_0, \ldots, Y_n$ form a martingale which is $(c_1, \ldots, c_n)$-Lipschitz. By Azuma's inequality, we have
$$\Pr(|Y_n - Y_0| \ge \lambda) \le 2 e^{-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}}.$$
Since $Y_0 = X_0$ and $\{x \colon Y_n(x) \ne X_n(x)\} \subset B_n = B$, we have
$$\Pr(|X_n - X_0| \ge \lambda) \le \Pr(|Y_n - Y_0| \ge \lambda) + \Pr(X_n \ne Y_n) \le 2 e^{-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}} + \Pr(B). \qquad \square$$

For $c = (c_1, c_2, \ldots, c_n)$ a vector with positive entries, a martingale is said to be near-$c$-Lipschitz with exceptional probability $\eta$ if
$$\sum_i \Pr(|X_i - X_{i-1}| \ge c_i) \le \eta. \qquad (6)$$
Theorem 32 can be restated as follows:

Theorem 33 For non-negative values $c_1, c_2, \ldots, c_n$, suppose a martingale $X$ is near-$c$-Lipschitz with exceptional probability $\eta$. Then $X$ satisfies
$$\Pr(|X - E(X)| \ge a) \le 2 e^{-\frac{a^2}{2\sum_{i=1}^n c_i^2}} + \eta.$$
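A quick numerical sanity check (our own, not from the survey): a $\pm 1$ random walk that occasionally takes one oversized step is near-$c$-Lipschitz with $c_i = 1$, and its empirical tail stays below the Azuma bound plus the exceptional probability, in the spirit of Theorem 33. All parameter values below are arbitrary choices.

```python
import math
import random

# Sketch (ours): a +-1 martingale that, with small probability q per step,
# takes an oversized +-10 step, so it is near-c-Lipschitz with c_i = 1 and
# exceptional probability eta <= n * q.  Compare the empirical tail with
# the Theorem 33 style bound 2 exp(-a^2 / (2 sum c_i^2)) + eta.
random.seed(0)
n, q, trials, a = 100, 0.001, 20000, 30.0
exceed = 0
for _ in range(trials):
    x = 0.0
    for _ in range(n):
        step = 10.0 if random.random() < q else 1.0
        x += step if random.random() < 0.5 else -step
    if abs(x) >= a:
        exceed += 1
empirical = exceed / trials
bound = 2 * math.exp(-a ** 2 / (2 * n)) + n * q   # c_i = 1, eta = n * q
```

The empirical tail probability should come out well under the bound, which is roughly $0.12$ for these parameters.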

Now we can apply the same technique to relax all of the theorems in the previous sections. Here are the relaxed versions of Theorems 20, 25, and 27.

Theorem 34 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1}, \qquad \operatorname{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2, \qquad X_i - E(X_i \mid \mathcal{F}_{i-1}) \le a_i + M,$$
for some non-negative constants $\sigma_i$ and $a_i$. Then we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3)}} + \Pr(B).$$

Theorem 35 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$


suppose a non-negative random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1}, \qquad \operatorname{Var}(X_i \mid \mathcal{F}_{i-1}) \le \phi_i X_{i-1}, \qquad X_i - E(X_i \mid \mathcal{F}_{i-1}) \le M,$$
for some non-negative constants $\phi_i$ and $M$. Then we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\frac{\lambda^2}{2((X_0 + \lambda)(\sum_{i=1}^n \phi_i) + M\lambda/3)}} + \Pr(B).$$

Theorem 36 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a non-negative random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1}, \qquad \operatorname{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}, \qquad X_i - E(X_i \mid \mathcal{F}_{i-1}) \le a_i + M,$$
for some non-negative constants $\sigma_i$, $\phi_i$, $a_i$, and $M$. Then we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda)(\sum_{i=1}^n \phi_i) + M\lambda/3)}} + \Pr(B).$$

For supermartingales, we have the following relaxed versions of Theorems 22, 26, and 29.

Theorem 37 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1}, \qquad \operatorname{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2, \qquad E(X_i \mid \mathcal{F}_{i-1}) - X_i \le a_i + M,$$
for some non-negative constants $\sigma_i$, $a_i$, and $M$. Then we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3)}} + \Pr(B).$$

Theorem 38 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$


suppose a random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1}, \qquad \operatorname{Var}(X_i \mid \mathcal{F}_{i-1}) \le \phi_i X_{i-1}, \qquad E(X_i \mid \mathcal{F}_{i-1}) - X_i \le M,$$
for some non-negative constants $\phi_i$ and $M$. Then we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\frac{\lambda^2}{2(X_0 (\sum_{i=1}^n \phi_i) + M\lambda/3)}} + \Pr(B),$$
for all $\lambda \le X_0$.

Theorem 39 For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a non-negative random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1}, \qquad \operatorname{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}, \qquad E(X_i \mid \mathcal{F}_{i-1}) - X_i \le a_i + M,$$
for some non-negative constants $\sigma_i$, $\phi_i$, $a_i$, and $M$. Then we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + X_0 (\sum_{i=1}^n \phi_i) + M\lambda/3)}} + \Pr(B),$$
for $\lambda < X_0$.

9 A generalized Polya's urn problem

To see the powerful effect of the concentration and martingale inequalities of the previous sections, the best way is to check out some interesting applications. In this section we give a probabilistic analysis of the following process involving balls and bins:

For a fixed $0 \le p \le 1$, begin with $n_0$ bins, each containing one ball, and then introduce balls one at a time. For each new ball, with probability $p$, create a new bin and place the ball in that bin; otherwise, place the ball in an existing bin, such that the probability that the ball is placed in a given bin is proportional to the number of balls in that bin.
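The process can be simulated directly; in the sketch below (our own, not from the survey), each ball is stored as a list entry recording its bin, so a uniform draw over balls selects an existing bin with probability proportional to its size. The initial bin count (here called n0) and all parameter values are assumptions for illustration.

```python
import random
from collections import Counter

# Sketch: one run of the infinite Polya process.  owner[j] is the bin of
# ball j; drawing a uniform ball and joining its bin realizes the rule
# "probability proportional to the number of balls in the bin".
random.seed(1)
n0, p, T = 10, 0.3, 50000
owner = list(range(n0))      # start: n0 bins, one ball each
bins = n0
for _ in range(T):
    if random.random() < p:
        owner.append(bins)   # new ball opens a new bin
        bins += 1
    else:
        owner.append(random.choice(owner))  # size-biased choice of a bin

size = Counter(owner)        # bin -> number of balls in it
m = Counter(size.values())   # k -> number of bins with exactly k balls
frac1 = m[1] / (T + n0)      # to be compared with M_1 = p / (2 - p)
```

For $p = 0.3$ the limiting fraction of singleton bins worked out below is $M_1 = p/(2-p) \approx 0.176$, and one run of this size typically lands within a percent or two of it.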

Polya's urn problem (see [18]) is a special case of the above process with $p = 0$, so new bins are never created. For the case of $p > 0$, this infinite Polya


process has a similar flavor as the preferential attachment scheme, one of the main models for generating the webgraph among other information networks (see Barabasi et al. [4, 6]).

In Subsection 9.1, we will show that the infinite Polya process generates a power law distribution, so that the expected fraction of bins having $k$ balls is asymptotic to $c k^{-\beta}$, where $\beta = 1 + 1/(1-p)$ and $c$ is a constant. The concentration result on the probabilistic error estimates for the power law distribution will then be given in Subsection 9.2.

9.1 The expected number of bins with k balls

To analyze the infinite Polya process, we let $n_t$ denote the number of bins at time $t$ and let $e_t$ denote the number of balls at time $t$. We have
$$e_t = t + n_0.$$
The number of bins $n_t$, however, is a sum of $t$ random indicator variables,
$$n_t = n_0 + \sum_{j=1}^{t} s_j,$$
where
$$\Pr(s_j = 1) = p, \qquad \Pr(s_j = 0) = 1 - p.$$
It follows that
$$E(n_t) = n_0 + pt.$$
To get a handle on the actual value of $n_t$, we use the binomial concentration inequality described in Theorem 4. Namely,
$$\Pr(|n_t - E(n_t)| > a) \le e^{-a^2/(2pt + 2a/3)}.$$
Thus, $n_t$ is exponentially concentrated around $E(n_t)$.

The problem of interest is the distribution of the sizes of the bins in the infinite Polya process. Let $m_{k,t}$ denote the number of bins with $k$ balls at time $t$. First we note that
$$m_{1,0} = n_0, \qquad m_{k,0} = 0 \text{ for } k > 1.$$

We wish to derive a recurrence for the expected value $E(m_{k,t})$. Note that a bin with $k$ balls at time $t$ could have come about in two ways: either it was a bin with $k$ balls at time $t-1$ and no ball was added to it, or it was a bin with $k-1$ balls at time $t-1$ and a new ball was put in. Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by all the possible outcomes at time $t$. Then, for $k > 1$,
$$E(m_{k,t} \mid \mathcal{F}_{t-1}) = m_{k,t-1}\Big(1 - \frac{(1-p)k}{t + n_0 - 1}\Big) + m_{k-1,t-1}\,\frac{(1-p)(k-1)}{t + n_0 - 1},$$
$$E(m_{k,t}) = E(m_{k,t-1})\Big(1 - \frac{(1-p)k}{t + n_0 - 1}\Big) + E(m_{k-1,t-1})\,\frac{(1-p)(k-1)}{t + n_0 - 1}. \qquad (7)$$


For $t > 0$ and $k = 1$, we have
$$E(m_{1,t} \mid \mathcal{F}_{t-1}) = m_{1,t-1}\Big(1 - \frac{1-p}{t + n_0 - 1}\Big) + p,$$
$$E(m_{1,t}) = E(m_{1,t-1})\Big(1 - \frac{1-p}{t + n_0 - 1}\Big) + p. \qquad (8)$$

To solve this recurrence, we use the following fact (see [12]): for a sequence $\{a_t\}$ satisfying the recursive relation $a_{t+1} = (1 - \frac{b_t}{t}) a_t + c_t$, the limit $\lim_{t\to\infty} a_t/t$ exists and
$$\lim_{t\to\infty} \frac{a_t}{t} = \frac{c}{1 + b},$$
provided that $\lim_{t\to\infty} b_t = b > 0$ and $\lim_{t\to\infty} c_t = c$.

We proceed by induction on $k$ to show that $\lim_{t\to\infty} E(m_{k,t})/t$ has a limit $M_k$ for each $k$.

The first case is $k = 1$. In this case, we apply the above fact with $b_t = b = 1 - p$ and $c_t = c = p$ to deduce that $\lim_{t\to\infty} E(m_{1,t})/t$ exists and
$$M_1 = \lim_{t\to\infty} \frac{E(m_{1,t})}{t} = \frac{p}{2-p}.$$

Now we assume that $\lim_{t\to\infty} E(m_{k-1,t})/t$ exists, and we apply the fact again with $b_t = b = k(1-p)$ and $c_t = E(m_{k-1,t-1})\,(1-p)(k-1)/(t + n_0 - 1)$, so that $c = M_{k-1}(1-p)(k-1)$. Thus the limit $\lim_{t\to\infty} E(m_{k,t})/t$ exists and is equal to
$$M_k = \frac{M_{k-1}(1-p)(k-1)}{1 + k(1-p)} = M_{k-1}\,\frac{k-1}{k + \frac{1}{1-p}}. \qquad (9)$$

Thus we can write
$$M_k = \frac{p}{2-p} \prod_{j=2}^{k} \frac{j-1}{j + \frac{1}{1-p}} = \frac{p}{2-p}\,\frac{\Gamma(k)\,\Gamma(2 + \frac{1}{1-p})}{\Gamma(k + 1 + \frac{1}{1-p})},$$
where $\Gamma(k)$ is the Gamma function.

We wish to show that the distribution of the bin sizes follows a power law with $M_k \propto k^{-\beta}$ (where $\propto$ means "is proportional to") for large $k$. If $M_k \propto k^{-\beta}$, then
$$\frac{M_k}{M_{k-1}} = \frac{k^{-\beta}}{(k-1)^{-\beta}} = \Big(1 - \frac{1}{k}\Big)^{\beta} = 1 - \frac{\beta}{k} + O\Big(\frac{1}{k^2}\Big).$$
From (9) we have
$$\frac{M_k}{M_{k-1}} = \frac{k-1}{k + \frac{1}{1-p}} = 1 - \frac{1 + \frac{1}{1-p}}{k + \frac{1}{1-p}} = 1 - \frac{1 + \frac{1}{1-p}}{k} + O\Big(\frac{1}{k^2}\Big).$$
Thus we have an approximate power law with
$$\beta = 1 + \frac{1}{1-p} = 2 + \frac{p}{1-p}.$$
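The exponent can be checked numerically from recurrence (9) (our sketch; $p = 0.3$ is arbitrary): by the ratio computation above, $k\,(1 - M_k/M_{k-1})$ should approach $\beta = 1 + 1/(1-p)$.

```python
# Sketch: recovering beta = 1 + 1/(1-p) from the ratios M_k / M_{k-1} of (9).
p = 0.3
beta = 1 + 1 / (1 - p)
M = {1: p / (2 - p)}
for k in range(2, 2001):
    M[k] = M[k - 1] * (k - 1) / (k + 1 / (1 - p))   # recurrence (9)
k = 2000
est = k * (1 - M[k] / M[k - 1])                     # = beta + O(1/k)
```

At $k = 2000$ the estimate agrees with $\beta \approx 2.43$ to about two decimal places, and the $M_k$ are indeed decreasing.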


9.2 Concentration on the number of bins with k balls

Since the expected value can be quite different from the actual number of bins with $k$ balls at time $t$, we give a (probabilistic) estimate of the difference. We will prove the following theorem.

Theorem 40 For the infinite Polya process, asymptotically almost surely the number of bins with $k$ balls at time $t$ is
$$M_k(t + n_0) + O\Big(2\sqrt{k^3 (t + n_0) \ln(t + n_0)}\Big).$$

Recall that $M_1 = \frac{p}{2-p}$ and $M_k = \frac{p}{2-p}\,\frac{\Gamma(k)\,\Gamma(2 + \frac{1}{1-p})}{\Gamma(k + 1 + \frac{1}{1-p})} = O\big(k^{-(2 + \frac{p}{1-p})}\big)$, for $k \ge 2$. In other words, almost surely the distribution of the bin sizes for the infinite Polya process follows a power law with the exponent $\beta = 1 + \frac{1}{1-p}$.

Proof: We have shown that
$$\lim_{t\to\infty} \frac{E(m_{k,t})}{t} = M_k,$$
where $M_k$ is defined recursively in (9). It suffices to show that $m_{k,t}$ concentrates on the expected value.

We shall prove the following claim.

Claim: For any fixed $k \ge 1$ and any $c > 0$, with probability at least $1 - 2(t + n_0 + 1)^{k-1} e^{-c^2}$, we have
$$|m_{k,t} - M_k(t + n_0)| \le 2kc\sqrt{t + n_0}.$$

To see that the claim implies Theorem 40, we choose $c = \sqrt{k \ln(t + n_0)}$. Note that
$$2(t + n_0 + 1)^{k-1} e^{-c^2} = 2(t + n_0 + 1)^{k-1}(t + n_0)^{-k} = o(1).$$
From the claim, with probability $1 - o(1)$ we have
$$|m_{k,t} - M_k(t + n_0)| \le 2\sqrt{k^3 (t + n_0) \ln(t + n_0)},$$
as desired. It remains to prove the claim.

Proof of the Claim: We shall prove it by induction on $k$.

The base case of $k = 1$: For $k = 1$, from equation (8) we have
$$E\big(m_{1,t} - M_1(t + n_0) \mid \mathcal{F}_{t-1}\big) = E(m_{1,t} \mid \mathcal{F}_{t-1}) - M_1(t + n_0)$$
$$= m_{1,t-1}\Big(1 - \frac{1-p}{t + n_0 - 1}\Big) + p - M_1(t + n_0 - 1) - M_1$$
$$= \big(m_{1,t-1} - M_1(t + n_0 - 1)\big)\Big(1 - \frac{1-p}{t + n_0 - 1}\Big) + p - M_1(1-p) - M_1$$
$$= \big(m_{1,t-1} - M_1(t + n_0 - 1)\big)\Big(1 - \frac{1-p}{t + n_0 - 1}\Big),$$


since $p - M_1(1-p) - M_1 = 0$.

Let
$$X_{1,t} = \frac{m_{1,t} - M_1(t + n_0)}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j + n_0 - 1}\big)}.$$
We consider the martingale formed by $X_{1,0}, X_{1,1}, \ldots, X_{1,t}$. We have
$$X_{1,t} - X_{1,t-1} = \frac{m_{1,t} - M_1(t + n_0)}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j + n_0 - 1}\big)} - \frac{m_{1,t-1} - M_1(t + n_0 - 1)}{\prod_{j=1}^{t-1}\big(1 - \frac{1-p}{j + n_0 - 1}\big)}$$
$$= \frac{1}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j + n_0 - 1}\big)}\Big[\big(m_{1,t} - M_1(t + n_0)\big) - \big(m_{1,t-1} - M_1(t + n_0 - 1)\big)\Big(1 - \frac{1-p}{t + n_0 - 1}\Big)\Big]$$
$$= \frac{1}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j + n_0 - 1}\big)}\Big[(m_{1,t} - m_{1,t-1}) + \frac{1-p}{t + n_0 - 1}\big(m_{1,t-1} - M_1(t + n_0 - 1)\big) - M_1\Big].$$
We note that $|m_{1,t} - m_{1,t-1}| \le 1$, $m_{1,t-1} \le t + n_0 - 1$, and $M_1 = \frac{p}{2-p} < 1$.


Here $E(X_{1,t}) = X_{1,0}$. We will use the following approximation:
$$\prod_{j=1}^{i}\Big(1 - \frac{1-p}{j + n_0 - 1}\Big) = \prod_{j=1}^{i} \frac{j + n_0 - 2 + p}{j + n_0 - 1} = \frac{\Gamma(n_0)\,\Gamma(i + n_0 - 1 + p)}{\Gamma(n_0 - 1 + p)\,\Gamma(i + n_0)} \approx C\,(i + n_0)^{-1+p},$$
where $C = \frac{\Gamma(n_0)}{\Gamma(n_0 - 1 + p)}$ is a constant depending only on $p$ and $n_0$.

For any $c > 0$, we choose
$$\lambda = \frac{4c\sqrt{t + n_0}}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j + n_0 - 1}\big)} \approx 4 C^{-1} c\,(t + n_0)^{3/2 - p}.$$
We have
$$\sum_{i=1}^{t} \sigma_i^2 = \sum_{i=1}^{t} \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{1-p}{j + n_0 - 1}\big)^2} \approx \sum_{i=1}^{t} 4 C^{-2} (i + n_0)^{2 - 2p} \approx \frac{4 C^{-2}}{3 - 2p}\,(t + n_0)^{3 - 2p} < 4 C^{-2} (t + n_0)^{3 - 2p}.$$
We note that
$$M\lambda/3 \le \frac{8}{3} C^{-2} c\,(t + n_0)^{5/2 - 2p} \le 2 C^{-2} (t + n_0)^{3 - 2p},$$
provided $4c/3 \le \sqrt{t + n_0}$. (The claim is trivial when $4c/3 > \sqrt{t + n_0}$, since $|m_{1,t} - M_1(t + n_0)| \le 2t$ always holds.) These estimates yield the upper bound
$$m_{1,t} - M_1(t + n_0) \le 2c\sqrt{t + n_0} \qquad (12)$$
with probability at least $1 - e^{-c^2}$.

Similarly, by applying Theorem 23 to the martingale, the following lower bound
$$m_{1,t} - M_1(t + n_0) \ge -2c\sqrt{t + n_0} \qquad (13)$$


holds with probability at least $1 - e^{-c^2}$. We have proved the claim for $k = 1$.

The inductive step: Suppose the claim holds for $k - 1$. For $k$, we define
$$X_{k,t} = \frac{m_{k,t} - M_k(t + n_0) - 2(k-1)c\sqrt{t + n_0}}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)}.$$

We have
$$E\big(m_{k,t} - M_k(t + n_0) - 2(k-1)c\sqrt{t + n_0} \,\big|\, \mathcal{F}_{t-1}\big) = E(m_{k,t} \mid \mathcal{F}_{t-1}) - M_k(t + n_0) - 2(k-1)c\sqrt{t + n_0}$$
$$= m_{k,t-1}\Big(1 - \frac{(1-p)k}{t + n_0 - 1}\Big) + m_{k-1,t-1}\,\frac{(1-p)(k-1)}{t + n_0 - 1} - M_k(t + n_0) - 2(k-1)c\sqrt{t + n_0}.$$
By the induction hypothesis, with probability at least $1 - 2(t + n_0)^{k-2} e^{-c^2}$, we have
$$|m_{k-1,t-1} - M_{k-1}(t + n_0 - 1)| \le 2(k-1)c\sqrt{t + n_0 - 1}.$$
By using this estimate, with probability at least $1 - 2(t + n_0)^{k-2} e^{-c^2}$, we have
$$E\big(m_{k,t} - M_k(t + n_0) - 2(k-1)c\sqrt{t + n_0} \,\big|\, \mathcal{F}_{t-1}\big) \le \Big(1 - \frac{(1-p)k}{t + n_0 - 1}\Big)\big(m_{k,t-1} - M_k(t + n_0 - 1) - 2(k-1)c\sqrt{t + n_0 - 1}\big),$$
using the fact that $M_k \le M_{k-1}$, as seen in (9). Therefore $X_{k,0}, X_{k,1}, \ldots, X_{k,t}$ forms a submartingale with failure probability at most $2t(t + n_0)^{k-2} e^{-c^2}$.

Similar to inequalities (10) and (11), it can easily be shown that
$$|X_{k,t} - X_{k,t-1}| \le \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)} \qquad (14)$$
and
$$\operatorname{Var}(X_{k,t} \mid \mathcal{F}_{t-1}) \le \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)^2}.$$

We apply Theorem 34 to the submartingale with
$$\sigma_i^2 = \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)^2}, \qquad M = \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)}, \qquad a_i = 0.$$
We have
$$\Pr\big(X_{k,t} \ge E(X_{k,t}) + \lambda\big) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^{t} \sigma_i^2 + M\lambda/3)}} + \Pr(B),$$
where $\Pr(B) \le 2t(t + n_0)^{k-2} e^{-c^2}$ by the induction hypothesis.


Here $E(X_{k,t}) \le X_{k,0} \le 0$. We will use the following approximation:
$$\prod_{j=1}^{i}\Big(1 - \frac{(1-p)k}{j + n_0 - 1}\Big) = \prod_{j=1}^{i} \frac{j + n_0 - 1 - (1-p)k}{j + n_0 - 1} = \frac{\Gamma(n_0)\,\Gamma(i + n_0 - (1-p)k)}{\Gamma(n_0 - (1-p)k)\,\Gamma(i + n_0)} \approx C_k\,(i + n_0)^{-(1-p)k},$$
where $C_k = \frac{\Gamma(n_0)}{\Gamma(n_0 - (1-p)k)}$ is a constant depending only on $k$, $p$, and $n_0$.

For any $c > 0$, we choose
$$\lambda = \frac{4c\sqrt{t + n_0}}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)} \approx 4 C_k^{-1} c\,(t + n_0)^{1/2 + k(1-p)}.$$
We have
$$\sum_{i=1}^{t} \sigma_i^2 = \sum_{i=1}^{t} \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)^2} \approx \sum_{i=1}^{t} 4 C_k^{-2} (i + n_0)^{2k(1-p)} \approx \frac{4 C_k^{-2}}{1 + 2k(1-p)}\,(t + n_0)^{1 + 2k(1-p)} < 4 C_k^{-2} (t + n_0)^{1 + 2k(1-p)}.$$
We note that
$$M\lambda/3 \le \frac{8}{3} C_k^{-2} c\,(t + n_0)^{1/2 + 2k(1-p)} \le 2 C_k^{-2} (t + n_0)^{1 + 2k(1-p)},$$
provided $4c/3 \le \sqrt{t + n_0}$. (The claim is trivial when $4c/3 > \sqrt{t + n_0}$, since $|m_{k,t} - M_k(t + n_0)| \le 2(t + n_0)$ always holds.) These estimates yield the upper bound
$$m_{k,t} - M_k(t + n_0) \le 2kc\sqrt{t + n_0} \qquad (15)$$
with probability at least $1 - (t + n_0 + 1)^{k-1} e^{-c^2}$.


To obtain the lower bound, we consider
$$X'_{k,t} = \frac{m_{k,t} - M_k(t + n_0) + 2(k-1)c\sqrt{t + n_0}}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j + n_0 - 1}\big)}.$$
It can easily be shown that $X'_{k,t}$ is nearly a supermartingale. Similarly, applying Theorem 37 to $X'_{k,t}$, the following lower bound
$$m_{k,t} - M_k(t + n_0) \ge -2kc\sqrt{t + n_0} \qquad (16)$$
holds with probability at least $1 - (t + n_0 + 1)^{k-1} e^{-c^2}$.

Together these prove the statement for $k$. The proof of Theorem 40 is complete. $\square$

The above methods for proving concentration of the power law distribution for the infinite Polya process can easily be carried out for many other problems. One of the most popular models for generating random graphs (which simulate webgraphs and various information networks) is the so-called preferential attachment scheme. The problem of the degree distribution of the preferential attachment scheme can be viewed as a variation of the Polya process, as we will see. Before we proceed, we first give a short description of the preferential attachment scheme [3, 21]:

- With probability $p$, for some fixed $p$, add a new vertex $v$, and add an edge $\{u, v\}$ from $v$ by randomly and independently choosing $u$ in proportion to the degree of $u$ in the current graph. The initial graph, say, is one single vertex with a loop.

- Otherwise, add a new edge $\{r, s\}$ by independently choosing vertices $r$ and $s$ with probability proportional to their degrees. Here $r$ and $s$ could be the same vertex.

The above preferential attachment scheme can be rewritten as the following variation of the Polya process:

- Start with one bin containing one ball.

- At each step, with probability $p$, add two balls, one to a new bin and one to an existing bin chosen with probability proportional to the bin size; with probability $1 - p$, add two balls, each of which is independently placed into an existing bin with probability proportional to the bin size.

As we can see, the bins are the vertices; at each time step, the bins in which the two balls are placed are associated with an edge; and the bin size is exactly the degree of the vertex.
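The urn reformulation is easy to simulate (our own sketch; the step count and $p$ are arbitrary). Each ball is an edge endpoint, so bin sizes are vertex degrees, and size-biased placement is again a uniform draw over balls. As a simplification, the size-biased endpoint in the vertex-creation step is drawn before the new balls are added, matching the choice of $u$ in the current graph.

```python
import random

# Sketch: the preferential attachment scheme as the two-ball urn process.
# owner[j] is the bin (= vertex) of ball j; bin size = vertex degree.
random.seed(2)
p, T = 0.5, 30000
owner = [0, 0]        # initial graph: one vertex with a loop = one bin, two balls
bins = 1
for _ in range(T):
    if random.random() < p:
        u = random.choice(owner)   # endpoint chosen proportional to degree
        owner.append(bins)         # the new vertex v receives one ball
        owner.append(u)            # ... and u receives the other endpoint
        bins += 1
    else:
        r, s = random.choice(owner), random.choice(owner)  # edge {r, s},
        owner.append(r)                                    # both endpoints
        owner.append(s)                                    # degree-biased
```

Each step adds exactly two balls (one edge), and the number of bins concentrates around $1 + pT$, mirroring the vertex count of the graph model.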

It is not difficult to show that the expected degrees of the preferential attachment model satisfy a power law distribution with exponent $1 + 2/(2-p)$ (see [3, 21]). The concentration results for the power law degree distribution of the preferential attachment scheme can be proved in a very similar way to what we have done in this section for the Polya process. The details of the proof can be found in the forthcoming book [13].


    References

[1] J. Abello, A. Buchsbaum, and J. Westbrook, A functional approach to external graph algorithms, Proc. 6th European Symposium on Algorithms, pp. 332-343, 1998.

[2] W. Aiello, F. Chung and L. Lu, A random graph model for massive graphs, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, (2000), 171-180.

[3] W. Aiello, F. Chung and L. Lu, Random evolution in massive graphs. Extended abstract appeared in The 42nd Annual Symposium on Foundations of Computer Science, October 2001. Paper version appeared in Handbook of Massive Data Sets, (Eds. J. Abello et al.), Kluwer Academic Publishers (2002), 97-122.

[4] R. Albert and A.-L. Barabasi, Statistical mechanics of complex networks, Review of Modern Physics 74 (2002), 47-97.

[5] N. Alon and J. H. Spencer, The Probabilistic Method, Wiley and Sons, New York, 1992.

[6] A.-L. Barabasi and R. Albert, Emergence of scaling in random networks, Science 286 (1999), 509-512.

[7] N. Alon, J.-H. Kim and J. H. Spencer, Nearly perfect matchings in regular simple hypergraphs, Israel J. Math. 100 (1997), 171-187.

[8] H. Chernoff, A note on an inequality involving the normal distribution, Ann. Probab. 9 (1981), 533-535.

[9] F. Chung and L. Lu, Connected components in random graphs with given expected degree sequences, Annals of Combinatorics 6 (2002), 125-145.

[10] F. Chung and L. Lu, The average distances in random graphs with given expected degrees, Proceedings of the National Academy of Sciences 99 (2002), 15879-15882.

[11] F. Chung, L. Lu and V. Vu, The spectra of random graphs with given expected degrees, Proceedings of the National Academy of Sciences 100, no. 11 (2003), 6313-6318.

[12] F. Chung and L. Lu, Coupling online and offline analyses for random power law graphs, Internet Mathematics 1 (2004), 409-461.

[13] F. Chung and L. Lu, Complex Graphs and Networks, CBMS Lecture Notes, in preparation.

[14] F. Chung, S. Handjani and D. Jungreis, Generalizations of Polya's urn problem, Annals of Combinatorics 7 (2003), 141-153.


[15] W. Feller, Martingales, in An Introduction to Probability Theory and its Applications, Vol. 2, Wiley, New York, 1971.

[16] R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, second edition, Addison-Wesley, Reading, MA, 1994.

[17] S. Janson, T. Luczak, and A. Rucinski, Random Graphs, Wiley-Interscience, New York, 2000.

[18] N. Johnson and S. Kotz, Urn Models and Their Applications: An Approach to Modern Discrete Probability Theory, Wiley, New York, 1977.

[19] J. H. Kim and V. Vu, Concentration of multivariate polynomials and its applications, Combinatorica 20 (3) (2000), 417-434.

[20] C. McDiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics, 195-248, Algorithms Combin. 16, Springer, Berlin, 1998.

[21] M. Mitzenmacher, A brief history of generative models for power law and lognormal distributions, Internet Mathematics 1 (2004), 226-251.

[22] N. C. Wormald, The differential equation method for random processes and greedy algorithms, in Lectures on Approximation and Randomized Algorithms (M. Karonski and H. J. Proemel, eds.), pp. 73-155, PWN, Warsaw, 1999.