probability theory 1 lecture notes - cornell university web6720/math 6710 notes.pdfprobability...

Download Probability Theory 1 Lecture Notes - Cornell University web6720/MATH 6710 notes.pdfPROBABILITY THEORY 1 LECTURE NOTES JOHN PIKE These lecture notes were written for MATH 6710 at Cornell University in the allF semester of 2013. They were revised in the allF of 2015 and the schedule on the following page

Post on 20-Apr-2018




5 download

Embed Size (px)




    These lecture notes were written for MATH 6710 at Cornell University in the Fall semester

    of 2013. They were revised in the Fall of 2015 and the schedule on the following page

    reects that semester. These notes are for personal educational use only and are not to

    be published or redistributed. Almost all of the material, much of the structure, and some

    of the language comes directly from the course text, Probability: Theory and Examples by

    Rick Durrett. Gerald Folland's Real Analysis: Modern Techniques and Their Applications

    is the source of most of the auxiliary measure theory details. There are likely typos and

    mistakes in these notes. All such errors are the author's fault and corrections are greatly



  • Day 1: Finished Section 1

    Day 2: Finished Section 2

    Day 3: Up to denition of semialgebra

    Day 4: Finished Section 3

    Day 5: Through review of integration

    Day 6: Through Limit Theorems

    Day 7: Through Corollary 6.2

    Day 8: Through Theorem 6.6

    Day 9: Finished Section 6

    Day 10: Through Corollary 7.1

    Day 11: Through Example 7.5

    Day 12: Through Theorem 7.4

    Day 13: Finished Section 7

    Day 14: Through Theorem 8.3

    Day 15: Finished Section 8

    Day 16: Through Theorem 9.2

    Day 17: Through Example 10.2

    Day 18: Up to Claim in 3-series Theorem

    Day 19: Finished Section 10

    Day 20: Through Theorem 11.1

    Day 21: Through Theorem 11.4

    Day 22: Finished Section 11

    Day 23: Through Theorem 12.1

    Day 24: Up to Theorem 12.4

    Day 25: Through Lemma 13.2

    Day 26: Through Theorem 13.3

    Day 27: Finished Section 13

    Day 28: Through Example 14.1

    Day 29: Finished Section 14

    Day 30: Through beginning of proof of Lemma 15.2

    Day 31: Through Theorem 15.2

    Day 32: Through Theorem 15.6

    Day 33: Finished Section 16

    Day 34: Through Proposition 17.2

    Day 35: Started proof of Theorem 18.1 (skipped Wald 2)

    Day 36: Through Theorem 18.4

    Day 37: Through Lemma 19.3 (skipped Chung-Fuchs)

    Day 38: Finished Section 19


  • 1. Introduction

    Probability Spaces.

    A probability space is a measure space (,F , P ) with P () = 1.

    The sample space can be any set, and it can be thought of as the collection of all possible outcomes of

    some experiment or all possible states of some system. Elements of are referred to as elementary outcomes.

    The -algebra (or -eld) F 2 satises

    1) F is nonempty

    2) E F EC F

    3) For any countable collection {Ei}iI F ,iI Ei F .

    Elements of F are called events, and can be regarded as sets of elementary outcomes about which one can saysomething meaningful. Before the experiment has occurred (or the observation has been made), a meaningful

    statement about E F is P (E). Afterward, a meaningful statement is whether or not E occurred.

    The probability measure P : F [0, 1] satises

    1) P () = 1

    2) for any countable disjoint collection {Ei}iI , P(

    iI Ei)

    =iI P (Ei).

    The interpretation is that P (A) represents the chance that event A occurs (though there is no general

    consensus about what that actually means).

    If p is some property and A = { : p() is true} is such that P (A) = 1, then we say that p holdsalmost surely, or a.s. for short. This is equivalent to almost everywhere in measure theory. Note that it

    is possible to have an event E F with E 6= and P (E) = 0. Thus, for instance, there is a distinctionbetween impossible and with probability zero as discussed in Example 1.3 below.

    Example 1.1. Rolling a fair die: = {1, 2, 3, 4, 5, 6}, F = 2, P (E) = |E|6 .

    Example 1.2. Flipping a (possibly biased) coin: = {H,T}, F = 2 = {, {H}, {T}, {H,T}}, P satisesP ({H}) = p and P ({T}) = 1 p for some p (0, 1).

    Example 1.3. Random point in the unit interval: = [0, 1], F = B[0,1] = Borel Sets, P = Lebesgue measure.

    The experiment here is to pick a real number between 0 and 1 uniformly at random. Generally speaking,

    uniformity corresponds to translation invariance, which is the primary dening property of Lebesgue measure.

    Observe that each outcome x [0, 1] has P ({x}) = 0, so the experiment must result in the realization of anoutcome with probability zero.

    Example 1.4. Standard normal distribution: = R, F = B, P (E) = 12



    2 dx.

    Example 1.5. Poisson distribution with mean : = N {0}, F = 2, P (E) = ekE


    k! .


  • Why Measure Theory.

    Historically, probability was dened in terms of a nite number of equally likely outcomes (Example 1.1) so

    that ||

  • The expectation (or mean or expected value) of a real-valued random variable X on (,F , P ) is dened asE[X] =

    X()dP () whenever the integral is well-dened.

    Expectation is generally interpreted as a weighted average which gives the best guess for the value of the

    random quantity X.

    We will study random variables and their expectations in greater detail soon. For now, the point is that many

    familiar objects from undergraduate probability can be rigorously and simply dened using the language of

    measure theory.

    That said, it should be emphasized that probability is not just the study of measure spaces with total mass

    1. As useful and necessary as the rigorous measure theoretic foundations are, it is equally important to

    cultivate a probabilistic way of thinking whereby one conceptualizes problems in terms of coin tossing, card

    shuing, particle trajectories, and so forth.

    * An example of a subset of [0, 1] which has no well-dened Lebesgue measure is given by the following


    Dene an equivalence relation on [0, 1] by x y if and only if x y Q.Using the axiom of choice, let E [0, 1] consist of exactly one point from each equivalence class.For q Q[0,1), dene Eq = E + q (mod 1). By construction Eq

    Er = for r 6= q and

    qQ[0,1) Eq = [0, 1].

    Thus, by countable additivity, we must have

    1 = m ([0, 1)) = m



    = qQ[0,1)


    However, Lebesgue measure is translation invariant, so m(Eq) = m(E) for all q.

    We see that m(Eq) is not well-dened as m(Eq) = 0 implies 1 = 0 and m(Eq) > 0 implies 1 =.

    The existence of non-measurable sets can be proved using slightly weaker assumptions than the axiom of

    choice (such as the Boolean prime ideal theorem), but it has been shown that the existence of non-measurable

    sets is not provable in Zermelo-Fraenkel alone.

    In three or more dimensions, the Banach-Tarski paradox shows that in ZFC, there is no nitely additive

    measure dened on all subsets of Euclidean space which is invariant under translation and rotation.

    (The paradox is that one can cut a unit ball into ve pieces and reassemble them using only rigid motions

    to obtain two disjoint unit balls.)


  • 2. Preliminary Results

    At this point, we need to establish some fundamental facts about probability measures and -algebras in

    preparation for a discussion of probability distributions and to reacquaint ourselves with the style of measure

    theoretic arguments.

    Probability Measures.

    The following simple facts are extremely useful and will be employed frequently throughout this course.

    Theorem 2.1. Let P be a probability measure on (,F).

    (i) Complements For any A F , P (AC) = 1 P (A).

    (ii) Monotonicity For any A,B F with A B, P (A) P (B).

    (iii) Subadditivity For any countable collection {Ei}i=1 F , P (i=1Ei)

    i=1 P (Ei).

    (iv) Continuity from below If Ai A (i.e. A1 A2 ... andi=1Ai = A), then

    limn P (An) = P (A).

    (v) Continuity from above If Ai A =i=1Ai, then limn P (An) = P (A).


    For (i), 1 = P () = P (A tAC) = P (A) + P (AC) by countable additivity.

    For (ii), P (B) = P (A t (B \A)) = P (A) + P (B \A) P (A).

    For (iii), we disjointify the sets by dening F1 = E1 and Fi = Ei \(i1


    )for i > 1, and observe that

    the F is are disjoint andni=1 Fi =

    ni=1Ei for all n N {}. Since Fi Ei for all i, we have


    ( i=1


    )= P

    ( i=1




    P (Fi) i=1

    P (Ei).

    For (iv), set B1 = A1 and Bi = Ai \ Ai1 for i > 1, and note that the Bis are disjoint withni=1Bi = An

    andi=1Bi = A. Then

    P (A) = P

    ( i=1




    P (Bi) = limn


    P (Bi)

    = limn




    )= limn

    P (An).

    For (v), if A1 A2 ... and A =i=1Ai, then A

    C1 AC2 ... and AC = (



    Ci , so it

    follows from (i) and (iv) that

    P (A) = 1 P (AC) = 1 limn

    P (ACn ) = limn

    (1 P (ACn )

    )= limn

    P (An).

    Note that (ii)-(iv) hold for any measure space (S,G, ), (v) is true for arbitrary measure spaces under theassumption that there is some Ai with (Ai) < , and (i) holds for all nite measures upon replacing 1with (S).


  • Sigma Algebras.

    We now review some some basic facts about -algebras. Our rst observation is

    Proposition 2.1. For any index set I, if {Fi}iI is a collection of -algebras on , then so is iIFi.

    It follows easily from Proposition 2.1 that for any collection of sets A 2, there is a smallest -algebracontaining A - namely, the intersection of all -algebras containing A.This is called the -algebra generated by A and is denoted by (A).

    Note that if F is a -algebra and A F , then (A) F .

    An important class of examples are the Borel -algebras: If (X, T )