# Probability Theory 1 Lecture Notes - Cornell University web6720/MATH 6710 notes.pdfPROBABILITY THEORY 1 LECTURE NOTES JOHN PIKE These lecture notes were written for MATH 6710 at Cornell University in the allF semester of 2013. They were revised in the allF of 2015 and the schedule on the following page

Post on 20-Apr-2018

218 views

Embed Size (px)

TRANSCRIPT

<ul><li><p>PROBABILITY THEORY 1 LECTURE NOTES</p><p>JOHN PIKE</p><p>These lecture notes were written for MATH 6710 at Cornell University in the Fall semester</p><p>of 2013. They were revised in the Fall of 2015 and the schedule on the following page</p><p>reects that semester. These notes are for personal educational use only and are not to</p><p>be published or redistributed. Almost all of the material, much of the structure, and some</p><p>of the language comes directly from the course text, Probability: Theory and Examples by</p><p>Rick Durrett. Gerald Folland's Real Analysis: Modern Techniques and Their Applications</p><p>is the source of most of the auxiliary measure theory details. There are likely typos and</p><p>mistakes in these notes. All such errors are the author's fault and corrections are greatly</p><p>appreciated.</p><p>1</p></li><li><p>Day 1: Finished Section 1</p><p>Day 2: Finished Section 2</p><p>Day 3: Up to denition of semialgebra</p><p>Day 4: Finished Section 3</p><p>Day 5: Through review of integration</p><p>Day 6: Through Limit Theorems</p><p>Day 7: Through Corollary 6.2</p><p>Day 8: Through Theorem 6.6</p><p>Day 9: Finished Section 6</p><p>Day 10: Through Corollary 7.1</p><p>Day 11: Through Example 7.5</p><p>Day 12: Through Theorem 7.4</p><p>Day 13: Finished Section 7</p><p>Day 14: Through Theorem 8.3</p><p>Day 15: Finished Section 8</p><p>Day 16: Through Theorem 9.2</p><p>Day 17: Through Example 10.2</p><p>Day 18: Up to Claim in 3-series Theorem</p><p>Day 19: Finished Section 10</p><p>Day 20: Through Theorem 11.1</p><p>Day 21: Through Theorem 11.4</p><p>Day 22: Finished Section 11</p><p>Day 23: Through Theorem 12.1</p><p>Day 24: Up to Theorem 12.4</p><p>Day 25: Through Lemma 13.2</p><p>Day 26: Through Theorem 13.3</p><p>Day 27: Finished Section 13</p><p>Day 28: Through Example 14.1</p><p>Day 29: Finished Section 14</p><p>Day 30: Through beginning of proof of Lemma 15.2</p><p>Day 31: Through Theorem 15.2</p><p>Day 32: Through Theorem 15.6</p><p>Day 33: Finished Section 16</p><p>Day 34: Through Proposition 17.2</p><p>Day 35: Started proof of Theorem 18.1 (skipped Wald 2)</p><p>Day 36: Through Theorem 18.4</p><p>Day 37: Through Lemma 19.3 (skipped Chung-Fuchs)</p><p>Day 38: Finished Section 19</p><p>2</p></li><li><p>1. Introduction</p><p>Probability Spaces.</p><p>A probability space is a measure space (,F , P ) with P () = 1.</p><p>The sample space can be any set, and it can be thought of as the collection of all possible outcomes of</p><p>some experiment or all possible states of some system. Elements of are referred to as elementary outcomes.</p><p>The -algebra (or -eld) F 2 satises</p><p>1) F is nonempty</p><p>2) E F EC F</p><p>3) For any countable collection {Ei}iI F ,iI Ei F .</p><p>Elements of F are called events, and can be regarded as sets of elementary outcomes about which one can saysomething meaningful. Before the experiment has occurred (or the observation has been made), a meaningful</p><p>statement about E F is P (E). Afterward, a meaningful statement is whether or not E occurred.</p><p>The probability measure P : F [0, 1] satises</p><p>1) P () = 1</p><p>2) for any countable disjoint collection {Ei}iI , P(</p><p>iI Ei)</p><p>=iI P (Ei).</p><p>The interpretation is that P (A) represents the chance that event A occurs (though there is no general</p><p>consensus about what that actually means).</p><p>If p is some property and A = { : p() is true} is such that P (A) = 1, then we say that p holdsalmost surely, or a.s. for short. This is equivalent to almost everywhere in measure theory. Note that it</p><p>is possible to have an event E F with E 6= and P (E) = 0. Thus, for instance, there is a distinctionbetween impossible and with probability zero as discussed in Example 1.3 below.</p><p>Example 1.1. Rolling a fair die: = {1, 2, 3, 4, 5, 6}, F = 2, P (E) = |E|6 .</p><p>Example 1.2. Flipping a (possibly biased) coin: = {H,T}, F = 2 = {, {H}, {T}, {H,T}}, P satisesP ({H}) = p and P ({T}) = 1 p for some p (0, 1).</p><p>Example 1.3. Random point in the unit interval: = [0, 1], F = B[0,1] = Borel Sets, P = Lebesgue measure.</p><p>The experiment here is to pick a real number between 0 and 1 uniformly at random. Generally speaking,</p><p>uniformity corresponds to translation invariance, which is the primary dening property of Lebesgue measure.</p><p>Observe that each outcome x [0, 1] has P ({x}) = 0, so the experiment must result in the realization of anoutcome with probability zero.</p><p>Example 1.4. Standard normal distribution: = R, F = B, P (E) = 12</p><p>Ee</p><p>x2</p><p>2 dx.</p><p>Example 1.5. Poisson distribution with mean : = N {0}, F = 2, P (E) = ekE</p><p>k</p><p>k! .</p><p>3</p></li><li><p>Why Measure Theory.</p><p>Historically, probability was dened in terms of a nite number of equally likely outcomes (Example 1.1) so</p><p>that || </p></li><li><p>The expectation (or mean or expected value) of a real-valued random variable X on (,F , P ) is dened asE[X] =</p><p>X()dP () whenever the integral is well-dened.</p><p>Expectation is generally interpreted as a weighted average which gives the best guess for the value of the</p><p>random quantity X.</p><p>We will study random variables and their expectations in greater detail soon. For now, the point is that many</p><p>familiar objects from undergraduate probability can be rigorously and simply dened using the language of</p><p>measure theory.</p><p>That said, it should be emphasized that probability is not just the study of measure spaces with total mass</p><p>1. As useful and necessary as the rigorous measure theoretic foundations are, it is equally important to</p><p>cultivate a probabilistic way of thinking whereby one conceptualizes problems in terms of coin tossing, card</p><p>shuing, particle trajectories, and so forth.</p><p>* An example of a subset of [0, 1] which has no well-dened Lebesgue measure is given by the following</p><p>construction:</p><p>Dene an equivalence relation on [0, 1] by x y if and only if x y Q.Using the axiom of choice, let E [0, 1] consist of exactly one point from each equivalence class.For q Q[0,1), dene Eq = E + q (mod 1). By construction Eq</p><p>Er = for r 6= q and</p><p>qQ[0,1) Eq = [0, 1].</p><p>Thus, by countable additivity, we must have</p><p>1 = m ([0, 1)) = m</p><p> qQ[0,1)</p><p>Eq</p><p> = qQ[0,1)</p><p>m(Eq).</p><p>However, Lebesgue measure is translation invariant, so m(Eq) = m(E) for all q.</p><p>We see that m(Eq) is not well-dened as m(Eq) = 0 implies 1 = 0 and m(Eq) > 0 implies 1 =.</p><p>The existence of non-measurable sets can be proved using slightly weaker assumptions than the axiom of</p><p>choice (such as the Boolean prime ideal theorem), but it has been shown that the existence of non-measurable</p><p>sets is not provable in Zermelo-Fraenkel alone.</p><p>In three or more dimensions, the Banach-Tarski paradox shows that in ZFC, there is no nitely additive</p><p>measure dened on all subsets of Euclidean space which is invariant under translation and rotation.</p><p>(The paradox is that one can cut a unit ball into ve pieces and reassemble them using only rigid motions</p><p>to obtain two disjoint unit balls.)</p><p>5</p></li><li><p>2. Preliminary Results</p><p>At this point, we need to establish some fundamental facts about probability measures and -algebras in</p><p>preparation for a discussion of probability distributions and to reacquaint ourselves with the style of measure</p><p>theoretic arguments.</p><p>Probability Measures.</p><p>The following simple facts are extremely useful and will be employed frequently throughout this course.</p><p>Theorem 2.1. Let P be a probability measure on (,F).</p><p>(i) Complements For any A F , P (AC) = 1 P (A).</p><p>(ii) Monotonicity For any A,B F with A B, P (A) P (B).</p><p>(iii) Subadditivity For any countable collection {Ei}i=1 F , P (i=1Ei) </p><p>i=1 P (Ei).</p><p>(iv) Continuity from below If Ai A (i.e. A1 A2 ... andi=1Ai = A), then</p><p>limn P (An) = P (A).</p><p>(v) Continuity from above If Ai A =i=1Ai, then limn P (An) = P (A).</p><p>Proof.</p><p>For (i), 1 = P () = P (A tAC) = P (A) + P (AC) by countable additivity.</p><p>For (ii), P (B) = P (A t (B \A)) = P (A) + P (B \A) P (A).</p><p>For (iii), we disjointify the sets by dening F1 = E1 and Fi = Ei \(i1</p><p>j=1Ej</p><p>)for i > 1, and observe that</p><p>the F is are disjoint andni=1 Fi =</p><p>ni=1Ei for all n N {}. Since Fi Ei for all i, we have</p><p>P</p><p>( i=1</p><p>Ei</p><p>)= P</p><p>( i=1</p><p>Fi</p><p>)=</p><p>i=1</p><p>P (Fi) i=1</p><p>P (Ei).</p><p>For (iv), set B1 = A1 and Bi = Ai \ Ai1 for i > 1, and note that the Bis are disjoint withni=1Bi = An</p><p>andi=1Bi = A. Then</p><p>P (A) = P</p><p>( i=1</p><p>Bi</p><p>)=</p><p>i=1</p><p>P (Bi) = limn</p><p>ni=1</p><p>P (Bi)</p><p>= limn</p><p>P</p><p>(ni=1</p><p>Bi</p><p>)= limn</p><p>P (An).</p><p>For (v), if A1 A2 ... and A =i=1Ai, then A</p><p>C1 AC2 ... and AC = (</p><p>i=1Ai)</p><p>C=i=1A</p><p>Ci , so it</p><p>follows from (i) and (iv) that</p><p>P (A) = 1 P (AC) = 1 limn</p><p>P (ACn ) = limn</p><p>(1 P (ACn )</p><p>)= limn</p><p>P (An). </p><p>Note that (ii)-(iv) hold for any measure space (S,G, ), (v) is true for arbitrary measure spaces under theassumption that there is some Ai with (Ai) < , and (i) holds for all nite measures upon replacing 1with (S).</p><p>6</p></li><li><p>Sigma Algebras.</p><p>We now review some some basic facts about -algebras. Our rst observation is</p><p>Proposition 2.1. For any index set I, if {Fi}iI is a collection of -algebras on , then so is iIFi.</p><p>It follows easily from Proposition 2.1 that for any collection of sets A 2, there is a smallest -algebracontaining A - namely, the intersection of all -algebras containing A.This is called the -algebra generated by A and is denoted by (A).</p><p>Note that if F is a -algebra and A F , then (A) F .</p><p>An important class of examples are the Borel -algebras: If (X, T ) is a topological space, then BX = (T )is called the Borel -algebra. It is worth recalling that for the standard topology on R, the Borel sets aregenerated by open intervals, closed intervals, half-open intervals, open rays, and closed rays, respectively.</p><p>Our main technical result about -algebras is Dynkin's - Theorem, an extremely useful result which is</p><p>often omitted in courses on measure theory. To state the result, we will need the following denitions.</p><p>Denition. A collection of sets P 2 is called a -system if it is closed under intersection.</p><p>Denition. A collection of sets L 2 is called a -system if</p><p>(1) L(2) If A,B L and A B, then B \A L(3) If An L with An A, then A L</p><p>Theorem 2.2 (Dynkin). If P is a -system and L is a -system with P L, then (P) L.</p><p>Proof. We begin by observing that the intersection of any number of -systems is a -system, so for any</p><p>collection A, there is a smallest -system `(A) containing A. Thus it will suce to show</p><p>a) `(P) is a -algebra (since then (P) `(P) L).</p><p>In fact, as one easily checks that a -system which is closed under intersection is a -algebra</p><p>(AC = \A, A B = (AC BC)C , andni=1Ai </p><p>i=1Ai), we need only to demonstrate</p><p>b) `(P) is closed under intersection.</p><p>To this end, dene GA = {B : A B `(P)} for any set A. To complete the proof, we will rst show</p><p>c) GA is a -system for each A `(P),</p><p>and then prove that b) follows from c).</p><p>To establish c), let A be an arbitrary member of `(P). Then A = A `(P), so GA 3 . Also, for anyB,C GA with B C, we have A (C \ B) = (A C) \ (A B) `(P) since A B,A C `(P) and`(P) is a -system, hence GA is closed under subset dierences. Finally, for any sequence {Bn} in GA withBn B, we have (ABn) (AB) `(P), so GA is closed under countable increasing unions as well andthus is a -system.</p><p>7</p></li><li><p>It remains only to show that c) implies b). To see that this is the case, rst note that since P is a -system,P GA for every A P, so it follows from c) that `(P) GA for every A P. In particular, for any A P,B `(P), we have A B `(P). Interchanging A and B yields A `(P) and B P implies A B `(P).But this means if A `(P), then P GA, and thus c) implies that `(P) GA. Therefore, it follows fromthe denition of GA that for any A,B `(P), A B `(P). </p><p>It is not especially important to commit the details of this proof to memory, but it worth seeing once and you</p><p>should denitely know the statement of the theorem. Though it seems a bit obscure upon rst encounter,</p><p>we will use this result in a variety of contexts throughout the course. In typical applications, we show that</p><p>a property holds on a -system that we know generates the -algebra of interest. We then show that the</p><p>collection of all sets for which the property holds is a -system in order to conclude that the property holds</p><p>on the entire -algebra.</p><p>A related result which is probably more familiar to those who have taken measure theory is the monotone</p><p>class lemma used to prove Fubini-Tonelli.</p><p>Denition. A monotone class is a collection of subsets which is closed under countable increasing unions</p><p>and countable decreasing intersections.</p><p>Like -systems, -systems, and -algebras, the intersection of monotone classes is a monotone class, so it</p><p>makes sense to talk about the monotone class generated by a collection of subsets.</p><p>Lemma 2.1 (Monotone Class Lemma). If A is an algebra of subsets, then the monotone class generated byA is (A).</p><p>Note that -algebras are -systems and -systems are monotone classes, but the converses need not be true.</p><p>8</p></li><li><p>3. Distributions</p><p>Recall that a random variable on a probability space (,F , P ) is a function X : R that is measurablewith respect to the Borel sets.</p><p>Every random variable induces a probability measure on R (called its distribution) by</p><p>(A) = P(X1(A)</p><p>)for all A B.</p><p>To check that is a probability measure, note that since X is a function, if A1, A2, ... B are disjoint, thenso are {X A1}, {X A2}, ... F , hence</p><p> (iAi) = P ({X </p><p>iAi}) = P (</p><p>i{X Ai}) =</p><p>i</p><p>P ({X Ai}) =i</p><p>(Ai).</p><p>The distribution of a random variable X is usually described in terms of its distribution function</p><p>F (x) = P (X x) = ((, x]) .</p><p>In cases where confusion may arise, we will emphasize dependence on the random variable using subscripts</p><p>- i.e. X , FX .</p><p>Theorem 3.1. If F is the distribution function of a random variable X, then</p><p>(i) F is nondecreasing</p><p>(ii) F is right-continuous (i.e. limxa+ F (x) = F (a) for all a R)(iii) limx F (x) = 0 and limx F (x) = 1</p><p>(iv) If F (x) = limyx F (y), then F (x) = P (X < x)</p><p>(v) P (X = x) = F (x) F (x)</p><p>Proof.</p><p>For (i), note that if x y, then {X x} {X y}, so F (x) = P (X x) P (X y) = F (y) bymonotonicity.</p><p>For (ii), observe that if x a, then {X x} {X a}, and apply continuity from above.</p><p>For (iii), we have {X x} as x and {X x} R as x.</p><p>For (iv), {X y} {X < x} as y x. (Note that the limit exists since F is monotone.)</p><p>For (v), {X = x} = {X x} \ {X < x}. </p><p>In fact, the rst three properties in Theorem 3.1 are sucient to characterize a distribution function.</p><p>Theorem 3.2. If F : R R satises properties (i), (ii), and (iii) from Theorem 3.1, then it is thedistribution function of some random variable.</p><p>Proof. (Draw Picture)</p><p>Let = (0, 1), F = B(0,1), P = Lebesgue measure, and dene X : (0, 1) R by</p><p>X() = F1() := inf{y R : F (y) }.</p><p>Note that properties (i) and (iii) ensure that X is well-dened.9</p></li><li><p>To see that F is indeed the distribution function of X, it suces to show that</p><p>{ : X() x} = { : F (x)}</p><p>for all x R, as this implies</p><p>P (X x) = P ({ : X() x}) = P ({ : F (x)}) = F (x)</p><p>where the nal equality uses the denition of Lebesgue measure and the fact that F (x) [0, 1].</p><p>Now if F (x), then x {y R : F (y) }, so X() = inf{y R : F (y) } x.</p><p>Consequently, { : F (x)} { : X() x}.</p><p>To establish the reverse inclusion, note that if > F (x), then properties (i) and (ii) imply that there is an</p><p> > 0 such that F (x) F (x+ ) < .</p><p>Since F is nondecreasing, it follows that x+ is a lower bound for {y R : F (y) }, henceX() x+ > x.</p><p>Therefore, { : F (x)}C { : X() x}C and thus { : X() x} { : F (x)}. </p><p>Theorem 3.2 shows that any function satisfying properties (i) - (iii) gives rise to a random variable X, and</p><p>thus to a probability measure ,...</p></li></ul>