Part 1 – Probability and Distribution Theory. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics. Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01

Upload: lisandro-jayne

Post on 31-Mar-2015

Slide 1
Part 1: Probability and Distribution Theory
Professor William Greene, Stern School of Business, IOMS Department, Department of Economics
Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01

Slide 2
Part 1: Probability and Distribution Theory

Slide 3
1. Probability

Slide 4
Sample Space
Random outcomes: the result of a process
  Sequence of events
  Number of events
  Measurement of a length of time, space, etc.
Outcomes, experiments and sample spaces

Slide 5
Consumer Choice: 4 possible ways a randomly chosen traveler might travel between Sydney and Melbourne
Ω = {Air, Train, Bus, Car}

Slide 6
Market Behavior: Fair Isaac's credit card service to major vendors
Ω = {Reject, Accept}

Slide 7
Measurement of Lifetimes
A box of light bulbs states "Average life is 1500 hours."
Outcome = length of time until failure (lifetime) of a randomly chosen light bulb
Ω = {lifetime | lifetime > 0}

Slide 8
Events
Events are defined as
  Subsets of the sample space, such as the empty set ∅
  Intersections of related events
  Complements, such as A and not A
  Disjoint sets, such as (Train, Bus) and (Air, Car)
Any subset, including Ω, is a disjoint union of subsets: Ω = (Air, Train) ∪ (Bus, Car)

Slide 9
Probability is a Measure
The sample space is a σ-field:
  Contains at least one nonempty subset (event)
  Is closed under complementarity
  Is closed under countable union
Probability is a measure defined on all subsets of Ω
Axioms of Probability
  P(Ω) = 1
  For every event A ⊆ Ω, P(A) ≥ 0
  If A ∩ B = { }, P(A ∪ B) = P(A) + P(B)

Slide 10
Implications of the Axioms
  P(~A) = 1 - P(A), as A ∪ ~A = Ω
  P(∅) = 0, as ∅ = ~Ω and P(Ω) = 1
  A ⊆ B implies P(A) ≤ P(B), as B = A ∪ (~A ∩ B), a disjoint union
  P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Slide 11
Probability
Assigning probability: size of an event relative to size of sample space.
Counting rules for equally likely discrete outcomes
  Using combinations and permutations to count elements
  Example: discrete uniform, poker hands
  Example, hypergeometric: the supercommittee (House: 242 R, 193 D; Senate: 49 R, 51 D and I)
Measurement for continuous outcomes

Slide 12
Applications: Games of Chance; Poker
In a 5 card hand from a deck of 52, there are (52*51*50*49*48)/(5*4*3*2*1) = 2,598,960 different possible hands. (Order doesn't matter.)
How many of these hands have 4 aces? 48: the 4 aces plus any one of the remaining 48 cards.

Slide 13
Some Poker Hands
  Royal Flush: top 5 cards in a suit
  Straight Flush: 5 sequential cards in the same suit
  4 of a Kind: 4 of a kind plus any other card
  Full House: 3 of one kind, 2 of another. (Also called a "boat.")
  Flush: 5 cards in a suit, not sequential
  Straight: 5 cards in a numerical row, not the same suit

Slide 14
5 Card Poker Hands

Slide 15
The Dead Man's Hand
The dead man's hand is 5 cards: 2 aces, 2 8s and some other 5th card. (Wild Bill Hickok was holding this hand when he was shot in the back and killed in 1876.)
The number of hands with two aces and two 8s is C(4,2) x C(4,2) x 44 = 6 x 6 x 44 = 1,584.
The rest of the story claims that Hickok held all black cards (the bullets). The probability for this hand falls to only 44/2,598,960. (The four cards in the picture and one of the remaining 44.)
Some claims have been made about the 5th card, but no one is sure; there is no record.
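
The counting arguments on Slides 12 and 15 can be checked directly. The sketch below uses Python's standard `math.comb`; the variable names are illustrative and do not come from the slides.

```python
from math import comb

# Total number of 5-card hands from a 52-card deck (order does not matter).
total_hands = comb(52, 5)

# All four aces plus any one of the remaining 48 cards.
four_aces = comb(4, 4) * 48

# Two of the four aces, two of the four 8s, and a fifth card
# that is neither an ace nor an 8 (52 - 8 = 44 choices).
aces_and_eights = comb(4, 2) * comb(4, 2) * 44

# The all-black version: the two black aces and two black 8s are fixed,
# so only the fifth card (44 choices) varies.
all_black = 44

print(total_hands)                  # 2598960
print(four_aces, aces_and_eights)   # 48 1584
print(all_black / total_hands)      # probability of the all-black hand
```

Dividing any of these counts by `total_hands` gives the probability of the hand under equally likely outcomes.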
See http://en.wikipedia.org/wiki/Dead_man's_hand for the dead man's hand.

Slide 16
Budget Supercommittee

Slide 17
Conditional Probability
P(A|B) = P(A,B)/P(B) = size of A relative to a subset of Ω
Basic result: P(A,B) = P(A|B) P(B) (follows from the definition)
Bayes theorem
Applications: mammography, drug testing, lie detector test, PSA test.

Slide 18
Using Conditional Probabilities: Bayes Theorem

Slide 19
Drug Testing
Data
  P(Test correctly indicates disease) = .98 (Sensitivity)
  P(Test correctly indicates absence) = .95 (Specificity)
  P(Disease) = .005 (Fairly rare)
Notation
  + = test indicates disease, - = test indicates no disease
  D = presence of disease, N = absence of disease
Data:
  P(D) = .005 (Incidence of the disease)
  P(+|D) = .98 (Correct detection of the disease)
  P(-|N) = .95 (Correct failure to detect the disease)
What are P(D|+) and P(N|-)?
Note, P(D|+) = the probability that a patient actually has the disease when the test says they do.

Slide 20
More Information
Deduce: Since P(+|D) = .98, we know P(-|D) = .02 because P(-|D) + P(+|D) = 1. [P(-|D) is the P(False negative).]
Deduce: Since P(-|N) = .95, we know P(+|N) = .05 because P(-|N) + P(+|N) = 1. [P(+|N) is the P(False positive).]
Deduce: Since P(D) = .005, P(N) = .995 because P(D) + P(N) = 1.
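
With the six probabilities from Slides 19 and 20 in hand, Bayes theorem gives the two posterior probabilities the slides ask about. This is a minimal sketch; the function name `posterior` and its argument names are assumptions made for illustration.

```python
def posterior(prior, likelihood, likelihood_other):
    """Two-state Bayes theorem.

    prior            = P(state)
    likelihood       = P(signal | state)
    likelihood_other = P(signal | other state)
    Returns P(state | signal).
    """
    joint = likelihood * prior
    return joint / (joint + likelihood_other * (1.0 - prior))

p_d = 0.005   # P(D), incidence of the disease
sens = 0.98   # P(+|D), sensitivity
spec = 0.95   # P(-|N), specificity

# P(D|+) uses the false positive rate P(+|N) = 1 - specificity.
p_d_given_pos = posterior(p_d, sens, 1.0 - spec)

# P(N|-) uses the false negative rate P(-|D) = 1 - sensitivity.
p_n_given_neg = posterior(1.0 - p_d, spec, 1.0 - sens)

print(round(p_d_given_pos, 4))   # roughly 0.09
print(round(p_n_given_neg, 4))   # very close to 1
```

Because the disease is rare, most positive results come from the much larger healthy group, so P(D|+) is below 10 percent even though the test is 98 percent sensitive.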
Slide 21
Now, Use Bayes Theorem

Slide 22
Independent events
Definition: P(A|B) = P(A)
Multiplication rule: P(A,B) = P(A)P(B)
Application: Infectious disease transmission

Slide 23
2. Random Variables

Slide 24
Random Variable
Definition: Maps elements of the sample space to a single variable: assigns a number to each element of Ω
  Discrete: Payoff to poker hands
  Continuous: Lightbulb lifetimes
  Mixed: Ticket sales with capacity constraints. (Censoring)

Slide 25
Market Behavior: Fair Isaac's credit card service to major vendors
Ω = {Reject, Accept}
X = 0 (reject), 1 (accept)

Slide 26
Caribbean Stud Poker
[Figure: table with column headings Sample Space, Probability, Variable]

Slide 27
Features of Random Variables
Probability Distribution
  Mass function: Prob(X = x) = f(x)
  Density function: f(x), x = ...
Cumulative probabilities; CDF: Prob(X ≤ x) = F(x)
Quantiles: x such that F(x) = Q
Median: x = median, Q = 0.5.

Slide 28
Discrete Random Variables
Elemental building block
  Bernoulli: Credit card applications
  Discrete uniform: Die toss
Counting Rules
  Binomial: Family composition
  Hypergeometric: House/Senate Supercommittee
Models
  Poisson: Diabetes incidence, Accidents, etc.
Slide 29
Market Behavior: Fair Isaac's credit card service to major vendors
X = 0 (reject), 1 (accept)
Prob(X = x) = (1 - p)^(1-x) p^x, x = 0, 1

Slide 30
Binomial
Sum of n Bernoulli trials

Slide 31
Examples

Slide 32
Poisson
Approximation to binomial
General model for a type of process

Slide 33
Poisson Approximation to Binomial

Slide 34
Diabetes Incidence per 1000
http://www.cdc.gov/diabetes/statistics/incidence/fig2.htm

Slide 35
Poisson Distribution of Disease Cases in 1000 Draws with λ = 7

Slide 36
Poisson Process: Doctor visits in the survey year by people in a sample of 27,326. λ = .8
Poisson probability model is a description of this process, not an approximation

Slide 37
Continuous RV
Density function, f(x)
Probability measure P(event) obtained using the density.
Application: Lightbulb lifetimes?

Slide 38
Probability Density Function; PDF

Slide 39
CDF and Quantiles
pth quantile; 0 < p < 1
Quantile = x_p such that F(x_p) = p. x_p = F^(-1)(p).
For p = .5, x_p = median

Slide 40
Model for Light Bulb Lifetimes
This is the exponential model for lifetimes. The model is f(time) = (1/μ) e^(-time/μ), where μ is the mean lifetime.

Slide 41
Model for Light Bulb Lifetimes
The area under the entire curve is 1.0.

Slide 42
Continuous Distribution
A partial area will be between 0.0 and 1.0, and will produce a probability.
The probability associated with an interval such as 1000 < LIFETIME < 2000 equals the area under the curve from the lower limit to the upper.

Slide 43
Probability of a Single Value Is Zero
The probability associated with a single point, such as LIFETIME = 2000, equals 0.0.

Slide 44
Probabilities via the CDF

Slide 45
Probability for a Range of Values
Based on CDF
  Prob(Life < 2000) (.7364)
  Minus Prob(Life < 1000) (.4866)
  Equals Prob(1000 < Life < 2000) (.2498)

Slide 46
Common Continuous RVs
Continuous random variables are all models; they do not occur in nature.
The model builder's toolkit:
  Continuous uniform
  Exponential
  Normal
  Lognormal
  Gamma
  Beta
Defined for specific types of outcomes

Slide 47
Continuous Uniform
f(x) = 1/(b - a), a ≤ x ≤ b
F(x) = (x - a)/(b - a), a ≤ x ≤ b.
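
The light bulb probabilities on Slide 45 can be reproduced from the exponential CDF. The sketch below assumes the mean lifetime is the 1500 hours quoted on the box in Slide 7; that assumption matches the three figures on the slide. The function and constant names are illustrative.

```python
from math import exp

MEAN_LIFE = 1500.0  # hours; assumed from the "Average life is 1500 hours" example

def lifetime_cdf(t, mean=MEAN_LIFE):
    """Exponential CDF: Prob(Life <= t) = 1 - exp(-t / mean) for t > 0."""
    return 1.0 - exp(-t / mean) if t > 0 else 0.0

p_2000 = lifetime_cdf(2000)
p_1000 = lifetime_cdf(1000)

# Probability of an interval = difference of two CDF values.
p_between = p_2000 - p_1000

print(round(p_2000, 4), round(p_1000, 4), round(p_between, 4))
# 0.7364 0.4866 0.2498
```

The same subtraction works for any continuous distribution once its CDF is available, which is why Slide 43's point (a single value has probability zero) causes no difficulty at the interval endpoints.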
Slide 48
Exponential
f(x) = θ exp(-θx), x > 0; 0 otherwise
F(x) = 1 - exp(-θx), x > 0
Median: F(M) = .5
  1 - exp(-θM) = .5
  exp(-θM) = .5
  -θM = ln .5
  M = -ln .5/θ = (ln 2)/θ

Slide 49

Slide 50
Gamma Density
Uses the Gamma Function

Slide 51
Gamma Distributed Random Variable
Used to model nonnegative random variables, e.g., survival of people and electronic components
Two special cases
  P = 1 is the exponential distribution
  P = 1/2 and λ = 1/2 is the chi squared with one degree of freedom

Slide 52
Beta
Uses Beta Integrals

Slide 53
Normal Density
The Model
Mean = μ, standard deviation = σ

Slide 54
Normal Distributions
The scale and location (on the horizontal axis) depend on μ and σ. The shape of the distribution is always the same. (Bell curve)

Slide 55

Slide 56
Standard Normal Density (0,1)

Slide 57
Lognormal Distribution

Slide 58
Censoring and Truncation
Censoring
  Observation mechanism. Values above or below a certain value are assigned the boundary value
  Applications: ticket market (demand vs. sales given capacity constraints); top coded income data
Truncation
  Observation mechanism. The relevant distribution only applies in a restricted range of the random variable
  Application: On site survey for recreation visits. Truncated Poisson
Incidental truncation: Income is observed only for those whose wealth (not income) exceeds $100,000.
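
The difference between the two observation mechanisms on Slide 58 is easy to see on a small list of numbers. The demand figures and the capacity below are made up for illustration, and the function names are hypothetical.

```python
def censor_above(values, limit):
    """Censoring: a value above the limit is recorded as the limit itself."""
    return [min(v, limit) for v in values]

def truncate_above(values, limit):
    """Truncation: a value above the limit is not observed at all."""
    return [v for v in values if v <= limit]

demand = [120, 180, 250, 90, 300]   # hypothetical ticket demand per event
capacity = 200                      # hypothetical venue capacity

sales = censor_above(demand, capacity)
observed = truncate_above(demand, capacity)

print(sales)      # [120, 180, 200, 90, 200]
print(observed)   # [120, 180, 90]
```

Censoring keeps every observation but piles the large ones up at the boundary; truncation shrinks the sample, so the observed values follow the conditional distribution on the restricted range.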
Slide 59
Truncated Random Variable
Untruncated variable has density f(x)
Truncated variable has density f(x)/Prob(x is in range)
Truncated Normal:

Slide 60
F(x | x > X_L)
Truncated Normal: f(x|x>a) = f(x)/Prob(x>a)

Slide 61
Truncated Poisson
f(x) = exp(-λ) λ^x / Γ(x+1)
f(x|x>0) = f(x)/Prob(x>0) = f(x)/[1 - Prob(x=0)] = {exp(-λ) λ^x / Γ(x+1)} / {1 - exp(-λ)}

Slide 62
Representations of a Continuous Random Variable
Representations
  Density, f(x)
  CDF, F(x) = Prob(X ≤ x)
  Survival, S(x) = Prob(X > x) = 1 - F(x)
  Hazard function, h(x) = -d ln S(x)/dx
Representations are one to one; each uniquely determines the distribution of the random variable

Slide 63
Application: A Memoryless Process

Slide 64
A Change of Variable
Theorem: x = a continuous RV with continuous density f(x). y = g(x) is a monotonic function over the range of x. Then
f(y) = f(x(y)) |dx(y)/dy| = f(x(y)) |dg^(-1)(y)/dy|

Slide 65
Change of Variable Applications
  Standardized normal
  Lognormal to normal
  Fundamental probability transform

Slide 66
Standardized Normal
X ~ N[μ, σ^2]
Prob[X ≤ a] = F(a)
Prob[X ≤ a] = Prob[(X - μ)/σ ≤ (a - μ)/σ]
y = (x - μ)/σ
J = dx(y)/dy = σ
f(y) = σ f(σy + μ) = [1/sqrt(2π)] exp(-y^2/2)
Only a table for the standard normal is needed.
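
The truncated Poisson density on Slide 61 can be written out and checked to be a proper distribution on x = 1, 2, .... This is a sketch only; the value λ = 0.8 is chosen for illustration and the function names are not from the slides.

```python
from math import exp, gamma

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / Gamma(x + 1), for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / gamma(x + 1)

def truncated_poisson_pmf(x, lam):
    """f(x | x > 0) = f(x) / (1 - exp(-lam)), for x = 1, 2, ..."""
    if x < 1:
        return 0.0
    return poisson_pmf(x, lam) / (1.0 - exp(-lam))

lam = 0.8  # illustrative value only

# The truncated probabilities should again sum to 1 (terms beyond x = 60
# are far below floating point precision for this lam).
total = sum(truncated_poisson_pmf(x, lam) for x in range(1, 60))
print(total)

# Each remaining outcome becomes more likely after dropping x = 0.
print(poisson_pmf(1, lam), truncated_poisson_pmf(1, lam))
```

Dividing by Prob(x > 0) rescales the surviving probabilities so they sum to one, which is the general recipe on Slide 59 applied to the Poisson case.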
Slide 67
Textbooks Provide Tables of Areas for the Standard Normal
Econometric Analysis, WHG, 2008, Appendix G, page 1093; Rice, Table 2
Note that values are only given for z ranging from 0.00 to 3.99. No values are given for negative z.

Slide 68
Computing Probabilities
Standard Normal Tables give probabilities when μ = 0 and σ = 1.
For other cases, do we need another table?
Probabilities for other cases are obtained by "standardizing."
Standardized variable is z = (x - μ)/σ
z has mean 0 and standard deviation 1

Slide 69
Standard Normal Density

Slide 70
Standard Normal Distribution Facts
The random variable z runs from -∞ to +∞
φ(z) > 0 for all z, but for |z| > 4, it is essentially 0.
The total area under the curve equals 1.0.
The curve is symmetric around 0. (The normal distribution generally is symmetric around μ.)

Slide 71
Only Half the Table Is Needed
The area to left of 0.0 is exactly 0.5.

Slide 72
Only Half the Table Is Needed
The area left of 1.60 is exactly 0.5 plus the area between 0.0 and 1.60.

Slide 73
Areas Left of Negative Z
Area left of -1.6 equals area right of +1.6.
Area right of +1.6 equals 1 - area to the left of +1.6.
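
The table rules on Slides 68 to 73 can be checked numerically. The sketch below writes the standard normal CDF with the error function from Python's `math` module instead of a printed table; the function names are illustrative.

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal CDF, expressed with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Prob(X <= x) for X ~ N[mu, sigma^2], obtained by standardizing."""
    return phi_cdf((x - mu) / sigma)

left_of_pos = phi_cdf(1.6)    # area left of +1.6
left_of_neg = phi_cdf(-1.6)   # area left of -1.6

print(round(left_of_pos, 4))                      # 0.9452
print(round(left_of_neg, 4), round(1 - left_of_pos, 4))

# Area left of 0.0 is exactly one half, for any mean once standardized.
print(phi_cdf(0.0), normal_cdf(10.0, 10.0, 2.0))  # 0.5 0.5
```

The second print line shows the symmetry rule from Slide 73: the area left of -1.6 equals 1 minus the area left of +1.6.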
Slide 74
Computing Probabilities by Standardizing: Example

Slide 75
Lognormal Distribution

Slide 76
Lognormal Distribution of Monthly Wages in NLS

Slide 77
Log of Lognormal Variable

Slide 78
Fundamental Probability Transformation

Slide 79
Random Number Generation
The CDF is a monotonic function of x
If u = F(x), x = F^(-1)(u)
We can generate u with a computer
Example: Exponential
Example: Normal

Slide 80
Generating Random Samples
Exponential
  u = F(x) = 1 - exp(-θx)
  1 - u = exp(-θx)
  x = (-1/θ) ln(1 - u)
Normal (μ, σ)
  u = Φ(z)
  z = Φ^(-1)(u)
  x = σz + μ = σΦ^(-1)(u) + μ

Slide 81
U[0,1] Generation
Linear congruential generator: x(n) = (a x(n-1) + b) mod m
Properties of RNGs
  Replicability: they are not RANDOM
  Period
  Randomness tests
The Mersenne twister: Current state of the art (of pseudo-random number generation)

Slide 82
3. Joint Distributions

Slide 83
Jointly Distributed Random Variables
Usually some kind of association between the variables.
E.g., two different financial assets
Joint cdf for two random variables: F(x, y) = Prob(X ≤ x, Y ≤ y)

Slide 84
Probability of a Rectangle
Prob[a1 < x ≤ b1, a2 < y ≤ b2] = F(b1,b2) - F(b1,a2) - F(a1,b2) + F(a1,a2)
[Figure: rectangle with a1 and b1 marked on the x axis, a2 and b2 on the y axis]

Slide 85
Joint Distributions
  Discrete: Multinomial for R kinds of success in N independent trials
  Continuous: Bi- and Multivariate normal
  Mixed: Conditional regression models

Slide 86
Multinomial Distribution

Slide 87
Probabilities: Inherited Color Blindness
Inherited color blindness has different incidence rates in men and women. Women usually carry the defective gene and men usually inherit it.
Pick an individual at random from the population.
  B = 1: has inherited color blindness; B = 0: not color blind
  G = 0: male; G = 1: female
Marginal: P(B=1) = 2.75%
Conditional: P(B=1|G=0) = 5.0% (1 in 20 men); P(B=1|G=1) = 0.5% (1 in 200 women)
Joint: P(B=1 and G=0) = 2.5%; P(B=1 and G=1) = 0.25%

Slide 88
Marginal Distributions
Prob[X=x] = Σ_y Prob[X=x, Y=y]

            Color Blind
  Gender    B=0      B=1      Total
  G=0       .4750    .0250    .50
  G=1       .4975    .0025    .50
  Total     .9725    .0275    1.00

Prob[G=0] = Prob[G=0,B=0] + Prob[G=0,B=1]

Slide 89
Joint Continuous Distribution

Slide 90
Marginal Distributions

Slide 91
Two Leading Applications
  Copula Function - Application in Finance
  Bivariate Normal Distribution

Slide 92

Slide 93

Slide 94

Slide 95

Slide 96

Slide 97

Slide 98

Slide 99
The Bivariate Normal Distribution

Slide 100

Slide 101
Independent Random Variables
F(x, y) = Prob(X ≤ x, Y ≤ y) = Prob(X ≤ x) Prob(Y ≤ y) = F_X(x) F_Y(y)
f(x,y) = ∂²F(x,y)/∂x∂y = f(x) f(y)

Slide 102
Independent Normals

Slide 103
Conditional Distributions

              Color Blind
  Gender      B=0 (No)   B=1 (Yes)   Total
  G=0 (M)     .4750      .0250       .50
  G=1 (F)     .4975      .0025       .50
  Total       .9725      .0275       1.00

Prob(Not color blind given male):
Prob(B=0|G=0) = Prob(B=0,G=0)/Prob(G=0) = .475/.50 = .950
Prob(B=1|G=0) = .025/.5 = .05
Prob(B=1|G=0) + Prob(B=0|G=0) = 1

Slide 104
Conditional Distribution Continuous Normal

Slide 105
Bivariate Normal
Joint distribution is bivariate normal
Marginal distributions are normal
Conditional distributions are normal

Slide 106
Y and Y|X
[Figure: axes labeled X and Y]

Slide 107
Model Building
Typically f(y|x) is of interest
x is generated by a separate process f(x)
Joint distribution is f(y,x) = f(y|x) f(x)
Ex: demographic
  y = log(household income | family size)
  x = family size
  y|x ~ Normal(μ_y|x, σ_y|x)
  x ~ Poisson(λ)

Slide 108
y|x ~ Normal[20 + 3x, 4^2], x = 1, 2, 3, 4; Poisson
[Figure: conditional densities for X = 1, X = 2, X = 3, X = 4]
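
The inverse-CDF recipes on Slide 80 are enough to simulate the conditional model on Slide 108. The sketch below takes x as given (the Poisson parameter for x is not stated in the transcript, so x is not simulated) and draws y|x from a uniform u. The function names are illustrative; `statistics.NormalDist().inv_cdf` plays the role of Φ^(-1), and Python's `random` module supplies the uniform draws using the Mersenne twister mentioned on Slide 81.

```python
import random
from math import log
from statistics import NormalDist

def exponential_draw(u, theta):
    """Invert u = 1 - exp(-theta * x):  x = (-1/theta) * ln(1 - u)."""
    return -log(1.0 - u) / theta

def normal_draw(u, mu, sigma):
    """Invert u = Phi(z):  x = sigma * Phi^(-1)(u) + mu, for 0 < u < 1."""
    return sigma * NormalDist().inv_cdf(u) + mu

def draw_y_given_x(x, u):
    """y | x ~ Normal[20 + 3x, 4^2], driven by a uniform draw u."""
    return normal_draw(u, 20.0 + 3.0 * x, 4.0)

rng = random.Random(12345)   # fixed seed: pseudo-random draws are replicable
for x in (1, 2, 3, 4):
    u = rng.random()
    print(x, round(draw_y_given_x(x, u), 3))

# u = 0.5 maps to the median, which for the normal is the conditional mean.
print(draw_y_given_x(2, 0.5))
```

Feeding the same seed reproduces the same sample, which is the replicability property listed on Slide 81.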