statistical inference: probability and distribution
Statistical Inference
Weeks 1 & 2: Probability and Distribution
Types of Variables
All variables are either numerical or categorical.
− Numerical: it makes sense to add, subtract, average, etc. (i.e., perform math operations)
  − Discrete: counted; can only take on non-negative whole numbers
  − Continuous: measured; can take on any real number (i.e., can have decimal places)
− Categorical: may be represented by numbers, but it does not make sense to add, subtract, average, etc.
  − Categorical (nominal): levels have no inherent ordering (e.g., single, married, divorced)
  − Ordinal: levels are ordered (e.g., primary, secondary, JC, university, etc.)
Probability
P(A) = probability of event A happening, where 0 ≤ P(A) ≤ 1
Disjoint (mutually exclusive) events cannot happen at the same time
− A card drawn from a deck cannot be both a spade and a heart
− P(Spade & Heart) = 0
Non-disjoint events can happen at the same time
− A card drawn from a deck can be both a spade and an ace
− P(Spade & Ace) = 1/52
[Figure: Venn diagrams for Spade/Heart and Spade/Ace]
Disjoint and non-disjoint events
Union of disjoint events
− Probability of drawing a Spade or a Heart from a deck of cards
P(Spade or Heart)
= P(Spade) + P(Heart)
= 13/52 + 13/52
= 26/52
Union of non-disjoint events
− Probability of drawing a Spade or an Ace from a deck of cards
P(Spade or Ace)
= P(Spade) + P(Ace) – P(Spade and Ace)
= 13/52 + 4/52 – 1/52
= 16/52
General Addition Rule: P(A or B) = P(A) + P(B) – P(A and B)
For disjoint events P(A and B) = 0, so the rule reduces to P(A) + P(B).
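The card examples can be checked by brute-force counting. This is a minimal Python sketch, assuming a standard 52-card deck with every card equally likely; the names (DECK, prob, is_spade, ...) are illustrative and not from the slides.

```python
from fractions import Fraction
from itertools import product

# Standard 52-card deck as (rank, suit) pairs; each card is equally likely.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))


def prob(event):
    """Probability that a randomly drawn card satisfies `event`."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


def is_spade(card):
    return card[1] == "spades"


def is_heart(card):
    return card[1] == "hearts"


def is_ace(card):
    return card[0] == "A"


# Disjoint events: the overlap is empty, so probabilities simply add.
p_spade_or_heart = prob(lambda c: is_spade(c) or is_heart(c))

# Non-disjoint events: count the union directly, then via the addition rule.
p_spade_or_ace = prob(lambda c: is_spade(c) or is_ace(c))
p_spade_and_ace = prob(lambda c: is_spade(c) and is_ace(c))
addition_rule = prob(is_spade) + prob(is_ace) - p_spade_and_ace

print(p_spade_or_heart, p_spade_or_ace, addition_rule)
```

Fractions keep the arithmetic exact, so the direct count and the addition rule can be compared with ==.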
Marginal, Joint, and Conditional Probability
Marginal probability
− Probability based on a single variable
P(Student = uses)
= 219/445 = 0.49
Joint probability
− Probability based on two or more variables
P(Student = uses and Parents = used)
= 125/445 = 0.28
Conditional probability
− Probability of one event conditional upon another event
P(Student = uses | Parents = used)
= 125/210 = 0.60

                   Parents
Student            Used    Did not use    Total
Uses               125     94             219
Does not use       85      141            226
Total              210     235            445
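The three probabilities can be read off the table programmatically. A small Python sketch, assuming the four cell counts shown in the table; the dictionary layout and variable names are illustrative only.

```python
# Cell counts from the table, keyed by (student, parents).
counts = {
    ("uses", "used"): 125,
    ("uses", "did_not_use"): 94,
    ("does_not_use", "used"): 85,
    ("does_not_use", "did_not_use"): 141,
}
total = sum(counts.values())

# Marginal: sum over the other variable, divide by the grand total.
student_uses = sum(n for (student, _), n in counts.items() if student == "uses")
marginal = student_uses / total

# Joint: a single cell divided by the grand total.
joint = counts[("uses", "used")] / total

# Conditional: restrict to the "parents used" column, then take the cell's share.
parents_used = sum(n for (_, parents), n in counts.items() if parents == "used")
conditional = counts[("uses", "used")] / parents_used

print(round(marginal, 2), round(joint, 2), round(conditional, 2))
```

The conditional probability divides by the column total (210) rather than the grand total (445), which is the only difference from the joint probability.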
Bayes’ Theorem
Bayes’ theorem
− P(A | B) = P(A and B) / P(B)
Probability that the children use given that the parents also used
P(Children = use | Parents = used)
= P(Children = use and Parents = used) / P(Parents = used)
= (125/445) / (210/445)
= 0.60

                   Parents
Children           Used    Did not use    Total
Uses               125     94             219
Does not use       85      141            226
Total              210     235            445

General Product Rule: P(A and B) = P(A | B) × P(B)
Bayes’ Theorem expanded
Probability of breast cancer among women in the general population
− P(breast cancer) = 0.017
Probability of a true positive from a mammogram
− P(positive | breast cancer) = 0.78
− i.e., sensitivity
Probability of a false positive from a mammogram
− P(positive | no breast cancer) = 0.10
− i.e., 1 − specificity
What is the probability that the patient has breast cancer given a positive mammogram?
P(cancer | positive)
= P(positive | cancer) P(cancer) / [P(positive | cancer) P(cancer) + P(positive | no cancer) P(no cancer)]
= (0.78 × 0.017) / (0.78 × 0.017 + 0.10 × 0.983)
= 0.119
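The same calculation as a small function. This is a sketch in Python, assuming the three inputs quoted on the slide (prevalence, sensitivity, false positive rate); the function and parameter names are made up for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    true_positive = sensitivity * prior
    false_positive = false_positive_rate * (1 - prior)
    # Denominator is P(positive), summed over both ways to test positive.
    return true_positive / (true_positive + false_positive)


p_cancer_given_positive = posterior(
    prior=0.017, sensitivity=0.78, false_positive_rate=0.10
)
print(round(p_cancer_given_positive, 3))
```

Even with a fairly sensitive test, the low prior keeps the posterior near 12%, because false positives from the much larger no-cancer group dominate the denominator.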
Bayes’ theorem
P(A | B) = P(A and B) / P(B)
= P(B | A) P(A) / P(B)
= P(B | A) P(A) / [P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ)]
where Aᶜ is the complement of A (i.e., A does not happen)
Probability Tree
What is the probability that the patient has breast cancer given a positive mammogram?
Cancer: P(cancer) = 0.017
− Positive: P(positive | cancer) = 0.78
  P(cancer and positive) = 0.017 × 0.78 = 0.01326
− Negative: P(negative | cancer) = 0.22
No cancer: P(no cancer) = 0.983
− Positive: P(positive | no cancer) = 0.10
  P(no cancer and positive) = 0.983 × 0.10 = 0.0983
− Negative: P(negative | no cancer) = 0.90
P(cancer | positive)
= P(cancer and positive) / P(positive)
= 0.01326 / (0.01326 + 0.0983)
= 0.119
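The tree can also be walked mechanically: multiply along each path to get a leaf, then add up the leaves that match the evidence. A Python sketch under the slide's numbers; the nested-dictionary layout is just one possible encoding, not something prescribed by the slides.

```python
# First-level branches: state -> (P(state), {test result: P(result | state)}).
tree = {
    "cancer": (0.017, {"positive": 0.78, "negative": 0.22}),
    "no cancer": (0.983, {"positive": 0.10, "negative": 0.90}),
}

# Multiply along each path to get the joint probability at every leaf.
leaves = {
    (state, result): p_state * p_result
    for state, (p_state, branches) in tree.items()
    for result, p_result in branches.items()
}

# P(positive) is the sum of every leaf that ends in a positive test.
p_positive = sum(p for (_, result), p in leaves.items() if result == "positive")
p_cancer_given_positive = leaves[("cancer", "positive")] / p_positive

print(round(p_cancer_given_positive, 3))
```

A quick sanity check on any probability tree is that the leaf probabilities sum to 1.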
Expected Mean
Expected mean
E[X] = Σ x × p(x)  # sum of each value of x multiplied by its probability
What is the expected value of a dice roll?
E[X] = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6
= 3.5
Notation: x̄ : sample mean; μ : population mean
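The expected value is a probability-weighted sum, which translates directly into code. A minimal Python sketch, assuming a fair six-sided die represented as a {value: probability} mapping; the helper name expected_value is illustrative.

```python
from fractions import Fraction


def expected_value(distribution):
    """E[X]: sum of x * p(x) over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())


# Fair six-sided die: every face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
mean = expected_value(fair_die)

print(float(mean))
```

The same function works for a loaded die: only the probabilities in the mapping change.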
Mean
Mean
= (x₁ + x₂ + x₃ + … + xₙ) / n
What is the mean number of dots on each die face?
Mean
= (1 + 2 + 3 + 4 + 5 + 6) / 6
= 3.5
Expected Variance
Expected variance
Var(X) = E[(X − μ)²]  # probability-weighted sum of squared differences between each value and the mean
= E[X²] − E[X]²
What is the variance of a dice roll?
From previous slide, mean E[X] = 3.5
E[X²] = 1² × 1/6 + 2² × 1/6 + 3² × 1/6 + 4² × 1/6 + 5² × 1/6 + 6² × 1/6 = 15.17
Var(X) = E[X²] − E[X]² = 15.17 − 3.5² ≈ 2.9
Notation: s² : sample variance; σ² : population variance
s : sample standard deviation; σ : population standard deviation
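Both forms of the variance can be computed side by side to confirm they agree. A Python sketch for a fair die; the expectation helper and its transform argument are illustrative names, not part of the slides.

```python
from fractions import Fraction

# Fair six-sided die as a {value: probability} mapping.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}


def expectation(distribution, transform=lambda x: x):
    """E[g(X)]: sum of g(x) * p(x); g defaults to the identity."""
    return sum(transform(x) * p for x, p in distribution.items())


mean = expectation(fair_die)
second_moment = expectation(fair_die, lambda x: x**2)

# Definition: expected squared distance from the mean.
var_definition = expectation(fair_die, lambda x: (x - mean) ** 2)
# Shortcut: E[X^2] - E[X]^2.
var_shortcut = second_moment - mean**2

print(float(second_moment), float(var_definition), float(var_shortcut))
```

With exact fractions the two forms match exactly (35/12), which rounds to the 2.9 on the slide.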
Population Variance
Population variance
σ² = (1/N) Σ (xᵢ − μ)²
What is the variance of dots on die faces?
Given μ = 3.5
σ² = (1/6) [(1 − 3.5)² + (2 − 3.5)² + … + (6 − 3.5)²]
≈ 2.9
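Sample Variance
Sample variance
s² = (1/(n − 1)) Σ (xᵢ − x̄)²
Why n – 1?
− Deviations are measured from the sample mean x̄, which sits closer to the sample's own values than the population mean μ does. Dividing by n therefore tends to underestimate the population variance, so we perform an “adjustment” (dividing by n − 1 instead) to get a slightly bigger variance that more closely approximates the population variance
− i.e., think of it as a “correction” used on samples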
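The effect of the n − 1 correction can be checked exactly on a tiny case. The Python sketch below assumes samples of size 2 drawn with replacement from the six die faces, and averages both versions of the variance over all 36 equally likely samples; the helper name spread is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 2  # sample size (an arbitrary small choice for illustration)


def spread(values, divisor):
    """Sum of squared deviations from the values' own mean, over `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values) / divisor


# Every possible sample of size n, drawn with replacement (all equally likely).
samples = list(product(faces, repeat=n))

avg_divide_by_n = sum(spread(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(spread(s, n - 1) for s in samples) / len(samples)

# Population variance of the six faces (divide by N = 6).
population_variance = spread(tuple(faces), len(faces))

print(avg_divide_by_n, avg_divide_by_n_minus_1, population_variance)
```

On average, dividing by n comes out too small, while dividing by n − 1 matches the population variance. Any single sample can still land above or below it.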
Bernoulli Distribution
Where an individual trial only has two possible outcomes
Assuming a fair coin, what is the probability of it landing on heads (i.e., success)?
P(success) = p(heads)¹ × p(tails)⁰ = 0.5¹ × 0.5⁰ = 0.5
Assuming an unfair coin (i.e., p(heads) = 0.25), what is the probability of it landing on tails (i.e., failure)?
P(failure) = p(heads)⁰ × p(tails)¹ = 0.25⁰ × 0.75¹ = 0.75
Binomial Distribution
Probability of k successes in n trials
P(k successes in n trials) = C(n, k) × p^k × (1 − p)^(n − k)
where C(n, k) = n! / (k! (n − k)!)
Given 7 trials, how many scenarios can have 2 successes?
C(7, 2) = 7! / (2! × 5!)
= (7 × 6 × 5!) / (2 × 1 × 5!)
= 21
If you toss the unfair coin 7 times, what’s the probability of 2 heads (i.e., successes)?
Given P(heads) = 0.25
P(k = 2) = C(7, 2) × 0.25² × 0.75⁵
= 21 × 0.25² × 0.75⁵
= 0.311
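The binomial formula maps onto a few lines of code. A Python sketch using math.comb from the standard library for the "n choose k" term; the function name binomial_pmf is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Number of ways to place 2 successes among 7 trials.
scenarios = comb(7, 2)

# Unfair coin with P(heads) = 0.25, tossed 7 times, exactly 2 heads.
p_two_heads = binomial_pmf(k=2, n=7, p=0.25)

print(scenarios, round(p_two_heads, 3))
```

Summing binomial_pmf over k = 0..7 gives 1, which is a handy check that the formula was typed correctly.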
Normal Distribution
Unimodal (only one peak) and symmetric
68-95-99.7% rule
− About 68% of values fall within 1 SD of the mean
− About 95% of values fall within 2 SD of the mean
− About 99.7% of values fall within 3 SD of the mean
Represented as N(μ, σ)
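The 68-95-99.7 figures can be reproduced from the normal CDF. A short Python sketch using statistics.NormalDist from the standard library (Python 3.8+); the helper name within_k_sd is illustrative.

```python
from statistics import NormalDist

# Standard normal: mean 0, SD 1. The rule holds for any normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)


def within_k_sd(k):
    """Probability that a normal value falls within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: within_k_sd(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(k, round(share, 4))
```

The exact values are slightly different from the rounded rule of thumb, which is why the rule is usually stated with "about".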
Normal Distribution
You want to compare two cousins and determine who fared better. Xiao Ming scored 1800 on his SAT and Muthu scored 24 on his ACT. Who did better?
− SAT scores ~ N(mean = 1500, SD = 300)
− ACT scores ~ N(mean = 21, SD = 6)
Xiao Ming: (1800 − 1500) / 300 = 1 SD above the mean
Muthu: (24 − 21) / 6 = 0.5 SD above the mean
Xiao Ming did better relative to other test takers.
Normal Distribution (Z scores)
Standardization with Z scores (normalization)
Z = (observation − μ) / SD
The standardized (Z) score of a value is the number of standard deviations it falls above or below the mean
Z score of the mean = 0
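Applied to the SAT/ACT comparison, standardization is a one-line function. A minimal Python sketch; z_score and its parameter names are illustrative.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations `observation` lies above (+) or below (-) the mean."""
    return (observation - mean) / sd


# SAT ~ N(1500, 300) and ACT ~ N(21, 6), as given in the cousin comparison.
xiao_ming = z_score(1800, mean=1500, sd=300)
muthu = z_score(24, mean=21, sd=6)

print(xiao_ming, muthu)
```

Because both scores are now in SD units, they can be compared directly even though the raw scales differ.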
Normal Distribution
Suppose that your company's ad campaign receives daily ad clicks that are (approximately) normally distributed with mean = 1,020 and standard deviation = 50. What’s the probability of getting more than 1,160 clicks a day?
Z = (observation − μ) / SD
= (1,160 − 1,020) / 50 = 2.8
P(Z > 2.8) = 1 − 0.9974 = 0.0026
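Instead of a Z table, the tail probability can be computed from the normal CDF. A Python sketch using statistics.NormalDist (standard library, Python 3.8+), assuming the mean and SD stated in the question; variable names are illustrative.

```python
from statistics import NormalDist

# Daily ad clicks ~ N(mean = 1020, SD = 50), as stated in the question.
clicks = NormalDist(mu=1020, sigma=50)

# Z score of 1,160 clicks.
z = (1160 - clicks.mean) / clicks.stdev

# Upper tail: P(clicks > 1160) = 1 - P(clicks <= 1160).
p_more_than_1160 = 1 - clicks.cdf(1160)

print(round(z, 1), round(p_more_than_1160, 4))
```

The CDF gives the area to the left of a value, so the "more than" probability is its complement.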
Normal Distribution
Your friend boasts that his ad is in the top 25% of the company’s ad campaigns. What is the lowest number of ad clicks his ad could have received?
− Ad clicks ~ N(1020, 50)
The top 25% starts at the 75th percentile, where Z ≈ 0.67
Z = 0.67 = (x − 1,020) / 50
x = 0.67 × 50 + 1,020 = 1,053.5
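Going from a percentile back to a value uses the inverse CDF. A Python sketch using NormalDist.inv_cdf from the standard library, assuming the same N(1020, 50) distribution; variable names are illustrative.

```python
from statistics import NormalDist

clicks = NormalDist(mu=1020, sigma=50)

# Top 25% means at or above the 75th percentile.
z_75 = NormalDist().inv_cdf(0.75)   # Z score at the 75th percentile
threshold = clicks.inv_cdf(0.75)    # same percentile on the clicks scale

print(round(z_75, 4), round(threshold, 1))
```

The unrounded Z is about 0.6745, so the threshold comes out near 1,053.7, slightly above the 1,053.5 obtained with the table value Z = 0.67.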
Poisson Distribution
P(X = x) = e^(−λ) λ^x / x!
− e = base of the natural log, 2.71828…
− λ = mean number of successes in a given time interval
On average, 2.5 people show up at a bus stop every hour. What is the probability that 3 or fewer people show up over 4 hours?
λ = 2.5 × 4 = 10
P(X ≤ 3) = e^(−10) 10⁰/0! + e^(−10) 10¹/1! + e^(−10) 10²/2! + e^(−10) 10³/3! = 0.010336
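The bus stop example can be evaluated directly from the Poisson formula. A Python sketch using only the standard library; poisson_pmf is an illustrative name, and the rate is scaled from 2.5 per hour to the 4-hour window.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# 2.5 arrivals per hour over a 4-hour window gives lambda = 10.
lam = 2.5 * 4

# P(X <= 3) = P(0) + P(1) + P(2) + P(3).
p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))

print(round(p_at_most_3, 6))
```

The sum evaluates to roughly 0.0103: with 10 arrivals expected, seeing 3 or fewer is about a 1% event.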
Thank you for your attention!
Eugene Yan