
Randomness and Probability Models

Probability

Hosung Sohn

Department of Public Administration and International Affairs, Maxwell School of Citizenship and Public Affairs

Syracuse University

Lecture Slide 4-1 (October 1, 2015)

Hosung Sohn (Lecture Slide 4-1) Introduction to Statistics: PAI 721 (Fall, 2015)


Table of Contents

1 Randomness and Probability Models


Announcement

Problem Set 1:

Midterm Course Evaluation:

=⇒ You will receive a link by email where you can answer the midterm evaluation questions.


Review of Problem Set 1

Question 5: The distribution of SAT scores is approximately Normal with mean µ = 1498 and standard deviation σ = 316. If you were to randomly pick a student among those who took the SAT exam, what is the probability that the student’s SAT exam score is between 1321 and 1574?
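One way to check an answer to Question 5 numerically is sketched below. It uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative and are not part of the problem set.

```python
from statistics import NormalDist

# SAT scores modeled as Normal with the mean and standard deviation from Question 5.
sat = NormalDist(mu=1498, sigma=316)

# P(1321 < X < 1574) = F(1574) - F(1321), where F is the Normal CDF.
p_between = sat.cdf(1574) - sat.cdf(1321)

# Equivalent calculation after standardizing: z = (x - mu) / sigma.
z_low = (1321 - 1498) / 316
z_high = (1574 - 1498) / 316
standard_normal = NormalDist()  # mean 0, standard deviation 1
p_standardized = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(p_between, 3), round(p_standardized, 3))
```

Both routes should give a probability of roughly 0.31, in line with a Normal-table lookup at z ≈ −0.56 and z ≈ 0.24.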


Review of Problem Set 1

True or false: An influential observation will always have a large residual. Explain your answer.


Review of Problem Set 1

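Question 9: Use the equation for the least-squares regression line to show that this line always passes through the point $(\bar{x}, \bar{y})$.

=⇒ To show that the regression line always passes through the point $(\bar{x}, \bar{y})$, we need to show that if we substitute $\bar{x}$ for $x_i$, then the predicted value of $y_i$ (i.e., $\hat{y}_i$) is equal to $\bar{y}$.

Solution

$$
\begin{aligned}
\hat{y}_i &= \beta_0 + \beta_1 x_i \\
&= \bar{y} - \beta_1 \bar{x} + \beta_1 x_i && \text{because } \beta_0 = \bar{y} - \beta_1 \bar{x} \\
&= \bar{y} - \beta_1 \bar{x} + \beta_1 \bar{x} && \text{setting } x_i = \bar{x} \\
&= \bar{y}.
\end{aligned}
$$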
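The algebra for Question 9 can also be checked numerically. The sketch below fits a least-squares line to a small made-up data set (the numbers are purely illustrative) and confirms that the prediction at the mean of x equals the mean of y, up to rounding error.

```python
# Made-up data, for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.1, 2.9, 5.2, 5.8, 9.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar

# The predicted value at x = x_bar should equal y_bar.
y_hat_at_mean = beta0 + beta1 * x_bar
print(abs(y_hat_at_mean - y_bar) < 1e-12)
```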


Review of Problem Set 1

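Prove that $\beta_1 = \dfrac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$.

Proof

$$
\begin{aligned}
\beta_1 &= r \frac{s_y}{s_x} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) \times \frac{s_y}{s_x} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_x}\right) \\
&= \left(\frac{1}{s_x}\right)\left(\frac{1}{s_x}\right) \underbrace{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}_{\mathrm{Cov}(x, y)} \\
&= \left(\frac{1}{s_x^2}\right) \mathrm{Cov}(x, y) \\
&= \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}.
\end{aligned}
$$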
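As a numerical sanity check on this identity, the sketch below computes the slope two ways, once as r times the ratio of standard deviations and once as covariance over variance. The data are made up for illustration, and all quantities use the n − 1 divisor, as in the proof.

```python
from math import sqrt

# Made-up data, for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.1, 2.9, 5.2, 5.8, 9.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample variance, standard deviation, and covariance (divisor n - 1).
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
s_x, s_y = sqrt(var_x), sqrt(var_y)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Correlation as the average product of standardized values.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

slope_from_r = r * s_y / s_x
slope_from_cov = cov_xy / var_x
print(abs(slope_from_r - slope_from_cov) < 1e-12)
```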


Review of Problem Set 1

Some advice:

1. Please be meticulous about notation!

=⇒ When you study Econometrics, notation gets more complicated.

2. Please see solutions and make sure you understand the reasoning.

=⇒ The midterm and final exam will be very similar to the problem sets.


Review of Previous Lecture

Basics of statistical inference.

Checking whether our sample “represents” the population can be done by examining how the sample is chosen.

=⇒ Draw a sample from a population randomly.

Statistical inference allows us to determine whether the information we have drawn from a sample corresponds to the truth about the population.


Review of Previous Lecture

Two important terms:

1. Parameter: a number that describes the “population.”

=⇒ Fixed number, and we don’t know this number.

2. Statistic: a number that describes the “sample.”

=⇒ It changes from sample to sample (i.e., sampling variability).

We use a statistic to estimate a parameter.


Review of Previous Lecture

Sampling distribution: distribution of the statistic.


Review of Previous Lecture

Sample size (i.e., number of observations from a single SRS) matters.


Review of Previous Lecture

Bias vs. Variability


Review of Previous Lecture

To reduce bias, we should use random sampling.

=⇒ A statistic from a random sample neither consistently overestimates nor consistently underestimates the value of the population parameter.

To reduce the variability of a statistic from a random sample, we should use a larger sample (i.e., a larger n).

We also learned that as long as we have a random sample, the variability of the statistic depends only on the size of the sample, not the size of the population.
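The last two points can be illustrated with a small simulation. The sketch below is a rough illustration under stated assumptions: two made-up populations of different sizes (10,000 and 200,000 values, both roughly Normal with mean 50 and standard deviation 10), repeated simple random samples of size n from each, and the spread of the resulting sample means. The function name `sd_of_sample_means` and all numeric settings are illustrative choices.

```python
import random
from statistics import mean, stdev

def sd_of_sample_means(population, n, reps=2000, seed=0):
    """Approximate the spread of the sampling distribution of the sample mean."""
    rng = random.Random(seed)
    # Draw `reps` simple random samples of size n (without replacement).
    means = [mean(rng.sample(population, n)) for _ in range(reps)]
    return stdev(means)

# Two made-up populations with the same shape but very different sizes.
pop_rng = random.Random(42)
small_pop = [pop_rng.gauss(50, 10) for _ in range(10_000)]
large_pop = [pop_rng.gauss(50, 10) for _ in range(200_000)]

for n in (25, 100):
    print(n,
          round(sd_of_sample_means(small_pop, n), 2),
          round(sd_of_sample_means(large_pop, n), 2))
```

For each n the two columns should be close to each other, near 10/√n (about 2 for n = 25 and about 1 for n = 100), even though one population is twenty times larger than the other.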


Review Exercise 1

State what is wrong in each of the following:

1. A parameter describes a sample.

=⇒ A parameter describes a “population.”

2. Bias and variability are two names for the same thing.

=⇒ Bias is the distance between the center (mean) of the sampling distribution of a statistic and the true value of the population parameter.

=⇒ Variability refers to the spread (not the center) of the sampling distribution of a statistic.


Review Exercise 1

3. Large samples are always better than small samples.

=⇒ Not necessarily; larger samples generally come at a cost (think about sample vs. census).

=⇒ But a larger sample does reduce the variability of the sampling distribution.

4. A sampling distribution is something generated by a computer.

=⇒ Although a sampling distribution can be simulated with a computer, it arises from the process of sampling, not from computation.


Review Exercise 2

Each histogram shows the sampling distribution of a statistic intended to estimate the population parameter. Characterize each sampling distribution in terms of bias (high or low) and variability (high or low).

1.

=⇒ High bias and high variability!


Review Exercise 2

2.

=⇒ Low bias and low variability!


Review Exercise 2

3.

=⇒ Low bias and high variability!


Review Exercise 2

4.

=⇒ High bias and low variability!


Probability: Introduction

The reasoning of statistical inference depends on the answer to the following question:

“How often would this method (i.e., random sampling) give a correct answer if we used it very many times?”

Unfortunately, we cannot draw random samples many times in order to answer the question above.

Fortunately, probability theory provides an answer to the question.

So we now learn the fundamentals of probability.


Randomness and Probability Models

Consider tossing a coin.

As you can imagine, we cannot predict in advance whether the coin will show heads or tails, because the result will vary when you toss it repeatedly.

But a pattern is likely to emerge as we toss the coin many times.

=⇒ This fact is the basis for the idea of probability.


Randomness and Probability Models

Let’s look at the figure below:

For each number of tosses from 1 to 5,000, I have plotted the proportion of those tosses that gave a head.

The proportion of tosses that produce “heads” is quite variable at first.

=⇒ But as we toss more and more, the proportion of heads approaches 0.5 and stays there.
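The long-run behavior described here can be reproduced with a short simulation. The sketch below is not the code behind the original figure; it simulates 5,000 tosses of a fair coin with Python's `random` module (the seed is an arbitrary choice, used only to make the run repeatable) and records the running proportion of heads.

```python
import random

rng = random.Random(2015)  # arbitrary fixed seed so the run is repeatable

heads = 0
proportions = []  # proportions[k - 1] = proportion of heads in the first k tosses
for toss in range(1, 5001):
    heads += rng.random() < 0.5  # True counts as 1 head, False as 0
    proportions.append(heads / toss)

# Early proportions tend to jump around; later ones tend to settle near 0.5.
for k in (10, 100, 1000, 5000):
    print(k, round(proportions[k - 1], 3))
```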


Randomness and Probability Models

In this case, we say that the probability that tossing a coin gives the head outcome is 0.5; i.e., P(Head) = 0.5.

=⇒ When the probability of a head is 0.5, we say that the coin is “fair.”

Probability, in general, describes only what happens in the long run.


Terminology 1: Random

Before we learn the fundamentals of probability, we should get used to some terminology.

The first term we learn is “random.”

Random

Definition: We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.


Terminology 1: Random

The term “random” in statistics is not a synonym for “haphazard.”

It is a description of a pattern that emerges in the long run.

The proportion of heads from tossing a coin is one kind of “random” phenomenon.

In fact, there are many random phenomena in our everyday life.


Terminology 1: Random

Determine whether the following example is random:

1. The gender of a baby? Yes! Is it “fair”? Probably.

=⇒ The sex ratio at birth is commonly thought to be 107 boys to 100 girls.

=⇒ The sex ratio for the entire world population is 101 males to 100 females.

2. A statistic (e.g., $\bar{x}$) from a random sample? Yes!

3. The outcome of a randomized experiment? Yes!

The theory of probability is the branch of mathematics that studies these random phenomena.


Terminology 2: Sample Space

A description of a random phenomenon is called a probability model.

The description has two parts:

1. A list of possible outcomes.

2. A probability for each outcome.

And a list of possible outcomes is called the “sample space.”

Sample Space

Definition: The sample space, denoted as S, is the set of all possible outcomes of a random phenomenon.


Terminology 2: Sample Space

Determine the sample space for each random phenomenon:

1. Coin tossing?

=⇒ S = {heads, tails}

2. Tossing a six-sided die?

=⇒ S = {1, 2, 3, 4, 5, 6}

3. Gender at birth?

=⇒ S = {boy, girl}

4. An opinion poll that asks a respondent whether he or she advocates reducing federal spending on low-interest student loans?

=⇒ S = {yes, no}


Terminology 2: Sample Space

Determine the sample space, if any, for each random phenomenon:

5. A statistic (i.e., the mean weight) calculated from a random sample?

=⇒ S = {x ∈ R such that x > 0}

6. Gender of the president of the US in 2015?

=⇒ It is not “random.” We all know that Obama is male. So we cannot talk about a sample space.

One of the advantages of using a probability model is that a random phenomenon that is seemingly different from another random phenomenon can be described by the same probability model!
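To make the two parts of a probability model concrete, the sketch below writes two of the sample spaces from the examples as Python sets and attaches a probability to each outcome. The equal probabilities are an assumption made only for illustration (a fair coin and a fair die), and the helper name `equally_likely_model` is hypothetical.

```python
from fractions import Fraction

# Part 1 of a probability model: the sample space (a set of outcomes).
S_coin = {"heads", "tails"}
S_die = {1, 2, 3, 4, 5, 6}

def equally_likely_model(sample_space):
    """Part 2: assign a probability to each outcome.

    Assumption for illustration: every outcome is equally likely.
    """
    p = Fraction(1, len(sample_space))
    return {outcome: p for outcome in sample_space}

coin_model = equally_likely_model(S_coin)
die_model = equally_likely_model(S_die)

print(coin_model["heads"], die_model[3], sum(die_model.values()))
```

Any other phenomenon with two equally likely outcomes would be described by the same model as the fair coin, with only the outcome labels changed.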


Terminology 3: Event

Recall that the description of a random phenomenon has two parts:

1. A list of possible outcomes =⇒ Sample space.

2. A probability for each outcome in sample space.

But keep in mind that we need to assign probabilities not only to single outcomes in the sample space, but also to “sets” of outcomes.

In probability, an outcome or a set of outcomes is called an event.

Event

Definition: An event, denoted A, B, ..., is an outcome or a set of outcomes from a random phenomenon. An event is a subset of the sample space.


Terminology 3: Event

Let’s illustrate what this event is:

1. Suppose that a firm has offices in five cities: SF, LA, NYC, Paris, andLondon.

2. A new employee will be assigned to work in one of these five offices.

3. The city to which a new employee is assigned is random!

4. What is the sample space?

=⇒ S = {SF,LA,NYC,Paris,London}

5. Let event A denote that the employee is assigned to work in California. Then event A is

=⇒ A = {SF,LA}
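The office example can be written down directly with sets, which makes the "event is a subset of the sample space" idea explicit. The slide does not say how likely each office is, so the probability computed below rests on a labeled assumption that all five offices are equally likely.

```python
from fractions import Fraction

# Sample space: the five offices a new employee could be assigned to.
S = {"SF", "LA", "NYC", "Paris", "London"}

# Event A: the employee is assigned to an office in California.
A = {"SF", "LA"}

# An event must be a subset of the sample space.
print(A <= S)

# Assumption (not stated in the example): each office is equally likely.
p_A = Fraction(len(A), len(S))
print(p_A)
```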


Terminology 3: Event

The description of a random phenomenon:

1. A list of possible outcomes =⇒ Sample space.

2. Assign probabilities to events in sample space.

We now learn some basic rules that must be satisfied by any logical system of assigning probabilities to events in a sample space.

There are four rules.


Probability Rule 1

Rule 1: For any event A in a sample space S, the probability of event A, denoted as P(A), satisfies:

0 ≤ P (A) ≤ 1.

If P (A) = 0, it implies that event A never occurs.

If P (A) = 1, it implies that event A always occurs.

If P(A) = 0.5, it implies that event A occurs in half the trials in the long run.


Probability Rule 2

Rule 2: Suppose S = {A, B, C, ...} represents the sample space of a random phenomenon. Then

P(S) = P(A) + P(B) + P(C) + ··· = 1.

Rule 2 implies that the sum of the probabilities for all possible outcomes must be exactly 1.

Consider tossing a coin. Our sample space is

S = {heads, tails}.

As you can see, P (S) = P (heads) + P (tails) = 1.
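Rules 1 and 2 together give a simple check for whether a proposed assignment of probabilities is legitimate. The sketch below is one possible way to code that check; the function name `is_valid_model` and the example dictionaries are illustrative and not from the lecture.

```python
from math import isclose

def is_valid_model(model):
    """Check Rule 1 (each probability in [0, 1]) and Rule 2 (probabilities sum to 1)."""
    probs = list(model.values())
    rule1 = all(0 <= p <= 1 for p in probs)
    rule2 = isclose(sum(probs), 1.0)  # tolerate floating-point rounding
    return rule1 and rule2

fair_coin = {"heads": 0.5, "tails": 0.5}
not_a_model = {"heads": 0.7, "tails": 0.5}  # sums to 1.2, so Rule 2 fails

print(is_valid_model(fair_coin), is_valid_model(not_a_model))
```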


Probability Rule 3

Rule 3: Let A and B be two events in a sample space S. We say that the two events are disjoint, or mutually exclusive, if A and B have no outcomes in common and so can never occur simultaneously. If A and B are disjoint,

P(A or B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B).

P(A or B) is equivalent to P(A ∪ B), so Rule 3 implies that if A and B are disjoint, then P(A ∪ B) = P(A) + P(B) and P(A and B), i.e., P(A ∩ B), is 0.


Probability Rule 3: Example

Let’s illustrate Rule 3.

Set up:

1. At the Hosung company, 10% of the employees have MPA degrees.

2. 50% of the employees are female and 50% are male.

3. And all of the MPA degree holders are female.

4. What is the probability that a randomly selected person is either an MPA degree holder or a male?
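The setup above can be worked through with Rule 3. Because every MPA degree holder is female, the events "MPA degree holder" and "male" have no outcomes in common, so their probabilities simply add. The sketch below carries out that arithmetic; the variable names are illustrative.

```python
p_mpa = 0.10   # P(MPA degree holder)
p_male = 0.50  # P(male)

# All MPA degree holders are female, so nobody is both an MPA holder and male:
# the two events are disjoint and P(MPA and male) = 0.
p_mpa_and_male = 0.0

# Rule 3 (the general addition formula reduces to a plain sum for disjoint events).
p_mpa_or_male = p_mpa + p_male - p_mpa_and_male
print(round(p_mpa_or_male, 2))
```

Under these figures the probability is 0.60.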


Probability Rule 4

Rule 4: The complement of any event A is the event that A does not occur, and is denoted as $A^c$. Then

$P(A^c) = 1 - P(A)$.

Let’s prove the above rule mathematically:

Suppose a sample space contains n outcomes; i.e., $S = \{A_1, A_2, \ldots, A_{n-1}, A_n\}$.

Let event $A = \{A_1, A_2, \ldots, A_k\}$, and $A^c = \{A_{k+1}, A_{k+2}, \ldots, A_n\}$.

Proof

$P(S) = [P(A_1) + \cdots + P(A_k)] + [P(A_{k+1}) + \cdots + P(A_n)] = 1$

=⇒ $P(A) + P(A^c) = 1$

=⇒ $P(A^c) = 1 - P(A)$.


Probability Rule 4: Example

Let’s illustrate Rule 4.

Set up:

1. Suppose that the computer center has 20 personal computers.

2. 14 of the computers are in perfect condition and 6 are not working properly.

3. And a student randomly selects one computer in the computer center.

4. What is the probability that the student chooses a defective computer?
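This setup can be answered with the complement rule. The sketch below assumes, as the setup implies, that each of the 20 computers is equally likely to be chosen, and compares the complement-rule answer with a direct count of defective machines.

```python
from fractions import Fraction

total = 20
working = 14

# P(working) under the assumption that each computer is equally likely to be picked.
p_working = Fraction(working, total)

# Rule 4: "defective" is the complement of "working".
p_defective = 1 - p_working

# Direct count, for comparison.
p_defective_direct = Fraction(total - working, total)

print(p_defective, p_defective == p_defective_direct)
```

Both routes give 3/10, i.e., a probability of 0.3.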


Usefulness of Venn diagrams

Using a Venn diagram helps us calculate probabilities.


Usefulness of Venn diagrams

The events A and B are disjoint.


Usefulness of Venn diagrams

$A^c$ in a Venn diagram


Usefulness of Venn diagrams

A ∩ B in a Venn diagram


Usefulness of Venn diagrams

A ∪ B in a Venn diagram


Independence

Before we move on to studying another probability rule, let’s define one very important term.

Independence

Definition: We say that two events, A and B, are independent if knowing that one occurs does not change the probability that the other occurs.


Probability Rule 5

Rule 5: When the two events A and B are “independent,”

P (A and B) = P (A)P (B), or equivalently P (A ∩B) = P (A)P (B).

The above rule is called the “multiplication” rule for independent events.

The coin-tossing example:

What is the probability of obtaining a head (A) and then a tail (B) on two tosses of a fair coin?

A coin has no memory, so we can assume that successive coin tosses are independent.

=⇒ This implies that the outcome of the first toss does not influence the outcome of any other toss.

Therefore by Rule 5,

P (A and B) = P (A)P (B) = 0.5× 0.5 = 0.25.
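The same answer can be reached by listing every ordered outcome of two tosses. The sketch below assumes a fair coin and treats the four ordered outcomes (HH, HT, TH, TT) as equally likely, which is what fairness plus independence implies; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two tosses: (first toss, second toss).
outcomes = list(product(["H", "T"], repeat=2))

# Assumption: fair coin and independent tosses, so each ordered outcome is equally likely.
p_each = Fraction(1, len(outcomes))

p_A = sum(p_each for first, second in outcomes if first == "H")    # head on toss 1
p_B = sum(p_each for first, second in outcomes if second == "T")   # tail on toss 2
p_A_and_B = sum(p_each for outcome in outcomes if outcome == ("H", "T"))

print(p_A, p_B, p_A_and_B, p_A_and_B == p_A * p_B)
```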


Example of Two Events That Are “Dependent”

If you take an SAT exam or other test twice in succession, the two test scores (i.e., events) are not independent.

=⇒ The learning that occurs on the first attempt influences your second attempt.

If you learn a lot from the first attempt, then your score on the second attempt (event B) would be a lot higher than your score on the first attempt (event A).

If events A and B are not independent, then we cannot use Rule 5 to calculate P(A and B) as P(A) × P(B).


Disjointness vs. Independence

True or false: if two events A and B are disjoint, they are independent.

=⇒ False!

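If A and B are disjoint, then we know that if A has occurred, then B cannot occur. Knowing that A occurred therefore changes the probability of B to 0, so disjoint events with positive probabilities are not independent.

Unlike disjointness or complements, independence cannot be pictured by a Venn diagram because it involves the probabilities of the events rather than the outcomes that make up the events.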
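A small example with one roll of a die shows that disjointness and independence are different properties. The sketch below assumes a fair six-sided die; the events A, B, and C and the helper names are made up for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a six-sided die

def prob(event):
    # Assumption: the die is fair, so each face is equally likely.
    return Fraction(len(event & S), len(S))

def are_disjoint(a, b):
    return len(a & b) == 0

def are_independent(a, b):
    # Multiplication rule: P(A and B) = P(A) P(B) holds exactly when A and B are independent.
    return prob(a & b) == prob(a) * prob(b)

A = {1, 2}
B = {3, 4}      # shares no outcomes with A
C = {2, 4, 6}   # the roll is even

print(are_disjoint(A, B), are_independent(A, B))  # True False
print(are_disjoint(A, C), are_independent(A, C))  # False True
```

A and B are disjoint but not independent, while A and C overlap yet satisfy the multiplication rule (1/6 = 1/3 × 1/2).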


Computing Probabilities for Complex Events

The following is a “classic” example of a probability calculation with huge policy implications.

Setup:

1. Many people who come to clinics to be tested for HIV don’t come back to learn the test results.

2. So clinics use “rapid HIV tests” that give a result in a few minutes.

3. The FDA has established 2% as the maximum “false-positive” rate allowed for a rapid HIV test.

4. The false-positive rate for a diagnostic test is the probability that a person who does not have the disease will have a positive test result.

5. Suppose a clinic uses a test that meets the FDA standard and tests 50 people who are free of HIV antibodies.

6. What is the probability that “at least” 1 false-positive will occur?


Computing Probabilities for Complex Events

Solution:

1. The probability that the test gives a false positive for any person is 0.02.

2. It implies that the probability that the test gives a negative result for any person is 1 − 0.02 = 0.98; i.e., $P(A_i) = 0.98$, where $A_i$ is the event that person i tests negative.

3. The question asks us to find P(at least 1 false-positive).

$$
\begin{aligned}
P(\text{at least 1 false-positive}) &= 1 - P(\text{no false-positives}) \\
&= 1 - P(\text{50 negatives}) \\
&= 1 - P(A_1 \cap A_2 \cap \cdots \cap A_{50}) \\
&= 1 - 0.98^{50} \quad \text{(why?)} \\
&= 1 - 0.364 \\
&= 0.636.
\end{aligned}
$$
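The arithmetic in this solution is easy to reproduce. The sketch below combines the complement rule with the multiplication rule for independent events, which is the step marked "(why?)": it assumes the 50 test results are independent of one another, and the variable names are illustrative.

```python
false_positive_rate = 0.02  # maximum rate allowed by the FDA standard in the example
n_people = 50               # people tested, all free of HIV antibodies

# Multiplication rule, assuming the 50 test results are independent:
# P(all 50 negative) = 0.98 * 0.98 * ... * 0.98 (50 factors).
p_no_false_positives = (1 - false_positive_rate) ** n_people

# Complement rule: "at least one false positive" is the complement of "no false positives".
p_at_least_one = 1 - p_no_false_positives

print(round(p_no_false_positives, 3), round(p_at_least_one, 3))  # 0.364 0.636
```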


Computing Probabilities for Complex Events

Hence, there is about a 64% chance that at least 1 of the 50 people will test positive for HIV even though none of them has HIV.

Surprised by this high probability, the NYC Department of Health and Mental Hygiene suspended the use of one particular rapid HIV test.

This example shows that calculating probabilities properly is very important.

=⇒ It has huge policy consequences!
