chapter 22, part 2: computing p-values for …azimmer/lect23_ch21...-0.05 0.00 0.05 0.10 0.15 0.20 0...

Reminders

• Last HW and Last quiz on Thursday

• My office hours are today from 11-1

• If you won’t be around during the final week to take the Final Project, please email me ASAP to arrange a time for you to take it.

1

Warmup

• A drug company develops an AIDS treatment that they hope will reduce the proportion of AIDS patients who die within 5 years. In a randomized controlled trial, 35% of patients in the control group died within 5 years. The drug company would like to show that the proportion of patients who die within 5 years in the treatment group is less than this.

• What is the null hypothesis for this experiment? What is the alternative hypothesis for this experiment?

• H0 : p = 0.35

• Ha : p < 0.35

3

Warmup

• It turns out that 28% of the patients in the treatment group died within 5 years. The drug company calculates that the p-value for the experiment is .014. What does this p-value mean?

• If the null hypothesis were true, there would be a .014 chance (14 in 1000) of observing a result at least as extreme (as small) as the 28% we observed.

• Before the trial, the drug company set the significance level of the test at α = 1% = .01. What is the conclusion of this experiment?

• Since the p-value (.014) is larger than the significance level (.01), we fail to reject the null hypothesis: the difference we observed could be due to random chance alone. So, we don’t have enough evidence to conclude that the proportion of patients dying within 5 years is lower in the treatment group.
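
The decision rule used in this warmup can be sketched in a few lines of Python. This is only an illustration and not part of the original slides; the function name `decide` is an assumption made for the example.

```python
def decide(p_value: float, alpha: float) -> str:
    """Compare a p-value with a significance level chosen before the test."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    # Reject H0 only when the p-value is smaller than alpha.
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Warmup numbers from the slide: p-value .014, alpha .01.
print(decide(0.014, 0.01))
# The same p-value would lead to rejection at the looser level alpha = .05.
print(decide(0.014, 0.05))
```

The same data can lead to different conclusions at different significance levels, which is why α is fixed before looking at the results.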

Chapter 22, Part 2: Computing p-values for significance tests

Aaron Zimmerman
STAT 220 - Summer 2014

Department of Statistics
University of Washington - Seattle

6

Practice

The U.S. military would like to know whether the proportion of women in the military has changed in the last 20 years. In 1992, they know that 4.6% of active-duty soldiers were women. They would like to know if the current proportion is different from this value.

What is the null hypothesis for this experiment? What is the alternative hypothesis?

H0 : p = 0.046
Ha : p ≠ 0.046

8

Practice

It turns out that 16% of the active-duty soldiers they surveyed are women. The military calculates that the p-value for the experiment is .003.

What does this p-value mean?

It means that if the null hypothesis were true, there would be a 3 in 1000 chance (.003) of observing a result at least this extreme (a sample percent at least as far from 4.6% as the 16% we observed).

Before the trial, the military set the significance level of the test at α = 5% = .05. Remember, the p-value of the test is .003.

What is the conclusion of this experiment?

Since the p-value is less than α, we reject the null hypothesis and conclude that our data give us evidence that the percent of active-duty soldiers who are women has changed since 1992.

10

Steps of a Test of Significance

• Returning to our motivating example from Monday, remember that the prosecution in the Kristin Gilbert case found that there were 34/1384 = .025 deaths per shift when Nurse Gilbert wasn’t working, and that there were 40/257 = .156 deaths per shift when she was working.

• We’d like to know if the high rate of deaths during her shifts can be explained by random variation. That is, we’d like to know if the rate of deaths during her shifts is truly different from .025.

• Question: Is there sufficient evidence to reject the null hypothesis that the rate of deaths on Nurse Gilbert’s shifts equals the baseline .025 rate if the significance level is α = .05?
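
As a quick check of the two rates quoted above, here is a small Python sketch. The variable names are made up for this example; the counts come from the slide.

```python
# Counts quoted on the slide.
deaths_off, shifts_off = 34, 1384   # shifts when Nurse Gilbert was not working
deaths_on, shifts_on = 40, 257      # shifts when she was working

baseline_rate = deaths_off / shifts_off   # plays the role of p under H0
observed_rate = deaths_on / shifts_on     # the sample proportion, p-hat

print(f"baseline rate: {baseline_rate:.3f}")
print(f"observed rate: {observed_rate:.3f}")
```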

11

Step 0 and 1: Significance Level & The Hypotheses

• Before we even start, we set the significance level (α = .05)

• Remember, the claim being tested in a statistical test is called the null hypothesis (H0).

• Nurse Gilbert’s defense claims that she’s unlucky and that the rate of deaths during her shifts is the same as everyone else’s (.025).

  ◦ So, H0 : p = 0.025

• The statement we hope or suspect is true instead of H0 is called the alternative hypothesis (Ha or H1).

• The prosecution wants to show that the rate of deaths under Nurse Gilbert is larger than .025.

  ◦ So, Ha : p > 0.025

12

Step 2: The Sampling Distribution (if H0 is true)

• Remember, in a test of significance, we start by assuming that H0 is true

• If H0 (p = 0.025) is true, what is the sampling distribution of p̂?

  ◦ The sampling distribution is Normal

  ◦ The mean is p = 0.025

  ◦ The standard deviation is: $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.025(1-.025)}{257}} = .00974$

[Figure: Sampling Distribution (horizontal axis x from −0.05 to 0.20, vertical axis y from 0 to 40)]
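
The standard deviation in Step 2 can be reproduced with a short Python sketch. The helper name `sd_of_sample_proportion` is an assumption for illustration; the formula and the numbers are the ones on the slide.

```python
import math


def sd_of_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of p-hat when the true proportion is p: sqrt(p(1-p)/n)."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1.0 - p) / n)


p0 = 0.025   # proportion claimed by H0
n = 257      # number of shifts Nurse Gilbert worked
sd = sd_of_sample_proportion(p0, n)
print(f"mean = {p0}, SD = {sd:.5f}")
```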

13

Step 3: The Data

• There were 40 deaths out of 257 shifts under Nurse Gilbert

• So, p̂ = 40/257 = .156

14

Step 4: The p-value (NEW)

• Remember: a p-value is the probability of observing an outcome as extreme or more extreme than what we actually observed if the null hypothesis were true

• In this problem, the alternative hypothesis is one-sided (p > 0.025)

• So, the p-value is the area under the Normal curve to the right of the observation: the part that is as far or farther above the mean of the distribution than what we observed.

[Figure: Normal curve over p, horizontal axis from −0.05 to 0.20. The p-value is the ‘more extreme’ area under the Normal curve.]

NOTE: We’d look at the area under the curve to the left of the observation if the alternative were Ha : p < .025.


• What percent of the sampling distribution is greater than the observation of 40/257 = 0.156?

  ◦ Mean = 0.025

  ◦ SD = 0.0097

  ◦ Standard score: $\frac{.156 - .025}{.0097} = 13.5$!

• Look up the standard score in Table B. Not so helpful: it just tells us that the p-value must be less than 1 − .9997 = .0003

• My computer says the p-value is less than 1/100,000,000

[Figure: Normal curve over p, horizontal axis from −0.05 to 0.20. The p-value is the ‘more extreme’ area under the Normal curve.]
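
Here is one way a computer might arrive at that tiny p-value. This is a sketch using only the Python standard library, not necessarily what was run for the slide; the helper name `upper_tail` is an assumption. Because it uses unrounded inputs, the standard score comes out near 13.4 rather than the 13.5 obtained from the rounded values above.

```python
import math


def upper_tail(z: float) -> float:
    """Area under the standard Normal curve to the right of z."""
    # erfc stays accurate far out in the tail, where 1 - cdf(z) would round to 0.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p0, n = 0.025, 257                  # H0 proportion and number of shifts
p_hat = 40 / 257                    # observed proportion of shifts with a death
sd = math.sqrt(p0 * (1 - p0) / n)   # SD of the sampling distribution under H0
z = (p_hat - p0) / sd               # standard score of the observation
p_value = upper_tail(z)             # one-sided (Ha: p > 0.025)

print(f"standard score = {z:.1f}")
print(f"p-value = {p_value:.3g}")
print("less than 1/100,000,000:", p_value < 1 / 100_000_000)
```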

16


17

Step 5: Conclusion

• The p-value of less than .00000001 means that there is less than a 1 in 100,000,000 chance that Kristin Gilbert would randomly (and unluckily) have a percent of deaths that extreme during her shifts if the proportion of deaths during shifts was actually .025 (H0).

• Since my significance level is α = .05 and α > p-value, this test IS statistically significant.

• Conclusion: We have enough evidence to reject the null hypothesis and conclude that the percent of deaths during Nurse Gilbert’s shifts is larger than the baseline rate of .025.

• REMEMBER: this doesn’t mean she was killing people, but it does suggest that something different was happening under her watch.

18

Note #1: p-values in 2-sided tests

• In practice, when Ha is two-sided (Ha : p ≠ .025), we calculate the area that’s more extreme than the observation in one direction and then multiply by two

• We do this because in the 2-sided setting, “more extreme” could be extreme and large or extreme and small. Either way gives us evidence against the null hypothesis H0

• We’re not doing it for this problem, but you should be aware of it!

[Figure: Normal curve over p, horizontal axis from −0.05 to 0.20. The 2-sided p-value is found by looking at ‘more extreme’ in both directions!]
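
The "multiply by two" rule can be sketched in Python as follows. The function name `p_value_from_z` and the standard score of 2.0 in the demo are hypothetical, chosen only to show the doubling; they do not come from the Gilbert data.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str) -> float:
    """p-value for a standard score z under a one- or two-sided alternative."""
    lower = NormalDist().cdf(z)   # area to the left of z
    upper = 1.0 - lower           # area to the right of z
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Take the tail beyond the observation, then double it.
        return 2.0 * min(lower, upper)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


z_demo = 2.0  # hypothetical standard score, for illustration only
print(f"one-sided: {p_value_from_z(z_demo, 'greater'):.4f}")
print(f"two-sided: {p_value_from_z(z_demo, 'two-sided'):.4f}")
```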

19

Note #2: Different Sample Sizes

• What if we only saw 7 of Nurse Gilbert’s shifts?

• 7 × 40/257 ≈ 1. So using 1 death in 7 shifts is about the same ratio.

• Then the sampling distribution would have mean .025, but SD = $\sqrt{\frac{.025(1-.025)}{7}} = .059$

• And the standard score would be $\frac{.156 - .025}{.059} = 2.22$

• So the p-value would be 1 − .9861 = .0139 (.9861 is the Normal percentile for a standard score of 2.2)

• While we still would reject the null at α = .05, the evidence isn’t as strong, and we wouldn’t reject at α = .01.
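
To see how the sample size drives the result, here is a rough Python sketch that repeats the one-sided calculation for n = 7 and n = 257 with the same observed proportion. The function name `one_sided_test` is an assumption. Since the standard score is not rounded to 2.2 before the lookup, the n = 7 p-value comes out a little below the table-based .0139.

```python
import math


def one_sided_test(p_hat: float, p0: float, n: int):
    """Return (SD under H0, standard score, upper-tail p-value) for Ha: p > p0."""
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # area to the right of z
    return sd, z, p_value


for n in (7, 257):
    sd, z, p_value = one_sided_test(0.156, 0.025, n)
    print(f"n = {n:3d}: SD = {sd:.4f}, standard score = {z:.2f}, p-value = {p_value:.3g}")
```

The observed proportion is the same in both rows; only the SD changes, and with it the strength of the evidence.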

20

Significance Tests for Means

• There’s no reason that we can’t apply the proportions significance testing framework directly to significance tests for means.

• We’ll still use the same steps

• Chat with your neighbor about the strategy we’re going to take to perform significance tests about a mean

Very generally, we find the sampling distribution if the null hypothesis were true, and then we see how unlikely it was to record data as extreme as what we’ve seen (still assuming H0 is true).

22

Steps 0-5 for Significance Tests on Means

• Step 0: Pick a significance level (usually α = .05 unless you have a reason to use a different level)

• Step 1: Write down the hypotheses (both H0 & Ha)

• Step 2: Determine the sampling distribution if H0 is true. It will be Normal with mean equal to the claim in H0 and a standard error like the standard errors used in confidence intervals (sample SD/√n, from the CLT)

• Step 3: The data. Figure out what the sample mean is from your data

• Step 4: Find the p-value. It’s the area under the sampling distribution more extreme than the sample mean observation. Multiply this area by 2 if you have a 2-sided alternative.

• Step 5: Make a conclusion. If the p-value is smaller than α, reject H0. If the p-value is larger than α, fail to reject H0
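
The steps above can be sketched as a single Python function. This is a minimal illustration under the slides' Normal approximation; the function name, its parameters, and the numbers in the demo call are all hypothetical and not from the course materials.

```python
import math
from statistics import NormalDist


def mean_significance_test(sample_mean, sample_sd, n, mu0,
                           alternative="greater", alpha=0.05):
    """Steps 0-5 for a test about a mean, using the Normal approximation."""
    # Step 2: sampling distribution under H0 is Normal(mu0, sample_sd / sqrt(n)).
    se = sample_sd / math.sqrt(n)
    # Step 3: standardize the observed sample mean.
    z = (sample_mean - mu0) / se
    # Step 4: area more extreme than the observation.
    lower = NormalDist().cdf(z)
    upper = 1.0 - lower
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = lower
    elif alternative == "two-sided":
        p_value = 2.0 * min(lower, upper)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    # Step 5: compare with the significance level chosen in Step 0.
    return {"se": se, "z": z, "p_value": p_value, "reject_h0": p_value < alpha}


# Hypothetical numbers, for illustration only.
print(mean_significance_test(sample_mean=103, sample_sd=15, n=36, mu0=100))
```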

23

Your turn

A doctor claims that 17-year-olds have an average body temperature that is higher than the commonly accepted average human temperature of 98.6 degrees Fahrenheit. A simple random sample of 25 people, each of age 17, is selected. The average temperature of the 17-year-olds is found to be 98.83 degrees, with a standard deviation of 0.6 degrees.

The doctor hires you to perform a statistical significance test to check the validity of his claim.

Perform the significance test.

How would your work change if he instead suspected that 17-year-olds have a temperature different from 98.6 but wasn’t sure if they were hotter or colder?

24

Your turn

• Step 0: Significance level: α = 0.05

• Step 1: Hypotheses: H0 : µ = 98.6 vs. Ha : µ > 98.6

• Step 2: Sampling distribution: Normal with mean = 98.6 and SD = $0.6/\sqrt{25} = .12$

• Step 3: Data: Observation is 98.83, and we standardize it to $\frac{98.83 - 98.6}{.12} = 1.9$

• Step 4: P-value: From Table B, 1 − .9713 = .0287

• Step 5: Since .0287 < .05 (p-value < α), we reject the null hypothesis and claim that we have a significant test result at the .05 level. So, we conclude that we have enough evidence to reject the null hypothesis that 17-year-olds have a 98.6 degree average temperature in favor of the claim that they have a higher average body temperature.
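
The table lookup in Step 4 rounds the standard score to 1.9. As a rough check, the Python sketch below computes the p-value both ways: with the score rounded to one decimal (which matches the slide's .0287) and unrounded (which comes out slightly smaller). The variable names are assumptions for this example.

```python
import math
from statistics import NormalDist

mu0, xbar, s, n = 98.6, 98.83, 0.6, 25   # values from the problem statement

se = s / math.sqrt(n)            # SD of the sampling distribution under H0
z = (xbar - mu0) / se            # standard score of the sample mean

p_table = 1 - NormalDist().cdf(round(z, 1))   # score rounded to 1.9, as in a table lookup
p_exact = 1 - NormalDist().cdf(z)             # score left unrounded

print(f"SE = {se:.2f}, standard score = {z:.2f}")
print(f"p-value with rounded score:   {p_table:.4f}")
print(f"p-value with unrounded score: {p_exact:.4f}")
```

Either way the p-value is below .05, so the one-sided conclusion does not depend on the rounding.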

25

Your turn

[Figure: Sampling Distribution of Sample Mean. Horizontal axis: mean body temp (97 to 100). Labels: sampling distribution assuming the null hypothesis is true (µ = 98.6); sample mean from 17-year-olds (98.83); p-value (.0287).]

26

Your turn

And if we instead did the two-sided test,

• Step 0: Significance level: α = 0.05

• Step 1: Hypotheses: H0 : µ = 98.6, Ha : µ ≠ 98.6

• Step 2: Sampling distribution: Normal with mean = 98.6 and SD = $0.6/\sqrt{25} = .12$

• Step 3: Data: Observation is 98.83, and we standardize it to $\frac{98.83 - 98.6}{.12} = 1.9$

• Step 4: P-value: From Table B, 2 × (1 − .9713) = 2 × .0287 = .0574

• Step 5: Since .0574 > .05 (p-value > α), we now fail to reject the null hypothesis! So we don’t have enough evidence at the .05 significance level to suggest that 17-year-olds have a different average body temperature than the average human.
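
The flip between the one-sided and two-sided conclusions can be reproduced with a short Python sketch. It uses the unrounded standard score, so the p-values differ slightly from the Table B values, but they land on the same sides of α = .05. Variable names are assumptions for this example.

```python
import math
from statistics import NormalDist

alpha = 0.05
se = 0.6 / math.sqrt(25)          # SD of the sampling distribution under H0
z = (98.83 - 98.6) / se           # standard score of the sample mean

one_sided = 1 - NormalDist().cdf(z)             # Ha: mu > 98.6
two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # Ha: mu != 98.6

for label, p in (("one-sided", one_sided), ("two-sided", two_sided)):
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: p-value = {p:.4f} -> {verdict}")
```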

Your turn

[Figure: Sampling Distribution of Sample Mean. Horizontal axis: mean body temp (97 to 100). Labels: sampling distribution assuming the null hypothesis is true; sample mean from 17-year-olds; p-value.]

28

Homework

• The final HW is up on the website

• After today you can finish reading Ch. 22

• Do problems:

  22.24 (use significance level α = 5% = .05)
  22.27 (use significance level α = 1% = .01)
  22.28
  22.32
  22.34) A professor once claimed to Aaron that in a small discussion course, about 10% of the students fall asleep at some point during class. During Monday’s lecture, Aaron counted that 1 out of the 20 students in class was asleep at some point. Is this evidence that the true proportion is different from 10%? (use significance level α = 5% = .05, and note that you will need a two-sided alternative hypothesis)

29
