unit 4: inference for numerical data - 2. anova · outline 1. housekeeping 2. main ideas 1....

27
Unit 4: Inference for numerical data 2. ANOVA GOVT 3990 - Spring 2020 Cornell University

Upload: others

Post on 11-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Unit 4: Inference for numerical data2. ANOVA

GOVT 3990 - Spring 2020

Cornell University

Dr. Garcia-Rios Slides posted at http://garciarios.github.io/govt_3990/

Page 2: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care

2. ANOVA tests for some difference in means of manydifferent groups

3. ANOVA compares between group variation to within groupvariation

4. To identify which means are different, use t-tests and theBonferroni correction

3. Summary

Page 3: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Announcements

I SlackI Class zoom link

1

Page 4: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care2. ANOVA tests for some difference in means of many

different groups3. ANOVA compares between group variation to within group

variation4. To identify which means are different, use t-tests and the

Bonferroni correction

3. Summary

Page 5: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care2. ANOVA tests for some difference in means of many

different groups

3. ANOVA compares between group variation to within groupvariation

4. To identify which means are different, use t-tests and theBonferroni correction

3. Summary

Page 6: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

NEWS FLASH!Jelly beans rumored to cause acne!!!

How would you check this rumor? Imagine that doctors canassign an “acne score” to patients on a 0-100 scale.

I What would your research question be?I How would you conduct your study?I What statistical test would you use?

Use an independent samples t-test:H0 : µjelly beans − µplacebo = 0HA : µjelly beans − µplacebo ̸= 0

2

Page 7: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

NEWS FLASH!Jelly beans rumored to cause acne!!!

How would you check this rumor? Imagine that doctors canassign an “acne score” to patients on a 0-100 scale.

I What would your research question be?I How would you conduct your study?I What statistical test would you use?

Use an independent samples t-test:H0 : µjelly beans − µplacebo = 0HA : µjelly beans − µplacebo ̸= 0

2

Page 8: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

NEWS FLASH!Jelly beans rumored to cause acne!!!

How would you check this rumor? Imagine that doctors canassign an “acne score” to patients on a 0-100 scale.

I What would your research question be?I How would you conduct your study?I What statistical test would you use?

Use an independent samples t-test:H0 : µjelly beans − µplacebo = 0HA : µjelly beans − µplacebo ̸= 0

2

Page 9: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

NEWS FLASH!Jelly beans rumored to cause acne!!!

How would you check this rumor? Imagine that doctors canassign an “acne score” to patients on a 0-100 scale.

I What would your research question be?I How would you conduct your study?I What statistical test would you use?

Use an independent samples t-test:H0 : µjelly beans − µplacebo = 0HA : µjelly beans − µplacebo ̸= 0

2

Page 10: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Your turnSuppose α = 0.05. What is the probability of making a Type1 error and rejecting a null hypothesis like

H0 : µpurple jelly bean − µplacebo = 0

when it is actually true?

(a) 1%(b) 5%(c) 36%(d) 64%(e) 95%

3

Page 11: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Your turnSuppose α = 0.05. What is the probability of making a Type1 error and rejecting a null hypothesis like

H0 : µpurple jelly bean − µplacebo = 0

when it is actually true?

(a) 1%(b) 5%(c) 36%(d) 64%(e) 95%

3

Page 12: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Your turnSuppose we want to test 20 different colors of jelly beans versus aplacebo with hypotheses like

H0 : µpurple jelly bean − µplacebo = 0H0 : µbrown jelly bean − µplacebo = 0H0 : µpeach jelly bean − µplacebo = 0...

and we use α = 0.05 for each of these tests. What is the probability ofmaking at least one Type 1 error in these 20 independent tests?

(a) 1%(b) 5%(c) 36%(d) 64%

→ 1 − (1 − 0.05)20

(e) 95% 4

Page 13: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Your turnSuppose we want to test 20 different colors of jelly beans versus aplacebo with hypotheses like

H0 : µpurple jelly bean − µplacebo = 0H0 : µbrown jelly bean − µplacebo = 0H0 : µpeach jelly bean − µplacebo = 0...

and we use α = 0.05 for each of these tests. What is the probability ofmaking at least one Type 1 error in these 20 independent tests?

(a) 1%(b) 5%(c) 36%(d) 64% → 1 − (1 − 0.05)20

(e) 95% 4

Page 14: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care

2. ANOVA tests for some difference in means of manydifferent groups

3. ANOVA compares between group variation to within groupvariation

4. To identify which means are different, use t-tests and theBonferroni correction

3. Summary

Page 15: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

ANOVA tests for some difference in means of many different groups

Null hypothesis:

H0 : µplacebo = µpurple = µbrown = . . . = µpeach = µorange.

Your turnWhich of the following is a correct statement of the alternativehypothesis?

(a) For any two groups, including the placebo group, no two groupmeans are the same.

(b) For any two groups, not including the placebo group, no two groupmeans are the same.

(c) Among the jelly bean groups, there are at least two groups thathave different group means from each other.

(d) Amongst all groups, there are at least two groups that havedifferent group means from each other. 5

Page 16: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

ANOVA tests for some difference in means of many different groups

Null hypothesis:

H0 : µplacebo = µpurple = µbrown = . . . = µpeach = µorange.

Your turnWhich of the following is a correct statement of the alternativehypothesis?

(a) For any two groups, including the placebo group, no two groupmeans are the same.

(b) For any two groups, not including the placebo group, no two groupmeans are the same.

(c) Among the jelly bean groups, there are at least two groups thathave different group means from each other.

(d) Amongst all groups, there are at least two groups that havedifferent group means from each other. 5

Page 17: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care

2. ANOVA tests for some difference in means of manydifferent groups

3. ANOVA compares between group variation to within groupvariation

4. To identify which means are different, use t-tests and theBonferroni correction

3. Summary

Page 18: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

ANOVA compares between group variation to within group variation

∑|2/∑

|2

6

Page 19: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Relatively large WITHIN group variation: little apparent difference

∑|2/∑

|2

7

Page 20: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Relatively large BETWEEN group variation: there may be a differ-ence

∑|2/∑

|2

8

Page 21: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

For historical reasons, we use a modification of this ratio calledthe F-statistic:

F =

∑|2 / (k − 1)∑|2 / (n − k) =

MSGMSE

k: # of groups; n: # of obs.

Df Sum Sq Mean Sq F value Pr(>F)Between groups k-1

∑|2 MSG Fobs pobs

Within groups n-k∑

|2 MSETotal n-1

∑(|+ |)2

9

Page 22: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

For historical reasons, we use a modification of this ratio calledthe F-statistic:

F =

∑|2 / (k − 1)∑|2 / (n − k) =

MSGMSE

k: # of groups; n: # of obs.

Df Sum Sq Mean Sq F value Pr(>F)Between groups k-1

∑|2 MSG Fobs pobs

Within groups n-k∑

|2 MSETotal n-1

∑(|+ |)2

9

Page 23: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care

2. ANOVA tests for some difference in means of manydifferent groups

3. ANOVA compares between group variation to within groupvariation

4. To identify which means are different, use t-tests and theBonferroni correction

3. Summary

Page 24: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

To identify which means are different, use t-tests and the Bonferronicorrection

I If the ANOVA yields a significant results, next naturalquestion is: “Which means are different?”

I Use t-tests comparing each pair of means to each other,– with a common variance (MSE from the ANOVA table)

instead of each group’s variances in the calculation of thestandard error,

– and with a common degrees of freedom (dfE from theANOVA table)

I Compare resulting p-values to a modified significance level

α⋆ =α

Kwhere K is the total number of pairwise tests

10

Page 25: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Application exercise: 4.4 ANOVA

11

Page 26: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Outline

1. Housekeeping

2. Main ideas

1. Comparing many means requires care

2. ANOVA tests for some difference in means of manydifferent groups

3. ANOVA compares between group variation to within groupvariation

4. To identify which means are different, use t-tests and theBonferroni correction

3. Summary

Page 27: Unit 4: Inference for numerical data - 2. ANOVA · Outline 1. Housekeeping 2. Main ideas 1. Comparing many means requires care 2. ANOVA tests for some difference in means of many

Summary of main ideas

1. ??2. ??3. ??4. ??

12