ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Upload: leonardo-chamberlin

Post on 30-Mar-2015


TRANSCRIPT

Page 1: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

ANOVA: A Test of Analysis of Variance

By Harry Lee and Manik Kuchroo

Page 2: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

What is the ANOVA Test?

• Remember the 2-Mean T-Test?

• For example: A car salesman wants to find the difference between two types of cars in terms of mileage:

• Mid-Size Vehicles

• Sports Utility Vehicles

Page 3: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Car Salesman’s Sample

The salesman took an independent SRS from each population of vehicles:

Level      n    Mean         StDev
Mid-size   28   27.101 mpg   2.629 mpg
SUV        26   20.423 mpg   2.914 mpg

If a 2-Mean TTest were done on this data:

T = 8.15 P-value = ~0
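As a rough sketch of how a two-sample t statistic can be computed from a summary table like the one above, the snippet below uses the unpooled (unequal-variance) standard error. The slide does not say which variant of the test was run, so that choice and the function name `two_sample_t` are assumptions. Because it works from the rounded table values rather than the raw data, its result need not match the slide's T exactly.

```python
import math

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unpooled two-sample t statistic from summary statistics (assumed variant)."""
    # Standard error of the difference between the two sample means
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Mid-size vs. SUV, using the rounded values from the table
t_stat = two_sample_t(28, 27.101, 2.629, 26, 20.423, 2.914)
print(round(t_stat, 2))
```

Either way, the statistic is large enough that the P-value is essentially 0.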

Page 4: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

What if the salesman wanted to compare another type of vehicle, pickup trucks, in addition to the SUVs and mid-size vehicles?

Level      n    Mean         StDev
Midsize    28   27.101 mpg   2.629 mpg
SUV        26   20.423 mpg   2.914 mpg
Pickup     8    23.125 mpg   2.588 mpg

Page 5: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

This is an example of when we would use the ANOVA Test.

In a 2-Mean TTest, we see if the difference between the 2 sample means is significant.

The ANOVA test is used to compare more than two means and see if the differences among the sample means are significant.

Page 6: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Let’s Compare the Means…

Do these sample means look significantly different from each other?

Yes, we see that no two of these confidence intervals overlap; therefore, the means are significantly different.

This is the question that the ANOVA test answers mathematically.

Page 7: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

More Confidence Intervals

What if the confidence intervals were different? Would the means still be significantly different?

[Figure: two sets of confidence intervals, one labeled "Significant" and one labeled "Not Significant"]

Page 8: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

ANOVA Test Hypotheses

H0: µ1 = µ2 = µ3 (All of the means are equal)

HA: Not all of the means are equal

For Our Example:

H0: µMid-size = µSUV = µPickup

The mean mileages of Mid-size vehicles, Sports Utility Vehicles, and Pickup trucks are all equal.

HA: Not all of the mean mileages of Mid-size vehicles, Sports Utility Vehicles, and Pickup trucks are equal.

Page 9: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

F Statistic

• Like any other test, the ANOVA test has its own test statistic

• The statistic for ANOVA is called the F statistic, which we get from the F Test

• The F statistic takes into consideration:
– number of samples taken (I)
– sample size of each sample (n1, n2, …, nI)
– means of the samples (x̄1, x̄2, …, x̄I)
– standard deviations of each sample (s1, s2, …, sI)

Page 10: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Explaining the F-Statistic

• The F statistic determines if the variation between sample means is significant

This is what we are doing when we look at the 95% confidence intervals.

$$F = \frac{\text{Variation Among Sample Means}}{\text{Variation Among Individuals In Each Sample}}$$

Page 11: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Another Look at the CI’s

From this picture, we can see that the variation between sample means is greater than the variation in each sample; therefore, F is large.

Page 12: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

F Statistic Equation

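Rewritten as a formula, the F Statistic looks like this:

$$F = \frac{\left[\, n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 + \dots + n_I(\bar{x}_I-\bar{x})^2 \,\right] / (I-1)}{\left[\, (n_1-1)s_1^2 + (n_2-1)s_2^2 + \dots + (n_I-1)s_I^2 \,\right] / (N-I)}$$

Numerator: Weighing Means (Squared)

Denominator: Weighing Standard Deviations (Squared)

Here x̄ is the mean of all the observations combined, and N is the total number sampled.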
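A minimal sketch of the formula above, working only from each sample's size, mean, and standard deviation. The function name `anova_f` and its argument names are illustrative, not from the slides; the inputs are the rounded values from the salesman's three-vehicle table.

```python
def anova_f(ns, means, sds):
    """One-way ANOVA F statistic from per-sample summary statistics."""
    total_n = sum(ns)   # N: total number sampled
    groups = len(ns)    # I: number of samples
    # Overall mean of all observations, weighted by sample size
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Numerator: variation among sample means, divided by I - 1
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means)) / (groups - 1)
    # Denominator: variation among individuals in each sample, divided by N - I
    within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (total_n - groups)
    return between / within

# Midsize, SUV, Pickup
f_value = anova_f([28, 26, 8], [27.101, 20.423, 23.125], [2.629, 2.914, 2.588])
print(round(f_value, 2))  # 40.05
```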

Page 13: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

The F Statistic

Page 14: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Degrees of Freedom

• The ANOVA test has 2 degrees of freedom:
– I-1 (Number of Groups – 1), for the numerator
– N-I (Total number sampled – Number of Groups), for the denominator

• Some sample distributions with different degrees of freedom:

Page 15: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

How About Our Example:

Data:

Level      n    Mean         StDev
Midsize    28   27.101 mpg   2.629 mpg
SUV        26   20.423 mpg   2.914 mpg
Pickup     8    23.125 mpg   2.588 mpg

F value = 40.05

P-value = ~0 (Found from a table or using the Fcdf calculator command).
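For readers without a table or calculator, here is one way the upper-tail area could be approximated in plain Python. This is only a sketch: it relies on the standard identity between the F distribution's upper tail and the regularized incomplete beta function, evaluated with Simpson's rule. The function name `f_upper_tail`, the step count, and the restriction to a denominator df of at least 2 are assumptions made for the illustration. The degrees of freedom for this example are I-1 = 2 and N-I = 59.

```python
import math

def f_upper_tail(f, df_num, df_den, steps=2000):
    """Approximate P(F > f) for an F(df_num, df_den) distribution.

    Uses P(F > f) = I_x(df_den/2, df_num/2) with
    x = df_den / (df_den + df_num * f), where I_x is the regularized
    incomplete beta function, integrated here by Simpson's rule.
    Assumes f > 0, df_den >= 2, and an even number of steps.
    """
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f)
    # log of 1 / Beta(a, b), the normalizing constant
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def density(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return math.exp(log_norm) * total * h / 3.0

p_value = f_upper_tail(40.05, 2, 59)
print(p_value)  # a very small number, far below 0.001
```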

Page 16: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Conditions

As useful as the ANOVA test is, we can only use it if a number of conditions are met:

• We must take an independent SRS from each population that we sample

• All populations have the same standard deviation. (Rule of thumb: no sample's standard deviation is more than double another's)

• All of the populations must be normally distributed
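The standard deviation condition above can be checked mechanically from the sample standard deviations. A small sketch, where the function name `spreads_comparable` and the `max_ratio` parameter are illustrative names and the values are the three vehicle samples' standard deviations:

```python
def spreads_comparable(sds, max_ratio=2.0):
    """True if the largest standard deviation is at most max_ratio times the smallest."""
    return max(sds) / min(sds) <= max_ratio

# Midsize, SUV, Pickup standard deviations in mpg
car_sds = [2.629, 2.914, 2.588]
print(spreads_comparable(car_sds))  # True
```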

Page 17: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Testing the Conditions

• The salesman had originally taken independent SRS’s.

• The second condition is fulfilled since no sample has more than twice the standard deviation of any other.

• To test the third condition, whether the populations being sampled are normally shaped, we must look at the histograms of each sample:

Page 18: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Sample Histograms

[Figure: three histograms, one each for the Midsize, SUV, and Pickup samples; horizontal axis from 16 to 36, vertical axis from 2 to 16]

All of the histograms appear to be relatively normally shaped.

Page 19: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Try a Problem

• Researchers are trying to see if the English AP scores from four different Massachusetts private schools are different. From each school, a random sample of students in the past year was taken and compared. Here are the results from the samples:

Page 20: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Results

School          n    Mean   StDev
BB&N            23   4.3    0.4
Roxbury Latin   25   3.9    0.6
Winsor          26   4.2    0.3
Belmont Hill    29   3.1    0.3

Is there any significant difference between these schools’ AP English scores? (Assume that the populations are normally distributed)

Page 21: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Hypotheses

• H0: µBB&N = µRL = µWinsor = µBelHill

The mean AP English Test scores in BB&N, Roxbury Latin, Winsor, and Belmont Hill are all the same.

• HA: The mean AP English Test scores in BB&N, Roxbury Latin, Winsor, and Belmont Hill are not all the same.

Page 22: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Conditions

• Random samples taken

• All of the standard deviations are the same
– No standard deviation is more than twice any other.

• All of the populations are normally distributed

Page 23: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Doing out the F Statistic
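The slide's own working is not preserved in this transcript. As a sketch, the arithmetic can be redone from the rounded values in the results table, with I = 4 schools and N = 23 + 25 + 26 + 29 = 103 students, so the figures below are approximate.

```latex
\bar{x} = \frac{23(4.3) + 25(3.9) + 26(4.2) + 29(3.1)}{103} = \frac{395.5}{103} \approx 3.840

\text{Numerator} = \frac{23(4.3-3.840)^2 + 25(3.9-3.840)^2 + 26(4.2-3.840)^2 + 29(3.1-3.840)^2}{4-1}
                 \approx \frac{24.21}{3} \approx 8.069

\text{Denominator} = \frac{22(0.4)^2 + 24(0.6)^2 + 25(0.3)^2 + 28(0.3)^2}{103-4}
                   = \frac{16.93}{99} \approx 0.1710

F \approx \frac{8.069}{0.1710} \approx 47.2
```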

Page 24: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

F Curve

• Plug the F statistic into the F distribution (df = 3, 99). The shaded area to the right of the F statistic is the p-value, which is nearly 0.

Page 25: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Interpretation

Since all the conditions were met, we have very strong evidence (df = 3, 99; p ≈ 0) to reject the null hypothesis that the mean AP English Test scores in BB&N, Roxbury Latin, Winsor, and Belmont Hill are all the same.

Page 26: ANOVA: A Test of Analysis of Variance By Harry Lee and Manik Kuchroo

Thanks For Watching

• A special thanks to Mr. Coons for all the help and advice.