ANOVA (Analysis of Variance)

Martina Litschmannová, martina.litschmannova@vsb.cz, K210

Uploaded by ania on 24-Feb-2016.


Page 1: ANOVA ( Analysis of  Variance)

ANOVA (Analysis of Variance)

Martina Litschmannová

K210

Page 2: ANOVA ( Analysis of  Variance)

The basic ANOVA situation

Two variables: 1 Categorical, 1 Quantitative

Main question: Does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?

If the categorical variable has only 2 values, the null hypothesis is $H_0: \mu_1 = \mu_2$ (tested with the two-sample t test).

ANOVA allows for 3 or more groups

Page 3: ANOVA ( Analysis of  Variance)

An example ANOVA situation

Subjects: 25 patients with blisters
Treatments: Treatment A, Treatment B, Placebo
Measurement: number of days until the blisters heal

Data [and means]:
A: 5, 6, 6, 7, 7, 8, 9, 10 [7.25]
B: 7, 7, 8, 9, 9, 10, 10, 11 [8.875]
P: 7, 9, 9, 10, 10, 10, 11, 12, 13 [10.11]

Are these differences significant?
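The bracketed group means can be reproduced with a short Python sketch (standard library only; the dictionary name `groups` is illustrative, not from the slides). It also computes the grand mean over all 25 patients, which the ANOVA formulas compare each group mean against.

```python
from statistics import mean

# Days until blisters heal, per treatment group (data from the slide)
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

# Sample mean of each group
group_means = {name: mean(values) for name, values in groups.items()}

# Grand mean of all observations pooled together (weights groups by size)
all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

for name, m in group_means.items():
    print(f"{name}: n = {len(groups[name])}, mean = {m:.3f}")
print(f"all: n = {len(all_values)}, grand mean = {grand_mean:.3f}")
```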

Page 4: ANOVA ( Analysis of  Variance)

Informal Investigation

Graphical investigation:
• side-by-side box plots
• multiple histograms

Whether the differences between the groups are significant depends on:
• the difference in the means
• the standard deviations of each group
• the sample sizes

ANOVA determines the p-value from the F statistic.

Page 5: ANOVA ( Analysis of  Variance)

Side by Side Boxplots

[Figure: side-by-side box plots of days until healing (vertical axis, 5 to 13) for treatments P, B and A]

Page 6: ANOVA ( Analysis of  Variance)

What does ANOVA do?

At its simplest (there are extensions), ANOVA tests the following hypotheses:

H0: The means of all the groups are equal.

HA: Not all the means are equal.
• This doesn't say how or which ones differ.
• We can follow up with "multiple comparisons".

Note: We usually refer to the sub-populations as “groups” when doing ANOVA.

Page 7: ANOVA ( Analysis of  Variance)

Assumptions of ANOVA

• Each group is approximately normal.
  – Check this by looking at histograms and/or normal quantile plots, by relying on assumptions about the population, or with a test of normality.
  – ANOVA can handle some nonnormality, but not severe outliers.
• The standard deviations of the groups are approximately equal.
  – Rule of thumb: the ratio of the largest to the smallest sample standard deviation must be less than 2:1.
  – Alternatively, use a test of homoscedasticity.

Page 8: ANOVA ( Analysis of  Variance)

Normality Check

We should check for normality using:
• assumptions about the population
• histograms for each group
• a normal quantile plot for each group
• a test of normality (Shapiro-Wilk, Lilliefors, Anderson-Darling test, …)

With such small data sets there is no really good way to check normality from the data, so we rely on the common assumption that physical measurements of people tend to be normally distributed.

Page 9: ANOVA ( Analysis of  Variance)

Shapiro-Wilk test

One of the most powerful tests of normality [Shapiro, Wilk].

An online computer applet for this test is also available (Simon Ditto, 2009).
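If SciPy is available (an assumption; the slide itself points to an online applet), `scipy.stats.shapiro` runs the Shapiro-Wilk test. A minimal sketch on the blister data, testing each group separately at the 0.05 level:

```python
from scipy import stats

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

alpha = 0.05
results = {}
for name, values in groups.items():
    # H0: the group's data come from a normal distribution
    w, p = stats.shapiro(values)
    results[name] = (w, p)
    verdict = "normality not rejected" if p > alpha else "normality rejected"
    print(f"{name}: W = {w:.3f}, p = {p:.3f} ({verdict})")
```

With only 8 or 9 observations per group the test has little power, so a non-significant result is weak evidence of normality, which is the caveat raised on the previous slide.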

Page 10: ANOVA ( Analysis of  Variance)

Standard Deviation Check

Compare the largest and smallest standard deviations:
• largest: 1.764
• smallest: 1.458
• 1.764/1.458 = 1.210 < 2, OK

Note: a std. dev. ratio greater than 2 signals heteroscedasticity.

Variable  treatment  N    Mean  Median  StDev
days      A          8   7.250   7.000  1.669
          B          8   8.875   9.000  1.458
          P          9  10.111  10.000  1.764
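The rule-of-thumb check can be sketched with the standard library; `statistics.stdev` uses the n − 1 divisor, matching the StDev column above.

```python
from statistics import stdev

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

# Sample standard deviations (divisor n - 1)
sds = {name: stdev(values) for name, values in groups.items()}

# Rule of thumb: largest / smallest should be below 2
ratio = max(sds.values()) / min(sds.values())

for name, s in sds.items():
    print(f"{name}: s = {s:.3f}")
print(f"ratio = {ratio:.3f} ->", "OK" if ratio < 2 else "possible heteroscedasticity")
```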

Page 11: ANOVA ( Analysis of  Variance)

ANOVA Notation

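Number of individuals altogether: $n = \sum_{i=1}^{k} n_i$, sample means: $\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$, grand mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$, sample standard deviations: $s_i = \sqrt{\frac{1}{n_i - 1}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2}$

Group                   1                                  2                                  …   k
Sample                  $x_{11}, x_{12}, \dots, x_{1n_1}$  $x_{21}, x_{22}, \dots, x_{2n_2}$  …   $x_{k1}, x_{k2}, \dots, x_{kn_k}$
Sample Size             $n_1$                              $n_2$                              …   $n_k$
Sample average          $\bar{x}_1$                        $\bar{x}_2$                        …   $\bar{x}_k$
Sample Std. Deviation   $s_1$                              $s_2$                              …   $s_k$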

Page 12: ANOVA ( Analysis of  Variance)

Levene Test

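Null and alternative hypotheses: $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$, $H_A: \exists\, i, j: \sigma_i^2 \neq \sigma_j^2$

Test statistic:

$W = \frac{n-k}{k-1} \cdot \frac{\sum_{i=1}^{k} n_i(\bar{z}_i - \bar{z})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(z_{ij} - \bar{z}_i)^2}$,

where $z_{ij} = |x_{ij} - \bar{x}_i|$, $\bar{z}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} z_{ij}$, $\bar{z} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} z_{ij}$, $n = \sum_{i=1}^{k} n_i$, and k is the number of groups.

p-value: $p = 1 - F(W_{OBS})$, where F is the CDF of the Fisher-Snedecor distribution with $k-1$ and $n-k$ degrees of freedom.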
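The Levene statistic is a one-way ANOVA F statistic computed on the absolute deviations $z_{ij}$. A sketch that follows the formula step by step, assuming SciPy is available for the F distribution; the function name `levene_w` is hypothetical. On the blister data it should agree with `scipy.stats.levene(..., center="mean")`, SciPy's mean-centred version of this test.

```python
from statistics import mean
from scipy import stats

def levene_w(*samples):
    """Mean-centred Levene test; returns (W, p-value)."""
    k = len(samples)
    n_i = [len(s) for s in samples]
    n = sum(n_i)
    means = [mean(s) for s in samples]
    # z_ij = |x_ij - mean of group i|
    z = [[abs(x - m) for x in s] for s, m in zip(samples, means)]
    z_bar_i = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n
    between = sum(ni * (zb - z_bar) ** 2 for ni, zb in zip(n_i, z_bar_i))
    within = sum((zij - zb) ** 2 for zi, zb in zip(z, z_bar_i) for zij in zi)
    w = (n - k) / (k - 1) * between / within
    # p = 1 - F(W) for the Fisher-Snedecor distribution with (k-1, n-k) df
    p = stats.f.sf(w, k - 1, n - k)
    return w, p

a = [5, 6, 6, 7, 7, 8, 9, 10]
b = [7, 7, 8, 9, 9, 10, 10, 11]
placebo = [7, 9, 9, 10, 10, 10, 11, 12, 13]
w, p = levene_w(a, b, placebo)
print(f"W = {w:.3f}, p = {p:.3f}")
```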

Page 13: ANOVA ( Analysis of  Variance)

How ANOVA works (outline)

ANOVA measures two sources of variation in the data and compares their relative sizes.

Page 14: ANOVA ( Analysis of  Variance)

How ANOVA works (outline)

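Sum of squares between groups: $SS_B = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$, and the corresponding mean square between groups: $MS_B = \frac{SS_B}{df_B}$, where $df_B = k - 1$ is the number of degrees of freedom.

Sum of squares of errors (within groups): $SS_e = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 = \sum_{i=1}^{k}(n_i - 1)s_i^2$, and the corresponding mean square error: $MS_e = \frac{SS_e}{df_e}$, where $df_e = n - k$ is the number of degrees of freedom.

$SS_B$ reflects the difference between the means; $SS_e$ reflects the differences within the groups.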
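Applied to the blister data, both sums of squares and mean squares can be computed directly from these definitions. A standard-library Python sketch (variable names are illustrative); it also checks the identity $SS_T = SS_B + SS_e$.

```python
from statistics import mean

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}
values = [x for g in groups.values() for x in g]
n, k = len(values), len(groups)
grand = mean(values)
means = {name: mean(g) for name, g in groups.items()}

# Between groups: weighted squared distances of group means from the grand mean
ss_b = sum(len(g) * (means[name] - grand) ** 2 for name, g in groups.items())
# Within groups (error): squared deviations from each group's own mean
ss_e = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
# Total: squared deviations from the grand mean
ss_t = sum((x - grand) ** 2 for x in values)

df_b, df_e = k - 1, n - k
ms_b, ms_e = ss_b / df_b, ss_e / df_e

print(f"SS_B = {ss_b:.2f}, df = {df_b}, MS_B = {ms_b:.2f}")
print(f"SS_e = {ss_e:.2f}, df = {df_e}, MS_e = {ms_e:.2f}")
print(f"SS_T = {ss_t:.2f} (SS_B + SS_e = {ss_b + ss_e:.2f})")
```

These values correspond to the Between Groups and Within Groups rows of the ANOVA output that follows.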

Page 15: ANOVA ( Analysis of  Variance)

The ANOVA F statistic is the ratio of the between-group variation to the within-group variation:

$F = \frac{MS_B}{MS_e}$

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

Page 16: ANOVA ( Analysis of  Variance)

ANOVA Output

Source of Variation   SS      DF   MS      F      p-value
Between Groups        34.74   2    17.37   6.45   0.006
Within Groups         59.26   22   2.69
Total                 94.00   24
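Assuming SciPy, this table can be cross-checked with `scipy.stats.f_oneway`, which returns the F statistic and its p-value. The p-value is the upper tail of the Fisher-Snedecor distribution with 2 and 22 degrees of freedom, which `scipy.stats.f.sf` computes directly.

```python
from scipy import stats

a = [5, 6, 6, 7, 7, 8, 9, 10]
b = [7, 7, 8, 9, 9, 10, 10, 11]
placebo = [7, 9, 9, 10, 10, 10, 11, 12, 13]

# One-way ANOVA: F = MS_B / MS_e
result = stats.f_oneway(a, b, placebo)

# Same p-value from the F(k - 1, n - k) survival function, 1 - CDF
k = 3
n = len(a) + len(b) + len(placebo)
p_from_sf = stats.f.sf(result.statistic, k - 1, n - k)

print(f"F = {result.statistic:.2f}, p = {result.pvalue:.3f}, p (via f.sf) = {p_from_sf:.3f}")
```

It should reproduce F ≈ 6.45 and p ≈ 0.006 from the table.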

Page 17: ANOVA ( Analysis of  Variance)

How are these computations made?

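Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square             F                        p-value
Between Groups        $SS_B$           $k-1$                $MS_B = SS_B/(k-1)$     $x_{OBS} = MS_B/MS_e$    $1 - F(x_{OBS})$
Within Groups         $SS_e$           $n-k$                $MS_e = SS_e/(n-k)$     ---                      ---
Total                 $SS_T$           $n-1$                ---                     ---                      ---

where $SS_B = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$, $SS_e = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$, $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2 = SS_B + SS_e$, and F is the CDF of the Fisher-Snedecor distribution with $k-1$ and $n-k$ degrees of freedom.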

Page 18: ANOVA ( Analysis of  Variance)

Using the results of exploratory analysis and the ANOVA test, verify whether age has a statistically significant effect on BMI.

BMI by age group

Age group        Count   Average   Variance
under 35 years      53   25.0796    10.3825
35 to 50 years     123   25.9492    16.2775
over 50 years       76   26.0982    12.3393
Total              252   25.8113    13.8971

[Figure: box plots of BMI (vertical axis, 18 to 58) for the age groups under 35 years, 35 to 50 years and over 50 years]

Page 19: ANOVA ( Analysis of  Variance)

Assumptions:
1. Normality
2. Homoscedasticity: $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$, $H_A: \exists\, i, j: \sigma_i^2 \neq \sigma_j^2$ (Levene test)


Page 20: ANOVA ( Analysis of  Variance)

Null and alternative hypotheses:

$H_0: \mu_1 = \mu_2 = \mu_3$, $H_A: \exists\, i, j: \mu_i \neq \mu_j$

Calculating the p-value:


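$SS_B = \sum_{i=1}^{3} n_i(\bar{x}_i - \bar{x})^2 = 53(25.0796 - 25.8113)^2 + 123(25.9492 - 25.8113)^2 + 76(26.0982 - 25.8113)^2 \doteq 34.0$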
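When only a summary table like the one above is available, the sums of squares follow from the counts, averages and variances alone, because $SS_e = \sum_i (n_i - 1)s_i^2$. A standard-library Python sketch follows. Fed the rounded table values, it gives $SS_B \approx 37.0$, $SS_e \approx 3451.2$ and $SS_T = (n-1)s^2 \approx 3488.2$, somewhat different from the 34.0, 3451.9 and 3485.9 used on these slides. The cause is not visible in the slides; in both cases F is far below the 5 % critical value, so the conclusion is the same.

```python
# (count, average, sample variance) per age group, copied from the BMI table
summary = {
    "under 35 years": (53, 25.0796, 10.3825),
    "35 to 50 years": (123, 25.9492, 16.2775),
    "over 50 years": (76, 26.0982, 12.3393),
}
total_variance = 13.8971  # variance of all 252 BMI values, from the table

k = len(summary)
n = sum(count for count, _, _ in summary.values())
grand = sum(count * avg for count, avg, _ in summary.values()) / n

# SS_B from group sizes and means; SS_e from (n_i - 1) * s_i^2
ss_b = sum(count * (avg - grand) ** 2 for count, avg, _ in summary.values())
ss_e = sum((count - 1) * var for count, _, var in summary.values())
ss_t = (n - 1) * total_variance  # should be close to ss_b + ss_e

ms_b, ms_e = ss_b / (k - 1), ss_e / (n - k)
f_obs = ms_b / ms_e

print(f"grand mean = {grand:.4f}")
print(f"SS_B = {ss_b:.1f}, SS_e = {ss_e:.1f}, SS_T = {ss_t:.1f}")
print(f"MS_B = {ms_b:.2f}, MS_e = {ms_e:.2f}, F = {f_obs:.3f}")
```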





Page 24: ANOVA ( Analysis of  Variance)


Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F     p-value
Between Groups        34.0             k - 1 = 2
Within Groups         3451.9           n - k = 249                        ---   ---
Total                 3485.9           n - 1 = 251          ---           ---   ---

k … number of samples (groups), n … total number of observations


Page 25: ANOVA ( Analysis of  Variance)


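$MS_B = \frac{SS_B}{k-1} = \frac{34.0}{2} = 17.0$, $MS_e = \frac{SS_e}{n-k} = \frac{3451.9}{249} \doteq 13.86$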



Page 27: ANOVA ( Analysis of  Variance)


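$x_{OBS} = \frac{MS_B}{MS_e} = \frac{17.0}{13.86} \doteq 1.23$, $p = 1 - F(x_{OBS}) \doteq 0.295$,

where F(x) is the CDF of the Fisher-Snedecor distribution with 2 and 249 degrees of freedom.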
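For two numerator degrees of freedom (k − 1 = 2, as here) the Fisher-Snedecor CDF has the closed form $F(x) = 1 - (1 + 2x/d_2)^{-d_2/2}$, so the p-value can be checked without statistical tables. A sketch using the slide's sums of squares; for other numbers of groups a general routine such as SciPy's `scipy.stats.f.sf(x, d1, d2)` would be needed instead.

```python
# Sums of squares and degrees of freedom as given on the slides
ss_b, ss_e = 34.0, 3451.9
d1, d2 = 2, 249  # k - 1 and n - k for k = 3 groups, n = 252 people

ms_b = ss_b / d1
ms_e = ss_e / d2
x_obs = ms_b / ms_e

# Exact only for d1 == 2: 1 - F(x) = (1 + 2x / d2) ** (-d2 / 2)
assert d1 == 2, "closed form below holds only for 2 numerator df"
p_value = (1 + 2 * x_obs / d2) ** (-d2 / 2)

print(f"MS_B = {ms_b:.2f}, MS_e = {ms_e:.2f}, F = {x_obs:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "do not reject H0", "at the 0.05 level")
```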


Page 28: ANOVA ( Analysis of  Variance)


Result: We do not reject the null hypothesis at the 0.05 significance level. There is no statistically significant difference in mean BMI between the age groups.


Page 29: ANOVA ( Analysis of  Variance)

Where’s the Difference?

Analysis of Variance for days
Source    DF        SS        MS        F        P
treatmen   2     34.74     17.37     6.45    0.006
Error     22     59.26      2.69
Total     24     94.00
                                  Individual 95% CIs For Mean
                                  Based on Pooled StDev
Level      N      Mean     StDev  ----------+---------+---------+------
A          8     7.250     1.669  (-------*-------)
B          8     8.875     1.458             (-------*-------)
P          9    10.111     1.764                     (------*-------)
                                  ----------+---------+---------+------
Pooled StDev = 1.641                       7.5       9.0      10.5

Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

Clearest difference: P is worse than A (their CIs don't overlap).
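The "Individual 95% CIs" in the output above are $\bar{x}_i \pm t_{0.975}(n-k)\, s_p/\sqrt{n_i}$, where the pooled standard deviation $s_p = \sqrt{MS_e}$ is the 1.641 printed by Minitab. A sketch assuming SciPy for the t quantile:

```python
from math import sqrt
from statistics import mean
from scipy import stats

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}
n = sum(len(g) for g in groups.values())
k = len(groups)
means = {name: mean(g) for name, g in groups.items()}

# Pooled standard deviation = sqrt(MS_e), using every group's residuals
ss_e = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
pooled_sd = sqrt(ss_e / (n - k))

# Two-sided 95% interval uses the 0.975 quantile of t with n - k df
t_crit = stats.t.ppf(0.975, n - k)

cis = {}
for name, g in groups.items():
    half_width = t_crit * pooled_sd / sqrt(len(g))
    cis[name] = (means[name] - half_width, means[name] + half_width)
    print(f"{name}: {cis[name][0]:.2f} to {cis[name][1]:.2f}")
print(f"pooled StDev = {pooled_sd:.3f}")
```

Only the A and P intervals fail to overlap, which is the clearest difference noted above.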

Page 30: ANOVA ( Analysis of  Variance)

Multiple Comparisons

Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test.

We need to adjust our p-value threshold because we are doing multiple tests with the same data.

There are several methods for doing this.

If we really just want to test the difference between one pair of treatments, we should set the study up that way.

Page 31: ANOVA ( Analysis of  Variance)

Bonferroni method – post hoc analysis

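We reject the null hypothesis $H_0: \mu_i = \mu_j$ if

$|\bar{x}_i - \bar{x}_j| > t_{1-\alpha^*/2}(n-k) \cdot \sqrt{MS_e\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$,

where $\alpha^*$ is the corrected significance level, $\alpha^* = \alpha/m$, $m = \binom{k}{2}$ is the number of pairwise comparisons, and $t_{1-\alpha^*/2}(n-k)$ is the $(1-\alpha^*/2)$-quantile of Student's t distribution with $n-k$ degrees of freedom.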
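A sketch of this Bonferroni rule on the blister data, assuming SciPy for the Student t quantile; $MS_e$ is the within-group mean square from the ANOVA table and $m = 3$ pairs are compared.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from scipy import stats

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}
alpha = 0.05
k = len(groups)
n = sum(len(g) for g in groups.values())
means = {name: mean(g) for name, g in groups.items()}

# Within-group mean square MS_e from the one-way ANOVA
ms_e = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g) / (n - k)

m = k * (k - 1) // 2    # number of pairwise comparisons
alpha_star = alpha / m  # Bonferroni-corrected significance level
t_crit = stats.t.ppf(1 - alpha_star / 2, n - k)

significant = []
for (name_i, g_i), (name_j, g_j) in combinations(groups.items(), 2):
    diff = abs(means[name_i] - means[name_j])
    threshold = t_crit * sqrt(ms_e * (1 / len(g_i) + 1 / len(g_j)))
    verdict = "reject H0" if diff > threshold else "do not reject H0"
    print(f"{name_i} vs {name_j}: |diff| = {diff:.3f}, threshold = {threshold:.3f} -> {verdict}")
    if diff > threshold:
        significant.append((name_i, name_j))
```

With these data only the A vs P difference exceeds its threshold, consistent with P being clearly worse than A.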

Page 32: ANOVA ( Analysis of  Variance)

Kruskal-Wallis test

The Kruskal–Wallis test is most commonly used when there is one nominal variable and one measurement variable, and the measurement variable does not meet the normality assumption of an ANOVA.

It is the non-parametric analogue of a one-way ANOVA.

Page 33: ANOVA ( Analysis of  Variance)

Kruskal-Wallis test

Like most non-parametric tests, it is performed on ranked data: the measurement observations are converted to their ranks in the overall data set, so the smallest value gets a rank of 1, the next smallest a rank of 2, and so on. Substituting ranks for the original values loses information, which can make this test less powerful than ANOVA, so ANOVA should be used if the data meet its assumptions.

If the original data set actually consists of one nominal variable and one ranked variable, you cannot do an ANOVA and must use the Kruskal–Wallis test.
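The ranking step and the usual Kruskal-Wallis statistic $H = \frac{12}{n(n+1)}\sum_i \frac{R_i^2}{n_i} - 3(n+1)$, with the standard correction for ties, can be sketched in Python. The blister data are reused purely as an illustration, and the p-value relies on the common chi-square approximation with k − 1 degrees of freedom; SciPy is assumed for that tail probability, and `scipy.stats.kruskal` should return the same H.

```python
from collections import Counter
from scipy import stats

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # 0-based positions -> ranks
        start = end + 1
    return ranks

samples = list(groups.values())
pooled = [x for s in samples for x in s]
ranks = average_ranks(pooled)
n = len(pooled)

# Sum the ranks belonging to each group (pooled keeps the group order)
rank_sums, offset = [], 0
for s in samples:
    rank_sums.append(sum(ranks[offset:offset + len(s)]))
    offset += len(s)

h = 12 / (n * (n + 1)) * sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples)) - 3 * (n + 1)
# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over runs of tied values
ties = Counter(pooled).values()
h /= 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
p_value = stats.chi2.sf(h, len(samples) - 1)

print(f"rank sums = {rank_sums}, H = {h:.3f}, p = {p_value:.4f}")
```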

Page 34: ANOVA ( Analysis of  Variance)

1. A farm bred three breeds of rabbits. An experiment was carried out (rabbits.xls) to determine whether there is a statistically significant (conclusive) difference in weight between the breeds. Verify.

http://vassarstats.net/anova1u.html

Page 35: ANOVA ( Analysis of  Variance)

2. The effects of three drugs on blood clotting were studied. Among other indicators, the thrombin time was measured. Data on the 45 monitored patients are recorded in the file thrombin.xls. Does the thrombin time depend on the preparation used?

http://vassarstats.net/kw3.html

Page 36: ANOVA ( Analysis of  Variance)

Study materials:

http://homel.vsb.cz/~bri10/Teaching/Bris%20Prob%20&%20Stat.pdf (pp. 142-154)

Shapiro, S.S., Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika. 1965, vol. 52, no. 3/4, pp. 591-611. Available from: http://www.jstor.org/stable/2333709.