Analysis of variance Petter Mostad 2005.11.07



Analysis of variance 

Petter Mostad

2005.11.07


Comparing more than two groups

• Up to now we have studied situations with
  – One observation per object
    • One group
    • Two groups
  – Two or more observations per object

• We will now study situations with one observation per object, and three or more groups of objects

• The most important question is as usual: Do the numbers in the groups come from the same population, or from different populations?


ANOVA

• If you have three groups, you could plausibly do pairwise comparisons. But if you have 10 groups? That gives 45 pairwise comparisons: You would get too many false positives!

• You would really like to test a null hypothesis that all groups are equal against the alternative that there is some difference

• ANOVA: ANalysis Of VAriance
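
To make the false-positive problem concrete, the sketch below counts the pairwise comparisons among a number of groups and estimates the chance of at least one false positive when each comparison is tested at level 0.05. It assumes, purely for illustration, that the pairwise tests are independent, which is only an approximation since the comparisons share groups. The function name `familywise_error` is ours, not from the slides.

```python
from math import comb


def familywise_error(num_groups, alpha=0.05):
    """Return (number of pairwise comparisons, P(at least one false positive)).

    Assumes the pairwise tests are independent, which is only approximately
    true here because different comparisons reuse the same groups.
    """
    num_pairs = comb(num_groups, 2)
    return num_pairs, 1 - (1 - alpha) ** num_pairs


for k in (3, 10):
    pairs, error_rate = familywise_error(k)
    print(f"{k} groups: {pairs} comparisons, "
          f"P(at least one false positive) is about {error_rate:.2f}")
```

Under this simplification the chance of a spurious finding grows quickly with the number of groups, which is the motivation for a single overall test.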


One-way ANOVA: Example

• Assume ”treatment results” from 13 patients visiting one of three doctors are given:
  – Doctor A: 24, 26, 31, 27
  – Doctor B: 29, 31, 30, 36, 33
  – Doctor C: 29, 27, 34, 26

• H0: The treatment results are from the same population of results

• H1: They are from different populations


Comparing the groups

• Averages within groups:
  – Doctor A: 27
  – Doctor B: 31.8
  – Doctor C: 29

• Total average:

$$\bar{x} = \frac{4 \cdot 27 + 5 \cdot 31.8 + 4 \cdot 29}{4 + 5 + 4} = 29.46$$

• Variance around the mean matters for comparison.

• We must compare the variance within the groups to the variance between the group means.


Variance within and between groups

• Sum of squares within groups:

$$SSW = (24-27)^2 + (26-27)^2 + \dots + (29-31.8)^2 + \dots = 94.8$$

• Compare it with sum of squares between groups:

$$SSG = (27-29.46)^2 + (27-29.46)^2 + \dots + (31.8-29.46)^2 + \dots$$

$$\phantom{SSG} = 4(27-29.46)^2 + 5(31.8-29.46)^2 + 4(29-29.46)^2 = 52.43$$

• Comparing these, we also need to take into account the number of observations and sizes of groups
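
The two sums of squares can be checked with a few lines of plain Python. This is a minimal sketch using the doctor data from the example; the variable names are our own.

```python
# Treatment results from the example, one list per doctor.
groups = {
    "A": [24, 26, 31, 27],
    "B": [29, 31, 30, 36, 33],
    "C": [29, 27, 34, 26],
}


def mean(values):
    return sum(values) / len(values)


all_results = [x for results in groups.values() for x in results]
total_mean = mean(all_results)
group_means = {doctor: mean(results) for doctor, results in groups.items()}

# Within groups: squared distance of each observation from its own group mean.
ssw = sum((x - group_means[doctor]) ** 2
          for doctor, results in groups.items() for x in results)

# Between groups: squared distance of each group mean from the total mean,
# counted once per observation in that group.
ssg = sum(len(results) * (group_means[doctor] - total_mean) ** 2
          for doctor, results in groups.items())

print(group_means, round(total_mean, 2))
print(round(ssw, 2), round(ssg, 2))
```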


Adjusting for group sizes

• Divide by the number of degrees of freedom:

$$MSW = \frac{SSW}{n-K}, \qquad MSG = \frac{SSG}{K-1}$$

  – n: number of observations
  – K: number of groups
  – Both are estimates of the population variance of the error under H0

• Test statistic: reject H0 if this is large:

$$\frac{MSG}{MSW}$$


Test statistic thresholds

• If populations are normal, with the same variance, then we can show that under the null hypothesis,

$$\frac{MSG}{MSW} \sim F_{K-1,\,n-K}$$

the F distribution, with K-1 and n-K degrees of freedom.

• Reject at significance level $\alpha$ if

$$\frac{MSG}{MSW} > F_{K-1,\,n-K,\,\alpha}$$

Find this value in a table.


Continuing example

$$MSW = \frac{SSW}{n-K} = \frac{94.8}{13-3} = 9.48$$

$$MSG = \frac{SSG}{K-1} = \frac{52.43}{3-1} = 26.2$$

$$\frac{MSG}{MSW} = \frac{26.2}{9.48} = 2.76, \qquad F_{3-1,\,13-3,\,0.05} = 4.10$$

• Thus we can NOT reject the null hypothesis in our case.
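
The comparison against the table value can be reproduced without a statistics library in this particular case. The sketch below relies on a closed form for the F distribution that holds only when the numerator has exactly 2 degrees of freedom (that is, K = 3 groups). For other values of K a table or a statistics package is needed. The helper names are ours.

```python
def f_upper_tail_2df(x, df2):
    """P(F > x) for an F distribution with 2 and df2 degrees of freedom.

    Closed form valid ONLY for a numerator with exactly 2 degrees of
    freedom; this is not a general F distribution routine.
    """
    return (1 + 2 * x / df2) ** (-df2 / 2)


def f_critical_2df(alpha, df2):
    """Value c with P(F > c) = alpha, found by inverting the formula above."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


ssw, ssg = 94.8, 52.43  # sums of squares from the example
n, k = 13, 3            # observations and groups

msw = ssw / (n - k)
msg = ssg / (k - 1)
f_ratio = msg / msw
critical = f_critical_2df(0.05, n - k)
p_value = f_upper_tail_2df(f_ratio, n - k)

print(f"F = {f_ratio:.2f}, critical value = {critical:.2f}, p = {p_value:.3f}")
print("reject H0" if f_ratio > critical else "cannot reject H0")
```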


ANOVA table

| Source of variation | Sum of squares | Deg. of freedom | Mean squares | F ratio |
|---|---|---|---|---|
| Between groups | SSG | K-1 | MSG | MSG/MSW |
| Within groups | SSW | n-K | MSW | |
| Total | SST | n-1 | | |

$$SST = (24-29.46)^2 + (26-29.46)^2 + \dots + (26-29.46)^2$$

NOTE: $SSG + SSW = SST$
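
As a sketch of how the table is filled in, the function below builds each row from raw group data and lets one check that SSG + SSW = SST numerically. The function name and the dictionary layout are illustrative choices of ours.

```python
def one_way_anova_table(groups):
    """Return the rows of a one-way ANOVA table as a dict of dicts."""
    all_obs = [x for xs in groups for x in xs]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    ssg = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2 for xs in groups)
    ssw = sum((x - sum(xs) / len(xs)) ** 2 for xs in groups for x in xs)
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    msg, msw = ssg / (k - 1), ssw / (n - k)
    return {
        "between": {"ss": ssg, "df": k - 1, "ms": msg, "f": msg / msw},
        "within": {"ss": ssw, "df": n - k, "ms": msw},
        "total": {"ss": sst, "df": n - 1},
    }


table = one_way_anova_table(
    [[24, 26, 31, 27], [29, 31, 30, 36, 33], [29, 27, 34, 26]]
)
for source, row in table.items():
    print(source, {key: round(value, 3) for key, value in row.items()})
```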


One-way ANOVA in SPSS

• Use ”Analyze => Compare Means => One-way ANOVA”

• ANOVA output for VAR00001:

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 52.431 | 2 | 26.215 | 2.765 | .111 |
| Within Groups | 94.800 | 10 | 9.480 | | |
| Total | 147.231 | 12 | | | |

• Last column: The p-value: The smallest value of $\alpha$ at which the null hypothesis is rejected.


The Kruskal-Wallis test

• ANOVA is based on the assumption of normality

• There is a non-parametric alternative not relying on this assumption:
  – Looking at all observations together, rank them
  – Let R1, R2, …, RK be the sums of ranks of each group
  – If some R’s are much larger than others, it indicates the numbers in different groups come from different populations


The Kruskal-Wallis test

• The test statistic is

$$W = \frac{12}{n(n+1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(n+1)$$

• Under the null hypothesis, this has an approximate $\chi^2_{K-1}$ distribution.

• The approximation is OK when each group contains at least 5 observations.


Example: previous data

| Doctor A | Doctor B | Doctor C |
|---|---|---|
| 24 (rank 1) | 29 (rank 6.5) | 29 (rank 6.5) |
| 26 (rank 2.5) | 31 (rank 9.5) | 27 (rank 4.5) |
| 31 (rank 9.5) | 30 (rank 8) | 34 (rank 12) |
| 27 (rank 4.5) | 36 (rank 13) | 26 (rank 2.5) |
| | 33 (rank 11) | |
| R1=17.5 | R2=48 | R3=25.5 |

(We really have too few observations for this test!)
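
The ranking step and the statistic W can be sketched in plain Python as follows. Tied values receive the average of the ranks they occupy, as in the table above. The function names are ours, and the statistic is computed without any adjustment for ties.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = average_rank
        pos = end + 1
    return ranks


def kruskal_wallis_w(groups):
    """Return (rank sums, W), with W computed without a tie adjustment."""
    pooled = [x for xs in groups for x in xs]
    ranks = midranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for xs in groups:
        rank_sums.append(sum(ranks[start:start + len(xs)]))
        start += len(xs)
    weighted = sum(r ** 2 / len(xs) for r, xs in zip(rank_sums, groups))
    w = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return rank_sums, w


rank_sums, w = kruskal_wallis_w(
    [[24, 26, 31, 27], [29, 31, 30, 36, 33], [29, 27, 34, 26]]
)
print(rank_sums, round(w, 3))
```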


Kruskal-Wallis in SPSS

• Use ”Analyze=>Nonparametric tests=>K independent samples”

• For our data, we get

Ranks (VAR00001)

| VAR00002 | N | Mean Rank |
|---|---|---|
| 1 | 4 | 4.38 |
| 2 | 5 | 9.60 |
| 3 | 4 | 6.38 |
| Total | 13 | |

Test Statistics (a, b)

| | VAR00001 |
|---|---|
| Chi-Square | 4.195 |
| df | 2 |
| Asymp. Sig. | .123 |

a. Kruskal Wallis Test
b. Grouping Variable: VAR00002
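
Plugging the rank sums 17.5, 48 and 25.5 into the formula for W gives roughly 4.149, slightly less than the chi-square value in the SPSS output. The gap is consistent with the standard correction for ties, which divides W by 1 - Σ(t³ - t)/(n³ - n), where t runs over the sizes of the sets of tied values. That SPSS applies this correction is our assumption; the slides do not say so. The sketch also uses the fact that, for 2 degrees of freedom only, the upper tail of the chi-square distribution is exp(-x/2).

```python
from collections import Counter
from math import exp

observations = [24, 26, 31, 27, 29, 31, 30, 36, 33, 29, 27, 34, 26]
# (rank sum, group size) for each doctor, taken from the example table.
rank_sums = {"A": (17.5, 4), "B": (48.0, 5), "C": (25.5, 4)}

n = len(observations)
weighted = sum(r ** 2 / size for r, size in rank_sums.values())
w = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

# Tie correction: each set of t tied values contributes t^3 - t
# (values that occur once contribute 0).
tie_term = sum(t ** 3 - t for t in Counter(observations).values())
w_corrected = w / (1 - tie_term / (n ** 3 - n))

# Valid for 2 degrees of freedom only (K = 3 groups).
p_value = exp(-w_corrected / 2)

print(round(w, 3), round(w_corrected, 3), round(p_value, 3))
```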


When to use what method

• In situations where we have one observation per object, and want to compare two or more groups:
  – Use non-parametric tests if you have enough data
    • For two groups: Mann-Whitney U-test (Wilcoxon rank sum)
    • For three or more groups use Kruskal-Wallis
  – If data analysis indicates that the assumption of normally distributed independent errors is OK
    • For two groups use t-test (equal or unequal variances assumed)
    • For three or more groups use ANOVA


When to use what method

• When you, in addition to the main observation, have some observations that can be used to pair or block objects, and want to compare groups, and the assumption of normally distributed independent errors is OK:
  – For two groups, use paired-data t-test
  – For three or more groups, we can use two-way ANOVA


Two-way ANOVA (without interaction)

• In two-way ANOVA, data fall into categories in two different ways: Each observation can be placed in a table.

• Example: Both doctor and type of treatment should influence outcome.

• Sometimes we are interested in studying both categories; sometimes the second category is used only to reduce unexplained variance. In the latter case it is called a blocking variable.


Sums of squares for two-way ANOVA

• Assume K categories, H blocks, and assume one observation $x_{ij}$ for each category i and each block j, so we have n=KH observations.
  – Mean for category i: $\bar{x}_{i\cdot}$
  – Mean for block j: $\bar{x}_{\cdot j}$
  – Overall mean: $\bar{x}$


Sums of squares for two-way ANOVA

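$$SSG = H \sum_{i=1}^{K} (\bar{x}_{i\cdot} - \bar{x})^2$$

$$SSB = K \sum_{j=1}^{H} (\bar{x}_{\cdot j} - \bar{x})^2$$

$$SSE = \sum_{i=1}^{K} \sum_{j=1}^{H} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2$$

$$SST = \sum_{i=1}^{K} \sum_{j=1}^{H} (x_{ij} - \bar{x})^2$$

$$SSG + SSB + SSE = SST$$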
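
A minimal sketch of these sums of squares for a K-by-H table with one observation per cell is given below. The data are hypothetical numbers chosen only to exercise the formulas; they do not come from the slides, and the function name is ours.

```python
def two_way_sums_of_squares(table):
    """table[i][j] is the single observation for category i and block j."""
    k, h = len(table), len(table[0])
    overall = sum(sum(row) for row in table) / (k * h)
    category_means = [sum(row) / h for row in table]
    block_means = [sum(table[i][j] for i in range(k)) / k for j in range(h)]
    ssg = h * sum((m - overall) ** 2 for m in category_means)
    ssb = k * sum((m - overall) ** 2 for m in block_means)
    sse = sum((table[i][j] - category_means[i] - block_means[j] + overall) ** 2
              for i in range(k) for j in range(h))
    sst = sum((x - overall) ** 2 for row in table for x in row)
    return ssg, ssb, sse, sst


# Hypothetical data: K = 3 categories (rows) by H = 4 blocks (columns).
data = [
    [10, 12, 15, 11],
    [14, 15, 19, 16],
    [12, 12, 18, 13],
]
ssg, ssb, sse, sst = two_way_sums_of_squares(data)
k, h = len(data), len(data[0])
mse = sse / ((k - 1) * (h - 1))

print("SSG + SSB + SSE =", round(ssg + ssb + sse, 6), "SST =", round(sst, 6))
print("F ratio for groups:", round((ssg / (k - 1)) / mse, 3))
print("F ratio for blocks:", round((ssb / (h - 1)) / mse, 3))
```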


ANOVA table for two-way data

| Source of variation | Sums of squares | Deg. of freedom | Mean squares | F ratio |
|---|---|---|---|---|
| Between groups | SSG | K-1 | MSG = SSG/(K-1) | MSG/MSE |
| Between blocks | SSB | H-1 | MSB = SSB/(H-1) | MSB/MSE |
| Error | SSE | (K-1)(H-1) | MSE = SSE/((K-1)(H-1)) | |
| Total | SST | n-1 | | |

Test for between groups effect: compare $\frac{MSG}{MSE}$ to $F_{K-1,\,(K-1)(H-1)}$

Test for between blocks effect: compare $\frac{MSB}{MSE}$ to $F_{H-1,\,(K-1)(H-1)}$


Two-way ANOVA (with interaction)

• The setup above assumes that the blocking variable influences outcomes in the same way in all categories (and vice versa)

• We can check if there is interaction between the blocking variable and the categories by extending the model with an interaction term


Sums of squares for two-way ANOVA (with interaction)

• Assume K categories, H blocks, and assume L observations $x_{ij1}, x_{ij2}, \dots, x_{ijL}$ for each category i and each block j, so we have n=KHL observations.
  – Mean for category i: $\bar{x}_{i\cdot\cdot}$
  – Mean for block j: $\bar{x}_{\cdot j\cdot}$
  – Mean for cell ij: $\bar{x}_{ij\cdot}$
  – Overall mean: $\bar{x}$


Sums of squares for two-way ANOVA (with interaction)

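$$SSG = HL \sum_{i=1}^{K} (\bar{x}_{i\cdot\cdot} - \bar{x})^2$$

$$SSB = KL \sum_{j=1}^{H} (\bar{x}_{\cdot j\cdot} - \bar{x})^2$$

$$SSI = L \sum_{i=1}^{K} \sum_{j=1}^{H} (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x})^2$$

$$SSE = \sum_{i=1}^{K} \sum_{j=1}^{H} \sum_{l=1}^{L} (x_{ijl} - \bar{x}_{ij\cdot})^2$$

$$SST = \sum_{i=1}^{K} \sum_{j=1}^{H} \sum_{l=1}^{L} (x_{ijl} - \bar{x})^2$$

$$SSG + SSB + SSI + SSE = SST$$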
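
The same decomposition with L observations per cell can be sketched as follows. Again the data are hypothetical and serve only to check that the five sums of squares add up; the function and variable names are ours.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_interaction_ss(cells):
    """cells[i][j] is the list of L observations for category i and block j."""
    k, h, reps = len(cells), len(cells[0]), len(cells[0][0])
    overall = mean(x for row in cells for cell in row for x in cell)
    category = [mean(x for cell in cells[i] for x in cell) for i in range(k)]
    block = [mean(x for i in range(k) for x in cells[i][j]) for j in range(h)]
    cell_mean = [[mean(cells[i][j]) for j in range(h)] for i in range(k)]
    ssg = h * reps * sum((m - overall) ** 2 for m in category)
    ssb = k * reps * sum((m - overall) ** 2 for m in block)
    ssi = reps * sum((cell_mean[i][j] - category[i] - block[j] + overall) ** 2
                     for i in range(k) for j in range(h))
    sse = sum((x - cell_mean[i][j]) ** 2
              for i in range(k) for j in range(h) for x in cells[i][j])
    sst = sum((x - overall) ** 2 for row in cells for cell in row for x in cell)
    return {"SSG": ssg, "SSB": ssb, "SSI": ssi, "SSE": sse, "SST": sst}


# Hypothetical data: K = 2 categories, H = 3 blocks, L = 2 observations per cell.
cells = [
    [[10, 12], [14, 13], [18, 17]],
    [[11, 15], [12, 16], [15, 14]],
]
ss = two_way_interaction_ss(cells)
k, h, reps = 2, 3, 2
mse = ss["SSE"] / (k * h * (reps - 1))
msi = ss["SSI"] / ((k - 1) * (h - 1))

print({name: round(value, 3) for name, value in ss.items()})
print("F ratio for interaction:", round(msi / mse, 3))
```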


ANOVA table for two-way data (with interaction)

| Source of variation | Sums of squares | Deg. of freedom | Mean squares | F ratio |
|---|---|---|---|---|
| Between groups | SSG | K-1 | MSG = SSG/(K-1) | MSG/MSE |
| Between blocks | SSB | H-1 | MSB = SSB/(H-1) | MSB/MSE |
| Interaction | SSI | (K-1)(H-1) | MSI = SSI/((K-1)(H-1)) | MSI/MSE |
| Error | SSE | KH(L-1) | MSE = SSE/(KH(L-1)) | |
| Total | SST | n-1 | | |

Test for interaction: compare MSI/MSE with $F_{(K-1)(H-1),\,KH(L-1)}$

Test for block effect: compare MSB/MSE with $F_{H-1,\,KH(L-1)}$

Test for group effect: compare MSG/MSE with $F_{K-1,\,KH(L-1)}$


Notes on ANOVA

• All analysis of variance (ANOVA) methods are based on the assumptions of normally distributed and independent errors

• The same problems can be described using the regression framework. We get exactly the same tests and results!

• There are many extensions beyond those mentioned
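
As an illustration of the regression remark, the one-way example with three doctors can be written as a regression on indicator variables. The notation below ($d_B$, $d_C$, $RSS_0$, $RSS_1$) is ours and does not appear in the slides. It is a sketch of why the regression F test reduces to MSG/MSW, under the same assumptions of normal, independent errors.

```latex
% One-way ANOVA with K = 3 groups written as a regression.
% d_B and d_C are 0/1 indicators for Doctors B and C; Doctor A is the baseline.
\begin{align*}
x &= \beta_0 + \beta_1 d_B + \beta_2 d_C + \varepsilon,
  \qquad H_0:\ \beta_1 = \beta_2 = 0 \\
F &= \frac{(RSS_0 - RSS_1)/(K-1)}{RSS_1/(n-K)}
   = \frac{(SST - SSW)/(K-1)}{SSW/(n-K)}
   = \frac{SSG/(K-1)}{SSW/(n-K)}
   = \frac{MSG}{MSW}
\end{align*}
% RSS_0: residual sum of squares of the intercept-only model (equals SST).
% RSS_1: residual sum of squares of the full model, whose fitted values
%        are the group means (equals SSW).
```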