
Post on 26-Aug-2020


Page 1: Unbalanced Designs in Factorials Type I, II, III SShomepage.stat.uiowa.edu/...unbalanced_factorials.pdf · Unbalanced factorials: Type I, II, III SS But if we have an unbalanced factorial

Unbalanced Designs in Factorials: Type I, II, III SS

Chapter 10 in Oehlert

STAT:5201

Week 9 - Lecture 2


Type I, II, III SS

When we perform an ANOVA (and create an ANOVA table), we are trying to quantify the amount of variability in the data accounted for by a specific “source” using Sums of Squares (SS).

For instance, SSA gives us an idea of the amount of variability in the data accounted for by factor A.

For any model, we know SStotal = SSmodel + SSE.

But if we have a 2-factor ANOVA with A, B, and AB terms, does SSmodel = SSA + SSB + SSAB?

ANSWER: It depends... For one, which SS are we talking about: Type I, II, or III? And also, is this a balanced or unbalanced design?


Balanced factorials: Type I, II, III SS

In a balanced 2-way ANOVA, the factors are orthogonal to each other, and therefore, regardless of the Type of SS we are using (I, II, or III), we have...

SStotal = SSA + SSB + SSAB + SSE

The SSmodel can be partitioned into non-overlapping SS as SSmodel = SSA + SSB + SSAB.

So, when we have equal sample sizes in each ‘cell’ (i.e. a balanced design) we don’t really need to be concerned about the different types of SS because they are all equivalent.


Unbalanced factorials: Type I, II, III SS

But if we have an unbalanced factorial, the factors are not orthogonal, and they have some ‘overlapping’ information. So how much variability is explained by a given factor (quantified by SS) depends on what else is in the model.

If we have an unbalanced 2-way ANOVA, the relationship SSmodel = SSA + SSB + SSAB will not necessarily hold.

The relationship will hold for the Type I SS (which is the sequential sums of squares), but not for the Type II and Type III.


Type I SS

Types of sums of squares

Type I SS

In Type I SS, the order in which terms are entered into the model matters. These SS are called sequential sums of squares.

For example, consider a regression scenario with the following undoubtedly correlated variables:

X1 ≡ triglyceride level
X2 ≡ cholesterol level
X3 ≡ age
Y ≡ blood pressure

In SAS, if we code the model as: ① model Y = X1 X2 X3;

then the Type I SS for X3 describes the variability explained by X3 after accounting for X1 & X2. This is because X1 & X2 appear in the model statement before X3.


Type I SS

① model Y = X1 X2 X3; gives us the following Type I SS ...

Source   Type I SS
X3       SS(X3|1,X1,X2) = SSE(1,X1,X2) − SSE(1,X1,X2,X3)

NOTE: the 1 represents accounting for the intercept.

These SS tell us how much more variability is explained by the model when X3 is entered sequentially after the other two terms.

And the other Type I SS for model ① ...

Source   Type I SS
X1       SS(X1|1) = SSE(1) − SSE(1,X1)   (entered first)
X2       SS(X2|1,X1) = SSE(1,X1) − SSE(1,X1,X2)   (entered second)
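Adding up the three sequential SS for model ① shows why the Type I SS always sum to SSmodel, whatever the design. The short derivation below only uses the definitions on this slide, plus the fact that the SSE of the intercept-only model is SStotal:

```latex
\begin{aligned}
&\mathrm{SS}(X_1 \mid 1) + \mathrm{SS}(X_2 \mid 1, X_1) + \mathrm{SS}(X_3 \mid 1, X_1, X_2) \\
&\quad = (\mathrm{SSE}_{1} - \mathrm{SSE}_{1,X_1})
      + (\mathrm{SSE}_{1,X_1} - \mathrm{SSE}_{1,X_1,X_2})
      + (\mathrm{SSE}_{1,X_1,X_2} - \mathrm{SSE}_{1,X_1,X_2,X_3}) \\
&\quad = \mathrm{SSE}_{1} - \mathrm{SSE}_{1,X_1,X_2,X_3}
 = \mathrm{SS}_{\text{total}} - \mathrm{SSE}
 = \mathrm{SS}_{\text{model}} .
\end{aligned}
```

The intermediate SSE terms cancel in pairs, so the total does not depend on the order of entry even though the individual pieces do.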


Type I SS

Suppose we write the model in SAS as: ② model Y = X3 X1 X2; then

Source   Type I SS
X3       SS(X3|1) = SSE(1) − SSE(1,X3)   (entered first)

And the other Type I SS for model ② ...

Source   Type I SS
X1       SS(X1|1,X3) = SSE(1,X3) − SSE(1,X1,X3)   (entered second)
X2       SS(X2|1,X1,X3) = SSE(1,X1,X3) − SSE(1,X1,X2,X3)   (entered last)
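The effect of the order of entry can be checked numerically. The sketch below is only an illustration in Python (the lecture itself uses SAS and R): the data for X1, X2, X3 and Y are made up, and `sse` and `type1_ss` are small hypothetical helpers written for this example, not functions from any package.

```python
def sse(columns, y):
    """Residual SS from least-squares regression of y on the given columns."""
    p = len(columns)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
           + [sum(a * b for a, b in zip(ci, y))] for ci in columns]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for r in range(p):
            if r != k:
                f = aug[r][k] / aug[k][k]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[k])]
    beta = [aug[k][p] / aug[k][k] for k in range(p)]
    fitted = [sum(b * col[i] for b, col in zip(beta, columns))
              for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical, deliberately correlated predictors (NOT data from the slides).
one = [1.0] * 8
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [1, 3, 2, 5, 4, 7, 6, 9]
y = [3, 5, 4, 8, 7, 11, 9, 14]

def type1_ss(ordered):
    """Sequential SS for (name, column) pairs, in their order of entry."""
    cols, out = [one], {}
    for name, col in ordered:
        before = sse(cols, y)
        cols = cols + [col]
        out[name] = before - sse(cols, y)
    return out

model1 = type1_ss([("X1", x1), ("X2", x2), ("X3", x3)])  # X3 entered last
model2 = type1_ss([("X3", x3), ("X1", x1), ("X2", x2)])  # X3 entered first
ss_model = sse([one], y) - sse([one, x1, x2, x3], y)

print("SS for X3 when entered last :", round(model1["X3"], 4))
print("SS for X3 when entered first:", round(model2["X3"], 4))
print("sum of Type I SS:", round(sum(model1.values()), 4),
      round(sum(model2.values()), 4), " SSmodel:", round(ss_model, 4))
```

With correlated predictors like these, X3 gets credit for much more variability when it is entered first, while both orderings still add up to the same SSmodel.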


Type I SS

Thinking in terms of regression and the general model with predictors X1, X2, X3, when would the Type I SS for X3 be approximately the same for models ① and ② above? (i.e., when would the conditioning statements not make a difference?)

SS(X3|1,X1,X2)  ≈?  SS(X3|1)
 “added last”       “added first”

The Type I SS are calculated based on the “order of entry” into the model. Sometimes this may be useful, like when choosing a polynomial model, but often it’s not the best option for testing the significance of a term. Do you want the order of entry to impact the significance?


Balanced factorials: Type I, II, III SS

When the factorial is balanced, the conditioning doesn’t change the SS because the terms provide ‘unique’ nonoverlapping information.

SS(A|1) = SS(A|1,B)

Recall in the balanced design, SSmodel = SSA + SSB + SSAB.

In the 2x2 factorial example with 2 levels for each factor, the 3 d.f. for the model coincide with three 1 d.f. orthogonal contrasts.

Let µ′ = (µ11, µ12, µ21, µ22)

Contrast                   SS        df
c′A  = (1, 1, −1, −1)      SSA       1
c′B  = (1, −1, 1, −1)      SSB       1
c′AB = (1, −1, −1, 1)      SSAB      1
                           -------   --
                           SSmodel   3

* NOTE: These are not orthogonal contrasts if unbalanced.
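One way to see the note about unbalanced data: with unequal cell sizes, two contrasts in the cell means with coefficients c and d are orthogonal when the sum of c_i d_i / n_i is zero, so the cell sizes enter the check. The sketch below applies this standard definition to the three contrasts in the table; the unequal cell sizes (8, 4, 4, 4) are an assumption chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Contrast coefficients on (mu11, mu12, mu21, mu22), as in the table above.
contrasts = {
    "A":  (1, 1, -1, -1),
    "B":  (1, -1, 1, -1),
    "AB": (1, -1, -1, 1),
}

def weighted_products(cell_sizes):
    """Sum of c_i * d_i / n_i for every pair of contrasts."""
    return {
        (p, q): sum(Fraction(c * d, n)
                    for c, d, n in zip(contrasts[p], contrasts[q], cell_sizes))
        for p, q in combinations(contrasts, 2)
    }

balanced = weighted_products((3, 3, 3, 3))    # equal cell sizes
unbalanced = weighted_products((8, 4, 4, 4))  # hypothetical unequal cell sizes

print("balanced:  ", {k: str(v) for k, v in balanced.items()})
print("unbalanced:", {k: str(v) for k, v in unbalanced.items()})
```

Exact fractions are used so that a zero really is zero rather than a rounding artifact.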


Balanced factorials: Type I, II, III SS

We can also see this orthogonality in the full rank design matrix X (12x4) with columns labeled µ, A, B, AB...

   µ    A    B   AB
   1    1    1    1
   1    1    1    1
   1    1    1    1
   1    1   −1   −1
   1    1   −1   −1
   1    1   −1   −1
   1   −1    1   −1
   1   −1    1   −1
   1   −1    1   −1
   1   −1   −1    1
   1   −1   −1    1
   1   −1   −1    1

The dot product of any two columns is zero.

In this case of orthogonality, the SS isn’t affected by which terms are entered first.

Sometimes I think of it as the terms are orthogonal to each other, and what each term explains doesn’t overlap with what another term explains.

In contrast, when the terms are not orthogonal (i.e. the design is unbalanced), the terms share some information, or there is some overlap in what each term explains.
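The column dot products are easy to verify. The sketch below rebuilds the 12x4 matrix from its four distinct rows, checks every pair of columns, and then repeats the check after dropping one observation from the last cell. The unbalanced cell counts (3, 3, 3, 2) are a hypothetical example, not a design from the lecture.

```python
from itertools import combinations

# One (mu, A, B, AB) row per cell of the 2x2 design, in the order shown above.
cell_rows = [(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1)]
labels = ("mu", "A", "B", "AB")

def column_dot_products(cell_counts):
    """Dot product of every pair of columns of X for the given cell counts."""
    rows = [row for row, n in zip(cell_rows, cell_counts) for _ in range(n)]
    cols = list(zip(*rows))
    return {(labels[i], labels[j]): sum(a * b for a, b in zip(cols[i], cols[j]))
            for i, j in combinations(range(4), 2)}

balanced = column_dot_products((3, 3, 3, 3))    # the 12x4 matrix shown above
unbalanced = column_dot_products((3, 3, 3, 2))  # hypothetical: one observation lost

print("balanced:  ", balanced)
print("unbalanced:", unbalanced)
```

Losing a single observation is enough to make every pair of columns non-orthogonal, which is the ‘overlap’ described above.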


Balanced factorials: Type I, II, III SS

Recall the balanced 2-way ANOVA Animal Fattening Experiment: Antibiotics (0mg, 40mg) and Vitamin (0mg, 5mg)

Example (R balanced factorial)

# Set the dummy variable coding to sum-to-zero constraints
> options(contrasts=c("contr.sum","contr.poly"))
> attach(af)
> table(vitamin, antibiotic)
       antibiotic
vitamin  0 40
      0  3  3
      5  3  3


Balanced factorials: Type I, II, III SS

Recall that the anova() function in R provides Type I SS. But in a balanced design, order of entry does not matter because of the orthogonality.

Example (R balanced factorial)

> anova(lm(gain~vitamin+antibiotic))

Analysis of Variance Table

Response: gain

Df Sum Sq Mean Sq F value Pr(>F)

vitamin 1 0.2187 0.218700 9.7537 0.01226 *

antibiotic 1 0.0192 0.019200 0.8563 0.37892

Residuals 9 0.2018 0.022422

# SWITCHING THE ORDER OF ENTRY

> anova(lm(gain~antibiotic+vitamin))

Analysis of Variance Table

Response: gain

Df Sum Sq Mean Sq F value Pr(>F)

antibiotic 1 0.0192 0.019200 0.8563 0.37892

vitamin 1 0.2187 0.218700 9.7537 0.01226 *

Residuals 9 0.2018 0.022422


Unbalanced factorials: Types I, II, III SS

When we have unbalanced data, the SS for a term depends on the Type of SS being requested. This is because Type I, II, and III each use a different set of terms ‘already being in the model’ before calculating how much the next variable entered explains.

Again, in a balanced design, these SS are all the same because each term provides unique information and doesn’t ‘take away’ from what another term explains.

In an orthogonal design, if you remove a term from the model, then that term’s SS are added to the error. The other SS are not affected.


Type II SS

Types of sums of squares

Type II SS

The Type II SS relates to the extra variability explained when a term is entered into the model after all terms at the same level or at a more fundamental level have already been entered. These SS could be called model building sums of squares. The Type II SS follows the “hierarchy principle.”

If you fit the full model (i.e. all possible interactions), you can look at the table of Type II SS in order to test for higher interactions, such as a 3-way interaction, and if it is not significant, you can use the same table to test for 2-way interactions, and the p-value coincides with a model that does not include the 3-way interaction (but you don’t actually have to remove the 3-way interaction and re-fit the model).


Type II SS

Type II SS for 3-way ANOVA

Source   Type II SS
A        SS(A|1,B,C)
B        SS(B|1,A,C)
C        SS(C|1,A,B)
AB       SS(AB|1,A,B,C,AC,BC)
AC       SS(AC|1,A,B,C,AB,BC)
BC       SS(BC|1,A,B,C,AB,AC)
ABC      SS(ABC|1,A,B,C,AB,AC,BC)

So, you can look at the Type II SS and essentially get tests from a “new model” without actually re-fitting the model in SAS.

As discussed earlier, start with the highest-order interaction tests and work your way toward the main effects, continuing while terms are not significant.
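The conditioning sets in the table follow a simple rule that can be written down mechanically. The sketch below generates them for a 3-factor model, using the rule as stated on these slides (adjust for every other term of the same or lower order), and generates the fully adjusted Type III sets for comparison. The function names are hypothetical. Note that the SAS documentation describes Type II SS as adjusting each effect for all other effects that do not contain it, which for A would also include BC; the sketch follows the table as written here.

```python
from itertools import combinations

FACTORS = ["A", "B", "C"]
# All terms of the full factorial model, ordered by interaction order:
# A, B, C, AB, AC, BC, ABC
TERMS = ["".join(c) for k in range(1, len(FACTORS) + 1)
         for c in combinations(FACTORS, k)]

def type2_adjusts_for(term):
    """Other terms of the same or lower order (rule used in the table above)."""
    return ["1"] + [t for t in TERMS if t != term and len(t) <= len(term)]

def type3_adjusts_for(term):
    """Every other term in the model (fully adjusted)."""
    return ["1"] + [t for t in TERMS if t != term]

for t in TERMS:
    print(f"{t:>3}  Type II: SS({t}|{','.join(type2_adjusts_for(t))})"
          f"   Type III: SS({t}|{','.join(type3_adjusts_for(t))})")
```

The two rules agree only for the highest-order term, ABC, which is why the highest-order interaction test is the same in both tables.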


Type II SS

Example (Unbalanced 3-way ANOVA)

[Problem 10.1 in Oehlert] An experiment investigated the release of the hormone ACTH from rat pituitary glands under eight treatments from three factors.

Response: amount of ACTH
Factors:  CRF (0 or 100nM)
          Calcium (0 or 2mM)
          Verapamil (0 or 50mM)

The control treatment (all factors at 0 moles) received 8 EUs, while all other treatments received 4 EUs (N=36). This is a completely randomized design.

All factors coded as 1,2.

We will use Type II SS to decide on what factors should be in the model.


Type II SS

Example (Unbalanced 3-way ANOVA)

proc freq data=p10_1;
tables calcium*crf*verapamil/nocol norow nocum nopercent;
run;

The FREQ Procedure

Table 1 of crf by verapamil
Controlling for calcium=1

crf        verapamil
Frequency|       1|       2|  Total
---------+--------+--------+
       1 |      8 |      4 |     12
---------+--------+--------+
       2 |      4 |      4 |      8
---------+--------+--------+
Total          12        8       20

Table 2 of crf by verapamil
Controlling for calcium=2

crf        verapamil
Frequency|       1|       2|  Total
---------+--------+--------+
       1 |      4 |      4 |      8
---------+--------+--------+
       2 |      4 |      4 |      8
---------+--------+--------+
Total           8        8       16


Type II SS

Example (Unbalanced 3-way ANOVA)

Due to nonconstant variance, the log transformation was used.

proc transreg data=p10_1;

model boxcox(response)=class(superfactor);

run;

/*Use the lambda=0 transformation from box-cox, which is log(y)*/
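For reference, the usual Box-Cox power family (stated here from its standard definition, not taken from the slides) shows why λ = 0 corresponds to the log transformation:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\log y, & \lambda = 0,
\end{cases}
\qquad\text{since}\qquad
\lim_{\lambda \to 0} \frac{y^{\lambda} - 1}{\lambda} = \log y .
```

So a Box-Cox analysis that points to λ near 0 supports modeling log(y), which is what the next step does.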


Type II SS

Example (Unbalanced 3-way ANOVA)

data p10_1; set p10_1;

logy=log(response);

run;

proc glm data=p10_1 plot=diagnostics;

class verapamil crf calcium;

model logy=verapamil|crf|calcium/ss2;

run;

Source DF Type II SS Mean Square F Value Pr > F

verapamil 1 0.34131932 0.34131932 14.91 0.0006

crf 1 11.27564925 11.27564925 492.51 <.0001

verapamil*crf 1 0.03314380 0.03314380 1.45 0.2390

calcium 1 7.34125531 7.34125531 320.66 <.0001

verapamil*calcium 1 0.05613136 0.05613136 2.45 0.1286

crf*calcium 1 1.11913661 1.11913661 48.88 <.0001

verapami*crf*calcium 1 0.03096207 0.03096207 1.35 0.2547


Type II SS

Example (Unbalanced 3-way ANOVA)

Starting with the 3-way interaction, we see that this term is not significant (p = 0.2547) and is not needed in the model. Moving on to the 2-way interactions, we see that the only 2-way interaction that is needed is crf-by-calcium (p < .0001). Note that the p-values for the 2-way interactions here relate to a model without the 3-way interaction included.


Type II SS

Example (Unbalanced 3-way ANOVA)

Because the 2-way interaction for crf-by-calcium was significant, we need to keep the main effects for crf and calcium in the model, regardless of their p-values in the table, because of the hierarchy principle.

We will re-fit the final parsimonious model and perform some relevant tests and plots.

proc glm data=p10_1 plot=diagnostics;

class verapamil crf calcium;

model logy= verapamil crf calcium crf*calcium/ss2;

lsmeans verapamil/pdiff;

lsmeans calcium*crf/adjust=tukey pdiff;

run;


Type II SS

Example (Unbalanced 3-way ANOVA)

Dependent Variable: logy

Source DF Sum of Squares Mean Square F Value Pr > F

Model 4 22.91219129 5.72804782 235.02 <.0001

Error 31 0.75553614 0.02437213

Corrected Total 35 23.66772743

R-Square Coeff Var Root MSE logy Mean

0.968077 11.41904 0.156116 1.367153

Source DF Type II SS Mean Square F Value Pr > F

verapamil 1 0.34131932 0.34131932 14.00 0.0007

crf 1 11.43485091 11.43485091 469.18 <.0001

calcium 1 7.28089585 7.28089585 298.74 <.0001

crf*calcium 1 1.12256063 1.12256063 46.06 <.0001

NOTE: The Type II SS test for verapamil above puts the interaction into the error term, which will inflate the σ̂² for that test, but the main effect is strong enough here to be significant already (the Type III tests for this model would have removed that interaction from the error term before testing verapamil, probably preferred).
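The entries of the ANOVA table above can be reproduced from its sums of squares and degrees of freedom. The sketch below is plain arithmetic on the numbers printed in the SAS output; the variable names are chosen for this example.

```python
# Values copied from the SAS output above (reduced model, log scale).
ss_model, ss_error, ss_total = 22.91219129, 0.75553614, 23.66772743
df_model, df_error = 4, 31

mse = ss_error / df_error               # error mean square
f_model = (ss_model / df_model) / mse   # overall F statistic
r_square = ss_model / ss_total          # proportion of variability explained

# Each Type II row has 1 df, so its mean square equals its SS.
type2_ss = {"verapamil": 0.34131932, "crf": 11.43485091,
            "calcium": 7.28089585, "crf*calcium": 1.12256063}
f_values = {term: ss / mse for term, ss in type2_ss.items()}

print("MSE:", round(mse, 8), " F(model):", round(f_model, 2),
      " R-square:", round(r_square, 6))
print({term: round(f, 2) for term, f in f_values.items()})
```

Every F value uses the same error mean square, so anything pooled into the error term changes all of the tests at once.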


Type II SS

Example (Type II SS)

symbol1 value=star interpol=std1mj color=black line=1;

symbol2 value=circle interpol=std1mj color=blue line=2;

proc gplot data=p10_1;

plot logy*crf=calcium/haxis=.5 to 2.5;

run;

Looking at the plot, we see that calcium has a positive effect at both levels of crf, but it has a larger positive effect when crf is set to the high level of 100 nM.


Type II SS

Example (Unbalanced 3-way ANOVA)

Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

crf   calcium   logy LSMEAN   LSMEAN Number
1     1         0.59678787    1
1     2         1.18026526    2
2     1         1.41154042    3
2     2         2.71481285    4

Least Squares Means for effect crf*calcium
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: logy

i/j   1        2        3        4
1              <.0001   <.0001   <.0001
2     <.0001            0.0282   <.0001
3     <.0001   0.0282            <.0001
4     <.0001   <.0001   <.0001

All four crf*calcium means are significantly different from each other.


Type II SS

Example (Unbalanced 3-way ANOVA)

Least Squares Means

                            H0:LSMean1=LSMean2
verapamil   logy LSMEAN     Pr > |t|
1           1.37662585      0.0007
2           1.57507735

verapamil also has a positive effect on ACTH.


Type II SS

Example (Unbalanced 3-way ANOVA)

Interpretation of verapamil on original scale...

We can take a closer look at the ‘positive’ main effect for verapamil by considering the log transformation, and backtransforming to the original scale.

Changing verapamil from 0 nM (low) to 50 nM (high) is associated with a positive change in the mean response (on the log scale) of 1.575 − 1.376 = 0.199 units. Thus, changing from the low quantity to the high quantity of verapamil multiplies the mean response on the raw scale by a factor of exp(0.199) ≈ 1.22, i.e. an increase of about 22%.

log(y2) − log(y1) = 0.199 ⇒ log(y2/y1) = 0.199 ⇒ y2/y1 = exp(0.199) ⇒ y2 = y1 exp(0.199)


Type II SS

Example (Unbalanced 3-way ANOVA)

Interpretation of calcium and crf on original scale...

In the same manner as the slice option, we can consider the effect of calcium when crf is at the low level, and the effect of calcium when crf is at the high level, separately (both significant).

Least Squares Means

crf   calcium   logy LSMEAN   LSMEAN Number
1     1         0.59678787    1
1     2         1.18026526    2
2     1         1.41154042    3
2     2         2.71481285    4

On the raw scale, changing calcium from low to high when crf is at the low level changes the mean response by a multiplicative factor of exp(1.180 − 0.597) ≈ 1.79.

On the raw scale, changing calcium from low to high when crf is at the high level changes the mean response by a multiplicative factor of exp(2.715 − 1.412) ≈ 3.68.
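The back-transformations above are just exponentiated differences of log-scale LS means. The sketch below recomputes them from the values printed in the SAS output; the dictionary keys and the helper `ratio` are names invented for this example.

```python
import math

# LS means on the log scale, copied from the SAS output above.
lsmean = {("crf1", "ca1"): 0.59678787, ("crf1", "ca2"): 1.18026526,
          ("crf2", "ca1"): 1.41154042, ("crf2", "ca2"): 2.71481285}
verapamil = {1: 1.37662585, 2: 1.57507735}

def ratio(log_high, log_low):
    """Multiplicative change on the raw scale implied by a log-scale difference."""
    return math.exp(log_high - log_low)

calcium_at_low_crf = ratio(lsmean[("crf1", "ca2")], lsmean[("crf1", "ca1")])
calcium_at_high_crf = ratio(lsmean[("crf2", "ca2")], lsmean[("crf2", "ca1")])
verapamil_effect = ratio(verapamil[2], verapamil[1])

print("calcium effect at low crf :", round(calcium_at_low_crf, 2))
print("calcium effect at high crf:", round(calcium_at_high_crf, 2))
print("verapamil effect          :", round(verapamil_effect, 2))
```

Because these are ratios, they have no units: a value of 1 would mean no change on the raw scale.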


Type II SS

Because the Type II SS takes the “hierarchy principle” into consideration, it can be used as a model building tool.

The summation of the Type II SS does not equal SSmodel when the data are unbalanced.
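The ACTH example makes this concrete. The sketch below adds up the Type II SS from the reduced-model PROC GLM output shown earlier and compares the total with the Model SS from the same output; only the variable names are new.

```python
# Reduced-model PROC GLM output for the ACTH example (log scale).
ss_model = 22.91219129
type2_ss = {"verapamil": 0.34131932, "crf": 11.43485091,
            "calcium": 7.28089585, "crf*calcium": 1.12256063}

type2_total = sum(type2_ss.values())
shortfall = ss_model - type2_total

print("sum of Type II SS:", round(type2_total, 8))
print("SSmodel          :", ss_model)
print("difference       :", round(shortfall, 8))
```

The difference is variability that the unbalanced design does not let us attribute to any single term.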


Type III SS

Types of sums of squares

Type III SS

The Type III SS relates to the extra variability explained when a term is entered into the model after ALL other terms have already been entered. These SS could be called fully adjusted sums of squares. The Type III SS does not follow the “hierarchy principle”, so you should only look at relevant rows of the table.

Source   Type III SS
A        SS(A|1,B,C,AB,AC,BC,ABC)
B        SS(B|1,A,C,AB,AC,BC,ABC)
C        SS(C|1,A,B,AB,AC,BC,ABC)
AB       SS(AB|1,A,B,C,AC,BC,ABC)
AC       SS(AC|1,A,B,C,AB,BC,ABC)
BC       SS(BC|1,A,B,C,AB,AC,ABC)
ABC      SS(ABC|1,A,B,C,AB,AC,BC)


Type III SS

The Type III SS does not honor the “hierarchy principle”, but the table is still relevant if you start with the highest order interaction, and only proceed if interaction terms are not significant.

In practice, I find that I look at the Type III SS, and then remove nonsignificant interaction terms and then re-fit the smaller model (but Type II SS could give me this same information).

The summation of the Type III SS does not equal SSmodel when the data are unbalanced.


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

A professor wished to investigate if the version of exam (2 versions) had any impact on scores.

Exams were randomly handed out to students.

The factors for the data are status (1: undergrad or 2: grad) and exam (1 or 2). The data are unbalanced with respect to the factors, and status would be a ‘controlled for’ factor here.

              status
              1     2
exam   1     11    11
       2      6    17

What is noticeable here is that exam version 2 was taken by a disproportionate number of graduate students, and we would naturally think that the grad students would perform better than the undergrads (which the data confirm).


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

The first statistical test would naturally be on the interaction.

Source DF Type I SS Mean Square F Value Pr > F

status 1 5912.815873 5912.815873 30.13 <.0001

exam 1 13.797977 13.797977 0.07 0.7922

status*exam 1 243.243786 243.243786 1.24 0.2720 <--

Source DF Type II SS Mean Square F Value Pr > F

status 1 5417.733042 5417.733042 27.61 <.0001

exam 1 13.797977 13.797977 0.07 0.7922

status*exam 1 243.243786 243.243786 1.24 0.2720 <--

Source DF Type III SS Mean Square F Value Pr > F

status 1 5602.998709 5602.998709 28.55 <.0001

exam 1 0.299684 0.299684 0.00 0.9690

status*exam 1 243.243786 243.243786 1.24 0.2720 <--


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

As this is an unbalanced design, order matters in the Type I SS.

proc glm data=grades plot=diagnostics;

class status exam;

model grades = status exam/ ss1;

run;

Source DF Type I SS Mean Square F Value Pr > F

status 1 5912.815873 5912.815873 29.96 <.0001

exam 1 13.797977 13.797977 0.07 0.7928

proc glm data=grades plot=diagnostics;

class status exam;

model grades = exam status/ ss1;

run;

Source DF Type I SS Mean Square F Value Pr > F

exam 1 508.880808 508.880808 2.58 0.1158

status 1 5417.733042 5417.733042 27.45 <.0001

The exam factor explains far more variability when added first (SS = 508.9, p = 0.1158) than when added second (SS = 13.8, p = 0.7928).
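Two things can be read off the pair of Type I tables above: both orders of entry add up to the same total, and the amount each factor loses by going second is the same. The sketch below checks this with the SS values copied from the two PROC GLM runs; the variable names are chosen for this example.

```python
# Type I SS copied from the two PROC GLM runs above (main-effects model).
status_first = {"status": 5912.815873, "exam": 13.797977}
exam_first = {"exam": 508.880808, "status": 5417.733042}

total_status_first = sum(status_first.values())
total_exam_first = sum(exam_first.values())

# SS that a factor gets when entered first but loses when entered second.
exam_overlap = exam_first["exam"] - status_first["exam"]
status_overlap = status_first["status"] - exam_first["status"]

print("totals  :", round(total_status_first, 6), round(total_exam_first, 6))
print("overlap :", round(exam_overlap, 6), round(status_overlap, 6))
```

The shared amount is the ‘overlapping’ information between status and exam; whichever factor is entered first is credited with it.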


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

Let’s look at order of entry, Type II SS (hierarchical), for main effects.

proc glm data=grades plot=diagnostics;

class status exam;

model grades = status exam/ ss2;

run;

Source DF Type II SS Mean Square F Value Pr > F

status 1 5417.733042 5417.733042 27.45 <.0001

exam 1 13.797977 13.797977 0.07 0.7928

proc glm data=grades plot=diagnostics;

class status exam;

model grades = exam status/ ss2;

run;

Source DF Type II SS Mean Square F Value Pr > F

exam 1 13.797977 13.797977 0.07 0.7928

status 1 5417.733042 5417.733042 27.45 <.0001

Order of entry did not matter here because A and B are at the same hierarchical level. SS(A|B) and SS(B|A) are output for Type II SS.


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

Type III SS (fully adjusted), for main effects.

proc glm data=grades plot=diagnostics;

class status exam;

model grades = status exam/ ss3;

run;

Source DF Type III SS Mean Square F Value Pr > F

status 1 5417.733042 5417.733042 27.45 <.0001

exam 1 13.797977 13.797977 0.07 0.7928

proc glm data=grades plot=diagnostics;

class status exam;

model grades = exam status/ ss3;

run;

Source DF Type III SS Mean Square F Value Pr > F

exam 1 13.797977 13.797977 0.07 0.7928

status 1 5417.733042 5417.733042 27.45 <.0001

In this case of a 2-way ANOVA with main effects only, the Type II & III SS are the same.


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

Just to emphasize the need to account for other factors (or known covariates) prior to testing a factor of interest in an observational study or an unbalanced ANOVA, I’m going to consider two models: 1) exam only, and 2) exam and status.

Really, the instructor wanted to know if the exam version mattered, but they wisely kept track of what type of student was taking the test.

1) exam as the only predictor

proc glm data=grades plot=diagnostics;

class exam;

model grades = exam;

lsmeans exam;

run;


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

In this one-factor model, exam is not significant, but its effect (version 1 mean 62.3, version 2 mean 69.0) is inflated compared to the model that fits both exam and status (next model).

Source DF Type III SS Mean Square F Value Pr > F

exam 1 508.8808081 508.8808081 1.60 0.2132

Least Squares Means

exam LSMEAN

1 62.2727273

2 69.0000000


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

2) exam and status as predictors

proc glm data=grades plot=diagnostics;

class status exam;

model grades = exam status/ss3;

lsmeans exam;

run;

Source DF Type III SS Mean Square F Value Pr > F

exam 1 13.797977 13.797977 0.07 0.7928

status 1 5417.733042 5417.733042 27.45 <.0001

Least Squares Means

exam LSMEAN

1 62.2727273

2 63.4157549

Given a student’s status, the estimated effect of exam version is much smaller here (version 1 mean 62.3, version 2 mean 63.4).


Unbalanced factorial: Type I, II, III SS

Example (SAS unbalanced 2x2 factorial)

[Figure, two panels. Left panel, “Model accounting for Status & Exam version”: grades plotted against status (Undergrad=1, Grad=2), with points and fitted lines coloured by exam version (1, 2). Right panel, “Model accounting for only Exam version”: grades plotted against exam version (only predictor), with points coloured by status (1, 2).]

A status effect is expected. Because this is an unbalanced design, we should account for status before we test for an exam effect (factor of primary interest). If not, we could get a false impression (bias) of the exam effect.
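A small numerical sketch shows how this false impression can arise. It uses the cell counts from the exam example, but the cell means are hypothetical and deliberately built so that exam version has no effect at all within either status group. Equal-weight averaging of the cell means is used here as a simple stand-in for a status-adjusted comparison; it is not the LSMEANS computation from the SAS output.

```python
# Cell counts from the exam example; the cell means are HYPOTHETICAL and are
# chosen so that exam version has no effect within either status group.
counts = {("exam1", "undergrad"): 11, ("exam1", "grad"): 11,
          ("exam2", "undergrad"): 6, ("exam2", "grad"): 17}
cell_mean = {("exam1", "undergrad"): 50.0, ("exam1", "grad"): 75.0,
             ("exam2", "undergrad"): 50.0, ("exam2", "grad"): 75.0}

def raw_mean(exam):
    """Mean over all students who took this exam (weights = cell counts)."""
    cells = [c for c in counts if c[0] == exam]
    total = sum(counts[c] * cell_mean[c] for c in cells)
    return total / sum(counts[c] for c in cells)

def equal_weight_mean(exam):
    """Average of the two status cell means, each weighted equally."""
    cells = [c for c in counts if c[0] == exam]
    return sum(cell_mean[c] for c in cells) / len(cells)

for exam in ("exam1", "exam2"):
    print(exam, "raw:", round(raw_mean(exam), 2),
          "equal-weight:", round(equal_weight_mean(exam), 2))
```

The raw means differ between versions only because version 2 was taken by more graduate students; once the two status groups are weighted equally, the apparent exam effect disappears.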


R code

Example (R code for plot 1)

library(ggplot2)

# Additive model: one common slope for status, intercept shifted by exam version
lmBoth <- lm(grades~statusNum+examNum, data=df)

ggplot(df, aes(x = statusNum, y = grades, colour = exam)) +
  geom_point() +
  scale_colour_manual(values = c("#A31727","#17A319")) +
  theme(axis.title.x = element_text(face="bold", colour="black", size=15),
        axis.title.y = element_text(face="bold", colour="black", size=15)) +
  xlab("Status: Undergrad=1, Grad=2") +
  ggtitle("Model accounting for Status & Exam version") +
  # Fitted line using the baseline intercept
  geom_abline(intercept = lmBoth$coefficients[1],
              slope = lmBoth$coefficients[2], size=1, col="#A31727") +
  # Fitted line with the intercept shifted by the examNum coefficient
  geom_abline(intercept = (lmBoth$coefficients[1]+lmBoth$coefficients[3]),
              slope = lmBoth$coefficients[2], size=1, col="#17A319")


R code

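Example (R code for plot 2)

library(ggplot2)

# lmExam is not defined on the slide; it is assumed here to be the
# exam-only fit, by analogy with lmBoth in plot 1.
lmExam <- lm(grades~examNum, data=df)

ggplot(df, aes(x = examNum, y = grades, colour = status)) +
  geom_point() +
  # Single fitted line from the exam-only model
  geom_abline(intercept = lmExam$coefficients[1],
              slope = lmExam$coefficients[2], size=1) +
  theme(axis.title.x = element_text(face="bold", colour="black", size=15),
        axis.title.y = element_text(face="bold", colour="black", size=15)) +
  xlab(" Exam version (only predictor)") +
  ggtitle("Model accounting for only Exam version")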
