stat22200 spring 2014 chapter 13a - university of...

STAT22200 Spring 2014 Chapter 13A

Yibi Huang

May 27, 2014

13.1-13.2 Randomized Complete Block Design (RCBD)13.3 Latin Square Designs

Chapter 13A - 1

Example – Hardness Readings (1)

A hardness testing machine operates by pressing the tool tip into ametal test coupon, then determine the hardness of the couponbased the depth of the resulting depression. To test the consistencyof the readings, four tool tips are used in an experiment, each tippresses four metal coupons made of same material.

Tip Metal CouponsTypes 1 2 3 4

1 9.3 9.4 9.6 10.02 9.4 9.3 9.8 9.93 9.2 9.4 9.5 9.74 9.7 9.6 10.0 10.2

Reading = hardness in Rockwell C scale minus 40

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

1.0 2.0 3.0 4.0

9.2

9.6

10.0

Tip TypeH

ardn

ess

Rea

ding

s

Questions: Which factor is the treatment factor? Tip types orcoupon?

Chapter 13A - 2

Example – Hardness Readings (2)

If we ignore the structure in the units and regard as a completelyrandomized design (CRD) with 4 treatments (tip types), theANOVA table would be

> hardness = read.table("hardness.dat",header=T)

> attach(hardness)

> summary(aov(readings ~ as.factor(tiptype)))

Df Sum Sq Mean Sq F value Pr(>F)

as.factor(tiptype) 3 0.385 0.12833 1.702 0.22

Residuals 12 0.905 0.07542

As the P-value 0.22 is big, we could NOT reject H0 that the fourtreatment groups have a common mean.

Wait! What does the ANOVA test mean in the contextof this experiment?

Chapter 13A - 3

Recall the Means Model for CRD

yij = µi + εij ,

in which

I yij = the hardness reading of tip type i on metal coupon j

I µi = mean hardness readings of tip type i

I εij : errors, i.i.d. ∼ N(0, σ2)

Does the model yij = µi + εij make sense in this context?

Different coupons may have different hardness, and should havedifferent readings.Why do we expect same reading when we are measuring differentmetal coupons?

Chapter 13A - 4

11

1

1

9.2

9.4

9.6

9.8

10.0

tiptype

Har

dnes

s R

eadi

ngs

22

2

23

3

3

344

4

4

1 2 3 4

Labels = Coupon ID

> ?interaction.plot

> interaction.plot(tiptype,coupon,readings,type="b",

+ ylab="Hardness Readings",legend=FALSE)

> legend("top","Labels = Coupon ID",bty="n",cex=.8)

We can see that coupon 4 always have the highest reading,regardless of tip types. Coupon 3 is the second. Coupon 1 and 2always have lower readings. The four coupons indeed differ inhardness.

Chapter 13A - 5

A More Reasonable Model

yij = µ+ αi + βj + εij

in whichI yij = the hardness reading of tip type i on metal coupon jI µ = the grand mean of yijI βj = the coupon effectI αi = bias of readings from the true hardness if a tip of type i

is usedI εij : measurement errors, i.i.d. ∼ N(0, σ2)

Question: In the context of this experiment, which parameters arewe more interested in? αi ’s or βj ’s?

Just like the models for factorial design, the model above isover-parameterized. some constraints must be imposed. Thecommonly used constraints are∑4

i=1αi = 0 and

∑4

j=1βi = 0.

Chapter 13A - 6

Block DesignsI A block is a set of experimental units that are homogeneous

in some sense. Hopefully, units in the same block will havesimilar responses (if applied with the same treatment.)

I Block designs: randomize the units within each block to thetreatments.

I Randomized Complete Block Design (RCBD): Supposethere are g treatments and the size of all the blocks are ofsize k , and k = rg is a multiple of g .

I Mostly, block size k = # of treatments g . From now on,assume k = g .

I Within each block, the k = rg experimental units arerandomized to the g treatments, r units per treatments.

I The word “complete” indicates that every block containsall the g treatments.

I The Match-Paired design in Chapter 2 is a special case ofRCBD in which the block size k = 2.

Chapter 13A - 7

Data Structure for a Randomized Complete Block Design(RCBD)

Block 1 Block 2 · · · Block b

Treatment 1 y11 y12 · · · y1bTreatment 2 y21 y22 · · · y2b

......

... · · ·...

Treatment g yg1 yg2 · · · ygb

Normally, data are shown arranged by block and treatment, as inthe hardness reading example. You cannot see what was/was notrandomized by looking at data only.

Chapter 13A - 8

Advantage of BlockingI Blocking is the second basic principle of experimental design

after randomization.

“Block what you can, randomize everything else.”

I If units are highly variable, grouping them into more similarblocks can lead to a large increase in efficiency (more powerto detect difference in treatment effects).

I The choice of blocks is crucial

Commonly used blockingI Block on batch (e.g., milk produced in a day)I Block spatiallyI Block on timeI Block when you can identify a source of variation (e.g., age,

gender, and history blocks)

Caution: Blocking must be done at the time of randomization;you can’t group units into blocks after the experiment has beenrun (That is called ANCOVA instead, not RCBD).

Chapter 13A - 9

A Blocking Factor is Not A Treatment Factor

It is sometimes easy to confuse blocking variables with treatmentfactors. Remember that

I We can assign an experimental unit to any treatment, butcannot assign an unit to any block.Blocking variables are a property of the experimental units,not something we can manipulate.e.g., in the hardness reading example, we can change the tiptype to measure the hardness of a unit, but cannot change thebatch an unit comes from.

I Since we cannot experimentally manipulate the blockingvariable, block effects are “observational”. We cannot makecausal inference to a blocking variable as to a treatmentfactor.

Chapter 13A - 10

Example: Patio Sealers

I Goal: Want to study the effects of 3 different sealers onprotecting concrete patios from the weather.

I Units: 15 unsealed patios are available spread across ChicagoI Which of the following two designs do you think will be better

in this context? CRD or RCBD?I CRD: Randomly assign the 15 units to the 3 sealers, 5

units each. Apply the assigned sealer to all the surface ofthe patio. The degree of erosion of the 15 patios areevaluated after 2 years.

I RCBD: Separate each patio into 3 portions, and applythe sealers (randomly) in such a way that each patioreceives each sealer for 1/3 of the surface. The patiosurface are evaluated after 2 years.

Patio (location) may be a important factor on the degree oferosion. Some patios may be better sheltered (by trees ornearby buildings etc.)

Chapter 13A - 11

Parameter Estimates for the RCBD

The model for a RCBD

yij = µ+ αi + βj + εij εij ’s are i.i.d. N(0, σ2).

has the same format as the additive model for a balanced two-wayfactorial design. So the two models have the same least squareestimate for parameters:

µ = y••

αi = y i• − µ = y i• − y••

βj = y•j − µ = y•j − y••

for i = 1, . . . , g and j = 1, . . . , b.Questions: Why not including treatment-block interactions?

Chapter 13A - 12

Fitted Values for RCBDThe fitted values for RCBD is then

yij = µ+ τj + βj

= y•• + (y i• − y••) + (y•j − y••)

= y i• + y•j − y••

= row mean + column mean − grand mean

just like in an additive model for a balanced two-way design.

Block 1 Block 2 · · · Block b row mean

Treatment 1 y11 y12 · · · y1b y1•Treatment 2 y21 y22 · · · y2b y2•

......

... · · ·...

...Treatment g yg1 yg2 · · · ygb yg•column mean y•1 y•2 · · · y•b grand mean y••

Chapter 13A - 13

Sum of Squares and Degrees of FreedomThe sum of squares and degrees of freedom for RCBD are just likethose for the additive models:

SST = SStrt + SSblock + SSE

where

SST =∑g

i=1

∑b

j=1(yij − y••)2

SStrt =∑g

i=1

∑b

j=1(y i• − y••)2 = b

∑g

i=1(y i• − y••)2

SSblock =∑g

i=1

∑b

j=1(y•j − y••)2 = g

∑b

j=1(y•j − y••)2

SSE =∑g

i=1

∑b

j=1(yij − y i• − y•j + y••)2.

Total Treatment Block ErrordfT = bg − 1 dftrt = g − 1 dfblock = b − 1 dfE = (g − 1)(b − 1)

Chapter 13A - 14

Expected Values for the Mean SquaresJust like CRD, the mean squares for RCBD is the sum of squaresdivided by the corresponding d.f.

MStrt =SStrt

g − 1, MSblock =

SSblock

b − 1, MSE =

SSE

(g − 1)(b − 1).

Under the effects model for RCBD,

yij = µ+ αi + βj + εij εij ’s are i.i.d. N(0, σ2).

one can show that

E(MStrt) = σ2 +b

g − 1

g∑i=1

α2i

E(MSblock) = σ2 +g

b − 1

b∑j=1

β2j

E(MSE) = σ2

Thus the MSE is again an unbiased estimator of σ2.Chapter 13A - 15

ANOVA F -Test for Treatment EffectTo test whether there is an treatment effect

H0 : α1 = α2 = · · · = αg v.s. Ha : not all αi ’s are equal

is equivalent to testing whether all αi ’s are zero

H0 : α1 = α2 = · · · = αg = 0 v.s. Ha : not all αi ’s are zero

because of the constraint∑g

i=1 αi = 0. This test is also equivalentto a comparison between 2 models:

full model: yij = µ+αi +βj+εij reduced model: yij = µ+βj+εij

The test statistic is

Ftrt =MStrt

MSE∼ Fg−1, (g−1)(b−1)under H0.

The ANOVA table is given in the next slide

Chapter 13A - 16

ANOVA Table for RCBD

Sum of MeanSource d.f. Squares Squares F

Treatment g − 1 SStrt MStrt Ftrt = MStrt

MSE

Block b − 1 SSblock MSblock (Fblock = MSblock

MSE )

Error (b − 1)(g − 1) SSE MSE

Total bg − 1 SST

I The F statistic Fblock for testing the block effect, which is notinteresting, and is often omitted.

Chapter 13A - 17

ANOVA Table for CRD and RCBDIf we ignore the block effect, and analyzed the experiment as aCRD, the ANOVA table becomes

Sum of MeanSource d.f. Squares Squares F

Treatment g − 1 SStrt MStrt Ftrt = MStrt

MSECRD

Error bg − g SSECRD MSECRD

Total bg − 1 SST

In the table SStrt and MStrt is the same in CRD and RCBD, butthe variability due to block is now in the error term

SSECRD = SSERCBD + SSblock .

If SSblock is large, by considering the block effect, one cansubstantially reduce the size of noise. With a smaller MSE, it iseasier to detect difference in treatment effects.

RCBD can be a very effective noise-reducing technique if SSblock islarge.

Chapter 13A - 18

Example – Hardness Reading

> anova(lm(readings ~ as.factor(coupon)+as.factor(tiptype)))


as.factor(tiptype) 3 0.385 0.12833 14.44 0.000871 ***

as.factor(coupon) 3 0.825 0.27500 30.94 4.52e-05 ***

Residuals 9 0.080 0.00889

If we ignore the block effect,

> summary(aov(readings ~ as.factor(tiptype)))


as.factor(tiptype) 3 0.385 0.12833 1.702 0.22

Residuals 12 0.905 0.07542

Chapter 13A - 19

Model Diagnosis

Just like CRD, one should also do model checking after fitting aRCBD model, including all kinds of residual plots, normal QQ plot,time plot, Box-Cox, etc.

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

9.2 9.4 9.6 9.8 10.0

−0.

100.

000.

10

Hardness Reading

Fitted Values

Res

idua

ls The residual plot showsnothing unusual, no clearevidence of non-constantvariance or outliers.

Chapter 13A - 20

Model Diagnosis

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

−2 −1 0 1 2

−0.

150.

000.

10

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

The QQ plot shows nothing un-usual. It is slightly curved, butas there is only a few points, weare not sure if it’s a real patternor just random fluctuation.

−10 −5 0 5 10

54.0

55.0

56.0

λ

log−

Like

lihoo

d

95%

The Box-Cox method gives awide C.I. for λ (about −10 to8), which means the data hasno specific preference for trans-forming the response.

Chapter 13A - 21

Contrasts in RCBD (1)

A natural estimator of a contrast C =∑g

i=1 wiαi is

C =

g∑i=1

wi αi =

g∑i=1

wi (y i• − y••) =

g∑i=1

wiy i•,

which is identical to the estimator in CRD. (Note a contrast musthave

∑gi=1 wi = 0). So is the SE of the estimator (group size ni

becomes b)

SE(C ) =

√√√√MSE×g∑

i=1

w2i

b

The (1− α)100% C.I. and test for C remain the same form, onlythe degrees of freedom changes from N − g to (g − 1)(b − 1).

C ± tα/2,(g−1)(b−1) × SE(C )

Chapter 13A - 22

Contrasts in RCBD (2)As for testing {

H0 : C =∑g

i=1 wiαi = 0

Ha : C =∑g

i=1 wiαi 6= 0

the test statistic remain the same form,

t0 =C

SE(C )=

∑gi=1 wiy i•√

MSE×∑g

i=1w2ib

∼ t(g−1)(b−1)

only the t−distribution changes its d.f. form N − g to(g − 1)(b − 1). At test level of α, reject H0 when

|t0| > tα/2, (g−1)(b−1)

For one-sided alternative, we reject H0 at level α when

t0 > tα, (g−1)(b−1), if Ha : C > 0,

t0 < −tα, (g−1)(b−1), if Ha : C < 0.

Chapter 13A - 23

Multiple ComparisonAll the multiple comparison procedures apply to RCBD, justchange the degree of freedom from N − g to (g − 1)(b − 1).

Critical Value toMethod Family of Tests Keep FWER < α

Fisher’s LSD a single pairwise tα/2,(g−1)(b−1)comparison

Dunnett all comparisons dα(g − 1, (g − 1)(b − 1))with a control

Tukey-Kramer all pairwise qα(g , (g − 1)(b − 1))/√

2comparisons

Bonferroni all pairwise tα/(2r),(g−1)(b−1),

comparisons where r = g(g−1)2

Scheffe all contrasts√

(g − 1)Fα,g−1,(g−1)(b−1)

Chapter 13A - 24

Power and Sample Size Calculation

When Ha is true, treatments have different effects, the ANOVAF -statistic also has a non-central F -distribution

F =MStrt

MSE∼

{Fg−1, (g−1)(b−1) under H0

Fg−1, (g−1)(b−1),δ under Ha

where the non-centrality parameter δ is

δ =

∑gi=1 bα

2i

σ2.

Note that only the degrees of freedom changes from N − g to(g − 1)(b − 1)

Chapter 13A - 25

13.3 Latin Square Designs — Blocking Two Variations Simultaneously

Sometimes there are more than one source of variations ordisturbance that can be eliminated by blocking.

Example Suppose in a farm there is a north-south variation insunlight and east-west variation in soil humidity. It is natural toblock on row and column position of plots.

For example, the treatments are 5 fertilizers A, B, C, D, E.Consider the design

ColumnRow 1 2 3 4 5I A B C D EII C D E A BIII E A B C DIV B C D E AV D E A B C

Each treatment occurs once in each row and in each column.This is called a Latin square. When we estimate treatmenteffects, the row and column effects will cancel out.

Chapter 13A - 26

Example — Automobile Emissions

Variables

Additives: A, B, C, D (chemicals aimed at reducing pollution)

Drivers: I, II, III, IV

Cars: 1, 2, 3, 4

Response: Emission reduction index measured for each testdrive

The experiment

Additives as treatments (i = 1, 2, 3, 4)

Drivers as a block variable (row block j = 1, 2, 3, 4)

Cars as another block variable (column block k = 1, 2, 3, 4)

The combination (driver, car) as experimental units

Latin square of order 4 as the design of the experiment

Chapter 13A - 27

Data and Design for the Automobile Emissions Example

CarsDrivers 1 2 3 4 driver average

I A B D C19 24 23 26 23

II D C A B23 24 19 30 24

III B D C A15 14 15 16 15

IV C A B D19 18 19 16 18

average grand meanper car 19 20 19 22 20

Treatment means

A: 18, B: 22, C: 21, D: 19

Chapter 13A - 28

Data Set of the Automobile Emissions Exampledriver car trt reduct

1 1 1 A 19

2 1 2 B 24

3 1 3 D 23

4 1 4 C 26

5 2 1 D 23

6 2 2 C 24

7 2 3 A 19

8 2 4 B 30

9 3 1 B 15

10 3 2 D 14

11 3 3 C 15

12 3 4 A 16

13 4 1 C 19

14 4 2 A 18

15 4 3 B 19

16 4 4 D 16

1520

2530

Treatment

mea

n of

red

uct

A B C D

car

1342

1520

2530

Treatment

mea

n of

red

uct

A B C D

driver

1243

Are there block effects by drivers, by cars?Which block effect is stronger?

Chapter 13A - 29

Model For a Latin Square DesignIf we label the two blocking variables j and k and the treatment iwith 1 ≤ i , j , k ≤ g , we write

yijk = µ+ αi + βj + γk + εijk

where∑iαi =

∑jβj =

∑kγk = 0, εijk ∼ i .i .d . N(0, σ2)

We have g2 experimental units. For given j and k we only haveone value i(j , k) corresponding to Treatment i .

The design is balanced, so we have the usual estimates:

µ = y•••, αi = yi•• − y•••

βj = y•j• − y•••

γk = y••k − y•••

Statistical analysis can be done using familiar R commands.Chapter 13A - 30

Why Do Latin Squares Design Work?What is the mean for y1••?

Chapter 13A - 31

Decomposition of the Latin Square (Automobile Emissions)1 2 3 4

I A B D C19 24 23 26

II D C A B23 24 19 30

III B D C A15 14 15 16

IV C A B D19 18 19 16

=

y•••

20 20 20 20

20 20 20 20

20 20 20 20

20 20 20 20

+

yijk − y•••

-1 4 3 6

3 4 -1 10

-5 -6 -5 -4

-1 -2 -1 -4

yijk − y•••-1 4 3 63 4 -1 10

-5 -6 -5 -4-1 -2 -1 -4

Y− YSSTotal = 312d .f . = 15

=

y•j• − y•••3 3 3 34 4 4 4-5 -5 -5 -5-2 -2 -2 -2

DSSDriver = 216

d .f . = 3

+

y••k − y•••-1 0 -1 2-1 0 -1 2-1 0 -1 2-1 0 -1 2

CSSCar = 24d .f . = 3

+

yi•• − y•••-2 2 -1 1-1 1 -2 22 -1 1 -21 -2 2 -1

TSSTrtl = 40d .f . = 3

+

residuals

-1 -1 2 01 -1 -2 2

-1 0 0 11 2 0 -3

RSSE = 32d .f . = 6

Chapter 13A - 32

ANOVA Table for a Latin Square — (w/o Replicates)Source d.f. SS MS F -value

Row-Block g − 1 SSrow SSrow/(g − 1) MSrow/MSE

Column-Block g − 1 SScol SScol/(g − 1) MScol/MSE

Treatment g − 1 SSTrt SSTrt/(g − 1) MSTrt/MSE

Error (g−2)(g−1) SSE SSE/[(g−2)(g−1)]

Total g2 − 1 SStotal

where (g − 2)(g − 1) = g2 − 1− 3(g − 1).

SSrow =∑

ijkβ2j =

∑ijk

(y•j• − y•••)2 = g∑

j(y•j• − y•••)2,

SScol =∑

ijkγ2k =

∑ijk

(y••k − y•••)2 = g∑

k(y••k − y•••)2,

SStrt =∑

ijkα2i =

∑ijk

(yi•• − y•••)2 = g∑

i(yi•• − y•••)2,

SSE = g∑

i(yijk − y i•• − y•j• − y••k + 2y••••)2

= SStotal − SSrow − SScol − SStrt ,

SStotal =∑

ijk(yijk − y•••)2

Chapter 13A - 33

Example: ANOVA Table for Automobile Emissions Data

> lm1 = lm(reduct ~ as.factor(car)+as.factor(driver)+as.factor(trt))

> anova(lm1)

Analysis of Variance Table

Response: reduct


as.factor(car) 3 24 8.000 1.5 0.307174

as.factor(driver) 3 216 72.000 13.5 0.004466 **

as.factor(trt) 3 40 13.333 2.5 0.156490

Residuals 6 32 5.333

Chapter 13A - 34

Advantages of Latin Squares

I For the same number of experimental units as a randomizedcomplete block design (RCBD) with g treatments and gblocks, we can simultaneously block for a second variable.

I Provide an elegant and efficient use of limited resources for asmall experiment.

Disadvantages of Latin Squares

I Cannot identify block-treatment interactions.

I There may be few d .f . left to estimate σ once both block andtreatment effects are estimated.E.g., in a 3× 3 square, block and treatment effects taken up3× 2 d .f ., allowing 1 d .f . for the grand mean leaves 2 d .f .for estimating σ, and this includes all the treatments.

I One can (and should) replicate Latin square (Next Lecture).

Chapter 13A - 35

Coming Up Next...

There are several extensions of RCBD

I 13.2 RCBD (done!) . . . . . . . . . . . . . . . . . . . . . 1 blocking variable

I 13.3 Latin Square Design (almost done!) 2 blocking variables

I 13.4 Graeco-Latin Square Design . . . . . . . . 3 blocking variables

I Chapter 14 Incomplete Block Design(what if block size 6= number of treatments?)

Chapter 13A - 36

stat22200 spring 2014 chapter 13a - university of...

Documents