Statistical Modelling Chapter IV

IV. Randomized Complete Block Design (RCBD)

IV.A Design of an RCBD
IV.B Indicator-variable models and estimation for an RCBD
IV.C Hypothesis testing using the ANOVA method for an RCBD
IV.D Diagnostic checking
IV.E Treatment differences
IV.F Fixed versus random effects
IV.G Generalized randomized complete block design


IV.A Design of an RCBD

Definition II.6: A randomized complete block design is one in which the number of experimental units per block is equal to the number of treatments and every treatment occurs once and only once in each block, the order of treatments within a block being randomized.

– b denotes the number of blocks;
– t denotes both the number of units in each block and the number of treatments;
– n = bt denotes the total number of observations.

• In an RCBD, group the units into blocks such that the units within a block are as similar as possible.


Forming blocks in field experiments

• Place plots parallel to the trend and blocks perpendicular to it.

[Figure: a field divided into Blocks I to IV, running from the less stony end of the field to the stonier end.]

• Suppose the trend was not as I thought and instead went across the field.

[Figure: the same Blocks I to IV, with the trend now running from the less stony side of the field to the stony side.]

• Clearly, the blocks would then be similar and the plots within a block different.
• In fact, such an experiment can be less sensitive than a CRD, so getting the blocking wrong can be costly.


a) Obtaining a layout for an RCBD in R

• A general set of expressions for obtaining an RCBD layout is given in Appendix B, Randomized layouts and sample size computations in R.

• To generate a layout for a particular case, substitute
– actual values for b, t and n;
– actual names for Blocks, Units, Treats and the data frame that is to contain them.


Example IV.1 Penicillin yield

• In this example the effects of four treatments (A, B, C and D) on the yield of penicillin are to be investigated.
• Corn steep liquor, an important raw material in producing penicillin, is highly variable from one blending to another.
• To ensure that the results of the experiment apply to more than one blend, several blends are to be used in the experiment.
• The trial was conducted using the same blend in 4 flasks and randomizing the treatments to these 4 flasks.
• Altogether five blends were utilized.
• The crucial feature that makes an RCBD different from a CRD is that
– there are 2 unrandomized factors indexing the units: Blends and Flasks;
– there is nesting between these factors: Flasks are nested within Blends because the treatments are randomized to Flasks within Blends.
• The names to be used for the blocks, units and treatments in this example are Blend, Flask and Treat, respectively.
• Also, b = 5 and t = 4 so that n = 20.
• Assigning these values and substituting these names into the general expressions yields the following output for this case.


> b <- 5
> t <- 4
> n <- b*t
> RCBDPen.unit <- list(Blend=b, Flask=t)
> RCBDPen.nest <- list(Flask = "Blend")
> Treat <- factor(rep(1:t, times=b), labels=c("A","B","C","D"))
> data.frame(fac.gen(RCBDPen.unit), Treat) #basic systematic arrangement
   Blend Flask Treat
1      1     1     A
2      1     2     B
3      1     3     C
4      1     4     D
5      2     1     A
6      2     2     B
7      2     3     C
8      2     4     D
9      3     1     A
10     3     2     B
11     3     3     C
12     3     4     D
13     4     1     A
14     4     2     B
15     4     3     C
16     4     4     D
17     5     1     A
18     5     2     B
19     5     3     C
20     5     4     D
> RCBDPen.lay <- fac.layout(unrandomized = RCBDPen.unit,
+                           nested.factors = RCBDPen.nest,
+                           randomized = Treat, seed = 311)

• The data frame printed above is the systematic arrangement on which the randomization is based.
• The order of Blend and Flask is determined by their order in RCBDPen.unit.
• Flask is a nested factor: it is nested within Blend.

Layout

> RCBDPen.lay
   Units Permutation Blend Flask Treat
1      1          11     1     1     C
2      2          12     1     2     B
3      3          10     1     3     D
4      4           9     1     4     A
5      5          13     2     1     C
6      6          15     2     2     D
7      7          16     2     3     B
8      8          14     2     4     A
9      9           8     3     1     D
10    10           7     3     2     C
11    11           5     3     3     A
12    12           6     3     4     B
13    13          17     4     1     A
14    14          19     4     2     D
15    15          20     4     3     B
16    16          18     4     4     C
17    17           4     5     1     A
18    18           2     5     2     D
19    19           1     5     3     B
20    20           3     5     4     C

• So with the first blend, the Treatments are to be done in the order C, B, D, A.
• This layout is said to be in standard order for Blend then Flask.
• In general, the first factor changes slowest and the last fastest.
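The randomization itself can be sketched without the dae package. The following is a minimal base-R illustration, not the Appendix B expressions: it simply draws a fresh random permutation of the treatments for each blend. The seed value and the names `trts` and `lay` are assumptions made for the example, and the result will not reproduce the fac.layout output above.

```r
# Sketch: randomize t treatments independently within each of b blocks (base R only).
set.seed(1)                        # arbitrary seed; does NOT reproduce the dae layout
b <- 5                             # number of blocks (Blends)
t <- 4                             # units per block = number of treatments
trts <- c("A", "B", "C", "D")      # hypothetical helper name for the treatment labels

# Units in standard order: Blend changes slowest, Flask fastest
lay <- data.frame(Blend = rep(1:b, each = t),
                  Flask = rep(1:t, times = b))

# One random permutation of the treatments per block
lay$Treat <- factor(unlist(lapply(seq_len(b), function(i) sample(trts))),
                    levels = trts)

print(lay)
table(lay$Blend, lay$Treat)        # every treatment should occur once in every blend
```

The final table is a quick check of the defining property of the design: each cell should contain exactly one observation.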


IV.B Indicator-variable models and estimation for an RCBD

a) Maximal model

• The maximal model used for an RCBD is:

$\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$ and $\mathrm{var}[\mathbf{Y}] = \sigma^2\mathbf{I}_n$

where
– $\mathbf{Y}$ is the n-vector of random variables for the response variable observations,
– $\boldsymbol{\beta}$ is the b-vector of parameters specifying a different mean response for each block,
– $\mathbf{X}_B$ is the $n \times b$ matrix indicating the block from which an observation came,
– $\boldsymbol{\tau}$ is the t-vector of parameters specifying a different mean response for each treatment,
– $\mathbf{X}_T$ is the $n \times t$ matrix indicating the observations that received each of the treatments.


Example IV.1 Penicillin yield (continued)

• The yields of penicillin, in nonrandom order, are:

              Treatment
Blend     A     B     C     D
  1      89    88    97    94
  2      84    77    92    79
  3      81    87    87    85
  4      87    92    89    84
  5      79    81    80    88

• Initial exploration of the data: are there differences?

[Figure: two plots of Yield, one against Blend (1 to 5) and one against Treatment (A to D); both Yield axes run from 80 to 95.]

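Yields in a vector in standard order for Blend then Treatment

$\mathbf{y} = [89\ 88\ 97\ 94\ \ 84\ 77\ 92\ 79\ \ 81\ 87\ 87\ 85\ \ 87\ 92\ 89\ 84\ \ 79\ 81\ 80\ 88]'$

$\mathbf{X}_B = \begin{bmatrix} \mathbf{1}_4 & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_4 & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{1}_4 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{1}_4 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{1}_4 \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{bmatrix}, \quad \mathbf{X}_T = \begin{bmatrix} \mathbf{I}_4 \\ \mathbf{I}_4 \\ \mathbf{I}_4 \\ \mathbf{I}_4 \\ \mathbf{I}_4 \end{bmatrix}, \quad \boldsymbol{\tau} = \begin{bmatrix} \tau_1 \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix}$

where $\mathbf{1}_4$ is a 4-vector of ones, $\mathbf{0}$ is a 4-vector of zeros and $\mathbf{I}_4$ is the identity matrix of order 4. That is, $\mathbf{X}_B$ is the $20 \times 5$ matrix whose ith column has ones for the four observations from blend i, and $\mathbf{X}_T$ is the $20 \times 4$ matrix whose jth column has ones for the five observations that received treatment j.

• The observations are in the same order as the systematic layout, i.e. the pre-randomization layout.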
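The indicator matrices for the example can be generated rather than typed. The sketch below uses base R's model.matrix with the intercept removed, which is one convenient way to obtain a full set of indicator columns; the object names `XB` and `XT` are chosen here for illustration.

```r
# Yields in standard order for Blend then Treatment
y <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
       87, 92, 89, 84,  79, 81, 80, 88)
Blend <- factor(rep(1:5, each = 4))
Treat <- factor(rep(c("A", "B", "C", "D"), times = 5))

XB <- model.matrix(~ Blend - 1)    # 20 x 5 indicator matrix for blends
XT <- model.matrix(~ Treat - 1)    # 20 x 4 indicator matrix for treatments

dim(XB); dim(XT)
colSums(XB)                        # 4 observations per blend
colSums(XT)                        # 5 observations per treatment

# X = [XB XT] has 9 columns but is not of full column rank, because the
# columns of XB and the columns of XT each sum to the vector of ones.
qr(cbind(XB, XT))$rank
```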


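Estimator of expected values

• Our model also assumes $\mathbf{Y} \sim N(\boldsymbol{\psi}_{B+T}, \mathbf{V})$, with $\mathbf{V} = \sigma^2\mathbf{I}_n$.

• The model for the expectation is still of the form $E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ with $\mathbf{X} = [\mathbf{X}_B\ \mathbf{X}_T]$ and $\boldsymbol{\theta}' = [\boldsymbol{\beta}'\ \boldsymbol{\tau}']$.

• It can be shown that

$\hat{\boldsymbol{\psi}}_{B+T} = (\mathbf{M}_B + \mathbf{M}_T - \mathbf{M}_G)\mathbf{Y} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$

where $\mathbf{M}_B$, $\mathbf{M}_T$ and $\mathbf{M}_G$ are the block, treatment and grand mean operators, respectively, and $\bar{\mathbf{B}}$, $\bar{\mathbf{T}}$ and $\bar{\mathbf{G}}$ are the n-vectors of block, treatment and grand means, respectively.

• Note that $\bar{\mathbf{B}} = \mathbf{M}_B\mathbf{Y}$, $\bar{\mathbf{T}} = \mathbf{M}_T\mathbf{Y}$ and $\bar{\mathbf{G}} = \mathbf{M}_G\mathbf{Y}$.

• So once again the estimators of the expected values are functions of means.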


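Mean operators

• Suppose the data are arranged in the vector $\mathbf{Y}$ in nonrandomized order with all the observations for a block placed together.
– That is, in standard order for blocks then treatments.

• Then the mean operators are:

$\mathbf{M}_G = n^{-1}\mathbf{J}_n = n^{-1}\,\mathbf{J}_b \otimes \mathbf{J}_t$

$\mathbf{M}_B = t^{-1}\,\mathbf{I}_b \otimes \mathbf{J}_t$

$\mathbf{M}_T = b^{-1}\,\mathbf{J}_b \otimes \mathbf{I}_t$

where $\otimes$ is called the direct product operator and,

• if $\mathbf{A}_r$ and $\mathbf{B}_c$ are square matrices of order r and c, then

$\mathbf{A}_r \otimes \mathbf{B}_c = \begin{bmatrix} a_{11}\mathbf{B} & \cdots & a_{1r}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{r1}\mathbf{B} & \cdots & a_{rr}\mathbf{B} \end{bmatrix}$

• The mean operators are simpler than for the CRD: the divisors are factored out, leaving matrices of 0s and 1s.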
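The direct-product form of the mean operators translates directly into base R's kronecker function. The following sketch builds the three operators for b = 5 and t = 4 and checks that each is symmetric and idempotent; the helper names `J` and `is.proj`, and the use of `nt` for the number of treatments, are introduced only for this illustration.

```r
b  <- 5                            # number of blocks
nt <- 4                            # number of treatments (units per block)
n  <- b * nt

J <- function(k) matrix(1, k, k)   # k x k matrix of ones (hypothetical helper)

MG <- kronecker(J(b),    J(nt))    / n    # grand mean operator
MB <- kronecker(diag(b), J(nt))    / nt   # block mean operator
MT <- kronecker(J(b),    diag(nt)) / b    # treatment mean operator

# A mean operator should be symmetric and idempotent
is.proj <- function(M) {
  isTRUE(all.equal(M, t(M))) && isTRUE(all.equal(M %*% M, M))
}
c(MG = is.proj(MG), MB = is.proj(MB), MT = is.proj(MT))

# MG is the same as the n x n matrix with every element 1/n
isTRUE(all.equal(MG, J(n) / n))
```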


Grand mean operator for standard order

$\mathbf{M}_G = \frac{1}{20}\,\mathbf{J}_5 \otimes \mathbf{J}_4 = \frac{1}{20}\begin{bmatrix} \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 \\ \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 \\ \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 \\ \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 \\ \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 \end{bmatrix} = \frac{1}{20}\,\mathbf{J}_{20}$

That is, $\mathbf{M}_G$ is the $20 \times 20$ matrix with every element equal to 1/20.


Block mean operator for standard order

$\mathbf{M}_B = \frac{1}{4}\,\mathbf{I}_5 \otimes \mathbf{J}_4 = \frac{1}{4}\begin{bmatrix} \mathbf{J}_4 & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{J}_4 & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{J}_4 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{J}_4 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{J}_4 \end{bmatrix}$

That is, $\mathbf{M}_B$ is the $20 \times 20$ block-diagonal matrix in which each $4 \times 4$ diagonal block has every element equal to 1/4 and all other elements are zero.


Treatment mean operator for standard order

$\mathbf{M}_T = \frac{1}{5}\,\mathbf{J}_5 \otimes \mathbf{I}_4 = \frac{1}{5}\begin{bmatrix} \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 \\ \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 \\ \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 \\ \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 \\ \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 \end{bmatrix}$

That is, row k of $\mathbf{M}_T$ has 1/5 in the five columns corresponding to the observations that received the same treatment as observation k, and zeros elsewhere.

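Estimators for example

• For the example, with the observations in standard order for Blend then Treatment, the three n-vectors of means are

$\bar{\mathbf{B}} = [\bar{B}_1\ \bar{B}_1\ \bar{B}_1\ \bar{B}_1\ \ \bar{B}_2\ \ldots\ \bar{B}_5]'$, $\bar{\mathbf{T}} = [\bar{T}_A\ \bar{T}_B\ \bar{T}_C\ \bar{T}_D\ \ \bar{T}_A\ \ldots\ \bar{T}_D]'$ and $\bar{\mathbf{G}} = [\bar{G}\ \bar{G}\ \ldots\ \bar{G}]'$

• so that the element of $\hat{\boldsymbol{\psi}}_{B+T} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$ for the observation from blend i that received treatment j is $\bar{B}_i + \bar{T}_j - \bar{G}$.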

Estimates for the example

• The means are in the following table:

              Treatment
Blend     A     B     C     D   Means
  1      89    88    97    94     92
  2      84    77    92    79     83
  3      81    87    87    85     85
  4      87    92    89    84     88
  5      79    81    80    88     82
Means    84    85    89    86     86

• The vectors of block means $\mathbf{b}$, treatment means $\mathbf{t}$ and grand means $\mathbf{g}$, and the fitted values $\hat{\boldsymbol{\psi}}_{B+T} = \mathbf{b} + \mathbf{t} - \mathbf{g}$, are:

Blend  Treat    b    t    g   b + t - g
  1      A     92   84   86      90
  1      B     92   85   86      91
  1      C     92   89   86      95
  1      D     92   86   86      92
  2      A     83   84   86      81
  2      B     83   85   86      82
  2      C     83   89   86      86
  2      D     83   86   86      83
  3      A     85   84   86      83
  3      B     85   85   86      84
  3      C     85   89   86      88
  3      D     85   86   86      85
  4      A     88   84   86      86
  4      B     88   85   86      87
  4      C     88   89   86      91
  4      D     88   86   86      88
  5      A     82   84   86      80
  5      B     82   85   86      81
  5      C     82   89   86      85
  5      D     82   86   86      82

• These fitted values are different for each block-treatment combination but display an additive pattern.
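The fitted values can be reproduced in a few lines of base R. This sketch forms the vectors of means with ave and compares the result with the fitted values from an additive linear model fitted with lm; the object names are chosen for the illustration only.

```r
# Yields in standard order for Blend then Treatment
y <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
       87, 92, 89, 84,  79, 81, 80, 88)
Blend <- factor(rep(1:5, each = 4))
Treat <- factor(rep(c("A", "B", "C", "D"), times = 5))

g  <- rep(mean(y), length(y))      # n-vector of grand means
bm <- ave(y, Blend)                # n-vector of blend means
tm <- ave(y, Treat)                # n-vector of treatment means

fit <- bm + tm - g                 # fitted values under the additive model

# Same values as the least-squares fit of the additive model
fit.lm <- unname(fitted(lm(y ~ Blend + Treat)))
isTRUE(all.equal(fit, fit.lm))

# Display as a Blend x Treatment table to see the additive pattern
matrix(fit, nrow = 5, byrow = TRUE,
       dimnames = list(Blend = 1:5, Treat = levels(Treat)))
```

In the displayed table, the difference between any two treatment columns is the same in every row, which is what additivity means here.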


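Additivity

• The fitted values are those for a model that is additive in the Block and Treatment parameters:
– $\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$.

• So its fitted values display an additive pattern:

$\hat{\boldsymbol{\psi}}_{B+T} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$

• We hope that this is an adequate description of the data.

• In one direction, the fitted values show the same trend as the means.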


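b) Alternative expectation models

• There are 4 possible different models for the expectation that we consider:

$\boldsymbol{\psi}_G = \mathbf{X}_G\mu$ (no treatment or block differences)
$\boldsymbol{\psi}_B = \mathbf{X}_B\boldsymbol{\beta}$ (block differences only)
$\boldsymbol{\psi}_T = \mathbf{X}_T\boldsymbol{\tau}$ (treatment differences only)
$\boldsymbol{\psi}_{B+T} = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$ (block and treatment differences)

where $\mathbf{X}_G = \mathbf{1}_n$.

• Note that $\mathcal{C}(\mathbf{X}_G) \subset \mathcal{C}(\mathbf{X}_B) \subset \mathcal{C}([\mathbf{X}_B\ \mathbf{X}_T])$ and $\mathcal{C}(\mathbf{X}_G) \subset \mathcal{C}(\mathbf{X}_T) \subset \mathcal{C}([\mathbf{X}_B\ \mathbf{X}_T])$, where $\mathcal{C}(\mathbf{X})$ denotes the column space of $\mathbf{X}$.

• Consequently: $\boldsymbol{\psi}_G \le \boldsymbol{\psi}_B, \boldsymbol{\psi}_T, \boldsymbol{\psi}_{B+T}$ and $\boldsymbol{\psi}_B, \boldsymbol{\psi}_T \le \boldsymbol{\psi}_{B+T}$; that is, each model on the left is a special case of the models on the right.

• Also note that, like the CRD, the models $\boldsymbol{\psi}_B$ and $\boldsymbol{\psi}_T$ can be obtained from $\boldsymbol{\psi}_{B+T}$ by setting either $\boldsymbol{\tau}$ or $\boldsymbol{\beta}$ equal to zero, and $\boldsymbol{\psi}_G$ can be obtained from $\boldsymbol{\psi}_B$ and $\boldsymbol{\psi}_T$ by setting $\boldsymbol{\beta} = \mu\mathbf{1}_b$ and $\boldsymbol{\tau} = \mu\mathbf{1}_t$, respectively.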


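Estimators of expected values

• Estimators of the expected values under the different models:

$\hat{\boldsymbol{\psi}}_G = \bar{\mathbf{G}}$ (no treatment or block differences)
$\hat{\boldsymbol{\psi}}_B = \bar{\mathbf{B}}$ (block differences only)
$\hat{\boldsymbol{\psi}}_T = \bar{\mathbf{T}}$ (treatment differences only)
$\hat{\boldsymbol{\psi}}_{B+T} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$ (block and treatment differences)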
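To see how the four expectation models compare on the penicillin data, each can be fitted by least squares and its residual sum of squares inspected. This is a sketch using lm from base R, not the method developed in these notes; the list name `fits` is an assumption of the example.

```r
# Yields in standard order for Blend then Treatment
y <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
       87, 92, 89, 84,  79, 81, 80, 88)
Blend <- factor(rep(1:5, each = 4))
Treat <- factor(rep(c("A", "B", "C", "D"), times = 5))

# The four alternative expectation models
fits <- list("G"   = lm(y ~ 1),
             "B"   = lm(y ~ Blend),
             "T"   = lm(y ~ Treat),
             "B+T" = lm(y ~ Blend + Treat))

# Residual sum of squares left by each model (deviance of an lm is its RSS)
rss <- sapply(fits, deviance)
rss

# Fitted values under the blocks-only model are just the blend means
isTRUE(all.equal(unname(fitted(fits[["B"]])), ave(y, Blend)))
```

The residual sums of squares fall as terms are added, from the grand-mean model to the additive model.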


IV.C Hypothesis testing using the ANOVA method for an RCBD

• An ANOVA will be used to choose between the 4 alternative expectation models for an RCBD.

a) Analysis of the penicillin example

Example IV.1 Penicillin yield (continued)

• The hypothesis test for the example RCBD is as follows:

Step 1: Set up hypotheses
a) H0: $\tau_1 = \tau_2 = \tau_3 = \tau_4$ (or $\mathbf{X}_T$ not required in model)
   H1: not all population Treatment means are equal
b) H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5$ (or $\mathbf{X}_B$ not required in model)
   H1: not all population Blend means are equal

Set $\alpha = 0.05$.


Hypothesis test

Step 2: Calculate test statistics

• The analysis of variance table for an RCBD is:

Source             df   SSq    MSq     F     Prob
Blends              4   264   66.0   3.50   0.041
Flasks[Blends]     15   296
  Treatments        3    70   23.3   1.24   0.339
  Residual         12   226   18.8
Total              19   560

• Note that Flasks[Blends] in this table means "Flasks within Blends".

Step 3: Decide between hypotheses

• It would appear that there are significant differences between the blends but not between the treatments, so that the expectation model that best describes the response appears to be $\boldsymbol{\psi}_B = \mathbf{X}_B\boldsymbol{\beta}$.
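The F ratios and probabilities in the table follow directly from the sums of squares and degrees of freedom. A minimal sketch of the arithmetic in base R, with variable names chosen for the illustration:

```r
# Sums of squares and degrees of freedom from the ANOVA table
ms.blend <- 264 / 4                # Blends MSq
ms.treat <- 70 / 3                 # Treatments MSq
ms.res   <- 226 / 12               # Residual MSq

F.blend <- ms.blend / ms.res
F.treat <- ms.treat / ms.res

# Upper-tail probabilities of the F distributions
p.blend <- pf(F.blend, 4, 12, lower.tail = FALSE)
p.treat <- pf(F.treat, 3, 12, lower.tail = FALSE)

data.frame(Source = c("Blends", "Treatments"),
           F = c(F.blend, F.treat),
           Prob = c(p.blend, p.treat))
```

With alpha = 0.05, only the Blends probability falls below the significance level.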


Blocking effectiveness

• In our RCBD example there were significant differences between the blends so that the blocking based on blends has been effective.

• Turns out that, if the units within a block are as similar as possible, there will be block differences.

• If a CRD had been used,
– that is, 4 treatments randomized to 20 flasks irrespective of blends, then
– the CRD Residual SSq ≈ Blend SSq + RCBD Residual SSq,
– viz. 264 + 226 = 490 on 20 − 4 = 16 df, giving a mean square of 490/16 = 30.625.
– That is, the Residual MSq would have been about 1.6 times as large (30.6 vs 18.8) and the experiment much less sensitive.
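The comparison with a CRD is simple arithmetic and can be laid out as follows. This is only a sketch of the calculation described above; the variable names are illustrative.

```r
ss.blend <- 264                    # Blend SSq from the RCBD analysis
ss.res   <- 226                    # RCBD Residual SSq
n <- 20                            # total number of flasks
t <- 4                             # number of treatments

# In a CRD the blend differences would be part of the Residual
crd.res.ss <- ss.blend + ss.res    # approximate CRD Residual SSq
crd.res.df <- n - t                # CRD Residual df
crd.res.ms <- crd.res.ss / crd.res.df

rcbd.res.ms <- ss.res / 12         # RCBD Residual MSq on (b-1)(t-1) = 12 df

c(CRD = crd.res.ms, RCBD = rcbd.res.ms, ratio = crd.res.ms / rcbd.res.ms)
```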


b) Sums of squares for the analysis of variance

• In this section we will use the generic names of Blocks, Units and Treatments for the factors in an RCBD.

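• The estimators of the SSqs for the RCBD ANOVA are the SSqs of the following vectors:

Total or Units SSq:   $\mathbf{D}_G = \mathbf{Y} - \bar{\mathbf{G}}$
Blocks SSq:           $\mathbf{B}_e = \bar{\mathbf{B}} - \bar{\mathbf{G}}$
Units[Blocks] SSq:    $\mathbf{D}_B = \mathbf{Y} - \bar{\mathbf{B}}$
Treatments SSq:       $\mathbf{T}_e = \bar{\mathbf{T}} - \bar{\mathbf{G}}$
Residual SSq:         $\mathbf{D}_{B+T} = \mathbf{Y} - \bar{\mathbf{B}} - \bar{\mathbf{T}} + \bar{\mathbf{G}}$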


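SSq (continued)

• From section IV.B, Indicator-variable models and estimation for an RCBD, we have that

$\bar{\mathbf{G}} = \mathbf{M}_G\mathbf{Y} = n^{-1}\,\mathbf{J}_b \otimes \mathbf{J}_t\,\mathbf{Y}$

$\bar{\mathbf{B}} = \mathbf{M}_B\mathbf{Y} = t^{-1}\,\mathbf{I}_b \otimes \mathbf{J}_t\,\mathbf{Y}$

$\bar{\mathbf{T}} = \mathbf{M}_T\mathbf{Y} = b^{-1}\,\mathbf{J}_b \otimes \mathbf{I}_t\,\mathbf{Y}$

and let $\mathbf{Y} = \mathbf{M}_U\mathbf{Y} = \mathbf{I}_b \otimes \mathbf{I}_t\,\mathbf{Y}$.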


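SSq (continued)

• It can be shown that the SSqs for the ANOVA are given by

$\mathbf{D}_G'\mathbf{D}_G = (\mathbf{Y} - \bar{\mathbf{G}})'(\mathbf{Y} - \bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$ with $\mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_G$

$\mathbf{B}_e'\mathbf{B}_e = (\bar{\mathbf{B}} - \bar{\mathbf{G}})'(\bar{\mathbf{B}} - \bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$ with $\mathbf{Q}_B = \mathbf{M}_B - \mathbf{M}_G$

$\mathbf{D}_B'\mathbf{D}_B = (\mathbf{Y} - \bar{\mathbf{B}})'(\mathbf{Y} - \bar{\mathbf{B}}) = \mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$ with $\mathbf{Q}_{BU} = \mathbf{M}_U - \mathbf{M}_B$

$\mathbf{T}_e'\mathbf{T}_e = (\bar{\mathbf{T}} - \bar{\mathbf{G}})'(\bar{\mathbf{T}} - \bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$ with $\mathbf{Q}_T = \mathbf{M}_T - \mathbf{M}_G$

$\mathbf{D}_{B+T}'\mathbf{D}_{B+T} = (\mathbf{Y} - \bar{\mathbf{B}} - \bar{\mathbf{T}} + \bar{\mathbf{G}})'(\mathbf{Y} - \bar{\mathbf{B}} - \bar{\mathbf{T}} + \bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}$ with $\mathbf{Q}_{BU_{Res}} = \mathbf{M}_U - \mathbf{M}_T - \mathbf{M}_B + \mathbf{M}_G$

• All the Ms and Qs are symmetric and idempotent.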
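These quadratic forms can be evaluated numerically for the penicillin yields. The sketch below builds the M and Q matrices with kronecker in base R, checks idempotency and computes the five SSqs; the helper `J`, the list `Q` and the use of `nt` for the number of treatments are assumptions of the example.

```r
# Yields in standard order for Blend then Treatment
y <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
       87, 92, 89, 84,  79, 81, 80, 88)
b <- 5; nt <- 4; n <- b * nt
J <- function(k) matrix(1, k, k)   # k x k matrix of ones (hypothetical helper)

MU <- diag(n)
MG <- kronecker(J(b),    J(nt))    / n
MB <- kronecker(diag(b), J(nt))    / nt
MT <- kronecker(J(b),    diag(nt)) / b

Q <- list(U     = MU - MG,
          B     = MB - MG,
          BU    = MU - MB,
          T     = MT - MG,
          BURes = MU - MT - MB + MG)

# Each Q should be idempotent
idem <- sapply(Q, function(q) isTRUE(all.equal(q %*% q, q)))
idem

# Sums of squares as quadratic forms y'Qy
ssq <- sapply(Q, function(q) drop(t(y) %*% q %*% y))
ssq
```

The Units[Blocks] SSq equals the sum of the Treatments and Residual SSqs, and the Units SSq equals the sum of the Blocks and Units[Blocks] SSqs.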


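ANOVA table is constructed as follows:

Source            df            SSq                                       MSq                                                                     F                          p
Blocks            b−1           $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$              $s_B^2 = \mathbf{Y}'\mathbf{Q}_B\mathbf{Y}/(b-1)$                              $s_B^2 / s_{BU_{Res}}^2$    $p_B$
Units[Blocks]     b(t−1)        $\mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$
  Treatments      t−1           $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$              $s_T^2 = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$                              $s_T^2 / s_{BU_{Res}}^2$    $p_T$
  Residual        (b−1)(t−1)    $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}$     $s_{BU_{Res}}^2 = \mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}/[(b-1)(t-1)]$
Total             bt−1          $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$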


Geometrical interpretation

Source            df            SSq
Blocks            b−1           $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$
Units[Blocks]     b(t−1)        $\mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$
  Treatments      t−1           $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$
  Residual        (b−1)(t−1)    $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}$
Total             bt−1          $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$

• The matrix $\mathbf{Q}_U$ orthogonally projects the data vector into the (bt−1)-dimensional part of the bt-dimensional data space that is orthogonal to the equiangular line.

• This is partitioned, by $\mathbf{Q}_B$ and $\mathbf{Q}_{BU}$, into two subspaces:
a) the (b−1)-dimensional part of the b-dimensional Block space that is orthogonal to the equiangular line, and
b) the b(t−1)-dimensional Units[Blocks] space.

• The latter space is then partitioned, by $\mathbf{Q}_T$ and $\mathbf{Q}_{BU_{Res}}$, into two subspaces:
a) the (t−1)-dimensional part of the t-dimensional Treatment space that is orthogonal to the equiangular line, and
b) the (b−1)(t−1)-dimensional Residual subspace.

• That is, the Units space is divided into three orthogonal subspaces:
– the Blocks subspace,
– the Treatments subspace,
– the Residual subspace.

• Here the Block and Treatment spaces are the column spaces of the matrices $\mathbf{X}_B$ and $\mathbf{X}_T$, respectively.


Example IV.1 Penicillin yield (continued)

• The effects needed for the analysis have been added to the means in the following table:

               Treatment
Blend      A     B     C     D   Means  Effects
  1       89    88    97    94     92      6
  2       84    77    92    79     83     -3
  3       81    87    87    85     85     -1
  4       87    92    89    84     88      2
  5       79    81    80    88     82     -4
Means     84    85    89    86     86
Effects   -2    -1     3     0              0


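Vectors for SSq

• The vectors whose sums of squares give the SSqs are:
– $\mathbf{d}_G = \mathbf{Q}_U\mathbf{y} = \mathbf{y} - \mathbf{g}$, the total Flask deviations;
– $\mathbf{b}_e = \mathbf{Q}_B\mathbf{y} = \mathbf{b} - \mathbf{g}$, the Blend effects;
– $\mathbf{d}_B = \mathbf{Q}_{BF}\mathbf{y} = \mathbf{y} - \mathbf{b}$, the Flask[Blend] deviations;
– $\mathbf{t}_e = \mathbf{Q}_T\mathbf{y} = \mathbf{t} - \mathbf{g}$, the Treat effects;
– $\mathbf{d}_{B+T} = \mathbf{Q}_{BF_{Res}}\mathbf{y} = \mathbf{y} - \mathbf{t} - \mathbf{b} + \mathbf{g}$, the Residual Flask[Blend] deviations.

Treat   Yield y   d_G   b_e   d_B   t_e   d_B+T
  A       89        3     6    -3    -2     -1
  B       88        2     6    -4    -1     -3
  C       97       11     6     5     3      2
  D       94        8     6     2     0      2
  A       84       -2    -3     1    -2      3
  B       77       -9    -3    -6    -1     -5
  C       92        6    -3     9     3      6
  D       79       -7    -3    -4     0     -4
  A       81       -5    -1    -4    -2     -2
  B       87        1    -1     2    -1      3
  C       87        1    -1     2     3     -1
  D       85       -1    -1     0     0      0
  A       87        1     2    -1    -2      1
  B       92        6     2     4    -1      5
  C       89        3     2     1     3     -2
  D       84       -2     2    -4     0     -4
  A       79       -7    -4    -3    -2     -1
  B       81       -5    -4    -1    -1      0
  C       80       -6    -4    -2     3     -5
  D       88        2    -4     6     0      6
SSq               560   264   296    70    226

• Units SSq is $\mathbf{y}'\mathbf{Q}_U\mathbf{y} = 560$, Blend SSq is $\mathbf{y}'\mathbf{Q}_B\mathbf{y} = 264$, Flask[Blend] SSq is $\mathbf{y}'\mathbf{Q}_{BF}\mathbf{y} = 296$, Treatments SSq is $\mathbf{y}'\mathbf{Q}_T\mathbf{y} = 70$ and Residual SSq is 226.

• Note the orthogonal decomposition $\mathbf{y} = \mathbf{g} + \mathbf{b}_e + \mathbf{t}_e + \mathbf{d}_{B+T}$.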


c) Expected mean squares

• The Residual MSq estimates the uncontrolled variation,
– that is, the variation arising from uncontrolled differences between units within the same block, both treatment and block differences having been eliminated.

• To justify the choice of test statistic, we want to work out the E[MSq]s under the 4 alternative expectation models.

• E[MSq]s under the maximal model $\boldsymbol{\psi}_{B+T}$:

Source            df            MSq                                                     E[MSq]
Blocks            b−1           $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}/(b-1)$                      $\sigma^2 + q_B(\boldsymbol{\psi})$
Units[Blocks]     b(t−1)
  Treatments      t−1           $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$                      $\sigma^2 + q_T(\boldsymbol{\psi})$
  Residual        (b−1)(t−1)    $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}/[(b-1)(t-1)]$      $\sigma^2$
Total             bt−1


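E[MSq]s under the 4 alternative expectation models

                                                                              E[MSq] under
Source            df            MSq                                           $\boldsymbol{\psi}_{B+T}$                 $\boldsymbol{\psi}_T$                     $\boldsymbol{\psi}_B$                     $\boldsymbol{\psi}_G$
Blocks            b−1           $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}/(b-1)$            $\sigma^2 + q_B(\boldsymbol{\psi})$      $\sigma^2$                          $\sigma^2 + q_B(\boldsymbol{\psi})$      $\sigma^2$
Units[Blocks]     b(t−1)
  Treatments      t−1           $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$            $\sigma^2 + q_T(\boldsymbol{\psi})$      $\sigma^2 + q_T(\boldsymbol{\psi})$      $\sigma^2$                          $\sigma^2$
  Residual        (b−1)(t−1)    $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}/[(b-1)(t-1)]$   $\sigma^2$                     $\sigma^2$                          $\sigma^2$                          $\sigma^2$
Total             bt−1

where

$q_B(\boldsymbol{\psi}) = \dfrac{\boldsymbol{\psi}'\mathbf{Q}_B\boldsymbol{\psi}}{b-1} = \dfrac{t\sum_{i=1}^{b}(\beta_i - \bar{\beta}_.)^2}{b-1}$ and $q_T(\boldsymbol{\psi}) = \dfrac{\boldsymbol{\psi}'\mathbf{Q}_T\boldsymbol{\psi}}{t-1} = \dfrac{b\sum_{j=1}^{t}(\tau_j - \bar{\tau}_.)^2}{t-1}$

• Once again the numerator of
– $q_B(\boldsymbol{\psi})$ is the SSq of $\mathbf{Q}_B\boldsymbol{\psi} = (\mathbf{M}_B - \mathbf{M}_G)\boldsymbol{\psi}$,
– $q_T(\boldsymbol{\psi})$ is the SSq of $\mathbf{Q}_T\boldsymbol{\psi} = (\mathbf{M}_T - \mathbf{M}_G)\boldsymbol{\psi}$,
– where $\boldsymbol{\psi}$ depends on the model.

• The expressions for $q_B(\boldsymbol{\psi})$ and $q_T(\boldsymbol{\psi})$ above are those under the maximal model.
• To obtain those for the reduced models, set the $\beta_i$s and $\tau_j$s to 0 or to $\mu$.
• We could compute the population means of the MSqs if we knew the $\beta_i$s, $\tau_j$s and $\sigma^2$.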
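To make the q functions concrete, they can be evaluated for some made-up parameter values. In the sketch below the values of `beta` and `tau` are purely hypothetical, chosen only for illustration; the code checks that the quadratic-form and summation versions of each q function agree under the maximal model.

```r
b <- 5; nt <- 4; n <- b * nt
J <- function(k) matrix(1, k, k)   # k x k matrix of ones (hypothetical helper)

# Hypothetical parameter values, for illustration only
beta <- c(92, 83, 85, 88, 82)      # block parameters
tau  <- c(-2, -1, 3, 0)            # treatment parameters

# psi = XB beta + XT tau, in standard order for blocks then treatments
psi <- rep(beta, each = nt) + rep(tau, times = b)

MG <- kronecker(J(b),    J(nt))    / n
MB <- kronecker(diag(b), J(nt))    / nt
MT <- kronecker(J(b),    diag(nt)) / b
QB <- MB - MG
QT <- MT - MG

# Quadratic-form versions
qB.quad <- drop(t(psi) %*% QB %*% psi) / (b - 1)
qT.quad <- drop(t(psi) %*% QT %*% psi) / (nt - 1)

# Summation versions
qB.sum <- nt * sum((beta - mean(beta))^2) / (b - 1)
qT.sum <- b  * sum((tau  - mean(tau))^2)  / (nt - 1)

c(qB.quad = qB.quad, qB.sum = qB.sum, qT.quad = qT.quad, qT.sum = qT.sum)
```

Setting all elements of `tau` equal makes the treatment q function zero, which corresponds to a model without treatment differences.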


Justifying the F ratios

• It is clear from these E[MSq]s that, if the Treatments F is not significant, then a model not involving $\mathbf{X}_T$ is required,
– as those models are the ones for which $q_T(\boldsymbol{\psi}) = 0$.

• Similarly, if the Blocks F is not significant, then a model not involving $\mathbf{X}_B$ is required.

• In the case where both are not significant, the minimal model adequately describes the data.

• Generally, we will only present the E[MSq]s under the maximal model, realizing that $q(\boldsymbol{\psi}) = 0$ under the H0 that removes the term from the model.


Potential contributors to block and treatment mean differences

              Treatment
Blend     A     B     C     D   Means
  1      89    88    97    94     92
  2      84    77    92    79     83
  3      81    87    87    85     85
  4      87    92    89    84     88
  5      79    81    80    88     82
Means    84    85    89    86     86

• Two treatment means will differ because of the different treatments involved and because of the different runs (the units in this example) involved in the observations from which the means are calculated;

• but block differences will not contribute to treatment mean differences, as all treatments involve the same set of blocks.

• The E[MSq]s reflect this fact.

• The Treatment F again involves the question:
– "Is the variance of the treatment means greater than can be expected from uncontrolled differences between the runs?"


d) Summary of the hypothesis test

• See notes


e) Comparison with traditional two-way ANOVA

• As for the analysis for the CRD, the above and the traditional two-way ANOVA tables are essentially the same: the values of the F-statistics are exactly the same.

Source            df            Source in two-way ANOVA
Blocks            b−1           Between Blocks
Units[Blocks]     b(t−1)
  Treatments      t−1           Between Treatments
  Residual        (b−1)(t−1)    Error
Total             bt−1          Total

• As Treatments is indented under Units[Blocks], we see that Treatments is confounded with Units[Blocks].

• Residual is the inherent variability of Units; it is less clear what Error refers to.

– The two tables have in common 3 sources that are labelled differently,

– but the tables differ in that our table includes the line Units[Blocks], the source that is partitioned.


f) Computation of the ANOVA in R

• The expressions for analyzing a randomized complete block design are summarized in Appendix C, Analysis of designed experiments in R.



• First the data are entered into a data frame so that it contains
– the factors Blend, Flask and Treat and
– the numeric vector Yield.

• Here the data are in nonrandom order.

> RCBDPen.dat
   Blend Flask Treat Yield
1      1     1     A    89
2      1     2     B    88
3      1     3     C    97
4      1     4     D    94
5      2     1     A    84
6      2     2     B    77
7      2     3     C    92
8      2     4     D    79
9      3     1     A    81
10     3     2     B    87
11     3     3     C    87
12     3     4     D    85
13     4     1     A    87
14     4     2     B    92
15     4     3     C    89
16     4     4     D    84
17     5     1     A    79
18     5     2     B    81
19     5     3     C    80
20     5     4     D    88
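The notes show the contents of the data frame but not how it was entered. One way to construct it in base R, offered as a sketch rather than as the expressions in Appendix C, is:

```r
# Build the data frame in nonrandom (standard) order for Blend then Flask
RCBDPen.dat <- data.frame(
  Blend = factor(rep(1:5, each = 4)),
  Flask = factor(rep(1:4, times = 5)),
  Treat = factor(rep(c("A", "B", "C", "D"), times = 5)),
  Yield = c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
            87, 92, 89, 84,  79, 81, 80, 88)
)

str(RCBDPen.dat)

# Quick checks against the table of means
with(RCBDPen.dat, tapply(Yield, Blend, mean))
with(RCBDPen.dat, tapply(Yield, Treat, mean))
```

Declaring Blend, Flask and Treat as factors matters: if they were left numeric, aov would fit them as covariates rather than as classifications.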


Model formula for aov function

• As for the CRD, use the aov function, either with or without Error as part of the model.

• In this case the uncontrolled variation is:
– Blend differences;
– differences between Flasks within Blends (which we denote Flasks[Blends]).

• The R shorthand for this is Blend/Flask, which expands to Blend + Blend:Flask.


Output

> RCBDPen.aov <- aov(Yield ~ Blend + Treat +
+                    Error(Blend/Flask), RCBDPen.dat)
> summary(RCBDPen.aov)

Error: Blend
      Df Sum Sq Mean Sq
Blend  4    264      66

Error: Blend:Flask
          Df  Sum Sq Mean Sq F value Pr(>F)
Treat      3  70.000  23.333  1.2389 0.3387
Residuals 12 226.000  18.833

> #Compute Blend F and p
> Blend.F <- 66/18.833
> Blend.p <- 1-pf(Blend.F, 4, 12)
> data.frame(Blend.F,Blend.p)
   Blend.F   Blend.p
1 3.504487 0.0407441

• Blend occurs both outside and inside the Error function; this is necessary to get the correct fitted values for diagnostic checking.

• The last three expressions compute the Blend F and p, which summary does not supply.


Output

> RCBDPen.NoError.aov <- aov(Yield ~ Blend + Treat, RCBDPen.dat)
> summary(RCBDPen.NoError.aov)
          Df  Sum Sq Mean Sq F value  Pr(>F)
Blend      4 264.000  66.000  3.5044 0.04075
Treat      3  70.000  23.333  1.2389 0.33866
Residuals 12 226.000  18.833

• This output includes the F and p for Blend, but testing Blend in this way is controversial.

• The ANOVA table from the expression that
– includes Error in the model resembles our table, and is preferred;
– omits Error is like the traditional ANOVA table.


IV.D Diagnostic checking

• Again, we have assumed $\mathbf{Y} \sim N(\boldsymbol{\psi}, \sigma^2\mathbf{I})$ where, for the maximal model, $\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$.

• For this model to be appropriate requires a similar set of behaviours as for the CRD:
a) the response is operating additively (see section IV.B, Indicator-variable models and estimation for an RCBD) as specified by the maximal model: a treatment has about the same additive effect on each unit;
b) the variability of the units within a block is the same for each block;
c) each observation displays the covariance implied by the model (independence for Blocks fixed and equal correlation within blocks for Blocks random); and
d) the response of the units is normally distributed.


Diagnostic plots

• The same set of diagnostic plots as for the CRD can be used:
– Residual-versus-fitted-values plots;
– Normal probability plots.

• A particular pattern to look out for in the Residual-versus-fitted-values plot for this type of design is evidence of a curvilinear relationship,
– which indicates nonadditivity between the blocks and treatments.

[Figure: sketch of a Residual-versus-fitted-values plot in which the points follow a curve, showing a systematic trend in the residuals.]


Nonadditivity

• Such nonadditivity may be transformable by taking logs, square roots or reciprocals of the data and analyzing these.

• Another type of block-treatment interaction would occur where, say, a particular blend had a poison in it that affected only process B.
– Then only the observation corresponding to that particular combination of blend and treatment would be affected.
– It would be extremely low, leading to an extreme residual.

• It is possible to test for transformable nonadditivity using Tukey's one-degree-of-freedom-for-nonadditivity.

• This test can be used with any design with an additive expectation model (≥ 2 terms), including regression (but not the CRD).

• It involves detecting whether or not there is a curvilinear relationship between the residuals and fitted values.

• For this, and subsequent designs, diagnostic checking should be based on the two plots and this one-degree-of-freedom test.
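For a two-way additive layout such as this one, Tukey's one-degree-of-freedom SSq can be written in terms of the block and treatment effects, and the computation is short enough to sketch in base R. This is an illustration of the classical formula under the assumption of one observation per block-treatment combination, not the implementation used by the dae function; the variable names are chosen for the example.

```r
# Yields in standard order for Blend then Treatment
y <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
       87, 92, 89, 84,  79, 81, 80, 88)
Blend <- factor(rep(1:5, each = 4))
Treat <- factor(rep(c("A", "B", "C", "D"), times = 5))

g  <- mean(y)
be <- ave(y, Blend) - g            # Blend effects, one per observation
te <- ave(y, Treat) - g            # Treat effects, one per observation

res.ss <- sum((y - g - be - te)^2) # Residual SSq from the additive model
res.df <- (nlevels(Blend) - 1) * (nlevels(Treat) - 1)

# One-degree-of-freedom SSq for nonadditivity and what remains
tukey.ss <- sum(be * te * y)^2 / sum((be * te)^2)
devn.ss  <- res.ss - tukey.ss

tukey.F <- tukey.ss / (devn.ss / (res.df - 1))
tukey.p <- pf(tukey.F, 1, res.df - 1, lower.tail = FALSE)

c(Tukey.SS = tukey.ss, Tukey.F = tukey.F, Tukey.p = tukey.p, Devn.SS = devn.ss)
```

A large F would suggest a curvilinear relationship between residuals and fitted values, that is, transformable nonadditivity.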


An R function from dae, tukey.1df

• tukey.1df(aov.obj, data, error.term="within")

• where
– aov.obj is an aov object or aovlist object created from a call to aov,
– data is optional and is a data.frame containing the original response variable and factors used in the call to aov, and
– error.term is the error term whose residuals are to be tested for nonadditivity.


Example IV.1 Penicillin yield (continued)

> #
> # Diagnostic checking
> #
> res <- resid.errors(RCBDPen.aov)
> fit <- fitted.errors(RCBDPen.aov)
> data.frame(Blend,Flask,Treat,Yield,res,fit)
   Blend Flask Treat Yield           res fit
1      1     1     A    89 -1.000000e+00  90
2      1     2     B    88 -3.000000e+00  91
3      1     3     C    97  2.000000e+00  95
4      1     4     D    94  2.000000e+00  92
5      2     1     A    84  3.000000e+00  81
6      2     2     B    77 -5.000000e+00  82
7      2     3     C    92  6.000000e+00  86
8      2     4     D    79 -4.000000e+00  83
9      3     1     A    81 -2.000000e+00  83
10     3     2     B    87  3.000000e+00  84
11     3     3     C    87 -1.000000e+00  88
12     3     4     D    85 -2.392617e-15  85
13     4     1     A    87  1.000000e+00  86
14     4     2     B    92  5.000000e+00  87
15     4     3     C    89 -2.000000e+00  91
16     4     4     D    84 -4.000000e+00  88
17     5     1     A    79 -1.000000e+00  80
18     5     2     B    81 -2.614662e-15  81
19     5     3     C    80 -5.000000e+00  85
20     5     4     D    88  6.000000e+00  82
> plot(fit, res, pch=16)
> qqnorm(res, pch = 16)
> qqline(res)

[Figure: plot of res against fit (fitted values from 80 to 95, residuals from -4 to 6) and a Normal Q-Q Plot of the residuals (Sample Quantiles against Theoretical Quantiles).]

• From the plots, no serious departures from the assumptions are apparent.


Example IV.1 Penicillin yield (continued)

> tukey.1df(RCBDPen.aov, RCBDPen.dat,
+           error.term="Blend:Flask")
$Tukey.SS
[1] 2.001082

$Tukey.F
[1] 0.0982679

$Tukey.p
[1] 0.7597822

$Devn.SS
[1] 223.9989

Source               df   SSq    MSq    E[MSq]                                   F     Prob
Blends                4   264   66.0    $\sigma^2 + q_B(\boldsymbol{\psi})$    3.50   0.041
Flasks[Blends]       15   296
  Treatments          3    70   23.3    $\sigma^2 + q_T(\boldsymbol{\psi})$    1.24   0.339
  Residual           12   226   18.8    $\sigma^2$
    Nonadditivity     1   2.0    2.0                                           0.10   0.760
    Deviation        11   224   20.4
Total                19   560

• The hypotheses for the one-degree-of-freedom test are:
H0: Blends and Treatments are additive
H1: Blends and Treatments are nonadditive

• H0 cannot be rejected: there is no evidence of transformable nonadditivity.


IV.E Treatment differences

• For the purposes of the scientist, the effects of the blocks are not of primary interest.

• Rather, attention is likely to be focused on treatment differences, which can be investigated using the treatment means.

• The discussion of multiple comparisons and submodels for the analysis of a CRD applies here also.

• The treatment means are:

Treatment     A     B     C     D
Mean         84    85    89    86

• As the treatment levels are qualitative, a multiple comparison procedure would be used to examine the differences.

• However, they are not significantly different, so we shall not apply such a procedure.

• A bar chart illustrates this:

[Figure: bar chart titled "Fitted values for Yield", showing Yield (%) on a scale from 20 to 80 against Treatment (A, B, C, D).]


IV.F Fixed versus random effects

a) Another maximal model for the RCBD

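• Two alternative maximal models for the RCBD:

$\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$ and $\mathrm{var}[\mathbf{Y}] = \sigma^2\mathbf{I}_n$

$\boldsymbol{\psi}_T = E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau}$ and $\mathbf{V} = \sigma^2\mathbf{I}_n + \sigma_B^2\,\mathbf{I}_b \otimes \mathbf{J}_t$

• The difference is that $\mathbf{X}_B\boldsymbol{\beta}$ is dropped from the 2nd expectation model and the covariance of observations from different units in the same block is $\sigma_B^2$, rather than being zero.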


Variance matrices for RCBD for b = 3, t = 4

Rows and columns are indexed by Units 1 to 4 within Blocks I, II and III.

Blocks fixed

$\mathbf{V} = \sigma^2\mathbf{I}_{12}$

that is, the $12 \times 12$ matrix with $\sigma^2$ in every diagonal position and 0 elsewhere.

Blocks random

$\mathbf{V} = \sigma^2\mathbf{I}_{12} + \sigma_B^2\,\mathbf{I}_3 \otimes \mathbf{J}_4 = \begin{bmatrix} \mathbf{A} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A} \end{bmatrix}$ where $\mathbf{A} = \begin{bmatrix} \sigma^2 + \sigma_B^2 & \sigma_B^2 & \sigma_B^2 & \sigma_B^2 \\ \sigma_B^2 & \sigma^2 + \sigma_B^2 & \sigma_B^2 & \sigma_B^2 \\ \sigma_B^2 & \sigma_B^2 & \sigma^2 + \sigma_B^2 & \sigma_B^2 \\ \sigma_B^2 & \sigma_B^2 & \sigma_B^2 & \sigma^2 + \sigma_B^2 \end{bmatrix}$

and each $\mathbf{0}$ is a $4 \times 4$ matrix of zeros.

Notice that, for Blocks random, the covariance between units from the same block is non-zero and is equal for all blocks.
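The two variance matrices are easy to generate numerically, which makes the block-diagonal structure visible. In this sketch the values given to `sigma2` and `sigma2.B` are hypothetical, chosen only so that the pattern can be printed.

```r
b  <- 3                            # number of blocks
nt <- 4                            # units per block
sigma2   <- 1.0                    # hypothetical value of sigma^2
sigma2.B <- 0.5                    # hypothetical value of sigma_B^2

# Blocks fixed: V = sigma^2 I_n
V.fixed <- sigma2 * diag(b * nt)

# Blocks random: V = sigma^2 I_n + sigma_B^2 (I_b direct product J_t)
V.random <- sigma2 * diag(b * nt) +
  sigma2.B * kronecker(diag(b), matrix(1, nt, nt))

V.random[1:4, 1:4]                 # within-block part: equal covariances
V.random[1:4, 5:8]                 # between-block part: all zero
```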


Fixed versus random factors

• Definition IV.2: A factor will be designated as random if it is considered appropriate to use a probability distribution function to describe the distribution of effects associated with the population set of levels.

• Definition IV.3: A factor will be designated as fixed if it is considered appropriate to have the effects associated with the population set of levels for the factor differ in an arbitrary manner, rather than being distributed according to a regularly-shaped probability distribution function.

• As far as the model is concerned,
– random effects are modelled using terms in the variation model;
– fixed effects are modelled using terms in the expectation model.

• So when we are deciding whether a factor is random or fixed, we are choosing which mathematical model best describes the population distribution for the response variable.


Making the choice

• We need to consider the population set of levels and how the set of response variable effects corresponding to this set of levels behaves.

• To be classified as
– random, we require that
  • the set of population levels is large in number and
  • the effects are "well-behaved", so that a regularly-shaped probability distribution function with some variance is appropriate for describing them;
– fixed, the effects do not have the restrictions that are placed on random effects:
  • there might be a small or a large number of levels in the population and
  • their effects do not have to conform to a regularly-shaped probability distribution function, because the model allows for arbitrary differences between them.

• For example, the effects from a factor are modelled in the expectation model
– if they display a systematic trend (perhaps involving polynomial submodels), or
– if the factor is for a small set of treatments that are to be compared.

• In both cases, it seems inappropriate to model the effects as being, say, normally distributed with some variance.
– The pattern in the treatment effects may well be quite irregular, and there is no interest in the form of this distribution.


Summary

In practice, a factor is
– Random if there is
  i. a large number of population levels and
  ii. random behaviour;
– Fixed if there is
  i. a small or large number of population levels and
  ii. systematic behaviour.


Units & Blocks: fixed or random?

• Effects from individual units treated alike (for example, animals, plots of land, runs of a chemical reactor) are anticipated to arise randomly, and the effects could well follow a probability distribution, say a normal distribution.
– Hence it is appropriate to model them via a term in the variation model.

• We must always model terms to which other terms have been randomized as random effects,
– because Treatments are randomized to Units[Blocks] in an RCBD, Units[Blocks] must be random.

• What about Block effects in the RCBD?
– They could be either, depending on the anticipated effects of the blocks.

• Suppose the blocks are groups of plots that are contiguous and a systematic trend is anticipated:
– the block effects cannot be regarded as a random sample, since they display a systematic pattern;
– the factor Blocks should be designated as fixed.

• However, suppose each block is in a separate location from the other blocks and the blocks could be regarded as a random sample of all blocks obtained by dividing up the whole area under study.
– It seems likely that the population block effects could be described by a probability distribution such as the normal distribution, and the factor Blocks could be designated as random.

• If there is some doubt, it is safest not to assume a probability distribution and to designate the factor as fixed.



• Should Blends be designated as fixed or random? – It was said at the outset that it was expected that there would be a

lot of variability from blend to blend — that is why the RCBD was employed.

– However, a systematic pattern in the average yields of the blends cannot be anticipated.

– Rather, it seems reasonable that the effects of the population set of blends can be described by a probability distribution.

– So Blends should be a random factor.

• Analysis needs to be revised, using a call to aov in which Blends is not included outside the Error function. RCBDPen.aov <- aov(Yield ~ Treat + Error(Blend/Flask), RCBDPen.dat)

• This will change the fitted values and Tukey's one-degree-of-freedom-for-nonadditivity.
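As a hedged illustration of the two calls, the sketch below fits both versions to simulated data. The data frame sim.dat, its effect sizes and the helper treat_F are hypothetical and are not the penicillin data; only the model formulae follow the text. It suggests that moving Blend out of the fixed part of the formula leaves the Treat F-statistic in the Blend:Flask stratum unchanged.

```r
# Simulated RCBD: 5 blends (blocks) x 4 flasks, treatments randomized within each blend.
# All numbers are made up for illustration; they are NOT the penicillin yields.
set.seed(101)
b <- 5; t <- 4
sim.dat <- data.frame(
  Blend = factor(rep(1:b, each = t)),
  Flask = factor(rep(1:t, times = b)),
  Treat = factor(unlist(lapply(1:b, function(i) sample(LETTERS[1:t]))))
)
blend.eff <- rnorm(b, sd = 3)      # assumed block effects
treat.eff <- c(0, 1, 4, 2)         # assumed treatment effects
sim.dat$Yield <- 85 + blend.eff[as.integer(sim.dat$Blend)] +
  treat.eff[as.integer(sim.dat$Treat)] + rnorm(b * t, sd = 2)

# Blend fixed: Blend appears outside Error(). Blend random: it does not.
fixed.aov  <- aov(Yield ~ Blend + Treat + Error(Blend/Flask), sim.dat)
random.aov <- aov(Yield ~ Treat + Error(Blend/Flask), sim.dat)

# Hypothetical helper: pull the Treat F-statistic from the Blend:Flask stratum.
treat_F <- function(fit) {
  tab <- summary(fit)[["Error: Blend:Flask"]][[1]]
  tab[trimws(rownames(tab)) == "Treat", "F value"]
}
F.fixed  <- treat_F(fixed.aov)
F.random <- treat_F(random.aov)
print(c(fixed = F.fixed, random = F.random))
```

In the random version the Blend line appears only as the Residuals of the Blend stratum, so summary() prints no F-statistic for it.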


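b) Estimation and analysis of variance for Blocks random

• The estimators of the expected values are the same under the model

$\boldsymbol{\psi}_T = E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau}$ and $\mathbf{V} = \sigma^2\mathbf{I}_n$

as under the Blocks-random model

$\boldsymbol{\psi}_T = E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau}$ and $\mathbf{V} = \sigma^2\mathbf{I}_n + \sigma_B^2(\mathbf{I}_b \otimes \mathbf{J}_t)$

namely $\hat{\boldsymbol{\psi}}_T = \bar{\mathbf{T}} = \mathbf{M}_T\mathbf{Y}$.

• Block hypotheses become

$H_0: \sigma_B^2 = 0$

$H_1: \sigma_B^2 > 0$

• That is, can $\sigma_B^2$ be dropped from $\mathbf{V}$?

• Also, as the expectation model no longer involves the sum of two terms, Tukey's one-degree-of-freedom for nonadditivity is no longer applicable.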
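The statement that the estimators of the expected values are unchanged when a block variance component is added to V can be checked numerically. The following is a minimal sketch, assuming units are ordered block by block and using made-up variance components and responses; the object names (X.T, sigma2.B and so on) are illustrative only. It compares the treatment-means fit with the generalized least squares fit under V.

```r
# Small RCBD with units ordered block by block; all values are arbitrary.
set.seed(202)
b <- 3; t <- 4; n <- b * t
Treat <- factor(unlist(lapply(1:b, function(i) sample(LETTERS[1:t]))))
X.T <- model.matrix(~ Treat - 1)      # n x t treatment indicator matrix
Y <- rnorm(n, mean = 10)              # arbitrary response vector

# V = sigma^2 I_n + sigma_B^2 (I_b kronecker J_t), with assumed variance components.
sigma2 <- 2; sigma2.B <- 5
V <- sigma2 * diag(n) + sigma2.B * kronecker(diag(b), matrix(1, t, t))

# M_T Y: the treatment means, i.e. least squares with V = sigma^2 I_n.
ols.fit <- as.vector(X.T %*% solve(crossprod(X.T), crossprod(X.T, Y)))

# Generalized least squares fit under the Blocks-random V.
V.inv <- solve(V)
gls.fit <- as.vector(
  X.T %*% solve(crossprod(X.T, V.inv %*% X.T), crossprod(X.T, V.inv %*% Y))
)

print(all.equal(ols.fit, gls.fit))
```

The agreement relies on every treatment occurring exactly once in each block; it is a property of the balanced design, not of mixed models in general.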


ANOVA table for the RCBD

• The form of the table is the same irrespective of whether Blocks are fixed or random.

                                      E[MSq]
Source            df            Blocks fixed                           Blocks random
Blocks            b-1           $\sigma^2 + q_B(\boldsymbol{\psi})$    $\sigma^2 + t\sigma_B^2$
Units[Blocks]     b(t-1)
  Treatments      t-1           $\sigma^2 + q_T(\boldsymbol{\psi})$    $\sigma^2 + q_T(\boldsymbol{\psi})$
  Residual        (b-1)(t-1)    $\sigma^2$                             $\sigma^2$
Total             bt-1

• However, the E[MSq]s differ: $q_B(\boldsymbol{\psi})$ becomes $t\sigma_B^2$.

• The F-statistic for testing the Block hypothesis is again the ratio of the Block and Residual mean squares.

• Thus the tests for fixed and for random block effects are the same, which is not always the case.
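The equality of the two Block tests can be sketched in R. The example below is an illustration on simulated data, not the penicillin yields, and the names sim.dat, F.random and F.fixed are hypothetical. With Blocks random the Block line is the Residuals of the Blend stratum, so the ratio is formed by hand; with Blocks fixed it is read from the usual additive two-way fit.

```r
# Simulated RCBD (made-up numbers): 5 blends x 4 flasks.
set.seed(303)
b <- 5; t <- 4
sim.dat <- data.frame(
  Blend = factor(rep(1:b, each = t)),
  Flask = factor(rep(1:t, times = b)),
  Treat = factor(unlist(lapply(1:b, function(i) sample(LETTERS[1:t]))))
)
sim.dat$Yield <- 85 + rnorm(b, sd = 3)[as.integer(sim.dat$Blend)] +
  c(0, 1, 4, 2)[as.integer(sim.dat$Treat)] + rnorm(b * t, sd = 2)

# Blocks random: mean squares taken from the two error strata.
strata <- summary(aov(Yield ~ Treat + Error(Blend/Flask), sim.dat))
blocks.tab <- strata[["Error: Blend"]][[1]]
units.tab  <- strata[["Error: Blend:Flask"]][[1]]
blocks.MSq <- blocks.tab[trimws(rownames(blocks.tab)) == "Residuals", "Mean Sq"]
resid.MSq  <- units.tab[trimws(rownames(units.tab)) == "Residuals", "Mean Sq"]
F.random <- blocks.MSq / resid.MSq
p.random <- pf(F.random, b - 1, (b - 1) * (t - 1), lower.tail = FALSE)

# Blocks fixed: F for Blend from the additive model Blend + Treat.
fixed.tab <- anova(lm(Yield ~ Blend + Treat, sim.dat))
F.fixed <- fixed.tab["Blend", "F value"]

print(c(random = F.random, fixed = F.fixed, p = p.random))
```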


IV.G Generalized randomized complete block design

• The difference between generalized and ordinary RCBDs is that in a GRCBD each treatment occurs more than once in a block.

• As before, we let b be the no. of blocks and t the no. of treatments.

• In addition let
– k denote the no. of units per block and
– g the no. of times a treatment occurs in a block,
that is, k = t × g and n = b × k.

• The R expressions for obtaining a layout for this design are given in Appendix B, Randomized layouts and sample size computations in R.

• Advantages of this design
– It has more df for the Residual than the standard RCBD.
– Also, you can test for a Block:Treatment interaction, as is discussed in chapter VI, Determining the analysis of variance table.

• Disadvantage of the design
– It has larger blocks,
– so it is likely that the units within a block will be less homogeneous than would be the case if a standard RCBD with smaller blocks were employed.


Analysis of GRCBD

• The model for the generalized RCBD, without the Block:Treatment interaction, is virtually the same as that for the RCBD so that, in this case, the analyses of variance are similar.

• Thus, depending on whether Blocks are fixed or random, the maximal model would be chosen from the two given for the RCBD.

• For Blocks and Units random, the ANOVA table is

Source            df                SSq                                              E[MSq]
Blocks            b-1               $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$              $\sigma_{BU}^2 + k\sigma_B^2$
Units[Blocks]     b(k-1)            $\mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$
  Treatments      t-1               $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$              $\sigma_{BU}^2 + q_T(\boldsymbol{\psi})$
  Residual        b(k-1)-(t-1)      $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}$     $\sigma_{BU}^2$
Total             bk-1

• The R expressions are the same as for the standard RCBD.


Example IV.2 Design for a wheat experiment

• For example, suppose 4 treatments are to be compared when applied to a new variety of wheat.

• The researcher wants to employ a generalized RCBD with 12 plots in each of 2 blocks so that each treatment is replicated 3 times in each block.

• Hence, b = 2, t = 4 and g = 3, so that k = 4 × 3 = 12 and n = 2 × 12 = 24.

Layout for a generalized randomized complete block experiment

                Plots
Blocks   1  2  3  4  5  6  7  8  9  10  11  12
I        C  D  D  C  B  B  A  A  D  A   B   C
II       D  A  D  C  A  D  B  A  B  B   C   C

• The yield of wheat from each plot was measured.
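A layout of this kind can be produced with a few lines of base R. The sketch below is not the set of expressions from Appendix B; it is a minimal alternative under the assumption that an independent random permutation of the k = 12 treatment labels is wanted in each block. The seed and the name GRCBD.lay are arbitrary, so the layout it prints will differ from the one shown above.

```r
# Generalized RCBD for the wheat example: b = 2 blocks, t = 4 treatments, g = 3.
set.seed(404)                       # arbitrary seed, for reproducibility only
b <- 2; t <- 4; g <- 3
k <- t * g                          # units per block
n <- b * k                          # total number of plots

treats <- rep(LETTERS[1:t], times = g)   # each treatment g times per block

GRCBD.lay <- data.frame(
  Blocks = factor(rep(as.character(as.roman(1:b)), each = k)),
  Plots  = factor(rep(1:k, times = b)),
  # a separate random permutation of the k labels within each block
  Treats = factor(unlist(lapply(1:b, function(i) sample(treats))))
)

# Each treatment should occur g times in every block.
print(table(GRCBD.lay$Blocks, GRCBD.lay$Treats))
```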


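Analysis with Blocks and Plots random

• The model for the example:

$\boldsymbol{\psi}_T = E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau}$ and $\mathbf{V} = \sigma_{BP}^2\mathbf{I}_{24} + \sigma_B^2(\mathbf{I}_2 \otimes \mathbf{J}_{12}) = \sigma_{BP}^2\mathbf{M}_U + 12\sigma_B^2\mathbf{M}_B$

• The corresponding ANOVA table:

Source            df     SSq                                              E[MSq]
Blocks            1      $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$              $\sigma_{BP}^2 + 12\sigma_B^2$
Plots[Blocks]     22     $\mathbf{Y}'\mathbf{Q}_{BP}\mathbf{Y}$
  Treatments      3      $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$              $\sigma_{BP}^2 + q_T(\boldsymbol{\psi})$
  Residual        19     $\mathbf{Y}'\mathbf{Q}_{BP_{Res}}\mathbf{Y}$     $\sigma_{BP}^2$
Total             23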

• Note that an RCBD with b = 6 and t = 4
– would also have n = 6 × 4 = 24,
– but would have (b − 1)(t − 1) = 5 × 3 = 15 Residual df.
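The Residual df comparison is simple arithmetic on the df formulae in the two ANOVA tables. As a small sketch, the two helper functions below (hypothetical names, not from the text) evaluate those formulae for the two 24-plot designs.

```r
# Residual df for a generalized RCBD: b(k - 1) - (t - 1), with k = t * g.
resid_df_grcbd <- function(b, t, g) {
  k <- t * g
  b * (k - 1) - (t - 1)
}

# Residual df for a standard RCBD: (b - 1)(t - 1).
resid_df_rcbd <- function(b, t) (b - 1) * (t - 1)

grcbd.df <- resid_df_grcbd(b = 2, t = 4, g = 3)   # 2 * 11 - 3 = 19
rcbd.df  <- resid_df_rcbd(b = 6, t = 4)           # 5 * 3 = 15
print(c(GRCBD = grcbd.df, RCBD = rcbd.df))
```

Both designs use 24 plots; the generalized design gains 4 Residual df because it spends 1 df rather than 5 on Blocks.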


IV.I Exercises

• Ex. IV.1-2 look at quadratic forms for SSq

• Ex. IV.3 requires the design of an RCBD and then analysis of data

• Ex. IV.4 asks for the complete analysis of an RCBD with a quantitative treatment factor

• Ex. IV.5 asks for the complete analysis of an RCBD with a qualitative treatment factor
