hierarchical random effects - dtu course website 02429 · hierarchical random effects. enote 5...

eNote 5 1

eNote 5

Hierarchical Random Effects

eNote 5 INDHOLD 2

Indhold

5 Hierarchical Random Effects 1

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

5.2 Main example: Lactase measurements in piglets . . . . . . . . . . . . . . . 3

5.3 Constant mean value structure . . . . . . . . . . . . . . . . . . . . . . . . . 4

5.4 The one-layer model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

5.5 Two-layer model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

5.6 Three or more layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5.7 Zero/Negative variance parameters . . . . . . . . . . . . . . . . . . . . . . 15

5.8 Comparing different variance structures . . . . . . . . . . . . . . . . . . . . 16

5.9 Better confidence limits in the balanced case . . . . . . . . . . . . . . . . . 17

5.10 Prediction of random effect coefficients . . . . . . . . . . . . . . . . . . . . 20

5.11 R-TUTORIAL: The lactase example . . . . . . . . . . . . . . . . . . . . . . . . 22

5.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5.1 Introduction

This eNote takes a closer look at a type of models where the experimental design leadsto a natural hierarchy of the random effects. Consider for instance the following expe-

eNote 5 5.2 MAIN EXAMPLE: LACTASE MEASUREMENTS IN PIGLETS 3

rimental design, where the purpose of the experiment is to determine the fat content inmilk (from a given area at a given time):

1. Ten farms were selected randomly from all milk producing farms in the area.

2. From each selected farm, six cows were randomly selected.

3. From each selected cow, two cups of milk were taken randomly from the dailyproduction, to be analyzed in the lab.

Intuitively, two cups of milk from the same cow would be expected to be more similarthan two cups from different cows at different farms. Notice how data collected thisway is in violation of the standard assumption of independent measurements.

5.2 Main example: Lactase measurements in piglets

As part of a larger study of the intestinal health in newborn piglets, the gut enzyme la-ctase was measured in 20 piglets taken from 5 different litters. For each of the 20 piglets,the lactase level was measured in three different regions. At the time the measurementwas taken, the piglet was either unborn (status = 1) or newborn (status = 2). These dataare kindly provided by Charlotte Reinhard Bjørnvad, Department of Animal Scienceand Animal Health, Division of Animal Nutrition, Royal Veterinary and AgriculturalUniversity. The 60 log-transformed measurements are listed below:

eNote 5 5.3 CONSTANT MEAN VALUE STRUCTURE 4

Obs litter pig reg status loglact Obs litter pig reg status loglact

1 1 1 1 1 1.89537 31 3 11 1 2 1.58104

2 1 1 2 1 1.97046 32 3 11 2 2 1.52606

3 1 1 3 1 1.78255 33 3 11 3 2 1.65058

4 1 2 1 1 2.24496 34 4 12 1 1 1.97162

5 1 2 2 1 1.43413 35 4 12 2 1 2.11342

6 1 2 3 1 2.16905 36 4 12 3 1 2.51278

7 1 3 1 2 1.74222 37 4 13 1 1 2.06739

8 1 3 2 2 1.84277 38 4 13 2 1 2.25631

9 1 3 3 2 0.17479 39 4 13 3 1 1.79251

10 1 4 1 2 2.12704 40 5 14 1 1 1.93274

11 1 4 2 2 1.90954 41 5 14 2 1 1.82394

12 1 4 3 2 1.49492 42 5 14 3 1 1.23629

13 2 5 1 2 1.62897 43 5 15 1 2 2.07386

14 2 5 2 2 2.26642 44 5 15 2 2 1.96713

15 2 5 3 2 1.96763 45 5 15 3 2 0.47971

16 2 6 1 2 2.01948 46 5 16 1 2 2.01307

17 2 6 2 2 2.56443 47 5 16 2 2 1.85483

18 2 6 3 2 1.16387 48 5 16 3 2 2.18274

19 2 7 1 2 2.20681 49 5 17 1 2 2.86629

20 2 7 2 2 2.55652 50 5 17 2 2 2.71414

21 2 7 3 2 1.69358 51 5 17 3 2 1.60533

22 2 8 1 2 1.09186 52 5 18 1 2 1.97865

23 2 8 2 2 1.93091 53 5 18 2 2 1.93342

24 2 8 3 2 . 54 5 18 3 2 0.74943

25 3 9 1 1 2.36462 55 5 19 1 2 2.89886

26 3 9 2 1 2.72261 56 5 19 2 2 2.88606

27 3 9 3 1 2.80336 57 5 19 3 2 2.20697

28 3 10 1 1 2.42834 58 5 20 1 2 1.87733

29 3 10 2 1 2.64971 59 5 20 2 2 1.70260

30 3 10 3 1 2.54788 60 5 20 3 2 1.11077

Notice the hierarchical structure in which the measurements are naturally ordered. Firstthe five litters, then the piglets, and finally the individual measurements. This structureis illustrated in Figure 5.1 for the first two litters.

Litter=1 Litter=2

Pig=1 Pig=2 Pig=3 Pig=4 Pig=5 Pig=6 Pig=7 Pig=8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 .

Figur 5.1: The hierarchical structure of the lactase data set for the first two litters

5.3 Constant mean value structure

The mixed model framework is very flexible, and it is possible to include the mean valuestructures known from fixed effects models, as well as the random effects structure.In this eNote focus is on the variance structure, and for this reason the same simplemean value structure is used when the models are presented. The mean structure is

eNote 5 5.4 THE ONE-LAYER MODEL 5

simply denoted µ without any subscripts. This is done in order to maintain focus on thevariance structure. In the analyzed examples more realistic mean value structures willbe used.

5.4 The one-layer model

The simplest ‘hierarchical’ model is the one layer model, typically denoted:

yi = µ + εi, εi ∼ i.i.d. N(0, σ2)

All observations are independent, so this is not really a hierarchical model, and it islisted mainly for completeness.

The mean value parameters in a model with this simple variance structure are esti-mated via the least squares method, the maximum likelihood method, or the restri-cted/residual maximum likelihood method. In this case, all these methods give the sa-me result, namely:

β = (X′X)−1X′y

Here, β is a vector which contains the parameters of the fixed effects part of the model,y contains the observations of the response variable, and X is the design matrix for thefixed effects part of the model, as described in the previous eNote.

The single variance parameter in the one-layer model is estimated via the restricted/residualmaximum likelihood method, which gives an unbiased estimate

σ2 =1

N − p

N

∑i=1

(yi − µi)2

Here, N is the total number of observations, p is the number of mean value parameters,and µi is the Best Linear Unbiased Estimate (BLUE) of the mean value for the i’th ob-servation, which is calculated as µi = (Xβ)i. The sum in the estimate of the varianceis often referred to as the Sum of Squared Deviations (SSD), or simply as the Sum ofSquares (SS).

Lactase measurements in piglets analyzed as independent observations

To analyse the lactase data set with a one-layer model, the information about litter andpig is ignored. Remaining is the region (1, 2, or 3), the status (1 or 2), and the interaction


[I]5359 status × reg2

6

reg23

status12

011

Figur 5.2: Factor structure in the lactase example when analyzed as independent mea-surements (one layer model).

between region and status ((1,1), (1,2), (2,1), (2,2), (3,1), or (3,2)). The one-layer modelfor the lactase data becomes:

yi = µ + α(statusi) + β(regi) + γ(statusi, regi) + εi, εi ∼ i.i.d. N(0, σ2)

where i = 1, . . . , 60 is the observation number. The model is illustrated in the factorstructure diagram in Figure 5.2

To analyze this model in R, write:

model1 <- lm(loglact ~ reg + status + reg:status, data = lactase)

anova(model1)

Analysis of Variance Table

Response: loglact

Df Sum Sq Mean Sq F value Pr(>F)

reg 2 2.5843 1.29217 5.2595 0.008248 **

status 1 1.1288 1.12882 4.5946 0.036680 *


reg:status 2 1.4074 0.70368 2.8642 0.065894 .

Residuals 53 13.0212 0.24568

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

In this case, the mean value of each log-lactase measurement is described by region, sta-tus, and the interaction between them. Notice how the intercept term µ is not specified.R adds the intercept term automatically. If the intercept term is undesired in a givenmodel it must be removed by adding -1 after the last model term.

From this output, it follows that the estimate of the variance parameter is σ2 = 0.2457,and that the value of twice the negative restricted/residual log-likelihood function, eva-luated at the optimal parameter values, 2`re, can be found using the following code:

(-2) * logLik(model1, REML = TRUE)

’log Lik.’ 89.46331 (df=7)

That is, 2`re = 89.46. This is an important value, as it is useful for comparing differentvariance structures. This eNote is about hierarchical variance structures, and hence littleattention is paid to the mean value structure. The ANOVA table indicates a p-value of6.6% for removing the interaction term from the model, which is above the standard 5%significance level.

Type I and Type III tests – again

• Type I (using anova):

– Succesive tests

– Each term is corrected for the terms PREVIOUSLY listed in the model

– Sums of squares sum to TOTAL sums of squares

– Depends on the order in which terms are specified in the model

• Type III (using drop1 or anova from the lmerTest package):

– Parallel tests

– Each term is corrected for ALL other terms in the model

eNote 5 5.5 TWO-LAYER MODEL 8

– Sums of squares do NOT generally sum to TOTAL sums of squares

– Does NOT depend on the order in which terms are specified in the model

• If data is balanced: Type I and III are the SAME!

• drop1 is really “Type II” = Type III with the condition:

– Do NOT test main effects which are part of (fixed) interactions!

5.5 Two-layer model

The simplest “real” hierarchical model is the two-layer model, which is denoted:

yi = µ + a(subi) + εi

where a(subi) ∼ N(0, σ2a ), and εi ∼ N(0, σ2), and all (both a’s and ε’s) are independent.

In this two-layer model, all observations are no longer independent. In fact, the covari-ance between two observations is:

cov(yi1 , yi2) =

0 , if subi1 6= subi2 and i1 6= i2σ2

a , if subi1 = subi2 and i1 6= i2σ2

a + σ2 , if i1 = i2

which is easily calculated from the model. This means that two observations comingfrom the same subject are positively correlated. This is exactly what is needed in manysituations where several observations are taken from the same subject. It correspondswell with the intuition behind the experiment, and the variance parameters can be in-terpreted in a natural way. The subject variance parameter σ2

a is the variance betweensubjects, and the residual variance parameter σ2 is the variance between measurementson the same subject.

The variance parameters and the fixed effects parameters in the two-layer model areestimated via the restricted/residual maximum likelihood method. The REML methodis preferred to the maximum likelihood method, because it gives less biased estimatesof the variance parameters. For the fixed effect parameters, the REML estimate is givenby:

β = (X′V−1X)−1X′V−1y

Here, β is the vector of parameters of the fixed part of the model, y contains the obser-vations of the response, and X is the design matrix for the fixed part of the model. Thematrix V is the covariance matrix for the observations y, which is estimated from the


REML estimates of the variance parameters. In this two-layer model, the structure of Vis:

V =

σ2a + σ2 σ2

a σ2a 0 0 0

σ2a σ2

a + σ2 σ2a 0 0 0

σ2a σ2

a σ2a + σ2 0 0 0

0 0 0 σ2a + σ2 σ2

a σ2a

0 0 0 σ2a σ2

a + σ2 σ2a

0 0 0 σ2a σ2

a σ2a + σ2

Here illustrated for the situation with two subjects and three observations on each sub-ject. The general pattern is a block diagonal covariance matrix with blocks correspon-ding to each subject.

Lactase measurements in piglets analyzed with two layer model

To analyze the lactase data set with a two-layer model, the information about litter isstill ignored, but the information about pig is now included. This corresponds to thetwo bottom layers in Figure 5.1, and is illustrated in the factor structure diagram inFigure 5.3. Alternatively, the the pig information could have been omitted, and the litterinformation included.

The two-layer model with a random pig effect is:

yi = µ + α(statusi) + β(regi) + γ(statusi, regi) + d(pigi) + εi,

where d(pigi) ∼ N(0, σ2d ), and εi ∼ N(0, σ2), and all (both d’s and ε’s) are independent.

Here, i = 1, . . . , 60 is the observation number.

The idea behind this model is that pigs are different, and that the pigs in this inve-stigation are random representatives of the relevant population of newborn pigs. Thevariance between pigs is described with the σ2

d parameter, and the remaining variancebetween measurements on the same pig is described with the σ2 parameter.

To analyze this model in R, write:

require(lmerTest)

model2 <- lmer(loglact ~ reg * status + (1|pig), data = lactase)

anova(model2)

Type III Analysis of Variance Table with Satterthwaite’s method

Sum Sq Mean Sq NumDF DenDF F value Pr(>F)


[I]

status × reg

[pig]

reg

status

0

Figur 5.3: Factor structure in the lactase example when analyzed with a two-layer model(random pig effect).


reg 1.64252 0.82126 2 34.930 6.0872 0.005394 **

status 0.35771 0.35771 1 17.685 2.6514 0.121142

reg:status 1.51820 0.75910 2 34.930 5.6265 0.007617 **

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

print(summary(model2), corr = FALSE)

Linear mixed model fit by REML. t-tests use Satterthwaite’s method [

lmerModLmerTest]

Formula: loglact ~ reg * status + (1 | pig)

Data: lactase

REML criterion at convergence: 79.9

Scaled residuals:

Min 1Q Median 3Q Max

-2.0709 -0.5030 0.1307 0.4950 1.9128

Random effects:

Groups Name Variance Std.Dev.

pig (Intercept) 0.1119 0.3345

Residual 0.1349 0.3673

Number of obs: 59, groups: pig, 20

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 2.129291 0.187780 37.626383 11.339 1.06e-13 ***

reg2 0.009363 0.196334 34.819977 0.048 0.9622

reg3 -0.008660 0.196334 34.819977 -0.044 0.9651

status2 -0.121178 0.232912 37.626383 -0.520 0.6059

reg2:status2 0.109818 0.243523 34.819977 0.451 0.6548

reg3:status2 -0.655019 0.245841 34.986886 -2.664 0.0116 *

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

From this output it is evident that the variation between pigs is a major part of thetotal variation. The variance between pigs is estimated to σ2

d = 0.33452, and the residualvariance is estimated to σ2 = 0.36732. So a little less than half of the total variation in the

eNote 5 5.6 THREE OR MORE LAYERS 12

data set is due to the variation between pigs. Twice the negative restricted/residual loglikelihood is 2`re = 79.9. The conclusion about the fixed effect parameters is differentfrom in the one-layer model. In the one-layer model, the interaction term between statusand region could be removed with a p-value of 6.6%, but now the interaction termis significant with a p-value of 0.76%. The important lesson to be learned from thisexample is that the variance structure matters. Even in two models with the exact samefixed effect structure, the conclusions can differ greatly if the variance structure is notcorrect. Later in this eNote, a method for comparing different variance structures willbe shown.

5.6 Three or more layers

Hierarchical models with three or more levels are natural in some situations. Consider,for instance, the situation with randomly selected farms, randomly selected cows fromeach farm, and randomly selected cups of milk from the daily production of each cow.In this situation, the natural model would be a three layer model. The basic three levelmodel is denoted:

yi = µ + a(blocki) + b(subi) + εi

where a(blocki) ∼ N(0, σ2a ), b(subi) ∼ N(0, σ2

b ), and εi ∼ N(0, σ2) are all independent.As usual, i is the observation index. The interpretation is similar to the interpretationof the two-layer model. The parameter σ2

a is the variance between blocks (farms), σ2b is

the variance between subjects (cows) from the same block (farm), and σ2 is the variancebetween observations from the same subject (cow).

The parameters are estimated using the REML method, just like in the two-layer model,and the structure of the covariance matrix is:

V =

σ2a +σ2

b +σ2 σ2a +σ2

b σ2a σ2

a 0 0 0 0σ2

a +σ2b σ2

a +σ2b +σ2 σ2

a σ2a 0 0 0 0

σ2a σ2

a σ2a +σ2

b +σ2 σ2a +σ2

b 0 0 0 0σ2

a σ2a σ2

a +σ2b σ2

a +σ2b +σ2 0 0 0 0

0 0 0 0 σ2a +σ2

b +σ2 σ2a +σ2

b σ2a σ2

a0 0 0 0 σ2

a +σ2b σ2

a +σ2b +σ2 σ2

a σ2a

0 0 0 0 σ2a σ2

a σ2a +σ2

b +σ2 σ2a +σ2

b0 0 0 0 σ2

a σ2a σ2

a +σ2b σ2

a +σ2b +σ2

Here illustrated for the case with two blocks, two subjects from each block, and twoobservations from each subject. The first 4 by 4 block in V corresponds to the first block,


[I]

status × reg

[pig]

reg

status

[litter]

0

Figur 5.4: Factor structure in the lactase example when analyzed with a three-layer mo-del (random pig and litter effects).

and the next non-zero 4 by 4 block corresponds to the second block. Within each ofthese blocks there is a similar structure, where two observations from the same subjectare more correlated than two observations from different subjects.

Lactase measurements in piglets analyzed using a three-layer model

The natural model for the lactase data set is a three-layer model with litter as the toplayer, pig as the middle layer, and each single measurement as the bottom layer, just asillustrated in the factor structure diagram in Figure 5.4. This hierarchy is illustrated inFigure 5.1 for the first two litters.

yi = µ + α(statusi) + β(regi) + γ(statusi, regi) + d(litteri) + e(pigi) + εi,

where d(litteri) ∼ N(0, σ2d ), e(pigi) ∼ N(0, σ2

e ), and εi ∼ N(0, σ2), and all (both d’s,e’s, and ε’s) are independent. Again, i is the observation number.

To analyze this model in R, write the following (here, the litter information has beenadded to the random part of the model):


model3 <- lmer(loglact ~ reg * status + (1|pig) + (1|litter),

data = lactase)

print(summary(model3), corr = FALSE)

Linear mixed model fit by REML. t-tests use Satterthwaite’s method [

lmerModLmerTest]

Formula: loglact ~ reg * status + (1 | pig) + (1 | litter)

Data: lactase

REML criterion at convergence: 79.9

Scaled residuals:

Min 1Q Median 3Q Max

-2.0709 -0.5030 0.1307 0.4950 1.9128

Random effects:

Groups Name Variance Std.Dev.

pig (Intercept) 0.1119 0.3345

litter (Intercept) 0.0000 0.0000

Residual 0.1349 0.3673

Number of obs: 59, groups: pig, 20; litter, 5

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 2.129291 0.187780 37.626382 11.339 1.06e-13 ***

reg2 0.009363 0.196334 34.819977 0.048 0.9622

reg3 -0.008660 0.196334 34.819977 -0.044 0.9651

status2 -0.121178 0.232912 37.626382 -0.520 0.6059

reg2:status2 0.109818 0.243523 34.819977 0.451 0.6548

reg3:status2 -0.655019 0.245841 34.986886 -2.664 0.0116 *

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

anova(model3)



reg 1.64252 0.82126 2 34.930 6.0872 0.005394 **

eNote 5 5.7 ZERO/NEGATIVE VARIANCE PARAMETERS 15

status 0.35771 0.35771 1 17.685 2.6514 0.121142

reg:status 1.51820 0.75910 2 34.930 5.6265 0.007617 **

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

confint(model3, 1:3, oldNames=FALSE)

Computing profile confidence intervals ...

2.5 % 97.5 %

sd_(Intercept)|pig 0.1850922 0.4938041

sd_(Intercept)|litter 0.0000000 0.2873889

sigma 0.2829005 0.4426169

Compared to the output from the two-layer model there is basically no difference. Thelikelihood is exactly the same, the variance corresponding to the litter effect is estimatedto zero, and the conclusions about the fixed effect parameters are unchanged.

5.7 Zero/Negative variance parameters

In some software, e.g. SAS Proc Mixed, an option can be set to allow for negative varian-ce components - NOT in R, though. When negative variance estimates occur, it is usuallyas an underestimate of small (or zero) variance. In this case, a reasonable way to handlethe negative estimates is simply to leave out the corresponding variance component orfix it to zero.

In some rare cases, it is possible to interpret a negative variance parameter estimate.For instance, in the case of several observations on the same litter. It could be possiblethat some of the animals would do well and eat all the available food, and some of theanimals would do poorly, and because all the food was now gone, they would do evenworse. In this type of competitive situations there is a negative within-group correlation,and the variation within a group could be greater than the variation between groups,which would lead to negative variance estimates.

In R, however, these negative estimates are instead set to zero, which really does makebetter sense in most cases.

eNote 5 5.8 COMPARING DIFFERENT VARIANCE STRUCTURES 16

5.8 Comparing different variance structures

To test a reduction in the variance structure, a restricted/residual likelihood ratio test canbe used. A likelihood ratio test is used to compare two models A and B, where B is asub-model of A. Typically, the model including some variance component (model A),and the model without this variance component (model B) are to be compared. To dothis, the negative restricted/residual log-likelihood values from both models (`(A)

re and`(B)re ) must be computed. The test statistic can then be computed as:

GA→B = 2`(B)re − 2`(A)

re

The likelihood ratio test statistic follows a χ2d f –distribution, where d f is the difference

between the number of parameters in A and the number of parameters in B. In the caseof removing a single variance component d f = 1.

For the lactase data set, three different variance structures have been used. The compa-rison of these variance models is seen in the following table:

Model 2`re G–value df P–value(A) Pig and litter included 79.9 GA→B = 0.0 1 PA→B = 1(B) Only pig included 79.9 GB→C = 9.6 1 PB→C = 0.0019/2(C) Independent observations 89.5

It follows that the litter variance component can be left out of the model, but the pigvariance component is very significant. In other words: Two observations taken on thesame pig are correlated, but two observations from different pigs within the same litterare not correlated (positively or negatively).

Warning: Really, it is more correct to use d f = 0.5 instead of d f = 1 in these tests – ORsimilarly (approximately) to divide the p–values by 2.

To calculate these p–values via R, either use the anova function with refit = FALSE (toget the restricted likelihood ratio tests):

anova(model2, model3, refit = FALSE)

or compute the negative log-likelihood directly:

eNote 5 5.9 BETTER CONFIDENCE LIMITS IN THE BALANCED CASE 17

(-2) * logLik(model3)

(-2) * logLik(model2)

(-2) * logLik(model1, REML = TRUE) # Need REML = TRUE for lm-objects

and compute the p–values ‘by hand’:

G_value <- -2 * c(logLik(model1, REML = TRUE) - logLik(model2))

pchisq(G_value, df = 1, lower.tail = FALSE)

[1] 0.002000645

5.9 Better confidence limits in the balanced case

In R, we get confidence intervals as profile likelihood based intervals from the confint

function. We also saw the alternative method of using an approximate χ2-distributionbased on the Satterthwaite approximation approach – however, this method requiresthe asymptotic variance-covariance matrix of the parameters of the random part of themodel. Per se, these are not so easily extracted from lmer results. (In the future, this willbe built into the lmerTest-package).

In the balanced case, the variance components can often be simply expressed by linearcombinations of independent Mean Squares, which are all chi-square distributed, anda simple chi-square approximation can be used to construct better confidence limits.Some of the theory behind this idea is given here.

A factor F is said to be balanced if the same number of observations are made for eachof its levels. In this section, it is assumed that all factors are balanced.

As seen in the eNote on factor structure diagrams, in “nice” cases, it is possible to calcu-late the degrees of freedom from the factor structure diagram. The formula used tocalculate the degrees of freedom dfF for a factor F is:

dfF = |F| − ∑G:G<F

dfG

Here, |F| is the number of levels of the factor F, and the sum is the sum over the degreesof freedom from all factors coarser than F.


In these “nice” cases, a similar formula exists for computing the sums of squares (SS)to be used in the analysis of variance table. The sum of squares SSF corresponding to afactor F is:

SSF =1nF

∑j∈F

SF(j)2 − ∑G:G<F

SSG

This formula require some explaining. The term nF is the number of observations oneach level of the factor (as the factor is balanced, this number is the same for all levels).The term SF(j) is the sum of all observations at the j’th level of the factor F. For instance,if F is the factor indicating treatment (“no” or “yes”), then SF(yes) is the sum of allobservations from treated individuals. The last sum is the sum of the SS’s from all factorscoarser than F.

Example 5.1 Balanced two-layer model

Consider again the two-layer model:

yi = µ + a(subi) + ε i

where a(subi) ∼ N(0, σ2a ), ε i ∼ N(0, σ2), all independent. The number of subjects is |sub|, the

number of observations on each subject is nsub, and hence the total number of observationsin the balanced case is N = |sub| · nsub.

The factor structure for this model is:

[I]N−|sub|N [sub]|sub|−1

|sub| 011

From this diagram, the degrees of freedom for each factor can be calculated (in fact, theyare already printed in the diagram). For the constant factor 0, the degrees of freedom arealways df0 = 1, as it is the coarsest factor and has only one level. The subject factor has|sub| levels and only 0 is coarser, so dfsub = |sub| − 1. Finally, to the finest factor I: Ihas N levels, but sub and 0 are coarser, so dfI = N − (|sub| − 1)− 1 = N − |sub|.


Similarly, the SS–values can be calculated as:

SS0 = 1N (∑i=1...N yi)

2

SSsub = 1nsub ∑j∈sub Ssub(j)2 − SS0

SSI = ∑i=1...N y2i − SSsub − SS0

These formulas may seem cumbersome, but actually they are fairly simple to use, asthey only depend on the total sum, the total sum of squares, and the subject–sums.

With these formulas in place, the analysis of variance table can be constructed as:

Source SS df MS EMSSubject SSsub |sub| − 1 SSsub/(|sub| − 1) σ2 + nsubσ2

aResidual SSI N − |sub| SSI/(N − |sub|) σ2

Thus, natural estimates of the two variance parameters are:

σ2 = MSI and σ2a =

MSsub −MSInsub

In this balanced case, it is also possible to write down better confidence intervals forthe variance parameters than the general confidence intervals based on the normal ap-proximation, which will be described later. The standard 95% confidence interval forthe residual variance is:

(N − |sub|)σ2

χ20.975;(N−|sub|)

< σ2 <(N − |sub|)σ2

χ20.025;(N−|sub|)

Here, N − |sub| are the degrees of freedom corresponding to MSI. The parameter σ2a is

estimated by a linear combination of two MS values, so the exact degrees of freedomcannot be found. Satterthwaite’s approximation is used to approximate the degrees offreedom. In the Satterthwaite method, the degrees of freedom are determined by requi-ring that the χ2–distribution has the same variance as the estimate. The 95% confidenceinterval is:

νσ2a

χ20.975;ν

< σ2a <

νσ2a

χ20.025;ν

where the ν is estimated by:

ν =(σ2

a )2(

MSsubnsub

)2

|sub|−1 +

(MSInsub

)2

N−|sub|

eNote 5 5.10 PREDICTION OF RANDOM EFFECT COEFFICIENTS 20

Slightly more generally, if fixed effects are also present in the model or, perhaps, if someimbalance occurs, one may still estimate variance components using a so-called momentmethod. This is actually a simple alternative to REML estimation. It amounts to a solu-tion of the linear system, where the theoretical expected random effect mean squaresare put equal to the observed mean squares. In balanced experiments these theoreticalexpected mean squares can be deduced from the factor structure diagram, cf. eNote 2,and the results of this will equal the REML estimates. In other cases, expected meansquares are almost impossible to deduce by hand/in R.

In general, IF a variance component is estimated as a linear combination of mean squa-res:

σ2A =

K

∑k=1

bkMSk

where the mean squares have the associated degrees of freedom f1, . . . , fK, then an ap-proximate 95% confidence band can be obtained by:

νσ2A

χ20.975;ν

< σ2A <

νσ2A

χ20.025;ν

where the ν is estimated by:

ν =(σ2

A)2

∑Kk=1

(bkMSk)2

fk

This is the basic Satterthwaite formula. In the example above, K = 2, b1 = 1/nsub,b2 = −1/nsub, f1 = |sub| − 1 and f2 = N − |sub|.

In general, the REML estimates cannot be written as such linear combinations, so othermethods are required. In eNote 4, we saw a different satterthwaite method describedfor REML estimates.

Before computers revolutionized the world of statistics, the balanced case was extreme-ly important. Great care was taken to obtain balanced experiments, but a single failedmeasurement could ruin a balanced design. Nowadays, with modern computers andsoftware, the question of balance seems less important.

5.10 Prediction of random effect coefficients

This is reproduced from eNote 4. A model with random effects is typically used in si-tuations where the subjects (or blocks) in the study are to be considered representativesfrom a greater population. As such, the main interest is usually not in the actual levels

eNote 5 5.10 PREDICTION OF RANDOM EFFECT COEFFICIENTS 21

of the randomly selected subjects, but in the variation among them. It is, however, po-ssible to obtain predictions of the individual subject levels u. The formula is given inmatrix notation as:

u = GZ′V−1(y− Xβ)

If G and R are known, u can be shown to be ‘the best linear unbiased predictor’ (BLUP)of u. Here, ‘best’ means minimum mean squared error. In real applications, G and R arealways estimated, but the predictor is still referred to as the BLUP.

In R we get the random effect BLUPs very easily:

ranef(model2)

$pig

(Intercept)

1 -0.17600554

2 -0.12850621

3 -0.40900010

4 0.01228298

5 0.09111246

6 0.06371047

7 0.23232880

8 -0.34709933

9 0.35715158

10 0.29422020

11 -0.17171747

12 0.04975418

13 -0.06476400

14 -0.33185022

15 -0.22806704

16 0.13572511

17 0.40563619

18 -0.19458734

19 0.59731937

20 -0.18764410

eNote 5 5.11 R-TUTORIAL: THE LACTASE EXAMPLE 22

5.11 R-TUTORIAL: The lactase example

The R-function lmer from the package lme4 cannot be used for models with no randomeffects. Instead, lm has to be used whenever a version of our models with no randomeffects is needed for reasons of comparison:

lactase <- read.table("lactase.txt", sep = ",", header = TRUE,

na.strings = ".")

lactase$litter <- factor(lactase$litter)

lactase$pig <- factor(lactase$pig)

lactase$reg <- factor(lactase$reg)

lactase$status <- factor(lactase$status)

model1 <- lm(loglact ~ reg + status + reg:status, data = lactase)

require(car)

Anova(model1, type = 3, test.statistic = "F")

Anova Table (Type III tests)

Response: loglact

Sum Sq Df F value Pr(>F)

(Intercept) 31.737 1 129.1792 7.931e-16 ***

reg 0.001 2 0.0023 0.99769

status 0.067 1 0.2719 0.60420

reg:status 1.407 2 2.8642 0.06589 .

Residuals 13.021 53

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Even though type III tests for main effects are produced by the Anova function, onemust be careful about the interpretation of these. For unbalanced cases and, in fact, alsofor models with covariates, we should not use the type III tests performed using thisfunction, as their interpretation is generally unclear.

In the present case, there is only one test for which we would always be sure to have agood interpretation: namely the test for the interaction term and this would, e.g., alsobe returned by the drop1 function:


drop1(model1, test = "F")

Single term deletions

Model:

loglact ~ reg + status + reg:status

Df Sum of Sq RSS AIC F value Pr(>F)

<none> 13.021 -77.146

reg:status 2 1.4074 14.429 -75.091 2.8642 0.06589 .

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

In R, the two-layer model is specified using the function lmer

model2 <- lmer(loglact ~ reg * status + (1|pig), data = lactase)

anova(model2)



reg 1.64252 0.82126 2 34.930 6.0872 0.005394 **

status 0.35771 0.35771 1 17.685 2.6514 0.121142

reg:status 1.51820 0.75910 2 34.930 5.6265 0.007617 **

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

where the (1|pig) part specifies that there is a random effect for each level of the vari-able pig.

The (default) output from anova, using the lmerTest package, is Type III F–statistics andp–values. A Type I version is also an option (use type = 1).

Remark 5.2

WARNING—a confusing detail: in general, the SS and MS columns of the lme4 (andhence lmerTest) will NOT coincide with what is seen when using lm, as the SS andMS columns from the lme4-anova are scaled by the proper mixed model error termssuch that when they are divided by the error variance, the proper mixed model F-statistics appear! Confused? See the example below!


Example 5.3 Comparing anova from lm and lmer

## We illustrate this on a balanced data set, so we remove pigs 3-8:

lactase_red <- subset(lactase, pig %in% c(1:2,9:20))

testmodel <- lmer(loglact ~ status + (1|pig), data = lactase_red)

testmodelfixed <- lm(loglact ~ status + pig, data = lactase_red)

anova(testmodel)



status 0.20575 0.20575 1 12 1.086 0.3179

anova(testmodelfixed)

Analysis of Variance Table

Response: loglact

Df Sum Sq Mean Sq F value Pr(>F)

status 1 0.5626 0.56264 2.9697 0.09586 .

pig 12 6.2172 0.51810 2.7346 0.01386 *

Residuals 28 5.3049 0.18946

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Note that as pigs are nested within status, the proper mixed model F-statistic for the statuseffect is

FMixed =MSstatus

MSpig=

0.562640.51810

= 1.085968

where the SSstatus is defined in the “usual” ANOVA way:

statusmeans <- tapply(lactase_red$loglact, lactase_red$status, mean)

overallmean <- mean(lactase_red$loglact)

21*sum((statusmeans-overallmean)^2)

[1] 0.5626422


But from anova(testmodel) we get the number SS = 0.20575 instead, which comes from:

SS(lmer)status = 0.20575 = 0.56264

0.189460.51810

= SS(usual)status

MSResidual

MSPig= SS(usual)

statusMSResidual

(Mixed Error)

Finally, note that the F-statistic for the status effect can then be achieved by taking the (so-called) MS from anova(testmodel) and dividing by the residual error:

FMixed =MS(lmer)

statusMSResidual

=0.205750.18946

= 1.085968

The residual error MS from the fixed model is also the estimate of the residual error varianceof the mixed model:

sigma(testmodel)^2

[1] 0.1894598

The function lmer returns a merMod object containing all relevant information about thefitted model. In the examples above, model2 is an merMod object: To extract informationfrom the object, various functions, so-called extractors, can be applied to the object. Wehave already met the two extractors anova and summary earlier.

A list of all such methods may be found as:

methods(class = "merMod")

[1] anova Anova as.function coef

[5] confint deltaMethod deviance df.residual

[9] drop1 extractAIC family fitted

[13] fixef formula getL getME

[17] hatvalues influence isGLMM isLMM

[21] isNLMM isREML linearHypothesis logLik

[25] matchCoefs model.frame model.matrix ngrps

[29] nobs plot predict print

[33] profile ranef refit refitML

[37] residuals show sigma simulate

[41] summary terms update VarCorr

[45] vcov vif weights

see ’?methods’ for accessing help and source code

eNote 5 5.12 EXERCISES 26

5.12 Exercises

Exercise 1 Blueberries

In an experiment with 4 sorts of blueberries, the fructification (the number of berries asa percentage of the number of flowers earlier in the season) was determined for 5 twigson each of 6 bushes for every sort. The data are available in the file blueberry.txt—seealso the Hierachical Section 13.3 in eNote 13.

a) Investigate whether the 4 sorts have different fructification.

b) Draw the factor structure diagram corresponding to the analyzed model.

c) Test whether it is important to include bush as a random effect.

d) Interpret the differences between the sorts and make relevant post-hoc tests.

e) Draw conclusions.

hierarchical random effects - dtu course website 02429 · hierarchical random effects. enote 5...

Documents