
©CSCAR, 2010: Proc Mixed 1

Introduction to SAS® Proc Mixed

CSCAR Workshop
May 19 & 21, 2010

Kathy Welch, Instructor
kwelch@umich.edu

Workshop Goals
• Introduce the Linear Mixed Model (LMM) and some key concepts (theory)
• Learn what types of data are appropriate for a LMM analysis
• Learn how to set up data for analysis using SAS Proc Mixed
• Learn how to set up Proc Mixed syntax to fit LMMs for different types of data
• Interpret output from Proc Mixed
• Get diagnostic plots from Proc Mixed to check LMM assumptions

Lab Examples

1. Randomized block design

2. Two-level clustered data

3. Three-level clustered data

4. Repeated measures data

5. Longitudinal data

What is a Linear Mixed Model (LMM)?

• A parametric linear model for a normally distributed response, appropriate for non-independent data
  – Clustered data
    • Responses for units in the same cluster may be correlated
  – Repeated Measures / Longitudinal data
    • Residuals for the same subject may be correlated
• Differs from ordinary linear regression and ANOVA, where we assume independent observations

What is a Linear Mixed Model? (Cont)

• Predictors may be
  – Fixed
  – Random
• Fixed predictors have fixed effects parameters and specify the mean structure
• Random effects are associated with individual subjects or clusters and determine the covariance structure
• Variances and covariances can differ by group
• Differs from the general linear model, where we assume constant variance

Data Appropriate for a LMM Analysis

• Clustered data
  – Subjects are nested in clusters, such as classrooms, families, litters, neighborhoods
• Repeated Measures data
  – Multiple measurements for the same subjects over time or another dimension
• Longitudinal data
  – Multiple measures for the same subject over a long period of time
• Observations for the same cluster/subject are likely to be correlated (non-independent)

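General Linear Model vs Linear Mixed Model

General Linear Model
$$Y_i = X_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2 I)$$
• Independent observations with constant variance
• The parameters in a GLM are β and σ²

Linear Mixed Model
$$Y_i = \underbrace{X_i\beta}_{\text{fixed}} + \underbrace{Z_i u_i}_{\text{random}} + \varepsilon_i, \qquad u_i \sim N(0, D), \quad \varepsilon_i \sim N(0, R_i)$$
• Ri is a variance-covariance matrix that captures non-independence
• The parameters in a LMM are β and the variances and covariances in D and Ri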

Linear Mixed Model for the ith Subject

$$Y_i = \underbrace{X_i\beta}_{\text{fixed}} + \underbrace{Z_i u_i}_{\text{random}} + \varepsilon_i, \qquad u_i \sim N(0, D), \quad \varepsilon_i \sim N(0, R_i)$$

• where β are fixed effects parameters
• ui are random variables, with a normal distribution and variance-covariance matrix D
• εi are random residuals, with a normal distribution and variance-covariance matrix Ri
• ui and εi are independent

Yi : Response for Subject i
• Yi is a vector of responses for the ith cluster/subject
• The ni responses for cluster/subject i are set up in long format, with each response on a separate row of data, along with all of the covariates (predictors) for that response.

• Each subject/cluster may have a different number of responses

• Yi is approximately normally distributed (actually the residuals must be normal)

• If residuals are not normally distributed, we may consider a transformation to improve normality

Yi for Clustered Data
• There are ni units within cluster i
• Response measured once for each unit in the cluster
• Number of units per cluster can vary
  – Some clusters may have only one unit
• Example 1
  – Mathgain score measured once for each sampled student in 130 classrooms
  – Number of sampled students per classroom can vary
  – What is the cluster?
  – What is the unit?

Clustered Data Examples
• Example 2
  – First, schools are sampled
  – Next, classrooms within schools are sampled
  – Finally, students within classrooms are sampled
  – Mathgain score for each student is measured
  – What is the cluster?
  – What is the unit?
• Example 3
  – Birth weights of litters of rat pups are measured
  – What is the cluster?
  – What is the unit?
• More examples?

Clustered Data Table

Y1 : response vector for the 3 students in classid 160

Yi for Repeated Measures Data
• The classic repeated measures design is multiple measures of a response over time
  – Multiple measures for the same subject
  – ni measures of the response for subject i
  – ni can vary across subjects
• Data can vary over time, space, or other dimension/s
• Example 1
  – Insulin levels at 1 min, 5 min, 20 min, and 60 min after injection of a drug for diabetes
  – What is the subject?
  – What is the repeated measures factor?

Repeated Measures Data Examples
• Example 2
  – Chemical concentration in 5 different brain regions in rats
  – What is the response?
  – What is the subject?
  – What is the repeated measures factor?

• More examples?

Repeated Measures Data Table

Y1 : response vector for animal R111097

Yi for Longitudinal Data
• Longitudinal data are measures made on the same subject over a longer period of time
• ni measures of the response for a given subject
  – Number per subject can vary due to attrition
• Example 1
  – Socialization score for autistic children measured at ages 2, 3, 5, 9, and 13
  – What is the subject?
  – What is the time frame?
  – Do you think attrition will be a problem?
• Other examples?

Longitudinal Data Table

Y1 : response vector for subject 1

Fixed Predictors
• Fixed predictors can be categorical (factors) or continuous
• For factors, all levels of interest are included
  – Treatment
  – Gender
• Levels of fixed factors can be defined to represent contrasts of interest
  – High Dose vs. Control, Medium Dose vs. Control
  – Female vs. Male
• Fixed continuous predictors can be included as linear, quadratic or other terms

Fixed Predictor Examples

• Age, Age²

• Income

• Gender

• Drug Treatment

• Region

• Examples from your work…

Xi : Design Matrix for Fixed Effects

• Xi contains values of the fixed predictor variables (X variables) for subject i.

• e.g. Age, Sex, Treatment

• The X matrix can include continuous and/or indicator variables for categorical predictors

• Xi has one row for each of the ni observations for the ith subject, and one column for each of the predictors
  – We implicitly include an intercept (a column of ones) for most models
  – We do not need to have an intercept variable in the data

X Matrix Example

• The X matrix is formed from variables in the dataset.
• We usually don’t include a variable for the intercept in the dataset.
• The intercept is included in the model by default by Proc Mixed.
• The X matrix variables must be present for each row of data for that observation to be included in the analysis

X Matrix for subject 1

β: Fixed Effects Parameters

• β are fixed-effects parameters or regression coefficients
  – unknown fixed quantities
• β describe how the mean of Y depends on the predictor variables for an entire population or subpopulation of subjects
• The value of β does not vary across individual subjects
• We usually include an intercept (β0) as one of the components of β

Random Factors
• Random Factor: A classification variable
• Random factors do not represent conditions chosen to meet the needs of the study, but arise from sampling a larger population

• Variation in the dependent variable across levels of a random factor can be estimated and assessed

• Results can be generalized to the greater population

• A random factor may have different random effects associated with it (e.g. random intercept and random slope)

Random Factor Examples

• Clustered Data:
  – Classroom
  – Hospital
  – Neighborhood
  – More examples…
• Repeated Measures/Longitudinal Data:
  – Person (subject)
  – More examples…

• The random factor is the cluster in clustered data, and subject in repeated measures/longitudinal data

ui : Random Effects
• ui are unobserved random variables (not parameters)
• ui are specific for the ith subject (random factor) in a LMM
• ui vary across clusters/subjects
• ui are random deviations in the relationships described by fixed effects
• ui are assumed to have a normal distribution
  – mean = 0 and variance-covariance matrix D
• The parameters associated with the ui are the variances and covariances of these random variables

Normal Distribution Refresher

E(Y) = μ = Mean, or Expected value
  – balance point
  – center of symmetric distribution

-infinity < Y < infinity

Var(Y) = σ² = Variance
  – Measure of variability, or spread
  – σ² must be 0 or positive, >= 0

σ is the standard deviation

Normal Distribution Refresher II

Same ,Different

Covariance Refresher
• Covariance (denoted by σY1,Y2) is a measure of how much two random variables change together.
• It can be positive or negative.
• Jointly normal random variables with zero covariance are independent

Variance-Covariance Matrix
• A variance-covariance matrix contains the variances and covariances between a set of random variables.

• The dimension of the var-covar matrix is the same as the number of random variables.

• If there is only one r.v., the dimension would be 1 by 1 (just a single value).

• If there are two r.v.s, the matrix would be 2 by 2 (or 2x2)

• The set of variances and covariances are called the covariance parameters.

The D matrix

• D is the variance-covariance matrix for the random effects (ui) for subject i

• D contains the covariance parameters for the random effects

• The dimension of D is based on the number of random effects per subject, not the number of observations per subject
  – If there is one random effect (e.g., a random intercept) per subject, D would be 1x1
  – If there are two random effects (e.g., a random intercept and random slope) per subject, D would be 2x2

Form of the D Matrix

• Variances of random effects are on the diagonal
• Covariances between different random effects within the same subject are on the off-diagonal
• Symmetric, positive-definite matrix
  – Variances must all be positive
• SAS calls this G
• We use D and G interchangeably

$$D = Var(u_i) = \begin{pmatrix} Var(u_{1i}) & cov(u_{1i}, u_{2i}) & \cdots & cov(u_{1i}, u_{qi}) \\ cov(u_{1i}, u_{2i}) & Var(u_{2i}) & \cdots & cov(u_{2i}, u_{qi}) \\ \vdots & \vdots & \ddots & \vdots \\ cov(u_{1i}, u_{qi}) & cov(u_{2i}, u_{qi}) & \cdots & Var(u_{qi}) \end{pmatrix}$$

Two Common Structures for D

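• Variance components (Independent): type=vc

$$D = \begin{pmatrix} \sigma^2_{u1} & 0 \\ 0 & \sigma^2_{u2} \end{pmatrix}$$

• Unstructured: type=un

$$D = \begin{pmatrix} \sigma^2_{u1} & \sigma_{u1,u2} \\ \sigma_{u1,u2} & \sigma^2_{u2} \end{pmatrix}$$

Although there are many other possible structures for D, one of these two structures is almost always used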
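As a rough sketch of how these two structures might be requested, the random statement takes a type= option. The dataset name (study), response (y), predictor (time), and subject identifier (id) are hypothetical placeholders, not part of the workshop data.

```sas
/* Hypothetical names: study, y, time, id are illustrative only */

/* Variance components: random intercept and slope treated as independent */
proc mixed data=study;
  class id;
  model y = time / solution;
  random intercept time / subject=id type=vc g;
run;

/* Unstructured: also estimates the intercept-slope covariance */
proc mixed data=study;
  class id;
  model y = time / solution;
  random intercept time / subject=id type=un g;
run;
```

The g option prints the estimated matrix, so the two fits can be compared directly: the type=vc fit should show zeros off the diagonal, while the type=un fit adds one covariance parameter.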

Covariance Matrices
• How would you fill these in, if you had a random intercept and random slope per subject and 10 obs per subject?

_____vc

_____un

DScenario A: 2

intercepts=.282

slopes=.10

Scenario B: 2

intercepts=.282

slopes=.10

intercepts,slopes=-.01

Zi : Design Matrix for Random Effects

• The Zi matrix can include both continuous and indicator variables for subject i

• Zi has one row for each observation for the ith subject

• Number of columns in Zi depends on the number of random effects in the model

• We often include a random intercept for each subject

• In a model with one random intercept per subject, Zi would have one column per subject

• In a model with a random intercept and random slope for each subject, Zi would have two columns per subject.
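For illustration, suppose subject i has three observations taken at times 1, 2, and 3 (these values are assumed for the sketch and are not from the workshop data). The Zi matrices for the two models just described would then look like this:

```latex
% Random intercept only: a single column of ones
Z_i = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\qquad
% Random intercept and random slope for time: ones plus the time values
Z_i = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}
```

In both cases the number of rows equals the number of observations for the subject, and the number of columns equals the number of random effects.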

Z Matrix Example

Z Matrix for Subject 1

We don’t include variables for Z in our dataset. Note that Zi has all zero values for other subjects.

Random Residuals: εi

• The εi vector contains the residuals for the ith subject

• There is one value of εi for each observation for the ith subject

• We assume that the εi are normally distributed, with mean = 0 and variance-covariance matrix, Ri

• There are a large number of possible structures for Ri, some of which we will examine later

• For example, we can allow the variances of the residuals at different time points to differ.

The Ri matrix
• Ri contains the variances and covariances of residuals for the same subject
  – residual covariance parameters
• The dimension of Ri depends on the number of observations (ni) for subject i.
  – For a subject with 5 repeated measures, the Ri matrix would be 5 x 5.
  – For a subject with only one measure, the Ri matrix would be 1 x 1.
• The default assumption in Proc Mixed for the Ri matrix is that the variance of all residuals is the same and that the covariances are all zero.

Form of the R Matrix

• The diagonal elements are variances of residuals for the same subject

• The off-diagonal elements are covariances between two residuals for the same subject

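• Symmetric, positive-definite matrix
  – The variances must all be > 0

$$R_i = Var(\varepsilon_i) = \begin{pmatrix} Var(\varepsilon_{1i}) & cov(\varepsilon_{1i}, \varepsilon_{2i}) & \cdots & cov(\varepsilon_{1i}, \varepsilon_{n_i i}) \\ cov(\varepsilon_{1i}, \varepsilon_{2i}) & Var(\varepsilon_{2i}) & \cdots & cov(\varepsilon_{2i}, \varepsilon_{n_i i}) \\ \vdots & \vdots & \ddots & \vdots \\ cov(\varepsilon_{1i}, \varepsilon_{n_i i}) & cov(\varepsilon_{2i}, \varepsilon_{n_i i}) & \cdots & Var(\varepsilon_{n_i i}) \end{pmatrix}$$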

Form of the R Matrix II

• Proc Mixed has many possibilities for the structure of the R matrix

• We will discuss some of these later in the workshop.
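As a preview, here is a minimal sketch of how a structure for Ri might be requested with the repeated statement. The dataset and variable names (study, y, trt, time, id) are hypothetical, and compound symmetry is chosen only as an example of a non-default structure.

```sas
/* Hypothetical names: study, y, trt, time, id are illustrative only */
proc mixed data=study;
  class id trt time;
  model y = trt time trt*time / solution;
  /* type= selects the Ri structure; r and rcorr print the estimated
     covariance and correlation matrices for the first subject */
  repeated time / subject=id type=cs r rcorr;
run;
```

Other values such as type=ar(1) or type=un can be substituted in the same position. type=un(1) is one way to let the residual variance differ by time point while keeping the covariances at zero.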

Covariance Parameters
• We estimate a set of covariance parameters θ, which are the variances and covariances for the D and R matrices
  – For D we estimate θD (variances and covariances of the random effects)
  – For R we estimate θR (variances and covariances of the random residuals)

• The number of covariance parameters that we estimate depends on the number of random effects, and the structure we specify for D and R

Covariance Summary

• We use Proc Mixed to estimate the variance of random effects, and the covariance between pairs of random effects in a LMM.

• We also use Proc Mixed to estimate the variances and covariances of the random residuals in a LMM.

• We assume that the random effects and the random residuals are independent.

Implied Marginal Model

• A LMM uses random effects explicitly to model between-subject variance
  – Subject-specific model
  – Includes D matrix and R matrix
• Implied marginal model
  – Marginal model that results from fitting a LMM
  – The marginal variance-covariance matrix is called V
  – V is derived from D and R

• We can get the distribution of the population mean using the implied marginal model

Implied Marginal Distribution of Yi Based on a LMM

Vi, the marginal variance-covariance matrix, is derived from D and Ri

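$$Y_i = X_i\beta + Z_i u_i + \varepsilon_i$$

Mean of Yi:
$$E(Y_i) = E(X_i\beta + Z_i u_i + \varepsilon_i) = X_i\beta$$

Variance of Yi:
$$Var(Y_i) = V_i = Var(X_i\beta + Z_i u_i + \varepsilon_i) = Var(Z_i u_i) + Var(\varepsilon_i) = Z_i D Z_i' + R_i$$

$$Y_i \sim N(X_i\beta,\; Z_i D Z_i' + R_i)$$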

Proc Mixed in SAS
• Is an appropriate tool to fit models for clustered or repeated measures / longitudinal data
• Allows users to fit LMMs with both fixed and random effects
• Accommodates models with a wide variety of correlation (covariance) structures
• Can be used when there are unequal numbers of observations per cluster/subject
• Can be used when there are unequal variances for different subgroups of observations
• Has a rich array of graphical and analytic tools to assess the fit of LMMs

Data Structure for Proc Mixed

• We structure the data in “long” form, so multiple observations for the same subject/cluster are on separate rows of data

• Each row contains all information specific to the cluster or subject
  – Some variables vary across clusters/subjects, but are constant within a given cluster/subject; they will be repeated for all rows for the same subject/cluster
    • In repeated measures, these are time-invariant
  – Some variables change for different subjects within a cluster
    • In repeated measures, these are time-varying
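If the data start out in "wide" form (one row per subject), a data step can restructure them. The sketch below uses a made-up dataset with four repeated responses y1-y4; all names and values are hypothetical.

```sas
/* Hypothetical wide data: one row per subject, y1-y4 are the
   responses at four time points, sex is time-invariant */
data wide;
  input id sex $ y1-y4;
  datalines;
1 F 10 12 13 15
2 M  9 11 12 12
;
run;

/* Long form: one row per subject per time point */
data long;
  set wide;
  array yy{4} y1-y4;
  do time = 1 to 4;
    y = yy{time};   /* response at this time point */
    output;         /* write one row per measurement */
  end;
  drop y1-y4;
run;
```

In the resulting long dataset, id and sex are repeated on every row for the same subject, while time and y change from row to row.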

Randomized Block Design
• A block is a group of relatively homogeneous experimental units
• The use of blocks reduces variability (within-block variability should be low, between-block variability can be high)
• Individual blocks are independent
• Observations within a block are correlated
• Blocks are usually random factors
  – They represent a random sample from a population
  – We wish to make inferences to the population, not to the individual blocks
• Examples of blocks include batches, machines, plots, mice, people, clinics, and bananas (the next example)

Lab Example 1

Randomized Block Design
Banana Data

(Hypothetical) Banana Example
• Purpose: To compare shelf life of bananas, when treated with three different food preservatives (A, B, C)
• Experimental material: 5 bananas
• Experimental design, bananas are blocks:
  – Cut each banana into three pieces.
  – Randomly assign one of the three preservatives to each piece

                        Treatment
Banana (Block)       A       B       C
      1             8.9     9.1     9.1
      2             9.3     9.4     9.7
      3             9.4     9.3     9.6
      4             9.6     9.8    10.0
      5            10.0     9.9    10.2

Fixed and Random Factors
• Treatment is a fixed factor; contrasts between treatments are of interest
• Bananas are a random sample from a population of bananas
  – We want conclusions for the study to apply to the whole population of bananas, not just these particular bananas
  – Banana is a random factor that will have random effects
• We will fit a LMM with fixed effects for treatment and a random effect for each banana

Model for Banana Data

$$Y_{ti} = \beta_0 + \beta_t + b_i + \varepsilon_{ti}, \qquad b_i \sim N(0, \sigma^2_b), \quad \varepsilon_{ti} \sim N(0, \sigma^2)$$

• where Yti = shelf life of banana i, treated with preservative t
• β0 = intercept
• βt = fixed effect of treatment t, t = 1, 2, 3
• bi = random effect (intercept) for banana i
• εti = residual for banana i, treatment t
• We estimate five parameters: β0, β1, β2, σb², σ²
  – β3 = 0 (set-to-zero restriction)

Note: The D matrix is 1 by 1, because we have only 1 random effect per banana

D matrix for Banana Data

• The bi are random intercepts, one for each banana

• σb² is the variance of the random banana intercepts, and captures the between-banana variance
• In this case (1 random effect per subject), D is 1 x 1
• σb² is the only random effects parameter we need to estimate
• The covariance of observations on the same banana depends on the variance of the random effects, σb²

$$b_i \sim N(0, \sigma^2_b), \qquad D = Var(b_i) = \sigma^2_b$$

Ri Matrix for Banana Data

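• There are 3 observations per banana
• The Ri matrix will be a 3 x 3 matrix for each banana
• We assume the default structure (σ²I) for Ri
• The residual variance is constant: Var(εti) = σ²

$$\varepsilon_i \sim N(0, R_i), \qquad R_i = Var(\varepsilon_i) = \sigma^2 I = \sigma^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix}$$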

Data (long form)

   y     treat   banana
  8.9      A       1
  9.1      B       1
  9.1      C       1
  9.3      A       2
  9.4      B       2
  9.7      C       2
  ...
 10.0      A       5
  9.9      B       5
 10.2      C       5

Model for each Banana

y11 = β0 + β1 + b1 + ε11
y21 = β0 + β2 + b1 + ε21
y31 = β0 + β3 + b1 + ε31

y12 = β0 + β1 + b2 + ε12
y22 = β0 + β2 + b2 + ε22
y32 = β0 + β3 + b2 + ε32

...
yti = β0 + βt + bi + εti
...

y15 = β0 + β1 + b5 + ε15
y25 = β0 + β2 + b5 + ε25
y35 = β0 + β3 + b5 + ε35
______________________

β = treat --- β1 β2 β3 (Fixed Effects)

b = banana --- b1 b2 b3 b4 b5 (Random Effects)

Example 1: SAS Code

data fruit;
  input shelf treat $ banana;
  cards;
 8.9 A 1
 9.1 B 1
 9.1 C 1
 9.3 A 2
 9.4 B 2
 9.7 C 2
 9.4 A 3
 9.3 B 3
 9.6 C 3
 9.6 A 4
 9.8 B 4
10.0 C 4
10.0 A 5
 9.9 B 5
10.2 C 5
;

proc mixed data = fruit;
  class treat banana;
  model shelf = treat / solution;
  random banana;
run;

Proc Mixed Syntax

• Class statement sets up categorical factors for both fixed and random effects

• Model statement specifies the fixed factors in the model

• Random statement specifies the random factors to be included in the model, and specifies the structure for the D matrix (called G matrix by SAS)

• Repeated statement specifies the structure of the R matrix of residual variances and covariances

Example 1: Proc Mixed Syntax

proc mixed data = fruit;
  class treat banana;
  model shelf = treat / solution;
  random banana;
run;

Note: Proc Mixed will automatically include a dummy variable for each level of a class variable. The highest level of the class variable is given a coefficient of 0 for the dummy variable by default. This makes the highest level the reference.
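To make the dummy coding concrete, here is a sketch of the fixed-effects design matrix for a single banana under this parameterization. It assumes the three rows are ordered A, B, C and the columns are intercept, treat A, treat B, treat C; the last column belongs to the reference level, whose coefficient is set to 0.

```latex
X_i = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix},
\qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix},
\qquad \beta_3 = 0
```

Four columns here matches the "Columns in X 4" line that Proc Mixed reports for this model.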

Ex 1: Proc Mixed Output Part 1

The Mixed Procedure

            Model Information
Data Set                      WORK.FRUIT
Dependent Variable            shelf
Covariance Structure          Variance Components
Estimation Method             REML
Residual Variance Method      Profile
Fixed Effects SE Method       Model-Based
Degrees of Freedom Method     Containment

         Class Level Information
Class      Levels    Values
treat           3    A B C
banana          5    1 2 3 4 5

Ex 1: Proc Mixed Output (Cont)

            Dimensions
Covariance Parameters        2
Columns in X                 4
Columns in Z                 5
Subjects                     1
Max Obs Per Subject         15

Note: SAS assumes all obs are for the same subject, because we did not specify a subject in the random statement

        Number of Observations
Number of Observations Read         15
Number of Observations Used         15
Number of Observations Not Used      0

                 Iteration History
Iteration    Evaluations    -2 Res Log Like    Criterion
        0              1        16.24999675
        1              1        -2.40852048    0.00000000

Convergence criteria met.

Ex 1: Proc Mixed Output (Cont)

Covariance Parameter Estimates
Cov Parm     Estimate
banana         0.1430
Residual     0.008667

• banana: estimated between-banana variance, σb²
• Residual: estimated within-banana variance, σ²

          Fit Statistics
-2 Res Log Likelihood         -2.4
AIC (smaller is better)        1.6
AICC (smaller is better)       2.9
BIC (smaller is better)        0.8

                  Solution for Fixed Effects
                                 Standard
Effect       treat    Estimate      Error    DF    t Value    Pr > |t|
Intercept               9.7200     0.1742     4      55.81      <.0001
treat        A         -0.2800    0.05888     8      -4.76      0.0014
treat        B         -0.2200    0.05888     8      -3.74      0.0057
treat        C               0          .     .        .         .

• Intercept: estimated mean for Treat C
• treat A: estimated diff in mean of Treat A vs. C
• treat B: estimated diff in mean of Treat B vs. C

      Type 3 Tests of Fixed Effects
            Num    Den
Effect       DF     DF    F Value    Pr > F
treat         2      8      12.54    0.0034

• Significance of overall Treat effect

Ex 1: Interpreting Fixed Effects Estimates

• The estimated effect of each treatment represents a contrast between that level of treatment and the last level

• The intercept represents the estimated mean for the last level of treatment
  – The estimated shelf life for treatment C = 9.72 days
• The effect of treatment A = -0.28
  – Treatment A reduces shelf life by 0.28 days, as compared to treatment C (the reference group), p = 0.0014
• The effect of treatment B = -0.22
  – Treatment B reduces shelf life by 0.22 days, as compared to treatment C, p = 0.0057

Ex 1: Fixed Effects Estimates (Cont)

• We can substitute the estimated fixed effects parameters into the model equation to get the predicted mean for each treatment

$$\hat\mu_A = \hat\beta_0 + \hat\beta_1 = 9.72 - .28 = 9.44$$

$$\hat\mu_B = \hat\beta_0 + \hat\beta_2 = 9.72 - .22 = 9.50$$
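The same predicted means can also be requested from Proc Mixed rather than computed by hand. A minimal sketch, assuming the fruit dataset from Example 1; the lsmeans statement is an addition for illustration and is not part of the original lab code.

```sas
proc mixed data=fruit;
  class treat banana;
  model shelf = treat / solution;
  random intercept / subject=banana;
  /* Estimated mean shelf life for each treatment, with confidence
     limits (cl) and all pairwise differences (diff) */
  lsmeans treat / diff cl;
run;
```

For this balanced design the least squares means should agree with the hand calculation from the fixed effects estimates, and the A vs. C and B vs. C differences should match the treat A and treat B rows of the Solution for Fixed Effects table.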

Ex 1: Covariance Parameter Estimates

• There are two covariance parameters in this model, the estimated between-banana variance:

$$\hat\sigma^2_b = \hat D = 0.1430$$

• And the estimated within-banana, or residual variance:

$$\hat\sigma^2 = 0.008667$$

Ex 1: Estimated G Matrix

• We can view the G matrix (what we call the D matrix) by using the g option:

random banana / g;

                     Estimated G Matrix
Row    Effect    banana      Col1      Col2      Col3      Col4      Col5
  1    banana    1         0.1430
  2    banana    2                   0.1430
  3    banana    3                             0.1430
  4    banana    4                                       0.1430
  5    banana    5                                                 0.1430

Note: The D matrix for each banana is 1x1

Ex 1: Estimated Ri Matrix

• We can view the estimated 3 x 3 Ri matrix for the three observations for the first banana by adding a repeated statement:

repeated / subject=banana r;

   Estimated R Matrix for Banana 1
Row        Col1        Col2        Col3
  1    0.008667
  2                0.008667
  3                            0.008667

Ex 1: Estimated V matrix
• We can view the estimated V matrix of marginal variances and covariances for all bananas by using the v option in the random statement (note V is block-diagonal, with obs from different bananas being independent):

random banana / v;

Estimated V Matrix for Subject 1
 Row   Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9  Col10  Col11  Col12  Col13  Col14  Col15
   1  0.152  0.143  0.143
   2  0.143  0.152  0.143
   3  0.143  0.143  0.152
   4                       0.152  0.143  0.143
   5                       0.143  0.152  0.143
   6                       0.143  0.143  0.152
   7                                            0.152  0.143  0.143
   8                                            0.143  0.152  0.143
   9                                            0.143  0.143  0.152
  10                                                                 0.152  0.143  0.143
  11                                                                 0.143  0.152  0.143
  12                                                                 0.143  0.143  0.152
  13                                                                                      0.152  0.143  0.143
  14                                                                                      0.143  0.152  0.143
  15                                                                                      0.143  0.143  0.152

Ex 1: Calculation of V matrix from D and R

• The V matrix, is derived from the D and R matrices

• The covariance, and hence correlation, among observations within the same banana is due to the between-banana variation

• If there is zero between-banana variation, there is no correlation among obs for the same banana

$$Var(Y_i) = V_i = Z_i D Z_i' + R_i$$

Ex 1: Calculation of V Matrix(Cont)

• We first illustrate these calculations for the ith banana

• We then show how these calculations work for the entire data set, assuming we have only two bananas in the study

• This can then be generalized to any number of bananas

• We will then have tools to help understand more complicated models

Ex 1: Step 1 of Calculation of V Matrix for the ith banana

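$$\widehat{Var}(Y_i) = \hat V_i = Z_i \hat D Z_i' + \hat R_i$$

$$Z_i \hat D Z_i' = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} (.143) \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} .143 & .143 & .143 \\ .143 & .143 & .143 \\ .143 & .143 & .143 \end{pmatrix}$$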

Ex 1: Step 2 of Calculation of V Matrix

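$$\hat R_i = \hat\sigma^2 I = .008667 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} .0087 & 0 & 0 \\ 0 & .0087 & 0 \\ 0 & 0 & .0087 \end{pmatrix}$$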

Ex 1: Step 3 of Calculation of V Matrix

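$$\hat V_i = Z_i \hat D Z_i' + \hat R_i = \begin{pmatrix} .1430 & .1430 & .1430 \\ .1430 & .1430 & .1430 \\ .1430 & .1430 & .1430 \end{pmatrix} + \begin{pmatrix} .0087 & 0 & 0 \\ 0 & .0087 & 0 \\ 0 & 0 & .0087 \end{pmatrix} = \begin{pmatrix} .1517 & .1430 & .1430 \\ .1430 & .1517 & .1430 \\ .1430 & .1430 & .1517 \end{pmatrix}$$

$$\hat V_i = \begin{pmatrix} \hat\sigma^2_b + \hat\sigma^2 & \hat\sigma^2_b & \hat\sigma^2_b \\ \hat\sigma^2_b & \hat\sigma^2_b + \hat\sigma^2 & \hat\sigma^2_b \\ \hat\sigma^2_b & \hat\sigma^2_b & \hat\sigma^2_b + \hat\sigma^2 \end{pmatrix}$$

• Covariance of obs on same banana is the between-banana variance
• Marginal variance of each obs is the between-banana variance plus within-banana variance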
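The three steps can be checked numerically. The sketch below uses PROC IML, which is an assumption on our part (it requires SAS/IML to be licensed and is not used elsewhere in the workshop); the estimates are typed in from the Proc Mixed output.

```sas
proc iml;
  Z = {1, 1, 1};            /* 3 x 1 design matrix for the random intercept */
  D = 0.1430;               /* estimated between-banana variance */
  R = 0.008667 * I(3);      /* estimated residual covariance matrix, sigma^2 * I */
  V = Z * D * t(Z) + R;     /* marginal covariance matrix for one banana */
  print V[format=8.4];
quit;
```

The printed matrix should have the sum of the two variance estimates on the diagonal and the between-banana variance everywhere else, as in Step 3.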

Ex 1: Calculation of V Matrixfor two bananas

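$$\hat R = \hat\sigma^2 I = .008667 \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} .0087 & 0 & 0 & 0 & 0 & 0 \\ 0 & .0087 & 0 & 0 & 0 & 0 \\ 0 & 0 & .0087 & 0 & 0 & 0 \\ 0 & 0 & 0 & .0087 & 0 & 0 \\ 0 & 0 & 0 & 0 & .0087 & 0 \\ 0 & 0 & 0 & 0 & 0 & .0087 \end{pmatrix}$$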

Ex 1: Calculation of V Matrixfor two bananas (Cont)

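$$Z \hat G Z' + \hat R = \begin{pmatrix} .143 & .143 & .143 & 0 & 0 & 0 \\ .143 & .143 & .143 & 0 & 0 & 0 \\ .143 & .143 & .143 & 0 & 0 & 0 \\ 0 & 0 & 0 & .143 & .143 & .143 \\ 0 & 0 & 0 & .143 & .143 & .143 \\ 0 & 0 & 0 & .143 & .143 & .143 \end{pmatrix} + \begin{pmatrix} .0087 & 0 & 0 & 0 & 0 & 0 \\ 0 & .0087 & 0 & 0 & 0 & 0 \\ 0 & 0 & .0087 & 0 & 0 & 0 \\ 0 & 0 & 0 & .0087 & 0 & 0 \\ 0 & 0 & 0 & 0 & .0087 & 0 \\ 0 & 0 & 0 & 0 & 0 & .0087 \end{pmatrix}$$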

Ex 1: Calculation of Vfor two bananas (Cont)

• The V matrix is block-diagonal
• Observations within the same banana are correlated
• Observations for different bananas are independent
• This can be extended for more bananas

$$\hat V = Z \hat G Z' + \hat R = \begin{pmatrix} .1517 & .143 & .143 & 0 & 0 & 0 \\ .143 & .1517 & .143 & 0 & 0 & 0 \\ .143 & .143 & .1517 & 0 & 0 & 0 \\ 0 & 0 & 0 & .1517 & .143 & .143 \\ 0 & 0 & 0 & .143 & .1517 & .143 \\ 0 & 0 & 0 & .143 & .143 & .1517 \end{pmatrix}$$

Ex 1: Intraclass Correlation ICC

• The ICC estimates the correlation of observations within the same subject

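• It is very high for this example
• Because it is based on variances, the ICC can only be positive or zero (more on this later)

$$\widehat{ICC} = \frac{\hat\sigma^2_b}{\hat\sigma^2_b + \hat\sigma^2}$$

$$\text{Estimated ICC (banana example)} = \frac{.143}{.143 + .0087} = \frac{.143}{.1517} = .9426$$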

Ex 1: ICC for the Banana Data

• You can get an estimate of the marginal correlation matrix by adding the vcorr option:

random banana / v vcorr;

Estimated V Correlation Matrix for Subject 1
 Row   Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9  Col10  Col11  Col12  Col13  Col14  Col15
   1  1.000  0.942  0.942
   2  0.942  1.000  0.942
   3  0.942  0.942  1.000
   4                       1.000  0.942  0.942
   5                       0.942  1.000  0.942
   6                       0.942  0.942  1.000
   7                                            1.000  0.942  0.942
   8                                            0.942  1.000  0.942
   9                                            0.942  0.942  1.000
  10                                                                 1.000  0.942  0.942
  11                                                                 0.942  1.000  0.942
  12                                                                 0.942  0.942  1.000
  13                                                                                      1.000  0.942  0.942
  14                                                                                      0.942  1.000  0.942
  15                                                                                      0.942  0.942  1.000
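The ICC can also be computed directly from the covariance parameter estimates. The sketch below captures the CovParms ODS table and does the arithmetic in a data step; the dataset names cp and icc are hypothetical, and the row label "Intercept" assumes the random statement is written as random intercept / subject=banana.

```sas
/* Save the covariance parameter estimates to a dataset */
ods output CovParms=cp;
proc mixed data=fruit;
  class treat banana;
  model shelf = treat;
  random intercept / subject=banana;
run;

/* ICC = between-banana variance / (between + within variance) */
data icc;
  set cp end=last;
  retain between within;
  if CovParm = "Intercept" then between = Estimate;
  else if CovParm = "Residual" then within = Estimate;
  if last then do;
    icc = between / (between + within);
    output;
  end;
  keep between within icc;
run;

proc print data=icc;
run;
```

Working from the stored estimates avoids rounding the variances by hand before dividing.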

Ex 1: Using Subject in the Random Statement

• The subject option tells SAS the V matrix is block diagonal, allowing computationally efficient methods of estimation.

random intercept / subject=banana;

• SAS will now say that there are 5 subjects, instead of one.

• With the subject option, if you ask SAS to print out the G, V, or Vcorr matrices, you will only get one block of the matrix, for a single subject

Ex 1: Using Subject in the Random Statement (Cont)

• Without the subject option, the V matrix will still be block diagonal, but SAS won’t know that in advance and will use less efficient methods of computation

• If you have a large sample size, using the subject option may be crucial for efficiency.

Ex 1: Summary
• Random statement allows us to fit a model with correlated observations for the same banana
  – Including the subject option is more efficient

• We get estimates of the between-banana variation, the within-banana variation and the intraclass correlation

• SAS will print estimates of the marginal variance-covariance matrices and the marginal correlation matrices for the model

• We estimate the fixed effects of treatment after adjusting for the random effects of banana

Model-Building Strategies
• Top-Down:
  – Start with a well-defined mean structure
  – Select a structure for the random effects
  – Select a covariance structure for residuals
  – Reduce the fixed effects in the model, removing non-significant effects
• Step-Up:
  – Often used when fitting HLM models
  – Starts with “unconditional” or means-only model
  – Add fixed effects and random effects to the different levels of the model

Estimation in LMMs

• Use either ML (Maximum Likelihood) or REML (Residual or Restricted Max Likelihood) to estimate covariance parameters in V

• Then use Generalized Least Squares (GLS) to estimate the fixed effects parameters, β

REML Estimation
• REML is a way of estimating covariance parameters

• Produces unbiased estimates of covariance parameters

• Takes into account loss of df resulting from estimating fixed effects

• Used when carrying out hypothesis tests about covariance parameters

• Less important to use REML when sample size is large

ML Estimation
• Maximize a profile log likelihood function lML(θ)
• Nonlinear optimization
• Constraints applied to the covariance parameters in D and R
• Iterates to a solution
• ML estimation used for hypothesis tests (LRT) about fixed effects parameters, β

$$l_{ML}(\theta) = -0.5\, n \ln(2\pi) - 0.5 \sum_i \ln(\det(V_i)) - 0.5 \sum_i r_i' V_i^{-1} r_i$$

where ri = yi − Xi β̂ is the residual vector for subject i, with β̂ the GLS estimate of β for the given θ

Hypothesis tests

• We may want to test hypotheses about the fixed effects in β
  – Test whether a particular fixed effect is zero
  – Compare the fixed effects of two (or more) treatments or groups
  – Get an overall test of the fixed effects for a categorical predictor
• We may want to test hypotheses about the covariance parameters in θ
  – Is the variance of a particular random effect zero?
  – Should we include covariances between random effects, or specify them to be zero?

F-tests
• Often approximate in LMMs, except for balanced designs or partially balanced designs
• We do not base F-tests on sums of squares as in traditional ANOVA models
• Denominator df for F-tests may be estimated in a number of ways

H0: Lβ = 0     HA: Lβ ≠ 0

$$F = \frac{(L\hat\beta)' \left[ L \left( \sum_i X_i' \hat V_i^{-1} X_i \right)^{-1} L' \right]^{-1} (L\hat\beta)}{\operatorname{rank}(L)}$$

Denominator df in F-tests
• df method can be specified as an option in Model statement (ddfm= )
• Default method for model with random statement is contain: ddfm=contain (SAS uses syntax rules to figure out correct error term for fixed effects)

• Satterthwaite has good small sample properties: ddfm=sat

• Kenward-Roger tries to correct for the fact that we do not know (and so must estimate) variance of random effects: ddfm=kr

• Between-within divides df into between and within components: ddfm=bw (default for repeated measures)

• With a large number of subjects, the method used doesn’t make much difference

t-tests

• t-tests are also usually approximate in a LMM

• Degrees of freedom (df) for the t-test may also be approximated as for F-test

$$t = \frac{\hat\beta}{se(\hat\beta)}$$

H0: β = 0

HA: β ≠ 0
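As a quick check against the Solution for Fixed Effects table from Example 1, the t values for treatments A and B follow from dividing each estimate by its standard error:

```latex
t_A = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{-0.2800}{0.05888} \approx -4.76,
\qquad
t_B = \frac{\hat\beta_2}{se(\hat\beta_2)} = \frac{-0.2200}{0.05888} \approx -3.74
```

Both are referred to a t distribution with 8 df in that output, the df assigned by the default containment method.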

Likelihood Ratio Tests (LRT)
• Likelihood Ratio Tests compare the likelihood for a nested (reduced) model to that for a reference (full) model.
• One or more parameters in the nested model are constrained (i.e., certain parameters may be set to zero)
• The df for the test are derived by subtracting the number of parameters in the nested model from the number in the reference model

$$-2 \log\left( \frac{L_{nested}}{L_{reference}} \right) = -2 \log(L_{nested}) - \left( -2 \log(L_{reference}) \right) \sim \chi^2_{df}$$

Likelihood Ratio Tests for Fixed Effects

• Use Maximum Likelihood (ML) as the estimation method
  – Fit the reference model
  – Fit the nested model
  – Subtract -2 log likelihood of reference model from that of nested model
  – Calculate the p-value of the test, using appropriate df

• This can be done using SAS code, illustrated in Lab example 2
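A minimal sketch of these steps, using the banana data from Example 1 to test the overall treatment effect (2 df). This is not the Lab example 2 code; the dataset names fit_ref, fit_nested, and lrt are hypothetical.

```sas
/* Reference model: includes treat, fitted by ML */
ods output FitStatistics=fit_ref;
proc mixed data=fruit method=ml;
  class treat banana;
  model shelf = treat / solution;
  random intercept / subject=banana;
run;

/* Nested model: treat removed (intercept only), fitted by ML */
ods output FitStatistics=fit_nested;
proc mixed data=fruit method=ml;
  class treat banana;
  model shelf = / solution;
  random intercept / subject=banana;
run;

/* LRT statistic = (-2 log L nested) - (-2 log L reference) */
data lrt;
  merge fit_ref(where=(Descr="-2 Log Likelihood") rename=(Value=m2ll_ref))
        fit_nested(where=(Descr="-2 Log Likelihood") rename=(Value=m2ll_nested));
  chisq = m2ll_nested - m2ll_ref;
  df = 2;                          /* two treatment parameters set to zero */
  p = 1 - probchi(chisq, df);
  keep m2ll_ref m2ll_nested chisq df p;
run;

proc print data=lrt;
run;
```

Both models keep the same random effects, so the two fits differ only in the fixed effects being tested.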

Likelihood Ratio Tests for Covariance Parameters

• Use Restricted Maximum Likelihood (REML) estimation to get unbiased estimates of covariance parameters

• Fit the reference model
• Fit the nested model
• Calculate the p-value using the appropriate χ² distribution or mixture of χ² distributions, and appropriate df

• This can be done using SAS code, illustrated later
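One common case, stated here as an assumption about which test is meant: when testing whether a single random-effect variance is zero (with no covariances involved), the null value lies on the boundary of the parameter space, and the reference distribution usually used is an equal mixture of χ² with 0 df and χ² with 1 df. Since a χ² variable with 0 df is identically zero, the p-value for an observed statistic x > 0 reduces to:

```latex
p = 0.5 \, P(\chi^2_0 > x) + 0.5 \, P(\chi^2_1 > x) = 0.5 \, P(\chi^2_1 > x)
```

In other words, the usual 1 df p-value is halved in this case.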

Residual Diagnostics
• Assess model residuals for
  – Normality
  – Constant variance
  – Outliers
• Use histograms, normal q-q plots, residual vs. predicted plots, and other diagnostic plots
• Two basic kinds of residuals
  – Conditional residuals / Studentized conditional residuals (conditional on random effects): better for model diagnostics
  – Unconditional residuals (not conditional on random effects): not as good for model diagnostics
  – We will use conditional / studentized conditional residuals for this workshop

Conditional Residuals
• Difference between the observed value and the conditional predicted value (conditional on random effects)
• May not be as well suited for verifying model assumptions and detecting outliers as studentized conditional residuals
• Variances may be different for different subgroups

$$\hat\varepsilon_i = y_i - X_i \hat\beta - Z_i \hat u_i$$

Studentized Conditional Residuals

• Scaled residuals, where each conditional residual is divided by its estimated standard deviation

• Because residuals are scaled, different residuals for subgroups with unequal variances will have similar scales

• Two types of studentization:
  – Internal studentization
    • the observation itself is included in the calculation of its standard deviation (studentized residuals)
  – External studentization
    • the observation itself is excluded when calculating its standard deviation (studentized deleted residuals)

• Well suited for verifying model assumptions and detecting outliers
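One way to obtain conditional and studentized conditional residuals is sketched below. The data set is simulated and every variable name in it is hypothetical. The `residual` option on the model statement, together with `outp=` (conditional) and `outpm=` (marginal), is used here. The output variable names `Pred` and `StudentResid` are what this sketch expects the procedure to write, which is worth confirming with `proc contents` on your own output.

```sas
/* Simulated two-level data: hypothetical names and values */
data pups;
  call streaminit(2010);
  do litter = 1 to 20;
    treat = mod(litter, 2);
    u = rand('normal', 0, 0.5);
    do pup = 1 to 6;
      sex = mod(pup, 2);
      weight = 6 + 0.4*treat - 0.3*sex + u + rand('normal', 0, 0.4);
      output;
    end;
  end;
  drop u;
run;

ods graphics on;
proc mixed data=pups;
  class litter;
  /* residual: adds studentized and Pearson residuals to the output data sets */
  /* outp=  : conditional predictions/residuals (use the EBLUPs)              */
  /* outpm= : marginal predictions/residuals (fixed effects only)             */
  model weight = treat sex / solution residual
                             outp=cond_resid outpm=marg_resid;
  random int / subject=litter;
run;
ods graphics off;

/* Normality check on the studentized conditional residuals */
proc univariate data=cond_resid normal;
  var StudentResid;
  histogram StudentResid / normal;
  qqplot StudentResid / normal(mu=est sigma=est);
run;

/* Constant-variance check: residuals against conditional predicted values */
proc sgplot data=cond_resid;
  scatter x=Pred y=StudentResid;
  refline 0 / axis=y;
run;
```

The studentized residuals produced by `residual` are internally studentized; externally studentized (deleted) residuals come from the influence diagnostics instead.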

Influence Diagnostics

• Can identify observations that are influential in estimation of β (fixed effects) or θ (covariance parameters)

• Examine effect of omission of each observation (or cluster) on analysis of the entire data set

• Proc Mixed includes many ways to study influence diagnostics for LMMs

• Active area of research
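A minimal sketch of cluster-level influence diagnostics follows, again on simulated data with hypothetical variable names. The `influence` option of the model statement is used with `effect=` to delete one litter at a time and `iter=` to allow the covariance parameters to be re-estimated after each deletion. The value 5 is an arbitrary choice for illustration.

```sas
/* Simulated two-level data: hypothetical names and values */
data pups;
  call streaminit(2010);
  do litter = 1 to 20;
    treat = mod(litter, 2);
    u = rand('normal', 0, 0.5);
    do pup = 1 to 6;
      sex = mod(pup, 2);
      weight = 6 + 0.4*treat - 0.3*sex + u + rand('normal', 0, 0.4);
      output;
    end;
  end;
  drop u;
run;

ods graphics on;
ods output Influence=infl;          /* save the influence table for later inspection */
proc mixed data=pups;
  class litter;
  /* effect=litter : remove whole litters rather than single observations */
  /* iter=5        : up to 5 extra iterations to update covariance        */
  /*                 parameter estimates after each deletion              */
  model weight = treat sex / solution influence(effect=litter iter=5);
  random int / subject=litter;
run;
ods graphics off;

proc print data=infl;
run;
```

Omitting `effect=` gives observation-level deletion instead, and omitting `iter=` holds the covariance parameters fixed, which is faster but only reflects influence on the fixed effects.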

Hierarchical Data Structure

• A nice way to visualize data sets appropriate for an LMM analysis is to think about them as a hierarchy.

• This way of thinking about the data is largely due to the HLM program of Bryk and Raudenbush.

• Each level of the data represents a different degree of summarization.

Clustered Data Hierarchy

• Dependent variable measured once for each unit of analysis
  – Units of analysis are nested within clusters of units

• Observations for units in same cluster may be correlated
  – Students in classrooms
  – Rat pups in litters
  – Patients in clinics
  – Students in classrooms and classrooms within schools

Two-level Clustered Data Structure (Rat Pup Data)

[Diagram: Level 2 (Litters): Litter 1, Litter 2. Level 1 (Rat pups): rat pups nested within each litter (Rat pup 1, Rat pup 2, Rat pup 3 in one litter; Rat pup 1, Rat pup 2 in the other).]

• Dependent variable is measured once for each rat pup

Clustered Data Setup

• Data in long format
  – One row for each unit within a cluster

• Unit-specific information
  – Response variable
  – Unit-specific covariates to be included in the model

• Cluster-specific information
  – Cluster ID
  – Cluster-specific covariates to be included in the model
  – These values are repeated for each row for a cluster

• All rows with complete data will be used in fitting the model
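The layout described above might look like the following small data step. All names and values are made up for illustration and are not the lab data. The cluster ID (`litter`) and the cluster-level covariates (`litsize`, `treat`) repeat on every row of a litter, while `pup_id`, `sex`, and `weight` vary by row.

```sas
/* Long format: one row per rat pup; hypothetical names and values */
data pups_long;
  input litter litsize treat $ pup_id sex $ weight;
  datalines;
1 12 Control 1 Male   6.60
1 12 Control 2 Female 6.12
1 12 Control 3 Female 6.05
2 14 High    1 Male   5.92
2 14 High    2 Male   .
2 14 High    3 Female 5.58
3 10 Control 1 Female 6.31
3 10 Control 2 Male   6.74
;
run;

proc print data=pups_long noobs;
run;
```

The row with a missing `weight` shows what "complete data" means in practice: that row would be dropped from the fit, but the other pups in litter 2 would still be used.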

Lab Example 2

Two-Level Clustered Data: Rat Pup Data

Rat pup data Setup for SAS

Ex 2: Summary

• We can use various tests (F-tests, t-tests, and likelihood ratio tests) for the LMM that we fit.

• The appropriate test depends on the hypothesis that we are testing

• Model diagnostics in Proc Mixed are very extensive and provide helpful information about influential cases/clusters

Three-Level Clustered Data Structure

[Diagram: Level 3 (Clusters of Clusters): School 1. Level 2 (Clusters of Units): Classroom 1, Classroom 2. Level 1 (Units of Analysis): students nested within each classroom (Student 1, Student 2, Student 3 in one classroom; Student 1, Student 2 in the other).]

• Dependent variable is measured once for each student

Levels of Data: Level 1

• The most detailed level of the data
  – Response is always measured at Level 1

• For a clustered data set:
  – Level 1 represents the units of analysis (e.g., students in a classroom)
  – Unit-specific covariates (e.g., SES, minority status, sex of each child) are measured at Level 1

Levels of Data: Level 2

• The next level of the hierarchy in the data

• For a clustered data set:
  – Level 2 represents clusters of units (e.g., classrooms)
  – Includes cluster-specific covariates (e.g., classroom size, teacher experience)

Levels of Data: Level 3

• The third level of the hierarchy in the data, if it exists

• For a clustered data set:
  – Level 3 represents clusters of Level 2 units / clusters of clusters (e.g., schools, which are clusters of classrooms)
  – Level 3 specific covariates (e.g., neighborhood characteristics of school, such as household poverty in school neighborhood)

• Other examples of three-level clustered data?

Predicting Random Effects: BLUPs

• Even though classrooms and schools are a random sample from some larger population, we may still want to estimate the classroom- or school-specific means.

• For example, we may want to identify classrooms with poor math scores that might be candidates for an intervention, or identify, post hoc, attributes of poorly performing schools.

• Classroom is an element of the Z vector. We can interpret u_i as the “effect” of classroom i (random classroom effect, or random intercept per classroom).

$$ Y_i = \underbrace{X_i\beta}_{\text{fixed}} + \underbrace{Z_i u_i + \varepsilon_i}_{\text{random}} $$

Predicting Random Effects: BLUPs (Cont)

• If V is known, the estimates of u are Best Linear Unbiased Predictors (BLUPs).

• When V is unknown, the estimates of u are referred to as Empirical Best Linear Unbiased Predictors (EBLUPs).

• Add the solution option to the random statement to produce EBLUPs for the u’s (predictions for each u_i, i.e., the effect of each classroom): random int / subject=classid solution; or random classid / solution;

Predicting Random Effects: BLUPs (Cont)

• The predicted values û_i are the conditional expectations of the random effects, given the observed response values, y_i

• We predict, rather than estimate, the values of the random effects, because they are random variables rather than fixed parameters

• Recall that the assumed distribution of the random effects is normal, and we can check that assumption

$$ \hat{u}_i = E(u_i \mid Y_i = y_i) = \hat{D} Z_i' \hat{V}_i^{-1} (y_i - X_i\hat{\beta}) $$
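A rough sketch of saving the EBLUPs and checking their distribution is shown below. The `random int / subject=classid solution;` statement is the one given on the earlier slide. The simulated data set, the response name `mathgain`, and all numeric values are assumptions for illustration. To keep the example short it uses a two-level model (students in classrooms) rather than the three-level lab model.

```sas
/* Simulated two-level data: students within classrooms (hypothetical) */
data classroom;
  call streaminit(519);
  do classid = 1 to 40;
    u = rand('normal', 0, 3);            /* random classroom effect */
    do childid = 1 to 8;
      mathgain = 50 + u + rand('normal', 0, 6);
      output;
    end;
  end;
  drop u;
run;

/* solution on the random statement prints the EBLUPs;            */
/* ods output SolutionR= saves them, one row per classroom        */
ods output SolutionR=eblups;
proc mixed data=classroom;
  class classid;
  model mathgain = / solution;
  random int / subject=classid solution;
run;

/* Check the normality assumption and look for unusual classrooms */
proc univariate data=eblups normal;
  var Estimate;
  histogram Estimate / normal;
  qqplot Estimate / normal(mu=est sigma=est);
run;
```

Classrooms in the tails of this distribution are the ones whose predicted effects are furthest from the overall mean.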

Lab Example 3

Three-Level Clustered Data: Classroom Data

Ex 3: Summary

• We can fit models for clustered data with three or more levels

• We can check the distribution of the EBLUPs (predicted random effects) to look for outliers (schools or classrooms that are doing particularly well or poorly)

Repeated Measures Data

• Dependent variable measured multiple times for each unit of analysis
  – Repeated measures factor may be time or other observational or experimental factor
  – May be more than one repeated measures factor

• Examples
  – Regions/Treatments within rat brain
  – Insulin levels measured at various time points within patient after injection of drug

• Observations made for same subject may be correlated

Repeated Measures Data Structure

[Diagram: Level 2 (Units of Analysis): rats (..., Rat 2, …). Level 1 (Repeated Measures): Brain Region 1, Brain Region 2, Brain Region 3 nested within each rat.]

• Dependent variable is measured more than once for each rat

Data Setup: Repeated Measures / Longitudinal Data

• Data are in long format

• One row for each time point for each subject

• Each row contains
  – Time-varying information
    • Dependent variable
    • Time-varying covariates to be included in the model
  – Time-invariant information
    • Unit / subject ID
    • Time-invariant covariates to be included in the model
    • Values repeated for each row for a subject

• All rows with complete data will be used in fitting the model

• Number of rows per subject can vary

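Recall: Form of the R_i Matrix

• Variance-covariance matrix of the residuals

• Many different possible structures for R

$$
R_i = Var(\varepsilon_i) =
\begin{pmatrix}
Var(\varepsilon_{1i}) & cov(\varepsilon_{1i}, \varepsilon_{2i}) & \cdots & cov(\varepsilon_{1i}, \varepsilon_{n_i i}) \\
cov(\varepsilon_{1i}, \varepsilon_{2i}) & Var(\varepsilon_{2i}) & \cdots & cov(\varepsilon_{2i}, \varepsilon_{n_i i}) \\
\vdots & \vdots & \ddots & \vdots \\
cov(\varepsilon_{1i}, \varepsilon_{n_i i}) & cov(\varepsilon_{2i}, \varepsilon_{n_i i}) & \cdots & Var(\varepsilon_{n_i i})
\end{pmatrix}
$$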

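Commonly Used Structures for R

• Unstructured (type = UN): a separate variance for each occasion and a separate covariance for each pair of occasions

• Variance Components (type = VC): diagonal structure, all covariances are zero

• Compound Symmetry (type = CS): equal variances and equal covariances

• Banded (type = UN(2)): unstructured, with covariances beyond the first off-diagonal band set to zero

More structures for R

• First-order Autoregressive (type = AR(1)): equal variances, correlations decline with increasing lag

• Toeplitz (type = Toep): equal variances, a separate covariance for each lag

• Toeplitz (2) (type = Toep(2)): Toeplitz, with covariances beyond lag 1 set to zero

• Heterogeneous Compound Symmetry (type = CSH): unequal variances, a common correlation

• Heterogeneous 1st-order Autoregressive (type = ARH(1)): unequal variances, first-order autoregressive correlations

• Heterogeneous Toeplitz (type = Toeph): unequal variances, Toeplitz correlations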

Model Fit: Akaike Information Criterion

• SAS calculates the AIC based on the (ML or REML) log likelihood

• The penalty is 2p, where in SAS, p represents the total number of parameters being estimated for both fixed and random effects.

• Can be used to compare any two models fit to the same observations; they need not be nested.

• Smaller is better.

• Often used to help choose an appropriate structure for the R matrix

$$ AIC = -2\,l(\hat{\beta}, \hat{\theta}) + 2p $$

Model Fit: Bayes Information Criterion

• BIC applies a greater penalty for models with more parameters than does AIC.

• The penalty is the number of parameters, p, times ln(n), where n is the total number of observations in the data set.

• Can be used to compare two models fit to the same observations; they need not be nested.

• Smaller is better.

• Often used, in conjunction with AIC, to help choose an appropriate structure for the R matrix.

$$ BIC = -2\,l(\hat{\beta}, \hat{\theta}) + p\ln(n) $$
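To illustrate how AIC and BIC might be used to choose a structure for R, the sketch below fits the same fixed effects with two candidate structures and places the fit statistics side by side. The data are simulated and the names (`ratbrain`, `animal`, `region`, `activate`) are hypothetical stand-ins, not the lab data. The criteria are taken exactly as the procedure reports them.

```sas
/* Simulated repeated measures data: 3 regions per animal (hypothetical) */
data ratbrain;
  call streaminit(21);
  do animal = 1 to 12;
    u = rand('normal', 0, 2);
    do region = 1 to 3;
      activate = 20 + 3*region + u + rand('normal', 0, 1.5);
      output;
    end;
  end;
  drop u;
run;

/* Candidate 1: compound symmetry */
ods output FitStatistics=fit_cs;
proc mixed data=ratbrain;
  class animal region;
  model activate = region / solution;
  repeated region / subject=animal type=cs;
run;

/* Candidate 2: unstructured */
ods output FitStatistics=fit_un;
proc mixed data=ratbrain;
  class animal region;
  model activate = region / solution;
  repeated region / subject=animal type=un;
run;

/* Both tables list the same statistics in the same order, */
/* so a one-to-one merge lines them up row by row          */
data compare;
  merge fit_cs(rename=(Value=cs)) fit_un(rename=(Value=un));
run;

proc print data=compare noobs;
  var Descr cs un;
run;
```

Both fits use the same estimation method and the same fixed effects, so the information criteria are comparable across the two rows of structures.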

Marginal Model vs. LMM

• LMM uses random effects explicitly to model between-subject variance
  – Subject-specific model
  – Includes D matrix and R matrix

• Implied marginal model (discussed earlier)
  – Marginal model that results from fitting a LMM
  – The marginal variance-covariance matrix is called V
  – V is derived from D and R

• Marginal model does not use random effects in its specification
  – Population-averaged model
  – Uses only the R matrix, no random effects, so no D.

A Marginal Model With No Random Effects

• We do not include random effects in this model.

• Therefore, D is zero

• Covariances, and hence correlations, among residuals are specified directly through the R_i matrix

• V_i (the marginal variance-covariance matrix for Y_i) = R_i

$$ \varepsilon_i \sim N(0, R_i) $$
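In Proc Mixed terms, a marginal model of this kind uses a repeated statement and no random statement. The sketch below, on simulated data with hypothetical names, fits such a model with a compound symmetry R_i and, for comparison, a LMM with a random intercept. The `r`/`rcorr` and `v`/`vcorr` options print the estimated R_i and V_i for the first subject.

```sas
/* Simulated repeated measures data: 3 regions per animal (hypothetical) */
data ratbrain;
  call streaminit(21);
  do animal = 1 to 12;
    u = rand('normal', 0, 2);
    do region = 1 to 3;
      activate = 20 + 3*region + u + rand('normal', 0, 1.5);
      output;
    end;
  end;
  drop u;
run;

/* Marginal model: no random statement, so no D; V_i = R_i */
proc mixed data=ratbrain;
  class animal region;
  model activate = region / solution;
  repeated region / subject=animal type=cs r rcorr;
run;

/* LMM with a random intercept: V_i is implied by D and R_i */
proc mixed data=ratbrain;
  class animal region;
  model activate = region / solution;
  random int / subject=animal v vcorr;
run;
```

Comparing the printed R matrix from the first fit with the V matrix from the second shows whether the two specifications imply the same marginal covariance for these data.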

LSMEANS

• Lsmeans (least squares means) give estimates of the mean of Y for each level of fixed predictors (that are in the class statement), after adjusting for all other fixed covariates in the model
  – Assumes all groups based on categorical predictors are balanced
  – Assumes continuous covariates are fixed at their mean

• Post-hoc tests can be carried out on lsmeans, using different adjustments for multiple comparisons, e.g., Bonferroni, Tukey, Dunnett, Scheffe, etc.

• Slices can be used to get simple effects for interactions
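A short sketch of these lsmeans features follows, using simulated data in which each animal is measured in three regions under two treatments. The data set and variable names are hypothetical, and the choice of a Tukey adjustment is only one of the options listed above.

```sas
/* Simulated data: 2 treatments x 3 regions within each animal (hypothetical) */
data ratbrain;
  call streaminit(4);
  do animal = 1 to 10;
    u = rand('normal', 0, 2);
    do treatment = 1 to 2;
      do region = 1 to 3;
        /* treatment effect is built in for region 1 only */
        activate = 20 + 3*region + 5*(treatment=2)*(region=1)
                   + u + rand('normal', 0, 1.5);
        output;
      end;
    end;
  end;
  drop u;
run;

proc mixed data=ratbrain;
  class animal treatment region;
  model activate = treatment region treatment*region;
  random int / subject=animal;

  /* All pairwise comparisons of region means, Tukey-adjusted */
  lsmeans region / diff adjust=tukey;

  /* Simple effects: test the treatment effect within each region */
  lsmeans treatment*region / slice=region;
run;
```

With an interaction in the model, the sliced tests are usually the more informative output, since the main-effect lsmeans average over the levels of the other factor.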

Lab Example 4

Repeated Measures Design: The Rat Brain Data

Ex 4: Summary

• There are many possible ways to fit a model for repeated measures data

• A LMM with a single random intercept is equivalent to a marginal model with a compound symmetric variance-covariance structure, but only if the between-subject variance is > 0.

• Post-hoc comparisons can be easily carried out using lsmeans statements

• Many different post-hoc methods are available in SAS

Types of Data: Longitudinal Data

• Dependent variable measured multiple times for each unit of analysis
  – Repeated measures factor is time
  – May be over an extended period of time (e.g., years)

• Examples
  – Autistic children measured at different ages

• Observations made on same child may be correlated

Longitudinal Data Structure

[Diagram: Level 2 (Subjects: Units of Analysis): Child ID 1, Child ID 2. Level 1 (Repeated Measures): ages at measurement nested within each child (Age 2 years, Age 3 years, Age 9 years for one child; Age 2 years, Age 5 years for the other).]

• Dependent variable is measured more than once for each child

• Number of measurements does not need to be equal for all subjects

• Spacing of intervals not required to be equal for all measurement times

• Measurement times do not have to be the same for all subjects

Missing Data

• Assume data are Missing at Random (MAR)
  – Probability of having missing data on a given variable may depend on other observed information
  – Does not depend on the data that would have been observed, but is missing

• Include in model other covariates that are predictive of missingness

Lab Example 5

Random Coefficients Models for Longitudinal Data: The Autism Data

Ex 5: Summary

• Random coefficient models can be used to model both the trajectory of change and the variance of the trajectories

• Variance of random intercepts may not be estimable in all situations

• If a problem comes up, it is best to investigate it thoroughly
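As a rough sketch of a random coefficients model, the code below simulates unbalanced longitudinal data (unequal spacing, some visits missing) and fits a random intercept and a random slope for time. It is not the Lab Example 5 model: the data set, the names `childid`, `age_2`, and `score`, and all numeric values are assumptions made for illustration.

```sas
/* Simulated longitudinal data: unequal spacing, some visits missing */
data growth;
  call streaminit(5);
  do childid = 1 to 40;
    b0 = rand('normal', 0, 2);          /* child-specific intercept deviation */
    b1 = rand('normal', 0, 0.5);        /* child-specific slope deviation     */
    do age = 2, 3, 5, 9;
      if rand('uniform') > 0.15 then do;   /* roughly 15% of visits missing   */
        age_2 = age - 2;                   /* age centered at 2 years         */
        score = 10 + b0 + (3 + b1)*age_2 + rand('normal', 0, 1.5);
        output;
      end;
    end;
  end;
  drop b0 b1;
run;

proc mixed data=growth covtest;
  class childid;
  model score = age_2 / solution;
  /* type=un lets the random intercept and slope covary;  */
  /* g and gcorr print the estimated D matrix             */
  random int age_2 / subject=childid type=un g gcorr;
run;
```

If the log reports that the estimated G matrix is not positive definite, or a variance is estimated at zero, that is the kind of problem worth investigating before interpreting the rest of the output.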

References I

• Pinheiro, J. C. and Bates, D. M., Mixed-Effects Models in S and S-PLUS, Springer-Verlag, Berlin, 2000.

• Laird, N.M. and Ware, J.H., Random-effects models for longitudinal data. Biometrics, 38, 963, 1982.

• Oti, R., Anderson, D., and Lord, C. (submitted) Social trajectories among individuals with autism spectrum disorders, Journal of Developmental Psychopathology.

References II

• West, Brady T., Welch, Kathleen B., Galecki, Andrzej T., Linear Mixed Models: A Practical Guide Using Statistical Software, Chapman & Hall/CRC, 2006.

• Verbeke, G. and Molenberghs, G., Linear Mixed Models for Longitudinal Data, Springer, 2000.

• Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., SAS for Mixed Models, 2nd Edition, Cary, NC: SAS Institute Inc.

References III

• Little, R.J.A., and Rubin, D.B., Statistical Analysis with Missing Data: 2nd Edition, Wiley, 2002.
