REALCOM Multilevel models for realistically complex data Measurement errors Multilevel Structural equations Multivariate responses at several levels and of different types Methodology and examples for: An ESRC research project at Bristol University

Upload: jesus-fraser

Post on 28-Mar-2015



REALCOM

Multilevel models for realistically complex data

Measurement errors

Multilevel Structural equations

Multivariate responses at several levels and of different types

Methodology and examples for:

An ESRC research project at Bristol University


General Format

• MATLAB software
  – Free standing executable programs
  – ASCII and worksheet input and output
  – Graphical menu based input specification
  – Model equation display
  – Monitoring of MCMC chains

• A training manual containing:
  – Outline of methodology
  – Worked through examples


Markov Chain Monte Carlo – a quick introduction

• A Bayesian simulation-based method that, given starting values, samples a new set of parameters at each cycle of a ‘Markov chain’.

• This yields a final chain (after discarding a burn-in set) of, say, 5000 sets of values from the (joint) posterior distribution of the parameters.

• The posterior is formed by combining the likelihood based on the data with a prior distribution, typically diffuse.

• The chains are used for inference: e.g. the chain mean for a parameter is analogous to the point estimate from a likelihood analysis, and interval estimates are obtained from the chain in the same way.


Consider the simple 2-level model:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$

The parameters in this model are the fixed coefficients, the two variances and the level 2 residuals.

From suitable starting values the chain eventually ‘settles down’, so that sampling is from the true posterior distribution. We then need to draw enough samples to provide stable estimates, judged by suitable ‘convergence’ criteria.

All the MATLAB routines use MCMC sampling.
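As a rough illustration of the sampling cycle described above, the following Python sketch runs a Gibbs sampler for the simple 2-level model. It is not the REALCOM MATLAB code: the function name `gibbs_two_level`, the flat prior on the fixed coefficients, the inverse-gamma(0.001, 0.001) priors on the variances and the simulated data are all assumptions made for the example.

```python
import numpy as np

def gibbs_two_level(y, x, group, n_iter=2000, burn_in=500, seed=1):
    """Gibbs sampler sketch for y_ij = b0 + b1*x_ij + u_j + e_ij."""
    rng = np.random.default_rng(seed)
    n, n_groups = len(y), int(group.max()) + 1
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    n_j = np.bincount(group, minlength=n_groups)
    a = b = 0.001                                   # diffuse inverse-gamma prior (assumed)
    u, s2u, s2e = np.zeros(n_groups), 1.0, 1.0      # starting values
    chain = []
    for it in range(n_iter):
        # 1. fixed coefficients given everything else (flat prior assumed)
        beta_hat = XtX_inv @ X.T @ (y - u[group])
        beta = rng.multivariate_normal(beta_hat, s2e * XtX_inv)
        # 2. level 2 residuals given everything else
        r = y - X @ beta
        prec = n_j / s2e + 1.0 / s2u
        sum_r = np.bincount(group, weights=r, minlength=n_groups)
        u = rng.normal(sum_r / s2e / prec, np.sqrt(1.0 / prec))
        # 3. the two variances given everything else (inverse-gamma draws)
        s2u = 1.0 / rng.gamma(a + n_groups / 2, 1.0 / (b + 0.5 * u @ u))
        e = r - u[group]
        s2e = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + 0.5 * e @ e))
        if it >= burn_in:                           # discard the burn-in set
            chain.append([beta[0], beta[1], s2u, s2e])
    return np.array(chain)

# Simulated data: every value below is made up for the illustration
rng = np.random.default_rng(0)
group = np.repeat(np.arange(50), 20)                # 50 level 2 units, 20 level 1 units each
x = rng.normal(size=group.size)
y = 1.0 + 0.5 * x + rng.normal(0, np.sqrt(0.5), 50)[group] + rng.normal(size=group.size)

chain = gibbs_two_level(y, x, group)
print("posterior means (b0, b1, s2u, s2e):", chain.mean(axis=0).round(3))
print("95% interval for b1:", np.percentile(chain[:, 1], [2.5, 97.5]).round(3))
```

The chain means play the role of point estimates and the chain percentiles give interval estimates, as described on the previous slide.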


Measurement errors

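1. Continuous variables: a simple example:

• Basic model is:

$$x_0 = x + m$$

where $x_0$ is the observed value, $x$ the true value and $m$ the measurement error.

• With a model of interest, e.g.

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$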


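Some assumptions we need to make

• The measurement error variance $\sigma_m^2$ is assumed known, with $m \sim N(0, \sigma_m^2)$, or alternatively
• the reliability is assumed known: $R = (\sigma_{x_0}^2 - \sigma_m^2)/\sigma_{x_0}^2 = \sigma_x^2/\sigma_{x_0}^2$
• We also need a distribution for the true value: $x \sim N(\mu_x, \sigma_x^2)$

• An important issue is the value chosen for $\sigma_m^2$, and a sensitivity analysis is useful. We can also give it a prior.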
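To see why the reliability matters, the short simulation below (a sketch with made-up variances and coefficients, not taken from the project) regresses a response on the error-prone observed predictor. With independent Normal measurement error the naive slope is pulled towards zero by approximately the factor R.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
sigma2_x, sigma2_m = 1.0, 0.5                # made-up true-value and error variances
R = sigma2_x / (sigma2_x + sigma2_m)         # reliability

x_true = rng.normal(0.0, np.sqrt(sigma2_x), n)
x_obs = x_true + rng.normal(0.0, np.sqrt(sigma2_m), n)   # x_0 = x + m
y = 2.0 + 1.5 * x_true + rng.normal(0.0, 1.0, n)         # model of interest uses the true x

# Ordinary least squares slope when the observed x_0 is used in place of x
slope_naive = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
print(f"reliability R = {R:.3f}")
print(f"naive slope = {slope_naive:.3f}, true slope * R = {1.5 * R:.3f}")
```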


2. Misclassification errors

• Assume a binary (0,1) variable, for example whether or not a school pupil is eligible for free school meals (yes=1).

• The probability of observing a zero (no eligibility), given that the true value is zero, is $P_{obs}(0|0)$, and the probability of observing a one given that the true value is zero is $P_{obs}(1|0)$. Likewise we have $P_{obs}(0|1)$ and $P_{obs}(1|1)$.

• We now assume we know these misclassification probabilities. The target model is similar to before, with a binary predictor.
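A small numerical sketch of how known misclassification probabilities link the true and observed binary values. The function names and the example probabilities are hypothetical; the arithmetic is just the law of total probability and Bayes' rule.

```python
def observed_prob_one(p_true_one, p_obs1_given1, p_obs1_given0):
    """P(observe 1) = P_obs(1|1) * P(true 1) + P_obs(1|0) * P(true 0)."""
    return p_obs1_given1 * p_true_one + p_obs1_given0 * (1.0 - p_true_one)

def prob_true_one_given_obs_one(p_true_one, p_obs1_given1, p_obs1_given0):
    """Bayes' rule: probability the true value is 1 when a 1 is observed."""
    return p_obs1_given1 * p_true_one / observed_prob_one(
        p_true_one, p_obs1_given1, p_obs1_given0)

# Hypothetical values: 20% truly eligible, P_obs(1|1) = 0.9, P_obs(1|0) = 0.05
p_obs = observed_prob_one(0.2, 0.9, 0.05)
p_post = prob_true_one_given_obs_one(0.2, 0.9, 0.05)
print(round(p_obs, 4), round(p_post, 4))
```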


Modelling considerations

• We can model multivariate continuous measurement errors, but only independent binary misclassifications.

• We can allow different measurement error variances and covariances for different groups, e.g. gender.

• In the multivariate case we typically need non-zero correlations between measurement errors:

$$\frac{\rho_o - R}{1 - R} < \rho_m < \frac{\rho_o + R}{1 - R}$$

where $\rho_o$ is the observed correlation, $\rho_m$ the measurement error correlation and $R$ the (common) reliability.

• Thus, say, if R=0.7 and the observed correlation = 0.8, then we require a measurement error correlation > 0.33.
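The worked figure on this slide can be checked with a few lines of Python. This is a sketch that assumes both variables share the same reliability R and that the bound comes from requiring the true-value correlation to lie in [-1, 1]; the function name is hypothetical.

```python
def error_correlation_bounds(rho_obs, reliability):
    """Range of measurement error correlations compatible with an observed
    correlation, assuming a common reliability for both variables."""
    lower = (rho_obs - reliability) / (1.0 - reliability)
    upper = (rho_obs + reliability) / (1.0 - reliability)
    # A correlation cannot leave [-1, 1], so clip the raw bounds
    return max(lower, -1.0), min(upper, 1.0)

lower, upper = error_correlation_bounds(0.8, 0.7)
print(round(lower, 2), upper)
```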


An educational example

• Maths test score related to prior test scores and FSM eligibility.

• We will look at continuous, correlated and binary measurement errors.

Open measurement-error.exe and read file ‘classsize’


Summary table for analyses:


Factor analysis and structural equation models

Consider a single level factor model where we have several responses on each member of a sample:

$$y_{ri} = \beta_{0r} + \lambda_r \nu_i + e_{ri}, \quad \nu_i \sim N(0, \sigma_\nu^2), \quad e_{ri} \sim N(0, \sigma_{er}^2)$$

where r indexes the response variable and i the person; $\lambda_r$ is the loading of the r-th response on the single factor $\nu_i$.

This is a special kind of multivariate model where we assume the residuals are independent, and the covariance between two responses $r_1$ and $r_2$ is thus given by

$$\lambda_{r_1} \lambda_{r_2} \sigma_\nu^2$$

A constraint is needed for identifiability and the default is to choose $\sigma_\nu^2 = 1$.
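The covariance structure implied by the single factor model can be written out numerically. The sketch below uses made-up loadings and residual variances; `implied_covariance` is a hypothetical helper, not part of the REALCOM software.

```python
import numpy as np

def implied_covariance(loadings, factor_var, residual_vars):
    """Covariance matrix of the responses under a single factor model:
    off-diagonal (r1, r2) = lambda_r1 * lambda_r2 * sigma_nu^2,
    diagonal r = lambda_r^2 * sigma_nu^2 + sigma_er^2."""
    lam = np.asarray(loadings, dtype=float)
    return factor_var * np.outer(lam, lam) + np.diag(residual_vars)

# Made-up loadings and residual variances, with the default constraint sigma_nu^2 = 1
cov = implied_covariance([1.0, 0.5, 0.8], 1.0, [0.2, 0.3, 0.4])
print(cov)
```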


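Extensions – further factors

We can add explanatory variables in addition to the $\beta_{0r}$ (see later) or we can add further factors:

$$y_{ri} = \beta_{0r} + \lambda_{1r}\nu_{1i} + \lambda_{2r}\nu_{2i} + e_{ri}$$

$$\begin{pmatrix}\nu_{1i}\\ \nu_{2i}\end{pmatrix} \sim N\left(0, \begin{pmatrix}\sigma_{\nu 1}^2 & \\ \sigma_{\nu 12} & \sigma_{\nu 2}^2\end{pmatrix}\right), \quad e_{ri} \sim N(0, \sigma_{er}^2)$$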

As number of factors increases, we require further constraints, typically on loading values. A popular choice is ‘simple structure’ with each response loading on only 1 factor and non-zero correlations between factors.


Extensions – structural variables

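We can allow the factors themselves to depend on further variables, e.g.

$$y_{ri} = \beta_{0r} + \lambda_r\nu_i + e_{ri}$$
$$\nu_i = \beta_1^* x_{1i} + \nu_i^*$$
$$\nu_i^* \sim N(0, \sigma_{\nu^*}^2), \quad e_{ri} \sim N(0, \sigma_{er}^2)$$

Or alternatively, but less commonly,

$$y_{ri} = \beta_{0r} + \beta_{1r} x_{1i} + \lambda_r \nu_i + e_{ri}, \quad \nu_i \sim N(0,\sigma_\nu^2), \quad e_{ri} \sim N(0,\sigma_{er}^2)$$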


Two level factor models

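Standard formulation:

$$y_{rij} = \beta_{0r} + \lambda_r^{(1)}\nu_{ij}^{(1)} + \lambda_r^{(2)}\nu_j^{(2)} + u_{rj} + e_{rij}$$
$$\nu_{ij}^{(1)} \sim N(0, \sigma_{\nu(1)}^2), \quad \nu_j^{(2)} \sim N(0, \sigma_{\nu(2)}^2), \quad e_{rij} \sim N(0,\sigma_{er}^2), \quad u_{rj} \sim N(0, \sigma_{ur}^2)$$

Alternatively:

$$y_{rij} = \beta_{0r} + \lambda_r^{(1)}\nu_{ij}^{(1)} + e_{rij}$$
$$\nu_{ij}^{(1)} = \nu_{ij}^{(1)*} + u_j^*$$
$$\nu_{ij}^{(1)*} \sim N(0, \sigma_{\nu(1)}^{*2}), \quad e_{rij}\sim N(0,\sigma_{er}^2), \quad u_j^* \sim N(0, \sigma_u^{*2})$$

But we shall not consider this case.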


Example – PISA data

A survey of reading performance of 15 year olds in 32 countries, carried out by the OECD in 2000.

We use one subscale of 35 items, ‘retrieving information’, and look at France and England.

First we shall fit one and two level models assuming responses are Normal. In fact they are binary and ordered, but we come to that later.

Open structural-equation.exe and load pisadata


Binary and ordered responses

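Assume a binary response z. We will use the idea of a latent Normal distribution. Consider the (factor) model for a single response:

$$y_{ri} = \beta_{0r} + \lambda_r\nu_i + e_{ri}, \quad \nu_i \sim N(0,\sigma_\nu^2), \quad e_{ri} \sim N(0,1)$$

where we observe a positive (=1) response for our binary variable z if y is positive, that is

$$y_{ri} = \beta_{0r} + \lambda_r\nu_i + e_{ri} > 0 \quad \text{or} \quad e_{ri} > -(\beta_{0r} + \lambda_r \nu_i)$$

so that we obtain the probit model

$$\text{Prob}(z_{ri} = 1) = \text{Prob}(e_{ri} > -(\beta_{0r}+\lambda_r\nu_i)) = \int_{-(\beta_{0r}+\lambda_r\nu_i)}^{\infty}\phi(t)\,dt = \int_{-\infty}^{\beta_{0r}+\lambda_r\nu_i}\phi(t)\,dt$$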
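The probit probability above is just the standard Normal distribution function evaluated at the linear predictor. A minimal sketch, with hypothetical function names and made-up parameter values:

```python
from math import erf, sqrt

def std_normal_cdf(t):
    """Standard Normal distribution function, written with the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def prob_positive(beta0, loading, factor):
    """Prob(z = 1) = Phi(beta0 + loading * factor) under the latent Normal model."""
    return std_normal_cdf(beta0 + loading * factor)

# Made-up intercept, loading and factor value
p = prob_positive(0.5, 1.0, 1.46)
print(round(p, 3))
```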


Ordered data

Consider the cumulative probability of being in one of the lowest s+1 categories of a p category variable, with categories numbered from 0 upwards, s = 0, …, p-2:

$$\gamma_{ri}^{(s)} = \sum_{f=0}^{s} \pi_{ri}^{(f)}$$

We extend the binary response model as:

$$y_{ri} = \beta_{0r} + \alpha_{sr} + \lambda_r\nu_i + e_{ri}, \quad e_{ri}\sim N(0,1), \quad \alpha_{0r} = 0$$

where the $\beta_{0r} + \alpha_{sr}$ define a set of ‘thresholds’ for the categories.

So suppose we have a 3-category variable, then for observed responses

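$$z_{ri} = \begin{cases} 0 & \text{if } y_{ri} < \beta_{0r} + \lambda_r \nu_i \\ 1 & \text{if } \beta_{0r} + \lambda_r\nu_i \le y_{ri} < \beta_{0r} + \alpha_{1r} + \lambda_r \nu_i \\ 2 & \text{if } y_{ri} \ge \beta_{0r} + \alpha_{1r} + \lambda_r\nu_i \end{cases}$$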


PISA data with binary/ordered responses

• In fact all the responses are binary except for 4 with 3 ordered categories: C9, C14, C20, and C26

• Change these responses and rerun models.

•Finally fit explanatory variables Country and Gender in structural part of model.


Multivariate models with responses at 2 levels

• Consider first 2 Normal responses:

$$y_{ij}^{(1)} = X_{ij}^{(1)}\beta_1 + u_j^{(1)} + e_{ij}^{(1)}$$
$$y_j^{(2)} = X_j^{(2)}\beta_2 + u_j^{(2)}$$
$$e_{ij}^{(1)} \sim MVN(0, \Omega_1), \quad u_j = (u_j^{(1)}, u_j^{(2)})^T, \quad u_j \sim MVN(0, \Omega_2)$$

Superscript indicates level.

• Models are linked via the level 2 covariance matrix.

• The MCMC algorithm handles missing response data and categorical (binary, ordered and unordered) as well as Normal data.

• The first example is a repeated measures growth curve model.


Child heights + adult height

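Child height as a cubic polynomial with intercept + slope random at level 2:

$$y_j^{(2)} = \beta_0^{(2)} + u_{0j}^{(2)}$$
$$y_{ij}^{(1)} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + u_{0j}^{(1)} + u_{1j}^{(1)} t_{ij} + e_{ij}$$
$$u_j = \begin{pmatrix} u_{0j}^{(1)} \\ u_{1j}^{(1)} \\ u_{0j}^{(2)} \end{pmatrix} \sim MVN(0, \Omega_u), \quad \Omega_u = \begin{pmatrix} \sigma_{u0}^{2(1)} & & \\ \sigma_{u01}^{(1,1)} & \sigma_{u1}^{2(1)} & \\ \sigma_{u00}^{(1,2)} & \sigma_{u10}^{(1,2)} & \sigma_{u0}^{2(2)} \end{pmatrix}, \quad e_{ij} \sim N(0,\sigma_e^2)$$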


Load growthdata.txt and fit the model

Results:

Two level growth model.

Coefficient                 Estimate    S.E.
Level 1 model
  Intercept                  153.05     0.69
  Age (about age 13.0)         7.07     0.16
  Age-squared                  0.294    0.054
  Age-cubed                   -0.208    0.029
Level 2 model
  Intercept                  174.70     0.80

Level 2 covariance matrix
  55.77     1.29    50.01
   1.30     0.53     1.24
  50.01     1.24    69.42

Level 1 variance               3.21
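To read the level 2 covariance matrix more easily it can be converted to correlations. The sketch below copies the estimates from the table above; it assumes the rows are ordered as child intercept, child slope, adult height, and it uses 1.30 for both off-diagonal entries where the printed matrix shows 1.29 and 1.30.

```python
import numpy as np

# Level 2 covariance matrix from the results table (symmetrised, see lead-in)
omega_u = np.array([[55.77, 1.30, 50.01],
                    [1.30, 0.53, 1.24],
                    [50.01, 1.24, 69.42]])

sd = np.sqrt(np.diag(omega_u))
corr = omega_u / np.outer(sd, sd)      # correlation = covariance / (sd_i * sd_j)
print(corr.round(2))
```

The entry `corr[0, 2]` is the estimated correlation between the child height intercept (height at about age 13) and adult height.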


Adult height prediction

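Suppose we have 2 growth measures: we want a regression prediction of the form

$$y_j^{(2)} = \alpha_0 + \alpha_1 y_{1j} + \alpha_2 y_{2j} + w_j, \qquad y_{ij} = y_{ij}^{(1)} - (\beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3), \quad i = 1, 2$$

This leads to (symmetric matrix, lower triangle shown):

$$\begin{pmatrix}\hat\alpha_1\\ \hat\alpha_2\end{pmatrix} = \begin{pmatrix} \sigma_{u0}^{2(1)} + 2\sigma_{u01}^{(1,1)}t_{1j} + \sigma_{u1}^{2(1)}t_{1j}^2 + \sigma_e^2 & \\ \sigma_{u0}^{2(1)} + \sigma_{u01}^{(1,1)}(t_{1j}+t_{2j}) + \sigma_{u1}^{2(1)}t_{1j}t_{2j} & \sigma_{u0}^{2(1)} + 2\sigma_{u01}^{(1,1)}t_{2j} + \sigma_{u1}^{2(1)}t_{2j}^2 + \sigma_e^2 \end{pmatrix}^{-1} \begin{pmatrix}\sigma_{u00}^{(1,2)} + \sigma_{u10}^{(1,2)}t_{1j} \\ \sigma_{u00}^{(1,2)} + \sigma_{u10}^{(1,2)}t_{2j}\end{pmatrix}$$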
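A numerical sketch of the adult height prediction, using the estimates reported for the growth model. Under that model the growth residual at centred age t is u0 + u1*t + e, so the prediction weights are the covariances between the growth residuals and adult height, pre-multiplied by the inverse covariance matrix of the growth residuals. This is a derivation from the stated model rather than REALCOM output, and the measurement ages and residual values are hypothetical.

```python
import numpy as np

# Estimates copied from the results table (covariance matrix symmetrised with 1.30)
omega_u = np.array([[55.77, 1.30, 50.01],
                    [1.30, 0.53, 1.24],
                    [50.01, 1.24, 69.42]])
sigma2_e = 3.21          # level 1 variance
beta_adult = 174.70      # level 2 (adult height) intercept

def adult_height_weights(ages_centred):
    """Weights (alpha_1, alpha_2, ...) on growth residuals measured at the
    given ages, centred at 13.0 as in the fitted model."""
    t = np.asarray(ages_centred, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])              # design for (u0, u1)
    V = Z @ omega_u[:2, :2] @ Z.T + sigma2_e * np.eye(t.size)
    c = Z @ omega_u[:2, 2]                                 # covariance with adult height
    return np.linalg.solve(V, c)

weights = adult_height_weights([-2.0, 1.0])     # hypothetical measurements at ages 11 and 14
residuals = np.array([3.0, 4.0])                # hypothetical growth residuals in cm
predicted = beta_adult + weights @ residuals
print(weights.round(3), round(float(predicted), 1))
```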


Mixed response types and missing data

• Normal and ordered data already considered in structural equation models

• We now introduce unordered categorical responses

• We can also have general Normalising transformations

• Missing data via imputation is an important application for these models


Unordered categorical responses

Assume p categories where an individual responds to just one. We have

$$y_{hi} = 1 \text{ if response is in category } h \text{ for individual } i, \ 0 \text{ otherwise}$$

where h indexes the response. For each $y_{hi}$ we assume an underlying latent variable $v_{hi}$ exists and that we have the following model:

$$v_{hi} = X_{hi}\beta_h + e_{hi}, \quad e_i \sim MVN(0, \Omega_{p-1})$$

$\Omega_{p-1}$ is a correlation matrix, and the $e_i$ are mutually independent vectors.

$X_{hi}$ is $(1 \times s)$, $\beta_h$ is $(s \times 1)$, $e_i$ is $((p-1) \times 1)$, $\beta^T = \{\beta_1^T, \ldots, \beta_{p-1}^T\}$, $\beta$ is $((p-1)s \times 1)$.

For identifiability we model p-1 categories and assume $\Omega_{p-1} = I$.

The maximum indicant model: we observe category h for individual i iff

$$v_{hi} > v_{h^*i} \ \ \forall\, h^* \ne h, \quad \text{and observe category } p \text{ if } v_{hi} < 0 \ \ \forall\, h$$

so that

$$pr[y_{hi} = 1] = pr[X_{hi}\beta_h + e_{hi} > X_{h^*i}\beta_{h^*} + e_{h^*i}] \quad \forall\, h^* \ne h$$


Handling missing data


Multiple imputation – briefly and simply

Consider the model of interest (MOI)

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

We turn this into a multivariate response model

$$y_i = \beta_1 + e_{1i}, \quad x_i = \beta_2 + e_{2i}, \quad \begin{pmatrix} e_{1i}\\ e_{2i}\end{pmatrix} \sim N\left(0, \begin{pmatrix}\sigma_1^2 & \\ \sigma_{12} & \sigma_2^2\end{pmatrix}\right)$$

and obtain residual estimates $\hat e_{1i}, \hat e_{2i}$ (from an MCMC chain) for the values which are missing. Use these to ‘fill in’ and produce a complete data set. Do this (independently) n (e.g. = 20) times. Fit the MOI to each data set and combine according to rules to get estimates and standard errors.
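The combination step can be sketched as follows. The slide only says "combine according to rules"; the sketch assumes these are the usual Rubin's rules for multiple imputation (average the estimates, and add the between-imputation variance, inflated by 1 + 1/n, to the average within-imputation variance). The function name and the numbers are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Combine one coefficient across n imputed data sets (Rubin's rules, assumed)."""
    q = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    n = q.size
    q_bar = q.mean()                       # pooled estimate
    within = np.mean(se ** 2)              # average within-imputation variance
    between = q.var(ddof=1)                # between-imputation variance
    total = within + (1.0 + 1.0 / n) * between
    return q_bar, np.sqrt(total)

# Hypothetical estimates of one coefficient from three imputed data sets
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(est, round(float(se), 4))
```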


Class size example

Load classsize_impute

MOI is Normalised exam score as response regressed on pretest score, gender, FSM, class size. 50% of level 1 units have missing data. Multivariate model:

Table 6. Multivariate responses model fitted to data with 50% of records having missing data

Variable             Intercept (s.e.)
Post maths            0.1336 (0.0708)
Pre Maths             0.0321 (0.0713)
Gender                0.0734 (0.0474)
FSM                  -1.0898 (0.1293)
Class size (-30)     -4.0494 (0.5968)

Level 1 covariance matrix
   0.6918    0.4440   -0.0957   -0.1956
   0.4440    0.7836   -0.1205   -0.1742
  -0.0957   -0.1205    1.0000   -0.0119
  -0.1956   -0.1742   -0.0119    1.0000

Level 2 covariance matrix
   0.2147    0.1046   -0.0057   -0.0597   -0.1930
   0.1046    0.2141    0.0185   -0.1404    0.0965
  -0.0057    0.0185    0.0242   -0.0423    0.0151
  -0.0597   -0.1404   -0.0423    0.6005    0.0109
  -0.1930    0.0965    0.0151    0.0109   14.7433


MI estimates vs listwise deletion

Fixed effects in multivariate model: 50% records MCAR

Estimate             Listwise (SE)      MI (SE)            Complete (SE)
Post maths            0.102 (0.088)      0.134 (0.071)      0.134 (0.070)
Pre Maths             0.011 (0.088)      0.032 (0.071)      0.019 (0.071)
Gender                0.096 (0.074)      0.073 (0.047)      0.069 (0.047)
FSM                  -1.124 (0.159)     -1.090 (0.129)     -1.064 (0.129)
Class size (-30)     -4.030 (0.602)     -4.049 (0.597)     -4.267 (0.544)


Further extensions

• Box-Cox normalising transformations: $z = \lambda^{-1}(y^\lambda - 1)$

• Application to survival data treated as an ordered response when divided into discrete time intervals

• Combination of measurement errors, structural models and responses at >1 level into a single program

• Incorporation into MLwiN
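The Box-Cox transformation listed above is simple to compute directly. A minimal sketch (the function name is hypothetical; the log case is the standard limit as lambda tends to 0 and applies to positive y):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transformation z = (y**lam - 1) / lam, with log(y) when lam = 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

z = box_cox([1.0, 4.0, 9.0], 0.5)
print(z)
```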


General remarks

• Report back welcome ([email protected])

• A REALCOM discussion group is under consideration

Use with care!