Multilevel modeling in R. Tom Dunn and Thom Baguley, Psychology, Nottingham Trent University

Post on 15-Jan-2016

TRANSCRIPT

  • Multilevel modeling in R

    Tom Dunn and Thom Baguley, Psychology, Nottingham Trent University

  • 1. Models for repeated measures or clustered data

  • Repeated measures ANOVA

    Usual practice in psychology is to analyze repeated measures data using ANOVA:

    One-way independent measures

    One-way repeated measures

  • Limitations of standard approaches

    e.g., repeated measures ANOVA

    - sphericity or multi-sample sphericity assumptions
    - dealing with non-orthogonal predictors (e.g., time-varying covariates in RM ANOVA)
    - dealing with missing values
    - treating items as fixed effects (e.g., Clark, 1973)
    - the problem of categorization
    - the problem of aggregation/disaggregation

  • Avoid repeated measures regression!

    Repeated measures regression is one attempt to deal with limitations of ANOVA such as non-orthogonal predictors, e.g., using manual dummy coding (Lorch & Myers, 1990; Pedhazur, 1982). However, it is:

    - data hungry (each indicator requires 1 df)
    - restrictive: it assumes sphericity and fixed effects
    - less flexible and less powerful than multilevel models (Misangyi et al., 2006)

  • Multilevel models with random intercepts

    A random intercept model has predictors with fixed effects only. The model can be written as a fixed part and a random part, or combined in a single equation.
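
As a sketch in standard multilevel notation (the symbols below are assumptions and may differ from those on the original slide), a random intercept model for observation i within level 2 unit j, with a single predictor x, could be written level by level and then combined:

```latex
\begin{align*}
y_{ij} &= \beta_{0j} + \beta_1 x_{ij} + e_{ij} && \text{(level 1)} \\
\beta_{0j} &= \beta_0 + u_{0j} && \text{(level 2)} \\
y_{ij} &= \underbrace{\beta_0 + \beta_1 x_{ij}}_{\text{fixed}}
        + \underbrace{u_{0j} + e_{ij}}_{\text{random}} && \text{(combined)}
\end{align*}
```

Here the slope of x is the same for every level 2 unit; only the intercept varies, through the deviation u_{0j}.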


  • Random effects in multilevel models

    In the random intercept model the individual differences at level 2 are (like the random error at level 1) assumed to have a normal distribution.

    Individual differences in the effect of a predictor can also be modeled this way.
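
In the same assumed notation (u for level 2 deviations, e for level 1 error; these symbols are illustrative, not taken from the slide), the distributional assumptions and the extension to a random slope might be sketched as:

```latex
\begin{align*}
u_{0j} &\sim N(0, \sigma^2_{u0}), \qquad e_{ij} \sim N(0, \sigma^2_{e}) \\
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij},
\qquad u_{1j} \sim N(0, \sigma^2_{u1})
\end{align*}
```

The term u_{1j} x_{ij} lets the effect of x differ between level 2 units, so each unit has its own slope \beta_1 + u_{1j}.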


  • Example - voice pitch (1)

    How is male voice pitch related to the subjective attractiveness of a female face?

    - 30 male participants
    - 32 female faces
    - ratings of attractiveness (1-9) in 2 contexts
    - potential time-varying covariates (e.g., baseline measure)

    Classical ANOVA:

    i) treats participants (but not faces) as a random sample
    ii) can't incorporate time-varying covariates
    iii) aggregates data (effective n = 30 per context)

    Data from Dunn, Wells and Baguley (in prep) and in Baguley (2012)

  • Example - voice pitch (1) contd.

    library(nlme)
    pitch.ri

  • Modeling covariance matrices 1

    In a two level random-intercept model the covariance structure has a single variance at each level. This, in effect, assumes a form of compound symmetry of the repeated measures (with equal variances and equal covariances; the level 1 residuals are assumed to be uncorrelated).
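
To make the compound symmetry point concrete, here is a sketch of the covariance matrix that a random intercept model implies for three repeated measures on the same participant j. The symbols are assumptions: \sigma^2_{u0} is the level 2 intercept variance and \sigma^2_e the level 1 residual variance.

```latex
\[
\operatorname{Cov}(\mathbf{y}_j) =
\begin{pmatrix}
\sigma^2_{u0} + \sigma^2_e & \sigma^2_{u0} & \sigma^2_{u0} \\
\sigma^2_{u0} & \sigma^2_{u0} + \sigma^2_e & \sigma^2_{u0} \\
\sigma^2_{u0} & \sigma^2_{u0} & \sigma^2_{u0} + \sigma^2_e
\end{pmatrix}
\]
```

Every diagonal entry is the same and every off-diagonal entry is the same, which is the defining pattern of compound symmetry.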

  • Modeling covariance matrices 2

    In a two level random-slope model with one predictor random at level 2, the level 2 covariance matrix contains the intercept variance, the slope variance and the covariance between them.

    In a repeated measures design this models the individual differences in the effect of the predictor (as well as its covariance with the intercept).
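
A sketch of that level 2 covariance matrix, using assumed symbols (u_{0j} for the intercept deviation, u_{1j} for the slope deviation, \sigma_{u01} for their covariance):

```latex
\[
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
\Omega_u \right),
\qquad
\Omega_u =
\begin{pmatrix}
\sigma^2_{u0} & \sigma_{u01} \\
\sigma_{u01} & \sigma^2_{u1}
\end{pmatrix}
\]
```

A positive \sigma_{u01} would mean that units with higher intercepts also tend to have steeper slopes.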

  • Unstructured covariance matrices

    In repeated measures it is possible to have an unconstrained covariance matrix at the participant level (usually level 2). With four measurement occasions, for example, the matrix has 4 variances and 6 distinct covariances.

    In this kind of unstructured matrix there are no assumptions about the form of the matrix (e.g., sphericity, compound symmetry or multisample sphericity).
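
For four measurement occasions, an unstructured matrix could be sketched as follows. The indexing is an assumption for illustration; only the lower triangle is shown because the matrix is symmetric.

```latex
\[
\Omega =
\begin{pmatrix}
\sigma^2_{1} & & & \\
\sigma_{21} & \sigma^2_{2} & & \\
\sigma_{31} & \sigma_{32} & \sigma^2_{3} & \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma^2_{4}
\end{pmatrix}
\]
```

Each entry is estimated freely, so the number of parameters grows quickly: k occasions require k(k+1)/2 of them.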

  • Example - random effect of voice pitch (2)

    Attractiveness might not have a fixed effect (in fact it is more likely that it varies between people)

    lme(pitch ~ base + attract, random=~ attract|Participant, data=pitch)

           AIC      BIC    logLik
      14133.03 14160.82 -7061.513

    Random effects:
     Formula: ~1 | Participant
            (Intercept) Residual
    StdDev:    13.05003 9.214851

    Fixed effects: pitch ~ base + attract
                   Value Std.Error   DF   t-value p-value
    (Intercept) 89.58640  5.537340 1888 16.178598       0
    base         0.20915  0.044474 1888  4.702804       0
    attract      0.46546  0.104619 1888  4.449042       0
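
The study data are not included here, so the following is only a sketch on simulated data. Every value, and the object names `sim`, `fit_ri` and `fit_rs`, are hypothetical. It contrasts a random intercept fit with a random slope fit in nlme, mirroring the `random = ~ attract | Participant` syntax used above; the `opt = "optim"` control setting is an optional choice made here for robustness, not something the slides specify.

```r
library(nlme)

# Simulated stand-in for the voice pitch design (hypothetical values, NOT the study data)
set.seed(1)
n_part <- 30
n_face <- 32
sim <- expand.grid(Face = factor(1:n_face), Participant = factor(1:n_part))
p <- as.integer(sim$Participant)

u0 <- rnorm(n_part, sd = 13)   # participant deviations in the intercept
u1 <- rnorm(n_part, sd = 2)    # participant deviations in the attract slope
sim$attract <- sample(1:9, nrow(sim), replace = TRUE) - 5   # ratings 1-9, centered at 5
sim$pitch <- 100 + u0[p] + (0.5 + u1[p]) * sim$attract + rnorm(nrow(sim), sd = 9)

# Random intercept only: the effect of attract is the same for everyone
fit_ri <- lme(pitch ~ attract, random = ~ 1 | Participant, data = sim)

# Random intercept and slope: the effect of attract varies between participants
fit_rs <- lme(pitch ~ attract, random = ~ attract | Participant, data = sim,
              control = lmeControl(opt = "optim"))

VarCorr(fit_rs)   # intercept SD, slope SD, their correlation, and residual SD
```

`VarCorr()` reports the estimated level 2 standard deviations and their correlation, which is the quickest way to confirm that a slope variance was actually estimated.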


  • 2. Estimation and inference

  • Estimation in multilevel models

    Estimation is iterative and usually uses maximum likelihood (as with logistic regression):

    - IGLS or FML (iterative generalized least squares; full maximum likelihood)
    - RIGLS or RML (restricted iterative generalized least squares; restricted maximum likelihood)
    - Parametric bootstrapping
    - Non-parametric bootstrapping
    - MCMC (Markov chain Monte Carlo methods)
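
As a minimal sketch of the first two options, nlme's `lme()` takes a `method` argument that switches between full ("ML") and restricted ("REML") maximum likelihood. The data below are simulated and the names `d`, `fit_ml` and `fit_reml` are hypothetical.

```r
library(nlme)

# Simulated two-level data (hypothetical; not the voice pitch data)
set.seed(2)
d <- data.frame(g = factor(rep(1:20, each = 5)), x = rnorm(100))
u <- rnorm(20)                                    # level 2 intercept deviations
d$y <- 1 + 0.5 * d$x + u[as.integer(d$g)] + rnorm(100)

# Same model, two estimation methods
fit_reml <- lme(y ~ x, random = ~ 1 | g, data = d, method = "REML")  # lme() default
fit_ml   <- lme(y ~ x, random = ~ 1 | g, data = d, method = "ML")

# Compare the estimated variance components under each method
VarCorr(fit_reml)
VarCorr(fit_ml)
```

With few level 2 units the two methods can give noticeably different variance estimates, which is one reason the choice matters for small samples.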

  • Comparing models

    Confidence intervals and tests

    - deviance (likelihood ratio) tests (-2LL or change in -2 log likelihood has an approximate χ² distribution)
    - Wald tests and CIs (estimate/SE has an approximate z distribution)

    Information criteria

    - AIC, BIC or (MCMC derived) DIC (-2LL with a penalty for number of parameters)
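
The deviance test can be sketched by hand on simulated data (all names and values below are hypothetical). Two nested models are fitted by full maximum likelihood, since comparing fixed effects requires ML rather than REML, and the change in -2LL is referred to a χ² distribution with df equal to the number of added parameters.

```r
library(nlme)

# Simulated two-level data (hypothetical; not the voice pitch data)
set.seed(3)
d <- data.frame(g = factor(rep(1:20, each = 5)), x = rnorm(100))
u <- rnorm(20)
d$y <- 1 + 0.5 * d$x + u[as.integer(d$g)] + rnorm(100)

# Nested models fitted with ML so their likelihoods are comparable
m0 <- lme(y ~ 1, random = ~ 1 | g, data = d, method = "ML")
m1 <- lme(y ~ x, random = ~ 1 | g, data = d, method = "ML")

# Change in -2 log likelihood (deviance) and its chi-square p value (1 added parameter)
dev_change <- as.numeric(2 * (logLik(m1) - logLik(m0)))
p_lrt <- pchisq(dev_change, df = 1, lower.tail = FALSE)

anova(m0, m1)   # the same likelihood ratio test, with AIC and BIC alongside
```

`anova()` on the two fits reports the likelihood ratio test together with AIC and BIC, so the manual calculation is mainly useful for seeing where the numbers come from.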

  • Accurate inference

    - for standard repeated measures ANOVA models it is possible to use t and F statistics
    - if a complex covariance structure (anything other than compound symmetry) or unbalanced model is used then inference is problematic owing to:

      a) difficulty estimating the error df
      b) boundary effects (for variances)

  • Possible solutions

    - asymptotic approximations (in large samples)*
    - corrections such as the Kenward-Roger approximation (e.g., using pbkrtest)
    - bootstrapping*
    - MCMC estimation (e.g., using lme4 or MCMCglmm)*

    * e.g., see Baguley (2012) for examples

  • Requirements for accurate estimation

    Centering predictors

    - essential to use appropriate centering strategy in random slope models
    - see Enders & Tofighi (2007)

    Nested versus fully-crossed structures

    - many experimental designs in psychology are fully crossed
    - see Baayen et al. (2008)

    Estimation, sample size and bias

    - sample size at highest level of model is crucial
    - see Hox (2002), Maas & Hox (2005)
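
The two centering strategies usually contrasted in this literature, grand mean centering and centering within cluster, can be sketched in base R. The toy data frame `d` and its column names are hypothetical.

```r
# Toy data: 3 clusters with 3 observations each (hypothetical values)
d <- data.frame(
  cluster = rep(c("a", "b", "c"), each = 3),
  x       = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Grand mean centering: subtract the overall mean of x
d$x_gmc <- d$x - mean(d$x)

# Centering within cluster: subtract each cluster's own mean
d$x_cmean <- ave(d$x, d$cluster)      # cluster mean repeated on every row
d$x_cwc   <- d$x - d$x_cmean

d
```

After centering within cluster, `x_cwc` carries only within-cluster variation; the cluster means in `x_cmean` can be entered as a separate level 2 predictor if between-cluster differences are also of interest.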

  • Nested versus fully crossed structures

    In nested structures lower level units occur in only one higher level unit, e.g., children in schools

    In fully crossed structures lower level units are observed within all higher level units, e.g., same 32 faces used for all 30 participants

    Baayen et al. (2008) argue that many researchers incorrectly model fully crossed structures as nested
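
Before fitting, it can help to check which structure a data set actually has. The sketch below uses small hypothetical data frames and a base R cross-tabulation: in a fully crossed design every combination of the two factors is observed, whereas in a nested design each lower level unit appears under exactly one higher level unit.

```r
# Fully crossed toy design: every participant rates every face (hypothetical data)
crossed <- expand.grid(Face = factor(1:4), Participant = factor(1:3))
cells <- table(crossed$Participant, crossed$Face)
fully_crossed <- all(cells > 0)               # TRUE when no participant-by-face cell is empty

# Nested toy design: each child belongs to exactly one school (hypothetical data)
nested <- data.frame(
  School = factor(c("s1", "s1", "s2", "s2")),
  Child  = factor(c("c1", "c2", "c3", "c4"))
)
nest_cells <- table(nested$Child, nested$School)
is_nested <- all(rowSums(nest_cells > 0) == 1)  # TRUE when each child occurs in one school only

fully_crossed
is_nested
```

The check depends on unit labels being unique: if children were numbered 1, 2 within every school, the table would wrongly suggest a crossed design.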


  • Example - fully crossed model (3)

    detach(package:nlme) ; library(lme4)
    lmer(pitch ~ base + attract + (1|Participant) + (1|Face), data=pitch)

    Formula: pitch ~ base + attract + (1 | Participant) + (1 | Face)
       AIC   BIC logLik deviance REMLdev
     14134 14167  -7061    14118   14122
    Random effects:
     Groups      Name        Variance  Std.Dev.
     Face        (Intercept)   0.44417  0.66646
     Participant (Intercept) 171.72946 13.10456
     Residual                 84.47292  9.19092
    Number of obs: 1920, groups: Face, 32; Participant, 30

    Fixed effects:
                Estimate Std. Error t value
    (Intercept) 90.03032    5.54083  16.249
    base         0.20543    0.04447   4.620
    attract      0.45910    0.11229   4.088

    lmer(pitch ~ base + attract + (attract|Participant) + (1|Face), data=pitch)

    Assumes that the fully crossed structure is correctly coded in the data set

  • Advantages of multilevel approaches

    - often greater statistical power (e.g., compared with RM ANOVA)
    - multiple random factors (nested or crossed)
    - copes with non-orthogonal predictors
    - copes with time-varying covariates
    - assumes missing outcomes are MAR (not MCAR)
    - explicitly models variances and covariances
