Introduction Mixed models Typology of missing data Exploring incomplete data Methods MAR data Conclusion
Introduction to mixed model and missing data issues in longitudinal studies
Hélène Jacqmin-Gadda
INSERM, U897, Bordeaux, France
Inserm workshop, St Raphael
Outline of the talk
Introduction
Mixed models
Typology of missing data
Exploring incomplete data
Methods MAR data
Conclusion
Longitudinal data : definition
Definition : variables measured at several times on the same subjects
Examples :
• Repeated measures of biological markers (CD4, HIV RNA) in HIV patients
• Repeated measures of neuropsychological tests to study cognitive aging
• Repeated events : dental caries, absences from school or job, ...
Longitudinal data analysis
Objective :
• Describe change of the variable with time
• Identify factors associated with change
Problem : Intra-subject correlation
Example : HIV clinical trial
$X_i = 1$ if treatment A,
$X_i = 0$ if treatment B
Criterion : change over time of CD4
Repeated measures of CD4 over the follow-up period.
$t = 0$ at initiation of treatment.
$Y_{ij}$ = CD4 measure for subject $i$ at time $t_{ij}$, $i = 1, \dots, N$, $j = 1, \dots, n_i$.
Analysis assuming independence
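$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + \epsilon_{ij}$$
with $\epsilon_{ij} \sim N(0, \sigma^2)$ and $\epsilon_{ij} \perp \epsilon_{ij'}$ for $j \neq j'$
Intra-subject correlation
→ $\widehat{\mathrm{Var}}(\hat\beta)$ biased
→ Tests for $\beta$ biased
For a time-independent covariate :
• $\mathrm{Var}(\hat\beta_2)$ under-estimated
• Tests for $H_0 : \beta_2 = 0$ anti-conservative (p-value too small)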
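The under-estimation of $\mathrm{Var}(\hat\beta_2)$ can be illustrated with a small simulation. This is only a sketch under assumed values: the data are generated with a random subject intercept, the independence model is fitted by ordinary least squares, and the naive standard error of $\hat\beta_2$ is compared with the spread of $\hat\beta_2$ across replications. The sample sizes, variances and the function name `simulate_naive_ols` are hypothetical choices, not taken from the talk.

```python
import numpy as np

def simulate_naive_ols(n_subjects=200, n_visits=5, sigma0=2.0, sigma=1.0,
                       n_rep=500, seed=0):
    """Mean naive OLS standard error of beta2 vs. its empirical sampling SD."""
    rng = np.random.default_rng(seed)
    t = np.tile(np.arange(n_visits, dtype=float), n_subjects)          # t_ij
    x = np.repeat(np.arange(n_subjects) % 2, n_visits).astype(float)   # X_i
    design = np.column_stack([np.ones_like(t), t, x, x * t])
    xtx_inv = np.linalg.inv(design.T @ design)
    estimates, naive_se = [], []
    for _ in range(n_rep):
        # Hypothetical truth: beta0 = 1, beta1 = 0.5, beta2 = beta3 = 0,
        # plus a subject-level random intercept that induces correlation.
        gamma0 = np.repeat(rng.normal(0.0, sigma0, n_subjects), n_visits)
        y = 1.0 + 0.5 * t + gamma0 + rng.normal(0.0, sigma, t.size)
        beta_hat = xtx_inv @ design.T @ y
        resid = y - design @ beta_hat
        s2 = resid @ resid / (y.size - design.shape[1])
        estimates.append(beta_hat[2])
        naive_se.append(np.sqrt(s2 * xtx_inv[2, 2]))   # assumes independence
    return float(np.mean(naive_se)), float(np.std(estimates, ddof=1))

mean_naive_se, empirical_sd = simulate_naive_ols()
print(f"naive SE: {mean_naive_se:.3f}, empirical SD: {empirical_sd:.3f}")
```

In this setting the naive standard error comes out smaller than the empirical standard deviation, which is what makes the test of $\beta_2 = 0$ anti-conservative.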
Linear mixed model with random intercept
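$$Y_{ij} = (\beta_0 + \gamma_{0i}) + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + \epsilon_{ij}$$
with $\gamma_{0i} \sim N(0, \sigma_0^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$ and $\epsilon_{ij} \perp \epsilon_{ij'}$ for $j \neq j'$
• $\gamma_{0i}$ are random variables
• Only one additional parameter : $\sigma_0^2$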
Linear mixed model with random intercept (2)
• Population (marginal) mean :
$$E(Y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
• Subject-specific (conditional) mean :
$$E(Y_{ij} \mid \gamma_{0i}) = (\beta_0 + \gamma_{0i}) + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
• Assumes a common correlation between all repeated measures
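The common correlation follows directly from the model. As a short derivation, assuming in addition that $\gamma_{0i}$ is independent of the errors $\epsilon_{ij}$ (the usual assumption, not written on the slide), for two measures $j \neq j'$ of the same subject:

```latex
\begin{aligned}
\mathrm{Cov}(Y_{ij}, Y_{ij'}) &= \mathrm{Var}(\gamma_{0i}) = \sigma_0^2, \\
\mathrm{Var}(Y_{ij}) &= \sigma_0^2 + \sigma^2, \\
\mathrm{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}.
\end{aligned}
```

The correlation does not involve $t_{ij}$ or $t_{ij'}$, so it is the same for every pair of visits.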
Linear mixed model with random intercept and slope
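$$Y_{ij} = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i}) t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + \epsilon_{ij}$$
$\gamma_{0i} \sim N(0, \sigma_0^2)$, $\gamma_{1i} \sim N(0, \sigma_1^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$, $\epsilon_{ij} \perp \epsilon_{ij'}$ for $j \neq j'$
• Population (marginal) mean :
$$E(Y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
• Subject-specific (conditional) mean :
$$E(Y_{ij} \mid \gamma_i) = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i}) t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
• The correlation between repeated measures depends on the measurement times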
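To see why the correlation now depends on the measurement times, the covariance can be written out. In this sketch, $\sigma_{01} = \mathrm{Cov}(\gamma_{0i}, \gamma_{1i})$ is an added symbol (the slide gives only the two variances), and the random effects are assumed independent of the errors. For $j \neq j'$:

```latex
\begin{aligned}
\mathrm{Cov}(Y_{ij}, Y_{ij'}) &= \sigma_0^2 + \sigma_{01}\,(t_{ij} + t_{ij'}) + \sigma_1^2\, t_{ij}\, t_{ij'}, \\
\mathrm{Var}(Y_{ij}) &= \sigma_0^2 + 2\,\sigma_{01}\, t_{ij} + \sigma_1^2\, t_{ij}^2 + \sigma^2 .
\end{aligned}
```

Both the variance and the covariance change with $t_{ij}$ and $t_{ij'}$, so the correlation is no longer constant across pairs of visits.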
Linear mixed model : general formulation
$$Y_{ij} = X_{ij}^T \beta + Z_{ij}^T \gamma_i + \epsilon_{ij}$$
$\gamma_i \sim N(0, B)$ and $\epsilon_i \sim N(0, R_i)$.
$X_{ij}$ : vector of explanatory variables
$\beta$ : vector of fixed effects
$Z_{ij}$ : sub-vector of $X_{ij}$ (including functions of time)
$\gamma_i$ : vector of random effects
Population (marginal) mean : $E(Y_{ij}) = X_{ij}^T \beta$
Subject-specific (conditional) mean : $E(Y_{ij} \mid \gamma_i) = X_{ij}^T \beta + Z_{ij}^T \gamma_i$
Linear mixed model : example
Linear mixed model with AR Gaussian error
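$$Y_{ij} = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i}) t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + w_{ij} + e_{ij}$$
with $\gamma_i^T = (\gamma_{0i}, \gamma_{1i}) \sim N(0, B)$,
$e_{ij} \sim N(0, \sigma^2)$, $e_{ij} \perp e_{ij'}$ for $j \neq j'$,
$w_{ij} \sim N(0, \sigma_w^2)$ and $\mathrm{Corr}(w_{ij}, w_{ij'}) = \exp(-\delta |t_{ij} - t_{ij'}|)$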
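The covariance matrix implied by this model for one subject can be assembled term by term. The sketch below assumes that $\gamma_i$, $w_{ij}$ and $e_{ij}$ are mutually independent; the function name `marginal_covariance` and all numerical values are hypothetical and only serve the illustration.

```python
import numpy as np

def marginal_covariance(times, B, sigma2_w, delta, sigma2):
    """Z B Z^T + sigma_w^2 * exp(-delta * |t - t'|) + sigma^2 * I for one subject."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])    # random intercept and slope
    lag = np.abs(t[:, None] - t[None, :])        # |t_ij - t_ij'|
    autoregressive = sigma2_w * np.exp(-delta * lag)
    independent = sigma2 * np.eye(t.size)
    return Z @ B @ Z.T + autoregressive + independent

# Hypothetical parameter values and visit times
B = np.array([[4.0, 0.5],
              [0.5, 1.0]])
V = marginal_covariance([0, 1, 3, 5], B, sigma2_w=2.0, delta=0.7, sigma2=1.0)
print(np.round(V, 2))
```

The autoregressive term decays with the time lag, so measures taken close together are more strongly correlated than the random effects alone would imply.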
Linear mixed model : Estimation
• Maximum likelihood estimator
• $Y_i = (Y_{i1}, \dots, Y_{ij}, \dots, Y_{in_i})^T$ is multivariate Gaussian with
  • mean $X_i \beta$
  • covariance matrix $V_i = Z_i B Z_i^T + R_i$
• Software : SAS PROC MIXED, R lme, Stata
Generalized linear mixed model
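$Y_{ij}$ follows a distribution from the exponential family and
$$g(E(Y_{ij} \mid \gamma_i)) = X_{ij}^T \beta + Z_{ij}^T \gamma_i \quad \text{with } \gamma_i \sim N(0, B).$$
• Example : logistic mixed model
$$\mathrm{logit}(\Pr(Y_{ij} = 1 \mid \gamma_i)) = X_{ij}^T \beta + Z_{ij}^T \gamma_i \quad \text{with } \gamma_i \sim N(0, B).$$
• Maximum likelihood estimation : numerical integration
• Software : SAS PROC NLMIXED, R nlme, Stata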
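Numerical integration is needed because the likelihood integrates the conditional probabilities over the random effects. As one possible illustration (the talk does not say which integration rule is used), the sketch below applies Gauss-Hermite quadrature to the simplest case, a logistic model with a single random intercept $\gamma_{0i} \sim N(0, \sigma_0^2)$. The function name `marginal_likelihood` and the example values are hypothetical.

```python
import numpy as np

def marginal_likelihood(y, eta, sigma0, n_nodes=30):
    """Likelihood contribution of one subject, random-intercept logistic model.

    y      : binary responses Y_ij of the subject
    eta    : fixed part X_ij^T beta of the linear predictor, one value per visit
    sigma0 : standard deviation of the random intercept
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    gamma = np.sqrt(2.0) * sigma0 * nodes          # change of variable for N(0, sigma0^2)
    p = 1.0 / (1.0 + np.exp(-(eta[:, None] + gamma[None, :])))
    # Product over visits of Pr(Y_ij = y_ij | gamma), evaluated at each node
    conditional = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    return float(np.sum(weights * conditional) / np.sqrt(np.pi))

# Hypothetical subject with three visits
print(marginal_likelihood([1, 0, 1], [0.2, -0.5, 1.0], sigma0=1.5))
```

With several random effects the integral becomes multidimensional, which is why fitting these models is more demanding than fitting the linear mixed model.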
Typology of missing data in longitudinal studies
Notation :
$Y_i = (Y_{obs,i}, Y_{mis,i})$, with $Y_{obs,i}$ the observed part of $Y_i$ and $Y_{mis,i}$ the missing part
$R_{ij} = 1$ if $Y_{ij}$ is observed and $R_{ij} = 0$ if $Y_{ij}$ is missing
$R_i = (R_{i1}, \dots, R_{ij}, \dots, R_{in_i})^T$
$X_i$ : explanatory variables, completely observed
Typology of missing data (2)
Monotone missing data = dropout : $P(R_{ij} = 0 \mid R_{i,j-1} = 0) = 1$
$R_i$ may be summarized by the time to dropout $T_i$ and an indicator for dropout $\delta_i$
Intermittent missing data : $P(R_{ij} = 0 \mid R_{i,j-1} = 0) < 1$
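The distinction between the two patterns can be checked directly on a response-indicator vector $R_i$. A minimal sketch, with a hypothetical helper `classify_pattern` that returns the pattern and, for dropout, the index of the first missing visit (counting from 0):

```python
def classify_pattern(r):
    """Classify a response-indicator vector R_i (1 = observed, 0 = missing)."""
    r = list(r)
    if all(r):
        return "complete", None
    first_missing = r.index(0)
    if not any(r[first_missing:]):
        # Monotone pattern: once a visit is missing, all later visits are missing
        return "dropout", first_missing
    return "intermittent", None

print(classify_pattern([1, 1, 0, 0]))
print(classify_pattern([1, 0, 1, 1]))
print(classify_pattern([1, 1, 1, 1]))
```

In this sketch a subject who misses a visit, returns, and then leaves the study is counted as intermittent, since the pattern is not monotone.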
Typology of missing data (3)
Missing completely at random (MCAR) : $P(R_{ij} = 1)$ is constant
The observed sample is representative of the whole sample.
→ Loss of precision, no bias
Covariate-dependent missingness process : $P(R_{ij} = 1) = f(X_i)$
→ Loss of precision, no bias if analyses are adjusted for $X_i$
Typology of missing data (4)
Missing at random (MAR) : $P(R_{ij} = 1) = f(Y_{obs,i}, X_i)$
Example : the probability of dropout depends on past observed values
→ Loss of precision, no bias with appropriate statistical methods
Informative or missing not at random (MNAR) : $P(R_{ij} = 1) = f(Y_{mis,i}, Y_{obs,i}, X_i)$
Example : the probability that $Y$ is observed depends on the current value of $Y$
→ Loss of precision, bias
→ Sensitivity analyses
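The consequences of the three mechanisms can be illustrated with a toy simulation of two visits, where the second response $Y_2$ may be missing. Everything here is an assumption made for the illustration: the coefficients 0.8, 0.6 and 1.5, the 60% observation rate, and the use of a simple regression of $Y_2$ on $Y_1$ among completers as a stand-in for an "appropriate statistical method" under MAR (it is not the mixed model of the talk).

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(mechanism, n=100_000, seed=1):
    """Estimate E(Y2) (true value 0) when Y2 is missing under a given mechanism."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.0, 1.0, n)                  # always observed
    y2 = 0.8 * y1 + rng.normal(0.0, 0.6, n)       # possibly missing
    if mechanism == "MCAR":
        p_observed = np.full(n, 0.6)              # constant
    elif mechanism == "MAR":
        p_observed = expit(1.5 * y1)              # depends on the observed past value
    elif mechanism == "MNAR":
        p_observed = expit(1.5 * y2)              # depends on the current value
    else:
        raise ValueError(mechanism)
    r = rng.random(n) < p_observed                # R = 1 if Y2 is observed
    observed_mean = y2[r].mean()
    # Regression of Y2 on Y1 among completers, then prediction for everybody
    slope, intercept = np.polyfit(y1[r], y2[r], 1)
    regression_mean = (intercept + slope * y1).mean()
    return float(observed_mean), float(regression_mean)

results = {m: simulate(m) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, (observed, regression) in results.items():
    print(f"{mechanism}: observed mean {observed:.3f}, regression estimate {regression:.3f}")
```

In this setting the observed mean is close to 0 only under MCAR, the regression-based estimate is also close to 0 under MAR, and both are off under MNAR.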
Exploring incomplete data
• Describe missing data frequency
• Cross classify missing data patterns with covariates
• Compare mean evolution for available data and complete cases
• Compare mean evolution until time $t$ given observation status at time $t + 1$
• Logistic regression for $P(R_{ij} = 1)$ given covariates and $Y_{ik}$, $k < j$
• Cox regression for time to dropout given covariates
→ Impossible to distinguish MAR from MNAR
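One of the descriptive checks listed above, comparing the response at a visit according to the observation status at the next visit, can be sketched as follows. The data layout (one row per subject, one column per visit, NaN for a missing score), the function name `mean_by_next_status` and the toy scores are assumptions made for the example.

```python
import numpy as np

def mean_by_next_status(y, j):
    """Mean of Y at visit j, split by whether visit j + 1 is observed."""
    y = np.asarray(y, dtype=float)          # shape (subjects, visits), NaN = missing
    seen_now = ~np.isnan(y[:, j])
    seen_next = ~np.isnan(y[:, j + 1])
    stay = y[seen_now & seen_next, j]       # observed at j and at j + 1
    leave = y[seen_now & ~seen_next, j]     # observed at j, missing at j + 1
    return (float(stay.mean()) if stay.size else float("nan"),
            float(leave.mean()) if leave.size else float("nan"))

# Hypothetical scores for four subjects over three visits
scores = [[30, 28, 27],
          [20, np.nan, np.nan],
          [25, 24, np.nan],
          [18, np.nan, np.nan]]
print(mean_by_next_status(scores, 0))
print(mean_by_next_status(scores, 1))
```

A clear difference between the two means suggests that missingness depends on past observed values, which rules out MCAR but says nothing about MAR versus MNAR.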
An example : Paquid data set
The Paquid Cohort in Gironde
• 2792 subjects aged 65 years and older at baseline
• Living at home at the beginning of the study (1988) in Gironde (France)
• Seen at home at 1, 3, 5, 8, and 10 years after the baseline visit
• Cognitive measure : Wechsler Digit Symbol Substitution Test (attention, time limited to 90 s)
Sample :
• 2026 subjects
• without diagnosis of dementia between T0 and T10
• with the test completed at least once (at T0)
Description of dropout : Kaplan-Meier
Dropout time (=event) : first visit with missing score
[Figure : Kaplan-Meier estimate of the probability of being in the cohort, with 95% confidence interval. x-axis : follow-up time ; y-axis : probability.]
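For reference, the Kaplan-Meier estimate of the probability of still being in the cohort can be computed in a few lines. This is a generic sketch of the estimator, not the code used for the Paquid analysis: the dropout times and event indicators below are made up, and the confidence interval is not computed.

```python
import numpy as np

def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival estimate at each of them.

    time  : time to dropout, or time of last follow-up if no dropout occurred
    event : 1 if dropout was observed at that time, 0 if the time is censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)               # subjects still followed just before t
        dropouts = np.sum((time == t) & event)    # dropouts occurring at t
        s *= 1.0 - dropouts / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical data: seven subjects, dropout observed for three of them
times, surv = kaplan_meier([1, 3, 3, 5, 8, 10, 10], [1, 1, 0, 1, 0, 0, 0])
print(times, np.round(surv, 3))
```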
Observed means of the DSST score given time
[Figure : observed mean DSST score by age class (65-69 years, 70-74, 75-79, 80 and +). x-axis : age ; y-axis : score. Curve : available data.]
Observed means of the DSST score given time
[Figure : observed mean DSST score by age class (65-69 years, 70-74, 75-79, 80 and +). x-axis : age ; y-axis : score. Curves : complete data, available data.]
Logistic regression model for dropout in the first 5 years

Covariates                         OR      95% CI of the OR
T3                                 0.02    0.003 - 0.10
T5                                 0.01    0.001 - 0.09
age                                1.01    0.99 - 1.02
age × T3                           1.05    1.03 - 1.08
age × T5                           1.06    1.03 - 1.09
previous MMSE score                0.91    0.88 - 0.93
men                                0.86    0.75 - 0.99
Education (vs university level)
  No education                     1.88    1.15 - 3.07
  No diploma                       2.02    1.39 - 2.93
  CEP                              1.67    1.17 - 2.40
  High school level                1.39    0.96 - 2.00
Methods for MCAR or MAR data
• Complete case analysis (loss of precision, requires MCAR)
• Imputation (requires MCAR or MAR)
• Maximum likelihood using available data (requires MAR)
Maximum likelihood for MAR data (1)
Objective : estimate $\theta$ from the distribution $f(Y \mid \theta)$
Likelihood of the observed data $(Y_{obs}, R)$ :
$$f(Y_{obs}, R \mid \theta, \psi) = \int f(Y_{obs}, Y_{mis} \mid \theta)\, f(R \mid Y_{obs}, Y_{mis}, \psi)\, dY_{mis}$$
Maximum likelihood for MAR data (2)
If the data are MAR :
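$$\begin{aligned}
f(Y_{obs}, R \mid \theta, \psi) &= \int f(Y_{obs}, Y_{mis} \mid \theta)\, f(R \mid Y_{obs}, \psi)\, dY_{mis} \\
&= f(R \mid Y_{obs}, \psi) \int f(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis} \\
&= f(R \mid Y_{obs}, \psi)\, f(Y_{obs} \mid \theta)
\end{aligned}$$
Log-likelihood :
$$l(\theta, \psi \mid Y_{obs}, R) = l(\theta \mid Y_{obs}) + l(\psi \mid R, Y_{obs})$$
If $\psi$ and $\theta$ are distinct :
→ the missing data are ignorable
→ $\theta$ is estimated by maximisation of $l(\theta \mid Y_{obs})$ using only available responses.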
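In practice, "using only available responses" means that each subject contributes the Gaussian density of the visits actually observed. The sketch below computes this contribution for a linear mixed model, assuming independent errors with $R_i = \sigma^2 I$ and NaN as the code for a missing response. The function name `observed_loglik` and the numerical values are hypothetical.

```python
import numpy as np

def observed_loglik(y, X, Z, beta, B, sigma2):
    """Log-likelihood l(theta | Y_obs) of one subject in a linear mixed model."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    observed = ~np.isnan(y)                          # R_ij = 1
    y_o, X_o, Z_o = y[observed], X[observed], Z[observed]
    # Covariance matrix of the observed part only: Z B Z^T + sigma^2 I
    V = Z_o @ B @ Z_o.T + sigma2 * np.eye(y_o.size)
    resid = y_o - X_o @ beta
    _, logdet = np.linalg.slogdet(V)
    quad = resid @ np.linalg.solve(V, resid)
    return float(-0.5 * (y_o.size * np.log(2.0 * np.pi) + logdet + quad))

# Hypothetical subject: three planned visits, the last one missing
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0]])  # intercept and time
beta = np.array([0.5, 0.1])
B = np.diag([1.0, 0.2])
print(observed_loglik([1.0, 2.0, np.nan], X, X, beta, B, sigma2=0.5))
```

Summing these contributions over subjects and maximising over $\theta$ gives the estimate; no model for $R$ is needed as long as the MAR and distinctness conditions hold.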
Example : MAR analysis of Paquid data
Mixed effect model
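$Y_{ij}$ : test score for subject $i$ at time $t_{ij}$
$$Y_{ij} = (\beta_0 + \mathrm{age}_i^T \gamma_0 + \alpha_{0i}) + (\beta_1 + \mathrm{age}_i^T \gamma_1 + \alpha_{1i}) \times t_{ij} + \beta_3 I_{\{t_{ij}=0\}} + e_{ij}$$
with $\alpha_i = (\alpha_{0i}, \alpha_{1i})^T \sim N(0, G)$, $e_{ij} \sim N(0, \sigma_e^2)$
$\mathrm{age}_i$ : vector of indicators for baseline age classes (70-74, 75-79, 80 years and older ; ref = 65-69)
$I_{\{t_{ij}=0\}}$ : indicator of the baseline visit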
Observed and predicted means of the score given time
[Figure : observed and predicted mean score by age class (65-69 years, 70-74, 75-79, 80 and +). x-axis : age ; y-axis : score. Curves : complete data, available data, mixed model (MAR).]
Conclusion
Advantages of mixed models
• Use all the available information (repeated measures)
• Flexibly handle intra-subject correlation (unbiased inference)
• Any number and times of measurements
• Robust to missing at random data
• Available in most software
Limits of mixed models
• Assume a homogeneous population
  → extended models including latent classes (mixture)
• As the MAR assumption is uncheckable, complete the study with a sensitivity analysis
  → extended models for MNAR data