GEE and Generalized Linear Mixed Models Tom Greene

Upload: damon-bishop

Post on 24-Dec-2015


Page 1: GEE and Generalized Linear Mixed Models Tom Greene

GEE and Generalized Linear Mixed Models

Tom Greene

Page 2

Outline

• Subject specific and population average inference in generalized linear models

• Review of classical generalized linear models with independent observations

• Generalized Estimating Equations

• Contrasts of GLMMs with GEEs

• GEE example

Page 3

Classes of Generalized Linear Models

• Linear Models (linear regression, ANOVA, ANCOVA)
  E(Y) = Xβ
  Responses independent

• Generalized Linear Models (logistic regression, Poisson regression, etc.)
  g(E(Y)) = Xβ
  Responses independent

• Linear Mixed Models
  E(Y|b) = Xβ + Zb
  Responses correlated; correlation modeled in part by “random effects”

• Generalized Linear Mixed Models (GLMM)
  g(E(Y|b)) = Xβ + Zb
  Responses correlated; correlation modeled in part by “random effects”

• Generalized Estimating Equations approach (GEE)
  g(E(Y)) = Xβ
  Responses correlated

Page 4

Classes of Generalized Linear Models for Correlated Data

• Linear Mixed Models
  E(Y|b) = Xβ + Zb
  Responses correlated; correlation modeled in part by “random effects”

• Generalized Linear Mixed Models (GLMM): subject specific inference
  g(E(Y|b)) = Xβ + Zb
  Responses correlated; correlation modeled in part by “random effects”

• Generalized Estimating Equations approach (GEE): population average inference
  g(E(Y)) = Xβ
  Responses correlated

Page 5

Classes of Generalized Linear Models for Correlated Data

• Generalized Linear Mixed Models (GLMM): g(E(Y|b)) = Xβ + Zb; responses correlated
• Generalized Estimating Equations approach (GEE): g(E(Y)) = Xβ; responses correlated

Population average inference (GEE)
• Analysis describes differences in the mean of Y across the entire population
• Analysis is informative from a population perspective; most relevant from the perspective of:
  – Policy makers
  – Providers desiring to optimize outcomes across the entire population

Subject specific inference (GLMM)
• Analysis describes differences in the mean of Y conditional on the patient’s specific random effect b
• Most relevant from an individual patient’s perspective
• Often b represents a dimension of frailty; hence Xβ describes the relationship of Y to X among patients with the same frailty

Page 6

Extreme Example

Subject specific effect of X on Pr(Death): OR = 20 per 1-unit increase in X

Population average effect of X on Pr(Death): OR = 2.7 per 1-unit increase in X
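The gap between the two odds ratios comes from averaging a nonlinear (logistic) curve over subjects: the population-average probability is the subject-specific probability averaged over the distribution of b, and on the logit scale that average is flatter than any individual curve. The Python sketch below illustrates the effect under assumptions that are not stated in the slides: a random intercept b ~ Normal(0, sigma_b²), an intercept beta0 = 0, and a few illustrative values of sigma_b. The function names are hypothetical, and the integral over b is approximated with a simple midpoint rule.

```python
import math


def expit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def population_prob(beta0: float, beta1: float, x: float, sigma_b: float,
                    n_grid: int = 4001, width: float = 8.0) -> float:
    """Population-average Pr(Y=1 | x): the subject-specific probability
    expit(beta0 + beta1*x + b) averaged over b ~ Normal(0, sigma_b^2),
    approximated by a midpoint rule on [-width*sigma_b, width*sigma_b]."""
    step = 2.0 * width * sigma_b / n_grid
    norm = 1.0 / (sigma_b * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(n_grid):
        b = -width * sigma_b + (k + 0.5) * step
        density = norm * math.exp(-0.5 * (b / sigma_b) ** 2)
        total += expit(beta0 + beta1 * x + b) * density * step
    return total


def population_odds_ratio(beta0: float, beta1: float, sigma_b: float) -> float:
    """Population-average odds ratio comparing x = 1 with x = 0."""
    p0 = population_prob(beta0, beta1, 0.0, sigma_b)
    p1 = population_prob(beta0, beta1, 1.0, sigma_b)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


if __name__ == "__main__":
    beta1 = math.log(20.0)  # subject-specific OR = 20 per 1-unit increase in X
    for sigma_b in (0.5, 1.0, 2.0, 4.0):
        print(f"sigma_b = {sigma_b}: population-average OR = "
              f"{population_odds_ratio(0.0, beta1, sigma_b):.2f}")
```

With sigma_b near 0 the population-average OR returns to 20; as sigma_b grows it shrinks toward 1, the same direction of attenuation as the slide's 20 versus 2.7. The amount of attenuation also depends on beta0, so the sketch is not meant to reproduce 2.7 exactly.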

Page 8

Example: Toenail Data

Toenail dermatophyte onychomycosis (TDO): a common toenail infection, difficult to treat, affecting more than 2% of the population.

Design: randomized, double-blind, parallel group, multicenter study comparing two new compounds (A and B) for oral treatment.
• 2 × 189 patients randomized at 36 centers
• 48 weeks (12 months) of total follow-up
• 12 weeks (3 months) of treatment
• Measurements at months 0, 1, 2, 3, 6, 9, 12

Research question: how does the severity of TDO relate to treatment?

Page 11

Review of Generalized Linear Models (Independent Responses)

• Independent responses Yi, i = 1, 2, …, N
  – Yi has a distribution from the exponential family:
    $f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}$
• Mean model
  – μi = E(Yi | Xi1, Xi2, …, Xip)
  – g(μi) = β0 + β1Xi1 + β2Xi2 + … + βpXip
• Variance function
  – Var(Yi) = φV(μi)
  – V(μi) is a known function determined by the assumed distribution of Y within the exponential family
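As a worked instance of the exponential-family form above, the Bernoulli distribution (the binary case used later for the toenail data) can be rewritten to expose θ, b(θ), a(φ) and c(y, φ); differentiating b(θ) then recovers the mean and the variance function. This is a standard textbook derivation added for illustration, using the same symbols as the slide.

```latex
\begin{aligned}
f(y;\mu) &= \mu^{y}(1-\mu)^{1-y}, \qquad y \in \{0,1\} \\
         &= \exp\Big\{ y \log\frac{\mu}{1-\mu} + \log(1-\mu) \Big\} \\
         &\Rightarrow\ \theta = \log\frac{\mu}{1-\mu}, \quad b(\theta) = \log(1+e^{\theta}), \quad a(\phi) = 1, \quad c(y,\phi) = 0 \\
E(Y)     &= b'(\theta) = \frac{e^{\theta}}{1+e^{\theta}} = \mu, \qquad
\operatorname{Var}(Y) = a(\phi)\, b''(\theta) = \mu(1-\mu)
\end{aligned}
```

So the canonical link is the logit, the variance function is V(μ) = μ(1 − μ), and the scale is fixed at φ = 1.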

Page 12

Review of Generalized Linear Models (Independent Responses)

Page 13

Review of Generalized Linear Models (Independent Responses)

Page 14

Review of Generalized Linear Models (Independent Responses)

• Independent responses Yi, i = 1, 2, …, N
  – Yi has a distribution from the exponential family:
    $f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}$
• Mean model
  – μi = E(Yi | Xi1, Xi2, …, Xip)
  – g(μi) = β0 + β1Xi1 + β2Xi2 + … + βpXip
• Variance function
  – Var(Yi) = φV(μi)
  – vi = V(μi) is a known function determined by the assumed distribution of Y within the exponential family

The mean model is the only part we have to get right for valid large-sample inference (provided robust variance estimates are used)!

Page 15

Extension to GEE for Longitudinal Data

GEE: Generalized Estimating Equations (Liang & Zeger, 1986; Zeger & Liang, 1986)
• Method is semi-parametric
  – Estimating equations are derived without full specification of the joint distribution of a subject’s observations
• Instead, it requires specification of:
  – The mean model for the marginal distributions of the yij
  – The variance function of yij given μij
  – The “working” correlation matrix for the vector of repeated observations from each subject
• Relies on the independence across subjects (or clusters) to consistently estimate the variance of the regression coefficients
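For reference, the estimating equation and its robust ("sandwich") variance estimator are usually written in the standard Liang & Zeger form below. The symbols D_i, M_0 and M_1 are introduced here for illustration; V_i is the working variance-covariance matrix of Y_i built from the variance function and the working correlation matrix. The sums over subjects i in M_0 and M_1 are where independence across subjects is used.

```latex
\begin{aligned}
U(\beta) &= \sum_{i=1}^{N} D_i^{T}\, V_i^{-1}\, (Y_i - \mu_i) = 0,
  \qquad D_i = \frac{\partial \mu_i}{\partial \beta^{T}} \\
\widehat{\operatorname{Cov}}(\hat\beta) &= M_0^{-1} M_1 M_0^{-1}, \\
M_0 &= \sum_{i=1}^{N} D_i^{T} V_i^{-1} D_i, \qquad
M_1 = \sum_{i=1}^{N} D_i^{T} V_i^{-1} (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^{T} V_i^{-1} D_i
\end{aligned}
```

M_0^{-1} alone is the model-based variance, which is appropriate only if the working covariance is correct; the sandwich form replaces the assumed covariance of Y_i with the empirical residual cross-products.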

Page 16

GEE Method Outline

1. Relate the marginal response μij = E(yij) to a linear combination of the covariates: $g(\mu_{ij}) = X_{ij}^{T}\beta$
   • yij is the response for subject i at time j, j = 1, 2, …, J
   • Xij is a p × 1 vector of covariates
   • β is a p × 1 vector of regression coefficients
   • g(·) is the link function

2. Describe the variance of yij as a function of the mean: V(yij) = v(μij)φ
   • φ is a possibly unknown scale parameter
   • v(·) is a known variance function

Page 17

Link and Variance Functions

• Normally distributed response
  g(μij) = μij (“identity link”)
  v(μij) = 1
  V(yij) = φ

• Binary response (Bernoulli)
  g(μij) = log[μij/(1 − μij)] (“logit link”)
  v(μij) = μij(1 − μij)
  φ = 1

• Poisson response
  g(μij) = log(μij) (“log link”)
  v(μij) = μij
  φ = 1
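The three link and variance pairs above can be written down as plain functions. The Python sketch below is illustrative only: the `Family` record and the `variance_of_y` helper are hypothetical names, not part of any GEE package, and φ is held at 1 for the Bernoulli and Poisson cases exactly as listed above (real software may instead estimate an overdispersion scale).

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Family:
    """Link g, inverse link g^{-1}, variance function v, and whether the
    scale parameter phi is fixed at 1."""
    link: Callable[[float], float]
    inverse_link: Callable[[float], float]
    variance: Callable[[float], float]
    phi_fixed_at_one: bool


FAMILIES = {
    # Identity link, v(mu) = 1, V(y) = phi
    "normal": Family(lambda mu: mu, lambda eta: eta, lambda mu: 1.0, False),
    # Logit link, v(mu) = mu(1 - mu), phi = 1
    "bernoulli": Family(lambda mu: math.log(mu / (1.0 - mu)),
                        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
                        lambda mu: mu * (1.0 - mu), True),
    # Log link, v(mu) = mu, phi = 1
    "poisson": Family(math.log, math.exp, lambda mu: mu, True),
}


def variance_of_y(family: str, mu: float, phi: float = 1.0) -> float:
    """V(y) = v(mu) * phi, with phi forced to 1 where the family fixes it."""
    fam = FAMILIES[family]
    return fam.variance(mu) * (1.0 if fam.phi_fixed_at_one else phi)


if __name__ == "__main__":
    print(variance_of_y("normal", 3.0, phi=2.5))   # depends only on phi
    print(variance_of_y("bernoulli", 0.2))         # mu(1 - mu)
    print(variance_of_y("poisson", 4.0, phi=9.0))  # phi ignored, equals mu
```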

Page 18

GEE Method Outline

3. Choose the form of an n × n “working” correlation matrix Ri for each Yi

Page 19

Working Correlation Structures

Page 20

Working Correlation Structures

Page 21

Working Correlation Structures

AR(1)

Page 22

Working Correlation Structures
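The working correlation slides survive here only as titles (plus the AR(1) label). As a hedged illustration, the sketch below builds three commonly used structures for n repeated measures: independence, exchangeable (one correlation α for every pair of times) and AR(1) (α raised to the lag). The function names are hypothetical, and in practice α is estimated from the data. For example, `exchangeable(7, 0.4212)` produces the same 7 × 7 working correlation matrix that PROC GENMOD reports for the toenail data on Page 38.

```python
from typing import List

Matrix = List[List[float]]


def independence(n: int) -> Matrix:
    """R = I: repeated measures treated as uncorrelated."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable(n: int, alpha: float) -> Matrix:
    """Compound symmetry: the same correlation alpha for every pair of times."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def ar1(n: int, alpha: float) -> Matrix:
    """AR(1): correlation alpha^|j - k| decays with the lag between times."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


if __name__ == "__main__":
    for row in exchangeable(7, 0.4212):
        print(" ".join(f"{r:.4f}" for r in row))
```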

Page 23

GEE Estimation
• Define Ai = n × n diagonal matrix with V(μij) as the jth diagonal element
• Define Ri(α) = n × n “working” correlation matrix (of the n repeated measures)

Working variance–covariance matrix for Yi equals

$V_i(\alpha) = \phi\, A_i^{1/2} R_i(\alpha) A_i^{1/2}$
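Because Ai is diagonal, element (j, k) of Vi(α) is simply φ · sqrt(v(μij)) · Rjk · sqrt(v(μik)). A small Python sketch for a binary outcome, where v(μ) = μ(1 − μ); the function name and the example means are hypothetical, and α = 0.4212 is borrowed from the exchangeable estimate on Page 38.

```python
import math
from typing import List

Matrix = List[List[float]]


def working_covariance(mu: List[float], R: Matrix, phi: float = 1.0) -> Matrix:
    """V_i(alpha) = phi * A_i^{1/2} R_i(alpha) A_i^{1/2} for a binary outcome.

    A_i = diag(v(mu_i1), ..., v(mu_in)) with v(mu) = mu(1 - mu). Since A_i is
    diagonal, entry (j, k) is phi * sqrt(v_j) * R[j][k] * sqrt(v_k).
    """
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n = len(mu)
    return [[phi * sd[j] * R[j][k] * sd[k] for k in range(n)] for j in range(n)]


if __name__ == "__main__":
    mu = [0.45, 0.40, 0.35]  # hypothetical marginal means at three visits
    alpha = 0.4212           # exchangeable working correlation
    R = [[1.0 if j == k else alpha for k in range(3)] for j in range(3)]
    for row in working_covariance(mu, R):
        print(" ".join(f"{v:.4f}" for v in row))
```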

Page 29

GEE vs. GLMM

1) Target of inference:
• GEE: population average
• GLMM: subject specific

Note: recent work addresses population average inference under GLMMs.

Page 30

GEE vs. GLMM

2) Outputs:
• GEE:
  – Coefficients relating Y to X
• GLMM:
  – Coefficients relating Y to X conditional on b
  – Estimates of subject specific random effects
  – Variance of subject specific random effects

Page 31

GEE vs. GLMM

3) Robustness:
• GEE (with robust variance estimates):
  – Inference valid in large samples even if the distribution of Y and/or the variance of Y are incorrectly specified
• GLMM (with model-based estimates):
  – Valid inference generally requires correct specification of the distribution of Y and of the variance of Y

Notes:
1) Recent proposals for robust variance estimates under GLMM
2) Inference for linear mixed models remains valid for large N even if Y is not normal
3) Caveat to GEE robustness: GEE can be biased if time-dependent covariates are used, unless an independence working correlation matrix is used
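The robustness of GEE rests on the sandwich variance, which sums residuals within each subject before squaring, so any within-subject correlation is picked up empirically. The Python sketch below shows this in the simplest possible case: an intercept-only model with identity link and independence working correlation, where the estimate is just the overall mean. It illustrates the idea under those assumptions and is not the PROC GENMOD computation; `intercept_only_gee` and the data are hypothetical.

```python
from typing import List, Tuple


def intercept_only_gee(clusters: List[List[float]]) -> Tuple[float, float, float]:
    """Intercept-only GEE, identity link, independence working correlation.

    Returns (mu_hat, model_based_variance, robust_variance):
    - the estimating equation sum_i sum_j (y_ij - mu) = 0 gives mu_hat = overall mean;
    - model-based variance phi_hat / n treats all n responses as independent,
      with phi_hat = sum of squared residuals / n;
    - robust ("sandwich") variance sums residuals within each cluster first:
      sum_i (sum_j r_ij)^2 / n^2, relying only on independence across clusters.
    """
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mu_hat = sum(values) / n
    phi_hat = sum((y - mu_hat) ** 2 for y in values) / n
    model_based = phi_hat / n
    robust = sum(sum(y - mu_hat for y in cluster) ** 2 for cluster in clusters) / n ** 2
    return mu_hat, model_based, robust


if __name__ == "__main__":
    # Hypothetical data: 4 subjects, 3 identical binary responses each
    clusters = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(intercept_only_gee(clusters))
```

With four subjects whose three responses are identical, the robust variance (1/16) is three times the model-based one (1/48): the naive calculation treats 12 perfectly correlated responses as 12 independent ones.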

Page 32

GEE vs. GLMM

4) Efficiency (power and width of confidence intervals):
• GEE:
  – Usually fairly efficient if the variance function is correctly specified
  – Between-subject comparisons are nearly efficient if an independence covariance structure is used for balanced data
• GLMM:
  – Maximum likelihood estimates are asymptotically efficient as long as the model is correctly specified

Page 33

GEE vs. GLMM

5) Missing data:
• “Classical” GEE (with robust variance estimates):
  – Valid inference if data are Missing Completely At Random (MCAR), even if the variance model is wrong
  – If the variance model is correct, the estimate of β is still consistent if data are MAR but not MCAR (but the standard errors are not correct)
• GLMM (with model-based estimates):
  – Valid inference if data are Missing At Random (MAR)

Notes:
1) Various strategies exist for valid GEE inference if data are MAR

Page 34

Missing data
• Three general approaches to dealing with missing data under GEE that assume MAR rather than MCAR:
  1. Inverse probability weighting (Robins, Rotnitzky and Zhao, JASA, 1995)
  2. Multiple imputation
  3. Inverse probability weighting with augmentation, or doubly robust estimation
• Each method can incorporate covariate information not included in the GEE model itself. This can make the MAR assumption much more plausible.
• Methods 2 and 3 can be considerably more efficient than standard inverse probability weighting.
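The core idea of inverse probability weighting can be seen without the full GEE machinery: each observed response is weighted by 1/π, where π is its probability of being observed given covariates, so that observed subjects also stand in for similar subjects who are missing. The Python sketch below applies this to a simple mean rather than a weighted GEE, and treats π as known; in practice π would be estimated, for example by a logistic regression of the missingness indicator on covariates. Function names and data are hypothetical, and missingness here is MAR given group membership.

```python
import math
from typing import List, Tuple

Row = Tuple[float, bool, float]  # (y, observed, pi = Pr(observed | covariates))


def complete_case_mean(rows: List[Row]) -> float:
    """Mean of the observed responses, ignoring why the others are missing."""
    ys = [y for y, observed, _ in rows if observed]
    return sum(ys) / len(ys)


def ipw_mean(rows: List[Row]) -> float:
    """Inverse-probability-weighted (Hajek) mean: each observed response is
    weighted by 1 / pi, so it also represents similar subjects who are missing."""
    num = sum(y / pi for y, observed, pi in rows if observed)
    den = sum(1.0 / pi for _, observed, pi in rows if observed)
    return num / den


if __name__ == "__main__":
    missing = math.nan  # placeholder; never read for unobserved rows
    rows: List[Row] = (
        [(0.0, True, 0.25)] + [(missing, False, 0.25)] * 3  # group 1: y = 0, 1 of 4 observed
        + [(1.0, True, 1.0)] * 4                           # group 2: y = 1, all observed
    )
    print(complete_case_mean(rows))  # 0.8 (biased; the full-data mean is 0.5)
    print(ipw_mean(rows))            # 0.5
```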

Page 36

GEE vs. GLMM

6) Small to moderate samples:
• GEE (with robust variance estimates):
  – Estimated standard errors are unstable and biased downwards
    • Inefficient estimating equation for estimating the variance
    • Effectively uses a fully unstructured variance model
  – “Sample size” means the number of independent units
  – Various corrections have been proposed (available in PROC GLIMMIX)
• GLMM (with model-based estimates):
  – Large-sample approximations are often invoked, but performance is usually better than GEE with small to moderate N if the model is correctly specified

Page 37

More Toenail Data

• Multicenter trial comparing active vs. control oral treatments for toenail infection
• Repeated measurements of a binary outcome:
  – 0 = none or mild separation
  – 1 = severe separation
• 1908 observations in 294 patients, mostly over 1 year

Page 38

    **** Standard GENMOD GEE program using robust SEs ****;
    **** Binary outcome leads to default logistic link function ****;
    proc genmod data=toenail descending;
      class id;
      model outcome = treatment month treatment*month / dist=bin;
      repeated subject=id / type=exch covb corrw;
      estimate 'Control Slope' month 1 / exp;
      estimate 'Treatment Slope' month 1 treatment*month 1 / exp;
    run;

Working Correlation Matrix

            Col1    Col2    Col3    Col4    Col5    Col6    Col7
    Row1  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212
    Row2  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212
    Row3  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212
    Row4  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212
    Row5  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212
    Row6  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212
    Row7  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000

Page 39

(Same PROC GENMOD program as on Page 38.)

Analysis of GEE Parameter Estimates (Empirical Standard Error Estimates)

    Parameter         Estimate  Std Error  95% CL Lower  95% CL Upper       Z  Pr > |Z|
    Intercept          -0.5819     0.1720       -0.9191       -0.2446   -3.38    0.0007
    treatment           0.0072     0.2595       -0.5013        0.5157    0.03    0.9779
    month              -0.1713     0.0300       -0.2301       -0.1125   -5.71    <.0001
    treatment*month    -0.0777     0.0541       -0.1838        0.0283   -1.44    0.1509

Page 40

(Same PROC GENMOD program as on Page 38.)

Contrast Estimate Results

    Label                     Mean Est  Mean Lower  Mean Upper  L'Beta Est   Std Error
    Control Slope               0.4573      0.4427      0.4719     -0.1713      0.0300
    Exp(Control Slope)                                              0.8426      0.0253
    Treatment Slope             0.4381      0.4165      0.4599     -0.2490      0.0450
    Exp(Treatment Slope)                                            0.7796      0.0351

    Label                    Alpha  L'Beta Lower  L'Beta Upper  Chi-Square  Pr > ChiSq
    Control Slope             0.05       -0.2301       -0.1125       32.60      <.0001
    Exp(Control Slope)        0.05        0.7945        0.8936
    Treatment Slope           0.05       -0.3373       -0.1607       30.57      <.0001
    Exp(Treatment Slope)      0.05        0.7137        0.8515

The Mean columns (the inverse logit of L'Beta) can be ignored in this case, because both contrasts are slopes rather than linear predictors; the L'Beta estimates and their exponentiated versions (odds ratios per month) are the quantities of interest.
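The contrast output can be reproduced by hand from the parameter estimates on Page 39, which also shows why the Mean columns carry no useful information here. In this short Python check (variable names are illustrative), the treatment slope is the sum of the month and treatment*month coefficients, exponentiating a slope gives the odds ratio per month, and the Mean Estimate column turns out to be the inverse logit of L'Beta.

```python
import math


def expit(x: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# GEE estimates (log-odds scale) from the PROC GENMOD output on Page 39
b_month = -0.1713
b_treatment_month = -0.0777

control_slope = b_month                        # 'Control Slope': month 1
treatment_slope = b_month + b_treatment_month  # 'Treatment Slope': month 1 treatment*month 1

print(round(treatment_slope, 4))            # -0.249
print(round(math.exp(control_slope), 4))    # 0.8426  odds ratio per month, control
print(round(math.exp(treatment_slope), 4))  # 0.7796  odds ratio per month, treatment

# "Mean Estimate" column = inverse logit of L'Beta; not meaningful for a slope
print(round(expit(control_slope), 4), round(expit(treatment_slope), 4))  # 0.4573 0.4381
```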

Page 41

    **** GLIMMIX GLMM estimating subject specific effects ****;
    **** Binary outcome leading to default logistic link function ****;
    proc glimmix method=RSPL data=toenail;
      class id;
      model outcome (event="1") = treatment month treatment*month / s dist=binary;
      random int / subject=id;
      estimate 'Control Slope' month 1 / or;
      estimate 'Treatment Slope' month 1 treatment*month 1 / or cl;
    run;

Solutions for Fixed Effects

    Effect             Estimate   Std Error     DF   t Value   Pr > |t|
    Intercept           -0.7204      0.2370    292     -3.04     0.0026
    treatment          -0.02594      0.3360   1612     -0.08     0.9385
    month               -0.2782     0.03222   1612     -8.64     <.0001
    treatment*month    -0.09583     0.05105   1612     -1.88     0.0607
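Comparing these subject-specific month coefficients with the GEE (population average) ones on Page 39 shows the pattern of the Extreme Example on real data: the GLMM coefficients are larger in magnitude. A brief Python sketch converting both sets of slopes to odds ratios per month (values copied from the two outputs; the dictionary and function names are illustrative):

```python
import math

# Month coefficients (log-odds per month) copied from the two outputs
gee_population_average = {"month": -0.1713, "treatment*month": -0.0777}   # PROC GENMOD, Page 39
glmm_subject_specific = {"month": -0.2782, "treatment*month": -0.09583}   # PROC GLIMMIX, Page 41


def monthly_odds_ratios(coefs: dict) -> tuple:
    """Odds ratio per month for the control arm and for the treated arm."""
    control = coefs["month"]
    treated = coefs["month"] + coefs["treatment*month"]
    return math.exp(control), math.exp(treated)


for label, coefs in (("GEE, population average", gee_population_average),
                     ("GLMM, subject specific", glmm_subject_specific)):
    control_or, treated_or = monthly_odds_ratios(coefs)
    print(f"{label}: control OR/month = {control_or:.3f}, "
          f"treatment OR/month = {treated_or:.3f}")
```

The subject-specific odds ratios lie further from 1 than the population-average ones, as expected for a logistic model with a random intercept; the two sets of estimates answer different questions rather than contradicting each other.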

Page 42

    *** Small Sample;
    data small;
      set toenail;
      if id <= 20;
    run;

    ** Standard GENMOD GEE with robust SEs: 17 patients only ***;
    ** Binary outcome leading to default logistic link function **;
    proc genmod data=small descending;
      class id;
      model outcome = treatment month treatment*month / dist=bin;
      repeated subject=id / type=exch covb corrw;
    run;

    Parameter         Estimate  Std Error  95% CL Lower  95% CL Upper       Z  Pr > |Z|
    Intercept          -0.3558     0.6272       -1.5851        0.8736   -0.57    0.5706
    treatment           0.0527     0.9679       -1.8444        1.9497    0.05    0.9566
    month              -0.1543     0.0991       -0.3485        0.0400   -1.56    0.1196
    treatment*month     0.0272     0.1725       -0.3109        0.3654    0.16    0.8746

Page 43

    **** GLIMMIX GEE program using robust SEs;
    **** Binary outcome leads to default logistic link function;
    **** Restricted to 17 patients;
    **** Small N adjustment of Morel, Bokossa, and Neerchal (2003);
    proc glimmix method=RSPL empirical=mbn data=small;
      class id;
      model outcome (event="1") = treatment month treatment*month / s dist=binary ddfm=kenwardroger;
      random _residual_ / subject=id type=cs;
    run;

Solutions for Fixed Effects

    Effect             Estimate   Std Error     DF   t Value   Pr > |t|
    Intercept           -0.3605      0.7369     15     -0.49     0.6317
    treatment           0.05762      1.1209     15      0.05     0.9597
    month               -0.1530      0.1197     94     -1.28     0.2043
    treatment*month     0.02560      0.1984     94      0.13     0.8976

Page 44

THAT’S ALL