joint modeling li l

Liang Li

Department of Quantitative Health Sciences, Cleveland Clinic

Joint Modeling of Longitudinal and Survival Data

Presented at ASA North Illinois Chapter Spring Meeting, March 5 2009

Outline of the Talk

What is joint modeling of longitudinal & survival data?

The shared parameter model

The measurement error perspective

Our proposal

Why it works (theoretical properties)

How it works (empirical performance)

Extension and on-going work


Longitudinal Data

Each subject is followed over a period of time; a series of measurements made.

e.g., after lung transplant, FEV1 is measured every week for a month, and every month afterwards until the end of the study

[Figure: FEV1 versus months (0 to 18)]

Survival Data

Time to event such as death, machine failure, disease relapse, PFS, etc.

Could be censored (partially observed)

[Figure: Kaplan-Meier curve, survival (%) versus years (0 to 6); event-time plot over time (months, 2 to 10) with each subject marked as death or censored]
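As a small illustration of how censoring enters a survival estimate, the following sketch computes a Kaplan-Meier curve from observed times and event indicators. The function name `kaplan_meier` and the toy data are assumptions made for this example; they do not come from the talk.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : observed follow-up times (event or censoring, whichever is first)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Censored subjects leave the risk set without changing S(t).
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: the subject observed at time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

The censored subject contributes to the risk set up to time 2 and is then dropped, which is why the step at time 3 is larger than it would be without censoring.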

Joint Modeling of Longitudinal and Survival Data

Question: how does the change in the (earlier) longitudinal profile of a subject relate to the risk of the (later) survival event?

Example 1: Rate of change of glomerular filtration rate (GFR) & time to end stage renal disease (ESRD) or death

Example 2: FEV1 & survival among cystic fibrosis patients

Widespread use and an active research field, e.g., surrogate endpoints


Longitudinal Profile

[Figure: FEV1 versus months (0 to 18)]

longitudinal profile = signal + noise

Linear profile: subject-specific (random) intercept & slope

Relates intercept & slope to survival

Can we use raw data profile and avoid joint modeling?

Nonlinear profile: time-dependent covariate curve

Data Structure

[Diagram with two levels, labeled Stage 1 and Stage 2: the subject-specific intercept & slope link the longitudinal data (e.g., Linear Mixed Model) and the survival data (e.g., Cox Model)]

Two-stage hierarchical model

Longitudinal part and survival part are conditionally independent given the subject-specific intercept and slope

Shared Parameter Model

Two-stage hierarchical structure suggests the shared parameter model

Review by Tsiatis & Davidian (2004), and Tseng, Hsieh, Wang (2005), Liu & Ying (2007), among others

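Almost all based on the following Fisher likelihood:

$$\sum_{i=1}^{n} \log\left\{ \int f(\text{long} \mid \text{int}, \text{slope})\, f(\text{surv} \mid \text{int}, \text{slope})\, f(\text{int}, \text{slope})\, d[\text{int}, \text{slope}] \right\}$$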

Pros: maximum likelihood estimator

Cons: computationally intensive, distributional assumptions needed

A New Perspective


Can we use a two-step approach for the two-stage problem?

step 1: estimate the intercept and slope for each subject

step 2: relate them to survival

[Figure: true line and fitted lines plotted against time (0 to 10)]
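Step 1 of the two-step approach can be sketched as an ordinary least-squares line fitted separately to each subject. This is a minimal illustration, assuming a linear profile and no other covariates; `fit_line` and the toy measurements are hypothetical names and values, not taken from the talk.

```python
def fit_line(times, values):
    """Least-squares intercept and slope for one subject's measurements.

    Also returns the residual variance estimate, which needs more than
    two measurements to be defined (None otherwise).
    """
    n = len(times)
    t_bar = sum(times) / n
    w_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (w - w_bar) for t, w in zip(times, values))
    slope = sxy / sxx
    intercept = w_bar - slope * t_bar
    rss = sum((w - intercept - slope * t) ** 2 for t, w in zip(times, values))
    sigma2 = rss / (n - 2) if n > 2 else None
    return intercept, slope, sigma2


# Hypothetical subject measured at four visits, lying exactly on a line.
intercept, slope, sigma2 = fit_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope, sigma2)
```

With noisy data the fitted intercept and slope differ from the true ones, which is the source of the measurement error discussed next.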

The Measurement Error Perspective


Do a regression of survival using true subject-specific intercepts and slopes

true intercept & slope unknown

estimated intercept & slope act as surrogates

measurement error may cause bias in regression

[Figure: scatterplot of Y against Z (red) or X (blue)]

Y = b0 + b1 Z + e

X = Z + U

Measurement error causes attenuation in the regression Y ~ X
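The attenuation effect is easy to reproduce in a small simulation. The sketch below assumes standard normal Z, measurement error U with variance 1, and b0 = 0.5, b1 = 1; these values and the helper `ols_slope` are choices made for illustration only. Under these assumptions the slope of Y on X should be close to b1 × var(Z) / (var(Z) + var(U)) = 0.5 rather than 1.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sxx


random.seed(0)
n = 20000
b0, b1 = 0.5, 1.0                                    # assumed true coefficients
z = [random.gauss(0.0, 1.0) for _ in range(n)]        # true covariate Z
u = [random.gauss(0.0, 1.0) for _ in range(n)]        # measurement error U
y = [b0 + b1 * zi + random.gauss(0.0, 0.5) for zi in z]
x = [zi + ui for zi, ui in zip(z, u)]                 # observed surrogate X = Z + U

slope_true = ols_slope(z, y)    # regression on the true covariate
slope_naive = ols_slope(x, y)   # regression on the error-prone surrogate
print(round(slope_true, 3), round(slope_naive, 3))
```

The naive slope is pulled toward zero, which mirrors what happens when estimated intercepts and slopes are plugged into a survival regression without correction.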

The Example


HEMO study: a clinical trial coordinated at Cleveland Clinic

2 by 2 design: standard or high dose of dialysis, low or high flux dialyzer

Neither treatment was found to significantly affect time to all-cause mortality (Rocco et al 2004)

We want to study a secondary question: whether the decline of albumin levels is a strong predictor of mortality

Challenge: albumin measurements need to be calibrated to remove artificial differences due to variations in total body water.

Monday-Wednesday-Friday

Tuesday-Thursday-Saturday

Model & Notation - Longitudinal Part

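Longitudinal sub-model: linear mixed model

$$W_{ij} = V_{ij}^T \gamma + D_{ij}^T \beta_i + \varepsilon_{ij}$$

$$\beta_i = X_i \alpha + b_i$$

In the context of the application:

$$\mathrm{ALB}_{ij} = \text{Mon/Tues}_{ij}\,\gamma + \beta_{i0} + \mathrm{Time}_{ij}\,\beta_{i1} + \mathrm{Noise}_{ij}$$

$$\beta_{i0} = \mathrm{Intercept}_0 + \mathrm{Dose}_i A_0 + \mathrm{Flux}_i B_0 + b_{i0}$$

$$\beta_{i1} = \mathrm{Intercept}_1 + \mathrm{Dose}_i A_1 + \mathrm{Flux}_i B_1 + b_{i1}$$

We start with $b_i \sim N(0, \Sigma)$ and show later that the conclusion holds even when this assumption is dropped.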

Stage 1

Stage 2

Model & Notation - Survival Part

Survival sub-model: Cox proportional hazard model

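T: time to event; C: time to censoring

$Y = \min(T, C)$, $\delta = 1\{T < C\}$

Log hazard function, with $b_i = (b_{i0}, b_{i1})^T$ the random intercept and slope:

$$h(t; Z_i, b_i) = h_0(t) + Z_i^T a_1 + b_i^T a_2$$

In the context of the example:

$$h(t; Z_i, b_i) = h_0(t) + \mathrm{Dose}_i\, a_{11} + \mathrm{Flux}_i\, a_{12} + b_{i0}\, a_{20} + b_{i1}\, a_{21}$$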

The proposed model includes as special cases the models considered by Wang (2006), Ratcliffe et al (2004), Hsieh, Tseng, Wang (2006), Tsiatis & Davidian (2004), among others

Stage 2

Poissonization of Cox Model

Step 1: use B-spline to approximate log baseline hazard

$$h_0(t) \approx \sum_{k=1}^{K} a_{0k} B_k(t)$$

Step 2: use full likelihood of Cox model, not partial likelihood

$$\delta_i\, U_i(Y_i)^T \theta - \int_0^{Y_i} \exp\{U_i(t)^T \theta\}\, dt$$

Step 3: use Trapezoidal rule for numerical integration

Finally: we can fit a Cox model using Poisson regression
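The connection between the three steps and Poisson regression can be sketched numerically. The code below is an illustration under stated assumptions, not the talk's implementation: it uses an equally spaced grid on [0, Y], trapezoidal weights c_g, and a pseudo-response that equals the event indicator at the last grid point and 0 elsewhere. The names `poissonized_loglik`, `n_grid` and the constant-hazard example are hypothetical.

```python
import math


def poissonized_loglik(y, delta, log_hazard, n_grid=50):
    """Trapezoidal-rule version of the full log likelihood for one subject.

    Each grid point g contributes y_g * eta_g - exp(eta_g), the kernel of a
    Poisson log likelihood with log mean eta_g = log_hazard(t_g) + log(c_g),
    where c_g is the trapezoidal weight acting as an offset.
    """
    step = y / n_grid
    grid = [g * step for g in range(n_grid + 1)]
    weights = [step] * (n_grid + 1)
    weights[0] = weights[-1] = step / 2.0       # trapezoidal end-point weights
    pseudo = [0] * n_grid + [delta]             # event indicator at the last point
    loglik = 0.0
    for t, c, y_g in zip(grid, weights, pseudo):
        eta = log_hazard(t) + math.log(c)
        loglik += y_g * eta - math.exp(eta)
    return loglik, weights


# Check against a constant hazard, where the integral is known exactly.
rate = 0.3
loglik, weights = poissonized_loglik(2.0, 1, lambda t: math.log(rate))
exact = math.log(rate) - rate * 2.0             # delta * log hazard - cumulative hazard
print(loglik - math.log(weights[-1]), exact)
```

The Poisson form differs from the full log likelihood only by the term delta × log(c) at the last grid point, which does not involve the parameters, so maximizing one maximizes the other.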


The joint log likelihood (for one subject)

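Longitudinal:

$$L_{Li} = -\frac{n_i}{2}\log(2\pi\sigma_\varepsilon^2) - \frac{\lVert W_i - V_i\gamma - D_i\beta_i \rVert^2}{2\sigma_\varepsilon^2}$$

Survival/Poisson:

$$L_{Si} = \sum_{g=0}^{M_i}\Big[ Y_{ig}\{U_{1ig}^T\theta_1 + \beta_i^T\theta_2 - \theta_2^T X_i\alpha + \log(c_{ig})\} - \exp\{U_{1ig}^T\theta_1 + \beta_i^T\theta_2 - \theta_2^T X_i\alpha + \log(c_{ig})\}\Big]$$

Stage 1:

$$L_{Mi} = -\frac{q}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(\beta_i - X_i\alpha)^T \Sigma^{-1} (\beta_i - X_i\alpha)$$

Key observation: $\beta_i$ appears in linear, quadratic or exponential terms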

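Corrected Likelihood

From linear model theory, $\hat\beta_i$ is a measurement of $\beta_i$:

$$\hat\beta_i = (D_i^T D_i)^{-1} D_i^T (W_i - V_i\gamma), \qquad \hat\beta_i \mid \beta_i \sim N\{\beta_i,\ \sigma_\varepsilon^2 (D_i^T D_i)^{-1}\}$$

If $W \sim N(X, \sigma_u^2)$, then each term of the true likelihood has a corrected version:

Linear: $\sum_i X_i \;\rightarrow\; \sum_i W_i$

Quadratic: $\sum_i X_i^2 \;\rightarrow\; \sum_i (W_i^2 - \sigma_u^2)$

Exponential: $\sum_i \exp(X_i) \;\rightarrow\; \sum_i \exp(W_i - \tfrac{1}{2}\sigma_u^2)$

Joint log likelihood:

$$\sum_{i=1}^{n} L_{Li} + \sum_{i=1}^{n} L_{Si} + \sum_{i=1}^{n} L_{Mi}$$

Do correction to the joint log likelihood (formula omitted)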
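The three corrections for a normally distributed measurement W of an unknown X can be checked by simulation: averaged over the measurement error, each corrected term should recover the corresponding function of X. The sketch below assumes a scalar X = 0.7 and error standard deviation 0.5, both chosen only for illustration; `corrected_terms` is a hypothetical helper name.

```python
import math
import random


def corrected_terms(w, sigma_u):
    """Corrected versions of X, X^2 and exp(X) based on W ~ N(X, sigma_u^2)."""
    linear = w
    quadratic = w * w - sigma_u ** 2
    exponential = math.exp(w - 0.5 * sigma_u ** 2)
    return linear, quadratic, exponential


random.seed(1)
x_true, sigma_u, n = 0.7, 0.5, 200000   # assumed values for the check
totals = [0.0, 0.0, 0.0]
for _ in range(n):
    w = random.gauss(x_true, sigma_u)   # noisy measurement of x_true
    for k, value in enumerate(corrected_terms(w, sigma_u)):
        totals[k] += value

means = [total / n for total in totals]
targets = [x_true, x_true ** 2, math.exp(x_true)]
print([round(m, 3) for m in means])
print([round(t, 3) for t in targets])
```

The uncorrected versions W² and exp(W) would overshoot their targets by σ_u² and by the factor exp(σ_u²/2) respectively, which is what the subtraction inside each corrected term removes.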

A Few Remarks

The proposed estimators are maximizers of the corrected joint log likelihood function

Variance components estimated separately in a side step.

Mis-specification allowed, like GEE

Result not sensitive to the B-spline approximation

Statistical inference based on sandwich variance estimator

Summary on Proposed Method

Key idea: find a corrected joint log likelihood that looks like the true joint log likelihood with the unknowns eliminated

This is possible because the unknowns reside in linear, quadratic or exponential terms (Li and Greene, Biometrics 2008)

Combine three pieces of log likelihood together, similar in spirit to the h-likelihood (1996), but different from the classical Fisher likelihood (1922)

Compared with Wang (2006, Stat Sinica), our method offers:

a more general model (unknown parameters in both sub-models), including most published models as special cases

exact correction with full likelihood instead of approximate correction with partial likelihood

a concave likelihood (next page)


Theoretical Properties

The estimators of the unknown parameters are maximizers of the corrected joint log likelihood

As sample size becomes large:

the estimator is consistent

the estimator is asymptotically normal

the corrected joint log likelihood is concave

These properties remain valid even when the random effects do not have normal distribution or their variance matrix is misspecified (robust)


Simulation Results

We conducted extensive computer simulations to investigate the empirical performance of the proposed method

Bias, variance, coverage of confidence interval: Good

Result not sensitive to number of knots of B-spline

The computation is much faster than competing methods based on maximum likelihood

The algorithm is stable and always converges (concavity)

Estimator expected to be less efficient than maximum likelihood based methods, a trade-off for robustness

Simulation results (n = 250)

Parameter (true value)    Bias: uncorrected (two-step)    Bias: proposed    CI coverage of proposed (%)
L 1 = 1                   0.00197                         0.00299           94.5
L 2 = 2                   -0.00370                        -0.00571          94.0
L 3 = 1                   0.00591                         0.00659           94.0
L 4 = 0.5                 -0.0104                         -0.0118           97.0
intercept = 0.5           -0.347                          0.0196            96.0
slope = 1                 -0.471                          0.0552            95.5

Application to HEMO Study Data

1628 patients with between 3 and 15 repeated measurements

Parameter              Estimator    p-value

Longitudinal sub-model
intercept              3.7          < 0.001
high dose              0.0012       0.94
high flux              -0.007       0.67
time (years)           -0.058       < 0.001
high dose by time      -0.014       0.311
high flux by time      -0.01        0.468
Monday / Tuesday       -0.026       0.017

Survival sub-model
high dose              -0.061       0.5
high flux              -0.069       0.44
random intercept       -1.5         < 0.001
random slope           -3.7         < 0.001

[Figure: two linear profiles over time (0 to 10), smaller slope (-0.4) versus larger slope (-0.2)]

Estimated baseline survival function and its 95% point-wise confidence interval

[Figure: survival (%) versus years (0 to 6), showing the smooth curve and the step function from partial likelihood]

Summary

A new method for joint modeling

A general model that includes most published models as special cases

Theoretically appealing properties and reliable and easy computation

Robust against certain model mis-specification

Numerical integration methods other than the trapezoidal rule may be used (Poissonization is not essential)

Limitation:

Need at least three repeated measurements per subject

Trade efficiency for robustness, best for large sample size


Nonlinear Longitudinal Data

In a lung transplant study at Cleveland Clinic, investigators want to use FEV1 profile after lung transplant to predict mortality

The profile is clearly nonlinear

[Figure: mean FEV1 trajectory (subject-clustering ignored), FEV1 versus months after transplant (0 to 100)]

[Figure: Subject-Specific Fitted Curves, fitted curves versus months after transplant (0 to 100), with subjects labeled 1 to 311]

Replace subject-specific intercept or slope with time-dependent covariate

Want error correction?

[Figure: true curve and estimated curve plotted against time (0 to 6)]

Proposed Model & Method

Cox model with time-dependent covariate and time-dependent hazard ratios (varying coefficients)

Why varying coefficients: constant hazard ratios unlikely for surgical data

Varying coefficients:

$$h_i(t; X(t)) = \exp\{\theta_0(t) + \theta_1(t)^T X_i(t)\}$$

Constant coefficients, with the covariate measured with error:

$$W_i(t) = X_i(t) + \varepsilon_i(t), \qquad h_i(t; X(t)) = \exp\{\theta_0(t) + \theta_1^T X_i(t)\}$$

Use whatever method to fit each subject's longitudinal profile separately

Get estimated curve & its variation; do the measurement error correction

Deal with varying coefficients

Proposed Method

Local linear method: estimate the curves piece by piece in local neighborhoods.

[Figure: two panels of Y versus time (0 to 10) illustrating the local fits]
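The local linear idea can be sketched on an ordinary regression curve, which is simpler than the Cox likelihood used in the talk. The code below assumes an Epanechnikov kernel and a fixed bandwidth h; the kernel choice and the names `epanechnikov` and `local_linear` are illustrative assumptions. At each target point t0 it fits a kernel-weighted straight line and returns the local intercept (the curve value at t0) and the local slope.

```python
def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(t0, times, values, h):
    """Kernel-weighted least-squares line around t0 with bandwidth h.

    Returns (intercept, slope) of the line a + b * (t - t0), so the
    intercept is the estimated curve value at t0.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t, y in zip(times, values):
        d = t - t0
        w = epanechnikov(d / h) / h      # K_h(t - t0)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * y
        r1 += w * d * y
    det = s0 * s2 - s1 * s1
    intercept = (s2 * r0 - s1 * r1) / det
    slope = (s0 * r1 - s1 * r0) / det
    return intercept, slope


# Hypothetical data on a straight line: the local fit should reproduce it.
times = [0.5 * k for k in range(21)]          # 0, 0.5, ..., 10
values = [2.0 + 3.0 * t for t in times]
value_at_5, slope_at_5 = local_linear(5.0, times, values, h=2.0)
print(value_at_5, slope_at_5)
```

Repeating the fit over a grid of t0 values traces out the whole curve piece by piece; only observations within h of t0 receive positive weight.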

Local linear method for the full likelihood of Cox model

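Our proposal differs from all previous methods in that we do not use the partial likelihood (for exact correction)

[Figure: two panels over time (2 to 10) with observations marked "artificially censored" and "removed"]

The Evolution of Likelihoods

Cox log likelihood:

$$\sum_{i=1}^{n}\left[\delta_i\{X_i(Y_i)^T\theta(Y_i)\} - \int_0^{Y_i}\exp\{X_i(t)^T\theta(t)\}\,dt\right]$$

Cox local likelihood:

$$\sum_{i=1}^{n}\left[K_h(Y_i - t_0)\,\delta_i\{X_i(Y_i)^T\theta(Y_i)\} - \int_0^{Y_i}K_h(t - t_0)\exp\{X_i(t)^T\theta(t)\}\,dt\right]$$

With correction:

$$\sum_{i=1}^{n}\left[K_h(Y_i - t_0)\,\delta_i\{W_i(Y_i)^T\theta(Y_i)\} - \int_0^{Y_i}K_h(t - t_0)\exp\{W_i(t)^T\theta(t) - \tfrac{1}{2}\theta(t)^T\Sigma(t)\theta(t)\}\,dt\right]$$

With local linear approx.: replace $\theta(t)$ by intercept + slope × t

under construction ... ...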

References

Liang Li, Bo Hu, Tom Greene (2009) A semiparametric joint model for longitudinal and survival data with application to hemodialysis study. Biometrics, in press.

Liang Li. Semiparametric joint modeling of nonlinear time-dependent covariate process and time to event outcome with varying coefficients. Working paper.
