EPI 5344: Survival Analysis in Epidemiology. Cox regression: Introduction. March 17, 2015. Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa





Objectives

• Review proportional hazards
• Introduce Cox model and methods of estimation
• Tied data


Exponential model (1R)

• Exponential model
– Most common parametric model in epidemiology
– Assumes a constant h(t) = λ
– How did we create the likelihood function?
• Subjects can have two types of ‘ends’
– Death
– Censored
• Each contributes to the likelihood function, but in different ways


Exponential Model (2R)

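• Likelihood contribution of a death at time $t_i$ (the density at $t_i$):

$$L_i = f(t_i) = h(t_i)\,S(t_i) = \lambda e^{-\lambda t_i}$$

• Likelihood contribution if censored at time $t_i$:
– Actual time of ‘failure’ is unknown.
– Must survive until at least time $t_i$:

$$L_i = S(t_i) = e^{-\lambda t_i}$$

• Multiply these across all deaths and all censored observations to get the full likelihood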


Exponential Model (3R)

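$$L(\lambda) = \prod_{\text{deaths}} \lambda e^{-\lambda t_i} \times \prod_{\text{censored}} e^{-\lambda t_i} = \lambda^{N} e^{-\lambda \cdot PT}$$

Where:
N = number of events
PT = person-time of follow-up (sum of all follow-up times $t_i$)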


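Exponential Model (4R)

• How do we find the MLE for λ?
– Take the log of the likelihood: $\ln L = N \ln\lambda - \lambda \cdot PT$
– Set the derivative to zero: $\frac{d \ln L}{d\lambda} = \frac{N}{\lambda} - PT = 0$
– Solve: $\hat{\lambda} = N / PT$ (events per unit of person-time)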
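The constant-hazard result, that the MLE is the number of events divided by the person-time, can be checked numerically. The sketch below uses a small made-up data set (the follow-up times and event indicators are assumptions for illustration only) and compares the closed-form estimate with the log-likelihood at nearby values.

```python
import math

# Hypothetical follow-up data: (time in years, 1 = death, 0 = censored).
subjects = [(2.0, 1), (3.5, 0), (1.2, 1), (4.0, 0), (0.8, 1)]

n_events = sum(event for _, event in subjects)    # N
person_time = sum(time for time, _ in subjects)   # PT


def log_likelihood(lam):
    """ln L = N ln(lambda) - lambda * PT for the constant-hazard model."""
    return n_events * math.log(lam) - lam * person_time


# Closed-form maximum likelihood estimate: events per unit of person-time.
lambda_hat = n_events / person_time

print(f"N = {n_events}, PT = {person_time:.1f}, lambda_hat = {lambda_hat:.4f}")
for lam in (0.9 * lambda_hat, lambda_hat, 1.1 * lambda_hat):
    print(f"lambda = {lam:.4f}  ln L = {log_likelihood(lam):.4f}")
```

The log-likelihood printed for `lambda_hat` is larger than for the two neighbouring values, as expected at a maximum.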


Exponential Model (5R)

• What if we want to examine predictors of the outcome?
– λ is allowed to vary by sex, age, cholesterol, etc.
• Use the same approach but now, instead of ‘λ’, we have the following in the likelihood function:

$$\lambda_i = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})$$
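As a rough sketch of how predictors enter the exponential likelihood, the code below assumes the usual log-linear form, with each subject's hazard equal to exp(b0 + b1·x) for a single binary predictor x. That form, the data, and all names are assumptions for illustration. With one binary predictor the MLE reduces to the two group-specific rates, which gives a simple check.

```python
import math

# Hypothetical data: (time, 1 = death / 0 = censored, binary predictor x).
data = [(2.0, 1, 0), (3.5, 0, 0), (1.2, 1, 1), (4.0, 0, 1), (0.8, 1, 1)]


def log_likelihood(b0, b1):
    """Exponential-model log-likelihood with lambda_i = exp(b0 + b1 * x_i)."""
    total = 0.0
    for time, event, x in data:
        lam = math.exp(b0 + b1 * x)   # subject-specific constant hazard
        total += event * math.log(lam) - lam * time
    return total


def group_rate(x_value):
    """Events per unit of person-time within one level of the predictor."""
    events = sum(e for _, e, x in data if x == x_value)
    pt = sum(t for t, _, x in data if x == x_value)
    return events / pt


# With a single binary predictor the MLE matches the group-specific rates.
b0_hat = math.log(group_rate(0))
b1_hat = math.log(group_rate(1) / group_rate(0))

print(f"rate ratio = exp(b1_hat) = {math.exp(b1_hat):.4f}")
print(f"ln L at the MLE = {log_likelihood(b0_hat, b1_hat):.4f}")
```

Here exp(b1) is the rate ratio comparing x = 1 with x = 0, and exp(b0) is the rate in the reference group.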


End of review


Proportional hazard models (1)

• Now, use this approach BUT do not pre-specify the form of h(t)
• We start with proportional hazards
• Hazard h(t) = rate of change in survival, conditional on having survived to that point in time: $h(t) = -S'(t) / S(t)$


Hazard models (2)

• Suppose we want to compare two treatment groups
– Different survival is expected ⇒ they have different hazards
– How can we summarize this? Use the hazard ratio, taking group 1 as the reference:

$$HR(t) = \frac{h_2(t)}{h_1(t)}$$

• In general, HR(t) will be different at different follow-up times


[Figure: hazard curves h1(t) and h2(t)]

• This can be hard to describe and interpret
• Effect of the treatment varies with length of follow-up


[Figure: hazard curves h1(t) and h2(t)]

• HR could switch from below to above 1.0


Hazard models (3)

• SUPPOSE that HR(t) were constant at all follow-up times.
– Effect of the treatment is the same at all times
– This is the PROPORTIONAL HAZARDS model (PH)
• This does not require that h(t) be constant;
• It can vary in an unconstrained manner.
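A small numerical illustration of the point that proportional hazards does not require a constant hazard: in the sketch below the baseline hazard rises with time, yet the ratio of the two hazards is the same at every follow-up time. The functional form of the baseline hazard and the value of the hazard ratio are assumptions chosen only for illustration.

```python
HR = 2.0  # hypothetical hazard ratio, constant over follow-up


def h_reference(t):
    """Hypothetical reference-group hazard that increases with time."""
    return 0.1 * 1.5 * t ** 0.5


def h_treated(t):
    """Hazard in the comparison group under proportional hazards."""
    return h_reference(t) * HR


times = [0.5, 1.0, 2.0, 5.0, 10.0]
ratios = [h_treated(t) / h_reference(t) for t in times]

for t, ratio in zip(times, ratios):
    print(f"t = {t:5.1f}  h_ref = {h_reference(t):.4f}  "
          f"h_trt = {h_treated(t):.4f}  HR = {ratio:.2f}")
```

Both hazards change substantially across follow-up, but their ratio does not.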


[Figure: hazard curves h1(t) and h2(t)]


[Figure: hazard curves h1(t) and h2(t), with the hazard ratio HR marked]


Cox models (1)

• For most of the rest of this course, we will assume a proportional hazards model:

h1(t) = h0(t) * HR

• h0(t) is the ‘baseline’ or reference hazard.
– Contains all of the time variability of the hazard.
• HR is assumed to remain the same for all follow-up time (constant over follow-up time).


Cox models (2)

• HR can still be affected by predictor variables
– Race
– Exposure (low/mid/high)
– Sex
– Caloric intake
• For now, we will assume that these are
– measured at baseline (time ‘0’)
– remain fixed during follow-up


Cox models (3)

• In general, we have HR as some function of the predictors $x_1, \dots, x_p$.
• Most common model assumes that ln(HR) is a linear function of the predictors. This is similar to the model for logistic regression and linear regression:

$$\ln(HR) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

• NOTE: there is no intercept!
– This is ‘subsumed’ into the baseline hazard term h0(t)


Cox models (4)

• HR model can be written:

$$HR = e^{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}$$

• How does this fit into our ‘hazard’ model? Our base model is:

$$h(t) = h_0(t) \cdot HR$$


Cox models (5)

• This implies:

$$h(t \mid x) = h_0(t)\, e^{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}$$

• But, so what? How do we estimate the Betas?
– As with the exponential model, it appears we need to know the shape of h0(t)
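To make the log-linear hazard ratio concrete, the sketch below computes the hazard for two hypothetical subjects under two very different baseline hazards. The coefficient values, predictor names, and baseline functions are all assumptions for illustration; the point is only that the ratio of the two subjects' hazards equals exp(sum of beta times the difference in x) whatever h0(t) looks like.

```python
import math

# Hypothetical coefficients: ln(HR) per one-unit change in each predictor.
betas = {"sex": 0.40, "age": 0.03}


def hazard_ratio(x):
    """HR relative to a subject whose predictors are all zero."""
    return math.exp(sum(betas[name] * value for name, value in x.items()))


def hazard(t, x, h0):
    """h(t | x) = h0(t) * exp(beta_1 * x_1 + ... + beta_p * x_p)."""
    return h0(t) * hazard_ratio(x)


person_a = {"sex": 1, "age": 60}
person_b = {"sex": 0, "age": 50}

# Two arbitrary, hypothetical baseline hazards.
baselines = {
    "flat": lambda t: 0.05,
    "rising": lambda t: 0.01 * t ** 2,
}

for name, h0 in baselines.items():
    ratio = hazard(3.0, person_a, h0) / hazard(3.0, person_b, h0)
    print(f"{name:7s} baseline: hazard ratio a vs b = {ratio:.4f}")

print(f"exp(0.40 * 1 + 0.03 * 10) = {math.exp(0.40 + 0.03 * 10):.4f}")
```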


Cox models (6)

• COX (1972) SHOWED THAT THIS IS WRONG!
– Can estimate the Betas without needing to model h0(t)
– Semi-parametric model
– Based on:
  • Risk sets
  • Partial likelihoods
• We will skip a lot of math
– Use an intuitive approach
– Method relates to approach used with exponential model


Cox models (7)

• Start off trying to build a likelihood for the data based on the whole model (with baseline hazard included)
• Concentrate on the times when events happened
– Similar to the Kaplan-Meier method
  • S(t) only changes when an event happens
  • Can ignore losses between events
• Action happens within the Risk Set at the event times.


Cox models (8)

• Action happens within the Risk Set at the event times.
• The theory assumes that only one event happens at any point in time
– This is not the ‘real world’
– In theory, time is continuous.
  • So no two events happen at the same time
– We’ll deal with ‘ties’ later on


Cox models (9)

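• Consider the risk set at time $t_i$ when an event happens
– Each subject in the risk set has a probability of being the one having the event
  • Higher hazard ⇒ higher probability
• ‘Likelihood’ contribution from person ‘j’ in the risk set is:

$$L_j = P(\text{subject } j \text{ has the event at } t_i \mid \text{one event occurs in the risk set at } t_i)$$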


Cox models (10)

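• Using the definition of conditional probability, this is:

$$L_j = \frac{P(\text{subject } j \text{ has the event at } t_i)}{P(\text{some subject in the risk set has the event at } t_i)}$$

• How do we get the numerator and denominator?
• The hazard is a measure of how likely an event is to occur for a person
– Higher hazard ⇒ an event is more likely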


Cox models (11)

• We can get (with $R(t_i)$ the risk set at time $t_i$):

$$L_j = \frac{h_j(t_i)}{\sum_{k \in R(t_i)} h_k(t_i)}$$


Cox models (12)

Now, because the hazards are proportional, we have, for each subject k in the risk set:

$$h_k(t_i) = h_0(t_i) \cdot HR_k$$


Cox models (13)

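• The likelihood contribution from this event (risk set) can be written:

$$L_j = \frac{h_0(t_i)\, HR_j}{\sum_{k \in R(t_i)} h_0(t_i)\, HR_k}$$

• Cancel out the h0(t), which appears in the numerator and in every term of the denominator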


Cox models (14)

• The final likelihood contribution from this risk set is:

$$L_j = \frac{HR_j}{\sum_{k \in R(t_i)} HR_k}$$

• Which does not depend on h0(t)
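The cancellation of the baseline hazard can be verified with a few lines of arithmetic. The sketch below computes one risk set's contribution twice, once with an arbitrary baseline hazard value left in and once using hazard ratios only. The subject labels and HR values are hypothetical.

```python
def contribution_with_baseline(hr, event_subject, risk_set, h0_value):
    """Contribution written with full hazards h0(t_i) * HR_k."""
    numerator = h0_value * hr[event_subject]
    denominator = sum(h0_value * hr[k] for k in risk_set)
    return numerator / denominator


def contribution(hr, event_subject, risk_set):
    """Same contribution after cancelling h0(t_i): HR_j / sum of HR_k."""
    return hr[event_subject] / sum(hr[k] for k in risk_set)


# Hypothetical hazard ratios for four subjects in the risk set.
hr = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.5}
risk_set = [1, 2, 3, 4]

for h0_value in (0.001, 0.05, 3.0):
    value = contribution_with_baseline(hr, 2, risk_set, h0_value)
    print(f"h0 = {h0_value:5.3f}  contribution = {value:.4f}")

print(f"hazard ratios only: {contribution(hr, 2, risk_set):.4f}")
```

Whatever value is used for the baseline hazard, the contribution for subject 2 is 2.0 / 5.0 = 0.4, and the contributions across all subjects in the risk set sum to 1.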


Cox models (15)

• Now, multiply all of the contributions from each risk set (defined when an event occurs)
• Produces a Partial Likelihood
• Estimate the Betas using MLE.


Cox models (16)

• We can ignore the actual censoring times (censored subjects count only while they are still in a risk set) since we are not estimating the actual hazard
• Betas depend only on the ranking of events, not on the actual event times
– Implies that Cox does not give the same estimates as person-time epidemiology analyses
– Standard Cox models do not estimate survival, just relative survival (hazard ratios)
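The claim that only the ranking of event times matters can be checked directly. The sketch below evaluates a one-predictor log partial likelihood on a small hypothetical data set and again after cubing every follow-up time, which changes all the times but not their order. The data, the coefficient value, and the function name are assumptions for illustration, and the data are chosen to have no tied times.

```python
import math


def log_partial_likelihood(beta, data):
    """Log partial likelihood for one predictor; assumes no tied event times.

    data is a list of (time, event, x) with event = 1 for an event
    and 0 for a censored observation.
    """
    total = 0.0
    for time_i, event_i, x_i in data:
        if event_i == 1:
            # Risk set: everyone still under follow-up at this event time.
            risk_x = [x for time_k, _, x in data if time_k >= time_i]
            total += beta * x_i - math.log(sum(math.exp(beta * x) for x in risk_x))
    return total


# Hypothetical data: (time, event indicator, predictor value).
data = [(2.0, 1, 1), (3.0, 0, 0), (5.0, 1, 0), (7.0, 1, 1), (8.0, 0, 1)]

# A monotone transformation changes every time but keeps the ordering.
stretched = [(t ** 3, e, x) for t, e, x in data]

original_value = log_partial_likelihood(0.7, data)
stretched_value = log_partial_likelihood(0.7, stretched)
print(original_value, stretched_value)
```

The two printed values agree because the risk sets, and therefore every term of the partial likelihood, are unchanged.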


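[Figure: follow-up lines for five subjects; three end in an event (D) at times t1, t2 and t3, and two end in censoring (C)]

Let’s consider a simple example.
• Three events ⇒ three risk sets to consider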


For subject ‘m’, the hazard function is: $h_m(t)$

1st event.
risk set: 1/2/3/4/5
Subject with event: 3
Likelihood contribution:

$$\frac{h_3(t_1)}{h_1(t_1) + h_2(t_1) + h_3(t_1) + h_4(t_1) + h_5(t_1)}$$


But, we have:

$$h_m(t) = h_0(t) \cdot HR_m$$

So, likelihood contribution from risk set #1 is:

$$\frac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5}$$


Extending this to the other risk sets:

2nd event.
risk set: 1/2/4
Subject with event: 1
Likelihood contribution:

$$\frac{HR_1}{HR_1 + HR_2 + HR_4}$$

3rd event.
risk set: 4
Subject with event: 4
Likelihood contribution:

$$\frac{HR_4}{HR_4} = 1$$


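Overall Partial Likelihood is:

$$PL = \frac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5} \times \frac{HR_1}{HR_1 + HR_2 + HR_4} \times \frac{HR_4}{HR_4}$$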

This can easily be extended to very large data sets.

• Writing out the entire partial likelihood function would be ‘crazy’

• But, this is what our computer has to do
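The bookkeeping the computer does for the five-subject example can be sketched as a loop over risk sets. The risk sets and event subjects below are the ones listed in the example above (1/2/3/4/5 with subject 3, 1/2/4 with subject 1, and 4 with subject 4); the hazard ratio values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# (subject with the event, subjects in the risk set) for each event time.
events = [
    (3, [1, 2, 3, 4, 5]),
    (1, [1, 2, 4]),
    (4, [4]),
]

# Hypothetical hazard ratios for the five subjects.
hr = {1: 1.0, 2: 2.0, 3: 4.0, 4: 1.0, 5: 2.0}


def partial_likelihood(hr, events):
    """Multiply HR_j / (sum of HR_k over the risk set) across all events."""
    value = 1.0
    for subject, risk_set in events:
        value *= hr[subject] / sum(hr[k] for k in risk_set)
    return value


pl = partial_likelihood(hr, events)
print(f"partial likelihood = {pl:.4f}")
```

With these values the three contributions are 4/10, 1/4 and 1/1, so the partial likelihood is 0.1. The last risk set always contributes 1 when it contains only the subject who has the event.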


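Suppose that we are using the Cox model. Let’s also limit to one predictor. Then, we have:

$$HR_m = e^{\beta x_m}$$

• Partial Likelihood form is now:

$$PL = \frac{e^{\beta x_3}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_3} + e^{\beta x_4} + e^{\beta x_5}} \times \frac{e^{\beta x_1}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_4}} \times \frac{e^{\beta x_4}}{e^{\beta x_4}}$$

• We will see this layout again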
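With one predictor, the partial likelihood is a function of a single Beta and can be maximized numerically. The sketch below applies Newton-Raphson to the five-subject example, using hypothetical 0/1 predictor values (an assumption; the source does not give them). This is a minimal illustration of the idea, not how any particular statistical package implements it.

```python
import math

# Hypothetical predictor values for subjects 1 to 5.
x = {1: 0, 2: 1, 3: 1, 4: 0, 5: 0}

# (subject with the event, subjects in the risk set) from the example.
events = [(3, [1, 2, 3, 4, 5]), (1, [1, 2, 4]), (4, [4])]


def score_and_information(beta):
    """First derivative and negative second derivative of ln(PL)."""
    score, info = 0.0, 0.0
    for subject, risk_set in events:
        weights = [math.exp(beta * x[k]) for k in risk_set]
        total = sum(weights)
        # Weighted mean and mean square of x within the risk set.
        mean_x = sum(w * x[k] for w, k in zip(weights, risk_set)) / total
        mean_x2 = sum(w * x[k] ** 2 for w, k in zip(weights, risk_set)) / total
        score += x[subject] - mean_x
        info += mean_x2 - mean_x ** 2
    return score, info


beta = 0.0
for _ in range(25):
    score, info = score_and_information(beta)
    beta += score / info   # Newton-Raphson update

print(f"beta_hat = {beta:.4f}, HR = exp(beta_hat) = {math.exp(beta):.4f}")
```

For these particular values the score equation can also be solved by hand, giving exp(beta) equal to the square root of 3, which the iteration reproduces.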


‘Ties’ (1)

• Above assumed that only one event happened at any given time
– True ‘in theory’ because time is a continuous variable.
– Not true in reality because time is measured ‘coarsely’.
• For example
– Only get measurement data every year
– Time of event measured to the day, not hour/min/second


‘Ties’ (2)

• More than one event at the same time is called a ‘tied’ event.

• How do we modify the method to handle tied event times?


‘Ties’ (3)

• Two main approaches to ‘ties’
– Discrete models
  • Change the basic theory underlying the model
  • Assumes that event times are discrete points
  • Relates to logistic regression
  • Useful when the event can only occur at fixed points, e.g. graduation from high school
– Exact method
  • Often implemented using an approximation.


‘Ties’ (4)

• Exact method
– Suppose we have two events (s1 & s2) which occur at the same time due to imprecise measurement of the event time.
– IF we had been able to measure the event time with enough precision, we would know if s1 occurred first or second
  • Birth of twins
– We don’t know, so we assume that the two possibilities are equally likely.


‘Ties’ (5)

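• Suppose s1 occurred before s2 (R is the risk set just before the tied time).
– Likelihood contribution would be:

$$L_{12} = \frac{HR_{s1}}{\sum_{k \in R} HR_k} \times \frac{HR_{s2}}{\sum_{k \in R} HR_k - HR_{s1}}$$

• Suppose s2 occurred before s1.
– Likelihood contribution would be:

$$L_{21} = \frac{HR_{s2}}{\sum_{k \in R} HR_k} \times \frac{HR_{s1}}{\sum_{k \in R} HR_k - HR_{s2}}$$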


‘Ties’ (6)

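• Don’t know order. Each is equally likely.
• Overall likelihood contribution is the sum of the contributions from the two possible orders:

$$L = \frac{HR_{s1}}{\sum_{k \in R} HR_k} \times \frac{HR_{s2}}{\sum_{k \in R} HR_k - HR_{s1}} + \frac{HR_{s2}}{\sum_{k \in R} HR_k} \times \frac{HR_{s1}}{\sum_{k \in R} HR_k - HR_{s2}}$$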
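For two tied events, the exact contribution can be computed by working through both possible orders and adding the results. The sketch below does this for a hypothetical risk set of four subjects; the labels and HR values are assumptions for illustration. After the first event in an order, that subject is removed from the risk set before the second term is computed.

```python
def ordered_contribution(hr, order, risk_set):
    """Contribution if the tied events happened in the given order."""
    remaining = list(risk_set)
    value = 1.0
    for subject in order:
        value *= hr[subject] / sum(hr[k] for k in remaining)
        remaining.remove(subject)   # this subject leaves the risk set
    return value


# Hypothetical hazard ratios; s1 and s2 have tied event times.
hr = {"s1": 2.0, "s2": 1.0, "s3": 1.0, "s4": 4.0}
risk_set = ["s1", "s2", "s3", "s4"]

s1_first = ordered_contribution(hr, ["s1", "s2"], risk_set)
s2_first = ordered_contribution(hr, ["s2", "s1"], risk_set)
exact = s1_first + s2_first

print(f"s1 before s2: {s1_first:.5f}")
print(f"s2 before s1: {s2_first:.5f}")
print(f"exact contribution: {exact:.5f}")
```

Here the two orders give 2/8 × 1/6 and 1/8 × 2/7, so the two terms differ even though the same two subjects are involved.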


‘Ties’ (7)

• A bit messy but not too bad.
• However, consider the recidivism data.
– 5 arrests occurred in week 8
– We don’t know which order they occurred in
– 120 potential orders (= 5!)
– Each order contributes a likelihood product with 5 terms
– Need to add up 120 of these products to give ONE contribution.
• Can rapidly get even worse!
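The growth in work for the exact method is easy to see by enumerating the orderings. The sketch below sums the contribution over every ordering of five tied events in a hypothetical risk set of 20 subjects (the risk-set size and HR values are assumptions, not the recidivism data), and counts how many orderings were needed.

```python
import itertools
import math


def exact_tied_contribution(hr, tied, risk_set):
    """Sum the ordered contributions over every ordering of the tied events."""
    total = 0.0
    n_orders = 0
    for order in itertools.permutations(tied):
        remaining_sum = sum(hr[k] for k in risk_set)
        product = 1.0
        for subject in order:
            product *= hr[subject] / remaining_sum
            remaining_sum -= hr[subject]   # subject leaves the risk set
        total += product
        n_orders += 1
    return total, n_orders


# Hypothetical risk set of 20 subjects; subjects 0 to 4 have tied events.
risk_set = list(range(20))
hr = {k: 1.0 + 0.1 * k for k in risk_set}
tied = [0, 1, 2, 3, 4]

total, n_orders = exact_tied_contribution(hr, tied, risk_set)
print(f"orderings evaluated: {n_orders} (5! = {math.factorial(5)})")
print(f"exact contribution: {total:.3e}")

# The count grows factorially with the number of tied events.
for d in (5, 8, 10):
    print(f"{d} tied events -> {math.factorial(d)} orderings")
```

Ten tied events would already require more than 3.6 million orderings for a single contribution, which is why approximations were developed.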


‘Ties’ (8)

• Computationally demanding
– Not that big a task for modern computers
• Two approximate methods have been developed
– Breslow
– Efron
• Both are ‘OK’ as long as the number of ties is not too big
– Efron is better.
• With modern computers, using the exact approach is likely fine.
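As a rough comparison of the two approximations, the sketch below uses their standard textbook forms. Breslow uses the full risk-set sum in the denominator for every tied event, while Efron subtracts a growing fraction of the tied subjects' total from it. Both are compared with the average over all orderings, that is, the exact sum divided by the number of orderings, which puts the three quantities on the same scale. The HR values and labels are hypothetical.

```python
import itertools
import math


def breslow(hr, tied, risk_set):
    """Breslow: same full risk-set denominator for each tied event."""
    denom = sum(hr[k] for k in risk_set)
    return math.prod(hr[j] for j in tied) / denom ** len(tied)


def efron(hr, tied, risk_set):
    """Efron: denominator shrinks by l/d of the tied subjects' total HR."""
    d = len(tied)
    risk_sum = sum(hr[k] for k in risk_set)
    tied_sum = sum(hr[j] for j in tied)
    denom = math.prod(risk_sum - (l / d) * tied_sum for l in range(d))
    return math.prod(hr[j] for j in tied) / denom


def exact_average(hr, tied, risk_set):
    """Average of the ordered contributions over all orderings."""
    values = []
    for order in itertools.permutations(tied):
        remaining_sum = sum(hr[k] for k in risk_set)
        product = 1.0
        for subject in order:
            product *= hr[subject] / remaining_sum
            remaining_sum -= hr[subject]
        values.append(product)
    return sum(values) / len(values)


# Hypothetical hazard ratios; s1 and s2 have tied event times.
hr = {"s1": 2.0, "s2": 1.0, "s3": 1.0, "s4": 4.0}
risk_set = ["s1", "s2", "s3", "s4"]
tied = ["s1", "s2"]

b = breslow(hr, tied, risk_set)
e = efron(hr, tied, risk_set)
x = exact_average(hr, tied, risk_set)
print(f"Breslow = {b:.5f}, Efron = {e:.5f}, exact (average) = {x:.5f}")
```

In this small example the Efron value is much closer to the exact average than the Breslow value.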


‘Ties’ (9): Summary

Situation          Comment

No ties            • All methods give the same results

A few ties (<2%)   • All methods give similar results

Many ties          • Approximations are all biased towards ‘0’.
                   • Prefer Efron to Breslow.
                   • Exact methods are best, but be careful about computational demands

SAS default method is Breslow
