01/2015 1
EPI 5344: Survival Analysis in Epidemiology
Age as Time Scale
March 31, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
Objectives
• Choice of time scale for observational epidemiology
• Risk-set based analysis approaches
Example Study (1)
• Are uranium miners at risk of dying from lung cancer?
  – Uranium is radioactive and has a complex decay process
  – Miners work in enclosed areas with high levels of radioactive dust
  – Is there evidence that their health is affected?
Example Study (2)
• Colorado Plateau study
  – Subject eligibility:
    • Worked underground in uranium mines in the four-state Colorado Plateau area for at least one month
    • 2,500 mines in target area
    • Examined at least once by Public Health Service MDs between 1950 and 1960
  – Followed up to Dec 31, 1982, using vital statistics records for:
    • Death
    • Lung cancer death
Example Study (3)
• Entry date: the latest of
  – completion of one month of work and an exam by an MD
  – January 1, 1952
• Main outcome: death from lung cancer
Example Study (4)
• Exposure:
  – 43,000 direct measurements of radon levels in mines between 1951 and 1968
  – Converted to annual exposure
  – Combined with each worker’s ‘in mine’ work time
  – Generated working-level months (WLM)
    • 1 WL = 20.8 µJ (microjoules) of potential alpha energy per cubic meter (m³) of air
    • 1 WLM = exposure to 1 WL for 170 hours
  – Cumulated in five-year age intervals: 0-5, 5-10, 10-15, 15-20, 20-25, …
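The WLM arithmetic can be sketched in Python (not SAS). The per-year input records, the function names and the convention that an age exactly on a boundary (e.g. 20) falls into the next interval (20-25) are assumptions for illustration only; interval keys follow the naming of the data file, where rexp20 holds ages 15-20.

```python
# Sketch: WL concentration x hours in the mine -> WLM, cumulated by 5-year age interval.
# Input records and names are hypothetical; only the 170-hour definition comes from the slide.
from collections import defaultdict

HOURS_PER_WORKING_MONTH = 170  # 1 WLM = exposure to 1 WL for 170 hours


def wlm(wl: float, hours: float) -> float:
    """Working-level months for `hours` spent at a concentration of `wl` working levels."""
    return wl * hours / HOURS_PER_WORKING_MONTH


def wlm_by_age_interval(records, width=5):
    """records: iterable of (age, wl, hours_in_mine), one per working year.

    Returns {upper age of interval: total WLM}; key 20 means ages 15-20 (like rexp20).
    """
    totals = defaultdict(float)
    for age, wl, hours in records:
        upper = (int(age // width) + 1) * width  # age 18 -> interval 15-20 -> key 20
        totals[upper] += wlm(wl, hours)
    return dict(totals)


# Hypothetical worker: three years at 10 WL, about 1,700 hours per year underground
history = [(18, 10.0, 1700), (19, 10.0, 1700), (20, 10.0, 1700)]
print(wlm_by_age_interval(history))  # {20: 200.0, 25: 100.0}
```

Each such year contributes 10 × 1,700 / 170 = 100 WLM, which shows how quickly a miner working in high-radon conditions could pass the 500 WLM cut-point used later.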
Example study (5)
Data file variables:
agest = age at entry to study
ageexit = age at exit from study
ind = died from lung cancer (=1)
rexp20 = WLMs from age 15-20 (one such variable per five-year age interval)

Item                                       Number   Percent
Sample size                                 3,347
Dying (any cause)                           1,258     38%
Lung cancer death                             258     7.7%
Lung cancer as proportion of all deaths              20.5%
Example study (6)
How can survival analysis methods be applied to these data?
• Based on the course so far:
  – Time is the number of years (months, days, etc.) from initial entry into the study
  – Time ‘0’ is the entry date
  – End of follow-up:
    • Death (or death from lung cancer)
    • Censored if:
      – lost to follow-up
      – died from the ‘wrong’ cause
Example study (7)
• Based on the course so far:
  – Exposure is time-varying:
    • Cumulative
    • Mean
    • Peak
  – We will look at cumulative exposure of 500 WLM or more
  – Use PHREG to generate HR estimates
Choosing a time scale (1)
• Time scale choices include:
  – Age
  – Calendar year
  – Time since entry into study
  – Time since initial employment
Choosing a time scale (2)
• Cox model is:
$$h(t \mid x) = h_0(t)\,\exp(\beta x)$$
• Choice of time scale affects the shape of the baseline hazard, h0(t)
• It also affects which people belong together in a risk set
• Betas will have different values
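A small Python sketch of the risk-set point: the same hypothetical subjects give different risk sets for the same death depending on whether age or time on study is the time scale. The subject records and function names are invented for illustration; the age-scale rule (entered before the event age, still under follow-up at it) is the usual late-entry definition.

```python
# Sketch: one cohort, two time scales, two different risk sets for the same death.
# Each hypothetical subject: (id, age at entry, age at exit, died of lung cancer?)
subjects = [
    ("A", 50.0, 62.0, True),
    ("B", 55.0, 70.0, False),
    ("C", 30.0, 45.0, False),
    ("D", 58.0, 66.0, False),
]


def risk_set_age(subjects, age):
    """Age as time scale (late entry): entered before `age` and still followed at `age`."""
    return sorted(s[0] for s in subjects if s[1] < age <= s[2])


def risk_set_time_on_study(subjects, t):
    """Time on study: followed for at least `t` years, whatever their age."""
    return sorted(s[0] for s in subjects if 0 < t <= s[2] - s[1])


case = subjects[0]  # A dies at age 62, after 12 years on study
print(risk_set_age(subjects, case[2]))                      # ['A', 'B', 'D']
print(risk_set_time_on_study(subjects, case[2] - case[1]))  # ['A', 'B', 'C']
```

Subject C, who was 42 when reaching 12 years of follow-up, is compared with the 62-year-old case only on the time-on-study scale; D joins only on the age scale.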
Choosing a time scale (3)
• Time on study
  – Hazard affected by:
    • cumulative exposure
    • length of time for disease to develop post-exposure
  – Usually a ‘gentle’ increase
  – Risk set groups people with the same time since entry
    • Combines people of different ages
    • Averages age-specific hazards
Choosing a time scale (4)
• The actual year (calendar time)
  – Hazard affected by temporal changes in exposure or risk, e.g.:
    • increased air pollution
    • climate change
    • legislation
  – Changes are usually slow
  – Hazard is fairly constant, controlling for age, etc.
  – Risk set groups people in the same years
  – Most commonly used for trend analyses with Poisson regression models
Choosing a time scale (5)
• Age
  – Hazard affected by:
    • cumulative exposure
    • aging
  – Often shows a very strong effect on hazard
    • Prostate cancer hazard increases ‘super-exponentially’
  – Risk set groups people of the same age
    • Ignores how long you have been ‘on study’
Choosing a time scale (6)
• Choices are not independent
  – One year of follow-up increases all three time-scale measures by one year
• Cox models ‘work’ best if the baseline hazard captures a lot of the hazard variation
Choosing a time scale (7)
• For an RCT, ‘time on study’ is appropriate
  – Follow-up time is usually short
  – Intervention has a strong effect that overwhelms the age effect
• For etiological studies:
  – Risk increases with age
  – Risk relates to exposure, not to length of time since study entry
  – Length of time is a proxy for cumulative exposure
• For etiological studies, several authors have studied the choice of time scale
Choosing a time scale (8)
• Breslow et al (1983)
  – Time-on-study as time scale:
    • fine for RCTs, etc.
  – Not optimal for cohort studies
    • Most outcome death rates increase rapidly with age
  – Want to maximize control of the age effect
• Time-on-study is often strongly correlated with cumulative exposure
  – Can produce negative bias if used as time scale
Choosing a time scale (9)
• Breslow et al (1983)
  – Recommendation:
    • Use age as time scale
    • Stratify by calendar time (5-year groups)
  – Risk sets consist of people of the same age in each calendar group
  – Ignores length of time since entry as a factor
  – Subjects are left truncated (‘late entry’)
    • Time ‘0’ is the birth date
Choosing a time scale (10)
• Korn et al (1997)
  – Cox models don’t specify a form for h(t)
  – Best choice of time scale is the one that has the biggest impact on the shape of the hazard function
    • NOT the biggest impact on the HR!
  – Which would differ the most:
    • the hazards for people aged 50 vs. aged 60, both with 10 years of follow-up?
    • the hazards for two 55-year-olds, one with 5 years of follow-up and one with 15 years?
  – Cannot study the effect of the time-scale variable itself (it is absorbed into the baseline hazard)
Choosing a time scale (11)
• Korn et al: recommendation
  – Use age as time scale
  – Stratify by year of birth (birth cohort)
    • 5-year groups are commonly used
  – Essentially the same model as proposed by Breslow et al
Choosing a time scale (12)
• Korn et al
  – Considered a second (commonly used) model:
    • ‘Time-on-study’ as time scale
    • Adjust for age at entry in the model
  – Results are the same as having age as time scale if:
    • h0(t) is exponential in age
  – Otherwise, this model can give strong bias, especially with time-dependent covariates
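A short derivation shows why the exponential condition matters. This is a sketch: it introduces a0 for age at entry, t for time on study, and λ, γ as illustrative parameters of an age-scale baseline hazard, none of which appear on the slides.

```latex
\begin{aligned}
a &= a_0 + t, \qquad h_0(a) = \lambda e^{\gamma a} \\
h(a \mid x) &= h_0(a)\, e^{\beta x}
  = \lambda e^{\gamma (a_0 + t)}\, e^{\beta x}
  = \underbrace{\lambda e^{\gamma t}}_{h_0^{*}(t)} \exp(\gamma a_0 + \beta x)
\end{aligned}
```

So the age-scale model can be rewritten as a time-on-study Cox model with baseline h0*(t) and age at entry as a covariate, with the same β. For a non-exponential baseline such as λa^k, h0(a0 + t) does not factor into a function of t times exp(γa0), so adding a0 as a linear covariate no longer reproduces the age-scale model.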
Choosing a time scale (13)
• Uranium miners study
• 4 different time scales
• Differences are not big
  – HR/RR all around 3.6-5.2
• ‘Age’ is used as the time scale in the rest of the session

Time scale                RR    95% CI
Time since entry          4.7   3.5 – 6.3
Time since first mining   3.6   2.7 – 4.9
Calendar year             5.2   3.9 – 6.9
Age                       4.3   3.2 – 5.7
Implications for Analysis
• Age is the time variable
  – Left truncated
  – Requires ‘late entry’ methods
• Compute exposure as a time-varying variable
  – Cumulative
  – Mean
• Analysis option #1:
  – Use regular Cox model
• Other options:
  – risk set modelling methods
Regular Cox models (1)
• Uses the PHREG approach
• Time-varying exposure
  – e.g. use ‘500 WLM’ as a time-varying cut-point
• SAS code uses programming statements within PHREG
• Data file uses the layout shown earlier
* Failure time = ageexit, failure indicator = ind (0 = censored), entry time = agest (late entry);
proc phreg data=u.uminers;
  model ageexit*ind(0) = cr500 / entry=agest risklimits;
  /* Time-dependent programming statements (see PHREG documentation); here ageexit takes the value of each event age */
  array rexp {18} rexp5 rexp10 rexp15 rexp20 rexp25 rexp30 rexp35 rexp40 rexp45 rexp50 rexp55 rexp60 rexp65 rexp70 rexp75 rexp80 rexp85 rexp90;
  m = min((ageexit-2)/5, 18); i = 0; cradon = 0;   /* 5-year intervals elapsed by age (ageexit - 2), max 18 */
  do while (i < m);
    if (m > (i+1)) then cradon = cradon + rexp[i+1];   /* complete interval */
    else cradon = cradon + (m-i)*rexp[i+1];             /* partial final interval */
    i = i + 1;
  end;
  cr500 = (cradon >= 500);   /* time-varying indicator: cumulative radon >= 500 WLM */
run;
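To check the cumulative-exposure logic of the programming statements above outside SAS, here is a Python re-expression (a sketch, not the author's code). Python lists are 0-based, so rexp[0] plays the role of rexp5 (ages 0-5) and rexp[3] of rexp20 (ages 15-20). Reading the `ageexit-2` term as a 2-year exposure lag is our interpretation, exposed here as a hypothetical `lag` parameter.

```python
# Sketch: cumulative radon at a given age, pro-rating the last (partial) 5-year interval.
# rexp[k] = WLM accrued in ages 5k to 5k+5 (0-based, unlike the SAS array).
def cumulative_radon(age: float, rexp: list[float], lag: float = 2.0) -> float:
    """Cumulative WLM up to age - lag (lag = 2 mirrors `ageexit-2` in the SAS code)."""
    m = min((age - lag) / 5, len(rexp))  # 5-year intervals elapsed, possibly fractional
    cradon, i = 0.0, 0
    while i < m:
        if m > i + 1:
            cradon += rexp[i]            # interval fully completed
        else:
            cradon += (m - i) * rexp[i]  # fraction of the last, partly completed interval
        i += 1
    return cradon


def cr500(age: float, rexp: list[float]) -> int:
    """Time-varying indicator at a given age: 1 if cumulative radon >= 500 WLM."""
    return int(cumulative_radon(age, rexp) >= 500)


rexp = [0.0] * 18
rexp[3], rexp[4] = 400.0, 200.0  # hypothetical rexp20 (ages 15-20) and rexp25 (ages 20-25)
print(cumulative_radon(22, rexp), cr500(22, rexp))      # 400.0 0
print(cumulative_radon(24.5, rexp), cr500(24.5, rexp))  # 500.0 1
```

The same subject switches from cr500 = 0 to cr500 = 1 between the risk sets at ages 22 and 24.5, which is exactly why the indicator has to be recomputed at each event age rather than stored once per subject.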
Regular Cox models (2)
• Could be done with counting-process style input
  – Need to create one record for each subject for each year
  – Code gets complex
  – I won’t show this
• Either way, PHREG needs to:
  – create risk set data for each risk set
  – compute time-varying covariates
  – run the MLE algorithm
• Time-consuming process
  – BUT not a big issue with modern computers
Risk Set Methods (1)
A Different Approach
• Use the data step to create a new data set with the risk set data
• Risk set grouped data:
  – series of records for each risk set
  – one line for each subject in the risk set
• Code is complex (not shown)
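The idea of risk-set grouped data can be sketched in Python (this is not the SAS data step used in the session). It builds one risk set per case, on the age scale with late entry, and one row per subject in that risk set. The row fields echo the variable names used in the later SAS calls (_setno, _cc, _rstime), but the construction, the Subject class and the handling of tied cases (each appears as a non-case in the other's set) are assumptions. The time-varying exposure at age _rstime would then be added to each row.

```python
# Sketch: expand a cohort into risk-set grouped data (one row per subject per risk set).
from dataclasses import dataclass


@dataclass
class Subject:
    sid: str
    agest: float    # age at entry to study
    ageexit: float  # age at exit from study
    ind: int        # 1 = died from lung cancer, 0 = censored


def risk_set_data(cohort):
    """Rows (_setno, sid, _cc, _rstime); one risk set per case, ordered by event age."""
    rows = []
    cases = sorted((s for s in cohort if s.ind == 1), key=lambda s: s.ageexit)
    for setno, case in enumerate(cases, start=1):
        t = case.ageexit                  # the age at which this risk set occurs
        for s in cohort:
            if s.agest < t <= s.ageexit:  # under follow-up at age t (late entry)
                rows.append((setno, s.sid, int(s.sid == case.sid), t))
    return rows


cohort = [Subject("A", 50, 62, 1), Subject("B", 55, 70, 0),
          Subject("C", 30, 60, 1), Subject("D", 58, 66, 0)]
for row in risk_set_data(cohort):
    print(row)
```

Subject A appears twice, as a non-case in risk set 1 (age 60) and as the case in risk set 2 (age 62), which is the normal behaviour of risk-set data.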
Risk Set Methods (2)
• How can we use these data?
• Consider any risk set (take risk set #1)
• Can represent the data as a 2x2 table:
             Lung CA   No lung CA   Total
>=500 WLM       1          4           5
< 500 WLM       0          8           8
Total           1         12          13
Risk Set Methods (3)
• Treat each risk set as a stratum
  – matched on age (the time-scale variable)
• Combine tables into an overall estimate
  – Mantel-Haenszel methods could be used
• Better approach: conditional logistic regression
  – Can do this using either:
    • PROC LOGISTIC
    • PROC PHREG
  – Likelihood functions are identical
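A Python sketch of the Mantel-Haenszel option, treating each risk set as one 2x2 stratum laid out like the table for risk set #1. The second risk set in the example is hypothetical, and the function name is ours.

```python
# Sketch: Mantel-Haenszel odds ratio pooled over risk sets (one 2x2 table per risk set).
# Each table: (a, b, c, d) = (exposed cases, exposed non-cases,
#                             unexposed cases, unexposed non-cases)
def mantel_haenszel_or(tables):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Risk set #1 from the slide plus a hypothetical risk set whose case was unexposed
tables = [(1, 4, 0, 8), (0, 3, 1, 6)]
print(round(mantel_haenszel_or(tables), 3))  # 2.051
```

On its own, a risk set like #1 (no unexposed case, so b·c = 0) gives a zero denominator; only the pooled estimate across many risk sets is informative, which is also why conditional logistic regression, which uses the same stratification, is the preferred way to combine them.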
Risk Set Methods (4)
• Three approaches can be used to do these analyses:
  – the ‘bit of time’ method (PHREG)
  – the ‘separate strata’ method (PHREG)
  – the ‘binary data’ method (PROC LOGISTIC)
Risk Set Methods (5)
• Approach #1 (‘bit of time’ method)
  – Use PHREG
  – Treat the risk set file as a counting-process structure
  – Need to add an ‘entry’ and ‘exit’ time for each subject in each risk set
Risk Set Methods (6)
• Approach #1 (‘bit of time’ method)
  – Need to add an ‘entry’ and ‘exit’ time for each subject in each risk set:
    • exit time: age when the risk set occurred
    • entry time: exit time – 0.001
      – 0.001 is arbitrary, but the math works (trust me)
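Why ‘the math works’ can be checked with a Python sketch: after giving every record the interval (exit – 0.001, exit], the counting-process rule entry < t <= exit picks out exactly one risk set’s records at each risk-set age, and nobody in between. The rows are hypothetical, and the sketch assumes the risk sets occur at distinct ages (two sets at the same age would be pooled).

```python
# Sketch: 'bit of time' layout and the risk set it implies at each event time.
EPS = 0.001  # the arbitrary small width from the slide

# Hypothetical risk-set rows (_setno, id, _cc, _rstime): risk sets at ages 60 and 62
rows = [(1, "A", 0, 60.0), (1, "B", 0, 60.0), (1, "C", 1, 60.0), (1, "D", 0, 60.0),
        (2, "A", 1, 62.0), (2, "B", 0, 62.0), (2, "D", 0, 62.0)]


def add_entry(rows, eps=EPS):
    """Counting-process layout: (_setno, id, _cc, _rsentry, _rstime)."""
    return [(setno, sid, cc, t - eps, t) for setno, sid, cc, t in rows]


def at_risk(intervals, t):
    """Records at risk at time t under the rule entry < t <= exit."""
    return sorted((setno, sid) for setno, sid, _cc, entry, exit_age in intervals
                  if entry < t <= exit_age)


intervals = add_entry(rows)
print(at_risk(intervals, 60.0))  # only risk set 1's records
print(at_risk(intervals, 62.0))  # only risk set 2's records
print(at_risk(intervals, 61.0))  # nobody: the time between risk sets is ignored
```

Because the Cox partial likelihood only uses who is at risk at each event time, ignoring the gaps between risk sets changes nothing.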
Risk Set Methods (7)
• Approach #1 (‘bit of time’ method)
  – Ignores all of the time between risk sets
  – Seems weird, but the math works (trust me)
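* _rstime = age of the risk set (exit), _rsentry = _rstime - 0.001 (entry), _cc = 1 for the case, 0 otherwise;
proc phreg data=cumexp;
  model _rstime*_cc(0) = cr500 / entry=_rsentry rl;
run;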
Risk Set Methods (8)
• Approach #2 (separate strata method)
  – Use PHREG
  – Number the risk sets from 1 to n
  – Use the risk set ID number as the time variable!
    • Seems weird
    • Risk set ID is not actually a ‘time’
    • But the math works (trust me)
  – No need for a late-entry variable
* The risk set number serves as both the 'time' and the stratum;
proc phreg data=cumexp nosummary;
  model _setno*_cc(0) = cr500 / rl;
  strata _setno;
run;
Results are identical to Method #1.
Risk Set Methods (9)
• Approach #3 (binary data method)
  – Uses PROC LOGISTIC
  – Treats each risk set as a stratum
    • Remember the 2x2 table from an earlier slide
  – Uses conditional logistic regression
    • Condition on the risk set ID
    • Not interested in the OR or RR for each risk set
      – just ‘nuisance’ parameters
    • Including stratum parameters in the model can lead to strong bias
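What all three approaches maximize can be sketched directly in Python: the conditional (Cox partial) likelihood over risk sets with one case each, for a single covariate, fitted by Newton-Raphson. This is an illustration of the likelihood, not how PROC PHREG or PROC LOGISTIC are implemented; the data layout and function name are assumptions.

```python
# Sketch: Newton-Raphson for the conditional likelihood with one case per risk set.
# Each risk set is (x of the case, [x of every member, case included]).
import math


def fit_conditional(risk_sets, beta=0.0, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, xs in risk_sets:
            w = [math.exp(beta * x) for x in xs]  # exp(beta * x_j) for each member
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            var = sum(wi * x * x for wi, x in zip(w, xs)) / total - mean ** 2
            score += x_case - mean  # first derivative of the log-likelihood
            info += var             # minus the second derivative
        step = score / info         # fails if x never varies within any risk set
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical data: three risk sets of one exposed (x=1) and one unexposed (x=0) member;
# the case is exposed in two of them.
sets = [(1, [1, 0]), (1, [1, 0]), (0, [1, 0])]
beta_hat = fit_conditional(sets)
print(round(math.exp(beta_hat), 6))  # 2.0
```

Here the likelihood is e^(2β) / (1 + e^β)^3, which is maximized at e^β = 2, so the fitted HR/RR is 2. Only within-risk-set comparisons enter; no per-stratum intercepts are estimated.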
Risk Set Methods (10)
• Approach #3 (binary data method)
  – Stratify by the risk set ID
    • similar to the STRATA statement in PHREG
  – Model yields an OR
    • with this sampling approach, OR = RR
    • the math works (trust me)
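The ‘trust me’ step can be written out. This is a sketch with our own notation: risk set k is R_k, each member j has logistic probability π_j = e^(α_k + βx_j) / (1 + e^(α_k + βx_j)) of being a case, and we condition on exactly one case in the set.

```latex
\begin{aligned}
\Pr(i \text{ is the case} \mid \text{one case in } R_k)
  &= \frac{\pi_i \prod_{j \ne i} (1 - \pi_j)}{\sum_{l \in R_k} \pi_l \prod_{j \ne l} (1 - \pi_j)}
   = \frac{\pi_i / (1 - \pi_i)}{\sum_{l \in R_k} \pi_l / (1 - \pi_l)} \\
  &= \frac{e^{\alpha_k + \beta x_i}}{\sum_{l \in R_k} e^{\alpha_k + \beta x_l}}
   = \frac{e^{\beta x_i}}{\sum_{l \in R_k} e^{\beta x_l}}
\end{aligned}
```

The stratum intercepts α_k (the nuisance parameters) cancel, and what remains is the Cox partial-likelihood contribution for that risk set, so exp(β) from the conditional logistic model estimates the same HR/RR as PHREG.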
* descending: model the probability that _cc = 1 (the case);
proc logistic data=cumexp descending;
  model _cc = cr500 / clodds=wald;
  strata _setno;
run;
Results are identical to Method #1.
Risk Set Methods (11)
• All three methods gave the same results
  – Results are not quite the same as the initial PHREG analysis (with age as the time scale):
Method          HR (RR)   95% CI
Regular PHREG   4.263     3.175 – 5.722
Risk sets       4.267     3.179 – 5.728
Risk Set Methods (12)
• Why bother with the risk set method?
  – Some people claim it is faster
    • I didn’t see this effect
    • If true, is this an issue with modern computers?
    • Does 1 sec vs. 2 secs matter?
Run time (seconds)   Regular   RS #1   RS #2   RS #3
                     0.39      1.65    0.47    1.71
(RS #1-#3 = risk set approaches #1-#3)
Risk Set Methods (13)
• Why bother with the risk set method?
  – Can handle random-effects code better (I am told)
  – Extends more easily to nested case-control and case-cohort methods
[Figure: full risk set data, with 1 ‘case’ per risk set and multiple non-cases]
Nested case-control (1)
• Most studies will have hundreds or thousands of non-cases in each risk set
• Suppose we needed to collect new exposure information on all subjects
  – e.g. genotyping
• Gets very expensive to use the whole cohort
Nested case-control (2)
• Do we need all of the non-cases in each risk set?
• NO!!!
Nested case-control (3)
• Select a random sample of non-cases from each risk set
  – Usually a small number
    • 4 is common
    • up to 20 in pharmaco-epidemiology studies
• A person can be used more than once
  – multiple times as a control
  – as a control and as a case
• Collect new exposure information only on selected subjects
• Analyze using only these subjects
• Use any of the three risk set methods shown here
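The sampling step can be sketched in Python on risk-set grouped data (one row per subject per risk set). The row layout, function name, seed handling and the default of 4 controls are illustrative assumptions; because each risk set is sampled independently, the same person can be drawn in several risk sets, as the slide notes.

```python
# Sketch: nested case-control sampling, keeping the case plus up to m random non-cases per risk set.
import random
from collections import defaultdict


def sample_ncc(rows, m=4, seed=None):
    """rows: (_setno, id, _cc, _rstime) tuples. Returns the sampled rows, same layout."""
    rng = random.Random(seed)
    by_set = defaultdict(list)
    for row in rows:
        by_set[row[0]].append(row)
    sampled = []
    for setno in sorted(by_set):
        members = by_set[setno]
        cases = [r for r in members if r[2] == 1]
        non_cases = [r for r in members if r[2] == 0]
        controls = rng.sample(non_cases, min(m, len(non_cases)))  # without replacement
        sampled.extend(cases + controls)
    return sampled


# Hypothetical risk sets: 1 case + 10 non-cases at age 60, 1 case + 3 non-cases at age 62
rows = ([(1, "case1", 1, 60.0)] + [(1, f"n{k}", 0, 60.0) for k in range(10)]
        + [(2, "case2", 1, 62.0)] + [(2, f"n{k}", 0, 62.0) for k in range(3)])
for row in sample_ncc(rows, m=4, seed=1):
    print(row)
```

New exposure data (e.g. genotypes) would then be collected only for the people in the sampled rows, and the result analyzed with any of the three risk set methods.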
Nested case-control (4)
• Will give an unbiased estimate of the true HR/RR
• 95% confidence intervals will be wider
• Why does it work?
• Go back to the partial likelihood for Cox models
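• The final likelihood contribution from each risk set is:
$$L_k(\beta) = \frac{\exp(\beta x_{i_k})}{\sum_{j \in R(t_k)} \exp(\beta x_j)}$$
  where $i_k$ is the case at event time $t_k$ and $R(t_k)$ is the full risk set.
• For the nested case-control design (random sampling of non-cases), the likelihood contribution is given by:
$$\tilde{L}_k(\beta) = \frac{\exp(\beta x_{i_k})}{\sum_{j \in \tilde{R}(t_k)} \exp(\beta x_j)}$$
  where $\tilde{R}(t_k)$ is the sampled risk set: the case plus the selected non-cases.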
Nested case-control (5)
• Likelihoods have the same form
  – the denominator sums over the available risk set
• Can vary the method of non-case selection:
  – random sample
  – matched
  – counter-matched
• Easily extends to the case-cohort design
  – Select a random sample from the initial cohort (the subcohort)
  – Entire sample is retained as risk set members throughout follow-up
    • treats case status as a time-varying covariate
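A Python sketch of a case-cohort risk set: the subcohort, drawn once at baseline, supplies the comparison group at every event age, and a case from outside the subcohort joins only the risk set at its own event age. The dictionary layout (id mapped to age at entry and age at exit), the sampling fraction and the function names are hypothetical, and the sketch does not include the weighting used in formal case-cohort estimators.

```python
# Sketch: drawing a subcohort and forming the case-cohort risk set at a case's event age.
import random


def draw_subcohort(ids, fraction, seed=None):
    """Simple random sample of the initial cohort, taken once at the start."""
    rng = random.Random(seed)
    ids = sorted(ids)
    return set(rng.sample(ids, round(fraction * len(ids))))


def case_cohort_risk_set(cohort, subcohort, case_id):
    """Subcohort members under follow-up at the case's event age, plus the case itself."""
    t = cohort[case_id][1]  # the case's age at death
    members = {i for i in subcohort if cohort[i][0] < t <= cohort[i][1]}
    return members | {case_id}


# Hypothetical cohort: id -> (age at entry, age at exit)
cohort = {"A": (50, 62), "B": (55, 70), "C": (30, 60), "D": (58, 66), "E": (20, 40)}
sub = draw_subcohort(cohort, 0.4, seed=0)
print(sorted(sub))
print(sorted(case_cohort_risk_set(cohort, {"B", "C", "E"}, "A")))  # ['A', 'B']
```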
Summary
• Analysis of observational epidemiology studies is more complex than analysis of RCTs
• Survival methods generalize:
  – discrete time methods
  – risk set approaches
• Choice of time scale
• More information on Langholz’s web site
  – Risk set analysis course, Langholz, USC