TRANSCRIPT
EPI 5344: Survival Analysis in Epidemiology
Quick Review and Intro to Smoothing Methods
March 4, 2014
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Objectives (for entire session)
• Primary goal is to address two key concepts:
  – Hazard
    • estimation
    • role in survival methods
  – Methods to compare two survival curves using non-parametric methods
Objectives (for entire session)
• Review
  – Survival concepts
  – Hazard
• Smoothing methods
• Methods for estimation of hazard
• Proportional hazards
• Non-regression comparison of survival curves
  – Log-rank test
  – Variations of the log-rank test
• Relate Hazard/ID to person-time
Time ‘0’ (1)
• Time is usually measured as ‘calendar time’
Patient #1 enters on Feb 15, 2000 & dies on Nov 8, 2000
Patient #2 enters on July 2, 2000 & is lost (censored) on April 23, 2001
Patient #3 enters on June 5, 2001 & is still alive (censored) at the end of the follow-up period
Patient #4 enters on July 13, 2001 and dies on December 12, 2002
Histogram of death times:
– skewed to the right
– pdf, f(t)
– CDF, F(t) = area under the pdf from '0' to 't':

  F(t) = ∫₀ᵗ f(x) dx

[Figure: plot of the CDF F(t) against t]
Survival curves (3)
• Plot % of group still alive (or % dead)
S(t) = survival curve
= % still surviving at time ‘t’
= P(survive to time ‘t’)
Mortality rate = 1 – S(t)
= F(t)
= Cumulative incidence
S*(t) = survival curve conditional on surviving to ‘t0‘
CI*(t) = failure/death/cumulative incidence at ‘t’ conditional on surviving to ‘t0‘
Hazard at t0 is defined as: 'the slope of CI*(t) at t0'
Synonyms for hazard (instantaneous): force of mortality, incidence rate, incidence density
Range: 0 to ∞
Some relationships
• If the rate of disease is small: CI(t) ≈ H(t)
• If we assume h(t) is constant (= ID): CI(t) ≈ ID × t
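A quick numeric check of these approximations (the rate and follow-up time below are illustrative values, not from the lecture): under a constant hazard, the exact cumulative incidence is CI(t) = 1 − exp(−ID × t), which is close to both H(t) = ID × t and the simple product ID × t when the rate is small.

```python
import math

# Constant hazard (incidence density); illustrative value, not from the lecture.
ID = 0.002   # events per person-year
t = 5        # years of follow-up

# Exact cumulative incidence under a constant hazard:
ci_exact = 1 - math.exp(-ID * t)

# Approximation from the slide: CI(t) ≈ ID * t when the rate is small
ci_approx = ID * t

print(round(ci_exact, 6))   # approximately 0.00995
print(round(ci_approx, 6))  # 0.01
```

The two values agree to about four decimal places here; the approximation degrades as ID × t grows.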
Actuarial Method
The table (columns A–H) is built up one interval at a time; the completed table is below.
A: Year   B: # under follow-up   C: # lost   D: # dying in year
E: Effective # at risk = B − C/2 (losses assumed to occur, on average, at mid-interval)
F: Prob die in year = D/E   G: Prob survive year = 1 − F   H: S(t)

 A    |  B  |  C  |  D  |  E   |  F    |  G    |  H
0-1   | 10  |  0  |  0  | 10   | 0     | 1     | 1
1-2   | 10  |  1  |  1  | 9.5  | 0.105 | 0.895 | 0.895
2-3   |  8  |  0  |  1  | 8    | 0.125 | 0.875 | 0.783
3-4   |  7  |  2  |  1  | 6    | 0.167 | 0.833 | 0.652
4-5   |  4  |  0  |  0  | 4    | 0     | 1     | 0.652
5-6   |  4  |  0  |  1  | 4    | 0.25  | 0.75  | 0.489
6-7   |  3  |  1  |  0  | 2.5  | 0     | 1     | 0.489
7-8   |  2  |  1  |  0  | 1.5  | 0     | 1     | 0.489
8-9   |  1  |  1  |  0  | 0.5  | 0     | 1     | 0.489
9-10  |  0  |  0  |  0  | 0    | 0     | 1     | 0.489
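As a sketch, the life-table calculation can be reproduced in a few lines of Python. The interval counts are taken from the 10-person example; the mid-interval assumption for losses is the standard actuarial convention.

```python
# Actuarial (life-table) method, using the lecture's 10-person example.
# Each tuple: (# entering interval, # lost/censored, # dying).
intervals = [
    (10, 0, 0), (10, 1, 1), (8, 0, 1), (7, 2, 1), (4, 0, 0),
    (4, 0, 1), (3, 1, 0), (2, 1, 0), (1, 1, 0), (0, 0, 0),
]

def actuarial(intervals):
    """Return the cumulative survival S(t) after each interval."""
    surv = []
    s = 1.0
    for n, lost, died in intervals:
        n_eff = n - lost / 2.0                   # losses assumed at mid-interval
        q = died / n_eff if n_eff > 0 else 0.0   # conditional prob of dying
        s *= (1.0 - q)                           # S(t) = product of interval survivals
        surv.append(round(s, 3))
    return surv

print(actuarial(intervals))
# [1.0, 0.895, 0.783, 0.652, 0.652, 0.489, 0.489, 0.489, 0.489, 0.489]
```

The output matches column H of the table above.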
Kaplan-Meier method
The product-limit table is built up one event time at a time; the completed table is:

i | time | # deaths | # in risk set | Prob die in interval | Prob survive interval | S(ti)
0 |   0  |   ---    |      ---      |         ---          |          1.0          | 1.0
1 |  22  |    1     |       9       |        0.111         |         0.889         | 0.889
2 |  29  |    1     |       8       |        0.125         |         0.875         | 0.778
3 |  46  |    1     |       5       |        0.200         |         0.800         | 0.622
4 |  61  |    1     |       4       |        0.250         |         0.750         | 0.467
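The product-limit calculation can likewise be sketched in Python, with the event times and risk-set sizes taken from the table above:

```python
# Kaplan-Meier (product-limit) method, using the lecture's example.
# Each tuple: (event time, # deaths at that time, # in risk set).
events = [(22, 1, 9), (29, 1, 8), (46, 1, 5), (61, 1, 4)]

def kaplan_meier(events):
    """Return (time, S(t)) pairs at each distinct event time."""
    s = 1.0
    curve = []
    for t, d, n in events:
        s *= 1.0 - d / n       # multiply by conditional survival (n - d)/n
        curve.append((t, round(s, 3)))
    return curve

print(kaplan_meier(events))
# [(22, 0.889), (29, 0.778), (46, 0.622), (61, 0.467)]
```

Between event times the estimated S(t) is constant, so the curve is a step function.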
Smoothing methods
• Naïve non-parametric regression
• 'Windows'
• Sliding windows
• Local averaging
• Kernel estimation
Sliding windows (1)
• The divisions we used created five 'windows' into the data.
  – Within each window, we computed the mean 'X' and 'Y' and plotted that point for the regression line.
• Why do we need to make the windows 'fixed'?
  – Define the width of a window.
  – Slide it from left to right.
  – Compute the 'window-specific data point' and plot as before.
• This is the essence of 'smoothing'.
Sliding windows (2)
• The size of the window is a 'tuning parameter':
  – fixed number of neighboring data points, or
  – fixed width (include all points inside).
• Large windows tend to 'over-smooth'.
• Small windows do little smoothing and show the random noise.
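A minimal sketch of a fixed-width sliding window with local averaging; the data values below are invented for illustration:

```python
# Fixed-width sliding-window smoother: for each x, average the y-values
# of all points whose x lies within width/2 of it.
def sliding_window_smooth(xs, ys, width):
    """Return the locally averaged y for each x (the window always contains x itself)."""
    smoothed = []
    for x0 in xs:
        inside = [y for x, y in zip(xs, ys) if abs(x - x0) <= width / 2]
        smoothed.append(sum(inside) / len(inside))
    return smoothed

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 3.0, 5.0, 4.0]
print(sliding_window_smooth(xs, ys, width=2))
# [3.0, 3.0, 4.0, 4.0, 4.5]
```

Widening `width` pulls more points into each average (more smoothing); shrinking it toward zero reproduces the raw data.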
Window-specific data point (1)
• Many ways to compute the representative data point for the window:
  – X-value:
    • mean of the x's in the window
    • median of the x's in the window
    • define the window around a specific data point and use that x-value
  – Y-value:
    • mean of the y's in the window
    • median of the y's in the window
    • do a regression (linear, quadratic or cubic) of the data points in the window; use the predicted 'y' for the selected 'x'
Window-specific data point (2)
• Can ‘weight’ data points– Points closer to the middle should provide
more information about the true (x,y) than those further away.
• The weights are called a ‘kernel’. The method is called ‘Kernel Smoothing’
Window-specific data point (3)
• Many weight functions (kernels) can be used.
• A common one is the tricube weight.
• Select an 'xi':
  – Define the window around xi to get the points inside the window.
  – For each point inside the window, let 'zij' measure how far the point 'xij' lies from the left boundary of the window towards the right boundary:
    • -1 means on the left boundary
    • +1 means on the right boundary
  – Then the weight for that point is given by:
    w(zij) = (1 − |zij|³)³ for |zij| ≤ 1, and 0 otherwise.
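The tricube weight is a one-liner in code (assuming, as above, that z has been scaled to run from −1 to +1 across the window):

```python
# Tricube kernel: full weight at the window centre, zero at the boundaries.
def tricube(z):
    """w(z) = (1 - |z|^3)^3 for |z| <= 1, else 0."""
    az = abs(z)
    return (1 - az ** 3) ** 3 if az <= 1 else 0.0

print(tricube(0.0))   # 1.0  (centre of window gets full weight)
print(tricube(1.0))   # 0.0  (boundary gets zero weight)
print(tricube(0.5))   # 0.669921875
```

Note how the weight falls off smoothly: a point halfway to the boundary still gets about two-thirds of the central weight.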
LOWESS
• LOWESS = LOcally WEighted Scatterplot Smoothing
• Use the above procedure, but compute a weighted linear regression of 'y' on 'x' and use the regression equation to estimate 'yi' for the given 'xi'.
• Implemented in SAS as a PROC (LOESS)
  – available through ODS Graphics and elsewhere.
• Can use a higher-order polynomial regression instead of the linear model.
  – The linear model is usually OK.
• 'Tuning' is done by varying the percentage of the data set included in the window.
  – Empirical judgment/'feel' is 'best' for choosing the tuning parameter.
  – Some statistics are available (e.g. residuals), but that is advanced material.
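A bare-bones sketch of the LOWESS idea: a tricube-weighted linear regression of 'y' on 'x' inside one window, evaluated at that window's centre. This is an illustration only (the data values and `half_width` parameter are invented), not the SAS PROC LOESS implementation:

```python
# Tricube-weighted local linear regression at a single point x0
# (the core step that LOWESS repeats at every x).
def lowess_at(x0, xs, ys, half_width):
    """Fit a weighted least-squares line in the window; return the fitted y at x0."""
    pts = [(x, y) for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    # Tricube weights: z = distance from x0 scaled to [0, 1] at the boundary
    w = [(1 - (abs(x - x0) / half_width) ** 3) ** 3 for x, _ in pts]
    sw = sum(w)
    xbar = sum(wi * x for wi, (x, _) in zip(w, pts)) / sw
    ybar = sum(wi * y for wi, (_, y) in zip(w, pts)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, (x, _) in zip(w, pts))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, (x, y) in zip(w, pts))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)   # fitted line evaluated at x0

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]        # roughly y = 2x, with noise
print(round(lowess_at(3, xs, ys, half_width=2.5), 2))
```

Sliding x0 across the data and collecting the fitted values traces out the smoothed curve; varying `half_width` (or, equivalently, the percentage of points in the window) is the tuning described above.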