TRANSCRIPT
EPI 5344: Survival Analysis in Epidemiology
Quick Review and Intro to Smoothing Methods
March 4, 2014
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Objectives (for entire session)
• Primary goal is to address two key concepts:
  – Hazard
    • estimation
    • role in survival methods
  – Methods to compare two survival curves using non-parametric methods
Objectives (for entire session)
• Review
  – Survival concepts
  – Hazard
• Smoothing methods
• Methods for estimation of hazard
• Proportional hazards
• Non-regression comparison of survival curves
  – Log-rank test
  – Variations of the log-rank test
• Relate Hazard/ID to person-time
Time ‘0’ (1)
• Time is usually measured as ‘calendar time’
Patient #1 enters on Feb 15, 2000 & dies on Nov 8, 2000
Patient #2 enters on July 2, 2000 & is lost (censored) on April 23, 2001
Patient #3 enters on June 5, 2001 & is still alive (censored) at the end of the follow-up period
Patient #4 enters on July 13, 2001 and dies on December 12, 2002
Histogram of death times:
– skewed to the right
– pdf, f(t)
– CDF, F(t) = area under the pdf from '0' to 't':

  F(t) = ∫₀ᵗ f(x) dx

[Figure: plot of the CDF F(t) against t]
Survival curves (3)
• Plot % of group still alive (or % dead)
S(t) = survival curve
= % still surviving at time ‘t’
= P(survive to time ‘t’)
Mortality rate = 1 – S(t)
= F(t)
= Cumulative incidence
S*(t) = survival curve conditional on surviving to ‘t0‘
CI*(t) = failure/death/cumulative incidence at ‘t’ conditional on surviving to ‘t0‘
Hazard at t0 is defined as: 'the slope of CI*(t) at t0'
Synonyms for hazard (instantaneous): force of mortality, incidence rate, incidence density
Range: 0 to ∞
Some relationships
• If the rate of disease is small: CI(t) ≈ H(t)
• If we assume h(t) is constant (= ID): CI(t) ≈ ID × t
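A quick numeric check of these approximations (the rate and follow-up time below are illustrative values, not from the lecture): under a constant hazard, the exact cumulative incidence is CI(t) = 1 − exp(−ID × t), which is close to both H(t) = ID × t and the simple product ID × t when the rate is small.

```python
import math

# Constant hazard (incidence density); illustrative value, not from the lecture.
ID = 0.002   # events per person-year
t = 5        # years of follow-up

# Exact cumulative incidence under a constant hazard:
ci_exact = 1 - math.exp(-ID * t)

# Approximation from the slide: CI(t) ≈ ID * t when the rate is small
ci_approx = ID * t

print(round(ci_exact, 6))   # approximately 0.00995
print(round(ci_approx, 6))  # 0.01
```

The two values agree to about four decimal places here; the approximation degrades as ID × t grows.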
Actuarial Method
The table (columns A–H) is built up one interval at a time; the completed table is below.
A: Year   B: # under follow-up   C: # lost   D: # dying in year
E: Effective # at risk = B − C/2 (losses assumed to occur, on average, at mid-interval)
F: Prob die in year = D/E   G: Prob survive year = 1 − F   H: S(t)

 A    |  B  |  C  |  D  |  E   |  F    |  G    |  H
0-1   | 10  |  0  |  0  | 10   | 0     | 1     | 1
1-2   | 10  |  1  |  1  | 9.5  | 0.105 | 0.895 | 0.895
2-3   |  8  |  0  |  1  | 8    | 0.125 | 0.875 | 0.783
3-4   |  7  |  2  |  1  | 6    | 0.167 | 0.833 | 0.652
4-5   |  4  |  0  |  0  | 4    | 0     | 1     | 0.652
5-6   |  4  |  0  |  1  | 4    | 0.25  | 0.75  | 0.489
6-7   |  3  |  1  |  0  | 2.5  | 0     | 1     | 0.489
7-8   |  2  |  1  |  0  | 1.5  | 0     | 1     | 0.489
8-9   |  1  |  1  |  0  | 0.5  | 0     | 1     | 0.489
9-10  |  0  |  0  |  0  | 0    | 0     | 1     | 0.489
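As a sketch, the life-table calculation can be reproduced in a few lines of Python. The interval counts are taken from the 10-person example; the mid-interval assumption for losses is the standard actuarial convention.

```python
# Actuarial (life-table) method, using the lecture's 10-person example.
# Each tuple: (# entering interval, # lost/censored, # dying).
intervals = [
    (10, 0, 0), (10, 1, 1), (8, 0, 1), (7, 2, 1), (4, 0, 0),
    (4, 0, 1), (3, 1, 0), (2, 1, 0), (1, 1, 0), (0, 0, 0),
]

def actuarial(intervals):
    """Return the cumulative survival S(t) after each interval."""
    surv = []
    s = 1.0
    for n, lost, died in intervals:
        n_eff = n - lost / 2.0                   # losses assumed at mid-interval
        q = died / n_eff if n_eff > 0 else 0.0   # conditional prob of dying
        s *= (1.0 - q)                           # S(t) = product of interval survivals
        surv.append(round(s, 3))
    return surv

print(actuarial(intervals))
# [1.0, 0.895, 0.783, 0.652, 0.652, 0.489, 0.489, 0.489, 0.489, 0.489]
```

The output matches column H of the table above.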
Kaplan-Meier method
The product-limit table is built up one event time at a time; the completed table is:

i | time | # deaths | # in risk set | Prob die in interval | Prob survive interval | S(ti)
0 |   0  |   ---    |      ---      |         ---          |          1.0          | 1.0
1 |  22  |    1     |       9       |        0.111         |         0.889         | 0.889
2 |  29  |    1     |       8       |        0.125         |         0.875         | 0.778
3 |  46  |    1     |       5       |        0.200         |         0.800         | 0.622
4 |  61  |    1     |       4       |        0.250         |         0.750         | 0.467
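The product-limit calculation can likewise be sketched in Python, with the event times and risk-set sizes taken from the table above:

```python
# Kaplan-Meier (product-limit) method, using the lecture's example.
# Each tuple: (event time, # deaths at that time, # in risk set).
events = [(22, 1, 9), (29, 1, 8), (46, 1, 5), (61, 1, 4)]

def kaplan_meier(events):
    """Return (time, S(t)) pairs at each distinct event time."""
    s = 1.0
    curve = []
    for t, d, n in events:
        s *= 1.0 - d / n       # multiply by conditional survival (n - d)/n
        curve.append((t, round(s, 3)))
    return curve

print(kaplan_meier(events))
# [(22, 0.889), (29, 0.778), (46, 0.622), (61, 0.467)]
```

Between event times the estimated S(t) is constant, so the curve is a step function.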
Smoothing methods
• Naïve non-parametric regression
• 'Windows'
• Sliding windows
• Local averaging
• Kernel estimation
Sliding windows (1)
• The divisions we used created five 'windows' into the data.
  – Within each window, we computed the mean 'X' and 'Y' and plotted that point for the regression line.
• Why do we need to make the windows 'fixed'?
  – Define the width of a window.
  – Slide it from left to right.
  – Compute the 'window-specific data point' and plot as before.
• This is the essence of 'smoothing'.
Sliding windows (2)
• The size of the window is a 'tuning parameter':
  – fixed number of neighboring data points, or
  – fixed width (include all points inside).
• Large windows tend to 'over-smooth'.
• Small windows do little smoothing and show the random noise.
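A minimal sketch of a fixed-width sliding window with local averaging; the data values below are invented for illustration:

```python
# Fixed-width sliding-window smoother: for each x, average the y-values
# of all points whose x lies within width/2 of it.
def sliding_window_smooth(xs, ys, width):
    """Return the locally averaged y for each x (the window always contains x itself)."""
    smoothed = []
    for x0 in xs:
        inside = [y for x, y in zip(xs, ys) if abs(x - x0) <= width / 2]
        smoothed.append(sum(inside) / len(inside))
    return smoothed

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 3.0, 5.0, 4.0]
print(sliding_window_smooth(xs, ys, width=2))
# [3.0, 3.0, 4.0, 4.0, 4.5]
```

Widening `width` pulls more points into each average (more smoothing); shrinking it toward zero reproduces the raw data.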
Window-specific data point (1)
• Many ways to compute the representative data point for the window:
  – X-value:
    • mean of the x's in the window
    • median of the x's in the window
    • define the window around a specific data point and use that x-value
  – Y-value:
    • mean of the y's in the window
    • median of the y's in the window
    • do a regression (linear, quadratic or cubic) of the data points in the window; use the predicted 'y' for the selected 'x'
Window-specific data point (2)
• Can ‘weight’ data points– Points closer to the middle should provide
more information about the true (x,y) than those further away.
• The weights are called a ‘kernel’. The method is called ‘Kernel Smoothing’
Window-specific data point (3)
• Many weight functions (kernels) can be used.
• A common one is the tricube weight.
• Select an 'xi':
  – Define the window around xi to get the points inside the window.
  – For each point inside the window, let 'zij' measure how far the point 'xij' lies from the left boundary of the window towards the right boundary:
    • -1 means on the left boundary
    • +1 means on the right boundary
  – Then the weight for that point is given by:
    w(zij) = (1 − |zij|³)³ for |zij| ≤ 1, and 0 otherwise.
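The tricube weight is a one-liner in code (assuming, as above, that z has been scaled to run from −1 to +1 across the window):

```python
# Tricube kernel: full weight at the window centre, zero at the boundaries.
def tricube(z):
    """w(z) = (1 - |z|^3)^3 for |z| <= 1, else 0."""
    az = abs(z)
    return (1 - az ** 3) ** 3 if az <= 1 else 0.0

print(tricube(0.0))   # 1.0  (centre of window gets full weight)
print(tricube(1.0))   # 0.0  (boundary gets zero weight)
print(tricube(0.5))   # 0.669921875
```

Note how the weight falls off smoothly: a point halfway to the boundary still gets about two-thirds of the central weight.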
LOWESS
• LOWESS = LOcally WEighted Scatterplot Smoothing
• Use the above procedure, but compute a weighted linear regression of 'y' on 'x' and use the regression equation to estimate 'yi' for the given 'xi'.
• Implemented in SAS as a PROC (LOESS)
  – available through ODS Graphics and elsewhere.
• Can use a higher-order polynomial regression instead of the linear model.
  – The linear model is usually OK.
• 'Tuning' is done by varying the percentage of the data set included in the window.
  – Empirical judgment/'feel' is 'best' for choosing the tuning parameter.
  – Some statistics are available (e.g. residuals), but that is advanced material.
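A bare-bones sketch of the LOWESS idea: a tricube-weighted linear regression of 'y' on 'x' inside one window, evaluated at that window's centre. This is an illustration only (the data values and `half_width` parameter are invented), not the SAS PROC LOESS implementation:

```python
# Tricube-weighted local linear regression at a single point x0
# (the core step that LOWESS repeats at every x).
def lowess_at(x0, xs, ys, half_width):
    """Fit a weighted least-squares line in the window; return the fitted y at x0."""
    pts = [(x, y) for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    # Tricube weights: z = distance from x0 scaled to [0, 1] at the boundary
    w = [(1 - (abs(x - x0) / half_width) ** 3) ** 3 for x, _ in pts]
    sw = sum(w)
    xbar = sum(wi * x for wi, (x, _) in zip(w, pts)) / sw
    ybar = sum(wi * y for wi, (_, y) in zip(w, pts)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, (x, _) in zip(w, pts))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, (x, y) in zip(w, pts))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)   # fitted line evaluated at x0

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]        # roughly y = 2x, with noise
print(round(lowess_at(3, xs, ys, half_width=2.5), 2))
```

Sliding x0 across the data and collecting the fitted values traces out the smoothed curve; varying `half_width` (or, equivalently, the percentage of points in the window) is the tuning described above.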