survival analysis - purdue universityovitek/stat526-spring11_files/pdfs/9-surviva… · survival...
Post on 31-May-2020
11 Views
Preview:
TRANSCRIPT
Survival Analysis
STAT 526
Professor Olga Vitek
May 4, 2011
9
Survival Data andSurvival Functions
• Statistical analysis of time-to-event data
– Lifetime of machines and/or parts(called failure time analysis in engineering)
– Time to default on bonds or credit card(called duration analysis in economics)
– Patients survival time under different treatment(called survival analysis in clinical trial)
– Event-history analysis (sociology)
• Why a special topic on survival analysis?
– Non-normal and skewed distribution
– Need to answer questions related to
P (X > t+ t0|X ≥ t0)
– Censored/truncated data
– Here we only focus on right-censored data
9-1
Survival Function
• Continuous survival time T
– its probability density function is f(t)
– its cumulative probability function is F (t)
F (t) = P (T ≤ t) =
∫ t
0f(s)ds
• The survival function of T is
S(t) = P (T > t) = 1− F (t)
– also called survival rate
– steeper curve → shorter survival
– S′(t) = −f(t)
• Mean survival time
µ = E{T} =
∫ ∞0
tf(t) dt =
∫ ∞0
t dF (t) =
∫ ∞0
t d[1− S(t)]
=
∫ ∞0
[∫ t
0dx
]d[1− S(t)] =
∫ ∞0
[∫ ∞x
d[1− S(t)]
]dx
=
∫ ∞0
S(x) dx
9-2
Hazard Function
• The hazard function of T is
λ(t) = lim∆t↘0
P (t ≤ T < t+ ∆t |T ≥ t)∆t
– proportion of subjects with the event per unittime, around time t; λ(t) ≥ 0
– measure of ’proneness’ to the even as functionof time
– λ(t) 6= f(t): λ(t) is conditional on survival to t
• Relates to the survival function
λ(t) = lim∆t↘0
F (t+ ∆t)− F (t)
∆t
/S(t) =
f(t)
S(t)= −
S′(t)
S(t)
= −d
dtlogS(t) (negative slope of log-survival)
• The cumulative hazard function of T is
defined as Λ(t) =∫ t0 λ(s)ds
– Λ(t) = − logS(t)
– S(t) = exp{−Λ(t)}
– λ(t), Λ(t) or S(t) define the distribution
9-3
Parametric Survival Models
• Exponential distribution:
– λ(t) = ρ, where ρ > 0 is a constant, and t > 0
– S(t) = e−ρt, ⇒ f(t) = −S′(t) = ρe−ρt
• Weibull distribution:
– λ(t) = λp(λt)p−1; λ, p > 0 are constants, t > 0
– S(t) = e−(λt)p, ⇒ f(t) = −S′(t) = e−(λt)pλp(λt)p−1
– is the Exponential distribution when p = 1
• Gompertz distribution:
– λ(t) = αeβt; α, β > 0 are constants, t > 0
– is the Exponential distribution when β = 0
• Gompertz-Makeham distribution:
– λ(t) = λ+ αeβt; α, β > 0 are constants, t > 0
– adds an initial fixed component to the hazard
9-4
Non-Parametric SurvivalModels
• Approximate time by discrete intervals
– f(xi) =
{P{T = xi}0 otherwise
– S(xi) =∑
j: xj≥t f(xj)
– λ(xi) = f(xi) / S(xi)
• After substitution:
– f(xi) = λ(xi)∏i−1j=1 (1− λj)
– S(xi) =∏i−1j=1(1− λj)
– S(t) =∏j:xj<t
(1− λj), t > 0
9-5
Right-Censored Data
• Survival time of i-th subject Ti, i = 1, · · · , n.
• Censoring time of i-th subject Ci, i = 1, · · · , n.
• Observed event for ith subject:
Yi = min(Ti, Ci), δi = 1{Ti≤Ci} =
{1, if Ti ≤ Ci0, if Ti > Ci
– Data are reported in pairs (Yi, δi)
= (observedTime, hasEvent)
• Assumption:
– Ci are predetermined and fixed, or
– Ci are random, mutually independent, and inde-pendent of Ti
9-6
Non-parametric Estimation
of Survival Function
And
Comparison Between Groups
With Censoring
9-7
Example: Remission Timesof Leukaemia Patients
• 21 leukemia patients treated with drug (6-mercaptopurine)
• 21 matched controls, no covariates.
> library(MASS) #Described in Venable & Ripley, Ch.13> data(gehan)
> gehanpair time cens treat
1 1 1 1 control2 1 10 1 6-MP3 2 22 1 control4 2 7 1 6-MP5 3 3 1 control6 3 32 0 6-MP7 4 12 1 control8 4 23 1 6-MP9 5 8 1 control10 5 22 1 6-MP11 6 17 1 control12 6 6 1 6-MP13 7 2 1 control14 7 16 1 6-MP15 8 11 1 control16 8 34 0 6-MP17 9 8 1 control.........................
9-8
Kaplan-Meier Estimator ofSurvival Function
• Also called product limit estimator
• Re-organize the data
Distinct Event Times t1 · · · ti · · · tk# of Events d1 · · · di · · · dk
# Survivors Right Before ti n1 · · · ni · · · nk
• Kaplan-Meier estimator S(t) :
– Estimate P (T > t1) by p1 = (n1 − d1)/n1
– Estimate P (T > ti|T > ti−1) by pi = (ni − di)/ni,i = 2, · · · , k
– If ti−1 < t < ti,
S(t) =
P (T > t) = P (T > t1)P (T > t|T > t1) = · · ·
P (T > t1){i−1∏m=2
P (T > tm|T > tm−1)}P (T > t|T > ti−1)
– Estimate S(t) by S(t) = S(ti−1)ni−dini
=∏i:ti≤t
ni−dini
,
(Kaplan & Meier, 1958, JASA)
9-9
Non-Parametric Fit ofSurvival Curves
• Represent data in the censored survival form– create a data structure ’Surv’
– ’+’ represents censored time
> library(survival)> Surv(gehan$time, gehan$cens)1 10 22 7 3 32+ 12 23 8 22 17 6 2 1611 34+ 8 32+ 12 25+ 2 11+ 5 20+ 4 19+ 15 68 17+ 23 35+ 5 6 11 13 4 9+ 1 6+ 8 10+
• K-M estimation of survival function– subset of results for the treatment group:
> fit <- survfit(Surv(time, cens) ~ treat, data=gehan)> summary(fit)
treat=6-MPtime n.risk n.event survival std.err lower95%CI upper95%CI
6 21 3 0.857 0.0764 0.720 1.0007 17 1 0.807 0.0869 0.653 0.996
10 15 1 0.753 0.0963 0.586 0.96813 12 1 0.690 0.1068 0.510 0.93516 11 1 0.627 0.1141 0.439 0.89622 7 1 0.538 0.1282 0.337 0.85823 6 1 0.448 0.1346 0.249 0.807
9-10
Interval K-M Estimatorof Survival Function
• Direct interval estimation of S(t)
– The variance of S(t) can be estimated by
var(S(t)) = [S(t)]2∑i:ti≤t
di
ni(ni − di)
– This is Greenwood’s formula (Greenwood, 1926)
– CI for S(t) is S(t)± 1.96√var[S(t)],
– S(t) ∈ [0,1], but var(S(t)) may be large enough
that the usual CI will exceed this interval
• Interval estimation using log S(t) (preferred)
– The variance of log S(t) can be estimated as
var[ log S(t) ] =∑i:ti≤t
di
ni(ni − di)
– Construct the CI for log S(t)
[L,U ] = log S(t)± 1.96√var[log S(t)],
– For CI of S(t), transform the limits with [eL, eU ].
9-11
Estimate the Cumulative Hazard
Function Λ(t) = − logS(t)
• Using the Kaplan-Meier estimator of S(t):
Λ(t) = − log S(t)
• Alternatively, the Nelson-Aalen Estimator:
Λ(t) =
{0, if t ≤ t1∑
i:ti≤tdini, if t ≥ ti
V ar(Λ(t) =∑i:ti≤t
di
n2i
– better small-sample-size performance than basedon the K-M procedure
– useful in comparing the fit of a parametric modelto its non-parametric alternative
9-12
Non-Parametric Fit ofSurvival Curves
>plot(fit, conf.int=TRUE, lty=3:2, log=TRUE,xlab="Remission (weeks)", ylab="Log-Survival", main="Gehan")
• Plot curves and CI; default CI are on the log scale
• ’log=TRUE’ plots y axis (i.e. S(t)) on the log scale(i.e. y axis shows the hegative cumulative hazard)
0 5 10 15 20 25 30 35
0.05
0.10
0.20
0.50
1.00
Gehan
Remission (weeks)
Sur
viva
l
control6−MP
9-13
Log-Rank Test for Homogeneity
• Non-parametric test
– Compare two populations with hazard functionsλi(t), i = 1,2.
– Collect two samples from each population.
– Construct a pooled sample with k distinct eventtimes
Distinct Failure t1 · · · ti · · · tkTime
Pool # of Failures d1 · · · di · · · dkSample # survivors n1 · · · ni · · · nk
right before tiSample # of Failures d11 · · · d1i · · · d1k
1 # survivors ti n11 · · · n1i · · · n1k
right before tiSample # of Failures d21 · · · d2i · · · d2k
2 # survivors n21 · · · n2i · · · n2k
right before ti
– Note di = d1i + d2i, ni = n1i + n2i, i = 1, · · · , k
– Test the hypotheses
H0 : λ1(t) = λ2(t), t ≤ τvs. Ha : λ1(t) 6= λ2(t) for some t ≤ τ
9-14
Log-Rank Test for
Homogeneity: Procedure
• Consider [ti, ti + ∆) for small ∆Sample Sample Pooled
1 2 Sample# failures d1i d2i di
# survivors at ti n1i − d1i n2i − d2i ni − di# of survivors ti n1i n2i ni
right before ti
• If λ1(t) 6= λ2(t) around ti, there should be associ-ation between sample and event in this 2 × 2 table
• d1i|(ni, di, n1i)H0∼ Hypergeometric(ni, di, n1i):
P (d1i = x|ni, di, n1i) =
(dix
)(ni − din1i − x
)(
nin1i
)=⇒ E[d1i|ni, di, n1i] =
din1i
ni
• Define U =∑k
i=1
[d1i − din1i
ni
]E[U ] = 0, var(U) =
∑ki=1
din1i(ni−di)(ni−n1i)n2i (ni−1)
U/var(U)1/2 H0, asy∼ N(0,1)
9-15
Log-Rank Test in R
(Venable & Ripley Sec.13.2)
> ?survdiff
> survdiff(Surv(time,cens)~treat,data=gehan)
N Observed Expected (O-E)^2/E (O-E)^2/Vtreat=6-MP 21 9 19.3 5.46 16.8treat=control 21 21 10.7 9.77 16.8
Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05
• Conclusion: Reject H0
• Warning: this test does not adjust for covariates,and may often be inappropriate
– Appropriate for this study, since the individualsare matched pairs
9-16
Parametric Estimation of
Survival Function
and
Comparison Between Groups
9-17
The Likelihood Function
• The likelihood of i-th observation (yi, δi, xi):{f(yi|Xi), if δi = 1, (not censored)
S(yi|Xi), if δi = 0, (right− censored)
• The likelihood of the data is
L(β) ∝n∏i=1
{[f(yi|Xi)]δi[S(yi|Xi)]1−δi
}– The inference approaches based on the likeli-
hood function can be applied.
∗ See chapter by Lee on ML parameterestimation
– Use AIC or BIC to compare nested models.
– Graphical check of model adequacy
∗ Plot Λ(t) vs. t for exponential models;
∗ Plot log(Λ) vs. log(t) for Weibull models;
∗ Can also plot deviance residuals.
9-18
Accounting for Covariates:Models for Hazard Function
• General form: λ(t) = λ0(t)ex′β
– λ0(t): the baseline hazard
– ex′β: multiplicative effect, independent of time
• Implications for the Survival Function:
– S(t) = e−Λ(t) = e−∫ t
0λ(s) ds = e−Λ0(t)ex
′β
– logΛ(t) = log[−log S(t)] = logΛ0 + x′β
– If the model is appropriate, a plot of log[−log S(t)KM ]for different groups yields roughly parallel lines
9-19
Accounting for Covariates:Models for Hazard Function
• Exponential distribution, no predictors:
– λ(t) = ρ - constant hazard function
– S(t) = e−ρ t - survival function
• Exponential distribution, one predictor:
– λ(t) = eβ0+β1X = eβ0 · eβ1X,
– eβ0 plays the role of λ0,i.e. the constant baseline hazard
• Weibull distribution, no predictors:
– λ(t) = λp p tp−1 - hazard function
– S(t) = e−(λ t)p - survival function
• Weibull distribution, one predictor:
– λ(t) = p tp−1 · e(β0+β1X)p = p tp−1eβp0 · epβ1X
– eβ0+β1X plays the role of λ in overall hazard
– p tp−1eβp0 is the baseline hazard
– eβ0 plays the role of λ in the baseline hazard
9-20
More on Proportional Hazard
• Assumption of proportional hazard implies:
– The hazard ratio for two subjects i and j withdifferent covariates, at a gven time is
λi(t)
λj(t)=λ0(t)
λ0(t)·eβ0+β1Xi
eβ0+β1Xj
= e(Xi−Xj)β
– constant over time
– only function of covariates
• Can graphically verify the assumption of
proportional hazard.
– Check for parallel lines of SKM(t) on the com-plementary log-log scale
– For the gehan dataset:
> plot(fit, lty=3:4, col=2:3, fun="cloglog",xlab="Remission (weeks)", ylab="log Lambda(t)")
> legend("topleft", c("control", "6-MP"),lty=4:3, col=3:2)
9-21
Verify the Assumptionof Proportional Hazard:
Gehan Study
1 2 5 10 20
−2.
0−
1.5
−1.
0−
0.5
0.0
0.5
1.0
Remission (weeks)
log
Lam
bda(
t)
control6−MP
• Good agreement with the additive structure on thelog-log scale
• The assumption of proportional hazard is plausible
9-22
Accounting for Covariates:Models fot Survival Function
• Also called Accelerated Failure Models
S(t) = S0(t · ex′β)
– S0 is the baseline survival function
– Covariates accelerate or contract the time toevent
– Survival time T0 = T ·ex′β has a fixed distribution
log T0 = log T + X′β, ⇒log T = log T0 −X′β
– Weibull (and its special case Exponential) arethe only distributions that can be simultaneously(and equivalently) specified as proportional haz-ard and accelerated failure models.
9-23
Accelerated Failure:Exponential
• Exponential distribution:
– λ(t) = ρ, S(t) = e−ρ t
– S0(t) = e−t - survival of the standard exponential
• Effect of the covariates:
– S(t) = S0(t · eβ0+β1X), eβ0+β1X = ρ
– If T0 ∼ exp(1), and T are the observed times
T0 = T · eβ0+β1X
log T0 = log T + β0 + β1X, and
log T = −β0 − β1X + log T0
• In R:
log T = β0 + β1X + σ · log ε,
– σ is a scale parameter fixed at σ = 1
– ε ∼ Exp(1)
– use opposite sign of β to estimate survival:
S(t) = S0(t · e−β0−β1X)
9-24
Accelerated Failure:Exponential
> fit.exponential <- survreg(Surv(time,cens) ~ treat,data=gehan, dist="exponential")
> summary(fit.exponential)Value Std. Error z p
(Intercept) 3.69 0.333 11.06 2.00e-28treatcontrol -1.53 0.398 -3.83 1.27e-04
Scale fixed at 1
Exponential distributionLoglik(model)= -108.5 Loglik(intercept only)= -116.8Chisq= 16.49 on 1 degrees of freedom, p= 4.9e-05
• Parameter interpretation:
– −βtreatcontrol = 1.53 > 0
– time until remission is longer for controls
• Predicted survival above T = 10 for trt:
– exp(-10*exp(-3.69))=0.7790189
– comparable to StrtKM(10) = 0.753
9-25
Accelerated Failure: Weibull
• Weibull distribution:
– λ(t) = λp p tp−1 - hazard function
– S(t) = e−(λ t)p - survival function
– Can be viewed as the survival function of theexponential random variable T ′ = T p ∼ Exp(λp)
• Effect of the covariates:
– Identical to the exponential:S(t′) = S0(t′ · eβ0+β1X), eβ0+β1X = λp
– If T0 ∼ exp(1), and T ′ are the observed times
T0 = T ′ · eβ0+β1X
log T0 = log T ′ + β0 + β1X
– Returning to the original notation T ′ = T p:
log T0 = p logT + β0 + β1X
log T = −1
pβ0 −
1
pβ1X +
1
plog T0
9-26
Accelerated Failure: Weibull
• In R Weibull is the default distribution:
log T = β0 + β1X + σ · log ε,
– σ is a scale parameter, σ = 1p, ε ∼ Exp(1)
– use −β/σ to estimate survival:
S(t) = S0(t1/σ · e−β0/σ−β1/σX)
> fit.weibull <- survreg(Surv(time,cens)~treat, data=gehan)
> summary(fit.weibull)Value Std. Error z p
(Intercept) 3.516 0.252 13.96 2.61e-44treatcontrol -1.267 0.311 -4.08 4.51e-05Log(scale) -0.312 0.147 -2.12 3.43e-02
Scale= 0.732Weibull distributionLoglik(model)= -106.6 Loglik(intercept only)= -116.4Chisq= 19.65 on 1 degrees of freedom, p= 9.3e-06
• Predicted survival above T = 10 for trt:
– exp(-10^(1/0.732)*exp(-3.1516/0.732))=0.7308616
– comparable to StrtKM(10) = 0.753
9-27
Variable Selectionand Prediction
• Compare models with and without pair asblocking factor
anova( survreg(Surv(time,cens) ~ treat, data=gehan),survreg(Surv(time,cens)~factor(pair)+treat, data=gehan))
Terms Res.Df -2*LL TestDf Deviance P(>|Chi|)treat 39 213.159 NA NA NA
(pair)+treat 19 181.343 +(pair) 20 31.81597 0.04529
• Prediction for the median survival
– On the linear predictor scale
> fit.weibull.nopairs <-survreg(Surv(time,cens)~treat, data=gehan)
> ?predict.survreg> pred.contr <- predict(fit.weibull.nopairs,
data.frame(treat=’control’), type="uquantile",p=0.5, se=TRUE)
– On the survival function scale
> exp( c(L=pred.contr$fit - 2*pred.contr$se.fit,U=pred.contr$fit + 2*pred.contr$se.fit) )
L.1 U.15.045779 10.396147
9-28
Cox Proportional Hazards
Model
A semi-parametric
approach
9-29
Cox Proportional Hazards(Relative Risk) Model
• Assume:
– Ci and Ti conditionally independent, given Xi.
– Observe k distinct exact failure times(i.e. δi = 1); t1 < t2 < · · · < tk.
– No ties in observed exact failure times(i.e., if δi = δj = 1, then yi 6= yj for i 6= j).
• Define the risk set at time t
R(t) = {i : yi ≥ t} = {individuals alive right before time t}
• Cox (1975) considered the conditional prob-ability
– Probability of event in a small interval aroundtj, given that the individual is in the risk set
P (ind. i failed at [tj, tj + ∆) | i ∈ R(tj)) ≈λ(tj)∆∑
k∈R(tj)λ(tj)∆
9-30
Partial Likelihood Approachto Estimate β (Cox, 1975)
• With covariate Xi(t), assume a proportional hazardsmodel
λ(t) = λ0(t) expXi(t)′β(t)
where λ0(t) is a baseline hazard and eX′iβ(t) is the rel-
ative risk. Treating λ0(t) as a nuisance parameter,one may estimate β(t) by maximizing the partiallikelihood
L(β) =∏j
expXi(tj)′β(tj)∑l∈R(tj)
expXl(t)′β(tj),
where the i(j)th item fails at tj and R(tj) is the riskset at tj.
• Conditional on the history up to tj and the fact thatone item fails at tj, each term within the productis proportional to the likelihood of a multinomialmodel.
• λ0(t) is not specified, which is the non-parametricpart of the model.
• Xl(t)′β(tj) is the parametric part of the model.
9-31
Partial Likelihood Approach toEstimate β (Cox, 1975)
• The partial likelihood function by Cox (1975) is
L(β) =nobs∏i=1
eX′iβ∑
k∈R(tj)eX
′kβ
=n∏i=1
{eX
′iβ∑
k∈R(yi)eX
′kβ
}δi
— Cox argued that the partial likelihood has allproperties of an usual likelihood, e.g., maximizingit for an optimal β =⇒ maximum partial likelihoodestimator (MPLE) β
• Breslow’s Estimator of the Baseline Cumulative Haz-ard Rate:
Λ0(t) =∑j:tj≤t
1∑i∈R(t) e
X′jβ=∑i:yi≤t
δi∑j∈R(yi)
eX′jβ
• There are different ways to relax Assumption 3to handle tied failure times, e.g., exact method,Efron’s method, and Breslow’s method.
Method R Options CommentExact method="exact" accurate, long
Efron’s method="efron" approximate, betterBreslow’s method="breslow" approximate
9-32
Cox Proportional Hazard
• Use exact method to account for ties
• β > 0 as expected
> fit.cox1 <-coxph(Surv(time,cens)~treat,method="exact", data=gehan)
> summary(fit.cox1)
n= 42, number of events= 30coef exp(coef) se(coef) z Pr(>|z|)
treatcontrol 1.6282 5.0949 0.4331 3.759 0.000170 ***
exp(coef) exp(-coef) lower .95 upper .95treatcontrol 5.095 0.1963 2.18 11.91
Rsquare= 0.321 (max possible= 0.98 )Likelihood ratio test= 16.25 on 1 df, p=5.544e-05Wald test = 14.13 on 1 df, p=0.0001704Score (logrank) test = 16.79 on 1 df, p=4.169e-05
9-33
Cox Proportional Hazard
• Add pair as block
> fit.cox2 <- coxph(Surv(time,cens)~treat+factor(pair),method="exact", data=gehan)
> summary(fit.cox2)coef exp(coef) se(coef) z Pr(>|z|)
treatcontrol 3.314679 27.513571 0.742620 4.463 8.06e-06 ***factor(pair)2 -5.015219 0.006636 1.550131-3.235 0.001215 **factor(pair)3 -3.598195 0.027373 1.547371-2.325 0.020053 *
.................................................Rsquare= 0.662 (max possible= 0.98 )Likelihood ratio test= 45.51 on 21 df, p=0.001484Wald test = 27.42 on 21 df, p=0.1573Score (logrank) test = 39.73 on 21 df, p=0.008023
• LR test compares log(partial likelihoods),
but the test has similar properties
> anova(fit.cox1, fit.cox2, test="Chisq")Analysis of Deviance TableCox model: response is Surv(time, cens)Model 1: ~ treatModel 2: ~ treat + factor(pair)
loglik Chisq Df P(>|Chi|)1 -74.5432 -59.915 29.256 20 0.08283 .
9-34
Visualize Model Fit
• Survival curve and CI for an individual withaverage covariates– K-M curves refer to unadjusted populations, Cox
curves refer to an ’average’ patient
– can make survival curves for the Cox model forspecific values of covariates by providing newdata
>plot(survfit(fit.cox1),lty=2:3,xlab="Remission (weeks)",ylab="Survival", main="Gehan", cex=1.5)
0 5 10 15 20 25 30 35
0.0
0.2
0.4
0.6
0.8
1.0
Gehan
Remission (weeks)
Sur
viva
l
9-35
Model Diagnostics
• Cox-Snell residuals are most useful for examiningthe overall fit of a model
rc,i = − log[S(yi|xi)] = Λ(yi|xi) = Λ0(yi)eX′iβ
— Λ0(t) is estimated by the Breslow’s estimator.
— If the estimates Λ0(t) and β were accurate, {(rc,i, δi), i =1, · · · , n} are right-censored observations of Exponential(1).
— For the right-censored data {(rc,i, δi), i = 1, · · · , n},construct the Nelson-Aalen estimator Λ(t) and plotΛ(t) vs. t.
• Martingale residuals can be used to determine thefunctional form of a covariate
rm,i = δj − rc,j
— Fit the Cox model with all covariates except theone of interest, and plot the martingale residualsagainst the covariate of interest.
• Deviance residuals are approximately symmetricallydistributed about zero and large values may indicateoutliers.
rd,i = sign(rm,i)√−2[rm,i + δi log(rc,j)]
9-36
top related