kernel estimation as alternative to (semi)parametric

Post on 10-Dec-2021


Introduction Kernel estimates Simulation study Real data Conclusion

Kernel estimation as alternative to (semi)parametric models in survival analysis

Iveta Selingerová, Stanislav Katina, Ivana Horová

Department of Mathematics and Statistics
Faculty of Science, Masaryk University
Brno

ROBUST 2016
11-16 September 2016

Outline

1 Introduction
   - Survival analysis
   - Kernel smoothing

2 Modeling hazard function using kernel method
   - Two types of kernel estimates of conditional hazard function
   - Cox proportional hazard model
   - Bandwidth selection

3 Simulation study
   - Weibull model
   - Lognormal model
   - Cox model
   - General hazard function

4 Real data
   - Triple negative breast cancer

5 Conclusion

Survival analysis

- Survival time $T$ with distribution function $F$
- Random right censoring
- Censoring time $C$ with distribution function $G$
- Observed time $Y = \min(T, C)$ with distribution function $L$, censoring indicator $\delta = I(T \le C)$
- Survival function $\bar F(t) = P(T \ge t) = 1 - F(t)$
- Hazard function

$$\lambda(t) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{\bar F(t)}$$

- Cumulative hazard function $\Lambda(t) = \int_0^t \lambda(u)\,du$
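The hazard and the survival function determine each other. A short derivation, assuming $T$ is continuous with $\bar F(0) = 1$ (an assumption made for this sketch, not stated on the slide), gives the relation that the simulation study relies on when survival times are generated from a specified hazard:

```latex
\begin{align*}
\lambda(t) &= \frac{f(t)}{\bar F(t)} = -\frac{d}{dt}\log \bar F(t), \\
\Lambda(t) &= \int_0^t \lambda(u)\,du = -\log \bar F(t), \\
\bar F(t) &= e^{-\Lambda(t)}, \qquad F(t) = 1 - e^{-\int_0^t \lambda(u)\,du}.
\end{align*}
```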

Properties

- Survival function of observed time $\bar L(t) = \bar F(t)\,\bar G(t)$
- Joint density of $(Y, \delta)$ is $l(\cdot, \cdot)$ and is composed of two "subdensities" $l(\cdot, 0)$ and $l(\cdot, 1)$
- Subdensity of the event time $r(t) = l(t, 1) = f(t)\,\bar G(t)$
- Hazard function

$$\lambda(t) = \frac{r(t)}{\bar L(t)}$$

Nonparametric estimation

- Kaplan-Meier estimator of the survival function

$$\hat{\bar F}_{KM}(t) = \begin{cases} 1, & t \le Y_{(1)}, \\ \prod\limits_{i:\,Y_{(i)} < t} \left(\dfrac{n-i}{n-i+1}\right)^{\delta_{(i)}}, & t > Y_{(1)}. \end{cases}$$

- Nelson-Aalen estimator of the cumulative hazard

$$\hat\Lambda_{NA}(t) = \begin{cases} 0, & t \le Y_{(1)}, \\ \sum\limits_{i:\,Y_{(i)} < t} \dfrac{\delta_{(i)}}{n-i+1}, & t > Y_{(1)}. \end{cases}$$

- Smoothing techniques for estimation of hazard function $\lambda(t)$
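As a minimal sketch of the two estimators above, the following Python function evaluates both at a single time point. The function name `km_and_na` is hypothetical, and the code assumes there are no tied observation times, so that rank $i$ corresponds to $n-i+1$ subjects at risk.

```python
import numpy as np

def km_and_na(y, delta, t):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard at time t.

    Follows the strict-inequality convention of the slide formulas
    (product and sum over ordered observations with Y_(i) < t).
    Assumes no tied observation times.
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    order = np.argsort(y, kind="stable")
    y_sorted, d_sorted = y[order], delta[order]
    n = len(y_sorted)
    i = np.arange(1, n + 1)        # ranks 1..n of the ordered sample
    at_risk = n - i + 1            # subjects still at risk just before Y_(i)
    mask = y_sorted < t
    surv = np.prod(((n - i[mask]) / at_risk[mask]) ** d_sorted[mask])
    cumhaz = np.sum(d_sorted[mask] / at_risk[mask])
    return surv, cumhaz

# Four subjects; the third ordered observation is censored.
surv, cumhaz = km_and_na([2.0, 1.0, 3.0, 4.0], [1, 1, 0, 1], t=3.5)
```

Both estimators are step functions of $t$, which is why the deck turns to smoothing techniques to obtain an estimate of the hazard itself.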

Survival models

- $X = (X_1, X_2, \dots, X_p)^T$ vector of $p$ covariates
- Conditional hazard function

$$\lambda(t|x) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t \mid T \ge t, X = x)}{\Delta t}$$

- Parametric survival model
   - Assumption of distribution of survival time
   - Accelerated failure time (AFT) models, parametric proportional hazard (PH) models
- Semiparametric survival model: Cox model
   - Exponential dependence on covariates
   - Assumption of proportional hazards (hazard ratio constant over time)
   - Nonparametric estimate of baseline hazard
- Nonparametric survival model
   - No assumption of distribution or dependence on covariates
   - Smoothing techniques

Kernel smoothing

- The value of an unknown function at a point is estimated as a local weighted average of known observations in the neighbourhood of this point
- Kernel $K$: real function with support on $[-1, 1]$ satisfying

$$\int_{-1}^{1} x^j K(x)\,dx = \begin{cases} 1, & j = 0, \\ 0, & 0 < j < k, \\ b_k \ne 0, & j = k. \end{cases}$$

- Epanechnikov kernel $K(x) = -\frac{3}{4}(x^2 - 1)\, I_{[-1, 1]}(x)$
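The moment conditions can be checked numerically for the Epanechnikov kernel. The sketch below uses a plain midpoint rule; the helper names `epanechnikov` and `kernel_moment` are illustrative, not taken from the slides.

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel K(x) = 3/4 (1 - x^2) on [-1, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def kernel_moment(kernel, j, m=200_000):
    """Midpoint-rule approximation of the j-th moment of `kernel` on [-1, 1]."""
    edges = np.linspace(-1.0, 1.0, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(mid**j * kernel(mid)) * (2.0 / m)

# Moments of order 0, 1 and 2
moments = [kernel_moment(epanechnikov, j) for j in range(3)]
```

For this kernel the exact moments are 1, 0 and 1/5, so it satisfies the conditions with $k = 2$ and $b_2 = 1/5$.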

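Kernel smoothing

- The bandwidth sequence $\{h(n)\}$

$$\lim_{n \to \infty} h(n) = 0, \qquad \lim_{n \to \infty} n\,h(n) = \infty$$

- Statistical properties of an estimator $\hat\vartheta(x)$ of a function $\vartheta(x)$
- Local error of an estimator

$$\mathrm{MSE}\big(\hat\vartheta(x)\big) = E\big(\hat\vartheta(x) - \vartheta(x)\big)^2 = \mathrm{Var}\big(\hat\vartheta(x)\big) + \mathrm{Bias}^2\big(\hat\vartheta(x)\big)$$

- Global error of an estimator

$$\mathrm{MISE}\big(\hat\vartheta\big) = \int \mathrm{MSE}\big(\hat\vartheta(x)\big)\,\omega(x)\,dx$$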
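The variance-bias decomposition of the local error follows in two lines. In this sketch $m(x) = E\,\hat\vartheta(x)$ is notation introduced only for the derivation; the cross term vanishes because $E\big(\hat\vartheta(x) - m(x)\big) = 0$.

```latex
\begin{align*}
E\big(\hat\vartheta(x) - \vartheta(x)\big)^2
  &= E\big(\hat\vartheta(x) - m(x)\big)^2
     + 2\big(m(x) - \vartheta(x)\big)\,E\big(\hat\vartheta(x) - m(x)\big)
     + \big(m(x) - \vartheta(x)\big)^2 \\
  &= \mathrm{Var}\big(\hat\vartheta(x)\big) + \mathrm{Bias}^2\big(\hat\vartheta(x)\big).
\end{align*}
```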

Bandwidths

[Figure: surface plot with axes $t$, $h$ and $\lambda(t)$]

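Two types of kernel conditional hazard estimates

External estimator

$$\bar L(t|x) = \bar F(t|x)\,\bar G(t|x)$$

$$\lambda(t|x) = \frac{f(t|x)}{\bar F(t|x)} = \frac{f(t|x)\,\bar G(t|x)}{\bar L(t|x)} = \frac{r(t|x)}{\bar L(t|x)}$$

$$\hat\lambda_E(t|x) = \frac{\hat r(t|x)}{\hat{\bar L}(t|x)} = \frac{\dfrac{1}{h_t}\sum_{i=1}^{n} w_i(x)\,K\!\left(\dfrac{t - Y_i}{h_t}\right)\delta_i}{\sum_{i=1}^{n} w_i(x)\,W\!\left(\dfrac{Y_i - t}{h_t}\right)}$$

Nadaraya–Watson weights

$$w_i(x) = \frac{K\!\left(\dfrac{x - X_i}{h_x}\right)}{\sum_{j=1}^{n} K\!\left(\dfrac{x - X_j}{h_x}\right)}, \quad i = 1, \dots, n$$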
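A minimal sketch of the external estimator for a single covariate follows. The slide does not define $W$; the code assumes it is the integrated kernel $W(u) = \int_{-1}^{u} K(v)\,dv$, and it uses the Epanechnikov kernel for both smoothing directions. All function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def epanechnikov_cdf(u):
    """Integral of the Epanechnikov kernel from -1 to u (assumed form of W)."""
    u = np.clip(np.asarray(u, dtype=float), -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def nw_weights(x, X, hx):
    """Nadaraya-Watson weights w_i(x) for covariate values X."""
    k = epanechnikov((x - X) / hx)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no covariate values within hx of x")
    return k / total

def external_hazard(t, x, X, Y, delta, hx, ht):
    """External kernel estimate of lambda(t|x): smoothed r(t|x) over smoothed L-bar(t|x)."""
    X, Y, delta = (np.asarray(a, dtype=float) for a in (X, Y, delta))
    w = nw_weights(x, X, hx)
    numer = np.sum(w * epanechnikov((t - Y) / ht) * delta) / ht
    denom = np.sum(w * epanechnikov_cdf((Y - t) / ht))
    return numer / denom if denom > 0 else float("nan")

# Toy data: identical covariates give uniform weights 1/n.
lam = external_hazard(2.0, 0.0, X=[0, 0, 0, 0], Y=[1, 2, 3, 4],
                      delta=[1, 1, 1, 1], hx=1.0, ht=0.5)
```

The denominator is a smoothed count of the weighted observations with $Y_i \ge t$, which is what makes it an estimate of $\bar L(t|x)$.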

Two types of kernel conditional hazard estimates

Internal estimator

- Beran estimator of conditional cumulative hazard function

$$\hat\Lambda(t|x) = \begin{cases} 0, & t \le Y_{(1)}, \\ \sum\limits_{i:\,Y_{(i)} < t} \dfrac{\delta_{(i)}\,w_{(i)}(x)}{1 - \sum_{j=1}^{i-1} w_{(j)}(x)}, & t > Y_{(1)}. \end{cases}$$

$$\hat\lambda_I(t|x) = \frac{1}{h_t}\int K\!\left(\frac{t - u}{h_t}\right) d\hat\Lambda(u|x) = \frac{1}{h_t}\sum_{i=1}^{n} K\!\left(\frac{t - Y_{(i)}}{h_t}\right) \frac{\delta_{(i)}\,w_{(i)}(x)}{1 - \sum_{j=1}^{i-1} w_{(j)}(x)}$$

- $h_t$ influences smoothing in the direction of time
- $h_x$ influences smoothing in the direction of the covariate
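The internal estimator can be sketched the same way: sort by observed time, form the Beran increments, then smooth them with the kernel in the time direction. The names below are hypothetical, and the Epanechnikov kernel is assumed for both $h_x$ and $h_t$.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def internal_hazard(t, x, X, Y, delta, hx, ht):
    """Internal kernel estimate of lambda(t|x): kernel-smoothed Beran increments."""
    X, Y, delta = (np.asarray(a, dtype=float) for a in (X, Y, delta))
    order = np.argsort(Y, kind="stable")
    Xs, Ys, ds = X[order], Y[order], delta[order]
    k = epanechnikov((x - Xs) / hx)
    if k.sum() == 0.0:
        raise ValueError("no covariate values within hx of x")
    w = k / k.sum()                                   # Nadaraya-Watson weights w_(i)(x)
    # 1 - sum_{j<i} w_(j)(x): weight still at risk just before Y_(i)
    remaining = 1.0 - np.concatenate(([0.0], np.cumsum(w)[:-1]))
    jumps = np.zeros_like(w)
    ok = remaining > 1e-12                            # guard against division by ~0
    jumps[ok] = ds[ok] * w[ok] / remaining[ok]        # jumps of the Beran estimator
    return np.sum(epanechnikov((t - Ys) / ht) * jumps) / ht

# Same toy data as a uniform-weight check: jumps are 1/4, 1/3, 1/2, 1.
lam = internal_hazard(2.0, 0.0, X=[0, 0, 0, 0], Y=[1, 2, 3, 4],
                      delta=[1, 1, 1, 1], hx=1.0, ht=0.5)
```

With uniform weights the Beran increments reduce to the Nelson-Aalen increments $\delta_{(i)}/(n-i+1)$, so the estimator becomes an ordinary kernel hazard estimate without covariates.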

Cox proportional hazards model

- Cox model

$$\lambda(t|x) = \lambda_0(t)\,e^{\beta x}$$

- Properties and assumptions:
   - Hazard ratios are constant over time, i.e.

$$\frac{\lambda(t|x_1)}{\lambda(t|x_2)} = e^{\beta(x_1 - x_2)}$$

   - The dependence on covariate is exponential
   - No distribution of the survival time is assumed
- Maximum partial likelihood estimator of the model parameter $\beta$

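Cox proportional hazards model

- Breslow estimator of cumulative baseline hazard function

$$\hat\Lambda_0(t) = \sum_{i:\,t_{(i)} < t} \hat\lambda_0(t_{(i)}) = \sum_{i:\,t_{(i)} < t} \frac{d_i}{\sum_{j \in R_i} e^{\hat\beta X_j}}$$

- Kernel estimator of baseline hazard function

$$\hat\lambda_0(t) = \frac{1}{h}\int K\!\left(\frac{t - u}{h}\right) d\hat\Lambda_0(u) = \frac{1}{h}\sum_{i=1}^{n} K\!\left(\frac{t - Y_i}{h}\right) \frac{\delta_i}{\sum_{j=1}^{n} I(Y_j \ge Y_i)\,e^{\hat\beta X_j}}$$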
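The kernel estimator of the baseline hazard can be sketched as below. The coefficient `beta` is taken as given (in practice it would come from a partial likelihood fit, which is not shown here), and the function name and the choice of the Epanechnikov kernel are assumptions of this sketch.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_baseline_hazard(t, X, Y, delta, beta, h):
    """Kernel-smoothed Breslow increments for a single covariate and known beta."""
    X, Y, delta = (np.asarray(a, dtype=float) for a in (X, Y, delta))
    risk = np.exp(beta * X)
    # at_risk[i, j] = I(Y_j >= Y_i): subject j is in the risk set at time Y_i
    at_risk = (Y[None, :] >= Y[:, None]).astype(float)
    denom = at_risk @ risk                     # sum over the risk set of exp(beta * X_j)
    return np.sum(epanechnikov((t - Y) / h) * delta / denom) / h

# With beta = 0 the denominators are simply the risk-set sizes 4, 3, 2, 1.
lam0 = kernel_baseline_hazard(2.0, X=[0.5, 1.0, 1.5, 2.0], Y=[1, 2, 3, 4],
                              delta=[1, 1, 1, 1], beta=0.0, h=0.5)
```

The estimated conditional hazard under the Cox model is then `lam0 * np.exp(beta * x)` for a covariate value `x`.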

Bandwidth selection

- Asymptotically optimal bandwidths
   - Minimize MISE
   - Theoretical value of bandwidths
   - Need to know distribution of survival time, observed time, or covariate

Bandwidth selection

- Cross-validation method

$$CV(h_x, h_t) = \frac{1}{n}\sum_{i=1}^{n} \int \hat\lambda^3(t|X_i)\, e^{-\int_0^t \hat\lambda(u|X_i)\,du}\,dt \;-\; \frac{2}{n}\sum_{i=1}^{n} \frac{\hat\lambda_{-i}^2(Y_i|X_i)}{\hat{\bar L}(Y_i|X_i)}\, e^{-\int_0^{Y_i} \hat\lambda_{-i}(u|X_i)\,du}\,\delta_i$$

$$(h_{x,CV}, h_{t,CV}) = \arg\min_{(h_x, h_t)} CV(h_x, h_t)$$

[Figure: surface of $CV(h_x, h_t)$ over $h_t$ and $h_x$]

Bandwidth selection

- Maximum likelihood method

$$ML(h_x, h_t) = \prod_{i=1}^{n} \hat\lambda_{-i}^{\delta_i}(Y_i|X_i)\, \hat{\bar F}_{-i}(Y_i|X_i)$$

$$(h_{x,ML}, h_{t,ML}) = \arg\max_{(h_x, h_t)} ML(h_x, h_t)$$

[Figure: surface of $ML(h_x, h_t)$ over $h_t$ and $h_x$; vertical axis scaled by $10^{-207}$]
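One way the maximum likelihood criterion could be evaluated is sketched below. Several choices here are assumptions rather than the authors' implementation: the leave-one-out hazard uses the internal estimator, the leave-one-out survival function is taken as $\exp(-\hat\Lambda_{-i})$ with the Beran estimator, and the logarithm of $ML$ is maximized over a grid instead of the product itself (same maximizer, but it avoids the underflow suggested by the $10^{-207}$ scale of the plot). All function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def beran_increments(x, X, Y, delta, hx):
    """Sorted times and jumps of the Beran cumulative hazard at covariate value x."""
    order = np.argsort(Y, kind="stable")
    Xs, Ys, ds = X[order], Y[order], delta[order]
    k = epanechnikov((x - Xs) / hx)
    if k.sum() == 0.0:                       # no neighbours within hx
        return Ys, np.zeros_like(Ys)
    w = k / k.sum()
    remaining = 1.0 - np.concatenate(([0.0], np.cumsum(w)[:-1]))
    jumps = np.zeros_like(w)
    ok = remaining > 1e-12
    jumps[ok] = ds[ok] * w[ok] / remaining[ok]
    return Ys, jumps

def log_ml(hx, ht, X, Y, delta, floor=1e-300):
    """Leave-one-out log-likelihood: sum of delta_i*log(lambda_-i) - Lambda_-i."""
    X, Y, delta = (np.asarray(a, dtype=float) for a in (X, Y, delta))
    n = len(Y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i             # drop observation i
        Ys, jumps = beran_increments(X[i], X[keep], Y[keep], delta[keep], hx)
        lam = np.sum(epanechnikov((Y[i] - Ys) / ht) * jumps) / ht
        cum = np.sum(jumps[Ys < Y[i]])       # Beran cumulative hazard at Y_i
        total += delta[i] * np.log(max(lam, floor)) - cum
    return total

def select_bandwidths(X, Y, delta, hx_grid, ht_grid):
    """Grid search for the pair (hx, ht) maximizing the log-likelihood."""
    scores = np.array([[log_ml(hx, ht, X, Y, delta) for ht in ht_grid]
                       for hx in hx_grid])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return hx_grid[i], ht_grid[j], scores
```

The `floor` argument only keeps the logarithm finite when the leave-one-out hazard estimate is zero at an observed event; such bandwidth pairs still receive a heavily penalized score.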

Simulation study

- 200 triples of observed data $(X_i, Y_i, \delta_i)$
- 100 samples
- Specified conditional hazard function $\lambda(t|x)$
- Covariate $X_i \sim U(0, 10)$
- Survival time $T_i$ obtained as $T_i = F^{-1}(U_i|X_i)$, where $U_i \sim U(0, 1)$ and $F(t|x) = 1 - e^{-\int_0^t \lambda(u|x)\,du}$
- Censoring time $C_i \sim \log N(\mu, 0.2^2)$, where $\mu$ influences censoring rate (30%, 60% or 90%)
- Observed time $Y_i = \min(T_i, C_i)$
- Censoring indicator $\delta_i = 1$ for $Y_i = T_i$ and $\delta_i = 0$ otherwise
- Error between true and estimated conditional hazard function measured by

$$ASE(\hat\lambda) = \frac{1}{n}\sum_{i=1}^{n} \big(\lambda(Y_i|X_i) - \hat\lambda(Y_i|X_i)\big)^2$$

Weibull model

$$\lambda(t|x) = \nu\mu\,t^{\mu - 1} e^{\beta x}$$

- $\nu = 0.018$, $\mu = 1.3$, $\beta = 0.2$
- The interpretation using PH or AFT model

[Figure: surface of $\lambda(t|x)$ over $t$ and $x$]
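The data-generating steps of the simulation study can be sketched for this Weibull hazard, where the cumulative hazard $\nu t^{\mu} e^{\beta x}$ can be inverted in closed form. The function names and the censoring location `cens_mu=2.0` are hypothetical; the slides do not report the values of the censoring parameter that produce the 30%, 60% and 90% rates.

```python
import numpy as np

def simulate_weibull(n, nu, mu, beta, cens_mu, rng):
    """One simulated sample (X_i, Y_i, delta_i) under the Weibull hazard."""
    X = rng.uniform(0.0, 10.0, size=n)
    U = rng.uniform(0.0, 1.0, size=n)
    # F(t|x) = 1 - exp(-nu * t**mu * exp(beta*x)); solve F(T|X) = U for T
    T = (-np.log1p(-U) / (nu * np.exp(beta * X))) ** (1.0 / mu)
    C = rng.lognormal(mean=cens_mu, sigma=0.2, size=n)   # logN(cens_mu, 0.2^2)
    Y = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return X, Y, delta

def weibull_hazard(t, x, nu, mu, beta):
    """True conditional hazard lambda(t|x) = nu*mu*t^(mu-1)*exp(beta*x)."""
    return nu * mu * t ** (mu - 1.0) * np.exp(beta * x)

def ase(true_hazard, est_hazard):
    """Average squared error between true and estimated hazards at the data points."""
    return np.mean((np.asarray(true_hazard) - np.asarray(est_hazard)) ** 2)

rng = np.random.default_rng(1)
X, Y, delta = simulate_weibull(200, nu=0.018, mu=1.3, beta=0.2, cens_mu=2.0, rng=rng)
true_lam = weibull_hazard(Y, X, nu=0.018, mu=1.3, beta=0.2)
censoring_rate = 1.0 - delta.mean()
```

Any estimator of $\lambda(t|x)$ evaluated at the pairs $(Y_i, X_i)$ can then be passed to `ase` together with `true_lam`; varying `cens_mu` changes `censoring_rate`.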

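Lognormal model

$$\lambda(t|x) = \frac{\dfrac{1}{\sigma t}\,\varphi\!\left(\dfrac{\log t - \beta x}{\sigma}\right)}{\Phi\!\left(\dfrac{\beta x - \log t}{\sigma}\right)}$$

- $\sigma = 0.5$, $\beta = \frac{1}{3}$
- The interpretation using AFT model: $\ln T_i = \beta X_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$

[Figure: surface of $\lambda(t|x)$ over $t$ and $x$]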

Cox model

$$\lambda(t|x) = \frac{1}{15e}\left(1.5 - \cos\!\left(\frac{t}{1.7} + 1\right)\right) e^{\beta x}$$

- $\beta = 0.2$
- $\lambda_0(t) = \frac{1}{15e}\left(1.5 - \cos\!\left(\frac{t}{1.7} + 1\right)\right)$

[Figure: surface of $\lambda(t|x)$ over $t$ and $x$]

General hazard function

$$\lambda(t|x) = \frac{1}{40000}\left(t\,(t - 25)^2 + 200\right) \times \left(\sin(0.7x - 5) + 2\right)$$

[Figure: surface of $\lambda(t|x)$ over $t$ and $x$]

Results of simulation study

- If the assumptions of the (semi)parametric models are satisfied, these models give slightly better results
- The differences between models diminish with increasing censoring rate
- The differences between kernel estimates and parametric models are more pronounced than those between kernel estimates and the Cox model, owing to the stronger assumptions of the parametric models
- The simulation of a general conditional hazard function (violated assumptions of the (semi)parametric models) shows kernel estimates to be preferable
- In practice, the time distribution and the dependence on covariates are not known ⇒ a general conditional hazard is the most frequent situation

Triple negative breast cancer

- 408 patients diagnosed and/or treated at Masaryk Memorial Cancer Institute in Brno in the period 2004–2011
- 100 deaths (82 deaths in the first 4 years from diagnosis, ∼80% censoring rate)
- The age at diagnosis varies between 25 and 88 years (average 55 years)
- Is survival time affected by patient age? (Testing the hypothesis $\beta = 0$ versus $\beta \ne 0$)

[Figure, two panels: "Patients' times" (x-axis Time (months), y-axis Patient; legend: death, censored) and "Death rate"]

Triple negative breast cancer

[Figure: three panels, one per model below, each plotted over Time (months) and Age (years)]

- Weibull model, p-value = 0.129
   - $\beta = 0.0128$, $\nu = 0.0005$, $\mu = 1.41$
   - PH interpretation: an increase in age of one year increases the risk of death 1.0128 times
   - AFT interpretation: an increase in age of one year shortens survival time by a factor of 0.9910
- Lognormal model, p-value = 0.045
   - $\beta = -0.0126$, $\sigma = 1.290$, $\mu = 5.616$
   - AFT interpretation: an increase in age of one year shortens survival time by a factor of 0.9875
- Cox model, p-value = 0.127
   - $\beta = 0.0126$
   - PH interpretation: an increase in age of one year increases the risk of death 1.0127 times
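The multiplicative effects quoted for the three models follow from the fitted coefficients. The sketch below assumes the Weibull model is parameterized as in the simulation study, $\lambda(t|x) = \nu\mu t^{\mu-1}e^{\beta x}$, so that the survival function is $\exp(-\nu (t\,e^{\beta x/\mu})^{\mu})$ and the AFT time ratio per unit of $x$ is $e^{-\beta/\mu}$; the function names are hypothetical.

```python
import math

def hazard_ratio(beta):
    """PH interpretation: hazard multiplier per one-unit increase of the covariate."""
    return math.exp(beta)

def weibull_time_ratio(beta, mu):
    """AFT interpretation of a Weibull PH coefficient beta with shape mu."""
    return math.exp(-beta / mu)

def lognormal_time_ratio(beta):
    """AFT interpretation for ln T = beta * X + eps: survival-time multiplier."""
    return math.exp(beta)

weibull_hr = hazard_ratio(0.0128)                # Weibull model, PH reading
weibull_tr = weibull_time_ratio(0.0128, 1.41)    # Weibull model, AFT reading
lognormal_tr = lognormal_time_ratio(-0.0126)     # lognormal model
cox_hr = hazard_ratio(0.0126)                    # Cox model
```

Computed from the coefficients as printed, these agree with the values on the slide to about four decimal places; small differences in the last digit are to be expected because the printed coefficients are themselves rounded.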

Triple negative breast cancer

[Figure, two panels: "External kernel estimate" and "Internal kernel estimate", each plotted over Time (months) and Age (years)]

- High risk of death for patients over 70 years and increased risk of death for patients up to 40 years
- Slightly different shape of hazard function for various age groups
- The estimates offer the possibility of dividing patients into groups with similar risk of death (up to 40, 41-70, over 70 years)

Conclusion

- In practice, the theoretical shape of the conditional hazard function or the distribution of the survival time is not known and the assumptions of the PH model are often violated, so kernel estimates are a very helpful alternative
- The kernel estimates of the conditional hazard function can be useful for verifying assumptions of the (semi)parametric models and for finding thresholds of continuous variables
- The kernel estimates are able to capture any changes in the hazard function in the direction of covariate and time
- The kernel methods are able to estimate different shapes of the hazard function without any constraints and smooth them out
- The kernel estimation produces functions more useful for presentation

Conclusion

                        Advantage                            Disadvantage
Kernel estimates        Flexibility                          Complexity of calculating
                        Visualization                        Estimate influenced by choice of bandwidth
                        No assumptions
                        Finding groups with similar risk
Parametric model        Easy interpretation                  Inadequate results can be obtained where
                        Quality of estimate for known        assumptions of models are violated
                        time distribution
Semiparametric model    Easy interpretation
                        No distribution assumption


Thank you for your attention!