STT520-420: BIOSTATISTICS ANALYSIS
Dr. Cuixian Chen
Chapter 8: Fitting Parametric Regression Models
STT520-420 1
Leukemia Remission Time with PHM and SAS
SAS has a procedure (PROC PHREG) that easily estimates the β's in the proportional hazards model (PHM).
With the PHM, use SAS code to estimate the β's for the remission-time data.
Q: Is the group effect significant?
Hypothesis Testing
We can state the null hypothesis in three equivalent ways:
  H0: β = 0;
  H0: the hazard functions of the two groups are the same;
  H0: the survival functions of the two groups are the same.
(For more than two groups, we can use the CLASS statement in procedures that support it.)
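As a minimal sketch of the CLASS approach for more than two groups: the data set `trial3` and its variables `time`, `status`, and `arm` (a three-level treatment variable) are hypothetical, and the CLASS statement in PROC PHREG assumes SAS 9.2 or later.

```sas
/* Hypothetical data set trial3:
   time   = survival time
   status = 0 if censored, 1 if the event was observed
   arm    = treatment group with three levels (e.g., A, B, C) */
proc phreg data=trial3;
  class arm / param=ref ref=first;   /* reference-cell coding; first level is the reference */
  model time*status(0) = arm;        /* one beta per non-reference level */
run;
```

With reference coding, each β compares one group with the reference group, and the global tests have d.f. equal to the number of groups minus 1.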
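Then hypothesis tests about β can be based on any of the 3 statistics below.
To test H0: β = β0, use one of:
Wald test: $\left(\frac{\hat\beta-\beta_0}{SE(\hat\beta)}\right)^2 = (\hat\beta-\beta_0)^2\, I(\hat\beta) \overset{a}{\sim} \chi^2_1$
LR test: $-2\log_e\!\left(\frac{L(\beta_0)}{L(\hat\beta)}\right) \overset{a}{\sim} \chi^2_1$
Score test: $\frac{U(\beta_0)^2}{I(\beta_0)} \overset{a}{\sim} \chi^2_1$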
Review: Parametric Survival Models (chap 7)
There are three statistics we can conveniently compute to test H0: β = 0 for each model:
  the Wald statistic, which is the estimator (beta-hat) divided by the standard error of the estimator;
  the likelihood ratio (LR) statistic, which is used to compare models;
  the score statistic.
Chap 9.4: Cox Model Fitting
Notice that in each of the three printouts, there is a section giving the values of three test statistics for the so-called “Global Null Hypothesis: β = 0”. In this case, β = 0 refers to the vector of all the betas:
$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
The likelihood ratio chi-square statistic is obtained by subtracting the two -2 LOG L statistics (the one without covariates {no x's} minus the one with covariates). If the null hypothesis is true, this chi-square has d.f. = number of covariates in the model.
The same difference in -2 LOG L can be used to compare any two nested models: the statistic is chi-square with d.f. equal to the difference in the number of covariates, assuming the null hypothesis that the “extra” betas = 0 is true.
Chap 9.4: Cox Model Fitting: likelihood ratio test
Let's use the LR test to compare models (see p. 179-180), using the notation there:
The likelihood ratio test of the current model (p parameters) is
$-2\log_e(\hat L_0/\hat L_1) \sim \chi^2_p$ for testing H0: all parameters = 0.
The likelihood ratio test of the full model (p + q parameters) is
$-2\log_e(\hat L_0/\hat L_2) \sim \chi^2_{p+q}$ for testing H0: all parameters = 0.
Their difference (the full-model statistic minus the current-model statistic) is asymptotically chi-square with q d.f. and may be used to test whether the additional q parameters in the full model are zero.
This difference is called the deviance:
$D = -2\left[\log_e(\hat L_1) - \log_e(\hat L_2)\right]$
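As a short worked step (notation assumed here: $\hat L_0$, $\hat L_1$, $\hat L_2$ are the maximized likelihoods of the null, current, and full models), subtracting the two LR statistics cancels the null-model term:

```latex
\begin{align*}
D &= \bigl[-2\log_e(\hat L_0/\hat L_2)\bigr] - \bigl[-2\log_e(\hat L_0/\hat L_1)\bigr] \\
  &= -2\log_e \hat L_0 + 2\log_e \hat L_2 + 2\log_e \hat L_0 - 2\log_e \hat L_1 \\
  &= -2\bigl[\log_e \hat L_1 - \log_e \hat L_2\bigr]
     \;\sim\; \chi^2_q \quad \text{(asymptotically, under } H_0\text{)}.
\end{align*}
```

So D is simply the difference of the two -2 LOG L values that SAS prints: the value for the current model minus the value for the full model.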
Chap 9.4: Cox Model Comparison: likelihood ratio
Leukemia Remission Time with PHM and SAS
Reconsider the remission data example in more detail. Get the SAS output for the 3 models:
  grp only (model 1)
  grp and logWBC (model 2)
  grp, logWBC, and interaction term grp*logWBC (model 3)
For each model, we'll do three things:
  do a statistical test of the null hypothesis beta=0
  get an estimate of the hazard ratio for each beta
  get a 95% confidence interval for each beta
Cox Model Fitting for remission data example
Remission data example in SAS, EX4.5, page 68
http://people.uncw.edu/chenc/STT520_420/SAS_Codes/remission-phreg.sas
options ls=80; dm log 'clear'; dm lis 'clear';
data remission;
input group $ remtime censor logWBC;
if group="6mp" then grp=0; else grp=1;
datalines;
6mp 6 1 … (input data);
/* note the use of the numeric variable grp, defined as grp=0 if group="6mp" and 1 otherwise (placebo) */
/*Model 1: Covariate = group*/
proc phreg data=remission;
model remtime*censor(0)=grp;
baseline out=out1 survival=S1 LOGSURV=ls1 LOGLOGS=lls1 upper=UCL1 lower=LCL1;
title "Model 1";
run; quit;
proc print data=out1; run; quit;
proc gplot data=out1;
plot S1*remtime=grp; /*plot for survival function*/
/* it gives baseline survival curves for treatment and placebo groups*/
/*SYMBOL1 VALUE=none interpol=join;*/
plot ls1*remtime=grp; /*plot of log survival, log S(t); its negative is the cumulative hazard*/
plot lls1*remtime=grp; /*plot of log-log survival, log(-log S(t))*/
run; quit;
/*Model 2: Covariate = group and logWBC*/
proc phreg data=remission;
model remtime*censor(0)=grp logWBC;
baseline out=out2 survival=S2 upper=UCL2 lower=LCL2;
title "Model 2";
run; quit;
proc print data=out2; run; quit;
proc gplot data=out2;
plot S2*remtime=grp; /*baseline survival curves for treatment and placebo groups*/
run; quit;
/*----------------------------------------*/
/* Model 3: Covariate = group, logWBC and interaction term grp_logWBC=group*logWBC */
proc phreg data=remission;
model remtime*censor(0)=grp logWBC grp_logWBC;
grp_logWBC=grp*logWBC; /*Creation of interaction term*/
baseline out=out3 survival=S3 upper=UCL3 lower=LCL3;
title "Model 3";
run; quit;
proc print data=out3; run; quit;
proc gplot data=out3;
plot S3*remtime=grp; /*baseline survival curves for treatment and placebo groups*/
run; quit;
SAS output for Model 1: EX4.5, page 68
Breslow is the default method to handle ties; other methods: Exact, Discrete, Efron.
Only look at Y, not Yx.
Consider the regression model: Cox PHM on Yx.
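A minimal sketch of choosing a tie-handling method, assuming the `remission` data set defined earlier in these notes: the TIES= option on the MODEL statement takes BRESLOW (the default), EFRON, EXACT, or DISCRETE.

```sas
/* Refit Model 1 with Efron's approximation for tied event times
   instead of the default Breslow method. */
proc phreg data=remission;
  model remtime*censor(0) = grp / ties=efron;
  title "Model 1 with TIES=EFRON";
run;
```

When there are many tied times the estimates can differ between methods, so it is worth comparing this output with the default fit.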
SAS output for Model 1: EX4.5, page 68
H0: βgrp=0 v.s. Ha: βgrp≠0
Review STT215: 5 steps of hypothesis testing:
1. State the hypotheses: H0: β=0 vs. Ha: β≠0.
2. Choose a significance level: α = 5% or 1%.
3. Calculate the test statistic, assuming H0 is true.
4. Find the P-value in the direction of Ha.
5. Draw conclusions (statistical and non-statistical):
   If P-value ≤ α, then we reject H0 (enough evidence).
   If P-value > α, then we do not reject H0 (not enough evidence).
SAS output for Model 1: EX4.5, page 68
H0: βgrp=0 v.s. Ha: βgrp≠0
Deviance for Model 1 (w/ covariate grp) compared with Model 0 (w/o any covariate):
D = 187.970 - 172.759 ≈ 15.2109
P-value = 1-pchisq(15.2109, 1) = 9.614686e-05
SAS output for Model 2: EX4.5, page 68
H0: βgrp = βlogWBC = 0 vs. Ha: at least one of βgrp and βlogWBC is not 0
Deviance for Model 2 (w/ covariates grp & logWBC) compared with Model 0 (w/o any covariate):
D = 187.970 - 144.559 = 43.411
P-value = 1-pchisq(43.411, 2) = 3.744736e-10
SAS output for Model 3: EX4.5, page 68
H0: βgrp = 0 vs. Ha: βgrp ≠ 0
H0: βgrp = βlogWBC = βgrp*logWBC = 0 vs. Ha: at least one of βgrp, βlogWBC, and βgrp*logWBC is not 0
Deviance for Model 3 (w/ covariates grp, logWBC, grp*logWBC) compared with Model 0 (w/o any covariate):
D = 187.970 - 144.131 = 43.839
P-value = 1-pchisq(43.839, 3) = 1.632828e-09
Compare Model 1 and Model 2
H0: βlogWBC = 0 vs. Ha: βlogWBC ≠ 0
Deviance for Model 2 (w/ covariates grp and logWBC) compared with Model 1 (w/ covariate grp):
D = 172.759 - 144.559 = 28.2
P-value = 1-pchisq(28.2, 1) = 1.094046e-07
So we reject H0: Model 2 fits significantly better than Model 1.
Compare Model 2 and Model 3
H0: βgrp*logWBC = 0 vs. Ha: βgrp*logWBC ≠ 0
Deviance for Model 3 (w/ covariates grp, logWBC, and grp*logWBC) compared with Model 2 (w/ covariates grp and logWBC):
D = 144.559 - 144.131 = 0.428
P-value = 1-pchisq(0.428, 1) = 0.512972
So we do not reject H0: there is no evidence that the interaction coefficient differs from 0, and the simpler Model 2 is preferred over Model 3.
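The deviance comparisons above can also be reproduced in SAS rather than R. This is a minimal sketch: the data set name `lr_tests` and its variable names are illustrative, and the -2 LOG L values are the ones quoted from the SAS output in these notes.

```sas
/* Each row: -2 LOG L of the reduced model, -2 LOG L of the larger model,
   and the number of extra parameters (d.f.). Names are illustrative. */
data lr_tests;
  input comparison :$12. m2ll_reduced m2ll_full df;
  D = m2ll_reduced - m2ll_full;     /* deviance (LR chi-square) */
  p_value = 1 - probchi(D, df);     /* upper-tail chi-square probability */
  datalines;
M0_vs_M1 187.970 172.759 1
M0_vs_M2 187.970 144.559 2
M0_vs_M3 187.970 144.131 3
M1_vs_M2 172.759 144.559 1
M2_vs_M3 144.559 144.131 1
;
run;

proc print data=lr_tests noobs;
  format p_value e12.;
run;
```

PROBCHI(x, df) is the chi-square CDF, so 1 - PROBCHI plays the same role as 1-pchisq(x, df) in R.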
PHM with a group membership covariate: there is only one covariate, namely “group” (usually control group: x=0 and treatment group: x=1).
The proportional hazard (or the hazard ratio) is
$\frac{h_{X=1}(y)}{h_{X=0}(y)} = \frac{h_0(y)\exp(\beta \cdot 1)}{h_0(y)\exp(\beta \cdot 0)} = \frac{\exp(\beta)}{\exp(0)} = \exp(\beta)$
So, if we could get an estimate of β (call it β̂), we would then have an estimate of the hazard ratio between two individuals in the two groups, i.e., exp(β̂), so we could say that
$h_{X=1}(y) = h_{X=0}(y)\,\exp(\hat\beta)$
Recall: Example 8.1, page 145.
SAS output for Model 1: EX4.5, page 68
exp(β̂) = 4.523; then β̂ = log(4.523) = 1.50919.
Hazard Ratio = 4.523 means the hazard (of the event, i.e., going out of remission) for those on placebo is about 4.523 times (or 452.3% of) the hazard for those on 6-MP.
$h_{X=1}(y) = h_{X=0}(y)\,\exp(\hat\beta)$
$\frac{h_{X=1}(y)}{h_{X=0}(y)} = \frac{h_0(y)\exp(\beta \cdot 1)}{h_0(y)\exp(\beta \cdot 0)} = \frac{\exp(\beta)}{\exp(0)} = \exp(\beta)$
SAS output for Model 1: EX4.5, page 68
Nonparametric estimates of the survival function based on a fitted PHM are given by the BASELINE statement. By default they are for a subject whose covariates all equal the mean of each variable (e.g., with x=0 and x=1 in equal-sized groups, mean(x)=0.5 for the grp variable).
Cox model fitting: use the LR test to compare model 3 with model 2; i.e., is the interaction term significant?
The LR statistic is computed as the difference between the -2 LOG L values of the 2 models: -2 LOG L(model 2) - (-2 LOG L(model 3)) = 144.559 - 144.131 = 0.428.
Under the null hypothesis that the interaction term has coefficient zero, this test statistic follows a chi-square distribution with df=1 (one parameter difference between the two models).
From R: 1-pchisq(0.428, 1) = 0.512972. That is, P(chisq(1) > 0.428) = 0.513. Therefore, we do not reject the null hypothesis; Model 2 is already an appropriate model.
More details: Compare Model 2 and Model 3
Now let's look at the Hazard Ratio (HR) in each of the three models…
In model 1, the HR is estimated to be 4.523 (from SAS). Let's see how this is done. We've seen that
$\frac{h_{X=1}(y)}{h_{X=0}(y)} = \frac{h_0(y)\exp(\beta(1))}{h_0(y)\exp(\beta(0))} = \exp(\beta - 0) = \exp(\beta)$
so if X=1 is the placebo group, then the maximum likelihood estimate of β is 1.50919 (from SAS), and exp(1.50919) = 4.523066 is the estimated hazard ratio. This means that the hazard for an individual in the placebo group is about 4.5 times that of an individual in the treatment group (at all times), ignoring logWBC.
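A minimal sketch of getting the hazard ratio with a 95% confidence interval directly from PROC PHREG, assuming the `remission` data set defined earlier in these notes: the RISKLIMITS option on the MODEL statement adds confidence limits for exp(β) to the parameter estimates table.

```sas
/* Model 1 again, now requesting 95% confidence limits for the hazard ratio */
proc phreg data=remission;
  model remtime*censor(0) = grp / risklimits alpha=0.05;
  title "Model 1 with 95% hazard ratio confidence limits";
run;
```

By default these are Wald limits, i.e., exp(β̂ ± 1.96·SE(β̂)): the interval for β is computed first and then transformed to the hazard ratio scale.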
How to estimate beta, assuming g1(x)=exp(beta*x)
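Consider Model 2's hazard ratios (placebo: x=1):
$\frac{h_{X=1,\,\text{logWBC}}(y)}{h_{X=0,\,\text{logWBC}}(y)} = \frac{h_0(y)\exp(1.29405(1) + 1.60432\,\text{logWBC})}{h_0(y)\exp(1.29405(0) + 1.60432\,\text{logWBC})} = \exp(1.29405) = 3.647529$
and
$\frac{h_{X=1,\,\text{logWBC}+1}(y)}{h_{X=1,\,\text{logWBC}}(y)} = \frac{h_0(y)\exp(1.29405(1) + 1.60432(\text{logWBC}+1))}{h_0(y)\exp(1.29405(1) + 1.60432(\text{logWBC}))} = \exp(1.60432) = 4.974476$
Model 3: with an interaction term (if it were significant), the estimated HR would depend on logWBC:
$\frac{h_{X=1,\,\text{logWBC},\,\text{int}}(y)}{h_{X=0,\,\text{logWBC},\,\text{int}}(y)} = \frac{h_0(y)\exp(2.35494(1) + 1.80279\,\text{logWBC} - 0.34220 \cdot 1 \cdot \text{logWBC})}{h_0(y)\exp(2.35494(0) + 1.80279\,\text{logWBC} - 0.34220 \cdot 0 \cdot \text{logWBC})} = \exp(2.35494 - 0.34220\,\text{logWBC})$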
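A minimal sketch of letting SAS evaluate the group hazard ratio at chosen logWBC values, assuming the `remission` data set defined earlier and SAS 9.2 or later. Here the interaction is written as `grp*logWBC` in the MODEL statement (instead of the programming statement used for Model 3) so that the HAZARDRATIO statement can account for it; the logWBC values 2, 3, and 4 and the label text are arbitrary illustrations.

```sas
proc phreg data=remission;
  model remtime*censor(0) = grp logWBC grp*logWBC;
  /* HR for a one-unit increase in grp (placebo vs. 6-MP),
     evaluated at three illustrative logWBC values */
  hazardratio 'Placebo vs 6-MP' grp / at(logWBC=2 3 4) cl=wald;
  title "Model 3: group hazard ratio by logWBC";
run;
```

Because the model contains an interaction, there is no single hazard ratio for grp; each requested logWBC value gets its own estimate and confidence interval.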
Cox Model Fitting: Control covariates
Using the BASELINE statement in PROC PHREG gives an estimate of the baseline survival function together with a 95% confidence interval. The UPPER= and LOWER= options store the upper and lower 95% confidence limits in variables (here UCL and LCL, respectively).
proc phreg data=remission;
model remtime*censor(0)=grp logWBC;
title "Model 2";
baseline out=a survival=s upper=ucl lower=lcl;
run;
proc print data=a;
run; quit;
PROC PHREG: Baseline option
To predict the adjusted survival curves for specific values of the covariates, first create a data set with the values you want to consider and then use the COVARIATES= option as follows:
data b; grp=1; logWBC=2.93; run;
…
proc phreg data=remission;
model remtime*censor(0)=grp logWBC;
baseline out=a survival=s upper=ucl lower=lcl covariates=b/nomean;
proc print data=a; run; quit;
Chap 9.4: Cox Model Prediction
Testing whether quantitative covariates are associated with survival time: both procedures give likelihood-ratio, score, and Wald test statistics.
PROC LIFEREG (testing automatically):
proc lifereg data=recid;
model week*arrest(0) = fin age race wexp mar paro prio / dist=exponential;
run;
PROC PHREG (testing automatically, works better):
proc phreg data=recid;
model week*arrest(0) = fin age race wexp mar paro prio;
run;
To check the Exponential/Weibull assumption
Case 1: With NO covariate: R programs in chapter 4; PROC LIFETEST produces two useful plots, the log-survival (LS) plot and the log-log survival (LLS) plot, by using PLOTS=(S, LS, LLS) to check the Exponential/Weibull distribution.
If the Exponential model is appropriate, the log-survival (LS) plot, (t, -log(S(t))), should be a straight line through the origin.
If the Weibull model is appropriate, the log-log survival (LLS) plot, (log t, log(-log(S(t)))), should be a straight line.
However, these graphs do not adjust for the effects of covariates. With covariates, we can use PROC LIFEREG with the PROBPLOT statement.
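A minimal sketch of the Case 1 check, assuming a data set like `sinker` from these notes (variables `dur` and `censor`, with censor=0 marking censored times):

```sas
/* Request the survival (S), log-survival (LS), and log-log survival (LLS) plots */
proc lifetest data=sinker plots=(s, ls, lls);
  time dur*censor(0);
  title "Checking Exponential/Weibull assumptions without covariates";
run;
```

A roughly linear LS plot through the origin supports the exponential model; a roughly linear LLS plot supports the Weibull model.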
Graphical methods for evaluating model fit
http://people.uncw.edu/chenc/STT520_420/dataset/Chap8-steel-model-check.sas
Case 2: With covariates: to check model fit with covariates, consider PROC LIFEREG with the PROBPLOT statement. The PROBPLOT statement produces non-parametric estimates of the survivor function using a modified Kaplan-Meier method that adjusts for covariates.
proc lifereg data=recid;
model week*arrest(0)=fin age race wexp mar paro prio / dist=weibull;
probplot;
title "Lifereg Weibull";
run; quit;
The upward sloping straight line represents the survival function predicted by the model. The shaded bands around that line are the 95% confidence bands. The circles are the non-parametric survival function estimates. Ideally, all the non-parametric estimates should lie within the confidence bands.
PROC LIFEREG in SAS
The only differences between the AFT model and the usual linear regression model are that there is a σ before εi and that the dependent variable is logged.
With exact data, take Y = log T and use the linear regression model with Y as the dependent variable. With censored data, use MLE under a distributional assumption on ε. For each distribution of ε, there is a corresponding distribution for T.
Incidentally, all AFT models are named for the distribution of T rather than for the distribution of ε or log T.
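For reference, the AFT model described above can be written as follows (a sketch in assumed notation: $T_i$ is the survival time of subject i, $x_{i1},\dots,x_{ik}$ are the covariates, and $\varepsilon_i$ is the error term):

```latex
\log T_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \sigma\,\varepsilon_i
```

For example, if ε has a standard extreme-value distribution, T is Weibull (exponential when σ = 1); if ε is standard normal, T is log-normal.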
PROC LIFEREG in SAS
Yx can be assumed to follow one of these distributions: Weibull, exponential, gamma, log-logistic, or log-normal, by using, e.g., “/dist=weibull”.
Note: all AFT models are named for the distribution of Yx, not log(Yx) or ε.
However, the choice of model can make a substantial difference.
Graphical methods for evaluating model fit:
If Yx ~ Exp, then (Yx, -log S(Yx)) should be a straight line through the origin.
If Yx ~ Weibull, then (log(Yx), log[-log S(Yx)]) should be a straight line.
In PROC LIFETEST, plots=(ls, lls) gives both plots.
PROC LIFEREG/PHREG in SAS
PROC LIFEREG allows all types of censoring, right (RC), left (LC), and interval (IC), while PROC PHREG only allows RC.
PROC LIFEREG can test certain hypotheses about the shape of the hazard function. PROC PHREG only gives a nonparametric estimate of the survivor function, which can be difficult to interpret.
If the shape of the survival distribution is known, PROC LIFEREG produces more efficient estimates, with smaller standard errors, than PROC PHREG.
PROC LIFEREG creates a set of dummy (indicator) variables to represent categorical variables with multiple values, but older releases of PROC PHREG require you to create such variables in a DATA step (recent releases also provide a CLASS statement).
However, PROC LIFEREG does NOT handle time-dependent variables, while PROC PHREG does.
PROC LIFEREG EX 7.7, pg138 ( with residual plot)
options ls=80; dm log 'clear'; dm lis 'clear';
data sinker; input dur censor @@; /* @@ reads several observations from one line */
datalines;
10 1 12 1 15 0 17 1 18 1 18 1 20 0 20 1 21 1 21 0 23 0 25 1 27 1 29 1 29 1 30 0 35 1
;
proc print data=sinker; run;
proc lifereg data=sinker;
model dur*censor(0)= /nolog dist=weibull; /*Considering Extreme-Value Distribution*/
probplot;
title 'Modeling u=log(Y) w/ NOLOG option';
run; quit;
proc lifereg data=sinker;
model dur*censor(0)= /dist=weibull ITPRINT; title 'Modeling non-transformed Y'; /*Considering Extreme-Value Distribution*/
probplot;
run; quit; /*ITPRINT is to see how the iterative process works*/
proc lifereg data=sinker;
model dur*censor(0)= /dist=exponential;
probplot;
title 'Modeling Exponential Y'; /*Considering Exponential Distribution*/
run; quit;
Testing for difference in survivor functions, with covariates
Q: Did the treatment make a difference in the survival experience of the two groups?
Test: S1(t) = S2(t) for all t.
PROC LIFETEST calculates the following hypothesis tests:
(1) Log-rank test (Mantel-Haenszel test);
(2) Wilcoxon test;
(3) Likelihood-ratio test, under the additional assumption that Yx follows an exponential distribution.
PROC LIFETEST: Testing for differences in survivor functions between 2 groups
Did treatment make a difference in the survival functions of the two groups? That is, we test whether the survivor functions are the same in the two groups, S1(t) = S2(t) for all t.
ODS GRAPHICS ON;
PROC LIFETEST DATA=myel PLOTS=S(TEST);
TIME dur*status(0);
STRATA treat;
RUN;
ODS GRAPHICS OFF;
The STRATA statement has three consequences:
1. Instead of a single table with KM estimates, separate tables are produced for each of the two treatment groups.
2. Corresponding to the two tables are two graphs of the survivor function, superimposed on the same axes for easy comparison.
3. PROC LIFETEST reports several statistics related to testing for differences between the two groups. Also, the TEST option (after PLOTS=S) includes the log-rank test in the survivor plot.
PROC LIFETEST: Testing whether quantitative covariates are associated with survival time
To test whether quantitative covariates are associated with survival time, use PROC LIFETEST with the TEST statement:
proc lifetest data=recid;
time week*arrest(0);
test fin age race wexp mar paro prio;
run;
It gives the log-rank and Wilcoxon test statistics (better in PROC LIFETEST).
Or use the likelihood-ratio test.
PROC LIFETEST: myelomatosis
http://people.uncw.edu/chenc/STT520_420/dataset/Example-myel.sas
options ls=80; dm log 'clear'; dm lis 'clear';
data myel;
input dur censor treat renal;
datalines;
8 1 1 1 … (input data)
proc print; run; quit;
proc lifetest data=myel
plots=(s,h,p) graphics /*Only survivor function will be plotted in KM method even you mention s, h, p*/
outsurv=OUT /*write the intervals to an output data set OUT*/;
time dur*censor(0); run ; quit;
proc print data=OUT; /*print the output for intervals*/
run; quit;
proc lifetest data=myel plots=(s) graphics;
time dur*censor(0);
strata treat;
symbol1 v=none color=black line=1;
symbol2 v=none color=red line=2;
run ; quit;
proc lifetest data=myel plots=(s) graphics;
time dur*censor(0);
strata renal;
symbol1 v=none color=black line=1;
symbol2 v=none color=red line=2;
run ;
proc lifetest data=myel plots=(s) graphics;
time dur*censor(0);
strata treat renal;
symbol1 v=none color=black line=1;
symbol2 v=none color=red line=2;
run ;
PROC LIFETEST: myelomatosis, with all data
PROC LIFETEST: myelomatosis, with STRATA
PROC LIFETEST: myelomatosis, with log-rank test
H0: The survival functions of the groups are the same;
Ha: The survival functions of the groups are NOT the same.