robust clinical prediction



Page 1: ROBUST CLINICAL PREDICTION

ROBUST CLINICAL PREDICTION

Luigi Salmaso, Associate Professor of Statistics, University of Padova

Research Group for the Bladder Cancer multicentric study: PF. Bassi, C. Brombin, L. Corain, M. Racioppi, L. Salmaso

INTERNATIONAL SYMPOSIUM OF UROLOGY, FUT-UROLOGY 2008

Page 2: ROBUST CLINICAL PREDICTION

Topics

• Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY

• Case study: INVASIVE BLADDER CANCER

• Application and results of several statistical methods to the case study

• Robust clinical prediction using the NonParametric Combination of Dependent Permutation Tests (NPC Test)

• Conclusions and practical suggestions

Page 3: ROBUST CLINICAL PREDICTION

Necessary steps for ‘optimal’ statistical predictions

• Study design (study protocol)

• Collecting data using a Web-based Database

• Robust statistical analysis by suitable statistical methods (e.g. nonparametric permutation methods)

• Individual predictions based, e.g., on nomograms or other techniques

Page 4: ROBUST CLINICAL PREDICTION

Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY

• The availability of an electronic database can improve the quality and completeness of the collected data, in particular by reducing the amount of missing data and the risk of data-entry (imputation) errors.

• Accurately defining the nature of the study (observational, randomized, …) and its endpoints leads to a better choice of the sample size and of the subsequent statistical analysis.

Page 5: ROBUST CLINICAL PREDICTION

ELECTRONIC DATABASE: An example

[Screenshots: WEB-based Database; Variables’ coding]

Page 6: ROBUST CLINICAL PREDICTION

STATISTICAL ANALYSIS: standard methods and recent advances

• Survival Analysis
[Figure: survival function with censored observations; x-axis: Months (0 to 120); y-axis: Cum Survival (0.0 to 1.0)]

• Univariate Test (Student t test, Wilcoxon)
[Figure: % of patients by Tumour (Phase III) class (0-1, 2-3, 4-5, 6-7, 8-9, >=10), NED vs DOD+AWD; Student's t: p = 0.000; Wilcoxon: p = 0.000, i.e. p < 0.001 in both cases]

• Multivariate Methods (Logistic regression, …)

• Complex classification methods (Neural Networks, Artificial Intelligence, …)

• NonParametric Combination of Dependent Permutation Tests (NPC Test)

Page 7: ROBUST CLINICAL PREDICTION

Case study: INVASIVE BLADDER CANCER

Total sample size: 1,003 subjects

• 469 subjects: DOD (Dead of Disease) and AWD (Alive with Disease) patients; AWD patients are counted as deaths for the statistical analysis

• 534 subjects: NED (No Evidence of Disease) patients

• Lost patients and DOC (Dead of Other Causes) patients were excluded

Aim of the study: detecting the variables (factors) that best predict the outcome (DEAD or ALIVE) after a BLADDER CANCER DIAGNOSIS

Italian multicentric observational study (from Jan 2001 to Dec 2006)

Reference: prof. PF. Bassi (Univ. Cattolica, Rome)

Page 8: ROBUST CLINICAL PREDICTION

Case study: INVASIVE BLADDER CANCER

• The TNM Classification of bladder cancer was used, according to Wittekind & Sobin (2002); the original variables were thus transformed into ordinal variables. 30 endpoints were considered relevant for the statistical analysis.

• In particular, the interest is in evaluating the importance of the endpoints, collected at three phases of the study, in predicting the outcome:

– I Phase (first symptom): patient state of health at the first medical visit

– II Phase (diagnosis): patient condition after bladder cancer diagnosis

– III Phase (surgery): patient state after surgery (histopathological variables were examined)

Page 9: ROBUST CLINICAL PREDICTION

Results of Kaplan-Meier (survival analysis)

[Figure (artificial example): Kaplan-Meier survival function with censored observations; x-axis: Months (0 to 120); y-axis: Cum Survival (0.0 to 1.0)]
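As a reading aid for the survival curve, the following is a minimal sketch of the Kaplan-Meier (product-limit) estimator. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the bladder cancer study. Subjects censored at the same time as an event are assumed to still be at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time of each subject (e.g. months)
    events: 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share of survivors at time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical toy data (months of follow-up, event indicator)
toy_times = [6, 12, 12, 20, 34, 48, 60, 60]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"month {t:>3}: S(t) = {s:.3f}")
```

The curve only drops at observed event times; censored subjects simply shrink the risk set, which is why censoring marks on the plot do not coincide with steps.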

Page 10: ROBUST CLINICAL PREDICTION

Results of univariate tests

[Figures: bar charts of % of patients, NED vs DOD+AWD, by class of each variable below. In every panel, Student's t: p = 0.000 and Wilcoxon: p = 0.000, i.e. p < 0.001.]

• Grading (Phase III): classes 0, 1, 2, 3

• Disease restarting (Phase III): classes 0, 1

• Tumour (Phase II): classes 0-1, 2-3, 4-5, 6-7, 8-9, >=10
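To make the univariate comparison concrete, here is a minimal sketch of a two-sided Wilcoxon rank-sum test for ordinal data, using midranks and the normal approximation with tie correction. It is an illustration only: the grading scores are hypothetical, no continuity correction is applied, and the software used for the slides may compute the test differently.

```python
import math
from collections import Counter

def midranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    ordered = sorted(values)
    counts = Counter(ordered)
    first_position = {}
    for position, v in enumerate(ordered, start=1):
        first_position.setdefault(v, position)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def wilcoxon_rank_sum(x, y):
    """Return (U, z, two-sided p) for samples x and y.

    Assumes the pooled sample is not constant (otherwise the variance is zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return u, z, p

# Hypothetical grading scores (0 to 3) for two outcome groups
ned = [0, 1, 1, 1, 2, 2, 2, 3]
dod_awd = [1, 2, 2, 3, 3, 3, 3, 3]
u, z, p = wilcoxon_rank_sum(ned, dod_awd)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

With ordinal variables such as grading, ties are the rule rather than the exception, which is why the tie-corrected variance is used here.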

Page 11: ROBUST CLINICAL PREDICTION

Results of Logistic Regression

• The logistic regression model was applied to the same dataset, but very poor results were obtained (only two significant predictors at the 5% level: Stage TNM at II and III Phase)

• The main problems in its application were:

– the inability of logistic regression to handle missing values (missing data are present in 522 of the 1,003 subjects);

– the high number of coefficients to be estimated, so that the iterative algorithm does not converge (after 1000 iterations). Note that when convergence is not achieved for the parameter estimates, results may be unreliable.
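The convergence problem can be illustrated with a small sketch. The slides attribute it to the high number of coefficients. The example below assumes a different, commonly cited mechanism: (quasi-)complete separation, where a predictor splits the outcomes perfectly and the likelihood has no finite maximum. The data are hypothetical, and plain gradient ascent stands in for whatever iterative algorithm the original software used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, iterations, step=0.5):
    """Gradient ascent on the log-likelihood of y ~ intercept + slope * x."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(iterations):
        resid = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += step * sum(resid) / n
        b1 += step * sum(r * xi for r, xi in zip(resid, x)) / n
    return b0, b1

# Hypothetical ordinal predictor and two outcome patterns
x = [0, 0, 1, 1, 2, 2, 3, 3]
overlap = [0, 0, 0, 1, 0, 1, 1, 1]      # outcomes overlap: finite estimate exists
separated = [0, 0, 0, 0, 1, 1, 1, 1]    # x >= 2 predicts y perfectly

for name, y in [("overlap", overlap), ("separated", separated)]:
    for iterations in (1000, 10000):
        b0, b1 = fit_logistic(x, y, iterations)
        print(f"{name:>9}, {iterations:>5} iterations: slope = {b1:.3f}")
```

In the overlapping case the slope settles; in the separated case it keeps growing with the number of iterations, so any reported estimate depends on where the algorithm was stopped.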

Page 12: ROBUST CLINICAL PREDICTION

Results of Logistic Regression

Phase      Predictor                                               Estimated coefficient   p-value
           Constant                                                 -2.743                 0.006
I Phase    Previous superficial TCC (Transitional Cell Carcinoma)    1.186                 0.288
           Focality                                                  0.911                 0.058
           Stage TNM                                                -0.126                 0.521
           Grading                                                  -0.345                 0.186
           Carcinoma In Situ (CIS)                                  -0.565                 0.447
II Phase   Focality                                                 -0.098                 0.805
           Stage TNM                                                 0.129                 0.026
           Carcinoma In Situ (CIS)                                   0.381                 0.473
           Grading                                                  -0.132                 0.576
           Regional lymph nodes                                     -0.754                 0.314
           Metastases                                                0.000                 1.000
           Highway urinary obstruction                               0.445                 0.050
III Phase  Stage TNM                                                 0.109                 0.035
           Carcinoma In Situ (CIS)                                   0.280                 0.376
           Grading                                                   0.257                 0.352
           Regional lymph nodes                                      1.009                 0.083
           Metastases                                               21.000                 0.999
           Histology                                                 0.209                 0.133
           Trigone infiltration                                     -0.361                 0.158
           Corpus invasion                                          -0.459                 0.136
           Urethral involvement                                     -0.972                 0.099
           Vascular invasion                                         0.583                 0.158
           Lymphonodal invasion                                      0.466                 0.075
           Prostatic Invasion                                        0.510                 0.181
           Adenocarcinoma of the Prostate                            0.115                 0.694
           Highway TCC (Transitional Cell Carcinoma)                 0.414                 0.441
           Disease restarting                                       41.000                 0.993
           Chemotherapy before surgery                              -1.587                 0.161
           Chemotherapy after surgery                               -0.952                 0.180
           Therapy restarting                                      -20.000                 0.996

Page 13: ROBUST CLINICAL PREDICTION

Results of Logistic Regression: Number and % of missing values by variable

Phase      Variable                                                No. of missing   % of missing
I Phase    Previous superficial TCC (Transitional Cell Carcinoma)       41              4%
           Focality                                                     18              2%
           Stage TNM                                                    44              4%
           Grading                                                      37              4%
           Carcinoma In Situ (CIS)                                      12              1%
II Phase   Focality                                                    147             15%
           Stage TNM                                                   124             12%
           Carcinoma In Situ (CIS)                                      96             10%
           Grading                                                     128             13%
           Regional lymph nodes                                         82              8%
           Metastases                                                  137             14%
           Highway urinary obstruction                                  41              4%
III Phase  Stage TNM                                                    70              7%
           Carcinoma In Situ (CIS)                                      44              4%
           Grading                                                     140             14%
           Regional lymph nodes                                          7              1%
           Metastases                                                   65              6%
           Histology                                                    82              8%
           Trigone infiltration                                        100             10%
           Corpus invasion                                             145             14%
           Urethral involvement                                        110             11%
           Vascular invasion                                           144             14%
           Lymphonodal invasion                                        117             12%
           Prostatic Invasion                                          187             19%
           Adenocarcinoma of the Prostate                              131             13%
           Highway TCC (Transitional Cell Carcinoma)                    87              9%
           Disease restarting                                          102             10%
           Chemotherapy before surgery                                  50              5%
           Chemotherapy after surgery                                    1              0%
           Therapy restarting                                           87              9%

Mean (missing values): 85.9

% mean (missing values): 9%

Subjects with at least one missing value: 522 (52%)
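The summary figures can be checked directly from the per-variable counts. The short script below recomputes the mean number of missing values and the number of complete cases. The variable names are illustrative, and the remark on complete cases assumes the regression software drops every subject with at least one missing value (listwise deletion), which the slides do not state explicitly.

```python
# Missing-value counts per variable, copied from the table above
missing_by_phase = {
    "I Phase": [41, 18, 44, 37, 12],
    "II Phase": [147, 124, 96, 128, 82, 137, 41],
    "III Phase": [70, 44, 140, 7, 65, 82, 100, 145, 110, 144,
                  117, 187, 131, 87, 102, 50, 1, 87],
}
n_subjects = 1003
subjects_with_missing = 522

counts = [c for phase in missing_by_phase.values() for c in phase]
mean_missing = sum(counts) / len(counts)

print(len(counts))                              # 30 endpoints
print(round(mean_missing, 1))                   # 85.9
print(round(100 * mean_missing / n_subjects))   # 9 (percent)

# Under listwise deletion only the complete cases would enter the model
complete_cases = n_subjects - subjects_with_missing
print(complete_cases)                           # 481
```

Although each variable is missing for only about 9% of subjects on average, the gaps fall on different subjects, so more than half of the sample has at least one missing value.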

Page 14: ROBUST CLINICAL PREDICTION

Robust statistical prediction using NPC Test

PERMUTATION APPROACH FOR HYPOTHESIS TESTING

The multivariate permutation approach for hypothesis testing by NonParametric Combination (NPC) offers the following advantages:

• No need to specify the dependence structure among variables

• Exact solutions

• Powerful tests

• Treatment of missing values (missing completely at random, MCAR, or not missing at random, not-MAR)

It also deals with:
- Stratification
- Multivariate categorical variables

It handles:
- Mixed variables
- Multivariate restricted alternatives

• NPC Test implements methods and algorithms presented in several international papers by prof. L. Salmaso and prof. F. Pesarin. L. Salmaso leads an internationally recognised research group in theoretical and applied nonparametric statistics.

• NPC TEST is a statistical method (and software) that provides researchers with powerful and innovative solutions in the field of hypothesis testing.
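The idea behind the nonparametric combination can be sketched in a few lines. The code below is a simplified illustration, not the NPC TEST software: it assumes two groups, complete data, the absolute difference of means as partial statistic and Fisher's combining function. All function names and the toy scores are hypothetical. The key point is that the same permutation of group labels is applied to all variables at once, so their dependence never has to be modelled.

```python
import math
import random
from bisect import bisect_left

def mean_diff(values, labels):
    """Partial test statistic: absolute difference between the two group means."""
    g1 = [v for v, g in zip(values, labels) if g == 1]
    g0 = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def significance(column):
    """For each entry, the share of entries in the column that are >= it."""
    ordered = sorted(column)
    total = len(column)
    return [(total - bisect_left(ordered, v)) / total for v in column]

def npc_fisher(variables, labels, n_perm=1000, seed=0):
    """Return (partial p-values, global p-value) for a two-sample problem."""
    rng = random.Random(seed)
    shuffled = list(labels)
    # Row 0 holds the observed statistics, rows 1..n_perm the permuted ones
    rows = [[mean_diff(v, labels) for v in variables]]
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # one permutation shared by all variables
        rows.append([mean_diff(v, shuffled) for v in variables])
    k = len(variables)
    partial = [significance([row[j] for row in rows]) for j in range(k)]
    # Fisher combining function, applied to observed and permuted rows alike
    combined = [-2.0 * sum(math.log(partial[j][b]) for j in range(k))
                for b in range(len(rows))]
    global_p = significance(combined)[0]
    return [partial[j][0] for j in range(k)], global_p

# Hypothetical ordinal scores for 2 endpoints on 12 subjects (0 = NED, 1 = DOD+AWD)
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
stage = [1, 1, 2, 2, 2, 3, 2, 3, 3, 4, 4, 4]
grading = [1, 2, 2, 1, 3, 2, 2, 3, 3, 3, 2, 3]
partial_p, global_p = npc_fisher([stage, grading], labels)
print("partial p-values:", partial_p)
print("global p-value:", global_p)
```

Because every p-value is computed against the same set of permutations, the global test stays valid whatever the correlation between the endpoints.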

Page 15: ROBUST CLINICAL PREDICTION

Robust statistical prediction using NPC Test

FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0

• NPC TEST allows us to perform hypothesis testing in the case of:
- Two and C samples with dependent or independent variables
- Two and C samples with repeated measures
- Stratified analysis

• NPC TEST also provides:
- Powerful test statistics for the treatment of missing values
- One- or two-tailed tests

• Data (including mixed variables):
- categorical
- ordered categorical
- numeric or continuous
- binary

Page 16: ROBUST CLINICAL PREDICTION

Robust statistical prediction using NPC Test

FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0

An innovation of NPC TEST with respect to existing methods is that any combination of tests can be performed: starting from an appropriate set of elementary tests, the NPC methodology leads to a multivariate or multistrata overall global test.

Elementary partial test statistics include:
- t statistic
- ANOVA
- difference of means
- test statistics for missing values
- Anderson-Darling
- Cramér-von Mises
- Chi-square
- Modified Chi-square
- Likelihood Ratio

Combining functions for intermediate tests include:
- Fisher
- Liptak
- Tippett
- Direct

NPC TEST supports the standard functions of statistical software (data import, data manipulation) and produces a report that can be easily integrated and customized with a text editor.
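The slides name the combining functions without giving formulas. For reference, the forms below are the ones usually reported in the permutation-testing literature. The notation is an assumption made here: $\lambda_i$ is the p-value of the $i$-th of $k$ partial tests, $T_i$ its test statistic (large values being significant), and $\Phi$ the standard normal distribution function.

```latex
\begin{align*}
T_F &= -2 \sum_{i=1}^{k} \log \lambda_i                 && \text{(Fisher)} \\
T_L &= \sum_{i=1}^{k} \Phi^{-1}(1 - \lambda_i)          && \text{(Liptak)} \\
T_T &= \max_{1 \le i \le k} \, (1 - \lambda_i)          && \text{(Tippett)} \\
T_D &= \sum_{i=1}^{k} T_i                               && \text{(Direct)}
\end{align*}
```

In each case large values of the combined statistic count as evidence against the global null hypothesis, and its p-value is obtained from the same permutations used for the partial tests.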

Page 17: ROBUST CLINICAL PREDICTION

Robust statistical prediction using NPC Test

Page 18: ROBUST CLINICAL PREDICTION

Robust statistical prediction using NPC Test

• After processing the variables and obtaining p-values with NPC methods, we also controlled the familywise error rate (FWE)

• The need for multiplicity control arises whenever a problem is structured into two or more experimental hypotheses (Finos and Salmaso, 2006)

• In order to make inference on all the hypotheses defining the multivariate problem, it is necessary to control the probability of erroneously rejecting at least one univariate (elementary) hypothesis; this is called multivariate type I error or familywise error rate (FWE) (Marcus et al., 1976)
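As a simple illustration of FWE control, the sketch below computes Holm step-down adjusted p-values. This is a swapped-in, simpler procedure: Holm's method is the shortcut of closed testing when each intersection hypothesis is tested by Bonferroni, whereas the analysis in these slides uses closed testing with NPC-combined tests. The p-values in the example are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)   # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for four elementary hypotheses
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, holm_adjust(raw)):
    print(f"raw p = {p:.3f}  ->  adjusted p = {q:.3f}")
```

An elementary hypothesis is rejected at FWE level alpha when its adjusted p-value is at most alpha.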

Page 19: ROBUST CLINICAL PREDICTION

Robust statistical prediction using NPC Test

CLOSED TESTING GRAPHICAL REPRESENTATION

Page 20: ROBUST CLINICAL PREDICTION

Results of NPC Test

Phase      Variables (explanation)                                 p-value, univariate (partial test)   p-value, 1st combination
I Phase    Previous superficial TCC (Transitional Cell Carcinoma)  n.s.                                 n.s.
           Focality                                                n.s.
           Stage TNM                                               n.s.
           Grading                                                 n.s.
           Carcinoma In Situ (CIS)                                 n.s.

II Phase   Focality                                                n.s.                                 0.0007
           Stage TNM                                               0.0045
           Carcinoma In Situ (CIS)                                 n.s.
           Grading                                                 n.s.
           Regional lymph nodes                                    0.0014
           Metastases                                              n.s.
           Highway urinary obstruction                             0.0007

Page 21: ROBUST CLINICAL PREDICTION

Results of NPC Test

Within III Phase the variables are combined in four groups; the 1st combination p-value is reported on the first row of each group.

Phase      Variables (explanation)                                 p-value, univariate (partial test)   p-value, 1st combination
III Phase  Stage TNM                                               0.0011                               0.0006
           Carcinoma In Situ (CIS)                                 0.0088
           Grading                                                 0.0011
           Regional lymph nodes                                    0.0006
           Metastases                                              n.s.
           Histology                                               n.s.

           Vesical trigone infiltration                            n.s.                                 0.0005
           Corpus invasion                                         0.0027
           Urethral involvement                                    n.s.
           Vascular invasion                                       0.0005
           Lymphonodal invasion                                    0.0005
           Prostatic Invasion                                      0.0085

           Adenocarcinoma of the Prostate                          n.s.                                 n.s.
           Highway TCC (Transitional Cell Carcinoma)               n.s.

           Disease restarting                                      0.0004                               0.0002
           Chemotherapy before surgery                             n.s.
           Chemotherapy after surgery                              0.0004
           Therapy restarting                                      0.0004

Page 22: ROBUST CLINICAL PREDICTION

Results of NPC Test

Phase      p-value, 1st combination
I Phase    n.s.
II Phase   0.0007
III Phase  0.0006
           0.0005
           n.s.
           0.0002

p-value, 2nd combination (global test): 0.0013

Page 23: ROBUST CLINICAL PREDICTION

Conclusions and practical suggestions

• The NPC method can offer a significant contribution to successful research in biomedical studies with several endpoints

• The advantages of NPC Test stem from its flexibility in handling any type of variable

• We recommend the use of this methodology whenever the normality assumption is hard to justify, in the presence of missing values, and when the number of variables is higher than the number of subjects

Page 24: ROBUST CLINICAL PREDICTION

REFERENCES

Bassi P.F., Pagano F. (2007). Invasive Bladder Cancer. Springer.

Corain L., Salmaso L. (2007). A critical review and a comparative study on conditional permutation tests for two-way ANOVA. Communications in Statistics – Simulation and Computation, 36, 791-805.

Finos L., Salmaso L. (2006). Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics, 18, 245-261.

Finos L., Salmaso L. (2006). FDR- and FWE-controlling methods using data-driven weights. Journal of Statistical Planning and Inference, 137, 3859-3870.

Finos L., Salmaso L., Solari A. (2007). Conditional Inference under simultaneous stochastic ordering constraints. Journal of Statistical Planning and Inference, 137, 2633-2641.

Marcus R., Peritz E., Gabriel K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.

Marozzi M., Salmaso L. (2006). Multivariate Bi-Aspect Testing for Two-Sample Location Problem. Communications in Statistics – Theory and Methods, 35, 477-488.

Salmaso L., Solari A. (2005). Multiple aspect testing for case-control designs. Metrika, 62, 331-340.

Wittekind C., Sobin L. H. (2002). TNM Classification of Malignant Tumours. UICC, International Union Against Cancer (6th ed.). Wiley-Liss, New York.

http://www.gest.unipd.it/~salmaso/NPC_TEST.htm

Page 25: ROBUST CLINICAL PREDICTION

Results of Neural Networks

• We applied a neural network model (Multilayer Perceptron) to the same dataset

• By applying k-fold cross-validation, we obtained a correct classification rate of 75.3% for DOD+AWD and of 60.5% for NED. Using the subset of variables identified by the univariate analysis gave a very similar performance (74.5% and 62.4%)

• Main problems of neural networks are:

– Neural networks work as black boxes, hence it is not possible to convert the network structure into a known model structure

– All input fields ‘must’ be numeric (in this study the variables are not numerical but ordinal categorical)

– Neural networks can suffer from a problem called interference (i.e. forgetting some of what was learned on older data)
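The per-class classification rates quoted above come from k-fold cross-validation. The sketch below shows how such rates can be computed. It is only an outline of the evaluation scheme: a hypothetical one-variable threshold rule stands in for the Multilayer Perceptron, and the data, function names and number of folds are assumptions.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k disjoint folds after shuffling."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def per_class_rates(y_true, y_pred):
    """Share of correctly classified subjects within each true class."""
    rates = {}
    for cls in set(y_true):
        predictions = [p for t, p in zip(y_true, y_pred) if t == cls]
        rates[cls] = sum(p == cls for p in predictions) / len(predictions)
    return rates

def cross_validate(x, y, fit, predict, k=5, seed=0):
    """Predict every subject from a model fitted without its own fold."""
    y_pred = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            y_pred[i] = predict(model, x[i])
    return per_class_rates(y, y_pred)

# Stand-in classifier (NOT a neural network): threshold halfway between class means
def fit_threshold(x_train, y_train):
    ned = [v for v, c in zip(x_train, y_train) if c == "NED"]
    bad = [v for v, c in zip(x_train, y_train) if c == "DOD+AWD"]
    return (sum(ned) / len(ned) + sum(bad) / len(bad)) / 2

def predict_threshold(threshold, value):
    return "DOD+AWD" if value > threshold else "NED"

# Hypothetical scores for 10 subjects
scores = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
outcome = ["NED"] * 5 + ["DOD+AWD"] * 5
print(cross_validate(scores, outcome, fit_threshold, predict_threshold))
```

Reporting the rate separately for each class, as the slide does, avoids hiding poor performance on one outcome behind a good overall accuracy.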