
ROC Analysis

Emily Kistner-Griffin, PhD
Amy Wahlquist, MS

Cancer Prevention and Control Statistics Tutorial
August 13, 2009

Outline

I. Motivating Example: Chest CT
II. Classification
III. Sensitivity and Specificity
IV. ROC curve and AUC estimation
    a. Nonparametric Curve
    b. Parametric Curve
V. ROC and Logistic Regression
VI. Comparing ROC curves

I. Motivating Example: Chest CT

Evaluating the probability of malignancy in pulmonary nodules seen on chest CT in 213 MUSC patients from two cohorts

Sample of 194 subjects seen in pulmonary clinic and 19 subjects with CT previous to an unrelated surgical intervention

Develop a prediction model from clinical data and radiological characteristics of lung nodules

Chest CT

A model of P (malignancy) of pulmonary nodules has been described in the literature (Swensen SJ et al., 1997)

Model included three demographic characteristics: patient age, smoking status (ever vs. never), any history of cancer

Model included three radiological characteristics: diameter, upper lobe location, and spiculation

Chest CT

Swensen et al. reported an area under the receiver operating characteristic (ROC) curve of 0.8014 ± 0.0360 in a validation sample, using a logistic regression approach.

Interested in how well Swensen’s model performs in the MUSC cohort.

Interested in evaluating whether we can improve the prediction model by including other patient characteristics

II. Classification

• Consider medical tests that are measured on a continuous or ordinal scale

• Goal: to describe the performance of the medical test in classifying subjects into individuals with and without disease

• Examples: PSA and CA-125 as biomarkers of prostate and ovarian cancer; BI-RADS for breast imaging (radiologist determined probability of malignancy)

Classification from CT

• Consider the diameter of the nodule as measured on the CT scan (range: 3.3mm-15mm)

• Larger nodules are more likely to be malignant (OR: 1.34, 95% CI: 1.20-1.49)

• How well can we predict malignancy from nodule diameter?

Contingency Table

Diameter (mm)   d<6   6≤d<8   8≤d<10   10≤d<12   12≤d≤15
Benign           41      30       29        17        24
Malignant         2      10       13        14        33

Classification Tables

• Choose a cut-point on continuous or ordinal scale in order to assign disease status

                  Truth D=1   Truth D=0
Classified D=1       TP          FP
Classified D=0       FN          TN

III. Sensitivity & Specificity

• For selected cut-point determine sensitivity and specificity of medical test (or prediction model)

• Sensitivity = Pr(classified D=1 | truth D=1) = TP / (TP+FN) = TPF (true positive fraction)

• Specificity = Pr(classified D=0 | truth D=0) = TN / (TN+FP) = TNF (true negative fraction)

• In order to summarize test characteristics – must compute sensitivity and specificity at multiple cut-points

Sensitivity & Specificity Example

Cut-point (mm)   Sensitivity   Specificity
      6             0.972         0.291
      8             0.833         0.504
     10             0.653         0.709
     12             0.458         0.830

From Metz CE (1978) Basic Principles of ROC Analysis. Seminars in Nuclear Medicine; 8 (4): 283 – 297.
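The sensitivity and specificity values in the example can be recomputed from the contingency table of diameter categories. The sketch below assumes a nodule is classified as malignant when its diameter is at or above the cut-point; the variable and function names are illustrative only.

```python
# Counts from the contingency table, by diameter category (mm):
# d<6, 6<=d<8, 8<=d<10, 10<=d<12, 12<=d<=15
benign = [41, 30, 29, 17, 24]
malignant = [2, 10, 13, 14, 33]
cut_points = [6, 8, 10, 12]  # assumed rule: call malignant when diameter >= cut-point


def sens_spec(k):
    """Sensitivity and specificity when categories k, k+1, ... are called positive."""
    tp = sum(malignant[k:])  # malignant nodules at or above the cut-point
    fn = sum(malignant[:k])  # malignant nodules below the cut-point
    tn = sum(benign[:k])     # benign nodules below the cut-point
    fp = sum(benign[k:])     # benign nodules at or above the cut-point
    return tp / (tp + fn), tn / (tn + fp)


table = {}
for k, cut in enumerate(cut_points, start=1):
    table[cut] = sens_spec(k)
    sens, spec = table[cut]
    print(f"cut-point {cut:>2}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Raising the cut-point moves nodules from "classified D=1" to "classified D=0", so sensitivity falls while specificity rises.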

[Figure: histograms of nodule diameter (percent), graphs by malignant (0, 1)]

[Figure: sensitivity and specificity plotted against diameter]

Decision Threshold

• Lowering the threshold increases the TPF (sensitivity) and the FPF (1-specificity)

• Raising the threshold decreases the TPF and the FPF

• Points representing all possible TPF and FPF lie on a curve – passing through the lower (0,0) corner when all tests are called negative and the upper (1,1) corner when all the tests are called positive

• If the test is informative then all other points on the curve must be above the diagonal (TP more likely than FP)

• The curve describing the compromises between TPF and FPF is called the ROC curve

[Figure: ROC curve for nodule diameter; sensitivity vs. 1 - specificity. Area under ROC curve = 0.7411]

. roctab malignant diameter, detail graph

Detailed report of Sensitivity and Specificity
------------------------------------------------------------------------------
                                           Correctly
Cutpoint      Sensitivity   Specificity   Classified          LR+          LR-
------------------------------------------------------------------------------
( >= 3.3 )        100.00%         0.00%       33.80%       1.0000
( >= 4 )          100.00%         1.42%       34.74%       1.0144       0.0000
( >= 5 )           97.22%        13.48%       41.78%       1.1236       0.2061
( >= 6 )           97.22%        29.08%       52.11%       1.3708       0.0955
( >= 7 )           93.06%        39.72%       57.75%       1.5436       0.1749
( >= 8 )           83.33%        50.35%       61.50%       1.6786       0.3310
( >= 9.1 )         70.83%        61.70%       64.79%       1.8495       0.4727
( >= 10 )          65.28%        70.92%       69.01%       2.2449       0.4896
( >= 11 )          56.94%        78.72%       71.36%       2.6764       0.5469
( >= 12 )          45.83%        82.98%       70.42%       2.6927       0.6528
( >= 13 )          25.00%        90.78%       68.54%       2.7115       0.8262
( >= 14 )          13.89%        95.04%       67.61%       2.7976       0.9061
( >= 15 )           0.00%        98.58%       65.26%       0.0000       1.0144
( > 15 )            0.00%       100.00%       66.20%                    1.0000
------------------------------------------------------------------------------

                  ROC                    -Asymptotic Normal--
       Obs       Area     Std. Err.      [95% Conf. Interval]
     --------------------------------------------------------
       213     0.7411       0.0347        0.67317     0.80900

Likelihood Ratios

LR+ = sensitivity / (1-specificity) = TPF / FPF

LR- = (1-sensitivity) / specificity = (1-TPF) / (1-FPF)

LR+ is the slope of the line from the origin to the point on the ROC curve, and LR- is the slope of the line from that point to the (1,1) corner (Choi 1998)
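As a check on the two formulas, the sketch below computes both likelihood ratios at the 8 mm cut-point, using sensitivity 60/72 and specificity 71/141 from the contingency table. The function name is illustrative, and returning None when a denominator is zero is a choice made here, not something taken from the slides.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for one cut-point; None where the ratio is undefined."""
    tpf = sensitivity
    fpf = 1.0 - specificity
    lr_pos = tpf / fpf if fpf > 0 else None                   # slope from (0,0) to the point
    lr_neg = (1.0 - tpf) / (1.0 - fpf) if fpf < 1 else None   # slope from the point to (1,1)
    return lr_pos, lr_neg


# 8 mm cut-point: 60 of 72 malignant nodules and 70 of 141 benign nodules are >= 8 mm
lr_pos, lr_neg = likelihood_ratios(60 / 72, 71 / 141)
print(f"LR+ = {lr_pos:.4f}, LR- = {lr_neg:.4f}")
```

These agree with the ( >= 8 ) row of the detailed report (LR+ 1.6786, LR- 0.3310).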

IV. ROC curve and AUC estimation

• ROC: Receiver Operating Characteristic

• Developed in signal detection theory to illustrate how the receiver deciphers between signal and noise (1960s)

• Illustration of two test characteristics: sensitivity and specificity at selected cut-points (decision thresholds)

• Popularized in medical testing in the field of Radiology (1980s)

ROC curve and Thresholds

• ROC curve describes disease detection independent of disease prevalence (as do sensitivity and specificity)

• Prevalence may help determine the operating threshold:

– Low prevalence suggests reducing FPF (higher specificity, higher threshold, lower part of the curve)

– High prevalence suggests increasing TPF (higher sensitivity, lower threshold, higher part of the curve)

• In practice, must consider costs and consequences of FP and FN before selecting the desirable cut-off:

– Consequence of FN: death?

– Consequence of FP: stressful, costly work-up or treatment

Area Under the ROC Curve

• Summarizes the performance of the test

• Probability that the result of the test for a randomly selected abnormal subject will be greater than the result of the test for a randomly selected normal subject

• Average TPF: averaged across whole range of FPF in (0,1)

• Perfect test gives AUC = 1.0 and an uninformative test gives AUC=0.50

• Parametric and non-parametric approaches to constructing the ROC curve and calculating the area under the curve (AUC)

a. Nonparametric ROC Curve

• Constructed by plotting sensitivity and (1 – specificity) at each possible cut-point

• Area under the curve (AUC) constructed using the trapezoidal rule

• Variance estimators have been derived by DeLong et al. (1988), Hanley and McNeil (1982), and Bamber (1975)
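A minimal sketch of the nonparametric construction, using the five diameter categories from the contingency table as the possible cut-points. It computes the trapezoidal area and, for comparison, the same quantity as the proportion of malignant/benign pairs in which the malignant nodule falls in the higher category (ties counted as one half). Because the diameters are grouped here, the result need not equal the 0.7411 that roctab reports from the ungrouped diameters.

```python
benign = [41, 30, 29, 17, 24]      # counts by increasing diameter category
malignant = [2, 10, 13, 14, 33]
n_benign, n_malignant = sum(benign), sum(malignant)

# Empirical ROC points (FPF, TPF): calling categories k and above positive,
# for k = 0 (everything positive) through k = 5 (everything negative).
points = []
for k in range(len(benign) + 1):
    fpf = sum(benign[k:]) / n_benign
    tpf = sum(malignant[k:]) / n_malignant
    points.append((fpf, tpf))
points.sort()  # order from (0, 0) to (1, 1)

# Trapezoidal rule over consecutive ROC points
auc_trapezoid = 0.0
for (x0, y0), (x1, y1) in zip(points, points[1:]):
    auc_trapezoid += (x1 - x0) * (y0 + y1) / 2.0

# Pair-counting version: malignant above benign counts 1, same category counts 0.5
pairs = 0.0
for i, m in enumerate(malignant):
    pairs += m * (sum(benign[:i]) + 0.5 * benign[i])
auc_pairs = pairs / (n_benign * n_malignant)

print(f"trapezoidal AUC = {auc_trapezoid:.4f}, pair-counting AUC = {auc_pairs:.4f}")
```

The two numbers coincide, which is the link between the trapezoidal area and the "randomly selected abnormal vs. normal subject" interpretation of the AUC.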

Variance of AUC

• Specifically, the DeLong et al. (1988) variance estimate:

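$$\hat{\theta} = \frac{1}{mn}\sum_{j=1}^{n}\sum_{i=1}^{m}\psi(X_i, Y_j), \quad \text{where } \psi(X, Y) = \begin{cases} 1 & \text{if } X > Y \\ 0.5 & \text{if } X = Y \\ 0 & \text{if } Y > X \end{cases}$$

$$V_{10}(X_i) = \frac{1}{n}\sum_{j=1}^{n}\psi(X_i, Y_j) \quad \text{and} \quad V_{01}(Y_j) = \frac{1}{m}\sum_{i=1}^{m}\psi(X_i, Y_j)$$

$$S_{10} = \frac{1}{m-1}\sum_{i=1}^{m}\{V_{10}(X_i) - \hat{\theta}\}^2 \quad \text{and} \quad S_{01} = \frac{1}{n-1}\sum_{j=1}^{n}\{V_{01}(Y_j) - \hat{\theta}\}^2$$

$$\widehat{\operatorname{var}}(\hat{\theta}) = \frac{1}{m}S_{10} + \frac{1}{n}S_{01}$$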
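The DeLong et al. (1988) variance estimate can be sketched in a few lines. The code below assumes x holds the test results of the m subjects with disease and y those of the n subjects without; the function names and the toy data are made up for illustration, and this is not the Stata implementation.

```python
def psi(x, y):
    """Kernel: 1 if the diseased result exceeds the non-diseased one, 0.5 if tied, else 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def delong_auc_variance(x, y):
    """Return (AUC estimate, DeLong variance estimate) for diseased x and non-diseased y."""
    m, n = len(x), len(y)
    # Structural components: one per diseased subject (V10) and per non-diseased subject (V01)
    v10 = [sum(psi(xi, yj) for yj in y) / n for xi in x]
    v01 = [sum(psi(xi, yj) for xi in x) / m for yj in y]
    theta = sum(v10) / m  # equals the average of psi over all m*n pairs
    s10 = sum((v - theta) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - theta) ** 2 for v in v01) / (n - 1)
    return theta, s10 / m + s01 / n


# Toy data (hypothetical): two diseased and two non-diseased subjects
theta, variance = delong_auc_variance([2, 3], [1, 2])
print(f"AUC = {theta:.4f}, var = {variance:.5f}, se = {variance ** 0.5:.4f}")
```

The square root of the returned variance plays the role of the "Std. Err." column in the roctab output.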

Confidence Intervals for AUC

• Must consider distribution of AUC estimate: asymptotically normal or binomial assumption

• Must select standard error estimate (DeLong et al. approach is the default):

                  ROC                    -Asymptotic Normal--
       Obs       Area     Std. Err.      [95% Conf. Interval]
     --------------------------------------------------------
       213     0.7411       0.0347        0.67317     0.80900

. roctab malignant diameter, binomial

                  ROC                    -- Binomial Exact --
       Obs       Area     Std. Err.      [95% Conf. Interval]
     --------------------------------------------------------
       213     0.7411       0.0347        0.67754     0.79916
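The asymptotic normal interval is the estimate plus or minus a normal quantile times the standard error. A minimal sketch using the rounded area and standard error from the output (the function name is illustrative; the binomial exact interval is not reproduced here):

```python
from statistics import NormalDist


def wald_interval(area, std_err, level=0.95):
    """Normal-approximation confidence interval: area +/- z * std_err."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return area - z * std_err, area + z * std_err


lower, upper = wald_interval(0.7411, 0.0347)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The result matches the reported asymptotic normal interval to about three decimal places; small differences are expected because the inputs here are rounded.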

b. Parametric ROC Curve

• Assumes a binormal model

• A monotone transformation of the test results exists to give results that are normally distributed in the diseased and non-diseased populations

• Method involves fitting a straight line to the empirical ROC points by plotting using normal probability scales on each axis (plot inverse of the standard normal cumulative distribution function for sensitivity and specificity)

• Intercept of the line is the standardized difference in the continuous variable between the two populations; slope is a ratio of the standard deviations

Parametric AUC Estimation

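AUC is a function of the slope and intercept of the estimated line, using the standard normal cumulative distribution function $\Phi$:

Let $a = (\mu_D - \mu_{\bar{D}}) / \sigma_D$ and $b = \sigma_{\bar{D}} / \sigma_D$. Then

$$\mathrm{AUC} = \Phi\left(\frac{a}{\sqrt{1 + b^2}}\right)$$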
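A small numerical check of the binormal AUC, written as AUC = Phi(a / sqrt(1 + b^2)) with a the intercept and b the slope of the fitted line. The values plugged in are the intercept and slope from the rocfit output for diameter; the function name is illustrative.

```python
from statistics import NormalDist


def binormal_auc(a, b):
    """Area under a binormal ROC curve with intercept a and slope b."""
    return NormalDist().cdf(a / (1.0 + b * b) ** 0.5)


# Intercept and slope reported by rocfit for diameter
auc = binormal_auc(0.997803, 1.170680)
print(f"binormal AUC = {auc:.4f}")
```

This reproduces the "ROC area" index in the rocfit output (about 0.7415).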

Nonparametric vs. Parametric

• Parametric approaches assume a binormal distribution to make inferences (obtain MLE): only when the assumption is true are the estimators unbiased

• With continuous data a nonparametric approach is recommended

• With discrete ratings a parametric approach is recommended as nonparametric approaches tend to underestimate the true AUC

• Note standard error of the AUC is smaller using a continuous scale

. rocfit malignant diameter, cont(10)

Fitting binormal model:

Binormal model of malignant on diameter         Number of obs   =        213
Goodness-of-fit chi2(7) =     8.52
Prob > chi2             =   0.2894
Log likelihood          = -456.16837

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   intercept |   0.997803   0.181857     5.49   0.000     0.641370    1.354236
   slope (*) |   1.170680   0.139487     1.22   0.221     0.897290    1.444070
-------------+----------------------------------------------------------------
       /cut1 |  -1.296367   0.141750    -9.15   0.000    -1.574192   -1.018542
       /cut2 |  -0.668255   0.110960    -6.02   0.000    -0.885733   -0.450777
       /cut3 |  -0.222392   0.102919    -2.16   0.031    -0.424110   -0.020674
       /cut4 |   0.202507   0.101135     2.00   0.045     0.004286    0.400729
       /cut5 |   0.499186   0.103559     4.82   0.000     0.296214    0.702159
       /cut6 |   0.756664   0.109249     6.93   0.000     0.542539    0.970788
       /cut7 |   1.040925   0.119741     8.69   0.000     0.806237    1.275614
       /cut8 |   1.541544   0.150124    10.27   0.000     1.247307    1.835781
       /cut9 |   2.369036   0.244933     9.67   0.000     1.888975    2.849096
------------------------------------------------------------------------------

------------------------------------------------------------------------------
             |                    Indices from binormal fit
       Index |   Estimate   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ROC area |   0.741532   0.034471                      0.673970    0.809094
    delta(m) |   0.852328   0.144007                      0.570080    1.134576
        d(e) |   0.919346   0.151542                      0.622329    1.216364
        d(a) |   0.916517   0.150751                      0.621050    1.211985
------------------------------------------------------------------------------

(*) z test for slope==1

. rocplot, confband

[Figure: fitted binormal ROC curve with confidence band; sensitivity vs. 1 - specificity. Area under curve = 0.7415, se(area) = 0.0345]

V. ROC and Logistic Regression

• Prediction Model from Chest CT

• Use logistic regression to create probabilities of malignancy (represent diagnostic results from multiple predictors)

• Compare two logistic models of malignancy: one from the previous literature and one with variables selected from the MUSC data

• Variables suggested in Swensen SJ et al. + surgical cohort (variable describing collection of samples)

• Variables selected using backwards regression in MUSC data

. logistic malignant surgical_cohort patient_age any_non_lung_cancer_history lung_cancer_history smoker_ever diameter upper_lobe spiculated

Logistic regression                               Number of obs   =        207
                                                  LR chi2(8)      =      73.20
                                                  Prob > chi2     =     0.0000
Log likelihood = -94.454613                       Pseudo R2       =     0.2793

------------------------------------------------------------------------------
   malignant | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
surgical_c~t |   7.045799    4.91585     2.80   0.005     1.794929    27.65751
 patient_age |   .9933921   .0184868    -0.36   0.722     .9578115    1.030294
any_non_lu~y |   4.017493   1.537066     3.63   0.000     1.897978     8.50392
lung_cance~y |   10.43958   8.011157     3.06   0.002     2.319987     46.9765
 smoker_ever |   1.026627   .5138138     0.05   0.958     .3849437    2.737967
    diameter |   1.233463   .0787204     3.29   0.001     1.088433    1.397817
  upper_lobe |   1.483983   .5613942     1.04   0.297     .7069965    3.114874
  spiculated |   2.094564   .8488535     1.82   0.068     .9465232    4.635065
------------------------------------------------------------------------------

. predict swensen

. lsens, gensens(sensitivity) genspec(specificity) genpr(cutoffs)

[Figure: sensitivity and specificity vs. probability cutoff]

. lroc

[Figure: ROC curve for the Swensen model; sensitivity vs. 1 - specificity. Area under ROC curve = 0.8344]

Use saved predicted probabilities from logistic model:

. roctab malignant swensen

                  ROC                    -Asymptotic Normal--
       Obs       Area     Std. Err.      [95% Conf. Interval]
     --------------------------------------------------------
       207     0.8344       0.0294        0.77682     0.89203

. roctab malignant swensen, graph

Postestimation: 95% CI

Again use saved predicted probabilities from logistic model:

. roccomp malignant diameter swensen

                              ROC                    -Asymptotic Normal--
                   Obs       Area     Std. Err.      [95% Conf. Interval]
-------------------------------------------------------------------------
diameter           207     0.7351       0.0357        0.66518     0.80499
swensen            207     0.8344       0.0294        0.77682     0.89203
-------------------------------------------------------------------------
Ho: area(diameter) = area(swensen)
    chi2(1) =     9.52       Prob>chi2 =   0.0020

VI. Comparing ROC curves

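Testing AUC Equality

Using quantities defined by DeLong et al. for variance estimation to define a chi-squared test statistic for tests $r$ and $s$ measured on the same subjects:

$$S_{10}^{r,s} = \frac{1}{m-1}\sum_{i=1}^{m}\{V_{10}^{r}(X_i) - \hat{\theta}^{r}\}\{V_{10}^{s}(X_i) - \hat{\theta}^{s}\}$$

$$S_{01}^{r,s} = \frac{1}{n-1}\sum_{j=1}^{n}\{V_{01}^{r}(Y_j) - \hat{\theta}^{r}\}\{V_{01}^{s}(Y_j) - \hat{\theta}^{s}\}$$

$$S = \frac{1}{m}S_{10} + \frac{1}{n}S_{01}$$

$$(L\hat{\theta})^{\top}(L S L^{\top})^{-1}(L\hat{\theta}) \sim \chi^2$$

Here $L$ is the contrast matrix defining the null hypothesis $L\theta = 0$, and the degrees of freedom equal the rank of $L$ (1 when two curves are compared).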
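For two tests measured on the same subjects, the contrast is L = (1, -1) and the statistic reduces to the squared AUC difference divided by the estimated variance of that difference. The sketch below follows the DeLong et al. quantities under that assumption; the function names and the paired toy data are made up for illustration, and this is not the roccomp implementation.

```python
from math import erfc, sqrt


def psi(x, y):
    """Kernel: 1 if x > y, 0.5 if tied, 0 otherwise."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def components(x, y):
    """AUC plus centered V10 (diseased) and V01 (non-diseased) components."""
    m, n = len(x), len(y)
    v10 = [sum(psi(xi, yj) for yj in y) / n for xi in x]
    v01 = [sum(psi(xi, yj) for xi in x) / m for yj in y]
    theta = sum(v10) / m
    return theta, [v - theta for v in v10], [v - theta for v in v01]


def cross(u, v):
    """Sum of cross-products divided by (length - 1)."""
    return sum(a * b for a, b in zip(u, v)) / (len(u) - 1)


def chi2_pvalue_1df(chi2):
    """Upper tail of a chi-squared distribution with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


def compare_two_aucs(x_r, y_r, x_s, y_s):
    """Test H0: AUC_r = AUC_s; subjects must be in the same order for both tests."""
    m, n = len(x_r), len(y_r)
    theta_r, d10_r, d01_r = components(x_r, y_r)
    theta_s, d10_s, d01_s = components(x_s, y_s)
    s_rr = cross(d10_r, d10_r) / m + cross(d01_r, d01_r) / n
    s_ss = cross(d10_s, d10_s) / m + cross(d01_s, d01_s) / n
    s_rs = cross(d10_r, d10_s) / m + cross(d01_r, d01_s) / n
    # With L = (1, -1), L S L' is the variance of the difference in AUCs
    var_diff = s_rr + s_ss - 2.0 * s_rs
    chi2 = (theta_r - theta_s) ** 2 / var_diff
    return chi2, chi2_pvalue_1df(chi2)


# Hypothetical paired results for 3 diseased and 3 non-diseased subjects
chi2, p = compare_two_aucs([3, 4, 5], [1, 2, 4], [2, 5, 6], [1, 3, 4])
print(f"chi2(1) = {chi2:.4f}, p = {p:.4f}")
```

As a sanity check on the p-value helper, chi2_pvalue_1df(9.52) is about 0.002, in line with the roccomp comparison of diameter and swensen.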

Models with Multiple Predictors

. logistic malignant diameter any_non_lung_cancer_history surgical_cohort lung_cancer_history pet_positive pack

Logistic regression                               Number of obs   =        206
                                                  LR chi2(6)      =     112.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.983489                       Pseudo R2       =     0.4245

------------------------------------------------------------------------------
   malignant | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diameter |   1.218243     .08847     2.72   0.007      1.05662    1.404588
any_non_lu~y |   3.830492   1.693158     3.04   0.002     1.610666    9.109691
surgical_c~t |   6.996053   5.682876     2.39   0.017     1.423719    34.37811
lung_cance~y |   10.16367   8.299078     2.84   0.005     2.051197    50.36092
pet_positive |   11.38458   5.025505     5.51   0.000      4.79259    27.04355
        pack |   1.007755   .0046908     1.66   0.097     .9986032    1.016991
------------------------------------------------------------------------------

. predict musc

. roccomp malignant diameter swensen musc, graph summary

[Figure: ROC curves with reference line; sensitivity vs. 1-specificity. diameter ROC area: 0.741; swensen ROC area: 0.8344; musc ROC area: 0.8987]

. roccomp malignant diameter swensen musc

                              ROC                    -Asymptotic Normal--
                   Obs       Area     Std. Err.      [95% Conf. Interval]
-------------------------------------------------------------------------
diameter           202     0.7410       0.0359        0.67062     0.81131
swensen            202     0.8344       0.0298        0.77605     0.89272
musc               202     0.8987       0.0230        0.85374     0.94372
-------------------------------------------------------------------------
Ho: area(diameter) = area(swensen) = area(musc)
    chi2(2) =    22.81       Prob>chi2 =   0.0000

. rocgold malignant swensen diameter musc

-------------------------------------------------------------------------------
                       ROC                                          Bonferroni
                      Area     Std. Err.      chi2    df   Pr>chi2     Pr>chi2
-------------------------------------------------------------------------------
swensen (standard)  0.8344       0.0298
diameter            0.7410       0.0359     8.2690     1    0.0040      0.0081
musc                0.8987       0.0230     8.6304     1    0.0033      0.0066
-------------------------------------------------------------------------------

Questions?

Next: ROC in SPSS

b. Lorenz Curves

• ROC curve represents a monotone increasing function of the FPF (1-specificity)

• If the risk of disease does not vary monotonically with the diagnostic test then the ROC may not be convex

• Lee (1999) suggested a Lorenz curve (used commonly in economics) for such data

• The methodology involves reordering the test results to ensure that the ratio of disease subjects / no disease subjects in each category is increasing

• Must consider whether reordering makes practical sense (usually sensible on an ordinal scale but not necessarily on a continuous scale)

[Figure: Lorenz curve; cumulative % of malignant=1 vs. cumulative % of malignant=0]

Defining Lorenz Curves

• Plot cumulative percent of individuals with disease against the cumulative percent of individuals without the disease at each cut-point

• Examples when a Lorenz curve might be appropriate:

– Test has similar means but different variances across populations with and without disease

– Bimodal distribution of test in either population

– Skewed distribution in population with disease and symmetric distribution in population without the disease

• A flatter Lorenz curve suggests a worse diagnostic test

• Two summary indices describe the curvature

– Gini index: twice the area between the Lorenz curve and the diagonal line

– Pietra index: twice the area of the largest triangle inscribed between the diagonal line and the curve

Lorenz Curves and ROC

. roctab malignant diameter, lorenz graph

. roctab malignant diameter, lorenz

Lorenz curve
---------------------------
 Pietra index =    0.2322
 Gini index   =    0.3301

• If the at-risk probabilities increase (or decrease) with increasing values of the test results then Gini = 2(AUC)-1

• Larger Pietra and Gini indices describe better diagnostic tests

• Gini index is related to average difference in post-test probabilities for two randomly selected subjects and Pietra index is related to average absolute change between pre and post test probabilities of disease
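A minimal sketch of the two indices, applied to the five diameter categories of the contingency table rather than to individual diameters (so the values differ from the roctab output). Categories are reordered by their malignant/benign ratio, the Gini index is taken as twice the area between the diagonal and the curve, and the Pietra index as the largest vertical distance between the diagonal and the curve, which equals twice the area of the largest inscribed triangle. Variable names are illustrative, and the code assumes every category contains at least one benign nodule.

```python
benign = [41, 30, 29, 17, 24]
malignant = [2, 10, 13, 14, 33]
n_benign, n_malignant = sum(benign), sum(malignant)

# Reorder categories so the malignant/benign ratio is increasing
order = sorted(range(len(benign)), key=lambda k: malignant[k] / benign[k])

# Lorenz curve: cumulative fraction of malignant (y) against benign (x)
x, y = [0.0], [0.0]
for k in order:
    x.append(x[-1] + benign[k] / n_benign)
    y.append(y[-1] + malignant[k] / n_malignant)

# Area under the Lorenz curve by the trapezoidal rule
area_under = sum((x1 - x0) * (y0 + y1) / 2.0
                 for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))

gini = 2.0 * (0.5 - area_under)                    # twice the area between diagonal and curve
pietra = max(xi - yi for xi, yi in zip(x, y))      # largest gap below the diagonal

print(f"Gini = {gini:.4f}, Pietra = {pietra:.4f}")
```

In this grouped table the malignant/benign ratio already increases with diameter category, so the Gini index equals 2(AUC)-1 for the grouped trapezoidal AUC.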