roc curves studentslev3

7/28/2019 ROC Curves Studentslev3


2009, KAISER PERMANENTE CENTER FOR HEALTH RESEARCH

Receiver Operating Characteristic (ROC) Curves

Assessing the predictive properties of a test statistic

Decision Theory




Binary Prediction Problem: Conceptual Framework

Suppose we have a test statistic for predicting the presence or absence of disease.

                      True Disease Status
                      Pos     Neg
Test Criterion  Pos   TP      FP
                Neg   FN      TN
                      P       N       P+N

TP = True Positive

FP = False Positive

FN = False Negative

TN = True Negative





Binary Prediction Problem: Test Properties

                      True Disease Status
                      Pos     Neg
Test Criterion  Pos   TP      FP
                Neg   FN      TN
                      P       N       P+N

Accuracy = Probability that the test yields a correct result
= (TP+TN) / (P+N)


Sensitivity = Probability that a true case will test positive
= TP / P

Also referred to as True Positive Rate (TPR) or True Positive Fraction (TPF).


Specificity = Probability that a true negative will test negative
= TN / N

Also referred to as True Negative Rate (TNR) or True Negative Fraction (TNF).


1-Specificity = Probability that a true negative will test positive
= FP / N

Also referred to as False Positive Rate (FPR) or False Positive Fraction (FPF).


Positive Predictive Value (PPV) = Probability that a person with a positive test will truly have disease
= TP / (TP+FP)


Negative Predictive Value (NPV) = Probability that a person with a negative test will truly be disease free
= TN / (TN+FN)




Binary Prediction Problem: Example

                      True Disease Status
                      Pos     Neg
Test Criterion  Pos   27      173     200
                Neg   73      727     800
                      100     900     1000

Se = 27/100 = .27

Sp = 727/900 = .81

FPF = 1 - Sp = .19

Acc = (27+727)/1000 = .75

PPV = 27/200 = .14

NPV = 727/800 = .91
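The arithmetic in this example can be reproduced with a short Python sketch. The function name `diagnostic_properties` and the dictionary keys are illustrative choices, not part of the original slides; the counts are the four cells of the example table.

```python
def diagnostic_properties(tp, fp, fn, tn):
    """Compute summary measures from the four cells of a 2x2 table."""
    p = tp + fn  # column total P: everyone who truly has disease
    n = fp + tn  # column total N: everyone who is truly disease free
    return {
        "Se": tp / p,                # sensitivity = TP / P
        "Sp": tn / n,                # specificity = TN / N
        "FPF": fp / n,               # 1 - specificity = FP / N
        "Acc": (tp + tn) / (p + n),  # accuracy = (TP+TN) / (P+N)
        "PPV": tp / (tp + fp),       # positive predictive value
        "NPV": tn / (tn + fn),       # negative predictive value
    }


# Counts from the example table (test positive at 4 or more risk factors).
props = diagnostic_properties(tp=27, fp=173, fn=73, tn=727)
for name, value in props.items():
    print(f"{name} = {value:.3f}")
```

Printing three decimals shows the unrounded values behind the two-decimal figures on the slide.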





Of these properties, only Se and Sp (and hence FPR) are considered invariant test characteristics.

Accuracy, PPV, and NPV will vary according to the underlying prevalence of disease.

Se and Sp are thus fundamental test properties and hence are the most useful measures for comparing different test criteria, even though PPV and NPV are probably the most clinically relevant properties.
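To see the dependence on prevalence concretely, the sketch below holds Se and Sp fixed at the values from the earlier example and recomputes PPV and NPV with Bayes' rule. The function name `predictive_values` and the prevalences other than 0.10 are hypothetical, chosen only for illustration.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) implied by Bayes' rule for a given disease prevalence."""
    tp = se * prevalence              # P(test positive and diseased)
    fp = (1 - sp) * (1 - prevalence)  # P(test positive and disease free)
    fn = (1 - se) * prevalence        # P(test negative and diseased)
    tn = sp * (1 - prevalence)        # P(test negative and disease free)
    return tp / (tp + fp), tn / (tn + fn)


se, sp = 27 / 100, 727 / 900  # Se and Sp from the earlier example
# 0.10 is the prevalence in the example table (100 of 1000);
# 0.01 and 0.30 are hypothetical values for comparison.
for prev in (0.01, 0.10, 0.30):
    ppv, npv = predictive_values(se, sp, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same Se and Sp, PPV rises and NPV falls as prevalence increases, which is why only Se and Sp are treated as properties of the test itself.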




ROC Curves

Now assume that our test statistic is no longer binary, but takes on a series of values (for instance, how many of five distinct risk factors a person exhibits).

Clinically we make a rule that says the test is positive if the number of risk factors meets or exceeds some threshold (#RF ≥ x).

Suppose our previous table resulted from using x = 4. Let's see what happens as we vary x.




ROC Curves: Impact of using a threshold of 3 or more RFs

                      True Disease Status
                      Pos     Neg
Test Criterion  Pos   45      200     245
                Neg   55      700     755
                      100     900     1000

Se = 45/100 = .45

Sp = 700/900 = .78

FPF = 1 - Sp = .22

Acc = (45+700)/1000 = .745

PPV = 45/245 = .18

NPV = 700/755 = .93

Relative to the threshold of 4 (Se = .27, Sp = .81, Acc = .75, PPV = .14, NPV = .91), Se increases, Sp decreases, and interestingly both PPV and NPV increase.




ROC Curves: Summary of all possible options

Threshold   TPR    FPR
6           0.00   0.00
5           0.10   0.11
4           0.27   0.19
3           0.45   0.22
2           0.73   0.27
1           0.98   0.80
0           1.00   1.00

As we relax our threshold for defining disease, our true positive rate (sensitivity) increases, but so does the false positive rate (FPR).

The ROC curve is a way to visually display this information.
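A table like this one can be produced by sweeping the threshold over person-level data. The slides do not include the underlying records, so the sketch below uses a small hypothetical data set (ten people, risk-factor counts from 0 to 5); the function name `roc_points` is also an illustrative choice.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, TPR, FPR) for the rule 'test positive if score >= threshold'."""
    p = sum(labels)       # number of true cases
    n = len(labels) - p   # number of true non-cases
    points = []
    for x in thresholds:
        tp = sum(1 for s, d in zip(scores, labels) if d == 1 and s >= x)
        fp = sum(1 for s, d in zip(scores, labels) if d == 0 and s >= x)
        points.append((x, tp / p, fp / n))
    return points


# Hypothetical data: number of risk factors and true disease status (1 = diseased).
scores = [5, 4, 4, 3, 2, 3, 2, 1, 1, 0]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# Sweep from the strictest rule (x = 6, nobody positive) to the loosest (x = 0).
pts = roc_points(scores, labels, range(6, -1, -1))
for x, tpr, fpr in pts:
    print(f"threshold={x}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

As the threshold is relaxed, neither TPR nor FPR can decrease, which is the pattern visible in the table.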




[Figure: ROC curve of TPR against FPR for the thresholds in the table above, with the points x=5, x=4, and x=2 labeled and a diagonal reference line]

The diagonal line shows what we would expect from simple guessing (i.e., pure chance).

What might an even better ROC curve look like?




ROC Curves: Summary of a more optimal curve

Threshold   TPR    FPR
6           0.00   0.00
5           0.10   0.01
4           0.77   0.02
3           0.90   0.03
2           0.95   0.04
1           0.99   0.40
0           1.00   1.00

Note the immediate sharp rise in sensitivity. Perfect accuracy is represented by the upper left corner.




ROC Curves: Use and interpretation

The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as our threshold varies.

The shape of the curve also gives us some visual clues about the overall strength of association between the underlying test statistic (in this case the number of RFs that are present) and disease status.





The ROC methodology easily generalizes to test statistics that are continuous (such as lung function or a blood gas). We simply fit a smoothed ROC curve through all observed data points.





See demo from www.anaesthetist.com/mnm/stats/roc/index.htm




ROC Curves: Area under the curve (AUC)

The total area of the grid represented by an ROC curve is 1, since both TPR and FPR range from 0 to 1.

The portion of this total area that falls below the ROC curve is known as the area under the curve, or AUC.
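One simple way to approximate this area is the trapezoidal rule. The sketch below applies it to the (FPR, TPR) pairs from the risk-factor threshold table, assuming the points are joined by straight lines rather than by a smoothed curve; the function name `auc_trapezoid` is illustrative.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR so the segments run left to right
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # width times average height
    return area


# (FPR, TPR) pairs for thresholds 6 down to 0 in the risk-factor example.
roc = [(0.00, 0.00), (0.11, 0.10), (0.19, 0.27), (0.22, 0.45),
       (0.27, 0.73), (0.80, 0.98), (1.00, 1.00)]
auc = auc_trapezoid(roc)
print(f"AUC = {auc:.3f}")

# The diagonal chance line from (0, 0) to (1, 1) encloses half of the grid.
print(f"Chance line AUC = {auc_trapezoid([(0, 0), (1, 1)]):.3f}")
```

A smoothed curve fitted through the same points would give a somewhat different area, so this value is only an approximation under the straight-line assumption.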




Area Under the Curve (AUC): Interpretation

The AUC serves as a quantitative summary of the strength of association between the underlying test statistic and disease status.

An AUC of 1.0 would mean that the test statistic could be used to perfectly discriminate between cases and controls.

An AUC of 0.5 (reflected by the diagonal 45° line) is equivalent to simply guessing.





The AUC can be shown to equal the Mann-Whitney U statistic (scaled by the product of the numbers of cases and controls), or equivalently the Wilcoxon rank-sum statistic, for testing whether the test measure differs for individuals with and without disease.

It also equals the probability that the value of our test measure would be higher for a randomly chosen case than for a randomly chosen control (counting ties as one half).
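The probability interpretation can be checked directly by comparing every case with every control. The sketch below does this for a small hypothetical data set; the function name `auc_pairwise` and the scores are assumptions made for illustration, and ties are counted as one half.

```python
def auc_pairwise(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case has the higher score."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk-factor counts for 4 cases and 6 controls.
cases = [5, 4, 3, 2]
controls = [4, 3, 2, 1, 1, 0]
print(f"AUC = {auc_pairwise(cases, controls):.4f}")
```

The numerator (`wins`) is the Mann-Whitney U statistic for the cases, so dividing by the number of pairs gives the AUC.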





[Figure: ROC curve, TPR against FPR (both axes 0 to 1), with cases and controls labeled; AUC ~ 0.540]




[Figure: ROC curve, TPR against FPR (both axes 0 to 1), with cases and controls labeled; AUC ~ .95]





What defines a good AUC?

Opinions vary.

Probably context specific. What may be a good AUC for predicting COPD may be very different from what is a good AUC for predicting prostate cancer.





http://gim.unmc.edu/dxtests/roc3.htm

.90-1.0 = excellent

.80-.90 = good

.70-.80 = fair

.60-.70 = poor

.50-.60 = fail

Remember that





www.childrens-mercy.org/stats/ask/roc.asp

.97-1.0 = excellent

.92-.97 = very good

.75-.92 = good

.50-.75 = fair




ROC Curves: Comparing multiple ROC curves

Suppose we have two candidate test statistics to use to create a binary decision rule. Can we use ROC curves to choose an optimal one?





Adapted from curves at: http://gim.unmc.edu/dxtests/roc3.htm





http://en.wikipedia.org/wiki/Receiver_operating_characteristic





We can formally compare AUCs for two competing test statistics, but does this answer our question?

AUC speaks to which measure, as a continuous variable, best discriminates between cases and controls.

It does not tell us which specific cutpoint to use, or even which test statistic will ultimately provide the best cutpoint.




ROC Curves: Choosing an optimal cutpoint

The choice of a particular Se and Sp should reflect the relative costs of FP and FN results.

What if a positive test triggers an invasive procedure?

What if the disease is life threatening and I have an inexpensive and effective treatment?

How do you balance these and other competing factors?

See excellent discussion of these issues at www.anaesthetist.com/mnm/stats/roc/index.htm
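One way to make this trade-off explicit is to attach a cost to each kind of error and pick the threshold with the lowest expected cost. The sketch below is only an illustration of that idea, not a method given in the slides: the cost values, the 10% prevalence (taken from the 100 cases in 1000 of the earlier example), and the names `expected_cost` and `best_threshold` are all assumptions.

```python
# (threshold, TPR, FPR) rows from the risk-factor threshold table.
ROC_TABLE = [(6, 0.00, 0.00), (5, 0.10, 0.11), (4, 0.27, 0.19), (3, 0.45, 0.22),
             (2, 0.73, 0.27), (1, 0.98, 0.80), (0, 1.00, 1.00)]


def expected_cost(tpr, fpr, prevalence, cost_fn, cost_fp):
    """Average misclassification cost per person tested."""
    missed_cases = prevalence * (1 - tpr)   # probability of a false negative
    false_alarms = (1 - prevalence) * fpr   # probability of a false positive
    return missed_cases * cost_fn + false_alarms * cost_fp


def best_threshold(prevalence, cost_fn, cost_fp, table=ROC_TABLE):
    """Return the threshold whose (TPR, FPR) pair has the lowest expected cost."""
    best = min(table, key=lambda row: expected_cost(row[1], row[2],
                                                    prevalence, cost_fn, cost_fp))
    return best[0]


# Hypothetical costs: a missed case is five times as costly as a false alarm.
print(best_threshold(prevalence=0.10, cost_fn=5, cost_fp=1))
# Hypothetical costs: both kinds of error are equally costly.
print(best_threshold(prevalence=0.10, cost_fn=1, cost_fp=1))
```

Under these assumed numbers, weighting missed cases more heavily moves the preferred cutpoint toward a looser threshold, while equal costs at low prevalence favor a strict one.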




ROC Curves: Generalizations

These techniques can be applied to any binary outcome. It doesn't have to be disease status.

In fact, the use of ROC curves was first introduced during WWII in response to the challenge of how to accurately identify enemy planes on radar screens.



ROC Curves: Final cautionary notes

We assume throughout the existence of a gold standard for measuring disease, when in practice no such gold standard exists. Consider COPD, asthma, even cancer (can we truly rule out cancer in a given patient?).

As a result, even Se and Sp may not be inherently stable test characteristics, but may vary depending on how we define disease and the clinical context in which it is measured. Are we evaluating the test in the general population or only among patients referred to a specialty clinic?

Incorrect specification of P and N will vary in these two settings.
