7/28/2019 ROC Curves Studentslev3
1/43
2009, KAISER PERMANENTE CENTER FOR HEALTH RESEARCH
Receiver Operating Characteristic (ROC) Curves
Assessing the predictive properties of a test statistic
Decision Theory
-
Binary Prediction Problem: Conceptual Framework
Suppose we have a test statistic for predicting the
presence or absence of disease.
                      True Disease Status
                      Pos     Neg
Test        Pos       TP      FP
Criterion   Neg       FN      TN
            Total     P       N       P+N
TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative
-
Binary Prediction Problem: Test Properties
                      True Disease Status
                      Pos     Neg
Test        Pos       TP      FP
Criterion   Neg       FN      TN
            Total     P       N       P+N
Accuracy = Probability that the test yields a correct result
         = (TP+TN) / (P+N)
-
Binary Prediction Problem: Test Properties
Sensitivity = Probability that a true case will test positive
            = TP / P
Also referred to as True Positive Rate (TPR)
or True Positive Fraction (TPF).
-
Binary Prediction Problem: Test Properties
Specificity = Probability that a true negative will test negative
            = TN / N
Also referred to as True Negative Rate (TNR)
or True Negative Fraction (TNF).
-
Binary Prediction Problem: Test Properties
1 - Specificity = Probability that a true negative will test positive
                = FP / N
Also referred to as False Positive Rate (FPR)
or False Positive Fraction (FPF).
-
Binary Prediction Problem: Test Properties
Positive Predictive Value (PPV) = Probability that a person who tests positive truly has the disease
                                = TP / (TP+FP)
-
Binary Prediction Problem: Test Properties
Negative Predictive Value (NPV) = Probability that a person who tests negative is truly disease free
                                = TN / (TN+FN)
-
Binary Prediction Problem: Example
                      True Disease Status
                      Pos     Neg     Total
Test        Pos        27     173      200
Criterion   Neg        73     727      800
            Total     100     900     1000
Se  = 27/100 = .27
Sp  = 727/900 = .81
FPF = 1 - Sp = .19
Acc = (27+727)/1000 = .75
PPV = 27/200 = .14
NPV = 727/800 = .91
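The arithmetic in this example can be reproduced with a short script. This is only a sketch: the function name `two_by_two_properties` and the dictionary keys are illustrative choices, not part of the original slides.

```python
def two_by_two_properties(tp, fp, fn, tn):
    """Return the six test properties defined on the preceding slides."""
    p = tp + fn  # P: everyone who truly has the disease
    n = fp + tn  # N: everyone who is truly disease free
    return {
        "Se": tp / p,                # sensitivity (TPR)
        "Sp": tn / n,                # specificity (TNR)
        "FPF": fp / n,               # 1 - specificity (FPR)
        "Acc": (tp + tn) / (p + n),  # accuracy
        "PPV": tp / (tp + fp),       # positive predictive value
        "NPV": tn / (tn + fn),       # negative predictive value
    }


# Counts taken from the example table: TP=27, FP=173, FN=73, TN=727.
props = two_by_two_properties(tp=27, fp=173, fn=73, tn=727)
for name, value in props.items():
    print(f"{name} = {value:.3f}")
```

Printing three decimals makes it easier to see how the slide's two-decimal values were rounded (for instance PPV = 27/200 = .135).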
-
Binary Prediction Problem: Test Properties
Of these properties, only Se and Sp (and hence FPR)
are considered invariant test characteristics.
Accuracy, PPV, and NPV will vary according to the
underlying prevalence of disease.
Se and Sp are thus fundamental test properties and
hence are the most useful measures for comparing
different test criteria, even though PPV and NPV are
probably the most clinically relevant properties.
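To see the dependence on prevalence concretely, the sketch below holds Se and Sp fixed at the values from the earlier example (27/100 and 727/900) and recomputes PPV and NPV for a few prevalences. The prevalence values 0.01 and 0.50 and the function name `predictive_values` are assumptions chosen for illustration; 0.10 is the prevalence implied by the example table (100 cases out of 1000).

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV implied by fixed Se and Sp at a given disease prevalence."""
    tp = se * prevalence               # expected fraction: diseased, test positive
    fn = (1 - se) * prevalence         # diseased, test negative
    fp = (1 - sp) * (1 - prevalence)   # disease free, test positive
    tn = sp * (1 - prevalence)         # disease free, test negative
    return tp / (tp + fp), tn / (tn + fn)


se, sp = 27 / 100, 727 / 900
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(se, sp, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At a prevalence of 0.10 this reproduces the example's PPV and NPV; as prevalence rises, PPV goes up and NPV goes down even though the test itself has not changed.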
-
ROC Curves
Now assume that our test statistic is no longer binary,
but takes on a series of values (for instance, how
many of five distinct risk factors a person exhibits).
Clinically we make a rule that says the test is positive
if the number of risk factors meets or exceeds some
threshold (#RF ≥ x).
Suppose our previous table resulted from using x = 4.
Let's see what happens as we vary x.
-
ROC Curves: Impact of using a threshold of 3 or more RFs
                      True Disease Status
                      Pos     Neg     Total
Test        Pos        45     200      245   (was 200)
Criterion   Neg        55     700      755   (was 800)
            Total     100     900     1000
Se  = 45/100 = .45   (was .27)
Sp  = 700/900 = .78   (was .81)
FPF = 1 - Sp = .22   (was .19)
Acc = (45+700)/1000 = .75   (was .75)
PPV = 45/245 = .18   (was .14)
NPV = 700/755 = .93   (was .91)
Se increases, Sp decreases, and interestingly both PPV and NPV increase.
-
ROC Curves: Summary of all possible options
Threshold TPR FPR
6 0.00 0.00
5 0.10 0.11
4 0.27 0.19
3 0.45 0.22
2 0.73 0.27
1 0.98 0.80
0 1.00 1.00
As we relax our threshold for defining disease, our
true positive rate (sensitivity) increases, but
so does the false positive rate (FPR).
The ROC curve is a way to visually display this
information.
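The threshold sweep behind this table can be sketched as follows. The per-risk-factor counts are hypothetical: they were back-computed from the rounded TPR/FPR values and the two example tables (P = 100 cases, N = 900 controls) and are not given in the slides. The rule "positive if #RF >= x" is the one stated earlier.

```python
# Hypothetical number of people with exactly k risk factors (k = 0..5),
# chosen so that the cumulative counts match the slides' tables.
cases = {0: 2, 1: 25, 2: 28, 3: 18, 4: 17, 5: 10}        # sums to P = 100
controls = {0: 180, 1: 477, 2: 43, 3: 27, 4: 74, 5: 99}  # sums to N = 900


def roc_points(cases, controls, thresholds):
    """Return (x, TPR, FPR) for each threshold x, using 'positive if #RF >= x'."""
    p, n = sum(cases.values()), sum(controls.values())
    points = []
    for x in thresholds:
        tp = sum(count for rf, count in cases.items() if rf >= x)
        fp = sum(count for rf, count in controls.items() if rf >= x)
        points.append((x, tp / p, fp / n))
    return points


for x, tpr, fpr in roc_points(cases, controls, range(6, -1, -1)):
    print(f"x={x}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

With these counts, x = 4 gives TP = 27 and FP = 173, and x = 3 gives TP = 45 and FP = 200, matching the two worked tables.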
-
ROC Curves: Summary of all possible options
[Figure: ROC curve plotting TPR against FPR for the thresholds in the summary table, with the points for x=5, x=4, and x=2 labeled.]
The diagonal line shows what we would expect
from simple guessing (i.e., pure chance).
What might an even better ROC curve look like?
-
ROC Curves: Summary of a more optimal curve
Threshold TPR FPR
6 0.00 0.00
5 0.10 0.01
4 0.77 0.02
3 0.90 0.03
2 0.95 0.04
1 0.99 0.40
0 1.00 1.00
Note the immediate sharp rise in
sensitivity. Perfect accuracy is
represented by the upper left corner.
-
ROC Curves: Use and interpretation
The ROC curve allows us to see, in a simple
visual display, how sensitivity and specificity
vary as our threshold varies.
The shape of the curve also gives us some
visual clues about the overall strength of
association between the underlying test
statistic (in this case #RFs that are present)
and disease status.
-
ROC Curves: Use and interpretation
The ROC methodology easily generalizes to test
statistics that are continuous (such as lung
function or a blood gas).
We simply fit a smoothed ROC curve through all
observed data points.
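For a continuous statistic, the observed data points come from treating every distinct observed value as a candidate cutoff. The sketch below builds that empirical (unsmoothed) set of points; the scores are made-up numbers, and it assumes that higher values indicate disease, which would need to be reversed for a measure where low values are abnormal.

```python
def empirical_roc(case_scores, control_scores):
    """(FPR, TPR) at every distinct observed cutoff; positive if score >= cutoff."""
    cutoffs = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every score: nobody tests positive
    for c in cutoffs:
        tpr = sum(s >= c for s in case_scores) / len(case_scores)
        fpr = sum(s >= c for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical measurements, for illustration only.
case_scores = [3.1, 2.4, 2.9, 1.8, 3.6]
control_scores = [1.2, 2.0, 1.5, 2.6, 0.9]

for fpr, tpr in empirical_roc(case_scores, control_scores):
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

A smoothed curve, as described on the slide, would then be fitted through these points; the smoothing step itself is not shown here.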
-
ROC Curves: Use and interpretation
See demo at
http://www.anaesthetist.com/mnm/stats/roc/index.htm
-
ROC Curves: Area under the curve (AUC)
The total area of the grid
represented by an ROC
curve is 1, since both TPR
and FPR range from 0 to 1.
The portion of this total
area that falls below the
ROC curve is known as
the area under the curve,
or AUC.
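One simple way to approximate this area is the trapezoidal rule applied to the (FPR, TPR) points. The sketch below does this for the two threshold tables shown earlier; connecting the points with straight lines is an assumption made here, and a smoothed curve would give a somewhat different value.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between neighboring points
    return area


# (FPR, TPR) pairs copied from the two summary tables.
first_curve = [(0.00, 0.00), (0.11, 0.10), (0.19, 0.27), (0.22, 0.45),
               (0.27, 0.73), (0.80, 0.98), (1.00, 1.00)]
better_curve = [(0.00, 0.00), (0.01, 0.10), (0.02, 0.77), (0.03, 0.90),
                (0.04, 0.95), (0.40, 0.99), (1.00, 1.00)]

print(f"first curve:  AUC ~ {auc_trapezoid(first_curve):.2f}")
print(f"better curve: AUC ~ {auc_trapezoid(better_curve):.2f}")
```

The first table gives an area of roughly 0.71 and the "more optimal" table roughly 0.97, which matches the visual impression that the second curve hugs the upper left corner.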
-
Area Under the Curve (AUC): Interpretation
The AUC serves as a quantitative summary of
the strength of association between the
underlying test statistic and disease status.
An AUC of 1.0 would mean that the test
statistic could be used to perfectly discriminate
between cases and controls.
An AUC of 0.5 (reflected by the diagonal 45° line)
is equivalent to simply guessing.
-
Area Under the Curve (AUC): Interpretation
The AUC can be shown to equal the Mann-Whitney
U statistic (divided by the number of case-control
pairs), or equivalently the Wilcoxon rank statistic,
for testing whether the test measure differs for
individuals with and without disease.
It also equals the probability that the value of our
test measure would be higher for a randomly
chosen case than for a randomly chosen control
(with ties counted as one half).
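The pairwise interpretation can be checked directly by comparing every case with every control. The counts below are hypothetical: they are the number of people with exactly k risk factors, back-computed from the rounded threshold table and the example tables, and are not given in the slides.

```python
# Hypothetical number of people with exactly k risk factors (k = 0..5).
cases = {0: 2, 1: 25, 2: 28, 3: 18, 4: 17, 5: 10}        # 100 cases
controls = {0: 180, 1: 477, 2: 43, 3: 27, 4: 74, 5: 99}  # 900 controls


def mann_whitney_u(cases, controls):
    """U = number of (case, control) pairs where the case scores higher,
    counting tied pairs as one half."""
    wins = ties = 0
    for rf_case, n_case in cases.items():
        for rf_ctrl, n_ctrl in controls.items():
            pairs = n_case * n_ctrl
            if rf_case > rf_ctrl:
                wins += pairs
            elif rf_case == rf_ctrl:
                ties += pairs
    return wins + 0.5 * ties


u = mann_whitney_u(cases, controls)
n_pairs = sum(cases.values()) * sum(controls.values())
print(f"U = {u}, pairs = {n_pairs}, AUC = U / pairs = {u / n_pairs:.3f}")
```

With these counts the ratio comes out at about 0.711, the same value the trapezoidal rule gives for the corresponding unrounded ROC points.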
-
Area Under the Curve (AUC): Interpretation
[Figure: cases vs. controls panel and the corresponding ROC curve (TPR vs. FPR), AUC ~ 0.54]
-
Area Under the Curve (AUC): Interpretation
[Figure: cases vs. controls panel and the corresponding ROC curve (TPR vs. FPR), AUC ~ .95]
-
Area Under the Curve (AUC): Interpretation
What defines a good AUC?
Opinions vary.
Probably context specific.
What may be a good AUC for predicting COPD
may be very different from what is a good AUC
for predicting prostate cancer.
-
Area Under the Curve (AUC): Interpretation
http://gim.unmc.edu/dxtests/roc3.htm
.90-1.0 = excellent
.80-.90 = good
.70-.80 = fair
.60-.70 = poor
.50-.60 = fail
Remember that
-
Area Under the Curve (AUC): Interpretation
http://www.childrens-mercy.org/stats/ask/roc.asp
.97-1.0 = excellent
.92-.97 = very good
.75-.92 = good
.50-.75 = fair
-
ROC Curves: Comparing multiple ROC curves
Suppose we have two candidate test
statistics to use to create a binary decision
rule. Can we use ROC curves to choose
an optimal one?
-
ROC Curves: Comparing multiple ROC curves
Adapted from curves at: http://gim.unmc.edu/dxtests/roc3.htm
-
ROC Curves: Comparing multiple ROC curves
http://en.wikipedia.org/wiki/Receiver_operating_characteristic
-
ROC Curves: Comparing multiple ROC curves
We can formally compare AUCs for two
competing test statistics, but does this
answer our question?
AUC speaks to which measure, as a
continuous variable, best discriminates
between cases and controls.
It does not tell us which specific cutpoint to
use, or even which test statistic will ultimately
provide the best cutpoint.
-
ROC Curves: Choosing an optimal cutpoint
The choice of a particular Se and Sp should reflect the
relative costs of FP and FN results.
What if a positive test triggers an invasive procedure?
What if the disease is life threatening and I have an
inexpensive and effective treatment?
How do you balance these and other competing factors?
See excellent discussion of these issues at
www.anaesthetist.com/mnm/stats/roc/index.htm
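One way to make this trade-off explicit is to attach a cost to each kind of error and pick the threshold with the lowest expected cost per person tested. The sketch below is an illustration under assumptions: the cost values, the prevalence of 0.10, and the simple expected-cost formula are not from the slides, and real decisions involve factors that a single cost ratio does not capture.

```python
# (threshold x, TPR, FPR) rows copied from the first ROC summary table.
roc_table = [
    (6, 0.00, 0.00), (5, 0.10, 0.11), (4, 0.27, 0.19), (3, 0.45, 0.22),
    (2, 0.73, 0.27), (1, 0.98, 0.80), (0, 1.00, 1.00),
]


def expected_cost(tpr, fpr, prevalence, cost_fn, cost_fp):
    """Expected misclassification cost per person tested."""
    missed_cases = prevalence * (1 - tpr)   # probability of a false negative
    false_alarms = (1 - prevalence) * fpr   # probability of a false positive
    return cost_fn * missed_cases + cost_fp * false_alarms


def best_threshold(table, prevalence, cost_fn, cost_fp):
    """Threshold whose (TPR, FPR) pair minimizes the expected cost."""
    best = min(table, key=lambda row: expected_cost(row[1], row[2],
                                                    prevalence, cost_fn, cost_fp))
    return best[0]


# Equal costs: with a weak test and 10% prevalence, false positives dominate.
print(best_threshold(roc_table, prevalence=0.10, cost_fn=1, cost_fp=1))
# A missed case is 20 times as costly as a false alarm: favor sensitivity.
print(best_threshold(roc_table, prevalence=0.10, cost_fn=20, cost_fp=1))
```

Under equal costs the minimum falls at x = 6 (never call the test positive), while weighting a missed case 20 times as heavily moves it to x = 1, which illustrates how strongly the preferred cutpoint depends on the assumed costs.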
-
ROC Curves: Generalizations
These techniques can be applied to any binary
outcome. It doesn't have to be disease status.
In fact, the use of ROC curves was first introduced during
WWII in response to the challenge of how to accurately
identify enemy planes on radar screens.
-
ROC Curves: Final cautionary notes
We assume throughout the existence of a gold
standard for measuring disease, when in practice no
such gold standard exists.
COPD, asthma, even cancer (can we truly rule out
cancer in a given patient?)
As a result, even Se and Sp may not be inherently
stable test characteristics, but may vary depending on
how we define disease and the clinical context in which
it is measured.
Are we evaluating the test in the general population or only among
patients referred to a specialty clinic?
Incorrect specification of P and N will vary in these two settings.