


The Receiver Operating Characteristic (ROC) Curve

EPP 245/298 Statistical Analysis of Laboratory Data


November 10, 2004 EPP 245 Statistical Analysis of Laboratory Data


Binary Classification

• Suppose we have two groups for which each case is a member of one or the other, and that we know the correct classification (“truth”).

• Suppose we have a prediction method that produces a single numerical value, and that small values of that number suggest membership in group 1 and large values suggest membership in group 2.


• If we pick a cutpoint t, we can assign any case with a predicted value ≤ t to group 1 and the others to group 2.

• For that value of t, we can compute the number correctly assigned to group 2 and the number incorrectly assigned to group 2 (true positives and false positives).

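• For t small enough, all cases will be assigned to group 2, and for t large enough, all will be assigned to group 1.

• The ROC curve is a plot of the number of true positives (vertical axis) vs. the number of false positives (horizontal axis) as the cutpoint t varies over its whole range.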
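The counting rule in the bullets above can be checked by hand on a tiny data set. The sketch below is illustrative only: the eight values of truth and pred are made up, the helper name counts.at is hypothetical, and group 2 is coded as truth = 1 (group 1 as truth = 0).

```r
# Made-up example data: truth is 0 for group 1 and 1 for group 2.
truth <- c(0, 0, 0, 0, 1, 1, 1, 1)
pred  <- c(9.1, 9.8, 10.4, 11.2, 10.9, 11.5, 12.3, 12.8)

# Hypothetical helper: counts for one cutpoint t.
# Cases with pred <= t go to group 1, the others to group 2.
counts.at <- function(truth, pred, t) {
  assigned2 <- pred > t
  c(tp = sum(assigned2 & truth == 1),   # correctly assigned to group 2
    fp = sum(assigned2 & truth == 0))   # incorrectly assigned to group 2
}

counts.at(truth, pred, 11)   # tp = 3, fp = 1
counts.at(truth, pred, 5)    # t small enough: tp = 4, fp = 4
counts.at(truth, pred, 15)   # t large enough: tp = 0, fp = 0
```

Each cutpoint gives one (fp, tp) pair, that is, one point on the ROC curve.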


datagen <- function(){
  # 50 cases with truth = 0 (group 1) and 50 with truth = 1 (group 2)
  truth <- rep(0:1,each=50)
  pred <- c(rnorm(50,10,1),rnorm(50,12,1))
  return(data.frame(truth=truth,pred=pred))
}

plot1 <- function(){
  # relies on truth and pred being visible via attach()
  nz <- sum(truth==0)
  n <- length(truth)
  plot(density(pred[1:nz]),lwd=2,xlim=c(6,18),
       main="Generating an ROC Curve")
  lines(density(pred[(nz+1):n]),col=2,lwd=2)
  # three candidate cutpoints
  abline(v=10,col=4,lwd=2)
  abline(v=11,col=4,lwd=2)
  abline(v=12,col=4,lwd=2)
}
-----------------------------------------
> source("rocsim.r")
> roc.data <- datagen()
> attach(roc.data)
> plot1()
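Because datagen() draws the truth = 0 cases from N(10, 1) and the truth = 1 cases from N(12, 1), the long-run fraction of each group lying above the three cutpoints marked by plot1() can be computed directly with pnorm. This is a sketch of that arithmetic, not part of the course code; the variable names are chosen here for illustration.

```r
cutpts <- c(10, 11, 12)

# Fraction of the truth = 1 group (mean 12) above each cutpoint.
tpr <- 1 - pnorm(cutpts, mean = 12, sd = 1)
# Fraction of the truth = 0 group (mean 10) above each cutpoint.
fpr <- 1 - pnorm(cutpts, mean = 10, sd = 1)

round(tpr, 3)   # 0.977 0.841 0.500
round(fpr, 3)   # 0.500 0.159 0.023
```

Moving the cutpoint to the right lowers both fractions, which is the trade-off the ROC curve traces out.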


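roc.curve <- function(truth,pred,maxx){
  ntp <- sum(truth==1)   # number of cases truly in group 2
  ntn <- sum(truth==0)   # number of cases truly in group 1
  n <- length(truth)
  preds <- sort(unique(pred))
  npred <- length(preds)
  tp <- vector("numeric",npred+1)
  fp <- tp
  # cutpoint below every prediction: all cases assigned to group 2
  fp[1] <- ntn
  tp[1] <- ntp
  for (i in 1:npred)
  {
    cutpt <- preds[i]
    # pred <= cutpt goes to group 1, so count only pred > cutpt;
    # the last cutpoint then gives the point (0,0)
    tp[i+1] <- sum((pred > cutpt)&(truth==1))
    fp[i+1] <- sum((pred > cutpt)&(truth==0))
  }
  plot(fp,tp, type="l",lwd=2,xlim=c(0,maxx))
  title("ROC Curve")
}
----------------------------------------
> roc.curve(truth,pred,50)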
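roc.curve() only draws the plot. To inspect the points themselves, the same counting can be done without plotting. The sketch below is a variant written for illustration, not the course code: roc.points is a hypothetical name, the data are made up, it applies the cutpoint rule stated earlier (a predicted value ≤ t goes to group 1), and it also returns rates (counts divided by group sizes), which is a common way to scale ROC curves.

```r
# Hypothetical helper: the points of the ROC curve as a data frame.
roc.points <- function(truth, pred) {
  # Start below every prediction so that all cases go to group 2,
  # then use each observed value in turn as the cutpoint.
  cutpts <- c(-Inf, sort(unique(pred)))
  tp <- sapply(cutpts, function(t) sum(pred > t & truth == 1))
  fp <- sapply(cutpts, function(t) sum(pred > t & truth == 0))
  data.frame(cutpt = cutpts, tp = tp, fp = fp,
             tpr = tp / sum(truth == 1),
             fpr = fp / sum(truth == 0))
}

# Made-up data for illustration.
truth <- c(0, 0, 1, 0, 1, 1)
pred  <- c(9.5, 10.2, 10.8, 11.1, 11.9, 12.6)
pts <- roc.points(truth, pred)
pts
```

Here the first row is (fp, tp) = (3, 3) and the last is (0, 0); plot(pts$fp, pts$tp, type="l") would draw the count-scale curve for these data.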


datagen2 <- function(){
  # rare outcome: 990 cases with truth = 0 and only 10 with truth = 1
  truth <- rep(0:1,c(990,10))
  pred <- c(rnorm(990,10,1),rnorm(10,12,1))
  return(data.frame(truth=truth,pred=pred))
}
--------------------------------------
> detach(roc.data)
> roc.data2 <- datagen2()
> attach(roc.data2)
> roc.curve(truth,pred,40)


ROC Curve for Rare Outcome
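One way to see what a rare outcome does to the counts is to compute the expected numbers above a single cutpoint from the distributions used in datagen2(). This is a sketch under stated assumptions: the cutpoint 11 is picked here for illustration (it is one of the values marked in the earlier density plot), and the variable names are not from the course code.

```r
n.neg <- 990   # cases with truth = 0, drawn from N(10, 1)
n.pos <- 10    # cases with truth = 1, drawn from N(12, 1)
cutpt <- 11

# Expected number of cases above the cutpoint in each group.
exp.tp <- n.pos * (1 - pnorm(cutpt, mean = 12, sd = 1))
exp.fp <- n.neg * (1 - pnorm(cutpt, mean = 10, sd = 1))

round(c(tp = exp.tp, fp = exp.fp), 1)   # tp about 8.4, fp about 157.1
```

The fractions above the cutpoint are the same as in the balanced simulation, but in counts the false positives outnumber the true positives by almost 19 to 1. That may be why the call to roc.curve limits the horizontal axis to the first 40 false positives.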