measures of diagnostic accuracy

Statistical Methods for Analysis of Diagnostic Accuracy Studies

Jon Deeks, University of Birmingham

with acknowledgement to Hans Reitsma

Measures of diagnostic accuracy

• Positive and negative predictive values

• Sensitivity and specificity

• Likelihood ratios

• Area under the ROC curve

• Diagnostic odds ratio

Diagnostic accuracy studies

• Results from the index test are compared with the results obtained with the reference standard on the same subjects

• Accuracy refers to the degree of agreement between the results of the index test and those from the reference standard

Basic Design

Series of patients

Index test

Reference standard

Cross-classification

Clinical problem

• Diagnostic value of B-type natriuretic peptide (BNP) measurement

• Does BNP measurement distinguish between those with and without left ventricular dysfunction in the elderly?

• Smith et al. BMJ 2000; 320: 906.

Anatomy of diagnostic study

• Target population: unscreened elderly

• Index test: BNP

• Target condition: LVSD (left ventricular systolic dysfunction)

• Final diagnosis (reference standard): echocardiography (global and regional assessment of ventricular function, including measurement of LV ejection fraction)

Our example

Elderly patients

BNP measurement

Echocardiography for LVSD

Cross-classification

Results of BNP study

                  LVSD
BNP         Present     Absent      Total
>=18.7      11 (TP)     50 (FP)        61
<18.7        1 (FN)     93 (TN)        94
Total       12         143            155

Measures of test performance

                  LVSD
BNP         Present     Absent      Total
>=18.7      11          50             61
<18.7        1          93             94
Total       12         143            155

• sensitivity = 11 / 12 = 92%, i.e. Pr(T+|D+)

• specificity = 93 / 143 = 65%, i.e. Pr(T-|D-)

Measures of test performance

                  LVSD
BNP         Present     Absent      Total
>=18.7      11          50             61
<18.7        1          93             94
Total       12         143            155

• positive predictive value = 11 / 61 = 18%, i.e. Pr(D+|T+)

• negative predictive value = 93 / 94 = 99%, i.e. Pr(D-|T-)
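The four measures above can be reproduced from the 2x2 cell counts. The following is a minimal sketch; the function name `accuracy_measures` and its argument names are illustrative choices, not part of the original slides.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # Pr(T+|D+): column proportion among diseased
        "specificity": tn / (tn + fp),  # Pr(T-|D-): column proportion among non-diseased
        "ppv": tp / (tp + fp),          # Pr(D+|T+): row proportion among test positives
        "npv": tn / (tn + fn),          # Pr(D-|T-): row proportion among test negatives
    }


# Cell counts from the BNP study table (BNP >= 18.7 counted as test positive).
bnp = accuracy_measures(tp=11, fp=50, fn=1, tn=93)
for name, value in bnp.items():
    print(f"{name}: {value:.0%}")
```

Sensitivity and specificity are computed down the columns (within disease status), while the predictive values are computed along the rows (within test result).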

Sensitivity and specificity not directly affected by prevalence

                  LVSD
BNP         Present     Absent      Total
>=18.7      131         50            181
<18.7        12         93            105
Total       143        143            286

• sensitivity 131 / 143 = 92%

• specificity 93 / 143 = 65%

Predictive values directly affected by prevalence

                  LVSD
BNP         Present     Absent      Total
>=18.7      131         50            181
<18.7        12         93            105
Total       143        143            286

• positive predictive value = 131 / 181 = 72%

• negative predictive value = 93 / 105 = 89%
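To make the dependence on prevalence explicit, the predictive values can be written as functions of sensitivity, specificity and prevalence. This is a sketch under the assumption that sensitivity and specificity stay fixed as prevalence changes; `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence, holding sens and spec fixed."""
    tp = sensitivity * prevalence               # diseased, test positive
    fn = (1 - sensitivity) * prevalence         # diseased, test negative
    fp = (1 - specificity) * (1 - prevalence)   # non-diseased, test positive
    tn = specificity * (1 - prevalence)         # non-diseased, test negative
    return tp / (tp + fp), tn / (tn + fn)


sens, spec = 11 / 12, 93 / 143

# Prevalence observed in the BNP study (12 of 155) versus the 50% example.
ppv_low, npv_low = predictive_values(sens, spec, 12 / 155)
ppv_high, npv_high = predictive_values(sens, spec, 0.5)

print(f"prevalence  8%: PPV {ppv_low:.0%}, NPV {npv_low:.0%}")
print(f"prevalence 50%: PPV {ppv_high:.0%}, NPV {npv_high:.0%}")
```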

Do sensitivity and specificity vary with prevalence?

• Test performance is sometimes observed to be different in different settings, patient groups, etc.

• Occasionally attributed to differences in disease prevalence, but:
  – diseased and non-diseased spectrums differ as well.

• e.g. using a test in primary care and secondary care referrals
  – the diseased group are different (cases more difficult)
  – the non-diseased group are different (conditions more similar)
  – sensitivity may decrease, specificity certainly decreases

Likelihood ratios

• Why likelihood ratios?

• Applicable in situations with more than 2 test outcomes

• Direct link from pre-test probabilities to post-test probabilities

Likelihood ratios

• Information value of a test result expressed as likelihood ratio

                  LVSD
BNP         Present     Absent      Total
>=18.7      11          50             61
<18.7        1          93             94
Total       12         143            155

$$LR^{+} = \frac{\Pr(T^{+} \mid D^{+})}{\Pr(T^{+} \mid D^{-})} = \frac{11/12}{50/143} = 2.6$$

Likelihood Ratio of positive test

• How many times more often a positive test result occurs in persons with the target condition than in those without it

$$LR^{+} = \frac{\Pr(T^{+} \mid D^{+})}{\Pr(T^{+} \mid D^{-})}$$

Likelihood ratios

• Likelihood ratio of a negative test result

• How many times less likely a negative test result is in persons with the target condition than in those without it

$$LR^{-} = \frac{\Pr(T^{-} \mid D^{+})}{\Pr(T^{-} \mid D^{-})}$$

Likelihood ratios

$$LR^{-} = \frac{\Pr(T^{-} \mid D^{+})}{\Pr(T^{-} \mid D^{-})} = \frac{1/12}{93/143} = 0.13$$

                  LVSD
BNP         Present     Absent      Total
>=18.7      11          50             61
<18.7        1          93             94
Total       12         143            155

Calculate likelihood ratios from column percentages

                  LVSD
BNP         Present     Absent      LR
>=18.7      91.67%      34.97%      2.62
<18.7        8.33%      65.03%      0.13
Total       100%        100%
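The column-percentage calculation can be sketched in a few lines. The helper name `likelihood_ratios` is an illustrative assumption; the counts are those of the BNP study table.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) as ratios of column proportions in a 2x2 table."""
    sens = tp / (tp + fn)          # Pr(T+|D+)
    spec = tn / (tn + fp)          # Pr(T-|D-)
    lr_pos = sens / (1 - spec)     # Pr(T+|D+) / Pr(T+|D-)
    lr_neg = (1 - sens) / spec     # Pr(T-|D+) / Pr(T-|D-)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(tp=11, fp=50, fn=1, tn=93)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```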

Interpreting likelihood ratios

• An LR of 1 indicates no diagnostic value

• An LR+ above 10 is usually regarded as a ‘strong’ positive test result

• An LR- below 0.1 is usually regarded as a ‘strong’ negative test result

• But it depends on what change in probability is needed to make a diagnosis

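[Figure: with LR+ = 10, a pre-test probability of 50% rises to about 92%, and a pre-test probability of 10% rises to about 55%]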

Advantages of likelihood ratios

• Still useful when there are more than 2 test outcomes

BNP is a continuous measurement

• Dichotomisation of BNP (high vs. low) means loss of information

• Higher values of BNP are more indicative of LVSD

Results BNP study

                   LVSD
BNP          Present     Absent      Total
>=26.7        9           28            37
18.7-26.7     2           22            24
<18.7         1           93            94
Total        12          143           155

Likelihood ratios

• Stratum specific likelihood ratios in case of more than 2 test results

$$LR(x) = \frac{\Pr(T = x \mid D^{+})}{\Pr(T = x \mid D^{-})}$$

Compute LR from column percentages

                   LVSD
BNP          Present     Absent      LR
>=26.7       75%         20%         3.83
18.7-26.7    17%         15%         1.08
<18.7         8%         65%         0.13
Total        100%        100%
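Stratum-specific likelihood ratios follow the same column-percentage logic, applied to each BNP band in turn. A minimal sketch, assuming the three-band counts from the results table; the dictionary layout and variable names are illustrative.

```python
# (LVSD present, LVSD absent) counts for each BNP band.
counts = {
    ">=26.7": (9, 28),
    "18.7-26.7": (2, 22),
    "<18.7": (1, 93),
}

n_present = sum(present for present, _ in counts.values())  # 12
n_absent = sum(absent for _, absent in counts.values())     # 143

# LR(x) = Pr(T = x | D+) / Pr(T = x | D-) for each band x.
stratum_lr = {
    band: (present / n_present) / (absent / n_absent)
    for band, (present, absent) in counts.items()
}

for band, lr in stratum_lr.items():
    print(f"BNP {band}: LR = {lr:.2f}")
```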

Bayes’ rule

Post-test odds for disease = Pre-test odds for disease × Likelihood ratio

Bayes’ rule

• Pre-test odds: the chance of disease expressed as odds
  – example: if 2 out of 5 persons have the disease, the probability is 2/5 and the odds are 2/3

Bayes’ rule

• odds = probability / (1 – probability)

• probability = odds / (1 + odds)

$$\mathrm{Odds}(D) = \frac{\Pr(D)}{1 - \Pr(D)}$$

$$\Pr(D) = \frac{\mathrm{Odds}(D)}{1 + \mathrm{Odds}(D)}$$

Bayes’ rule: patient with BNP >26.7

• Pre-test probability = 0.5

• Pre-test odds = 0.5 / (1-0.5) = 1

• LR(BNP >26.7) = 3.83

• Post-test odds = 1 x 3.83 = 3.83

• Post-test probability = 3.83 / (1+3.83) = 0.79

Bayes’ rule: patient with BNP lower than 18.7

• Pre-test probability = 0.5

• Pre-test odds = 0.5 / (1-0.5) = 1

• LR(BNP <18.7) = 0.13

• Post-test odds = 1 x 0.13 = 0.13

• Post-test probability = 0.13 / (1+0.13) = 0.12
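The three steps of Bayes’ rule (probability to odds, multiply by the likelihood ratio, odds back to probability) can be sketched as a single function. The name `post_test_probability` is an illustrative choice.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply Bayes' rule on the odds scale and return a probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Rounded stratum-specific likelihood ratios, as used on the slides.
for pre in (0.05, 0.50):
    for band, lr in ((">=26.7", 3.83), ("18.7-26.7", 1.08), ("<18.7", 0.13)):
        post = post_test_probability(pre, lr)
        print(f"pre-test {pre:.0%}, BNP {band}: post-test {post:.0%}")
```

Working on the odds scale is what makes the update a simple multiplication; the same likelihood ratio gives very different post-test probabilities depending on where the patient starts.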


Probability for LVSD after BNP

                        Post-test prob. for pre-test prob. of
BNP          LR         5%          50%
>=26.7       3.83       17%         79%
18.7-26.7    1.08        5%         52%
<18.7        0.13        1%         12%

Confidence intervals

• Sample uncertainty should be described for all statistics, using confidence intervals

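$$95\%\ \mathrm{CI} = \hat{\theta} \pm z_{\alpha/2} \times se(\hat{\theta})$$

• $\hat{\theta}$: estimate of effect

• $z_{\alpha/2}$: Normal deviate (1.96 for 95% CI)

• $\pm$: + gives upper limit, - gives lower limit

• $se(\hat{\theta})$: standard error of estimate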

Confidence Intervals for Proportions

• Sensitivity, specificity, positive and negative predictive values, and overall accuracy are all proportions

$$\hat{p} = \frac{r}{n}, \qquad se(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
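As a worked instance of the formula, the asymptotic (Normal approximation) interval for a proportion can be sketched as below. The function name `asymptotic_interval` is an illustrative assumption, and the limits are deliberately not truncated to the 0 to 1 range so that the behaviour near the boundaries stays visible.

```python
import math


def asymptotic_interval(r, n, z=1.96):
    """Normal-approximation CI for a proportion r/n (limits not truncated)."""
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


lower, upper = asymptotic_interval(4, 20)
print(f"4/20: {lower:.0%} to {upper:.0%}")

# With few events the lower limit can fall below zero.
lower_small, upper_small = asymptotic_interval(1, 20)
print(f"1/20: {lower_small:.0%} to {upper_small:.0%}")
```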

Exact or Asymptotic CI?

• Asymptotic CI are approximations

• Inappropriate when
  – proportion is near 0% or near 100%
  – sample sizes are small
  (confidence intervals are not symmetric in these cases)

• Preferable to use Binomial exact methods
  – can be computed in many statistics packages
  – or refer to tables

Comparison of Asymptotic and Exact Methods

                    95% Confidence intervals
r/n       p       Asymptotic         Exact
0/20      0%      not calculable     (0% to 14%)
1/20      5%      (-5% to 15%)       (0% to 25%)
2/20     10%      (-3% to 23%)       (1% to 32%)
3/20     15%      (-1% to 31%)       (3% to 38%)
4/20     20%      (2% to 38%)        (6% to 44%)
5/20     25%      (6% to 44%)        (9% to 49%)
6/20     30%      (10% to 50%)       (12% to 54%)
7/20     35%      (14% to 56%)       (15% to 59%)
8/20     40%      (19% to 61%)       (19% to 64%)
9/20     45%      (23% to 67%)       (23% to 68%)
10/20    50%      (28% to 72%)       (27% to 73%)
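For readers without a statistics package to hand, an exact interval can be sketched with the standard library alone. The code below assumes the two-sided Clopper-Pearson construction, solved by bisection on the Binomial distribution function; the slides do not say which exact method was used, and packages or tables may treat the r = 0 case differently. All function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, iterations=100):
    """Bisection on [0, 1] for a function f that decreases in p."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) > target:
            low = mid   # f is still too large, so the root lies to the right
        else:
            high = mid
    return (low + high) / 2


def exact_interval(r, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for r events out of n."""
    # Lower limit: the p at which Pr(X >= r) = alpha / 2.
    lower = 0.0 if r == 0 else solve_decreasing(
        lambda p: binom_cdf(r - 1, n, p), 1 - alpha / 2)
    # Upper limit: the p at which Pr(X <= r) = alpha / 2.
    upper = 1.0 if r == n else solve_decreasing(
        lambda p: binom_cdf(r, n, p), alpha / 2)
    return lower, upper


lo_1, hi_1 = exact_interval(1, 20)
print(f"1/20: {lo_1:.0%} to {hi_1:.0%}")

lo_sens, hi_sens = exact_interval(11, 12)   # sensitivity in the BNP study
print(f"11/12: {lo_sens:.0%} to {hi_sens:.0%}")
```

Unlike the asymptotic interval, these limits always stay within 0% to 100% and are asymmetric around the estimate when the proportion is near either boundary.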

Confidence Intervals for Ratios of Probabilities and Odds

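• Likelihood ratios are ratios of probabilities

$$RR = \frac{r_1 / n_1}{r_2 / n_2}, \qquad se(\ln RR) = \sqrt{\frac{1}{r_1} + \frac{1}{r_2} - \frac{1}{n_1} - \frac{1}{n_2}}$$

• Odds ratios are ratios of odds

$$OR = \frac{r_1 / (n_1 - r_1)}{r_2 / (n_2 - r_2)}, \qquad se(\ln OR) = \sqrt{\frac{1}{r_1} + \frac{1}{r_2} + \frac{1}{n_1 - r_1} + \frac{1}{n_2 - r_2}}$$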
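A likelihood ratio is a ratio of two proportions, so its interval can be computed on the log scale and transformed back. The sketch below assumes r1/n1 is the proportion with the test result among the diseased and r2/n2 the proportion among the non-diseased; `ratio_interval` is a hypothetical helper name.

```python
import math


def ratio_interval(r1, n1, r2, n2, z=1.96):
    """Ratio of proportions (r1/n1)/(r2/n2) with a log-scale 95% CI."""
    ratio = (r1 / n1) / (r2 / n2)
    se_log = math.sqrt(1 / r1 + 1 / r2 - 1 / n1 - 1 / n2)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Top BNP band: 9 of 12 with LVSD, 28 of 143 without.
lr_high, lr_high_lo, lr_high_hi = ratio_interval(9, 12, 28, 143)
print(f"LR = {lr_high:.1f} ({lr_high_lo:.1f}, {lr_high_hi:.1f})")

# Lowest BNP band: 1 of 12 with LVSD, 93 of 143 without.
lr_low, lr_low_lo, lr_low_hi = ratio_interval(1, 12, 93, 143)
print(f"LR = {lr_low:.2f} ({lr_low_lo:.2f}, {lr_low_hi:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the ratio itself, and it cannot be computed this way when either count is zero.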

CIs for study

• Sensitivity = 92% (62%, 100%)

• Specificity = 65% (57%, 73%)

• PPV = 18% (9%, 30%)

• NPV = 99% (94%, 100%)

• LR(>=26.7) = 3.8 (2.4, 6.1)

• LR(18.7-26.7) = 1.1 (0.3, 4.1)

• LR(<18.7) = 0.13 (0.02, 0.84)

ROC-curve

• ROC stands for Receiver Operating Characteristic

• ROC-curve shows the pairs of sensitivity and specificity that correspond to various cut-off points for the continuous test result

Continuous diagnostic test results

[Figure: distributions of test results in the non-diseased and diseased groups, split by the diagnostic threshold into TN, FN, FP and TP; specificity = 94%, sensitivity = 94%]

Heterogeneity in Threshold



[Figure: two panels with different diagnostic thresholds, each dividing the test results into TN, FN, FP and TP]


Threshold effects

Increasing threshold increases specificity but decreases sensitivity

Decreasing threshold increases sensitivity but decreases specificity

[Figure: ROC plot of sensitivity against specificity for fetal fibronectin for predicting spontaneous birth]

Change in cut-off value and effect on sens & spec

Cut-off     Sensitivity     Specificity
9999          0%            100%
26.7         75%             80%
19.8         83%             70%
18.7         92%             65%
0           100%              0%
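Most rows of this table can be rebuilt from the three-band BNP results by moving the cut-off and counting everything at or above it as test positive. This is a sketch using only the grouped counts shown earlier, so the 19.8 cut-off, which needs the individual BNP values, cannot be reproduced here. The names `bands` and `sens_spec_at` are illustrative.

```python
# (lower bound of BNP band, LVSD present, LVSD absent)
bands = [(26.7, 9, 28), (18.7, 2, 22), (0.0, 1, 93)]


def sens_spec_at(cutoff, bands):
    """Sensitivity and specificity when bands starting at or above cutoff are positive."""
    n_present = sum(present for _, present, _ in bands)
    n_absent = sum(absent for _, _, absent in bands)
    tp = sum(present for low, present, _ in bands if low >= cutoff)
    fp = sum(absent for low, _, absent in bands if low >= cutoff)
    return tp / n_present, 1 - fp / n_absent


for cutoff in (9999, 26.7, 18.7, 0):
    sens, spec = sens_spec_at(cutoff, bands)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```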

ROC-curve BNP

[Figure: sensitivity plotted against 1-specificity, with points marked at cut-offs 26.7, 19.8 and 18.7]

ROC curve

• Shows the effect of different cut-off values on sensitivity and specificity

• Better tests have curves that lie closer to the upper left corner

• Area under the ROC is a single measure of test performance (higher is better)

• Shape
  – RAW continuous data gives steps
  – GROUPED data gives straight sloping lines
  – FITTED ROC curves are smoothed.
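For grouped data, where the ROC curve is a set of straight sloping lines, the area under it can be approximated with the trapezoid rule. The sketch below uses only the two cut-offs available from the three-band BNP table, so the resulting area is an illustration of the calculation and not a figure reported in the study; `trapezoid_auc` is a hypothetical helper name.

```python
def trapezoid_auc(points):
    """Area under straight lines joining (1 - specificity, sensitivity) points."""
    pts = sorted(points)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


# (1 - specificity, sensitivity) at cut-offs 9999, 26.7, 18.7 and 0.
roc_points = [
    (0.0, 0.0),
    (28 / 143, 9 / 12),
    (50 / 143, 11 / 12),
    (1.0, 1.0),
]

auc = trapezoid_auc(roc_points)
print(f"Area under the grouped-data ROC curve: {auc:.2f}")
```

A test with no diagnostic value lies on the diagonal and gives an area of 0.5, which the same function returns for the points (0, 0) and (1, 1) alone.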

Variation in diagnostic threshold

At what level is a test result categorised as +ve, and how should the threshold be selected?

Threshold affects the performance of the test, as described by ROC curves and likelihood ratios

Depends on
  – disease prevalence (affects +ve and -ve predictive values)
  – relative costs of false positive and false negative misdiagnoses
  – relative benefits of true positive and true negative diagnoses

Workshop exercise – erratum

• Q16 page 8: Compute post-test probabilities for a high risk patient, pre-test prob=50%

• Q19 page 10:

                   LVSD
MI or BNP     +ve       -ve
+ve            36        63
-ve             4        23
Total          40        86
