Diagnostic Prediction Models & Incremental Value
Samuel G. Schumacher, McGill University
2014-09-16

Page 1

Diagnostic Prediction Models & Incremental Value

Samuel G. Schumacher, McGill University

Page 2

Outline

1. What is incremental value and why is it important?
2. "Binary test case": 3 binary tests
3. "Multivariable model case": Dx prediction models
4. Imperfect reference standard

Page 3

What is it and why is it important?

Page 4

Accuracy: 2 tests

[Diagram: two 2×2 tables, one per test, comparing test results against the reference standard. Positives are expressed as a percentage of the reference standard's true positives (sensitivity) and negatives as a percentage of its true negatives (specificity), for the baseline test and for the index test.]

BUT: what matters is the value beyond what is already available (a first step towards impact)

Page 8

Incremental value: 3 tests

[Diagram: the same 2×2 layout extended to three tests: reference standard, baseline test, and index test. The positives found by the index test beyond those already found by the baseline test are highlighted as the incremental value.]

Page 10

Two basic paradigms

1. "Binary test case"
   ‣ a single test drives clinical decision-making

2. "Multivariable model case"
   ‣ patient demographics
   ‣ clinical characteristics
   ‣ signs and symptoms
   ‣ perhaps multiple tests / biomarkers
   ‣ perhaps tests that give quantitative results

Page 11

"Binary test case": 3 binary tests

Page 12

Method

Worked example (n, %):
   ‣ Reference standard positives: 100 (100%)
   ‣ Baseline test positives: 60 (60% sensitivity)
   ‣ Index test positives: 70 (70% sensitivity)
   ‣ Incremental value: 10 (10%)

➡ Difference in sensitivity, or (Positives_Index − Positives_Baseline) / Positives_Reference
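The calculation on this slide can be sketched in a few lines; the function name and the worked-example counts are illustrative, taken from the numbers above.

```python
def incremental_sensitivity(pos_index, pos_baseline, pos_reference):
    """Incremental value of the index test over the baseline test,
    expressed as a difference in sensitivity:
    (Positives_Index - Positives_Baseline) / Positives_Reference."""
    return (pos_index - pos_baseline) / pos_reference

# Worked example from the slide: 100 reference-standard positives,
# baseline test detects 60 of them, index test detects 70.
iv = incremental_sensitivity(pos_index=70, pos_baseline=60, pos_reference=100)
print(iv)  # 0.1, i.e. a 10% gain in sensitivity
```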

Page 19

Serial sputum examinations

Mase SR, Ramsay A, Ng V, Henry M, Hopewell PC, Cunningham J, Urbanczik R, Perkins MD, Aziz MA, Pai M. Yield of serial sputum specimen examinations in the diagnosis of pulmonary tuberculosis: a systematic review. Int J Tuberc Lung Dis 2007; 11(5): 485-495. © 2007 The Union.

Summary: a systematic review of 37 eligible studies quantifying the diagnostic yield of each of three sputum specimens examined for acid-fast bacilli. The incremental yield and/or increase in sensitivity of examining a third specimen ranged between 2% and 5%, suggesting that reducing the recommended number of specimens from three to two could benefit TB control programmes and potentially increase case detection.

Methods
   ‣ reference standard: culture
   ‣ baseline: ZN 1 + ZN 2
   ‣ index test: ZN 3
   ‣ statistic: (Positives_ZN-3 − Positives_ZN-1,2) / Positives_Culture

Results
   ‣ incremental value of third smear: 2-5%

Page 20

FM versus ZN

Steingart KR, Henry M, Ng V, Hopewell PC, Ramsay A, Cunningham J, Urbanczik R, Perkins M, Aziz MA, Pai M. Fluorescence versus conventional sputum smear microscopy for tuberculosis: a systematic review. Lancet Infect Dis 2006; 6: 570-81.

Summary: a systematic review of 45 studies comparing fluorescence microscopy (FM) with conventional Ziehl-Neelsen (ZN) microscopy. Overall, fluorescence microscopy is more sensitive than conventional microscopy, with similar specificity; its main practical advantage is a lower-power objective lens (typically 25× versus 100×), which lets the microscopist assess the same slide area in less time.

Methods
   ‣ reference standard: culture
   ‣ baseline: ZN
   ‣ index test: FM
   ‣ statistic: ∆ Sensitivity

Results
   ‣ incremental value of FM over ZN: 10%

Page 21

CRP vs WHO symptom screen to determine IPT eligibility

Methods
   ‣ reference standard: expert panel
   ‣ baseline: WHO symptom screen
   ‣ index test: CRP
   ‣ statistics: ∆ Sensitivity, ∆ Specificity

Results
   ‣ ∆ Sensitivity = −20%
   ‣ ∆ Specificity = +66%
   ‣ NRI = 46%


FIGURE 2. Reclassification of patients eligible for IPT. If IPT eligibility were determined by the WHO symptom screen, no patients with active TB but only 42 of 196 patients without active TB would have been considered eligible for IPT. If POC-CRP were used instead, 1 patient with active TB (reclassification of cases = −20%, P = 0.32) and 129 patients without active TB (reclassification of non-cases = +66%, P < 0.001) would have been reclassified as being eligible for IPT. Thus, TB screening with POC-CRP would have resulted in an NRI of 46% (95% CI: 2 to 87, P = 0.03).

TABLE 3. Impact of TB screening strategies at varying TB prevalence (n = 1000)

                       5% TB prevalence       10% TB prevalence      15% TB prevalence
Screening strategy     TP   FN   FP   TN      TP   FN   FP   TN      TP   FN   FP   TN
WHO symptom screen     50   0    750  200     100  0    711  189     150  0    671  179
POC-CRP testing        40   10   123  827     80   20   117  783     120  30   110  740

TP = true positives (patients with TB classified as ineligible for IPT); FN = false negatives (patients with TB classified as eligible for IPT); FP = false positives (patients without TB classified as ineligible for IPT); TN = true negatives (patients without TB classified as eligible for IPT). WHO symptom screen: assumed sensitivity 100%, specificity 21% (study estimates). POC-CRP testing: assumed sensitivity 80%, specificity 87% (study estimates).
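The counts in Table 3 follow directly from prevalence, sensitivity, and specificity; a minimal sketch (function name illustrative, study estimates as inputs):

```python
def screening_counts(prevalence, sensitivity, specificity, n=1000):
    """Expected confusion-matrix counts for a screening strategy
    applied to a cohort of n patients at a given TB prevalence.
    A screen-positive result means the patient is ineligible for IPT."""
    tb = prevalence * n
    no_tb = n - tb
    tp = sensitivity * tb            # TB, screen-positive
    fn = (1 - sensitivity) * tb      # TB, screen-negative
    fp = (1 - specificity) * no_tb   # no TB, screen-positive
    tn = specificity * no_tb         # no TB, screen-negative
    return round(tp), round(fn), round(fp), round(tn)

# Study estimates at 10% TB prevalence:
# WHO symptom screen (sensitivity 100%, specificity 21%),
# POC-CRP (sensitivity 80%, specificity 87%).
print(screening_counts(0.10, 1.00, 0.21))  # (100, 0, 711, 189)
print(screening_counts(0.10, 0.80, 0.87))  # (80, 20, 117, 783)
```

At 5% and 15% prevalence some expected counts end in .5, so the rounded values depend on the tie-breaking rule; the 10% column is exact.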

J Acquir Immune Defic Syndr, Volume 65, Number 5, April 15, 2014: "CRP Improves Patient Selection for IPT", p. 555. © 2013 Lippincott Williams & Wilkins.
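The NRI of 46% on this slide can be reproduced from the reclassification counts in Figure 2; a minimal sketch (function name illustrative; the case denominator of 5 is implied by the reported −20% for 1 reclassified case):

```python
def net_reclassification_improvement(up_cases, down_cases, n_cases,
                                     up_noncases, down_noncases, n_noncases):
    """NRI = net proportion of cases reclassified in the correct direction
    plus net proportion of non-cases reclassified in the correct direction.
    Here 'up' means reclassified towards screen-positive (ineligible for IPT,
    the correct direction for TB cases) and 'down' towards screen-negative
    (eligible for IPT, the correct direction for non-cases)."""
    nri_cases = (up_cases - down_cases) / n_cases
    nri_noncases = (down_noncases - up_noncases) / n_noncases
    return nri_cases + nri_noncases

# Figure 2: of 5 TB cases, 1 was reclassified as eligible for IPT (-20%);
# of 196 non-cases, 129 were reclassified as eligible for IPT (+66%).
nri = net_reclassification_improvement(up_cases=0, down_cases=1, n_cases=5,
                                       up_noncases=0, down_noncases=129,
                                       n_noncases=196)
print(round(100 * nri))  # 46
```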

Page 22

Limitations of this methodology

1. Works only for simple situations where
   ‣ all tests are binary
   ‣ the baseline test is one single test

2. This definition of incremental value is "technical"
   ‣ "practical" incremental value may differ
   ‣ other methods are needed to assess "practical" incremental value

Page 23

What if… "the baseline test" is not a single binary test?

• Routine for chronic diseases
   ‣ multiple tests & characteristics (multivariable)
   ‣ prognosis rather than diagnosis (probabilistic)
   ‣ prediction models or decision rules more common

• In TB
   ‣ extra-pulmonary TB
   ‣ smear-negative TB
   ‣ paediatric TB
   ‣ latent TB infection

Page 33

"Multivariable model case": Dx prediction models

Page 34

Incremental value: "multivariable model case"

[Diagram: the same layout as in the binary case (reference standard, baseline, index test), with the incremental value highlighted; here the baseline is no longer a single binary test but a combination of tests and characteristics.]

Differences to “binary test case”

16

1. information of baseline tests needs to be combined
   ‣ diagnostic prediction model

2. incremental value is the change you observe when adding the index test to the baseline model
   ‣ assessing this change needs different metrics

3. output of models is not binary but a probability
   ‣ need to think about cut-offs & decision-making

Prediction Models Further readings

18

Prediction Models

1. Model development
   • logistic regression model / machine learning
   • modelling continuous predictors
   • predictor selection

2. Model assessment
   • discrimination (e.g. AUC)
   • calibration (e.g. calibration plot)

3. Model validation
   • internal validation
   • external validation

19
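The three steps above can be sketched end to end on simulated data. Everything below (predictors, coefficients, cohort size) is illustrative and not from the talk: develop a logistic model, assess discrimination (AUC) and a crude calibration check, and note where validation would slot in.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Model development: fit a logistic regression by gradient ascent
#    on a simulated cohort with two baseline predictors.
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 1.2 * x1 + 0.8 * x2))))
X = np.column_stack([np.ones(n), x1, x2])

w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.5 * X.T @ (y - p) / n          # gradient of the log-likelihood

p = 1 / (1 + np.exp(-X @ w))

# 2. Model assessment
def auc(y, p):
    """Discrimination: P(case scores higher than non-case), ties count half."""
    cases, controls = p[y == 1], p[y == 0]
    return ((cases[:, None] > controls[None, :]).mean()
            + 0.5 * (cases[:, None] == controls[None, :]).mean())

print("apparent AUC:", round(auc(y, p), 2))
print("calibration-in-the-large:", round(p.mean() - y.mean(), 3))  # ~0 if calibrated

# 3. Model validation: internal validation would repeat steps 1-2 on
#    bootstrap resamples; external validation applies the frozen w
#    to a new cohort.
```

The apparent AUC is optimistic because it is computed on the development data, which is exactly why step 3 (internal/external validation) exists.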

Differences to “binary test case”

20

1. information of baseline tests needs to be combined
   ‣ diagnostic prediction model

2. incremental value is the change you observe when adding the index test to the baseline model
   ‣ assessing this change needs different metrics

3. output of models is not binary but a probability
   ‣ need to think about cut-offs & decision-making

21

[Figure: timeline 2006–2014 of incremental-value methods and debates:
• risk stratification (reclassification) table
• NRI, IDI, case-stratified reclassification table
• category-free and weighted NRI
• predictiveness curve
• decision curve analysis
• more analysis around the risk stratification table
• Greenland criticism
• Pepe criticism of NRI and certain analyses and testing around the reclassification table
• very good reviews
• Pencina attempt and strong criticism by Pepe and Cook
• very critical review of NRI
• very critical editorial of NRI]

Incremental Value Further readings

22

Incremental Value
Limitations of Selected Metrics

23

Area Under The ROC Curve & Integrated Discrimination Improvement
• clinical interpretation
• incorporates irrelevant information
• does not incorporate information on consequences

Reclassification table & Net Reclassification Index
• choice of cut-offs
• weighting / “nature” of reclassification
• does not incorporate information on consequences

Decision Curve Analysis & Net Benefit
• requires thinking about cut-offs (decision thresholds)
• perhaps less easily interpreted
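The net benefit behind decision curve analysis has a simple closed form: at threshold probability p_t, NB = TP/n − (FP/n) · p_t/(1 − p_t). A minimal sketch on simulated predicted risks (the models, risks and thresholds below are invented for illustration):

```python
import numpy as np

def net_benefit(y, risk, p_t):
    """Net benefit of treating everyone with predicted risk >= p_t."""
    n = len(y)
    treated = risk >= p_t
    tp = np.sum(treated & (y == 1))
    fp = np.sum(treated & (y == 0))
    return tp / n - (fp / n) * p_t / (1 - p_t)

# Simulated predicted risks for a baseline and an augmented model
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.2, 1000)
risk_base = np.clip(0.20 + 0.15 * (y - 0.2) + rng.normal(0, 0.10, 1000), 0, 1)
risk_aug = np.clip(0.20 + 0.30 * (y - 0.2) + rng.normal(0, 0.10, 1000), 0, 1)

# A decision curve evaluates net benefit over a range of thresholds;
# the incremental value is the vertical gap between the two curves.
for p_t in (0.1, 0.2, 0.3):
    print(p_t, round(net_benefit(y, risk_base, p_t), 3),
          round(net_benefit(y, risk_aug, p_t), 3))
```

This makes the slide's point concrete: the metric forces you to choose decision thresholds, but in exchange it weights false positives by the harm implied by that threshold.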

Differences to “binary test case”

24

1. information of baseline tests needs to be combined
   ‣ diagnostic prediction model

2. incremental value is assessed by adding the index test to the baseline model
   ‣ assessing incremental value needs different metrics

3. output of models is not binary but a probability
   ‣ need to think about cut-offs & decision-making

LAM for TB Meningitis

25

Comparison of a Clinical Prediction Rule and a LAM Antigen-Detection Assay for the Rapid Diagnosis of TBM in a High HIV Prevalence Setting
Vinod B. Patel, Ravesh Singh, Cathy Connolly, Victoria Kasprowicz, Allimudin Zumla, Thumbi Ndungu, Keertan Dheda


Abstract

Background/Objective: The diagnosis of tuberculous meningitis (TBM) in resource-poor TB-endemic environments is challenging. The accuracy of current tools for the rapid diagnosis of TBM is suboptimal. We sought to develop a clinical prediction rule for the diagnosis of TBM in a high HIV prevalence setting, and to compare performance outcomes to conventional diagnostic modalities and a novel lipoarabinomannan (LAM) antigen detection test (Clearview-TB) using cerebrospinal fluid (CSF).

Methods: Patients with suspected TBM were classified as definite-TBM (CSF culture or PCR positive), probable-TBM and non-TBM.

Results: Of the 150 patients, 84% were HIV-infected (median [IQR] CD4 count = 132 [54; 241] cells/µl). There were 39, 55 and 54 patients in the definite, probable and non-TBM groups, respectively. The LAM sensitivity and specificity (95% CI) was 31% (17;48) and 94% (85;99), respectively (cut-point ≥0.18). By contrast, smear-microscopy was 100% specific but detected none of the definite-TBM cases. LAM positivity was associated with HIV co-infection and low CD4 T cell count (CD4 <200 vs. >200 cells/µl; p=0.03). The sensitivity and specificity in those with a CD4 <100 cells/µl was 50% (27;73) and 95% (74;99), respectively. A clinical prediction rule ≥6 derived from multivariate analysis had a sensitivity and specificity (95% CI) of 47% (31;64) and 98% (90;100), respectively. When LAM was combined with the clinical prediction rule, the sensitivity increased significantly (p<0.001) to 63% (47;68) and specificity remained high at 93% (82;98).

Conclusions: Despite its modest sensitivity the LAM ELISA is an accurate rapid rule-in test for TBM that has incremental value over smear-microscopy. The rule-in value of LAM can be further increased by combination with a clinical prediction rule, thus enhancing the rapid diagnosis of TBM in HIV-infected persons with advanced immunosuppression.

Citation: Patel VB, Singh R, Connolly C, Kasprowicz V, Zumla A, et al. (2010) Comparison of a Clinical Prediction Rule and a LAM Antigen-Detection Assay for the Rapid Diagnosis of TBM in a High HIV Prevalence Setting. PLoS ONE 5(12): e15664. doi:10.1371/journal.pone.0015664


Method
• reference standard: culture or PCR
• baseline: logistic regression model (clinical predictors and lab tests)
• index test: LAM ELISA

Result
• ∆ Sensitivity = +16%
• ∆ Specificity = -5%


LAM for TB Meningitis
Methods

26

1 logistic regression model
2 choose cut-off
3 compare sens/spec
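Step 3 boils down to comparing two 2×2 tables against the reference standard. A minimal sketch with hypothetical counts, chosen so the percentages match those reported in the Patel et al. abstract (sensitivity 47%→63%, specificity 98%→93%); the denominators themselves are made up:

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity from a 2x2 table vs. the reference standard."""
    return tp / (tp + fn), tn / (tn + fp)

# Baseline: clinical prediction rule alone (hypothetical counts)
sens_base, spec_base = sens_spec(tp=47, fn=53, tn=98, fp=2)

# Combined: rule OR index test positive (hypothetical counts)
sens_comb, spec_comb = sens_spec(tp=63, fn=37, tn=93, fp=7)

print(f"delta sensitivity: {100 * (sens_comb - sens_base):+.0f}%")  # +16%
print(f"delta specificity: {100 * (spec_comb - spec_base):+.0f}%")  # -5%
```

Because the combined rule calls a patient positive if either component is positive, sensitivity can only rise and specificity can only fall; the question is whether the trade is worth it.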

IGRA & TST for pre-IPT assessment in HIV+

27

Interferon release does not add discriminatory value to smear-negative HIV–tuberculosis algorithms
M.X. Rangaka, H.P. Gideon, K.A. Wilkinson, M. Pai, J. Mwansa-Kambafwile, G. Maartens, J.R. Glynn, A. Boulle, K. Fielding, R. Goliath, R. Titus, S. Mathee and R.J. Wilkinson

ABSTRACT: Clinical algorithms for evaluating HIV-infected individuals for tuberculosis (TB) prior to isoniazid preventive therapy (IPT) perform poorly, and interferon-γ release assays (IGRAs) have moderate accuracy for active TB. It is unclear whether, when used as adjunct tests, IGRAs add any clinical discriminatory value for active TB diagnosis in the pre-IPT assessment.

779 sputum smear-negative HIV-infected persons, established on or about to commence combined antiretroviral therapy (ART), were screened for TB prior to IPT. Stepwise multivariable logistic regression was used to develop clinical prediction models. The discriminatory ability was assessed by receiver operator characteristic area under the curve (AUC). QuantiFERON-TB Gold in-tube (QFT-GIT) was evaluated.

The prevalence of smear-negative TB by culture was 6.4% (95% CI 4.9–8.4%). Used alone, QFT-GIT and the tuberculin skin test (TST) had comparable performance; the post-test probability of disease based on single negative tests was 3–4%. In a multivariable model, the QFT-GIT test did not improve the ability of a clinical algorithm, which included not taking ART, weight <60 kg, no prior history of TB, any one positive TB symptom/sign (cough ≥2 weeks) and CD4+ count <250 cells per mm3, to discriminate smear-negative culture-positive and -negative TB (72% to 74%; AUC comparison p=0.33). The TST marginally improved the discriminatory ability of the clinical model (to 77%, AUC comparison p=0.04).

QFT-GIT does not improve the discriminatory ability of current TB screening clinical algorithms used to evaluate HIV-infected individuals for TB ahead of preventive therapy. Evaluation of new TB diagnostics for clinical relevance should follow a multivariable process that goes beyond test accuracy.

KEYWORDS: Diagnostic research, HIV, incremental value, interferon-γ release assay, Mycobacterium tuberculosis, QuantiFERON-TB Gold in-tube


Eur Respir J 2012; 39: 163–171. DOI: 10.1183/09031936.00058911


Method
• reference standard: culture
• baseline: logistic regression model (demographics, clinical, CD4)
• index tests: IGRA, TST

Results
• ∆ AUC (IGRA) = +2%
• ∆ AUC (TST) = +5%


IGRA & TST for pre-IPT assessment in HIV+

Methods

28

1 logistic regression model

2 compare AUC

Discriminatory ability of TB tests (multivariable analyses):
• clinical model (AUC=72%) vs. clinical + TST at 5 mm (AUC=77%), comparison p-value=0.03
• clinical model (AUC=72%) vs. clinical + QFT (AUC=74%), comparison p-value=0.41
• clinical model (AUC=72%) vs. clinical + both TST & QFT (AUC=78%), comparison p-value=0.01

QFT adds little to TB screening algorithms for evaluating HIV-infected adults for IPT
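One way to attach an interval to a ∆AUC like those above is a paired bootstrap on the same patients (the paper's actual significance test may differ, e.g. a DeLong test; the scores below are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(y, s):
    """Empirical AUC: P(case score > non-case score), ties count half."""
    cases, controls = s[y == 1], s[y == 0]
    return ((cases[:, None] > controls[None, :]).mean()
            + 0.5 * (cases[:, None] == controls[None, :]).mean())

# Simulated scores: a baseline model and the same model plus an index test
n = 400
y = rng.binomial(1, 0.1, n)                      # ~10% prevalence
baseline = rng.normal(loc=0.8 * y)               # weakly informative score
augmented = baseline + rng.normal(loc=0.4 * y, size=n)

# Paired bootstrap of the AUC difference on the same patients
diffs = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    yb = y[idx]
    if yb.min() == yb.max():                     # need cases and non-cases
        continue
    diffs.append(auc(yb, augmented[idx]) - auc(yb, baseline[idx]))

ci = np.percentile(diffs, [2.5, 97.5])
print("delta AUC:", round(auc(y, augmented) - auc(y, baseline), 3))
print("95% bootstrap CI:", ci.round(3))
```

Resampling patients (rather than the two scores independently) preserves the within-patient pairing, which is what makes the comparison of nested models valid.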


IGRA for Risk Stratification of Active Tuberculosis Suspects

29

Evaluation of Quantitative IFN-γ Response for Risk Stratification of Active Tuberculosis Suspects
John Z. Metcalfe, Adithya Cattamanchi, Eric Vittinghoff, Christine Ho, Jennifer Grinsdale, Philip C. Hopewell, L. Masae Kawamura, and Payam Nahid

Rationale: The contribution of interferon-γ release assays (IGRAs) to appropriate risk stratification of active tuberculosis suspects has not been studied.
Objectives: To determine whether the addition of quantitative IGRA results to a prediction model incorporating clinical criteria improves risk stratification of smear-negative tuberculosis suspects.
Methods: Clinical data from tuberculosis suspects evaluated by the San Francisco Department of Public Health Tuberculosis Control Clinic from March 2005 to February 2008 were reviewed. We excluded tuberculosis suspects who were acid fast–bacilli smear–positive, HIV-infected, or under 10 years of age. We developed a clinical prediction model for culture-positive disease and examined the benefit of adding quantitative interferon (IFN)-γ results measured by QuantiFERON-TB Gold (Cellestis, Carnegie, Australia).
Measurements and Main Results: Of 660 patients meeting eligibility criteria, 65 (10%) had culture-proven tuberculosis. The odds of active tuberculosis increased by 7% (95% confidence interval [CI], 3–11%) for each doubling of IFN-γ level. The addition of quantitative IFN-γ results to objective clinical data significantly improved model performance (c-statistic 0.71 vs. 0.78; P < 0.001) and correctly reclassified 32% of tuberculosis suspects (95% CI, 11–52%; P < 0.001) into higher-risk or lower-risk categories. However, quantitative IFN-γ results did not significantly improve appropriate risk reclassification beyond that provided by clinician assessment of risk (4%; 95% CI, −7 to +22%; P = 0.14).
Conclusions: Higher quantitative IFN-γ results were associated with active tuberculosis, and added clinical value to a prediction model incorporating conventional risk factors. Although this benefit may be attenuated within highly experienced centers, the predictive accuracy of quantitative IFN-γ levels should be evaluated in other settings.

Keywords: interferon-gamma release assay; QuantiFERON; risk prediction; risk reclassification


Am J Respir Crit Care Med Vol 181, pp 87–93, 2010. Originally published in press as DOI: 10.1164/rccm.200906-0981OC on October 1, 2009

Method
• reference standard: culture
• baseline: logistic regression model (demographics, clinical, CXR)
• index test: IGRA (quantitative)

Results
• ∆ AUC = +7%
• NRI = +32%
• no improvement beyond clinician assessment


IGRA for Risk Stratification of Active Tuberculosis Suspects

Methods

30

1 logistic regression model
2 assess reclassification
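The reclassification step can be sketched as a categorical Net Reclassification Improvement; the risk cut-offs (0.1, 0.3) and the toy probabilities below are invented for illustration and are not those used in the study:

```python
import numpy as np

def nri(risk_old, risk_new, y, cuts=(0.1, 0.3)):
    """Categorical NRI: net upward movement among cases plus
    net downward movement among non-cases."""
    cat_old = np.digitize(risk_old, cuts)   # 0=low, 1=intermediate, 2=high
    cat_new = np.digitize(risk_new, cuts)
    up, down = cat_new > cat_old, cat_new < cat_old
    cases, noncases = (y == 1), (y == 0)
    nri_cases = up[cases].mean() - down[cases].mean()
    nri_noncases = down[noncases].mean() - up[noncases].mean()
    return nri_cases + nri_noncases

# Toy example: adding the index test raises predicted risk for most
# cases, while non-cases move up and down about equally.
y        = np.array([1, 1, 1, 0, 0, 0, 0, 0])
risk_old = np.array([0.25, 0.15, 0.35, 0.25, 0.05, 0.15, 0.35, 0.05])
risk_new = np.array([0.45, 0.35, 0.45, 0.15, 0.05, 0.05, 0.35, 0.15])
print(round(nri(risk_old, risk_new, y), 2))  # prints 0.67
```

Note the slide deck's later caveats apply: the NRI depends heavily on where the category cut-offs sit and treats all reclassifications within a group as equally important.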

risk categories (Table 3). In comparison to the clinical model alone, both case reclassification (11 more cases classified as high risk and 1 fewer as low risk) and noncase reclassification (88 more noncases designated as low risk and no more noncases classified as high risk) were improved. Results were similar when alternate thresholds were used to define risk categories (see Table E1 in the online supplement). Findings on chest radiograph remained the strongest predictor of active tuberculosis.

Secondary Analysis

We performed a secondary analysis to determine whether quantitative IFN-γ levels improved risk reclassification beyond a prediction model that includes clinician suspicion. First, we evaluated whether a similar benefit in risk reclassification occurred when clinician suspicion, rather than quantitative IFN-γ level, was added to the baseline prediction model including objective demographic and clinical data. When clinician suspicion was added to the baseline model, accuracy increased (AUC, 0.71; 95% CI, 0.64–0.77 vs. 0.82; 95% CI, 0.77–0.88; P < 0.001) and 45% of tuberculosis suspects (95% CI, 23–80%; P < 0.001) were appropriately reclassified into higher- or lower-risk categories (data not shown). Next, the addition of quantitative IFN-γ results to this expanded model, including clinician suspicion, significantly increased accuracy (AUC, 0.82; 95% CI, 0.77–0.88 vs. AUC, 0.86; 95% CI, 0.81–0.91; P = 0.02), but not net reclassification index (NRI, 4%; 95% CI, −0.07 to 0.22; P = 0.14). Improved prediction among tuberculosis cases was outweighed by worse performance among noncases (Table 4). The addition of QFT-G results at the manufacturer-recommended cut point of 0.35 IU/ml in place of quantitative IFN-γ levels did not materially affect results obtained in either the primary or secondary analysis. To further explore performance in cases and noncases, we examined individual patients' risk before and after quantitative IFN-γ level was added to the model. The majority of culture-proven cases showed an appropriate increase in predicted risk with addition of quantitative IFN-γ results (Figure 2A). However, both decreased (appropriate) and increased (inappropriate) risk prediction was common among noncases (Figure 2B).

TABLE 2. COEFFICIENTS AND SUMMARY STATISTICS FOR PREDICTION MODELS

| Predictor | Baseline Clinical Prediction Model | Baseline Prediction Model with IFN-γ Results | Baseline Prediction Model with Clinician Suspicion | Baseline Prediction Model with Clinician Suspicion and IFN-γ Results |
| CXR, active disease* | 2.92 | 3.66 | 0.92 | 1.18 |
| Night sweats or weight loss | 1.60 | 2.22 | 1.12 | 1.45 |
| Previous active disease | 0.29 | 0.27 | 0.24 | 0.23 |
| US birth† | 1.80 | 2.85 | 2.01 | 2.95 |
| Foreign birth, <2 yr in US† | 1.41 | 1.58 | 2.33 | 2.45 |
| Foreign born, 3–12 yr in US† | 2.71 | 3.37 | 2.09 | 2.65 |
| Contact to active case | 2.43 | 2.11 | 3.69 | 3.09 |
| High clinical suspicion‡ | | | 19.43 | 19.31 |
| Intermediate clinical suspicion‡ | | | 5.53 | 4.83 |
| Quantitative IFN-γ result (effect size per each doubling, IU/ml) | | 1.07 | | 1.07 |
| AIC | 400 | 374 | 346 | 323 |
| AUC (95% CI) | 0.71 (0.64–0.77) | 0.78§ (0.73–0.84) | 0.82 (0.77–0.88) | 0.86‖ (0.81–0.91) |

Definition of abbreviations: AIC = Akaike information criterion, a measure of the goodness of fit of a statistical model, with lower values indicating better fit; AUC = area under the receiver operating curve, the probability that a randomly selected case will have a higher test value than a randomly selected noncase (a perfect test has an area under the curve of 1.0, while a worthless test has an area of 0.5); CXR = chest radiograph.

* Reference category: inactive disease or normal CXR.
† Reference category: foreign born, >12 years in US.
‡ Reference category: low clinical suspicion.
§ Significant difference (P < 0.001) between this model and previous model without quantitative IFN-γ results.
‖ Significant difference (P = 0.02) between this model and previous model without quantitative IFN-γ results.

TABLE 3. RISK RECLASSIFICATION FOLLOWING INCORPORATION OF IFN-γ RESULTS: COMPARISON TO BASELINE CLINICAL PREDICTION MODEL

Rows: model with clinical predictors alone. Columns: model with clinical predictors and quantitative IFN-γ results.

In 65 patients who developed culture-positive disease:

| Clinical model alone | <5% risk | 5–20% risk | >20% risk | Total No. | Percent appropriately reclassified |
| <5% risk | 7 | 3 | 0 | 10 | 30 |
| 5–20% risk | 1 | 27 | 12 | 40 | 28 |
| >20% risk | 1 | 0 | 14 | 15 | −7 |
| Total No. | 9 | 30 | 26 | 65 | |

In 595 patients who ruled out for active tuberculosis:

| Clinical model alone | <5% risk | 5–20% risk | >20% risk | Total No. | Percent appropriately reclassified |
| <5% risk | 158 | 18 | 0 | 176 | −10 |
| 5–20% risk | 89 | 241 | 27 | 357 | 17 |
| >20% risk | 17 | 10 | 35 | 62 | 44 |
| Total No. | 264 | 269 | 62 | 595 | |

Net reclassification improvement = 31.9% (P < 0.001). Reclassification among patients who developed culture-positive disease = 20% (P < 0.01); reclassification among patients who ruled out for active tuberculosis = 11.9% (P < 0.001).
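The reported NRI can be reproduced directly from the reclassification counts in Table 3; a minimal Python sketch (risk-category order: <5%, 5–20%, >20%):

```python
# Reclassification counts from Table 3; rows = risk category under the
# clinical model alone, columns = category after adding quantitative
# IFN-gamma results. Category order: <5%, 5-20%, >20%.
cases = [
    [7, 3, 0],    # <5% risk
    [1, 27, 12],  # 5-20% risk
    [1, 0, 14],   # >20% risk
]
noncases = [
    [158, 18, 0],
    [89, 241, 27],
    [17, 10, 35],
]

def net_up_minus_down(table):
    """(moved to a higher category - moved to a lower one) / total."""
    up = sum(table[i][j] for i in range(3) for j in range(3) if j > i)
    down = sum(table[i][j] for i in range(3) for j in range(3) if j < i)
    return (up - down) / sum(sum(row) for row in table)

nri_cases = net_up_minus_down(cases)         # moving up is appropriate for cases
nri_noncases = -net_up_minus_down(noncases)  # moving down is appropriate for noncases
nri = nri_cases + nri_noncases
print(round(nri_cases, 3), round(nri_noncases, 3), round(nri, 3))
# 0.2 0.119 0.319
```

Upward moves count as appropriate for cases and downward moves as appropriate for noncases; the two components (20% and 11.9%) sum to the reported NRI of 31.9%.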

90 AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE VOL 181 2010



Net Benefit & Decision Curve Analysis

Decision Thresholds

• Taking clinical consequences of testing into account explicitly

• Decision threshold (pt): the probability at which the expected benefit of treatment equals the expected benefit of avoiding treatment
  ‣ e.g. pt = 20%: willing to over-treat 80 to correctly treat 20, or to under-treat 20 to correctly withhold treatment in 80
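Committing to a threshold pt fixes an exchange rate between the two errors: one true positive is judged worth (1 − pt)/pt false positives. A small sketch of that arithmetic:

```python
# Committing to a decision threshold pt implies an exchange rate between
# errors: one true positive is judged worth (1 - pt)/pt false positives.
def fp_worth_one_tp(pt):
    """Number of false positives one accepts to gain one true positive."""
    return (1 - pt) / pt

print(fp_worth_one_tp(0.20))  # 4.0 -> over-treat 4 to correctly treat 1
print(fp_worth_one_tp(0.50))  # 1.0 -> FP and FN weighted equally
```

At pt = 20% this is the 80:20 trade-off on the slide: treating everyone above a 20% risk means accepting four unnecessary treatments per correctly treated case.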


Net Benefit & Decision Curve Analysis

The Basic Idea

• key idea: by committing to a particular pt, you implicitly state your view of what the trade-off should be between avoiding FPs and FNs
  ‣ this can then be used to calculate the NB of a test/model at that threshold

• NB indicates how many more TP classifications can be made with a model for the same number of FP classifications, compared to not using a model
  ‣ e.g. NB of 0.05: use of the test/model leads to 5 extra TPs per 100 tested without an increase in FPs, compared to assuming all are test-negative

• DCA: NB over a range of decision thresholds
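The NB described above is usually computed as NB = TP/n − (FP/n) · pt/(1 − pt). A sketch with hypothetical counts (the cohort numbers below are made up for illustration):

```python
# Net benefit at decision threshold pt from classification counts:
# NB = TP/n - FP/n * pt/(1 - pt). "Treat none" scores 0 by definition.
def net_benefit(tp, fp, n, pt):
    return tp / n - fp / n * pt / (1 - pt)

# hypothetical cohort (illustrative numbers only): 100 tested, 20 cases;
# the model flags 15 true positives and 10 false positives
nb_model = net_benefit(tp=15, fp=10, n=100, pt=0.20)
nb_treat_all = net_benefit(tp=20, fp=80, n=100, pt=0.20)
print(round(nb_model, 3), round(nb_treat_all, 3))
# 0.125 0.0
```

An NB of 0.125 reads as 12.5 extra TPs per 100 tested with no increase in FPs, relative to treating no one; at this threshold and prevalence, "treat all" offers no net benefit at all.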


Prediction of Seminal Vesicle Invasion at Radical Prostatectomy

Example of Decision Curve Analysis

• clinical decision: more radical vs. less radical surgery?
  a) less radical: avoid impotence/incontinence, risk recurrence
  b) more radical: risk impotence/incontinence, avoid recurrence

• possible decisions
  1. assume all will recur / all radical (perfect sens.)
  2. assume none will recur / none radical (perfect spec.)
  3. decision based on cancer grade (AUC 0.56)
  4. decision based on cancer grade & stage (AUC 0.58)
  5. decision based on multivariable model (AUC 0.73)


cancer to examine 3 models that predict recurrencewithin 5 years: 1 based on a multivariable model(“model”) and 2 clinical rules: “assume that any manwith Gleason grade 8 or above will recur” (“Gleasonrule”) and “assume that any man with Gleason grade8 or above, or stage 2 or above, will recur” (“stagerule”). We assess these models to determine how well they aid the decision to undergo or not undergoadjuvant hormonal therapy.

Figure 4 shows the decision curves. Althoughcancer recurrence is a serious event, the benefit ofadjuvant hormonal therapy is somewhat unclear,and the drugs have important side effects such ashot flashes, decreased libido, and fatigue. Due to dif-ferences in opinion about the value of hormonaltherapy, as well as differences between patients inthe importance attached to side effects, the probabil-ity used as a threshold to determine hormonal ther-apy varies from case to case. A typical range of pts is30% to 60%—that is, if you took a group of clini-cians and patients and documented the probabilityof recurrence at which the clinician would advisehormonal therapy, this would vary between 30%and 60%. For much of this range, the simpleGleason rule is comparable to the multivariablemodel, even though it has far less discriminativeaccuracy (e.g., AUC 0.56 compared to 0.73).Moreover, the Gleason rule is much better than the

stage rule in the key 30% to 60% range even though it has a lower AUC (0.56 v. 0.58). This is no doubt because the stage rule is highly sensitive and the Gleason rule more specific. But although sensitivity and specificity give a general indication as to which test is superior in which situation, decision curve analysis can delineate precisely the conditions under which each test should be preferred.

As a further illustration, Figures 5 and 6 show the decision curves for some theoretical distributions. Note that the results of the decision curve analysis accord well with our expectations. For example, a sensitive predictor is superior to a specific predictor where pt is low; that is, where the harm of a false negative is greater than the harm of a false positive. The situation is reversed at high pt: the curves for the sensitive and specific predictor cross near the incidence; a near-perfect predictor is of value except where pt is close to 1, that is, where the patient or the clinician has to be nearly certain before taking action. A predictor that is 2 standard deviations higher in patients who have the event is superior across nearly the full range of pt.
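The threshold behaviour described here can be checked numerically. A minimal Python sketch (not from the paper; it uses only the accuracy figures quoted in the Figure 5 caption, with disease incidence π = 20%) computes net benefit as NB(pt) = Sens·π − (1−Spec)·(1−π)·pt/(1−pt):

```python
# Net benefit of a binary predictor at threshold probability pt:
#   NB(pt) = Sens*pi - (1 - Spec)*(1 - pi)*pt/(1 - pt)
# where pi is the disease prevalence (20% incidence in the Figure 5 example).

def net_benefit(sens, spec, pi, pt):
    """Expected net benefit per patient, in units of one true positive."""
    return sens * pi - (1 - spec) * (1 - pi) * pt / (1 - pt)

pi = 0.20  # disease incidence from the Figure 5 caption

predictors = {
    "near-perfect (99/99)": (0.99, 0.99),
    "sensitive (99/50)":    (0.99, 0.50),
    "specific (50/99)":     (0.50, 0.99),
}

for pt in (0.10, 0.20, 0.40):
    row = ", ".join(
        f"{name}: {net_benefit(se, sp, pi, pt):.3f}"
        for name, (se, sp) in predictors.items()
    )
    print(f"pt={pt:.2f} -> {row}")
```

Below the incidence the sensitive predictor has the higher net benefit, above it the specific predictor wins, and with these particular numbers the two curves cross exactly at pt = 0.20, consistent with the text's observation that they cross near the incidence.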

DISCUSSION

Given the exponential increase in molecular markers in medicine and the integration of information

DECISION CURVE ANALYSIS TO EVALUATE PREDICTION MODELS


Figure 4 Decision curve for prediction of recurrence after surgery for prostate cancer. Thin line: assume no patient will recur. Dotted line: assume all patients will recur. Long dashes: binary decision rule based on cancer grade (“Gleason rule”). Grey line: binary decision rule based on both grade and stage of cancer (“stage rule”). Solid line: multivariable prediction model. The graph gives the expected net benefit per patient relative to no hormonal therapy for any patient (“treat none”). The unit is the benefit associated with 1 patient who would recur without treatment and who receives hormonal therapy.

Figure 5 Decision curve for a theoretical distribution. In this example, disease incidence is 20%. Thin line: assume no patient has disease. Dotted line: assume all patients have disease. Thick line: a perfect prediction model. Grey line: a near-perfect binary predictor (99% sensitivity and 99% specificity). Solid line: a sensitive binary predictor (99% sensitivity and 50% specificity). Dashed line: a specific binary predictor (50% sensitivity and 99% specificity).


[Decision curve annotations: “all radical” (treat all), “none radical” (treat none), “grade”, “grade + stage”, “multivariable model”; the threshold axis runs from “treat even if P(D) is low” to “only treat if P(D) very high”.]


Imperfect Reference Standard

35


Incremental value: 3 tests

36

[Figure: paired 2×2 tables illustrating incremental value. For each stratum of the baseline test (positives and negatives), the index test is cross-tabulated against the reference standard: columns are reference-standard true positives (100%) and true negatives (100%); rows are index-test positives (% sensitivity) and negatives (% specificity).]

Incremental Value
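In the binary test case, incremental value can be read directly off 2×2 tables stratified by the baseline test result. A minimal Python illustration (all counts invented for the example) computes the index test's accuracy separately among baseline-test positives and negatives; the sensitivity among baseline-test negatives is the incremental yield:

```python
# Incremental value of an index test over a baseline test: estimate the
# index test's sensitivity and specificity separately within strata defined
# by the baseline test result. All counts below are hypothetical.

def sens_spec(tp, fn, tn, fp):
    return tp / (tp + fn), tn / (tn + fp)

# Each record: (baseline_positive, index_positive, diseased, count)
records = [
    (True,  True,  True, 40), (True,  False, True,  5),
    (True,  True,  False, 8), (True,  False, False, 30),
    (False, True,  True, 20), (False, False, True, 15),
    (False, True,  False, 4), (False, False, False, 180),
]

def stratum(baseline_pos):
    tp = sum(n for b, i, d, n in records if b == baseline_pos and i and d)
    fn = sum(n for b, i, d, n in records if b == baseline_pos and not i and d)
    tn = sum(n for b, i, d, n in records if b == baseline_pos and not i and not d)
    fp = sum(n for b, i, d, n in records if b == baseline_pos and i and not d)
    return sens_spec(tp, fn, tn, fp)

# The key number: how many cases missed by the baseline test does the
# index test pick up (sensitivity among baseline-test negatives)?
sens_neg, spec_neg = stratum(False)
print(f"index test among baseline-negatives: sens={sens_neg:.2f}, spec={spec_neg:.2f}")
```

Reporting overall accuracy alone would hide this: a test that only re-finds the baseline test's positives adds no value, however good its marginal sensitivity looks.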


Xpert for Paediatric TB

37

www.thelancet.com/infection Published online July 18, 2011 DOI:10.1016/S1473-3099(11)70167-0 1

Articles

Published Online July 18, 2011. DOI:10.1016/S1473-3099(11)70167-0

See Online/Comment DOI:10.1016/S1473-3099(11)70187-6

Division of Medical Microbiology and Institute for Infectious Diseases and Molecular Medicine (Prof M P Nicol PhD, W Zemanay PhD), Division of Clinical Pharmacology, Department of Medicine (L Workman MSc), and Department of Paediatrics and Child Health (W Isaacs BSc, J Munro, F Black MBBCh, B Eley FCPaed, Prof H J Zar PhD), University of Cape Town, Cape Town, South Africa; National Health Laboratory Service, Groote Schuur Hospital, Cape Town, South Africa (Prof M P Nicol, W Zemanay); Red Cross War Memorial Children’s Hospital, Cape Town, South Africa (W Isaacs, J Munro, F Black, B Eley, Prof H J Zar); and Foundation for Innovative New Diagnostics (FIND), Geneva, Switzerland (C C Boehme MD)

Correspondence to: Prof Mark P Nicol, 5.28 Falmouth, Faculty of Health Sciences, Anzio Road, Observatory, Cape Town 7925, South Africa. [email protected]

Accuracy of the Xpert MTB/RIF test for the diagnosis of pulmonary tuberculosis in children admitted to hospital in Cape Town, South Africa: a descriptive study
Mark P Nicol, Lesley Workman, Washiefa Isaacs, Jacinta Munro, Faye Black, Brian Eley, Catharina C Boehme, Widaad Zemanay, Heather J Zar

Summary
Background WHO recommends that Xpert MTB/RIF replaces smear microscopy for initial diagnosis of suspected HIV-associated tuberculosis or multidrug-resistant pulmonary tuberculosis, but no data exist for its use in children. We aimed to assess the accuracy of the test for the diagnosis of pulmonary tuberculosis in children in an area with high tuberculosis and HIV prevalences.

Methods In this prospective, descriptive study, we enrolled children aged 15 years or younger who had been admitted to one of two hospitals in Cape Town, South Africa, with suspected pulmonary tuberculosis between Feb 19, 2009, and Nov 30, 2010. We compared the diagnostic accuracy of MTB/RIF and concentrated, fluorescent acid-fast smear with a reference standard of liquid culture from two sequential induced sputum specimens (primary analysis).

Results 452 children (median age 19·4 months, IQR 11·1–46·2) had at least one induced sputum specimen; 108 children (24%) had HIV infection. 27 children (6%) had a positive smear result, 70 (16%) had a positive culture result, and 58 (13%) had a positive MTB/RIF test result. With mycobacterial culture as the reference standard, MTB/RIF tests when done on two induced sputum samples detected twice as many cases (75·9%, 95% CI 64·5–87·2) as did smear microscopy (37·9%, 25·1–50·8), detecting all of 22 smear-positive cases and 22 of 36 (61·1%, 44·4–77·8) smear-negative cases. For smear-negative cases, the incremental increase in sensitivity from testing a second specimen was 27·8% for MTB/RIF, compared with 13·8% for culture. The specificity of MTB/RIF was 98·8% (97·6–99·9). MTB/RIF results were available in median 1 day (IQR 0–4) compared with median 12 days (9–17) for culture (p<0·0001).

Interpretation MTB/RIF testing of two induced sputum specimens is warranted as the first-line diagnostic test for children with suspected pulmonary tuberculosis.

Funding National Institutes of Health, the National Health Laboratory Service Research Trust, the Medical Research Council of South Africa, and Wellcome Trust.

Introduction
Diagnosis of pulmonary tuberculosis in children has relied predominantly on clinical, radiological, and tuberculin skin-test findings.1 However, clinical diagnosis has low specificity, radiological interpretation is subject to interobserver variability, and the tuberculin skin test is a marker of exposure, not disease.1–3 Microbiological confirmation with identification of drug resistance is increasingly important in the context of an emerging drug-resistant tuberculosis epidemic. Furthermore, confirmation is useful in children with HIV, in whom pill burden, drug interactions, and adherence issues make treatment of HIV and tuberculosis difficult. Use of repeated induced sputum specimens in children is simple, well tolerated, and effective for microbiological confirmation of pulmonary tuberculosis, even in infants.4,5 One induced sputum specimen provides a similar microbiological yield to three gastric lavage specimens in children admitted to hospital with pulmonary tuberculosis.4

Smear microscopy is typically negative in children with culture-confirmed tuberculosis, even when optimised fluorescence microscopy is used on concentrated specimens.1,6 Mycobacterial culture of induced sputum

specimens is therefore needed, but culture can take weeks and is consequently unavailable to inform clinical decisions on initial treatment. A rapid diagnosis of tuberculosis in children is desirable because delayed diagnosis is associated with poor outcome.7

An urgent need therefore exists for a rapid, sensitive, and specific test for tuberculosis and for identification of drug-resistant disease in children. The performance of Xpert MTB/RIF (Cepheid, Sunnyvale, CA, USA), an integrated sample processing and nucleic acid amplification test for detection of Mycobacterium tuberculosis and resistance to rifampicin, was assessed in a large multicentre study in adults with suspected tuberculosis.8 One MTB/RIF test accurately detected the presence of tuberculosis in 98·2% of smear-positive and 72·5% of smear-negative tuberculosis cases. Rifampicin resistance was detected with a sensitivity of 99·1% and specificity of 100%. The test results were available within 100 min of testing, much less time than for results from culture.

WHO have endorsed the use of MTB/RIF as the initial diagnostic test in people suspected of having drug-resistant or HIV-associated tuberculosis,9 but no data are available on its accuracy in children. We prospectively


www.thelancet.com/lancetgh Vol 1 August 2013 e97

Rapid diagnosis of pulmonary tuberculosis in African children in a primary care setting by use of Xpert MTB/RIF on respiratory specimens: a prospective study
Heather J Zar, Lesley Workman, Washiefa Isaacs, Keertan Dheda, Widaad Zemanay, Mark P Nicol

Summary
Background In children admitted to hospital, rapid, accurate diagnosis of pulmonary tuberculosis with the Xpert MTB/RIF assay is possible, but no paediatric studies have been done in the primary care setting, where most children are given care, and where microbiological diagnosis is rarely available. We assessed the diagnostic accuracy of Xpert MTB/RIF in children in primary care.

Methods For this prospective study, we obtained repeat induced sputum and nasopharyngeal aspirate specimens from children (<15 years) with suspected pulmonary tuberculosis at a clinic in Khayelitsha, Cape Town, South Africa. We compared the diagnostic accuracy of Xpert MTB/RIF with a reference standard of culture and smear microscopy on induced sputum specimens. For the main analysis, specificity of Xpert MTB/RIF versus liquid culture, we included only children with two interpretable Xpert MTB/RIF and induced sputum culture results.

Findings Between Aug 1, 2010, and July 30, 2012, we enrolled 384 children (median age 38·3 months, IQR 21·2–56·5) who had one paired induced sputum and nasopharyngeal specimen, 309 (81%) of whom had two paired specimens. Five children (1%) tested positive for tuberculosis by smear microscopy, 26 (7%) tested positive by Xpert MTB/RIF, and 30 (8%) tested positive by culture. Xpert MTB/RIF on two induced sputum specimens detected 16 of 28 culture-confirmed cases (sensitivity of 57·1%, 95% CI 39·1–73·5) and on two nasopharyngeal aspirates detected 11 of 28 culture-confirmed cases (sensitivity of 39·3, 23·6–57·6; p=0·18). The specificity of Xpert MTB/RIF on induced sputum was 98·9% (95% CI 96·9–99·6) and on nasopharyngeal aspirates was 99·3% (97·4–99·8).

Interpretation Our findings suggest that Xpert MTB/RIF on respiratory secretions is a useful test for rapid diagnosis of paediatric pulmonary tuberculosis in primary care.

Funding National Institutes of Health, National Health Laboratory Services Research Trust, the Medical Research Council of South Africa, the National Research Foundation South Africa, the European and Developing Countries Clinical Trials Partnership.

Introduction
Diagnosis of pulmonary tuberculosis in children can be challenging because of non-specific symptoms, signs, and radiological features.1 Diagnostic uncertainty is compounded by HIV co-infection because tuberculosis can be difficult to distinguish from other HIV-associated diseases. Missed diagnosis of paediatric pulmonary tuberculosis is an important problem, with up to 40% of children admitted to hospital with culture-confirmed tuberculosis being discharged without appropriate treatment because culture confirmation can take several weeks.2–4 In the community setting, the proportion of missed cases can be even higher because many patients have less severe disease presentation. Rapid microbiological confirmation of tuberculosis and identification of drug resistance is therefore desirable because it enables definitive diagnosis and initiation of appropriate treatment.5

The Xpert MTB/RIF assay (Xpert; Cepheid, CA, USA) has enabled rapid diagnosis of pulmonary tuberculosis and detection of resistance to rifampicin in children

admitted to hospital. We previously reported that Xpert MTB/RIF on two induced sputum specimens correctly identified 44 (76%) of 58 children in hospital with culture-confirmed disease.6 Similar results have been reported in hospital-based studies of children with suspected tuberculosis in Uganda and Tanzania using sputum or induced sputum specimens.7,8 Findings from a Zambian study showed that Xpert MTB/RIF done on gastric lavage specimens in children admitted to hospital had a sensitivity of 65%, whereas in older children the sensitivity on spontaneously produced sputum was 90%.9 We also reported that Xpert MTB/RIF on two nasopharyngeal aspirates was useful for microbiological confirmation in children in hospital.10 Although nasopharyngeal aspirates provided a lower culture yield than induced sputum, the sensitivity of two Xpert MTB/RIF tests on nasopharyngeal aspirates was similar to Xpert MTB/RIF on two induced sputum specimens, 71% on induced sputum and 65% on nasopharyngeal aspirate specimens.10

Most children with suspected pulmonary tuberculosis present to primary clinics, but microbiological diagnosis

Lancet Glob Health 2013; 1: e97–104

Copyright © Zar et al. Open Access article distributed under the terms of CC BY

Department of Paediatrics and Child Health (Prof H J Zar PhD, L Workman MPH, W Isaacs MSc), Department of Medicine (Prof K Dheda PhD), and Division of Medical Microbiology, Institute for Infectious Diseases, and Molecular Medicine (W Zemanay PhD, Prof M P Nicol PhD), University of Cape Town, Cape Town, South Africa; Red Cross War Memorial Children’s Hospital, Cape Town, South Africa (Prof H J Zar, L Workman, W Isaacs); and the National Health Laboratory Service, Groote Schuur Hospital, Cape Town, South Africa (W Zemanay, Prof M P Nicol)

Correspondence to: Prof Heather J Zar, Department of Paediatrics and Child Health, 5th Floor, ICH Building, Red Cross War Memorial Children’s Hospital, Cape Town, South Africa. [email protected]


~10-20% culture+ but ~50% treated for TB


Latent Class Analysis

38

[Diagram: latent class model linking measured characteristics, test results, and treatment response to latent variables. Latent variables: exposure to TB, TB disease, other (non-TB) respiratory disease, and random effects r(i). Measured indicators: CXR, smear, culture, Xpert, TST/IGRA, signs & symptoms, reported household exposure, clinical diagnosis (based on point scoring, diagnostic classification, or algorithm), response to ATT, and response to antibiotics. Covariates: (1) HIV, (2) malnutrition, (3) age, (4) parental smoking, (5) SES, (6) sex.

Notes: (1) Several relationships are also modified by HIV status, age, nutritional status, or other variables; this will need more detailed explanation and exploratory or sensitivity analyses. (2) The presence/absence of certain relationships may also be explored in the data by model comparison or as part of sensitivity analyses.]
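The core idea behind latent class analysis can be sketched in a few lines of code. The toy example below (illustrative only, far simpler than the model in the diagram: two latent classes, three conditionally independent binary tests, no covariates) fits prevalence, sensitivities, and specificities by EM without any reference standard; all parameter values are invented:

```python
# Minimal two-class latent class analysis (conditional independence) fitted
# by EM: latent "disease" status is estimated without a gold standard.
# Simulation parameters and starting values are illustrative.
import random

random.seed(1)

PREV = 0.3
SENS = [0.8, 0.7, 0.6]    # P(test+ | diseased), per test
SPEC = [0.95, 0.9, 0.85]  # P(test- | not diseased), per test

def simulate(n):
    data = []
    for _ in range(n):
        d = random.random() < PREV
        data.append(tuple(
            int(random.random() < (SENS[j] if d else 1 - SPEC[j]))
            for j in range(3)))
    return data

def em(data, iters=200):
    prev, sens, spec = 0.5, [0.6] * 3, [0.8] * 3  # crude starting values
    for _ in range(iters):
        # E-step: posterior probability of disease for each test pattern
        post = []
        for x in data:
            pd, pn = prev, 1 - prev
            for j, xj in enumerate(x):
                pd *= sens[j] if xj else 1 - sens[j]
                pn *= 1 - spec[j] if xj else spec[j]
            post.append(pd / (pd + pn))
        # M-step: update prevalence, sensitivities, specificities
        s = sum(post)
        prev = s / len(data)
        for j in range(3):
            sens[j] = sum(p for p, x in zip(post, data) if x[j]) / s
            spec[j] = sum(1 - p for p, x in zip(post, data) if not x[j]) / (len(data) - s)
    return prev, sens, spec

prev_hat, sens_hat, spec_hat = em(simulate(5000))
print(round(prev_hat, 2), [round(v, 2) for v in sens_hat], [round(v, 2) for v in spec_hat])
```

With three tests the two-class model is just identified; the richer model in the diagram adds covariates and relaxes conditional independence, which is exactly why the notes call for sensitivity analyses and model comparison.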


Other options

• improve upon simple reference standard
  ‣ composite reference standard
  ‣ panel diagnosis

• look downstream of “diagnosis”
  ‣ look at treatment decisions
  ‣ look at final outcomes

• decision-analytic modelling
  ‣ based on evidence/assumptions about accuracy of index test and baseline test
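The need for these options can be made concrete with a small simulation (all numbers invented for illustration): when the reference standard itself misses cases, the index test's apparent accuracy against that reference is biased even though the test itself has not changed.

```python
# Imperfect-reference bias: when the reference standard (e.g. culture)
# misses cases, a genuinely good index test appears falsely positive
# whenever it detects a case the reference missed. Numbers are invented.
import random

random.seed(7)

N = 200_000
PREV = 0.15
REF_SENS, REF_SPEC = 0.80, 1.00   # imperfect reference: misses 20% of cases
IDX_SENS, IDX_SPEC = 0.90, 0.98   # true accuracy of the index test

tp = fn = fp = tn = 0
for _ in range(N):
    disease = random.random() < PREV
    # Conditionally independent errors given true disease status.
    ref = random.random() < (REF_SENS if disease else 1 - REF_SPEC)
    idx = random.random() < (IDX_SENS if disease else 1 - IDX_SPEC)
    if ref and idx: tp += 1
    elif ref: fn += 1
    elif idx: fp += 1
    else: tn += 1

apparent_sens = tp / (tp + fn)
apparent_spec = tn / (tn + fp)
print(f"apparent sens={apparent_sens:.3f} (true {IDX_SENS})")
print(f"apparent spec={apparent_spec:.3f} (true {IDX_SPEC})")
```

Here the apparent specificity drops below the true 0.98, because cases missed by the reference but detected by the index test are scored as false positives. This may be part of the pattern seen in the paediatric TB example, where far more children are treated for TB than are culture-positive.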

39


Conclusions

40


Conclusions

• incremental value: first step towards impact

• “binary test case” is simple: should always be done

• “multivariable test case” based on diagnostic prediction models: consider DCA/NB

• imperfect reference standard: consider latent class analysis, a composite reference standard, or looking downstream at treatment decisions and outcomes

41


ADDITIONAL SLIDES

42


[Decision curve annotations: “grade”, “grade + stage”, “multivariable model”.]

Prediction of Seminal Vesicle Invasion at Radical Prostatectomy

Example of Decision Curve Analysis

43

Page 105: Diagnostic Prediction Models & Incremental value · 2014-09-16 · Diagnostic Prediction Models & Incremental value Samuel G. Schumacher McGill University. ... Andrew Ramsay, Jane

cancer to examine 3 models that predict recurrencewithin 5 years: 1 based on a multivariable model(“model”) and 2 clinical rules: “assume that any manwith Gleason grade 8 or above will recur” (“Gleasonrule”) and “assume that any man with Gleason grade8 or above, or stage 2 or above, will recur” (“stagerule”). We assess these models to determine how well they aid the decision to undergo or not undergoadjuvant hormonal therapy.

Figure 4 shows the decision curves. Althoughcancer recurrence is a serious event, the benefit ofadjuvant hormonal therapy is somewhat unclear,and the drugs have important side effects such ashot flashes, decreased libido, and fatigue. Due to dif-ferences in opinion about the value of hormonaltherapy, as well as differences between patients inthe importance attached to side effects, the probabil-ity used as a threshold to determine hormonal ther-apy varies from case to case. A typical range of pts is30% to 60%—that is, if you took a group of clini-cians and patients and documented the probabilityof recurrence at which the clinician would advisehormonal therapy, this would vary between 30%and 60%. For much of this range, the simpleGleason rule is comparable to the multivariablemodel, even though it has far less discriminativeaccuracy (e.g., AUC 0.56 compared to 0.73).Moreover, the Gleason rule is much better than the

stage rule in the key 30% to 60% range even thoughit has a lower AUC (0.56 v. 0.58). This is no doubtbecause the stage rule is highly sensitive and theGleason rule more specific. But although sensitivityand specificity give a general indication as to whichtest is superior in which situation, decision curveanalysis can delineate precisely the conditionsunder which each test should be preferred.

As a further illustration, Figures 5 and 6 show thedecision curves for some theoretical distributions.Note that the results of the decision curve analysisaccord well with our expectations. For example, asensitive predictor is superior to a specific predictorwhere pt is low—that is, where the harm of a falsenegative is greater than the harm of a false positive.The situation is reversed at high pt: the curves for thesensitive and specific predictor cross near the inci-dence; a near-perfect predictor is of value exceptwhere pt is close to 1, that is, where the patient or theclinician has to be nearly certain before taking action.A predictor that is 2 standard deviations higher inpatients who have the event is superior across nearlythe full range of pt.

DISCUSSION

Given the exponential increase in molecular markers in medicine and the integration of information

DECISION CURVE ANALYSIS TO EVALUATE PREDICTION MODELS

METHODOLOGY 571

Figure 4. Decision curve for prediction of recurrence after surgery for prostate cancer. Thin line: assume no patient will recur. Dotted line: assume all patients will recur. Long dashes: binary decision rule based on cancer grade ("Gleason rule"). Grey line: binary decision rule based on both grade and stage of cancer ("stage rule"). Solid line: multivariable prediction model. The graph gives the expected net benefit per patient relative to no hormonal therapy for any patient ("treat none"). The unit is the benefit associated with 1 patient who would recur without treatment and who receives hormonal therapy.

Figure 5. Decision curve for a theoretical distribution. In this example, disease incidence is 20%. Thin line: assume no patient has disease. Dotted line: assume all patients have disease. Thick line: a perfect prediction model. Grey line: a near-perfect binary predictor (99% sensitivity and 99% specificity). Solid line: a sensitive binary predictor (99% sensitivity and 50% specificity). Dashed line: a specific binary predictor (50% sensitivity and 99% specificity).

Downloaded from mdm.sagepub.com at MCGILL UNIVERSITY LIBRARIES on November 10, 2012

Example of Decision Curve Analysis: Prediction of Seminal Vesicle Invasion at Radical Prostatectomy

[Decision curve figure; curve labels: grade, multivariable model, grade + stage]

43


cancer to examine 3 models that predict recurrence within 5 years: 1 based on a multivariable model ("model") and 2 clinical rules: "assume that any man with Gleason grade 8 or above will recur" ("Gleason rule") and "assume that any man with Gleason grade 8 or above, or stage 2 or above, will recur" ("stage rule"). We assess these models to determine how well they aid the decision to undergo or not undergo adjuvant hormonal therapy.
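For concreteness, the two clinical rules can be written as simple predicates. This is a hedged sketch; the argument names are assumptions for illustration, not the paper's own variables:

```python
# Illustrative encoding of the two clinical decision rules described above.

def gleason_rule(gleason: int, stage: int) -> bool:
    """'Gleason rule': predict recurrence if Gleason grade is 8 or above."""
    return gleason >= 8

def stage_rule(gleason: int, stage: int) -> bool:
    """'Stage rule': predict recurrence if grade >= 8 or stage >= 2."""
    return gleason >= 8 or stage >= 2

# The stage rule flags every patient the Gleason rule flags, plus more,
# so it is the more sensitive rule and the Gleason rule the more specific,
# consistent with the discussion of Figure 4.
assert stage_rule(gleason=7, stage=2) and not gleason_rule(gleason=7, stage=2)
```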

[The slides that follow repeat slide 43's decision curve example, progressively annotating the figure: "none → radical" and "all → radical" mark the treat-none and treat-all strategies, with the notes "treat even if P(D) is low" at low threshold probabilities and "only treat if P(D) very high" at high ones.]

Two paradigms for IV

44

| | ID | NCD |
| --- | --- | --- |
| Goal of testing | diagnosis | (diagnosis) & prognosis |
| Outcome | high and acute risk of mortality if untreated | risk of poor outcome usually not acute and sometimes not high |
| Treatment | very good risk/benefit profile | moderate risk/benefit profile |
| Reference standard | usually some kind of culture (or serology / molecular) | usually long-term follow-up or invasive testing |
| Management decision | based on single test | based on many parameters |


Where the classic ID paradigm does not fit

45

| | Infectious diseases | Latent TB |
| --- | --- | --- |
| Goal of testing | diagnosis | prognosis |
| Outcome | acute and high risk of mortality if untreated | risk of poor outcome usually not acute and sometimes not high |
| Treatment | very good risk/benefit profile | moderate risk/benefit profile |
| Reference standard | usually some kind of culture (or serology / molecular) | long-term follow-up |
| Management decision | based on single test | based on many parameters |

PREDICTION RESEARCH

Page 115

Where the classic ID paradigm does not fit

46

Infectious diseases vs. Childhood TB

• Goal of testing: diagnosis vs. diagnosis/prognosis
• Outcome: acute and high risk of mortality if untreated (both)
• Treatment: very good risk/benefit profile (both)
• Reference standard: usually some kind of culture (or serology / molecular) vs. ?
• Management decision: based on single test vs. based on many parameters

PREDICTION RESEARCH

Page 116

Where the classic ID paradigm does not fit

47

Infectious diseases vs. Extrapulmonary TB

• Goal of testing: diagnosis vs. diagnosis/prognosis
• Outcome: acute and high risk of mortality if untreated (both)
• Treatment: very good risk/benefit profile (both)
• Reference standard: usually some kind of culture (or serology / molecular) vs. ?
• Management decision: based on single test vs. based on many parameters

PREDICTION RESEARCH

Page 117

Where the classic ID paradigm does not fit

48

Infectious diseases vs. HIV-associated TB

• Goal of testing: diagnosis vs. diagnosis/prognosis
• Outcome: acute and high risk of mortality if untreated (both)
• Treatment: very good risk/benefit profile (both)
• Reference standard: usually some kind of culture (or serology / molecular) vs. ?
• Management decision: based on single test vs. based on many parameters

PREDICTION RESEARCH

Page 118

Prediction Models & Incremental Value Metrics

49

1. Model development

2. Model assessment
• discrimination (e.g. AUC)
• calibration (e.g. calibration plot)

3. Internal validation

4. Assess incremental value
• ∆AUC & IDI
• reclassification table & NRI
• DCA & NB

5. Next steps
• external validation
• model updating
• impact assessment
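The incremental-discrimination piece of step 4 (∆AUC) can be sketched directly from the rank definition of the AUC. This is an illustrative sketch, not code from the talk; the outcome vector `y` and the predicted probabilities `p_base` and `p_new` are invented toy values.

```python
import numpy as np

def auc(y, p):
    """AUC = probability that a randomly chosen case receives a higher
    predicted probability than a randomly chosen non-case (ties count 1/2)."""
    pos, neg = p[y == 1], p[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# hypothetical data: true disease status, plus predicted probabilities
# from a baseline model and a model extended with a new marker
y      = np.array([1, 1, 1, 0, 0, 0, 0, 1])
p_base = np.array([0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1, 0.3])
p_new  = np.array([0.7, 0.5, 0.8, 0.2, 0.2, 0.4, 0.1, 0.6])

delta_auc = auc(y, p_new) - auc(y, p_base)  # incremental discrimination
```

With these toy values the extended model separates cases and non-cases perfectly, so ∆AUC is simply 1 minus the baseline AUC.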

Page 119

How to integrate multiple parameters into a treatment decision

• ideally: personalized management by a highly trained specialist based on any available data on the patient, perhaps supported by a validated prediction model
• reality: based on algorithms (essentially decision rules)
  • based on expert opinion rather than empiric evidence?
  • either simple algorithms or a simple scoring rule
• Opportunities
  • prediction models may offer improvement in correct classification
  • may enable lower-level health care workers to diagnose and treat
• Challenges
  • for diseases where multiple measures are used there is usually no good reference standard, and long-term follow-up is not practical in low-resource settings
  • potentially challenging to implement

50

Page 120

Where the classic ID paradigm does not fit

• Latent TB infection
  • no detection of agent: prognosis more relevant than "diagnosis"
  • no gold standard test available
• Childhood TB
  • no detection of agent for the majority: predicting prognosis rather than making a diagnosis is relevant
  • no gold standard test available
• HIV-associated TB

51

Page 121

52

than 1. This implies a threshold for treatment below 50%. An important issue is that the sensitivity and specificity of a decision rule in clinical practice is not only influenced by the quality of the prediction model, but also by the adherence of clinicians to the rule. Validation of a prediction model may indicate the efficacy of a rule (the maximum that can be attained with 100% adherence), but impact analysis will indicate the effectiveness in practice.

Clinicians may choose to overrule the decision rule, which may improve sensitivity and/or specificity. But overruling may also dilute the effects of the rule.56 There may be various barriers to the clinical use of decision rules. Barriers include attitudinal issues such as skepticism of guidelines (in general and with respect to the specific rule), questions about the clinical sensibility of the rule, excessive confidence in clinical judgment, fear of medicolegal risks, concern that important factors are not addressed by the decision rule, concern about patient safety, and practical issues such as availability of the rule at the time of decision making and ease of use.

Table 16.6 Developing and evaluating clinical prediction models and decision rules (based on Reilly and Evans344)

Level of evidence | Definitions and standards of evaluation | Clinical implications
Level 1: Derivation of prediction model | Identification of predictors for multivariable model; blinded assessment of outcomes | Needs validation and further evaluation before using in actual patient care
Level 2: Narrow validation of prediction model | Assessment of predictive ability when tested prospectively in one setting; blinded assessment of outcomes | Needs validation in varied settings; may use predictions cautiously in patients similar to sample studied
Level 3: Broad validation of prediction model | Assessment of predictive ability in varied settings with wide spectrum of patients and physicians | Needs impact analysis; may use predictions with confidence in their accuracy
Level 4: Narrow impact analysis of prediction model used as decision rule | Prospective demonstration in one setting that use of decision rule improves physicians' decisions (quality or cost-effectiveness of patient care) | May use cautiously to inform decisions in settings similar to that studied
Level 5: Broad impact analysis of prediction model used as decision rule | Prospective demonstration in varied settings that use of decision rule improves physicians' decisions for wide spectrum of patients | May use in varied settings with confidence that its use will benefit patient care quality or effectiveness

16.3 From Prediction Models to Decision Rules 293

Page 122

Incremental Value Metrics

53

Page 123

AUC & IDI

• AUC: probability of correct classification for a pair of patients with and without the outcome

• IDI: sum of the average increase in predicted probability among D+ and the average decrease in predicted probability among D- (related to ∆AUC); equivalent to the increase in mean sensitivity given no change in specificity

54
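The IDI definition above translates almost line for line into code. A minimal sketch with invented toy data (`y`, `p_base`, `p_new` are not from any real cohort):

```python
import numpy as np

def idi(y, p_base, p_new):
    """Integrated discrimination improvement: the average rise in predicted
    probability among cases (D+) plus the average drop among non-cases (D-)
    when moving from the baseline model to the model with the new marker."""
    cases, controls = y == 1, y == 0
    rise_in_cases    = p_new[cases].mean() - p_base[cases].mean()
    drop_in_controls = p_base[controls].mean() - p_new[controls].mean()
    return rise_in_cases + drop_in_controls

# hypothetical predictions from a baseline and an extended model
y      = np.array([1, 1, 1, 0, 0, 0, 0, 1])
p_base = np.array([0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1, 0.3])
p_new  = np.array([0.7, 0.5, 0.8, 0.2, 0.2, 0.4, 0.1, 0.6])
```

Equivalently, the IDI is the difference in "discrimination slopes" (mean predicted risk in cases minus mean predicted risk in non-cases) between the two models.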

Page 124

NRI

Net fraction of reclassifications in the right direction by making decisions based on predictions with the marker compared to decisions without the marker; default weights are by prevalence of disease.

55
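For a single treatment threshold, the NRI can be sketched as below. Again a toy illustration (the data and the 0.5 threshold are invented); this unweighted form adds the net proportion of cases reclassified upward to the net proportion of non-cases reclassified downward.

```python
import numpy as np

def nri(y, p_base, p_new, threshold):
    """Net reclassification improvement at one decision threshold: the net
    fraction of cases moved above the threshold plus the net fraction of
    non-cases moved below it, comparing predictions with vs. without the marker."""
    up   = (p_new >= threshold) & (p_base < threshold)   # reclassified upward
    down = (p_new < threshold) & (p_base >= threshold)   # reclassified downward
    cases, controls = y == 1, y == 0
    nri_cases    = (up[cases].sum() - down[cases].sum()) / cases.sum()
    nri_controls = (down[controls].sum() - up[controls].sum()) / controls.sum()
    return nri_cases + nri_controls

# hypothetical data; a 50% treatment threshold is assumed for illustration
y      = np.array([1, 1, 1, 0, 0, 0, 0, 1])
p_base = np.array([0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1, 0.3])
p_new  = np.array([0.7, 0.5, 0.8, 0.2, 0.2, 0.4, 0.1, 0.6])
```

Weighted variants, such as the prevalence-based weighting mentioned on the slide, would multiply the case and non-case components by weights before summing.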

Page 125

Decision Curve Analysis

56

cancer to examine 3 models that predict recurrence within 5 years: 1 based on a multivariable model ("model") and 2 clinical rules: "assume that any man with Gleason grade 8 or above will recur" ("Gleason rule") and "assume that any man with Gleason grade 8 or above, or stage 2 or above, will recur" ("stage rule"). We assess these models to determine how well they aid the decision to undergo or not undergo adjuvant hormonal therapy.

Figure 4 shows the decision curves. Although cancer recurrence is a serious event, the benefit of adjuvant hormonal therapy is somewhat unclear, and the drugs have important side effects such as hot flashes, decreased libido, and fatigue. Due to differences in opinion about the value of hormonal therapy, as well as differences between patients in the importance attached to side effects, the probability used as a threshold to determine hormonal therapy varies from case to case. A typical range of pts is 30% to 60%; that is, if you took a group of clinicians and patients and documented the probability of recurrence at which the clinician would advise hormonal therapy, this would vary between 30% and 60%. For much of this range, the simple Gleason rule is comparable to the multivariable model, even though it has far less discriminative accuracy (e.g., AUC 0.56 compared to 0.73). Moreover, the Gleason rule is much better than the stage rule in the key 30% to 60% range even though it has a lower AUC (0.56 v. 0.58). This is no doubt because the stage rule is highly sensitive and the Gleason rule more specific. But although sensitivity and specificity give a general indication as to which test is superior in which situation, decision curve analysis can delineate precisely the conditions under which each test should be preferred.

As a further illustration, Figures 5 and 6 show the decision curves for some theoretical distributions. Note that the results of the decision curve analysis accord well with our expectations. For example, a sensitive predictor is superior to a specific predictor where pt is low, that is, where the harm of a false negative is greater than the harm of a false positive. The situation is reversed at high pt: the curves for the sensitive and specific predictor cross near the incidence; a near-perfect predictor is of value except where pt is close to 1, that is, where the patient or the clinician has to be nearly certain before taking action. A predictor that is 2 standard deviations higher in patients who have the event is superior across nearly the full range of pt.
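Underlying each curve is the net benefit at a threshold probability pt, NB = TP/n - (FP/n) * pt/(1 - pt). A minimal sketch, with an invented 10-patient cohort and hypothetical model predictions rather than the prostate cancer data discussed here:

```python
import numpy as np

def net_benefit(y, p, pt):
    """Net benefit at threshold probability pt: treat everyone whose predicted
    risk is >= pt; false positives are weighted by the odds pt / (1 - pt),
    the relative harm of unnecessary treatment implied by the threshold."""
    n = len(y)
    treat = p >= pt
    tp = np.sum(treat & (y == 1))  # treated patients who would recur
    fp = np.sum(treat & (y == 0))  # treated patients who would not
    return tp / n - fp / n * pt / (1 - pt)

# toy cohort: 20% of patients recur; p_model is a hypothetical prediction model
y       = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
p_model = np.array([0.8, 0.4, 0.3, 0.1, 0.05, 0.2, 0.1, 0.15, 0.05, 0.1])

# a decision curve is net benefit over a grid of thresholds, compared with the
# two default policies: 'treat all' and 'treat none' (net benefit 0 throughout)
thresholds  = np.linspace(0.05, 0.60, 12)
curve_model = [net_benefit(y, p_model, pt) for pt in thresholds]
curve_all   = [net_benefit(y, np.ones_like(y, dtype=float), pt) for pt in thresholds]
```

At pt equal to the event rate, "treat all" has net benefit 0, matching "treat none"; a model earns its place on the curve only where it beats both defaults.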

DISCUSSION

Given the exponential increase in molecular markers in medicine and the integration of information

DECISION CURVE ANALYSIS TO EVALUATE PREDICTION MODELS

METHODOLOGY 571


[Figure 5 legend labels: "perfect model", "treat none", "treat all", "99% sens 99% spec", "50% sens 99% spec", "99% sens 50% spec"; annotations: "treat even if P(D) is low", "only treat if P(D) very high".]