mrcpsych08 - how to analyse diagnostic test studies (june08)

Critical Appraisal of Diagnostic Test Studies

Alex J Mitchell, Consultant in Liaison Psychiatry, University of Leicester

MRCPsych Teaching 2008

Contents

• Importance of understanding diagnostic tests

• Statistics of diagnostic validity

• Examples

Importance of understanding diagnostic tests

What Is a Diagnostic Test in Psychiatry?

• CT/MRI
• CSF
• Blood tests, e.g. TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological Testing
• MMSE
• HADS/BDI/CESD?
• Clinical Judgement
• Self-report

Why Is a HADS score not a diagnosis?

1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if missing items?
6. Imprecise

Defining Diagnostic Testing

• INTENTION
• Screening
– The systematic application of a test or inquiry, to identify individuals at sufficient risk of a specific disorder to warrant further actions among those who have not sought medical help for that disorder
• Case-Finding
– The selected application of a test or inquiry, to identify individuals with a suspected disorder and exclude those without a disorder, usually in those who have sought medical help for that disorder

• APPLICATION
• Targeted (High Risk)
– The highly selected application of a test or inquiry, to identify individuals at high risk of a specific disorder by virtue of known risk factors
• Routine Screening
– The systematic application of a test or inquiry, to individuals without a known disorder (or who have not sought medical help for that disorder)

Adapted from Department of Health. Annual report of the national screening committee. London: DoH, 1997.

Aims of Detection

• Screening:
– Short; easy; some false +ve (low Sp, PPV), few false –ve (high Sens, NPV)
• Diagnosis (case-finding)
– Accurate; few false +ve or –ve
• Rating
– Simple, patient rated, correlates with QoL and other outcomes

UK National Screening Committee Guidelines

• The condition should:
– Be an important health issue
– Have a well-understood history, with a detectable risk factor or disease marker
– Have cost-effective primary preventions implemented.

• The screening tool should:
– Be a valid tool with known cut-off
– Be acceptable to the public
– Have agreed diagnostic procedures.

• The treatment should:
– Be effective, with evidence of benefits of early intervention
– Have adequate resources
– Have appropriate policies as to who should be treated.

• The screening program should:
– Show evidence that the benefits of screening outweigh the risks
– Be acceptable to public and professionals
– Be cost effective (and have ongoing evaluation)
– Have quality-assurance strategies in place.

Adapted from: UK National Screening Committee. Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. http://www.nsc.nhs.uk/pdfs/criteria.pdf

Stage | Type | Purpose | Description

• Pre-clinical (Development)
– Purpose: Development of the proposed tool or test
– Description: Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered in order for implementation to be successful.

• Phase I_screen (Diagnostic validity)
– Purpose: Early diagnostic validity testing in a selected sample and refinement of tool
– Description: The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects in order to make the tool as efficient (brief) as possible whilst retaining its value.

• Phase II_screen (Diagnostic validity)
– Purpose: Diagnostic validity in a representative sample
– Description: The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample where the comparator subjects may comprise several competing conditions which may otherwise cause difficulty regarding differential diagnosis.

• Phase III_screen (Implementation)
– Purpose: Screening RCT; clinicians using vs not using a screening tool
– Description: This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool.

• Phase IV_screen (Implementation)
– Purpose: Screening implementation studies using real-world outcomes
– Description: In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.

Theory of Diagnostic Tests

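[Figure: overlapping distributions of test results for Non-Depressed and Depressed individuals (x-axis: test result; y-axis: number of individuals). A cut-off value divides the curves into true -ve, false -ve, false +ve and true +ve regions.]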
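The effect of moving the cut-off can be sketched numerically. The example below is a minimal illustration, assuming two normally distributed score populations with made-up means and standard deviations (these values are hypothetical and not taken from any study in this deck). Scores at or above the cut-off count as test positive.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative values only).
non_depressed = NormalDist(mu=5.0, sigma=3.0)
depressed = NormalDist(mu=12.0, sigma=3.0)


def accuracy_at_cutoff(cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are test positive."""
    sensitivity = 1.0 - depressed.cdf(cutoff)  # depressed scoring at/above cut-off
    specificity = non_depressed.cdf(cutoff)    # non-depressed scoring below cut-off
    return sensitivity, specificity


for cutoff in (6, 8, 10, 12):
    se, sp = accuracy_at_cutoff(cutoff)
    print(f"cut-off {cutoff:>2}: Se = {se:.2f}, Sp = {sp:.2f}")
```

Raising the cut-off trades false positives for false negatives: specificity rises while sensitivity falls.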

Low Prevalence (Se Sp = same)

[Figure: the same plot for Non-Depressed vs Mj Depression (x-axis: test result; y-axis: number of individuals) with a cut-off value marked. False +ve: LARGE. False –ve: SMALL.]

High Prevalence (Se Sp = same)

[Figure: the same plot for Non-Depressed vs Mj+Mn Depression (x-axis: test result; y-axis: number of individuals) with a cut-off value marked. False +ve: SMALL. False –ve: LARGE.]
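A rough numerical sketch of the same point: with sensitivity and specificity held constant, the balance of false positives and false negatives shifts with prevalence. The sensitivity, specificity, sample size and prevalences below are hypothetical values chosen only for illustration.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected 2x2 cell counts when n people are tested."""
    cases = n * prevalence
    non_cases = n - cases
    tp = cases * sensitivity
    fn = cases - tp
    tn = non_cases * specificity
    fp = non_cases - tn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical test with Se = Sp = 0.80 applied to 1000 people.
low = expected_counts(1000, prevalence=0.10, sensitivity=0.80, specificity=0.80)
high = expected_counts(1000, prevalence=0.70, sensitivity=0.80, specificity=0.80)

for label, counts in (("low prevalence", low), ("high prevalence", high)):
    print(f"{label}: false +ve = {counts['FP']:.0f}, false -ve = {counts['FN']:.0f}")
```

At low prevalence false positives dominate; at high prevalence false negatives dominate, even though the test itself is unchanged.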

Accuracy 2x2 Table

         | Depression PRESENT | Depression ABSENT |
Test +ve | True +ve           | False +ve         | PPV
Test -ve | False -ve          | True -ve          | NPV
         | Sensitivity        | Specificity       | Prevalence

         | Reference Standard Disorder Present | Reference Standard No Disorder |
Test +ve | A                                   | B                              | PPV = A / (A + B)
Test -ve | C                                   | D                              | NPV = D / (C + D)
Total    | Sn = A / (A + C)                    | Sp = D / (B + D)               |

Can This Help establish a syndrome?

Example: A Clear Disease [#1]

[Figure: distributions of test results for Disorder and No Disorder (x-axis: test result; y-axis: number of individuals), with a Point of Partial Rarity marked between them and regions labelled true -ve, false +ve, false -ve and true +ve.]

Example: A Probable Syndrome [#2]

[Figure: distributions for Disorder and No Disorder (x-axis: MMSE Cognitive Score), with regions labelled true -ve, false +ve, false -ve and true +ve.]

Example: A Normally Distributed Trait [#3]

[Figure: distributions for Disorder and No Disorder (x-axis: MMSE Cognitive Score), with regions labelled true -ve, false +ve, false -ve and true +ve.]

Example: Dementia

Disease? Syndrome? Trait?

[Figure: MMSE scores for dementia (n=72) and non-dementia (n=2735).]

Huppert et al (2005) BMC Geriatrics

Example: Depression

Disease, Syndrome, Trait

Mitchell, Coyne et al (2008)

[Figure: Scores on the CES-D during Pregnancy, 3 and 12 months Post-partum in 947 Women. Series: Early Pregnancy, 3 months Post-Partum, 12 months Post-Partum. Bands: Healthy, Mild Depression, Depressive Symptoms, Moderate to Severe Depression. Axis range 0 to 110.]

PHQ9 Linear distribution

[Figure: distribution of PHQ9 scores from Zero to Eighteen (axis range 0 to 35). Series: PHQ9 (Major Depression), PHQ9 (Minor Depression), PHQ9 (Non-Depressed).]

Baker-Glen, Mitchell et al (2008)

Thompson et al (2001) n=18,414

[Figure: distribution of scores from Zero to Eighteen (axis range 0 to 3000).]

Statistics of diagnostic tests

Basic Measures of Accuracy

(a = true +ve, b = false +ve, c = false –ve, d = true –ve)

• Sensitivity (Se) a/(a + c)
– A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive
• Specificity (Sp) d/(b + d)
– A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative
• Positive Predictive Value (PPV) a/(a + b)
– A measure of rule-in accuracy defined as the proportion of true positives among those who screen positive
• Negative Predictive Value (NPV) d/(c + d)
– A measure of rule-out accuracy defined as the proportion of true negatives among those who screen negative
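As a minimal sketch, the four basic measures can be computed directly from the cells of the 2x2 table. The function name and the example counts are hypothetical; the cell labels follow the 2x2 convention a = true +ve, b = false +ve, c = false –ve, d = true –ve, so NPV is taken over the test-negative row, d / (c + d).

```python
def basic_measures(a, b, c, d):
    """Basic accuracy measures from 2x2 cell counts.

    a = true +ve, b = false +ve, c = false -ve, d = true -ve.
    """
    return {
        "sensitivity": a / (a + c),   # positives among those with disease
        "specificity": d / (b + d),   # negatives among those without disease
        "ppv": a / (a + b),           # true positives among test positives
        "npv": d / (c + d),           # true negatives among test negatives
        "prevalence": (a + c) / (a + b + c + d),
    }


# Hypothetical counts, for illustration only.
measures = basic_measures(a=80, b=30, c=20, d=170)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```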

Summary Measures

• Youden's J
– Sensitivity + Specificity – 1

• Predictive Summary Index (PSI)
– PPV + NPV – 1

• Overall accuracy
– (TP + TN) / (TP + FP + TN + FN)
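The three summary measures can be sketched in a few lines. The function name and the counts are hypothetical and serve only to show the arithmetic.

```python
def summary_measures(tp, fp, fn, tn):
    """Youden's J, Predictive Summary Index and overall accuracy from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "youden_j": sensitivity + specificity - 1,
        "psi": ppv + npv - 1,
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
summary = summary_measures(tp=80, fp=30, fn=20, tn=170)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```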

Reciprocal Measures

• Number Needed to Diagnose (NND)
– 1 / (Youden's J)

• Number Needed to Predict (NNP)
– 1 / (PSI)

• Number Needed to Screen (NNS)
– 1 / (FC – FiC), where FC is the fraction correctly identified and FiC the fraction incorrectly identified

Murphy JM, Berwick DM, Weinstein MC, Borus JF, Budman SH, Klerman GL (1987). Performance of screening and diagnostic tests: application of receiver operating characteristic (ROC) analysis. Arch Gen Psychiatry 44:550-555.
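A minimal sketch of the reciprocal measures, assuming FC is the fraction correctly identified, (TP + TN) / N, and FiC the fraction incorrectly identified, (FP + FN) / N. The function name and counts are hypothetical. Note that each reciprocal is undefined when its denominator is zero and becomes negative when the test does worse than chance.

```python
def reciprocal_measures(tp, fp, fn, tn):
    """NND, NNP and NNS from 2x2 counts (denominators must be non-zero)."""
    n = tp + fp + fn + tn
    youden_j = tp / (tp + fn) + tn / (tn + fp) - 1
    psi = tp / (tp + fp) + tn / (tn + fn) - 1
    fraction_correct = (tp + tn) / n
    fraction_incorrect = (fp + fn) / n
    return {
        "nnd": 1 / youden_j,
        "nnp": 1 / psi,
        "nns": 1 / (fraction_correct - fraction_incorrect),
    }


# Hypothetical counts, for illustration only.
reciprocals = reciprocal_measures(tp=80, fp=30, fn=20, tn=170)
for name, value in reciprocals.items():
    print(f"{name}: {value:.2f}")
```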

Test vs Major Depression

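         | Depression PRESENT | Depression ABSENT | Total
Test +ve | 500                | 1500              | 2000
Test -ve | 500                | 4500              | 5000
Total    | 1000               | 6000              | 7000

Sensitivity 50% (500/1000)
Specificity 75% (4500/6000)
PPV 25% (500/2000)
NPV 90% (4500/5000)
Prevalence 14% (1000/7000)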

Test vs Major + Min Depression

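         | Depression PRESENT | Depression ABSENT | Total
Test +ve | 500                | 1500              | 2000
Test -ve | 500                | 500               | 1000
Total    | 1000               | 2000              | 3000

Sensitivity 50% (500/1000)
Specificity 25% (500/2000)
PPV 25% (500/2000)
NPV 50% (500/1000)
Prevalence 33% (1000/3000)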

Added Value

• Definition 1:
– The additional ability of a test to rule-in or rule-out compared with the baseline rate
– PPV minus prevalence
– NPV minus prevalence

• Definition 2:
– The additional ability of a test to rule-in or rule-out compared with the unassisted rate
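Definition 1 can be sketched as follows. One interpretation is assumed here and should be checked against the source: the slide writes the rule-out gain as "NPV minus prevalence", whereas this sketch takes the rule-out baseline to be the proportion without the disorder (1 – prevalence). That reading is an assumption, not something the slide states. The example uses the PPV of 0.41 and NPV of 0.97 at a prevalence of 0.15 quoted elsewhere in this deck; the function name is hypothetical.

```python
def added_value(ppv, npv, prevalence):
    """Return (rule-in added value, rule-out added value) under Definition 1."""
    rule_in = ppv - prevalence           # gain over the base rate of cases
    # ASSUMPTION: rule-out baseline is the base rate of non-cases (1 - prevalence).
    rule_out = npv - (1.0 - prevalence)
    return rule_in, rule_out


rule_in, rule_out = added_value(ppv=0.41, npv=0.97, prevalence=0.15)
print(f"rule-in added value:  {rule_in:.2f}")
print(f"rule-out added value: {rule_out:.2f}")
```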

[Figure: proportion (0.00 to 1.00) of cases with each symptom. Series: All Case Proportion, Depressed Proportion, Non-Depressed Proportion. Symptoms shown: Loss of energy, Diminished drive, Sleep disturbance, Concentration/indecision, Depressed mood, Anxiety, Diminished concentration, Insomnia, Diminished interest/pleasure, Psychic anxiety, Helplessness, Worthlessness, Hopelessness, Somatic anxiety, Thoughts of death, Anger, Excessive guilt, Psychomotor change, Indecisiveness, Decreased appetite, Psychomotor agitation, Psychomotor retardation, Decreased weight, Lack of reactive mood, Increased appetite, Hypersomnia, Increased weight.]

Mitchell, Zimmerman et al MIDAS Database. Psychol Med 2007 Submitted

[Figure: added value (-0.10 to 0.50) of each symptom. Series: Rule-In Added Value (PPV-Prev), Rule-Out Added Value (NPV-Prev). Symptoms shown: Anger, Anxiety, Decreased appetite, Decreased weight, Depressed mood, Diminished concentration, Diminished drive, Diminished interest/pleasure, Excessive guilt, Helplessness, Hopelessness, Hypersomnia, Increased appetite, Increased weight, Indecisiveness, Insomnia, Lack of reactive mood, Loss of energy, Psychic anxiety, Psychomotor agitation, Psychomotor change, Psychomotor retardation, Sleep disturbance, Somatic anxiety, Thoughts of death, Worthlessness.]

Accuracy of Tests: Visual

[Figure: probability ranges plotted on a 0% to 100% scale (0% very unlikely, 25% unlikely, 75% likely, 100% very likely).]

Rows shown: 2 Questions; Overall; PHQ-2; WHO5 (1+3); 1 Question

Values shown:
• 3% - (37) - 63% = 60%
• 3% - (16) - 32% = 29%
• 3% - (16) - 32% = 29%
• 10% - (22) - 50% = 54%
• 32% - (37) - 96% = 64%

Sources and reference standards:
• Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci: CIDI (computer) Any Depression
• Arroll B et al (2003) BMJ: CIDI (computer) Mj Depression

[Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Series: Clinician Positive (Fallowfield et al, 2001), Clinician Negative (Fallowfield et al, 2001), Baseline Probability, HADS-D Positive (Meta-analysis), HADS-D Negative (Meta-analysis).]

[Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Series: Depression Present (Routine), Depression Absent (Routine), Depression Scales +ve (Median), Depression Scales -ve (Median), Prior Probability. Annotated values: PPV = 0.41 and NPV = 0.97 at a prevalence of 0.15.]
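Curves of this kind follow from Bayes' theorem. The sketch below shows how post-test probability depends on pre-test probability for a fixed sensitivity and specificity. The values Se = 0.80 and Sp = 0.80 are hypothetical and are not the figures behind the plots above.

```python
def post_test_probabilities(pre_test, sensitivity, specificity):
    """Return (P(disorder | test +ve), P(disorder | test -ve)) via Bayes' theorem."""
    p_positive = sensitivity * pre_test + (1 - specificity) * (1 - pre_test)
    p_negative = (1 - sensitivity) * pre_test + specificity * (1 - pre_test)
    after_positive = sensitivity * pre_test / p_positive
    after_negative = (1 - sensitivity) * pre_test / p_negative
    return after_positive, after_negative


# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.80
SPECIFICITY = 0.80

for pre_test in (0.1, 0.3, 0.5, 0.7, 0.9):
    pos, neg = post_test_probabilities(pre_test, SENSITIVITY, SPECIFICITY)
    print(f"pre-test {pre_test:.1f}: after +ve = {pos:.2f}, after -ve = {neg:.2f}")
```

The probability after a positive test is the PPV at that prevalence, and one minus the probability after a negative test is the NPV.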

Worked Examples of diagnostic tests

PostStroke Mj Depression vs NonMj

• Clinicians' diagnosis using DSMIV vs SCAN/PSE

• Using the SCAN:
– 50 people with major depression
– 150 healthy people
– 50 with minor depression

Clinicians using DSMIV
• Clinicians diagnosed 52 cases with Mj depression
• The specificity of DSMIV was 95%

• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. What was the % correctly identified per every 100 screened?

Test vs Major Depression

                     | Depression on SCAN | Depression ABSENT | Total
Test +ve (Clinician) | ?                  | ?                 | 52
Test -ve             | ?                  | ?                 | ?
Total                | 50                 | 200               |

Sensitivity ??%
Specificity 95%
PPV ??%
NPV ??%
Prevalence ??%
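One way to work through the questions is to fill in the 2x2 table from the figures given (50 with major depression on SCAN, 150 healthy plus 50 with minor depression as non-cases, 52 clinician positives, specificity 95%). The sketch below is only a worked calculation under those stated figures; the variable names are illustrative.

```python
major = 50                   # major depression on SCAN (reference standard)
non_major = 150 + 50         # healthy plus minor depression
clinician_positive = 52      # cases diagnosed by clinicians
specificity = 0.95

tn = round(specificity * non_major)   # non-cases correctly called negative
fp = non_major - tn                   # non-cases called positive
tp = clinician_positive - fp          # remaining positives are true positives
fn = major - tp                       # cases missed by clinicians
total = major + non_major

sensitivity = tp / major
prevalence = major / total
ppv = tp / clinician_positive
npv = tn / (tn + fn)
correct_per_100 = 100 * (tp + tn) / total

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Sensitivity {sensitivity:.0%}, Prevalence {prevalence:.0%}")
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")
print(f"Correctly identified per 100 screened: {correct_per_100:.1f}")
```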

Symptoms | Post-Stroke Depression by reference standard | Post-Stroke Depression with symptom | Sensitivity | No Post-Stroke Depression by reference standard | Non-Depressed Stroke Patient without symptom | Specificity | PPV | NPV | Positive Utility Index | Negative Utility Index | Identification Index | NNS | NND | NNP
Persistent low mood | 50 | 45 | 0.90 | 200 | 184 | 0.92 | 0.74 | 0.97 | 0.66 | 0.90 | 83.20 | 1.20 | 1.22 | 1.41
Loss of interest | 50 | 48 | 0.96 | 200 | 156 | 0.78 | 0.52 | 0.99 | 0.50 | 0.77 | 63.20 | 1.58 | 1.35 | 1.96
Loss of drive | 50 | 40 | 0.80 | 200 | 120 | 0.60 | 0.33 | 0.92 | 0.27 | 0.55 | 28 | 3.57 | 2.50 | 3.90
Low energy | 50 | 49 | 0.98 | 200 | 20 | 0.10 | 0.21 | 0.95 | 0.21 | 0.10 | -44.80 | -2.23 | 12.50 | 6.01
Insomnia | 50 | 35 | 0.70 | 200 | 136 | 0.68 | 0.35 | 0.90 | 0.25 | 0.61 | 36.80 | 2.72 | 2.63 | 3.93
Poor appetite | 50 | 25 | 0.50 | 200 | 178 | 0.89 | 0.53 | 0.88 | 0.27 | 0.78 | 62.40 | 1.60 | 2.56 | 2.45
Suicidal thoughts | 50 | 2 | 0.04 | 200 | 196 | 0.98 | 0.33 | 0.80 | 0.01 | 0.79 | 58.40 | 1.71 | 50 | 7.32
Poor concentration | 50 | 28 | 0.56 | 200 | 114 | 0.57 | 0.25 | 0.84 | 0.14 | 0.48 | 13.60 | 7.35 | 7.69 | 11.93
Poor orientation | 50 | 10 | 0.20 | 200 | 164 | 0.82 | 0.22 | 0.80 | 0.04 | 0.66 | 39.20 | 2.55 | 50 | 46.92
Anger | 50 | 17 | 0.34 | 200 | 172 | 0.86 | 0.38 | 0.84 | 0.13 | 0.72 | 51.20 | 1.95 | 5 | 4.61
DSMIV algorithm | 50 | 42 | 0.84 | 200 | 190 | 0.95 | 0.81 | 0.96 | 0.68 | 0.91 | 85.60 | 1.17 | 1.27 | 1.30
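The deck does not spell out how the utility indices are calculated. The sketch below assumes Positive Utility Index = sensitivity x PPV and Negative Utility Index = specificity x NPV, which reproduces the tabulated values for "Persistent low mood" (45 of 50 depressed with the symptom, 184 of 200 non-depressed without it). Treat the formulas as an assumption; the function name is hypothetical.

```python
def utility_indices(tp, fn, tn, fp):
    """Assumed definitions: PUI = Se x PPV, NUI = Sp x NPV."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return sensitivity * ppv, specificity * npv


# "Persistent low mood" row: 45/50 depressed have the symptom,
# 184/200 non-depressed do not.
pui, nui = utility_indices(tp=45, fn=5, tn=184, fp=16)
print(f"Positive Utility Index: {pui:.2f}")  # table reports 0.66
print(f"Negative Utility Index: {nui:.2f}")  # table reports 0.90
```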

Advanced Techniques

• sROC
• Real World Numbers
• NND; NNS
• Economics

Measure | Basic Formula | Strength | Weakness | Reciprocal Absolute Benefit | Reciprocal Absolute Benefit Formula

• Youden Index
– Basic formula: sensitivity + specificity – 1
– Strength: Relatively independent of prevalence
– Weakness: Not clinically interpretable; requires application of criterion (gold) standard; does not assess ratio of false positives to negatives
– Reciprocal absolute benefit: Number Needed to Diagnose, NND = 1/Youden

• Predictive Summary Index
– Basic formula: PPV + NPV – 1
– Strength: Measures gain; clinically applicable
– Weakness: Dependent on prevalence; places equal weight on rule-in and rule-out accuracy
– Reciprocal absolute benefit: Number Needed to Predict, NNP = 1/PSI

• Overall Accuracy (Fraction Correct)
– Basic formula: (TP + TN) / (TP + FP + TN + FN)
– Strength: Measures real number of correct identifications vs misidentifications; can be easily converted into a percentage
– Weakness: Requires application of criterion (gold) standard
– Reciprocal absolute benefit: Number Needed to Screen, NNS = 1/Identification Index

PPV DT Distress = 55%; PPV Other Methods 65%

ROC Plot

[Figure: ROC plot of sensitivity (y-axis, 0.00 to 1.00) against 1 - specificity (x-axis, 0.00 to 1.00). Points shown: Low Mood; DSMIV; Low mood & loss of interest.]
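Each test with a single cut-off contributes one point to an ROC plot, at (1 – specificity, sensitivity). As a rough sketch, the points for low mood (Se 0.90, Sp 0.92) and the DSMIV algorithm (Se 0.84, Sp 0.95), taken from the post-stroke symptom table in this deck, can be computed as below. The single-point area estimate, which joins (0, 0), the point and (1, 1) with straight lines, is an illustrative simplification and not a method used in the deck.

```python
def roc_point(sensitivity, specificity):
    """Coordinates of a test on an ROC plot: (false positive rate, sensitivity)."""
    return 1 - specificity, sensitivity


def single_point_auc(sensitivity, specificity):
    """Area under the two straight segments (0,0) -> point -> (1,1)."""
    fpr, tpr = roc_point(sensitivity, specificity)
    left = 0.5 * fpr * tpr                  # triangle from the origin to the point
    right = 0.5 * (1 - fpr) * (tpr + 1)     # trapezoid from the point to (1, 1)
    return left + right                     # simplifies to (Se + Sp) / 2


tests = {"Low mood": (0.90, 0.92), "DSMIV": (0.84, 0.95)}
for name, (se, sp) in tests.items():
    x, y = roc_point(se, sp)
    print(f"{name}: point = ({x:.2f}, {y:.2f}), area = {single_point_auc(se, sp):.3f}")
```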
