MRCPsych 2008: How to Analyse Diagnostic Test Studies (June 2008)
DESCRIPTION
This is an educational presentation on the science of diagnostic tests, using examples from psychiatry. It was first presented for the MRCPsych (Royal College of Psychiatrists, UK) in June 2008 and was updated in 2009.
TRANSCRIPT
Critical Appraisal of Diagnostic Test Studies
Alex J Mitchell
Consultant in Liaison Psychiatry
University of Leicester
MRCPsych Teaching 2008
Contents
• Importance of understanding diagnostic tests
• Statistics of diagnostic validity
• Examples
Importance of understanding diagnostic tests
What Is a Diagnostic Test in Psychiatry?
• CT/MRI
• CSF
• Blood tests, e.g. TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological testing
• MMSE
• HADS/BDI/CES-D?
• Clinical judgement
• Self-report
Why Is a HADS score not a diagnosis?
1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if items are missing?
6. Imprecise
Defining Diagnostic Testing
• INTENTION
• Screening
– The systematic application of a test or inquiry to identify individuals at sufficient risk of a specific disorder to warrant further action, among those who have not sought medical help for that disorder
• Case-Finding
– The selected application of a test or inquiry to identify individuals with a suspected disorder and exclude those without the disorder, usually in those who have sought medical help for that disorder
• APPLICATION
• Targeted (High Risk)
– The highly selected application of a test or inquiry to identify individuals at high risk of a specific disorder by virtue of known risk factors
• Routine Screening
– The systematic application of a test or inquiry to individuals without a known disorder (or who have not sought medical help for that disorder)
Adapted from Department of Health. Annual report of the national screening committee. London: DoH, 1997.
Aims of Detection
• Screening
– Short and easy; some false positives are acceptable (lower specificity and PPV) but few false negatives (high sensitivity and NPV)
• Diagnosis (case-finding)
– Accurate, with few false positives or false negatives
• Rating
– Simple, patient-rated, and correlated with quality of life (QoL) and other outcomes
UK National Screening Committee Guidelines
• The condition should:
– Be an important health issue
– Have a well-understood natural history, with a detectable risk factor or disease marker
– Have had cost-effective primary prevention measures implemented
• The screening tool should:
– Be a valid tool with a known cut-off
– Be acceptable to the public
– Have agreed diagnostic procedures
• The treatment should:
– Be effective, with evidence of benefits of early intervention
– Have adequate resources
– Have appropriate policies as to who should be treated
• The screening programme should:
– Show evidence that the benefits of screening outweigh the risks
– Be acceptable to the public and professionals
– Be cost-effective (and have ongoing evaluation)
– Have quality-assurance strategies in place
Adapted from: UK National Screening Committee. Criteria for appraising the viability, effectiveness and appropriateness of a screening programme.
• http://www.nsc.nhs.uk/pdfs/criteria.pdf
• Pre-clinical (Development): Development of the proposed tool or test
– Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. The acceptability of the tool to both patients and staff must be considered if implementation is to be successful.
• Phase I_screen (Diagnostic validity): Early diagnostic validity testing in a selected sample and refinement of the tool
– The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard, the criterion reference. In early testing the tool may be refined, keeping the most useful aspects and deleting redundant ones, to make the tool as efficient (brief) as possible while retaining its value.
• Phase II_screen (Diagnostic validity): Diagnostic validity in a representative sample
– The aim is to assess the refined tool against a criterion (gold) standard in a real-world sample, where the comparison subjects may include several competing conditions that could otherwise make differential diagnosis difficult.
• Phase III_screen (Implementation): Screening RCT; clinicians using vs not using a screening tool
– This is an important step in which the tool is evaluated clinically in one group with access to the new method, compared with a second group (ideally allocated at random) who make assessments without the tool.
• Phase IV_screen (Implementation): Screening implementation studies using real-world outcomes
– In this last step the screening tool/method is introduced clinically but monitored to discover its effect on important patient outcomes, such as new identifications, new cases treated and new cases entering remission.
Theory of Diagnostic Tests
[Figure: overlapping distributions of test results (x-axis) against number of individuals (y-axis) for non-depressed and depressed groups. A cut-off value divides them into true negatives, false positives, false negatives and true positives.]
Low Prevalence (Se and Sp unchanged)
[Figure: non-depressed vs major depression; false positives LARGE, false negatives SMALL.]
High Prevalence (Se and Sp unchanged)
[Figure: non-depressed vs major + minor depression; false positives SMALL, false negatives LARGE.]
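The prevalence effect in these figures can be checked with expected counts. The sketch below assumes a hypothetical test whose sensitivity and specificity are both fixed at 0.80 (illustrative values, not taken from the slides). It applies that test to 1000 people at three different prevalences.

```python
def expected_errors(n, prevalence, sensitivity, specificity):
    """Expected false positives and false negatives when a test with fixed
    sensitivity and specificity is applied to n people."""
    cases = n * prevalence
    non_cases = n - cases
    false_negatives = cases * (1 - sensitivity)       # cases the test misses
    false_positives = non_cases * (1 - specificity)   # non-cases the test flags
    return false_positives, false_negatives


# Hypothetical test: sensitivity 0.80 and specificity 0.80 at every prevalence.
for prevalence in (0.05, 0.50, 0.80):
    fp, fn = expected_errors(1000, prevalence, 0.80, 0.80)
    print(f"prevalence {prevalence:.2f}: false +ve {fp:.0f}, false -ve {fn:.0f}")
```

Under these assumptions, a prevalence of 0.05 yields about 190 false positives but only 10 false negatives. At 0.80 the pattern reverses, giving about 40 false positives and 160 false negatives, even though sensitivity and specificity never change.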
Accuracy 2x2 Table
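Test result | Depression PRESENT | Depression ABSENT | Predictive value
Test +ve | True +ve | False +ve | PPV
Test -ve | False -ve | True -ve | NPV
 | Sensitivity | Specificity | Prevalence

Test result | Reference standard: disorder present | Reference standard: no disorder | Predictive value
Test +ve | A | B | PPV = A / (A + B)
Test -ve | C | D | NPV = D / (C + D)
 | Sn = A / (A + C) | Sp = D / (B + D) | Total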
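The formulas in the lettered table translate directly into code. This is a minimal sketch that uses the cell letters from the table (the class name TwoByTwo is hypothetical). The usage lines plug in the counts from the "Test vs Major Depression" example later in this talk.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts laid out as in the 2x2 accuracy table."""
    a: int  # test +ve, disorder present (true +ve)
    b: int  # test +ve, no disorder (false +ve)
    c: int  # test -ve, disorder present (false -ve)
    d: int  # test -ve, no disorder (true -ve)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        return self.d / (self.c + self.d)

    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)


# Counts from the "Test vs Major Depression" example.
major = TwoByTwo(a=500, b=1500, c=500, d=4500)
for measure in ("sensitivity", "specificity", "ppv", "npv", "prevalence"):
    print(f"{measure}: {getattr(major, measure)():.2f}")
```

This prints sensitivity 0.50, specificity 0.75, ppv 0.25, npv 0.90 and prevalence 0.14. The PPV for those counts is 500/2000 = 0.25.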
Can This Help Establish a Syndrome?
Example: A Clear Disease [#1]
[Figure: number of individuals by test result for disorder and no disorder, with true/false positives and negatives marked either side of a "point of partial rarity".]
Example: A Probable Syndrome [#2]
[Figure: number of individuals by MMSE cognitive score for disorder and no disorder, with true/false positives and negatives marked.]
Example: A Normally Distributed Trait [#3]
[Figure: number of individuals by MMSE cognitive score for disorder and no disorder, with true/false positives and negatives marked.]
Example: Dementia
Disease? Syndrome? Trait?
[Figure: MMSE scores for dementia (n=72) and non-dementia (n=2735).]
Huppert et al (2005) BMC Geriatrics
Example: Depression
Disease / Syndrome / Trait
Mitchell, Coyne et al (2008)
[Figure: scores on the CES-D during early pregnancy, 3 months post-partum and 12 months post-partum in 947 women, grouped as healthy, depressive symptoms, mild depression and moderate to severe depression (y-axis 0–110).]
PHQ-9 Linear Distribution
[Figure: distribution of PHQ-9 scores from 0 to 18 (y-axis 0–35) for major depression, minor depression and non-depressed groups.]
Baker-Glen, Mitchell et al (2008)
Thompson et al (2001), n=18,414
[Figure: distribution of scores from 0 to 18 (y-axis 0–3000).]
Statistics of diagnostic tests
Basic Measures of Accuracy (a = true +ve, b = false +ve, c = false -ve, d = true -ve)
• Sensitivity (Se) a/(a + c)
– A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive
• Specificity (Sp) d/(b + d)
– A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative
• Positive Predictive Value (PPV) a/(a + b)
– A measure of rule-in accuracy defined as the proportion of true positives among those with a positive screening result
• Negative Predictive Value (NPV) d/(c + d)
– A measure of rule-out accuracy defined as the proportion of true negatives among those with a negative screening result
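In a validation study, the cells a, b, c and d are counted from paired results, where each patient has both a test result and a reference-standard result. The following sketches that tally. The function name is hypothetical and the data for eight patients are made up for illustration.

```python
def tally(test_positive, disorder_present):
    """Count a, b, c, d from paired per-patient results.

    test_positive[i] is True if patient i screened positive;
    disorder_present[i] is True if the reference standard found the disorder.
    """
    if len(test_positive) != len(disorder_present):
        raise ValueError("each patient needs a test result and a reference result")
    a = b = c = d = 0
    for test, disorder in zip(test_positive, disorder_present):
        if test and disorder:
            a += 1  # true positive
        elif test:
            b += 1  # false positive
        elif disorder:
            c += 1  # false negative
        else:
            d += 1  # true negative
    return a, b, c, d


# Made-up results for eight patients, for illustration only.
test = [True, True, False, True, False, False, True, False]
reference = [True, False, True, True, False, False, False, False]
a, b, c, d = tally(test, reference)
print("Se", a / (a + c), "Sp", d / (b + d), "PPV", a / (a + b), "NPV", d / (c + d))
```

For these data the tally is a=2, b=2, c=1, d=3. That gives Se of about 0.67, Sp 0.60, PPV 0.50 and NPV 0.75.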
Summary Measures
• Youden's J – Sensitivity + Specificity - 1
• Predictive Summary Index (PSI) – PPV + NPV - 1
• Overall accuracy – (TP + TN) / (TP + FP + TN + FN)
Reciprocal Measures
• Number Needed to Diagnose (NND) – 1 / (Youden's J)
• Number Needed to Predict (NNP) – 1 / (PSI)
• Number Needed to Screen (NNS) – 1 / (FC - FiC), where FC is the fraction correct (overall accuracy) and FiC the fraction incorrect
Murphy JM, Berwick DM, Weinstein MC, Borus JF, Budman SH, Klerman GL 1987 : Performance of screening and diagnostic tests: Application of Receiver Operating Characteristic ROC analysis. Arch Gen Psychiatry 44:550-555
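The summary and reciprocal measures can be computed together from the four cells. This sketch is illustrative and the function name is hypothetical. It returns infinity when Youden's J, the PSI or FC - FiC is exactly zero, which is a design choice for a test with no net benefit. The example counts come from the DSMIV algorithm row of the post-stroke table later in the talk (TP 42, FP 10, FN 8, TN 190).

```python
import math


def _reciprocal(x):
    """1/x, or infinity when x is zero (a test with no net benefit)."""
    return math.inf if x == 0 else 1 / x


def summary_measures(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    se = tp / (tp + fn)
    sp = tn / (fp + tn)
    ppv = tp / (tp + fp)
    npv = tn / (fn + tn)
    youden = se + sp - 1
    psi = ppv + npv - 1                  # predictive summary index
    fraction_correct = (tp + tn) / n     # overall accuracy
    fraction_incorrect = (fp + fn) / n
    return {
        "youden": youden,
        "psi": psi,
        "overall_accuracy": fraction_correct,
        "nnd": _reciprocal(youden),
        "nnp": _reciprocal(psi),
        "nns": _reciprocal(fraction_correct - fraction_incorrect),
    }


# DSMIV algorithm row of the post-stroke table: TP 42, FP 10, FN 8, TN 190.
for name, value in summary_measures(42, 10, 8, 190).items():
    print(f"{name}: {value:.3f}")
```

These reproduce the NND of 1.27, NNP of 1.30 and NNS of 1.17 shown for the DSMIV algorithm in that table. NNS turns negative when a test misclassifies more people than it classifies correctly, as with low energy (-2.23).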
Test vs Major Depression
Test result | Depression PRESENT | Depression ABSENT | Total
Test +ve | 500 | 1500 | 2000
Test -ve | 500 | 4500 | 5000
Total | 1000 | 6000 | 7000
Sensitivity 50%
Specificity 75%
PPV 25%
NPV 90%
Prevalence 14%
Test vs Major + Minor Depression
Test result | Depression PRESENT | Depression ABSENT | Total
Test +ve | 500 | 1500 | 2000
Test -ve | 500 | 500 | 1000
Total | 1000 | 2000 | 3000
Sensitivity 50%
Specificity 25%
PPV 25%
NPV 50%
Prevalence 33%
Added Value
• Definition 1:
– The additional ability of a test to rule in or rule out compared with the baseline rate
– Rule-in: PPV minus prevalence
– Rule-out: NPV minus the baseline rate of non-cases (1 - prevalence)
• Definition 2:
– The additional ability of a test to rule in or rule out compared with the unassisted rate (the accuracy achieved without the test)
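Added value is a subtraction against a baseline, so it is simple to compute. In this sketch the function name is hypothetical. It also assumes the rule-out baseline is 1 - prevalence, the pre-test probability that a person does not have the disorder. The inputs are the values annotated on the conditional probability plot later in the talk (PPV 0.41, NPV 0.97, prevalence 0.15).

```python
def added_value(ppv, npv, prevalence):
    """Rule-in and rule-out added value over the baseline (pre-test) rates.

    Assumption: the rule-out baseline is 1 - prevalence, the pre-test
    probability that a person does NOT have the disorder.
    """
    rule_in = ppv - prevalence
    rule_out = npv - (1 - prevalence)
    return rule_in, rule_out


# Values annotated on the conditional probability plot: PPV 0.41, NPV 0.97,
# prevalence 0.15.
rule_in, rule_out = added_value(0.41, 0.97, 0.15)
print(f"rule-in added value {rule_in:.2f}, rule-out added value {rule_out:.2f}")
```

This prints a rule-in added value of 0.26 and a rule-out added value of 0.12. An uninformative test, whose PPV equals the prevalence and whose NPV equals 1 - prevalence, scores zero on both.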
[Figure: proportion (y-axis 0–1) of all cases, depressed patients and non-depressed patients reporting each symptom: loss of energy, diminished drive, sleep disturbance, concentration/indecision, depressed mood, anxiety, diminished concentration, insomnia, diminished interest/pleasure, psychic anxiety, helplessness, worthlessness, hopelessness, somatic anxiety, thoughts of death, anger, excessive guilt, psychomotor change, indecisiveness, decreased appetite, psychomotor agitation, psychomotor retardation, decreased weight, lack of reactive mood, increased appetite, hypersomnia, increased weight.]
Mitchell, Zimmerman et al MIDAS Database. Psychol Med 2007 Submitted
[Figure: rule-in and rule-out added value (y-axis -0.10 to 0.50) for each of the same symptoms, from anger to worthlessness.]
Accuracy of Tests: Visual
[Figure: tests plotted on a 0% to 100% probability scale (very unlikely, unlikely, likely, very likely) for 1 question, 2 questions, PHQ-2, WHO5 (1+3) and overall. Values shown: 1 question 3% - (37) - 63% = 60%; 3% - (16) - 32% = 29% (twice); 10% - (22) - 50% = 54%; 32% - (37) - 96% = 64%.]
Reference standards: CIDI (computer) for any depression, Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci; CIDI (computer) for major depression, Arroll B et al (2003) BMJ.
[Figure: post-test probability against pre-test probability (both 0–1) for clinician-positive and clinician-negative assessments (Fallowfield et al, 2001) and for HADS-D positive and negative results (meta-analysis), with the baseline probability as reference.]
[Figure: post-test probability against pre-test probability (both 0–1) for routine clinical assessment (depression present / depression absent) and depression scales (median, positive / negative), with the prior probability as reference. Annotation at a prevalence of 0.15: PPV = 0.41, NPV = 0.97.]
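Plots of post-test against pre-test probability like these can be generated from Bayes' theorem in odds form, using likelihood ratios. The following is a minimal sketch with hypothetical function names. It assumes a hypothetical test with sensitivity and specificity of 0.80, which are not the data behind either figure.

```python
def post_test_probability(pre_test, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def conditional_probability_curves(sensitivity, specificity, steps=10):
    """Post-test probability after a positive and after a negative result,
    for pre-test probabilities 1/steps, 2/steps, ..., (steps - 1)/steps."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    curves = []
    for i in range(1, steps):
        pre = i / steps
        curves.append((pre,
                       post_test_probability(pre, lr_positive),
                       post_test_probability(pre, lr_negative)))
    return curves


# Hypothetical test with sensitivity 0.80 and specificity 0.80.
for pre, after_pos, after_neg in conditional_probability_curves(0.80, 0.80):
    print(f"pre-test {pre:.1f}: after +ve {after_pos:.2f}, after -ve {after_neg:.2f}")
```

At a pre-test probability of 0.15, this hypothetical test gives a post-test probability of about 0.41 after a positive result. After a negative result it gives about 0.04, which is an NPV of about 0.96.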
Worked Examples of diagnostic tests
Post-Stroke Major Depression vs Non-Major Depression
• Clinicians' diagnosis using DSM-IV vs SCAN/PSE
• Using the SCAN:
– 50 people with major depression
– 150 healthy people
– 50 with minor depression
Clinicians using DSM-IV
• Clinicians diagnosed 52 cases with major depression
• The specificity of DSM-IV was 95%
• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. How many people were correctly identified per 100 screened?
Test vs Major Depression
Test result | Depression PRESENT on SCAN | Depression ABSENT on SCAN | Total
Test +ve (clinician) | ?? | ?? | 52
Test -ve | ?? | ?? | ??
Total | 50 | 200 | 250
Sensitivity ??%
Specificity 95%
PPV ??%
NPV ??%
Prevalence ??%
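The questions can be answered by filling in the 2x2 table one cell at a time. This sketch assumes, as the "absent" column total of 200 implies, that the 50 people with minor depression count as "no major depression" for the reference standard. The function name is hypothetical.

```python
def solve_post_stroke_example():
    cases = 50                   # major depression on SCAN
    non_cases = 150 + 50         # healthy + minor depression = no major depression
    clinician_positive = 52      # clinician DSM-IV diagnoses of major depression
    specificity = 0.95

    true_negatives = round(specificity * non_cases)       # 190
    false_positives = non_cases - true_negatives           # 10
    true_positives = clinician_positive - false_positives  # 42
    false_negatives = cases - true_positives               # 8
    total = cases + non_cases                              # 250

    return {
        "true_positives": true_positives,
        "sensitivity": true_positives / cases,
        "prevalence": cases / total,
        "ppv": true_positives / clinician_positive,
        "npv": true_negatives / (true_negatives + false_negatives),
        # Correct identifications (TP + TN) per 100 people screened.
        "correct_per_100": 100 * (true_positives + true_negatives) / total,
    }


for name, value in solve_post_stroke_example().items():
    print(f"{name}: {value:.3g}")
```

This gives a sensitivity of 0.84, prevalence of 0.20, PPV of about 0.81 and NPV of about 0.96. It also gives 92.8 correct identifications per 100 screened, or 85.6 after subtracting the 7.2 misidentifications (the identification index). These match the DSMIV algorithm row of the symptom table that follows.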
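Symptoms | Post-stroke depression by reference standard | Post-stroke depression with symptom | Sensitivity | No post-stroke depression by reference standard | Non-depressed stroke patients without symptom | Specificity | PPV | NPV | Positive utility index | Negative utility index | Identification index | NNS | NND | NNP
Persistent low mood | 50 | 45 | 0.90 | 200 | 184 | 0.92 | 0.74 | 0.97 | 0.66 | 0.90 | 83.20 | 1.20 | 1.22 | 1.41
Loss of interest | 50 | 48 | 0.96 | 200 | 156 | 0.78 | 0.52 | 0.99 | 0.50 | 0.77 | 63.20 | 1.58 | 1.35 | 1.96
Loss of drive | 50 | 40 | 0.80 | 200 | 120 | 0.60 | 0.33 | 0.92 | 0.27 | 0.55 | 28.00 | 3.57 | 2.50 | 3.90
Low energy | 50 | 49 | 0.98 | 200 | 20 | 0.10 | 0.21 | 0.95 | 0.21 | 0.10 | -44.80 | -2.23 | 12.50 | 6.01
Insomnia | 50 | 35 | 0.70 | 200 | 136 | 0.68 | 0.35 | 0.90 | 0.25 | 0.61 | 36.80 | 2.72 | 2.63 | 3.93
Poor appetite | 50 | 25 | 0.50 | 200 | 178 | 0.89 | 0.53 | 0.88 | 0.27 | 0.78 | 62.40 | 1.60 | 2.56 | 2.45
Suicidal thoughts | 50 | 2 | 0.04 | 200 | 196 | 0.98 | 0.33 | 0.80 | 0.01 | 0.79 | 58.40 | 1.71 | 50 | 7.32
Poor concentration | 50 | 28 | 0.56 | 200 | 114 | 0.57 | 0.25 | 0.84 | 0.14 | 0.48 | 13.60 | 7.35 | 7.69 | 11.93
Poor orientation | 50 | 10 | 0.20 | 200 | 164 | 0.82 | 0.22 | 0.80 | 0.04 | 0.66 | 39.20 | 2.55 | 50 | 46.92
Anger | 50 | 17 | 0.34 | 200 | 172 | 0.86 | 0.38 | 0.84 | 0.13 | 0.72 | 51.20 | 1.95 | 5 | 4.61
DSMIV algorithm | 50 | 42 | 0.84 | 200 | 190 | 0.95 | 0.81 | 0.96 | 0.68 | 0.91 | 85.60 | 1.17 | 1.27 | 1.30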
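The positive and negative utility indices in this table are consistent with sensitivity × PPV and specificity × NPV respectively. For example, 0.84 × 0.81 ≈ 0.68 for the DSMIV algorithm. The sketch below recomputes them from cell counts for three rows. Treating the indices as these products is an inference from the table values, and the function name is hypothetical.

```python
def utility_indices(tp, fp, fn, tn):
    """Return (positive utility index, negative utility index),
    taken here as sensitivity x PPV and specificity x NPV."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    ppv = tp / (tp + fp)
    npv = tn / (fn + tn)
    return sensitivity * ppv, specificity * npv


# (true +ve, true -ve) from the table; 50 cases and 200 non-cases throughout.
rows = {
    "Persistent low mood": (45, 184),
    "Low energy": (49, 20),
    "DSMIV algorithm": (42, 190),
}
for symptom, (tp, tn) in rows.items():
    positive, negative = utility_indices(tp, 200 - tn, 50 - tp, tn)
    print(f"{symptom}: +UI {positive:.2f}, -UI {negative:.2f}")
```

Low energy shows why sensitivity alone misleads. Its sensitivity is near perfect (0.98), yet its positive utility index is only about 0.21, because so many non-depressed stroke patients also report it.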
Advanced Techniques
• sROC
• Real-world numbers
• NND; NNS
• Economics
Measure | Basic formula | Strengths | Weaknesses | Reciprocal absolute benefit | Reciprocal formula
Youden Index | Sensitivity + specificity - 1 | Relatively independent of prevalence | Not clinically interpretable; does not assess the ratio of false positives to false negatives; requires application of a criterion (gold) standard | Number Needed to Diagnose | NND = 1/Youden
Predictive Summary Index | PPV + NPV - 1 | Measures gain; clinically applicable | Dependent on prevalence; places equal weight on rule-in and rule-out accuracy | Number Needed to Predict | NNP = 1/PSI
Overall Accuracy (Fraction Correct) | (TP + TN) / (TP + FP + TN + FN) | Measures the real number of correct identifications vs misidentifications; can easily be converted into a percentage | Requires application of a criterion (gold) standard | Number Needed to Screen | NNS = 1/Identification Index
PPV DT Distress = 55%; PPV Other Methods 65%
ROC Plot
[Figure: ROC plot of sensitivity (y-axis, 0–1) against 1 - specificity (x-axis, 0–1) for low mood, DSM-IV, and low mood & loss of interest.]
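ROC points for a scale with many possible cut-offs can be generated by sweeping the cut-off and recording (1 - specificity, sensitivity) at each value. The area under the curve can then be summed by the trapezoidal rule. This is a minimal sketch with hypothetical function names. It assumes higher scores indicate disorder, and the scores in the usage lines are made up.

```python
def roc_points(case_scores, non_case_scores):
    """(1 - specificity, sensitivity) pairs for every cut-off, treating a
    score at or above the cut-off as test positive."""
    points = [(0.0, 0.0)]
    for cut_off in sorted(set(case_scores) | set(non_case_scores), reverse=True):
        sensitivity = sum(s >= cut_off for s in case_scores) / len(case_scores)
        false_pos_rate = sum(s >= cut_off for s in non_case_scores) / len(non_case_scores)
        points.append((false_pos_rate, sensitivity))
    return points


def area_under_curve(points):
    """Trapezoidal area under ROC points ordered by increasing 1 - specificity."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scale scores, for illustration only (higher = more symptomatic).
cases = [9, 12, 14, 15, 18, 20]
non_cases = [2, 4, 5, 7, 9, 11, 13]
curve = roc_points(cases, non_cases)
print(f"AUC = {area_under_curve(curve):.3f}")
```

A single yes/no item reduces to one point on the plot. Persistent low mood in the post-stroke table (Se 0.90, Sp 0.92) sits at (0.08, 0.90), and joining that point to the corners gives an area of (Se + Sp)/2 = 0.91.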