
Introduction – Herbert Y. Kressel, Professor of Radiology, Harvard Medical School; Editor-in-Chief, Radiology

STARD – Patrick M. Bossuyt, Professor of Clinical Epidemiology, University of Amsterdam, and one of the original authors of STARD

PRISMA – Matthew McInnes, Associate Professor of Radiology, University of Ottawa; 2014 RSNA Eyler fellow

Putting it all together – Deborah Levine, Professor of Radiology, Harvard Medical School; Senior Deputy Editor, Radiology

Understanding and Using the STARD and PRISMA Guidelines

Understanding and Using the STARD and PRISMA Guidelines: Introduction

Herbert Y. Kressel, MD Editor, Radiology

Miriam H. Stoneman Professor of Radiology Harvard Medical School, Boston MA

Editors, Reviewers, and Readers: Key Questions

• Is it true? (Scientific validity)
• Can the results be generalized? (Reproducibility)
• Is it novel?
• Is it important?
• Is it interesting?

ANNALS OF SCIENCE

THE TRUTH WEARS OFF Is there something wrong with the scientific method?

BY JONAH LEHRER DECEMBER 13, 2010

Unreliable research

Trouble at the lab Scientists like to think of science as self-correcting. To an alarming degree, it is not

“I SEE a train wreck looming,” warned Daniel Kahneman, an eminent psychologist, in an open letter last year. The premonition concerned research on a phenomenon known as “priming”. Priming studies suggest that decisions can be influenced by apparently irrelevant actions or events that took place just before the cusp of choice. …. Dr Kahneman and a growing number of his colleagues fear that a lot of this priming research is poorly founded. Over the past few years various researchers have made systematic attempts to replicate some of the more widely cited priming experiments. Many of these replications have failed. In April, for instance, a paper in PLoS ONE, a journal, reported that nine separate experiments had not managed to reproduce the results of a famous study from 1998 purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan.

Oct 19th 2013 | http://www.economist.com/node/21588057

Unable to reproduce reported results

• Amgen, working with the original authors, could replicate only 6 of the 53 key studies assessed

• Bayer Healthcare was able to reproduce results in ¼ of 63 studies

Why Most Published Research Findings Are False

… “Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.”

… “Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging.”

PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124

Probability that study is true depends on pretest probability, statistical power, and bias

PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124
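The dependence on pretest probability and power can be made concrete with the cited paper's formula (Ioannidis 2005) for the post-study probability that a claimed finding is true, here sketched without the bias term; the prior-odds and power inputs below are illustrative assumptions, not figures from the paper.

```python
# Post-study probability that a claimed finding is true, per the
# Ioannidis (2005) formula without the bias term. The example inputs
# (prior odds, power) are made-up illustrations.
def ppv(pre_study_odds, power, alpha=0.05):
    r = pre_study_odds  # odds that the tested relationship is actually real
    return power * r / (power * r + alpha)

print(round(ppv(pre_study_odds=1.0, power=0.80), 2))   # well-powered, 1:1 odds -> 0.94
print(round(ppv(pre_study_odds=0.05, power=0.20), 2))  # exploratory, long odds -> 0.17
```

With even odds and 80% power, a positive claim is probably true; with exploratory long odds and low power, it is probably false, which is the paper's central point.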

NIH plans to enhance reproducibility

Francis S. Collins and Lawrence A. Tabak discuss initiatives that the U.S. NIH is exploring to restore the self-correcting nature of preclinical research

“A growing chorus of concern, from scientists and laypeople, contends that the complex system for ensuring the reproducibility of biomedical research is failing and is in need of restructuring. As leaders of the NIH, we share this concern…. Science has long been regarded as ‘self-correcting’, given that it is founded on the replication of earlier work. Over the long term, that principle remains true. In the shorter term, however, the checks and balances that once ensured scientific fidelity have been hobbled. This has compromised the ability of today’s researchers to reproduce others’ findings.”

A history of the evolution of guidelines for reporting medical research: the long road to the EQUATOR Network.

Concerns about the quality of research reporting are not new

“It is a commonplace of medical literature to find the most positive and sweeping conclusions drawn from a sample so meager as to make scientifically sound conclusions of any sort utterly impossible.” -Pearl, 1919

“Frequently, indeed, the way in which the observations were planned must have made it impossible for the observer to form a valid estimate of the error … an idea of what results might be expected if the experiment were repeated under the same conditions.” -Mainland 1938

Schor and Karten (1966)

• 34% - Conclusions drawn about population but no statistical tests applied to determine whether such conclusions were justified

• 31% - No use of statistical tests when needed

• 25% - Study design not appropriate for solving the stated problem

• 19% - Too much confidence placed on negative results with small-size samples

Problems in Research Reporting Continue

Among 513 neuroscience articles published in 5 top-ranking journals (Science, Nature, Nature Neuroscience, Neuron, Journal of Neuroscience) in 2009–10, 157 compared effect sizes, and 50% of those used an inappropriate method of analysis

Nieuwenhuis, Nature Neuroscience, 2011

In 100 orthopedic research papers published in seven journals in 2005-10, the conclusions were not clearly justified by the results in 17% and a different analysis should have been undertaken in 39%

Parsons et al Biomedcentral, 2012

Reporting Standards are Desirable

• “Standards governing the content and format of statistical aspects should be developed to guide authors in the preparation of manuscripts.” –O’Fallon et al 1978, Biometrics 34:687-95

• “… editors could greatly improve the reporting of clinical trials by providing authors with a list of items that they expected to be strictly reported.” –DerSimonian et al 1982, NEJM 306:1332-7

• “An obvious proposal is to suggest that editors of oncology journals make up a check-list for authors….” –Zelen 1989, J Clin Oncol 7:827-8

Efforts to Create Standards for Reporting

• STARD – Standards for Reporting Diagnostic Accuracy (diagnostic performance)
• CONSORT – Consolidated Standards of Reporting Trials (randomized controlled trials)
• PRISMA – Preferred Reporting Items for Systematic Reviews and Meta-Analyses
• STROBE – Strengthening the Reporting of Observational Studies in Epidemiology (cohort, case-control, and cross-sectional studies)

Guidelines Improve Reporting

• Help identify the presence and nature of bias
• Help identify methodological problems, e.g. sample size, inappropriate analysis
• Help ensure that the description of the methods is adequate to reproduce the study
• They do not ensure that a study is novel, important, or interesting

STARD 2015

Complete And Transparent Reporting Of

Diagnostic Accuracy Studies

Patrick M. Bossuyt

Diagnostic Accuracy

• How good is the test in correctly classifying patients as being diseased?

Diagnostic Accuracy

• How good is the test in correctly classifying patients as having the target condition?

Diagnostic Accuracy Study

Index Test

Gold Standard

Diagnostic Accuracy Study

Index Test

Reference Standard

Series of Patients

Cross-classification

RT-PCR        Ebola   No Ebola
Positive        15         9
Negative         0       107

The results: RT-PCR

• Sensitivity: 100% (95% CI: 78% to 100%)
• Specificity: 92% (95% CI: 86% to 96%)
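The point estimates and exact (Clopper-Pearson) confidence intervals above can be reproduced directly from the 2×2 counts; this sketch assumes SciPy is available for the beta quantile function.

```python
from scipy.stats import beta

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion x/n."""
    lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

# 2x2 table from the RT-PCR slide: rows = index test, columns = Ebola status
tp, fp, fn, tn = 15, 9, 0, 107

sensitivity = tp / (tp + fn)   # 15/15  = 1.00
specificity = tn / (tn + fp)   # 107/116 ≈ 0.92
print(sensitivity, clopper_pearson(tp, tp + fn))   # 1.0, CI ≈ (0.78, 1.0)
print(specificity, clopper_pearson(tn, tn + fp))   # ≈0.92, CI ≈ (0.86, 0.96)
```

Note that the slide's 78–100% interval for a 15/15 result is the exact interval; a normal-approximation interval would be meaningless at a proportion of 1.0.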

Sources of variation:
– Setting
– Patient characteristics
– Previous testing

Sources of bias:
– Incomplete verification
– Unblinded reading
– Multiple reference standards

Accuracy is not fixed

Pulmonary Embolism

D-dimer testing

Diagnostic Accuracy Study

D-dimer

Multislice CT

ED Patients

Cross-classification

Patient Selection

D-dimer

CT

“Easy” Patients

Cross-classification

Two Series: Two sets of Eligibility Criteria

D-dimer

PE patients

Cross-classification

Healthy controls

Verification - partial

D-dimer

Multislice CT

ED Patients

Cross-classification

[2×2 cross-classification (test positive/negative × target condition/other condition), shown three times: the results under full reference-standard verification, under partial verification at random, and under typical partial verification]
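A small simulation (with made-up prevalence, accuracy, and verification probabilities) shows why typical partial verification, in which test-positives are verified more often than test-negatives, inflates observed sensitivity and deflates observed specificity.

```python
import random

random.seed(0)

def observed_accuracy(n=200_000, prev=0.20, sens=0.80, spec=0.90,
                      p_verify_pos=0.95, p_verify_neg=0.30):
    """Accuracy computed only in verified patients (partial verification).

    All parameter values are illustrative assumptions.
    """
    tp = fp = fn = tn = 0
    for _ in range(n):
        diseased = random.random() < prev
        test_pos = random.random() < (sens if diseased else 1 - spec)
        p_verify = p_verify_pos if test_pos else p_verify_neg
        if random.random() >= p_verify:
            continue  # unverified patients silently drop out of the 2x2 table
        if test_pos:
            tp += diseased; fp += not diseased
        else:
            fn += diseased; tn += not diseased
    return tp / (tp + fn), tn / (tn + fp)

sens_obs, spec_obs = observed_accuracy()
print(round(sens_obs, 2), round(spec_obs, 2))  # ≈0.93 and ≈0.74, vs. true 0.80 and 0.90
```

The true test here is 80% sensitive and 90% specific, yet the verified-only 2×2 table reports roughly 93% sensitivity and 74% specificity, which is exactly the bias the slides illustrate.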

Multiple Reference Standards

D-dimer

Multislice CT

ED Patients

Cross-classification

Follow-up

[Build slides repeat the 2×2 cross-classification (positive/negative × target condition/other condition) as part of the cohort is verified with multislice CT and the rest with clinical follow-up: two different reference standards feeding one table]

D-dimer testing

[Figures: reported D-dimer accuracy differs between outpatients and inpatients]

JE Schrecengost, Clin Chem 2003

JAMA, 15 Sep 1999;282:1061-1066

Korevaar et al. Radiology. 2015 Mar;274(3):781-9

Item (% reported):
• Inclusion and exclusion criteria: 65%
• Participant sampling (consecutive vs. random vs. convenience): 55%
• Blinding of index test readers: 58%
• Baseline characteristics (age, sex, presenting symptoms): 61%

112 diagnostic accuracy studies published in 2012

How well are studies reported?

STARD (2003)

Adherence to STARD

STARD 2015: An update

• Incorporate new evidence
  – Improved understanding of sources of bias and variation

• Facilitate use
  – Rearranging and rephrasing items
  – Improving consistency with other major reporting guidelines

Project team (n=4)

Advisory board (n=13)

Contributors (n=68)

N=85

Penny Whiting Marie Westwood Nandini Dendukuri David Simel Augusto Azuara-Blanco Rita Horvath Ann van den Bruel Anne Rutjes Lucas Bachmann Jeffrey Blume Frank Buntinx Blanca Lumbreras Chris Hyde Carl Heneghan Ewout Steyerberg Eleanor Ochodo Gianni Virgili Holly Janes Joris de Groot Jac Dinnes Carl Moons

Hans van Maanen William Summerskill Herbert Kressel Nader Rifai Robert Golub Philippe Ravaud Isabelle Boutron Richelle Cooper John Ioannidis Iveta Simera Andreas Ziegler Doug Altman Jon Deeks Kenneth Fleming Gordon Guyatt Myriam Hunink Jos Kleijnen Andre Knottnerus Erik Magid Barbara McNeil Matthew McQueen

Andrew Onderdonk Christopher Price Sharon Straus Stephen Walter Wim Weber Constantine Gatsonis Les Irwig David Moher Riekie de Vet David Bruns Paul Glasziou Jeroen Lijmer Drummond Rennie Hans Reitsma Jorgen Hilden Harry Büller Frank Davidoff John Overbeke Daniël Korevaar Lotty Hooft Jérémie Cohen Patrick Bossuyt

Mariska Leeflang Matthew Thompson Margaret Pepe Nynke Smidt Nancy Obuchowski Petra Macaskill Katie Morris Reem Mustafa Rosanna Peeling Steffen Petersen Sally Lord Holger Schunemann Susan Mallett Todd Alonzo Andrew Vickers Nancy L. Wilczynski Yemisi Takwoingi Nitika Pai Sarah Byron Stephanie Chang Stefan Lange

Project plan

• Stage 1: Literature search

– Aim: Identify potential new items

• Stage 2: Two-round survey

– Aim: Identify items that needed to be modified, removed, or added

• Stage 3: Two-day live meeting in Amsterdam

• Stage 4: Final input from the wider STARD group

• Stage 5: Piloting

• Stage 6: Finalization of checklist

Survey question: An example

• Existing item:
  – Report any adverse events from performing the index tests or the reference standard.

• Consideration:

– Diagnostic accuracy studies typically lack the power and design to estimate adverse event rates.

– Many tests do not have intrinsic adverse events.

• Question: Should we:
  – Keep this item as it is
  – Modify this item (please explain)
  – Remove this item (our suggestion)
  – No opinion

STARD 2003 items (n=25): consensus reached?
• Keep item as it is: 5
• Modify item: 13
• Remove item: 0
• No consensus: 7

Potential new items (n=8): consensus reached?
• Include: 4
• Exclude: 0
• No consensus: 4

Response rate: 86% (73/85)

Project plan

• Stage 1: Literature search

• Stage 2: Two-round survey

• Stage 3: Two-day live meeting in Amsterdam

– Aim: Discuss items for which no consensus was reached

– Aim: Reach consensus on draft checklist

• Stage 4: Final input from the wider STARD group

• Stage 5: Piloting

• Stage 6: Finalization of checklist


A checklist is not the end product

• A list of reporting items is only the beginning

• We have to develop real tools – Teaching material – Writing aids – Reviewing tools – …

STARD for Abstracts

STARD for Trial Registration

STARD 2015: Complete And Transparent Reporting Of Diagnostic Accuracy Studies

PRISMA: Guide for Authors

Matthew McInnes MD FRCPC

What is a systematic review?
• A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question.
• It uses explicit, systematic methods that are selected to minimize bias, thus providing reliable findings from which conclusions can be drawn and decisions made.

Why are they important?
• Because of their rigorous, scientific approach, systematic reviews and meta-analyses have been touted as the “highest level of evidence”

What can they tell us?
• More precise estimates of imaging test accuracy, intervention effectiveness, or complication rate
• Identify factors contributing to heterogeneity in these measures
• Compare diagnostic accuracy between two modalities**

Pickhardt, Radiology 2011

McInnes, Radiology 2011

Kiewet, Radiology 2012

Systematic Reviews Published in Imaging Journals (Diagnostic Test Accuracy Reviews only)
[Chart; 2015 data is only up to date through May, yellow box is ‘projected for a full year’]

Systematic Reviews Published in Radiology
[Chart; 2015 data is through October 2015]

Systematic Review Benefits
• No expensive infrastructure needed
• No review board or other typical barriers to progress
• Anyone (even you) can do it!

• Failure to choose a review question that represents an advance in knowledge:
  – no, or only a very small number of, studies exist
  – several well-conducted, large-sample-size studies have evaluated the same research question and arrived at similar conclusions
  – a systematic review with a similar purpose has been recently published

PICO
• Patient
• Index test
• Comparison (if relevant; may be N/A)
• Outcome (typically diagnostic accuracy as defined by a reference standard)

Staunton, M. Radiology 2007

PICO
• P = patients with a focal liver lesion, possibly adenoma vs. FNH, but no history of cirrhosis
• I (index test) = gadoxetic acid–enhanced MR imaging
• C = no comparison
• O (outcome) = an acceptable reference standard for diagnosis of FNH or HCA, defined as surgical pathology; biopsy; or clinical follow-up, imaging follow-up, or both

McInnes et al. Radiology 2015

PRISMA
• Preferred Reporting Items for Systematic reviews and Meta-Analyses
• 27-item checklist to guide reporting of systematic reviews

PRISMA
• Key information is often poorly reported in systematic reviews, thus diminishing their potential usefulness
• As is true for all research, systematic reviews should be reported fully and transparently to allow readers to assess the strengths and weaknesses of the investigation
• The aim is to ensure clear presentation of what was planned, done, and found in a systematic review

How to use PRISMA?
• Guide protocol design
• Guide manuscript writing

Benefits of PRISMA?
• PRISMA adherence is associated with systematic review quality

Tunis et al. Radiology 2011

Benefits of PRISMA?
• PRISMA adherence is associated with citation rate

van der Pol et al. PLOS ONE 2015

Benefits of PRISMA?
• Journals endorse PRISMA
  – Radiology
  – JMRI

Benefits of PRISMA?

PRISMA (Interventions)

Diagnostic Test Accuracy Reviews

DTA Review Challenges

Publication Bias

Heterogeneity

Meta-Analysis Method
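As a hedged sketch of the pooling-and-heterogeneity problem named above, here is DerSimonian-Laird random-effects pooling of logit-sensitivities with the I² heterogeneity statistic. The three study inputs are made up, and a real DTA meta-analysis would normally use a bivariate model pooling sensitivity and specificity jointly.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate plus tau^2 and I^2 heterogeneity."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                   # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # fraction of variation from heterogeneity
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2, i2

# Made-up studies: logit(sensitivity) and its approximate variance (1/TP + 1/FN)
logits = [math.log(0.90 / 0.10), math.log(0.80 / 0.20), math.log(0.85 / 0.15)]
variances = [0.10, 0.08, 0.12]
pooled, tau2, i2 = dersimonian_laird(logits, variances)
print(round(1 / (1 + math.exp(-pooled)), 2))   # back-transformed pooled sensitivity
```

Pooling on the logit scale keeps the estimate inside (0, 1), and tau² / I² quantify the heterogeneity that the slides flag as a core DTA review challenge.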

Risk of bias in included studies
• Diagnostic accuracy reviews should use QUADAS-2
• NOT STARD, the Cochrane risk-of-bias tool, etc.

http://www.bristol.ac.uk/quadas/quadas-2/

QUADAS-2

Lee et al. Radiology 2015

dta.cochrane.org

Practical tips
• Use PRISMA and other resources when planning your systematic review
• Create an internal peer review team for protocol design and manuscript writing

Practical tips • Reach out to experts • Educate yourself

Putting it all together – Deborah Levine, MD

Senior Deputy Editor, Radiology

• Be sure you have a novel/interesting/important research question

• Follow our Publication Information for Authors
  – Use a checklist!
  – Flow diagrams are helpful for summarizing not just patient inclusion/exclusion but also for showing results of the index test

• You can follow all the rules and have the “perfect” methodology, but if the question you are asking is not new, clinically relevant, or hypothesis generating…

• …then lack of novelty will make acceptance in our journal unlikely

Novel: Does it…

• Add new information

• Provide new concepts

• Describe new technology

• Define new diagnostic or therapeutic approaches

• Resolve existing controversies

Important: Does it…

• Change practice: “News you can use”

• Help us understand biology or technology?

• Generate a new hypothesis and stimulate further research?

Interesting/Informative: Does it…

• Add considerably to our available information

• Have conclusions that provide clear direction

• Provide useful information

Use our Publication Information for Authors

Our Goal: Help you build and optimize the structure and content of your manuscript

http://pubs.rsna.org/page/radiology/pia

Define your purpose
• A clearly stated purpose is the foundation of any research study and research manuscript
• Use that purpose in the abstract and at the end of the introduction
• When you write your conclusions, be sure they directly address the purpose

Hypothesis-driven research
• STARD 2015 asks for study objectives AND hypotheses
  – What are you trying to prove (or disprove)?
  – A clearly stated hypothesis helps focus the entire research manuscript

Methods – Details: Radiology checklist
• Ethical: IRB, HIPAA, informed consent
• Remember to register your clinical trial! (ClinicalTrials.gov)
  – Prospective trials should be registered, but consider this for retrospective trials as well!
• Funding source(s)
• Conflicts of interest
• Overlap in subjects from prior publications

Methods: Study design
• Prospective (looking forward) vs. retrospective (looking back)
• In prospective studies, data collection (and usually analysis) is planned BEFORE obtaining the index test and reference standard, with written informed consent
• Frequently we see studies with retrospective analysis of prospectively acquired data

Methods – participants
• Describe the study population
  – Inclusion/exclusion criteria and numbers
  – Dates of study enrollment/follow-up

Inclusion bias
• “Inclusion criteria were 100 consecutive patients with right lower quadrant pain who had ultrasound and CT within 24 hours of surgery….”
  – How many patients had RLQ pain during the time interval?
  – How many had US but no CT?
  – How many had CT but no US?
  – How many did not go to surgery?
  – How many had surgery >24 hours later?

[Flow diagram: 1000 patients with abdominal pain → CT (n=500), US (n=300), no imaging (n=200). The US group is biased toward children and women of reproductive age; the CT group is older and biased toward males.]

[Flow diagram: 1000 patients → 500 CT → 200 US and CT → 100 US, CT, and surgery in <24 hours]

The 100 patients in the final population are biased toward the sickest patients, who went to surgery, AND also toward the types of patients/findings that would lead to a patient having both a CT and a US

STARD 2015 asks for the distribution of alternative diagnoses in those without the target condition

Participant accrual

• Was this based on:
  – Presenting symptoms
  – Results from previous tests

• Convenience sample
  – Had patients already received the index test and/or reference standard, so that you are basically mining the data?

Sample size and POWER
• STARD 2015 asks for the “intended sample size and how it was determined”
• Power analysis: best is a priori
• If you don’t have that, we may ask for a post hoc analysis, particularly for non-significant results
• If you have significant results, the issue will be whether your population and methodology are generalizable
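An a priori sample-size calculation can be sketched with the standard formula for estimating a proportion to a given precision, then inflating for prevalence; the expected sensitivity, precision, and prevalence below are assumptions chosen for illustration.

```python
import math

def n_for_sensitivity(expected_sens, precision, prevalence, z=1.96):
    """Patients to recruit so the sensitivity CI half-width is about `precision`.

    Standard normal-approximation formula n = z^2 * p * (1 - p) / d^2
    applied to the diseased subgroup, then scaled up by prevalence.
    """
    n_diseased = math.ceil(z ** 2 * expected_sens * (1 - expected_sens) / precision ** 2)
    return math.ceil(n_diseased / prevalence)  # total, including non-diseased patients

# Expect sensitivity ~0.90, want a 95% CI of +/-0.05, disease prevalence 20%:
print(n_for_sensitivity(0.90, 0.05, 0.20))  # 695 patients
```

The prevalence scaling is the step most often forgotten: 139 diseased patients suffice for the sensitivity estimate, but at 20% prevalence nearly 700 patients must be enrolled to find them.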

Methods: Clarity is key

• Structure your methods such that someone else can EXACTLY reproduce your work
  – How do you perform your index test?
  – How do you perform your reference standard?
  – Be specific; use tables as needed

“In sufficient detail to allow replication”

Dolly: the first mammal cloned from an adult cell. Born July 5, 1996.
http://www.nms.ac.uk/explore/collections-stories/natural-sciences/dolly-the-sheep/

Blinded interpretations
• Index test and reference standard
  – Describe whether reviewers were blinded
  – If reviewers were involved in patient care, is there possible recall bias?
  – Batched readings: what is read first? How might that bias results?


Reference standard

• What you use as “truth”
  – Operative findings
  – Histology
  – Imaging studies

• Time interval for clinical follow-up
• Realize the biases inherent in your reference standard
  – Why did you choose your reference standard?


Reference test methods
• Be very clear about the reference test
  – Positive, negative, indeterminate
  – How did you determine these thresholds?

• If thresholds are based on your own population, you can have biased results
  – Was your study exploratory only? You need to be clear about the strength of the conclusions that can and cannot be drawn

• If you don’t have an a priori threshold, did you have two patient groups: a development set and an independent test set?

• Be certain to report on all items listed in MATERIALS & METHODS

• Do not report results for items not listed in MATERIALS & METHODS

• Give specific information, not generalities

• Tip: Have someone not involved in your project read your manuscript. Does it make sense? Do methods and results parallel each other?

Results

Estimates
• Report accuracy with measures of uncertainty / confidence intervals
• Report indeterminate and missing results
• Report variability between subgroups
  – Preferably, subgroups established before the results are seen
• Report reproducibility, inter- and intra-observer
• TIP: Involve a statistician early in the planning process
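Inter-observer reproducibility for binary readings is often summarized with Cohen's kappa; here is a minimal sketch with made-up reader data (the two radiologists' reads below are hypothetical).

```python
def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters' categorical readings."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n            # raw agreement
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n)           # agreement expected
                for c in set(r1) | set(r2))                     # by chance alone
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical binary reads (1 = lesion present) from two radiologists:
reader1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
reader2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(reader1, reader2), 2))  # 0.58 (moderate agreement)
```

Raw agreement here is 80%, but kappa corrects for the agreement the two readers would reach by chance, which is why it is the preferred reproducibility measure for categorical reads.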

Text: provides detail; can be cumbersome
Table: explicit information; might make the manuscript long
Graph: illustrative summary; might lack granularity

Discussion – Limitations
• Sources of potential bias
• Statistical uncertainty
• Generalizability

Discussion – Conclusion
• Be sure that your summary interpretation is consistent with your:
  – Hypothesis
  – Purpose
  – Methods
  – Results

Putting it all together
• Start with a great idea
• Plan your study using guidelines and checklists
• Involve a statistician early to help with planning
• Write your manuscript using the appropriate guidelines, Publication Information for Authors, and checklists

We look forward to seeing your next manuscript submitted to Radiology
