TRANSCRIPT
DR MELITA GIUMMARRA
Can text mining and machine learning help reduce systematic review workload for injury researchers?
Senior Research Fellow, ARC DECRA
Pre-hospital, Emergency and Trauma Research Group
MONASH PUBLIC HEALTH & PREVENTIVE MEDICINE
@MelitaGiummarra [email protected]
ACKNOWLEDGEMENTS
STUDY TEAM
Dr Melita Giummarra
Ms Georgina Lau
Dr Genevieve Grant
Professor Belinda Gabbe
FUNDING DISCLOSURES
Dr Giummarra is supported by an ARC DECRA fellowship
Professor Gabbe is supported by an ARC Future Fellowship.
Paper reporting review
findings (in press):
Methods paper
SYSTEMATIC REVIEWS FOR EVIDENCE SYNTHESIS
• Systematic reviews employ rigorous methods to evaluate the state of the science
▪ Comprehensive search strategy (MeSH/EMTREE terms & keywords)
▪ Pool results from multiple databases (typically 4-7)
▪ Each stage is conducted by two researchers and cross-referenced
▪ Generate narrative and quantitative (e.g., meta-analysis) synthesis of the evidence
▪ Evaluate level of evidence relative to risk of bias/quality of the science
▪ As a result, systematic reviews:
▪ Are often highly cited
▪ Help reduce “research waste”
▪ Identify benefits/harms of “exposures” or interventions
▪ Identify gaps in knowledge
▪ But over-identify potential literature (typically rejecting >95% of retrieved citations)
EMERGING CHALLENGES
The rapidly increasing publication rates mean that reviews are practically out of date by the time they’re complete!
CAN MACHINE LEARNING & TEXT MINING HELP?
• ML/TM are statistical tools to detect patterns and extract knowledge from unstructured natural language text
• Explore and categorize unstructured data (e.g., term co-occurrences or frequency)
• Minimise human effort requirements – especially for large bodies of unstructured data.
• For systematic reviews, ML/TM tools have been developed for various stages of the review:
• Literature search
• Screening
• Data extraction
• Risk of bias evaluation
• Review updates
• To date, recommendations are that we do NOT rely on text mining solely, but it may be used as a “second reviewer”.
ABSTRACKR
Abstrackr is a free web-based platform developed by Byron Wallace (Brown University, USA).
• It uses an active learning algorithm (using uni-grams and bi-grams) to generate predictions of relevance from the words in citation titles, abstracts and keywords based on judgements by a reviewer.
• Once predictions are generated, citations are sorted according to probability of relevance; researchers can then more quickly identify articles likely to be relevant and eliminate those with low probability.
• Abstrackr has been shown to reduce the burden of conducting and updating systematic reviews in specific health topics (e.g., genetics) without compromising sensitivity and specificity to identify eligible citations for full text review.
http://abstrackr.cebm.brown.edu
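The active-learning ranking idea described above can be illustrated with a toy sketch. This is not Abstrackr's actual implementation — just a minimal naive-Bayes ranker over unigrams and bigrams that re-scores the unlabelled queue after each reviewer judgement:

```python
from collections import Counter
import math

def ngrams(text):
    """Unigram + bigram features, mirroring the feature types named above."""
    toks = text.lower().split()
    return toks + [" ".join(toks[i:i + 2]) for i in range(len(toks) - 1)]

class RelevanceRanker:
    """Toy citation ranker: update word statistics after each human label,
    then sort unlabelled citations by predicted relevance."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def label(self, text, relevant):
        # A reviewer judgement: update per-class n-gram counts
        self.counts[relevant].update(ngrams(text))
        self.docs[relevant] += 1

    def score(self, text):
        # Log-odds of relevance under multinomial naive Bayes (add-1 smoothing)
        vocab = len(set(self.counts[True]) | set(self.counts[False]))
        total = {c: sum(self.counts[c].values()) for c in (True, False)}
        s = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for f in ngrams(text):
            s += math.log((self.counts[True][f] + 1) / (total[True] + vocab + 1))
            s -= math.log((self.counts[False][f] + 1) / (total[False] + vocab + 1))
        return s

    def rank(self, unlabelled):
        # Most-likely-relevant citations first, as in the sorted screening queue
        return sorted(unlabelled, key=self.score, reverse=True)
```

After a few labels, `rank()` floats probable includes to the top of the queue, which is what lets a reviewer stop once no further citations are predicted relevant.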
AIMS OF MY STUDY
REVIEW AIM:
Determine relationships between fault attribution and socio-economic/health outcomes after transport injury.
METHODS-SPECIFIC AIMS
• Examine whether machine learning is appropriate for systematic review citation screening in injury research
• Examine whether text analysis of full text articles provides workload savings for injury recovery systematic reviews
SEARCH STRATEGY
STEP 1: SCREENING CITATIONS
(TITLE & ABSTRACT)
SCREENING STRATEGY
Endnote library (n = 16,324): Medline (n = 4,291), Embase (n = 7,482), PsycINFO (n = 1,315), CINAHL (n = 2,667), Cochrane (n = 569)
Duplicates removed (n = 5,764)
Citations screened (n = 10,559)
Reviewer 1: Traditional manual screening against eligibility criteria:
Population: Transport injury, adults aged >15
Design: Cohort, observational and prospective studies
Outcomes: Work, Pain, Psychological or Health outcomes reported
Reviewer 2: Screening against eligibility criteria in Abstrackr, stopping when no more citations were predicted relevant.
SCREENING CITATIONS (Abstrackr)
RESULTS: SCREENING STAGE 1 (citations)
Reviewer 1:
• Screened 10,559 citations
• 61 hours of screening
• Identified 401 articles for full text screening
Reviewer 2:
• Screened 1,809 articles
• 16 hours of screening
• Identified 634 citations for full text screening
RESULTS: SCREENING STAGE 1 (citations)
[Figure: cumulative screening time by stopping point — 2.50 hrs (n = 343), 5.47 hrs (n = 649), 7.37 hrs (n = 818), 11.52 hrs (n = 1,244), 13.90 hrs (n = 1,374), 16.30 hrs (n = 1,809)]
KEY OBSERVATIONS: CITATION SCREENING
✓ Excellent workload savings (Reviewer 2 screened only 17.1% of citations; total screening saving = 63.4% vs traditional review)
✓ Excellent (low) false negative rate – esp. when considering full text inclusion
✓ Workload savings, specificity and false negative rate optimised by a more generous stopping rule
Precision was variable and moderate, and sensitivity was low, probably due to:
1. Abstract text was missing for some citations
2. Abstracts often failed to report design and population features
3. Complex inclusion criteria with multiple outcomes
4. Highly imbalanced dataset (relevant : irrelevant)
• Only 689 (15.3%) citations were relevant for full text review (134 of these were subsequently excluded as ineligible for full text screening, e.g., conference abstracts, books).
The findings are consistent with previous Abstrackr methods evaluations of unbalanced and broad review topics (e.g., Rathbone, 2015, Systematic Reviews; Gates, 2018, Systematic Reviews).
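The screening performance measures discussed above (sensitivity, specificity, precision, false negative rate) all derive from a confusion matrix over screening decisions. A minimal sketch with illustrative counts (not the study's actual numbers):

```python
def screening_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics for citation screening.

    tp: predicted relevant and truly eligible
    fp: predicted relevant but ineligible
    fn: predicted irrelevant but actually eligible (missed citations)
    tn: predicted irrelevant and ineligible
    """
    return {
        "sensitivity": tp / (tp + fn),          # eligible citations caught
        "specificity": tn / (tn + fp),          # ineligible citations rejected
        "precision": tp / (tp + fp),            # suffers in imbalanced datasets
        "false_negative_rate": fn / (fn + tp),  # eligible citations missed
    }

# Illustrative only: a highly imbalanced screening dataset (~10% eligible)
m = screening_metrics(tp=90, fp=210, fn=10, tn=690)
```

With eligible citations this rare, precision stays low (0.30 here) even when sensitivity is high (0.90) — which is why an imbalanced relevant : irrelevant ratio drags precision down without necessarily compromising the false negative rate.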
STEP 2: SCREENING FULL TEXTS
SCREENING STRATEGY: FULL TEXT
Full text screened
(n = 555)
Reviewer 1: Traditional manual screening against all eligibility criteria
Reviewer 2: Text mining with a fault dictionary** to restrict full texts to those with a fault-related concept in the methods/results,
using Wordstat & QDA Miner, then manually screening those with a fault term against all eligibility criteria.
* Full text ineligible for screening: Publication format (conference abstract, non-empirical, dissertation), missed duplicate, not available in English
** Dictionary was developed from a survey of 20 injury and trauma experts
Citations judged irrelevant
(n = 9,870)
Full text ineligible*
(n = 134)
SCREENING FULL TEXT ARTICLES (REVIEWER 2)
1. Add semantic anchors before methods & after results
2. Prepare and import PDFs for text mining
1. Optimize PDFs for text extraction (e.g., via “Edit PDF” in Adobe Acrobat)
2. Import PDFs into Wordstat (v. 14) with the Document Conversion Wizard (n=555)
3. Review failed PDFs and attempt to load via QDA Miner (n=89)
4. Reserve those that still failed for manual screening (n=25)
3. Load fault terms (categorization dictionary, 46 terms) in Wordstat
4. Analyse fault term frequency between semantic anchors
a) Test sensitivity of terms and drop irrelevant terms (32 terms dropped)
b) Save keyword frequencies as a Stata file to identify full texts for manual screening (papers with ≥1 fault term)
Fault Dictionary
Attribution of responsibility
Blame
Common law
Compensable
Compensation
Fault
Insurance
Lawyer
Legal
Liability
Litigation
Passenger
Pedestrian
Tort
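The dictionary-based restriction in steps 1–4 above can be sketched outside Wordstat as well. A minimal Python version, assuming hypothetical anchor strings and using a subset of the study's fault dictionary:

```python
import re

# Subset of the fault dictionary shown above
FAULT_TERMS = ["blame", "compensation", "fault", "insurance", "liability",
               "litigation", "passenger", "pedestrian", "tort"]

def fault_term_counts(full_text,
                      start_anchor="<<METHODS>>",
                      end_anchor="<<END-RESULTS>>"):
    """Count fault-dictionary terms between the semantic anchors only,
    so matches in the introduction or discussion are ignored."""
    m = re.search(re.escape(start_anchor) + r"(.*?)" + re.escape(end_anchor),
                  full_text, flags=re.DOTALL | re.IGNORECASE)
    section = m.group(1).lower() if m else ""
    return {t: len(re.findall(r"\b" + re.escape(t) + r"\b", section))
            for t in FAULT_TERMS}

def needs_manual_screen(full_text):
    # Papers with >= 1 fault term in methods/results go on to manual screening
    return any(fault_term_counts(full_text).values())
```

Papers whose methods/results contain no dictionary term are set aside, which is exactly the workload saving reported below — provided (as the study checked) that no eligible paper lacks a fault term.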
SCREENING FULL TEXT ARTICLES (Wordstat)
Reviewer 1
• Reviewed 555 full texts
• 39 hours screening time
Reviewer 2
• 25 PDFs did not import and were manually screened
• 342 (64.5% of the 530 full texts that loaded) contained ≥1 fault term and were manually screened
• The most frequent fault terms identified were insurance (n=171), compensation (n=136), passenger (n=104) and pedestrian (n=100)
• 8.75 hours screening time (including PDF formatting)
KEY OBSERVATIONS
✓ None of the full-text articles without a fault-related term was judged to be eligible by Reviewer 1.
✓ Text mining reduced screening workload by 29.7% (screening time)
TAKE HOME MESSAGE
• Abstrackr and text mining offer excellent work efficiencies, good accuracy and very low false negative rates
• Learnings to improve citation screening in Abstrackr:
• Identify citations missing abstracts (or, when removing duplicates, take care to keep the source record that includes the abstract)
• Eliminate ineligible citation types in Endnote before importing into Abstrackr (e.g., book chapters and conference abstracts)
• Considerations for using text mining of full text articles
• Significant time is required for document preparation (e.g., uploading only the methods/results sections) and dictionary testing, but this still saves time relative to manual screening.
• Conclusion?
• These tools ARE beneficial to support systematic reviews in public health/injury research
• Recommend machine learning especially if the outcomes or exposures are clearly defined
• Text mining cannot and should not completely replace human screeners when examining complex literature, populations or health outcomes.
QUESTIONS?
@MelitaGiummarra