Health care special interest: i2b2
Extracting information from clinical notes
H. Yang, I. Spasic, F. Sarafraz, John A. Keane, Goran Nenadic
School of Computer Science, University of Manchester
Motivation & aim
Electronic clinical notes
- electronic medical/health records
- hospital discharge summaries
Extract information on individual patients and their diseases, clinical practice, treatments, drugs used, etc.
Aim: support data analytics, e.g. quality monitoring
Huge interest locally and internationally
Clinical notes
- highly condensed text, sometimes without proper sentences
- hospital discharge summaries are more structured: lists of medications, symptoms, etc.
- terminological variability: orthographic variants, acronyms, local conventions
- various sections: previous history, social/family background
NLP challenges in clinical data
A series of international challenges in information extraction from clinical narratives
Organisers: Informatics for Integrating Biology & the Bedside (i2b2)
3 shared tasks so far:
- De-identification of medical records and identification of smokers from their clinical records (2007)
- Identification of obesity & related diseases in patients from hospital discharge documents (2008)
- Extraction of medications and related information from patients’ discharge documents (2009)
2010 challenge: concepts, assertions, relations
i2b2 2008
Extract the status of diseases in patients:
- obesity, diabetes mellitus, hypercholesterolemia, hypertriglyceridemia, hypertension, heart failure, etc. (16 in total)
- status: yes (Y), no (N), unmentioned (U), questionable (Q), judged at both the textual and the “intuitive” level
28 teams worldwide; UoM ranked 1st at the textual level and 7th at the intuitive level
Our methodology
- term-based exact and approximate matching
- context-based pattern- and rule-based matching
- machine learning approach
Yang H, Spasić I, Keane JA, Nenadic G: “A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries”, JAMIA 16(4):596-600
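Term-based exact and approximate matching can be sketched with Python's standard `difflib`. The synonym table, the maximum n-gram length, the 5-character minimum for fuzzy matching and the 0.85 similarity cutoff are all assumptions chosen for illustration, not the resources or settings of the system described here; `difflib.get_close_matches` simply stands in for whatever approximate matcher was actually used.

```python
import difflib
import re

# Hypothetical synonym dictionary: surface variant -> canonical disease name.
SYNONYMS = {
    "obesity": "obesity",
    "obese": "obesity",
    "diabetes mellitus": "diabetes mellitus",
    "dm": "diabetes mellitus",
    "hypertension": "hypertension",
    "htn": "hypertension",
    "congestive heart failure": "heart failure",
    "chf": "heart failure",
}

MAX_NGRAM = 3       # longest dictionary entry, in tokens
MIN_FUZZY_LEN = 5   # shorter strings (e.g. acronyms) are matched exactly only
CUTOFF = 0.85       # assumed difflib similarity threshold, not tuned


def find_diseases(sentence: str) -> set[str]:
    """Canonical disease names found by exact, then approximate, n-gram matching."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    found = set()
    for n in range(1, MAX_NGRAM + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in SYNONYMS:  # exact match
                found.add(SYNONYMS[ngram])
            elif len(ngram) >= MIN_FUZZY_LEN:  # approximate match for typos/variants
                close = difflib.get_close_matches(ngram, list(SYNONYMS), n=1, cutoff=CUTOFF)
                if close:
                    found.add(SYNONYMS[close[0]])
    return found


if __name__ == "__main__":
    print(find_diseases("Pt with hypertensoin and CHF."))
```

With these settings a misspelling such as "hypertensoin" is mapped to hypertension, while short acronyms like "CHF" only ever match exactly, which keeps spurious fuzzy hits down.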
Methodology
- Linguistic pre-processing: section splitting, sentence splitting, chunking, POS tagging, parsing
- Information extraction (rules, machine learning): textual evidence extraction, section filtering, morphological clues (e.g. drug/disease name affixes)
- Medical resources: disease names, drug names, body parts, symptoms, abbreviations, synonyms
- Constructing results: template filling, filtering negative results, relations and heuristics (Organ : Symptom, Symptom : Disease, Disease : Drug, Drug : Mode of application)
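The section splitting, sentence splitting and section filtering steps can be illustrated with a minimal sketch. It assumes, hypothetically, that section headers are upper-case prefixes ending in a colon and that the FAMILY HISTORY section should be skipped because its disease mentions refer to relatives rather than the patient; all names here are illustrative.

```python
import re

# Assumption: a section header is an upper-case prefix ending in ':', e.g. "FAMILY HISTORY:".
# Real discharge summaries vary and would need a richer list of known headers.
HEADER_RE = re.compile(r"^([A-Z][A-Z /]+):\s*(.*)$")

# Hypothetical section filter: mentions here describe relatives, not the patient.
EXCLUDED_SECTIONS = {"FAMILY HISTORY"}


def split_sections(note: str) -> dict[str, str]:
    """Split a note into {section header: section text}."""
    sections: dict[str, list[str]] = {}
    current = "PREAMBLE"  # text before the first header
    for raw in note.splitlines():
        line = raw.strip()
        match = HEADER_RE.match(line)
        if match:
            current, line = match.group(1).strip(), match.group(2)
        if line:
            sections.setdefault(current, []).append(line)
    return {name: " ".join(lines) for name, lines in sections.items()}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after . ! or ? followed by whitespace.
    Clinical abbreviations such as 'b.i.d. ' would be split wrongly."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def evidence_sentences(note: str) -> list[tuple[str, str]]:
    """(section, sentence) pairs from every section except the excluded ones."""
    pairs = []
    for name, text in split_sections(note).items():
        if name not in EXCLUDED_SECTIONS:
            pairs.extend((name, s) for s in split_sentences(text))
    return pairs


EXAMPLE_NOTE = """HISTORY OF PRESENT ILLNESS: Admitted with chest pain. Negative for CHF.
FAMILY HISTORY: Mother with diabetes mellitus.
MEDICATIONS: Aspirin 81 mg daily."""

if __name__ == "__main__":
    for section, sentence in evidence_sentences(EXAMPLE_NOTE):
        print(f"{section}: {sentence}")
```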
Rule-based IE
Disease status patterns ([N] no, [Q] questionable, [U] unmentioned):
- context-based patterns
  [N] negative for CHF
  [Q] question of asthma
  [U] no known diagnosis of CAD
  [U] we should consider further asthma studies as an outpatient
- semantics-based patterns
  [N] normal coronaries, a thin black man
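A sketch of how context-based and semantics-based patterns like these might assign a status to one disease term in one sentence. The regular expressions simply encode the slide's examples; the rule order, the cue-to-disease mapping (“normal coronaries” for CAD, “thin” for obesity) and the defaults (a bare mention gives Y, no mention gives U) are assumptions, not the authors' rule set.

```python
import re

# Context-based patterns modelled on the slide's examples; {d} is replaced by the
# escaped disease term. Rules are tried in order. The list itself is hypothetical.
CONTEXT_PATTERNS = [
    ("U", r"\bno known diagnosis of {d}\b"),
    ("U", r"\bconsider (?:further )?{d} studies\b"),
    ("N", r"\bnegative for {d}\b"),
    ("Q", r"\bquestion of {d}\b"),
]

# Semantics-based cues that imply N without naming the disease (assumed mapping).
SEMANTIC_NEGATIVES = {
    "CAD": ["normal coronaries"],
    "obesity": ["thin"],
}


def disease_status(sentence: str, disease: str) -> str:
    """Assign Y/N/Q/U for one disease term in one sentence (case-insensitive)."""
    term = re.escape(disease)
    for status, template in CONTEXT_PATTERNS:
        if re.search(template.format(d=term), sentence, re.IGNORECASE):
            return status
    for cue in SEMANTIC_NEGATIVES.get(disease, []):
        if re.search(rf"\b{re.escape(cue)}\b", sentence, re.IGNORECASE):
            return "N"
    # Assumed defaults: a bare mention means Y, no mention means U.
    if re.search(rf"\b{term}\b", sentence, re.IGNORECASE):
        return "Y"
    return "U"


if __name__ == "__main__":
    print(disease_status("Negative for CHF.", "CHF"))
```

Ordering matters in this design: the more specific [U] phrasings are checked before the generic bare-mention default, otherwise “no known diagnosis of CAD” would fall through to Y.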
Clinical resources used in sentence extraction:
- clinical inference rules, e.g. weight > 90 kg, LDL > 160 mg/dl, HDL < 35 mg/dl
- medications, e.g. ‘anti-depressant’
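The clinical inference rules can be read as thresholds applied to measurements found in the text. This sketch checks the three thresholds listed above; the extraction regexes and the assumption that LDL/HDL values are already in mg/dl are hypothetical, and because the slide does not say which disease each rule supports, the sketch only reports which rules fire.

```python
import operator
import re

# Thresholds come from the slide; the regexes are hypothetical and assume values
# written as "<name> [of|:] <number>", with LDL/HDL already in mg/dl (no unit conversion).
NUMBER = r"(\d+(?:\.\d+)?)"
INFERENCE_RULES = {
    "weight>90kg": (re.compile(rf"\bweight\s*(?:of|:)?\s*{NUMBER}\s*kg\b", re.I), operator.gt, 90.0),
    "LDL>160mg/dl": (re.compile(rf"\bLDL\s*(?:of|:)?\s*{NUMBER}", re.I), operator.gt, 160.0),
    "HDL<35mg/dl": (re.compile(rf"\bHDL\s*(?:of|:)?\s*{NUMBER}", re.I), operator.lt, 35.0),
}


def fired_rules(text: str) -> list[str]:
    """Names of the rules whose threshold is crossed by some value in the text."""
    fired = []
    for name, (pattern, compare, threshold) in INFERENCE_RULES.items():
        if any(compare(float(m.group(1)), threshold) for m in pattern.finditer(text)):
            fired.append(name)
    return fired


if __name__ == "__main__":
    print(fired_rules("Weight: 102 kg. LDL 172, HDL 41."))
```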
Textual Annotation Results
Performance on disease status (ranked 1st)
Micro-average: accuracy 0.9723
Macro-average: P 0.8482, R 0.7737, F-score 0.8052

Status  #Eval  #Corr  #Gold  Precision  Recall  F-score
Y        2267   2132   2192     0.9404  0.9726   0.9562
N          56     40     65     0.7142  0.6153   0.6611
Q          12      9     17     0.7500  0.5294   0.6206
U        5709   5640   5770     0.9879  0.9774   0.9826
Intuitive Annotation Results
Performance on disease status (ranked 7th)
Micro-average: accuracy 0.9572
Macro-average: P 0.6383, R 0.6294, F-score 0.6336

Status  #Eval  #Corr  #Gold  Precision  Recall  F-score
Y        2160   2068   2285     0.9574  0.9050   0.9304
N        5236   5014   5100     0.9576  0.9831   0.9702
Q           3      0     14     0.0000  0.0000   0.0000
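The summary figures of both tables can be recomputed from the per-class counts: precision = #Corr/#Eval, recall = #Corr/#Gold, F-score = their harmonic mean. In both tables total #Eval equals total #Gold (8044 and 7399), so micro-averaged precision, recall and accuracy coincide. The reported macro F-scores match the mean of the per-class F-scores, which is what this sketch computes; the function names are illustrative.

```python
# Per-class counts (#Eval, #Corr, #Gold) copied from the two tables above.
TEXTUAL = {"Y": (2267, 2132, 2192), "N": (56, 40, 65), "Q": (12, 9, 17), "U": (5709, 5640, 5770)}
INTUITIVE = {"Y": (2160, 2068, 2285), "N": (5236, 5014, 5100), "Q": (3, 0, 14)}


def prf(n_eval: int, n_corr: int, n_gold: int) -> tuple[float, float, float]:
    """Precision = #Corr/#Eval, recall = #Corr/#Gold, F = harmonic mean of the two."""
    p = n_corr / n_eval if n_eval else 0.0
    r = n_corr / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def summarise(table: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    scores = [prf(*counts) for counts in table.values()]
    k = len(scores)
    total_corr = sum(c for _, c, _ in table.values())
    total_gold = sum(g for _, _, g in table.values())
    return {
        # Total #Eval == total #Gold here, so micro P = micro R = accuracy.
        "micro_accuracy": total_corr / total_gold,
        "macro_p": sum(p for p, _, _ in scores) / k,
        "macro_r": sum(r for _, r, _ in scores) / k,
        # Mean of the per-class F-scores, not 2PR/(P+R) of the macro P and R.
        "macro_f": sum(f for _, _, f in scores) / k,
    }


if __name__ == "__main__":
    for name, table in (("textual", TEXTUAL), ("intuitive", INTUITIVE)):
        print(name, {key: round(value, 4) for key, value in summarise(table).items()})
```

Rounded to four decimals, this reproduces the micro and macro figures reported above; combining the macro P and R as 2PR/(P+R) would instead give about 0.809 for the textual level. It also shows why the intuitive macro scores are low: the Q class, with 14 gold instances and no correct answers, contributes zeros to every macro average.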
i2b2 2009
Extract mentions of medication and related information:
- the drugs the patient takes
- for each mention: dose, mode of application, frequency, duration, etc.
19 teams worldwide; UoM ranked 3rd
Our approach combined extensive dictionaries with morphological and derivational patterns
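A minimal sketch of combining a dictionary with morphological clues to spot medication mentions, then attaching dosage, mode and frequency to each one. The tiny dictionary, the drug-name stems (real naming stems such as -olol and -pril, but a drastically shortened list), the attribute regexes and the rule that a mention's attributes follow it up to the next drug name are all illustrative assumptions, not the resources of the cited JAMIA paper.

```python
import re

# Hypothetical mini-dictionary standing in for the "extensive dictionaries".
DRUG_DICTIONARY = {"aspirin", "lasix", "metformin"}

# Drug-name stems used as morphological clues (-olol beta-blockers, -pril ACE
# inhibitors, -statin statins, -sartan angiotensin receptor blockers); illustrative only.
DRUG_SUFFIXES = ("olol", "pril", "statin", "sartan")

DOSAGE_RE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|units?)\b", re.I)
MODE_RE = re.compile(r"\b(?:p\.o\.|po|i\.v\.|iv)(?!\w)", re.I)
FREQUENCY_RE = re.compile(r"\b(?:b\.i\.d\.|t\.i\.d\.|q\.d\.|daily|twice a day)(?!\w)", re.I)
FIELDS = (("dosage", DOSAGE_RE), ("mode", MODE_RE), ("frequency", FREQUENCY_RE))


def is_drug(word: str) -> bool:
    """Dictionary lookup first, then the affix-based morphological clue."""
    word = word.lower()
    return word in DRUG_DICTIONARY or word.endswith(DRUG_SUFFIXES)


def extract_medications(sentence: str) -> list[dict]:
    """One record per drug mention, with attributes found after the mention."""
    drugs = [m for m in re.finditer(r"[A-Za-z]+", sentence) if is_drug(m.group())]
    records = []
    for i, drug in enumerate(drugs):
        # Assumption: a mention's attributes lie between it and the next drug name.
        end = drugs[i + 1].start() if i + 1 < len(drugs) else len(sentence)
        window = sentence[drug.end():end]
        record = {"medication": drug.group()}
        for field, pattern in FIELDS:
            match = pattern.search(window)
            record[field] = match.group() if match else None
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in extract_medications("Continue aspirin 81 mg p.o. daily and atenolol 50 mg po b.i.d."):
        print(rec)
```

Duration and reason are left out of this sketch; they are also the two lowest-scoring fields in the evaluation.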
Evaluation (F-measure)
Medication 83.59%
Dosage 82.67%
Frequency 83.49%
Mode 85.33%
Duration 51.00%
Reason 38.81%
All fields 78.47%
Spasić I, Sarafraz F, Keane JA, Nenadic G: “Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules”, JAMIA (to appear)
Summary
NLP and text mining techniques are useful for extracting clinical data:
- disease status extraction: 95-97% accuracy
- medication information extraction: about 80% F-measure (78.47% over all fields)
Construction of reliable and sufficient resources:
- clinical terms and abbreviations (e.g., disease synonyms, symptoms, drugs)
- context patterns related to diseases, medication, etc.
Domain knowledge required:
- construction of domain- and task-specific resources
- complex clinical facts and conditions for inference
More comprehensive knowledge representation needed