health care special interest-i2b2

Extracting information from clinical notes

H. Yang, I. Spasic, F. Sarafraz, John A. Keane, Goran Nenadic

School of Computer Science, University of Manchester

Motivation & aim

Electronic clinical notes
- electronic medical/health records
- hospital discharge summaries

Extract information on
- individual patients and their diseases
- clinical practice: treatments, drugs used, etc.

Aim: support data analytics, e.g. monitoring quality

Huge interest locally and internationally

Clinical notes

Highly condensed text
- sometimes without proper sentences
- hospital discharge summaries are more structured: lists of medications, symptoms, etc.

Terminological variability
- orthographic, acronyms, local conventions

Various sections
- previous history, social/family background

NLP challenges in clinical data

A series of international challenges in information extraction from clinical narratives
- organisers: Informatics for Integrating Biology & the Bedside (i2b2)

3 shared tasks so far
- De-identification of medical records and identification of smokers from their clinical records (2007)
- Identification of obesity & related diseases in patients from hospital discharge documents (2008)
- Extraction of medications and related information from patients’ discharge documents (2009)

2010 challenge
- concepts, assertions, relations

i2b2 2008

Extract status of diseases in patients
- obesity, diabetes mellitus, hypercholesterolemia, hypertriglyceridemia, hypertension, heart failure (16 in total)
- status: yes, no, unmentioned, questionable
- on textual and “intuitive” level

28 teams worldwide
- UoM ranked 1st in textual and 7th in intuitive

Our methodology
- Term-based: exact and approximate matching
- Context-based: pattern- and rule-based matching
- Machine learning approach

Yang, H., Spasic, I., Keane, J., Nenadic, G.: A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries, JAMIA 16(4):596-600
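The term-based step (exact and approximate matching) could look roughly like the sketch below. It is not the authors' implementation: the mini-lexicon, the function name `match_term`, and the similarity cutoff are all assumptions made for illustration, and the approximate matching relies on the standard-library `difflib` rather than whatever similarity measure was actually used.

```python
import difflib

# Hypothetical mini-lexicon mapping surface forms to a canonical disease name.
LEXICON = {
    "hypertension": "hypertension",
    "htn": "hypertension",
    "diabetes mellitus": "diabetes mellitus",
    "congestive heart failure": "heart failure",
    "chf": "heart failure",
    "hypercholesterolemia": "hypercholesterolemia",
}


def match_term(candidate, lexicon=LEXICON, cutoff=0.85):
    """Return (canonical_name, match_type) for a candidate phrase, or None."""
    # Normalise case and whitespace before lookup.
    key = " ".join(candidate.lower().split())
    if key in lexicon:
        return lexicon[key], "exact"
    # Fall back to string similarity to absorb orthographic variants.
    close = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    if close:
        return lexicon[close[0]], "approximate"
    return None


if __name__ == "__main__":
    for phrase in ("HTN", "hypercholesterolaemia", "asthma"):
        print(phrase, "->", match_term(phrase))
```

An abbreviation in the lexicon is found exactly, a British spelling variant is caught by the approximate fallback, and a term absent from the lexicon returns `None`. A high cutoff is chosen here because short clinical abbreviations are easily confused by loose string similarity.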

Methodology

Linguistic pre-processing
- section splitting, sentence splitting, chunking, POS tagging, parsing

Information extraction (rules, machine learning)
- textual evidence extraction, section filtering, morphological clues (e.g. drug/disease name affixes)

Constructing results
- template filling, filtering negative results, relations and heuristics
- Organ : Symptom, Symptom : Disease, Disease : Drug, Drug : Mode of application

Medical resources
- Disease names
- Drug names
- Body parts
- Symptoms
- Abbreviations
- Synonyms
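As a rough illustration of the section splitting, sentence splitting and section filtering steps named above, the sketch below works on a toy discharge summary. The heading convention (upper-case text followed by a colon), the choice to skip the family history section, and all function names are assumptions; real notes vary by institution and a production system would use a proper sentence splitter.

```python
import re

# Assumed heading convention: an upper-case phrase at line start, then a colon.
HEADING = re.compile(r"^([A-Z][A-Z /]+):\s*", re.MULTILINE)


def split_sections(note):
    """Split a note into {heading: body} using 'HEADING:' markers."""
    sections = {}
    matches = list(HEADING.finditer(note))
    for i, m in enumerate(matches):
        # A section body runs until the next heading or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1).strip()] = note[m.end():end].strip()
    return sections


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def filter_sections(sections, skip=("FAMILY HISTORY",)):
    """Drop sections assumed not to describe the patient's own conditions."""
    return {h: b for h, b in sections.items() if h not in skip}


if __name__ == "__main__":
    note = (
        "HISTORY OF PRESENT ILLNESS: Patient has CHF. No chest pain.\n"
        "FAMILY HISTORY: Mother had diabetes.\n"
        "MEDICATIONS: Lasix 40 mg daily."
    )
    for heading, body in filter_sections(split_sections(note)).items():
        print(heading, split_sentences(body))
```

Filtering at section level keeps a relative's diagnosis from being attributed to the patient before any pattern matching runs.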

Rule-based IE

Disease status patterns
- context-based patterns
  [N] negative for CHF
  [Q] question of asthma
  [U] no known diagnosis of CAD
  [U] we should consider further asthma studies as an outpatient
- semantics-based patterns
  [N] normal coronaries, a thin black man
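The context-based patterns above can be approximated with a handful of cue phrases placed before a disease mention. The sketch below encodes only the first three slide examples; the cue list, the default labels, and the name `disease_status` are assumptions, and the longer example ("we should consider further ... studies") and the semantics-based patterns would need richer rules than shown here.

```python
import re

# Cue phrases modelled on the slide's examples; not the authors' rule set.
CUES = [
    ("N", r"\bnegative for"),
    ("Q", r"\bquestion of"),
    ("U", r"\bno known diagnosis of"),
]


def disease_status(sentence, disease):
    """Label a disease mention in one sentence.

    Returns the cue's label if a cue phrase directly precedes the disease,
    'Y' if the disease is mentioned without a cue (an assumed default),
    and 'U' if the disease is not mentioned at all.
    """
    name = re.escape(disease)
    for label, cue in CUES:
        if re.search(cue + r"\s+" + name + r"\b", sentence, re.IGNORECASE):
            return label
    if re.search(r"\b" + name + r"\b", sentence, re.IGNORECASE):
        return "Y"
    return "U"


if __name__ == "__main__":
    examples = [
        ("negative for CHF", "CHF"),
        ("question of asthma", "asthma"),
        ("no known diagnosis of CAD", "CAD"),
        ("history of asthma", "asthma"),
    ]
    for sentence, disease in examples:
        print(disease_status(sentence, disease), sentence)
```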

Clinical resources used in sentence extraction
- clinical inference rules, e.g. weight > 90 kg, LDL > 160 mg/dl, HDL < 35 mg/dl
- medications, e.g. ‘anti-depressant’
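A minimal sketch of how the numeric inference rules above might be checked against a sentence. The thresholds come from the slide; the regular expressions used to locate each measurement, and the name `fired_rules`, are assumptions. The slide does not say which disease each rule supports, so the sketch only reports which rules fired.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"

# (rule name, regex locating the value, threshold test).
# Thresholds are from the slide; the regexes are illustrative assumptions.
RULES = [
    ("weight>90kg", re.compile(r"\bweight\D{0,15}?" + NUMBER + r"\s*kg", re.I), lambda v: v > 90),
    ("LDL>160mg/dl", re.compile(r"\bLDL\D{0,15}?" + NUMBER, re.I), lambda v: v > 160),
    ("HDL<35mg/dl", re.compile(r"\bHDL\D{0,15}?" + NUMBER, re.I), lambda v: v < 35),
]


def fired_rules(sentence):
    """Return the names of all rules whose threshold is met in the sentence."""
    fired = []
    for name, pattern, test in RULES:
        m = pattern.search(sentence)
        if m and test(float(m.group(1))):
            fired.append(name)
    return fired


if __name__ == "__main__":
    print(fired_rules("Weight 102 kg, LDL of 171, HDL 42."))
    print(fired_rules("Weight 85 kg, HDL 30"))
```

The sketch assumes values are already in the units the rules expect; a real system would also need unit normalisation.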

Textual Annotation Results

Performance on Disease Status (Ranked 1st)
Micro-average: Accuracy (0.9723)
Macro-average: P (0.8482), R (0.7737), F-score (0.8052)

Status  #Eval  #Corr  #Gold  Precision  Recall  F-score
Y        2267   2132   2192     0.9404  0.9726   0.9562
N          56     40     65     0.7142  0.6153   0.6611
Q          12      9     17     0.7500  0.5294   0.6206
U        5709   5640   5770     0.9879  0.9774   0.9826
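The per-class and averaged figures can be recomputed from the counts in the textual results table. The sketch below assumes the usual definitions (precision = #Corr / #Eval, recall = #Corr / #Gold, macro-average = unweighted mean over classes, micro-averaged accuracy = total correct / total evaluated); the variable and function names are illustrative.

```python
# Counts copied from the textual annotation table: (evaluated, correct, gold).
COUNTS = {
    "Y": (2267, 2132, 2192),
    "N": (56, 40, 65),
    "Q": (12, 9, 17),
    "U": (5709, 5640, 5770),
}


def prf(evaluated, correct, gold):
    """Precision, recall and F-score for one class, guarding against zeros."""
    p = correct / evaluated if evaluated else 0.0
    r = correct / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


per_class = {label: prf(*counts) for label, counts in COUNTS.items()}

# Macro-average: unweighted mean of the per-class scores.
n = len(per_class)
macro_p = sum(p for p, _, _ in per_class.values()) / n
macro_r = sum(r for _, r, _ in per_class.values()) / n
macro_f = sum(f for _, _, f in per_class.values()) / n

# Micro-averaged accuracy: pool the counts over all classes.
micro_accuracy = (
    sum(c for _, c, _ in COUNTS.values()) / sum(e for e, _, _ in COUNTS.values())
)

if __name__ == "__main__":
    for label, (p, r, f) in per_class.items():
        print(f"{label}  P={p:.4f}  R={r:.4f}  F={f:.4f}")
    print(f"macro  P={macro_p:.4f}  R={macro_r:.4f}  F={macro_f:.4f}")
    print(f"micro accuracy={micro_accuracy:.4f}")
```

Under these definitions the recomputed averages agree with the reported ones to within rounding. The gap between the micro and macro figures comes from the rare N and Q classes, which count as much as Y and U in the macro-average despite having very few instances.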

Intuitive Annotation Results

Performance on Disease Status (Ranked 7th)
Micro-average: Accuracy (0.9572)
Macro-average: P (0.6383), R (0.6294), F-score (0.6336)

Status  #Eval  #Corr  #Gold  Precision  Recall  F-score
Y        2160   2068   2285     0.9574  0.9050   0.9304
N        5236   5014   5100     0.9576  0.9831   0.9702
Q           3      0     14     0       0        0

i2b2 2009

Extract mentions of medication and related information
- drugs the patient takes
- dose, mode of application, frequency, duration, etc. (for each mention)

19 teams worldwide
- UoM ranked 3rd

Our approach was based on combining
- extensive dictionaries
- morphological and derivational patterns
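To make the combination of dictionaries and morphological patterns concrete, the sketch below recognises a drug either by dictionary lookup or by a drug-name suffix, then fills a small template from the text that follows the mention. Everything here is an illustrative assumption rather than the authors' resources: the three-entry dictionary, the suffix list (common drug-class endings), the field regexes, and the rule that a field belongs to the nearest preceding drug.

```python
import re

# Small illustrative resources; a real system would use extensive dictionaries.
DRUG_DICTIONARY = {"aspirin", "lasix", "metformin"}
DRUG_SUFFIXES = ("pril", "olol", "statin", "cillin")  # common drug-class endings

FIELDS = (
    ("dosage", re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|units?)\b", re.I)),
    ("mode", re.compile(r"\b(?:po|iv|orally|intravenously|topically)\b", re.I)),
    ("frequency", re.compile(
        r"\b(?:daily|bid|tid|qid|q\d+h|every \d+ hours|twice (?:a day|daily))\b",
        re.I)),
)


def is_drug(token):
    """Dictionary lookup first, then a morphological (suffix) clue."""
    t = token.lower()
    return t in DRUG_DICTIONARY or (len(t) > 5 and t.endswith(DRUG_SUFFIXES))


def extract_medications(sentence):
    """Fill one template per drug mention from the text up to the next drug."""
    drugs = [m for m in re.finditer(r"[A-Za-z]+", sentence) if is_drug(m.group())]
    entries = []
    for i, m in enumerate(drugs):
        end = drugs[i + 1].start() if i + 1 < len(drugs) else len(sentence)
        window = sentence[m.end():end]
        entry = {"medication": m.group()}
        for field, pattern in FIELDS:
            hit = pattern.search(window)
            entry[field] = hit.group() if hit else None
        entries.append(entry)
    return entries


if __name__ == "__main__":
    text = "Discharged on lisinopril 10 mg po daily and Lasix 40 mg bid."
    for entry in extract_medications(text):
        print(entry)
```

Here "lisinopril" is not in the dictionary and is picked up by its suffix alone, while "Lasix" is found by lookup. Fields such as duration and reason are left out of the sketch because they rarely follow a fixed surface pattern.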

Evaluation (F-measure)

Medication 83.59%

Dosage 82.67%

Frequency 83.49%

Mode 85.33%

Duration 51.00%

Reason 38.81%

All fields 78.47%

Spasić I, Sarafraz F, Keane JA, Nenadic G: “Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules”, JAMIA (to appear)

Summary

NLP and text mining techniques are useful for extraction of clinical data
- disease status extraction: 95-97% accuracy
- medication information extraction: 80% F-measure

Construction of reliable and sufficient resources
- clinical terms and abbreviations (e.g., disease synonyms, symptoms, drugs)
- context patterns related to diseases, medication, etc.

Domain knowledge required
- construction of domain- and task-specific resources
- complex clinical facts and conditions for inference
- more comprehensive knowledge representation needed
