health care special interest-i2b2

Extracting information from clinical notes
H. Yang, I. Spasic, F. Sarafraz, John A. Keane, Goran Nenadic
School of Computer Science, University of Manchester

Uploaded by farzanehs on 14-Jun-2015

Motivation & aim

- Electronic clinical notes: electronic medical/health records, hospital discharge summaries
- Extract information on:
  - individual patients and their diseases
  - clinical practice (treatments, drugs used, etc.)
- Aim: support data analytics, e.g. monitoring quality
- Huge interest locally and internationally

Clinical notes

- Highly condensed text
  - sometimes without proper sentences
  - hospital discharge summaries are more structured (lists of medications, symptoms, etc.)
- Terminological variability: orthographic variants, acronyms, local conventions
- Various sections: previous history, social/family background

NLP challenges in clinical data

- A series of international challenges in information extraction from clinical narratives
- Organisers: Informatics for Integrating Biology & the Bedside (i2b2)
- Three shared tasks so far:
  - De-identification of medical records and identification of smokers from their clinical records (2007)
  - Identification of obesity & related diseases in patients from hospital discharge documents (2008)
  - Extraction of medications and related information from patients' discharge documents (2009)
- 2010 challenge: concepts, assertions, relations

i2b2 2008

- Task: extract the status of diseases in patients
  - diseases, e.g. obesity, diabetes mellitus, hypercholesterolemia, hypertriglyceridemia, hypertension, heart failure (16 in total)
  - status: yes, no, unmentioned or questionable, judged at the textual and the "intuitive" level
- 28 teams worldwide; UoM ranked 1st on the textual and 7th on the intuitive level
- Our methodology:
  - term-based exact and approximate matching
  - context-based pattern- and rule-based matching
  - machine learning approach

Yang, H., Spasic, I., Keane, J., Nenadic, G.: A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries, JAMIA 16(4):596-600
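The term-based step can be illustrated with a minimal Python sketch: a disease is reported when one of its dictionary variants occurs verbatim, or when a run of words is close enough to a variant according to `difflib.SequenceMatcher`. The mini-dictionary `DISEASE_TERMS`, the function `find_diseases`, the 0.85 similarity threshold and the 6-character cut-off for fuzzy matching are hypothetical choices for illustration, not the resources or settings of the UoM system.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-dictionary: canonical disease -> variants (synonyms, acronyms).
DISEASE_TERMS = {
    "heart failure": ["heart failure", "congestive heart failure", "CHF"],
    "diabetes mellitus": ["diabetes mellitus", "diabetes", "DM"],
    "hypertension": ["hypertension", "HTN"],
}

MIN_FUZZY_LENGTH = 6  # assumption: short acronyms are only matched exactly


def find_diseases(text: str, threshold: float = 0.85) -> set[str]:
    """Return canonical diseases mentioned in `text`, exactly or approximately."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    found = set()
    for disease, variants in DISEASE_TERMS.items():
        for variant in variants:
            # Exact match on word boundaries, ignoring case.
            if re.search(rf"\b{re.escape(variant)}\b", text, re.IGNORECASE):
                found.add(disease)
                break
            if len(variant) < MIN_FUZZY_LENGTH:
                continue
            # Approximate match: compare the variant with every window of the
            # same number of words (catches typos such as "hypertensoin").
            width = len(variant.split())
            windows = (" ".join(tokens[i:i + width])
                       for i in range(len(tokens) - width + 1))
            if any(SequenceMatcher(None, variant.lower(), w).ratio() >= threshold
                   for w in windows):
                found.add(disease)
                break
    return found


print(sorted(find_diseases("Hx of hypertensoin and CHF.")))
# ['heart failure', 'hypertension']
```

Restricting fuzzy matching to longer variants is one simple way to stop two- or three-letter acronyms from matching unrelated short words; the exact cut-off would need tuning on real notes.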

Methodology

- Linguistic pre-processing: section splitting, sentence splitting, chunking, POS tagging, parsing
- Information extraction (rules, machine learning): textual evidence extraction, section filtering, morphological clues (e.g. drug/disease name affixes)
- Medical resources: disease names, drug names, body parts, symptoms, abbreviations, synonyms
- Constructing results: template filling, filtering negative results, relations and heuristics:
  - Organ : Symptom
  - Symptom : Disease
  - Disease : Drug
  - Drug : Mode of application
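Two of the steps listed above, section splitting and affix-based morphological clues, can be sketched in Python as follows. The header regular expression, the `DRUG_SUFFIXES` tuple (a few common drug-name stems such as -olol and -pril) and both function names are illustrative assumptions; sentence splitting, chunking, tagging and parsing are left out.

```python
import re

# Heuristic (assumption): a line starting with 4+ upper-case letters/spaces
# followed by a colon opens a new section, e.g. "SOCIAL HISTORY:".
SECTION_HEADER = re.compile(r"^([A-Z][A-Z /&]{3,}):\s*(.*)$")

# A few common drug-name stems: beta blockers (-olol), ACE inhibitors (-pril),
# angiotensin receptor blockers (-sartan), statins (-statin). Not exhaustive.
DRUG_SUFFIXES = ("olol", "pril", "sartan", "statin")


def split_sections(note: str) -> dict[str, str]:
    """Map each section header to the text that follows it."""
    sections: dict[str, str] = {}
    current = "PREAMBLE"  # text before the first header, if any
    for raw_line in note.splitlines():
        line = raw_line.strip()
        header = SECTION_HEADER.match(line)
        if header:
            current = header.group(1).strip()
            sections.setdefault(current, "")
            line = header.group(2)  # text on the same line as the header
        if line:
            sections[current] = (sections.get(current, "") + " " + line).strip()
    return sections


def drug_candidates(text: str) -> list[str]:
    """Words whose ending matches one of the drug-name stems."""
    return [w for w in re.findall(r"[A-Za-z]+", text)
            if w.lower().endswith(DRUG_SUFFIXES)]


note = ("HISTORY OF PRESENT ILLNESS: Chest pain.\nMEDICATIONS:\n"
        "metoprolol 25 mg\nlisinopril 10 mg\nSOCIAL HISTORY: Nonsmoker.")
sections = split_sections(note)
print(drug_candidates(sections["MEDICATIONS"]))  # ['metoprolol', 'lisinopril']
```

Splitting first lets later steps apply section filtering, for instance looking for current medications only in the medication section rather than in the family history.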

Rule-based IE

Disease status patterns (Y = yes, N = no, Q = questionable, U = unmentioned):
- context-based patterns, e.g.
  - [N] negative for CHF
  - [Q] question of asthma
  - [U] no known diagnosis of CAD
  - [U] we should consider further asthma studies as an outpatient
- semantics-based patterns, e.g.
  - [N] normal coronaries
  - [N] a thin black man
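The context- and semantics-based patterns above could be encoded roughly as in this Python sketch. The pattern list only mirrors the slide's examples, the assignment of "normal coronaries" to CAD and "thin" to obesity is an assumption, and `status_from_context` is a hypothetical helper; returning `None` leaves the decision to other components.

```python
import re
from typing import Optional

# Context patterns modelled on the examples; "{term}" is replaced by a
# disease name or abbreviation. Illustrative, not the UoM pattern set.
CONTEXT_PATTERNS = [
    ("N", r"\bnegative for {term}\b"),
    ("Q", r"\bquestion of {term}\b"),
    ("U", r"\bno known diagnosis of {term}\b"),
    ("U", r"\bconsider further {term} studies\b"),
]

# Semantics-based cues: phrases implying a status without naming the disease
# (the disease each cue belongs to is an assumption).
SEMANTIC_CUES = {
    "CAD": [("N", r"\bnormal coronaries\b")],
    "obesity": [("N", r"\bthin\b")],
}


def status_from_context(sentence: str, disease: str,
                        variants: list[str]) -> Optional[str]:
    """Return N/Q/U if a pattern fires for `disease`, else None."""
    for variant in variants:
        term = re.escape(variant)
        for status, template in CONTEXT_PATTERNS:
            if re.search(template.replace("{term}", term), sentence, re.IGNORECASE):
                return status
    for status, pattern in SEMANTIC_CUES.get(disease, []):
        if re.search(pattern, sentence, re.IGNORECASE):
            return status
    return None


print(status_from_context("Patient is negative for CHF.", "heart failure", ["CHF"]))  # N
```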

Clinical resources used in sentence extraction:
- clinical inference rules, e.g. weight > 90 kg, LDL > 160 mg/dl, HDL < 35 mg/dl
- medications, e.g. 'anti-depressant'
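The numeric inference rules and medication cues can be sketched in Python as below. The three thresholds come from the slide; the regular expressions for finding values, the `MEDICATION_CUES` mapping of 'anti-depressant' to depression, and the function names are assumptions. The sketch only reports which rules fire, because the slide does not say which disease each threshold supports.

```python
import re

# (rule name, value pattern, test): thresholds as listed on the slide;
# the patterns for locating values in free text are illustrative assumptions.
NUMERIC_RULES = [
    ("weight>90kg",
     re.compile(r"\bweight\b\D{0,10}?(\d+(?:\.\d+)?)\s*kg\b", re.IGNORECASE),
     lambda v: v > 90),
    ("LDL>160mg/dl",
     re.compile(r"\bLDL\b\D{0,10}?(\d+(?:\.\d+)?)\s*mg/dl\b", re.IGNORECASE),
     lambda v: v > 160),
    ("HDL<35mg/dl",
     re.compile(r"\bHDL\b\D{0,10}?(\d+(?:\.\d+)?)\s*mg/dl\b", re.IGNORECASE),
     lambda v: v < 35),
]

# Hypothetical medication-class cues pointing at a disease.
MEDICATION_CUES = {"anti-depressant": "depression", "antidepressant": "depression"}


def fired_rules(text: str) -> list[str]:
    """Names of the numeric rules whose threshold is met somewhere in `text`."""
    fired = []
    for name, pattern, test in NUMERIC_RULES:
        for match in pattern.finditer(text):
            if test(float(match.group(1))):
                fired.append(name)
                break  # one satisfying value is enough evidence
    return fired


def medication_evidence(text: str) -> set[str]:
    """Diseases suggested by medication-class mentions."""
    lowered = text.lower()
    return {disease for cue, disease in MEDICATION_CUES.items() if cue in lowered}


print(fired_rules("Weight: 102 kg. LDL 175 mg/dl, HDL 30 mg/dl."))
# ['weight>90kg', 'LDL>160mg/dl', 'HDL<35mg/dl']
```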

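Textual Annotation Results

Performance on disease status (ranked 1st):
- Micro-average: accuracy 0.9723
- Macro-average: P 0.8482, R 0.7737, F-score 0.8052

Status  #Eval  #Correct  #Gold  Precision  Recall  F-score
Y        2267      2132   2192     0.9404  0.9726   0.9562
N          56        40     65     0.7142  0.6153   0.6611
Q          12         9     17     0.7500  0.5294   0.6206
U        5709      5640   5770     0.9879  0.9774   0.9826

(#Eval = statuses assigned by the system, #Correct = statuses assigned correctly, #Gold = statuses in the gold standard; Precision = #Correct/#Eval, Recall = #Correct/#Gold)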

Intuitive Annotation Results

Performance on disease status (ranked 7th):
- Micro-average: accuracy 0.9572
- Macro-average: P 0.6383, R 0.6294, F-score 0.6336

Status  #Eval  #Correct  #Gold  Precision  Recall  F-score
Y        2160      2068   2285     0.9574  0.9050   0.9304
N        5236      5014   5100     0.9576  0.9831   0.9702
Q           3         0     14     0.0000  0.0000   0.0000
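The averages in both tables follow from the per-status counts. The Python sketch below recomputes them under two assumptions: accuracy is the total number of correct statuses divided by the total number of gold-standard statuses (both tables have equal #Eval and #Gold totals, 8044 and 7399), and the macro F-score is the mean of the per-status F-scores. Taking the harmonic mean of macro P and R instead would give about 0.809 for the textual task rather than the reported 0.8052. `StatusCounts`, `prf` and `summarize` are names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class StatusCounts:
    evaluated: int  # statuses assigned by the system (#Eval)
    correct: int    # statuses assigned correctly (#Correct)
    gold: int       # statuses in the gold standard (#Gold)


def prf(c: StatusCounts) -> tuple[float, float, float]:
    """Per-status precision, recall and F-score (0.0 when undefined)."""
    p = c.correct / c.evaluated if c.evaluated else 0.0
    r = c.correct / c.gold if c.gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def summarize(table: dict[str, StatusCounts]) -> dict[str, float]:
    scores = [prf(c) for c in table.values()]
    n = len(scores)
    return {
        # Micro-averaged accuracy: all correct decisions over all gold decisions.
        "accuracy": sum(c.correct for c in table.values())
                    / sum(c.gold for c in table.values()),
        # Macro-averages: unweighted means over the status classes.
        "macro_p": sum(s[0] for s in scores) / n,
        "macro_r": sum(s[1] for s in scores) / n,
        "macro_f": sum(s[2] for s in scores) / n,
    }


textual = {
    "Y": StatusCounts(2267, 2132, 2192),
    "N": StatusCounts(56, 40, 65),
    "Q": StatusCounts(12, 9, 17),
    "U": StatusCounts(5709, 5640, 5770),
}
intuitive = {
    "Y": StatusCounts(2160, 2068, 2285),
    "N": StatusCounts(5236, 5014, 5100),
    "Q": StatusCounts(3, 0, 14),
}

for name, table in [("textual", textual), ("intuitive", intuitive)]:
    print(name, {k: round(v, 4) for k, v in summarize(table).items()})
```

Rounded to four decimals, this reproduces the micro- and macro-averaged figures reported for both tasks. Note that the per-status values in the tables appear to be truncated rather than rounded (40/56 = 0.71428... is listed as 0.7142).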

i2b2 2009

- Task: extract mentions of medications and related information
  - drugs the patient takes
  - for each mention: dose, mode of application, frequency, duration, etc.
- 19 teams worldwide; UoM ranked 3rd
- Our approach was based on combining extensive dictionaries with morphological and derivational patterns
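A dictionary-plus-pattern approach to filling one template per medication mention can be sketched in Python as follows. The tiny `DRUG_DICTIONARY`, the field regular expressions, the rule that attributes are searched only between one drug mention and the next, and `extract_medications` itself are assumptions for illustration; the "reason" field is not handled.

```python
import re

# Hypothetical drug dictionary; a real system needs far larger resources.
DRUG_DICTIONARY = {"aspirin", "metoprolol", "lisinopril", "vancomycin"}

# Illustrative field patterns (assumptions, far from complete).
FIELD_PATTERNS = {
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|units?)\b", re.IGNORECASE),
    "mode": re.compile(r"\b(?:p\.?o\.?|i\.?v\.?|s\.?c\.?|by mouth)(?!\w)", re.IGNORECASE),
    "frequency": re.compile(r"\b(?:daily|b\.?i\.?d\.?|t\.?i\.?d\.?|q\.?d\.?|q\d+h)(?!\w)",
                            re.IGNORECASE),
    "duration": re.compile(r"\bfor\s+\d+\s+(?:days?|weeks?|months?)\b", re.IGNORECASE),
}

# Longest names first so that longer drug names win over their prefixes.
DRUG_PATTERN = re.compile(
    r"\b(?:" + "|".join(sorted(map(re.escape, DRUG_DICTIONARY), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_medications(text: str) -> list[dict]:
    """Fill one template per drug mention from the text that follows it."""
    mentions = list(DRUG_PATTERN.finditer(text))
    records = []
    for i, mention in enumerate(mentions):
        # Attribute window: up to the next drug mention (or end of text).
        end = mentions[i + 1].start() if i + 1 < len(mentions) else len(text)
        window = text[mention.end():end]
        record = {"medication": mention.group(0)}
        for field, pattern in FIELD_PATTERNS.items():
            found = pattern.search(window)
            record[field] = found.group(0) if found else None
        records.append(record)
    return records


for record in extract_medications(
        "Aspirin 81 mg po daily; metoprolol 25 mg p.o. b.i.d. for 7 days."):
    print(record)
```

The window heuristic is what makes the sketch a form of template filling: each attribute is attached to the nearest preceding drug name, which works for list-like medication sections but not for free narrative.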

Evaluation (F-measure)

Medication   83.59%
Dosage       82.67%
Frequency    83.49%
Mode         85.33%
Duration     51.00%
Reason       38.81%
All fields   78.47%

Spasić, I., Sarafraz, F., Keane, J.A., Nenadic, G.: Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules, JAMIA (to appear)

Summary

- NLP and text mining techniques are useful for extracting clinical data:
  - disease status extraction: 95-97% accuracy
  - medication information extraction: around 80% F-measure
- Reliable and sufficient resources need to be constructed:
  - clinical terms and abbreviations (e.g. disease synonyms, symptoms, drugs)
  - context patterns related to diseases, medication, etc.
- Domain knowledge is required:
  - construction of domain- and task-specific resources
  - complex clinical facts and conditions for inference
  - more comprehensive knowledge representation is needed