IEBI Workshop- 10/23/07 Challenges in Evaluating Natural Language Processing Systems for Military Health Records Carol Friedman, PhD Columbia University/MedLEE Applications Technologies Lawrence Fagan, MD, PhD Stanford University/MedLEE Applications Technologies



Outline

• NLP evaluation issues

• Ideal evaluation of NLP output requires consideration of the context of the applications

• Catalog of common NLP applications in biomedicine and the implication for evaluation


Different Evaluation Objectives

• Different NLP communities have different objectives and traditions

Improvement of:
– Science of NLP
– Science of biomedical NLP
– Biological research
– Clinical research
– Clinical care


Evaluation Objectives Determine

• Evaluation design
• NLP requirements
  – Type of information needed
    • Medical terms with/without modifiers
    • Clinical & other external knowledge
  – End product
    • Codes, facts, yes/no categories


Evaluation to Improve Clinical Research and Care

Issues to Consider


• Need to start with a concrete clinical goal
  – Detect potential cases of tuberculosis in chest x-ray reports, for isolation
  – Detect positive mammography reports for follow-up
  – Find new adverse events in order to find ways to avoid them


Type of Task: Broad vs. Narrow

• Very specific application
  – Identify reports of patients who smoke
  – Identify x-ray reports positive for pneumonia

• General application
  – Data mining & knowledge discovery
  – Generate patient problem list


Application Requires NLP + External Knowledge

• Structural knowledge
  – Extract diagnoses from the Diagnosis section of discharge summaries

• Coding knowledge
  – ICD-9 coding of x-ray reports for billing, e.g., 486 (pneumonia) for an infiltrate on a CXR

• Clinical knowledge
  – Identifying x-ray reports indicating pneumonia
    • ~38 different combinations of findings & modifiers


NLP Components

• Different steps of the process impact results

Preprocess → Extraction Engine → Post-process
  – Preprocess: clean-up, recognize text portions and boundaries, …
  – Extraction Engine: recognize entities, relations, generate codes, …
  – Post-process: clinical logic for application

Example output:
  CXR Findings: opacity (mod: patchy; loc: left lung) …
  → Pneumonia: possible


NLP Components (continued)

Preprocess → Extraction Engine → Post-process

Example output:
  CXR Findings: opacity (mod: 5x5cm; loc: left lung) …
  → Pneumonia: unlikely
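The point of these two examples is that the same kind of extraction output can lead to different application answers depending on the post-process clinical logic. The following Python sketch wires the three components together for the two CXR examples. It is not MedLEE: the regular expression, the `Finding` frame fields and the rule "patchy opacity means possible pneumonia, a measured 5x5cm opacity means pneumonia unlikely" are assumptions chosen only to reproduce the two slide outputs.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    finding: str  # e.g. "opacity"
    mod: str      # modifier, e.g. "patchy" or "5x5cm"
    loc: str      # location, e.g. "left lung"


def preprocess(report: str) -> str:
    """Clean-up: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", report).strip().lower()


# Hypothetical pattern for phrases like "patchy opacity in the left lung".
FINDING_RE = re.compile(
    r"(?P<mod>patchy|\d+\s*x\s*\d+\s*cm)\s+(?P<finding>opacity)"
    r"\s+in\s+the\s+(?P<side>left|right)\s+lung"
)


def extract(text: str) -> list[Finding]:
    """Extraction engine: turn matched phrases into structured frames."""
    return [
        Finding(m["finding"], m["mod"].replace(" ", ""), m["side"] + " lung")
        for m in FINDING_RE.finditer(text)
    ]


def postprocess(findings: list[Finding]) -> str:
    """Clinical logic (assumed rule, for illustration only)."""
    if not findings:
        return "not mentioned"
    if any(f.finding == "opacity" and f.mod == "patchy" for f in findings):
        return "possible"
    return "unlikely"  # e.g. a measured, discrete opacity such as 5x5cm


def pneumonia_status(report: str) -> str:
    return postprocess(extract(preprocess(report)))


print(pneumonia_status("Patchy opacity in the LEFT lung."))   # possible
print(pneumonia_status("5 x 5 cm opacity in the left lung."))  # unlikely
```

Keeping the clinical rule in `postprocess` means an evaluation can tell extraction errors apart from errors in the application logic.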


Use of Experts

• Need guidelines and examples
• How much to train
• Inter-annotator agreement & resolution
• Borderline cases confound results
• Granularity issues
  – Comparability: the same finding may appear as
    • Fever (mod: persistent)
    • Persistent fever
    • SNOMED codes: persistent fever, chronic persistent fever, prolonged fever, fever (mod: persistent)
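The slides do not name an agreement statistic. Cohen's kappa is one common choice for two annotators, because it corrects raw agreement for the agreement expected by chance. A minimal sketch, with hypothetical pos/neg labels for four reports:

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: both annotators independently pick the same label.
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two experts for the same four reports.
print(cohens_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))  # 0.5
```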


Document Heterogeneity & Complexity of Text

• Chief complaints (‘well baby 3 mo’, ‘c/f/h’)
• Discharge summaries, radiology reports
• Reports with structured & unstructured information
• Telegraphic notes
• Special templates


“Well-Structured” Reports: Chest Radiology Report

CLINICAL INFORMATION: F/U.

IMPRESSION: MODERATE PULMONARY VASCULAR CONGESTION AND INTERSTITIAL EDEMA SHOWS NO SIGNIFICANT CHANGE FROM 3/25 THROUGH 3/27/95. SIDE HOLE OF THE NG TUBE IS NEAR THE EG JUNCTION. DEVELOPMENT OF RIGHT BASILAR ATELECTASIS ON 3/27/95.

DESCRIPTION: A series of portable chest x-rays demonstrate worsening but stable vascular congestion and interstitial edema from 3/25 through 3/27/95. The NG tube side hole is seen near the EG junction. A duo-tube is seen extending into the stomach, but its distal tip is not seen. A tracheostomy is seen in good position.

…
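Recognizing text portions and boundaries (the preprocessing step) is what lets a system exploit structure such as the section headers in this report. A minimal Python sketch, assuming the list of headers is known in advance; here it holds only the three headers from this example, whereas real report types would need a larger, institution-specific list.

```python
import re

# Assumed header list, taken from the example report only.
HEADERS = ("CLINICAL INFORMATION", "IMPRESSION", "DESCRIPTION")
SECTION_RE = re.compile(r"\b(" + "|".join(HEADERS) + r"):")


def split_sections(report: str) -> dict[str, str]:
    """Map each recognized header to the text up to the next header."""
    matches = list(SECTION_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1)] = report[m.end():end].strip()
    return sections


report = (
    "CLINICAL INFORMATION:F/U. IMPRESSION:MODERATE PULMONARY VASCULAR CONGESTION "
    "AND INTERSTITIAL EDEMA SHOWS NO SIGNIFICANT CHANGE FROM 3/25 THROUGH 3/27/95. "
    "DESCRIPTION:A series of portable chest x-rays demonstrate worsening but stable "
    "vascular congestion and interstitial edema from 3/25 through 3/27/95."
)
print(list(split_sections(report)))  # ['CLINICAL INFORMATION', 'IMPRESSION', 'DESCRIPTION']
```

An application that only needs the conclusion could then restrict extraction to the IMPRESSION section, much as the "structural knowledge" example restricts diagnosis extraction to the Diagnosis section.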


Mixed Structure: Catheterization Report


Poorly Structured Report: Telegraphic Note

Admit 10/23
71 yo woman h/o DM, HTN, Dilated CM/CHF, Afib s/p embolic event, chronic diarrhea, admitted with SOB. CXR pulm edema. Rx’d Lasix.
All: none
Meds Lasix 40mg IVP bid, ASA, Coumadin 5, Prinivil 10, glucophage 850 bid, glipizide 10 bid, immodium prn
Hospitalist=Smith PMD=Jones Full Code, Cx>101
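Expanding abbreviations is one small piece of handling telegraphic text like this note. The Python sketch below uses a tiny hypothetical lexicon. The expansions are standard readings of these abbreviations, but a real lexicon would be far larger, and ambiguous forms (for example `CM`) are deliberately left alone because they need context to resolve.

```python
import re

# Tiny illustrative lexicon (assumption); ambiguous forms are omitted on purpose.
ABBREVIATIONS = {
    "h/o": "history of",
    "DM": "diabetes mellitus",
    "HTN": "hypertension",
    "CHF": "congestive heart failure",
    "Afib": "atrial fibrillation",
    "s/p": "status post",
    "SOB": "shortness of breath",
    "CXR": "chest x-ray",
    "yo": "year-old",
}
# Longer abbreviations are tried first; (?<!\w) and (?!\w) keep matches to whole tokens.
PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?!\w)"
)


def expand(note: str) -> str:
    """Replace each known abbreviation with its expansion."""
    return PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], note)


print(expand("71 yo woman h/o DM, HTN"))
# 71 year-old woman history of diabetes mellitus, hypertension
```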


Reducing Potential Bias

NLP developers should avoid

– Designing the study
– Being involved in choice or determination of the reference standard
– Correcting bugs
– Changing the system
– Performing the actual evaluation


Analyzing Results & Errors

• Determine effect of components on performance
  – NLP vs. domain knowledge
  – Document characteristics/quirks
  – Frequency of adding/updating clinical terms
  – Type of NLP task: classification/information extraction/specialized
  – Borderline situations
• Report degree of complexity needed to correct errors
• Determine if performance is adequate for task
• Report on confidence intervals
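The slides do not prescribe how to compute confidence intervals. The Wilson score interval is one standard choice for a proportion such as precision or recall, and it behaves better than the simple normal approximation on the small test sets typical of clinical NLP evaluations. A sketch, with a hypothetical 45-of-50 result:

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95%."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical: 45 of 50 positive reports correctly identified (recall 0.90).
low, high = wilson_interval(45, 50)
print(f"recall 0.90, 95% CI {low:.3f} to {high:.3f}")
```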


Other Issues: Clinical Environment

• Heterogeneity
  – Systems
  – Document formats
  – Document types
  – Clinical domain
• Working with physicians
• Clinical evaluation tradition
• Workflow issues


Patient Documents

• Lack of access to patient records
  – Significant bottleneck for NLP progress

• Difficult to get permission to share from health care institutions

• Large scale effort needed to establish scrubbed document sets for development and evaluation

• Individual efforts beneficial but limited and scattered


Outline

• NLP Evaluation Issues

• Ideal evaluation of NLP output requires consideration of the context of the applications

• Catalog of common NLP applications in biomedicine and the implication for evaluation


Context-based Evaluation: Example Record

• Chief Complaint: Asthma re-evaluation.
• Subjective: 8-year-old girl with past history of moderate persistent asthma while living in Alaska until 2 years ago

• The primary triggers for her asthma have been viral colds and irritant exposure, and she had particular difficulty with the forest fire smoke in central Alaska.

• She also has a history of a low serum IgA. Her last IgA determination was August 2004, which showed an IgA level of 29 mg/dl, with the lower limit of normal for a child her age being 33.


Context-based Evaluation

• Chief Complaint: Asthma re-evaluation.
• Subjective: 8-year-old girl with past history of moderate persistent asthma while living in Alaska until 2 years ago
• Tasks: Disease Maintenance Summarization vs. Infectious Disease Reporting


Context-based Evaluation

• Chief Complaint: Asthma re-evaluation.
• …
• The primary triggers for her asthma have been viral colds and irritant exposure, and she had particular difficulty with the forest fire smoke in central Alaska.
• …
• Tasks: Disease Maintenance Summarization vs. Infectious Disease Reporting


Context-based Evaluation

• Chief Complaint: Asthma re-evaluation.
• …
• She also has a history of a low serum IgA. Her last IgA determination was August 2004, which showed an IgA level of 29 mg/dl, with the lower limit of normal for a child her age being 33.
• Tasks: Disease Maintenance Summarization vs. Infectious Disease Reporting
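The argument for context-based evaluation can be made concrete by scoring one system output against two task-specific reference standards. In this sketch the extracted concepts and both reference sets are hypothetical (the slides do not list them); the only point is that identical output yields different precision when the task changes.

```python
def precision_recall(system: set[str], reference: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of system output against a reference."""
    tp = len(system & reference)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical concepts extracted from the example record.
system_output = {"asthma", "viral colds", "irritant exposure", "smoke exposure", "low serum IgA"}

# Hypothetical reference standards: one per task.
references = {
    "disease maintenance summarization": {
        "asthma", "viral colds", "irritant exposure", "smoke exposure", "low serum IgA",
    },
    # Hypothetical annotator who marks only infection-related mentions.
    "infectious disease reporting": {"viral colds"},
}

for task, ref in references.items():
    p, r = precision_recall(system_output, ref)
    print(f"{task}: precision={p:.2f} recall={r:.2f}")
```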


Outline

• NLP evaluation issues

• Ideal evaluation of NLP output requires consideration of the context of the applications

• Catalog of common NLP applications in biomedicine and the implication for evaluation


Potential NLP Applications

• Health reporting requirements
• Known disease surveillance
• Unknown disease surveillance
• Recognizing adverse drug reactions
• Quality assurance/avoiding clinical errors
• Charge capture
• Recognizing scientific relations in text databases


Health Reporting Requirements

• Example: Reporting new TB cases

• Task description: Governmental requirements that certain disease states must be identified within a set period after the original information (typically the diagnosis) is identified.

• Task requirements: Text may be confined to one or more sections of the record. Identifying the disease state may require inference. It may be easier to get the “right” answer than in other applications.
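A rough Python sketch of the two parts of this task: deciding from the relevant section whether a TB case is asserted, and computing when the report is due. The 7-day window is a placeholder (actual deadlines depend on the jurisdiction), and the negation check is deliberately naive; the inference the slide mentions is exactly what this sketch does not handle.

```python
import re
from datetime import date, timedelta

# Placeholder window (assumption): real deadlines are set by each jurisdiction.
REPORTING_WINDOW = timedelta(days=7)

TB_RE = re.compile(r"\b(?:tuberculosis|TB)\b", re.IGNORECASE)
# Naive negation: a negation cue followed by a TB mention in the same sentence.
NEGATED_TB_RE = re.compile(
    r"\b(?:no|without|negative for)\b[^.]*\b(?:tuberculosis|TB)\b", re.IGNORECASE
)


def asserts_tb(diagnosis_section: str) -> bool:
    """True if TB is mentioned and no mention is (naively) negated."""
    return bool(TB_RE.search(diagnosis_section)) and not NEGATED_TB_RE.search(diagnosis_section)


def report_deadline(diagnosed_on: date) -> date:
    """Date by which the case must be reported under the placeholder window."""
    return diagnosed_on + REPORTING_WINDOW


section = "Active pulmonary tuberculosis."
if asserts_tb(section):
    print("report due by", report_deadline(date(2007, 10, 23)))  # report due by 2007-10-30
```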


Known Disease Surveillance

• Example: Locating hospital-acquired (nosocomial) infections

• Task description: Looking at a fixed set of reports for specific findings, or combinations of findings, that suggest a disease state

• Task requirements: Need to combine free text with structured data such as lab reports and existing codes (e.g., ICD-9 coding on discharge)
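Combining free text with structured data can be sketched as a rule over three sources. Everything here is an assumption for illustration: the `Admission` layout, the finding list, and the day-3 threshold for a positive culture, which only loosely echoes the common convention that infections appearing after about 48 hours in hospital are considered hospital-acquired. Code 486 is the ICD-9 code for pneumonia, organism unspecified.

```python
from dataclasses import dataclass, field

# Hypothetical evidence lists, for illustration only.
INFECTION_FINDINGS = {"fever", "new infiltrate", "purulent drainage"}
INFECTION_CODES = {"486"}  # ICD-9 486: pneumonia, organism unspecified


@dataclass
class Admission:
    """Hypothetical per-admission view merging three data sources."""
    narrative_findings: set[str]  # concepts extracted by NLP from free text
    positive_culture_days: list[int] = field(default_factory=list)  # lab data: hospital day
    discharge_icd9: set[str] = field(default_factory=set)  # existing discharge codes


def flag_possible_nosocomial(a: Admission, min_day: int = 3) -> bool:
    """Assumed rule: a positive culture on or after hospital day `min_day`,
    supported by narrative evidence or by an infection code."""
    late_culture = any(day >= min_day for day in a.positive_culture_days)
    narrative = bool(a.narrative_findings & INFECTION_FINDINGS)
    coded = bool(a.discharge_icd9 & INFECTION_CODES)
    return late_culture and (narrative or coded)


print(flag_possible_nosocomial(Admission({"fever"}, [4])))  # True
print(flag_possible_nosocomial(Admission({"fever"}, [1])))  # False: culture on day 1
```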


“Unknown” Disease Surveillance

• Example: Looking for the next “Gulf War syndrome.”

• Task description: By far the most difficult task, because it is not clear what is being searched for: a pattern of signs, symptoms, lab tests, time course, etc., not explained by known patterns.

• Task requirements: Every concept is potentially relevant, and significant inference is needed to determine the novelty of the problem.
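Only one small ingredient of this task is easy to sketch: subtract what known patterns explain from each case, then look for leftover findings that recur across cases. The patterns below are hypothetical placeholders, and the sketch ignores time course and lab values, which the slide lists as part of the signal.

```python
from collections import Counter

# Hypothetical "known pattern" profiles: findings each known condition explains.
KNOWN_PATTERNS = {
    "influenza-like illness": {"fever", "cough", "myalgia"},
    "gastroenteritis": {"diarrhea", "vomiting", "fever"},
}


def unexplained(case: set[str]) -> set[str]:
    """Findings not covered by the best-matching known pattern."""
    best = max(KNOWN_PATTERNS.values(), key=lambda pattern: len(case & pattern))
    return case - best


def recurring_unexplained(cases: list[set[str]], min_cases: int = 3) -> dict[str, int]:
    """Unexplained findings that occur in at least `min_cases` cases."""
    counts = Counter(f for case in cases for f in unexplained(case))
    return {f: n for f, n in counts.items() if n >= min_cases}


cases = [{"fever", "cough", "rash", "joint pain"} for _ in range(3)]
print(recurring_unexplained(cases))
```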


Recognizing Adverse Drug Reactions

• Example: Searching for known (and possibly unknown) side effects of treatments

• Task description: Side-effect profiles are known for many drugs/regimens. Early recognition of the onset of those side effects is important for decreasing morbidity.

• Task requirements: The temporal relationship between treatment and possible side effects is important to glean from the narrative.
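The temporal check at the heart of this task can be sketched once the NLP step has produced dated events from the narrative (the hard part, which this sketch assumes is already done). The side-effect table and the 90-day onset window are illustrative assumptions; cough and angioedema are well-known effects of ACE inhibitors such as lisinopril.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative side-effect profile and onset window (assumptions).
KNOWN_SIDE_EFFECTS = {"lisinopril": {"cough", "angioedema"}}
ONSET_WINDOW = timedelta(days=90)


@dataclass
class Event:
    concept: str  # finding extracted from the narrative
    when: date    # date the NLP step attached to the finding


def possible_adverse_reactions(drug: str, started: date, events: list[Event]) -> list[Event]:
    """Known side effects of `drug` whose onset falls on or after the start
    date and within the assumed onset window."""
    effects = KNOWN_SIDE_EFFECTS.get(drug, set())
    return [
        e for e in events
        if e.concept in effects and started <= e.when <= started + ONSET_WINDOW
    ]


events = [
    Event("cough", date(2007, 3, 1)),   # after start: candidate
    Event("cough", date(2006, 12, 1)),  # before start: not attributable
    Event("rash", date(2007, 3, 1)),    # not in this toy profile
]
print(possible_adverse_reactions("lisinopril", date(2007, 2, 1), events))
```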


Quality Assurance/Avoiding Clinical Errors

• Example: Flagging contraindicated treatments due to a drug allergy

• Task description: Extract from the narrative signs/symptoms/lab tests that suggest an unanticipated response to prior treatment.

• Task requirements: Combining concepts from the narrative with structured parts of the record and comparing them to guidelines/protocols
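Comparing narrative-derived concepts with structured orders can be sketched as a class-aware allergy check. The drug-class table is a hypothetical stand-in for a real drug knowledge base, and the allergy list is assumed to have already been extracted by NLP (for example from an "All:" field in a note).

```python
# Hypothetical drug-to-class table; a real system would query a drug knowledge base.
DRUG_CLASS = {
    "penicillin": "penicillins",
    "amoxicillin": "penicillins",
    "ampicillin": "penicillins",
    "ibuprofen": "nsaids",
    "naproxen": "nsaids",
}


def contraindicated_orders(allergies: set[str], orders: list[str]) -> list[str]:
    """Orders that match an allergy by drug name or by drug class."""
    allergy_names = {a.lower() for a in allergies}
    allergy_classes = {DRUG_CLASS.get(a, a) for a in allergy_names}
    flagged = []
    for order in orders:
        name = order.lower()
        if name in allergy_names or DRUG_CLASS.get(name, name) in allergy_classes:
            flagged.append(order)
    return flagged


# "Penicillin" is assumed to come from an allergy field in the narrative.
print(contraindicated_orders({"Penicillin"}, ["Amoxicillin", "Lasix", "ASA"]))  # ['Amoxicillin']
```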


Charge Capture

• Example: Locating clinic/hospital charges that have not been otherwise captured

• Task description: Scan the narrative for suggestions of procedures performed or supplies used that have not been billed

• Task requirements: Inferring actions from the narrative and comparing them with billing codes. Concepts are well defined and can be enumerated.
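Because the relevant concepts can be enumerated, charge capture largely reduces to a set comparison once procedures have been inferred from the narrative. The concept-to-code table below uses made-up placeholder codes (`HYP-...`), not real billing codes.

```python
# Hypothetical concept-to-code table with placeholder codes (not real billing codes).
PROCEDURE_TO_CODE = {
    "ng tube placement": "HYP-NG",
    "tracheostomy": "HYP-TRACH",
    "portable chest x-ray": "HYP-CXR",
}


def unbilled(extracted: set[str], billed_codes: set[str]) -> dict[str, str]:
    """Procedures found in the narrative whose code is absent from billing."""
    return {
        concept: code
        for concept, code in PROCEDURE_TO_CODE.items()
        if concept in extracted and code not in billed_codes
    }


narrative_procedures = {"ng tube placement", "tracheostomy"}  # assumed NLP output
print(unbilled(narrative_procedures, billed_codes={"HYP-TRACH"}))  # {'ng tube placement': 'HYP-NG'}
```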


Recognizing Scientific Relations in Text Databases

• Example: Finding protein-protein interactions in the PubMed database

• Task description: Scan abstracts to identify protein names and descriptions of relationships

• Task requirements: Requires understanding of naming schemes in biology and the ability to handle naming issues, plus inference to correctly identify the relationship described in the text
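A toy Python sketch of the two requirements on this slide: normalizing name variants and recognizing a relationship phrase. The synonym table and the verb patterns are assumptions. TP53/p53 and MDM2/HDM2 are real name variants, but a usable system would need a large curated lexicon and far more than one regular expression.

```python
import re

# Hypothetical synonym table mapping name variants to one identifier.
SYNONYMS = {"p53": "TP53", "tp53": "TP53", "mdm2": "MDM2", "hdm2": "MDM2"}

# Assumed relation phrases; "binds to" is listed before "binds" so it wins.
INTERACTION_RE = re.compile(
    r"\b(\w+)\s+(?:interacts with|binds to|binds)\s+(\w+)", re.IGNORECASE
)


def extract_interactions(abstract: str) -> set[tuple[str, str]]:
    """Normalized, order-independent protein pairs linked by a relation phrase."""
    pairs = set()
    for a, b in INTERACTION_RE.findall(abstract):
        ida, idb = SYNONYMS.get(a.lower()), SYNONYMS.get(b.lower())
        if ida and idb:  # keep only pairs where both names are recognized
            pairs.add(tuple(sorted((ida, idb))))
    return pairs


text = "HDM2 binds to p53 in vitro. We also show that p53 interacts with MDM2."
print(extract_interactions(text))  # {('MDM2', 'TP53')}
```

Normalizing names before comparing is what lets the two differently worded sentences collapse into a single interaction.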


Summary

• Overview of evaluation issues
• Key point: evaluation requires consideration of the context of the applications

• Catalog of common NLP applications in biomedicine and the implication for evaluation