IEBI Workshop-10/23/07
Challenges in Evaluating Natural Language Processing Systems
for Military Health Records
Carol Friedman, PhD
Columbia University/MedLEE Applications Technologies
Lawrence Fagan, MD, PhD
Stanford University/MedLEE Applications Technologies
Outline
• NLP evaluation issues
• Ideal evaluation of NLP output requires consideration of the context of the applications
• Catalog of common NLP applications in biomedicine and the implication for evaluation
Different Evaluation Objectives
• Different NLP communities have different objectives and traditions
Improvement of:
– Science of NLP
– Science of biomedical NLP
– Biological research
– Clinical research
– Clinical care
Evaluation Objectives Determine
• Evaluation design
• NLP requirements
– Type of information needed
  • Medical terms with/without modifiers
  • Clinical & other external knowledge
– End product
  • Codes, facts, yes/no categories
Evaluation to Improve Clinical Research and Care
Issues to Consider
• Need to start with a concrete clinical goal
– Detect potential case of tuberculosis in chest x-ray report for isolation
– Detect positive mammography reports for follow up
– Find new adverse events to find ways to avoid them
Type of Task: Broad vs. Narrow
• Very specific application
– Identify reports of patients who smoke
– Identify x-ray reports positive for pneumonia
• General application
– Data mining & knowledge discovery
– Generate patient problem list
Application Requires NLP + External Knowledge
• Structural knowledge
– Extract diagnoses from Diagnosis Section of Discharge Summaries
• Coding knowledge
– ICD-9 coding of x-ray reports for billing
  • 486 (pneumonia) for infiltrate in CXR
• Clinical knowledge
– Identifying x-ray reports indicating pneumonia
  • ~38 different combinations of findings & modifiers
NLP Components
• Different steps of process impact results
• Preprocess: clean-up, recognize text portions and boundaries, …
• Extraction Engine: recognize entities, relations, generate codes, …
• Post-process: clinical logic for application
CXR Findings: opacity mod: patchy loc: left lung ........
Pneumonia: possible
NLP Components
Preprocess → Extraction Engine → Post-process
CXR Findings: opacity mod: 5x5cm loc: left lung ........
Pneumonia: unlikely
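The three-stage pipeline on these slides can be sketched roughly as follows. This is a toy illustration, not MedLEE's implementation: the function names, the keyword matching, and the post-processing rule are all hypothetical. The only details taken from the slides are the stage names and the two example outcomes (patchy opacity gives "possible", a 5x5cm opacity gives "unlikely").

```python
import re

def preprocess(report):
    """Clean-up stage: normalize whitespace and split into rough sentences."""
    text = re.sub(r"\s+", " ", report).strip().lower()
    return [s.strip() for s in re.split(r"[.;]", text) if s.strip()]

def extract(sentences):
    """Extraction stage (toy): find 'opacity' with a modifier and a location."""
    findings = []
    for s in sentences:
        if "opacity" not in s:
            continue
        finding = {"finding": "opacity", "mod": None, "loc": None}
        if "patchy" in s:
            finding["mod"] = "patchy"
        size = re.search(r"\d+\s*x\s*\d+\s*cm", s)
        if size:
            finding["mod"] = size.group(0).replace(" ", "")
        if "left lung" in s:
            finding["loc"] = "left lung"
        findings.append(finding)
    return findings

def postprocess(findings):
    """Post-process stage: hypothetical clinical logic for a pneumonia screen."""
    if not findings:
        return "not assessed"
    if any(f["mod"] == "patchy" for f in findings):
        return "possible"
    return "unlikely"

def run_pipeline(report):
    return postprocess(extract(preprocess(report)))

print(run_pipeline("Patchy opacity in the left lung."))
print(run_pipeline("5x5cm opacity in the left lung."))
```

Because each stage feeds the next, a wrong final label could come from sentence splitting, from a missed modifier, or from the clinical rule. Scoring only the final label cannot tell these apart, which is why the intermediate outputs are kept as separate functions here.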
Use of Experts
• Need guidelines and examples
• How much to train
• Inter-annotator agreement & resolution
• Borderline cases confound results
• Granularity issues
– Comparability
Fever (mod: persistent)
Persistent fever
SNOMED codes: persistent fever; chronic persistent fever; prolonged fever; fever (mod: persistent)
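One common way to quantify inter-annotator agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not name a specific statistic, so kappa is an assumption here, and the annotator labels below are invented for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two experts on ten reports.
expert_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos", "neg"]
expert_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
print(round(cohen_kappa(expert_1, expert_2), 3))
```

Exact label matching is itself a granularity decision: if one expert records "persistent fever" and the other "fever (mod: persistent)", the labels would need to be normalized before a comparison like this is meaningful.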
Document Heterogeneity & Complexity of Text
• Chief complaints (‘well baby 3 mo’, ‘c/f/h’)
• Discharge summaries, radiology reports
• Reports with structured & unstructured information
• Telegraphic notes
• Special templates
“Well-Structured” Reports: Chest Radiology Report
CLINICAL INFORMATION: F/U.
IMPRESSION: MODERATE PULMONARY VASCULAR CONGESTION AND INTERSTITIAL EDEMA SHOWS NO SIGNIFICANT CHANGE FROM 3/25 THROUGH 3/27/95. SIDE HOLE OF THE NG TUBE IS NEAR THE EG JUNCTION. DEVELOPMENT OF RIGHT BASILAR ATELECTASIS ON 3/27/95.
DESCRIPTION: A series of portable chest x-rays demonstrate worsening but stable vascular congestion and interstitial edema from 3/25 through 3/27/95. The NG tube side hole is seen near the EG junction. A duo-tube is seen extending into the stomach, but its distal tip is not seen. A tracheostomy is seen in good position.
……………………………………………………..
Mixed Structure: Catheterization Report
Poorly Structured Report: Telegraphic Note
Admit 10/23
71 yo woman h/o DM, HTN, Dilated CM/CHF, Afib s/p embolic event, chronic diarrhea, admitted with SOB. CXR pulm edema. Rx’d Lasix.
All: none
Meds Lasix 40mg IVP bid, ASA, Coumadin 5, Prinivil 10, glucophage 850 bid, glipizide 10 bid, immodium prn
Hospitalist=Smith PMD=Jones Full Code, Cx>101
Reducing Potential Bias
NLP developers should avoid
– Designing study
– Being involved in choice or determination of reference standard
– Correcting bugs
– Changing system
– Performing actual evaluation
Analyzing Results & Errors
• Determine effect of components on performance
– NLP vs. domain knowledge
– Document characteristics/quirks
– Frequency of adding/updating clinical terms
– Type of NLP task: classification/information extraction/specialized
– Borderline situations
• Report degree of complexity needed to correct errors
• Determine if performance is adequate for task
• Report on confidence intervals
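As an illustration of reporting a confidence interval alongside a point estimate such as sensitivity or precision, the sketch below computes a Wilson score interval for a proportion. The slides do not say which interval method to use, so the Wilson interval is an assumption, and the counts are invented.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 for roughly 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half_width, center + half_width

# Hypothetical result: the system found 45 of 50 reference-standard positives.
low, high = wilson_interval(45, 50)
print(f"sensitivity 0.90, 95% CI {low:.2f} to {high:.2f}")  # roughly 0.79 to 0.96
```

With only 50 positive cases the interval is wide, which matters when deciding whether performance is adequate for the task.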
Other Issues: Clinical Environment
• Heterogeneity
– Systems
– Document formats
– Document types
– Clinical Domain
• Working with physicians
• Clinical evaluation tradition
• Workflow issues
Patient Documents
• Lack of access to patient records
– Significant bottleneck for NLP progress
• Difficult to get permission to share from health care institutions
• Large scale effort needed to establish scrubbed document sets for development and evaluation
• Individual efforts beneficial but limited and scattered
Outline
• NLP Evaluation Issues
• Ideal evaluation of NLP output requires consideration of the context of the applications
• Catalog of common NLP applications in biomedicine and the implication for evaluation
Context-based Evaluation: Example Record
• Chief Complaint: Asthma re-evaluation.
• Subjective: 8 year-old girl with past history of moderate persistent asthma while living in Alaska until 2 years ago
• The primary triggers for her asthma have been viral colds and irritant exposure, and she had particular difficulty with the forest fire smoke in central Alaska.
• She also has a history of a low serum IgA. Her last IgA determination was August 2004, which showed an IgA level of 29 mg/dl, with the lower limit of normal for a child her age being 33.
Context-based Evaluation
• Chief Complaint: Asthma re-evaluation.
• Subjective: 8 year-old girl with past history of moderate persistent asthma while living in Alaska until 2 years ago
• Tasks: Disease Maintenance Summarization vs. Infectious Disease Reporting
Context-based Evaluation
• Chief Complaint: Asthma re-evaluation.
• …
• The primary triggers for her asthma have been viral colds and irritant exposure, and she had particular difficulty with the forest fire smoke in central Alaska.
• …
• Tasks: Disease Maintenance Summarization vs. Infectious Disease Reporting
Context-based Evaluation
• Chief Complaint: Asthma re-evaluation.
• …
• She also has a history of a low serum IgA. Her last IgA determination was August 2004, which showed an IgA level of 29 mg/dl, with the lower limit of normal for a child her age being 33.
• Task: Disease Maintenance Summarization vs. Infectious Disease Reporting
Outline
• NLP evaluation issues
• Ideal evaluation of NLP output requires consideration of the context of the applications
• Catalog of common NLP applications in biomedicine and the implication for evaluation
Potential NLP Applications
• Health reporting requirements
• Known disease surveillance
• Unknown disease surveillance
• Recognizing adverse drug reaction
• Quality assurance/avoiding clinical errors
• Charge capture
• Recognizing scientific relations in text databases
Health Reporting Requirements
• Example: Reporting new TB cases
• Task description: Governmental requirements that certain disease states must be identified within a period after the original information (typically diagnosis) is identified.
• Task requirements: Text may be confined to one or more sections of record. May require inference to identify disease state. May be easier to get the “right” answer than other apps.
Known Disease Surveillance
• Example: Locating Hospital Acquired (nosocomial) infections
• Task description: Looking at a set of fixed reports for specific findings or combination of findings that suggest disease state
• Task requirements: Need to combine free text with structured text such as lab reports, and existing codes (e.g., ICD-9 coding on discharge)
“Unknown” Disease Surveillance
• Example: Looking for the next “gulf war syndrome.”
• Task description: By far the most difficult task, because it is not clear what is being searched for. Looking for a pattern of signs, symptoms, lab tests, time course, etc., not explained by known patterns
• Task requirements: Every concept is potentially relevant plus need significant inference to determine novelty of problem.
Recognizing Adverse Drug Reactions
• Example: Searching for known (and possibly unknown) side effects of treatments
• Task description: Side effect profiles are known for many drugs/regimens. Early recognition of onset of those side effects important to decreasing morbidity
• Task requirements: Temporal relationship between treatment and possible side effects important to glean from narrative.
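The temporal requirement can be illustrated with a small check that keeps only the events whose onset falls within a window after treatment start. This is a sketch under strong assumptions: the dates are taken as already extracted from the narrative (which is the hard part and is not shown), the 30-day window is arbitrary, and the event names and dates are invented.

```python
from datetime import date, timedelta

def events_after_treatment(treatment_start, events, window_days=30):
    """Return names of events with onset within window_days after treatment start.

    events is a list of (name, onset_date) pairs, assumed already extracted.
    """
    window_end = treatment_start + timedelta(days=window_days)
    return [name for name, onset in events
            if treatment_start <= onset <= window_end]

# Hypothetical extracted events for one patient.
events = [
    ("rash", date(2007, 3, 5)),     # 4 days after treatment start
    ("cough", date(2007, 2, 20)),   # before treatment, so not a candidate
    ("nausea", date(2007, 5, 1)),   # outside the 30-day window
]
candidates = events_after_treatment(date(2007, 3, 1), events)
print(candidates)
```

An evaluation of this application would need a reference standard that records onset times as well as the events themselves, since a correctly extracted event with the wrong date yields the wrong answer.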
Quality Assurance/Avoiding Clinical Errors
• Example: Flagging contra-indicated treatments due to a drug allergy
• Task description: Extract from narrative signs/symptoms/lab tests that suggest unanticipated response to prior treatment.
• Task requirements: Combining concepts from narrative with structured parts of records and comparing to guidelines/protocols
Charge Capture
• Example: Locating clinic/hospital charges that have not been otherwise captured
• Task description: Scan narrative for suggestion of procedures performed or supplies used that have not been billed
• Task requirements: Inferring actions from narrative and comparing with billing codes. Concepts are well defined and can be enumerated.
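Because the concepts are enumerable, the comparison step reduces to a set difference between what the narrative suggests and what was billed. The sketch below assumes the procedures have already been inferred from the narrative and normalized to the same vocabulary as the billing records. The item names are invented, and plain strings stand in for real billing codes.

```python
def uncaptured_charges(narrative_items, billed_items):
    """Return items mentioned in the narrative that have no matching billed item."""
    billed = {item.strip().lower() for item in billed_items}
    seen = set()
    missing = []
    for item in narrative_items:
        key = item.strip().lower()
        if key not in billed and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing

# Hypothetical inputs: items inferred from a note vs. items on the bill.
from_narrative = ["Chest x-ray", "NG tube placement", "chest x-ray"]
from_billing = ["chest x-ray"]
print(uncaptured_charges(from_narrative, from_billing))
```

The set difference itself is trivial. An evaluation would mostly measure how well the upstream inference and normalization steps work, since a missed or mis-normalized mention shows up here as a false alarm or a missed charge.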
Recognizing scientific relations in text databases
• Example: Finding protein-protein interactions in the PubMed database
• Task description: Scan abstracts to identify protein names and description of relationships
• Task requirements: Requires understanding of naming schemes in biology and ability to handle naming issues. Inference to identify correctly the relationship described in the text
Summary
• Overview of evaluation issues
• Key point: evaluation requires consideration of the context of the applications
• Catalog of common NLP applications in biomedicine and the implication for evaluation