Open Health Natural Language Processing Consortium (OHNLP)

Mayo Clinic: Guergana Savova, Ph.D.; James Masanz

[email protected]

IBM Watson Research: Anni Coden, Ph.D.; Michael Tanenblatt

[email protected]

Overview

• OHNLP? Oh, NLP?

• Demo of a clinical OHNLP system (cTAKES)

• Demo of a medical OHNLP system (MedKAT) with extensions to pathology (/P)

• How can I adapt the system to my data?

• Lively discussion: how can I get involved, OHNLP future steps…

Open Health Natural Language Processing Consortium

• www.ohnlp.org (part of caBIG Vocabulary Knowledge Center web presence)

• Goal

  • Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.

• Two open source releases as part of OHNLP

  • Mayo’s pipeline for processing clinical notes (cTAKES)

  • IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)

Other non-OHNLP clinical NLP Systems

• Proprietary

  • MedLEE (Columbia University)

  • Topaz (University of Pittsburgh)

  • Vanderbilt University

  • caTIES (University of Pittsburgh)

  • MPLUS/Onyx (University of Utah)

  • VA Hospital system

• Open Source

  • i2b2 HITEx (Health Information Text Extraction)

Clinical example: clinical Text Analysis and Knowledge Extraction System (cTAKES)

Presenters: Guergana Savova, James Masanz

Overview

• cTAKES

• Developed at Mayo Clinic

• Goals:

• Phenotype extraction

• Generic – to be used for a variety of retrievals and use cases

• Expandable – at the information model level and methods

• Modular

• Cutting edge technologies – best methods combining existing practices and novel research with rapid technology transfer

• Best software practices (80M+ notes)

• Commitment to both R and D in R&D

cTAKES: Components

• Clinical narrative as a sublanguage

• Core components

  • Sentence boundary detection (OpenNLP technology)

  • Tokenization (rule-based)

  • Morphologic normalization (NLM’s LVG)

  • POS tagging (OpenNLP technology)

  • Shallow parsing (OpenNLP technology)

  • Named Entity Recognition

    • Dictionary mapping (lookup algorithm)

    • Machine learning (MAWUI)

  • Negation and context identification (NegEx)

Output Example: Disorder Object

• “No evidence of unstable angina.”

• Disorder

  • Text: unstable angina

  • Associated code: SNOMED 4557003

  • Named entity type: disease/disorder

  • Status: current

  • Negation: true
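
The disorder object above can be sketched in a few lines of Python. This is only an illustration of dictionary mapping followed by a NegEx-style check, not the cTAKES implementation: the one-entry dictionary, the short trigger list, the five-token negation window, and the hard-coded "current" status are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical one-entry dictionary; the code is the one shown on the slide.
DICTIONARY = {"unstable angina": ("SNOMED 4557003", "disease/disorder")}

# A few NegEx-style trigger phrases (illustrative subset only).
NEGATION_TRIGGERS = ("no evidence of", "denies", "without", "no")

# Number of tokens to look back for a trigger; an assumed value.
NEGATION_WINDOW = 5


@dataclass
class Disorder:
    text: str
    code: str
    entity_type: str
    status: str
    negated: bool


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def find_disorders(sentence):
    """Return a Disorder for every dictionary term found in the sentence."""
    tokens = tokenize(sentence)
    found = []
    for term, (code, entity_type) in DICTIONARY.items():
        term_tokens = tokenize(term)
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != term_tokens:
                continue
            # Search the tokens just before the mention for a negation trigger.
            left = " ".join(tokens[max(0, i - NEGATION_WINDOW):i])
            negated = any(
                re.search(rf"\b{re.escape(trigger)}\b", left)
                for trigger in NEGATION_TRIGGERS
            )
            # Status detection is not modelled here; "current" is hard-coded.
            found.append(Disorder(term, code, entity_type, "current", negated))
    return found
```

Calling `find_disorders("No evidence of unstable angina.")` yields one `Disorder` with `negated=True`; the same mention in "Patient has unstable angina." comes back with `negated=False`.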

Methods

• Preliminary results:

• Savova, Guergana; Kipper-Schuler, Karin; Buntrock, James and Chute, Christopher. 2008. UIMA-based clinical information extraction system. LREC 2008: Towards enhanced interoperability for large HLT systems: UIMA for NLP.

• Manuscript with detailed system description and evaluation under review (JAMIA)

cTAKES demo

Medical example: Medical Knowledge Analysis System

MedKAT and MedKAT/P

Presenters: Anni Coden, Michael Tanenblatt

Overview

• MedKAT and MedKAT/P

• Developed at IBM

• Goal:

• Identification of concepts and their attributes based on a standard or proprietary terminology/ontology

• /P adaptation to pathology reports – relation extraction

• Modular, Generic, Expandable

• Terminology, Conceptual Model

• Easy adaptation to specific corpus and conventions

• Integration into institutional system

• Ongoing commitment to Research and Development

Core Components

• Document structure

• Syntactic tools (tokenization ... Shallow parsing)

• Concept identification

• Negation

• Relationship extraction

Extracted data      F-score
Anatomic site       0.95
Histology           0.98
Size                1.00
Date                1.00
Grade               0.98
Gross Desc          0.80
Lymph Nodes         0.81
Primary Tumor       0.82
Metastatic Tumor    0.65
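
The slide does not say which F-score variant was used. Assuming the usual balanced F1 (the harmonic mean of precision and recall), a score like those above could be computed from raw counts as sketched below; the function name and the counts in the usage note are made up for illustration.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F1 from raw counts; returns 0.0 when it is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_score(8, 2, 2)` corresponds to a precision and recall of 0.8 each, and therefore an F-score of about 0.8.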

Document Structure

Output

Cancer Disease Knowledge Representation Model

Demos

• Query by Model / Cancer

• Detailed view of annotations in Document Analyzer

• http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.index.html

Adaptation

Presenters: Anni Coden, Michael Tanenblatt

• Sentence breaks

• Text case

• Part of speech tags

• Shallow parser

• Dictionary lookup

• Document structure

Sentence Breaks

• Some solutions:

  • Use annotator to re-break sentences

  • Retrain tagger
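
The first solution can be sketched as a post-processing step over the output of an upstream sentence detector. The slide's own examples are not preserved in the transcript, so the failure mode assumed here (a sentence wrongly split after an abbreviation) and the abbreviation list are illustrative guesses only.

```python
# Hypothetical abbreviations after which a sentence break is assumed wrong.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "vs.", "approx."}


def rebreak(sentences):
    """Merge a sentence into its predecessor if that one ends in an abbreviation."""
    merged = []
    for sentence in sentences:
        if not sentence.strip():
            continue  # ignore empty fragments
        if merged and merged[-1].split()[-1].lower() in ABBREVIATIONS:
            merged[-1] = merged[-1] + " " + sentence
        else:
            merged.append(sentence)
    return merged
```

With this sketch, `rebreak(["Seen by Dr.", "Smith today.", "No complaints."])` returns two sentences instead of three. In a UIMA pipeline the same logic would sit in an annotator that adjusts sentence annotations rather than strings.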

Case/Part of Speech Tags

• Some solutions:

  • Retrain tagger

  • Use UIMA annotator to create a “true case” view
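
A rough idea of what a "true case" view might contain, assuming the problem is text typed entirely in upper case that confuses a tagger trained on mixed-case text. The acronym list is hypothetical, and the sketch works on plain strings; a real UIMA annotator would write the result into a separate view of the CAS. Character offsets are preserved so that annotations on the new view can be mapped back to the original text.

```python
import re

# Hypothetical list of tokens that should stay in upper case.
ACRONYMS = {"CT", "MRI", "DNA"}


def true_case(text):
    """Convert all-upper-case text to sentence case, keeping known acronyms."""
    if not text.isupper():
        return text  # mixed-case input is left untouched
    result = []
    start_of_sentence = True
    # Alternate between runs of word characters and runs of everything else.
    for piece in re.findall(r"\w+|\W+", text):
        if re.match(r"\w", piece):
            if piece in ACRONYMS:
                result.append(piece)
            elif start_of_sentence:
                result.append(piece.capitalize())
            else:
                result.append(piece.lower())
            start_of_sentence = False
        else:
            result.append(piece)
            if any(ch in ".!?" for ch in piece):
                start_of_sentence = True
    return "".join(result)
```

For instance, `true_case("CT SCAN OF THE CHEST. NO MASS SEEN.")` gives `"CT scan of the chest. No mass seen."`, with the same length as the input.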

Part of Speech Tags

• Some solutions:

  • Retrain tagger

  • Use dictionary lookup to modify incorrect tags

  • Create rule-based annotator to modify incorrect tags
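
The dictionary-lookup solution can be sketched as a table of overrides applied after tagging. The entries and the Penn Treebank style tags below are hypothetical examples, and an unconditional override like this is deliberately crude; a rule-based annotator would also look at the surrounding tokens.

```python
# Hypothetical domain overrides: lower-cased token -> corrected tag.
TAG_OVERRIDES = {"discharge": "NN", "left": "JJ"}


def correct_tags(tagged_tokens, overrides=TAG_OVERRIDES):
    """Replace the tag of any token that has an entry in the override table."""
    return [
        (token, overrides.get(token.lower(), tag))
        for token, tag in tagged_tokens
    ]
```

Given `[("Left", "VBD"), ("breast", "NN"), ("mass", "NN")]`, the sketch rewrites only the first tag, to `"JJ"`.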

Shallow Parser

Dictionary Lookup

• Dictionary entries can be added, changed, deleted

• Dictionary entry attributes can be added, changed, deleted

• Search parameters can be modified

• Post processing filters

• Tokenization of text and dictionary should be the same
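
The last point is easy to illustrate: if dictionary terms and document text go through the same tokenizer, differences in hyphenation, case, and punctuation no longer prevent a match. The sketch below is an assumption-laden illustration, not the MedKAT or cTAKES lookup; the tokenizer, the `max_len` search parameter, and the code `EXAMPLE-001` are all invented for the example.

```python
import re


def tokenize(text):
    """Shared tokenizer: lower-case, split on anything that is not a letter or digit."""
    return tuple(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(entries):
    """Index dictionary entries by their token sequence."""
    return {tokenize(term): code for term, code in entries.items()}


def lookup(text, index, max_len=4):
    """Return (matched term, code) pairs, preferring the longest match at each position."""
    tokens = tokenize(text)
    hits = []
    for i in range(len(tokens)):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tokens[i:i + n]
            if key in index:
                hits.append((" ".join(key), index[key]))
                break
    return hits
```

With `index = build_index({"Non-small cell carcinoma": "EXAMPLE-001"})`, the text "Findings: non small-cell carcinoma." still matches, because both sides tokenize to the same four tokens.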

Document Structure

• Plain text or XML (e.g., CDA)

• Processes specific document section types (e.g., diagnosis)

• Detection of formatting (e.g., bullets)

• Detection of relations between sections

• Making implicit conventions explicit (e.g., meaning of title)
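
As a minimal sketch of section and formatting detection on a plain-text report, assume that a section starts with an upper-case title followed by a colon at the start of a line, and that bullets are lines starting with "- ". Both conventions are assumptions for the example; real reports vary by institution, which is why this step needs adaptation.

```python
import re

# Assumed convention: upper-case title plus colon at the start of a line.
SECTION_HEADER = re.compile(r"^([A-Z][A-Z ]+):[ \t]*", re.MULTILINE)


def split_sections(report):
    """Map each lower-cased section title to the text of that section."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(report))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).strip().lower()] = report[match.end():end].strip()
    return sections


def bullet_items(section_text):
    """Return the text of every line that starts with the assumed bullet marker."""
    return [line[2:] for line in section_text.splitlines() if line.startswith("- ")]
```

A downstream annotator could then process only `split_sections(report)["diagnosis"]` and treat each bullet as a separate finding.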

Discussion: Future of OHNLP.ORG

• Provided seed annotators and tools

• Goal: growing community

  • Annotators, tools

  • Methodologies

  • Gold standards

• Common type system for plug-and-play

• What are the hurdles?

Hands-on Customization

MedKAT

• Dictionary adaptation

• Concept identification parameters

• Document structure detection

cTAKES

• Negation window

• Lookup window

• Dictionary modifications

Questions?