Festival of Genomics 2016 London: Mining and Processing of Unstructured Medical Data

Mining and Processing of Unstructured Medical Data Cindy Perscheid Festival of Genomics London, Jan 19, 2016



TRANSCRIPT


Unstructured Medical Data: Information Hidden in Text

■  Doctor's and discharge letters

■  Clinical trial descriptions

■  Scientific publications

Perscheid, Schapranow

Processing of Unstructured Medical Data

Chart 2

Unstructured Medical Data: Challenges and Limitations

■  Huge amount of data: PubMed references more than 25 million articles

■  Restricted querying: keyword search only

■  Multilingual texts


Chart 3

Natural Language Processing: Selected Methods

[Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP ...

■  Named Entity Recognition: Identify named entities, e.g., persons or diseases

■  Part-of-Speech (POS) Tagging: Identify the grammatical function of words

■  Parsing: Identify sentence structure and components

□  Chunking: Group words into phrases (chunks) based on their POS tags

□  Relation Extraction: Identify relations between sentence parts

■  Semantic Role Labeling: Identify the roles that sentence parts play, e.g., who did what to whom

■  …


[Figure: the example sentence annotated with part-of-speech tags (Noun, Adjective, Preposition) and entity types (Person, Disease)]

Chart 4
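The bracketed example on the NLP methods slide is typical chunking output. As a minimal sketch, assuming hand-assigned Penn Treebank POS tags (not produced by a real tagger) and a hypothetical rule that merges consecutive tokens whose tags map to the same phrase type, the following Python code reproduces that bracketing:

```python
from typing import List, Optional, Tuple

# Hand-assigned Penn Treebank tags for the slide's example (illustration only).
TAGGED: List[Tuple[str, str]] = [
    ("Patients", "NNS"), ("65", "CD"), ("years", "NNS"), ("or", "CC"),
    ("older", "JJR"), ("with", "IN"), ("breast", "NN"), ("cancer", "NN"),
]

# Hypothetical mapping from POS tag to phrase (chunk) type.
CHUNK_OF_TAG = {
    "NN": "NP", "NNS": "NP", "NNP": "NP", "NNPS": "NP", "CD": "NP",
    "JJ": "ADJP", "JJR": "ADJP", "JJS": "ADJP",
    "IN": "PP",
}

Chunk = Tuple[Optional[str], List[str]]


def chunk(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Merge consecutive tokens whose tags map to the same chunk type."""
    chunks: List[Chunk] = []
    for token, tag in tagged:
        label = CHUNK_OF_TAG.get(tag)  # None for tokens outside any chunk
        if label is not None and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(token)
        else:
            chunks.append((label, [token]))
    return chunks


def render(chunks: List[Chunk]) -> str:
    """Format chunks in the slide's bracket notation, e.g. [breast cancer]NP."""
    return " ".join(
        f"[{' '.join(tokens)}]{label}" if label else " ".join(tokens)
        for label, tokens in chunks
    )


print(render(chunk(TAGGED)))
# [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP
```

Merging runs of same-type tags is a simplification chosen for brevity; it would, for example, fuse two adjacent noun phrases into one. Practical chunkers use tag-sequence grammars or trained sequence models instead.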

Outlook: IMDB Textual Analysis Features

■  The in-memory database (IMDB) provides text analysis features, e.g.:

□  Full-text indexing

□  Entity recognition

□  Tokenization/chunking

□  Fuzzy search

■  These mechanisms can be made domain-specific by specifying:

□  Dictionaries

□  CGUL (Custom Grouper User Language) rules containing regular expressions with linguistic attributes
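Full-text indexing and fuzzy search can be illustrated outside the database. The Python sketch below does not use the IMDB's API; it assumes a toy inverted index (token to document IDs) over made-up documents and uses the standard library's `difflib.get_close_matches` as a stand-in for the database's fuzzy matching, so that a misspelled query term still finds matching documents. The 0.8 similarity cutoff is an arbitrary assumption.

```python
import re
from collections import defaultdict
from difflib import get_close_matches

# Made-up documents for illustration only.
DOCS = {
    1: "Patients 65 years or older with breast cancer",
    2: "Discharge letter: patient treated for major depression",
    3: "Clinical trial of mirtazapine in major depression",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map every token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def fuzzy_search(index, term, cutoff=0.8):
    """Return sorted IDs of documents containing the term or a similar spelling."""
    matches = get_close_matches(term.lower(), list(index), n=3, cutoff=cutoff)
    hits = set()
    for match in matches:
        hits |= index[match]
    return sorted(hits)


INDEX = build_index(DOCS)
print(fuzzy_search(INDEX, "depresion"))  # misspelled query
# [2, 3]
```

Because matching works on index terms rather than exact strings, a query for "patient" also reaches documents containing "patients". A production full-text index would add stemming, stop-word handling, and relevance ranking.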

[Figure: Text Retrieval and Extraction; Multi-Core and Parallelization; Reduction of Layers]
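Domain-specific customization pairs dictionaries with pattern rules. CGUL syntax is not reproduced here; the following Python sketch only imitates the idea, assuming a hypothetical dictionary of medical terms and a hypothetical regular-expression rule that marks age criteria such as "65 years or older", as found in clinical trial descriptions:

```python
import re

# Hypothetical domain dictionary: surface form -> entity type.
DICTIONARY = {
    "breast cancer": "DISEASE",
    "major depression": "DISEASE",
    "mirtazapine": "DRUG",
}

# Hypothetical rule: a number, "years", and an optional "or older".
AGE_RULE = re.compile(r"\b\d{1,3}\s+years(?:\s+or\s+older)?\b", re.IGNORECASE)


def extract_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples in text order."""
    found = []
    for term, entity_type in DICTIONARY.items():
        # Whole-word, case-insensitive dictionary lookup.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            found.append((entity_type, m.group(0), m.start()))
    # Pattern rule applied after the dictionary pass.
    for m in AGE_RULE.finditer(text):
        found.append(("AGE_CRITERION", m.group(0), m.start()))
    return sorted(found, key=lambda item: item[2])


for entity in extract_entities("Patients 65 years or older with breast cancer"):
    print(entity)
# ('AGE_CRITERION', '65 years or older', 9)
# ('DISEASE', 'breast cancer', 32)
```

Keeping the dictionary and the rules as separate data makes the extractor easy to adapt to another domain without touching the matching code, which is the point of the customization described on this slide.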


Chart 5

Natural Language Processing Applications

■  Text Summarization

■  Question Answering Systems

■  Machine Translation

■  Information Retrieval and Extraction

[Figure: example illustrations, including "Hello" / "Bonjour", a doctor's letter explanation, and the question "What disease is mirtazapine predominantly used for?" with the answer "major depression"]

Chart 6
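Of these applications, text summarization is the simplest to sketch. The Python example below is a classic frequency-based extractive summarizer, chosen only for illustration and not taken from the talk: it scores each sentence by the average frequency of its content words across the text and keeps the top k sentences in their original order. The stop-word list and the sample letter are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "with", "was",
              "is", "for", "in", "on", "he", "she", "his", "her"}


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, k=1):
    """Return the k best-scoring sentences, kept in their original order."""
    sents = split_sentences(text)
    freq = Counter(w for s in sents for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)
    return " ".join(sents[i] for i in sorted(ranked[:k]))


# Hypothetical discharge-letter snippet.
LETTER = ("The patient was admitted with major depression. "
          "Treatment with mirtazapine was started. "
          "The depression improved under mirtazapine. "
          "She was discharged in stable condition.")

print(summarize(LETTER, k=2))
# Treatment with mirtazapine was started. The depression improved under mirtazapine.
```

Extractive methods like this only select existing sentences; explaining a doctor's letter to a patient would additionally require rewording, which is considerably harder.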

Application Example: Question Answering (Still a Lot to Improve…)

■  In short: slow tools, wrong results

□  Too hard: natural language is complex

□  Too much data: more than 25 million papers in PubMed…


Credit: Dr. Mariana Neves, Hasso Plattner Institute

Chart 7
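The point that natural language is too hard for simple tools can be made concrete with a keyword-overlap baseline. The following Python sketch is hypothetical and not the approach of any system referenced in this talk: it returns the corpus sentence that shares the most content words with the question. On a two-sentence toy corpus, the paraphrase "predominantly used for" versus "prescribed mainly for" leads it to the wrong sentence:

```python
import re

# Hypothetical stop-word list; note that "used" is deliberately kept as a content word.
STOP_WORDS = {"what", "which", "is", "the", "a", "an", "for", "of", "to", "also", "and", "in"}


def content_words(text):
    """Set of lowercased alphabetic tokens minus the stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def answer_sentence(question, corpus):
    """Return the corpus sentence sharing the most content words with the question."""
    q = content_words(question)
    return max(corpus, key=lambda s: len(q & content_words(s)))


# Toy corpus for illustration only.
CORPUS = [
    "Mirtazapine is an antidepressant prescribed mainly for major depression.",
    "Mirtazapine is also used off-label for insomnia.",
]

print(answer_sentence("What disease is mirtazapine predominantly used for?", CORPUS))
# Mirtazapine is also used off-label for insomnia.
```

The second sentence wins only because it repeats the word "used"; the sentence that actually answers the question is phrased differently. A usable question answering system has to bridge such paraphrases, e.g., with synonym resources or learned representations, and do so across more than 25 million PubMed articles.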

Thanks!

Hasso Plattner Institute Enterprise Platform & Integration Concepts

August-Bebel-Str. 88 14482 Potsdam, Germany

Dr. Matthieu-P. Schapranow [email protected]

http://we.analyzegenomes.com/

Cindy Perscheid, M. Sc. [email protected]


Chart 8