Mining Electronic Health Records for Insights


Upload: ontotext

Post on 07-Apr-2017


TRANSCRIPT

Page 1: Mining Electronic Health Records for Insights

Mining Electronic Health Records

Go Beyond Ontology Based Text Mining

October 15th 2015

Mining Electronic Health Records #1 05/02/2023

Page 2: Mining Electronic Health Records for Insights

Ontotext

• Information management company providing text analysis, data management and state-of-the-art semantic technology
• 70 software developers in Sofia, Bulgaria
• Presence in London and New York
• Clients include BBC, FT, AstraZeneca, DoD, Wiley & Sons
• Over 400 person-years in R&D to create a one-stop shop for:
– Content enrichment
– Data management
– Graph database engine

Page 3: Mining Electronic Health Records for Insights

Technology Portfolio

Page 4: Mining Electronic Health Records for Insights

Clients

Page 5: Mining Electronic Health Records for Insights

Healthcare Insights

Page 6: Mining Electronic Health Records for Insights

Ontology Based IE

• An ontology models a discrete knowledge domain
• All ontology concepts have a definition
• All ontology concepts have alternative labels
• Where appropriate, ontology concepts have additional labels
• Inference can be applied

[Diagram: example ontology fragment]
COPD rdf:type Disease
COPD skos:prefLabel Chronic Obstructive Pulmonary Disease
COPD skos:altLabel COLD
COPD skos:altLabel Chronic Airflow Obstruction
COPD hasSymptom Shortness of Breath
Shortness of Breath rdf:type Symptom
COPD rdfs:subClassOf Respiratory Disease
Respiratory Disease rdf:type Disease
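The COPD fragment on this slide is enough to drive a simple label-lookup annotator. The following is a minimal Python sketch, assuming an in-memory dictionary stands in for a real triple store; the names `ONTOLOGY`, `build_gazetteer` and `annotate` are illustrative and are not taken from the presentation or from any Ontotext product.

```python
import re

# Hypothetical in-memory ontology fragment mirroring the COPD example on the slide.
ONTOLOGY = {
    "COPD": {
        "type": "Disease",
        "prefLabel": "Chronic Obstructive Pulmonary Disease",
        "altLabels": ["COLD", "Chronic Airflow Obstruction"],
    },
    "ShortnessOfBreath": {
        "type": "Symptom",
        "prefLabel": "Shortness of Breath",
        "altLabels": [],
    },
}


def build_gazetteer(ontology):
    """Map every lower-cased label to its (concept id, concept type)."""
    gazetteer = {}
    for concept_id, entry in ontology.items():
        for label in [entry["prefLabel"], *entry["altLabels"]]:
            gazetteer[label.lower()] = (concept_id, entry["type"])
    return gazetteer


def annotate(text, gazetteer):
    """Return sorted (start, end, concept id, type) tuples for each label found."""
    annotations = []
    for label, (concept_id, concept_type) in gazetteer.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), concept_id, concept_type))
    return sorted(annotations)


GAZETTEER = build_gazetteer(ONTOLOGY)
```

Calling `annotate("He caught a cold.", GAZETTEER)` also returns a COPD annotation, because the alternative label COLD matches case-insensitively. A plain lookup cannot resolve that kind of ambiguity on its own.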

Page 7: Mining Electronic Health Records for Insights

Ontology Based IE - problems

• Does not model a domain completely (both on instance level and labels)
– Extend ontologies
– Ontology enrichment via instance mappings

• Labels contain additional qualifying information
– Define rewrite and ignore rules for literals

• Labels do not reflect natural language
– Apply “flexible” gazetteers

• Ambiguity in terminology
– Pre-filtering
– Ranking
– Semantic instance mappings

Page 8: Mining Electronic Health Records for Insights

Vocabulary Enrichment – Semantic Mappings

ICD 10 CM: J44.9
– Chronic obstructive airway disease NOS
– Chronic obstructive lung disease NOS
– Chronic obstructive pulmonary disease, unspecified

SNOMED CT US: 155565006
– Chronic obstructive lung disease
– Chronic obstructive airways disease NOS
– Chronic obstructive lung disease (disorder)
– CAFL - Chronic airflow limitation
– Chronic irreversible airway obstruction

J44.9 skos:closeMatch 155565006 (mapping found by string matching)
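The string-matching step behind the `skos:closeMatch` link can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's implementation: the ignore rules (`NOS`, `unspecified`, parenthesised tags such as `(disorder)`) and the function names are hypothetical, and the labels are copied from the slide.

```python
import re

# Labels copied from the slide; grouping under each code follows the diagram.
ICD10CM = {
    "J44.9": [
        "Chronic obstructive airway disease NOS",
        "Chronic obstructive lung disease NOS",
        "Chronic obstructive pulmonary disease, unspecified",
    ],
}
SNOMEDCT = {
    "155565006": [
        "Chronic obstructive lung disease",
        "Chronic obstructive airways disease NOS",
        "Chronic obstructive lung disease (disorder)",
        "CAFL - Chronic airflow limitation",
        "Chronic irreversible airway obstruction",
    ],
}

# Assumed ignore rules: qualifiers that carry no meaning for matching.
IGNORE = re.compile(r"\(.*?\)|\bNOS\b|,?\s*\bunspecified\b", flags=re.IGNORECASE)


def normalize(label):
    """Lower-case a label, drop ignorable qualifiers, collapse whitespace."""
    stripped = IGNORE.sub(" ", label)
    return " ".join(stripped.lower().split())


def close_matches(source, target):
    """Yield (source code, target code, shared label) for labels equal after normalization."""
    index = {}
    for code, labels in target.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(code)
    for code, labels in source.items():
        for label in labels:
            for target_code in sorted(index.get(normalize(label), ())):
                yield code, target_code, normalize(label)


MAPPINGS = sorted(set(close_matches(ICD10CM, SNOMEDCT)))
```

With these rules the two codes are linked through the shared label "chronic obstructive lung disease". The pair "airway" and "airways" still fails to match, which shows why exact string matching alone leaves gaps.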

Page 10: Mining Electronic Health Records for Insights

Vocabulary Enrichment – Synonym Enrichment

Tumor

Tumour

Abdomen

Abd

Tumor of abdomen

Tumor of abd

Tumour of abdomen

Tumour of abd
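The label variants listed on this slide follow from combining token-level synonyms. A minimal Python sketch of that expansion, assuming a simple whitespace tokenizer and a hypothetical `SYNONYMS` table holding only the two synonym sets shown here:

```python
from itertools import product

# Token-level synonym sets taken from the slide.
SYNONYMS = {
    "tumor": ["Tumor", "Tumour"],
    "abdomen": ["abdomen", "abd"],
}


def expand_label(label, synonyms):
    """Return every variant of label obtained by swapping tokens for their synonyms."""
    options = [synonyms.get(token.lower(), [token]) for token in label.split()]
    return [" ".join(choice) for choice in product(*options)]


VARIANTS = expand_label("Tumor of abdomen", SYNONYMS)
```

`VARIANTS` holds the four labels from the slide. The number of variants grows multiplicatively with the number of tokens that have synonyms, so a real vocabulary would probably need a cap or a filter.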

Page 12: Mining Electronic Health Records for Insights

Ontology Based IE – example

Page 13: Mining Electronic Health Records for Insights

Flexible Gazetteers

• Pre-coordinated terms cannot match all natural language terms, especially those used in narrative medical text!
– Inversions: concept “knee injury” vs. “injury of knee” in text
– Gaps due to additional qualifiers: concept “periorbital swelling” vs. “periorbital soft tissue swelling” in text
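One way to tolerate both inversions and gaps is to match a concept as an unordered set of content tokens inside a small window. The Python sketch below illustrates that idea only; the stop-word list, the `max_gap` default and the function names are assumptions, and the presentation does not say how its flexible gazetteers are implemented.

```python
import re

STOP_WORDS = {"of", "the", "in", "a"}  # assumed minimal stop list


def content_tokens(text):
    """Lower-case word tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def flexible_match(concept_label, text, max_gap=2):
    """True if all concept tokens occur, in any order, inside one window of the
    text that is at most max_gap tokens longer than the concept itself."""
    wanted = set(content_tokens(concept_label))
    tokens = content_tokens(text)
    window = len(wanted) + max_gap
    for start in range(len(tokens)):
        if wanted <= set(tokens[start:start + window]):
            return True
    return False
```

Both slide examples match under these settings: "knee injury" is found in "injury of knee", and "periorbital swelling" is found in "periorbital soft tissue swelling". A larger `max_gap` raises recall but may link tokens that belong to unrelated phrases.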

Page 14: Mining Electronic Health Records for Insights

Detection of negations

• The ability to reliably identify negated medical statements in text may significantly affect the quality of the extracted information.

Adverbial Negation

Negations in noun phrase

Prepositional Negation

Adjective Negation

Verb Negation
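A common baseline for negation detection looks for trigger words shortly before a concept mention. The Python sketch below uses that heuristic. The trigger list, its assignment to the five categories named on this slide, and the four-token window are all assumptions made for illustration, not rules stated in the presentation.

```python
import re

# Assumed trigger words, one or two per negation category named on the slide.
NEGATION_TRIGGERS = {
    "not": "adverbial",
    "never": "adverbial",
    "no": "noun phrase",
    "without": "prepositional",
    "negative": "adjective",
    "absent": "adjective",
    "denies": "verb",
    "excluded": "verb",
}


def negation_type(sentence, mention, window=4):
    """Return the category of the closest trigger within `window` tokens before the
    first occurrence of `mention`, or None if it is absent or not negated."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = re.findall(r"[a-z]+", mention.lower())
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            for token in reversed(tokens[max(0, i - window):i]):
                if token in NEGATION_TRIGGERS:
                    return NEGATION_TRIGGERS[token]
            return None
    return None
```

The sketch only looks to the left of the mention, so a trailing negation such as "pneumonia was excluded" is missed. It also ignores scope-ending words like "but".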

Page 15: Mining Electronic Health Records for Insights

Temporality Identification

• Temporal resolution for events in clinical notes is crucial for an accurate definition of patient history, current medical condition and assigned treatment.

• Identified temporality classes are:
– Historical
– Hypothetical (“Not particular”)
– Recent

• Temporal data must be normalized against the medical document's metadata (date of report/visit)!
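Normalizing a relative time expression against the report date can be sketched as follows in Python. The cue words for the Hypothetical class, the 30-day boundary between Recent and Historical, and the approximate unit lengths are assumptions chosen for the example; the slide names the classes but gives no thresholds.

```python
import re
from datetime import date, timedelta

UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}  # approximate lengths
HYPOTHETICAL_CUES = re.compile(r"\b(if|should|in case)\b")    # assumed cue phrases
RECENT_THRESHOLD_DAYS = 30                                    # assumed cut-off


def classify_temporality(statement, report_date):
    """Return (temporality class, normalized date or None) for a clinical statement."""
    text = statement.lower()
    if HYPOTHETICAL_CUES.search(text):
        return "Hypothetical", None
    match = re.search(r"(\d+)\s+(day|week|month|year)s?\s+ago", text)
    if not match:
        # No time expression: assume the event belongs to the report date.
        return "Recent", report_date
    days = int(match.group(1)) * UNIT_DAYS[match.group(2)]
    event_date = report_date - timedelta(days=days)
    label = "Recent" if days <= RECENT_THRESHOLD_DAYS else "Historical"
    return label, event_date
```

With an arbitrary report date of `date(2015, 10, 15)`, the statement "Cough started 2 days ago" is classified as Recent and anchored to 13 October 2015. The same sentence in a report from another date would resolve to a different day.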

Page 16: Mining Electronic Health Records for Insights

Temporality Identification - Example

Page 17: Mining Electronic Health Records for Insights

Post-coordination Patterns

• It is impossible to fully describe medical knowledge in terms of fully qualified concepts!
• Natural language does not follow the standardized descriptions defined by domain ontologies!
• Concepts must describe basic entities
• Entity properties can be described by different qualifier classes
• Patterns can generate new concepts, combining specific instance and qualifier classes

Page 18: Mining Electronic Health Records for Insights

Post-coordination Patterns - Examples

• Example pattern: <disease> or <morphologic abnormality> as the rightmost concept in a noun phrase, preceded by <qualifier> and <body structure>
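The example pattern can be expressed as a check over the semantic classes of the annotations in one noun phrase. The Python sketch below assumes the phrase is already annotated and reads the pattern as the fixed order qualifier, body structure, head; that ordering, the class names as strings, and the function name `post_coordinate` are interpretations made for illustration.

```python
HEAD_CLASSES = {"disease", "morphologic abnormality"}


def post_coordinate(noun_phrase):
    """Given [(text, semantic class), ...] for one noun phrase in text order,
    return a composite concept if the phrase ends with
    <qualifier> <body structure> <disease | morphologic abnormality>, else None."""
    if len(noun_phrase) < 3:
        return None
    (q_text, q_cls), (b_text, b_cls), (h_text, h_cls) = noun_phrase[-3:]
    if q_cls == "qualifier" and b_cls == "body structure" and h_cls in HEAD_CLASSES:
        return {
            "label": f"{q_text} {b_text} {h_text}",
            "head": h_text,
            "body_structure": b_text,
            "qualifier": q_text,
        }
    return None
```

For instance, the annotated phrase `[("severe", "qualifier"), ("periorbital", "body structure"), ("swelling", "morphologic abnormality")]` yields a new concept labelled "severe periorbital swelling" whose parts stay individually addressable.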

Page 19: Mining Electronic Health Records for Insights

Data Modeling

• Based on normalized data
• … but allowing extension with free text
• Allow data fusion with background knowledge
• Capture all aspects of the extracted information
• Tightly coupled with the context
• Provide provenance and confidence score
• Explorable! Not just searchable

Page 20: Mining Electronic Health Records for Insights

Data Modeling

Data provenance: graph <http://linkedlifedata.com/resource/document/CD8672>
Patient XYZ rdf:type Patient
Patient XYZ hasGender male
Patient XYZ hasBirthDate 1956/09/20 (xsd:date)
Patient XYZ hasDiagnose http://linkedlifedata.com/resource/icd9cm/157.9
– hasStatus currentDisease
– skos:prefLabel Malignant neoplasm of pancreas

Data provenance: graph <http://linkedlifedata.com/resource/document/CN127753>
Patient XYZ hasTreatment http://linkedlifedata.com/resource/treatment/DT127753
– rdf:type Treatment
– hasDrug http://linkedlifedata.com/resource/drug/irinotecan
– hasDosage 180 mg/ 1 m2 for 80 min
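Keeping the source document next to every statement is what makes provenance queries possible. The Python sketch below models the patient record as quads (triple plus named graph). The `PatientXYZ` identifier and the assignment of each statement to one of the two document graphs follow a reading of the diagram and should be treated as assumptions.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "subject predicate object graph")

LLD = "http://linkedlifedata.com/resource/"
PATIENT = "PatientXYZ"  # placeholder identifier, the slide gives no patient URI

QUADS = [
    Quad(PATIENT, "rdf:type", "Patient", LLD + "document/CD8672"),
    Quad(PATIENT, "hasGender", "male", LLD + "document/CD8672"),
    Quad(PATIENT, "hasBirthDate", "1956-09-20", LLD + "document/CD8672"),
    Quad(PATIENT, "hasDiagnose", LLD + "icd9cm/157.9", LLD + "document/CD8672"),
    Quad(PATIENT, "hasTreatment", LLD + "treatment/DT127753", LLD + "document/CN127753"),
    Quad(LLD + "treatment/DT127753", "hasDrug", LLD + "drug/irinotecan", LLD + "document/CN127753"),
]


def provenance(quads, subject, predicate):
    """Return the sorted source documents asserting any (subject, predicate, *) statement."""
    return sorted({q.graph for q in quads if q.subject == subject and q.predicate == predicate})
```

`provenance(QUADS, PATIENT, "hasTreatment")` returns the clinical note graph `CN127753`, so a reviewer can go from an extracted fact back to the document it came from.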

Page 21: Mining Electronic Health Records for Insights

Data Modeling – KB

Data provenance: graph <http://linkedlifedata.com/resource/drugBroshure/CAMPTOSAR>

Maximum Daily Dosage: http://linkedlifedata.com/resource/drugDosage/DD127753
– rdf:type Dosage
– hasMedication http://linkedlifedata.com/resource/drug/irinotecan
– hasPopulationGroup Adult
– hasAdministrationRoute http://linkedlifedata.com/resource/route/subcutaneus
– hasAdministrationForm http://linkedlifedata.com/resource/form/injection
– hasIndication http://linkedlifedata.com/resource/icd9cm/157.9
– hasDosageValue 180
– hasDosageUnit mg
– hasDenominatorValue 1
– hasDenominatorUnit m2
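Structured dosage values in the knowledge base make it possible to compare them with a dosage extracted from a clinical note. The Python sketch below parses the free-text dosage from the patient record and checks it against the knowledge base entry. The dictionary layout and function names are hypothetical, units must already agree, and the number of administrations per day is ignored.

```python
import re

# Knowledge base entry, values copied from the slide.
KB_MAX_DAILY_DOSAGE = {
    "medication": "http://linkedlifedata.com/resource/drug/irinotecan",
    "value": 180.0,
    "unit": "mg",
    "denominator_value": 1.0,
    "denominator_unit": "m2",
}


def parse_dosage(text):
    """Parse strings like '180 mg/ 1 m2 for 80 min' into value/unit pairs."""
    match = re.match(r"\s*([\d.]+)\s*(\w+)\s*/\s*([\d.]+)\s*(\w+)", text)
    if not match:
        raise ValueError(f"unrecognized dosage: {text!r}")
    value, unit, den_value, den_unit = match.groups()
    return {
        "value": float(value),
        "unit": unit,
        "denominator_value": float(den_value),
        "denominator_unit": den_unit,
    }


def exceeds_maximum(extracted, kb_entry):
    """Compare per-denominator rates; no unit conversion is attempted."""
    extracted_units = (extracted["unit"], extracted["denominator_unit"])
    kb_units = (kb_entry["unit"], kb_entry["denominator_unit"])
    if extracted_units != kb_units:
        raise ValueError("unit mismatch")
    rate = extracted["value"] / extracted["denominator_value"]
    limit = kb_entry["value"] / kb_entry["denominator_value"]
    return rate > limit
```

The dosage from the patient record, "180 mg/ 1 m2 for 80 min", equals the knowledge base value, so `exceeds_maximum` returns `False` for it.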

Page 22: Mining Electronic Health Records for Insights

Semantic Data Exploration and Mining

• Build Linked Data out of extracted facts and background knowledge
• Semantic Faceted Search
• Cross Entity Search & Exploration
• Expert Text Mining Search in pre-annotated documents
– Combine semantic annotations with PoS elements
– Identify post-coordination patterns
– Identify relations patterns
– Query expansion using background knowledge
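Query expansion with background knowledge can be illustrated by collecting the labels of a concept and of everything below it in the class hierarchy. The Python sketch reuses the COPD example from the ontology slide; the `SUBCLASSES` and `LABELS` tables and the function name `expand_query` are hypothetical stand-ins for a real knowledge base.

```python
# Hypothetical background knowledge: subclass links and labels.
SUBCLASSES = {"Respiratory Disease": ["COPD"]}
LABELS = {
    "Respiratory Disease": ["respiratory disease"],
    "COPD": [
        "chronic obstructive pulmonary disease",
        "COLD",
        "chronic airflow obstruction",
    ],
}


def expand_query(concept):
    """Return the labels of the concept and of all its transitive subclasses."""
    terms, stack, seen = [], [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(current)
        terms.extend(LABELS.get(current, []))
        stack.extend(SUBCLASSES.get(current, []))
    return terms
```

A search for "Respiratory Disease" then also retrieves documents that mention only "chronic airflow obstruction", even though the query string never appears in them.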

Page 23: Mining Electronic Health Records for Insights

• Information Extraction from EHRs is still a challenge!
• Making use of the extracted data is even more challenging
• Ontotext provides the technology stack to make it work!

Thank you!