© IO Informatics, Inc. 2015
CHALLENGES, RISKS AND OPPORTUNITIES FOR
SEMANTIC INTEROPERABILITY WITH TRANSMART
GIVE MEANING TO YOUR DATA
Authors: Robert Stanley, CEO (Presenter); Dr. Jason Eshleman, Director of Informatics
CHALLENGES, RISKS, OPPORTUNITIES WITH TRANSMART
CHALLENGES FOR ETL AND INTEROPERABILITY: “GETTING DATA IN”
• Time and effort required to manually curate and load data
• Understanding, complying with and extending the tranSMART schema
• Avoiding the loss of security, provenance, context and curation decisions
• Need for automation, alerting and scale in enterprise applications
(Diagram: PD Study, Alz Study, tMLoader)
RISKS FOR ETL AND INTEROPERABILITY: MAKING DATA USEFULLY SEARCHABLE
• “Getting the data in” is not enough
• Lexical matching is not enough
• Need to navigate across datasets to identify and harmonize to common identifiers, terms and relationships
• Barriers to machine-supported use of standards and ontologies for data harmonization

PD Study
Study ID | Gender | Treatment
PD Study1a | Female | Azilect

Alz Study
Study ID | Sex | Treatment
GSE26927 | 1 | rasagiline

Study ID | Sex | Treatment
PD Study1a | Female | Azilect
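A minimal sketch of why exact lexical matching fails on the two study tables above. It assumes each table is loaded as a list of Python dicts keyed by column header; the helper names `shared_columns` and `lexical_join` are hypothetical and are not part of tranSMART or any IO Informatics tool.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]

# The two study tables from the slide, loaded as lists of dicts (an assumption).
PD_STUDY: List[Row] = [{"Study ID": "PD Study1a", "Gender": "Female", "Treatment": "Azilect"}]
ALZ_STUDY: List[Row] = [{"Study ID": "GSE26927", "Sex": "1", "Treatment": "rasagiline"}]


def shared_columns(a: List[Row], b: List[Row]) -> Set[str]:
    """Column headers that are string-identical in the first row of both tables."""
    return set(a[0]) & set(b[0])


def lexical_join(a: List[Row], b: List[Row], column: str) -> List[Tuple[Row, Row]]:
    """Pair rows whose values in `column` are equal after case-folding (no synonyms)."""
    return [
        (ra, rb)
        for ra in a
        for rb in b
        if column in ra and column in rb and ra[column].casefold() == rb[column].casefold()
    ]


print(sorted(shared_columns(PD_STUDY, ALZ_STUDY)))    # ['Study ID', 'Treatment']: Gender/Sex missed
print(lexical_join(PD_STUDY, ALZ_STUDY, "Treatment"))  # []: Azilect vs rasagiline
```

Case-folding does not rescue the join: Azilect is a brand name of rasagiline, and that relationship has to come from a vocabulary or ontology rather than from string comparison.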
RISKS FOR ETL AND INTEROPERABILITY: CONNECTING TO OTHER / NEW DATA
• Evolving, diverse science will (almost) always require diverse ontologies, vocabularies and ways of describing and organizing data
• Traditional integration models (tM*mappings.txt) and applications are not designed for efficient alignment across new standards and datasets
(Diagram: new data, DrugBank)
SOLUTIONS
SOLUTIONS FOR ETL AND INTEROPERABILITY: “GETTING DATA IN”
• Provide automation with provenance, dynamic rules and alerting
• Provide algorithmic and inference support for ETL while keeping curation decisions transparent and easy to report
• Automate alignment with the tranSMART schema
(Diagram: PD Study, Alz Study)
SOLUTIONS FOR ETL AND INTEROPERABILITY: MAKING DATA USEFULLY SEARCHABLE
• “Data in” is not enough – semantic alignment is required
• Supported discovery and visualization across datasets to harmonize common identifiers, terms and relationships
• Automate alignment with pre-existing standards and ontologies, using preferred labels and synonyms
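Alignment via preferred labels and synonyms can be sketched as a rename-then-join step. This is an illustration under stated assumptions, not the Sentient implementation: the synonym tables are hard-coded and hypothetical, and the coding of Sex = "1" as female is an assumption made only for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical synonym tables (local label -> preferred label). In practice these would be
# drawn from standards and ontologies; here they are hard-coded for illustration.
COLUMN_SYNONYMS: Dict[str, str] = {"Gender": "Sex"}
VALUE_SYNONYMS: Dict[str, Dict[str, str]] = {
    # Assumption: the Alz study codes female as "1"; check each study's data dictionary.
    "Sex": {"Female": "female", "1": "female"},
    # Azilect is a brand name of rasagiline.
    "Treatment": {"Azilect": "rasagiline", "rasagiline": "rasagiline"},
}


def harmonize(row: Row) -> Row:
    """Rename columns and values to preferred labels; unknown labels pass through unchanged."""
    out: Row = {}
    for column, value in row.items():
        preferred_column = COLUMN_SYNONYMS.get(column, column)
        preferred_value = VALUE_SYNONYMS.get(preferred_column, {}).get(value, value)
        out[preferred_column] = preferred_value
    return out


def join_on(a: List[Row], b: List[Row], column: str) -> List[Tuple[str, str]]:
    """Return (Study ID, Study ID) pairs whose harmonized rows agree on `column`."""
    ha = [harmonize(r) for r in a]
    hb = [harmonize(r) for r in b]
    return [
        (ra["Study ID"], rb["Study ID"])
        for ra in ha
        for rb in hb
        if column in ra and column in rb and ra[column] == rb[column]
    ]


pd_study = [{"Study ID": "PD Study1a", "Gender": "Female", "Treatment": "Azilect"}]
alz_study = [{"Study ID": "GSE26927", "Sex": "1", "Treatment": "rasagiline"}]
print(join_on(pd_study, alz_study, "Treatment"))  # [('PD Study1a', 'GSE26927')]
print(join_on(pd_study, alz_study, "Sex"))        # [('PD Study1a', 'GSE26927')]
```

Unknown columns and values pass through unchanged, so a curator can see exactly which labels were mapped and which were left alone.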
SOLUTIONS FOR ETL AND INTEROPERABILITY: CONNECTING TO OTHER / NEW DATA
• Apply resources (ontologies/vocabularies, math, inference) to build rules that transparently extend to new sources
• Build on a data model and environment designed for semantic alignment with new data or collaborators*
(Diagram: new data, DrugBank)
* E.g., re-align my data with AERO, BAO, ChEBI, ChEMBL, DisGeNET, DrugBank, ICDn, NCIT, UMLS, VOID, (…)
TOOLS AND METHODS
LEVERAGING SEMANTIC INNOVATION FOR TRANSMART
• Import data into Modeler (KE)
• Agile semantic (W3C) data modelling
• Machine-aided identification of semantic inconsistencies
• Extend with additional resources
• Align with desired ontologies and nomenclature
• Re-align with collaborators’ and new sources
• Import into tranSMART using the ETL pipeline
• Store and execute integration rules and entailments
• Extend and apply tranSMART’s loading ETL
SEMANTIC BENEFITS FOR INTEGRATION
SENTIENT PLATFORM: KNOWLEDGE EXPLORER, WEB QUERY
VISUALIZATION, QUERY AND MAPPING
IMPORT, EDIT AND APPLY ONTOLOGIES AND CONTROLLED VOCABULARIES
UNCOVER HIDDEN RELATIONSHIPS AND APPLY INFERENCE TO IMPROVE INTEGRATION EFFICIENCY
SEARCH AND/OR DELIVER INTEGRATED DATA TO TRANSMART AND OTHER APPLICATIONS
(Diagram: Resource Mapping, Linked Open Data, Ontologies, SPARQL, RDF)
Enterprise platform built on open source using Angular, SPARK, Knime, NoSQL*, …
STARTING WITH (TRANSLATIONAL) DIVERSITY
(Diagram: Study 1 Data Set and Study 2 Data Set)
Patient Name | Cond | Trtmnt
[Patient x] | Alz | Azilect

Pt ID | Disease Diag. | Rx
[Pt ID xx4x] | Parkinsons | Rasagaline

The data is in separate applications, using different standards and databases.
We want to be able to ask questions that include all of our clinical data and molecular assessments, and our partner’s data, but currently can’t do this. What if we put the data into tranSMART!?!
STAGING RDF FOR CURATION DISCOVERY VIA MATH, DIRTY QUERIES AND INFERENCE
First, “dump” the data into the system and “shake it”. Creating and applying staging RDF reduces manual review requirements by over 95%.*
Lab Studies Data Set (is transformed by the IOI tranSMART staging ontology):
Patient Name | Cond | Trtmnt
[Patient x] | Alz | Azilect
(Diagram: Semantic Lab Studies Network with nodes Patient [Preferred ID #], AlzheimersDisease and Rasagiline, connected by diagnosis and treatment edges; aligned with useful terminology and relationships for initial discovery and harmonization)
*Automap to the tranSMART ontology standard, apply math, concatenate IDs, visualize and iterate with subject experts…
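The staging step can be pictured as turning each table row into subject-predicate-object triples, with the subject built by concatenating source name and local ID and a provenance triple recording where the row came from. This is a dependency-free sketch using plain tuples; the `STAGING` namespace and the `fromSource` predicate are hypothetical and are not the IOI tranSMART staging ontology. A production pipeline would emit real RDF through an RDF library instead.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace; this is NOT the IOI tranSMART staging ontology.
STAGING = "http://example.org/staging#"


def slug(text: str) -> str:
    """Make a label usable inside an identifier (brackets, spaces and dots become '_')."""
    return "".join(ch if ch.isalnum() else "_" for ch in text).strip("_")


def stage_rows(source: str, id_column: str, rows: List[Dict[str, str]]) -> List[Triple]:
    """Turn each table row into triples, starting with a provenance triple per row."""
    triples: List[Triple] = []
    for row in rows:
        # Concatenate source name and local ID so IDs from different sources cannot collide.
        subject = f"{STAGING}{slug(source)}/{slug(row[id_column])}"
        triples.append((subject, STAGING + "fromSource", source))
        for column, value in row.items():
            if column != id_column:
                # Raw values are kept verbatim; mapping to preferred terms happens later.
                triples.append((subject, STAGING + slug(column), value))
    return triples


lab_rows = [{"Patient Name": "[Patient x]", "Cond": "Alz", "Trtmnt": "Azilect"}]
for triple in stage_rows("Lab Studies", "Patient Name", lab_rows):
    print(triple)
```

Keeping raw values verbatim in the staging layer is a design choice: the "dirty" data remains queryable, so math, lexical matching and inference can be run over it before any curation decision is committed.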
ALIGN AND RE-ALIGN WITH STANDARDS: A SEMANTIC MODELING ENVIRONMENT
Bringing the next dataset or standard into the system makes cross-source lexical matching, ontological/nomenclature (relationship/label) matching and inference available for curation and harmonization, with context and provenance.
Medical Records Data Set (is enhanced into a Semantic Medical Records Network):
Pt ID | Disease Diag. | Rx
[Pt ID xx4x] | Parkinsons | Rasagaline
(Diagram: Semantic Medical Records Network with nodes Patient [Preferred ID #], Parkinsons Disease, Rasagiline and Azilect, connected by diagnosis, treatment and brand name edges)
(*harmonize to useful “upper” synonyms and relationships)
USEFUL OUTCOME…
Find patients diagnosed with both Parkinson’s and Alzheimer’s disease who were treated with Azilect.
All data is harmonized and deeply searchable.
The query directs content to the desired application / schema (e.g., “put it in tranSMART!”).
(Diagram, networks linked by common terms: Patient [Preferred ID #], ParkinsonsDisease, AlzheimersDisease, Rasagiline and Azilect, connected by diagnosis, treatment and brand name edges)
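In an RDF store the outcome question would be a SPARQL query; the sketch below expresses the same logic in plain Python over a tiny harmonized graph so it runs without a triple store. The patient IDs and the predicate names `diagnosis`, `treatment` and `brandName` are hypothetical, chosen to mirror the diagram's edge labels.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# A tiny harmonized graph in the spirit of the slide; patient IDs are hypothetical.
GRAPH: Set[Triple] = {
    ("patient:P1", "diagnosis", "ParkinsonsDisease"),
    ("patient:P1", "diagnosis", "AlzheimersDisease"),
    ("patient:P1", "treatment", "Rasagiline"),
    ("patient:P2", "diagnosis", "ParkinsonsDisease"),
    ("patient:P2", "treatment", "Rasagiline"),
    ("Rasagiline", "brandName", "Azilect"),
}


def objects(graph: Iterable[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def patients_with(graph: Set[Triple], diagnoses: Set[str], drug_name: str) -> Set[str]:
    """Patients having every diagnosis in `diagnoses` and a treatment matching `drug_name`.

    `drug_name` may be a brand name; it is expanded to the substances that carry it,
    so records that store the active ingredient still match.
    """
    accepted = {s for s, p, o in graph if p == "brandName" and o == drug_name} | {drug_name}
    patients = {s for s, p, _ in graph if p == "diagnosis"}
    return {
        pt
        for pt in patients
        if diagnoses <= objects(graph, pt, "diagnosis")
        and bool(objects(graph, pt, "treatment") & accepted)
    }


print(patients_with(GRAPH, {"ParkinsonsDisease", "AlzheimersDisease"}, "Azilect"))  # {'patient:P1'}
```

The brand-name expansion is what lets a question phrased with "Azilect" find a record that only says "Rasagiline"; without the `brandName` link the query would return nothing.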
BENEFITS
BENEFITS OF A SEMANTIC LAYER: GET USEFUL DATA IN MORE QUICKLY
• Simplify initial analysis and navigation across prospective datasets, reducing the manual review burden for curation by over 95%
• Maintain provenance on data sources and curation decisions
• Automate ETL on clean data to reduce loading time and effort (*alerts for unexpected events, pop-up decision-maker)
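One way to picture "automate ETL on clean data, alert on unexpected events" is a routing rule: rows whose terms all map to preferred labels load automatically, and anything else is held with an alert for a curator. This is a hypothetical sketch; `KNOWN_TERMS`, `LoadResult` and `route_rows` are illustrative names, and the vocabulary is hard-coded rather than drawn from an ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical vocabulary of preferred labels per column; unknown terms trigger an alert.
KNOWN_TERMS: Dict[str, Dict[str, str]] = {
    "Disease": {"Parkinsons": "ParkinsonsDisease", "Alz": "AlzheimersDisease"},
    "Drug": {"Azilect": "Rasagiline", "Rasagiline": "Rasagiline"},
}


@dataclass
class LoadResult:
    loaded: List[Dict[str, str]] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)


def route_rows(rows: List[Dict[str, str]]) -> LoadResult:
    """Auto-load rows whose governed terms all map to preferred labels; alert on the rest."""
    result = LoadResult()
    for i, row in enumerate(rows):
        mapped: Dict[str, str] = {}
        unknown: List[str] = []
        for column, value in row.items():
            vocab = KNOWN_TERMS.get(column)
            if vocab is None:
                mapped[column] = value  # column not governed by a vocabulary
            elif value in vocab:
                mapped[column] = vocab[value]
            else:
                unknown.append(f"{column}={value!r}")
        if unknown:
            result.alerts.append(f"row {i}: unmapped " + ", ".join(unknown))
        else:
            result.loaded.append(mapped)
    return result


rows = [
    {"Patient": "xx4x", "Disease": "Parkinsons", "Drug": "Rasagaline"},  # spelling as in the slide data
    {"Patient": "x", "Disease": "Alz", "Drug": "Azilect"},
]
outcome = route_rows(rows)
print(outcome.loaded)
print(outcome.alerts)
```

Here the misspelled "Rasagaline" from the medical records table is not silently loaded; it surfaces as an alert, which is where a curator (or a pop-up decision-maker) would add a synonym or correct the record.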
BENEFITS OF A SEMANTIC LAYER: DATA IN TRANSMART IS HARMONIZED
• Visual, algorithmic, thesaurus-based, ontological and inferential support for data harmonization
• Create and reuse rules, inferences and classifications for semantic interoperability
BENEFITS OF A SEMANTIC LAYER: DATA IS FOR LONG-TERM ALIGNMENT WITH NEW DATA AND STANDARDS
• Provide a data model and platform designed for rapid extension and interoperability with new data sources
• Use and re-use private and public resources (algorithms, vocabularies, ontologies/taxonomies) to align and re-align data
(Figure: GROWING ADOPTION AND RESOURCES, timeline marked 2007, 2008, 2009, 2011, 2014)
SIDE NOTE - DOCKER CONTAINER FOR EASY INSTALLATION OF TRANSMART IS NOW AVAILABLE
https://registry.hub.docker.com/u/ioinformatics/transmart/
Discussion
For additional information contact IO Informatics:
Robert Stanley, CEO
Bo Purtic, Ph.D., Director Sales
Bill Hayden, Director Business Development
Phone: (510) 705-8470
Email: [email protected]