caBIG Semantic Infrastructure 2.0:
Supporting TBPT Needs
Dave Hau, M.D., M.S.
Acting Director, Semantic Infrastructure
NCI Center for Biomedical Informatics and Information Technology
Key Extensions from 1.0 Infrastructure
• Lower barrier of entry - “Make easy things easy”
• Linear value proposition
  • Vs. all or none
  • Immediate return upon initial investment
  • Tools guide user to increase semantics for more “return”
• Support all levels of participation based on user needs
• Support non-NCI semantic representation strategies
• Legacy support for 1.0 users
• Leverage existing open-source technologies and standards from the community
2
Semantic Infrastructure 2.0
• Knowledge Management Service (SAIF / ECCF Registry)
  • Informational (Static) Semantics
    • Layered (~ upper ontology) – promote reuse; enhance interoperability; avoid data element “explosion”
    • Contextual – all elements have a model context; vs. individual element curation
    • RIM-based semantics and ISO 21090 healthcare datatypes
    • Ability to transform between different formats based on use case
  • Behavioral (Dynamic) Semantics
    • Support semantic workflow composition on caGrid 2.0
    • Analytical and transactional services
  • Artifact Management – SAIF metamodel (UML profile); DITA
  • Form templates – HL7 CDA, CDISC ODM, Xform
• Governance Service
  • Conformance checking – design time and run time
3
Leverage Open-Source Technologies
Informational (Static) Semantics
Eclipse Modeling Framework (EMF)
Transformation between model formats
4
HL7 MIF
• HL7 Static Model Designer
UML
• MDT (Model Development Tools)
Ontology
• Protégé?
Linear Value Proposition
“Just enough semantics” for different deployment contexts
(Lab-wide vs. Institution-wide vs. Global)
5
Innovation Path
Transition
Enterprise Path
“VALUE”
Linear Value Proposition – Static Semantics
Value – Data integration across caGrid 2.0
6
Any community ontology (e.g. Gene Ontology)
Mapping (e.g. ISA-TAB -> LSDAM mapping)
caBIG Domain Analysis Model (e.g. BRIDG, LSDAM)
Linear Value Proposition – Behavioral Semantics
Value – Semantic Workflow Composition
7
WSDL / WADL
WSDL / WADL + pre,post-conditions
SAIF Behavioral Framework (Service Contract – roles, interactions)
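To illustrate why adding pre- and post-conditions to a WSDL / WADL description helps workflow composition, here is a minimal sketch. It assumes each service declares the facts it requires and the facts it establishes; the `ServiceDescription` class, the `can_compose` function, the service names and the condition labels are all hypothetical and are not part of SI 2.0 or SAIF.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical stand-in for a service description with pre/post-conditions."""
    name: str
    preconditions: frozenset = field(default_factory=frozenset)
    postconditions: frozenset = field(default_factory=frozenset)


def can_compose(workflow, initial_facts=frozenset()):
    """Return True if every step's preconditions hold at the time it runs.

    Facts accumulate: each step adds its postconditions to the known set.
    """
    known = set(initial_facts)
    for step in workflow:
        if not step.preconditions <= known:
            return False
        known |= step.postconditions
    return True


# Illustrative services; names and condition labels are made up.
register = ServiceDescription(
    "RegisterSpecimen",
    preconditions=frozenset({"participant_consented"}),
    postconditions=frozenset({"specimen_registered"}),
)
annotate = ServiceDescription(
    "AnnotateSpecimen",
    preconditions=frozenset({"specimen_registered"}),
    postconditions=frozenset({"specimen_annotated"}),
)

print(can_compose([register, annotate], {"participant_consented"}))  # True
print(can_compose([annotate, register], {"participant_consented"}))  # False
```

With an interface description alone, both orderings would look equally valid; the declared conditions are what let a tool reject the second one before anything runs.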
SI 2.0 is federated
8
Standard
Institutional
Lab Lab
Institutional
Lab Lab
SI 2.0 Supports Multiple Platforms
9
Service Registry
• caGrid 2.0
Service Registry
• CVGrid
Service Registry
• Other Platform
Semantic Infrastructure 2.0
SI 2.0 -> Service Registry -> Service Instance
10
SI 2.0 – caTissue specification
caGrid 2.0 Service Registry
(stores reference)
caTissue instance at Wash U
caTissue instance at Fox Chase
caTissue instance at U Leicester, UK
caTissue instance at Lowy, UNSW
Semantic Infrastructure
Service Registry
Service Instances
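The three layers on this slide (specification, registry, instances) can be sketched roughly as follows. This is an illustrative in-memory model only: the `ServiceRegistry` class, the `resolve` helper and the dictionary layout are assumptions, not the caGrid 2.0 API.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical registry: maps a specification reference to running instances."""
    name: str
    instances: dict = field(default_factory=dict)

    def register(self, spec_ref, endpoint):
        """Record that an instance implements the referenced specification."""
        self.instances.setdefault(spec_ref, []).append(endpoint)

    def discover(self, spec_ref):
        """Return all known instances for a specification reference."""
        return list(self.instances.get(spec_ref, []))


# The specification itself lives in the semantic infrastructure;
# the registry stores only a reference (here, the key "caTissue").
specifications = {"caTissue": {"description": "caTissue service specification"}}

registry = ServiceRegistry("caGrid 2.0 Service Registry")
for site in ["Wash U", "Fox Chase", "U Leicester, UK", "Lowy, UNSW"]:
    registry.register("caTissue", f"caTissue instance at {site}")


def resolve(spec_ref):
    """Follow the chain: specification -> registry -> service instances."""
    return specifications[spec_ref], registry.discover(spec_ref)


spec, found = resolve("caTissue")
print(spec["description"])
for endpoint in found:
    print(endpoint)
```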
Dynamic Extensions
11
SI 2.0 – caTissue specification
caGrid 2.0 Service Registry
(stores reference)
caTissue instance at Wash U
caTissue instance at Fox Chase
caTissue instance at U Leicester, UK
caTissue instance at Lowy, UNSW
Semantic Infrastructure (NCI instance or local instance)
Service Registry
Service Instances
DEs
CDA template / xform
Version bump - resync
LexEVS
Value sets, pick lists
Dynamic Extensions - Details
• SI 2.0 will provide portlet and service capabilities for caTissue to create Dynamic Extensions (DEs) directly on an SI 2.0 instance.
• SI 2.0 will provide capabilities for querying and reusing existing models and attributes.
• Newly created DEs can be shared from the SI 2.0 instance whenever the owner chooses.
• LexEVS will provide value set querying, creation and management capabilities for the DEs.
• Forms will be created on SI 2.0 and be retrievable by caTissue as a CDA template, xform, etc., with validation mechanisms (e.g. Schematron rules associated with the CDA template).
12
Dynamic Extensions - Details
• For defining DEs, SI 2.0 will support use of NCIt concepts as well as non-NCI semantic representations such as community ontologies (NCBO BioPortal ontologies, OBO Foundry, etc.).
• Upon creation of DEs, the version of a user's model will be incremented. SI 2.0 will prompt the runtime registry to reload the new version of the model, so discovery can be based on the new extended model.
• (Possibly) Entity-Attribute-Value (EAV) or RDF triple representation, to avoid building and deploying a new data service.
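The last two bullets can be sketched together: adding a DE increments the model version and prompts a reload, while values are stored as EAV rows so no new data service has to be built. This is a minimal sketch under stated assumptions; `ExtensibleModel`, the listener callback standing in for the runtime registry, the attribute name and the `EXAMPLE:0001` concept reference are all hypothetical.

```python
class ExtensibleModel:
    """Hypothetical model whose Dynamic Extensions are stored as EAV rows."""

    def __init__(self, name):
        self.name = name
        self.version = 1
        self.attributes = {}   # attribute name -> concept reference
        self.rows = []         # (entity_id, attribute, value) rows
        self.listeners = []    # callbacks standing in for the runtime registry

    def add_dynamic_extension(self, attribute, concept_ref):
        """Add an attribute, bump the model version, and notify listeners."""
        self.attributes[attribute] = concept_ref
        self.version += 1
        for notify in self.listeners:
            notify(self.name, self.version)

    def set_value(self, entity_id, attribute, value):
        """Record a value as one EAV row; no schema or service change is needed."""
        if attribute not in self.attributes:
            raise KeyError(f"unknown attribute: {attribute}")
        self.rows.append((entity_id, attribute, value))

    def values_for(self, entity_id):
        """Collect the attribute/value pairs recorded for one entity."""
        return {a: v for e, a, v in self.rows if e == entity_id}


reloaded = []  # (model name, version) pairs the "registry" was asked to reload
model = ExtensibleModel("caTissue")
model.listeners.append(lambda name, version: reloaded.append((name, version)))

# "EXAMPLE:0001" is a placeholder, not a real NCIt or ontology code.
model.add_dynamic_extension("gleason_score", "EXAMPLE:0001")
model.set_value("specimen-42", "gleason_score", 7)

print(model.version)                    # 2
print(reloaded)                         # [('caTissue', 2)]
print(model.values_for("specimen-42"))  # {'gleason_score': 7}
```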
13
AIM v3 Sample Annotations
<Calculation description="Linear Measurement" cagridId="0" mathML="" codeMeaning="Length" codeValue="G-A22A" codingSchemeDesignator="SRT" codingSchemeVersion="" uid="1.24897.57654138621.1646" >
<referencedCalculationCollection/>
<calculationResultCollection>
…
<CalculationData cagridId="0" value="28.8822383880615">
14
AIM v3 Sample Annotations
<AnatomicEntity codeMeaning="LUNG" codeValue="REX0001" codingSchemeDesignator="RADLEX" cagridId="0" label="">
</AnatomicEntity>
15
AIM v3 Sample Annotations
<ImagingObservation codeMeaning="Conspicuity" codeValue="REX4001" codingSchemeDesignator="RADLEX" comment="" cagridId="0" label="">
<imagingObservationCharacteristicCollection>
<ImagingObservationCharacteristic codeMeaning="Very Obvious" codeValue="REX4006" codingSchemeDesignator="RADLEX" comment="" cagridId="0" label="" />
</imagingObservationCharacteristicCollection>
</ImagingObservation>
16
AIM – Radiology / Pathology Imaging
• The metadata is in the data.
• We are annotating each data instance, instead of each class and attribute: “One annotation per object”
• Entity-Attribute-Value representation
• RDF triples / SPARQL endpoint
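As a rough illustration of "the metadata is in the data", the AnatomicEntity sample from the AIM v3 slides can be flattened into entity-attribute-value triples with Python's standard XML parser. The `element_to_triples` function, the `annotation-1` identifier and the `type` predicate are assumptions made for this sketch; they are not defined by AIM or SI 2.0.

```python
import xml.etree.ElementTree as ET

# The AnatomicEntity element from the AIM v3 sample annotations.
SAMPLE = (
    '<AnatomicEntity codeMeaning="LUNG" codeValue="REX0001" '
    'codingSchemeDesignator="RADLEX" cagridId="0" label="">'
    "</AnatomicEntity>"
)


def element_to_triples(element, subject_id):
    """Flatten one annotation element into (entity, attribute, value) triples.

    Empty attribute values are skipped because they carry no information.
    """
    triples = [(subject_id, "type", element.tag)]
    for attribute, value in element.attrib.items():
        if value != "":
            triples.append((subject_id, attribute, value))
    return triples


triples = element_to_triples(ET.fromstring(SAMPLE), "annotation-1")
for triple in triples:
    print(triple)
```

Because every annotation instance carries its own coded terms, the same triples could be loaded into an EAV table or exposed through a SPARQL endpoint without a per-class data model.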
17
Semantic Infrastructure 2.0 Timeline (Tentative)
• Nov 2010 – Jan 2011: “Inception Activities” – Risk Mitigation:
  • Data Migration
  • Semantic Workflow Composition
• Summer 2011: SI 2.0 repository initial release; data migration starts
• Fall 2011: Knowledge Management Service
• Winter 2011: Governance Service
• Spring 2012: Tools
• Summer 2012: Decision Support / Analysis Service
18
Use Case – Reuse Metadata (Interim Strategy)
I'm with a Prostate SPORE group, and we have a list of new data elements that we want to add to caTissue (or other caBIG tools). I'd like to find out which data elements already exist and query them from caTissue while I'm filling out a dynamic extension form. How can we use a vocabulary service to send "prostate cancer" (or a similar term) and get back a drop-down list of values that are in caDSR and thus registered in NCI Thesaurus? In this very simple example, the list would include "prostate carcinoma".
“CDE Curation Tool” (Querying now open to public)
http://cdecurate.nci.nih.gov
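The lookup described in this use case might behave roughly like the sketch below. It uses a small in-memory term list rather than any real service: the `VOCABULARY` entries and the `suggest_terms` function are invented for illustration and do not reflect the caDSR, LexEVS or CDE Curation Tool interfaces.

```python
# Made-up vocabulary entries; a real deployment would query a terminology service.
VOCABULARY = [
    {"preferred": "prostate carcinoma",
     "synonyms": ["prostate cancer", "carcinoma of the prostate"]},
    {"preferred": "prostate gland", "synonyms": ["prostate"]},
    {"preferred": "lung carcinoma", "synonyms": ["lung cancer"]},
]


def suggest_terms(query, vocabulary=VOCABULARY):
    """Return preferred terms whose name or synonyms contain the query.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    matches = []
    for entry in vocabulary:
        names = [entry["preferred"], *entry["synonyms"]]
        if any(needle in name.lower() for name in names):
            matches.append(entry["preferred"])
    return matches


# The user types a familiar term; the drop-down offers the registered one.
print(suggest_terms("prostate cancer "))  # ['prostate carcinoma']
```

Returning the preferred term rather than the typed synonym is the point of the use case: the form ends up populated with a value that is already registered, so the new data element can be reused elsewhere.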
19
Use Case – Reuse Metadata
20
Semantic Infrastructure 2.0 Roadmap
• SI 2.0 Roadmap public website:
  • https://wiki.nci.nih.gov/x/IRnDAQ
• SI 2.0 Roadmap, Sep 6, 2010 (next release Nov 19, 2010):
  • https://wiki.nci.nih.gov/x/vw-0AQ (online version)
  • https://wiki.nci.nih.gov/download/attachments/29563169/CCBIIT_Semantic_Infrastructure_2.0_Roadmap_Sept_6_2010.pdf (pdf version)
• SI 2.0 Roadmap community input form:
  • https://wiki.nci.nih.gov/download/attachments/29563169/Semantic_Infrastructure_Community_Input_Form.xlsx
25