semantic infrastructure and ability to add in data models/vocabulary (dynamic extensions, etc)...
TRANSCRIPT
Semantic Infrastructure and
Ability to add in data models/vocabulary
(Dynamic Extensions, etc)
Session #1
October 4, 2010
Agenda
• Discuss Domain needs for Dynamic Extensions or similar method
• Discuss how that fits with the new Semantic Infrastructure – how do we ensure it is taken into account?
• Querying over the Grid for these new data elements (discovery, ability to query).
• Vocabulary needs from the community: preferred lists for diseases, for race, for specimen description, etc. – how will they be determined, discovered, and used in semantic infrastructure?
• Note: Nov 3-4, 2010 TBPT F2F – we would like to communicate on the impact of Semantic Infrastructure 2.0 to community - need Government input
• Radiology SMEs are also waiting for AIM/PAIS resolution, and how semantic infrastructure will support their direction.
Appendix
Background Material at end:
• Dynamic Extensions Use in caTissue Suite Example
• Emory (Sept 2009) analysis of AIM and how it may not support pathology, and their initial proposal for MicroAIM
• Semantic Infrastructure Additional Relevant portions
Domain needs for DE (Dynamic Extensions)
• Discuss Domain needs for Dynamic Extensions or similar method
• caTissue Suite – ability for user to create “Smoking History”, Prostate SPORE specific data elements, etc. during localization of tool at an institute or consortium level
• Support research endeavors, such as creating new imaging (radiology/pathology) descriptions, terms/data elements that reflect new observation-types that are needed for multi-site studies to report on.
Adding Classes/Attributes/Vocabularies to Get Your Research Done for Your Project
Contributing classes/attributes/vocabularies, etc. in a controlled manner so these new items can be queryable by others to find specimens, etc.
Ideal: Need to make seamless, fast turnaround process to make this work!
… Rely on Dynamic Extensions
[Diagram labels: COLLABORATIVE, ISOLATED]
Dynamic Extensions: High level goals
Ability to create new classes (entities) dynamically and associate them with static model
• Class, Attribute, associations metadata - Add/Edit/Delete
• Data entry form generation: Form view and Spreadsheet view
• Validation rules (UI as well as backend)
• Permissible values support (custom, EVS, CDE)
• Security for identified attributes
• Entity reuse - Copy & Share attributes
• XMI: Import and export
• caCORE API to add, edit and read
• caGrid compatibility
• requires Dynamic Extensions compatibility review process?
Metadata structure
[Diagram: MODEL, CLASS, ATTRIBUTES and PERMISSIBLE VALUES, linked by 1..* multiplicities. Other labels: caDSR, EVS, CDE reuse, Class reuse, Concept Code, Download value domain]
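The metadata structure can be pictured as nested records. The following is a minimal Python sketch, assuming the diagram reads as a model holding classes, each class holding attributes, and each attribute optionally holding permissible values. All class names, field names, and the example values are hypothetical illustrations and are not taken from the caTissue or Dynamic Extensions code base.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PermissibleValue:
    value: str
    # Hypothetical field: a vocabulary concept code, None for custom values.
    concept_code: Optional[str] = None


@dataclass
class Attribute:
    name: str
    permissible_values: List[PermissibleValue] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        """Any value is accepted when no permissible values are declared."""
        if not self.permissible_values:
            return True
        return any(pv.value == value for pv in self.permissible_values)


@dataclass
class EntityClass:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def validate(self, record: dict) -> List[str]:
        """Return a list of problems found in the record; empty means valid."""
        known = {a.name: a for a in self.attributes}
        problems = []
        for key, value in record.items():
            if key not in known:
                problems.append(f"unknown attribute: {key}")
            elif not known[key].accepts(value):
                problems.append(f"value not permitted for {key}: {value}")
        return problems


@dataclass
class Model:
    name: str
    classes: List[EntityClass] = field(default_factory=list)


# Illustrative extension: a "Smoking History" class with assumed values.
smoking = EntityClass(
    name="SmokingHistory",
    attributes=[
        Attribute(
            "status",
            [PermissibleValue("Current"), PermissibleValue("Former"), PermissibleValue("Never")],
        ),
        Attribute("packs_per_day"),  # free text in this sketch
    ],
)
model = Model("LocalExtensions", [smoking])

print(smoking.validate({"status": "Former", "packs_per_day": "1"}))
print(smoking.validate({"status": "Sometimes"}))
```

Keeping permissible values on the attribute lets the same check serve both form entry and backend validation, which is one way to read the "Validation rules (UI as well as backend)" goal.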
caTissue Dynamic Extensions
Add new data elements to caTissue through Dynamic Extensions (DEs)
DEs can be created through caTissue’s UI or programmatically through the API
Ad hoc UI creation of DEs is useful when you are creating one or two extensions at a single institution
Programmatic DE creation is convenient when creating multiple extensions or creating extensions for multiple sites
Semantic Infrastructure 2.0
• Relevant portions from September 2010 version:
• https://wiki.nci.nih.gov/download/attachments/29563169/CCBIIT_Semantic_Infrastructure_2.0_Roadmap_Sept_6_2010.pdf
Semantic Infrastructure
[Figure: Semantic Infrastructure Roadmap Concept Map]
Radiology/AIM/ Pathology Imaging Efforts
• Need to clarify approach to AIM (Annotation and Image Markup)
  • What does it mean, what is the unified approach for CBIIT?
• Note that the AIM group (Stanford/Northwestern) is also considering whether it can support annotation, or develop a model that will support annotation, for caTissue, Pathology, Imaging
  • (From Daniel Rubin (Radiologist, Stanford))
  • “As I previously mentioned, we strongly believe that we should unify the disparate efforts related to image description/annotation among Radiology (AIM), Tissue (caTissue), and Digital Pathology. This will not only reduce fragmentation and improve interoperability, it will also enable substantive integration of radiology/pathology/molecular data needed for our Enterprise Use Cases and Big Health. Can we arrange a tcon with the key parties to discuss steps to move this forward?”
• Next follow-up items to clarify direction?
  • AIM (Rubin, Mongkolwat) are looking for a teleconference with Fore(TBPT), IMG
Annotation and Image Markup (Northwestern, Stanford)
• When AIM is used to describe annotations, each information component, anatomic entity, observation, measurement, etc, is explicitly captured in a semantically precise and computationally accessible manner. Thus, in the example above, an AIM-enabled PACS workstation would generate a pick list of RadLex® anatomic terms, from which the radiologist would select middle lobe of right lung. The specific RadLex® identifier for that location (RID1310) would automatically be embedded in the annotation. Similarly, the PACS workstation would generate pick lists for the AIM observation (mass, RID3874) and the AIM observation characteristic (enhancing, RID6065). The AIM can also contain the x and y coordinates of an outline drawn around a lesion or the coordinates of an arrow pointing to a lesion, as these are generated by the user of the PACS workstation. If calculations—for example, longest diameter or area—were performed by the workstation, then AIM could store these results. The latter are part of a list of standardized measurements. The details of the AIM information model are described elsewhere (3).
• Once an annotation is defined in the AIM model, making sophisticated queries becomes relatively simple. Our query “Find all studies that contain enhancing right middle lobe lung masses that measure between 5 and 6 cm2” becomes “Find all image references in AIM annotations where AIM: Anatomic Entity = RID1310, AIM: Imaging Observation = RID3874, AIM: Observation Characteristic = RID6065, AIM: Calculation = Area and AIM: Calculation Result >5 and <6 cm2.” The exact syntax and the mechanisms used to execute such a query are more complicated than those presented here but are well defined.
• Source: http://radiology.rsna.org/content/253/3/590.full
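The quoted query can be illustrated with a short Python sketch that filters in-memory annotation records. This is only an illustration of the idea, not the AIM schema or any real AIM query API: the record layout, field names, and image identifiers are assumptions, while the RadLex identifiers and the area bounds come from the quoted text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AimAnnotation:
    # Hypothetical flat record; a real AIM document is considerably richer.
    image_reference: str
    anatomic_entity: str             # RadLex ID
    imaging_observation: str         # RadLex ID
    observation_characteristic: str  # RadLex ID
    calculation: str
    calculation_result_cm2: float


def find_image_references(
    annotations: Iterable[AimAnnotation],
    anatomic_entity: str,
    imaging_observation: str,
    observation_characteristic: str,
    calculation: str,
    lower_cm2: float,
    upper_cm2: float,
) -> List[str]:
    """Return image references whose annotation matches every coded criterion."""
    return [
        a.image_reference
        for a in annotations
        if a.anatomic_entity == anatomic_entity
        and a.imaging_observation == imaging_observation
        and a.observation_characteristic == observation_characteristic
        and a.calculation == calculation
        and lower_cm2 < a.calculation_result_cm2 < upper_cm2
    ]


# Made-up sample data for illustration only.
annotations = [
    AimAnnotation("image-001", "RID1310", "RID3874", "RID6065", "Area", 5.4),
    AimAnnotation("image-002", "RID1310", "RID3874", "RID6065", "Area", 7.1),
    AimAnnotation("image-003", "RID1310", "RID3874", "RID6065", "LongestDiameter", 5.5),
]

# Enhancing (RID6065) mass (RID3874) in the middle lobe of the right lung
# (RID1310) with an area between 5 and 6 cm2.
matches = find_image_references(
    annotations, "RID1310", "RID3874", "RID6065", "Area", 5.0, 6.0
)
print(matches)
```

Because every criterion is a coded identifier rather than free text, the filter is a plain equality test, which is the point the quoted passage makes about semantically precise annotations.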
Center for Comprehensive Informatics
AIM Data Model (V2.0 rev5) Overview
AIM
Sept 2009 Pres
How will Semantic Infrastructure Address:
• Dynamic Extensions – how will this be supported (Service for them “registered”)?
  • How can these terms be discoverable by other sites that want to incorporate them?
  • How will they be queried over the grid?
• Vocabularies – how will users be able to select the “preferred name” but also view the synonyms?
• Pathology observations, Radiology observations on an image – how do they fit?
  • Observations might be empirical, but could be generated from output of algorithms acting on a small area of an image (e.g. cell counts, automatic staining detection, outline, cardiac output from measurements on images)
• How about the RESULTING diagnosis/reporting, i.e. the conclusions drawn from the data they observed? (This might be more static, e.g. breast ductal carcinoma in situ, hepatocellular cirrhosis.)
  • This is what is often passed on to other research systems (biobanks, clinical trials, etc.)
TBPT F2F Discussion (Nov 3) – on Semantic Infrastructure
• Working Session(s): Semantic Infrastructure – Discussion on how it will move forward, how will biorepository management and pathology data be supported.
• Semantic Infrastructure Topics may include:
  • How it will apply to Pathology Data and Biorepository Data – how will querying be developed and managed across the caBIG program – how will it help the end-user?
• Discuss with ARCH/VCDE team on Oct 4 meeting (mention) – Need additional meeting that week. Need answers at F2F meeting.
Appendix
Background Material at end:
• Dynamic Extensions Use in caTissue Suite Example
• Emory (Sept 2009) analysis of AIM and how it may not support pathology, and their initial proposal for MicroAIM
• Semantic Infrastructure Additional Relevant portions
DYNAMIC EXTENSIONS IN CATISSUE SUITE (EXAMPLES)
Background Slides
caTissue use cases
CREATE
• Administrator creates new custom annotations
• System stores metadata and creates RDBMS tables to store actual data
DATA
• Technician wants to add custom annotation
• System auto generates web page to add custom annotation
• System adds data to the database
QUERY
• Researcher queries for Specimens based on custom annotation
• System auto generates query criteria pages and provides ability to query across static and dynamic model
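The CREATE, DATA and QUERY steps can be sketched end to end with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not how caTissue implements Dynamic Extensions: the table names (specimen, de_attribute, smoking_history), the column names, and the sample rows are all hypothetical, and form generation is left out.

```python
import sqlite3

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def _identifier(name: str) -> str:
    """Reject names that are unsafe to interpolate into SQL."""
    if not name.isidentifier():
        raise ValueError(f"not a safe identifier: {name!r}")
    return name


def create_annotation(conn, entity, attributes):
    """CREATE: store metadata, then create a table for the actual data."""
    _identifier(entity)
    columns = []
    for name, sql_type in attributes.items():
        if sql_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {sql_type}")
        conn.execute(
            "INSERT INTO de_attribute (entity, attribute, sql_type) VALUES (?, ?, ?)",
            (entity, _identifier(name), sql_type),
        )
        columns.append(f'"{name}" {sql_type}')
    column_sql = ", ".join(columns)
    conn.execute(
        f'CREATE TABLE "{entity}" '
        f"(specimen_id INTEGER REFERENCES specimen(id), {column_sql})"
    )


def add_annotation(conn, entity, specimen_id, values):
    """DATA: insert one row of custom annotation data for a specimen."""
    names = [_identifier(n) for n in values]
    column_sql = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f'INSERT INTO "{_identifier(entity)}" (specimen_id, {column_sql}) '
        f"VALUES (?, {placeholders})",
        (specimen_id, *values.values()),
    )


def query_specimens(conn, entity, attribute, value):
    """QUERY: join the static specimen table with the dynamic table."""
    sql = (
        "SELECT s.label FROM specimen AS s "
        f'JOIN "{_identifier(entity)}" AS d ON d.specimen_id = s.id '
        f'WHERE d."{_identifier(attribute)}" = ? ORDER BY s.label'
    )
    return [row[0] for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
# Static model (hypothetical, heavily simplified).
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, label TEXT)")
# Metadata about dynamically added attributes.
conn.execute("CREATE TABLE de_attribute (entity TEXT, attribute TEXT, sql_type TEXT)")
conn.executemany(
    "INSERT INTO specimen (id, label) VALUES (?, ?)", [(1, "SP-001"), (2, "SP-002")]
)

create_annotation(conn, "smoking_history", {"status": "TEXT", "pack_years": "REAL"})
add_annotation(conn, "smoking_history", 1, {"status": "Former", "pack_years": 12.5})
add_annotation(conn, "smoking_history", 2, {"status": "Never", "pack_years": 0.0})

former = query_specimens(conn, "smoking_history", "status", "Former")
print(former)
```

The metadata table is what would allow a system to generate entry forms and query criteria pages without knowing the extension in advance; here it is only written, not read back.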
Creating DEs using the caTissue Suite User Interface
• The four main features of DEs are:
  • Form Creation
  • Containment
  • Linking
  • Inheritance
• Administrators can create DEs in caTissue
  1. By using EA (Enterprise Architect) to create the model and XMI file, IMPORT into caTissue, and send through Review Process
  2. Create DE in caTissue, then export XMI and send through Review Process
• STEPS for caTissue Dynamic Extension creation:
  • Requirement Document
  • Design Model in Enterprise Architect
  • Create XMI File, upload into caTissue
  • Create Permissible Value file, upload into caTissue
  • Create Form Definition File, upload into caTissue
Using DEs
• Forms can be viewed and data can be entered/edited by navigating to the collection protocol based view under Biospecimen Data.
  1. Click the Biospecimen Data tab
  2. Select the collection protocol for which the form was created
  3. Select the hook entity under which the form was created
  4. Click the View Annotation tab
  5. Select the desired form from the Annotation Forms drop-down list
  6. View or enter/edit data in the form
DE at Indiana University (S. Ragg, G. Schadow, A. McMaho, Persisten) - 16 Dynamic Extensions Created
Pediatrics Oncology
Acute Lymphoblastic Leukemia, Wilms Tumor, Osteosarcoma, Neuroblastoma, Ewings Sarcoma
Hematology, Sickle Cell Disease, Neonatology
Inflammatory Bowel Disease: Crohn’s Disease, Ulcerative Colitis
Pediatrics Endocrinology
Diabetes Mellitus Type 1, Diabetes Mellitus Type 2
Cardiology, Obstetrics, Ophthalmology
Glaucoma, Cataract, Retinal Disease
Medicine Cardiology
DE at Indiana - Workflow
Dynamic Extension: Wilms Tumor
DE in Prostate SPORES project (Prostate SPORES consortium)
• Inter-Prostate SPORE Biomarker Study (IPBS)
  • Converted case report forms (CRFs) to spreadsheet
  • Spreadsheets were reviewed and mapped to existing data elements in UML models of caTissue Core
  • Distributed spreadsheet to several sites
  • Identified obvious matches in Core/Suite model
  • Assisted by caTissue development team in verifying and finding less-obvious matches
  • Identified missing classes/attributes
  • Some of these elements were incorporated into the next version of the caTissue model
From Andrew Helsley, 2008 caBIG Annual Meeting
IPBS Dynamic Extensions
IPBS data elements not mapped to caTissue Core:
Determined to be out of scope: Quality of life items; Patient Questionnaire; Follow-up data
Chosen for DE on: Participant; Specimen; Specimen Collection Group
Elements from spreadsheets not in caTissue Core were modeled with Enterprise Architect
XMI created from UML models
Dynamic Extension SOP for IPBS
From Andrew Helsley, 2008 caBIG Annual Meeting
IPBS Dynamic Extensions for the Specimen Hook Entity
AIM Background (from September 2009 slides by Emory)
• Presentation given to TBPT as discussions started on understanding AIM, where it was, how it could support pathology
• Emory presented case for their MicroAIM – which has currently evolved to their PAIS (did not have current information available)
Gap Analysis and Limitations
• One annotation per object
  – Annotation in one AIM document must be of the same observation type (no normal and cancer nuclei together)
  – One annotation per object leads to serious data redundancy and query performance deterioration
• Limited information in Annotation Of Annotation
  – AoA provides group level information derived or calculated from multiple annotations, but limited metadata of ReferencedAnnotation leads to dereference of all linked annotations for queries or analysis
• Unknown levels of nesting for Annotation of Annotation
  – Results in queries that have unknown number of joins for a single query, or unknown number of queries
Gap
Sept 2009 Pres
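The nesting concern can be made concrete with a small Python sketch. It assumes a hypothetical in-memory store that maps each annotation identifier to the identifiers it references. Reaching the underlying annotations then takes a number of lookups that depends on the data, which mirrors the unknown number of joins or queries described in the gap analysis. None of the identifiers or structures below come from the AIM model itself.

```python
from typing import Dict, List, Tuple

# Hypothetical store: annotation id -> ids of the annotations it references.
# An empty list marks a plain annotation (a leaf).
references: Dict[str, List[str]] = {
    "aoa-2": ["aoa-1", "ann-3"],  # annotation of annotation, nested two levels
    "aoa-1": ["ann-1", "ann-2"],  # annotation of annotation, one level
    "ann-1": [],
    "ann-2": [],
    "ann-3": [],
}


def dereference(root: str, references: Dict[str, List[str]]) -> Tuple[List[str], int]:
    """Follow references from root down to the leaf annotations.

    Returns the sorted leaf ids and the number of lookups performed.
    The lookup count is not known until the traversal finishes.
    """
    leaves: List[str] = []
    lookups = 0
    seen = set()  # guards against cycles and repeated work
    stack = [root]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        lookups += 1
        children = references[current]
        if children:
            stack.extend(children)
        else:
            leaves.append(current)
    return sorted(leaves), lookups


leaves, lookups = dereference("aoa-2", references)
print(leaves, lookups)
```

Storing summary metadata on the referencing annotation would let many queries stop at the first lookup, which appears to be the direction the gap analysis argues for.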
Gap Analysis and Limitations (cont’d)
• Insufficient grouping relationship of related annotations
  – Project information not modeled
  – Grouping of closely related annotations is not explicitly modeled, such as annotations for time series, validation set, serial section, etc.
• Limited markup types
  – Multipoint to represent polygon, semantically confusing
  – 3D geometric shapes missing
  – No image based markups such as masks or tensors
  – Markups can also be animations, such as moving cells
• Lack of provenance information for computations
  – Algorithms, parameters, and inputs
Gap Analysis and Limitations (cont’d)
• Limited image reference information
  – No support of references to microscopy images (WSI, TMA images, etc.)
  – Problem referencing images from multiple modalities generated by different equipment
• Patient centric: no support for animal image annotations, and no specimen information
• Ontology based on RadLex
  – Need support for subcellular anatomic entities, pathology and biology concepts and observations
• No versioning support
  – Versioning for schema, document instance and ontology needed
Our Proposal: MicroAIM – Microscopy and Pathology Annotation and Image Markup
• Redefine AIM to support multiple objects and associate different observations to each object, within same document
• Observation on a single or multiple objects, and can be derived
• Calculation on a single or multiple objects, and can be derived
• Another general class of type to represent mask or field value
• Geometric shapes should be extended to encompass 3D shapes
• Represent provenance information for computed markup and annotation
• Support pathology image reference and specimen information
MicroAIM
Sept 2009 Pres
Sketch of MicroAIM Concepts
• ImageReference: Metadata that describes an image or a group of images that are used as the base for making markup and annotation, and can be used to identify and retrieve them from an image archive or data service
• Annotation: Explanatory or descriptive information made by humans or machines directly related to the content of a referenced image or images
• Markup: Graphical symbols associated with an image
• Provenance: Information that helps determine the derivation history of a markup or annotation, such as algorithm information, parameters, and other inputs
• Project: aggregation of related images, markup, or annotations, from which conclusions may be drawn
• Group: aggregation of closely related subset of annotation documents
• User: The person who creates the MicroAIM document
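The concepts listed above can be sketched as plain Python data classes. This is a rough illustration only: the field names, the example identifiers, and the choice to attach provenance to each annotation are assumptions, and the actual MicroAIM model (described as work in progress) may differ. The sketch shows the central proposal, one document holding several objects with different observation types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ImageReference:
    uri: str   # hypothetical locator into an image archive
    kind: str  # e.g. "WSI", "TMA", "DICOM"


@dataclass
class Provenance:
    algorithm: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Markup:
    shape: str
    points: List[Tuple[float, float]]


@dataclass
class Annotation:
    observation: str
    markup: Markup
    # None for an annotation made by a person rather than an algorithm.
    provenance: Optional[Provenance] = None


@dataclass
class MicroAIMDocument:
    user: str
    image: ImageReference
    annotations: List[Annotation] = field(default_factory=list)

    def observation_types(self) -> List[str]:
        """Distinct observation types present in this one document."""
        return sorted({a.observation for a in self.annotations})


doc = MicroAIMDocument(
    user="pathologist-1",
    image=ImageReference(uri="archive://slide-42", kind="WSI"),
)
doc.annotations.append(
    Annotation("normal nucleus", Markup("polygon", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]))
)
doc.annotations.append(
    Annotation(
        "cancer nucleus",
        Markup("polygon", [(10.0, 10.0), (16.0, 10.0), (16.0, 15.0)]),
        Provenance("nucleus-segmenter", {"threshold": "0.8"}),
    )
)

print(doc.observation_types())
```

Holding both observation types in one document avoids the one-annotation-per-object redundancy raised in the gap analysis.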
Overview of MicroAIM (Work-In-Progress)
[Figure: UML class diagram ("class Domain Mo..."). Classes shown: MicroAIM, Annotation, Markup, GeometricShape, Calculation, Observation, Field, Region, DiseaseInference, AnatomicEntity, ImageReference, MicroscopyImageReference, WSIImageReference, TMAImageReference, DICOMImageReference, Specimen, Subject, Equipment, Provenance, User, Group, Project]
Semantic Infrastructure 2.0
• Relevant portions from September 2010 version:
• https://wiki.nci.nih.gov/download/attachments/29563169/CCBIIT_Semantic_Infrastructure_2.0_Roadmap_Sept_6_2010.pdf