Semantic Infrastructure and Ability to add in data models/vocabulary (Dynamic Extensions, etc) Session #1 October 4, 2010

Upload: sylvia-mason

Post on 11-Jan-2016


Page 1: Semantic Infrastructure and Ability to add in data models/vocabulary (Dynamic Extensions, etc) Session #1 October 4, 2010

Semantic Infrastructure and

Ability to add in data models/vocabulary

(Dynamic Extensions, etc)

Session #1

October 4, 2010

Page 2

Agenda

• Discuss Domain needs for Dynamic Extensions or similar method

• Discuss how that fits with the new Semantic Infrastructure – how do we ensure it is taken into account?

• Querying over the Grid for these new data elements (discovery, ability to query).

• Vocabulary needs from the community: preferred lists for diseases, for race, for specimen description, etc. How will they be determined, discovered, and used in semantic infrastructure?

• Note: Nov 3-4, 2010 TBPT F2F – we would like to communicate the impact of Semantic Infrastructure 2.0 to the community; this needs Government input

• Radiology SMEs are also waiting for AIM/PAIS resolution, and how semantic infrastructure will support their direction.

Page 3

Appendix

Background Material at end:

• Dynamic Extensions Use in caTissue Suite Example
• Emory (Sept 2009) analysis of AIM and how it may not support pathology, and their initial proposal for MicroAIM
• Semantic Infrastructure: additional relevant portions

Page 4

Domain needs for DE (Dynamic Extensions)

• Discuss Domain needs for Dynamic Extensions or similar method

• caTissue Suite – ability for user to create “Smoking History”, Prostate SPORE specific data elements, etc. during localization of tool at an institute or consortium level

• Support research endeavors, such as creating new imaging (radiology/pathology) descriptions, terms/data elements that reflect new observation-types that are needed for multi-site studies to report on.

Adding Classes/Attributes/Vocabularies to Get Your Research Done for Your Project

Contributing classes/attributes/vocabularies, etc. in a controlled manner so these new items can be queryable by others to find specimens, etc.

Ideal: Need to make seamless, fast turnaround process to make this work!

… Rely on Dynamic Extensions

COLLABORATIVE / ISOLATED

Page 5

Dynamic Extensions: High level goals

Ability to create new classes (entities) dynamically and associate them with static model

• Class, Attribute, associations metadata - Add/Edit/Delete
• Data entry form generation: Form view and Spreadsheet view
• Validation rules (UI as well as backend)
• Permissible values support (custom, EVS, CDE)
• Security for identified attributes
• Entity reuse - Copy & Share attributes
• XMI: Import and export
• caCORE API to add, edit and read
• caGrid compatibility

• requires Dynamic Extensions compatibility review process?

Page 6

Metadata structure

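[Diagram: a containment hierarchy of MODEL, CLASS, ATTRIBUTES, and PERMISSIBLE VALUES, with 1..* multiplicities between levels. Other labels in the diagram: caDSR, EVS, CDE reuse, Class reuse, Concept Code, Download value domain.]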
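The containment hierarchy on this slide (a model holds classes, a class holds attributes, an attribute holds permissible values that may carry a concept code) could be represented roughly as follows. This is a minimal sketch and not the Dynamic Extensions schema: every class and field name is hypothetical, the concept codes are placeholders rather than real EVS codes, and the 1..* multiplicities are not enforced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PermissibleValue:
    value: str
    concept_code: Optional[str] = None  # hypothetical link to a vocabulary concept


@dataclass
class Attribute:
    name: str
    permissible_values: List[PermissibleValue] = field(default_factory=list)

    def allows(self, value: str) -> bool:
        # An attribute without permissible values is treated as free text here.
        if not self.permissible_values:
            return True
        return any(pv.value == value for pv in self.permissible_values)


@dataclass
class EntityClass:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Model:
    name: str
    classes: List[EntityClass] = field(default_factory=list)


smoking_status = Attribute(
    name="smokingStatus",
    permissible_values=[
        PermissibleValue("Never", concept_code="C-PLACEHOLDER-1"),
        PermissibleValue("Former", concept_code="C-PLACEHOLDER-2"),
        PermissibleValue("Current", concept_code="C-PLACEHOLDER-3"),
    ],
)
model = Model(
    name="LocalExtensions",
    classes=[EntityClass(name="SmokingHistory", attributes=[smoking_status])],
)

print(smoking_status.allows("Former"))   # True
print(smoking_status.allows("Unknown"))  # False
```

Keeping the permissible values on the attribute, rather than on the form, is one way a value list could be reused wherever the attribute is reused.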

Page 7

caTissue Dynamic Extensions

Add new data elements to caTissue through Dynamic Extensions (DEs)

DEs can be created through caTissue’s UI or programmatically through the API

Ad hoc UI creation of DEs is useful when you are creating one or two extensions at a single institution

Programmatic DE creation is convenient when creating multiple extensions or creating extensions for multiple sites

Page 8

Semantic Infrastructure 2.0

• Relevant portions from September 2010 version:
• https://wiki.nci.nih.gov/download/attachments/29563169/CCBIIT_Semantic_Infrastructure_2.0_Roadmap_Sept_6_2010.pdf

Page 9

Semantic Infrastructure 2.0

Page 10

Semantic Infrastructure 2.0

Page 11

Semantic Infrastructure 2.0

Page 12

Semantic Infrastructure 2.0

Page 13

Semantic Infrastructure

Page 14

Semantic Infrastructure Roadmap Concept Map

Page 15

Radiology/AIM/ Pathology Imaging Efforts

• Need to clarify approach to AIM (Annotation and Image Markup)
• What does it mean, and what is the unified approach for CBIIT?

• Note that the AIM group (Stanford/Northwestern) is also considering whether it can develop a model that supports annotation for caTissue, Pathology, and Imaging
• (From Daniel Rubin (Radiologist, Stanford))

• “As I previously mentioned, we strongly believe that we should unify the disparate efforts related to image description/annotation among Radiology (AIM), Tissue (caTissue), and Digital Pathology. This will not only reduce fragmentation and improve interoperability, it will also enable substantive integration of radiology/pathology/molecular data needed for our Enterprise Use Cases and Big Health. Can we arrange a tcon with the key parties to discuss steps to move this forward?”

• Next follow-up items to clarify direction?
• AIM (Rubin, Mongolkwat) are looking for a teleconference with Fore (TBPT), IMG

Page 16

Annotation and Image Markup (Northwestern, Stanford)

• When AIM is used to describe annotations, each information component, anatomic entity, observation, measurement, etc, is explicitly captured in a semantically precise and computationally accessible manner. Thus, in the example above, an AIM-enabled PACS workstation would generate a pick list of RadLex® anatomic terms, from which the radiologist would select middle lobe of right lung. The specific RadLex® identifier for that location (RID1310) would automatically be embedded in the annotation. Similarly, the PACS workstation would generate pick lists for the AIM observation (mass, RID3874) and the AIM observation characteristic (enhancing, RID6065). The AIM can also contain the x and y coordinates of an outline drawn around a lesion or the coordinates of an arrow pointing to a lesion, as these are generated by the user of the PACS workstation. If calculations—for example, longest diameter or area—were performed by the workstation, then AIM could store these results. The latter are part of a list of standardized measurements. The details of the AIM information model are described elsewhere (3).

• Once an annotation is defined in the AIM model, making sophisticated queries becomes relatively simple. Our query “Find all studies that contain enhancing right middle lobe lung masses that measure between 5 and 6 cm²” becomes “Find all image references in AIM annotations where AIM: Anatomic Entity = RID1310, AIM: Imaging Observation = RID3874, AIM: Observation Characteristic = RID6065, AIM: Calculation = Area and AIM: Calculation Result >5 and <6 cm².” The exact syntax and the mechanisms used to execute such a query are more complicated than those presented here but are well defined.

• Source: http://radiology.rsna.org/content/253/3/590.full
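The rewritten query in the passage above amounts to a conjunction of equality tests on coded fields plus a range test on a calculation result. A minimal sketch in Python, assuming the annotations have already been flattened into simple records: the record layout, field names, and sample image references are hypothetical and do not reflect the AIM schema or its actual query mechanism. Only the RadLex identifiers come from the quoted text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlatAnnotation:
    image_reference: str              # hypothetical identifier for the annotated image
    anatomic_entity: str              # RadLex ID
    imaging_observation: str          # RadLex ID
    observation_characteristic: str   # RadLex ID
    calculation: str
    calculation_result_cm2: float


def find_enhancing_rml_masses(annotations: List[FlatAnnotation]) -> List[str]:
    """Return image references for enhancing right middle lobe masses with 5 < area < 6 cm2."""
    return [
        a.image_reference
        for a in annotations
        if a.anatomic_entity == "RID1310"               # middle lobe of right lung
        and a.imaging_observation == "RID3874"          # mass
        and a.observation_characteristic == "RID6065"   # enhancing
        and a.calculation == "Area"
        and 5 < a.calculation_result_cm2 < 6
    ]


annotations = [
    FlatAnnotation("image-001", "RID1310", "RID3874", "RID6065", "Area", 5.4),
    # Area outside the requested range.
    FlatAnnotation("image-002", "RID1310", "RID3874", "RID6065", "Area", 7.1),
    # Right codes, but a different calculation type.
    FlatAnnotation("image-003", "RID1310", "RID3874", "RID6065", "LongestDiameter", 5.5),
]

print(find_enhancing_rml_masses(annotations))  # ['image-001']
```

Because every criterion is a coded value or a number, none of the matching depends on free-text interpretation, which is the point the passage makes about semantically precise capture.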

Page 17

Center for Comprehensive Informatics

AIM Data Model (V2.0 rev5) Overview

AIM

Sept 2009 Pres

Page 18

How will Semantic Infrastructure Address:

• Dynamic Extensions: how will these be supported (a service for them that is “registered”?)
• How can these terms be discoverable by other sites that want to incorporate them?
• How will they be queried over the grid?

• Vocabularies – how will users be able to select the “preferred name” but also view the synonyms?

• Pathology observations, Radiology observations on an image – how do they fit?
• Observations might be empirical, but could be generated from the output of algorithms acting on a small area of an image (e.g. cell counts, automatic staining detection, outlines, or cardiac output from measurements on images)

• How about the RESULTING diagnosis/reporting that is the conclusion drawn from the observed data? (This might be more static, e.g. breast ductal carcinoma in situ, hepatocellular cirrhosis.)

• This is what is often passed on to other research systems (biobanks, clinical trials, etc.)
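The vocabulary question above (letting users pick a preferred name while still seeing and searching by synonyms) can be illustrated with a small lookup structure. This is only a sketch under stated assumptions: the `Term` and `Vocabulary` classes are hypothetical, and the synonyms listed are illustrative rather than taken from RadLex. The code RID1310 and its name come from the AIM example earlier in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    code: str
    preferred_name: str
    synonyms: List[str] = field(default_factory=list)


class Vocabulary:
    """Index terms by every label so a synonym resolves to the same coded term."""

    def __init__(self, terms: List[Term]) -> None:
        self._by_label: Dict[str, Term] = {}
        for term in terms:
            for label in [term.preferred_name, *term.synonyms]:
                self._by_label[label.lower()] = term

    def lookup(self, label: str) -> Optional[Term]:
        return self._by_label.get(label.strip().lower())


vocab = Vocabulary([
    Term(
        code="RID1310",
        preferred_name="middle lobe of right lung",
        synonyms=["right middle lobe", "RML"],  # illustrative assumptions
    ),
])

term = vocab.lookup("Right Middle Lobe")
print(term.code)            # RID1310
print(term.preferred_name)  # middle lobe of right lung
print(term.synonyms)        # ['right middle lobe', 'RML']
```

Storing the code on the record and resolving labels only at display or search time is one way to keep data queryable across sites that prefer different names.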

Page 19

TBPT F2F Discussion (Nov 3) – on Semantic Infrastructure

• Working Session(s): Semantic Infrastructure – Discussion on how it will move forward, how will biorepository management and pathology data be supported.

• Semantic Infrastructure Topics may include:
• How it will apply to Pathology Data and Biorepository Data – how will querying be developed and managed across the caBIG program – how will it help the end-user?

• Discuss with ARCH/VCDE team on Oct 4 meeting (mention) – Need additional meeting that week. Need answers at F2F meeting.

Page 20

Appendix

Background Material at end:

• Dynamic Extensions Use in caTissue Suite Example
• Emory (Sept 2009) analysis of AIM and how it may not support pathology, and their initial proposal for MicroAIM
• Semantic Infrastructure: additional relevant portions

Page 21

DYNAMIC EXTENSIONS IN CATISSUE SUITE (EXAMPLES)

Background Slides

Page 22

caTissue use cases

CREATE: Administrator creates new custom annotations. System stores metadata and creates RDBMS tables to store actual data.

DATA: Technician wants to add custom annotation. System auto-generates web page to add custom annotation. System adds data to the database.

QUERY: Researcher queries for Specimens based on custom annotation. System auto-generates query criteria pages and provides ability to query across static and dynamic model.
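The three use cases can be illustrated with an in-memory SQLite database: metadata for a custom annotation is stored, a table is generated from it, a record is entered, and a query joins the static and dynamic tables. This is a sketch of the general pattern only. The table names, columns, and the `create_annotation` helper are hypothetical and are not taken from caTissue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the static model.
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, label TEXT)")
# Metadata describing each dynamically added attribute.
conn.execute("CREATE TABLE de_metadata (entity TEXT, attribute TEXT, sql_type TEXT)")

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def create_annotation(entity: str, attributes: dict) -> None:
    """CREATE: store metadata, then generate a table linked to specimen."""
    # Identifiers cannot be bound as SQL parameters, so check them before interpolating.
    names = [entity, *attributes]
    if not all(name.isidentifier() for name in names):
        raise ValueError("entity and attribute names must be simple identifiers")
    if not set(attributes.values()) <= ALLOWED_TYPES:
        raise ValueError("unsupported column type")
    conn.executemany(
        "INSERT INTO de_metadata VALUES (?, ?, ?)",
        [(entity, attr, sql_type) for attr, sql_type in attributes.items()],
    )
    columns = ", ".join(f"{attr} {sql_type}" for attr, sql_type in attributes.items())
    conn.execute(
        f"CREATE TABLE {entity} (specimen_id INTEGER REFERENCES specimen(id), {columns})"
    )


# CREATE: an administrator defines a custom annotation.
create_annotation("smoking_history", {"status": "TEXT", "pack_years": "REAL"})

# DATA: a technician enters a record for an existing specimen.
conn.execute("INSERT INTO specimen VALUES (1, 'SP-0001')")
conn.execute("INSERT INTO smoking_history VALUES (1, 'Former', 12.5)")

# QUERY: a researcher searches specimens by a value held in the dynamic table.
rows = conn.execute(
    "SELECT s.label, d.pack_years FROM specimen s "
    "JOIN smoking_history d ON d.specimen_id = s.id WHERE d.status = ?",
    ("Former",),
).fetchall()
print(rows)  # [('SP-0001', 12.5)]
```

The identifier check here is deliberately simple; a production system would need stricter rules (reserved words, name collisions) before generating tables from user-supplied metadata.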

Page 23

Creating DEs using the caTissue Suite User Interface

• The four main features of DEs are:
• Form Creation
• Containment
• Linking
• Inheritance

• Administrators can create DEs in caTissue

1. Use EA (Enterprise Architect) to create the model and XMI file, import into caTissue, and send through Review Process

2. Create DE in caTissue, then export XMI and send through Review Process

• STEPS for caTissue Dynamic Extension creation:
• Requirement Document
• Design Model in Enterprise Architect
• Create XMI File, upload into caTissue
• Create Permissible Value file, upload into caTissue
• Create Form Definition File, upload into caTissue

Page 24

Using DEs

• Forms can be viewed and data can be entered/edited by navigating to the collection protocol based view under Biospecimen Data.
1. Click the Biospecimen Data tab
2. Select the collection protocol for which the form was created
3. Select the hook entity under which the form was created
4. Click the View Annotation tab
5. Select the desired form from the Annotation Forms drop-down list
6. View or enter/edit data in the form

Page 25

DE at Indiana University (S. Ragg, G. Schadow, A. McMaho, Persisten) - 16 Dynamic Extensions Created

Pediatrics Oncology

Acute Lymphoblastic Leukemia, Wilms Tumor, Osteosarcoma, Neuroblastoma, Ewing’s Sarcoma

Hematology, Sickle Cell Disease, Neonatology

Inflammatory Bowel Disease, Crohn’s Disease, Ulcerative Colitis

Pediatrics Endocrinology

Diabetes Mellitus Type 1, Diabetes Mellitus Type 2

Cardiology, Obstetrics, Ophthalmology

Glaucoma, Cataract, Retinal Disease

Medicine Cardiology

Page 26

DE at Indiana - Workflow

Dynamic Extension: Wilms Tumor

Page 27

DE in Prostate SPORES project (Prostate SPORES consortium)

• Inter-Prostate SPORE Biomarker Study (IPBS)

Converted case report forms (CRFs) to spreadsheet

Spreadsheets were reviewed and mapped to existing data elements in UML models of caTissue Core

Distributed spreadsheet to several sites

Identified obvious matches in Core/Suite model

Assisted by caTissue development team in verifying and finding less-obvious matches

Identified missing classes/attributes

Some of these elements were incorporated into the next version of the caTissue model

From Andrew Helsley, 2008 caBIG Annual Meeting

Page 28

IPBS Dynamic Extensions

IPBS data elements not mapped to caTissue Core:

Determined to be out of scope: Quality of life items, Patient Questionnaire, Follow-up data

Chosen for DE on: Participant, Specimen, Specimen Collection Group

Elements from spreadsheets not in caTissue Core were modeled with Enterprise Architect

XMI created from UML models

Dynamic Extension SOP for IPBS

From Andrew Helsley, 2008 caBIG Annual Meeting

Page 29

IPBS Dynamic Extensions for the Specimen Hook Entity

Page 30

AIM Background (from September 2009 slides by Emory)

• Presentation given to TBPT as discussions started on understanding AIM, where it was, how it could support pathology

• Emory presented case for their MicroAIM – which has currently evolved to their PAIS (did not have current information available)

Page 31

Gap Analysis and Limitations

• One annotation per object
– Annotation in one AIM document must be of the same observation type (no normal and cancer nuclei together)
– One annotation per object leads to serious data redundancy and query performance deterioration

• Limited information in Annotation Of Annotation
– AoA provides group level information derived or calculated from multiple annotations, but limited metadata of ReferencedAnnotation leads to dereference of all linked annotations for queries or analysis

• Unknown levels of nesting for Annotation of Annotation
– Results in queries that have unknown number of joins for a single query, or unknown number of queries
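The nesting limitation can be seen in a small sketch: when an Annotation of Annotation may reference further Annotations of Annotations to an unknown depth, collecting the underlying base annotations needs an open-ended walk, which in a relational store becomes a variable number of joins or queries. The dictionary layout and identifiers below are hypothetical and are not the AIM document format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical store: annotation id -> ids of the annotations it references.
# Base annotations reference nothing.
references: Dict[str, List[str]] = {
    "aoa-top": ["aoa-mid", "ann-3"],
    "aoa-mid": ["ann-1", "ann-2"],
    "ann-1": [],
    "ann-2": [],
    "ann-3": [],
}


def dereference(annotation_id: str) -> Tuple[Set[str], int]:
    """Return the base annotations under annotation_id and the number of lookups used."""
    base: Set[str] = set()
    seen: Set[str] = set()
    lookups = 0
    stack = [annotation_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against shared or cyclic references
        seen.add(current)
        children = references[current]  # one lookup per annotation, standing in for a query
        lookups += 1
        if children:
            stack.extend(children)
        else:
            base.add(current)
    return base, lookups


base, lookups = dereference("aoa-top")
print(sorted(base))  # ['ann-1', 'ann-2', 'ann-3']
print(lookups)       # 5
```

The lookup count grows with the depth and fan-out of the nesting, neither of which is known until the walk is done, which is the query-planning problem the slide describes.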

Gap

Sept 2009 Pres

Page 32

Gap Analysis and Limitations (cont’d)

• Insufficient grouping relationship of related annotations
– Project information not modeled
– Grouping of closely related annotations is not explicitly modeled, such as annotations for time series, validation set, serial section, etc

• Limited markup types
– Multipoint to represent polygon, semantically confusing
– 3D geometric shapes missing
– No image based markups such as masks or tensors
– Markups can also be animations, such as moving cells

• Lack of provenance information for computations
– Algorithms, parameters, and inputs


Page 33

Gap Analysis and Limitations (cont’d)

• Limited image reference information
– No support of references to microscopy images (WSI, TMA images, etc)
– Problem on referencing images from multiple modalities generated from different equipment

• Patient centric: no support for animal image annotations, and no specimen information

• Ontology based on RadLex
– Need support for subcellular anatomic entities, pathology and biology concepts and observations

• No versioning support
– Versioning for schema, document instance and ontology needed


Page 34

Our Proposal: MicroAIM – Microscopy and Pathology Annotation and Image Markup

• Redefine AIM to support multiple objects and associate different observations to each object, within same document

• Observation on a single or multiple objects, and can be derived

• Calculation on a single or multiple objects, and can be derived

• Another general class of type to represent mask or field value

• Geometric shapes should be extended to encompass 3D shapes

• Represent provenance information for computed markup and annotation

• Support pathology image reference and specimen information

MicroAIM

Sept 2009 Pres

Page 35

Sketch of MicroAIM Concepts

• ImageReference: Metadata that describes an image or a group of images that are used as the base for making markup and annotation, and can be used to identify and retrieve them from an image archive or data service

• Annotation: Explanatory or descriptive information made by humans or machines directly related to the content of a referenced image or images

• Markup: graphical symbols associated with an image.

• Provenance: information that helps determine the derivation history of a markup or annotation, such as algorithm information, parameters, and other inputs

• Project: aggregation of related images, markup, or annotations, from which conclusions may be drawn

• Group: aggregation of closely related subset of annotation documents

• User: The person who creates the MicroAIM document
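The concepts listed above could be sketched as plain data classes to show how provenance attaches to computed markup and annotations. This is an illustration only, not the MicroAIM model: all field names are hypothetical, and nesting documents under groups under a project is an assumed arrangement that the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ImageReference:
    image_id: str  # metadata used to identify and retrieve the image
    source: str    # hypothetical name of an image archive or data service


@dataclass
class Provenance:
    algorithm: str
    parameters: Dict[str, float] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)


@dataclass
class Markup:
    shape: str
    points: List[Tuple[float, float]]
    provenance: Optional[Provenance] = None  # None when drawn by a person


@dataclass
class Annotation:
    description: str
    images: List[ImageReference]
    markups: List[Markup] = field(default_factory=list)
    provenance: Optional[Provenance] = None


@dataclass
class MicroAIMDocument:
    user: str  # the person who creates the document
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Group:
    name: str
    documents: List[MicroAIMDocument] = field(default_factory=list)


@dataclass
class Project:
    name: str
    groups: List[Group] = field(default_factory=list)


wsi = ImageReference(image_id="slide-42", source="hypothetical-archive")
segmentation = Provenance(
    algorithm="nucleus-segmentation", parameters={"threshold": 0.5}, inputs=["slide-42"]
)
nucleus = Markup(shape="polygon", points=[(0, 0), (4, 0), (4, 3)], provenance=segmentation)
doc = MicroAIMDocument(
    user="analyst-1",
    annotations=[Annotation("cancer nucleus", [wsi], [nucleus], segmentation)],
)
project = Project("demo-study", groups=[Group("serial-sections", [doc])])

# Select markups that were produced by an algorithm rather than drawn by hand.
computed = [m for a in doc.annotations for m in a.markups if m.provenance is not None]
print(len(computed))  # 1
```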


Page 36

Overview of MicroAIM (Work-In-Progress)

[Class diagram, titled “class Domain Mo...” in the source. Classes shown: Annotation, GeometricShape, Calculation, Observation, Specimen, ImageReference, Provenance, User, MicroAIM, Equipment, Group, AnatomicEntity, Subject, Field, Project, MicroscopyImageReference, DICOMImageReference, TMAImageReference, Markup, DiseaseInference, Region, WSIImageReference. Associations carry multiplicities of 1 to 0..1 or 1 to 0..*, and two associations are labeled +derived (1) and +base (0..*). The connections between classes are not recoverable from the transcript.]

Page 37

Semantic Infrastructure 2.0

• Relevant portions from September 2010 version:
• https://wiki.nci.nih.gov/download/attachments/29563169/CCBIIT_Semantic_Infrastructure_2.0_Roadmap_Sept_6_2010.pdf