caBIG Data Structures

CS584 Lecture on 4/6/2007

Patrick McConnell

Duke Comprehensive Cancer Center

[email protected]

CS584 Lecture on 4/6/2007 caBIG Data Structures

Agenda

• caBIG background (5 min, 8 slides)
  • Goals, program structure, organizations
• caTRIP background (5 min, 6 slides)
  • Background, use cases, architecture
• caBIG compatibility (30 min, 21 slides + demonstration)
  • Interoperability, compatibility, syntactics, and semantics
• Building caBIG compatible systems (10 min, 7 slides)
  • Interoperability, compatibility, syntactics, and semantics
• caGrid (10 min, 8 slides)
  • Background, service creation, metadata
• caTRIP demonstration (10 min, 2 slides + demo)
  • Demonstration
• Discussion/questions (5 min + throughout)

caBIG Background

Goals, program structure, organizations

caBIG background: Biomedical information tsunami

• overwhelming volume of data

• multitude of sources

caBIG background: Informatics tower of Babel

• Each cancer research community speaks its own scientific “dialect”
• Integration critical to achieve promise of molecular medicine

caBIG background: Goals and principles

• 50 cancer centers are working toward a common goal of integrated data, tools, and methodologies to accelerate cancer research through the cancer Biomedical Informatics Grid (caBIG™) at the National Cancer Institute Center for Bioinformatics (NCICB)

• The goal of caBIG™ is to create a virtual web of interconnected data, individuals, and organizations which will redefine how:
  • research is conducted
  • care is provided
  • patients / participants interact with the biomedical research enterprise

• The principles driving caBIG™ are:
  • Open Source
  • Open Access
  • Open Development
  • Federated Model

caBIG background: caBIG facilitates sharing

caBIG background: Workspaces

• Domain Workspace 1, Clinical Trial Management Systems: addresses the need for consistent, open, and comprehensive tools for clinical trials management.
• Domain Workspace 2, Integrative Cancer Research: provides tools and systems to enable integration and sharing of information.
• Domain Workspace 3, Tissue Banks & Pathology Tools: provides for the integration, development, and implementation of tissue and pathology tools.
• Domain Workspace 4, Imaging: provides for the sharing and analysis of in vivo imaging data.
• Cross-Cutting Workspace 1, Vocabularies & Common Data Elements: responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
• Cross-Cutting Workspace 2, Architecture: develops the architectural standards and architecture necessary for the other workspaces.

caBIG background: Communities

9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University; Northwestern University-Robert H. Lurie

caBIG background: Duke’s role in caBIG

• Pankaj Agarwal
• Bob Annechiarico
• Bill Banks
• Vijaya Chadaram
• Jamie Cuticchia
• Raj Dash
• Mohammad Farid
• Seth Fehrs
• Patrick McConnell
• Salvatore Mungal
• Mark Peedin

• CALGB
• CCR
• Coalition of Cooperative Groups
• Dana Farber
• Georgetown
• Mayo
• Oregon Health Sciences University
• SemanticBits LLC
• University of Pennsylvania
• Wake Forest
• Yale

• Integrative Cancer Research
  • Workspace participant
  • RProteomics developer
  • caTRIP developer
• Architecture
  • Workspace participant
  • caGrid developer
  • caGrid scientific liaison
  • Guide to Mentors
• Vocabularies and Common Data Elements
  • Workspace participant
  • Guide to Mentors
• Clinical Trials Management Systems
  • Workspace participant
  • C3PR developer
  • CTMS Interoperability architect
  • C3D developer
• Tissue Banking and Pathology Tools
  • Workspace participant
  • caTissue adopter
• Strategic Planning
  • Workspace participant

The Cancer Translational Research Informatics Platform (caTRIP)

Background, use cases, architecture

caTRIP: Who is involved?

• Duke Bioinformatics
  • Jamie Cuticchia (PI)
  • Patrick McConnell (lead architect)
• Duke Information Systems
  • Bob Annechiarico (PM)
  • Wilma Stanley (developer)
  • Mark Peedin (developer)
  • Mohamad Farid (DBA)
  • Jeff Allred (IT manager)
• Duke Pathology
  • Raj Dash (domain expert)
  • Chris Hubbard (developer)
• Duke Oncology
  • Kelley Marcom (domain expert)
  • Gretchen Kimmick (domain expert)
  • Kimberly Blackwell (domain expert)
  • Lee Wilke (domain expert)
• Duke CALGB
  • Kimberly Johnson (DataMart liaison)
• SemanticBits
  • Ram Chilukuri (lead developer)
  • Srini Akkala (developer)
  • Sanjeev Agarwal (developer)
• 5 AM Solutions
  • Bill Mason (developer)
• NCI
  • Julie Klemm (ICR WS lead)
  • Carl Shaefer (NCI rep)
  • Subha Madhavan (caIntegrator PM)
• BAH
  • Curtis Lockshin
  • Mehul Shah (tech support)

Role categories: Managers and Architects; Database Developers and IT; Domain Experts; Software Developers; NCI/BAH

caTRIP: What is translational research?

• Bench-to-Bedside
• Wikipedia (the source of all knowledge):
  Translational medicine is a branch of medical research that attempts to more directly connect basic research to patient care.
  • Basic research occurs in the lab
  • Patient care occurs in the clinic
• Translational research broadened…
  Translational medicine can also have a much broader definition, referring to the development and application of new technologies in a patient driven environment - where the emphasis is on early patient testing and evaluation.
  …facilitate the interaction between basic research and clinical medicine, particularly in clinical trials.

caTRIP: Initial focus

• Our initial focus will be on connecting existing data systems, including basic science data, to enhance patient care
• Initial problem scenario: outcomes analysis
  • Use data from existing patients to inform the treatment of another patient
  • Leverage clinical, pathology, tissue, and basic science data
• Scenario:
  Patient A enters the clinic. What treatments were applied with success on other patients with similar characteristics (race, sex, symptoms, pathology results, adverse events, biomarkers)?

caTRIP: Broadened focus: scientific use cases

• Find available tumor tissue
  • What are all the tissue specimens from HER2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
• Find factors of survival
  • What are all the ER positive patients that have survived breast cancer after radiation treatment?
• Find patients for trials
  • What are all the patients that are triple negative (ER, PR, and HER2/neu negative)?
• Determine the distribution of disease factors over time
  • Does a change in pathology biomarkers over time contribute to recurrence or death?
• Determine correlation of factors pre and post surgery
  • Does a change in ER or PR status before and after surgery correlate with other factors?
• Find pathology reports of interest
  • Show me all of the pathology reports for HER2/neu positive patients with a lobular carcinoma.
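Once the biomarker data sit in one place, a use case such as "find patients for trials" reduces to a filter over patient records. The following is a minimal Python sketch of the triple-negative question, not caTRIP code: the `Patient` record, its field names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Patient:
    # Hypothetical record; field names are illustrative, not a caTRIP schema.
    mrn: str
    er_positive: bool
    pr_positive: bool
    her2_neu_positive: bool


def triple_negative(patients: List[Patient]) -> List[Patient]:
    """Return the patients that are ER, PR, and HER2/neu negative."""
    return [
        p for p in patients
        if not (p.er_positive or p.pr_positive or p.her2_neu_positive)
    ]


# Made-up example data, for illustration only.
patients = [
    Patient("A1", er_positive=True, pr_positive=False, her2_neu_positive=False),
    Patient("B2", er_positive=False, pr_positive=False, her2_neu_positive=False),
    Patient("C3", er_positive=False, pr_positive=False, her2_neu_positive=True),
]
matches = triple_negative(patients)
```

In practice the fields would come from different systems (clinical, pathology, tissue), which is why the later slides focus on connecting those systems rather than on the filter itself.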

caTRIP: Connecting disparate data systems

• Tumor Registry: Diagnosis, Treatment, Recurrence, Follow-up
• CAE: Pathology Biomarkers
• caTissue CORE: Tissue Bank
• caTIES: Pathology Reports
• caIntegrator: SNP Data

[Diagram: links between the systems are labeled MRN]

caTRIP: Architecture overview

[Figure: caTRIP architecture. Recoverable labels: source systems and databases at Duke (MAW3, Tumor Registry, Illumina, caTIES, CAE, caTissue CORE, TR, CGEMS SNP); Domain Grid Services (caTissue CORE, caTIES, CAE, TR, caIntegrator); Core Grid Services (Index Service, IdP Service, GridGrouper); Distributed Query Engine; Domain Controller; GUI. Arrow labels: query, discover, authenticate, authorize.]

caBIG Compatibility

Interoperability, compatibility, syntactics, and semantics

caBIG compatibility: Interoperability defined

Interoperability: ability of a system to access and use the parts or equipment of another system

• Syntactic interoperability
• Semantic interoperability

Courtesy: Charlie Mead

caBIG compatibility: How does this apply to caBIG?

• Connect scientists and practitioners through a shareable and interoperable infrastructure

• Develop standard rules and a common language to more easily share information (compatibility guidelines)

• Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.

“The cancer community is united in its mission to eliminate suffering and death due to cancer. It is now connected by caBIG™.”

caBIG compatibility: What is compatibility in caBIG?

The four areas of the caBIG compatibility guidelines:

• Information Models - Individual types of data are rarely collected or presented in isolation. Rather, they are assembled into a contextual environment that includes closely and more distantly associated data and information. These associations and relationships can be presented in the form of an information model.

• CDEs - Data that is collected on a given study or trial must be defined and described such that remote users of that data can understand what it means. These metadata descriptions are referred to as data elements.

• Vocabularies and Ontologies - Biomedical information includes a substantial body of specialized concepts that are represented by terms. Agreement upon the basic concepts, terms and definitions that are inherent in all biomedical information is essential for achieving semantic interoperability.

• Programming and Messaging Interfaces - Computer programs and the people who write them are able to access resources from other programs through programming and messaging interfaces. Each of these interfaces responds to a particular syntax for its communications. Agreement upon standards for these interfaces is necessary to overcome barriers to syntactic interoperability.

caBIG compatibility: Levels of compatibility

The four levels of the caBIG™ compatibility guidelines:

• Legacy - Implies no interoperability with an external system or resource. A system that was designed without awareness of or prior to the availability of these compatibility guidelines, and which does not meet any of the requirements for interoperability.

• Bronze - Classifies the minimum requirements that must be met to achieve a basic degree of interoperability.

• Silver - A rigorous set of requirements that, when met, significantly reduce the barrier to use of a resource by a remote party who was not involved in the development of that resource.

• Gold - Currently being defined by caBIG. Is expected to provide for a formalized grid architecture and data standards that will enable standardized advertising, discovery, and use of all federated caBIG resources.

caBIG compatibility: caBIG compatibility guidelines

[Figure: the compatibility guidelines grouped under three labels: Syntactic, Semantic, and Semantic & Syntactic]

caBIG compatibility: Syntactic interoperability

• The solution for syntactic interoperability in caBIG at the Silver level of compatibility is for all systems to provide an Object Oriented Application Programmer Interface (API).
• Object Oriented Interfaces can be implemented in many programming languages.
• This interface can be connected to the caGrid so that the local data repository is globally accessible in a language independent way.
• The interface is described by an information model, which acts as the junction between the syntactic components and the semantic components.

[Class diagram]

Gene
+ name: String
+ hugoGeneSymbol: String
+ sequence: String
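To make the idea of an object-oriented API concrete, here is a minimal Python sketch. The `Gene` attributes follow the class on the slide; the `GeneRepository` interface, its `find_by_symbol` method, and the in-memory implementation are assumptions introduced for illustration and are not part of any caBIG API.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Gene:
    # Attributes taken from the Gene class shown on the slide.
    name: str
    hugo_gene_symbol: str
    sequence: str


class GeneRepository(Protocol):
    """Hypothetical interface: callers depend on this, not on the storage behind it."""

    def find_by_symbol(self, symbol: str) -> List[Gene]: ...


class InMemoryGeneRepository:
    """One possible implementation; a database-backed one could replace it."""

    def __init__(self, genes: List[Gene]) -> None:
        self._genes = list(genes)

    def find_by_symbol(self, symbol: str) -> List[Gene]:
        return [g for g in self._genes if g.hugo_gene_symbol == symbol]


# Placeholder data: the name and sequence are illustrative, not real annotations.
repo: GeneRepository = InMemoryGeneRepository(
    [Gene(name="example gene", hugo_gene_symbol="EX1", sequence="ACGT")]
)
found = repo.find_by_symbol("EX1")
```

Because clients only see `Gene` objects and the repository interface, the storage behind them can change without affecting callers, which is the property the Silver level asks for.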

caBIG compatibility: Programming and messaging interfaces

• Types of APIs
  • Client APIs in a programming language
  • Messaging APIs via a messaging protocol
• Types of systems
  • Data services provide access to an information model
    • Query method
    • Associations are “traversable”
  • Analytical services provide methods to manipulate data
  • Hybrid services provide methods to manipulate information models
  • Analytical tools consume Silver-compatible data but do not produce it
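The distinction between a data service and an analytical operation can be sketched in a few lines of Python. Everything here is hypothetical (the `Participant` and `Specimen` classes, the `query` signature, the sample data); it only illustrates a query method, a traversable association, and a function that consumes data objects.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Specimen:
    barcode: str


@dataclass
class Participant:
    mrn: str
    # Association: a participant's specimens can be reached from the participant.
    specimens: List[Specimen] = field(default_factory=list)


class ParticipantDataService:
    """Hypothetical data service: a query method over an information model."""

    def __init__(self, participants: List[Participant]) -> None:
        self._participants = list(participants)

    def query(self, predicate: Callable[[Participant], bool]) -> List[Participant]:
        return [p for p in self._participants if predicate(p)]


def count_specimens(participants: List[Participant]) -> int:
    """Toy analytical operation: consumes data objects and computes a result."""
    return sum(len(p.specimens) for p in participants)


# Made-up example data.
service = ParticipantDataService([
    Participant("A1", [Specimen("S-001"), Specimen("S-002")]),
    Participant("B2"),
])
with_tissue = service.query(lambda p: len(p.specimens) > 0)
# Traverse the association from each participant to its specimens.
barcodes = [s.barcode for p in with_tissue for s in p.specimens]
```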

caBIG compatibility: Programming and messaging interfaces details

Legacy
• No programmatic interfaces to the system are available. Only local data files in a custom format can be read
• Data transfer mechanisms implemented only on an ad hoc basis
• Examples: Executables

Bronze
• Programmatic access to data from an external resource is possible.
• Examples: Proprietary API/data format

Silver
• Well-described APIs provide access to data in the form of data objects.
• Standards-based electronic data formats are supported for both input to and output from the system.
• Standards-based messaging protocols are supported wherever messaging is relevant.
• Examples: JavaDocs; XML, ASN.1; SOAP, CORBA

Gold
• All features of Silver, plus:
• Service-oriented components produce or consume resources in the form of grid services
• Interoperable with data grid architecture to be defined by caBIG
• Examples: Globus; caGrid-based services

caBIG compatibility: caTRIP API

[Hyperlinks to caTRIP API docs]

caBIG compatibility: caTRIP grid service WSDL

[Hyperlinks to caTRIP API WSDL]

caBIG compatibility: caTRIP grid service WSDL

[Class diagram: Logical Model]

«interface» engine::FederatedQueryEngine
+ execute(Document) : CQLQueryResults

engine::FederatedQueryProcessor
+ processDCQLQueryPlan(DCQLQueryDocument) : CQLQuery

engine::FederatedQueryExecutor
+ executeCQLQuery(CQLQuery, String) : CQLQueryResults

ResultAggregator
+ aggregateGroups(Group[]) : Group
+ buildGroup(List) : Group
+ processResults(CQLQueryResults) : List

ServiceClientFactory
+ getSeviceClient() : Object

Object

caGridDataService1Client
+ query(CQLQuery) : CQLQueryResults

caGridDataService2Client
+ query(CQLQuery) : CQLQueryResults

Relationship labels: executes / obtains; executes

[Hyperlinks to caTRIP FQP UML]
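The class diagram suggests a flow in which a federated query is split into per-service queries, each is run through a service client, and the results are combined. The Python sketch below illustrates that flow under heavy simplifying assumptions: queries are plain dictionaries rather than DCQL/CQL documents, the service URLs are made up, and results are combined by intersecting on a shared key (`mrn`). It is not the caTRIP implementation.

```python
from typing import Dict, List

# Assumption: a "CQL query" is reduced to {"attribute": ..., "value": ...}.
CQLQuery = Dict[str, str]
Record = Dict[str, str]


class DataServiceClient:
    """Stand-in for a caGrid data service client exposing query(CQLQuery)."""

    def __init__(self, records: List[Record]) -> None:
        self._records = records

    def query(self, cql: CQLQuery) -> List[Record]:
        return [r for r in self._records if r.get(cql["attribute"]) == cql["value"]]


class ServiceClientFactory:
    """Maps a service URL to a client, mirroring the factory in the diagram."""

    def __init__(self, clients: Dict[str, DataServiceClient]) -> None:
        self._clients = clients

    def get_service_client(self, url: str) -> DataServiceClient:
        return self._clients[url]


class FederatedQueryEngine:
    def __init__(self, factory: ServiceClientFactory) -> None:
        self._factory = factory

    def execute(self, plan: Dict[str, CQLQuery], join_key: str) -> List[str]:
        """Run each per-service query, then keep the keys present in every result."""
        key_sets = []
        for url, cql in plan.items():
            rows = self._factory.get_service_client(url).query(cql)
            key_sets.append({row[join_key] for row in rows})
        if not key_sets:
            return []
        return sorted(set.intersection(*key_sets))


# Hypothetical service URLs and made-up records.
factory = ServiceClientFactory({
    "urn:example:tumor-registry": DataServiceClient([
        {"mrn": "A1", "histology": "lobular carcinoma"},
        {"mrn": "B2", "histology": "ductal carcinoma"},
        {"mrn": "C3", "histology": "lobular carcinoma"},
    ]),
    "urn:example:biomarkers": DataServiceClient([
        {"mrn": "A1", "her2_neu": "positive"},
        {"mrn": "B2", "her2_neu": "positive"},
        {"mrn": "C3", "her2_neu": "negative"},
    ]),
})
engine = FederatedQueryEngine(factory)
matching_mrns = engine.execute(
    {
        "urn:example:tumor-registry": {"attribute": "histology", "value": "lobular carcinoma"},
        "urn:example:biomarkers": {"attribute": "her2_neu", "value": "positive"},
    },
    join_key="mrn",
)
```

Intersecting on a key is only one way to combine results; the `ResultAggregator` in the diagram groups results, which this sketch does not attempt to reproduce.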

caBIG compatibility: Semantic interoperability

• The solution for semantic interoperability lies in object oriented UML design of the service, an unambiguous description of elements within the system and storage of the description in a publicly accessible repository (metadata).
  • UML model
  • Use of publicly accessible terminologies/vocabularies/ontologies (EVS-NCI Thesaurus)
  • Use of publicly accessible metadata repository (caDSR)

caBIG compatibility: Common data element (CDE) details

Legacy
• No structured metadata is recorded
• Example: Free-text pathology reports

Bronze
• Data element descriptions have sufficient detail for a subject matter expert to unambiguously interpret
• Data elements are built using controlled terminology
• Metadata is stored and publicized in an electronic format that is separate from the resource that is being described
• Example: GeneOntology from GO website

Silver
• Common Data Elements (CDEs) built from controlled terminologies and according to practices validated by the VCDE workspace are used throughout.
• CDEs are registered as ISO/IEC 11179 metadata components in the cancer Data Standards Repository (caDSR)
• Example: NCI Thesaurus, GeneOntology registered in EVS

Gold
• All features of Silver, plus:
• Common Data Elements (CDEs) designated as caBIG Standards by the VCDE workspace are used.
• Metadata is advertised and discoverable via the caBIG grid services registry
• Example: NCI Thesaurus

caBIG compatibility: Metadata stored in caDSR

Enterprise Vocabulary Services

• Storage of Metadata
  • caDSR = cancer Data Standards Repository
  • Common Data Elements = CDEs
  • Enable end-users to access information about data and services without having to access human developers
  • = Fusion of UML models + Concepts/Definitions

[Screenshot callouts]

• caDSR Search Tree: Displays all the current caDSR Contexts. Users can search for groups of DEs by navigating the tree.
• Data Element Search Pane: This is the main search window. Users looking for Data Elements can enter a key word or phrase.
• Navigation Menu: Use these buttons to navigate to the CDE cart, Form Builder, or back to Home (that is, back to this page).

caBIG compatibility: caTRIP CDEs

[Hyperlinks to caTRIP CDEs]

caBIG compatibility: Vocabulary/terminology details

Legacy
• Free text used throughout for data collection
• Example: Free-text pathology reports

Bronze
• Use of publicly accessible controlled vocabularies as well as local terminologies.
• Terminologies must include definitions of terms that meet caBIG VCDE workspace guidelines
• Example: GeneOntology from GO website

Silver
• Terminologies reviewed and validated by the caBIG Vocabulary/Common Data Element (VCDE) Workspace used for all relevant data collection fields.
• Example: NCI Thesaurus, GeneOntology registered in EVS

Gold
• All features of Silver, plus:
• Full adoption of caBIG terminology standards as approved by the VCDE Workspace.
• Example: NCI Thesaurus

caBIG compatibility: Publicly accessible terminologies

Enterprise Vocabulary Services

• Controlled vocabulary resources for the cancer research community
• Vocabulary Products and Services
  • NCI Thesaurus
  • NCI Metathesaurus
  • External Vocabularies
• NCI Thesaurus - controlled vocabulary source for metadata
  • Has excellent coverage of cancer terminology
  • Expands based on needs for additional terminology
  • Based on concepts rather than terms
  • Each concept has a unique identifier or CUI with definitions and synonyms
  • Housed by the Enterprise Vocabulary Service (EVS)
• LexBIG
  • a caBIG-funded vocabulary server to enable a Federated Vocabulary environment.
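"Based on concepts rather than terms" means that several terms resolve to one concept with a single identifier. The sketch below shows that idea in Python; the `Concept` and `Vocabulary` classes are hypothetical, and the code and definition values are placeholders rather than real NCI Thesaurus content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    code: str  # unique identifier; placeholder value below, not a real thesaurus code
    preferred_name: str
    definition: str
    synonyms: List[str] = field(default_factory=list)


class Vocabulary:
    """Hypothetical concept-based vocabulary: every term maps to one concept."""

    def __init__(self, concepts: List[Concept]) -> None:
        self._by_term: Dict[str, Concept] = {}
        for concept in concepts:
            for term in [concept.preferred_name, *concept.synonyms]:
                self._by_term[term.lower()] = concept

    def lookup(self, term: str) -> Optional[Concept]:
        """Resolve a term (case-insensitively) to its concept, or None."""
        return self._by_term.get(term.lower())


vocab = Vocabulary([
    Concept(
        code="C-EXAMPLE-1",
        preferred_name="Lobular Carcinoma",
        definition="Placeholder definition for illustration.",
        synonyms=["Carcinoma, Lobular"],
    )
])
concept = vocab.lookup("carcinoma, lobular")
```

Two systems that record different synonyms can still be compared, because both resolve to the same concept code.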

caBIG compatibility: caTRIP CDEs

[Hyperlinks to a caTRIP concept]

caBIG compatibility: Information model (UML) details

Legacy
• No model describing the system is available in electronic format

Bronze
• Diagrammatic representation of the information model is available in electronic format.

Silver
• Information models are defined in UML as class diagrams and are reviewed and validated by the VCDE workspace.

Gold
• All features of Silver, plus:
• Information models are harmonized across the caBIG Domain Workspaces

Examples: Database diagram; UML class diagram (cd StatML)

[Class diagram: cd StatML]

statml::Data

statml::Array
- base64Value: String
- dimensions: String
- name: String
- type: String

statml::List
- length: Integer
- name: String
- type: String

statml::Null

statml::Scalar
- name: String
- type: String
- value: String

Association ends recoverable from the diagram: +scalar 0..*, +null 0..*, +list 0..*, +array 0..*, +context 0..1 (each opposite a multiplicity of 1)

caBIG compatibility: Domain information modeling

• A Domain Information Model is a representation of our understanding of an area of knowledge.
• Domain Information Models consist of ‘Classes’ that represent ‘things’ in the real world
• Classes contain ‘attributes’ that are characteristics of different instances of things in the real world.
• Relationships between the classes are described by ‘associations’ and indicated by lines with directionality and cardinality
• Each class plus attribute creates one Common Data Element (CDE)

[Class diagram: cd Central Dogma]

Gene
+ name: String
+ hugoGeneSymbol: String
+ sequence: String

Transcript
+ sequence: String
+ length: String

Protein
+ name: String
+ aminoAcidSequence: String
+ molecularWeight: double

Association ends: +gene 1, +transcriptCollection 1..*, +transcript 1, +protein 1
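The rule "each class plus attribute creates one CDE" can be illustrated by enumerating the attributes of the three classes in the diagram. This Python sketch uses dataclasses for the model; the `Class.attribute` naming of candidate CDEs is an assumption made for illustration, not the caDSR naming convention.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Gene:
    name: str
    hugo_gene_symbol: str
    sequence: str


@dataclass
class Transcript:
    sequence: str
    length: str  # typed as String on the slide


@dataclass
class Protein:
    name: str
    amino_acid_sequence: str
    molecular_weight: float


def candidate_cdes(*classes: type) -> List[str]:
    """List one candidate CDE per (class, attribute) pair."""
    return [f"{cls.__name__}.{f.name}" for cls in classes for f in fields(cls)]


cdes = candidate_cdes(Gene, Transcript, Protein)
```

With three, two, and three attributes respectively, the model yields eight candidate CDEs.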

caBIG compatibility: Tumor Registry model

• Participant
• Diagnosis
• Treatment
• Follow up and Recurrence
• Collaborative Staging

[Hyperlinks to caTRIP UML]

Building caBIG Compatible Systems

Building caBIG compatible systems: Steps for creating an analytical system

• Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
• Step 2: implement the analytical system
  • Implement an interface
  • Map data objects to existing inputs
  • Plug-in analytics
• Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a service
  • Configure the service
  • Deploy
• Step 4: invoke the service
  • Java-based client
  • Use caTRIP
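Step 2 above (implement an interface, map data objects to existing inputs, plug in the analytics) can be sketched as follows. The `Scalar` domain object, the `AnalyticalService` interface, and the choice of `statistics.mean` as the "existing" analytic are all assumptions for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Scalar:
    # Hypothetical domain object of the kind modeled and registered in Step 1.
    name: str
    value: float


class AnalyticalService(ABC):
    """Hypothetical interface that every analytical service implements."""

    @abstractmethod
    def analyze(self, inputs: List[Scalar]) -> Scalar: ...


class MeanService(AnalyticalService):
    """Wraps an existing routine (statistics.mean) behind the interface."""

    def analyze(self, inputs: List[Scalar]) -> Scalar:
        values = [s.value for s in inputs]   # map data objects to existing inputs
        return Scalar("mean", mean(values))  # plug in the existing analytic


result = MeanService().analyze([Scalar("x", 1.0), Scalar("y", 3.0)])
```

The wrapped routine never sees the domain objects, so an existing analysis can be exposed without rewriting it.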

Building caBIG compatible systems: Steps for creating a data system

• Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
• Step 2: implement the information system
  • Model the databases (via scripts or EA)
  • Build the database
  • Generate Java beans
  • Create Hibernate mappings
  • Jar it all up
• Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a service
  • Configure the service
  • Deploy
• Step 4: invoke the service
  • Java-based client
  • Use caTRIP
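Step 2 of the data-system recipe pairs a database with generated beans and an object-relational mapping. The slides name Java beans and Hibernate; the sketch below swaps in Python's standard `sqlite3` module and a hand-written mapping purely to show the shape of the idea. The table layout and helper names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    # Plays the role of a generated bean for one domain class.
    name: str
    hugo_gene_symbol: str


def build_database() -> sqlite3.Connection:
    """Build the database: one table per domain class (in memory for the sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (name TEXT, hugo_gene_symbol TEXT)")
    return conn


def save(conn: sqlite3.Connection, gene: Gene) -> None:
    """Object-to-row mapping."""
    conn.execute(
        "INSERT INTO gene (name, hugo_gene_symbol) VALUES (?, ?)",
        (gene.name, gene.hugo_gene_symbol),
    )


def find_by_symbol(conn: sqlite3.Connection, symbol: str) -> List[Gene]:
    """Row-to-object mapping."""
    rows = conn.execute(
        "SELECT name, hugo_gene_symbol FROM gene WHERE hugo_gene_symbol = ?",
        (symbol,),
    )
    return [Gene(*row) for row in rows]


conn = build_database()
save(conn, Gene("example gene", "EX1"))  # placeholder data
found = find_by_symbol(conn, "EX1")
```

A mapping tool such as Hibernate generates the equivalent of `save` and `find_by_symbol` from configuration instead of hand-written SQL.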

Building caBIG compatible systems: N-tier architecture

[Figure: N-tier architecture. Recoverable labels: database; Object-relational mapping; domain model; caCORE SDK; CQL Engine; caGrid Data Service; Index Service; Distributed Query Engine. Arrow labels: advertise (twice), CQL Query.]

Building caBIG compatible systems: caCORE SDK

[Figure: caCORE SDK workflow. Recoverable labels: UML Model; XMI File; Semantic Integration Workbench (SIW); EVS; Verified EVS Report; Fixed XMI; Verified Annotated Fixed XMI; Compatibility Review; Approved Annotated Fixed XMI; UML Loader; Load to Stage; caDSR Stage; Successful Test? (YES/NO); caDSR Production; Using CodeGen? (YES/NO); Code Generator; Metadata Retrieval; caDSR Services; Terminology Services; Public APIs. Compatibility areas shown: Info Model; Messaging Interfaces/API; Common Data Elements; Vocabularies.]

caBIG compatibility: Mapping UML to CDEs

• UML Class maps to Object Class (OC)
• UML Attribute maps to Property
• UML Class Attribute maps to Data Element Concept (DEC)
• UML Datatype maps to Value Domain (VD)
• UML Class Attribute Datatype maps to Common Data Element (CDE)

[Diagram also shows: EVS Concept]

caBIG compatibility: Mapping UML to CDEs example

• Class: Gene maps to Object Class: Gene
• Attribute: entrezGeneID maps to Property: Entrez Gene Genomic Identifier
• Datatype: String maps to Value Domain: java.lang.String
• Created Data Element: Gene Entrez Gene Genomic Identifier java.lang.String
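The composition in this example (object class plus property gives a data element concept, and adding a value domain gives the data element) can be written down directly. The Python classes below are an illustrative sketch and do not reflect the caDSR API; the `long_name` format simply reproduces the string shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # from the UML class
    property_name: str  # from the UML attribute


@dataclass(frozen=True)
class ValueDomain:
    datatype: str       # from the UML datatype


@dataclass(frozen=True)
class CommonDataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

    def long_name(self) -> str:
        """Join the three parts in the order shown on the slide."""
        return (
            f"{self.concept.object_class} "
            f"{self.concept.property_name} "
            f"{self.value_domain.datatype}"
        )


cde = CommonDataElement(
    DataElementConcept("Gene", "Entrez Gene Genomic Identifier"),
    ValueDomain("java.lang.String"),
)
```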

caBIG compatibility: Use SIW to designate existing CDEs

caGrid

Background, service creation, metadata

caGrid: What is caGrid?

• What is Grid?
  • Evolution of distributed computing to support sciences and engineering
  • Sharing of resources (computational, storage, data, etc.)
  • Secure Access (global authentication, local authorization, policies, trust, etc.)
  • Open Standards
  • Virtualization
• What is caGrid?
  • Development project of Architecture Workspace
    • Helping define and implement Gold Compliance
  • Implementation of Grid technology
    • Leverages open standards, community open source projects
  • No requirements on implementation technology necessary for compliance
    • Specifications will be created defining requirements for interoperability
    • caGrid provides core infrastructure, and tooling to provide “a way” to achieve Gold compliance
  • Gold compliance creates the G in caBIG™
    • Gold => Grid => connecting Silver Systems

caGrid: Metadata infrastructure goals

• Support strongly typed grid
• Syntactic and Semantic interoperability
• Programmatic!
• Smooth transition from Application to Grid and back
• Leverage wealth of existing metadata
• Enable service Advertisement and Discovery

caGrid: Service development process

• Service developers first create a service using a simple wizard to specify information (target directory, type of service, service name, etc.)
• Next, developers locate the data types they will use for inputs or outputs
  • Can be discovered from the caDSR, GME, file system, etc.
• Operations are then defined that take some number of the data types as input, and produce some number as output
• Metadata and Service Properties can be added and configured
• The service’s security can be completely configured
• Some or all of these steps may be automatically handled by extensions

caGrid: Introduce

• GUI for creating and manipulating a grid service
• Provides means of simple creation of service skeleton that a developer can then implement, build, and deploy
• Automatic code generation of complete caBIG compliant grid service which is configured to provide:
  • Advertisement
  • Standard Metadata
  • Security
  • Complete Client API

caGrid: Steps for creating a data system

• Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
• Step 2: implement the information system
  • Model the databases (via scripts or EA)
  • Build the database
  • Generate Java beans
  • Create Hibernate mappings
  • Jar it all up
• Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a service
  • Configure the service
  • Deploy
• Step 4: invoke the service
  • Java-based client
  • Use caTRIP

caGrid: Steps for creating an analytical system

• Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
• Step 2: implement the analytical system
  • Implement an interface
  • Map data objects to existing inputs
  • Plug-in analytics
• Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a service
  • Configure the service
  • Deploy
• Step 4: invoke the service
  • Java-based client
  • Use caTRIP

caGrid: caGrid data description infrastructure

• Client and service APIs are object oriented, and operate over well-defined and curated data types

• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)

• Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described

• XML serialization of objects adhere to XML schemas registered in the Global Model Exchange (GME)

[Figure: caGrid data description infrastructure. A grid client (client API) talks to a grid service (service API). The service definition is a WSDL whose data type definitions are XSDs registered in the Global Model Exchange (GME). Objects serialize to XML, which validates against those XSDs. Object definitions are registered in the Cancer Data Standards Repository and semantically described in the Enterprise Vocabulary Services; the client uses these core services.]
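
The "objects serialize to XML" relationship can be sketched with a small round trip. This is a minimal illustration using only the Python standard library; the `Organization` class mirrors the identifier/name pair shown later in the BRIDG extract, while the namespace URI and the helper names are assumptions. Checking the qualified element name here is only a stand-in for real validation against an XSD from the GME.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical namespace; a real one would be registered in the GME.
NAMESPACE = "gme://example.org/1.0/demo.domain"
ORG_TAG = f"{{{NAMESPACE}}}Organization"


@dataclass
class Organization:
    identifier: str
    name: str


def to_xml(org: Organization) -> str:
    """Serialize the object as a namespace-qualified XML element."""
    element = ET.Element(ORG_TAG, {"identifier": org.identifier, "name": org.name})
    return ET.tostring(element, encoding="unicode")


def from_xml(text: str) -> Organization:
    """Parse the element back, rejecting anything from another namespace."""
    element = ET.fromstring(text)
    if element.tag != ORG_TAG:
        raise ValueError(f"unexpected element {element.tag!r}")
    return Organization(element.attrib["identifier"], element.attrib["name"])


original = Organization("ORG-1", "Duke Comprehensive Cancer Center")
round_tripped = from_xml(to_xml(original))
```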

Page 57: caBIG Data Structures

caGrid: Metadata services

• Cancer Data Standards Repository (caDSR)
    • caBIG projects register their data models as Common Data Elements (CDEs), which are semantically harmonized and then centrally stored and managed in the caDSR
    • The caDSR grid service provides:
        • Model discovery and traversal
        • caGrid standard metadata generation capabilities

• Enterprise Vocabulary Services (EVS)
    • EVS is a set of services and resources that address the need for controlled vocabulary
    • The EVS grid service provides:
        • Query access to the data semantics and controlled vocabulary managed by the EVS

• Global Model Exchange (GME)
    • GME is a DNS-like data definition registry and exchange service that is responsible for storing and linking together data models in the form of XML schemas
    • The GME grid service provides:
        • Access to the authoritative structural representation of data types on the grid

• Globus Information Services: Index Service
    • The Globus Information Services infrastructure provides a generic framework for aggregation of service metadata, a registry of running Grid services, and a dynamic data-generating and indexing node, suitable for use in a hierarchy or federation of services
    • The Index grid service provides:
        • Yellow and white pages for the grid
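
The "DNS-like registry that stores and links schemas" idea behind the GME can be illustrated with a toy in-memory registry. This sketch is not the GME API: `SchemaRegistry`, its methods, and the example namespaces are all hypothetical, and the schema bodies are placeholders.

```python
from typing import Dict, Iterable, List, Tuple


class SchemaRegistry:
    """Toy registry mapping a namespace to its schema text and imports."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Tuple[str, Tuple[str, ...]]] = {}

    def publish(self, namespace: str, xsd_text: str,
                imports: Iterable[str] = ()) -> None:
        """Register a schema once; a namespace is authoritative and unique."""
        if namespace in self._schemas:
            raise ValueError(f"{namespace} is already published")
        self._schemas[namespace] = (xsd_text, tuple(imports))

    def resolve(self, namespace: str) -> str:
        """Look up the schema text for a namespace, like a name lookup."""
        return self._schemas[namespace][0]

    def closure(self, namespace: str) -> List[str]:
        """Return the namespace plus everything it transitively imports."""
        seen: List[str] = []
        stack = [namespace]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(self._schemas[current][1])
        return seen


registry = SchemaRegistry()
registry.publish("gme://example.org/1.0/common", "<xs:schema/>")
registry.publish("gme://example.org/1.0/tissue", "<xs:schema/>",
                 imports=["gme://example.org/1.0/common"])
needed = registry.closure("gme://example.org/1.0/tissue")
```

A client that receives an object in the `tissue` namespace would fetch every schema in `needed` before validating it.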

Page 58: caBIG Data Structures

caGrid: caGrid production environment

Page 59: caBIG Data Structures

The Cancer Translational Research Informatics Platform (caTRIP)

Demonstration

Page 60: caBIG Data Structures

caTRIP: Clinical and research scenarios

• Clinical scenario for demonstration
    • A patient enters the clinic and is diagnosed with a lobular carcinoma
    • The Her2/Neu biomarker test comes back positive
    • What are the treatments and outcomes of other patients with similar characteristics?
    • Query for diagnosis date, treatment, treatment date, survival, recurrence, and BRCA1 and BRCA2 status
    • Look for treatments given with success and for any correlation with BRCA status, in case the BRCA test should be ordered

• Research scenario for demonstration
    • Is there a correlation between recurrence, mortality, histologic grade, and Her2/Neu status for breast cancer patients diagnosed with lobular carcinoma?
    • Query caTRIP for recurrence type, date of death, histologic grade, and Her2/Neu status for patients diagnosed with lobular carcinoma
    • Correlation is determined in Microsoft Excel
    • Investigate gene biomarkers that correlate with a Her2/Neu status of negative and survival
    • Query caTRIP for all available tissue to order for microarray experiments

• Query sharing
    • What are all the triple negative patients?

Page 61: caBIG Data Structures

caTRIP: Why the Simple GUI?

• What are all the tissue specimens from her2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?

[Figure: answering this question spans four data services (caTissue CORE, CAE, Tumor Registry, and CGEMS), linked by the participant medical record number.]

Page 62: caBIG Data Structures

Discussion/questions

Page 63: caBIG Data Structures

Backup Slides

Page 64: caBIG Data Structures

CTMS Interoperability Project

Goals, scope, BRIDG, architecture, demo

Page 65: caBIG Data Structures

CTMSi: A collaborative effort

11 Organizations
• Booz Allen Hamilton
• Dana-Farber
• Duke University
• Ekagra
• Harvard University
• Mayo Clinic
• NCICB
• Nortel Government Solutions
• Northwestern University
• ScenPro
• SemanticBits

8 Locations
• Maryland
• Minnesota
• Virginia
• Georgia
• Massachusetts
• North Carolina
• Illinois
• France

35+ Team Members / 5 Applications
• Cancer Central Clinical Participant Registry (C3PR)
• Cancer Central Clinical Database (C3D)
• Patient Study Calendar (PSC)
• caXchange: LabViewer and the Clinical Trials Object Model (CTOM)
• Cancer Adverse Events Reporting System (caAERS)

8 Roles
• Analysts
• Architects
• Developers
• Project Director
• Project Manager
• Project Sponsor
• Project Tech Leads
• Subject Matter Experts

Page 66: caBIG Data Structures

CTMSi: Credits

Project Director: Meg Gronvall (BAH), Charles N. Mead, M.D. (BAH)

NCICB CTMS Lead: Christo Andonyadis, D.Sc. (NCICB)

Project Manager: Edmond Mulaire (SemanticBits)

Project Architects: Patrick McConnell (Duke), Niket Parikh (BAH)

Analysts: Smita Hastak (ScenPro), Wendy Ver Hoef (ScenPro)

Subject Matter Experts: Sharon Elcombe (Mayo Clinic), Vijaya Chadaram (Duke), Jomol Mathew (Dana-Farber), Renee Webb (Northwestern)

NCICB Systems Support: Gavin Brennan (TerpSys), Vanessa Caldwell (TerpSys), Doug Kanoza (TerpSys), Wei Lu (TerpSys), Ralph Rutherford (TerpSys)

Project Technical Leads: Ram Chilukuri (SemanticBits), Charles Griffin (Ekagra), Vinay Kumar (SemanticBits), Stephen Reckford (Nortel Government Solutions), Rhett Sutphin (Northwestern), Sean Whitaker (Northwestern)

caAERS: Ram Chilukuri (SemanticBits), Krikor Krumlian (Akaza Research), Vinay Kumar (SemanticBits), Rhett Sutphin (Northwestern), Kulasekaran Sethumadhavan (SemanticBits), Sujith Thayylithodi (SemanticBits)

caGrid: Manav Kher (SemanticBits), Vinay Kumar (SemanticBits), Joshua Phillips (SemanticBits)

caXchange (Lab Viewer/CTOM): Charles Griffin (Ekagra), Smita Hastak (ScenPro), Mukesh Mediratta (Ekagra), Kunal Modi (Ekagra), Wendy Ver Hoef (ScenPro)

caXchange Extensions: Ekagra, SemanticBits

C3D: Srinivas Batchu (Ekagra), Patrick Conrad (Ekagra), Rangaraju Gadiraju (Ekagra), Stephen Reckford (Nortel)

C3PR: Kruttik Aggarwal (SemanticBits), Ram Chilukuri (SemanticBits), Ramakrishna Gundala (SemanticBits), Manav Kher (SemanticBits), Patrick McConnell (Duke), Priyatam Mudivarti (SemanticBits)

PSC: Rhett Sutphin (Northwestern), Sean Whitaker (Northwestern)

Page 67: caBIG Data Structures

CTMSi: Goal

[Figure: CTMSi goal. Integrate patient scheduling, participant registration, lab results, the clinical trials database, and adverse events through caGrid and caXchange.]

Page 68: caBIG Data Structures

CTMSi: BRIDG extract

[Figure: UML class diagram "CTMS Interoperability BRIDG-Based Analysis Model for Data Exchange" (author: Smita Hastak; version 1.0; created 8/13/2001; updated 1/12/2007). The diagram groups its classes into six areas: Subject, Site, Study, Labs, Adverse Events, and Eligibility. The classes, attributes, and notes recovered from the diagram follow.]

Classes (parent class in parentheses where the diagram shows one):
• Clinical Research Entities and Roles::Person: + administrativeGenderCode: BRIDGCodedConcept; + dateOfBirth: dateTime; - ethnicGroup: string; - firstName: string; - lastName: string; - race: string
• Clinical Research Entities and Roles::Participant (PersonRole): the Person attributes above; from Role: + id: BRIDGID
• Clinical Research Activities and Participation::StudySubject (Participation): + studySubjectIdentifier: BRIDGID; from Participation: + endDate: dateTime; + identifier: BRIDGID; + startDate: dateTime; + status: BRIDGStatus
• Clinical Research Activities and Participation::PerformedActivity: + endDateTime: dateTime; + startDateTime: dateTime
• BRIDG Shared Classes::Activity: + codedDescription: BRIDGCodedConcept; + description: BRIDGDescription; + status: BRIDGStatus; + type: BRIDGCodedConcept
• Clinical Research Activities and Participation::StudySite (Participation): from Participation: + endDate: dateTime; + identifier: BRIDGID; + startDate: dateTime; + status: BRIDGStatus
• Clinical Research Entities and Roles::Organization: + identifier: BRIDGID; + name: string
• Clinical Research Entities and Roles::HealthCareSite (OrganizationRole): from Organization: + identifier: BRIDGID; + name: string; from Role: + id: BRIDGID
• Clinical Research Activities and Participation::LabTest
• Clinical Research Activities and Participation::LabResult (ObjectiveResult, QuantitativeMeasurement): + textResult: string; from QuantitativeMeasurement: + numericResult: float; + numericUnits: BRIDGCodedConcept; + referenceRangeComment: string; + referenceRangeHigh: int; + referenceRangeLow: int
• StudyParticipantEligibility: + isEligible: boolean
• Observations::AdverseEvent (Observation): - verbatimTerm: String; from Activity: + codedDescription: BRIDGCodedConcept; + description: BRIDGDescription; + status: BRIDGStatus; + type: BRIDGCodedConcept
• Identifier: + identifier: BRIDGID; + type: BRIDGCodedConcept
• Clinical Research Activities and Participation::Study: + id: BRIDGID; + longTitle: string

Association ends legible in the diagram: labTest (1) to labResult (0..1); role names "are performed at" and "participate in".

Notes on the diagram:
• In implementation: do NOT use endDate, startDate, status
• In implementation: do NOT use identifier; C3PR only uses SubjectIdentifier
• Green notes mark classes where attributes inherited from the same superclass are inherited in two different subclasses but are not necessarily used in both.
• Note to implementers: This is an analysis model, not an implementation model, and therefore supplemental attributes may be required in your implementation model to support data exchange between applications (e.g., extra ids). Furthermore, it may be that not all attributes included here are required for data exchanges, and some may be eliminated from this model. It is also likely that an implementation based on this model may collapse associations to simplify the structure of data exchanges.
• Disclaimer: BRIDG classes used in this model have been pared down to only what is needed for data exchange in the CTMS Interoperability project, and this in no way indicates or suggests changes to the official BRIDG model.
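
To make the analysis model concrete, here is one way a few of its classes might look in an implementation. This is a rough sketch, not the CTMSi code: attribute names follow the diagram, but coded concepts are simplified to plain strings, the `BridgId` fields (source, version, value) are taken from the BRIDGID type in the full logical model, and the choice of required versus optional fields is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class BridgId:
    """BRIDGID: an identifier qualified by its issuing source and version."""
    source: str
    version: str
    value: str


@dataclass
class Person:
    date_of_birth: datetime
    first_name: str
    last_name: str
    # BRIDGCodedConcept is simplified to a plain string in this sketch.
    administrative_gender_code: Optional[str] = None
    ethnic_group: Optional[str] = None
    race: Optional[str] = None


@dataclass
class Participant(Person):
    """A Person playing the participant role; id comes from Role."""
    id: Optional[BridgId] = None


@dataclass
class StudySubject:
    """Links a participant to a study via studySubjectIdentifier."""
    study_subject_identifier: BridgId
    participant: Participant


participant = Participant(
    date_of_birth=datetime(1960, 1, 1),
    first_name="Jane",
    last_name="Doe",
    id=BridgId(source="example-registry", version="1", value="P-001"),
)
subject = StudySubject(
    study_subject_identifier=BridgId("example-study", "1", "SUBJ-42"),
    participant=participant,
)
```

Following the diagram's implementation notes, the sketch leaves out the endDate, startDate, status, and identifier attributes of Participation.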

Page 69: caBIG Data Structures

[Figure: UML class diagram "Comprehensive Logical Model" (author: Fridsma; version 1.0; created 7/22/2005; updated 7/29/2005). Its packages include Entities and Roles, BasicTypes, BusinessObjects, Plans, Protocol Concepts, Design Concepts, Statistical Concepts, OProtocolStructure, and OStudy Design and Data Collection. The diagram is divided into six areas: Protocol Authoring and Documentation; Clinical Trial Design; Structured Statistical Analysis; Clinical Trial Registration; Eligibility Determination; and Protocol activities and Safety monitoring (AE).]

Page 70: caBIG Data Structures

CTMSi: Architectural overview

[Figure: architectural overview. The caXchange enterprise service bus runs on caGrid and consists of an inbound binding component, routing rules, messages, and an outbound binding component. Five applications connect to it: C3PR (Oracle, grid service), C3D (Oracle, web service), PSC (PostgreSQL, grid service), LabViewer/CTOM (Oracle, grid service), and caAERS (PostgreSQL, grid service). Security is provided by Dorian (authentication), GTS (trust), and Grid Grouper (authorization).]
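
The routing-rules idea in this architecture (one inbound message fanned out to several applications) can be sketched as below. This is a toy illustration, not the caXchange implementation: `MessageBus`, its methods, and the payload fields are hypothetical, and the handlers merely record what they receive.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


class MessageBus:
    """Toy service bus: routing rules map a message type to destinations."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Handler]] = defaultdict(list)

    def add_route(self, message_type: str, handler: Handler) -> None:
        """Routing rule: deliver every message of this type to the handler."""
        self._routes[message_type].append(handler)

    def publish(self, message_type: str, payload: dict) -> int:
        """Deliver the payload to each destination; return how many got it."""
        handlers = self._routes.get(message_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


received: List[Tuple[str, str]] = []

bus = MessageBus()
# Stand-ins for the outbound bindings to two downstream applications.
bus.add_route("registerPatient", lambda p: received.append(("PSC", p["studyId"])))
bus.add_route("registerPatient", lambda p: received.append(("caAERS", p["studyId"])))

delivered = bus.publish("registerPatient", {"studyId": "STUDY-1"})
unrouted = bus.publish("unknownMessage", {})
```

The sending application only knows the message type; which systems receive it is decided entirely by the routing rules.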

Page 71: caBIG Data Structures

CTMSi: Demonstration

[Figure: overview sequence diagram. Lifelines: SME (the user), C3PR, caXchange, PSC, LabViewer, caAERS, CTOM, and the C3D web service.]

Narrative notes from the diagram:
• The user creates a new patient and registers the patient to a protocol, checking the eligibility status. The protocol is already prepopulated among all the systems.
• The user has a hot-link from the C3PR interface to the PSC interface, where the patient appears as registered on the prepopulated protocol.
• The user hot-links over to the Lab Viewer to view lab activities.
• It may not be possible to hot-link to C3D, but the data should be propagated there and viewable from the C3D interface. caXchange (or some component hooked into caXchange) loads data into C3D.
• A new AE with some minimal information is created and sent to caAERS through caXchange.
• The user hot-links from the LabViewer to caAERS, where the user can edit and submit the AE.
• The user hot-links from caAERS to PSC, where the user sees the AE notification and makes appropriate changes.

Messages, in the order they appear in the diagram:
• registerPatient
• registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
• isValidProtocol(studyId)
• patientPositionId = getPatientPosition(site, studyId)
• registerPatient(Participant, StudySubject, StudySite, HealthCareSite), shown four more times
• viewSchedule
• scheduleActivity
• viewLabActivities
• viewLabData(Patient)
• viewLabData
• Lab[] = query(id[])
• loadLabData
• loadLabData(Participant, StudySubject, Study, LabTest, LabResult)
• loadLabData(mrn, studyId, lab, labTest)
• viewPatient
• viewLabData
• selectLabForAE
• Lab[] = query(id[])
• newAE(Participant, StudySubject, Study, LabTest, LabResult)
• id = newAE(Participant, StudySubject, Study, LabTest, LabResult, AE)
• editAE
• submitAE
• submitAE(Participant, StudySubject, Study, AE)
• flagAE(Participant, StudySubject, Study, AE)
• login
• aeNotification
• modifySchedule

Page 72: caBIG Data Structures

Service Metadata: All Services

• Common Service Metadata

• Provided by all services

• Details service’s capabilities, operations, contact information, hosting research center

• Service operation’s inputs and outputs defined in terms of structure and semantics extracted from caDSR and EVS

• Majority auto-generated by Introduce

Page 73: caBIG Data Structures

Service Metadata: Service Security

• Service Security Metadata

• Provided by all services

• Details the service’s requirements on communication channel for each operation

• Can be used by client to programmatically negotiate an acceptable means of communication

• For example: Does operation X allow anonymous clients, or are credentials required?

• Auto-generated by Introduce
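
The "programmatically negotiate" bullet can be illustrated with a client that inspects per-operation security metadata before calling. This is a sketch under assumptions: the real metadata is an XML document generated by Introduce, whereas `OperationSecurity`, its two flags, and `plan_call` are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OperationSecurity:
    """Hypothetical, simplified view of one operation's requirements."""
    anonymous_allowed: bool
    requires_encryption: bool


def plan_call(metadata: Dict[str, OperationSecurity], operation: str,
              credential: Optional[str]) -> str:
    """Decide how to invoke an operation, or fail before contacting it."""
    policy = metadata[operation]
    if credential is None and not policy.anonymous_allowed:
        raise PermissionError(f"{operation} requires credentials")
    identity = "anonymous" if credential is None else "authenticated"
    channel = "encrypted" if policy.requires_encryption else "plain"
    return f"{identity}/{channel}"


service_metadata = {
    "getServiceMetadata": OperationSecurity(anonymous_allowed=True,
                                            requires_encryption=False),
    "query": OperationSecurity(anonymous_allowed=False,
                               requires_encryption=True),
}

public_plan = plan_call(service_metadata, "getServiceMetadata", None)
secure_plan = plan_call(service_metadata, "query", "user-grid-certificate")
```

Because the decision is made from metadata, the client can refuse or prompt for credentials without a failed round trip to the service.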

Page 74: caBIG Data Structures

Service Metadata: Data Service

• Data Service Metadata

• Provided by all data services

• Describes the Domain Model being exposed, in terms of a UML model linked to semantics

• Provides information needed to formulate the Object-Oriented Query

• As with common metadata, data types defined in terms of structure and semantics extracted from caDSR and EVS

• Auto-generated by Introduce

Page 75: caBIG Data Structures

caTRIP in-depth: Architecture (security)

[Figure: security architecture. Authentication: user credentials go to the caGrid Authentication Service, which uses the Duke authentication plugin against the Duke domain controller (NT security) and produces a SAML assertion; Dorian turns the assertion into a user grid certificate. Authorization: the grid data service uses CSM and GridGrouper to control access to backend data. The trust fabric underlies both.]

Page 76: caBIG Data Structures

caTRIP in-depth: Data sharing (challenges in data sharing)

• Building data-oriented systems

• Duke requires IRB approval to gain access to identifiable data

• We worked around by leveraging people already on IRB protocols

• Deidentifying data

• Data is owned by different groups across the cancer center

• Traditional deidentification: data manager deidentifies an entire dataset then throws away the key

• Distributed deidentification: trusted service provider (TSP) deidentifies discrete values

• Traditional approach is not scalable – requires a middle-man

• IRB approval required for distributed approach because it deviates from traditional deidentification (at Duke)

Page 77: caBIG Data Structures

caTRIP in-depth: Data sharing (distributed deidentification)

[Figure: distributed deidentification. Data owners, who have IRB approval to see identifiable data, send PHI identifiers over a secure connection to the Trusted Service Provider, which has IRB approval to store identifiable data. The TSP keeps a PHI-to-DEID mapping whose deidentified values are randomly generated.]

PHI     DEID
MRN1    ABC123
MRN2    DEF456
MRN3    GHI789
. . .   . . .
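
The mapping the trusted service provider maintains can be sketched in a few lines. This is a minimal illustration of the idea only: `TrustedServiceProvider` and `deidentify` are hypothetical names, the mapping lives in memory, and the secure connection and access control shown in the figure are out of scope.

```python
import secrets
from typing import Dict, Set


class TrustedServiceProvider:
    """Toy TSP: maps each identifier to a stable, random deidentified value."""

    def __init__(self) -> None:
        self._phi_to_deid: Dict[str, str] = {}
        self._issued: Set[str] = set()

    def deidentify(self, mrn: str) -> str:
        """Return the DEID for an MRN, generating one on first sight."""
        if mrn not in self._phi_to_deid:
            deid = secrets.token_hex(8).upper()
            while deid in self._issued:  # regenerate on the rare collision
                deid = secrets.token_hex(8).upper()
            self._issued.add(deid)
            self._phi_to_deid[mrn] = deid
        return self._phi_to_deid[mrn]


tsp = TrustedServiceProvider()
# Two data owners deidentify discrete values independently...
from_tissue_bank = tsp.deidentify("MRN3")
from_tumor_registry = tsp.deidentify("MRN3")
other_patient = tsp.deidentify("MRN1")
```

Because both owners receive the same DEID for the same MRN, their deidentified records can still be joined, which is what a one-off deidentification of a whole dataset with a discarded key cannot offer.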

Page 78: caBIG Data Structures

caTRIP in-depth: Architecture (Simple GUI configuration)

[Figure: Simple GUI configuration. Each service (Service A, Service B) exposes a target class with an associated object tree of associated classes and filter objects, each with an association direction. Services are connected by foreign associations: an outbound path from a linking object in one service and inbound paths into the other, with a join condition expressed as a CDE (e.g., MRN). Example classes shown: TissueSpecimen, ParticipantMedicalIdentifier, SpecimenCharacteristics, SpecimenCollectionGroup, ClinicalReport, and BreastCancerBiomarkers.]

Page 79: caBIG Data Structures

caTRIP in-depth: Architecture (caBIG compatibility)

• Challenge

• Silver-compatibility is in some ways (and for good reason) stringent

• Grid technologies were still in development (caGrid 1.0 is now released)

• caTRIP is a silver-compatible application (in theory)

• Compatibility submission package completed

• Going through review now for silver-compatible data services

• caTRIP leverages caCORE technologies

• Common Security Module (CSM) provides authorization

• caCORE-SDK provides tooling to create Java classes from UML (XMI), XML schemas, and castor mappings

• caTRIP leverages caGrid technologies

• Index Service provides advertisement and discovery

• Authentication Service provides

• Dorian helps provide authentication

• GTS provides trust fabrics

Page 80: caBIG Data Structures

Next steps

• Aggregate data from multiple services of the same type
    • Scenario: caTissue Suite deployed at 13 cancer centers

• Add datasets and data types
    • CTMS, population sciences, basic science, etc.

• Add analytical services
    • Integrate with workflow
    • Add visualization components

• Enhanced reporting
    • Automate Excel pivot table
    • Data mining results

• Enhanced querying
    • Asynchronous, parallel querying
    • Querying multiple deployed distributed query services

• Continue refinement of user interface
    • Synchronization of advanced and simple GUI
    • Additional usability features

Page 81: caBIG Data Structures

caGrid: caBIG Resources

• caBIG™ Website: http://cabig.cancer.gov/index.asp

• caBIG™ Compatibility Guidelines: https://cabig.nci.nih.gov/compatibility_guidelines_documentation/

• Cancer Common Ontologic Representation Environment (caCORE): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview

• Enterprise Vocabulary Services (EVS): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/vocabulary

• Cancer Data Standards Repository (caDSR): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr

• caCORE Software Developer’s Kit (caCORE SDK): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacoresdk

• caCORE Training: http://ncicb.nci.nih.gov/NCICB/training/cadsr_training

• Model Driven Architecture: http://www.omg.org/mda/

• UML Modeling: http://www.sparxsystems.com.au/UML_Tutorial.htm

Page 82: caBIG Data Structures

caTRIP Why can’t I just write DCQL?

• What are all the tissue specimens from her2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?

<DCQLQuery xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <TargetObject name="edu.wustl.catissuecore.domainobject.impl.TissueSpecimenImpl" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTissueCore">
    <Association name="edu.wustl.catissuecore.domainobject.impl.SpecimenCollectionGroupImpl" roleName="specimenCollectionGroup">
      <Association name="edu.wustl.catissuecore.domainobject.impl.ClinicalReportImpl" roleName="clinicalReport">
        <Association name="edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl" roleName="participantMedicalIdentifier">
          <Group logicRelation="AND">
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CAE">
                <Association name="edu.duke.catrip.cae.domain.general.Participant" roleName="participant">
                  <Association name="edu.pitt.cabig.cae.domain.general.AnnotationEventParameters" roleName="annotationEventParametersCollection">
                    <Association name="edu.pitt.cabig.cae.domain.breast.BreastCancerBiomarkers" roleName="annotationSetCollection">
                      <Attribute name="HER2Status" predicate="LIKE" value="POSITIVE%"/>
                    </Association>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.cabig.tumorregistry.domain.PatientIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.cabig.tumorregistry.domain.PatientIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry">
                <Association name="edu.duke.cabig.tumorregistry.domain.Patient" roleName="patient">
                  <Association name="edu.duke.cabig.tumorregistry.domain.Diagnosis" roleName="diagnosisCollection">
                    <Attribute name="primarySite" predicate="LIKE" value="BREAST%"/>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant</Object>
                  <Property>studySubjectIdentifier</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CGEMS">
                <Association name="gov.nih.nci.caintegrator.domain.analysis.snp.bean.SNPAnalysisGroup" roleName="analysisGroupCollection">
                  <Attribute name="name" predicate="LIKE" value="BRCA1%"/>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
          </Group>
        </Association>
      </Association>
    </Association>
  </TargetObject>
</DCQLQuery>

Query annotations: Select tissue; Foreign Join w/ CAE (HER2/NEU Positive); Foreign Join w/ Tumor Registry (Primary Site Breast); Foreign Join w/ CGEMS (BRCA1 Positive)
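To make the structure of the query easier to see, the sketch below walks a DCQL document and lists each foreign join with its target service and filters. It is only an illustration in Python using the standard library: the helper name `foreign_joins` is hypothetical, the embedded query is an abbreviated copy of the slide's query (two of the three foreign associations), and it has not been validated against the DCQL schema.

```python
import xml.etree.ElementTree as ET

DCQL_NS = "http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql"
NS = {"d": DCQL_NS}

# Abbreviated copy of the slide's query, kept only to exercise the parser.
ABBREVIATED_DCQL = """
<DCQLQuery xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
 <TargetObject name="edu.wustl.catissuecore.domainobject.impl.TissueSpecimenImpl"
   serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTissueCore">
  <Group logicRelation="AND">
   <ForeignAssociation>
    <JoinCondition>
     <LeftJoin>
      <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
      <Property>medicalRecordNumber</Property>
     </LeftJoin>
     <RightJoin>
      <Object>edu.duke.cabig.tumorregistry.domain.PatientIdentifier</Object>
      <Property>medicalRecordNumber</Property>
     </RightJoin>
    </JoinCondition>
    <ForeignObject name="edu.duke.cabig.tumorregistry.domain.PatientIdentifier"
      serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry">
     <Association name="edu.duke.cabig.tumorregistry.domain.Diagnosis" roleName="diagnosisCollection">
      <Attribute name="primarySite" predicate="LIKE" value="BREAST%"/>
     </Association>
    </ForeignObject>
   </ForeignAssociation>
   <ForeignAssociation>
    <JoinCondition>
     <LeftJoin>
      <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
      <Property>medicalRecordNumber</Property>
     </LeftJoin>
     <RightJoin>
      <Object>gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant</Object>
      <Property>studySubjectIdentifier</Property>
     </RightJoin>
    </JoinCondition>
    <ForeignObject name="gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant"
      serviceURL="http://152.16.96.114/wsrf/services/cagrid/CGEMS">
     <Association name="gov.nih.nci.caintegrator.domain.analysis.snp.bean.SNPAnalysisGroup" roleName="analysisGroupCollection">
      <Attribute name="name" predicate="LIKE" value="BRCA1%"/>
     </Association>
    </ForeignObject>
   </ForeignAssociation>
  </Group>
 </TargetObject>
</DCQLQuery>
"""


def foreign_joins(dcql_text):
    """Return one summary dict per ForeignAssociation in a DCQL document."""
    root = ET.fromstring(dcql_text.strip())
    joins = []
    for assoc in root.iter("{%s}ForeignAssociation" % DCQL_NS):
        left = assoc.find("d:JoinCondition/d:LeftJoin", NS)
        right = assoc.find("d:JoinCondition/d:RightJoin", NS)
        foreign = assoc.find("d:ForeignObject", NS)
        joins.append({
            "service": foreign.get("serviceURL"),
            # (object class, property) pairs on each side of the join
            "left": (left.findtext("d:Object", namespaces=NS),
                     left.findtext("d:Property", namespaces=NS)),
            "right": (right.findtext("d:Object", namespaces=NS),
                      right.findtext("d:Property", namespaces=NS)),
            # every attribute filter nested under the foreign object
            "filters": [(a.get("name"), a.get("predicate"), a.get("value"))
                        for a in foreign.iter("{%s}Attribute" % DCQL_NS)],
        })
    return joins
```

Calling `foreign_joins(ABBREVIATED_DCQL)` gives one entry per remote service, which mirrors the "Foreign Join" annotations on the slide.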

Page 83: caBIG Data Structures

caTRIP Distributed query engine

[Diagram labels: DCQL Distributed Query Engine; CQL (x3); caGrid data service (x3); database (x3); data objects (x4)]

Page 84: caBIG Data Structures

CTMSi BRIDG dynamic modeling

• *Process flow

• *Storyboards

• *Scenarios

• *Use cases

• *Text UML activity diagrams

• *Links to static structures

• Interaction diagrams (?)

• Sequence diagrams

• Collaboration diagrams (UML 2.0)

Page 85: caBIG Data Structures

CTMSi Patient registration message

[Diagram labels: User; C3PR; GridBC; Registration Message (x3); Acknowledgement; caAERS; PSC; PSC Grid Service; caAERS Grid Service; JMS IN Queue; JMS OUT Queue; Router; ESB]

Page 86: caBIG Data Structures

caBIG compatibility CDE Browser

Page 87: caBIG Data Structures

caBIG compatibility CDE Browser permissible values

Page 88: caBIG Data Structures

caBIG compatibility NCI Thesaurus

[Screenshot callouts: Preferred Name; Synonyms; Definition; Relationships; Concept Code]

Page 89: caBIG Data Structures

caGrid caGrid community involvement

• caGrid itself provides no real “data” or “analysis” to caBIG

• It is the enabling infrastructure that allows the community to do so

• Community members add value to the grid as applications, services, and processes (for example: shared workflows)

• caGrid provides the necessary core services, APIs, and tooling

• The real “value” of the grid comes from bringing this information to the “end user”

• Data Services: expose data to the grid in a unified way

• Analytical Services: expose analytical operations to the grid

• Community members develop end-user applications that consume the resources provided by the grid

Page 90: caBIG Data Structures

caGrid caGrid exposing silver systems

• Object Oriented APIs and data resources are developed using Object types and information models registered in the caDSR

• These “silver systems” are grid-enabled by defining a grid service interface that defines the functionality to be exposed to the grid

• The grid service interface uses the same Object types as the existing system, but leverages a platform and language neutral representation (XML) of them

• The grid service implementation maps service invocations to API calls or queries into the existing system
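The wrapping idea on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not caGrid code: `Specimen`, `ExistingSpecimenApi`, and `SpecimenGridService` are hypothetical names, the in-memory list stands in for an existing system, and a hand-rolled XML serializer stands in for the platform-neutral representation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Specimen:
    """Hypothetical object type, standing in for a class registered in the caDSR."""
    identifier: str
    tissue_site: str


class ExistingSpecimenApi:
    """Hypothetical object-oriented API of an existing "silver" system."""

    def __init__(self, specimens):
        self._specimens = list(specimens)

    def find_by_site(self, site):
        return [s for s in self._specimens if s.tissue_site == site]


def to_xml(obj):
    """Serialize a dataclass instance to a language-neutral XML element."""
    elem = ET.Element(type(obj).__name__)
    for field in fields(obj):
        ET.SubElement(elem, field.name).text = str(getattr(obj, field.name))
    return elem


class SpecimenGridService:
    """Service interface: same object types as the API, exchanged as XML."""

    def __init__(self, api):
        self._api = api

    def find_by_site(self, site):
        # Map the service invocation onto a call into the existing API,
        # then return the resulting objects as an XML string.
        results = ET.Element("Results")
        for specimen in self._api.find_by_site(site):
            results.append(to_xml(specimen))
        return ET.tostring(results, encoding="unicode")
```

A client only ever sees the XML returned by `SpecimenGridService.find_by_site`; the existing API stays unchanged behind it.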

Page 91: caBIG Data Structures

caGrid Federated Query Processor

• Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services

• As caGrid data services all use a uniform query language, CQL, the Federated Query Infrastructure can be used to express queries over any combination of caGrid data services

• Federated queries are expressed with a query language, DCQL, which is an extension to CQL to express such concepts as joins, aggregations, and target services

• Implemented as a stateful grid service, so queries may be executed asynchronously and results retrieved at a later time

• Supports secure deployments wherein result ownership is enforced

• Coupled with semantic discovery capabilities of caGrid, provides a powerful framework for data discovery, mining, and integration
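The join step of a federated query can be pictured with a small Python sketch. It is not the caGrid implementation: the rows are made-up stand-ins for what three data services might return, `like_prefix` only imitates a LIKE predicate of the form `PREFIX%`, and the join is a simple semi-join on the shared key.

```python
def like_prefix(rows, attribute, pattern):
    """Stand-in for a per-service CQL query with a LIKE 'PREFIX%' predicate."""
    prefix = pattern.rstrip("%")
    return [row for row in rows if row[attribute].startswith(prefix)]


def semi_join(left_rows, left_key, right_rows, right_key):
    """Keep the left rows whose join key appears among the right rows."""
    right_keys = {row[right_key] for row in right_rows}
    return [row for row in left_rows if row[left_key] in right_keys]


# Hypothetical rows from three separate data services.
SPECIMENS = [
    {"specimen": "T-1", "medicalRecordNumber": "100"},
    {"specimen": "T-2", "medicalRecordNumber": "200"},
    {"specimen": "T-3", "medicalRecordNumber": "300"},
]
DIAGNOSES = [
    {"medicalRecordNumber": "100", "primarySite": "BREAST, NOS"},
    {"medicalRecordNumber": "200", "primarySite": "LUNG"},
    {"medicalRecordNumber": "300", "primarySite": "BREAST, NOS"},
]
BIOMARKERS = [
    {"medicalRecordNumber": "100", "HER2Status": "POSITIVE"},
    {"medicalRecordNumber": "300", "HER2Status": "NEGATIVE"},
]


def breast_her2_specimens():
    """AND of two foreign joins: each join narrows the specimen list."""
    breast = like_prefix(DIAGNOSES, "primarySite", "BREAST%")
    her2 = like_prefix(BIOMARKERS, "HER2Status", "POSITIVE%")
    kept = semi_join(SPECIMENS, "medicalRecordNumber",
                     breast, "medicalRecordNumber")
    return semi_join(kept, "medicalRecordNumber",
                     her2, "medicalRecordNumber")
```

Each service is queried on its own terms, and only the join keys are compared across services, which is why a uniform query language on every data service matters.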

Page 92: caBIG Data Structures

caGrid Data service common query language

• Specifies a target object (result) type and selects the instances which satisfy the specified properties and nested object properties

• Allows path navigation

• Provides logical grouping

• Provides name/predicate/value filtering on properties of objects

• Recursively defined

• Ability to return full objects, a set of attributes, a count of results, or distinct attribute values
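The recursive definition listed above can be illustrated with a small evaluator over in-memory objects. This is a sketch in Python, not the CQL processor: queries are written as nested dicts rather than XML, only the EQUAL_TO and LIKE predicates are handled, and the names `matches` and `run_query` are hypothetical.

```python
import re


def like(value, pattern):
    """SQL-style LIKE where '%' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return value is not None and re.fullmatch(regex, value) is not None


PREDICATES = {
    "EQUAL_TO": lambda value, expected: value == expected,
    "LIKE": like,
}


def matches(obj, node):
    """Recursively test one object against an attribute, association, or group."""
    kind = node["kind"]
    if kind == "attribute":
        # name/predicate/value filtering on a property
        return PREDICATES[node["predicate"]](obj.get(node["name"]), node["value"])
    if kind == "association":
        # path navigation into a nested object or collection
        related = obj.get(node["role"])
        if related is None:
            return False
        candidates = related if isinstance(related, list) else [related]
        return any(matches(item, node["where"]) for item in candidates)
    if kind == "group":
        # logical grouping: AND requires all children, anything else is OR
        outcomes = [matches(obj, child) for child in node["children"]]
        return all(outcomes) if node["logic"] == "AND" else any(outcomes)
    raise ValueError("unknown node kind: %r" % kind)


def run_query(objects, where, result="objects", attribute=None):
    """Return full objects, a count, or distinct values of one attribute."""
    selected = [obj for obj in objects if matches(obj, where)]
    if result == "count":
        return len(selected)
    if result == "distinct":
        return sorted({obj[attribute] for obj in selected})
    return selected


# Made-up data and a query shaped like the BRCA / Homo sapiens example.
GENES = [
    {"symbol": "BRCA1", "taxon": {"scientificName": "Homo sapiens"}},
    {"symbol": "BRCA2", "taxon": {"scientificName": "Mus musculus"}},
    {"symbol": "TP53", "taxon": {"scientificName": "Homo sapiens"}},
]
QUERY = {"kind": "group", "logic": "AND", "children": [
    {"kind": "attribute", "name": "symbol", "predicate": "LIKE", "value": "BRCA%"},
    {"kind": "association", "role": "taxon", "where": {
        "kind": "attribute", "name": "scientificName",
        "predicate": "EQUAL_TO", "value": "Homo sapiens"}},
]}
```

Because groups and associations contain further nodes, the same three cases cover arbitrarily deep queries.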

Page 93: caBIG Data Structures

caGrid Example CQL query

Return all Genes whose symbol begins with BRCA and that have an associated Taxon with a scientificName equal to “Homo sapiens”:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA%"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>

Callouts: symbol LIKE “BRCA%”; scientificName = “Homo sapiens”
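A query like this one can also be assembled programmatically instead of written by hand. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `gene_query` is hypothetical, the namespace and element names are copied from the slide, and the output has not been validated against the CQL schema.

```python
import xml.etree.ElementTree as ET

CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def q(tag):
    """Qualify a tag name with the CQL namespace from the slide."""
    return "{%s}%s" % (CQL_NS, tag)


def gene_query(symbol_pattern, species):
    """Build the slide's Gene/Taxon query for a given symbol pattern and species."""
    # Emit the CQL namespace as the default namespace (no prefix).
    ET.register_namespace("", CQL_NS)
    query = ET.Element(q("CQLQuery"))
    target = ET.SubElement(query, q("Target"),
                           name="gov.nih.nci.cabio.domain.Gene")
    group = ET.SubElement(target, q("Group"), logicRelation="AND")
    ET.SubElement(group, q("Attribute"),
                  name="symbol", predicate="LIKE", value=symbol_pattern)
    taxon = ET.SubElement(group, q("Association"),
                          roleName="taxon",
                          name="gov.nih.nci.cabio.domain.Taxon")
    ET.SubElement(taxon, q("Attribute"),
                  name="scientificName", predicate="EQUAL_TO", value=species)
    return ET.tostring(query, encoding="unicode")
```

Calling `gene_query("BRCA%", "Homo sapiens")` yields a document with the same elements and attributes as the example above, and building it this way avoids quoting mistakes in hand-edited XML.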

Page 94: caBIG Data Structures

caBIG compatibility Metadata and concepts example