The caBIG™ Enterprise J. Robert Beck, M.D. Chief Academic Officer Fox Chase Cancer Center Philadelphia, USA November, 2007


The caBIG™ Enterprise

J. Robert Beck, M.D.
Chief Academic Officer

Fox Chase Cancer Center
Philadelphia, USA

November, 2007

Cancer as a Complex Adaptive System

[Figure: progression from a base state to a malignant state through repeated mutation and selection steps. Labeled influences: genetic constitution; chemical, virus, hormone, nutrition; immune, hormone, nutrition, (treatment).]

caBIG™ and Molecular Medicine

Molecular Medicine: A Complex Continuum

[Figure labels: Clinical Research, Pathology, Molecular Biology, Imaging, Molecular Medicine.]

caBIG™ and Molecular Medicine: The People

• Geneticists
• Molecular Biologists
• Lab Technicians
• Radiologists
• Trial Managers
• Clinicians
• MRI Technicians
• Pathologists

caBIG™ and Molecular Medicine: The Activities

• SNP Identification
• Clinical Data Correlation
• Expression Analysis
• Tissue Banking
• Study Creation
• Patient Enrollment
• Clinical Data Collection
• Image Sharing & Analysis

caBIG™ and Molecular Medicine: The Software Tools

• Translational research tools
• caIntegrator
• caArray
• geWorkbench
• caTissue
• PSC
• C3PR
• C3D
• NCIA

caBIG™ In Action

caBIG™ and Clinical Trials

Sample capabilities and tools:

• Adverse event management (caAERS)

• Clinical data exchange (caXchange)

• Study participant calendar (PSC)

• Study participant registry (C3PR)

• Virtual clinical data warehouse (CTODS)

• caBIG™-compatible systems architecture [caGrid]

• Integration with caBIG™-compatible data management systems

caBIG™ In Action (C3PR) Confirm Registration

caBIG™ In Action (PSC) View Schedule

caBIG™ In Action (caXchange) Extract Data from Hospital Clinical Chemistry Lab; Lab Viewer Marks Out-of-Range Value in Red

caBIG™ In Action (CTODS) Adverse Event Successfully Submitted

caBIG™ In Action (caAERS) Documenting Adverse Events

caBIG™ In Action (PSC) Prompt of Adverse Event and Possible Impact on Scheduling

caBIG™ In Action Clinical Trials Case Studies

Clinical trials data collection for cancer clinical trials (Case Study B)

Organizations:
• Duke Comprehensive Cancer Center
• Lombardi Comprehensive Cancer Center at Georgetown University

caBIG™ resources:
• Cancer Center Clinical Database (C3D)
• Cancer Central Clinical Participant Registry (C3PR)
• Cancer Data Standards Repository (caDSR)

Results:
• Decreased protocol set-up time
• Improved speed and quality of data collection
• Reuse of standard forms and best practices
• Decreased time/effort invested in study design, procedure programming, and data extraction
• Certification, validation, and full audit trails to address regulatory requirements

caBIG™ In Action Clinical Trials Case Studies

caBIG™ tools and infrastructure that enable translational medicine research (Case Study C)

Organizations:
• Duke Comprehensive Cancer Center
• SemanticBits LLC

caBIG™ resources:
• Cancer Translational Research Informatics Platform (caTRIP)
• Cancer Text Information Extraction System (caTIES)
• caTissue Core
• Cancer Annotation Engine (CAE)
• caIntegrator

Results:
• More efficient and user-friendly way to query data from existing patients with similar characteristics to find successful treatments
• Improved ability to investigate associations between multiple predictors and their corresponding outcomes
• More efficient searches for available tumor tissues

caBIG™ and Life Sciences

Sample capabilities and tools:

• Biobanking management systems (caTissue Core)

• Virtual clinical data warehouse (CTODS)

• Genome-wide data management system (caGWAS)

• In vivo image repository (NCIA)

• Microarray data management system (caArray)

• Microarray gene expression and sequence data management (geWorkbench)

• caBIG™-compatible systems architecture (caGrid)

What is “systems medicine”?

• Systems Medicine and Systems Biology are viewed in the scientific community as novel methods of understanding biology and approaching medicine. Systems Biology seeks to integrate different levels of information…

-Institute for Systems Medicine

• Systems biology is a relatively new biological study field that focuses on the systematic study of complex interactions in biological systems, thus using a new perspective (integration instead of reduction). According to the interpretation of System Biology as the ability to obtain, integrate and analyze complex data from multiple experimental sources using interdisciplinary tools, some typical technology platforms are:…

-Wikipedia

A systems biology real life example

• Does epidermal growth factor receptor variant III (EGFR vIII) status define clinically distinct subtypes of glioblastoma multiforme (GBM)?
• What are the gene expression levels in this patient cohort (n = 268)?
• What percentage of patients show the variant III (vIII) mutation?
• Can vIII predict response to standard therapies such as erlotinib and gefitinib?
• What does the survival analysis look like when patients are stratified by EGFR vIII status?
• How many patients show amplification, upregulated expression, and the variant III deletion?
• Do MR images from vIII-positive patients differ from those of the vIII-negative series?
• How many patients in this cohort fall into each of the 6 classes of the recursive partitioning analysis (RTOG-RPA)?

Source: J. Clinical Oncology 2007 Jun 1; 25(16): 2288-94
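Two of the questions above (the share of patients carrying the variant, and survival stratified by EGFR vIII status) come down to simple computations once clinical and molecular data sit in one place. The following is a minimal sketch of a Kaplan-Meier estimate per stratum. It is not how caIntegrator or any caBIG tool computes survival; the cohort is a made-up toy data set, and the names `kaplan_meier`, `stratify`, and `cohort` are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    times  -- follow-up time per patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def stratify(cohort, status):
    """Split out (times, events) for patients with the given vIII status."""
    rows = [(t, e) for t, e, s in cohort if s is status]
    return [t for t, _ in rows], [e for _, e in rows]


# Hypothetical toy cohort, NOT data from the cited study:
# (months of follow-up, death observed?, EGFR vIII positive?)
cohort = [
    (3, 1, True), (5, 1, True), (8, 0, True), (9, 1, True),
    (4, 1, False), (10, 1, False), (12, 0, False), (15, 1, False),
]

positive_curve = kaplan_meier(*stratify(cohort, True))
negative_curve = kaplan_meier(*stratify(cohort, False))
pct_positive = 100 * sum(s for _, _, s in cohort) / len(cohort)
```

Comparing `positive_curve` with `negative_curve` is the stratified view the question asks for; a real analysis would add a significance test and confidence intervals.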

Realize scientific discovery with caBIG tools


EGFR vIII gene expression analysis

caArray caIntegrator GenePattern, geWorkbench


EGFR mutation analysis

CGWB


Survival analysis

Cases with mutation

Cases without mutation

Similar charts can be plotted for treatment groups

caIntegrator


Correlative analysis of genomic data from patient samples

SNP 6.0

U133A

Gene Pattern/SNP viewer


GBM MR image look-up

NCIA


Walter J. Curran, Jr. et al., JNCI, Vol. 85, No. 9, May 5, 1993

RTOG-RPA grouping of GBM patients

Clinical data management and reporting

C3D J-review

Look up the tools from the JCO example

| Tool | Membership (Bundle/WS) | Version | URL |
|---|---|---|---|
| caArray | LSD/ICR | 2.0 beta | https://array.nci.nih.gov/ |
| C3D | CCTS/CTMS | 4.5.2 | https://cabig.nci.nih.gov/tools/c3d/ |
| J-Review | CTMS | 8.0 | https://octrials-rpt.nci.nih.gov/jreviewwww/sample_default.htm |
| caIntegrator | LSD/ICR | 1.2 | http://caintegrator-info.nci.nih.gov and https://cabig.nci.nih.gov/tools/caIntegrator |
| GenePattern | ICR | 3.0 | https://cabig.nci.nih.gov/tools/GenePattern/ |
| geWorkbench | ICR | 1.0.6 | https://cabig.nci.nih.gov/tools/geWorkbench |
| CGWB | ICR | 2.0 | http://cgwb.nci.nih.gov/ |

Evolution of translational research informatics

Distant Past

• Translational research in the distant past was plagued by:
  • Siloed development within and across individual studies
  • Integrative analysis performed in MS Excel, resulting in increased time and cost to validate trial outcomes
  • Lack of structured data sharing, inhibiting improvements to patient care, outcomes, and ongoing trials

[Figure labels: Breast Cancer Study, Lung Cancer Study; Clinical Data, Genomic Data, Analytical Results, Publications, Epidemiology Data, SNP Data, Methylation Data.]

Current Translational Research – pt. A

• Current translational research involves:
  • Interoperable caBIG solutions that enable data integration and sharing
  • Customizations of the common framework to accommodate unique study needs
• Current translational research challenges:
  • There are still siloed systems that support local studies
  • Utility tools are needed to map legacy data in order to develop a roadmap for caBIG compatibility

[Figure labels: caBIG-compatible tools; Breast Cancer Study, Lung Cancer Study; Clinical Data, Genomic Data, Epidemiology Data, SNP Data, Methylation Data; caBIG compatible APIs; Columbia cancer Center; UCSC Spore-ISPY hosted at NCI; Lung study at Center XYZ.]

Current Translational Research: caIntegrator Architecture

[Figure: tiered architecture diagram. Recoverable labels:
• Tiers: Presentation Tier, Business Tier, Database/Analysis Tier
• Client side: Client Browser, Internet, Web Server (JBoss/Tomcat), App State
• Web components: Query Builder (Struts), Asynchronous Updates (AJAX), Report Generation (XML/XSL), Presentation Cache (Ehcache), Web Visualization/Analysis Tools (WebGenome, GenePattern)
• Services: Service Layer (J2EE), Findings Factory, Business Cache (Ehcache), Security Manager (CSM), Object Query Service, Study Query Service, Analytical Query Service, Multi-Threaded Query Service (OJB/Hibernate), Remote Service (EJB Container), BIOAssay Service
• Data objects: DTOs, DOs, Bioassay DTOs
• Analysis: Analysis Server Client Manager (JMS Node), JMS (Asynchronous), Analysis server, R, R-Binary
• Storage: caIntegrator Data Warehouse]

Current Translational Research: caTRIP Architecture

So, where do we want to go – point B

• Next generation translational research requires:
  • Extraction of trends/patterns from HTP data
  • Support for handling high volume data sets
  • Integration with disparate data sources
  • Support for multi-dimensional complex queries and robust analytical routines
  • Data summarization
  • Advanced Visualization
• Next generation translational research expands upon the needs of current efforts and requires:
  • Interoperability
  • Modularization enabling plug and play
  • Standards adoption where appropriate

What will take us from point A to point B

• caBIG software that supports TR
• DSIC guidance/policies for TR
• TR community: cancer centers, SPOREs, CTSAs, IPBS, industry…
• Support network: knowledge centers, service providers, program offices
• FDA: regulatory, IOTF/OBQI
• Standards organizations: HL7, CDISC

Let’s put the pieces of the puzzle together

• ICR workspace calls – biweekly
• Task-oriented working groups – monthly
  https://cabig.nci.nih.gov/workspaces/ICR/General_Meeting_Schedule/
• New task forces of SMEs are being established in EY2 to drive use-case development for next-generation integrative tools
• caBIG listservs: https://list.nih.gov/cgi-bin/wa?SUBED1=cabig_ICR-l&A=1
• caBIG getting connected: https://cabig.nci.nih.gov/getting_connected/working_with_cabig/

Life Sciences Distribution Bundle

• The Life Sciences Distribution Bundle brings together a range of caGrid-interfaced tools that support biomedical informatics
• Functions include:
  • Tissue Banking (caTISSUE)
  • Gene Expression Database (caArray)
  • Translational Medicine tools (caIntegrator)
  • Biomedical Image Management and Analysis (NCIA)
  • Molecular analysis (geWorkbench)
• …and the supporting caGrid infrastructure…
• Target release in Feb, 2008

ICR Products by Category

Capture data and annotation Analyze data

Link data and analysis tools

Store findings

caArray GenePattern caB2B Rembrandt

CPAS geWorkbench caTRIP TARGET

caBIO, GeneConnect, caFE, TrAPSS

webGenome EAGLE

caNanoLab, ProtLIMS

Bioconductor CGEMS

caELMIR GOMiner caMOD

gridPIR DWD, VISDA, RProteomics

cPath, Reactome Cytoscape

https://cabig.nci.nih.gov/workspaces/ICR

Many more…

https://cabig.nci.nih.gov/tools/

Translational research website

http://ncicb.nci.nih.gov/NCICB/tools/translation_research

caBIG™ Vision

• Connect the cancer research community through a shareable, interoperable infrastructure

• Deploy and extend standard rules and a common language to more easily share information

• Build or adapt tools for collecting, analyzing, integrating and disseminating information associated with cancer research and care

The caBIG™ Pilot Phase

• An unprecedented effort to connect people, organizations, and data throughout the cancer research community

• 190 participating organizations

• 300 software components

• 40+ end-user applications in discovery, clinical trials management, biospecimen management, etc.

• caGrid providing data transmission network that “connects” everyone

• 43 Cancer Centers actively participating in caBIG™ deployment program

• 45+ peer-reviewed publications about caBIG™

caBIG™ Pilot Goals

Illustrate that Cancer Centers with varying needs and capabilities can be joined in a common grid of communications, shared data, applications, and technologies

Demonstrate that Cancer Centers, in collaboration with NCI, will develop new enabling tools and systems that could support multiple Cancer Centers

Create an extensible infrastructure that will continue to be expanded and extended to members of the cancer research community

Demonstrate that Cancer Centers will actively use the grid and realize greater value in their cancer research endeavors by using the grid

Community-Building in Science

Cancer Center Clinical Database (C3D)

caBIG™ In Action (C3D) Clinical Data Collection

caBIG™ In Action (caTissue) Searching for Biospecimens

caBIG™ In Action (caTissue) Viewing Specimen Details to Select Aliquot

caBIG™ In Action (NCIA) Imaging Integration via caBIG™

caBIG™ In Action Life Sciences Case Studies

Biospecimen management for multi-institutional collaborative research activities (Case Study A)

Organization:
• Inter-SPORE Prostate Biomarker Study (IPBS)

caBIG™ resource:
• caTissue Suite

Results:
• Biospecimen and biomarker data are being captured in a decentralized way with caTissue Suite
• A data migration plan was developed to load all legacy data into caTissue Suite
• Queries are conducted quickly and securely across all 11 centers participating in the IPBS study

caBIG™ In Action Life Sciences Case Studies

caBIG™ tools and infrastructure that support genome-wide association studies (GWAS) (Case Study E)

Organization:
• NCI Cancer Genetic Markers of Susceptibility (CGEMS) project

caBIG™ resources:
• caGWAS
• caIntegrator

Results:
• Improving collaboration
• Providing infrastructure for better data management, analysis, and communication
• Developing commitment to sharing information and developing data standards

Alabama
• Birmingham: UAB Comprehensive Cancer Center

Arizona
• Phoenix: Translational Genomics Research Institute
• Tucson: University of Arizona

California
• Berkeley: University of California Lawrence Berkeley National Laboratory; University of California at Berkeley
• Los Angeles: AECOM; California Institute of Technology; University of Southern California Information Sciences Institute; University of California at Irvine The Chao Family Comprehensive Cancer Center
• La Jolla: The Burnham Institute
• Sacramento: University of California Davis Cancer Center
• San Diego: SAIC
• San Francisco: University of California San Francisco Comprehensive Cancer Center

Colorado
• Aurora: University of Colorado Cancer Center

District of Columbia
• Department of Veterans Affairs; Lombardi Cancer Research Center - Georgetown University Medical Center

Florida
• Tampa: H. Lee Moffitt Cancer Center at the University of South Florida

Hawaii
• Manoa: Cancer Research Center of Hawaii

Illinois
• Argonne: Argonne National Laboratory
• Chicago: Robert H. Lurie Comprehensive Cancer Center of Northwestern University; University of Chicago Cancer Research Center
• Urbana-Champaign: University of Illinois at Urbana-Champaign

Indiana
• Indianapolis: Indiana University Cancer Center; Regenstrief Institute, Inc.

Iowa
• Iowa City: Holden Comprehensive Cancer Center at the University of Iowa

Louisiana
• New Orleans: Tulane University School of Medicine

Maine
• Bar Harbor: The Jackson Laboratory

Maryland
• Baltimore: The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University
• Bethesda: Consumer Advocates in Research and Related Activities (CARRA); NCI Cancer Therapy Evaluation Program; NCI Center for Bioinformatics; NCI Center for Cancer Research; NCI Center for Strategic Dissemination; NCI Division of Cancer Control and Population Sciences; NCI Division of Cancer Epidemiology and Genetics; NCI Division of Cancer Prevention; NCI Division of Cancer Treatment and Diagnosis; Terrapin Systems
• Rockville: Capital Technology Information Services; Emmes Corporation; Information Management Services, Inc.

Massachusetts
• Cambridge: Akaza Research; Massachusetts Institute of Technology
• Somerville: Panther Informatics

Michigan
• Ann Arbor: Internet2; University of Michigan Comprehensive Cancer Center
• Detroit: Meyer L. Prentis/Karmanos Comprehensive Cancer Center

Minnesota
• Minneapolis: University of Minnesota Cancer Center
• Rochester: Mayo Clinic Cancer Center

Nebraska
• Omaha: University of Nebraska Medical Center/Eppley Cancer Center

New Hampshire
• Lebanon: Dartmouth College; Dartmouth-Hitchcock Medical Center

New York
• Buffalo: Roswell Park Cancer Institute
• Bronx: Albert Einstein Cancer Center
• Cold Spring Harbor: Cold Spring Harbor Laboratory
• New York: Herbert Irving Comprehensive Cancer Center Columbia University; Memorial Sloan-Kettering Cancer Center; New York University Medical Center
• White Plains: IBM

North Carolina
• Chapel Hill: University of North Carolina Lineberger Comprehensive Cancer Center
• Raleigh-Durham: Alpha-Gamma Technologies, Inc.; Constella Health Sciences; Duke Comprehensive Cancer Center

Ohio
• Cleveland: Case Comprehensive Cancer Center
• Columbus: Ohio State University Comprehensive Cancer Center

Oregon
• Portland: Oregon Health & Science University

Pennsylvania
• Philadelphia: Drexel University; Fox Chase Cancer Center; Kimmel Cancer Center at Thomas Jefferson University; Abramson Cancer Center of the University of Pennsylvania
• Pittsburgh: University of Pittsburgh Cancer Institute

Tennessee
• Memphis: St. Jude’s Children’s Research Hospital

Texas
• Austin: 9 Star Research
• Houston: M.D. Anderson Cancer Center

Virginia
• Fairfax: SRA International
• Reston: Scenpro

Washington
• Seattle: DataWorks Development, Inc.; Fred Hutchinson Cancer Research Center

International
• Paris, France: Sanofi Aventis

Collaboration Is Central


Data Sharing and Security

Sample resources:

• caBIG™ Policies
• Processes and Best Practices
• Model Documents

caBIG™ In Action Data Sharing and Security

Policies and procedures to support and enable meaningful data sharing and cooperation (Case Study D)

Organizations:
• Cancer Center Representatives

caBIG™ resources:
• Data Sharing and Intellectual Capital Workspace (DSIC)
• Data Sharing and Security Framework

Results:
• Diverse groups of Cancer Center representatives are working together with government, academic, and commercial groups
• These groups are identifying processes to address legal, privacy, and regulatory issues that arise from collaboration and data sharing

caBIG™ Tomorrow

caBIG™ Vision for 2010

• All comprehensive and community cancer centers are connected

• Data is being shared

• All multi-center clinical cancer trials are connected to each other electronically and to the FDA for reporting

• Institutions are collaborating and publishing studies with data they are sharing through caBIG™

caBIG™ - The Enterprise Phase

Connect all biomedical researchers

Increase speed and volume of data aggregation and dissemination

Grow the community in breadth and scope

Scalable national infrastructure for Molecular Medicine

Strategies for Increased Adoption

• Enterprise Adopter Program

• Service Providers

• Knowledge Centers

• Program Offices

Future of caBIG™

caBIG™ infrastructure and tools may link biomedical community globally.

caBIG™ capabilities may be integrated into health IT.

caBIG™ may serve as a model for other disease research and biomedical endeavors.

caBIG Deliverables: Architecture

• caBIG™ Compatibility Guidelines
• caGrid 0.5 Security White Paper
• caGrid Software Version 0.5
• caGrid 1.0
• Technology Evaluation White Paper
• caBIG™ - The Security White Paper (Technology Evaluation)
• Workflow Language Recommendations White Paper
• ID Management White Paper
• Common Query Language White Paper

• The Architecture Cross-Cutting Workspace provides for the development of the underlying standards used by the program, and ensures that common mechanisms are used throughout the caBIG™ community via mentoring, white papers and a structured review process.

caBIG Deliverables: Vocabularies and Common Data Elements

• LexGrid
• CDE Governance Model
• VCDE Guidance Mentoring Teams
• Vocabularies Deployment Document
• Data Standards Approval Guidelines
• Procedures for the Review and Approval of New VCDE Content
• Mouse/Human Anatomy Ontology Mapping
• Nutrition Ontology

• The Vocabularies and Common Data Elements Cross-Cutting Workspace provides for the development of the underlying data elements and vocabularies used by the program, and ensures that common mechanisms are used throughout the caBIG™ community via mentoring, white papers and a structured review process.

• Community driven

• Dynamic implementation

• Built to be upgraded as standards “harden”, and domains expand

Standards-based interoperability: the cancer common object resource environment (caCORE)

biomedical objects

common data elements

controlled vocabulary

Standards infrastructure and services

• Enterprise Vocabulary Services (EVS)
  • Browsers
  • APIs
• cancer Bioinformatics Infrastructure Objects (caBIO)
  • Applications
  • APIs
• cancer Data Standards Repository (caDSR)
  • CDEs
  • Case Report Forms
  • Object models
  • ISO 11179 model
• Developer Toolkits
  • caCORE SDK
  • caAdapter

caGrid

• Grid Infrastructure for caBIG
• caGrid Components
  • Language (metadata, ontologies)
  • Security
  • Advertisement and Discovery
  • Workflow
  • Grid Service Graphical Development Toolkit

[Figure: caGrid topology. Recoverable labels:
• Participants: NCICB (caCORE: caBIO, caDSR, EVS; repositories), Research Center (shown twice), Researcher, Physician, Patient
• Resources: Data Mart, Gene Expression Data, Clinical Data, Tissue Bank, Analysis Tools, Proteomics Data, Genomics Data
• Grid functions: Data Services, Analytical Services, Annotation Services, Service Advertisement, Service Discovery, Service Query, Semantic mapping, Security Services]

caGrid 1.0 Security Needs

• Authentication
  • Process of determining whether someone or something is, in fact, who or what it is declared to be.
• Authorization
  • Process of determining if an authenticated user may do something on a given resource.
  • Can User X perform Operation Y on Resource Z?
• Trust Management
  • Supports applications and services in deciding whether or not signers of digital credentials/user attributes can be trusted.
• Secure Communication
  • The ability to guarantee the integrity and/or privacy of messages between two parties
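The authorization question above ("Can User X perform Operation Y on Resource Z?") can be illustrated with a deny-by-default lookup. This is a minimal sketch only, not the caGrid or CSM API: `Permission`, `AccessPolicy`, and the user, operation, and resource names are all hypothetical, and authentication is assumed to have happened before the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """One (user, operation, resource) grant; hashable so it can live in a set."""
    user: str
    operation: str
    resource: str


class AccessPolicy:
    """In-memory allow-list: anything that was not explicitly granted is denied."""

    def __init__(self):
        self._grants = set()

    def grant(self, user, operation, resource):
        self._grants.add(Permission(user, operation, resource))

    def is_authorized(self, authenticated_user, operation, resource):
        # Authorization only applies to an authenticated identity;
        # None stands for "authentication failed or was never attempted".
        if authenticated_user is None:
            return False
        return Permission(authenticated_user, operation, resource) in self._grants


# Hypothetical example data.
policy = AccessPolicy()
policy.grant("user_x", "read", "specimen_records")

can_read = policy.is_authorized("user_x", "read", "specimen_records")
can_write = policy.is_authorized("user_x", "write", "specimen_records")
```

Keeping the check as a pure function of (user, operation, resource) mirrors the question on the slide; a federated deployment would resolve the grants from a shared service rather than a local set.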

Authorization Notional Architecture

Courtesy of Kenneth Lin, BAH

Proposed Federation for Authorization

Courtesy of Kenneth Lin, BAH

caGrid Trust Management

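Grid Trust Service (GTS)

[Figure: trust flow for an Ohio State University (OSU) user. Recoverable labels:
• Actors: OSU User, IdP (Ohio State University), Dorian, Certificate Authority, Grid Trust Service, Grid Service, Grid
• Numbered steps: 1. Username/Password; 2. SAML Assertion; 3. SAML Assertion; 6. Is Proxy Trusted?; 7. Yes/No (steps 4 and 5 are not recoverable from the extracted text)
• Other labels: Trust Agreement; Globus Trusted Certificates Directory; Auto Synchronize With GTS]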
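The "Is Proxy Trusted? Yes/No" exchange and the "Auto Synchronize With GTS" label suggest a pattern in which each service keeps a local copy of the trusted authorities and refreshes it from a central registry. The sketch below illustrates that pattern only. It is not the GTS or Globus API: the class names, method names, and fingerprint strings are hypothetical, and real deployments validate full certificate chains rather than comparing fingerprints.

```python
class GridTrustService:
    """Hypothetical central registry of trusted certificate authorities."""

    def __init__(self):
        self._authorities = {}  # CA name -> certificate fingerprint

    def trust(self, ca_name, fingerprint):
        self._authorities[ca_name] = fingerprint

    def revoke(self, ca_name):
        self._authorities.pop(ca_name, None)

    def snapshot(self):
        # Hand out a copy so callers cannot mutate the registry.
        return dict(self._authorities)


class GridService:
    """Hypothetical service holding a local trusted-certificates directory."""

    def __init__(self):
        self._trusted_directory = {}

    def synchronize(self, gts):
        # Replace the local copy wholesale so revocations propagate too.
        self._trusted_directory = gts.snapshot()

    def is_proxy_trusted(self, issuer_name, issuer_fingerprint):
        expected = self._trusted_directory.get(issuer_name)
        return expected is not None and expected == issuer_fingerprint


# Hypothetical example data.
gts = GridTrustService()
gts.trust("osu-ca", "ab:cd:ef")

service = GridService()
service.synchronize(gts)
trusted_before = service.is_proxy_trusted("osu-ca", "ab:cd:ef")

gts.revoke("osu-ca")
service.synchronize(gts)
trusted_after = service.is_proxy_trusted("osu-ca", "ab:cd:ef")
```

Until `synchronize` runs again, a service keeps answering from its last local copy, which is why the sketch re-synchronizes after the revocation.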

A caGrid Illustration: Virtual PACS

• Present a PACS interface to analytical and data sources on the grid.
• Use your own DICOM workstation
• Virtual PACS federates services on the Grid using caGrid

In Vivo Imaging Middleware Project

• Interoperability Library
  • Translates between DICOM and caBIG data models, and between DICOM QR and the caBIG query language
• DICOM Data Service
  • Exposes existing DICOM QR-aware data resources (PACS, etc.) as caGrid-compliant services
• VirtualPACS
  • Allows DICOM-aware clients (review workstations, etc.) to access DICOM caGrid data services over the grid
• caGrid-based security for data transport, authentication, and authorization

gridIMAGE caGrid integration

• Leverages core caGrid services/tools
  • Introduce, caDSR Service, caGrid Data Service, Index Service, Authentication Service
• Leverages In Vivo Imaging Core Middleware
  • DICOM interoperability and Bulk Data Transport via GridFTP

Infrastructure – Today

[Figure: caBIG caGrid 0.5 Test Bed. Recoverable labels: Index Service; sites Pittsburgh, Duke, Georgetown, NCI; caArray, rProteomics, PIR, caTIES, caBIO; GUMS, CAMS, GME.]

Standards for vocabularies and common data elements established and housed at NCICB

Sample Applications

Building on a foundation of established infrastructure points

NCICB housed infrastructure for CDEs and vocabularies

Grid reference implementations lead the way

Compatibility guidelines and initial compatibility evaluation process for caBIG™ program projects established

A rich set of harmonized standards and vocabularies is available

A group of mentors has been identified to ensure consistency across key projects

Infrastructure – Tomorrow

Instantiated formal process for evaluation and harmonization

Many applications Grid enabled (e.g., gene pattern, reactome)

A compatibility evaluation process for caBIG™ program projects and a certification process for externally developed tools are established

NCICB housed infrastructure for CDEs, and vocabularies

Increased growth and interoperability of infrastructure; easier to add new tools; workflow support established (plus end-user portal available, security infrastructure)

[Figure: caBIG caGrid 0.5 Test Bed. Recoverable labels: Index Service; sites Pittsburgh, Duke, Georgetown, NCI; caArray, rProteomics, PIR, caTIES, caBIO; GUMS, CAMS, GME.]

A rich set of harmonized standards and vocabularies continues to grow in size

Tooling available to provide site-specific vocabularies and ontology management and support

Mentors actively working in caBIG™ Community and beyond to ensure consistency across key projects and adherence to caBIG™ goals

APIs with common interfaces facilitate scientific workflows

Infrastructure – The Future

[Figure: federated grid of data and analytical services. Recoverable labels:
• Sites and resources: Research Group, Research Center, NCBI, Microarray, Gene Database, Protein Database, SNPlex, Image, Tool 1, Tool 2, Tool 3
• Grid endpoints: caGrid Client, caGrid Data Service, caGrid Analytical Service, Query Service
• Grid Services Infrastructure (Secure Communication, Service Invocation, Data Transfer)
• Security: GSI, GUMS, CAMS
• Advertisement and Discovery: Index Service
• Schema Management
• Common Data Types, Terminologies, Ontologies: Common Data Elements, Vocabularies and Ontologies]

Multiple sites host portions of the federated, scalable, standards-based infrastructure

Functional applications part of standard practice/fully deployed on GRID

NCICB housed infrastructure for CDEs and vocabularies

Broad adoption and independent support, across and beyond the cancer research community; increased growth and interoperability of infrastructure; easier to add new tools (data and services); workflow support expanded; greater interconnectedness and automation

Developed standards increase in number; mechanisms exist for the community to develop and harmonize standards and compatibility guidelines

Certification process for externally developed tools

A rich set of community developed harmonized standards and vocabularies continues to grow

Tooling available to provide site specific vocabularies and ontology management and support

Mentors actively working in caBIG™ Community and beyond to ensure consistency across key projects/adherence to caBIG™ goals

Vocabulary services are federated

caBIG™ Tools – Today

TBPT

CTMS

ICR

caBIG™ Tools – Tomorrow

ICR

caBIG™ Tools – The Future

ICR

caBIG will need commercial developers to take tools out-- Examples from CTMS:

Velos: Comprehensive clinical trials system in widespread use in the extramural Cancer Centers throughout the country.

PercipEnz: A comprehensive solution for managing all aspects of clinical research – study setup and activation, scientific reviews, subject registration, compliance tracking, visit tracking, data collection, data and safety monitoring, financials management, data extraction, regulatory reporting, and outreach.

Akaza Research: A web-based, open source software platform for managing multi-site clinical research studies. It facilitates protocol configuration, design of case report forms, electronic data capture, retrieval, and management.

Clinical Research IT Infrastructure

Clinical Systems

De-identification Services

Labs, EMR, Tissue, etc.

Clinical Trials

External Reporting

HL7 / CAM

SDK

HL7 v3

HL7 v3, Janus

Clinical Data Mgmt

EDC

Adverse Events

Participant Registry

etc.

Translation Service

FDA, Sponsor

NCI, other

HL7 transactional database

Clinical Research Information Exchange

HL7 v2.x, other

Research Data Warehouse

HL7 v3, Janus

Patient Health Record

Lifecycle Management

The Future

A worldwide biomedical grid community

Bringing translational and clinical research to personalized medicine

caBIG™ Community Outreach Summit

Summit Goals

• Initiate a dialogue with decision-makers and strategic thinkers about what they need to further develop caBIG™ tools and services, and/or participate in the caBIG™ enterprise

• Identify key opportunities, issues, and challenges that must be addressed

• Gather ideas about how caBIG™ should be organizationally structured and governed

Summit Agenda

Keynote Address: “The Role of caBIG™ in the Future of Cancer Research”

Dr. John Niederhuber, Director, National Cancer Institute

Research & Development Track: Discuss drivers and new research models for cancer research, and identify what is needed in biomedical informatics in the near future to support such models.

Market Opportunities Track: Discuss ways to strengthen and expand the market opportunity for caBIG™-compliant products and services and create a significantly self-sustaining economic system.

Governance Track: Discuss future models of caBIG™ structure and governance and identify strategies and tactics to drive caBIG™ adoption.

Opening Panel Discussion: “Opportunities and Challenges from Where I Sit”

Summit “Deliverables”

• Identify people and organizations who want to participate in the next generation of caBIG™

• Identify projects and collaborations around caBIG™ adoption

• Advance ideas for expanding caBIG™ to a broader, multi-constituency-based biomedical ecosystem

• Develop and disseminate Executive Summary of ideas, insights, and proposed programs to catalyze future activities among broader constituencies

Measure states indirectly

base state(s) malignant state(s)

Center for Cancer Research, Laboratory of Population Genetics

Mutation status

Allele loss

Constitutional variation

RNA expression

Epigenetic variation

The bench-to-bedside-to-bench cycle

Promises a future of personalized medicine

But…

Survival plot based on gene expression data integrated with clinical outcome

Vision

“When I look into the eyes of a patient losing the battle with cancer, I say to myself, It doesn’t have to be this way.” The Nation’s Investment in Cancer Research (2003)

NCI 2015 challenge goal: eliminate suffering and death due to cancer

A.C. von Eschenbach, M.D., Former Director, National Cancer Institute; Director, Food & Drug Administration

Biomedical information tsunami

• overwhelming volume of data

• multitude of sources

Informatics tower of Babel

• Each cancer research community speaks its own scientific “dialect”

• Integration critical to achieve promise of molecular medicine

Biomedical Informatics and Middleware

Brings in Information

Translates and Integrates Information

Disseminates Information

Grid

Information Integration

Natural Language Processing

Ontologies

The cancer Biomedical Informatics Grid

• Responding to the Vision to reduce the burden of cancer
• Dealing with the problem of massive quantities of data
• Dealing with the distributed nature of cancer research
• Involving
  • Translational research
  • Clinical research
  • Patient advocates
  • Cancer center administration

caBIG™ is an innovative bioinformatics program at the NIH’s National Cancer Institute

• 50 Cancer Centers are working towards a common goal of integrated data, tools and methodologies to accelerate cancer research goals at the National Cancer Institute Center for Bioinformatics (NCICB): the cancer Biomedical Informatics Grid (caBIG™)

• The goal of caBIG™ is to create a virtual web of interconnected data, individuals, and organizations which will redefine how:
  • research is conducted
  • care is provided
  • patients/participants interact with the biomedical research enterprise

• The principles driving caBIG™ are:
  • Open Source
  • Open Access
  • Open Development
  • Federated Model

caBIG promotes the Vision

“Nearly every facet of NCI’s strategic plan to eliminate suffering and death due to cancer is predicated on the revolutionizing potential of caBIG™.” Cancer Bulletin, 2005

NCI 2015 challenge goal: eliminate suffering and death due to cancer

A.C. von Eschenbach, M.D., Former Director, National Cancer Institute; Director, Food & Drug Administration

Scenario, 2009

A researcher involved in a phase II clinical trial of a new molecularly targeted therapeutic for brain tumors observes that cancers derived from one specific tissue progenitor appear to be strongly affected. The trial has been generating proteomic and microarray data. The researcher would like to identify potential biochemical and signaling pathways that might be different between this cell type and other potential progenitors in cancer, deduce whether anything similar has been observed in other clinical trials involving agents known to affect these specific pathways, and identify any studies in model organisms involving tissues with similar pathway activity.

Small Molecules

Cell Type

Pathways

Clinical Trials

Therapeutics

Animal Models

Homologous Proteins

Michael Ochs, 2005

How is such research conducted?

• Today: a lot of manual work finding sources, other groups working on problems, getting data from other sites, re-analyzing, etc.

• With caBIG, much of the work is automated across a data grid, caGrid
  • Security model authenticates and authorizes the investigator
  • Data is made available for translational use
  • Standard tools and architectures exist for analytical flow
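The automated flow described on this slide (authenticate and authorize the investigator, then gather data from several sites) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Investigator`, `DataService`, and `federated_query` are hypothetical names invented for this sketch and are not part of the caGrid API, and the records are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

Record = Dict[str, str]


@dataclass(frozen=True)
class Investigator:
    """Hypothetical investigator identity (not a caGrid class)."""
    name: str
    authenticated: bool
    allowed_services: FrozenSet[str]


@dataclass
class DataService:
    """Hypothetical stand-in for one site's data service."""
    name: str
    records: List[Record] = field(default_factory=list)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Return only the records that satisfy the caller's predicate.
        return [r for r in self.records if predicate(r)]


def federated_query(
    user: Investigator,
    services: List[DataService],
    predicate: Callable[[Record], bool],
) -> List[Record]:
    """Query every service the user is authorized for and merge the results."""
    # Authentication: refuse unknown users outright.
    if not user.authenticated:
        raise PermissionError(f"{user.name} is not authenticated")
    results: List[Record] = []
    for service in services:
        # Authorization: skip services this user may not read.
        if service.name in user.allowed_services:
            results.extend(service.query(predicate))
    return results


if __name__ == "__main__":
    tissue = DataService("tissue", [{"site": "brain", "id": "T1"}])
    arrays = DataService("arrays", [{"site": "brain", "id": "A1"}])
    alice = Investigator("alice", True, frozenset({"tissue", "arrays"}))
    print(federated_query(alice, [tissue, arrays], lambda r: r["site"] == "brain"))
```

The point of the sketch is the ordering: the security check happens once, before any site is contacted, and each site still decides independently whether the investigator may read its data.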

Analysis

Pathology reports

Discrete and manual annotation on tissues

Mutation identification

Gene expression profiling

Analysis

Potential Drug Targets and Biomarkers

Clinical Trials Tumor Samples

400 brain tumor tissue samples acquired

caArray

Function Express

Gene annotation

GenePattern

PromoterDB

PathwaysTool

caTissue

caTIES

Clinical Annotation Modules

Proteomics LIMS Q5 PIR

Annotation

Discovery utilizing caBIG™ Integrated Cancer Research Tools

Identify up-regulated genes in specific pathways

TrAPSS

Identify recurring promoter elements

Thinking about a Solution

A virtual web of interconnected data, individuals, and organizations redefines how:

• research is conducted
• care is provided
• patients/participants interact with the biomedical research enterprise

Goals of the caBIG pilot

• Illustrate that a spectrum of Cancer Centers with varying needs and capabilities can be joined in a common grid of communications, shared data, applications, and technologies

• Demonstrate that Cancer Centers, in collaboration with NCI, will develop new enabling tools and systems that could support multiple Cancer Centers

• Create an extensible infrastructure that will continue to be expanded and extended to members of the cancer research community

• Demonstrate that Cancer Centers will actively use the grid and realize greater value in their cancer research endeavors by using the grid

caBIG™ Pilot action plan

• Establish pilot network of NCI Cancer Centers
  • Groups agreeing to caBIG principles
  • Mixture of capabilities
  • Mixture of contributions
• Expanding collection of participants
• Establish consortium development process
  • Collecting and sharing expertise
  • Identifying and prioritizing community needs
  • Expanding development efforts
• Moving at the speed of the internet…

Inauguration of the caBIG™ pilot

• 61 cancer centers were asked
  • What they could contribute to a biomedical informatics data grid and community initiative
  • What they would need from the grid
• All respondents were visited for clarification and detail
  • “Not a site visit”
  • Most regarded it as a site visit
  • Rumor: 10-12 pilot sites at $500,000/year
• 49 institutions offered contracts for small portions of caBIG pilot
  • Idea to build community with multiple small projects and roles
  • Political resistance to small number of pilot sites

Common needs helped shape priority areas for the caBIG pilot activities

[Bar chart: Number of Needs Reported (axis 0 to 35), by category. Categories: Clinical Data Management Tools; Staff Resources; Distributed Data Sharing/Analysis Tools; Translational Research Tools; Access to Data; Tissue & Pathology Tools; Center Integration & Management; Common Data Elements & Architecture; Meta-Project; Vocabulary & Ontology Tools & Databases; Statistical Data Analysis Tools; Visualization & Front-End Tools; Remote/Bandwidth; Proteomics; Microarray & Gene Expression Tools; Meeting; LIMS; Licensing Issues; Pathways; High Performance Computing; Integration; Imaging Tools & Databases; Database & Datasets]

Clinical Trial Management Systems

Tissue Banks & Pathology

Integrative Cancer Research

Cancer Center Roles in caBIG

• Developer (20% of centers)
  • Key is to create an environment for sharing tools with other centers
  • One of the most important issues is not to ignore the need for common data elements and vocabulary services
• Adopter (20% of centers)
  • Key is to understand the needs at local center (and be vocal)
  • Don’t abandon other development efforts; think modular
  • When adopting tools make sure they “talk” to legacy systems
• Working Group & Strategic Planning (60% of centers)
  • These are not “soft” roles
  • Critical to the success of the program
  • White paper development will guide caBIG successes
  • Make sure to communicate internally to all parts of the Cancer Center

The first three years of caBIG™ had clearly defined goals and metrics

and we quickly learned….

This isn’t Rocket Science

• A lot of caBIG™ isn’t even computer science
• Most industries did much of this years ago
• Really this is an engineering project…
• But it is hard to achieve – it takes time
• caBIG™’s goal (oversimplified): facilitate the exchange of data useful for cancer research and care
  • Between research domains, systems, investigators, and organizations
• For instance, the caBIG™ compatibility of a system is determined by how easily the system can exchange data (i.e., interoperability)

Four Domain Workspaces and two Cross Cutting Workspaces have been launched

DOMAIN WORKSPACE 1: Clinical Trial Management Systems
Addresses the need for consistent, open and comprehensive tools for clinical trials management.

DOMAIN WORKSPACE 2: Integrative Cancer Research
Provides tools and systems to enable integration and sharing of information.

DOMAIN WORKSPACE 3: Tissue Banks & Pathology Tools
Provides for the integration, development, and implementation of tissue and pathology tools.

DOMAIN WORKSPACE 4: Imaging
Provides for the sharing and analysis of in vivo imaging data.

CROSS CUTTING WORKSPACE 1: Vocabularies & Common Data Elements
Responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.

CROSS CUTTING WORKSPACE 2: Architecture
Develops architectural standards and architecture necessary for other workspaces.

Strategic Level Workspaces

caBIG Strategic Planning: Assists in identifying strategic priorities for the development and evolution of the caBIG effort.

Training: Develops strategies for providing training in the use of caBIG-developed resources, including online tutorials, workshops, and training programs.

Data Sharing and Intellectual Capital: Addresses issues related to the sharing of data, applications and infrastructure both within the consortium and in the larger cancer research community.

Overall Goals for caBIG™ Three-year (mid-2008)

• Develop sufficient research tools and standards to have a positive impact on the cancer research community, as measured by adoption of relevant caBIG principles in project proposals.

• Ensure widespread adoption of developer standards so that funded developer projects are operating under the Gold standard of compatibility.

• Adopt and use caBIG interoperable tools and data sets within the caBIG community.

• Develop mechanisms for engaging and promoting caBIG compliant technologies and established datasets within the oncology research community.

Overall Goals for caBIG™ Five-year (mid-2010)

• Ensure widespread adoption, dissemination, and use of caBIG interoperable tools, standards, and data sets within the larger cancer community, to include the biopharmaceutical industry, non-NCI cancer centers, and the national cancer research enterprise.

• Begin to see results of caBIG-compliant interdisciplinary and inter-institutional research affecting clinical oncology care.

Architecture

• Conceptually, caBIG has adopted two primary guiding principles:
  • To bring systems on-line quickly, caBIG is committed to a “bias for action.” This implies a commitment to making decisions and moving forward, even if perfection cannot be achieved.
  • To allow long-term evolution and improvement of architectural design, caBIG is committed to “designing for change.”
• To turn these thoughts into action, caBIG has also adopted a two-pronged practical approach:
  • If requirements are well-understood and good solutions are available, caBIG initiates developmental activities within the architectural workspace.
  • If requirements are less clear or if solutions are not yet available, caBIG commissions analysis and assessment activities.

This can get UGLY

caBIG™ Compatibility Guidelines

• The caBIG™ compatibility guidelines are designed to ensure that systems designed in a federated environment are still interoperable on the caBIG™ Grid, both syntactically and semantically

• Since achieving interoperability is a process, caBIG™ recognizes four levels of compatibility, starting from Legacy (not interoperable) through Bronze, Silver and Gold (fully interoperable)

• caBIG™ compatibility is all about interfaces rather than the scientific content of the system
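One way to picture the four levels is as an ordered scale that a tool's interfaces can be checked against. The sketch below is illustrative only: the level names come from the slide, but the numeric ordering, the `CompatibilityLevel` class, and the `meets` helper are assumptions made for this example, not part of any caBIG™ specification or certification tool.

```python
from enum import IntEnum


class CompatibilityLevel(IntEnum):
    """Levels named on the slide, ordered from least to most interoperable."""
    LEGACY = 0  # not interoperable
    BRONZE = 1
    SILVER = 2
    GOLD = 3    # fully interoperable


def meets(level: CompatibilityLevel, required: CompatibilityLevel) -> bool:
    """Return True when a system's level is at least the required level."""
    return level >= required


if __name__ == "__main__":
    # A Silver system satisfies a Bronze requirement; a Legacy system does not.
    print(meets(CompatibilityLevel.SILVER, CompatibilityLevel.BRONZE))
    print(meets(CompatibilityLevel.LEGACY, CompatibilityLevel.BRONZE))
```

Treating the levels as an ordered enumeration reflects the slide's point that interoperability is a process: a system moves up the scale as its interfaces improve, independent of its scientific content.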

SYNTACTIC

SEMANTIC

SEMANTIC

SEMANTIC

caBIG Compatibility Guidelines

A Lot of Stuff Has Emerged…

And much of it is based on

standard tools and architectures

caBIG Deliverables: Clinical Trials Management Systems

• Componentized, interoperable and standards-based Clinical Trials Management Systems, both purpose-built and commercial off-the-shelf, to handle, in an automated fashion, many aspects of developing, managing, conducting, and reporting Clinical Trials

• Biomedical Research Integrated Domain Group Model (BRIDG)

• Adverse Events Reporting Tool
• Cancer Clinical Comprehensive Dictionary (C3D)
• Cancer Community Clinical Patient Registry (C3PR)
• Clinical Research Information Exchange (CRIX)
• caBIG™ Compatibility evaluation for existing commercial tools
• Harmonization of UML Representations
• Ontological Representations and Data Elements for Clinical Trials
• Metadata Harmonization

Patient Care World

Patient Data in Proprietary Format

Clinical Research World

Regulatory World

Clinical Information Integration Challenges

caBIG Deliverables: Tissue Banks and Pathology Tools

• Systematic description and characterization of tissue resources – tools to inventory, track, mine, and visualize tissue samples from geographically dispersed repositories, with an ability to link tissue resources to clinical and molecular correlative descriptions

• caTISSUE Core
• caTIES
• caTISSUE Clinical Annotation Engine
• caTISSUE Experimental Annotation Engine
• Requirements Specifications Survey and Results
• Federated Tissue Data Set White Paper
• Cancer Translational Informatics Platform (caTRIP)

caBIG Deliverables: Integrative Cancer Research

• caArray
• caWorkbench 2.0
• GenePattern
• Gene Ontology Miner (GOMiner)
• Protein Information Resource (PIR)*
• RProteomics*
• Pathways Tool Development
• Tools Distance-Weighted Discrimination
• Magellan
• Visual and Statistical Data Analyzer (VISDA)
• Cancer Molecular Pages

• The ICR Workspace seeks to provide for the development of a “Plug and Play” analytic tool set, suitable for a variety of experimental methodologies, including microarrays, proteomics, biological pathways, data analysis and statistical methods, gene annotation, and others. It will also develop a diverse library of raw, structured data and facilitate the integration of different types of data. All of these tools would help in the integration of clinical and basic research.

caBIG Deliverables: Integrative Cancer Research (cont’d)

• Proteomics Laboratory Information Management System (LIMS) Prototype

• Q5
• TrAPSS
• Gene Connect
• Integrating Bioconductor and R into caBIG™
• Reverse Phase Protein Lysate Array based data for caArray
• Cancer Translational Informatics Platform (caTRIP)
• FunctionExpress
• HapMap, PromoterDB
• SEED
• NCI-60 Data Sharing
• Quantitative Pathway Analysis in Cancer (QPACA)
• Reactome (GKB) Data

Rembrandt: A brain tumor repository now utilizes available caBIG tools

Better understanding

Better treatments

Expression array data

Clinical data

SNP array data

Proteomics data

caIntegrator - DataMart

caBIG Analytic Tools

caBIG™ - Interaction Mechanisms

• For all participants:
  • Annual meeting
  • Online “Town Hall” quarterly
    • Addresses solicited questions
  • Monthly program update newsletter (big picture)
  • “What’s big this week” weekly newsletter (e.g. workspace meeting schedule)
• For Cancer Center Directors:
  • Director’s newsletter
• For Workspaces participants:
  • Monthly teleconferences (more frequently as needed)
  • Quarterly meeting (face to face)
• For all participants and the general public:
  • caBIG™ website

…and the realization of that goal

caBIG™ Involves a Large Community with a Wide Range of Interests

9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University; Northwestern University-Robert H. Lurie

“If caBIG™ accomplishes its mission and creates a robust grid for translational and clinical research, within the cancer community, it will be deemed a failure.”

Bob Robbins (Fred Hutchinson Cancer Research Center), at the initial Strategic Planning Workspace meeting

“Prevention is Better than Cure”

--Desiderius Erasmus (1466-1536)

Embedding caBIG™ in the larger biomedical research community

Embrace the Larger Community

• Expand caGrid and caDSR into other biomedical domains:
  • Biomedical Informatics Research Network (BIRN): launched in 2001 by NIH (NCRR), same concept, neurological disorders, smaller scale. Pilot brain tumor project shows substantial homology between BIRN and caBIG
  • Cardiac Arrhythmia Research Network (CARNET): just launched by NHLBI, uses caGrid infrastructure adding cardiovascular terminology to DSR
• National Center for Biomedical Ontologies: Roadmap project drawing medical informatics expertise
  • This is a computer science project, developing some of the next generation tools for the Grid
• Healthgrid™: International (Europe-based) project developing standards for information sharing
  • caBIG participants joining Healthgrid.US board of directors