cagrid executive introduction


Post on 23-Feb-2016


Page 1: caGrid Executive Introduction

caGrid Executive Introduction

caGrid 1.3

Justin Permar

caGrid Knowledge Center

https://cabig-kc.nci.nih.gov/CaGrid/KC


Agenda

• Vision and Use Cases
• caGrid Introduction
• Building and Using caBIG Applications
• Component / Service Survey
• Grid Interactions
• Grid Service Deployment


Vision

• “Imagine, if you will, a resource that would give individual scientists the capacity to easily view aggregate information on thousands of patients; a system that would also allow both patients and physicians to have complete medical records - including the patient's personal genome, tests performed over time, and medications taken - available at the click of a mouse. Rather than recruiting patients into clinical trials by who walks into the clinic or by individual referral, clinician-scientists could scan a database for patients precisely matched to their study, even if the study is looking for patients with specific genomic alterations, mutations, or translocations.”

• “In efforts to increase both the efficacy and efficiency of cancer care, managers of healthcare systems would have patient outcome data from hospitals across the country to utilize in benchmarking their own outcomes in key areas and managing cost. These brief examples are just a glimpse of the power that could come from such an interconnected national biomedical resource.”

Source: John Niederhuber, Director, NCI


About caBIG®

• caBIG® stands for the cancer Biomedical Informatics Grid®. caBIG® is an information network enabling all constituencies in the cancer community – researchers, physicians, and patients – to share data and knowledge.  The components of caBIG® are widely applicable beyond cancer as well.

• The mission of caBIG® is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes.

• The goals of caBIG® are to:
  • Connect scientists and practitioners through a shareable and interoperable infrastructure
  • Develop standard rules and a common language to more easily share information
  • Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.

Source: https://cabig.nci.nih.gov/overview/


Driving needs: cancer Biomedical Informatics Grid

• A multitude of “legacy” information systems, most of which cannot be readily shared between institutions
• An absence of tools to connect different databases
• An absence of common data formats
• A huge and growing volume of data that must be collected, analyzed, and made accessible
• Few common vocabularies, making it difficult, if not impossible, to interlink diverse research and clinical results
• Difficulty in identifying and accessing available resources
• An absence of information infrastructure to share data within an institution, or among multiple institutions
• A need to avoid redundantly re-building applications at multiple institutions


What is the Grid?

• “Controlled and coordinated resource sharing and problem solving in dynamic, scalable virtual organizations.”1

• Securely sharing (with policies!):
  • Computers
  • Software
  • Data
  • Other Resources

1 The Anatomy of the Grid: Enabling Scalable Virtual Organizations. I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001.


What is caBIG?

• Common, widely distributed infrastructure that addresses common caBIG needs and permits the cancer research community to focus on innovation

• Shared, harmonized set of terminology, data elements, and data models that facilitate information exchange

• Collection of interoperable applications developed to common standards

• Cancer research data available for mining and integration


Why Grid for caBIG?

Informatics Requirements    Advantages of Grid
Controlled                  Secure, role-based, locally-controlled access
Comprehensive               Data from multiple types of sources
Connected                   Syntactic and Semantic Interoperability
Convenient                  Simplified and customizable interfaces
Cost                        Cost effective – builds on existing technologies
Compliant                   Implements policy & technical standards
Credible                    Built on experience & best practices

Adapted from Muzna Mirza, MD, MSHI’s presentation on Global Public Health Grid: http://cdc.confex.com/cdc/phin2009/webprogram/Paper21091.html



What is caGrid to caBIG?

• The “G” in caBIG
• Cancer Biomedical Informatics Grid
• Provides the software infrastructure that underlies the tools and applications of caBIG
• Analogous to the “power grid”
• A multitude of applications with differing requirements can seamlessly be plugged in to a common infrastructure


What is caGrid? (2)

• Biomedical applications that share data all have common needs for syntactic and semantic interoperability
  • caGrid aims to be a platform for interoperability
• caGrid is a Grid software toolkit aimed at software developers creating Grid applications
• caGrid provides
  • the GAARDS toolkit, a standard security platform
  • metadata services that add semantic information to all Grid services
  • Introduce, a toolkit to develop Grid services
• The Grid is a trusted network that supports collaborative biomedical research.
  • “Getting on the Grid” involves joining the trusted network by applying for and utilizing Grid credentials


Compatibility and Interoperability

caBIG® provides standards-based compatibility guidelines for creating software systems that are syntactically and semantically interoperable.


The Grid Allows Users to Find and Utilize Data and Analytical Resources

Grid service information is advertised to a Grid service directory called the Index service. This service is used to locate Grid services relevant to your research objectives.

[Figure: a data or analytical resource (caBIO) is exposed as a Grid service; the Grid service advertises to the Grid Service Directory (Index Service), and the Grid (client apps, users) discovers services through it.]
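The advertise/discover interaction on this slide can be sketched with a small in-memory directory. This is only an illustration of the pattern, not the Index Service API: the class names, fields, and example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """Hypothetical record that a Grid service advertises to the directory."""
    url: str
    kind: str  # "data" or "analytical"
    keywords: set = field(default_factory=set)


class ServiceDirectory:
    """Toy stand-in for a Grid service directory (not the Index Service API)."""

    def __init__(self):
        self._entries = {}

    def advertise(self, entry):
        # Re-advertising the same URL replaces the earlier entry.
        self._entries[entry.url] = entry

    def discover(self, kind=None, keyword=None):
        # Return the entries that match every filter that was supplied.
        found = []
        for entry in self._entries.values():
            if kind is not None and entry.kind != kind:
                continue
            if keyword is not None and keyword not in entry.keywords:
                continue
            found.append(entry)
        return sorted(found, key=lambda e: e.url)


directory = ServiceDirectory()
directory.advertise(ServiceEntry("https://example.org/cabio", "data", {"gene", "pathway"}))
directory.advertise(ServiceEntry("https://example.org/normalize", "analytical", {"microarray"}))

matches = directory.discover(kind="data", keyword="gene")
print([m.url for m in matches])
```

A client that only knows what kind of resource it needs can call `discover` without knowing any service URL in advance, which is the point of the directory.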


caGrid: High Level View

Once a caBIG® tool is adopted or adapted by members of the research community, the tool is connected to the Grid to securely share data and analysis routines with collaborating researchers.


Infrastructure Focus Areas

• Leveraging Grid technologies and standards as an interoperability platform
• Metadata Infrastructure
  • Surfacing wealth of existing caBIG data-oriented metadata on the grid
  • Providing new service-oriented metadata
• Security
  • Integrating existing systems and applications with Grid security
  • Lowering burden of implementation of grid-wide and local policy
• Tooling for Service Developers
  • Powerful platform for bringing applications and data to the grid
• Facilitating Grid-wide operations
  • Federated query, workflow execution, resource discovery
• Making the Grid more accessible
  • Graphical installation and configuration, higher-level object-oriented APIs, web portals, graphical administrative applications
• Quality
  • Comprehensive testing infrastructure, automated builds and test execution on multiple platforms, dashboard with historical archive


More About Security

• Comprehensive security is critical for collaboration scenarios involving biomedical data sharing. The caGrid security components, collectively known as GAARDS, include the following services:
  • Dorian – Allows users to login to the Grid
  • Authentication Service – Integrates existing institutional login capabilities with the Grid
  • Grid Grouper – Allows institutions to implement group-based security policies
  • Grid Trust Service – Provides capabilities for Grid entities to trust each other
  • Credential Delegation Service – Provides the ability to securely transfer Grid credentials to others
  • Web Single Sign-On – Allows a single login to provide access to multiple web applications that utilize Grid services


caGrid Integration with Existing Information Systems

• caGrid is an informatics platform that integrates and augments existing informatics infrastructure

• Examples include the following:
  • caGrid integrates existing repositories of semantic information such as ontology servers
  • caGrid integrates with existing institutional login systems (e.g., LDAP)
  • caGrid shares data from existing databases and files

• In summary, caGrid integrates with existing systems to share and analyze data for multi-institutional clinical and research scenarios


Getting Started with caGrid

• To get started developing Grid applications, first install caGrid

• Use the caGrid installer to load caGrid onto your development machine
  • Using the installer is the easiest way to install caGrid

• Features include:
  • Guided, wizard-like interface for easy installation
  • The installer can be used to re-configure existing installations

• The only requirement to run the installer is the Sun® Java™ 5 Development Kit.



caGrid Community Involvement: Building Grid Applications

• caGrid itself provides no real “data” or “analysis” to caBIG; caGrid enables the community to build services that share and analyze data

• The real “value” of the grid comes from bringing this information to the “end user”

• Community members develop end-user applications that consume the resources provided by the grid
  • A Grid data service shares data securely with collaborators
  • A Grid analytical service analyzes data
  • A Grid application utilizes multiple Grid services to aid clinical and research workflows


caCORE Development Process

• caCORE is a robust set of tools and resources to support the development of caBIG®-compatible systems

• NCI offers comprehensive training for caCORE tools

[Figure: caCORE development steps, each followed by the artifact it produces]
• Create an Information Model using a modeling tool (Information Models)
• Perform Semantic Integration using the SIW (Vocabularies)
• Generate Code and Interfaces using the caCORE SDK Code Generator (APIs)
• Transform the Model into Metadata using the UML Loader (CDEs)
• Generate a Grid Service using caGrid (Grid)

Reference: Dr. Robert Freimuth, Vocabulary Knowledge Center Director


UML Model Creation Process

• Enterprise Vocabulary Services (EVS)
  • Stores controlled terminologies used during semantic annotation
  • The SIW pulls concepts from EVS and attaches them to model components
• cancer Data Standards Repository (caDSR)
  • Common Data Elements (CDEs)
  • UML model elements that are semantically annotated are added to the caDSR as CDEs

[Figure: UML model creation steps, each preceded by its label]
• Logical Model: Create a Logical Model (UML class diagram) using Enterprise Architect
• Data Model: Create a Data Model (database schema) using Enterprise Architect
• Semantics: Semantically Annotate the UML Model using the SIW
• Mapping: Map the Logical Model to the Data Model using caAdapter
• Load Model: Model is complete and ready for compatibility review and load into caDSR


caBIG® Compatibility Guidelines: Areas of Interoperability

• Semantic Interoperability (VCDE)
  • Information Models
  • Vocabularies and Ontologies
  • Common Data Elements (CDEs)
• Syntactic Interoperability (Architecture)
  • Programming and Messaging Interfaces

An application must meet the criteria specified in all four areas to be "caBIG® Compatible"

[Figure: the four areas: Vocabularies, Information Models, APIs, CDEs]

Reference: Dr. Robert Freimuth, Vocabulary Knowledge Center Director


caBIG® Compatibility Guidelines: Levels of Maturity

• Legacy: Implies no interoperability with an external system or resource

• Bronze: Minimum requirements to achieve basic interoperability

• Silver: Rigorous requirements to significantly reduce the barrier of use for parties not involved with development of that resource

• Gold: Extensions to silver that add standardization and harmonization practices to enable full syntactic and semantic interoperability

[Figure: the four areas: Vocabularies, Information Models, APIs, CDEs]

Source: https://cabig.nci.nih.gov/guidelines_documentation


Using caBIG Applications



caGrid 1.3 Core Services

All caGrid Core Services were redeployed on all caBIG® Grids (OSU Training, QA, Stage, and Production) for this release.

The (12) caGrid 1.3 Core Services are:

Metadata Services                 Security Services                     Business Activity Services
Global Model Exchange Service**   Authentication Service**              Federated Query Processor Service**
Index Service**                   Credential Delegation Service         BPEL Workflow Service
Metadata Model Service*           Dorian Service**                      Taverna Workflow Service*
                                  Grid Grouper Service
                                  Grid Trust Service (Master & Slave)

* New for 1.3
** Significantly Rewritten or Enhanced for 1.3


What’s the use of metadata?

• Service metadata is critical for finding Grid resources relevant to particular research and clinical scenarios
• Metadata describes the service functionality and meaning of data that are shared by a Grid service
• Scenario: Scientists and others using the Grid want to find and utilize existing data sources and algorithms relevant to their research scenarios
  • Solution: Grid services register with a Grid service directory
• Scenario: Users want to view the structure and relationships of data on the Grid
  • Solution: The UML model defines the content of Grid data types and relationships between these types
• Scenario: Users need to know the format of the data described in a UML model
  • Solution: XML schemas, stored in a Grid repository, define the data format to act as the foundation for syntactic interoperability
• Scenario: Scientists want to identify the meaning of the data described in a UML model
  • Solution: Grid data is annotated with semantic information, such as use of community-approved vocabulary and concept definitions


What caGrid services provide this functionality?

• Scenario: Scientists and others using the Grid want to find and utilize existing data sources and algorithms relevant to their research scenarios
  • The Index Service included in caGrid is a Grid-wide service directory that serves as the “white” and “yellow” pages of the Grid
• Scenario: Users want to view the structure and relationships of data on the Grid
  • Every data service provides a data model that represents the information in the UML model
• Scenario: Users need to know the format of the data described in a UML model
  • The Global Model Exchange (GME) Service is a Grid-wide repository for XML schemas
• Scenario: Scientists want to identify the meaning of the data described in a UML model
  • The Metadata Model Service (MMS) is used to add semantic information to caGrid services
  • The MMS also is used to generate a Grid representation of the data in your UML model, including semantic information


How does caGrid use the caBIG semantic repositories?

• All caGrid Services are expected to publish a set of standard metadata which draws heavily from the metadata registered in caDSR and EVS
  • Common Metadata describes generic information about the Cancer Center providing the service, points of contact, etc.
  • The Service’s operations are defined and their inputs and outputs described using CDEs in caDSR and vocabulary from EVS
  • Data Services additionally describe the domain Model they are exposing
    • Classes, attributes, and associations from the UML model
    • Semantics of the UML model


What security problems exist for multi-institutional data sharing scenarios?

• Inter-institutional “trust”
  • What institutions participate in the Grid? How can you verify that an identity is issued by the institution that it claims to be from?
• User authentication
  • How does a user prove their identity? How can we check that the identity is legitimate?
• User authorization
  • How can institutions that share Grid services grant privileges to their collaborators?
  • How can institutions that share data ensure their collaborators can only access data that the institutions intend to share?
• Data Integrity
  • How can institutions be sure that data they are sharing is transmitted properly?
• Data Security
  • How can institutions be sure that they share data only with whom they intend to share data?
• Allowing services to retrieve and analyze data on your behalf


What caGrid Services Address these Security Scenarios?

• Inter-institutional “trust”
  • The Grid Trust Service (GTS) is used to establish a trust fabric, which is a collection of authoritative certificate authorities
• User authentication
  • Dorian has a CA that is an essential part of the trust fabric
  • Dorian issues both host certificates and user credentials that are trusted by others in the Grid because they have synchronized with the trust fabric
  • The Authentication Service allows institutions to integrate their local user management systems with the Grid
• User authorization
  • Grid Grouper provides group management, which in turn, allows service developers to add group-based authorization policies
  • The Common Security Module (CSM) can be used to protect individual data elements shared by a Grid data service
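The group-based authorization idea above can be sketched as a membership check. This is a toy model under stated assumptions: `GroupRegistry`, `authorize`, the group name, and the identity strings are hypothetical and do not reflect Grid Grouper's actual API.

```python
class GroupRegistry:
    """Toy group store; stands in for group management, not Grid Grouper's API."""

    def __init__(self):
        self._members = {}

    def add_member(self, group, identity):
        self._members.setdefault(group, set()).add(identity)

    def is_member(self, group, identity):
        return identity in self._members.get(group, set())


def authorize(groups, identity, required_group):
    """Allow an operation only if the caller belongs to the required group."""
    return groups.is_member(required_group, identity)


groups = GroupRegistry()
groups.add_member("trial-42-investigators", "/O=Example/CN=alice")

alice_allowed = authorize(groups, "/O=Example/CN=alice", "trial-42-investigators")
bob_allowed = authorize(groups, "/O=Example/CN=bob", "trial-42-investigators")
print(alice_allowed, bob_allowed)
```

Expressing the policy in terms of a group rather than individual users means a service provider can admit a new collaborator by changing group membership, without touching the service itself.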


What caGrid Services Address these Security Scenarios? (2)

• Data Integrity
  • caGrid supports checksums to ensure that data has not been altered during transmissions
• Data Security
  • caGrid supports encryption to ensure that data cannot be read by others during transmission
• Allowing services to work for you
  • The credential delegation service (CDS) allows you to hand your credential to a third party for a specified period of time
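The time-limited nature of delegation can be illustrated with a minimal sketch. It models only the "named delegate, fixed lifetime" idea from the last bullet; the `Delegation` record and both functions are hypothetical, and no real credentials or CDS calls are involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Delegation:
    """Hypothetical delegation record: who may act for whom, and until when."""
    owner: str
    delegate: str
    expires_at: datetime


def delegate_credential(owner, delegate, lifetime, now):
    # The lifetime bounds how long the delegate may act on the owner's behalf.
    return Delegation(owner, delegate, now + lifetime)


def may_act_for(delegation, caller, now):
    # Valid only for the named delegate, and only before expiry.
    return caller == delegation.delegate and now < delegation.expires_at


issued = datetime(2009, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = delegate_credential("alice", "workflow-service", timedelta(hours=2), issued)

within_lifetime = may_act_for(grant, "workflow-service", issued + timedelta(hours=1))
after_expiry = may_act_for(grant, "workflow-service", issued + timedelta(hours=3))
wrong_party = may_act_for(grant, "other-service", issued + timedelta(hours=1))
print(within_lifetime, after_expiry, wrong_party)
```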


How do Grid applications use core caGrid services?

• The user community adds data services and analytical services to the Grid
  • These services share data and analytical resources with others
• Multi-institutional collaborations will require the use of multiple Grid services
• caGrid provides “higher-level” services that utilize the aforementioned Grid services
  • The Federated Query Processor (FQP) provides applications with capabilities to aggregate data from multiple (equivalent) data services and to join data from multiple data services
  • The workflow services allow users to specify interactions between services to achieve a desired result
    • For example, retrieve all ECG data for subjects in our clinical trial and calculate the mean QT value, storing the data in a results data service
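The ECG example above combines aggregation across equivalent data services with an analysis step. A minimal sketch follows, assuming each data service can be modeled as a list of records; the function names, record fields, and QT values are made up for illustration and are not the FQP or workflow service APIs.

```python
from statistics import mean


def query_service(records, subject_ids):
    """Stand-in for querying one data service: filter its records by subject."""
    return [r for r in records if r["subject"] in subject_ids]


def federated_query(services, subject_ids):
    """Aggregate matching records from several equivalent data services."""
    results = []
    for records in services:
        results.extend(query_service(records, subject_ids))
    return results


# Two hypothetical ECG data services at different institutions (made-up values).
site_a = [{"subject": "S1", "qt_ms": 400}, {"subject": "S9", "qt_ms": 450}]
site_b = [{"subject": "S2", "qt_ms": 420}, {"subject": "S3", "qt_ms": 440}]

trial_subjects = {"S1", "S2", "S3"}
ecg = federated_query([site_a, site_b], trial_subjects)
mean_qt = mean(r["qt_ms"] for r in ecg)

# A plain dict stands in for the results data service.
results_store = {"mean_qt_ms": mean_qt}
print(results_store)
```

Subject S9 is not enrolled in the trial, so its record is excluded before the mean is computed.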


Other caGrid Utilities and APIs

• CQL and DCQL
  • CQL is the “caGrid Query Language” that is used to retrieve data from caGrid data services
  • DCQL is the distributed query language that is used for federated query processing
• Web Single Sign On
  • The Web Single Sign On component allows users to sign in once and use multiple secure web applications
• Introduce
  • Grid application developers use the Introduce toolkit to create data and analytical services
  • The Introduce toolkit can be extended to add project-specific functionality
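To give a feel for querying a data service by its domain model, here is a minimal sketch. It is not CQL syntax: it assumes, for illustration only, that a query names a target class from the model and filters on one attribute. The `Gene` class and `ObjectQuery` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical domain class exposed by a data service."""
    symbol: str
    chromosome: str


@dataclass
class ObjectQuery:
    """Hypothetical query: a target class plus one attribute equality test."""
    target: type
    attribute: str
    value: object

    def run(self, objects):
        return [
            o for o in objects
            if isinstance(o, self.target) and getattr(o, self.attribute) == self.value
        ]


store = [Gene("BRCA1", "17"), Gene("TP53", "17"), Gene("BRCA2", "13")]
query = ObjectQuery(target=Gene, attribute="chromosome", value="17")
symbols = [g.symbol for g in query.run(store)]
print(symbols)
```

Because the query is phrased in terms of the model's classes and attributes rather than database tables, the same query shape could in principle be sent to any service exposing that model.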


An example Introduce development process (0 lines of developer code!)

Getting Connected: Deploying to caGrid™

[Figure: development steps, grouped under the phase labels “Create Semantically Harmonized Data Model”, “Generate Data Resource”, and “Grid-ify”]
• Create an Information Model in a modeling Tool
• Perform Semantic Integration using the Semantic Integration Workbench (SIW)
• Generate Code and Messaging Interfaces using the caCORE SDK Code Generator
• Transform the Information Model into Metadata using the UML Loader
• Generate a caGrid Interface using “Introduce”



Grid Workflows utilize core Grid Services

• The Grid services that are included in caGrid provide a core set of features available for Grid usage scenarios

• Grid workflows are software implementations of real-life clinical and research workflows

Figure: Example Data Analysis Workflow


Example Image Analysis Scenario

Each image processing step is a Grid service

Each step in background correction is an operation

Source: Joel H. Saltz, Scott Oster, Shannon L. Hastings, Stephen Langella, Renato A. Ferreira, Justin D. Permar, Ashish Sharma, David W. Ervin, Tony C. Pan, Umit V. Catalyurek, Tahsin M. Kurc, "Translational research design templates, Grid computing, and HPC", IEEE International Symposium on Parallel and Distributed Processing, pp. 1-15, June 2008. http://bmi.osu.edu/publications_more.php?ID=1113



Joining the Grid

• During Grid service creation, the service creator specifies the authentication and authorization requirements for the service
  • For example, a service can require that users must authenticate with the service in order to communicate
  • Specify authorization options (CSM/Grid Grouper) that are needed to support data retrieval and analysis operations that the service offers. A service can require authorization at the service level, operation level, and data level (give the user permission to retrieve only what they are allowed to view)
• Configure a container to host the service
  • Two types of containers: secure and non-secure
  • A non-secure container can only host non-secure services and does not support authentication or authorization
  • A secure container can host secure and non-secure services and will support authentication and authorization as specified by the service
  • A secure container has its own identity that it uses to communicate with the rest of the Grid
• Deploy the service to the container and start the container
• The service advertises itself to the Grid service directory
  • The service directory, in turn, asks your service for information about its operations and data
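The container rules above (a non-secure container hosts only non-secure services; a secure container hosts both) can be captured in a few lines. This is a sketch of the rule only, with hypothetical `Service` and `Container` types; it does not model a real service container or its deployment tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    secure: bool  # True if the service requires authentication/authorization


@dataclass
class Container:
    secure: bool
    services: list = field(default_factory=list)

    def deploy(self, service):
        # A non-secure container cannot host a secure service.
        if service.secure and not self.secure:
            raise ValueError(f"{service.name} needs a secure container")
        self.services.append(service)


secure_container = Container(secure=True)
secure_container.deploy(Service("public-catalog", secure=False))
secure_container.deploy(Service("trial-data", secure=True))

open_container = Container(secure=False)
try:
    open_container.deploy(Service("trial-data", secure=True))
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```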


The Role of Grid Policy

• The virtual organizations that join a Grid collectively establish (and enforce) policies that govern the use of the Grid
  • Security policies
    • How long can a user Grid session last?
  • Data sharing policies
    • Sharing de-identified data? Limited data sets? PHI?
  • Service level agreements
    • What requirements are imposed on service providers?
  • Other domain-specific policies


Project Resources and Communication

• cagrid.org
  • Software Downloads
  • Documentation
  • Tutorials
  • Technical Paper and Presentations
  • FAQs
• caBIG® caGrid Knowledge Center
  • Knowledge Base
  • Forums
  • Enterprise Support
  • Community engagement
  • https://cabig-kc.nci.nih.gov/CaGrid/KC/index.php/Main_Page
• caGrid GForge Home (project website)
  • Feature Requests
  • Bug Reports
  • http://gforge.nci.nih.gov/projects/cagrid-1-0/
• caGrid Portal (web portal)
  • http://cagrid-portal.nci.nih.gov/


Acknowledgments

• THANK YOU
• caGrid Development team
• caBIG® Documentation and Training team