Transcript
Page 1: CaGrid Overview and Core Services caGrid Knowledge Center February 2011

caGrid Overview and Core Services

caGrid Knowledge Center

February 2011

Page 2

caGrid

• A Grid software middleware infrastructure consisting of services, toolkits, APIs, and runtime environment
  • Standards Based, Open Source
  • Building blocks to create interoperable, Grid-enabled systems
  • Service Oriented Architecture
    • Web Services Resource Framework standards
  • Model Driven Architecture
    • Object oriented view, published information models, strongly-typed services
    • Rich metadata

• A production Grid deployment of the core services provided by that infrastructure
  • Security, Data Services Infrastructure, Service Development & Deployment, Metadata, Federated Query, Workflow, Advertisement & Discovery

• Provides the software foundation which underlies the tools and applications of caBIG

Page 3

Application Scenario

• A clinician/researcher is involved in a multi-institutional clinical trial of a new targeted therapeutic
  • Microarray, Proteomic, and Image data are collected from patients participating in the trial

• Researcher wants to carry out a correlative analysis to assess the treatment
  • Query and analyze microarray, image, and protein data from multiple patients to find interesting patterns
  • Look for similar patterns in other microarray, protein, and image databases

• Patients may have been seen at multiple institutions

• Datasets may have been collected at different institutions

Page 4

Application Scenario

Location A: Microarray, Protein, Image data

Location B: Microarray, Protein, Image data

Location C: Microarray, Protein, Image data

Location C: Image Analysis

Location D: Image Analysis

Microarray and protein databases at other institutions

Different database systems, different data representations, security

Different invocations of programs, remote access, how to transfer data

Page 5

caGrid Production Environment

Page 6

Infrastructure Core Capabilities

• Model-Driven and Metadata
  • Enabling and supporting interoperable services
  • Providing service-oriented metadata

• Service development and deployment
  • Tooling for bringing applications and data to the grid

• Advertisement and Discovery
  • Publishing services to the Grid
  • Enabling search for services based on service metadata

• Security
  • Integrating existing systems and applications with Grid security
  • Lowering burden of implementation of grid-wide and local policy

• Facilitating Grid wide operations
  • Federated query, workflow execution

• Making services and core infrastructure more accessible
  • Graphical installation and configuration, higher-level object-oriented APIs, web portals, graphical administrative applications

Page 7

Model Driven, Interoperable Services

• Client and service APIs are object oriented, and operate over well-defined and curated data types

• Objects are defined in UML and Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)

• Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described

• XML serializations of objects adhere to XML schemas registered in the Global Model Exchange (GME)

[Diagram: a Grid Client uses the Client API to reach a Grid Service through its Service API. The Service Definition (WSDL) and Data Type Definitions (XSD) describe the service, and the XSD is registered in the Global Model Exchange (GME). Objects serialize to XML that validates against the XSD. Object Definitions are registered in the Cancer Data Standards Repository and semantically described in the Enterprise Vocabulary Services.]

Page 8

Global Model Exchange and Metadata Model Services

• Global Model Exchange

• Provides support to store and retrieve schemas for types used in Grid services.

• Developers should register the schemas defining types used in Grid services with the GME.

• Metadata Model Service (MMS)

• Provides support for developers to generate and add service metadata

• Developers can augment standard caGrid service metadata with information from metadata registries, such as the caDSR

• External registry provides the means to add, modify, delete, or otherwise manage the UML models and their correspondence to XML Schemas which the MMS leverages
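
The idea of registering schemas and checking serialized objects against them can be sketched as a registry keyed by XML namespace. This is a minimal in-memory illustration in Python, not the caGrid or GME API: `SchemaRegistry`, `Gene`, `conforms`, and the namespace URI are all hypothetical, and the "schema" is reduced to a root element name plus an ordered list of child element names rather than a real XSD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


class SchemaRegistry:
    """Hypothetical stand-in for a schema store keyed by XML namespace."""

    def __init__(self):
        self._schemas = {}

    def register(self, namespace, root, elements):
        # A real registry would hold an XSD document; this sketch keeps only
        # the root element name and the ordered child element names.
        self._schemas[namespace] = (root, list(elements))

    def lookup(self, namespace):
        return self._schemas[namespace]


@dataclass
class Gene:
    symbol: str
    organism: str


GENE_NS = "urn:example:gene"  # made-up namespace, for illustration only


def to_xml(obj, namespace):
    """Serialize a dataclass instance to a namespaced XML element."""
    root = ET.Element(f"{{{namespace}}}{type(obj).__name__}")
    for f in fields(obj):
        child = ET.SubElement(root, f"{{{namespace}}}{f.name}")
        child.text = str(getattr(obj, f.name))
    return root


def conforms(element, registry):
    """Compare the element's names with the registered description."""
    namespace, _, local = element.tag[1:].partition("}")
    root, expected = registry.lookup(namespace)
    children = [c.tag.partition("}")[2] for c in element]
    return local == root and children == expected


registry = SchemaRegistry()
registry.register(GENE_NS, "Gene", ["symbol", "organism"])
gene_xml = to_xml(Gene("TP53", "Homo sapiens"), GENE_NS)
print(conforms(gene_xml, registry))
```

A production setup would validate against the registered XSD with a schema-aware XML library; the name comparison here only stands in for that step.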

Page 9

Service Development and Deployment: Introduce

• A framework which enables fast and easy creation of Grid services.

• Provides easy to use graphical service authoring tool.

• Hides all “grid-ness” from the developer.

• Handles all core service architecture requirements for strongly typed and highly interoperable grid services.

• Integration with other core grid services and architecture components
  • GAARDS Security Infrastructure
  • Globus Index Service
  • Global Model Exchange
  • Metadata Model Service
  • Cancer Data Standards Repository

• Extension Framework for integrating with other architecture components

Page 10

Introduce Features

• Supports modification of operations
  • Adding Operations
  • Removing Operations
  • Updating Operations
  • Importing Operations

• Graphical Configuration
  • Advertisement
  • Security
  • Service Metadata Specification
  • Service Metadata Editing
  • Service Configuration Properties

• Auto generates code for service

• Auto generates a client API for service

• Graphical Deployment of Service
  • Globus
  • Tomcat
  • JBoss

Page 11

Advertisement and Discovery: Index Service

[Diagram: each Grid Service publishes Service Metadata and registers to the Index Service, which subscribes to and aggregates that metadata. The metadata uses terminology described in the Enterprise Vocabulary Services and references objects defined in the Cancer Data Standards Repository. The Discovery Client API queries the service metadata aggregated in the Index Service.]

• All services register their service metadata information to the Index Service

• Clients can discover services using a discovery API which facilitates inspection of data types

• Leveraging semantic information in EVS (from which service metadata is drawn), services can be discovered by the semantics of their data types

Examples:
  • “Find me all the services from Cancer Center X”
  • “Which Analytical services take Genes as input?”
  • “Find me all the services with some metadata mentioning the string ‘macromolecules’”
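
The three example questions above can be sketched as lookups over aggregated metadata. This is a minimal in-memory illustration in Python, not the caGrid discovery API: `ServiceMetadata`, `Index`, the field names, and the registered services are hypothetical, and matching is by plain type name rather than by EVS concept.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceMetadata:
    """Hypothetical, simplified service metadata record."""
    name: str
    hosting_center: str
    kind: str  # e.g. "analytical" or "data"
    input_types: list = field(default_factory=list)
    description: str = ""


class Index:
    """In-memory stand-in for an index that aggregates service metadata."""

    def __init__(self):
        self._entries = []

    def register(self, metadata):
        self._entries.append(metadata)

    def by_center(self, center):
        # "Find me all the services from Cancer Center X"
        return [m.name for m in self._entries if m.hosting_center == center]

    def by_input_type(self, kind, type_name):
        # "Which Analytical services take Genes as input?"
        return [m.name for m in self._entries
                if m.kind == kind and type_name in m.input_types]

    def by_text(self, text):
        # "Find me all the services with some metadata mentioning ..."
        needle = text.lower()
        return [m.name for m in self._entries
                if needle in m.description.lower()]


index = Index()
index.register(ServiceMetadata("GeneCluster", "Cancer Center X", "analytical",
                               ["Gene"], "Clusters genes by expression"))
index.register(ServiceMetadata("ProteinStore", "Cancer Center Y", "data",
                               [], "Stores macromolecules and annotations"))

print(index.by_center("Cancer Center X"))
print(index.by_input_type("analytical", "Gene"))
print(index.by_text("macromolecules"))
```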

Page 12

Service Metadata: Data Service

• Data Service Metadata

• Describes the Domain Model being exposed, in terms of a UML model linked to semantics

• Data types defined in terms of structure and semantics extracted from caDSR and EVS

• Auto-generated by caGrid service authoring toolkit (Introduce)

Page 13

Security Services

• Authentication
  • How to identify a client (or a service)
  • Secure login
  • Integrate the Grid with existing institutional login systems!

• Enforce data sharing policies and access control
  • Local policies
  • Federated access

• Trust Fabric
  • How to trust a client and what level
  • Dynamically adapt trust if security breach

Page 14

caGrid Security Infrastructure (GAARDS)

• Dorian
  • Allows accounts managed in external domains to be federated and managed in the Grid.
  • Allows users to use their existing credentials (external to the Grid) to authenticate to the Grid

• Grid Grouper/CSM
  • Provides a group-based authorization solution for the Grid

• Grid Trust Service
  • Supports applications and services in deciding whether or not signers of digital credentials can be trusted.
  • Supports the provisioning of trusted certificate authorities and corresponding certificate revocation lists.

Provides services and tools for the administration and enforcement of security policy in an enterprise Grid.
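
The trust decision described for the Grid Trust Service (is the signer a trusted authority, and has the credential been revoked?) can be sketched as follows. This is a minimal illustration in Python, not the GTS API: `TrustFabric`, the authority names, and the serial numbers are hypothetical, and no cryptographic signature checking is performed.

```python
from dataclasses import dataclass, field


@dataclass
class TrustFabric:
    """Hypothetical local view of trusted authorities and revocations."""
    trusted_authorities: set = field(default_factory=set)
    revoked_serials: dict = field(default_factory=dict)  # authority -> serials

    def trust(self, authority):
        self.trusted_authorities.add(authority)
        self.revoked_serials.setdefault(authority, set())

    def revoke(self, authority, serial):
        self.revoked_serials.setdefault(authority, set()).add(serial)

    def is_trusted(self, issuer, serial):
        # Trust requires a known signer and a credential that does not
        # appear on that signer's revocation list.
        if issuer not in self.trusted_authorities:
            return False
        return serial not in self.revoked_serials.get(issuer, set())


fabric = TrustFabric()
fabric.trust("CN=Institution Y CA")
fabric.revoke("CN=Institution Y CA", 1042)

print(fabric.is_trusted("CN=Institution Y CA", 7))
print(fabric.is_trusted("CN=Institution Y CA", 1042))
print(fabric.is_trusted("CN=Unknown CA", 7))
```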

Page 15

Secure Clinical Research Support with GAARDS

• Use Dorian for grid authentication
  • Integrate with my LDAP user database and authentication

• Use Grid Grouper (along with local mechanisms) for Grid authorization
  • I let reviewers from institution X access patient data in the “Watson” research trial for review only
  • Data Entry personnel for the research trial have permission to add new data, but not update existing data
  • I bar institution X from accessing any other data I’m sharing on the Grid

• Use GTS to update the grid trust fabric
  • I trust institution Y after finalizing data sharing agreements for the Watson research
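
The authorization rules in this scenario can be sketched as a deny-by-default, group-based policy. This is a minimal illustration in Python, not the Grid Grouper or CSM API: `GroupPolicy`, the group names, the user identities, and the resource names are all hypothetical.

```python
class GroupPolicy:
    """Hypothetical group-based policy: grants are made to groups."""

    def __init__(self):
        self._members = {}  # group name -> set of user identities
        self._grants = {}   # (group name, resource) -> set of actions

    def add_member(self, group, user):
        self._members.setdefault(group, set()).add(user)

    def grant(self, group, resource, *actions):
        self._grants.setdefault((group, resource), set()).update(actions)

    def is_allowed(self, user, resource, action):
        # Deny by default; allow only if some group the user belongs to
        # has been granted this action on this resource.
        for (group, granted_resource), actions in self._grants.items():
            if (granted_resource == resource
                    and action in actions
                    and user in self._members.get(group, set())):
                return True
        return False


policy = GroupPolicy()
policy.add_member("watson-reviewers", "alice@institution-x")
policy.add_member("watson-data-entry", "bob@local")
policy.grant("watson-reviewers", "watson-trial", "read")
policy.grant("watson-data-entry", "watson-trial", "create")

print(policy.is_allowed("alice@institution-x", "watson-trial", "read"))
print(policy.is_allowed("alice@institution-x", "other-study", "read"))
print(policy.is_allowed("bob@local", "watson-trial", "update"))
```

Because nothing is allowed unless granted, barring institution X from other data needs no explicit rule in this sketch; a real deployment may combine such grants with local mechanisms, as the slide notes.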

Page 16

caGrid Data Service Infrastructure

• caGrid Data Services provide capability to expose data resources to the Grid
  • Specialization of caGrid grid services to expose data through a common query interface
  • Introduce extensions to create data services from information models and using caCORE SDK

• Queries are made with caBIG Query Language (CQL) query objects
  • Specifies a target object (result) type and selects the instances which satisfy the specified properties and nested object properties
  • Ability to return full objects, sets of attributes, counts of results, or distinct attribute values

• Support for Bulk Data Transport for efficient transfer of large data volumes
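
The query model described above (a target type, attribute constraints, a nested object constraint, and several result modes) can be sketched over in-memory objects. This is a minimal illustration in Python and does not reproduce the actual CQL schema: `Query`, `execute`, the `Patient` and `Sample` classes, and the mode names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    tissue: str


@dataclass
class Patient:
    patient_id: str
    diagnosis: str
    sample: Optional[Sample] = None


@dataclass
class Query:
    """Hypothetical, simplified query object."""
    target: type
    attributes: dict                      # attribute name -> required value
    association: Optional[tuple] = None   # (role name, nested attributes)


def matches(obj, attributes):
    return all(getattr(obj, k) == v for k, v in attributes.items())


def execute(query, store, mode="objects", attribute=None):
    """Select target instances, then shape the result by mode."""
    hits = [o for o in store
            if isinstance(o, query.target) and matches(o, query.attributes)]
    if query.association:
        role, nested = query.association
        hits = [o for o in hits
                if getattr(o, role) is not None
                and matches(getattr(o, role), nested)]
    if mode == "count":
        return len(hits)
    if mode == "attributes":
        return [getattr(o, attribute) for o in hits]
    if mode == "distinct":
        return sorted({getattr(o, attribute) for o in hits})
    return hits


store = [
    Patient("p1", "glioma", Sample("brain")),
    Patient("p2", "glioma", Sample("blood")),
    Patient("p3", "melanoma", Sample("skin")),
]
brain_glioma = Query(Patient, {"diagnosis": "glioma"},
                     ("sample", {"tissue": "brain"}))

print(execute(brain_glioma, store, mode="count"))
print(execute(brain_glioma, store, mode="attributes", attribute="patient_id"))
print(execute(Query(Patient, {}), store, mode="distinct", attribute="diagnosis"))
```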

Page 17

Federated Query Processor Service

• Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services

• Can be used to express queries against any combination of caGrid data services, since each service uses CQL

• Federated queries are expressed using DCQL, an extension to CQL
  • Express joins, aggregations, and target data services

• Client API provides a means of expressing DCQL queries

• Federated Query Processor service partitions a DCQL query into queries to respective data services, carries out joins and aggregations, and compiles the results
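
The partition, join, and compile steps can be sketched with two in-memory "data services". This is a minimal illustration in Python, not DCQL or the Federated Query Processor API: `query_service`, `federated_join`, the record layouts, and all values are hypothetical.

```python
def query_service(records, **criteria):
    """Stand-in for sending one sub-query to one data service."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


def federated_join(left_rows, right_rows, left_key, right_key):
    """Join two result sets on a shared key (simple hash join)."""
    by_key = {}
    for row in right_rows:
        by_key.setdefault(row[right_key], []).append(row)
    return [{**left, **right}
            for left in left_rows
            for right in by_key.get(left[left_key], [])]


# Two independent services holding different data about the same patients.
clinical_service = [
    {"patient_id": "p1", "arm": "treatment"},
    {"patient_id": "p2", "arm": "control"},
]
microarray_service = [
    {"patient": "p1", "gene": "EGFR", "level": 2.4},
    {"patient": "p2", "gene": "EGFR", "level": 0.9},
]

# Partition: one sub-query per service.
treated = query_service(clinical_service, arm="treatment")
expression = query_service(microarray_service, gene="EGFR")

# Join and compile the results; a count is the simplest aggregation.
joined = federated_join(treated, expression, "patient_id", "patient")
print(joined)
print(len(joined))
```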


Page 18

Workflow Service

• Provides capability to describe “orchestrations” of service invocations and data movement

• Supports two workflow execution engines
  • ActiveBPEL (Deprecated in caGrid 1.4)
  • Taverna

• Coupled with semantic discovery, service metadata, and registration of data type structures in caGrid, provides a powerful framework for analyzing data
  • Services can be dynamically discovered and federated queries can be invoked as part of a workflow
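
An orchestration of service invocations can be sketched as an ordered list of steps, each receiving the previous step's output. This is a minimal illustration in Python and is unrelated to the Taverna or ActiveBPEL formats: `run_workflow`, the step functions, and the expression values are hypothetical.

```python
def run_workflow(steps, initial):
    """Run steps in order, passing each step's output to the next."""
    data = initial
    trace = []
    for name, step in steps:
        data = step(data)
        trace.append(name)
    return data, trace


def query_expression(patient_ids):
    # Stand-in for a data service query; the values are made up.
    levels = {"p1": 2.4, "p2": 0.9, "p3": 3.1}
    return {pid: levels[pid] for pid in patient_ids}


def select_high(levels, threshold=2.0):
    # Stand-in for an analytical service invocation.
    return sorted(pid for pid, level in levels.items() if level > threshold)


steps = [("query", query_expression), ("analyze", select_high)]
result, trace = run_workflow(steps, ["p1", "p2", "p3"])
print(result)
print(trace)
```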

Page 19

Putting It Together for Example Scenario

Location A: Microarray, Protein, Image data

Location B: Microarray, Protein, Image data

Location C: Microarray, Protein, Image data

Location C: Image Analysis

Location D: Image Analysis

Microarray and protein databases at other institutions

[Diagram labels: caGrid Service Interfaces; caGrid Environment; Registered Object Definitions; Advertisement; Log on, Grid credentials; Query and Analysis Workflow; Discovery]

