  • caGrid Overview and Core Services

    caGrid Knowledge Center

    February 2011

  • caGrid
    • A Grid software middleware infrastructure consisting of services, toolkits, APIs, and a runtime environment
      • Standards based, open source
      • Building blocks to create interoperable, Grid-enabled systems
      • Service Oriented Architecture
        • Web Services Resource Framework (WSRF) standards
      • Model Driven Architecture
        • Object-oriented view, published information models, strongly typed services
      • Rich metadata
    • A production Grid deployment of the core services provided by that infrastructure
      • Security, Data Services Infrastructure, Service Development & Deployment, Metadata, Federated Query, Workflow, Advertisement & Discovery
    • Provides the software foundation that underlies the tools and applications of caBIG

  • Application Scenario
    • A clinician/researcher is involved in a multi-institutional clinical trial of a new targeted therapeutic
      • Microarray, proteomic, and image data are collected from patients participating in the trial
    • The researcher wants to carry out a correlative analysis to assess the treatment
      • Query and analyze microarray, image, and protein data from multiple patients to find interesting patterns
      • Look for similar patterns in other microarray, protein, and image databases
    • Patients may have been seen at multiple institutions
    • Datasets may have been collected at different institutions

  • Application Scenario
    • Locations A, B, and C: microarray, protein, and image data
    • Location C: image analysis
    • Location D: image analysis
    • Microarray and protein databases at other institutions
    • Different database systems, different data representations, security
    • Different invocations of programs, remote access, how to transfer data

  • caGrid Production Environment

  • Infrastructure Core Capabilities
    • Model-driven and metadata
      • Enabling and supporting interoperable services
      • Providing service-oriented metadata
    • Service development and deployment
      • Tooling for bringing applications and data to the Grid
    • Advertisement and discovery
      • Publishing services to the Grid
      • Enabling search for services based on service metadata
    • Security
      • Integrating existing systems and applications with Grid security
      • Lowering the burden of implementing Grid-wide and local policy
    • Facilitating Grid-wide operations
      • Federated query, workflow execution
    • Making services and core infrastructure more accessible
      • Graphical installation and configuration, higher-level object-oriented APIs, web portals, graphical administrative applications

  • Model Driven, Interoperable Services

    Client and service APIs are object oriented and operate over well-defined, curated data types

    Objects are defined in UML as components, which are in turn registered in the Cancer Data Standards Repository (caDSR)

    Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described

    The XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)
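    As an illustration of what "adheres to a registered schema" means in practice, the sketch below validates XML instances of a data type against its XSD using the JDK's standard javax.xml.validation API. The `Gene` type, its fields, and the `urn:example:gene` namespace are hypothetical, and the sketch does not call any caGrid or GME API. Its only point is that a strongly typed service can reject documents that do not conform to the published schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Hypothetical schema for a "Gene" data type; the namespace and fields are illustrative only.
    static final String GENE_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="urn:example:gene"
                   xmlns="urn:example:gene"
                   elementFormDefault="qualified">
          <xs:element name="Gene">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="symbol" type="xs:string"/>
                <xs:element name="chromosome" type="xs:string"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    /** Returns true if the XML document conforms to the given XSD, false otherwise. */
    static boolean conforms(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<Gene xmlns=\"urn:example:gene\"><symbol>TP53</symbol><chromosome>17</chromosome></Gene>";
        String bad = "<Gene xmlns=\"urn:example:gene\"><symbol>TP53</symbol></Gene>";
        System.out.println(conforms(GENE_XSD, good)); // true
        System.out.println(conforms(GENE_XSD, bad));  // false: required <chromosome> is missing
    }
}
```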


  • Global Model Exchange and Metadata Model Services
    • Global Model Exchange (GME)
      • Provides support to store and retrieve schemas for types used in Grid services
      • Developers should register the schemas defining the types used in their Grid services with the GME
    • Metadata Model Service (MMS)
      • Provides support for developers to generate and add service metadata
      • Developers can augment standard caGrid service metadata with information from metadata registries, such as the caDSR
      • An external registry provides the means to add, modify, delete, or otherwise manage the UML models and their correspondence to XML schemas, which the MMS leverages
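    The store-and-retrieve role described for the GME can be pictured as a registry of schemas keyed by target namespace. The in-memory class below is a hypothetical stand-in, not the GME's actual interface. It assumes that a published namespace is never silently overwritten, which is a design choice made for this sketch and not taken from the slides.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SchemaRegistry {
    // Hypothetical GME-like store: XML Schema text keyed by its target namespace.
    private final Map<String, String> schemasByNamespace = new ConcurrentHashMap<>();

    /**
     * Publishes a schema under its namespace. Returns false if the namespace is already
     * registered; this sketch assumes published namespaces are never overwritten.
     */
    public boolean register(String namespace, String xsd) {
        return schemasByNamespace.putIfAbsent(namespace, xsd) == null;
    }

    /** Retrieves the schema a client needs to (de)serialize a service's data types. */
    public Optional<String> lookup(String namespace) {
        return Optional.ofNullable(schemasByNamespace.get(namespace));
    }

    /** Lists the namespaces currently published. */
    public Set<String> namespaces() {
        return Set.copyOf(schemasByNamespace.keySet());
    }

    public static void main(String[] args) {
        SchemaRegistry gme = new SchemaRegistry();
        String ns = "urn:example:gene"; // hypothetical namespace
        System.out.println(gme.register(ns, "<xs:schema/>"));                // true
        System.out.println(gme.register(ns, "<xs:schema other/>"));         // false: already published
        System.out.println(gme.lookup(ns).isPresent());                     // true
        System.out.println(gme.lookup("urn:example:unknown").isPresent());  // false
    }
}
```

Treating registration as write-once keeps clients that already generated code from a namespace from seeing its types change underneath them. Versioned namespaces would be the natural way to publish a revised schema under this assumption.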

  • Service Development and Deployment: Introduce
    • A framework that enables fast and easy creation of Grid services
      • Provides an easy-to-use graphical service authoring tool
      • Hides all "grid-ness" from the developer
      • Handles all core service architecture requirements for strongly typed and highly interoperable Grid services
    • Integration with other core Grid services and architecture components
      • GAARDS Security Infrastructure
      • Globus Index Service
      • Global Model Exchange
      • Metadata Model Service
      • Cancer Data Standards Repository
    • Extension framework for integrating with other architecture components

  • Introduce Features
    • Supports modification of operations
      • Adding operations
      • Removing operations
      • Updating operations
      • Importing operations
    • Graphical configuration
      • Advertisement
      • Security
      • Service metadata specification
      • Service metadata editing
      • Service configuration properties
    • Auto-generates code for the service
    • Auto-generates a client API for the service
    • Graphical deployment of the service
      • Globus
      • Tomcat
      • JBoss

  • Advertisement and Discovery: Index Service
    • All services register their service metadata with the Index Service
    • Clients can discover services using a discovery API that facilitates inspection of data types
    • Leveraging semantic information in EVS (from which service metadata is drawn), services can be discovered by the semantics of their data types
    • Examples:
      • Find me all the services from Cancer Center X
      • Which analytical services take Genes as input?
      • Find me all the services with some metadata mentioning the string "macromolecules"


    [Diagram: Queries; Service Metadata; Aggregated In; Core Services]
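    The three example queries can be pictured as filters over the metadata that services register with the index. The sketch below is hypothetical throughout: the record fields, the plain concept names standing in for EVS concepts, and the method names are all assumptions for illustration. It is not the caGrid discovery API, and real caGrid service metadata is far richer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class IndexDiscovery {
    // Hypothetical, simplified service metadata; concept names stand in for EVS semantics.
    record Operation(String name, List<String> inputConcepts, String outputConcept) {}
    record ServiceMetadata(String serviceName, String hostingCenter, String description,
                           boolean analytical, List<Operation> operations) {}

    private final List<ServiceMetadata> registered = new ArrayList<>();

    /** Services push their metadata to the index when they advertise. */
    void register(ServiceMetadata md) {
        registered.add(md);
    }

    /** "Find me all the services from Cancer Center X" */
    List<ServiceMetadata> byHostingCenter(String center) {
        return registered.stream().filter(m -> m.hostingCenter().equals(center)).toList();
    }

    /** "Which analytical services take Genes as input?" (matches the semantic concept, not a Java type) */
    List<ServiceMetadata> analyticalByInputConcept(String concept) {
        return registered.stream()
                .filter(ServiceMetadata::analytical)
                .filter(m -> m.operations().stream().anyMatch(op -> op.inputConcepts().contains(concept)))
                .toList();
    }

    /** "Find me all the services with some metadata mentioning the string ..." (case-insensitive) */
    List<ServiceMetadata> byKeyword(String keyword) {
        String k = keyword.toLowerCase(Locale.ROOT);
        return registered.stream()
                .filter(m -> m.description().toLowerCase(Locale.ROOT).contains(k)
                        || m.operations().stream().anyMatch(op -> op.name().toLowerCase(Locale.ROOT).contains(k)))
                .toList();
    }

    public static void main(String[] args) {
        IndexDiscovery index = new IndexDiscovery();
        index.register(new ServiceMetadata("GeneExpressionAnalysis", "Cancer Center X",
                "Clustering of microarray data", true,
                List.of(new Operation("cluster", List.of("Gene"), "Cluster"))));
        index.register(new ServiceMetadata("ProteinStructureData", "Cancer Center Y",
                "Structures of biological macromolecules", false,
                List.of(new Operation("query", List.of("Protein"), "Structure"))));
        System.out.println(index.byHostingCenter("Cancer Center X").size());                 // 1
        System.out.println(index.analyticalByInputConcept("Gene").get(0).serviceName());     // GeneExpressionAnalysis
        System.out.println(index.byKeyword("macromolecules").get(0).serviceName());          // ProteinStructureData
    }
}
```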

  • Service Metadata: Data Service

    Data Service Metadata describes the domain model being exposed, in terms of a UML model linked to semantics

    Data types defined in terms of structure and semantics extracted from caDSR and EVS

    Auto-generated by caGrid service authoring toolkit (Introduce)

  • Security Services
    • Authentication
      • How to identify a client (or a service)
      • Secure login
      • Integrate the Grid with existing institutional login systems!
    • Enforce data sharing policies and access control
      • Local policies
      • Federated access
    • Trust fabric
      • How to trust a client, and at what level
      • Dynamically adapt trust in the event of a security breach

  • caGrid Security Infrastructure (GAARDS)
    • Dorian
      • Allows accounts managed in external domains to be federated and managed in the Grid
      • Allows users to use their existing credentials (external to the Grid) to authenticate to the Grid
    • Grid Grouper/CSM
      • Provides a group-based authorization solution for the Grid
    • Grid Trust Service (GTS)
      • Supports applications and services in deciding whether or not signers of digital credentials can be trusted
      • Supports the provisioning of trusted certificate authorities and corresponding certificate revocation lists

    Provides services and tools for the administration and enforcement of security policy in an enterprise Grid.
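    The Grid Trust Service's decision of whether a credential's signer can be trusted combines two inputs: the set of provisioned certificate authorities and each authority's revocation list. The sketch below models only that decision. The `Credential` record reduces a certificate to issuer and serial number and does no real X.509 parsing or signature checking, and every name in it is hypothetical rather than part of the GTS API.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TrustFabric {
    // Hypothetical, simplified credential: only the fields the trust decision uses (no X.509 parsing).
    record Credential(String subject, String issuer, BigInteger serial) {}

    private final Set<String> trustedAuthorities = new HashSet<>();
    private final Map<String, Set<BigInteger>> revokedSerials = new HashMap<>();

    /** Provision a certificate authority into the trust fabric. */
    void addTrustedAuthority(String issuer) {
        trustedAuthorities.add(issuer);
    }

    /** Withdraw trust, e.g. after a security breach at that authority. */
    void removeTrustedAuthority(String issuer) {
        trustedAuthorities.remove(issuer);
    }

    /** Record a revocation published by an authority (the role a CRL entry plays). */
    void publishRevocation(String issuer, BigInteger serial) {
        revokedSerials.computeIfAbsent(issuer, k -> new HashSet<>()).add(serial);
    }

    /** Trusted if the signer is a trusted authority and the credential has not been revoked. */
    boolean isTrusted(Credential c) {
        return trustedAuthorities.contains(c.issuer())
                && !revokedSerials.getOrDefault(c.issuer(), Set.of()).contains(c.serial());
    }

    public static void main(String[] args) {
        TrustFabric gts = new TrustFabric();
        String caY = "CN=Institution Y CA"; // hypothetical CA name
        gts.addTrustedAuthority(caY);
        Credential alice = new Credential("CN=Alice", caY, BigInteger.valueOf(1001));
        Credential mallory = new Credential("CN=Mallory", "CN=Unknown CA", BigInteger.ONE);
        System.out.println(gts.isTrusted(alice));   // true
        System.out.println(gts.isTrusted(mallory)); // false: issuer not in the trust fabric
        gts.publishRevocation(caY, BigInteger.valueOf(1001));
        System.out.println(gts.isTrusted(alice));   // false: serial is on the CA's revocation list
    }
}
```

Because trust is looked up on every decision rather than cached in the credential, removing an authority takes effect immediately. That mirrors the "dynamically adapt trust" goal on the Security Services slide.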

  • Secure Clinical Research Support with GAARDS
    • Use Dorian for Grid authentication
      • Integrate with my LDAP user database and authentication
    • Use Grid Grouper (along with local mechanisms) for Grid authorization
      • I let reviewers from institution X access patient data in the Watson research trial for review only
      • Data entry personnel for the research trial have permission to add new data, but not to update existing data
      • I bar institution X from accessing any other data I'm sharing on the Grid
    • Use GTS to update the Grid trust fabric
      • I trust institution Y after finalizing data sharing agreements for the Watson research
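    The Watson trial policies above can be expressed as group memberships plus per-group grants, with everything not granted denied. The sketch below is a hypothetical model of that idea and is not the Grid Grouper or CSM API. The group names, dataset names, identities, and the READ/CREATE/UPDATE action set are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GroupPolicy {
    enum Action { READ, CREATE, UPDATE }

    // Hypothetical policy state: grid identity -> groups, and group -> dataset -> permitted actions.
    private final Map<String, Set<String>> groupsOf = new HashMap<>();
    private final Map<String, Map<String, Set<Action>>> grants = new HashMap<>();

    void addMember(String group, String gridIdentity) {
        groupsOf.computeIfAbsent(gridIdentity, k -> new HashSet<>()).add(group);
    }

    void grant(String group, String dataset, Action... actions) {
        grants.computeIfAbsent(group, k -> new HashMap<>())
              .computeIfAbsent(dataset, k -> EnumSet.noneOf(Action.class))
              .addAll(Arrays.asList(actions));
    }

    /** Default deny: allowed only if one of the caller's groups is granted the action on the dataset. */
    boolean isAllowed(String gridIdentity, String dataset, Action action) {
        for (String group : groupsOf.getOrDefault(gridIdentity, Set.of())) {
            if (grants.getOrDefault(group, Map.of()).getOrDefault(dataset, Set.of()).contains(action)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GroupPolicy policy = new GroupPolicy();
        String reviewer = "/O=Institution X/CN=Reviewer One"; // hypothetical grid identities
        String clerk = "/O=Trial Site/CN=Data Entry One";
        policy.addMember("watson-reviewers", reviewer);
        policy.addMember("watson-data-entry", clerk);
        policy.grant("watson-reviewers", "watson-trial", Action.READ);    // review only
        policy.grant("watson-data-entry", "watson-trial", Action.CREATE); // add, not update

        System.out.println(policy.isAllowed(reviewer, "watson-trial", Action.READ));   // true
        System.out.println(policy.isAllowed(reviewer, "watson-trial", Action.UPDATE)); // false
        System.out.println(policy.isAllowed(clerk, "watson-trial", Action.CREATE));    // true
        System.out.println(policy.isAllowed(clerk, "watson-trial", Action.UPDATE));    // false
        System.out.println(policy.isAllowed(reviewer, "other-study", Action.READ));    // false: default deny
    }
}
```

In this sketch, barring institution X from other shared data needs no explicit rule, because nothing grants its groups access elsewhere. An explicit deny list would be needed only if grants could come from several overlapping sources.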

  • caGrid Data Service Infrastructure
    • caGrid Data Services provide the capability to expose data resources to the Grid
      • A specialization of caGrid Grid services that exposes data through a common query interface
      • Introduce extensions for creating data services from information models and with the caCORE SDK
    • Queries are made with caBIG Query Language (CQL) query objects
      • A query specifies a target object (result) type and selects the instances that satisfy the specified properties and nested object properties
      • Ability to return full objects, a set of attributes, a count of results, or distinct attribute values
    • Support for bulk data transport for efficient transfer of large data volumes
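    The structure of a query described above has three parts: a target type, restrictions on attributes, and restrictions on nested associated objects. It also supports four result modes. The sketch below mirrors that structure over in-memory objects. It is modeled loosely on the idea of CQL and is neither the real CQL schema nor the caGrid client API; all record names, the `Gene`/`Taxon` sample data, and the method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class CqlSketch {
    // Hypothetical in-memory domain object: a class name, attribute values, and named associations.
    record DomainObject(String type, Map<String, Object> attributes, Map<String, DomainObject> associations) {}

    // A tiny restriction tree loosely inspired by CQL; not the real CQL schema or API.
    sealed interface Restriction permits AttributeEquals, Association, All {}
    record AttributeEquals(String attribute, Object value) implements Restriction {}
    record Association(String role, String type, Restriction nested) implements Restriction {}
    record All(List<Restriction> parts) implements Restriction {}

    /** The target (result) type plus an optional restriction (null means "all instances"). */
    record Query(String targetType, Restriction restriction) {}

    static boolean matches(DomainObject o, Restriction r) {
        if (r instanceof AttributeEquals a) {
            return Objects.equals(o.attributes().get(a.attribute()), a.value());
        }
        if (r instanceof Association assoc) {
            DomainObject related = o.associations().get(assoc.role());
            return related != null && related.type().equals(assoc.type()) && matches(related, assoc.nested());
        }
        if (r instanceof All all) {
            return all.parts().stream().allMatch(p -> matches(o, p));
        }
        throw new IllegalArgumentException("Unknown restriction: " + r);
    }

    /** Result mode 1: full objects of the target type that satisfy the restriction. */
    static List<DomainObject> objects(List<DomainObject> data, Query q) {
        return data.stream()
                .filter(o -> o.type().equals(q.targetType()))
                .filter(o -> q.restriction() == null || matches(o, q.restriction()))
                .toList();
    }

    /** Result mode 2: one attribute's value per matching object. */
    static List<Object> attributeValues(List<DomainObject> data, Query q, String attribute) {
        return objects(data, q).stream().map(o -> o.attributes().get(attribute)).toList();
    }

    /** Result mode 3: the number of matching objects. */
    static long count(List<DomainObject> data, Query q) {
        return objects(data, q).size();
    }

    /** Result mode 4: distinct attribute values, in first-seen order. */
    static Set<Object> distinctValues(List<DomainObject> data, Query q, String attribute) {
        return objects(data, q).stream()
                .map(o -> o.attributes().get(attribute))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    static List<DomainObject> sampleData() {
        DomainObject human = new DomainObject("Taxon", Map.of("commonName", "human"), Map.of());
        DomainObject mouse = new DomainObject("Taxon", Map.of("commonName", "mouse"), Map.of());
        return List.of(human, mouse,
                new DomainObject("Gene", Map.of("symbol", "TP53"), Map.of("taxon", human)),
                new DomainObject("Gene", Map.of("symbol", "TP53"), Map.of("taxon", mouse)),
                new DomainObject("Gene", Map.of("symbol", "BRCA1"), Map.of("taxon", human)));
    }

    public static void main(String[] args) {
        // "Genes whose associated taxon has commonName = human"
        Query humanGenes = new Query("Gene",
                new Association("taxon", "Taxon", new AttributeEquals("commonName", "human")));
        System.out.println(count(sampleData(), humanGenes));                                 // 2
        System.out.println(attributeValues(sampleData(), humanGenes, "symbol"));             // [TP53, BRCA1]
        System.out.println(distinctValues(sampleData(), new Query("Gene", null), "symbol")); // [TP53, BRCA1]
    }
}
```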

  • Federated Query Processor Service

    Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services

    Can be used to express queries against any combination of caGrid data services, since each service uses CQL

    Federated queries are expressed using DCQL, an extension to CQL that can express joins, aggregations, and target data services

    The client API provides a means of expressing DCQL queries

    The Federated Query Processor service partitions a DCQL query into queries to the respective data services, carries out joins and aggregations, and compiles the results
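    The partition-then-combine behavior described above can be sketched with two operations. An aggregation sends the same sub-query to several data services and concatenates the results. A join uses the key values returned by one service to filter the target results from another. This sketch treats the join as a semi-join; that choice, the `DataService` interface, the record-as-map representation, and the patient data are all assumptions for illustration, not the Federated Query Processor's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class FederatedJoinSketch {
    // Hypothetical data service contract: answer a local restriction over its own records.
    interface DataService {
        List<Map<String, Object>> query(Predicate<Map<String, Object>> restriction);
    }

    record InMemoryService(List<Map<String, Object>> records) implements DataService {
        public List<Map<String, Object>> query(Predicate<Map<String, Object>> restriction) {
            return records.stream().filter(restriction).toList();
        }
    }

    /** Aggregation: send the same sub-query to several services and concatenate the results. */
    static List<Map<String, Object>> aggregate(List<? extends DataService> services,
                                               Predicate<Map<String, Object>> restriction) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (DataService s : services) {
            out.addAll(s.query(restriction));
        }
        return out;
    }

    /** Join (as a semi-join): keep target results whose key appears among the other side's keys. */
    static List<Map<String, Object>> semiJoin(List<Map<String, Object>> targets, String targetKey,
                                              List<Map<String, Object>> others, String otherKey) {
        Set<Object> keys = new HashSet<>();
        for (Map<String, Object> row : others) {
            keys.add(row.get(otherKey));
        }
        return targets.stream().filter(t -> keys.contains(t.get(targetKey))).toList();
    }

    public static void main(String[] args) {
        // Hypothetical microarray services at Locations A and B, and an image service at Location C.
        DataService microarrayA = new InMemoryService(List.of(
                Map.of("patientId", "P1", "experiment", "A-01"),
                Map.of("patientId", "P2", "experiment", "A-02")));
        DataService microarrayB = new InMemoryService(List.of(
                Map.of("patientId", "P3", "experiment", "B-01")));
        DataService imagesC = new InMemoryService(List.of(
                Map.of("patientId", "P2", "modality", "MRI"),
                Map.of("patientId", "P3", "modality", "CT")));

        // Microarray experiments (aggregated from A and B) for patients with an MRI image at C.
        List<Map<String, Object>> experiments = aggregate(List.of(microarrayA, microarrayB), row -> true);
        List<Map<String, Object>> mri = imagesC.query(row -> "MRI".equals(row.get("modality")));
        System.out.println(semiJoin(experiments, "patientId", mri, "patientId"));
    }
}
```

The example keeps only experiment A-02, since P2 is the only patient with an MRI image. Building the key set first means each side is fetched once, and the target results are then filtered locally.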

  • Workflow Service

    Provides the capability to describe orchestrations of service invocations and data movement

    Supports two workflow execution engines: ActiveBPEL (deprecated in caGrid 1.4) and Taverna

    Coupled with semantic discovery, service metadata, and the registration of data type structures in caGrid, the Workflow Service provides a powerful framework for analyzing data

    Services can be dynamically discovered, and federated queries can be invoked, as part of a workflow
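    At its simplest, an orchestration of service invocations is an ordered list of steps. Each service is resolved when the workflow runs, and each step's output becomes the next step's input. The sketch below shows only that control flow. It does not use ActiveBPEL or Taverna, and the registry, the step names, and the list-of-measurements data type are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class WorkflowSketch {
    // Hypothetical service registry: step name -> analysis service over a list of measurements.
    private final Map<String, UnaryOperator<List<Double>>> registry = new HashMap<>();

    void register(String name, UnaryOperator<List<Double>> service) {
        registry.put(name, service);
    }

    /**
     * Runs the steps in order, resolving each service by name at execution time
     * and passing each step's output to the next step as its input.
     */
    List<Double> run(List<String> steps, List<Double> input) {
        List<Double> data = input;
        for (String step : steps) {
            UnaryOperator<List<Double>> service = registry.get(step);
            if (service == null) {
                throw new IllegalStateException("No service registered for step: " + step);
            }
            data = service.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        WorkflowSketch wf = new WorkflowSketch();
        // Hypothetical steps: drop low-signal values, then log2-transform the rest.
        wf.register("filterLowSignal", xs -> xs.stream().filter(x -> x >= 2.0).toList());
        wf.register("log2", xs -> xs.stream().map(x -> Math.log(x) / Math.log(2)).toList());
        System.out.println(wf.run(List.of("filterLowSignal", "log2"), List.of(1.0, 4.0, 16.0)));
    }
}
```

Resolving steps by name at run time, rather than binding them when the workflow is defined, is what would let a discovery query (for example, against the Index Service) pick the concrete service. A real engine would also handle typed inputs and outputs, parallel branches, and remote data transfer.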

  • Putting It Together for the Example Scenario
    • Locations A, B, and C: microarray, protein, and image data
    • Location C: image analysis
    • Location D: image analysis
    • Microarray and protein databases at other institutions
    • caGrid service interfaces, caGrid environment, registered object definitions
    • Advertisement, discovery, log-on with Grid credentials, query and analysis workflow
    • Dorian, GTS, Grid Grouper/CSM, Authentication Service, CDS
    • Authentication: can enforce role-based and context-based data access policies