Silver to Grid Data Services Session III: Deploying a Data Service on caGrid and using caGrid Service APIs
caBIG™ Annual Meeting
June 23-25, 2008
Overview of Sessions I - III
• Divided into three sessions of 1 hour 15 minutes each, offered on Tuesday and Wednesday
• Session I (Tuesday 10:15 - 11:30 a.m.; Wednesday 10:00 - 11:15 a.m.): Overview of the Silver to Grid training program; presentation and live demo of caGrid Semantic Interoperability
• Session II, Lessons 1-5 (Tuesday 12:45 - 2:00 p.m.; Wednesday 12:30 - 1:45 p.m.): Developing a Silver-Level Compatible Data Service API
• Session III, Lessons 6-9 (Tuesday 2:15 - 3:30 p.m.; Wednesday 2:00 - 3:15 p.m.): Deploying a Data Service on caGrid and using caGrid Service APIs
Acknowledgements
• Peter McGarvey
• Baris Suzek
• Mike Keller
• Dianne Reeves
• George Komatsoulis
• Avinash Shanbhag
• Becky Angeles
• Jennifer Brush
• Jamie Parker
• Claire Wolfe
• Ken Smith
• Sal Mungal
• Virginia Hetrick
• Shannon Hastings
• Architecture/VCDE workspace participants
Session III: Deploying a Data Service on caGrid and using caGrid Service APIs
Session III: Lessons
• Lesson 6: Installing caGrid node for Data Service Deployment
• Lesson 7: Deploying a caGrid Data Service
• Lesson 8: Using caGrid Data Services
• Lesson 9: Using caGrid Metadata Service APIs
Lesson 6: Installing caGrid for Data Service Deployment
Installing caGrid for Data Service Deployment
Outline
• Overview
  • caGrid
  • caGrid Infrastructure
• Step-by-step caGrid Installation for Data Service Deployment
What is caGrid?
• Development project of Architecture Workspace
• Service-oriented infrastructure that supports caBIG™
• An architecture that allows building a grid of your own
• Enables collaborating institutions to share information and analytical resources efficiently and securely
caGrid Community Involvement
• caGrid itself provides no real “data” or “analysis” to caBIG™; it is the enabling infrastructure that allows the community to develop:
  • Analytical Services
  • Data Services
• Community members add value to the grid as applications, services (data/analytical), and processes
• caGrid provides the necessary core services, APIs, and tooling
• Community members develop end user applications/clients which consume the resources provided on the grid
caGrid Infrastructure
• Client and service APIs are object oriented, and operate over well-defined and curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions are drawn from controlled terminology whose vocabulary is registered in the Enterprise Vocabulary Services (EVS), so the objects and their relationships are semantically described
• Objects are serialized to XML that adheres to XML schemas registered in the Global Model Exchange (GME)
[Diagram: caGrid infrastructure. Labeled elements: Grid Client (Client API), Grid Service (Service API), Service Definition (WSDL), Data Type Definitions (XSD), and Core Services (Cancer Data Standards Repository, Enterprise Vocabulary Services, Global Model Exchange). Labeled relationships: Client Uses; Objects Serialize To XML; XML Validates Against XSD; XSD Registered In GME; Object Definitions Registered In caDSR; Object Definitions Semantically Described In EVS.]
caGrid Infrastructure – cont’d
• Service and hosting center metadata are registered in the Index Service
caGrid Installation: Before starting
• Download the caGrid 1.2 Installer: http://gforge.nci.nih.gov/frs/download.php/3738/caGrid-installer-1.2.zip
• Set environment variables:
  • JAVA_HOME: Location of Java JDK 1.5.X
  • ANT_HOME: Location of Ant 1.6.5
  • CATALINA_HOME: Location of Tomcat ver. 5.0.28
  • GLOBUS_LOCATION: Location of Globus Toolkit ver. 4.0.3
  If Ant, Globus Toolkit and/or Tomcat are not available, the caGrid Installer installs them
• Unzip caGrid-installer-1.2.zip
• Run the caGrid installer:
  • java -jar caGrid-installer-1.2.jar
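Before launching the installer, it can help to see which of the environment variables listed above are already set. The following is a minimal sketch using only the Java standard library; the class name CaGridEnvCheck and its helper are illustrative and are not part of caGrid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CaGridEnvCheck {
    // Variable names taken from the installation slide.
    static final String[] REQUIRED = {
        "JAVA_HOME", "ANT_HOME", "CATALINA_HOME", "GLOBUS_LOCATION"
    };

    /** Returns the names of the listed variables that are absent or blank in env. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All four variables are set.");
        } else {
            System.out.println("Not set: " + unset);
        }
    }
}
```

A variable reported as not set is not necessarily a problem: as noted above, the installer can install Ant, Globus Toolkit and/or Tomcat itself.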
caGrid Installation: Installation Types
• Choose any combination of installation types to install one or more caGrid components
• For data service deployment, choose options “Install caGrid” and “Configure Container”
caGrid Installation: Service Container
• Choose Tomcat or Globus as the service container
caGrid Installation: Prerequisites
• Install (or reinstall) prerequisite software
Ant
Tomcat
Globus Toolkit
caGrid Installation: Location
• Provide the directory where caGrid will be installed
caGrid Installation: Target Grid
Each target grid uses different URLs for the caGrid core services. For instance, the service URLs for the OSU Training Grid are:
cagrid.master.index.service.url=http://training03.cagrid.org:6080/wsrf/services/DefaultIndexService
cagrid.master.cadsr.service.url=http://training02.cagrid.org:6080/wsrf/services/cagrid/CaDSRService
cagrid.master.gme.service.url=http://training02.cagrid.org:6080/wsrf/services/cagrid/GlobalModelExchange
cagrid.master.gridgrouper.service.url=https://training03.cagrid.org:6443/wsrf/services/cagrid/GridGrouper
cagrid.master.dorian.service.url=https://dorian.cagrid.org:6443/wsrf/services/cagrid/Dorian
Choose one of the available grids:
• NCICB Development
• NCICB Production
• NCICB QA
• OSU Development
• OSU Training
• and more
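The target grid settings quoted above use the standard Java key=value properties format, so they can be read with java.util.Properties. The sketch below parses two of the OSU Training Grid entries from an in-memory string. The class name TargetGridProperties is illustrative, and the slides do not say which file the installer stores these entries in, so no file name is assumed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class TargetGridProperties {
    // Two of the OSU Training Grid entries quoted on the slide.
    static final String SAMPLE =
        "cagrid.master.index.service.url="
            + "http://training03.cagrid.org:6080/wsrf/services/DefaultIndexService\n"
        + "cagrid.master.cadsr.service.url="
            + "http://training02.cagrid.org:6080/wsrf/services/cagrid/CaDSRService\n";

    /** Parses key=value lines into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        System.out.println("Index Service: "
            + props.getProperty("cagrid.master.index.service.url"));
        System.out.println("caDSR Service: "
            + props.getProperty("cagrid.master.cadsr.service.url"));
    }
}
```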
caGrid Installation: Container Configuration
• Securing the container is required in order to host secure services.
• Secure services are those that require clients to use one of the Globus Security Infrastructure (GSI) authentication mechanisms.
caGrid Installation: Completion
Additional Information
• caGrid Wiki: http://www.cagrid.org/mwiki/index.php?title=CaGrid
• caBIG™ Architecture WS caGrid Web Page: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/
Lesson 7: Deploying a caGrid Data Service
Deploying a caGrid Data Service
Outline
• Overview
  • Major steps for deployment
  • Introduce Toolkit
• Step-by-step deployment of a Data Service: gridPIR
caGrid Data Service Deployment – Major steps
• Provide client and service APIs that are object oriented
• Provide objects that are defined in UML and registered in the Cancer Data Standards Repository (caDSR)
• Provide object definitions drawn from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS)
• Provide XML schemas for the XML serialization of objects (may be registered in the Global Model Exchange)
• Provide service metadata about the center where the service is deployed
caGrid Data Service Deployment – Major steps
• Register service metadata about the service and the center where service is deployed
Service Metadata (Domain Model Portion)
<ns135:UMLAttribute dataTypeName="CHARACTER" description="UniProtKB primary accession number." name="uniprotkbPrimaryAccession" publicID="2322254" version="1.0">
<ns135:SemanticMetadata conceptCode="C25402" conceptDefinition="A control number unique ….." conceptName="Accession Number" order="1"/>
…..
<ns135:ValueDomain longName="Protein UniProtKB Primary Accession Number Genomic Identifier">
<ns135:enumerationCollection/>
</ns135:ValueDomain>
</ns135:UMLAttribute>
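To illustrate how the domain model portion ties a UML attribute to its caDSR public ID and EVS concept code, the sketch below reads those values from an abbreviated copy of the fragment above with the standard Java DOM parser. Two things are assumptions for the sake of a parsable sample: the namespace URI bound to ns135 is a placeholder (the slide does not show the declaration), and the elided conceptDefinition attribute is left out rather than guessed. The class name DomainModelFragment is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomainModelFragment {
    // Abbreviated fragment; "urn:example:placeholder" is NOT the real namespace.
    static final String SAMPLE =
        "<ns135:UMLAttribute xmlns:ns135=\"urn:example:placeholder\""
        + " dataTypeName=\"CHARACTER\" name=\"uniprotkbPrimaryAccession\""
        + " publicID=\"2322254\" version=\"1.0\">"
        + "<ns135:SemanticMetadata conceptCode=\"C25402\""
        + " conceptName=\"Accession Number\" order=\"1\"/>"
        + "</ns135:UMLAttribute>";

    /** Summarizes the attribute name, caDSR public ID/version, and first EVS concept. */
    static String describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element attribute = doc.getDocumentElement();
            // "*" matches any namespace, so the placeholder URI does not matter here.
            Element semantic = (Element) attribute
                .getElementsByTagNameNS("*", "SemanticMetadata").item(0);
            String concept = (semantic == null)
                ? "no concept"
                : semantic.getAttribute("conceptCode") + " "
                    + semantic.getAttribute("conceptName");
            return attribute.getAttribute("name")
                + " (public ID " + attribute.getAttribute("publicID")
                + ", version " + attribute.getAttribute("version") + ") -> " + concept;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE));
    }
}
```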
Introduce: Grid Service Authoring Toolkit
• An open-source and extensible toolkit
• Supports easy development and deployment of WS/WSRF compliant Grid services by hiding low level details of the Globus Toolkit
• Enables the implementation of strongly-typed Grid services
• Facilitates caGrid data service development using caCORE SDK artifacts through pluggable service styles
Deploying a caGrid data service using Introduce: Grid-enablement of Protein Information Resource (gridPIR)
• A data service to provide comprehensive and fully annotated protein related information for genomic and proteomic cancer research
• Developed using model driven approach and caCORE SDK 3.2.1
• All data is public, so no security layer is implemented
Introduce: Create a caGrid Service
ant introduce
Modify an existing service
Deploy an existing service
Browse Data Types from caDSR or GME
Introduce: Enter service information
• An analytical service exposes operation(s) with input/output objects
• A data service exposes objects that represent the data resource
Introduce: Data Service Configuration
Different Service Styles (including caCORE SDK) supported.
gridPIR is generated using caCORE SDK v3.2.1
Optional extensions for Bulk Data Transfer or Web Services Enumeration
Introduce: caCORE SDK-generated Client Selection
Two options for client selection:
Option 1: Use the remote API if the caCORE-like system (API) and the caGrid Data Service are deployed on different machines
Option 2: Use the local API if both the caCORE-like system (API) and the caGrid Data Service are deployed on the same machine
Introduce: Remote API Selection
Library folder (including client jar) generated by caCORE SDK
Introduce: Remote API Selection
Treat all queries case-insensitive
Use Common Security Module
Enter URL for remote caCORE-like gridPIR API (publicly accessible)
Introduce: Choosing objects (model) service exposes
1. Fetch models from caDSR
2. Select gridPIR model v1.2
3. Select package from gridPIR model
4. Add selected packages
Introduce: Choosing XML Schema
Find schemas from GME (if registered)
OR
Resolve schemas manually
Introduce: Choosing XML Schema – Manual Resolution (cont’d)
XSD generated by caCORE SDK
Introduce: Entering Service Description
1. Select Metadata Tab
2. Select ServiceMetadata row
3. Edit Property
Introduce: Entering Service Metadata (cont’d)
Enter:
• POC
• Hosting Center
• Address
Introduce: Deploy gridPIR Data Service
Deploy an existing service
Introduce: Selecting Data Service Location in the file system
Compiled service stubs
Metadata files
Library files
XML schemas
Source code for service stubs
Introduce: Selecting Data Service Location in the file system
Container information
Register to Index Service?
URL for Index Service
Verifying Deployment
URL for deployed service
Outcome
Additional Information
• caGrid Wiki: http://www.cagrid.org/mwiki/index.php?title=CaGrid
• Introduce Toolkit Wiki: http://www.cagrid.org/mwiki/index.php?title=Introduce
• caGrid Data Services Wiki: http://www.cagrid.org/mwiki/index.php?title=Data_Services
• caBIG™ Architecture WS caGrid Web Page: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/
Lesson 8: Using caGrid Data Services
Outline
• Overview
  • CQL
  • Using caGrid Portal
  • Using Generic Data Service Client
• CQL Examples using gridPIR Data Service
Executing a Data Service Query
Query
Results
caBIG Query Language (CQL)
CQLQuery: A simple wrapper element at the head of every CQL query document; it contains the Target.
Target: The Target element is of the type Object and describes the data type that the query will return.
QueryModifier: An optional element that modifies the returned result set. It has a required attribute ‘countOnly’ and optionally allows a choice between a list of Attribute Names and a single Distinct Attribute to return.
Object: Contains the required attribute ‘name’, whose value defines the caDSR class of the object. When the Object is the top-level target of a CQL query, it identifies the data type that will be returned by the caGrid Data Service. The Object allows a choice among three child elements: Attribute, Association, and Group. An Object may have at most one of these child elements.
Group: Has an attribute ‘logicRelation’, an enumeration of the values “AND” and “OR”.
http://www.cagrid.org/mwiki/index.php?title=Data_Services:CQL
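The rule that an Object may have at most one Attribute, Association, or Group child can be checked mechanically. The sketch below is a hedged illustration using only the standard Java DOM parser, not the caGrid validator: it treats Target and Association elements as Objects, which is an assumption based on the description above, and the class name CqlShapeCheck is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CqlShapeCheck {
    static final String NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery";

    // A query with one Attribute under the Target: allowed.
    static final String VALID =
        "<ns1:CQLQuery xmlns:ns1=\"" + NS + "\">"
        + "<ns1:Target name=\"edu.georgetown.pir.domain.Gene\">"
        + "<ns1:Attribute name=\"name\" predicate=\"EQUAL_TO\" value=\"brca1\"/>"
        + "</ns1:Target></ns1:CQLQuery>";

    /** True if every Target/Association has at most one Attribute, Association, or Group child. */
    static boolean atMostOneChildPerObject(String cqlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getLocalName() is non-null
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(cqlXml)));
            return check(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse CQL document", e);
        }
    }

    private static boolean check(Element element) {
        String name = element.getLocalName();
        boolean isObject = "Target".equals(name) || "Association".equals(name);
        int restricted = 0;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String childName = child.getLocalName();
            if ("Attribute".equals(childName) || "Association".equals(childName)
                    || "Group".equals(childName)) {
                restricted++;
            }
            if (!check((Element) child)) {
                return false; // a nested Object already breaks the rule
            }
        }
        return !(isObject && restricted > 1);
    }

    public static void main(String[] args) {
        System.out.println("valid query passes: " + atMostOneChildPerObject(VALID));
    }
}
```

A Group is deliberately not restricted here, since combining several criteria under one Target is what a Group is for.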
caGrid Portal
Discovery Data Service Query
The portal allows:
• Discovery
• Exploration of:
  • Domain Models
  • Semantic Metadata
  • Hosting Center Info
• Data Queries
Example: Query on Gene Objects
caGrid Portal
1) Select “Edit Query Modifiers”
2) Select “Object” then “Update”
3) Update then “Add Criterion”
4) Select attribute “name”
5) Set “name” EQUAL_TO “BRCA1”
6) Update
caGrid Portal
7) Submit Query
8) When query is finished Select View Results
9) Query returns 11 Gene Objects and attribute values for each
caGrid Portal
See Query and Results as XML
Generic Data Service Client
• Can be used for caGrid data services that are based on caCORE-like systems, since such services usually expose only the query method in their public API
• For services that have methods other than query, the specific clients generated by the Introduce Toolkit need to be used
• A query typically involves three steps:
  • Initialization
  • Creating and submitting a CQL query
  • Processing the results
Generic Data Service Client: Initializing the client
• For gridPIR Data Service:
String serviceURL= "http://141.161.25.20:8080/wsrf/services/cagrid/GridPIR";
DataServiceClient client=new DataServiceClient(serviceURL);
Generic Data Service Client: Creating a CQL query
• Option 1: Create CQLQuery object programmatically:
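//CQL query to retrieve all BRCA1 genes
CQLQuery query = new CQLQuery();
// The CQL Object class is fully qualified (assumed package: gov.nih.nci.cagrid.cqlquery)
// because a bare "Object" would resolve to java.lang.Object
gov.nih.nci.cagrid.cqlquery.Object target = new gov.nih.nci.cagrid.cqlquery.Object();
target.setName(Gene.class.getName());
Attribute nameAttribute = new Attribute("name", Predicate.EQUAL_TO, "BRCA1");
target.setAttribute(nameAttribute);
query.setTarget(target);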
[Logical model: class domain::Gene with attributes id: Long, entrezGeneId: Long, name: String]
Generic Data Service Client: Creating a CQL query
• Option 2: Load CQLQuery object from an XML string or file:
// from a string
String cqlXml =
    "<ns1:CQLQuery xmlns:ns1=\"http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery\">"
    + "<ns1:Target name=\"edu.georgetown.pir.domain.Gene\">"
    + "<ns1:Attribute name=\"name\" predicate=\"EQUAL_TO\" value=\"brca1\"/>"
    + "</ns1:Target>"
    + "</ns1:CQLQuery>";
CQLQuery query2 = (CQLQuery) Utils.deserializeObject(new StringReader(cqlXml), CQLQuery.class);
// from a file
CQLQuery query3 = (CQLQuery) Utils.deserializeObject(new FileReader(cqlFile), CQLQuery.class);
Generic Data Service Client: Submitting the CQL Query
• Results are returned as CQLQueryResults object:
// declared outside the try block so the results remain usable afterwards
CQLQueryResults results = null;
try {
    results = client.query(query);
} catch (QueryProcessingExceptionType ex) {
    // handle processing exception
} catch (MalformedQueryExceptionType ex) {
    // handle malformed query
} catch (RemoteException ex) {
    // handle remote exception
}
Generic Data Service Client: Processing the results
• Option 1: Results can be iterated as single items using a specialized implementation of the standard Java Iterator interface CQLQueryResultsIterator:
Iterator iter = new CQLQueryResultsIterator(results,
    GridPIRClient.class.getResourceAsStream("client-config.wsdd"));
while (iter.hasNext()) {
    Gene gene = (Gene) iter.next();
    System.out.println(gene.getName());
}
Generic Data Service Client: Processing the results
• Option 2: Results can be serialized to a string (or to a file for future processing):
StringWriter w = new StringWriter();
Utils.serializeObject(
    results,
    new QName("http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet", "CQLResultSet"),
    w);
System.out.println(w.getBuffer());
CQL Example: Query involving three classes
• Find Protein objects for Human Breast cancer 1 (BRCA1):
<ns1:CQLQuery xmlns:ns1="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <ns1:Target name="edu.georgetown.pir.domain.Protein">
    <ns1:Group logicRelation="AND">
      <ns1:Association name="edu.georgetown.pir.domain.Gene" roleName="geneCollection">
        <ns1:Attribute name="name" predicate="EQUAL_TO" value="brca1"/>
      </ns1:Association>
      <ns1:Association name="edu.georgetown.pir.domain.Organism" roleName="organismCollection">
        <ns1:Attribute name="scientificName" predicate="EQUAL_TO" value="homo sapiens"/>
      </ns1:Association>
    </ns1:Group>
  </ns1:Target>
</ns1:CQLQuery>
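To make the "three classes" in the Protein query explicit, the sketch below lists the class named by the Target followed by those named by each Association. It relies only on the standard Java DOM parser and embeds a compact copy of the query shown in this example; the class name CqlClasses is illustrative and not part of caGrid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CqlClasses {
    // Compact copy of the Protein/Gene/Organism query from the slide.
    static final String PROTEIN_QUERY =
        "<ns1:CQLQuery xmlns:ns1=\"http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery\">"
        + "<ns1:Target name=\"edu.georgetown.pir.domain.Protein\">"
        + "<ns1:Group logicRelation=\"AND\">"
        + "<ns1:Association name=\"edu.georgetown.pir.domain.Gene\" roleName=\"geneCollection\">"
        + "<ns1:Attribute name=\"name\" predicate=\"EQUAL_TO\" value=\"brca1\"/>"
        + "</ns1:Association>"
        + "<ns1:Association name=\"edu.georgetown.pir.domain.Organism\""
        + " roleName=\"organismCollection\">"
        + "<ns1:Attribute name=\"scientificName\" predicate=\"EQUAL_TO\" value=\"homo sapiens\"/>"
        + "</ns1:Association>"
        + "</ns1:Group></ns1:Target></ns1:CQLQuery>";

    /** Returns the Target class name followed by each Association class name. */
    static List<String> classesIn(String cqlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(cqlXml)));
            List<String> names = new ArrayList<>();
            for (String tag : new String[] {"Target", "Association"}) {
                NodeList nodes = doc.getElementsByTagNameNS("*", tag);
                for (int i = 0; i < nodes.getLength(); i++) {
                    names.add(((Element) nodes.item(i)).getAttribute("name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse CQL document", e);
        }
    }

    public static void main(String[] args) {
        for (String name : classesIn(PROTEIN_QUERY)) {
            System.out.println(name);
        }
    }
}
```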
CQL Example: Query involving three classes
• Results:
Additional Information
• caGrid Data Services Wiki:
http://www.cagrid.org/mwiki/index.php?title=Data_Services
• caGrid Data Service Client API Wiki:
http://www.cagrid.org/mwiki/index.php?title=Data_Services:Client_API
• CQL Wiki:
http://www.cagrid.org/mwiki/index.php?title=Data_Services:CQL
Lesson 9: Using caGrid Metadata Service APIs
Outline
• Overview
• caGrid Metadata Service APIs
  • caDSR Service API
  • EVS Service API
  • Discovery API
  • MetadataUtils
caGrid Data Service Metadata Overview - gridPIR
UML Modeling
Semantic Annotation using EVS concepts
caDSR Load/CDE Creation
EVS Concept Codes
CDE Public ID/version
UML Class/attribute
CDE Long Name
Metadata Services on caGrid - caDSR
caDSR Grid Service API Client
• Provides access to information in the caDSR, such as the semantically annotated UML models registered there
• Has capability to generate caGrid standard metadata instances such as domain models
caDSR Grid Service API Client - Operations
• Provides access to the UML-like view of caDSR registered items:
• findAllProjects
• findPackagesInProject
• findClassesInPackage
• findAttributesInClass
• Enables clients to generate caGrid standard Data Service metadata:
• generateDomainModelForProject
• generateDomainModelForPackage
• generateDomainModelForClasses
• generateDomainModelForClassesWithExcludes
• Provides clients the ability to augment a ServiceMetadata (standard caGrid service metadata) skeleton instance with the information extracted from caDSR
• annotateServiceMetadata
caDSR Grid Service API Client – Retrieving list of projects from caDSR
//caDSR service on production caGrid
String serviceURL = "http://cagrid-service.nci.nih.gov:8080/wsrf/services/cagrid/CaDSRService";
// create a CaDSRServiceClient instance
CaDSRServiceClient client = new CaDSRServiceClient(serviceURL);
// get list of projects from caDSR
Project[] projects = client.findAllProjects();
//processing results
for (int i = 0; i < projects.length; i++) {
    Project p = projects[i];
    System.out.println(i + " Short name:" + p.getShortName() + " / Long name:" + p.getLongName());
}
caDSR Grid Service API Client – Retrieving list of projects from caDSR
Result:
caDSR Grid Service API Client – Retrieving classes/attributes registered for a gridPIR from caDSR
Result:
caDSR Grid Service API Client – Retrieving classes/attributes registered for a gridPIR from caDSR
// Retrieve the list of classes for a registered model
UMLClassMetadata[] classArray = client.findClassesInProject(gridPIRProject);
// Retrieve the list of attributes for a class
UMLAttributeMetadata[] attributeArray = client.findAttributesInClass(gridPIRProject, classArray[i]);
// Retrieve the value domain for an attribute
ValueDomain valueDomain = client.findValueDomainForAttribute(gridPIRProject, attributeArray[j]);
Metadata Services on caGrid – Enterprise Vocabulary Services (EVS)
EVS Grid Service API Client
• Provides information on vocabularies and terms/concepts presented by the NCI Metathesaurus/Thesaurus
EVS Grid Service API Client - Methods
• Provides a list of programmatically accessible vocabularies
  • getVocabularyNames
• Provides access to concepts/terms from the vocabularies
  • searchDescLogicConcept
• Provides the complete history for concepts: their evolution as they are created, merged, modified, split, or retired
  • getHistoryRecords
• Provides access to concepts that are supported by the NCI Metathesaurus
  • searchMetaThesaurus
  • searchSourceByCode
EVS Grid Service API Client – Retrieve list of vocabularies provided by EVS
//URL for production grid EVS Grid Service
String serviceURL = "http://cagrid-service.nci.nih.gov:8080/wsrf/services/cagrid/EVSGridService";
// create an EVSGridServiceClient instance
EVSGridServiceClient client = new EVSGridServiceClient(serviceURL);
//retrieve list of vocabularies service provides
DescLogicConceptVocabularyName[] vocabularyNames = client.getVocabularyNames();
//list the names of vocabularies
for (int i = 0; i < vocabularyNames.length; i++) {
    System.out.println(i + ": " + vocabularyNames[i].getVocabularyName());
}
EVS Grid Service API Client - Retrieve list of vocabularies provided by EVS
Result:
EVS Grid Service API Client – Retrieve EVS concept code for a term
//Set the search criteria
EVSDescLogicConceptSearchParams evsSearchParams = new EVSDescLogicConceptSearchParams();
//searching in NCI_Thesaurus
evsSearchParams.setVocabularyName("NCI_Thesaurus");
//searching the concept code for term protein
evsSearchParams.setSearchTerm("protein");
//set maximum number of returned terms/concepts to 10
evsSearchParams.setLimit(10);
//run query against the EVS grid service
DescLogicConcept[] descLogicConcepts = client.searchDescLogicConcept(evsSearchParams);
//process results
EVS Grid Service API Client – Retrieve EVS concept code for a term
Result:
Metadata Services on caGrid – Index Service
Discovery API Client
• Provides methods to query the Index Service and is used to discover services of interest
Discovery API Client - Methods
• Searches based on service level metadata
  • E.g. discoverServicesByResearchCenter
• Searches based on semantic annotation
  • E.g. discoverDataServicesByModelConceptCode
• Searches based on operation metadata (for analytical services)
  • E.g. discoverServicesByOperationName
• Searches based on information model metadata (for data services)
  • E.g. discoverDataServicesByExposedClass
• Searches based on XPath, leveraging the Service Metadata XML
  • discoverByFilter (String xpathPredicate)
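To show what an XPath predicate does in general, the sketch below applies one to a small in-memory XML document using the standard javax.xml.xpath API. Everything about the sample is hypothetical: the element and attribute names are simplified stand-ins, not the real caGrid service metadata schema, and the slides do not show how the Index Service applies the predicate passed to discoverByFilter. The class name XPathPredicateDemo is illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicateDemo {
    // Hypothetical, simplified stand-in for indexed service metadata.
    static final String SAMPLE =
        "<services>"
        + "<service url=\"http://example.org/A\"><center displayName=\"Center One\"/></service>"
        + "<service url=\"http://example.org/B\"><center displayName=\"Center Two\"/></service>"
        + "</services>";

    /** Counts the service elements in xml that satisfy the given XPath predicate. */
    static int countMatches(String xml, String predicate) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/services/service[" + predicate + "]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid predicate: " + predicate, e);
        }
    }

    public static void main(String[] args) {
        String predicate = "center/@displayName='Center Two'";
        System.out.println("matches: " + countMatches(SAMPLE, predicate));
    }
}
```

The predicate is only the part that goes inside the square brackets; the surrounding path selects the candidate nodes that it filters.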
Discovery API Client – Discovering services using a keyword
//Index service on production caGrid
String serviceURL="http://cagrid-index.nci.nih.gov:8080/wsrf/services/DefaultIndexService";
// create a DiscoveryClient instance
DiscoveryClient client = new DiscoveryClient(serviceURL);
//discover services by keyword
EndpointReferenceType[] endPointReferenceArr = client.discoverServicesBySearchString("Protein");
//list URLs for returned services
for (int i=0; i < endPointReferenceArr.length; i++){
System.out.println("Address: "+endPointReferenceArr[i].getAddress());
}
Discovery API Client – Discovering services using a keyword
Result:
Metadata API - MetadataUtils
• Used to access and manipulate instances of service metadata
• Complements the Discovery API
  • Once a service is discovered, MetadataUtils’s methods can be used to access and inspect the full metadata for the service
Metadata API – MetadataUtils Methods
• Retrieves the service metadata or domain model from the specified service:
  • getServiceMetadata
  • getDomainModel
• Writes/reads the XML representation of the service metadata to/from the specified writer/reader:
  • serializeServiceMetadata
  • deserializeServiceMetadata
• Writes/reads the XML representation of the domain model to/from the specified writer/reader:
  • serializeDomainModel
  • deserializeDomainModel
Metadata API – MetadataUtils - Example
//discover services by concept code used to annotate the model such as C17021 (Protein)
EndpointReferenceType[] endPointReferenceArr =
    client.discoverDataServicesByModelConceptCode("C17021");
for (int i = 0; i < endPointReferenceArr.length; i++) {
    //retrieve service metadata for the service
    ServiceMetadata serviceMetadata = MetadataUtils.getServiceMetadata(endPointReferenceArr[i]);
    //print host center information
    System.out.println("Hosting Center: "
        + serviceMetadata.getHostingResearchCenter().getResearchCenter().getDisplayName());
    //retrieve domain model for the service
    DomainModel domainModel = MetadataUtils.getDomainModel(endPointReferenceArr[i]);
    //print domain model name
    System.out.println("Domain Model/Project Short Name: " + domainModel.getProjectShortName());
    System.out.println("Address: " + endPointReferenceArr[i].getAddress());
}
Metadata API – MetadataUtils - Example
Result:
Additional Information
• caGrid Wiki: http://www.cagrid.org/mwiki/index.php?title=CaGrid
• caBIG™ Architecture WS caGrid Web Page: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/
Session III: Deploying a Data Service on caGrid and using caGrid Service APIs
Questions
Closing Remarks
Additional Resources
• caBIG™ web site: https://cabig.nci.nih.gov/
• caBIG™ gForge site: https://gforge.nci.nih.gov/
• caGrid Wiki: http://www.cagrid.org/
• caBIG™ Learning Management System: http://ncicbtraining.nci.nih.gov/TP2005/tp2000web.dll/NCICBTraining
• caCORE SDK: http://ncicb.nci.nih.gov/infrastructure/cacoresdk
• caBIG™ Compatibility Guidelines: https://gforge.nci.nih.gov/docman/index.php?group_id=233&selected_doc_group_id=1138&language_id=1
• Upcoming boot camps