Wrapping third-party analytical services for caBIG: Taverna-caBIG project, Stian Soiland-Reyes...

Post on 26-Dec-2015

  • Slide 1
  • Wrapping third-party analytical services for caBIG (Taverna-caBIG project). Stian Soiland-Reyes and Alexandra Nenadic, University of Manchester, UK. September 2009. http://www.mygrid.org.uk/dev/wiki/display/caGrid
  • Slide 2
  • Agenda: Project overview; Primary goals; Service selection; Why these services?; Why wrapping?; Wrapping benefits; How we did it; How it works; Architecture; UML models; Example client and outputs; Project info
  • Slide 3
  • Project overview. Taverna-caBIG cooperation on several levels: 1. caGrid-enabling third-party analytical services; 2. Taverna Workbench enhancements for semantic search of caBIG services, invocation of caBIG services from Taverna workflows, and support for secure caBIG services (interacting with the GAARDS infrastructure prior to service invocation). This presentation addresses the caGrid-enablement of third-party analytical services (wrapping them and achieving Silver Level compatibility).
  • Slide 4
  • Primary goals. Identify two publicly available analytical services currently accessible through Taverna. Wrap (i.e. caGrid-enable) the services: design the wrapper services in UML and semantically describe/annotate them using caBIG's tooling (EA + SIW); then implement and deploy them as standard caBIG services on caGrid (Introduce).
  • Slide 5
  • Analytical service selection. Services were selected in collaboration with the caBIG Workflow Working Group, led by Juli Klemm. Winners: the NCBI BLAST service hosted by EBI (European Bioinformatics Institute), a protein and nucleotide sequence similarity search service; and the InterProScan service hosted by EBI, which scans a protein sequence against a range of protein signatures in the InterPro warehouse.
  • Slide 6
  • Why these services? Freely available. Highly reliable, hosted by EBI. Widely used by the scientific community. Can be combined with existing caBIG tools (caBIO, GridPIR, etc.) in biologically meaningful workflows.
  • Slide 7
  • NCBI BLAST service. A popular sequence similarity search tool using local sequence alignment. Supports protein, DNA and RNA sequences. Searches sequences in a whole range of databases: UNIPROT, NCBI, EMBL, etc. SOAP web service hosted by EMBL-EBI.
  • Slide 8
  • InterProScan service. The InterPro warehouse integrates various databases of protein domains and functional sites. InterProScan searches the InterPro warehouse using protein signature recognition methods, e.g. blastprodom, gene3d, hmmpfam, hmmsmart, scanregexp, profilescan. SOAP web service hosted by EMBL-EBI.
  • Slide 9
  • Why wrap the services? The original services use various data formats for inputs and outputs (although all are XML). They do not conform to the caBIG compatibility rules. The output format was not even compatible with the input format. Requirements for the wrapped service: translate the input data from caBIG-compatible XML to the XML format understood by the analytical services; convert the received results back to a format understood by caBIG clients.
  • Slide 10
  • NCBI BLAST Output (Untranslated). XML sample; the markup is not preserved in this transcript. Surviving content: schema reference http://www.ebi.ac.uk/schema/ApplicationResult.xsd...; values 763, 298, 8e-80, 100; and three sequence strings:
      MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ
      MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ
      MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ
  • Slide 11
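  • InterProScan Output (Untranslated). XML sample; the markup is not preserved in this transcript. Surviving content: namespace and schema references http://www.ebi.ac.uk/schema, http://www.w3.org/2001/XMLSchema-instance, http://www.ebi.ac.uk/schema/InterProScanResult.xsd...; Molecular Function: protease inhibitor activity ...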
  • Slide 12
  • Motivational workflow. This Taverna workflow uses both BLAST and InterProScan, which can be replaced with the wrapped versions of the services. Its components: a nested workflow that internally invokes InterProScan and checks job status before fetching results; a nested workflow that internally invokes NCBI BLAST and checks job status before fetching results; a web service that looks up protein sequences in a database (to be replaced with the caBIG service caBIO); a shim that splits a string into a list of FASTA strings. http://www.myexperiment.org/workflows/230
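The shim mentioned on this slide splits one string into a list of FASTA strings. The shim itself lives in the Taverna workflow and is not shown in the slides, so the following is only a minimal Java sketch of the idea. It assumes that each record starts with a header line beginning with ">", and the class name FastaSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Taverna workflow
// or of the wrapper services described in this presentation.
class FastaSplitter {

    /** Splits multi-record FASTA text into one string per record. */
    static List<String> split(String fasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : fasta.split("\\R")) {
            if (line.startsWith(">")) {
                // A header line starts a new record; flush the previous one.
                if (current != null) {
                    records.add(current.toString().trim());
                }
                current = new StringBuilder();
            }
            // Text before the first header is ignored; blank lines are skipped.
            if (current != null && !line.trim().isEmpty()) {
                current.append(line.trim()).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString().trim());
        }
        return records;
    }
}
```

Each returned string keeps its header line, so it can be passed on as a self-contained FASTA record.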
  • Slide 13
  • Benefits of wrapped services. Analytical services from other service providers become available to caBIG users. The wrapped services are caBIG Silver Level compatible, which ensures shared meaning and interoperability between these and other caBIG services, so data can be exchanged and understood between services.
  • Slide 14
  • How we wrapped the services (1). Making the services Silver Level compatible encompassed:
      1. Modelled the data in UML using Enterprise Architect (EA).
      2. Exported the model from EA to XMI.
      3. Using the SIW tool, semantically annotated the XMI file with caBIG's vocabularies/ontologies.
      4. Common Data Elements (CDEs) were generated for the services' inputs/outputs, reviewed by the curation team and loaded into the caDSR production database.
      5. Loaded the annotated XMI back into EA to update the UML.
  • Slide 15
  • How we wrapped the services (2).
      6. From EA, the UML model was exported to a set of XSD files.
      7. The XSD files were imported into the Introduce tool, which was used to generate the skeleton APIs of the wrapped services.
      8. Axis 2 was used to invoke the original InterProScan and NCBI BLAST services from the wrapper services.
      9. The wrapped services are asynchronous; job status and results are available as WSRF resource properties and can be subscribed to using WS-Notifications. There is also a synchronous version where polling is done from the client side.
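The synchronous variant in step 9 polls for job completion from the client side. The client library's actual polling code is not shown in the slides; the sketch below only illustrates the general pattern of polling with a timeout. The names JobPoller and Status and the Supplier-based status check are assumptions made for this example, not part of the caGrid or wrapper APIs.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical client-side poller; the real wrapper client may differ.
class JobPoller {

    enum Status { RUNNING, DONE, ERROR }

    /**
     * Calls statusCheck repeatedly until the job leaves RUNNING or the
     * timeout expires. Returns the final status, or empty on timeout
     * or interruption.
     */
    static Optional<Status> pollUntilFinished(Supplier<Status> statusCheck,
                                              long timeoutMillis,
                                              long intervalMillis) {
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            Status status = statusCheck.get();
            if (status != Status.RUNNING) {
                return Optional.of(status);
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // timed out while still running
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

The asynchronous variant avoids this loop entirely: the client subscribes to the job resource and is notified when the job completes.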
  • Slide 16
  • How it works.
      Client: using the client library, calls the wrapped WSRF web service.
      Service: converts the input to the original format, submits the converted input to the original service, and returns a Job Resource that references the jobID.
      Client: subscribes to notifications from the Job Resource.
      Job Monitor (server): for all jobs, checks status using the jobID and notifies the client on completion.
      Client library: requests the output data.
      Job Resource: converts the data from the original format and returns the converted data to the client.
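The interaction on this slide can be sketched in plain Java without any WSRF or WS-Notification machinery. Everything in the sketch is an assumption made for illustration: the types OriginalService, JobResource and WrapperService, the use of strings for messages, and the Function-based converters stand in for the real generated service classes, which are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical stand-in for the original (wrapped) analytical service. */
interface OriginalService {
    String submit(String originalInput);  // returns a jobID
    boolean isDone(String jobId);
    String fetchResult(String jobId);     // result in the original format
}

/** Hypothetical stand-in for the Job Resource handed back to the client. */
class JobResource {
    final String jobId;
    private final OriginalService service;
    private final Function<String, String> toCaBig;
    private final List<Consumer<JobResource>> listeners = new ArrayList<>();
    private boolean notified;

    JobResource(String jobId, OriginalService service,
                Function<String, String> toCaBig) {
        this.jobId = jobId;
        this.service = service;
        this.toCaBig = toCaBig;
    }

    /** Client: subscribe to the completion notification. */
    void subscribe(Consumer<JobResource> listener) {
        listeners.add(listener);
    }

    /** Job monitor: check status and notify subscribers once on completion. */
    void checkStatus() {
        if (!notified && service.isDone(jobId)) {
            notified = true;
            listeners.forEach(listener -> listener.accept(this));
        }
    }

    /** Converts the original result to the caBIG format on request. */
    String getOutput() {
        return toCaBig.apply(service.fetchResult(jobId));
    }
}

/** Hypothetical wrapper service tying the steps together. */
class WrapperService {
    private final OriginalService service;
    private final Function<String, String> toOriginal;
    private final Function<String, String> toCaBig;
    private final List<JobResource> jobs = new ArrayList<>();

    WrapperService(OriginalService service,
                   Function<String, String> toOriginal,
                   Function<String, String> toCaBig) {
        this.service = service;
        this.toOriginal = toOriginal;
        this.toCaBig = toCaBig;
    }

    /** Converts the input, submits it, and returns a job resource. */
    JobResource submit(String caBigInput) {
        String jobId = service.submit(toOriginal.apply(caBigInput));
        JobResource job = new JobResource(jobId, service, toCaBig);
        jobs.add(job);
        return job;
    }

    /** One pass of the job monitor over all known jobs. */
    void monitorOnce() {
        jobs.forEach(JobResource::checkStatus);
    }
}
```

In this sketch a client calls submit, subscribes a listener on the returned JobResource, and reads getOutput once the monitor has signalled completion. A real deployment would run the monitor on a schedule and deliver notifications over the network rather than through in-process callbacks.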
  • Slide 17
  • Architecture of wrapped services
  • Slide 18
  • UML model of wrapped NCBI BLAST
  • Slide 19
  • UML model of wrapped InterProScan
  • Slide 20
  • Reused several data elements. Green classes in the diagram are reused from IRWG: Sequence, NucleicAcidSequence, DatabaseCrossReference, GeneGenomicIdentifier et al. Red UML classes in the diagram are reused from PIR: ProteinSequence. Partial reuse of attributes in ProteinDomainLocation.
  • Slide 21
  • Example client (NCBI Blast):
      NCBIBlastClient client = new NCBIBlastClient(url);
      NCBIBlastInput input = new NCBIBlastInput();

      // Identify the query sequence by database cross-reference
      ProteinSequenceRepresentation sequenceRepresentation = new ProteinSequenceRepresentation();
      ProteinGenomicIdentifier proteinId = new ProteinGenomicIdentifier();
      proteinId.setDataSourceName("uniprot");
      proteinId.setCrossReferenceId("wap_rat");
      sequenceRepresentation.setProteinId(proteinId);
      input.setSequenceRepresentation(sequenceRepresentation);

      // Search parameters
      NCBIBlastInputParameters params = new NCBIBlastInputParameters();
      params.setEmail("mannen@soiland-reyes.com");
      params.setQueryDatabase(new MolecularSequenceDatabase("", "uniprot"));
      params.setBlastProgram(BLASTProgram.BLASTP);
      input.setNcbiBLASTInputParameters(params);

      // Synchronous call: the client utilities wait for the result
      NCBIBlastClientUtils clientUtils = new NCBIBlastClientUtils(client);
      NCBIBlastOutput ncbiBlastOut = clientUtils.ncbiBlastSync(input, TIMEOUT_SECONDS * 1000);

      SequenceSimilarity[] similarities = ncbiBlastOut.getSequenceSimilarities();
      for (SequenceSimilarity similarity : similarities) {
          for (Alignment align : similarity.getAlignments()) {
              SequenceFragment querySequenceFragment = align.getQuerySequenceFragment();
              System.out.print("Q: " + querySequenceFragment.getSequence().getValue());
              (..)
      Slide callout labels: data, id
  • Slide 22
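  • Example SOAP input (NCBI Blast). XML sample; the markup is not preserved in this transcript. Surviving element values: BLASTP; mannen@soiland-reyes.com; uniprot; wap_rat; uniprot. Slide callout labels: data, reused, id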
  • Slide 23
  • Example client output (NCBI Blast):
      Running NCBI Blast client uk.org.mygrid.cagrid.servicewrapper.service.ncbiblast.example.ExampleNCBIBlastClient -url
      -- Using default service at http://cagrid.taverna.org.uk:8080/wsrf/services/cagrid/NCBIBlast
      Calling NCBI Blast synchronously
      (Set -DGLOBUS_LOCATION=/Users/bob/cagrid/ws-core-4.0.3 to do asynchronous client calls)
      Found 50 similarities
      Similarity in uniprot:WAP_RAT (sequence length:137)
      1 alignments
      Alignment score=763.0 bits=298.0 eValue=1.0E-79
      Q: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ 1-137
      P: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ
      M: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCS