Workflows, Taverna, and Teragrid Annual Meeting, 2008

Upload: questa

Post on 15-Jan-2016


Page 1: Workflows, Taverna, and Teragrid

Workflows, Taverna, and Teragrid

Annual Meeting, 2008

Page 2: Workflows, Taverna, and Teragrid

Agenda

• Kiran Keshav: Workflow Working Group

• Ravi Madduri: Taverna and Workflows

• Christine Hung: TeraGrid

Page 3: Workflows, Taverna, and Teragrid

Background - What Does This Look Like?

• This includes run A, get results from A, input results from A into B, get results from B, etc.

[Figure: Data Service A and Analytical Service B]

Page 4: Workflows, Taverna, and Teragrid

Atomic Units

• In some cases, you may want to run a series of tasks as one (atomic) unit.

• Reasons:

• You need to run the same tasks in the same order multiple times.

• Each individual task is long running, and you don’t want to wait for one task to finish before you start the next.

• Tasks only make sense in the context of other tasks.

• Enter workflows:

• A sequence of operations executed as a single unit of work.

• The sequence of events can have logic:

• Aresult + Bresult → C

Page 5: Workflows, Taverna, and Teragrid

Example Scenario

Aresults = Run A
Bresults = Run B
res = Aresults + Bresults
Cresults = Run C(res)

[Figure: Analytical Service A, Analytical Service B, and Analytical Service C]
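
The scenario on this slide can be sketched in Python as follows. This is an illustration only: run_a, run_b, and run_c are hypothetical stand-ins for calls to Analytical Services A, B, and C, and the numbers they return are made up. A and B do not depend on each other, so the sketch starts both at once and runs C when both results are in.

```python
from concurrent.futures import ThreadPoolExecutor


def run_a():
    # Hypothetical stand-in for a call to Analytical Service A.
    return 4


def run_b():
    # Hypothetical stand-in for a call to Analytical Service B.
    return 6


def run_c(value):
    # Hypothetical stand-in for a call to Analytical Service C.
    return value * 10


def run_workflow():
    # A and B are independent, so submit both before waiting on either.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(run_a)
        future_b = pool.submit(run_b)
        a_results = future_a.result()
        b_results = future_b.result()
    # Workflow logic: combine both results, then pass them to C.
    res = a_results + b_results
    return run_c(res)


if __name__ == "__main__":
    print(run_workflow())
```

The caller submits one unit of work (run_workflow) instead of invoking and waiting on each service by hand.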

Page 6: Workflows, Taverna, and Teragrid

What Are We Trying to Accomplish?

• Execute grid analytical and data services to accomplish a pre-defined task.

• Authoring of workflows should be easy.

• Submission of workflows should be easy.

Page 7: Workflows, Taverna, and Teragrid

Authoring and Executing Workflows

• How do you write a workflow?

• Authoring tool

• Who executes the workflow?

• Workflow engine

Page 8: Workflows, Taverna, and Teragrid

Authoring Tool

• Find caBIG services

• Add services of interest to the workflow

• Submit the workflow to the workflow engine

Page 9: Workflows, Taverna, and Teragrid

Workflow Engine

• Executes the workflow

• Can apply logic to the workflow

• Get the results from Service A, add these to the results from Service B, then divide by 2.

• Take this new result and pass it as input to Service C.

• Store intermediate data (radiology)
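
A minimal sketch of what such an engine does, assuming a purely in-memory design. WorkflowEngine, add_step, and the step names are hypothetical, and the lambdas stand in for service calls. It runs steps in order, applies the "add, then divide by 2" logic, and keeps every intermediate result.

```python
class WorkflowEngine:
    """Runs named steps in order and keeps every intermediate result."""

    def __init__(self):
        self.steps = []    # (name, function, names of the steps it reads)
        self.results = {}  # intermediate data, keyed by step name

    def add_step(self, name, func, inputs=()):
        self.steps.append((name, func, tuple(inputs)))

    def run(self):
        for name, func, inputs in self.steps:
            # Look up the stored outputs of the steps this one depends on.
            args = [self.results[i] for i in inputs]
            self.results[name] = func(*args)
        return self.results


engine = WorkflowEngine()
engine.add_step("A", lambda: 10.0)                 # stand-in for Service A
engine.add_step("B", lambda: 20.0)                 # stand-in for Service B
engine.add_step("combined", lambda a, b: (a + b) / 2, inputs=("A", "B"))
engine.add_step("C", lambda x: x * 3, inputs=("combined",))  # Service C
results = engine.run()
```

After the run, results still holds the outputs of A, B, and the combined step, which is the "store intermediate data" role listed above.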

Page 10: Workflows, Taverna, and Teragrid

Workflows in 2007

• Identify a workflow authoring tool

• Taverna

• Implement a scientific workflow

• Identify significant barriers to workflow interoperability

Page 11: Workflows, Taverna, and Teragrid

Workflows in 2008

• Use case based:

• Identify services and corresponding domains of caGrid data and analytical services

• Implement 2-3 workflows

• Workflow best practices

• Tooling enhancements

• Taverna enhancements

• Compatibility review enhancements, etc.

Page 12: Workflows, Taverna, and Teragrid

Administrative

• Workflow Working Group

• Kiran Keshav

[email protected]

• 1st and 3rd Monday of every month at 11am EST

Page 13: Workflows, Taverna, and Teragrid

Taverna Overview

• State of Workflow in caGrid 1.0

• BPEL

• Architecture overview of Workflow services

• Experience

• ICR Working Group

• Workflow construction should be easy

• Investigation and choosing Taverna

• Taverna

• Overview

• Early work

• Advantages

• Semantic Scavenger

• Future work

Page 14: Workflows, Taverna, and Teragrid

caGrid background: Where does workflow fit in?

Orchestrates caGrid-build services

caGrid security supported

caGrid clients build/run workflows

Integrates semantically annotated data

Page 15: Workflows, Taverna, and Teragrid

Workflow in caGrid: How does caGrid implement workflow?

• Workflow Factory Service (WFS)

• Grid service to create a new workflow

• Workflow Service

• Grid service to access your created workflow

• Start, pause, resume, cancel, getWorkflowOutput

• caGrid integration

• Invoke grid services

• Security (communication, message, conversation)

• caGrid implementation

• Leverages the ActiveBPEL workflow engine

• Workflows exposed as web services in ActiveBPEL, wrapped as grid services

• Wraps the ActiveBPEL Admin Service

• WFS submits a BPR (workflow package)

• Accesses the created stateful web service
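
The factory-plus-stateful-resource pattern on this slide can be sketched as a small state machine. This is a hypothetical Python analog, not the caGrid API: the class and function names, the set of states, and the allowed transitions are all assumptions made for illustration. Only the operation names (start, pause, resume, cancel, getWorkflowOutput) come from the slide.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    CANCELLED = "cancelled"
    FINISHED = "finished"


# Operation name -> {state it may be called in: state it leads to}.
# These transitions are an assumption, not taken from caGrid.
TRANSITIONS = {
    "start": {State.CREATED: State.RUNNING},
    "pause": {State.RUNNING: State.PAUSED},
    "resume": {State.PAUSED: State.RUNNING},
    "cancel": {
        State.CREATED: State.CANCELLED,
        State.RUNNING: State.CANCELLED,
        State.PAUSED: State.CANCELLED,
    },
}


class WorkflowResource:
    """Stateful handle to one created workflow (hypothetical analog)."""

    def __init__(self, package):
        self.package = package
        self.state = State.CREATED
        self._output = None

    def _apply(self, operation):
        allowed = TRANSITIONS[operation]
        if self.state not in allowed:
            raise RuntimeError(f"cannot {operation} while {self.state.value}")
        self.state = allowed[self.state]

    def start(self):
        self._apply("start")

    def pause(self):
        self._apply("pause")

    def resume(self):
        self._apply("resume")

    def cancel(self):
        self._apply("cancel")

    def finish(self, output):
        # Called by the (simulated) engine when execution completes.
        if self.state is not State.RUNNING:
            raise RuntimeError("only a running workflow can finish")
        self._output = output
        self.state = State.FINISHED

    def get_workflow_output(self):
        if self.state is not State.FINISHED:
            raise RuntimeError("output is not available yet")
        return self._output


def create_workflow(package):
    """Plays the role of the factory service: one new resource per call."""
    return WorkflowResource(package)
```

A client would call create_workflow once with its workflow package, then drive the returned resource through start, pause, resume, or cancel and read the output when execution has finished.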

Page 16: Workflows, Taverna, and Teragrid

Workflow background: BPEL example

<receive createInstance="yes" operation="startWorkFlow" partnerLink="WorkFlowClientPartnerLinkType" portType="ns2:startWorkFlowPortType" variable="workFlowInputMessage" />
<assign>
  <copy>
    <from expression="'1'" />
    <to variable="indexCounterDuke" />
  </copy>
  <copy>
    <from part="parameters" query="/ns1:WorkFlowInputType/query" variable="workFlowInputMessage" />
    <to part="parameters" query="/ns1:query" variable="queryInputMessage" />
  </copy>
</assign>
<invoke inputVariable="queryInputMessage" operation="query" outputVariable="queryOutputMessage" partnerLink="RproteomicsDataLinkType" portType="ns1:RPDataPortType" />
<assign>
  <copy>
    <from expression="count(bpws:getVariableData('queryOutputMessage', 'parameters', '/ns1:queryResponse')/response/ns4:CQLQueryResult) div 2" />
    <to variable="countDuke" />
  </copy>
</assign>
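
For readers unfamiliar with BPEL, the last assignment in the fragment (count the CQLQueryResult elements in the query response, then divide by 2) can be sketched in Python. The namespace URIs and the sample response below are invented for the example, since the slide does not show what ns1 and ns4 are bound to.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace bindings; the slide does not give the real URIs.
NS = {"ns1": "urn:example:service", "ns4": "urn:example:cql-result"}

# Made-up response message with four results.
SAMPLE = """
<ns1:queryResponse xmlns:ns1="urn:example:service"
                   xmlns:ns4="urn:example:cql-result">
  <response>
    <ns4:CQLQueryResult/>
    <ns4:CQLQueryResult/>
    <ns4:CQLQueryResult/>
    <ns4:CQLQueryResult/>
  </response>
</ns1:queryResponse>
"""


def count_duke(xml_text):
    # Mirrors: count(.../ns1:queryResponse/response/ns4:CQLQueryResult) div 2
    root = ET.fromstring(xml_text.strip())
    matches = root.findall("response/ns4:CQLQueryResult", NS)
    # XPath "div" is floating-point division, hence "/" and not "//".
    return len(matches) / 2


if __name__ == "__main__":
    print(count_duke(SAMPLE))
```

Expressing this one line of arithmetic takes a full assign/copy block in BPEL, which illustrates the verbosity that motivated the search for a friendlier authoring tool.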

Page 17: Workflows, Taverna, and Teragrid

Investigation

• Conducted an extensive review of existing tools: http://gforge.nci.nih.gov/docman/view.php/332/7509/icr_workflow_tool_review_2007.doc

• Prioritized feature list: http://gforge.nci.nih.gov/docman/view.php/332/9690/icr_workflow_prioritized_feature_list_2007.xls

Page 18: Workflows, Taverna, and Teragrid

Results: Authoring Tool - Taverna

• Taverna:

• http://gforge.nci.nih.gov/docman/view.php/332/8007/caGrid_workflow_taverna_2007.ppt

• http://gforge.nci.nih.gov/docman/view.php/332/8278/taverna_tom_oinn.doc

• Ravi Madduri (caGrid workflow point of contact) attended Taverna workshop

• Prototype of Taverna to discover and invoke caGrid services

Page 19: Workflows, Taverna, and Teragrid

Taverna’s Features

• Explicit modeling of data flow. • Implicit iteration.

• Input/Output metadata. Taverna provides a plug-in called Feta, a semantic discovery tool, to support service discovery using the function description as well as input and output metadata, semantically described using the myGrid ontology. This metadata mechanism provides a feasible integration point with caGrid metadata and discovery services.
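
Implicit iteration means that a processor written for a single item is applied element by element when it receives a list. A minimal sketch of the idea for one input, not Taverna's implementation: invoke, list_depth, and expected_depth are hypothetical names.

```python
def list_depth(value):
    """Nesting depth: 0 for a single item, 1 for a list of items, etc."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


def invoke(processor, value, expected_depth=0):
    """Call processor directly, or once per element if value is too deep."""
    if list_depth(value) > expected_depth:
        # Input is deeper than the processor expects: iterate implicitly.
        return [invoke(processor, item, expected_depth) for item in value]
    return processor(value)


if __name__ == "__main__":
    # str.upper expects one string, so a list triggers iteration.
    print(invoke(str.upper, ["brca1", "tp53"]))
    # len is declared here as expecting a list, so no iteration happens.
    print(invoke(len, ["brca1", "tp53"], expected_depth=1))
```

The workflow author wires a list-producing service to a single-item service without writing a loop.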

Page 20: Workflows, Taverna, and Teragrid

Taverna’s Features (cont’d)

• Beanshell scripting and XML processing support. A custom processor that executes a beanshell script (beanshell is a flavor of dynamic Java, see http://www.beanshell.org/) can be defined inside a Taverna workflow. Beanshell scripts can be shared with others by wrapping them in a workflow and publishing the workflow on the Web. Taverna also provides a set of local processors for processing XML. In this way Taverna can leverage the processing power of Java and XML.

• Ease of use. Taverna is easy for non-IT specialists to use, and the Taverna workbench provides both build-time and run-time support for workflow modeling, debugging, tracing, and provenance. This is a clear advantage over solutions such as BPEL and DAGMan, which are designed for IT specialists and are difficult for caGrid domain users.

Page 21: Workflows, Taverna, and Teragrid

GT4 Plug-in for Taverna

Taverna Workbench

GT4-Processor Plug-in

GT4Scavenger

GT4Processor

GT4InvocationTask

1. Get service list

2. Get operation (processor)

3. Invoke processor

• Processor: A processor in a Taverna workflow corresponds to a task in a conventional workflow.

• ProcessorTaskWorker: Each processor interface corresponds to a ProcessorTaskWorker interface, which defines the action to be performed at runtime when workflow execution reaches that processor.

• Scavenger: In Taverna, a scavenger is a logical collection of processors. For example, a WSDL scavenger collects the processors that correspond to the operations defined in that WSDL.
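
How the three concepts relate can be sketched in Python. This is a hypothetical analog, not the plug-in's Java classes: the class shapes, fake_transport, the service URL, and the operation names are all invented for illustration. The numbered comments follow the three steps on this slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Processor:
    """One invocable operation of one service (a workflow task)."""
    service_url: str
    operation: str


class Scavenger:
    """Logical collection of the processors offered by one service."""

    def __init__(self, service_url: str, operations: List[str]):
        self.service_url = service_url
        self.operations = operations

    def processors(self) -> List[Processor]:
        return [Processor(self.service_url, op) for op in self.operations]


class ProcessorTaskWorker:
    """Runtime action performed when execution reaches a processor."""

    def __init__(self, processor: Processor,
                 transport: Callable[[str, str, Dict], Dict]):
        self.processor = processor
        self.transport = transport

    def execute(self, inputs: Dict) -> Dict:
        return self.transport(self.processor.service_url,
                              self.processor.operation, inputs)


def fake_transport(url: str, operation: str, inputs: Dict) -> Dict:
    # Stand-in for a real service call; echoes what would be invoked.
    return {"called": f"{url}#{operation}", "inputs": inputs}


# 1. Get service list (here: one hypothetical service and its operations).
scavenger = Scavenger("http://example.org/HypotheticalService",
                      ["query", "analyze"])
# 2. Get operation (processor).
processor = scavenger.processors()[0]
# 3. Invoke processor.
output = ProcessorTaskWorker(processor, fake_transport).execute({"gene": "TP53"})
```

Keeping the processor (what can be called) separate from the task worker (how it is called at runtime) is what lets a new service type plug in without changing the workbench.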

Page 22: Workflows, Taverna, and Teragrid

GT4 Plug-in for Taverna (cont’d)

GT4 Scavenger

A sample caGrid workflow

Page 23: Workflows, Taverna, and Teragrid

GT4 Plug-in for Taverna (cont’d)

GT4 Scavenger with semantic/metadata based caGrid service query

A sample caGrid workflow

Page 24: Workflows, Taverna, and Teragrid

GT4 Plug-in for Taverna (cont’d)

Execution trace of the sample caGrid workflow

Page 25: Workflows, Taverna, and Teragrid

Demo

Page 26: Workflows, Taverna, and Teragrid

Future Work

• Enable Taverna to invoke Grid Services that use the Resource pattern

• Enable Taverna to invoke Secure Grid Services

• Remote Execution of Workflows

• Integration of Taverna execution with caGrid workflow services

Page 27: Workflows, Taverna, and Teragrid

Teragrid

• Overview (5 min)

• Introduction on TeraGrid Workgroup  

• Background on geWorkbench and geWorkbench/caGrid/TeraGrid Project

• Technology (10 min)

• Steps to establishing geWorkbench/caGrid/TeraGrid Interface

• Use of caGrid Security (GTS, Grid Grouper, Dorian, CDS)

• Workflow and communications between services

• Demo (5 min)

• Discussion (5 min)

Page 28: Workflows, Taverna, and Teragrid

Team Members

• geWorkbench (Columbia University)

• Christine Hung

• Kiran Keshav

• caGrid (Ohio State University)

• Scott Oster

• Stephen Langella

• caGrid/TeraGrid (Argonne National Laboratory)

• Ravi Madduri

• TeraGrid (Argonne National Laboratory)

• Stuart Martin

• TeraGrid (Texas Advanced Computing Center)

• Stephen Mock

• Management

• Aris Floratos (Columbia University)

• Krishnakant Shanbhag (Argonne National Laboratory)

• Michael Keller (Booz Allen Hamilton)

• Patrick McConnell (Duke University)

• Nancy Wilkins-Diehr (San Diego Supercomputer Center)

Page 29: Workflows, Taverna, and Teragrid

Overview

• Primary problem to address

• Lack of infrastructure and operating procedures to support high performance computing needs of caBIG

• Overarching goals

• Regular caGrid services will run as caGrid/TeraGrid gateway services

• Virtualize TeraGrid resources (both compute and storage)

• Approach: labor divided between domain and technical tasks

• Use cases will be drafted to identify the needs of the community

• Existing TeraGrid Gateway projects will be surveyed to identify lessons learned and potential technology for reuse

• Demonstrate approach through working prototype

• Document best practices and develop “cookbook”

Page 30: Workflows, Taverna, and Teragrid

TeraGrid Overview

• Characteristics:

• > 250 teraflops of computing capability

• > 30 petabytes of online and archival data storage

• High-performance networks

• Mechanics:

• Prospective users request an allocation of HPC resources from a review committee

• Allocations are granted, and credentials are issued

• Jobs are run with credentials and resource usage is billed to the allocation

“TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource.”

Page 31: Workflows, Taverna, and Teragrid

caGrid Gateway Service Overview

• caGrid service running in the caBIG™ environment which acts as a bridge or proxy to TeraGrid resources for a subset of caBIG™ users

• should meet Gold compatibility requirements

• Created for a specific scientific scenario:

• abstracts away the details of leveraging TeraGrid for performance intensive operations

• uses domain-specific operations and data types

• has access to TeraGrid allocation

• Alleviates the need for caBIG™ users to:

• understand the complexities of TeraGrid (or HPC systems)

• obtain TeraGrid accounts/allocations

Page 32: Workflows, Taverna, and Teragrid

geWorkbench – a Platform for Integrated Genomics

• Integrated genomics analysis application

• Support for gene expression data, sequences, pathways, and structure

• 50+ visualization and analysis modules

• Access to local and remote data sources and analytical services

• Integration with biological annotation sources

• Development Platform

• Open source

• Java based

• Component architecture

• Facilitates customization

Page 33: Workflows, Taverna, and Teragrid

geWorkbench – a Platform for Integrated Genomics

• Large collection of components

• Data parsers: Affy MAS/GCOS (txt and CEL), Genepix, RMA, FASTA, caArray, PDB.

• Data Management: Project folders, marker/sequence/array groups.

• Visualization: Dendrograms, color mosaics, scatter plots, SOM clusters, BLAST results, dot matrices.

• Analyses: Hierarchical clustering, t-test, SVM, ARACNE, MEDUSA, MatrixREDUCE.

• 3rd Party components: Cytoscape, GoMiner, GeneWays, GenePattern, MEV.

• Complete list at www.geworkbench.org.

Page 34: Workflows, Taverna, and Teragrid

geWorkbench – a Platform for Integrated Genomics

http://www.geworkbench.org/

Page 35: Workflows, Taverna, and Teragrid

geWorkbench – Graphical User Interface

Projects Area

Selection Area

Visualization Area

Command Area

Page 36: Workflows, Taverna, and Teragrid

Clustering

Page 37: Workflows, Taverna, and Teragrid

caGrid Service

Page 38: Workflows, Taverna, and Teragrid

TeraGrid Aware caGrid Service

Page 39: Workflows, Taverna, and Teragrid

Creating the Gateway Service

• Manually stage the binary (jar file) on TeraGrid

• Takes in .ser files as input

• Produces results also in a .ser file

• Used the gRAVi plugin for Introduce to create the gateway service

• http://dev.globus.org/wiki/Incubator/gRAVI

• Gateway gridFTPs input data and parameters from geWorkbench to TeraGrid

• geWorkbench passes input to the gateway in geWorkbench’s native format (caDSR compliant)

• Gateway serializes the input before gridFTPing to TeraGrid

• Gateway invokes the staged binary

• Gateway gridFTPs results back to geWorkbench

• Gateway deserializes the result file

• Gateway returns results to geWorkbench in its native format

• Gateway service is a secured caGrid service which in turn invokes TeraGrid with a caBIG community account
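
The round trip described on this slide can be sketched locally in Python. Everything here is a stand-in chosen for illustration: pickle replaces Java serialization (.ser), shutil.copy between two temporary directories replaces gridFTP, and staged_binary replaces the jar staged on TeraGrid. The function names and the sample computation are hypothetical.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def staged_binary(input_path, output_path):
    # Stand-in for the staged jar: reads a serialized input file and
    # writes a serialized result file.
    data = pickle.loads(Path(input_path).read_bytes())
    result = {"mean": sum(data["values"]) / len(data["values"])}
    Path(output_path).write_bytes(pickle.dumps(result))


def gateway_run(request, gateway_dir, teragrid_dir):
    gateway_dir, teragrid_dir = Path(gateway_dir), Path(teragrid_dir)
    # 1. Serialize the input received from the client.
    local_in = gateway_dir / "input.ser"
    local_in.write_bytes(pickle.dumps(request))
    # 2. Transfer the input (copy stands in for gridFTP).
    remote_in = teragrid_dir / "input.ser"
    shutil.copy(local_in, remote_in)
    # 3. Invoke the staged binary on the remote side.
    remote_out = teragrid_dir / "output.ser"
    staged_binary(remote_in, remote_out)
    # 4. Transfer the result file back (again standing in for gridFTP).
    local_out = gateway_dir / "output.ser"
    shutil.copy(remote_out, local_out)
    # 5. Deserialize and return the result in the client's native form.
    return pickle.loads(local_out.read_bytes())


def demo():
    with tempfile.TemporaryDirectory() as gw, \
            tempfile.TemporaryDirectory() as tg:
        return gateway_run({"values": [1.0, 2.0, 3.0]}, gw, tg)


if __name__ == "__main__":
    print(demo())
```

The client only ever sees gateway_run's arguments and return value, so the transfer and execution details stay hidden behind the gateway.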

Page 40: Workflows, Taverna, and Teragrid

Steps to establishing geWorkbench/caGrid/TeraGrid Interface

Page 41: Workflows, Taverna, and Teragrid

caGrid Security (GTS, Grid Grouper, Dorian, CDS)

http://www.cagrid.org/mwiki/index.php?title=GAARDS:Main

Page 42: Workflows, Taverna, and Teragrid

Workflow and Communications Between Services

Page 43: Workflows, Taverna, and Teragrid

Special Thanks

• caGrid (Security Services)

• Scott Oster

• Stephen Langella

• caGrid (gRAVi Plugin, Gateway Service)

• Ravi Madduri

• TeraGrid 08 for awarding us the Technology Track Best Paper Award

Page 44: Workflows, Taverna, and Teragrid

Demo and Discussions
