Apache Taverna

NERSC Workflow Day, Berkeley Lab, California 2015-02-20

http://taverna.incubator.apache.org/

Stian Soiland-Reyes, @soilandreyes, http://orcid.org/0000-0001-9842-9718

Donal Fellows, @donalfellows, http://orcid.org/0000-0002-9091-5938

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Taverna Workflow Ecosystem

• Workflow Language: SCUFL2 (and t2flow)
• Workflow Engine: Taverna
• Used in…
– Taverna Command Line Tool
– Taverna Server
– Taverna Workbench
• Allied services
– myExperiment, workflow repository
– Service Catalographer, service catalog software
  • Instantiated as BioCatalogue, BiodiversityCatalogue, …

Map of the Taverna Ecosystem

[Diagram. Labelled boxes: Taverna Workbench, Taverna Command Line Tool, Taverna Server (REST API, SOAP API), Taverna Player, Ruby client, Taverna Online, Taverna Lite, Taverna APIs, Taverna Engine, Taverna Core, UI Plugins, Activity Plugins, Components, Other Servers, Workflow Repository, Service Catalogs, many services…, Application-Specific Portals]

Taverna In Use

Users, Scientific Areas, Projects

Taverna Users Worldwide

Taverna Uses — Scientific Areas

• Biodiversity — BioVeL project

• Digital Preservation — SCAPE project

• Astronomy — AstroTaverna product

• Solar Wind Physics — HELIO project

• In silico Medicine — VPH-Share project

Biodiversity: BioVeL

• Virtual e-Laboratory for Biodiversity
– Service and knowledge commons
– Supporting biodiversity research
– Integrating with third-party applications
  • For example, IPython Notebook
• Portal for running production-grade workflows on users’ data
– Powered by Taverna Server
– Integration with major biodiversity databases
– Interaction support made to support

Digital Preservation: SCAPE

• Automated petabyte-scale digital collection maintenance
– Century of scanned newspapers
– Whole national radio/TV output
– Major Web archives
• Processing engine powered by Taverna
– Lift simple workflows to work at collection level
– Metadata management
– Semantic annotations and components for guided workflow construction

Astronomy: AstroTaverna

• Taverna plugin: IVOA (Virtual Observatory)
– Astronomy data services and tools
• Example workflow:
– List of galaxy names → Look up VO properties → Find similar/near galaxies → Add bibliography
• VOTable support (select/merge/split/..)
– Later adapted by bioinformatics community
• Projects: CANUBE, Wf4Ever, VAMDC, ER-Flow
• Taverna Workbench used on the desktop:
– IVOA service registry user interface
– Integrated with standalone astronomy tools (SAMP protocol): Aladin, TOPCAT

Astrophysics: HELIO

• Virtual laboratory for Solar Wind Science
– Observation catalogs
– Processing
– Data integration platform
• Taverna is workflow glue
– Taverna Server created to support
– Workflows manage catalog access
– Workflows manage data processing

Medicine and Physiology: VPH-Share

• Platform for computer-aided medicine
– Support for diagnosis and treatment prognosis
  • Osteoarthritis, Dementia, Liver disease, Cardiovascular disease
– Driven by specially-configured cloud instances
• Taverna is control and data management layer
– Coordinates processing within cloud instances
– User communication with cloud instances via Taverna interactions
  • Including complex 3D tasks

Inside the Taverna Ecosystem

Introduction to the Taverna Workflow Language and its Executors

The Basics of a Taverna Workflow

Input Ports (data in)

SOAP processor (web service call)

XML handling processors

Data Links (connect processors)

Output Ports (data out)

Example workflow: Get concept suggestions from term (Eelke van der Horst) http://www.myexperiment.org/workflows/4590.html
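
The parts labelled on this slide (input ports, processors, data links, output ports) can be modelled as plain data. The following is a minimal Python sketch of that structure only. The class names, field names and processor names are hypothetical and do not correspond to SCUFL2, t2flow or any Taverna API.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    inputs: list   # names of this processor's input ports
    outputs: list  # names of this processor's output ports


@dataclass
class Workflow:
    input_ports: list
    output_ports: list
    processors: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source, sink) pairs

    def add(self, proc):
        self.processors[proc.name] = proc

    def link(self, source, sink):
        """Record a data link from a source port to a sink port."""
        self.links.append((source, sink))

    def unconnected_sinks(self):
        """Processor inputs and workflow outputs that no data link feeds."""
        fed = {sink for _, sink in self.links}
        sinks = [f"{p.name}.{port}"
                 for p in self.processors.values() for port in p.inputs]
        sinks += self.output_ports
        return [s for s in sinks if s not in fed]


# Hypothetical three-step workflow: build XML, call a SOAP service, parse XML.
wf = Workflow(input_ports=["term"], output_ports=["suggestions"])
wf.add(Processor("build_request", ["term"], ["xml"]))
wf.add(Processor("soap_call", ["request"], ["response"]))
wf.add(Processor("parse_response", ["xml"], ["concepts"]))
wf.link("term", "build_request.term")
wf.link("build_request.xml", "soap_call.request")
wf.link("soap_call.response", "parse_response.xml")
wf.link("parse_response.concepts", "suggestions")

print(wf.unconnected_sinks())  # every sink is fed, so this prints []
```

Keeping links as a separate list, rather than storing them inside processors, mirrors the slide's view of data links as first-class parts of the workflow.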

Taverna Workflows

• Describe how data flows between processing nodes
– Control dependencies also supported
• Processing service nodes of various kinds
– Invoke programs (local or on cluster or grid or …)
– Call services (SOAP or REST)
– Read from and write to databases
– Transfer data
– Interact with the user
• Built-in parallelism and iteration
– Processes lists of data in parallel
• Large data usually handled by reference
– Avoids having to transfer it where not necessary
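
The "processes lists of data in parallel" point can be illustrated with a small Python sketch: a processor written for a single value is applied to every item when it is handed a list. This is a simplified illustration, not Taverna's implementation. The `invoke` helper is hypothetical, and it decides purely on whether the value is a list, which is an assumption made here for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke(processor, value, max_workers=4):
    """Call processor once for a single value, or once per item for a list."""
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map runs the calls concurrently and keeps the input order.
            return list(pool.map(
                lambda item: invoke(processor, item, max_workers), value))
    return processor(value)


print(invoke(str.upper, "atp"))            # ATP
print(invoke(str.upper, ["atp", "gtp"]))   # ['ATP', 'GTP']
print(invoke(len, [["a", "bb"], ["ccc"]])) # [[1, 2], [3]]
```

The workflow author writes the processor for one value, and the iteration over lists (including nested lists) happens outside it.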

Taverna Workflows can get complex…

Example workflow: BioVeL Population Model Construction and Analysis (Maria Paula Balcázar-Vargas, Jonathan Giddy and Gerard Oostermeijer) http://www.myexperiment.org/workflows/3684.html

Managing Workflow Complexity

• Subworkflows
– Put smaller workflows within larger ones
– Like using a user-defined function in a programming language
– Can hide contents of subworkflow
• Components
– “Black box” (but implemented with subworkflow)
– Semantically-annotated; described behaviour
– Like using a library in a programming language
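
The function analogy can be made concrete with a short Python sketch. The step names and the GC-content example are hypothetical and chosen only to show a larger workflow calling a subworkflow as a single step.

```python
def clean(sequence):
    """Inner step: normalise whitespace and case."""
    return sequence.strip().upper()


def gc_content(sequence):
    """Inner step: fraction of G and C characters."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def gc_subworkflow(raw):
    """Subworkflow: callers see one step, the inner steps stay hidden."""
    return gc_content(clean(raw))


def top_level(raw_sequences):
    """Larger workflow that uses the subworkflow as a single node."""
    return [round(gc_subworkflow(s), 2) for s in raw_sequences]


print(top_level([" gattaca ", "ggcc"]))  # [0.29, 1.0]
```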

Taverna Engine

• Executes (“enacts”) Taverna Workflows
• Pushes data through system in parallel
– Subject to limits described in workflow
• Processor nodes invoked when their data becomes available
– Turn inputs into outputs
• Captures detailed trace of what happened (“provenance”)
– Follows W3C PROV specification
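
The "invoked when their data becomes available" rule can be sketched as a small data-driven loop in Python. This is an illustration under stated assumptions, not the Taverna Engine: the `enact` function is hypothetical, it runs ready processors one after another rather than in parallel, and its trace records only borrow the words "used" and "generated" from W3C PROV without producing a PROV serialization.

```python
def enact(processors, inputs):
    """Run each processor once all the values it reads are available.

    processors: dict mapping name -> (function, list of value names it reads)
    inputs: dict of workflow input values
    Returns (values, trace).
    """
    values = dict(inputs)
    trace = []
    pending = dict(processors)
    while pending:
        ready = [name for name, (_, needs) in pending.items()
                 if all(key in values for key in needs)]
        if not ready:
            raise ValueError(f"unsatisfiable inputs for: {sorted(pending)}")
        for name in ready:
            func, needs = pending.pop(name)
            values[name] = func(*(values[key] for key in needs))
            # One record per invocation: what was read and what was produced.
            trace.append({"activity": name, "used": list(needs),
                          "generated": name})
    return values, trace


# Declared out of order on purpose: "total" has to wait for "doubled".
processors = {
    "total": (lambda a, b: a + b, ["doubled", "y"]),
    "doubled": (lambda x: x * 2, ["x"]),
}
values, trace = enact(processors, {"x": 3, "y": 4})
print(values["total"])                       # 10
print([step["activity"] for step in trace])  # ['doubled', 'total']
```

Execution order follows data availability rather than declaration order, which is why the trace lists `doubled` first.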

Taverna Command Line Tool

• Simple wrapper round Taverna Workflow Engine

• Inputs as simple files

• Outputs as directory structure

• Provenance packaged in Research Object

– ZIP Archive

– Inputs, Outputs, Intermediate values

– Workflow, Provenance, Overall metadata
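
Because the provenance package is a ZIP archive, it can be inspected with standard tools. The Python sketch below builds a small stand-in archive in memory and groups its members by top-level folder. The file names and layout are hypothetical and chosen for illustration; the actual bundle layout written by the Command Line Tool may differ.

```python
import io
import zipfile

# Hypothetical layout, for illustration only.
EXAMPLE_ENTRIES = {
    "inputs/term.txt": "apoptosis",
    "outputs/suggestions.txt": "cell death",
    "intermediates/response.xml": "<response/>",
    "workflow.t2flow": "<workflow/>",
    "provenance.ttl": "# PROV trace",
    "metadata.json": "{}",
}


def build_bundle(entries):
    """Write the entries into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, text in entries.items():
            archive.writestr(path, text)
    buffer.seek(0)
    return buffer


def summarize(bundle):
    """Group archive members by top-level folder ('.' for root files)."""
    groups = {}
    with zipfile.ZipFile(bundle) as archive:
        for name in archive.namelist():
            folder = name.split("/", 1)[0] if "/" in name else "."
            groups.setdefault(folder, []).append(name)
    return groups


summary = summarize(build_bundle(EXAMPLE_ENTRIES))
for folder in sorted(summary):
    print(folder, summary[folder])
```

With a real bundle, `summarize` can be given a file path instead of the in-memory buffer, since `zipfile.ZipFile` accepts both.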

Taverna Server

• Extends Workflow Engine to work for multiple simultaneous users

– Isolates workflows from each other

– Allows asynchronous usage

– Manages resources

– Clients can be in any language, not just Java

• Designed to sit behind a Portal

– User interfaces are domain-specific
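
The multi-user, asynchronous model can be sketched with an in-process toy in Python. This is not the Taverna Server API: `ToyRunManager`, its methods and its status strings are hypothetical, and a workflow is represented by a plain function. It only shows the pattern of submitting a run, checking on it later, and keeping each user's runs private.

```python
import threading
import uuid


class ToyRunManager:
    """In-process stand-in for a multi-user workflow server."""

    def __init__(self):
        self._runs = {}
        self._lock = threading.Lock()

    def submit(self, user, workflow, inputs):
        """Start a run in the background and return its identifier at once."""
        run_id = str(uuid.uuid4())
        run = {"user": user, "status": "Operating", "outputs": None}
        thread = threading.Thread(target=self._execute,
                                  args=(run, workflow, dict(inputs)))
        run["thread"] = thread
        with self._lock:
            self._runs[run_id] = run
        thread.start()
        return run_id

    def _execute(self, run, workflow, inputs):
        outputs = workflow(inputs)
        with self._lock:
            run["outputs"] = outputs
            run["status"] = "Finished"

    def _get(self, user, run_id):
        with self._lock:
            run = self._runs.get(run_id)
        # A run is only visible to the user who submitted it.
        if run is None or run["user"] != user:
            raise PermissionError("no such run for this user")
        return run

    def status(self, user, run_id):
        return self._get(user, run_id)["status"]

    def outputs(self, user, run_id):
        return self._get(user, run_id)["outputs"]

    def wait(self, user, run_id):
        self._get(user, run_id)["thread"].join()


manager = ToyRunManager()
run_id = manager.submit("alice",
                        lambda data: {"greeting": "Hello " + data["name"]},
                        {"name": "Taverna"})
manager.wait("alice", run_id)
print(manager.status("alice", run_id))   # Finished
print(manager.outputs("alice", run_id))  # {'greeting': 'Hello Taverna'}
```

A portal sitting in front of such a service would call `submit` on the user's behalf and poll `status`, which is the shape of interaction the slide describes.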

Taverna Server Architecture

[Diagram. Labelled boxes: Tomcat Container + CXF Framework, Taverna Server Webapp, Common System Model, Per User File Manager, Per-Run Taverna Workflow Engine, Common Management Model, Selected Notification Endpoints, Management Interface (separate auth), Deployment Host, Web Portal, Ruby Client, Taverna Workbench (forthcoming), Processing Service, Catalog Services, Storage Services]

Taverna Workbench

• IDE for Taverna Workflows

• Design workflows

• Run workflows

• Analyze workflows

• Access workflow repository

Taverna Online

Web IDE for Taverna

The Future of Taverna

Apache Taverna and Future Releases

Apache Software Foundation

• Non-profit organization, forming a community of open-source software projects.
• Strong emphasis on openness, collaboration and a consensus-based development process.
• Examples:
– Apache HTTP Server, Tomcat, Maven, Hadoop, OpenOffice, Subversion

Why Apache Taverna?

• Open development: Everything on mailing list

• Engagement: Encourage developer involvement – not just making plugins

• Independence: Apache Taverna is an independent project – Not a “Manchester thing”

• Shared ownership: equal participation

• Sustainability: self-managed community

Apache Incubator

Gradually becoming an Apache project

• Intellectual Property assigned to ASF

– License changed to Apache License 2.0

• Infrastructure change – everything at *.apache.org

• Community building – growing developer base

• Mentoring on the “Apache Way” by volunteers from other Apache projects

Taverna Releases

• Current stable release: Taverna 2.5
– Command Line (2.5.1), Server (2.5.4), Workbench (2.5.1)
• http://www.taverna.org.uk/download/
• Taverna 3 Release plan:
– Apache Taverna Language
  • API for workflow definitions
– Apache Taverna Engine & Command Line
  • Can also run workflows from Taverna 2 Workbench
– Apache Taverna Server
– Apache Taverna Workbench

Try Taverna!

• Get Taverna:
– http://taverna.org.uk/download/
• Documentation:
– http://www.taverna.org.uk/documentation/taverna-2-x/
• Code:
– http://taverna.incubator.apache.org/code/
• Getting involved:
– http://taverna.incubator.apache.org/community/
