Taverna, myExperiment and BioCatalogue: Workflow Tools for Informatics Integration Dr Katy Wolstencroft School of Computer Science University of Manchester


Page 1

Taverna, myExperiment and BioCatalogue: Workflow Tools for Informatics Integration

Dr Katy Wolstencroft

School of Computer Science

University of Manchester

Page 2

Taverna Workflows

• Interoperability, Integration and Collaboration
• Access to distributed and local resources
• Iteration over data sets
• Automation of data flow
• Agile software development
• Extensible
• Experimental protocols
• Part of the myGrid toolkit

Page 3

What is myGrid?

An e-Science Collaboration Since 2001

• Software ● Services ● Content ● Skills ● Community
• Manchester, Southampton, Oxford and the EMBL-EBI, plus an alliance of international contributing projects and partners
• Sustainable production-level quality
  – Open Middleware Infrastructure Institute UK
  – Software Sustainability Institute
  – Mixture of developers, bioinformaticians and researchers
• Open source development and content (LGPL or BSD)

Page 4

Connecting Things Together

• Data Resources
  – Genome databases
  – Kinetic/metabolite data
• Analysis tools
  – Sequence alignment
  – Similarity searching
  – Pattern matching
• Knowledge Resources
  – Ontologies
  – Controlled vocabularies

Page 5

A Collection of Components

• Create and run workflows
• Share, discover and reuse workflows
• Manage the metadata needed and generated (RDF, OWL)
• Discover and reuse services (Feta)

Page 6

What is a Workflow?

• Set of services (web services, RESTful, local scripts, other workflows)
• Set of data links between services - “put output X from service A as input Y to service B”
  – If needed: List handling, control links
• This can be called a data-oriented workflow (dataflow)
  – Say where you want the data to flow instead of what you want to do
  – Compare with more procedural workflow languages like BPEL
• Beneficial way of thinking for much data-driven scientific research
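
The dataflow idea on this slide can be sketched in a few lines of Python. This is an illustration only, not Taverna's workflow format or API: the `Dataflow` class, the service names, the port names and the `DEMO1` record are all hypothetical. A workflow is held as a set of services plus data links of the form "output X of service A feeds input Y of service B", and the execution order is derived from the links rather than written out by hand.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Dataflow:
    """Hypothetical toy dataflow: services plus data links (not Taverna's API)."""

    def __init__(self):
        self.services = {}  # name -> function(**inputs) returning a dict of outputs
        self.links = []     # (src_service, src_port, dst_service, dst_port)

    def add_service(self, name, func):
        self.services[name] = func

    def link(self, src, src_port, dst, dst_port):
        """Put output `src_port` of `src` as input `dst_port` of `dst`."""
        self.links.append((src, src_port, dst, dst_port))

    def run(self, workflow_inputs):
        # workflow_inputs: {service: {port: value}} for ports with no incoming link.
        deps = {name: set() for name in self.services}
        for src, _, dst, _ in self.links:
            deps[dst].add(src)
        outputs = {}
        # The order of execution follows from the data links alone.
        for name in TopologicalSorter(deps).static_order():
            inputs = dict(workflow_inputs.get(name, {}))
            for src, src_port, dst, dst_port in self.links:
                if dst == name:
                    inputs[dst_port] = outputs[src][src_port]
            outputs[name] = self.services[name](**inputs)
        return outputs


def fetch_sequence(accession):
    # Stand-in for a remote database lookup; the record is made up.
    return {"sequence": {"DEMO1": "ATGCCG"}[accession]}


def gc_content(sequence):
    gc = sum(base in "GC" for base in sequence)
    return {"fraction": gc / len(sequence)}


wf = Dataflow()
wf.add_service("gc", gc_content)        # registration order does not matter
wf.add_service("fetch", fetch_sequence)
wf.link("fetch", "sequence", "gc", "sequence")
results = wf.run({"fetch": {"accession": "DEMO1"}})
print(results["gc"]["fraction"])
```

Note that the caller never states "run fetch, then gc". That ordering is a consequence of the single data link, which is the contrast with procedural workflow languages drawn on the slide.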

Page 7

Kepler

Triana

BPEL

Ptolemy II

Taverna

Page 8

Taverna

Open source and extensible

Workflow diagram
Tree view of workflow structure
Available services

Page 9

Taverna GUI and Enactor

Graphical Workbench
• Drag and drop interface
• Plug-in architecture
• Nested Workflows

Workflow Enactor
• Local and remote enactor
• Implicit iteration over data collections
• Automation of data flow
• Logging and data provenance tracking

Taverna Remote Execution service (T-REX)
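
Implicit iteration can be pictured as follows: when a service that expects a single item is handed a list, the enactor calls it once per element and collects the results into a list of the same shape. The Python below is a minimal sketch of that idea only. `run_with_implicit_iteration` is a hypothetical helper, not part of Taverna, and it assumes a service with one single-item input.

```python
def run_with_implicit_iteration(service, value):
    """Call `service` once per item if `value` is a (possibly nested) list.

    Hypothetical helper that mimics implicit iteration for a service
    with exactly one single-item input.
    """
    if isinstance(value, list):
        return [run_with_implicit_iteration(service, item) for item in value]
    return service(value)


def sequence_length(sequence):
    # Stand-in for any service that handles one item at a time.
    return len(sequence)


single = run_with_implicit_iteration(sequence_length, "ATGC")
many = run_with_implicit_iteration(sequence_length, ["ATGC", "AT", "ATGCCG"])
nested = run_with_implicit_iteration(sequence_length, [["A"], ["AT", "ATG"]])
print(single, many, nested)
```

The workflow author wires the service once; whether it runs one time or many depends only on the shape of the data that arrives.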

Page 10

Taverna http://www.taverna.org.uk

Software Release
• Taverna first released 2004
• Current versions: Taverna 1.7.2 and Taverna 2.1.2
• Currently 1500+ users per month, 350+ organizations, ~40 countries, 80000+ downloads across versions

Availability
• Freely available, open source LGPL
• Windows, Mac OS, and Linux

Resources
• http://www.taverna.org.uk, http://www.mygrid.org.uk
• User and developer workshops, documentation, email help desk
• Collaborations with numerous groups including NCI’s cancer biomedical informatics grid (caBIG), EMBL-EBI, NCBI, Concept Web Alliance, Bio2RDF

Page 11

What types of service?

• WSDL Web Services
• BioMart
• R-processor
• BioMoby
• Soaplab
• Grid Services
• Local Java services
• Beanshell
• Workflows
• Coming soon: new REST support

Page 12

Who Provides the Services?

• Open domain services and resources
• Taverna accesses 3500+ services (11,874 operations)
• Third party – we don’t own them – we didn’t build them
• All the major providers
  – NCBI, DDBJ, EBI …
• Enforce NO common data model.
• Quality Web Services considered desirable

Page 13

What do Scientists use Taverna for?

Astronomy
Music
Meteorology
Social Science
Cheminformatics

Page 14

Taverna Adoption

UK Institutes
Systems Biology
International Institutes
International Networks
Universities
Projects
Lots of Universities

Page 15

Hypothesis Construction and Explanation from the Literature
my BioAID, Vl-e

Manipulation of SBML models in workflows

Pharmacogenomics
Association study of Nevirapine-induced skin rash in Thai Population

Data Warehousing
tGRAP Database

Rescue

Page 16

Shared Genomics

Genome-wide SNP Analysis

• Analysis over compute clusters
• Automate annotation of results
• Mine annotation data for patterns

[Hoyle]

Page 17

Taverna Grid Use Cases

– KnowArc – The Grid-enabled Know-how Sharing Technology Based on ARC Services and Open Standards
– caGrid – US Cancer Research project
– Moteur – A medical imaging project running on EGEE

Page 18

Lymphoma Prediction Workflow

Use gene-expression patterns associated with two lymphoma types to predict the type of an unknown sample.

Workflow stages (from the diagram): Microarray from tumor tissue, Microarray preprocessing, Lymphoma prediction

Services (from the diagram): caArray, GenePattern

Wei Tan, Univ. Chicago

Ack. Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI), Jared Nedzel (MIT)

Page 19

caGrid Plugin for Taverna

• Taverna support for GAARDS-secured caGrid services

• Wrap existing 3rd party services (that are used by existing Taverna users) for caGrid and annotate them to match compatibility guidelines

• Enables discovery of services in caGrid service registry

Lymphoma type prediction workflow

Page 20

Genotype Phenotype Studies

• Mouse whipworm infection - parasite model of the human parasite - Trichuris trichiura

Understanding Phenotype
• Comparing resistant vs susceptible strains – Microarrays

Understanding Genotype
• Mapping quantitative traits – Classical genetics QTL

Joanne Pennock, Richard Grencis
University of Manchester

Page 21

Workflow Results

• Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.

• Manual experimentation: Two year study of candidate genes, processes unidentified

Joanne Pennock, Richard Grencis
University of Manchester

Page 22

Workflow Results

• Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.

• Manual experimentation: Two year study of candidate genes, processes unidentified

• JO IS A LAB BIOLOGIST

• JO HAS NEVER BUILT A WORKFLOW

Joanne Pennock, Richard Grencis
University of Manchester

Page 23

Trypanosomiasis Study

Understanding Phenotype
• Comparing resistant vs susceptible strains – Microarrays

Understanding Genotype
• Mapping quantitative traits – Classical genetics QTL

Integrated microarray data, genomic sequences, pathway data, literature mining.

Identified a pathway for which its correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance

Paul Fisher, et al. Nucleic Acids Research, 2007, 35(16)

http://www.youtube.com/watch?v=x83pzMMw7lk
http://www.youtube.com/watch?v=Y6_Kz5L010g

Page 24

Page 25

Just Enough Sharing….

• myExperiment can provide a central location for workflows from one community/group
• myExperiment allows you to say
  – Who can look at your workflow
  – Who can download your workflow
  – Who can modify your workflow
  – Who can run your workflow
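
The four questions on this slide (who can look at, download, modify and run a workflow) map naturally onto a small permission model. The Python below is a sketch under stated assumptions, not myExperiment's actual data model: the `Permission` and `SharedWorkflow` names, the user names, and the rule that the owner can always do everything are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    VIEW = auto()
    DOWNLOAD = auto()
    MODIFY = auto()
    RUN = auto()


@dataclass
class SharedWorkflow:
    title: str
    owner: str
    # Per-user grants; anyone not listed falls back to `everyone_else`.
    grants: dict = field(default_factory=dict)
    everyone_else: Permission = Permission.NONE

    def allows(self, user, permission):
        if user == self.owner:
            return True  # assumption: owners keep full control
        granted = self.grants.get(user, self.everyone_else)
        return permission in granted


workflow = SharedWorkflow(
    title="QTL analysis",
    owner="alice",
    grants={"bob": Permission.VIEW | Permission.RUN},
    everyone_else=Permission.VIEW,
)

print(workflow.allows("bob", Permission.RUN))       # bob may run it
print(workflow.allows("bob", Permission.MODIFY))    # but not change it
print(workflow.allows("carol", Permission.VIEW))    # everyone else may only look
```

Keeping the four permissions independent is what makes "just enough sharing" possible: a collaborator can be allowed to run a workflow without being able to download or alter it.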

Page 26

Ownership and Attribution

The most important aspect of myExperiment - Designed by scientists

Page 27

Packs

• Packs allow you to collect different items together, like you might with a "wish list" or "shopping basket"

• You can collect internal things (such as workflows, files and even other packs) as well as link to things outside myExperiment

• Your packs can then be shared, tagged, discovered and discussed easily on myExperiment
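
A pack as described on this slide is essentially a small recursive container: internal items (which may themselves be packs), links to external things, and tags. The Python below is a hypothetical sketch of that shape, not myExperiment's implementation. The `Pack` class, its field names and the example entries are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    title: str
    items: list = field(default_factory=list)  # workflows, files, or other Packs
    links: list = field(default_factory=list)  # URLs of things outside myExperiment
    tags: set = field(default_factory=set)

    def all_items(self):
        """Flatten nested packs into a list of non-pack items."""
        flat = []
        for item in self.items:
            if isinstance(item, Pack):
                flat.extend(item.all_items())
            else:
                flat.append(item)
        return flat


inner = Pack("Preprocessing", items=["normalise.t2flow", "sample-data.csv"])
outer = Pack(
    "Microarray study",
    items=["analysis.t2flow", inner],
    links=["http://www.taverna.org.uk"],
    tags={"microarray"},
)
print(outer.all_items())
```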

Page 28

Bringing myExperiment to the Taverna User

myExperiment Plugin in Taverna

Page 29

Running Workflows Through myExperiment

Taverna Remote Execution (T-REX)

Page 30

All accepted Friendships including accepted-at time

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
select ?friend1 ?friend2 ?acceptedat
where {
  ?z rdf:type <http://rdf.myexperiment.org/ontology#Friendship> .
  ?z myexp:has-requester ?x .
  ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y .
  ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}

SIOC: Semantically-Interlinked Online Communities
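
To make the shape of the query concrete, the sketch below evaluates the same graph pattern over a handful of hand-written triples in plain Python. The triples are invented for illustration and are not myExperiment data, the `ex:` identifiers and names are hypothetical, and a real client would send the SPARQL text to an endpoint rather than join triples by hand.

```python
MYEXP = "http://rdf.myexperiment.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC_NAME = "http://rdfs.org/sioc/ns#name"

# Invented sample triples: (subject, predicate, object).
triples = [
    ("ex:friendship1", RDF_TYPE, MYEXP + "Friendship"),
    ("ex:friendship1", MYEXP + "has-requester", "ex:user1"),
    ("ex:friendship1", MYEXP + "has-accepter", "ex:user2"),
    ("ex:friendship1", MYEXP + "accepted-at", "2009-06-01T10:00:00Z"),
    ("ex:user1", SIOC_NAME, "Ann"),
    ("ex:user2", SIOC_NAME, "Ben"),
    # A pending friendship: no accepted-at triple, so it must not match.
    ("ex:friendship2", RDF_TYPE, MYEXP + "Friendship"),
    ("ex:friendship2", MYEXP + "has-requester", "ex:user2"),
    ("ex:friendship2", MYEXP + "has-accepter", "ex:user1"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def accepted_friendships():
    """Join the six triple patterns of the query, one nested loop per pattern."""
    rows = []
    for z, p, o in triples:
        if p != RDF_TYPE or o != MYEXP + "Friendship":
            continue
        for x in objects(z, MYEXP + "has-requester"):
            for friend1 in objects(x, SIOC_NAME):
                for y in objects(z, MYEXP + "has-accepter"):
                    for friend2 in objects(y, SIOC_NAME):
                        for accepted_at in objects(z, MYEXP + "accepted-at"):
                            rows.append((friend1, friend2, accepted_at))
    return rows


rows = accepted_friendships()
print(rows)
```

Because every pattern in the query is required, a friendship with no `accepted-at` value drops out of the results, which is how the query restricts itself to accepted friendships.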

Page 31

Service Discovery

There are thousands of distributed services. How do we find an appropriate one?

• We need to annotate services by their functions (and not their names!)

• The services might be distributed, but a registry of service descriptions can be central and queried
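
The two points on this slide, annotate services by function and keep the descriptions in one queryable registry, can be sketched as a small in-memory catalogue. This is an illustration under stated assumptions, not the BioCatalogue or Feta data model: the `ServiceDescription` and `Registry` classes, the function terms and the `example.org` endpoints are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    name: str             # provider-chosen, often uninformative
    endpoint: str         # where the distributed service actually lives
    functions: frozenset  # what the service does, ideally from a controlled vocabulary


class Registry:
    """Central store of descriptions; the services themselves stay remote."""

    def __init__(self):
        self._descriptions = []

    def register(self, description):
        self._descriptions.append(description)

    def find(self, *required_functions):
        """Return descriptions annotated with every requested function."""
        wanted = set(required_functions)
        return [d for d in self._descriptions if wanted <= d.functions]


registry = Registry()
registry.register(ServiceDescription(
    "runJob42", "http://example.org/a",
    frozenset({"sequence alignment", "protein"})))
registry.register(ServiceDescription(
    "svc_x", "http://example.org/b",
    frozenset({"similarity searching", "nucleotide"})))
registry.register(ServiceDescription(
    "align2", "http://example.org/c",
    frozenset({"sequence alignment", "nucleotide"})))

matches = registry.find("sequence alignment", "nucleotide")
print([m.name for m in matches])
```

None of the service names says what the service does, yet the search still succeeds because it runs over the functional annotations.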

Page 32

BioCatalogue
www.biocatalogue.org

• A “Web 2.0” catalogue for sharing, discovering and monitoring web services for the Life Sciences.
• Community and expert curation
• Community and provider contribution
• Launched mid 2009.
• Currently: 370+ members, 1700+ services, 11,870+ operations
• 110+ providers, 110+ different countries

REST APIs
Linked Open Data
Software: open source (BSD)

Page 33

Data and Provenance

• Workflows can generate vast amounts of data - how can we manage and track it?

• We need to manage data AND metadata AND experimental provenance

• Scientists need to check back over past results, compare workflow runs and share workflow runs with colleagues

• Scientists need to look at intermediate results when designing and debugging
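
A minimal way to meet the needs listed on this slide is to record, for every service invocation in a run, what went in and what came out. The Python below is a sketch of that idea only. It is not Taverna's provenance store, and the `Invocation`, `RunRecord` and `diff_runs` names, the two toy services and their data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    service: str
    inputs: dict
    outputs: dict


@dataclass
class RunRecord:
    workflow: str
    invocations: list = field(default_factory=list)

    def call(self, service_name, func, **inputs):
        """Run one step and record its inputs and outputs."""
        outputs = func(**inputs)
        self.invocations.append(Invocation(service_name, dict(inputs), outputs))
        return outputs

    def intermediate(self, service_name):
        """Recorded outputs of one step, for inspection while debugging."""
        return [i.outputs for i in self.invocations if i.service == service_name]


def diff_runs(run_a, run_b):
    """Names of steps whose recorded outputs differ between two runs."""
    a = {i.service: i.outputs for i in run_a.invocations}
    b = {i.service: i.outputs for i in run_b.invocations}
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


def normalise(values):
    total = sum(values)
    return {"scaled": [v / total for v in values]}


def threshold(scaled, cutoff):
    return {"kept": [v for v in scaled if v >= cutoff]}


def run(cutoff):
    record = RunRecord("demo")
    scaled = record.call("normalise", normalise, values=[1, 1, 2])["scaled"]
    record.call("threshold", threshold, scaled=scaled, cutoff=cutoff)
    return record


run_a = run(0.3)
run_b = run(0.2)
print(run_a.intermediate("normalise"))
print(diff_runs(run_a, run_b))
```

With the record in place, looking at intermediate results and comparing two runs become simple queries over stored data rather than a rerun of the experiment.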

Page 34

Provenance

• Another slide here
• Screenshot of provenance view

Page 35

myGrid Open Suite of Tools

Client User Interfaces
Workflow GUI Workbench
Workflow Repository
Service Catalogue
Third Party Tools
Programming and APIs
Web Portal
Activity and Service Plug-in Manager
Provenance Store
Workflow Server
Open Provenance Model
Secure Service Access

Page 36

Toolkits “Taverna Inside”

Workflows under the hood
• e-Laboratories (portals)
  – Systems Biology, e-Health
• Web based execution
  – Running workflows over the web through myExperiment
• Visualisation clients that call workflows in the background

Page 37

Open e-Lab Platforms

• Customised myExperiment instances
  – Australian Kepler Repository
  – eStat, NeuroHub, Nema
  – SpaceBook, HPC/NA
  – Microsoft Trident
• BioCatalogue installations
  – Emory – ed unify project
  – Eli Lilly

SysMO-SEEK
• e-Laboratory for interlinking and sharing data, models, SOPs and workflows for Systems Biology in Europe
• ISA-TAB & SBML/MIRIAM compliant

Page 38

Current Work

Page 39

Taverna 2.2

Released end June

• Workflow diagnostics and error resolution
• Retry and parallelisation
• Stop/pause/resume workflows
• Intermediate results display

Page 40

Taverna Roadmap

• Next Generation Workbench
• Access to service, data and workflow repositories
• More data driven
• Component families for vertical markets
• Workflow Patterns
• Taverna from Excel

“myGrid-in-a-Box” – Virtualised Taverna server deployment and distribution, bundle of myExperiment, BioCatalogue and database/tools components.

Page 41

Taverna Labs

• Semantic Taverna
  – Semantic provenance
    • Open Provenance Model
  – Linked Open Data
    • Dutch NBIC Aida toolkit
  – Automated workflow planning through reasoning
    • e-Lico with U Zurich and Rapid-Miner
• Taverna in the Cloud
• Blogging the lab book
  – Blog3 with Southampton U

Page 42

Training

• Tutorials and Training
  – 58+ tutorials to >900 people.
  – >20 universities, Life Science institutes, and networks.
  – Major Bio conferences
  – Summer schools in Biology and Middleware.
• Developer and User Days
  – Annotation Jamborees
• Undergraduate and Postgraduate Bioinformatics in > 30 universities.

Page 43

Page 44

More Information

• myGrid
  – http://www.mygrid.org.uk
• Taverna
  – http://www.taverna.org.uk
• myExperiment
  – http://www.myexperiment.org
  – http://wiki.myexperiment.org
• BioCatalogue
  – http://www.biocatalogue.org
  – http://beta.biocatalogue.org