biocatalogue talk slides


Upload: biocatalogue

Post on 11-May-2015


DESCRIPTION

BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents the BioCatalogue and its goals.

TRANSCRIPT

Page 1: Biocatalogue Talk Slides

BioCatalogue

Joint project:

Aim: Create a registry of annotated biological web services

&

Funded by:

Page 2: Biocatalogue Talk Slides

Timeline and Approach

• Started 1st June
• 6 months Pilot 1
• Perpetual beta

• “BioCatalogue-Friends” focus group

• Extensible software
• Built to be evolved and to be scaled.

Page 3: Biocatalogue Talk Slides

In the Wild Cloud Data Services

Major data centres: EMBL-EBI, UK; DDBJ, Japan; NCBI, USA; PDBJ, Japan

Smaller projects and databases
o Kanehisa Laboratory, Kyoto, Japan
o myGrid, Manchester, UK
o BASIS, University of Newcastle, UK
o Biomolecular Interaction Network Database, BIND, University of Toronto, Canada
o GeneCruiser, Broad Institute, Harvard-MIT, USA
o Genomics and Bioinformatics Group: Lab of Molecular Pharmacology, USA
o BioMoby
o Virginia Bioinformatics Institute, USA
o Center for Biological Sequence Analysis, CBS, Technical University of Denmark
o Helmholtz open bioinformatics technology, Germany
o Information Hyperlinked over Proteins, iHOP
o SIGENAE project, France
o The Nottingham Arabidopsis Stock Centre, NASC, UK
o Bioinformatics Competence Center Braunschweig, Germany
o Gene Ontology visualisation, Goviz
o Bioinformatics group, Italy
o The National Centre for Text Mining, NaCTeM
o Centro de Ciencias Genómicas, UNAM, Mexico
o e-Fungi, Manchester, UK
o FUGE bioinformatics platform, Norway
o Institute of Bioinformatics, Tsinghua University, China
o EMAP, Edinburgh Mouse Atlas Project, UK
o The Chemical Informatics and Cyberinfrastructure Collaboratory (CICC), Indiana University, US, ChemSpider

http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices

Variable sustainable stewardship

Page 4: Biocatalogue Talk Slides

Curate Processes

A repository
• A means to pool, discover and reuse workflows
• A means to curate workflows
• A platform for workflow monitoring and analytics

A registry
• A means to pool metadata about services in the wild
• A means to discover and reuse those services
• A means to curate services
• A platform for service monitoring and analytics
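To make the registry idea concrete, here is a minimal sketch in Python. It is not the BioCatalogue implementation: the class names, fields and the example endpoint URL are all hypothetical, chosen only to show that a registry pools metadata about services hosted elsewhere and supports curation (tagging) and discovery (search).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ServiceEntry:
    """Metadata about one externally hosted service (all fields hypothetical)."""
    name: str
    endpoint: str
    description: str = ""
    tags: Set[str] = field(default_factory=set)


class Registry:
    """Pools metadata about services; it does not host the services themselves."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def annotate(self, name: str, *tags: str) -> None:
        # Curation: add lower-cased tags to an existing entry.
        self._entries[name].tags.update(t.lower() for t in tags)

    def find(self, query: str) -> List[str]:
        # Discovery: substring match on name and description, exact match on tags.
        q = query.lower()
        return sorted(
            e.name for e in self._entries.values()
            if q in e.name.lower() or q in e.description.lower() or q in e.tags
        )


registry = Registry()
# The endpoint below is a placeholder, not a real service address.
registry.register(ServiceEntry("emma", "http://example.org/emma?wsdl",
                               "Multiple sequence alignment"))
registry.annotate("emma", "ClustalW", "alignment")
print(registry.find("clustalw"))  # ['emma']
```

The tag added during curation is what makes the service findable under a name that appears nowhere in its own interface.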

Page 5: Biocatalogue Talk Slides

Service and Workflow analytics and network analysis

Recommendations and co-use.

Social networks of third party externally hosted services

Automated diagnostics, monitoring and metadata curation

Page 6: Biocatalogue Talk Slides

Finding and Curating Services

http://www.biocatalogue.org

Drawing on 6 years' experience in Taverna of semantic annotation of services using RDF and OWL ontologies.

Drawing on experience at EBI in service provision.

First pilot early November 2008, will cover major providers (EBI, NCBI, DDBJ) at “bronze” quality and show some at platinum.

Page 7: Biocatalogue Talk Slides

Web Services in the Wild

Findable?
• The clustalw program from Emboss is called ‘emma’

Executable?
• WSDL / WADL / W*DL
• Other kinds of services?

Understandable?
• Input0:string, Output0: string
• What does the polymorphic SeqRet actually do?
• Example data? Parameter configurations? Input-Output correlations?
• Poorly documented black boxes.

Usable?
• Quality of Service, monitoring, robustness
• Stability and dependability
• Licensing

Page 8: Biocatalogue Talk Slides

Writing Reusable stuff is DIFFICULT

Predicting the unknown required by the unknown.

Scientists and Developers are under pressure and naughty.

Page 9: Biocatalogue Talk Slides

Services Mutability and Preservation
• Services are in constant and often silent change.
• Dynamic and Unstable.
• Metadata decay (esp. on services instances).
• Workflow Decay.
• Monitoring and Repair.
• BioNanny.
• Implications for preservation not fossilisation.
• Implications for sustainability.
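One simple way to notice silent change is to fingerprint each service's interface description and compare it with the fingerprint recorded at the last check. The sketch below illustrates that idea only; it is not how BioNanny works, and the function names, service names and sample descriptions are hypothetical. Fetching the descriptions over the network is left out.

```python
import hashlib


def fingerprint(description: str) -> str:
    """Hash a service description with whitespace normalised."""
    normalised = " ".join(description.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(known: dict, current: dict) -> dict:
    """Compare stored fingerprints with freshly fetched descriptions.

    known:   service name -> fingerprint recorded at the last check
    current: service name -> description text fetched now (None if unreachable)
    """
    report = {}
    for name, old_print in known.items():
        text = current.get(name)
        if text is None:
            report[name] = "unreachable"
        elif fingerprint(text) != old_print:
            report[name] = "changed"
        else:
            report[name] = "unchanged"
    return report


# Hypothetical descriptions standing in for fetched WSDL documents.
v1 = "<definitions><operation name='run'/></definitions>"
v2 = "<definitions><operation name='runJob'/></definitions>"

known = {name: fingerprint(v1) for name in ("align", "search", "fetch")}
current = {"align": v1 + "\n", "search": v2, "fetch": None}

report = detect_changes(known, current)
print(report)
```

Here `align` is reported as unchanged (only trailing whitespace differs), `search` as changed (an operation was renamed) and `fetch` as unreachable. A hash only says that something changed, not what, so a real monitor would still need a diff or a human to decide whether the metadata has decayed.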

Page 10: Biocatalogue Talk Slides

Workflows and Services

Curation by Experts

Social Curation by the Crowd

Self-Curation by Contributors

Automated Curation

(Diagram arrows from each of the four curation sources: seed, refine, validate.)

Page 11: Biocatalogue Talk Slides

Multiple Annotation Profiles

User Profile

Service Profile

Group Profile

Profile Annotation

Ranking Functions

Page 12: Biocatalogue Talk Slides

Curation Model

Quantitative Content

Tags

Service Model

Semantic Content Model

Ontologies

Functional

Provenance

Operational

Operational Metrics

Conditions of Use

Social Standing

Service Profile

6 facets

Versioning

QoS

Usage

Page 13: Biocatalogue Talk Slides

A.N. Other

Curation

Quant’ve

Service Model

Semantic Content Model

Execution at Host

Service Profile

Finding

WSDL

WADL

S-A.N. Other

SAWSDL

SA-REST

Analytics

Ranking

Browse/Shop

Search

Customised

Services

Workflows

Monitoring

Profiles

Page 14: Biocatalogue Talk Slides

Services

Interface

Neutral

Functional

Conditions of Use

Operational

Social Standing

Operational Metrics

Provenance

Service Profile Facets
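The six facets can be pictured as six annotation buckets attached to each service. The following Python sketch is an assumption about how such a profile might be modelled, not the BioCatalogue data model: the field names simply mirror the facet labels on the slide, and the sample annotations are invented.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class ServiceProfile:
    """One annotation bucket per facet named on the slide."""
    functional: Dict[str, str] = field(default_factory=dict)
    operational: Dict[str, str] = field(default_factory=dict)
    operational_metrics: Dict[str, str] = field(default_factory=dict)
    provenance: Dict[str, str] = field(default_factory=dict)
    conditions_of_use: Dict[str, str] = field(default_factory=dict)
    social_standing: Dict[str, str] = field(default_factory=dict)


def missing_facets(profile: ServiceProfile) -> List[str]:
    """Return the facets that carry no annotations yet."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Invented annotations for illustration only.
profile = ServiceProfile(
    functional={"task": "multiple sequence alignment"},
    conditions_of_use={"licence": "unknown"},
)
print(missing_facets(profile))
# ['operational', 'operational_metrics', 'provenance', 'social_standing']
```

Listing the empty facets gives curators a cheap view of where a description is still thin.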

Page 15: Biocatalogue Talk Slides

Services

Interface

Neutral

Functional

Conditions of Use

Operational

Social Standing

Operational Metrics

Provenance

Multiply described Third Party

Aggregated Feeds

Monitoring

Multiple Sources

Multiple Versions

Dynamic

Multiple Instances

Discovery

Interoperability

Composition

Reuse

Trusted Authorities

Policies

Ontologies

Controlled Vocabularies

Tags

Free text

Folksonomies

Standards

W*DL

Atom

Schemas

Page 16: Biocatalogue Talk Slides

Services

Interface

Neutral

Functional

Conditions of Use

Operational

Social Standing

Operational Metrics

Provenance

Multiply described Third Party

Aggregated Feeds

Monitoring

Multiple Sources

Multiple Versions

Dynamic

Multiple Instances

Discovery

Interoperability

Composition

Reuse

Trusted Authorities

Policies

Ranking

Page 17: Biocatalogue Talk Slides

Pay as you Go, Emergent Curation

Just enough, Just in Time, not Just in Case.

What is the Return for the Investment?

Gain

Pain

Very BAD

Good, but Unlikely

Just right

Folksonomy Tagging

Hard Core full on Ontology Curation

Rich enough metadata for effective reuse

Page 18: Biocatalogue Talk Slides

Scientist – Finding.
• Simple metadata on a few properties. Smart tools. “Coarse grained”.
• Decision Support. Simple Ontologies. Folksonomies. Indexing. Matching.

Automation – Composition, Validation and Execution.

• Rich metadata for automatic service configuration, invocation, debugging, repair, automated composition

• Decision making. Rich ontologies. Reasoning.

Scientist – (Re)Using.
• Richer metadata explanation on the inputs, outputs and each operation.

Page 19: Biocatalogue Talk Slides

myGrid History - Feta

Page 20: Biocatalogue Talk Slides

• 3500+ service operations

• 700+ annotated by full-time curator.

• Feta and Find-O-Matic discovery tools

Page 21: Biocatalogue Talk Slides
Page 22: Biocatalogue Talk Slides

BioCatalogue: The pilot

• Features:
– User Registration
– Service Registration
– Search
– Annotation
– Notification
– Integration with myExperiment
– Keep it simple

Page 23: Biocatalogue Talk Slides

Service Coverage + EMBRACE

Page 24: Biocatalogue Talk Slides
Page 25: Biocatalogue Talk Slides
Page 26: Biocatalogue Talk Slides
Page 27: Biocatalogue Talk Slides

Roadmap – Perpetual Beta

Services
• BioMoby and Embrace support
• Support for REST services

Operational Metrics
• Service monitoring
• Notifications
• “Test a service”

Discovery
• Enhancing search functionality
• Semantic search
• Facetted Browsing a la Amazon
• Customised ranking

Curation
• Semantic annotation
• Usage metrics collection
• Improved user interfaces

Third Party integration
• REST APIs
• Third party scavenging and monitoring – SeekDa!, BioMOBY
• myExperiment integration

Page 28: Biocatalogue Talk Slides

(Architecture diagram. Labels recovered from the slide, duplicates removed.)

BioCatalogue

Layers: Curation and Acquisition Tools; Discovery Services; Backend Catalogue Services

Importers
Extraction
Ontology Editor
Ontologist
Ontology Exporter
Ontology Services
Catalogue Manager
Service Providers
Service Provider Workbench
Domain Services
Bio Web Services
Curator Workbench
Expert Curator
Chameleon change handler
Discovery Service
EB-eye
Search
Scientists
“Shopping” Web Interface
Find-O-Matic
Auto Annotation
Advanced Finding
Web Service Interface
BioNanny Monitor
Reviewing
Feedback
Blogging
Tags
Tool Developers
Web Browser
Community analysis
Service analysis
Community Use Monitor
Community Tools + Tags
Ranking
Matching

Page 29: Biocatalogue Talk Slides

Sister Project

Close partnership

Social Curation

Shared Code

Page 30: Biocatalogue Talk Slides

Finding, curating and reusing workflows

Connecting Scientists in the Wild

A supermarket for workflow users.
A toolbox for workflow creators.
Social networking over commodities.
Different disciplines.
1200+ members from 114 countries.
50000+ workflow downloads.
1500-2000 unique visitors / month.
460+ workflows. 98 groups. 35+ packs.

Running for just over a year.
Joint Manchester and Southampton.
Project leader: Prof David De Roure

Page 31: Biocatalogue Talk Slides

• Workflows, simulations, scripts, experimental plans, statistical models, ...

• Bottom up e-Science repository for Scientific Research Objects

• Sharing to propagate expertise and build reputation.

• Collaboration.
• Towards reusable and comparable research.

http://myexperiment.org

Page 32: Biocatalogue Talk Slides

Open and off the shelf…
• Open to workflow systems (Taverna, Trident, BPEL…)
• Open to voluntarily added applications.
• Web Services and scripts
• Browser mashups
• Applications and tools
• User’s environments

Google Gadget

Web 2.0 protocols, Open Archives Initiative, Linked Open Data, RESTful APIs, Global, persistent URIs

Page 33: Biocatalogue Talk Slides

More Information

• BioCatalogue website: http://www.biocatalogue.org/

• BioCatalogue wiki: http://www.biocatalogue.org/wiki

• myGrid website: http://www.mygrid.org.uk/

Page 34: Biocatalogue Talk Slides

BioCatalogue Team

Thomas Laurent

Hamish McWilliams

Franck Tanoh

Jiten Bhagat

Carole Goble

Rodrigo Lopez

Eric Nzuobontane

Page 35: Biocatalogue Talk Slides

myGrid+ Team

Page 36: Biocatalogue Talk Slides

Curation Sweatshop
• Steady increase in numbers of services and workflows
• Users able to find annotated services BUT
• Time-consuming and expensive.
• More and more services built daily SO
• We should enable suppliers to add value
• We should get users involved

Page 37: Biocatalogue Talk Slides

WS4LS Catalogue Diagrammatic Work Plan

Phases: Phase 1, Phase 2, Phase 3, Phase 4
Months: 1, 6, 12, 18, 24, 30, 36
Milestones: M1, M2, M3

Work Packages (task: staff)

WP1 Backend Catalogue services
T1.1: Building Catalogue: MCR1
T1.2: Metadata definition: MCR2
T1.3: WS Interface and app API: MCR1
T1.4: Run and support service: EBI1
T1.5: Service support tools: EBI1
T1.6: Packaging: EBI1

WP2 Acquisition and Curation Tools
T2.1: Automated tools: MCR1 + EBI1
T2.2: Curator workbench: MCR1
T2.3: Service provider workbench: EBI1
T2.4: Online Community tools: MCR1

WP3 Discovery tools
T3.1: Community "shopping": MCR1
T3.2: Search: EBI1
T3.3: Advanced discovery: MCR1

WP4 Curation and Acquisition
T4.1: Ontology: MCR2
T4.2: Catalogue curation: MCR2 + EBI1

WP5 User engagement
T5.1: Training: MCR2 + EBI1
T5.2: Online Community building: MCR2 + EBI1
T5.3: Focus group: MCR2 + EBI1

Phase 1 (Prototype): Rapid assembly, clarify requirements and prototype for field tests within focus group (Milestone 1)
Phase 2 (Develop): Development and parallel piloting of catalogue and its tooling in the field (Milestone 2)
Phase 3 (Enhance): Revised catalogue and tooling, parallel deployment in the field (Milestone 3)