BioCatalogue talk slides
DESCRIPTION
BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents BioCatalogue and its goals.
TRANSCRIPT
BioCatalogue
Joint project:
Aim: Create a registry of annotated biological web services
&
Funded by:
Timeline and Approach
• Started 1st June
• 6-month Pilot 1
• Perpetual beta
• “BioCatalogue-Friends” focus group
• Extensible software
• Built to be evolved and to be scaled.
In the Wild: Cloud Data Services
Major data centres: EMBL-EBI, UK; DDBJ, Japan; NCBI, USA; PDBJ, Japan
Smaller projects and databases:
o Kanehisa Laboratory, Kyoto, Japan
o myGrid, Manchester, UK
o BASIS, University of Newcastle, UK
o Biomolecular Interaction Network Database, BIND, University of Toronto, Canada
o GeneCruiser, Broad Institute, Harvard-MIT, USA
o Genomics and Bioinformatics Group: Lab of Molecular Pharmacology, USA
o BioMoby
o Virginia Bioinformatics Institute, USA
o Center for Biological Sequence Analysis, CBS, Technical University of Denmark
o Helmholtz open bioinformatics technology, Germany
o Information Hyperlinked over Proteins, iHOP
o SIGENAE project, France
o The Nottingham Arabidopsis Stock Centre, NASC, UK
o Bioinformatics Competence Center Braunschweig, Germany
o Gene Ontology visualisation, Goviz
o Bioinformatics group, Italy
o The National Centre for Text Mining, NaCTeM
o Centro de Ciencias Genómicas, UNAM, Mexico
o e-Fungi, Manchester, UK
o FUGE bioinformatics platform, Norway
o Institute of Bioinformatics, Tsinghua University, China
o EMAP, Edinburgh Mouse Atlas Project, UK
o The Chemical Informatics and Cyberinfrastructure Collaboratory (CICC), Indiana University, US
o ChemSpider
http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices
Variable sustainable stewardship
Curate Processes
A repository
• A means to pool, discover and reuse workflows
• A means to curate workflows
• A platform for workflow monitoring and analytics
A registry
• A means to pool metadata about services in the wild
• A means to discover and reuse those services
• A means to curate services
• A platform for service monitoring and analytics
Service and Workflow analytics and network analysis
Recommendations and co-use.
Social networks of third party externally hosted services
Automated diagnostics, monitoring and metadata curation
Finding and Curating Services
http://www.biocatalogue.org
Drawing on 6 years' experience in Taverna of semantically annotating services using RDF and OWL ontologies.
Drawing on experience at EBI in service provision.
The first pilot, in early November 2008, will cover the major providers (EBI, NCBI, DDBJ) at “bronze” quality and show some at “platinum”.
Web Services in the Wild
Findable?
• The ClustalW program from EMBOSS is called ‘emma’
Executable?
• WSDL / WADL / W*DL
• Other kinds of services?
Understandable?
• Input0: string, Output0: string
• What does the polymorphic SeqRet actually do?
• Example data? Parameter configurations? Input-output correlations?
• Poorly documented black boxes.
Usable?
• Quality of Service, monitoring, robustness
• Stability and dependability
• Licensing
Writing Reusable stuff is DIFFICULT
Predicting the unknown required by the unknown.
Scientists and Developers are under pressure and naughty.
Service Mutability and Preservation
• Services are in constant and often silent change.
• Dynamic and unstable.
• Metadata decay (esp. on service instances).
• Workflow decay.
• Monitoring and repair.
• BioNanny.
• Implications for preservation, not fossilisation.
• Implications for sustainability.
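Silent interface change is the kind of problem a monitor such as BioNanny is meant to catch. The sketch below is a hypothetical illustration, not BioNanny's actual design: it assumes the operation names have already been extracted from a service's WSDL or WADL (parsing is not shown), fingerprints them with SHA-256, and reports which operations appeared or disappeared between two snapshots. `InterfaceSnapshot`, `diff_interfaces` and the service and operation names are invented for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceSnapshot:
    """Hypothetical record of a service interface at one point in time."""
    service: str
    operations: frozenset  # operation names, assumed already parsed from the WSDL/WADL

    def fingerprint(self) -> str:
        # Hash the sorted operation names so that ordering does not matter.
        joined = "\n".join(sorted(self.operations))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def diff_interfaces(old: InterfaceSnapshot, new: InterfaceSnapshot) -> dict:
    """Report which operations appeared or disappeared between two snapshots."""
    if old.fingerprint() == new.fingerprint():
        return {"changed": False, "added": [], "removed": []}
    return {
        "changed": True,
        "added": sorted(new.operations - old.operations),
        "removed": sorted(old.operations - new.operations),
    }


if __name__ == "__main__":
    before = InterfaceSnapshot("example-service", frozenset({"runBlast", "getResult"}))
    after = InterfaceSnapshot("example-service", frozenset({"runBlast", "getResults"}))
    print(diff_interfaces(before, after))
    # {'changed': True, 'added': ['getResults'], 'removed': ['getResult']}
```

A renamed operation shows up as one removal plus one addition, which is exactly the sort of change that breaks a workflow without any announcement from the provider.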
Workflows and Services
[Diagram: four curation models, Curation by Experts, Social Curation by the Crowd, Self-Curation by Contributors and Automated Curation, which seed, refine and validate annotations.]
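The four curation models differ mainly in who seeds, refines and validates an annotation. As a minimal sketch, assuming each curation step is simply logged together with its source, an annotation record could keep that provenance so a catalogue can tell an automatically seeded description from an expert-validated one. The `Annotation` class, its methods and the example text are hypothetical, not part of BioCatalogue.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    """The four curation models named on the slide."""
    EXPERT = "expert"
    CROWD = "crowd"
    CONTRIBUTOR = "contributor"
    AUTOMATED = "automated"


@dataclass
class Annotation:
    """Hypothetical annotation record that logs the provenance of each curation step."""
    service: str
    text: str
    history: list = field(default_factory=list)  # (action, Source) tuples in order

    @classmethod
    def seed(cls, service: str, text: str, by: Source) -> "Annotation":
        ann = cls(service, text)
        ann.history.append(("seed", by))
        return ann

    def refine(self, new_text: str, by: Source) -> None:
        # Replace the text but keep a record of who changed it.
        self.text = new_text
        self.history.append(("refine", by))

    def validate(self, by: Source) -> None:
        self.history.append(("validate", by))

    @property
    def validated(self) -> bool:
        return any(action == "validate" for action, _ in self.history)


if __name__ == "__main__":
    ann = Annotation.seed("emma", "Runs ClustalW", by=Source.AUTOMATED)
    ann.refine("Multiple sequence alignment using ClustalW", by=Source.CROWD)
    ann.validate(by=Source.EXPERT)
    print(ann.validated, [(a, s.value) for a, s in ann.history])
    # True [('seed', 'automated'), ('refine', 'crowd'), ('validate', 'expert')]
```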
Multiple Annotation Profiles
[Diagram: User, Group and Service Profiles each carry Profile Annotations, combined by Ranking Functions. Annotation content spans a Curation Model, Quantitative Content, Tags, a Service Model and a Semantic Content Model (Ontologies), covering Functional, Provenance, Operational, Operational Metrics, Conditions of Use and Social Standing aspects.]
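One way to read “Ranking Functions” is as a function that combines scores from several profiles into a single ordering of services. The sketch below assumes a simple weighted sum; that is a choice made here for illustration, not the method described in the talk, and the profile names, scores and weights in the usage example are invented.

```python
def rank_services(profile_scores: dict, weights: dict) -> list:
    """Order services by a weighted sum of per-profile scores.

    profile_scores maps a profile name (e.g. "user", "group", "service") to
    {service_name: score}. weights maps a profile name to its weight; a profile
    with no weight contributes nothing, and a service missing from a profile
    scores 0 for that profile.
    """
    services = set()
    for scores in profile_scores.values():
        services.update(scores)
    totals = {
        s: sum(weights.get(p, 0.0) * scores.get(s, 0.0)
               for p, scores in profile_scores.items())
        for s in services
    }
    # Highest combined score first; ties broken by name for a stable order.
    return sorted(totals, key=lambda s: (-totals[s], s))


if __name__ == "__main__":
    scores = {
        "user": {"A": 1.0, "B": 0.2},
        "group": {"B": 0.9},
        "service": {"A": 0.5, "B": 0.5, "C": 0.8},
    }
    print(rank_services(scores, {"user": 0.5, "group": 0.3, "service": 0.2}))
    # ['A', 'B', 'C']
```

Changing only the weights gives a different “customised ranking” over the same annotations, which is why the profiles and the ranking function are kept separate here.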
Service Profile
[Diagram: 6 facets; Versioning, QoS, Usage, A.N. Other; Curation, Quantitative content, Service Model, Semantic Content Model; Execution at Host.]
Service Profile: Finding
[Diagram: descriptions in WSDL, WADL, SAWSDL, SA-REST and S-A.N. Other; Analytics, Ranking, Browse/Shop, Search, Customised views; Services, Workflows, Monitoring, Profiles.]
Service Profile Facets
[Diagram: an interface-neutral service at the centre with six facets: Functional, Operational, Operational Metrics, Provenance, Conditions of Use, Social Standing.]
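The six facets can be pictured as a record hanging off an interface-neutral service description. This is a minimal sketch, assuming each facet is a free-form key/value map; `ServiceProfile`, the facet keys and the example URL are hypothetical. In this sketch the WSDL or WADL is referenced by URL rather than embedded, which is what keeps the profile interface neutral.

```python
from dataclasses import dataclass, field

# The six facets named on the "Service Profile Facets" slide (key spellings are hypothetical).
FACETS = (
    "functional",
    "operational",
    "operational_metrics",
    "provenance",
    "conditions_of_use",
    "social_standing",
)


@dataclass
class ServiceProfile:
    """Hypothetical interface-neutral profile: the WSDL/WADL is referenced, not embedded."""
    name: str
    interface_url: str
    facets: dict = field(default_factory=lambda: {f: {} for f in FACETS})

    def annotate(self, facet: str, key: str, value) -> None:
        # Reject facet names outside the six-facet model.
        if facet not in self.facets:
            raise ValueError(f"unknown facet: {facet}")
        self.facets[facet][key] = value

    def empty_facets(self) -> list:
        # Facets with no annotations yet, i.e. where curation effort is still needed.
        return [f for f in FACETS if not self.facets[f]]


if __name__ == "__main__":
    profile = ServiceProfile("emma", "http://example.org/emma?wsdl")
    profile.annotate("functional", "task", "multiple sequence alignment")
    profile.annotate("provenance", "package", "EMBOSS")
    print(profile.empty_facets())
    # ['operational', 'operational_metrics', 'conditions_of_use', 'social_standing']
```

Listing the empty facets gives a crude view of where a profile still needs curation, so effort can be spent where it is missing rather than filling every facet up front.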
[Diagram: the service profile facets (Functional, Operational, Operational Metrics, Provenance, Conditions of Use, Social Standing) surrounded by: Multiply described, Third Party, Aggregated Feeds, Monitoring; Multiple Sources, Multiple Versions, Multiple Instances, Dynamic; Discovery, Interoperability, Composition, Reuse, Ranking; Trusted Authorities, Policies; Ontologies, Controlled Vocabularies, Tags, Free text, Folksonomies; Standards (W*DL, Atom), Schemas.]
Pay as you Go, Emergent Curation
Just enough, Just in Time, not Just in Case.
What is the Return for the Investment?
[Chart: Gain vs. Pain. Regions: Very BAD; Good, but Unlikely; Just right. Approaches plotted: Folksonomy Tagging; Hard core, full-on Ontology Curation; Rich enough metadata for effective reuse.]
Scientist – Finding.
• Simple metadata on a few properties. Smart tools. “Coarse grained”.
• Decision support. Simple ontologies. Folksonomies. Indexing. Matching.
Automation – Composition, Validation and Execution.
• Rich metadata for automatic service configuration, invocation, debugging, repair, automated composition
• Decision making. Rich ontologies. Reasoning.
Scientist – (Re)Using.
• Richer metadata explaining the inputs, outputs and each operation.
myGrid History - Feta
• 3500+ service operations
• 700+ annotated by full-time curator.
• Feta and Find-O-Matic discovery tools
BioCatalogue: The pilot
• Features:
– User Registration
– Service Registration
– Search
– Annotation
– Notification
– Integration with myExperiment
– Keep it simple
Service Coverage + EMBRACE
Roadmap – Perpetual Beta
Services
• BioMoby and EMBRACE support
• Support for REST services
Operational Metrics
• Service monitoring
• Notifications
• “Test a service”
Discovery
• Enhancing search functionality
• Semantic search
• Faceted browsing à la Amazon
• Customised ranking
Curation
• Semantic annotation
• Usage metrics collection
• Improved user interfaces
Third Party integration
• REST APIs
• Third party scavenging and monitoring – SeekDa!, BioMOBY
• myExperiment integration
[Diagram: BioCatalogue architecture, grouped into Backend Catalogue Services, Curation and Acquisition Tools, and Discovery Services.
Components: Catalogue Manager, Ontology Services, Ontology Editor, Ontology Exporter, Web Service Interface, Importers, Extraction Importers, Service Provider Workbench, Curator Workbench, AutoAnnotation, Chameleon change handler, BioNanny Monitor, Community Use Monitor, Reviewing, Feedback, Blogging, Tags, “Shopping” Web Interface, Search, Discovery Service (EB-eye), Find-O-Matic, Advanced Finding, Ranking, Matching, Community analysis, Service analysis.
Actors and sources: Scientists, Service Providers, Tool Developers, Ontologist, Expert Curator, Web Browser, Community Tools, Domain Services (Bio Web Services).]
Sister Project
Close partnership
Social Curation
Shared Code
Finding, curating and reusing workflows
Connecting Scientists in the Wild
• A supermarket for workflow users.
• A toolbox for workflow creators.
• Social networking over commodities.
• Different disciplines.
• 1200+ members from 114 countries.
• 50000+ workflow downloads.
• 1500-2000 unique visitors / month.
• 460+ workflows. 98 groups. 35+ packs.
Running for just over a year.
Joint Manchester and Southampton.
Project leader: Prof David De Roure
• Workflows, simulations, scripts, experimental plans, statistical models, ...
• Bottom-up e-Science repository for Scientific Research Objects
• Sharing to propagate expertise and build reputation.
• Collaboration.
• Towards reusable and comparable research.
http://myexperiment.org
Open and off the shelf…
• Open to workflow systems (Taverna, Trident, BPEL…)
• Open to voluntary added applications
• Web Services and scripts
• Browser mashups
• Applications and tools
• Users’ environments
Google Gadget
Web 2.0 protocols, Open Archives Initiative, Linked Open Data, RESTful APIs, global, persistent URIs
More Information
• BioCatalogue website • http://www.biocatalogue.org/
• BioCatalogue wiki • http://www.biocatalogue.org/wiki
• myGrid website • http://www.mygrid.org.uk/
BioCatalogue Team
Thomas Laurent
Hamish McWilliams
Franck Tanoh
Jiten Bhagat
Carole Goble
Rodrigo Lopez
Eric Nzuobontane
myGrid+ Team
Curation Sweatshop
• Steady increase in numbers of services and workflows
• Users able to find annotated services BUT
• Time-consuming and expensive.
• More and more services built daily SO
• We should enable suppliers to add value
• We should get users involved
WS4LS Catalogue Diagrammatic Work Plan
Phases 1 to 4 over months 1, 6, 12, 18, 24, 30 and 36, with milestones M1, M2 and M3 (Gantt bars not recoverable).

Work Packages (staff)
WP1 Backend Catalogue services
  T1.1: Building Catalogue (MCR1)
  T1.2: Metadata definition (MCR2)
  T1.3: WS Interface and app API (MCR1)
  T1.4: Run and support service (EBI1)
  T1.5: Service support tools (EBI1)
  T1.6: Packaging (EBI1)
WP2 Acquisition and Curation Tools
  T2.1: Automated tools (MCR1 + EBI1)
  T2.2: Curator workbench (MCR1)
  T2.3: Service provider workbench (EBI1)
  T2.4: Online Community tools (MCR1)
WP3 Discovery tools
  T3.1: Community "shopping" (MCR1)
  T3.2: Search (EBI1)
  T3.3: Advanced discovery (MCR1)
WP4 Curation and Acquisition
  T4.1: Ontology (MCR2)
  T4.2: Catalogue curation (MCR2 + EBI1)
WP5 User engagement
  T5.1: Training (MCR2 + EBI1)
  T5.2: Online Community building (MCR2 + EBI1)
  T5.3: Focus group (MCR2 + EBI1)

Phase 1 (Prototype): Rapid assembly, clarify requirements and prototype for field tests within the focus group (Milestone 1).
Phase 2 (Develop): Development and parallel piloting of the catalogue and its tooling in the field (Milestone 2).
Phase 3 (Enhance): Revised catalogue and tooling, parallel deployment in the field (Milestone 3).