BioDBCore: Current Status and Next Developments



Post on 27-Jan-2015


DESCRIPTION

BioDBCore: Current Status and Next Developments. Presented at the BioHackathon 2013, Tokyo, Japan.

TRANSCRIPT

Page 1: BioDBCore: Current Status and Next Developments

Pascale Gaudet

Chair, International Society for Biocuration

Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics

BioDBCore: Current status and future developments

Page 2: BioDBCore: Current Status and Next Developments

International Society for Biocuration: Mission statement

•  Define and promote the work of biocurators

•  Foster connections with user communities to ensure that databases and accompanying tools meet specific user needs

•  Promote communication and exchanges between curators: meetings, workshops,

•  Encourage best practices by providing documentation on standards and annotation procedures

Page 3: BioDBCore: Current Status and Next Developments

The need

• Databases: improve data integration from published papers

• Journals: link to database objects

• Researchers: identify resources

• Grant submitters: enforce data sharing plans

Page 4: BioDBCore: Current Status and Next Developments

Goals

1)  Gather information required to provide a general overview of the database landscape and compare the various resources

2)  Encourage consistency and interoperability

3)  Promote the use of standards

4)  Provide guidance for users

5)  Maximize the collective impact of the resources

Page 5: BioDBCore: Current Status and Next Developments

BioDBCore group organization

•  Led by Pascale Gaudet (ISB/SIB) and Philippe Rocca-Serra (BioSharing)

•  Guidelines proposed in 2011 paper

•  Implemented in 2012 NAR database issue

Page 6: BioDBCore: Current Status and Next Developments

Use cases

•  Show all resources of type database which use MIMARKS guidelines

•  Show all resources where John Smith is involved

•  Show all resources for mouse phenotypes

•  Where can I submit my data?

and also:

•  Guidance for grants' data sharing policies

•  Improving integration of data from papers into databases
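The query-style use cases above can be sketched as simple filters over a list of records. This is only an illustration: the `Resource` class, its field names, and the three sample entries are invented for the example and are not the BioDBCore schema or real registry content.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical, simplified registry entry (not the BioDBCore schema)."""
    name: str
    resource_type: str                     # e.g. "database" or "tool"
    standards: frozenset = frozenset()     # reporting guidelines followed
    people: frozenset = frozenset()        # people involved in the resource
    keywords: frozenset = frozenset()      # free-text scope keywords
    accepts_submissions: bool = False


# Invented sample entries, for illustration only.
REGISTRY = [
    Resource("Example Marker Gene DB", "database",
             frozenset({"MIMARKS"}), frozenset({"John Smith"}),
             frozenset({"metagenomics"}), True),
    Resource("Example Mouse Phenotype DB", "database",
             frozenset(), frozenset({"Jane Doe"}),
             frozenset({"mouse", "phenotype"}), False),
    Resource("Example Alignment Tool", "tool",
             frozenset({"MIMARKS"}), frozenset({"John Smith"})),
]


def using_standard(resources, standard, resource_type="database"):
    """Resources of a given type that follow a given guideline."""
    return [r.name for r in resources
            if r.resource_type == resource_type and standard in r.standards]


def involving(resources, person):
    """Resources in which a given person is involved."""
    return [r.name for r in resources if person in r.people]


def about(resources, *keywords):
    """Resources whose keywords include all of the requested ones."""
    wanted = {k.lower() for k in keywords}
    return [r.name for r in resources
            if wanted <= {k.lower() for k in r.keywords}]


def accepting_submissions(resources):
    """Candidate answers to 'Where can I submit my data?'."""
    return [r.name for r in resources if r.accepts_submissions]
```

For example, `using_standard(REGISTRY, "MIMARKS")` keeps only the database entry and skips the tool, while `about(REGISTRY, "mouse", "phenotype")` requires both keywords to match.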

Page 7: BioDBCore: Current Status and Next Developments

Collaborative philosophy

•  Many groups/resources have been providing registries and lists of databases

•  Often not funded, not maintained

•  BioDBCore seeks to collaborate with all interested parties to work together to provide a more permanent solution to database descriptions

Page 8: BioDBCore: Current Status and Next Developments

BioDBCore: Participating groups

•  BioDB100

•  BioSharing

•  BioCatalogue

•  Bioinformatics Links Directory

•  Biositemaps

•  CASIMIR

•  MIBBI

•  MIRIAM

•  Model Organism Databases

•  NIF registry

•  … and your group!

Page 9: BioDBCore: Current Status and Next Developments

BioDBCore descriptors

1.  Database name

2.  Main resource URL

3.  Contact information (e-mail; postal mail)

4.  Date resource established (year)

5.  Conditions of use (Free, or type of license)

6.  Scope: data types captured, curation policy, standards used

7.  Standards: MIs, Data formats, Terminologies

8.  Taxonomic coverage

9.  Data accessibility/output options

10.  Data release frequency

11.  Versioning policy and access to historical files

12.  Documentation available

13.  User support options

14.  Data submission policy

15.  Relevant publications

16.  Resource's Wikipedia URL

17.  Tools available
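As a rough illustration of how the descriptor list could be used as a checklist, the sketch below stores the 17 descriptor names from the slide and reports which ones a record leaves empty. The dictionary-based record layout and the `missing_descriptors` helper are assumptions made for this example, not part of any BioDBCore tooling.

```python
# The 17 descriptor names, copied from the slide above.
DESCRIPTORS = [
    "Database name",
    "Main resource URL",
    "Contact information (e-mail; postal mail)",
    "Date resource established (year)",
    "Conditions of use (Free, or type of license)",
    "Scope: data types captured, curation policy, standards used",
    "Standards: MIs, Data formats, Terminologies",
    "Taxonomic coverage",
    "Data accessibility/output options",
    "Data release frequency",
    "Versioning policy and access to historical files",
    "Documentation available",
    "User support options",
    "Data submission policy",
    "Relevant publications",
    "Resource's Wikipedia URL",
    "Tools available",
]


def missing_descriptors(record):
    """Return the descriptors that are absent or blank in a record.

    `record` is assumed to be a plain dict keyed by descriptor name
    (a hypothetical layout chosen for this sketch).
    """
    return [d for d in DESCRIPTORS if not str(record.get(d, "")).strip()]


# A deliberately incomplete record, using values from the dictyBase example.
partial = {
    "Database name": "dictyBase",
    "Main resource URL": "http://dictybase.org",
    "Date resource established (year)": "2003",
    "Conditions of use (Free, or type of license)": "Free",
}

for name in missing_descriptors(partial):
    print("missing:", name)
```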

Page 10: BioDBCore: Current Status and Next Developments

Database name: dictyBase

Main resource URL: http://dictybase.org

Contact information: [email protected]

Date resource established (year): 2003

Conditions of use: Free

Scope: Data types captured: Genome sequence; gene models including CDS and predicted proteins; Phenotypes, Gene Ontology annotations, Functional annotation (gene product names), Gene nomenclature; Strains; Plasmids; Free text descriptions, Domains (via InterPro), Orthologs (via OrthoMCL and inParanoid), Protein subcellular location (via Swiss-Prot); Protein existence (via Swiss-Prot), Citations, Researchers database

Page 11: BioDBCore: Current Status and Next Developments

Curation policy: manual curation

Standards: MIs, Data formats, Terminologies: Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature

Data formats: FASTA, OBO, GAF, GFF3 (standard)

Taxonomic coverage (use NCBI Taxid): D. discoideum (44689) including all strains [PRIMARY], also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)

Data accessibility/output options: HTML, text, database reports

Data release frequency: curators work on the 'live' database, weekly data dumps (sequences) or monthly (other data)

Versioning policy/access to historical files: no versioning, but access to historical files is possible

Page 12: BioDBCore: Current Status and Next Developments

Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html

User support options: documents, email, webform

Data submission policy: Data from published literature. Some HTP data corresponding to published analyses is incorporated

Relevant publications: PMID: 18974179, PMID: 14681427

Resource's Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase

Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)
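The taxonomic coverage field in the dictyBase example mixes species names with NCBI Taxids in free text. A minimal sketch of pulling the identifiers out of such a string is shown below; the regular expression and the `extract_taxids` helper are assumptions for illustration and only cover the "Genus-initial. species (taxid)" pattern seen in this one entry.

```python
import re

# Taxonomic coverage text from the dictyBase example entry.
COVERAGE = (
    "D. discoideum (44689) including all strains [PRIMARY], also some "
    "genome/EST/gene model info for D. purpureum (5786), and gene model "
    "sequences for P. pallidum (13642) and D. fasciculatum (261658)"
)

# Abbreviated binomial (e.g. "D. discoideum") followed by a taxid in parentheses.
TAXON_PATTERN = re.compile(r"([A-Z]\. [a-z]+) \((\d+)\)")


def extract_taxids(text):
    """Map each abbreviated species name to its integer NCBI Taxid."""
    return {name: int(taxid) for name, taxid in TAXON_PATTERN.findall(text)}


for species, taxid in extract_taxids(COVERAGE).items():
    print(species, taxid)
```

Storing the Taxids as integers next to the names makes the field usable for queries such as "all resources covering taxon 44689" without re-parsing the free text.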

Page 13: BioDBCore: Current Status and Next Developments

Implementation of BioDBCore at BioSharing (many thanks to Philippe Rocca-Serra!)

Page 14: BioDBCore: Current Status and Next Developments

BioDBcore announcement

Published in Nucleic Acids Research database issue 2011 and in the DATABASE journal

Page 15: BioDBCore: Current Status and Next Developments

Implementation plan

•  Goal: BioDBCore data public and linked

•  Community-aware approach: reuse existing resources

•  Current data model: RDF based on categories from Biositemaps, MIRIAM, NIF, Dublin Core, Darwin Core

•  Defined extension mechanisms
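To give a feel for what an RDF description built on existing vocabularies might look like, the sketch below writes a few descriptors of the dictyBase example as Turtle. The slides do not show the actual BioDBCore RDF model, so the choice of properties (`dcterms:title`, `foaf:homepage`, `dcterms:created`, `dc:rights`) and the `example.org` subject URI are guesses made for illustration only.

```python
# Namespaces of the vocabularies used in this sketch.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(subject_uri, name, homepage, established, conditions):
    """Serialize four descriptors as Turtle (property mapping is assumed)."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines += [
        "",
        f"<{subject_uri}>",
        f"    dcterms:title {literal(name)} ;",
        f"    foaf:homepage <{homepage}> ;",
        f"    dcterms:created {literal(established)} ;",
        f"    dc:rights {literal(conditions)} .",
    ]
    return "\n".join(lines)


print(to_turtle(
    "http://example.org/biodbcore/dictybase",  # hypothetical record URI
    "dictyBase",
    "http://dictybase.org",
    "2003",
    "Free",
))
```

Reusing Dublin Core and FOAF terms here follows the "reuse existing resources" point above; descriptors with no obvious counterpart in those vocabularies would need the extension mechanisms the slide mentions.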

Page 16: BioDBCore: Current Status and Next Developments

www.biodbcore.org

Page 17: BioDBCore: Current Status and Next Developments

Example BioDBCore entry (1/2)

Page 18: BioDBCore: Current Status and Next Developments

Example BioDBCore entry (2/2)

Page 19: BioDBCore: Current Status and Next Developments

Creating, editing, maintaining entries

•  Until now: records are manually created from data provided by NAR at publication of the Database issue and by the Life Sciences Registry (Michel Dumontier and Nick Juty)

- Those mostly come as XLS files that need to be manually entered

- Close to 200 records have been entered out of over 2,000 obtained
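Part of the manual entry step described above could in principle be scripted. The sketch below assumes the spreadsheet has first been exported to CSV and that it has columns named `Name`, `URL`, and `Year`; those headers, the `COLUMN_MAP` mapping, and the `rows_to_records` helper are hypothetical, since the slides do not describe the layout of the files received.

```python
import csv
import io

# Hypothetical spreadsheet headers mapped to BioDBCore descriptor names.
COLUMN_MAP = {
    "Name": "Database name",
    "URL": "Main resource URL",
    "Year": "Date resource established (year)",
}


def rows_to_records(csv_text):
    """Convert CSV rows into descriptor dicts, skipping rows without a name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {
            COLUMN_MAP[column]: (value or "").strip()
            for column, value in row.items()
            if column in COLUMN_MAP
        }
        if record.get("Database name"):
            records.append(record)
    return records


# One real-looking row (values from the dictyBase example) and one blank row.
SAMPLE = "Name,URL,Year\ndictyBase,http://dictybase.org,2003\n,,\n"

for record in rows_to_records(SAMPLE):
    print(record)
```

Records produced this way would still need curator review before publication; the script only removes the retyping.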

Page 20: BioDBCore: Current Status and Next Developments

Beyond maintenance at BioSharing

Ideally, database providers would keep their own BioDBCore record up to date

•  Claim ownership

- A database provider can now (in theory) maintain their own BioDBCore record

Encouraging best practices

•  DATABASE and Nucleic Acids Research journals: Editors in chief request BioDBCore information from submitters

•  ISB seal of approval

•  BioDB100 - launched at InCoB 2011 - examples of 100 well annotated databases

Page 21: BioDBCore: Current Status and Next Developments

What's next?

•  Continue to extend participating groups and journals

•  Refine scope

•  Integrate semantic support

•  Develop querying system

•  Implement validation tests

•  Set up mechanisms for exchange of data among collaborating groups (in BioDBCore RDF format, or other)
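As one possible reading of "validation tests", the sketch below runs a few format checks on a record: the URL must be an http(s) URL, the year of establishment must be plausible, and Taxids must be positive integers. The record keys (`url`, `established`, `taxids`) and the specific rules are assumptions for illustration; the slides do not say which tests are planned.

```python
from datetime import date
from urllib.parse import urlparse


def validate(record):
    """Return a list of human-readable problems found in a record.

    `record` is assumed to be a dict with the hypothetical keys
    "url", "established" and "taxids".
    """
    problems = []

    parsed = urlparse(record.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url: expected an http(s) URL")

    year = record.get("established")
    if not (isinstance(year, int) and 1900 <= year <= date.today().year):
        problems.append("established: expected a plausible year")

    for taxid in record.get("taxids", []):
        if not (isinstance(taxid, int) and taxid > 0):
            problems.append(f"taxids: {taxid!r} is not a positive integer")

    return problems


# Values from the dictyBase example pass; the second record fails all checks.
good = {"url": "http://dictybase.org", "established": 2003,
        "taxids": [44689, 5786]}
bad = {"url": "dictybase.org", "established": 3003, "taxids": ["44689"]}

print(validate(good))
print(validate(bad))
```

Returning a list of problems rather than raising on the first one lets a submitter fix everything in a single pass.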

Page 22: BioDBCore: Current Status and Next Developments

Identifying or developing semantic support

•  Policies and guidelines: BioSharing

•  Publications and taxon info: identifiers.org

•  Authors: ORCID (will also implement organizations)

•  Keywords/database scope: NIF when possible

Identifying resources is preferable to developing them!
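For the publication and taxon identifiers, a record could store resolvable identifiers.org URIs instead of bare numbers. The sketch below uses the compact-identifier URL form (`https://identifiers.org/<prefix>:<accession>`) with the `pubmed` and `taxonomy` prefixes; the helper names are invented for this example, and the URL form in use at the time of the talk may have differed.

```python
IDENTIFIERS_ORG = "https://identifiers.org"


def compact_uri(prefix, accession):
    """Build an identifiers.org URI from a namespace prefix and accession."""
    return f"{IDENTIFIERS_ORG}/{prefix}:{accession}"


def pubmed_uri(pmid):
    """URI for a PubMed record; int() rejects non-numeric input."""
    return compact_uri("pubmed", int(pmid))


def taxonomy_uri(taxid):
    """URI for an NCBI Taxonomy entry; int() rejects non-numeric input."""
    return compact_uri("taxonomy", int(taxid))


# Identifiers taken from the dictyBase example entry.
print(pubmed_uri("18974179"))
print(taxonomy_uri(44689))
```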

Page 23: BioDBCore: Current Status and Next Developments

For BioHackathon 2013

•  Evaluate need for BioDBCore in today's landscape of metadatabase resources

•  Evaluate further collaboration opportunities

•  Set up a better system for creating and maintaining BioDBCore records

•  Identify/develop ontologies pertinent to BioDBCore

Page 24: BioDBCore: Current Status and Next Developments

Acknowledgements

Philippe Rocca-Serra

Susanna-Assunta Sansone

Eamonn Maguire

Alejandra Gonzalez Beltran

International Society for Biocuration

Michael Galperin

David Landsman

Francis Ouellette

Oxford University Press

collaborators