![Page 1: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/1.jpg)
Provenance in a Collaborative Bio-database
RAASWiki
Donald Dunbar & Jon Manning
Queen’s Medical Research Institute
University of Edinburgh
Use Cases for Provenance
April 20th 2009
![Page 2: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/2.jpg)
Provenance in Bio-databases
including RAASWiki
![Page 3: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/3.jpg)
Plan
bio-databases
provenance
RAASWiki
collaborative knowledgebases
![Page 4: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/4.jpg)
Biological databases
• Sequences
– Ensembl, Entrez
• Structure
– PDB
• Expression
– GEO, ArrayExpress
• Function
– Gene Ontology
• Interaction
– MINT, BIND, KEGG
• ‘Warehouses’
– GeneCards, IUPHAR
• Literature
– PubMed
![Page 5: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/5.jpg)
How do they handle provenance?
Ensembl produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
‘Gene’ ID histories (with stable ID)
Evidence for gene predictions
Links to other databases (e.g. UniProt)
![Page 6: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/6.jpg)
How do they handle provenance?
The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies.
Primary citation
History: deposition and last update
Raw data and protocols
![Page 7: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/7.jpg)
How do they handle provenance?
Gene Expression Omnibus: a gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.
Standards compliance (protocols, data…)
Links within database (microarrays, protocols)
Raw data and protocols
![Page 8: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/8.jpg)
How do they handle provenance?
The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.
Evidence for gene annotation (experimental, computational)
Links to original publications
No versioning, just updates
![Page 9: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/9.jpg)
How do they handle provenance?
PubMed is a free search engine for accessing the MEDLINE database of citations, abstracts and some full text articles on life sciences and biomedical topics.
Original source material, authors, abstracts
Unique PubMed ID (used by other databases)
Continual updates (new papers), occasional retractions
![Page 10: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/10.jpg)
How do they handle provenance?
GeneCards® is a searchable, integrated database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.
Lots of data from other databases
IDs/keys from sources
Lots of data integration based on IDs
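ID-based integration that keeps the source keys can be sketched as follows. This is only an illustration of the general idea, not how GeneCards is actually built: the `integrate` function, the source names and the example records are all hypothetical.

```python
from collections import defaultdict


def integrate(sources):
    """Merge per-source records on a shared gene ID, keeping each value's origin.

    `sources` maps a source-database name to {gene_id: {field: value}}.
    Returns {gene_id: {field: [(value, source_name), ...]}}.
    All names and the record layout are assumptions made for this sketch.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for gene_id, fields in records.items():
            for field_name, value in fields.items():
                # Store the value together with the database it came from.
                merged[gene_id][field_name].append((value, source_name))
    return {gene_id: dict(fields) for gene_id, fields in merged.items()}


# Hypothetical records from two made-up sources sharing the key "G1".
example = integrate({
    "source_a": {"G1": {"symbol": "ABC1"}},
    "source_b": {"G1": {"symbol": "ABC1", "pathway": "example pathway"}},
})
print(example["G1"]["symbol"])
```

Because every value carries the name of the database it came from, agreement or conflict between sources stays visible after integration.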
![Page 11: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/11.jpg)
How do they handle provenance?
The IUPHAR database (IUPHAR-DB) integrates peer-reviewed pharmacological, chemical, genetic, functional and anatomical information on GPCRs, ligand-gated ion channels and voltage-gated-like ion channel subunits encoded by the human, rat and mouse genomes.
Curated by experts
Original sources plus curation provenance
Suggested citations
![Page 12: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/12.jpg)
Newer developments
WikiGenes is the first wiki system to combine the collaborative and largely altruistic possibilities of wikis with explicit authorship. In view of the extraordinary success of Wikipedia there remains no doubt about the potential of collaborative publishing, yet its adoption in science has been limited. Here I discuss a dynamic collaborative knowledge base for the life sciences that provides authors with due credit and that can evolve via continual revision and traditional peer review into a rigorous scientific tool.
but…
![Page 13: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/13.jpg)
RAASWiki
RAASWiki is a knowledgebase of information on the renin-angiotensin-aldosterone system. While much of the seed data were derived from pre-existing databases such as KEGG and OMIM, supplementary data not easily available through such resources are also included. These include short textual reports on the genes involved, and more experimentally-oriented information such as animal models.
Important biology - hypertension
Automatic seeding of database (BioKB)
Collaborative editing (Wiki based, useful functionality)
Genes, publications, animal models, datasets…
![Page 14: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/14.jpg)
RAASWiki – provenance
Seeded data tagged with source database and date
Edits are tagged with editor and date
Comments are tagged: name and date
Wiki functionality allows versioning and roll back
Identifiers for source databases preserve provenance
‘Crowd wisdom’ will hopefully ensure good quality
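The tagging and roll-back behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration and not RAASWiki's actual implementation: the `Revision` and `Record` classes, their field names and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """One version of a record, tagged with what or who produced it and when."""
    text: str
    when: date
    source_db: Optional[str] = None  # set for automatically seeded data
    source_id: Optional[str] = None  # identifier in the source database
    editor: Optional[str] = None     # set for manual wiki edits


@dataclass
class Record:
    """A wiki-style record that keeps its full revision history (hypothetical)."""
    name: str
    history: List[Revision] = field(default_factory=list)

    @property
    def current(self) -> Revision:
        return self.history[-1]

    def seed(self, text, source_db, source_id, when):
        self.history.append(
            Revision(text, when, source_db=source_db, source_id=source_id))

    def edit(self, text, editor, when):
        self.history.append(Revision(text, when, editor=editor))

    def roll_back(self, editor, when):
        """Restore the previous version as a new revision, so no history is lost."""
        if len(self.history) < 2:
            raise ValueError("nothing to roll back to")
        prev = self.history[-2]
        self.history.append(
            Revision(prev.text, when, prev.source_db, prev.source_id, editor))


# Hypothetical usage: seed a record, edit it, then roll the edit back.
record = Record("example gene")
record.seed("Seed text.", source_db="ExampleDB", source_id="EX:0001",
            when=date(2009, 4, 1))
record.edit("Edited text.", editor="alice", when=date(2009, 4, 2))
record.roll_back(editor="bob", when=date(2009, 4, 3))
print(record.current.text, len(record.history))
```

In this sketch a roll-back appends a new revision instead of deleting the unwanted one, so the record of who changed what, and when, stays complete.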
![Page 15: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/15.jpg)
RAASWiki – provenance issues
How much detail (each edit, granularity, versions)?
Who will use provenance data?
How much should we rely on sources for provenance?
Annotation & comments v changing data
Public v private data
Different focus depending on data (who, when, confidence)
Likely to become a big issue
![Page 16: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/16.jpg)
What provenance do we need?
Example: Gene expression in a transgenic animal
[Flow diagram; node labels: gene annotation, gene expression measurements, public databases, output from machine, processing, integration, data mining (what and how did we select genes). Provenance annotations: where, when; which identifiers; how; when, what, how.]
![Page 17: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/17.jpg)
What provenance do we need?
Example: Curated gene database
[Flow diagram; node labels: curation, database links, curator input, archive, curated database, development. Provenance annotations: contributor, date; verify, add, delete, modify; source, identifiers, dates; versions, dates; schema & interface changes.]
![Page 18: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/18.jpg)
Collaborative knowledgebases
[Diagram; labels: databases, experiments, papers, knowledge, knowledgebase]
![Page 19: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/19.jpg)
Collaborative knowledgebase provenance issues
Confidence in data
Tracking data to its (real) source
When is something (knowledge) finished?
Citing of knowledgebase records
Linking between knowledgebase records
Published papers do not contain all information
Some sort of dynamic publication
![Page 20: Provenance in a Collaborative Bio-database RAASWiki Donald Dunbar & Jon Manning Queen’s Medical Research Institute University of Edinburgh Use Cases for](https://reader036.vdocuments.net/reader036/viewer/2022062713/56649cc45503460f9498d9f2/html5/thumbnails/20.jpg)
Conclusions
• In biology provenance is a mixed bag
• We use mainly static databases
• Usually source is clear but not much else
• RAASWiki contains static and curated data
• We have implemented a very rudimentary provenance scheme
• Collaborative knowledgebases will need to address provenance in new ways