Post on 27-Mar-2015
Protein Annotation Ontology
The BioSapiens Virtual Institute for Genome Annotations
Janet Thornton & Gabby Reeves
AFP/BioSapiens
Vienna, July 2007
Outline
• Integrating annotations: why it is so important to think about it.
• Progress made by BioSapiens towards the virtual institute for genome annotations.
• Creating the ontology
  – ontology rules
  – software (OBO)
• The Ontology – a brief outline
The European Virtual Institute for Genome Annotation
Funded by the European Commission
BioSapiens
Network of Excellence
26 partners in 14 different countries
“The objective of the BIOSAPIENS Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists.”
• Many tools have been developed for the annotation of proteins – many make similar predictions.
• These tools come from a number of different labs in different locations
BioSapiens Genome Annotation
• DNA annotation: gene definition/alternative splicing; regulators and promoters; expression; variation (haplotypes and SNPs)
• Proteome annotation: protein families, orthologues; membrane proteins and ligands; 3D protein structure; post-translational modification and localisation
• Functional annotation: sequence and structure to function; protein-protein complexes; pathways and networks
How can we provide an integrated view of this information for the biologist?
• 69 sources from 19 partner sites, providing approximately 330 annotations.
• The information is provided but not functionally ordered.
• Without a defined ontology, accurate interpretation of these annotations is impossible.
• The servers providing annotations also need sensible IDs to allow adequate identification and administration.
Functional Grouping of Annotations
Integrating Annotations
Sequencing projects, structural genomics initiatives and ever-increasing experiment-based knowledge of biological systems.
1. Additional information needs to be added to already existing entries.
e.g. EMBL/GenBank/DDBJ
• Third Party Annotation (TPA) pilot study
• Entries submitted via the website, marked as TPA entries
• Checked carefully by curators before publication
UniProt: proposals
• The “adopt a protein” scheme: a research community in a particular area would be responsible for updating the information.
• Making use of “grey matter”: using the growing population of retired scientists at home, with broadband accounts and nothing to do.
• Quality and uniformity of curation is an issue: input fields (free text vs. drop-down menus).
Distributed Annotation System
• Allows a system of decentralised annotation.
Integrating Annotations
2. Manually curated databases are struggling with the influx of information.
• What it is
– The distributed annotation system (DAS) is a specification of a client-server system for sharing various types of sequence annotations.
– An “annotation” is an entity which is anchored to a reference subsequence with a start and a stop position, together with some information about the type and method of annotation, and possibly some other textual information.
– Today, DAS is used for serving positional annotations on genomes and on proteins, and for serving “global annotations” on genes.
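The notion of an annotation described above (a reference subsequence with start and stop, plus type, method and optional text) maps naturally onto a small record type. The following is a minimal Python sketch, not the data model of the DAS specification; the field names, the 1-based inclusive coordinates and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Illustrative positional annotation anchored to a reference sequence."""
    ref_id: str                 # reference sequence identifier (hypothetical)
    start: int                  # first position covered (assumed 1-based, inclusive)
    stop: int                   # last position covered (assumed inclusive)
    type: str                   # annotation type, ideally an ontology term
    method: str                 # how the annotation was produced
    note: Optional[str] = None  # optional free-text information

    def __post_init__(self) -> None:
        # A start/stop pair must describe a non-empty subsequence.
        if self.start < 1 or self.stop < self.start:
            raise ValueError(f"invalid range {self.start}-{self.stop}")

    def length(self) -> int:
        return self.stop - self.start + 1

    def overlaps(self, other: "Annotation") -> bool:
        # Same reference sequence and intersecting ranges.
        return (self.ref_id == other.ref_id
                and self.start <= other.stop
                and other.start <= self.stop)


# Made-up example values.
site = Annotation("EXAMPLE_SEQ", 58, 58, "metal_binding_site", "manual")
domain = Annotation("EXAMPLE_SEQ", 2, 142, "domain", "profile_search", note="example")
print(site.length(), site.overlaps(domain))  # 1 True
```

Because every annotation carries its own coordinates, annotations from different servers can be compared directly once they refer to the same reference sequence.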
DAS, the distributed annotation system
[Figure: Distributed Annotation System; viewers using the DAS protocol: Dasty2 (Rafael Jimenez) and Spice (Andreas Prlic)]
What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
   [Figure: information on metal binding sites from two sources]
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by both InterPro and UniProt).
   [Figure: duplications in the data]
3. Standardise the vocabulary used by each partner site. This will allow us to manipulate the data in a more powerful way.
   Standardisation of information provided by all DAS servers: sometimes annotation types on some servers are exactly the same as names on other servers.
   [Figure: table of servers and their annotation types]
4. Provide evidence for each annotation to give an indication of how the information can be used.
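Goals 1 to 3 work together: once each server's own type names are mapped onto shared terms, like annotations can be grouped and exact duplicates spotted. The Python below is a minimal sketch under stated assumptions: the mapping table, server names, type strings and coordinates are all hypothetical, and "exact duplicate" here means same term, same sequence and same coordinates reported by more than one source.

```python
from collections import defaultdict

# Hypothetical per-server vocabularies mapped onto shared ontology terms.
TYPE_MAP = {
    ("server_a", "METAL"): "metal_binding_site",
    ("server_b", "metal-binding"): "metal_binding_site",
    ("server_a", "Pfam"): "pfam_domain",
    ("server_b", "PFAM_DOMAIN"): "pfam_domain",
}

# (source, raw type, sequence id, start, stop); values are made up.
raw = [
    ("server_a", "METAL", "SEQ1", 40, 40),
    ("server_b", "metal-binding", "SEQ1", 40, 40),
    ("server_a", "Pfam", "SEQ1", 5, 120),
    ("server_b", "PFAM_DOMAIN", "SEQ1", 5, 120),
    ("server_b", "metal-binding", "SEQ1", 77, 77),
]


def standardise(records):
    """Replace each server-specific type with the shared term (KeyError if unmapped)."""
    return [(src, TYPE_MAP[(src, t)], seq, s, e) for src, t, seq, s, e in records]


def cluster_by_term(records):
    """Group annotations sharing a term so that sources can be compared."""
    clusters = defaultdict(list)
    for src, term, seq, s, e in records:
        clusters[term].append((src, seq, s, e))
    return dict(clusters)


def exact_duplicates(records):
    """Return (term, seq, start, stop) keys reported by more than one source."""
    sources = defaultdict(set)
    for src, term, seq, s, e in records:
        sources[(term, seq, s, e)].add(src)
    return {key: srcs for key, srcs in sources.items() if len(srcs) > 1}


std = standardise(raw)
print(sorted(cluster_by_term(std)))
print(sorted(exact_duplicates(std)))
```

The lookup deliberately fails on an unmapped type, so a new server vocabulary surfaces as an error rather than silently forming its own cluster.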
Evidence codes
• Each annotation must have at least one evidence code associated with it.
• Evidence codes can be selected from the Evidence Code Ontology (ECO).
It is up to each partner to decide the evidence codes for their own annotations, as each case is very individual.
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=ECO
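The "at least one evidence code" rule is easy to enforce automatically. This minimal sketch assumes evidence is recorded as ECO identifiers of the form `ECO:` followed by seven digits; the class and function names are hypothetical. It checks only presence and format, not whether an ID exists in ECO or suits the annotation, which stays each partner's decision.

```python
import re
from dataclasses import dataclass, field
from typing import List

# ECO identifiers: the prefix "ECO:" followed by seven digits.
ECO_ID = re.compile(r"ECO:[0-9]{7}")


@dataclass
class EvidencedAnnotation:
    feature_id: str                                    # hypothetical feature identifier
    term: str                                          # ontology term for the annotation
    evidence: List[str] = field(default_factory=list)  # ECO identifiers


def evidence_problems(ann: EvidencedAnnotation) -> List[str]:
    """Return problems found; an empty list means the annotation passes."""
    problems = []
    if not ann.evidence:
        problems.append(f"{ann.feature_id}: no evidence code")
    for code in ann.evidence:
        if not ECO_ID.fullmatch(code):
            problems.append(f"{ann.feature_id}: malformed evidence code {code!r}")
    return problems


good = EvidencedAnnotation("f1", "binding_site", ["ECO:0000269"])
missing = EvidencedAnnotation("f2", "binding_site")
malformed = EvidencedAnnotation("f3", "binding_site", ["ECO:269"])
for ann in (good, missing, malformed):
    print(ann.feature_id, evidence_problems(ann))
```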
Designing an Ontology
• The provision of a controlled vocabulary which can be shared between data sources.
• Needs approval of the community.
• The creation of terms and clustering can only be done properly by an expert in the field rather than an expert in ontologies.
• Clear goals are essential: which relationships are necessary, and what should they show?
• Increased complexity makes the work laborious and time-intensive.
• Continuous evolution.
Once agreed, the ontology will be deposited with the Sequence Ontology (SO) for maintenance.
Ontology Rules
Terms: computer friendly
• Phrase spacing: terms do not include white space, e.g. binding_site.
• Case: terms are always in lowercase except where demanded by context, e.g. mRNA.
• Abbreviations: if there is a common abbreviation, it is used for the name of the term, e.g. UTR.
• Symbols: symbols and Greek letters are generally spelled out in full. Full stops, slashes and hyphens are not allowed; underscores are used instead. Brackets (){}[] are not allowed.
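The naming rules above lend themselves to an automatic check before a term is proposed. This is a minimal Python sketch under stated assumptions: the whitelist of context-demanded capitalisations (`CASE_EXCEPTIONS`) is hypothetical and would need to be agreed, and flagging any non-ASCII character is a stand-in for the "spell out symbols and Greek letters" rule.

```python
from typing import FrozenSet, List

# Characters the rules forbid in term names: full stops, slashes, hyphens, brackets.
FORBIDDEN = frozenset("./-(){}[]")

# Hypothetical whitelist of tokens whose capitalisation is demanded by context.
CASE_EXCEPTIONS: FrozenSet[str] = frozenset({"mRNA", "UTR"})


def naming_problems(term: str,
                    exceptions: FrozenSet[str] = CASE_EXCEPTIONS) -> List[str]:
    """Return rule violations for a proposed term name (empty list = OK)."""
    problems = []
    if any(ch.isspace() for ch in term):
        problems.append("contains white space (use underscores)")
    bad = sorted({ch for ch in term if ch in FORBIDDEN})
    if bad:
        problems.append("forbidden characters: " + "".join(bad))
    if not term.isascii():
        problems.append("non-ASCII symbol (spell it out)")
    # Lowercase rule, applied per underscore-separated token.
    for token in term.split("_"):
        if token != token.lower() and token not in exceptions:
            problems.append(f"unexpected upper case in {token!r}")
    return problems


for name in ["binding_site", "five_prime_UTR", "binding site", "alpha-helix", "Binding_Site"]:
    print(name, naming_problems(name))
```

Checking case per underscore-separated token lets a whitelisted abbreviation such as UTR appear inside a longer lowercase term.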
Synonyms: they facilitate searching the ontology.
• Types of synonym: the long version of an abbreviated phrase, spelled out; different words that mean the same thing.
• Synonym rules: there is no limit on the number of synonyms; one synonym can be used more than once; synonyms do not have to be computer friendly. They can begin with numbers and include punctuation such as hyphens.
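Because synonyms exist to make searching easier, one natural use is an index from every term name and synonym to the term(s) it may denote. A minimal Python sketch, assuming a hypothetical synonym table (the entries are illustrative, not copied from SO) and a loose normalisation that ignores case and treats underscores as spaces:

```python
from collections import defaultdict
from typing import Dict, List, Set

# Illustrative term -> synonyms table (hypothetical entries).
SYNONYMS: Dict[str, List[str]] = {
    "UTR": ["untranslated region"],
    "five_prime_UTR": ["5' UTR", "five prime untranslated region"],
    "mRNA": ["messenger RNA"],
}


def normalise(text: str) -> str:
    """Case-fold, treat underscores as spaces and collapse runs of white space."""
    return " ".join(text.replace("_", " ").lower().split())


def build_index(synonyms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalised term name and synonym to the term(s) it can denote."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term, syns in synonyms.items():
        index[normalise(term)].add(term)
        for syn in syns:
            # A synonym may be shared by several terms, hence a set of targets.
            index[normalise(syn)].add(term)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    return index.get(normalise(query), set())


index = build_index(SYNONYMS)
print(search(index, "Untranslated Region"))  # {'UTR'}
print(search(index, "five prime UTR"))       # {'five_prime_UTR'}
```

Storing a set per key reflects the rule that one synonym can be used more than once.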
Definitions: each term should have a definition. A definition must have a reference to its origin (PubMed, database, website, the person that created it). The format of a definition:
• a bicycle: has two wheels
• a tandem: is a bicycle with two saddles and two sets of handlebars. (It inherits all the features of bicycle; therefore the definition of bicycle cannot state “a saddle and a set of handlebars”.)
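The bicycle/tandem rule can be checked mechanically if, purely as an assumption for this sketch, definitions are reduced to attribute counts. The Python below is a toy model (the names `inherited` and `contradictions` are hypothetical), not how SO records its text definitions: a child inherits its parent's attributes along is_a, and any inherited value the child contradicts points to a parent definition that states too much.

```python
from typing import Dict, List, Optional

# Toy representation (assumption): a definition is a set of attribute counts.
Definition = Dict[str, int]


def inherited(term: str, defs: Dict[str, Definition],
              parent: Dict[str, Optional[str]]) -> Definition:
    """Merge a term's definition with everything it inherits along is_a."""
    chain = []
    current: Optional[str] = term
    while current is not None:
        chain.append(current)
        current = parent[current]
    merged: Definition = {}
    for t in reversed(chain):  # root first, so more specific terms win
        merged.update(defs[t])
    return merged


def contradictions(term: str, defs: Dict[str, Definition],
                   parent: Dict[str, Optional[str]]) -> List[str]:
    """Attributes the term states differently from what it inherits."""
    p = parent[term]
    if p is None:
        return []
    from_parent = inherited(p, defs, parent)
    return sorted(k for k, v in defs[term].items()
                  if k in from_parent and from_parent[k] != v)


is_a = {"bicycle": None, "tandem": "bicycle"}
good = {"bicycle": {"wheels": 2},
        "tandem": {"saddles": 2, "handlebar_sets": 2}}
# Too specific: the parent fixes features a tandem must override.
bad = {"bicycle": {"wheels": 2, "saddles": 1, "handlebar_sets": 1},
       "tandem": {"saddles": 2, "handlebar_sets": 2}}

print(contradictions("tandem", good, is_a))  # []
print(contradictions("tandem", bad, is_a))   # ['handlebar_sets', 'saddles']
```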
Understanding relationships: currently there are three types of relationship in SO: is_a, part_of and derived_from.
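A minimal sketch of storing typed relationships, assuming only the three relationship types named above are permitted. The class `TermGraph` and the example edges are hypothetical illustrations, not taken from SO or from the BioSapiens ontology; `ancestors` follows one relation type transitively and defaults to is_a.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# The three relationship types named in the text.
ALLOWED_RELATIONS = frozenset({"is_a", "part_of", "derived_from"})


class TermGraph:
    """Typed, directed edges from a child term to a parent term (hypothetical class)."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, child: str, relation: str, parent: str) -> None:
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relationship type: {relation}")
        self.edges[child].add((relation, parent))

    def ancestors(self, term: str, relation: str = "is_a") -> Set[str]:
        """Terms reachable from `term` by repeatedly following one relation type."""
        seen: Set[str] = set()
        stack = [term]
        while stack:
            node = stack.pop()
            for rel, parent in self.edges.get(node, ()):
                if rel == relation and parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative edges only.
g = TermGraph()
g.add("mRNA", "is_a", "mature_transcript")
g.add("mature_transcript", "is_a", "transcript")
g.add("exon", "part_of", "transcript")
g.add("polypeptide", "derived_from", "mRNA")

print(sorted(g.ancestors("mRNA")))             # ['mature_transcript', 'transcript']
print(sorted(g.ancestors("exon", "part_of")))  # ['transcript']
```

Rejecting unknown relation types at insertion time keeps the graph within the agreed vocabulary of relationships.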
The OBO Editor
The Ontology
Still in draft form.
Acknowledgements
Gaby Reeves
Midori Harris (GO), Karen Eilbeck (SO)
Luisa Montecchi, Henning Hermjakob, Eugene Kulesha, Andreas Prlic
Members of UniProt (EBI and SIB):
Alan Bridge, Michele Magrane, Clare O’Donovan and Anne-Lise Veuthey
BioSapiens Workshop held in February:
University of Bologna, CNIO, University of Dundee, EBI, ENZIM Hungary, Hebrew University, MPI, Sanger and UCL