1 SRI International Bioinformatics
GO Term Integration and Curation in Pathway Tools
and EcoCyc
Ingrid M. Keseler
Bioinformatics Research Group
SRI International
2 SRI International Bioinformatics
History of Classification and GO terms in EcoCyc
The MultiFun classification scheme was/is used for
gene/gene product classification in EcoCyc.
Developed by Monica Riley and collaborators
Hierarchical classification scheme with 10 major categories for cellular function
In 2005, we began to add support for adding GO terms to genes/gene products.
3 SRI International Bioinformatics
Why go with GO?
GO has become the standard ontology/classification scheme for gene products
GO is being actively developed with input from the user communities
GO allows standardization of annotation across all domains of life
Data mining across genomes
Genome annotation by similarity (e.g. via InterPro, Pfam, TIGRFAM, COG mappings)
Tools that take advantage of GO annotations, e.g. microarray data clustering
4 SRI International Bioinformatics
The Evolution of GO Within EcoCyc
1. 12/2005 -- Mapping of MultiFun terms to GO terms (multifun2go – Ashburner and Lomax): multiple specific GO terms were sometimes mapped to one general MultiFun term, resulting in misleading GO term annotations in EcoCyc; no evidence codes or citations
2. 12/2007 -- Mapping of EC reactions to GO terms (ec2go): imported GO terms for enzymes that catalyze reactions with full EC number assignments; no evidence codes or citations
5 SRI International Bioinformatics
The Evolution of GO Within EcoCyc
3. 4/2008 -- Importing GO term assignments from UniProt; mostly computational evidence codes
4. Since ~2007 -- Manual curation of GO terms based on publications, with evidence codes (mostly experimental) and literature citations
5. Since ~2008 -- EcoCyc and EcoliWiki are the source of the official E. coli gene-association file (in collaboration with J. Hu and D. Siegele, EcoliWiki, Texas A&M)
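As an illustration of the ec2go step, the sketch below reads mapping lines and keeps only those with a full four-part EC number. It is not the Pathway Tools importer. The line shape (`EC:… > GO:term name ; GO:id`) is assumed from the commonly distributed ec2go file. The function names and sample lines are hypothetical.

```python
import re

# Assumed ec2go line shape:
#   EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022
EC2GO_LINE = re.compile(
    r"^EC:(?P<ec>\S+)\s+>\s+GO:(?P<name>.+?)\s+;\s+(?P<go_id>GO:\d{7})\s*$"
)


def is_full_ec(ec):
    """True only for a complete four-part numeric EC number such as 1.1.1.1."""
    parts = ec.split(".")
    return len(parts) == 4 and all(p.isdigit() for p in parts)


def parse_ec2go(lines):
    """Return {full EC number: [GO ids]}, skipping comments and partial ECs."""
    mapping = {}
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # comment or blank line
        m = EC2GO_LINE.match(line)
        if not m or not is_full_ec(m.group("ec")):
            continue  # unparseable line, or a partial EC class like 1.1.1.-
        mapping.setdefault(m.group("ec"), []).append(m.group("go_id"))
    return mapping


# Illustrative sample lines, not copied from a real release of the file.
sample = [
    "! hypothetical header comment",
    "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
    "EC:1.1.1.- > GO:an illustrative general activity ; GO:0016616",
]
full_ec_terms = parse_ec2go(sample)
```

Annotations produced this way carry no evidence code or citation of their own, which is the limitation noted for steps 1 and 2.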
6 SRI International Bioinformatics
Of Requirements and Differences
Specific requirements for GO gene-association file
Presence of evidence codes and citations
Pathway Tools uses a different evidence code ontology; it is therefore necessary to map the evidence codes carefully
Some types of evidence require use of a With/From qualifier in GO, e.g. IPI, ISS
Annotation with other qualifiers is not required by GO (e.g. NOT, contributes_to, colocalizes_with) and is not (yet) supported by Pathway Tools
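The requirements above can be sketched as a small check that runs before an annotation is exported. The mapping table is a hypothetical illustration and not the mapping used by Pathway Tools. The Pathway Tools-style code names on the left are assumptions, as is the function name. Only the need for a With/From value on IPI and ISS comes from the slide.

```python
# Hypothetical mapping from Pathway Tools-style evidence codes to GO
# evidence codes. The real mapping is not given in this talk.
PTOOLS_TO_GO = {
    "EV-EXP-IDA": "IDA",
    "EV-EXP-IMP": "IMP",
    "EV-EXP-IPI": "IPI",
    "EV-COMP-HINF": "ISS",
    "EV-COMP-AINF": "IEA",
}

# Evidence codes named on the slide as requiring a With/From value.
NEEDS_WITH_FROM = {"IPI", "ISS"}


def to_go_annotation(go_id, ptools_code, citation, with_from=""):
    """Build one annotation record, or raise ValueError if it is incomplete."""
    go_code = PTOOLS_TO_GO.get(ptools_code)
    if go_code is None:
        raise ValueError(f"no GO evidence code mapped for {ptools_code!r}")
    if not citation:
        raise ValueError("a citation is required for the gene-association file")
    if go_code in NEEDS_WITH_FROM and not with_from:
        raise ValueError(f"{go_code} requires a With/From value")
    return {
        "go_id": go_id,
        "evidence": go_code,
        "reference": citation,
        "with_from": with_from,
    }


# Hypothetical usage; the citation and accession are placeholders.
ok = to_go_annotation("GO:0004022", "EV-EXP-IDA", "PMID:0000000")
```

Rejecting an unmapped code, instead of defaulting it to some GO code, keeps a mapping gap visible to the curator.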
7 SRI International Bioinformatics
Tools for the Curator
GO classification editor is accessible via the protein editor
GO database can be searched in the editor; term definitions are available
Tools available locally (ask developers about general availability):
Import new GO database (for newly created terms etc.)
Export gene-association file
8 SRI International Bioinformatics
Manual Curation of GO terms
Ongoing when we curate or re-curate gene products within EcoCyc
No particular effort to back-fill GO terms; e.g. metabolic enzymes get experimental GO term assignments when we re-curate old metabolic pathways, or when new literature appears
Texas A&M team is part of the Reference Genome Annotation Project; GO term assignments from EcoliWiki get imported into EcoCyc on a regular basis
9 SRI International Bioinformatics
GO Term Statistics for E. coli (8/2009)
3721 gene products annotated with at least one GO term
42724 total GO term annotations, of which 6330 are non-IEA annotations
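With the figures above, non-IEA annotations make up 6330 / 42724, or roughly 14.8%, of the total. Counts of this kind can be derived from a gene-association file along the lines of the sketch below. It assumes the tab-separated GAF layout, with the database in column 1, the object ID in column 2, and the evidence code in column 7. The function name and sample rows are hypothetical.

```python
def gaf_stats(lines):
    """Count annotated gene products, total annotations, and non-IEA annotations."""
    products = set()
    total = 0
    non_iea = 0
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # too short to carry an evidence code
        products.add((cols[0], cols[1]))  # (database, object ID)
        total += 1
        if cols[6] != "IEA":
            non_iea += 1
    return {"gene_products": len(products), "annotations": total, "non_iea": non_iea}


def row(obj_id, go_id, evidence):
    """Build a made-up annotation row; only columns 1-7 are filled in."""
    return "\t".join(["ExampleDB", obj_id, obj_id, "", go_id, "PMID:0000000", evidence])


# Made-up rows for illustration only.
sample_rows = [
    "!gaf-version: 2.0",
    row("G0001", "GO:0004022", "IDA"),
    row("G0001", "GO:0016491", "IEA"),
    row("G0002", "GO:0005515", "IEA"),
]
stats = gaf_stats(sample_rows)
```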
10 SRI International Bioinformatics
Acknowledgements
Peter Karp
Suzanne Paley
Markus Krummenacker
Tomer Altman
Jim Hu
Debby Siegele
GO experts at the GO consortium