SRI International Bioinformatics

GO Term Integration and Curation in Pathway Tools and EcoCyc

Ingrid M. Keseler
Bioinformatics Research Group
SRI International
[email protected]


History of Classification and GO terms in EcoCyc

The MultiFun classification scheme was, and still is, used for gene and gene product classification in EcoCyc.

Developed by Monica Riley and collaborators

Hierarchical classification scheme with 10 major categories of cellular function

In 2005, we began adding support for annotating genes and gene products with GO terms.

Why go with GO?

GO has become the standard ontology/classification scheme for gene products

GO is being actively developed with input from the user communities

GO allows annotation to be standardized across all domains of life, which enables:

Data mining across genomes

Genome annotation by similarity (e.g., via InterPro, Pfam, TIGRFAM, and COG mappings)

Tools that take advantage of GO annotations, e.g., clustering of microarray data
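Annotation by similarity via domain mappings can be illustrated with a short Python sketch. It assumes mapping lines in the style of GO's external2go files (e.g. `InterPro:IPR000001 Kringle > GO:protein binding ; GO:0005515`) and a hypothetical `domain_hits` dictionary of per-protein InterPro matches; `parse_ext2go` and `transfer_go_terms` are illustrative names, not part of Pathway Tools. Terms transferred this way are electronic annotations and would carry the IEA evidence code.

```python
import re
from collections import defaultdict

# Assumed external2go line format: "<DB:ID> [description] > GO:<term name> ; GO:<7 digits>"
LINE_RE = re.compile(r"^(\S+:\S+)\s.*>\s*GO:.*?\s*;\s*(GO:\d{7})\s*$")


def parse_ext2go(lines):
    """Return {external_id: {go_id, ...}} from external2go-style lines."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and "!" comment lines
        m = LINE_RE.match(line)
        if m:
            mapping[m.group(1)].add(m.group(2))
    return mapping


def transfer_go_terms(domain_hits, mapping):
    """Assign GO terms to proteins through their domain/family hits.

    domain_hits: {protein_id: [external_id, ...]} (hypothetical input)
    Returns {protein_id: sorted GO IDs}; all such terms would be IEA.
    """
    annotations = {}
    for protein, hits in domain_hits.items():
        terms = set()
        for hit in hits:
            terms |= mapping.get(hit, set())
        annotations[protein] = sorted(terms)
    return annotations
```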

The Evolution of GO Within EcoCyc

1. 12/2005 -- Mapping of MultiFun terms to GO terms (multifun2go, by Ashburner and Lomax): because several specific GO terms were sometimes mapped to a single general MultiFun term, this produced misleading GO term annotations in EcoCyc; the annotations carried no evidence codes or citations

2. 12/2007 -- Mapping of EC numbers to GO terms (ec2go): GO terms were imported for enzymes that catalyze reactions with complete EC numbers (all four levels assigned); no evidence codes or citations

3. 4/2008 -- Import of GO term assignments from UniProt, mostly with computational evidence codes

4. Since ~2007 -- Manual curation of GO terms based on publications, with evidence codes (mostly experimental) and literature citations

5. Since ~2008 -- EcoCyc and EcoliWiki are the source of the official E. coli gene-association file (in collaboration with J. Hu and D. Siegele, EcoliWiki, Texas A&M)
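The two early mapping-based imports (items 1 and 2) can be sketched in Python as follows. The sketch assumes an already parsed ec2go table keyed by bare EC numbers and treats an EC number as complete only when all four levels are numeric, so a partial number such as `2.7.-.-` contributes no terms. The `one_to_many` helper flags source terms (for instance a general MultiFun class) that map to more than one GO term, the situation that produced misleading annotations in the multifun2go import. All names are hypothetical, and the output deliberately has no evidence codes or citations, mirroring those early imports.

```python
import re

# Complete EC number: four numeric levels, e.g. "1.1.1.1" (assumed criterion).
FULL_EC = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_full_ec(ec):
    return FULL_EC.fullmatch(ec) is not None


def ec2go_annotations(enzyme_ecs, ec2go):
    """Collect GO terms for each enzyme from its complete EC numbers only.

    enzyme_ecs: {enzyme_id: [ec_number, ...]}  (hypothetical input)
    ec2go:      {ec_number: {go_id, ...}}      (keys without the "EC:" prefix)
    Returns {enzyme_id: sorted GO IDs}, with no evidence codes or citations.
    """
    result = {}
    for enzyme, ecs in enzyme_ecs.items():
        terms = set()
        for ec in ecs:
            if is_full_ec(ec):
                terms |= ec2go.get(ec, set())
        result[enzyme] = sorted(terms)
    return result


def one_to_many(mapping):
    """Return source terms that map to more than one GO term (review candidates)."""
    return sorted(src for src, targets in mapping.items() if len(targets) > 1)
```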

Of Requirements and Differences

Specific requirements for the GO gene-association file:

Evidence codes and citations must be present

Pathway Tools uses a different evidence code ontology, so evidence codes must be mapped carefully

Some types of evidence require the With/From field to be filled in, e.g., IPI, ISS

Annotation with qualifiers (e.g., NOT, contributes_to, colocalizes_with) is not required by GO and is not (yet) supported by Pathway Tools
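A minimal Python sketch of the evidence-code translation step might look like this. The Pathway Tools codes in the table and their GO equivalents are assumptions for illustration and should be checked against the current evidence ontology before use; the only rule taken from the slide is that IPI and ISS annotations need a With/From value. Raising an error instead of silently dropping the annotation makes unmapped or incomplete records visible to the curator.

```python
# Hypothetical Pathway Tools -> GO evidence-code table (illustrative only).
PT_TO_GO_EVIDENCE = {
    "EV-EXP-IDA": "IDA",    # inferred from direct assay
    "EV-EXP-IMP": "IMP",    # inferred from mutant phenotype
    "EV-EXP-IPI": "IPI",    # inferred from physical interaction
    "EV-AS-TAS": "TAS",     # traceable author statement
    "EV-COMP-HINF-SIMILAR-TO-CONSENSUS": "ISS",  # assumed code name
    "EV-COMP-AINF": "IEA",  # automated (electronic) inference
}

# GO evidence codes that, per the slide, require a With/From value.
REQUIRES_WITH_FROM = {"IPI", "ISS"}


def to_go_evidence(pt_code, with_from=""):
    """Translate a Pathway Tools evidence code into a (GO code, With/From) pair.

    Raises ValueError if the code has no mapping or a required With/From
    value is missing, so such annotations are not exported unnoticed.
    """
    go_code = PT_TO_GO_EVIDENCE.get(pt_code)
    if go_code is None:
        raise ValueError(f"no GO evidence mapping for {pt_code!r}")
    if go_code in REQUIRES_WITH_FROM and not with_from:
        raise ValueError(f"{go_code} annotations require a With/From value")
    return go_code, with_from
```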

Tools for the Curator

The GO classification editor is accessible from the protein editor

The GO database can be searched from within the editor; term definitions are available

Tools available locally (ask the developers about general availability):

Import of a new GO database (e.g., to obtain newly created terms)

Export of the gene-association file
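The export step could be sketched as below, writing tab-separated rows in the 17-column GAF 2.0 layout (the older GAF 1.0 format has 15 columns). The `GoAnnotation` class, the default `EcoCyc` database label, and the default taxon `taxon:83333` (E. coli K-12) are assumptions for illustration, not the actual Pathway Tools exporter. The Qualifier column is always left empty because Pathway Tools does not (yet) support qualifiers such as NOT.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GoAnnotation:
    db_object_id: str
    symbol: str
    go_id: str
    reference: str            # e.g. "PMID:..."
    evidence: str             # GO evidence code, e.g. "IDA"
    aspect: str               # "F", "P" or "C"
    with_from: str = ""
    name: str = ""
    synonyms: list = field(default_factory=list)
    obj_type: str = "protein"
    taxon: str = "taxon:83333"          # assumed default: E. coli K-12
    assigned_by: str = "EcoCyc"         # assumed default
    annotated_on: date = field(default_factory=date.today)


def to_gaf_line(a: GoAnnotation, db: str = "EcoCyc") -> str:
    """Format one annotation as a 17-column GAF 2.0 row."""
    cols = [
        db, a.db_object_id, a.symbol,
        "",                             # 4: Qualifier (NOT etc. not exported)
        a.go_id, a.reference, a.evidence, a.with_from, a.aspect,
        a.name, "|".join(a.synonyms), a.obj_type, a.taxon,
        a.annotated_on.strftime("%Y%m%d"), a.assigned_by,
        "", "",                         # 16: extension, 17: gene product form ID
    ]
    return "\t".join(cols)


def write_gaf(annotations, path):
    """Write a header line followed by one GAF row per annotation."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("!gaf-version: 2.0\n")
        for a in annotations:
            fh.write(to_gaf_line(a) + "\n")
```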

Manual Curation of GO terms

Manual GO curation is ongoing: it happens whenever we curate or re-curate gene products in EcoCyc

We make no particular effort to back-fill GO terms; for example, metabolic enzymes receive experimental GO term assignments when we re-curate older metabolic pathways or when new literature appears

The Texas A&M team is part of the Reference Genome Annotation Project; GO term assignments from EcoliWiki are imported into EcoCyc on a regular basis
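Regular imports from EcoliWiki raise the question of how to avoid duplicating annotations that EcoCyc already holds. One simple approach, sketched here under the assumption that two annotations are duplicates when gene, GO term, evidence code, reference, and With/From all match, is to merge on that key while preserving order; the `Annotation` tuple and `merge_annotations` function are hypothetical, and a production pipeline may well use a different key.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str
    reference: str
    with_from: str = ""


def merge_annotations(
    existing: Iterable[Annotation], imported: Iterable[Annotation]
) -> Tuple[List[Annotation], List[Annotation]]:
    """Merge imported annotations into existing ones, skipping exact duplicates.

    Duplicates are detected on all five fields (an assumed key).
    Returns (merged, newly_added), both in first-seen order.
    """
    merged = list(existing)
    seen = set(merged)
    added = []
    for ann in imported:
        if ann not in seen:
            seen.add(ann)
            merged.append(ann)
            added.append(ann)
    return merged, added
```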

GO Term Statistics for E. coli (8/2009)

3721 gene products annotated with at least one GO term

42724 total GO term annotations, of which 6330 are non-IEA annotations (IEA: Inferred from Electronic Annotation)
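Counts like these can be recomputed directly from a gene-association file. From the figures above, 42724 - 6330 = 36394 annotations (about 85%) carry the IEA code. The Python sketch below assumes GAF-style tab-separated rows (object ID in column 2, GO ID in column 5, evidence code in column 7), skips comment lines beginning with `!`, and does not treat NOT-qualified rows specially; the function names are illustrative.

```python
from typing import Dict, Iterable, Iterator, Tuple


def rows_from_gaf(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (object ID, GO ID, evidence code) from GAF-style lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[1], cols[4], cols[6]


def go_stats(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Count annotated gene products, total annotations, and non-IEA annotations."""
    products = set()
    total = non_iea = 0
    for product, _go_id, evidence in rows:
        products.add(product)
        total += 1
        if evidence != "IEA":
            non_iea += 1
    return {"annotated_products": len(products), "total": total, "non_iea": non_iea}
```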

Acknowledgements

Peter Karp, Suzanne Paley, Markus Krummenacker, Tomer Altman

Jim Hu, Debby Siegele

GO experts at the GO Consortium