1
Gene Ontology and Functional Annotation
Donghui Li
ASPB Plant Biology, June 29, 2008, Merida
2
TAIR literature statistics
                            May 2007   May 2008
References                    31,058     34,179
Research articles             22,640     25,001
Full-text papers              15,572     16,638
Average new papers/month         204        216
Loci with valid references     9,289     10,847
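The table invites a quick growth check. The sketch below recomputes the absolute and percent change between the two snapshots; the `STATS` dictionary and the `growth` helper are illustrative names, and the only data used are the counts shown above.

```python
# Counts from the TAIR literature statistics table: (May 2007, May 2008).
STATS = {
    "References": (31_058, 34_179),
    "Research articles": (22_640, 25_001),
    "Full-text papers": (15_572, 16_638),
    "Average new papers/month": (204, 216),
    "Loci with valid references": (9_289, 10_847),
}


def growth(before: int, after: int) -> tuple[int, float]:
    """Return the absolute change and the percent change from before to after."""
    delta = after - before
    return delta, 100.0 * delta / before


for label, (may07, may08) in STATS.items():
    delta, pct = growth(may07, may08)
    print(f"{label:28s} +{delta:>6,d} ({pct:.1f}%)")
```

Run as is, it shows that loci with valid references grew fastest over the year (+1,558, about 16.8%), ahead of the roughly 10% growth in references.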
3
Outline
• Functional annotation
• Controlled vocabularies: GO and PO
• Functional annotation at TAIR
• Community annotation
4
Functional annotation
Functional annotation is defined as the process of collecting information about a gene’s biological identity:
• molecular function (protein kinase)
• biological roles (protein phosphorylation)
• subcellular localization (cytoplasm)
• aliases
• mutant phenotype
• expression domain
5
What is an annotation?
An annotation is a statement that a gene product …
…has a particular molecular function
…is involved in a particular biological process
…is located within a certain cellular component
…as determined by a particular method
…as described in a particular reference
Adapted from Harold J Drabkin, The Jackson Laboratory
6
Smith et al. (2006) determined by an enzyme assay that Abc2 has protein kinase activity, is involved in the process of protein phosphorylation, and is located in the cytoplasm.
• Reference: Smith et al. (2006)
• Evidence code: determined by an enzyme assay
• Controlled vocabularies: protein kinase activity, protein phosphorylation, cytoplasm
• Gene product: Abc2
Adapted from Harold J Drabkin, The Jackson Laboratory
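The example sentence packs three separate annotations (one function, one process, one component) into a single statement. The sketch below makes that split explicit with a small record type. The `Annotation` class and its field names are hypothetical; the evidence code IDA (Inferred from Direct Assay) for an enzyme assay is an assumption, and the GO identifiers are the standard ones for these three terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: gene product + term + evidence + reference."""
    gene_product: str
    aspect: str      # "F" molecular function, "P" biological process, "C" cellular component
    go_id: str
    term: str
    evidence: str    # evidence code, e.g. "IDA" (Inferred from Direct Assay)
    reference: str

    def __post_init__(self) -> None:
        # Reject records that do not name one of the three GO aspects.
        if self.aspect not in {"F", "P", "C"}:
            raise ValueError(f"unknown aspect: {self.aspect!r}")
        if not self.go_id.startswith("GO:"):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


# One sentence from the paper becomes three annotations, one per aspect.
REF = "Smith et al. (2006)"
annotations = [
    Annotation("Abc2", "F", "GO:0004672", "protein kinase activity", "IDA", REF),
    Annotation("Abc2", "P", "GO:0006468", "protein phosphorylation", "IDA", REF),
    Annotation("Abc2", "C", "GO:0005737", "cytoplasm", "IDA", REF),
]

for a in annotations:
    print("\t".join([a.gene_product, a.aspect, a.go_id, a.term, a.evidence, a.reference]))
```

Keeping one record per aspect mirrors how annotations are usually stored and queried: each row can be filtered by aspect, term or evidence on its own.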
7
Controlled vocabulary (CV)
Non-controlled vocabulary
• same name, different concept
• different name, same concept
Controlled vocabulary
• a standardized, restricted set of defined terms designed to reduce ambiguity in describing a concept
8
Same name, different concept
Cell
9
Same name, different concept
germination
• seed germination
• pollen germination
• spore germination
10
Different name, same concept
• glucose biosynthesis = glucose synthesis = glucose formation = glucose anabolism = gluconeogenesis (glucose formed from noncarbohydrate precursors: pyruvate, amino acids and glycerol)
• phytochromobilin synthase activity = phytochromobilin:ferredoxin oxidoreductase activity ((3Z)-phytochromobilin + oxidized ferredoxin = biliverdin IXα + reduced ferredoxin; EC:1.3.7.4)
• translation = protein biosynthesis = protein formation
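Both problems above (one name for several concepts, several names for one concept) can be handled by a lookup that maps every accepted name to a single preferred term and refuses bare ambiguous names. The sketch below is a minimal, hypothetical vocabulary built only from the examples on these slides; `CV`, `AMBIGUOUS` and `normalize` are illustrative names, not part of GO or any GO tool.

```python
# Hypothetical mini CV: preferred term -> exact synonyms (from the slide examples).
CV = {
    "gluconeogenesis": ["glucose biosynthesis", "glucose synthesis",
                        "glucose formation", "glucose anabolism"],
    "translation": ["protein biosynthesis", "protein formation"],
    "phytochromobilin synthase activity":
        ["phytochromobilin:ferredoxin oxidoreductase activity"],
    "seed germination": [],
    "pollen germination": [],
    "spore germination": [],
}

# Bare names that cover several distinct concepts ("same name, different concept").
AMBIGUOUS = {"germination": ["seed germination", "pollen germination", "spore germination"]}

# Index every name (preferred or synonym), case-insensitively, to its preferred term.
LOOKUP: dict[str, str] = {}
for preferred, synonyms in CV.items():
    for name in [preferred, *synonyms]:
        LOOKUP[name.lower()] = preferred


def normalize(name: str) -> str:
    """Map a free-text name to its preferred CV term; refuse ambiguous or unknown names."""
    key = name.strip().lower()
    if key in AMBIGUOUS:
        raise ValueError(f"{name!r} is ambiguous; choose one of {AMBIGUOUS[key]}")
    if key not in LOOKUP:
        raise KeyError(f"{name!r} is not in the vocabulary")
    return LOOKUP[key]


print(normalize("protein biosynthesis"))  # translation
print(normalize("Glucose anabolism"))     # gluconeogenesis
```

Raising an error on "germination" forces the curator to pick the specific concept, which is exactly the ambiguity a controlled vocabulary is meant to remove.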
11
Cross-species, cross-database comparison is problematic without a CV
• translation vs. protein biosynthesis
• phytochromobilin synthase activity vs. phytochromobilin:ferredoxin oxidoreductase activity
12
Cross-species, cross-database comparison is problematic without a CV
• germination (of pollen? of spores?) vs. seed germination, pollen germination, spore germination
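To see why comparison breaks down, consider two hypothetical databases that describe the same biology with different names. The sketch compares their term sets first as raw strings, then after mapping each name to a shared identifier. The `CV:` identifiers are illustrative stand-ins, not real GO IDs.

```python
# Two hypothetical databases using different names for the same concepts.
db_a = {"translation", "phytochromobilin synthase activity"}
db_b = {"protein biosynthesis", "phytochromobilin:ferredoxin oxidoreductase activity"}

# Hypothetical shared CV: every accepted name maps to one identifier per concept.
TO_CV = {
    "translation": "CV:translation",
    "protein biosynthesis": "CV:translation",
    "phytochromobilin synthase activity": "CV:phytochromobilin_synthase",
    "phytochromobilin:ferredoxin oxidoreductase activity": "CV:phytochromobilin_synthase",
}

# Comparing raw strings finds nothing in common ...
naive_overlap = db_a & db_b
# ... while comparing shared identifiers finds both concepts.
cv_overlap = {TO_CV[t] for t in db_a} & {TO_CV[t] for t in db_b}

print(len(naive_overlap))  # 0
print(sorted(cv_overlap))  # ['CV:phytochromobilin_synthase', 'CV:translation']
```

The design point is that comparison happens on identifiers, never on display names, so renaming a term or adding a synonym does not break cross-database queries.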
13
Controlled vocabularies used by TAIR
• GO: The Gene Ontology, Gene Ontology Consortium
• PO: The Plant Ontology, Plant Ontology Consortium
14
Gene Ontology
• molecular function: catalytic or binding activities (kinase activity, DNA binding activity, transcription factor activity)
• biological process: biological goal or objective (signal transduction, mitosis, purine metabolism)
• cellular component: location or complex (nucleus, ribosome, proteasome)
15
Term
16
Ontology structure: directed acyclic graph (DAG)
DAG: each child may have one or more parents
[Diagram: a child term linked to two parents, parent 1 and parent 2]
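What separates a DAG from a tree is that a child may have several parents; what separates it from an arbitrary graph is that following parent links never leads back to the starting term. The sketch below checks the second property with Kahn's algorithm over a child-to-parents mapping. The function name `is_dag` and the input format are assumptions for illustration.

```python
from collections import deque


def is_dag(parents: dict[str, list[str]]) -> bool:
    """Return True if the child -> parents graph contains no cycle (Kahn's algorithm)."""
    nodes = set(parents)
    for ps in parents.values():
        nodes.update(ps)
    # For each node, count the child -> parent edges pointing at it.
    indegree = {n: 0 for n in nodes}
    for ps in parents.values():
        for p in ps:
            indegree[p] += 1
    # Start from terms that have no children, then peel the graph upward.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for p in parents.get(node, []):
            indegree[p] -= 1
            if indegree[p] == 0:
                queue.append(p)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)


print(is_dag({"child": ["parent 1", "parent 2"]}))  # True: two parents are allowed
print(is_dag({"a": ["b"], "b": ["a"]}))             # False: a cycle
```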
17
Ontology structure: directed acyclic graph (DAG)
[Diagram: fatty acid beta-oxidation multienzyme complex with two parents, protein complex and mitochondrion; mitochondrion under organelle]
18
Ontology structure: term-term relationships
[Diagram: fatty acid beta-oxidation multienzyme complex is-a protein complex; fatty acid beta-oxidation multienzyme complex part-of mitochondrion; mitochondrion is-a organelle]
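With typed edges, the useful question is "what is this term, and what is it part of, all the way up?". The sketch walks the DAG breadth-first from one term and records the chain of relationships that reaches each ancestor. The edges are our reading of the diagram on this slide (the complex is-a protein complex and part-of mitochondrion; mitochondrion is-a organelle); `EDGES` and `ancestors` are illustrative names.

```python
from collections import deque

# Edges as read from the slide's diagram: (child, relationship, parent).
EDGES = [
    ("fatty acid beta-oxidation multienzyme complex", "is-a", "protein complex"),
    ("fatty acid beta-oxidation multienzyme complex", "part-of", "mitochondrion"),
    ("mitochondrion", "is-a", "organelle"),
]


def ancestors(term: str, edges: list[tuple[str, str, str]]) -> dict[str, str]:
    """Breadth-first walk up the DAG; map each ancestor to the relationship chain reaching it."""
    up: dict[str, list[tuple[str, str]]] = {}
    for child, rel, parent in edges:
        up.setdefault(child, []).append((rel, parent))
    found: dict[str, str] = {}
    queue = deque([(term, "")])
    while queue:
        node, path = queue.popleft()
        for rel, parent in up.get(node, []):
            # A term may be reachable by several paths; BFS keeps the first (shortest) one.
            if parent not in found:
                found[parent] = f"{path} {rel}".strip()
                queue.append((parent, found[parent]))
    return found


for anc, path in ancestors("fatty acid beta-oxidation multienzyme complex", EDGES).items():
    print(f"{path:15s} {anc}")
```

A chain such as part-of followed by is-a is conventionally read as part-of: the complex is part of an organelle. This is why a gene annotated to a specific term is also implicitly associated with its ancestors.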
19
Gene Ontology browser: AmiGO
http://www.geneontology.org
http://amigo.geneontology.org
20
Plant Ontology
• Plant structure: morphological and anatomical structures (stamen, petal, guard cell)
• Growth and developmental stages: whole-plant growth stages and plant structure developmental stages (seedling growth, rosette growth, leaf development stages, embryo development stages)
21
How are annotations made?
An annotation associates a gene with a term, supported by evidence from a reference:
• gene: AT5G27620
• term: GO:0004672 protein kinase activity
• evidence: kinase assay
• reference: The Plant Journal (2006) 47:701
22
Evidence codes
Experimental evidence codes
• EXP - Inferred from Experiment
• IMP - Inferred from Mutant Phenotype
• IDA - Inferred from Direct Assay
• IGI - Inferred from Genetic Interaction
• IPI - Inferred from Physical Interaction
• IEP - Inferred from Expression Pattern
Computational analysis evidence codes
• ISS - Inferred from Sequence or structural Similarity
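Evidence codes are what let users separate experimentally supported annotations from computational ones. The sketch groups the codes listed on this slide and filters a small annotation list down to the experimental ones. Recording the kinase assay for AT5G27620 as IDA is an assumption; GENE_X and GENE_Y are hypothetical placeholders, not real loci.

```python
# Evidence codes from the slide, grouped by category.
EXPERIMENTAL = {"EXP", "IMP", "IDA", "IGI", "IPI", "IEP"}
COMPUTATIONAL = {"ISS"}


def evidence_category(code: str) -> str:
    """Classify an evidence code; codes not on this slide (GO defines more) are 'other'."""
    if code in EXPERIMENTAL:
        return "experimental"
    if code in COMPUTATIONAL:
        return "computational"
    return "other"


# (locus, GO ID, evidence code)
annotations = [
    ("AT5G27620", "GO:0004672", "IDA"),  # kinase assay, assumed to be recorded as IDA
    ("GENE_X", "GO:0004672", "ISS"),     # hypothetical
    ("GENE_Y", "GO:0006468", "IMP"),     # hypothetical
]

experimental = [a for a in annotations if evidence_category(a[2]) == "experimental"]
for locus, go_id, code in experimental:
    print(locus, go_id, code)
```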
24
Functional annotation of the Arabidopsis genome using GO (May 2008)
[Chart categories: Known; Known, EXP; Unannotated; Unknown]
25
Search GO Annotations
31
Papers entered into TAIR (May 07 to May 08)
                               Total   With gene-related data   Indexed   Curated
Papers in priority 1 journals    222                      166      100%   144 (86%)
Papers in priority 2 journals    546                      385      100%   207 (54%)
Papers in priority 3 journals    517                      314      100%    31 (10%)
Papers in priority 4 journals   1291                      461      100%    11 (2%)
Total                           2576                     1326      1326   393 (30%)
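The table's totals and curated shares can be checked directly from the per-priority counts. The sketch sums the columns and recomputes the curated fraction, taking papers with gene-related data as the denominator (the only choice consistent with the percentages shown). `ROWS` is an illustrative name.

```python
# Counts from the table: (total papers, papers with gene-related data, curated papers).
ROWS = {
    "Priority 1": (222, 166, 144),
    "Priority 2": (546, 385, 207),
    "Priority 3": (517, 314, 31),
    "Priority 4": (1291, 461, 11),
}

# Column-wise sums reproduce the "Total" row.
totals = tuple(sum(col) for col in zip(*ROWS.values()))

for label, (total, gene_data, curated) in [*ROWS.items(), ("Total", totals)]:
    # Curated share relative to papers with gene-related data.
    pct = 100 * curated / gene_data
    print(f"{label:10s} total={total:5d} gene-data={gene_data:5d} curated={curated:4d} ({pct:.1f}%)")
```

The recomputed shares round to the table's 54%, 10%, 2% and 30%. For priority 1 journals, 144/166 is 86.7%, which the slide lists as 86%.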
32
TAIR - Plant Physiology collaboration
• Author submits annotation after the paper is accepted
• Web-based interface
• AGI locus identifier (At1g01040)
• Gene function annotation linked to loci with method
• Will expand to include other journals (Plant Cell ...)
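Because author submissions are keyed by AGI locus identifier, a submission form would typically validate that field before accepting an annotation. The sketch assumes the usual AGI pattern ("At", a chromosome of 1-5, C for chloroplast or M for mitochondrion, "g", then five digits, case-insensitive); `AGI_PATTERN` and `normalize_agi` are hypothetical names, not TAIR's actual implementation.

```python
import re

# Assumed AGI pattern: At + chromosome (1-5, C, M) + g + five digits, any case.
AGI_PATTERN = re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE)


def normalize_agi(locus: str) -> str:
    """Validate a submitted AGI locus identifier and return it in upper case."""
    candidate = locus.strip()
    if not AGI_PATTERN.fullmatch(candidate):
        raise ValueError(f"not an AGI locus identifier: {locus!r}")
    return candidate.upper()


print(normalize_agi("At1g01040"))  # AT1G01040
```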
35
Add your comment on TAIR