Prioritization of Avian GO Annotation
Structural Annotation
Species      Genome Build [2]   No. Entrez Genes   No. Proteins (NRPD)   % predicted proteins   proteins/gene
Human        36.3               36,437             415,830               4.91                   11.41
Mouse        37.1               64,018             228,696               9.28                   3.57
Rat [1]      3.4                49,516             108,069               29.99                  2.18
Chicken      2.1                19,979 [3]         31,819 [3]            46.62 [4]              1.59 [5]
1. The rat genome was published only 8 months before the chicken genome, yet rat has more than 2x as many genes in Entrez Gene (49,516 vs. 19,979) and more than 3x as many proteins (108,069 vs. 31,819).
2. After two genome builds, 5% of the chicken genomic sequence has still not been assigned to a chromosome, and the mini-chromosomes have not been sequenced.
3. Chicken genes and proteins are under-represented in public databases.
4. Of the chicken proteins available from NRPD, almost half (46.62%) are predicted based upon computational analysis.
5. On average, chicken has only 1.59 proteins per gene, so very little is known about isoforms and alternate transcripts of chicken gene products.
NRPD: Non-redundant Protein Database
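The proteins/gene column and the rat-versus-chicken comparison in note 1 are simple ratios of the gene and protein counts. A minimal sketch of that arithmetic follows, using the counts from the Structural Annotation table; the variable and function names are illustrative only.

```python
# Counts taken from the Structural Annotation table:
# species -> (No. Entrez Genes, No. Proteins in NRPD)
counts = {
    "Human": (36_437, 415_830),
    "Mouse": (64_018, 228_696),
    "Rat": (49_516, 108_069),
    "Chicken": (19_979, 31_819),
}


def proteins_per_gene(genes, proteins):
    """Average number of NRPD proteins per Entrez gene, to two decimals."""
    return round(proteins / genes, 2)


ratios = {
    species: proteins_per_gene(genes, proteins)
    for species, (genes, proteins) in counts.items()
}

# Fold differences behind note 1 (rat compared with chicken).
rat_genes, rat_proteins = counts["Rat"]
chicken_genes, chicken_proteins = counts["Chicken"]
gene_fold = rat_genes / chicken_genes
protein_fold = rat_proteins / chicken_proteins

for species, ratio in ratios.items():
    print(f"{species}: {ratio} proteins/gene")
print(f"Rat vs. chicken: {gene_fold:.2f}x genes, {protein_fold:.2f}x proteins")
```

The computed ratios reproduce the proteins/gene column of the table.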
Phase 1: “Breadth”
7,478 chicken entries in UniProtKB
GOA provides IEA mapping for UniProtKB entries
The initial strategy for AgBase biocurators was to add GO to chicken gene products that had none.
Since 46% of the chicken proteins in NRPD were predicted, those proteins would have no GO (IEA, ISS, ISO…).
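The Phase 1 selection rule (add GO to gene products that have none) amounts to a set difference between all gene products and those that already carry an annotation. The sketch below is a minimal illustration, not the AgBase pipeline: the function name, the tuple layout, and every identifier in the sample data are hypothetical placeholders.

```python
def products_without_go(product_ids, annotations):
    """Return the product IDs that have no GO annotation under any evidence code.

    annotations: iterable of (product_id, go_id, evidence_code) tuples.
    """
    annotated = {product_id for product_id, _go_id, _evidence in annotations}
    # Keep the input order so the result is easy to check by eye.
    return [pid for pid in product_ids if pid not in annotated]


# Hypothetical placeholder data, for illustration only.
product_ids = ["P_A", "P_B", "P_C", "P_D"]
annotations = [
    ("P_A", "GO:0000001", "IEA"),
    ("P_A", "GO:0000002", "ISS"),
    ("P_C", "GO:0000003", "IEA"),
]

to_annotate = products_without_go(product_ids, annotations)
print(to_annotate)
```

Products that appear in no annotation tuple are the candidates for first-pass curation.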
[Chart: % of gene products annotated (y-axis, 0 to 100) for Human, Mouse, Rat, Chicken; legend: no GO, AgBase, computational GO, manual GO]
The proportion of chicken gene products with GO is overstated because chicken gene products are under-represented in public databases.
Functional Annotation
Phase 2: “Depth”
What are the community needs?
GO Annotation of Arrays
DelMar14K, FHCRC, Tgu array, 44K Agilent oligo array, AIIM array, Affymetrix
Should we be focusing on arrays? What arrays should we do?
GO Annotation Priorities?
Provide “breadth” of coverage
Annotate products represented on arrays
Reference Genome targets
Subject areas (immunity, nutrition/metabolism, development)
Ad hoc as requested
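One way to turn the candidate priorities above into a working queue is to count how many of the criteria each gene product meets and annotate the highest counts first. The sketch below assumes equal weight for every criterion, which the slides do not state; the class, field names, and product IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneProduct:
    product_id: str
    has_go: bool = False                   # already has some GO annotation
    on_array: bool = False                 # represented on a microarray
    reference_genome_target: bool = False  # Reference Genome target
    in_subject_area: bool = False          # e.g. immunity, nutrition/metabolism, development
    requested: bool = False                # ad hoc community request


def criteria_met(gp):
    """Count the priority criteria met (assumption: all weighted equally)."""
    return sum([
        not gp.has_go,  # "breadth": products with no GO come first
        gp.on_array,
        gp.reference_genome_target,
        gp.in_subject_area,
        gp.requested,
    ])


def rank_for_annotation(products):
    """Sort by criteria met (descending), breaking ties by product ID."""
    return sorted(products, key=lambda gp: (-criteria_met(gp), gp.product_id))


# Hypothetical placeholder products, for illustration only.
products = [
    GeneProduct("GP1", has_go=True, on_array=True),
    GeneProduct("GP2", on_array=True, in_subject_area=True),
    GeneProduct("GP3"),
    GeneProduct("GP4", has_go=True),
]

ranked_ids = [gp.product_id for gp in rank_for_annotation(products)]
print(ranked_ids)
```

Unequal weights, for example favoring array coverage, would only require replacing the sum in `criteria_met` with a weighted sum.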