Rice Proteins: Data Acquisition, Curation Resources, Development and Integration of Controlled Vocabulary (Gene Ontology, Trait Ontology, Plant Ontology)


Posted on 14-Jan-2016


  • Rice Protein and Ontology Database (www.gramene.org)

    Rice proteins: data acquisition, curation resources

    Development and integration of controlled vocabulary: Gene Ontology, Trait Ontology, Plant Ontology

  • Objectives

    Rice proteins: annotate rice proteins using the Gene Ontology (GO) concepts of molecular function, biological process and cellular localization. 4,000 rice genes were annotated during the project, leading to the presentation of the Rice Protein Database (RPD) (http://www.gramene.org/perl/protein_search).

    Ontology: contribute GO terms for monocot plants; develop and curate vocabularies for plant anatomy and developmental stages (PO, Plant Ontology) and for phenotypes or traits (TO, Trait Ontology).

  • Gene Ontology annotation workflow (Gramene modules)

    Gene Ontology: molecular function, biological process, cellular localization

    Published reports: PubMed, BIOSIS, others

    Experimental evidence (non-IEA codes): direct enzyme assay, expression, mutant/phenotype, physical interaction, complementation, genetic interaction, localization

    Electronic prediction (IEA and ISS codes): citation, sequence similarity

    Curation information: sequence similarity (Clustal / BLAST), traceable author statement

    Predictions/identification: Gene Ontology mapping (Gramene & InterPro, EBI), Pfam, PROSITE, PROTOMAP, ProDom, transmembrane helices, cellular localization (predictions based on HMM), physicochemical properties, 3D structural alignments

    DBXrefs / references: GenBank, SWISS-PROT, EMBL/DDBJ, other databases, germplasm bank

    Sequence entry: Rice Protein Database (RPD), with links back to the EnsEMBL genome browser (sequence, BLAT, features on peptide map, DBXrefs)

    Plant Ontology (non-IEA code): anatomy & growth stages

  • Protein page

    Name(s): shows all the different names by which the molecule is represented in various databases and in the scientific literature.

    E.C. Number(s): shows the designated Enzyme Commission (E.C.) number. The EC numbers link to GenomeNet, Japan, from which further links to biochemical pathways and ligands are accessible (courtesy of the KEGG database).

    Gene name(s): lists all the gene names by which the molecule is called, as designated by the Commission on Plant Gene Nomenclature. If none is available, consider using a systematic name given to the ORF/gene.

    Source: GenBank/SWISS-PROT entry

  • Protein page (continued)

    Accession number: the SWISS-PROT accession number, equivalent to the "AC" field of the SWALL (EMBL) record and the "ACCESSION" field of the GenBank record for the respective protein entry. It links the protein entry to the other databases, namely the GenBank protein database, SWALL from EMBL, and SWISS-PROT.

    Organism: the taxonomic information on the organism from which the protein sequence was derived.

    Species: the species of the genus Oryza (23 of the 25 species are presently represented).

    Subspecies: the subspecies indica or japonica of the rice species Oryza sativa.

    Cultivar: the variety/cultivar name from which the sequence was derived; it links to a germplasm bank (GRIN/IRIS) for further information.

  • Protein page: map with features

    GenBank/SWISS-PROT entry: perform a BLAT alignment of the rice protein sequences from SWISS-PROT against the translated peptides from the Ensembl rice genome sequence database at Gramene.

    The cut-off score used is 99% identity, and the curator should validate each match. Add the features to the protein structure map, which shows protein domains (e.g. Pfam) and protein features (transmembrane, low-complexity and coiled-coil regions), on the Ensembl peptide report page.

    Sequence: use it to perform analyses that identify features such as Pfam/PROSITE domains and to generate predictions for transmembrane helices, coiled-coil regions and cellular component localization.

    Validation: based on available CDS features and gene indices/ESTs.
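The 99% identity cut-off can be pictured as a simple filter that produces candidates for the curator to validate. This is a minimal sketch, not Gramene's pipeline: the `Alignment` record, its field names and the `PROT_*`/`TRANS_*` identifiers are hypothetical, and identity is computed over aligned residues only, whereas real BLAT (PSL) scoring also accounts for gaps.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Alignment:
    """Simplified protein-to-translation alignment record (hypothetical fields)."""
    protein_id: str      # SWISS-PROT accession of the rice protein
    translation_id: str  # identifier of the Ensembl translated peptide
    matches: int         # aligned residue pairs that are identical
    mismatches: int      # aligned residue pairs that differ

    @property
    def percent_identity(self) -> float:
        # Identity over aligned residues only; gaps are ignored in this sketch.
        aligned = self.matches + self.mismatches
        return 100.0 * self.matches / aligned if aligned else 0.0

def candidate_matches(alignments: List[Alignment], cutoff: float = 99.0) -> List[Alignment]:
    """Return alignments at or above the identity cut-off, for curator validation."""
    return [a for a in alignments if a.percent_identity >= cutoff]

hits = candidate_matches([
    Alignment("PROT_A", "TRANS_1", matches=995, mismatches=5),   # 99.5% identity
    Alignment("PROT_B", "TRANS_2", matches=950, mismatches=50),  # 95.0% identity
])
print([h.protein_id for h in hits])  # ['PROT_A']
```

The filter only narrows the list; every surviving pair still goes to a curator, as the slide requires.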

  • Various tools used by Gramene in the annotation of rice gene products

    TMHMM results: ftp://www.gramene.org/pub/gramene/protein/feature/Oryza_TMHMM_result.txt

    Pfam members in RPD

    PROSITE members in RPD
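A count such as "proteins that have mappings to trans-membrane features" could be derived from a TMHMM result file. The sketch below assumes the file uses TMHMM's tab-separated "short" output format (one protein per line with a `PredHel=N` field); the actual layout of `Oryza_TMHMM_result.txt` is not stated here, and the `PROT_*` names are hypothetical.

```python
from typing import Iterable

def count_tm_proteins(lines: Iterable[str]) -> int:
    """Count distinct proteins with at least one predicted transmembrane helix.

    Assumes TMHMM 'short' output, one protein per tab-separated line:
    NAME  len=...  ExpAA=...  First60=...  PredHel=N  Topology=...
    """
    with_helices = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        name = fields[0].strip()
        if not name or name.startswith("#"):
            continue  # skip blank lines and comment lines
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key.strip() == "PredHel" and int(value) > 0:
                with_helices.add(name)
    return len(with_helices)

sample = [
    "PROT_A\tlen=300\tExpAA=45.10\tFirst60=20.05\tPredHel=2\tTopology=i10-32o50-72i\n",
    "PROT_B\tlen=120\tExpAA=0.00\tFirst60=0.00\tPredHel=0\tTopology=o\n",
]
print(count_tm_proteins(sample))  # 1
```

Because the function accepts any iterable of lines, an open file handle (for example `open("Oryza_TMHMM_result.txt")`) can be passed directly, provided the format assumption holds.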

  • Rice functional information

    Annotate rice gene function using the Gene Ontology (GO) system. Provide literature citations as evidence for each assertion and classify them using the evidence codes.

    Gene Ontology is a controlled vocabulary that defines the following concepts for a gene product:

    Molecular function: GO term(s) defining the molecular function of the gene product

    Biological process: GO term(s) defining the biological process

    Cellular component: GO term(s) identifying the localization of the protein in a cell

    After identifying a number of features, the curator finally proceeds to annotate the gene product(s) in the Rice Protein Database.
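Each annotation ties one gene product to one GO term in one of the three aspects, backed by an evidence code and, ideally, a citation. The record below is a hypothetical layout, not the RPD schema; `PROT_A` and the citation text are placeholders, while `GO:0003674` and `GO:0008150` are the real root terms for molecular function and biological process.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

# The three GO aspects named in the text.
ASPECTS = ("molecular_function", "biological_process", "cellular_component")

@dataclass(frozen=True)
class GOAssociation:
    """One protein-to-GO-term annotation (hypothetical record layout)."""
    protein: str          # e.g. a SWISS-PROT entry name
    go_id: str            # GO accession: "GO:" followed by 7 digits
    aspect: str           # one of ASPECTS
    evidence_code: str    # e.g. "IDA", "IEA", "TAS"
    reference: str = ""   # literature citation or database record

    def __post_init__(self) -> None:
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {self.aspect!r}")
        digits = self.go_id[3:]
        if not (self.go_id.startswith("GO:") and len(digits) == 7 and digits.isdigit()):
            raise ValueError(f"malformed GO accession: {self.go_id!r}")

def group_by_aspect(assocs: Iterable[GOAssociation]) -> Dict[str, List[GOAssociation]]:
    """Bucket annotations by GO aspect, as a protein page would list them."""
    grouped: Dict[str, List[GOAssociation]] = defaultdict(list)
    for a in assocs:
        grouped[a.aspect].append(a)
    return dict(grouped)

annotations = [
    GOAssociation("PROT_A", "GO:0003674", "molecular_function", "IEA"),
    GOAssociation("PROT_A", "GO:0008150", "biological_process", "TAS", "hypothetical citation"),
]
print({aspect: len(v) for aspect, v in group_by_aspect(annotations).items()})
# {'molecular_function': 1, 'biological_process': 1}
```

Validating the aspect and accession at construction time keeps malformed annotations out before they reach the database.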

  • Gene Ontology (GO) associations: evidence codes applied in the Rice Protein Database

    IDA, inferred from direct assay: enzyme assays / in vitro reconstitution, immunofluorescence / cell fractionation, binding assay

    IEA, inferred from electronic annotation: feature search / InterPro / Pfam / PROSITE / annotations from database records

    IEP, inferred from expression pattern: Northern blots / microarray data / Western blots

    IMP, inferred from mutant phenotype: gene mutation / deletion or disruption / overexpression / ectopic expression, antisense experiments / RNAi experiments / specific protein inhibitors

    NR, not recorded: very old annotation

    IGI, inferred from genetic interaction: suppressor screens / synthetic lethal / functional complementation / rescue experiments

    IPI, inferred from physical interaction: two-hybrid interactions / three-hybrid interactions, co-purification / co-immunoprecipitation / affinity interaction

    ISS, inferred from sequence or structural similarity: sequence similarity / recognized domains / structural similarity / Southern blotting

    NAS, non-traceable author statement: no citation / not traceable by curator

    TAS, traceable author statement: review article / textbook / dictionary / website / database

    A complete list is available at http://www.gramene.org/plant_ontology/evidence_codes.html
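The pairing of experiment types with evidence codes in the list above can be expressed as a lookup table, together with the IEA versus non-IEA split used in the RPD statistics. This is an illustrative sketch: the table keys follow the experiment types named in the annotation workflow, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Experiment types from the curation workflow, mapped to the code the list files them under.
EXPERIMENT_TO_CODE = {
    "direct enzyme assay": "IDA",
    "localization": "IDA",          # immunofluorescence / cell fractionation
    "expression": "IEP",            # Northerns / microarrays / Westerns
    "mutant/phenotype": "IMP",
    "genetic interaction": "IGI",
    "complementation": "IGI",       # complementation / rescue experiments
    "physical interaction": "IPI",
    "sequence similarity": "ISS",
}

def code_for(experiment_type: str) -> str:
    """Look up the evidence code; raises KeyError for types not in the table."""
    return EXPERIMENT_TO_CODE[experiment_type.strip().lower()]

def split_iea(codes: Iterable[str]) -> Tuple[int, int]:
    """Count IEA (electronic) versus all other evidence codes."""
    counts = Counter(codes)
    iea = counts["IEA"]
    return iea, sum(counts.values()) - iea

print(code_for("Mutant/phenotype"))             # IMP
print(split_iea(["IEA", "IEA", "IDA", "TAS"]))  # (2, 2)
```

Raising `KeyError` for unknown types, rather than falling back to a default such as NR, forces a curator decision instead of silently mislabeling evidence.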

  • Gene Ontology (GO) associations: the association of protein 1433_ORYSA with the GO term (protein page, Gramene Ontology Database)

  • Gene Ontology (GO) associations: the association of protein 1433_ORYSA with a literature citation, the evidence for its molecular function (protein page, Gramene Literature Database)

  • Gene Ontology (GO) associations: the association of protein 1433_ORYSA with the literature citation and evidence codes (protein page)

  • Rice Protein Database (RPD) statistics-1

    Total number of associations: 9866 (3321 gene products associated with 781 GO terms)

    Biological process: 242 terms, 2881 associations

    Molecular function: 449 terms, 5599 associations

    Cellular component: 90 terms, 1386 associations

    Total number of proteins: 8985

    Number of proteins from SWISS-PROT: 397

    Number of proteins from TrEMBL: 8588

    Total number of evidences: 21170

    Total number of IEA evidences: 20593

    Total number of non-IEA evidences: 577

    Total number of references used as evidence: 74

    GO mappings are based on InterPro (EBI) and Gramene curation.

    Charts: Biological process; Molecular function
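The totals on this slide are internally consistent, and a few lines make the check explicit. The figures are copied from the slide. The IEA share is derived from them and shows how heavily the annotation set relied on electronic inference at the time.

```python
# Figures copied from the RPD statistics slide.
associations = {"biological_process": 2881, "molecular_function": 5599, "cellular_component": 1386}
terms = {"biological_process": 242, "molecular_function": 449, "cellular_component": 90}
swissprot, trembl, total_proteins = 397, 8588, 8985
iea, non_iea, total_evidences = 20593, 577, 21170

assert sum(associations.values()) == 9866   # total number of associations
assert sum(terms.values()) == 781           # GO terms used
assert swissprot + trembl == total_proteins
assert iea + non_iea == total_evidences

iea_share = iea / total_evidences
print(f"IEA share of all evidences: {iea_share:.1%}")  # 97.3%
```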

    Molecular function associations by category (chart data)

    signal transduction: 53
    enzyme regulator: 87
    carrier proteins: 1301
    transporters: 332
    transcription regulator: 186
    storage protein: 33
    structural protein: 157
    defense/immunity: 50
    enzymes: 2573
    nucleic acid binding: 1314
    total: 6086

    Biological process associations (sheet data)

    biological_process (GO:0008150): 2881
    behavior (GO:0007610): 0
    biological_process unknown (GO:0000004): 0
    cell communication: 252
    viral life cycle (GO:0016032): 0
    cell growth and/or maintenance: 2605
    physiological processes: 10
    death: 88
    developmental processes: 12

    Biological process categories plotted in the chart

    electron transport: 187
    coenzyme metabolism: 27
    energy pathway: 83
    nucleic acid metabolism: 620
    phosphate metabolism: 309
    protein metabolism: 643
    carbohydrate metabolism: 219
    amino acid metabolism: 77
    catabolism: 223
    biosynthesis: 315
    stress related: 76
    transport: 300
    cell organization and biogenesis: 92
    cell cycle: 192
    oxygen and radical metabolism: 90
    cell communication: 252
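The molecular function category counts can be summed and ranked to read the chart numerically. The figures come from the chart data, and the printed share column is simply each count divided by the category total.

```python
# Molecular function category counts from the chart data.
mf_counts = {
    "signal transduction": 53, "enzyme regulator": 87, "carrier proteins": 1301,
    "transporters": 332, "transcription regulator": 186, "storage protein": 33,
    "structural protein": 157, "defense/immunity": 50, "enzymes": 2573,
    "nucleic acid binding": 1314,
}
total = sum(mf_counts.values())
assert total == 6086  # matches the "total" row

# Largest categories first, with each one's share of the total.
for category, n in sorted(mf_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<25}{n:>6}{n / total:>8.1%}")
```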


  • Rice Protein Database (RPD) statistics-2

    Total number of proteins in RPD: 8985

    Number of proteins from SWISS-PROT: 397

    Number of proteins from TrEMBL: 8588

    Total number of correspondences between proteins and translations: 7960 (6912 proteins correspond to 7957 translations)

    Proteins with one corresponding translation: 5911; with two: 959; with three: 37; with four: 5

    Gene products associated with 781 GO terms: 3321 (see statistics-1)

    Number of Pfam entries: 874; total number of proteins with mappings to Pfam: 3663

    Number of PROSITE entries: 556; total number of proteins with mappings to PROSITE: 3201

    Total number of proteins with mappings to transmembrane features: 1583
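The correspondence figures can be reconciled with a short calculation using the numbers on this slide. Weighting each group of proteins by its number of translations gives the 7960 correspondences. Because that exceeds the 7957 distinct translations, at least some translations must correspond to more than one protein.

```python
# Proteins grouped by how many Ensembl translations each corresponds to.
proteins_by_k = {1: 5911, 2: 959, 3: 37, 4: 5}
distinct_translations = 7957

proteins = sum(proteins_by_k.values())
correspondences = sum(k * n for k, n in proteins_by_k.items())
assert proteins == 6912
assert correspondences == 7960

# More pairs than distinct translations: some translations match more than one protein.
print(correspondences - distinct_translations)  # 3
```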

  • Trait Ontology (TO) to describe mutants/phenotypes in rice

  • www.plantontology.orgPLANT ONTOLOGY resources will be