Transcript
Page 1: Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology

• Rice Proteins

• Data acquisition

• Curation

• Resources

• Development and integration of controlled vocabulary

• Gene Ontology

• Trait Ontology

• Plant Ontology

www.gramene.org

Rice Protein and Ontology Database

Page 2:

Objectives

– Annotation of rice proteins using Gene Ontology (GO) concepts of Molecular Function, Biological Process and Cellular Localization
  • 4,000 rice genes annotated during project
  • Leading to presentation of Rice Protein Database (RPD) (http://www.gramene.org/perl/protein_search)

– Ontology
  • Contribute GO terms for monocot plants
  • Develop and curate vocabulary for
    • plant anatomy and developmental stages (PO-Plant Ontology)
    • phenotypes or traits (TO-Trait Ontology)

Page 3:

Gene mining using the Controlled vocabulary

[Diagram: gene, transcript and protein annotated with controlled vocabularies. Labels shown, grouped as far as the extraction allows: GO (Molecular Function: enzyme, others; Biological Process: pathways, reactions, other roles; Localization: cell components); PO (anatomy or morphology, histology, development; organ: root, shoot, seed; tissue: meristematic, vascular, ground; cell type; sub-cellular; sub components); PO & TO; TO (traits, agronomic); internal CVO (molecule: organic, inorganic; fats/carbohydrates/proteins/mutagens/others).]

Page 4:

Gene Ontology

Molecular function

Biological process

Cellular localization

Published report

-PubMed

-BIOSIS

-Others

Experimental evidence
• Direct enzyme assay
• Expression
• Mutant/phenotype
• Physical interaction
• Complementation
• Genetic interaction
• Localization
• Electronic-prediction
• Citation
• Sequence similarity

Electronic Curation information
• Sequence similarity
  • Clustal / BLAST
• Traceable author statement
• Predictions/identification
  • Gene Ontology mapping
  • Gramene & Interpro (EBI)
  • Pfam
  • PROSITE
  • PROTOMAP
  • Transmembrane helices
  • Cellular localization
  • Predictions based on HMM
  • Physicochemical properties
  • ProDom
  • 3D-Structural alignments
• DBXref / References

GenBank / SWISSPROT / EMBL/DDBJ / Other databases

[Flow diagram labels: sequence entry; Rice Protein Database (RPD); EnsEMBL Genome Browser (sequence, BLAT, features on peptide map, link back); IEA and ISS codes; non-IEA codes; Plant Ontology (anatomy & growth stages); DBXrefs; germplasm bank; Gramene modules.]

Page 5:

• Name(s): Shows all the different names by which the molecule is represented in various databases and in scientific literature.

• E.C. Number(s): Shows the designated Enzyme Commission (E.C.) number. The E.C. numbers link to GenomeNet, Japan, from where further links to biochemical pathways and ligands are accessible.

• Gene name(s): Lists all the gene names by which the molecule is called, as designated by the Commission on Plant Gene Nomenclature. If none is available, consider using the systematic name given to the ORF/gene.

GenBank/SWISSPROT ENTRY

Get information on

Courtesy KEGG database

Protein page

Page 6:

Accession number: The SWISS-PROT accession number, comparable to the "AC" field of the SWALL (EMBL) record and the "ACCESSION" field of the GenBank record for the respective protein entry. It links the protein entry to other databases, namely the GenBank protein database, SWALL from EMBL and SWISS-PROT.

GenBank/SWISSPROT ENTRY

Get information on

Organism: Represents the taxonomic information on the organism from which the protein sequence was derived.

• Species: Shows the species of the genus Oryza (presently represents 23 of 25 species).

• Subspecies: The subspecies, indica or japonica, of the rice species Oryza sativa.

• Cultivar: The variety/cultivar from which the sequence was derived; it will link to a germplasm bank (GRIN/IRIS) for further information.
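
The fields described for the protein page can be pictured as one record per entry. The following is a minimal sketch in Python, assuming a flat in-memory layout; the class name `ProteinEntry`, its field names, the default species and all example values are illustrative assumptions, not the actual RPD schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinEntry:
    """Illustrative record for one protein page (not the real RPD schema)."""
    accession: str                                        # SWISS-PROT/TrEMBL accession
    names: List[str] = field(default_factory=list)        # names used in databases and literature
    ec_numbers: List[str] = field(default_factory=list)   # Enzyme Commission numbers, if any
    gene_names: List[str] = field(default_factory=list)   # designated or systematic gene names
    species: str = "Oryza sativa"                         # assumed default for this sketch
    subspecies: Optional[str] = None                      # "indica" or "japonica"
    cultivar: Optional[str] = None                        # variety name, linked to a germplasm bank

    def organism_label(self) -> str:
        """Join the taxonomic fields that are present into one display string."""
        parts = [self.species]
        if self.subspecies:
            parts.append(f"subsp. {self.subspecies}")
        if self.cultivar:
            parts.append(f"cv. {self.cultivar}")
        return " ".join(parts)


# Placeholder values only; they do not describe a real database entry.
entry = ProteinEntry(
    accession="ACCESSION_PLACEHOLDER",
    names=["example protein"],
    subspecies="japonica",
    cultivar="ExampleCultivar",
)
print(entry.organism_label())
```

Keeping subspecies and cultivar optional reflects that not every sequence record is expected to carry them.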


Protein page

Page 7:

GenBank/SWISSPROT ENTRY

Perform a “BLAT” alignment of the rice protein sequences from SWISSPROT against the translated peptides from the Ensembl rice genome sequence database at Gramene.

The cut-off used is 99% identity, and the curator should validate the matches. Add the features to the protein structure map, which shows protein domains (e.g. Pfam) and protein features (trans-membrane, low-complexity and coil regions) on the Ensembl peptide report page.
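
The 99% identity cut-off can be illustrated with a small filter. This is a minimal sketch, assuming each alignment hit has already been reduced to a count of identical residues and an aligned length; the `Hit` class, the identifiers and the simple identity formula are assumptions for illustration and are not BLAT's own output format or scoring.

```python
from dataclasses import dataclass

IDENTITY_CUTOFF = 99.0  # percent identity threshold quoted on the slide


@dataclass(frozen=True)
class Hit:
    """One protein-to-translation alignment (hypothetical, simplified)."""
    protein_id: str       # SWISSPROT/TrEMBL entry
    translation_id: str   # Ensembl peptide translation
    matches: int          # identical aligned residues
    aligned_length: int   # total aligned residues

    @property
    def percent_identity(self) -> float:
        return 100.0 * self.matches / self.aligned_length


def hits_for_review(hits, cutoff=IDENTITY_CUTOFF):
    """Keep hits at or above the cut-off; a curator still validates each one."""
    return [
        h for h in hits
        if h.aligned_length > 0 and h.percent_identity >= cutoff
    ]


# Made-up hits for demonstration only.
hits = [
    Hit("PROT_1", "TRANS_1", 260, 260),
    Hit("PROT_1", "TRANS_2", 250, 260),
    Hit("PROT_2", "TRANS_3", 99, 100),
]
for hit in hits_for_review(hits):
    print(hit.protein_id, hit.translation_id, f"{hit.percent_identity:.1f}")
```

The filter only narrows the candidate list; the manual validation step described on the slide is not replaced by it.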

Sequence

Use it for performing analyses to identify features such as Pfam / Prosite domains, and to generate predictions for trans-membrane helices, coiled-coil regions and cellular component localization.

Validation

Based on available CDS features and gene indices/ESTs

Map with features

Protein page

Page 8:

Various tools used by Gramene in annotation of rice gene products

ftp://www.gramene.org/pub/gramene/protein/feature/Oryza_TMHMM_result.txt

Pfam members in RPD

Prosite members in RPD


Page 9:

• Annotate rice gene function using the Gene Ontology (GO) system

• Provide literature citations as evidence for each assertion and classify them using evidence codes

Rice Functional Information

Gene Ontology is a controlled vocabulary to define the following concepts for a gene product:

Molecular function: GO term(s) defining the molecular function of gene product

Biological process: GO term(s) defining the biological process

Cellular component: GO term(s) identifying the localization of the protein in a cell
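
One way to picture an annotation is as a record that ties a gene product to a GO term under one of the three concepts, together with its evidence. The sketch below is an assumption-laden illustration: `Aspect`, `GOAssociation`, `terms_by_aspect` and every example value are hypothetical and do not reflect Gramene's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    """The three GO concepts named on the slide."""
    MOLECULAR_FUNCTION = "molecular function"
    BIOLOGICAL_PROCESS = "biological process"
    CELLULAR_COMPONENT = "cellular component"


@dataclass(frozen=True)
class GOAssociation:
    gene_product: str    # protein entry the term is attached to
    go_term: str         # GO term identifier or name
    aspect: Aspect       # which of the three concepts the term belongs to
    evidence_code: str   # e.g. "IDA", "IEA"
    citation: str        # literature or database reference backing the assertion


def terms_by_aspect(associations):
    """Group GO terms by aspect, preserving input order."""
    grouped = defaultdict(list)
    for assoc in associations:
        grouped[assoc.aspect].append(assoc.go_term)
    return dict(grouped)


# Placeholder associations; terms, codes and references are made up.
example = [
    GOAssociation("EXAMPLE_PROTEIN", "TERM_A", Aspect.MOLECULAR_FUNCTION, "IDA", "REF_1"),
    GOAssociation("EXAMPLE_PROTEIN", "TERM_B", Aspect.BIOLOGICAL_PROCESS, "IEA", "REF_2"),
    GOAssociation("EXAMPLE_PROTEIN", "TERM_C", Aspect.MOLECULAR_FUNCTION, "TAS", "REF_3"),
]
for aspect, terms in terms_by_aspect(example).items():
    print(aspect.value, terms)
```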

After identifying a number of features, the curator finally proceeds to annotate gene product(s) in the Rice Protein Database.

Page 10:

Gene Ontology (GO) Associations

IDA inferred from direct assay: enzyme assays / in vitro reconstitution / immunofluorescence / cell fractionation / binding assay

IEA inferred from electronic annotation: feature search / Interpro / Pfam / Prosite / annotations from database records

IEP inferred from expression pattern: Northerns / microarray data / western blots

IMP inferred from mutant phenotype: gene mutation / deletion or disruption / over expression / ectopic expression / anti-sense experiments / RNAi experiments / specific protein inhibitors

NR not recorded: very old annotation

IGI inferred from genetic interaction: suppressor screens / synthetic lethal / functional complementation / rescue experiments

IPI inferred from physical interaction: 2-hybrid interactions / 3-hybrid interactions / co-purification / co-immunoprecipitation / affinity interaction

ISS inferred from sequence or structural similarity: sequence similarity / recognized domains / structural similarity / Southern blotting

NAS non-traceable author statement: no citation / non-traceable by curator

TAS traceable author statement: review article / text book / dictionary / website / database

A complete list is available at http://www.gramene.org/plant_ontology/evidence_codes.html
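
The evidence codes listed on this slide lend themselves to a simple lookup table. The sketch below copies the codes and their descriptions from the slide; the helper names `describe` and `split_by_curation` are hypothetical, and separating IEA from all other codes is an assumption made for illustration.

```python
# Evidence codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IMP": "inferred from mutant phenotype",
    "NR": "not recorded",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "TAS": "traceable author statement",
}


def describe(code: str) -> str:
    """Return the description for a code, ignoring case and padding."""
    return EVIDENCE_CODES[code.strip().upper()]


def split_by_curation(codes):
    """Separate electronically inferred (IEA) codes from all other codes."""
    electronic = [c for c in codes if c.strip().upper() == "IEA"]
    other = [c for c in codes if c.strip().upper() != "IEA"]
    return electronic, other


electronic, other = split_by_curation(["IEA", "IDA", "IEA", "TAS"])
print(describe("iss"))
print(len(electronic), len(other))
```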

EVIDENCE CODES APPLIED IN RICE PROTEIN DATABASE

Page 11:

The association of protein 1433_ORYSA with the GO term

Gene Ontology (GO) Associations

Protein page

Gramene Ontology Database

Page 12:

The association of protein 1433_ORYSA with literature citation (EVIDENCE for molecular function)


Gene Ontology (GO) Associations

Gramene Literature Database

Protein page

Page 13:

The association of protein 1433_ORYSA with the Literature citation and EVIDENCE CODES

Gene Ontology (GO) Associations

Protein page

Page 14:

Total number of associations: 9866 (3321 gene products associated with 781 GO terms)

• Biological Process: 242 terms, 2881 associations
• Molecular Function: 449 terms, 5599 associations
• Cellular Component: 90 terms, 1386 associations

Total number of proteins: 8985
Number of proteins from SWISSPROT: 397
Number of proteins from TrEMBL: 8588

Total number of evidences: 21170
Total number of IEA evidences: 20593
Total number of non-IEA evidences: 577
Total number of references as evidences: 74
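
The totals quoted on this slide can be cross-checked against their components. The short script below is only an arithmetic check using the figures as they appear on the slide; the variable names are arbitrary, and the IEA share is a derived figure that the slide itself does not state.

```python
# Figures copied from the statistics slide: aspect -> (GO terms, associations).
BY_ASPECT = {
    "Biological Process": (242, 2881),
    "Molecular Function": (449, 5599),
    "Cellular Component": (90, 1386),
}
PROTEINS = {"SWISSPROT": 397, "TrEMBL": 8588}
EVIDENCES = {"IEA": 20593, "non-IEA": 577}

total_terms = sum(terms for terms, _ in BY_ASPECT.values())
total_associations = sum(assoc for _, assoc in BY_ASPECT.values())
total_proteins = sum(PROTEINS.values())
total_evidences = sum(EVIDENCES.values())

# Share of evidences that are electronic annotations (derived, not on the slide).
iea_percent = 100.0 * EVIDENCES["IEA"] / total_evidences

print(total_terms, total_associations, total_proteins, total_evidences)
print(f"IEA share of evidences: {iea_percent:.1f}%")
```

The component sums reproduce the slide's totals of 781 GO terms, 9866 associations, 8985 proteins and 21170 evidences.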

[Pie chart, biological process: electron transport, coenzyme metabolism, energy pathway, nucleic acid metabolism, phosphate metabolism, protein metabolism, carbohydrate metabolism, amino acid metabolism, catabolism, biosynthesis, stress related, transport, cell organization and biogenesis, cell cycle, oxygen and radical metabolism, cell communication. Slice labels in extraction order: 5%, 1%, 2%, 18%, 8%, 17%, 6%, 2%, 6%, 9%, 2%, 8%, 2%, 5%, 2%, 7%; the order may not match the category list.]

[Pie chart, molecular function: signal transduction, enzyme regulator, carrier proteins, transporters, transcription regulator, storage protein, structural protein, defense/immunity, enzymes, nucleic acid binding. Slice labels in extraction order: 1%, 1%, 21%, 5%, 3%, 1%, 3%, 42%, 22%, 1%; the order may not match the category list.]

Rice Protein Database (RPD) statistics-1

GO mappings are based on Interpro-EBI and Gramene curation

Page 15:

Total number of proteins in RPD: 8985

Number of proteins from SWISS-PROT: 397

Number of proteins from TrEMBL: 8588

Total number of correspondences between proteins and translations: 7960 (6912 proteins correspond to 7957 translations)

Proteins with only one corresponding translation: 5911

Proteins with two corresponding translations: 959

Proteins with three corresponding translations: 37

Proteins with four corresponding translations: 5

Gene products associated with 781 GO terms: 3321 (refer to previous slide)

Number of Pfam entries: 874

Total number of proteins that have mappings to Pfam: 3663

Number of Prosite entries: 556

Total number of proteins that have mappings to Prosite: 3201

Total number of proteins that have mappings to trans-membrane features: 1583
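
The correspondence figures on this slide are internally consistent, which a few lines of arithmetic can confirm. The sketch uses only numbers from the slide; the variable names are arbitrary, and the coverage percentage is a derived value not stated in the original.

```python
# Number of proteins having exactly N corresponding translations (from the slide).
TRANSLATIONS_PER_PROTEIN = {1: 5911, 2: 959, 3: 37, 4: 5}
TOTAL_RPD_PROTEINS = 8985

# Proteins with at least one corresponding translation.
matched_proteins = sum(TRANSLATIONS_PER_PROTEIN.values())

# Protein-translation pairs: each protein contributes N pairs.
correspondences = sum(n * count for n, count in TRANSLATIONS_PER_PROTEIN.items())

# Share of all RPD proteins that have a translation (derived, not on the slide).
coverage_percent = 100.0 * matched_proteins / TOTAL_RPD_PROTEINS

print(matched_proteins, correspondences)
print(f"Proteins with a translation: {coverage_percent:.1f}%")
```

The sums give 6912 proteins and 7960 correspondences, matching the slide. The slide also reports 7957 translations; the difference of 3 would be consistent with a few translations matching more than one protein, although the slide does not say so.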


Rice Protein Database (RPD) statistics-2

Page 16:

Trait Ontology (TO) to describe mutants/phenotypes in rice

Page 17:

www.plantontology.org

PLANT ONTOLOGY resources will be available soon

Page 18:

Future plans


• Continue annotation of rice proteins

• Identify resources and tools that improve the annotation of rice proteins, using HMMs, structure predictions and other tools.

• Develop tools to simplify gene mining in Gramene and other databases by building combination search tools that use controlled vocabulary and feature tables.

• Start building a resource for a protein interaction map of the complete rice genome, based on association in a biochemical pathway, assembly in a functional complex / interacting partners, proximity on the genome and common regulation mechanisms (a possible collaboration).

• Contribute / share the controlled vocabulary for monocots with other databases

• Develop the necessary tools and host the resource pages for Plant Ontology Consortium

• Collaborate with Gene Ontology Consortium on various aspects of ontology development and curation

