Part II GO-Vocabulary of Genome

Upload: mikel

Post on 17-Jan-2016


Page 1: Part II  GO-Vocabulary of Genome

Part II GO-Vocabulary of Genome

Page 2: Part II  GO-Vocabulary of Genome

S. cerevisiae

Page 3: Part II  GO-Vocabulary of Genome

D. melanogaster

Page 4: Part II  GO-Vocabulary of Genome

C. elegans

Cells that normally survive: CED-9 ON; CED-3, CED-4 OFF

Cells that normally die: CED-9 OFF; CED-3, CED-4 ON

Page 5: Part II  GO-Vocabulary of Genome

M. musculus

Page 6: Part II  GO-Vocabulary of Genome

MCM3

MCM2

CDC46/MCM5

CDC47/MCM7

CDC54/MCM4

MCM6

These proteins form a hexamer in the species that have been examined

Comparison of sequences from 4 organisms

Page 7: Part II  GO-Vocabulary of Genome

A Common Language for Annotation of Genes from

Yeast, Flies and Mice

The Gene Ontologies

…and Plants and Worms

…and Humans

…and anything else!

Page 8: Part II  GO-Vocabulary of Genome

Gene Ontology - 1998

• FlyBase - Drosophila - Cambridge, EBI, Harvard, Berkeley & Bloomington.

• SGD - Saccharomyces - Stanford.

• MGI - Mus - Jackson Labs., Bar Harbor.

Page 9: Part II  GO-Vocabulary of Genome

Gene Ontology - now

• Fruitfly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - dictyBase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - PomBase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB - Sanger
• Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
• Grasses - rice & maize - Gramene database
• Zebrafish - ZFIN
• …

Page 10: Part II  GO-Vocabulary of Genome

To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.

Page 11: Part II  GO-Vocabulary of Genome

• Be open source

• Use open standards

• Make data & code available without constraint

• Involve your community

Page 12: Part II  GO-Vocabulary of Genome

Outline

• Introduction to the Gene Ontologies (GO)

• Annotations to GO terms

• GO Tools

• Applications of GO

Page 13: Part II  GO-Vocabulary of Genome

Gene Ontology Objectives

• GO represents concepts used to classify specific parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component

• GO develops a common language applicable to any organism

• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

Page 14: Part II  GO-Vocabulary of Genome

GO: Three ontologies

For a gene product:

• What does it do? Molecular Function

• What processes is it involved in? Biological Process

• Where does it act? Cellular Component

Page 15: Part II  GO-Vocabulary of Genome

Example: Gene Product = hammer

Function (what) - Process (why)

• Drive nail (into wood) - Carpentry
• Drive stake (into soil) - Gardening
• Smash roach - Pest Control
• Clown’s juggling object - Entertainment

Page 16: Part II  GO-Vocabulary of Genome

Biological Examples

• Molecular Function
• Biological Process
• Cellular Component

Page 17: Part II  GO-Vocabulary of Genome

The 3 Gene Ontologies

• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Page 18: Part II  GO-Vocabulary of Genome

Molecular Function

• A single reaction or activity, not a gene product

• A gene product may have several functions

• Sets of functions make up a biological process

Page 19: Part II  GO-Vocabulary of Genome

Molecular Function

Page 20: Part II  GO-Vocabulary of Genome

Carbonate dehydratase activity

Page 21: Part II  GO-Vocabulary of Genome

Biological Process

Page 22: Part II  GO-Vocabulary of Genome

Gluconeogenesis

Page 23: Part II  GO-Vocabulary of Genome

Cellular Component

• where a gene product acts

Page 24: Part II  GO-Vocabulary of Genome

Mitochondrial membrane

Page 25: Part II  GO-Vocabulary of Genome

term: gluconeogenesis

id: GO:0006094

definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.

What’s in a GO term?
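As a rough illustration of the three parts of a term shown on this slide, the record below holds an id, a name and a definition. The class and field names are assumptions made for this sketch; they are not taken from any GO software.

```python
import re
from dataclasses import dataclass

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    """Minimal GO term record (field names are illustrative)."""
    go_id: str
    name: str
    definition: str

    def __post_init__(self):
        # Reject identifiers that do not look like GO:0006094.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO id: {self.go_id!r}")


gluconeogenesis = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
)
print(gluconeogenesis.go_id, gluconeogenesis.name)
```

Keeping the id separate from the name is what lets the name or definition be revised without breaking annotations that point at the id.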

Page 26: Part II  GO-Vocabulary of Genome

What’s in a name?

Page 27: Part II  GO-Vocabulary of Genome

Content of GO

As of October 2005

• Molecular Function: 7,309 terms
• Biological Process: 10,041 terms
• Cellular Component: 1,629 terms
• Total: 18,975 terms

Definitions: 94.9%

Obsolete terms: 992

Page 28: Part II  GO-Vocabulary of Genome
Page 29: Part II  GO-Vocabulary of Genome
Page 30: Part II  GO-Vocabulary of Genome

What’s in a name?

• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis

• All refer to the process of making glucose from simpler components
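One way to picture the point of this slide is a lookup that sends every synonym to a single identifier. The sketch below assumes that all five names map to GO:0006094 (the gluconeogenesis id used elsewhere in this deck); the function names and the normalization rule are illustrative choices.

```python
# Synonyms listed on the slide, all treated as names for one term.
CANONICAL_ID = "GO:0006094"  # gluconeogenesis
SYNONYMS = [
    "glucose synthesis",
    "glucose biosynthesis",
    "glucose formation",
    "glucose anabolism",
    "gluconeogenesis",
]


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


# Index from normalized synonym to the canonical GO id.
SYNONYM_INDEX = {normalize(s): CANONICAL_ID for s in SYNONYMS}


def resolve(label: str):
    """Return the GO id for a known synonym, or None if it is not indexed."""
    return SYNONYM_INDEX.get(normalize(label))


print(resolve("Glucose  Biosynthesis"))
```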

Page 31: Part II  GO-Vocabulary of Genome

tree directed acyclic graph

Page 32: Part II  GO-Vocabulary of Genome

Parent-Child Relationships

The cell component term Nucleus has 5 children:

• Nucleoplasm
• Nuclear envelope
• Chromosome
• Perinuclear space
• Nucleolus

A child is a subset of a parent’s elements

Page 33: Part II  GO-Vocabulary of Genome

Ontology Relationships: Directed Acyclic Graph
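The difference between a tree and a directed acyclic graph can be sketched with a child-to-parents map: in a DAG a term may have more than one parent. The five children of nucleus come from the preceding slide; the second parent given to perinuclear space is an assumption added only to show a multi-parent term, and the function names are illustrative.

```python
# child -> set of parents. A tree allows one parent per term; a DAG allows several.
PARENTS = {
    "nucleoplasm": {"nucleus"},
    "nuclear envelope": {"nucleus"},
    "chromosome": {"nucleus"},
    "nucleolus": {"nucleus"},
    # Second parent is an assumption for illustration, not taken from the slides.
    "perinuclear space": {"nucleus", "nuclear envelope"},
}


def children(term, parents=PARENTS):
    """Return the direct children of a term, sorted by name."""
    return sorted(c for c, ps in parents.items() if term in ps)


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(children("nucleus"))
print(sorted(ancestors("perinuclear space")))
```

Because a child is a subset of its parent, a gene product annotated to a child term is implicitly annotated to every term returned by `ancestors`.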

Page 34: Part II  GO-Vocabulary of Genome
Page 35: Part II  GO-Vocabulary of Genome

Evidence Codes for GO Annotations

http://www.geneontology.org/doc/GO.evidence.html

Page 36: Part II  GO-Vocabulary of Genome

IEA Inferred from Electronic Annotation

ISS Inferred from Sequence or Structural Similarity

IEP Inferred from Expression Pattern

IMP Inferred from Mutant Phenotype

IGI Inferred from Genetic Interaction

IPI Inferred from Physical Interaction

IDA Inferred from Direct Assay

RCA Inferred from Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No biological Data available
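The code list above can be held as a simple lookup table. The sketch below also separates IEA from the other codes, based on the deck's note that IEA annotations are not manually reviewed; treating every other code as curator-assigned is an assumption of this sketch, and the gene names are hypothetical.

```python
# Evidence codes as listed on the slide (ISS uses the fuller name from its own slide).
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def is_manually_reviewed(code: str) -> bool:
    """True for every listed code except IEA (assumption of this sketch)."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code!r}")
    return code != "IEA"


# Hypothetical annotations: (gene product, GO id, evidence code).
annotations = [
    ("geneA", "GO:0006094", "IDA"),
    ("geneB", "GO:0006094", "IEA"),
    ("geneC", "GO:0004497", "TAS"),
]
curated = [a for a in annotations if is_manually_reviewed(a[2])]
print(curated)
```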

Page 37: Part II  GO-Vocabulary of Genome

IEA Inferred from Electronic Annotation

• Sequence Similarity (BLAST)

• Automatic transfer from mappings (InterPro2GO, EC2GO etc.)

-> Not manually reviewed

Page 38: Part II  GO-Vocabulary of Genome

ISS Inferred from Sequence or Structural Similarity

• Sequence similarity

• Recognized domains

• Structural similarity

-> Use of ‘with’ column recommended

Page 39: Part II  GO-Vocabulary of Genome

IEP Inferred from Expression Pattern

• Transcript levels (Northerns, microarrays)

• Protein levels (Western blots)

-> Timing or localization of expression

-> Biological process annotations

Page 40: Part II  GO-Vocabulary of Genome

IMP Inferred from Mutant Phenotype

• Gene mutation/knockout

• Overexpression/ectopic expression

• Anti-sense experiments

• RNAi experiments

• Specific protein inhibitors

Page 41: Part II  GO-Vocabulary of Genome

IGI Inferred from Genetic Interaction

• Suppressors, synthetic lethals…

• Functional complementation

• Rescue experiments

-> Use of ‘with’ column recommended

Page 42: Part II  GO-Vocabulary of Genome

IPI Inferred from Physical Interaction

• 2-hybrid interactions

• Co-purification

• Co-immunoprecipitation

• Ion/complex/protein binding experiments

-> Use of ‘with’ column recommended

Page 43: Part II  GO-Vocabulary of Genome

IDA Inferred from Direct Assay

• Enzyme assays

• In vitro reconstitution (e.g. transcription)

• Immunofluorescence (for cell. comp.)

• Cell fractionation (for cell. comp.)

• Physical interaction/binding assay

Page 44: Part II  GO-Vocabulary of Genome

RCA Inferred from Reviewed Computational Analysis

• Non-sequence-based computational methods

• Genome-wide analyses (e.g. 2-hybrid)

• Combinations of large-scale experiments

Page 45: Part II  GO-Vocabulary of Genome

TAS Traceable Author Statement

• Support from review article

• Textbook ‘common knowledge’

-> Data that can be ‘traced’ back

Page 46: Part II  GO-Vocabulary of Genome

NAS Non-traceable Author Statement

• Database entries that don't cite a paper

-> Data that cannot be ‘traced’ back

Page 47: Part II  GO-Vocabulary of Genome

IC Inferred by Curator

• Not supported by any direct evidence

• Inferred from other GO annotations

-> GO term in ‘with/from’ column required

Page 48: Part II  GO-Vocabulary of Genome

ND No biological Data available

• molecular function unknown GO:0005554

• biological process unknown GO:0000004

• cellular component unknown GO:0008372

Curator found no information supporting any annotation

Page 49: Part II  GO-Vocabulary of Genome

TAS/IDA

IMP/IGI/IPI

ISS/IEP

NAS

IEA

Term Hierarchy
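If the top-to-bottom order on this slide is read as strongest to weakest support (an interpretation, not something the slide states), the tiers can be used to pick the best-supported evidence code among several annotations of the same term. Ranking codes that are absent from the slide (RCA, IC, ND) last is a further assumption of this sketch.

```python
# Tiers in the order shown on the slide, top first.
TIERS = [
    ("TAS", "IDA"),
    ("IMP", "IGI", "IPI"),
    ("ISS", "IEP"),
    ("NAS",),
    ("IEA",),
]

# Map each code to its tier index (0 = top of the slide).
RANK = {code: i for i, tier in enumerate(TIERS) for code in tier}


def rank(code: str) -> int:
    """Tier index of a code; codes not on the slide sort after all tiers."""
    return RANK.get(code, len(TIERS))


def best_evidence(codes):
    """Return the code from the highest tier; ties keep the first one seen."""
    return min(codes, key=rank)


print(best_evidence(["IEA", "ISS", "IMP"]))
```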

Page 50: Part II  GO-Vocabulary of Genome

Meloidogyne incognita: McCarter et al. 2003

Annotation summaries

Page 51: Part II  GO-Vocabulary of Genome
Page 52: Part II  GO-Vocabulary of Genome

Mitochondrial P450

Annotation of gene products with GO terms

Page 53: Part II  GO-Vocabulary of Genome

Cellular component: mitochondrial inner membrane GO:0005743

Biological process: electron transport GO:0006118

Molecular function: monooxygenase activity GO:0004497
substrate + O2 = CO2 + H2O product
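The three annotations of the mitochondrial P450 example can be written as one record per term, one for each ontology. The record layout and the aspect labels below are assumptions chosen for this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One gene product annotated to one GO term (illustrative layout)."""
    gene_product: str
    aspect: str
    term: str
    go_id: str


annotations = [
    Annotation("mitochondrial P450", "cellular_component",
               "mitochondrial inner membrane", "GO:0005743"),
    Annotation("mitochondrial P450", "biological_process",
               "electron transport", "GO:0006118"),
    Annotation("mitochondrial P450", "molecular_function",
               "monooxygenase activity", "GO:0004497"),
]


def by_aspect(records):
    """Group GO ids by the ontology (aspect) they belong to."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.aspect].append(record.go_id)
    return dict(grouped)


print(by_aspect(annotations))
```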

Page 54: Part II  GO-Vocabulary of Genome

Other gene products annotated to monooxygenase activity (GO:0004497)

- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)

Page 55: Part II  GO-Vocabulary of Genome

Unknown vs. Unannotated

• “Unknown” is used when the curator has determined that there is no existing literature to support an annotation.
– Biological process unknown GO:0000004
– Molecular function unknown GO:0005554
– Cellular component unknown GO:0008372

• NOT the same as having no annotation at all
– No annotation means that no one has looked yet
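The distinction on this slide can be expressed as a three-way status check. The sketch below uses the three "unknown" identifiers from the slide; the status labels and function name are illustrative.

```python
# The three "unknown" terms listed on the slide.
UNKNOWN_TERMS = {"GO:0000004", "GO:0005554", "GO:0008372"}


def annotation_status(go_ids) -> str:
    """Classify a gene product by the GO ids it is annotated to.

    "unannotated": no annotations at all (no one has looked yet)
    "unknown":     a curator looked and found only "unknown" terms
    "annotated":   at least one informative term
    """
    ids = set(go_ids)
    if not ids:
        return "unannotated"
    if ids <= UNKNOWN_TERMS:
        return "unknown"
    return "annotated"


print(annotation_status([]))
print(annotation_status(["GO:0005554"]))
print(annotation_status(["GO:0005554", "GO:0005743"]))
```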

Page 56: Part II  GO-Vocabulary of Genome

Annotation of a genome

• GO annotations are always a work in progress

• Part of normal curation process

– More specific information

– Better evidence code

• Replace obsolete terms

• “Last reviewed” date

Page 57: Part II  GO-Vocabulary of Genome

How to access the Gene Ontology and its annotations

1. Downloads
• Ontologies
• Annotations: gene association files
• Ontologies and annotations

2. Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)

among others…
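A downloaded gene association file can be read with a few lines of code. The sketch below assumes the tab-delimited layout in which comment lines start with "!" and the first nine columns are database, object id, symbol, qualifier, GO id, reference, evidence code, with/from and aspect. The sample record is made up for illustration and is not real annotation data.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Association(NamedTuple):
    """First nine columns of one gene association line."""
    db: str
    object_id: str
    symbol: str
    qualifier: str
    go_id: str
    reference: str
    evidence: str
    with_from: str
    aspect: str


def parse_associations(handle: TextIO) -> Iterator[Association]:
    """Yield one Association per data line, skipping blanks and "!" comments."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            raise ValueError(f"expected at least 9 columns, got {len(cols)}")
        yield Association(*cols[:9])


# Hypothetical file content: one comment line and one invented record.
SAMPLE = (
    "! illustrative comment line\n"
    "ExampleDB\tX0001\tGENE1\t\tGO:0006094\tPMID:0000000\tIDA\t\tP\n"
)
for assoc in parse_associations(io.StringIO(SAMPLE)):
    print(assoc.symbol, assoc.go_id, assoc.evidence)
```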

Page 58: Part II  GO-Vocabulary of Genome

Gene Ontology :

Page 59: Part II  GO-Vocabulary of Genome
Page 60: Part II  GO-Vocabulary of Genome

…analysis of high-throughput data according to GO

MicroArray data analysis

[Figure: gene tree (Pearson) of all genes (14010); axis labels: attacked, control, time. Clusters are labeled with GO terms:]

• Puparial adhesion, molting cycle, hemocyanin
• Defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes
• Immune response, Toll regulated genes
• Amino acid catabolism, lipid metabolism
• Peptidase activity, protein catabolism, immune response

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.
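One common way to decide which GO terms label a cluster of co-expressed genes is an over-representation test. The slides do not say which method was used for this figure, so the hypergeometric test below is an assumed stand-in; the counts in the example are hypothetical apart from the gene list size of 14010.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: genes on the array
    annotated:  genes in the population annotated to the GO term
    sample:     genes in the cluster
    hits:       genes in the cluster annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical cluster: 200 genes, 15 of them annotated to a term
# that covers 150 of the 14010 genes on the array.
p = enrichment_p_value(population=14010, annotated=150, sample=200, hits=15)
print(f"p = {p:.3g}")
```

A small p-value suggests the term appears in the cluster more often than chance would predict. Testing many terms at once calls for a multiple-testing correction, which this sketch leaves out.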

Page 61: Part II  GO-Vocabulary of Genome

Ontologies

• Anatomy
• Physiology
• Phenotype
• Pathway
• Disease
• Molecular
• Metabolic
• Developmental Stage