BI class 2010 Gene Ontology Overview and Perspective

Post on 21-Dec-2015


Page 1: BI class 2010 Gene Ontology Overview and Perspective

BI class 2010

Gene Ontology Overview and Perspective

Page 2: BI class 2010 Gene Ontology Overview and Perspective

What is Ontology?

• Dictionary:

A branch of metaphysics concerned with the nature and relations of being

1606, 1700s

Page 3: BI class 2010 Gene Ontology Overview and Perspective

What is the Gene Ontology?

• Allows biologists to make queries across large numbers of genes without researching each one individually

Page 4: BI class 2010 Gene Ontology Overview and Perspective

So what does that mean?

From a practical view, an ontology is a representation of something we know about. “Ontologies” consist of representations of things that are detectable or directly observable, and of the relationships between those things.

Page 5: BI class 2010 Gene Ontology Overview and Perspective

Car?

Ontology - definition

Page 6: BI class 2010 Gene Ontology Overview and Perspective

Gene Ontology (GO) Consortium


www.geneontology.org

• Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.

• Seeks to achieve a mutual understanding of the definition and meaning of any word used; thus we are able to support cross-database queries.

• Members agree to contribute gene product annotations and associated sequences to GO database.

Page 7: BI class 2010 Gene Ontology Overview and Perspective

How does GO work?

• What does the gene product do?

• Where and when does it act?

• Why does it perform these activities?

What information might we want to capture about a gene product?

Page 8: BI class 2010 Gene Ontology Overview and Perspective

What is the Gene Ontology?

• Set of standard biological phrases (terms) which are applied to genes/proteins:
– protein kinase
– apoptosis
– membrane

Page 9: BI class 2010 Gene Ontology Overview and Perspective

Molecular Function = elemental activity/task

– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective

– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex

– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

GO represents three biological domains

Page 10: BI class 2010 Gene Ontology Overview and Perspective

The GO is Actually Three Ontologies

Biological Process
GO term: tricarboxylic acid cycle
Synonym: Krebs cycle
Synonym: citric acid cycle
GO id: GO:0006099

Cellular Component
GO term: mitochondrion
GO id: GO:0005739
Definition: A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration.

Molecular Function
GO term: Malate dehydrogenase
GO id: GO:0030060
(S)-malate + NAD(+) = oxaloacetate + NADH.

[Figure: structural formulas for the malate dehydrogenase reaction, with NAD+ converted to NADH + H+]

Page 11: BI class 2010 Gene Ontology Overview and Perspective

Cellular Component

• where a gene product acts

Page 12: BI class 2010 Gene Ontology Overview and Perspective

Cellular Component

Page 13: BI class 2010 Gene Ontology Overview and Perspective

Cellular Component

Page 14: BI class 2010 Gene Ontology Overview and Perspective

Cellular Component

• Enzyme complexes in the component ontology refer to places, not activities.

Page 15: BI class 2010 Gene Ontology Overview and Perspective

Molecular Function

• activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity

Page 16: BI class 2010 Gene Ontology Overview and Perspective

Molecular Function

insulin binding

insulin receptor activity

Page 17: BI class 2010 Gene Ontology Overview and Perspective

Molecular Function

• A gene product may have several functions

• Sets of functions make up a biological process.

Page 18: BI class 2010 Gene Ontology Overview and Perspective

Biological Process

a commonly recognized series of events

cell division

Page 19: BI class 2010 Gene Ontology Overview and Perspective

Biological Process

transcription

Page 20: BI class 2010 Gene Ontology Overview and Perspective

Biological Process

regulation of gluconeogenesis

Page 21: BI class 2010 Gene Ontology Overview and Perspective

Biological Process

limb development

Page 22: BI class 2010 Gene Ontology Overview and Perspective

Ontology Structure

• Terms are linked by two relationships
– is-a
– part-of

Page 23: BI class 2010 Gene Ontology Overview and Perspective

Ontology Structure

[Figure: graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, with edges labelled is-a and part-of]

Page 24: BI class 2010 Gene Ontology Overview and Perspective

Ontology Structure

• Ontologies are structured as a hierarchical directed acyclic graph (DAG)

• Terms can have more than one parent and zero, one or more children

Page 25: BI class 2010 Gene Ontology Overview and Perspective

Ontology Structure

[Figure: graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane]

Directed Acyclic Graph (DAG) - multiple parentage allowed
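Multiple parentage can be sketched in a few lines of Python. The edges below are only one plausible reading of the diagram (chloroplast membrane is-a membrane and part-of chloroplast); the dictionary layout and the function name are illustrative assumptions, not a GO file format.

```python
# Each term maps to a list of (relationship, parent) pairs.
# The edges are an assumed reading of the slide's diagram.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}

def ancestors(term):
    """Return every term reachable by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for _relationship, parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# "chloroplast membrane" has two direct parents, which a tree could not express.
print(sorted(ancestors("chloroplast membrane")))
```

Because a term may list more than one parent, the structure is a graph rather than a tree; it stays acyclic as long as no term can reach itself by walking upward.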

Page 26: BI class 2010 Gene Ontology Overview and Perspective

A biological ontology is:

• A (machine) interpretable representation of some aspect of biological reality
– what kinds of things exist?
– what are the relationships between these things?

[Figure: example graph with the terms eye, lens, sense organ and optic placode, linked by part_of, is_a and develops_from relationships]

Page 27: BI class 2010 Gene Ontology Overview and Perspective

GO Definitions: Each GO term has 2 Definitions

• A definition written by a biologist: necessary & sufficient conditions; written definition (not computable)

• Graph structure: necessary conditions; formal (computable)

Page 28: BI class 2010 Gene Ontology Overview and Perspective

Appropriate Relationships to Parents

• GO currently has 2 relationship types
– Is_a: An is_a child of a parent means that the child is a complete type of its parent, but can be discriminated in some way from other children of the parent.
– Part_of: A part_of child of a parent means that the child is always a constituent of the parent that, in combination with other constituents of the parent, makes up the parent.

Page 29: BI class 2010 Gene Ontology Overview and Perspective

Placement in the Graph: Selecting Parents

• To make the most precise definitions, new terms should be placed as children of the parent that is closest in meaning to the term.

• To make the most complete definitions, terms should have all of the parents that are appropriate.

• In an ontology as complicated as the GO this is not as easy as it seems.

Page 30: BI class 2010 Gene Ontology Overview and Perspective

True Path Violations Create Incorrect Definitions

“...the pathway from a child term all the way up to its top-level parent(s) must always be true.”

chromosome part_of nucleus

Page 31: BI class 2010 Gene Ontology Overview and Perspective

True Path Violations

“...the pathway from a child term all the way up to its top-level parent(s) must always be true.”

Mitochondrial chromosome is_a chromosome

Page 32: BI class 2010 Gene Ontology Overview and Perspective

True Path Violations

“...the pathway from a child term all the way up to its top-level parent(s) must always be true.”

Mitochondrial chromosome is_a chromosome

chromosome part_of nucleus

A mitochondrial chromosome is not part of a nucleus!

Page 33: BI class 2010 Gene Ontology Overview and Perspective

True Path Violations

“...the pathway from a child term all the way up to its top-level parent(s) must always be true.”

Nuclear chromosome is_a chromosome
Mitochondrial chromosome is_a chromosome

Nuclear chromosome part_of nucleus
Mitochondrial chromosome part_of mitochondrion
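The violation and its repair can be checked mechanically. The sketch below is a minimal illustration, assuming a hypothetical dictionary representation of the two graphs from these slides; it is not how the GO Consortium validates the ontology.

```python
# term -> list of (relationship, parent); both graphs are hypothetical
# encodings of the slide diagrams.
BAD_GRAPH = {
    "nucleus": [],
    "chromosome": [("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "chromosome")],
}
FIXED_GRAPH = {
    "nucleus": [],
    "mitochondrion": [],
    "chromosome": [],
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "chromosome"), ("part_of", "mitochondrion")],
}

def ancestors(graph, term):
    """Every term implied for `term` by walking all parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for _relationship, parent in graph[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# In the bad graph the path up from "mitochondrial chromosome" reaches "nucleus".
print("nucleus" in ancestors(BAD_GRAPH, "mitochondrial chromosome"))
# In the fixed graph it reaches "mitochondrion" instead.
print("nucleus" in ancestors(FIXED_GRAPH, "mitochondrial chromosome"))
```

Listing the implied ancestors for a new term and asking a biologist whether each one is true is a simple way to catch this kind of error before it enters the graph.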

Page 34: BI class 2010 Gene Ontology Overview and Perspective

The Development Node (some examples for consistent definitions)

Page 35: BI class 2010 Gene Ontology Overview and Perspective

Cell level

[i] y cell differentiation
---[p] y cell fate commitment
------[p] y cell fate specification
------[p] y cell fate determination
---[p] y cell development
------[p] y cellular morphogenesis during differentiation
------[p] y cell maturation

Page 36: BI class 2010 Gene Ontology Overview and Perspective

y cell differentiation

The process whereby a relatively unspecialized cell acquires specialized features of a y cell.

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=cmed6.figgrp.41173


Page 37: BI class 2010 Gene Ontology Overview and Perspective

y cell fate commitment

The process whereby the developmental fate of a cell becomes restricted such that it will develop into a y cell.


Page 38: BI class 2010 Gene Ontology Overview and Perspective

y cell fate specification

The process whereby a cell becomes capable of differentiating autonomously into a y cell in an environment that is neutral with respect to the developmental pathway. Upon specification, the cell fate can be reversed.


Page 39: BI class 2010 Gene Ontology Overview and Perspective

y cell fate determination

The process whereby a cell becomes capable of differentiating autonomously into a y cell regardless of its environment; upon determination, the cell fate cannot be reversed.


Page 40: BI class 2010 Gene Ontology Overview and Perspective

Gene Ontology widely adopted

AgBase

Page 41: BI class 2010 Gene Ontology Overview and Perspective

Terms are defined graphically relative to other terms

Page 42: BI class 2010 Gene Ontology Overview and Perspective

The Gene Ontology (GO)

1. Build and maintain logically rigorous and biologically accurate ontologies

2. Comprehensively annotate reference genomes

3. Support genome annotation projects for all organisms

4. Freely provide ontologies, annotations and tools to the research community

Page 43: BI class 2010 Gene Ontology Overview and Perspective

• The GO is still developing daily both in ontological structures and in domain knowledge

• Ontology development workshops focus on specific domains needing experts

• 2 workshops / year
1. Metabolism and cell cycle
2. Immunology and defense response
3. Early CNS development
4. Peripheral nervous system development
5. Blood Pressure Regulation
6. Muscle Development

Building the ontologies

Page 44: BI class 2010 Gene Ontology Overview and Perspective

Mappings files

Fatty acid biosynthesis (Swiss-Prot Keyword) -> GO:Fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) -> GO:acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO:acetyl-CoA carboxylase activity (GO:0003989)
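A mapping file of this kind is easy to read programmatically. The sketch below assumes a one-mapping-per-line layout of the form `external id > GO:term name ; GO:id`; that layout and the function name are assumptions for illustration, so check the actual mapping file before relying on them.

```python
def parse_mapping_line(line):
    """Split 'EXTERNAL > GO:term name ; GO:NNNNNNN' into three parts."""
    external, go_part = line.split(" > ", 1)
    name, go_id = go_part.rsplit(" ; ", 1)
    name = name.strip()
    if name.startswith("GO:"):
        name = name[3:]  # drop the "GO:" prefix in front of the term name
    return external.strip(), name, go_id.strip()

# Example lines built from the identifiers on the slide (layout assumed).
lines = [
    "EC:6.4.1.2 > GO:acetyl-CoA carboxylase activity ; GO:0003989",
    "SP_KW:Fatty acid biosynthesis > GO:fatty acid biosynthesis ; GO:0006633",
]

for line in lines:
    print(parse_mapping_line(line))
```

The `SP_KW:` prefix on the second line is a placeholder for however the keyword source is labelled in the real file.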

Page 45: BI class 2010 Gene Ontology Overview and Perspective

725 new terms related to immunology

127 new terms added to cell type ontology

Building the ontology: Immune System Process

Red part_of

Blue is_a

Page 46: BI class 2010 Gene Ontology Overview and Perspective

Annotating Gene Products using GO

Gene Product: P05147
GO Term: GO:0047519
Evidence: IDA
Reference: PMID: 2976880

P05147 GO:0047519 IDA PMID:2976880
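The four-part annotation above maps naturally onto a small record type. This is a minimal sketch: the class and field names are illustrative, and real annotation files carry more columns than the four shown on the slide.

```python
from typing import NamedTuple

class Annotation(NamedTuple):
    gene_product: str  # e.g. a protein accession
    go_id: str         # the GO term being asserted
    evidence: str      # evidence code, e.g. IDA
    reference: str     # where the evidence was published

def parse_annotation(line):
    """Parse a whitespace-separated 'product term evidence reference' line."""
    return Annotation(*line.split())

ann = parse_annotation("P05147 GO:0047519 IDA PMID:2976880")
print(ann.gene_product, "is annotated to", ann.go_id, "based on", ann.evidence)
```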

Page 47: BI class 2010 Gene Ontology Overview and Perspective

Gene protein inherits GO term

• There is evidence that this gene product can be best classified using this term

• The source of the evidence and other information is included

• There is agreement on the meaning of the term

Page 48: BI class 2010 Gene Ontology Overview and Perspective

Annotations for APP: amyloid beta (A4) precursor protein

Annotations are assertions

Page 49: BI class 2010 Gene Ontology Overview and Perspective

We use evidence codes to describe the basis of the annotation

[Figure labels: "Direct Experiment in organism" and "NO Direct Experiment: Inferred from evidence"]

• IDA: Inferred from direct assay
• IPI: Inferred from physical interaction
• IMP: Inferred from mutant phenotype
• IGI: Inferred from genetic interaction
• IEP: Inferred from expression pattern
• IEA: Inferred from electronic annotation
• ISS: Inferred from sequence or structural similarity
• TAS: Traceable author statement
• NAS: Non-traceable author statement
• IC: Inferred by curator
• RCA: Reviewed Computational Analysis
• ND: no data available
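Evidence codes make it possible to filter annotations by how they were obtained. The sketch below keeps only annotations backed by a direct experiment, treating IDA, IPI, IMP, IGI and IEP as the experimental codes; the gene names are hypothetical.

```python
# Codes that record a direct experiment on the gene product.
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}

# (gene product, GO id, evidence code); gene names are made up for illustration.
annotations = [
    ("gene A", "GO:0006915", "IDA"),
    ("gene B", "GO:0006915", "IEA"),
    ("gene C", "GO:0006094", "IMP"),
    ("gene D", "GO:0006094", "ISS"),
]

def experimental_only(rows):
    """Keep annotations whose evidence code is experimental."""
    return [row for row in rows if row[2] in EXPERIMENTAL_CODES]

for gene, go_id, code in experimental_only(annotations):
    print(gene, go_id, code)
```

Whether to exclude inferred annotations such as IEA or ISS depends on the analysis; they cover far more gene products but have not been checked at the bench.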

Page 50: BI class 2010 Gene Ontology Overview and Perspective

GO structure

• GO isn’t just a flat list of biological terms

• terms are related within a hierarchy

Page 51: BI class 2010 Gene Ontology Overview and Perspective

GO structure

gene A

Page 52: BI class 2010 Gene Ontology Overview and Perspective

GO structure

• This means genes can be grouped according to user-defined levels

• Allows broad overview of gene set or genome
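Grouping at a chosen level amounts to walking each annotation up the hierarchy until it reaches one of the broad terms of interest. The following sketch uses a small hypothetical hierarchy and hypothetical genes; only the idea of rolling annotations up to user-selected terms comes from the slide.

```python
# child -> parents (hypothetical fragment of a process hierarchy)
PARENTS = {
    "biological process": [],
    "cell cycle": ["biological process"],
    "mitosis": ["cell cycle"],
    "metabolism": ["biological process"],
    "gluconeogenesis": ["metabolism"],
}

# gene -> directly annotated terms (hypothetical)
ANNOTATIONS = {
    "gene A": ["mitosis"],
    "gene B": ["gluconeogenesis"],
    "gene C": ["cell cycle"],
}

def with_ancestors(term):
    """The term itself plus everything above it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def group_by_broad_term(annotations, broad_terms):
    """Assign each gene to every chosen broad term that covers one of its annotations."""
    groups = {term: set() for term in broad_terms}
    for gene, terms in annotations.items():
        for term in terms:
            for broad in with_ancestors(term) & broad_terms:
                groups[broad].add(gene)
    return groups

groups = group_by_broad_term(ANNOTATIONS, {"cell cycle", "metabolism"})
print({term: sorted(genes) for term, genes in groups.items()})
```

Choosing a different set of broad terms changes the level of detail without touching the underlying annotations.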

Page 53: BI class 2010 Gene Ontology Overview and Perspective

GO Annotation Stats (2007)


GO Annotations

Total manual GO annotations: 388,633
Total proteins with manual annotations: 80,402
Contributing groups (including MGI): 19
Total PubMed references: 346,002
Total number of predicted annotations: 17,029,553
Total number of taxa: 129,318
Total number of distinct proteins: 2,971,374

Page 54: BI class 2010 Gene Ontology Overview and Perspective

gene -> GO term

associated genes

GO annotations

GO database

genome and protein databases

Page 55: BI class 2010 Gene Ontology Overview and Perspective

Now we can query across all annotations based on shared biological activity.

Annotations of gene products to GO are genome specific

Page 56: BI class 2010 Gene Ontology Overview and Perspective

GO browser

Page 57: BI class 2010 Gene Ontology Overview and Perspective

Search on ‘mesoderm development’

mesoderm development

Page 58: BI class 2010 Gene Ontology Overview and Perspective
Page 59: BI class 2010 Gene Ontology Overview and Perspective

Definition of mesoderm development

Gene products involved in mesoderm development

Page 60: BI class 2010 Gene Ontology Overview and Perspective

Traditional analysis

Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, …
Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, …
Gene 100: Positive ctrl. of cell prolif., Mitosis, Oncogenesis, Glucose transport, …

Page 61: BI class 2010 Gene Ontology Overview and Perspective

Using GO annotations

• But by using GO annotations, this work has already been done

GO:0006915 : apoptosis

Page 62: BI class 2010 Gene Ontology Overview and Perspective

Grouping by process

Apoptosis: Gene 1, Gene 53
Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …
Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12, …
Growth: Gene 5, Gene 2, Gene 6, …
Glucose transport: Gene 7, Gene 3, Gene 6, …
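Moving from a per-gene view to a per-process view is a simple inversion of the annotation table. The sketch below uses three of the genes from the "Traditional analysis" slide; the function name is illustrative.

```python
from collections import defaultdict

gene_to_terms = {
    "Gene 1": ["apoptosis", "cell-cell signaling", "protein phosphorylation", "mitosis"],
    "Gene 2": ["growth control", "mitosis", "oncogenesis", "protein phosphorylation"],
    "Gene 4": ["nervous system", "pregnancy", "oncogenesis", "mitosis"],
}

def group_by_term(mapping):
    """Invert gene -> terms into term -> genes."""
    term_to_genes = defaultdict(list)
    for gene, terms in mapping.items():
        for term in terms:
            term_to_genes[term].append(gene)
    return dict(term_to_genes)

by_term = group_by_term(gene_to_terms)
print(by_term["mitosis"])
print(by_term["oncogenesis"])
```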

Page 63: BI class 2010 Gene Ontology Overview and Perspective

Anatomy of a GO term

id: GO:0006094 (unique GO ID)
name: gluconeogenesis (term name)
namespace: process (ontology)
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html] (definition)
exact_synonym: glucose biosynthesis (synonym)
xref_analog: MetaCyc:GLUCONEO-PWY (database ref)
is_a: GO:0006006 (parentage)
is_a: GO:0006092 (parentage)
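Because each line of the term record is a `tag: value` pair, it can be read into a dictionary with very little code. This is a minimal sketch for a single record like the one above, not a full parser for the ontology file; the `def` line is left out of the sample for brevity.

```python
from collections import defaultdict

STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""

def parse_stanza(text):
    """Collect tag -> list of values; tags such as is_a may repeat."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, since values also contain colons.
        tag, _, value = line.partition(":")
        fields[tag.strip()].append(value.strip())
    return dict(fields)

term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], "parents:", term["is_a"])
```

Storing every value in a list keeps repeated tags such as `is_a` intact, which matters because a term may have several parents.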

Page 64: BI class 2010 Gene Ontology Overview and Perspective

GO is a functional annotation system of great utility to the data-driven biologist

Page 65: BI class 2010 Gene Ontology Overview and Perspective

GO enables genomic data analysis

• Microarrays allow biologists to record changes in gene function across entire genomes

• Result: Vast amounts of gene expression data desperately needing cataloging and tagging

• Many data analysis tools use GO graph structure to statistically evaluate clusters of co-expressed genes based on shared functional annotations

Page 66: BI class 2010 Gene Ontology Overview and Perspective

OCT 13, 2006

Cancer Genome Projects

GO supports functional classifications

Page 67: BI class 2010 Gene Ontology Overview and Perspective

GO is wildly successful

FIGURE 3. Representative cell-type-specific genes and corresponding molecular functions.

Nature: January 2007

Page 68: BI class 2010 Gene Ontology Overview and Perspective

Comprehensively annotate Reference Genomes

• Saccharomyces cerevisiae
• Schizosaccharomyces pombe
• Arabidopsis thaliana
• Human
• Mouse
• Fly
• Rat
• Chicken
• Zebrafish
• Worm
• Dicty
• E.coli

Page 69: BI class 2010 Gene Ontology Overview and Perspective

Species coverage

• All major eukaryotic model organism species

• Human via GOA group at UniProt

• Several bacterial and parasite species through TIGR and GeneDB at Sanger
– many more in pipeline

Page 70: BI class 2010 Gene Ontology Overview and Perspective

Annotation coverage

Page 71: BI class 2010 Gene Ontology Overview and Perspective

GO tools

• GO resources are freely available to anyone to use without restriction
– Includes the ontologies, gene associations and tools developed by GO

• Other groups have used GO to create tools for many purposes:

http://www.geneontology.org/GO.tools

Page 72: BI class 2010 Gene Ontology Overview and Perspective

GO tools

• Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center which returns GO terms for probe sets

Page 73: BI class 2010 Gene Ontology Overview and Perspective

GO tools

• Many tools exist that use GO to find common biological functions from a list of genes:

http://www.geneontology.org/GO.tools.microarray.shtml

Page 74: BI class 2010 Gene Ontology Overview and Perspective

GO tools

• Most of these tools work in a similar way:
– input a gene list and a subset of ‘interesting’ genes
– tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
– tool provides a statistical measure to determine whether enrichment is significant

Page 75: BI class 2010 Gene Ontology Overview and Perspective

GO for microarray analysis

• Annotations give ‘function’ label to genes

• Ask meaningful questions of microarray data e.g.
– genes involved in the same process, same/different expression patterns?

Page 76: BI class 2010 Gene Ontology Overview and Perspective

Using GO in practice

• statistical measure
– how likely your differentially regulated genes fall into that category by chance

microarray (1000 genes) -> experiment -> 100 genes differentially regulated

mitosis – 80/100
apoptosis – 40/100
p. ctrl. cell prol. – 30/100
glucose transp. – 20/100

[Figure: bar chart of the number of differentially regulated genes (axis 0 to 80) for mitosis, apoptosis, positive control of cell proliferation and glucose transport]

Page 77: BI class 2010 Gene Ontology Overview and Perspective

Using GO in practice

• However, when you look at the distribution of all genes on the microarray:

Process               Genes on array   # genes expected in 100 random genes   occurred
mitosis               800/1000         80                                      80
apoptosis             400/1000         40                                      40
p. ctrl. cell prol.   100/1000         10                                      30
glucose transp.       50/1000          5                                       20
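The expected counts in the table, and a significance measure for the difference, can be reproduced with a short calculation. The sketch below uses a one-sided hypergeometric test, which is a common choice for this kind of enrichment question but is an assumption here; the slides do not say which statistic a given tool uses.

```python
from math import comb

ARRAY_SIZE = 1000   # genes on the microarray
SELECTED = 100      # differentially regulated genes

# process -> (genes on array with this term, genes among the 100 with this term)
CATEGORIES = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "p. ctrl. cell prol.": (100, 30),
    "glucose transp.": (50, 20),
}

def expected_count(on_array):
    """Genes expected in the category if the 100 genes were picked at random."""
    return SELECTED * on_array / ARRAY_SIZE

def enrichment_p_value(on_array, observed):
    """P(X >= observed) when drawing SELECTED genes without replacement."""
    total = comb(ARRAY_SIZE, SELECTED)
    upper = min(on_array, SELECTED)
    tail = sum(
        comb(on_array, k) * comb(ARRAY_SIZE - on_array, SELECTED - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

for name, (on_array, observed) in CATEGORIES.items():
    p = enrichment_p_value(on_array, observed)
    print(f"{name}: expected {expected_count(on_array):.0f}, observed {observed}, p = {p:.3g}")
```

Mitosis and apoptosis occur exactly as often as chance predicts, so only the last two categories stand out even though they have the smallest raw counts. In a real analysis the p-values would also need correcting for the number of GO categories tested.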

Page 78: BI class 2010 Gene Ontology Overview and Perspective

Enrichment tools

• GO is developing its own enrichment tool as part of the GO browser AmiGO

• Currently in testing phase, should be released next month