Intro to data analysis: Gene Ontology and Pathways

Kjell Petersen
Introduction to Microarray technology
September 2009

Presentation adapted from Endre Anderssen and Vidar Beisvåg

NMC Trondheim

microarray.no


Overview

• How can ontologies and pathway information help us?
• What is an ontology?
• The Gene Ontology and how it is structured
• How to use it
  – Interactively
  – Statistically
• Pathways


So here you are

• Figure of differential expression


Gene lists

• Long list of differentially expressed genes
• Possibly hundreds of papers describing the functions of the genes
• Misleading names
• Different names in different organisms


What’s in a name?

• The same name can be used to describe different concepts

• What is a cell?


Cell


Image from http://microscopy.fsu.edu


Ontologies

• Gene Ontology (GO)
• Sequence Ontology (SO) (sequence features)
• Phenotype and Trait Ontology (PATO)
• Taxon (NCBI)
• Anatomy (Penn)
• Disease (ICD9)
• Developmental stage (multiple sources)


Gene Ontology (GO)

• Why Gene Ontology?
  – Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
  – Facilitate communication between people and organizations.
  – Improve interoperability between systems.


Goal of GO Consortium

• Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
• Describe gene products using vocabulary terms (annotation).
• Develop tools:
  – to query and modify the vocabularies and annotations

(http://www.geneontology.org/)


How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?
• Why does it perform these activities?
• Where does it act?


The Gene Ontology (GO)

– Molecular function:
  • Gene product at biochemical level.
– Biological process:
  • Cellular events to which the gene product contributes.
– Cellular component:
  • Location or complex of gene/protein.


Molecular Function

• activities or “jobs” of a gene product

insulin binding
insulin transport activity


Molecular Function

• drug transporter activity


Biological Process

• a commonly recognized series of events

cell division


Cellular Component

• where a gene product acts


Content of GO

• Molecular Function: 7,309 terms
• Biological Process: 10,041 terms
• Cellular Component: 1,629 terms
• Total: 18,975 terms
• Obsolete terms: 992

As of October 2005


Ontology Structure

• Directed acyclic graphs (DAGs)
• Relationships
  – “is a”
    • a is a type of b (e.g. a truck is a vehicle, or a mitochondrion is an organelle)
  – “regulates”
    • positively regulates
    • negatively regulates
  – “part of”
    • sub-process of (process)
    • physical part of (component) (e.g. an engine is part of a car, or a mitochondrial membrane is part of a mitochondrion)
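The DAG idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list is a hypothetical toy built from the slide's examples (not real GO data), and the helper names `build_parents` and `ancestors` are made up for this sketch. The point it shows is that a term may have more than one parent, which is what makes the structure a graph rather than a tree.

```python
from collections import defaultdict

# Hypothetical toy ontology as (child, relationship, parent) edges.
# Term names follow the slide's examples; this is not real GO data.
EDGES = [
    ("mitochondrion", "is_a", "organelle"),
    ("mitochondrial membrane", "part_of", "mitochondrion"),
    ("mitochondrial membrane", "is_a", "organelle membrane"),
    ("organelle membrane", "part_of", "organelle"),
]


def build_parents(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
# "mitochondrial membrane" has two parents, so both branches are followed.
print(sorted(ancestors("mitochondrial membrane", PARENTS)))
```

A gene annotated to a specific term is usually also counted under all of that term's ancestors, which is why an upward traversal like this is the basic operation behind GO-based gene counting.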


Term Definitions and Curation

• The definitions for each GO term are derived primarily from the Oxford Dictionary of Molecular Biology or from relevant literature sources (SWISS-PROT, PIR, NCBI CGAP, EC...).

• Curators around the world, sifting through genomic and proteomic data, then use the definitions and GO terms provided by GO to annotate or curate the genes and proteins in their favorite species.

• GO is stored as flat files, as XML files, and as a relational database implemented in MySQL.


GO Annotation

• Association between gene product and applicable GO terms
• Provided by member databases. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.
• Made by manual or automated methods.
• GO Annotation
  – Database object: gene or gene product
  – GO term ID
  – Evidence supporting annotation
  – Reference: publication or computational method
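The four parts of an annotation map naturally onto a small record type. The sketch below is an assumption-laden illustration, not the GO Consortium's file format: the class `GOAnnotation`, the function `terms_by_gene`, the gene names and the reference strings are all hypothetical. The evidence codes IDA (inferred from direct assay) and IEA (inferred from electronic annotation) are standard GO evidence codes, and IEA is treated here as the marker of an automated annotation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    db_object: str   # gene or gene product identifier
    go_id: str       # GO term ID
    evidence: str    # evidence code supporting the annotation
    reference: str   # publication or computational method


# Hypothetical example records; gene names and references are placeholders.
ANNOTATIONS = [
    GOAnnotation("geneA", "GO:0007049", "IDA", "hypothetical-paper-1"),
    GOAnnotation("geneA", "GO:0051301", "IEA", "hypothetical-pipeline"),
    GOAnnotation("geneB", "GO:0007049", "IEA", "hypothetical-pipeline"),
]


def terms_by_gene(annotations, include_automated=True):
    """Group GO term IDs by gene, optionally dropping automated (IEA) ones."""
    grouped = defaultdict(set)
    for ann in annotations:
        if not include_automated and ann.evidence == "IEA":
            continue
        grouped[ann.db_object].add(ann.go_id)
    return dict(grouped)


print(terms_by_gene(ANNOTATIONS))
print(terms_by_gene(ANNOTATIONS, include_automated=False))
```

Keeping the evidence code on every record makes it possible to rerun an analysis on manually curated annotations only and see whether the conclusions hold.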


Gene Ontology and Microarrays

• Hypothesis: Functionally related, differentially expressed genes should accumulate in the corresponding GO group.
• Problem: Find a method that scores the accumulation of differential gene expression in a node of the Gene Ontology.
• GO tools can help answer questions such as:
  – “Are genes involved in process P overrepresented among all the differentially expressed genes in an experiment?” or
  – “Does treatment A induce more genes involved in process P than treatment B?”


Browsing GO in J-Express


Overrepresentation of GO terms

• We have a subset of genes
  – List of differentially expressed genes
  – List of genes that cluster together
• Which biological processes do these genes take part in?
• Is there an over-representation of genes belonging to a particular biological process, compared to what could be expected?


Question

• Suppose we look at the dataset containing all of our genes and see that 10% of them belong to the cell cycle. We then run a differential expression analysis and get a list of genes we believe are significantly changed.

• How many of the genes in the gene list do you expect to belong to the cell cycle?
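One way to reason about the question, assuming the gene list is unrelated to the cell cycle, is that the list should contain roughly the same fraction of cell cycle genes as the full dataset. The numbers below (10,000 genes in total, 1,000 of them in the cell cycle, a list of 200) are hypothetical, chosen only to match the 10% in the question, and `expected_in_list` is a made-up helper name.

```python
def expected_in_list(list_size, category_in_reference, reference_size):
    """Expected number of category genes in the list if membership in the
    list is independent of membership in the category."""
    return list_size * category_in_reference / reference_size


# Hypothetical numbers: 1,000 of 10,000 genes (10%) are cell cycle genes,
# and the differential expression analysis returned 200 genes.
expected = expected_in_list(200, 1000, 10000)
print(expected)
```

Seeing clearly more cell cycle genes in the list than this expected count is what suggests over-representation; whether the excess is larger than chance alone would produce is a separate statistical question.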


Setup

• We call our subset of interesting genes the test data.
• We call the dataset containing all of our genes the reference data: it is the dataset we extracted the interesting genes from and want to compare the test data to.

Test data

Reference data


Gene Ontology Analysis

Reference data

Test data

Statistical comparison between the two GO components
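The slides do not name a specific test for this comparison. One common choice, used here purely as an assumption, is the hypergeometric test: the test data is treated as a random draw without replacement from the reference data, and we ask how likely it is to see at least the observed number of genes from a given GO category. The function name `hypergeom_upper_tail` and the example numbers are hypothetical; the sketch needs Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the reference data
    K: reference genes annotated to the GO category
    n: genes in the test data
    k: test genes annotated to the GO category
    """
    upper = min(n, K)
    total = comb(N, n)  # all ways to pick the test data from the reference
    # Ways to pick exactly i category genes and n - i other genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical example: 1,000 of 10,000 reference genes are in the category,
# and 40 of the 200 test genes are (20 would be expected by chance).
p_value = hypergeom_upper_tail(40, 200, 1000, 10000)
print(p_value)
```

A small value means that seeing this many category genes in the test data would be unlikely if the test data were a random sample of the reference data.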


Biological pathways


GO vs. Pathways

GO
• Overview
• Can handle a large number of genes
• Many genes annotated
• Every gene considered on its own

Pathways
• Detail view
• Focused sets of genes
• Scattered data sources
• Focuses on interactions between genes


Types of pathways

• Cartoons
  – Textbooks
  – BioCarta
• Circuit diagrams
  – KEGG
  – Reactome
  – GeneRIFs
• Computational networks
  – SBML models
  – Transcription factor networks


Global networks


Local networks


KEGG

• Global network of regulation and metabolism

• Organised into separate pathways with hand-drawn diagrams

• Pathways can be used to look for overrepresentation or enrichment

• Visually check for path-ness or direction


Conclusion

• GO is the world map of molecular biology

• Pathways provide more detailed information

• Need for dynamic pathway creation coupled to data analysis