Post on 22-Dec-2015

Chapter 8: Biological Knowledge Assembly and Interpretation

Ju Han Kim, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea

Presenter: Zhen Gao

Outline

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.

Input: Microarray / RNA-Seq

DEG: Differentially Expressed Genes

Co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Genes of interest: a gene, a gene list, a gene pair, or a pair of gene lists

FAA: Functional Annotation Analysis: Gene Ontology (GO) or pathway analysis

Gene list with annotations

Visualization, semantic assembly, and knowledge learning: concept lattice analysis (BioLattice)

Glossary

FAA: Functional Annotation Analysis
GO: Gene Ontology
Pathway
DEG: Differentially Expressed Genes
GSEA: Gene Set Enrichment Analysis
Biological Interpretation and Biological Semantics
Concept lattice analysis

Pathway and Ontology-Based Analysis

GO and biological pathway-based analysis is one of the most powerful methods for inferring the biological meaning of expression changes.

It is applied to a list of genes obtained by:

differential expression analysis
co-expression analysis (or clustering)

Attributes that can be used for FAA:

transcription factor binding
clinical phenotypes such as disease associations
MeSH (Medical Subject Headings) terms
microRNA binding sites
protein family memberships
chromosomal bands
GO terms
biological pathways
etc.

Features may have their own ontological structures.

For example, GO is structured as a DAG (Directed Acyclic Graph).

DEGs: three techniques that help identify DEGs:

t-test
Wilcoxon's rank sum test
ANOVA

Note that the multiple-hypothesis-testing problem must be properly managed.
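
One way to manage the multiple-testing problem is to adjust the per-gene p-values before calling DEGs. The sketch below uses the Benjamini-Hochberg procedure, which is an assumption here (the slide does not name a specific correction), and the p-values are made-up placeholders.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a t-test, Wilcoxon test, or ANOVA.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)

# Genes (by index) kept as DEGs at an assumed 5% false discovery rate.
degs = [i for i, q in enumerate(q_values) if q < 0.05]
```

Here two of the four genes with raw p-values below 0.05 no longer pass after adjustment, which is the effect the note on the slide warns about.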

Co-expression analysis

Clustering puts similar expression profiles together and dissimilar ones apart.

It returns groups of genes that are assumed to be co-regulated.

Clustering algorithms:

hierarchical-tree clustering
partitional clustering
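
As a rough illustration of hierarchical clustering on expression profiles, the sketch below merges genes bottom-up using average linkage and a distance of 1 minus the Pearson correlation. The linkage, the distance, and the toy profiles are all assumptions chosen for the example, and profiles with zero variance are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = min(candidates,
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy profiles: geneA/geneB rise across samples, geneC/geneD fall.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.1, 1.0],
    "geneD": [8.0, 6.2, 4.0, 2.0],
}
clusters = hierarchical_clusters(profiles, n_clusters=2)
```

A partitional method such as k-means would instead fix the number of clusters up front and iteratively reassign genes to the nearest cluster center.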

Pathways are powerful resources for understanding shared biological processes.

E.g.: KEGG, MetaCyc, and BioCarta (signaling pathways)

MetaCyc:

a non-redundant database of experimentally determined metabolic pathways

It is the largest such collection, containing over 1,400 metabolic pathways.

Ontology / GO:

An ontology provides a shared understanding of a certain domain of information.

It is built on controlled vocabularies.

GO consists of DAG structures covering three vocabularies:

Molecular Function (MF)
Cellular Component (CC)
Biological Process (BP)
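
Because GO is a DAG, a term can have more than one parent, and a gene annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below shows this upward propagation on a toy graph; the term names are illustrative placeholders, not real GO identifiers.

```python
# Toy DAG: each term maps to its parent terms (a term may have several).
parents = {
    "biological_process": [],
    "cell_death": ["biological_process"],
    "signaling": ["biological_process"],
    "apoptotic_signaling": ["cell_death", "signaling"],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(direct_annotations, parents):
    """Extend each gene's direct terms with all of their ancestor terms."""
    full = {}
    for gene, terms in direct_annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


annotations = propagate({"gene1": {"apoptotic_signaling"}}, parents)
```

After propagation, `gene1` carries all four terms, which is why counts for general terms near the root are always at least as large as counts for their descendants.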

Other common ontologies and annotation resources:

MIPS: an integrated source covering protein properties and a variety of complete genomes
MeSH: clinical terms, including disease names
OMIM (Online Mendelian Inheritance in Man)
UMLS (Unified Medical Language System)

GO enrichment test: an example

If 20% of the genes in a gene list are annotated with the GO term 'apoptosis',

but only 1% of the genes in the whole human genome fall into this functional category,

then 'apoptosis' is over-represented (enriched) in the gene list.

Common statistical tests:

Chi-square test
binomial test
hypergeometric test

hypergeometric test:
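
For an enrichment question, the hypergeometric test asks how likely it is to see at least k annotated genes in a list of n genes drawn from a background of N genes, of which K carry the annotation: P(X >= k) = sum over i from k to min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). A minimal sketch follows; the concrete numbers are assumptions chosen to match a 20% versus 1% situation, not values from the slides.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Assumed numbers: a background of 20,000 genes with 200 annotated
# 'apoptosis' (1%), and a list of 100 genes with 20 annotated (20%).
p_value = hypergeom_enrichment_p(N=20000, K=200, n=100, k=20)
```

The exact integer arithmetic keeps the sketch dependency-free; for many terms and large backgrounds a statistics library would be the more practical choice.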

Pitfalls to avoid when using the hypergeometric test:

The choice of background has a substantial impact on the result. Candidate backgrounds include:

all genes having at least one GO annotation
all genes ever known in genome databases
all genes on the microarray

GO has a hierarchical (graph) structure, while the hypergeometric test assumes that categories are independent.

Common Tools

DAVID
ArrayXPath
Pathway Miner
EASE
GOFish
GOTree
etc.

Gene Set-Wise Differential Expression Analysis

Gene set-wise differential expression analysis evaluates the coordinated differential expression of gene groups.

Gene Set Enrichment Analysis (GSEA):

the first method developed in this category
evaluates, for each pre-defined gene set, the significance of its association with phenotypic classes

Difference between FAA and GSEA:

FAA: finds over-represented GO terms in a gene list of interest.

GSEA: starts from pre-defined gene sets and tests how their expression changes under different conditions.

Advantages of gene set-wise differential expression analysis:

It has successfully identified modest but coordinated changes in gene expression that might have been missed by conventional 'individual gene-wise' differential expression analysis (many tiny expression changes can collectively create a big change).

Biological interpretation is straightforward because the gene sets are defined by biological knowledge.

The Enrichment Score (ES) is calculated by evaluating the fraction of genes in S ("hits"), weighted by their correlation, and the fraction of genes not in S ("misses") present up to a given position i in the ranked gene list L, where the N genes are ordered according to their correlation with the phenotype.
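
The running-sum description above can be sketched as follows. This follows the commonly published GSEA formulation, in which the ES is the maximum deviation from zero of the weighted hit fraction minus the miss fraction; the exponent `p`, the normalization, and the toy data are assumptions here rather than details stated on the slide.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum ES: maximum deviation from zero of P_hit - P_miss.

    Assumes the set contains at least one, but not all, ranked genes.
    """
    n_total = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    # Normalizer for hits: sum of |correlation|^p over genes in the set.
    n_r = sum(abs(r) ** p for r, h in zip(correlations, hits) if h)
    hit_sum = 0.0
    miss_count = 0
    best = 0.0
    for r, is_hit in zip(correlations, hits):
        if is_hit:
            hit_sum += abs(r) ** p
        else:
            miss_count += 1
        running = hit_sum / n_r - miss_count / (n_total - n_hits)
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list L with N = 6 genes, ordered by correlation.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(ranked, corr, {"g1", "g2"})
es_bottom = enrichment_score(ranked, corr, {"g5", "g6"})
```

A set concentrated at the top of the list gets an ES near +1 and one concentrated at the bottom gets an ES near -1. Significance is usually assessed by permutation, which is not shown here.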

Typical gene sets:

regulatory-motif sets
function-related sets
disease-related sets

Database: MSigDB

6769 gene sets classified into five different collections
Has some interesting extensions

Differential Co-Expression Analysis

Co-expression analysis: determines the degree of co-expression of a cluster of genes under a certain condition.

Differential co-expression analysis: determines how much the co-expression of a gene pair or a gene cluster differs across conditions.

Three major types:

(a) differential co-expression of gene cluster(s)
(b) gene pair-wise differential co-expression
(c) differential co-expression of paired gene sets

Type (a): identify gene cluster(s) that are differentially co-expressed between two conditions.

Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is a measure of the co-expression of the genes.
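
A minimal sketch of a mean squared residual, assuming the model is the usual additive one (gene effect plus condition effect) and assuming a normalization by the number of matrix entries; neither detail is spelled out in the text, so treat both as illustrative. A low residual means the genes in I move together across the conditions in J.

```python
def mean_squared_residual(matrix):
    """Mean squared residual of an additive gene + condition model.

    matrix[i][j] is the expression of gene i (set I) in condition j (set J).
    """
    n_genes, n_conds = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_conds for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_genes)) / n_genes
                 for j in range(n_conds)]
    overall = sum(row_means) / n_genes
    total = 0.0
    for i in range(n_genes):
        for j in range(n_conds):
            # What is left after removing the gene and condition effects.
            residual = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residual ** 2
    return total / (n_genes * n_conds)


# Hypothetical cluster of three genes measured in two groups of samples.
group_a = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # co-expressed
group_b = [[1.0, 2.0, 3.0], [4.0, 3.0, 2.0], [5.0, 9.0, 7.0]]  # not co-expressed

msr_a = mean_squared_residual(group_a)
msr_b = mean_squared_residual(group_b)
```

A cluster whose residual is small in one group and large in the other, as in this toy case, would be a candidate for differential co-expression.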

Type (b): identify differentially co-expressed gene pairs.

Techniques:

F-statistic
a meta-analytic approach
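
To make the idea of pair-wise differential co-expression concrete, the sketch below compares the correlation of one gene pair between two conditions using a Fisher z-transformation test. This is a simpler, commonly used stand-in, not the F-statistic or the meta-analytic approach listed above, and the correlations and sample sizes are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test for a difference between two Pearson correlations."""
    # Fisher transformation makes the correlations approximately normal.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical gene pair: strongly correlated in condition 1 (30 samples),
# essentially uncorrelated in condition 2 (30 samples).
z, p = fisher_z_difference(r1=0.85, n1=30, r2=-0.10, n2=30)
```

Applied to every gene pair, such a test produces a very large number of hypotheses, so the multiple-testing caveat for DEGs applies here as well.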

Note that the identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs.

Thus the interpretation may also be improved by ontology- and pathway-based annotation analysis.

Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across different conditions.

Biological pathways can be used as pre-defined gene sets, and the differential co-expression of biological pathway pairs between conditions is then analyzed.

To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between their sample-wise entropies. IS can always be obtained, even when the two pathways contain different numbers of genes, because it uses only sample-wise distances.
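
The point about gene-set size can be illustrated with a small sketch: each pathway is reduced to a vector of sample-to-sample distances, and the IS is the correlation between the two vectors. Euclidean distance is used here as a stand-in for the entropy-based measure, and the expression values are made up, so this shows the shape of the computation rather than the dCoxS implementation.

```python
import math
from itertools import combinations


def sample_distances(gene_set_matrix):
    """Pairwise distances between samples (columns) of a genes x samples matrix."""
    n_samples = len(gene_set_matrix[0])
    columns = [[row[s] for row in gene_set_matrix] for s in range(n_samples)]
    # Euclidean distance is an assumed stand-in for the entropy measure.
    return [math.dist(columns[a], columns[b])
            for a, b in combinations(range(n_samples), 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def interaction_score(set_a, set_b):
    """Correlation between the sample-wise distance vectors of two gene sets."""
    return pearson(sample_distances(set_a), sample_distances(set_b))


# Two hypothetical pathways measured on the same 4 samples in one condition.
pathway_a = [[1.0, 2.0, 4.0, 7.0],
             [2.0, 1.0, 5.0, 3.0]]          # 2 genes
pathway_b = [[0.5, 1.5, 3.5, 6.0],
             [1.0, 0.0, 2.0, 4.0],
             [3.0, 3.5, 1.0, 2.0]]          # 3 genes

score = interaction_score(pathway_a, pathway_b)
```

Both distance vectors have one entry per sample pair (six entries for four samples), so the correlation is defined regardless of how many genes each pathway contains. Comparing the score of the same pathway pair between conditions would be a separate step that is not shown.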

Biological Interpretation and Biological Semantics

Biomedical semantics provides rich descriptions of biomedical domain knowledge.

Motivation for biological semantics: GO-based analysis has limitations:

The result of a GO analysis is typically a long, unordered list of annotations.
Most analysis tools evaluate only one cluster at a time.
Massive annotation lists are time-consuming to read.
The results are hard to assemble manually.
Many annotations are redundant.

Introducing BioLattice:

a mathematical framework based on concept lattice analysis
organizes traditional clusters and associated annotations into a lattice of concepts
provides a graphical summary
considers gene expression clusters as objects and annotations as attributes

Thus, complex relations among clusters and annotations are clarified, ordered, and visualized.
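
The objects-and-attributes view can be sketched with plain formal concept analysis: a concept is a set of clusters (the extent) together with exactly the annotations they all share (the intent). The code below enumerates concepts by brute force over subsets of clusters, which is only practical for tiny inputs and is not the algorithm used by BioLattice; the cluster names and annotations are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a small object -> attributes map."""
    objects = sorted(context)
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by every object in the subset.
            intent = set(all_attributes)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that carries the whole intent.
            extent = {o for o in objects if intent <= context[o]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Gene expression clusters as objects, annotations as attributes.
context = {
    "cluster1": {"apoptosis", "kinase"},
    "cluster2": {"apoptosis"},
    "cluster3": {"kinase", "transport"},
}
concepts = formal_concepts(context)
```

Ordering these concepts by inclusion of their extents gives the lattice: for example, the concept shared by `cluster1` and `cluster2` under "apoptosis" sits above the more specific concept containing only `cluster1`.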

Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added.

Tool to construct BioLattice:

The Ganter algorithm
http://www.snubi.org/software/biolattice/

Conclusion

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.
