chapter 5: microarray techniques - columbia university · chapter 5: microarray techniques 5.2...


Post on 14-Mar-2020


Prof. Yechiam Yemini (YY)
Computer Science Department, Columbia University

Chapter 5: Microarray Techniques

5.2 Analysis of Microarray Data

Overview

• Normalization
• Clustering


Processing Microarray Data

• Problem 1: extract data from microarrays
• Problem 2: analyze the meaning of data (multiple arrays)

[Figure: heat map. Rows are genes g1, g2, …, gi, …, gm; columns are tests Tj; each cell is the expression level of gi under test Tj.]

Normalization

Differentiating Gene Expression

• Ideal data
   o R = G for all genes that are not differentially expressed
   o R > G for up-regulated genes (R < G for down-regulated genes)
• Microarray data can be noisy
   o Noise due to technology factors:
      - Measurements of R and G may be noisy; two arrays can vary greatly
      - Even a single array can have variations in dye, mRNA, scanning…
   o Noise due to biological factors:
      - Sample variability

[Figure: two scatter plots of R against G, labeled "Ideal" and "More likely", with the up-regulated and down-regulated regions marked.]

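Normalizing Expression Levels

• Consider log R, log G to evaluate order-of-magnitude differences
• Normalization: calibrate the R, G fluorescence measurements
• Regression: consider $\log(R/G) - c = \log(aR/G)$, with $c = -\log a$
   o c is selected to shift the mean log ratio to 0
   o Under ideal circumstances this gives the distribution shown in the figure
• Rotate 45° to the A, M axes:
   o $M = \log R - \log G = \log(R/G)$
   o $A = \tfrac{1}{2}(\log R + \log G) = \log\sqrt{RG}$

[Figure: log R plotted against log G with the regression line; log(aR/G) after the shift; the same data plotted as M against A after the 45° rotation.]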
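The shift by c and the change to A, M axes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's own code: base-2 logarithms are assumed (the slides do not fix a base), the intensity values are made up, and the names `ma_transform` and `center_log_ratios` are hypothetical.

```python
import math


def ma_transform(red, green):
    """Return (M, A) lists for paired red/green intensities (all > 0).

    M = log R - log G, A = (log R + log G) / 2, using base-2 logs (an assumption).
    """
    m_values = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a_values = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m_values, a_values


def center_log_ratios(m_values):
    """Subtract c = mean(M) so that the mean log ratio becomes 0."""
    c = sum(m_values) / len(m_values)
    return [m - c for m in m_values], c


# Made-up intensities: the red channel reads twice as bright for every spot.
red = [200.0, 800.0, 3200.0, 1600.0]
green = [100.0, 400.0, 1600.0, 800.0]

m_values, a_values = ma_transform(red, green)
centered, c = center_log_ratios(m_values)
print("shift c:", c)
print("centered M:", centered)
```

In this toy data every spot has the same log ratio, so the whole offset is attributed to calibration and the centered M values are all (numerically) zero.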


Lowess Normalization

• The relationship between M and A may not be linear
• Lowess (LOcally WEighted polynomial regreSSion)
• Normalized M values are the heights of spots measured from the "trend" line

[Figure: M plotted against A before and after Lowess, with the fitted trend line.]
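A simplified sketch of the idea, assuming a locally weighted straight-line fit with tricube weights and no robustness iterations (full Lowess implementations usually add those). The function names and the `frac` parameter are illustrative choices, not taken from the slides.

```python
import math


def lowess_trend(a_values, m_values, frac=0.5):
    """Fit a locally weighted line around each A value and return the trend."""
    n = len(a_values)
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbours per local fit
    trend = []
    for a0 in a_values:
        # Bandwidth: distance from a0 to its k-th nearest A value.
        h = max(sorted(abs(a - a0) for a in a_values)[k - 1], 1e-12)
        # Tricube weights: 1 at a0, falling to 0 at distance h.
        w = [(1.0 - min(abs(a - a0) / h, 1.0) ** 3) ** 3 for a in a_values]
        sw = sum(w)
        swx = sum(wi * a for wi, a in zip(w, a_values))
        swy = sum(wi * m for wi, m in zip(w, m_values))
        swxx = sum(wi * a * a for wi, a in zip(w, a_values))
        swxy = sum(wi * a * m for wi, a, m in zip(w, a_values, m_values))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            trend.append(swy / sw)  # degenerate neighbourhood: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            trend.append(intercept + slope * a0)
    return trend


def lowess_normalize(a_values, m_values, frac=0.5):
    """Normalized M = height of each spot above the local trend line."""
    trend = lowess_trend(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]


# Made-up data with an intensity-dependent (curved) bias in M.
a_demo = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
m_demo = [0.1 * (a - 9.0) ** 2 for a in a_demo]
print(lowess_normalize(a_demo, m_demo, frac=0.6))
```

A smaller `frac` follows the data more closely; a larger one gives a smoother trend. The right value depends on the array and is a judgment call.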

Normalizing Data From Two Arrays

Normalization:
• Transform to A, M axes
• Apply Lowess adjustment
• Use resulting values for the gene expression matrix


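Differential Expression Analysis

• Use the normalized regression
• "Fold" lines determine the up-regulated and down-regulated regions

[Figure: M plotted against A with fold lines; spots above the upper fold line are up-regulated, spots below the lower fold line are down-regulated.]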
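As a rough illustration of fold lines, the sketch below labels each spot by comparing its normalized M value with a threshold. The 2-fold cutoff and base-2 logarithms are assumptions made for the example; the slides do not specify either, and `classify_by_fold` is a hypothetical name.

```python
import math


def classify_by_fold(m_values, fold=2.0):
    """Label spots relative to the fold lines M = +log2(fold) and M = -log2(fold)."""
    threshold = math.log2(fold)
    labels = []
    for m in m_values:
        if m >= threshold:
            labels.append("up")
        elif m <= -threshold:
            labels.append("down")
        else:
            labels.append("unchanged")
    return labels


# Normalized log ratios for four hypothetical spots.
print(classify_by_fold([1.5, -0.2, -1.0, 0.9]))
# ['up', 'unchanged', 'down', 'unchanged']
```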

Hierarchical Clustering

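Heat Map Matrix

[Figure: heat map matrix. Rows are genes g1, g2, …, gi, …, gm; columns are tests/experiments/samples/conditions T1, T2, …, Tj, …, Tn. Each cell is the expression level of gi under test Tj. Row gi is a gene expression profile; column Tj is a test expression profile.]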

Clustering Analysis

• Clustering gene profiles identifies co-expression
• Clustering test/sample profiles identifies sample similarity

[Figure: a gene expression profile gi (row) and a test expression profile Tj (column).]


Clustering Expression Profiles

• Profile: vector of expression values
   o Gene (rows): gi = (ei1, ei2, …, ein)
   o Test/sample (columns): Tj = (e1j, e2j, …, emj)

[Figure: expression matrix with rows g1, g2, …, gm and columns T1, T2, …, Tn; row gi and column Tj are highlighted.]

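Hierarchical Clustering

• Key idea: recursively cluster the "closest" pair
   o E.g., we used this for phylogeny and MSA
• Agglomerative (bottom-up) vs. divisive (top-down)

Distance Matrix

      2     3     4     5
1    0.3   0.2   0.8   0.1
2          0.9   0.1   0.8
3                0.2   0.7
4                      0.1

Distance metrics*: d(A,B)

1. Euclidean: $d(A,B) = \sqrt{\sum_{i=1}^{m} (x_{iA} - x_{iB})^2}$

2. Manhattan: $d(A,B) = \sum_{i=1}^{m} |x_{iA} - x_{iB}|$

3. Pearson correlation: $r(A,B) = \frac{1}{n} \sum_{T=1}^{n} \left(\frac{x_{AT} - \bar{x}_A}{\sigma_A}\right) \left(\frac{x_{BT} - \bar{x}_B}{\sigma_B}\right)$ (a similarity; a common way to turn it into a distance is 1 − r(A,B))

(* Triangle inequality is not required; a semi-metric is sufficient)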
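The three measures can be written directly from their definitions. This is a minimal sketch; the function names are illustrative, and the Pearson version uses the population standard deviation (dividing by n) so that it matches the 1/n form of the formula.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(a, b):
    """Pearson correlation r(A, B); undefined if either profile is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return cov / (sd_a * sd_b)


print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Pearson correlation ignores scale and offset, so two genes whose profiles rise and fall together count as close (r near 1) even when their absolute expression levels differ, which Euclidean distance would penalize.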


Hierarchical Clustering

1) Connect nearest neighbors into a cluster
2) Compute the distance matrix to the new cluster
3) Repeat until all are clustered
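The three steps above can be sketched as a small agglomerative loop. The slides do not say how the distance to a merged cluster is computed, so single linkage (the minimum pairwise distance) is assumed here; average or complete linkage would be equally valid choices. The input is the five-item distance matrix shown earlier in this chapter, and `agglomerate` is a hypothetical name.

```python
def agglomerate(labels, dist):
    """Single-linkage agglomerative clustering.

    dist maps frozenset({a, b}) to the distance between items a and b.
    Returns the merges in order as (cluster_1, cluster_2, distance).
    """
    clusters = [(label,) for label in labels]

    def linkage(c1, c2):
        # Single linkage (an assumption): closest pair across the two clusters.
        return min(dist[frozenset((a, b))] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)  # ties keep the first pair found
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Upper triangle of the 5 x 5 distance matrix from the slide.
upper = {1: [0.3, 0.2, 0.8, 0.1], 2: [0.9, 0.1, 0.8], 3: [0.2, 0.7], 4: [0.1]}
dist = {}
for a, row in upper.items():
    for offset, d in enumerate(row):
        dist[frozenset((a, a + 1 + offset))] = d

for c1, c2, d in agglomerate([1, 2, 3, 4, 5], dist):
    print(c1, "+", c2, "at distance", d)
```

The order of merges, read bottom-up, is the dendrogram. Several pairs in this matrix are tied at 0.1, so a different tie-breaking rule would produce a different but equally valid tree, which is one concrete source of the instability often attributed to hierarchical clustering.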

[Figures: eight slides showing Gene 1 through Gene 8 being clustered hierarchically.]

Pros & Cons of Hierarchical Clustering

• Pros
   o It provides useful partitioning of the data
   o Organizes co-expressed genes and similar tests
   o Visual 2D organization of data
• Cons
   o Can be very sensitive to noise
   o Dimensionality may exacerbate sensitivity
   o May not be related to nature (genes are not hierarchical)

An example: Hierarchical Clustering

[Figures: hierarchical clustering example, three slides.]

K-Means Clustering

• Key idea: iterative improvement of clusters
• Start with a random partitioning and improve it

Initialization
1) Select the number of clusters: k = 4
2) Select k random centroids {mj}
3) Assign genes to the cluster of the closest centroid
4) Compute new centroids
5) Repeat steps 3 and 4 until convergence
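The steps above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the initial centroids are drawn from the data points (one common way to "select random centroids"), an empty cluster simply keeps its old centroid, and the toy data and the name `k_means` are made up for the example.

```python
import random


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: k random centroids
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each point to the cluster of its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(centroids[j], p))
            for p in points
        ]
        if new_assignment == assignment:
            break  # step 5: no assignment changed, so we have converged
        assignment = new_assignment
        # Step 4: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:
                centroids[j] = mean_point(members)
    return centroids, assignment


# Two well-separated toy groups of 2-D "profiles".
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = k_means(points, k=2)
print(centroids)
print(assignment)
```

On real expression data the result depends on the random start, so it is common to run the procedure several times with different seeds and keep the tightest clustering.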

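Assignment (step 3): $c = \arg\min_j \lVert m_j - g_i \rVert$; classify gene i to cluster c.

Centroid update (step 4): $m_c = \frac{1}{N_c} \sum_{i=1}^{N_c} g_i$, where the sum runs over the $N_c$ genes assigned to cluster c.

http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/AppletKM.html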


Self Organizing Maps (SOM) Clustering

• Kohonen 87
• Iterative clustering similar to k-means
• Select the number of clusters k and a grid of k centroids
• Move the grid closer to the points

Initialize:
A. Select k = 6 clusters
B. Select random locations for a grid of k = 6 centroids

Iteration:
1) Select a random point P
2) Identify the nearest centroid NP
3) Move the centroid towards the point
4) Repeat for a new point Q, with nearest centroid NQ
5) Repeat until convergence
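A compact sketch of this loop for a 2 x 3 grid (k = 6). Several details are assumptions that the slides leave open: the centroids start at randomly chosen data points, the winner's grid neighbours are also pulled toward the point with a Gaussian falloff (the usual way the grid structure enters a SOM), and both the learning rate and the neighbourhood radius shrink linearly over a fixed number of steps rather than testing for convergence. All names and constants are illustrative.

```python
import math
import random


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def train_som(points, rows=2, cols=3, steps=2000, seed=0):
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize: one centroid per grid node, placed at a random data point.
    centroids = [list(rng.choice(points)) for _ in grid]
    for t in range(steps):
        decay = 1.0 - t / steps
        rate = 0.5 * decay           # how far centroids move toward the point
        radius = 0.2 + 1.5 * decay   # how far influence spreads along the grid
        p = rng.choice(points)       # select a random point P
        # Identify the nearest centroid NP (the "winner").
        winner = min(range(len(grid)),
                     key=lambda n: squared_distance(centroids[n], p))
        wr, wc = grid[winner]
        for n, (r, c) in enumerate(grid):
            grid_d2 = (r - wr) ** 2 + (c - wc) ** 2
            influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
            step = rate * influence
            # Move the centroid toward P; grid neighbours move less.
            centroids[n] = [m + step * (x - m) for m, x in zip(centroids[n], p)]
    return grid, centroids


# Toy 2-D data: two loose groups.
data = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.2),
        (9.0, 9.5), (10.0, 10.0), (9.5, 11.0)]
grid, centroids = train_som(data)
for node, m in zip(grid, centroids):
    print(node, [round(v, 2) for v in m])
```

Because neighbouring grid nodes are dragged along together, nearby nodes end up representing similar profiles. That shared movement is what distinguishes the method from k-means, where centroids move independently.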

Comparison (based on W. Noble slides)

Comparison of clustering algorithms: Hierarchical clustering

+ Widely used.
+ Easy to understand.
+ Does not require the number of clusters a priori.
- Difficult to implement well.
- Requires post-processing.
- Unstable.
- Greediness can lock in early mistakes.
- Expression data may not be organized hierarchically.


Comparison of clustering algorithms: k-means

- Less widely used.
- Requires the number of clusters a priori.
- Creates unorganized clusters that are hard to interpret.
+ Easy to understand.
+ Easy to implement.
+ Scales well.
+ Stable.

Comparison of clustering algorithms: Self-organizing maps

- Less widely used.
- Difficult to understand.
- Requires the number of clusters a priori.
+ Easy to implement.
+ Scales well.
+ Allows imposition of partial structure.
+ Stable.


What clustering can't do

• Identify differentially regulated genes.
• Account for complex experimental design.
• Provide semantics for discovered clusters.
• Determine whether a pathway is differentially expressed.
• Incorporate prior knowledge about relevant gene groups.