
The CLICK clustering algorithm

Roded Sharan, Naama Arbili, Adi Maron, Rani Elkon

ABDBM © Ron Shamir 1


CLICK: CLuster Identification via Connectivity Kernels

• Identify highly homogeneous sets of elements: connectivity kernels.
• Add elements to kernels via similarity to average kernel fingerprints.
• Uses tools from graph theory and probabilistic considerations for similarity evaluation and kernel identification.
• Efficient implementation.


Data

• Input: real-valued matrix of raw data (rows: genes; columns: experiments).
• Row vectors: gene fingerprints.
• Compute the gene similarity matrix S (genes × genes).
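The step from fingerprints to a similarity matrix can be sketched as follows. The slides do not say which similarity measure is used, so Pearson correlation is an assumption here, and the names `pearson`, `similarity_matrix` and the toy values in `raw` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length fingerprints."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant fingerprint: correlation undefined, use 0
    return cov / (sx * sy)

def similarity_matrix(fingerprints):
    """Symmetric genes x genes matrix S with S[i][j] = pearson(gene i, gene j)."""
    n = len(fingerprints)
    S = [[1.0] * n for _ in range(n)]  # diagonal set to 1 by convention
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = pearson(fingerprints[i], fingerprints[j])
    return S

# Toy data (made up): 3 genes measured in 4 experiments.
raw = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # rises with gene 0
    [4.0, 3.0, 2.0, 1.0],  # falls as gene 0 rises
]
S = similarity_matrix(raw)
```

Any other symmetric similarity measure could be swapped in for `pearson` without changing the rest of the pipeline.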


Probabilistic Model

• Mates: genes that belong to the same true cluster.
• Probabilistic assumptions:
– Similarity between mates ~ N(µT, σT²)
– Similarity between non-mates ~ N(µF, σF²)
– Independence of pairwise similarities
– Mates probability: p
• These assumptions are observed to hold for real data.
• Justified in some cases by the Central Limit Theorem and by simulations.
• Parameter values needed:
– Computed from a partially known solution, or
– Estimated using the EM algorithm
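As an illustration of the EM option, here is a minimal sketch that fits a two-component Gaussian mixture to a list of pairwise similarities. It is a textbook EM loop, not necessarily the estimation procedure used in CLICK; the function names, the percentile-based initialisation, the fixed iteration count and the synthetic data are all assumptions.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(values, iters=100):
    """Fit p*N(mu_t, sigma_t) + (1-p)*N(mu_f, sigma_f); returns the five parameters."""
    n = len(values)
    srt = sorted(values)
    # Crude initialisation (an assumption): non-mates score low, mates score high.
    mu_f, mu_t = srt[n // 10], srt[(9 * n) // 10]
    sigma_f = sigma_t = max((srt[-1] - srt[0]) / 4, 1e-6)
    p = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each pair is a mate pair.
        r = []
        for s in values:
            a = p * normal_pdf(s, mu_t, sigma_t)
            b = (1 - p) * normal_pdf(s, mu_f, sigma_f)
            r.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: re-estimate the parameters from the soft assignments.
        rt = min(max(sum(r), 1e-9), n - 1e-9)  # guard against empty components
        rf = n - rt
        mu_t = sum(ri * s for ri, s in zip(r, values)) / rt
        mu_f = sum((1 - ri) * s for ri, s in zip(r, values)) / rf
        var_t = sum(ri * (s - mu_t) ** 2 for ri, s in zip(r, values)) / rt
        var_f = sum((1 - ri) * (s - mu_f) ** 2 for ri, s in zip(r, values)) / rf
        sigma_t = max(math.sqrt(var_t), 1e-6)
        sigma_f = max(math.sqrt(var_f), 1e-6)
        p = rt / n
    return p, mu_t, sigma_t, mu_f, sigma_f

# Synthetic similarities (made up): 20% mates around 0.8, 80% non-mates around 0.
random.seed(0)
sims = [random.gauss(0.8, 0.1) for _ in range(200)]
sims += [random.gauss(0.0, 0.2) for _ in range(800)]
p, mu_t, sigma_t, mu_f, sigma_f = em_two_gaussians(sims)
```

On well-separated data such as this, the estimates should land close to the generating values; on real similarities the result depends on how well the normality assumptions hold.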


Similarity Graph

• Input ⇒ weighted graph G with a vertex per element and an edge between similar elements.
• The weight of edge (i,j):

$$w_{ij} = \ln \frac{p \cdot f^M(S_{ij})}{(1-p) \cdot f^N(S_{ij})}$$

where fM / fN = p.d.f. at Sij for i,j mates / non-mates.
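The edge weight is a log-odds ratio, so it is convenient to compute it from log-densities. The sketch below assumes the normal model for mates and non-mates stated in the probabilistic model; the function names and the parameter values in the example are hypothetical.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log of the N(mu, sigma^2) density at x."""
    return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))

def edge_weight(s_ij, p, mu_t, sigma_t, mu_f, sigma_f):
    """w_ij = ln( p * f_M(S_ij) / ((1 - p) * f_N(S_ij)) )."""
    log_mates = math.log(p) + log_normal_pdf(s_ij, mu_t, sigma_t)
    log_nonmates = math.log(1 - p) + log_normal_pdf(s_ij, mu_f, sigma_f)
    return log_mates - log_nonmates

# Made-up parameters: mates ~ N(0.8, 0.1^2), non-mates ~ N(0, 0.2^2), p = 0.2.
params = dict(p=0.2, mu_t=0.8, sigma_t=0.1, mu_f=0.0, sigma_f=0.2)
w_high = edge_weight(0.8, **params)  # similarity typical of mates
w_low = edge_weight(0.0, **params)   # similarity typical of non-mates
```

A positive weight means the observed similarity is better explained by the pair being mates; a negative weight favours non-mates.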


Edge weights (contd)

$$f(S_{ij} \mid i, j \text{ are mates}) = \frac{1}{\sqrt{2\pi}\,\sigma_T}\, e^{-\frac{(S_{ij}-\mu_T)^2}{2\sigma_T^2}}$$

$$w_{ij} = \ln \frac{p \cdot f^M(S_{ij})}{(1-p) \cdot f^N(S_{ij})}$$
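Substituting the two normal densities into the weight gives a closed form. This is a worked step under the model's assumptions (mates ~ N(µT, σT²), non-mates ~ N(µF, σF²)); the non-mate density is not written out on the slide and is taken to have the same form as the mate density.

```latex
\begin{align*}
w_{ij}
  &= \ln\frac{p}{1-p} + \ln f^M(S_{ij}) - \ln f^N(S_{ij}) \\
  &= \ln\frac{p}{1-p}
     - \frac{(S_{ij}-\mu_T)^2}{2\sigma_T^2} - \ln\!\left(\sqrt{2\pi}\,\sigma_T\right)
     + \frac{(S_{ij}-\mu_F)^2}{2\sigma_F^2} + \ln\!\left(\sqrt{2\pi}\,\sigma_F\right) \\
  &= \ln\frac{p\,\sigma_F}{(1-p)\,\sigma_T}
     + \frac{(S_{ij}-\mu_F)^2}{2\sigma_F^2}
     - \frac{(S_{ij}-\mu_T)^2}{2\sigma_T^2}
\end{align*}
```

The weight therefore grows as S_ij moves away from µF and toward µT, with a constant offset set by p and the two standard deviations.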


Kernel Identification

Cut: partition of the vertices into two groups.
Weight of cut: sum of the weights of the edges crossing the cut.

• For each cut C in G we test two hypotheses:
H0: C contains only edges between non-mates.
H1: C contains only edges between mates.

G is declared a kernel if H1 is more probable for all cuts.

Thm: G is a kernel iff weight of min. cut > 0.


Main Theorem

Thm: G is a kernel iff weight of min. cut > 0.

Pf: pick any cut C in G:

$$\ln\frac{\Pr(H_1^C \mid C)}{\Pr(H_0^C \mid C)} = \ln\frac{\Pr(H_1^C)\, f(C \mid H_1^C)}{\Pr(H_0^C)\, f(C \mid H_0^C)}$$ (Bayes Thm.)

$$= \ln\frac{p^{|C|} \prod_{(i,j)\in C} f^M(S_{ij})}{(1-p)^{|C|} \prod_{(i,j)\in C} f^N(S_{ij})}$$ (the Sij's and the mate relations are independent)

$$= \sum_{(i,j)\in C} \ln\frac{p \cdot f^M(S_{ij})}{(1-p) \cdot f^N(S_{ij})} = \sum_{(i,j)\in C} w_{ij} = W(C)$$

So H1 is accepted for a cut C iff W(C) > 0. Take C*, a min cut of G. H1 is accepted for C* iff it is accepted for all cuts ⇒ accept H1 for G iff W(C*) > 0.


Kernel Identification Algorithm

Basic-CLICK(G):
• If G={v} then v is a singleton.
• Otherwise:
– Compute a min. weight cut C.
– If Weight(C)>0 then G is a kernel.
– Otherwise cut G into the two resulting pieces and continue recursively with each one.
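The recursion above can be sketched in a few lines. To keep the example short, the minimum-weight cut is found by brute-force enumeration of all bipartitions, which is exponential and only suitable for toy graphs; it is not the min-cut routine of the actual implementation. The graph representation (a dict of edge weights, missing edges treated as weight 0), the function names and the toy weights are assumptions.

```python
from itertools import combinations

def cut_weight(side, vertices, w):
    """Sum of weights of edges inside `vertices` with exactly one endpoint in `side`."""
    return sum(wt for (u, v), wt in w.items()
               if u in vertices and v in vertices and (u in side) != (v in side))

def min_weight_cut(vertices, w):
    """Brute-force minimum-weight cut of the subgraph induced by `vertices`."""
    vs = sorted(vertices)
    first, rest = vs[0], vs[1:]
    best = None
    # Keep `first` on one side so that every bipartition is enumerated once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side = {first, *extra}
            cw = cut_weight(side, vertices, w)
            if best is None or cw < best[0]:
                best = (cw, side)
    return best

def basic_click(vertices, w):
    """Return (kernels, singletons) following the Basic-CLICK recursion."""
    vertices = set(vertices)
    if len(vertices) == 1:
        return [], sorted(vertices)
    weight, side = min_weight_cut(vertices, w)
    if weight > 0:
        return [sorted(vertices)], []  # every cut is positive: G is a kernel
    k1, s1 = basic_click(side, w)
    k2, s2 = basic_click(vertices - side, w)
    return k1 + k2, s1 + s2

# Toy graph (made-up weights): a, b, c are mutually similar; d and e are not.
w = {("a", "b"): 3.0, ("a", "c"): 2.0, ("b", "c"): 4.0,
     ("c", "d"): -1.0, ("d", "e"): -2.0, ("a", "e"): -1.5}
kernels, singletons = basic_click("abcde", w)
```

On this toy graph the recursion first splits off e, then d, and finally accepts {a, b, c} as a kernel because all of its cuts have positive weight.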


Refinements

• Adoption Step: Adopt a singleton into a kernel if it is sufficiently similar to the kernel's average fingerprint.
• Iterative application of Basic-CLICK and the adoption step.
• Merging Step: Merge clusters with sufficiently similar average fingerprints.

These steps use both fingerprints and similarity values.

Mincut: NP-hard when there are negative weights ⇒ heuristic: remove all negative-weight edges, compute a min cut, then correct the cut weight for the kernel test criterion.
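One possible reading of the min-cut heuristic is sketched below: drop the negative edges, find a global min cut of the remaining non-negative graph, then re-weigh that same cut using all original edges before applying the W(C) > 0 test. The slides do not spell out the correction, so this interpretation is an assumption, as is the choice of the Stoer-Wagner algorithm for the non-negative min cut; all names and toy weights are hypothetical.

```python
def stoer_wagner(n, weight):
    """Global min cut for non-negative weights (symmetric n x n matrix, n >= 2).

    Returns (cut_value, side) where `side` lists the vertices on one side.
    """
    w = [row[:] for row in weight]
    groups = [[v] for v in range(n)]  # original vertices merged into each node
    active = list(range(n))
    best_value, best_side = float("inf"), []
    while len(active) > 1:
        # Minimum cut phase: repeatedly add the most tightly connected node.
        conn = {v: 0.0 for v in active}
        order, remaining = [], set(active)
        while remaining:
            nxt = max(remaining, key=lambda v: conn[v])
            remaining.remove(nxt)
            order.append(nxt)
            for v in remaining:
                conn[v] += w[nxt][v]
        s, t = order[-2], order[-1]
        # Cut of the phase: node t (with everything merged into it) vs the rest.
        if conn[t] < best_value:
            best_value, best_side = conn[t], groups[t][:]
        # Merge t into s.
        groups[s].extend(groups[t])
        for v in active:
            if v != s and v != t:
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        active.remove(t)
    return best_value, best_side

def heuristic_min_cut(n, weight):
    """Min cut on the non-negative part, re-weighed with all original edges."""
    nonneg = [[max(x, 0.0) for x in row] for row in weight]
    _, side = stoer_wagner(n, nonneg)
    inside = set(side)
    corrected = sum(weight[u][v] for u in inside for v in range(n) if v not in inside)
    return corrected, sorted(side)

def build(n, edges):
    """Symmetric weight matrix from (u, v, weight) triples."""
    m = [[0.0] * n for _ in range(n)]
    for u, v, x in edges:
        m[u][v] = m[v][u] = x
    return m

# Toy graph (made-up weights): vertex 3 is weakly attached and has negative edges.
g = build(4, [(0, 1, 3.0), (0, 2, 2.0), (1, 2, 4.0),
              (2, 3, 1.0), (0, 3, -2.0), (1, 3, -0.5)])
value, side = heuristic_min_cut(4, g)
is_kernel = value > 0
```

The cut found this way is not guaranteed to be the true minimum-weight cut of the signed graph, which is why the slide calls it a heuristic.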


Figures of Merit

Evaluating a clustering solution when no correct clustering is known, using fingerprints:

Homogeneity: average similarity between the fingerprint of an element and the average fingerprint of its cluster. (Also reported: minimum.)

Separation: weighted average similarity between the average fingerprints of clusters. (Also reported: maximum.)
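The two averages can be sketched as below. Two details are assumptions, since the slide does not state them: Pearson correlation is used as the similarity between fingerprints, and each pair of clusters in the separation is weighted by the product of the two cluster sizes. Function names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length fingerprints."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def mean_fingerprint(cluster):
    """Coordinate-wise average of the fingerprints in one cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def homogeneity(clusters):
    """Average similarity of each element to its own cluster's mean fingerprint."""
    total, count = 0.0, 0
    for cluster in clusters:
        centre = mean_fingerprint(cluster)
        for fp in cluster:
            total += pearson(fp, centre)
            count += 1
    return total / count

def separation(clusters):
    """Size-weighted average similarity between mean fingerprints of cluster pairs."""
    centres = [mean_fingerprint(c) for c in clusters]
    num = den = 0.0
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            wgt = len(clusters[i]) * len(clusters[j])  # assumed weighting
            num += wgt * pearson(centres[i], centres[j])
            den += wgt
    return num / den

# Toy clustering (made up): one rising cluster and one falling cluster.
clusters = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5]],
    [[3.0, 2.0, 1.0], [6.0, 4.0, 1.0]],
]
h_ave = homogeneity(clusters)
s_ave = separation(clusters)
```

With these definitions, a tighter clustering shows up as higher homogeneity, and better-separated clusters as a lower separation value.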


Gene Expression: Yeast Cell Cycle

Expression levels of 6,218 S. cerevisiae genes, measured at 17 time points over two cell cycles. (Data from Cho et al., Mol. Cell 1998)

Algorithm      Clusters   Homogeneity        Separation
                          Ave      Min       Ave      Max
CLICK          30         0.8      -0.19     -0.07    0.65
GeneCluster*   30         0.74     -0.88     -0.02    0.97

*Self-Organizing Maps; Tamayo et al., PNAS 1999.


CLICK clusters: Yeast Cell Cycle


Yeast Cell Cycle: late G1 Cluster

• Contains 91% of late G1-peaking genes.
• In contrast, in GeneCluster 87% are contained in 3 clusters.
• M-peaking genes: CLICK: 95% in a single cluster; GeneCluster: 92.5% in 3 clusters.
• Similar specificities.

N=164


Gene Expression: Serum Response

Human fibroblast cells starved for 48 hours, then stimulated by serum. Expression levels of 8,613 genes measured at 13 time points. (Data from Iyer et al., Science 1999)

Algorithm   Clusters   Homogeneity        Separation
                       Ave      Min       Ave      Max
CLICK       10         0.88     0.13      -0.34    0.65
CLUSTER*    10         0.87     -0.75     -0.13    0.9

*Average linkage agglomerative hierarchical clustering; Eisen et al., PNAS 1998.


Performance of CLICK vs other clustering algorithms

Elements   Problem                              Original Algorithm             CLICK Improvement   Time (min)*
517        Gene Expression, Fibroblasts         Cluster (Eisen et al.)         Yes                 0.5
826        Gene Expression, Yeast cell cycle    GeneCluster (Tamayo et al.)    Yes                 0.2
2,329      cDNA OFP, Blood Monocytes            HCS (Hartuv et al.)            Yes                 0.8
20,275     cDNA OFP, Sea urchin eggs            K-Means (Herwig et al.)        Yes                 32.5
72,623     Protein similarity                   ProtoMap (Yona et al.)         Minor               53
117,835    Protein similarity                   SYSTERS (Krause et al.)        Yes                 126.3

* Executed on an SGI ORIGIN200 machine utilizing one IP27 processor. Does not include preprocessing time.


Performance on Yeast Cell Cycle data

[Figure: Separation plotted against Homogeneity; labeled entries: “True”, CAST*, GeneCluster (SOM), K-means, CLICK]

698 genes, 71 conditions (data from Spellman et al ‘98)

Each algorithm was run by its authors in a “blind” test.

*Ben-Dor, Shamir, Yakhini ‘99


EXPression ANalyzer and DisplayER

http://acgt.cs.tau.ac.il/expander

• Clustering: identify clusters of co-expressed genes (CLICK, KMeans, SOM, hierarchical)
• Biclustering: identify homogeneous submatrices (SAMBA)
• Functional enrichment
• Visualization
• Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
• microRNA enrichment (FAME)

A. Maron, R. Sharan, Bioinformatics 03

A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon, BMC Bioinformatics 05


FIN
