Recent Advances in Stochastic Flow Clustering

Srinivasan Parthasarathy
Data Mining Research Laboratory
Dept. of Computer Science and Engineering
The Ohio State University
http://www.cse.ohio-state.edu/~srini


Post on 31-Aug-2019



Graph Clustering: A Fundamental Problem

Given a graph, discover groups of nodes that are strongly connected to one another but weakly connected to the rest of the graph.

What Makes this Problem Hard?

•  Scale: high-throughput experiments, social media, high-res images
•  Noise: false positive interactions; false negatives
•  Novel topological characteristics: hub nodes; power-law
•  Domain insights: balance; known biological relationships
•  Dynamics: changes to nodes, links, and content

Extant Solutions

•  Spectral methods [Shi '00]
•  Edge-based agglomerative/divisive methods [Newman '04]
•  Graclus/Kernel K-Means [Dhillon '07]
•  Metis [Karypis '98] + MQI [Leskovec, Lang '10]
•  Markov Clustering [van Dongen '00]
•  A host of specialized solutions (e.g. MCODE, LINK-CLUSTER, etc.)

Markov Clustering (MCL)
Stijn van Dongen, 2000

The original stochastic flow clustering algorithm

[Figure: example graph on nodes 1, 2, 3, 4, shown next to its flow matrix]

       1     2     3     4
1    0.33  0.25    -   0.33
2    0.33  0.25  0.5   0.33
3      -   0.25  0.5     -
4    0.33  0.25    -   0.33

Out-flows of 2: column 2. In-flows of 2: row 2.

Column Stochastic Matrix: A matrix where each column sums to 1.

Stochastic Flow: An entry in a column stochastic matrix, interpreted as the "flow" or "transition probability".
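A minimal sketch of how such a column stochastic flow matrix can be built, assuming it is the adjacency matrix with self-loops added and each column normalized, i.e. the form (A + I) D^-1 that the deck later uses to initialize MCL. The edge list is an assumption inferred from the matrix entries on the slide, and the function name `canonical_flow_matrix` is hypothetical.

```python
import numpy as np

def canonical_flow_matrix(adj):
    """Return (A + I) D^-1, where D holds the column sums of A + I."""
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))  # add self-loops
    return a / a.sum(axis=0, keepdims=True)              # normalize each column

# Assumed edge list for the 4-node example (inferred, not stated on the slide).
edges = [(1, 2), (1, 4), (2, 3), (2, 4)]
A = np.zeros((4, 4))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0

M = canonical_flow_matrix(A)
print(np.round(M, 2))   # column j holds the out-flows of node j + 1
```

Reading column 2 of `M` gives the out-flows of node 2, and reading row 2 gives its in-flows.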

Repeatedly apply certain operations to the flow matrix until the matrix converges and can be interpreted as a clustering.

       1     2     3     4
1      -     -     -     -
2     1.0   1.0    -    1.0
3      -     -    1.0    -
4      -     -     -     -

[Figure: the example graph on nodes 1, 2, 3, 4]

The MCL algorithm

Input: A, adjacency matrix. Initialize M to M_G, the canonical transition matrix: M := M_G := (A + I) D^{-1}

1.  Expand: M := M*M
    Enhances flow to well-connected nodes (i.e. nodes within a community).
2.  Inflate: M := M.^r (r usually 2), renormalize columns
    Increases inequality in each column. "Rich get richer, poor get poorer." (Reduces flow across communities.)
3.  Prune
    Saves memory by removing entries close to zero. Enables faster convergence.
4.  Converged? If no, go back to Expand. If yes, output clusters.

Clustering Interpretation: Nodes flowing into the same sink node are assigned the same cluster label.
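The loop above can be sketched in a few lines of NumPy. This is a dense-matrix illustration under stated assumptions, not van Dongen's implementation: the pruning threshold, the convergence tolerance, and the rule of reading each column's largest entry as its sink are all choices made here, and the two-clique test graph is hypothetical.

```python
import numpy as np

def inflate(m, r=2.0):
    """Hadamard (element-wise) power followed by column renormalization."""
    m = np.power(m, r)
    return m / m.sum(axis=0, keepdims=True)

def prune(m, threshold=1e-4):
    """Zero out tiny entries, then renormalize columns (threshold is an assumption)."""
    m = np.where(m < threshold, 0.0, m)
    return m / m.sum(axis=0, keepdims=True)

def mcl(adj, r=2.0, max_iter=100, tol=1e-8):
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))
    m = a / a.sum(axis=0, keepdims=True)    # M := M_G = (A + I) D^-1
    for _ in range(max_iter):
        prev = m
        m = m @ m                           # Expand
        m = inflate(m, r)                   # Inflate
        m = prune(m)                        # Prune
        if np.abs(m - prev).max() < tol:    # Converged?
            break
    # Assumption: the row holding each column's largest entry is that node's sink.
    return m.argmax(axis=0)

# Hypothetical test graph: two 4-cliques (nodes 0-3 and 4-7) joined by the edge 3-4.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

labels = mcl(A)
print(labels)   # nodes sharing a sink share a label
```

A production version would use sparse matrices so that pruning actually saves memory; the dense form is kept here only for readability.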


[van Dongen ’00]

MCL Strengths

1.  Theoretically well founded [van Dongen'00]
2.  Simple, linear algebraic operations
3.  Noise tolerant [Brohee'06, Vlasblom'09]

[Chakrabarti and Faloutsos ‘06]

MCL Limitations

1.  Outputs many small clusters. [Satuluri, Parthasarathy'09]
2.  Does not scale well. [Chakrabarti, Faloutsos'06]

MCL Flaws

1.  Outputs many small clusters.
    Fix I: Regularized MCL
2.  Does not scale well.
    Fix II: Multi-Level Regularized MCL
    Fix III: Localized Graph Sparsification

Key Idea I: The Regularize operator

Why does MCL output many clusters? Due to overfitting; it does not penalize divergence of flows between neighbors.

Remedy: Penalize divergence in flows between neighbors. Use KL divergence (a well-known measure for comparing probability distributions).

Turns out to have a nice closed-form solution:

Regularize(M) := M*(A + I) D^{-1} = M*M_G

The Regularized-MCL algorithm

Input: A, adjacency matrix. Initialize M to M_G, the canonical transition matrix: M := M_G := (A + I) D^{-1}

1.  Regularize: M := M*M_G
    Takes into account flows of the neighbors.
2.  Inflate: M := M.^r (r usually 2), renormalize columns
    Increases inequality in each column. "Rich get richer, poor get poorer." [Hadamard power + rescaling]
3.  Prune
    Saves memory by removing entries close to zero. Enables faster convergence.
4.  Converged? If no, go back to Regularize. If yes, output clusters.

Nodes flowing into the same sink node are assigned the same cluster label.
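The only change from MCL is that Expand (M*M) becomes Regularize (M*M_G), so each column is replaced by a weighted average of its neighbors' current flow columns. A minimal dense sketch follows, with the same assumptions as any toy version: the pruning threshold, the tolerance, the argmax read-out of sinks, and the hypothetical two-clique graph are not taken from the slides.

```python
import numpy as np

def r_mcl(adj, r=2.0, max_iter=100, tol=1e-8, prune_below=1e-4):
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))
    m_g = a / a.sum(axis=0, keepdims=True)       # canonical transition matrix M_G
    m = m_g.copy()                               # M := M_G
    for _ in range(max_iter):
        prev = m
        m = m @ m_g                              # Regularize: M := M * M_G
        m = np.power(m, r)                       # Inflate (Hadamard power)
        m = m / m.sum(axis=0, keepdims=True)     # ... and rescale columns
        m = np.where(m < prune_below, 0.0, m)    # Prune near-zero entries
        m = m / m.sum(axis=0, keepdims=True)
        if np.abs(m - prev).max() < tol:         # Converged?
            break
    # Assumption: each node is assigned to the row carrying most of its flow.
    return m.argmax(axis=0)

# Hypothetical test graph: two 4-cliques (nodes 0-3 and 4-7) joined by the edge 3-4.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

labels = r_mcl(A)
print(labels)
```

Because M_G never changes, every iteration keeps pulling a node's flow toward that of its original neighbors, which is the penalty on neighbor divergence described under Key Idea I.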

Key Idea II: Multi-level Regularized MCL

1.  Coarsen repeatedly: input graph -> intermediate graphs -> ... -> coarsest graph.
2.  Run curtailed R-MCL on the coarsest graph and project the flow to the next finer graph; repeat at each intermediate graph.
3.  On the input graph, run R-MCL to convergence and output clusters.

•  Faster to run on smaller graphs first
•  Captures global topology of graph
•  Good initialization for refined flow matrix
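The slides do not spell out how coarsening or flow projection is done, so the following is only one plausible sketch of those two steps. It assumes coarsening by greedy pairwise matching and assumes that flow into a coarse node is split equally among its children; both rules, and the names `coarsen` and `project_flow`, are illustrative choices rather than the MLR-MCL definitions. The R-MCL runs between levels are omitted.

```python
import numpy as np

def coarsen(adj):
    """Greedy matching: merge each unmatched node with one unmatched neighbor.
    Returns the coarse adjacency matrix and P, an n x n_c 0/1 membership matrix."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    parent = -np.ones(n, dtype=int)
    n_c = 0
    for u in range(n):
        if parent[u] >= 0:
            continue
        parent[u] = n_c
        for v in np.flatnonzero(adj[u]):
            if parent[v] < 0:          # first free neighbor joins u
                parent[v] = n_c
                break
        n_c += 1
    p = np.zeros((n, n_c))
    p[np.arange(n), parent] = 1.0
    coarse = p.T @ adj @ p             # sum edge weights between merged groups
    np.fill_diagonal(coarse, 0.0)      # drop self-loops created by merging
    return coarse, p

def project_flow(m_coarse, p):
    """Lift a coarse flow matrix to the finer graph.
    Assumption: flow into a coarse node is split equally among its children."""
    split = p / p.sum(axis=0, keepdims=True)
    return split @ m_coarse @ p.T

# Hypothetical test graph: two 4-cliques joined by the edge 3-4.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

coarse, P = coarsen(A)
with_loops = coarse + np.eye(len(coarse))
M_c = with_loops / with_loops.sum(axis=0, keepdims=True)   # coarse M_G
M_f = project_flow(M_c, P)                                  # initial flow for the fine graph
```

The projected matrix stays column stochastic, so it can be used directly as the starting flow matrix at the finer level.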

Comparison with MCL on Protein Interaction Networks

Dataset (n, m)           Quality change   Speedup (time)
Yeast (5k, 15k)          36%              2.5x (0.4s)
Yeast_Noisy (6k, 200k)   300%             57x (8s)
Human (10k, 60k)         21.6%            200x (2s)

[Hardware: Quad-core Intel i5 CPU, 3.2 GHz, with 16GB RAM ]

Comparison with Graclus and Metis

Quality: MLR-MCL improves upon both Graclus and Metis

Speed: MLR-MCL is faster than Graclus, comparable to Metis

Key Idea III: Graph Sparsification

Is there a simple pre-processing of the graph to reduce the edge set that can "clarify" or "simplify" its cluster structure?

[Figure: original graph vs. sparsified graph]

Our Approach

Main Idea: Retain edges which are likely to be intra-cluster edges, while discarding likely inter-cluster edges.

Similarity-based Sparsification Heuristic: An edge (i,j) is likely to be an intra-cluster edge if vertices i and j have highly overlapping adjacency lists.

\[ Sim(i,j) = \frac{|Adj(i) \cap Adj(j)|}{|Adj(i) \cup Adj(j)|} \]

Algorithm: Global Sparsification (G-Spar)

Parameter: Sparsification ratio, s

1.  For each edge <i,j>:
    (i) Calculate Sim(<i,j>)
2.  Retain top s% of edges in order of Sim, discard others
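A small sketch of G-Spar under a few assumptions: the graph is given as a dict mapping each node to its set of neighbors, Sim is the overlap ratio of the two adjacency sets (without adding the endpoints themselves), and the number of edges kept is rounded to the nearest integer. The function names and the example graph are hypothetical.

```python
def jaccard(adj, i, j):
    """Sim(i, j): overlap of the two adjacency sets divided by their union."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def global_sparsify(adj, s):
    """Keep the top s% of edges, ranked by similarity over the whole graph."""
    edges = {(min(i, j), max(i, j)) for i in adj for j in adj[i] if i != j}
    ranked = sorted(edges, key=lambda e: jaccard(adj, *e), reverse=True)
    keep = max(1, round(len(ranked) * s / 100.0))   # rounding rule is an assumption
    return ranked[:keep]

# Hypothetical example: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

kept = global_sparsify(graph, 85)
print(sorted(kept))
```

In this example the bridge edge has no common neighbors, so it ranks last and is the one discarded.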

Dense clusters are over-represented, sparse clusters under-represented.

Works great when the goal is to just find the top communities.

Algorithm: Local Sparsification (L-Spar)

Parameter: Sparsification exponent, e (0 < e < 1)

1.  For each node i of degree d_i:
    (i) For each neighbor j:
        (a) Calculate Sim(<i,j>)
    (ii) Retain top (d_i)^e neighbors in order of Sim, for node i

Edges compete to be retained locally ("think globally, act locally" paradigm).
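A matching sketch of L-Spar, with the assumptions stated up front: (d_i)^e is rounded up so every node keeps at least one edge, an edge survives if either endpoint retains it, and Sim is the same adjacency-overlap ratio as before. None of these details are given on the slide; the names and the example graph are hypothetical.

```python
import math

def jaccard(adj, i, j):
    """Sim(i, j): overlap of the two adjacency sets divided by their union."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def local_sparsify(adj, e):
    """Each node keeps its top ceil(d_i ** e) neighbors by similarity."""
    kept = set()
    for i, nbrs in adj.items():
        ranked = sorted(nbrs, key=lambda j: jaccard(adj, i, j), reverse=True)
        k = math.ceil(len(nbrs) ** e)            # rounding up is an assumption
        for j in ranked[:k]:
            kept.add((min(i, j), max(i, j)))     # kept if either endpoint retains it
    return kept

# Hypothetical example: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

kept = local_sparsify(graph, 0.5)
print(sorted(kept))
```

Because the budget is per node, low-degree nodes in sparse regions still keep edges, which a single global ranking does not guarantee.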


Ensures representation of clusters of varying densities!

But...

Similarity computation is expensive!

Solution: A randomized, approximate solution based on Minwise Hashing [Broder et al., 1998]
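The idea behind minwise hashing is that two sets agree on their minimum hash value with probability equal to their overlap ratio, so the fraction of agreeing minima over k hash functions estimates Sim. The sketch below uses simple linear hash functions modulo a prime, which only approximate min-wise independence; that choice, the number of hashes, and all names are assumptions, and adjacency sets are assumed non-empty with integer node ids below the prime.

```python
import random

PRIME = 2_147_483_647   # 2^31 - 1; node ids are assumed to be smaller than this

def make_hashes(k, seed=0):
    """Draw k hash functions h(x) = (a*x + b) mod PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def signature(items, hashes):
    """One minimum per hash function; items must be a non-empty set of ints."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]

def estimate_sim(sig_a, sig_b):
    """Fraction of hash functions on which the two minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

hashes = make_hashes(64)
adj_i = {1, 2, 3, 4, 5, 6}
adj_j = {4, 5, 6, 7, 8, 9}
estimate = estimate_sim(signature(adj_i, hashes), signature(adj_j, hashes))
print(estimate)   # a noisy estimate of the true overlap ratio 3/9
```

Signatures are computed once per node and then compared per edge, which is what makes this cheaper than intersecting adjacency lists for every edge.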

L-Spar: Results Using MLR-MCL

Dataset   Spars. ratio   Speedup   Quality change
Yeast     17%            17x       +4%
Human     40%            6x        +1%

[Hardware: Quad-core Intel i5 CPU, 3.2 GHz, with 16GB RAM ]

Results using MLR-MCL

                                       RandomEdge          G-Spar              L-Spar
Dataset (nodes, edges)  Spars. ratio   Spdup   Quality Δ   Spdup   Quality Δ   Spdup   Quality Δ
BioGrid (6K, 200K)      17%            6x      -16%        38x     -23%        17x     +4%
Wiki (1.1M, 53M)        15%            19x     -58%        92x     -54%        23x     -4.5%
Orkut (3M, 117M)        17%            6x      -32%        39x     -59%        22x     0
Twitter (146K, 83M)     4%             63x     -90%        188x    +10%        22x     +40%

L-Spar enables high speed-ups, without significant loss of accuracy.

Impact of Sparsification on Spectrum: Yeast PPI

1 Theoretical Results

• Theorem 2.1(c) of [1]: The multiplicity of 0 as an eigenvalue of the graph Laplacian is equal to the number of components of the graph.

• Theorem 2.2(d) of [1]: \sum_{i=1}^{n} \lambda_i = 2|E(G)| = \sum_{v} d(v)

2 SIGMOD Paper

Flickr 33911, Flickr.spars 33953. Total 64903 nodes
Wikipedia 1, Wikipedia.spars 164. Total 1129060 nodes
Orkut 186, Orkut.spars 257. Total 3072626 nodes

Dataset   Original                  L-Spar                    G-Spar
BioGrid   λ_4197 = 0.34284616       λ_4197 = 0.08680336       λ_4894 = 0.14459744
DIP       λ_45 = 0.117378           λ_54 = 0.03226163         λ_888 = 0.036117
Human     λ_219 = 0.1038301         λ_234 = 0.05650493        λ_2266 = 0.05255889

Table 1: Spectral Gap Comparisons

Figure 1: BioGrid Eigenvalues Comparison

References

[1] B. Mohar. The Laplacian spectrum of graphs. Graph Theory, Combinatorics, and Applications, 2:871-898, 1991.
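Both Laplacian facts quoted from [1] are easy to check numerically on a small graph. The sketch below assumes the unnormalized Laplacian L = D - A and uses a hypothetical 5-node graph with two components; the tolerance used to count zero eigenvalues is also an assumption.

```python
import numpy as np

# Hypothetical example: a triangle (0, 1, 2) plus a separate edge (3, 4).
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

degrees = A.sum(axis=1)
L = np.diag(degrees) - A                 # unnormalized graph Laplacian
eigenvalues = np.linalg.eigvalsh(L)      # L is symmetric, so eigvalsh applies

# Theorem 2.1(c): multiplicity of eigenvalue 0 equals the number of components.
zero_multiplicity = int(np.sum(np.abs(eigenvalues) < 1e-9))

# Theorem 2.2(d): the eigenvalues sum to 2|E(G)|, which is also the degree sum.
eigenvalue_sum = eigenvalues.sum()

print(zero_multiplicity, eigenvalue_sum, 2 * len(edges), degrees.sum())
```

This is why a sparsifier that splits the graph into many components shows up in the spectrum as extra zero eigenvalues.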


Global Sparsification results in multiple components

Local sparsification seems to match trends of original graph

Human  PPI  

Synthetic data (Fortunato'09)

As the clustering problem gets harder, L-Spar is more beneficial.


SOCIAL  IMPACT:  EMERGENCY  RESPONSE  AND  FLOOD  MAPPING  


Copyright 2006, Data Mining Research Laboratory

Crisis Informatics and Flood Mapping

•  Disaster informatics, or crisis informatics, is the study of the use of information and technology in the different phases of disasters or crises.

•  Flood Mapping: Mapping the extent of flood damage is a key step for relief and recovery.

Chennai Floods (2015): Social Sensing Enhanced Flood Mapping

[Figure panels: prior to 1st depression; after 3rd depression]

Markov Clustering and Flood Mapping

•  Water delineation: segmentation of remotely sensed images is a key strategy employed in flood mapping.
•  Confounding factors: cloud cover, sinuous river beds, urban area reflectance effects
•  Key Idea: Semi-supervised flood mapping using MLR-MCL as a key pre-processing step.

Procedure

1.  Cluster patches of satellite image with MLR-MCL
    - Graph-based image segmentation [Shi-Malik'00]
2.  Guided patch labeling
    a) Volunteer crowdsourcing
    b) Social-media induced labels
3.  Semi-supervised learning of flood extent
    - HUG-FM (KNN variant)
    - SEANO (Neural Net)


Houston Floods (original image)

Otsu Thresholding

•  Clustering-based image thresholding [Otsu'79]
•  Converts image to binary image
•  Simple, widely used, but prone to false positives
•  Detects highways as waterbodies
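For reference, Otsu's method picks the gray-level threshold that maximizes the between-class variance of the two resulting pixel groups. A compact NumPy sketch follows; the bin count, the function name, and the synthetic two-level "image" are assumptions made for illustration, and this is not the code used in the evaluation.

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Return the gray level that maximizes between-class variance."""
    hist, edges = np.histogram(gray, bins=bins)
    p = hist / hist.sum()                        # probability of each bin
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                            # weight of the class at or below each bin
    w1 = 1.0 - w0                                # weight of the class above it
    cum_mean = np.cumsum(p * centers)
    total_mean = cum_mean[-1]
    valid = (w0 > 0) & (w1 > 0)                  # skip splits with an empty class
    between = np.zeros(bins)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(between)]

# Hypothetical image: 50 dark pixels (value 10) and 50 bright pixels (value 200).
gray = np.array([10.0] * 50 + [200.0] * 50)
t = otsu_threshold(gray)
mask = gray > t                                  # binary image
print(t, int(mask.sum()))
```

Since the decision depends only on pixel intensity, any surface with water-like intensity ends up on the same side of the threshold, which is consistent with the highway false positives noted above.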

Watershed Algorithm

•  Relies on pre-identified landmarks [Beucher'79, Meyer'92]
•  Applies gradient transformation and thresholding
•  Prone to smoothing errors

Normal Thresholding

•  Improved variant of Otsu and watershed
•  Relies on land-cover identification
•  Nuanced threshold separation of types of land cover from water
•  State of the art in remote sensing [2016]

HUG-FM + MLR-MCL

Quantitative evaluation on Houston dataset

METHOD       ACCURACY   F1
Otsu Thr     0.89       0.74
Watershed    0.89       0.68
Normal Thr   0.87       0.84
HUG-FM       0.96       0.87
SEANO        0.97       0.90

Otsu Thr, Watershed, Normal Thr: standard remote sensing methods.

HUG-FM, SEANO: rely on MLR-MCL preprocessing. Quality (SEANO) vs. speed (HUG-FM) tradeoff.

Chennai Floods 11/24 (between 2nd and 3rd depression)


Watershed  (Beucher,  Meyer  1992)

N-cuts (100 partitions) (Shi, Malik'01)

HUG-FM + MLR-MCL patching

Take Home: Recent Advances in MCL

Key Idea 1: Regularization
•  Avoids fragmenting community structure. [SIGKDD'09, ACM BCB'10, Bioinformatics 2012]

Key Idea 2: Multi-level Regularization
•  Improves scalability. [SIGKDD'09, ACM BCB'10]

Key Idea 3: Sparsification: simple pre-processing that makes a difference
•  Reduces clustering time from hours down to minutes. [SIGMOD'11, WWW'13]
•  Theoretical rationale [SoCG'17]

Key Ideas 4 & 5: Soft Clustering [ISMB'12] & GPU acceleration [HiPC'14]

Social Impact: Use of MLR-MCL for flood mapping shows promise.

References (incomplete)

1.  MCL: Graph Clustering by Flow Simulation. S. van Dongen, Ph.D. thesis, University of Utrecht, 2000.
2.  Graclus: Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. Dhillon et al., IEEE Trans. PAMI, 2007.
3.  Metis: A fast and high quality multilevel scheme for partitioning irregular graphs. Karypis and Kumar, SIAM J. on Scientific Computing, 1998.
4.  Normalized Cuts and Image Segmentation. Shi and Malik, IEEE Trans. PAMI, 2000.
5.  Finding and evaluating community structure in networks. Newman and Girvan, Phys. Rev. E 69, 2004.
6.  The identification of functional modules from the genomic association of genes. Snel et al., PNAS, 2002.

Thanks & Acknowledgements

•  Joint work with:
   •  Albert Liang
   •  Peter Jacobs
   •  Nikhita Vedula
   •  Venu Satuluri (Twitter)
   •  Yu-Keng Shih (GraphSQL)
   •  Sitaram Asur (HP Laboratories)
   •  Duygu Ucar (Jackson Laboratories)
•  Grant Acknowledgements
   •  NSF HazardSEES #1520870 and SOCS #IIS-1111118
•  Software and References: https://sites.google.com/site/stochasticflowclustering/