Graph Clustering based on Random Walk


Upload: rafi

Post on 01-Feb-2016



TRANSCRIPT

Page 1: Graph Clustering based on Random Walk
Page 2: Graph Clustering based on Random Walk

Background
◦ Graph Clustering
◦ Random Walks

MCL
◦ Basis
◦ Inflation Operator
◦ Algorithm
◦ Convergence

MCL++
◦ R-MCL
◦ MLR-MCL

Page 3: Graph Clustering based on Random Walk


Page 4: Graph Clustering based on Random Walk

Clustering: group items naturally
◦ Vector clustering: vectors in the same cluster are more likely to be close to each other
◦ Graph clustering: many links within a cluster, and fewer links between clusters

Page 5: Graph Clustering based on Random Walk

Observation: If you start at a node, and then randomly travel to a connected node, you’re more likely to stay within a cluster than travel between.

This is what MCL is based on.

A random walk on a graph is a Markov process: the next state depends only on the current state.
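The observation above can be checked with a small simulation. This is a minimal sketch, assuming a hypothetical example graph of two triangles joined by a single edge; the names `neighbors`, `cluster_of` and `random_walk` are illustrative and not from the slides.

```python
import random

# Hypothetical example graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge 1-4.
neighbors = {
    1: [2, 3, 4], 2: [1, 3], 3: [1, 2],
    4: [1, 5, 6], 5: [4, 6], 6: [4, 5],
}
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}


def random_walk(start, steps, rng):
    """Markov walk: the next node is drawn using only the current node."""
    node = start
    path = [node]
    for _ in range(steps):
        node = rng.choice(neighbors[node])
        path.append(node)
    return path


rng = random.Random(0)  # fixed seed so the run is repeatable
path = random_walk(start=1, steps=10_000, rng=rng)

# Fraction of steps that stay inside the cluster they started in.
stayed = sum(cluster_of[a] == cluster_of[b] for a, b in zip(path, path[1:]))
fraction_within = stayed / (len(path) - 1)
print(f"steps that stayed within a cluster: {fraction_within:.3f}")
```

Only one of the seven edges crosses between the triangles, so most steps stay inside a cluster.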

Page 6: Graph Clustering based on Random Walk


Page 7: Graph Clustering based on Random Walk

[Figure: example graph on nodes 1 to 6; per the matrix below, its edges are 1-2, 1-3, 2-3, 1-4, 4-5, 4-6 and 5-6]

Transition matrix P (column j holds the probabilities of stepping from node j to each of its neighbors):

0     0.5   0.5   0.33  0     0
0.33  0     0.5   0     0     0
0.33  0.5   0     0     0     0
0.33  0     0     0     0.5   0.5
0     0     0     0.33  0     0.5
0     0     0     0.33  0.5   0

P^1000:

0.2148  0.2148  0.2148  0.2148  0.2148  0.2148
0.1428  0.1428  0.1428  0.1428  0.1428  0.1428
0.1428  0.1428  0.1428  0.1428  0.1428  0.1428
0.2141  0.2141  0.2141  0.2141  0.2141  0.2141
0.1428  0.1428  0.1428  0.1428  0.1428  0.1428
0.1428  0.1428  0.1428  0.1428  0.1428  0.1428

What's wrong?
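The long-run matrix can be reproduced with a short NumPy sketch. It assumes the edge list read off the nonzero pattern of P (1-2, 1-3, 2-3, 1-4, 4-5, 4-6, 5-6) and uses exact fractions such as 1/3 instead of the rounded 0.33. For a connected, non-bipartite undirected graph the walk converges to a distribution proportional to node degree, here 3/14 for nodes 1 and 4 and 2/14 for the others, so every column of P^1000 ends up identical.

```python
import numpy as np

# Edges inferred from the nonzero entries of the slide's matrix (1-based labels).
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (4, 6), (5, 6)]
n = 6

A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = 1.0
    A[v - 1, u - 1] = 1.0

# Column-stochastic transition matrix: divide each column by its sum (the node degree).
P = A / A.sum(axis=0)

# Take 1000 steps of the walk at once.
P_long = np.linalg.matrix_power(P, 1000)

# Degree-proportional stationary distribution of the walk.
degrees = A.sum(axis=0)
stationary = degrees / degrees.sum()

np.set_printoptions(precision=4, suppress=True)
print(P_long)
print(stationary)
```

Because the columns no longer depend on the starting node, the cluster structure is no longer visible in P^1000.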

Page 8: Graph Clustering based on Random Walk

"Flow is easier within dense regions than across sparse boundaries, however, in the long run this effect disappears."

How to deal with it?
◦ During the walk, we should encourage intra-cluster transitions and penalize inter-cluster ones.

[Figure: the same six-node graph and its transition matrix P, repeated from the previous slide]

Page 9: Graph Clustering based on Random Walk

MCL adjusts the transitions column by column. For each vertex, the transition values are changed so that:
◦ Strong neighbors are further strengthened
◦ Less popular neighbors are demoted

This adjustment is done by raising every entry of a column to a non-negative power (greater than 1 to get the strengthening effect) and then re-normalizing the column.

This operation is named “Inflation” (taking matrix powers is named “Expansion”).
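Written as a formula, inflation with power r acts on a column-stochastic N x N matrix M as follows. The symbols \Gamma_r and M are notation assumed here for illustration; they do not appear on the slide.

```latex
\left(\Gamma_r M\right)_{pq} \;=\; \frac{\left(M_{pq}\right)^{r}}{\sum_{i=1}^{N} \left(M_{iq}\right)^{r}},
\qquad p, q = 1, \dots, N
```

Each column q is rescaled by its own sum, so the result is again column-stochastic.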

Page 10: Graph Clustering based on Random Walk
Page 11: Graph Clustering based on Random Walk

Strengthens strong flows, and weakens already weak flows

The inflation parameter, r, controls the extent of this strengthening / weakening. This influences the granularity of clusters.

Example with r = 2: square each entry, and then normalize the column.
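A minimal sketch of the inflation step, assuming a column-stochastic NumPy array; the function name `inflate` and the sample column are hypothetical and chosen only to illustrate the effect of r.

```python
import numpy as np


def inflate(M, r):
    """Raise every entry to the power r, then rescale each column to sum to 1."""
    powered = np.power(M, r)
    return powered / powered.sum(axis=0)


# One hypothetical column of transition probabilities (one strong and two weaker flows).
column = np.array([[0.6], [0.3], [0.1]])

unchanged = inflate(column, 1)  # r = 1 leaves the column as it is
squared = inflate(column, 2)    # square, then normalize
sharper = inflate(column, 4)    # a larger r exaggerates the differences further

print(squared.ravel())
print(sharper.ravel())
```

In practice a larger r tends to give more, smaller clusters, and a smaller r fewer, larger ones.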

Page 12: Graph Clustering based on Random Walk

Two processes are repeated alternately:
◦ Expansion
◦ Inflation
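The alternation can be sketched as a short loop. This is an illustrative outline under stated assumptions, not the reference implementation: self-loops are added before normalizing (a common choice for MCL), expansion is a plain matrix square, and the names `mcl`, `inflate`, `normalize_columns`, `max_iter` and `tol` are hypothetical.

```python
import numpy as np


def normalize_columns(M):
    """Rescale each column so that it sums to 1."""
    return M / M.sum(axis=0)


def inflate(M, r):
    """Entry-wise power followed by column re-normalization."""
    return normalize_columns(np.power(M, r))


def mcl(A, r=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    # Assumption: add self-loops so that every node keeps some flow to itself.
    M = normalize_columns(A + np.eye(A.shape[0]))
    for _ in range(max_iter):
        expanded = M @ M                # expansion: two steps of the walk
        inflated = inflate(expanded, r)  # inflation: favor the strong flows
        if np.abs(inflated - M).max() < tol:
            return inflated
        M = inflated
    return M


# Two triangles joined by one edge (0-based: nodes 0-2 and 3-5, bridge 0-3).
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

M = mcl(A, r=2.0)
np.set_printoptions(precision=3, suppress=True)
print(M)
```

On this small graph the flow of each column ends up inside the column's own triangle.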

Page 13: Graph Clustering based on Random Walk

Convergence is not proven in the thesis; however, it is shown experimentally that it often does occur.

In practice, the algorithm converges nearly always to a "doubly idempotent" matrix:
◦ It is at steady state: neither expansion nor inflation changes it.
◦ All nonzero values in a single column are equal.
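The two conditions can be tested directly. A minimal sketch, assuming expansion is a matrix square and inflation uses r = 2; `is_doubly_idempotent` and both sample matrices are hypothetical.

```python
import numpy as np


def inflate(M, r=2.0):
    """Entry-wise power followed by column re-normalization."""
    powered = np.power(M, r)
    return powered / powered.sum(axis=0)


def is_doubly_idempotent(M, r=2.0, tol=1e-9):
    """True if neither expansion (M @ M) nor inflation changes M."""
    unchanged_by_expansion = np.allclose(M @ M, M, atol=tol)
    unchanged_by_inflation = np.allclose(inflate(M, r), M, atol=tol)
    return unchanged_by_expansion and unchanged_by_inflation


# Hand-written limit matrix: nodes 0 and 1 flow to node 0; nodes 2 and 3 share
# their flow equally, so the nonzero values in each column are the same number.
limit = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])

# A stochastic matrix that is still changing under both operations.
not_converged = np.array([
    [0.8, 0.2],
    [0.2, 0.8],
])

print(is_doubly_idempotent(limit))
print(is_doubly_idempotent(not_converged))
```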

Page 14: Graph Clustering based on Random Walk
Page 15: Graph Clustering based on Random Walk
Page 16: Graph Clustering based on Random Walk

How to interpret clusters?

Page 17: Graph Clustering based on Random Walk

To interpret clusters, the vertices are split into two types: attractors, which attract other vertices, and vertices that are attracted by the attractors.

Attractors have at least one positive flow value within their corresponding row (in the steady state matrix).

Each attractor is attracting the vertices which have positive values within its row.

Attractors and the elements they attract are swept together into the same cluster.
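The row-based reading described above can be sketched as follows. The limit matrix is hand-written for illustration (it is the kind of result expected for two triangles joined by one edge, not a computed output), and `interpret_clusters` and `eps` are hypothetical names.

```python
import numpy as np


def interpret_clusters(M, eps=1e-6):
    """Read clusters off a steady-state matrix, one cluster per distinct attractor row."""
    clusters = []
    for row in M:
        attracted = np.flatnonzero(row > eps).tolist()
        if not attracted:
            continue  # no positive value in this row: the vertex is not an attractor
        if attracted not in clusters:
            clusters.append(attracted)  # attractors with identical rows share a cluster
    return clusters


# Assumed steady state: vertex 1 attracts vertices 1-3, vertex 4 attracts vertices 4-6.
limit = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])

clusters = interpret_clusters(limit)
labelled = [[v + 1 for v in cluster] for cluster in clusters]  # back to 1-based labels
print(labelled)
```

A vertex whose column has positive values in the rows of two different attractors would appear in both clusters.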

Page 18: Graph Clustering based on Random Walk

Overlapping clusters occur only when a vertex is attracted exactly equally by more than one cluster.

This happens only when both clusters are isomorphic.

Page 19: Graph Clustering based on Random Walk
Page 20: Graph Clustering based on Random Walk

For clusters with large diameter, MCL has problems
◦ Distributing flow across the cluster needs long expansion and low inflation (otherwise the cluster will split).
◦ This takes many iterations and causes MCL to be sensitive to small perturbations in the graph.

Page 21: Graph Clustering based on Random Walk

Complexity: O(N^3), where N is the number of vertices
◦ N^3 is the cost of one multiplication of two matrices of dimension N.
◦ Inflation can be done in O(N^2) time.
◦ The number of steps to converge is not proven, but is experimentally shown to be ~10 to 100 steps, and the matrices are mostly sparse after the first few steps.

Speed can be improved through pruning
◦ Inspect the matrix and set small values directly to zero
◦ Works well when the diameter of the clusters is small
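A minimal sketch of pruning, under assumptions that are not on the slide: the cutoff value `threshold` is arbitrary, the largest entry of every column is always kept so that no column becomes empty, and columns are re-normalized afterwards. The function name `prune` is hypothetical.

```python
import numpy as np


def prune(M, threshold=1e-4):
    """Set entries below the threshold to zero, then rescale columns to sum to 1."""
    pruned = np.where(M < threshold, 0.0, M)
    # Assumption: always keep the largest entry of each column.
    top = M.argmax(axis=0)
    cols = np.arange(M.shape[1])
    pruned[top, cols] = M[top, cols]
    return pruned / pruned.sum(axis=0)


# Hypothetical column-stochastic matrix with one very small entry.
M = np.array([
    [0.90000, 0.5],
    [0.09995, 0.5],
    [0.00005, 0.0],
])

sparse_M = prune(M)
print(sparse_M)
```

Applied after each inflation step, this keeps the matrices sparse, so a sparse matrix representation could then make expansion much cheaper than the dense N^3 bound.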

Page 22: Graph Clustering based on Random Walk


Page 23: Graph Clustering based on Random Walk

[1] S. van Dongen. Graph Clustering by Flow Simulation. PhD Thesis, University of Utrecht, 2000. http://igitur-archive.library.uu.nl/dissertations/1895620/inhoud.htm

[2] http://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf

[3] V. Satuluri and S. Parthasarathy. Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery, KDD'09. http://portal.acm.org/citation.cfm?id=1557101

[4] http://velblod.videolectures.net/2009/contrib/kdd09_paris/satuluri_sgcusfacd/kdd09_satuluri_sgcusfacd_01.ppt