
Post on 22-Feb-2016


Page 1: Community Detection Algorithm and Community Quality Metric

Community Detection Algorithm and Community Quality Metric

Mingming Chen & Boleslaw K. Szymanski
Department of Computer Science
Rensselaer Polytechnic Institute

Page 2: Community Detection Algorithm and Community Quality Metric

Community Structure

Many networks display community structure: groups of nodes within which connections are denser than between them.

Community detection algorithms

Community quality metrics

Page 3: Community Detection Algorithm and Community Quality Metric

Two Related Community Detection Topics

Community detection algorithm

LabelRank: a stabilized label propagation community detection algorithm

LabelRankT: an extension of LabelRank for dynamic networks

A new community quality metric that solves two problems of Modularity

M. E. J. Newman, 2006; Newman and Girvan, 2004.

Xie, Chen, and Szymanski, 2013.

Xie and Szymanski, 2013.

Page 4: Community Detection Algorithm and Community Quality Metric

LabelRank Algorithm

Four operators are applied to the labels:

Label propagation operator
Inflation operator
Cutoff operator
Conditional update operator

Example question: NP = P? Node 1: No; Node 2: No; Node 3: No; Node 4: Yes.

[Figure: nodes 1 to 4, all edge weights equal to 1]

P1(No) = 3/4; P1(Yes) = 1/4. Node 1: No.

[Figure: the same nodes with one edge weight raised to 97]

P1(No) = 3/100; P1(Yes) = 97/100. Node 1: Yes.
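The two outcomes on this slide can be reproduced with a small weighted vote. This is only a sketch: the helper name `weighted_vote` is hypothetical, and it assumes that node 1 weighs its own answer together with those of nodes 2 to 4, which is inferred from the fractions 3/4 and 97/100 rather than stated on the slide.

```python
from collections import defaultdict

def weighted_vote(answers, weights):
    """Return the normalized weight behind each answer.

    answers: dict node -> answer held by that node
    weights: dict node -> weight of the edge from that node (assumed layout)
    """
    totals = defaultdict(float)
    for node, answer in answers.items():
        totals[answer] += weights[node]
    norm = sum(totals.values())
    return {answer: w / norm for answer, w in totals.items()}

answers = {1: "No", 2: "No", 3: "No", 4: "Yes"}

# All weights equal to 1: "No" collects 3/4 of the weight.
equal = weighted_vote(answers, {1: 1, 2: 1, 3: 1, 4: 1})

# Weight 97 on node 4: "Yes" collects 97/100 of the weight.
skewed = weighted_vote(answers, {1: 1, 2: 1, 3: 1, 4: 97})

print(equal, skewed)
```

A single heavy edge is enough to flip node 1's answer, which is why the operators work on weighted probabilities rather than on a plain majority count.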

Page 5: Community Detection Algorithm and Community Quality Metric

Label Propagation Operator

$$W \times P$$

where W is the n × n weighted adjacency matrix and P is the n × n label probability distribution matrix, composed of n (1 × n) row vectors $P_i$, one for each node.

Each element $P_i(c)$ holds the current estimate of the probability of node i observing label $c \in C$, where C is the set of labels (here, suppose C = {1, 2, …, n}). Ex. $P_i$ = (0.1, 0.2, …, 0.05, …)

To initialize P, each node is assigned a probability distribution over its incoming edges:

$$P_i(c) = \frac{w_{ic}}{\sum_{k \in Nb(i)} w_{ik}}, \quad \forall c \in C \ \text{s.t.}\ w_{ic} > 0.$$
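The initialization amounts to normalizing each row of W. A minimal sketch follows, assuming W is a dense list of lists and that self-loops are already present when a node should keep its own label; the function name and the 3-node matrix are illustrative, not taken from the slides.

```python
def init_label_distribution(W):
    """Return P with P[i][c] = w_ic / sum_k w_ik (row-normalized weights)."""
    P = []
    for row in W:
        total = sum(row)
        if total == 0:
            raise ValueError("every node needs at least one weighted edge")
        P.append([w / total for w in row])
    return P

# Hypothetical 3-node weighted graph; the diagonal holds assumed self-loops.
W = [[1, 2, 1],
     [2, 1, 0],
     [1, 0, 1]]
P = init_label_distribution(W)

for i, row in enumerate(P, start=1):
    print(f"P{i} =", row)
```

Labels with zero weight get probability 0, so each row of P is nonzero only on the node's own neighborhood.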

Page 6: Community Detection Algorithm and Community Quality Metric

Label Propagation Operator

Each node receives the label probability distributions of its neighbors and computes its new distribution:

$$P'_i(c) = \frac{\sum_{j \in Nb(i)} w_{ij} P_j(c)}{\sum_{k \in Nb(i)} w_{ik}}, \quad \forall c \in C.$$

Before propagation:

P1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)
P2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)
P3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)
P4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)

After propagation:

P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)
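The update of P1 in this example can be checked with a few lines of code. The sketch below assumes that node 1 has unit-weight edges to itself and to nodes 2, 3 and 4 (the choice that reproduces the numbers above); `propagate` and the dict-based layout are illustrative.

```python
def propagate(P, neighbors):
    """One label propagation step.

    P: dict node -> list of label probabilities
    neighbors: dict node -> {neighbor: edge weight}
    Nodes without an entry in `neighbors` keep their distribution.
    """
    new_P = {}
    for i, dist in P.items():
        nb = neighbors.get(i)
        if not nb:
            new_P[i] = list(dist)
            continue
        total = sum(nb.values())
        new_P[i] = [
            sum(w * P[j][c] for j, w in nb.items()) / total
            for c in range(len(dist))
        ]
    return new_P

P = {
    1: [0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0],
    2: [0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0],
    3: [0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0],
    4: [0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25],
}
# Assumed neighborhood of node 1: itself plus nodes 2, 3, 4, all with weight 1.
neighbors = {1: {1: 1, 2: 1, 3: 1, 4: 1}}

new_P1 = propagate(P, neighbors)[1]
print(new_P1)
```

The new distribution is computed from the old P for every node at once, so the result does not depend on the order in which nodes are visited.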

Page 7: Community Detection Algorithm and Community Quality Metric

Inflation Operator

Each element $P_i(c)$ is raised to the in-th power and the row is renormalized:

$$\Gamma_{in} P_i(c) = \frac{P_i(c)^{in}}{\sum_{j \in C} P_i(j)^{in}}$$

It increases the probabilities of labels that already have high probability and decreases those of labels with low probability during label propagation.

Example ($\Gamma_{in}$, in = 2):

P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)

becomes

P1 = (0.471, 0.118, 0.118, 0.118, 0.0294, 0.0294, 0.0294, 0.0294, 0.0294, 0.0294)
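A short sketch of the inflation step as defined by the formula, with the exponent in = 2. The function name `inflate` is illustrative. Because of the normalization in the denominator the output sums to 1; for the P1 above it gives roughly 0.471 for the first label, 0.118 for the next three and 0.0294 for the rest.

```python
def inflate(dist, exponent=2):
    """Raise each probability to `exponent`, then renormalize to sum to 1."""
    powered = [p ** exponent for p in dist]
    total = sum(powered)
    return [p / total for p in powered]

P1 = [0.25, 0.125, 0.125, 0.125] + [0.0625] * 6
inflated = inflate(P1)

print([round(p, 4) for p in inflated])
```

Squaring preserves the ranking of labels but widens the gaps: a 2:1 ratio before inflation becomes 4:1 afterwards.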

Page 8: Community Detection Algorithm and Community Quality Metric

Cutoff Operator

The cutoff operator $\Phi_r$ on P removes labels whose probability is below the threshold $r \in [0,1]$. It is helped by the inflation operator, which decreases the probabilities of low-probability labels during propagation.

$\Phi_r$ efficiently reduces the space complexity from quadratic to linear.

Example ($\Phi_r$, r = 0.1):

P1 = (0.471, 0.118, 0.118, 0.118, 0.0294, 0.0294, 0.0294, 0.0294, 0.0294, 0.0294)

becomes

P1 = (0.471, 0.118, 0.118, 0.118)

With r = 0.1, the average number of labels in each node is less than 3.
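The cutoff is a simple filter. In the sketch below the result is stored as a sparse dict keyed by label, which is one way to obtain the linear space usage mentioned above; the function name, the sparse layout and the sample distribution are assumptions made for illustration.

```python
def cutoff(dist, r=0.1):
    """Keep only labels whose probability is at least r.

    dist: list of probabilities, label c stored at index c - 1
    Returns a sparse dict {label: probability}.
    """
    return {label: p for label, p in enumerate(dist, start=1) if p >= r}

# Hypothetical distribution over five labels.
dist = [0.6, 0.25, 0.08, 0.05, 0.02]
kept = cutoff(dist, r=0.1)

print(kept)
```

The surviving probabilities are left unnormalized here; whether to renormalize after the cutoff is a design choice the slide does not spell out.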

Page 9: Community Detection Algorithm and Community Quality Metric

Conditional Update Operator

At each iteration, it updates a node i only when it is significantly different from its incoming neighbors in terms of labels:

$$\sum_{j \in Nb(i)} \mathrm{isSubset}(C_i^*, C_j^*) \le q k_i,$$

where $C_i^*$ is the set of maximum probability labels at node i at the last step. $\mathrm{isSubset}(s_1, s_2)$ returns 1 if $s_1 \subseteq s_2$ and 0 otherwise. $k_i$ is the node degree and $q \in [0,1]$.

isSubset can be viewed as a measure of similarity between two nodes.
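The update test can be sketched as follows. The helper names and the sparse dict layout are hypothetical, and the node degree $k_i$ is taken to be the number of listed neighbors, which is an assumption for unweighted neighborhoods.

```python
def max_labels(dist):
    """Labels that hold the maximum probability in a sparse distribution."""
    top = max(dist.values())
    return {label for label, p in dist.items() if p == top}

def should_update(i, P, neighbors, q):
    """True when at most q * k_i neighbors contain node i's top labels."""
    c_i = max_labels(P[i])
    hits = sum(1 for j in neighbors[i] if c_i <= max_labels(P[j]))
    return hits <= q * len(neighbors[i])

# Hypothetical label distributions for node 1 and its three neighbors.
P = {
    1: {"a": 0.7, "b": 0.3},
    2: {"a": 0.6, "b": 0.4},   # top labels {a}: contains {a}
    3: {"b": 0.9, "a": 0.1},   # top labels {b}: does not contain {a}
    4: {"a": 0.5, "b": 0.5},   # top labels {a, b}: contains {a}
}
neighbors = {1: [2, 3, 4]}

strict = should_update(1, P, neighbors, q=0.5)   # 2 hits vs. threshold 1.5
loose = should_update(1, P, neighbors, q=0.7)    # 2 hits vs. threshold 2.1
print(strict, loose)
```

A small q freezes nodes that already agree with most of their neighbors, while a larger q lets more nodes keep changing.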

Page 10: Community Detection Algorithm and Community Quality Metric

Effect of Conditional Update Operator

Page 11: Community Detection Algorithm and Community Quality Metric

Running time of LabelRank

O(Tm), where m is the number of edges and T is the number of iterations.

LabelRank is a linear algorithm: for a fixed number of iterations, its running time grows linearly with the number of edges.

Page 12: Community Detection Algorithm and Community Quality Metric

Performance of LabelRank

Page 13: Community Detection Algorithm and Community Quality Metric

LabelRankT

It is LabelRank with one extra conditional update rule, by which only nodes involved in changes are updated. Changes are detected by comparing the neighbors of node i at two consecutive steps, $Nb^{t-1}(i)$ and $Nb^{t}(i)$.
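One way to picture the extra rule is to compare the neighbor sets of every node in two consecutive snapshots and collect the nodes whose sets differ. The sketch below assumes snapshots stored as dicts of neighbor sets; `changed_nodes` is a hypothetical helper, not the authors' implementation.

```python
def changed_nodes(prev_nb, curr_nb):
    """Nodes whose neighbor set differs between two snapshots.

    prev_nb, curr_nb: dict node -> set of neighbors
    Nodes that appear in only one snapshot count as changed.
    """
    nodes = set(prev_nb) | set(curr_nb)
    return {i for i in nodes
            if prev_nb.get(i, set()) != curr_nb.get(i, set())}

# Hypothetical snapshots: node 4 joins and attaches to node 2.
prev_nb = {1: {2, 3}, 2: {1}, 3: {1}}
curr_nb = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}

to_update = changed_nodes(prev_nb, curr_nb)
print(sorted(to_update))
```

Only the nodes in `to_update` would be reprocessed at the new step; the rest keep the labels from the previous snapshot.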

Page 14: Community Detection Algorithm and Community Quality Metric

Two Problems of Modularity Maximization

Problem 1: Modularity maximization splits large communities, that is, it favors small communities.

Problem 2: Resolution limit. Modularity optimization may fail to discover communities smaller than a certain scale, even in cases where communities are unambiguously defined, that is, it favors large communities. This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.

Fortunato et al, 2008; Li et al, 2008; Arenas et al, 2008; Berry et al, 2009; Good et al, 2010; Ronhovde et al, 2010; Fortunato, 2010; Lancichinetti et al, 2011; Traag et al, 2011; Darst et al, 2013.

Page 15: Community Detection Algorithm and Community Quality Metric

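Modularity

Modularity (Q): the fraction of edges falling within communities minus the expected value of that fraction in an equivalent network with edges placed at random.

$$Q = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j},$$

where A is the adjacency matrix, $k_i$ is the degree of node i, |E| is the number of edges, and $\delta_{c_i, c_j} = 1$ if nodes i and j are in the same community and 0 otherwise.

Equivalent definition:

$$Q = \sum_{c_i \in C} \left[ \frac{|E^{in}_{c_i}|}{|E|} - \left( \frac{2|E^{in}_{c_i}| + |E^{out}_{c_i}|}{2|E|} \right)^2 \right],$$

where $|E^{in}_{c_i}|$ is the number of intra edges of community $c_i$ and $|E^{out}_{c_i}|$ is the number of inter edges of community $c_i$.

M. E. J. Newman, 2006.

Newman and Girvan, 2004.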
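The two definitions can be checked against each other numerically. The sketch below implements both for an undirected, unweighted edge list; the function names and the test graph (two triangles joined by one edge) are illustrative choices.

```python
from itertools import product

def modularity_pairwise(edges, community):
    """Q from the node-pair definition (sum over all ordered pairs i, j)."""
    m = len(edges)
    adjacent = set()
    degree = {v: 0 for v in community}
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
        degree[u] += 1
        degree[v] += 1
    total = 0.0
    for i, j in product(community, repeat=2):
        if community[i] == community[j]:
            a_ij = 1 if (i, j) in adjacent else 0
            total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)

def modularity_by_community(edges, community):
    """Q from the per-community definition using intra and inter edge counts."""
    m = len(edges)
    e_in = {c: 0 for c in set(community.values())}
    e_out = {c: 0 for c in e_in}
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_out[cu] += 1
            e_out[cv] += 1
    return sum(e_in[c] / m - ((2 * e_in[c] + e_out[c]) / (2 * m)) ** 2
               for c in e_in)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

q_pair = modularity_pairwise(edges, community)
q_comm = modularity_by_community(edges, community)
print(round(q_pair, 4), round(q_comm, 4))
```

Both functions return 5/14 (about 0.357) for this graph. The per-community form needs only edge counts, so it is the cheaper one to evaluate on large networks.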

Page 16: Community Detection Algorithm and Community Quality Metric

Modularity with Split Penalty

Modularity (Q): the modularity of the community detection result.

$$Q = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j}.$$

Split penalty (SP): the fraction of edges that connect nodes of different communities.

$$SP = \frac{1}{2|E|} \sum_{ij} A_{ij} \left( 1 - \delta_{c_i, c_j} \right).$$

Qs = Q – SP: addresses the tendency of Modularity to favor small communities.

$$Q_s = Q - SP = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j} - \frac{1}{2|E|} \sum_{ij} A_{ij} \left( 1 - \delta_{c_i, c_j} \right).$$
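As a rough illustration, Q, SP and Qs can be computed in one pass over the edges. The sketch assumes an undirected, unweighted edge list; `q_sp_qs` is a hypothetical helper, and the test graph is a complete graph on 8 nodes, first split into two groups of 4 and then kept whole.

```python
from itertools import combinations

def q_sp_qs(edges, community):
    """Return (Q, SP, Qs) for an undirected, unweighted edge list."""
    m = len(edges)
    intra = {}        # community -> number of intra-community edges
    degree_sum = {}   # community -> sum of member degrees (2*E_in + E_out)
    inter = 0         # edges whose endpoints lie in different communities
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            inter += 1
    q = sum(intra.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
            for c in degree_sum)
    sp = inter / m
    return q, sp, q - sp

edges = list(combinations(range(8), 2))              # complete graph, 28 edges
two_groups = {v: 0 if v < 4 else 1 for v in range(8)}
one_group = {v: 0 for v in range(8)}

q2, sp2, qs2 = q_sp_qs(edges, two_groups)
q1, sp1, qs1 = q_sp_qs(edges, one_group)
print(round(q2, 4), round(sp2, 3), round(qs2, 3))
print(q1, sp1, qs1)
```

Splitting the complete graph cuts 16 of its 28 edges, so SP is about 0.571 and Qs drops to about -0.643, while the unsplit graph scores 0 on all three.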

Page 17: Community Detection Algorithm and Community Quality Metric

Qs with Community Density

Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale. Intuitively, put density into Modularity and Split Penalty to solve the resolution limit problem.

$$Q_{ds} = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} d_{c_i} - \frac{k_i k_j}{2|E|} d_{c_i}^2 \right] \delta_{c_i, c_j} - \frac{1}{2|E|} \sum_{ij} A_{ij} d_{c_i, c_j} \left( 1 - \delta_{c_i, c_j} \right),$$

$$d_{c_i} = \frac{|E^{in}_{c_i}|}{|c_i| (|c_i| - 1) / 2}, \qquad d_{c_i, c_j} = \frac{|E_{c_i, c_j}|}{|c_i| |c_j|}.$$

Equivalent definition:

$$Q_{ds} = \sum_{c_i \in C} \left[ \frac{|E^{in}_{c_i}|}{|E|} d_{c_i} - \left( \frac{2|E^{in}_{c_i}| + |E^{out}_{c_i}|}{2|E|} d_{c_i} \right)^2 - \sum_{c_j \in C,\, c_j \neq c_i} \frac{|E_{c_i, c_j}|}{2|E|} d_{c_i, c_j} \right].$$
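A minimal sketch of the per-community form of Qds for an undirected, unweighted edge list. The function name is illustrative, and the density of a single-node community is taken to be 0, which is an assumption because the formula leaves that case undefined. The test graph, two 4-cliques joined by one edge, is also an illustrative choice.

```python
from collections import Counter
from itertools import combinations

def qds(edges, community):
    """Qds from the per-community definition."""
    m = len(edges)
    size = Counter(community.values())
    e_in = Counter()        # community -> intra edges
    e_out = Counter()       # community -> edges leaving the community
    e_between = Counter()   # ordered pair (ci, cj) -> edges between them
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_out[cu] += 1
            e_out[cv] += 1
            e_between[(cu, cv)] += 1
            e_between[(cv, cu)] += 1
    total = 0.0
    for c, n in size.items():
        # Assumed convention: a single-node community has density 0.
        d_c = e_in[c] / (n * (n - 1) / 2) if n > 1 else 0.0
        total += e_in[c] / m * d_c
        total -= ((2 * e_in[c] + e_out[c]) / (2 * m) * d_c) ** 2
    for (ci, cj), e in e_between.items():
        d_pair = e / (size[ci] * size[cj])
        total -= e / (2 * m) * d_pair
    return total

# Two 4-cliques {0..3} and {4..7} joined by the single edge (3, 4).
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
split = {v: v // 4 for v in range(8)}
merged = {v: 0 for v in range(8)}

print(round(qds(edges, split), 4), round(qds(edges, merged), 4))
```

Keeping the two cliques separate scores about 0.418, while merging them scores about 0.249, because the merged community is much sparser than either clique.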

Page 18: Community Detection Algorithm and Community Quality Metric

Example of Two Well-Separated Communities

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.5              0                    0.5           0.5
1 community      0                0                    0             0.245

Page 19: Community Detection Algorithm and Community Quality Metric

Example of Two Weakly Connected Communities

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.357            0.143                0.214         0.339
1 community      0                0                    0             0.25

Page 20: Community Detection Algorithm and Community Quality Metric

Ambiguity between One and Two Communities

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.3              0.2                  0.1           0.263
1 community      0                0                    0             0.249

Page 21: Community Detection Algorithm and Community Quality Metric

Ambiguity between One and Two Communities

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.25             0.25                 0             0.188
1 community      0                0                    0             0.245

Page 22: Community Detection Algorithm and Community Quality Metric

Example of One Well Connected Community

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.167            0.333                -0.167        0.0417
1 community      0                0                    0             0.23

Page 23: Community Detection Algorithm and Community Quality Metric

Example of One Very Well Connected Community

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.0455           0.455                -0.409        -0.239
1 community      0                0                    0             0.168

Page 24: Community Detection Algorithm and Community Quality Metric

Example of One Complete Graph

Community quality on a complete graph with 8 nodes

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    -0.0714          0.571                -0.643        -0.643
1 community      0                0                    0             0

Page 25: Community Detection Algorithm and Community Quality Metric

Modularity Has Nothing to Do with #Nodes

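$$Q(\text{clique}) = Q(\text{tree}) = 2 \times \left[ \frac{12}{26} - \left( \frac{13}{26} \right)^2 \right] = 0.4231;$$

$$Q_s(\text{clique}) = Q_s(\text{tree}) = 2 \times \left[ \frac{12}{26} - \left( \frac{13}{26} \right)^2 - \frac{1}{26} \right] = 0.3462;$$

$$Q_{ds}(\text{clique}) = 2 \times \left[ \frac{12}{26} \times 1 - \left( \frac{13}{26} \times 1 \right)^2 - \frac{1}{26} \times \frac{1}{4 \times 4} \right] = 0.4183;$$

$$Q_{ds}(\text{tree}) = 2 \times \left[ \frac{12}{26} \times \frac{2}{7} - \left( \frac{13}{26} \times \frac{2}{7} \right)^2 - \frac{1}{26} \times \frac{1}{7 \times 7} \right] = 0.2214.$$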
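The arithmetic on this slide can be re-evaluated with exact fractions. The sketch assumes what the numbers imply but the transcript does not state outright: each network has two identical communities with 6 intra edges each, joined by one edge (13 edges in total), built either from 4-node cliques or from 7-node trees. The helper name is hypothetical.

```python
from fractions import Fraction as F

def scores_two_equal_communities(e_in, n, e_between, m):
    """(Q, Qs, Qds) for two identical communities of n nodes each.

    e_in: intra edges per community, e_between: edges joining the two,
    m: total number of edges in the network.
    """
    d_in = F(e_in, n * (n - 1) // 2)           # internal density
    d_between = F(e_between, n * n)            # density between the two
    degree_share = F(2 * e_in + e_between, 2 * m)
    q = 2 * (F(e_in, m) - degree_share ** 2)
    qs = q - F(e_between, m)
    qds = 2 * (F(e_in, m) * d_in
               - (degree_share * d_in) ** 2
               - F(e_between, 2 * m) * d_between)
    return q, qs, qds

clique = scores_two_equal_communities(e_in=6, n=4, e_between=1, m=13)
tree = scores_two_equal_communities(e_in=6, n=7, e_between=1, m=13)

for name, (q, qs, qds) in (("clique", clique), ("tree", tree)):
    print(name, round(float(q), 4), round(float(qs), 4), round(float(qds), 4))
```

Q and Qs come out identical for both networks because they depend only on edge counts; only Qds sees that the trees spread the same 6 edges over 7 nodes instead of 4.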

Page 26: Community Detection Algorithm and Community Quality Metric

5-clique Example

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
30 communities   0.8758           0.09091              0.7848        0.8721
15 communities   0.8879           0.04545              0.8424        0.4305

∆Qs = (0.8424 - 0.7848) = 0.0576 > ∆Q = (0.8879 - 0.8758) = 0.0121
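The figures in this table are consistent with a ring of thirty 5-cliques in which neighboring cliques are joined by a single edge, grouped either one clique per community or two adjacent cliques per community. That layout is an assumption (the slide's figure is not in the transcript), and the helper below is a closed-form sketch for it rather than a general implementation.

```python
def ring_of_cliques_scores(num_cliques=30, clique_size=5, merge=1):
    """(Q, SP, Qs, Qds) when `merge` adjacent cliques form each community.

    Assumes a ring with one edge between neighboring cliques and at least
    three communities, so every community has two distinct neighbors.
    """
    clique_edges = clique_size * (clique_size - 1) // 2
    m = num_cliques * (clique_edges + 1)        # one ring edge per clique
    n_comm = num_cliques // merge
    nodes = merge * clique_size
    e_in = merge * clique_edges + (merge - 1)   # ring edges inside a community
    e_out = 2                                   # one edge to each neighbor
    d_in = e_in / (nodes * (nodes - 1) / 2)
    d_between = 1 / (nodes * nodes)
    degree_share = (2 * e_in + e_out) / (2 * m)
    q = n_comm * (e_in / m - degree_share ** 2)
    sp = n_comm * e_out / (2 * m)
    qds = n_comm * (e_in / m * d_in
                    - (degree_share * d_in) ** 2
                    - 2 * (1 / (2 * m)) * d_between)
    return q, sp, q - sp, qds

for merge in (1, 2):
    q, sp, qs, qds_value = ring_of_cliques_scores(merge=merge)
    print(30 // merge, "communities:",
          round(q, 4), round(sp, 5), round(qs, 4), round(qds_value, 4))
```

Under this assumption Q and Qs both rise when cliques are merged in pairs, whereas Qds falls from about 0.872 to about 0.430, so only Qds prefers one community per clique.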

Page 27: Community Detection Algorithm and Community Quality Metric

Thanks!

Q & A

Page 28: Community Detection Algorithm and Community Quality Metric

Example of Two Weakly Connected Communities

                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.309            0.25                 0.0586        0.264
1 community      -0.00586         0.125                -0.131        0.202