Community Detection via Semi–Synchronous Label Propagation Algorithms

Gennaro Cordasco and Luisa Gargano

Dipartimento di Informatica ed Applicazioni “R.M. Capocelli” Università degli Studi di Salerno, ITALY

Business Applications of Social Network Analysis (BASNA) 2010


Outline

• Complex networks
  – Community structure in networks
• Community detection algorithms
• Label Propagation Algorithms (LPAs)
  – Synchronous vs. asynchronous LPAs
  – Stopping criteria and tie resolution strategies
• Our Proposal: Semi-synchronous LPA
  – Convergence
  – Experimental results
• Conclusion

12/12/2010 Business Applications of Social Network Analysis (BASNA) 2010 2


Complex networks

• Real-world networks are characterized by several properties:

– Small-world phenomenon

– Scale-free degree distribution

– …

– Community structure

Community Structure in networks

The largest connected component of a coauthorship network connecting physicists who have published together on networks. Each node is colored according to community membership.

Enron email corpus at UC Berkeley (2005)

A metabolic network consisting of metabolic genes (blue hexagons) regulated by carbon, light, and nitrogen treatments.

Community Detection Algorithms

• Input: a network G = (V, E)

• Output: a partition of V into communities, each consisting of nodes with similar characteristics

• Two families of methods:
  – Agglomerative
  – Divisive

[Figure: agglomerative vs. divisive methods]


Label Propagation Algorithms (LPAs) [RAK07]

1. Each vertex in the network is assigned a unique label (its own community)

2. An iterative process is performed so that connected groups of vertices reach a consensus on a label; each such group gives rise to a community.

LPAs have been shown (experimentally) to be fast and effective.
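A minimal Python sketch of the two ingredients above, assuming the graph is stored as a dictionary mapping each vertex to a list of its neighbors (a representation chosen here for illustration). In [RAK07] a vertex adopts the label carried by the largest number of its neighbors; the hypothetical helper `best_labels` returns every label attaining that maximum, so a result with more than one label signals a tie. `initial_labels` is likewise a hypothetical name.

```python
from collections import Counter

def initial_labels(adj):
    """Step 1: every vertex starts in its own community, labeled by its own id."""
    return {v: v for v in adj}

def best_labels(v, adj, labels):
    """Labels carried by the largest number of v's neighbors (sorted, so labels
    are assumed to be comparable). More than one label means a tie."""
    counts = Counter(labels[u] for u in adj[v])
    if not counts:                      # isolated vertex: nothing to adopt
        return [labels[v]]
    top = max(counts.values())
    return sorted(l for l, c in counts.items() if c == top)

# Hypothetical example: triangle {1, 2, 3} plus a pendant vertex 4 attached to 3.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
labels = initial_labels(adj)
print(best_labels(3, adj, labels))      # [1, 2, 4]: a three-way tie at the first step
print(best_labels(4, adj, labels))      # [3]: vertex 4 has a single neighbor
```

Note that with unique initial labels, every vertex of degree greater than one starts in a tie, which is why the tie resolution rule matters from the very first step.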


Synchronous LPA

• Easy to parallelize

• Stable

• Labels may oscillate

• Complex stopping criteria are needed


In the synchronous LPA, each vertex computes its label at step i based on the labels of its neighbors at step i − 1.
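A sketch of one synchronous step, under the assumption that the graph is an adjacency dictionary; `synchronous_step` is a hypothetical name, and ties are broken at random purely for illustration. Every new label is computed from the previous labeling and written into a fresh dictionary, which is what makes the step easy to parallelize.

```python
import random
from collections import Counter

def synchronous_step(adj, labels, rng):
    """Step i of synchronous LPA: every new label is computed from the labels of
    step i - 1 (`labels` is only read) and stored in a fresh dict."""
    new_labels = {}
    for v, nbrs in adj.items():
        counts = Counter(labels[u] for u in nbrs)
        if not counts:                  # isolated vertex keeps its label
            new_labels[v] = labels[v]
            continue
        top = max(counts.values())
        # random tie-breaking, chosen here only for illustration
        new_labels[v] = rng.choice([l for l, c in counts.items() if c == top])
    return new_labels

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
old = {v: v for v in adj}
new = synchronous_step(adj, old, random.Random(0))
print(new[4])                           # 3: vertex 4's only neighbor carried label 3
```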


Synchronous LPA: The oscillation phenomenon
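A minimal illustration of oscillation, using our own hypothetical example rather than the one on the slide: on a single edge (the smallest bipartite graph), each endpoint adopts the other's label at every synchronous step, so the two labels swap forever. Ties are broken by taking the highest label (as in LPA-Max) only to keep the run deterministic, and the sketch assumes no isolated vertices.

```python
from collections import Counter

def synchronous_step(adj, labels):
    """Synchronous update with LPA-Max tie-breaking (deterministic);
    assumes every vertex has at least one neighbor."""
    new = {}
    for v, nbrs in adj.items():
        counts = Counter(labels[u] for u in nbrs)
        top = max(counts.values())
        new[v] = max(l for l, c in counts.items() if c == top)
    return new

adj = {1: [2], 2: [1]}          # hypothetical example: a single edge
history = [{1: 1, 2: 2}]
for _ in range(4):
    history.append(synchronous_step(adj, history[-1]))
for h in history:
    print(h)
# prints the two labelings alternately: {1: 1, 2: 2}, {1: 2, 2: 1}, {1: 1, 2: 2}, ...
```

Because the labeling alternates with period 2, a criterion that only compares consecutive labelings never fires here, whereas one that also compares with the labeling two steps back does.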


Asynchronous LPA

• Hard to parallelize: some dependencies need to be considered

• Unstable: different runs may produce different results

• Less effective: some runs may produce a “monster” community


In the asynchronous LPA, at each step, vertex labels are updated sequentially, each update being based on the current labeling.
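A sketch of one asynchronous step, assuming an adjacency dictionary and random tie-breaking; `asynchronous_step` is a hypothetical name. A fresh random permutation fixes the visiting order, and labels are overwritten in place, so each update already sees the ones made earlier in the same step. This in-place dependency is also why the step is hard to parallelize.

```python
import random
from collections import Counter

def asynchronous_step(adj, labels, rng):
    """One asynchronous LPA step. Vertices are visited in a fresh random permutation
    and labels are overwritten in place, so each update sees the current labeling.
    Returns True if at least one label changed."""
    order = list(adj)
    rng.shuffle(order)
    changed = False
    for v in order:
        counts = Counter(labels[u] for u in adj[v])
        if not counts:                  # isolated vertex keeps its label
            continue
        top = max(counts.values())
        new = rng.choice([l for l, c in counts.items() if c == top])  # random ties
        if new != labels[v]:
            labels[v] = new
            changed = True
    return changed

adj = {1: [2], 2: [1]}                  # hypothetical example: a single edge
labels = {1: 1, 2: 2}
asynchronous_step(adj, labels, random.Random(7))
print(labels[1] == labels[2])           # True for any visiting order
```

On a single edge, where the synchronous LPA would swap the two labels forever, one asynchronous step already yields a common label: whichever vertex is visited first copies its neighbor's label, and the second vertex then keeps it.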

Asynchronous LPA

[Figure: asynchronous LPA on an example graph with vertices 1–10. Before each propagation step (Propagation Step 1, Propagation Step 2, …), a random permutation of the vertices is generated, and the vertices are updated one at a time in that order.]

LPA Tie Resolution Strategies

• A tie occurs when two or more labels maximize the sum in the labeling equation

  $l_v = \arg\max_{l} \sum_{u \in N(v)} [l_u = l]$,

  where N(v) is the set of neighbors of v and [l_u = l] is 1 if l_u = l and 0 otherwise. Ties can be resolved as follows:

• LPA-Random (LPA): one of the labels that maximize the sum is chosen at random.

• LPA-Prec: if the current label of v maximizes the sum, v keeps it; otherwise, one of the maximizing labels is chosen at random.

• LPA-Max: each tie is resolved deterministically by taking the label with the highest value.

• LPA-Prec-Max: combines the two rules above: v keeps its current label if it maximizes the sum; otherwise, it takes the highest maximizing label.
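A sketch of the labeling equation combined with the four tie resolution strategies. The rule strings ("random", "prec", "max", "prec-max") and the function `choose_label` are hypothetical names, the graph is assumed to be an adjacency dictionary, and LPA-Max assumes labels are mutually comparable (e.g., integers).

```python
import random
from collections import Counter

RULES = ("random", "prec", "max", "prec-max")  # hypothetical names for the four strategies

def choose_label(v, adj, labels, rule, rng=random):
    """Evaluate the labeling equation at v, resolving ties with `rule`."""
    if rule not in RULES:
        raise ValueError(f"unknown tie rule: {rule!r}")
    counts = Counter(labels[u] for u in adj[v])
    if not counts:                              # isolated vertex keeps its label
        return labels[v]
    top = max(counts.values())
    best = [l for l, c in counts.items() if c == top]
    if len(best) == 1:                          # no tie
        return best[0]
    if rule in ("prec", "prec-max") and labels[v] in best:
        return labels[v]                        # keep the current label
    if rule in ("max", "prec-max"):
        return max(best)                        # highest maximizing label
    return rng.choice(best)                     # LPA-Random, or LPA-Prec fallback

adj = {0: [1, 2], 1: [0], 2: [0]}
labels = {0: 5, 1: 5, 2: 7}        # vertex 0 sees labels 5 and 7 once each: a tie
print(choose_label(0, adj, labels, "prec"),
      choose_label(0, adj, labels, "max"),
      choose_label(0, adj, labels, "prec-max"))  # 5 7 5
```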


LPA Stopping Criteria

• Stopping criteria:

(C0) Compare the current labeling with the previous one; if no label has changed, the algorithm stops. This criterion does not prevent cycles.

(C1) If, in the current labeling, every vertex has a label shared by the maximum number of its neighbors, the algorithm stops. Used by the asynchronous LPA.

(C2) If either l_v(i) = l_v(i − 1) for every v ∈ V, or l_v(i) = l_v(i − 2) for every v ∈ V, the algorithm stops. Used by the synchronous LPA.
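The three criteria can be written as predicates, as a sketch; we assume a hypothetical `history` list holding one labeling dictionary per step (most recent last) and an adjacency dictionary `adj`. The names `c0`, `c1`, `c2` are illustrative.

```python
from collections import Counter

def c0(history):
    """C0: no label changed between the last two labelings."""
    return len(history) >= 2 and history[-1] == history[-2]

def c1(adj, labels):
    """C1: every vertex carries a label shared by the maximum number of its neighbors."""
    for v, nbrs in adj.items():
        counts = Counter(labels[u] for u in nbrs)
        # Counter returns 0 for a label no neighbor carries
        if counts and counts[labels[v]] < max(counts.values()):
            return False
    return True

def c2(history):
    """C2: the labeling at step i equals the one at step i - 1 or at step i - 2."""
    if c0(history):
        return True
    return len(history) >= 3 and history[-1] == history[-3]

adj = {1: [2], 2: [1]}
history = [{1: 1, 2: 2}, {1: 2, 2: 1}, {1: 1, 2: 2}]  # oscillation on one edge
print(c0(history), c2(history), c1(adj, history[-1]))  # False True False
print(c1(adj, {1: 1, 2: 1}))                           # True
```

On the oscillating labeling, C0 never fires while C2 detects the period-2 cycle.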


Our Proposal: Semi–Synchronous LPA

Two phases:

• Coloring phase. Color the network vertices so that no two adjacent vertices share the same color (e.g., by any distributed graph coloring algorithm).

• Propagation phase. Each label propagation step is divided into stages, one per color: at stage c, labels are simultaneously propagated to the vertices that were assigned color c during the coloring phase.

Hence, neighboring vertices are never updated simultaneously.
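A sketch of both phases, with the assumptions stated plainly: the slide allows any distributed coloring algorithm, but for simplicity we substitute a sequential greedy coloring; ties are resolved with the LPA-Prec rule; `greedy_coloring` and `semi_synchronous_step` are hypothetical names. All updates of a stage are computed from the labels as they stand at the start of that stage and then applied together, which is safe because same-colored vertices are never adjacent.

```python
import random
from collections import Counter

def greedy_coloring(adj):
    """Coloring phase (sequential greedy stand-in for a distributed algorithm):
    each vertex takes the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def semi_synchronous_step(adj, labels, color, rng):
    """One propagation step, split into one stage per color (in increasing order)."""
    for c in sorted(set(color.values())):
        stage = [v for v in adj if color[v] == c]
        updates = {}
        for v in stage:  # reads use the labels as they stand at the start of the stage
            counts = Counter(labels[u] for u in adj[v])
            if not counts:
                continue  # isolated vertex keeps its label
            top = max(counts.values())
            best = [l for l, k in counts.items() if k == top]
            # LPA-Prec tie rule: keep the current label when it is among the maximizers
            updates[v] = labels[v] if labels[v] in best else rng.choice(best)
        labels.update(updates)  # the stage's vertices are pairwise non-adjacent
    return labels

adj = {1: [2], 2: [1]}
color = greedy_coloring(adj)                      # {1: 0, 2: 1}
labels = semi_synchronous_step(adj, {1: 1, 2: 2}, color, random.Random(0))
print(labels)                                     # {1: 2, 2: 2}
```

On the single edge, the two endpoints receive different colors, so they update in different stages and reach a common label in one step instead of oscillating.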

Our Proposal: Semi–Synchronous LPA

[Figure: example on a small graph. Coloring phase: the vertices are colored A, B, C. Propagation Step 1 and Propagation Step 2: each step runs stage A, then stage B, then stage C.]

Our Proposal: Convergence

Theorem 1:

Consider a network G = (V, E). Assume that Semi–Synchronous LPA uses the stop criterion (C1), that is, it ends at the first step t such that, for each v ∈ V, one of the following conditions holds:

i) l_v(t) = l_v(t − 1);

ii) l_v(t) ≠ l_v(t − 1), but this change is due to a tie.

Then the algorithm converges, regardless of the tie resolution rule.


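Hint: the number of monochromatic edges (edges whose endpoints share the same label) increases at each propagation step that does not meet the stop criterion. Since this number cannot exceed |E|, the algorithm must stop.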
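A small numerical check of the hint, not a proof: the sketch below runs the semi-synchronous LPA on a randomly generated graph and records the number of monochromatic edges after every stage. The reasoning it probes: a vertex updated at stage c switches to a label held by the maximum number of its neighbors, so its monochromatic incident edges cannot decrease, and since same-colored vertices are not adjacent, no other edge is affected. The greedy coloring, random tie-breaking, the G(n, p) test graph, and all function names are our own assumptions.

```python
import random
from collections import Counter

def monochromatic_edges(adj, labels):
    """Edges {u, v} whose endpoints share a label; each undirected edge is seen twice."""
    return sum(labels[u] == labels[v] for v in adj for u in adj[v]) // 2

def greedy_coloring(adj):
    """Sequential greedy coloring (our stand-in for the coloring phase)."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj) + 1) if c not in used)
    return color

def run_and_track(adj, rng, steps=20):
    """Semi-synchronous LPA with random tie-breaking; returns the
    monochromatic-edge count measured after every stage."""
    labels = {v: v for v in adj}
    color = greedy_coloring(adj)
    trace = [monochromatic_edges(adj, labels)]
    for _ in range(steps):
        for c in sorted(set(color.values())):
            updates = {}
            for v in [w for w in adj if color[w] == c]:
                counts = Counter(labels[u] for u in adj[v])
                if counts:
                    top = max(counts.values())
                    updates[v] = rng.choice([l for l, k in counts.items() if k == top])
            labels.update(updates)
            trace.append(monochromatic_edges(adj, labels))
    return trace

# Hypothetical test graph: G(n=30, p=0.15) with a fixed seed.
rng = random.Random(1)
adj = {v: [] for v in range(30)}
for u in range(30):
    for v in range(u + 1, 30):
        if rng.random() < 0.15:
            adj[u].append(v)
            adj[v].append(u)

trace = run_and_track(adj, rng)
print(all(a <= b for a, b in zip(trace, trace[1:])))  # True: the count never decreases
```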


Our Proposal: Experimental Results

• Analyzed Networks:

1. Zachary’s Karate: Social network of friendships between 34 members of a karate club at a US university in the 1970s [Zac77];

2. Dolphins: Social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand [LSB+03];

3. Football: Network of American football games between Division IA colleges during the regular season, Fall 2000 [GN02];

4. NetScience: Coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006 [New06a];

5. Power: A network representing the topology of the Western States Power Grid of the United States [WS98];

6. Internet: A symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views [New06b];

7. Cond-Mat: Coauthorships between scientists posting preprints between Jan 1, 1995 and June 30, 2003 on the Condensed Matter E-Print Archive [New01].


Our Proposal: Experimental Results

Test settings: each test is identified by a triple (P, N, T), where

– P indicates the algorithm,

– N indicates the network,

– T indicates the tie resolution strategy.

Overall, 56 different test settings were considered.

Test execution:

i. Initial labeling (randomly chosen) and network coloring (if needed)

ii. LPA execution (stop criterion C1)

iii. Community identification

iv. Community structure evaluation (quality and stability)

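Quality of communities: we measure the quality of a network partition using modularity [NG04]. The modularity of a partition C of a graph G = (V, E) is defined as

$Q(C) = \sum_{c \in C} \left[ \frac{|E(c)|}{|E|} - \left( \frac{\sum_{v \in c} d(v)}{2|E|} \right)^2 \right]$,

where E(c) is the set of edges with both endpoints in community c and d(v) is the degree of v.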
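A direct translation of the modularity formula into Python, as a sketch; `modularity` is a hypothetical name, the graph is assumed to be an adjacency dictionary without self-loops and with at least one edge, and `labels` maps each vertex to its community.

```python
from collections import defaultdict

def modularity(adj, labels):
    """Q(C) = sum over communities c of |E(c)|/|E| - (total degree of c / 2|E|)^2.
    Assumes an undirected adjacency dict with no self-loops and at least one edge."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2      # |E|
    internal = defaultdict(float)   # |E(c)|: edges with both endpoints in c
    degree = defaultdict(float)     # total degree of the vertices in c
    for v, nbrs in adj.items():
        degree[labels[v]] += len(nbrs)
        for u in nbrs:
            if labels[u] == labels[v]:
                internal[labels[v]] += 0.5  # each internal edge is seen from both ends
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical example: two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
two = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(round(modularity(adj, two), 4))               # 0.3571 (= 5/14)
print(modularity(adj, {v: 0 for v in adj}))         # 0.0: one community covering everything
```

Each triangle holds 3 of the 7 edges and half of the total degree, so Q = 2 × (3/7 − 1/4) = 5/14; putting every vertex in a single community always gives Q = 0.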


Our Proposal: Experimental Results

[Chart: modularity]


Our Proposal: Experimental Results

[Chart: number of stages]


Our Proposal: Experimental Results

[Chart: modularity standard deviation]


Conclusion

• Semi–synchronous LPA is:
  – effective
  – efficient
  – stable

• Several problems concerning whether LPAs converge, and how fast, are still open.


Thanks for your attention

Gennaro Cordasco

Luisa Gargano