Community Detection via Semi–Synchronous Label Propagation Algorithms
Gennaro Cordasco and Luisa Gargano
Dipartimento di Informatica ed Applicazioni “R.M. Capocelli” Università degli Studi di Salerno, ITALY
Business Applications of Social Network Analysis (BASNA) 2010
Outline
• Complex networks
– Community structure in networks
• Community detection algorithms
• Label Propagation algorithms (LPAs)
– Synchronous vs Asynchronous LPAs
– Stopping Criteria and Tie Resolution strategies
• Our Proposal: Semi-synchronous LPA
– Convergence
– Experimental Results
• Conclusion
12/12/2010 Business Applications of Social Network Analysis (BASNA) 2010 2
Complex networks
• Real-world networks are characterized by several properties:
– Small-world phenomenon
– Scale-free distribution
– …
– Community structure
The largest connected component of a coauthorship network connecting physicists who have published together on networks. Each node is colored according to community membership.
Enron email corpus at UC Berkeley (2005)
A metabolic network consisting of metabolic genes (blue hexagons) regulated by carbon, light, and nitrogen treatments.
Community Structure in networks
Community Detection Algorithms
• Input: A network G=(V,E)
• Output: A partition of V into communities consisting of nodes with similar characteristics.
• Two families of methods:
– Agglomerative
– Divisive
Label Propagation Algorithms (LPAs) [RAK07]
1. Each vertex in the network is assigned a unique label (its own community)
2. An iterative process is performed so that connected groups of vertices are able to reach a consensus on some label giving rise to a community.
LPAs have been shown experimentally to be quick and effective.
Synchronous LPA
• Easy to parallelize
• Stable
• Labels may oscillate
• Complex stopping criteria are needed
In the synchronous LPA, each vertex computes its label at step i based on the labels of its neighbors at step i − 1.
Synchronous LPA: The oscillation phenomenon
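A minimal sketch of the synchronous update and of one way the oscillation can arise. The toy graph (a complete bipartite graph on four vertices), the initial labels, and the function name `synchronous_step` are illustrative assumptions, not taken from the slides. The sketch also assumes every vertex has at least one neighbor.

```python
from collections import Counter

def synchronous_step(adj, labels):
    """One synchronous step: every vertex reads only the labels of the previous step."""
    new_labels = {}
    for v, neighbors in adj.items():
        counts = Counter(labels[u] for u in neighbors)
        best = max(counts.values())
        # Ties are broken by taking the largest label (an LPA-Max style rule),
        # chosen here only to keep the demo deterministic.
        new_labels[v] = max(l for l, c in counts.items() if c == best)
    return new_labels

# Hypothetical toy graph: sides {0, 1} and {2, 3}, every cross pair connected.
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}

history = [labels]
for _ in range(4):
    history.append(synchronous_step(adj, history[-1]))

for step, labeling in enumerate(history):
    print(step, labeling)
```

In this example the two sides swap labels at every step, so the labeling at step i equals the labeling at step i − 2 but never the one at step i − 1.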
Asynchronous LPA
• Hard to parallelize: some dependencies need to be considered
• Unstable: different runs may provide different results
• Less effective: some runs may provide a “monster” community
In the asynchronous LPA, at each step the vertex labels are updated sequentially, each update using the current labeling.
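The asynchronous update can be sketched as follows. The function name `asynchronous_step`, the toy graph (two triangles joined by one edge), the random tie rule, and the cap of 100 steps are assumptions made for illustration. Every vertex is assumed to have at least one neighbor.

```python
import random
from collections import Counter

def asynchronous_step(adj, labels, rng):
    """One asynchronous step: visit vertices in random order, updating labels in place."""
    order = list(adj)
    rng.shuffle(order)  # a fresh random permutation of the vertices for this step
    changed = False
    for v in order:
        # Neighbors already visited in this step contribute their NEW labels.
        counts = Counter(labels[u] for u in adj[v])
        best = max(counts.values())
        candidates = sorted(l for l, c in counts.items() if c == best)
        new_label = rng.choice(candidates)  # random tie resolution
        if new_label != labels[v]:
            labels[v] = new_label
            changed = True
    return changed

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {v: v for v in adj}  # each vertex starts with a unique label
rng = random.Random(0)

for _ in range(100):  # safety cap: random ties may keep changing labels
    if not asynchronous_step(adj, labels, rng):
        break

print(labels)
```

Because both the visiting order and the tie resolution are random, different seeds may yield different partitions, which is the instability noted on the slide.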
Asynchronous LPA
[Figure: worked example of the asynchronous LPA on a network with vertices numbered 1 to 10. Before each propagation step (Propagation Step 1, Propagation Step 2, …), a random permutation of the vertices is generated, and labels are updated in that order.]
LPA Tie Resolution Strategies
• When a tie occurs (i.e., two or more labels maximize the sum in the labeling equation):
• LPA-Random (LPA): one of the labels that maximize the sum is chosen randomly.
• LPA-Prec: if the current label satisfies the labeling equation, then the vertex keeps its current label; otherwise the label is chosen randomly.
• LPA-Max: each tie is resolved deterministically by taking the label with the highest value.
• LPA-Prec-Max: uses both of the rules above.
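Labeling equation: $l_v = \arg\max_{l} \sum_{u \in N(v)} [l_u = l]$, where N(v) is the set of neighbors of v and the bracket equals 1 when its argument is true and 0 otherwise.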
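The four tie resolution strategies can be sketched as a single selection function. The name `choose_label` and the strategy strings are hypothetical. LPA-Prec-Max is read here as "keep the current label if it is maximal, otherwise take the largest maximal label", which is one plausible interpretation of "uses both rules".

```python
import random
from collections import Counter

def choose_label(current, neighbor_labels, strategy, rng=random):
    """Pick a label maximizing the neighbor count, resolving ties by `strategy`.

    strategy is one of "random", "prec", "max", "prec-max" (names assumed here).
    """
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    candidates = sorted(l for l, c in counts.items() if c == best)
    # Prec rule: keep the current label when it is among the maximizers.
    if strategy in ("prec", "prec-max") and current in candidates:
        return current
    # Max rule: deterministic choice of the largest maximizing label.
    if strategy in ("max", "prec-max"):
        return candidates[-1]
    # LPA-Random, and the fallback of LPA-Prec: uniform random choice.
    return rng.choice(candidates)

# Labels 1 and 3 tie with two neighbors each.
print(choose_label(1, [1, 1, 3, 3], "prec"))      # keeps the current label
print(choose_label(1, [1, 1, 3, 3], "max"))       # takes the largest label
print(choose_label(2, [1, 1, 3, 3], "prec-max"))  # current not maximal, so largest
```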
LPA Stopping Criteria
• Stop Criteria
(C0) Compare the current labeling with the previous one; if no label has changed, the algorithm stops.
(C1) If, in the current labeling, every vertex in the network has a label shared by the maximum number of its neighbors, the algorithm stops.
(C2) If either l_v(i) = l_v(i − 1) for each v ∈ V or l_v(i) = l_v(i − 2) for each v ∈ V, the algorithm stops.
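A minimal sketch of the three criteria as predicates. The function names, the dictionary representation of a labeling, and the toy labelings are assumptions made for illustration. Every vertex is assumed to have at least one neighbor.

```python
from collections import Counter

def stop_c0(current, previous):
    """(C0): no label changed between two consecutive labelings."""
    return current == previous

def stop_c1(adj, labels):
    """(C1): every vertex holds a label carried by a maximum number of its neighbors."""
    for v, neighbors in adj.items():
        counts = Counter(labels[u] for u in neighbors)
        # Counter returns 0 for a label that no neighbor carries.
        if counts[labels[v]] < max(counts.values()):
            return False
    return True

def stop_c2(history):
    """(C2): the latest labeling equals the one from one or from two steps earlier."""
    current = history[-1]
    same_as_last = len(history) >= 2 and current == history[-2]
    same_as_two_back = len(history) >= 3 and current == history[-3]
    return same_as_last or same_as_two_back

# Hypothetical period-2 oscillation on a complete bipartite graph with four vertices.
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
first = {0: "a", 1: "a", 2: "b", 3: "b"}
second = {0: "b", 1: "b", 2: "a", 3: "a"}
history = [first, second, first]

print(stop_c0(history[-1], history[-2]))  # False: labels keep changing
print(stop_c2(history))                   # True: the period-2 cycle is detected
print(stop_c1(adj, first))                # False: no vertex agrees with its neighbors
```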
Does not prevent cycles
LPA Asynch
LPA Synch
Our Proposal: Semi–Synchronous LPA
Two Phases:
• Coloring Phase. Color the network vertices so that no two adjacent vertices share the same color (e.g., by any distributed graph coloring algorithm).
• Propagation Phase. Each label propagation step is divided into stages: at stage c, labels are simultaneously propagated to the vertices that were assigned color c during the coloring phase.
Neighbouring nodes are never updated simultaneously
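The two phases can be sketched as below. The sequential greedy coloring is a stand-in for a distributed coloring algorithm, and the function names, the toy graph, and the keep-current-then-largest tie rule are assumptions made for illustration. Every vertex is assumed to have at least one neighbor.

```python
from collections import Counter

def greedy_coloring(adj):
    """Coloring phase: sequential greedy coloring (stand-in for a distributed one)."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def semi_synchronous_step(adj, labels, color):
    """Propagation step: one stage per color, vertices of a stage updated together."""
    labels = dict(labels)
    for c in sorted(set(color.values())):
        updates = {}
        for v in (v for v in adj if color[v] == c):
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [l for l, k in counts.items() if k == best]
            # Keep the current label on a tie, otherwise take the largest maximizer.
            updates[v] = labels[v] if labels[v] in candidates else max(candidates)
        # Same-colored vertices are never adjacent, so applying the updates
        # together gives the same result as any sequential order within the stage.
        labels.update(updates)
    return labels

# Hypothetical toy graph on which the synchronous rule would swap labels forever.
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
color = greedy_coloring(adj)
labels = {0: "a", 1: "a", 2: "b", 3: "b"}

after_one = semi_synchronous_step(adj, labels, color)
after_two = semi_synchronous_step(adj, after_one, color)
print(color, after_one, after_two)
```

On this graph the first stage moves one side to the other side's label, and the second stage then finds nothing to change, so the labeling is stable after a single step.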
[Figure: worked example of the semi-synchronous LPA. After the Coloring Phase, each propagation step (Propagation Step 1, Propagation Step 2) runs through stages A, B, and C in turn.]
Our Proposal: Convergence
Theorem 1:
Consider a network G = (V,E). Assume that Semi–Synchronous LPA uses the stop criterion (C1), that is, it ends at the first step t such that for each v ∈ V one of the following conditions holds:
i) l_v(t) = l_v(t − 1)
ii) l_v(t) ≠ l_v(t − 1), but this change is due to a tie.
Then the algorithm converges, independently of the tie management rule.
Hint: The number of monochromatic edges in the network increases at each propagation step
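The quantity in the hint can be tracked numerically. The sketch below counts monochromatic edges (edges whose endpoints share a label) after each semi-synchronous step on a hypothetical toy graph. The greedy coloring, the tie rule (keep the current label, otherwise the largest maximizer), and all names are assumptions. Under this tie rule a vertex changes label only when that strictly raises the number of neighbors it agrees with, so the count should never decrease.

```python
from collections import Counter

def monochromatic_edges(adj, labels):
    """Number of edges whose two endpoints carry the same label."""
    return sum(1 for v in adj for u in adj[v] if v < u and labels[v] == labels[u])

def greedy_coloring(adj):
    """Sequential greedy coloring, standing in for a distributed algorithm."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

def semi_synchronous_step(adj, labels, color):
    """One propagation step: update the vertices color class by color class."""
    labels = dict(labels)
    for c in sorted(set(color.values())):
        updates = {}
        for v in (v for v in adj if color[v] == c):
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [l for l, k in counts.items() if k == best]
            updates[v] = labels[v] if labels[v] in candidates else max(candidates)
        labels.update(updates)
    return labels

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
color = greedy_coloring(adj)
labels = {v: v for v in adj}  # unique initial labels

counts = [monochromatic_edges(adj, labels)]
for _ in range(2):
    labels = semi_synchronous_step(adj, labels, color)
    counts.append(monochromatic_edges(adj, labels))

print(counts, labels)
```

Here the count goes from 0 to 6 in the first step (every edge except the bridge 2-3) and stays there, at which point no vertex wants to change its label.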
Our Proposal: Experimental Results
• Analyzed Networks:
1. Zachary’s Karate: Social network of friendships between 34 members of a karate club at a US university in the 1970s [Zac77];
2. Dolphins: Social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand [LSB+03];
3. Football: Network of American football games between Division IA colleges during regular season Fall 2000 [GN02];
4. NetScience: Coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006 [New06a];
5. Power: A network representing the topology of the Western States Power Grid of the United States [WS98];
6. Internet: A symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views [New06b];
7. Cond-Mat: Coauthorships between scientists posting preprints between Jan 1, 1995 and June 30, 2003 on the Condensed Matter E-Print Archive [New01].
Our Proposal: Experimental Results
Test Settings:
Each test is identified by a triple (P,N,T), where
P indicates the algorithm,
N indicates the network,
T indicates the tie resolution strategy.
Overall 56 different test settings have been considered.
Test Execution:
i. Initial labeling (randomly chosen) and network coloring (if needed)
ii. LPA execution (stop criterion (C1))
iii. Community identification
iv. Community structure evaluation (quality and stability)
Quality of Communities: We measure the quality of a network partition using the Modularity [NG04]:
The modularity of a given partition C of a graph G=(V,E) is defined as
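In its standard form from [NG04], the modularity of a partition sums, over the communities c, the quantity m_c/m − (d_c/(2m))², where m = |E|, m_c is the number of edges with both endpoints in c, and d_c is the total degree of the vertices in c. The sketch below assumes this standard form; the symbols m_c and d_c, the function name, and the toy graph are not taken from the slides.

```python
def modularity(adj, labels):
    """Standard Newman-Girvan modularity of the partition induced by `labels`.

    adj maps each vertex to its neighbor list (undirected, no self-loops);
    labels maps each vertex to its community.
    """
    m = sum(len(neighbors) for neighbors in adj.values()) / 2  # number of edges
    internal = {}  # m_c: edges inside community c
    degree = {}    # d_c: total degree of community c
    for v, neighbors in adj.items():
        c = labels[v]
        degree[c] = degree.get(c, 0) + len(neighbors)
        # Each internal edge is seen from both endpoints, hence the factor 0.5.
        internal[c] = internal.get(c, 0) + 0.5 * sum(1 for u in neighbors if labels[u] == c)
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
two_triangles = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_block = {v: "x" for v in adj}

print(modularity(adj, two_triangles))
print(modularity(adj, one_block))
```

For the two-triangle partition each community has 3 internal edges and total degree 7 out of m = 7 edges, giving 2 · (3/7 − 1/4) = 5/14; putting all vertices in one community gives 0.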
Our Proposal: Experimental Results
[Charts: Modularity; Number of stages; Modularity standard deviation]
Conclusion
• Semi–synchronous LPA is
• Effective
• Efficient
• Stable
• Several problems related to the convergence of LPAs and their speed are still open.
Thanks for your attention
Gennaro Cordasco
Luisa Gargano