Hao-Shang Ma and Jen-Wei Huang
Knowledge and Information Discovery Lab, Dept. of Electrical Engineering,
National Cheng Kung University
The 7th Workshop on Social Network Mining and Analysis (SNA-KDD'13) joint with the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'13)
CUT: Community Update and Tracking in Dynamic Social Networks
About Me
Jen-Wei Huang (黃仁暐)
Knowledge and Information Discovery Lab
Dept. of Electrical Engineering,
National Cheng Kung University
Email: jwhuang @ mail.ncku.edu.tw
http://kid.ee.ncku.edu.tw
2013/11/22
KID Lab, National Cheng Kung University
Research
Data Mining and Database
◦ Time Series Mining
◦ Social Network Analysis
Multimedia Information Retrieval
Ubiquitous Computing
◦ Mobile Computing
◦ Cloud Computing
Bioinformatics
Outline
Introduction
CUT Algorithm
Experiments
Conclusions
References
Introduction
Social networking websites allow users to establish their own personal communities or social networks based on friend relationships.
http://www.facebook.com/ http://twitter.com/
Introduction
Based on the relationships between users, social networks exhibit a community structure.
Introduction
The detection of communities in a network usually puts network nodes into groups in such a way that nodes in the same group are densely connected to one another.
An objective function is chosen to determine the quality of a community.
Modularity [1] is a measure of the quality of a partition in terms of the number of intra-community and inter-community edges.
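
As a minimal sketch of how modularity scores a partition, the Python function below computes the Newman-Girvan form Q = sum over communities c of (e_c / m - (d_c / 2m)^2) for an undirected, unweighted graph, where m is the number of edges, e_c the number of intra-community edges, and d_c the total degree of community c. The function name `modularity` and the `community_of` mapping are illustrative choices, not something taken from the slides.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label (hypothetical input format).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # e_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

A partition with many intra-community edges and few inter-community edges scores higher; putting every node in a single community gives Q = 0.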
Introduction
Social networks change constantly over time.
We want to quickly and efficiently identify the community structures of a network at every timestamp.
We update the community structure by tracking previously known information instead of recalculating the relationships of all nodes and edges in the network.
Introduction
In this work, we define the seed of a community as a collection of 3-cliques that are connected to one another through shared edges.
By tracking the seeds of communities, we are able to efficiently update and track the dynamics of communities in a social network.
CUT Algorithm
We propose the CUT algorithm, standing for Community Update and Tracking algorithm, to update and track seeds of communities.
There are two phases in the CUT algorithm.
◦ Initial phase, executed only once.
    Find seeds of communities
    Extend seeds of communities to communities
◦ Update and Tracking phase
    Maintain and update the CAB graph
Find Seed of Communities
1. Find all 3-cliques in a network
2. Build the CAB (Clique Adjacent Bipartite) graph
3. Determine the seeds of communities in a network
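
The first two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the slides do not give an implementation, so the function names `find_3_cliques` and `build_cab`, the representation of the bipartite graph as two dictionaries (clique to edges, edge to cliques), and the use of sortable node labels such as integers are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations

def find_3_cliques(edges):
    """Return all 3-cliques (triangles) as sorted node tuples."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    cliques = set()
    for u, v in edges:
        if u == v:
            continue
        # Every common neighbour w of u and v closes a triangle (u, v, w).
        for w in adj[u] & adj[v]:
            cliques.add(tuple(sorted((u, v, w))))
    return sorted(cliques)

def build_cab(cliques):
    """Assumed CAB layout: one side holds cliques, the other side holds edges.

    Each 3-clique is linked to exactly its three edges.
    """
    clique_to_edges = {}
    edge_to_cliques = defaultdict(set)
    for c in cliques:
        clique_edges = list(combinations(c, 2))  # three (a, b) pairs with a < b
        clique_to_edges[c] = clique_edges
        for e in clique_edges:
            edge_to_cliques[e].add(c)
    return clique_to_edges, dict(edge_to_cliques)
```

Two cliques are adjacent in this structure exactly when they appear under the same edge key, which is what the seed-determination step relies on.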
Determine Seed of Community
A DFS-like algorithm finds the connected components.
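
A possible reading of this step, sketched in Python: treat two 3-cliques as neighbours when they share an edge, and collect connected components with an iterative depth-first search. The name `find_seeds` is hypothetical, and treating an isolated 3-clique as a seed of its own is an assumption that the slides do not confirm.

```python
from collections import defaultdict
from itertools import combinations

def find_seeds(cliques):
    """Group 3-cliques (hashable node tuples) into connected components via shared edges."""
    edge_to_cliques = defaultdict(list)
    for c in cliques:
        for e in combinations(sorted(c), 2):
            edge_to_cliques[e].append(c)

    seeds, seen = [], set()
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative DFS over the clique-edge bipartite structure
            c = stack.pop()
            component.append(c)
            for e in combinations(sorted(c), 2):
                for neighbour in edge_to_cliques[e]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        seeds.append(component)  # assumption: a lone clique forms its own seed
    return seeds
```

Each clique is pushed once and inspects its three edges once, which is in line with the linear cost in the number of cliques that the slides state for this step.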
CAB Graph
The complexity of tracking the CAB graph is lower than that of tracking the original graph, since each 3-clique links to only three edges.
◦ Complexity of building the CAB graph is O(3|C|) = O(|C|)
◦ Complexity of determining the connected components is O(3|C|) = O(|C|)
Easy to combine or split the seeds of communities
Extend to Communities
Ignore the sparse nodes whose degree is smaller than 2.
Assign the remaining nodes to the closest seed of community.
◦ Closest: the seed of community that has the most links to the node
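
The assignment rule could be sketched in Python as below. Several details are assumptions rather than something the slides specify: the function name `extend_to_communities`, representing each seed as a set of nodes keyed by a sortable seed id, breaking ties in favour of the smallest seed id, assuming a node belongs to at most one seed, and leaving a node unassigned when it has no link to any seed.

```python
from collections import defaultdict

def extend_to_communities(edges, seeds):
    """Assign non-seed nodes to the seed they have the most links to.

    edges: list of undirected (u, v) pairs.
    seeds: dict seed_id -> set of nodes (hypothetical input format).
    Returns a dict node -> seed_id.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Assumption: each node appears in at most one seed.
    membership = {n: sid for sid, nodes in seeds.items() for n in nodes}
    assigned = dict(membership)
    for node, neighbours in adj.items():
        if node in membership or len(neighbours) < 2:
            continue  # already in a seed, or a sparse node (degree < 2)
        links = defaultdict(int)
        for nb in neighbours:
            if nb in membership:
                links[membership[nb]] += 1
        if links:
            # Assumption: ties go to the smallest seed id.
            assigned[node] = max(sorted(links), key=links.get)
    return assigned
```

This version counts links to seed members only, in a single pass; counting links to nodes that were assigned earlier in the same pass would be a different design with order-dependent results.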
Update and Tracking Phase
Maintain and Update CAB Graph
◦ If there are changes in the network, handle the following cases
    Case 1: New nodes and new edges are added
    Case 2: Old nodes and edges are removed
Extend to Communities
Case 1: Merge and Join
New nodes: 20, 21
New edges: (2,8), (5,20), (9,20), (11,21)
New 3-cliques: (2,6,8) and (5,9,20)
Case 1: Merge and Join
For each new 3-clique Ck:
◦ If any two edges of Ck link to different seeds of communities, Si and Sj, we Merge(Si, Sj)
◦ Else if any edge of Ck links to any Si, then we Join(Si, Ck)
Complexity is O(3|new C|) = O(|new C|)
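
One way to realise Merge and Join incrementally is a union-find structure over cliques, sketched below in Python. This is a swapped-in technique for illustration, not necessarily how the authors implement it. The class name `SeedTracker`, its methods, and the choice to start a new seed when a new clique touches no existing seed are all assumptions. The clique tuples in the usage example are made-up data, not the graph from the slides.

```python
from collections import defaultdict
from itertools import combinations

class SeedTracker:
    """Tracks seeds as disjoint sets of 3-cliques (hypothetical helper)."""

    def __init__(self):
        self.parent = {}       # union-find parent pointer for each clique
        self.edge_to_rep = {}  # edge -> one clique known to contain that edge

    def find(self, c):
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]  # path halving
            c = self.parent[c]
        return c

    def add_clique(self, clique):
        ck = tuple(sorted(clique))
        if ck in self.parent:
            return self.find(ck)
        self.parent[ck] = ck  # assumption: an untouched clique starts a new seed
        for e in combinations(ck, 2):  # only three edge lookups per new clique
            known = self.edge_to_rep.get(e)
            if known is None:
                self.edge_to_rep[e] = ck
            else:
                # One seed reached: Join. Different seeds reached via
                # different edges: the repeated unions Merge them.
                self.parent[self.find(known)] = self.find(ck)
        return self.find(ck)

    def seeds(self):
        groups = defaultdict(set)
        for c in self.parent:
            groups[self.find(c)].add(c)
        return list(groups.values())

tracker = SeedTracker()
tracker.add_clique((1, 2, 3))
tracker.add_clique((3, 4, 5))  # shares only node 3, so it is a separate seed
tracker.add_clique((2, 3, 4))  # shares (2,3) and (3,4): merges both seeds
```

Because all cliques that contain the same edge already belong to the same seed, keeping a single representative clique per edge is enough, which keeps the work per new clique at three lookups.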
Case 2: Split and Removal
If nodes are removed, we find all edges that connect to the removed nodes.
Node 10 is removed.
Therefore, edges (4,10), (6,10), (8,10), (10,12), and (10,11) are removed.
Node Removed Case - Split
Remove the corresponding edges and cliques.
Run the FindSeedofCommunity algorithm again to update to the new seeds of communities.
Complexity is O(3|C| + |removed C|)
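
The removal case can be sketched in Python as: drop every 3-clique that contains a removed node, then recompute the seeds on the surviving cliques. The function `find_seed_of_community` below is a hypothetical stand-in that mirrors the slide's FindSeedofCommunity by name only; its body is an assumed connected-components search over cliques that share an edge. The clique list in the usage example is made-up data.

```python
from collections import defaultdict
from itertools import combinations

def find_seed_of_community(cliques):
    """Assumed stand-in: connected components of 3-cliques linked by shared edges."""
    cliques = [tuple(sorted(c)) for c in cliques]
    edge_to_cliques = defaultdict(list)
    for c in cliques:
        for e in combinations(c, 2):
            edge_to_cliques[e].append(c)
    seeds, seen = [], set()
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, seed = [start], set()
        while stack:
            c = stack.pop()
            seed.add(c)
            for e in combinations(c, 2):
                for neighbour in edge_to_cliques[e]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        seeds.append(seed)
    return seeds

def remove_nodes_and_update(cliques, removed_nodes):
    """Drop cliques touching a removed node, then recompute the seeds."""
    removed = set(removed_nodes)
    surviving = [c for c in cliques if removed.isdisjoint(c)]
    return find_seed_of_community(surviving)

# Hypothetical example: node 10 holds a chain of cliques together.
cliques = [(1, 2, 3), (2, 3, 10), (3, 4, 10), (4, 5, 10), (4, 5, 6)]
before = find_seed_of_community(cliques)
after = remove_nodes_and_update(cliques, [10])
```

In this example the five cliques form a single seed before the removal, and removing node 10 splits it into two seeds.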
Joint Case
New nodes are added and edges are removed at the same time.
Joint Case
We simply handle Case 1 first and then Case 2, which reduces unnecessary splits.
Finally, we extend the seeds of communities to communities.
Related Works - Update the Community Structure
Nam P. Nguyen et al. propose the QCA algorithm. [6]
◦ The QCA algorithm uses the already known community structure and deals with the changing cases (new nodes, new edges, removed nodes, and removed edges) based on modularity.
◦ The QCA algorithm keeps the whole community structure at each timestamp.
◦ It uses the original CPM in every removal case, which costs a lot of time.
◦ It has to identify which type of case each changed node or edge belongs to, which costs much time as well.
Experiments
Coauthor network (2002~2010)
◦ 1. About 20000 authors in one network
◦ 2. Densely connected graph
◦ 3. Five years as a time period; t1 is 2002-2006 (first update)
◦ 4. Variations of the network at each timestamp are small
Experiments
p2p-Gnutella network
◦ 1. t1-t4 are snapshots from August 4 to 7, 2002, with about 6000 nodes
◦ 2. Sparsely connected graph
◦ 3. Variations of the network at each timestamp are large
Conclusions
We design the CUT algorithm for updating community structures in dynamic social networks instead of recalculating the relationships of all nodes and edges in the social network.
Keeping seeds of communities in memory at each timestamp is more efficient than keeping all communities.
Using the Clique Adjacent Bipartite (CAB) graph to update and track seeds of communities leads to lower complexity.
References
1. M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E 69, 2004.
2. Bowen Yan and Steve Gregory, “Detecting Communities in Networks by Merging Cliques,” ICIS, 2009.
3. A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E 70, 066111, 2004.
4. Zhengzhang Chen, Kevin A. Wilson, Ye Jin, William Hendrix, and Nagiza F. Samatova, “Detecting and Tracking Community Dynamics in Evolutionary Networks,” ICDMW, 2010.
References
5. Yi Wang, Bin Wu, and Xin Pei, “CommTracker: A Core-Based Algorithm of Tracking Community Evolution,” ADMA, 2008.
6. Nam P. Nguyen, Thang N. Dinh, Ying Xuan, and My T. Thai, “Adaptive Algorithms for Detecting Community Structure in Dynamic Social Networks,” INFOCOM, 2011.
7. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre, “Fast unfolding of communities in large networks,” JSTAT, 2008.
8. Nan Du, Bin Wu, Xin Pei, Bai Wang, and Liutong Xu, “Community Detection in Large-Scale Social Networks,” SNA-KDD, 2007.