ELECTRONIC MEDIA LAB

A Community-Based Model of Online Social Networks

Leendert Botha and Steve Kroon

lwbotha@ml.sun.ac.za, [email protected]

1. Problem Statement

An accurate random graph model for social networks (SNs) can help provide:

• insight into how SNs grow;

• a basis for SN analysis without violating privacy; and

• a test bed for algorithms and novel data structures.

2. Solution

We propose a model for generating SNs. Our community-based model simulates the growth of SNs over time, focusing on reproducing distinctive properties of SNs, including:

• low average separation;

• high level of clustering; and

• a power-law degree distribution.

3. Our approach

We propose a community-based approach, first modeling the community structure and then translating that model into a SN. A major advantage of this approach is that it is intuitive, with an obvious correspondence to real-world behavior, where people meet new friends through the communities they belong to.

[Figure: example community structure with communities A, B and C and users 1 to 11, drawn as a bipartite graph with a "Communities" layer and a "Users" layer.]

4. Model definition

We develop a bipartite graph B representing a community structure as follows:

1. Community nodes, user nodes and connections are added at different rates.

2. With users we associate activity values, with communities density values, and with connections commitment values.

3. The mechanism for creating connections is as follows:

• A user node u_j is chosen preferentially based on activity.

• A community c is selected preferentially based on the commitments of u_j.

• A community c_k is selected from the set of communities that u_j is not a member of, using preferential attachment (PA) based on the overlap between c and these communities. (The overlap θ(c, c_k) is the number of mutual members of c and c_k.)

• The user node u_j is connected to the community node c_k.
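The connection mechanism in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, the `+ 1` smoothing on the overlap weights and the initial commitment of 1.0 are all hypothetical choices, and the poster does not say what happens when the chosen user has no community yet (here the step is simply skipped).

```python
import random


def weighted_choice(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    return rng.choices(items, weights=weights, k=1)[0]


def add_connection(activity, commitment, members, rng):
    """One connection step of the bipartite model (illustrative sketch).

    activity:   {user: activity value}
    commitment: {(user, community): commitment value}
    members:    {community: set of users}
    Returns (user, new_community), or None if no valid step exists.
    """
    users = list(activity)
    # Choose the user u_j preferentially by activity.
    u = weighted_choice(users, [activity[x] for x in users], rng)

    joined = [c for c in members if u in members[c]]
    candidates = [c for c in members if u not in members[c]]
    if not joined or not candidates:
        return None  # assumption: skip the step in these corner cases

    # Reference community c, chosen in proportion to u's commitments.
    c = weighted_choice(joined, [commitment[(u, x)] for x in joined], rng)

    # Overlap theta(c, c_k) = number of mutual members. The +1 smoothing is an
    # assumption so that communities with zero overlap can still be chosen.
    weights = [len(members[c] & members[k]) + 1 for k in candidates]
    c_k = weighted_choice(candidates, weights, rng)

    members[c_k].add(u)
    commitment[(u, c_k)] = 1.0  # assumed initial commitment value
    return u, c_k
```

Passing a seeded `random.Random` instance as `rng` keeps a simulation run reproducible.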

5. Building the Social Network

Whenever a user node u_j is connected to a community node c_k in B, u_j is connected in the SN to each member u_i of c_k with probability

\[ f(\delta_{ik}, \delta_{jk}, d_k) \propto \exp\left[-\left(\frac{1}{\delta_{ik}} + \frac{1}{\delta_{jk}} + \frac{1}{d_k}\right)\right] \]

The final probability that two users u_i and u_j will be connected in the completed SN is given by:

\[ P(e_{ij}) = \sum_{k=1}^{r} \left[ f(u_i, u_j, c_k) \cdot \prod_{l=1}^{k-1} \bigl(1 - f(u_i, u_j, c_l)\bigr) \right] \]

with the sum over their r mutual communities.
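As a worked illustration of the two formulas, the following Python sketch evaluates the per-community link probability and accumulates the final edge probability. It rests on assumptions: δ_ik and δ_jk are read as the commitments of u_i and u_j to c_k and d_k as the density of c_k, and the unspecified proportionality constant is exposed as a hypothetical `scale` parameter defaulting to 1. The function names are illustrative.

```python
import math


def link_probability(delta_ik, delta_jk, d_k, scale=1.0):
    """f proportional to exp[-(1/delta_ik + 1/delta_jk + 1/d_k)].

    `scale` stands in for the unspecified proportionality constant
    (assumed to be 1 here).
    """
    return scale * math.exp(-(1.0 / delta_ik + 1.0 / delta_jk + 1.0 / d_k))


def edge_probability(fs):
    """P(e_ij) for per-community probabilities fs = [f_1, ..., f_r]."""
    total = 0.0
    none_so_far = 1.0  # product of (1 - f_l) over all l < k
    for f in fs:
        total += f * none_so_far  # first link is created in community k
        none_so_far *= 1.0 - f
    return total


# Two mutual communities with made-up commitment and density values.
fs = [link_probability(2.0, 1.0, 0.5), link_probability(1.0, 1.0, 1.0)]
p_edge = edge_probability(fs)
```

The sum telescopes to 1 − Π_k (1 − f(u_i, u_j, c_k)), the probability that at least one of the mutual communities produces the link, so the result does not depend on the order in which the communities are listed.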

7. Current Work

Shrinking diameters and densification power laws: In a real-world SN, the connections grow super-linearly in the number of users and the diameter shrinks over time. We are currently investigating if and when our model replicates this behavior.
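One simple way to check for super-linear growth is to fit a straight line to log(connections) against log(users) over successive snapshots; a slope above 1 indicates densification. The sketch below is an assumption about how such a check could be done, not the procedure used for the poster, and the snapshot numbers are synthetic.

```python
import math


def densification_exponent(nodes, edges):
    """Least-squares slope of log(edges) against log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic snapshots in which edges grow as nodes ** 1.5.
snapshot_nodes = [100, 200, 400, 800]
snapshot_edges = [n ** 1.5 for n in snapshot_nodes]
exponent = densification_exponent(snapshot_nodes, snapshot_edges)
```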

Parameter estimation: We are currently implementing an automated parameter estimation technique based on simulated annealing.
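For readers unfamiliar with simulated annealing, a generic minimal version is sketched below. It is not the authors' estimation procedure: the geometric cooling schedule, the parameter names and the toy quadratic cost are all assumptions. In a real fit, the cost would presumably measure the mismatch between statistics of a generated network and those of the data.

```python
import math
import random


def simulated_annealing(cost, start, step, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimise `cost` from `start`, proposing moves with `step(params, rng)`."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(n_iter):
        candidate = step(current, rng)
        c = cost(candidate)
        # Always accept improvements; accept worse moves with probability
        # exp(-(increase in cost) / temperature).
        if c <= current_cost or rng.random() < math.exp((current_cost - c) / t):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = candidate, c
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


# Toy stand-in for a model-fit error: squared distance to a made-up target.
def toy_cost(params):
    return (params[0] - 0.3) ** 2 + (params[1] - 2.0) ** 2


def toy_step(params, rng):
    return (params[0] + rng.gauss(0.0, 0.1), params[1] + rng.gauss(0.0, 0.1))


best_params, best_cost = simulated_annealing(toy_cost, (0.0, 0.0), toy_step)
```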

6. Results

[Figure, left: clustering coefficient (CC, log scale) against number of nodes n, for FN, PA, GL and Our Model.]

[Figure, centre: count of shortest paths (log scale) against path distance, for FN, PA, GL and Our Model.]

[Figure, right: power-law parameter α against number of nodes n, for FN, PA, GL and Our Model.]

Clustering coefficient: The leftmost figure shows the evolution of the clustering coefficients of the true data (FN) and the fitted models (PA [1], GL [2] and Our Model). Our model provides the best fit and is the only model to capture the initial growth period of the network in which the clustering increases with network size.
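The poster does not state which definition of the clustering coefficient is plotted. As one common choice, the sketch below computes the average local clustering coefficient of an undirected graph stored as an adjacency dictionary; the function name and data layout are illustrative assumptions.

```python
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj: {node: set of neighbours}. Nodes with fewer than two neighbours
    contribute 0, which is one common convention.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# A triangle (1, 2, 3) with one pendant node attached to node 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cc = average_clustering(example)
```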

Average Separation: The centre figure shows a histogram of the shortest path lengths. Our model matches the histogram noticeably better than the other two models, both of which overproduce shorter paths and fail to produce paths of longer length.
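A histogram of this kind can be obtained by running a breadth-first search from every node. The sketch below counts each unordered pair of nodes once and omits unreachable pairs; whether the poster counts ordered or unordered pairs is not stated, so that choice is an assumption, as are the function name and data layout.

```python
from collections import Counter, deque


def path_length_histogram(adj):
    """Count unordered node pairs by shortest-path length (BFS from every node)."""
    hist = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for d in dist.values():
            if d > 0:
                hist[d] += 1
    # Each unordered pair was counted once from each of its endpoints.
    return {d: c // 2 for d, c in sorted(hist.items())}


# A path graph 1 - 2 - 3 - 4.
path_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
histogram = path_length_histogram(path_graph)
```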

Degree distribution: The rightmost figure shows the evolution of the power-law parameters of the degree distribution. The PA model provides a poor fit, whereas our model and the GL model yield a good fit at the end of the simulation. Our model is the only one to show the same downward trend in α, although α decreases more rapidly than in the true data.
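How α was fitted is not described on the poster. One widely used estimator for integer-valued data is the approximate maximum-likelihood formula α ≈ 1 + n / Σ ln(d_i / (d_min − 0.5)), taken over the n degrees with d_i ≥ d_min. The sketch below applies it; the function name and the default cut-off d_min = 1 are assumptions, and the approximation is known to be more accurate for larger cut-offs.

```python
import math


def power_law_alpha(degrees, d_min=1):
    """Approximate maximum-likelihood exponent for degrees >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    return 1.0 + len(tail) / sum(math.log(d / (d_min - 0.5)) for d in tail)


# Made-up degree sequence; isolated nodes (degree 0) fall below the cut-off.
alpha = power_law_alpha([0, 1, 1, 1, 2, 2, 3, 5, 8])
```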

References

[1] A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509, 1999.

[2] J. Guillaume and M. Latapy. Bipartite graphs as models of complex networks. Physica A: Statistical Mechanics and its Applications, 371(2):795–813, 2006.

ACKNOWLEDGEMENTS:

• MIH for supporting the research project.

• Brian Amberg for the poster style sheet: http://www.brian-amberg.de/uni/poster/.
