Introduction | Proposal | Experimental Evaluation | Conclusion
Link Prediction in Online Social Networks Using Group Information
Jorge Valverde-Rebaza and Alneu de Andrade Lopes
Laboratory of Computational Intelligence (LABIC), University of São Paulo (USP)
Brazil
July 2014
Outline
1 Introduction
2 Proposal
3 Experimental Evaluation
4 Conclusion
Jorge Valverde-Rebaza and Alneu de Andrade Lopes Link Prediction in OSN using group information 2
Social Networks
Structure made up of a set of actors (individuals or organizations) and social relations between them
Social network analysis is an interesting research field in graph and complex network theory, data mining, machine learning and other areas
Rise of online social networks
Groups Detection
Real networks are characterized by a high concentration of links within special groups of vertices and a low concentration of links between these groups
Online social networks offer a wide variety of possible groups: families, working and friendship circles, artistic or academic preferences, towns, nations, etc.
Link Prediction Process
Link Prediction Measures
Based on global information
  Higher accuracy
  Very time-consuming computation
  Usually infeasible for large-scale networks
  E.g.: Katz index, Hitting time index, SimRank, etc. [Lü and Zhou, 2011]
Based on local information
  Lower accuracy than measures based on global information
  Faster computation
  E.g.: Common Neighbors (CN), Adamic-Adar (AA), Jaccard (Jac), Resource Allocation (RA), Preferential Attachment (PA), etc. [Lü and Zhou, 2011]
Hybrid strategy based on community information
  As the community structure grows, the accuracy of these measures drastically improves
  They perform better than most measures based on local information
  E.g.: PFF [Zheleva et al., 2010], CN1, RA1 [Soundarajan and Hopcroft, 2012], WIC, W-measures [Valverde-Rebaza and Lopes, 2012], etc.
  Limitation: in these measures a node belongs to just one group
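The five local measures named above can be sketched in a few lines. This is a minimal illustration using the standard definitions surveyed in [Lü and Zhou, 2011], not code from the presentation; the `adj` dictionary, the function name and the toy graph are assumptions made for the example.

```python
import math

def local_scores(adj, x, y):
    """Local similarity scores for the node pair (x, y).

    adj is a hypothetical dict mapping each node to the set of its
    neighbors in an undirected graph (symmetric entries, x != y).
    """
    gx, gy = adj[x], adj[y]
    common = gx & gy
    union = gx | gy
    return {
        "CN": len(common),
        # each common neighbor z is linked to both x and y, so its
        # degree is at least 2 and log(degree) is positive
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common),
        "Jac": len(common) / len(union) if union else 0.0,
        "RA": sum(1.0 / len(adj[z]) for z in common),
        "PA": len(gx) * len(gy),
    }

# Toy graph with links a-c, a-d, b-c, b-d, b-e, d-e
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
print(local_scores(adj, "a", "b"))
```

All five scores only look at the neighborhoods of x and y, which is why they are cheap compared with global indices.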
Preliminary
We consider that each node can participate in multiple groups
In the network $G(V, E)$ there exist $M > 1$ groups identified by different group labels $g_1, g_2, \ldots, g_M$
Each node $x$ belongs to a set of node groups $G = \{g_a, g_b, \ldots, g_p\}$ with size $P > 0$ and $P \le M$
The set of neighbors of a vertex $x$ is $\Gamma(x) = \{y \mid (x, y) \in E\}$
The set of all common neighbors (CN) of a vertex pair $(x, y)$ is $\Lambda_{x,y} = \Gamma(x) \cap \Gamma(y)$
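A minimal sketch of these definitions, assuming undirected links stored as pairs and group memberships stored as a dictionary of sets. The names `edges`, `groups`, `neighbors` and `common_neighbors` are hypothetical and chosen only for this example.

```python
def neighbors(edges, x):
    """Gamma(x): nodes linked to x, treating each link as undirected (an assumption)."""
    return {v if u == x else u for u, v in edges if x in (u, v)}

def common_neighbors(edges, x, y):
    """Lambda_{x,y}: intersection of Gamma(x) and Gamma(y)."""
    return neighbors(edges, x) & neighbors(edges, y)

# Toy network: z1 and z2 are linked to both x and y
edges = [("x", "z1"), ("y", "z1"), ("x", "z2"), ("z2", "y"),
         ("x", "z3"), ("y", "z4")]

# Overlapping group memberships: every node has at least one group label
groups = {
    "x": {"g1", "g2"}, "y": {"g2"},
    "z1": {"g2"}, "z2": {"g3"}, "z3": {"g1"}, "z4": {"g3"},
}
M = len(set().union(*groups.values()))  # number of distinct group labels

print(common_neighbors(edges, "x", "y"))
```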
CN Within and Outside of Common Groups (WOCG)
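Considering $G_{\alpha,\beta} = G_\alpha \cap G_\beta$, where $G_\alpha$ and $G_\beta$ are the sets of groups of $x$ and $y$, and $G_\gamma$ is the set of groups of a common neighbor $z$
We redefine the set of CN as $\Lambda_{x,y} = \Lambda^{WCG}_{x,y} \cup \Lambda^{OCG}_{x,y}$
$\Lambda^{WCG}_{x,y} = \{ z^{G_\gamma} \in \Lambda_{x,y} \mid G_{\alpha,\beta} \cap G_\gamma \neq \emptyset \}$ - the set of common neighbors within common groups (WCG)
$\Lambda^{OCG}_{x,y} = \Lambda_{x,y} - \Lambda^{WCG}_{x,y}$ - the set of common neighbors outside of the common groups (OCG)
Our final score, called the common neighbors within and outside of common groups (WOCG) measure, is defined as:
$s^{WOCG}_{x,y} = \frac{|\Lambda^{WCG}_{x,y}|}{|\Lambda^{OCG}_{x,y}|}$ (1)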
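Equation (1) can be sketched as follows. The data layout (`adj`, `groups`) and the function name are assumptions for illustration, and the handling of an empty OCG set is also an assumption, since the slides do not say what happens when the denominator is zero.

```python
import math

def wocg_score(adj, groups, x, y):
    """WOCG: common neighbors within common groups over those outside them."""
    common = adj[x] & adj[y]
    shared = groups.get(x, set()) & groups.get(y, set())   # G_{alpha,beta}
    wcg = {z for z in common if groups.get(z, set()) & shared}
    ocg = common - wcg
    if not ocg:
        # Assumption (not stated in the slides): avoid dividing by zero
        return math.inf if wcg else 0.0
    return len(wcg) / len(ocg)

# Hypothetical toy data: x and y share only group g2
adj = {"x": {"z1", "z2", "z3"}, "y": {"z1", "z2", "z3"}}
groups = {
    "x": {"g1", "g2"}, "y": {"g2", "g3"},
    "z1": {"g2"},        # within the common group
    "z2": {"g1"},        # shares a group with x only, so outside
    "z3": {"g2", "g5"},  # within the common group
}
print(wocg_score(adj, groups, "x", "y"))
```

Here two common neighbors fall within the common group and one falls outside, so the score is 2/1.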
Common Neighbors of Groups (CNG)
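We define the set of common neighbors of groups as $\Lambda^{G}_{x,y} = \{ z^{G_\gamma} \in \Lambda_{x,y} \mid G_\alpha \cap G_\gamma \neq \emptyset \lor G_\beta \cap G_\gamma \neq \emptyset \}$
Our final score, called common neighbors of groups (CNG), is defined as:
$s^{CNG}_{x,y} = |\Lambda^{G}_{x,y}|$ (2)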
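A minimal sketch of Equation (2), under the same assumed data layout as before (a neighbor dictionary `adj` and a membership dictionary `groups`, both hypothetical names).

```python
def cng_score(adj, groups, x, y):
    """CNG: number of common neighbors sharing a group with x or with y."""
    gx = groups.get(x, set())
    gy = groups.get(y, set())
    common = adj[x] & adj[y]
    cng = {z for z in common
           if (groups.get(z, set()) & gx) or (groups.get(z, set()) & gy)}
    return len(cng)

# Hypothetical toy data
adj = {"x": {"z1", "z2", "z3", "z4"}, "y": {"z1", "z2", "z3", "z4"}}
groups = {
    "x": {"g1", "g2"}, "y": {"g2", "g3"},
    "z1": {"g2"},  # shares a group with both x and y
    "z2": {"g1"},  # shares a group with x only
    "z3": {"g3"},  # shares a group with y only
    "z4": {"g9"},  # shares no group, so it is not counted
}
print(cng_score(adj, groups, "x", "y"))
```

Unlike WOCG, a common neighbor counts here even when the group it shares is not common to both x and y.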
CN with Total and Partial Overlapping of Groups (TPOG)
We redefine the set of CNG as $\Lambda^{G}_{x,y} = \Lambda^{TOG}_{x,y} \cup \Lambda^{POG}_{x,y}$
$\Lambda^{TOG}_{x,y} = \{ z^{G_\gamma} \in \Lambda^{G}_{x,y} \mid G_\alpha \cap G_\gamma \neq \emptyset \land G_\beta \cap G_\gamma \neq \emptyset \}$ - the set of CN with total overlapping of groups (TOG)
$\Lambda^{POG}_{x,y} = \Lambda^{G}_{x,y} - \Lambda^{TOG}_{x,y}$ - the set of CN with partial overlapping of groups (POG)
Our final score, called the common neighbors with total and partial overlapping of groups (TPOG) measure, is defined as:
$s^{TPOG}_{x,y} = \frac{|\Lambda^{TOG}_{x,y}|}{|\Lambda^{POG}_{x,y}|}$ (3)
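Equation (3) can be sketched in the same way. The data layout and names are assumptions, and returning infinity or zero when the POG set is empty is also an assumption, since the slides do not cover that case.

```python
import math

def tpog_score(adj, groups, x, y):
    """TPOG: common neighbors overlapping both group sets over those overlapping one."""
    gx = groups.get(x, set())
    gy = groups.get(y, set())
    common = adj[x] & adj[y]
    # Total overlap: z shares a group with x AND a group with y
    tog = {z for z in common
           if (groups.get(z, set()) & gx) and (groups.get(z, set()) & gy)}
    # Partial overlap: z shares a group with exactly one of x and y
    pog = {z for z in common
           if (groups.get(z, set()) & (gx | gy)) and z not in tog}
    if not pog:
        # Assumption (not stated in the slides): avoid dividing by zero
        return math.inf if tog else 0.0
    return len(tog) / len(pog)

# Hypothetical toy data
adj = {"x": {"z1", "z2", "z3", "z4"}, "y": {"z1", "z2", "z3", "z4"}}
groups = {
    "x": {"g1", "g2"}, "y": {"g2", "g3"},
    "z1": {"g2"},        # total overlap through the shared group g2
    "z2": {"g1"},        # partial overlap, with x only
    "z3": {"g1", "g3"},  # total overlap through two different groups
    "z4": {"g9"},        # no overlap, ignored
}
print(tpog_score(adj, groups, "x", "y"))
```

Note that z3 counts as total overlap even though it belongs to no group common to x and y, which is the difference between TOG and WCG.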
Datasets
Table: High-level topological features of our four social networks [Mislove et al., 2007]
                                              Flickr      LiveJournal  Orkut        Youtube
Number of nodes                               1,846,198   5,284,457    3,072,441    1,157,827
Number of links                               22,613,981  77,402,652   223,534,301  4,945,382
Average degree per node                       12.24       16.97        106.1        4.29
Fraction of links symmetric                   62.0%       73.5%        100.0%       79.1%
Average path length                           5.67        5.88         4.25         5.10
Diameter                                      27          20           9            21
Average clustering coefficient                0.313       0.330        0.171        0.136
Average assortativity coefficient             0.202       0.179        0.072        −0.033
Number of node groups                         103,648     7,489,073    8,730,859    30,087
Average number of group memberships per node  4.62        21.25        106.44       0.25
Average group size                            82          15           37           10
Average group clustering coefficient          0.47        0.81         0.52         0.34
Experimental setup
For a network $G(V, E)$, the set $E$ is divided into the training set, $E^T$, and the test set, $E^P$
$E^P$ is formed by randomly selecting 2/3 of the links formed by nodes whose degree is more than twice the average degree. The remaining links constitute $E^T$. This process is repeated 10 times for each network
We evaluate the traditional local measures CN, AA, Jac, RA and PA, and our proposals WOCG, CNG and TPOG
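The splitting procedure can be sketched as below. This is only one possible reading of the setup: the function name, the seeds and the toy graph are hypothetical, and the code assumes that a link is eligible for the test set only when both of its endpoints have more than twice the average degree.

```python
import random
from collections import defaultdict

def split_links(edges, fraction=2 / 3, seed=0):
    """Split links into (training set, test set)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    average = sum(degree.values()) / len(degree)
    # Assumption: BOTH endpoints must exceed twice the average degree
    eligible = [(u, v) for u, v in edges
                if degree[u] > 2 * average and degree[v] > 2 * average]
    rng = random.Random(seed)
    test = set(rng.sample(eligible, round(fraction * len(eligible))))
    train = set(edges) - test
    return train, test

# Toy graph: three interconnected hubs, each with ten leaf neighbors
hubs = ["h1", "h2", "h3"]
edges = [("h1", "h2"), ("h1", "h3"), ("h2", "h3")]
edges += [(h, f"{h}_leaf{i}") for h in hubs for i in range(10)]

# One split per seed, mirroring the 10 repetitions per network
splits = [split_links(edges, seed=s) for s in range(10)]
```

Restricting the test set to links among well-connected nodes leaves every node with enough training links for the local and group-based scores to be computed.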
Experimental setup
For each network, five types of characteristic vectors were considered:
VLocal (all the local measures)
VGroup (all our proposals)
VTop (the three best local measures, CN, AA and RA, and the two best of our proposals, CNG and TPOG)
VTop2 (the five best overall measures: TPOG, CNG, AA, WOCG and CN)
VTotal (all measures evaluated)
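The five vector types amount to different selections of the eight scores. A minimal sketch, assuming each candidate link already has a dictionary of scores; the `FEATURE_SETS` mapping, the function name and the numeric values are illustrative only.

```python
# Measures included in each characteristic vector, as listed on the slide
FEATURE_SETS = {
    "VLocal": ["CN", "AA", "Jac", "RA", "PA"],
    "VGroup": ["WOCG", "CNG", "TPOG"],
    "VTop": ["CN", "AA", "RA", "CNG", "TPOG"],
    "VTop2": ["TPOG", "CNG", "AA", "WOCG", "CN"],
    "VTotal": ["CN", "AA", "Jac", "RA", "PA", "WOCG", "CNG", "TPOG"],
}

def build_vector(scores, vector_type):
    """Select the scores of one candidate link for the given vector type."""
    return [scores[name] for name in FEATURE_SETS[vector_type]]

# Made-up scores for a single node pair, for illustration only
scores = {"CN": 3, "AA": 2.1, "Jac": 0.4, "RA": 0.9, "PA": 12,
          "WOCG": 2.0, "CNG": 3, "TPOG": 0.5}
print(build_vector(scores, "VTop"))
```

Each vector, labeled as existent or non-existent, becomes one training instance for the classifiers.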
Table: Number of instances by class for all networks
             Existent  Non-existent  Total
Flickr       500001    500001        1000002
LiveJournal  300001    300001        600002
Orkut        1500001   1500001       3000002
Youtube      20001     20001         40002
Unsupervised results measured by AUC
Table: Prediction results measured by AUC (rank in parentheses)
              WOCG         CNG          TPOG         CN           AA           Jac          RA           PA
Flickr        0.637 (5.0)  0.728 (1.0)  0.728 (2.0)  0.674 (3.0)  0.656 (4.0)  0.431 (8.0)  0.616 (6.0)  0.566 (7.0)
LiveJournal   0.596 (4.0)  0.611 (3.0)  0.665 (1.0)  0.582 (5.0)  0.580 (6.0)  0.624 (2.0)  0.565 (7.0)  0.542 (8.0)
Orkut         0.649 (2.0)  0.621 (3.0)  0.651 (1.0)  0.572 (7.0)  0.620 (4.0)  0.575 (6.0)  0.566 (8.0)  0.602 (5.0)
Youtube       0.434 (7.0)  0.723 (5.0)  0.555 (6.0)  0.834 (4.0)  0.928 (1.0)  0.217 (8.0)  0.892 (3.0)  0.917 (2.0)
Average rank  4.50 (4.0)   3.00 (2.0)   2.50 (1.0)   4.75 (5.0)   3.75 (3.0)   6.00 (7.5)   6.00 (7.5)   5.50 (6.0)
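The average-rank row follows directly from the per-network ranks in the table. The short check below recomputes it; the ranks are copied from the table, while the variable names are hypothetical.

```python
# Per-network ranks (Flickr, LiveJournal, Orkut, Youtube) copied from the table
ranks = {
    "WOCG": [5, 4, 2, 7],
    "CNG":  [1, 3, 3, 5],
    "TPOG": [2, 1, 1, 6],
    "CN":   [3, 5, 7, 4],
    "AA":   [4, 6, 4, 1],
    "Jac":  [8, 2, 6, 8],
    "RA":   [6, 7, 8, 3],
    "PA":   [7, 8, 5, 2],
}

# Average rank of each measure over the four networks
average_rank = {name: sum(r) / len(r) for name, r in ranks.items()}

# Measures ordered from best (lowest) to worst average rank
ordering = sorted(average_rank, key=average_rank.get)
print(average_rank)
print(ordering)
```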
Figure: Post-hoc test for results with CD = 5.25 (critical difference diagram; rank axis from 1 to 8, measures ordered TPOG, CNG, AA, WOCG, CN, PA, Jac, RA)
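The reported critical difference is consistent with a Nemenyi post-hoc test at the 0.05 significance level. The slides do not name the test, so this is an assumption; $k = 8$ is the number of measures, $N = 4$ the number of networks, and $q_{0.05} = 3.031$ the tabulated critical value for eight compared methods.

```latex
\mathrm{CD} = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}}
            = 3.031 \sqrt{\frac{8 \cdot 9}{6 \cdot 4}}
            = 3.031 \sqrt{3}
            \approx 5.25
```

The largest gap between average ranks in the AUC table is 6.00 − 2.50 = 3.50, which is below 5.25, so no pair of measures differs significantly under this test.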
Unsupervised results measured by precision
Figure: Precision results on the four social networks evaluated. Panels: (a) Flickr, (b) LiveJournal, (c) Orkut, (d) Youtube. Horizontal axis: L (0 to 5,000); vertical axis: precision (0 to 1). Curves: WOCG, CNG, TPOG, CN, AA, Jac, RA, PA
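Precision as a function of L is commonly computed as the fraction of the L highest-scored candidate links that actually appear in the test set, following [Lü and Zhou, 2011]. The slides do not spell out the formula, so the sketch below assumes that definition; the function name and toy scores are hypothetical.

```python
def precision_at_L(scores, test_links, L):
    """Fraction of the L top-scored candidate links that are in the test set.

    scores maps candidate node pairs to similarity scores; pairs must use
    the same orientation as the pairs stored in test_links.
    """
    top = sorted(scores, key=scores.get, reverse=True)[:L]
    hits = sum(1 for pair in top if pair in test_links)
    return hits / L

# Made-up scores for four candidate links, two of which are real test links
scores = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "d"): 0.4, ("c", "d"): 0.2}
test_links = {("a", "b"), ("b", "d")}

print(precision_at_L(scores, test_links, 2))
print(precision_at_L(scores, test_links, 3))
```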
Supervised results measured by F-measure
Table: Average of F-measure on four social networks
Network      Vector  J48    NB     MLP    SMO
Flickr       VLocal  0.777  0.507  0.713  0.651
Flickr       VGroup  0.706  0.583  0.699  0.668
Flickr       VTop    0.724  0.525  0.711  0.676
Flickr       VTop2   0.722  0.558  0.709  0.669
Flickr       VTotal  0.777  0.548  0.712  0.680
LiveJournal  VLocal  0.797  0.687  0.788  0.774
LiveJournal  VGroup  0.768  0.698  0.768  0.750
LiveJournal  VTop    0.791  0.700  0.787  0.772
LiveJournal  VTop2   0.79   0.691  0.781  0.772
LiveJournal  VTotal  0.797  0.702  0.786  0.774
Orkut        VLocal  0.825  0.702  0.800  0.764
Orkut        VGroup  0.781  0.676  0.773  0.737
Orkut        VTop    0.799  0.720  0.77   0.759
Orkut        VTop2   0.793  0.722  0.773  0.758
Orkut        VTotal  0.826  0.731  0.801  0.771
Youtube      VLocal  0.823  0.531  0.73   0.565
Youtube      VGroup  0.658  0.563  0.655  0.567
Youtube      VTop    0.789  0.543  0.724  0.617
Youtube      VTop2   0.780  0.600  0.717  0.613
Youtube      VTotal  0.826  0.577  0.723  0.623
Conclusion
Our proposals consider that a node can belong to more than one group, as usually occurs in real networks
In an unsupervised strategy, our proposals TPOG and CNG obtain the best average AUC ranks, ahead of the local measures, but there is no statistically significant winner
In a supervised strategy, our proposals combined with local measures may improve the performance of classifiers
In general, our proposals improve the performance of the link prediction task by considering mainly the information of the common groups to which users belong
References
Lü, L. and Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170.
Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., and Bhattacharjee, B. (2007). Measurement and analysis of online social networks. In ACM SIGCOMM IMC '07, pages 29–42.
Soundarajan, S. and Hopcroft, J. (2012). Using community information to improve the precision of link prediction methods. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, pages 607–608.
Valverde-Rebaza, J. and Lopes, A. (2012). Link Prediction in Complex Networks Based on Cluster Information. In Advances in Artificial Intelligence - SBIA 2012, Lecture Notes in Computer Science, pages 92–101. Springer Berlin Heidelberg.
Zheleva, E., Getoor, L., Golbeck, J., and Kuter, U. (2010). Using friendship ties and family circles for link prediction. In Proceedings of the Second International Conference on Advances in Social Network Mining and Analysis, SNAKDD'08, pages 97–113, Berlin, Heidelberg.
Thank you
Jorge Carlos Valverde-Rebaza