
Recommender System Based on Modularity

Maria A. A. Sibaldo¹,², Tiago B. A. de Carvalho¹,², Tsang Ing Ren¹, George D. C. Cavalcanti¹

¹Centro de Informática, UFPE, Recife, Brasil
²Unidade Acadêmica de Garanhuns, UFRPE, Garanhuns, Brasil

www.cin.ufpe.br/~viisar
{maas2, tbac, tir, gdcc}@cin.ufpe.br

1. Introduction

This work presents a solution for the RecSys Challenge 2014 based on clustering a bipartite graph whose vertices are of two types, users and items, and whose edges are weighted by the engagement a tweet received. The Modularity metric was used to form groups that contain both users and items (movies).

2. The Proposed Approach

2.1 Bipartite Graph

The bipartite graph is represented by an adjacency matrix over all nodes of both classes, users and items. The resulting graph is sparse, as shown in Figure 1.

[Figure 1 plot: sparsity pattern of the adjacency matrix, both axes from 0 to 3.5 × 10^4; non-zeros = 15964]

Figure 1: Adjacency matrix formed by users (1-22079) and items (22080-35697).
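As a minimal sketch of how such a matrix could be assembled, assume the training tweets are available as (user, item, engagement) triples; the function name `build_adjacency` and this input format are hypothetical, not the authors' code. Users are numbered first and items after them, matching the layout of Figure 1 (which uses 1-based indices), and only non-zero weights are stored, as in a sparse matrix.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Symmetric sparse adjacency matrix of a user-item bipartite graph.

    triples: iterable of (user, item, engagement) records (hypothetical format).
    Returns (adj, user_index, item_index), where adj maps (row, col) -> weight.
    """
    triples = list(triples)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    user_index = {u: n for n, u in enumerate(users)}
    # Item indices start right after the last user index.
    item_index = {i: len(users) + n for n, i in enumerate(items)}

    adj = defaultdict(float)
    for user, item, engagement in triples:
        if engagement == 0:
            continue  # zero weights are not stored, keeping the matrix sparse
        a, b = user_index[user], item_index[item]
        # Undirected edge: store both (a, b) and (b, a).
        adj[(a, b)] += engagement
        adj[(b, a)] += engagement
    return dict(adj), user_index, item_index
```

With this ordering the user-user and item-item blocks of the matrix stay empty, so in a bipartite graph all non-zeros fall in the two off-diagonal blocks.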

Figure 2 shows that the degree distribution of the one-mode item graph follows a power law; therefore, our bipartite graph can be considered a scale-free network (Birmele [2009]).

[Figure 2 plots: two panels, "One-mode User Graph" and "One-mode Item Graph", each showing log(frequency) against log(degree)]

Figure 2: Degree distribution of the user and item one-mode graphs.
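The one-mode item graph links two items when at least one user is connected to both. A stdlib-only sketch of that projection and of the (log degree, log frequency) points plotted in Figure 2 follows; the function names are hypothetical, and the unweighted projection and natural logarithm are assumptions, since the text does not specify them.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def item_projection(edges):
    """Unweighted one-mode item graph from (user, item) edges.

    Two items become neighbours if some user is connected to both.
    Returns a dict item -> set of neighbouring items.
    """
    items_of_user = defaultdict(set)
    for user, item in edges:
        items_of_user[user].add(item)
    neighbours = defaultdict(set)
    for items in items_of_user.values():
        for a, b in combinations(sorted(items), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Items that share no user with any other item still appear, with degree 0.
    for items in items_of_user.values():
        for item in items:
            neighbours.setdefault(item, set())
    return dict(neighbours)


def log_degree_distribution(neighbours):
    """Sorted (log(degree), log(frequency)) pairs; degree 0 is skipped."""
    counts = Counter(len(nbrs) for nbrs in neighbours.values())
    return sorted(
        (math.log(degree), math.log(freq))
        for degree, freq in counts.items()
        if degree > 0
    )
```

On a log-log plot, a power-law distribution appears as points lying roughly on a straight line with negative slope.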

2.2 Louvain Algorithm and Modularity

The Louvain algorithm (Blondel et al. [2008]) is used to cluster the bipartite graph (see Figure 3) based on the Modularity metric (Newman and Girvan [2004]):

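$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \qquad (1)$$

where $A_{ij}$ is the weight of the edge that links $i$ and $j$; $\sum_{ij} A_{ij}$ runs over all ordered pairs of vertices, so it counts the weight of every edge in the graph twice; $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges incident to $i$; and $c_i$ is the community to which vertex $i$ is assigned. The function $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise, and $m = \frac{1}{2} \sum_{ij} A_{ij}$ is the total edge weight of the graph.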
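Equation (1) can be evaluated in one pass over a weighted edge list by grouping its terms by community: Q = Σ_c [in_c/(2m) - (tot_c/(2m))²], where in_c sums A_ij over ordered pairs inside community c and tot_c sums k_i over its vertices. The sketch below assumes an undirected graph without self-loops (true of a bipartite graph); the function name and input format are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of Equation (1) for an undirected weighted graph.

    edges: list of (i, j, w), each undirected edge listed once, no self-loops.
    community: dict vertex -> community label c_i.
    """
    two_m = 0.0                     # sum_ij A_ij = 2m (A_ij and A_ji both counted)
    strength = defaultdict(float)   # k_i
    internal = defaultdict(float)   # in_c: sum of A_ij over ordered pairs inside c
    for i, j, w in edges:
        strength[i] += w
        strength[j] += w
        two_m += 2 * w
        if community[i] == community[j]:
            internal[community[i]] += 2 * w  # contributes A_ij and A_ji
    total = defaultdict(float)      # tot_c: sum of k_i over vertices in c
    for v, k in strength.items():
        total[community[v]] += k
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)
```

For two triangles joined by a single edge, with each triangle as a community, this returns 5/14 ≈ 0.357; putting all vertices in one community returns 0.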

Figure 3: Clusters formed by the Louvain algorithm: G1 = {U1, U2, I1, I2, I3} and G2 = {U3, U4, I4, I5}.
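The Louvain method alternates a local-moving phase, in which each vertex moves to the neighbouring community that most increases Q, with an aggregation phase that collapses communities into super-vertices and repeats. The plain-Python sketch below implements only the first phase and is not the implementation used by the authors; the function name and the toy graph in the usage note are hypothetical. It compares the standard gain of inserting vertex i into community C, k_i,in/m - Σ_tot·k_i/(2m²), multiplied by m.

```python
from collections import defaultdict


def louvain_local_moving(edges, max_passes=100):
    """Local-moving (first) phase of the Louvain method.

    edges: list of (i, j, w) undirected weighted edges without self-loops.
    Returns dict vertex -> community label (labels are vertex ids).
    """
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = adj[i].get(j, 0.0) + w
        adj[j][i] = adj[j].get(i, 0.0) + w
    k = {v: sum(nbrs.values()) for v, nbrs in adj.items()}  # k_i
    m = sum(k.values()) / 2                                 # total edge weight
    community = {v: v for v in adj}  # start from singleton communities
    tot = dict(k)                     # Sigma_tot: sum of k_i per community

    for _ in range(max_passes):
        moved = False
        for v in adj:
            old = community[v]
            tot[old] -= k[v]  # take v out of its community
            # Weight from v to each neighbouring community (k_i,in).
            links = defaultdict(float)
            for u, w in adj[v].items():
                links[community[u]] += w
            # Gain (times m) of re-inserting v into c; staying put is the baseline.
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * k[v] / (2 * m)
            for c, k_in in links.items():
                gain = k_in - tot[c] * k[v] / (2 * m)
                if gain > best_gain:  # strict: ties keep the current community
                    best, best_gain = c, gain
            community[v] = best
            tot[best] += k[v]
            moved = moved or best != old
        if not moved:
            break  # no vertex moved: a local maximum of Q was reached
    return community
```

On a hypothetical graph of two stars, U1 linked to I1, I2, I3 and U2 linked to I4, I5 (unit weights), it returns the groups {U1, I1, I2, I3} and {U2, I4, I5}: as in Figure 3, users and items end up in the same groups without any one-mode projection.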

2.3 Engagement estimation

Let $k$ be the vertex that represents an item and $g_k$ the label of the group to which item $k$ belongs:

• $V_k$ is the set of all vertices in the same group as vertex $k$:

$$V_k = \{ v \mid g_v = g_k \}; \qquad (2)$$

• $W_k$ contains every positive weight of an edge with an endpoint in the group of $k$, where $V$ is the set of all vertices in the graph:

$$W_k = \{ w_{ab} \mid w_{ab} > 0,\ a \in V_k,\ b \in V \}, \qquad (3)$$

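• $\hat{w}_{ku}$ is the estimated weight of the edge linking item $k$ to any user $u$:

$$\hat{w}_{ku} = \frac{1}{|W_k|} \sum_{a \in V_k,\, b \in V} w_{ab}, \qquad (4)$$

with $w_{ku} \in W_k$, where $|W_k|$ is the number of elements of $W_k$. Because the weights are non-negative engagements and zero weights contribute nothing to the sum, $\hat{w}_{ku}$ is the average of the positive weights in $W_k$; it depends only on the group of $k$, so every user $u$ receives the same estimate for item $k$.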
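Equations (2) to (4) reduce to one pass over the adjacency matrix. In this sketch, `adj` holds the non-zero entries of the symmetric matrix (both (a, b) and (b, a) stored) and `group` maps each vertex to its Louvain label; both names are hypothetical. The summation is read literally over matrix entries whose row vertex lies in the group, so an edge with both endpoints inside the group contributes from both sides; that reading is an assumption.

```python
from collections import defaultdict


def group_average_weights(adj, group):
    """Estimated weight w_hat_ku of Eq. (4), computed once per group label.

    adj: dict (a, b) -> w_ab with the non-zero entries of the symmetric matrix.
    group: dict vertex -> group label g_v.
    Returns dict label -> mean of W_k (Eqs. 2-3: positive w_ab with a in V_k).
    """
    total = defaultdict(float)  # sum of w_ab over a in V_k, b in V
    count = defaultdict(int)    # |W_k|
    for (a, b), w in adj.items():
        if w > 0:
            total[group[a]] += w
            count[group[a]] += 1
    return {g: total[g] / count[g] for g in total}
```

Since the estimate depends only on the group, `avg.get(group[k], 0.0)` gives the value for item k and every user u; the 0.0 fallback for a group without positive weights is a further assumption.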

Motivated by the NDCG@10 evaluation (Loiacono et al. [2014]), we reformulated the recommendation as a ranking function. Therefore, the engagement of each tweet is estimated according to Equation 5:

$$\mathrm{eng}_{iu} = \hat{w}_{iu} + \mathrm{rate}_{iu} + 10 \times \mathrm{tweet\_retweeted}_{iu}, \qquad (5)$$

where $\mathrm{eng}_{iu}$ is the estimated engagement of the tweet, $\mathrm{rate}_{iu}$ is the rating given by user $u$ to item $i$, and $\mathrm{tweet\_retweeted}_{iu}$ is a boolean value (1 if true, 0 otherwise). Values of $\mathrm{eng}_{iu}$ greater than the user_followers_count attribute are set to the value of that attribute.
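Equation (5) and the follower-count cap fit in a small scoring function. The parameter names below are hypothetical stand-ins for the group average of Equation 4 and the dataset attributes rate, tweet_retweeted and user_followers_count.

```python
def estimate_engagement(w_hat, rate, retweeted, followers_count):
    """Equation (5) followed by the cap at user_followers_count.

    w_hat: group-average weight from Eq. (4) for the tweet's item.
    rate: rating the user gave to the item.
    retweeted: the boolean tweet_retweeted attribute (True counts as 1).
    followers_count: the user_followers_count attribute, an upper bound.
    """
    eng = w_hat + rate + 10 * int(retweeted)
    return min(eng, followers_count)
```

Only the order of these scores within each user's tweets matters for NDCG@10, which is why the estimate can add quantities measured on different scales.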

3. Experiments and Results

All zero: set the engagement of every tweet in the test dataset to 0.

Clusters: for each tweet in the test dataset, we take the item $i$ posted in the tweet and use the group average $\hat{w}_{iu}$ (Equation 4 with $k = i$) as the engagement value of that tweet.

Improved clusters: use Equation 5, which adds to $\hat{w}_{iu}$ the rating the user gave to that item and 10 times the tweet_retweeted attribute. If the estimated engagement is greater than the user_followers_count attribute, it is set to the value of this attribute.

Table 1: NDCG@10 evaluation value of each strategy

Strategy             NDCG@10
All zero             0.7494269049198918
Clusters             0.7901253952498258
Improved clusters    0.8279531044818939
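To make the metric behind Table 1 concrete, the sketch below computes NDCG@k for a single user's tweets, ranking them by predicted engagement and using the true engagement as relevance. It is not the challenge's official evaluator: the linear gain, the log2 discount, keeping ties in input order, and returning 1.0 when a user has no engagement at all are assumptions, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the first k relevances (linear gain, log2 discount)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted, actual, k=10):
    """NDCG@k of one user's tweets.

    predicted, actual: predicted and true engagement, aligned by tweet.
    Tweets are ranked by predicted engagement; the sort is stable, so ties
    keep their input order.
    """
    order = sorted(range(len(predicted)), key=lambda t: predicted[t], reverse=True)
    ranked = [actual[t] for t in order]
    ideal = dcg_at_k(sorted(actual, reverse=True), k)
    if ideal == 0:
        return 1.0  # no engagement at all: every ranking is ideal (assumption)
    return dcg_at_k(ranked, k) / ideal
```

Because ties keep their input order here, a constant prediction such as the All zero strategy still yields a ranking, and it scores above zero whenever that order places engaged tweets in the top k.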

4. Conclusions

• It is not necessary to transform the bipartite graph into a one-mode graph;

• Engagement is estimated from the groups obtained with the Modularity metric;

• This technique can also be applied to data within a time window.

References

Etienne Birmele. A scale-free graph model based on bipartite graphs. Discrete Applied Mathematics, 157(10):2267–2284, 2009.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of community hierarchies in large networks. CoRR, abs/0803.0476, 2008.

Daniele Loiacono, Andreas Lommatzsch, and Roberto Turrin. RecSys Challenge 2014: Learning to rank. 2014.

M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, February 2004.

This work was partially supported by CIn/UFPE and Brazilian agencies: CNPq, CAPES and FACEPE.