recommender system based on modularity

Recommender System Based on Modularity

Maria A. A. Sibaldo1,2, Tiago B. A. de Carvalho1,2, Tsang Ing Ren1, George D. C. Cavalcanti1

1 Centro de Informática, UFPE, Recife, Brasil
2 Unidade Acadêmica de Garanhuns, UFRPE, Garanhuns, Brasil

www.cin.ufpe.br/~viisar
{maas2, tbac, tir, gdcc}@cin.ufpe.br

1. Introduction

This work presents a solution for the RecSys Challenge 2014 that clusters a bipartite graph with two types of vertices, user and item, whose edges are weighted by the engagement given to a tweet. The Modularity metric is used to form the groups, each of which contains users and movies.

2. The Proposed Approach

2.1 Bipartite Graph

The generated bipartite graph is modeled as an adjacency matrix over all nodes of both classes, user and item. The result is a sparse graph, shown in Figure 1.

[Figure 1 plot: sparsity pattern of the adjacency matrix; both axes run from 0 to 3.5 × 10^4; non zeros = 15964]

Figure 1: Adjacency matrix formed by users (1-22079) and items (22080-35697).
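The index layout of Figure 1, with users in the first block and items in the second, can be sketched as follows. This is a minimal illustration, not the authors' code: the `(user_id, item_id, weight)` input format and the function name `build_bipartite_adjacency` are assumptions, and a dict of dicts stands in for the sparse matrix.

```python
from collections import defaultdict

def build_bipartite_adjacency(engagements):
    """Build a symmetric sparse adjacency structure from a list of
    (user_id, item_id, weight) triples (hypothetical input format)."""
    users = sorted({u for u, _, _ in engagements})
    items = sorted({i for _, i, _ in engagements})
    # Users take the first block of indices and items the second block,
    # mirroring the layout described in the caption of Figure 1.
    index = {("user", u): n for n, u in enumerate(users)}
    index.update({("item", i): len(users) + n for n, i in enumerate(items)})
    adj = defaultdict(dict)
    for u, i, w in engagements:
        a, b = index[("user", u)], index[("item", i)]
        adj[a][b] = w  # user -> item entry
        adj[b][a] = w  # item -> user entry (undirected graph)
    return index, dict(adj)

triples = [("u1", "m1", 3.0), ("u1", "m2", 1.0), ("u2", "m2", 2.0)]
index, adj = build_bipartite_adjacency(triples)
```

Because the graph is bipartite, the user-user and item-item blocks stay empty; only the two off-diagonal blocks hold entries.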

Figure 2 shows the power-law degree distribution of the one-mode item graph; our bipartite graph can therefore be considered a scale-free network (Birmele [2009]).

[Figure 2 plots: two panels, One-mode User Graph and One-mode Item Graph, each showing log(frequency) against log(degree)]

Figure 2: Degree distribution of the user and item one-mode graphs.
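A degree distribution like the one in Figure 2 could be obtained along the following lines. The sketch assumes the usual one-mode projection, in which two items are linked when at least one user engaged with both; the source does not state how its projection was built, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def one_mode_item_degrees(edges):
    """Degree of each item in the one-mode item graph.

    edges: list of (user, item) pairs. Two items are neighbours when
    some user is linked to both (assumed projection rule).
    """
    items_by_user = defaultdict(set)
    for user, item in edges:
        items_by_user[user].add(item)
    neighbours = defaultdict(set)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    all_items = {item for _, item in edges}
    # Items that share no user with any other item get degree 0.
    return {item: len(neighbours[item]) for item in all_items}

def degree_histogram(degrees):
    """Map each degree value to the number of vertices having it."""
    return Counter(degrees.values())

edges = [("u1", "a"), ("u1", "b"), ("u1", "c"),
         ("u2", "a"), ("u2", "d"), ("u3", "e")]
histogram = degree_histogram(one_mode_item_degrees(edges))
```

Plotting log(frequency) against log(degree) from such a histogram, after dropping degree 0, gives the kind of panel shown in Figure 2; the user graph follows by swapping the roles of users and items.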

2.2 Louvain Algorithm and Modularity

The Louvain algorithm (Blondel et al. [2008]) is used to cluster the bipartite graph (see Figure 3) based on the Modularity metric (Newman and Girvan [2004]):

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \qquad (1)$$

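where $A_{ij}$ is the weight of the edge that links $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges that have an endpoint in $i$, and $c_i$ is the community to which vertex $i$ is assigned. The function $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise. The sum $\sum_{ij} A_{ij}$ runs over every entry of the adjacency matrix, so it counts each undirected edge twice, and $m = \frac{1}{2} \sum_{ij} A_{ij}$ is the total weight of the edges in the graph.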

Figure 3: Clusters formed by the Louvain algorithm: G1 = {U1, U2, I1, I2, I3} and G2 = {U3, U4, I4, I5}.
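To make Equation 1 concrete, the sketch below evaluates Q for a fixed partition. It does not implement the Louvain optimisation itself. The grouping is taken from Figure 3, but the edges and their unit weights are hypothetical, since the figure's edge list is not given in the text; `build_adjacency` and `modularity` are illustrative names.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Symmetric adjacency dict from (a, b, weight) triples."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    return dict(adj)

def modularity(adj, community):
    """Evaluate Equation 1 for the partition given by `community`."""
    # sum_ij A_ij counts every undirected edge twice, so it equals 2m.
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    # k_i: total weight of the edges with an endpoint in i.
    strength = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            q += adj[i].get(j, 0.0) - strength[i] * strength[j] / two_m
    return q / two_m

# Hypothetical unit-weight edges; only the grouping comes from Figure 3.
edges = [("U1", "I1", 1.0), ("U1", "I2", 1.0), ("U2", "I2", 1.0),
         ("U2", "I3", 1.0), ("U3", "I4", 1.0), ("U4", "I4", 1.0),
         ("U4", "I5", 1.0)]
community = {n: "G1" for n in ("U1", "U2", "I1", "I2", "I3")}
community.update({n: "G2" for n in ("U3", "U4", "I4", "I5")})
q = modularity(build_adjacency(edges), community)
```

For this toy graph, with m = 7 and no edges between the groups, Q works out to 24/49; placing every vertex in a single community gives Q = 0.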

2.3 Engagement estimation

Let $k$ be the vertex that represents the item and $g_k$ the group label to which the item $k$ belongs:

• $V_k$ is the set of all vertices in the same group as vertex $k$:

$$V_k = \{ v \mid g_v = g_k \}; \qquad (2)$$

• $W_k$ contains every weight greater than zero of the edges that link a vertex in the same group as $k$ to any vertex, where $V$ is the set of all vertices in the graph:

$$W_k = \{ w_{ab} \mid w_{ab} > 0,\ a \in V_k,\ b \in V \}, \qquad (3)$$

• $w_{ku}$ is the estimated weight for the edge linking the item $k$ to any user $u$:

$$w_{ku} = \frac{1}{|W_k|} \sum_{a \in V_k,\, b \in V} w_{ab}, \qquad (4)$$

$w_{ku} \in W_k$, $|W_k|$ is the number of elements of $W_k$.
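Equations 2 to 4 amount to averaging the positive edge weights that touch the item's group. The sketch below follows Equation 4 literally by iterating over ordered pairs (a, b), so an edge with both endpoints inside the group is visited from each endpoint; whether the authors count such edges once or twice is not stated, so this is an assumption, as are the function name and the 0.0 fallback for an empty set.

```python
def estimate_item_weight(k, adj, group):
    """Estimated weight w_ku for item k, identical for every user u.

    adj:   dict vertex -> dict neighbour -> weight (symmetric)
    group: dict vertex -> group label produced by the clustering
    """
    # V_k: every vertex with the same group label as k (Equation 2).
    members = [v for v in adj if group[v] == group[k]]
    # W_k: positive weights of edges with an endpoint in V_k (Equation 3).
    weights = [w for a in members for w in adj[a].values() if w > 0]
    # Mean of W_k (Equation 4); 0.0 for an empty W_k is an assumption.
    return sum(weights) / len(weights) if weights else 0.0

# Hypothetical toy graph: U3-I2 links the two groups.
adj = {
    "U1": {"I1": 2.0, "I2": 4.0},
    "I1": {"U1": 2.0},
    "I2": {"U1": 4.0, "U3": 6.0},
    "U3": {"I2": 6.0, "I4": 10.0},
    "I4": {"U3": 10.0},
}
group = {"U1": "g1", "I1": "g1", "I2": "g1", "U3": "g2", "I4": "g2"}
w_i1 = estimate_item_weight("I1", adj, group)
```

Since the estimate depends only on the group of the item, every item in the same group receives the same value.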

Motivated by the NDCG@10 evaluation (Loiacono et al. [2014]), we recast the recommendation as a ranking function. The engagement for each tweet is therefore calculated according to Equation 5:

$$\mathrm{eng}_{iu} = w_{iu} + \mathrm{rate}_{iu} + 10 \times \mathrm{tweet\_retweeted}_{iu}, \qquad (5)$$

where $\mathrm{eng}_{iu}$ is the estimated engagement for a tweet, $\mathrm{rate}_{iu}$ is the rating given by the user $u$ to the item $i$, and $\mathrm{tweet\_retweeted}_{iu}$ is a boolean value. Any estimated engagement greater than user_followers_count is set to the value of that attribute.
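Equation 5 and the cap described above fit in a few lines. In this sketch the function name is hypothetical, while the parameter names mirror the attributes mentioned in the text.

```python
def estimated_engagement(w_iu, rate_iu, tweet_retweeted, user_followers_count):
    """Score used for ranking a tweet (Equation 5), capped at the
    follower count of the user."""
    # The boolean tweet_retweeted contributes 10 when true and 0 otherwise.
    eng = w_iu + rate_iu + 10 * int(tweet_retweeted)
    # Values above user_followers_count are set to that attribute's value.
    return min(eng, user_followers_count)

score = estimated_engagement(3.6, 8, True, 500)
capped = estimated_engagement(3.6, 8, True, 5)
```

Here `score` is the plain sum of the three terms, whereas `capped` is limited to the follower count of 5.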

3. Experiments and Results

All zero: Set all the engagements of the test dataset to 0.

Clusters: For each tweet in the test dataset, we obtain the item k posted in the tweet and compute the average w_iu, which is set as the engagement value of that tweet.

Improved clusters: We add to w_iu the rating the user gave to the item, plus 10 times the tweet_retweeted attribute. If the estimated engagement is greater than the user_followers_count attribute, it is set to the value of this attribute.

Table 1: NDCG@10 evaluation value for the strategies

Strategy            Evaluation
All zero            0.7494269049198918
Clusters            0.7901253952498258
Improved clusters   0.8279531044818939
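For orientation, one common formulation of NDCG@k is sketched below, with linear gain and a log2 rank discount. The official challenge evaluator may differ in its gain function, tie handling and per-user grouping, so this code is not expected to reproduce the values in Table 1; returning 0.0 when all true engagements are zero is likewise an assumption.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(true_engagements, predicted_scores, k=10):
    """NDCG@k of the ranking induced by predicted_scores."""
    # Rank tweets by predicted score, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda n: predicted_scores[n], reverse=True)
    ranked = [true_engagements[n] for n in order]
    ideal_dcg = dcg_at_k(sorted(true_engagements, reverse=True), k)
    # Assumed convention: 0.0 when no tweet has any true engagement.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

perfect = ndcg_at_k([3, 0, 1], [0.9, 0.1, 0.5])
reversed_order = ndcg_at_k([3, 0, 1], [0.1, 0.9, 0.5])
```

A predicted ordering that matches the true engagement ordering scores 1, and any worse ordering scores less.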

4. Conclusions

• It is not necessary to transform the bipartite graph into a one-mode graph;

• The engagement estimate is obtained using the Modularity metric;

• This technique can also be used with the data in a time window.

References

Etienne Birmele. A scale-free graph model based on bipartite graphs. Discrete Applied Mathematics, 157(10):2267–2284, 2009.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of community hierarchies in large networks. CoRR, abs/0803.0476, 2008.

Daniele Loiacono, Andreas Lommatzsch, and Roberto Turrin. Recsys challenge 2014: Learning to rank. 2014.

M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, February 2004.

This work was partially supported by CIn/UFPE and Brazilian agencies: CNPq, CAPES and FACEPE.
