Knowl Inf Syst (2019) 58:295–318
https://doi.org/10.1007/s10115-018-1154-5

REGULAR PAPER

Personalized recommendation with implicit feedback via learning pairwise preferences over item-sets

Weike Pan^{1,2} · Li Chen^{2} · Zhong Ming^{1}

Received: 25 January 2017 / Revised: 28 September 2017 / Accepted: 5 January 2018 / Published online: 20 January 2018
© Springer-Verlag London Ltd., part of Springer Nature 2018

Abstract Preference learning is a fundamental problem in various smart computing applications such as personalized recommendation. Collaborative filtering as a major learning technique aims to make use of users' feedback, for which some recent works have switched from exploiting explicit feedback to implicit feedback. One fundamental challenge of leveraging implicit feedback is the lack of negative feedback, because there is only some observed relatively "positive" feedback available, making it difficult to learn a prediction model. In this paper, we propose a new and relaxed assumption of pairwise preferences over item-sets, which defines a user's preference on a set of items (item-set) instead of on a single item only. The relaxed assumption can give us more accurate pairwise preference relationships. With this assumption, we further develop a general algorithm called CoFiSet (collaborative filtering via learning pairwise preferences over item-sets), which contains four variants, CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO), representing "Set vs. Set," "Many 'One vs. One'," "Many 'One vs. Set'" and "Many 'Set vs. One'" pairwise comparisons, respectively. Experimental results show that our CoFiSet(MSO) performs better than several state-of-the-art methods on five ranking-oriented evaluation metrics on three real-world data sets.

This work is an extension of our previous work [29]. We have added the following new contents in this manuscript: (i) we have developed three new variants of the CoFiSet algorithm in Sections 2.3–2.6; (ii) we have included new experimental results (Tables 3, 4, 5; Figures 3, 4) and associated analysis in Section 3; (iii) we have added more related works and discussions in Section 1 and Section 4; and (iv) we have made many improvements throughout the whole paper.
Some of this work was done while Weike Pan was a postdoctoral research fellow in the Department of Computer Science, Hong Kong Baptist University.

✉ Li Chen
[email protected]

✉ Zhong Ming
[email protected]

Weike Pan
[email protected]

1 College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Nanshan District, Shenzhen, People's Republic of China

2 Department of Computer Science, Hong Kong Baptist University, 224 Waterloo Road, Kowloon Tong, Kowloon, Hong Kong, People's Republic of China



Keywords Pairwise preferences over item-sets · Top-k recommendation · Collaborative filtering · Implicit feedback

1 Introduction

Personalization and intelligent recommendation are important topics in smart computing applications, where preference learning techniques are usually employed to learn users' preferences from users' behaviors or feedback in order to provide personalized deliveries. Among various preference learning techniques in recommendation, collaborative filtering (or called social information filtering) [1,10,40] as a content-free technique has been widely adopted in commercial recommender systems, like book recommendation in Amazon [21], video recommendation in YouTube [7], music recommendation in Yahoo [17], job recommendation in XING [22], people recommendation in Tencent Weibo (microblog) [6] and many others. Various memory-based and model-based methods have been proposed to improve the prediction accuracy using users' explicit feedback, such as user-based [3,36], item-based [8,21,39], matrix-factorization-based [19,34,38,44,47], probability-based [12,23,42] or integrated [24,45] and transfer learning [20,28] methods, etc.

However, in real applications, users' explicit ratings are not easily obtained, so they may not be sufficient for the purpose of training an adequate prediction model. On the contrary, users' implicit data [25] like browsing and clicking records can be more easily collected. Some recent works have thus turned to improve the personalized recommendation performance via exploiting users' implicit feedback, which include users' logs of watching TV programs [14], assigning tags [27], purchasing products [35], browsing web pages [48], building social connections [49], clicking social updates [13] and so on.

One fundamental challenge in personalized recommendation with implicit feedback is the lack of negative feedback. A learning algorithm can only make use of some observed relatively "positive" feedback, instead of ordinal ratings in explicit data. Some early works [14,27] assume that an observed feedback denotes "like" and an unobserved feedback denotes "dislike," and propose to reduce the problem to collaborative filtering with explicit feedback via some weighting strategies. Recently, some works [35,46] assume that a user prefers an observed item to an unobserved item and reduce the problem to a classification [35] or a regression [46] problem. Empirically, the latter assumption of pairwise preferences over two items results in better recommendation accuracy than the earlier like/dislike assumption.

However, the pairwise preferences with respect to two items might not always be valid. For example, a user bought some fruits, but afterward he finds that he actually does not like them very much, or a user may inherently like some fruits though he has not bought them yet. Motivated by this observation, in this paper, we propose a new and relaxed assumption, which is that a user is likely to prefer a set of observed items to a set of unobserved items. We call our assumption pairwise preferences over item-sets, which is illustrated in Fig. 1. In Fig. 1, we can see that the pairwise preference relationship "apple ≻ peach" does not hold for this user, since his true preference score on apple (r_{u,apple} = 3.5) is lower than that on peach (r_{u,peach} = 4). Instead, the relaxed pairwise relationship of "item-set of apple and grapes ≻ item-set of peach" is more likely to be true, since he likes grapes a lot (r_{u,grapes} = 5). Thus,


Fig. 1 Illustration of pairwise preferences over item-sets. The numbers under some fruits denote a user's true preference scores, r_{u,apple} = 3.5, r_{u,grapes} = 5 and r_{u,peach} = 4. We thus have the relationships r_{u,apple} < r_{u,peach} and (r_{u,apple} + r_{u,grapes})/2 > r_{u,peach}

we can see that our assumption is likely to be more accurate and the corresponding pairwise relationship is more likely to be valid. With this assumption, we define a user's preference to be on a set of items (item-set) rather than on a single item only, and then develop a general algorithm called CoFiSet. Note that we use the term "item-set" instead of "itemset" to make it different from that used in frequent pattern mining and association rule mining [2,11], because we do not construct sets of items based on the co-occurrence information.

We summarize our main contributions as follows: (1) we define a user's preference on an item-set (a set of items) instead of on a single item only and propose a new and relaxed assumption, pairwise preferences over item-sets, to fully exploit users' implicit data; (2) we design a general learning framework, CoFiSet, which absorbs some recent algorithms as special cases; (3) we develop four specific variants of CoFiSet, i.e., CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO), which represent "Set vs. Set," "Many 'One vs. One'," "Many 'One vs. Set'" and "Many 'Set vs. One'" pairwise comparisons, respectively; and (4) we conduct extensive empirical studies and observe that CoFiSet(MSO) performs better than several state-of-the-art methods.

We organize the paper as follows. We give the problem statement and describe our solution in detail in Sect. 2. We conduct extensive experiments and report personalized recommendation performance in Sect. 3. We then discuss some related works on personalized recommendation with implicit feedback in Sect. 4. Finally, we conclude the paper with some remarks and indicate its future directions.

2 Learning pairwise preferences over item-sets

2.1 Problem definition

Suppose we have some observed feedback, R^tr = {(u, i)}, from n users and m items, where the (user, item) pair (u, i) denotes that user u has bought an item i, for example. Our goal is then to recommend a personalized ranking list of items for each user u. Our studied problem


Table 1 Some notations used in the paper

Notation | Description
n | Number of users
m | Number of items
U^tr = {u}_{u=1}^n | Training set of users
U^tr_i | Training set of users w.r.t. item i
U^te ⊆ U^tr | Test set of users
I^tr = {i}_{i=1}^m | Training set of items
I^tr_u | Training set of items w.r.t. user u
I^te_u | Test set of items w.r.t. user u
P ⊆ I^tr_u | Set of items (presence of observation)
A ⊆ I^tr \ I^tr_u | Set of items (absence of observation)
u ∈ U^tr | User index
i, i', j ∈ I^tr | Item index
R^tr = {(u, i)} | Training data
R^te = {(u, i)} | Test data
r_ui | Preference of user u on item i
r_uj | Preference of user u on item j
r_uP | Preference of user u on item-set P
r_uA | Preference of user u on item-set A
r_uij, r_uiA, r_uPA, r_uPj | Pairwise preferences of user u
Θ | Set of model parameters
d | Number of latent dimensions
U_u· ∈ R^{1×d} | User u's latent feature vector
V_i· ∈ R^{1×d} | Item i's latent feature vector
b_i ∈ R | Item i's bias

is usually called one-class collaborative filtering [27], collaborative filtering with implicit feedback [14,35] or personalized recommendation with implicit feedback in general, which is an important application of smart computing.

In this paper, we adopt latent factor models [18,35] considering their state-of-the-art performance, for which some notations are listed in Table 1. Note that the sets are denoted as capital letters with italics (e.g., P), the scalars are denoted as small letters (e.g., d), the vectors are denoted as capital letters with a dot (e.g., U_u·), and the hat is used to denote a predicted value (e.g., \hat{r}_{ui}).

In the following, we first describe two existing assumptions and their limitations for personalized recommendation with implicit feedback, and then propose a new and relaxed assumption.

2.2 Preference assumption on items

Personalized recommendation with implicit feedback is quite different from the task of 5-star numerical rating estimation [18], since there is only some observed relatively "positive" feedback, making it difficult to learn a prediction model [14,27,35]. So far, there have been mainly two types of assumptions proposed to model the implicit feedback: pointwise preference on an item [14,27], and pairwise preferences over two items [35].


The assumption of pointwise preference on an item [14,27] can be represented as follows,

r_{ui} = 1, \quad r_{uj} = 0, \quad i \in I^{tr}_u, \; j \in I^{tr} \setminus I^{tr}_u,   (1)

where 1 and 0 are used to denote "like" and "dislike" for an observed (user, item) pair and an unobserved (user, item) pair, respectively. With this assumption, confidence-based weighting strategies are incorporated into the objective function [14,27]. However, finding a good weighting strategy for each observed feedback is a very difficult task in real applications, since we usually have implicit feedback only. Furthermore, treating all observed feedback as "likes" and unobserved feedback as "dislikes" may mislead the learning process.

The assumption of pairwise preferences over two items [35] relaxes the assumption of pointwise preferences [14,27], which can be represented as follows,

r_{ui} > r_{uj}, \quad i \in I^{tr}_u, \; j \in I^{tr} \setminus I^{tr}_u,   (2)

where the relationship r_ui > r_uj means that a user u is likely to prefer an item i ∈ I^tr_u to an item j ∈ I^tr \ I^tr_u. Empirically, this assumption generates better recommendation results than the pointwise assumptions in [14,27].

However, as mentioned in the introduction, in real situations, such a pairwise assumption may not hold for each item pair (i, j), i ∈ I^tr_u, j ∈ I^tr \ I^tr_u. Specifically, there are two phenomena: first, there may exist some item i ∈ I^tr_u that user u does not like very much; second, there may exist some item j ∈ I^tr \ I^tr_u that user u inherently likes but has not observed yet, which also motivates a recommender system to help a user explore the items. The second case is more likely to occur, since a user's preferences on items from I^tr \ I^tr_u are usually not the same, including both "likes" and "dislikes." In either of the above two cases, the relationship r_ui > r_uj in Eq. (2) does not hold. Thus, the assumption of pairwise preferences over items [35] may not be true for all of the item pairs.

2.3 Preference assumption on item-sets

Before we present a new type of assumption, we first introduce two definitions: a user u's preference on an item-set and pairwise preferences over two item-sets.

Definition A user u's preference on an item-set (a set of items) is defined as a function of user u's preferences on items in the item-set. For example, a user u's preference on an item-set P can be r_{uP} = \sum_{i \in P} r_{ui} / |P|, or in other forms.

Definition A user u's pairwise preference over two item-sets is defined as the difference between user u's preferences on two item-sets. For example, a user u's pairwise preference over item-sets P and A can be r_{uPA} = r_{uP} - r_{uA}, or r_{uPA} = r_{uP} - r_{uj}, j ∈ A, or in other forms.
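To make the two definitions concrete, the following minimal Python sketch (ours, not from the paper; the toy scores and hypothetical item ids mirror the fruit example of Fig. 1) computes one possible form of a user's item-set preference, namely the average of the item preferences, and the corresponding pairwise preference over two item-sets:

import numpy as np

def itemset_preference(r_u, item_set):
    # One possible form: the average of user u's preferences over the items in item_set.
    return np.mean([r_u[i] for i in item_set])

def pairwise_itemset_preference(r_u, P, A):
    # r_uPA = r_uP - r_uA, the difference between the two item-set preferences.
    return itemset_preference(r_u, P) - itemset_preference(r_u, A)

# Toy scores mirroring Fig. 1: apple = 3.5, grapes = 5, peach = 4 (hypothetical item ids 0, 1, 2).
r_u = {0: 3.5, 1: 5.0, 2: 4.0}
print(pairwise_itemset_preference(r_u, P=[0, 1], A=[2]))  # (3.5 + 5)/2 - 4 = 0.25 > 0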

With the above two definitions, we further relax the assumption of pairwise preferences over items as made in [35] and propose a new assumption called pairwise preferences over item-sets, represented in the following four forms,

Set vs. Set (SS): r_{uP} > r_{uA},   (3)

Many "One vs. One" (MOO): r_{ui} > r_{uj}, \; i \in P, \; j \in A,   (4)

Many "One vs. Set" (MOS): r_{ui} > r_{uA}, \; i \in P,   (5)

Many "Set vs. One" (MSO): r_{uP} > r_{uj}, \; j \in A,   (6)

where r_uP and r_uA are the user u's overall preferences on the items from item-set P ⊆ I^tr_u and item-set A ⊆ I^tr \ I^tr_u, respectively. The first form can be described as "Set vs. Set," denoting


pairwise preferences over one set of items relative to another set of items. The second form can be described as "Many 'One vs. One'," representing many pairwise preferences over one item relative to another item. The third form can be described as "Many 'One vs. Set'," representing many pairwise preferences over one item relative to a set of items. The fourth form can be described as "Many 'Set vs. One'," representing many pairwise preferences over one set of items relative to another item. For a user u, P ⊆ I^tr_u denotes a set of items with observed feedback from user u (presence of observation), and A ⊆ I^tr \ I^tr_u denotes a set of items without observed feedback from user u (absence of observation). Interestingly, we can see the transformations and relationships among those four variants: (1) when |A| = 1, "SS" and "MSO" reduce to "SO," and "MOO" and "MOS" reduce to "MOO"; (2) when |P| = 1, "SS" and "MOS" reduce to "OS," and "MOO" and "MSO" reduce to "MOO"; and (3) when |P| = |A| = 1, all reduce to "OO" in BPRMF [35]. Note that the granularity of pairwise preference in our assumption is the item-set for the "Set vs. Set" case and both the item-set and item for the "Many 'One vs. Set'" and "Many 'Set vs. One'" cases, rather than the item only in [35], which is likely to be closer to real situations. Our proposed assumption is also more general and can embody the assumption of pairwise preferences over items [35] as a special case.

2.4 Model formulation

Assuming that a user u is likely to prefer an item-set P to an item-set A, we may introduce the aforementioned constraints in Eqs. (3–6) when learning the parameters of the prediction model. Specifically, for a pair of item-sets P and A, we can have the following optimization problems,

SS: \min_{\Theta_u} R(u, P, A), \; s.t. \; r_{uP} > r_{uA},   (7)

MOO: \min_{\Theta_u} R(u, P, A), \; s.t. \; r_{ui} > r_{uj}, \; i \in P, \; j \in A,   (8)

MOS: \min_{\Theta_u} R(u, P, A), \; s.t. \; r_{ui} > r_{uA}, \; i \in P,   (9)

MSO: \min_{\Theta_u} R(u, P, A), \; s.t. \; r_{uP} > r_{uj}, \; j \in A,   (10)

where r_ui = b_i + U_u· V_i·^T is the predicted preference of user u on item i, and Θ_u = {U_u·, V_i·, b_i, i ∈ I^tr} denotes the model parameters w.r.t. user u, including the user u's latent feature vector U_u· ∈ R^{1×d}, the item i's latent feature vector V_i· ∈ R^{1×d}, and the item i's bias b_i ∈ R. Note that the hard constraints are based on a user's pairwise preferences over item-sets, and R(u, P, A) is an L2 regularization term used to avoid overfitting. Since the above optimization problems are difficult to solve due to the hard constraints, we relax the constraints and introduce a loss function in the objective function,

\min_{\Theta_u} L(u, P, A) + R(u, P, A),   (11)

where L(u, P, A) is the loss function w.r.t. user u's preferences on item-sets P and A. Then, for each user u, we have the following optimization problem,

\min_{\Theta_u} \sum_{P \subseteq I^{tr}_u} \sum_{A \subseteq I^{tr} \setminus I^{tr}_u} L(u, P, A) + R(u, P, A),   (12)


where P is a subset of items randomly sampled from I^tr_u that denotes a set of items with observed feedback from user u, and A is a subset of items randomly sampled from I^tr \ I^tr_u that denotes a set of items without observed feedback from user u.

Finally, in order to encourage collaboration among users as in collaborative filtering methods, we put all users together and reach the following optimization problem,

\min_{\Theta} \sum_{u \in U^{tr}} \sum_{P \subseteq I^{tr}_u} \sum_{A \subseteq I^{tr} \setminus I^{tr}_u} L(u, P, A) + R(u, P, A),   (13)

where Θ = {U_u·, V_i·, b_i, u ∈ U^tr, i ∈ I^tr} denotes the parameters to be learned. The loss function L(u, P, A) is defined on the user u's pairwise preferences over item-sets. The regularization term

R(u, P, A) = \frac{\alpha_u}{2} \|U_{u\cdot}\|^2 + \sum_{i \in P} \left[ \frac{\alpha_v}{2} \|V_{i\cdot}\|^2 + \frac{\beta_v}{2} \|b_i\|^2 \right] + \sum_{j \in A} \left[ \frac{\alpha_v}{2} \|V_{j\cdot}\|^2 + \frac{\beta_v}{2} \|b_j\|^2 \right]

is used to avoid overfitting during parameter learning [19], where α_u, α_v and β_v are hyper-parameters.

Note again that the core concept in our preference assumption and objective function is the "item-set" (a set of items), not the "item" as in [14,27,35]. For this reason, we call our solution CoFiSet (collaborative filtering via learning pairwise preferences over item-sets).
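As a concrete illustration of the model just formulated, the following Python sketch (ours; the array shapes are assumptions, with U of shape n×d, V of shape m×d and b of length m) computes the predicted preference b_i + U_u· V_i·^T and the regularization term R(u, P, A):

import numpy as np

def predict(U, V, b, u, i):
    # Predicted preference of user u on item i: b_i + U_u. V_i.^T
    return b[i] + U[u] @ V[i]

def regularizer(U, V, b, u, P, A, alpha_u, alpha_v, beta_v):
    # L2 regularization term R(u, P, A) over the parameters touched by (u, P, A).
    reg = 0.5 * alpha_u * np.sum(U[u] ** 2)
    for i in list(P) + list(A):
        reg += 0.5 * alpha_v * np.sum(V[i] ** 2) + 0.5 * beta_v * b[i] ** 2
    return reg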

2.5 Loss function

For the loss function L(u, P, A) in Eq. (13), although we only use a form closely related to that of BPRMF [35] for direct and fair comparison in our empirical studies, we can have various specific forms, e.g., -\ln\sigma(r_{uPA}), \frac{1}{2}(r_{uPA} - 1)^2, \max(0, 1 - r_{uPA}), or \frac{1}{2}(r_{uP} - 1)^2 + \frac{1}{2}(r_{uA} - 0)^2, where r_uPA = r_uP − r_uA is the difference between user u's preferences on two item-sets P and A, and σ(z) = 1/(1 + exp(−z)) is the sigmoid function. More specifically,

1. The loss function -\ln\sigma(r_{uPA}) absorbs that of BPRMF [35] as a special case when P = {i} and A = {j}, L(u, P, A) = -\ln\sigma(r_{uij}) = -\ln\sigma(r_{ui} - r_{uj});
2. The loss function \frac{1}{2}(r_{uPA} - 1)^2 absorbs that of RankALS [46] as a special case when P = {i} and A = {j}, L(u, P, A) = \frac{1}{2}(r_{uij} - 1)^2 = \frac{1}{2}(r_{ui} - r_{uj} - 1)^2;
3. The loss function \max(0, 1 - r_{uPA}) absorbs that of CCF(Hinge) [48] as a special case when P = {i}, L(u, P, A) = \max(0, 1 - r_{uiA}) = \max(0, 1 - [r_{ui} - r_{uA}]); and
4. The loss function \frac{1}{2}(r_{uP} - 1)^2 + \frac{1}{2}(r_{uA} - 0)^2 absorbs iMF [14] and OCCF [27] as special cases when the confidence values are fixed as a constant value and P = {i}, A = {j}, L(u, P, A) = \frac{1}{2}(r_{ui} - 1)^2 + \frac{1}{2}(r_{uj} - 0)^2.

We summarize the above discussions in Table 2 (the detailed descriptions of these related works can be seen in Sect. 4).

Table 2 Summary of the relationship between different loss functions. Note that i ∈ I^tr_u, j ∈ I^tr \ I^tr_u, P ⊆ I^tr_u, and A ⊆ I^tr \ I^tr_u

L(u, P, A) | P | A | Special case
-\ln\sigma(r_{uPA}) | {i} | {j} | -\ln\sigma(r_{ui} - r_{uj}) [35]
\frac{1}{2}(r_{uPA} - 1)^2 | {i} | {j} | \frac{1}{2}(r_{ui} - r_{uj} - 1)^2 [46]
\max(0, 1 - r_{uPA}) | {i} | A | \max(0, 1 - [r_{ui} - r_{uA}]) [48]
\frac{1}{2}(r_{uP} - 1)^2 + \frac{1}{2}(r_{uA} - 0)^2 | {i} | {j} | \frac{1}{2}(r_{ui} - 1)^2 + \frac{1}{2}(r_{uj} - 0)^2 [14]

123

Page 8: Personalized recommendation with implicit feedback via learning …csse.szu.edu.cn/staff/panwk/publications/Journal-KAIS-19... · 2019-02-18 · in commercial recommender systems,

302 W. Pan et al.

For CoFiSet, in order to directly compare our assumption of pairwise preferences over item-sets with the pairwise preferences over items as made in BPRMF [35], we study the following four specific loss functions,

SS: L(u, P, A) = -\ln\sigma(r_{uPA}),   (14)

MOO: L(u, P, A) = -\frac{1}{|P|}\frac{1}{|A|}\sum_{i \in P}\sum_{j \in A} \ln\sigma(r_{uij}),   (15)

MOS: L(u, P, A) = -\frac{1}{|P|}\sum_{i \in P} \ln\sigma(r_{uiA}),   (16)

MSO: L(u, P, A) = -\frac{1}{|A|}\sum_{j \in A} \ln\sigma(r_{uPj}),   (17)

where r_uPA = r_uP − r_uA, r_uij = r_ui − r_uj, r_uiA = r_ui − r_uA, r_uPj = r_uP − r_uj, r_uP = \sum_{i \in P} r_{ui} / |P| and r_uA = \sum_{j \in A} r_{uj} / |A|. Thus, we have four variants of CoFiSet, which are CoFiSet(SS) for the first loss function of "Set vs. Set," CoFiSet(MOO) for the second loss function of "Many 'One vs. One'," CoFiSet(MOS) for the third loss function of "Many 'One vs. Set'" and CoFiSet(MSO) for the fourth loss function of "Many 'Set vs. One'," respectively. Note that all loss functions are designed to maximize (or encourage) the preference difference, in the same spirit as BPRMF [35].
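For illustration, a minimal Python sketch (ours) of the four losses in Eqs. (14)–(17), taking as input the predicted scores of the items in P and A for one user:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cofiset_losses(r_P, r_A):
    # r_P, r_A: 1-D arrays of predicted scores for the items in P and A, respectively.
    r_uP, r_uA = np.mean(r_P), np.mean(r_A)
    ss  = -np.log(sigmoid(r_uP - r_uA))                             # Eq. (14), "Set vs. Set"
    moo = -np.mean(np.log(sigmoid(r_P[:, None] - r_A[None, :])))    # Eq. (15), many "One vs. One"
    mos = -np.mean(np.log(sigmoid(r_P - r_uA)))                     # Eq. (16), many "One vs. Set"
    mso = -np.mean(np.log(sigmoid(r_uP - r_A)))                     # Eq. (17), many "Set vs. One"
    return ss, moo, mos, mso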

2.6 Learning the CoFiSet

We adopt the widely used SGD (stochastic gradient descent) algorithmic framework in collaborative filtering [18] to learn the model parameters. We first derive the gradients and update rules for each variable in CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO), respectively, and then describe the complete algorithm.

2.6.1 The gradients in CoFiSet(SS)

For CoFiSet(SS), we have the loss L(u, P, A) = -\ln\sigma(r_{uPA}), and the gradient \partial L(u, P, A) / \partial r_{uPA} as follows,

\frac{\partial L(u, P, A)}{\partial r_{uPA}} = \frac{\partial [-\ln\sigma(r_{uPA})]}{\partial r_{uPA}} = -\sigma(-r_{uPA}),   (18)

where r_uPA = r_uP − r_uA is user u's preference difference on item-set P and item-set A, and σ(x) = 1/(1 + exp(−x)) is again the sigmoid function.

We then have the gradients of each variable, U_u·, V_i·, b_i, i ∈ P and V_j·, b_j, j ∈ A, w.r.t. the objective function,

\nabla U_{u\cdot} = \frac{\partial L(u, P, A)}{\partial r_{uPA}} (V_{P\cdot} - V_{A\cdot}) + \alpha_u U_{u\cdot},   (19)

\nabla V_{i\cdot} = \frac{\partial L(u, P, A)}{\partial r_{uPA}} \frac{U_{u\cdot}}{|P|} + \alpha_v V_{i\cdot}, \; i \in P,   (20)

\nabla V_{j\cdot} = \frac{\partial L(u, P, A)}{\partial r_{uPA}} \frac{-U_{u\cdot}}{|A|} + \alpha_v V_{j\cdot}, \; j \in A,   (21)

\nabla b_i = \frac{\partial L(u, P, A)}{\partial r_{uPA}} \frac{1}{|P|} + \beta_v b_i, \; i \in P,   (22)

\nabla b_j = \frac{\partial L(u, P, A)}{\partial r_{uPA}} \frac{-1}{|A|} + \beta_v b_j, \; j \in A,   (23)

where V_P· = \sum_{i \in P} V_{i\cdot} / |P| and V_A· = \sum_{j \in A} V_{j\cdot} / |A| are the average latent feature representations of items in item-sets P and A, respectively. We then have the complete forms of the gradients \nabla U_u·, \nabla V_i·, \nabla V_j·, \nabla b_i and \nabla b_j with \partial L(u, P, A) / \partial r_{uPA} = -\sigma(-r_{uPA}) as in Eq. (18).
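A minimal Python sketch (ours) of one stochastic step for CoFiSet(SS), combining the gradients of Eqs. (18)–(23) with the update rules of Eqs. (42)–(46) below; U, V and b are assumed to be numpy arrays of shapes n×d, m×d and m:

import numpy as np

def sgd_step_ss(U, V, b, u, P, A, alpha_u, alpha_v, beta_v, gamma):
    P, A = list(P), list(A)
    V_P, V_A = V[P].mean(axis=0), V[A].mean(axis=0)               # average item representations
    r_uPA = (b[P].mean() + U[u] @ V_P) - (b[A].mean() + U[u] @ V_A)
    dloss = -1.0 / (1.0 + np.exp(r_uPA))                          # Eq. (18): -sigmoid(-r_uPA)
    U[u] -= gamma * (dloss * (V_P - V_A) + alpha_u * U[u])        # Eqs. (19)/(42)
    for i in P:                                                   # V and b use the latest U[u], as in Fig. 2
        V[i] -= gamma * (dloss * U[u] / len(P) + alpha_v * V[i])  # Eqs. (20)/(43)
        b[i] -= gamma * (dloss / len(P) + beta_v * b[i])          # Eqs. (22)/(45)
    for j in A:
        V[j] -= gamma * (-dloss * U[u] / len(A) + alpha_v * V[j]) # Eqs. (21)/(44)
        b[j] -= gamma * (-dloss / len(A) + beta_v * b[j])         # Eqs. (23)/(46)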

2.6.2 The gradients in CoFiSet(MOO)

For CoFiSet(MOO), we have the loss L(u, P, A) = -\frac{1}{|P|}\frac{1}{|A|}\sum_{i \in P}\sum_{j \in A} \ln\sigma(r_{uij}), and the gradient \partial L(u, P, A) / \partial r_{uij} as follows,

\frac{\partial L(u, P, A)}{\partial r_{uij}} = -\frac{1}{|P|}\frac{1}{|A|}\sigma(-r_{uij}),   (24)

where r_uij = r_ui − r_uj, i ∈ P, j ∈ A.

We then have the gradients of each variable, U_u·, V_i·, b_i, i ∈ P and V_j·, b_j, j ∈ A, w.r.t. the objective function,

\nabla U_{u\cdot} = \sum_{i \in P}\sum_{j \in A} \frac{\partial L(u, P, A)}{\partial r_{uij}} (V_{i\cdot} - V_{j\cdot}) + \alpha_u U_{u\cdot},   (25)

\nabla V_{i\cdot} = \sum_{j \in A} \frac{\partial L(u, P, A)}{\partial r_{uij}} U_{u\cdot} + \alpha_v V_{i\cdot}, \; i \in P,   (26)

\nabla V_{j\cdot} = \sum_{i \in P} \frac{\partial L(u, P, A)}{\partial r_{uij}} (-U_{u\cdot}) + \alpha_v V_{j\cdot}, \; j \in A,   (27)

\nabla b_i = \sum_{j \in A} \frac{\partial L(u, P, A)}{\partial r_{uij}} \cdot 1 + \beta_v b_i, \; i \in P,   (28)

\nabla b_j = \sum_{i \in P} \frac{\partial L(u, P, A)}{\partial r_{uij}} \cdot (-1) + \beta_v b_j, \; j \in A.   (29)

2.6.3 The gradients in CoFiSet(MOS)

For CoFiSet(MOS), we have the loss L(u, P, A) = -\frac{1}{|P|}\sum_{i \in P} \ln\sigma(r_{uiA}), and the gradient \partial L(u, P, A) / \partial r_{uiA} as follows,

\frac{\partial L(u, P, A)}{\partial r_{uiA}} = -\frac{1}{|P|}\sigma(-r_{uiA}),   (30)

where r_uiA = r_ui − r_uA, i ∈ P.

We then have the gradients of each variable, U_u·, V_i·, b_i, i ∈ P and V_j·, b_j, j ∈ A, w.r.t. the objective function,

\nabla U_{u\cdot} = \sum_{i \in P} \frac{\partial L(u, P, A)}{\partial r_{uiA}} (V_{i\cdot} - V_{A\cdot}) + \alpha_u U_{u\cdot},   (31)

\nabla V_{i\cdot} = \frac{\partial L(u, P, A)}{\partial r_{uiA}} U_{u\cdot} + \alpha_v V_{i\cdot}, \; i \in P,   (32)

\nabla V_{j\cdot} = \sum_{i \in P} \frac{\partial L(u, P, A)}{\partial r_{uiA}} \frac{-U_{u\cdot}}{|A|} + \alpha_v V_{j\cdot}, \; j \in A,   (33)

\nabla b_i = \frac{\partial L(u, P, A)}{\partial r_{uiA}} \cdot 1 + \beta_v b_i, \; i \in P,   (34)

\nabla b_j = \sum_{i \in P} \frac{\partial L(u, P, A)}{\partial r_{uiA}} \frac{-1}{|A|} + \beta_v b_j, \; j \in A,   (35)

where V_A· = \sum_{j \in A} V_{j\cdot} / |A| is the average representation of items in item-set A.

2.6.4 The gradients in CoFiSet(MSO)

For CoFiSet(MSO), we have the loss L(u, P, A) = -\frac{1}{|A|}\sum_{j \in A} \ln\sigma(r_{uPj}), and the gradient \partial L(u, P, A) / \partial r_{uPj} as follows,

\frac{\partial L(u, P, A)}{\partial r_{uPj}} = -\frac{1}{|A|}\sigma(-r_{uPj}),   (36)

where r_uPj = r_uP − r_uj, j ∈ A.

We then have the gradients of each variable, U_u·, V_i·, b_i, i ∈ P and V_j·, b_j, j ∈ A, w.r.t. the objective function,

\nabla U_{u\cdot} = \sum_{j \in A} \frac{\partial L(u, P, A)}{\partial r_{uPj}} (V_{P\cdot} - V_{j\cdot}) + \alpha_u U_{u\cdot},   (37)

\nabla V_{i\cdot} = \sum_{j \in A} \frac{\partial L(u, P, A)}{\partial r_{uPj}} \frac{U_{u\cdot}}{|P|} + \alpha_v V_{i\cdot}, \; i \in P,   (38)

\nabla V_{j\cdot} = \frac{\partial L(u, P, A)}{\partial r_{uPj}} (-U_{u\cdot}) + \alpha_v V_{j\cdot}, \; j \in A,   (39)

\nabla b_i = \sum_{j \in A} \frac{\partial L(u, P, A)}{\partial r_{uPj}} \frac{1}{|P|} + \beta_v b_i, \; i \in P,   (40)

\nabla b_j = \frac{\partial L(u, P, A)}{\partial r_{uPj}} \cdot (-1) + \beta_v b_j, \; j \in A,   (41)

where V_P· = \sum_{i \in P} V_{i\cdot} / |P| is the average representation of items in item-set P.
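In the same spirit as the CoFiSet(SS) sketch above, a minimal Python step (ours) for CoFiSet(MSO) following Eqs. (36)–(41); the steps for CoFiSet(MOO) and CoFiSet(MOS) are analogous:

import numpy as np

def sgd_step_mso(U, V, b, u, P, A, alpha_u, alpha_v, beta_v, gamma):
    P, A = list(P), list(A)
    V_P = V[P].mean(axis=0)                                   # average representation of item-set P
    r_uP = b[P].mean() + U[u] @ V_P
    r_uPj = r_uP - (b[A] + V[A] @ U[u])                       # r_uP - r_uj for every j in A
    dloss = -(1.0 / len(A)) / (1.0 + np.exp(r_uPj))           # Eq. (36): -(1/|A|) sigmoid(-r_uPj)
    U[u] -= gamma * ((dloss[:, None] * (V_P - V[A])).sum(axis=0) + alpha_u * U[u])  # Eqs. (37)/(42)
    for i in P:
        V[i] -= gamma * (dloss.sum() * U[u] / len(P) + alpha_v * V[i])   # Eqs. (38)/(43)
        b[i] -= gamma * (dloss.sum() / len(P) + beta_v * b[i])           # Eqs. (40)/(45)
    for k, j in enumerate(A):
        V[j] -= gamma * (-dloss[k] * U[u] + alpha_v * V[j])              # Eqs. (39)/(44)
        b[j] -= gamma * (-dloss[k] + beta_v * b[j])                      # Eqs. (41)/(46)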

2.6.5 Algorithm

With the gradients in Sects. 2.6.1, 2.6.2, 2.6.3 and 2.6.4, we have the update rules for each variable in CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO),

U_{u\cdot} = U_{u\cdot} - \gamma \nabla U_{u\cdot},   (42)

V_{i\cdot} = V_{i\cdot} - \gamma \nabla V_{i\cdot}, \; i \in P,   (43)

V_{j\cdot} = V_{j\cdot} - \gamma \nabla V_{j\cdot}, \; j \in A,   (44)

b_i = b_i - \gamma \nabla b_i, \; i \in P,   (45)

b_j = b_j - \gamma \nabla b_j, \; j \in A,   (46)

where γ > 0 is the learning rate.


Input: Training data R^tr = {(u, i)} of observed feedback, the size of item-set P (presence of observation), and the size of item-set A (absence of observation).
Output: The learned model parameters Θ = {U_u·, V_i·, b_i, u ∈ U^tr, i ∈ I^tr}, where U_u· ∈ R^{1×d} is the user-specific latent feature vector of user u, V_i· ∈ R^{1×d} is the item-specific latent feature vector of item i, and b_i ∈ R is the bias of item i.
Initialization: For u ∈ U^tr and i ∈ I^tr, U_{uf} = (r − 0.5) × 0.01, V_{if} = (r − 0.5) × 0.01, b_i = \sum_{u \in U^{tr}_i} 1/|U^tr| − \sum_{(u,i) \in R^{tr}} 1/|U^tr|/|I^tr|, where r (0 ≤ r < 1) is a random value, and 1 ≤ f ≤ d is the index of the latent dimension.
1: for t1 = 1, ..., T do
2:   for t2 = 1, ..., n do
3:     Randomly pick a user u ∈ U^tr.
4:     Randomly pick an item-set P ⊆ I^tr_u.
5:     Randomly pick an item-set A ⊆ I^tr \ I^tr_u.
6:     Calculate ∂L(u, P, A)/∂r_uPA via Eq. (18), ∂L(u, P, A)/∂r_uij via Eq. (24), ∂L(u, P, A)/∂r_uiA via Eq. (30) and ∂L(u, P, A)/∂r_uPj via Eq. (36) for CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO), respectively.
7:     Update U_u· via Eq. (42).
8:     Update V_i·, i ∈ P via Eq. (43) and the latest U_u·.
9:     Update V_j·, j ∈ A via Eq. (44) and the latest U_u·.
10:    Update b_i, i ∈ P via Eq. (45).
11:    Update b_j, j ∈ A via Eq. (46).
12: return Θ.

Fig. 2 The algorithm of CoFiSet

In the SGD algorithmic framework, we approximate the objective function in Eq. (13) via randomly sampling one subset P ⊆ I^tr_u and one subset A ⊆ I^tr \ I^tr_u in each iteration, instead of enumerating all possible subsets P and A. The algorithm steps of CoFiSet are depicted in Fig. 2, which go through the whole data with T outer loops and n inner loops (one for each user on average), with t1 and t2 as their iteration variables, respectively. For each iteration, we first randomly sample a user u, and then randomly sample an item-set P ⊆ I^tr_u and an item-set A ⊆ I^tr \ I^tr_u. Once we have updated U_u·, the latest U_u· is used to update V_i·, i ∈ P and V_j·, j ∈ A.
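The following Python sketch (ours, not the authors' Java implementation) wires the steps of Fig. 2 together for CoFiSet(MSO), reusing the sgd_step_mso sketch from Sect. 2.6.4 and the initialization of Eqs. (52)–(54); R_tr is assumed to be a list of (user, item) index pairs, and the default T here is only a placeholder (the experiments below fix T = 10^5 and d = 20):

import numpy as np

def train_cofiset_mso(R_tr, n, m, d=20, T=1000, size_P=3, size_A=3,
                      alpha_u=0.01, alpha_v=0.01, beta_v=0.01, gamma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    items_of = {u: set() for u in range(n)}                  # I^tr_u for each user u
    for (u, i) in R_tr:
        items_of[u].add(i)
    # Initialization, Eqs. (52)-(54): small random factors and popularity-based item biases.
    U = (rng.random((n, d)) - 0.5) * 0.01
    V = (rng.random((m, d)) - 0.5) * 0.01
    counts = np.zeros(m)
    for (_, i) in R_tr:
        counts[i] += 1
    b = counts / n - len(R_tr) / (n * m)
    all_items = np.arange(m)
    for t1 in range(T):                                       # outer loop, step 1
        for t2 in range(n):                                   # inner loop, step 2
            u = int(rng.integers(n))                          # step 3: sample a user
            I_u = np.fromiter(items_of[u], dtype=int)
            if I_u.size == 0:
                continue
            P = rng.choice(I_u, size=min(size_P, I_u.size), replace=False)            # step 4
            A = rng.choice(np.setdiff1d(all_items, I_u), size=size_A, replace=False)  # step 5
            sgd_step_mso(U, V, b, u, P, A, alpha_u, alpha_v, beta_v, gamma)           # steps 6-11
    return U, V, b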

2.6.6 Complexity analysis

The time complexity of the four variants of our CoFiSet, i.e., CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO), in each single iteration is O(d max(|P|, |A|)), O(d|A||P|), O(d|A||P|) and O(d|A||P|), respectively, where d is the number of latent features, |P| is the size of item-set P, and |A| is the size of item-set A. Thus, the overall time complexity of CoFiSet(SS) is O(Tnd max(|P|, |A|)) and that of CoFiSet(MOO), CoFiSet(MOS) and CoFiSet(MSO) is O(Tnd|A||P|), where T is the number of iterations and n is the number of users. Note that the time complexity of the closely related model (i.e., BPRMF [35]) is O(Tnd), which shows that the time complexity of our CoFiSet and BPRMF is comparable when the item-sets P and A are small. As for the memory requirement of CoFiSet and BPRMF, it is the same because their model parameters are exactly the same.


3 Experimental results

3.1 Data sets

We use three real-world data sets, MovieLens1M (http://www.grouplens.org/), Netflix (https://www.netflix.com/) and XING (https://www.xing.com/), to empirically study our assumption of pairwise preferences over item-sets. MovieLens1M and Netflix contain users' 5-star numerical ratings on movies, where movies are taken as "items" in the experiments. For Netflix, we first randomly sample 5000 users, and then randomly sample 5000 items, and take ratings of the sampled users and items as a subset of Netflix. For both MovieLens1M and Netflix, we keep ratings larger than 3 as observed feedback [9,43]. XING contains users' interactions with items (i.e., job postings) such as click, bookmark, reply and delete. We take the nonnegative feedback (i.e., click, bookmark and reply) as observed relatively positive feedback, and then take the 5000 most active users and 5000 most popular items in order to construct a subset of XING. Finally, we have 575,281 observations from 6040 users and 3952 items in MovieLens1M, 155,872 observations from 5000 users and 5000 items in Netflix, and 78,294 observations from 5000 users and 5000 items in XING. The densities are 575,281/6040/3952 = 2.41%, 155,872/5000/5000 = 0.62% and 78,294/5000/5000 = 0.31% for MovieLens1M, Netflix and XING, respectively.

In our experiments, we randomly take 50% of the observed feedback as training data and the remaining 50% as test data. We repeat this 30 times to generate 30 copies of training data and test data, and report the average performance on those 30 copies of data. Note that we construct validation data from the training data via sampling one (user, item) pair per user on average, which is used for parameter searching. Note that the first two data sets, i.e., MovieLens1M and Netflix, are exactly the same as those of [30], which are used for a strict and direct comparison with existing methods. The reasons that we used MovieLens1M instead of the small data set MovieLens100K in our conference paper [29] are twofold: (1) we did not have a validation set for parameter tuning in the conference paper [29], and (2) we hope to conduct a direct comparative study between the proposed method and the existing methods on MovieLens1M. The source code of CoFiSet and the three data sets used in the experiments can be found at https://sites.google.com/site/weikep/cofiset.
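The reported densities follow directly from the statistics above, e.g., for MovieLens1M:

# Density = #observations / (#users x #items); for MovieLens1M:
print(575281 / (6040 * 3952))   # ~0.0241, i.e., 2.41%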

3.2 Evaluation metrics

Once we have learned the model parameters, Θ = {U_u·, V_i·, b_i, u ∈ U^tr, i ∈ I^tr}, we can calculate the prediction score for user u on item i, r_ui = b_i + U_u· V_i·^T, and then get a ranking list, i(1), ..., i(ℓ), ..., i(k), ..., where i(ℓ) represents the item located at position ℓ. Note that we recommend items for user u from the whole set of unobserved items I^tr \ I^tr_u, and use I^te_u as the ground truth.

We study the recommendation performance on five top-k ranking-oriented evaluation metrics [4,37], including precision, recall, F1, normalized discounted cumulative gain (NDCG) and 1-call, i.e., Pre@k, Rec@k, F1@k, NDCG@k and 1-call@k.

1. Pre@k The precision of user u is defined as,

Pre_u@k = \frac{1}{k} \sum_{\ell=1}^{k} \delta(i(\ell) \in I^{te}_u),   (47)


where δ(x) = 1 if x is true and δ(x) = 0 otherwise. \sum_{\ell=1}^{k} \delta(i(\ell) \in I^{te}_u) thus denotes the number of items among the top-k recommended items that have observed feedback from user u. Then, we have Pre@k = \sum_{u \in U^{te}} Pre_u@k / |U^{te}|.

2. Rec@k The recall of user u is defined as,

Rec_u@k = \frac{1}{|I^{te}_u|} \sum_{\ell=1}^{k} \delta(i(\ell) \in I^{te}_u),   (48)

which means how many relevant items are recommended in the top-k list. Then, we have Rec@k = \sum_{u \in U^{te}} Rec_u@k / |U^{te}|.

3. F1@k The F1 score of user u is defined based on the precision and recall,

F1_u@k = \frac{2 \times Pre_u@k \times Rec_u@k}{Pre_u@k + Rec_u@k}.   (49)

Then, we have F1@k = \sum_{u \in U^{te}} F1_u@k / |U^{te}|.

4. NDCG@k The NDCG of user u is defined as,

NDCG_u@k = \frac{1}{Z_u} DCG_u@k,   (50)

where DCG_u@k = \sum_{\ell=1}^{k} \frac{2^{\delta(i(\ell) \in I^{te}_u)} - 1}{\log(\ell + 1)} and Z_u is the best DCG_u@k score with I^te_u on top of the list. Then, we have NDCG@k = \sum_{u \in U^{te}} NDCG_u@k / |U^{te}|.

5. 1-call@k The 1-call of user u is defined as,

1-call_u@k = \delta\Big(\sum_{\ell=1}^{k} \delta(i(\ell) \in I^{te}_u) \geq 1\Big),   (51)

which means whether there is at least one relevant item among the top-k recommended items. Then, we have 1-call@k = \sum_{u \in U^{te}} 1-call_u@k / |U^{te}|.
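A minimal Python sketch (ours) of the per-user metrics in Eqs. (47)–(51); log base 2 is assumed for the DCG denominator, and the per-user values are then averaged over U^te as described above:

import numpy as np

def metrics_at_k(ranked_items, test_items, k):
    # ranked_items: recommendation list for one user; test_items: set I^te_u of ground-truth items.
    hits = [1.0 if ranked_items[l] in test_items else 0.0 for l in range(k)]
    prec = sum(hits) / k                                            # Eq. (47)
    rec = sum(hits) / len(test_items)                               # Eq. (48)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0   # Eq. (49)
    dcg = sum((2 ** h - 1) / np.log2(l + 2) for l, h in enumerate(hits))
    z = sum(1.0 / np.log2(l + 2) for l in range(min(k, len(test_items))))  # best possible DCG@k
    ndcg = dcg / z if z > 0 else 0.0                                # Eq. (50)
    one_call = 1.0 if sum(hits) >= 1 else 0.0                       # Eq. (51)
    return prec, rec, f1, ndcg, one_call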

3.3 Baselines and parameter settings

We compare the four variants of CoFiSet with three state-of-the-art methods, including Bayesian personalized ranking with matrix factorization (BPRMF) [35], association rule mining (ARM) [2] and probabilistic matrix factorization (PMF) [38]. We also compare them with a commonly used baseline, i.e., PopRank (ranking via popularity of the items) [41].

For fair comparison, we implement BPRMF and CoFiSet in the same code framework written in Java and use the same initializations for the model variables [32],

U_{uf} = (r - 0.5) \times 0.01, \; f = 1, \ldots, d,   (52)

V_{if} = (r - 0.5) \times 0.01, \; f = 1, \ldots, d,   (53)

b_i = \sum_{u \in U^{tr}_i} 1/n - \sum_{(u,i) \in R^{tr}} 1/n/m,   (54)

where r (0 ≤ r < 1) is a random value. Note that random restarts are usually not adopted in SGD-based algorithms, particularly in factorization-based methods for collaborative filtering. The order of updating the variables in each iteration is also the same as that shown in Fig. 2. Note that we can use the initialization of the item bias, b_i, to rank the items, which is actually PopRank [41].
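To illustrate the last remark, the item-bias initialization of Eq. (54) already induces a popularity ranking; a small sketch (ours):

import numpy as np

def poprank_from_bias(R_tr, n, m):
    # Eq. (54): item bias = popularity of item i minus a global constant; ranking by it is PopRank.
    counts = np.zeros(m)
    for (_, i) in R_tr:
        counts[i] += 1
    b = counts / n - len(R_tr) / (n * m)
    return np.argsort(-b)          # items ordered from most to least popular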


For the trade-off parameters in all methods, we search for the best values from α_u = α_v = β_v ∈ {0.001, 0.01, 0.1} using the performance on NDCG@5 of the validation data of the first copy of each data set, and then fix them for the remaining 29 copies. For the iteration number, we fix T = 10^5 in CoFiSet and BPRMF for sufficient training and search it from T ∈ {100, 500, 1000} in PMF. For the number of latent features, we use d = 20 [29]. We find that the best values of the trade-off parameters for different models on different data can be different, which are reported in Tables 3, 4 and 5 for reference. The learning rate is fixed as γ = 0.01, similar to that of [18]. For ARM, we include all possible rules. That is, we set the threshold of confidence as 0 and the threshold of the number of common users (i.e., support) as 1, because of the sparsity of the data.

For PMF, we take the value of observed feedback as 1s and randomly sample 3|I^tr| unobserved feedback as 0s in each iteration, and then apply the commonly used stochastic gradient descent algorithm to learn the model parameters. For CoFiSet, we first fix |P| = |A| = 3, and then try different values in {1, 2, 3, 4, 5}. In the case that there is not enough observed feedback in I^tr_u, we use P = I^tr_u.

3.4 Overall recommendation performance

The recommendation performance of the four variants of our CoFiSet and the baselines is shown in Tables 3, 4 and 5, from which we can have the following observations: (1) for two of the three data sets, i.e., MovieLens1M and Netflix, our CoFiSet(MSO) achieves significantly better recommendation performance than all the four baselines regarding all the evaluation metrics;^4 for XING, our CoFiSet(MSO) performs consistently better than the baselines. The result demonstrates the superior recommendation ability of CoFiSet(MSO), especially on the two relatively dense data sets MovieLens1M and Netflix. (2) For the baselines, we can see that all algorithms beat PopRank, which shows the effectiveness of the pointwise preference assumption in PMF and the pairwise assumption in BPRMF, though their performance is still worse than that of our CoFiSet(MSO). (3) Among the baselines, BPRMF is the best in all cases, which shows the effectiveness of modeling implicit feedback via the pairwise assumption. (4) Among the four variants of our CoFiSet, the overall performance ordering is CoFiSet(MSO) > CoFiSet(MOO), CoFiSet(SS) > CoFiSet(MOS), which shows the effectiveness of different pairwise comparisons. This phenomenon can be explained by the fact that there is a higher chance to have inconsistent preferences on items from item-set A than from item-set P. Specifically, the chance that a user likes some observed items and dislikes some other observed items should be lower than the chance that a user likes some unobserved items and dislikes some other unobserved items.

3.5 Top-k recommendation performance

We further study the top-k recommendation results with different values of k, which are shown in Fig. 3. Note that improvements on top-k related metrics are known to be critical for a real recommender system, since users usually only check a few items which are ranked in top positions [5]. We can see that CoFiSet(MSO) is much better than all other baselines on NDCG@k with different values of k ∈ {1, 2, 3, 4, 5}, which again shows the effectiveness of our proposed assumption and the corresponding algorithm.

4 We used a two-sample t-test for the statistical significance test and calculated the p value according to the thirty result scores of two compared methods on the thirty copies of the data sets via the MATLAB function ttest2.m, as shown at http://www.mathworks.com/help/stats/ttest2.html.


Table 3 Recommendation performance of four variants of our CoFiSet and other algorithms on MovieLens1M

Algorithm | Trade. para. | Pre@5 | Rec@5 | F1@5 | NDCG@5 | 1-call@5
PopRank | – | 0.2809±0.0023 | 0.0406±0.0006 | 0.0633±0.0008 | 0.2916±0.0021 | 0.6691±0.0061
ARM | – | 0.3571±0.0054 | 0.0551±0.0010 | 0.0855±0.0015 | 0.3756±0.0069 | 0.7593±0.0065
PMF | 0.01 | 0.4177±0.0035 | 0.0716±0.0010 | 0.1096±0.0014 | 0.4302±0.0034 | 0.8461±0.0036
BPRMF | 0.01 | 0.4410±0.0025 | 0.0753±0.0007 | 0.1146±0.0010 | 0.4544±0.0028 | 0.8539±0.0042
CoFiSet(SS) | 0.001 | 0.4466±0.0021 | 0.0756±0.0006 | 0.1154±0.0008 | 0.4624±0.0023 | 0.8537±0.0041
CoFiSet(MOO) | 0.01 | 0.4562±0.0026 | 0.0782±0.0009 | 0.1190±0.0011 | 0.4707±0.0029 | 0.8636±0.0043
CoFiSet(MOS) | 0.01 | 0.4114±0.0031 | 0.0675±0.0009 | 0.1036±0.0012 | 0.4257±0.0035 | 0.8237±0.0043
CoFiSet(MSO) | 0.01 | 0.4789±0.0023 | 0.0830±0.0007 | 0.1261±0.0009 | 0.4977±0.0021 | 0.8810±0.0035

The significantly best results in comparison with the four baselines are marked in bold (p value < 0.01)


Table 4 Recommendation performance of four variants of our CoFiSet and other algorithms on Netflix

Algorithm | Trade. para. | Pre@5 | Rec@5 | F1@5 | NDCG@5 | 1-call@5
PopRank | – | 0.1703±0.0018 | 0.0562±0.0013 | 0.0679±0.0010 | 0.1779±0.0017 | 0.4455±0.0065
ARM | – | 0.2004±0.0030 | 0.0738±0.0035 | 0.0853±0.0024 | 0.2171±0.0040 | 0.5161±0.0101
PMF | 0.01 | 0.1960±0.0029 | 0.0930±0.0031 | 0.0985±0.0023 | 0.2123±0.0041 | 0.5566±0.0072
BPRMF | 0.01 | 0.2347±0.0026 | 0.1033±0.0035 | 0.1107±0.0024 | 0.2570±0.0034 | 0.5857±0.0078
CoFiSet(SS) | 0.001 | 0.2231±0.0037 | 0.0931±0.0037 | 0.1016±0.0028 | 0.2424±0.0049 | 0.5619±0.0090
CoFiSet(MOO) | 0.01 | 0.2423±0.0022 | 0.1058±0.0033 | 0.1138±0.0023 | 0.2649±0.0031 | 0.5963±0.0075
CoFiSet(MOS) | 0.001 | 0.1839±0.0035 | 0.0843±0.0027 | 0.0889±0.0022 | 0.1997±0.0046 | 0.5158±0.0068
CoFiSet(MSO) | 0.01 | 0.2571±0.0024 | 0.1107±0.0039 | 0.1200±0.0027 | 0.2829±0.0040 | 0.6141±0.0088

The significantly best results in comparison with the four baselines are marked in bold (p value < 0.01)


Table 5 Recommendation performance of four variants of our CoFiSet and other algorithms on XING

Algorithm | Trade. para. | Pre@5 | Rec@5 | F1@5 | NDCG@5 | 1-call@5
PopRank | – | 0.0253±0.0010 | 0.0123±0.0009 | 0.0138±0.0009 | 0.0282±0.0011 | 0.1094±0.0049
ARM | – | 0.0767±0.0018 | 0.0337±0.0015 | 0.0379±0.0012 | 0.0842±0.0023 | 0.2339±0.0057
PMF | 0.01 | 0.0673±0.0017 | 0.0299±0.0017 | 0.0334±0.0013 | 0.0723±0.0022 | 0.2130±0.0051
BPRMF | 0.1 | 0.0796±0.0018 | 0.0393±0.0016 | 0.0422±0.0013 | 0.0879±0.0022 | 0.2549±0.0062
CoFiSet(SS) | 0.01 | 0.0787±0.0014 | 0.0378±0.0016 | 0.0411±0.0010 | 0.0868±0.0019 | 0.2494±0.0047
CoFiSet(MOO) | 0.1 | 0.0623±0.0030 | 0.0309±0.0013 | 0.0326±0.0011 | 0.0686±0.0028 | 0.2078±0.0047
CoFiSet(MOS) | 0.01 | 0.0746±0.0017 | 0.0358±0.0018 | 0.0387±0.0013 | 0.0815±0.0022 | 0.2369±0.0056
CoFiSet(MSO) | 0.01 | 0.0828±0.0020 | 0.0396±0.0017 | 0.0432±0.0015 | 0.0910±0.0026 | 0.2578±0.0060

The significantly best results in comparison with the four baselines are marked in bold (p value < 0.01)


Fig. 3 Top-k recommendation performance: NDCG@k for k = 1, ..., 5 of PopRank, ARM, PMF, BPRMF and CoFiSet(MSO) on MovieLens1M, Netflix and XING

Fig. 4 Recommendation performance (NDCG@5) of CoFiSet(MSO) with different configurations, (|P|, |A|), of item-set P and item-set A on MovieLens1M, Netflix and XING, comparing CoFiSet(MSO) with |A| = 1 (configurations (2,1), (3,1), (4,1), (5,1)) against CoFiSet(MSO) with |P| = |A| (configurations (2,2), (3,3), (4,4), (5,5))

Fig. 5 The CPU time of training CoFiSet(MSO) with different values of |P| and |A| and of BPRMF on MovieLens1M (x-axis: |A| ∈ {1, ..., 5}; y-axis: time in seconds; one curve per |P| ∈ {1, ..., 5} plus BPRMF). We fix T = 10^5 and d = 20. The experiments are conducted on Linux machines (Ubuntu 4.9.1-16ubuntu6) with Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz (1-CPU/4-core)/16 GB RAM

3.6 Impact and time cost with different sizes of item-sets

In order to gain a deeper understanding of the performance of CoFiSet(MSO), we study its performance with different sizes of item-sets, which is shown in Fig. 4. We have the following observations: (1) the performance of CoFiSet(MSO) with different configurations, i.e., |P| = |A| ∈ {2, 3, 4, 5}, is much better than that of |P| = |A| = 1 (i.e., BPRMF). We can thus see that the pairwise comparison of "Set vs. One" is effective in digesting the implicit data; and (2) the performance of CoFiSet(MSO) is usually better than that of CoFiSet(MSO) with |A| = 1, which clearly shows the improvement from "Many 'Set vs. One'." As for a configuration between the two curves, e.g., (3, 2) for |P| = 3, |A| = 2, the performance is likely to be between that of (3, 1) and (3, 3).

We also study the efficiency of CoFiSet(MSO) with different values of |P| and |A|, which is shown in Fig. 5. We can see that (1) the time cost is almost linear w.r.t. the value of |A| for a given |P|, and (2) CoFiSet(MSO) is very efficient, since both CoFiSet(MSO) and BPRMF are of the same order of CPU time. This result is consistent with the time complexity of CoFiSet(MSO) and BPRMF in Sect. 2.6.6.


4 Related work

Personalized recommendation is a ubiquitous smart computing service, which has attracted enormous interest from both researchers and practitioners. In this section, we discuss some closely related algorithms on personalized recommendation with implicit feedback. The task of recommendation with implicit feedback aims to provide a personalized ranking list for each user via exploiting the users' implicit feedback, e.g., users' browsing or clicking records.

CLiMF (collaborative less-is-more filtering) [41] proposes to encourage self-competitions among observed items only via maximizing \sum_{i \in I^{tr}_u} [\ln\sigma(r_{ui}) + \sum_{i' \in I^{tr}_u \setminus \{i\}} \ln\sigma(r_{ui} - r_{ui'})] for each user u. The unobserved items from I^tr \ I^tr_u are ignored, which may miss some information during model training.

iMF (implicit matrix factorization) [14] and OCCF (one-class collaborative filtering) [27] propose to minimize \sum_{i \in I^{tr}_u} c_{ui}(1 - r_{ui})^2 + \sum_{j \in I^{tr} \setminus I^{tr}_u} c_{uj}(0 - r_{uj})^2 for each user u, where c_ui and c_uj are estimated confidence values [14,27]. We can see that this objective function is based on pointwise preferences on items, which is empirically shown to be less competitive than pairwise preferences [35]. ZaOV (Zeros as Optimization Variables) [43] proposes to learn users' preferences on unobserved items, r_uj, and to minimize \sum_{i \in I^{tr}_u} c_{ui}(1 - \hat{r}_{ui})^2 + \sum_{j \in I^{tr} \setminus I^{tr}_u} c_{uj}(r_{uj} - \hat{r}_{uj})^2 simultaneously.

SLIM (sparse linear methods) [26] treats each item independently and learns the (item, item) similarity s_{i'i} via minimizing \sum_{u \in U^{tr}_i} (1 - \sum_{i' \in I^{tr}_u} s_{i'i})^2 + \sum_{u \in U^{tr} \setminus U^{tr}_i} (0 - \sum_{i' \in I^{tr}_u} s_{i'i})^2 + \frac{\beta}{2}\sum_{i'=1}^{m} s_{i'i}^2 + \lambda \sum_{i'=1}^{m} s_{i'i} for each item i, with the nonnegative constraints s_{i'i} ≥ 0 and the hard constraint s_{ii} = 0. The idea of learning similarity measures is interesting, while the pointwise assumption is the same as that of iMF [14] and OCCF [27]. FISM (factored item similarity-based method) [16] extends SLIM [26] without the nonnegative constraints and learns the (item, item) similarity via latent variables, and proposes two variants, FISMrmse and FISMauc, based on pointwise and pairwise preference assumptions, respectively.

BPRMF (Bayesian personalized ranking with matrix factorization) [35] proposes a relaxed assumption of pairwise preferences over items and minimizes \sum_{i \in I^{tr}_u} \sum_{j \in I^{tr} \setminus I^{tr}_u} -\ln\sigma(r_{uij}) for each user u. The difference between user u's preferences on items i and j, r_{uij} = r_{ui} - r_{uj}, is a special case of that in our proposed CoFiSet. In some recommender systems, such as LinkedIn, a user u may click more than one social update (or item) in one single impression (or session), and PLMF (pairwise learning via matrix factorization) [13] adopts a similar idea to [35] and minimizes \frac{1}{|O^+_{us}||O^-_{us}|}\sum_{i \in O^+_{us}}\sum_{j \in O^-_{us}} -\sigma(r_{uij}), where O^+_{us} and O^-_{us} are the sets of clicked and un-clicked items, respectively, by user u in session s. We can see that the pairwise preference r_uij is also defined on clicked and un-clicked items instead of item-sets as used in our CoFiSet.

RankALS (ranking-based alternating least squares) [46] adopts a square loss and minimizes \sum_{i \in I^{tr}_u} \sum_{j \in I^{tr} \setminus I^{tr}_u} \frac{1}{2}(r_{uij} - 1)^2 for each user, where r_uij is again the user u's pairwise preference over items i and j. RankALS is actually motivated by incorporating the preference difference on two items [15], r_ui − r_uj = 1, into the ALS (alternating least squares) algorithmic framework [14] and optimizes a slightly different objective function with some weighting strategies in the loss function.

CCF(SoftMax) [48] assumes that there is a candidate set O_ui for each observed pair (u, i), which can be formally written as O_ui = {i} ∪ A. CCF(SoftMax) models the data as a competitive game and proposes to minimize -\ln\frac{\exp(r_{ui})}{\exp(r_{ui}) + \sum_{j \in A}\exp(r_{uj})} = \ln[1 + \sum_{j \in A}\exp(-r_{uij})] for each observed pair (u, i), where r_uij = r_ui − r_uj. We can see that CCF(SoftMax) defines the loss function on pairwise preferences over items instead of item-sets, which is thus different from the focus of our CoFiSet. Note that when A = {j}, the loss function of CCF(SoftMax) reduces to that of BPRMF [35], which is a special case of CoFiSet.


Table 6 Summary of CoFiSet and some related works on personalized recommendation with implicit feedback. Note that i, i' ∈ I^tr_u, j ∈ I^tr \ I^tr_u, P ⊆ I^tr_u, and A ⊆ I^tr \ I^tr_u. The relationship "x vs. y" denotes encouragement of competition between x and y, "x − y = c" means that the difference between a user's preference on x and the one on y is a constant, and "x ≻ y" means that a user prefers x to y. "Batch" and "SGD" denote batch style algorithms and SGD (stochastic gradient descent) style algorithms, respectively

Preference | Type/assumption | Comparison | Algorithm | Style
Self-comp. | i vs. i' | One vs. one | CLiMF [41] | Batch
Pointwise | i: like, j: dislike | w/o competition | iMF [14] | Batch
Pointwise | i: like, j: dislike | w/o competition | OCCF [27] | Batch
Pointwise | i: like, j: dislike | w/o competition | SLIM [26] | Batch
Pointwise | i: like, j: dislike | w/o competition | FISMrmse [16] | SGD
Pointwise | i: like, j: learned | w/o competition | ZaOV [43] | Batch
Pairwise | i − j = c | One vs. one | RankALS [46] | Batch
Pairwise | i − j = c | One vs. one | SVD(ranking) [15] | SGD
Pairwise | i − j = c | One vs. one | FISMauc [16] | SGD
Pairwise | i ≻ j | One vs. one | BPRMF [35] | SGD
Pairwise | i ≻ j | One vs. one | PLMF [13] | SGD
Pairwise | i ≻ j, j ∈ A | Many "one vs. one" | CCF(SoftMax) [48] | SGD
Pairwise | i ≻ A | One vs. set | CCF(Hinge) [48] | SGD
Pairwise | P ≻ A | Set vs. set | CoFiSet(SS) [29] | SGD
Pairwise | i ≻ j, i ∈ P, j ∈ A | Many "one vs. one" | CoFiSet(MOO) | SGD
Pairwise | i ≻ A, i ∈ P | Many "one vs. set" | CoFiSet(MOS) | SGD
Pairwise | P ≻ j, j ∈ A | Many "set vs. one" | CoFiSet(MSO) | SGD

CCF(Hinge) [48] adopts a hinge loss over an item i and an item-set O_ui \ {i} = A for each observed pair (u, i) and minimizes max(0, 1 − r_uiA), where r_uiA = r_ui − r_uA with r_uA = \sum_{j \in A} r_{uj} / |A|. We can see that the loss function of CCF(Hinge) [48] can be considered as a special case of our CoFiSet(SS) when P = {i}. Note that in both CCF(Hinge) [48] and CCF(SoftMax) [48], item i is considered as a preferred or chosen one given a candidate set {i} ∪ A, which is motivated from industry recommender systems with impression data (i.e., {i} ∪ A) as users' choice context.

The above related works can also be categorized from the perspective of preference types or assumptions, which are summarized in Table 6. From Table 6 and our discussions above, we can see that (1) our method CoFiSet is essentially different from these related algorithms, since it is based on a new assumption of pairwise preferences over item-sets, and (2) the works that are most closely related to ours are BPRMF [35], CCF(SoftMax) [48] and PLMF [13], because they also adopt pairwise preference assumptions, exponential family functions in loss functions and SGD (stochastic gradient descent) style algorithms (instead of batch style algorithms).

Very recently, there have also been some works on personalized recommendation with heterogeneous feedback, such as explicit numerical ratings and implicit examinations [33], explicit transaction records and implicit browsing records [31], etc.


5 Conclusions and future work

In this paper, we study an important smart computing problem of preference learningfor personalized recommendations. We propose a novel algorithm, i.e., CoFiSet, for solv-ing personalized recommendation with implicit feedback. Specifically, we propose a newassumption, pairwise preferences over item-sets, which is more relaxed than pairwise pref-erences over items as proposed in recent works [35]. With this assumption, we developa general algorithm with four variants, CoFiSet(SS), CoFiSet(MOO), CoFiSet(MOS) andCoFiSet(MSO), which represent “Set vs. Set,” “Many ‘One vs. One’,” “Many ‘One vs. Set”’and “Many ‘Set vs. One”’ pairwise comparisons, respectively. We discuss the relationshipbetween CoFiSet and some recently proposed algorithms such as BPRMF [35], CCF(Hinge)[48] and RankALS [46], and find that CoFiSet is able to embody them as special cases.

We then study CoFiSet on three real-world data sets using five ranking-oriented evaluation metrics and observe that CoFiSet(MSO) generates better recommendation results than the state-of-the-art methods.

Our proposed pairwise preference assumption over two item-sets is likely to be more accurate than that defined on two items, because our assumption is more relaxed. However, the aggregate functions defined on the randomly constructed item-sets may not be appropriate when the items are associated with different profiles or contexts. Hence, for future work, we are mainly interested in extending CoFiSet in two aspects: (1) studying item-set selection strategies that incorporate taxonomy information [17] or cluster structure [50], because items' relationships can help refine the item-set construction procedure, and (2) studying different forms (e.g., maximum, minimum, median and weighted linear combination) of users' aggregated preferences on item-sets, and embedding them in more preference handling algorithms such as factorization methods with learned similarities [16,26].
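As a pointer to the second direction, the snippet below sketches the candidate aggregate functions mentioned above (maximum, minimum, median and a weighted linear combination) for a user's preference on an item-set; the function name and interface are hypothetical and are not part of the current CoFiSet algorithm.

    import numpy as np

    def aggregate(scores, how="mean", weights=None):
        # Candidate aggregate functions for a user's preference on an item-set.
        scores = np.asarray(scores, dtype=float)
        if how == "mean":
            return float(scores.mean())
        if how == "max":
            return float(scores.max())
        if how == "min":
            return float(scores.min())
        if how == "median":
            return float(np.median(scores))
        if how == "weighted":  # weighted linear combination
            return float(np.dot(np.asarray(weights, dtype=float), scores))
        raise ValueError("unknown aggregate function")

    set_scores = [0.9, 0.1, 0.4]
    print([aggregate(set_scores, h) for h in ("mean", "max", "min", "median")])
    print(aggregate(set_scores, "weighted", weights=[0.6, 0.1, 0.3]))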

Acknowledgements We thank the support of Hong Kong RGC under the Project RGC/HKBU12200415, and the Natural Science Foundation of China under Nos. 61272365, 61502307 and 61672358.

References

1. Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17(6):734–749

2. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22(2):207–216

3. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th conference on uncertainty in artificial intelligence, UAI '98, pp 43–52

4. Chen H, Karger DR (2006) Less is more: probabilistic models for retrieving fewer relevant documents. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR '06, pp 429–436

5. Chen L, Pu P (2011) Users' eye gaze pattern in organization-based recommender interfaces. In: Proceedings of the 16th international conference on intelligent user interfaces, IUI '11, pp 311–314

6. Chen T, Tang L, Liu Q, Yang D, Xie S, Cao X, Wu C, Yao E, Liu Z, Jiang Z, Chen C, Kong W, Yu Y (2012) Combining factorization model and additive forest for collaborative followee recommendation. In: Proceedings of the KDD cup 2012 workshop, KDDCUP '12

7. Davidson J, Liebald B, Liu J, Nandy P, Van Vleet T, Gargi U, Gupta S, He Y, Lambert M, Livingston B, Sampath D (2010) The YouTube video recommendation system. In: Proceedings of the 4th ACM conference on recommender systems, RecSys '10, pp 293–296

8. Deshpande M, Karypis G (2004) Item-based top-n recommendation algorithms. ACM Trans Inf Syst 22(1):143–177

9. Du L, Li X, Shen Y-D (2011) User graph regularized pairwise matrix factorization for item recommendation. In: Proceedings of the 7th international conference on advanced data mining and applications—volume part II, ADMA '11, pp 372–385


10. Goldberg D, Nichols D, Oki BM, Terry D (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35(12):61–70

11. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. Morgan Kaufmann, Burlington

12. Hofmann T (2004) Latent semantic models for collaborative filtering. ACM Trans Inf Syst 22(1):89–115

13. Hong L, Bekkerman R, Adler J, Davison BD (2012) Learning to rank social update streams. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval, SIGIR '12, pp 651–660

14. Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: Proceedings of the 8th IEEE international conference on data mining, ICDM '08, pp 263–272

15. Jahrer M, Toscher A (2011) Collaborative filtering ensemble for ranking. In: Proceedings of the KDD cup 2011 workshop, KDDCUP '11

16. Kabbur S, Ning X, Karypis G (2013) FISM: factored item similarity models for top-n recommender systems. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '13, pp 659–667

17. Koenigstein N, Dror G, Koren Y (2011) Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy. In: Proceedings of the 5th ACM conference on recommender systems, RecSys '11, pp 165–172

18. Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '08, pp 426–434

19. Koren Y (2010) Factor in the neighbors: scalable and accurate collaborative filtering. ACM Trans Knowl Discov Data 4(1):1:1–1:24

20. Li B, Yang Q, Xue X (2009) Can movies and books collaborate?: cross-domain collaborative filtering for sparsity reduction. In: Proceedings of the 21st international joint conference on artificial intelligence, IJCAI '09, pp 2052–2057

21. Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput 7(1):76–80

22. Liu M, Zeng Z, Pan W, Peng X, Shan Z, Ming Z (2016) Hybrid one-class collaborative filtering for job recommendation. In: Proceedings of the international conference on smart computing and communication, SmartCom '16, pp 250–259

23. Liu NN, Zhao M, Yang Q (2009) Probabilistic latent preference analysis for collaborative filtering. In: Proceedings of the 18th ACM conference on information and knowledge management, CIKM '09, pp 759–766

24. Ma H, King I, Lyu MR (2011) Learning to recommend with explicit and implicit social relations. ACM Trans Intell Syst Technol 2(3):29:1–29:19

25. Nichols DM (1997) Implicit rating and filtering. In: Proceedings of the 5th DELOS workshop on filtering and collaborative filtering

26. Ning X, Karypis G (2011) SLIM: sparse linear methods for top-n recommender systems. In: Proceedings of the 2011 IEEE 11th international conference on data mining, ICDM '11, pp 497–506

27. Pan R, Zhou Y, Cao B, Liu NN, Lukose R, Scholz M, Yang Q (2008) One-class collaborative filtering. In: Proceedings of the 8th IEEE international conference on data mining, ICDM '08, pp 502–511

28. Pan W (2016) A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing 177:447–453

29. Pan W, Chen L (2013) CoFiSet: collaborative filtering via learning pairwise preferences over item-sets. In: Proceedings of the 13th SIAM international conference on data mining, SDM '13, pp 180–188

30. Pan W, Chen L (2013) GBPR: group preference based Bayesian personalized ranking for one-class collaborative filtering. In: Proceedings of the 23rd international joint conference on artificial intelligence, IJCAI '13, pp 2691–2697

31. Pan W, Liu M, Ming Z (2016) Transfer learning for heterogeneous one-class collaborative filtering. IEEE Intell Syst 31(4):43–49

32. Pan W, Xiang EW, Yang Q (2012) Transfer learning in collaborative filtering with uncertain ratings. In: Proceedings of the 26th AAAI conference on artificial intelligence, AAAI '12, pp 662–668

33. Pan W, Yang Q, Duan Y, Ming Z (2016) Transfer learning for semisupervised collaborative recommendation. ACM Trans Interact Intell Syst 6(2):10:1–10:21

34. Rendle S (2012) Factorization machines with libFM. ACM Trans Intell Syst Technol 3(3):57:1–57:22

35. Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L (2009) BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th conference on uncertainty in artificial intelligence, UAI '09, pp 452–461


36. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work, CSCW '94, pp 175–186

37. Ricci F, Rokach L, Shapira B (2015) Recommender systems handbook, 2nd edn. Springer, New York

38. Salakhutdinov R, Mnih A (2008) Probabilistic matrix factorization. In: Annual conference on neural information processing systems 20, NIPS '08, pp 1257–1264

39. Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on world wide web, WWW '01, pp 285–295

40. Shardanand U, Maes P (1995) Social information filtering: algorithms for automating "word of mouth". In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI '95, pp 210–217

41. Shi Y, Karatzoglou A, Baltrunas L, Larson M, Oliver N, Hanjalic A (2012) CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering. In: Proceedings of the 6th ACM conference on recommender systems, RecSys '12, pp 139–146

42. Si L, Jin R (2003) Flexible mixture model for collaborative filtering. In: Proceedings of the 20th international conference on machine learning, ICML '03, pp 704–711

43. Sindhwani V, Bucak SS, Hu J, Mojsilovic A (2009) A family of non-negative matrix factorizations for one-class collaborative filtering. In: The 1st international workshop on recommendation-based industrial applications held in the 3rd ACM conference on recommender systems, RecSys: RIA '09

44. Srebro N, Rennie JDM, Jaakkola T (2004) Maximum-margin matrix factorization. In: Annual conference on neural information processing systems 16, NIPS '04

45. Su J-H, Wang B-W, Hsiao C-Y, Tseng VS (2010) Personalized rough-set-based recommendation by integrating multiple contents and collaborative information. Inf Sci 180(1):113–131

46. Takács G, Tikk D (2012) Alternating least squares for personalized ranking. In: Proceedings of the 6th ACM conference on recommender systems, RecSys '12, pp 83–90

47. Weimer M, Karatzoglou A, Smola A (2008) Improving maximum margin matrix factorization. Mach Learn 72(3):263–276

48. Yang S-H, Long B, Smola AJ, Zha H, Zheng Z (2011) Collaborative competitive filtering: learning recommender using context of user choice. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, SIGIR '11, pp 295–304

49. Yuan Q, Chen L, Zhao S (2011) Factorization vs. regularization: fusing heterogeneous social relationships in top-n recommendation. In: Proceedings of the 5th ACM conference on recommender systems, RecSys '11, pp 245–252

50. Zhu K, Huang J, Zhong N (2016) Exploiting group pairwise preference influences for recommendations. In: Proceedings of the international joint conference on rough sets, IJCRS '16, pp 429–438

Weike Pan is an Associate Professor in the College of Computer Science and Software Engineering at Shenzhen University, China. He received a Ph.D. in computer science and engineering from the Hong Kong University of Science and Technology. His research interests include transfer learning, intelligent recommendation and machine learning.


Li Chen is an Assistant Professor at Hong Kong Baptist University. She obtained her Ph.D. degree in human computer interaction at the Swiss Federal Institute of Technology in Lausanne (EPFL), and Bachelor and Master degrees in computer science at Peking University, China. Her research interests are mainly in the areas of human–computer interaction, user-centered development of recommender systems and e-commerce decision support. She is an ACM senior member and an editorial board member of the User Modeling and User-Adapted Interaction (UMUAI) journal. She has also been serving in a number of journals and conferences as guest editor, co-organizer, (senior) PC member and reviewer.

Zhong Ming is a Professor in the College of Computer Science and Software Engineering at Shenzhen University, China. He received a Ph.D. in computer science and technology from Sun Yat-Sen University, China. His research interests include software engineering and Web intelligence.
