
Decision Support Systems 51 (2011) 519–531


Collaborative error-reflected models for cold-start recommender systems

Heung-Nam Kim a,⁎, Abdulmotaleb El-Saddik a,c, Geun-Sik Jo b

a School of Information Technology and Engineering, University of Ottawa, Canada
b Department of Information Engineering, Inha University, Korea
c Faculty of Engineering, New York University Abu Dhabi, UAE

⁎ Corresponding author. Tel.: +1 613 562 5800x6248; fax: +1 613 562 5664. E-mail address: [email protected] (H.-N. Kim).

0167-9236/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.dss.2011.02.015

Article info

Article history: Received 16 April 2010; Received in revised form 8 December 2010; Accepted 27 February 2011; Available online 4 March 2011

Keywords: Collaborative filtering; Cold start problems; Recommender systems

Abstract

Collaborative Filtering (CF), one of the most successful technologies among recommender systems, is a system assisting users to easily find useful information. One notable challenge in practical CF is the cold start problem, which can be divided into cold start items and cold start users. Traditional CF systems are typically unable to make good quality recommendations in situations where users and items have few opinions. To address these issues, in this paper we propose a unique method of building models derived from explicit ratings, and we apply the models to CF recommender systems. The proposed method first predicts actual ratings and subsequently identifies prediction errors for each user. From this error information, pre-computed models, collectively called the error-reflected model, are built. We then apply the models to new predictions. Experimental results show that our approach obtains significant improvement in dealing with cold start problems, compared to existing work.


1. Introduction

The prevalence of digital devices and the development of Web 2.0 technologies and services enable end-users to be producers as well as consumers of media content. Even in a single day, an enormous amount of content, including digital video, blogging, photography and wikis, is generated on the Web. It is becoming more difficult to automatically recommend to a user what he/she will prefer among those items, not only because of the huge amount of data, but also because of the difficulty of automatically grasping the meanings of such data. Recommender systems, which have emerged in response to these challenges, provide users with recommendations of items that are likely to fit their needs [2].

One of the most successful technologies among recommender systems is Collaborative Filtering (CF). Numerous on-line companies (e.g., Amazon.com, Netflix.com, and Last.fm) apply CF to provide recommendations to their customers. CF has an advantage over content-based filtering: the ability to filter any type of item, such as text, music, videos and photos [11]. Because the filtering process is based only on historical information about whether or not a given target user has preferred an item before, analysis of the actual content itself is not necessarily required. However, despite its success and popularity, CF encounters serious limitations in quality evaluation, namely the cold start problems.

The cold start problems, which can be divided into cold start items and cold start users, occur when available data is insufficient [2]. A cold start item is caused by a new item. In a CF-based recommender system, an item cannot be recommended until a number of users have previously rated it. This is known as the cold start item problem [27]. Such an item is hardly ever recommended to users due to insufficient user opinions.

Another notable challenge in recommender systems is the cold start user problem [1,30,31]. A cold start user is a new user who has joined a CF-based recommender system and has presented few opinions. In this situation, it is often the case that there is no intersection at all between two users, and hence the similarity is not computable. Even when the computation of similarity is possible, it may not be very reliable because of the insufficient information processed. Accordingly, the system is generally unable to make high quality recommendations [2].

These problems, particularly cold start items, can be partially alleviated by content-based technologies, because they provide recommendations by comparing the properties or content of an item to those of the items a user is interested in. Therefore, a number of studies have attempted to incorporate content-based techniques into collaborative filtering [15,20,27]. Although such systems show promise of overcoming the problems, their main drawback is that the filtering processes generally depend on the type of items (e.g., articles, images, music, and videos); consequently, a system working in a particular application domain cannot be directly applied to different domains without modification. Moreover, for some domains it is hard to automatically analyze the underlying content, and a user's interest cannot always be characterized by the content properties of an item [6,7].


In this paper, we address the above issues by introducing a unique method of building models that can be applied to CF recommender systems. Our aim is to build a recommender system derived from explicit ratings so that it is flexible for any type of item. The proposed method is divided into two phases: an offline phase and an online phase. The offline phase is a pre-computed model building phase in which most tasks can be conducted. The online phase is a prediction or recommendation phase in which the models are used. In the model building phase, we first determine pre-predicted ratings and subsequently identify the pre-prediction errors for each user. From this error information, error-reflected models are built. The error-reflected models, which reflect the average pre-prediction errors of user neighbors and of item neighbors, can make accurate predictions in situations where users or items have few opinions. In addition, in order to reduce re-building tasks, the error-reflected models are designed so that the models are effectively updated and users' new opinions are incrementally reflected, even when users present new rating feedback.

The subsequent sections are organized as follows: Section 2 briefly discusses previous studies related to collaborative filtering. In Section 3, we describe our method of building models in detail. In Section 4, we provide a description of how the system uses the models for item predictions. In Section 5, an experimental evaluation is presented comparing our approach with existing work. Finally, we present the conclusions and future work.

2. Background and preliminaries

In this section, we briefly explain the concepts used in our research on recommender systems, especially those related to user-based CF and item-based CF. CF is based on the fact that "word of mouth" opinions of other people have considerable influence on buyers' decision making [14,28]. If advisors have preferences similar to the buyer's, the buyer is much more likely to be affected by their opinions. The most common way to obtain users' opinions is to use rating information given explicitly or observed implicitly [6,12]. In the case of explicit ratings, which indicate how relevant or interesting a specific item is to a user, users are required to explicitly evaluate an item using like/dislike, thumbs up/down (a binary scale), or numerical values (e.g., a scale of 1–5 points). In the case of implicit ratings, on the other hand, information observed implicitly from users' behaviors is treated as a preference indicator, e.g., a user viewed, accessed, listened to, or bought an item [21]. The best use of implicit ratings has been seen at Amazon.com, where a user's past purchased items are used to make product recommendations [18].

Although the field of CF research covers a large number of information filtering problems, in this paper we focus on explicit numerical ratings that can be represented as an m×n user–item rating matrix R [26].

Definition 1. User–item rating matrix, R.

If there is a list of m users U={u1, u2, …, um} and a list of n items I={i1, i2, …, in} with a mapping between the user–item pairs and the explicit ratings, then the m×n user–item data can be represented as a rating matrix. This matrix is called a user–item rating matrix, R. The matrix rows represent users, the columns represent items, and Ru,j represents the rating of user u on item j. Some of the entries are not filled, as there are items not rated by some users.

In matrix R, an element Ru,j either exists as a numerical ordinal scale between Rmin and Rmax, or is empty. If user u rates item j with Rmin, it implies he/she does not have any preference for item j. On the contrary, if user u rates item j with Rmax, it means the item is suited to his/her preference. If user u has not previously rated item j (i.e., a blank element), we assign ∅ to the value of Ru,j (i.e., Ru,j=∅), and ultimately those items (i.e., Ru,*=∅) which have not yet been rated by the target user can be considered for recommendation to the target user.
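As an illustration, the sparse matrix of Definition 1 can be held in memory with a sentinel for the empty value ∅. Below is a minimal sketch (not from the paper) using NaN for unrated entries; the sample rows reuse the Alice and Bob ratings of Table 1.

```python
# Minimal sketch of the user-item rating matrix R of Definition 1.
# NaN plays the role of the "empty" value for items a user has not rated.
import numpy as np

R_MIN, R_MAX = 1, 5  # assumed 1-5 rating scale, as in the MovieLens data

R = np.array([
    [3.0, 5.0, np.nan, 1.0, np.nan, 5.0],    # Alice (Table 1)
    [4.0, 4.0, 5.0,    4.0, 3.0,    np.nan], # Bob   (Table 1)
])

# Items still unrated by user 0 are the candidates for recommendation.
candidates = np.where(np.isnan(R[0]))[0]
print(candidates)  # -> [2 4], i.e., "Titanic" and "AI"
```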

In CF-based recommendation schemes, two approaches have mainly been developed: memory-based CF (also known as user-based CF) and model-based CF [5]. Following the proposal of GroupLens [24], the first system to generate automated recommendations, user-based CF approaches have seen the widest use in recommender systems. User-based CF uses a similarity measurement between neighbors and a target user to learn and predict the target user's preferences for new items or unrated products. However, despite the popularity of user-based CF algorithms, they have some serious problems relating to the increasing computational complexity of recommendations as the number of users and items increases. In addition, sparsity problems due to the insufficiency of users' historical information must be seriously considered [25]. In order to improve the scalability and real-time performance of large applications, a variety of model-based recommendation techniques have been developed. Model-based approaches, such as our algorithm, provide item recommendations by first developing a pre-computed model [13]. In comparison to user-based approaches, model-based approaches are typically faster in terms of recommendation time, though they may have an expensive learning or model building process. A newer class of model-based CF, called item-based CF, has been proposed [9,25] and applied in commercial recommender systems such as Amazon.com [18]. Instead of computing the similarities between users, an item-based CF reviews the set of items that the target user has rated and selects the most similar items based on the similarities between the items.

Usually, user-based and item-based CF systems involve two steps. First, the neighbor group, which consists of users who have a preference similar to the target user (for user-based CF) or the set of items similar to the items rated by the target user (for item-based CF), is determined using one of a variety of similarity computing methods. Based on the group of neighbors, we then obtain prediction values for particular items, estimating how much the target user is likely to prefer them, and the top-N items with the highest predicted values of interest to the target user are identified.

2.1. Neighborhood formation

An important task in CF-based recommendation is neighborhood formation, because different neighbor users or items lead to different recommendations. Here, neighbors simply mean a group of like-minded users similar to a target user, or a set of items similar to those previously identified as being preferred by the target user. The number of neighbors may vary depending on the characteristics of the domain and the application. Since this number also has a significant impact on the quality of the CF results, recommender systems should determine the size of the neighborhood in order to compute the prediction results effectively [26].

2.1.1. User neighborhood

The main goal of neighborhood formation for a user-based CF is to identify the set of user neighbors, defined as the group of users exhibiting preferred items similar to those of the target user. For finding the nearest neighbors, a variety of similarity methods have been researched, such as the widely used Pearson correlation [24], cosine similarity, weight amplification, inverse user frequency, and default rating [5], including probability-based approaches. According to the results of the selected similarity measure, the k users with the highest similarity are identified as neighbors. Fig. 1 shows an example of calculating the similarity of two users, u and v, from a user–item rating matrix R. Finally, for m users, the similarity of users can be represented as an m×m user–user similarity matrix A, where both rows and columns represent users. In matrix A, the entry Au,v, relating the u-th user to the v-th user, is set to the similarity value between the pair of users u and v if it is among the k highest similarity values in the u-th row of A, and 0 otherwise. Non-zero entries of each row, often called the k nearest neighbors (KNN), are used to recommend items for the user of that row.

[Fig. 1. A user–user similarity matrix A used in user-based CF.]

2.1.2. Item neighborhood

Instead of computing similarities between users, an item neighborhood for an item-based CF is generated by computing similarities between items. The main use of the neighborhood is to identify, for each item, the set of items that is most likely to be preferred by users. For capturing the similarity relationships between pairs of items, Sarwar et al. [25] proposed several similarity measures between pairs of items, such as cosine-based similarity, correlation-based similarity and adjusted cosine similarity. The basic idea of computing the similarity between two items is to first look at the users who have rated both items and then apply one of the similarity measures to calculate a similarity value between the two items [25].

Fig. 2 illustrates the computation of the similarity for a pair of items i and j corresponding to the i-th and j-th columns of the rating matrix R. Similar to the user–user similarity matrix A, for n items the similarity of items can be represented as an n×n item–item similarity matrix D, where the i-th row stores the k′ most similar items to item i. In matrix D, Di,j is set to the similarity value between two items i and j if it is among the k′ highest similarity values in the i-th row of D, and 0 otherwise. Non-zero entries of each row, often called the k′ most similar items (MSI), are used to recommend items for the target user [9].

[Fig. 2. An item–item similarity matrix D used in item-based CF.]

2.2. Predictions and recommendations

Once the neighborhood is generated, various methods can be used to combine the ratings of neighbors to compute a prediction value for the items a target user has not rated. The preference rating of each neighbor is usually weighted by the similarity value, which is computed when the neighbors are determined. The more similar a neighbor is to a target user or item, the more influence he/she has in calculating a prediction value. After predicting how much a target user will like particular items not previously rated by him/her, the top-N item set, the set of ordered items with the highest predicted values, is identified and recommended. The target user can present feedback on whether he/she actually likes the recommended top-N items, or how much he/she prefers those items as scaled ratings.

2.2.1. User-based prediction

In a user-based CF, we can predict the target user's interest in the target item based on the ratings of other, similar users. The main idea is that ratings by more similar users contribute more to predicting the target item rating [11]. Formally, the measurement of how much target user u prefers item j is given by:

$$\check{R}_{u,j} = \bar{R}_u + \frac{\sum_{v \in KNN} (R_{v,j} - \bar{R}_v) \cdot \mathrm{sim}(u,v)}{\sum_{v \in KNN} |\mathrm{sim}(u,v)|} \qquad (1)$$

where KNN is the set of k nearest neighbors of user u, and Rv,j is the rating of user v on item j. In addition, R̄u and R̄v refer to the average ratings of users u and v, and sim(u,v) represents the similarity between users u and v, which can be calculated by a number of different methods, as discussed in Section 2.1.1.
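A hedged sketch of Eq. (1) follows; the neighbor structure `knn` (a mapping from each user to precomputed (neighbor, similarity) pairs) is an assumed input, not something the paper specifies.

```python
# Sketch of the user-based prediction of Eq. (1), assuming a NaN-padded
# rating matrix R and a precomputed neighbor table `knn`.
import numpy as np

def user_based_predict(R, u, j, knn):
    """Predict R[u, j] from mean-offset ratings of u's nearest neighbors."""
    r_u = np.nanmean(R[u])                 # average rating of the target user
    num = den = 0.0
    for v, sim_uv in knn[u]:               # (neighbor id, similarity) pairs
        if np.isnan(R[v, j]):              # neighbor v has not rated item j
            continue
        num += (R[v, j] - np.nanmean(R[v])) * sim_uv
        den += abs(sim_uv)
    return r_u if den == 0 else r_u + num / den
```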

2.2.2. Item-based prediction

Essentially, item-based prediction tries to capture how the target user has rated similar items [25]. For predicting a particular item in an item-based CF, we can calculate a weighted average of the user's ratings Ru,i, using the similarity between two items as the weight. Formally, we can calculate the predicted rating of target user u for target item j using the following formula:

$$\check{R}_{u,j} = \frac{\sum_{i \in MSI} \mathrm{sim}(i,j) \cdot R_{u,i}}{\sum_{i \in MSI} |\mathrm{sim}(i,j)|} \qquad (2)$$

where MSI is the set of k′ most similar items to item j, and sim(i,j) represents the similarity between items i and j, which can be calculated in the manner mentioned in Section 2.1.2. The main idea behind this prediction is that the weighted rating of items that are similar to the target item is a good estimate of the rating for that item.
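The item-based counterpart, Eq. (2), can be sketched the same way; here `msi` (mapping each item to its k′ most similar items with similarity weights) is again an assumed precomputed structure.

```python
# Sketch of the item-based prediction of Eq. (2): a similarity-weighted
# average of the target user's own ratings on the items most similar to j.
import numpy as np

def item_based_predict(R, u, j, msi):
    num = den = 0.0
    for i, sim_ij in msi[j]:               # (similar item id, similarity)
        if np.isnan(R[u, i]):              # user u has not rated item i
            continue
        num += sim_ij * R[u, i]
        den += abs(sim_ij)
    return np.nan if den == 0 else num / den
```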

3. Building collaborative models using pre-prediction errors

In this section, we describe in detail our method of building models, collectively called the error-reflected model. According to the type of neighbors used in building the model, the error-reflected model is divided into three classes: a user-based model, an item-based model, and a hybrid model.

In this paper, prior to generating rating predictions for the items that users have not yet rated, we predict the values of the items that the users have previously rated. That is, we validate how accurately the rating of a given user for an item is predicted, compared to the actual rating given by him/her for that item. In this sense, a prediction can be divided into two cases: a prediction for a target user on items that have already been rated by the target user, and a prediction for a target user on items that have not yet been rated by the target user. To differentiate the former from the latter, we label the former case a pre-prediction.

3.1. Computing pre-predictions

Prior to generating a pre-prediction, we first identify the k nearest neighbors of each user by using cosine similarity with the inverse user frequency [5], and the k′ most similar items of each item by using cosine similarity with the inverse item frequency. For the pre-prediction of the items, we withhold a single selected item for a target user within the entire selection of items he/she rated, and then try to predict its value. In order to compute the pre-prediction value of target user u for item j, we consider not only the rating propensity of users who have similar tastes to user u, but also the past rating propensity of user u for items similar to item j. Formally, the measurement of the pre-prediction is given by:

$$P_{u,j} = \bar{R}^{j}_{knn(u)} + \frac{\sum_{i \in MSI_u(j)} \left(R_{u,i} - \bar{R}^{i}_{knn(u)}\right) \times \mathrm{sim}(i,j)}{\sum_{i \in MSI_u(j)} \mathrm{sim}(i,j)} \qquad (3)$$

where Pu,j is the pre-predicted value of user u on item j and MSIu(j) is the set of most similar items of item j. R̄knn(u)^i and R̄knn(u)^j, respectively, refer to the average rating of the nearest neighbors of user u for items i and j. Note that if the average rating of the user neighborhood for a certain item is unavailable, we always use the average rating of that item over all users instead. sim(i,j) represents the similarity between items i and j, which can be calculated using diverse similarity algorithms such as cosine-based similarity, correlation-based similarity and adjusted cosine similarity [25]. However, we also consider the number of users' ratings of items in generating item-to-item similarities, namely the inverse item frequency. When the inverse item frequency is applied to the cosine similarity technique, the similarity between two items i and j is measured by Eq. (4):

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U_i \cap U_j} \left(R_{u,i} \times \log(n/f_u)\right)\left(R_{u,j} \times \log(n/f_u)\right)}{\sqrt{\sum_{u \in U_i} \left(R_{u,i} \times \log(n/f_u)\right)^2} \sqrt{\sum_{u \in U_j} \left(R_{u,j} \times \log(n/f_u)\right)^2}} \qquad (4)$$

where Ui and Uj refer to the sets of users who rated items i and j, respectively. Ru,i is the rating of user u on item i, whereas Ru,j is the rating of user u on item j. The inverse item frequency of user u is defined as log(n/fu), where fu is the number of items rated by user u and n is the total number of items in the system. If user u rated all items, then the value of the inverse item frequency is 0. As with the inverse user frequency [5], the main concept of the inverse item frequency dictates that users rating numerous items contribute less to similarity than users rating a smaller number of items [9].
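Eq. (4) is straightforward to implement directly on the NaN-padded rating matrix; the sketch below is illustrative, not the authors' code, and assumes every user has rated at least one item so that log(n/fu) is defined.

```python
# Sketch of Eq. (4): cosine similarity between item columns i and j, with
# each rating weighted by its rater's inverse item frequency log(n / f_u).
import numpy as np

def iif_cosine(R, i, j):
    n = R.shape[1]                         # n: total number of items
    f = (~np.isnan(R)).sum(axis=1)         # f_u: number of items user u rated
    w = np.log(n / f)                      # inverse item frequency weights
    xi = np.where(np.isnan(R[:, i]), 0.0, R[:, i]) * w
    xj = np.where(np.isnan(R[:, j]), 0.0, R[:, j]) * w
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])   # users in Ui ∩ Uj
    num = np.sum(xi[both] * xj[both])
    den = np.sqrt(np.sum(xi[~np.isnan(R[:, i])] ** 2)) * \
          np.sqrt(np.sum(xj[~np.isnan(R[:, j])] ** 2))
    return 0.0 if den == 0 else num / den
```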

To illustrate a simple example of computing a pre-prediction, consider the user–item rating matrix R shown in Table 1. Alice has already rated the movies "Seven," "JFK," "Shrek" and "Godzilla," but she has not yet seen "Titanic" and "AI". For Alice, the movies for the pre-prediction are those movies that have already been rated by her: IAlice = {Seven, JFK, Shrek, Godzilla}.

Table 1
An example of a user–item rating matrix, R (empty cells denote items not yet rated).

         Seven   JFK   Titanic   Shrek   AI   Godzilla
Alice    3       5               1            5
Bob      4       4     5         4       3
John     2       5     4                      4
Dannis   1             2         5            4

Assume that we calculate the prediction value for "JFK" by using Eq. (3), and suppose that KNN(Alice), the similar user neighborhood of Alice, and MSI(JFK), the similar item neighborhood of "JFK," are as follows:

− KNN(Alice) = {John, Bob}
− MSI(JFK) = {(Seven, 0.95), (Godzilla, 0.88), (Titanic, 0.71)}.

Analyzing the ratings for "JFK" among the neighbors of Alice, John rated it with 5 points and Bob rated it with 4 points. Hence, the average rating of the neighborhood, R̄knn(Alice)^JFK, becomes 4.5. In addition, analyzing the ratings of Alice for items that are similar to "JFK," the value is calculated as follows:

$$\frac{(3-3) \times 0.95 + (5-4) \times 0.88}{0.95 + 0.88} = 0.48.$$

In this calculation, "Titanic" is excluded because Alice has not yet rated it. Finally, we can calculate the pre-predicted value of Alice for "JFK" as follows:

$$P_{Alice,JFK} = 4.5 + 0.48 = 4.98.$$

This implies that the movie "JFK" is pre-predicted as 4.98, even though the actual rating of Alice for it is 5.
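The worked example can be checked numerically; the snippet below simply replays the arithmetic of Eq. (3) with the neighborhood values quoted in the text (the per-item neighborhood averages 3.0 and 4.0 are the ones implied by the (3−3) and (5−4) terms).

```python
# Numeric check of the P_Alice,JFK example from Eq. (3).
r_knn_jfk = (5 + 4) / 2       # John and Bob rated "JFK" with 5 and 4 -> 4.5

# (Alice's rating, her KNN's average on that item, similarity to "JFK");
# "Titanic" is excluded because Alice has not rated it.
terms = [(3, 3.0, 0.95),      # "Seven"
         (5, 4.0, 0.88)]      # "Godzilla"

offset = sum((r - avg) * s for r, avg, s in terms) / sum(s for _, _, s in terms)
print(round(r_knn_jfk + offset, 2))   # -> 4.98, as in the text
```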


Definition 2. User–item pre-prediction matrix, P.

For an m×n user–item rating matrix R, the pre-predictions can be represented as an m×n user–item pre-prediction matrix, P. The matrix rows represent users, the columns represent items, and Pu,j, with Rmin ≤ Pu,j ≤ Rmax, represents the pre-predicted rating of user u on item j.

Analogous to the rating matrix R, the value Pu,j may be assigned ∅, indicating that user u has not previously rated item j. Occasionally, we cannot compute the pre-prediction, in the following cases: i) the user neighborhood of user u does not exist, or ii) the item neighborhood of item j does not exist. If either case occurs, the average rating value of user u is used as Pu,j.

3.2. Computing pre-prediction errors

Once the predictions for users on items are represented in the pre-prediction matrix, the error of each prediction can be computed by subtracting the pre-predicted value from the actual rating. Given the set of actual and pre-predicted rating pairs ⟨Ru,j, Pu,j⟩ for every actual rating value in the rating matrix R and the corresponding pre-predicted value in the pre-prediction matrix P, a prediction error is calculated as:

$$E_{u,j} = R_{u,j} - P_{u,j} \qquad (5)$$

Fig. 3 illustrates the process of computing the pre-prediction error. For example, the error of the pre-predicted value of Alice for "JFK," EAlice,JFK, as mentioned in the previous section, becomes 0.02.

Formally, from matrices R and P, the prediction errors can be represented by a user–item error matrix.

Definition 3. User–item error matrix, E.

From the given set of actual rating and pre-predicted value pairs ⟨Ru,j, Pu,j⟩ for all the data in matrices R and P, a user–item error matrix, E, can be filled with error entries. Each entry Eu,j in E represents the pre-prediction error of the u-th user on the j-th item. Eu,j lies in the range between (Rmin − Rmax) and (Rmax − Rmin). Some of the entries are not filled, as there are items that are not rated by some users.

In the case that the pre-prediction was overestimated, the pre-prediction error value Eu,j becomes negative; on the contrary, in the case that the value was underestimated, Eu,j becomes positive (Fig. 4). If Eu,j = 0, the algorithm exactly estimated the actual rating. The closer the value approaches 0, the higher the accuracy of the pre-prediction value.

[Fig. 3. The process of computing a pre-prediction error. The pre-prediction error can be calculated by subtracting the pre-prediction value from the actual rating.]

A pre-prediction error can be analyzed as follows:

− Ru,j < Pu,j (overestimation): In the case of overestimation, the pre-predicted value is estimated as being higher than the actual rating value of the user. This is the result of a prediction based on the rating tendency of users similar to target user u and the past rating tendency of target user u for items that are similar to item j. Considering this point, with respect to a new prediction of item j for a certain user similar to the target user u, it may be necessary to slightly decrease the predicted value. Likewise, in the case of a new prediction for the target user u on items similar to item j, it may also be necessary to slightly decrease the predicted value.

− Ru,j > Pu,j (underestimation): Contrary to the overestimation case, it may be necessary to slightly increase the predicted value with respect to a new prediction in the case of underestimation.

3.3. Building error-reflected models

As mentioned in the previous section, the pre-prediction error of a target user is a result that reflects the opinions of like-minded users and the target user's own rating tastes. Therefore, the pre-prediction errors of users similar to a target user on a certain item may contain valuable information for making a prediction of the target user for that item. Likewise, the pre-prediction errors of a target user for items that are similar to a certain item may be helpful in estimating the rating of the target user for that item. In fact, there are recent studies [4,10,16,22] that have attempted to apply pre-predictions to CF recommender systems. Similar to their motivation, the fundamental assumption of our study is that there are systematic, and thus exploitable, pre-prediction errors for predicting items that have not yet been rated by the target user.

Theoretically, the error matrix E itself can be used for a new prediction. Intuitively, however, if a pre-prediction value is accurate, it implies that the pre-prediction process accurately reflects the tendency of the user's past ratings on similar items and the tendency of similar users' ratings on the same items. On the contrary, if a pre-prediction value deviates greatly from the corresponding actual rating, noise information may have been used in making the prediction. Therefore, to avoid both increasing unnecessary computation cost and including unnecessary noise information, the model is built using only the data within a predetermined threshold (θ). Note that the prediction error value becomes negative in the case of overestimation, whereas it becomes positive in the case of underestimation.



[Fig. 4. Overestimation and underestimation of a pre-prediction value.]


Therefore, we select the elements of the error matrix E that satisfy the following condition:

$$|E_{u,j}| < \theta, \quad \text{for } R_{u,j} \neq \varnothing. \qquad (6)$$

The method of building models that reflect the prediction errors can be divided into three approaches: a user-based approach, an item-based approach, and a hybrid approach.

3.3.1. The user-based error-reflected model

The user-based error-reflected model is built by utilizing the pre-prediction errors, for a certain item, of the user neighbors similar to the target user u. The built model can be represented as an m×n user–item matrix Ê(θ). The matrix rows represent users, the columns represent items, and the j-th column of the u-th row contains the average pre-prediction error of the similar users of target user u on item j, as defined in Eq. (7).

$$\hat{E}^{(\theta)}_{u,j} = \frac{\sum_{v \in KNN^{\theta}_{j}(u)} E_{v,j}}{\mathrm{card}\left(KNN^{\theta}_{j}(u)\right)} \qquad (7)$$

where KNN_j^θ(u) denotes the set of users, among the similar neighbors of the target user u, whose absolute pre-prediction error for item j is less than θ, and card(KNN_j^θ(u)) refers to the cardinality of the set KNN_j^θ(u) (i.e., the number of elements in the set).

For example, assume that the pre-prediction errors for Table 1 are as shown in Table 2.

Table 2
An example of a user–item prediction error matrix, E.

         Seven   JFK    Titanic   Shrek   AI      Godzilla
Alice    −0.3    0.02             −0.4            0.9
Bob      0.1     −2     0.7       2       −0.02
John     −0.15   0.2    −0.4                      −0.3
Dannis   −2              0.5      −0.03           0.6

If the error threshold value θ is 0.8 (θ=0.8) and the size of the user neighborhood is 2 (k=2), then the average of the pre-prediction errors of Alice's neighbors for "Titanic" can be calculated from the given values EBob,Titanic = 0.7 and EJohn,Titanic = −0.4:

$$\hat{E}^{(0.8)}_{Alice,Titanic} = (0.7 - 0.4) / 2 = 0.15$$

In an analogous fashion, Ê(0.8)Alice,AI of Alice for "AI" is calculated as −0.02. In the case of Bob, it is possible to compute Ê(0.8)Bob,Godzilla for "Godzilla" from KNN(Bob) = {Alice, Dannis}. However, the prediction error value of Alice for "Godzilla" is 0.9, which is greater than the threshold value 0.8. Accordingly, Ê(0.8)Bob,Godzilla is 0.6, as only the prediction error value of Dannis is selected. Finally, the user-based error-reflected model of Table 2 can be built as shown in Table 3.

Table 3
An example of the user-based error-reflected model.

         Seven   JFK   Titanic   Shrek   AI      Godzilla
Alice    0       0     0.15      0       −0.02   0
Bob      0       0     0         0       0       0.6
John     0       0     0         −0.4    −0.02   0
Dannis   −0.05   0     0         0.3     0       0
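A sketch of how Eq. (7) might be realized over a dense error matrix follows; the neighbor table `knn` and the NaN convention for missing errors are assumptions carried over from the earlier sketches, not details fixed by the paper.

```python
# Sketch of the user-based error-reflected model of Eq. (7): for each cell,
# average the neighbors' errors on item j, keeping only |error| < theta.
import numpy as np

def user_error_model(E, knn, theta):
    m, n = E.shape
    E_hat = np.zeros((m, n))               # default 0 where no error survives
    for u in range(m):
        for j in range(n):
            errs = [E[v, j] for v, _ in knn[u]
                    if not np.isnan(E[v, j]) and abs(E[v, j]) < theta]
            if errs:                       # card(KNN_j^theta(u)) > 0
                E_hat[u, j] = sum(errs) / len(errs)
    return E_hat
```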

3.3.2. The item-based error-reflected model

The item-based error-reflected model is built using the pre-prediction errors of the target user u for the items that are similar to the target item. This method is similar to the method of building the user-based error-reflected model; the difference is merely that the similar item neighborhood is used instead of the user neighborhood. The item-based error-reflected model can also be represented as an m×n user–item matrix Ě(θ). The matrix rows represent users, the columns represent items, and the j-th column of the u-th row contains the average of the pre-prediction errors of user u for items similar to item j, as defined in Eq. (8).

$$\check{E}^{(\theta)}_{u,j} = \frac{\sum_{i \in MSI^{\theta}_{u}(j)} E_{u,i}}{\mathrm{card}\left(MSI^{\theta}_{u}(j)\right)} \qquad (8)$$

where MSI_u^θ(j) denotes the set of items, among the items similar to the target item j, whose absolute pre-prediction error for user u is less than θ, and card(MSI_u^θ(j)) is the number of elements in the set.

Let us calculate the average of the prediction errors of Alice for items similar to "Titanic" in Table 2. The prediction errors of Alice for "Seven" and "JFK," which are similar to "Titanic," are EAlice,Seven = −0.3 and EAlice,JFK = 0.02, respectively. Therefore, Ě(0.8)Alice,Titanic can be calculated as follows:

$$\check{E}^{(0.8)}_{Alice,Titanic} = (-0.3 + 0.02) / 2 = -0.14.$$

In the case of MSI(AI) = {Titanic, Shrek, Godzilla}, Ě(0.8)Alice,AI of Alice for "AI" is calculated as −0.4. Since the prediction error of Alice for "Godzilla" is 0.9 (EAlice,Godzilla = 0.9), it is not reflected in the calculation. In the case of John for "AI," Ě(0.8)John,AI is calculated as −0.35 from EJohn,Titanic = −0.4 and EJohn,Godzilla = −0.3. Finally, the item-based error-reflected model of Table 2 can be built as shown in Table 4.

Table 4
An example of the item-based error-reflected model.

         Seven   JFK   Titanic   Shrek   AI      Godzilla
Alice    0       0     −0.14     0       −0.4    0
Bob      0       0     0         0       0       0.4
John     0       0     0         −0.12   −0.35   0
Dannis   0.6     0     0         0.5     0       0
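The item-based model of Eq. (8) is the mirror image of the user-based sketch above, swapping the neighbor table for an item-similarity table `msi` (again an assumed structure).

```python
# Sketch of the item-based error-reflected model of Eq. (8): average the
# target user's own thresholded errors over the items most similar to j.
import numpy as np

def item_error_model(E, msi, theta):
    m, n = E.shape
    E_check = np.zeros((m, n))
    for u in range(m):
        for j in range(n):
            errs = [E[u, i] for i, _ in msi[j]
                    if not np.isnan(E[u, i]) and abs(E[u, i]) < theta]
            if errs:
                E_check[u, j] = sum(errs) / len(errs)
    return E_check
```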

3.3.3. The hybrid error-reflected model

The hybrid error-reflected model, which is represented as an m×n user–item matrix Ĥ(θ), is built by unifying the user-based model and the item-based model. The entry in Ĥ(θ) is filled with the value that is closest to 0, but is not 0, among the values of the j-th column of the u-th row in Ê(θ) and Ě(θ). Formally, Ĥu,j is defined as in Eq. (9):

$$\hat{H}_{u,j} = \begin{cases} \hat{E}_{u,j} & \text{if } \left(|\check{E}_{u,j}| \geq |\hat{E}_{u,j}| \text{ and } \hat{E}_{u,j} \neq 0\right) \text{ or } \left(\check{E}_{u,j} = 0\right) \\ \check{E}_{u,j} & \text{if } \left(|\check{E}_{u,j}| < |\hat{E}_{u,j}| \text{ and } \check{E}_{u,j} \neq 0\right) \text{ or } \left(\hat{E}_{u,j} = 0\right) \\ 0 & \text{otherwise.} \end{cases} \qquad (9)$$

From the two examples in Tables 3 and 4, the hybrid model unifying the two models can be built as shown in Table 5.

Table 5
An example of the hybrid error-reflected model.

         Seven   JFK   Titanic   Shrek   AI      Godzilla
Alice    0       0     −0.14     0       −0.02   0
Bob      0       0     0         0       0       0.4
John     0       0     0         −0.12   −0.02   0
Dannis   −0.05   0     0         0.3     0       0
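A direct translation of the case rule in Eq. (9) is shown below; it reproduces the Table 5 entries when fed the Table 3 and Table 4 matrices.

```python
# Sketch of the hybrid rule of Eq. (9): per cell, keep whichever of the
# user-based (a) and item-based (b) entries is nonzero and closer to 0.
import numpy as np

def hybrid_model(E_hat, E_check):
    H = np.zeros_like(E_hat)
    m, n = E_hat.shape
    for u in range(m):
        for j in range(n):
            a, b = E_hat[u, j], E_check[u, j]
            if (abs(b) >= abs(a) and a != 0) or b == 0:
                H[u, j] = a
            elif (abs(b) < abs(a) and b != 0) or a == 0:
                H[u, j] = b
            # otherwise the cell stays 0
    return H
```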


3.3.4. Error-reflected models for cold start problems

As discussed in [27], for cold start users and cold start items, recommender systems are generally unable to provide high quality recommendations. With respect to cold start users, they should be encouraged to continuously provide their opinions, because they do not have enough rating information. However, inaccurate predictions resulting from the insufficiency of the users' historical information lead them to doubt the credibility of the system and, thus, cause them to abandon it. Likewise, cold start items can hardly be recommended compared to items that have sufficient user ratings. Considering these points, a differentiated strategy is necessary to generate predictions for both cold start users and cold start items.

Since a cold start item has only a few ratings given by users, we take into consideration all users who rated the cold start item when building the user-based model for such an item. Analogously, with respect to a cold start user, because he/she has rated few items, most of the items similar to a target item may not have been rated by him/her. Rather than similar items, therefore, we consider all items rated by the cold start user when building the item-based model for that user. Formally, the error-reflected models of cold start users and items are built by revising Eqs. (7) and (8); that is, the user-based model and item-based model use Eqs. (10) and (11), respectively:

$$\hat{E}^{(\theta)}_{u,j} = \frac{\sum_{v \in U_j} E_{v,j}}{\mathrm{card}(U_j)} \quad \text{if } j \in CSI \qquad (10)$$

$$\check{E}^{(\theta)}_{u,j} = \frac{\sum_{i \in I_u} E_{u,i}}{\mathrm{card}(I_u)} \quad \text{if } u \in CSU \qquad (11)$$

where CSU and CSI are the sets of cold start users and cold start items, respectively. In addition, Uj is the set of users who have rated item j and Iu is the set of items that have been rated by user u; thus, card(Uj) and card(Iu) are the numbers of elements of the sets Uj and Iu, respectively.
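A sketch of the cold-start revision follows: when an item is in CSI, Eq. (10) averages over the whole rated column; when a user is in CSU, Eq. (11) averages over the whole rated row. The function names and NaN convention are illustrative assumptions.

```python
# Sketch of Eqs. (10) and (11): cold-start entries fall back to averaging
# over all users who rated item j, or all items rated by user u.
import numpy as np

def cold_start_user_based(E, j):
    """Eq. (10): average error over all users who rated cold start item j."""
    col = E[:, j]
    return float(np.nanmean(col)) if np.any(~np.isnan(col)) else 0.0

def cold_start_item_based(E, u):
    """Eq. (11): average error over all items rated by cold start user u."""
    row = E[u, :]
    return float(np.nanmean(row)) if np.any(~np.isnan(row)) else 0.0
```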

4. Applying collaborative models to recommender systems

Fig. 5 illustrates our method with its two phases: an offline phase and an online phase. The offline phase is the model building phase explained in Section 3, and the online phase is the prediction phase using the error-reflected models.

[Fig. 5. An overview of the proposed approach for item recommendations.]

4.1. Generating a prediction

The final step in collaborative filtering is the process of generating the prediction by attempting to guess the rating that a user would provide for an item. In collaborative recommender systems, it is crucial to accurately predict how much a certain user prefers a certain item based on historical information, as this directly influences the decision making of the users for purchasing, selecting or watching. In addition, the processing time required to generate a prediction is also an important issue. As noted previously, the proposed CF approach constructs the error-reflected models offline, prior to online prediction or recommendation. Since most tasks can be conducted in the offline phase, the system achieves fast online performance. The salient concept behind our prediction scheme is that prediction errors derived from similar users and similar items can help in predicting the preference of a similar user for a similar item. The online prediction applied to each constructed model can be divided into three methods.

The first approach is to apply the user-based error-reflected model to the prediction. The basic concept of this method is to reflect whether the pre-predictions of similar neighbors for the target item were overestimated or underestimated. Formally, the value for target user u on item j, Řu,j, is computed by:

$$\check{R}_{u,j} = \bar{R}^{j}_{knn(u)} + \hat{E}_{u,j} \qquad (12)$$

where Êu,j is the value in the u-th row of the j-th column of the user-based error-reflected model Ê, and R̄knn(u)^j refers to the average rating of the neighborhood of user u for item j. If the average rating of the neighborhood for item j is unavailable, the average rating of item j over all users is used instead. For example, the rating value of Alice for "Titanic," ŘAlice,Titanic, in Table 1 is predicted as 4.65 from the average rating of similar users R̄knn(Alice)^Titanic = 4.5 and the value in the user-based error-reflected model Ê(0.8)Alice,Titanic = 0.15, as described in Table 3.

For the second approach, we apply the item-based error-reflected model to the online prediction. This approach reflects whether the pre-predictions of the target user for items similar to the target item were overestimated or underestimated. Formally, the measurement of how much the target user u prefers item j is given by:

$$\check{R}_{u,j} = \bar{R}^{j}_{knn(u)} + \check{E}_{u,j} \qquad (13)$$

where Ěu,j is the value in the u-th row of the j-th column of the item-based error-reflected model, Ě. In this way, the predicted value of Alice for "Titanic," ŘAlice,Titanic, is calculated as 4.36 from the average rating of similar users, R̄knn(Alice)^Titanic = 4.5, and the value in the item-based error-reflected model Ě(0.8)Alice,Titanic = −0.14, as described in Table 4.

Finally, the measurement using the hybrid error-reflected model is defined as:

$$\check{R}_{u,j} = \bar{R}^{j}_{knn(u)} + \hat{H}_{u,j} \qquad (14)$$

where Ĥu,j is the value in the u-th row of the j-th column of the hybrid model, Ĥ. Since |Ê(0.8)Alice,Titanic| > |Ě(0.8)Alice,Titanic| in the previous example, in this case ŘAlice,Titanic can be predicted as 4.36.
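All three online rules (Eqs. (12)–(14)) share the same shape: a neighborhood baseline plus a stored model entry. A hedged sketch, with the fallback to the item average that the text prescribes:

```python
# Sketch of the online prediction of Eqs. (12)-(14). `model` is any of the
# precomputed matrices E_hat, E_check or H; `knn` is the neighbor table.
import numpy as np

def predict(R, u, j, model, knn):
    rated = [R[v, j] for v, _ in knn[u] if not np.isnan(R[v, j])]
    if rated:
        base = float(np.mean(rated))       # average neighborhood rating on j
    else:
        base = float(np.nanmean(R[:, j]))  # fallback: item j's global average
    return base + model[u, j]
```

With a base of 4.5 and the Table 4 entry −0.14, this reproduces the 4.36 prediction for Alice on "Titanic".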

4.2. Model incremental updates

Model-based CF is generally faster in recommendation time than memory-based CF, owing to its prior use of the pre-computed model [13]. However, this approach tends to require expensive learning time for building a model. Moreover, once the model is built, it is difficult to immediately reflect users' feedback, despite its significance in the recommender system [7]. In other words, new information about user preferences for items cannot be reflected until the model is rebuilt. Generally, building, renewing or rebuilding the model is not done frequently because the process is time consuming. Accordingly, an efficient method of rebuilding the model is required.

In order to alleviate these weak points of model-based CF, the proposed approach is designed so that the model is updated effectively, as illustrated in Fig. 6. In addition, users' new opinions are reflected incrementally whenever users present explicit feedback. For example, assume that the system predicted the rating of Alice for the movie "Titanic" as 4.36 and recommended it to Alice. If Alice provided 4.0 as explicit feedback of her actual rating after watching the movie, then the new prediction error can be calculated from the actual rating and the predicted value as follows:

$$R_{Alice,Titanic} - \check{R}_{Alice,Titanic} = 4.0 - 4.36 = -0.36.$$

The basic concept of measuring the new prediction error is the same as that of measuring the pre-prediction error. When users present explicit feedback on a prediction, the models can easily update the error value, which is computed by subtracting the predicted value from the feedback rating. The proposed method can therefore use the updated information in any further predictions, as well as enhance the quality of recommendations regarding user preferences. We believe our incremental update of the models is much more attractive, and particularly efficient for cold start users, before the rebuilding process of the models.

[Fig. 6. Updating the error models incrementally by using user feedback.]
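The update itself is a single cell write; the sketch below is illustrative (the function name is ours, not the paper's).

```python
# Sketch of the incremental update of Section 4.2: on explicit feedback,
# store the new error (actual minus predicted) back into the error matrix,
# so later model rebuilds and predictions can use it immediately.
def on_feedback(E, u, j, actual, predicted):
    E[u, j] = actual - predicted           # e.g. 4.0 - 4.36 = -0.36
    return E
```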

5. Experimental evaluations

In this section, we empirically evaluate the proposed prediction methods using the error-reflected models and compare their performance against that of the benchmark algorithms. To this end, we implemented a user-based CF algorithm in which the similarity is computed by the well-known Pearson correlation coefficient (denoted UserCF) [5], and an item-based CF approach that employs cosine-based similarity (denoted ItemCF) [25]. The performance of the methods applying the user-based model (denoted UErrorCF), the item-based model (denoted IErrorCF), and the hybrid model (denoted HErrorCF) was evaluated in comparison with these benchmark algorithms.

5.1. The dataset and evaluation metric

Experimental data comes from MovieLens, a web-based research recommender system (www.movielens.org). The dataset used in this paper is the MovieLens 100k ratings dataset, containing 100,000 ratings of 1682 movies by 943 users (943 rows and 1682 columns of a user–item matrix R). This dataset is publicly available and can be downloaded from http://www.grouplens.org/node/73.

We used two different training datasets: a full training dataset and a cold start training dataset. First, for the full dataset, which includes all available ratings of users, the entire data was divided into two groups: 80% of the data (80,000 ratings) was used as a training set and 20% of the data (20,000 ratings) was used as a test set. A five-fold cross validation scheme was used. This dataset is used to examine the quality of the prediction regardless of whether users or items have sufficient ratings. Second, for the cold start dataset, we artificially generated two groups that satisfy cold start conditions, because the original dataset contains a minimum of 10 ratings per user. The first group contains 100 users who have three ratings per user (the number of items rated by each user), and the second group contains 100 users who have five ratings per user.

In order to measure the accuracy of the predictions, we adopted the mean absolute error (MAE), which has been widely used for statistical accuracy measurement across diverse algorithms [12]. The mean absolute error of user u over N items in the test data is defined as:

$$\mathrm{MAUE}(u) = \frac{\sum_{j=1}^{N} |R_{u,j} - \check{R}_{u,j}|}{N} \qquad (15)$$

where ⟨Ru,j, Řu,j⟩ are the actual/predicted rating pairs of user u in the test data. Finally, the MAE over all M users in the test set is computed as:

$$\mathrm{MAE} = \frac{\sum_{u=1}^{M} \mathrm{MAUE}(u)}{M} \qquad (16)$$
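For completeness, a minimal sketch of the metric of Eqs. (15) and (16):

```python
# Sketch of the evaluation metric: per-user mean absolute error (MAUE),
# then the mean of the per-user values over all M test users (MAE).
def maue(pairs):
    """pairs: list of (actual, predicted) ratings for one user's test items."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)

def mae(per_user_pairs):
    """per_user_pairs: one list of (actual, predicted) pairs per test user."""
    return sum(maue(p) for p in per_user_pairs) / len(per_user_pairs)
```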


5.2. Parameter tuning experiments

In this section, we present detailed experimental results according to three parameters: the size of the user neighborhood k, the size of the item neighborhood k′, and the error threshold θ.

5.2.1. Accuracy of the pre-prediction according to neighborhood size

The pre-prediction is influenced by the size of the user neighborhood, KNN, and the size of the item neighborhood, MSI. Accordingly, for building accurate prediction error models, we should first determine a proper size for the user neighborhood k and the item neighborhood k′. Hence, in this section, we examined the accuracy of the pre-prediction in order to choose optimal values for the number of nearest neighbors and most similar items.

First, we measured the MAE of the pre-prediction while varying the item neighborhood size k′. Following previous studies, we set the size of the user neighborhood k to 50 (k=50). The experimental result is depicted in Fig. 7 (left graph). It can be observed from the graph that the size of the item neighborhood affects the prediction quality. The quality of the pre-prediction improved as the k′ value was increased from 10 to 60; after this value, the curve tends to flatten. We also observed that the MAE value increases beyond an item neighborhood size of 80. These results indicate that when the item neighborhood size is too small, the accuracy of the pre-prediction is markedly decreased. In addition, too large a size can also negatively impact the accuracy.

In the subsequent experiment, we continued to examine the accuracy by changing the number of user neighbors k. During this experiment, k′ was set to 60 according to the previous result. As shown in Fig. 7 (right graph), the number of neighbor users also affected the pre-prediction. However, unlike the size of the item neighborhood, the curve tends to flatten at a relatively small user neighborhood size. For example, MAE considerably decreases as the size of the user neighborhood increases from 10 to 20; beyond this point, further increases in the neighborhood size did not affect the accuracy, even though slight variations in the MAE values appeared. When the neighborhood size was 10, we found many cases in which the nearest users of a target user had not yet rated a target item when the pre-prediction of the target user on the target item was generated. This fact might explain why the accuracy of the pre-prediction becomes worse when the size of the user neighborhood is small.

[Fig. 7. MAE according to variation of user neighbor size and item neighbor size used in generating a pre-prediction.]

5.2.2. Experiments with the error threshold

In this section, we investigate the effect of the error threshold on the performance of the prediction. As described in Section 3.3, we expected that the threshold θ could be a significant factor affecting the quality of the prediction in our study, because different error-reflected models (i.e., Ê(θ) and Ě(θ)) are built depending on the threshold. We therefore measured the MAE of the prediction as θ varied from 0.2 to 2.0. Based on the previous experiment, the sizes of the user neighborhood and the item neighborhood were set to 50 and 60, respectively (k=50, k′=60).

Fig. 8 illustrates the variation of MAE for UErrorCF, IErrorCF and HErrorCF. It can be observed from the graph that the three methods show similar curves. In the case of UErrorCF, the curve declines until θ reaches 1.2; in the case of IErrorCF, until θ reaches 1.6; and in the case of HErrorCF, until θ reaches 1.4. After those values, the curves gradually rise. That is, a low threshold value discards more pre-prediction errors, and thus the three methods obtain poor prediction quality because the remaining pre-prediction errors are not sufficient to build the error-reflected models. Conversely, a high threshold value includes unnecessary noise information that can negatively influence accuracy. When the threshold is 1.4, the models can be built by eliminating approximately 10,000 pieces of superfluous information; consequently, UErrorCF, IErrorCF and HErrorCF can provide enhanced prediction quality.

[Fig. 8. MAE according to variation of the error threshold.]

Examining the best prediction quality of the three methods, UErrorCF, IErrorCF and HErrorCF obtain an MAE of 0.7584 (θ=1.2), 0.7556 (θ=1.6), and 0.7543 (θ=1.4), respectively. Based on this experimental result, in the subsequent experiments we selected 1.2, 1.6, and 1.4 as the error thresholds of UErrorCF, IErrorCF and HErrorCF, respectively. That is, for UErrorCF the model Ê(1.2) was used, whereas the model Ě(1.6) was used for IErrorCF. In the case of HErrorCF, we used the model unifying Ê(1.4) and Ě(1.4).

5.3. Comparison with other methods

In this section, we present detailed experimental results in comparison with the benchmark methods. The performance comparison is divided into three dimensions. The accuracy of the prediction is evaluated first; then, the accuracy of the prediction for the cold start problems is evaluated. Finally, we compare computational complexity with related studies.

5.3.1. Comparison of the prediction accuracy

As noted in a number of previous studies, the number of neighbors has a significant impact on the prediction accuracy of neighborhood-based algorithms [25,26]. Therefore, different numbers of user or item neighbors, from 10 to 100, were used for the prediction generation.

Fig. 9. (a) A comparison of MAE achieved by UserCF and UErrorCF as the user neighborhood size (k) grows; (b) a comparison of MAE achieved by ItemCF and IErrorCF as the item neighborhood size (k′) grows.


Fig. 9(a) illustrates the MAE of UserCF and UErrorCF with respect to the variation in the user neighborhood size. In UErrorCF, the user neighborhood size denotes the number of nearest neighbors k that is exploited for building Ê(θ) in Eq. (7). In the experimental results, at most neighborhood sizes, the overall prediction accuracy of UserCF appears to be better than that of UErrorCF. However, we found that the prediction accuracy of UErrorCF is superior to that of UserCF when the neighborhood size is small (e.g., k=10). This result implies that UErrorCF can provide more accurate prediction performance than UserCF when the information data is sparse or the available data for users is relatively insufficient.

We continued to examine the prediction accuracy of ItemCF and IErrorCF. In IErrorCF, the item neighborhood size denotes the number of most similar items k′ that is exploited for building Ě(θ) in Eq. (8). In the rating prediction for IErrorCF, we set the user neighborhood to 50 in order to calculate the average rating of the user neighborhood for a certain item in Eq. (13). Fig. 9(b) shows the MAE obtained by ItemCF and IErrorCF with respect to the variation of the item neighborhood size. The result demonstrates that, at all neighborhood size levels, IErrorCF provides more accurate predictions than ItemCF.

ItemCF elevates the prediction accuracy as the neighborhood size increases from 10 to 50; after this value, the accuracy decreases slightly. In contrast, in the case of IErrorCF, after the size passes a certain level (k′=40–50), the accuracy barely varies. Comparing the MAE obtained by the user-based approaches and the item-based approaches at a neighborhood size of 10, the accuracy of IErrorCF and ItemCF is remarkably worse than that of UserCF and UErrorCF. These results may be explained by the fact that the item-based approaches essentially attempt to capture how the target user has rated similar items. When the item neighborhood is too small, the items similar to a certain item were often not rated by the target user, and thus it was more difficult to predict the rating of the item for him/her.

Table 6 summarizes the comparison of the best results achieved by the five methods. The MAE results show that the methods based on the proposed models (UErrorCF, IErrorCF, and HErrorCF) provide slightly worse accuracy than UserCF; however, the difference appears insignificant. Our methods obtain nearly 7% improvement in prediction accuracy compared to ItemCF. To analyze statistical significance, we also conducted two-tailed paired t-tests (per user) on the MAE results of HErrorCF and those of the benchmark algorithms [8]. As a result, we observed that the p-value obtained from the t-test on HErrorCF and UserCF was 0.5074 (t[942]=0.66), indicating there was no significant difference.



Table 6
A comparison of the best results achieved by the five methods.

Method   UserCF (k=60)   ItemCF (k′=50)   UErrorCF (k=60)   IErrorCF (k=50, k′=60)   HErrorCF (k=50, k′=60)
MAE      0.7534          0.8230           0.7572            0.7556                   0.7543

(UErrorCF, IErrorCF and HErrorCF are the error-reflected models.)

Table 8
A comparison of MAE for cold start users.

                Cold start users (# of ratings per user)
Method          3         5         Average      Original
UserCF          1.2360    1.0730    1.1545       0.7942
ItemCF          1.2234    1.0365    1.1299       0.8320
UErrorCF        0.9907    1.0374    1.0140       0.7915
IErrorCF        0.9846    0.9726    0.9786       0.8052
HErrorCF        1.1312    1.0395    1.0853       0.7897


However, the difference between HErrorCF and ItemCF is statistically significant (t[942]=−8.34, p<0.001).
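Both this per-user test and the per-item tests reported below follow the same recipe; a short sketch assuming the SciPy library is available (variable names are illustrative):

from scipy import stats

def compare_methods(per_user_mae_a, per_user_mae_b):
    """Two-tailed paired t-test on the per-user MAE values of two
    algorithms; a small p-value indicates that the difference in
    accuracy is statistically significant."""
    t, p = stats.ttest_rel(per_user_mae_a, per_user_mae_b)
    return t, p

For the per-item tests in Section 5.3.2, the same call is applied to per-item MAE lists instead.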

5.3.2. Comparison of cold start users and items

In this section, we investigate the prediction accuracy of the proposed models for the cold start problems in comparison with the benchmark methods. First, in order to analyze the prediction accuracy for cold start items, we analyzed the MAE of the previous prediction results according to the number of users' ratings that the items contained in the training set.

Table 7 summarizes the results of this analysis, showing how our methods outperformed the other methods. As can be seen from the results, for all methods, the more user ratings an item contains, the higher the obtained prediction accuracy. Comparing the MAE of cold start items (fewer than five ratings) obtained by each method, on the whole, CF applying the error-reflected models provides accurate predictions. In particular, HErrorCF provides more accurate prediction performance than the other methods, achieving 3% and 13% improvements compared to UserCF and ItemCF, respectively. Two-tailed paired t-tests (per item) were also performed to determine whether the differences are significant. With respect to HErrorCF and UserCF, there is a small difference at the 10% level (t[332]=−1.864, p<0.1). Comparing HErrorCF and ItemCF, the difference appears to be statistically significant (t[332]=−2.834, p<0.01).

We carried out a further experiment with the cold start dataset to examine the prediction accuracy for cold start users. For this experiment, we considered different subsets of users who had few ratings in the training dataset (e.g., users who have three ratings and users who have five ratings).

Table 8 summarizes the MAE for the cold start users. As expected, we observed that the quality of prediction for the cold start users is considerably lower than that obtained on the original dataset. Such results were caused by the fact that it was hard to analyze these users' propensity to rate items. Nevertheless, CF based on the error-reflected models significantly outperforms the benchmark methods. Comparing the MAE achieved by UErrorCF, IErrorCF, and HErrorCF, interesting results were observed: higher prediction accuracy is achieved by the independent models, UErrorCF and IErrorCF, than by the hybrid model HErrorCF, particularly when the users have only three ratings. We identified that, on average, IErrorCF improves the prediction performance by 17.5% over UserCF and by 15.1% over ItemCF. In addition, UErrorCF obtains 14.1% and 11.6% improvements in MAE compared to UserCF and ItemCF, respectively. We continued with two-tailed paired t-tests (per user). First, for the cold start users who have three ratings, we observed significant differences between IErrorCF and UserCF (t[99]=−3.74, p<0.01), and between IErrorCF and ItemCF (t[99]=−3.56, p<0.01). Second, for the cold start users

Table 7
A comparison of MAE for cold start items.

Test item       Cold start items (<5)   <10      <15       <20
# of items      333                     530      650       743
UserCF          0.9661                  0.9199   0.8734    0.8412
ItemCF          1.0670                  1.0110   0.9976    0.9912
UErrorCF        0.9405                  0.8878   0.87714   0.8632
IErrorCF        0.9542                  0.9015   0.8908    0.8870
HErrorCF        0.9352                  0.8864   0.8528    0.8465

that have five ratings, the p-values obtained between IErrorCF and the other two methods were also less than 0.01 (p<0.01). The results indicate that the differences in MAE are significantly different from zero. We conclude from these experiments that the proposed CF utilizing the error-reflected models can improve the prediction quality for both cold start items and cold start users.

5.4. Discussion of computational complexities

In this section, we discuss the computational complexity of previous studies in comparison with that of our approach. High computational complexity is often demanded to enhance the quality of predictions and recommendations, and the scalability of CF is a critical challenge in practical recommender systems with huge numbers of users and items.

We first analyzed the computational complexities of our methods according to the number of users m, the number of items n, the number of rating values v, the number of similar users k and the number of similar items k′. From the model-based point of view, the computational complexity can be divided into an offline phase and an online phase: the former can be accomplished offline, prior to actual recommendations for a given user, whereas the latter has to be done online and often in real time [13]. The offline computation is closely connected with the time required to build the error-reflected model based on pre-prediction errors. To calculate a pre-predicted value, the set of user neighbors and the set of item neighbors should be determined; the upper bounds on the complexity of these steps are O(m²n) and O(mn²), respectively. Additionally, O(kmn) and O(k′mn) time is spent building the user-based model and the item-based model, respectively. Therefore, the total computational complexity of the offline phase becomes approximately O(m²n+mn²+kmn)≅O(m²n+mn²) for the user-based model, O(m²n+mn²+k′mn)≅O(m²n+mn²) for the item-based model, and O(m²n+mn²+kmn+k′mn)≅O(m²n+mn²) for the hybrid model. In practice, however, since the user–item rating matrix is very sparse, the actual computational complexity for the pre-prediction errors can be reduced to approximately O(mv+nv). In the online phase, the complexity required to predict a certain item j for a target user u is O(k), because we need to compute the average rating of the k users most similar to user u. Accordingly, the computational complexity of predicting all items becomes approximately O(kn)≅O(n). In fact, when we measured the server response time in an Apache Web server environment, the time required to predict an item was, on average, 0.0042 s, and it took, on average, 2.79 s to generate predictions of all items for each user.
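The O(k) online step can be pictured as follows. Since Eq. (13) is not reproduced here, the combination rule below (neighborhood average adjusted by the neighbors' stored pre-prediction errors) is only our hedged approximation of the actual formula:

def predict_rating(user, item, neighbors, ratings, error_model, fallback=3.0):
    """O(k) online prediction sketch: average the ratings that the
    user's k pre-computed neighbors gave to the item, then adjust by
    the neighbors' stored pre-prediction errors on that item (an
    assumed stand-in for the paper's Eq. (13))."""
    ns = neighbors[user]                      # k pre-computed neighbors
    observed = [ratings[(v, item)] for v in ns if (v, item) in ratings]
    base = sum(observed) / len(observed) if observed else fallback
    errs = [error_model[(v, item)] for v in ns if (v, item) in error_model]
    return base + (sum(errs) / len(errs) if errs else 0.0)

Both loops touch only the k neighbors, which is why predicting all n items for a user costs roughly O(kn).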

As for the complexity of previous studies, memory-based CF such as UserCF has the advantage of easily taking new data into account, because it utilizes the entire data in real time when generating recommendations in the online phase. However, as both the number of users m and the number of items n grow, the computation cost rapidly grows as well. In the case of UserCF [5], for a target user, O(mn) is required to determine the k nearest neighbors, and O(kn) is additionally required to predict all items; the computational complexity during the online phase therefore becomes O(mn+kn). For most recommender systems, the online complexity, incurred when predicting ratings of items that the users have not yet rated, is more important than the offline complexity [13].
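To see where O(mn+kn) comes from, consider a naive memory-based routine; this sketch is our own simplification, with the similarity function left abstract:

def usercf_online(target, all_users, items, ratings, similarity, k):
    """Memory-based UserCF online phase: O(mn) to score all m users
    (each similarity computation touching up to n items), then O(kn)
    to predict all items from the k nearest neighbors."""
    scored = [(similarity(target, u, ratings), u)
              for u in all_users if u != target]
    scored.sort(key=lambda su: su[0], reverse=True)
    neighbors = [u for _, u in scored[:k]]
    predictions = {}
    for item in items:
        vals = [ratings[(u, item)] for u in neighbors if (u, item) in ratings]
        if vals:
            predictions[item] = sum(vals) / len(vals)
    return predictions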



Similar to our models, diverse models aimed at reducing the online complexity, such as the item–item similarity model [9,25], the Aspect model [13], the User Rating Profile (URP) model [19], the Unified Relevance (UR) model [30], and the Weighted Low Rank Approximations (WLRA) model [29], have been proposed. The aim of such models is to support fast online recommendations by first developing a pre-computed model, so that the most time-consuming tasks can be conducted offline. If a user–user similarity model for UserCF is built offline in advance [17], the online cost diminishes sharply to O(kn), whereas the time required to build the model becomes O(m²n). Similarly, an item–item similarity model for ItemCF needs O(mn²) time offline, and the online prediction complexity is O(kn). Therefore, in cases where the number of users is relatively larger than the number of items (m≫n), or where the set of users changes more dynamically than the set of items, the pre-computed item–item similarity is practically more efficient than the pre-computed user–user similarity [25].

In probabilistic approaches to building models, such as WLRA, URP, UR and Aspect, the Expectation Maximization (EM) algorithm is generally used to estimate the models for CF. Hence, the offline complexity is divided into an E-step and an M-step, and a number of iterations is required to estimate stable parameters, which further affects the complexity. For each iteration, WLRA using Singular Value Decomposition (SVD) needs O(mn²+m³) to build the model. In the cases of URP and Aspect, the complexity of building the model is O(kmnv); in the worst case this becomes O(km²n²), because the total number of ratings v becomes mn at worst (v=mn). With respect to the UR model, the complexity is O(m²n+mn²+k²mn), although no iteration process is required to build the model.

Table 9 summarizes the comparison of computational complexities in terms of the offline and online phases. Although the complexities of the item–item model and the user–user model are much lower than those of our models, in the experiments we observed that UserCF and ItemCF performed worse for cold start users and items. That is, our approach provides advantages both in improving the prediction quality and in keeping the recommendation time fast. In comparison with the probabilistic models (WLRA, URP, UR, and Aspect), our approach does not require the iterative building processes of the probabilistic approaches. In addition, we support incremental updates of the models, as presented in Section 4.2.

Table 9
Comparison of computational complexities.

Collaborative filtering algorithms       Offline (model building)   Online (prediction)
Memory-based      UserCF                 –                          O(mn+kn)
Model-based       User–user model        O(m²n)                     O(kn)
                  Item–item model        O(mn²)                     O(kn)
                  WLRA model             O(mn²+m³)                  O(kn)
                  URP model              O(kmnv)                    O(knv)
                  UR model               O(mn²+m²n+mnk²)            O(k²n)
                  Aspect model           O(kmnv)                    O(knv)
Proposed models   UErrorCF               O(mn²+m²n+kmn)             O(kn)
(model-based)     IErrorCF               O(mn²+m²n+kmn)             O(kn)
                  HErrorCF               O(mn²+m²n+2kmn)            O(kn)

m: # of total users, n: # of total items, v: # of ratings, k: model size.
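The incremental updating referenced above (Section 4.2) admits a simple constant-time form under our assumed data layout; again, this is an illustration, not the paper's code. When a user submits a new explicit rating, its pre-prediction error is computed once and folded into the model without rebuilding:

def update_model(error_model, user, item, actual, pre_predicted, theta):
    """Fold a newly observed rating into the error-reflected model:
    compute its pre-prediction error and store it if it passes the
    same threshold rule used during offline building."""
    err = actual - pre_predicted
    if abs(err) <= theta:
        error_model[(user, item)] = err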

6. Conclusions and future work

In this paper, we have proposed a unique method of building models derived from explicit ratings. The proposed method first determines a pre-predicted rating and subsequently identifies prediction errors for each user. Pre-computed models, collectively called the error-reflected model, are built by reflecting the prediction errors. A major advantage of the proposed models is that they support incremental updating by means of explicit user feedback. We also presented a new method of applying the proposed models to CF recommender systems that can enhance the accuracy of the prediction with respect to the cold start problem. As noted in the experimental results, our models obtained significantly better prediction accuracy in dealing with both cold start users and cold start items, compared to the benchmark methods.

In future work, we plan to exploit social networks to build our model and generate item predictions, which is an emerging research area in recommender systems. We expect that a model incorporating reliable social friends may offer more trustworthy items relevant to users' needs. Another interesting direction is the problem of manipulated ratings by unreliable users, often called shilling attacks [23]. We intend to detect unreliable user ratings by analyzing diverse types of attack models, and we will investigate the possible uses of pre-prediction errors for robust recommender systems against shilling attacks. Finally, we plan to examine stability, i.e., how consistently the proposed models provide predictions over a period of time even when ratings are newly added to the system before the models are rebuilt [3].

Acknowledgments

The authors would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), as well as Universidad Carlos III de Madrid and Banco Santander through a Cátedra de Excelencia.

References

[1] H.J. Ahn, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Information Sciences 178 (1) (2008) 37–51.

[2] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Transactions on Knowledge and Data Engineering 17 (6) (2005) 734–749.

[3] G. Adomavicius, J. Zhang, On the stability of recommendation algorithms, Proceedings of the 4th ACM Conference on Recommender Systems, 2010, pp. 47–54.

[4] G. Bogdanova, T. Georgieva, Using error-correcting dependencies for collaborative filtering, Data & Knowledge Engineering 66 (3) (2008) 402–413.

[5] J.S. Breese, D. Heckerman, C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence, 1998, pp. 43–52.

[6] K.-W. Cheung, J.T. Kwok, M.H. Law, K.-C. Tsui, Mining customer product ratings for personalized marketing, Decision Support Systems 35 (2003) 231–243.

[7] A. Das, M. Datar, A. Garg, S. Rajaram, Google news personalization: scalable online collaborative filtering, Proceedings of the 16th International World Wide Web Conference, 2007, pp. 271–280.

[8] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006) 1–30.

[9] M. Deshpande, G. Karypis, Item-based top-n recommendation algorithms, ACM Transactions on Information Systems 22 (1) (2004) 143–177.

[10] S. Ding, S. Zhao, Q. Yuan, X. Zhang, R. Fu, L. Bergman, Boosting collaborative filtering based on statistical prediction errors, Proceedings of the 2nd ACM Conference on Recommender Systems, 2008, pp. 3–10.

[11] J.L. Herlocker, J.A. Konstan, A. Borchers, J. Riedl, An algorithmic framework for performing collaborative filtering, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999, pp. 230–237.

[12] J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems 22 (1) (2004) 5–53.

[13] T. Hofmann, Latent semantic models for collaborative filtering, ACM Transactions on Information Systems 22 (1) (2004) 89–115.

[14] Y. Jiang, J. Shang, Y. Liu, Maximizing customer satisfaction through an online recommendation system: a novel associative classification model, Decision Support Systems 48 (2010) 470–479.

[15] C.Y. Kim, J.K. Lee, Y.H. Cho, D.H. Kim, VISCORS: a visual-content recommender for the mobile Web, IEEE Intelligent Systems 19 (6) (2004) 32–39.

[16] H.-N. Kim, A.-T. Ji, H.-J. Kim, G.-S. Jo, Error-based collaborative filtering algorithm for top-n recommendation, Proceedings of the Joint 9th Asia-Pacific Web and 8th International Conference on Web-Age Information Management Conference on Advances in Data and Web Management, 2007, pp. 594–605.

[17] J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordon, J. Riedl, GroupLens: applying collaborative filtering to Usenet news, Communications of the ACM 40 (1997) 77–87.

[18] G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing 7 (1) (2003) 210–217.

[19] B. Marlin, Modeling user rating profiles for collaborative filtering, Proceedings of the 7th Annual Conference on Neural Information Processing Systems, 2003.


[20] P. Melville, R.J. Mooney, R. Nagarajan, Content-boosted collaborative filtering for improved recommendations, Proceedings of the 18th National Conference on Artificial Intelligence, 2002.

[21] D.M. Nichols, Implicit rating and filtering, Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, 1997, pp. 31–36.

[22] J. O'Donovan, B. Smyth, Mining trust values from recommendation errors, International Journal on Artificial Intelligence Tools 15 (6) (2006) 945–962.

[23] M. O'Mahony, N. Hurley, N. Kushmerick, G. Silvestre, Collaborative recommendation: a robustness analysis, ACM Transactions on Internet Technology 4 (4) (2004) 344–377.

[24] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: an open architecture for collaborative filtering of Netnews, Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, 1994, pp. 175–186.

[25] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, Proceedings of the 10th International World Wide Web Conference, 2001, pp. 285–295.

[26] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Analysis of recommendation algorithms for e-commerce, Proceedings of the ACM Conference on Electronic Commerce, 2000, pp. 158–167.

[27] A.I. Schein, A. Popescul, L.H. Ungar, Methods and metrics for cold-start recommendations, Proceedings of the 25th International ACM Conference on Research and Development in Information Retrieval, 2002, pp. 253–260.

[28] U. Shardanand, P. Maes, Social information filtering: algorithms for automating word of mouth, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1995, pp. 210–217.

[29] N. Srebro, T. Jaakkola, Weighted low-rank approximations, Proceedings of the 20th International Conference on Machine Learning, 2003, pp. 720–727.

[30] J. Wang, A.P. de Vries, M.J.T. Reinders, Unified relevance models for rating prediction in collaborative filtering, ACM Transactions on Information Systems 26 (3) (2008) 1–42.

[31] Y. Zhen, W.-J. Li, D.-Y. Yeung, TagiCoFi: tag informed collaborative filtering, Proceedings of the 3rd ACM Conference on Recommender Systems, 2009, pp. 69–76.

Heung-Nam Kim is a postdoctoral fellow in the Multimedia Communications Research Laboratory (MCRLab) at the University of Ottawa, Canada. His research interests include collaborative filtering, recommender systems, the semantic Web, data mining, and social networking applications. He received a PhD in Computer and Information Engineering from Inha University, Korea.

Abdulmotaleb El Saddik is University Research Chair and Professor at SITE, University of Ottawa, and recipient of the Professional of the Year Award (2008), the Friedrich Wilhelm Bessel Research Award from Germany's Alexander von Humboldt Foundation (2007), the Premier's Research Excellence Award (PREA, 2004), and the National Capital Institute of Telecommunications (NCIT) New Professorship Incentive Award (2004). He is the director of the Multimedia Communications Research Laboratory (MCRLab) and was Director of the Information Technology Cluster, Ontario Research Network on Electronic Commerce (2005–2008). He is Associate Editor of the ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP), IEEE Transactions on Multimedia (TMM) and IEEE Transactions on Computational Intelligence and AI in Games (IEEE TCIAIG), and Guest Editor for several IEEE Transactions and Journals. Dr. El Saddik has served on the technical program committees of numerous IEEE and ACM events, and has been the General Chair and/or Technical Program Chair of more than 25 international conferences, symposia and workshops on collaborative hapto-audio-visual environments, multimedia communications, and instrumentation and measurement. He is a leading researcher in haptics, service-oriented architectures, collaborative environments, and ambient interactive media and communications. He has authored and co-authored two books and more than 280 publications, has received research grants and contracts totaling more than $12 million, and has supervised more than 90 researchers. His research has been selected for the BEST Paper Award three times. Dr. El Saddik is a Distinguished Member of ACM, an IEEE Distinguished Lecturer, a Fellow of the Canadian Academy of Engineering, a Fellow of the Engineering Institute of Canada, and a Fellow of IEEE.

Geun-Sik Jo is a Professor of Computer and Information Engineering at Inha University, Korea, and chairman of the School of Computer and Information Engineering at Inha University. He received a B.S. degree in Computer Science from Inha University in 1982, and M.S. and Ph.D. degrees in Computer Science from the City University of New York in 1985 and 1991, respectively. His research interests include knowledge-based scheduling, ontologies, the semantic Web, intelligent e-commerce, constraint-directed scheduling, knowledge-based systems, decision support systems, and intelligent agents. He has authored and coauthored five books and more than 200 publications.