methods to improve the accuracy of recommender systems · item-based systems are explored in 3 and...

Methods to improve the accuracy of recommendersystems

Martin PowersUniversity of Minnesota - Morris

600 East 4th StreetMorris, Minnesota

[email protected]

ABSTRACTWith commercial recommender systems becoming more andmore popular, new ways to improve their accuracy and easeof implementation are being evaluated. These methods aimat creating more scalable and accurate systems by exam-ining trends and finding patterns within the data. Twomethods that have been developed achieve these goals byusing item-based algorithms instead of user-based ones andby adding a temporal dynamic. Item-based systems can pro-duce similar results to user-based systems while the numberof users increases by comparing item similarities which arefairly constant. Temporal dynamics are important to exam-ine because user preference changes over time and so doestheir average rating of an item.

Categories and Subject DescriptorsH.2.8 [Database Management]: Database Applications—Data Mining

General TermsAlgorithms

Keywordscollaborative filtering, recommender systems

1. INTRODUCTIONIn the age of E-commerce, more products are being releasedto the public than any one person has time to experience.With this vast number of options to process, people relyon recommendations on which movies they should watchand books they should read [4]. These software systems arecalled Recommender Systems and can now replace the needto know someone who has seen a movie you’re interested in;recommender systems often give more accurate predictionswhen a large dataset is used. This computation goes mostlyunnoticed by users while browsing E-commerce websites buthas become some company’s central way of making money.Because it can be extremely beneficial to both a company

This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. To view a copyof this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ orsend a letter to Creative Commons, 171 Second Street, Suite 300, San Fran-cisco, California, 94105, USA.

and its users, there has been a great deal of research to im-prove the efficiency and scalability of recommender systems[4]. Out of the many areas where recommender systems arebeing improved, this paper will focus on two: item-basedalgorithms and the temporal model. Item-based algorithmsfocus on the similarities between items rather than usersand can show improvements over other methods in certaincases. A recommender system that uses a temporal model isone that takes into account how a user’s or item’s propertieschange over time. By analyzing user and item trends whiletracking temporal dynamics, it is possible to make improve-ments to a recommender system.

In this paper, we will look at the fundamental aspects of rec-ommender systems and explore some of the new advances.Starting in 2, we will take a look at what a user-based rec-ommender system looks like. From there we will look at twonew methods to see how they differ from common systems.Item-based systems are explored in 3 and a temporal awaremodel is shown in 4.

2. BACKGROUNDRecommender Systems make suggestions to users based oninformation the system has available to it such as item prop-erties or user preference patterns. The majority of recom-mender systems used today are based on information thatusers provide to the system. These types of recommendersystems are called Collaborative Filters and rely on a largenumber of users rating a set of items. These ratings couldbe collected from users explicitly, like having them give amovie a star rating, or behind the scenes, like counting howmany times a song is played. Users are more inclined toinput their ratings into a system where there is a noticablereward such as accurate suggestions [2]. It is easy to see whythis type is the most successful and common recommendersystems as they help both the user find things they like andthe owner of the implementation get more business.

2.1 Collaborative filter methodThe basic elements needed in a collaborative filter is a listof m users U = {u1, u2, · · · , um} and a list of n items I ={i1, i2, · · · , in}; these are commonly very large lists. Eachuser ui, has a list of items Iui that they have rated in someway, and using these ratings and the ratings of other users wecan find users with similar taste or items that have similarlikeness. The idea behind these suggestions is that if twousers rate an item or items similarly, they might like theother’s highly rated items. There are two methods through

which collaborative filters can suggest items to an activeuser: Making a prediction or making a recommendation.

A prediction is described in [4] as “a numerical value, Pa,j ,expressing the predictive likeliness of item ij /∈ Iua for theactive user ua.” The item must not exist within the list ofitems already rated by the user. This method can be usedto predict how a user might like an item based on propertiesthe item shares with other items the user has rated. This isused for items that are newly introduced and have no ratingsyet, hoping to find users that will like it based on little data.

The recommendation method is also known as the Top-Nrecommendation and is a list of items that the user willprobably like based on comparing ratings with other usersor items with similar qualities to the active user or the itemsthe active user likes.

The collaborative process is diagramed in Figure 1, takenfrom [4], where the process is shown in the three steps: build-ing the user-item matrix from the data, running a filteringalgorithm, and finally the outputted information.

2.2 User-based algorithmsThe most common collaborative filtering algorithms findpredictions for a user based on other users with similar taste.Similar users are called neighbors and are determined bycomparing the items both of them have rated. The Pear-son correlation algorithm is one of the methods to find howsimilar two users are to each other by comparing the set ofitems both users have rated:

userSim(u, n) =

∑i∈Iu,n(rui − ru)(rni − rn)√∑

i∈Iu,n(rui − ru)2√∑

i∈Iu,n(rni − rn)2

This algorithm will find how similar user u is to its neighbor,user n based on all the items that both have rated, shownas Iu,n. The rating user u has given item i is shown as rui.The result will be within the range from -1, showing perfectdisagreement, and 1, being perfect agreement [5].

This similarity is used when determining a predicted rated,Pu,i for an item i that user u hasn’t rated yet. This iscalculated by finding the sum of all ratings for i given byu’s neighbors, each weighted by how similar u is to eachneighbor. This is divided by the sum of u’s similarity withtheir neighbors so that the result will be within the samescale as the ratings. This prediction is given by:

Pu,i =

∑n∈Nu userSim(u, n) · rni∑

n∈Nu userSim(u, n)

where Nu is the set of u’s neighbors.

This algorithm works best when all users have the same rat-ing distribution which can be seen by looking at the averagerating of of a user, ru. In a system that uses a 5 star ratingscale, one user might rate items conservatively and neverhave a 5 star rating, using 4 stars as a sign of a great item.Compared to another user who has many items rated as 5stars, they both have a skewed scale in relation to each other.Because most users use a unique rating scale, compensatingfor this bias greatly improves prediction accuracy. To dothis, we subtract each neighbor’s average rating, rn, from

the rating they gave item i and add user u’s average rating,ru to the total prediction:

Pu,i = ru +

∑n∈Nu userSim(u, n) · (rni − rn)∑

n∈Nu userSim(u, n)

3. ITEM-BASED METHODItem-based collaborative filtering algorithms use propertiesof the items instead of using user similarity as the means tocreate recommendations and predictions. The properties ofitems used by a recommender system change less often thanthe number of users and their preferences. This consistencyhelps improve how scalable the system is, which is importantas the number of users increases and each user provides moreratings.

3.1 Challenges of user-based collaborative fil-tering algorithms

There are two major challenges that the user-based methodfaces: scalability and sparsity. When a user-based recom-mender system needs to make a suggestion for a user, itmust first find the user’s neighborhood of users with similartaste which requires making a fair number of comparisonsbetween how users have rated a set of items. With each userhaving the ability to rate as many items as they want, thisnumber of comparisons grows quickly with each new user.

Also, some users also might not give any ratings or a verysmall number of them and the system needs to be able towork with these users as well. This sparsity is a problem fornew users who haven’t rated any items and expect resultsfrom the system immediately, but get poor recommenda-tions. Both these challenges are addressed by item-basedsystems and are a major reason that large scale systemsmight use them as an alternative or in addition to othermethods.

3.2 Item similarity computationItem-based algorithms start by taking two items and findingtheir similarities through a similarity algorithm. These algo-rithms use the list of all items and find users who have ratedboth items. With this list they can figure out how similar thetwo items are. If a lot of users have given both items similarratings, it can be said that the items are alike, and the moresimilar the ratings, the more similiar the items. This canbe done many different ways and three are explained here:cosine-based, correlation-based, and adjusted cosine.

3.2.1 Cosine-based similarityThe two items i and j are represented as the vectors ~i and~j in the m dimensional user-space to be able to calculatetheir similarity. The similari]ty is measured by computingthe cosine of the angle between the two vectors. This isformally shown as

itemSim(i, j) = cos(~i,~j) =~i ·~j

||~i||2 ∗ ||~j||2

where “·” equals the dot product of two vectors.

u1

u2

ua

um

.

.

.

.

i1 i2 ij in. . . .

Input (ratings table)Active user

Item for which predictionis sought

Prediction

Recommendation

CF-Algorithm

Pa,j (prediction onitem j for the active

user)

{Ti1, Ti2, ..., T iN} Top-Nlist of items for the

active user

Output interface

3. ITEM-BASED COLLABORATIVE FILT-

ERING ALGORITHM

3.1 Item Similarity Computation

3.1.1 Cosine-based Similarity

3.1.2 Correlation-based Similarity

3.1.3 Adjusted Cosine Similarity

288

Figure 1: Collaborative Filtering Process

3.2.2 Correlation-based similarityThis method computes the Pearson-r correlation, as seenin Section 2.2, to measure the likeness between two items.First, the users who have rated both i and j are found andrepresented by U . Then the similarity is

itemSim(i, j) =

∑u∈U (ru,i − ri)(ru,j − rj)√∑

u∈U (ru,i − ri)2√∑

u∈U (ru,j − rj)2

where ru,i is the rating user u gave item i and ri is theaverage rating of item i over all users in U .

3.2.3 Adjusted cosine similarityUnlike user-based algorithms, item-based comparisons aremade using many users, each of which have a different rat-ing style. Some users, for example, might reserve the high-est rating for only a few items and give all other items alower-than-average rating. To compensate for this, the co-sine similarity is adjusted by subtracting a user’s averagerating from their rating of the two items being compared.This is another form of the Pearson-r correlation agorithm,adjusted to compensate for a user’s average rating insteadof an item’s. This is shown as

itemSim(i, j) =

∑u∈U (ru,i − ru)(ru,j − ru)√∑

u∈U (ru,i − ru)2√∑

u∈U (ru,j − ru)2

.

3.3 Prediction computationOnce a set of items similar to each other is generated, thenext step in an item-based system is to consider a user’srecorded ratings and create a predicted rating for each un-rated item. This can be done a number of ways, two of whichare thes weighted sum algorithm and a regression model.

3.3.1 Weighted sumThe weighted sum method computes the predicted score ofan item for a user by analyzing the other items that theuser has rated that are similar to the item being considered.Each other rating is weighted according to its similarity tothe new item and then is scaled by the sum of all similaritiesso it is within the same rating scale as other items. This isshown as

Pu,i =

∑n∈Ni itemSim(i, n) · run∑

n∈Ni itemSim(i, n)

where Ni is the set of items similar to i.

3.3.2 RegressionThis method can work better than a weighted sum becausethe two item’s vectors might be distant computationally buthave a high similarity to users. To avoid this error, a linearregression model is created and used in place of the similaritems’ ratings in the weighted sum formula. This algorithmrequires a model to be derived from the items available andcan work better than a weighted sum prediction.

3.4 Experimental evaluation3.4.1 Dataset

The MovieLens dataset was used for evaluation of the item-based algorithm by randomly selecting enough users to have100,000 ratings from the dataset only considering users thathave rated 20 or more movies [2]. The dataset was thenseparated into two sets, a training and a test set. The per-centage of data to be used as the training set is representedby the variable x so that when x = 0.8, 80% of the datawas used for training and 20% was used for testing. Thedataset was converted into a user-item matrix A that had943 rows and 1682 columns. Each row relates to one userand each column relates to one movie rated by at least oneuser. The sparsity level of the data sets was also consideredand defined as 1 − nonzero entries

total entries. The sparsity level of the

MovieLens dataset is then 1− 100,000943×1682

, which is 0.9369.

3.4.2 Evaluation metricsThe accuracy of the item-based algorithms was computedusing the Mean Absolute Error (MAE) between the pre-dicted rating and the actual user rating. MAE is the mea-sure of the deviation of recommendations, and the lowerthe number is, the more accurate the recommendations are.Each movie has a pair of ratings, (pi, qi) associated with itthat represent the predicted rating and the acutal rating.MAE is calculated by taking each of these pairs and com-puting the absolute error between them, |pi − qi|. Each ofthese absolute errors are summed and then averaged:

MAE =

∑Ni=1 |pi − qi|

N

3.4.3 Experiment procedureFor the experiments, the data was first divided into trainingand test sets. Using the training set, each of the three simi-larity algorithm’s parameters were tuned by testing differentvalues for each. These tests were done by splitting the train-ing set into its own training and test sets, so as to not derive

Item-item vs. User-user at SelectedNeighborhood Sizes (at x=0.8)

0.725

0.73

0.735

0.74

0.745

0.75

0.755

10 20 60 90 125 200

No. of neighbors

MA

E

user-user item-itemitem-item-regression nonpers

Item-item vs. User-user at Selected Density Levels (at No. of Nbr = 30)

0.72

0.74

0.76

0.78

0.8

0.82

0.84

0.2 0.5 0.8 0.9Train/test ratio, x

MA

E

user-user item-itemitem-item-regression nonpers

4.4.1 Impact of themodel size on run-time and through-put

4.5 Discussion

5. CONCLUSION

6. ACKNOWLEDGMENTS

293

Figure 2: Comparison between different prediciton algorithms

any extra accuracy improvement from the data. The algo-rithms were then tested and the MAE was recorded. Eachalgorithm was tested ten times on randomly chosen trainingand test sets and the MAEs were averaged. To compare theperformance of the item-based algorithms, a user-based al-gorithm was tested as well. This algorithm was a Pearsonnearest neighbor algorithm, based on the best published oneavailable.

3.4.4 Experimental resultsThe first thing tested was the accuracy of each of the threesimilarity algorithms mentioned earlier. Each algorithm wasimplemented to predict the similar items and then the weightedsum algorithm was used to generate predictions. The resultsshowed that the adjusted cosine algorithm was the best per-former and so it was used for the rest of the experiments.The next test was to determine what amount of the data setshould be used for training and a value of 80% was foundto be the optimal choice for both the weighted sum and re-gression methods. The neighborhood size, or the numberof similar items used in the prediction computation showedthat the weighted sum algorithm performance peaks at size30 and the regression model diminishes in accuracy as thesize is increased. The neighborhood size was set at 30 forbest performance from both algorithms.

Comparing both prediction methods with the user-based al-gorithm was done by letting one of the parameters vary whilesetting the other at its optimal value. When varying x be-tween 0.2 and 0.9 by 0.1 intervals, neighborhood size was setas 30, then x was set at 0.8 when varying neighborhood sizebetween 10 and 200 at intervals of 10. Using these settings,the weighted sum algorithm out performed the user-basedalgorithm at all values of x (neighborhood size = 30) and allvalues of neighborhood size (x = 0.8), as shown in Figure2, taken from [4]. The regression based algorithm showedbetter performance with low values of both x and neighbor-hood size than both other algorithms but lost its lead asparameters were increased. These results show that item-based algorithms can produce more accurate results thanthe standard user-based methods.

4. TEMPORAL DYNAMICSWhen considering the methods that people use to rate things,it is not clear that anyone uses a set method for their entireinteraction with a collaborative filter. A user could have a

skewed rating system that tends to be above the median andthen switch to rating closer to the minimum. This doesn’tnecessary mean that the items the user is rating have gottenworse, but could show a trend in that user’s ratings. Whenlooking at the NetFlix dataset in Figure 3 from [3], thereis a clear trend that a movie’s ratings tend to increase withthe age of the movie. The more time between when the auser rates a movie and when it was released, the higher theaverage rating. This trend can help improve the accuracyof recommender systems if they could take temporal changeinto consideration.

4.1 Netflix prize and datasetThe Netflix Prize is a noteworthy competition that broughta lot of attention and research to the collaborative filteringfield, both from academic and private interests. Netflix is amovie rental service that lends movies to customers via themail and also has streaming services that allow customersto watch movies in places with an internet connection. Net-flix’s recommender system, Cinematch, was very successfulat suggesting new movies to users and this accuracy was amain reason for customers to keep their subscriptions. InOctober 2006, Netflix announced the Netflix Prize to im-prove Cinematch’s accuracy, they releasied a large data setto the public. With a $1,000,000 prize for the team thatcould reduce the root mean squared error (RSME) that Cin-ematch was capable of by 10 percent. Teams were given over100 million ratings from 480,000 users on 17,770 movies. [1]

4.2 Parts of a temporal modelTo incorporate into predictions, the time, t, at which a ratingis made is recorded along with the user, u, and the item, i.A rating rui(t) indicates the preference by user u of item iat day t and a predicted rating r̂ui(t) is the predicted ratinguser u will give item i at time t. Each (u, i, t) triplet whererui(t) is known is stored in the set K = {(u, i, t)|rui(t)isknown} and is used to give predicted ratings. To find r̂ui,each user u is associated with a vector pu ∈ Rf and eachitem i is associated with a vector qi ∈ Rf space. A rating ispredicted by the rule:

r̂ui = qTi pu

This rating does not do a good job at detecting effects thatdoes not involve user-item interaction and so do not allowroom for time to be included as a factor. To solve this abaseline rating is made by adding the average rating of every

Figure 3: Movie rating by movie age

item and the deviations both the item and the user have fromthe average. We denote µ to be the overall average ratingsof all items by all users and bu and bi to be the deviationfor the user and item respectively. This creates the baselinefor an unknown rating, bui to be:

bui = µ+ bu + bi

[3] explains this formula with an example:

Suppose that we want a baseline estimate forthe rating of the movie Titanic by user Joe. Now,say that the average rating over all movies, µ, is3.7 stars. Furthermore, Titanic is better than anaverage movie, so it tends to be rated 0.5 starsabove the average. On the other hand, Joe is acritical user, who tends to rate 0.3 stars lowerthan the average. Thus, the baseline estimatefor Titanic’s rating by Joe would be 3.9 stars bycalculating 3.7 - 0.3 + 0.5.

Adding the predicted rating, r̂ui into the baseline predic-tion, we get an equation that has the main effects isolated,allowing them to be modified later to take time as a factor.The combined equation is:

r̂ui = µ+ bu + bi + qTi pu

Separating each of these factors allows each to be treatedwith a different temporal effect. User bias (bu) changes overtime, item bias (bi) changes over time, and user preference(pu) also changes over time. Meanwhile, the item character-istics, represented by the vector (qi), would not be expectedto change over time.

4.3 Baseline with temporal dynamicsThe two parts of a baseline prediction that are affected bytime are the item and user bias, bi and bu. An item’s av-erage rating might fluctuate depending on outside factors

such as a sequel being made or an actor’s change in pop-ularity. A user might also change rating bias by setting alower or higher standard for how they determine a movie’squality. Making both these parameters as functions of time,the baseline prediction becomes:

bui(t) = µ+ bu(t) + bi(t)

The function bui(t) represents the baseline estimate of useru’s rating of item i at day t. When finding bi(t), it is im-portant to note that within the context of a movie recom-mender, movies do not change popularity on a daily basisand so a less specific view can be used. The dataset was di-vided into sections called bins that represent a time period.Each time period should be large enough for a movie’s rat-ing to be noticeably changed. Each time period will relateto a single bin and have a single bias amount bi(t). For theNetflix dataset, it was found that there is a wide range ofbin sizes that resulted in the same accuracy and so 30 binswere used. These 30 bins span the entire dataset and eachrelates to ten weeks of data. A day t is associated with aninteger Bin(t) depending on which time period it is in. Thenthe temporal baseline is added to the baseline to form:

bi(t) = bi + bi,Bin(t)

Applying this technique to users is not as simple due tothe frequency that users rate items being low. A varietyof functions can be created to model a user’s rating withtemporal dynamics, ranging from simple to complex. Thesedifferent predictors are:

• static: no temporal effects: bui(t) = µ+ bu + bi

• mov : accounting only to movie-related temporal ef-fects: bui(t) = µ+ bu + bi + bi,Bin(t)

• linear : linear modeling of user biases: bui(t) = µ +bu + αu · devu(t) + bi + bi,Bin(t)

• spline: spline modeling of user biases: bui(t) = µ +

bu +∑kul=1

e−γ|t−tul |butl∑ku

l=1e−γ|t−tu

l| + bi + bi,Bin(t)

• linear+: linear modeling of user biases and single dayeffect: bui(t) = µ+bu +αu ·devu(t)+bu,t +bi +bi,Bin(t)

• spline+: spline modeling of user biases and single day

effect: bui(t) = µ+ bu +∑kul=1

e−γ|t−tul |butl∑ku

l=1e−γ|t−tu

l| + bu,t + bi +

bi,Bin(t)

When compared with the static predictor, each algorithmshows improvement in accuracy with the linear+ and spline+showing the greatest decrease in error.

5. CONCLUSIONAs recommender systems become used more, improvementshave been made on the standard approach to the point wherethe accuracy of a system depends on the traits of its uniquedataset. A user-based system can be improved by imple-menting new features dependant on a particular dataset,but most recommenders have a peak performance and thendon’t improve too much. This calls for new ways to createcollaborative filtering techniques that improve performanceon a systematic level and not an implementation one. Thetwo techniques discussed here, item-based algorithms andtemporal aware models, show fairly large improvements onthe standard user-based methods and are important to thefuture of recommender systems. Both of these systems aredeveloped by recognizing traits about data and then creat-ing a way to use the traits to improve accuracy. When look-ing for ways to improve recommender systems it becomesimportant to recognize attributes of a dataset such as thestatic nature of items and trends in user and item ratings.Using these observations, new techniques such as item-basedalgorithms and temporal dynamics may be developed.

6. REFERENCES[1] R. M. Bell and Y. Koren. Lessons from the netflix prize

challenge. SIGKDD Explor. Newsl., 9:75–79, December2007.

[2] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T.Riedl. Evaluating collaborative filtering recommendersystems. ACM Trans. Inf. Syst., 22:5–53, January 2004.

[3] Y. Koren. Collaborative filtering with temporaldynamics. In Proceedings of the 15th ACM SIGKDDinternational conference on Knowledge discovery anddata mining, KDD ’09, pages 447–456, New York, NY,USA, 2009. ACM.

[4] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl.Item-based collaborative filtering recommendationalgorithms. In Proceedings of the 10th internationalconference on World Wide Web, WWW ’01, pages285–295, New York, NY, USA, 2001. ACM.

[5] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen.The adaptive web. chapter Collaborative filteringrecommender systems, pages 291–324. Springer-Verlag,Berlin, Heidelberg, 2007.

methods to improve the accuracy of recommender systems · item-based systems are explored in 3 and...

Documents