![Page 1: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/1.jpg)
Recommender SystemsNew Approaches with Netflix Dataset
Robert Bell
Yehuda Koren
AT&T Labs
ICDM 2007
Presented by Matt Rodriguez
![Page 2: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/2.jpg)
Outline
Overview of Recommender System Approaches which are Content based or Collaborative Filtering
New Nearest Neighbor model from the leaders of the Netflix Competition
![Page 3: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/3.jpg)
Background
[Adomvicious, Tuzihilan IEEE 2005] Toward the Next Generation of Recommender Systems : A Survey of the State-of-the-Art and Possible Extensions
[Bell, Koren ACM 2007] Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
![Page 4: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/4.jpg)
Approaches for Recommender Systems
Content Based Approach: user is recommended items similar to the ones the user preferred in the past.
Collaborative Filtering: user is recommended items that people with similar tastes liked in the past
Hybrid Approach: combine the two approaches
![Page 5: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/5.jpg)
Content Based approachContent based system uses a utility function
The Drawbacks:1. Costs associated with gathering users
profiles and extracting features from the content2. Limited Content Analysis3. Overspecialization4. No notion of overall popularity; global
effect5. No new features for the new trends.
![Page 6: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/6.jpg)
Collaborative Filtering approach
CF uses ratings from other users to recommend items. Useful when item features are expensive or difficult to determine.
Memory based approach uses a function to determine rating of a user item pair, the function examines the ratings of the nearest neighbors(NNs).
1 . Neighbor of the user2. Neighbor of the item
Model based approach: uses the ratings to learn a model which is then used to make rating predictions.
![Page 7: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/7.jpg)
Memory Based Approach
Derive a rating from similar users, or items. A similarity function determines the neighbors.
![Page 8: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/8.jpg)
Unified Collaborative and Content Based Model
![Page 9: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/9.jpg)
Pitfalls of Recommender Systems
How to handle new users or new items?
Overspecialization: Bad to recommend items that are too similar to items previously. Goal is to find new items that are desirable.
Sparsity:
1. Many users do not have sufficient ratings
2. Many items do not have sufficient ratings
![Page 10: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/10.jpg)
Outline
Overview of Recommender System Approaches which are Content based or Collaborative Filtering
New Nearest Neighbor model from the leading group in the Netflix Competition
![Page 11: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/11.jpg)
Scalable Collaborative Filtering with Jointly Derived Neighborhood Weights
Contributions are normalizing global effects through the dataset.
Capture interdependence of the neighbors by deriving their weights simultaneously.
Present a scalable technique for finding similarities between large number of users.
![Page 12: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/12.jpg)
Previous Rating Techniques
Compare user-user similarities or item-item similarities
![Page 13: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/13.jpg)
Koren-Bell New Strategy
1. Remove global effects.
2. Find Interpolation weights for the neighbors simultaneously, by solving an optimization problem
3. Reduce the dimensionality of users by performing an iterative version of SVD.
![Page 14: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/14.jpg)
Addresses the following concerns
1. NN methods are not good at accounting for global effects.
2. Previous weighting methods fail to account for interdependencies between neighbors.
3. Previously, interpolation weights always sum to one . If there is no useful neighborhood information then it is better to ignore it.
4. NN methods work poorly if the number of ratings differs substantially.
![Page 15: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/15.jpg)
Normalizing by Removing Global Effects
Some users may give higher ratings than others users.
Some items may systematically receive higher ratings that other items.
Ratings may change over time.
For global effects consider:
1. average rating for an item
2. average rating for a user
3. number of ratings for an item
4. number of ratings for a user
![Page 16: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/16.jpg)
Parameter Estimation for Global Effects
Sequentially estimate one effect at a time. At each step use the residuals from the previous step.
![Page 17: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/17.jpg)
Parameter Estimation
For sparse data user parameter estimate is not reliable because there is too much variance.
Shrink the parameter based on how many ratings it had.
This is an example of Bayesian shrinkage.
![Page 18: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/18.jpg)
Approximation in Estimating User Parameter
Koren-Bell use a simpler estimator used for the user parameter.
The user parameter can be estimated from the posterior mean.
![Page 19: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/19.jpg)
Examples of Global Effects
Results are from the Probe dataset 1.4 million ratings.
Allows rating to change over time and based on average rating or support.
Effects are processed sequentially.
![Page 20: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/20.jpg)
Neighborhood Relationships ModelOverview
Compares item-item similarity.
Defines a model that can predict a rating given weights.
Sets up an optimization problem to derive the weights.
Accounts for sparsity by shrinking ratings with low support toward the common mean.
Jointly derives the weights using a QP solver.
![Page 21: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/21.jpg)
Neighborhood Relationships Model
Find the K most similar items using similarity function
Provides a prediction for a rating once the weights have been calculated
Computing the weights simultaneously is a least squares optimization problem.
![Page 22: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/22.jpg)
Optimal Weights
A is K x K matrix that stores item-item ratings by different users.
b is a vector with ratings from similar items multiplied by the current item.
This ignores many pairwiserelationships among ratings by the same user
![Page 23: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/23.jpg)
Neighborhood Relationships Model
Normalizes A and b by dividing it by the number of users that rated items i and j.
![Page 24: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/24.jpg)
Neighborhood Relationship Model
This steps shrinks ratings that have not been rated by many users.
![Page 25: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/25.jpg)
Preprocessing1. Calculate similarity
between each item pairs.
2. Shrink sij based on then number of ratings
3. Normalize A based on the number of users rated each pair of items
Preprocessing is quadratic in number of items, linear in ratings
![Page 26: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/26.jpg)
Calculation of the weights
The weights are constrained to be non-negative. This makes it a QP optimization problem.
Determine the weights by using a quadratic algorithm.
Algorithm uses Gradient Projection method.
![Page 27: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/27.jpg)
Rating process
1. Compute item-item similarity score
2. Compute item- item matrix
3. Given a user and an item, find the neighboring K items.
4. Use the QP solver to solve K x K system of equations which derives the weights.
5. Compute the rating for that specific user and item.
![Page 28: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/28.jpg)
User Oriented computation
Many internet recommendation system will have more users than items.
If a user has not rated items similar to an item, we can find other users that are similar and derive a rating from them.
Users can passively provide other information to the system: transactions, web page views use this information to find similar users
![Page 29: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/29.jpg)
Low dimensional embedding of the users
![Page 30: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/30.jpg)
Regularizing R
![Page 31: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/31.jpg)
Iteratively perform SVD
![Page 32: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/32.jpg)
Unifying user and item-relations in single model
A different user v might ratings might be a good predictor for some items but for all items
Gives higher weight to items that are similar to i.
![Page 33: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/33.jpg)
Experimental Study
The experiments were run on the Netflix Quiz and Probe test sets.
The test sets contain results from users that do not rate very much which makes it more difficult to predict.
The results measured the RMSE.
![Page 34: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/34.jpg)
Experiments with the Probe set1. Jointly interpolated weights performed better than
standard correlation based weights.2. Neighborhood model improved predictions after all data
normalizations. 3. Neighborhood size of K depended on the amount of
previous normalization. 4. Neighborhood interpolation had diminishing returns
after global effects are removed.
![Page 35: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/35.jpg)
Experiments with the Quiz set
Uses information derived from the Probe set.
Results are better than Probe Set.
RMSE is .9086 when global effects are removed.
RMSE is .8982 when factorization is applied.
![Page 36: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/36.jpg)
Experiments with the user-oriented approach
User-User experiments have a larger neighborhood size of 100.
Low dimensionality embedding improved query
Accuracy is not as good as the item-item. RMSE is 0.9157 after global effects are removed.
Future approach that uses a mixture model of user-user and item-item to provide ratings
![Page 37: New Approaches with Netflix Datasetcseweb.ucsd.edu/~elkan/291spring2008/matt.pdf · Collaborative Filtering approach CF uses ratings from other users to recommend items. Useful when](https://reader034.vdocuments.net/reader034/viewer/2022042404/5f1acba0905cec692d576428/html5/thumbnails/37.jpg)
Summary
1. Remove global effects.
2. Find Interpolation weights for the neighbors simultaneously, by solving an optimization problem
3. Reduce the dimensionality of users by performing an iterative version of SVD.
Sources
[BK07], [BKV07], [Adomvicious, Tuzihilan IEEE 2005]