
G54DMT – Data Mining Techniques and Applications

http://www.cs.nott.ac.uk/~jqb/G54DMT

Dr. Jaume Bacardit

Topic 4: Applications
Lecture 1: The Netflix Challenge

Some material taken from http://en.wikipedia.org/wiki/Netflix_Prize, http://www.flickr.com/photos/chef_ele/3791293142/sizes/o/in/set-72157621825510293/, http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf and http://arxiv.org/abs/0911.0460


Outline

• The challenge and its assessment
• Timeline of progress
• Recommendation methods
• Matrix Factorisation techniques
• Ensemble methods
• Lessons learnt
• Resources

The Netflix Challenge

• Netflix is an online video rental company

• One of its most relevant components is its movie recommendation system
– Suggests movies to users based on their past ratings

• In 2006 Netflix made its recommendation database public

• It challenged the community to produce a new recommender that was 10% better (in RMSE) than its own method, Cinematch

• Winner would get $1M

Training data

• Movie ratings collected from 1998 to 2005
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• Training data divided into
– Training set (99,072,112 ratings)
– Probe set (1,408,395 ratings)

• Each rating was a quadruplet <user,movie,date of rating,rating>

• Very sparse data: the number of ratings is a very small fraction of users × movies (100,480,507 out of about 8.53 billion possible user–movie pairs, roughly 1.2%)

Test data

• Qualifying data were triplets <user,movie,date of rating>

• Qualifying set (2,817,131 ratings) consisting of:
– Test set (1,408,789 ratings), used to determine winners
– Quiz set (1,408,342 ratings), used to calculate leaderboard scores

• Participants did not know which instances were part of the test set and which part of the quiz set

• Test, quiz and probe sets were created to have similar statistical properties

Assessment

• Error on the quiz and test sets was computed as Root Mean Squared Error (RMSE), rounded to 4 decimal places

• RMSE of the Cinematch system (Netflix's own predictor) = 0.9525
– Target RMSE = 0.8572 (a 10% improvement)

• Once a participant reaches the target RMSE, a 30-day “last call” period starts

• At the end of the 30 days, the participant with lowest test RMSE is declared the winner

• In case of ties, the prize goes to the earliest entry
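The RMSE and the percentage improvement over Cinematch used throughout the challenge can be sketched in Python as below. This is an illustrative sketch, not Netflix's scoring code; the names `rmse`, `improvement` and `CINEMATCH_RMSE` are hypothetical, and the ratings in the example are made up.

```python
import math

CINEMATCH_RMSE = 0.9525  # Cinematch RMSE as quoted on the Assessment slide

def rmse(predicted, actual):
    """Root Mean Squared Error between two equal-length sequences of ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mse)

def improvement(rmse_value, baseline=CINEMATCH_RMSE):
    """Percentage reduction of an RMSE relative to the Cinematch baseline."""
    return 100.0 * (baseline - rmse_value) / baseline

# Made-up predictions and true ratings, for illustration only
print(round(rmse([3.5, 4.0, 2.0], [4, 4, 1]), 4))
# The target RMSE corresponds to roughly a 10% improvement
print(round(improvement(0.8572), 2))
```

Read the other way round, the 10.06% test-set improvement reported later corresponds to an RMSE of about 0.9525 × (1 − 0.1006) ≈ 0.8567.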

Progress in the challenge

• Data released on October 2nd, 2006
• On October 8th a participant already had a better RMSE than Cinematch
• The 2007 progress prize was awarded to BellKor with an improvement of 8.43%
• The 2008 progress prize was awarded to “BellKor in BigChaos” with an improvement of 9.44%
• On June 26th, 2009, the team "BellKor's Pragmatic Chaos" achieved an improvement of 10.05%, and the “last call” period started

Progress: Last call period

• On July 25, 2009 the team "The Ensemble", a merger of the teams "Grand Prize Team" and "Opera Solutions and Vandelay United", achieved a 10.09% improvement

• After the last call period ended, two teams led the quiz leaderboard:
– "The Ensemble" with a 10.10% improvement
– "BellKor's Pragmatic Chaos" with a 10.09% improvement

• On the test set both teams were tied with an improvement of 10.06%

• BellKor's Pragmatic Chaos was declared the winner because they had submitted their entry 20 minutes before The Ensemble


Recommender systems: Content Filtering

• Collect background information about users and movies to generate a profile of each of them
– Users: demographic information
– Movies: genre, actors, box office results

• Produce recommendations by matching the profiles of users and movies

• Costly, as this information is often difficult to collect or simply not available
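A minimal sketch of profile matching, assuming that users and movies are both described by the same hypothetical genre vector (`GENRES`) and that "matching" means cosine similarity between the two profiles. A real content-based system would build user profiles from demographics or other background data; the titles and numbers below are invented.

```python
import math

GENRES = ["action", "comedy", "drama", "sci-fi"]  # hypothetical profile dimensions

def cosine(a, b):
    """Cosine similarity of two profile vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_profile, movie_profiles, k=2):
    """Return the k movies whose profiles best match the user's profile."""
    scores = {m: cosine(user_profile, p) for m, p in movie_profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Invented profiles: 1 = movie belongs to the genre
movies = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [0, 1, 0, 0],
    "Movie C": [1, 0, 1, 0],
}
user = [0.9, 0.1, 0.2, 0.8]  # hypothetical genre affinities of one user
print(recommend(user, movies))
```

The cost noted above shows up directly: every entry of `movies` and `user` has to be collected from outside the rating data.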

Recommender systems: collaborative filtering

• Generate predictions of ratings based only on the past behavior of the users

• No background domain knowledge required
• Easier to generate the models
• Difficult to start up when not enough ratings are available (the cold-start problem)

Collaborative filtering: neighbourhood methods

• Compute relationships between items or between users

• Identify which movies are similar to each other, based on their receiving similar ratings from the same users

• Hierarchical clustering showing the similarities between 5,000 movies

http://www.the-ensemble.com/content/netflix-prize-movie-similarity-visualization
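The neighbourhood idea can be sketched as item-item collaborative filtering: the similarity of two movies is measured on the users who rated both, and a missing rating is predicted from the user's own ratings of the most similar movies. Adjusted cosine similarity (ratings centred on each user's mean), the neighbourhood size `k` and the toy data are assumptions for illustration, not the method of any particular team.

```python
import math

def user_means(ratings):
    """Mean rating of each user; ratings[user][movie] = rating."""
    return {u: sum(r.values()) / len(r) for u, r in ratings.items()}

def item_similarity(ratings, means, i, j):
    """Adjusted cosine similarity of movies i and j over users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    di = [ratings[u][i] - means[u] for u in common]
    dj = [ratings[u][j] - means[u] for u in common]
    norm = math.sqrt(sum(a * a for a in di)) * math.sqrt(sum(b * b for b in dj))
    return sum(a * b for a, b in zip(di, dj)) / norm if norm else 0.0

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the user's ratings on the k most similar movies."""
    means = user_means(ratings)
    neighbours = sorted(
        ((item_similarity(ratings, means, item, j), r)
         for j, r in ratings[user].items() if j != item),
        reverse=True,
    )[:k]
    positive = [(s, r) for s, r in neighbours if s > 0]
    if not positive:
        return None  # no informative neighbour
    return sum(s * r for s, r in positive) / sum(s for s, _ in positive)

# Toy ratings (not Netflix data); dave has not rated m1
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 2},
    "carol": {"m1": 1, "m2": 2, "m3": 5},
    "dave":  {"m2": 5, "m3": 1},
}
print(predict(ratings, "dave", "m1"))
```

Here m1 behaves like m2 and opposite to m3 across alice, bob and carol, so dave's high rating of m2 drives the prediction for m1.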



Collaborative filtering: latent factor models

• Automatically map users and movies into a new space of latent factors (the same space for both)

Matrix Factorisation methods

• Most successful of the latent factor methods
• These methods generate a vector $q_i \in \mathbb{R}^f$ for each item i and a vector $p_u \in \mathbb{R}^f$ for each user u

• A prediction is the dot product of both vectors: $\hat{r}_{ui} = q_i^\top p_u$

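• The problem of finding the vectors q and p for each movie and user is defined as the following optimisation problem:

$$\min_{q^*,\,p^*} \sum_{(u,i)\in\kappa} \left[ \left(r_{ui} - q_i^\top p_u\right)^2 + \lambda\left(\lVert q_i\rVert^2 + \lVert p_u\rVert^2\right) \right]$$

where $\kappa$ is the training set of (u, i) pairs with a known rating, $r_{ui}$ is the actual rating, $q_i^\top p_u$ is the predicted rating, and the term weighted by $\lambda$ is a regularisation term (avoids overfitting)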

Optimisation methods

• Stochastic gradient descent
– Iteratively samples training examples, computes the prediction errors and adjusts the vectors of the involved user and item accordingly

• Alternating least squares
– The original optimisation problem is not convex (both q and p are unknown), and hence cannot in general be solved to optimality
– If either p or q is fixed, the problem becomes convex (quadratic) and can be solved using least squares methods
– This method alternates between two steps, each of which fixes either p or q and solves for the other
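Both optimisation strategies can be sketched with NumPy on the regularised squared-error objective of matrix factorisation, in which λ(‖q_i‖² + ‖p_u‖²) is added once per known rating. Because λ is counted once per rating, the least-squares solve for a user uses λ·n_u on the diagonal, where n_u is that user's number of ratings (and likewise for items). The function names, learning rate, λ, number of factors and toy ratings are illustrative assumptions.

```python
import numpy as np

def objective(ratings, P, Q, lam):
    """Sum over known (u, i, r) of (r - q_i.p_u)^2 + lam * (|q_i|^2 + |p_u|^2)."""
    return sum((r - Q[i] @ P[u]) ** 2 + lam * (Q[i] @ Q[i] + P[u] @ P[u])
               for u, i, r in ratings)

def sgd_epoch(ratings, P, Q, lr, lam, rng):
    """One pass of stochastic gradient descent, visiting ratings in random order."""
    for idx in rng.permutation(len(ratings)):
        u, i, r = ratings[idx]
        err = r - Q[i] @ P[u]                      # prediction error e_ui
        q_old = Q[i].copy()
        Q[i] += lr * (err * P[u] - lam * Q[i])     # gradient step for the item vector
        P[u] += lr * (err * q_old - lam * P[u])    # gradient step for the user vector

def als_half_step(ratings, A, B, lam, side):
    """Fix B and solve each row of A by regularised least squares.

    side=0 updates user vectors (A=P, B=Q); side=1 updates item vectors (A=Q, B=P).
    """
    f = A.shape[1]
    lhs = np.zeros((A.shape[0], f, f))
    rhs = np.zeros((A.shape[0], f))
    counts = np.zeros(A.shape[0])
    for triple in ratings:
        a, b, r = triple[side], triple[1 - side], triple[2]
        lhs[a] += np.outer(B[b], B[b])
        rhs[a] += r * B[b]
        counts[a] += 1
    for a in np.nonzero(counts)[0]:                # rows without ratings stay unchanged
        A[a] = np.linalg.solve(lhs[a] + lam * counts[a] * np.eye(f), rhs[a])

def als_step(ratings, P, Q, lam):
    als_half_step(ratings, P, Q, lam, side=0)      # fix q, solve for every p_u
    als_half_step(ratings, Q, P, lam, side=1)      # fix p, solve for every q_i

# Toy (user, item, rating) triples; not Netflix data
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
lam, f = 0.05, 2
rng = np.random.default_rng(0)
P0 = rng.normal(scale=0.1, size=(3, f))            # initial user vectors p_u
Q0 = rng.normal(scale=0.1, size=(3, f))            # initial item vectors q_i
loss_start = objective(ratings, P0, Q0, lam)

P, Q = P0.copy(), Q0.copy()
for _ in range(300):
    sgd_epoch(ratings, P, Q, lr=0.02, lam=lam, rng=rng)
loss_sgd = objective(ratings, P, Q, lam)

P, Q = P0.copy(), Q0.copy()
for _ in range(10):
    als_step(ratings, P, Q, lam)
loss_als = objective(ratings, P, Q, lam)
print(loss_start, loss_sgd, loss_als)
```

The trade-off is visible in the code: SGD needs a learning rate and many cheap passes, while ALS needs no learning rate but solves an f×f linear system per user and per item at every step, and each ALS half-step can only lower (never raise) the objective.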

Bias in the models

• Not all movies receive the same distribution of ratings
– Some are more popular

• Not all users give the same distribution of ratings
– Some users are stricter than others

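• Refinement of the model introducing bias terms:

$$\hat{r}_{ui} = \mu + b_i + b_u + q_i^\top p_u$$

where $\mu$ is the average overall rating, $b_i$ is the bias of item i and $b_u$ is the bias of user u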
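One simple way to estimate the bias terms, sketched under assumptions: μ is the global mean, each item bias is a shrunken average of that item's residuals r_ui − μ, and each user bias is a shrunken average of r_ui − μ − b_i. The shrinkage constants `lam_i` and `lam_u`, the helper names and the toy data are hypothetical. In the full model r̂_ui = μ + b_i + b_u + q_iᵀp_u the biases can instead be learned jointly with q and p, for example by SGD.

```python
from collections import defaultdict

def baseline_biases(ratings, lam_i=25.0, lam_u=10.0):
    """Estimate mu, b_i and b_u from (user, item, rating) triples by shrunken averages."""
    mu = sum(r for _, _, r in ratings) / len(ratings)    # average overall rating
    item_sum, item_n = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_n[i] += 1
    b_item = {i: item_sum[i] / (lam_i + item_n[i]) for i in item_sum}
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_item[i]
        user_n[u] += 1
    b_user = {u: user_sum[u] / (lam_u + user_n[u]) for u in user_sum}
    return mu, b_item, b_user

def predict_baseline(mu, b_item, b_user, u, i):
    """mu + b_i + b_u; unseen users or items get a bias of 0."""
    return mu + b_item.get(i, 0.0) + b_user.get(u, 0.0)

# Toy data: m1 is rated above average and m2 below (made up)
ratings = [("ann", "m1", 5.0), ("bob", "m1", 4.0), ("ann", "m2", 2.0), ("cat", "m2", 1.0)]
mu, b_item, b_user = baseline_biases(ratings, lam_i=1.0, lam_u=1.0)
print(mu, b_item, b_user)
print(predict_baseline(mu, b_item, b_user, "bob", "m2"))
```

The shrinkage pulls the biases of movies or users with few ratings towards 0, so a single rating cannot produce an extreme bias.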

Additional input sources

• Implicit feedback
– Users do not rate movies at random: which movies a user chose to rate (regardless of the rating given) reveals their preferences

• Demographic information
– If available

Temporal dynamics

• Ratings change through time
• Users
– May change tastes
– May give more/less strict ratings in different periods of time

• Movies
– Blockbusters may fade in popularity
– Cult movies may become more popular
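One way to let a movie's popularity drift, sketched under assumptions, is to split the time axis into bins and estimate a per-movie offset for each bin, so that the item bias becomes b_i(t) = b_i + b_{i,Bin(t)}. The bin width, start date, shrinkage constant and helper names are hypothetical rather than the scheme of any particular team; the sketch consumes the <user, movie, date of rating, rating> quadruplets described for the training data.

```python
from collections import defaultdict
from datetime import date

def time_bin(d, start=date(1998, 1, 1), days_per_bin=70):
    """Index of the fixed-width time bin containing date d (hypothetical binning)."""
    return (d - start).days // days_per_bin

def item_time_biases(quads, mu, b_item, lam=25.0):
    """Shrunken average of r - mu - b_i for every (movie, time bin) pair."""
    s, n = defaultdict(float), defaultdict(int)
    for _, i, d, r in quads:
        key = (i, time_bin(d))
        s[key] += r - mu - b_item.get(i, 0.0)
        n[key] += 1
    return {k: s[k] / (lam + n[k]) for k in s}

def item_bias_at(i, d, b_item, b_item_bin):
    """b_i(t) = b_i + b_{i,Bin(t)}, with 0 for unseen movies or bins."""
    return b_item.get(i, 0.0) + b_item_bin.get((i, time_bin(d)), 0.0)

# Toy quadruplets: a movie rated highly early on and poorly years later (made up)
quads = [("u1", "m1", date(1999, 3, 1), 5.0), ("u2", "m1", date(1999, 3, 9), 5.0),
         ("u3", "m1", date(2005, 6, 1), 2.0), ("u4", "m1", date(2005, 6, 20), 2.0)]
mu, b_item = 3.5, {"m1": 0.0}
bins = item_time_biases(quads, mu, b_item, lam=2.0)
print(item_bias_at("m1", date(1999, 3, 5), b_item, bins))
print(item_bias_at("m1", date(2005, 6, 10), b_item, bins))
```

A fading blockbuster then gets a positive offset in early bins and a negative one in later bins, while its static bias b_i is unchanged.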

Impact of all components of the model (BellKor)

Ensemble methods

• The methods of all top participants combined (blended) the predictions of hundreds of models of many types
– Matrix Factorisation
– Neighbourhood methods
– Restricted Boltzmann Machines

• Many ways of combining the models
– Linear combinations
– Neural networks
– Regression trees

Basic linear regression method

• Need to optimise the vector of weights, one weight associated with each model

• Can use e.g. the least squares method for this, fitting the weights on the probe set

• How to choose the models to include in the ensemble?
– Forward method: start with one, keep adding models until the probe set error degrades
– Backward method: start with all, keep removing models while the probe set error improves
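The linear blend and the forward method can be sketched with NumPy least squares, assuming each model's probe-set predictions are stored as one column of a matrix. Least-squares weights fitted and scored on the same rows can never get worse as columns are added, so this sketch (an assumption, not a procedure stated on the slides) fits the weights on even-indexed probe rows and measures RMSE on the odd-indexed ones. The three "models" are simulated.

```python
import numpy as np

def fit_blend(preds, target):
    """Least-squares blending weights; w[0] is an intercept, w[1:] one weight per model."""
    X = np.column_stack([np.ones(len(target)), preds])
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w

def blend(w, preds):
    return w[0] + preds @ w[1:]

def holdout_rmse(preds, target, cols):
    """Fit weights on even-indexed rows, report RMSE on odd-indexed rows."""
    fit, val = slice(0, None, 2), slice(1, None, 2)
    w = fit_blend(preds[fit][:, cols], target[fit])
    err = blend(w, preds[val][:, cols]) - target[val]
    return float(np.sqrt(np.mean(err ** 2)))

def forward_selection(preds, target):
    """Greedily add the model that most lowers held-out RMSE; stop when none does."""
    selected, best = [], float("inf")
    remaining = list(range(preds.shape[1]))
    while remaining:
        err, m = min((holdout_rmse(preds, target, selected + [m]), m) for m in remaining)
        if err >= best:
            break
        selected.append(m)
        remaining.remove(m)
        best = err
    return selected, best

# Simulated probe set: two noisy but useful models and one unrelated model
rng = np.random.default_rng(0)
n = 2000
truth = rng.uniform(1, 5, n)
preds = np.column_stack([
    truth + rng.normal(0, 0.9, n),   # model 0
    truth + rng.normal(0, 1.0, n),   # model 1
    rng.uniform(1, 5, n),            # model 2: unrelated to the truth
])
print(forward_selection(preds, truth))
```

The backward method would be the mirror image: start from all columns and drop the one whose removal lowers the held-out RMSE most, until no removal helps.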

Feature-Weighted Linear Stacking

• Method from “The Ensemble”
• Not all models are suitable for all kinds of movies/users
• Generate a set of “meta-features” for each instance that are used to calibrate the weights of the linear combination specifically for each case:

$$b(x) = \sum_{i}\sum_{j} v_{ij}\, f_j(x)\, g_i(x)$$

• $v_{ij}$ = weight associated with feature j for model i

• $f_j(x)$ = value of feature j for instance x

• $g_i(x)$ = prediction of model i for instance x
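Since b(x) = Σ_i Σ_j v_ij f_j(x) g_i(x) is linear in the weights v_ij, FWLS can be fitted as an ordinary (here slightly ridge-regularised) least-squares regression on the products f_j(x)·g_i(x). The sketch below is a minimal illustration under assumptions: the meta-feature ("support"), the ridge constant and the two simulated models are hypothetical. Including a constant meta-feature of 1 makes plain linear stacking a special case, which the example uses for comparison.

```python
import numpy as np

def fwls_design(G, F):
    """Design matrix with one column f_j(x) * g_i(x) per (model i, meta-feature j)."""
    n, m = G.shape
    return (G[:, :, None] * F[:, None, :]).reshape(n, m * F.shape[1])

def fit_fwls(G, F, y, ridge=1e-3):
    """Solve for the weights v_ij by ridge-regularised least squares."""
    X = fwls_design(G, F)
    v = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    return v.reshape(G.shape[1], F.shape[1])      # V[i, j] = v_ij

def predict_fwls(V, G, F):
    """b(x) = sum_i sum_j v_ij f_j(x) g_i(x) for every instance x."""
    return np.einsum("ij,ni,nj->n", V, G, F)

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Simulated data: model 0 is accurate when the hypothetical meta-feature
# "support" is high, model 1 when it is low
rng = np.random.default_rng(0)
n = 2000
support = rng.uniform(0, 1, n)
y = rng.uniform(1, 5, n)
G = np.column_stack([
    y + rng.normal(0, 1, n) * (1 - support),
    y + rng.normal(0, 1, n) * support,
])
F = np.column_stack([np.ones(n), support])        # constant feature + meta-feature

V = fit_fwls(G, F, y)
V_plain = fit_fwls(G, F[:, :1], y)                # constant weights: plain linear stacking
fwls_rmse = rmse(predict_fwls(V, G, F), y)
plain_rmse = rmse(predict_fwls(V_plain, G, F[:, :1]), y)
print(fwls_rmse, plain_rmse)
```

Both RMSEs here are measured on the data used for fitting; in a real blend the weights would be fitted on the probe set and judged on held-out data.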

Top 10 features (out of 25)

Lessons learnt from the challenge

• Well-defined competition (clear rules, instant feedback on progress, forums for discussion)

• Great collaboration between participants, sharing ideas and combining efforts

• Widened awareness of statistics and machine learning in mainstream society

• It provided a big challenge to the ML community, and in tackling it new science was done

http://justaguyinagarage.blogspot.com/2009/07/reflections-on-netflix-competition.html

Resources

• Challenge web page
• Very nice article about Matrix Factorisation
• Article on Feature-Weighted Linear Stacking
• Progress reports of
– BellKor
– BigChaos
– PragmaticTheory
• Web page of “The Ensemble”

http://www.netflixprize.com/index
