Data-driven modeling: Lecture 09


Data-driven modeling
APAM E4990

Jake Hofman

Columbia University

April 2, 2012


Personalized recommendations


http://netflixprize.com
http://netflixprize.com/rules
http://netflixprize.com/faq

Netflix prize: results

http://en.wikipedia.org/wiki/Netflix_Prize

See [TJB09] and [Kor09] for more gory details.

Recommendation systems

High-level approaches:

• Content-based methods (e.g., $w_{\text{genre: thrillers}} = +2.3$, $w_{\text{director: coen brothers}} = +1.7$)

• Collaborative methods (e.g., “Users who liked this also liked”)


Netflix prize: data

(userid, movieid, rating, date)

(movieid, year, title)

Collaborative filtering

Memory-based (e.g., k-nearest neighbors)

Model-based (e.g., matrix factorization)

http://research.yahoo.com/pub/2859


Problem statement

• Given a set of past ratings $R_{ui}$ that user $u$ gave item $i$
  • Users may explicitly assign ratings, e.g., $R_{ui} \in [1, 5]$ is the number of stars for a movie rating
  • Or we may infer implicit ratings from user actions, e.g., $R_{ui} = 1$ if $u$ purchased $i$; otherwise $R_{ui} = ?$

• Make recommendations of several forms
  • Predict unseen item ratings for a particular user
  • Suggest items for a particular user
  • Suggest items similar to a particular item
  • . . .

• Compare to natural baselines
  • Guess the global average for item ratings
  • Suggest globally popular items


k-nearest neighbors

Key intuition: Take a local popularity vote amongst “similar” users


k-nearest neighbors: User similarity

Quantify similarity as a function of users’ past ratings, e.g.

• Fraction of items $u$ and $v$ have in common (Jaccard similarity):

$$S_{uv} = \frac{|r_u \cap r_v|}{|r_u \cup r_v|} = \frac{\sum_i R_{ui} R_{vi}}{\sum_i \left( R_{ui} + R_{vi} - R_{ui} R_{vi} \right)} \qquad (1)$$

• Angle between rating vectors (cosine similarity):

$$S_{uv} = \frac{r_u \cdot r_v}{|r_u|\,|r_v|} = \frac{\sum_i R_{ui} R_{vi}}{\sqrt{\sum_i R_{ui}^2 \sum_j R_{vj}^2}} \qquad (2)$$

Retain the top-$k$ most similar neighbors $v$ for each user $u$
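As a concrete illustration, here is a minimal Python sketch of both similarity measures, assuming ratings are stored as per-user dictionaries mapping item ids to rating values (the data layout and function names are my own, not from the slides):

```python
import math

def jaccard_similarity(r_u, r_v):
    """Fraction of items rated by either user that both rated (eq. 1)."""
    items_u, items_v = set(r_u), set(r_v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def cosine_similarity(r_u, r_v):
    """Cosine of the angle between two sparse rating vectors (eq. 2)."""
    dot = sum(r_u[i] * r_v[i] for i in set(r_u) & set(r_v))
    norm_u = math.sqrt(sum(x * x for x in r_u.values()))
    norm_v = math.sqrt(sum(x * x for x in r_v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy example: two users with one movie in common
ratings_u = {"movie_a": 5, "movie_b": 3}
ratings_v = {"movie_b": 4, "movie_c": 2}
print(jaccard_similarity(ratings_u, ratings_v))  # 1/3
print(cosine_similarity(ratings_u, ratings_v))   # 12 / (sqrt(34) * sqrt(20))
```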


k-nearest neighbors: Predicted ratings

Predict unseen ratings $\hat{R}_{ui}$ as a weighted vote over $u$’s neighbors’ ratings for item $i$:

$$\hat{R}_{ui} = \frac{\sum_v R_{vi} S_{uv}}{\sum_v S_{uv}} \qquad (3)$$
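A minimal sketch of this weighted vote, assuming the neighbors’ ratings for item $i$ and the similarities $S_{uv}$ have already been computed (the function and variable names are hypothetical):

```python
def predict_rating(neighbor_ratings, similarities):
    """Weighted vote over neighbors' ratings (eq. 3).

    neighbor_ratings: dict mapping neighbor id v -> R_vi
    similarities:     dict mapping neighbor id v -> S_uv
    """
    num = sum(similarities[v] * r_vi for v, r_vi in neighbor_ratings.items())
    den = sum(similarities[v] for v in neighbor_ratings)
    return num / den if den else None  # no similar neighbor rated i

# Example: three neighbors rated the item
print(predict_rating({"v1": 5, "v2": 3, "v3": 4},
                     {"v1": 0.9, "v2": 0.1, "v3": 0.5}))  # ≈ 4.53
```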


k-nearest neighbors: Practical notes

We expect most pairs of users to have nothing in common, so calculate similarities by iterating over items:

for each item i:
    for all pairs of users (u, v) that have rated i:
        calculate S_uv (if not already calculated)
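A runnable version of this loop might look like the sketch below, which builds an inverted index from items to the users who rated them so that only co-rating pairs are ever touched; the toy data and the choice of Jaccard similarity are my own assumptions:

```python
from collections import defaultdict
from itertools import combinations

# Toy ratings: (user, item) pairs; in practice these come from the full log
ratings = [("u1", "i1"), ("u2", "i1"), ("u1", "i2"), ("u3", "i2"), ("u2", "i2")]

users_by_item = defaultdict(set)
items_by_user = defaultdict(set)
for u, i in ratings:
    users_by_item[i].add(u)
    items_by_user[u].add(i)

# Only user pairs that co-rated at least one item are ever considered
similarity = {}
for i, users in users_by_item.items():
    for u, v in combinations(sorted(users), 2):
        if (u, v) not in similarity:
            common = items_by_user[u] & items_by_user[v]
            union = items_by_user[u] | items_by_user[v]
            similarity[(u, v)] = len(common) / len(union)

print(similarity)  # {('u1', 'u2'): 1.0, ('u1', 'u3'): 0.5, ('u2', 'u3'): 0.5}
```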


k-nearest neighbors: Practical notes

Alternatively, we can make recommendations using an item-based approach [LSY03]:

• Compute similarities $S_{ij}$ between all pairs of items

• Predict ratings with a weighted vote: $\hat{R}_{ui} = \sum_j R_{uj} S_{ij} \big/ \sum_j S_{ij}$
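This is the same weighted vote with the roles of users and items swapped; a minimal sketch, assuming the item-item similarities $S_{ij}$ are given:

```python
def predict_rating_item_based(user_ratings, item_sims):
    """Item-based weighted vote: sum_j R_uj * S_ij / sum_j S_ij.

    user_ratings: dict mapping item id j -> R_uj (items u has rated)
    item_sims:    dict mapping item id j -> S_ij (similarity of j to target i)
    """
    num = sum(item_sims.get(j, 0.0) * r for j, r in user_ratings.items())
    den = sum(item_sims.get(j, 0.0) for j in user_ratings)
    return num / den if den else None

print(predict_rating_item_based({"i1": 4, "i2": 2}, {"i1": 0.8, "i2": 0.2}))  # 3.6
```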


k-nearest neighbors: Practical notes

Several (relatively) simple ways to scale:

• Sample a subset of ratings for each user (by, e.g., recency)

• Use MinHash to cluster users [DDGR07]

• Distribute calculations with MapReduce


Matrix factorization

Key intuition: Model item attributes as belonging to a set of unobserved “topics”, and user preferences across these “topics”


Matrix factorization: Linear model

Start with a simple linear model:

$$\hat{R}_{ui} = \underbrace{b_0}_{\text{global average}} + \underbrace{b_u}_{\text{user bias}} + \underbrace{b_i}_{\text{item bias}} \qquad (4)$$


Matrix factorization: Linear model

For example, we might predict that a harsh critic would score a popular movie as

$$\hat{R}_{ui} = \underbrace{3.6}_{\text{global average}} + \underbrace{(-0.5)}_{\text{user bias}} + \underbrace{0.8}_{\text{item bias}} = 3.9 \qquad (5)$$
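The slides do not spell out how to fit these biases, but one common and simple approach (an assumption here, not the slides’ prescription) is successive averaging: take the global mean, then per-user mean residuals, then per-item mean residuals:

```python
import numpy as np

# Toy ratings matrix with NaN for unobserved entries
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 2.0]])

b0 = np.nanmean(R)                                      # global average
b_user = np.nanmean(R - b0, axis=1)                     # per-user mean residual
b_item = np.nanmean(R - b0 - b_user[:, None], axis=0)   # per-item mean residual

# Baseline prediction for every (user, item) pair, as in eq. (4)
R_hat = b0 + b_user[:, None] + b_item[None, :]
print(np.round(R_hat, 2))
```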


Matrix factorization: Low-rank approximation

Add an interaction term:

$$\hat{R}_{ui} = \underbrace{b_0}_{\text{global average}} + \underbrace{b_u}_{\text{user bias}} + \underbrace{b_i}_{\text{item bias}} + \underbrace{W_{ui}}_{\text{user-item interaction}} \qquad (6)$$

where $W_{ui} = p_u \cdot q_i = \sum_k P_{uk} Q_{ik}$

• $P_{uk}$ is user $u$’s preference for topic $k$

• $Q_{ik}$ is item $i$’s association with topic $k$


Matrix factorization: Loss function

Measure quality of the model fit with squared loss:

$$L = \sum_{(u,i)} \left( \hat{R}_{ui} - R_{ui} \right)^2 \qquad (7)$$

$$= \sum_{(u,i)} \left( \left[ P Q^T \right]_{ui} - R_{ui} \right)^2 \qquad (8)$$
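A short NumPy sketch of this loss, assuming $R$ is a dense matrix with NaN marking unobserved entries (the masking convention is my own):

```python
import numpy as np

def squared_loss(P, Q, R):
    """Squared error between P @ Q.T and R over observed entries (eqs. 7-8)."""
    mask = ~np.isnan(R)          # observed (u, i) pairs
    R_hat = P @ Q.T              # low-rank reconstruction
    return np.sum((R_hat[mask] - R[mask]) ** 2)

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 2))      # user-topic preferences
Q = rng.normal(size=(4, 2))      # item-topic associations
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, np.nan],
              [np.nan, 2.0, 2.0, 5.0]])
print(squared_loss(P, Q, R))
```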


Matrix factorization: Optimization

The loss is non-convex in $(P, Q)$ jointly, so we are not guaranteed to find a global minimum

Instead we can optimize $L$ iteratively, e.g.:

• Alternating least squares: update each row of $P$ holding $Q$ fixed, and vice versa

• Stochastic gradient descent: update individual rows $p_u$ and $q_i$ for each observed $R_{ui}$


Matrix factorization: Alternating least squares

$L$ is convex in the rows of $P$ with $Q$ fixed, and in the rows of $Q$ with $P$ fixed, so alternate solutions to the normal equations:

$$p_u = \left[ Q^{(u)T} Q^{(u)} \right]^{-1} Q^{(u)T} r^{(u)} \qquad (9)$$

$$q_i = \left[ P^{(i)T} P^{(i)} \right]^{-1} P^{(i)T} r^{(i)} \qquad (10)$$

where:

• $Q^{(u)}$ is the item association matrix restricted to items rated by user $u$

• $P^{(i)}$ is the user preference matrix restricted to users that have rated item $i$

• $r^{(u)}$ are the ratings by user $u$ and $r^{(i)}$ are the ratings on item $i$
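A minimal NumPy sketch of one round of these updates, ignoring the bias terms and assuming NaN marks missing ratings; a production implementation would also add a small ridge term to keep the normal equations well conditioned:

```python
import numpy as np

def als_round(P, Q, R):
    """One alternating least squares pass over the observed entries of R."""
    for u in range(R.shape[0]):
        rated = ~np.isnan(R[u])                                  # items rated by user u
        Qu = Q[rated]                                            # Q^(u)
        P[u] = np.linalg.solve(Qu.T @ Qu, Qu.T @ R[u, rated])    # eq. (9)
    for i in range(R.shape[1]):
        rated = ~np.isnan(R[:, i])                               # users who rated item i
        Pi = P[rated]                                            # P^(i)
        Q[i] = np.linalg.solve(Pi.T @ Pi, Pi.T @ R[rated, i])    # eq. (10)
    return P, Q

# Toy 4x4 ratings matrix, rank-2 factors
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [1.0, np.nan, 5.0, 4.0]])
rng = np.random.default_rng(0)
P, Q = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
for _ in range(20):
    P, Q = als_round(P, Q, R)
print(np.round(P @ Q.T, 1))  # reconstruction; observed entries fit closely
```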


Matrix factorization: Stochastic gradient descent

Alternatively, we can avoid inverting matrices by taking steps in the direction of the negative gradient for each observed rating:

$$p_u \leftarrow p_u - \eta \frac{\partial L}{\partial p_u} = p_u + \eta \left( R_{ui} - \hat{R}_{ui} \right) q_i \qquad (11)$$

$$q_i \leftarrow q_i - \eta \frac{\partial L}{\partial q_i} = q_i + \eta \left( R_{ui} - \hat{R}_{ui} \right) p_u \qquad (12)$$

for some step size $\eta$ (with the constant factor of 2 absorbed into $\eta$)
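A corresponding stochastic gradient descent sketch, again without bias terms; the learning rate, epoch count, and toy data are illustrative assumptions:

```python
import numpy as np

def sgd_epoch(P, Q, ratings, eta=0.05):
    """One pass over observed ratings, applying eqs. (11) and (12)."""
    for u, i, r_ui in ratings:
        err = r_ui - P[u] @ Q[i]      # R_ui - R_hat_ui
        p_u = P[u].copy()             # use the old p_u in both updates
        P[u] += eta * err * Q[i]      # eq. (11)
        Q[i] += eta * err * p_u       # eq. (12)
    return P, Q

# Toy data: (user index, item index, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(3, 2))
Q = rng.normal(scale=0.1, size=(3, 2))
for _ in range(200):
    P, Q = sgd_epoch(P, Q, ratings)
print([round(float(P[u] @ Q[i]), 2) for u, i, _ in ratings])  # ≈ [5, 3, 4, 1, 2]
```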


Matrix factorization: Practical notes

Several ways to scale:

• Distribute matrix operations with MapReduce [GHNS11]

• Parallelize stochastic gradient descent [ZWSL10]

• Expectation-maximization for pLSI with MapReduce [DDGR07]


Datasets

• Movielens: http://www.grouplens.org/node/12

• Reddit: http://bit.ly/redditdata

• CU “million songs”: http://labrosa.ee.columbia.edu/millionsong/

• Yahoo Music KDD Cup: http://kddcup.yahoo.com/

• AudioScrobbler: http://bit.ly/audioscrobblerdata

• Delicious: http://bit.ly/deliciousdata

• . . .


Photo recommendations

http://koala.sandbox.yahoo.com


References

[DDGR07] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. 2007.

[GHNS11] R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. 2011.

[Kor09] Yehuda Koren. The BellKor solution to the Netflix Grand Prize. August 2009.

[LSY03] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.

[TJB09] A. Töscher, M. Jahrer, and R. M. Bell. The BigChaos solution to the Netflix Grand Prize. 2009.

[ZWSL10] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In Neural Information Processing Systems (NIPS), 2010.
