Matchbox Large Scale Online Bayesian Recommendations David Stern, Thore Graepel, Ralf Herbrich Online Services and Advertising Group MSR Cambridge


Post on 20-Feb-2016


Page 1: Matchbox Large Scale Online Bayesian Recommendations

Matchbox Large Scale Online Bayesian Recommendations

David Stern, Thore Graepel, Ralf Herbrich

Online Services and Advertising Group

MSR Cambridge

Page 2: Matchbox Large Scale Online Bayesian Recommendations

Overview

• Motivation.
• Message Passing on Factor Graphs.
• Matchbox model.
• Feedback models.
• Accuracy.
• Recommendation Speed.

Page 3: Matchbox Large Scale Online Bayesian Recommendations
Page 4: Matchbox Large Scale Online Bayesian Recommendations

Large scale personal recommendations

User Item

Page 5: Matchbox Large Scale Online Bayesian Recommendations

Collaborative Filtering

[Figure: user-item rating matrix with users A-D as rows and items 1-6 as columns; unknown ratings are marked "?". Caption: Metadata?]

Page 6: Matchbox Large Scale Online Bayesian Recommendations

Goals

• Large Scale Personal Recommendations:
  – Products.
  – Services.
  – People.
• Leverage user and item metadata.
• Flexible feedback:
  – Ratings.
  – Clicks.
• Incremental Training.

Page 7: Matchbox Large Scale Online Bayesian Recommendations

factor graphs

Page 8: Matchbox Large Scale Online Bayesian Recommendations

factor graphs

Page 9: Matchbox Large Scale Online Bayesian Recommendations
Page 10: Matchbox Large Scale Online Bayesian Recommendations

Factor Graphs / Trees

• Definition: Graphical representation of product structure of a function (Wiberg, 1996)
  – Nodes: Factors and Variables.
  – Edges: Dependencies of factors on variables.
• Question:
  – What are the marginals of the function (all but one variable are summed out)?
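The marginal question can be made concrete with a toy example. The sketch below is not from the slides: the three factors `f1`, `f2`, `f3` and their values are hypothetical. It computes the marginal of one variable twice, once by summing the full product and once by passing a message along the chain, to show that both give the same answer.

```python
from itertools import product

# Hypothetical function f(a, b, c) = f1(a) * f2(a, b) * f3(b, c) on binary variables.
f1 = {0: 0.6, 1: 0.4}
f2 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
f3 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5}


def marginal_c_brute_force():
    """Sum the full product over every variable except c."""
    out = {0: 0.0, 1: 0.0}
    for a, b, c in product((0, 1), repeat=3):
        out[c] += f1[a] * f2[(a, b)] * f3[(b, c)]
    return out


def marginal_c_message_passing():
    """Push the sums inside the product: a message flows a -> b, then b -> c."""
    msg_to_b = {b: sum(f1[a] * f2[(a, b)] for a in (0, 1)) for b in (0, 1)}
    return {c: sum(msg_to_b[b] * f3[(b, c)] for b in (0, 1)) for c in (0, 1)}


if __name__ == "__main__":
    print(marginal_c_brute_force())
    print(marginal_c_message_passing())
```

On a tree-structured graph the message-passing version scales with the size of the largest factor rather than with the number of joint configurations.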

Page 11: Matchbox Large Scale Online Bayesian Recommendations

Factor Graphs and Inference

• Bayes’ law
• Factorising prior
• Factorising likelihood
• Sum out latent variables
• Message Passing

[Figure: factor graph with variable nodes s1, s2, t1, t2, d and y.]
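The formulas for this slide did not survive in the transcript. As a hedged reconstruction, assuming the graph is a chain from each s_i to t_i, then to d, then to the observation y (which the node labels suggest but the transcript does not confirm), the listed steps could be written as:

```latex
\begin{align}
p(s \mid y) &= \frac{p(y \mid s)\, p(s)}{p(y)}
  && \text{(Bayes' law)} \\
p(s) &= \prod_{i} p(s_i)
  && \text{(factorising prior)} \\
p(y \mid s) &= \int\!\!\int p(y \mid d)\, p(d \mid t_1, t_2)
  \prod_{i} p(t_i \mid s_i)\; \mathrm{d}t\, \mathrm{d}d
  && \text{(factorising likelihood, latent variables summed out)}
\end{align}
```

Message passing evaluates these nested integrals factor by factor instead of over the joint space.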

Page 12: Matchbox Large Scale Online Bayesian Recommendations

Gaussian Message Passing

[Figure: density plots on axes from -5 to 5 showing products of messages (density * density = result); one result is marked "?".]
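The core operation of Gaussian message passing is multiplying two Gaussian densities, which yields another (unnormalised) Gaussian. A minimal sketch of that rule follows; the function name `multiply_gaussians` is an illustrative choice, not something taken from the talk.

```python
def multiply_gaussians(mean1, var1, mean2, var2):
    """Return mean and variance of the Gaussian proportional to N(mean1, var1) * N(mean2, var2).

    Precisions (inverse variances) add, and the new mean is the
    precision-weighted average of the two input means.
    """
    precision = 1.0 / var1 + 1.0 / var2
    mean = (mean1 / var1 + mean2 / var2) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Two unit-variance messages centred at 0 and 2.
    print(multiply_gaussians(0.0, 1.0, 2.0, 1.0))
```

When a factor produces a non-Gaussian result, an approximation step is needed to project it back onto a Gaussian.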

Page 13: Matchbox Large Scale Online Bayesian Recommendations

the model

Page 14: Matchbox Large Scale Online Bayesian Recommendations

Matchbox With Metadata

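[Figure: factor graph of the Matchbox model. User metadata (Male, British, ID=234) is weighted by u01, u02, u11, u21, u12, u22 and summed (+) into user 'trait' 1 (s1) and user 'trait' 2 (s2). Item metadata (Camera, SLR) is weighted by v11, v21, v12, v22 and summed (+) into item traits t1 and t2. A product factor (*) combines user and item traits into the rating potential r.]

Rating potential ~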
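The structure in the figure (metadata weights summed into traits, traits multiplied into a rating potential) can be sketched with point estimates. This is an illustration only: in the model the weights carry Gaussian uncertainty and the rating potential has a noise term, neither of which is shown here, and all numeric values below are hypothetical.

```python
def traits(weights, features):
    """Map a metadata feature vector into trait space.

    weights[k][j] is the (hypothetical) contribution of feature j to trait k.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def rating_potential(user_weights, user_features, item_weights, item_features):
    """Inner product of user traits and item traits."""
    s = traits(user_weights, user_features)
    t = traits(item_weights, item_features)
    return sum(sk * tk for sk, tk in zip(s, t))


if __name__ == "__main__":
    # Features as indicator vectors: user = [Male, British, ID=234], item = [Camera, SLR].
    user_weights = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]  # two traits, three features
    item_weights = [[1.0, 0.5], [-0.5, 1.0]]             # two traits, two features
    print(rating_potential(user_weights, [1, 1, 1], item_weights, [1, 1]))
```

Because users and items land in the same trait space, a new user or item with known metadata gets a sensible prediction before any ratings are observed.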

Page 15: Matchbox Large Scale Online Bayesian Recommendations

User/Item Trait Space

[Figure: scatter plot of users and items in a two-dimensional trait space (Trait 1 against Trait 2, both axes from -2.5 to 2.5). Labelled movies: The Big Lebowski, Lost in Translation, Behind Enemy Lines, Pearl Harbor. Highlighted region: ‘Preference Cone’ for user 145035.]

Page 16: Matchbox Large Scale Online Bayesian Recommendations

Incremental Training with ADF

[Figure: user-item rating matrix with users A-D as rows and items 1-6 as columns.]
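Assumed density filtering (ADF) processes one observation at a time: the current Gaussian belief is combined with the new observation, and the result is projected back onto a Gaussian by matching mean and variance. The sketch below applies this to a single scalar with binary feedback (+1 or -1), assuming a probit-style likelihood in which the label is the sign of the latent value plus Gaussian noise. That likelihood, the function names and the default `noise_var` are assumptions made for illustration; the slides do not give these details.

```python
import math


def pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def adf_step(mean, var, label, noise_var=1.0):
    """Absorb one binary observation (label is +1 or -1) by moment matching."""
    c = math.sqrt(var + noise_var)
    t = label * mean / c
    v = pdf(t) / cdf(t)   # mean shift of a truncated standard normal
    w = v * (v + t)       # fraction of variance removed by the truncation
    new_mean = mean + label * (var / c) * v
    new_var = var * (1.0 - (var / (c * c)) * w)
    return new_mean, new_var


def train_online(labels, mean=0.0, var=1.0):
    """Stream through the data once; each posterior becomes the next prior."""
    for label in labels:
        mean, var = adf_step(mean, var, label)
    return mean, var


if __name__ == "__main__":
    print(train_online([+1, +1, -1, +1]))
```

Each observation is visited once and then discarded, which is what makes incremental training possible. The `cdf(t)` term can underflow for very surprising observations, so a production version would need a numerically safer ratio.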

Page 17: Matchbox Large Scale Online Bayesian Recommendations

feedback models

Page 18: Matchbox Large Scale Online Bayesian Recommendations

Feedback Models

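[Figure: factor graph connecting the rating potential r and a variable q to observed feedback through factors labelled ">0" and "=3".]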

Page 19: Matchbox Large Scale Online Bayesian Recommendations

Feedback Models

[Figure: factor graph comparing r, via q, against thresholds t0, t1, t2, t3 with the comparisons >, >, <, <.]
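The pattern ">, >, <, <" against thresholds t0 to t3 reads as an ordinal observation: the latent rating lies above the first two thresholds and below the last two. As a rough illustration of how thresholds turn a Gaussian rating potential into probabilities over rating levels, the sketch below uses a standard ordinal probit calculation. It is a stand-in for intuition, not the message-passing scheme of the talk, and the threshold values and `noise_var` are hypothetical.

```python
import math


def cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def level_probabilities(mean, var, thresholds, noise_var=1.0):
    """Probability of each ordinal level for a rating potential N(mean, var).

    Level k is observed when the noisy rating falls between consecutive
    thresholds; four thresholds therefore define five levels.
    """
    scale = math.sqrt(var + noise_var)
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    cumulative = [cdf((edge - mean) / scale) for edge in edges]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


if __name__ == "__main__":
    probs = level_probabilities(0.4, 0.5, [-1.5, -0.5, 0.5, 1.5])
    for level, p in enumerate(probs):
        print(level, round(p, 3))
```

The same thresholds could in principle be learned per user, so that the model adapts to users who rate generously or harshly.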

Page 20: Matchbox Large Scale Online Bayesian Recommendations

accuracy

Page 21: Matchbox Large Scale Online Bayesian Recommendations

Performance and Accuracy

Netflix Data
• 100 million ratings
• 17,770 movies / 400,000 users
• Parallelisation with locking: 8 cores 4x faster

MovieLens Data
• 1 million ratings
• 3,900 movies / 6,040 users
• User / movie metadata

Page 22: Matchbox Large Scale Online Bayesian Recommendations

MovieLens – 1,000,000 ratings

• User ID: 6040 users
• Movie ID: 3900 movies
• User Job: Other, Academic, Artist, Admin, Student, Customer Service, Health Care, Managerial, Farmer, Homemaker, Lawyer, Programmer, Retired, Sales, Scientist, Self-Employed, Technician, Craftsman, Unemployed, Writer
• User Age: <18, 18-25, 25-34, 35-44, 45-49, 50-55, >55
• User Gender: Male, Female
• Movie Genre: Action, Adventure, Animation, Children’s, Comedy, Crime, Documentary, Drama, Fantasy, Horror, Musical, Mystery, Romance, Thriller, Sci-Fi, War, Western, Film Noir

Page 23: Matchbox Large Scale Online Bayesian Recommendations

MovieLens Training Time: 5 Minutes

Page 24: Matchbox Large Scale Online Bayesian Recommendations

Netflix – 100,000,000 ratings

• 17,770 Movies, 400,000 Users.
• Training Time 2 hours (8 cores: 4X speedup).
• 14,000 ratings per second.

Number of Trait Dimensions    RMSE
Cinematch                     0.9514
2                             0.941
5                             0.930
10                            0.924
20                            0.916
30                            0.914
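For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. The sketch below defines it and a helper that expresses an RMSE as a relative reduction against the Cinematch figure from the table. The helper name `improvement_over_baseline` is an illustrative choice, and the ratings in the usage lines are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating sequences."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_over_baseline(rmse_value, baseline=0.9514):
    """Relative RMSE reduction versus the Cinematch value quoted in the table."""
    return (baseline - rmse_value) / baseline


if __name__ == "__main__":
    print(rmse([3.0, 4.0], [4.0, 4.0]))  # made-up ratings
    for dims, value in [(2, 0.941), (5, 0.930), (10, 0.924), (20, 0.916), (30, 0.914)]:
        print(dims, f"{100 * improvement_over_baseline(value):.2f}%")
```

With 30 trait dimensions the reduction works out to roughly 3.9% relative to Cinematch.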

Page 25: Matchbox Large Scale Online Bayesian Recommendations

recommendation speed

Page 26: Matchbox Large Scale Online Bayesian Recommendations

Prediction Speed

• Goal: find N items with highest predicted rating.
• Challenge: potentially have to consider all items.
• Two approaches to make this faster:
  – Locality Sensitive Hashing
  – KD Trees
• No Locality Sensitive Hash for inner product?
• Approximate KD trees best so far.
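The baseline that the faster approaches try to beat is an exhaustive scan: score every item by the inner product of user and item traits and keep the best N. A minimal sketch, with hypothetical item names and trait values:

```python
import heapq


def top_n_items(user_traits, item_traits, n):
    """Score every item by inner product with the user and return the n best ids.

    item_traits maps an item id to its trait vector; cost is linear in the
    number of items, which is the challenge named above.
    """
    scored = (
        (sum(s * t for s, t in zip(user_traits, traits)), item_id)
        for item_id, traits in item_traits.items()
    )
    return [item_id for _, item_id in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    items = {"a": [1.0, 1.0], "b": [-1.0, 0.0], "c": [0.0, 2.0]}
    print(top_n_items([1.0, 0.5], items, 2))
```

`heapq.nlargest` keeps only n candidates in memory, so the scan itself is cheap per item; the cost comes from having to touch every item.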

Page 27: Matchbox Large Scale Online Bayesian Recommendations

Approximate KD Trees

• Approximate KD Trees.
• Best-First Search.
• Limit Number of Buckets to Search.
• Non-Optimised F# code: 100ns per item.
• Work in progress...

0.25s budget: can recommend 2,500,000 items.
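The three ingredients listed on this slide (a KD tree, best-first search, a cap on the number of buckets) can be combined as in the sketch below. It is written in Python for illustration and is not the F# code mentioned on the slide. Ranking tree nodes by an upper bound of the inner product over each node's bounding box is an assumption about how "best-first" could be realised, and all names are hypothetical.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build(points, ids, leaf_size=4, depth=0):
    """Build a KD tree; every node stores the bounding box of its points."""
    dims = len(points[0])
    lo = [min(points[i][d] for i in ids) for d in range(dims)]
    hi = [max(points[i][d] for i in ids) for d in range(dims)]
    if len(ids) <= leaf_size:
        return {"lo": lo, "hi": hi, "ids": ids}  # leaf = bucket
    axis = depth % dims
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return {"lo": lo, "hi": hi,
            "left": build(points, ordered[:mid], leaf_size, depth + 1),
            "right": build(points, ordered[mid:], leaf_size, depth + 1)}


def upper_bound(node, q):
    """Largest inner product with q that any point in the node's box could reach."""
    return sum(max(qd * l, qd * h) for qd, l, h in zip(q, node["lo"], node["hi"]))


def top_n(tree, points, q, n, max_buckets):
    """Best-first search that stops after visiting max_buckets leaves."""
    frontier = [(-upper_bound(tree, q), 0, tree)]  # max-heap via negated bound
    counter, visited, best = 1, 0, []              # best: min-heap of (score, id)
    while frontier and visited < max_buckets:
        neg_bound, _, node = heapq.heappop(frontier)
        if len(best) == n and -neg_bound < best[0][0]:
            break  # no remaining node can beat the current top n
        if "ids" in node:
            visited += 1
            for i in node["ids"]:
                item = (dot(points[i], q), i)
                if len(best) < n:
                    heapq.heappush(best, item)
                elif item > best[0]:
                    heapq.heapreplace(best, item)
        else:
            for child in (node["left"], node["right"]):
                heapq.heappush(frontier, (-upper_bound(child, q), counter, child))
                counter += 1
    return sorted(best, reverse=True)


def items_within_budget(budget_ns, cost_per_item_ns):
    """How many items can be scored in the time budget."""
    return budget_ns // cost_per_item_ns


if __name__ == "__main__":
    random.seed(0)
    pts = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
    tree = build(pts, list(range(len(pts))))
    print(top_n(tree, pts, [0.3, -0.7, 0.5], n=5, max_buckets=3))
    print(items_within_budget(250_000_000, 100))  # 0.25 s at 100 ns per item
```

With an unlimited bucket budget the search is exact; lowering `max_buckets` trades accuracy for speed. The last helper reproduces the slide's arithmetic: 0.25 s divided by 100 ns per item gives 2,500,000 items.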

Page 28: Matchbox Large Scale Online Bayesian Recommendations

conclusions

Page 29: Matchbox Large Scale Online Bayesian Recommendations

Conclusions

• Integration of Collaborative Filtering with Content information.
• Fast, incremental training.
• Users and items compared in the same space.
• Flexible feedback model.
• Bayesian probabilistic approach.