![Page 1: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/1.jpg)
Matchbox Large Scale Online Bayesian Recommendations
David Stern, Thore Graepel, Ralf Herbrich
Online Services and Advertising Group
MSR Cambridge
![Page 2: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/2.jpg)
Overview
• Motivation.
• Message Passing on Factor Graphs.
• Matchbox model.
• Feedback models.
• Accuracy.
• Recommendation Speed.
![Page 3: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/3.jpg)
![Page 4: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/4.jpg)
Large scale personal recommendations
[Figure: a user and an item.]
![Page 5: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/5.jpg)
Collaborative Filtering
[Figure: rating matrix over users A to D and items 1 to 6; unknown ratings are marked "?". Annotation: Metadata?]
![Page 6: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/6.jpg)
Goals
• Large Scale Personal Recommendations:
  – Products.
  – Services.
  – People.
• Leverage user and item metadata.
• Flexible feedback:
  – Ratings.
  – Clicks.
• Incremental Training.
![Page 7: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/7.jpg)
factor graphs
![Page 8: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/8.jpg)
![Page 9: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/9.jpg)
![Page 10: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/10.jpg)
Factor Graphs / Trees
• Definition: Graphical representation of product structure of a function (Wiberg, 1996)
  – Nodes: factors and variables.
  – Edges: dependencies of factors on variables.
• Question:
  – What are the marginals of the function (all but one variable are summed out)?
![Page 11: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/11.jpg)
Factor Graphs and Inference
• Bayes’ law
• Factorising prior
• Factorising likelihood
• Sum out latent variables
• Message Passing
[Figure: factor graph over the variables s, s1, s2, t1, t2, d and y.]
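As a generic sketch of these steps, the posterior over the latent variables can be written as below. The chain structure s_i → t_i → d → y is an assumption read off the variable labels in the figure, not something stated on the slide.

```latex
\begin{align}
  p(\mathbf{s} \mid y) &= \frac{p(y \mid \mathbf{s})\, p(\mathbf{s})}{p(y)}
    && \text{(Bayes' law)} \\
  p(\mathbf{s}) &= \prod_i p(s_i)
    && \text{(factorising prior)} \\
  p(y \mid \mathbf{s}) &= \int\!\!\int p(y \mid d)\, p(d \mid \mathbf{t})
    \prod_i p(t_i \mid s_i)\, \mathrm{d}\mathbf{t}\, \mathrm{d}d
    && \text{(factorising likelihood, latent variables summed out)}
\end{align}
```

Message passing evaluates these sums factor by factor along the edges of the graph instead of over the full joint distribution.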
![Page 12: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/12.jpg)
Gaussian Message Passing
[Figure: density plots on axes from -5 to 5 illustrating products of messages (* and =) and an approximation (≈) of a result by a Gaussian.]
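A minimal sketch of the two operations this slide illustrates, assuming one-dimensional Gaussians parameterised by mean and variance. The function names and the mixture example are hypothetical and not taken from the talk: multiplying two Gaussian messages stays Gaussian, while a non-Gaussian result can be approximated by matching its first two moments.

```python
def multiply(m1, v1, m2, v2):
    """Product of N(x; m1, v1) and N(x; m2, v2), renormalised to a Gaussian.

    Precisions (1/variance) add, and so do precision-weighted means.
    """
    precision = 1.0 / v1 + 1.0 / v2
    precision_mean = m1 / v1 + m2 / v2
    return precision_mean / precision, 1.0 / precision


def moment_match_mixture(components):
    """Approximate a Gaussian mixture by one Gaussian with equal mean and variance.

    components: list of (weight, mean, variance) with weights summing to 1.
    """
    mean = sum(w * m for w, m, _ in components)
    second_moment = sum(w * (v + m * m) for w, m, v in components)
    return mean, second_moment - mean * mean


# Exact case: the product of two Gaussian messages.
product = multiply(0.0, 1.0, 2.0, 1.0)

# Approximate case: a bimodal density projected onto a single Gaussian.
approximation = moment_match_mixture([(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)])
```

The product is narrower than either input because precisions add, whereas the moment-matched Gaussian is wider than either mixture component because it has to cover both modes.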
![Page 13: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/13.jpg)
the model
![Page 14: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/14.jpg)
Matchbox With Metadata
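[Figure: Matchbox factor graph. User metadata features (Male, British) and the user ID (ID=234) select weights u11, u21, u12, u22, u01 and u02, which are summed (+) into User ‘trait’ 1 (s1) and User ‘trait’ 2 (s2). Item metadata features (Camera, SLR) select weights v11, v21, v12 and v22, which are summed (+) into item traits t1 and t2. User and item traits are multiplied (*) to form the rating potential r.]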
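The structure in the figure can be sketched as follows, assuming point-valued weights for readability; the model itself keeps a belief distribution over every weight, which this sketch omits. All weight values and the helper name `traits` are hypothetical. Active binary features select weight rows, the rows are summed into trait vectors, and the rating potential is taken here as the inner product of the user and item trait vectors.

```python
def traits(weights, active_features):
    """Sum the weight rows of the active binary features into a trait vector."""
    num_traits = len(next(iter(weights.values())))
    return [
        sum(weights[feature][k] for feature in active_features)
        for k in range(num_traits)
    ]


# Hypothetical weights: one row per feature, one column per trait dimension.
user_weights = {
    "ID=234": [0.1, -0.2],
    "Male": [0.3, 0.0],
    "British": [-0.1, 0.4],
}
item_weights = {
    "Camera": [0.5, 0.2],
    "SLR": [0.2, -0.3],
}

s = traits(user_weights, ["ID=234", "Male", "British"])  # user traits s1, s2
t = traits(item_weights, ["Camera", "SLR"])              # item traits t1, t2

# Rating potential as the inner product of user and item traits.
r = sum(s_k * t_k for s_k, t_k in zip(s, t))
```

Because the user ID is just another feature, a new user with no ratings still gets a trait vector from metadata alone.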
![Page 15: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/15.jpg)
[Figure: User/Item Trait Space. Users and items plotted by Trait 1 against Trait 2, both axes running from -2.5 to 2.5. Labelled movies: The Big Lebowski, Lost in Translation, Behind Enemy Lines, Pearl Harbor. Annotation: ‘Preference Cone’ for user 145035.]
![Page 16: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/16.jpg)
Incremental Training with ADF
[Figure: rating matrix over users A to D and items 1 to 6.]
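A minimal sketch of assumed density filtering (ADF) on a single latent variable, to show what incremental training means here: each observation is absorbed once, the exact posterior is projected back onto a Gaussian by moment matching, and that Gaussian becomes the prior for the next observation. The binary observation model (the sign of the latent value plus Gaussian noise) and the name `adf_update` are illustrative assumptions, not the full Matchbox update.

```python
import math


def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def adf_update(mean, var, y, noise_var=1.0):
    """One ADF step for x ~ N(mean, var) after observing y = sign(x + noise).

    Returns the mean and variance of the Gaussian that matches the first two
    moments of the exact (non-Gaussian) posterior. y must be +1 or -1.
    """
    c = math.sqrt(var + noise_var)
    z = y * mean / c
    ratio = pdf(z) / cdf(z)  # may underflow for very surprising observations
    new_mean = mean + y * (var / c) * ratio
    new_var = var * (1.0 - (var / (c * c)) * ratio * (ratio + z))
    return new_mean, new_var


# Stream of hypothetical observations, processed one at a time.
belief = (0.0, 1.0)
history = [belief]
for y in [+1, +1, -1, +1]:
    belief = adf_update(belief[0], belief[1], y)
    history.append(belief)
```

No past observation is revisited, so the cost per observation is constant and the variance shrinks as evidence accumulates.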
![Page 17: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/17.jpg)
feedback models
![Page 18: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/18.jpg)
Feedback Models
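[Figure: feedback factors linking the latent rating r to the observed feedback q: a comparison factor (> 0) and an equality factor (= 3).]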
![Page 19: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/19.jpg)
Feedback Models
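[Figure: ordinal feedback model. The latent rating r is compared against thresholds t0, t1, t2 and t3 (here >, >, <, <) to give the observed feedback q.]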
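The threshold comparison can be sketched as below, assuming ordered thresholds and no noise on the latent rating or the thresholds. The threshold values and the function name are hypothetical. The observed level is the number of thresholds the latent rating exceeds.

```python
def ordinal_feedback(r, thresholds):
    """Compare latent rating r with ordered thresholds.

    Returns the ordinal level (number of thresholds exceeded) and the
    per-threshold comparison pattern.
    """
    comparisons = [">" if r > t else "<" for t in thresholds]
    return comparisons.count(">"), comparisons


# Hypothetical thresholds t0..t3 splitting the real line into five levels.
level, pattern = ordinal_feedback(0.2, [-1.5, -0.5, 0.5, 1.5])
```

With four thresholds the pattern >, >, <, < corresponds to the middle of five levels. In training the direction is reversed: an observed level fixes the pattern, which in turn constrains the latent rating.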
![Page 20: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/20.jpg)
accuracy
![Page 21: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/21.jpg)
Performance and Accuracy
Netflix Data
• 100 million ratings
• 17,700 movies / 400,000 users
• Parallelisation with locking: 8 cores 4x faster
MovieLens Data
• 1 million ratings
• 3,900 movies / 6,040 users
• User / movie metadata
![Page 22: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/22.jpg)
MovieLens – 1,000,000 ratings
• User ID: 6,040 users.
• Movie ID: 3,900 movies.
• User Job: Other, Lawyer, Academic, Programmer, Artist, Retired, Admin, Sales, Student, Scientist, Customer Service, Self-Employed, Health Care, Technician, Managerial, Craftsman, Farmer, Unemployed, Homemaker, Writer.
• User Age: <18, 18-25, 25-34, 35-44, 45-49, 50-55, >55.
• User Gender: Male, Female.
• Movie Genre: Action, Horror, Adventure, Musical, Animation, Mystery, Children’s, Romance, Comedy, Thriller, Crime, Sci-Fi, Documentary, War, Drama, Western, Fantasy, Film Noir.
![Page 23: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/23.jpg)
MovieLens
Training Time: 5 Minutes
![Page 24: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/24.jpg)
Netflix – 100,000,000 ratings
• 17,770 Movies, 400,000 Users.
• Training Time 2 hours (8 cores: 4X speedup).
• 14,000 ratings per second.
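| Model | RMSE |
|---|---|
| Cinematch | 0.9514 |
| Matchbox, 2 trait dimensions | 0.941 |
| Matchbox, 5 trait dimensions | 0.930 |
| Matchbox, 10 trait dimensions | 0.924 |
| Matchbox, 20 trait dimensions | 0.916 |
| Matchbox, 30 trait dimensions | 0.914 |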
![Page 25: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/25.jpg)
recommendation speed
![Page 26: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/26.jpg)
Prediction Speed
• Goal: find N items with highest predicted rating.
• Challenge: potentially have to consider all items.
• Two approaches to make this faster:
  – Locality Sensitive Hashing
  – KD Trees
• No Locality Sensitive Hash for inner product?
• Approximate KD trees best so far.
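For reference, the exhaustive baseline that the challenge refers to can be sketched as follows, assuming the predicted rating is the inner product of user and item trait vectors. The item identifiers and trait values are hypothetical. Every item is scored once, so the cost grows linearly with the catalogue size.

```python
import heapq


def top_n_brute_force(user_traits, item_traits, n):
    """Score every item by inner product and return the n best item ids."""
    scored = (
        (sum(u * v for u, v in zip(user_traits, traits)), item_id)
        for item_id, traits in item_traits.items()
    )
    return [item_id for _, item_id in heapq.nlargest(n, scored)]


# Hypothetical two-dimensional trait vectors.
catalogue = {
    "item_a": [1.0, 1.0],
    "item_b": [0.5, 1.5],
    "item_c": [-1.0, 0.5],
    "item_d": [-1.5, -0.5],
}
best_two = top_n_brute_force([1.0, 0.5], catalogue, n=2)
```

`heapq.nlargest` keeps only n candidates in memory, but every inner product is still computed, which is the cost the indexing approaches try to avoid.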
![Page 27: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/27.jpg)
Approximate KD Trees
• Approximate KD Trees.
• Best-First Search.
• Limit Number of Buckets to Search.
• Non-Optimised F# code: 100ns per item.
• Work in progress...
• 0.25s budget: can recommend from 2,500,000 items (2,500,000 × 100ns = 0.25s).
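One way to combine the three ideas above is sketched below, in Python rather than F#. It is an illustration under stated assumptions, not the implementation from the talk: each tree node stores a bounding box, the box gives an upper bound on the inner product with the query, nodes are expanded best-first by that bound, and the search stops after `max_buckets` leaf buckets. All names and parameters are hypothetical.

```python
import heapq
import random


def build(items, ids, bucket_size=8, depth=0):
    """Build a KD tree node over the item vectors selected by ids."""
    dims = len(items[ids[0]])
    node = {
        "lo": [min(items[i][d] for i in ids) for d in range(dims)],
        "hi": [max(items[i][d] for i in ids) for d in range(dims)],
        "ids": None,
        "children": None,
    }
    if len(ids) <= bucket_size:
        node["ids"] = ids  # leaf bucket
        return node
    axis = depth % dims
    ordered = sorted(ids, key=lambda i: items[i][axis])
    mid = len(ordered) // 2
    node["children"] = (
        build(items, ordered[:mid], bucket_size, depth + 1),
        build(items, ordered[mid:], bucket_size, depth + 1),
    )
    return node


def upper_bound(query, node):
    """Largest inner product any point inside the node's bounding box can reach."""
    return sum(max(q * lo, q * hi) for q, lo, hi in zip(query, node["lo"], node["hi"]))


def top_n(query, items, root, n, max_buckets):
    """Best-first search for the n largest inner products, visiting at most max_buckets leaves."""
    frontier = [(-upper_bound(query, root), 0, root)]  # max-heap via negated bound
    counter = 1   # tie-breaker so nodes themselves are never compared
    best = []     # min-heap of (score, item id)
    visited = 0
    while frontier and visited < max_buckets:
        neg_bound, _, node = heapq.heappop(frontier)
        if len(best) == n and -neg_bound <= best[0][0]:
            break  # no remaining node can improve the current top n
        if node["ids"] is not None:
            visited += 1
            for i in node["ids"]:
                score = sum(q * x for q, x in zip(query, items[i]))
                if len(best) < n:
                    heapq.heappush(best, (score, i))
                elif score > best[0][0]:
                    heapq.heapreplace(best, (score, i))
        else:
            for child in node["children"]:
                heapq.heappush(frontier, (-upper_bound(query, child), counter, child))
                counter += 1
    return [i for _, i in sorted(best, reverse=True)]


rng = random.Random(0)
items = [[rng.uniform(-2.5, 2.5), rng.uniform(-2.5, 2.5)] for _ in range(200)]
root = build(items, list(range(len(items))))
query = [1.0, 0.5]
approx = top_n(query, items, root, n=5, max_buckets=4)
exact = top_n(query, items, root, n=5, max_buckets=len(items))
```

With a bucket limit at least as large as the number of leaves the search is exact; a smaller limit trades accuracy for a fixed time budget.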
![Page 28: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/28.jpg)
conclusions
![Page 29: Matchbox Large Scale Online Bayesian Recommendations](https://reader036.vdocuments.net/reader036/viewer/2022062222/56815fd9550346895dcedc83/html5/thumbnails/29.jpg)
Conclusions
• Integration of Collaborative Filtering with Content information.
• Fast, incremental training.
• Users and items compared in the same space.
• Flexible feedback model.
• Bayesian probabilistic approach.