LSH for Prediction Problem in Recommendation
Maruf Aytekin
PhD Student
Computer Engineering Department, Bahcesehir University
May 5, 2015
Outline
• User-based
• Item-based
• LSH
• Parameters
• Model Build Performance
• Accuracy Performance
• LSH Parameters
Data Set
• Total ratings: 100,000
• Number of users: 943
• Number of items: 1,682
• Sparsity: 0.0630
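The sparsity value matches the number of ratings divided by the number of possible user-item pairs. A quick check of that arithmetic, assuming this is the definition used here:

```python
# Fraction of user-item pairs that carry a rating (assumed definition of "sparsity" here).
total_ratings = 100_000
num_users = 943
num_items = 1_682

sparsity = total_ratings / (num_users * num_items)  # 100000 / 1586126
print(f"{sparsity:.4f}")
```

Under this definition, roughly 6.3% of the user-item matrix is filled, so about 93.7% of the entries are missing.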
Evaluation Methods
• We use the holdout cross-validation method for the experiments.
• We randomly select 5% of the data for testing and 5% for validation.
• We repeat this process 3 times and average the results.
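The evaluation protocol above can be sketched as a random 5% / 5% / 90% split repeated three times. This is a minimal sketch, assuming ratings are stored as `(user, item, rating)` tuples; `holdout_split`, `repeated_holdout`, and the `evaluate` callback are hypothetical names, not the author's code.

```python
import random


def holdout_split(ratings, test_frac=0.05, val_frac=0.05, seed=None):
    """Randomly split (user, item, rating) tuples into train / validation / test lists."""
    rng = random.Random(seed)
    shuffled = ratings[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def repeated_holdout(ratings, evaluate, repeats=3):
    """Run the hypothetical evaluate(train, val, test) on `repeats` random splits and average."""
    scores = [evaluate(*holdout_split(ratings, seed=r)) for r in range(repeats)]
    return sum(scores) / len(scores)
```

Seeding each repetition separately keeps the three splits different from one another while still making the experiment reproducible.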
User-based
Neighbors can have different levels of similarity.
• Wuv: similarity of users u and v
• rvi: rating of user v for item i
• Ni(u): set of neighbors of user u who have rated item i
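With these symbols, a common form of the user-based prediction is the similarity-weighted average r̂ui = Σ Wuv·rvi / Σ |Wuv|, with both sums over v in Ni(u). The sketch below assumes that form; the similarity function is passed in because the choice (cosine, Pearson, …) is not fixed here, and `predict_user_based` is a hypothetical name.

```python
def predict_user_based(u, i, ratings, sim, k=30):
    """Predict r_ui as a similarity-weighted average of neighbor ratings.

    ratings: dict user -> dict item -> rating
    sim:     function (u, v) -> similarity W_uv (assumed to be supplied by the caller)
    k:       number of neighbors to keep
    """
    # N_i(u): the k users most similar to u among those who rated item i
    raters = [v for v in ratings if v != u and i in ratings[v]]
    neighbors = sorted(raters, key=lambda v: sim(u, v), reverse=True)[:k]

    num = sum(sim(u, v) * ratings[v][i] for v in neighbors)
    den = sum(abs(sim(u, v)) for v in neighbors)
    return num / den if den else None  # None when no usable neighbor exists
```

Normalizing by Σ |Wuv| keeps the prediction on the rating scale even when neighbors have very different similarity levels.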
Item-based
• ruj: rating of user u for item j
• Nu(i): the items rated by user u that are most similar to item i
• Wij: similarity of items i and j
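The item-based counterpart swaps the roles of users and items: r̂ui = Σ Wij·ruj / Σ |Wij|, with both sums over j in Nu(i). A minimal sketch under the same assumptions as the user-based case (caller-supplied similarity, hypothetical function name):

```python
def predict_item_based(u, i, ratings, sim, k=30):
    """Predict r_ui from the items u has rated that are most similar to item i.

    ratings: dict user -> dict item -> rating
    sim:     function (i, j) -> item similarity W_ij (assumed to be supplied by the caller)
    k:       number of similar items to keep
    """
    # N_u(i): the k items rated by u that are most similar to item i
    rated = [j for j in ratings[u] if j != i]
    neighbors = sorted(rated, key=lambda j: sim(i, j), reverse=True)[:k]

    num = sum(sim(i, j) * ratings[u][j] for j in neighbors)
    den = sum(abs(sim(i, j)) for j in neighbors)
    return num / den if den else None
```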
Locality Sensitive Hashing
[Figure: AND-construction with L = 2 hash tables (t = 1, 2) and K = 4 hash functions per table, each outputting a bit in {0, 1}. Users U1 … Um are hashed into buckets with 4-bit keys (e.g. bucket 1 key: 0101, bucket 2 key: 1110, bucket 3 key: 1101, bucket 4 key: 1001). The users that share a bucket with U5 (e.g. U2, U6, U1, U3) form its candidate set C(U5).]
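The figure's scheme (K one-bit hash functions concatenated into a bucket key per table, L tables, candidates collected across tables) can be sketched as follows. The hash family is an assumption: the slides do not name it, so this sketch uses random-hyperplane (sign) hashing, one common choice for cosine similarity, and represents each user as a dense rating vector. `build_lsh_tables` and `candidates` are hypothetical names.

```python
import random
from collections import defaultdict


def build_lsh_tables(vectors, dim, K=4, L=2, seed=0):
    """Hash each user vector into L tables, each keyed by K sign bits (AND-construction).

    vectors: dict user -> list[float] of length dim (e.g. a rating row, 0 for unrated)
    """
    rng = random.Random(seed)
    # One random hyperplane per bit: K bits per table, L tables.
    planes = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(K)] for _ in range(L)]

    def key(vec, t):
        # Each hash function outputs a bit in {0, 1}: the side of the hyperplane vec lies on.
        return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in planes[t])

    tables = [defaultdict(set) for _ in range(L)]
    for user, vec in vectors.items():
        for t in range(L):
            tables[t][key(vec, t)].add(user)
    return tables, key


def candidates(user, vec, tables, key):
    """C(user): users sharing a bucket with `user` in at least one table."""
    found = set()
    for t, table in enumerate(tables):
        found |= table.get(key(vec, t), set())  # .get avoids creating empty buckets
    found.discard(user)
    return found
```

Requiring all K bits to match (AND) makes each bucket selective, while taking the union over L tables (OR) recovers similar users that one table alone would miss.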
LSH for Prediction
• L: number of hash tables (bands)
• Cvi(t): the set of candidate users retrieved from hash table t who have rated item i
• rvi: rating of user v (in C) on item i
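The prediction formula itself is not recoverable here. One plausible reading of the definitions above, assumed for this sketch, is the plain average of rvi over the candidates from each of the L tables who rated item i. Summing table by table means a user retrieved from several tables counts several times, which acts as a rough similarity weight. `predict_lsh` is a hypothetical name.

```python
def predict_lsh(u, i, ratings, buckets_of_u):
    """Predict r_ui as the mean rating of item i among u's LSH candidates (assumed form).

    ratings:      dict user -> dict item -> rating
    buckets_of_u: list of L sets; buckets_of_u[t] is the bucket u falls into in table t
    """
    total, count = 0.0, 0
    for bucket in buckets_of_u:                        # t = 1 .. L
        for v in bucket:                               # candidates from table t
            if v != u and i in ratings.get(v, {}):     # ... that have rated item i
                total += ratings[v][i]
                count += 1
    return total / count if count else None           # None: no candidate rated i
```

No similarity weights are computed, so the prediction cost depends only on the bucket sizes rather than on the full user set.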
Computational Complexity
• |U|: user set size
• |I|: item set size
• k: number of neighbors used in the predictions
• p: maximum number of ratings per user
• q: maximum number of ratings per item
Parameters (CF)
LSH Parameters
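How K and L trade recall against candidate-set size can be reasoned about with the standard AND/OR analysis. If a single hash function agrees on a pair with probability s (for random-hyperplane hashing, s = 1 − θ/π for angle θ), the pair shares a bucket in one table with probability s^K, and becomes a candidate in at least one of L tables with probability 1 − (1 − s^K)^L. A small sketch of that curve:

```python
def candidate_probability(s, K, L):
    """Probability that a pair becomes a candidate under K-bit AND and L-table OR."""
    return 1 - (1 - s ** K) ** L


# Larger K makes buckets stricter (fewer, more similar candidates: faster, possibly less accurate);
# larger L adds tables (more candidates: better recall, slower).
for K, L in [(4, 2), (4, 8), (8, 8)]:
    print(K, L, [round(candidate_probability(s, K, L), 3) for s in (0.2, 0.5, 0.8)])
```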
Model Build Time
Results: User-based
With the optimal k = 30 and Y = 7:
• Average MAE: 0.79527
• Average running time: 9.437 seconds
We compare these results with the LSH method.
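The accuracy figure above is the mean absolute error between predicted and actual ratings on the test set. A minimal sketch; skipping items for which no prediction could be made is an assumption, since the slides do not say how such cases were handled.

```python
def mae(predictions):
    """Mean absolute error over (predicted, actual) pairs; pairs without a prediction are skipped."""
    errors = [abs(p - a) for p, a in predictions if p is not None]
    return sum(errors) / len(errors)


print(mae([(3.5, 4), (2, 1), (None, 5)]))  # (0.5 + 1) / 2
```

The reported average MAE would then be the mean of this value over the three holdout repetitions.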
LSH & User-based: Hash Functions
LSH & User-based: Hash Tables
Conclusion
• LSH greatly improved scalability.
• Accuracy decreased, but within an acceptable range.
• Running time improved substantially.
• LSH parameters need to be tuned to balance MAE against running time according to the system's requirements.
Source Code: User-based Prediction
Source Code: LSH Prediction
Q&A