Recommender System
Yinghan Fu
Non-personalized recommendation
• Every customer gets the same recommendation.
Non-personalized recommendation
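Reddit comment score:
$$\frac{1}{1 + \frac{1}{n} z^2} \left[ \hat{p} + \frac{1}{2n} z^2 - z \sqrt{\frac{1}{n} \hat{p} (1 - \hat{p}) + \frac{1}{4n^2} z^2} \right]$$
$\hat{p}$: fraction of positive ratings among all ratings
$n$: total number of ratings
$z$: standard normal quantile for the chosen confidence level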
In a binomial distribution, the confidence interval (Wilson score interval) of $p$:
$$\frac{1}{1 + \frac{1}{n} z^2} \left[ \hat{p} + \frac{1}{2n} z^2 \pm z \sqrt{\frac{1}{n} \hat{p} (1 - \hat{p}) + \frac{1}{4n^2} z^2} \right]$$
http://www.redditblog.com/2009/10/reddits-new-comment-sorting-system.html
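The interval above can be sketched in a few lines of Python. This is a minimal illustration rather than Reddit's actual code: the function name `wilson_interval`, the default `z=1.96` (roughly a 95% interval), the handling of items with no ratings, and the sample comments are all assumptions made for the example.

```python
import math

def wilson_interval(positive, total, z=1.96):
    """Return (lower, upper) bounds of the Wilson score interval.

    positive: number of positive ratings; total: number of all ratings;
    z: standard normal quantile (1.96 is roughly a 95% interval).
    """
    if total == 0:
        # Assumption: with no ratings the interval is the whole range.
        return (0.0, 1.0)
    p_hat = positive / total
    scale = 1 / (1 + z * z / total)
    center = p_hat + z * z / (2 * total)
    spread = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return (scale * (center - spread), scale * (center + spread))

# Hypothetical comments: name -> (positive ratings, all ratings)
comments = {"a": (1, 1), "b": (90, 100), "c": (5, 10)}

# Rank by the lower bound, so a few lucky votes do not beat a long track record.
ranked = sorted(comments, key=lambda name: wilson_interval(*comments[name])[0], reverse=True)
print(ranked)  # ['b', 'c', 'a']
```

Comment "a" has a 100% positive fraction but only one rating, so its lower bound is small and it ranks last.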
Non-personalized recommendation
• Advantage
– Quick to calculate.
– Under the right context, can be accurate.
• Disadvantage
– Without the context, not so helpful.
Personalized recommendation
You open Amazon in the browser:
[Screenshot: the Amazon page shown on opening the site]
Why is this a failure compared with the Reddit comment ranking? No context!
Content-based recommendation
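Based on a book:
Me → Harry Potter and the Deathly Hallows → attribute "Magic"
Weight of an attribute $a$:
$$\mathrm{tf}(a) \cdot \mathrm{idf}(a), \qquad \mathrm{idf}(a) = \frac{1}{n(a)}$$
where $n(a)$ is the number of items carrying attribute $a$.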
http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
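A small sketch of how such attribute weights could drive a recommendation. Everything beyond the tf · idf idea is an assumption for illustration: the catalog and its tags are made up, tf is taken to be how often a tag occurs among the items the user liked, idf is one over the number of catalog items carrying the tag, and an unseen item is scored by summing the weights of its tags.

```python
from collections import Counter

# Hypothetical, manually labelled catalog: title -> set of tags
catalog = {
    "Harry Potter and the Deathly Hallows": {"magic", "fantasy"},
    "The Hobbit": {"magic", "fantasy", "adventure"},
    "Treasure Island": {"adventure"},
    "A Brief History of Time": {"physics"},
}
liked = ["Harry Potter and the Deathly Hallows"]

def tag_weights(catalog, liked):
    """Weight each tag by tf * idf, with idf = 1 / (number of items carrying the tag)."""
    items_per_tag = Counter(tag for tags in catalog.values() for tag in tags)
    tf = Counter(tag for title in liked for tag in catalog[title])
    return {tag: count / items_per_tag[tag] for tag, count in tf.items()}

def recommend(catalog, liked):
    """Rank unseen items by the summed weight of the tags they share with the profile."""
    weights = tag_weights(catalog, liked)
    unseen = [title for title in catalog if title not in liked]

    def score(title):
        return sum(weights.get(tag, 0.0) for tag in catalog[title])

    return sorted(unseen, key=score, reverse=True)

weights = tag_weights(catalog, liked)
recommendations = recommend(catalog, liked)
print(recommendations[0])  # The Hobbit
```

The example also shows the disadvantage listed for this approach: nothing works until someone has labelled every item in the catalog.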
• Advantage
– Quick to compute.
• Disadvantage
– Need manual labelling.
Collaborative filtering
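Rating matrix: rows are users $u_1, u_2, \dots$ and columns are items $v_1, v_2, v_3, v_4, \dots$. Each row lists only the ratings that user has given; the other entries are empty.
$u_1$: 4, 3, 5, …
$u_2$: 5, 5, 5, 4, ⋯
$u_3$: 2, 3, ⋯
$u_4$: 1, 4, 2, ⋯
$u_5$: 1, 1, ⋯
⋮
$u_i$: vector for user $i$; $v_j$: vector for item $j$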
Collaborative filtering
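• Item-item collaborative filtering
• $\hat{r}_{ij} = \dfrac{\sum_{j'} s(j, j')\, r_{ij'}}{\sum_{j'} s(j, j')}$, with $s(j, j') = k(v_j, v_{j'})$ (sums over the items $j'$ that user $i$ has rated)
• User-user collaborative filtering
• $\hat{r}_{ij} = \dfrac{\sum_{i'} s(i, i')\, r_{i'j}}{\sum_{i'} s(i, i')}$, with $s(i, i') = k(u_i, u_{i'})$ (sums over the users $i'$ who have rated item $j$)
• Slow to compute, more accurate for most situations.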
http://files.grouplens.org/papers/FnT%20CF%20Recsys%20Survey.pdf
http://grouplens.org/site-content/uploads/Item-Based-WWW-2001.pdf
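Both weighted averages can be sketched directly from a sparse rating table. The data and the function names below are hypothetical, and the kernel is plain cosine similarity with missing ratings treated as zeros, which is one possible choice and not necessarily the one used in the cited papers.

```python
import math

# Hypothetical ratings: user -> {item: rating}; absent keys are missing ratings
ratings = {
    "u1": {"v1": 4, "v2": 3, "v3": 5},
    "u2": {"v1": 5, "v2": 5, "v4": 4},
    "u3": {"v2": 2, "v3": 3},
    "u4": {"v1": 1, "v3": 4, "v4": 2},
}

def cosine(a, b):
    """Cosine similarity of two sparse vectors (missing entries count as 0)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_vectors(ratings):
    """Transpose the table: item -> {user: rating}."""
    columns = {}
    for user, row in ratings.items():
        for item, r in row.items():
            columns.setdefault(item, {})[user] = r
    return columns

def predict_user_user(ratings, i, j, kernel=cosine):
    """Average the ratings other users gave item j, weighted by similarity to user i."""
    num = den = 0.0
    for other, row in ratings.items():
        if other != i and j in row:
            s = kernel(ratings[i], row)
            num += s * row[j]
            den += s
    return num / den if den else None

def predict_item_item(ratings, i, j, kernel=cosine):
    """Average user i's own ratings, weighted by each rated item's similarity to item j."""
    columns = item_vectors(ratings)
    num = den = 0.0
    for other, r in ratings[i].items():
        if other != j:
            s = kernel(columns.get(j, {}), columns[other])
            num += s * r
            den += s
    return num / den if den else None

p_user = predict_user_user(ratings, "u3", "v1")
p_item = predict_item_item(ratings, "u3", "v1")
```

Each prediction recomputes similarities against every other row or column, which is where the "slow to compute" remark comes from; a practical system would cache them.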
Collaborative filtering
• Variation of kernel
• $k(u_i, u_{i'}) = \cos(u_i, u_{i'})$: cosine similarity
• $k(u_i, u_{i'}) = \rho(u_i, u_{i'})$: correlation similarity
• $u'_i = u_i - (\bar{v}_1, \bar{v}_2, \bar{v}_3, \dots)^T$, $k(u_i, u_{i'}) = \cos(u'_i, u'_{i'})$: adjusted cosine similarity, where $\bar{v}_j$ is the mean rating of item $j$
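The three kernels can be compared on a toy example. This is a sketch under stated assumptions: the correlation is computed as Pearson correlation over co-rated items only (a common convention, not confirmed by the slides), the adjusted cosine subtracts each item's mean rating from the user vectors, and the data and helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors (missing entries count as 0)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pearson(a, b):
    """Pearson correlation over the items both users rated (assumed convention)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[key] for key in common) / len(common)
    mean_b = sum(b[key] for key in common) / len(common)
    cov = sum((a[key] - mean_a) * (b[key] - mean_b) for key in common)
    var_a = sum((a[key] - mean_a) ** 2 for key in common)
    var_b = sum((b[key] - mean_b) ** 2 for key in common)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0

def center_by_item_mean(ratings):
    """Subtract each item's mean rating from every rating of that item."""
    totals, counts = {}, {}
    for row in ratings.values():
        for item, r in row.items():
            totals[item] = totals.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    return {
        user: {item: r - totals[item] / counts[item] for item, r in row.items()}
        for user, row in ratings.items()
    }

def adjusted_cosine(ratings, i, other):
    """Cosine similarity of the item-mean-centered user vectors."""
    centered = center_by_item_mean(ratings)
    return cosine(centered[i], centered[other])

# Hypothetical data: the two users disagree about which item is better.
ratings = {"u1": {"v1": 5, "v2": 1}, "u2": {"v1": 3, "v2": 3}}
plain = cosine(ratings["u1"], ratings["u2"])
adjusted = adjusted_cosine(ratings, "u1", "u2")
```

Here the plain cosine is clearly positive because all ratings are positive numbers, while the adjusted cosine is close to -1 once the item means are removed.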
Collaborative filtering
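• Variation of neighbor size
• $\hat{r}_{ij} = \dfrac{\sum_{i' \in B} s(i, i')\, r_{i'j}}{\sum_{i' \in B} s(i, i')}$, where $B$ is the neighborhood: the users most similar to user $i$
• Normalizing, centering and linearly transforming the vectors.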
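Restricting the sum to a neighborhood can be sketched as follows. The function name, the parameter `k` for the neighborhood size, and the similarity values are hypothetical; the sketch also assumes non-negative similarities, so the denominator cannot cancel out.

```python
def predict_with_neighbors(sims, ratings_for_item, k):
    """Predict a rating from the k most similar users who rated the item.

    sims: {user: similarity to the target user}
    ratings_for_item: {user: rating that user gave the target item}
    """
    candidates = [user for user in ratings_for_item if user in sims]
    neighborhood = sorted(candidates, key=lambda user: sims[user], reverse=True)[:k]
    den = sum(sims[user] for user in neighborhood)
    if den == 0:
        return None  # no usable neighbor
    return sum(sims[user] * ratings_for_item[user] for user in neighborhood) / den

# Hypothetical similarities to the target user and ratings of the target item
sims = {"u1": 0.9, "u2": 0.5, "u4": 0.1}
ratings_for_item = {"u1": 4, "u2": 5, "u4": 1}

nearest_only = predict_with_neighbors(sims, ratings_for_item, k=1)
all_three = predict_with_neighbors(sims, ratings_for_item, k=3)
```

With `k=1` the prediction is simply the closest user's rating; larger neighborhoods smooth the estimate but let weakly related users in.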
Collaborative filtering
$$MAE = \frac{1}{N} \sum_{1 \le i \le N} |p_i - q_i|$$
where $(p_i, q_i)$ are the $N$ rating-prediction pairs.
http://grouplens.org/site-content/uploads/Item-Based-WWW-2001.pdf
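As a worked instance of the formula, a minimal sketch with made-up numbers (the function name and the sample values are assumptions):

```python
def mean_absolute_error(predictions, actuals):
    """Average absolute difference over paired predictions and true ratings."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predictions, actuals)) / len(predictions)

# |4-5| + |3-3| + |5-3| = 3, divided by N = 3
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
print(mae)  # 1.0
```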
Collaborative filtering
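$$\begin{pmatrix} r_{11} & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{m1} & \cdots & r_{mn} \end{pmatrix} \approx \begin{pmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_m^T \end{pmatrix} \begin{pmatrix} \mathbf{v}_1 & \dots & \mathbf{v}_n \end{pmatrix}$$
Dimensions: $(m \times n) \approx (m \times k)(k \times n)$, where $\mathbf{u}_i$ and $\mathbf{v}_j$ are $k$-dimensional factor vectors for user $i$ and item $j$.
• SVD can factorize any matrix … that has no null values! Null values are the reason we want to do matrix factorization in the first place.
• Quick to predict ratings … if we are able to factorize the matrix.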
Collaborative filtering
$$p(r_{ij} \mid U, V) = \mathcal{N}(\mathbf{u}_i^T \mathbf{v}_j, \sigma^2)$$
MLE/minimize:
$$\frac{1}{2} \sum_{r_{ij} \neq \text{null}} (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)^2$$
http://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf
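The step from the Gaussian model to the squared-error objective can be written out. This sketch assumes the observed ratings are independent given $U$ and $V$ and that $\sigma^2$ is fixed; $n_{\text{obs}}$, the number of non-null ratings, is a symbol introduced here for the derivation.

```latex
\begin{aligned}
-\log \prod_{r_{ij} \neq \text{null}} p(r_{ij} \mid U, V)
  &= -\sum_{r_{ij} \neq \text{null}} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)^2}{2\sigma^2} \right) \right] \\
  &= \frac{1}{2\sigma^2} \sum_{r_{ij} \neq \text{null}} (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)^2
     + \frac{n_{\text{obs}}}{2} \log(2\pi\sigma^2)
\end{aligned}
```

The second term does not depend on $U$ or $V$, and the factor $1/\sigma^2$ only rescales the first, so maximizing the likelihood is the same as minimizing the sum of squared errors over the observed ratings.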
Collaborative filtering
MLE/minimize, with a regularization term:
$$E = \frac{1}{2} \sum_{r_{ij} \neq \text{null}} (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)^2 + \frac{\lambda}{2} \left( \sum_i \|\mathbf{u}_i\|^2 + \sum_j \|\mathbf{v}_j\|^2 \right)$$
Collaborative filtering
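Derivative of $E$:
$$\frac{\partial E}{\partial \mathbf{u}_i} = -\sum_{j:\, r_{ij} \neq \text{null}} (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)\, \mathbf{v}_j + \lambda \mathbf{u}_i$$
$$\frac{\partial E}{\partial \mathbf{v}_j} = -\sum_{i:\, r_{ij} \neq \text{null}} (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)\, \mathbf{u}_i + \lambda \mathbf{v}_j$$
• Advantage
– Quick for predicting new ratings.
– More accurate with enough data?
• Disadvantage
– Difficult to update the model.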
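A derivative like this is easy to get wrong by a sign, so a finite-difference check is a useful habit. The sketch below assumes the regularized objective $E$ with squared error over observed ratings plus an L2 penalty weighted by `lam`; the tiny data set and all function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(observed, U, V, lam):
    """E = 1/2 * squared error over observed ratings + lam/2 * squared norms."""
    fit = 0.5 * sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in observed.items())
    penalty = 0.5 * lam * (sum(dot(u, u) for u in U) + sum(dot(v, v) for v in V))
    return fit + penalty

def grad_user(observed, U, V, lam, i):
    """Analytic dE/du_i: minus the error-weighted item vectors, plus lam * u_i."""
    grad = [lam * x for x in U[i]]
    for (a, j), r in observed.items():
        if a == i:
            err = r - dot(U[i], V[j])
            grad = [g - err * v for g, v in zip(grad, V[j])]
    return grad

def numeric_grad_user(observed, U, V, lam, i, eps=1e-6):
    """Central finite differences of E with respect to each component of u_i."""
    grad = []
    for d in range(len(U[i])):
        plus = [row[:] for row in U]
        minus = [row[:] for row in U]
        plus[i][d] += eps
        minus[i][d] -= eps
        diff = objective(observed, plus, V, lam) - objective(observed, minus, V, lam)
        grad.append(diff / (2 * eps))
    return grad

# Hypothetical observed ratings {(user index, item index): rating} and factors
observed = {(0, 0): 4.0, (0, 1): 3.0, (1, 1): 5.0}
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 1.0], [1.0, 2.0]]

analytic = grad_user(observed, U, V, 0.1, 0)
numeric = numeric_grad_user(observed, U, V, 0.1, 0)
```

For this data both errors for user 0 equal 2, so the analytic gradient is 0.1·(1, 0) − 2·(2, 1) − 2·(1, 2) = (−5.9, −6.0), and the numeric estimate should agree to several decimal places.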
Collaborative filtering
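Stochastic Gradient Descent: for each observed $r_{ij}$, with learning rate $\alpha$:
$$\mathbf{u}_i' = \mathbf{u}_i + \alpha \left[ (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)\, \mathbf{v}_j - \lambda \mathbf{u}_i \right]$$
$$\mathbf{v}_j' = \mathbf{v}_j + \alpha \left[ (r_{ij} - \mathbf{u}_i^T \mathbf{v}_j)\, \mathbf{u}_i - \lambda \mathbf{v}_j \right]$$
$$\mathbf{u}_i = \mathbf{u}_i', \qquad \mathbf{v}_j = \mathbf{v}_j'$$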
http://sifter.org/~simon/journal/20061211.html
• Advantage
– Easy to update the model.
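A compact training loop shows why updates are cheap: each observed rating touches only one user vector and one item vector. This is a sketch, not the implementation from the linked post; the hyperparameter values, the uniform random initialization, the fixed seed, and the rating data are all assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_error(observed, U, V):
    return sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in observed.items())

def sgd_factorize(observed, m, n, k=2, alpha=0.02, lam=0.05, epochs=300, seed=0):
    """Fit m user vectors and n item vectors of length k to the observed ratings."""
    rng = random.Random(seed)
    U = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n)]
    samples = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(samples)
        for (i, j), r in samples:
            err = r - dot(U[i], V[j])
            # Compute both new vectors from the old values, then assign together.
            new_u = [u + alpha * (err * v - lam * u) for u, v in zip(U[i], V[j])]
            new_v = [v + alpha * (err * u - lam * v) for u, v in zip(U[i], V[j])]
            U[i], V[j] = new_u, new_v
    return U, V

# Hypothetical observed ratings: {(user index, item index): rating}
observed = {
    (0, 0): 4, (0, 1): 3, (0, 2): 5,
    (1, 0): 5, (1, 1): 5, (1, 3): 4,
    (2, 1): 2, (2, 2): 3,
    (3, 0): 1, (3, 2): 4, (3, 3): 2,
}

U0, V0 = sgd_factorize(observed, 4, 4, epochs=0)   # initial factors only
U, V = sgd_factorize(observed, 4, 4)               # trained factors, same seed
before = squared_error(observed, U0, V0)
after = squared_error(observed, U, V)
missing = dot(U[2], V[0])                          # prediction for an unrated pair
```

When a new rating arrives, the same two-line update can be applied to it alone, without refitting the whole model.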
Evaluation
• Basic accuracy
– MAE
– RMSD
• Ranking accuracy
– Pearson correlation over ranks
– Kendall tau test
• Decision support
– Precision
– Recall
https://www.coursera.org/learn/recommender-systems/supplement/Jh5Kx/pdf-version-of-module-5-presentations
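One metric from each of the three families can be sketched in plain Python. The function names and sample data are hypothetical, and the Kendall tau below is the simple tau-a variant with no correction for ties, which is an assumption since the slides do not say which variant is meant.

```python
import math
from itertools import combinations

def rmsd(predictions, actuals):
    """Root mean squared deviation between predictions and true ratings."""
    squared = sum((p - q) ** 2 for p, q in zip(predictions, actuals))
    return math.sqrt(squared / len(predictions))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for a, b in combinations(range(len(x)), 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs

def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant set."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

error = rmsd([4, 3, 5], [5, 3, 3])                    # sqrt(5 / 3)
tau = kendall_tau([1, 2, 3], [1, 3, 2])               # (2 - 1) / 3
precision, recall = precision_recall(["a", "b", "c", "d"], {"a", "c", "e"})
```

In the last line two of the four recommended items are relevant (precision 0.5), and two of the three relevant items were recommended (recall 2/3).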