Download - Recommender system

Recommender System

Yinghan Fu

Non-personalized recommendation

• Every customer gets the same recommendation.


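Reddit comment score (the lower bound of the Wilson score interval):

$$\text{score} = \frac{1}{1+\frac{z^2}{n}}\left(p + \frac{z^2}{2n} - z\sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}}\right)$$

$p$: fraction of positive ratings among all ratings; $n$: total number of ratings; $z$: standard normal quantile for the chosen confidence level (1.96 for 95%)

In a binomial distribution, the confidence interval (Wilson score interval) of $p$ is:

$$\left[\ \frac{1}{1+\frac{z^2}{n}}\left(p + \frac{z^2}{2n} - z\sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}}\right),\ \ \frac{1}{1+\frac{z^2}{n}}\left(p + \frac{z^2}{2n} + z\sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}}\right)\ \right]$$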

http://www.redditblog.com/2009/10/reddits-new-comment-sorting-system.html
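The ranking rule above can be sketched in a few lines. This is a minimal illustration, not Reddit's code: the function name `wilson_lower_bound`, the default `z = 1.96` and the vote counts are hypothetical. It shows why a comment with a single upvote does not outrank one with 90 upvotes out of 100.

```python
import math

def wilson_lower_bound(positive: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the fraction of positive ratings.

    positive: number of positive ratings (e.g. upvotes)
    n: total number of ratings
    z: standard normal quantile (1.96 ~ 95% confidence, an assumed default)
    """
    if n == 0:
        return 0.0  # no ratings yet: rank at the bottom (a design choice, not from the slide)
    p = positive / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# Hypothetical comments as (upvotes, total votes), sorted by the lower bound.
comments = {"a": (1, 1), "b": (90, 100), "c": (600, 1000)}
ranking = sorted(comments, key=lambda c: wilson_lower_bound(*comments[c]), reverse=True)
print(ranking)  # ['b', 'c', 'a']
```

Comment "a" has $p = 1$, but with only one vote its lower bound is about 0.21, so it ranks last.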












• Advantage

– Quick to calculate.

– Can be accurate in the right context.

• Disadvantage

– Without that context, not very helpful.

Personalized recommendation

You open Amazon in the browser:

Why is this a failure compared with the Reddit comment ranking? No context!

Content-based recommendation

Based on a book

(Diagram: Harry Potter and the Deathly Hallows, the tag Magic, and Me)

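Attribute weight: $tf(a) \cdot idf(a)$, with $idf(a) = \dfrac{1}{N(a)}$

where $tf(a)$ is how often attribute (tag) $a$ occurs for the item and $N(a)$ is the number of items that carry $a$. The Stanford IR book linked below uses the more common form $idf(a) = \log\frac{N}{N(a)}$, with $N$ the total number of items (documents).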

http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
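A minimal sketch of content-based recommendation with the slide's weighting, $tf(a) \cdot 1/N(a)$. The catalogue, its tags, and the choice to score candidates by cosine similarity against a summed user profile are assumptions for illustration; the slide does not fix a scoring rule.

```python
import math
from collections import Counter

def idf(tag, catalogue):
    # Slide's simplified form idf(a) = 1 / N(a); N(a) = number of items carrying tag a.
    n_a = sum(1 for tags in catalogue.values() if tag in tags)
    return 1.0 / n_a if n_a else 0.0

def item_vector(tags, catalogue):
    tf = Counter(tags)  # tf(a): how often tag a is attached to this item
    return {a: tf[a] * idf(a, catalogue) for a in tf}

def user_profile(liked, catalogue):
    # User profile = sum of the tf-idf vectors of the items the user liked.
    profile = Counter()
    for item in liked:
        profile.update(item_vector(catalogue[item], catalogue))
    return profile

def cosine(u, v):
    dot = sum(w * v.get(a, 0.0) for a, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(liked, catalogue, k=2):
    profile = user_profile(liked, catalogue)
    scores = {item: cosine(profile, item_vector(tags, catalogue))
              for item, tags in catalogue.items() if item not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical catalogue: item -> list of tags.
catalogue = {
    "Harry Potter and the Deathly Hallows": ["magic", "wizard", "adventure"],
    "book_b": ["magic", "adventure", "dragon"],
    "book_c": ["space", "politics"],
    "book_d": ["food"],
}
top = recommend(["Harry Potter and the Deathly Hallows"], catalogue, k=1)
```

`book_b` shares the tags magic and adventure with the liked book, so it is the only candidate with a non-zero score.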














• Advantage

– Quick to compute.

• Disadvantage

– Needs manual labelling of the items (tags or attributes).


Collaborative filtering

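Rating matrix (rows: users $u_1, u_2, \dots$; columns: items $v_1, v_2, \dots$): each entry is a user's rating of an item (the example uses values from 1 to 5), and many entries are missing.

$u_i$: vector for user $i$ (row $i$); $v_j$: vector for item $j$ (column $j$)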


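• Item-item collaborative filtering

• $$p_{ij} = \frac{\sum_{j'} s(j,j')\, r_{ij'}}{\sum_{j'} s(j,j')}, \qquad s(j,j') = k(v_j, v_{j'})$$

• User-user collaborative filtering

• $$p_{ij} = \frac{\sum_{i'} s(i,i')\, r_{i'j}}{\sum_{i'} s(i,i')}, \qquad s(i,i') = k(u_i, u_{i'})$$

$p_{ij}$ is the predicted rating of user $i$ for item $j$, $r$ denotes known ratings, and $k$ is a similarity kernel. In item-item filtering the sums run over items $j'$ that user $i$ has rated; in user-user filtering they run over users $i'$ who have rated item $j$.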

• Slower to compute than non-personalized and content-based recommendation, but more accurate in most situations.

http://files.grouplens.org/papers/FnT%20CF%20Recsys%20Survey.pdf

http://grouplens.org/site-content/uploads/Item-Based-WWW-2001.pdf
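Both prediction formulas can be sketched with NumPy, assuming missing ratings are stored as `NaN` and the kernel $k$ is cosine similarity over co-rated entries. The function names and the toy matrix are hypothetical. Item-item filtering is implemented as user-user filtering on the transposed matrix, since item vectors are the columns.

```python
import numpy as np

def cosine_kernel(a, b):
    # Cosine similarity restricted to entries both vectors have (non-NaN).
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict_user_user(R, i, j, kernel=cosine_kernel):
    # p_ij = sum_i' s(i,i') r_i'j / sum_i' s(i,i') over users i' who rated item j.
    num = den = 0.0
    for i2 in range(R.shape[0]):
        if i2 == i or np.isnan(R[i2, j]):
            continue
        s = kernel(R[i], R[i2])
        num += s * R[i2, j]
        den += s
    return num / den if den else float("nan")

def predict_item_item(R, i, j, kernel=cosine_kernel):
    # Rows of R.T are item vectors v_j, so this sums s(j,j') r_ij' over items j' rated by user i.
    return predict_user_user(R.T, j, i, kernel)

# Hypothetical ratings: rows are users, columns are items, NaN = not rated.
R = np.array([[5, 4, np.nan],
              [5, 4, 4],
              [1, 2, 1]], dtype=float)
pred_uu = predict_user_user(R, 0, 2)
pred_ii = predict_item_item(R, 0, 2)
```

The slide's denominator is a plain sum of similarities; with kernels that can be negative (e.g. correlation), a sum of absolute values is a common safeguard.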












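• Variation of kernel

• $k(u_i, u_{i'}) = \cos(u_i, u_{i'})$: cosine similarity

• $k(u_i, u_{i'}) = \rho(u_i, u_{i'})$: correlation similarity

• $u_i' = u_i - (\bar v_1, \bar v_2, \bar v_3, \dots)^T$, where $\bar v_j$ is the mean rating of item $j$; then $k(u_i, u_{i'}) = \cos(u_i', u_{i'}')$: adjusted cosine similarity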
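The three kernels can be sketched as follows, assuming NumPy, `NaN` for missing ratings, and comparison over co-rated entries only. Centering the Pearson correlation on the co-rated means is one common variant (another uses each user's overall mean); the helper names and data are hypothetical.

```python
import numpy as np

def _corated(a, b):
    # Keep only the positions where both vectors have a rating.
    mask = ~np.isnan(a) & ~np.isnan(b)
    return a[mask], b[mask]

def cosine(a, b):
    x, y = _corated(a, b)
    d = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / d) if d else 0.0

def pearson(a, b):
    # Correlation similarity: cosine of the mean-centred co-rated vectors.
    x, y = _corated(a, b)
    if x.size < 2:
        return 0.0
    x, y = x - x.mean(), y - y.mean()
    d = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / d) if d else 0.0

def adjusted_cosine(R, i, i2):
    # Subtract each item's mean rating (column mean) from the user vectors, then take the cosine.
    item_means = np.nanmean(R, axis=0)
    A = R - item_means  # missing entries stay NaN
    return cosine(A[i], A[i2])

a = np.array([1.0, 2.0, 3.0])
R = np.array([[5.0, 1.0], [4.0, 2.0], [1.0, 5.0]])  # hypothetical ratings
```

Cosine similarity ignores a user's rating scale only partly; the centred variants remove per-user or per-item offsets before comparing.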


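• Variation of neighbor size

• $$p_{ij} = \frac{\sum_{i' \in N} s(i,i')\, r_{i'j}}{\sum_{i' \in N} s(i,i')}$$ where $N$ is the neighborhood of user $i$ (for example, the users most similar to $i$ who have rated item $j$)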

• Normalizing, centering and linearly transforming the vectors.
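Restricting the sum to a neighborhood $N$ can be sketched as a top-$k$ rule, assuming NumPy, `NaN` for missing ratings, cosine similarity on co-rated entries, and $N$ = the `k` most similar users who rated item `j`. The function name and the neighborhood size are hypothetical choices.

```python
import numpy as np

def cosine(a, b):
    mask = ~np.isnan(a) & ~np.isnan(b)
    x, y = a[mask], b[mask]
    d = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / d) if d else 0.0

def predict_top_k(R, i, j, k=2):
    # Candidate neighbours: other users who rated item j.
    candidates = [i2 for i2 in range(R.shape[0]) if i2 != i and not np.isnan(R[i2, j])]
    # N = the k candidates with the highest similarity to user i.
    neighbours = sorted(((cosine(R[i], R[i2]), i2) for i2 in candidates), reverse=True)[:k]
    num = sum(s * R[i2, j] for s, i2 in neighbours)
    den = sum(s for s, _ in neighbours)
    return num / den if den else float("nan")

# Hypothetical ratings; user 0 has not rated item 2.
R = np.array([[5, 1, np.nan],
              [5, 1, 4],
              [1, 5, 2]], dtype=float)
p1 = predict_top_k(R, 0, 2, k=1)  # only the most similar user (user 1) is used
p2 = predict_top_k(R, 0, 2, k=2)  # user 2 also contributes, with a smaller weight
```

A small $N$ relies on the closest neighbours only; a larger $N$ smooths the prediction towards less similar users.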


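$$MAE = \frac{1}{N}\sum_{1 \le k \le N} \left|p_k - r_k\right|$$

where $p_k$ is the $k$-th predicted rating, $r_k$ the corresponding actual rating, and $N$ the number of ratings evaluated.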
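The MAE formula translates directly into code. The function name and the sample ratings are hypothetical.

```python
def mae(predicted, actual):
    # Mean absolute error between predicted ratings p_k and actual ratings r_k.
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

error = mae([4, 3, 5], [5, 3, 4])  # (1 + 0 + 1) / 3
```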












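$$\begin{pmatrix} r_{11} & \cdots & r_{1m} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nm} \end{pmatrix} \approx \begin{pmatrix} U_1^T \\ \vdots \\ U_n^T \end{pmatrix} \begin{pmatrix} V_1 & \cdots & V_m \end{pmatrix}$$

Dimensions: $(n \times m) \approx (n \times k)(k \times m)$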

SVD can factorize any matrix, but only one without null values, and null (missing) values are the reason we want matrix factorization in the first place. Once the matrix is factorized, predicting a rating is quick: $r_{ij} \approx U_i^T V_j$.
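For a complete matrix, a truncated SVD gives the factors directly. This sketch assumes NumPy and a hypothetical rating matrix with no missing entries; splitting the singular values evenly between $U$ and $V$ is one convention among several.

```python
import numpy as np

def truncated_svd_factors(R, k):
    # Works only on a complete matrix: np.linalg.svd has no notion of missing entries.
    P, s, Qt = np.linalg.svd(R, full_matrices=False)
    root = np.sqrt(s[:k])
    U = P[:, :k] * root          # n x k, row i is U_i^T
    V = root[:, None] * Qt[:k]   # k x m, column j is V_j
    return U, V

R = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # hypothetical rank-1 rating matrix
U, V = truncated_svd_factors(R, k=1)
r_12 = U[0] @ V[:, 1]  # rating of user 1 for item 2 (0-based indices 0 and 1)
```

Each prediction is a single length-$k$ dot product, which is why prediction is quick once the factors exist.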




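$$P(r_{ij} \mid \boldsymbol{U}, \boldsymbol{V}) = \mathcal{N}\left(U_i^T V_j, \sigma^2\right)$$

MLE/minimize:

$$\frac{1}{2}\sum_{r_{ij} \neq \text{null}} \left(r_{ij} - U_i^T V_j\right)^2$$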

http://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf
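Why the MLE reduces to this squared error, assuming ratings are independent given $U$ and $V$ and $\sigma$ is held fixed (standard algebra, not shown on the original slide):

$$\log P(R \mid U, V) = \sum_{r_{ij} \neq \text{null}} \log \mathcal{N}\left(r_{ij} \mid U_i^T V_j, \sigma^2\right) = -\frac{1}{\sigma^2} \cdot \frac{1}{2}\sum_{r_{ij} \neq \text{null}} \left(r_{ij} - U_i^T V_j\right)^2 + C$$

Here $C$ does not depend on $U$ or $V$, so maximizing the log-likelihood over $U$ and $V$ is the same as minimizing the sum of squared errors over the known ratings.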











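MAP (with zero-mean Gaussian priors on $U$ and $V$) / minimize:

$$E = \frac{1}{2}\sum_{r_{ij} \neq \text{null}} \left(r_{ij} - U_i^T V_j\right)^2 + \frac{\lambda}{2}\left(\sum_p \lVert U_p \rVert^2 + \sum_q \lVert V_q \rVert^2\right)$$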



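Derivative:

$$\frac{\partial E}{\partial U_i} = -\sum_{j:\, r_{ij} \neq \text{null}} \left(r_{ij} - U_i^T V_j\right) V_j + \lambda U_i$$

$$\frac{\partial E}{\partial V_j} = -\sum_{i:\, r_{ij} \neq \text{null}} \left(r_{ij} - U_i^T V_j\right) U_i + \lambda V_j$$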

• Advantage

– Quick for predicting new ratings.

– Possibly more accurate when there is enough data.

• Disadvantage

– Difficult to update the model: every gradient step sums over all known ratings.
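Full-batch gradient descent on $E$ can be sketched with NumPy using the two derivatives above. The hyperparameters (`k`, `lam`, `lr`, `epochs`), the small random initialization and the toy matrix are hypothetical choices, not values from the slides.

```python
import numpy as np

def mf_gradient_descent(R, k=2, lam=0.1, lr=0.01, epochs=2000, seed=0):
    mask = ~np.isnan(R)                 # True where r_ij is known
    Rz = np.where(mask, R, 0.0)
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, k))  # row i is U_i^T
    V = 0.1 * rng.standard_normal((m, k))  # row j is V_j^T
    for _ in range(epochs):
        err = mask * (Rz - U @ V.T)     # r_ij - U_i^T V_j, zero where r_ij is null
        grad_U = -err @ V + lam * U     # dE/dU_i for every i at once
        grad_V = -err.T @ U + lam * V   # dE/dV_j for every j at once
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def objective(R, U, V, lam):
    # E = 1/2 * squared error on known ratings + lambda/2 * squared norms of all factors.
    err = np.where(~np.isnan(R), R - U @ V.T, 0.0)
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

R = np.array([[5, 3, np.nan],
              [4, np.nan, 1],
              [1, 1, 5]], dtype=float)  # hypothetical ratings, NaN = null
U, V = mf_gradient_descent(R)
predicted = U @ V.T  # also fills in the null entries
```

Each step touches every known rating, which is the reason the slide calls this model hard to update when a single new rating arrives.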



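Stochastic Gradient Descent: for each known rating $r_{ij}$, with learning rate $\alpha$:

$$U_i' = U_i + \alpha\left[\left(r_{ij} - U_i^T V_j\right) V_j - \lambda U_i\right]$$

$$V_j' = V_j + \alpha\left[\left(r_{ij} - U_i^T V_j\right) U_i - \lambda V_j\right]$$

$$U_i = U_i', \qquad V_j = V_j'$$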

http://sifter.org/~simon/journal/20061211.html

Advantage: Easy to update the model, since a new rating $r_{ij}$ only triggers an update of $U_i$ and $V_j$.
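The per-rating update can be sketched with NumPy as below. `sgd_update` mirrors the two update equations (both use the old $U_i$ and $V_j$); the training loop, hyperparameters and toy data are hypothetical.

```python
import numpy as np

def sgd_update(U, V, i, j, r_ij, alpha, lam):
    # One SGD step for a single known rating r_ij; both updates use the old U_i and V_j.
    e = r_ij - U[i] @ V[j]
    U_new = U[i] + alpha * (e * V[j] - lam * U[i])
    V_new = V[j] + alpha * (e * U[i] - lam * V[j])
    U[i], V[j] = U_new, V_new

def mf_sgd(R, k=2, lam=0.02, alpha=0.01, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, k))  # row i is U_i^T
    V = 0.1 * rng.standard_normal((m, k))  # row j is V_j^T
    observed = [(i, j) for i in range(n) for j in range(m) if not np.isnan(R[i, j])]
    for _ in range(epochs):
        for idx in rng.permutation(len(observed)):  # visit known ratings in random order
            i, j = observed[idx]
            sgd_update(U, V, i, j, R[i, j], alpha, lam)
    return U, V

R = np.array([[5, 3, np.nan],
              [4, np.nan, 1],
              [1, 1, 5]], dtype=float)  # hypothetical ratings, NaN = null
U, V = mf_sgd(R)
```

Incorporating a newly arrived rating amounts to a few calls of `sgd_update` for that single $(i, j)$ pair instead of retraining on all ratings.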





http://arxiv.org/pdf/1205.3193.pdf


Evaluation

• Basic accuracy

– MAE

– RMSD

• Ranking accuracy

– Pearson correlation over ranks (Spearman's rank correlation)

– Kendall's tau

• Decision support

– Precision

– Recall

https://www.coursera.org/learn/recommender-systems/supplement/Jh5Kx/pdf-version-of-module-5-presentations
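The remaining metrics can be sketched in plain Python. This is an illustrative version under simplifying assumptions: ranks ignore ties, Kendall's tau is the tau-a variant, and precision and recall are measured on the top `k` recommendations against a hypothetical set of relevant items.

```python
import math

def rmsd(predicted, actual):
    # Root-mean-square deviation between predicted and actual ratings.
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

def _ranks(xs):
    # Rank 1 = smallest value; ties are not handled (simplifying assumption).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(predicted, actual):
    # Pearson correlation computed over ranks.
    return _pearson(_ranks(predicted), _ranks(actual))

def kendall_tau(x, y):
    # Tau-a: (concordant pairs - discordant pairs) / total number of pairs.
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def precision_recall_at_k(recommended, relevant, k):
    # Precision: share of the top k that is relevant; recall: share of relevant items found.
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k, hits / len(relevant)

precision, recall = precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=2)
```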














