recommender system

Recommender System Yinghan Fu

Upload: yinghan-fu

Post on 24-Jan-2017


Page 1: Recommender system

Recommender System

Yinghan Fu

Page 2: Recommender system

Non-personalized recommendation

• Every customer gets the same recommendation.

Page 3: Recommender system

Non-personalized recommendation

Reddit comment score (the lower bound of the Wilson score interval):

$$\frac{1}{1+\frac{1}{n}z^2}\left[\hat{p}+\frac{1}{2n}z^2-z\sqrt{\frac{1}{n}\hat{p}(1-\hat{p})+\frac{1}{4n^2}z^2}\right]$$

$\hat{p}$: fraction of positive ratings among all $n$ ratings; $z$: standard normal quantile for the chosen confidence level.

For a binomial distribution, the confidence interval (Wilson score interval) of $p$:

$$\frac{1}{1+\frac{1}{n}z^2}\left[\hat{p}+\frac{1}{2n}z^2\pm z\sqrt{\frac{1}{n}\hat{p}(1-\hat{p})+\frac{1}{4n^2}z^2}\right]$$

http://www.redditblog.com/2009/10/reddits-new-comment-sorting-system.html
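The Wilson score interval above can be sketched in a few lines of Python. This is an illustrative sketch, not Reddit's actual code: the function name `wilson_interval`, the default `z = 1.96` (roughly 95% confidence), and the handling of `n = 0` are assumptions made here.

```python
import math


def wilson_interval(positive: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) bounds of the Wilson score interval.

    positive: number of positive ratings, n: total number of ratings.
    """
    if n == 0:
        # Assumed convention for this sketch: no ratings means no information.
        return 0.0, 1.0
    p_hat = positive / n
    centre = p_hat + z * z / (2 * n)
    spread = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    scale = 1 + z * z / n
    return (centre - spread) / scale, (centre + spread) / scale


# Ranking by the lower bound: many ratings at 90% beat a single positive rating.
well_rated = wilson_interval(90, 100)[0]
one_vote = wilson_interval(1, 1)[0]
print(well_rated > one_vote)
```

Sorting by the lower bound rather than by $\hat{p}$ itself penalizes items with few ratings, which is why a comment with one upvote does not outrank one with 90 out of 100.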

Page 4: Recommender system

Non-personalized recommendation

• Advantages

– Quick to calculate.

– Can be accurate in the right context.

• Disadvantage

– Not very helpful without that context.

Page 5: Recommender system

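Personalized recommendation

You open Amazon in the browser:

Why does a non-personalized ranking fail here when it works for Reddit comments? No context! A Reddit thread already tells us what the reader is looking at; the Amazon home page knows nothing about what this visitor wants.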

Page 6: Recommender system

Content-based recommendation

Page 7: Recommender system

Content-based recommendation

Based on a book

[Diagram: Me, Harry Potter and the Deathly Hallows, Magic]

$tf(a) \cdot idf(a)$, with $idf(a) = \frac{1}{N(a)}$

http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
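A minimal sketch of attribute weighting with $tf(a) \cdot idf(a)$ follows. The slide does not define $N(a)$; this sketch assumes it is the number of items tagged with attribute $a$. The catalogue `ITEMS`, its tags, and the function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> manually assigned attribute tags.
ITEMS = {
    "Harry Potter and the Deathly Hallows": ["magic", "fantasy", "school"],
    "A Wizard of Earthsea": ["magic", "fantasy"],
    "Intro to Statistics": ["math", "school"],
}


def idf_weights(items):
    """idf(a) = 1 / N(a), ASSUMING N(a) = number of items tagged with a."""
    counts = Counter(a for tags in items.values() for a in set(tags))
    return {a: 1.0 / n for a, n in counts.items()}


def item_vector(tags, idf):
    """Weight each attribute of one item by tf(a) * idf(a)."""
    tf = Counter(tags)
    return {a: tf[a] * idf[a] for a in tf}


def cosine(x, y):
    dot = sum(w * y.get(a, 0.0) for a, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0


def recommend(liked, items):
    """Rank unseen items by similarity to the sum of the liked items' vectors."""
    idf = idf_weights(items)
    profile = Counter()
    for title in liked:
        for a, w in item_vector(items[title], idf).items():
            profile[a] += w
    scores = {
        title: cosine(profile, item_vector(tags, idf))
        for title, tags in items.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)


print(recommend(["Harry Potter and the Deathly Hallows"], ITEMS))
```

The user profile here is simply the sum of the vectors of liked items; other aggregation choices are equally plausible.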

Page 8: Recommender system

Content-based recommendation

• Advantage

– Quick to compute.

• Disadvantage

– Needs manual labelling.

Page 9: Recommender system

Collaborative filtering

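Rating matrix: rows are users $u_1, u_2, u_3, u_4, u_5, \dots$; columns are items $v_1, v_2, v_3, v_4, \dots$; blank cells are missing ratings.

Ratings shown per row (the column positions of the blank cells are not recoverable from this transcript):

$u_1$: 4, 3, 5, …
$u_2$: 5, 5, 5, 4, …
$u_3$: 2, 3, …
$u_4$: 1, 4, 2, …
$u_5$: 1, 1, …

$u_i$: vector for user $i$; $v_j$: vector for item $j$.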

Page 10: Recommender system

Collaborative filtering

• Item-item collaborative filtering

$$p_{ij} = \frac{\sum_{j'} s(j,j')\, r_{ij'}}{\sum_{j'} s(j,j')}, \qquad s(j,j') = k(v_j, v_{j'})$$

• User-user collaborative filtering

$$p_{ij} = \frac{\sum_{i'} s(i,i')\, r_{i'j}}{\sum_{i'} s(i,i')}, \qquad s(i,i') = k(u_i, u_{i'})$$

• Slow to compute, but more accurate in most situations.

http://files.grouplens.org/papers/FnT%20CF%20Recsys%20Survey.pdf

http://grouplens.org/site-content/uploads/Item-Based-WWW-2001.pdf
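The two prediction formulas can be sketched with NumPy as follows. This is an illustration under stated assumptions: missing ratings are stored as `NaN`, the kernel is cosine similarity over co-rated entries, and the function names are made up for this example.

```python
import numpy as np


def cosine_kernel(a, b):
    """Cosine similarity restricted to entries observed in both vectors."""
    both = ~np.isnan(a) & ~np.isnan(b)
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def predict_user_user(R, i, j, kernel=cosine_kernel):
    """p_ij = sum_i' s(i,i') r_i'j / sum_i' s(i,i'), over users i' who rated j."""
    num = den = 0.0
    for other in range(R.shape[0]):
        if other == i or np.isnan(R[other, j]):
            continue
        s = kernel(R[i], R[other])
        num += s * R[other, j]
        den += s
    return num / den if den > 0 else float("nan")


def predict_item_item(R, i, j, kernel=cosine_kernel):
    """Item-item filtering is the same computation on the transposed matrix."""
    return predict_user_user(R.T, j, i, kernel)


# Toy matrix: rows are users, columns are items, NaN marks a missing rating.
R = np.array([[5.0, 4.0, np.nan],
              [5.0, 4.0, 5.0],
              [1.0, 2.0, 1.0]])
print(predict_user_user(R, 0, 2), predict_item_item(R, 0, 2))
```

Every prediction loops over all other users (or items) and recomputes similarities, which is where the cost noted above comes from; caching the similarity matrix is the usual remedy.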

Page 11: Recommender system

Collaborative filtering

• Variation of kernel

• $k(u_i, u_{i'}) = \cos(u_i, u_{i'})$: cosine similarity

• $k(u_i, u_{i'}) = \rho(u_i, u_{i'})$: correlation similarity

• $\tilde{u}_i = u_i - (\bar{v}_1, \bar{v}_2, \bar{v}_3, \dots)^T$, $k(u_i, u_{i'}) = \cos(\tilde{u}_i, \tilde{u}_{i'})$: adjusted cosine similarity, with $\bar{v}_j$ taken to be the mean rating of item $j$
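The three kernels can be compared side by side in a short sketch. It assumes ratings are stored with `NaN` for missing values, that similarities use only co-rated entries, and that the adjusted cosine for users subtracts each item's mean rating; all function names are illustrative.

```python
import numpy as np


def _co_rated(a, b):
    """Keep only the positions where both vectors have a rating."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    return a[mask], b[mask]


def cosine_sim(a, b):
    x, y = _co_rated(a, b)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def correlation_sim(a, b):
    """Pearson correlation: cosine after subtracting each vector's own mean."""
    x, y = _co_rated(a, b)
    if x.size < 2:
        return 0.0
    return cosine_sim(x - x.mean(), y - y.mean())


def adjusted_cosine_sim(R, i, i2):
    """Cosine between two user rows after subtracting every item's mean rating."""
    item_means = np.nanmean(R, axis=0)  # mean over the users who rated each item
    return cosine_sim(R[i] - item_means, R[i2] - item_means)


R = np.array([[5.0, 4.0, np.nan],
              [4.0, 3.0, 2.0],
              [1.0, 2.0, 5.0]])
print(cosine_sim(R[0], R[2]), correlation_sim(R[0], R[2]), adjusted_cosine_sim(R, 0, 2))
```

Plain cosine treats users 0 and 2 as similar because all ratings are positive; the centered variants expose that their tastes actually point in opposite directions.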

Page 12: Recommender system

Collaborative filtering

• Variation of neighbor size

$$p_{ij} = \frac{\sum_{i' \in \mathbf{N}} s(i,i')\, r_{i'j}}{\sum_{i' \in \mathbf{N}} s(i,i')}$$

• Normalizing, centering and linearly transforming the vectors.
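Restricting the sum to a neighborhood $\mathbf{N}$ might look like the sketch below. It assumes $\mathbf{N}$ is the set of the `n_neighbours` most similar users who rated the item, uses cosine similarity over co-rated entries, and stores missing ratings as `NaN`; the names are hypothetical.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity over the entries observed in both vectors."""
    both = ~np.isnan(a) & ~np.isnan(b)
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def predict_with_neighbours(R, i, j, n_neighbours=2):
    """Weighted average of r_i'j over the n most similar users who rated item j."""
    candidates = [
        (cosine_sim(R[i], R[other]), R[other, j])
        for other in range(R.shape[0])
        if other != i and not np.isnan(R[other, j])
    ]
    top = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:n_neighbours]
    den = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / den if den > 0 else float("nan")


R = np.array([[5.0, 4.0, np.nan],
              [5.0, 4.0, 5.0],
              [1.0, 2.0, 1.0]])
print(predict_with_neighbours(R, 0, 2, n_neighbours=1))
print(predict_with_neighbours(R, 0, 2, n_neighbours=2))
```

With one neighbor the prediction copies the most similar user's rating; with two it is pulled toward the dissimilar user, which shows how the neighborhood size trades noise against coverage.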

Page 15: Recommender system

Collaborative filtering

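$$\underbrace{\begin{pmatrix} r_{11} & \cdots & r_{1m} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nm} \end{pmatrix}}_{n \times m} \approx \underbrace{\begin{pmatrix} U_1^T \\ \vdots \\ U_n^T \end{pmatrix}}_{n \times k} \underbrace{\begin{pmatrix} V_1 & \dots & V_m \end{pmatrix}}_{k \times m}$$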

SVD can factorize any matrix … without null values! Null values are the reason we want matrix factorization in the first place. Ratings are quick to predict … if we are able to factorize the matrix.
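The limitation can be made concrete with NumPy's `np.linalg.svd`. The sketch below builds the $n \times k$ and $k \times m$ factors from a truncated SVD and simply refuses matrices with missing entries; the function name `rank_k_factors` and the explicit `NaN` check are choices made for this illustration.

```python
import numpy as np


def rank_k_factors(R, k):
    """Return U (n x k) and V (k x m) so that U @ V approximates a complete R."""
    if np.isnan(R).any():
        raise ValueError("plain SVD needs a matrix with no missing entries")
    P, s, Qt = np.linalg.svd(R, full_matrices=False)
    U = P[:, :k] * s[:k]  # absorb the singular values into the user factors
    V = Qt[:k, :]
    return U, V


# A complete rank-1 matrix is reproduced exactly with k = 1.
R = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])
U, V = rank_k_factors(R, k=1)
print(np.allclose(U @ V, R))

# A real rating matrix has missing entries, so this approach is unavailable.
R_sparse = R.copy()
R_sparse[0, 1] = np.nan
try:
    rank_k_factors(R_sparse, k=1)
except ValueError as err:
    print(err)
```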

Page 16: Recommender system

Collaborative filtering


$$P(r_{ij} \mid \mathbf{U}, \mathbf{V}) = \mathcal{N}(U_i^T V_j, \sigma^2)$$

MLE/minimize:

$$\frac{1}{2} \sum_{r_{ij} \neq null} \left(r_{ij} - U_i^T V_j\right)^2$$

http://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf

Page 17: Recommender system

Collaborative filtering


MLE/minimize:

$$E = \frac{1}{2} \sum_{r_{ij} \neq null} \left(r_{ij} - U_i^T V_j\right)^2 + \frac{\lambda}{2} \left( \sum_p \lVert U_p \rVert^2 + \sum_q \lVert V_q \rVert^2 \right)$$
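The regularized objective $E$ can be evaluated directly with NumPy. In this sketch, `R` holds `NaN` for null ratings, `U` has shape $n \times k$ (row $i$ is $U_i^T$), and `V` has shape $k \times m$ (column $j$ is $V_j$); the function name and the storage layout are assumptions of this example.

```python
import numpy as np


def pmf_objective(R, U, V, lam):
    """E = 1/2 * sum over observed (r_ij - U_i^T V_j)^2
         + lam/2 * (sum_p ||U_p||^2 + sum_q ||V_q||^2)."""
    observed = ~np.isnan(R)
    # Residuals at null entries are set to zero so they do not contribute.
    err = np.where(observed, R - U @ V, 0.0)
    data_term = 0.5 * np.sum(err ** 2)
    reg_term = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return float(data_term + reg_term)


R = np.array([[1.0, np.nan],
              [np.nan, 2.0]])
U = np.ones((2, 1))
V = np.ones((1, 2))
print(pmf_objective(R, U, V, lam=0.1))
```

Only the two observed ratings enter the squared-error term; the regularizer still covers every factor, including those for rarely rated items.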

Page 18: Recommender system

Collaborative filtering


Derivative:

$$\frac{\partial E}{\partial U_i} = -\sum_{j:\, r_{ij} \neq null} \left(r_{ij} - U_i^T V_j\right) V_j + \lambda U_i$$

$$\frac{\partial E}{\partial V_j} = -\sum_{i:\, r_{ij} \neq null} \left(r_{ij} - U_i^T V_j\right) U_i + \lambda V_j$$

Advantage: Quick to predict new ratings. More accurate with enough data?

Disadvantage: Difficult to update the model.
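The gradients of $E$ can be written in matrix form and checked against finite differences. The sketch assumes `NaN` marks null ratings, `U` is $n \times k$ and `V` is $k \times m$; the function names are illustrative.

```python
import numpy as np


def objective(R, U, V, lam):
    err = np.where(~np.isnan(R), R - U @ V, 0.0)
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))


def gradients(R, U, V, lam):
    """Full-batch gradients of E with respect to U (n x k) and V (k x m)."""
    err = np.where(~np.isnan(R), R - U @ V, 0.0)  # zero at null entries
    grad_U = -err @ V.T + lam * U  # row i: -sum_j e_ij V_j + lam * U_i
    grad_V = -U.T @ err + lam * V  # column j: -sum_i e_ij U_i + lam * V_j
    return grad_U, grad_V


rng = np.random.default_rng(0)
R = np.array([[5.0, np.nan, 1.0],
              [4.0, 2.0, np.nan]])
U = rng.normal(size=(2, 2))
V = rng.normal(size=(2, 3))
grad_U, grad_V = gradients(R, U, V, lam=0.1)

# Central finite difference on one coordinate of U as a sanity check.
eps = 1e-6
U_plus, U_minus = U.copy(), U.copy()
U_plus[0, 1] += eps
U_minus[0, 1] -= eps
numeric = (objective(R, U_plus, V, 0.1) - objective(R, U_minus, V, 0.1)) / (2 * eps)
print(abs(numeric - grad_U[0, 1]) < 1e-5)
```

Each full gradient touches every observed rating, so folding in new ratings means another pass over the whole data set, which is one way to read the disadvantage noted above.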

Page 19: Recommender system

Collaborative filtering


Stochastic gradient descent: for each observed $r_{ij}$:

$$U_i' = U_i + \alpha \left[ \left(r_{ij} - U_i^T V_j\right) V_j - \lambda U_i \right]$$

$$V_j' = V_j + \alpha \left[ \left(r_{ij} - U_i^T V_j\right) U_i - \lambda V_j \right]$$

$$U_i \leftarrow U_i', \qquad V_j \leftarrow V_j'$$

Here $\alpha$ is the learning rate and $\lambda$ is the regularization weight from $E$.

http://sifter.org/~simon/journal/20061211.html

Advantage: Easy to update the model
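A compact sketch of stochastic gradient descent for this factorization is shown below. It assumes each step adds a learning-rate-scaled correction to the current factors and applies both updates simultaneously; the hyperparameter defaults, the random initialization, and the function names are illustrative choices, not values from the slides or the linked post.

```python
import numpy as np


def sgd_factorize(R, k=2, alpha=0.02, lam=0.02, epochs=500, seed=0):
    """Fit U (n x k) and V (k x m) to the observed entries of R (NaN = null)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(k, m))
    observed = np.argwhere(~np.isnan(R))  # one (i, j) index pair per rating
    for _ in range(epochs):
        rng.shuffle(observed)  # visit the ratings in a fresh random order
        for i, j in observed:
            err = R[i, j] - U[i] @ V[:, j]
            # Compute both updates from the old values, then assign together.
            U_new = U[i] + alpha * (err * V[:, j] - lam * U[i])
            V_new = V[:, j] + alpha * (err * U[i] - lam * V[:, j])
            U[i], V[:, j] = U_new, V_new
    return U, V


def training_rmse(R, U, V):
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((R[mask] - (U @ V)[mask]) ** 2)))


R = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [np.nan, 1.0, 5.0, 4.0]])
U0, V0 = sgd_factorize(R, epochs=0)  # untrained factors for comparison
U, V = sgd_factorize(R)
print(training_rmse(R, U0, V0), training_rmse(R, U, V))
```

Because each step uses a single rating, a newly arrived rating can be folded in by running a few more updates on it rather than refitting from scratch.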

Page 20: Recommender system

Collaborative filtering

http://arxiv.org/pdf/1205.3193.pdf

Page 21: Recommender system

Evaluation

• Basic accuracy

– MAE

– RMSD

• Ranking accuracy

– Pearson correlation over ranks

– Kendall tau test

• Decision support

– Precision

– Recall

https://www.coursera.org/learn/recommender-systems/supplement/Jh5Kx/pdf-version-of-module-5-presentations
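Several of the listed metrics are short enough to write out by hand. This sketch uses plain NumPy; the Kendall tau variant shown is tau-a (ties count as neither concordant nor discordant), and the function names are chosen for this example only.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))


def rmsd(actual, predicted):
    """Root-mean-square deviation between actual and predicted ratings."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total number of pairs."""
    n = len(x)
    total = 0
    for a in range(n):
        for b in range(a + 1, n):
            total += np.sign(x[a] - x[b]) * np.sign(y[a] - y[b])
    return float(total) / (n * (n - 1) / 2)


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(mae([1, 2, 3], [1, 2, 5]), rmsd([1, 2, 3], [1, 2, 5]))
print(kendall_tau([1, 2, 3], [3, 2, 1]))
print(precision_recall(["a", "b", "c", "d"], ["a", "c", "e"]))
```

MAE and RMSD score the predicted rating values, Kendall tau scores the ordering, and precision and recall score only which items end up in the recommended set.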