Ranking and Diversity in Recommendations - RecSys Stammtisch at SoundCloud, Berlin
DESCRIPTION
Slides from my talk at the RecSys Stammtisch at SoundCloud in Berlin. The presentation is split into two parts: one focusing on ranking and relevance, and one on diversity and how to achieve it using genres. We introduce a novel diversity metric called Binomial Diversity.
TRANSCRIPT
RANKING AND DIVERSITY IN RECOMMENDATIONS Alexandros Karatzoglou @alexk_z
Thanks
Linas Baltrunas Telefonica
Saul Vargas Universidad Autonoma de Madrid
Yue Shi TU Delft
Pablo Castells Universidad Autonoma de Madrid
Telefonica Research in Barcelona
• User Modeling: Recommender Systems
• Data Mining, Machine Learning
• Multimedia Indexing and Analysis
• HCI
• Mobile Computing
• Systems and Networking
• http://www.tid.es
Recommendations in Telefonica
People You May Know@Tuenti
Recommendations in Telefonica
Firefox OS Marketplace
Collaborative Filtering
f_{ij} = \langle U_i, M_j \rangle

f_{ij} = \langle U_i, M_j \rangle + \sum_{k \in F_i} \frac{\alpha_{ik}}{|F_i|} \langle U_k, M_j \rangle
Tensor Factorization: HOSVD, CP-Decomposition

F_{ijk} = S \times_U U_i \times_M M_j \times_C C_k

F_{ijk} = \langle U_i, M_j, C_k \rangle

Karatzoglou et al. @ RecSys 2010
Publications in Ranking
• CIKM 2013: GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains
• RecSys 2013: xCLiMF: Optimizing Expected Reciprocal Rank for Data with Multiple Levels of Relevance
• RecSys 2012: CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering (* Best Paper Award)
• SIGIR 2012: TFMAP: Optimizing MAP for Top-N Context-aware Recommendation
• Machine Learning Journal, 2008: Improving Maximum Margin Matrix Factorization (* Best Machine Learning Paper Award at ECML PKDD 2008)
• RecSys 2010: List-wise Learning to Rank with Matrix Factorization for Collaborative Filtering
• NIPS 2007: CoFiRank - Maximum Margin Matrix Factorization for Collaborative Ranking
Recommendations are ranked lists
Popular Ranking Methods
• In order to generate the ranked item list, we need some relative utility score for each item
• Popularity is the obvious baseline
• Score could depend on the user (personalized)
• Score could also depend on the other items in the list (list-wise)
• One popular way to rank the items in RS is to sort the items according to the rating prediction
• Works for domains with ratings
• Wastes modeling power on the irrelevant items
Graphical Notation
[Figure: graphical notation for relevant and irrelevant items]
Ranking using latent representation
• If user = [-100, -100] (a 2d latent factor)
• We get the corresponding ranking
Matrix Factorization (for ranking)
• Randomly initialize item vectors
• Randomly initialize user vectors
• While not converged:
• Compute rating prediction error
• Update user factors
• Update item factors
• Let's say the user is [-100, -100]
• Compute the squared error:

(5 - \langle [-100, -100], [0.180, 0.19] \rangle)^2 = 1764

• Update the user and item factors in the direction where the error is reduced (according to the gradient of the loss)
8 items with ratings and random factors
Learning: Stochastic Gradient Descent with Square Error Loss
Square Loss User: [3, 1], RMSE=6.7
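The training loop sketched above can be written in a few lines of plain Python. This is a minimal sketch with toy data, not the talk's exact setup; the function name, learning rate, factor size and epoch count are all illustrative choices:

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.0, epochs=500):
    """SGD matrix factorization on squared error, following the slide's loop."""
    random.seed(0)
    U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))  # rating prediction error
            for f in range(k):
                uf = U[u][f]
                U[u][f] += lr * (err * V[i][f] - reg * uf)  # update user factors
                V[i][f] += lr * (err * uf - reg * V[i][f])  # update item factors
    return U, V

# 8 items with ratings for a single user, as on the slide
ratings = [(0, i, r) for i, r in enumerate([5, 4, 5, 1, 2, 1, 4, 1])]
U, V = train_mf(ratings, n_users=1, n_items=8)
ranked = sorted(range(8), key=lambda i: -sum(U[0][f] * V[i][f] for f in range(2)))
```

Sorting items by the learned dot product gives the ranking: the 5-rated items end up at the top and the 1-rated ones at the bottom.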
Learning to Rank for Top-k RecSys
• Usually we care about accurate ranking and not rating prediction
• Squared error loss optimizes to accurately predict 1s and 5s
• RS should get the top items right -> Ranking problem
• Why not learn how to rank directly?
• Learning to Rank methods provide up to 30% performance improvements in off-line evaluations
• It is possible, but a more complex task
Example: average precision (AP)
AP = \frac{1}{|S|} \sum_{k=1}^{|S|} P(k)

• AP: we compute the precision at each relevant position and average them

\frac{P@1 + P@2 + P@4}{3} = \frac{1/1 + 2/2 + 3/4}{3} \approx 0.92
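That worked example can be reproduced with a short sketch (the function name and toy item ids are mine):

```python
def average_precision(ranked, relevant):
    """AP: average of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at this relevant position
    return total / len(relevant) if relevant else 0.0

# relevant items at ranks 1, 2 and 4, as in the slide's example
ap = average_precision(["a", "b", "x", "c"], {"a", "b", "c"})  # (1/1 + 2/2 + 3/4) / 3
```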
Why is it hard? Non-smoothness. Example: AP
[Plot: AP vs RMSE as the user factors move from u = [-20, -20] to u = [20, 20]]
The Non-smoothness of Average Precision
AP_m = \frac{1}{\sum_{i=1}^{N} y_{mi}} \sum_{i=1}^{N} \frac{y_{mi}}{r_{mi}} \sum_{j=1}^{N} y_{mj} \, I(r_{mj} \le r_{mi})

AP = \frac{1}{|S|} \sum_{k=1}^{|S|} P(k)

where:
• y_{mi}: 1 if item i is relevant for user m and 0 otherwise
• I(\cdot): indicator function (1 if its argument is true, 0 otherwise)
• r_{mi}: rank of item i for user m
How can we get a smooth AP?
• We replace the non-smooth parts of MAP with a smooth approximation

g(x) = 1 / (1 + e^{-x})

\frac{1}{r_{mi}} \approx g(f_{mi}) = g(\langle U_m, V_i \rangle)

How can we get a smooth MAP?
• We replace the non-smooth parts of MAP with a smooth approximation

I(r_{mj} \le r_{mi}) \approx g(f_{mj} - f_{mi}) = g(\langle U_m, V_j - V_i \rangle)
Smooth version of MAP
[Plot: smoothed MAP as the user factors move from u = [-20, -20] to u = [20, 20]]
Sometimes the approximation is not very good…
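The two substitutions above can be combined into a differentiable surrogate of AP. A minimal sketch (function names are mine; only the sigmoid replacements from the slides are assumed):

```python
import math

def g(x):
    """Logistic sigmoid g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def smooth_ap(scores, relevant):
    """Smoothed AP: 1/r_mi -> g(f_mi) and I(r_mj <= r_mi) -> g(f_mj - f_mi)."""
    rel = list(relevant)
    total = 0.0
    for i in rel:
        total += g(scores[i]) * sum(g(scores[j] - scores[i]) for j in rel)
    return total / len(rel)

# pushing the relevant item's score up increases the smooth objective
low = smooth_ap([-5.0, 5.0], {0})
high = smooth_ap([5.0, -5.0], {0})
```

Because every term is a sigmoid of the model scores f = ⟨U, V⟩, the objective is smooth in the factors and can be optimized with gradient methods.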
Ranking Inconsistencies • Achieving a perfect ranking for all users is not possible
• Two Sources of Inconsistencies:
• 1) Factor Models (all models) have limited expressive power and cannot learn the perfect ranking for all users
• 2) Ranking function approximations are inconsistent, e.g. A > B and B > C but C > A
Summary on Ranking 101
Area Under the ROC Curve (AUC)
AUC := \frac{1}{|S^+| |S^-|} \sum_{i \in S^+} \sum_{j \in S^-} I(R_i < R_j)
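A direct reading of the formula as code, counting relevant/irrelevant pairs in the right order (names and toy ranks are mine):

```python
def auc(ranks_pos, ranks_neg):
    """AUC: fraction of (relevant, irrelevant) pairs where the relevant item has the smaller rank R."""
    pairs = [(ri, rj) for ri in ranks_pos for rj in ranks_neg]
    return sum(1 for ri, rj in pairs if ri < rj) / len(pairs)

perfect = auc([1, 2], [3, 4])  # all relevant items above all irrelevant ones
half = auc([1, 4], [2, 3])     # 2 of the 4 pairs are in the right order
```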
Reciprocal Rank (RR)
RR := \frac{1}{R_i}
Average Precision (AP)
AP = \frac{1}{|S|} \sum_{k=1}^{|S|} P(k)
AP vs RR
DCG = \sum_i \frac{2^{score(i)} - 1}{\log_2(i + 2)}
Normalized Discounted Cumulative Gain (nDCG)
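nDCG divides DCG by the DCG of the ideal ordering, so a perfectly sorted list scores 1.0. A sketch using the gain and discount from the formula above (function names are mine):

```python
import math

def dcg(gains):
    """DCG with gain 2^score - 1 and discount log2(i + 2) for zero-based position i."""
    return sum((2 ** s - 1) / math.log2(i + 2) for i, s in enumerate(gains))

def ndcg(gains):
    """Normalize by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```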
Relevance solved! Is that all?
• Ranking “solves” the relevance problem
• Can we be happy with the results?
Relevance solved! Is that all?
• Coverage
• Diversity
• Novelty
• Serendipity
Diversity in Recommendations
• Diversity using Genres
• Movies, Music, Books
• Diversity should fulfill:
• Coverage • Redundancy • Size Awareness
Diversity methods for RecSys
• Topic List Diversification, or
• Maximal Marginal Relevance (MMR)

f_{MMR}(i; S) = (1 - \lambda) \, rel(i) + \lambda \min_{j \in S} dist(i, j)
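A greedy MMR re-ranker following the formula above; the items, relevance scores and 0/1 genre distance are illustrative, not from the talk:

```python
def mmr_rerank(candidates, rel, dist, lam=0.5, k=3):
    """Greedily pick the item maximizing (1 - lam) * rel(i) + lam * min_{j in S} dist(i, j)."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(i):
            novelty = min(dist(i, j) for j in selected) if selected else 0.0
            return (1 - lam) * rel[i] + lam * novelty
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

genres = {0: "action", 1: "action", 2: "western"}
rel = {0: 1.0, 1: 0.9, 2: 0.5}
order = mmr_rerank([0, 1, 2], rel, lambda i, j: 0.0 if genres[i] == genres[j] else 1.0)
# the western jumps ahead of the second, redundant action movie
```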
Diversity methods for RecSys
• Intent-Aware IR metrics
ERR\text{-}IA = \sum_{s} p(s) \, ERR_s
Example
• Action, Comedy, Sci-Fi, Western
• Action, Thriller, Sci-Fi, Western
• Adventure, Western
Genres
[Figure: genre overlap counts, with totals Action (517), Comedy (1267), Drama (1711), Romance (583), Thriller (600)]
Genres and Popularity
Binomial Diversity
• We base a new Diversity Metric on the Binomial Distribution
P(X = k) = \binom{N}{k} p^k (1 - p)^{N - k}
User Genre Relevance
• p''_g: fraction of items of genre g the user interacted with
• p'_g: global fraction of items of genre g
• p_g: a mix of the two

p''_g = \frac{k^u_g}{|I_u|}

p'_g = \frac{\sum_u k^u_g}{\sum_u |I_u|}

p_g = (1 - \alpha) \, p'_g + \alpha \, p''_g
Coverage
Coverage(R) = \prod_{g \notin G(R)} P(X_g = 0)^{1/|G|}

• Product of the probabilities that the genres not represented in the list would not be picked at random
Non-Redundancy
P(X_g \ge k \mid X_g > 0) = 1 - \sum_{l=1}^{k-1} P(X_g = l \mid X_g > 0)

NonRed(R) = \prod_{g \in G(R)} P(X_g \ge k^R_g \mid X_g > 0)^{1/|G(R)|}
Non-Redundancy
Binomial Diversity
BinomDiv(R) = Coverage(R) \cdot NonRed(R)
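Putting coverage and non-redundancy together, a minimal sketch of the metric; the genre probabilities p_g are assumed given, and all names are mine:

```python
from math import comb

def binom_pmf(N, k, p):
    """Binomial P(X = k) = C(N, k) p^k (1 - p)^(N - k)."""
    return comb(N, k) * p ** k * (1 - p) ** (N - k)

def binomial_diversity(item_genres, p, all_genres):
    """BinomDiv(R) = Coverage(R) * NonRed(R), with X_g ~ Binomial(|R|, p_g)."""
    N = len(item_genres)
    counts = {}
    for gs in item_genres:
        for g in gs:
            counts[g] = counts.get(g, 0) + 1
    # Coverage: penalize absent genres that were likely to appear by chance
    coverage = 1.0
    for g in all_genres:
        if g not in counts:
            coverage *= binom_pmf(N, 0, p[g]) ** (1 / len(all_genres))
    # Non-redundancy: P(X_g >= k_g | X_g > 0) for each genre present with k_g items
    nonred = 1.0
    for g, k_g in counts.items():
        p_pos = 1 - binom_pmf(N, 0, p[g])
        tail = 1 - sum(binom_pmf(N, l, p[g]) for l in range(1, k_g)) / p_pos
        nonred *= max(tail, 0.0) ** (1 / len(counts))
    return coverage * nonred

p = {"action": 0.5, "western": 0.5}
diverse = binomial_diversity([{"action"}, {"western"}], p, p.keys())
redundant = binomial_diversity([{"action"}, {"action"}], p, p.keys())
```

A list covering both genres once scores higher than a list that repeats one genre and misses the other, which is exactly the coverage/redundancy trade-off the metric encodes.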
Re-Ranking • Re-Rank based on Relevance and Binomial Diversity
f_{BinomDiv}(i; S) = (1 - \lambda) \, norm_{rel}(rel(i)) + \lambda \, norm_{div}(div(i; S))

norm_X(x) = \frac{x - \mu_X}{\sigma_X}
Example
Thanks! • Questions?