
Page 1: Matrix Factorizations for Recommender Systems

Matrix Factorizations for Recommender Systems

Dmitriy Selivanov ([email protected])

2017-11-16

Page 2: Matrix Factorizations for Recommender Systems

Recommender systems are everywhere


Page 3: Matrix Factorizations for Recommender Systems

Recommender systems are everywhere


Page 4: Matrix Factorizations for Recommender Systems

Recommender systems are everywhere


Page 5: Matrix Factorizations for Recommender Systems

Recommender systems are everywhere


Page 6: Matrix Factorizations for Recommender Systems

Goals

Propose “relevant” items to customers:

- Retention
- Exploration
- Up-sale
- Personalized offers

- Recommended items for a customer, given a history of activities (transactions, browsing history, favourites)
- Similar items:
  - substitutions
  - bundles - frequently bought together
  - ...

Page 7: Matrix Factorizations for Recommender Systems

Live demo

Dataset - LastFM-360K:

- 360k users
- 160k artists
- 17M observations
- sparsity ~ 0.9997


Page 8: Matrix Factorizations for Recommender Systems

Explicit feedback

Ratings, likes/dislikes, purchases:

- cleaner data
- smaller
- hard to collect

$$\mathrm{RMSE}^2 = \frac{1}{|D|}\sum_{u,i \in D}(r_{ui} - \hat{r}_{ui})^2$$
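As a quick illustration, a minimal R sketch of this RMSE; `r` and `r_hat` are assumed to be numeric vectors aligned over the observed (u, i) pairs in D:

```r
# Sketch: RMSE over the observed ratings D.
rmse <- function(r, r_hat) sqrt(mean((r - r_hat)^2))
rmse(c(5, 3, 4), c(4.5, 2.8, 4.2))
```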

Page 9: Matrix Factorizations for Recommender Systems

Netflix prize

- ~480k users, 18k movies, 100M ratings
- sparsity ~ 99%
- goal: reduce RMSE by 10% - from 0.9514 to 0.8563

Page 10: Matrix Factorizations for Recommender Systems

Implicit feedback

- noisy feedback (clicks, likes, purchases, searches, ...)
- much easier to collect
- wider user/item coverage
- usually sparsity > 99.9%

One-Class Collaborative Filtering

- observed entries are positive preferences
  - should have high confidence
- missing entries in the matrix are a mix of negative and positive preferences
  - consider them negative with low confidence
  - we cannot really distinguish whether a user did not click a banner because of a lack of interest or a lack of awareness

Page 11: Matrix Factorizations for Recommender Systems

Evaluation

Recap: we only care about producing a small set of highly relevant items. RMSE is a bad metric - it has a very weak connection to business goals.

We only care about the relevance/precision of the retrieved items:

- space on the screen is limited
- only order matters - the most relevant items should be at the top

Page 12: Matrix Factorizations for Recommender Systems

Ranking - Mean average precision

$$\mathrm{AveragePrecision} = \frac{\sum_{k=1}^{n} P(k) \times \mathrm{rel}(k)}{\text{number of relevant documents}}$$

##    index relevant precision_at_k
## 1:     1        0      0.0000000
## 2:     2        0      0.0000000
## 3:     3        1      0.3333333
## 4:     4        0      0.2500000
## 5:     5        0      0.2000000

map@5 = 0.1566667
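For reference, a short R sketch that reproduces the table and the map@5 value above; it follows the implementation that averages precision@k over the cut-off positions, with `relevant` as the 0/1 relevance of the recommendations in ranked order:

```r
# Sketch: precision@k and its mean over the first 5 positions,
# reproducing the table above for relevant = (0, 0, 1, 0, 0).
relevant <- c(0, 0, 1, 0, 0)
precision_at_k <- cumsum(relevant) / seq_along(relevant)
precision_at_k        # 0.0000000 0.0000000 0.3333333 0.2500000 0.2000000
mean(precision_at_k)  # 0.1566667 -> map@5 for this single user
# MAP@K for a dataset is the mean of this quantity over all users.
```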

Page 13: Matrix Factorizations for Recommender Systems

Ranking - Normalized Discounted Cumulative Gain

The intuition is the same as for MAP@K, but it also takes the value of relevance into account:

$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$

$$nDCG_p = \frac{DCG_p}{IDCG_p}$$

$$IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
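A minimal R sketch of these formulas; `rel` is assumed to hold graded relevance scores in the order the items were recommended:

```r
# DCG: gain 2^rel - 1 discounted by log2(position + 1)
dcg <- function(rel) sum((2^rel - 1) / log2(seq_along(rel) + 1))

# nDCG@p: DCG of the actual ranking divided by the DCG of the ideal ranking
ndcg <- function(rel, p = length(rel)) {
  rel <- rel[seq_len(p)]
  ideal <- sort(rel, decreasing = TRUE)
  if (dcg(ideal) == 0) return(0)
  dcg(rel) / dcg(ideal)
}

ndcg(c(3, 2, 0, 1, 0), p = 5)
```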

Page 14: Matrix Factorizations for Recommender Systems

Approaches

- Content based
  - good for cold start
  - not personalized
- Collaborative filtering
  - vanilla collaborative filtering
  - matrix factorizations
  - ...
- Hybrid and context-aware recommender systems
  - best of both worlds

Page 15: Matrix Factorizations for Recommender Systems

Focus today

- WRMF (Weighted Regularized Matrix Factorization) - "Collaborative Filtering for Implicit Feedback Datasets" (2008)
  - efficient learning with accelerated approximate Alternating Least Squares
  - inference time
- Linear-Flow - "Practical Linear Models for Large-Scale One-Class Collaborative Filtering" (2016)
  - efficient truncated SVD
  - cheap cross-validation with the full regularization path

Page 16: Matrix Factorizations for Recommender Systems

Matrix Factorizations

- Users can be described by a small number of latent factors $p_{uk}$
- Items can be described by a small number of latent factors $q_{ki}$


Page 17: Matrix Factorizations for Recommender Systems

Sparse data

(diagram: sparse user-item interaction matrix - rows are users, columns are items)

Page 18: Matrix Factorizations for Recommender Systems

Low rank matrix factorization

$$R = P \times Q$$

(diagram: R (users x items) factorized into P (users x factors) and Q (factors x items))

Page 19: Matrix Factorizations for Recommender Systems

Reconstruction

(diagram: the sparse user-item matrix and its dense low-rank reconstruction, both users x items)

Page 20: Matrix Factorizations for Recommender Systems

Truncated SVD

Take the k largest singular values:

$$X \approx U_k D_k V_k^T$$

- $X_k \in \mathbb{R}^{m \times n}$
- $U_k$, $V_k$ - columns are orthonormal bases (dot product of any 2 columns is zero, unit norm)
- $D_k$ - diagonal matrix with the singular values on the diagonal

Truncated SVD is the best rank-k approximation of the matrix X in terms of the Frobenius norm:

$$\min \|X - U_k D_k V_k^T\|_F$$

$$P = U_k \sqrt{D_k}, \qquad Q = \sqrt{D_k}\, V_k^T$$
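A sketch of this P, Q construction in R, assuming `X` is a sparse user-item matrix (`dgCMatrix`); `irlba` is one common choice for truncated SVD of large sparse matrices:

```r
library(Matrix)
library(irlba)

k <- 10
s <- irlba(X, nv = k)              # truncated SVD: X ~ U_k D_k V_k^T

P <- s$u %*% diag(sqrt(s$d))       # user factors:  P = U_k * sqrt(D_k)
Q <- diag(sqrt(s$d)) %*% t(s$v)    # item factors:  Q = sqrt(D_k) * V_k^T

# rank-k reconstruction (dense - only for small illustrative matrices)
X_hat <- P %*% Q
```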

Page 21: Matrix Factorizations for Recommender Systems

Issue with truncated SVD for “explicit” feedback

- Optimal in terms of the Frobenius norm - takes into account zeros in ratings:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|\text{users}| \times |\text{items}|} \sum_{u \in \text{users},\, i \in \text{items}} (r_{ui} - \hat{r}_{ui})^2}$$

- Overfits the data

Better objective - error only on the “observed” ratings:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|\text{Observed}|} \sum_{u,i \in \text{Observed}} (r_{ui} - \hat{r}_{ui})^2}$$

Page 22: Matrix Factorizations for Recommender Systems

SVD-like matrix factorization with ALS

$$J = \sum_{u,i \in \text{Observed}} (r_{ui} - p_u q_i)^2 + \lambda(\|P\|^2 + \|Q\|^2)$$

Given Q fixed, solve for p:

$$\min_{p_u} \sum_{i \in \text{Observed}} (r_{ui} - q_i p_u)^2 + \lambda \sum_{j=1}^{k} p_{uj}^2$$

Given P fixed, solve for q:

$$\min_{q_i} \sum_{u \in \text{Observed}} (r_{ui} - p_u q_i)^2 + \lambda \sum_{j=1}^{k} q_{ij}^2$$

Ridge regression: $P = (Q^T Q + \lambda I)^{-1} Q^T r$, $Q = (P^T P + \lambda I)^{-1} P^T r$
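A sketch of one half-step of this scheme in R: with Q fixed, each user's factors are a ridge regression on the items that user actually rated. `R` (sparse `dgCMatrix` of ratings) and `Q` (k x n_items factor matrix) are illustrative names, not from the slides:

```r
library(Matrix)

# One ALS half-step: update all user factors with item factors Q fixed.
update_users <- function(R, Q, lambda) {
  k <- nrow(Q)
  P <- matrix(0, nrow = nrow(R), ncol = k)
  for (u in seq_len(nrow(R))) {
    obs <- which(R[u, ] != 0)              # items rated by user u
    if (length(obs) == 0) next
    Qo <- Q[, obs, drop = FALSE]           # k x |observed|
    r  <- R[u, obs]
    # ridge regression over observed ratings only:
    # p_u = (Q_o Q_o^T + lambda I)^-1 Q_o r
    P[u, ] <- solve(Qo %*% t(Qo) + lambda * diag(k), Qo %*% r)
  }
  P
}
# the item update is symmetric: fix P and solve a ridge regression per item
```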

Page 23: Matrix Factorizations for Recommender Systems

“Collaborative Filtering for Implicit Feedback Datasets”

WRMF - Weighted Regularized Matrix Factorization

- “Default” approach
- Proposed in 2008, but still widely used in industry (even at YouTube)
- Several high-quality open-source implementations

$$J = \sum_{u,i} C_{ui} (P_{ui} - x_u^T y_i)^2 + \lambda(\|X\|_F + \|Y\|_F)$$

- Preferences - binary:

$$P_{ui} = \begin{cases} 1 & \text{if } R_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}$$

- Confidence - $C_{ui} = 1 + f(R_{ui})$
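A small R sketch of this preference/confidence construction, assuming `R` is a sparse `dgCMatrix` of raw interaction counts and the common linear choice f(r) = alpha * r (`alpha` is an illustrative hyper-parameter, not from the slides):

```r
library(Matrix)

alpha <- 40                          # illustrative confidence scaling

Pref <- R
Pref@x <- rep(1, length(Pref@x))     # P_ui = 1 wherever R_ui > 0

Conf <- R
Conf@x <- 1 + alpha * Conf@x         # C_ui = 1 + alpha * R_ui on observed entries
# unobserved entries implicitly have preference 0 and confidence 1;
# the ALS updates below exploit this instead of materializing them.
```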

Page 24: Matrix Factorizations for Recommender Systems

Alternating Least Squares for implicit feedback

For fixed Y:

$$\frac{dL}{dx_u} = -2\sum_{i \in \text{items}} c_{ui}(p_{ui} - x_u^T y_i)\, y_i + 2\lambda x_u = -2\sum_{i \in \text{items}} c_{ui}(p_{ui} - y_i^T x_u)\, y_i + 2\lambda x_u = -2 Y^T C^u p(u) + 2 Y^T C^u Y x_u + 2\lambda x_u$$

- Setting $dL/dx_u = 0$ at the optimum gives $(Y^T C^u Y + \lambda I)\, x_u = Y^T C^u p(u)$
- $x_u$ can be obtained by solving a system of linear equations:

$$x_u = \mathrm{solve}(Y^T C^u Y + \lambda I,\; Y^T C^u p(u))$$

Page 25: Matrix Factorizations for Recommender Systems

Alternating Least Squares for implicit feedback

Similarly, for fixed X:

- $dL/dy_i = -2 X^T C^i p(i) + 2 X^T C^i X y_i + 2\lambda y_i$
- $y_i = \mathrm{solve}(X^T C^i X + \lambda I,\; X^T C^i p(i))$

Another optimization:

- $X^T C^i X = X^T X + X^T (C^i - I) X$
- $Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y$

$X^T X$ and $Y^T Y$ can be precomputed; since $C^i - I$ and $C^u - I$ are non-zero only on observed entries, the correction term involves just the observed rows (see the sketch below).
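A sketch of the per-user update in R using this trick; `Y` (n_items x k item factors), `r_u` (the user's row of raw counts as a numeric vector) and `alpha` are the same illustrative names as above:

```r
# One implicit-feedback ALS update for a single user:
# x_u = solve(Y^T C^u Y + lambda I, Y^T C^u p(u))
update_user <- function(Y, YtY, r_u, alpha, lambda) {
  k   <- ncol(Y)
  obs <- which(r_u > 0)                 # items the user interacted with
  Yo  <- Y[obs, , drop = FALSE]         # their embeddings, |obs| x k
  cu  <- 1 + alpha * r_u[obs]           # confidences for observed entries

  # Y^T C^u Y = Y^T Y + Y_o^T diag(c_u - 1) Y_o  (only observed rows differ)
  A <- YtY + crossprod(Yo, Yo * (cu - 1)) + lambda * diag(k)
  # Y^T C^u p(u): p(u) is 1 on observed items, 0 elsewhere
  b <- crossprod(Yo, cu)

  solve(A, b)
}

# YtY <- crossprod(Y)   # precomputed once per ALS sweep
```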

Page 26: Matrix Factorizations for Recommender Systems

Accelerated Approximate Alternating Least Squares

$$y_i = \mathrm{solve}(X^T C^i X + \lambda I,\; X^T C^i p(i))$$

Iterative methods:

- Conjugate Gradient
- Coordinate Descent

Run a fixed number of steps (usually 3-4 is enough):
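A minimal R sketch of a few conjugate gradient steps for the symmetric positive-definite system A x = b (here A would be $Y^T C^u Y + \lambda I$); this illustrates the idea only, not the exact implementation used by production libraries:

```r
# Approximately solve A x = b with a fixed number of CG steps.
cg_steps <- function(A, b, x0 = rep(0, length(b)), n_steps = 3) {
  x <- x0
  r <- as.numeric(b - A %*% x)    # residual
  p <- r                          # search direction
  for (step in seq_len(n_steps)) {
    Ap    <- as.numeric(A %*% p)
    alpha <- sum(r * r) / sum(p * Ap)
    x     <- x + alpha * p
    r_new <- r - alpha * Ap
    beta  <- sum(r_new * r_new) / sum(r * r)
    p     <- r_new + beta * p
    r     <- r_new
  }
  x
}
```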

Page 27: Matrix Factorizations for Recommender Systems

Inference time

How to make recommendations for new users? There are no user embeddings, since these users are not in the original matrix!

Page 28: Matrix Factorizations for Recommender Systems

Inference time

Make one ALS step with the item embeddings matrix fixed => get the new user embeddings:

- given fixed $Y$ and $C^{u_{new}}$ - the confidence for the new user-item interactions
- $x_{u_{new}} = \mathrm{solve}(Y^T C^{u_{new}} Y + \lambda I,\; Y^T C^{u_{new}} p(u_{new}))$
- $\mathrm{scores} = X_{new} Y^T$
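A sketch of this fold-in step in R, reusing the `update_user()` sketch from the ALS slide; `r_new` is the new user's interaction vector over all items and the hyper-parameter values are illustrative:

```r
# Fold in a new user: one ALS step with item embeddings Y fixed.
YtY   <- crossprod(Y)
x_new <- update_user(Y, YtY, r_new, alpha = 40, lambda = 0.01)

scores <- as.numeric(Y %*% x_new)                  # one score per item
top_10 <- order(scores, decreasing = TRUE)[1:10]   # recommended item indices
```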

Page 29: Matrix Factorizations for Recommender Systems

WRMF Implementations

- python implicit - implements Conjugate Gradient, with GPU support recently!
- R reco - implements Conjugate Gradient
- Spark ALS
- Quora qmf
- Google tensorflow


Page 30: Matrix Factorizations for Recommender Systems

Linear-Flow

The idea is to learn an item-item similarity matrix W from the data.

First:

$$\min_W J = \|X - X W\|_F + \lambda \|W\|_F$$

with the constraint:

$$\mathrm{rank}(W) \le k$$

Page 31: Matrix Factorizations for Recommender Systems

Linear-Flow observations

1. Without L2 regularization the optimal solution is $W_k = Q_k Q_k^T$, where $\mathrm{SVD}_k(X) = P_k \Sigma_k Q_k^T$
2. Without the $\mathrm{rank}(W) \le k$ constraint the optimal solution is just the ridge regression solution $W = (X^T X + \lambda I)^{-1} X^T X$ - infeasible, since the item-item matrix is too large to form and invert.

Page 32: Matrix Factorizations for Recommender Systems

Linear-Flow reparametrization

$$\mathrm{SVD}_k(X) = P_k \Sigma_k Q_k^T$$

Let $W = Q_k Y$:

$$\underset{Y}{\operatorname{argmin}}\; \|X - X Q_k Y\|_F + \lambda \|Q_k Y\|_F$$

Motivation

With $\lambda = 0$, $W = Q_k Q_k^T$, and the solution of the reparametrized problem is $Y = Q_k^T$.

Page 33: Matrix Factorizations for Recommender Systems

Linear-Flow closed-form solution

- Notice that if $Q_k$ has orthonormal columns, then $\|Q_k Y\|_F = \|Y\|_F$
- Solve $\|X - X Q_k Y\|_F + \lambda \|Y\|_F$
- Simple ridge regression with a closed-form solution:

$$Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$$

Very cheap inversion of a $k \times k$ matrix!
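A sketch of the closed-form solution in R, assuming `X` is the sparse user-item matrix; note that $Q_k^T X^T X$ is computed as $(X Q_k)^T X$, so the huge n_items x n_items matrix $X^T X$ is never formed:

```r
library(Matrix)
library(irlba)

k      <- 100
lambda <- 1                         # illustrative; see the cross-validation slide

Qk <- irlba(X, nv = k)$v            # n_items x k, right singular vectors of X

XQ  <- X %*% Qk                     # n_users x k (dense, but narrow)
Z   <- as.matrix(crossprod(XQ, X))  # Z = Q_k^T X^T X,  k x n_items
ZQk <- Z %*% Qk                     # k x k

Y <- solve(ZQk + lambda * diag(k), Z)   # closed-form ridge solution, k x n_items
# W = Qk %*% Y is the rank-k item-item similarity matrix; in practice keep it
# factored and score a user history vector x_u as (x_u %*% Qk) %*% Y
```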

Page 34: Matrix Factorizations for Recommender Systems

Linear-Flow hassle-free cross-validation

$$Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$$

How to find lambda with cross-validation?

- pre-compute $Z = Q_k^T X^T X$, so $Y = (Z Q_k + \lambda I)^{-1} Z$
- pre-compute $Z Q_k$
- notice that the value of lambda affects only the diagonal of $Z Q_k + \lambda I$
- generate a sequence of lambdas (say of length 50) based on the min/max diagonal values
- solving 50 ridge regressions of small rank is super fast (see the sketch below)
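A sketch of this regularization path in R, continuing the variables from the previous sketch (`Z`, `ZQk`, `Qk`); the evaluation step is left as a placeholder since it depends on the chosen metric and validation split:

```r
# Z and ZQk are computed once; only the diagonal changes with lambda.
d <- diag(ZQk)
lambdas <- seq(min(d), max(d), length.out = 50)

models <- lapply(lambdas, function(lambda) {
  solve(ZQk + lambda * diag(nrow(ZQk)), Z)   # Y for this lambda
})
# for each Y in `models`, score validation users with (X_val %*% Qk) %*% Y,
# compute map@k / ndcg@k and keep the lambda with the best score
```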

Page 35: Matrix Factorizations for Recommender Systems

Linear-Flow hassle-free cross-validation


Page 36: Matrix Factorizations for Recommender Systems

Suggestions

- start simple - SVD, WRMF
- design proper cross-validation - both the objective and the data split
- think about how to incorporate business logic (for example, how to exclude something)
- use single-machine implementations
- think about inference time
- don't waste time on libraries/articles/blog posts which demonstrate MF with dense matrices

Page 37: Matrix Factorizations for Recommender Systems

Questions?

- http://dsnotes.com/tags/recommender-systems/
- https://github.com/dselivanov/reco

Contacts:

- [email protected]
- https://github.com/dselivanov
- https://www.linkedin.com/in/dselivanov1