
Page 1: Big, Practical Recommendations with Alternating Least Squares

Big, Practical Recommendations with Alternating Least Squares

Sean Owen • Apache Mahout / Myrrix.com

Page 2: Big, Practical Recommendations with Alternating Least Squares

WHERE’S BIG LEARNING?

- The Big Data stack so far: Storage, Database, Processing, Applications
- Next up the stack: the application layer, with analytics and machine learning
- Like Apache Mahout, a common Big Data app today: clustering, recommenders, classifiers on Hadoop; free and open source, but not mature
- Where's commercialized Big Learning?

Page 3: Big, Practical Recommendations with Alternating Least Squares

A RECOMMENDER SHOULD …

- Answer in real time: ingest new data now, and modify recommendations based on the newest data; no "cold start" for new data
- Scale horizontally, both for queries per second and for size of data set
- Accept diverse input: not just people and products, not just explicit ratings; clicks, views, buys; side information
- Be "pretty accurate"

Page 4: Big, Practical Recommendations with Alternating Least Squares

NEED: 2-TIER ARCHITECTURE

- Real-time Serving Layer: quick results based on a precomputed model; incremental update; partitionable for scale (a minimal sketch follows this list)
- Batch Computation Layer: builds the model; scales out (on Hadoop?); asynchronous, occasional, long-lived runs
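As a rough illustration of the serving side, here is a minimal sketch in Python with numpy. The class and method names are hypothetical, and the factor matrices X and Y are assumed to come from the batch layer (the factorization described on the following slides):

    import numpy as np

    class ServingLayer:
        """Answers queries in real time from the last precomputed model."""
        def __init__(self, X, Y):
            self.X, self.Y = X, Y              # user-feature, item-feature factors
        def recommend(self, user, topn=10):
            scores = self.X[user] @ self.Y.T   # fast: one k-dim dot product per item
            return np.argsort(-scores)[:topn]  # indices of the top-scoring items
        def publish(self, X, Y):
            self.X, self.Y = X, Y              # swap in a model from a new batch run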

Page 5: Big, Practical Recommendations with Alternating Least Squares

A PRACTICAL ALGORITHM

MATRIX FACTORIZATION BENEFITS

- Factor the user-item matrix into a user-feature matrix times a feature-item matrix
- Well understood in ML, as in Principal Component Analysis and Latent Semantic Indexing
- Several algorithms, like Singular Value Decomposition and Alternating Least Squares
- Models intuition
- Factorization is batch-parallelizable
- Reconstruction (recs) in low dimension is fast
- Allows projection of new data: a cold-start solution and an approximate-update solution

Page 6: Big, Practical Recommendations with Alternating Least Squares

A PRACTICAL IMPLEMENTATION

ALTERNATING LEAST SQUARES BENEFITS

- Simple factorization: P ≈ X Yᵀ; approximate, since X and Y are "skinny" (low-rank)
- Faster than the SVD: trivially parallel, iterative
- Dumber than the SVD: no singular values, no orthonormal basis
- Parallelizable by row: very Hadoop-friendly
- Iterative: an OK answer fast, refined for as long as desired
- Lends itself to a "binary" input model, with ratings acting as regularization weights instead; sparseness and 0s are no longer a problem

Page 7: Big, Practical Recommendations with Alternating Least Squares

ALS ALGORITHM 1

- Input: (user, item, strength) tuples; anything you can quantify is input; strength is positive
- Many tuples per user-item pair
- R is the sparse user-item interaction matrix: r_ij = total strength of interaction between user i and item j, for example:

R =
1 4 3 0 0
0 0 3 0 0
0 4 0 3 2
5 0 2 0 3
0 0 0 5 0
2 4 0 0 0
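A minimal sketch of assembling R in Python with scipy; the tuple list is hypothetical. COO format sums duplicate (user, item) entries on conversion, matching "total strength of interaction":

    import numpy as np
    from scipy.sparse import coo_matrix

    # (user, item, strength) tuples; many tuples per user-item pair
    tuples = [(0, 0, 1.0), (0, 1, 2.5), (0, 1, 1.5), (1, 2, 3.0)]
    users, items, strengths = zip(*tuples)
    # duplicates are summed, giving r_ij = total strength for user i, item j
    R = coo_matrix((strengths, (users, items)), shape=(6, 5)).tocsr()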

Page 8: Big, Practical Recommendations with Alternating Least Squares

ALS ALGORITHM 2

Follow "Collaborative Filtering for Implicit Feedback Datasets": www2.research.att.com/~yifanhu/PUB/cf.pdf

- Construct the "binary" matrix P: 1 where R > 0, 0 where R = 0
- Factor P, not R; R returns in the regularization
- Still sparse; implicit 0s are fine

P =
1 1 1 0 0
0 0 1 0 0
0 1 0 1 1
1 0 1 0 1
0 0 0 1 0
1 1 0 0 0
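Continuing the sketch above, P is just R's nonzero pattern and stays sparse:

    # 1 where R > 0, 0 where R = 0; implicit zeros are never materialized
    P = (R > 0).astype(np.float64)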

Page 9: Big, Practical Recommendations with Alternating Least Squares

ALS ALGORITHM 3

- P is m x n; choose k << m, n
- Factor P as Q = X Yᵀ with Q ≈ P, where X is m x k and Yᵀ is k x n
- Find the best approximation Q: minimize the L2 norm of the difference, ||P − Q||²
- Minimal squared error: "Least Squares"
- Recommendations are the largest values in Q

Page 10: Big, Practical Recommendations with Alternating Least Squares

ALS ALGORITHM 4

- Optimizing X and Y simultaneously is non-convex and hard
- If X or Y is fixed, the problem becomes a system of linear equations: convex, easy
- Initialize Y with random values
- Solve for X; then fix X and solve for Y; repeat ("Alternating"), as in the sketch below
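A minimal sketch of the alternation in Python with numpy, dense for clarity. solve_rows is a hypothetical helper standing in for the per-row solve given under ALS ALGORITHM 6 below:

    import numpy as np

    def als(P, C, k, lam, iters, seed=0):
        # P: m x n binary matrix; C: m x n strength weights c_ui (next slides)
        m, n = P.shape
        rng = np.random.default_rng(seed)
        Y = rng.standard_normal((n, k)) * 0.1   # initialize Y with random values
        for _ in range(iters):
            X = solve_rows(P, C, Y, lam)        # fix Y, solve for X row by row
            Y = solve_rows(P.T, C.T, X, lam)    # fix X, solve for Y row by row
        return X, Y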

Page 11: Big, Practical Recommendations with Alternating Least Squares

ALS ALGORITHM 5

- Define regularization weights c_ui = 1 + α·r_ui
- Minimize: Σ c_ui (p_ui − x_uᵀ y_i)² + λ (Σ ||x_u||² + Σ ||y_i||²)
- This is a simple least-squares regression objective, plus:
  - squared-error terms weighted by strength, so the penalty for not reconstructing 1 at a "strong" association is higher
  - a standard L2 regularization term
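For reference, a direct dense transcription of this objective in the same sketch style; the function name is hypothetical:

    def loss(P, C, X, Y, lam):
        # sum of c_ui * (p_ui - x_u . y_i)^2 plus L2 penalties on X and Y
        err = P - X @ Y.T
        return np.sum(C * err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))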

Page 12: Big, Practical Recommendations with Alternating Least Squares

ALS ALGORITHM 6

- With Y fixed, compute the optimal X; each row x_u is independent
- Define C_u as the diagonal matrix of c_u (the user's strength weights)
- x_u = (Yᵀ C_u Y + λI)⁻¹ Yᵀ C_u p_u
- Compare to the simple least-squares regression solution (YᵀY)⁻¹ Yᵀ p_u: this adds the Tikhonov / ridge regression regularization term λI, and attaches the c_u weights to Yᵀ
- See the paper for how Yᵀ C_u Y is computed efficiently; skipping the engineering!
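This per-row solve is the solve_rows helper used in the loop above. A dense sketch; np.linalg.solve is used rather than forming the explicit inverse, and the paper's trick for sharing work across users is skipped, as the slide says:

    def solve_rows(P, C, Y, lam):
        # for each row u: x_u = (Yt Cu Y + lam*I)^-1 Yt Cu p_u, with Cu = diag(c_u)
        m, k = P.shape[0], Y.shape[1]
        X = np.empty((m, k))
        I = np.eye(k)
        for u in range(m):
            Cu = np.diag(C[u])
            A = Y.T @ Cu @ Y + lam * I
            b = Y.T @ (C[u] * P[u])            # equals Yt Cu p_u
            X[u] = np.linalg.solve(A, b)
        return X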

Page 13: Big, Practical Recommendations with Alternating Least Squares

EXAMPLE FACTORIZATION

k = 3, λ = 2, α = 40, 10 iterations

P =
1 1 1 0 0
0 0 1 0 0
0 1 0 1 1
1 0 1 0 1
0 0 0 1 0
1 1 0 0 0

Q = X·Yᵀ =
0.96  0.99  0.99  0.38  0.93
0.44  0.39  0.98 -0.11  0.39
0.70  0.99  0.42  0.98  0.98
1.00  1.04  0.99  0.44  0.98
0.11  0.51 -0.13  1.00  0.57
0.97  1.00  0.68  0.47  0.91
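Using the sketches above, this example can be reproduced approximately; exact values depend on the random initialization, so the output will not match the slide digit for digit:

    R = np.array([[1, 4, 3, 0, 0],
                  [0, 0, 3, 0, 0],
                  [0, 4, 0, 3, 2],
                  [5, 0, 2, 0, 3],
                  [0, 0, 0, 5, 0],
                  [2, 4, 0, 0, 0]], dtype=float)
    P = (R > 0).astype(float)
    C = 1.0 + 40.0 * R                  # c_ui = 1 + alpha * r_ui, alpha = 40
    X, Y = als(P, C, k=3, lam=2.0, iters=10)
    Q = X @ Y.T                         # recommend the largest new values in Q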

Page 14: Big, Practical Recommendations with Alternating Least Squares

FOLD-IN

- Need immediate, if approximate, updates for new data
- A new user u needs a new row q_u = x_u Yᵀ in Q
- We have p_u ≈ q_u
- Compute x_u via a right inverse: X Yᵀ (Yᵀ)⁻¹ = Q (Yᵀ)⁻¹, so X = Q (Yᵀ)⁻¹
- What is (Yᵀ)⁻¹? Note that (YᵀY)(YᵀY)⁻¹ = I, which gives Yᵀ's right inverse: Yᵀ (Y (YᵀY)⁻¹) = I
- So x_u = q_u Y (YᵀY)⁻¹, and therefore x_u ≈ p_u Y (YᵀY)⁻¹
- Recommend as usual: q_u = x_u Yᵀ
- For an existing user, instead add the result to the existing row x_u
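A minimal sketch of fold-in in the same style; fold_in is a hypothetical name, and np.linalg.solve again stands in for the explicit inverse (valid since YᵀY is symmetric):

    def fold_in(p_u, Y):
        # x_u ~= p_u Y (Yt Y)^-1 : project the new user's row into feature space
        return np.linalg.solve(Y.T @ Y, Y.T @ p_u)

    # e.g. a brand-new user who interacted with the items at indices 1 and 4
    p_new = np.array([0.0, 1.0, 0.0, 0.0, 1.0])
    x_new = fold_in(p_new, Y)
    q_new = x_new @ Y.T                 # recommend as usual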

Page 15: Big, Practical Recommendations with Alternating Least Squares

THIS IS MYRRIX

- Soft-launched
- Serving Layer available as an open source download
- Computation Layer available as a beta
- Ready on Amazon EC2 / EMR
- Full launch Q4 2012

myrrix.com
[email protected]

Page 16: Big, Practical Recommendations with Alternating Least Squares

APPENDIX

Page 17: Big, Practical Recommendations with Alternating Least Squares

EXAMPLES

STACKOVERFLOW TAGS
- Recommend tags to questions
- Tag questions automatically, improve tag coverage
- 3.5M questions x 30K tags
- 4.3 hours x 5 machines on Amazon EMR
- $3.03 ≈ $0.08 per 100,000 recs

WIKIPEDIA LINKS
- Recommend new linked articles from existing links
- Propose missing, related links
- 2.5M articles x 1.8M articles
- 28 hours x 2 PCs on Apache Hadoop 1.0.3