Scaling Recommendations at Quora (RecSys talk 9/16/2016)

Scaling Recommendations at Quora Nikhil Dandekar @nikhilbd 9/16/2016

Upload: nikhil-dandekar

Post on 17-Jan-2017


Page 1: Scaling Recommendations at Quora (RecSys talk 9/16/2016)

Scaling Recommendations at Quora

Nikhil Dandekar @nikhilbd

9/16/2016


Quora’s Mission

“To share and grow the world’s knowledge”

● Millions of questions & answers

● Millions of users

● Over a million topics

● Growing exponentially...


Lots of high-quality textual information


Lots of data relations


Agenda

● Scaling the home page feed

● Scaling the Machine Learning environment

● Pragmatism, aka don’t chase every new, shiny object


Scaling the Home Page Feed


Recommendations at Quora

● Home feed

● Digest emails

● Topics to follow

● Users to follow

● Related Questions

● Related Topics (topic → topic)

● Trending topics

● ...


Home feed

● Goal: personalized, engaging experience for reading/writing

● Show a ranked list of stories (questions/answers)

● ML model predicts an interestingness score for each story

● Training data:

○ impression logs from the past

○ x: features about user/story/interactions

○ y: score based on actions (answer/follow, upvote/click)
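A minimal sketch of how (x, y) training pairs could be built from past impression logs. The action weights, the feature names, and the choice to label each impression with its strongest action are all assumptions for illustration; the slides only say that y is a score based on actions.

```python
from dataclasses import dataclass

# Hypothetical action weights: the slides only say y is a score based on actions.
ACTION_WEIGHTS = {"answer": 1.0, "follow": 0.8, "upvote": 0.5, "click": 0.2}


@dataclass
class Impression:
    user_id: int
    story_id: int
    features: dict   # x: features about user/story/interactions
    actions: tuple   # actions the user took on this impression (may be empty)


def label(actions):
    """y: here, the weight of the strongest action taken (0.0 if none)."""
    return max((ACTION_WEIGHTS.get(a, 0.0) for a in actions), default=0.0)


def build_training_set(impressions):
    """Turn past impression logs into (x, y) pairs."""
    return [(imp.features, label(imp.actions)) for imp in impressions]


logs = [
    Impression(1, 10, {"follows_topic": 1, "story_age_hours": 2.0}, ("click", "upvote")),
    Impression(1, 11, {"follows_topic": 0, "story_age_hours": 30.0}, ()),
]
training_set = build_training_set(logs)
```

Impressions with no action get a label of 0.0, so the model also learns from stories that were shown and ignored.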


What is interestingness?

(Figure: user actions shown on the slide: click, upvote, downvote, expand, share, answer, pass, follow)


Performance and Cost

(Diagram: personalized ranking from millions of questions and answers to the best 20 questions and answers, x millions of users)

Scaling challenge:

● Content growing exponentially

○ Time spent per ranking request growing exponentially

● Users growing exponentially

○ Number of ranking requests growing exponentially

● Computational resources spent on ranking growing quadratically with respect to user growth
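One way to read the quadratic claim, written out as a short derivation. It assumes that every request scores every candidate and that the amount of content C(t) grows in proportion to the number of users U(t); both are assumptions, not statements from the slides.

```latex
\begin{aligned}
\text{cost per request} &\propto C(t), \qquad \text{number of requests} \propto U(t) \\
\text{total ranking cost} &\propto C(t)\,U(t) \;\propto\; U(t)^2 \quad \text{if } C(t) \propto U(t)
\end{aligned}
```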


Performance and Cost

● Solution: Multi-phase ranking!

● Use an unpersonalized model to reduce the number of candidates for the personalized model

● Cache the computed score in storage

(Diagram: millions of questions and answers → unpersonalized ranking (1p) → thousands of questions and answers → personalized ranking (2p) → the best 20 questions and answers, x millions of users)


Feed backend system

(Diagram: requests from Web (python) go to Aggregators 1, 2, 3, ..., which sit above Leaves 1, 2, 3, ...; the labels user_id and object_id also appear in the diagram)


Scaling the Machine Learning Environment


Machine Learning environment

ML applications

● Feed / digest

● Search

● Answer ranking / Answer collapsing

● User-user, user-topic recommendations

● Related questions

● Duplicate questions

● Question-topics

● Question quality

● Spam users / content

● ...and a lot more

ML Models

● Logistic Regression

● Gradient Boosted Decision Trees

● LambdaMART

● Random Forest

● Matrix Factorization

● Deep Neural Networks

● LDA

● k-means

● k-NNs

● ...and others


Machine Learning environment

● Productionizing ML training

○ Continuous retraining of models to adapt to new data

○ Use Luigi to keep track of task dependencies
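To illustrate what tracking task dependencies buys you, here is a small sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Luigi's API; Luigi has its own task classes for this. The task names and the shape of the graph are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "populate_data": set(),
    "train_model_1": {"populate_data"},
    "train_model_2": {"train_model_1"},
    "train_model_3": {"train_model_1"},
}


def run_pipeline(deps, run_task):
    """Run every task after all of its dependencies; return the execution order."""
    order = list(TopologicalSorter(deps).static_order())
    for task in order:
        run_task(task)
    return order


completed = []
order = run_pipeline(dependencies, completed.append)
```

With the dependencies declared once, a retraining run always rebuilds upstream data and models before anything that consumes them.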


Machine Learning environment

● Use Amazon EC2 spot instances for training tasks

○ Usually much cheaper than on-demand price

○ Can spawn multiple boxes at once and shut them down after training is complete


Machine Learning environment

● Extremely important to have automatic monitoring of each task’s input/output

○ Data can change in unexpected ways

○ Don’t want bugs in upstream models to affect downstream models

(Diagram: task pipeline with Data populator, Training model 1, Training model 2, Training model 3)


Machine Learning environment

(Diagram: the task pipeline with checks added: Verify data (counts, class proportions, ...) and Verify metrics (MSE, R2, AUC, ...))
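A rough sketch of what automatic checks on a task's input and output might look like: a data check on counts and class proportions before training, and a metric check on AUC after it. The thresholds, function names, and sample values are hypothetical.

```python
def verify_data(labels, min_rows, expected_positive_rate, tolerance=0.1):
    """Check row count and class proportion; return a list of problems found."""
    problems = []
    if len(labels) < min_rows:
        problems.append(f"too few rows: {len(labels)} < {min_rows}")
    if labels:
        rate = sum(labels) / len(labels)
        if abs(rate - expected_positive_rate) > tolerance:
            problems.append(f"positive rate {rate:.2f} outside expected range")
    return problems


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def verify_metrics(labels, scores, min_auc):
    """Flag a trained model whose AUC fell below the expected floor."""
    value = auc(labels, scores)
    return [] if value >= min_auc else [f"AUC {value:.3f} below {min_auc}"]


labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.6]
data_problems = verify_data(labels, min_rows=5, expected_positive_rate=0.5)
metric_problems = verify_metrics(labels, scores, min_auc=0.7)
```

A pipeline could refuse to publish a model whenever either list is non-empty, which keeps an upstream problem from reaching downstream models.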


Machine Learning platform goals

● Need an ML platform that is

○ Easy to ramp up on

○ Easy to iterate on

○ Fast

○ Reliable

○ Reusable

○ Production-ready


Machine Learning platform

● Have a centralized ML platform that is shared across teams

○ Write training scripts in C++/Python and run them on remote boxes

○ Provide Python wrappers with IPython integration

○ Store data on Redshift/S3 and have training boxes communicate with them directly

(Diagram: Dev laptop, Training boxes (CPU/GPU), Storage services (Redshift, S3…))


Lego ML platform

● In an IPython notebook


Lego ML platform


Alchemy Feature Engineering Framework

● Single way to define and add ML features

● Features are reusable

○ Different ML applications do not define / calculate them separately

● Available both offline (training time) and online (prediction time)

● Single point for logging, monitoring, documentation etc.
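The slides do not show Alchemy's API, so the following is only a generic sketch of the idea of defining a feature once and reusing it offline and online. The registry, the decorator, and the example features are all hypothetical.

```python
FEATURES = {}  # single registry: feature name -> function


def feature(name):
    """Decorator that registers a feature function once, under one name."""
    def register(fn):
        if name in FEATURES:
            raise ValueError(f"feature {name!r} already defined")
        FEATURES[name] = fn
        return fn
    return register


@feature("user_follows_topic")
def user_follows_topic(user, story):
    return 1.0 if story["topic"] in user["topics"] else 0.0


@feature("answer_length")
def answer_length(user, story):
    return float(len(story["text"].split()))


def compute_features(names, user, story):
    """Same code path for offline training rows and online prediction requests."""
    return {name: FEATURES[name](user, story) for name in names}


user = {"topics": {"ml"}}
story = {"topic": "ml", "text": "Use a simple model first"}
row = compute_features(["user_follows_topic", "answer_length"], user, story)
```

Routing every lookup through one function also gives a natural place to add logging and monitoring for all features at once.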


Pragmatism


What all matters for your ML algorithm:

● Relevance

● Speed: Fast prediction, (relatively) fast training

● Fast development and iteration time

● Reliability / Robustness

● Cost

● Debuggability

● Low technical debt


Occam’s razor for Machine Learning

● Given two models that perform more or less equally, you should always prefer the less complex

● E.g. A Deep Learning model:

○ +1% in accuracy

○ 10x training time

○ 1.5x prediction time

○ Costly to store and maintain

● Look at all the factors, not just relevance
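As a sketch of weighing all the factors together, the rule below keeps the simpler model unless the complex one clears every bar. The +1% accuracy, 10x training time, and 1.5x prediction time come from the slide; the baseline accuracy, the model names, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float      # offline relevance metric
    train_time: float    # relative to the baseline (1.0 = same)
    predict_time: float  # relative to the baseline (1.0 = same)


def pick_model(simple, complex_, min_gain=0.02,
               max_train_slowdown=5.0, max_predict_slowdown=1.2):
    """Prefer the simpler model unless the complex one clears every bar.

    The thresholds are hypothetical; each team would set its own.
    """
    gain = complex_.accuracy - simple.accuracy
    train_slowdown = complex_.train_time / simple.train_time
    predict_slowdown = complex_.predict_time / simple.predict_time
    if (gain >= min_gain
            and train_slowdown <= max_train_slowdown
            and predict_slowdown <= max_predict_slowdown):
        return complex_
    return simple


# Baseline accuracy of 0.80 is made up; the deltas mirror the slide's example.
baseline = Candidate("gbdt", accuracy=0.80, train_time=1.0, predict_time=1.0)
deep = Candidate("deep", accuracy=0.81, train_time=10.0, predict_time=1.5)
chosen = pick_model(baseline, deep)
```

Under these assumed thresholds the one-point accuracy gain does not justify the extra training and prediction cost, so the simpler model is kept.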


Distributing ML training

● Distributed ML training helps you scale with data

● But most of what people do in practice can fit into a single, multi-core machine

● Trade-offs:

○ Relevance gains

○ Training speed

○ Development and iteration time

○ Costs

● Use what works best given these factors, with an eye out for the future


In summary

● Figure out how to scale up your data and your models

● But scaling is not just about data and the models

○ Think about your ML environment too

● Be pragmatic

○ Don’t chase every new, shiny object


Thanks!