scaling recommendations at quora (recsys talk 9/16/2016)

Scaling Recommendations at Quora

Nikhil Dandekar @nikhilbd

9/16/2016

Quora’s Mission

“To share and grow the world’s knowledge”

● Millions of questions & answers

● Millions of users

● Over a million topics

● Growing exponentially...

Lots of high-quality textual information

Lots of data relations

Agenda

● Scaling the home page feed

● Scaling the Machine Learning environment

● Pragmatism: aka don’t chase every new, shiny object

Scaling the Home Page Feed

Recommendations at Quora

● Home feed

● Digest emails

● Topics to follow

● Users to follow

● Related Questions

● Related Topics (topic → topic)

● Trending topics

● ...

Home feed

● Goal: personalized, engaging experience for reading/writing

● Show a ranked list of stories (questions/answers)

● ML model predicts an interestingness score for each story

● Training data:

○ impression logs from the past

○ x: features about user/story/interactions

○ y: score based on actions (answer/follow, upvote/click)
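
As a rough illustration of how one logged impression could become an (x, y) training pair, here is a minimal sketch. It is an assumption, not Quora's actual pipeline: the action weights, the field names, and the `make_example` helper are all hypothetical, since the talk only says that y is a score based on actions.

```python
from dataclasses import dataclass, field

# Hypothetical action weights; the slides do not give the real values.
ACTION_WEIGHTS = {
    "answer": 5.0,
    "follow": 3.0,
    "upvote": 2.0,
    "click": 1.0,
    "downvote": -3.0,
}


@dataclass
class Impression:
    user_id: int
    story_id: int
    features: dict  # x: features about user/story/interactions
    actions: list = field(default_factory=list)  # actions logged after the impression


def interestingness_label(actions):
    """y: sum the (assumed) weights of the actions taken on the story."""
    return sum(ACTION_WEIGHTS.get(a, 0.0) for a in actions)


def make_example(imp):
    """Turn one logged impression into an (x, y) training pair."""
    return imp.features, interestingness_label(imp.actions)


log = [
    Impression(1, 10, {"topic_match": 0.9, "story_age_hours": 2.0}, ["click", "upvote"]),
    Impression(1, 11, {"topic_match": 0.1, "story_age_hours": 30.0}, []),
    Impression(2, 10, {"topic_match": 0.4, "story_age_hours": 2.0}, ["downvote"]),
]
training_set = [make_example(imp) for imp in log]
labels = [y for _, y in training_set]
print(labels)
```

An impression with no actions gets a label of zero here, which keeps ignored stories in the training set as weak negatives; whether the real system does this is not stated in the talk.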

What is interestingness?

[Figure: feed story annotated with the user actions that signal interestingness: click, upvote, downvote, expand, share, answer, pass, follow]

Performance and Cost

[Figure: Millions of questions and answers → Personalized Ranking → The best 20 questions and answers (x millions of users)]

Scaling challenge:

● Content growing exponentially

○ Time spent per ranking request growing exponentially

● Users growing exponentially

○ Number of ranking requests growing exponentially

● Computational resources spent on ranking growing quadratically with respect to user growth
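
The quadratic claim follows from multiplying the two growth terms: if the number of candidates scored per request grows in proportion to the user base, and the number of requests does too, then total ranking work grows with the square of the user base. A toy calculation, where the constants `requests_per_user` and `stories_per_user` are made up purely for illustration:

```python
def ranking_cost(num_users, requests_per_user=10, stories_per_user=5):
    """Total scoring operations under a naive single-phase ranker.

    Assumes (hypothetically) that content grows linearly with users and
    that every request scores every candidate story.
    """
    num_requests = num_users * requests_per_user
    num_candidates = num_users * stories_per_user
    return num_requests * num_candidates


base = ranking_cost(1_000_000)
doubled = ranking_cost(2_000_000)
ratio = doubled / base
print(ratio)
```

Doubling the user base quadruples the work under these assumptions, which is why scoring every story for every request does not stay affordable.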

● Solution: Multi-phase ranking!

● Use an unpersonalized model to reduce the number of candidates for the personalized model

● Cache the computed score in storage
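
The two phases can be sketched as follows. This is a minimal illustration under stated assumptions, not the production system: the scoring functions, the story and user fields, and the `TwoPhaseRanker` class are hypothetical, and an in-memory dict stands in for the score storage mentioned above.

```python
import heapq


def first_phase_score(story):
    """Unpersonalized score: depends on the story only, so it can be cached."""
    return story["quality"]


def second_phase_score(user, story):
    """Personalized score: depends on both the user and the story."""
    affinity = user["topic_affinity"].get(story["topic"], 0.0)
    return story["quality"] * (1.0 + affinity)


class TwoPhaseRanker:
    def __init__(self, first_phase_size, feed_size):
        self.first_phase_size = first_phase_size
        self.feed_size = feed_size
        self.score_cache = {}  # story id -> cached unpersonalized score

    def cached_first_phase_score(self, story):
        sid = story["id"]
        if sid not in self.score_cache:
            self.score_cache[sid] = first_phase_score(story)
        return self.score_cache[sid]

    def rank(self, user, stories):
        # Phase 1: cheap and shared across users; keeps only the top candidates.
        candidates = heapq.nlargest(
            self.first_phase_size, stories, key=self.cached_first_phase_score
        )
        # Phase 2: expensive and per-user; runs on the reduced candidate set only.
        ranked = sorted(
            candidates, key=lambda s: second_phase_score(user, s), reverse=True
        )
        return [s["id"] for s in ranked[: self.feed_size]]


stories = [
    {"id": "a", "topic": "ml", "quality": 0.9},
    {"id": "b", "topic": "cooking", "quality": 0.8},
    {"id": "c", "topic": "ml", "quality": 0.7},
    {"id": "d", "topic": "travel", "quality": 0.1},
]
user = {"topic_affinity": {"cooking": 1.0}}
ranker = TwoPhaseRanker(first_phase_size=3, feed_size=2)
feed = ranker.rank(user, stories)
print(feed)
```

Because the first-phase score does not depend on the user, a later request from a different user reuses the cached values and only pays for the second phase.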

Performance and Cost

[Figure: Millions of questions and answers → Unpersonalized ranking (1p) → Thousands of questions and answers → Personalized ranking (2p) → The best 20 questions and answers (x millions of users)]

Feed backend system

[Figure: Requests from Web (python) → Aggregators (Aggregator 1, Aggregator 2, Aggregator 3, ...) → Leaves (Leaf 1, Leaf 2, Leaf 3, ...); labels: user_id, object_id]

Scaling the Machine Learning Environment

ML applications

● Feed / digest

● Search

● Answer ranking / Answer collapsing

● User-user, user-topic recommendations

● Related questions

● Duplicate questions

● Question-topics

● Question quality

● Spam users / content

● ...and a lot more

Machine Learning environment

ML Models

● Logistic Regression

● Gradient Boosted Decision Trees

● LambdaMART

● Random Forest

● Matrix Factorization

● Deep Neural Networks

● LDA

● k-means

● k-NNs

● ...and others

● Productionizing ML training

○ Continuous retraining of models to adapt to new data

○ Use Luigi to keep track of task dependencies


https://github.com/spotify/luigi
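
To illustrate what tracking task dependencies buys in a retraining pipeline, here is a small sketch. It deliberately does not use Luigi's API; it uses Python's standard `graphlib` module to show the underlying idea of running tasks in dependency order and skipping those whose output already exists. The task names and the shape of the graph are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical retraining pipeline: each task maps to the tasks it depends on.
DEPENDENCIES = {
    "populate_data": set(),
    "train_model_1": {"populate_data"},
    "train_model_2": {"train_model_1"},
    "train_model_3": {"train_model_1"},
}


def run_pipeline(dependencies, completed):
    """Run every task that is not yet completed, in dependency order."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        if task in completed:
            continue  # output already exists, so the task is skipped
        executed.append(task)  # a real task would train and write its output here
        completed.add(task)
    return executed


first_run = run_pipeline(DEPENDENCIES, completed=set())
second_run = run_pipeline(DEPENDENCIES, completed=set(first_run))
print(first_run)
print(second_run)
```

The second run does nothing because every output is already present; a scheduler built on this idea only redoes work when an upstream output is missing.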


● Use Amazon EC2 spot instances for training tasks

○ Usually much cheaper than on-demand price

○ Can spawn multiple boxes at once and shut them down after training is complete





● Extremely important to have automatic monitoring of each task’s input/output

○ Data can change in unexpected ways

○ Don’t want bugs in upstream models to affect downstream models




[Figure: task pipeline with Data populator, Training model 1, Training model 2 and Training model 3, plus checks: Verify data (counts, class proportions, ...) and Verify metrics (MSE, R2, AUC, ...)]
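
The two kinds of checks named in the figure could look roughly like this. It is a sketch under assumptions: the thresholds, the function names, and the choice of AUC as the gating metric are illustrative, and the talk does not describe how the actual checks are implemented.

```python
def verify_data(labels, min_rows, expected_positive_rate, tolerance):
    """Check row count and class proportions before training starts."""
    problems = []
    if len(labels) < min_rows:
        problems.append("row count below minimum")
    positive_rate = sum(labels) / len(labels) if labels else 0.0
    if abs(positive_rate - expected_positive_rate) > tolerance:
        problems.append("class proportions drifted")
    return problems


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def verify_metrics(labels, scores, min_auc):
    """Check the trained model's metric before downstream tasks consume it."""
    value = auc(labels, scores)
    return [] if value >= min_auc else [f"AUC {value:.3f} below {min_auc}"]


labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.35, 0.05]

data_problems = verify_data(labels, min_rows=5, expected_positive_rate=0.4, tolerance=0.1)
metric_problems = verify_metrics(labels, scores, min_auc=0.8)
# Simulate an upstream bug that drops every positive example.
broken_upstream = verify_data([0] * 6, min_rows=5, expected_positive_rate=0.4, tolerance=0.1)
print(data_problems, metric_problems, broken_upstream)
```

A pipeline would stop, or alert, when either list is non-empty, so that a bad upstream output never reaches the downstream models.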

Machine Learning platform goals

● Need an ML platform that is

○ Easy to ramp up on

○ Easy to iterate on

○ Fast

○ Reliable

○ Reusable

○ Production-ready

Machine Learning platform

● Have a centralized ML platform that is shared across teams

○ Write training scripts in C++/Python and run them on remote boxes

○ Provide Python wrappers with IPython integration

○ Store data on Redshift/S3 and have training boxes communicate with them directly

[Figure: Dev laptop, Training boxes (CPU/GPU), Storage services (Redshift, S3…)]

Lego ML platform

● In an IPython notebook

Alchemy Feature Engineering Framework

● Single way to define and add ML features

● Features are reusable

○ Different ML applications do not define / calculate them separately

● Available both offline (training time) and online (prediction time)

● Single point for logging, monitoring, documentation etc.
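
A feature framework with these properties can be sketched as a single registry that every application reads from. Alchemy's real API is not shown in the talk, so everything below is hypothetical: the `feature` decorator, the `FEATURES` registry, and the example features exist only to illustrate the idea of defining a feature once and reusing it offline and online.

```python
FEATURES = {}  # single registry: feature name -> (function, documentation)


def feature(name, doc):
    """Register a feature once so every ML application reuses the same definition."""
    def decorator(fn):
        if name in FEATURES:
            raise ValueError(f"feature {name!r} is already defined")
        FEATURES[name] = (fn, doc)
        return fn
    return decorator


@feature("answer_count", doc="Number of answers on the question")
def answer_count(user, story):
    return float(len(story["answers"]))


@feature("follows_topic", doc="1.0 if the user follows the story's topic")
def follows_topic(user, story):
    return 1.0 if story["topic"] in user["followed_topics"] else 0.0


def compute_features(names, user, story):
    """Called by both offline training jobs and online prediction code."""
    return {name: FEATURES[name][0](user, story) for name in names}


user = {"followed_topics": {"ml"}}
story = {"topic": "ml", "answers": ["a1", "a2", "a3"]}

# Two applications request overlapping features without redefining them.
feed_features = compute_features(["answer_count", "follows_topic"], user, story)
digest_features = compute_features(["follows_topic"], user, story)
print(feed_features, digest_features)
```

Routing training and prediction through the same `compute_features` call is one way to keep offline and online feature values from drifting apart, and the registry gives a single place to attach logging, monitoring, and documentation.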

Pragmatism

What matters for your ML algorithm:

● Relevance

● Speed: Fast prediction, (relatively) fast training

● Fast development and iteration time

● Reliability / Robustness

● Cost

● Debuggability

● Low technical debt

Occam’s razor for Machine Learning

● Given two models that perform more or less equally, you should always prefer the less complex one

● E.g. A Deep Learning model:

○ +1% in accuracy

○ 10x training time

○ 1.5x prediction time

○ Costly to store and maintain

● Look at all the factors, not just relevance
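
One way to make this rule concrete is to require a minimum relevance gain before accepting a costlier model. The sketch below is an illustration only: the `Candidate` fields, the `min_gain` threshold, and the numbers (chosen to mirror the +1% / 10x / 1.5x example above) are assumptions, not a procedure described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float  # offline relevance metric
    training_hours: float
    prediction_ms: float


def pick_model(candidates, min_gain=0.02):
    """Prefer the cheaper model unless a costlier one wins by at least min_gain."""
    # Order candidates by training time, then by prediction latency.
    by_cost = sorted(candidates, key=lambda c: (c.training_hours, c.prediction_ms))
    chosen = by_cost[0]
    for c in by_cost[1:]:
        if c.accuracy - chosen.accuracy >= min_gain:
            chosen = c
    return chosen


gbdt = Candidate("gbdt", accuracy=0.80, training_hours=1.0, prediction_ms=10.0)
deep = Candidate("deep_net", accuracy=0.808, training_hours=10.0, prediction_ms=15.0)
chosen = pick_model([deep, gbdt])
print(chosen.name)
```

With the default threshold the simpler model is kept; lowering `min_gain` to 0.005 would select the deep network instead, which makes the trade-off an explicit, reviewable setting.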

Distributing ML training

● Distributed ML training helps you scale with data

● But most of what people do in practice can fit into a single, multi-core machine

● Trade-offs:

○ Relevance gains

○ Training speed

○ Development and iteration time

○ Costs

● Use what works best given these factors, with an eye out for the future

In summary

● Figure out how to scale up your data and your models

● But scaling is not just about data and the models

○ Think about your ML environment too

● Be pragmatic

○ Don’t chase every new, shiny object

We are hiring!

● https://www.quora.com/careers

● Technical Lead - Machine Learning: https://www.quora.com/careers/technical_lead_machine_learning

● Software Engineer - Machine Learning: https://www.quora.com/careers/software_engineer_machine_learning

● Software Engineer - NLP: https://www.quora.com/careers/software_engineer_nlp

● Engineering Manager - Machine Learning: https://www.quora.com/careers/engineering_manager_machine_learning

Thanks!