Scaling Recommendations at Quora (RecSys talk 9/16/2016)
![Page 1: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/1.jpg)
Scaling Recommendations at Quora
Nikhil Dandekar @nikhilbd
9/16/2016
![Page 2: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/2.jpg)
Quora’s Mission
“To share and grow the world’s knowledge”
● Millions of questions & answers
● Millions of users
● Over a million topics
● Growing exponentially...
![Page 3: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/3.jpg)
Lots of high-quality textual information
![Page 4: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/4.jpg)
Lots of data relations
![Page 5: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/5.jpg)
Agenda
● Scaling the home page feed
● Scaling the Machine Learning environment
● Pragmatism: aka don’t chase every new, shiny object
![Page 6: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/6.jpg)
Scaling the Home Page Feed
![Page 7: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/7.jpg)
Recommendations at Quora
● Home feed
● Digest emails
● Topics to follow
● Users to follow
● Related Questions
● Related Topics (topic → topic)
● Trending topics
● …..
![Page 8: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/8.jpg)
![Page 9: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/9.jpg)
Home feed
● Goal: personalized, engaging experience for reading/writing
● Show a ranked list of stories (questions/answers)
● ML model predicts an interestingness score for each story
● Training data:
○ impression logs from the past
○ x: features about user/story/interactions
○ y: score based on actions (answer/follow, upvote/click)
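A minimal sketch of how the label y could be derived from logged actions. The action weights and the `Impression` record are illustrative assumptions, not values or types from the talk.

```python
from dataclasses import dataclass, field

# Hypothetical action weights: assumptions for illustration only.
ACTION_WEIGHTS = {
    "answer": 5.0,
    "follow": 3.0,
    "upvote": 2.0,
    "click": 1.0,
    "downvote": -2.0,
}

@dataclass
class Impression:
    """One logged feed impression: features x plus the actions that followed."""
    features: dict
    actions: list = field(default_factory=list)

def label(impression):
    """Score y = sum of the weights of the actions taken on the story."""
    return sum(ACTION_WEIGHTS.get(a, 0.0) for a in impression.actions)

def to_training_rows(log):
    """Turn impression logs into (x, y) pairs for a supervised model."""
    return [(imp.features, label(imp)) for imp in log]

log = [
    Impression({"user_follows_topic": 1, "story_age_hours": 2.0}, ["click", "upvote"]),
    Impression({"user_follows_topic": 0, "story_age_hours": 30.0}, []),
]
rows = to_training_rows(log)
```

An impression with no actions gets y = 0, so ignored stories act as negative examples.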
![Page 10: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/10.jpg)
What is interestingness?
[Figure: a feed story annotated with the user actions that signal interest or disinterest: click, upvote, downvote, expand, share, answer, pass, follow]
![Page 11: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/11.jpg)
Performance and Cost
[Diagram: Millions of questions and answers → Personalized Ranking → The best 20 questions and answers (x millions of users)]
Scaling challenge:
● Content growing exponentially
○ Time spent per ranking request growing exponentially
● Users growing exponentially
○ Number of ranking requests growing exponentially
● Computational resources spent on ranking growing quadratically with respect to user growth
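A short worked version of the quadratic claim. It assumes, beyond what the slide states, that work per request is proportional to the amount of content C(t), that the number of requests is proportional to the number of users U(t), and that both grow at the same rate r.

```latex
\begin{gathered}
\text{cost}(t) \;\propto\; C(t)\cdot U(t) \\
C(t) = C_0 e^{rt},\qquad U(t) = U_0 e^{rt} \\
\text{cost}(t) \;\propto\; C_0 U_0\, e^{2rt} \;=\; \frac{C_0}{U_0}\, U(t)^2
\end{gathered}
```

Under these assumptions, doubling the user base roughly quadruples the ranking cost.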
![Page 12: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/12.jpg)
Performance and Cost
● Solution: Multi-phase ranking!
● Use an unpersonalized model to reduce the number of candidates for the personalized model
● Cache the computed score in storage
[Diagram: Millions of questions and answers → Unpersonalized ranking (1p) → Thousands of questions and answers → Personalized ranking (2p) → The best 20 questions and answers (x millions of users)]
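A minimal sketch of two-phase ranking with a score cache. The function names, the in-memory `dict` cache, and the toy scoring functions are assumptions for illustration; the talk does not describe the actual models or storage.

```python
import heapq

def first_phase(stories, unpersonalized_score, cache, keep):
    """Phase 1: score every story with a cheap, user-independent model.

    The score does not depend on the user, so it is computed once and cached.
    """
    for story in stories:
        if story not in cache:
            cache[story] = unpersonalized_score(story)
    return heapq.nlargest(keep, stories, key=cache.__getitem__)

def second_phase(user, candidates, personalized_score, top_n):
    """Phase 2: run the expensive personalized model on the short list only."""
    return heapq.nlargest(top_n, candidates, key=lambda s: personalized_score(user, s))

def rank_feed(user, stories, unpersonalized_score, personalized_score, cache,
              keep=1000, top_n=20):
    candidates = first_phase(stories, unpersonalized_score, cache, keep)
    return second_phase(user, candidates, personalized_score, top_n)

# Toy usage: stories are integers, "popularity" is the story id itself,
# and the personalized score is closeness to the user's taste value.
stories = list(range(10))
cache = {}
feed = rank_feed(
    user=6.4,
    stories=stories,
    unpersonalized_score=lambda s: float(s),
    personalized_score=lambda u, s: -abs(s - u),
    cache=cache,
    keep=5,
    top_n=2,
)
```

The personalized model runs on `keep` candidates instead of the full corpus, at the price of possibly pruning a story that only this user would have liked.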
![Page 13: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/13.jpg)
Feed backend system
[Diagram: Requests from Web (python) → Aggregator 1, Aggregator 2, Aggregator 3, ... → Leaf 1, Leaf 2, Leaf 3, ...; labels: user_id, object_id]
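The diagram suggests a scatter-gather layout. The sketch below assumes that each leaf holds a shard of stories partitioned by object_id and that an aggregator merges the leaves' partial results; that reading of the labels, and every class and function name here, is an assumption rather than something the slide states.

```python
import heapq

class Leaf:
    """Holds one shard of stories and returns its local top-k for a user."""
    def __init__(self, stories):
        self.stories = stories  # {object_id: base_score}

    def top_k(self, user_id, k, score):
        scored = ((score(user_id, oid, base), oid) for oid, base in self.stories.items())
        return heapq.nlargest(k, scored)

class Aggregator:
    """Fans a request out to every leaf and merges the partial results."""
    def __init__(self, leaves):
        self.leaves = leaves

    def feed(self, user_id, k, score):
        partial = [hit for leaf in self.leaves for hit in leaf.top_k(user_id, k, score)]
        return [oid for _, oid in heapq.nlargest(k, partial)]

def shard_by_object_id(stories, num_leaves):
    """Assumed partitioning: object_id modulo the number of leaves."""
    shards = [{} for _ in range(num_leaves)]
    for oid, base in stories.items():
        shards[oid % num_leaves][oid] = base
    return [Leaf(shard) for shard in shards]

# Toy data and a toy scoring function (hypothetical).
stories = {oid: float(oid % 7) for oid in range(30)}
aggregator = Aggregator(shard_by_object_id(stories, num_leaves=3))
score = lambda user_id, oid, base: base + (1.0 if oid % 5 == user_id % 5 else 0.0)
top = aggregator.feed(user_id=2, k=3, score=score)
```

Because every leaf returns its own top k, the merged top k equals the global top k, so leaves can be added as content grows without changing the result.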
![Page 14: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/14.jpg)
Scaling the Machine Learning Environment
![Page 15: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/15.jpg)
Machine Learning environment
ML applications
● Feed / digest
● Search
● Answer ranking / Answer collapsing
● User-user, user-topic recommendations
● Related questions
● Duplicate questions
● Question-topics
● Question quality
● Spam users / content
● ... and a lot more
ML Models
● Logistic Regression
● Gradient Boosted Decision Trees
● LambdaMART
● Random Forest
● Matrix Factorization
● Deep Neural Networks
● LDA
● k-means
● k-NNs
● ... and others
![Page 16: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/16.jpg)
Machine Learning environment
● Productionizing ML training
○ Continuous retraining of models to adapt to new data
○ Use Luigi to keep track of task dependencies
![Page 17: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/17.jpg)
● Use Amazon EC2 spot instances for training tasks
○ Usually much cheaper than the on-demand price
○ Can spawn multiple boxes at once and shut them down after training is complete
![Page 18: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/18.jpg)
● Extremely important to have automatic monitoring of each task’s input/output
○ Data can change in unexpected ways
○ Don’t want bugs in upstream models to affect downstream models
[Diagram: task pipeline with Data populator, Training model 1, Training model 2, Training model 3]
![Page 19: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/19.jpg)
[Diagram: the same task pipeline with checks added between tasks]
● Verify data: counts, class proportions, ...
● Verify metrics: MSE, R2, AUC, ...
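A minimal sketch of a dependency-ordered pipeline with verification gates. It uses the standard library's `graphlib` as a stand-in and is not Luigi's API. The dependency shape (models 2 and 3 downstream of model 1), the thresholds, and the AUC values are placeholders assumed for illustration.

```python
from graphlib import TopologicalSorter

class CheckFailed(Exception):
    """Raised when a task's output fails verification; stops downstream tasks."""

def verify_data(rows, min_count, min_positive_rate):
    """Verify data: row count and class proportion must stay in expected ranges."""
    if len(rows) < min_count:
        raise CheckFailed(f"only {len(rows)} rows")
    positive_rate = sum(y for _, y in rows) / len(rows)
    if positive_rate < min_positive_rate:
        raise CheckFailed(f"positive rate {positive_rate:.2f} too low")

def verify_metric(name, value, floor):
    """Verify metrics: a trained model must clear a floor (e.g. on AUC)."""
    if value < floor:
        raise CheckFailed(f"{name}={value:.3f} below {floor}")

def run_pipeline(tasks, deps):
    """Run tasks in dependency order; a failed check aborts everything after it."""
    done = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # each task runs its own verification and may raise
        done.append(name)
    return done

# Placeholder data and metric values; real tasks would load data and train.
rows = [((0.1,), 1), ((0.9,), 0), ((0.4,), 1), ((0.7,), 0)]
tasks = {
    "data_populator": lambda: verify_data(rows, min_count=4, min_positive_rate=0.25),
    "training_model_1": lambda: verify_metric("auc", 0.81, floor=0.75),
    "training_model_2": lambda: verify_metric("auc", 0.78, floor=0.75),
    "training_model_3": lambda: verify_metric("auc", 0.80, floor=0.75),
}
deps = {  # task -> set of upstream tasks it depends on
    "training_model_1": {"data_populator"},
    "training_model_2": {"training_model_1"},
    "training_model_3": {"training_model_1"},
}
order = run_pipeline(tasks, deps)
```

If the check on an upstream task raises `CheckFailed`, no downstream model is trained on its output, which is the containment the slide asks for.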
![Page 20: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/20.jpg)
Machine Learning platform goals
● Need an ML platform that is
○ Easy to ramp up on
○ Easy to iterate on
○ Fast
○ Reliable
○ Reusable
○ Production-ready
![Page 21: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/21.jpg)
Machine Learning platform
● Have a centralized ML platform that is shared across teams
○ Write training scripts in C++/Python and run them on remote boxes
○ Provide Python wrappers with IPython integration
○ Store data on Redshift/S3 and have training boxes communicate with them directly
[Diagram: Dev laptop, Training boxes (CPU/GPU), Storage services (Redshift, S3…)]
![Page 22: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/22.jpg)
Lego ML platform
● In an IPython notebook
![Page 23: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/23.jpg)
Lego ML platform
![Page 24: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/24.jpg)
Alchemy Feature Engineering Framework
● Single way to define and add ML features
● Features are reusable
○ Different ML applications do not define / calculate them separately
● Available both offline (training time) and online (prediction time)
● Single point for logging, monitoring, documentation etc.
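A minimal sketch of the idea behind a shared feature framework: each feature is defined once in a registry and the same code computes it offline and online. This is not Alchemy's API, which the talk does not show; the registry, decorator, and feature names are hypothetical.

```python
FEATURES = {}

def feature(name, doc):
    """Register a feature once; every application looks it up by name."""
    def register(fn):
        FEATURES[name] = {"fn": fn, "doc": doc, "calls": 0}
        return fn
    return register

@feature("answer_length", doc="Number of words in the answer text")
def answer_length(user, story):
    return len(story["text"].split())

@feature("follows_topic", doc="1 if the user follows the story's topic")
def follows_topic(user, story):
    return int(story["topic"] in user["topics"])

def compute(names, user, story):
    """Same code path offline (building training rows) and online (serving)."""
    row = {}
    for name in names:
        entry = FEATURES[name]
        entry["calls"] += 1  # single place to hook logging and monitoring
        row[name] = entry["fn"](user, story)
    return row

user = {"topics": {"machine-learning"}}
story = {"topic": "machine-learning", "text": "Use multi-phase ranking"}
row = compute(["answer_length", "follows_topic"], user, story)
```

Computing features through one code path at training and prediction time avoids the two drifting apart, and the registry gives a single place for documentation and counters.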
![Page 25: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/25.jpg)
Pragmatism
![Page 26: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/26.jpg)
What all matters for your ML algorithm:
● Relevance
● Speed: Fast prediction, (relatively) fast training
● Fast development and iteration time
● Reliability / Robustness
● Cost
● Debuggability
● Low technical debt
![Page 27: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/27.jpg)
Occam’s razor for Machine Learning
● Given two models that perform more or less equally, you should always prefer the less complex one
● E.g. A Deep Learning model:
○ +1% in accuracy
○ 10x training time
○ 1.5x prediction time
○ Costly to store and maintain
● Look at all the factors, not just relevance
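A minimal sketch of this rule as a selection function: among models whose accuracy is within a tolerance of the best, take the cheapest. The baseline accuracy of 0.80, the reading of "+1%" as one accuracy point, and the cost proxy (training time times prediction time) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float      # offline accuracy (baseline value is hypothetical)
    train_time: float    # multiples of the baseline's training time
    predict_time: float  # multiples of the baseline's prediction time

def prefer_simpler(candidates, tolerance):
    """Return the cheapest candidate whose accuracy is within `tolerance` of the best."""
    best = max(c.accuracy for c in candidates)
    good_enough = [c for c in candidates if best - c.accuracy <= tolerance]
    # Hypothetical cost proxy; a real one would also count storage and upkeep.
    return min(good_enough, key=lambda c: c.train_time * c.predict_time)

candidates = [
    Candidate("baseline", accuracy=0.80, train_time=1.0, predict_time=1.0),
    Candidate("deep", accuracy=0.81, train_time=10.0, predict_time=1.5),
]
choice_loose = prefer_simpler(candidates, tolerance=0.02)
choice_strict = prefer_simpler(candidates, tolerance=0.005)
```

With a two-point tolerance the baseline wins. Only when one point of accuracy is judged worth the extra cost (the tighter tolerance) does the deep model get picked.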
![Page 28: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/28.jpg)
Distributing ML training
● Distributed ML training helps you scale with data
● But most of what people do in practice can fit into a single, multi-core machine
● Trade-offs:
○ Relevance gains
○ Training speed
○ Development and iteration time
○ Costs
● Use what works best given these factors, with an eye out for the future
![Page 29: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/29.jpg)
In summary
● Figure out how to scale up your data and your models
● But scaling is not just about data and the models
○ Think about your ML environment too
● Be Pragmatic
○ Don’t chase every new, shiny object
![Page 30: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/30.jpg)
We are hiring!
● https://www.quora.com/careers
● Technical Lead - Machine Learning
● Software Engineer - Machine Learning
● Software Engineer - NLP
● Engineering Manager - Machine Learning
![Page 31: Scaling Recommendations at Quora (RecSys talk 9/16/2016)](https://reader031.vdocuments.net/reader031/viewer/2022021506/587e14ab1a28abbc2e8b4f7f/html5/thumbnails/31.jpg)
Thanks!