Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16
![Page 1: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/1.jpg)
Sharing and growing the world's knowledge with machine learning
Lei Yang
April 2016
![Page 2: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/2.jpg)
Our mission
“To share and grow the world’s
knowledge”
● Millions of questions & answers
● Millions of users
● Thousands of topics
● ...
![Page 3: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/3.jpg)
What we care about
● Demand
● Quality
● Relevance
![Page 4: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/4.jpg)
Data@Quora
![Page 5: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/5.jpg)
[Diagram: Topic, Question, User, Answer, Actions]
![Page 6: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/6.jpg)
Lots of data relations
![Page 7: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/7.jpg)
Complex network propagation effects
![Page 8: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/8.jpg)
Importance of topics & semantics
![Page 9: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/9.jpg)
Machine Learning@Quora
![Page 10: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/10.jpg)
Ranking - Answer ranking
What is a good Quora answer?
● Truthful
● Reusable
● Provides explanation
● Well formatted
● ...
![Page 11: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/11.jpg)
Ranking - Answer ranking
How are those criteria translated
into features?
● Features that relate to the text quality
itself
● Interaction features (upvotes/downvotes,
clicks, comments…)
● User features (e.g. expertise in topic)
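As a rough illustration of how the three feature groups above could feed a ranking score, here is a minimal sketch. The feature names, weights, and bias are hypothetical and chosen only for the example; the slides do not say which model or weights Quora uses for answer ranking.

```python
import math

# Hypothetical weights for the three feature groups named on the slide.
WEIGHTS = {"text_quality": 1.5, "interaction": 1.0, "author_expertise": 0.8}
BIAS = -1.0  # hypothetical intercept


def answer_score(features):
    """Map a dict of feature values (assumed to lie in [0, 1]) to a score in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def rank_answers(answers):
    """Sort (answer_id, features) pairs from highest to lowest score."""
    ordered = sorted(answers, key=lambda a: answer_score(a[1]), reverse=True)
    return [answer_id for answer_id, _ in ordered]
```

A logistic combination is used here only because it is the simplest model on the "Models" slide; a tree ensemble could consume the same features.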
![Page 12: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/12.jpg)
Ranking - Feed
Present most interesting stories for a user at
a given time
● Interesting = topical relevance +
social relevance + timeliness
● Stories = questions + answers
● Personalized learning-to-rank approach
● Relevance-ordered vs time-ordered = big
gains in engagement
● Challenges
○ Potentially many candidate stories
○ Real-time ranking
○ Objective function
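The "interesting = topical relevance + social relevance + timeliness" idea can be sketched as a weighted sum with a time decay. This is an assumption-laden toy: the weights, the 24-hour half-life, and the dictionary keys are invented for illustration and are not taken from the talk.

```python
def timeliness(age_hours, half_life_hours=24.0):
    """Exponential decay: 1.0 for a brand-new story, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def interest_score(topical, social, age_hours,
                   w_topical=0.5, w_social=0.3, w_time=0.2):
    """Hypothetical weighted blend of the three components from the slide."""
    return w_topical * topical + w_social * social + w_time * timeliness(age_hours)


def rank_feed(stories):
    """Order stories by interest score (relevance-ordered, not time-ordered)."""
    return sorted(
        stories,
        key=lambda s: interest_score(s["topical"], s["social"], s["age_hours"]),
        reverse=True,
    )
```

In practice the objective function is listed as an open challenge on the slide, so a fixed hand-weighted sum like this would be a starting point at best.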
![Page 13: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/13.jpg)
Ranking - Feed
● Personalized LTR model
● Features
○ Quality of question/answer
○ Topics the user is interested in
or knows about
○ Users the user is following
○ What is trending/popular
○ ...
● Different temporal windows
● Multi-stage solution with different
“streams”
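One possible reading of a multi-stage solution with "streams" is that each stream proposes candidates cheaply and a heavier model ranks the merged set. The sketch below follows that reading; the stream names, the per-stream cutoff, and both scoring functions are hypothetical.

```python
import heapq


def select_candidates(streams, per_stream=2):
    """Stage 1: keep the top `per_stream` stories of each stream by a cheap score.

    `streams` maps a stream name to a list of (story_id, cheap_score) pairs.
    """
    seen, candidates = set(), []
    for stories in streams.values():
        for story_id, _ in heapq.nlargest(per_stream, stories, key=lambda s: s[1]):
            if story_id not in seen:  # a story may appear in several streams
                seen.add(story_id)
                candidates.append(story_id)
    return candidates


def rank(candidates, full_score):
    """Stage 2: order the reduced candidate set with a more expensive scorer."""
    return sorted(candidates, key=full_score, reverse=True)
```

Splitting the work this way keeps the expensive scorer off the full candidate pool, which matters for the real-time ranking challenge mentioned earlier.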
![Page 14: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/14.jpg)
Recommendations - Topics
Recommend new topics for the user
to follow, based on
● Topics you already follow
● Users you already follow
● Interactions with questions/answers
● Topic-related features
● ...
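A very simple baseline that uses two of the signals above (topics you follow and users you follow) is to count topics followed by the people a user follows. This is only a sketch under that assumption; the function and argument names are made up, and the talk does not describe the actual recommender.

```python
from collections import Counter


def recommend_topics(user, follows_users, follows_topics, k=3):
    """Suggest up to k topics that followed users follow and `user` does not.

    follows_users:  user -> set of users they follow
    follows_topics: user -> set of topics they follow
    """
    already = follows_topics.get(user, set())
    counts = Counter()
    for other in follows_users.get(user, set()):
        for topic in follows_topics.get(other, set()):
            if topic not in already:
                counts[topic] += 1
    return [topic for topic, _ in counts.most_common(k)]
```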
![Page 15: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/15.jpg)
Recommendations - Users
Recommend new users for the user
to follow, based on:
● Users you already follow
● Topics you already follow
● Interactions with users
● User-related features
● ...
![Page 16: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/16.jpg)
Related questions
Given interest in a question, what other questions
are interesting?
● Not only about similarity, but also “interestingness”
● Features such as:
○ Textual
○ Co-visit
○ Topics
○ …
● Important for logged-out use case
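The co-visit feature listed above can be illustrated by counting how often two questions are viewed in the same session. The session format and function names here are assumptions for the example, not a description of Quora's pipeline.

```python
from collections import Counter
from itertools import combinations


def covisit_counts(sessions):
    """Count how often each unordered pair of questions shares a session."""
    counts = Counter()
    for visited in sessions:
        for a, b in combinations(sorted(set(visited)), 2):
            counts[(a, b)] += 1
    return counts


def related(question, counts, k=3):
    """Return up to k questions most often co-visited with `question`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == question:
            scores[b] += n
        elif b == question:
            scores[a] += n
    return [q for q, _ in scores.most_common(k)]
```

Because it needs no profile, a signal like this remains available for logged-out visitors.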
![Page 17: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/17.jpg)
Duplicate questions
● Important issue for Quora
○ Want to make sure we don’t disperse
knowledge across duplicates of the same question
● Binary classifier trained with labelled data
● Features
○ Textual vector space models
○ Usage-based features
○ ...
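One textual vector space feature that a duplicate classifier could consume is the cosine similarity between bag-of-words vectors of the two questions. The tokenization below is a deliberately crude assumption; the slide does not specify which vector space models are used.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(question_a, question_b):
    """Cosine of the angle between the two token-count vectors (0.0 if either is empty)."""
    va, vb = bag_of_words(question_a), bag_of_words(question_b)
    dot = sum(va[token] * vb[token] for token in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

The resulting value would be one input among several to the binary classifier, alongside the usage-based features.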
![Page 18: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/18.jpg)
User expertise inference
Infer user’s trustworthiness in relation
to a given topic
● We take into account:
○ Answers written on topic
○ Upvotes/downvotes received
○ Endorsements
○ ...
● Trust/expertise propagates through the network
● Useful as input/features in other models
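The statement that trust propagates through the network can be illustrated with a PageRank-style iteration over an endorsement graph. This is a generic technique swapped in for illustration, not the method from the talk; the damping factor, iteration count, and data layout are all assumptions, and every user is assumed to have an entry in `base`.

```python
def propagate_trust(base, endorsements, damping=0.5, iterations=20):
    """Blend each user's own score with trust passed on by their endorsers.

    base:         user -> score from their own answers and votes on the topic
    endorsements: endorser -> list of users they endorse on the topic
    """
    trust = dict(base)
    for _ in range(iterations):
        updated = {user: (1 - damping) * base[user] for user in base}
        for endorser, endorsed in endorsements.items():
            if not endorsed:
                continue
            # An endorser splits a damped share of their trust among endorsees.
            share = damping * trust[endorser] / len(endorsed)
            for user in endorsed:
                updated[user] += share
        trust = updated
    return trust
```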
![Page 19: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/19.jpg)
Spam detection and moderation
● Very important for Quora to maintain content
quality
● Pure manual approaches do not scale
● Hard to get algorithms 100% right
● ML algorithms detect content/user issues
○ Output of the algorithms feeds manually
curated moderation queues
![Page 20: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/20.jpg)
Content creation prediction
● Quora’s algorithms do not optimize only for
the probability of reading
● Important to predict probability of a user
answering a question
● Some product features completely rely
on that prediction
○ E.g. A2A (ask to answer) suggestions
![Page 21: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/21.jpg)
Trending topics
Highlight current events that are interesting
to the user
● We take into account:
○ Global “Trendiness”
○ Social “Trendiness”
○ User’s interest
○ ...
● Trending topics are a great discovery mechanism
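A toy way to combine the three signals above is to measure trendiness as recent activity relative to a baseline and then scale it by the user's interest. The ratio definition, the smoothing constant, and the 0.7/0.3 weights are hypothetical choices for this sketch.

```python
def trendiness(recent_count, baseline_count, smoothing=1.0):
    """Recent activity relative to a longer-run baseline, smoothed to avoid division by zero."""
    return (recent_count + smoothing) / (baseline_count + smoothing)


def trending_score(global_trend, social_trend, user_interest,
                   w_global=0.7, w_social=0.3):
    """Hypothetical blend: trend signals scaled by the user's interest in the topic."""
    return user_interest * (w_global * global_trend + w_social * social_trend)
```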
![Page 22: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/22.jpg)
Models & Experimentation
![Page 23: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/23.jpg)
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...
![Page 24: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/24.jpg)
Open source project -- QMF
Quora Matrix Factorization
https://github.com/quora/qmf
● Currently BPR and WALS
● Multithreaded implementation
in C++14
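For readers unfamiliar with BPR (Bayesian Personalized Ranking), the following sketch shows a single stochastic gradient step on one (user, preferred item, other item) triple. It is a plain-Python illustration of the standard BPR update, not QMF's API or implementation; the learning rate and regularization values are arbitrary.

```python
import math


def bpr_step(user_vec, pos_vec, neg_vec, lr=0.05, reg=0.01):
    """Update the three factor vectors in place; return the pre-update score gap.

    The objective for the triple is ln(sigmoid(x)) minus L2 regularization,
    where x = user . (pos - neg).
    """
    x = sum(u * (p - n) for u, p, n in zip(user_vec, pos_vec, neg_vec))
    g = 1.0 / (1.0 + math.exp(x))  # equals 1 - sigmoid(x), the gradient of ln(sigmoid(x))
    for k in range(len(user_vec)):
        u, p, n = user_vec[k], pos_vec[k], neg_vec[k]
        user_vec[k] += lr * (g * (p - n) - reg * u)
        pos_vec[k] += lr * (g * u - reg * p)
        neg_vec[k] += lr * (-g * u - reg * n)
    return x
```

Repeating the step on the same triple pushes the preferred item's score above the other item's score for that user. WALS, the other algorithm named on the slide, uses an alternating least squares scheme and is not shown here.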
![Page 25: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/25.jpg)
ML platform
● Allow ML Engineers and Data
Scientists to collaborate within
the same ML framework
● Easy integration with well known
tools and open source libraries
● Offline evaluation and debugging
● User-friendly Python frontend
● High-performance, scalable
C++/CUDA backend
[Diagram labels: Redshift, MySQL, S3, Python User Interface, Trainer Box, Session, CPU, GPU, Disk, ..., WALS, BPR]
![Page 26: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/26.jpg)
Experimentation
● Extensive A/B testing, data-driven
decision-making
● Separate, orthogonal “layers” for
different parts of the system
● Experiment framework showing
comparisons for various metrics
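A common way to get separate, orthogonal experiment layers is to hash the user ID together with a layer name, so that assignment in one layer is statistically independent of assignment in another. The sketch below assumes that approach; the layer names and bucket labels are hypothetical, and the talk does not describe how Quora's framework assigns users.

```python
import hashlib


def assign_bucket(user_id, layer, buckets=("control", "treatment")):
    """Deterministically map a user to a bucket, salted by the layer name."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]
```

The same user always lands in the same bucket within a layer, while a different layer name reshuffles users, so an experiment on feed ranking does not bias one on, say, answer ranking.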
![Page 27: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/27.jpg)
Conclusions
![Page 28: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/28.jpg)
Conclusions
● At Quora we have not only big, but also “rich” data
● Our algorithms need to understand and optimize complex aspects such
as quality, interestingness, relevance, or user expertise
● We believe ML will be one of the keys to our success
● We have many interesting problems, and many unsolved challenges
![Page 29: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16](https://reader034.vdocuments.net/reader034/viewer/2022042723/58f0f7931a28abb04c8b45ff/html5/thumbnails/29.jpg)
We are hiring! www.quora.com/careers