Dhaval Shah, R&D Software Engineer, Bloomberg L.P.
Recommender Systems at scale using HBase and Hadoop
Bloomberg
Agenda
  Introduction to Recommender Systems
  Types of Recommender Systems
  Building a Recommender System
  Summary
  (Hopefully) Lots of Q&A
Introduction to Recommender Systems
What is a Recommender System?
  Wikipedia: Recommender systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item or social element they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches).
Where are Recommender Systems used?
  Everywhere! (Well, almost!)
  E-Commerce
  Web Portals
  Online Radio
  Streaming Movies
  Media/News
Why do you need a Recommender System?
  Too much useful information
  Bloomberg.com statistics
    o 500-1000 stories, 100-200 videos published per day
    o Average user consumption << Articles published
    o Satisfied user = Content Quality + User preference
    o Double digit increases in CTR
Types of Recommender Systems
  Content-based
  Collaborative filter based
    o User-based
    o Item-based
  Hybrid
Building a Recommender System
  Collect/Generate metadata about stories/videos
  Identify and track users
  Track user activity
  Store user activity
  Generate user models
  Serve recommendations
Collect metadata about stories/videos
  URLs, Headlines, etc.
  Sqoop, Custom Scripts
Generate features for stories
  LDA from Mahout
  Custom extensions
Identify and track users
  Registered
  Anonymous
    o Cookie based tracking
    o IP based tracking
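One way to read the identification rules above is as a fallback chain: use the registered account if there is one, otherwise the cookie, otherwise the IP address. The sketch below is only an illustration under that assumption. The function name `resolve_user_key`, the key prefixes, and the decision to hash the IP address are all hypothetical and are not taken from the talk.

```python
import hashlib
from typing import Optional


def resolve_user_key(registered_id: Optional[str],
                     cookie_id: Optional[str],
                     ip_address: Optional[str]) -> str:
    """Return a tracking key, preferring the strongest identifier available."""
    if registered_id:
        return "reg:" + registered_id
    if cookie_id:
        return "cookie:" + cookie_id
    if ip_address:
        # Assumption: hash the IP so the raw address is not stored as a key.
        digest = hashlib.sha256(ip_address.encode("utf-8")).hexdigest()[:16]
        return "ip:" + digest
    raise ValueError("no identifier available for this request")
```

Prefixing the key with its source keeps the three identifier spaces from colliding and makes it possible to tell later how a given user was recognized.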
Types of user activity
  Explicit interactions
  Implicit interactions
Tracking user activity
  [Diagram: Browser (JavaScript) → HTTP Server → Flume → HBase]
Tracking: Key Features
  1000s of ppm
  Asynchronous: instantaneous responses to client
  Reliability
  Multiple HTTP Servers → Multiple Clusters
  Client to HBase in milliseconds
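The asynchronous property can be illustrated with a small in-process sketch: the request path only enqueues the event and returns, while a background worker forwards events to a sink. This is not the Flume pipeline from the talk. `AsyncTracker`, its `sink` callback (standing in for Flume and HBase), and the drop-on-full policy are assumptions made for illustration.

```python
import queue
import threading


class AsyncTracker:
    """Accept events without blocking the caller; deliver them in the background."""

    def __init__(self, sink, max_pending=10000):
        self._sink = sink                      # hypothetical stand-in for Flume/HBase
        self._events = queue.Queue(maxsize=max_pending)
        self.failed = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def track(self, event) -> bool:
        """Called on the request path; returns immediately."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            return False                       # caller may count dropped events

    def _drain(self):
        while True:
            event = self._events.get()
            try:
                self._sink(event)
            except Exception:
                self.failed += 1               # a real pipeline would retry or spool
            finally:
                self._events.task_done()

    def flush(self):
        """Block until every queued event has been handed to the sink."""
        self._events.join()
```

A handler would call `tracker.track(event)` and respond to the browser right away; `flush()` is only useful in tests or at shutdown.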
Why HBase?
  Scalable
  Fault-tolerant
  Auto-sharding
  Schema-less and sparse
  Real-time queries
  MR integration
Store user activity
  100s of millions of users
  Millions of stories/videos
  TBs of data
  Wide Tables: 1 row per user
  High load
  Sub-second response times
  Multiple MR jobs every few mins
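The "wide table, 1 row per user" layout can be sketched with an in-memory stand-in. The talk does not give the actual schema, so the column family `a`, the qualifier format (zero-padded timestamp plus story id), and the class `WideActivityTable` are assumptions. The sorted read mimics the fact that HBase returns the columns of a row ordered by qualifier.

```python
from collections import defaultdict


class WideActivityTable:
    """In-memory stand-in for a wide table: one row per user, one column per event."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, user_key, story_id, action, ts_millis):
        # Zero-padding makes lexicographic order match chronological order.
        qualifier = "%013d:%s" % (ts_millis, story_id)
        self._rows[user_key]["a:" + qualifier] = action

    def get_row(self, user_key):
        """Return all of a user's columns, sorted by column name."""
        return sorted(self._rows.get(user_key, {}).items())
```

With this layout a single row read returns a user's whole history, which suits both the serving path and per-user model training; the sparse, schema-less nature of the store means users with few events cost little.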
Generate user models using ML
  100s of millions of users
  High IO/Processing power
  Train multiple times an hour
Content-based Recommender Models
  User model independent of other users
  Train only when user has new interaction
  Easily parallelizable
  No Reducer
  Incremental training
  Train 1000 user models a minute
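Because each content-based model depends only on its own user's interactions, training can be expressed as a map-only step: one record in, one record out, nothing to shuffle. The sketch below assumes the user model is a running mean of story topic weights (for example, LDA topic proportions). That choice of model, and the names `update_user_model` and `map_user`, are illustrative assumptions, not the method described in the talk.

```python
def update_user_model(model, count, story_features):
    """Fold one new story's topic weights into a running-mean user profile."""
    new_count = count + 1
    updated = {}
    for topic in set(model) | set(story_features):
        old = model.get(topic, 0.0)
        # Incremental mean: no need to revisit earlier interactions.
        updated[topic] = old + (story_features.get(topic, 0.0) - old) / new_count
    return updated, new_count


def map_user(record):
    """Map-only step: consumes one user's record, emits that user's new model."""
    user_key, model, count, new_stories = record
    for features in new_stories:
        model, count = update_user_model(model, count, features)
    return user_key, model, count
```

Only users with new interactions need to be fed through `map_user`, which is what makes incremental training cheap compared with retraining every model.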
Collaborative filter based Recommender Models
  User model dependent on other users
  Train all models frequently
  Map side self join
  No Reducer
  Batch training
  Train 10s of millions of user models on each batch
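A map-side self join can be pictured as each mapper holding a copy of the activity table in memory and joining every input user against it, so no shuffle or reducer is needed. The sketch below is a toy user-based variant under stated assumptions: Jaccard similarity over read sets, similarity-weighted voting, and the names `jaccard` and `map_user_cf` are all hypothetical and not taken from the talk.

```python
def jaccard(a, b):
    """Overlap between two sets of story ids, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_user_cf(user_key, side_table, top_n=2):
    """Join one user against the in-memory copy of all users' read sets."""
    mine = side_table[user_key]
    scores = {}
    for other_key, theirs in side_table.items():
        if other_key == user_key:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0.0:
            continue
        # Stories a similar user read that this user has not seen yet.
        for story in theirs - mine:
            scores[story] = scores.get(story, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return user_key, [story for story, _ in ranked[:top_n]]
```

The all-pairs loop is quadratic in the number of users, so at the scale quoted on the slide the in-memory side table would have to be partitioned or sampled; the sketch only shows the shape of the join.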
Serve recommendations
  Query HBase
  Evaluate articles against user models
  In-memory cache
  1000s of requests per minute
  50ms responses
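The serving steps above can be sketched as a model lookup behind a small in-memory cache, followed by scoring candidate articles against the user model. The dot-product scoring, the LRU eviction policy, and the names `ModelCache`, `load_model`, and `recommend` are assumptions for illustration. In the described system the loader would be a query against HBase.

```python
from collections import OrderedDict


class ModelCache:
    """Small LRU cache in front of a slower model store."""

    def __init__(self, load_model, capacity=1000):
        self._load = load_model            # hypothetical loader, e.g. an HBase get
        self._capacity = capacity
        self._cache = OrderedDict()
        self.misses = 0

    def get(self, user_key):
        if user_key in self._cache:
            self._cache.move_to_end(user_key)   # mark as most recently used
            return self._cache[user_key]
        self.misses += 1
        model = self._load(user_key)
        self._cache[user_key] = model
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)     # evict least recently used
        return model


def recommend(model, articles, top_n=3):
    """Rank articles by the dot product of user model and article features."""
    def score(features):
        return sum(w * features.get(topic, 0.0) for topic, w in model.items())

    ranked = sorted(articles.items(), key=lambda kv: (-score(kv[1]), kv[0]))
    return [article_id for article_id, _ in ranked[:top_n]]
```

Keeping hot user models in memory avoids a store round trip on repeat requests, which helps when the response budget is tens of milliseconds.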
Summary
  Recommender Systems are important
  Content based and Collaborative filter based
  Cross domain expertise: Big Data, Machine Learning
  Hadoop/MapReduce for offline components
  HBase as a hybrid data store
Questions?