HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
DESCRIPTION
Presented by: Jonathan Natkins (WibiData) and Juliet Hougland (WibiData)
TRANSCRIPT
Real-Time Model Scoring in Recommender Systems
(c) 2013 WibiData, Inc.
Juliet Hougland and Jonathan Natkins
Why Real-Time?
Who Are We?
• Jon "Natty" Natkins (@nattyice)
o Field Engineer at WibiData
o Before that, Cloudera Software Engineer
o Before that, Vertica Software/Field Engineer
• Juliet Hougland (@JulietHougland)
o Platform Engineer at WibiData
o MS in Applied Math and BA in Math-Physics
What is Kiji?
The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications.
• kiji.org
• github.com/kijiproject
Generating Recommendations
Modeling with KijiMR
Producers
• Operates on a single row in a table.
• Generate derived data:
o Apply a classifier
o Assign a user to a cluster or segment
o Recommend new items
Gatherers
• Mapper with KijiTable input.
• Used when training models.
Batch Isn't Good For Everything
Fresheners Compute Lazily
• Read a column (KijiScoring API)
• Get from HBase
• Freshness Policy: Fresh?
o Yes, return to client
o Otherwise, freshen with a Producer and cache for next time
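The read path on this slide can be sketched in a few lines of plain Python. This is only an illustration of the lazy-compute idea, not the KijiScoring API: the class name `LazyFreshener`, the `max_age_seconds` policy, and the dictionary standing in for HBase are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class LazyFreshener:
    """Toy read path: serve the stored value if fresh, else recompute and cache.

    Hypothetical stand-in for the slide's flow; a dict plays the role of HBase.
    """

    def __init__(self, producer: Callable[[str], str], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.producer = producer                # computes a new value for one row
        self.max_age_seconds = max_age_seconds  # assumed age-based freshness policy
        self.clock = clock
        self.store: Dict[str, Tuple[str, float]] = {}  # row key -> (value, written_at)

    def is_fresh(self, written_at: float) -> bool:
        return self.clock() - written_at <= self.max_age_seconds

    def read(self, row_key: str) -> str:
        cached = self.store.get(row_key)
        if cached is not None and self.is_fresh(cached[1]):
            return cached[0]                    # fresh: return to client
        value = self.producer(row_key)          # stale or missing: freshen
        self.store[row_key] = (value, self.clock())  # cache for next time
        return value
```

Nothing is computed until a client reads the row, so rows that are never read never cost a recomputation.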
How can we make "freshenable" models?
• Population interests change slowly
• Individual interests change quickly
• Models don't need to be retrained frequently
• Application of a model should be fast
• Train a model over your entire data set
• Save fitted model parameters to a file, or another table
• Access the model parameters through a KeyValueStore when scoring new data with a producer.
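The split between slow batch training and fast scoring can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not Kiji code: the "model" is just a mean rating per item, the parameters go to a JSON file, and every function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def train_item_means(ratings: List[Tuple[str, float]]) -> Dict[str, float]:
    """Batch step: fit a toy model (mean rating per item) over the whole data set."""
    by_item: Dict[str, List[float]] = {}
    for item, rating in ratings:
        by_item.setdefault(item, []).append(rating)
    return {item: sum(values) / len(values) for item, values in by_item.items()}


def save_parameters(params: Dict[str, float], path: Path) -> None:
    """Persist the fitted parameters so scoring never has to retrain."""
    path.write_text(json.dumps(params))


def load_parameters(path: Path) -> Dict[str, float]:
    """Load previously fitted parameters at scoring time."""
    return json.loads(path.read_text())


def score(params: Dict[str, float], candidate_items: List[str]) -> str:
    """Scoring step: a cheap lookup per candidate. Unknown items score 0.0."""
    return max(candidate_items, key=lambda item: params.get(item, 0.0))
```

Training runs rarely and touches all the data; scoring only reads the saved parameters, which is what keeps it fast enough to run at read time.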
More Modeling with KijiMR
KeyValueStores
• Allows access to external data in Producers and Gatherers.
• Supports various file formats as well as tables.
• Makes joining datasets together very easy.
• The mechanism for accessing fitted model parameters when freshening.
KijiShopping
• A real-time product recommendation system
• Content-based model using product descriptions and TF-IDF
• Components: Users, KijiShopping Web Application, KijiSchema (Avro, HBase), KijiMR (MapReduce), KijiScoring
KijiShopping Data Collection
• User Logins
• Product Information
o Names, descriptions, SKU information
• User Ratings
o Explicit ratings from users
How do we go from data to recommendations?
Finding Useful Features
• TF-IDF
TF-IDF
• Term Frequency
o How often does this term appear in this document?
• Document Frequency
o How many documents does this term appear in?
• TF-IDF
o How important is this term to this document?
• In KijiShopping, each is a separate job
Computing Term Frequency
• Written as a Producer
o Executed on the Product table as a Map-only job
o WordCount on a per-record basis
• Flow: Read Product Description (HBase) → Count Words in Product Description → Write Word Counts Back
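The per-record word count could look like the sketch below. It is plain Python rather than a KijiMR Producer, and the tokenization rule (lowercase runs of letters, digits, and apostrophes) is an assumption that the slides do not specify.

```python
import re
from collections import Counter
from typing import Dict


def term_frequencies(description: str) -> Dict[str, int]:
    """Count how often each lowercase word appears in one product description."""
    # Assumed tokenizer: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", description.lower())
    return dict(Counter(words))
```

Because each description is counted on its own, no shuffle or reduce step is needed, which matches the Map-only job described above.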
Computing Document Frequency
• Written as a Gatherer
o Executed on the Product table as a MapReduce job
o Groups by words
• Flow: Read Term Frequencies (HBase) → Map: Emit (Word, 1) → Reduce: Group By Word → Write Document Frequencies (HDFS)
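The map and reduce steps can be mimicked in memory as follows. This is a simplified sketch in plain Python, not a KijiMR Gatherer; the function names are hypothetical, and each input row is assumed to be the term-frequency map already written for one product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_terms(term_counts: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) once per distinct word in a product row."""
    for word in term_counts:
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: group by word and sum the emitted ones."""
    totals: Dict[str, int] = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


def document_frequencies(rows: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Number of product rows in which each word appears at least once."""
    return reduce_counts(pair for row in rows for pair in map_terms(row))
```

Emitting 1 per distinct word, rather than the word's count, is what makes the result a document frequency instead of a corpus-wide term count.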
Computing TF-IDF
• Written as a Producer
o Executed on the Product table as a Map-only job
o Pulls in Document Frequencies as a KVStore
• Flow: Read Term Frequencies (HBase) and Read Document Frequencies via KVStore (HDFS) → Divide TF by DF → Write TF-IDFs Back
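The "Divide TF by DF" step can be written directly. The sketch below follows the slide's wording literally; many TF-IDF variants instead multiply by a log-scaled inverse document frequency, and the slides do not say whether KijiShopping applies any such scaling. The function name and the choice to skip words with no known document frequency are assumptions.

```python
from typing import Dict, Mapping


def tf_idf(term_counts: Mapping[str, int], doc_freqs: Mapping[str, int]) -> Dict[str, float]:
    """Divide each term frequency by that term's document frequency.

    Words missing from doc_freqs are skipped rather than guessed at.
    """
    return {
        word: count / doc_freqs[word]
        for word, count in term_counts.items()
        if doc_freqs.get(word, 0) > 0
    }
```

Here `doc_freqs` plays the role of the key-value lookup: the scoring code only reads it, so the document frequencies can be computed once in a separate job.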
Associating Words with Products
• Batch training process
• Associations stored in a model table
• Example: the words gourmet and knife point to "gourmet" Products and "knife" Products, with weights tfidf_gourmet and tfidf_knife
Determine a User's Preferred Words
• Stored in a user table
• Example: the user Natty is linked to the words gourmet and knife, with weights w_gourmet and w_knife
Combining User Ratings and Models
• Producers incorporate models using KeyValueStores
• Example: Natty → gourmet, knife (weights w_gourmet, w_knife) → "gourmet" Products, "knife" Products (weights tfidf_gourmet, tfidf_knife)
Generating a Recommendation
• Pick the best products for your user
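One way to pick the best products is to combine the user's word weights with the word-to-product TF-IDF weights. The slides do not give the exact scoring formula, so the weighted sum below is an assumption, as are the function name, the tie-breaking rule, and the product names used in the usage example.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def recommend(user_word_weights: Mapping[str, float],
              word_to_products: Mapping[str, Mapping[str, float]],
              top_n: int = 3) -> List[str]:
    """Rank products by the sum of (user word weight * product TF-IDF for that word)."""
    scores: Dict[str, float] = defaultdict(float)
    for word, weight in user_word_weights.items():
        # Words with no associated products contribute nothing.
        for product, tfidf in word_to_products.get(word, {}).items():
            scores[product] += weight * tfidf
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:top_n]]
```

For example, with hypothetical weights `{"gourmet": 0.9, "knife": 0.5}` for a user, a product associated with both words can outrank one that matches only the user's strongest word.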
KijiShopping
The model was built with KijiMR, an extension of Hadoop MapReduce.
KijiExpress Modeling Lifecycle
Want to know more?
• The Kiji Project
o kiji.org
o github.com/kijiproject
• KijiShopping
o github.com/wibidata/kiji-shopping
• Questions about this presentation?
o [email protected]
o [email protected]
• Come see us at the WibiData booth
• Join us at KijiCon tomorrow