HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Real-Time Model Scoring in Recommender Systems (c) 2013 WibiData, Inc. Juliet Hougland and Jonathan Natkins

Upload: cloudera-inc

Post on 10-May-2015


DESCRIPTION

Presented by: Jonathan Natkins (WibiData) and Juliet Hougland (WibiData)

TRANSCRIPT

Page 1: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Real-Time Model Scoring in Recommender Systems

(c) 2013 WibiData, Inc.

Juliet Hougland and Jonathan Natkins

Page 2: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Why Real-Time?

Page 3: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Who Are We?

•  Jon "Natty" Natkins (@nattyice)
o  Field Engineer at WibiData
o  Before that, Cloudera Software Engineer
o  Before that, Vertica Software/Field Engineer
•  Juliet Hougland (@JulietHougland)
o  Platform Engineer at WibiData
o  MS in Applied Math and BA in Math-Physics

Page 4: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

What is Kiji?

The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications.

•  kiji.org
•  github.com/kijiproject

Page 5: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Generating Recommendations

Page 6: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Generating Recommendations

Page 7: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Modeling with KijiMR

Producers
•  Operates on a single row in a table.
•  Generate derived data:
o  Apply a classifier
o  Assign a user to a cluster or segment
o  Recommend new items

Gatherers
•  Mapper with KijiTable input.
•  Used when training models.

Page 8: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Generating Recommendations

Page 9: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Generating Recommendations

Page 10: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Generating Recommendations

Page 11: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Batch Isn't Good For Everything

Page 12: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Batch Isn't Good For Everything

Page 13: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Batch Isn't Good For Everything

Page 14: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Fresheners Compute Lazily

[Diagram: Read a column (KijiScoring API) → Get from HBase → Freshness Policy: Fresh? → Yes, return to client]

Page 15: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Fresheners Compute Lazily

[Diagram: Read a column (KijiScoring API) → Get from HBase → Freshness Policy: Fresh? → Yes, return to client. Otherwise: Freshen (Producer) → Cache for next time]
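The lazy read path on this slide can be sketched in a few lines of Python. This is an illustration only, not the KijiScoring API: the `FreshenedColumn` class, the in-memory dict standing in for HBase, the age-based freshness policy, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class FreshenedColumn:
    """Hypothetical stand-in for a freshened column; not the KijiScoring API."""

    def __init__(self, producer: Callable[[str], float], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.producer = producer                # recomputes the value for one row
        self.max_age_seconds = max_age_seconds  # assumed age-based freshness policy
        self.clock = clock
        # row key -> (value, time written); a dict standing in for HBase
        self.store: Dict[str, Tuple[float, float]] = {}
        self.producer_runs = 0

    def is_fresh(self, written_at: float) -> bool:
        return self.clock() - written_at <= self.max_age_seconds

    def read(self, row_key: str) -> float:
        cached = self.store.get(row_key)             # "Get from HBase"
        if cached is not None and self.is_fresh(cached[1]):
            return cached[0]                         # "Fresh? Yes, return to client"
        value = self.producer(row_key)               # "Freshen" with the producer
        self.producer_runs += 1
        self.store[row_key] = (value, self.clock())  # "Cache for next time"
        return value
```

The producer only runs when a client actually reads a stale or missing value, so rows that nobody reads cost nothing to keep up to date.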

Page 16: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

How can we make "freshenable" models?

Population interests change slowly

Individual interests change quickly

Page 17: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

How can we make "freshenable" models?

Population interests change slowly

Individual interests change quickly

Models don't need to be retrained frequently

Application of a model should be fast

Page 18: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

How can we make "freshenable" models?

Individual interests change quickly

Application of a model should be fast

•  Train a model over your entire data set
•  Save fitted model parameters to a file, or another table
•  Access the model parameters through a KeyValueStore when scoring new data with a producer.
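The three steps above (train in batch, save the fitted parameters, look them up at scoring time) might look like the following Python sketch. It does not use KijiMR: the JSON file stands in for "a file, or another table", the loaded dict stands in for a KeyValueStore, and the toy model (mean rating per word) and all function names are assumptions chosen for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def train(ratings: List[Tuple[str, float]]) -> Dict[str, float]:
    """Batch step: fit one parameter per word (its mean rating) over the whole data set."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for word, rating in ratings:
        totals[word] += rating
        counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def save_parameters(params: Dict[str, float], path: str) -> None:
    """Persist the fitted parameters so scoring never has to retrain."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)


def open_store(path: str) -> Dict[str, float]:
    """Load the saved parameters; the dict plays the role of a key-value store."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def score(words: Iterable[str], store: Dict[str, float]) -> float:
    """Scoring step: cheap lookups only, so it is fast enough to run per request."""
    return sum(store.get(word, 0.0) for word in words)
```

Training touches every record and can run rarely; scoring touches one row plus a handful of lookups, which is what makes it cheap enough to run at read time.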

Page 19: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

More Modeling with KijiMR

KeyValueStores
•  Allows access to external data in Producers and Gatherers.
•  Supports various file formats as well as tables.
•  Makes joining datasets together very easy.
•  The mechanism for accessing fitted model parameters when freshening.

Page 20: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

KijiShopping

•  A real-time product recommendation system
•  Content-based model using product descriptions and TF-IDF

[Diagram: Users; KijiShopping Web Application; KijiSchema (Avro, HBase); KijiMR (MapReduce); KijiScoring]

Page 21: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

KijiShopping Data Collection

•  User Logins
•  Product Information
o  Names, descriptions, SKU information
•  User Ratings
o  Explicit ratings from users

How do we go from data to recommendations?

Page 22: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Finding Useful Features

•  TF-IDF

Page 23: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

TF-IDF

•  Term Frequency
o  How often does this term appear in this document?
•  Document Frequency
o  How many documents does this term appear in?
•  TF-IDF
o  How important is this term to this document?
•  In KijiShopping, each is a separate job

Page 24: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Computing Term Frequency

•  Written as a Producer
o  Executed on the Product table as a Map-only job
o  WordCount on a per-record basis

[Diagram: HBase → Read Product Description → Count Words in Product Description → Write Word Counts Back]
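A per-record word count of this kind can be sketched in Python as follows. This is not KijiMR code: the dict of rows stands in for the Product table, and the column names `description` and `term_frequencies`, the tokenizing regular expression, and both function names are assumptions for the example.

```python
import re
from collections import Counter
from typing import Dict


def term_frequencies(description: str) -> Dict[str, int]:
    """Count how often each word appears in one product description."""
    words = re.findall(r"[a-z0-9']+", description.lower())
    return dict(Counter(words))


def run_producer(product_table: Dict[str, dict]) -> None:
    """Map-only pass: each row is read, processed, and written back independently."""
    for row in product_table.values():
        row["term_frequencies"] = term_frequencies(row["description"])
```

Because each row depends only on its own description, no grouping or reduce step is needed.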

Page 25: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Computing Document Frequency

•  Written as a Gatherer
o  Executed on the Product table as a MapReduce job
o  Groups by words

[Diagram: HBase → Map: Read Term Frequencies, Emit (Word, 1) → Reduce: Group By Word, Write Document Frequencies → HDFS]
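The map and reduce halves of this job can be sketched in plain Python. It is an in-memory illustration, not a KijiMR Gatherer: the function names are assumptions, and the shuffle that MapReduce performs between the two phases is collapsed into a single loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def gather(term_frequencies: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map side: emit (word, 1) once for every distinct word in a product row."""
    for word in term_frequencies:
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce side: group by word and sum the ones."""
    totals: Dict[str, int] = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


def document_frequencies(rows: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Number of product rows in which each word appears at least once."""
    return reduce_counts(pair for tf in rows for pair in gather(tf))
```

Note that the map side ignores the term counts themselves: a word that appears five times in one description still contributes 1 to its document frequency.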

Page 26: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Computing TF-IDF

•  Written as a Producer
o  Executed on the Product table as a Map-only job
o  Pulls in Document Frequencies as a KVStore

[Diagram: HBase → Read Term Frequencies; HDFS → Read Document Frequencies via KVStore; Divide TF by DF → Write TF-IDFs Back]
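The "Divide TF by DF" step can be sketched as below. The function name is an assumption, and a plain mapping stands in for the KVStore of document frequencies. The weighting follows the slide literally (TF / DF); many textbook definitions instead multiply TF by log(N / DF), where N is the number of documents, which this sketch does not do.

```python
from typing import Dict, Mapping


def tf_idf(term_frequencies: Mapping[str, int],
           document_frequencies: Mapping[str, int]) -> Dict[str, float]:
    """Weight each word in one row by TF / DF, as described on the slide."""
    return {
        word: tf / document_frequencies[word]
        for word, tf in term_frequencies.items()
        # Skip words missing from the store rather than dividing by zero.
        if document_frequencies.get(word, 0) > 0
    }
```

A word that is frequent in one description but rare across the catalog gets a high weight; a word that appears in nearly every description is pushed toward zero.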

Page 27: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Associating Words with Products

•  Batch training process
•  Associations stored in a model table

[Diagram: the words "gourmet" and "knife" each link to their products ("gourmet" Products, "knife" Products) with weights tfidf_gourmet and tfidf_knife]

Page 28: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Determine a User's Preferred Words

•  Stored in a user table

[Diagram: the user Natty links to the words "gourmet" and "knife" with weights w_gourmet and w_knife]

Page 29: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Combining User Ratings and Models

•  Producers incorporate models using KeyValueStores

[Diagram: Natty → "gourmet", "knife" (weights w_gourmet, w_knife) → "gourmet" Products, "knife" Products (weights tfidf_gourmet, tfidf_knife)]

Page 30: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Generating a Recommendation

•  Pick the best products for your user
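One way to "pick the best products" from the user's word weights and the per-word product TF-IDF values is sketched below. The slides do not state how the two are combined, so the scoring rule here (sum of user weight times TF-IDF over shared words) is an assumption, as are the function name, the nested-dict layout of the model, and the sample product names in the usage note.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def recommend(user_weights: Mapping[str, float],
              model: Mapping[str, Mapping[str, float]],
              k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (product, score) pairs for one user.

    user_weights: word -> how much the user prefers that word
    model:        word -> {product -> TF-IDF of that word for the product}
    """
    scores: Dict[str, float] = defaultdict(float)
    for word, weight in user_weights.items():
        for product, tfidf in model.get(word, {}).items():
            scores[product] += weight * tfidf  # assumed combination rule
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

For example, with weights `{"gourmet": 2.0, "knife": 1.5}` and a model in which a hypothetical "chef-knife" product matches both words, that product outranks products matching only one of them.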

Page 31: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

KijiShopping

The model was built with KijiMR, an extension of Hadoop MapReduce.

Page 32: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

KijiShopping

The model was built with KijiMR, an extension of Hadoop MapReduce.

Page 33: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

KijiExpress Modeling Lifecycle

Page 34: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Want to know more?

•  The Kiji Project
o  kiji.org
o  github.com/kijiproject
•  KijiShopping
o  github.com/wibidata/kiji-shopping

Questions about this presentation?
o  [email protected]
o  [email protected]

Page 35: HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Want to know more?

•  Come see us at the WibiData booth

•  Join us at KijiCon tomorrow