large-scale recommendations in a dynamic marketplace

Post on 25-Feb-2016


1

Large-scale Recommendations in a Dynamic Marketplace

Jay Katukuri, Rajyashree Mukherjee, Tolga Konik, Chu-Cheng Hsieh

LSRS 2013

2

Meet John Doe

John is interested in an item: “iPhone 5 64gb white”. Should we recommend
– “iPhone 5 case”
(or)
– “iPhone 5s gold”?

3

Recommendation on e-marketplace

• Recommendation “before” purchase: Similar Item Recommendation (SIR)
– iPhone 5S gold

• Recommendation “after” purchase: Related Item Recommendation (RIR)
– iPhone 5 case

4

SIR Example 1

5

SIR Example 2


Related Item Recommendation

6

Recommendations for Xbox 360 4GB on Checkout page

7

Main Idea

• Similar Item Clustering (SIC)
– Titles
– Attributes (Price, etc.)
– Images

• Recommendation
– SIR: (same cluster)
– RIR: (neighbor clusters)
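A minimal sketch of the main idea, assuming items are already assigned to clusters and that neighbor clusters are available as a plain mapping. SIR draws from the seed item's own cluster, RIR from its neighbor clusters. The cluster names, the toy data and the function names are hypothetical and are not taken from the deployed system.

```python
from collections import defaultdict

# Hypothetical toy data: item title -> cluster id.
ITEM_TO_CLUSTER = {
    "iPhone 5 64gb white": "apple iphone",
    "iPhone 5s gold": "apple iphone",
    "iPhone 5 case": "iphone 5 case",
    "iPhone 5 leather case": "iphone 5 case",
}

# Hypothetical cluster-cluster relations (neighbor clusters).
RELATED_CLUSTERS = {"apple iphone": ["iphone 5 case"]}

# Invert the assignment so each cluster can list its member items.
CLUSTER_TO_ITEMS = defaultdict(list)
for item, cluster in ITEM_TO_CLUSTER.items():
    CLUSTER_TO_ITEMS[cluster].append(item)


def similar_to(item):
    """SIR: other items from the same cluster as the seed item."""
    cluster = ITEM_TO_CLUSTER[item]
    return [other for other in CLUSTER_TO_ITEMS[cluster] if other != item]


def related_to(item):
    """RIR: items from the clusters related to the seed item's cluster."""
    cluster = ITEM_TO_CLUSTER[item]
    results = []
    for neighbor in RELATED_CLUSTERS.get(cluster, []):
        results.extend(CLUSTER_TO_ITEMS[neighbor])
    return results
```

With this toy data, `similar_to("iPhone 5 64gb white")` yields the other phone and `related_to("iPhone 5 64gb white")` yields the two cases, which mirrors the John Doe example.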

8

Models

• Item clusters: each cluster is represented by meaningful keywords
– “clarks women shoe pumps classics”
– “authentic handmade amish quilt”

• Cluster-Cluster Relations
– “samsung galaxy s4” – “samsung galaxy s4 screen protector”
– “wolfgang puck electric pressure cooker” – “kitchenaid food processor”

9

System Architecture - Overview

[Diagram: Offline Model Generation | The Data Store | Real-time Performance System]

• Offline Model Generation: Clusters Model Generation; Related Clusters Model Generation
• The Data Store: Inventory, Transactions, Clickstream, Conceptual Knowledgebase, Clusters, Cluster-Cluster Relations
• Real-time Performance System
– Similar Items Recommender (SIR): Lost Item → ?similarTo(item) → Similar Items
– Related Items Recommender (RIR): Bought Item → ?relatedTo(item) → Related Items

10

Cluster Generation (offline)

11

Data on eBay

• Item-item co-occurrences on transaction logs
• Large Data
– Much bigger data set in both users and inventory than other ecommerce sites
• Scale
– More than 300M listings
– More than 10M new items every day

12

Challenges

• Global clustering not feasible
• Size bias on different categories
• Performance

13

Model Generation - Clusters

1. Select a few keywords to represent “big notions”, e.g. iPhone, Handbags, etc.
– How to select?

2. Clustering by K-means
– How to set K?

14

Model Generation - Clusters

[Diagram: Clusters Model Generation. Stages: Query-Recall Generation, Cluster Generation. Flow labels: items, user queries, concepts, categories, query-to-items, new clusters. Data Store: Clusters, Inventory, Clickstream, Conceptual Knowledgebase.]

• Problem: Global clustering not feasible
• Solution: Partition input data by user queries
• Parallel distributed K-Means in Hadoop MapReduce
• Dedupe and merge overlapping clusters (100X reduction in size over inventory with over 90% coverage)
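The "partition by user queries, then cluster locally" idea can be sketched as follows. This is a sequential illustration only, not the Hadoop MapReduce job. It assumes every item already has a numeric feature vector, starts the centroids at the first k points of each partition (an arbitrary choice made for reproducibility), and omits the dedupe-and-merge step. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = mean(members)
    return assignment


def cluster_by_query(query_to_items, k):
    """Run k-means independently inside each query partition."""
    clusters = {}
    for query, items in query_to_items.items():
        names = list(items)
        vectors = [items[name] for name in names]
        labels = kmeans(vectors, min(k, len(vectors)))
        for name, label in zip(names, labels):
            clusters.setdefault((query, label), []).append(name)
    return clusters
```

Because each partition is clustered on its own, the partitions can be processed in parallel, which is what makes the approach a fit for MapReduce. Overlapping clusters produced by different queries would still need to be merged afterwards.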

15

Base Cluster Generation

• Base Cluster ≡ Query
• Find merge candidates based on query term overlap
– E.g.: “nike airmax tennis shoes” -> “nike airmax”
• Score candidates using cosine similarity
– Term weight: TF-IDF in the query space (document = query)
  • TF: Query Demand
  • IDF: Number of Queries
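One possible reading of the scoring step, as a sketch. The slide does not give the exact formula, so the weighting below is an assumption: a term's TF is the total demand of the queries that contain it, and its IDF is the log of the number of queries divided by the number of queries containing the term. The function names and demand figures are hypothetical.

```python
import math
from collections import defaultdict


def term_weights(query_demand):
    """TF-IDF weight per term, treating each query as a document.

    Assumed weighting: TF = summed demand of queries containing the term,
    IDF = log(number of queries / number of queries containing the term).
    """
    n_queries = len(query_demand)
    term_demand = defaultdict(int)
    term_df = defaultdict(int)
    for query, demand in query_demand.items():
        for term in set(query.split()):
            term_demand[term] += demand
            term_df[term] += 1
    return {
        term: term_demand[term] * math.log(n_queries / term_df[term])
        for term in term_df
    }


def cosine(query_a, query_b, weights):
    """Cosine similarity of two queries in the weighted term space."""
    terms_a, terms_b = set(query_a.split()), set(query_b.split())
    dot = sum(weights[t] ** 2 for t in terms_a & terms_b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in terms_a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in terms_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query log: query -> demand (number of times issued).
demand = {
    "nike airmax tennis shoes": 10,
    "nike airmax": 50,
    "handmade quilt": 20,
}
weights = term_weights(demand)
score = cosine("nike airmax tennis shoes", "nike airmax", weights)
```

Queries that share no terms score 0, so only pairs with term overlap, such as the two "nike airmax" queries, become merge candidates.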

16

Step 1: base cluster candidates

• Method for choosing the “base clusters” (initial states):
– Minimum frequency
– Supply threshold (enough inventory)
– Min and max token constraint (length of queries)
– Heuristic constraints
  • Queries that have only numbers are not allowed: “10 5”
  • …
– Merge similar clusters into one
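The selection rules above can be sketched as a simple filter. The threshold values and the function name are hypothetical placeholders, since the slide does not state the real ones, and only the one heuristic constraint the slide names (numbers-only queries) is implemented.

```python
def is_base_cluster_candidate(query, frequency, supply,
                              min_frequency=100, min_supply=50,
                              min_tokens=2, max_tokens=6):
    """Return True if a query passes the base-cluster selection rules.

    All default thresholds are illustrative assumptions.
    """
    tokens = query.split()
    if frequency < min_frequency:   # minimum frequency
        return False
    if supply < min_supply:         # supply threshold (enough inventory)
        return False
    if not (min_tokens <= len(tokens) <= max_tokens):  # token constraint
        return False
    if all(token.isdigit() for token in tokens):       # numbers-only query
        return False
    return True
```

For example, `is_base_cluster_candidate("10 5", 500, 200)` is rejected by the numbers-only rule, while `is_base_cluster_candidate("nike airmax", 500, 200)` passes.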

17

candidates merge

• 4.34M base clusters merged into 1.95M
• Example: the following base clusters
– phrase(hand,made) phrase(king,s) queen quilt
– phrase(hand,made) phrase(pink,s) quilt
– phrase(hand,made) phrase(prae,owned) queen quilt
– phrase(hand,made) queen quilt
– phrase(hand,made) phrase(prae,owned) quilt
– phrase(hand,made) quilt size twin
– phrase(hand,made) quilt silk
– phrase(hand,made) quilt twin
– phrase(hand,made) phrase(patch,work) quilt
– phrase(hand,made) quilt white
– phrase(hand,made) phrase(king,size) quilt
– phrase(hand,made) phrase(yo,yo,s) quilt
– phrase(hand,made) quilt sale
– phrase(hand,made) quilt red

are merged into
– phrase(hand,made) quilt

18

Step 2: K-Means Clustering

[Diagram: pipeline boxes: Query to Items Data, Base Cluster Generation, Generate Item Features, K-Means Clustering of Base Clusters, Split Clusters, Scoring Models. Inputs: Transaction Logs, Inventory Logs.]

19

Clusters on Item Signature

apple ipod touch 4g clear film protector screen

Cluster

clarks women shoe pumps classics


20

Recommendation (online)


21

Performance System

[Diagram: two real-time flows over the Data Store (Clusters, Inventory, Conceptual Knowledgebase, Cluster-Cluster Relations)]

• SIR, ?similarTo(item): Lost Item → Cluster Assignment → SIR Query Formation → query → Item Search → items → Item Selection → SIR Ranking → Similar Items recommendations
• RIR, ?relatedTo(item): Bought Item → Cluster Assignment → clusters → Cluster-Cluster Relations → related clusters → RIR Query Formation → queries → Item Search → items → Item Selection → RIR Ranking → Related Items recommendations
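A rough sketch of the RIR flow named in the diagram: cluster assignment, lookup of related clusters, query formation, and item search. The slides do not describe how each stage works, so everything here is an assumption: cluster assignment uses Jaccard overlap between title tokens and cluster keywords, item search is a plain all-terms match, and the item selection and ranking stages are left out. All names and data are hypothetical.

```python
def assign_cluster(title, cluster_keywords):
    """Pick the cluster whose keywords overlap most with the title (Jaccard)."""
    title_tokens = set(title.lower().split())

    def jaccard(cluster):
        keywords = set(cluster_keywords[cluster])
        return len(title_tokens & keywords) / len(title_tokens | keywords)

    return max(cluster_keywords, key=jaccard)


def form_queries(cluster, related_clusters, cluster_keywords):
    """One search query per related cluster, built from its keywords."""
    return [" ".join(cluster_keywords[c])
            for c in related_clusters.get(cluster, [])]


def search(query, inventory):
    """Toy item search: keep titles that contain every query term."""
    terms = set(query.split())
    return [title for title in inventory
            if terms <= set(title.lower().split())]


def related_items(bought_title, cluster_keywords, related_clusters,
                  inventory, limit=5):
    """Bought item -> cluster -> related clusters -> queries -> items."""
    cluster = assign_cluster(bought_title, cluster_keywords)
    results = []
    for query in form_queries(cluster, related_clusters, cluster_keywords):
        for item in search(query, inventory):
            if item != bought_title and item not in results:
                results.append(item)
    return results[:limit]


# Hypothetical toy data based on the cluster names shown earlier.
CLUSTER_KEYWORDS = {
    "s4": ["samsung", "galaxy", "s4"],
    "s4_protector": ["samsung", "galaxy", "s4", "screen", "protector"],
}
RELATED = {"s4": ["s4_protector"]}
INVENTORY = [
    "Samsung Galaxy S4 16GB black",
    "Samsung Galaxy S4 screen protector clear",
    "Nike airmax shoes",
]
recommendations = related_items(
    "Samsung Galaxy S4 16GB black", CLUSTER_KEYWORDS, RELATED, INVENTORY)
```

The SIR flow would follow the same shape, except that the query is formed from the seed item's own cluster rather than from related clusters.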

22

Items in the same cluster


23

Similar Item Recommendations


24

Experimental Results

• A/B tests comparing against legacy systems
– SIR legacy system
  • Completely online
  • Naïve approach of using seed item title as a search query
– RIR legacy system
  • Chen, Y. and J.F. Canny, Recommending ephemeral items at web scale, ACM SIGIR 2011
  • Collaborative Filtering on stable representations of items
– Significant improvements at the 90% confidence level
  • SIR resulted in 38.18% higher user engagement (CTR)
  • RIR resulted in 10.5% higher CTR
  • Statistically significant improvement in site-wide business metrics from both SIR & RIR
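For readers who want to reproduce this kind of comparison, here is a sketch of how a relative CTR lift and its significance could be computed. The slides do not say which statistical test was used, so the two-proportion z-test below is an assumption, and the click and view counts in the usage note are made up purely for illustration.

```python
import math


def relative_lift(ctr_control, ctr_treatment):
    """Relative CTR improvement of treatment over control (0.38 = 38%)."""
    return (ctr_treatment - ctr_control) / ctr_control


def two_proportion_p_value(clicks_a, views_a, clicks_b, views_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))
```

With invented counts of 200 clicks in 10,000 views for the legacy system and 300 clicks in 10,000 views for the new one, `relative_lift(0.02, 0.03)` is about 0.5 (a 50% lift), and the result counts as significant at the 90% level when `two_proportion_p_value(200, 10000, 300, 10000)` is below 0.10.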

25

Conclusion

• Balance between similarity and quality crucial in driving user engagement and conversion
• Clusters of similar items in the inventory
– Local clustering in the coverage set of user queries
• Offline models built using Map-Reduce
– Huge input datasets including inventory, clickstream and transactional data
• Efficient real-time performance system
• Currently deployed on ebay.com

26

Acknowledgments

• Current & Past team members
– Kranthi Chalasani
– Santanu Kolay
– Riyaaz Shaik
– Venkat Sundaranatha

27

WE’RE HIRING
Chu-Cheng Hsieh chsieh@ebay.com
