DataEngConf SF16 - Recommendations at Instacart
TRANSCRIPT
v
The Instacart Value Proposition
Groceries from stores you love, delivered to your doorstep, in as little as an hour
v
Customer Experience
Select a Store → Shop for Groceries → Checkout and Select Delivery Time → Delivered to Doorstep
v
Shopper Experience
Accept Order → Find the Groceries → Scan Barcode → Out for Delivery → Delivered to Doorstep
v
Four Sided Marketplace
Customers · Shoppers · Products (Advertisers) · Stores (Retailers)
Connected through: Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty
v
What this talk is about
A new collaborative filtering algorithm
• A case study: a live, end-to-end recommendation system
• Built in one person-month
• Trained on hundreds of millions of transactions
v
Repeat purchases increase LTV of recommendations
[Figure: a $5.49 purchase today, repeated across ~100 orders over the following year, totals $549]
v
Personalized Top N recommendations
Promote broad-based discovery in a dynamic catalog
Including from stores customers may never have shopped at
v
“Frequently bought with” Recommendations
Help customers shop for complementary products and try alternatives
Some recommendations are probably consumed together; others are not necessarily consumed together
v
Learning from feedback
Traditionally, collaborative filtering used explicit feedback to predict ratings
There may still be bias in whether the user chooses to rate
Explicit Feedback vs. Implicit Feedback
v
Learning from Explicit Feedback
• Explicit feedback may be more reliable, but there is much less of it
• It is less reliable when users rate based on aspirations instead of true preferences
v
Architecture
Event Data → Generate User-Product Matrix → ALS (Spark/EMR) → User/Product Factors
User/Product Factors + Candidate Selection (over Event Data) → Score and Select Top N (Spark/EMR) → Run-time ranking for diversity
v
A Matrix Factorization Formulation for Implicit Feedback
User-Product Matrix R (M users × N products): raw purchase counts (e.g. 1, 9, 3, 20), with most entries unobserved
Binarize to get the Preference Matrix P (M × N): p = 1 where the user purchased the product, 0 otherwise
“Collaborative Filtering for Implicit Feedback Datasets” - Hu et al.
v
A Matrix Factorization Formulation for Implicit Feedback
P ≈ X Yᵀ
User Factors X (M × k) times Product Factors Yᵀ (k × N) approximate the Preference Matrix P (M × N)
v
Matrix Factorization from Implicit Feedback - The Intuition
#Purchases r | Preference p | Confidence c
0            | 0            | Low
1            | 1            | Low
>>1          | 1            | High

• Confidence increases linearly with purchases r: c = 1 + alpha * r
• alpha controls the marginal rate of learning from user purchases
• Key questions
  • How should the unobserved events be treated?
  • How should one trade off the observed and the unobserved?
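The mapping in the table above can be sketched in a few lines of NumPy (alpha = 40 is only an illustrative value; the right setting comes from tuning):

```python
import numpy as np

def preference_confidence(r, alpha=40.0):
    """Map raw purchase counts r to binary preference p and
    confidence c = 1 + alpha * r, per the table above."""
    r = np.asarray(r, dtype=float)
    p = (r > 0).astype(float)   # 0 purchases -> p = 0, otherwise p = 1
    c = 1.0 + alpha * r         # confidence grows linearly with r
    return p, c

p, c = preference_confidence([0, 1, 20])
# p = [0., 1., 1.]   c = [1., 41., 801.]
```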
v
Regularized Weighted Squared Loss
Loss = Σ_{u,i} c_ui (p_ui − x_uᵀ y_i)² + λ (Σ_u ‖x_u‖² + Σ_i ‖y_i‖²)
(confidence c_ui weights the squared error between preference p_ui and the user-factor / product-factor dot product; λ is the regularization term)
Solve using Alternating Least Squares
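A minimal dense NumPy sketch of the alternating least squares solve for this loss (the tiny example matrix and all parameter values are illustrative; in production this runs as Spark's implicit ALS over sparse data):

```python
import numpy as np

def implicit_als(R, k=2, alpha=40.0, lam=0.1, iters=5, seed=0):
    """Naive dense ALS for the Hu et al. implicit-feedback loss.
    R is the (M, N) matrix of raw purchase counts."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = (R > 0).astype(float)              # binary preferences
    C = 1.0 + alpha * R                    # confidence weights
    X = 0.1 * rng.standard_normal((M, k))  # user factors
    Y = 0.1 * rng.standard_normal((N, k))  # product factors
    I = lam * np.eye(k)
    for _ in range(iters):
        for u in range(M):   # solve each user row with Y held fixed
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(N):   # solve each product row with X held fixed
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

R = np.array([[1., 0., 9.],
              [0., 3., 0.],
              [20., 0., 1.]])
X, Y = implicit_als(R, k=2)
scores = X @ Y.T   # predicted preferences; high for confident purchases
```

Each sub-solve is the closed-form ridge regression for one row of factors, which is why the overall loss decreases at every alternation.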
v
Architecture
Event Data → Generate User-Product Matrix → ALS (Spark/EMR) → User/Product Factors
User/Product Factors + Candidate Selection (over Event Data) → Score and Select Top N (Spark/EMR) → Run-time ranking for diversity
v
Spark ALS Hyper-parameter Tuning
• rank k - diminishing returns after 150
• alpha - controls rate of learning from observed events
• iterations - ALS tends to converge within 5, seldom more than 10
• lambda - regularization parameter
v
Architecture
Event Data → Generate User-Product Matrix → ALS Matrix Factorization (Spark/EMR) → User/Product Factors
User/Product Factors + Candidate Selection (over Event Data) → Score and Select Top N (Spark/EMR) → Run-time ranking for diversity
v
Scoring users and products
With millions of products and users, scoring every (user, product) pair is prohibitive
Two goals in selecting products to score:
• Products that have an a priori high purchase rate (popular)
• Long-tail products which have not been discovered
Exclude previously purchased products
v
Candidate Product Selection
We start with simple stratified sampling
For each user, score N products
Sample h products from the head
Sample t products from the tail
N ~ 10,000; h ~ 3,000; t ~ 7,000
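This stratified sampling step can be sketched as follows (function name and the small example sizes are illustrative; the head/tail split itself would come from popularity statistics):

```python
import numpy as np

def candidate_products(head, tail, purchased, h, t, seed=0):
    """Stratified candidate selection: sample h products from the
    popular head and t from the long tail, excluding products the
    user already purchased (h ~ 3000, t ~ 7000 in the talk)."""
    rng = np.random.default_rng(seed)
    head = np.setdiff1d(head, purchased)   # drop prior purchases
    tail = np.setdiff1d(tail, purchased)
    picks = [rng.choice(head, size=min(h, len(head)), replace=False),
             rng.choice(tail, size=min(t, len(tail)), replace=False)]
    return np.concatenate(picks)

cands = candidate_products(head=np.arange(100),
                           tail=np.arange(100, 1000),
                           purchased=[5, 250], h=30, t=70)
```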
v
Architecture
Event Data → Generate User-Product Matrix → ALS (Spark/EMR) → User/Product Factors
User/Product Factors + Candidate Selection (over Event Data) → Score and Select Top N (Spark/EMR) → Run-time diversity ranking
v
Offline evaluation
• Ideally we want to evaluate user response to recommendations
  • But we will only know this from a live A/B test
• Recall-based metrics are an offline proxy (albeit not the best)
  • Recall: “Fraction of purchased products covered among Top N recommendations”
  • We only use this for hyper-parameter tuning
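The recall proxy defined above is a one-liner in practice (function name is illustrative):

```python
def recall_at_n(recommended, purchased, n):
    """Fraction of actually purchased products covered by the
    top-N recommendations -- the offline proxy described above."""
    top_n = set(recommended[:n])
    purchased = set(purchased)
    if not purchased:
        return 0.0
    return len(top_n & purchased) / len(purchased)

recall_at_n([5, 3, 9, 7], [3, 7, 11], n=3)  # covers {3} of 3 -> 1/3
```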
v
Tuning Spark For ALS
Understanding Spark execution model and its implementation of ALS helps
• Training is communication heavy [1]; set partitions <= #CPU cores
• Scoring is memory intensive
• Broad guidelines [2]
  • Limit executor memory to 64GB
  • 5 cores per executor
  • Set executors based on data size
[1] http://apache-spark-user-list.1001560.n3.nabble.com/Error-No-space-left-on-device-tp9887p9896.html
[2] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
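As a concrete sketch, the guidelines above translate into spark-submit settings along these lines (the script name and the specific numbers are hypothetical; executor count should be sized to your data):

```shell
# Illustrative cluster sizing following the slide's guidelines:
# executor memory well under 64GB, 5 cores per executor,
# executor count chosen for the data volume.
spark-submit \
  --master yarn \
  --executor-memory 48G \
  --executor-cores 5 \
  --num-executors 40 \
  als_train.py
```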
v
Online ranking for diversity
“Diversity within sessions, Novelty across sessions”
“Establish trust in a fresh and comprehensive catalog”
“Less is more”
Cached list of ~1000 products per user → final list of <100 products, ranked to promote diversity
v
Diversity
Top K products - ranked by score
Rank product categories by their median product score
v
Weighted sampling for diversity
Sample category in proportion to score
Within category, sample in proportion to product score
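The two-stage weighted sampling described above can be sketched like this (function name and the category/score layout are assumptions; scores are taken to be positive):

```python
import numpy as np

def diverse_sample(products, k, seed=0):
    """Two-stage weighted sampling for diversity: pick a category in
    proportion to its median product score, then a product within the
    category in proportion to its own score.
    `products` maps category -> {product_id: score}."""
    rng = np.random.default_rng(seed)
    chosen = []
    pool = {c: dict(items) for c, items in products.items() if items}
    while pool and len(chosen) < k:
        cats = list(pool)
        med = np.array([np.median(list(pool[c].values())) for c in cats])
        cat = rng.choice(cats, p=med / med.sum())       # sample category
        ids = list(pool[cat])
        scores = np.array([pool[cat][i] for i in ids])
        pick = rng.choice(ids, p=scores / scores.sum()) # sample product
        chosen.append(pick)
        del pool[cat][pick]              # draw without replacement
        if not pool[cat]:
            del pool[cat]
    return chosen
```

Sampling (rather than taking the top category every time) is what keeps successive sessions from looking identical.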
v
A/B Test Setup
Event Data → Generate User-Product Matrix → ALS (Spark/EMR) → User/Product Factors
User/Product Factors + Candidate Selection (over Event Data) → Score and Select Top N (Spark/EMR) → Run-time diversity ranking
Training: weekly, over the past N months of data
Scoring: weekly, for users with recent activity
v
A/B Test Results
• Statistically significant increases
  • Items per order
  • GMV per order
• Total product sales spread over more categories
v
What else do you do with user and product factors?
Score (user, product) pair on demand
Get Top N similar users
Get Top N similar products
As features in other models
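The first three uses are cheap once the factor matrices are in memory; a brute-force sketch (random factors stand in for trained ones, and the exact neighbor search is what annoy approximates at scale):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))   # user factors, illustrative stand-in
Y = rng.standard_normal((5000, 50))   # product factors

def score(user, product):
    """Score a (user, product) pair on demand."""
    return float(X[user] @ Y[product])

def similar_products(product, n=5):
    """Top-N similar products by cosine similarity over factors.
    Brute force; annoy gives an approximate, faster equivalent."""
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sims = Yn @ Yn[product]
    return np.argsort(-sims)[1:n + 1]   # skip the product itself
```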
v
Products similar to “Haigs Spicy Hummus”
More “Spicy Hummus”
Spicy Salsas
Generated using Approximate Nearest Neighbor (“annoy” from Spotify)
v
Ensembles
Use different types of evidence and/or product metadata to easily create ensembles
User x Products Purchased
User x Products Viewed
User x Brands Purchased
Model or Linear Combination
…
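The "linear combination" option is the simplest ensemble: score every candidate under each evidence type and blend. A hypothetical sketch (the fixed weights here would in practice be tuned or replaced by a learned model):

```python
import numpy as np

def ensemble_score(score_matrices, weights):
    """Blend score matrices from factorizations of different evidence
    types (e.g. purchases, views, brands) with normalized weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * s for w, s in zip(weights, score_matrices))

purchases = np.ones((2, 2))   # illustrative per-evidence score matrices
views = np.zeros((2, 2))
blended = ensemble_score([purchases, views], weights=[1.0, 1.0])
```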
v
What next
• Improve candidate selection by leveraging user and product factors
• Make recommendations more contextual
• Address cold-start problems, particularly for users
• Explain recommendations (“Because you did X”)
v
Fulfillment in Traditional E-commerce
• Manage inventory in warehouses optimized for quick fulfillment
• Users only specify “what” they want
• Disallow users from ordering out-of-stock products
• Set expectations
  • “3 day shipping” but will ship in 10 business days
v
Fulfillment for on-demand delivery from local retailers
• Shoppers navigate a complex environment where products
  • may have run out
  • may be misplaced
  • may be damaged
• User specifies “What”, “When” and “Where from”
• Improvise under uncertainty
v
Addressing new challenges in on-demand delivery
• Tight technology integrations help improve tracking of in-store availability
• Complemented by predictive models that estimate availability in real-time
• Last minute out of stocks can still happen
v
What makes a replacement acceptable?
• Several product attributes matter: Flavor, Package Size, Brand, Price, Diet Info
• Context matters; might benefit from personalization
• Must scale to millions of products
• Not always symmetric
  • May be ok to replace X with gluten-free X but not the other way around
v
Replacement Recommendations for Shoppers
• Shoppers are trained to pick replacements
• But shoppers can benefit from algorithmic suggestions
  • Many unfamiliar products in a vast catalog
  • Validation for common products
• Finding replacements fast improves operational efficiency
v
Replacement Recommendations for Customers
• Customers can specify replacements while placing the order
• Can choose to communicate with the shopper in store to verify