Journal club: Meta-Prod2Vec
Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation
Yuya Kanemoto
Vasile F et al. RecSys 2016
Neural embedding: Word2Vec (Skip-gram)
• A method for learning distributed vector representations that capture a large number of syntactic and semantic word relationships
• Example: vec(Tokyo) - vec(Japan) + vec(Germany) ≈ vec(Berlin)
• Word2Vec is essentially a two-layer neural network
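• Objective function (maximise the average log-probability of the context words within a window of size c around each word w_t of a training sequence w_1, ..., w_T):
$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$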
Mikolov T et al. 2013
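Skip-gram turns each sequence into (target, context) training pairs: every word is paired with each neighbour at most c positions away. A minimal sketch, assuming the sequence is a plain Python list of tokens; `skipgram_pairs` is a hypothetical helper, not part of any Word2Vec library. In the Prod2Vec setting the tokens would be product IDs.

```python
from typing import List, Tuple

def skipgram_pairs(sequence: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every token and every neighbour
    at most `window` positions away (offsets -c..c, excluding 0)."""
    pairs = []
    for t, target in enumerate(sequence):
        lo = max(0, t - window)
        hi = min(len(sequence), t + window + 1)
        for j in range(lo, hi):
            if j != t:  # skip the target itself
                pairs.append((target, sequence[j]))
    return pairs

# A listening session treated as a "sentence" of song IDs (hypothetical data).
session = ["s1", "s2", "s3", "s4"]
print(skipgram_pairs(session, window=1))
# → [('s1', 's2'), ('s2', 's1'), ('s2', 's3'), ('s3', 's2'), ('s3', 's4'), ('s4', 's3')]
```

Larger windows produce more pairs per token, trading training time for broader context.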
Skip-gram with negative sampling
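• With a full softmax, every training step must sum over the entire vocabulary in the denominator of the conditional probability, which is too slow for large data sets
• Negative sampling instead trains the model to distinguish each observed (target, context) pair from k negative samples drawn from a noise distribution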
Mikolov T et al. 2013
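Objective function (softmax form of the conditional probability; v and v' are the input and output vectors, W is the vocabulary size):
$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}$$
Objective function with negative sampling (replaces each \log p(w_O \mid w_I) term; P_n(w) is the noise distribution):
$$\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]$$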
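Negative sampling replaces the vocabulary-wide softmax with k binary logistic terms per (input, output) pair, and the expectation over the noise distribution is estimated with k sampled words. A minimal sketch of that loss (the negated objective), assuming plain Python lists as vectors and drawing negatives from the unigram distribution raised to the 3/4 power, the noise distribution reported by Mikolov et al.; all helper names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sgns_loss(v_in: Sequence[float], v_out: Sequence[float],
              v_negs: List[Sequence[float]]) -> float:
    """Negated SGNS objective for one (input, output) pair:
    -log sigma(v_out . v_in) - sum_i log sigma(-v_neg_i . v_in)."""
    loss = -log_sigmoid(dot(v_out, v_in))
    for v_neg in v_negs:
        loss -= log_sigmoid(-dot(v_neg, v_in))
    return loss

def sample_negatives(counts: Dict[str, int], k: int, rng: random.Random) -> List[str]:
    """Draw k negatives from the unigram distribution raised to the 3/4 power."""
    words = list(counts)
    weights = [counts[w] ** 0.75 for w in words]
    return rng.choices(words, weights=weights, k=k)

rng = random.Random(0)
print(sample_negatives({"s1": 100, "s2": 5, "s3": 3}, k=2, rng=rng))
print(sgns_loss([0.1, 0.2], [0.3, -0.1], [[0.0, 0.5], [-0.2, 0.1]]))
```

Each step therefore costs O(k) dot products instead of O(W).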
Embedding and Matrix Factorisation
• The objective of the embedding is closely related to matrix factorisation
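• Skip-gram with negative sampling can be viewed as implicitly factorising a word-context matrix whose entries are the shifted pointwise mutual information (SPMI), PMI(w, c) - log k, where k is the number of negative samples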
Levy O et al. 2014
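To make the SPMI matrix concrete, a sketch that builds its non-empty entries from (word, context) pair counts, assuming the pairs have already been extracted with a context window and k is the number of negative samples. Unobserved pairs have PMI = -inf, which is why Levy O et al. 2014 also consider the shifted positive PMI, max(SPMI, 0); a low-rank factorisation of that matrix (e.g. truncated SVD, not shown) then yields vectors. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def shifted_pmi(pairs: List[Tuple[str, str]], k: int) -> Dict[Tuple[str, str], float]:
    """SPMI(w, c) = PMI(w, c) - log k for every observed (word, context) pair,
    with PMI(w, c) = log(#(w, c) * |D| / (#(w) * #(c))) and |D| = number of pairs."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    return {
        (w, c): math.log(n_wc * total / (word_counts[w] * ctx_counts[c])) - math.log(k)
        for (w, c), n_wc in pair_counts.items()
    }

def shifted_ppmi(spmi: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """SPPMI = max(SPMI, 0); unobserved pairs (PMI = -inf) are implicitly 0."""
    return {key: max(value, 0.0) for key, value in spmi.items()}

pairs = [("tokyo", "japan"), ("tokyo", "japan"), ("berlin", "germany"), ("tokyo", "city")]
spmi = shifted_pmi(pairs, k=1)
print(round(spmi[("berlin", "germany")], 4))  # → 1.3863 (log 4)
```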
Neural embedding: Prod2Vec
• A method applying Skip-gram model for product recommendation
• When a user buys a product, the products whose vector representations are most similar to it are recommended
Grbovic M et al. 2015
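A minimal sketch of the recommendation step, assuming the product vectors have already been trained and are held in a dict, and assuming cosine similarity as the similarity measure; `recommend` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # assumes non-zero vectors

def recommend(query: str, embeddings: Dict[str, List[float]], top_k: int) -> List[str]:
    """Return the top_k products whose vectors are closest to the query product."""
    scored = [(pid, cosine(embeddings[query], vec))
              for pid, vec in embeddings.items() if pid != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in scored[:top_k]]

# Hypothetical 2-d product vectors, for illustration only.
embeddings = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0], "p4": [-1.0, 0.0]}
print(recommend("p1", embeddings, top_k=2))  # → ['p2', 'p3']
```

A linear scan is fine for a sketch; at catalogue scale an approximate nearest-neighbour index would normally replace it.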
Prod2Vec for popular songs
“Shake It Off” “All About That Bass”
Vasile F et al. 2016
Prod2Vec in cold start case
“You’re Not Sorry” “Du Hast”
Vasile F et al. 2016
Meta-Prod2Vec constraints
• Meta-Prod2Vec = Prod2Vec + product meta-data
• The aim is to address the cold-start problem: products with few or no interactions in the training sequences
Vasile F et al. 2016
Loss function of Prod2Vec
Vasile F et al. 2016
Negative sampling for Meta-Prod2Vec
Vasile F et al. 2016
Loss function of Meta-Prod2Vec
Vasile F et al. 2016
I: input product, J: output (context) product, M: metadata
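Meta-Prod2Vec adds loss terms involving the metadata M alongside the Prod2Vec term J|I, weighted by λ (λ = 1 in the parameters). As we read the paper, the total loss is L_{J|I} + λ (L_{M|I} + L_{J|M} + L_{I|M} + L_{M|M}); treat that exact form as our assumption and check it against the original. The sketch groups training pairs by term for one session, assuming one metadata value (e.g. artist) per product; `meta_prod2vec_pairs`, `meta_prod2vec_loss` and `pair_loss` are hypothetical names, and `pair_loss` stands in for any per-pair loss such as the negative-sampling loss.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (conditioning token, predicted token)

def meta_prod2vec_pairs(session: List[str], meta: Dict[str, str],
                        window: int) -> Dict[str, List[Pair]]:
    """Group (condition, prediction) pairs by loss term.
    Assumption: one metadata value per product, e.g. its artist."""
    pairs: Dict[str, List[Pair]] = {k: [] for k in ("J|I", "M|I", "J|M", "I|M", "M|M")}
    for t, item in enumerate(session):
        pairs["I|M"].append((meta[item], item))  # input item given its own metadata
        for j in range(max(0, t - window), min(len(session), t + window + 1)):
            if j == t:
                continue
            ctx = session[j]
            pairs["J|I"].append((item, ctx))              # Prod2Vec term
            pairs["M|I"].append((item, meta[ctx]))        # context metadata | input item
            pairs["J|M"].append((meta[item], ctx))        # context item | input metadata
            pairs["M|M"].append((meta[item], meta[ctx]))  # context metadata | input metadata
    return pairs

def meta_prod2vec_loss(pairs: Dict[str, List[Pair]],
                       pair_loss: Callable[[str, str], float], lam: float) -> float:
    """L_{J|I} + lam * (L_{M|I} + L_{J|M} + L_{I|M} + L_{M|M})."""
    main = sum(pair_loss(c, o) for c, o in pairs["J|I"])
    side = sum(pair_loss(c, o)
               for name in ("M|I", "J|M", "I|M", "M|M")
               for c, o in pairs[name])
    return main + lam * side

# Toy usage with a constant per-pair loss, just to show how lambda weights the side terms.
session = ["song1", "song2", "song3"]
meta = {"song1": "artistA", "song2": "artistA", "song3": "artistB"}
pairs = meta_prod2vec_pairs(session, meta, window=1)
print(meta_prod2vec_loss(pairs, lambda c, o: 1.0, lam=1.0))  # → 19.0
```

With lam = 0 the loss reduces to the Prod2Vec term alone.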
Evaluation of experiments
Vasile F et al. 2016
• Hit ratio at K (HR@K): whether the test product appears in the top-K list of recommended products (ignores the rank of the test product within the list)
• Normalised discounted cumulative gain (NDCG@K): a measure of ranking quality based on the graded relevance of the recommended entities. It ranges from 0 to 1, with 1 representing the ideal ranking of the entities.
NDCG@k = DCG@k / IDCG@k
IDCG: the maximum possible (ideal) DCG for a given query
rel_i: graded relevance of the result at position i
k: maximum number of entities that can be recommended
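A sketch of both metrics, assuming relevance scores listed in recommendation order. It uses the linear-gain form rel_i / log2(i + 1) of DCG; some papers use 2^{rel_i} - 1 as the gain instead, and the two agree when relevance is binary (as with a single held-out product). IDCG is computed here by sorting the same relevance list, and the helper names are hypothetical.

```python
import math
from typing import Sequence

def hit_ratio_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the held-out product is in the top-k list, regardless of its rank."""
    return 1.0 if target in ranked[:k] else 0.0

def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k = sum_{i=1..k} rel_i / log2(i + 1), with 1-based positions i."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

ranked = ["p7", "p3", "p9"]  # hypothetical recommendation list
relevances = [1.0 if p == "p3" else 0.0 for p in ranked]  # single held-out product p3
print(hit_ratio_at_k(ranked, "p3", k=1), hit_ratio_at_k(ranked, "p3", k=2))  # → 0.0 1.0
print(ndcg_at_k(relevances, k=3))  # 1 / log2(3), roughly 0.63
```

With a single relevant product at rank r, NDCG@k reduces to 1 / log2(r + 1) when r ≤ k, so it rewards placing the product higher while HR@k does not.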
Methods for comparison
Vasile F et al. 2016
• BestOf: recommends the most popular products
• CoCounts: cosine similarity between the products' co-occurrence count vectors (basic collaborative filtering)
• Prod2Vec
• Meta-Prod2Vec
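• Mix(Prod2Vec, CoCounts): linear combination of the two similarity scores, with weight α on Prod2Vec and 1 - α on CoCounts
• Mix(Meta-Prod2Vec, CoCounts): the same combination with Meta-Prod2Vec in place of Prod2Vec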
Parameters
• Number of songs: 433k
• Number of artists: 67k
• Embedding dimension: 50
• Context window size: 3
• λ: 1
• α: 0.15
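A sketch of the CoCounts baseline and the Mix ensemble, under two assumptions: each product is represented by its vector of co-occurrence counts with other products inside the context window, and Mix is a linear blend in which α (0.15 in the parameters) weights the embedding similarity, consistent with the 15% Meta-Prod2Vec share noted in the discussion. Function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def cocount_vectors(sessions: List[List[str]], window: int) -> Dict[str, Counter]:
    """For each product, count how often every other product appears within `window` positions."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for t, item in enumerate(session):
            for j in range(max(0, t - window), min(len(session), t + window + 1)):
                if j != t:
                    vectors[item][session[j]] += 1
    return vectors

def cosine_sparse(a: Counter, b: Counter) -> float:
    dot = sum(count * b[key] for key, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mix_score(embedding_score: float, cocount_score: float, alpha: float) -> float:
    """Linear blend: weight alpha on the embedding similarity, 1 - alpha on CoCounts."""
    return alpha * embedding_score + (1.0 - alpha) * cocount_score

sessions = [["a", "b", "c"], ["a", "b"]]  # hypothetical listening sessions
vectors = cocount_vectors(sessions, window=1)
cc = cosine_sparse(vectors["a"], vectors["c"])
print(cc, mix_score(embedding_score=0.2, cocount_score=cc, alpha=0.15))
```

CoCounts gives no signal for a product that never co-occurs with others, which is where the embedding share of the blend is expected to help.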
Relative importance of meta data
Vasile F et al. 2016
Improvement in cold start
Vasile F et al. 2016
Cold start
Improvement in cold start
Vasile F et al. 2016
Better performance in ensemble model
Vasile F et al. 2016
Discussion
• Metadata was informative, especially in the cold-start case
• The ensemble with CoCounts (15% weight on Meta-Prod2Vec) worked well
• Limitations: no comparison with matrix factorisation methods or with other Word2Vec variants that utilise metadata