
Thomas Nedelec

01/02/2017

RecSys Meetup

CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task of Product Recommendation

Copyright © 2016 Criteo

Talk outline

I. Presentation of our architecture: goals and main modules

II. Details on the TextCNN module

III. Our experimental results

IV. Future applications and directions of research


Motivation

Goal for Content2Vec: build the best product representation, meaning that it:

1. Takes into account all product signals in order to help with overall recommendation performance, and especially with performance on new products (cold start)

2. Defines the product-2-product similarity as a function of P(co-event of the product pair) in order to optimize for the scenario where the recommended products are retrieved by their similarity with a query product*

*assuming that optimizing the AUC of link prediction is a good proxy for online performance
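The retrieval scenario in point 2 can be sketched as scoring each candidate product by the sigmoid of its dot product with the query product's embedding, then ranking. All product names and embedding values below are invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def co_event_prob(u, v):
    """Estimated P(co-event) for a product pair, from their embeddings."""
    return sigmoid(sum(a * b for a, b in zip(u, v)))

# Toy catalog of product embeddings (values made up for the example).
catalog = {
    "book_a": [0.9, 0.1, 0.0],
    "book_b": [0.8, 0.2, 0.1],
    "lamp":   [-0.5, 0.9, 0.3],
}
query = [1.0, 0.0, 0.0]

# Recommended products = candidates ranked by similarity to the query product.
ranked = sorted(catalog, key=lambda p: co_event_prob(query, catalog[p]), reverse=True)
print(ranked)
```

Under this scoring, the two book-like embeddings outrank the unrelated one for a book-like query.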


1. Takes into account all product signals

• Represents Product Sequences:

• Product co-occurrences Representation: Prod2Vec

• Represents Product Information:

• Category Representation: Meta-Prod2Vec

• Image Representation: AlexNet

• Text Representation: Word2Vec, TextCNN


2. Merge the different signals

Adapt the initial product representations to the final task of predicting P(co-event):

• Find the representation that optimizes P(co-event): metric learning (logistic Siamese nets)

• Merge the representations from the different signals: ensemble learning
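A minimal sketch (toy numbers, not the authors' code) of the logistic Siamese idea: the same embedding is used on both sides of a product pair, and the pair score sigmoid(dot(e_a, e_b)) is pushed toward 1 for observed co-events and toward 0 for spurious pairs via logistic-loss SGD:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_prob(e_a, e_b):
    """Pair score: sigmoid of the dot product of the two product embeddings."""
    return sigmoid(sum(a * b for a, b in zip(e_a, e_b)))

def sgd_step(e_a, e_b, label, lr=0.1):
    """One logistic-loss gradient step on a pair (label 1 = co-event, 0 = spurious)."""
    g = pair_prob(e_a, e_b) - label      # d(loss)/d(dot product)
    new_a = [a - lr * g * b for a, b in zip(e_a, e_b)]
    new_b = [b - lr * g * a for a, b in zip(e_a, e_b)]
    return new_a, new_b

# Toy 2-d embeddings of a positive (co-viewed) pair.
e1, e2 = [0.1, -0.2], [0.3, 0.4]
before = pair_prob(e1, e2)
for _ in range(50):
    e1, e2 = sgd_step(e1, e2, label=1)   # observed co-event
after = pair_prob(e1, e2)
```

After training, a positive pair scores strictly higher than before: the metric has been adapted to the co-event prediction task.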

General Architecture

Integrating product embeddings in recommendation engines

Content2vec

I. Product Text Representation


I. Product Text Representation

Goal: To be able to estimate the similarities of products based on their text descriptions.


I.1 Word Representation

Embedding Solution:

• Word2Vec on the product description corpus:

• Concatenate all product descriptions from the Amazon dataset

• Run Word2Vec on top of this big file

• Get a representation for each word of the corpus
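The pipeline above amounts to simple preprocessing before Word2Vec training. A sketch with toy descriptions (a real run would feed the resulting token lists to a Word2Vec implementation such as gensim's):

```python
import re
from collections import Counter

# Toy product descriptions standing in for the Amazon catalog.
descriptions = [
    "A practical handbook for startup owners.",
    "The complete manual for small business owners.",
]

def tokenize(text):
    """Lowercase and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Concatenate all descriptions into one token stream per product:
# this is the "big file" that Word2Vec is then trained on.
corpus = [tokenize(d) for d in descriptions]
vocab = Counter(tok for sent in corpus for tok in sent)

# Word2Vec training over `corpus` would yield one dense vector per vocab word.
print(vocab.most_common(3))
```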


I.1 Word Representation

Similar Word Examples:

• Startup: startups, ecommerce, company, entrepreneurial, b2b, businesses, entrepreneurs, homebased, entrepreneur, cfo

• Owner: proprietor, owners, franchisee, manager, coo, partner, breeder, founder, realtor, franchisor

• Manual: handbook, workbook, guide, manuals, manualis, sourcebook, kit, labsim, guidebook, essentials
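Lists like these are typically produced by ranking the vocabulary by cosine similarity to the query word's vector. A toy sketch (the 2-d vectors here are invented; real Word2Vec vectors have hundreds of dimensions):

```python
import math

# Invented 2-d word vectors for illustration only.
vectors = {
    "manual":   [0.9, 0.1],
    "handbook": [0.85, 0.15],
    "guide":    [0.8, 0.2],
    "owner":    [-0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def neighbours(word, k=2):
    """The k vocabulary words most cosine-similar to `word`."""
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]),
                  reverse=True)[:k]

print(neighbours("manual"))
```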


I.2 Product Text Representation.

From word embeddings to full product description embedding

3 implemented architectures:

- sum of embeddings

- cross similarities

- TextCNN
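The first of these architectures, sum of embeddings, is simple enough to sketch in a few lines: a product description is represented as the element-wise sum of its word vectors. Word vectors and tokens below are toy values:

```python
# Invented word vectors for illustration.
word_vecs = {
    "business": [0.5, 0.1],
    "handbook": [0.2, 0.7],
    "owners":   [0.1, 0.3],
}

def product_embedding(tokens, dim=2):
    """Sum-of-embeddings representation of a product description."""
    out = [0.0] * dim
    for tok in tokens:
        vec = word_vecs.get(tok)
        if vec is None:
            continue  # out-of-vocabulary words are simply skipped
        out = [o + v for o, v in zip(out, vec)]
    return out

emb = product_embedding(["business", "handbook", "owners"])
```

This baseline ignores word order entirely, which is exactly the weakness the TextCNN architecture addresses.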

I.3 Product Text Representation: TextCNN

Convolutional Neural Networks for Sentence Classification:

http://arxiv.org/abs/1408.5882
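A compact plain-Python sketch of the convolution-plus-max-pooling at the heart of that architecture: each filter of width w is dotted with every w-word window of the embedded description, and max-over-time pooling keeps the strongest response per filter. Filter weights and embeddings are toy values, not trained ones:

```python
def conv_max(embedded, filt, width):
    """Max-over-time response of one filter over all `width`-word windows."""
    responses = []
    for i in range(len(embedded) - width + 1):
        window = [x for row in embedded[i:i + width] for x in row]  # flatten
        responses.append(sum(a * b for a, b in zip(window, filt)))
    return max(responses)

# Toy embedded sentence: 4 words, 2-dim embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

# Two filters of width 2 (each has 2*2 = 4 weights) and one of width 3.
filters = [([1.0, 0.0, 0.0, 1.0], 2),
           ([0.0, 1.0, 1.0, 0.0], 2),
           ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 3)]

# Concatenated per-filter maxima form the text representation.
text_repr = [conv_max(sentence, f, w) for f, w in filters]
```

In the real model the pooled vector is fed to further layers and the filter weights are learned end to end.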


I.4 Examples of filters

I.5 TextCNN implementation in TensorFlow

TextCNN TF Siamese Architecture


I.6 Other implemented architectures

• Prod2vec:

I.7 Other implemented architectures

• Image CNN:


II. Merge the different scores: A monster model is born!


II. Merge the representations from the different signals

Baseline: Linear combination of the modality specific similarities (C2V-linear)
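A sketch of what such a linear combination looks like: the per-modality similarities of a product pair are weighted, summed, and squashed into a probability. Weights, bias, and similarity values below are invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def c2v_linear(sims, weights, bias):
    """C2V-linear-style score: logistic regression over modality similarities."""
    z = bias + sum(w * s for w, s in zip(weights, sims))
    return sigmoid(z)

# Similarity of one product pair under each modality: [image, text, prod2vec].
pair_sims = [0.7, 0.9, 0.4]
p = c2v_linear(pair_sims, weights=[1.0, 2.0, 1.5], bias=-1.0)
print(p)
```

In the real model the weights and bias are learned on the co-event training set.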


II. Other types of ensemble methods

Other implemented models:

• Cross features (C2V-crossfeat)

• A fully connected layer to compress the features

• Learn a residual layer to keep using the strong signal from the different modalities and learn some dependencies between signals (C2V-res)
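The cross-feature variant can be sketched as follows: alongside each modality-specific similarity, every pairwise product of similarities is fed to the final classifier, letting it model interactions between modalities (e.g. "image AND text both similar"). Similarity values are invented; the real model learns which crosses matter:

```python
from itertools import combinations

def cross_features(sims):
    """Raw modality similarities plus all their pairwise products."""
    crosses = [a * b for a, b in combinations(sims, 2)]
    return sims + crosses

# [image, text, prod2vec] similarities for one product pair (toy values).
feats = cross_features([0.7, 0.9, 0.4])
# 3 raw similarities + 3 pairwise cross terms = 6 inputs to the classifier.
print(feats)
```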

III. Experimental Results


III. Experimental Results

Task: Link Prediction – predict a held-out set of product co-events, based on a training set of product co-events and their content features (catalog)

Dataset: Amazon book dataset, with info on title, description, image URL, and related products (co-view, co-sale)

Hard Cold Start Setting: the products in test have not been seen at training time, i.e. no CF signal is available

Metric: AUC (classification of true co-events vs. spurious ones)
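The AUC here can be computed directly from pair scores as the probability that a true co-event outranks a spurious one. A sketch with made-up scores:

```python
from itertools import product

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher;
    ties count as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))

positives = [0.9, 0.8, 0.4]   # model scores of true co-event pairs (toy values)
negatives = [0.7, 0.3]        # model scores of spurious pairs (toy values)

print(auc(positives, negatives))
```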

III. Experimental Results

Task 1: Hard Cold Start


IV. Scalability and putting it in production


IV. Scalability and putting it in production

• Lots of CPUs are great for evaluation

• Multi-modular architecture: easier to debug

• Make the model work better for cross-category pairs

• Next: experiments on building a category classifier (see "Is a Picture Worth a Thousand Words?", work done by a team working with Walmart)

• Link to our paper (in preparation for KDD 2017)

Thank you!