new advances in graph representations for ecommerce search - …€¦ · data science portugal...

Post on 11-Aug-2020

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

New advances in Graph Representations for Ecommerce Search

Pedro Balage

Data Science Portugal Meetup - DSPT #47

Lisbon, 08th January

About me

2

● Lead Data Scientist @ Farfetch - Search team

● Research and Industry experience in NLP

● Affiliated to Instituto de Telecomunicações

● Member of organization for the Lisbon Machine Learning School (LxMLS)

context3

● Online retail platform for fashion

● +100k search queries every day

● Search experience is key in ecommerce

business

The problem: How to make a good search engine for ecommerce?

4

Steps to develop a good search engine

5

Information Retrieval● Indexing● Retrieval● Learning-to-rank

Natural Language Processing● Language Analysers● Synonyms/Acronyms● Query understanding● Auto-suggestion● Did you mean?

Relevance Search● Measure● Tweak relevance minimum● Tweak boost field● Manage exceptions

Next steps...

6

● Why XYZ is not matching with XZY?

● What happens if I add this synonym?

● Let's add just one more exception.

Data Science

7

Click-through query logs

● Click-through logs provide query-document relevance!

● Incorporating user feedback is one of the most effective ways to improve a search engine.

● Lot of papers on the topic: SIGIR, KDD, RecSys, etc

Pros and Cons of using clickthrough data

8

Pros

● Relies on user's feedback

● Link interactions among products

Cons

● Noisy and Sparse

● Only provides a true positive set for relevance

● Relevance vs popularity

The problem: How to use the clickthrough data?

9

An overview of literature papers on representation learning for click through logs

11

Click Graph (SIGIR 2016)● Paper: Learning Query and Document Relevance from a Web-scale Click Graph

● Motivation

○ Improving coverage over purely click-based approaches

○ Efficient and scalable approach that can be easily applied to large scale click logs

■ Matrix factorization-based methods are not efficient for large graphs

■ Paper reports experiments with 25 billion query-document pairs

● Approach

○ Learn vector representation based on both content and click information

Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Jr., Dawei Yin, Yi Chang, and Chengxiang Zhai. 2016. Learning Query and Document Relevance from a Web-scale Click Graph. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 185-194.

12

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

QueriesProducts

iPhone XS

Huawei Mate 20 Pro

13

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Bipartite graph

14

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Feature vector ['Galaxy', 'S9', 'Samsung', 'Smartphone', 'Plus', 'J6', 'iPhone', 'XS', 'Huawei', 'Mate', '20', 'Pro']

|V| = 12

15

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Feature vector ['Galaxy', 'S9', 'Samsung', 'Smartphone', 'Plus', 'J6', 'iPhone', 'XS', 'Huawei', 'Mate', '20', 'Pro']

|V| = 12

16

Click Graph (SIGIR 2016)

0.17 0.380 0 00 0 00 0 00Samsung Galaxy

Tf-IDF Bag-of-words vector

17

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

18

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

3,000

1,700

520

19

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

3,000

1,700

520

20

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

21

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

22

Click Graph (SIGIR 2016)

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

23

Click Graph (SIGIR 2016)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

QueriesProducts

Transfer back

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

24

Click Graph (SIGIR 2016)

Galaxy S9

QueriesProducts

Transfer back

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Samsung Galaxy

Smartphone

Samsung

25

Click Graph (SIGIR 2016)

QueriesProducts

Transfer back

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

26

Click Graph (SIGIR 2016)

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

QueriesProducts

Transfer from one side to another

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

27

Click Graph (SIGIR 2016)

QueriesProducts

Transfer back

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

6,200

Galaxy iPhone

28

Click Graph (SIGIR 2016)

QueriesProducts

Transfer back

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

29

Click Graph (SIGIR 2016)

QueriesProducts

And now?

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

30

Click Graph (SIGIR 2016)

QueriesProducts

And now?

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Repeat both transfers again up to convergence!

31

How to include new queries?

Queries

Feature vector ['Galaxy', 'S9', 'Samsung', 'Smartphone', 'Plus', 'J6', 'iPhone', 'XS', 'Huawei', 'Mate', '20', 'Pro']

|V| = 12

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

Galaxy S10 ?

Split the new query into word units and calculate a linear combination of known queries with these words

32

How to retrieve products given a query?

● After convergence, both queries and products vectors are in the same vector space.

● For retrieving products, just compute the cosine similarity between vectors.

33

Click Graph (SIGIR 2016)

Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Jr., Dawei Yin, Yi Chang, and Chengxiang Zhai. 2016. Learning Query and Document Relevance from a Web-scale Click Graph. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 185-194.

34

Click Graph (SIGIR 2016)Pros

● Maximization for CTR (Click Through Rate)

● Scalable approach

● Improve coverage

Cons

● Sparse representation (based on bag-of-words)

● Too much focus on CTR

A new era: Neural Approaches

36

Why Neural Graphs?

Source: https://github.com/chihming/awesome-network-embedding

37

Representation Learning for Graphs

Source: Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).

38

Why Representation Learning for Graphs is hard?● Text and Images have a fixed grid structure

● Graphs are non-euclidian!

“The Brown fox jump”

Source: Hamilton, William L., Rex Ying, and Jure Leskovec. “TutorialRepresentation Learning on Networks”. The Web Conference, 2018

39

● Solution: Linearization of graph network by extracting a corpus of “sentences” (walks)

○ Random Walks!

● DeepWalk: apply Skip-Gram over the “sentences”

to extract node embeddings

DeepWalk (KDD 2014)

Source: "Deep Learning". Udacity https://www.udacity.com/course/ud730

40

● Graph Convolutional Networks

● Autoencoders

Other approaches

Source: Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).

Source: Thomas Kipf. Graph Convolutional Networks. https://tkipf.github.io/graph-convolutional-networks/

41

● 60+ different methods

○ 25+ only in 2018!

○ Trending topic

● More Info:

○ WWW-18 Tutorial Representation Learning on Networks

○ Awesome Network Embedding github repo

Other approaches

42

● Paper: BiNE: Bipartite Network Embedding

● Motivation

○ No previous work focused on bipartite graphs (recommendations, search logs, etc)

○ Model both explicit (observed links) and implicit information (unobserved but transitive links)

● Approach

○ Learn vector representation based on both content and click information

Ming Gao, Leihui Chen, Xiangnan He, and Aoying Zhou. 2018. BiNE: Bipartite Network Embedding. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). ACM, New York, NY, USA, 715-724.

BiNE (SIGIR 2018)

43

BiNE (SIGIR 2018)Explicit relations

● A good network embedding should be capable of reconstructing the original network!

Embedding space

44

BiNE (SIGIR 2018)Implicit relations

● Too many possible latent combinations

● Let's use random walks

45

BiNE (SIGIR 2018)

Galaxy S9

Samsung Galaxy

Smartphone

Samsung

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

46

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

Samsung

QueriesProducts

Random Walk

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

3,000

1,700

520

?

47

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

Samsung

2

QueriesProducts

Random Walk

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

3,000

1,700

520

?

48

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

Samsung

2

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Random Walk

?

49

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

3Samsung

2

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Random Walk

?

50

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

3Samsung

2

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Random Walk

51

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

3Samsung

2

4

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Random Walk

52

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

Smartphone

3Samsung

2

4

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Random Walk

53

BiNE (SIGIR 2018)

1Galaxy S9

Samsung Galaxy

5Smartphone

3Samsung

2

4

QueriesProducts

Samsung Galaxy S9

Samsung Galaxy S9 Plus

Samsung Galaxy J6

iPhone XS

Huawei Mate 20 Pro

Random Walk

Walk for Queries = ['Galaxy S9', 'Samsung', 'Smartphone']

54

BiNE (SIGIR 2018)

● DeepWalk

○ Collect a corpus of 'sentences' using deep walk

○ Compute Skip-Gram

Source: Vector Representations of Words. TensorFlow. https://www.tensorflow.org/tutorials/representation/word2vec

55

BiNE (SIGIR 2018)Implicit relations

● Too many possible latent combinations

● Let's use random walks

Objective function for implicit relations in U Objective function for implicit relations in V

56

BiNE (SIGIR 2018)Global objective function

● Allow to model both explicit and implicit relations

● α, β and γ allow a linear regularization between the explicit and implicit components.

57

BiNE (SIGIR 2018)

Ming Gao, Leihui Chen, Xiangnan He, and Aoying Zhou. 2018. BiNE: Bipartite Network Embedding. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). ACM, New York, NY, USA, 715-724.

58

Next steps

● Include side information about product and query (taxonomy, knowledge graph)

● Improve representations for the “cold start” problem

● Deal with the position bias introduced by the results

● Use as a feature for learning-to-rank approaches

● Neural Information Retrieval

New advances in Graph Representations for Ecommerce Search

Pedro Balage

Data Science Portugal Meetup - DSPT #47

Lisbon, 08th January

top related