new advances in graph representations for ecommerce search - …€¦ · data science portugal...
Post on 11-Aug-2020
6 Views
Preview:
TRANSCRIPT
New advances in Graph Representations for Ecommerce Search
Pedro Balage
Data Science Portugal Meetup - DSPT #47
Lisbon, 08th January
About me
2
● Lead Data Scientist @ Farfetch - Search team
● Research and Industry experience in NLP
● Affiliated to Instituto de Telecomunicações
● Member of organization for the Lisbon Machine Learning School (LxMLS)
context3
● Online retail platform for fashion
● +100k search queries every day
● Search experience is key in ecommerce
business
The problem: How to make a good search engine for ecommerce?
4
Steps to develop a good search engine
5
Information Retrieval● Indexing● Retrieval● Learning-to-rank
Natural Language Processing● Language Analysers● Synonyms/Acronyms● Query understanding● Auto-suggestion● Did you mean?
Relevance Search● Measure● Tweak relevance minimum● Tweak boost field● Manage exceptions
Next steps...
6
● Why XYZ is not matching with XZY?
● What happens if I add this synonym?
● Let's add just one more exception.
Data Science
7
Click-through query logs
● Click-through logs provide query-document relevance!
● Incorporating user feedback is one of the most effective ways to improve a search engine.
● Lot of papers on the topic: SIGIR, KDD, RecSys, etc
Pros and Cons of using clickthrough data
8
Pros
● Relies on user's feedback
● Link interactions among products
Cons
● Noisy and Sparse
● Only provides a true positive set for relevance
● Relevance vs popularity
The problem: How to use the clickthrough data?
9
An overview of literature papers on representation learning for click through logs
11
Click Graph (SIGIR 2016)● Paper: Learning Query and Document Relevance from a Web-scale Click Graph
● Motivation
○ Improving coverage over purely click-based approaches
○ Efficient and scalable approach that can be easily applied to large scale click logs
■ Matrix factorization-based methods are not efficient for large graphs
■ Paper reports experiments with 25 billion query-document pairs
● Approach
○ Learn vector representation based on both content and click information
Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Jr., Dawei Yin, Yi Chang, and Chengxiang Zhai. 2016. Learning Query and Document Relevance from a Web-scale Click Graph. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 185-194.
12
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
QueriesProducts
iPhone XS
Huawei Mate 20 Pro
13
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Bipartite graph
14
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Feature vector ['Galaxy', 'S9', 'Samsung', 'Smartphone', 'Plus', 'J6', 'iPhone', 'XS', 'Huawei', 'Mate', '20', 'Pro']
|V| = 12
15
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Feature vector ['Galaxy', 'S9', 'Samsung', 'Smartphone', 'Plus', 'J6', 'iPhone', 'XS', 'Huawei', 'Mate', '20', 'Pro']
|V| = 12
16
Click Graph (SIGIR 2016)
0.17 0.380 0 00 0 00 0 00Samsung Galaxy
Tf-IDF Bag-of-words vector
17
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
18
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
3,000
1,700
520
19
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
3,000
1,700
520
20
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
21
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
22
Click Graph (SIGIR 2016)
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
23
Click Graph (SIGIR 2016)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
QueriesProducts
Transfer back
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
24
Click Graph (SIGIR 2016)
Galaxy S9
QueriesProducts
Transfer back
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Samsung Galaxy
Smartphone
Samsung
25
Click Graph (SIGIR 2016)
QueriesProducts
Transfer back
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
26
Click Graph (SIGIR 2016)
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
QueriesProducts
Transfer from one side to another
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
27
Click Graph (SIGIR 2016)
QueriesProducts
Transfer back
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
6,200
Galaxy iPhone
28
Click Graph (SIGIR 2016)
QueriesProducts
Transfer back
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
29
Click Graph (SIGIR 2016)
QueriesProducts
And now?
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
30
Click Graph (SIGIR 2016)
QueriesProducts
And now?
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Repeat both transfers again up to convergence!
31
How to include new queries?
Queries
Feature vector ['Galaxy', 'S9', 'Samsung', 'Smartphone', 'Plus', 'J6', 'iPhone', 'XS', 'Huawei', 'Mate', '20', 'Pro']
|V| = 12
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
Galaxy S10 ?
Split the new query into word units and calculate a linear combination of known queries with these words
32
How to retrieve products given a query?
● After convergence, both queries and products vectors are in the same vector space.
● For retrieving products, just compute the cosine similarity between vectors.
33
Click Graph (SIGIR 2016)
Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Jr., Dawei Yin, Yi Chang, and Chengxiang Zhai. 2016. Learning Query and Document Relevance from a Web-scale Click Graph. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 185-194.
34
Click Graph (SIGIR 2016)Pros
● Maximization for CTR (Click Through Rate)
● Scalable approach
● Improve coverage
Cons
● Sparse representation (based on bag-of-words)
● Too much focus on CTR
A new era: Neural Approaches
36
Why Neural Graphs?
Source: https://github.com/chihming/awesome-network-embedding
37
Representation Learning for Graphs
Source: Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).
38
Why Representation Learning for Graphs is hard?● Text and Images have a fixed grid structure
● Graphs are non-euclidian!
“The Brown fox jump”
Source: Hamilton, William L., Rex Ying, and Jure Leskovec. “TutorialRepresentation Learning on Networks”. The Web Conference, 2018
39
● Solution: Linearization of graph network by extracting a corpus of “sentences” (walks)
○ Random Walks!
● DeepWalk: apply Skip-Gram over the “sentences”
to extract node embeddings
DeepWalk (KDD 2014)
Source: "Deep Learning". Udacity https://www.udacity.com/course/ud730
40
● Graph Convolutional Networks
● Autoencoders
Other approaches
Source: Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).
Source: Thomas Kipf. Graph Convolutional Networks. https://tkipf.github.io/graph-convolutional-networks/
41
● 60+ different methods
○ 25+ only in 2018!
○ Trending topic
● More Info:
○ WWW-18 Tutorial Representation Learning on Networks
○ Awesome Network Embedding github repo
Other approaches
42
● Paper: BiNE: Bipartite Network Embedding
● Motivation
○ No previous work focused on bipartite graphs (recommendations, search logs, etc)
○ Model both explicit (observed links) and implicit information (unobserved but transitive links)
● Approach
○ Learn vector representation based on both content and click information
Ming Gao, Leihui Chen, Xiangnan He, and Aoying Zhou. 2018. BiNE: Bipartite Network Embedding. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). ACM, New York, NY, USA, 715-724.
BiNE (SIGIR 2018)
43
BiNE (SIGIR 2018)Explicit relations
● A good network embedding should be capable of reconstructing the original network!
Embedding space
44
BiNE (SIGIR 2018)Implicit relations
● Too many possible latent combinations
● Let's use random walks
45
BiNE (SIGIR 2018)
Galaxy S9
Samsung Galaxy
Smartphone
Samsung
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
46
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
Samsung
QueriesProducts
Random Walk
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
3,000
1,700
520
?
47
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
Samsung
2
QueriesProducts
Random Walk
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
3,000
1,700
520
?
48
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
Samsung
2
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Random Walk
?
49
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
3Samsung
2
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Random Walk
?
50
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
3Samsung
2
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Random Walk
51
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
3Samsung
2
4
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Random Walk
52
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
Smartphone
3Samsung
2
4
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Random Walk
53
BiNE (SIGIR 2018)
1Galaxy S9
Samsung Galaxy
5Smartphone
3Samsung
2
4
QueriesProducts
Samsung Galaxy S9
Samsung Galaxy S9 Plus
Samsung Galaxy J6
iPhone XS
Huawei Mate 20 Pro
Random Walk
Walk for Queries = ['Galaxy S9', 'Samsung', 'Smartphone']
54
BiNE (SIGIR 2018)
● DeepWalk
○ Collect a corpus of 'sentences' using deep walk
○ Compute Skip-Gram
Source: Vector Representations of Words. TensorFlow. https://www.tensorflow.org/tutorials/representation/word2vec
55
BiNE (SIGIR 2018)Implicit relations
● Too many possible latent combinations
● Let's use random walks
Objective function for implicit relations in U Objective function for implicit relations in V
56
BiNE (SIGIR 2018)Global objective function
● Allow to model both explicit and implicit relations
● α, β and γ allow a linear regularization between the explicit and implicit components.
57
BiNE (SIGIR 2018)
Ming Gao, Leihui Chen, Xiangnan He, and Aoying Zhou. 2018. BiNE: Bipartite Network Embedding. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). ACM, New York, NY, USA, 715-724.
58
Next steps
● Include side information about product and query (taxonomy, knowledge graph)
● Improve representations for the “cold start” problem
● Deal with the position bias introduced by the results
● Use as a feature for learning-to-rank approaches
● Neural Information Retrieval
New advances in Graph Representations for Ecommerce Search
Pedro Balage
Data Science Portugal Meetup - DSPT #47
Lisbon, 08th January
top related