polar: attention-based cnn for one-shot personalized ... · 1vinyals et al., matching networks for...

33
Introduction & Problem Definition Approach Experiments & Analysis POLAR: Attention-based CNN for One-shot Personalized Article Recommendation Zhengxiao Du, Jie Tang, Yuhui Ding Tsinghua University {duzx16, dingyh15}@mails.tsinghua.edu.cn, [email protected] September 13, 2018 1 / 20

Upload: others

Post on 01-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

POLAR: Attention-based CNN for One-shotPersonalized Article Recommendation

Zhengxiao Du, Jie Tang, Yuhui Ding

Tsinghua University

{duzx16, dingyh15}@mails.tsinghua.edu.cn,[email protected]

September 13, 2018

1 / 20

Page 2: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Motivation

The publication output is growing every year(data source: DBLP)

1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 20160k

50k

100k

150k

200k

250k

300kBook and ThesesConference and Workshop PapersEditorshipInformal PublicationsJournal ArticlesParts in Books or CollectionsReference Works

2 / 20

Page 3: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Related-Article Recommendation

Figure: An example from AMiner.org

3 / 20

Page 4: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Challenge

How to provide personalized and non-personalizedrecommendation?

How to overcome the sparsity of user feedback?

How to utilize representative texts of articles effectively?

4 / 20

Page 5: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Problem Definition

Definition

One-shot Personalized Article Recommendation Problem

Input: query article dqcandidate set D = {d1, d2, · · · , dN}support set S = {(di , yi )}Ti=1 related to user u

Output: a totally ordered set R(dq,S) ⊂ D with |R| = k

5 / 20

Page 6: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Problem Definition

Definition

One-shot Personalized Article Recommendation Problem

Input: query article dqcandidate set D = {d1, d2, · · · , dN}support set S = {(di , yi )}Ti=1 related to user u

Output: a totally ordered set R(dq,S) ⊂ D with |R| = k

5 / 20

Page 7: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Problem Definition

Definition

One-shot Personalized Article Recommendation Problem

Input: query article dqcandidate set D = {d1, d2, · · · , dN}support set S = {(di , yi )}Ti=1 related to user u

Output: a totally ordered set R(dq, S) ⊂ D with |R| = k

5 / 20

Page 8: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

One-shot Learning

Image Classification1

y =k∑

i=1

a(x , xi )yi

Article Recommendation

Query article dq

Support set{(di , yi )}Ti=1

si = c(dq, di ) + 1T

∑Tj=1 c(di , dj)yj

the matching to the queryarticle

the matching to the userpreference(maybe missing)

1Vinyals et al., Matching Networks for One Shot Learning.6 / 20

Page 9: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

One-shot Learning

Image Classification1

y =k∑

i=1

a(x , xi )yi

Article Recommendation

Query article dq

Support set{(di , yi )}Ti=1

si = c(dq, di ) +

1T

∑Tj=1 c(di , dj)yj

the matching to the queryarticle

the matching to the userpreference(maybe missing)

1Vinyals et al., Matching Networks for One Shot Learning.6 / 20

Page 10: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

One-shot Learning

Image Classification1

y =k∑

i=1

a(x , xi )yi

Article Recommendation

Query article dq

Support set{(di , yi )}Ti=1

si =

c(dq, di ) +

1T

∑Tj=1 c(di , dj)yj

the matching to the queryarticle

the matching to the userpreference(maybe missing)

1Vinyals et al., Matching Networks for One Shot Learning.6 / 20

Page 11: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

One-shot Learning

Image Classification1

y =k∑

i=1

a(x , xi )yi

Article Recommendation

Query article dq

Support set{(di , yi )}Ti=1

si = c(dq, di ) + 1T

∑Tj=1 c(di , dj)yj

the matching to the queryarticle

the matching to the userpreference(maybe missing)

1Vinyals et al., Matching Networks for One Shot Learning.6 / 20

Page 12: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Architecture

EmbeddingLayer

Candidate

kwdld

kwd2

kwd1

· · ·

Query

kwqlq

kwq2

kwq1

· · ·

PersonalizedScore

7 / 20

Page 13: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Architecture

EmbeddingLayer

Candidate

kwdld

kwd2

kwd1

· · ·

Query

kwqlq

kwq2

kwq1

· · ·

MatchingMatrix

AttentionMatrix

ConvInput

PersonalizedScore

7 / 20

Page 14: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Architecture

EmbeddingLayer

Candidate

kwdld

kwd2

kwd1

· · ·

Query

kwqlq

kwq2

kwq1

· · ·

MatchingMatrix

AttentionMatrix

FeatureMap

ConvInput

Convolution andMax-Pooling

Hidden State

Full-ConnectedLayer

MatchingScore

PersonalizedScore

7 / 20

Page 15: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Architecture

EmbeddingLayer

Candidate

kwdld

kwd2

kwd1

· · ·

Query

kwqlq

kwq2

kwq1

· · ·

MatchingMatrix

AttentionMatrix

FeatureMap

ConvInput

Convolution andMax-Pooling

Hidden State

Full-ConnectedLayer

Support Set

d1

y1

d2

y2

dT

yT

MatchingScore

PersonalizedScore

One ShotMatching

7 / 20

Page 16: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Architecture

EmbeddingLayer

Candidate

kwdld

kwd2

kwd1

· · ·

Query

kwqlq

kwq2

kwq1

· · ·

MatchingMatrix

AttentionMatrix

FeatureMap

ConvInput

Convolution andMax-Pooling

Hidden State

Full-ConnectedLayer

Support Set

d1

y1

d2

y2

dT

yT

MatchingScore

PersonalizedScore

FinalScore

One ShotMatching

7 / 20

Page 17: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Matching Matrix and Attention Matrix

Matching Matrix:(dm, dn)→ Rlm×ln

the similarity between the words of two articles.

M(m,n)i ,j =

~wTmi · ~wnj

‖~wmi‖ · ‖~wnj‖

Attention Matrix:(dm, dn)→ Rlm×ln

the importance of the matching signals

A(m,n)i ,j = rmi · rnj

8 / 20

Page 18: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Matching Matrix and Attention Matrix

Matching Matrix:(dm, dn)→ Rlm×ln

the similarity between the words of two articles.

M(m,n)i ,j =

~wTmi · ~wnj

‖~wmi‖ · ‖~wnj‖

Attention Matrix:(dm, dn)→ Rlm×ln

the importance of the matching signals

A(m,n)i ,j = rmi · rnj

8 / 20

Page 19: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Local Weight and Global Weight

The word weight rt is the product of its local weight and globalweight.

Global Weight: The importance of a word in thecorpus(shared among different articles)

υij = [IDF(tij)]β

The local weight is a little more complicated. . .

9 / 20

Page 20: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Local Weight and Global Weight

The word weight rt is the product of its local weight and globalweight.

Global Weight: The importance of a word in thecorpus(shared among different articles)

υij = [IDF(tij)]β

The local weight is a little more complicated. . .

9 / 20

Page 21: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Local Weight and Global Weight

The word weight rt is the product of its local weight and globalweight.

Global Weight: The importance of a word in thecorpus(shared among different articles)

υij = [IDF(tij)]β

The local weight is a little more complicated. . .

9 / 20

Page 22: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Local Weight

Local Weight: The importance of a word in the article

A neural network is employed to compute the local weight.

The feature vector forword tij

~xij = ~wij − ~w i

1

1

2

2

3

3

4

4

5

5

6

6

7

7

0

The triangular points denote thevectors of the words in two texts

The circular points denote themean vectors of the texts.

The lines with arrowsdenote the feature vectors

10 / 20

Page 23: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Local Weight Network

The feature vector ~xij represents the semantic differencebetween the article and the term.

Let ~u(L)ij be the output of the last linear layer, the output of

the local weight network is

µij = σ(W (L) · ~u (L)ij + b(L)) + α

α sets a lower bound for local weights.

11 / 20

Page 24: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

CNN & Training

The matching matrix and attention matrix are combined byelement-wise multiplication and sent to a CNN.

MatchingMatrix

AttentionMatrix

FeatureMap

ConvInput

Convolution andMax-Pooling

Hidden State

Full-ConnectedLayer

MatchingScore

The entire model, including the local weight network, istrained on the target task.

12 / 20

Page 25: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Dataset

AMiner: papers from ArnetMiner1

Patent: patent documents from USPTO

RARD (Related Article Recommendation Dataset2):fromSowiport, a digital library service provider.

1Tang et al. ArnetMiner: Extraction and Mining of Academic SocialNetworks. In SIGKDD’2008.

2Beel et al. Rard: The related-article recommendation dataset (2017)13 / 20

Page 26: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Experimental Results

Table: Results of recommendation without personalization(%).

AMiner Patent RARDMethod NG@3 NG@5 NG@10 NG@3 NG@5 NG@10 NG@1 NG@3 NG@5

TF-IDF 74.3 81.8 87.5 51.8 56.4 63.4 37.6 39.8 46.3Doc2Vec 60.0 65.8 79.1 44.6 45.6 53.5 28.4 34.0 40.0

WMD 73.0 76.3 86.2 57.4 58.5 61.9 23.4 38.2 46.8

MV-LSTM 56.2 61.2 76.2 60.2 59.0 65.0 22.2 30.7 39.3Duet 66.6 74.4 82.6 54.5 57.5 64.6 22.3 31.1 39.8

DRMM 75.0 79.9 87.1 55.0 56.2 64.7 33.1 36.3 40.6MatchPyramid 73.5 80.0 86.8 56.4 61.4 64.4 29.1 36.2 42.8

POLAR 80.3 85.2 90.1 67.8 69.5 73.6 42.8 46.3 51.5

1For the fairness of comparison, all models don’t involve personalization.2NG stands for NDCG. 14 / 20

Page 27: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

How One-shot Personalization Can Help

Randomly divide the labeled articles into the support set andthe candidate set to recommend.

POLAR-OS is the proposed one-shot framework andPOLAR-ALL the best model that ignores support sets in theprevious part.

Table: Performance for the model with and without personalization.

AMiner Patent RARDMethod NDCG@1 NDCG@3 NDCG@1 NDCG@3 NDCG@1 NDCG@3

POLAR-OS 79.1 81.9 57.1 69.7 39.4 39.2POLAR-ALL 76.1 79.2 52.3 66.2 36.5 36.5

15 / 20

Page 28: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

How Local and Global Weights Can Help

When computing the attention matrix, POLAR-LOC only useslocal weights while POLAR-GLO only global weights.

Aminer76

77

78

79

80

81

NDCG

@3

Patent

60

62

64

66

68

RARD43.0

43.5

44.0

44.5

45.0

45.5

46.0

46.5

47.0POLAR-LOCPOLAR-GLOPOLAR-ALL

Figure: The performance of different attention matrices

16 / 20

Page 29: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Case Study: How Local and Global Weights work?

0 2 4 6 8

0

2

4

6

8

matching matrix

0 2 4 6 8

0

2

4

6

8

global weights

0 2 4 6 8

0

2

4

6

8

local weights

0 2 4 6 8

0

2

4

6

8

weighted matching matrix

Figure: The visualization result of four matrices used in the matching of apair of texts. The brighter the pixel is, the larger value it has.T1:novel robust stability criteria (for) stochastic hopfield neural networks(with) time delays.T2:new delay dependent stability criteria (for) neural networks (with)time varying delay

17 / 20

Page 30: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Case Study: How Local and Global Weights work?

Table: The statistical analysis of the local and global weights

Weight Max Min Mean Std

Local 2.00 1.00 1.20 0.15

Global 1.96 1.08 1.86 0.08

17 / 20

Page 31: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Sensitivity Analysis of Hyperparameters

0 0.25 0.5 1 2 4local weight coefficient

86.0

86.5

87.0

87.5

88.0

88.5

89.0

89.5

NDCG

@10

0 1/8 1/4 1/2 1global weight coefficient

86.0

86.5

87.0

87.5

88.0

88.5

89.0

89.5

NDCG

@10

Figure: Performance comparison for POLAR-LOC with different αs andPOLAR-GLO with different βs on the AMiner dataset

18 / 20

Page 32: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Conclusion

We define the problem of one-shot personalized articlerecommendation.

We utilize the framework of one-shot learning to deal with thesparse user feedback and propose an attention-based CNN fortext similarity.

We conduct experiments, whose results prove the effectivenessof the proposed model.

19 / 20

Page 33: POLAR: Attention-based CNN for One-shot Personalized ... · 1Vinyals et al., Matching Networks for One Shot Learning. 6/20. Introduction & Problem De nitionApproachExperiments & Analysis

Introduction & Problem Definition Approach Experiments & Analysis

Any Questions?

20 / 20