
Sentiment Analysis of Peer Review Texts for Scholarly Papers

Ke Wang & Xiaojun Wan
{wangke17,wanxiaojun}@pku.edu.cn

July 9, 2018

Institute of Computer Science and Technology, Peking University, Beijing, China

Outline

1. Introduction

2. Related Work

3. Framework

4. Experiments

5. Conclusion and Future Work


Introduction

• The boom of scholarly papers
• Motivations
  • Help the review submission system detect the consistency of review texts and scores.
  • Help the chair write a comprehensive meta-review.
  • Help authors further improve their papers.

Figure 1: An example of peer review text and the analysis results.

Introduction

• Challenges
  • Long length.
  • Mixture of non-opinionated and opinionated texts.
  • Mixture of pros and cons.

• Contributions
  • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
  • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
  • Evaluation results demonstrate the efficacy of the proposed model and show that using the paper abstract as memory is highly helpful.


Related Work

• Sentiment Classification
  Sentiment analysis has been widely explored in many text domains, but few studies have tried to perform it in the domain of peer reviews for scholarly papers.

• Multiple Instance Learning
  MIL can extract instance labels (sentence-level polarities) from bags (reviews in our case), but no previous work has applied it to this challenging task.

• Memory Network
  A memory network utilizes external information for greater capacity and efficiency.

• Study on Peer Reviews
  Existing studies on peer reviews address tasks that are related to, but different from, the sentiment analysis task addressed in this study.


Framework

• Architecture
  1. Input Representation Layer
  2. Sentence Classification Layer
  3. Review Classification Layer

Figure 2: The architecture of MILAM.
[Diagram not reproduced. Labels shown: abstract text $T_{abstract}$ with sentences $S^a_1, \ldots, S^a_m$; review text $T_{review}$ with sentences $S^r_1, \ldots, S^r_n$; sentence embedding, convolution, and max pooling producing inputs $I_1, \ldots, I_n$ and memories $M_1, \ldots, M_m$; the Abstract-based Memory Mechanism with matched attention $E^{(i)}$ and response content $R^{(i)}$; MLP outputs $V_1, \ldots, V_n$; softmax outputs $P_1, \ldots, P_n$; hidden states $h_1, \ldots, h_n$ with document attention weights $a_1, \ldots, a_n$; a Sum node producing $P_{review}$.]

Framework

1. Input Representation Layer:
   I. A sentence $S$ of length $L$ (padded where necessary) is represented as:

      $$S = w_1 \oplus w_2 \oplus \cdots \oplus w_L, \quad S \in \mathbb{R}^{L \times d} \tag{1}$$

   II. The convolutional layer:

      $$f_k = \tanh(W_c \cdot W_{k-l+1:k} + b_c) \tag{2}$$

      $$f^{(q)} = [f^{(q)}_1, f^{(q)}_2, \cdots, f^{(q)}_{L-l+1}] \tag{3}$$

   III. A max-pooling layer:

      $$u_q = \max\{f^{(q)}\} \tag{4}$$

Finally, the representations of the review text $\{S^r_i\}_{i=1}^{n}$ and the abstract text $\{S^a_j\}_{j=1}^{m}$ are denoted as $[I_i]_{i=1}^{n}$ and $[M_j]_{j=1}^{m}$ respectively, where $I_i, M_j \in \mathbb{R}^{z}$.
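As a rough illustration of the convolution and max-pooling steps in Eqs. (2) to (4), the sketch below applies a single filter to a toy sentence in plain Python. The function names, the list-of-lists layout, and the scalar bias are assumptions made for this example; the slides do not specify an implementation, and a real model would use many filters and a tensor library.

```python
import math

def conv_features(sentence, filt, bias):
    """Slide one filter over a sentence and return the feature map.

    sentence: list of L word vectors, each a list of d floats
    filt:     list of l weight vectors, each a list of d floats (hypothetical layout)
    bias:     scalar bias
    Returns the L - l + 1 values tanh(filter . window + bias), as in Eq. (2)-(3).
    """
    L, l = len(sentence), len(filt)
    features = []
    for start in range(L - l + 1):
        window = sentence[start:start + l]
        # Element-wise product of the filter and the window, summed to a scalar.
        total = sum(w * x
                    for w_row, x_row in zip(filt, window)
                    for w, x in zip(w_row, x_row))
        features.append(math.tanh(total + bias))
    return features

def max_pool(features):
    """Max-over-time pooling, as in Eq. (4)."""
    return max(features)

# Toy sentence of three 2-dimensional word vectors and a filter of width 2.
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_filter = [[1.0, 0.0], [0.0, 1.0]]
feature_map = conv_features(toy_sentence, toy_filter, bias=0.0)
pooled = max_pool(feature_map)
```

With $z$ such filters, the pooled values would be collected into one $z$-dimensional sentence vector. The sketch assumes the sentence is at least as long as the filter, which padding is meant to guarantee.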

Framework

2. Sentence Classification Layer:
   I. Obtain a matched attention vector $E^{(i)} = [e^{(i)}_t]_{t=1}^{m}$, which indicates the weights of the memories.
   II. Calculate the response content $R^{(i)} \in \mathbb{R}^{z}$ using this matched attention vector.
   III. Use an MLP to obtain the final representation vector of each sentence in the review text:

      $$V_i = f_{mlp}(I_i \,\|\, R^{(i)}; \theta_{mlp}) \tag{5}$$

   IV. Use the softmax classifier to get the sentence-level distribution over sentiment labels:

      $$P_i = \mathrm{softmax}(W_p \cdot V_i + b_p) \tag{6}$$

Finally, we obtain new high-level representations of the sentences in the review text by leveraging relevant abstract information.

Framework

3. Review Classification Layer:
   I. Use separate LSTM modules to produce forward and backward hidden vectors:

      $$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(V_i), \quad \overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(V_i), \quad h_i = \overrightarrow{h_i} \,\|\, \overleftarrow{h_i} \tag{7}$$

   II. The importance ($a_i$) of each sentence is measured as follows:

      $$h'_i = \tanh(W_a \cdot h_i + b_a), \quad a_i = \frac{\exp(h'_i)}{\sum_j \exp(h'_j)} \tag{8}$$

   III. Finally, we obtain a document-level distribution over sentiment labels as the weighted sum of sentence-level distributions:

      $$P^{(c)}_{review} = \sum_i a_i P^{(c)}_i, \quad c \in [1, C] \tag{9}$$
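The aggregation in Eqs. (8) and (9) can be sketched as follows. This is a minimal example that assumes the scalar attention scores $h'_i$ and the sentence-level distributions $P_i$ have already been computed; the BiLSTM of Eq. (7) is not implemented, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def review_distribution(attention_scores, sentence_dists):
    """Weighted sum of sentence-level distributions.

    attention_scores: one scalar h'_i per sentence (assumed precomputed)
    sentence_dists:   one distribution P_i over C labels per sentence
    Returns (a, P_review): the attention weights of Eq. (8) and the
    document-level distribution of Eq. (9).
    """
    weights = softmax(attention_scores)
    num_classes = len(sentence_dists[0])
    p_review = [sum(a * p[c] for a, p in zip(weights, sentence_dists))
                for c in range(num_classes)]
    return weights, p_review

# Two sentences with equal importance and opposite polarities.
weights, p_review = review_distribution([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because the attention weights sum to one and each $P_i$ is a distribution, the result is itself a valid distribution over the $C$ labels.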

Framework

• Abstract-based Memory Mechanism
  1. Get the matched attention vector $E^{(i)}$ of the memories:

     $$e'_t = \mathrm{LSTM}(h_{t-1}, M_t), \quad (h_0 = I_i,\ t = 1, \ldots, m) \tag{10}$$

     $$e^{(i)}_t = \frac{\exp(e'_t)}{\sum_j \exp(e'_j)} \tag{11}$$

     $$E^{(i)} = [e^{(i)}_t]_{t=1}^{m} \tag{12}$$

  2. Calculate the response content $R^{(i)}$:

     $$R^{(i)} = \sum_{t=1}^{m} e^{(i)}_t M_t \tag{13}$$

  3. Use $R^{(i)}$ and $I_i$ to compute the new sentence representation vector $V_i$:

     $$V_i = f_{mlp}(I_i \,\|\, R^{(i)}; \theta_{mlp}) \tag{14}$$
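To make the normalization and weighted-sum steps of Eqs. (11) to (13) concrete, here is a small sketch in plain Python. It assumes the raw matching scores $e'_t$ are already available as scalars; the LSTM of Eq. (10) that produces them and the MLP of Eq. (14) are not implemented, and all names are illustrative.

```python
import math

def matched_attention(raw_scores):
    """Softmax over the raw matching scores e'_t, giving E^(i) as in Eq. (11)-(12)."""
    peak = max(raw_scores)
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

def response_content(attention, memories):
    """Attention-weighted sum of the abstract memories M_t, as in Eq. (13)."""
    dim = len(memories[0])
    return [sum(a * mem[k] for a, mem in zip(attention, memories))
            for k in range(dim)]

# One review sentence I_i matched against two abstract sentence memories.
sentence_vec = [0.3, 0.7]
memories = [[1.0, 0.0], [0.0, 1.0]]
attention = matched_attention([0.0, 0.0])      # equal raw scores (assumed)
response = response_content(attention, memories)
mlp_input = sentence_vec + response            # the concatenation I_i || R^(i)
```

The concatenated vector `mlp_input` is what the MLP would consume to produce $V_i$.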

Framework

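• Objective Function
  • Our model needs only the review's sentiment label; each sentence's sentiment label is unobserved.
  • The categorical cross-entropy loss:

    $$L(\theta) = \sum_{T_{review}} \sum_{c=1}^{C} -\hat{P}^{(c)}_{review} \log(P^{(c)}_{review}) \tag{15}$$

    where $P^{(c)}_{review}$ is the predicted distribution from Eq. (9) and $\hat{P}^{(c)}_{review}$ denotes the gold label distribution of the review (the hat notation is assumed here; the accent distinguishing the two terms is not legible in the source).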
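A minimal sketch of this loss, assuming it compares each review's gold label distribution (one-hot) with the predicted document-level distribution and sums over reviews. The function names and the small `eps` guard against `log(0)` are additions for this example, not part of the slides.

```python
import math

def cross_entropy(gold, predicted, eps=1e-12):
    """Categorical cross-entropy between a gold and a predicted distribution.

    eps is a small constant (an assumption of this sketch) that avoids log(0).
    """
    return -sum(g * math.log(p + eps) for g, p in zip(gold, predicted))

def total_loss(reviews):
    """Sum the loss over (gold, predicted) pairs, one pair per review."""
    return sum(cross_entropy(gold, pred) for gold, pred in reviews)

# Two reviews in a 2-class setting; only review-level labels are used.
batch = [
    ([0.0, 1.0], [0.25, 0.75]),
    ([1.0, 0.0], [0.5, 0.5]),
]
loss = total_loss(batch)
```

Only review-level labels enter the loss, which is what lets the sentence-level distributions be learned without sentence annotations.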


Experiments

• Evaluation Datasets
  • Statistics for the ICLR-2017 and ICLR-2018 datasets:

    Data Set    #Papers  #Reviews  #Sentences  #Words
    ICLR-2017   490      1517      24497       9868
    ICLR-2018   954      2875      58329       13503

  • The score distributions: [figure not reproduced]

Experiments

• Comparison of review sentiment classification accuracy on the 2-class task {reject (score ∈ [1, 5]), accept (score ∈ [6, 10])}

Experiments

• Comparison of review sentiment classification accuracy on the 3-class task {reject (score ∈ [1, 4]), borderline (score ∈ [5, 6]), accept (score ∈ [7, 10])}
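The label construction for the two tasks can be sketched as a simple score-to-class mapping. The sketch assumes the ICLR convention that scores are integers from 1 to 10 with higher scores being more favorable, so the low range maps to reject and the high range to accept; the function names are hypothetical.

```python
def two_class_label(score):
    """Map a review score in 1..10 to the 2-class label (higher is better)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    return "reject" if score <= 5 else "accept"

def three_class_label(score):
    """Map a review score in 1..10 to the 3-class label (higher is better)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 4:
        return "reject"
    if score <= 6:
        return "borderline"
    return "accept"

labels_2 = [two_class_label(s) for s in (1, 5, 6, 10)]
labels_3 = [three_class_label(s) for s in (4, 5, 6, 7)]
```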

Experiments

• Sentence-Level Classification Results.
  We randomly selected 20 reviews, a total of 213 sentences, and manually labeled the sentiment polarity of each sentence.

Figure 3: Example opinionated sentences with predicted polarity scores extracted from a review text.

Experiments

• Influence of Abstract Text.

Figure 4: Example sentences in a review text and their most relevant sentences in the paper abstract text. The sentence with the largest weight in the matched attention vector $E^{(i)}$ is considered most relevant. The red text indicates similarities between the review text and the abstract text.

Experiments

• Influence of Abstract Text.
  • A simple method of using abstract texts, as a contrast experiment:
    Remove the sentences that are similar to the paper abstract's sentences from the review text and use the remaining text for classification. (The similarity threshold is set to 0.7.)

Figure 5: The comparison of using and not using the paper abstract via a simple method.
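The contrast method can be sketched as a filter over review sentences. The slide gives only the threshold (0.7) and does not name the similarity measure, so this example substitutes Jaccard overlap of lowercased word sets purely as a stand-in, and treats a similarity at or above the threshold as "similar"; both choices are assumptions.

```python
def jaccard(sentence_a, sentence_b):
    """Jaccard overlap of word sets (a stand-in; the source does not name the measure)."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def remove_abstract_like(review_sentences, abstract_sentences, threshold=0.7):
    """Keep only review sentences that are not similar to any abstract sentence."""
    return [
        sentence for sentence in review_sentences
        if all(jaccard(sentence, abstract) < threshold
               for abstract in abstract_sentences)
    ]

review = ["We propose a new model", "The experiments are weak"]
abstract = ["we propose a new model"]
remaining = remove_abstract_like(review, abstract)
```

The remaining sentences would then be passed to the classifier in place of the full review text.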

Experiments

• Influence of Borderline Reviews.

Figure 6: Experimental results on different datasets with, without, and with only borderline reviews.

Experiments

• Cross-Year Experiments.

Figure 7: Results of cross-year experiments. Model@ICLR-* means the model is trained on the ICLR-* dataset.

Experiments

• Cross-Domain Experiments.
  We further collected 87 peer reviews for submissions to NLP conferences (CoNLL, ACL, EMNLP, etc.), including 57 positive reviews (accept) and 30 negative reviews (reject).

Figure 8: Results of cross-domain experiments. * means the performance improvement over the first three methods is statistically significant with p-value < 0.05 for the sign test. Model@ICLR-* means the model is trained on the ICLR-* dataset.

Experiments

• Final Decision Prediction for Scholarly Papers.
  • Methods to predict the final decision of a paper based on several review scores.

  • Voting:

    $$\text{Decision} = \begin{cases} \text{Accept} & \text{if } \#\text{accept} > \#\text{reject} \\ \text{Reject} & \text{otherwise} \end{cases} \tag{16}$$

  • Simple Average:
    Simply average the scores of all reviews. If the average score is larger than or equal to 0.6, the paper is predicted as final accept; otherwise, final reject.

  • Confidence-based Average:

    $$\text{overall\_score} = \frac{1}{|S|} \sum_{i=1}^{|S|} S_i \cdot \frac{1}{6 - \text{ReviewerConfidence}_i} \tag{17}$$
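The three aggregation rules can be sketched as below. The sketch assumes reviewer confidence is an integer from 1 to 5 (so that the denominator in Eq. (17) stays positive) and that the scores passed to the simple average are on a scale compatible with the 0.6 threshold quoted on the slide; the function names are hypothetical.

```python
def vote(labels):
    """Eq. (16): accept only if accept votes strictly outnumber reject votes."""
    accepts = sum(1 for label in labels if label == "accept")
    rejects = sum(1 for label in labels if label == "reject")
    return "accept" if accepts > rejects else "reject"

def simple_average(scores, threshold=0.6):
    """Accept if the mean review score reaches the threshold from the slide."""
    mean = sum(scores) / len(scores)
    return "accept" if mean >= threshold else "reject"

def confidence_based_score(scores, confidences):
    """Eq. (17): each score S_i is weighted by 1 / (6 - confidence_i).

    Confidence is assumed to lie in 1..5, so the most confident reviewer
    (confidence 5) gets weight 1 and the least confident gets weight 1/5.
    """
    if len(scores) != len(confidences):
        raise ValueError("scores and confidences must have the same length")
    weighted = [s / (6 - c) for s, c in zip(scores, confidences)]
    return sum(weighted) / len(scores)

decision_by_vote = vote(["accept", "reject", "accept"])
decision_by_mean = simple_average([0.8, 0.6])
overall = confidence_based_score([6, 4], [5, 4])
```

Note that a tie in the vote falls through to reject, matching the "otherwise" branch of Eq. (16).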

Experiments

• Final Decision Prediction for Scholarly Papers.
  • Results of final decision prediction for scholarly papers.

Figure 9: Results of final decision prediction for scholarly papers.


Conclusion and Future Work

• Contributions
  • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
  • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
  • Evaluation results demonstrate the efficacy of the proposed model and show that using the paper abstract as memory is highly helpful.

• Future Work
  • Collect more peer reviews.
  • Try more sophisticated deep learning techniques.
  • Several other sentiment analysis tasks: prediction of fine-grained review scores, automatic writing of meta-reviews, prediction of the best papers...

Acknowledgments

• National Natural Science Foundation of China.

• Anonymous reviewers for their helpful comments.

• SIGIR Student Travel Grant.
