
Neural Network-Based Abstract Generation for Opinions and Arguments

Lu Wang and Wang Ling

NAACL 2016

Paper introduction

Presentation: Tomonori Kodaira

1

Abstract

• Abstractive summarization from a set of documents to a single sentence.

• When encoding the documents, the computational cost is handled by sampling: the top-K text units, ranked by an importance score, are used.

2

Introduction

• Authors present an attention-based NN model for generating abstractive summaries of opinionated text.

• Their system takes as input a set of text units, and then outputs a one-sentence abstractive summary.

• Two types of opinionated text: movie reviews, and arguments on controversial topics.

3

Introduction

• Systems: an attention-based model (Bahdanau et al., 2014) and an importance-based sampling method.

• The importance score of a text unit is estimated from a regression model with pairwise preference-based sampling.

4

Data Collection: Movie Reviews

Rotten Tomatoes (www.rottentomatoes.com)

• The site has both professional critic reviews and user-generated reviews.

• Each movie has a one-sentence critic consensus.

Data

• 246,164 critic reviews and opinion consensuses for 3,731 movies

• train: 2,458 / validation: 536 / test: 737 movies

5

Data Collection: Arguments on Controversial Topics

idebate.org

• This is a Wikipedia-style website for gathering pro and con arguments on controversial issues.

• Each point contains a one-sentence central claim.

Data

• 676 debates with 2,259 claims

• train: 450 / validation: 67 / test: 150 debates

6

Data Collection: Text Units

7

The Neural Network-Based Abstract Generation Model

Problem Formulation

• The summary y is composed of a sequence of words y_1, …, y_|y|.

• The input consists of an arbitrary number of reviews or arguments, i.e. text units x = {x_1, …, x_M}.

• Each text unit x_k is composed of a sequence of words x_k,1, …, x_k,|x_k|.

8

The Neural Network-Based Abstract Generation Model

Decoder

• The summary is generated as a sequence of word-level predictions:
  log P(y|x) = Σ_{j=1}^{|y|} log P(y_j | y_1, …, y_{j-1}, x), where P(y_j | y_1, …, y_{j-1}, x) = softmax(h_j)

• h_j is the RNN state variable: h_j = g(y_{j-1}, h_{j-1}, s)

• g is an LSTM network (Hochreiter and Schmidhuber, 1997)

9

The Neural Network-Based Abstract Generation Model

Decoder

• LSTM

• The model concatenates the representation of the previous output word y_{j-1} and the input representation s into u_j, which is fed to the LSTM.

10
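
The word-by-word decoding described on the two Decoder slides can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: `embed`, `lstm_step` (the LSTM g), and the output projection `W_out` are placeholder components (the slide writes softmax(h_j) directly, without naming a projection).

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def decode_log_prob(y_ids, s, embed, lstm_step, W_out, h0, start_id=0):
    """log P(y|x) = sum_j log P(y_j | y_1..y_{j-1}, x), with h_j = g(y_{j-1}, h_{j-1}, s).

    `embed`, `lstm_step` (the LSTM g), and `W_out` are placeholders for trained
    components; this is an illustrative sketch, not the paper's implementation.
    """
    h, prev, total = h0, embed(start_id), 0.0
    for y_j in y_ids:
        u = np.concatenate([prev, s])             # u_j = [emb(y_{j-1}); s]
        h = lstm_step(u, h)                       # one step of the LSTM decoder g
        total += np.log(softmax(W_out @ h)[y_j])  # log P(y_j | y_<j, x)
        prev = embed(y_j)
    return total
```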

The Neural Network-Based Abstract Generation Model

Encoder

• The representation s of the input text units is computed with an attention model (Bahdanau et al., 2014): s = Σ_i a_i b_i

• Authors construct b_i by building a bidirectional LSTM over the input; they reuse the LSTM formulation, setting u_j = x_j.

• a_i = softmax(v(b_i, h_{j-1})), where v(b_i, h_{j-1}) = W_s · tanh(W_cg b_i + W_hg h_{j-1})

11
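
A small sketch of this attention step, assuming each b_i is a row of a matrix B of bidirectional-LSTM states and h_prev is the previous decoder state h_{j-1}; the matrix names mirror the slide, but the code itself is illustrative:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attention_context(B, h_prev, W_s, W_cg, W_hg):
    """s = sum_i a_i b_i, with a_i = softmax_i(W_s . tanh(W_cg b_i + W_hg h_prev))."""
    scores = np.array([W_s @ np.tanh(W_cg @ b + W_hg @ h_prev) for b in B])
    a = softmax(scores)   # attention weights over the text-unit states b_i
    return a @ B, a       # context vector s and the weights
```

With the dimensions reported in the experimental setup (150-dimensional states, 100-dimensional attention), W_cg and W_hg would project into the 100-dimensional attention space and W_s would be a 100-dimensional vector.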

The Neural Network-Based Abstract Generation Model

Attention Over Multiple Inputs

• Their input consists of multiple separate text units, which are concatenated into one sequence z.

• There are two problems:

• The model is sensitive to the order of the text units.

• z may contain thousands of words.

12

The Neural Network-Based Abstract Generation Model

Attention Over Multiple Inputs

Sub-sampling from the input

• They define an importance score f(x_k) ∈ [0, 1] for each text unit x_k.

• K candidate text units are sampled according to these scores.

13
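
How the K candidates might be selected from the M text units using the scores f(x_k): whether the draw is stochastic or a strict top-K is an implementation detail not given on the slide, so this sketch (with illustrative names) shows both readings.

```python
import numpy as np

def select_text_units(units, scores, K, stochastic=True, rng=None):
    """Keep K of the input text units using importance scores f(x_k) in [0, 1].

    stochastic=True draws K units with probability proportional to their scores
    (one reading of "K candidates are sampled"); stochastic=False takes the
    strict top-K, as in the abstract slide. Both are illustrative choices.
    """
    scores = np.asarray(scores, dtype=float)
    K = min(K, len(units))
    if stochastic:
        rng = rng or np.random.default_rng(0)
        p = scores / scores.sum()
        idx = rng.choice(len(units), size=K, replace=False, p=p)
    else:
        idx = np.argsort(scores)[::-1][:K]   # indices of the K highest scores
    return [units[i] for i in idx]
```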

The Neural Network-Based Abstract Generation Model

Importance Estimation

• A ridge regression model with a regularizer.

• Learn f(x_k) = r_k · w by minimizing ||Rw − L||₂² + λ·||R′w − L′||₂² + β·||w||₂²

• Each text unit x_k is represented as a d-dimensional feature vector r_k ∈ R^d.

14
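
The objective on this slide has a standard closed-form solution; a minimal sketch, where R′/L′ stand for the pairwise-preference design matrix and labels and the feature extraction producing r_k is not shown:

```python
import numpy as np

def fit_importance_weights(R, L, R_pair, L_pair, lam=1.0, beta=1.0):
    """Minimise ||R w - L||^2 + lam ||R' w - L'||^2 + beta ||w||^2 in closed form."""
    d = R.shape[1]
    A = R.T @ R + lam * (R_pair.T @ R_pair) + beta * np.eye(d)
    b = R.T @ L + lam * (R_pair.T @ L_pair)
    return np.linalg.solve(A, b)   # w, so that f(x_k) = r_k . w
```

The learned w gives the importance score f(x_k) = r_k · w used for the sub-sampling above.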

The Neural Network-Based Abstract Generation Model

Post-processing

• At test time, they re-rank the n-best summaries according to their cosine similarity with the input text units.

• The one with the highest similarity is included in the final summary.

15
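
A minimal sketch of this re-ranking step, assuming the n-best candidates and the input text units have already been turned into vectors (e.g. bag-of-words counts; the slide does not specify the representation):

```python
import numpy as np

def pick_best_summary(candidates, cand_vecs, input_vec):
    """Return the n-best candidate whose vector is most cosine-similar to the input."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    sims = [cosine(v, input_vec) for v in cand_vecs]
    return candidates[int(np.argmax(sims))]
```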

Experimental Setup

• Data preprocessing: Stanford CoreNLP (Manning et al., 2014)

• Pre-trained embeddings and features: 300-dimensional word embeddings; they extend their model with additional features.

16

Experimental Setup

• Hyperparameters: the LSTMs are defined with states and cells of 150 dimensions; the attention uses 100 dimensions. Training is performed via Adagrad (Duchi et al., 2011).

• Evaluation: BLEU

• The importance-based sampling rate K is set to 5.

• Decoding: beam search with a beam size of 20.

17
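
For reference, the reported setup collected into one plain config dict; the values come from the slide, while the key names are illustrative, not from the authors' code:

```python
config = {
    "lstm_state_dim": 150,       # LSTM states and cells
    "attention_dim": 100,
    "word_embedding_dim": 300,   # pre-trained embeddings
    "optimizer": "Adagrad",      # Duchi et al., 2011
    "importance_sampling_K": 5,
    "beam_size": 20,             # decoding via beam search
    "eval_metric": "BLEU",
}
```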

Results: Importance Estimation Evaluation

• Metrics: MRR (Mean Reciprocal Rank) and NDCG (Normalized Discounted Cumulative Gain)

18
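
For reference, one common way to compute the two ranking metrics named on this slide (a generic sketch, not the authors' evaluation script):

```python
import numpy as np

def mrr(first_relevant_ranks):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant item per query."""
    return float(np.mean([1.0 / r for r in first_relevant_ranks]))

def ndcg(relevances, k=None):
    """Normalized Discounted Cumulative Gain with linear gains (one common variant).

    `relevances` are the relevance labels in the predicted ranking order.
    """
    rel = np.asarray(relevances, dtype=float)
    k = rel.size if k is None else min(k, rel.size)
    disc = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = float((rel[:k] * disc).sum())
    idcg = float((np.sort(rel)[::-1][:k] * disc).sum())
    return dcg / idcg if idcg > 0 else 0.0
```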

Results: Importance Estimation Evaluation

19

Results: Human Evaluation on Summary Quality

20

Results: Sampling Effect

21

Conclusion

• Authors presented a neural approach to generating abstractive summaries for opinionated text.

• They employed an attention-based method that finds salient information across different input text units.

• They deployed an importance-based sampling mechanism for model training.

• Their system obtained state-of-the-art results.

22