research on extractive summarization at ufpe/ufrpe...

71
Research on Extractive Summarization at UFPE/UFRPE Universities: Developed Approaches and Perspectives Rinaldo Lima Federal Rural University of Pernambuco, Recife/Brazil [email protected] [email protected] Dec/2019

Upload: others

Post on 09-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Research on Extractive Summarization

at UFPE/UFRPE Universities:Developed Approaches and Perspectives

Rinaldo Lima

Federal Rural University of Pernambuco, Recife/Brazil

[email protected]

[email protected]

Dec/2019

Page 2: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Agenda

• Presenting UFPE/UFRPE

• Automatic Text Summarization (ATS)

– Main Approaches

– Evaluation Methods, Metrics, and Datasets

• ATS Work at UFRPE/UFPE

• Current Trends in ATS

• Research Collaboration

Page 3: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Presenting:

UFRPE and UFPE

Page 4: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

4

RECIFE (PE/Brazil)

Olinda Porto de Galinhas Beach

Old Recife

One the Recife’s Brigde (downtown)

Page 5: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

UFRPE/UFPE

UFRPE• Headquartered on Recife Campus with other 5 campuses in Pernambuco

• More than 1.000 teachers, 900 technicians, 17,000 students

• UFRPE provides courses on Agricultural sciences, Veterinary Medicine, Humanities, Biological, Computer Science, Physics, etc.

• The Computing Department conducts research on Computational Intelligence and

Modelling, Software Engineering, Computational Systems, Text Mining, among others

5

Page 6: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

UFPE• The Informatics Center (CIn) at UFPE was established 35 years ago in 1974

• Research at CIn encompasses the following areas: Databases, Computer Architecture, Software Engineering, Computational Intelligence, Distributed Systems, among others

• CIn is ranked among the top quality academic centers in Latin America

• Its faculty members consist of more than 80 PhDs

6

UFRPE/UFPE

Page 7: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Automatic Text Summarization

Page 8: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

8

Automatic Text Summarization (ATS/TS)

Definitions

• A summary is a reductive transformation of a source text into a summary text by extraction or generation [Spärck-Jones and Sakai (2001)]

• A condensed version of a source document having a recognizable genre and a very specific purpose, to give the reader an exact and concise idea of the contents of the source [ Saggion and Lapalme (2002)]

• The ratio between the length of the summary and the length of the source document is calculated by the compression rate:

Page 9: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Motivation to create summaries

• more and more text data

• less time to process or read it

Main goals in ATS:

• optimizing topic coverage (informativeness)

• optimizing readability

• obtaining cohesive and coherent summaries

• avoiding redundancy

9

Automatic Text Summarization (ATS/TS)

Page 10: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

ATS Categorization

10

Page 11: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

11

Automatic Text Summarization (ATS/TS)

Simplified General Summarization Process

Page 12: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

TS Applications• increasing the performance of traditional IR and IE systems (coupled with

Question-Answering system)

• opinion summarization

• news summarization

• blog, tweet, web page, email thread summarization

• report summarization for business men, politicians, researchers, etc.;

• meeting summarization;

• biographical extracts

• automatic extraction and generation of titles and papers abstracts

• domain-specific summarization (domains of medicine, chemistry, law, etc.)

• …

12

Automatic Text Summarization (ATS/TS)

Page 13: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

13

Fields of research which have influenced the development of ATS

Automatic Text Summarization (ATS/TS)

Page 14: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Summarization

Main Approaches

Page 15: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

15

• Supervised Content Selection

• Unsupervised Content Selection (topic-based)

• Graph-based (LexRank)

• Lexical Chains

• Deep NLP-based

Mono and Multidocument

Main Approaches to ATS

Page 16: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

16

Supervised Content Selection

16

Page 17: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

17

Supervised Content Selection

• Given: – a labeled training set of good

summaries for each document

• Align:– the sentences in the document

with sentences in the summary

• Extract features– position (first sentence?)

– length of sentence

– word informativeness, cue phrases

• Train

• Problems:• hard to get labeled training

data

• alignment difficult

• performance not better than unsupervised algorithms

• So in practice:• Unsupervised content

selection is more common

• a binary classifier (put sentence in summary? yes or no)

Page 18: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

18

Unsupervised content selection

• Intuition dating back to Luhn (1958):

– Choose sentences that have salient or informative words

• Two approaches to defining salient words

1. tf-idf: weigh each word wi in document j by tf-idf

2. topic signature: choose a smaller set of salient words

• mutual information

• log-likelihood ratio (LLR) Dunning (1993), Lin and Hovy (2000)

18

weight(wi ) = tfij ´ idfi

weight(wi ) =1 if -2 logl(wi ) >10

0 otherwise

ìíï

îï

Page 19: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Graph-based (Ranking Methods)

19

The main idea is that sentence“recommends” another one in the graph

N = number of vertices

d = damping factor

Graph-based Approach (Ranking idea)

• Sentence as vertices

• Similarity as edges

Iterative ranking

• LexRank, TextRank (inspired by

PageRank)

Page 20: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

20

Graph-based ATS

Components in a Graph-based ATS system

Page 21: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Graph-based (LexRank Demo)

21

• https://yale-lily.github.io/demos/demos/lexrank/lexrankmead.html

• https://github.com/rrajasek95/lexrank-demo (Python code)

Page 22: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Lexical Chains

22

Page 23: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

23

Lexical Chains (Example)

Page 24: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

24

Lexical Chains

A sentence is extracted from strong chains

Page 25: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

25

Rhetorical Structure Theory (RST) makes the assumption that a text is

divided into a hierarchical structure of Elementary Discourse Units (EDUs)

The EDUs are either satellites or nuclei

• A satellite needs a nucleus to be intelligible, but the opposite is not true

An algorithm is applied which weights and orders each EDU for the tree

structure of the discourse

• The higher the element in the structure, the more significant its weight

Rethorical Analysis for ATS

In RST, there is a limited number of universal and fixed relations, including elaboration, condition, illustration, opposition, concession, justification, etc.

Page 26: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Summarization

Summary Evaluation

Page 27: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Summary Evaluation

• Challenging task because human judges differ in scores assigned to same summaries

• There no sense of having an “ideal summary”

– It depends on the domain and user's intentions

• DUC (2001-2007)/ TAC(2008-2011) evaluation campaigns proposed many types of tasks and evaluation metrics

27

Page 28: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Evaluation Methods

Categorized acording to the level of human effort in the evaluation:

– Manual Evaluation

– Semi-automatic methods

• ROUGE

• Pyramid

– Automatic methods

• FRESA

28

Summary Evaluation

Page 29: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Summary Evaluation

29

[Lin and Hovy,2003]

ROUGE (Recall Oriented Understudy for Gisting

Evaluation)

Page 30: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

ROUGE

(Basic Idea)

30

Summary Evaluation

Page 31: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

31www.rxnlp.com/rouge-2-0

Page 32: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

FRESA – Framework for evaluating summaries automatically [TORRES-MORENO

ET AL., 2010)

32

Summary Evaluation

It is an automatic method

based on information

theory, which evaluates

summaries without

using human references

Standard preprocessing

of documents (filtering

and lemmatization)

before

calculating the

probability

distributions between

the summary and the

original document https://github.com/fabrer/automatic-summarization/tree/master/EVALUATION

Page 33: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Summarization Research

at UFRPE/UFPE

Page 34: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Rafael LinsCoordinator

Eduardo SilvaProject Manager

Bruno ÁvilaTechnical Leader

George CavalcantiResearch Fellow

Fred FreitasResearch Fellow

Gabriel SilvaResearcher

Hilário OliveiraResearcher

Luciano cabralResearcher

Rafael MelloResearcher

Rinaldo LimaResearcher

Functional Summarization – UFPE Team Funded by HP (2012-2016)

Page 35: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Ensemble Approach to ATS

35

Research GoalProposing and evaluating a concept-based approach using ILP and regression for

the tasks of mono/multidocument summarization of news articles

The architecture of the proposed solution is composed by two stages:

1. The generation of many candidate summaries.

• Adopting a concept-based approach using ILP

2. The estimation and selection of the most informative summary.

• Applying the regression algorithm

Research Questions:

1. Is it possible to increase the informativeness of the generated summaries by

combining ILP and regression?

2. Can this approach also take into account both the redundancy and the

cohesion of the generated sumaries?

Automatic Text Summarization based on Concepts via Integer Linear Programming (PhD Thesis 2018)

Page 36: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

36

ILP & Regression-based Approach (ensemble)

Ensemble Approach to ATS

Page 37: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

37

Bigrams are used to represent the notion of concepts

• It has showed better results in the preliminary experiments

• Bigrams formed only by stopwords are removed (Boudin et al., 2015)

Entity Graph

• We adopted the Entity Graph model (Guinaudeau, Strube,2013) to estimate an

approximate local cohesion score for the sentences in texts

The weight of a concept is estimated based on its position and sentence coverage

• Explore the weights of your neighbors (Concept Graph).

• Edges denote adjacency relations between two concepts

• New weight distribution strategy (contribution)

Concept Graph of 4 sentences and

its entities

Ensemble Approach to ATS

Page 38: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

38

Cohesion constraints are included in the ILP model to minimize two typical problems

in Extractive Summarization

• Coreference Resolution

• Explicit discourse dependencies between a given pair of sentences.

ILP-based Sentence Selection

Optimization Problem

Ensemble Approach to ATS

Page 39: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

39

Comparative Results on 2 datasets :

Multidocument Scenario

Ensemble Approach to ATS

Page 40: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

40

Comparative Results on 3 datasets :

Multidocument Scenario

Ensemble Approach to ATS

Page 41: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Motivation

The extractive approach is the most studied approach in theliterature.

However, most current extractive summarization systems usuallyproduce summaries with 2 major issues:

– loose sentences lacking relationship among them,

– dangling coreferences that breaks the natural discourse flow.

Adding coherence in extractive summarization is one of the mainopen problems in ATS.

41/22

Towards Coherent Single-Document Summarization

Towards coherent single-document summarization: an integer linear programming-based approachRodrigo Garcia, Rinaldo Lima, Bernard Espinasse, Hilario Oliveira Proceedings of the 33rd Annual ACM SAC, 2018

Page 42: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Towards Coherent Single-Document Summarization

General Architecture of the proposed solution:

Page 43: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Preprocessing Task: annotation

43

Page 44: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Concept Extraction Task

44

Page 45: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Concept Scoring Task

45

Page 46: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Local coherence scoring task: Entity Graph

46

Page 47: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

47

Local coherence scoring task: Entity Graph

Page 48: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Summary generation task: ILP optimization

48

Page 49: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

The DUC 2001 and 2002 datasets: used for evaluating generic single-document summarizers composed of news articles written in English contain abstractive summaries with approximately 100 words

each one

Evaluation measure ROUGE-N provided by the ROUGE package consists of an n-gram recall between the summary generated

by an ATS system, and a set of golden standard summariesproduced by humans

49

DUC Datasets and ROUGE measure

Page 50: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Comparative Assessment: Informativeness

Selected systems in the comparison:• the best summarizes participating in the DUC 2001 (System T) and DUC 2002 (System

28) conferences• AutoSummarizer (2016) • Classifier4J (Lothian, 2003) • HP-UFPE FS (highest R-1 performance reported in (Batista et al., 2015) • TextRank (Mihalcea & Tarau, 2004) • Parvens's summarizer (Parveen & Strube, 2015)

The proposed summarizer obtained very competitive results

50 /22

Page 51: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Comparative Assessment: Coherence (1)

• ROUGE measures cannot take into account text properties such as readability and mainly coherence (Guinaudeau & Strube, 2013).

• We performed a preliminary human comparison of the coherence level of some summaries

51 /22

Page 52: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Comparative Assessment: Coherence (2)

• Our finding: the longer the input documents, the more likely is to obtain a less cohesive summary

• The summary generated by the proposed system has superior coherence among the sentences

Page 53: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Summarization Research

Proposed Datasets forExtractive TS

Page 54: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Rafael Dueire Lins, Hilario Oliveira, Luciano Cabral,

Jamilson Batista, Bruno Tenorio, Rafael Ferreira, Rinaldo Lima,

Gabriel Pereira e Silva , Steven J. Simske

The CNN-CorpusA Large Textual Corpus for

Single-Document Extractive Summarization

Recife

FortCollins

Page 55: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

55

Page 56: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

56

The CNN - CorpusText:1. Canadian health officials have confirmed three cases

of Enterovirus D68 in British Columbia.

2. A fourth suspected case from a patient with severe respiratory illness is still under investigation.

3. Two of the confirmed cases are children between 5 and 9, said Dr. Danuta Skowronski, lead epidemiologist on emerging respiratory viruses at the British Columbia Centre for Disease Control.

4. The third is a teen between 15 and 19.

5. Enteroviruses are common, especially in the summer and fall months.

6. The U.S. Centers for Disease Control and Prevention estimates that 10 million to 15 million infections occur in the United States each year.

7. These viruses usually appear like the common cold; symptoms include sneezing, a runny nose and a cough.

8. Most people recover without any treatment.

9. …

Highlights:1. Canada confirms three cases of

Enterovirus D68 in British Columbia.

2. A fourth suspected case is still under investigation.

3. Enterovirus D68 worsens breathing problems for children who have asthma.

4. CDC has confirmed more than 100 cases in 12 U.S. states since mid-August.

Gold Standard:1, 2, 13, 26

www.cnn.com

Page 57: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

57

Overview statistics of the CNN-corpus

• Strict quality policies were enforced: every summary agreed on by, at least, two experts.

• Avoiding human subjectivity and mistakes.

Page 58: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

58

• Rafael Dueire Lins, Rafael Ferreira , Steven J. Simske

DocEng’19 Competition on Extractive Text Summarization

Recife

FortCollins

The CNN-Corpus was used in the:

a Large Corpus for Extractive Text Summarization

in the Spanish Language

The same development methodology was used for:

Page 59: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Summarization Research

Trends in TS

Page 60: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Limitations of Extractive Summarization

• Abstraction not as easy to do since it requires semantic understanding of text

• No coherence/cohesion treatment in most of the current ATS systems

• Redundancy problem

• Better (semi)automatic evaluation methods need to be devised particularly for multi-document systems

60Human performance vs system performance TAC'09

Page 61: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Neural-based Sentence Compression

• The next logical step is to eliminate redundant or less informative content from the extracted sentences

• A possible way to achieve this is by sentencecompression

Current approaches

• Sentence Compression by Deletion

• Sentence Compression as a Sequence to Sequence Problem

61

Neural-based Sentence Compression

Page 62: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Neural-based Sentence Compression

62

The LSTM-based sentence

compression model

It uses a parallel corpus of 2 million

sentence-compression instances (pairs)

from a news corpus

One-hot encoding or word embedding

of the input dictionnary

Filippova et al., 2015. Sentence compression by deletion with LSTM. In: Proc. of the 2015 Conference on EMNLP. ACL, Lisbon, Portugal (2015)

Page 63: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Sequence to Sequence Model

63

Neural-based Sentence Compression

• Sentence encoder is responsible to sequentially read the word representations

and generate intermediate sentence representations for each state

• The decoder module generates output sentence one word at a time

• At a given step in the decoding process, the attention module assigns weights

to each of the input steps (optional)

Page 64: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Summarization Research

Research Collaborations

Page 65: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Text Ming Group at UFRPEResearch areas:

65

tmgufrpe.github.io

Page 66: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Future Research Projects Proposal

• Doctoral and Master thesis co-supervision

• Funded bilateral projects options:

• COFFECUB

• CNPq/CAPES

• FACEPE

Page 67: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

References

Most cited papers of our Team

67

PAPER TITLE CITED BY YEARAssessing sentence scoring techniques for extractive text summarization

212 2013R Ferreira, L de Souza Cabral, RD Lins, GP e Silva, F Freitas, ...Expert systems with applications 40 (14), 5755-5764

A multi-document summarization system based on statistics and linguistic treatment85 2014R Ferreira, L de Souza Cabral, F Freitas, RD Lins, G de França Silva, ...

Expert Systems with Applications 41 (13), 5780-5787

A context based text summarization system50 2014R Ferreira, F Freitas, L de Souza Cabral, RD Lins, R Lima, G França, ...

2014 11th IAPR International Workshop on Document Analysis Systems, 66-70

A four dimension graph model for automatic text summarization35 2013R Ferreira, F Freitas, L de Souza Cabral, RD Lins, R Lima, G França, ...

2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI …

Assessing sentence similarity through lexical, syntactic and semantic analysis34 2016R Ferreira, RD Lins, SJ Simske, F Freitas, M Riss

Computer Speech & Language 39, 1-28

Assessing shallow sentence scoring techniques and combinations for single and multi-document summarization29 2016H Oliveira, R Ferreira, R Lima, RD Lins, F Freitas, M Riss, SJ Simske

Expert Systems with Applications 65, 68-86

A new sentence similarity assessment measure based on a three-layer sentence representation23 2014R Ferreira, RD Lins, F Freitas, SJ Simske, M Riss

Proceedings of the 2014 ACM symposium on Document engineering, 25-34

A quantitative and qualitative assessment of automatic text summarization systems14 2015J Batista, R Ferreira, H Tomaz, R Ferreira, R Dueire Lins, S Simske, ...

Proceedings of the 2015 ACM Symposium on Document Engineering, 65-68

Page 68: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

68

Automatic text document summarization based on machine learning

13 2015G Silva, R Ferreira, RD Lins, L Cabral, H Oliveira, SJ Simske, M RissProceedings of the 2015 ACM Symposium on Document Engineering, 191-194

Automatic Summarization of News Articles in Mobile Devices6 2015L Cabral, R Lima, R Lins, M Neto, R Ferreira, S Simske, M Riss

2015 Fourteenth Mexican International Conference on Artificial Intelligence …

Appling Link Target Identification and Content Extraction to improve Web News Summarization4 2016R Ferreira, R Ferreira, RD Lins, H Oliveira, M Riss, SJ Simske

Proceedings of the 2016 ACM Symposium on Document Engineering, 197-200

DocEng'19 Competition on Extractive Text Summarization2 2019RD Lins, RF Mello, S Simske

Proceedings of the ACM Symposium on Document Engineering 2019, 4

The CNN-Corpus: A Large Textual Corpus for Single-Document Extractive Summarization2 2019RD Lins, H Oliveira, L Cabral, J Batista, B Tenorio, R Ferreira, R Lima, ...

Proceedings of the ACM Symposium on Document Engineering 2019, 16

Automatic Document Classification using Summarization Strategies2 2017R Ferreira, RD Lins, L Cabral, F Freitas, SJ Simske, M Riss

Proceedings of the 2015 ACM Symposium on Document Engineering, 69-72

References

Most cited papers of our Team

Page 69: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Recent Books on ATS

69

2014 2019

Page 70: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Links to Survey Papers on ATS

70

https://www.cs.bgu.ac.il/~elhadad/nlp16/nenkova-mckeown.pdf

https://arxiv.org/pdf/1707.02268.pdf

https://www.cs.cmu.edu/~nasmith/LS2/das-martins.07.pdf

https://dergipark.org.tr/en/download/article-file/392456

https://link.springer.com/article/10.1007/s10462-016-9475-9

https://wanxiaojun.github.io/summ_survey_draft.pdf

http://jad.shahroodut.ac.ir/article_1189_28715967fcd8b7bfb463ab90aca5a9f7.pdf

Page 71: Research on Extractive Summarization at UFPE/UFRPE ...w3.erss.univ-tlse2.fr/UETAL/2019-2020/Presentation_CLLE_LIMA.pdf · Ensemble Approach to ATS 35 Research Goal Proposing and evaluating

Thank you for your attention

Any questions?