Towards improving automatic text summaries

Towards improving automatic text summaries (French title: Vers une amélioration des résumés automatiques de textes). Abdelkrime ARIES. Supervisors: Pr. Zegour & Pr. Hidouci. Research Group: D3 Team, École nationale Supérieure d’Informatique (ESI, ex. INI), Algeria. LCSI laboratory mid-term seminars: April 19th, 2016.


Page 1: Towards improving automatic text summaries

Towards improving automatic text summaries

(French title: Vers une amélioration des résumés automatiques de textes)

Abdelkrime ARIES

Supervisors: Pr. Zegour & Pr. Hidouci

Research Group: D3 Team

École nationale Supérieure d’Informatique (ESI, ex. INI), Algeria

LCSI laboratory mid-term seminars: April 19th, 2016

Page 2: Towards improving automatic text summaries

Problematic | Extractive methods | Abstractive methods | Demo | Thank you

Plan

1 Problematic

2 Extractive methods

3 Abstractive methods

4 Demo

5 Thank you

Abdelkrime ARIES (ESI 2016) Towards improving automatic text summaries 2/24

Page 3: Towards improving automatic text summaries


Motivation | Summarization classification | Extractive vs. Abstractive | Multi-Lingual systems | Objectives

Problematic

Page 4: Towards improving automatic text summaries

Problematic: Motivation

Why should we summarize?

Saving reading time

Showing content on small devices

Facilitating document selection

Helping in search

Page 5: Towards improving automatic text summaries

Introduction: Summarization classification

Following [1, 2]:

[Figure: classification of summarization]

Input document
    Source size: single-document / multi-document
    Specificity: domain-specific / general
    Form
    Scale
    Genre

Purpose
    Audience: generic / query-oriented
    Usage: indicative / informative
    Expansiveness: background / just-the-news

Output document
    Derivation: extract / abstract
    Partiality: neutral / evaluative
    Conventionality: fixed / floating

Page 6: Towards improving automatic text summaries

Problematic: Extractive vs. Abstractive

Extractive:

+ Fast, with fewer resources (CPU + data)

+ Can be applied easily to many languages (statistical)

- Incoherent text

- Only selects pertinent sentences, which may be unrelated to one another

Abstractive:

+ Good text presentation

+ Redundancy can be dealt with

- Slow, requiring a lot of resources (CPU + data)

- Hard to implement (language-dependent)

Page 7: Towards improving automatic text summaries

Problematic: Multi-Lingual systems

Process more than one language.

Language-independent application:

Fully independent

Partially independent

Also, there are cross-lingual systems.

Page 8: Towards improving automatic text summaries

Problematic: Objectives

Create a multi-lingual system.

Introduce abstractive methods.

Improve our method [3].

Improve readability and coherence.

Page 9: Towards improving automatic text summaries

AllSummarizer | Amelioration | Links

Extractive methods

AllSummarizer as example

Page 10: Towards improving automatic text summaries

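Extractive methods: AllSummarizer

[Figure: AllSummarizer architecture]

Input: document(s)

Pre-processing: Normalizer, Segmenter, Stemmer, Stop-word eliminator
    produces: list of sentences; list of pre-processed words for each sentence

Processing: Clustering, Learning, Scoring
    Clustering produces: list of clusters
    Learning produces: P(f|C)
    Scoring produces: sentence scores

Extraction: Extraction, ReOrdering
    parameter: summary size
    Extraction produces: list of the first highest-scored sentences
    ReOrdering produces: reordered sentences

Output: summary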
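The stages named in the architecture (pre-processing, clustering, learning P(f|C), scoring, extraction, re-ordering) can be illustrated with a small Python sketch. This is not the AllSummarizer code: the stop-word list, the cosine similarity, the relative-frequency estimate of P(f|C) and the scoring formula are all simplifying assumptions chosen to keep the example short, and stemming is left out.

```python
import math
import re
from collections import Counter

# Toy stop-word list (assumption; a real system uses per-language resources).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to"}


def preprocess(text):
    """Segment text into sentences, then lowercase, tokenize and drop stop words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [[w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP_WORDS]
             for s in sentences]
    return sentences, words


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def cluster(words, threshold):
    """One cluster per seed sentence: every sentence similar enough to the seed."""
    clusters = []
    for seed in words:
        members = tuple(j for j, other in enumerate(words)
                        if cosine(seed, other) > threshold)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


def learn(words, clusters):
    """Estimate P(f|C) as the relative frequency of term f inside cluster C."""
    models = []
    for members in clusters:
        counts = Counter(w for j in members for w in words[j])
        total = sum(counts.values())
        models.append({term: n / total for term, n in counts.items()})
    return models


def score(sentence_words, models):
    """Illustrative score: product over clusters of (1 + sum of P(f|C))."""
    result = 1.0
    for model in models:
        result *= 1.0 + sum(model.get(w, 0.0) for w in sentence_words)
    return result


def summarize(text, size, threshold):
    """Return the `size` best-scored sentences, in their original order."""
    sentences, words = preprocess(text)
    models = learn(words, cluster(words, threshold))
    scores = [score(w, models) for w in words]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:size])]  # re-ordering step
```

Calling `summarize(text, size=2, threshold=0.5)` returns two sentences of `text`; here `size` counts sentences, whereas a real summary size may be expressed in words or characters.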

Page 11: Towards improving automatic text summaries

Extractive methods: Amelioration

Several improvements have been made to the original AllSummarizer system [3]:

1 Adding more features to the unigram and bigram term frequencies:
    Sentence position.
    Sentence length with stop words.
    Sentence length without stop words.

2 Adding more languages to the preprocessing task (27 languages): Arabic, Bulgarian, Catalan, Czech, German, Greek, English, Spanish, Basque, Persian, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Dutch, Nynorsk, Norwegian, Portuguese, Romanian, Russian, Swedish, Thai, Turkish and Chinese.
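As an illustration of the features listed in item 1, the following Python sketch computes them for a list of sentences. The function name, the whitespace tokenizer and the dictionary keys are hypothetical, and how the real system turns these values into scoring features is not shown here.

```python
from collections import Counter


def sentence_features(sentences, stop_words):
    """Compute, per sentence: position, both lengths, unigram and bigram counts."""
    features = []
    for position, sentence in enumerate(sentences):
        # Naive tokenizer (assumption): split on spaces, strip punctuation.
        tokens = [t.strip(".,;:!?") for t in sentence.lower().split()]
        tokens = [t for t in tokens if t]
        content = [t for t in tokens if t not in stop_words]
        features.append({
            "position": position,
            "len_with_stop": len(tokens),
            "len_without_stop": len(content),
            "unigrams": Counter(content),
            "bigrams": Counter(zip(content, content[1:])),
        })
    return features
```

For example, `sentence_features(["Mother stayed at home."], {"at"})` gives a length of 4 with stop words and 3 without.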

Page 12: Towards improving automatic text summaries

Extractive methods: Amelioration

3 Testing the summarizer with more than 40 languages (we used default preprocessing for languages without a preprocessing task).

4 Fixing the problem of redundant sentences (especially in the case of multi-document summarization). This is done by computing the similarity between the last added sentence and the candidate sentence, then using the clustering threshold to judge whether they are similar.

5 Estimating the threshold and the features for each language (multi- and single-document summarization). For more information, see our participation in the MultiLing 2015 workshop (SIGDIAL conference) [4].
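The redundancy check of item 4 can be sketched in Python as follows. This is only an illustration: cosine similarity over bags of words is an assumption (the slide does not name the similarity measure), and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def select_non_redundant(ranked, threshold, size):
    """Pick up to `size` sentences from `ranked` (token lists, best first).

    A candidate is skipped when its similarity to the last added sentence
    exceeds the threshold. Returns the indices of the selected sentences.
    """
    selected = []
    for index, tokens in enumerate(ranked):
        if len(selected) == size:
            break
        if selected and cosine(ranked[selected[-1]], tokens) > threshold:
            continue  # too similar to the last added sentence
        selected.append(index)
    return selected
```

Because each candidate is compared only with the last added sentence, as described in item 4, a sentence similar to an earlier pick can still be selected; comparing against every selected sentence would be stricter but slower.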

Page 13: Towards improving automatic text summaries

Extractive methods: Links

Take a look: https://github.com/kariminf/AllSummarizer

Test it: allsummarizer-kariminf.rhcloud.com

Page 14: Towards improving automatic text summaries

Our vision | Format handler | Sentence Parsing | Reasoning | Text generation

Abstractive methods

Our vision

Page 15: Towards improving automatic text summaries

Abstractive methods: Our vision

[Figure: proposed abstractive summarization architecture]

Input: extracted pertinent sentences

Sentence Parsing: syntactic analysis, internationalization, structured format generation

Reasoning: information processing, response preparation

Text generation: style decision & realizer linking, concepts transformation, structured format generation, realization

Other elements: Request, WordNet, Format handler

Output: abstractive summary

Page 16: Towards improving automatic text summaries

Abstractive methods: Format handler

To communicate information between sentences, we proposed a new format called STON ("Sentence Object Notation").

It represents the morphological and syntactic characteristics of sentences in a multi-lingual way.

Take a look: https://github.com/kariminf/SentRep

Page 17: Towards improving automatic text summaries

Abstractive methods: Format handler

Format:

Roles: each entity has a role to play in the clause (or sentence); it can be a subject, object, place, time, etc.

Actions: actions are the dynamic part of a clause; they link roles.

Sentences: the Role-Action model cannot represent all the information. For instance, successive actions have to be represented somewhere.

Page 18: Towards improving automatic text summaries

Abstractive methods: Format handler

Example: Mother stayed at home.

    @r: [
        r:{
            id: mother;
            syn: 10332385;
        r:}
        r:{
            id: home;
            syn: 3259505;
        r:}
    r:]

    @act: [
        act:{
            id: stay;
            syn: 117985;
            tense: PA;
            subj: [mother];
            @rel:[
                rel:{
                    type: P_PLACE;
                    ref: [home];
                rel:}
            rel:]
        act:}
    act:]

    @st: [
        st:{
            type: AFF;
            act: [stay];
        st:}
    st:]
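To make the three levels of the example (roles, actions, sentences) easier to see, here is a minimal Python sketch that mirrors them as data classes and checks that every identifier referenced by an action or a sentence is declared. The class and function names are hypothetical and are not part of SentRep; the field names are copied from the example, and reading `syn` as a WordNet synset identifier is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    id: str
    syn: int  # assumed to be a WordNet synset identifier


@dataclass
class Relative:
    type: str   # e.g. "P_PLACE" in the example
    ref: list   # identifiers of the roles referred to


@dataclass
class Action:
    id: str
    syn: int
    tense: str  # "PA" in the example
    subj: list  # identifiers of the subject roles
    rel: list = field(default_factory=list)


@dataclass
class Sentence:
    type: str   # "AFF" in the example
    act: list   # identifiers of the actions


def unresolved_references(roles, actions, sentences):
    """Return every identifier that is referenced but never declared."""
    role_ids = {r.id for r in roles}
    action_ids = {a.id for a in actions}
    missing = []
    for action in actions:
        missing += [s for s in action.subj if s not in role_ids]
        for rel in action.rel:
            missing += [r for r in rel.ref if r not in role_ids]
    for sentence in sentences:
        missing += [a for a in sentence.act if a not in action_ids]
    return missing


# "Mother stayed at home." expressed with the classes above.
roles = [Role("mother", 10332385), Role("home", 3259505)]
actions = [Action("stay", 117985, "PA", ["mother"],
                  [Relative("P_PLACE", ["home"])])]
sentences = [Sentence("AFF", ["stay"])]
```

Keeping roles and actions in separate tables and linking them by identifier, as the example does, lets several actions share one role without repeating it.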

Page 19: Towards improving automatic text summaries

Abstractive methods: Sentence Parsing

Working on it...

For now, we are coding an English2Ston parser.

Syntactic analysis (English): Stanford parser.

So far, we can only parse sentences of the form: "Subject{simple singular} Verb{past, present simple} Object{simple singular}".

Take a look: https://github.com/kariminf/NaLanPar

Page 20: Towards improving automatic text summaries

Abstractive methods: Reasoning

Our aim:

Thoughts are language-independent.

The mind contains many thoughts.

People have thoughts about what others think.

Thoughts have a truth level: belief, thinking, fact, etc.

So... this will be presented next time.

Page 21: Towards improving automatic text summaries

Abstractive methods: Text generation

Working on it...

For now, we are coding Ston2English and Ston2French generators.

Sentence realization (English, French): SimpleNLG-EnFr.

So far, we can only generate sentences of the form: "Subject{simple singular} Verb{past, present simple} Object{simple singular}".

Take a look: https://github.com/kariminf/NaLanGen
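To give an idea of what generating such a "Subject Verb Object" sentence from an action involves, here is a toy Python realizer. It is not how NaLanGen or SimpleNLG-EnFr work: the conjugation rule only handles regular verbs, the `PREPOSITIONS` mapping is hypothetical, and only the `PA` (past) tense value comes from the STON example; any other value is treated here as present simple.

```python
# Hypothetical mapping from a relation type to an English preposition.
PREPOSITIONS = {"P_PLACE": "at"}


def conjugate(verb, tense):
    """Naive conjugation of a regular verb, third person singular."""
    if tense == "PA":  # past
        return verb + "d" if verb.endswith("e") else verb + "ed"
    # Any other value: present simple (assumption).
    if verb.endswith(("s", "sh", "ch", "x", "o")):
        return verb + "es"
    return verb + "s"


def realize(action):
    """Build an English sentence from a dictionary describing one action."""
    parts = [" ".join(action["subj"]), conjugate(action["id"], action["tense"])]
    for rel in action.get("rel", []):
        parts.append(PREPOSITIONS.get(rel["type"], ""))
        parts.extend(rel["ref"])
    text = " ".join(p for p in parts if p)
    return text[0].upper() + text[1:] + "."


action = {
    "id": "stay",
    "tense": "PA",
    "subj": ["mother"],
    "rel": [{"type": "P_PLACE", "ref": ["home"]}],
}
print(realize(action))  # prints "Mother stayed at home."
```

A real realizer handles irregular verbs, agreement and word order per language, which is why the slide relies on SimpleNLG-EnFr for that step.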

Page 22: Towards improving automatic text summaries

Demo

Demonstration

Page 23: Towards improving automatic text summaries

So...

Little has been done; more remains to be done.

Always remember: summarizing saves time.

Page 24: Towards improving automatic text summaries

Bibliography I

[1] E. Hovy and C.-Y. Lin, “Automated text summarization and the SUMMARIST system,” in Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998. Association for Computational Linguistics, 1998, pp. 197–214.

[2] K. Sparck Jones, “Automatic summarising: factors and directions,” in Advances in automatic text summarisation. Cambridge, MA: MIT Press, 1999.

[3] A. Aries, H. Oufaida, and O. Nouali, “Using clustering and a modified classification algorithm for automatic text summarization,” ser. Proc. SPIE, vol. 8658, 2013, pp. 865811–865811-9. [Online]. Available: http://dx.doi.org/10.1117/12.2004001

[4] A. Aries, D. E. Zegour, and K. W. Hidouci, “Allsummarizer system at multiling 2015: Multilingual single and multi-document summarization,” in Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, Czech Republic: Association for Computational Linguistics, September 2015, pp. 237–244. [Online]. Available: http://aclweb.org/anthology/W15-4634
