Towards improving automatic text summaries

Abdelkrime ARIES

Supervisors: Pr. Zegour & Pr. Hidouci

Research Group: D3 Team

École nationale Supérieure d’Informatique (ESI, ex. INI), Algeria

LCSI laboratory mid-term seminars: April 19th, 2016


Plan

1 Problem statement

2 Extractive methods

3 Abstractive methods

4 Demo

5 Thank you


Problem statement






Problem statement: Motivation

Why should we summarize?

Saving reading time

Showing content on small devices

Facilitating document selection

Helping in search






Introduction: Summarization classification

Following [1, 2], summarization can be classified by factors related to the input document, the purpose and the output document:

Input document
    Source size: single-document or multi-document
    Specificity: domain-specific or general
    Form: scale, genre

Purpose
    Audience: generic or query-oriented
    Usage: indicative or informative
    Expansiveness: background or just-the-news

Output document
    Derivation: extract or abstract
    Partiality: neutral or evaluative
    Conventionality: fixed or floating






Problem statement: Extractive vs. Abstractive

Extractive:

+ Fast, and needs fewer resources (CPU + data)

+ Easily applied to many languages (statistical)

- Incoherent text

- Only selects pertinent sentences, which may be unrelated to one another

Abstractive:

+ Good text presentation

+ Redundancy can be dealt with

- Slow, and needs a lot of resources (CPU + data)

- Hard to implement (language-dependent)






Problem statement: Multi-lingual systems

A multi-lingual system processes more than one language.

Language-independent application:

Fully independent

Partially independent

There are also cross-lingual systems.






Problem statement: Objectives

Create a multi-lingual system.

Introduce abstractive methods.

Improve our method [3].

Improve readability and coherence.





Extractive methods

AllSummarizer as an example






Extractive methods: AllSummarizer

[Architecture diagram: Input document(s) → Pre-processing → Processing → Extraction → Summary]

Pre-processing: Normalizer, Segmenter, Stemmer, Stop-word eliminator

Data: list of sentences; list of pre-processed words for each sentence

Processing: Clustering, Learning, Scoring

Data: list of clusters; P(f|C); sentence scores

Extraction: Extraction, ReOrdering

Data: summary size; list of the highest-scored sentences; reordered sentences
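To make the flow of the stages above concrete (pre-processing, scoring, extraction, re-ordering), here is a minimal Python sketch. It is not AllSummarizer's implementation: clustering, learning and the P(f|C)-based scoring are replaced by a plain average term-frequency score, stemming is skipped, and every name (`STOP_WORDS`, `segment`, `preprocess`, `summarize`) is hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "it"}


def segment(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(sentence):
    """Normalize (lower-case), tokenize and drop stop words; no stemming here."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, summary_size=2):
    """Score sentences by average term frequency, keep the best, restore order."""
    sentences = segment(text)
    words_per_sentence = [preprocess(s) for s in sentences]
    frequencies = Counter(w for words in words_per_sentence for w in words)

    def score(words):
        # Assumption: a simple stand-in for the system's real scoring step.
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    scores = [score(words) for words in words_per_sentence]
    # Extraction: indices of the highest-scored sentences.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Re-ordering: put the selected sentences back in document order.
    chosen = sorted(best[:summary_size])
    return [sentences[i] for i in chosen]
```

Calling `summarize(text, summary_size=1)` returns a one-sentence extract; the `summary_size` parameter plays the role of the "summary size" input in the diagram.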






Extractive methods: Improvements

Several improvements have been made to the original AllSummarizer system [3]:

1 Adding more features to the unigram and bigram term frequencies: sentence position, sentence length with stop words, and sentence length without stop words.

2 Adding more languages to the preprocessing task (27 languages): Arabic, Bulgarian, Catalan, Czech, German, Greek, English, Spanish, Basque, Persian, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Dutch, Nynorsk, Norwegian, Portuguese, Romanian, Russian, Swedish, Thai, Turkish and Chinese.





3 Testing the summarizer with more than 40 languages (we used default preprocessing for languages without a preprocessing task).

4 Fixing the problem of redundant sentences (especially in multi-document summarization). This was done by calculating the similarity between the last added sentence and the sentence to be added, then judging whether they are similar using the clustering threshold.

5 Estimating the threshold and the features for each language (multi-document and single-document summarization). For more information, see our participation in the MultiLing 2015 workshop (SIGDIAL conference) [4].
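The redundancy check of item 4 can be sketched as follows. This is an illustration under assumptions, not the system's code: cosine similarity over bags of words stands in for the similarity measure (which the slides do not name), a candidate is rejected when its similarity reaches the threshold, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def select_non_redundant(ranked_sentences, threshold, max_sentences):
    """Walk sentences from best to worst score and skip any candidate that is
    too similar to the last sentence added to the summary."""
    summary = []
    for words in ranked_sentences:
        if len(summary) >= max_sentences:
            break
        if summary and cosine_similarity(summary[-1], words) >= threshold:
            continue  # judged redundant with the last added sentence
        summary.append(words)
    return summary
```

Each element of `ranked_sentences` is the list of pre-processed words of one sentence, already sorted by decreasing score. Comparing only against the last added sentence keeps the check cheap, at the cost of missing redundancy with earlier sentences.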






Extractive methods: Links

Take a look: https://github.com/kariminf/AllSummarizer

Test it: allsummarizer-kariminf.rhcloud.com



Abstractive methods

Our vision






Abstractive methods: Our vision

[Architecture diagram: Extracted pertinent sentences → Sentence parsing → Reasoning → Text generation → Abstractive summary]

Sentence parsing: syntactic analysis, internationalization, structured format generation

Reasoning: information processing, response preparation

Text generation: style decision & realizer linking, concepts transformation, structured format generation, realization

Other elements shown: WordNet, Request, Format handler






Abstractive methods: Format handler

To communicate information between sentences, we proposed a new format called STON ("Sentence object notation").

It represents the morphological and syntactic characteristics of sentences in a multi-lingual way.

Take a look: https://github.com/kariminf/SentRep





Format:

Roles: each entity has a role to play in the clause (or sentence); it can be a subject, object, place, time, etc.

Actions: actions are the dynamic part of a clause; they link roles.

Sentences: the Role-Action model can’t represent all information. For instance, successive actions have to be represented somewhere.







Example: Mother stayed at home.

@r: [
    r:{
        id: mother;
        syn: 10332385;
    r:}
    r:{
        id: home;
        syn: 3259505;
    r:}
r:]

@act: [
    act:{
        id: stay;
        syn: 117985;
        tense: PA;
        subj: [mother];
        @rel: [
            rel:{
                type: P_PLACE;
                ref: [home];
            rel:}
        rel:]
    act:}
act:]

@st: [
    st:{
        type: AFF;
        act: [stay];
    st:}
st:]
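The three parts of this example (roles, actions, sentences) can be mirrored with plain data classes, as in the Python sketch below. It is only an illustration of the Role-Action-Sentence model, not the SentRep API: the class names, `check_references`, and the assumption that `subj` and `ref` always point to role ids are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    id: str
    syn: int              # value of the `syn` field in the example


@dataclass
class Relative:
    type: str             # e.g. "P_PLACE" in the example
    ref: List[str]        # ids the relation points to (assumed to be roles)


@dataclass
class Action:
    id: str
    syn: int
    tense: str            # e.g. "PA" in the example
    subj: List[str]       # ids of the subject roles
    rel: List[Relative] = field(default_factory=list)


@dataclass
class Sentence:
    type: str             # e.g. "AFF" in the example
    act: List[str]        # ids of the actions in this sentence


# "Mother stayed at home." with the values from the STON example.
roles = [Role("mother", 10332385), Role("home", 3259505)]
actions = [Action("stay", 117985, "PA", ["mother"], [Relative("P_PLACE", ["home"])])]
sentences = [Sentence("AFF", ["stay"])]


def check_references(roles, actions, sentences):
    """Return True if every id used by an action or a sentence is declared."""
    role_ids = {r.id for r in roles}
    action_ids = {a.id for a in actions}
    for a in actions:
        used = set(a.subj) | {ref for rel in a.rel for ref in rel.ref}
        if not used <= role_ids:
            return False
    return all(set(s.act) <= action_ids for s in sentences)
```

Keeping roles, actions and sentences in separate lists that refer to each other by id follows the layout of the example, where `subj: [mother]` and `act: [stay]` are references rather than nested objects.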






Abstractive methods: Sentence parsing

Work in progress.

For now, we are coding an English2Ston parser.

Syntactic analysis (English): Stanford parser.

So far, we can only parse sentences of the form: "Subject{simple singular} Verb{past, present simple} Object{simple singular}".

Take a look: https://github.com/kariminf/NaLanPar




Abstractive methods: Reasoning

Our aim:

Thoughts are language-independent.

The mind contains many thoughts.

People have thoughts about what others think.

Thoughts have a truth level: belief, thinking, fact, etc.

So ... this will be presented next time.






Abstractive methods: Text generation

Work in progress.

For now, we are coding Ston2English and Ston2French generators.

Sentence realization (English, French): SimpleNLG-EnFr.

So far, we can only generate sentences of the form: "Subject{simple singular} Verb{past, present simple} Object{simple singular}".

Take a look: https://github.com/kariminf/NaLanGen



Demo

Demonstration




Thank you

So ...

Little has been done; much remains to be done.

Always remember: summarizing saves time.





Bibliography

[1] E. Hovy and C.-Y. Lin, “Automated text summarization and the SUMMARIST system,” in Proceedings of a workshop held at Baltimore, Maryland: October 13-15, 1998. Association for Computational Linguistics, 1998, pp. 197–214.

[2] K. Sparck Jones, “Automatic summarising: factors and directions,” in Advances in automatic text summarisation. Cambridge, MA: MIT Press, 1999.

[3] A. Aries, H. Oufaida, and O. Nouali, “Using clustering and a modified classification algorithm for automatic text summarization,” ser. Proc. SPIE, vol. 8658, 2013, pp. 865811–865811-9. [Online]. Available: http://dx.doi.org/10.1117/12.2004001

[4] A. Aries, D. E. Zegour, and K. W. Hidouci, “AllSummarizer system at MultiLing 2015: Multilingual single and multi-document summarization,” in Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, Czech Republic: Association for Computational Linguistics, September 2015, pp. 237–244. [Online]. Available: http://aclweb.org/anthology/W15-4634

