Human Aided Text Summarizer “SAAR” using Reinforcement Learning

Paper ID: ISCMI2014-1-031E. Human Aided Text Summarizer “SAAR” using Reinforcement Learning. By: Chandra Prakash, ABV-IIITM Gwalior & Dr. Anupam Shukla, Professor, ABV-IIITM, Gwalior. 2014 Intl. Conference on Soft Computing & Machine Intelligence (ISCMI 2014)

Upload: chandra-prakash-meena

Post on 02-Jun-2018


8/10/2019 Human Aided Text Summarizer “SAAR” using Reinforcement Learning

http://slidepdf.com/reader/full/human-aided-text-summarizer-saar-using-reinforcement-learning 1/31

Paper ID: ISCMI2014-1-031E

Human Aided Text Summarizer“SAAR” using Reinforcement Learning

By : Chandra Prakash

ABV-IIITM Gwalior

&

Dr. Anupam Shukla

Professor, ABV-IIITM, Gwalior

2014 Intl. Conference on Soft Computing & Machine Intelligence (ISCMI 2014)


 Approach

Problem Definition

Motivation

Literature survey

Scope of Project

Methodology/Approach

Tools used

Result


Introduction


Real-time Problem

Imagine:

We have downloaded 1000+ papers and now want a summary of them.

We have a list of emails about a sports event and want a summary of those emails in one paragraph.

We have to study many books for an exam, and the summarizer gives the key concepts of the books as a few pages of notes.

Value for researchers

Get me everything the papers say about “Automatic Text Summarization”.


Definition

Automatic Summaries

• An active research area in which a computer automatically summarizes text from single or multiple documents

• A short summary that conveys the essence of the document

• Should be less than half the length of the original text

• Can be extractive or abstractive

• May be produced from single or multiple documents

Dipanjan Das, Andre F. T. Martins (2007). A Survey on Automatic Text Summarization. Literature Survey for the Language and Statistics II course at CMU, Pittsburgh.


Problem definition

With the advent of the information revolution (WWW), electronic documents are becoming a principal medium of business and academic information.

Thousands of electronic documents are produced and made available on the internet each day.

It is not easy to read each and every document.

Information Access Agent:

Search engines: Google, Yahoo, etc.

The information retrieved is far more than a user can handle and manage.

The user has to analyze the search results one by one until satisfied, which is time-consuming and inefficient.

What could be the possible solution then?


Problem definition (cont..)

Text summarization is not tailored to the user's specification.

A generic summary is not possible, because the desired summary changes as the user changes.

Even two humans cannot generate the same summary from a given document.

Internal factors (background, education, etc.) play a vital role in generating a summary.

What could be the possible solution now?


Solution: Human Aided Text Summarization

Benefits of summarization include:

Saves reading time

Value for researchers

Abstracts for scientific and other articles

Facilitates fast literature searches

Facilitates classification of articles and other written data

Improves search engines' indexing efficiency for web pages

Assists in storing the text in much less space

Headings for a given article/document

News summarization

Opinion Mining and Sentiment Analysis

Enables cell phones to access Web information

With human feedback: user-oriented summary


Previous Approach:

1958: Automatic creation of literature abstracts was proposed by H. P. Luhn at IBM.

Text Mining: Includes discovery of patterns and trends in data and of associations among entities in a document. Consists of three steps: text preparation, text processing and text analysis.

Text Summarization Methods:

Extraction: Construct the summary by taking the most important sentences.

Abstraction: Construct the summary by paraphrasing sections of the original document.


Types of Techniques:

Statistical techniques:

Based on term frequency.

Stop-word filtering: removes unwanted noise.

Stemming or lemmatization: maps different forms of the same word to a common form.

Determine term importance: Term Frequency/Inverse Document Frequency (TF-IDF) weighting scheme, etc.

Linguistic techniques:

Look for text semantics.

Extract sentences using parsing and other Natural Language Processing (NLP) steps.

Part-of-speech tagging is among the first steps.


Scope of Project

Problem Definition: Extractive Text Summarization

Single Document

Fully Automated Summarization (FAS)

Human Aided Machine Summarization (HAMS)

Machine Learning

Reinforcement Learning

Tools used:

Matlab

Java


Earlier Methodology proposed (FAS)

Chandra Prakash, Anupam Shukla, “Automated summary generation from single document using information gain”, Contemporary Computing, Communications in Computer and Information Science, Volume 94, Springer, pp. 152-159, 2010.


Methodology proposed (HAMS)


Keyword Significant Factor


Solution

Approach for the Problem

Input: A text document is fed into the system.

Preprocessing:

Tokenization: Divides the character sequence into words; sentence splitting further divides sequences of words into sentences, and so on.

Stemming or Lemmatization

Stop word filtering

Feature Extraction

Sentence Ranking: Machine Learning

Human Feedback

Output/Result: Generated summary (an abstract).
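The preprocessing stage listed on this slide (sentence splitting, tokenization, stop word filtering, stemming) can be sketched roughly as follows. This is only an illustrative Python sketch, not the SAAR implementation (the slides name Matlab and Java as tools): the stop-word list, the suffix-stripping rule and all function names are assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption; real systems use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "for"}

def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lower-case the sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Return one list of filtered, stemmed terms per sentence."""
    result = []
    for sentence in split_sentences(text):
        terms = [naive_stem(t) for t in tokenize(sentence) if t not in STOP_WORDS]
        result.append(terms)
    return result
```

For instance, `preprocess("The cats are sleeping. Dogs barked!")` yields one term list per sentence, with stop words removed and suffixes stripped. A production system would replace `naive_stem` with a proper stemmer or lemmatizer.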


Methodology Steps..

Methodology for text summarization involves Term Selection using Pre-Processing

Tokenization or Segmentation

Stop word Filtering

Stemming or Lemmatization

Term weighting

Term Frequency (TF):

$W_i(T_j) = f_{ij}$

where $f_{ij}$ is the frequency of the $j$-th term in sentence $i$.

Inverse Sentence Frequency (ISF):

$W_i(T_j) = f_{ij} \times \log\left(\frac{N}{n_j}\right)$

where $N$ = number of sentences in the collection and $n_j$ = number of sentences in which term $j$ appears.
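As a rough illustration of the term weighting step, the sketch below computes, for each sentence, the term frequency f_ij multiplied by log(N / n_j), where N is the number of sentences and n_j the number of sentences containing term j. The function name `tf_isf`, the input format (one list of terms per sentence) and the use of the natural logarithm are assumptions; the slides do not state the log base.

```python
import math
from collections import Counter

def tf_isf(sentences):
    """sentences: list of term lists. Returns one {term: weight} dict per sentence."""
    n_sentences = len(sentences)  # N
    # n_j: number of sentences in which term j appears
    sentence_freq = Counter()
    for terms in sentences:
        sentence_freq.update(set(terms))
    weights = []
    for terms in sentences:
        tf = Counter(terms)  # f_ij: frequency of term j in sentence i
        weights.append(
            {t: f * math.log(n_sentences / sentence_freq[t]) for t, f in tf.items()}
        )
    return weights
```

A term that occurs in every sentence gets weight 0 under this scheme, since log(N / N) = 0, which is the intended effect of the inverse sentence frequency factor.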


Methodology Steps (cont…)

Information Gain is calculated as

$IG = (TFW)_i + ISFS(T_j)_i + (NSL)_i + (SPS)_i + (PNS)_i$

where $i$ is the sentence and $j$ is the term.

Term-Sentence matrix after IG:

$$TSM = \begin{pmatrix} IG(W_{11}) & IG(W_{12}) & \cdots & IG(W_{1n}) \\ IG(W_{21}) & IG(W_{22}) & \cdots & IG(W_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ IG(W_{m1}) & IG(W_{m2}) & \cdots & IG(W_{mn}) \end{pmatrix}$$


Elements of reinforcement learning

Agent: Intelligent programs

Environment: External condition

Policy: Defines the agent’s behavior at a given time. A mapping from states to actions. Lookup tables or simple functions.

An agent learns behavior through trial-and-error interactions with a dynamic environment.

[Figure: agent-environment loop with State, Reward, Action and Policy]


Methodology Steps (cont…)

Processing Step:

Action: Sentence scoring using Reinforcement Learning

Selection policy: ε-greedy

$$a = \begin{cases} a^{*} & \text{with probability } 1 - \varepsilon \\ \text{random action} & \text{with probability } \varepsilon \end{cases}$$

where $a^{*}$ is the greedy (highest-valued) action.

In our approach we consider:

State: Sentences

Action: Updating the term weight

Policy: Update the term weights to maximize the sentence rank

Reward: Scalar value of the term (IG)

Learning algorithm: Q-Learning
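The two ingredients named on this slide, ε-greedy action selection and Q-Learning, can be sketched as below. The slides do not give SAAR's update rule, so the code uses the standard textbook Q-learning update Q(s, a) ← Q(s, a) + α [r + γ max Q(s', ·) − Q(s, a)] as an assumption; the learning rate `alpha`, discount `gamma`, the list-of-lists table layout and both function names are hypothetical.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice: index of the largest Q-value in this state's row
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update, applied in place to the table q (list of rows)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]
```

With `epsilon=0.0` the selection is purely greedy; larger values trade exploitation for exploration. In the setting described here, the reward passed to `q_update` would be the IG value of the term, though exactly how SAAR maps sentences and term updates onto table rows and columns is not specified in the slides.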


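Processing Step:

Matrix Q: learning matrix.

$$TSM = \begin{pmatrix} IG(W_{11}) & IG(W_{12}) & \cdots & IG(W_{1n}) \\ IG(W_{21}) & IG(W_{22}) & \cdots & IG(W_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ IG(W_{m1}) & IG(W_{m2}) & \cdots & IG(W_{mn}) \end{pmatrix}$$

$$TSM_{updated} = \begin{pmatrix} IG_{updated}(W_{11}) & IG_{updated}(W_{12}) & \cdots & IG_{updated}(W_{1n}) \\ IG_{updated}(W_{21}) & IG_{updated}(W_{22}) & \cdots & IG_{updated}(W_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ IG_{updated}(W_{m1}) & IG_{updated}(W_{m2}) & \cdots & IG_{updated}(W_{mn}) \end{pmatrix}$$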


Summary Generation :

Sentence selection: Euclidean n-space

$P = (p_1, p_2, \dots, p_n)$

$Q = (q_1, q_2, \dots, q_n)$
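For reference, the standard Euclidean distance between two points $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_n)$ in n-space is given below. The formula itself did not survive in the transcript, so this is the textbook definition rather than a confirmed copy of the slide, and how SAAR uses the distance to pick sentences is not stated here.

$$d(P, Q) = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$$

For example, with $P = (1, 2)$ and $Q = (4, 6)$, $d(P, Q) = \sqrt{3^2 + 4^2} = 5$.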

Dataset

Articles from “The Hindu” (June 2013)

DUC’06 sets of documents:

12 document sets

Number of documents in each set: 25

Average number of sentences: 32

300 document summaries


Evaluation

Evaluation Techniques

$\text{Precision } (P) = 100 \times \frac{r}{K_m}$

$\text{Recall } (R) = 100 \times \frac{r}{K_h}$

$F\text{-score} = \frac{2 \times P \times R}{P + R} = 100 \times \frac{2r}{K_h + K_m}$

where $r$ is the number of common sentences, $K_m$ is the length of the machine-generated summary and $K_h$ is the length of the human-generated summary.

Available automated text summarizers: Open Text Summarizer (OTS), Pertinence Summarizer (PS), and Extractor Test Summarizer Software (ETSS).

The compression ratio is 30%.
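The evaluation measures can be computed as in the sketch below, where r is the number of sentences common to the machine and human summaries. It assumes that summary length is counted in sentences and that each summary is given as a collection of sentence identifiers; the function name `evaluate_summary` is hypothetical.

```python
def evaluate_summary(machine, human):
    """machine, human: collections of selected sentence ids.

    Returns (precision, recall, f_score), each as a percentage.
    """
    machine, human = set(machine), set(human)
    r = len(machine & human)             # number of common sentences
    k_m, k_h = len(machine), len(human)  # summary lengths in sentences
    precision = 100.0 * r / k_m if k_m else 0.0
    recall = 100.0 * r / k_h if k_h else 0.0
    f_score = 100.0 * 2 * r / (k_m + k_h) if (k_m + k_h) else 0.0
    return precision, recall, f_score
```

For example, a machine summary of 4 sentences and a human summary of 5 sentences that share 3 sentences give P = 75, R = 60 and an F-score of about 66.67.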


Comparison of generated text summary for HAMS

Comparison of Recall, Precision Value and F-score for HAMS

Methods                 Precision Value (P)   Recall Value (R)   F-score
SAAR (user feedback)    90                    85                 87.42
IG summary              75                    65                 70.57
OTS                     75                    60                 66.66
PS                      75                    60                 66.66
ETSS                    75                    60                 66.66

Result

[Figure: bar chart of F-Score, Recall Value (R) and Precision Value (P) on a 0 to 100 scale for SAAR Based, IG Summary, OTS, PS and ETSS]

Compared with some available automated text summarizers: Open Text Summarizer (OTS), Pertinence Summarizer (PS), and Extractor Test Summarizer Software (ETSS).


Conclusion and future scope

A novel approach for human aided text summarization using user feedback on a single document.

This summarization by extract will be good enough for a reader to understand the main idea of a document, though the understandability might not be as good as that of a summary by abstract.

As future work, this approach can be extended to multi-document summary extraction using machine learning.

We can introduce the concept of multiple agents into the system. This would increase its speed as well as make the summary or abstract more generic.


References

1. Verma R., Chen P., “Integrating Ontology Knowledge into a Query-based Information Summarization System”, DUC 2007, Rochester, NY, 2007.

2. Luhn H. P., “The automatic creation of literature abstracts”, IBM Journal of Research and Development, vol. 2, pp. 159-165, 1958.

3. Edmundson H. P., “New Methods in Automatic Extracting”, Journal of the ACM (JACM), vol. 16, no. 2, pp. 264-285, 1969.

4. Salton G., Buckley C., “Term-Weighting Approaches in Automatic Text Retrieval”, Information Processing & Management, vol. 24, pp. 513-523, 1988.

5. Luhn H. P., “A Statistical Approach to Mechanized Encoding and Searching of Literary Information”, IBM Journal of Research and Development, pp. 309-317, 1957.

6. Salton G., Buckley C., “Term-Weighting Approaches in Automatic Text Retrieval”, Information Processing & Management, vol. 24, pp. 513-523, 1988.

7. Kupiec J. et al., “A trainable document summarizer”, In Proceedings of SIGIR, 1995.

8. Conroy J. M., O'Leary D. P., “Text summarization via hidden Markov model”, In Proceedings of SIGIR '01, pp. 406-407, New York, NY, USA, 2001.

9. Agarwal N., Ford K. H., Shneider M., “Sentence Boundary Detection using a MaxEnt Classifier”.

10. García-Hernández R. A., Ledeneva Y., “Word Sequence Models for Single Text Summarization”, 2009 Second International Conferences on Advances in Computer-Human Interactions, pp. 44-48, 2009.

11. The Hindu [http://www.hinduonnet.com/] Accessed on 23rd June 2009.

12. Van Rijsbergen C. J., Information Retrieval, 2nd edition. Dept. of Computer Science, University of Glasgow, 1979.


References

13. V. A. Yatsko and T. N. Vishnyakov (2006). A Method for Evaluating Modern Systems of Automatic Text Summarization.

14. S. Hariharan and R. Srinivasan (2008). Investigations in single document summarization by extraction method.

15. René Arnulfo García-Hernández and Yulia Ledeneva (2009). Word Sequence Models for Single Text Summarization.

16. Kyoomarsi, F.; Khosravi, H.; Eslami, E.; Dehkordy, P. K.; Tajoddin, A. Optimizing Text Summarization Based on Fuzzy Logic. In Proceedings of Computer and Information Science, 2008. ICIS 08.

17. Sparck-Jones, K. Automatic summarizing: factors and directions. In Mani, I.; Maybury, M. Advances in Automatic Text Summarization. The MIT Press (1999) 1-12.

18. Hovy, E. and C.-Y. Lin (1997). Automated Text Summarization in SUMMARIST. In Proceedings of the ACL97/EACL97 Workshop on Intelligent Scalable Text Summarization, Madrid, Spain.

19. Mani, I. and M. T. Maybury (editors) (1999). Advances in Automatic Text Summarization. MIT Press, Cambridge, MA.

20. Sparck-Jones, K. (1999). Automatic Summarizing: Factors and Directions. In Mani, I. and M. T. Maybury (editors), Advances in Automatic Text Summarization, pp. 1-13. The MIT Press.

21. Lin, C.-Y. and E. Hovy (2000). The automated acquisition of topic signatures for text summarization. In Proceedings of the 18th COLING Conference, Saarbrücken, Germany.

22. Baldwin, B., R. Donaway, E. Hovy, E. Liddy, I. Mani, D. Marcu, K. McKeown, V. Mittal, M. Moens, D. Radev, K. Sparck-Jones, B. Sundheim, S. Teufel, R. Weischedel, and M. White (2000). An Evaluation Road Map for Summarization Research. http://www-nlpir.nist.gov/projects/duc/papers/summarization.roadmap.doc.
