text summarization

Post on 04-Aug-2015


Saturday, April 15, 2023


Text Summarization

For Review and Feedback

By: Aman Sadhwani


What is Text Summarization? And why do we need it?

• A summary is a text that reflects the main and most important sentences of the original text. In text summarization, the summary is generated by a computer.

• In recent years, the amount of textual information has been growing rapidly. It becomes harder for users to read all of it, and they lose interest. Text summarization was introduced to address this problem.


Types of Text Summarization

1) Extraction: In extractive text summarization, the summary is generated by selecting a set of words, phrases, sentences, or paragraphs from the original document.

2) Abstraction: Abstractive methods build a semantic representation of the text and then use natural language processing techniques to generate a summary that is closer to one written by a person. This kind of summary may contain words that are not found in the original document. This method is an active research area and is in growing demand.

Proposed System


We have developed and compared two text summarization techniques:

1) Reduction based

2) Intersection based


How the Reduction Algorithm Works

Step 1 - Takes a text as input.

Step 2 - Splits it into one or more paragraphs.

Step 3 - Splits each paragraph into one or more sentences.

Step 4 - Splits each sentence into one or more words.

Step 5 - Gives each sentence a weight (a floating-point value) by comparing its words against a predefined dictionary called "stopWords.txt".

If a word of the sentence matches any word in the predefined dictionary, that word is considered low weighted.


Step 6 - Prepares an ordered list of weighted sentences (relatively high-weighted sentences come first and low-weighted sentences come last).

Step 7 - Stores each sentence from the ordered list in the output variable (a list) until it reaches the reduction ratio (a formula determines the maximum number of sentences to put in the output list).

Step 8 - Returns the output list.
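
The eight steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the slides: the small `STOP_WORDS` set stands in for the contents of "stopWords.txt", the weight is assumed to be the fraction of a sentence's words that are not stop words, and the maximum number of sentences is assumed to be the sentence count multiplied by the reduction ratio. The function names are hypothetical, and the sketch targets Python 3 rather than Python 2.7.

```python
import re

# Hypothetical stand-in for the words listed in "stopWords.txt".
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it", "on"}


def split_sentences(text):
    """Steps 2-3: split the text into paragraphs, then into sentences."""
    sentences = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append(sentence)
    return sentences


def sentence_weight(sentence):
    """Steps 4-5: assumed weight = share of words that are not stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    content_words = [w for w in words if w not in STOP_WORDS]
    return len(content_words) / len(words)


def reduce_text(text, ratio=0.5):
    """Steps 6-8: rank sentences by weight and keep the top share."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    ranked = sorted(sentences, key=sentence_weight, reverse=True)
    # Assumed formula for the maximum number of output sentences.
    limit = max(1, int(len(sentences) * ratio))
    return ranked[:limit]


if __name__ == "__main__":
    sample = ("The cat is on the mat. "
              "Summarization algorithms rank sentences. "
              "It is in the box.")
    print(reduce_text(sample, ratio=0.5))
```

With this weighting, a sentence made up mostly of stop words sinks to the bottom of the ordered list, so it is the first to be dropped when the ratio is lowered.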


How the Intersection Algorithm Works

1. Split the input text into paragraphs.

2. Split each paragraph into sentences.

3. Split each sentence into words.

4. Calculate the intersection between two sentences.

5. Remove non-alphabetic characters from each sentence.

6. Convert the content into a dictionary.

7. Build the sentence dictionary.

8. Return the best sentence in each paragraph.

9. Get the best sentences according to the dictionary.
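
A minimal Python sketch of these steps follows. It rests on several assumptions that the slides do not spell out: the intersection of two sentences is taken as the number of shared words divided by the average number of words in the two sentences, a sentence's score in the dictionary is the sum of its intersections with every other sentence, and one best sentence is kept per paragraph. All function names are hypothetical, and the code targets Python 3.

```python
import re


def split_paragraphs(text):
    """Step 1: split the input text into non-empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Step 2: split a paragraph into sentences."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def intersection_score(first, second):
    """Steps 3-4: shared words, normalized by the average sentence length."""
    words_a = set(re.findall(r"[a-z]+", first.lower()))
    words_b = set(re.findall(r"[a-z]+", second.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / ((len(words_a) + len(words_b)) / 2)


def sentence_key(sentence):
    """Step 5: drop non-alphabetic characters to get a dictionary key."""
    return re.sub(r"[^A-Za-z]+", "", sentence)


def build_sentence_dictionary(text):
    """Steps 6-7: map each sentence key to its total intersection score."""
    sentences = [s for p in split_paragraphs(text) for s in split_sentences(p)]
    scores = {}
    for i, current in enumerate(sentences):
        scores[sentence_key(current)] = sum(
            intersection_score(current, other)
            for j, other in enumerate(sentences)
            if j != i
        )
    return scores


def best_sentence(paragraph, scores):
    """Step 8: return the highest-scoring sentence of one paragraph."""
    return max(split_sentences(paragraph),
               key=lambda s: scores.get(sentence_key(s), 0.0))


def summarize(text):
    """Step 9: collect the best sentence of every paragraph."""
    scores = build_sentence_dictionary(text)
    return [best_sentence(p, scores) for p in split_paragraphs(text)]


if __name__ == "__main__":
    sample = ("Cats chase mice. Cats chase birds. Birds sing.\n\n"
              "Python is fun. Python is fast. Go is fast.")
    print(summarize(sample))
```

Because this sketch keeps one sentence per paragraph, a text with only one or two paragraphs yields a one- or two-line summary.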


Flow Chart


Screenshots


Conclusion


Looking at the last table, we can say that intersection is faster than reduction.

However, reduction creates a better summary than intersection.

Intersection works fine on some documents but generates only one or two lines of summary on others.

This is because intersection is the most basic algorithm for text summarization. Unlike reduction, it does not use any NLP libraries.

Hardware & Software Requirements


Minimum Hardware Requirements

Processor: Intel Pentium II or higher

RAM: 128 MB or higher

Monitor, keyboard, mouse

Printer (optional)

Hard disk: 20 GB or higher

Software Requirements

OS: Windows XP or higher

Java installed on the machine

Python 2.7 installed on the machine


Tools used

NetBeans

Python 2.7 IDLE




Future Enhancements

Support for summarizing multiple file types.

User-wise document management.

Multi-document summarization.

Improved summarization algorithms.


THANK YOU
