Automatic Text Summarization

Katja Filippova

EML Research gGmbH

TU Darmstadt

Text Summarization – 25.02.2009 – p. 1

Text summarization

• A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) (Hovy, 2003)

• information retrieval
• stock market prediction
• generation of abstracts
• online news summarization
• ...

Overview

• Introduction
➠ classification of summarization systems
➠ abstraction vs. extraction

• Text cohesion and coherence for summarization
➠ graph based methods
➠ discourse structure based methods

• Document Understanding Conference
➠ tasks
➠ an example

• Research directions
➠ sentence fusion and compression
➠ integrating world knowledge

Text summarization: types

• A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) (Hovy, 2003)

• Indicative
➠ indicates types of information
➠ “alerts”

• Informative
➠ includes quantitative/qualitative information
➠ “informs”

• Critical/evaluative
➠ evaluates the content of the document

INDICATIVE

• The work of Consumer Advice Centres is examined. The information sources used to support this work are reviewed. The recent closure of many CACs has seriously affected the availability of consumer information and advice. The contribution that public libraries can make in enhancing the availability of consumer information and advice both to the public and other agencies involved in consumer information and advice, is discussed.

INFORMATIVE

• An examination of the work of Consumer Advice Centres and of the information sources and support activities that public libraries can offer. CACs have dealt with pre-shopping advice, education on consumers’ rights and complaints about goods and services, advising the client and often obtaining expert assessment. They have drawn on a wide range of information sources including case records, trade literature, contact files and external links. The recent closure of many CACs has seriously affected the availability of consumer information and advice. Libraries can cooperate closely with advice agencies through local coordinating committees, shared premises, joint publicity, referral and the sharing of professional expertise.

Text summarization: types

• Source: single-document vs. multi-document
➠ research paper
➠ proceedings of a conference

• Content: generic vs. query-based vs. user-focused
➠ equal coverage of all major topics
➠ based on a question “what are the causes of the war?”
➠ users interested in chemistry

• Form: extract vs. abstract
➠ fragments from the document
➠ newly re-written text

Extraction vs. abstraction

How should a text summarization system proceed?

• read the documents

• understand them – build a semantic representation

• generate a summary from this representation

• unfortunately, a rich semantic representation is not possible yet

• to date, most summarization systems are extractive

• usually, extraction units are sentences

• low cost solution: could work without ontologies, complex representations, etc.

• extractive summaries are usually incoherent

• trade-off between non-redundancy and completeness

Three sentences from related documents (Oct. 27, 2008):

• The Syrian foreign minister today condemned the killing of eight civilians in a US raid as an act of "criminal and terrorist aggression". (The Guardian)

• Syria accused the United States on Monday of carrying out a "terrorist aggression" after a deadly raid near its border with Iraq which it said killed eight civilians. (Reuters)

• Lebanese President Michel Suleiman on Monday contacted his Syrian counterpart Bashar Assad to denounce "Sunday’s American aggression" against the Syrian village of Abu Kamal near the border with Iraq, local Elnashra website reported. (Aljazeera)

• extractive summaries are not coherent – sentences pulled out of different documents each make sense on their own but sound awkward when put together

• unresolved pronouns may distort the meaning

• beginning with a sentence which starts with However, ... is not a good idea

• there is a striking difference with human generated texts – pronouns and connectives are in the right place, the flow of discourse makes sense

• How could one use this property of natural discourse for summarization?

Text coherence vs. text cohesion

• John enjoys playing the piano. John wants to become a famous piano player. John works hard and works hard every day. Working hard is necessary to become a famous piano player.

• John enjoys playing the piano. However, he woke up early yesterday. But the day before yesterday the weather was wonderful, because rain and snow started immediately and continued the whole day through. By the way, his teacher did the same.

• John enjoys playing the piano and wants to become famous. He works hard and does it every day because it is necessary for his goal.

• Text coherence represents the overall structure of a multi-sentence text in terms of macro-level relations between clauses or sentences (Halliday & Hasan, 1996).
➠ Rhetorical Structure Theory (Mann & Thompson, 1988)
➠ Discourse Representation Theory (Kamp, 1981)
➠ Discourse Lexicalized Tree Adjoining Grammar (Forbes et al., 2001)

• John enjoys playing the piano. [John wants to become a famous piano player.] (that’s why) [John works hard and works hard every day.] Working hard is necessary to become a famous piano player.

• Text cohesion involves relations between words, word senses, or referring expressions, which determine how tightly connected the text is (Halliday & Hasan, 1996).
➠ anaphora, ellipsis, connectives
➠ synonymy and other lexical relations

• John enjoys playing the piano. However, he woke up early yesterday. But the day before yesterday the weather was wonderful, because rain and snow started immediately and continued the whole day through. By the way, his teacher did the same.

Coherence based summarization

• earlier systems considered technical documents and aimed at identifying important information by assigning weights to sentences (Luhn, 1958; Edmundson, 1969)

• several weighted features were used:
➠ word (stem) frequency
➠ presence of cue words (e.g., as a result, significant) which signal important content
➠ sentence position
➠ document structure

• feature weights were tuned manually
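
The weighted-feature idea above can be sketched in a few lines. This is only an illustration, not the scoring of Luhn or Edmundson: the stopword list, the cue-word list, the 1/(position) prior and the default weights are all assumptions chosen for the example, and document structure is left out.

```python
import re
from collections import Counter

# Hypothetical resources; real systems used much larger, hand-built lists.
STOPWORDS = {"a", "an", "and", "as", "in", "is", "of", "the", "to"}
CUE_WORDS = {"significant", "result"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokens(sentence):
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]

def score_sentences(text, w_freq=1.0, w_cue=1.0, w_pos=1.0):
    """Return (score, index, sentence) triples using three weighted features."""
    sentences = split_sentences(text)
    freq = Counter(t for s in sentences for t in tokens(s))
    top = max(freq.values(), default=1)
    scored = []
    for i, s in enumerate(sentences):
        toks = tokens(s)
        # Average normalised frequency of the content words in the sentence.
        f = sum(freq[t] for t in toks) / (top * len(toks)) if toks else 0.0
        cue = 1.0 if CUE_WORDS & set(toks) else 0.0  # cue-word presence
        pos = 1.0 / (i + 1)                          # earlier sentences score higher
        scored.append((w_freq * f + w_cue * cue + w_pos * pos, i, s))
    return scored

def extract(text, n=1, **weights):
    """Pick the n best sentences and return them in document order."""
    best = sorted(score_sentences(text, **weights), key=lambda x: (-x[0], x[1]))[:n]
    return [s for _, _, s in sorted(best, key=lambda x: x[1])]
```

Changing `w_freq`, `w_cue` and `w_pos` by hand plays the role of the manual tuning mentioned above.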

• Rhetorical Structure Theory (Mann & Thompson, 1988)
➠ elaboration
➠ example
➠ contrast
➠ background
➠ motivation
➠ etc.

[Figure: discourse tree for the sentence "I am optimistic" said Mr. Smith as the market plunged, labelled with the relations Attribution and Circumstance (from Sporleder & Lapata, 2005)]

• one could use discourse structure for summarization (Marcu, 2000)

• however, this is not done often:
➠ there are few discourse parsers and they are not very precise
➠ it is debated whether a tree representation is sufficient for discourse (Wolf & Gibson, 2005)
➠ it is not obvious how to classify rhetorical relations
➠ some relations are argued to be anaphoric rather than discourse relations (Webber et al., 2003)

Cohesion based summarization

• it is common to represent a text as a graph, where nodes are sentences and edges are some relations between them (e.g., discourse relations or just similarity)

• a common graph connectivity assumption is that the nodes which are connected to many other nodes are likely to carry salient information

• it is also assumed that nodes whose removal affects the structure of the document are important (Skorochodko, 1972, cited in Mani, 2001)
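
A minimal sketch of the connectivity assumption, under simplifying choices that are not from the slides: two sentences are linked when they share at least `min_shared` non-stopword words, and salience is just the node degree. The stopword list and the function names are hypothetical.

```python
import re

STOPWORDS = {"a", "and", "in", "of", "the", "to"}  # illustrative only

def words(sentence):
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}

def build_graph(sentences, min_shared=1):
    """Undirected graph: node = sentence index, edge = enough shared words."""
    bags = [words(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if len(bags[i] & bags[j]) >= min_shared:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def degree_ranking(graph):
    """Nodes ordered by number of neighbours, most connected first."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))
```

With the sentences "the balloon flight around the world", "the balloon pilot landed", "the crew finished the flight" and "stocks fell sharply", the first sentence is linked to the second and third, while the last one stays isolated.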

• modern approaches extend this idea and use PageRank (Brin & Page, 1998) to find salient nodes in such a graph (Erkan & Radev, 2004; Mihalcea & Tarau, 2004)
➠ similar sentences are connected (bag-of-words similarity)
➠ a similarity threshold is used
➠ the top N PageRank-ranked sentences are extracted
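
The three steps (connect similar sentences, apply a threshold, take the top N by PageRank) can be sketched as follows. This is written in the spirit of the cited graph-based systems but does not reproduce either of them: the plain cosine over raw word counts, the threshold of 0.2, the damping factor of 0.85 and the fixed iteration count are assumptions for the example.

```python
import math
import re
from collections import Counter

def bag(sentence):
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pagerank(adj, damping=0.85, iterations=50):
    """Power iteration on an adjacency list; isolated nodes spread rank uniformly."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, neighbours in enumerate(adj):
            if neighbours:
                share = damping * rank[i] / len(neighbours)
                for j in neighbours:
                    new[j] += share
            else:
                for j in range(n):
                    new[j] += damping * rank[i] / n
        rank = new
    return rank

def top_sentences(sentences, threshold=0.2, n=2):
    """Return the n highest-ranked sentences in document order."""
    bags = [bag(s) for s in sentences]
    adj = [[j for j in range(len(bags))
            if j != i and cosine(bags[i], bags[j]) >= threshold]
           for i in range(len(bags))]
    rank = pagerank(adj)
    best = sorted(range(len(sentences)), key=lambda i: (-rank[i], i))[:n]
    return [sentences[i] for i in sorted(best)]
```

The threshold is the tuned parameter mentioned above: raising it makes the graph sparser and the ranking more sensitive to a few strong links.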

Coherence vs. cohesion based TS

• Coherence:
+ transparent; coherence of the output can be improved
– annotation of relations is still a challenge; preprocessing difficulties

• Cohesion:
+ intuitively appealing; low-cost; even unsupervised
– requires WSD*, anaphora resolution; hard to pin down; tuned thresholds

* word sense disambiguation

DUC competitions

• Document Understanding Conferences (2000-2007)
➠ from 2008 Text Analysis Conference (TAC)

• provide participants with
➠ a task
➠ data
➠ manual and automatic evaluation

• increasing challenge in tasks: from generic single-document summarization to multi-document update summary (2008)

Sample topic: D0740I

round-the-world balloon flight

Report on the planning, attempts and first successful balloon circumnavigation of the earth by Bertrand Piccard and his crew.

<DOC>
<DOCNO> APW19981112.0453 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 11/12/1998 08:21:00 </DATE_TIME>
<HEADER> w1942 &Cx1f; wstm- r i &Cx13; &Cx11; BC-Switzerland-BalloonQu
11-12 0355 </HEADER>
<BODY>
<SLUG> BC-Switzerland-Balloon Quest </SLUG> <HEADLINE> Swiss challenger
prepares third attempt at global record </HEADLINE> &UR; AP Photos GEV
101-102 &QL; <TEXT> GENEVA (AP) _ Swiss balloon pilot Bertrand Piccard
and his new teammate, British flight engineer Tony Brown, said Thursday
they will be ready later this month for a new attempt to fly nonstop
round the world. Their new Breitling Orbiter 3 balloon will take off
from Chateau d’Oex, in the Swiss Alps, as soon after Nov. 25 as weather
conditions are favorable, they said. It will be Piccard’s third attempt
to become the first to pilot a balloon around the world. In February
the Swiss pilot, along with British flight engineer Andy Elson and
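
Before any of the later stages can run, fields have to be pulled out of documents in this tagged format. A minimal sketch with regular expressions follows; the helper names `field` and `body_text` are hypothetical, and the approach assumes the tags are not nested within themselves. Because the sample shown on the slide is cut off, `body_text` also accepts a missing closing `</TEXT>`.

```python
import re

def field(doc, tag):
    """Text between <TAG> and </TAG> with whitespace collapsed, or None if absent."""
    m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", doc, re.S)
    return " ".join(m.group(1).split()) if m else None

def body_text(doc):
    """Article text; tolerates a truncated document without </TEXT>."""
    m = re.search(r"<TEXT>\s*(.*?)\s*(?:</TEXT>|$)", doc, re.S)
    return " ".join(m.group(1).split()) if m else None
```

For the document above, `field(doc, "DOCNO")` would give the document identifier and `field(doc, "HEADLINE")` the headline joined onto one line.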

The EML NLP group at DUC 2007

Preprocessing: Annotation

• Sentence splitting
• Tokenization
• PoS tagging
• Chunking
• Named entity recognition

Preprocessing: Problems

• Sentence splitting
<sentence>At Pine Ridge, a scrolling marquee at Big Bat’s Texaco expressed both joy over Clinton’s visit and wariness of all the official attention: “Welcome President Clinton.</sentence> <sentence>Remember our treaties,” the sign read.

• and cleaning
<sentence> PINE RIDGE, S.D.</sentence>
<sentence> (AP) - President Clinton turned the attention of his national poverty tour today to arguably the poorest, most forgotten U.S. citizens of them all: American Indians.</sentence>
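
The first problem above, a split inside a quotation, is easy to reproduce. The sketch below contrasts a naive punctuation-based splitter with a variant that refuses to split while a curly quotation is still open. Both functions are illustrative and are not the splitter used in the system; neither handles abbreviations such as "S.D.", which is the second problem shown.

```python
import re

def naive_split(text):
    # Split after ., ! or ? whenever whitespace follows.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def quote_aware_split(text):
    """Like naive_split, but never splits between an opening “ and its closing ”."""
    sentences, start, depth = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "“":
            depth += 1
        elif ch == "”":
            depth = max(0, depth - 1)
        elif ch in ".!?" and depth == 0:
            # Only split at the end of the text or before whitespace.
            if i + 1 == len(text) or text[i + 1].isspace():
                sentences.append(text[start:i + 1].strip())
                start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

On the Pine Ridge sentence, `naive_split` returns two fragments (the split falls after "Clinton."), while `quote_aware_split` keeps the sentence whole.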

Preprocessing: Document filtering

• Match topic with document extracts
• Pick the top 5 matching documents

Semantic analysis

• Filter topic
• Connect topic words with words in document sentences
• Compute sentence scores
➠ matching words
➠ matching word sequences

➠ ranked list of sentences
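
One way to read these steps is sketched below. The slides do not give the actual scoring formula, so everything here is an assumption: "filter topic" is taken to mean stopword removal, "matching word sequences" is approximated by shared bigrams of content words, and sequences are weighted twice as much as single words.

```python
import re

STOPWORDS = {"a", "and", "by", "in", "of", "on", "the", "was"}  # illustrative only

def content_tokens(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))

def score(topic, sentence, seq_weight=2.0):
    """Shared content words plus weighted shared word sequences (bigrams)."""
    t, s = content_tokens(topic), content_tokens(sentence)
    word_matches = len(set(t) & set(s))
    seq_matches = len(bigrams(t) & bigrams(s))
    return word_matches + seq_weight * seq_matches

def rank_sentences(topic, sentences):
    """Sentences sorted by descending score; ties keep document order."""
    return sorted(sentences, key=lambda s: score(topic, s), reverse=True)
```

For the topic title "round-the-world balloon flight", a sentence containing "balloon flight" scores higher than one that only mentions "flight".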

Extractive summary generation

• Rerank sentences
• Select the top non-redundant sentences (250 word limit)
• Re-arrange sentences
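
The selection step can be sketched as a greedy pass over the ranked list. The 250-word limit is from the slide; the redundancy test (Jaccard word overlap above 0.5 with an already chosen sentence) is an assumption, since the slides do not say how redundancy was measured.

```python
import re

def word_set(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))

def overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select(ranked, word_limit=250, max_overlap=0.5):
    """Walk the ranked list; keep a sentence if it fits and is not redundant."""
    chosen, used = [], 0
    for sentence in ranked:
        length = len(sentence.split())
        if used + length > word_limit:
            continue  # too long for the remaining budget
        if any(overlap(sentence, c) > max_overlap for c in chosen):
            continue  # too similar to something already selected
        chosen.append(sentence)
        used += length
    return chosen
```

The chosen sentences would then still need to be re-arranged, for example by source date and position, before they are output.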

A good summary

Round-the-world balloon flight: Report on the planning, attempts and first successful balloon circumnavigation of the earth by Bertrand Piccard and his crew.

Swiss balloon pilot Bertrand Piccard announced Wednesday that he has chosen Brian Jones as his teammate for his next attempt at circling the world in a balloon. Jones, 52, replaces fellow British flight engineer Tony Brown. Achieving what promoters called the last great milestone of aviation, Bertrand Piccard and Brian Jones joined legends like the Wright Brothers and Charles Lindbergh with Saturday’s completion of the first manned round-the-world balloon flight. At 4:54 a.m. EST Saturday, the two balloonists crossed the line of longitude from which they had departed on March 1 at Chateau D’Oex, Switzerland, ...

A bad summary

Angelina Jolie: What have been the most recent significant events in the life and career of actress Angelina Jolie?

Angelina Jolie’s win for best supporting actress for her role in “Girl, Interrupted” came 21 years after father Jon Voight was awarded best actor for “Coming Home.“ ANGELINA JOLIE’S LIFE ON THE EDGE After all, her career is in overdrive. But Jolie cautions that she’s still a serious actress. It’s not like I’m suddenly a better actress because I have awards or this box office clout,” she says. “I am secure in the fact that I do have something to offer as an actress,” Jolie says. ‘...

Evaluation

• automatic evaluation with ROUGE (Lin, 2004)

• manual evaluation with respect to
➠ responsiveness
➠ linguistic quality
1. grammaticality
2. non-redundancy
3. referential clarity
4. focus
5. structure and coherence

• our system scored above the average, top 5 for non-redundancy and coherence (recall the document filtering stage)
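
ROUGE-N is an n-gram recall measure: the share of reference n-grams that also occur in the system summary, with counts clipped to the number of occurrences in the summary. The sketch below illustrates that definition only; the official toolkit adds options such as stemming and stopword removal that are not modelled here, and the simple lowercase tokenizer is an assumption.

```python
import re
from collections import Counter

def ngrams(text, n):
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def rouge_n_recall(candidate, references, n=2):
    """Clipped n-gram matches divided by the number of reference n-grams."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For the candidate "the balloon circled the world" and the single reference "the balloon flew around the world", four of the six reference unigrams and two of the five reference bigrams are matched.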

Research directions

• as in information retrieval, query expansion is expected to improve recall
➠ WordNet (Fellbaum, 1998) for similarity
➠ Wikipedia for relatedness (Strube & Ponzetto, 2006)
➠ paraphrases

• coreference resolution is needed for preprocessing; otherwise, e.g., pronouns are filtered as stopwords

• relevance vs. redundancy issue: in multi-document summarization (MDS), how can we ensure non-redundancy of the summary? (Carbonell & Goldstein, 1998)

• sentence ordering for extractive MDS (Barzilay & Lapata, 2005)
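
The relevance vs. redundancy trade-off is what Maximal Marginal Relevance (MMR) addresses: at each step it picks the candidate that maximises λ · sim(candidate, query) − (1 − λ) · max sim(candidate, already selected). The sketch below uses Jaccard word overlap for both similarities, which is an assumption made to keep the example self-contained.

```python
import re

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def sim(a, b):
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def mmr_select(query, sentences, k=2, lam=0.5):
    """Greedy MMR: trade relevance to the query against redundancy with picks so far."""
    selected, candidates = [], list(sentences)
    while candidates and len(selected) < k:
        def mmr(s):
            redundancy = max((sim(s, t) for t in selected), default=0.0)
            return lam * sim(s, query) - (1.0 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the redundancy term vanishes and near-duplicate sentences can both be selected; lowering `lam` pushes the second pick towards new content.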

Directions of research

• abstractive summarization is a distant goal but there are ways to go beyond sentence extraction
➠ sentence compression
➠ sentence fusion

Sentence compression

This is true, regardless of the opinion that some people have of Syria, and of their unhappiness at Syria’s presence in Lebanon.

• summarization on the sentence level

• in principle, a compression can be different from the input (different wording and structure)

• to date, most systems use word deletion only

• a compression corpus is now available online: http://homepages.inf.ed.ac.uk/s0460084/data

• the performance can be evaluated automatically
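
Compression by word deletion amounts to choosing a keep/drop decision for every token. The sketch below shows that representation together with two simple automatic measures. The token-level F1 against a gold compression is an assumption used for illustration, not necessarily the measure the speaker had in mind, and any keep mask passed in is hypothetical.

```python
from collections import Counter

def compress(tokens, keep):
    """Apply a keep/drop mask: the output is a subsequence of the input tokens."""
    return [t for t, k in zip(tokens, keep) if k]

def compression_rate(tokens, compressed):
    """Fraction of the original tokens that survive."""
    return len(compressed) / len(tokens)

def token_f1(system, gold):
    """Harmonic mean of token precision and recall against a gold compression."""
    shared = sum((Counter(system) & Counter(gold)).values())
    if not shared:
        return 0.0
    precision = shared / len(system)
    recall = shared / len(gold)
    return 2 * precision * recall / (precision + recall)
```

A system that learns word deletion only has to predict the mask; the measures above then compare its output with a human-written compression of the same sentence.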

Sentence fusion

1 John Smith, born November 15 1900, studied chemistry and physics at the University of London.

2 From 1917 Mr. Smith studied at the University of London and in 1921 he graduated with distinction.

➠ Mr. Smith studied chemistry and physics at the University of London from 1917.

• pieces of related sentences are used to generate a novel sentence

• can be seen as a middle ground between extractive and abstractive summarization

• addresses the incompleteness-redundancy problem

Thank you!

(FOR YOUR ATTENTION)

References

• R. Barzilay & M. Lapata, 2005: Modeling local coherence: An entity-based approach
• S. Brin & L. Page, 1998: The anatomy of a large-scale hypertextual web search engine
• J. G. Carbonell & J. Goldstein, 1998: The use of MMR, diversity-based reranking for reordering documents and producing summaries
• H. P. Edmundson, 1969: New methods in automatic extracting
• G. Erkan & D. Radev, 2004: LexRank: Graph-based lexical centrality as salience in text summarization
• C. Fellbaum, 1998: WordNet: An electronic lexical database
• K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, A. Joshi, B. L. Webber, 2001: DLTAG system – discourse parsing with a Lexicalized Tree Adjoining Grammar
• M. Halliday & R. Hasan, 1996: Cohesion in text
• E. H. Hovy, 2003: Text summarization
• H. Kamp, 1981: A theory of truth and semantic representation
• C.-Y. Lin, 2004: Automatic evaluation of summaries using N-gram co-occurrence statistics
• H. P. Luhn, 1958: The automatic creation of literature abstracts
• I. Mani, 2001: Automatic summarization
• W. C. Mann & S. A. Thompson, 1988: Rhetorical structure theory. Towards a functional theory of text organization
• D. Marcu, 2000: The theory and practice of discourse parsing and summarization
• R. Mihalcea & P. Tarau, 2004: TextRank: Bringing order into text
• E. Skorochodko, 1972: Adaptive method of automatic abstracting and indexing
• C. Sporleder & M. Lapata, 2005: Discourse chunking and its application to sentence compression
• M. Strube & S. P. Ponzetto, 2006: WikiRelate! Computing semantic relatedness using Wikipedia
• B. L. Webber, M. Stone, A. Joshi, A. Knott, 2003: Anaphora and discourse structure
• F. Wolf & E. Gibson, 2005: Representing discourse coherence: A corpus-based study