Summarization
CS276B Web Search and Mining
Lecture 14: Text Mining II
(includes slides borrowed from G. Neumann, M. Venkataramani, R. Altman, L. Hirschman, and D. Radev)
Automated Text Summarization Tutorial — COLING/ACL'98
An exciting challenge...
...put a book on the scanner, turn the dial to '2 pages', and read the result...
...download 1000 documents from the web, send them to the summarizer, and select the best ones by reading the summaries of the clusters...
...forward the Japanese email to the summarizer, select ‘1 par’, and skim the translated summary.
Headline news — informing
TV guides — decision making
Abstracts of papers — time saving
Graphical maps — orienting
Textual directions — planning
Cliff notes — laziness support
Real systems — money making
Questions
What kinds of summaries do people want? What are summarizing, abstracting, gisting, ...?
How sophisticated must summarization systems be? Are statistical techniques sufficient, or do we need symbolic techniques and deep understanding as well?
What milestones would mark quantum leaps in summarization theory and practice?
How do we measure summarization quality?
What is a Summary?
Informative summary
  Purpose: replace the original document
  Example: executive summary
Indicative summary
  Purpose: support a decision: do I want to read the original document, yes/no?
  Example: headline, scientific abstract
'Genres' of Summary?
Indicative vs. informative
...used for quick categorization vs. content processing.
Extract vs. abstract...lists fragments of text vs. re-phrases content coherently.
Generic vs. query-oriented...provides author’s view vs. reflects user’s interest.
Background vs. just-the-news...assumes reader’s prior knowledge is poor vs. up-to-date.
Single-document vs. multi-document source...based on one text vs. fuses together many texts.
Examples of Genres
Exercise: summarize the following texts for the following readers:
  text 1: Coup Attempt
  text 2: children's story
  reader 1: your friend, who knows nothing about South Africa
  reader 2: someone who lives in South Africa and knows the political situation
  reader 3: your 4-year-old niece
  reader 4: the Library of Congress
Why Automatic Summarization?
The reading algorithm in many domains is:
1) read the summary
2) decide whether it is relevant or not
3) if relevant: read the whole document
The summary is a gate-keeper for large numbers of documents.
Information overload: often the summary is all that is read.
Human-generated summaries are expensive.
Summary Length (Reuters)
Goldstein et al. 1999
Summarization Algorithms
Keyword summaries
  Display the most significant keywords
  Easy to do
  Hard to read; poor representation of content
Sentence extraction
  Extract key sentences
  Medium hard
  Summaries often don't read well
  Good representation of content
Natural language understanding / generation
  Build a knowledge representation of the text
  Generate sentences summarizing the content
  Hard to do well
Something between the last two methods?
Sentence Extraction
Represent each sentence as a feature vector.
Compute a score based on the features.
Select the n highest-ranking sentences.
Present them in the order in which they occur in the text.
Postprocessing to make the summary more readable/concise:
  Eliminate redundant sentences.
  Resolve anaphors/pronouns ("A woman walks. She smokes.").
  Delete subordinate clauses, parentheticals.
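A minimal sketch of this pipeline in Python; the feature weights and the crude thematic-word heuristic are illustrative assumptions, not the lecture's exact scoring:

```python
import re

def score_sentence(sent, position, total, freq_words):
    # Toy feature score: document-boundary position, length cut-off,
    # and density of frequent ("thematic") words.
    score = 0.0
    if position == 0 or position == total - 1:
        score += 1.0
    words = sent.lower().split()
    if len(words) > 5:                      # length cut-off feature
        score += 0.5
    score += sum(1 for w in words if w in freq_words) / (len(words) or 1)
    return score

def extract_summary(text, n=2):
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = text.lower().split()
    # Crude thematic words: non-trivial tokens that repeat.
    freq_words = {w for w in tokens if len(w) > 3 and tokens.count(w) >= 2}
    scored = sorted(((score_sentence(s, i, len(sents), freq_words), i)
                     for i, s in enumerate(sents)), reverse=True)
    keep = sorted(i for _, i in scored[:n])  # restore document order
    return " ".join(sents[i] for i in keep)
```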
Sentence Extraction: Example
SIGIR '95 paper on summarization by Kupiec, Pedersen, and Chen
Trainable sentence extraction
Proposed algorithm is applied to its own description (the paper)
Sentence Extraction: Example
Feature Representation
Fixed-phrase feature: certain phrases indicate a summary, e.g. "in summary".
Paragraph feature: paragraph-initial/final sentences are more likely to be important.
Thematic word feature: repetition is an indicator of importance.
Uppercase word feature: uppercase often indicates named entities (Taylor).
Sentence length cut-off: a summary sentence should be > 5 words.
Feature Representation (cont.)
Sentence length cut-off: summary sentences have a minimum length.
Fixed-phrase feature: true for sentences with an indicator phrase, e.g. "in summary", "in conclusion", etc.
Paragraph feature: paragraph initial/medial/final.
Thematic word feature: do any of the most frequent content words occur?
Uppercase word feature: is an uppercase thematic word introduced?
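A sketch of how these five features could be computed for one sentence; the thresholds, phrase list, and dict representation are illustrative assumptions:

```python
FIXED_PHRASES = ("in summary", "in conclusion")

def sentence_features(sent, para_pos, freq_content_words):
    # para_pos: "initial", "medial", or "final" within the paragraph.
    # freq_content_words: the most frequent content words of the document.
    words = sent.split()
    low = sent.lower()
    return {
        "length_ok":    len(words) > 5,
        "fixed_phrase": any(p in low for p in FIXED_PHRASES),
        "paragraph":    para_pos,
        "thematic":     any(w.lower() in freq_content_words for w in words),
        # uppercase thematic word, ignoring the sentence-initial word
        "uppercase":    any(w[:1].isupper() and w.lower() in freq_content_words
                            for w in words[1:]),
    }
```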
Training
Hand-label sentences in the training set (good/bad summary sentences).
Train a classifier to distinguish good from bad summary sentences.
Model used: Naïve Bayes.
Can rank sentences according to score and show the top n to the user, as sketched below.
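A sketch of the Naïve Bayes ranking over such feature dictionaries. The add-one smoothing and the assumption that both labels occur in the training data are mine; only the rank-by-score idea comes from the slide:

```python
import math
from collections import defaultdict

class NaiveBayesRanker:
    def fit(self, feature_dicts, labels, alpha=1.0):
        # labels: 1 = good summary sentence, 0 = bad (hand-labeled).
        # Assumes both labels occur at least once.
        self.alpha = alpha
        self.prior = sum(labels) / len(labels)
        self.totals = {0: 0, 1: 0}
        self.counts = {0: defaultdict(float), 1: defaultdict(float)}
        for fd, y in zip(feature_dicts, labels):
            self.totals[y] += 1
            for item in fd.items():
                self.counts[y][item] += 1
        return self

    def log_odds(self, fd):
        # log P(y=1 | features) - log P(y=0 | features): enough for ranking.
        s = math.log(self.prior / (1 - self.prior))
        for item in fd.items():
            p1 = (self.counts[1][item] + self.alpha) / (self.totals[1] + 2 * self.alpha)
            p0 = (self.counts[0][item] + self.alpha) / (self.totals[0] + 2 * self.alpha)
            s += math.log(p1 / p0)
        return s

    def top_n(self, feature_dicts, n):
        order = sorted(range(len(feature_dicts)),
                       key=lambda i: self.log_odds(feature_dicts[i]),
                       reverse=True)
        return order[:n]
```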
Evaluation
Compare extracted sentences with the sentences in abstracts.
Evaluation of features:
  Baseline (choose first n sentences): 24%
  Overall performance (42-44%) is not very good.
  However, there is more than one good summary.
Multi-Document (MD) Summarization
Summarize more than one document.
Harder, but the benefit is large (one can't scan hundreds of docs).
To do well, need to adopt a more specific strategy depending on the document set.
Other components are needed for a production system, e.g., manual post-editing.
DUC: government-sponsored bake-off
  200- or 400-word summaries
  Longer → easier
Types of MD Summaries
Single event/person tracked over a long time period
  Elizabeth Taylor's bout with pneumonia
  Give extra weight to the character/event
  May need to include the outcome (dates!)
Multiple events of a similar nature
  Marathon runners and races
  More broad-brush; ignore dates
An issue with related events
  Gun control
  Identify key concepts and select sentences accordingly
Determine MD Summary Type
First, determine which type of summary to generate: compute all pairwise similarities (a sketch follows this list).
Very dissimilar articles → multi-event (marathon).
Mostly similar articles:
  Is the most frequent concept a named entity?
  Yes → single event/person (Taylor)
  No → issue with related events (gun control)
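A sketch of this decision procedure with TF-IDF cosine similarities. scikit-learn availability, the similarity threshold, and passing the named-entity check in as a boolean (a real system would run NER) are all assumptions:

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def md_summary_type(docs, top_concept_is_entity, sim_threshold=0.3):
    # Compute all pairwise document similarities.
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sims = cosine_similarity(X)
    pairs = [sims[i, j] for i, j in combinations(range(len(docs)), 2)]
    if sum(pairs) / len(pairs) < sim_threshold:
        return "multiple similar events"      # marathon
    if top_concept_is_entity:
        return "single event/person"          # Taylor
    return "issue with related events"        # gun control
```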
MultiGen Architecture (Columbia)
Generation
  Ordering according to date
  Intersection: find concepts that occur repeatedly in a time chunk
  Sentence generator
Processing
  Selection of good summary sentences
  Elimination of redundant sentences
  Replacement of anaphors/pronouns with the noun phrases they refer to (needs coreference resolution)
  Deletion of non-central parts of sentences
Newsblaster (Columbia)
Query-Specific Summarization
So far, we've looked at generic summaries. A generic summary makes no assumptions about the reader's interests.
Query-specific summaries are specialized for a single information need: the query.
Summarization is much easier if we have a description of what the user wants.
Genre
Some genres are easy to summarize:
  Newswire stories: the inverted pyramid structure means the first n sentences are often the best summary of length n.
Some genres are hard to summarize:
  Long documents (novels, the Bible)
  Scientific articles?
Trainable summarizers are genre-specific.
Discussion
Correct parsing of the document format is critical: need to know headings, sequence, etc.
Limits of current technology: some good summaries require natural language understanding.
  Example: President Bush's nominees for ambassadorships
    Contributors to Bush's campaign
    Veteran diplomats
    Others
Coreference Resolution
Coreference
Two noun phrases referring to the same entity are said to corefer.
Example: Transcription from RL95-2 is mediated through an ERE element at the 5'-flanking region of the gene.
Coreference resolution is important for many text mining tasks:
  Information extraction
  Summarization
  First story detection
Types of Coreference
Noun phrases: Transcription from RL95-2 … the gene …
Pronouns: They induced apoptosis.
Possessives: … induces their rapid dissociation …
Demonstratives: This gene is responsible for Alzheimer's.
Preferences in pronoun interpretation
Recency: John has an Integra. Bill has a Legend. Mary likes to drive it.
Grammatical role: John went to the Acura dealership with Bill. He bought an Integra.
(?) John and Bill went to the Acura dealership. He bought an Integra.
Repeated mention: John needed a car to go to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.
Preferences in pronoun interpretation
Parallelism: Mary went with Sue to the Acura dealership. Sally went with her to the Mazda dealership.
??? Mary went with Sue to the Acura dealership. Sally told her not to buy anything.
Verb semantics: John telephoned Bill. He lost his pamphlet on Acuras. John criticized Bill. He lost his pamphlet on Acuras.
An algorithm for pronoun resolution
Two steps: discourse model update and pronoun resolution.
Salience values are introduced when a noun phrase that evokes a new entity is encountered.
Salience factors: set empirically.
Salience weights in Lappin and Leass
  Sentence recency: 100
  Subject emphasis: 80
  Existential emphasis: 70
  Accusative emphasis: 50
  Indirect object and oblique complement emphasis: 40
  Non-adverbial emphasis: 50
  Head noun emphasis: 80
Lappin and Leass (cont'd)
Recency: weights are cut in half after each sentence is processed.
Examples:
  An Acura Integra is parked in the lot.
  There is an Acura Integra parked in the lot.
  John parked an Acura Integra in the lot.
  John gave Susan an Acura Integra.
  In his Acura Integra, John showed Susan his new CD player.
Algorithm
1. Collect the potential referents (up to four sentences back).
2. Remove potential referents that do not agree in number or gender with the pronoun.
3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.
4. Compute the total salience value of the referent by adding any applicable values for role parallelism (+35) or cataphora (-175).
5. Select the referent with the highest salience value. In case of a tie, select the closest referent in terms of string position.
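A simplified sketch of steps 1, 2, 4, and 5 plus the recency halving; it assumes candidates arrive with their salience factors, agreement features, and positions pre-computed (the parsing and the intrasentential filtering of step 3 are omitted):

```python
WEIGHTS = {"recency": 100, "subject": 80, "existential": 70,
           "accusative": 50, "indirect_object": 40,
           "non_adverbial": 50, "head_noun": 80}

def resolve_pronoun(pronoun, candidates):
    # candidates: dicts with 'np', 'position', 'sentences_back',
    # 'gender', 'number', 'factors', 'parallel_role', 'cataphoric'.
    best, best_score = None, float("-inf")
    for c in candidates:
        if c["sentences_back"] > 4:            # step 1: four-sentence window
            continue
        if (c["gender"], c["number"]) != (pronoun["gender"], pronoun["number"]):
            continue                           # step 2: agreement filter
        score = sum(WEIGHTS[f] for f in c["factors"])
        score /= 2 ** c["sentences_back"]      # halve weights per sentence
        score += 35 if c.get("parallel_role") else 0    # step 4
        score -= 175 if c.get("cataphoric") else 0
        closer = best is not None and c["position"] > best["position"]
        if score > best_score or (score == best_score and closer):
            best, best_score = c, score        # step 5: ties go to the closer NP
    return best["np"] if best else None
```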
Observations
Lappin & Leass: tested on computer manuals; 86% accuracy on unseen data.
Another well-known theory is Centering (Grosz, Joshi, Weinstein), which has the additional concept of a "center". (More of a theoretical model; less empirical confirmation.)
Making Sense of It All...
To understand summarization, it helps to consider several perspectives simultaneously:
1. Approaches: basic starting point, angle of attack, core focus question(s): psycholinguistics, text linguistics, computation...
2. Paradigms: theoretical stance, methodological preferences: rules, statistics, NLP, information retrieval, AI...
3. Methods: the nuts and bolts: modules, algorithms, processing: word frequency, sentence position, concept generalization...
Psycholinguistic Approach: 2 Studies
Coarse-grained summarization protocols from professional summarizers (Kintsch and van Dijk, 78):
  Delete material that is trivial or redundant.
  Use superordinate concepts and actions.
  Select or invent a topic sentence.
552 fine-grained summarization strategies from professional summarizers (Endres-Niggemeyer, 98):
  Self control: make yourself feel comfortable.
  Processing: produce a unit as soon as you have enough data.
  Info organization: use the "Discussion" section to check results.
  Content selection: the table of contents is relevant.
Computational Approach: Basics
Top-down: "I know what I want — don't confuse me with drivel!"
  User needs: only certain types of info.
  System needs: particular criteria of interest, used to focus the search.
Bottom-up: "I'm dead curious: what's in the text?"
  User needs: anything that's important.
  System needs: generic importance metrics, used to rate content.
Query-Driven vs. Text-Driven Focus
Top-down: query-driven focus
  Criteria of interest encoded as search specs.
  System uses the specs to filter or analyze text portions.
  Examples: templates with semantically characterized slots; term lists of important terms.
Bottom-up: text-driven focus
  Generic importance metrics encoded as strategies.
  System applies the strategies over a representation of the whole text.
  Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.
Bottom-Up, using Info. Retrieval
IR task: Given a query, find the relevant document(s) from a large set of documents.
Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents).
Questions:
1. IR techniques work on large volumes of data; can they scale down accurately enough?
2. IR works on words; do abstracts require abstract representations?
Top-Down, using Info. Extraction
IE task: Given a template and a text, find all the information relevant to each slot of the template and fill it in.
Summ-IE task: Given a query, select the best template, fill it in, and generate the contents.
Questions:
1. IE works only for very particular templates; can it scale up?
2. What about information that doesn't fit into any template — is this a generic limitation of IE?
Paradigms: NLP/IE vs. IR/Statistics
NLP/IE:
  Approach: try to 'understand' the text — re-represent content using a 'deeper' notation, then manipulate that.
  Need: rules for text analysis and manipulation, at all levels.
  Strengths: higher quality; supports abstracting.
  Weaknesses: speed; still needs to scale up to robust open-domain summarization.
IR/Statistics:
  Approach: operate at the lexical level — use word frequency, collocation counts, etc.
  Need: large amounts of text.
  Strengths: robust; good for query-oriented summaries.
  Weaknesses: lower quality; inability to manipulate information at abstract levels.
Toward the Final Answer...
Problem: What if neither IR-like nor IE-like methods work?
Solution: semantic analysis of the text (NLP), using adequate knowledge bases that support inference (AI).
Word counting vs. inference:
  Mrs. Coolidge: "What did the preacher preach about?"
  Coolidge: "Sin."
  Mrs. Coolidge: "What did he say?"
  Coolidge: "He's against it."
Sometimes counting and templates are insufficient, and then you need to do inference to understand.
The Optimal Solution...
Combine the strengths of both paradigms:
...use IE/NLP when you have suitable template(s),
...use IR when you don’t…
…but how exactly to do it?
A Summarization Machine
[Diagram: a DOC and an optional QUERY enter the machine, which can emit EXTRACTS, ABSTRACTS, or something in between (?), also over MULTIDOCS. Control settings: Extract vs. Abstract; Indicative vs. Informative; Generic vs. Query-oriented; Background vs. Just-the-news; compression 10% / 50% / 100%; length Headline / Very Brief / Brief / Long. Internal representations: case frames, templates, core concepts, core events, relationships, clause fragments, index terms.]
The Modules of the Summarization Machine
[Diagram: EXTRACTION turns the DOC into extracts; FILTERING reduces multi-document extracts to doc extracts; INTERPRETATION maps extracts into deeper representations (case frames, templates, core concepts, core events, relationships, clause fragments, index terms); GENERATION produces abstracts, or the intermediate '?' output, from those representations.]
Summarization Methods (& Exercise)
Topic extraction. Interpretation. Generation.
Overview of Extraction Methods
Position in the text: lead method; optimal position policy; title/heading method
Cue phrases in sentences
Word frequencies throughout the text
Cohesion: links among words (word co-occurrence, coreference, lexical chains)
Discourse structure of the text
Information extraction: parsing and analysis
Position-Based Method (1)
Claim: Important sentences occur at the beginning (and/or end) of texts.
Lead method: just take the first sentence(s)!
Experiments:
  In 85% of 200 individual paragraphs the topic sentences occurred in initial position, and in 7% in final position (Baxendale, 58).
  Only 13% of the paragraphs of contemporary writers start with topic sentences (Donlan, 80).
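The lead baseline is a few lines; the regex sentence splitter below is a naive stand-in:

```python
import re

def lead_summary(text, n=1):
    # Lead method: the first n sentences are the summary.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])
```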
Optimum Position Policy (OPP)
Claim: Important sentences are located at positions that are genre-dependent; these positions can be determined automatically through training (Lin and Hovy, 97).
Corpus: 13,000 newspaper articles (ZIFF corpus).
Step 1: For each article, determine the overlap between sentences and the index terms for the article.
Step 2: Determine a partial ordering over the locations where sentences containing important words occur: the Optimum Position Policy (OPP).
Title-Based Method (1)
Claim: Words in titles and headings are positively relevant to summarization.
Shown to be statistically valid at the 99% level of significance (Edmundson, 68).
Empirically shown to be useful in summarization systems.
Cue-Phrase Method (1)
Claim 1: Important sentences contain 'bonus phrases', such as significantly, In this paper we show, and In conclusion, while non-important sentences contain 'stigma phrases' such as hardly and impossible.
Claim 2: These phrases can be detected automatically (Kupiec et al. 95; Teufel and Moens 97).
Method: Add to a sentence's score if it contains a bonus phrase; penalize it if it contains a stigma phrase.
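A sketch of the scoring rule; the phrase lists come from the slide, but the unit increments are an illustrative assumption (trainable systems learn these weights):

```python
BONUS = ("significantly", "in this paper we show", "in conclusion")
STIGMA = ("hardly", "impossible")

def cue_phrase_score(sentence):
    # +1 per bonus phrase occurrence, -1 per stigma phrase occurrence.
    low = sentence.lower()
    return (sum(low.count(p) for p in BONUS)
            - sum(low.count(p) for p in STIGMA))
```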
Word-Frequency-Based Method (1)
Claim: Important sentences contain words that occur "somewhat" frequently.
Method: Increase the sentence score for each frequent word.
Evaluation: This straightforward approach has been shown empirically to be mostly detrimental in summarization systems.
[Figure: Luhn's curve of word frequency vs. the resolving power of words; mid-frequency words resolve content best (Luhn, 59).]
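A sketch of frequency-based scoring with a Luhn-style mid-frequency band: very rare and very common words are dropped before counting. Both cut-offs are illustrative assumptions:

```python
import re
from collections import Counter

def frequency_scores(text, low_cut=2, high_frac=0.05):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Drop the most common tokens (poor resolving power) and rare ones.
    n_common = max(1, int(len(counts) * high_frac))
    too_common = {w for w, _ in counts.most_common(n_common)}
    significant = {w for w, c in counts.items()
                   if c >= low_cut and w not in too_common}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(sum(w in significant for w in re.findall(r"[a-z']+", s.lower())), s)
            for s in sentences]
```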
Cohesion-Based Methods
Claim: Important sentences/paragraphs are the most highly connected entities in more or less elaborate semantic structures.
Classes of approaches:
  word co-occurrences;
  local salience and grammatical relations;
  co-reference;
  lexical similarity (WordNet, lexical chains);
  combinations of the above.
Cohesion: Word Co-occurrence (1)
Apply IR methods at the document level: texts are collections of paragraphs (Salton et al., 94; Mitra et al., 97; Buckley and Cardie, 97).
Use a traditional, IR-based word similarity measure to determine, for each paragraph Pi, the set Si of paragraphs that Pi is related to.
Method: determine the relatedness score Si for each paragraph and extract the paragraphs with the largest Si scores, as sketched below.
[Figure: paragraph similarity graph over paragraphs P1-P9.]
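A sketch of the paragraph-relatedness idea using plain cosine similarity over word counts; the relatedness threshold and top-k selection are illustrative assumptions:

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def top_paragraphs(paragraphs, threshold=0.2, k=3):
    # S_i = number of other paragraphs that P_i is related to.
    vecs = [Counter(p.lower().split()) for p in paragraphs]
    degree = [sum(1 for j in range(len(vecs))
                  if j != i and cosine(vecs[i], vecs[j]) > threshold)
              for i in range(len(vecs))]
    ranked = sorted(range(len(paragraphs)), key=degree.__getitem__, reverse=True)
    return [paragraphs[i] for i in sorted(ranked[:k])]   # keep document order
```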
Word Co-occurrence Method (2)
In the context of query-based summarization:
Cornell's SMART-based approach:
  expand the original query;
  compare the expanded query against paragraphs;
  select the top three paragraphs (max 25% of the original) that are most similar to the original query.
  (SUMMAC, 98): 71.9% F-score for relevance judgment.
CGI/CMU approach:
  maximize query-relevance while minimizing redundancy with previous information.
  (SUMMAC, 98): 73.4% F-score for relevance judgment.
Cohesion: Local Salience Method
Assumes that important phrasal expressions are given by a combination of grammatical, syntactic, and contextual parameters (Boguraev and Kennedy, 97):
  CNTX: 50 iff the expression is in the current discourse segment
  SUBJ: 80 iff the expression is a subject
  EXST: 70 iff the expression is an existential construction
  ACC: 50 iff the expression is a direct object
  HEAD: 80 iff the expression is not contained in another phrase
  ARG: 50 iff the expression is not contained in an adjunct
Cohesion: Lexical Chains Method (1)
But Mr. Kenny's move speeded up work on a machine which uses micro-computers to control the rate at which an anaesthetic is pumped into the blood of patients undergoing surgery. Such machines are nothing new. But Mr. Kenny's device uses two personal computers to achieve much closer monitoring of the pump feeding the anaesthetic into the patient. Extensive testing of the equipment has sufficiently impressed the authorities which regulate medical equipment in Britain, and, so far, four other countries, to make this the first such machine to be licensed for commercial sale to hospitals.
Based on (Morris and Hirst, 91)
Lexical Chains-Based Method (2)
Assumes that important sentences are those that are 'traversed' by strong chains (Barzilay and Elhadad, 97):
  Strength(C) = length(C) - #DistinctOccurrences(C)
  For each chain, choose the first sentence that is traversed by the chain and that uses a representative set of concepts from that chain.
Results on the [Jing et al., 98] corpus:

             LC algorithm        Lead-based algorithm
             Recall    Prec      Recall    Prec
10% cutoff   67%       61%       82.9%     63.4%
20% cutoff   64%       47%       70.9%     46.9%
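A sketch of the strength formula and the sentence-selection rule, with chains given as lists of (sentence_index, word) occurrences; building the chains themselves (a WordNet-based step) and the "representative set of concepts" check are omitted:

```python
def chain_strength(chain):
    # Strength(C) = length(C) - #DistinctOccurrences(C), per the slide.
    words = [w for _, w in chain]
    return len(words) - len(set(words))

def pick_sentences(chains, top_k=3):
    # For each of the strongest chains, take the first sentence it traverses.
    strongest = sorted(chains, key=chain_strength, reverse=True)[:top_k]
    return sorted({min(i for i, _ in chain) for chain in strongest})
```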
Cohesion: Coreference Method
Build co-reference chains (noun/event identity, part-whole relations) between:
  query and document (in the context of query-based summarization)
  title and document
  sentences within the document
Important sentences are those traversed by a large number of chains:
  a preference is imposed on chains (query > title > doc)
Evaluation: 67% F-score for relevance (SUMMAC, 98).
(Baldwin and Morton, 98)
Cohesion: Connectedness Method (1)
Map texts into graphs:
  The nodes of the graph are the words of the text.
  Arcs represent adjacency, grammatical, co-reference, and lexical similarity-based relations.
Associate importance scores to words (and sentences) by applying the tf.idf metric.
Assume that important words/sentences are those with the highest scores.
(Mani and Bloedorn, 97)
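A sketch of the tf.idf scoring half of this method (the graph arcs themselves are omitted); the mean-per-sentence aggregation and the background-corpus idf are my assumptions:

```python
import math
import re
from collections import Counter

def tfidf_sentence_scores(doc, background_docs):
    tokenize = lambda t: re.findall(r"[a-z']+", t.lower())
    tf = Counter(tokenize(doc))
    df = Counter()
    for d in background_docs:
        df.update(set(tokenize(d)))
    n_docs = len(background_docs) + 1
    tfidf = lambda w: tf[w] * math.log(n_docs / (1 + df[w]))
    scores = []
    for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
        words = tokenize(sent)
        scores.append((sum(map(tfidf, words)) / (len(words) or 1), sent))
    return sorted(scores, reverse=True)
```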
Information Extraction Method (1)
Idea: content selection using templates.
  Predefine a template whose slots specify what is of interest.
  Use a canonical IE system to extract the relevant information from a (set of) document(s); fill the template.
  Generate the content of the template as the summary.
Previous IE work:
FRUMP (DeJong, 78): ‘sketchy scripts’ of terrorism, natural disasters, political visits...
(Mauldin, 91): templates for conceptual IR. (Rau and Jacobs, 91): templates for business. (McKeown and Radev, 95): templates for news.
Information Extraction Method (2)
Example template:
  MESSAGE:ID         TSL-COL-0001
  SECSOURCE:SOURCE   Reuters
  SECSOURCE:DATE     26 Feb 93, early afternoon
  INCIDENT:DATE      26 Feb 93
  INCIDENT:LOCATION  World Trade Center
  INCIDENT:TYPE      Bombing
  HUM TGT:NUMBER     AT LEAST 5
IE State of the Art
MUC conferences (1988–97):
  Test IE systems on a series of domains: Navy sub-language (89), terrorism (92), business (96), ...
  Create increasingly complex templates.
  Evaluate systems using two measures:
    Recall: how many slots did the system actually fill, out of the total number it should have filled?
    Precision: how correct were the slots that it filled?

             1989   1992   1996
  Recall     63.9   71.5   67.1
  Precision  87.4   84.2   78.3
Review of Methods
Bottom-up methods:
  Text location: title, position
  Cue phrases
  Word frequencies
  Internal text cohesion:
    word co-occurrences
    local salience
    co-reference of names, objects
    lexical similarity
    semantic representation/graph centrality
  Discourse structure centrality
Top-down methods:
  Information extraction templates
  Query-driven extraction:
    query expansion lists
    co-reference with query names
    lexical similarity to the query