The PYTHY Summarization System: Microsoft Research at DUC 2007

Slide transcript. Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki, and Lucy Vanderwende. Microsoft Research, April 26, 2007.
The PYTHY Summarization System: Microsoft Research at DUC 2007
Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi,
Hisami Suzuki, and Lucy Vanderwende
Microsoft Research
April 26, 2007
DUC Main Task Results
• Automatic Evaluations (30 participants)
• Human Evaluations
• Did pretty well on both measures
Automatic evaluations:

| Criterion | Rank | Score |
| --- | --- | --- |
| ROUGE-2 | 2 | 0.12028 |
| ROUGE-SU4 | 3 | 0.17074 |

Human evaluations:

| Criterion | Rank |
| --- | --- |
| Pyramid | 1= |
| Content | 5= |
Overview of PYTHY
• Linear sentence ranking model
• Learns to rank sentences based on:
  • ROUGE scores against model summaries
  • Semantic Content Unit (SCU) weights of sentences selected by past peers
• Considers simplified sentences alongside original sentences
Score(s) = Σ_{k=1..K} w_k · f_k(s)
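The linear model above scores each candidate sentence as a weighted sum of its feature values. A minimal sketch of that computation; the feature names and weights here are illustrative, not PYTHY's actual inventory:

```python
def score(sentence_features, weights):
    """Linear sentence score: sum over features of weight * feature value."""
    return sum(weights.get(name, 0.0) * value
               for name, value in sentence_features.items())

# Illustrative features for one candidate sentence (hypothetical names).
features = {"cluster_freq": 0.4, "position_first": 1.0, "length_ok": 1.0}
weights = {"cluster_freq": 2.0, "position_first": 0.5, "length_ok": 0.3}
```

Any feature absent from the weight vector simply contributes nothing, so new features can be added without changing the scorer.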
[Diagram: PYTHY training pipeline. Docs → Sentences and Simplified Sentences → Feature inventory; Targets (ROUGE Oracle, Pyramid/SCU, ROUGE × 2) → Ranking/Training → Model]
[Diagram: PYTHY testing pipeline. Docs → Sentences and Simplified Sentences → Feature inventory → Model → Search with Dynamic Scoring → Summary]
Sentence Simplification
• Extension of the simplification method from DUC06
• Provides sentence alternatives rather than deterministically simplifying a sentence
• Uses syntax-based heuristic rules
• Simplified sentences evaluated alongside originals
• In DUC 2007:
  • Average new candidates generated: 1.38 per sentence
  • Simplified sentences generated for 61% of all sentences
  • Simplified sentences in final output: 60%
Sentence-Level Features
• SumFocus features: SumBasic (Nenkova et al. 2006) + task focus
  • cluster frequency and topic frequency
  • only these were used in MSR's DUC06 system
• Other content-word unigrams: headline frequency
• Sentence length features (binary)
• Sentence position features (real-valued and binary)
• N-grams (bigrams, skip bigrams, multiword phrases)
• All tokens (topic and cluster frequency)
• Simplified sentences (binary, and ratio of relative length)
• Inverse document frequency (idf)
Pairwise Ranking
• Define preferences for sentence pairs
  • defined using human summaries and SCU weights
• Log-linear ranking objective used in training
• Maximize the probability of choosing the better sentence from each pair of comparable sentences
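The log-linear pairwise objective can be sketched as follows: the model assigns probability σ(w · (f(better) − f(worse))) to preferring the better sentence, and training increases the log of that probability. Feature vectors and the learning rate here are toy values, not the system's actual training code:

```python
import math

def pair_prob(w, f_better, f_worse):
    """P(better ≻ worse) under a log-linear ranking model:
    sigmoid of the weighted feature-difference margin."""
    margin = sum(wi * (a - b) for wi, a, b in zip(w, f_better, f_worse))
    return 1.0 / (1.0 + math.exp(-margin))

def sgd_step(w, f_better, f_worse, lr=0.1):
    """One gradient-ascent step on log P(better ≻ worse)."""
    p = pair_prob(w, f_better, f_worse)
    return [wi + lr * (1.0 - p) * (a - b)
            for wi, a, b in zip(w, f_better, f_worse)]

# One toy preference pair: the first sentence scores higher on both features.
w = [0.0, 0.0]
w = sgd_step(w, [1.0, 0.5], [0.2, 0.1])
```

After the step the model prefers the better sentence of the pair with probability above one half, which is the direction the objective pushes every comparable pair.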
[Dekel et al. 03], [Burges et al. 05]
ROUGE Oracle Metric
• Find an oracle extractive summary
  • the summary with the highest average ROUGE-2 and ROUGE-SU4 scores
• All sentences in the oracle are considered “better” than any sentence not in the oracle
• An approximate greedy search is used to find the oracle summary
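The greedy approximation grows the extract one sentence at a time, always adding the sentence that most improves the score. A self-contained sketch, using a toy bigram-recall stand-in for the real ROUGE-2/ROUGE-SU4 average (function names are mine):

```python
def bigrams(tokens):
    """Set of adjacent word pairs in a token list."""
    return set(zip(tokens, tokens[1:]))

def rouge2_recall(tokens, ref_tokens):
    """Toy stand-in for ROUGE-2: fraction of reference bigrams covered."""
    ref = bigrams(ref_tokens)
    return len(bigrams(tokens) & ref) / len(ref) if ref else 0.0

def greedy_oracle(sentences, ref_tokens, max_sents=2):
    """Greedily add the sentence that most improves the summary score."""
    chosen = []
    for _ in range(max_sents):
        current = rouge2_recall([t for s in chosen for t in s], ref_tokens)
        best = None
        for s in sentences:
            if s in chosen:
                continue
            gain = rouge2_recall([t for c in chosen + [s] for t in c],
                                 ref_tokens)
            if gain > current:
                best, current = s, gain
        if best is None:  # no sentence improves the score further
            break
        chosen.append(best)
    return chosen
```

Greedy selection is only an approximation: it can miss the true best extract when two sentences help jointly but not individually, which is why the slide calls the search approximate.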
Pyramid-Derived Metric
• University of Ottawa SCU-annotated corpus (Copeck et al. 06)
• Some sentences in the 05 & 06 document collections are:
  • known to contain certain SCUs
  • known not to contain any SCUs
• Sentence score is the sum of the weights of all its SCUs
  • for un-annotated sentences, the score is undefined
• A sentence pair s1 > s2 is constructed for training iff w(s1) > w(s2)
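The pair construction above can be sketched directly: sum the SCU weights per sentence, skip sentences whose score is undefined, and emit a preference pair wherever one score strictly exceeds another. SCU names and weights below are hypothetical:

```python
def scu_score(sentence_scus, scu_weights):
    """Sum of weights of all SCUs a sentence contains.
    Returns None for un-annotated sentences (score undefined)."""
    if sentence_scus is None:
        return None
    return sum(scu_weights[scu] for scu in sentence_scus)

def training_pairs(sentences, scu_weights):
    """Emit (better, worse) pairs: s1 > s2 iff w(s1) > w(s2),
    skipping sentences with undefined scores."""
    scored = [(s, scu_score(scus, scu_weights)) for s, scus in sentences]
    scored = [(s, w) for s, w in scored if w is not None]
    return [(a, b) for a, wa in scored for b, wb in scored if wa > wb]

# Toy corpus: s3 is un-annotated, s4 is known to contain no SCUs.
pairs = training_pairs(
    [("s1", ["A"]), ("s2", ["B"]), ("s3", None), ("s4", [])],
    {"A": 3, "B": 1})
```

Note the asymmetry the slide implies: a sentence annotated with no SCUs gets score 0 and participates in pairs, while an un-annotated sentence is excluded entirely.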
Model Frequency Metrics
• Based on unigram and skip-bigram frequency
• Computed for content words only
• Sentence s_i is “better” than s_j if:
w(s_i) > w(s_j), where w(s) = Σ_k p̂_models(c_k) over the content words c_k of s
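A sketch of the model-frequency weight for the unigram case: estimate a word distribution from the human (model) summaries, then score a sentence by summing the estimated probability of each of its content words. Function names and the toy summaries are mine:

```python
from collections import Counter

def model_unigram_probs(model_summaries):
    """Empirical unigram distribution over the human (model) summaries,
    given as lists of content-word tokens."""
    counts = Counter(tok for summ in model_summaries for tok in summ)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def sentence_weight(content_words, p_models):
    """w(s) = sum over content words of their model-summary probability.
    Words never seen in the models contribute zero."""
    return sum(p_models.get(w, 0.0) for w in content_words)

# Toy model summaries: "a" appears twice, "b" and "c" once each.
probs = model_unigram_probs([["a", "b"], ["a", "c"]])
```

The skip-bigram variant the slide mentions would follow the same shape with word pairs in place of unigrams.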
Combining Multiple Metrics

• From the ROUGE oracle: all sentences in the oracle summary are better than other sentences
• From SCU annotations: sentences with higher average SCU weights are better
• From model frequency: sentences with words occurring in the model summaries are better
• Combined loss: add the losses according to all metrics

D_1 = {(i, j) : s_i ≻ s_j}  (ROUGE oracle)
D_2 = {(i, j) : s_i ≻ s_j}  (SCU annotations)
D_3 = {(i, j) : s_i ≻ s_j}  (model frequency)

L = L(D_1) + L(D_2) + L(D_3)
[Diagram: PYTHY testing pipeline. Docs → Sentences and Simplified Sentences → Feature inventory → Model → Search with Dynamic Scoring → Summary]
Dynamic Sentence Scoring
• Eliminate redundancy by re-weighting
• Similar to SumBasic (Nenkova et al. 2006): re-weighting given previously selected sentences
• Discounts features that decompose into word-frequency estimates
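A minimal sketch of the SumBasic-style discount, assuming the squared-probability update of Nenkova et al. (2006): once a word appears in a selected sentence, its probability is squared so that sentences repeating it score lower on the next round.

```python
def reweight(word_probs, selected_sentence):
    """SumBasic-style discount: square the probability of every word
    that occurred in the just-selected sentence; leave others unchanged."""
    seen = set(selected_sentence)
    return {w: (p * p if w in seen else p) for w, p in word_probs.items()}

# Toy distribution; selecting a sentence containing "a" discounts only "a".
probs = {"a": 0.5, "b": 0.2}
updated = reweight(probs, ["a", "a"])
```

Because probabilities are below one, squaring always shrinks them, which is exactly the redundancy penalty the slide describes.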
Search
• The search constructs partial summaries and scores them:
• The score of a summary does not decompose into an independent sum of sentence scores
  • global dependencies make exact search hard
• Used multiple beams, one for each length of partial summary [McDonald 2007]
Score(s_1, s_2, …, s_n) = Σ_{i=1..n} score(s_i | s_1, …, s_{i−1})
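The multiple-beam idea above can be sketched as follows: because each sentence's contribution depends on what was already selected, the search keeps a separate beam of top partial summaries for every summary length and extends each one. The scoring function here is a toy non-decomposable score of my own, not the system's:

```python
def beam_search(sentences, score_fn, max_len=3, beam_width=2):
    """Beam search over partial summaries, with one beam per summary
    length, since scores do not decompose into per-sentence sums."""
    beams = {0: [((), 0.0)]}
    for length in range(max_len):
        candidates = []
        for partial, _ in beams[length]:
            for s in sentences:
                if s not in partial:
                    summary = partial + (s,)
                    candidates.append((summary, score_fn(summary)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams[length + 1] = candidates[:beam_width]
    # Return the best-scoring summary of any length.
    return max((c for b in beams.values() for c in b), key=lambda c: c[1])[0]

def toy_score(summary):
    """Illustrative non-decomposable score: word values minus a
    redundancy-style penalty when "a" and "c" co-occur."""
    values = {"a": 3.0, "b": 2.0, "c": 1.0}
    penalty = 2.5 if "a" in summary and "c" in summary else 0.0
    return sum(values[w] for w in summary) - penalty
```

With this toy score the best summary is {a, b}: adding "c" to a summary containing "a" costs more than "c" contributes, which a per-sentence greedy score could not express.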
Impact of Sentence Simplification
| System | R-2 (no simplified) | R-SU4 (no simplified) | R-2 (simplified) | R-SU4 (simplified) |
| --- | --- | --- | --- | --- |
| SumFocus | 0.078 | 0.132 | 0.078 | 0.134 |
| PYTHY | 0.089 | 0.140 | 0.096 | 0.147 |

Trained on 05 data, tested on 06 data.
Evaluating the Metrics
| Criterion | Num Pairs | Train Acc | R-2 (content only) | R-SU4 (content only) | R-2 (all words) | R-SU4 (all words) |
| --- | --- | --- | --- | --- | --- | --- |
| Oracle | 941K | 93.1 | 0.076 | 0.107 | 0.093 | 0.143 |
| SCUs | 430K | 62.0 | 0.078 | 0.108 | 0.086 | 0.134 |
| Model Freq. | 6.3M | 96.9 | 0.076 | 0.106 | 0.096 | 0.147 |
| All | 7.7M | 94.2 | 0.076 | 0.107 | 0.096 | 0.147 |

Trained on 05 data, tested on 06 data; includes simplified sentences.
Update Summarization Pilot

• SVM novelty classifier trained on the TREC 02 & 03 novelty track data

| System | ROUGE-2 | ROUGE-SU4 |
| --- | --- | --- |
| PYTHY + Novelty (1) | 0.07135 | 0.11164 |
| PYTHY + Novelty (.5) | 0.07879 | 0.12929 |
| PYTHY + Novelty (.1) | 0.08721 | 0.12958 |
| PYTHY | 0.08686 | 0.12876 |
| SumFocus | 0.07002 | 0.11033 |

Score(s_i | PrevS) = Score_Pythy(s_i | PrevS) · Pr(novel(s_i) | BG)
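One plausible reading of this combination, sketched below, multiplies the PYTHY score by the novelty classifier's probability raised to a weight `lam`; I am assuming the parenthesized values in the table (1, .5, .1) act as that weight, which the slide does not state explicitly:

```python
def update_score(pythy_score, novelty_prob, lam):
    """Hypothetical combination of the PYTHY sentence score with the
    novelty classifier's probability; lam (assumed to be the
    parenthesized 1 / .5 / .1 in the table) controls how strongly
    novelty discounts the score."""
    return pythy_score * (novelty_prob ** lam)
```

Under this reading, a smaller `lam` weakens the novelty discount, consistent with the table, where PYTHY + Novelty (.1) scores closest to plain PYTHY.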
Summary and Future Work

• Summary
  • Combination of different target metrics for training
  • Many sentence features
  • Pair-wise ranking function
  • Dynamic scoring
• Future work
  • Boost robustness: sensitive to cluster properties (e.g., size)
  • Improve grammatical quality of simplified sentences
  • Reconcile novelty and (ir)relevance
  • Learn features over whole summaries rather than individual sentences
Thank You