1
Sentence Extraction-based Presentation SummarizationTechniques and Evaluation Metrics
Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui
ICASSP2005Present : Yao-Min Huang
Date : 04/07/2005
2
Introduction
• One of the major applications of automatic speech recognition is to transcribe speech documents such as talks, presentations, lectures, and broadcast news.
• Spontaneous speech is ill-formed and very different from written text.– redundant information repetitions , repairs ...
– Irrelevant or incorrect information recognition errors
3
Introduction
• Therefore, an approach in which all words are simply transcribed is not an effective one for spontaneous speech.
• Instead, speech summarization which extracts important information and removes redundant and incorrect information is ideal for recognizing spontaneous speech.
4
Introduction (SAP2004)
5
Sentence Extraction Methods
• Review SAP2004– Important Sentence Extraction
• The score for important sentence extraction is calculated for each sentence.
• N : # of words in the sentence W• L(wi) 、 I(wi) 、 C(wi) are the linguistic score(trigram) , the
significance score(nouns, verbs, adjectives and out-of-vocabulary (OOV) ), and the confidence score of word wi , respectively.
– The experiment shows that Significance score is more effective than L score and C score.
• This paper simply uses the significance score.
1
1( ) ( ) ( ) ( )
N
i I i C it
S W L w I w C wN
6
Sentence Extraction Methods
• Review SAP2004– Significance score measured by the amount of information
• for content words including nouns, verbs, adjectives and out-of-vocabulary (OOV) words, based on word occurrence in a corpus
–
– Important keywords are weighted and the words unrelated to the original content, such as recognition errors, are de-weighted by this score.
( ) log Ai i i
i
FI w f icf f
F
i
i i
A i
# of occurrences of w in the recognized utterances
F # of occurrence of w in the large-scale corpus
F =
i
i
f
F
::
7
Sentence Extraction Methods• Extraction using latent semantic analysis
– Equal to the “Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis SIGIR2001“
–
****** . . .
. . .
***. . .
U
U*1 U*nU*2
U1*
Um*
U2*
.
.
.= ×∑×
*∆***∆ . . .
. . .
***. . .
VT
V*1'
V*n'
V*2' . . .
V1*'
Vn*'
V2*'
.
.
.
W1
S1
Wm
W2
SnS2
.
.
.
****** . . .
. . .
***. . .
A
Select the sentence which has the largest index valuewith the right singular vector (column vector of V)
( j# in sentence j) ji jia f word icf
8
Sentence Extraction Methods
• Extraction using dimension reduction based on SVD
– K=5
2
1
( ) ( )K
i k ikk
Score i v
9
Sentence Extraction Methods
• Extraction using sentence location– human subjects tend to extract sentences from the first and the last
segments under the condition of 10% summarization ratio,whereas there is no such tendency at 50% summarization ratio.
10
Sentence Extraction Methods
• Extraction using sentence location– The introduction and conclusion segments are
estimated based on the Hearst method(1997) using sentence cohesiveness.
• which is measured by a cosine value between content word-frequency vectors consisting of more than a fixed number of content words.
11
Sentence Extraction Methods
• Each segmentation boundary is the first sentence from the beginning or end of the presentation speech
12
Objective Evaluation Metrics
• Summarization accuracy– SAP2004
• measured in comparison with the closest word string extracted from the word network as the summarization accuracy.
• works reasonably well at relatively high summarization ratios such as 50%, but has problems at low summarization ratios such as 10%
– since the variation between manual summaries is so large that the network accepts inappropriate summaries.
13
Objective Evaluation Metrics
• Summarization accuracy– Therefore
• investigated word accuracy obtained by individually using the manual summaries (SumACCY-E).
– (SumACCY-E/max)
» the largest score among human summaries
» which is equivalent to the NrstACCY proposed in [C. Hori ..etc 2004,ACL].
– (SumACCY-E/ave)
» average score
14
Objective Evaluation Metrics
• Sentence recall / precision– Since sentence boundaries are not explicitly indicated
in input speech• Solution T.Kitade .. etc,2004
– Extraction of a sentence in the recognition result is considered as extraction of one or multiple sentences in the manual summary with an overlap of 50% or more words.
– In this metric, sentence recall/precision is measured by the largest score (F-measure/max) or the average score (F-measure/ave) of the F-measures.
15
Objective Evaluation Metrics
• ROUGE-N
H
n
n n
( )
( )
where
S is the set of manual summaries
S is an individual manual summary
g is an N-gram
C(g ) is the number of co-occurrences of g in t
H n
H n
m nS S g S
nS S g S
C gROUGE N
C g
he manual
summary and automatic summary
16
Experiments
• Experimental conditions– 30 presentations by 20 males and 10 females in the CSJ were
automatically summarized at 10% summarization ratio.– Mean word recognition accuracy was 69%. – Sentence boundaries in the recognition results were automatically
determined using language models, which achieved 72% recall and 75% precision.
– The technique of extracting sentences• significance score (SIG)• latent semantic analysis (LSA)• dimension reduction based on SVD (DIM);• SIG combined with IC (sentence location) (SIG+IC); • LSA combined with IC (LSA+IC); • DIM combined with IC (DIM+IC).
17
Experiments
• Subjective evaluation– 180 automatic summaries (30 presentations x 6
summarization methods) were evaluated by 12 human subjects.
– The summaries were evaluated in terms of ease of understanding and appropriateness as summaries in five levels
• 1-very bad; 2-bad; 3-normal;4-good; 5-very good.
– The subjective evaluation results were converted into factor scores using factor analysis in order to normalize subjective differences.
18
Experiments
• By combining the IC method using sentence location, every summarization method was significantly improved. SIG+IC achieved the best score, but the difference between SIG+IC and DIM+IC was not significant.
19
Experiments
• Correlation between subjective and objective evaluation results– In order to investigate the relationship between subjective and objective
evaluation results, the automatic summaries were evaluated by eight objective evaluation metrics
• SumACCY • SumACCYE/max, • SumACCY-E/ave,• F-measure/max, • F-measure/ave,• ROUGE-1 (Uni-gram)• ROUGE-2 (Bi-gram)• ROUGE-3 (Tri-gram)
20
Experiments
max
21
Experiments
•
22
Experiments
• All the objective metrics yielded correlation with human judgment.
• If the effect of word recognition accuracy for each sentence is removed, all the metrics, except ROUGE-1, yield high correlations. – ROUGE-1 measures overlapping 1-grams, which probably
causes the correlation between ROUGE-1 and the
recognition accuracy.
23
Experiments
24
Experiments
• In contrast with the results averaged over all the presentations, no metric has strong correlation. – This is due to the large variation of scores over the whole set
of presentations.
25
CONCLUSION
• This paper has presented several sentence extraction methods for automatic presentation speech summarization and objective eval-uation metrics.
• We have proposed sentence extraction methods using dimension reduction based on SVD and sentence location.
• Under the condition of 10% summarization ratio, it was confirmed that the method using sentence location improves summarization results.
26
CONCLUSION
• Among the objective evaluation metrics, SumACCY, SumACCY-E, F-measure, ROUGE-2 and 3 were found to be effective.
• Although the correlation between the subjective and objective scores averaged over presentations is high, the correlation for each individual presentation is not so high
– due to the large variation of scores across presentations.
27
Future
• Future research includes investigation of– other objective evaluation metrics– evaluation of summarization methods
containing sentence compaction– producing optimum summarization techniques
by employing objective evaluation metrics