Automatic Key Term Extraction and Summarization from Spoken Course Lectures
課程錄音之自動關鍵用語擷取及摘要

Speaker: Yun-Nung Chen 陳縕儂
Advisor: Prof. Lin-Shan Lee 李琳山
National Taiwan University
Introduction
Master Defense, National Taiwan University
Target: extract key terms and summaries from course lectures
Key Term vs. Summary
- Key terms support indexing and retrieval, and capture the relations between key terms and segments of documents.
- A summary helps a reader efficiently understand the document.
- Both are related to document understanding and to the semantics of the document; both are forms of "Information Extraction".
Automatic Key Term Extraction
Definition
- Key term: has higher term frequency and carries core content.
- Two types:
  - Keyword, e.g., "語音" (speech)
  - Key phrase, e.g., "語言模型" (language model)
System Flow
The original spoken documents (speech signals) in the archive are first transcribed by ASR; the ASR transcriptions then pass through Branching Entropy, Feature Extraction, and Learning Methods (1. AdaBoost, 2. Neural Network) to produce the key terms.
Phrase Identification
Branching entropy is first used to identify phrases in the ASR transcriptions.
Key Term Extraction
Learning methods then extract key terms from the phrase candidates using a set of features, producing key terms such as "entropy" and "acoustic model".
Branching Entropy
How to decide the boundary of a phrase? Consider the phrase "hidden Markov model":
- Inside the phrase, the following word is almost deterministic ("hidden" is followed by "Markov", which is followed by "model").
- At the boundary of the phrase, many different words (e.g., "is", "can", "of", "in", "represent") may follow.
Branching entropy is defined to detect such possible boundaries.
Definition of Right Branching Entropy
Given a word sequence X, let P(x_i | X) be the probability that word x_i immediately follows X. The right branching entropy of X is
H_r(X) = -Σ_i P(x_i | X) log P(x_i | X).
16
Branching Entropy
hidden Markov model
O Decision of Right BoundaryO Find the right boundary located between X and xi where
Xboundary
How to decide the boundary of a phrase?
represent
iscan::
isofin
::
Master Defense, National Taiwan University
17
A PAT tree is used to implement the branching entropy computation efficiently.
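As a concrete illustration of the idea above, the following sketch computes right branching entropy directly by scanning token counts. This is only a brute-force stand-in (the thesis uses a PAT tree for efficiency), and the function name is an assumption for illustration:

```python
from collections import defaultdict
from math import log2

def right_branching_entropy(corpus_tokens, prefix):
    """Entropy of the word immediately following `prefix` (a tuple of tokens).

    Low entropy suggests we are still inside a phrase; a jump in entropy
    suggests a phrase boundary.
    """
    followers = defaultdict(int)
    n = len(prefix)
    # Count every word that follows an occurrence of the prefix.
    for i in range(len(corpus_tokens) - n):
        if tuple(corpus_tokens[i:i + n]) == prefix:
            followers[corpus_tokens[i + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in followers.values())
```

For example, on a corpus containing several occurrences of "hidden Markov model", the entropy after "hidden Markov" is near zero (always followed by "model"), while the entropy after "hidden Markov model" is high, marking the right boundary.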
Feature Extraction
Prosodic, lexical, and semantic features are extracted for each candidate term.
Prosodic Features
For each candidate term at its first appearance:

Feature Name     Feature Description
Duration (I-IV)  normalized duration (max, min, mean, range)
Pitch (I-IV)     F0 (max, min, mean, range)
Energy (I-IV)    energy (max, min, mean, range)

- A speaker tends to use a longer duration to emphasize key terms; four values summarize the duration of the term, and the duration of each phone (e.g., "a") is normalized by the average duration of that phone.
- Higher pitch may signal significant information.
- Higher energy emphasizes important information.
Lexical Features
Some well-known lexical features are used for each candidate term:

Feature Name  Feature Description
TF            term frequency
IDF           inverse document frequency
TFIDF         TF * IDF
PoS           the part-of-speech (PoS) tag
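The TF-IDF feature in the table can be sketched as follows. This is a minimal textbook-style stand-in (the function name and the exact IDF variant are illustrative assumptions, not necessarily the weighting used in the thesis):

```python
from collections import Counter
from math import log

def tfidf(docs):
    """docs: list of token lists. Returns one dict per document
    mapping each term to its TF * IDF score."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency
        scores.append({t: tf[t] * log(n / df[t]) for t in tf})
    return scores
```

A term occurring often in one lecture but rarely across lectures gets a high score, which is why TF-IDF is a useful baseline indicator of key terms.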
Semantic Features
Key terms tend to focus on limited topics.
- Probabilistic Latent Semantic Analysis (PLSA) models each document D_i as a mixture of latent topics T_k, with topic probabilities P(T_k | D_i) and per-topic term distributions P(t_j | T_k).
- Latent Topic Probability (LTP): the topic distribution of a key term is peaked on a few topics, while that of a non-key term is close to flat.
- Latent Topic Significance (LTS): the ratio of a term's within-topic frequency to its out-of-topic frequency.
- Latent Topic Entropy (LTE): the entropy of a term's latent topic distribution. A key term focuses on limited topics and has lower LTE; a non-key term has higher LTE.

Feature Name  Feature Description
LTP (I-III)   Latent Topic Probability (mean, variance, standard deviation)
LTS (I-III)   Latent Topic Significance (mean, variance, standard deviation)
LTE           term entropy over latent topics
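Given a term's topic distribution P(T_k | t_j) from PLSA, the LTE feature is just the Shannon entropy of that distribution. A minimal sketch (the function name is an illustrative assumption):

```python
from math import log2

def latent_topic_entropy(p_topic_given_term):
    """Entropy of a term's latent topic distribution P(T_k | t_j).
    A low value means the term concentrates on few topics,
    which suggests it is a key term."""
    return -sum(p * log2(p) for p in p_topic_given_term if p > 0)
```

A term spread uniformly over K topics has the maximum entropy log2(K), while a term concentrated on a single topic has entropy 0.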
Key Term Extraction
Supervised learning approaches are then used to extract key terms from these features.
Learning Methods
- Adaptive Boosting (AdaBoost)
- Neural Network
Both automatically adjust the weights of the features to train a classifier.
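To make the re-weighting idea concrete, here is a minimal AdaBoost with one-feature threshold stumps over candidate-term feature vectors. This is an illustrative sketch of the algorithm itself, not the classifier configuration used in the thesis; labels are assumed to be -1 (non-key term) and +1 (key term):

```python
import math

def adaboost_stumps(X, y, rounds=10):
    """Minimal AdaBoost. X: list of feature vectors, y: labels in {-1, +1}.
    Returns a list of weighted stumps (feature, threshold, polarity, alpha)."""
    n = len(X)
    w = [1.0 / n] * n                       # example weights
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[f] >= thr else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, thr, pol, alpha))
        # Up-weight misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def ada_predict(model, x):
    score = sum(a * (p if x[f] >= t else -p) for f, t, p, a in model)
    return 1 if score >= 0 else -1
```

Each round focuses the next weak learner on the examples the ensemble still gets wrong, which is how the feature weights adapt automatically.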
Experiments: Automatic Key Term Extraction
Corpus
- NTU lecture corpus: Mandarin Chinese with embedded English words, single speaker, 45.2 hours.

ASR System
- Bilingual acoustic model with model adaptation [1]; language model adaptation using word classes and random forests [2].

Language      Mandarin  English  Overall
Char Acc (%)  78.15     53.44    76.26

[1] Ching-Feng Yeh, "Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery," Master Thesis, 2011.
[2] Chao-Yu Huang, "Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests," Master Thesis, 2011.
Reference Key Terms
- Annotations were collected from 61 students who had taken the course.
- If an annotator labeled 150 key terms, each of them received a score of 1/150, and all other terms received 0.
- Terms were ranked by the sum of the scores given by all annotators.
- The top N terms were chosen from the list, where N is the average number of key terms per annotator: N = 154 key terms (59 key phrases and 95 keywords).

Evaluation
- 3-fold cross validation.
Feature Effectiveness (neural network, keywords, ASR transcriptions)
F-measure: Pr (prosodic) 20.78, Lx (lexical) 42.86, Sm (semantic) 35.63, Pr+Lx 48.15, Pr+Lx+Sm 56.55.
- Each set of features alone gives an F1 from 20% to 42%.
- Prosodic features and lexical features are additive.
- All three sets of features are useful.
Overall Performance (keywords & key phrases)
F-measure on ASR transcriptions: N-Gram + TFIDF (baseline) 23.44, Branching Entropy + TFIDF 52.60, Branching Entropy + AdaBoost 57.68, Branching Entropy + Neural Network 62.70.
- Branching entropy performs well.
- Performance on manual transcriptions is slightly better than on ASR transcriptions.
On manual transcriptions the corresponding F-measures are: N-Gram + TFIDF (baseline) 32.19, Branching Entropy + TFIDF 55.84, Branching Entropy + AdaBoost 62.39, Branching Entropy + Neural Network 67.31.
- Supervised learning with the neural network gives the best results.
Automatic Summarization
Introduction
- Extractive summary: select the important sentences in the document.
- Computing the importance of sentences: statistical measure, linguistic measure, confidence score, N-gram score, grammatical structure score.
- Rank the sentences by importance and decide the ratio of the summary.
A better statistical measure of a term is proposed.
Statistical Measure of a Term
- LTE-based statistical measure (baseline): the importance of a term is inversely related to its latent topic entropy.
- Key-term-based statistical measure: considers only the key terms, each weighted by its LTS.
Key terms can represent the core content of the document, and their latent topic probabilities can be estimated more accurately.
Importance of the Sentence
- Original importance: computed from the LTE-based or the key-term-based statistical measure.
- New importance: considers both the original importance and the similarity to other sentences; sentences similar to more sentences should receive higher importance.
Random Walk on a Graph
- Idea: sentences similar to more important sentences should themselves be more important.
- Graph construction: each node is a sentence in the document; each edge is weighted by the similarity between its nodes.
- Node score: an interpolation of two scores, (1) the normalized original score of sentence S_i, and (2) the scores propagated from its neighbors according to the edge weights p(j, i). Nodes connected to more nodes with higher scores receive higher scores; the score of S_i at the k-th iteration is updated from the (k-1)-th scores.
Topical Similarity between Sentences
- The edge weight sim(S_i, S_j) from sentence i to sentence j is computed from the latent topic probabilities of the sentences, using Latent Topic Significance.
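One simple way to realize a topical similarity between two sentences is cosine similarity over their latent topic distributions. This is a hedged stand-in for the LTS-weighted sim(S_i, S_j) on the slide, kept deliberately minimal:

```python
from math import sqrt

def topic_cosine(p_i, p_j):
    """Cosine similarity between two sentences' latent topic distributions.
    A simple stand-in for an LTS-weighted sentence similarity."""
    dot = sum(a * b for a, b in zip(p_i, p_j))
    ni = sqrt(sum(a * a for a in p_i))
    nj = sqrt(sum(b * b for b in p_j))
    return dot / (ni * nj) if ni and nj else 0.0
```

Two sentences about the same topics score near 1, while sentences about disjoint topics score near 0, which is exactly what the edge weights need to capture.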
Scores of Sentences
- Converged equation: at convergence, each sentence's score interpolates its normalized original score with the scores propagated from its neighbors; written in matrix form as v = P'v, the solution v is the dominant eigenvector of P'.
- The converged scores are then integrated with the original importance.
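The converged equation has the familiar personalized-PageRank form and can be solved by power iteration. A sketch under stated assumptions: the function name, the damping value alpha = 0.85, and row-normalized similarities as transition probabilities are illustrative choices, not the thesis's exact parameterization:

```python
def random_walk_scores(sim, orig, alpha=0.85, iters=200):
    """Iterate v = (1 - alpha) * r + alpha * P^T v.

    sim[i][j]: similarity weight on the edge from sentence i to sentence j
    orig[i]:   original importance score of sentence i (any positive scale)
    """
    n = len(orig)
    # Row-normalize similarities into transition probabilities p(i -> j).
    p = []
    for row in sim:
        s = float(sum(row))
        p.append([x / s if s else 1.0 / n for x in row])
    s = float(sum(orig))
    r = [x / s for x in orig]          # normalized original scores
    v = r[:]
    for _ in range(iters):
        v = [(1 - alpha) * r[i] + alpha * sum(p[j][i] * v[j] for j in range(n))
             for i in range(n)]
    return v
```

Because the update conserves the total score mass, the result is a proper distribution over sentences, and sentences linked to many high-scoring neighbors end up boosted above their original importance.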
Experiments: Automatic Summarization
- Same corpus and ASR system: the NTU lecture corpus.

Reference Summaries
- Two human-produced reference summaries for each document, with sentences ranked from "the most important" to "of average importance".

Evaluation Metrics
- ROUGE-1, ROUGE-2, ROUGE-3
- ROUGE-L: Longest Common Subsequence (LCS)
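The ROUGE-N metrics above measure recall-oriented n-gram overlap between a candidate summary and a reference. A simplified single-reference sketch (the official metric also handles multiple references and reports precision/F variants; the function name is illustrative):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap divided by the
    number of n-grams in the reference (recall)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

For instance, "the cat sat" against reference "the cat ran" shares 2 of the 3 reference unigrams, giving a ROUGE-1 recall of 2/3.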
Evaluation
[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores on ASR transcriptions at 10%, 20%, and 30% summarization ratios, comparing the LTE-based (LTE) and key-term-based (Key) statistical measures.]
- The key-term-based statistical measure is helpful.
[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores on ASR transcriptions at 10%, 20%, and 30% ratios, comparing LTE vs. LTE + random walk (RW) and Key vs. Key + RW.]
- Random walk helps both the LTE-based and the key-term-based statistical measures.
- Topical similarity can compensate for recognition errors.
[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores at 10%, 20%, and 30% ratios on ASR and manual transcriptions for LTE, LTE + RW, Key, and Key + RW.]
- The key-term-based statistical measure and random walk using topical similarity are useful for summarization.
Conclusions
Automatic Key Term Extraction
- The performance can be improved by:
  - identifying phrases with branching entropy
  - combining prosodic, lexical, and semantic features

Automatic Summarization
- The performance can be improved by:
  - the key-term-based statistical measure
  - random walk with topical similarity, which compensates for recognition errors, gives higher scores to sentences topically similar to more important sentences, and considers all sentences in the document
Thanks for your attention! Q & A

Published Papers:
[1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, "Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features," in Proceedings of SLT, 2010.
[2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, "Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms," in Proceedings of InterSpeech, 2011.