natural language processing references: 1. foundations of statistical natural language processing 2....
TRANSCRIPT
Natural Language Processing
References:1. Foundations of Statistical Natural Language Processing
2. Speech and Language Processing
Berlin ChenDepartment of Computer Science & Information Engineering
National Taiwan Normal University
2
Motivation for NLP (1/2)
• Academic: Explore the nature of linguistic communication– Obtain a better understanding of how languages work
• Practical: Enable effective human-machine communication
– Conversational agents are becoming an important form of human-computer communication
– Revolutionize the way computers are used• More flexible and intelligent
3
Motivation for NLP (2/2)
• Different Academic Disciplines: Problems and Methods– Electrical Engineering,
Statistics– Computer Science – Linguistics– Psychology
• Many of the techniques presented were first develpoed for speech and then spread over into NLP– E.g. Language models in speech recognition
LinguisticsPsychology
Computer
Science
Electrical
Engineering,
Statistics
NLP
4
Turing Test
• Alan Turing,1950
– Alan predicted at the end of 20 century a machine with 10 gigabytes of memory would have 30% chance of fooling a human interrogator after 5 minutes of questions
• Does it come true?
interrogator
5
Hollywood Cinema
• Computers/robots can listen, speak, and answer our questions– E.g.: HAL 9000 computer in “2001: A Space Odyssey”
(2001 太空漫遊 )
6
State of the Art
• Canadian computer program accepted daily weather data and generated weather reports (1976)
• Read student essays and grade them• Automated reading tutor• Spoken Dialogues
– AT&T, How May I Help You?
7
Major Topics for NLP
• Semantics/Meaning– Representation of Meaning– Semantic Analysis– Word Sense Disambiguation
• Pragmatics – Natural Language Generation– Discourse, Dialogue and Conversational Agents– Machine Translation
8
Dissidences
• Rationalists (e.g. Chomsky)– Humans are innate language faculties– (Almost fully) encoded rules plus reasoning mechanisms– Dominating between 1960’s~mid 1980’s
• Empiricists (e.g. Shannon) – The mind does not begin with detailed sets of principles and
procedures for language components and cognitive domains– Rather, only general operations for association, pattern
recognition, generalization, etc., are endowed with• General language models plus machine learning approaches
– Dominating between 1920’s~mid 1960’s and resurging 1990’s~
9
Dissidences: Statistical and Non-Statistical NLP
• The dividing line between the two has become much more fuzzy recently– An increasing number of non-statistical researches use corpus e
vidence and incorporate quantitative methods• Corpus: “a body of texts” ( 大量的文稿 )
– Statistical NLP needs to start with all the scientific knowledge available about a phenomenon when building a probabilistic model, rather than closing one’s eye and taking a clean-slate approach
• Probabilistic and data-driven
• Statistical NLP → “Language Technology” or “Language Engineering”
(I) Part-of-Speech Tagging
11
Review
• Tagging (part-of-speech tagging)– The process of assigning (labeling) a part-of-speech or other
lexical class marker to each word in a sentence (or a corpus)• Decide whether each word is a noun, verb, adjective, or
whatever
The/AT representative/NN put/VBD chairs/NNS on/IN the/AT table/NN
Or
The/AT representative/JJ put/NN chairs/VBZ on/IN the/AT table/NN
– An intermediate layer of representation of syntactic structure• When compared with syntactic parsing
– Above 96% accuracy for most successful approaches
Tagging can be viewed as a kind of syntactic disambiguation
12
Introduction
• Parts-of-speech– Known as POS, word classes, lexical tags, morphology classes
• Tag sets– Penn Treebank : 45 word classes used (Francis, 1979)
• Penn Treebank is a parsed corpus
– Brown corpus: 87 word classes used (Marcus et al., 1993)
– ….
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
13
The Penn Treebank POS Tag Set
14
Disambiguation
• Resolve the ambiguities and choose the proper tag for the context
• Most English words are unambiguous (have only one tag) but many of the most common words are ambiguous– E.g.: “can” can be a (an auxiliary) verb or a noun – E.g.: statistics of Brown corpus
- 11.5% word types are
ambiguous
- But 40% tokens are ambiguous
(However, the probabilities of
tags associated a word are
not equal → many ambiguous
tokens are easy to disambiguate)
wtPwtP 21
15
Process of POS Tagging
Tagging Algorithm
A String of WordsA Specified
Tagset
A Single Best Tag of Each WordVB DT NN .
Book that flight .
VBZ DT NN VB NN ?
Does that flight serve dinner ?
Two information sources used:
- Syntagmatic information (looking at information about tag sequences)
- Lexical information (predicting a tag based on the word concerned)
16
POS Tagging Algorithms (1/2)
Fall into One of Two Classes
• Rule-based Tagger– Involve a large database of handcrafted disambiguation rules
• E.g. a rule specifies that an ambiguous word is a noun rather than a verb if it follows a determiner
• ENGTWOL: a simple rule-based tagger based on the constraint grammar architecture
• Stochastic/Probabilistic Tagger– Also called model-based tagger– Use a training corpus to compute the probability of a given word
having a given context – E.g.: the HMM tagger chooses the best tag for a given word
(maximize the product of word likelihood and tag sequence probability)
“a new play”
P(NN|JJ) ≈ 0.45
P(VBP|JJ) ≈ 0.0005
17
POS Tagging Algorithms (1/2)
• Transformation-based/Brill Tagger– A hybrid approach
– Like rule-based approach, determine the tag of an ambiguous word based on rules
– Like stochastic approach, the rules are automatically induced from previous tagged training corpus with the machine learning technique
• Supervised learning
18
Rule-based POS Tagging (1/3)
• Two-stage architecture– First stage: Use a dictionary to assign each word a list of
potential parts-of-speech– Second stage: Use large lists of hand-written disambiguation
rules to winnow down this list to a single part-of-speech for each word
Pavlov had shown that salivation …
Pavlov PAVLOV N NOM SG PROPER
had HAVE V PAST VFIN SVO
HAVE PCP2 SVO
shown SHOW PCP2 SVOO SVO SV
that ADV
PRON DEM SG
DET CENTRAL DEM SG
CS
salivation N NOM SG
An example forThe ENGTOWL tagger
A set of 1,100 constraintscan be applied to the input sentence
(complementizer)
(preterit)
(past participle)
19
Rule-based POS Tagging (2/3)
• Simple lexical entries in the ENGTWOL lexicon
past participle
20
Rule-based POS Tagging (3/3)
Example: It isn’t that odd! ( 它沒有那麼奇特的 )
I consider that odd. ( 我思考那奇數 ?)
ADV
Complement
A
NUM
21
HMM-based Tagging (1/8)
• Also called Maximum Likelihood Tagging– Pick the most-likely tag for a word
• For a given sentence or words sequence , an HMM tagger chooses the tag sequence that maximizes the following probability
tags1 previoustagtagwordmaxargtag
:position at wordaFor
nPP
i
jjij
i
N-gram HMM tagger
tag sequence probabilityword/lexical likelihood
22
HMM-based Tagging (2/8)
• Assumptions made here– Words are independent of each other
• A word’s identity only depends on its tag
– “Limited Horizon” and “Time Invariant” (“Stationary”) • Limited Horizon: a word’s tag only depends on the previous
few tags (limited horizon) and the dependency does not change over time (time invariance)
• Time Invariant: the tag dependency won’t change as tag sequence appears different positions of a sentence
Do not model long-distance relationships well !
- e.g., Wh-extraction,…
23
HMM-based Tagging (3/8)
• Apply a bigram-HMM tagger to choose the best tag for a given word – Choose the tag ti for word wi that is most probable given the prev
ious tag ti-1 and current word wi
– Through some simplifying Markov assumptions
iijj
i wttPt ,maxarg 1
jiijj
i twPttPt 1maxarg
tag sequence probability word/lexical likelihood
24
HMM-based Tagging (4/8)
• Example: Choose the best tag for a given word
Secretariat/NNP is /VBZ expected/VBN to/TO race/VB tomorrow/NN
to/TO race/??? P(VB|TO) P(race|VB)=0.00001
P(NN|TO) P(race|NN)=0.000007
0.34 0.00003
0.021 0.00041
Pretend that the previous
word has already tagged
25
HMM-based Tagging (5/8)
• The Viterbi algorithm for the bigram-HMM tagger
end
do 1- step 1 to1:ifor
argmaxion 3.Terminat
argmax
1 2maxInduction 2.
, 1tion Initializa 1.
1
1
11
1
11
iii
nJj
n
kjiJk
i
jikjik
i
jjjj
XX
n-
jX
ttPkj
Jjn,i, twPttPkj
tPπJj, twPπj
26
HMM-based Tagging (6/8)
• Apply trigram-HMM tagger to choose the best sequence of tags for a given sentence– When trigram model is used
• Maximum likelihood estimation based on the relative frequencies observed in the pre-tagged training corpus (labeled data)
n
i
ii
n
i
iiinttt
twPtttPttPtPT13
12121,..,2,1
,maxargˆ
jij
iiiiML
jjii
iiiiiiML
twc
twctwP
tttc
tttctttP
,
,
,12
1212
Smoothing or linear interpolationare needed !
iML
iiMLiiiMLiiismoothed
tP
ttPtttPtttP
)1(
,, 11212
27
HMM-based Tagging (7/8)
• Probability smoothing of and is necessary
j
ij
iiii twc
twctwP
,
,
jji
iiii ttc
ttcttP
1
11
1ii ttP ii twP
28
HMM-based Tagging (8/8)
• Probability re-estimation based on unlabeled data • EM (Expectation-Maximization) algorithm is applied
– Start with a dictionary that lists which tags can be assigned to which words
» word likelihood function cab be estimated» tag transition probabilities set to be equal
– EM algorithm learns (re-estimates) the word likelihood function for each tag and the tag transition probabilities
• However, a tagger trained on hand-tagged data worked better than one trained via EM
– Treat the model as a Markov Model in training but treat them as a Hidden Markov Model in tagging
ii twP
1ii ttP
Secretariat/NNP is /VBZ expected/VBN to/TO race/VB tomorrow/NN
29
Transformation-based Tagging (1/8)
• Also called Brill tagging– An instance of Transformation-Based Learning (TBL)
• Motive– Like the rule-based approach, TBL is based on rules that
specify what tags should be assigned to what word– Like the stochastic approach, rules are automatically induced
from the data by the machine learning technique
• Note that TBL is a supervised learning technique– It assumes a pre-tagged training corpus
30
Transformation-based Tagging (2/8)
• How the TBL rules are learned– Three major stages
1. Label every word with its most-likely tag using a set of tagging rules (use the broadest rules at first)
2. Examine every possible transformation (rewrite rule), and select the one that results in the most improved tagging (supervised! should compare to the pre-tagged corpus )
3. Re-tag the data according this rule
– The above three stages are repeated until some stopping criterion is reached
• Such as insufficient improvement over the previous pass
– An ordered list of transformations (rules) can be finally obtained
31
Transformation-based Tagging (3/8)
• Example
So, race will be initially coded as NN
(label every word with its most-likely tag)
P(NN|race)=0.98
P(VB|race)=0.02
(a). is/VBZ expected/VBN to/To race/NN tomorrow/NN
(b). the/DT race/NN for/IN outer/JJ space/NN
Refer to the correct tag
Information of each word,
and find the tag of racein (a) is wrong
Learn/pick a most suitable transformation rule: (by examining every possible transformation)
Change NN to VB while the previous tag is TO
expected/VBN to/To race/NN → expected/VBN to/To race/VB Rewrite rule:
1
2
3
32
Transformation-based Tagging (4/8)
• Templates (abstracted transformations)– The set of possible transformations may be infinite
– Should limit the set of transformations
– The design of a small set of templates (abstracted transformations) is needed
E.g., a strange rule like:
transform NN to VB if the previous word was “IBM” and the word “the” occurs between 17 and 158 words before
that
33
Transformation-based Tagging (5/8)
• Possible templates (abstracted transformations)
Brill’s templates.Each begins with“Change tag a to
tagb when ….”
34
Transformation-based Tagging (6/8)
• Learned transformations
more valuable player
Constraints for tags
Constraints for words
Rules learned by Brill’s original tagger
Modal verbs (should, can,…)
Verb, past participle
Verb, 3sg, past tense
Verb, 3sg, Present
35
Transformation-based Tagging (7/8)
• Reference for tags used in the previous slide
36
Transformation-based Tagging (8/8)
• Algorithm
The GET_BEST_INSTANCE procedure in the example algorithm is
“Change tag from X to Y if the previous tag is Z”.
for all combinations
of tags
Get best instance for each transformation
Z
XYtraverse
corpus
Check if it is better than the best instance achieved in previous iterations
append to the rule list
score
(II) Extractive Spoken Document Summarization - Models and Features
38
Introduction (1/3)
• World Wide Web has led to a renaissance of the research of automatic document summarization, and has extended it to cover a wider range of new tasks
• Speech is one of the most important sources of information about multimedia content
• However, spoken documents associated with multimedia are unstructured without titles and paragraphs and thus are difficult to retrieve and browse– Spoken documents are merely audio/video signals or a very long
sequence of transcribed words including errors
– It is inconvenient and inefficient for users to browse through each of them from the beginning to the end
39
Introduction (2/3)
• Spoken document summarization, which aims to generate a summary automatically for the spoken documents, is the key for better speech understanding and organization
• Extractive vs. Abstractive Summarization– Extractive summarization is to select a number of indicative
sentences or paragraphs from original document and sequence them to form a summary
– Abstractive summarization is to rewrite a concise abstract that can reflect the key concepts of the document
– Extractive summarization has gained much more attention in the recent past
40
Introduction (3/3)
Large-textCorpus
Speech Signal
Acousticmodel
Linguistic model
Prosodic FeaturesExtraction
SpeechRecognition
Speech Transcription
Important Unit(Sentence)Extraction
SentenceCompaction
Statistical Features
Prosody Features
Confidence Score
Language Model
Lexical Features
Word DependencyProbability
RecordsCaptionsIndexes
SummarySummary
FeaturesExtraction
41
History of Summarization Research
1950 1960 1970 1980 1990 2000
Early system using a surface-level approach (1958)
The first entity-level approaches based on syntactic analysis (1961)
The use of location features (1969)
The extended surface-level approach to include the used of cue phrases
The emergence of more extensive entity-level approaches (1972)
The first discourse-based approaches based on story grammars (1980)
A variety of different work (entity-level approaches based on AI 、 logic and production rules semantic networks 、 hybrid approaches)
Recent work has almost exclusively focused on extract rather than abstracts.A renewed interest in earlier surface-level approaches.
The emergence of new areas such as multi-document summarization (1997), multiligual summarization, and multimedia summarization (1997)
Spoken-document Summarization
Text-document Summarization
The first training approach (1995)
The first SVD-based approach (1995)
More natural language generation work begins to focus on text summarization
42
Extraction Based on Sentence Locations/Structures
• Sentence extraction using sentence location information– Lead (Hajime and Manabu 2000) – Focusing on the introductory and concluding segments (Hirohata
et al. 2005)– Specific structure on some domain (Maskey et al. 2003)
• E.g., broadcast news programs - sentence position, speaker type, previous-speaker type, next-speaker type, speaker change
43
Statistical Summarization Approaches (1/7)
• Spoken sentences are ranked and selected based on some similarity measures or significant scores
(a) Similarity Measures– Vector Space Model (VSM) (Ho 2003)– The document and sentence of it are represented in vector forms– The sentences that have the highest relevance scores to the
whole document are selected– To summarize more important and different concepts in a
document• Relevance measure (Gong et al. 2001)
• Maximum Marginal Relevance (MMR) (Murray et al. 2005)
x
y
iS
D
44
Statistical Summarization Approaches (2/7)
(a) Similarity Measures– Relevance measure (Gong et al. 2001)
– Maximum Marginal Relevance, MMR (Murray et al. 2005)
)),()(1()),(()( SummSSimaDSSimaiS ii
MMR D
Summary
iS
The candidate sentence set
mi SSSS ,...,,..,, 21
Compute the relevance score between sentence and document D
iS
Select sentence that hasthe highest relevance score
If the number of sentence in the summary reaches
the predefined valueNo
Delete sentence and recompute the weighted term-frequency vector for the document
maxS
D
YesStop
maxS
45
Statistical Summarization Approaches (3/7)
(b) SVD-based Method– The sentence can also be represented as a semantic vector– While the sentence with more topic or semantic information are s
elected– LSA (Gong et al. 2001)
– DIM (Hirohata et al. 2005)
Mi
i
i
i
i
a
a
a
a
A
3
2
1
SVD
iNN
i
i
i
v
v
v
A
22
11
ˆ
Dimension reduction
iKK
i
i
v
v
11
Weighted word-frequency vector
Weighted singular-value
vector
Reduced dimension vector
Mi
i
i
i
i
a
a
a
a
A
3
2
1
SVD
iNN
i
i
i
v
v
v
A
22
11
ˆ
Dimension reduction
iKK
i
i
v
v
11
Weighted word-frequency vector
Weighted singular-value
vector
Reduced dimension vector
K
kikki vS
1
2)(
Jw
w
w
2
1
1S 2S MS
J content words
M sentences Information of word j
j
A U
1
2
K
i
Information of sentence i
tVTerm-sentence
matrixLeft singularvector matrix
Right singularvector matrix
singular value matrix
12
k
Jw
w
w
2
1
1S 2S MS
J content words
M sentences Information of word j
j
A U
1
2
K
i
Information of sentence i
tVTerm-sentence
matrixLeft singularvector matrix
Right singularvector matrix
singular value matrix
12
k
46
Statistical Summarization Approaches (4/7)
(c) Sentence Significance Score (SIG)– Each sentence in the document is represented as a sequence of
terms, which can be simply given by a significance score– Features such as the confidence score, linguistic score or prosod
ic information also can be further integrated – Sentence selection can be performed based on this score– E.g., Given a sentence
• Linguistic score:
• Significance score:
• Or Sentence Significance Score
(Hirohata et al. 2005)
J
jjCjIji wCwIwL
JS
1
)()()(1
Jji wwwwS ,...,,...,, 21
J
jji wI
JS
1
)(1
).....|(log)( 1 jjj wwPwL
j
Ajj F
FfwI log)(
47
Statistical Summarization Approaches (5/7)
(c) Sentence Significance Score– Sentence:
• :statistical measure, such as TF/IDF• :linguistic measure, e.g., named entities and POSs• :confidence score• :N-gram score• is calculated from the grammatical structure of the sentence
• Statistical measure also can be evaluated using PLSA (Probabilistic Latent Semantic Analysis)
– Topic Significance– Term Entropy
)()()()()(1
51
4321 i
J
jjjjji Sbwgwcwlws
JS
Jji wwwwS ,...,,...,, 21
)( jws)( jwl)( jwc)( jwg
)( iSb
48
Statistical Summarization Approaches (6/7)
(d) Classification-based Methods– Sentence selection is formulated as a binary classification proble
m. A sentence can either be included in a summary or not
– These methods need a set of training documents (or labeled data) for training the classifiers
– For example,• Naïve Bayes’ Classifier/Bayesian Network Classifier (Kupiec 1995, Koumpis et al. 2005, Maskey et al. 2005)
• Support Vector Machine (SVM) (Zhu and Penn 2005)
• Logistic Regression (Zhu and Penn 2005)
• Gaussian Mixture Models (GMM) (Murray et al. 2005)
)|()(1
Cxpxp v
V
v
SummaryNon-summary
iS
49
Statistical Summarization Approaches (7/7)
(e) Combined Methods (Hirohata et al. 2005)
– Sentence Significance Score (SIG) combined with Location Information
– Latent semantic analysis (LSA) combined with Location Information
– DIM combined with Location Information
50
Probabilistic Generative Approaches (1/7)
• MAP criterion for sentence selection
• Sentence prior– Sentence prior is simply set to uniform here– Or may have to do with
• Sentence duration/position, correctness of sentence boundary, confidence score, prosodic information, etc.
• Each sentence of the document can be ranked by this likelihood value
SPSDP
DP
SPSDPDSP
i
i
ii
Sentence model Sentence prior
51
Probabilistic Generative Approaches (2/7)
• Hidden Markov Model (HMM)– Each sentence of the spoken document is treated as a
probabilistic generative model of N-grams, while the spoken document is the observation
– : the sentence model, estimated from the sentence
– : the collection model, estimated from a large corpus
(In order to have some probability of every term in the vocabulary)
– : a weighting parameter
SiD
ij
ij
Dw
Dtc
jjiHMM CwPSwPSDP,
1
SwP j S
CwP j C
SwP j
CwP j
1
Sentence model
Document (Observation)
iLji wwwwD ....21
ij
ij
Dw
Dtc
jjiHMM CwPSwPSDP,
1
S
SwP j
CwP j
1
Sentence model
Document (Observation)
iLji wwwwD ....21
ij
ij
Dw
Dtc
jjiHMM CwPSwPSDP,
1
S
52
Probabilistic Generative Approaches (3/7)
• Relevance Model, RM– In HMM, the true sentence model might not be
accurately estimated (by MLE)• Since the sentence consists only of few terms
– In order to improve estimation of the sentence model• Each sentence has its own associated relevant
model , constructed by the subset of documents in the collection that are relevant to the sentence
• The relevance model is then linearly combined with the original sentence model to form a more accurate sentence model
SwP j
SRS
Sjjj RwPSwPSwP 1ˆ
ij
ij
Dw
Dwc
jjiHMM CwPSwPSDP,
1ˆ ˆ
i
ii S
wSSwP
)()|(
53
Probabilistic Generative Approaches (4/7)
• A schematic diagram of extractive spoken document summarization jointly using the HMM and RM models
1D2D
iD
IR System
General Text NewsCollection
Retrieved Relevant
Documentsof S S’s HMM
Model
S’s RMModel
Sentence
Document Likelihood
)|( SDP iHMM
ContemporaryText NewsCollection
iD
S S
Spoken Documents to be Summarized Local Feedback
54
Probabilistic Generative Approaches (5/7)
• Topical Mixture Model (TMM)– Build a probabilistic latent topical space – Measure the likelihood of a sentence generating a given
document in such space
Dwn
Dw
K
kikki STPTwPSDP
,
1
1TwnP
DocumentD=w1w2…wn…wN
A sentence model
2TwnP
KTwnP
iSTP 2
iSTP 1
iK STP
The TMM model for s specific sentence Si
55
Probabilistic Generative Approaches (6/7)
• Word Topical Mixture Model (wTMM)– To explore the co-occurrence relationship between words of the l
anguage– Each word of the language as a topical mixture model for
predicting the occurrence of the other word
– Each sentence of the spoken document to be summarized was treated as a composite word TMM model for generating the document
– The likelihood of the document being generated by can be expressed as:
jww
jwM
K
kwkkw jj
MTPTwPMwP1
)|()|()|(
D iS
Dwn
Dw Sw
K
kwkkiji
ij
jMTPTwPSDP
,
1,
56
Probabilistic Generative Approaches (7/7)
• Word Topical Mixture Model (wTMM)
1H
2H
CH
1D
2D
cD
CD
c
c cj
j
Dwn
Dw Hw
K
kwkkcjcc MTPTwPHDP
,
1,|
Dwn
Dw Sw
K
kwkkiji
ij
jMTPTwPSDP
,
1,)|(
Sentence 1S
Sentence nS
Sentence NS
Sentence iS
D
cH
)|( 1TwP)|(11 wMTP
)|(12 wMTP
)|(1wK MTP )|( KTwP
)|( 2TwP
)|( 1TwP)|(21 wMTP
)|(22 wMTP
)|(2wK MTP )|( KTwP
)|( 2TwP
)|( 1TwP)|( 1 VwMTP
)|( 2 VwMTP
)|(VwK MTP )|( KTwP
)|( 2TwP
Text DocumentsTitlesSpoken Document
57
Comparison of Extractive Summarization Methods
• Literal Term Matching Vs. Concept Matching – Literal Term Matching :
• Extraction using degree of similarity (VSM, MMR)• Extraction using features score (Sentence score)• HMM, HMMRM
– Concept Matching :• Extraction using latent semantic analysis (LSA, DIM)• TMM, wTMM
58
Evaluation Metrics (1/3)
• Subjective Evaluation Metrics (Direct evaluation)– Conducted by human subjects– Different levels
• Objective Evaluation Metrics– Automatic summaries were evaluated by objective metrics
• Automatic Evaluation– Summaries are evaluated by IR
59
Evaluation Metrics (2/3)
• Objective Evaluation Metrics– Sentence recall/precision (Hirohata et al. 2004)
• Sentence recall/precision is commonly used in evaluating sentence-extraction-based text summarization
• Sentence boundaries are not explicitly indicated in input speech, estimated boundaries based on recognition results do not always agree with those in manual summaries
(Kitade et al., 2004)• F-measure, F-measure/max, F-measure/ave.
man
summan
S
SSR
sum
summan
S
SSP
PR
PRF
2
60
Evaluation Metrics (3/3)
• Objective Evaluation Metrics– ROUGE-N (Lin et al. 2003)
• ROUGE-N is an N-gram recall between an automatic summary and a set of manual summaries
– Cosine Measure (Saggion et al. 2002, Ho 2003)
H n
H n
SS Sgn
SS Sgnm
gC
gC
)(
)(
NROUGE
DD
H
hDhDDhD
H
RmASIMmEmASIM
DmACC
2
%),(%)(%),(1
%)( 1,,
%)(%)(
%)(%)(
,
,
,%)(%),(mEmA
mEmA
DhD
DhD
DhD
VV
VVmEmASIM
昨天 馬英九 訪問 中國大陸
昨天 馬英九 結束 訪問 回國
x
y
DA
DhE ,