TRANSCRIPT
Identifying Opinion Holders for Question
Answering in Opinion Texts
Soo-Min Kim and Eduard Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA 90292-6695
{skim, hovy}@isi.edu
Advisor: Hsin-Hsi Chen
Speaker: Yong-Sheng Lo
Date: 2007/08/16
AAAI - 2005
Introduction 1/2
Question answering in opinion texts: given a question such as “Who strongly believes in Y”, a system must recognize the holder of opinion Y
Application: stock market predictors
Earlier work (Kim and Hovy, 2004) focused on identifying opinion expressions within text; the next step is to identify the opinion holder
Example: Doraemon thinks dorayaki tastes great
Opinion holder: Doraemon; opinion expression: thinks; opinion: dorayaki tastes great
Introduction 2/2
Define the opinion holder as an entity who expresses, explicitly or implicitly, the opinion contained in a sentence
Entity = (person, country, organization, or special group of people)
Each opinion expression corresponds to one holder
“A thinks B’s criticism of T is wrong”: B is the holder of “the criticism of T”; A is the person who holds the opinion that B’s criticism is wrong
Difficulties in identifying the opinion holder
1. The opinion sentence contains more than one likely holder entity
“Russia’s defense minister said Sunday that his country disagrees with the U.S. view of Iraq, Iran and North Korea as an ‘axis of evil’.”
The candidate holders: “Russia”, “Russia’s defense minister”, “U.S.”, “Iraq”, “Iran”, “North Korea”
2. There is more than one opinion in a sentence
“In relation to Bush’s axis of evil remarks, the German Foreign Minister also said, Allies are not satellites, and the French Foreign Minister caustically criticized that the United States’ unilateral, simplistic worldview poses a new threat to the world.”
Proposed approach: an automatic method for identifying opinion holders (OH)
1. Identify all possible opinion holder entities in a sentence, using existing tools to extract the named entities and noun phrases it contains
2. Apply the Maximum Entropy (ME) ranking algorithm to select the most probable entity
System architecture
Holder candidate set
Named entities (NE): using BBN’s named entity tagger IdentiFinder
Noun phrases (NP): using Charniak’s parser
For example
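The candidate-collection step can be sketched as follows, assuming the NE spans (from a tagger such as IdentiFinder) and NP spans (from a parser such as Charniak's) are already available as plain lists; the merging and typing logic is illustrative, not the paper's exact implementation.

```python
# Sketch of holder-candidate collection: merge named-entity and
# noun-phrase spans into one deduplicated candidate list, keeping the
# NE type when available and labeling bare noun phrases as "NP".
def collect_candidates(named_entities, noun_phrases):
    """named_entities: list of (text, type) pairs, e.g. ("Russia", "LOCATION")
    noun_phrases:   list of NP strings taken from a parse tree
    Returns a list of (text, type) candidates without duplicates."""
    candidates = []
    seen = set()
    for text, ne_type in named_entities:
        if text not in seen:
            candidates.append((text, ne_type))
            seen.add(text)
    for np in noun_phrases:
        if np not in seen:
            candidates.append((np, "NP"))  # untyped NPs get type "NP"
            seen.add(np)
    return candidates

nes = [("Russia", "LOCATION"), ("U.S.", "LOCATION")]
nps = ["Russia's defense minister", "the U.S. view of Iraq"]
print(collect_candidates(nes, nps))
```

With the example sentence above, this yields four candidates: the two locations plus the two noun phrases, each of which the ranker must later score.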
Maximum Entropy ranking algorithm
A machine learning approach: Maximum Entropy modeling
Classification: selects every candidate marked as true, and selects no candidate if all are marked as false
Poor performance
Ranking: selects the single most probable candidate as the answer, maximizing a given conditional probability distribution
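The ranking decision can be sketched with a toy log-linear (maximum-entropy-style) scorer: each candidate gets a score from its features, scores are normalized with a softmax over the candidate set, and the single argmax is returned. The weights and features below are illustrative stand-ins, not the paper's trained model.

```python
import math

# Toy sketch of ME ranking: instead of independently classifying each
# candidate as true/false, score all candidates jointly and pick the
# one with the highest conditional probability.
def rank_candidates(candidates, weights):
    """candidates: list of (name, feature_dict); weights: feature -> float.
    Returns (best_name, probability_of_best)."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for _, feats in candidates]
    z = sum(math.exp(s) for s in scores)          # softmax normalizer
    probs = [math.exp(s) / z for s in scores]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best][0], probs[best]

weights = {"type=PERSON": 1.2, "dist": -0.3}      # made-up weights
cands = [("Russia", {"type=LOCATION": 1, "dist": 5}),
         ("Russia's defense minister", {"type=PERSON": 1, "dist": 1})]
print(rank_candidates(cands, weights))
```

Because the softmax is taken over the candidates of one sentence, exactly one holder is always selected, which is what distinguishes ranking from the poorly performing classification setup.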
Training data
MPQA corpus (Wiebe et al., 2003): 535 documents (10,657 sentences)
Only sentences whose opinion strength is marked high or extreme are selected
(Annotation example showing the Opinion and Holder spans)
Training procedure
Feature selection for ME
1. Full parsing features (f2, f3, f4, f6)
2. Partial parsing features (f7, f8, f9)
3. Others (f1, f5)
Full parsing features 1/5
Using Charniak’s parser
For example:
China’s official Xinhua news agency <H>: from MPQA
accusing <E>: from earlier work (Kim and Hovy, 2004)
Full parsing features 2/5
Full parsing features 3/5
To express the tree structure for ME training:
“<H> NP S VP S S VP VBG <E>”
This raises a data sparseness problem
Full parsing features 4/5
Solution: split the path into three sub-paths (f2, f3, f4)
For example: “<H> NPH SHE VPE SE SE VPE VBGE <E>”
Full parsing features 5/5
f6: the top two levels below a child node of HEhead on the path toward Hhead
For example:
P1 = “<H> NPH PPH NPH”
P2 = “<H> NPH NPH PPH VPH NPH PPH NPH”
P1 and P2 count as the same feature value because they share “PPH NPH” at the top
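The generalization behind f6 can be sketched as a truncation of the path string: keep only the two labels at the top of the path so that long and short paths that end the same way map to a single feature value, easing the data sparseness problem. The path notation follows the slides (labels suffixed with H, read from <H> upward); the string handling is a simplified illustration.

```python
# Sketch of the f6 generalization: collapse a full <H>-to-head parse
# path to its top two levels, so P1 and P2 below produce the same
# feature value even though their full paths differ in length.
def f6_feature(path):
    labels = path.replace("<H>", "").split()
    return " ".join(labels[-2:])   # keep only the top two levels

p1 = "<H> NPH PPH NPH"
p2 = "<H> NPH NPH PPH VPH NPH PPH NPH"
print(f6_feature(p1) == f6_feature(p2))  # both reduce to "PPH NPH"
```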
Partial parsing features
Using the CASS parser
f7: (vgp …)
f8: (c …)
f9: Yes or No
Other features: non-structural features
f1: the type of <H>
The type of the candidate, with values NP, PERSON, ORGANIZATION, and LOCATION
This feature enables ME to determine the most probable type among them automatically
f5: the distance between <H> and <E>, counted in words in the parse tree
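The two non-structural features can be sketched directly: f1 is the candidate's type label and f5 is the word distance between the candidate <H> and the expression <E>. The token positions here are illustrative inputs, not the paper's exact extraction from the parse tree.

```python
# Sketch of the non-structural features: f1 is the candidate type
# (NP, PERSON, ORGANIZATION, or LOCATION), f5 is the word distance
# between the holder candidate and the opinion expression.
def non_structural_features(cand_type, h_position, e_position):
    """h_position / e_position: word indices of <H> and <E> in the sentence."""
    return {
        "f1": cand_type,
        "f5": abs(e_position - h_position),   # distance in words
    }

print(non_structural_features("PERSON", 2, 7))
```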
Answer selection for evaluation 1/2
Strict selection
For example: gold answer: Doraemon; system: Doraemon
Lenient selection
For example:
Gold answer: “Michel Sidibe, Director of the Country and Regional Support Department of UNAIDS”
System: “Michel Sidibe”
Accept candidates with priority 1 & 2 & 3
Answer selection for evaluation 2/2
Threshold 1 = 0.5: allow a candidate as an answer when half of the words in the gold holder also appear in the candidate
Threshold 2 = 4: the average number of words in human-annotated holders is 3.71
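Lenient matching can be sketched as a word-overlap test. The slides' thresholds admit more than one reading; the version below assumes a candidate is accepted when its overlap with the gold holder covers at least half of the shorter span (threshold 1 = 0.5) or reaches 4 words (threshold 2 = 4, near the 3.71-word average holder length), which makes the “Michel Sidibe” example pass.

```python
import string

def words(text):
    """Lowercase, punctuation-stripped word list."""
    return [w.strip(string.punctuation).lower() for w in text.split()]

# Sketch of lenient answer matching under the assumptions stated above;
# this is one plausible reading of the thresholds, not the paper's
# exact matching rule.
def lenient_match(candidate, gold, t1=0.5, t2=4):
    cand = set(words(candidate))
    gold_w = words(gold)
    overlap = sum(1 for w in gold_w if w in cand)
    shorter = min(len(cand), len(gold_w))
    return overlap / shorter >= t1 or overlap >= t2

gold = ("Michel Sidibe, Director of the Country and Regional Support "
        "Department of UNAIDS")
print(lenient_match("Michel Sidibe", gold))
```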
Experiments 1/3
961 pairs of (<H>, <E>): 863 for training, 98 for testing
Baseline: the system chooses the candidate closest to the expression as the holder, without the ME decision
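The baseline can be sketched in a few lines: with no model at all, pick the candidate whose position is nearest to the opinion expression. The token indices below are illustrative.

```python
# Sketch of the baseline: select the holder candidate closest to the
# opinion expression, with no ME ranking involved.
def closest_candidate(candidates, expr_position):
    """candidates: list of (name, word_position); returns the nearest name."""
    return min(candidates, key=lambda c: abs(c[1] - expr_position))[0]

cands = [("Russia", 0), ("Russia's defense minister", 1), ("the U.S. view", 9)]
print(closest_candidate(cands, 3))
```

This heuristic is what the ME ranker is measured against in the experiments.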
Experiments 2/3
Experiments 3/3
Conclusions
The importance of opinion holder identification has been noticed, yet it has not been much studied to date, partly because of the lack of annotated data.
Using Maximum Entropy ranking to select the most probable holder among multiple candidates.
Adopting parsing features significantly improved system performance.