TRANSCRIPT
Identifying Opinion Holders for Question
Answering in Opinion Texts
Soo-Min Kim and Eduard Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA 90292-6695
{skim, hovy}@isi.edu
Advisor: Hsin-Hsi Chen
Speaker: Yong-Sheng Lo
Date: 2007/08/16
AAAI - 2005
Introduction 1/2
Question answering in opinion texts: given a question such as “Who strongly believes in Y”, a system must recognize the holder of opinion Y
Application: stock market predictors
Earlier work (Kim and Hovy, 2004) focused on identifying opinion expressions within text; the next step is to identify the opinion holder
Example: Doraemon thinks dorayaki tastes great
Opinion holder: Doraemon; opinion expression: thinks; opinion: dorayaki tastes great
Introduction 2/2
Define the opinion holder as an entity who expresses, explicitly or implicitly, the opinion contained in a sentence
Entity = (person, country, organization, or special group of people)
Each opinion expression corresponds to one holder
“A thinks B’s criticism of T is wrong”: B is the holder of “the criticism of T”; A is the person who holds the opinion that B’s criticism is wrong
Difficulties in identifying the opinion holder
1. The opinion sentence contains more than one likely holder entity
“Russia’s defense minister said Sunday that his country disagrees with the U.S. view of Iraq, Iran and North Korea as an ‘axis of evil’.”
The candidate holders: “Russia”, “Russia’s defense minister”, “U.S.”, “Iraq”, “Iran”, “North Korea”
2. There is more than one opinion in a sentence
“In relation to Bush’s axis of evil remarks, the German Foreign Minister also said, Allies are not satellites, and the French Foreign Minister caustically criticized that the United States’ unilateral, simplistic worldview poses a new threat to the world.”
Proposed approach: an automatic method for identifying opinion holders (OH)
1. Identify all possible opinion holder entities in a sentence, using existing tools to extract the named entities and noun phrases it contains
2. Apply the Maximum Entropy (ME) ranking algorithm to select the most probable entity
System architecture
Holder candidate set
Named entities (NE): using BBN’s named entity tagger IdentiFinder
Noun phrases (NP): using Charniak’s parser
For example
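The candidate-collection step can be sketched as follows, assuming the NE spans (from a tagger such as IdentiFinder) and NP spans (from a parser such as Charniak's) are already available as plain lists; the merging and typing logic is illustrative, not the paper's exact implementation.

```python
# Sketch of holder-candidate collection: merge named-entity and
# noun-phrase spans into one deduplicated candidate list, keeping the
# NE type when available and labeling bare noun phrases as "NP".
def collect_candidates(named_entities, noun_phrases):
    """named_entities: list of (text, type) pairs, e.g. ("Russia", "LOCATION")
    noun_phrases:   list of NP strings taken from a parse tree
    Returns a list of (text, type) candidates without duplicates."""
    candidates = []
    seen = set()
    for text, ne_type in named_entities:
        if text not in seen:
            candidates.append((text, ne_type))
            seen.add(text)
    for np in noun_phrases:
        if np not in seen:
            candidates.append((np, "NP"))  # untyped NPs get type "NP"
            seen.add(np)
    return candidates

nes = [("Russia", "LOCATION"), ("U.S.", "LOCATION")]
nps = ["Russia's defense minister", "the U.S. view of Iraq"]
print(collect_candidates(nes, nps))
```

With the example sentence above, this yields four candidates: the two locations plus the two noun phrases, each of which the ranker must later score.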
Maximum Entropy ranking algorithm
A machine learning approach: Maximum Entropy modeling
Classification: selects every candidate marked as true, and selects no candidate if all are marked as false
Poor performance
Ranking: selects the single most probable candidate as the answer, maximizing a given conditional probability distribution
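The ranking decision can be sketched with a toy log-linear (maximum-entropy-style) scorer: each candidate gets a score from its features, scores are normalized with a softmax over the candidate set, and the single argmax is returned. The weights and features below are illustrative stand-ins, not the paper's trained model.

```python
import math

# Toy sketch of ME ranking: instead of independently classifying each
# candidate as true/false, score all candidates jointly and pick the
# one with the highest conditional probability.
def rank_candidates(candidates, weights):
    """candidates: list of (name, feature_dict); weights: feature -> float.
    Returns (best_name, probability_of_best)."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for _, feats in candidates]
    z = sum(math.exp(s) for s in scores)          # softmax normalizer
    probs = [math.exp(s) / z for s in scores]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best][0], probs[best]

weights = {"type=PERSON": 1.2, "dist": -0.3}      # made-up weights
cands = [("Russia", {"type=LOCATION": 1, "dist": 5}),
         ("Russia's defense minister", {"type=PERSON": 1, "dist": 1})]
print(rank_candidates(cands, weights))
```

Because the softmax is taken over the candidates of one sentence, exactly one holder is always selected, which is what distinguishes ranking from the poorly performing classification setup.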
Training data
MPQA corpus (Wiebe et al., 2003): 535 documents (10,657 sentences)
Only sentences whose opinion strength is marked high or extreme are selected
(Annotation example showing the Opinion and Holder spans)
Training procedure
Feature selection for ME
1. Full parsing features (f2, f3, f4, f6)
2. Partial parsing features (f7, f8, f9)
3. Others (f1, f5)
Full parsing features 1/5
Using Charniak’s parser
For example:
China’s official Xinhua news agency <H>: from MPQA
accusing <E>: from earlier work (Kim and Hovy, 2004)
Full parsing features 2/5
Full parsing features 3/5
To express the tree structure for ME training:
“<H> NP S VP S S VP VBG <E>”
This raises a data sparseness problem
Full parsing features 4/5
Solution: split the path into three sub-paths (f2, f3, f4)
For example: “<H> NPH SHE VPE SE SE VPE VBGE <E>”
Full parsing features 5/5
f6: the top two levels below a child node of HEhead on the path toward Hhead
For example:
P1 = “<H> NPH PPH NPH”
P2 = “<H> NPH NPH PPH VPH NPH PPH NPH”
P1 and P2 count as the same feature value because they share “PPH NPH” at the top
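The generalization behind f6 can be sketched as a truncation of the path string: keep only the two labels at the top of the path so that long and short paths that end the same way map to a single feature value, easing the data sparseness problem. The path notation follows the slides (labels suffixed with H, read from <H> upward); the string handling is a simplified illustration.

```python
# Sketch of the f6 generalization: collapse a full <H>-to-head parse
# path to its top two levels, so P1 and P2 below produce the same
# feature value even though their full paths differ in length.
def f6_feature(path):
    labels = path.replace("<H>", "").split()
    return " ".join(labels[-2:])   # keep only the top two levels

p1 = "<H> NPH PPH NPH"
p2 = "<H> NPH NPH PPH VPH NPH PPH NPH"
print(f6_feature(p1) == f6_feature(p2))  # both reduce to "PPH NPH"
```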
Partial parsing features
Using the CASS parser
f7: (vgp …)
f8: (c …)
f9: Yes or No
Other features: non-structural features
f1: the type of <H>
The type of the candidate, with values NP, PERSON, ORGANIZATION, and LOCATION
This feature enables ME to determine the most probable type among them automatically
f5: the distance between <H> and <E>, counted in words in the parse tree
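The two non-structural features can be sketched directly: f1 is the candidate's type label and f5 is the word distance between the candidate <H> and the expression <E>. The token positions here are illustrative inputs, not the paper's exact extraction from the parse tree.

```python
# Sketch of the non-structural features: f1 is the candidate type
# (NP, PERSON, ORGANIZATION, or LOCATION), f5 is the word distance
# between the holder candidate and the opinion expression.
def non_structural_features(cand_type, h_position, e_position):
    """h_position / e_position: word indices of <H> and <E> in the sentence."""
    return {
        "f1": cand_type,
        "f5": abs(e_position - h_position),   # distance in words
    }

print(non_structural_features("PERSON", 2, 7))
```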
Answer selection for evaluation 1/2
Strict selection
For example: gold answer: Doraemon; system: Doraemon
Lenient selection
For example:
Gold answer: “Michel Sidibe, Director of the Country and Regional Support Department of UNAIDS”
System: “Michel Sidibe”
Accept candidates with priority 1 & 2 & 3
Answer selection for evaluation 2/2
Threshold 1 = 0.5: allow a candidate as an answer when half of the words in the gold holder also appear in the candidate
Threshold 2 = 4: the average number of words in human-annotated holders is 3.71
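Lenient matching can be sketched as a word-overlap test. The slides' thresholds admit more than one reading; the version below assumes a candidate is accepted when its overlap with the gold holder covers at least half of the shorter span (threshold 1 = 0.5) or reaches 4 words (threshold 2 = 4, near the 3.71-word average holder length), which makes the “Michel Sidibe” example pass.

```python
import string

def words(text):
    """Lowercase, punctuation-stripped word list."""
    return [w.strip(string.punctuation).lower() for w in text.split()]

# Sketch of lenient answer matching under the assumptions stated above;
# this is one plausible reading of the thresholds, not the paper's
# exact matching rule.
def lenient_match(candidate, gold, t1=0.5, t2=4):
    cand = set(words(candidate))
    gold_w = words(gold)
    overlap = sum(1 for w in gold_w if w in cand)
    shorter = min(len(cand), len(gold_w))
    return overlap / shorter >= t1 or overlap >= t2

gold = ("Michel Sidibe, Director of the Country and Regional Support "
        "Department of UNAIDS")
print(lenient_match("Michel Sidibe", gold))
```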
Experiments 1/3
961 pairs of (<H>, <E>): 863 for training, 98 for testing
Baseline: the system chooses the candidate closest to the expression as the holder, without the ME decision
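The baseline can be sketched in a few lines: with no model at all, pick the candidate whose position is nearest to the opinion expression. The token indices below are illustrative.

```python
# Sketch of the baseline: select the holder candidate closest to the
# opinion expression, with no ME ranking involved.
def closest_candidate(candidates, expr_position):
    """candidates: list of (name, word_position); returns the nearest name."""
    return min(candidates, key=lambda c: abs(c[1] - expr_position))[0]

cands = [("Russia", 0), ("Russia's defense minister", 1), ("the U.S. view", 9)]
print(closest_candidate(cands, 3))
```

This heuristic is what the ME ranker is measured against in the experiments.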
Experiments 2/3
Experiments 3/3
Conclusions
The importance of opinion holder identification has been noticed, yet it has not been much studied to date, partly because of the lack of annotated data.
Using Maximum Entropy ranking to select the most probable holder among multiple candidates.
Adopting parsing features significantly improved system performance.