
Post on 27-Dec-2015


Opinion Mining : A Multifaceted Problem

Lei Zhang

University of Illinois at Chicago

Some slides are based on Prof. Bing Liu’s presentation

Introduction

Most text information processing methods (e.g., web search, text mining) work with factual information and do not deal with opinion information.

Opinion mining: the computational study of opinions and sentiments expressed in text.

Why opinion mining now? Mainly because of the Web: we can now get huge volumes of opinionated text.

Why opinion mining is important: whenever we need to make a decision, we would like to hear others’ advice. In the past:

Individuals: friends or family. Businesses: surveys and consultants.

Word of mouth on the Web: people can express their opinions in reviews, forum discussions, blogs…

Intellectually challenging & major applications: a popular research topic in recent years in NLP (Natural Language Processing) and Web data mining.

Many companies in the US work on it.

It touches every aspect of NLP and yet is well-scoped.

Potentially it could be a major application of NLP. But this problem is NOT easy.

A popular problem

An example review

“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”

What do we see? Opinions, targets of opinions, and opinion holders.

Target entity

Definition (entity): an entity e is a product, person, event or organization. e is represented as a hierarchy of components, sub-components, and so on. Each node represents a component and is associated with a set of attributes of the component.

An opinion can be expressed on any node or attribute of the node.

To simplify our discussion, we use the term features to represent both components and attributes.
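The entity definition maps naturally onto a tree. The sketch below is a minimal Python illustration, assuming a hypothetical `Component` class; the hierarchy used for the iPhone (its components and attributes) is made up for the example. It collects every component and attribute under an entity, i.e., what the slides call its features.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in the entity hierarchy: the entity itself or a (sub-)component."""
    name: str
    attributes: list[str] = field(default_factory=list)
    children: list["Component"] = field(default_factory=list)


def features(node: Component) -> list[str]:
    """Return every component and attribute below `node` (the slides' "features").

    The node's own attributes come first, then each child component followed
    by its own features (depth-first). The root entity itself is not included.
    """
    result = list(node.attributes)
    for child in node.children:
        result.append(child.name)
        result.extend(features(child))
    return result


# Hypothetical hierarchy for the phone in the example review.
iphone = Component(
    name="iPhone",
    attributes=["price"],
    children=[
        Component("touch screen", attributes=["size"]),
        Component("battery", attributes=["battery life"]),
    ],
)

print(features(iphone))
# ['price', 'touch screen', 'size', 'battery', 'battery life']
```

An opinion can then target the root (the iPhone as a whole) or any name in this list.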

What is an opinion? An opinion is a quintuple

(ej, fjk, soijkl, hi, tl),

where:

ej is a target entity.

fjk is a feature of the entity ej.

soijkl is the sentiment value of the opinion of the opinion holder hi on feature fjk of entity ej at time tl. soijkl is +ve, -ve, or neu, or a more granular rating.

hi is an opinion holder.

tl is the time when the opinion is expressed.
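To make the quintuple concrete, here is a minimal Python sketch of one possible record type. The class and field names are hypothetical, and the holder labels and the date are placeholders, since the example review gives no date.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "+ve"
    NEGATIVE = "-ve"
    NEUTRAL = "neu"


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (ej, fjk, soijkl, hi, tl)."""
    entity: str           # ej: target entity
    feature: str          # fjk: feature of the entity
    sentiment: Sentiment  # soijkl: could also be a finer-grained rating
    holder: str           # hi: opinion holder
    time: date            # tl: when the opinion was expressed


# Quintuples a reader might extract from the example review.
# Holder labels and the date are placeholders (the review gives no date).
when = date(2010, 1, 1)
opinions = [
    Opinion("iPhone", "touch screen", Sentiment.POSITIVE, "reviewer", when),
    Opinion("iPhone", "voice quality", Sentiment.POSITIVE, "reviewer", when),
    Opinion("iPhone", "price", Sentiment.NEGATIVE, "reviewer's mother", when),
]

print([op.feature for op in opinions if op.sentiment is Sentiment.NEGATIVE])
# ['price']
```

Note that the negative opinion on price belongs to a different holder than the positive ones, which is exactly why hi is part of the quintuple.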

Opinion mining objective

Objective: given an opinionated document, discover all quintuples (ej, fjk, soijkl, hi, tl), i.e., mine the five corresponding pieces of information in each quintuple, or solve some simpler problems.

With the quintuples, unstructured text → structured data.

Traditional data and visualization tools can then be used to slice, dice and visualize the results in all kinds of ways.

This enables both qualitative and quantitative analysis.

Sentiment classification: doc-level

Classify a document (e.g., a review) based on the overall sentiment expressed by the opinion holder. Classes: positive or negative (and neutral).

It assumes that each document focuses on a single entity and contains opinions from a single opinion holder.

Subjectivity analysis : sentence-level

Sentence-level sentiment analysis has two tasks:

Subjectivity classification: Subjective or objective. Objective: e.g., “I bought an iPhone a few days ago.” Subjective: e.g., “It is such a nice phone.”

Sentiment classification: For subjective sentences or clauses, classify positive or negative. Positive: e.g., “It is such a nice phone.” Negative: e.g., “The screen is bad.”
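The slides do not say how either step is implemented. As a rough illustration only, the sketch below uses a crude lexicon heuristic (an assumption, not a trained classifier): a sentence with no opinion words is treated as objective, and otherwise the sign of the summed word orientations decides its polarity. The tiny `LEXICON` is hypothetical.

```python
# Tiny hypothetical opinion lexicon: word -> orientation (+1 / -1).
LEXICON = {"nice": 1, "cool": 1, "clear": 1, "good": 1,
           "bad": -1, "expensive": -1, "mad": -1}


def classify_sentence(sentence: str) -> str:
    """Return 'objective', 'positive', 'negative', or 'neutral' for one sentence."""
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    # Step 1 (subjectivity): no opinion words -> treat the sentence as objective.
    if not hits:
        return "objective"
    # Step 2 (sentiment): sign of the summed orientations.
    total = sum(hits)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


for s in ["I bought an iPhone a few days ago.", "It is such a nice phone.", "The screen is bad."]:
    print(s, "->", classify_sentence(s))
```

Real systems usually train classifiers for both steps; the lexicon version only shows how the two tasks chain together.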

Feature-based sentiment analysis

Sentiment classification at both the document and sentence (or clause) levels is NOT sufficient: it does not tell what people like and/or dislike. A positive opinion on an entity does not mean that the opinion holder likes everything. A negative opinion on an entity does not mean that the opinion holder dislikes everything.

Feature-based opinion summary

“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”

…

Feature-based summary:

Feature1: Touch screen
Positive: 212
The touch screen was really cool.
The touch screen was so easy to use and can do amazing things.
…
Negative: 6
The screen is easily scratched.
I have a lot of difficulty in removing finger marks from the touch screen.
…

Feature2: Battery life
…
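Once each opinionated sentence has been mapped to a feature and an orientation, producing a summary like the one above is a grouping step. A minimal sketch, assuming the earlier mining steps already yield (feature, orientation, sentence) triples with orientation "positive" or "negative"; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def feature_summary(opinions):
    """Group (feature, orientation, sentence) triples into a feature-based summary.

    Returns {feature: {"positive": [...], "negative": [...]}}; the length of each
    list is the count shown in the summary (e.g. "Positive: 212").
    Assumes orientation is either "positive" or "negative".
    """
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, orientation, sentence in opinions:
        summary[feature][orientation].append(sentence)
    return dict(summary)


# Hypothetical per-sentence output of the earlier mining steps.
mined = [
    ("touch screen", "positive", "The touch screen was really cool."),
    ("touch screen", "positive", "The touch screen was so easy to use and can do amazing things."),
    ("touch screen", "negative", "The screen is easily scratched."),
    ("battery life", "negative", "The battery life was not long."),
]

for feature, groups in feature_summary(mined).items():
    print(f"Feature: {feature}")
    for orientation in ("positive", "negative"):
        print(f"  {orientation.capitalize()}: {len(groups[orientation])}")
        for sentence in groups[orientation]:
            print(f"    {sentence}")
```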

Visual comparison

[Figure: feature-based opinion summary. Top: summary of reviews of Cell Phone 1, with positive (+) and negative (-) bars for Voice, Screen, Size, Weight and Battery. Bottom: comparison of reviews of Cell Phone 1 and Cell Phone 2.]

Opinion mining is challenging

“This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”

Opinion mining is a multifaceted problem

(ej, fjk, soijkl, hi, tl),

ej - an entity: named entity extraction (and more)

fjk - a feature of ej: information extraction

soijkl - the sentiment: sentiment determination

hi - an opinion holder: information/data extraction

tl - the time: data extraction

Other subproblems: co-reference resolution, relation extraction, synonym matching (voice = sound quality), …

Entity extraction (competing entities)

An entity can be a product, service, person, organization or event in an opinion document.

“This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth.”

Nokia and Moto (Motorola) are entities.

Why do we need entity extraction?

Without knowing the entity, the piece of opinion has little value.

Companies want to know the competitors in the market. This is the first step to understand the competitive landscape from opinion documents.

Related work

Named entity recognition (NER): aims to identify entities such as names of persons, organizations and locations in natural language text.

Our problem is similar to the NER problem, but with some differences:

1. Fine-grained entity classes (products, services) rather than coarse-grained entity classes (people, locations, organizations).

2. Only a specific type is wanted, e.g., a particular type of drug names.

3. Neologisms, e.g., “Sammy” (Sony), “SE” (Sony-Ericsson).

4. Feature sparseness (lack of contextual patterns).

5. Data noise (over-capitalization, under-capitalization).

NER methods

Supervised learning methods: the current dominant technique for addressing the NER problem, e.g., Hidden Markov Models (HMM), Maximum Entropy Models (ME), Support Vector Machines (SVM), Conditional Random Fields (CRF).

Shortcomings: they rely on large sets of labeled examples, and labeling is labor-intensive and time-consuming.

NER methods

Unsupervised learning methods: mainly clustering, gathering named entities from clustered groups based on the similarity of context. The techniques rely on lexical resources (e.g., WordNet), on lexical patterns and on statistics computed on a large unannotated corpus.

Shortcomings: low precision and recall.

NER methods

Semi-supervised learning methods

These methods show promise for identifying and labeling entities. Starting with a set of seed entities, semi-supervised methods use either class-specific patterns to populate an entity class or distributional similarity to find terms similar to the seeds.

Specific methods: bootstrapping, co-training, distributional similarity.

Set expansion problem

To find competing entities, the extracted entities must be relevant, i.e., they must be of the same class/type as the user-provided entities.

The user can only provide a few names because there are so many different brands and models.

Our problem is actually a set expansion problem, which expands a set of given seed entities.

Set expansion problem

Given a set Q of seed entities of a particular class C, and a set D of candidate entities, we wish to determine which of the entities in D belong to C. That is, we “grow” the class C based on the set of seed examples Q.

This is a classification problem. However, in practice, the problem is often solved as a ranking problem.

Distributional similarity

Distributional similarity is a classical method for the set expansion problem.

It compares the distribution of the words surrounding a candidate entity with that of the seed entities, and then ranks the candidate entities based on the similarity values.

Our experiment shows that this approach is inaccurate.
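For reference, here is a minimal sketch of the distributional-similarity baseline described above. It assumes single-token entity names, a fixed context window of two words, and cosine similarity over raw co-occurrence counts; all three are illustrative choices, not taken from the slides. The toy corpus and entity names are hypothetical.

```python
import math
from collections import Counter


def context_vector(corpus, entity, window=2):
    """Count the words within `window` tokens of each occurrence of `entity`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == entity:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_distributional_similarity(corpus, seeds, candidates, window=2):
    """Rank candidates by cosine similarity to the pooled context of the seeds."""
    seed_ctx = Counter()
    for s in seeds:
        seed_ctx.update(context_vector(corpus, s, window))
    scores = {c: cosine(context_vector(corpus, c, window), seed_ctx) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


corpus = [
    "i bought a nokia phone yesterday",
    "my girlfriend bought a moto phone",
    "i bought a samsung phone today",
    "we drove to chicago yesterday",
]
print(rank_by_distributional_similarity(corpus, ["nokia"], ["moto", "samsung", "chicago"]))
```

On real data, sparse and noisy contexts make these similarity values unreliable, which is consistent with the inaccuracy reported above.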

Positive and unlabeled learning model (PU learning model)

A two-class classification model.

Given a set P of positive examples of a particular class and a set U of unlabeled examples (containing hidden positive and negative cases), a classifier is built using P and U for classifying the data in U or future test cases.

The set expansion problem can be mapped into PU learning exactly.

S-EM algorithm

S-EM is an algorithm under PU learning model.

It is based on Naïve Bayes classification and the Expectation-Maximization (EM) algorithm.

The main idea of S-EM is to use the spy technique to identify some reliable negatives (RN) from the unlabeled set U, and then use an EM algorithm to learn from P, RN and U - RN.

We use classification score to rank entities.

S-EM algorithm (Liu et al., ICML 2002)

Our algorithm (Li, Zhang, et al., ACL 2010)

Given a positive set P and an unlabeled set U, S-EM produces a Bayesian classifier C, which is used to classify each vector u ∈ U and to assign a probability P(+|u) indicating the likelihood that u belongs to the positive class.
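The spy step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it trains a multinomial Naive Bayes classifier with add-one smoothing on P - S versus U ∪ S (S being the spies), and, as a simplifying assumption, takes the lowest spy probability as the threshold, whereas S-EM sets its threshold so that a small fraction of noisy spies may fall below it. The EM stage that follows is omitted. `train_nb`, `reliable_negatives`, `spy_ratio` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def train_nb(pos_docs, neg_docs):
    """Multinomial Naive Bayes with add-one smoothing; returns a P(+|doc) function."""
    vocab = {w for d in pos_docs + neg_docs for w in d}
    pos_counts = Counter(w for d in pos_docs for w in d)
    neg_counts = Counter(w for d in neg_docs for w in d)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    prior_pos = len(pos_docs) / (len(pos_docs) + len(neg_docs))

    def prob_pos(doc):
        log_pos, log_neg = math.log(prior_pos), math.log(1 - prior_pos)
        for w in doc:
            if w in vocab:  # words unseen in training are ignored
                log_pos += math.log((pos_counts[w] + 1) / (pos_total + len(vocab)))
                log_neg += math.log((neg_counts[w] + 1) / (neg_total + len(vocab)))
        m = max(log_pos, log_neg)  # subtract the max for numerical stability
        e_pos, e_neg = math.exp(log_pos - m), math.exp(log_neg - m)
        return e_pos / (e_pos + e_neg)

    return prob_pos


def reliable_negatives(P, U, spy_ratio=0.15, seed=0):
    """Spy step: return the documents in U that look reliably negative (RN)."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(P) * spy_ratio))
    spy_idx = set(rng.sample(range(len(P)), n_spies))
    spies = [P[i] for i in spy_idx]
    rest = [d for i, d in enumerate(P) if i not in spy_idx]
    # Hide the spies in U and train P - S (positive) against U + S (negative).
    prob_pos = train_nb(rest, U + spies)
    # Spies behave like the unknown positives in U, so anything scoring
    # below every spy is taken as a reliable negative (simplified threshold).
    threshold = min(prob_pos(s) for s in spies)
    return [d for d in U if prob_pos(d) < threshold]


P = [["bought", "phone"]] * 10                           # toy context vectors of seeds
U = [["bought", "phone"]] * 3 + [["drove", "city"]] * 5  # hidden positives + negatives
print(reliable_negatives(P, U))
```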

Entity ranking

To rank a candidate d: let Md be the median of {P(+|Vector 1), P(+|Vector 2), P(+|Vector 3), …, P(+|Vector n)}, where each vector corresponds to one occurrence of d. The final score (fs) for d is defined as:

fs(d) = Md × log(1 + n)

where n is the frequency count of candidate entity d in the corpus.

A high fs (d) implies a high likelihood that d is in the expanded entity set.

Candidate entities with higher median score and higher frequency count in the corpus will be ranked high.
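The scoring function translates directly into code. A minimal sketch, assuming the S-EM probabilities for each occurrence of a candidate are already available; the natural logarithm is used because the slide does not give a base (any base yields the same ranking). The candidate names and probabilities are made up.

```python
import math
from statistics import median


def final_score(probs):
    """fs(d) = Md * log(1 + n).

    `probs` holds P(+|Vector i) for the n context vectors of candidate d,
    one per occurrence, so n = len(probs) is d's frequency count.
    """
    return median(probs) * math.log(1 + len(probs))


def rank_entities(candidate_probs):
    """Sort candidates by fs(d), highest first."""
    return sorted(candidate_probs, key=lambda d: final_score(candidate_probs[d]), reverse=True)


example = {
    "moto": [0.9, 0.8, 0.7],          # fairly confident, seen 3 times
    "samsung": [0.95],                # very confident, but seen only once
    "chicago": [0.1, 0.2, 0.1, 0.3],  # frequent, but unlikely to be a phone brand
}
print(rank_entities(example))
```

The median makes Md robust to a few odd contexts, while the log(1 + n) factor rewards frequency without letting it dominate.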

Feature extraction

Feature indicators (1) Dependency relation

Opinion words modify object features, e.g.,

“This camera takes great pictures”

The method exploits the dependency relations between opinion words and features to extract features.

Given a set of seed opinion words (no feature input), we can extract features and also opinion words iteratively.

“The voice on my phone was not so clear, worse than my previous phone. The battery life was long”

Extraction rules
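The extraction rules themselves are not reproduced here. As a rough sketch of the iterative idea only, the code below assumes the sentences have already been dependency-parsed into (adjective, noun) modification pairs and applies two simplified rules until nothing changes: a noun modified by a known opinion word is a feature, and an adjective modifying a known feature is an opinion word. The pair format and this two-rule set are assumptions and far simpler than a full rule set.

```python
def propagate(modifier_pairs, seed_opinion_words):
    """Iteratively grow the feature and opinion-word sets from seed opinion words.

    modifier_pairs: (adjective, noun) pairs where the adjective modifies the noun,
    assumed to come from a dependency parser.
    """
    opinion_words = set(seed_opinion_words)
    features = set()
    changed = True
    while changed:
        changed = False
        for adj, noun in modifier_pairs:
            # Rule A: a noun modified by a known opinion word is a feature.
            if adj in opinion_words and noun not in features:
                features.add(noun)
                changed = True
            # Rule B: an adjective modifying a known feature is an opinion word.
            if noun in features and adj not in opinion_words:
                opinion_words.add(adj)
                changed = True
    return features, opinion_words


pairs = [("great", "pictures"), ("blurry", "pictures"),
         ("blurry", "screen"), ("long", "battery life")]
print(propagate(pairs, {"great"}))
```

Starting from the single seed "great", the loop finds "pictures", then the new opinion word "blurry", then "screen". "battery life" is missed because "long" never connects to anything already known.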

Feature extraction

(2) Part-whole relation pattern: a part-whole pattern indicates that one object is part of another object. It is a good indicator for features if the class concept word (the “whole” part) is known.

(3) “No” pattern: a specific pattern for product reviews and forum posts. People often express their comments or opinions on features with this short pattern (e.g., “no noise”).
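Both patterns can be approximated with regular expressions for illustration. The patterns below (“X of the phone”, “phone with X”, and “no X”) are assumptions chosen for this sketch; they capture single words only and will yield noisy candidates (e.g., “no desire”), which is one reason the candidates are ranked afterwards. The concept word `phone` plays the role of the known “whole”.

```python
import re


def part_whole_candidates(sentence, concept="phone"):
    """Single-word feature candidates from two illustrative part-whole patterns."""
    s = sentence.lower()
    c = re.escape(concept)
    patterns = [
        rf"\b(\w+) of (?:the |this |my )?{c}\b",  # "the battery of this phone"
        rf"\b{c} with (?:a |an |the )?(\w+)",     # "a phone with bluetooth"
    ]
    return [m.group(1) for p in patterns for m in re.finditer(p, s)]


def no_pattern_candidates(sentence):
    """'No' pattern: the word right after a standalone 'no' (e.g. 'no noise')."""
    return re.findall(r"\bno (\w+)", sentence.lower())


print(part_whole_candidates("My girlfriend bought a Moto phone with Bluetooth."))
print(no_pattern_candidates("No noise, no glare, great screen."))
```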

Feature ranking

Rank extracted feature candidates by feature importance. If a feature candidate is correct and important, it should be ranked high; unimportant features or noise should be ranked low.

Two major factors affect feature importance:

Feature relevance: how likely a feature candidate is to be a correct feature.

Feature frequency: a feature is important if it appears frequently in opinion documents.

HITS algorithm for feature relevance

There is a mutual reinforcement relation between features on one side and opinion words, part-whole patterns and “no” patterns on the other. If an adjective modifies many correct features, it is highly likely to be a good opinion word. Similarly, if a feature candidate can be extracted by many opinion words, part-whole patterns, or “no” patterns, it is also highly likely to be a correct feature. Hence the Web page ranking algorithm HITS is applicable.

Our algorithm ( Zhang, et al., COLING 2010)

(1) Extract features by dependency relations, part-whole patterns, etc.

(2) Compute a feature score using HITS without considering frequency.

(3) Compute the final score, which takes the feature frequency into account: S = S(f) × log(freq(f)), where freq(f) is the frequency count of feature f and S(f) is the authority score of feature f.
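The three steps can be sketched on a bipartite graph whose hub nodes are the indicators (opinion words, part-whole patterns, “no” patterns) and whose authority nodes are the feature candidates. A minimal sketch, assuming unweighted edges, a fixed number of iterations, L2 normalization and the natural logarithm; the edge list and frequencies are made up. With S = S(f) × log(freq(f)) as written, a feature seen only once scores 0.

```python
import math


def feature_scores(edges, freq, iterations=50):
    """HITS authority score S(f) of each feature, then S = S(f) * log(freq(f)).

    edges: (indicator, feature) pairs meaning "this indicator extracted this
    feature"; indicators are hubs, feature candidates are authorities.
    freq: frequency count of each feature candidate.
    """
    hubs = {i for i, _ in edges}
    feats = {f for _, f in edges}
    hub = dict.fromkeys(hubs, 1.0)
    auth = dict.fromkeys(feats, 1.0)
    for _ in range(iterations):
        # A feature is relevant if good indicators extract it ...
        auth = {f: sum(hub[i] for i, g in edges if g == f) for f in feats}
        norm = math.sqrt(sum(v * v for v in auth.values()))
        auth = {f: v / norm for f, v in auth.items()}
        # ... and an indicator is good if it extracts relevant features.
        hub = {i: sum(auth[f] for j, f in edges if j == i) for i in hubs}
        norm = math.sqrt(sum(v * v for v in hub.values()))
        hub = {i: v / norm for i, v in hub.items()}
    return {f: auth[f] * math.log(freq[f]) for f in feats}


edges = [("great", "screen"), ("great", "battery"), ("nice", "screen"),
         ("nice", "battery"), ("part-whole", "screen"), ("no-pattern", "noise")]
freq = {"screen": 50, "battery": 30, "noise": 5}
scores = feature_scores(edges, freq)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph "noise" ends up near zero because its only indicator is disconnected from the rest of the graph, so plain HITS concentrates all weight on the larger component.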

Identify opinion orientation

For each feature, we identify the sentiment or opinion orientation expressed by a reviewer.

Almost all approaches make use of opinion words and phrases (lexicon-based methods). Some opinion words have context-independent orientations, e.g., “great”. Other opinion words have context-dependent orientations, e.g., “short”. There are many ways to use opinion words.

Machine learning methods for sentiment classification at the sentence and clause levels are also applicable.

Aggregation of opinion words. Input: a pair (f, s), where f is a feature and s is a sentence that contains f. Output: whether the opinion on f in s is positive, negative, or neutral. Two steps:

Step 1: split the sentence if needed based on BUT words (but, except that, etc.).

Step 2: work on the segment sf containing f. Let the set of opinion words in sf be w1, …, wn. Sum up their orientations (1, -1, 0), and assign the orientation to (f, s) accordingly.

Step 2 can be changed to

$$\sum_{i=1}^{n} \frac{w_i.o}{d(w_i, f)}$$

with better results, where wi.o is the opinion orientation of wi and d(wi, f) is the distance from f to wi.
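A minimal sketch of Steps 1 and 2 with the distance-weighted score, assuming simple regex tokenization, a hypothetical mini-lexicon of context-independent orientations, and distance measured in tokens from the first word of the feature. As the input definition says, the sentence is expected to contain the feature.

```python
import re

# Hypothetical mini-lexicon of context-independent orientations.
ORIENTATION = {"great": 1, "cool": 1, "clear": 1, "bad": -1, "scratched": -1}
BUT_WORDS = ["but", "except that", "except for"]


def feature_orientation(feature, sentence):
    """Return 1, -1, or 0 for the opinion on `feature` in `sentence`."""
    s = sentence.lower()
    # Step 1: split at BUT words; keep the segment that contains the feature.
    but_re = r"\b(?:" + "|".join(re.escape(b) for b in BUT_WORDS) + r")\b"
    segment = next(seg for seg in re.split(but_re, s) if feature in seg)
    tokens = re.findall(r"[a-z']+", segment)
    f_pos = tokens.index(feature.split()[0])  # distance measured from the feature's first word
    # Step 2 (distance-weighted): sum of wi.o / d(wi, f) over opinion words in the segment.
    score = sum(
        ORIENTATION[w] / abs(i - f_pos)
        for i, w in enumerate(tokens)
        if w in ORIENTATION and i != f_pos
    )
    return (score > 0) - (score < 0)


s1 = "The screen is great but the battery is bad"
print(feature_orientation("screen", s1), feature_orientation("battery", s1))
```

With a plain sum, “The touch screen is cool and the case is scratched.” would score 0 for both features; the distance weighting attributes “cool” mainly to the touch screen and “scratched” mainly to the case.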

Basic opinion rules (Liu, Ch. in NLP Handbook)

Negation rules: a negation word or phrase usually reverses the opinion expressed in a sentence. Negation words include “no”, “not”, etc.

e.g., “This cellphone is not good.”

But-clause rules: a sentence containing “but” also needs special treatment. The opinions before and after “but” are usually opposite to each other. Phrases such as “except that” and “except for” behave similarly.

e.g., “I love Nicholas Cage but I really have no desire to see the Sorcerer’s Apprentice.”

More…
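The negation rule can be sketched as flipping a word's orientation when a negation word appears shortly before it. The three-token window, the word lists and the function name are assumptions for illustration; real systems need more careful scope handling.

```python
import re

LEXICON = {"good": 1, "great": 1, "clear": 1, "bad": -1}  # hypothetical
NEGATIONS = {"no", "not", "never"}                        # assumed word list


def word_orientations(sentence, window=3):
    """(opinion word, orientation) pairs, flipping a word's orientation when a
    negation word occurs among the `window` tokens before it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            orientation = LEXICON[tok]
            if any(t in NEGATIONS for t in tokens[max(0, i - window):i]):
                orientation = -orientation
            pairs.append((tok, orientation))
    return pairs


print(word_orientations("This cellphone is not good."))  # [('good', -1)]
```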

Two main types of opinion

Direct opinions: direct sentiment expressions on some entity or feature, e.g., “the picture quality of this camera is great.”

Comparative opinions: comparisons expressing similarities or differences of more than one entity or feature, usually stating an ordering or preference, e.g., “car x is cheaper than car y.”

Comparative opinions

Gradable

Non-equal gradable: relations of the type greater than or less than,

e.g., “optics of camera A is better than that of camera B”

Equative: relations of the type equal to,

e.g., “camera A and camera B both come in 7MP”

Superlative: relations of the type greater than or less than all others,

e.g., “camera A is the cheapest camera available in the market”
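The slides do not describe how comparative sentences are detected. The sketch below is a crude cue-word heuristic, not the method of the cited papers, included only to make the three categories concrete. The cue words (“than”, “both”, “as … as”, “same as”, “-est”, “most”, “least”, etc.) are assumptions and will misfire on words such as “interest” or phrases such as “rather than”.

```python
import re


def comparative_type(sentence):
    """Crude cue-word guess at the comparative type of a sentence, or None."""
    s = sentence.lower()
    if re.search(r"\b(?:most|least|best|worst)\b|\b\w+est\b", s):
        return "superlative"
    if re.search(r"\bthan\b", s):
        return "non-equal gradable"
    if re.search(r"\bas \w+ as\b|\bsame as\b|\bboth\b", s):
        return "equative"
    return None


for s in ["optics of camera A is better than that of camera B",
          "camera A and camera B both come in 7MP",
          "camera A is the cheapest camera available in the market"]:
    print(comparative_type(s))
```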

Mining comparative opinions (Jindal and Liu, SIGIR 2006; Ding, Liu, Zhang, KDD 2009)

Objective: given an opinionated document d, extract comparative opinions:

(O1, O2, F, po, h, t),

where O1 and O2 are the object sets being compared based on their shared features F, po is the preferred object set of the opinion holder h, and t is the time when the comparative opinion is expressed.
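The comparative-opinion tuple can be recorded much like the quintuple. A minimal sketch with hypothetical class and field names; for “car x is cheaper than car y”, reading “cheaper” as a comparison on price with car x preferred is an assumption, and the holder and date are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ComparativeOpinion:
    """(O1, O2, F, po, h, t) for one comparative opinion."""
    o1: frozenset[str]         # O1: first object set
    o2: frozenset[str]         # O2: second object set
    features: frozenset[str]   # F: shared features being compared
    preferred: frozenset[str]  # po: object set preferred by the holder
    holder: str                # h: opinion holder
    time: date                 # t: when the opinion was expressed


# "car x is cheaper than car y" (feature and preference are assumed readings).
op = ComparativeOpinion(
    o1=frozenset({"car x"}), o2=frozenset({"car y"}),
    features=frozenset({"price"}), preferred=frozenset({"car x"}),
    holder="reviewer", time=date(2010, 1, 1),
)
print(op.preferred <= op.o1 | op.o2)  # True: po is one of the compared sets
```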

Thank you
