AUTOMATIC CLASSIFICATION AND SUMMARIZATION
OF SENTIMENT IN DOCUMENTS
By
Kiran Sarvabhotla
200402038
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science (by Research) in
Computer Science & Engineering
Search and Information Extraction Lab
Language Technologies Research Center
International Institute of Information Technology
Hyderabad, India
May 2010
Copyright © 2010 Kiran Sarvabhotla
All Rights Reserved
Dedicated to my family and friends.
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled "Automatic Classification
and Summarization of Sentiment in Documents" by Kiran Sarvabhotla
(200402038) submitted in partial fulfillment for the award of the degree of Master
of Science (by Research) in Computer Science & Engineering, has been carried out
under my supervision and it is not submitted elsewhere for a degree.
Date Advisor:
Dr. Vasudeva Varma, Associate Professor
IIIT, Hyderabad
Acknowledgements
I would like to thank my advisor Dr. Vasudeva Varma for his continual guidance
during my master's degree. He gave me the freedom to explore different topics
and helped me zero in on one of the hot topics of research. His sincere efforts
and valuable comments were key to getting my work published in one of the
reputed journals in IR. I would also like to thank him for his support in writing my
master's thesis.
I would like to express my sincere gratitude to Dr. Prasad Pingali, who was the
first person I met in the IE Lab. His passion for research motivated a happy-go-lucky
guy like me to pursue it. I would not be what I am today if I
had not met him. My special thanks to Mr. Babji, a guy who never says no, for his
help ranging from providing infrastructure to eatables and some fitness tips.
Special thanks to my friend P V Sai Krishna, who made me come to the IE Lab
and motivated me in very troubled times. My special thanks to K.P Raja Sekhar,
my first project partner. I would like to thank Surya Ganesh, Kranthi Reddy, Rohit
Bharadwaj and Swathi who helped me during the early stages of my research.
I take this chance to thank all my batch mates for making my life in IIIT so
enjoyable and memorable. I will always relish the fun we had in IIIT throughout my
life.
I would like to thank my father, Lakshmi Narasimham, my mother, Ramalakshmi,
and my sister Sravani for their love and support.
Last but not least, I thank my dear friend Vijaya Kumari for her support and
motivation, constantly reminding me of my capabilities.
Abstract
Today's World Wide Web has become a major source of information for people.
With the advent of customer reviews, blogs and the growth of e-commerce in this
decade, user-generated content has grown rapidly on the web. It has an inherent
property called sentiment, which plays an important role in people's decision-making.
In order to provide better information access, analysing sentiments and
rating them in terms of satisfaction has become an essential feature of the
web.
Sentiment analysis or opinion mining is a Web 2.0 problem that aims to determine
the attitude of a speaker or writer towards a particular topic by classifying
the polarity of the text. Sentiment classification can be viewed as a special case
of topical classification applied to the subjective portions (sources of sentiment) of a
document. Hence, the key task in sentiment classification is extracting subjectivity.
In this thesis, we classify the overall sentiment of a document using supervised
learning approaches. We focus on extracting subjective features, on current
approaches to extracting them, and on their limitations.
Existing approaches for extracting subjective features rely heavily on linguistic
resources like sentiment lexicons and on complex subjective patterns based on
Part-Of-Speech (POS) information, making the task resource dependent.
Since regional-language content is growing gradually on the web and people are
interested in expressing their thoughts in their local languages, extending these
resource-based approaches to various languages is a tedious job. It requires a lot of
human effort to build sentiment lexicons and to frame rules for detecting subjective patterns.
To make the task of subjective feature extraction more feasible, approaches that
reduce the use of linguistic resources are needed. In this thesis, we attempt to
address the problem of resource dependency in subjective feature extraction. We
assume that a document is not entirely subjective and that it contains
misleading text in the form of objective information. We propose a method called
RSUMM that filters objective content from a document. We explore the use of
classic information retrieval models for estimating the subjectivity of each sentence in
RSUMM.
We follow a two-step "filtering" methodology to extract subjectivity. We estimate
subjectivity at the sentence level and retain the most subjective sentences from
each document. In this way, we obtain an excerpt of a document that preserves
subjectivity to a level comparable to or better than the full document for efficient
sentiment classification. Then, we apply well-known feature selection techniques on
the subjective extract to obtain the final subjective feature set. We evaluate our
methodology on two supervised customer review datasets, using standard
classification metrics such as accuracy and mean absolute error.
Our results on these datasets demonstrate the effectiveness of the proposed "filtering"
methodology. Based on the results, we conclude that subjective feature extraction
is possible with minimal use of linguistic resources.
Although ratings convey the sentiment at a glance, its real essence is contained
in the text itself. The second part of this thesis explains our approach to summarizing
the sentiments of multiple users towards a topic. We produce an extract-based
summary from multiple documents related to a topic, preserving the sentiment in
it. We focus on relating sentiment classification and sentiment summarization,
and show how classification helps in summarizing sentiments. We evaluate our
approach on a standard web blog dataset using standard evaluation metrics.
Publications
• Kiran Sarvabhotla, Prasad Pingali and Vasudeva Varma, "A Lexical Similarity Based Approach for Extracting Subjectivity in Documents", published in the Journal of Information Retrieval, Special Issue on Web Mining for Search, Vol. 14(3), 2011.
• Kiran Sarvabhotla, Prasad Pingali and Vasudeva Varma, "Supervised Learning Approaches for Rating Customer Reviews", published in the Journal of Intelligent Systems, Vol. 19(1), 2010.
• Kiran Sarvabhotla, Kranthi Reddy B. and Vasudeva Varma, "Classification Based Approach for Summarizing Opinions in Blog Posts", in the Proceedings of the Indian International Conference on Artificial Intelligence (IICAI-09), Special Track on Web 2.0 and Natural Language Processing, Tumkur, December 2009.
• Vasudeva Varma, Prasad Pingali, Rahul Katragadda, Sai Krishna, Surya Ganesh, Kiran Sarvabhotla, Harish Garapati, Hareen Gopisetty, Vijay Bharath Reddy, Kranthi Reddy, Praveen Bysani and Rohit Bharadwaj, "IIIT Hyderabad at TAC 2008", in the Working Notes of the Text Analysis Conference (TAC) at the joint meeting of the annual TAC and TREC conferences, USA, November 2008.
Contents

Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Introduction to Sentiment Analysis
    1.1.1 Sentiment Analysis
    1.1.2 Rating Sentiments
    1.1.3 A Generic Approach to Document Sentiment Analysis
  1.2 Extracting Subjectivity
    1.2.1 Challenges in Extracting Subjectivity
    1.2.2 Existing Approaches
  1.3 Problem Description
    1.3.1 Problem Scope
    1.3.2 Motivation
    1.3.3 Problem Statement
  1.4 Overview of the Proposed Methodology
    1.4.1 RSUMM
    1.4.2 Evaluation and Comparisons
  1.5 Thesis Organization

2 Related Work
  2.1 Sentiment Classification at Different Levels
    2.1.1 Word or Phrase Sentiment Classification
    2.1.2 Document Sentiment Classification
    2.1.3 Sentiment Classification at Sentence Level
  2.2 Subjectivity Classification
    2.2.1 Min-cut based Subjectivity Classification
  2.3 State-of-the-art Approaches and Benchmarks

3 Subjective Feature Extraction
  3.1 Information Retrieval Models and SVM
    3.1.1 Vector Space Model
    3.1.2 Unigram Language Model
    3.1.3 Support Vector Machines
  3.2 RSUMM
    3.2.1 Lexical Similarity
    3.2.2 Probabilistic Estimate
    3.2.3 Term Co-occurrence
  3.3 Feature Selection
    3.3.1 Mutual Information
    3.3.2 Fisher Discriminant Ratio
    3.3.3 Final Subset Selection
  3.4 Conclusion

4 Evaluation
  4.1 Experimental Setting
    4.1.1 Datasets and Classifiers
    4.1.2 Evaluation Metrics
    4.1.3 Estimating the Parameter 'X'
  4.2 Binary Classification
    4.2.1 Results
    4.2.2 Discussion
  4.3 Multi-variant Classification
    4.3.1 Results
    4.3.2 Discussion
  4.4 Conclusion

5 Sentiment Summarization
  5.1 Introduction
  5.2 Classification Based Approach
    5.2.1 Training the Classifier
    5.2.2 Polarity Estimation
    5.2.3 Final Ranking
  5.3 Experiments
    5.3.1 Dataset
    5.3.2 Evaluation Metrics
    5.3.3 Results
  5.4 Conclusion

6 Conclusion
  6.1 Contributions
  6.2 Applications
    6.2.1 Products comparison
    6.2.2 Sentiment summarization
    6.2.3 Opinion reason mining
    6.2.4 Other Applications
  6.3 Future Directions

Bibliography
List of Tables

4.1 Statistics of the dataset PDS2
4.2 Results showing CV accuracies for baseline BL, top half TH and bottom half BH on PDS1
4.3 Results showing CV accuracies for RSUMM_LS, RSUMM_LS+MI and RSUMM_LS+FDR on PDS1 over BL
4.4 Results showing CV accuracies for RSUMM_PE, RSUMM_PE+MI and RSUMM_PE+FDR on PDS1 over BL
4.5 Results showing CV accuracies for RSUMM_CO, RSUMM_CO+MI and RSUMM_CO+FDR on PDS1 over BL
4.6 State-of-the-art accuracy values on PDS1
4.7 Results obtained by Stefano et al. on PDS2 for their different feature representations with MV as the feature selection method
4.8 CV accuracies on PDS2 for different feature representations using the total review with LR as the classification method
4.9 CV accuracies on PDS2 for different feature representations using RSUMM_CO
4.10 CV accuracies on PDS2 for different feature representations using the ADF metric
4.11 CV accuracies on PDS2 for different feature representations using RSUMM_CO with MI and FDR
4.12 CV accuracies on PDS2 for different feature representations using the naive Bayes classifier and MI as the feature selection method

5.1 Results showing average NR, NP and F-Measure values for 22 topics
List of Figures

1.1 General methodology adopted in document sentiment analysis
1.2 A sample movie review

3.1 Logit Curve

4.1 Plot showing the effect of 'X' on accuracy with RSUMM_LS
4.2 Plot showing the effect of 'X' on accuracy with RSUMM_PE
4.3 Plot showing the effect of 'X' on accuracy with RSUMM_CO

5.1 Sample TAC Queries and Targets
5.2 Architecture of our sentiment summarization system
Chapter 1
Introduction
Today's World Wide Web has become a major source of information for people. The
textual information on the web can be broadly categorized into two kinds: facts and opinions.
Facts are objective expressions about entities, events, etc., whereas opinions
are subjective expressions that describe people's feelings, sentiments or apprehensions about
entities, events and others [60, 71]. Until the early part of this decade, most research
in natural language processing, text mining and information retrieval focused on facts,
particularly news stories. One such research application is the classification
of news content into politics, movies, sports, etc. This can be attributed to the abundance of
news content and the scarcity of opinion content on the web at that time.
With the advent of customer reviews, blogs and the growth of e-commerce in this decade,
user-generated content has grown rapidly on the web. Reviews posted on
e-commerce websites and the views people express in blogs, discussion forums, etc. can
collectively be called user-generated content. It has an inherent property called sentiment.
Analysing these sentiments has received much attention from the research community and
market analysts in recent years because of potential business applications such as improving
customer satisfaction with a product. Hence, sentiment analysis has become one of the hot
topics of research in this decade.
1.1 Introduction to Sentiment Analysis
1.1.1 Sentiment Analysis
Sentiment analysis is an area of computational study of opinions, sentiments or emo-
tions [71]. The word sentiment is defined as:
1. A personal belief or judgement that is not founded on proof or certainty.
2. A general thought, feeling or sense.
3. A cognitive perception or emotional feeling towards a particular brand or product.
(market perspective)
Sentiment analysis or opinion mining is thus a paradigm of natural language processing,
text mining and information retrieval that aims to determine the attitude of a speaker or
writer towards a particular topic. The basic task in sentiment analysis is to identify the polarity
of the given text. The analysis is done at different levels of text: word, phrase, aspect, sentence
and document, and a semantic orientation is predicted.
As mentioned earlier, textual information contains both facts (objective information)
and opinions (subjective information). Objective information does not convey
people's sentiments. Hence, for any sentiment analysis task, the key is to extract the
subjective information robustly and analyse it.
1.1.2 Rating Sentiments
Sentiment analysis is a Web 2.0 problem that attracted attention with the growth of social
media. Social media (blogs, customer reviews, discussion forums, etc.) plays a prominent
role in people's decision-making. It has become customary for people to
know "what others are saying" about a particular product or service they wish to buy or
avail. According to a survey performed on TripAdvisor (http://www.tripadvisor.com),
97% of the website's users are influenced by other travellers' opinions [38]. Popular websites like
IMDb (http://www.imdb.com) and Amazon (http://www.amazon.com) encourage users to post
reviews so that they can be useful to others.
Since popular products or services are often commented on by many people, and some
reviews are lengthy, it is very difficult for customers or manufacturers to go through
the entire content to arrive at a decision. To facilitate them and to provide better
information access, several websites encourage people to quantify a particular product
or service in terms of their satisfaction. Hence, labeling a review with a rating has
become a crucial feature on the web. The labeling is generally done on the basis of
overall satisfaction. Ratings convey a summary of the text at a glance and are of
immense help.
Rating functionality is provided on some popular websites, but most blogs, discussion
forums and reviews do not have explicit ratings. Yet they are valuable and useful
sources of information. There is every chance that a customer might skip them because
of their large content (the problem of information overload). Hence, systems that analyse
the sentiments expressed in a given text and predict its polarity are gaining popularity.
Polarity orientation is also referred to as semantic orientation, sentiment orientation or
opinion orientation.
In our work, we attempt to rate sentiments on two scales popular on the web: positive/negative
(thumbs up/thumbs down, a binary scale of polarity) and a five-point scale (a multi-variant
starred rating from 1 to 5, with 5 being the best and 1 the worst) for the overall document
(in our case, a document refers to a review or a blog post). There are also a very few
systems that rate sentiments at a very fine level (word, phrase or aspect).
Figure 1.1 General methodology adopted in document sentiment analysis
1.1.3 A Generic Approach to Document Sentiment Analysis
The general methodology adopted for document sentiment analysis is depicted in Fig-
ure 1.1.
The first phase is preprocessing, which includes typical text processing methods like
tokenization, stopword removal and stemming. The crucial part of a sentiment
analysis system is identifying the subjective expressions in the text. This phase is generally
called "subjective feature extraction". The efficiency of the system is highly dependent
on the robustness of this phase. Since the key task in sentiment analysis is extracting
subjectivity, we focus on this area in our work.
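The preprocessing phase can be sketched in a few lines of Python. This is only a minimal illustration, not the actual pipeline used in this work: the stopword list is a tiny hand-picked sample, and stem is a naive suffix stripper standing in for a real stemmer such as the Porter stemmer.

```python
import re

# A tiny illustrative stopword list; a real system would use a much
# fuller list (e.g. NLTK's English stopwords).
STOPWORDS = {"the", "a", "an", "is", "was", "were", "it", "in", "of", "and", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def stem(token):
    """Very naive suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, and stem: the first phase in Figure 1.1."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]

print(preprocess("The special effects were boring and disappointing"))
# → ['special', 'effect', 'bor', 'disappoint']
```

The over-eager stemming of "boring" to "bor" shows why production systems prefer a proper stemmer; the structure of the phase, however, is exactly this.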
The third phase is analysing the extracted subjective expressions and predicting their
polarity. Most sentiment analysis systems use either linguistic resources, especially
sentiment lexicons like SentiWordNet (http://sentiwordnet.isti.cnr.it/) and the
General Inquirer (http://www.wjh.harvard.edu/~inquirer/), or unsupervised/supervised
approaches to predict the semantic orientation.
Some of the popular sentiment analysis systems are:
• OPINE (http://www.cs.washington.edu/research/knowitall/opine/)
• Opinion Observer
• Red Opal
• Social Media Analytics (SAS)
• SinoBuzz
1.2 Extracting Subjectivity
As mentioned earlier, the critical step in any sentiment analysis task is extracting subjectivity.
For example, consider the movie review in Figure 1.2 for the movie "Iron Man 2". The
overall sentiment of the author towards the movie is negative. Let's examine the features
that prompted the author to arrive at this decision.
Explicit subjective features:

Negative semantic orientation:
- disappointed, "What the hell", "was n't even epic", Stupidity, boring, "no sense", worst, blah.

Positive semantic orientation:
- liked, "great special effects", sophisticated, better, good.
Implicit subjective features:
- "Basically all the action you will see in this movie is what you saw in the trailer" (nothing new)
- "beat 20 guards? you bet, hack system made by a guy who hacked into important military system in 10 sec"
Figure 1.2 A sample movie review
The writer uses more negative features than positive ones about the movie (both implicit
and explicit). Hence, the overall sentiment of the author towards it is negative. The
features need not be mere unigrams; higher-order n-grams (larger text units) like
"great special effects" are also subjective.
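Enumerating such n-gram features is straightforward; a small, hypothetical Python helper makes the idea concrete:

```python
def ngrams(tokens, n):
    """Return the order-n n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "great special effects but a boring plot".split()
print(ngrams(tokens, 1)[:2])  # → [('great',), ('special',)]
print(ngrams(tokens, 3)[0])   # → ('great', 'special', 'effects')
```

In a feature-based classifier, such tuples would simply be added to the feature vocabulary alongside the unigrams.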
1.2.1 Challenges in Extracting Subjectivity
There are some interesting observations to be made from the sentiment-bearing document in Figure 1.2.
1. A document is a mixture of subjective and objective information.
2. There are subtle differences in how sentiment towards a topic is expressed.
3. A document is either structured or unstructured.
4. It can mix information with different orientations, but the overall sentiment
is biased towards one label.
If we observe the review above, there are sentences like "There is an agent played by
Scarlett Johanson" that do not describe the feelings of the author. They are facts related to
the cast and plot of the movie. There can be sentences or paragraphs in a document that
convey only factual information. This information can be regarded as potential noise or
misleading text, and hence needs to be filtered.
There are more ways to express sentiment than to express a topic. News stories
related to movies, politics, sports, etc. can be categorized using keywords alone. In the
above review, however, there are sentences with no sentiment-bearing word that still implicitly
convey the sentiment of the author. Capturing such subtle differences in expressing sentiment
is a challenging task.
There is no restriction on the user to follow a certain pattern while expressing
his or her thoughts. Hence, a document can be treated as unstructured data, and mining such
data is difficult. However, some documents written by professionals are
structured. For example, a movie reviewer for a popular site will follow a pattern like
explaining the plot first, then discussing its aspects, and conveying his or her overall
sentiment at the end.
Also, both pros and cons of a product can be mentioned in the same document, so it
contains contradictory information. Still, the overall sentiment of the document is biased
towards one label, and analysing such contradictory patterns is challenging.
1.2.2 Existing Approaches
Most of the existing approaches for extracting subjective features from documents rely
heavily on linguistic resources. A subjective feature can be a word or a phrase as shown
in Figure 1.2. Popular linguistic resources researchers use for subjective feature extraction
are:
• Sentiment lexicons
• POS tagger
Sentiment lexicons are dictionaries prepared by researchers for analysing sentiments.
They contain subjective words and phrases with their corresponding orientations
[28, 31, 27]. SentiWordNet and the General Inquirer are examples
of such lexicons. Using a Part-Of-Speech (POS) tagger, researchers frame rules based on
textual patterns. These patterns are considered subjective, and the text units that follow
them are extracted. The patterns range from a simple noun phrase (NP) or verb phrase
(VP) to very complex patterns. Brill's tagger
(http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/parsing/taggers/brill/0.html)
and the NLTK toolkit (http://nltk.sourceforge.net/) are examples
of resources used to extract POS information [5, 64, 72, 92].
Some of the subjective patterns are:
- NP, VP, JJ NN, RB JJ not NN, JJ JJ not NN, RB VB, NN JJ not NN, etc., where NN stands for
noun, RB for adverb, JJ for adjective and VB for verb.
Text units that match the above patterns are considered subjective features. There
are also other techniques, like clausal extraction and dependency parsing, for extracting
subjective features [64]. Using clausal extraction tools, researchers extract clauses from the
text; they then use a POS tagger and framed patterns to discard, from the extracted clauses,
text units that do not contribute to subjectivity. These techniques are discussed in detail in
Chapter 2.
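As an illustration (not the exact rules used in the cited work), matching such tag patterns can be sketched in Python. The POS tags here are supplied by hand, since running a real tagger would require exactly the kind of resource under discussion:

```python
# Assume the text has already been POS-tagged (e.g. by Brill's tagger or
# NLTK); the tags below are hand-assigned for illustration only.
tagged = [("the", "DT"), ("movie", "NN"), ("was", "VBD"),
          ("really", "RB"), ("boring", "JJ"),
          ("great", "JJ"), ("effects", "NN")]

# Two of the simple subjective bigram patterns mentioned in the text.
PATTERNS = {("RB", "JJ"), ("JJ", "NN")}

def extract_subjective(tagged_tokens):
    """Return word bigrams whose tag sequence matches a subjective pattern."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1, t2) in PATTERNS:
            hits.append((w1, w2))
    return hits

print(extract_subjective(tagged))
# → [('really', 'boring'), ('great', 'effects')]
```

Even this toy matcher shows the dependency problem: it is only as good as the tagger that produces its input, which is precisely the resource dependency this thesis tries to avoid.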
1.3 Problem Description
From the above sections, it is clear that analysing sentiments in documents is a challenging
task. There is a lot of misleading text in the form of objective information. There are
subtle variations in how writers express sentiment. A sentiment-bearing document can
contain contradictory information, and it is difficult to analyse such information. We have
also looked at existing approaches for extracting subjective features from sentiment-bearing
documents. In this thesis, we propose methods to extract subjective features from a document
using simple frequency-based approaches built on information retrieval models.
1.3.1 Problem Scope
Current research work in subjective feature extraction relies on linguistic resources like
lexicons and POS taggers. Lexicons are very generic and cannot capture subtle variations
in expressing sentiment from context to context and from domain to domain. They contain
subjective features with binary orientations, so using them for multi-variant analysis is not
possible. Using POS taggers and other tools, researchers frame complex rules to extract
subjective features. Hence, the task of subjective feature extraction has become more
resource dependent. Regional-language content is growing gradually on the web, and people
are increasingly interested in expressing their thoughts and feelings in their local languages.
Hence, extending current approaches in subjective feature extraction across several languages
is a tedious job. It requires a lot of human effort to build such tools for each language. So,
to make the task of subjective feature extraction more feasible, we need approaches that
require minimal use of linguistic resources and yet achieve significant results.
1.3.2 Motivation
We identified two major problems with the existing approaches in subjective feature ex-
traction.
1. They rely heavily on linguistic resources, making the task resource dependent.
2. They use complex patterns to extract subjectivity.
This motivated us to investigate approaches that make the task of subjective feature
extraction simpler and more generic. In our work, we rely on corpus statistics rather
than on complex textual patterns or linguistic resources to extract subjective features. Since
sentiment analysis addresses the problem of predicting the polarity of a given text unit, it is
often referred to as sentiment classification, and we use this term in the rest of the thesis
as a synonym for sentiment analysis.
1.3.3 Problem Statement
Sentiment classification addresses the problem of predicting the polarity of a text. It can be
viewed as a special case of topical classification applied to subjective portions. Hence, the
key task in sentiment classification is "subjective feature extraction". Existing approaches
to extracting subjectivity rely heavily on linguistic resources and complex rule-based
patterns of subjectivity, making the task very resource dependent and complex.
With regional-language content growing on the web, the scarcity of such resources should not
prevent people from conducting research on those languages. Hence, approaches that require
minimal use of language resources yet perform at a level comparable to using them
are needed. In this way, we can solve the problem of resource dependency prevalent in
sentiment analysis.
1.4 Overview of the Proposed Methodology
We use supervised learning approaches to classify the overall polarity of a document.
We focus on extracting subjective features from it and representing them as a feature
vector for classification. We approach the problem of removing resource dependency in
subjective feature extraction by making two claims:
1. Not all of a review contains subjective information.
2. If we can successfully filter out the objective information, then subjective feature
extraction is achievable with minimal use of linguistic resources and no complex patterns.
We follow a "filtering strategy" at the sentence and word levels to extract subjective features
from the document. We view each review as a mixture of objective and subjective sentences,
where the former convey nothing about the feelings of the author. If you observe the
sample review in Figure 1.2, the subjective features are few in number compared to the entire
content. In our manual analysis of reviews on the web, we found that most reviews follow
the same pattern. Hence, we need to discard the potential noise in the form of objective
sentences before converting the document's text units into a feature vector for the classifier.
1.4.1 RSUMM
We propose a method called RSUMM to extract subjective sentences from a review. It
is based on information retrieval (IR) models such as the vector space model, the language
model and the term co-occurrence model. We use techniques similar to these IR models to
filter out the objective information in a review. We call the excerpt of a review with the
objective information filtered out its subjective extract.
Our subjective feature extraction occurs in two steps:
• RSUMM estimates the subjectivity of each sentence and returns the most subjective
sentences as the subjective extract.
• Then we apply feature selection techniques on the subjective extract to obtain the final
feature set.
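The two steps can be sketched as follows. This is a minimal illustration rather than the thesis implementation: `subjectivity_score` stands in for any of the RSUMM estimators described below, and `feature_score` for a selection metric such as MI or FDR.

```python
def rsumm_extract(sentences, subjectivity_score, top_percent=40):
    """Step 1: rank sentences by a subjectivity estimate and keep the
    top X% as the subjective extract."""
    ranked = sorted(sentences, key=subjectivity_score, reverse=True)
    k = max(1, round(len(ranked) * top_percent / 100))
    return ranked[:k]

def select_features(extract, feature_score, n_features):
    """Step 2: rank candidate unigram features from the extract by a
    selection score (e.g. MI or FDR) and keep the best ones."""
    vocab = {w for s in extract for w in s.split()}
    return sorted(vocab, key=feature_score, reverse=True)[:n_features]
```

With a toy lexicon-count scorer, `rsumm_extract` drops a purely factual sentence and retains the opinionated ones.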
In this thesis, we propose three variants of RSUMM to obtain the subjective extract. The
first method, which we call RSUMMLS, is based on the lexical similarity between each
sentence and two term vectors. We define two metrics in RSUMMLS to score the lexical
similarity: average document frequency (ADF) and average subjective measure (ASM).
Intuitively, the ADF metric extracts important terms from a given collection, while the ASM
metric selects the most subjective terms. We use both metrics to estimate the subjectivity
of a sentence and retain the more subjective sentences in the subjective extract.
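The precise definitions of ADF and ASM appear in Chapter 3. As a hedged sketch, assuming ADF averages per-term document frequencies and ASM averages per-term subjective-frequency ratios (our assumed forms, for illustration only):

```python
def adf(sentence, doc_freq, n_docs):
    """Assumed form of average document frequency: the mean of
    df(t) / N over the sentence's terms, favouring collection-important terms."""
    terms = sentence.split()
    return sum(doc_freq.get(t, 0) / n_docs for t in terms) / len(terms)

def asm(sentence, subj_freq, obj_freq):
    """Assumed form of average subjective measure: the mean, over the
    sentence's terms, of the fraction of each term's occurrences that
    fall in subjective (rather than objective) text."""
    terms = sentence.split()
    score = 0.0
    for t in terms:
        s, o = subj_freq.get(t, 0), obj_freq.get(t, 0)
        score += s / (s + o) if (s + o) else 0.0
    return score / len(terms)
```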
The second method is based on probabilistic estimates rather than raw term similarity.
In this method, we estimate the subjectivity of a sentence based on how likely its terms are
under a subjective model. We call this method RSUMMPE. The basis for this method is
the unigram language modeling used in information retrieval. In the third method, we use
meta information such as the title, pros, cons and aspects of a product available with the
review. We frame target words from the available meta information and use a term
co-occurrence model to estimate subjectivity. We call this method RSUMMCO. It assumes
that authors build subjective expressions around entities like the aspects, pros and cons of a product.
We retain the top X% of sentences as the subjective extract in each method. We estimate
the best value of X for each method such that the subjective extract preserves sentiment
at a level comparable to or better than the full review.
After obtaining the subjective extract, we need to convert it into a feature vector. We use
n-gram models to represent it as a feature vector for the classifier. As n-gram modeling is
done on sentences, which are relatively large text units, there can be a large number of
irrelevant features. Hence, for faster learning and better classification, a feature selection
phase is needed in our case. We employ two state-of-the-art feature selection methods:
mutual information (MI) and Fisher discriminant ratio (FDR). We use support vector
machines (SVM) as the classifier in our work. We view the problem of predicting the
sentiment on a binary scale as a problem of support vector classification (SVC) and that of
rating it on a multi-variate scale as a problem of logistic regression (LR).
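As a sketch of what such a selection metric computes, mutual information between a term's presence and the class label can be derived from simple counts. This is a generic count-based MI, not necessarily the exact estimator used later in the thesis:

```python
import math

def mutual_information(term, docs, labels):
    """MI between the binary presence of `term` and the class label.
    docs: list of token sets; labels: parallel list of class labels."""
    n = len(docs)
    mi = 0.0
    for present in (True, False):
        n_t = sum(1 for d in docs if (term in d) == present)
        for c in set(labels):
            n_c = sum(1 for l in labels if l == c)
            n_tc = sum(1 for d, l in zip(docs, labels)
                       if (term in d) == present and l == c)
            if n_tc:  # 0 * log(0) is taken as 0
                mi += (n_tc / n) * math.log(n * n_tc / (n_t * n_c))
    return mi
```

A term perfectly correlated with the class gets high MI; a term present in every document gets zero.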
1.4.2 Evaluation and Comparisons
We conduct experiments on customer reviews, one of the major entities of social media.
The customer review datasets are from the movie and hotel review domains, both popular
among the sentiment classification research community. We evaluate our system conforming
to standard classification evaluation metrics such as accuracy and mean absolute error, and
support the claims made above with the results on these datasets. Throughout our
subjective extraction phase, we depend on corpus statistics. We do not use any complex
patterns or rule-based approaches in this phase; rather, we use simple frequency-based
metrics. Hence, we make the task of subjective feature extraction simple, resource
independent and easily extensible.
The literature includes approaches that minimize or avoid the use of linguistic resources.
Pang et al. in [72] used unigrams, bigrams, POS information and sentence position as
features. They viewed sentiment classification as a special case of topical classification and
used standard machine learning techniques like naive Bayes and support vector machines.
Pang et al. in [69] extended their work in [72] by focusing on sentence-level subjectivity
detection. They filtered out the objective information using a min-cut based classification
method to obtain a subjective extract, and showed that sentence-level subjectivity detection
indeed helps document-level sentiment classification, using unigrams as features on the
subjective extract. Cui et al. in [16] did not use any resource but focused on an n-gram
model to represent each review and compared the performance of different classifiers.
Our RSUMM is inspired by the work of Pang et al. in [69], but the methodology they
adopted to filter out objective information is quite different from ours. Also, in addition
to filtering out the objective information, we apply feature selection techniques to obtain the
final subjective feature set. We also rate the sentiments of customers on a multi-variate scale,
which is a fairly new application in sentiment analysis; most of the work in the literature
focused on predicting the binary orientation of sentiment.
1.5 Thesis Organization
The rest of the thesis is organized as follows:
In chapter 2, we discuss the related work in sentiment classification at different levels
and different techniques used to classify sentiment. We also describe how the research in
this area has evolved in this decade.
Chapter 3 describes our methodology of extracting subjective features using information
retrieval approaches. We first present an overview of the information retrieval models and
supervised learning methods we use. Then, we describe RSUMM, which produces a subjective
extract, and discuss the feature selection methods we employ on it.
In chapter 4, we describe the experiments conducted on movie and hotel review datasets
to validate our methodology. We report the results using our methodology and compare
them with the existing state-of-the-art approaches. We discuss the results and present our
observations in detail in this chapter.
We describe our approach to multi-document sentiment summarization in chapter 5.
What makes sentiment summarization different from automatic text summarization? We
focus on this aspect and explain how sentiment classification can be used to summarize
sentiments. Finally, we conclude the thesis by outlining our contributions and providing some
insights on how this work can be extended in future.
Chapter 2
Related Work
In this chapter, we discuss the literature related to sentiment classification, subjectivity
extraction, and unsupervised and supervised approaches for classifying sentiment. We discuss
sentiment classification at different levels of text units on both binary and multi-variate
scales. We also discuss the existing literature on subjectivity detection at the sentence
level.
Classification is an age-old problem, and several classifiers have been suggested in the last
few decades. Among them, naive Bayes, support vector machines, decision trees and rule-based
classifiers are important and widely used in several applications. A good review
of classification methods can be found in [39, 58]. Until the early part of this decade,
most classification tasks focused on classifying news stories. With the advent of
customer reviews and the growth of e-commerce in the early part of this decade, sentiment
classification has become an emerging and hot area of research for its potential business
applications and market intelligence. As discussed in Chapter 1, analysing sentiments and
predicting their orientation poses very challenging research issues.
CHAPTER 2. RELATED WORK
2.1 Sentiment Classification at Different Levels
Sentiment classification dates back to the late 1990s [2, 51, 83], but in the early part of this
decade it became an important discipline in the areas of natural language processing, text
mining and information retrieval [14, 21, 22, 25, 33, 35, 37, 42, 45, 54, 62, 76, 79, 84,
91, 96, 98, 107]. Until the early 2000s, the two main approaches to sentiment classification
were based on machine learning and semantic analysis techniques. Later, shallow natural
language processing techniques were used in sentiment classification, especially in overall
document sentiment classification.
2.1.1 Word or Phrase Sentiment Classification
In the early stages of research, word sentiment classification was considered to be the basis
for phrase and document sentiment classification. Lexicons of words and their semantic
orientations were constructed manually or semi-manually [40, 41, 59, 73]. The words in
them were mostly adjectives or adverbs with a semantic orientation [1, 28, 34, 88, 94], and
the orientation was defined by researchers. The approaches to classifying sentiment at the
word level can be grouped into two: 1) corpus-based approaches and 2) dictionary-based
approaches.
The first group included methods that depend on syntactic and co-occurrence patterns
of words in large texts to determine their sentiment [40, 93, 109]. The second group used
WordNet1 information, especially synsets and hierarchies, to acquire sentiment-bearing
words or to measure the similarity between candidate words and sentiment-bearing words
like "good" or "bad" [44, 52, 49].
Analysis by Conjunctions
In this method, the semantic orientation of adjectives was predicted using conjunctive words
like and, or, but, either-or, neither-nor. The intuition was that the act of conjoining
adjectives is subject to linguistic constraints ('and' always conjoins two adjectives with the
same orientation, whereas 'but' contrasts them) [40]. The steps followed to predict the semantic
orientation of adjectives using conjunctive analysis are:
1. Extract adjective pairs along with their conjunctive words.
2. Train a log-linear regression classifier, then classify pairs of adjectives as having the same
or opposite orientation.
3. Apply a clustering algorithm to partition the set into positive- and negative-orientation
terms.
1http://wordnet.princeton.edu/
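A toy stand-in for these steps propagates orientation labels over conjoined adjective pairs from a seed word, with 'and' preserving orientation and 'but' flipping it. This replaces the classifier and clustering of [40] with simple label propagation, for illustration only:

```python
def propagate_orientation(pairs, seeds):
    """Propagate +1/-1 orientation across conjoined adjective pairs.
    pairs: list of (adj1, conj, adj2) with conj in {'and', 'but'};
    seeds: dict of known orientations, e.g. {'good': +1}."""
    orient = dict(seeds)
    changed = True
    while changed:
        changed = False
        for a, conj, b in pairs:
            sign = 1 if conj == "and" else -1  # 'but' flips the orientation
            if a in orient and b not in orient:
                orient[b] = sign * orient[a]
                changed = True
            elif b in orient and a not in orient:
                orient[a] = sign * orient[b]
                changed = True
    return orient
```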
Analysis by Lexical Relations
This method used semantic association to determine orientation. It followed the intuition
that two words or phrases tend to have the same semantic orientation if they are strongly
associated [92, 94, 49]. To determine the degree of semantic association, researchers have
used WordNet or web search. The entire process occurred in the following manner:
1. Construct relations using WordNet, especially synsets.
2. Define the distance between two text units.
3. Calculate the semantic orientation of a word or phrase from its relative distance to two
seed words with known orientation, like 'good' and 'bad' or 'excellent' and 'poor'.
4. The semantic orientation is positive if the relative distance is greater than zero, and
negative otherwise.
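Steps 2-3 can be sketched with a PMI-style association score computed from co-occurrence counts, in the spirit of the web-search variant in [94]. The counts, seed words and add-one smoothing below are illustrative assumptions:

```python
import math

def so_assoc(word, cooc, freq, pos_seed="excellent", neg_seed="poor"):
    """Semantic orientation as relative association with two seed words:
    SO(w) = log[ hits(w, pos) * freq(neg) / (hits(w, neg) * freq(pos)) ].
    cooc[(w, seed)]: co-occurrence counts; freq[w]: word counts.
    Add-one smoothing avoids zero counts. Positive SO => positive orientation."""
    h_pos = cooc.get((word, pos_seed), 0) + 1
    h_neg = cooc.get((word, neg_seed), 0) + 1
    return math.log((h_pos * freq.get(neg_seed, 1)) /
                    (h_neg * freq.get(pos_seed, 1)))
```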
Analysis by Glosses
This method followed the assumption that if a term is semantically oriented in one
direction, then the terms in its gloss tend to have the same semantic orientation [28, 27, 26].
The process occurred in the following steps:
1. A seed set representing the two categories, positive and negative, is provided as input.
2. Expand the seed set to accommodate new terms by using lexical relations.
3. For each term t in the expanded set, collate all the glosses of t and convert this textual
representation into a vector representation for classification.
4. Train a binary classifier on the term representations in the expanded seed set and then
apply it to the terms in the test set.
Analysis by General Inquirer
General Inquirer (GI) is a system that contains a list of terms with their different senses. For
each sense of a term, it provides a short definition as well as other information, and terms
are tagged as positive or negative. In addition, the GI dictionary contains negations,
intensifiers and diminishers like 'not', 'fantastic' and 'barely'. The occurrence probability
of each sense of a term is also provided. Hence, it is widely used by researchers in
subjective feature extraction [5, 50].
2.1.2 Document Sentiment Classification
Supervised machine learning approaches are popular among researchers for predicting the
overall sentiment of a document [3, 7, 50, 64, 67, 72, 69, 100, 81]. Most of them focused
on labeling a new sample as "positive" or "negative" based on previously seen samples
annotated by humans; grading a review on a multi-variate scale is a fairly new application in
this area. The entire process is typically composed of two steps: 1) extracting the subjective
features from the training data and converting them into feature vectors, and 2) training the
classifier on the feature vectors and applying it to new samples. The raw documents are also
preprocessed before the subjective features are extracted; the preprocessing
stage includes removing HTML tags and tokenizing the documents.
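A minimal sketch of this preprocessing stage, using only the standard library:

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, discarding HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def preprocess(raw_html):
    """Strip HTML tags, lowercase, and tokenize into word tokens."""
    parser = TextExtractor()
    parser.feed(raw_html)
    text = " ".join(parser.chunks)
    return re.findall(r"[a-z0-9']+", text.lower())
```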
18
CHAPTER 2. RELATED WORK
Subjective Feature Extraction
To extract subjective features, researchers used lexicons like SentiWordNet and General
Inquirer [27, 5]. Most of these resources contain words and phrases (the latter rare in number).
In sentiment classification, larger text units also play an important role in predicting the
semantic orientation, as shown in Fig. 1.2. Hence, researchers framed rules using POS
information to extract text units larger than simple unigrams that were considered to be
subjective [72, 5, 92, 27, 28].
Researchers used lexical filtering techniques based on hypernymy in WordNet [11, 17,
20, 23, 24, 31, 36, 48, 77, 85] and patterns based on a POS tagger [10, 63, 78, 82, 104,
105, 106]. The WordNet filter substituted words by a set of likely synonyms and
hypernymy generalizations, because it is unlikely to encounter repetitions of identical
words in a text. POS filters were used to extract the patterns that do not contribute to
subjectivity, as in [64]; such patterns are considered noise, and POS filters remove them
before the text units are converted into feature vectors for classification.
The appraising adjective method [99, 100] focused on the extraction and analysis of appraisal
adjective groups, optionally modified by enhancers or diminishers. Coherent groups that
together express a particular attitude are extracted; examples of appraisal adjective groups
are "extremely boring" and "not really very good".
The steps followed in this method were:
1. Build a lexicon using semi-automatic techniques, gathering and classifying adjectives
and modifiers into categories in several taxonomies of appraisal attributes.
2. Extract adjectival appraisal groups from texts and compute their attribute values
according to this lexicon.
3. Represent documents as vectors of relative-frequency features using these groups.
4. Train a support vector machine to discriminate positively from negatively oriented
documents.
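Step 2 can be illustrated with a toy pattern matcher over tokens. The lexicon and its attitude values below are invented for illustration and are not from [99, 100]:

```python
# Hypothetical toy lexicons; real systems use taxonomies of appraisal attributes.
MODIFIERS = {"not": -1.0, "really": 1.2, "very": 1.3, "extremely": 1.5, "barely": 0.4}
ADJECTIVES = {"good": 1.0, "boring": -1.0, "great": 1.5}

def appraisal_groups(tokens):
    """Extract (group, value) pairs: an adjective together with any preceding
    run of modifiers, with the adjective's value scaled/flipped by them."""
    groups, i = [], 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tokens[j] in MODIFIERS:
            j += 1  # consume the run of modifiers
        if j < len(tokens) and tokens[j] in ADJECTIVES:
            value = ADJECTIVES[tokens[j]]
            for m in tokens[i:j]:
                value *= MODIFIERS[m]
            groups.append((" ".join(tokens[i:j + 1]), value))
            i = j + 1
        else:
            i = j + 1 if j > i else i + 1
    return groups
```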
Training the Classifier
Researchers viewed document sentiment classification as a special case of topical
classification and conducted experiments with machine learning algorithms like naive Bayes,
k-NN and support vector machines. Pang et al. [72] used machine learning techniques to
classify the overall sentiment in movie reviews; their best accuracy (82.9%) was reported
using unigrams as features and SVM as the classifier. Among all the classifiers, SVM and
naive Bayes were the most widely used to predict sentiment orientation. The features used
were n-grams, lexical information, POS information, sentence position, adjectives and
appraisal adjectives.
2.1.3 Sentiment Classification at Sentence Level
Since researchers thought it too coarse to compute sentiment at the document level,
they investigated approaches to determine the focus of each sentence and computed the
semantic orientation at the sentence level [19, 29, 53]. They extracted opinion-bearing terms,
opinion holders and opinion-product aspect associations in each sentence and then analysed
the semantic orientation. There is also an area of research called aspect-based sentiment
classification, where the aspects of a product are extracted and people's sentiments are rated
on each aspect. Thet et al. in [90] conducted experiments on aspect-based classification of
movie reviews. They used information extraction techniques like pronoun resolution, entity
extraction and co-referencing to segment each sentence, and predicted the sentiment of
users towards the cast (producers, directors) as well as the overall sentiment.
2.2 Subjectivity Classification
Subjectivity detection is the task of investigating whether a text unit presents the opinion of
the author or conveys facts. The text unit is typically a sentence or a paragraph. Researchers
showed that subjectivity detection at the sentence level is tightly related to document
sentiment classification [69, 102, 103, 101, 109]. Subjectivity detection keeps the sentiment
classifier from considering irrelevant or potentially misleading text. Pang and Lee
in [69] compressed reviews into much shorter extracts while keeping their sentiment content
at a level comparable to the full review. Naive Bayes and min-cut classification are
two popular classifiers used in subjectivity detection. We briefly discuss min-cut based
classification, the state-of-the-art approach in subjectivity detection [69, 4].
2.2.1 Min-cut based Subjectivity Classification
Cut-based classification assumes that text units that occur near each other (within
discourse boundaries) share the same subjectivity status [69]. In [69], pair-wise
interaction information was used, and the algorithm relied on an efficient and intuitive
graph-based formulation based on finding minimum cuts.
Suppose there are n items x1, x2, x3, . . . , xn to be divided into two classes C1 and C2.
Then there are two types of penalties, individual and association, for xi and xj being in the
same class.
Individual scores, indj(xi): non-negative estimates of each xi's preference for
class Cj, using xi alone.
Association scores, assoc(xi, xj): non-negative estimates of the preference of xi and xj to be in
the same class.
The algorithm finds a solution to the following optimization problem, assigning each xi to
C1 or C2 so as to minimize the partition cost:

\sum_{x \in C_1} ind_2(x) + \sum_{x \in C_2} ind_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} assoc(x_i, x_k) \qquad (2.1)
The situation is represented as an undirected graph G with vertices {v1, v2, . . . , vn, s, t},
the last two being the source and sink respectively. Add n edges (s, vi) with weight ind1(xi)
and n edges (vi, t) with weight ind2(xi). Finally, add edges (vi, vk) with weight assoc(xi, xk).
A cut (S, T) of G is a partition of its nodes into sets S = {s} ∪ S′ and T = {t} ∪ T′, where
s /∈ S′ and t /∈ T′. Its cost, cost(S, T), is the sum of the weights of all edges crossing from S to T.
A minimum cut of G is one of minimum cost.
In [69], each vertex was a sentence. The individual penalties were obtained from a naive
Bayes classifier for each sentence and class, and the association penalties from proximity
relations based on sentence position.
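For a handful of sentences, the partition cost (2.1) can be minimized by brute force, which makes the objective concrete; min-cut solves the same problem efficiently for large n. The function and argument names below are ours:

```python
from itertools import product

def best_partition(n, ind1, ind2, assoc):
    """Exhaustively minimise the cost in (2.1): placing item i in C1 incurs
    ind2[i], placing it in C2 incurs ind1[i], and separating a pair (i, k)
    incurs assoc[(i, k)]. Returns (cost, bits) with bits[i] in {1, 2}
    giving item i's class. Only feasible for tiny n."""
    best = (float("inf"), None)
    for bits in product([1, 2], repeat=n):
        cost = sum(ind2[i] if bits[i] == 1 else ind1[i] for i in range(n))
        cost += sum(a for (i, k), a in assoc.items() if bits[i] != bits[k])
        best = min(best, (cost, bits))
    return best
```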
2.3 State-of-the-art Approaches and Benchmarks
In the sections above, we surveyed sentiment classification at different levels, subjectivity
detection and the resources used in subjective feature extraction. For word- or phrase-level
sentiment classification, researchers used seed lists of words and lexical resources
like WordNet. Document sentiment classification was done using supervised learning
approaches, and this is still common practice. Most document sentiment classification
approaches predicted sentiment on a binary scale, whereas multi-variate classification
is a fairly new application. Document sentiment classification is also highly domain
specific [15, 68, 57, 4].
Since our focus is on extracting subjective features and presenting them as feature
vectors to the classifier, we do not discuss the work related to the domain transfer problem in
sentiment classification. The popular domains among researchers in document sentiment
classification are 1) the movie review domain and 2) the hotel review domain. The movie
review domain is highly popular among the sentiment classification research community, and
many focused on predicting the polarity of movie reviews [72, 69, 64, 67]. This is due to the
popularity of movies, the abundant information about movies on the web and also the
challenging nature of the reviews [92]: movie reviews are a mixture of objective and
subjective information, and mining subjective features from them is a challenging task. The
hotel review domain is popular because travellers tend to enquire about hotels on the web.
We conduct experiments on reviews in both domains. Among the classifiers, SVM is the
most used by researchers because of its better performance compared to the others.
Pang et al. in [72] used supervised machine learning techniques like naive Bayes and
support vector machines with unigrams, bigrams, POS information and sentence position as
features. They concluded that the machine learning approaches outperformed human-produced
baselines, and that SVM performed better than naive Bayes with unigrams as features.
Mullen and Collier [67] used diverse information scores that assign a value to each
word or phrase using WordNet, topic proximity and syntactic relations. They also used
an SVM classifier on the same movie review dataset and reported an accuracy of 86%.
Matsumoto et al. [64] observed that word order and syntactic relations play an important
role in sentiment classification. They proposed a method based on word sub-sequence
mining and dependency parsing to extract word order and relations. To generate word
sub-sequences, they used clausal extraction tools and an n-gram model with n = 6, and
set a support threshold of 2 for unigrams and bigrams and 10 for the others to be
considered as potential features for classification. They used dependency parsing techniques
to extract larger text units and combined them with a POS tagger to remove the non-subjective
items. They reported an accuracy of 88.3% on the same movie review dataset using an SVM
classifier. All these methods predicted the orientation of polarity at the document level on a
binary scale (positive/negative).
Quantifying sentiment with a satisfaction score is a fairly recent application in sentiment
classification. Pang and Lee [70] conducted experiments on scoring movie reviews
on an ordinal scale of four values; their focus was on comparing learning algorithms:
simple multi-label classification, SVR and a meta-algorithm. In [110], a new task was
proposed: predicting the utility of product reviews rather than scoring them. They used
linear regression techniques to compute the utility.
Stefano et al. [5] viewed the problem of grading reviews on a scale of one to five as
a problem of ordinal regression. They focused on deriving subjective features and
selecting them, using a POS tagger and the GI lexicon to extract subjective patterns. Since
their dataset was highly skewed, they used feature pruning techniques based on minimum
variance (MV) and a variant of MV called round-robin minimum variance (RRMV); this was
one of the important observations they presented in their work. Overall, support
vector regression (SVR) techniques are popular among researchers for grading reviews.
Chapter 3
Subjective Feature Extraction
In this chapter, we describe our approach to mining subjective features from a review. We
view each review r as a combination of subjective and objective sentences (r = Ssubj ∪
Sobj). We propose a method called RSUMM to score each sentence for its subjective
nature and extract the set Ssubj from r. Our subjective feature extraction approach follows
two steps:
1. We score each sentence and obtain an extract of the review that preserves subjectivity at
a level comparable to or even better than the full review. We call this method RSUMM.
2. Then, we apply feature selection techniques on the extract to obtain the final subjective
feature set.
Our RSUMM is based on information retrieval techniques such as the vector space model,
the language model and the word co-occurrence model, widely used in document retrieval and
other applications. In this thesis, we propose three variants of RSUMM for estimating the
subjectivity of each sentence:
1. Lexical similarity (RSUMMLS).
2. Probabilistic estimates (RSUMMPE).
3. Co-occurrence statistics (RSUMMCO).
CHAPTER 3. SUBJECTIVE FEATURE EXTRACTION
After extracting subjective features using the above methodology, we use supervised
learning approaches to predict the overall sentiment of a document. We use n-gram models
to represent the extracted subjective sentences as feature vectors for classification. We use
an SVM classifier, viewing the problem of predicting sentiment on a binary scale as a
problem of support vector classification (SVC), and use logistic regression (LR) to grade
reviews on a multi-variate scale.
3.1 Information Retrieval Models and SVM
In this section, we briefly describe the information retrieval models, SVC and LR.
3.1.1 Vector Space Model
The vector space model assigns weights to index terms [6]. It is widely used in information
retrieval to determine the relevance of a document for a given query: both documents
and the query are represented as weighted term vectors, and these weights are used to
compute the degree of similarity between the query and a document. The higher the
similarity, the more relevant the document is to the query.
Formal definition: both the query q and a document d are represented as weighted vectors
of terms. The query vector is defined as q := (w1,q, w2,q, . . . , wt,q) and the document vector
as d := (w1,d, w2,d, . . . , wt,d), where t is the total number of index terms.
The degree of similarity between the document d and the query q is then the correlation
between the two vectors. The correlation is quantified by a variety of similarity measures,
for instance the cosine of the angle between the two vectors. The weighting measure
typically used in the vector space model is tf-idf:
\mathrm{TFIDF}(t, d, C) = tf(t, d) \times \log\left(\frac{N}{n}\right) \qquad (3.1)

where tf(t, d) denotes the frequency of the term t in the given document d, N denotes the total
number of documents in collection C, and n denotes the number of documents containing
term t in C.
\cos\theta = \frac{\vec{d} \cdot \vec{q}}{\|\vec{d}\| \, \|\vec{q}\|}
= \frac{\sum_{i=1}^{t} w_{i,d} \, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,d}^2} \, \sqrt{\sum_{i=1}^{t} w_{i,q}^2}} \qquad (3.2)
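Equations (3.1) and (3.2) translate directly into code; a minimal sketch using raw term counts over a toy collection:

```python
import math
from collections import Counter

def tfidf_vector(doc_tokens, collection):
    """Weight each term of a document by tf(t, d) * log(N / n), as in Eq. (3.1).
    collection: list of token lists; terms with df = 0 are skipped."""
    N = len(collection)
    tf = Counter(doc_tokens)
    vec = {}
    for t, f in tf.items():
        n = sum(1 for d in collection if t in d)
        if n:
            vec[t] = f * math.log(N / n)
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse weight vectors, as in Eq. (3.2)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```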
3.1.2 Unigram Language Model
A statistical language model is a probabilistic model for generating text. It was introduced
into information retrieval in the late 1990s. It estimates the probability of generating a query
from a document model. The basic assumption of this model is that users have a reasonable
idea of the terms they want in documents, and it directly exploits this idea. Language modeling
considers documents as models and queries as strings of text randomly sampled from these
models [6]. It ranks documents according to the probability that a query Q would be
observed during repeated random sampling from the document model MD: P(Q | MD).
The unigram language model is the simplest form of language model: it discards any
conditioning on context and estimates the probability of each term independently. It is used
in information retrieval more often than other language models because of its simplicity.
P(Q \mid M_d) = \prod_{t \in Q} P(t \mid M_d) = \prod_{t \in Q} \frac{tf(t, d)}{dl_d} \qquad (3.3)
where Md denotes the language model of a document d, tf(t,d) denotes the frequency of
term t in document d and dld denotes the total number of tokens in document d.
The language modeling approach suffers from data sparseness: we may not wish to
assign zero probability to a query term t that is missing from document d. Many smoothing
techniques are available to address this problem. This model is fairly recent and is used in
information retrieval, machine translation, speech recognition, etc. It shares some
characteristics with the vector space model: both use term frequencies to estimate the
importance of a term, and terms are often treated as independent. Language modeling,
however, is based on probability estimates rather than similarity.
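A smoothed unigram model of this kind can be sketched as follows. The Jelinek-Mercer interpolation used here is just one of the smoothing techniques mentioned above, chosen for illustration:

```python
import math
from collections import Counter

def lm_log_prob(query_tokens, doc_tokens, collection_tokens, lam=0.8):
    """log P(Q | M_d), Eq. (3.3), with Jelinek-Mercer smoothing: each term's
    probability is lam * tf(t,d)/|d| + (1 - lam) * tf(t,C)/|C|, so terms
    unseen in d but present in the collection keep non-zero probability."""
    tf_d, tf_c = Counter(doc_tokens), Counter(collection_tokens)
    dl, cl = len(doc_tokens), len(collection_tokens)
    logp = 0.0
    for t in query_tokens:
        p = lam * tf_d[t] / dl + (1 - lam) * tf_c[t] / cl
        logp += math.log(p) if p > 0 else float("-inf")
    return logp
```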
3.1.3 Support Vector Machines
Support vector machines (SVM) are a useful technique for text classification. A
classification task usually involves training the classifier on data instances whose labels are
known and predicting the labels of unknown instances. Like all supervised learning
approaches, SVM involves a training and a testing phase [13, 95]. Each sample in
the training set has one target value (label or class) and several attributes (features). Given
a set of instance-label pairs (feature vectors) (x_i, y_i), i = 1, 2, . . . , l, where x_i ∈ R^n and
y ∈ {−1, +1}^l, SVM requires the solution of the following optimization problem:
\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \qquad (3.4)

subject to y_i(w^T \phi(x_i) + b) \ge 1 - \xi_i and \xi_i \ge 0.
In general, the training vectors are mapped to a high-dimensional space by the function φ;
SVM then finds a separating hyperplane with maximum margin in this space. C > 0 is the
penalty parameter. Several kernel functions are used with SVMs, such as the linear,
polynomial, radial basis function and sigmoid kernels.
In this thesis, we do not go into the details of supervised learning, such as the best kernel
function or the best learning parameters. We focus on deriving subjective features and
representing them as feature vectors for the learning algorithm, using existing kernel
functions and default parameters.
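For the linear-kernel case, the primal problem (3.4) can be approximated by subgradient descent on the hinge-loss objective. The following is a toy sketch of that idea; the thesis itself relies on an existing SVM implementation, not this code:

```python
import random

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01, seed=0):
    """Subgradient descent on (1/2)||w||^2 + C * sum hinge(y_i (w.x_i + b)).
    X: list of feature lists; y: labels in {-1, +1}. Returns (w, b)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr) for wj in w]  # regularisation shrinkage
            if margin < 1:  # hinge loss active: move toward the correct side
                w = [wj + lr * C * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * C * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

On a small linearly separable set, a few hundred epochs suffice to classify all training points correctly.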
Logistic Regression
In this work, we use logistic regression (LR) to rate sentiments on an ordinal scale of one to
five alongside simple binary classification. This problem is called ordinal regression in
machine learning [5]; it lies between simple binary classification and metric regression.
CHAPTER 4. EVALUATION
Table 4.6 State-of-the-art accuracy values on PDS1

Author and Literature     Classifier    Accuracy (%)
Pang and Lee              SVM           87.2
Garg et al.               SVM           90.2
Matsumoto et al.          SVM           93.7
Aue and Gamon             SVM           90.5
Kennedy and Inkpen        SVM           86.2
RSUMMLS. It could be due to the fact that SDS was also from the movie domain, so using
lexical similarity indeed helped. Among the three variants, RSUMMCO did not fare well,
either in isolation or in combination with MI and FDR, for each feature representation of
reviews. Its accuracy in isolation was 3.4% below RSUMMLS and 2% below RSUMMPE. In
this method too, applying MI and FDR increased the accuracy, but not to the extent seen for
RSUMMLS and RSUMMPE. This could be due to the inadequacy of the meta information we
incorporated to score subjectivity; mining such information from the review itself may yield
better results with this method, but as mentioned earlier, mining the meta information
is beyond the scope of this thesis.
RSUMMLS in combination with FDR as the feature selection method and bigrams as
features reported the maximum accuracy on PDS1 in our methodology. Using unigrams
as features, RSUMMPE performed better than the other variants with FDR as the feature
selection method (89.2%). The best accuracy using both unigrams and bigrams as features
was reported by RSUMMPE in combination with FDR. Table 4.6 shows the accuracy
values reported by researchers to date on PDS1.
To the best of our knowledge, the highest accuracy previously reported on PDS1 was 93.7%,
by Matsumoto et al. [64]. We report a maximum accuracy of 94.9%, an increase of
1.2%, using a combination of lexical similarity (RSUMMLS) and FDR as the feature selection
method. Their methodology included extracting clauses, generating word sub-sequences
using the extracted clauses. Then, they used POS tagger to prune the sub-sequences based
on the patterns that do not contribute to subjectivity and with a minimum support threshold
of two for unigrams, bigrams and ten for higher order sub-sequences. In addition to sub-
sequences, they used dependency parsing techniques to extract phrases and used them as
features for classification.
Our methodology was rather simple compared to theirs. First we decomposed a review
into a subjective extract and then applied feature selection methods on n-gram models as
feature vectors for classification. Throughout our approach, we depended on frequency-based
approaches and corpus statistics. We reported an accuracy comparable to that reported by
Matsumoto et al. [64] using our methodology. It proved our other claim that sentiment
classification can be done using simple approaches rather than complex patterns and linguistic
resources.
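The two-step filtering methodology above can be sketched as follows. This is illustrative code, not the thesis implementation: the subjectivity scores are simply given as a dictionary here, whereas RSUMM computes them with IR models, and the helper names are hypothetical.

```python
# Sketch of the two-step filtering methodology: first keep the most
# subjective sentences of a review, then build n-gram features from
# that extract only (illustrative stand-in code).

def subjective_extract(sentences, subjectivity, keep_ratio=0.8):
    """Keep the top `keep_ratio` fraction of sentences by subjectivity score."""
    ranked = sorted(sentences, key=lambda s: subjectivity[s], reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

def ngram_features(sentence, n=2):
    """Unigrams plus n-grams of a whitespace-tokenized sentence."""
    toks = sentence.lower().split()
    feats = list(toks)
    feats += [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return feats

review = ["The plot was brilliant", "The movie runs 120 minutes",
          "I loved every scene"]
scores = {review[0]: 0.9, review[1]: 0.1, review[2]: 0.8}  # toy scores
extract = subjective_extract(review, scores, keep_ratio=0.67)
print(extract)                      # the two most subjective sentences
print(ngram_features(extract[0]))   # unigrams + bigrams for classification
```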
Pang et al. in [69] reported an accuracy of 87.2% on PDS1 using unigrams as features
and SVM as the classifier. Their assumption was that sentence-level subjectivity detection
improves document sentiment classification, and they proved it with the help of their results
(82.6% to 87.2%). But their approach was more inclined towards extracting subjectivity
using contextual information. They assumed that sentences in proximity share the same
subjectivity status. Their subjectivity estimation was based on individual probabilities of
each sentence from a naive-bayes classifier trained on SDS. In addition to that, contextual
information that scores proximity between sentences was also used. Hence, our approach was
clearly different from theirs. Our maximum accuracy was 94.9% using bigrams as features,
an increase of about 7% over the accuracy they reported.
4.3 Multi-variant Classification
The baseline (BL) for our multi-variant classification system uses the unigram representation
of the full review, with no RSUMM or feature selection methods. Stefano et al. in [5]
Table 4.7 Table showing the results obtained by Stefano et al. on PDS2 for their different feature representations with MV as the feature selection method
Features MAEµ MAEM
BOW 0.682 1.141
BOW+Expr 0.456 0.830
BOW+Expr+sGI 0.448 1.165
BOW+Expr+sGI+eGI 0.437 0.942
emphasized the fact that it may not be raw unigrams alone; sometimes larger text units play a
major role in determining the orientation of sentiment. From the above experiments, it was
evident that using larger text units like bigrams (BI) as features for classification made the
system perform better. Hence, we stick to the assumption that larger text units enhance
the performance of classification, as stated in [5]. They used the GI lexicon and a POS tagger
to extract larger text units, but we use RSUMM with MI and FDR as an alternative to using
linguistic resources.
They extracted text units that contribute to subjectivity using a rule-based approach with
the help of a POS tagger. Some of the patterns include: Art JJ NN, NN VB JJ, etc. They called
the text units that follow these patterns expressions (Expr). Aggregating patterns was
done using the GI lexicon. For example, text units like "great location" and "good location" are
aggregated as [positive] location. They called this way of aggregating text units a simple
GI expression (sGI). Then, there was a more complex way of aggregating text units called an
enriched GI expression (eGI). The above text units are aggregated as [Strong] [Positive]
location and [Virtue] [Positive] location respectively.
They used minimum variance (MV) as the feature selection method to select important
features. The results obtained by [5] are reported in Table 4.7; lower values indicate more
accurate prediction, with bold values being the best. The baseline for their system used the
bag-of-words (BOW) feature vector representation of the review with the ε-SVR method.
Table 4.8 Table showing CV accuracies on PDS2 for different feature representations using the total review with LR as the classification method
Features MAEµ MAEM
BL 0.580 0.807
BL+BI 0.540 0.897
BL+BI+TRI 0.528 0.969
Table 4.9 Table showing CV accuracies on PDS2 for different feature representations using RSUMMCO
Features MAEµ MAEM
BL+RSUMMCO 0.598 0.898
BL+RSUMMCO+BI 0.495 0.921
BL+RSUMMCO+BI+TRI 0.473 0.992
They divided PDS2 into 75% and 25% randomly for training and testing. No cross-validation
test was done, hence the reported values were not validated for statistical significance.
4.3.1 Results
We limited ourselves to using up to trigrams (TRI) in multi-variate classification. Table 4.8
shows our results for various feature representations on PDS2 using the full review and
logistic regression as the classification method.
We did n’t apply RSUMMLS and RSUMMPE methods for scoring subjectivity, as SDS
contained subjective and objective sentences from movie review domain. Sentiment anal-
ysis is highly domain dependent and features from one domain would not work in other
domains. It was discussed already in Chapter. 2. Due to the meta information available
along with reviews in PDS2, we used RSUMMCO to score subjectivity of each sentence in
Table 4.10 Table showing CV accuracies on PDS2 for different feature representations using the ADF metric
Features MAEµ MAEM
BL+ADF 0.585 0.758
BL+BI+ADF 0.531 0.776
BL+BI+TRI+ADF 0.532 0.705
Table 4.11 Table showing CV accuracies on PDS2 for different feature representations using RSUMMCO with MI and FDR
Features MI FDR
MAEµ MAEM MAEµ MAEM
BL+RSUMMCO 0.569 0.827 0.560 0.847
BL+RSUMMCO+BI 0.431 0.781 0.435 0.822
BL+RSUMMCO+BI+TRI 0.444 0.842 0.477 0.870
PDS2. We set 'X' to 80% in our case to obtain the subjective extract. Then we applied MI and
FDR on the subjective extract.
We used the ADF metric as a conditional criterion. We associated two or more words if
they have a document frequency greater than the ADF of the collection PDS2 (ADFPDS2). We
applied the ADF metric on unigrams as a feature selection method. For example, consider text
units like "had a great time", "decent location" and "hotel was very nice". We extracted
features like [great time], [decent location] and [hotel very nice], provided each unigram has a
document frequency greater than ADFPDS2. Table 4.10 shows the effect of applying the
ADF metric on unigrams and as a conditional criterion for bigrams and trigrams. We reported
the accuracy values of RSUMMCO in Table 4.9. Results after applying MI and FDR on the
extract of RSUMMCO are reported in Table 4.11.
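Our reading of the ADF criterion can be sketched as follows. This is illustrative code with toy document frequencies, not the thesis implementation; the stopword handling and the example numbers are added assumptions, the real values come from PDS2.

```python
# Sketch of the ADF conditional criterion: join words into a larger
# text unit only when every word's document frequency exceeds the
# average document frequency (ADF) of the collection.
STOP = {"had", "a", "the", "was", "very"}   # toy stopword list (assumption)

def adf(doc_freq):
    """Average document frequency over the vocabulary."""
    return sum(doc_freq.values()) / len(doc_freq)

def frequent_unit(text, doc_freq, threshold):
    """Join content words whose DF exceeds the ADF threshold into one unit."""
    words = [w for w in text.lower().split()
             if w not in STOP and doc_freq.get(w, 0) > threshold]
    return "[" + " ".join(words) + "]" if words else None

# toy document frequencies; the real values come from PDS2
df = {"great": 40, "time": 35, "decent": 30, "location": 45,
      "hotel": 50, "nice": 33, "shuttle": 3, "ionosphere": 1}
t = adf(df)                                 # 29.625 for this toy vocabulary
print(frequent_unit("had a great time", df, t))   # [great time]
print(frequent_unit("decent location", df, t))    # [decent location]
```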
Table 4.12 Table showing CV accuracies on PDS2 for different feature representations using the naive-bayes classifier and MI as the feature selection method
Features MAEµ MAEM
UNI 0.496 0.524
UNI+BI 0.439 0.503
UNI+BI+TRI 0.44 0.444
In addition to LR, we also used a naive-bayes classifier with different feature representations
and MI as the feature selection method. Bigram and trigram features were obtained
using the ADF conditional criterion as described above. Results of this experiment are
reported in Table 4.12.
4.3.2 Discussion
We used unigrams, bigrams, trigrams and combinations of them as features for rating reviews
on a scale of one to five. We obtained a best MAEµ value of 0.431, very much comparable
to what Stefano et al. [5] obtained (0.437) using their subjective feature extraction method
based on linguistic resources. There was a relative improvement of 1.5% in MAEµ using
our approach. The best MAEµ value was reported using RSUMMCO for obtaining the
subjective extract, in combination with MI as the feature selection method.
The baseline MAEµ value using unigrams as features on the total review was 0.580. There
was a relative improvement of 25.7% from BL with unigrams as features, which is significant.
But unigrams in combination with MI, FDR or ADF as the feature selection method
performed slightly below the baseline. It could be due to the aggressive threshold of 10%
that we used to select the final feature set. It also supported our assumption that sometimes
larger text units like bigrams and trigrams enhance the performance of classification. In
each case, bigrams in combination with unigrams, and trigrams in combination with unigrams
and bigrams, performed better than BL from the MAEµ evaluation perspective. Since the
dataset was fairly large compared to PDS1, we went to the extent of trigrams.
The best MAEM value obtained by Stefano et al. [5] was 0.830. We did not obtain
significant results in this regard using RSUMMCO with MI and FDR as feature selection
methods. Also, combining bigrams and trigrams with unigrams declined the performance
of classification. It strongly conveyed that the usage of higher-order n-grams is dependent
on the size of the dataset. As PDS2 was highly skewed towards labels four and five, our
filtering methodology based on co-occurrence didn't classify the samples with labels one,
two and three accurately. But using MAEµ we were able to produce good results, which
conveyed that the dense labels were classified better using our methodology.
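The gap between the two metrics under label skew can be illustrated with the standard definitions we assume here: MAEµ averages the absolute error over all samples, while MAEM averages the per-class errors, so sparse labels (one to three in the skewed PDS2) weigh as much as the dense ones. The data below is a toy example, not drawn from PDS2.

```python
# Micro vs. macro mean-absolute-error under the standard definitions
# (assumed here): micro averages over samples, macro over classes.
from collections import defaultdict

def mae_micro(true, pred):
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def mae_macro(true, pred):
    per_class = defaultdict(list)
    for t, p in zip(true, pred):
        per_class[t].append(abs(t - p))
    # mean of per-class mean errors: each class counts equally
    return sum(sum(v) / len(v) for v in per_class.values()) / len(per_class)

# skewed toy data: the frequent 5s are predicted well, the lone 1 badly
true = [5, 5, 5, 5, 1]
pred = [5, 5, 4, 5, 4]
print(round(mae_micro(true, pred), 2))  # 0.8  - dominated by the dense class
print(round(mae_macro(true, pred), 2))  # 1.62 - exposes the sparse class
```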
We obtained better MAEM values using ADF as the feature selection method for unigrams,
and as a conditional criterion for obtaining bigrams and trigrams. A relative improvement
of about 14.4% from the BL was obtained using a combination of unigrams, bigrams and
trigrams as features. The naive-bayes classifier in combination with MI as the feature
selection method performed better in classifying the labels one, two and three. It obtained
better (lower) MAEM values compared to the LR method, with the best value of 0.444. It
conveyed that naive-bayes, which is popular in topical classification, can still be applied to
multi-variate sentiment classification.
4.4 Conclusion
In this chapter, we explained how we evaluated our system. We clearly explained the
statistics of the datasets, the cross-validation tests and the evaluation metrics. We used
standard evaluation metrics like accuracy and mean-absolute-error. We implemented
RSUMMLS, RSUMMPE and RSUMMCO on PDS1 and only RSUMMCO on PDS2,
because of the domain dependency problem in sentiment analysis. Through our experimental
results, we showed that subjective feature extraction is achievable while minimizing the use
of linguistic resources. Using our methodology, we were able to achieve significant
improvements on both PDS1 and PDS2 over the baseline and existing state-of-the-art
approaches.
Chapter 5
Sentiment Summarization
In this chapter, we discuss how to summarize the sentiments of different users towards a
particular topic. Here, we focus on summarizing sentiments of users across multiple
documents, unlike RSUMM, which focused on a single-document subjective summary. We also
relate sentiment classification and sentiment summarization and show how the former helps
the latter. Sentiment summarization is one application where sentiment classification can
be applied.
5.1 Introduction
Automated text summarization addresses the problem of information overload by condensing
a text to its essence. A summary can be either an abstract or an extract, based on a single
document or multiple documents. Sentiment summarization differs from traditional document
summarization [80, 87] as it has to optimize an extra property, sentiment. Although a
rating is a form of summary for a text, the real essence of the sentiment is contained in the
text itself. In our work, we developed a system that summarizes the sentiments of different
users towards a particular topic from multiple blog posts. We view the problem of sentiment
summarization as a two-stage classification problem at sentence level: first we estimate the
subjectivity and then the polarity of each sentence.
Most of the existing work in multi-document sentiment summarization focused on generating
an aspect-based summary of a product. Aspect-based summarization follows two steps:
1. Extract product feature-opinion associations from sentences.
2. Prune them to generate the summary.
Hu and Liu in [43] used a POS tagger to extract product features. They assumed that product
features are nouns and noun phrases and extracted them using the POS tagger. Then they used
frequent itemset mining to prune the product features. They classified a sentence as an
opinion or a fact based on these product features: if a sentence has more than two features,
it is likely to contain the sentiment of the user. They determined the polarity orientation of
a sentence using a manual seed list of opinion-bearing words. They produced a summary
for each feature of a product, providing evidence in the form of opinion sentences from
reviews. Note that a feature of a product is different from the n-gram features discussed
earlier. Researchers followed the above methodology with different ways of extracting
feature-opinion associations from customer reviews until 2008 [111, 112, 30].
With the introduction of the opinion summarization track in the Text Analysis Conference
(TAC) 2008¹, extract-based opinion summarization gained popularity. The track focused
on query-based opinion summarization of blog posts rather than customer reviews, and
researchers developed systems and evaluated them using the TAC data, as in [46, 9].
Task Definition: The TAC 2008 Opinion Summarization task is defined as the automatic
generation of well-organized, fluent summaries of opinions about specified targets, as found in
a set of blog documents. Each summary has to address a set of complex questions about
the target, where the question cannot be answered simply with a named entity (or even
a list of named entities). The input to the summarization task comprises a target, some
opinion-related questions about the target (see Figure 5.1) and a set of documents that
contain answers to the questions. The output is a summary for each target.
¹http://www.nist.gov/tac/tracks/2008/index.html
Figure 5.1 Sample TAC Queries and Targets
Our summarization system is illustrated in Figure 5.2. The input to the system is
a query and a set of blog posts (documents) from which the sentiment summary has to
be generated. We assume that each query has a polarity orientation and predict the orientation
as positive/negative, which will be used as a filter. For example, the query "What features do
people like about vista?" expects the positive comments that writers expressed on the product
Windows Vista to be returned in the summary. We do not look into complex queries but
instead focus on simple queries that have either a positive or negative orientation, as in the
TAC dataset.
5.2 Classification Based Approach
We view the problem of summarizing sentiments as a two-stage classification problem at
sentence level. We split each document in the document set into sentences and predict
whether a sentence is an opinion or a fact. Then, we determine the polarity of the opinionated
sentences returned by the above method on a binary scale as positive or negative.
Figure 5.2 Architecture of our sentiment summarization system
5.2.1 Training the Classifier
For training the opinion/fact classifier, we used a set of 10,000 sentences with equal
numbers of sentences labeled as opinions and facts. For training the polarity classifier, we
crawled about 128,000 reviews on various topics, manually rated on a scale of one to five.
We used this as the training set for classifying each opinionated sentence as positive or
negative. We tagged each sentence in a review as positive or negative based on the rating
given at the end of the review. Reviews with a rating of four or five are considered to
be positive and the others negative. We used the rainbow text classifier implemented in [65]
and built classification models using it. It has several built-in methods like naive-bayes,
kNN, TFIDF, probabilistic indexing, etc. We trained the classifier using unigrams and word
associations as features with probabilistic indexing [8] as the method. The probabilistic
indexing method performed better compared to the other methods on the training data.
Word association is a simple variant of the bigram; it has nothing to do with association rule
mining. We tokenize each sentence in the opinion/fact and polarity training data into words
and associate each token with all other tokens in the sentence. The motivation behind this
approach is that the opinion or polarity of a sentence is not determined by a single token;
rather, it is the combination of tokens that determines it. We limit ourselves to text units
of maximum size two while training the classifier.
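The word-association features can be sketched as follows (illustrative code, not the Rainbow implementation): every token in a sentence is paired with every other token, reflecting the motivation that combinations of words, not single tokens, signal opinion and polarity.

```python
# Sketch of word-association features: unigrams plus every unordered
# pair of tokens in the sentence (text units of maximum size two).
from itertools import combinations

def word_associations(sentence):
    tokens = sentence.lower().split()
    # unigrams, then each token associated with every other token
    return tokens + [f"{a}_{b}" for a, b in combinations(tokens, 2)]

print(word_associations("battery life disappointing"))
# ['battery', 'life', 'disappointing', 'battery_life',
#  'battery_disappointing', 'life_disappointing']
```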
5.2.2 Polarity Estimation
We define the metric polarity estimation (PE) that estimates the polarity score of a sentence
for a particular orientation. The orientation of the query is used as a filter. If the query focuses
on positive aspects of a product, then we estimate the polarity for sentences that are labeled as
positive by the polarity classifier, and vice versa. We use the scores returned by the rainbow
classifier to compute the PE of a sentence for the query orientation, as shown in eqn. 5.1.
The polarity orientation of the query is also determined using the polarity classifier. The
smoothing parameters in eqn. 5.1 are intuitively set to 0.3 and 0.7 respectively.

PE(S|C) = 0.3 × P_PI(S|O) + 0.7 × P_PI(S|C)    (5.1)

where PE(S|C) denotes the polarity estimate of an opinion sentence S, P_PI(S|O) denotes the
probability of the sentence being an opinion and P_PI(S|C) denotes the probability of the
sentence belonging to class C, returned by the opinion/fact and polarity classifiers
respectively; C can be positive or negative depending on the query.
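Eqn. 5.1 translates directly to code. The two probabilities below are hypothetical example values; in the system they come from the opinion/fact and polarity classifiers.

```python
# Eqn. 5.1 in code form, with the 0.3 / 0.7 smoothing weights from the text.

def polarity_estimate(p_opinion, p_class, w_op=0.3, w_cls=0.7):
    """PE(S|C) = 0.3 * P_PI(S|O) + 0.7 * P_PI(S|C)."""
    return w_op * p_opinion + w_cls * p_class

# e.g. a sentence scored 0.9 opinionated and 0.8 positive,
# for a positive-oriented query:
print(round(polarity_estimate(0.9, 0.8), 2))  # 0.83
```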
5.2.3 Final Ranking
In addition to the polarity estimate metric, we rank sentences using two other metrics: a
query dependent (QD) and a query independent (QI) metric. The query dependent metric
boosts sentences that are more relevant to the query, as described in [75]. The query
independent metric picks the most informative sentences using relevance-based language
modeling [47]. The QI metric uses KL divergence [56] to estimate the importance of a
sentence by observing its likelihood under the relevant and irrelevant distributions
respectively. The final score of a sentence is a linear combination of the above three metrics,
as shown in eqn. 5.2.

FS(S) = λ1 QI(S) + λ2 QD(S) + λ3 PE(S|C)    (5.2)

where QI(S), QD(S) and PE(S|C) are the query independent, query dependent and polarity
scores of sentence S respectively, and λ1, λ2 and λ3 are the smoothing parameters for
each metric.
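The final ranking of eqn. 5.2 can be sketched with the weights chosen later in Section 5.3.3 (0.35, 0.25, 0.4). The three input scores per sentence are hypothetical example values.

```python
# Eqn. 5.2 in code form: rank sentences by a linear combination of the
# query-independent, query-dependent and polarity scores.

def final_score(qi, qd, pe, l1=0.35, l2=0.25, l3=0.4):
    return l1 * qi + l2 * qd + l3 * pe

ranked = sorted(
    [("S1", final_score(0.6, 0.5, 0.9)),    # toy per-sentence scores
     ("S2", final_score(0.8, 0.2, 0.4))],
    key=lambda x: x[1], reverse=True)
print(ranked[0][0])  # S1
```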
5.3 Experiments
5.3.1 Dataset
We evaluated our approach to summarizing sentiments on the TAC 2008 opinion summarization
task dataset. The dataset has 25 topics; each topic has one or two squishy list
questions and a set of documents (blog posts) where the answers are likely to be found. A
descriptive answer is expected for a squishy list question. In this task, we have to preserve
an extra property, sentiment, in the summary to the maximum extent. The questions
focused on either positive or negative aspects of a topic.
5.3.2 Evaluation Metrics
We evaluated our system using the "Nugget Judgements" provided for each topic in TAC.
Each judgement had a nugget score or weight used to judge the quality of the summary.
The nugget judgement with the maximum weight was considered the most relevant.
Judgements were provided only for 22 topics out of 25; hence, we evaluated our system
using those 22 judgements only. We used the evaluation metrics Nugget Recall (NR),
Nugget Precision (NP) and F-Measure, conforming to standard TAC practices. Sentences
that have an overlap of at least 40% with an already selected sentence are considered
redundant and subsequently discarded.
NR = (sum of weights of nuggets returned in the summary) / (sum of weights of all nuggets related to the topic)    (5.3)

NP = Allowance / Length    (5.4)

where Allowance = 100 × (number of nuggets returned in the summary) and Length = number of non-whitespace characters in the summary.

F-Measure = ((1 + β²) × NP · NR) / (β² · NP + NR), with β = 1    (5.5)
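Under our reading of eqns. 5.3-5.5, the nugget metrics can be computed as follows. The weight lists and summary length are hypothetical, and capping NP at 1 when the summary is shorter than its allowance is an added assumption.

```python
# The TAC nugget metrics of eqns. 5.3-5.5 in code form (our reading;
# toy weights, not actual TAC judgements).

def nugget_recall(summary_weights, topic_weights):
    """Eqn. 5.3: weight of returned nuggets over weight of all nuggets."""
    return sum(summary_weights) / sum(topic_weights)

def nugget_precision(num_nuggets, summary_length):
    """Eqn. 5.4: Allowance / Length, capped at 1 (assumption)."""
    allowance = 100 * num_nuggets   # 100 characters allowed per nugget
    return min(1.0, allowance / summary_length)

def f_measure(np_, nr, beta=1.0):
    """Eqn. 5.5 with beta = 1."""
    return (1 + beta ** 2) * np_ * nr / (beta ** 2 * np_ + nr)

nr = nugget_recall([0.9, 0.7], [0.9, 0.7, 0.5, 0.3])   # 2 of 4 nuggets found
np_ = nugget_precision(2, 800)                         # 800-character summary
print(round(nr, 3), round(np_, 3), round(f_measure(np_, nr), 3))
```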
5.3.3 Results
We chose the values of λ1, λ2 and λ3 in eqn. 5.2 as 0.35, 0.25 and 0.4 respectively, after
some manual tuning of the weights. The values presented in Table 5.1 are the average
values over the 22 topics. The average F-measure score obtained using our approach is better
than those of many of the systems submitted to TAC 2008. Out of the thirty-six runs submitted
to the task, only nine performed better than ours, with the best being 0.489.
Table 5.1 Results showing average NR, NP and F-Measure values for 22 topics
Evaluation Metric Avg. Score
NR 0.287
NP 0.164
F-Measure 0.209
5.4 Conclusion
In this work, we presented a general overview of a sentiment summarization system and
showed how sentiment classification helps in summarizing sentiments. Our summarization
system focused on extract-based summaries, unlike previous systems that are aspect based.
We built two classifiers that classify each sentence in a document as an opinion or a fact
and as positive or negative respectively, using unigrams and word associations as features. We
estimated the polarity of a sentence using the classifier scores and combined it with the QI and
QD metrics in the final scoring of a sentence. We also took care of redundancy while generating
the summary, based on an overlap threshold. Sentiment summarization, particularly extract
based, is still at an early stage of research, and we believe our approach is a step in the right
direction towards exploring more novel methods.
Chapter 6
Conclusion
Sentiment classification can be treated as a special case of topical classification applied
to the subjective portions of a document. In this thesis, we discussed the problem of document
sentiment classification and subjective feature extraction, the key component in it. We
discussed the challenges in extracting subjectivity, existing approaches and their limitations.
Though many techniques were proposed in the last decade for extracting subjective
features from a document, there are still many open problems that need to be addressed.
Most of the proposed methods relied heavily on linguistic resources like sentiment lexicons
and complex patterns based on POS information, making the task more resource dependent
and complex. It requires a lot of human effort to develop such tools to analyse sentiments
in various domains and languages. Hence, extending these resource-dependent approaches
to various domains and languages is not a feasible solution. This motivated us to conduct
research on methodologies that require minimum use of linguistic resources and yet achieve
comparable or better results.
Also, most of the existing sentiment analysis systems predict polarity on a binary
scale. However, in real-world applications, the expressed sentiment is too complex to be
captured by a simple binary label. Hence, we conducted experiments to predict the sentiment
on a multi-variant scale of one to five, popularly known as a starred rating on the web. We
adopted a filtering methodology to derive subjective features and used supervised learning
approaches to analyse the overall sentiment of a document. We proposed a method called
RSUMM, in combination with well-known feature selection techniques, to extract subjective
features. RSUMM is based on information retrieval models: techniques similar to the
vector space model, the unigram language model and a term co-occurrence model were used to
estimate the subjectivity of each sentence.
6.1 Contributions
Current-day approaches in sentiment analysis lie at the crossroads of NLP and IR, where
subjective feature extraction has been dominated by linguistic resources. In this thesis,
we attempted to move away from using language resources and investigated approaches
that make the task of subjective feature extraction "resource independent". This was the major
contribution of this thesis. We approached the problem by following a two-step filtering
methodology and conducted experiments to predict the sentiment in customer reviews. The basis
for this methodology was our manual analysis of reviews on the web, where we had seen many
reviews with little subjective content compared to the total content.
• We proposed a method called RSUMM to extract subjective sentences from a document.
We estimated the subjectivity of each sentence using three variants of RSUMM:
RSUMMLS, RSUMMPE and RSUMMCO. All the variants of RSUMM were based
on information retrieval models. We used techniques similar to the vector space model,
the unigram language model and a term co-occurrence model to estimate subjectivity. We
obtained an extract of each review retaining its most subjective sentences, and used the
subjective extract rather than the full review to predict the sentiment orientation.
• We used n-gram models to convert a sentence into a feature vector for classification.
As n-gram modeling was done on sentences, there could be a lot of irrelevant features
that need to be filtered. We used two state-of-the-art feature selection methods, mutual
information and fisher discriminant ratio, to remove them.
• The logistic regression (LR) method has been widely used in patent information retrieval
and in applications where human preferences play a major role, like grading a student.
To the best of our knowledge, it was used here for the first time in sentiment classification
for calibrating customer satisfaction. We didn't go into the internals of this method;
rather, we focused on deriving features from the document and presenting them as
feature vectors.
• We also worked on an application of sentiment classification: sentiment summarization.
We summarized the sentiment in multiple blog posts related to a topic following
a classification-based approach. We adopted a two-stage classification procedure to
summarize blog posts at sentence level. Each sentence was scored for its subjectivity
and polarity based on the classifier scores, and we used a linear combination of the
scores for the final ranking.
We conducted experiments on standard datasets used by many researchers in sentiment
classification and summarization. The classification datasets were from the hotel and movie
review domains. The dataset we used for evaluating our summarization system contained
blog posts on different topics. We evaluated our methodology conforming to standard
evaluation metrics in both classification and summarization. We used accuracy and
mean-absolute-error (both micro and macro versions of it) and reported results. Precision,
recall and F-measure were used as metrics for evaluating the opinion summaries.
We reported the results of our experiments on sentiment classification and summarization
in Chapter 4 and Chapter 5 respectively. We were able to achieve good accuracy values
while classifying sentiments on both binary and multi-variant scales. Our results were on
par with or better than the state-of-the-art approaches that used linguistic resources for
extracting subjective features. We evaluated our summarization system on the TAC 2008
blog dataset and obtained better performance using our classification methodology than
many systems that participated in TAC.
6.2 Applications
In this section, we discuss some real world applications of sentiment classification.
6.2.1 Products comparison
Many people now use the web to decide whether or not to recommend a product. Online
merchants ask customers to review their products and are very curious about their
judgements. Researchers are also focusing on automatically classifying people's views on a
product as recommended or not [72, 18, 89]. A product has several aspects on which people
comment, and it may have shortcomings in one aspect and merits in another [66, 86].
Analysing these sentiments in the text and coming up with a comparison of customers'
opinions on different products at a mere single glance (a rating) can really facilitate better
information access for merchants and others. Comparing products on the web enables
people to easily gather marketing intelligence and product benchmarking information.
Liu et al. in [61] proposed a novel framework to analyse and compare customer opinions
on several competing products, and implemented a prototype system called Opinion
Observer. The process involves two steps: 1) identifying the product features or aspects
that users have commented upon; 2) for each extracted feature, identifying the semantic
orientation of the sentiment. They presented the comparison output in the form of a visual
summary for each feature (aspect) of the product.
6.2.2 Sentiment summarization
The number of reviews that a product receives is increasing rapidly on the web, and popular
products are often commented on by many people. Some of the reviews are long, contain
little opinion information, or are redundant. This makes it hard for a potential customer,
and also for product manufacturers, to make an informed decision. Sentiment summarization
summarizes the opinions by predicting the polarity of the sentiment in the text, quantifying
the sentiment and relating entities [55, 74]. With a sentiment summary, a customer or a
manufacturer gets a complete overview of what different people are saying about a particular
product. We conducted experiments on this application of sentiment classification.
6.2.3 Opinion reason mining
Opinion reason mining is another area where sentiment classification can be applied. In this
area of research, people do a critical, in-depth analysis of opinion assessment. For example:
"What are the reasons for the popularity of Windows 7?". For such queries, simply
returning some 150 reviews on Windows 7 that are positive and some 50 reviews that have
negative polarity is not sufficient. Reasons such as "The product is popular for its look and
feel, and boot time." convey an in-depth assessment for the customer. In this application,
sentiment classification is used to come up with a general overview of the pros and cons of a
product, and also the exact reasons for them.
6.2.4 Other Applications
Online message sentiment filtering, sentiment web search engines, e-mail sentiment detection
and blog author sentiment prediction are other applications of sentiment classification.
6.3 Future Directions
Our approach can be considered a building block for investigating subjective feature
extraction methods that require minimum use of linguistic resources. In this thesis, we
explored two simple metrics (ADF, ASM) and methods based on information retrieval models
(probabilistic estimate, term co-occurrence) for estimating subjectivity. In future, one can
explore more novel metrics and models for subjective feature extraction and conduct
experiments. We employed two state-of-the-art feature selection methods but did not explore
other feature selection methods. This can be treated as one of the possible future directions,
particularly in multi-variate classification.
From the results reported in the above chapters, we believe that our methodology is a step
in the right direction for investigating subjective feature extraction approaches that use
statistical means. Due to the unavailability of large, standard annotated datasets in different
languages, we conducted our experiments on the datasets used by many researchers in
sentiment classification in order to compare our results. Based on the accuracy values we
obtained, we are fairly confident that our approach reduced the "resource dependency
problem" in subjective feature extraction. In future, one can extend this methodology to
conduct experiments on analysing sentiments in regional languages and also on very large
datasets.
We followed a naive classification-based approach at the sentence level for summarizing
opinions in blog posts. In sentiment summarization, aspect-based summarization has been far more
popular than extract-based summarization, although the latter is now gaining popularity.
We focused on establishing the need for a sentiment summarization
system, the difference between generic text summarization and sentiment summariza-
tion, and the relation between sentiment classification and
summarization. Our methodology can be further improved in future with more novel
techniques for summarizing sentiments.
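A sentence-level, classification-based extract summarizer of the kind described above can be sketched in a few lines: score each sentence for how opinionated it is, then extract the top-k sentences in document order. The opinion-word count used here is a toy scoring rule assumed for illustration, not the classifier used in the thesis.

```python
# Minimal sketch of extract-based sentiment summarization at sentence level:
# rank sentences by a subjectivity score, keep the top-k in original order.
# OPINION_WORDS and the count-based score are illustrative assumptions.
OPINION_WORDS = {"love", "hate", "great", "terrible", "excellent", "awful"}

def opinion_score(sentence):
    """Toy subjectivity score: number of opinion words in the sentence."""
    return sum(1 for w in sentence.lower().split()
               if w.strip(".,!?") in OPINION_WORDS)

def summarize(sentences, k=2):
    """Return the k most opinionated sentences, preserving document order."""
    ranked = sorted(sentences, key=opinion_score, reverse=True)[:k]
    return [s for s in sentences if s in ranked]

post = [
    "I bought this phone last week.",
    "The screen is great and the camera is excellent.",
    "Battery life is terrible.",
    "It comes in three colors.",
]
summary = summarize(post, k=2)
```

Unlike generic text summarization, which favors topically central sentences, this selection criterion deliberately favors the subjective ones, which is the distinction the chapter draws between the two tasks.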