

AUTOMATIC CLASSIFICATION AND SUMMARIZATION

OF SENTIMENT IN DOCUMENTS

By

Kiran Sarvabhotla

200402038

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science (by Research) in

Computer Science & Engineering

Search and Information Extraction Lab

Language Technologies Research Center

International Institute of Information Technology

Hyderabad, India

May 2010


Copyright © 2010 Kiran Sarvabhotla

All Rights Reserved


Dedicated to my family and friends.


INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY

Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled "Automatic Classification and Summarization of Sentiment in Documents" by Kiran Sarvabhotla (200402038), submitted in partial fulfillment for the award of the degree of Master of Science (by Research) in Computer Science & Engineering, has been carried out under my supervision and is not submitted elsewhere for a degree.

Date

Advisor: Dr. Vasudeva Varma
Associate Professor
IIIT, Hyderabad


Acknowledgements

I would like to thank my advisor Dr. Vasudeva Varma for his continual guidance during my master's degree. He gave me the freedom to explore different topics and helped me zero in on one of the hot topics of research. His sincere efforts and valuable comments were key factors in getting my work published in one of the reputed journals in IR. I would also like to thank him for his support in writing my master's thesis.

I would like to express my sincere gratitude to Dr. Prasad Pingali, who was the first person I met in the IE Lab. His passion for research motivated a happy-go-lucky guy like me towards research. I would not be what I am today if I had not met him. My special thanks to Mr. Babji, a guy who never says no, for his help ranging from providing infrastructure to eatables and some fitness tips.

Special thanks to my friend P V Sai Krishna, who made me come to the IE Lab and motivated me in very troubled times. My special thanks to K. P. Raja Sekhar, my first project partner. I would like to thank Surya Ganesh, Kranthi Reddy, Rohit Bharadwaj and Swathi, who helped me during the early stages of my research.

I take this chance to thank all my batchmates for making my life at IIIT so enjoyable and memorable. I will always relish the fun we had at IIIT.

I would like to thank my father, Lakshmi Narasimham, my mother, Ramalakshmi, and my sister Sravani for their love and support.

Last but not least, I thank my dear friend Vijaya Kumari for her support and motivation, constantly reminding me of my capabilities.


Abstract

Today's World Wide Web has become a major source of information for people. With the advent of customer reviews, blogs and the growth of e-commerce in this decade, user-generated content has grown rapidly on the web. It has an inherent property called sentiment, which plays an important role in the decision-making process of people. In order to provide better information access, analysing sentiments and rating them in terms of satisfaction has become an essential characteristic on the web.

Sentiment analysis or opinion mining is a Web 2.0 problem that aims to determine the attitude of a speaker or writer towards a particular topic by classifying the polarity of the text. Sentiment classification can be viewed as a special case of topical classification applied to the subjective portions (the sources of sentiment) of a document. Hence, the key task in sentiment classification is extracting subjectivity. In this thesis, we classify the overall sentiment of a document using supervised learning approaches. We focus on extracting subjective features, the current approaches to extracting them and their limitations.

Existing approaches for extracting subjective features rely heavily on linguistic resources like sentiment lexicons and on complex subjective patterns based on Part-Of-Speech (POS) information, making the task resource dependent. Since regional language content is growing gradually on the web and people are interested in expressing their thoughts in their local language, extending these resource based approaches to various languages is a tedious job: it requires a lot of human effort to build sentiment lexicons and frame rules for detecting subjective patterns. To make the task of subjective feature extraction more feasible, approaches that reduce the use of linguistic resources are needed. In this thesis, we attempt to address the problem of resource dependency in subjective feature extraction. We assume that the entire document does not contain subjective information and that it contains misleading text in the form of objective information. We propose a method called RSUMM that filters objective content from a document, and we explore the use of classic information retrieval models for estimating the subjectivity of each sentence in RSUMM.

We follow a two-step "filtering" methodology to extract subjectivity. We estimate subjectivity at sentence level and retain the most subjective sentences from each document. In this way, we obtain an excerpt of each document that preserves subjectivity at a level comparable to or better than the total document, enabling efficient sentiment classification. Then, we apply well known feature selection techniques on the subjective extract to obtain the final subjective feature set. We evaluate our methodology on two supervised customer review datasets, using standard classification evaluation metrics like accuracy and mean absolute error. Our results on those datasets prove the effectiveness of the proposed "filtering" methodology. Based on the results, we conclude that subjective feature extraction is possible with minimal use of linguistic resources.

Although ratings convey the sentiment at a glance, its real essence is contained in the text itself. The second part of this thesis explains our approach to summarizing the sentiments of multiple users towards a topic. We produce an extract based summary from multiple documents related to a topic, preserving the sentiment in it. We focus on relating sentiment classification to sentiment summarization and show how classification helps in summarizing sentiments. We evaluate our approach on a standard web blog dataset using standard evaluation metrics.


Publications

• Kiran Sarvabhotla, Prasad Pingali and Vasudeva Varma, "A Lexical Similarity Based Approach for Extracting Subjectivity in Documents", Journal of Information Retrieval, Special Issue on Web Mining for Search, Vol. 14(3), 2011.

• Kiran Sarvabhotla, Prasad Pingali and Vasudeva Varma, "Supervised Learning Approaches for Rating Customer Reviews", Journal of Intelligent Systems, Vol. 19(1), 2010.

• Kiran Sarvabhotla, Kranthi Reddy B. and Vasudeva Varma, "Classification Based Approach for Summarizing Opinions in Blog Posts", in Proceedings of the Indian International Conference on Artificial Intelligence (IICAI-09), Special Track on Web 2.0 and Natural Language Processing, Tumkur, December 2009.

• Vasudeva Varma, Prasad Pingali, Rahul Katragadda, Sai Krishna, Surya Ganesh, Kiran Sarvabhotla, Harish Garapati, Hareen Gopisetty, Vijay Bharath Reddy, Kranthi Reddy, Praveen Bysani and Rohit Bharadwaj, "IIIT Hyderabad at TAC 2008", in the Working Notes of the Text Analysis Conference (TAC), at the joint meeting of the annual TAC and TREC conferences, USA, November 2008.


Contents

Table of Contents

List of Tables

List of Figures

1 Introduction
  1.1 Introduction to Sentiment Analysis
    1.1.1 Sentiment Analysis
    1.1.2 Rating Sentiments
    1.1.3 A Generic Approach to Document Sentiment Analysis
  1.2 Extracting Subjectivity
    1.2.1 Challenges in Extracting Subjectivity
    1.2.2 Existing Approaches
  1.3 Problem Description
    1.3.1 Problem Scope
    1.3.2 Motivation
    1.3.3 Problem Statement
  1.4 Overview of the Proposed Methodology
    1.4.1 RSUMM
    1.4.2 Evaluation and Comparisons
  1.5 Thesis Organization

2 Related Work
  2.1 Sentiment Classification at Different Levels
    2.1.1 Word or Phrase Sentiment Classification
    2.1.2 Document Sentiment Classification
    2.1.3 Sentiment Classification at Sentence Level
  2.2 Subjectivity Classification
    2.2.1 Min-cut based Subjectivity Classification
  2.3 State-of-the-art Approaches and Benchmarks

3 Subjective Feature Extraction
  3.1 Information Retrieval Models and SVM
    3.1.1 Vector Space Model
    3.1.2 Unigram Language Model
    3.1.3 Support Vector Machines
  3.2 RSUMM
    3.2.1 Lexical Similarity
    3.2.2 Probabilistic Estimate
    3.2.3 Term Co-occurrence
  3.3 Feature Selection
    3.3.1 Mutual Information
    3.3.2 Fisher Discriminant Ratio
    3.3.3 Final Subset Selection
  3.4 Conclusion

4 Evaluation
  4.1 Experimental Setting
    4.1.1 Datasets and Classifiers
    4.1.2 Evaluation Metrics
    4.1.3 Estimating the Parameter 'X'
  4.2 Binary Classification
    4.2.1 Results
    4.2.2 Discussion
  4.3 Multi-variant Classification
    4.3.1 Results
    4.3.2 Discussion
  4.4 Conclusion

5 Sentiment Summarization
  5.1 Introduction
  5.2 Classification Based Approach
    5.2.1 Training the Classifier
    5.2.2 Polarity Estimation
    5.2.3 Final Ranking
  5.3 Experiments
    5.3.1 Dataset
    5.3.2 Evaluation Metrics
    5.3.3 Results
  5.4 Conclusion

6 Conclusion
  6.1 Contributions
  6.2 Applications
    6.2.1 Products comparison
    6.2.2 Sentiment summarization
    6.2.3 Opinion reason mining
    6.2.4 Other Applications
  6.3 Future Directions

Bibliography


List of Tables

4.1 Statistics of the dataset PDS2
4.2 Results showing CV accuracies for baseline BL and top half TH and bottom half BH on PDS1
4.3 Results showing CV accuracies for RSUMMLS, RSUMMLS+MI and RSUMMLS+FDR on PDS1 over BL
4.4 Results showing CV accuracies for RSUMMPE, RSUMMPE+MI and RSUMMPE+FDR on PDS1 over BL
4.5 Results showing CV accuracies for RSUMMCO, RSUMMCO+MI and RSUMMCO+FDR on PDS1 over BL
4.6 State-of-the-art accuracy values on PDS1
4.7 Results obtained by Stefano et al. on PDS2 for their different feature representations with MV as the feature selection method
4.8 CV accuracies on PDS2 for different feature representations using the total review with LR as the classification method
4.9 CV accuracies on PDS2 for different feature representations using RSUMMCO
4.10 CV accuracies on PDS2 for different feature representations using the ADF metric
4.11 CV accuracies on PDS2 for different feature representations using RSUMMCO with MI and FDR
4.12 CV accuracies on PDS2 for different feature representations using the naive-bayes classifier and MI as the feature selection method

5.1 Results showing average NR, NP and F-Measure values for 22 topics


List of Figures

1.1 General methodology adopted in document sentiment analysis
1.2 A sample movie review

3.1 Logit curve

4.1 Plot showing the effect of 'X' on accuracy with RSUMMLS
4.2 Plot showing the effect of 'X' on accuracy with RSUMMPE
4.3 Plot showing the effect of 'X' on accuracy with RSUMMCO

5.1 Sample TAC queries and targets
5.2 Architecture of our sentiment summarization system


Chapter 1

Introduction

Today's World Wide Web has become a major source of information for people. The textual information on the web can be broadly categorized into two: facts and opinions. Facts are objective expressions of people about entities, events and so on, whereas opinions are subjective expressions that describe their feelings, sentiments or apprehensions about entities, events and others [60, 71]. Most of the research work in the areas of natural language processing, text mining and information retrieval focused on facts, particularly news stories, until the early part of this decade; one such application is the classification of news content into categories like politics, movies and sports. This can be attributed to the abundance of news content and the scarcity of opinion content on the web at that time.

With the advent of customer reviews, blogs and the growth of e-commerce in this decade, user-generated content has grown rapidly on the web. Reviews posted by people on e-commerce websites, the views of people in blogs, discussion forums and the like can be collectively called user-generated content. It has an inherent property called sentiment. Analysing these sentiments has received much attention among the research community and market analysts in recent years because of its potential business applications, such as improving customer satisfaction with a product. Hence, sentiment analysis has become one of the hot topics of research in this decade.


1.1 Introduction to Sentiment Analysis

1.1.1 Sentiment Analysis

Sentiment analysis is an area of computational study of opinions, sentiments and emotions [71]. The word sentiment is defined as:

1. A personal belief or judgement that is not founded on proof or certainty.

2. A general thought, feeling or sense.

3. A cognitive perception or emotional feeling towards a particular brand or product (the market perspective).

Sentiment analysis or opinion mining is thus a paradigm of natural language processing, text mining and information retrieval that aims to determine the attitude of a speaker or writer towards a particular topic. The basic task in sentiment analysis is to identify the polarity of a given text. The analysis is done at different text levels (word, phrase, aspect, sentence and document), and a semantic orientation is predicted.

As mentioned earlier, textual information contains both facts (objective information) and opinions (subjective information). Objective information does not convey the sentiments of people. Hence, for any sentiment analysis task, the key is to extract the subjective information robustly and analyse it.

1.1.2 Rating Sentiments

Sentiment analysis is a Web 2.0 problem that attracted attention with the growth of social media. Social media (blogs, customer reviews, discussion forums etc.) plays a prominent role in the decision-making process of people. It has become customary for them to find out "what others are saying" about a particular product or service they wish to buy or avail. According to a survey performed on TripAdvisor (http://www.tripadvisor.com), 97% of the users of the website are influenced by other travellers' opinions [38]. Popular websites like IMDb (http://www.imdb.com) and Amazon (http://www.amazon.com) encourage users to post reviews so that they can be useful to others.

Since popular products and services are often commented on by many people, and some reviews have large content, it is very difficult for customers or manufacturers to go through the entire content to arrive at a decision. To facilitate them and provide better information access, several websites encourage people to quantify a particular product or service in terms of their satisfaction. Hence, labeling a review with a rating has become a crucial characteristic on the web. The labeling is generally done on the basis of overall satisfaction. Ratings convey a summary of the text at a glance, and they are of immense help.

The rating functionality is provided on some popular websites, but most blogs, discussion forums and reviews do not have explicit ratings. Yet they are valuable and useful sources of information, and there is every chance that a customer might skip them because of their large content (the problem of information overload). Hence, systems that analyse the sentiments of people in a given text and predict its polarity are gaining popularity. The polarity orientation is also referred to as semantic orientation, sentiment orientation or opinion orientation.

In our work, we attempt to rate sentiments on two scales popular on the web: positive/negative or thumbs up/thumbs down (a binary scale of polarity), and a five-point scale (a multi-variant scale of 1-5: a starred rating, with 5 being the best and 1 the worst), for the overall document (in our case, a document refers to a review or a blog). There are also a few systems that rate sentiments at a very fine level (word, phrase or aspect).


Figure 1.1 General methodology adopted in document sentiment analysis

1.1.3 A Generic Approach to Document Sentiment Analysis

The general methodology adopted for document sentiment analysis is depicted in Figure 1.1.

The first phase is preprocessing, which includes typical text processing steps such as tokenization, stopword removal and stemming. The crucial part of a sentiment analysis system is identifying the subjective expressions in the text; this phase is generally called "subjective feature extraction". The efficiency of the system is highly dependent on the robustness of this phase. Since the key task in sentiment analysis is extracting subjectivity, we focus on this area in our work. A minimal preprocessing sketch is shown below.
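Purely as an illustration of the preprocessing phase (the thesis does not prescribe a particular toolkit), a minimal sketch in Python using NLTK, assuming the 'punkt' and 'stopwords' data are installed:

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    def preprocess(text):
        """Tokenize, remove stopwords and stem: the typical first phase."""
        stemmer = PorterStemmer()
        stop = set(stopwords.words("english"))
        tokens = nltk.word_tokenize(text.lower())
        return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop]

    print(preprocess("The special effects were great, but the plot was boring."))
    # -> something like ['special', 'effect', 'great', 'plot', 'bore']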

The third phase is analysing the extracted subjective expressions and predicting their polarity. Most sentiment analysis systems use either linguistic resources, especially sentiment lexicons like SentiWordNet (http://sentiwordnet.isti.cnr.it/) and General Inquirer (http://www.wjh.harvard.edu/~inquirer/), or unsupervised/supervised approaches to predict the semantic orientation.

Some of the popular sentiment analysis systems are:

• OPINE (http://www.cs.washington.edu/research/knowitall/opine/)

• Opinion Observer

• Red Opal

• Social Media Analytics (SAS)

• SinoBuzz

1.2 Extracting Subjectivity

As mentioned earlier, the critical step in any sentiment analysis task is extracting subjectivity. For example, consider the movie review in Figure 1.2 for the movie "Iron Man 2". The overall sentiment of the author towards the movie is negative. Let's examine the features that prompted the author to arrive at this decision.

Explicit subjective features:

- Negative semantic orientation: disappointed, "What the hell", "was n't even epic", stupidity, boring, "no sense", worst, blah.

- Positive semantic orientation: liked, "great special effects", sophisticated, better, good.

Implicit subjective features:

- "Basically all the action you will see in this movie is what you saw in the trailer" (nothing new)

- "beat 20 guards? you bet, hack system made by a guy who hacked into important military system in 10 sec"


Figure 1.2 A sample movie review


The writer makes more use of negative features than of the positive aspects of the movie (both implicit and explicit); hence, the overall sentiment of the author towards it is negative. The features need not be mere unigrams: higher order n-grams (larger text units) like "great special effects" are also subjective.

1.2.1 Challenges in Extracting Subjectivity

There are some interesting observations to be made from the sentiment bearing document in Figure 1.2.

1. A document is a mixture of subjective and objective information.

2. There are subtle differences in how sentiment towards a topic is expressed.

3. A document may be structured or unstructured.

4. A document can mix information with different orientations, but its overall sentiment is biased towards one label.

If we observe the review above, there are sentences like "There is an agent played by Scarlett Johansson" that do not describe the feelings of the author; they are facts related to the cast and plot of the movie. There can be sentences or paragraphs in a document that convey only factual information. This information can be regarded as potential noise or misleading text, and hence needs to be filtered.

There are more ways to express sentiment than to express a topic. News stories related to movies, politics, sports etc. can be categorized using keywords alone; in the above review, however, there are sentences with no sentiment bearing word that still implicitly convey the sentiment of the author. Capturing such subtle differences in expressing sentiment is a challenging task.

There is no restriction on the part of the user to follow a certain pattern while expressing his thoughts. Hence, a document can be treated as unstructured data, and mining such data is difficult. However, there are some documents written by professionals that are structured; for example, a movie reviewer for a popular site will follow a pattern such as explaining the plot first, then discussing the aspects, and conveying his or her overall sentiment at the end.

Also, the pros and cons of a product can be mentioned in the same document, so it contains contradictory information. But the overall sentiment of the document is biased towards one label, and analysing such contradictory patterns is challenging.

1.2.2 Existing Approaches

Most of the existing approaches for extracting subjective features from documents rely heavily on linguistic resources. A subjective feature can be a word or a phrase, as shown in Figure 1.2. Popular linguistic resources researchers use for subjective feature extraction are:

• Sentiment lexicons

• POS tagger

Sentiment lexicons are dictionaries prepared by researchers for analysing sentiments. They contain subjective words and phrases with their corresponding orientations [28, 31, 27], as given by the lexicon builders; SentiWordNet and General Inquirer are examples of such lexicons. Using a Part-Of-Speech (POS) tagger, researchers framed rules based on textual patterns: these patterns are considered subjective, and the text units that follow them are extracted. The patterns vary from a simple noun phrase (NP) or verb phrase (VP) to very complex patterns. Brill's tagger (http://www.cs.cmu.edu/afs/cs/project/airepository/ai/areas/nlp/parsing/taggers/brill/0.html) and the NLTK toolkit (http://nltk.sourceforge.net/index.php/Main_Page) are examples of resources used to extract POS information [5, 64, 72, 92].

Some of the subjective patterns are: NP, VP, JJ NN, RB JJ not NN, JJ JJ not NN, RB VB, NN JJ not NN etc., where NN stands for noun, RB for adverb, JJ for adjective and VB for verb.


Text units that match the above patterns are considered subjective features. There are also other techniques for extracting subjective features, such as clausal extraction and dependency parsing [64]. Using clausal extraction tools, researchers extract clauses from the text; a POS tagger and framed patterns are then used to discard the text units in the extracted clauses that don't contribute to subjectivity. These techniques are discussed in more detail in Chapter 2.
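As a purely illustrative sketch (not the full rule sets used in the cited work), matching two of the simple patterns above over POS-tagged text could look like this, assuming NLTK and its tagger data are available:

    import nltk  # assumes the punkt and tagger data are installed

    # Illustrative matcher for two simple patterns (JJ NN and RB JJ) only.
    PATTERNS = {("JJ", "NN"), ("RB", "JJ")}

    def subjective_units(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        units = []
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
            if (t1[:2], t2[:2]) in PATTERNS:  # JJ/JJR/JJS, NN/NNS, RB/RBR all match
                units.append(f"{w1} {w2}")
        return units

    print(subjective_units("A terribly boring plot with great special effects."))
    # -> ['terribly boring', 'boring plot', 'special effects']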

1.3 Problem Description

From the above sections, it is clear that analysing sentiments in documents is a challenging task. There is a lot of misleading text in the form of objective information; there are subtle variations in how writers express sentiment; and a sentiment bearing document can contain contradictory information, which is difficult to analyse. We have also looked at the existing approaches for extracting subjective features from sentiment bearing documents. In this thesis, we propose methods to extract subjective features from a document with the help of simple frequency based approaches using information retrieval models.

1.3.1 Problem Scope

Current research on subjective feature extraction relies on linguistic resources like lexicons and POS taggers. Lexicons are very generic and cannot capture the subtle variations in expressing sentiment from context to context and from domain to domain; they contain subjective features with a binary orientation, so using them for multi-variate analysis is not possible. Using POS taggers and other tools, researchers frame complex rules to extract subjective features. Hence, the task of subjective feature extraction has become more resource dependent. Regional language content is growing gradually on the web, and people are more interested in expressing their thoughts and feelings in their local language, so extending the current approaches to subjective feature extraction across several languages is a tedious job: it requires a lot of human effort to build such tools in each language. To make the task of subjective feature extraction more feasible, we need approaches that require minimal use of linguistic resources yet achieve significant results.

1.3.2 Motivation

We identified two major problems with the existing approaches to subjective feature extraction:

1. They rely heavily on linguistic resources, making the task resource dependent.

2. They use complex patterns to extract subjectivity.

This motivated us to investigate approaches that make the task of subjective feature extraction simpler and more generic. In our work, we rely on corpus statistics rather than complex textual patterns or linguistic resources to extract subjective features. Since sentiment analysis addresses the problem of predicting the polarity of a given text unit, it is often referred to as sentiment classification, and we use this term in the rest of the thesis as a synonym for sentiment analysis.

1.3.3 Problem Statement

Sentiment classification addresses the problem of predicting the polarity of text. It can be viewed as a special case of topical classification applied to subjective portions; hence, the key task in sentiment classification is "subjective feature extraction". Existing approaches for extracting subjectivity rely heavily on linguistic resources and complex rule based patterns of subjectivity, making the task very resource dependent and complex. With regional language content growing on the web, the scarcity of such resources should not prevent people from conducting research on those languages. Hence, approaches are needed that require minimal usage of language resources yet perform at a level comparable to using them. In this way, we can solve the problem of resource dependency prevalent in sentiment analysis.

1.4 Overview of the Proposed Methodology

We use supervised learning approaches to classify the overall polarity of a document. We focus on extracting subjective features from it and representing them as a feature vector for classification. We approach the problem of removing resource dependency in subjective feature extraction by making two claims:

1. The entire review does not contain subjective information.

2. If we can successfully filter out the objective information, then subjective feature extraction is achievable with minimal use of linguistic resources and no complex patterns.

We follow a "filtering strategy" at sentence and word level to extract subjective features from the document. We view each review as a mixture of objective and subjective sentences, where the former convey nothing about the feelings of the author. If you observe the sample review in Fig. 1.2, the subjective features are few compared to the entire content; in our manual analysis of reviews on the web, we found that most reviews follow the same pattern. Hence, we need to discard the potential noise in the form of objective sentences before converting the document's text units into a feature vector for the classifier.

1.4.1 RSUMM

We propose a method called RSUMM to extract subjective sentences from a review. It is based on Information Retrieval (IR) models such as the vector space model, the language model and the term co-occurrence model; we use techniques similar to these IR models to filter out the objective information in a review. We call the excerpt of a review with the objective information filtered out its subjective extract.

Our subjective feature extraction occurs in two steps:

• RSUMM estimates the subjectivity of each sentence and returns the most subjective sentences as the subjective extract.

• We then apply feature selection techniques on the subjective extract to obtain the final feature set.

In this thesis, we propose three variants of RSUMM for obtaining the subjective extract. The first method is based on lexical similarity between each sentence and two term vectors; we call it RSUMMLS. In RSUMMLS we define two metrics to score lexical similarity: average document frequency (ADF), which intuitively extracts important terms from a given collection, and average subjective measure (ASM), which intuitively selects the most subjective terms. We use both metrics to estimate the subjectivity of a sentence, and we retain the most subjective sentences in the subjective extract.
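As an illustration only, scoring a sentence by lexical similarity to a weighted term vector can be sketched as follows; the uniform 1.0 weights are placeholders standing in for the ADF and ASM values, whose exact definitions are given in Chapter 3:

    import math
    from collections import Counter

    # Illustrative cosine scoring of a sentence against a subjective term vector.
    def cosine(vec_a, vec_b):
        dot = sum(vec_a[t] * vec_b.get(t, 0.0) for t in vec_a)
        norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
        norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    subjective_terms = {"great": 1.0, "boring": 1.0, "loved": 1.0}  # placeholder weights
    sentence = Counter("the plot was boring and predictable".split())
    print(cosine(sentence, subjective_terms))  # higher score => more subjective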

The second method is based on probabilistic estimates rather than raw term similarity: we estimate the subjectivity of a sentence based on how likely its terms are under a subjective model. We call this method RSUMMPE; its basis is the unigram language modeling used in information retrieval. In the third method, we use the meta information available with a review, such as the title, pros, cons and aspects of a product. We frame target words from this meta information and use a term co-occurrence model to estimate subjectivity. We call this method RSUMMCO; it assumes that authors build subjective expressions around entities like the aspects, pros and cons of a product.

In each method, we retain the top X% of sentences as the subjective extract. We estimate the best value of 'X' for each method such that the subjective extract preserves sentiment at a level comparable to or better than the full review, as in the sketch below.
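The filtering step itself is simple; a minimal sketch, with `subjectivity` standing in for any of the three scoring variants:

    def rsumm(sentences, subjectivity, x_percent):
        """Keep the top X% most subjective sentences as the subjective extract."""
        ranked = sorted(sentences, key=subjectivity, reverse=True)
        keep = max(1, round(len(sentences) * x_percent / 100))
        top = set(ranked[:keep])
        # preserve the original sentence order in the extract
        return [s for s in sentences if s in top]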

After obtaining the subjective extract, we need to convert it into a feature vector. We use n-gram models to present it as a feature vector for the classifier. As n-gram modeling is done on sentences, which are relatively large text units, there can be a large number of irrelevant features. Hence, for faster learning and better classification, a feature selection phase is needed in our case. We employ two state-of-the-art feature selection methods: Mutual Information (MI) and Fisher Discriminant Ratio (FDR). We use support vector machines (SVM) as the classifier in our work, viewing the problem of predicting sentiment on a binary scale as a problem of support vector classification (SVC) and rating it on a multi-variate scale as a problem of logistic regression (LR).
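A minimal sketch of this final stage using scikit-learn (an assumption for illustration; the thesis does not prescribe a particular toolkit, and an FDR scorer would replace the mutual information scorer for the second selection method):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.svm import LinearSVC

    # Toy subjective extracts and binary polarity labels, purely illustrative.
    extracts = ["boring plot and bad acting", "great effects , loved it",
                "worst movie ever", "sophisticated and enjoyable"]
    labels = [0, 1, 0, 1]

    vectorizer = CountVectorizer(ngram_range=(1, 2))           # unigrams + bigrams
    X = vectorizer.fit_transform(extracts)
    selector = SelectKBest(mutual_info_classif, k=min(10, X.shape[1]))
    X_selected = selector.fit_transform(X, labels)             # MI feature selection
    clf = LinearSVC().fit(X_selected, labels)                  # binary SVC case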

1.4.2 Evaluation and Comparisons

We conduct experiments on customer reviews, one of the major entities of social media. The customer review datasets are from the movie and hotel review domains, both popular among the research community in sentiment classification. We evaluate our system using standard classification evaluation metrics such as accuracy and mean absolute error, and we support the claims made above based on the results on these datasets. Throughout our subjective extraction phase, we depend on corpus statistics; we use no complex patterns or rule based approaches, only simple frequency based metrics. Hence, we make the task of subjective feature extraction simple and resource independent, so that it can be extended easily.

There have been approaches in the literature that minimize or avoid the use of linguistic resources. Pang et al. [72] used unigrams, bigrams, POS information and sentence position as features; they viewed sentiment classification as a special case of topical classification and used standard machine learning techniques like naive-bayes and support vector machines. Pang et al. [69] extended the work in [72] by focusing on sentence level subjectivity detection: they filtered out the objective information using a min-cut based classification method to obtain a subjective extract, and showed that sentence level subjectivity detection indeed helps document level sentiment classification, using unigrams as features on the subjective extract. Cui et al. [16] did not use any resource but focused on an n-gram model to represent each review and compared the performance of different classifiers.


Our RSUMM is inspired by the work of Pang et al. [69], but the methodology they adopted to filter out objective information is quite different from ours. Also, in addition to filtering out the objective information, we apply feature selection techniques to obtain the final subjective feature set. We also rate the sentiments of customers on a multi-variant scale, which is a fairly new application in sentiment analysis; most of the work in the literature focused on predicting the binary orientation of sentiment.

1.5 Thesis Organization

The rest of the thesis is organized as follows.

In Chapter 2, we discuss related work on sentiment classification at different levels and the different techniques used to classify sentiment. We also describe how research in this area has evolved over this decade.

Chapter 3 describes our methodology for extracting subjective features using information retrieval approaches. We first present an overview of the information retrieval models and supervised learning methods we use, then describe RSUMM, which produces the subjective extract, and finally discuss the feature selection methods we employ on the subjective extract.

In Chapter 4, we describe the experiments conducted on movie and hotel review datasets to validate our methodology. We report the results, compare them with existing state-of-the-art approaches, and discuss the results and our observations in detail.

We describe our approach to multi-document sentiment summarization in Chapter 5. What makes sentiment summarization different from automatic text summarization? We focus on this question and explain how sentiment classification can be used to summarize sentiments.

Finally, we conclude the thesis by outlining our contributions and providing some insights on how this work can be extended in future.


Chapter 2

Related Work

In this chapter, we discuss the literature related to sentiment classification, subjectivity extraction, and unsupervised and supervised approaches for classifying sentiment. We cover sentiment classification at different levels of text units on both binary and multi-variant scales, as well as the existing literature on subjectivity detection at the sentence level.

Classification is an age-old problem, and several classifiers have been suggested over the last few decades; among them, naive-bayes, support vector machines, decision trees and rule-based classifiers are important and widely used in several applications. A good review of classification methods can be found in [39, 58]. Until the early part of this decade, most classification tasks focused on classifying news stories. With the advent of customer reviews and the growth of e-commerce in the early part of this decade, sentiment classification has become an emerging and hot area of research for its potential business applications and market intelligence. As discussed in Chapter 1, analysing sentiments and predicting their orientation poses very challenging research issues.


2.1 Sentiment Classification at Different Levels

Sentiment classification dates back to the late 1990s [2, 51, 83], but in the early part of this decade it became an important discipline in the areas of natural language processing, text mining and information retrieval [14, 21, 22, 25, 33, 35, 37, 42, 45, 54, 62, 76, 79, 84, 91, 96, 98, 107]. Until the early 2000s, the two main approaches to sentiment classification were based on machine learning and semantic analysis techniques. Later, shallow natural language processing techniques were also used, especially in overall document sentiment classification.

2.1.1 Word or Phrase Sentiment Classification

In the early stages of research, word sentiment classification was considered the basis for phrase and document sentiment classification. Lexicons of words and their semantic orientations were constructed manually or semi-manually [40, 41, 59, 73]. The words in them were mostly adjectives or adverbs carrying a semantic orientation [1, 28, 34, 88, 94], with the orientation defined by the researchers. Approaches to classifying sentiment at word level can be grouped into two: 1) corpus based approaches and 2) dictionary based approaches.

The first group includes methods that depend on the syntactic and co-occurrence patterns of words in large texts to determine their sentiment [40, 93, 109]. The second group uses WordNet (http://wordnet.princeton.edu/) information, especially synsets and hierarchies, to acquire sentiment bearing words or to measure the similarity between candidate words and sentiment-bearing words like "good" or "bad" [44, 52, 49].

Analysis by Conjunctions

In this method, the semantic orientation of adjectives is predicted using conjunctive words like and, or, but, either-or, neither-nor. The intuition is that the act of conjoining adjectives is subject to linguistic constraints: 'and' always conjoins two adjectives with the same orientation, whereas 'but' contradicts them [40]. The steps followed to predict the semantic orientation of adjectives using conjunctive analysis are (step 1 is sketched after the list):

1. Extract adjective pairs along with their conjunctive words.

2. Train a log-linear regression classifier, then classify pairs of adjectives as having the same orientation or opposite orientations.

3. Apply a clustering algorithm to partition the set into positive and negative orientation terms.
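Step 1 can be illustrated with a small sketch, assuming NLTK and its tagger data are available; the cited work ran this kind of extraction over a large corpus and then trained a log-linear classifier on the pairs, which is not shown here:

    import nltk  # assumes the punkt and tagger data are installed

    # Illustrative extraction of adjective pairs joined by a conjunction.
    def adjective_pairs(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        pairs = []
        for (w1, t1), (w2, _), (w3, t3) in zip(tagged, tagged[1:], tagged[2:]):
            if t1.startswith("JJ") and w2.lower() in {"and", "or", "but"} and t3.startswith("JJ"):
                pairs.append((w1, w2.lower(), w3))
        return pairs

    print(adjective_pairs("The plot was simple and predictable but enjoyable."))
    # -> [('simple', 'and', 'predictable'), ('predictable', 'but', 'enjoyable')]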

Analysis by Lexical Relations

This method uses semantic association to determine orientation, following the intuition that two words or phrases tend to have the same semantic orientation if they have a strong association [92, 94, 49]. To determine the degree of semantic association, researchers have used WordNet or web search. The entire process occurs in the following manner:

1. Construct relations using WordNet, especially synsets.

2. Define the distance between two text units.

3. Calculate the semantic orientation of a word or phrase as its relative distance from two seed words of known orientation, such as 'good' and 'bad' or 'excellent' and 'poor'.

4. The semantic orientation of the word or phrase is positive if the relative distance is greater than zero, and negative otherwise (see the sketch below).
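The "relative distance" can be instantiated in several ways; as one illustrative possibility (an assumption on our part, using pointwise mutual information over co-occurrence counts in place of WordNet distances), a minimal sketch:

    import math

    def pmi(co_count, count_x, count_y, total):
        """Pointwise mutual information from raw co-occurrence counts."""
        return math.log2((co_count * total) / (count_x * count_y))

    def semantic_orientation(counts):
        """SO = association with 'good' minus association with 'bad'."""
        so = (pmi(counts["word,good"], counts["word"], counts["good"], counts["total"])
              - pmi(counts["word,bad"], counts["word"], counts["bad"], counts["total"]))
        return "positive" if so > 0 else "negative"

    # Illustrative placeholder counts for a candidate word such as 'delightful'.
    counts = {"word": 50, "good": 5000, "bad": 4000, "total": 10**6,
              "word,good": 20, "word,bad": 2}
    print(semantic_orientation(counts))  # -> positive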

Analysis by Glosses

This method follows the assumption that if a term is semantically oriented in one direction, then the terms in its gloss tend to have the same semantic orientation [28, 27, 26]. The process occurs in the following steps:

1. A seed set representing the two categories, positive and negative, is provided as input.

2. The seed set is expanded to accommodate new terms using lexical relations.

3. Each term t in the expanded set is collated with all the glosses of t, and this textual representation is converted to a vector representation for classification.

4. A binary classifier is trained on the term representations in the expanded seed set and then applied to the terms in the test set.

Analysis by General Inquirer

General Inquirer (GI) is a system that contains a list of terms with their different senses. For each sense of a term, it provides a short definition as well as other information, and terms are tagged as positive or negative. In addition, the GI dictionary contains negations, intensifiers and diminishers like 'not', 'fantastic' and 'barely'. The occurrence probability of each sense of a term is also provided. Hence, it is widely used by researchers in subjective feature extraction [5, 50].

2.1.2 Document Sentiment Classification

Supervised machine learning approaches are popular among researchers for predicting the overall sentiment of a document [3, 7, 50, 64, 67, 72, 69, 100, 81]. Most of them focused on labeling a new sample as "positive" or "negative" based on previously seen samples annotated by humans; grading a review on a multi-variant scale is a fairly new application in this area. The entire process is typically composed of two steps: 1) extracting the subjective features from the training data and converting them into feature vectors, and 2) training the classifier on the feature vectors and applying it to a new sample. The raw documents are also preprocessed before extracting the subjective features; this stage includes removing HTML tags and tokenizing the documents.


Subjective Feature Extraction

To extract subjective features, researchers have used lexicons like SentiWordNet and General Inquirer [27, 5]. Most of these resources contain words and (more rarely) phrases. In sentiment classification, larger text units also play an important role in predicting the semantic orientation, as shown in Fig. 1.2. Hence, researchers framed rules using POS information to extract larger text units than simple unigrams that were considered subjective [72, 5, 92, 27, 28].

Researchers have used lexical filtering techniques based on hypernymy in WordNet [11, 17, 20, 23, 24, 31, 36, 48, 77, 85] and patterns based on a POS tagger [10, 63, 78, 82, 104, 105, 106]. WordNet filters attempt to substitute words with a set of likely synonyms and hypernymy generalizations, because it is unlikely to encounter repetitions of identical words in the text. POS filters were used to extract the patterns that don't contribute to subjectivity, as in [64]; these patterns are considered noise, and POS filters remove them before text units are converted to feature vectors for classification.

The appraising adjective method [99, 100] focused on the extraction and analysis of appraisal adjective groups, optionally modified by enhancers or diminishers. Coherent groups that together express a particular attitude are extracted; examples of appraisal adjective groups are "extremely boring" and "not really very good".

The steps followed in this method were:

1. Build a lexicon using semi-automatic techniques, gathering and classifying adjectives and modifiers into categories in several taxonomies of appraisal attributes.

2. Extract adjectival appraisal groups from texts and compute their attribute values according to this lexicon.

3. Represent documents as vectors of relative frequency features using these groups.

4. Train a support vector machine to discriminate positively from negatively oriented test documents.


Training the Classifier

Researchers viewed document sentiment classification as a special case of topical classification and conducted experiments with machine learning algorithms like naive-bayes, kNN and support vector machines. Pang et al. [72] used machine learning techniques to classify the overall sentiment of movie reviews; their best accuracy (82.9%) was obtained using unigrams as features and SVM as the classifier. Among all the classifiers, SVM and naive-bayes were the most widely used to predict sentiment orientation. The features used included n-grams, lexical information, POS information, sentence position, adjectives and appraisal adjectives.

2.1.3 Sentiment Classification at Sentence Level

Since researchers considered it too coarse to compute sentiment at the document level, they investigated approaches to determine the focus of each sentence and computed the semantic orientation at sentence level [19, 29, 53]. They extracted opinion bearing terms, opinion holders and opinion-product aspect associations in each sentence and then analysed the semantic orientation. There is also an area of research called aspect based sentiment classification, in which the aspects of a product are extracted and people's sentiments are rated on each aspect. Thet et al. [90] conducted experiments on aspect based classification of movie reviews; they used information extraction techniques like pronoun resolution, entity extraction and co-referencing to segment each sentence, and predicted the sentiment of users towards the cast (producers, directors) as well as the overall sentiment.

2.2 Subjectivity Classification

Subjectivity detection is the task of investigating whether a text unit presents the opinion of the author or conveys facts; the text unit is typically a sentence or a paragraph. Researchers have shown that subjectivity detection at sentence level has a very tight relation with document sentiment classification [69, 102, 103, 101, 109]: it prevents the sentiment classifier from considering irrelevant or potentially misleading text. Pang and Lee [69] compressed reviews into much shorter extracts while keeping their sentiment content at a level comparable to the full review. Naive-bayes and min-cut classification are two popular classifiers used in subjectivity detection; we briefly discuss min-cut based classification, the state-of-the-art approach in subjectivity detection [69, 4].

2.2.1 Min-cut based Subjectivity Classification

The cut-based classification method assumes that text units that occur near each other (within discourse boundaries) share the same subjectivity status [69]. Pang and Lee [69] used pair-wise interaction information, and their algorithm relied on an efficient and intuitive graph-based formulation built on finding minimum cuts.

Suppose there are n items x1, x2, x3, ..., xn to be divided into two classes C1 and C2. Two types of scores express the penalty for a particular assignment:

Individual scores, indj(xi): non-negative estimates of each xi's preference for class Cj, using xi alone.

Association scores, assoc(xi, xj): non-negative estimates of how strongly xi and xj prefer to be in the same class.

The algorithm assigns each xi to C1 or C2 so as to minimize the partition cost

\sum_{x \in C_1} \mathrm{ind}_2(x) + \sum_{x \in C_2} \mathrm{ind}_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} \mathrm{assoc}(x_i, x_k) \quad (2.1)

The situation is represented in an undirected graph G with vertices {v1, v2, ..., vn, s, t}, the last two being the source and sink respectively. Add n edges (s, vi) with weight ind1(xi) and n edges (vi, t) with weight ind2(xi); finally, add edges (vi, vk) with weight assoc(xi, xk). A cut (S, T) of G is a partition of its nodes into sets S = {s} ∪ S′ and T = {t} ∪ T′, where s ∉ S′ and t ∉ T′. Its cost, cost(S, T), is the sum of the weights of all edges crossing from S to T. A minimum cut of G is a cut of minimum cost.
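This construction translates directly into code. A minimal sketch using networkx's max-flow/min-cut, with toy ind and assoc values in place of the naive-bayes and sentence-proximity estimates used in [69]:

    import networkx as nx

    sentences = ["s1", "s2", "s3"]
    ind1 = {"s1": 0.9, "s2": 0.4, "s3": 0.2}        # preference for C1 (subjective)
    ind2 = {s: 1.0 - ind1[s] for s in sentences}    # preference for C2 (objective)
    assoc = {("s1", "s2"): 0.5, ("s2", "s3"): 0.3}  # pairwise association scores

    G = nx.DiGraph()
    for s in sentences:
        G.add_edge("source", s, capacity=ind1[s])   # cut => s falls on the C2 side
        G.add_edge(s, "sink", capacity=ind2[s])     # cut => s falls on the C1 side
    for (u, v), w in assoc.items():                 # undirected edges as two arcs
        G.add_edge(u, v, capacity=w)
        G.add_edge(v, u, capacity=w)

    cut_value, (C1, C2) = nx.minimum_cut(G, "source", "sink")
    print(cut_value, C1 - {"source"}, C2 - {"sink"})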


In [69], each vertex was a sentence. The individual penalties were obtained using a naive-bayes classifier for each sentence and each class, and the association penalties were obtained using proximity relations based on sentence position.

2.3 State-of-the-art Approaches and Benchmarks

The sections above covered sentiment classification at different levels, subjectivity detection and the resources used in subjective feature extraction. For word or phrase level sentiment classification, researchers used seed lists of words and lexical resources like WordNet. Document sentiment classification was, and still is, done using supervised learning approaches; most approaches predict the sentiment on a binary scale, whereas multi-variate classification is a fairly new application. Document sentiment classification is also highly domain specific [15, 68, 57, 4].

Since our focus is on extracting subjective features and presenting them as feature vectors to the classifier, we do not discuss the work related to the domain transfer problem in sentiment classification. The popular domains among researchers in document sentiment classification are: 1) the movie review domain and 2) the hotel review domain. The movie review domain is highly popular among the research community, and many have focused on predicting the polarity of movie reviews [72, 69, 64, 67]; this is due to the popularity of movies, the abundance of information about them on the web, and the challenging nature of the reviews [92]. Movie reviews are a mixture of objective and subjective information, and mining subjective features from them is a challenging task. The hotel review domain is popular because people who travel tend to enquire about various hotels on the web. We conduct experiments on reviews in both domains. Among classifiers, SVM is favoured by researchers because of its better performance compared to the others.

Pang et al. [72] used supervised machine learning techniques like naive-bayes and support vector machines with unigrams, bigrams, POS information and sentence position as features. They concluded that machine learning approaches outperformed human-produced baselines, with SVM performing better than naive-bayes using unigrams as features. Mullen and Collier [67] used diverse information scores that assign a value to each word or phrase using WordNet, topic proximity and syntactic relations; they also used an SVM classifier on the same movie review dataset and reported an accuracy of 86%.

Matsumoto et al. [64] observed that word order and syntactic relations play an important role in sentiment classification and proposed a method based on word sub-sequence mining and dependency parsing to extract them. To generate word sub-sequences, they used clausal extraction tools and an n-gram model with n up to 6, setting a support threshold of 2 for unigrams and bigrams and 10 for the others to be considered potential features for classification. They used dependency parsing techniques to extract larger text units and combined them with a POS tagger to remove non-subjective items. They reported an accuracy of 88.3% on the same movie review dataset using an SVM classifier. All these methods predicted the orientation of polarity at document level on a binary scale (positive/negative).

Quantifying the sentiment with a satisfaction score is a fairly recent application in sentiment classification. Pang and Lee [70] conducted experiments on scoring movie reviews on an ordinal scale of four values. Their focus was on using different learning algorithms: simple multi-label classification, SVR and a meta-algorithm. In [110], a new task was proposed: predicting the utility of product reviews rather than scoring a review. They used linear regression techniques to compute the utility.

Stefano et al. [5] viewed the problem of grading reviews on a scale of one to five as a problem of ordinal regression. They focused more on deriving subjective features and selecting them. They used a POS tagger and the GI lexicon to extract subjective patterns. Since their dataset was highly skewed, they used feature pruning techniques based on minimum variance (MV) and a variant of MV called round-robin minimum variance (RRMV); this was one of the important observations presented in their work. Overall, support vector regression (SVR) techniques are popular among researchers for grading reviews.


Chapter 3

Subjective Feature Extraction

In this chapter, we describe our approach to mine subjective features from a review. We view each review r as a combination of subjective and objective sentences (r = Ssubj ∪ Sobj). We propose a method called RSUMM to score each sentence for its subjective nature and extract the set Ssubj from r. Our subjective feature extraction approach follows two steps:

1. We score each sentence and obtain an extract of a review that preserves subjectivity at a level comparable to, or even better than, the full review. We call this method RSUMM.

2. Then, we apply feature selection techniques on the extract of a review to obtain the final subjective feature set.

RSUMM is based on information retrieval techniques like the vector space model, the language model and the word co-occurrence model, widely used in document retrieval and other applications. In this thesis, we propose three variants of RSUMM for estimating the subjectivity of each sentence:

1. Lexical Similarity (RSUMMLS).

2. Probabilistic Estimates (RSUMMPE).

3. Co-occurrence Statistics (RSUMMCO).


After extracting subjective features using the above methodology, we use supervised learning approaches to predict the overall sentiment of a document. We use n-gram models to represent the extracted subjective sentences as feature vectors for classification. We use an SVM classifier and view the problem of predicting the sentiment on a binary scale as a problem of support vector classification (SVC). We use logistic regression (LR) to grade reviews on a multi-variant scale.

3.1 Information Retrieval Models and SVM

In this section, we briefly describe information retrieval models, SVC and LR.

3.1.1 Vector Space Model

The vector space model assigns weights to index terms [6]. It is widely used in information retrieval to determine the relevance of a document for a given query. Both the documents and the query are represented as weighted vectors of terms, and these weights are used to compute the degree of similarity between the query and a document. The higher the similarity degree, the more relevant the document is to the query.

Formal Definition: Both the query q and the document d are represented as weighted vectors of terms. The query vector is defined as q := (w1,q, w2,q, . . . , wt,q) and the document vector as d := (w1,d, w2,d, . . . , wt,d), where t is the total number of index terms.

The degree of similarity between the document d and the query q is then the correlation between the two vectors. This correlation can be quantified by a variety of similarity measures, for instance the cosine of the angle between the two vectors. The weighting measure typically used in the vector space model is tfidf:

\[ \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log\left(\frac{N}{n}\right) \tag{3.1} \]

where tf(t, d) denotes the frequency of the term t in the given document d, N denotes the total number of documents in the collection C, and n denotes the number of documents in C containing the term t. The cosine similarity is then

\[ \cos\theta = \frac{\vec{d} \cdot \vec{q}}{\|\vec{d}\|\,\|\vec{q}\|} = \frac{\sum_{i=1}^{t} w_{i,d}\, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,d}^{2}}\; \sqrt{\sum_{i=1}^{t} w_{i,q}^{2}}} \tag{3.2} \]
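As a minimal illustration of Eqs. 3.1 and 3.2, the sketch below computes tf-idf vectors over a toy collection and ranks documents by cosine similarity; the function names and toy data are our own, not part of any particular toolkit.

```python
# A minimal sketch of tf-idf weighting (Eq. 3.1) and cosine similarity (Eq. 3.2).
import math
from collections import Counter

def tfidf_vector(doc, df, N):
    """doc: token list; df: document frequency per term; N: collection size."""
    return {t: f * math.log(N / df[t]) for t, f in Counter(doc).items() if t in df}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [["good", "hotel"], ["bad", "hotel"], ["good", "food"]]
df = Counter(t for d in docs for t in set(d))
q = tfidf_vector(["good", "hotel"], df, len(docs))
print([cosine(q, tfidf_vector(d, df, len(docs))) for d in docs])
```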

3.1.2 Unigram Language Model

A statistical language model is a probabilistic model for generating text. It was proposed for information retrieval in the late 1990s. It estimates the probability of generating a query from the document model. The basic assumption of this model is that users have a reasonable idea of the terms they want in documents; it directly exploits this idea. Language modeling considers documents as models and queries as strings of text randomly sampled from these models [6]. It ranks documents according to the probability P(Q | MD) that a query Q would be observed during repeated random sampling from the document model MD.

The unigram language model is the simplest form of language model: it discards conditioning on context and estimates the probability of each term independently. It is used in information retrieval more often than other types of language models because of its simplicity:

\[ P(Q \mid M_d) = \prod_{t \in Q} P(t \mid M_d) = \prod_{t \in Q} \frac{\mathrm{tf}(t, d)}{dl_d} \tag{3.3} \]

where Md denotes the language model of a document d, tf(t, d) denotes the frequency of term t in document d, and dld denotes the total number of tokens in document d.

The language modeling approach suffers from sparseness in the data: we may not wish to assign zero probability to a query term t that is missing from document d. There are many smoothing techniques available to address this data sparseness problem. The model is fairly recent and is used in information retrieval, machine translation, speech recognition, etc. It shares some characteristics with the vector space model: both use term frequencies to estimate the importance of a term, and terms are often treated as independent. Language modeling, however, is based on probability estimates rather than similarity.
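A minimal sketch of the query likelihood in Eq. 3.3, with a simple linear (Jelinek-Mercer style) interpolation against the collection model to avoid zero probabilities; the smoothing weight lam and the toy data are assumed illustrative values, not the thesis's configuration.

```python
# A minimal sketch of the unigram query likelihood (Eq. 3.3) with linear
# smoothing against the collection model; lam is an assumed smoothing weight.
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_tf, lam=0.8):
    """query/doc: token lists; collection_tf: Counter over the whole collection."""
    tf, dl = Counter(doc), len(doc)
    cl = sum(collection_tf.values())
    score = 0.0
    for t in query:
        p_doc = tf[t] / dl if dl else 0.0
        p_col = collection_tf[t] / cl if cl else 0.0
        score += math.log(lam * p_doc + (1.0 - lam) * p_col + 1e-12)
    return score

docs = [["good", "clean", "hotel"], ["noisy", "bad", "room"]]
coll = Counter(t for d in docs for t in d)
best = max(docs, key=lambda d: query_log_likelihood(["good", "hotel"], d, coll))
print(best)
```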

3.1.3 Support Vector Machines

Support vector machines (SVM) are a useful technique for text classification. A classification task usually involves training the classifier on data instances whose labels are known, and predicting the labels of unknown data instances. Like all supervised learning approaches, SVM involves a training and a testing phase [13, 95]. Each sample in the training set has one target value (label or class) and several attributes (features). Given a set of instance-label pairs (feature vectors) (xi, yi), i = 1, 2, . . . , l, where xi ∈ Rn and yi ∈ {−1, +1}, SVM requires the solution of the following optimization problem:

\[ \min_{w, b, \xi}\; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \tag{3.4} \]

\[ \text{subject to } y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i \text{ and } \xi_i \ge 0. \]

In general, the training vectors are mapped to a high dimensional space by the function φ. SVM then finds a separating hyperplane with maximum margin in this higher dimensional space. C > 0 is the penalty parameter. Several kernel functions are used in SVM, such as linear, polynomial, radial basis function and sigmoid kernels.

In this thesis, we do not go into the details of supervised learning approaches, such as the choice of the best kernel function or the best learning parameters. We focus on deriving subjective features and representing them as feature vectors for the learning algorithm, with existing kernel functions and default parameters.
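To make this setup concrete, here is a minimal sketch of support vector classification with n-gram features and default parameters. We use scikit-learn as a stand-in for the LIBSVM toolkit [13], and the toy reviews and labels are illustrative only.

```python
# A minimal sketch of SVC with n-gram feature vectors and default parameters;
# scikit-learn stands in for LIBSVM, and the training data is a toy example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

reviews = ["great movie , loved it", "terrible plot , boring film",
           "wonderful acting and story", "awful and dull throughout"]
labels = [1, -1, 1, -1]                           # +1 positive, -1 negative

vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams + bigrams as features
X = vectorizer.fit_transform(reviews)
clf = LinearSVC(C=1.0).fit(X, labels)             # default penalty parameter C
print(clf.predict(vectorizer.transform(["boring film with wonderful acting"])))
```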

Logistic Regression

In this work, we use logistic regression (LR) to rate sentiments on an ordinal scale of one to five, alongside simple binary classification. This problem is called ordinal regression in machine learning [5]. It lies in between simple binary classification and metric regression.
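As a simple sketch, the one-to-five rating task can be set up with an off-the-shelf logistic regression model, here treating the star ratings as classes; ordinal-aware variants refine this basic setup. The toy data and scikit-learn model are illustrative assumptions, not the exact configuration used in the thesis.

```python
# A minimal sketch of grading reviews on a one-to-five scale with logistic
# regression; toy data, treating the star ratings as class labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["awful stay", "poor service", "average room",
           "good location", "excellent hotel"]
ratings = [1, 2, 3, 4, 5]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = LogisticRegression(max_iter=1000).fit(X, ratings)
print(model.predict(vectorizer.transform(["good hotel"])))
```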


Chapter 4

Evaluation

Table 4.6 State-of-the-art accuracy values on PDS1

Author and Literature    Classifier   Accuracy
Pang and Lee             SVM          87.2
Garg et al.              SVM          90.2
Matsumoto et al.         SVM          93.7
Aue and Gamon            SVM          90.5
Kennedy and Inkpen       SVM          86.2

RSUMMLS. This could be because SDS was also from the movie domain, so using lexical similarity indeed helped.

Among all three variants, RSUMMCO didn't fare well, both in isolation and in combination with MI and FDR, for each feature representation of reviews. Its accuracy value in isolation was 3.4% below RSUMMLS and 2% below RSUMMPE. In this method too, applying MI and FDR increased the accuracy, but not to the extent of RSUMMLS and RSUMMPE. This could be due to the inadequacy of the meta information we incorporated to score subjectivity. Mining such information from a review may yield better results with this method, but as mentioned earlier, mining the meta information is beyond the scope of this thesis.

RSUMMLS, in combination with FDR as the feature selection method and bigrams as features, reported the maximum accuracy on PDS1 in our methodology. Using unigrams as features, RSUMMPE performed better than the other variants with FDR as the feature selection method (89.2%). The best accuracy value using both unigrams and bigrams as features was reported by RSUMMPE in combination with FDR. Table 4.6 shows the accuracy values reported by researchers to date on PDS1.

To the best of our knowledge, the highest accuracy value previously reported on PDS1 was 93.7%, by Matsumoto et al. [64]. We reported a maximum accuracy of 94.9%, an increase of 1.2%, using a combination of lexical similarity (RSUMMLS) and FDR as the feature selection

method. Their methodology included extracting clauses and generating word sub-sequences from the extracted clauses. They then used a POS tagger to prune sub-sequences matching patterns that do not contribute to subjectivity, with a minimum support threshold of two for unigrams and bigrams and ten for higher order sub-sequences. In addition to sub-sequences, they used dependency parsing techniques to extract phrases and used them as features for classification.

Our methodology was rather simple compared to theirs. First, we decomposed a review into a subjective extract and then applied feature selection methods on n-gram models as feature vectors for classification. Throughout our approach, we depended on frequency-based approaches and corpus statistics, and reported an accuracy comparable to that of Matsumoto et al. [64]. This supported our other claim: that sentiment classification can be done using simple approaches rather than complex patterns and linguistic resources.

Pang et al. in [69] reported an accuracy of 87.2% on PDS1 using unigrams as features and SVM as the classifier. Their assumption was that sentence level subjectivity detection improves document sentiment classification, and they supported it with their results (82.6% to 87.2%). But their approach was more inclined towards extracting subjectivity using contextual information. They assumed that sentences in proximity share the same subjectivity status. Their subjectivity estimation was based on the individual probabilities of each sentence from a naive-bayes classifier trained on SDS; in addition, contextual information scoring the proximity between sentences was also used. Hence, our approach was clearly different from theirs. Our maximum accuracy was 94.9% using bigrams as features, an increase of about 7% over the accuracy reported by them.

4.3 Multi-variant Classification

The baseline (BL) for our multi-variant classification system uses a unigram representation of the full review, with no RSUMM and no feature selection methods. Stefano et al. in [5]


Table 4.7 Results obtained by Stefano et al. on PDS2 for their different feature representations, with MV as the feature selection method

Features              MAEµ    MAEM
BOW                   0.682   1.141
BOW+Expr              0.456   0.830
BOW+Expr+sGI          0.448   1.165
BOW+Expr+sGI+eGI      0.437   0.942

emphasized that it may not be raw unigrams alone: sometimes larger text units play a major role in determining the orientation of sentiment. From the above experiments, it was evident that using larger text units like bigrams (BI) as features for classification made the system perform better. Hence, we stick to the assumption, as stated in [5], that larger text units enhance the performance of classification. They used the GI lexicon and a POS tagger to extract larger text units, but we use RSUMM with MI and FDR as an alternative to using linguistic resources.

They extracted text units that contribute to subjectivity using a rule-based approach with the help of a POS tagger. Some of the patterns include Art JJ NN, NN VB JJ, etc. They called the text units that follow these patterns expressions (Expr). Aggregating patterns was done using the GI lexicon. For example, text units like "great location" and "good location" are aggregated as [positive] location. They called this way of aggregating text units a simple GI expression (sGI). There was also a more complex way of aggregating text units, called an enriched GI expression (eGI): the above text units are aggregated as [Strong] [Positive] location and [Virtue] [Positive] location respectively.

They used minimum variance (MV) as the feature selection method to select important features. The results obtained by [5] are reported in Table 4.7; lower values indicate more accurate prediction. The baseline for their system used bag-of-words (BOW) as the feature vector representation of the review with the ε-SVR method.


Table 4.8 CV accuracies on PDS2 for different feature representations using the total review, with LR as the classification method

Features      MAEµ    MAEM
BL            0.580   0.807
BL+BI         0.540   0.897
BL+BI+TRI     0.528   0.969

Table 4.9 CV accuracies on PDS2 for different feature representations using RSUMMCO

Features               MAEµ    MAEM
BL+RSUMMCO             0.598   0.898
BL+RSUMMCO+BI          0.495   0.921
BL+RSUMMCO+BI+TRI      0.473   0.992

They divided PDS2 randomly into 75% for training and 25% for testing. No cross-validation test was done, hence the reported values lacked statistical validation.

4.3.1 Results

We limited ourselves to using up to trigrams (TRI) in multi-variate classification. Table 4.8 shows our results for various feature representations on PDS2 using the full review and logistic regression as the classification method.

We did n’t apply RSUMMLS and RSUMMPE methods for scoring subjectivity, as SDS

contained subjective and objective sentences from movie review domain. Sentiment anal-

ysis is highly domain dependent and features from one domain would not work in other

domains. It was discussed already in Chapter. 2. Due to the meta information available

along with reviews in PDS2, we used RSUMMCO to score subjectivity of each sentence in


Table 4.10 CV accuracies on PDS2 for different feature representations using the ADF metric

Features          MAEµ    MAEM
BL+ADF            0.585   0.758
BL+BI+ADF         0.531   0.776
BL+BI+TRI+ADF     0.532   0.705

Table 4.11 CV accuracies on PDS2 for different feature representations using RSUMMCO with MI and FDR

                       MI               FDR
Features             MAEµ    MAEM    MAEµ    MAEM
BL+RSUMMCO           0.569   0.827   0.560   0.847
BL+RSUMMCO+BI        0.431   0.781   0.435   0.822
BL+RSUMMCO+BI+TRI    0.444   0.842   0.477   0.870

We set 'X' to 80% in our case to obtain the subjective extract, and then applied MI and FDR on it.

We used the ADF metric as a conditional criterion: we associated two or more words only if each has document frequency greater than the ADF of the collection PDS2 (ADFPDS2). We also applied the ADF metric on unigrams as a feature selection method. For example, consider text units like "had a great time", "decent location" and "hotel was very nice". We extracted features like [great time], [decent location] and [hotel very nice], provided each unigram has document frequency greater than ADFPDS2. Table 4.10 shows the effect of applying the ADF metric on unigrams and as a conditional criterion for bigrams and trigrams. The accuracy values of RSUMMCO are reported in Table 4.9, and the results after applying MI and FDR on the RSUMMCO extract are reported in Table 4.11.
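A minimal sketch of the ADF conditional criterion, under the assumption that ADF denotes the collection's average document frequency (our reading of the metric); the function names and toy data are illustrative.

```python
# A minimal sketch of the ADF conditional criterion, assuming ADF is the
# collection's average document frequency; names are illustrative.
from collections import Counter

def adf_ngrams(docs, max_n=3):
    """docs: list of token lists; returns n-grams whose unigrams all pass ADF."""
    df = Counter(t for doc in docs for t in set(doc))
    adf = sum(df.values()) / len(df)              # average document frequency
    feats = set()
    for doc in docs:
        for n in range(2, max_n + 1):
            for i in range(len(doc) - n + 1):
                gram = tuple(doc[i:i + n])
                if all(df[t] > adf for t in gram):
                    feats.add(gram)
    return feats

docs = [["great", "hotel"], ["great", "location"], ["great", "hotel", "staff"]]
print(adf_ngrams(docs))   # only n-grams of frequent-enough unigrams survive
```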


Table 4.12 CV accuracies on PDS2 for different feature representations using the naive-bayes classifier, with MI as the feature selection method

Features      MAEµ    MAEM
UNI           0.496   0.524
UNI+BI        0.439   0.503
UNI+BI+TRI    0.440   0.444

In addition to LR, we also used a naive-bayes classifier with different feature representations and MI as the feature selection method. Bigram and trigram features were obtained using the ADF conditional criterion as described above. The results of this experiment are reported in Table 4.12.

4.3.2 Discussion

We used unigrams, bigrams, trigrams and combinations of them as features for rating reviews on a scale of one to five. We obtained a best MAEµ value of 0.431, very much comparable to the 0.437 that Stefano et al. [5] obtained using their subjective feature extraction method based on linguistic resources; a relative improvement of 1.5% in MAEµ using our approach. This best MAEµ value was obtained using RSUMMCO for the subjective extract, in combination with MI as the feature selection method.

The baseline MAEµ value using unigrams as features on the total review was 0.580, so there was a relative improvement of 25.7% over BL with unigrams as features, which is significant. However, unigrams in combination with MI, FDR or ADF as the feature selection method performed slightly below the baseline. This could be due to the aggressive threshold of 10% that we used to select the final feature set. It also supported our assumption that sometimes larger text units like bigrams and trigrams enhance the performance of classification. In each case, bigrams in combination with unigrams, and trigrams in combination with unigrams and bigrams, performed better than BL from the MAEµ perspective. Since the dataset


was fairly large compared to PDS1, we went up to trigrams.

The best MAEM value obtained by Stefano et al. [5] was 0.830. We did not obtain significant results in this regard using RSUMMCO with MI and FDR as feature selection methods; combining bigrams and trigrams with unigrams also degraded the classification performance. This strongly conveyed that the usefulness of higher order n-grams depends on the size of the dataset. As PDS2 was highly skewed towards labels four and five, our co-occurrence-based filtering methodology didn't classify the samples with labels one, two and three accurately. But using MAEµ we were able to produce good results, which conveyed that the densely populated labels were classified better using our methodology.

We obtained better MAEM values using ADF as the feature selection method for unigrams and as a conditional criterion for obtaining bigrams and trigrams. A relative improvement of about 14.4% over BL was obtained using a combination of unigrams, bigrams and trigrams as features. The naive-bayes classifier, in combination with MI as the feature selection method, performed better at classifying the labels one, two and three; it obtained better MAEM values than the LR method, with a best value of 0.444. This conveyed that naive-bayes, which is popular in topical classification, can still be applied to multi-variate sentiment classification.

4.4 Conclusion

In this chapter, we explained how we evaluated our system: the statistics of the datasets, the cross-validation tests and the evaluation metrics. We used standard evaluation metrics like accuracy and mean-absolute-error. We implemented RSUMMLS, RSUMMPE and RSUMMCO on PDS1, and only RSUMMCO on PDS2, because of the domain dependency problem in sentiment analysis. Through our experimental results, we showed that subjective feature extraction is achievable while minimizing the use of linguistic resources. Using our methodology, we were able to achieve significant improvements on both PDS1


and PDS2 over the baseline and existing state-of-the-art approaches.


Chapter 5

Sentiment Summarization

In this chapter, we discuss how to summarize the sentiments of different users towards a particular topic. Here, we focus on summarizing the sentiments of users across multiple documents, unlike RSUMM, which focused on single-document subjective summaries. We also relate sentiment classification and sentiment summarization and show how the former helps the latter. Sentiment summarization is one application where sentiment classification can be applied.

5.1 Introduction

Automated text summarization addresses the problem of information overload by condensing the text to its essence, at a level comparable to that of the original document. A summary can be either an abstract or an extract, based on single or multiple documents. Sentiment summarization differs from traditional document summarization [80, 87] in that it has to optimize an extra property: sentiment. Although a rating is a form of summary of the text, the real essence of the sentiment is contained in the text itself. In our work, we developed a system that summarizes the sentiments of different users towards a particular topic from multiple blog posts. We view the problem of sentiment summarization as a two-stage classification problem at the sentence level: first we estimate the subjectivity, and then the polarity, of each sentence.


Most existing work in multi-document sentiment summarization focused on generating an aspect-based summary of a product. Aspect-based summarization follows two steps:

1. Extract product feature-opinion associations from sentences.

2. Prune them to generate the summary.

Hu and Liu in [43] used a POS tagger to extract product features, assuming that product features are nouns and noun phrases. They then used frequent itemset mining to prune the product features. They classified a sentence as an opinion or a fact based on these product features: a sentence with more than two features is likely to contain the sentiment of the user. They determined the polarity orientation of a sentence using a manual seed list of opinion-bearing words. They produced a summary for each feature of a product, providing evidence in the form of opinion sentences from reviews. Note that a feature of a product is different from the n-gram features discussed earlier. Researchers followed the above methodology, with different ways of extracting feature-opinion associations from customer reviews, until 2008 [111, 112, 30].

With the introduction of the opinion summarization track in the Text Analysis Conference (TAC) 2008 (http://www.nist.gov/tac/tracks/2008/index.html), extract-based opinion summarization gained popularity. The track focused on query-based opinion summarization of blog posts rather than customer reviews, and researchers developed systems and evaluated them using the TAC data, as in [46, 9].

Task Definition: The TAC 2008 opinion summarization task is defined as the automatic generation of well-organized, fluent summaries of opinions about specified targets, as found in a set of blog documents. Each summary has to address a set of complex questions about the target, where the question cannot be answered simply with a named entity (or even a list of named entities). The input to the summarization task comprises a target, some opinion-related questions about the target (see Figure 5.1) and a set of documents that contain answers to the questions. The output is a summary for each target.


Figure 5.1 Sample TAC Queries and Targets

Our summarization system is illustrated in Figure 5.2. The input to the system is a query and a set of blog posts (documents) from which the sentiment summary has to be generated. We assume that each query has a polarity orientation; we predict that orientation as positive/negative and use it as a filter. For example, the query "What features do people like about Vista?" expects the positive comments that writers expressed about the product Windows Vista to be returned in the summary. We do not look into complex queries; instead, we focus on simple queries that have either a positive or a negative orientation, as in the TAC dataset.

5.2 Classification Based Approach

We view the problem of summarizing sentiments as a two-stage classification problem at the sentence level. We split each document in the document set into sentences and predict whether each sentence is an opinion or a fact. We then determine the polarity of the opinionated sentences on a binary scale, as positive or negative.


Figure 5.2 Architecture of our sentiment summarization system


5.2.1 Training the Classifier

For training the opinion/fact classifier, we used a set of 10,000 sentences with equal numbers of sentences labeled as opinions and facts. For training the polarity classifier, we crawled about 128,000 reviews on various topics, manually rated on a scale of one to five, and used them as the training set for classifying each opinionated sentence as positive or negative. We tagged each sentence in a review as positive or negative based on the rating given at the end of the review: reviews with a rating of four or five are considered positive and the others negative. We used the Rainbow text classifier [65] to build the classification models. It has several built-in methods like naive-bayes, kNN, TFIDF, probabilistic indexing, etc. We trained the classifier using unigrams and word associations as features, with probabilistic indexing [8] as the method, since probabilistic indexing performed better than the other methods on the training data.

Word association is a simple variant of the bigram; it has nothing to do with association rule mining. We tokenize each sentence in the opinion/fact and polarity training data into words and associate each token with every other token in the sentence. The motivation behind this approach is that the opinion or polarity of a sentence is not determined by a single token; rather, it is the combination of tokens that determines it. We limit ourselves to text units of maximum size two while training the classifier.
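A minimal sketch of this word-association feature extraction; the function name and separator are our own illustrative choices.

```python
# A minimal sketch of the word-association features: every token in a sentence
# is paired with every other token, alongside the plain unigram features.
from itertools import combinations

def association_features(sentence):
    tokens = sentence.lower().split()
    feats = list(tokens)                                  # unigram features
    feats += [a + "__" + b for a, b in combinations(tokens, 2)]
    return feats

print(association_features("the hotel was very nice"))
```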

5.2.2 Polarity Estimation

We define a metric, polarity estimation (PE), that estimates the polarity score of a sentence for a particular orientation. The orientation of the query is used as a filter: if the query focuses on the positive aspects of a product, then we estimate the polarity for sentences labeled as positive by the polarity classifier, and vice versa. We use the scores returned by the Rainbow classifier to compute the PE of a sentence for the query orientation, as shown in Eq. 5.1. The polarity orientation of the query is also determined using the polarity classifier. The


smoothing parameters in Eq. 5.1 were set intuitively to 0.3 and 0.7 respectively.

\[ PE(S \mid C) = 0.3 \times P_{PI}(S \mid O) + 0.7 \times P_{PI}(S \mid C) \tag{5.1} \]

where PE(S|C) denotes the polarity estimate of an opinion sentence S, PPI(S|O) denotes the probability of the sentence being an opinion, and PPI(S|C) denotes the probability of the sentence belonging to class C, returned by the opinion/fact classifier and the polarity classifier respectively; C can be positive or negative depending on the query.

5.2.3 Final Ranking

In addition to the polarity estimate metric, we rank sentences using two other metrics: a query-dependent (QD) metric and a query-independent (QI) metric. The query-dependent metric boosts the sentences that are more relevant to the query, as described in [75]. The query-independent metric picks the most informative sentences using relevance-based language modeling [47]; it uses KL divergence [56] to estimate the importance of a sentence by observing its likelihood in the relevant and irrelevant distributions respectively. The final score of a sentence is a linear combination of the above three metrics, as shown in Eq. 5.2.

\[ FS(S) = \lambda_1\, QI(S) + \lambda_2\, QD(S) + \lambda_3\, PE(S \mid C) \tag{5.2} \]

where QI(S), QD(S) and PE(S|C) are the query-independent, query-dependent and polarity scores of sentence S respectively, and λ1, λ2 and λ3 are the smoothing parameters for each metric.
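A minimal sketch of this final scoring, Eqs. 5.1 and 5.2 combined. The qi/qd scores and classifier probabilities are hypothetical stand-ins for the components described above; the lambda defaults are the values chosen in Section 5.3.3.

```python
# A minimal sketch of Eqs. (5.1)-(5.2); inputs are hypothetical stand-ins for
# the retrieval scores and classifier probabilities described above.
from collections import namedtuple

Sent = namedtuple("Sent", "text qi qd p_opinion p_class")

def polarity_estimate(s):
    return 0.3 * s.p_opinion + 0.7 * s.p_class                # Eq. (5.1)

def final_score(s, l1=0.35, l2=0.25, l3=0.4):
    return l1 * s.qi + l2 * s.qd + l3 * polarity_estimate(s)  # Eq. (5.2)

candidates = [Sent("great battery life", 0.6, 0.7, 0.9, 0.8),
              Sent("released in 2007", 0.5, 0.2, 0.1, 0.5)]
for s in sorted(candidates, key=final_score, reverse=True):
    print(round(final_score(s), 3), s.text)
```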

5.3 Experiments

5.3.1 Dataset

We evaluated our approach to summarizing sentiments on the TAC 2008 opinion summarization task dataset. The dataset has 25 topics, and each topic has one or two squishy-list


questions and a set of documents (blog posts) where the answers are likely to be found. A descriptive answer is expected for a squishy-list question. In this task, we have to preserve an extra property, sentiment, in the summary to the maximum extent. The questions focus on either the positive or the negative aspects of a topic.

5.3.2 Evaluation Metrics

We evaluated our system using the "nugget judgements" provided for each topic in TAC. Each judgement has a nugget score, or weight, used to judge the quality of the summary; the nugget judgement with the maximum weight is considered the most relevant. Judgements were provided for only 22 of the 25 topics, so we evaluated our system on those 22 only. We used the evaluation metrics Nugget Recall (NR), Nugget Precision (NP) and F-Measure, conforming to standard TAC practices. Sentences with an overlap of at least 40% are considered redundant and subsequently discarded.

\[ \mathrm{NR} = \frac{\text{sum of weights of nuggets returned in the summary}}{\text{sum of weights of all nuggets related to the topic}} \tag{5.3} \]

\[ \mathrm{NP} = \mathrm{Allowance} / \mathrm{Length} \tag{5.4} \]

where Allowance = 100 × the number of nuggets returned in the summary, and Length = the number of non-whitespace characters in the summary.

\[ \text{F-Measure} = \frac{(1 + \beta^2) \times \mathrm{NP} \times \mathrm{NR}}{\beta^2 \times \mathrm{NP} + \mathrm{NR}}, \text{ with } \beta = 1 \tag{5.5} \]
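A minimal sketch of Eqs. 5.3-5.5; the nugget weights and summary string are hypothetical inputs.

```python
# A minimal sketch of Eqs. (5.3)-(5.5); inputs are hypothetical nugget weights
# and a summary string.
def nugget_scores(returned_weights, all_weights, summary, beta=1.0):
    nr = sum(returned_weights) / sum(all_weights)          # Eq. (5.3)
    allowance = 100 * len(returned_weights)
    length = len("".join(summary.split()))                 # non-whitespace chars
    np_score = allowance / length                          # Eq. (5.4)
    f = (1 + beta**2) * np_score * nr / (beta**2 * np_score + nr)  # Eq. (5.5)
    return nr, np_score, f

print(nugget_scores([1.0, 0.5], [1.0, 0.5, 0.7, 0.3], "Vista boots fast. " * 20))
```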

5.3.3 Results

We chose the values of λ1, λ2 and λ3 in Eq. 5.2 as 0.35, 0.25 and 0.4 respectively, after some manual tuning of the weights. The values presented in Table 5.1 are the averages over the 22 topics. The average F-measure score obtained using our approach is better than many of the systems submitted to TAC 2008: out of the thirty-six runs submitted to the task, only nine performed better than ours, with the best being 0.489.


Table 5.1 Average NR, NP and F-Measure values over 22 topics

Evaluation Metric   Avg. Score
NR                  0.287
NP                  0.164
F-Measure           0.209

5.4 Conclusion

In this work, we presented a general overview of a sentiment summarization system and showed how sentiment classification helps in summarizing sentiments. Our summarization system focused on extract-based summaries, unlike previous systems that are aspect-based. We built two classifiers that classify each sentence in a document as an opinion or a fact, and as positive or negative, respectively, using unigrams and word associations as features. We estimated the polarity of a sentence using the classifier scores and combined it with QI and QD in the final scoring of a sentence. We also handled redundancy while generating the summary, based on an overlap threshold. Sentiment summarization, particularly extract-based, is still at an early stage of research, and we believe our approach is a step in the right direction towards exploring more novel methods.


Chapter 6

Conclusion

Sentiment classification can be treated as a special case of topical classification applied to the subjective portions of a document. In this thesis, we discussed the problem of document sentiment classification and its key component, subjective feature extraction. We discussed the challenges in extracting subjectivity, existing approaches and their limitations. Though many techniques were proposed in the last decade for extracting subjective features from a document, there are still many open problems that need to be addressed. Most of the proposed methods relied heavily on linguistic resources like sentiment lexicons and complex patterns based on POS information, making the task resource dependent and complex. It requires a lot of human effort to develop such tools to analyse sentiments in various domains and languages; hence, extending these resource dependent approaches to various domains and languages is not a feasible solution. This motivated us to conduct research on methodologies that require minimal use of linguistic resources and yet achieve comparable or better results.

Also, most existing sentiment analysis systems predict the polarity on a binary scale. However, in real world applications, expressed sentiment is too complex to be simply binary. Hence, we conducted experiments to predict the sentiment on a multi-variant scale of one to five, popularly known as a star rating on the web. We adopted a filtering methodology to derive subjective features and used supervised learning


approaches to analyse the overall sentiment of a document. We proposed a method called RSUMM, in combination with well known feature selection techniques, to extract subjective features. RSUMM is based on information retrieval models: techniques similar to the vector space model, the unigram language model and the term co-occurrence model were used to estimate the subjectivity of each sentence.

6.1 Contributions

Current approaches in sentiment analysis lie at the crossroads of NLP and IR, where subjective feature extraction has been highly dominated by linguistic resources. In this thesis, we attempted to move away from using language resources and investigated approaches that make the task of subjective feature extraction "resource independent". This is the major contribution of this thesis. We approached the problem by following a two-step filtering methodology and conducted experiments to predict the sentiment in customer reviews. The basis for this methodology was our manual analysis of the web, where we saw many reviews with little subjective content compared to the total content.

• We proposed a method called RSUMM to extract subjective sentences from a document. We estimated the subjectivity of each sentence using three variants of RSUMM: RSUMMLS, RSUMMPE and RSUMMCO. All the variants of RSUMM were based on information retrieval models; we used techniques similar to the vector space model, the unigram language model and the term co-occurrence model to estimate subjectivity. We obtained an extract of each review retaining its most subjective sentences, and used the subjective extract, rather than the full review, to predict the sentiment orientation.

• We used n-gram models to convert a sentence into a feature vector for classification. As n-gram modeling was done on sentences, there could be a lot of irrelevant features that need to be filtered. We used two state-of-the-art feature selection methods, mutual information and Fisher discriminant ratio, to remove them.

• The logistic regression (LR) method has been widely used in patent information retrieval and in applications where human preferences play a major role, like grading a student. To the best of our knowledge, it was used here for the first time in sentiment classification, for calibrating customer satisfaction. We didn't go into the internals of this method; rather, we focused on deriving features from the document and presenting them as feature vectors.

• We also worked on an application of sentiment classification: sentiment summarization. We summarized the sentiment in multiple blog posts related to a topic following a classification-based approach. We adopted a two-stage classification procedure to summarize blog posts at the sentence level: each sentence was scored for its subjectivity and polarity based on the classifier scores, and we used a linear combination of the scores for the final ranking.

We conducted experiments on standard datasets used by many researchers in sentiment classification and summarization. The classification datasets were from the hotel and movie review domains; the dataset we used for evaluating our summarization system contained blog posts on different topics. We evaluated our methodology using standard evaluation metrics in both classification and summarization: accuracy and mean-absolute-error (both micro and macro versions) for classification, and precision, recall and F-measure for evaluating the opinion summaries.

We reported the results of our experiments on sentiment classification and summarization in Chapter 4 and Chapter 5 respectively. We were able to achieve good accuracy values while classifying sentiments on both binary and multi-variant scales; our results were on par with or better than the state-of-the-art approaches that used linguistic resources for extracting subjective features. We evaluated our summarization system on the TAC 2008 blog dataset and obtained better performance using our classification methodology than many systems that participated in TAC.


6.2 Applications

In this section, we discuss some real world applications of sentiment classification.

6.2.1 Products comparison

Many people use the web to decide whether or not to recommend a product. Online merchants ask customers to review their products and are very curious about their judgements. Researchers are also focusing on automatically classifying people's views on a product as recommended or not [72, 18, 89]. A product has several aspects on which people comment, and it may have shortcomings in one aspect and merits in another [66, 86]. Analysing these sentiments in the text and coming up with a comparison of customers' opinions on different products at a single glance (a rating) can really facilitate better information access for merchants and others. The comparison of products on the web can enable people to easily gather marketing intelligence and product benchmarking information.

Liu et al. in [61] proposed a novel framework to analyse and compare customer opinions on several competing products, and implemented a prototype system called Opinion Observer. The process involves two steps: 1) identify the product features or aspects that users have commented upon; 2) for each extracted feature, identify the semantic orientation of the sentiment. They presented the comparison output in the form of a visual summary for each feature (aspect) of the product.

6.2.2 Sentiment summarization

The number of reviews that a product receives is increasing rapidly on the web, and popular products are heavily commented on. Some reviews are long, contain little opinion information, or are redundant. This makes it hard for a potential customer, and also for product manufacturers, to make an informed decision. Sentiment summarization summarizes the opinions by predicting the polarity of the sentiment in the text, quantifying


the sentiment, and extracting the relations between entities [55, 74]. With a sentiment summary, a customer or a manufacturer gets a complete overview of what different people are saying about a particular product. We conducted experiments on this application of sentiment classification.

6.2.3 Opinion reason mining

Opinion reason mining is another area where sentiment classification can be applied. In this area of research, people do a critical, in-depth analysis of opinion assessment. Consider, for example, "What are the reasons for the popularity of Windows 7?". For such queries, simply returning some 150 positive reviews of Windows 7 and some 50 reviews with negative polarity is not sufficient. Reasons such as "The product is popular for its look and feel, and boot time." convey an in-depth assessment to the customer. In this application, sentiment classification is used to come up with a general overview of the pros and cons of a product, along with the exact reasons for them.

6.2.4 Other Applications

Online message sentiment filtering, sentiment web search engines, e-mail sentiment detection and web blog author sentiment prediction are other applications of sentiment classification.

6.3 Future Directions

Our approach can be considered a building block for investigating subjective feature extraction methods that require minimal use of linguistic resources. In this thesis, we explored two simple metrics (ADF, ASM) and methods based on information retrieval models (probabilistic estimates, term co-occurrence) for estimating subjectivity. In future, one can explore more novel metrics and models for subjective feature extraction and conduct experiments.


We employed two state-of-the-art feature selection methods but didn't explore other feature selection methods further. This can be treated as one of the possible future directions, particularly in multi-variate classification.

From the results reported in the above chapters, we believe that our methodology is a right step in the direction of investigating subjective feature extraction approaches that use statistical means. Due to the unavailability of large, standard annotated datasets in different languages, we conducted experiments on datasets used by many researchers in sentiment classification so as to compare our results. Based on the accuracy values we obtained, we are fairly confident that our approach reduced the "resource dependency problem" in subjective feature extraction. In future, one can extend this methodology to conduct experiments on analysing sentiments in regional languages and on very large datasets.

We followed a naive classification-based approach at the sentence level for summarizing opinions in blog posts. In sentiment summarization, aspect-based summarization is highly popular compared to extract-based summarization, though the latter is now gaining popularity. We focused on appreciating the need for a sentiment summarization system, the difference between normal text summarization and sentiment summarization, and the relation between sentiment classification and summarization. Our methodology can be further improved in future to include more novel techniques for summarizing sentiments.


Bibliography

[1] A. Andreevskaia and S. Bergler. Mining WordNet for a fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In proceedings of EACL, 2006.

[2] S. Argamon, M. Koppel, and G. Avneri. Routing documents according to style. In proceedings of the first international workshop on innovative information systems, 1998.

[3] A. Aue and M. Gamon. Customizing sentiment classifiers to new domains: A case study. In proceedings of RANLP, 2005.

[4] B. Avrim and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In proceedings of 18th ICML, pages 19–26, 2001.

[5] S. Baccianella, A. Esuli, and F. Sebastiani. Multi-facet rating of product reviews. In proceedings of the European Conference on Information Retrieval, ECIR, pages 461–472, 2009.

[6] Ricardo Baeza-Yates and Berthier Riberio-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., 2002.

[7] P. Beineke, T. Hastie, and S. Vaithayanathan. The sentimental factor: Improving review classification via human-provided information. In proceedings of 42nd ACL, 2004.

[8] A. Bookstein and D.R. Swanson. Probabilistic methods for automatic indexing. Journal of ASIS, Vol. 25:312–319, 1974.

[9] A. Bossard, M. Genereux, and T. Poibeau. CBSEAS, a summarization system: integration of opinion mining techniques to summarize blogs. In proceedings of EACL, pages 5–8, 2009.

[10] E. Brill. Transformation based error-driven learning and natural language processing. Computational Linguistics, Vol. 21:pp. 543–565, 1995.

[11] A. Budanitsky and G. Hirst. Semantic distance in WordNet: An experimental application-oriented evaluation of five measures. In proceedings of the NAACL workshop on WordNet and other lexical resources, 2001.


[12] J.A. Bullinaria. Semantic categorization using simple word co-occurrence statistics. In proceedings of the ESSLLI workshop on Distributional Lexical Semantics, pages 1–8, 2008.

[13] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[14] P. Chaovalit and L. Zhou. Movie review mining: A comparison between supervised and unsupervised classification approaches. In proceedings of the 38th Hawaii international conference on system sciences, pages 1–9, 2005.

[15] E. Charlotta. Topic dependence in sentiment classification, 2004.

[16] H. Cui, V. Mittal, and M. Datar. Comparative experiments on sentiment classification for online product reviews. In proceedings of the American Association for Artificial Intelligence, AAAI, pages 1265–1270, 2006.

[17] J.R. Curran. Ensemble methods for automatic thesaurus extraction. In proceedings of EMNLP, pages 222–229, 2002.

[18] S.R. Das and M. Chen. Yahoo! for Amazon: Sentiment parsing from small talk on the web. In proceedings of the 8th Asia Pacific finance association annual conference, 2001.

[19] K. Dave, S. Lawrence, and D. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In proceedings of WWW, 2003.

[20] A. Devitt and C. Vogel. The topology of WordNet: Some metrics. In proceedings of the global WordNet conference, GWC, 2004.

[21] M. Dimitrova, A. Finn, N. Kushmeric, and B. Smyth. Web genre visualization. In proceedings of the conference on human factors in computing systems, 2002.

[22] S.D. Durbin, J. Neal Richter, and D. Warner. A system for effective rating of texts. In proceedings of OTC-3, workshop on operational text classification, 2003.

[23] P. Edmonds. Semantic representations of near-synonyms for automatic lexical choice. PhD thesis, University of Toronto, 1999.

[24] P. Edmonds and G. Hirst. Near-synonymy and lexical choice. Computational Linguistics, Vol. 28:pp. 105–144, 2002.

[25] M. Efron. Cultural orientation: Classifying subjective documents by cociation analysis. In proceedings of the AAAI fall symposium on style and meaning in language, pages 41–48, 2004.


[26] A. Esuli and F. Sebastiani. Determining the semantic orientation of terms through gloss classification. In proceedings of CIKM, 2005.

[27] A. Esuli and F. Sebastiani. Determining the term subjectivity and term orientation for opinion mining. In proceedings of the 11th conference of the European chapter of the association for computational linguistics, EACL, 2006.

[28] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In proceedings of LREC 2006, 2006.

[29] Z. Fei, J. Liu, and G. Wu. Sentiment classification using phrase patterns. In proceedings of the 4th international conference on computer and information technology, CIT, 2004.

[30] O. Feiguina and G. Lapalme. Query based summarization of customer reviews. In proceedings of Canadian AI, pages 452–463, 2007.

[31] C. Fellbaum. WordNet: An electronic lexical database. MIT Press, 1998.

[32] R.M. French and C. Labiouse. Four problems with extracting human semantics from large text corpora. In proceedings of the 24th annual conference of the cognitive science society, pages 316–322, 2002.

[33] M. Gamon. Sentiment classification on customer feedback data: Noisy data, large feature vectors and the role of linguistic analysis. In proceedings of the 20th international conference on computational linguistics, pages 841–847, 2004.

[34] M. Gamon and A. Aue. Automatic identification of sentiment vocabulary exploiting low association with known sentiment terms. In proceedings of the ACL workshop on feature engineering in machine learning in NLP, 2005.

[35] N.S. Glance, M. Hurst, and T. Tomokiyo. BlogPulse: Automatic trend discovery for weblogs. In proceedings of the WWW workshop on the weblogging ecosystem: Aggregation, analysis and dynamics, 2004.

[36] G. Grefenstette. Explorations in automatic thesaurus discovery. Kluwer Academic Press, 1994.

[37] G. Grefenstette, Y. Qu, J.G. Shanahan, and D.A. Evans. Coupling niche browsers and affect analysis for an opinion mining application. In proceedings of RIAO-04, pages 186–194, 2004.

[38] U. Gretzel and K.Y. Yoo. Use and impact of online travel reviews. In proceedings of the 2008 International Conference on Information and Communication Technology, pages 35–46, 2008.

[39] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kauffman, 2001.


[40] V. Hatzivassiloglou and K.R. McKeown. Predicting the semantic orientation of adjectives. In proceedings of 35th ACL, 1997.

[41] V. Hatzivassiloglou and J. Wiebe. Effects of adjective orientation and gradability on sentence subjectivity. In proceedings of the 18th international conference on computational linguistics, 2000.

[42] D. Hillard, M. Ostendorf, and E. Shriberg. Detection of agreement vs disagreement in meetings: Training with unlabeled data. In proceedings of HLT/NAACL, 2004.

[43] M. Hu and B. Liu. Mining and summarizing customer reviews. In proceedings of SIGKDD, 2004.

[44] M. Hu and B. Liu. Mining opinion features in customer reviews. In proceedings of AAAI, pages 755–760, 2004.

[45] D.J. Inkpen, O. Feiguina, and G. Hirst. Generating more positive and more negative text. Computing attitude and affect in text: Theory and applications. The Information Retrieval Series, Vol. 20:pp. 187–196, 2004.

[46] G.C. Jack, L.L. Jochen, F. Schilder, and K. Ravi. Query-based opinion summarization for legal blog entries. In proceedings of ICAIL, 2009.

[47] J. Jagadeesh, P. Prasad, and Vasudeva Varma. A relevance-based language modeling approach to DUC 2005. In working notes of DUC, 2005.

[48] J. Justeson and K. Slava. Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering, Vol. 1:pp. 9–27, 1993.

[49] J. Kamps, M. Marx, R.J. Mokken, and M. de Rijke. Using WordNet to measure semantic orientation of adjectives. In proceedings of LREC, pages 1115–1118, 2004.

[50] A. Kennedy and D. Inkpen. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, Vol. 22:pp. 110–125, 2006.

[51] B. Kessler, G. Nunberg, and H. Schautze. Automatic detection of text genre. In proceedings of 35th ACL, pages 32–38, 1997.

[52] S.-M. Kim and E. Hovy. Determining the sentiment of opinions. In proceedings of COLING, pages 1363–1373, 2004.

[53] S.-M. Kim and E. Hovy. Automatic detection of opinion bearing words and sentences. In proceedings of IJCNLP, 2005.

[54] N. Kobayashi, T. Inui, and K. Inui. Dictionary based acquisition of the lexical knowledge for p/n analysis (in Japanese). In proceedings of the Japanese society for artificial intelligence, pages 45–50, 2001.


[55] W. Ku, i, L-Y. Lee, T. Wu, and H-H. Chen. Major topic detection and its applicationto opinion summarization. In proceedings of SIGIR., pages 627–628, 2005.

[56] S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathemat-ical Statistics., Vol. 22, 1951.

[57] J. Laffetry, A. McCallum, and F. Pereira. Conditional random fields: Probabilisticmodels for segmenting and labeling of sequence data. In proceedings of ICML.,2001.

[58] Tjen-Sien Lim, Wei-Yin Loh, and Yu-Shan Shih. A comparison of prediction accu-racy, complexity, and training time of thirty-three old and new classification algo-rithms. In Machine Learning., pages 203–228, 2000.

[59] D. Lin. Automatic retrieval and clustering of similar words. In proceedings ofCOLING-ACL., pages 768–774, 1998.

[60] B. Liu. Sentiment analysis and subjectivity. Handbook of Natural Language Pro-cessing, 2010.

[61] B. Liu, M. Hu, and J. Cheng. Opinion observer: Analyzing and summarizing opin-ions on the web. In proceedings of WWW., pages 10–14, 2005.

[62] H. Liu, H. Lieberman, and T. Selker. A model of textual affect sensing using real-world knowledge. In proceedings of 8th international conference on intelligent userinterfaces., pages 125–132, 2003.

[63] R. Losee. Natural language processing in support of decision-making. phrases andpart-of-speech tagging. Information processing and management., Vol. 37:pp. 769–787, 2001.

[64] S. Matsumoto, H. Takamura, and M. Okumura. Sentiment classification using wordsub-sequences and dependency sub-tress. In proceedings of Pacific Asia Conferenceon Knowledge Discovery and Data Management, PAKDD., pages 301–311, 2005.

[65] Andrew Kachites McCallum. Bow: A toolkit for statistical language modeling, textretrieval, classification and clustering. http://www.cs.cmu.edu/ mccallum/bow, 1996.

[66] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima. Mining product reputa-tions on the web. In proceedings of ACM SIGKDD., pages 341–349, 2002.

[67] T. Mullen and N. Collier. Sentiment analysis using support vector machines usingdiverse information scores. In proceedings of EMNLP., pages 412–418, 2004.

[68] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, Vol. 39, pp. 103–134, 2000.

[69] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In proceedings of Association for Computational Linguistics, ACL, pages 271–278, 2004.

[70] B. Pang and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In proceedings of 43rd ACL, pages 115–124, 2005.

[71] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, Vol. 2, pp. 1–135, 2008.

[72] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In proceedings of EMNLP, pages 79–86, 2002.

[73] F.C.N. Pereira, N. Tishby, and L. Lee. Distributional clustering of English words. In proceedings of ACL, pages 183–190, 1993.

[74] P. Beineke, T. Hastie, C. Manning, and S. Vaithyanathan. Exploring sentiment summarization. In proceedings of AAAI spring symposium on exploring attitude and affect in text, 2004.

[75] P. Pingali, K. Rahul, and V. Varma. IIIT Hyderabad at DUC'07. In working notes of DUC, 2007.

[76] A. Rauber and A. Müller-Kögler. Integrating automatic genre analysis into digital libraries. In proceedings of 1st ACM-IEEE joint conference on digital libraries, 2001.

[77] R. Rapp. A freely available automatically generated thesaurus of related words. In proceedings of LREC, 2004.

[78] A. Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In proceedings of EMNLP, pages 133–142, 1996.

[79] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. In proceedings of EMNLP, pages 105–112, 2003.

[80] G. Salton, A. Singhal, C. Buckley, and M. Mitra. Automatic text decomposition using text segments and text themes. In proceedings of ACM conference on Hypertext, 1996.

[81] F. Salvetti, S. Lewis, and C. Reichenbach. Automatic opinion polarity classification of movie reviews. Colorado Research in Linguistics, Vol. 17, 2003.

[82] H. Schmid. Probabilistic part-of-speech tagging using decision trees. In proceedings of international conference on new methods in language processing, 1994.

[83] E. Spertus. Automatic recognition of hostile messages. In proceedings of IAAI, 1997.

[84] P. Subasic and A. Huettner. Affect analysis of text using fuzzy semantic typing. IEEE Transactions on Fuzzy Systems, Vol. 9, pp. 483–496, 2001.

[85] M. Taboada, C. Anthony, and K. Voll. Creating semantic orientation dictionaries. In proceedings of 5th LREC, 2006.

[86] M. Taboada, M.A. Gillies, and P. McFetridge. Sentiment classification techniques for tracking literary reputation. In proceedings of LREC workshop "Towards Computational Models of Literary Analysis", pages 36–43, 2006.

[87] J. Tait. Automatic Summarizing of English Texts. PhD thesis, University of Cambridge, 1983.

[88] H. Takamura, T. Inui, and M. Okumura. Extracting semantic orientation of words using spin model. In proceedings of 43rd ACL, 2005.

[89] L. Terveen, W. Hill, B. Amento, and J. Creter. PHOAKS: A system for sharing recommendations. Communications of the ACM, pages 59–62, 1997.

[90] T.T. Thet, J.-C. Na, and C.S.G. Khoo. Sentiment classification of movie reviews using multiple perspectives. In proceedings of ICADL, pages 184–193, 2008.

[91] R.M. Tong. An operational system for detecting and tracking opinions in online discussion. In proceedings of SIGIR workshop on operational text classification, 2001.

[92] P.D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In proceedings of Association for Computational Linguistics, ACL, pages 417–424, 2002.

[93] P.D. Turney and M.L. Littman. Unsupervised learning of semantic orientation from a hundred-billion-word corpus. Technical report, National Research Council, Canada, 2002.

[94] P.D. Turney and M.L. Littman. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, Vol. 21, pp. 315–346, 2003.

[95] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

[96] S. Vegnaduzzo. Acquisition of subjective adjectives with limited resources. In proceedings of AAAI spring symposium on exploring attitude and affect in text, 2004.

[97] S. Wang, D. Li, Y. Wei, and H. Li. A feature selection method based on Fisher's discriminant ratio for sentiment classification. In proceedings of WISM, pages 88–97, 2009.

[98] J. Wiebe and E. Riloff. Creating subjective and objective sentence classifiers from unannotated texts. In proceedings of 6th international conference on intelligent text processing and computational linguistics, 2005.

[99] C. Whitelaw, S. Argamon, and N. Garg. Using appraisal taxonomies for sentiment analysis. In proceedings of first computational systemic functional grammar conference, 2005.

[100] C. Whitelaw, N. Garg, and S. Argamon. Using appraisal groups for sentiment analysis. In proceedings of CIKM, pages 625–631, 2005.

[101] J. Wiebe. Learning subjective adjectives from corpora. In proceedings of AAAI, pages 735–740, 2000.

[102] J. Wiebe, R. Bruce, and T. O'Hara. Development and use of a gold standard data set for subjectivity classifications. In proceedings of 37th ACL, pages 246–253, 1999.

[104] J. Wiebe, T. Wilson, and M. Bell. Identifying collocations for recognizing opinions. In proceedings of ACL/EACL workshop on collocation, 2001.

[105] J. Wiebe, T. Wilson, and C. Cardie. Annotating expressions of opinions and emotions in language. In proceedings of LREC, 2005.

[106] Y. Wilks and M. Stevenson. The grammar of sense: Using part-of-speech tags as a first step in semantic disambiguation. Natural Language Engineering, Vol. 4, pp. 135–144, 1998.

[107] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In proceedings of HLT/EMNLP, 2005.

[108] Y. Yang and J.O. Pedersen. A comparative study on feature selection methods in text categorization. In proceedings of ICML, pages 412–420, 1997.

[109] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In proceedings of EMNLP, pages 129–136, 2003.

[110] Z. Zhang and B. Varadarajan. Utility scoring of product reviews. In proceedings of 15th CIKM, pages 51–57, 2006.

[111] L. Zhuang, F. Jing, and X.-Y. Zhu. Movie review mining and summarization. In proceedings of CIKM, 2006.

[112] I. Titov and R. McDonald. A joint model of text and aspect ratings for aspect-based sentiment summarization. In proceedings of ACL, pages 308–316, 2008.
