2011 IEEE 8th International Conference on e-Business Engineering (ICEBE), Beijing, China

Learning Domain-specific Sentiment Lexicons for Predicting Product Sales

Raymond Y.K. Lau and Wenping Zhang
Department of Information Systems
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong SAR
E-mail: {raylau, wzhang23}@cityu.edu.hk

Peter D. Bruza
Faculty of Science and Technology
Queensland University of Technology
Brisbane QLD 4001, Australia
[email protected]

K.F. Wong
Department of Systems Engineering & Engineering Management
Chinese University of Hong Kong
Shatin, Hong Kong SAR
[email protected]

Abstract

Generic sentiment lexicons are widely used for sentiment analysis. However, manually constructing sentiment lexicons is very time-consuming, and it may not be feasible for application domains where annotation expertise is unavailable. One contribution of this paper is the development of a statistical-learning-based computational method for the automatic construction of domain-specific sentiment lexicons to enhance cross-domain sentiment analysis. Our initial experiments show that the proposed methodology can automatically generate domain-specific sentiment lexicons that improve the effectiveness of opinion retrieval at the document level. Another contribution of our work is that we show the feasibility of applying a sentiment metric derived from the automatically constructed sentiment lexicons to predict the sales of certain product categories. Our research contributes to the development of more effective sentiment analysis systems for extracting business intelligence from the numerous opinionated expressions posted to the Web.

Keywords: Sentiment Lexicon, Sentiment Analysis, Statistical Learning, Kullback-Leibler divergence, Business Intelligence.

1 Introduction

Sentiment analysis is a multi-disciplinary research area that draws on information retrieval [15], text mining [11, 28], machine learning [16, 26], and natural language processing (NLP) [29, 30]. In the field of information retrieval (IR), sentiment analysis is seen as a special kind of document retrieval and ranking process that aims at retrieving views on certain entities, such as products, people, and organizations, rather than simply retrieving topical information about those entities [15, 33]. One sub-task commonly set in sentiment analysis is to determine the orientation (i.e., polarity) of an opinionated expression. To support research on sentiment analysis, a Blog Track has been established at the annual TREC conference to benchmark the performance of state-of-the-art opinion retrieval systems [15, 31]. Commercial sentiment analysis systems such as the Reuters NewsScope Sentiment Engine¹ can only extract sentiments for a particular domain (e.g., stock investment). To enable an automated sentiment analysis process to be applied to a wide range of application domains and languages, it is desirable that a sentiment analysis system can learn a set of domain-specific sentiment indicators (i.e., a sentiment lexicon) with minimal human intervention. The reason is that constructing sentiment lexicons manually is very labor-intensive, and the expertise required for their construction may not even be available for certain application domains.

Nevertheless, automated sentiment lexicon construction involves several fundamental research challenges, as does sentiment analysis in general. First, there is inevitably a degree of uncertainty in identifying the targeted entities and the associated sentiments expressed in natural language. Second, it is difficult to accurately determine the polarity of a sentiment across domains. For instance, while the token "small" has a negative orientation in a hotel review (e.g., "the room is small"), the same token can have a positive orientation in a computer review (e.g., "a small netbook is convenient for a business trip"). In fact, the token "small" has a strong negative sense as defined in the SentiWordNet sentiment lexicon [4].

¹ http://www.reuters.se/productinfo/newsscopesentiment/description.aspx

2011 Eighth IEEE International Conference on e-Business Engineering

978-0-7695-4518-9/11 $26.00 © 2011 IEEE

DOI 10.1109/ICEBE.2011.55

131

It is believed that sentiment analysis can extract business intelligence that helps business managers or marketers enhance product design or market planning [2, 7]. By using an automatically constructed domain-specific sentiment lexicon, we expect to be able to extract sentiment features from online reviews or comments to facilitate business processes (e.g., predicting product sales for more effective market planning). One main contribution of this paper is the development of a novel computational method that automatically constructs domain-specific sentiment lexicons using a statistical learning method based on a variant of the Kullback-Leibler (KL) divergence. Our main intuition is that there are plenty of user-labeled opinionated documents (e.g., product reviews) on the Web, and these labeled documents can serve as training examples for a statistical learning method that learns whether a particular sentiment word is likely to be positive or negative. As in personalized information retrieval, an opinionated document with a positive orientation is likely to contain more positive sentiment indicators, whereas a negative opinionated document is likely to contain more negative sentiment indicators. By computing the divergence between the probabilities of a sentiment word appearing in the positive and in the negative opinionated documents of a given domain, it is possible to automatically extract a set of positive sentiments and a set of negative sentiments to build a domain-specific sentiment lexicon.
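As a minimal illustration of this divergence intuition (not the authors' measure, which is the KC variant given in Eq. (1) of Section 4), the Python sketch below estimates, for each word, the add-one-smoothed fraction of positive and of negative reviews that contain it, and computes the KL-style term p_pos · log2(p_pos / p_neg). The function name, the smoothing scheme, and the toy reviews are illustrative assumptions.

```python
import math
from collections import Counter


def kl_terms(pos_docs, neg_docs):
    """Return {word: p_pos * log2(p_pos / p_neg)} for every word seen.

    p_pos (p_neg) is the add-one-smoothed fraction of positive (negative)
    documents containing the word; the smoothing is an illustrative choice."""
    pos_df = Counter(w for d in pos_docs for w in set(d.split()))
    neg_df = Counter(w for d in neg_docs for w in set(d.split()))
    terms = {}
    for w in set(pos_df) | set(neg_df):
        p_pos = (pos_df[w] + 1) / (len(pos_docs) + 2)
        p_neg = (neg_df[w] + 1) / (len(neg_docs) + 2)
        # Positive when the word is more typical of positive reviews.
        terms[w] = p_pos * math.log2(p_pos / p_neg)
    return terms


positive_reviews = ["great small netbook", "small and light", "great battery"]
negative_reviews = ["screen is dim", "battery died fast", "dim and slow"]
terms = kl_terms(positive_reviews, negative_reviews)
for word in ("small", "dim", "battery"):
    print(word, round(terms[word], 3))
```

On the toy data, "small" receives a positive term (about 0.951), "dim" a negative one (about -0.317), and "battery", which occurs equally often on both sides, receives 0.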

2 Related Research

Domain-specific sentiment lexicon expansion has been examined using linguistic rules [18, 19]. The basic intuition is that sentiment words are often associated with product features, and sentiment words may also be collocated with other sentiment words in a sentence. Therefore, the first propagation can be conducted by locating some common product features and then searching for adjacent words with a specific Part-Of-Speech (POS) as sentiment words. Once initial sentiment words are identified, the second propagation can be invoked to further extract the sentiment words adjacent to the initially identified ones. A dependency tree can be constructed to identify the candidate product features and sentiment words. The polarities of the automatically extracted sentiment words are also determined by linguistic rules. For instance, sentiment words connected by conjunctions such as "and", "as well", or "too" within the same sentence usually have the same polarity. The double propagation method was compared with a supervised machine learning classifier, namely Conditional Random Fields (CRF), and was shown to be more effective than CRF [18, 19]. The method proposed in this paper differs from the double propagation method in that statistical learning rather than linguistic rules is applied to construct the domain-specific sentiment lexicon. Although linguistic rules are effective in some cases, they may not cover the variety of sentiment and product feature patterns embedded in product reviews. Our proposed method is more general and can readily be applied to different languages and domains.

Another attempt has been made to build a domain-specific opinion dictionary based on a statistical learning method called divergence from randomness (DFR) [6]. The DFR approach measures the divergence between a term's probability distribution in a set of relevant and opinionated documents and its probability distribution in a set of relevant documents. However, the DFR method was used to extract opinionated terms only; the orientations of these terms were not estimated. The Average Opinionated Entropy (AOE) function was designed to facilitate the automatic construction of an opinion lexicon for a particular topic [1]. The AOE function measures the divergence of the term frequencies in the set of opinionated and relevant documents from those in the set of relevant-only documents, and its computational mechanism is underpinned by the Kullback-Leibler (KL) divergence. In a more recent study, a constrained non-negative matrix tri-factorization method was applied to extract sentiment-laden terms from one domain, and these terms were then used to predict opinion orientation in another domain [14].

Semantic orientation (SO) analysis was proposed to expand a list of seed sentiments [28]. The orientation of an arbitrary term was estimated from the strength of association between the term and fourteen seed sentiments. Pointwise mutual information (PMI) and latent semantic analysis (LSA) were used to estimate the strength of association between any pair of words based on a Web corpus. A semantic inference-based method was also explored to estimate the polarities of sentiments by expanding a list of seed sentiments using the synonym and antonym sets captured in WordNet [7]. The PMI method was also used to expand a list of seed sentiment indicators based on the training corpus of the 2008 TREC Blog Track [13].
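The PMI-based orientation estimate can be sketched as follows: a word's orientation is its summed PMI with positive seeds minus its summed PMI with negative seeds. Instead of Web corpus statistics, this sketch assumes document-level co-occurrence in a tiny in-memory corpus; the smoothing constant `eps`, the function names, and the four-seed toy example (rather than fourteen seeds) are hypothetical.

```python
import math


def pmi(a, b, docs, eps=0.01):
    """log2(P(a, b) / (P(a) * P(b))), with probabilities estimated as the
    fraction of documents containing the word(s); eps is a hypothetical
    smoothing constant that avoids log(0)."""
    sets = [set(d.split()) for d in docs]
    n = len(sets)
    p_a = sum(a in s for s in sets) / n
    p_b = sum(b in s for s in sets) / n
    p_ab = sum(a in s and b in s for s in sets) / n
    return math.log2((p_ab + eps) / (p_a * p_b + eps))


def semantic_orientation(word, pos_seeds, neg_seeds, docs):
    """Association with positive seeds minus association with negative seeds."""
    return (sum(pmi(word, s, docs) for s in pos_seeds)
            - sum(pmi(word, s, docs) for s in neg_seeds))


corpus = ["good sturdy case", "excellent sturdy build",
          "poor flimsy case", "bad flimsy hinge"]
pos_seeds, neg_seeds = ["good", "excellent"], ["poor", "bad"]
for w in ("sturdy", "flimsy"):
    print(w, round(semantic_orientation(w, pos_seeds, neg_seeds, corpus), 2))
```

"sturdy" co-occurs only with positive seeds and so receives a positive orientation, while "flimsy" receives a negative one.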

The problem of generating a list of sentiment indicators across different domains is referred to as the domain-transfer problem [26, 27]. The relative similarity ranking (RSR) method was developed to select the most informative and opinionated documents from a training set to retrain a classifier [26]. A mutual reinforcement approach underpinned by PMI was also applied to extract the hidden associations between entity feature groups and opinion indicator groups from a Chinese review corpus [24].

3 System Architecture

The general system architecture of the proposed domain-specific sentiment analysis system is depicted in Figure 1. A user first selects a product domain for sentiment analysis. Based on the selected domain, our system uses dedicated crawlers or external Web services² to retrieve product reviews. Traditional document pre-processing procedures [22] are then invoked to process the product reviews and product descriptions. Similar to previous studies, a product feature is represented by a noun or a noun compound [2, 7, 17], and sentiment words are represented by adjectives or adverbs [25]. As indicated in previous studies [18, 19], a sentiment is usually associated with a feature. By identifying product features, we aim to prune the adjective or adverb phrases that are not really sentiments. Based on the automatically constructed domain-specific sentiment lexicons, the sentiment analysis module can identify sentiment words in product reviews and determine the polarity of a review or a product. Our prototype system was developed using Java (J2SE v1.4.2), JavaServer Pages (JSP) 2.1, and Servlet 2.5. The prototype runs under Apache Tomcat 6.0, hosted on a DELL 1950 III server with 8 TB of disk space.

Figure 1. The System Architecture of a Domain-specific Sentiment Analysis System

² http://ecs.amazonaws.com/AWSECommerceService/

4 Automated Learning of Domain-specific Sentiment Lexicons

Although machine learning techniques have been explored for the expansion of hand-crafted sentiment lexicons, a large number of manually labeled training examples at the sentence level is often required to train an accurate classifier. Given the sheer volume of user-contributed and user-labeled opinionated expressions (e.g., product reviews) in the era of Web 2.0, our proposed statistical learning approach does not require additional manual annotation to generate the training data. Therefore, our method can scale up to process large volumes of reviews and can readily be applied to different problem domains. Moreover, our proposed method utilizes proven IR techniques [22] and an adapted statistical learning method [8, 10] for automatic sentiment identification and polarity prediction. These methods have been empirically tested and are efficient enough to process opinionated documents at Web scale.

First, the TFIDF measure is used to identify the most representative terms within a training review. Second, the nouns (candidate features) among the most representative terms are identified. Low-frequency product features are identified by matching the terms of a training review with the common features appearing in product descriptions. Finally, the adjectives or adverbs adjacent to the candidate features are treated as candidate sentiment words.
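A minimal Python sketch of these candidate-extraction steps follows. It assumes the review has already been pre-processed and POS-tagged with Penn Treebank tags by an external tagger, uses a textbook tf · ln(N/df) weighting rather than the exact formulation of [21], and treats only immediate neighbours as "adjacent". All function names are hypothetical.

```python
import math
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags
SENTIMENT_TAG_PREFIXES = ("JJ", "RB")    # adjectives (JJ*) and adverbs (RB*)


def tfidf_top_terms(review_terms, corpus, k):
    """Return the k highest-scoring terms of one review under tf * ln(N / df).

    Textbook TFIDF for illustration only; the review is assumed to be in corpus."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    tf = Counter(review_terms)
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf if df[t] > 0}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def candidate_pairs(tagged_review, top_terms, description_features):
    """Pair each candidate feature noun (a top TFIDF term or a known
    product-description feature) with its immediately adjacent adjectives/adverbs."""
    pairs = []
    for i, (word, tag) in enumerate(tagged_review):
        if tag not in NOUN_TAGS:
            continue
        if word not in top_terms and word not in description_features:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(tagged_review) and tagged_review[j][1].startswith(SENTIMENT_TAG_PREFIXES):
                pairs.append((word, tagged_review[j][0]))
    return pairs


review = [("small", "JJ"), ("netbook", "NN"), ("great", "JJ"), ("battery", "NN")]
corpus = [[w for w, _ in review], ["screen", "dim"], ["battery", "weak"]]
top = tfidf_top_terms([w for w, _ in review], corpus, k=3)
print(candidate_pairs(review, top, description_features={"battery"}))
# [('netbook', 'small'), ('netbook', 'great'), ('battery', 'great')]
```

Here "battery" is not among the top TFIDF terms (it also occurs in another review), so it is recovered only through the description-feature match, which mirrors the handling of low-frequency features described above.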

The specific TFIDF measure used in our prototype system follows the formulation presented in [21]. For each identified candidate product feature, the adjacent adjectives or adverbs are taken as the potential sentiment words [18, 19]. From our perspective, this is a kind of pruning method that tries to remove the adjectives or adverbs that are not really sentiments. The proposed methodology of automated domain-specific sentiment lexicon construction can be summarized as follows:

1. Domain-independent sentiment words can be extracted from existing linguistic resources to form the basis of a new sentiment lexicon; for example, the strong subjective words defined in OpinionFinder [20] can be used to infer domain-independent sentiment words;

2. The conjunctive rule [32] is applied to automatically extract additional sentiment words from an arbitrary domain and predict their polarities; the conjunctive rule states that sentiment words connected by standard conjunctions such as "and", "or", "also", etc. in the same sentence should have the same polarity;

3. A statistical learning method is applied to discover additional domain-specific sentiment words that cannot be found by the previous steps.
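The first two steps can be sketched as follows. The seed polarities stand in for entries derived from a resource such as OpinionFinder. Sentences are assumed to be tokenized and lower-cased. As a simplification, only the words immediately to the left and right of a conjunction are linked. The function name is illustrative.

```python
CONJUNCTIONS = {"and", "or", "also"}


def propagate_by_conjunction(sentences, seed_lexicon, max_rounds=10):
    """Expand a seed lexicon (word -> +1 or -1) with the conjunctive rule:
    in "x and y", an unknown word inherits the polarity of its known partner.

    Words already in the lexicon keep their polarity; propagation repeats
    until no new word is labeled (or max_rounds is reached)."""
    lexicon = dict(seed_lexicon)
    for _ in range(max_rounds):
        changed = False
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] not in CONJUNCTIONS:
                    continue
                left, right = tokens[i - 1], tokens[i + 1]
                if left in lexicon and right not in lexicon:
                    lexicon[right] = lexicon[left]
                    changed = True
                elif right in lexicon and left not in lexicon:
                    lexicon[left] = lexicon[right]
                    changed = True
        if not changed:
            break
    return lexicon


seeds = {"good": 1, "noisy": -1}
sentences = [["good", "and", "compact"], ["compact", "and", "portable"], ["noisy", "or", "clunky"]]
print(propagate_by_conjunction(sentences, seeds))
```

Because "compact" is labeled first, the second sentence can then transfer its polarity to "portable". This chaining is why the loop repeats until nothing changes.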


For the discovery of additional domain-dependent sentiment words that cannot be discovered by the conjunctive rule, a statistical learning method based on the Kullback-Leibler (KL) divergence [9] is invoked. The main task is to estimate the polarity of a domain-dependent sentiment word. Instead of asking human annotators to tag examples to train a classification function as in previous work [29, 30], we develop an automated sentiment lexicon construction method that leverages the rated reviews already available on Web 2.0 sites such as amazon.com. In the era of Web 2.0, user-contributed data is the norm, and there is a huge amount of human-labeled review data that we can leverage for statistical learning. For example, an Amazon rating of 4-5 can be regarded as positive, a rating of 1 can be taken as negative, and a mid-range rating of 2-3 is considered neutral. The basic intuition is that a positive review is more likely to contain positive sentiments than a negative review is. Therefore, we can use the label of a product review to infer the sentiment polarity of an individual product feature and sentiment pair. Figure 2 shows an example of a user-labeled product review posted to amazon.com.

Figure 2. Labeled Amazon Product Reviews
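A minimal sketch follows of the labeling rule described above (4-5 positive, 1 negative, 2-3 neutral). It also collects the document frequencies and collection sizes needed to estimate the conditional and prior probabilities in the KC measure. The input format is an assumption: each review is a rating plus its extracted feature and sentiment pairs. Neutral reviews are simply skipped.

```python
from collections import Counter


def rating_to_label(rating):
    """4-5 stars -> "pos", 1 star -> "neg", 2-3 stars -> None (neutral, unused)."""
    if rating >= 4:
        return "pos"
    if rating == 1:
        return "neg"
    return None


def count_frequencies(reviews):
    """Return df_pos, df_neg (Counters over pairs), |D+| and |D-|.

    reviews: iterable of (rating, pairs); each pair is counted at most once
    per review, i.e. as a document frequency (hypothetical input format)."""
    df_pos, df_neg = Counter(), Counter()
    n_pos = n_neg = 0
    for rating, pairs in reviews:
        label = rating_to_label(rating)
        if label == "pos":
            n_pos += 1
            df_pos.update(set(pairs))
        elif label == "neg":
            n_neg += 1
            df_neg.update(set(pairs))
    return df_pos, df_neg, n_pos, n_neg


reviews = [
    (5, [("netbook", "small")]),
    (4, [("netbook", "small"), ("screen", "bright")]),
    (1, [("screen", "dim")]),
    (3, [("netbook", "small")]),  # neutral: ignored
]
df_pos, df_neg, n_pos, n_neg = count_frequencies(reviews)
print(df_pos[("netbook", "small")], df_neg[("netbook", "small")], n_pos, n_neg)  # 2 0 2 1
```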

Based on the theory of Kullback-Leibler (KL) divergence [9], an effective measure called the Keyword Classifier (KC) has been developed to identify positive, negative, and neutral keywords representing an information seeker's positive, negative, or neutral information needs [8, 10]. The KC method has been adapted and successfully applied to induce positive and negative beliefs in adaptive information filtering [10, 23]. Instead of summing the probability terms characterizing the positive and the negative events as in the original KL divergence formulation, the KC measure subtracts the term related to the negative event from the term related to the positive event. Such a formulation corresponds to our intuition of weighing the positive and negative sentiments present in product reviews. For our domain-specific sentiment lexicon construction system, we apply the adapted KC measure [10, 23] to induce the polarity and the corresponding strength of a candidate sentiment from reviews. Although a positively rated review may contain negative sentiments, these negative sentiments may not occur repeatedly in many positively rated reviews. Accordingly, when a large number of labeled reviews is provided, the proposed statistical learning method can still correctly identify the positive and the negative sentiment words.

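A variant of the KC formulation is shown as follows [12, 10]:

$$\mathrm{score}(t) = \tanh\!\left(\frac{df(t)}{\alpha}\,\Pr(Pos \mid t)\log_2\frac{\Pr(Pos \mid t)}{\Pr(Pos)} - \frac{df(t)}{\beta}\,\Pr(Neg \mid t)\log_2\frac{\Pr(Neg \mid t)}{\Pr(Neg)}\right) \tag{1}$$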

where $-1 \le \mathrm{score}(t) \le 1$ is the sentiment score of a candidate sentiment; it indicates how likely the candidate word is to be a relevant sentiment for the specific domain. The parameters α and β are the learning thresholds for positive and negative sentiments, respectively, and tanh is the hyperbolic tangent function. The term $\Pr(Pos \mid t) = \frac{df(t_{pos})}{df(t)}$ is the estimated conditional probability that a review is rated positive given that it contains a candidate sentiment term t. It is the number of positive reviews that contain the pair t (i.e., $df(t_{pos})$) divided by the total number of reviews that contain t (i.e., $df(t)$). Similarly, $\Pr(Neg \mid t) = \frac{df(t_{neg})}{df(t)}$ is the estimated conditional probability that a review is rated negative when it contains the pair t. In addition, $\Pr(Pos) = \frac{|D^+|}{|D^+| + |D^-|}$ is the estimated prior probability that a review is positive, whereas $\Pr(Neg) = \frac{|D^-|}{|D^+| + |D^-|}$ is the estimated prior probability that a review is negative in the review collection. $D^+$ ($D^-$) is the set of positively (negatively) rated reviews extracted from Web 2.0 sites. A positive score(t) indicates that t probably carries a positive sentiment, whereas a negative score indicates that t probably carries a negative sense. If the absolute score of a pair does not exceed a threshold λ, the polarity of the pair is uncertain, and the pair is treated as neutral.
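Eq. (1) can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: df(t) is taken to be df(t_pos) + df(t_neg), so neutral reviews do not count. A term whose conditional probability is zero contributes 0, which is its limit value. The defaults α = 300 and β = 100 are the values reported in Section 5.

```python
import math


def kc_score(df_pos, df_neg, n_pos, n_neg, alpha=300.0, beta=100.0):
    """score(t) following Eq. (1) for one candidate pair t.

    df_pos / df_neg: numbers of positive / negative reviews containing t;
    n_pos / n_neg: |D+| and |D-|. df(t) = df_pos + df_neg is an assumption."""
    df_t = df_pos + df_neg
    if df_t == 0:
        return 0.0
    prior_pos = n_pos / (n_pos + n_neg)
    prior_neg = n_neg / (n_pos + n_neg)

    def term(p_cond, prior):
        # p * log2(p / prior), using the limit value 0 when p == 0
        return p_cond * math.log2(p_cond / prior) if p_cond > 0 else 0.0

    inner = (df_t / alpha) * term(df_pos / df_t, prior_pos) \
        - (df_t / beta) * term(df_neg / df_t, prior_neg)
    return math.tanh(inner)


# 71% positive reviews, as in the average reported in Section 5
print(round(kc_score(90, 10, 710, 290), 3))  # 0.251
print(round(kc_score(20, 40, 710, 290), 3))  # -0.503
```

Note how the negative-event term enters: when Pr(Neg|t) is below its prior, its log ratio is negative, so subtracting it pushes the score further toward the positive side.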

The following formula converts the initial score of t to the final polarity value of a feature and sentiment pair captured in the automatically constructed sentiment lexicon:

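$$\mathrm{polarity}(t) = \begin{cases} \dfrac{\mathrm{score}(t) - \lambda}{1 - \lambda} & \text{if } \mathrm{score}(t) > \lambda \\[2ex] -\dfrac{|\mathrm{score}(t)| - \lambda}{1 - \lambda} & \text{if } \mathrm{score}(t) < -\lambda \\[2ex] 0 & \text{otherwise} \end{cases} \tag{2}$$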
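Eq. (2) defines a neutral dead zone around zero, followed by a linear rescaling. A direct transcription, assuming λ = 0.26 as in Section 5, is:

```python
def polarity(score, lam=0.26):
    """Map score(t) to the final lexicon value following Eq. (2).

    Scores in [-lam, lam] become 0 (neutral); larger magnitudes are rescaled
    linearly so that the extreme scores +/-1 map to +/-1."""
    if score > lam:
        return (score - lam) / (1 - lam)
    if score < -lam:
        return -(abs(score) - lam) / (1 - lam)
    return 0.0


print(round(polarity(0.63), 3), round(polarity(-0.63), 3), polarity(0.2))  # 0.5 -0.5 0.0
```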

To predict the polarity of an unseen opinionated expression (e.g., a product review), the automatically constructed domain-specific sentiment lexicon is applied. The following formula estimates the polarity strength of the opinionated expression d:

$$\mathrm{strength}(d) = \sum_{(f,t) \in d} \mathrm{polarity}(t) \tag{3}$$

where strength(d) is the polarity strength of the opinionated expression d, and polarity(t) is the polarity score of the sentiment indicator captured in the automatically generated sentiment lexicon. The term (f, t) indicates that only a sentiment word t adjacent to a candidate feature f is taken into account; the candidate feature f is identified by the method described above. This approach reduces the chance of mistakenly assigning sentiment scores to out-of-context words: for a sentiment to contribute to the overall document score, it must be associated with a recognized product feature. In other words, we use product features to filter out the noisy sentiments embedded in product reviews, which may make the sentiment score of a review more accurate. Similarly, the polarity score of a product is the sum of the polarity scores of all the reviews posted for that product.
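Eq. (3) and the product-level sum can be sketched as follows. The sketch rests on one reading of the notation: the lexicon maps each sentiment word t to polarity(t), and each review's (feature, sentiment) pairs have already been extracted as described above. Sentiment words without a lexicon entry contribute 0. The function names and toy values are hypothetical.

```python
def review_strength(pairs, lexicon):
    """strength(d) from Eq. (3): sum of polarity(t) over the (f, t) pairs of d.

    Only sentiment words that were paired with a candidate feature appear in
    pairs, so unattached adjectives/adverbs never contribute."""
    return sum(lexicon.get(sentiment, 0.0) for _feature, sentiment in pairs)


def product_polarity(reviews_pairs, lexicon):
    """Product-level score: the sum of strength(d) over all of its reviews."""
    return sum(review_strength(pairs, lexicon) for pairs in reviews_pairs)


lexicon = {"small": 0.5, "dim": -0.8}
product_reviews = [
    [("netbook", "small"), ("screen", "dim")],
    [("battery", "small")],
    [("screen", "bright")],  # "bright" has no lexicon entry
]
print([round(review_strength(r, lexicon), 2) for r in product_reviews],
      round(product_polarity(product_reviews, lexicon), 2))  # [-0.3, 0.5, 0.0] 0.2
```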

5 System Evaluation

Similar to previous studies [2], we retrieved real product reviews from amazon.com using the Amazon Web Services APIs. In this experiment, we tested whether the domain-specific sentiment lexicon generated by our system is more effective than a hand-crafted generic sentiment lexicon when applied to sentiment analysis tasks, in particular the polarity detection sub-task [15]. Our evaluation was conducted on five Amazon product categories: Grocery, Camera, Computer, DVD, and Software. The details of this data set are shown in Table 1.

Table 1. The Amazon Review Data Set

| Category | BrowseNode ID | No. of Products | No. of Reviews |
|---|---|---|---|
| Grocery | 3760931 | 16,008 | 122,474 |
| Camera | 281052 | 2,372 | 126,238 |
| Computer | 541966 | 26,466 | 515,182 |
| DVD | 163416 | 3,677 | 173,674 |
| Software | 409488 | 2,677 | 58,599 |
| Total | | 51,200 | 996,167 |

For each product category, 90% of the product reviews were first used to build a domain-specific sentiment lexicon. The remaining 10% of the reviews in the category were then used to test the effectiveness of our experimental system; that is, Eq. 3 was used to compute the polarity score of each test review. If the review score was positive, the review was predicted as positive; otherwise, it was predicted as negative. For our experiment, we treated user review ratings of 4-5 as positive and a rating of 1 as negative, and these ratings were assumed to be the ground truth. Only two classes (positive and negative) were evaluated. On average, around 71% of the reviews in a product category are positive. The parameters α, β, and λ were empirically established on a subset of the training collection, and the values 300, 100, and 0.26, respectively, were used in this experiment. A baseline system was also developed based on OpinionFinder [20], a widely used generic sentiment lexicon. The polarity prediction results of both the experimental and the baseline systems were collected and analyzed using common IR evaluation metrics such as precision, recall, and F-measure [21, 22].
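The evaluation protocol above can be sketched as follows. The sketch covers a seeded 90/10 split, the sign rule for predicting a review's class, and precision, recall, and F-measure for one class. The text does not say which class (or which averaging) underlies Table 2, so scoring the positive class here is an assumption, as are the function names.

```python
import random


def split_train_test(reviews, train_fraction=0.9, seed=0):
    """Shuffle a copy of the reviews and split it into training and test parts
    (the fixed seed is an arbitrary choice for reproducibility)."""
    items = list(reviews)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def precision_recall_f1(gold_labels, review_scores, target="pos"):
    """Predict "pos" when strength(d) > 0 and "neg" otherwise, then compute
    precision, recall and F1 for the target class."""
    predicted = ["pos" if s > 0 else "neg" for s in review_scores]
    pairs = list(zip(predicted, gold_labels))
    tp = sum(p == target and g == target for p, g in pairs)
    fp = sum(p == target and g != target for p, g in pairs)
    fn = sum(p != target and g == target for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train, held_out = split_train_test(range(100))
gold = ["pos", "pos", "neg", "neg", "pos"]
scores = [0.7, -0.1, -0.4, 0.2, 0.3]
print(len(train), len(held_out), precision_recall_f1(gold, scores))
```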

Table 2. Comparative Performance of Polarity Prediction

| Product Category | Experimental Recall | Experimental Precision | Baseline Recall | Baseline Precision | F-measure Δ Chg. |
|---|---|---|---|---|---|
| Grocery | 0.680 | 0.663 | 0.640 | 0.689 | +1.11% |
| Camera | 0.702 | 0.688 | 0.660 | 0.717 | +1.10% |
| Computer | 0.712 | 0.711 | 0.662 | 0.726 | +2.72% |
| DVD | 0.690 | 0.683 | 0.636 | 0.711 | +2.23% |
| Software | 0.672 | 0.661 | 0.624 | 0.689 | +1.82% |
| Average | 0.691 | 0.681 | 0.640 | 0.702 | +1.79% |

The performance of the experimental system and the baseline system is shown in Table 2. The columns labeled "Experimental" give the recall and precision figures of the experimental system. The average improvement in F-measure brought by the domain-specific sentiment lexicon is 1.79%. As shown in Table 2, the system-generated domain-specific sentiment lexicon enables more effective polarity prediction at the document level than the baseline system, which uses only a generic sentiment lexicon. The performance improvement in the "Computer" product category is the largest, probably because sentiment words such as "small", "tiny", and "little" are defined as negative in OpinionFinder (a generic sentiment lexicon), whereas these words are generally considered positive in the domain of notebook computers (e.g., a "small" notebook computer is good for traveling). Overall, our experimental results reveal that the proposed method for the automatic construction of domain-specific sentiment lexicons is effective, and that the resulting sentiment lexicon can improve the effectiveness of sentiment analysis in general.
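A Table 2 row can be rechecked as follows. The sketch assumes that "F-measure" means the balanced F1 score and that "Δ Chg." is the relative change of the experimental F1 over the baseline F1. For the Computer row, it gives about 2.74%, close to the reported +2.72%. The small gap is presumably due to rounding of the tabulated recall and precision.

```python
def f_measure(recall, precision):
    """Balanced F-measure (F1): harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)


def relative_f_change(exp_r, exp_p, base_r, base_p):
    """Percentage change of the experimental F1 over the baseline F1."""
    f_exp = f_measure(exp_r, exp_p)
    f_base = f_measure(base_r, base_p)
    return 100 * (f_exp - f_base) / f_base


# Computer row of Table 2: experimental recall/precision, baseline recall/precision
print(round(relative_f_change(0.712, 0.711, 0.662, 0.726), 2))  # 2.74
```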


Table 3. Linear Regression for Product Sales Ranks

| Variables | Camera | DVD |
|---|---|---|
| ln(Price) | .258*** (.059) | -.069 (.082) |
| Rating | -.153*** (.056) | .119** (.059) |
| ln(Nreviews) | -.480*** (.022) | -.370*** (.120) |
| ln(Sentiment) | -.438*** (.084) | -.470*** (.086) |
| R² | .491 | .465 |
| Adj. R² | .476 | .446 |
| F | F(5, 529) = 28.368*** | F(5, 574) = 27.164*** |
| No. of observations | 535 | 580 |

6 Predicting Sales with Product Sentiments

Using linear regression models, previous studies showed that the sentiments of product reviews might be a predictor of product sales [2, 5]. In our research, we test whether the proposed sentiment metric (i.e., Eq. 3) is a significant predictor of product sales. Similar to previous approaches that applied econometric analysis to product reviews [2, 3, 5], we developed a linear regression model to estimate the influence of various product-related attributes (e.g., retail price, review rating, review sentiment) on the sales of a product, using the sales rank as a proxy. It has been shown that a linear relationship exists between sales ranks and actual product sales [3]. Our linear regression model is shown below:

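$$\ln(\mathrm{SalesRank}_p) = \nu_p + \beta_{1p}\ln(\mathrm{Price}_p) + \beta_{2p}\,\mathrm{Rating}_p + \beta_{3p}\ln(\mathrm{Nreviews}_p) + \beta_{4p}\ln(\mathrm{Sentiment}_p) + \beta_{5p}\,\mathrm{ShipDummies}_p + \varepsilon_p \tag{4}$$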

where $\nu_p$ is a fixed effect of product p, which may be related to factors such as the quality of the product, the brand loyalty of the product, and the popularity of the manufacturer. The variables $\mathrm{SalesRank}_p$ and $\mathrm{Price}_p$ are the sales rank and the retail price of the product as shown on amazon.com. Moreover, $\mathrm{Rating}_p$ represents the average review rating of the product, and $\mathrm{Nreviews}_p$ represents the number of reviews attached to the product. The log specification (e.g., $\ln(\mathrm{Nreviews}_p)$) is applied instead of levels because it estimates the effect of a change in an independent variable on the percentage change in the dependent variable (e.g., sales rank) [2, 3, 5]. The variable $\varepsilon_p$ represents a random disturbance, which is assumed to be normally distributed [2]. The shipment-time dummy $\mathrm{ShipDummies}_p$ is used to control for the variability in product ranking due to the different promised delivery times of products [3]. We encode this variable by referring to the estimated shipment time of a product available on amazon.com: the estimated shipment time can be expressed in hours, days, weeks, or months, or no estimate may be given at all, and we encode these cases by the values 1, 2, 3, 4, and 5, respectively. We do not incorporate the total length (in characters) of the reviews related to a product because it was shown not to be a significant factor in previous studies [2, 3]. Instead, we apply our new metric $\mathrm{Sentiment}_p$, which represents the total sentiment score of all the reviews related to product p; this factor is the main focus of our current empirical study. Products whose sales rank was not available on amazon.com were excluded from our regression analysis.
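A minimal NumPy sketch of fitting Eq. (4) by ordinary least squares follows. It is illustrative only. The data are synthetic and noise-free, and the shipment dummy is entered as the single ordinal code 1 to 5 described above. A common intercept stands in for the product fixed effect ν_p, because with one observation per product separate per-product effects cannot be estimated by OLS. The coefficient values are arbitrary, not the estimates in Table 3.

```python
import numpy as np

SHIP_CODES = {"hours": 1, "days": 2, "weeks": 3, "months": 4, "none": 5}


def design_matrix(price, rating, n_reviews, sentiment, ship):
    """Columns: intercept, ln(Price), Rating, ln(Nreviews), ln(Sentiment), ship code.

    ln(Sentiment) requires strictly positive sentiment totals."""
    ship_codes = np.array([SHIP_CODES[s] for s in ship], dtype=float)
    return np.column_stack([
        np.ones(len(price)),
        np.log(price),
        np.asarray(rating, dtype=float),
        np.log(n_reviews),
        np.log(sentiment),
        ship_codes,
    ])


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic, noise-free data (illustration only, not the paper's data)
rng = np.random.default_rng(0)
n = 200
price = rng.uniform(10, 500, n)
rating = rng.uniform(1, 5, n)
n_reviews = rng.integers(1, 300, n)
sentiment = rng.uniform(0.5, 50, n)
ship = rng.choice(list(SHIP_CODES), n)
X = design_matrix(price, rating, n_reviews, sentiment, ship)
true_coef = np.array([8.0, 0.3, -0.2, -0.5, -0.4, 0.1])
log_sales_rank = X @ true_coef
print(np.round(fit_ols(X, log_sales_rank), 3))
```

With real data, one would also report standard errors and an F statistic, as in Table 3. A statistics package is better suited to that than a bare least-squares solve.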

The results of our regression analysis are shown in Table 3. The number of observations is the number of products included in the regression analysis; these products have sales ranks and retail prices available at amazon.com and can therefore be used in the analysis. The notation in the table is as follows: * indicates p < .10, ** indicates p < .05, and *** indicates p < .01; figures in parentheses are standard errors; dummy variables are not shown in the table. All the linear regressions are statistically significant, as shown by their F values. A negative coefficient on an independent variable (e.g., review rating) indicates a negative relationship between that variable and the sales rank. According to Table 3, the coefficients of review ratings are negative, as expected: a higher review rating (e.g., close to 5) is associated with a lower Amazon sales rank (i.e., the product is placed nearer the top of the Amazon ranked list) and hence with higher actual product sales [3]. Our regression analysis also shows that the sentiment factor is a significant predictor of sales for the product categories “camera” and “DVD”. These results suggest that it is feasible to use the sentiment metric derived from product reviews (Eq. 3) to predict product sales, at least for the Amazon product categories “camera” and “DVD”.
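To make the quantities reported in Table 3 concrete (coefficients, standard errors in parentheses, significance stars, and the overall F value), the following is a minimal NumPy sketch of an ordinary least squares fit. It is not the software or the exact model behind Table 3: the log-linear specification (log sales rank regressed on log price, review rating, Sentiment_p, and the 1–5 shipment code), the helper names `ols` and `stars`, and the normal approximation to the t-test p-values are assumptions of this sketch, and the demo uses synthetic data rather than our Amazon data.

```python
import math

import numpy as np


def ols(y, X):
    """Fit y = b0 + X b + e by ordinary least squares.

    Returns (coefficients, standard errors, two-sided p-values, F statistic);
    index 0 of each array is the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    sigma2 = sse / (n - k)                    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical (non-robust) covariance
    se = np.sqrt(np.diag(cov))
    t = beta / se
    # Normal approximation to the two-sided t-test p-value (large n only).
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    sst = float(((y - y.mean()) ** 2).sum())
    f_stat = ((sst - sse) / (k - 1)) / (sse / (n - k))
    return beta, se, p, f_stat


def stars(p):
    """Significance markers as used in Table 3."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


if __name__ == "__main__":
    # Synthetic data only: higher rating and sentiment lower the log sales rank.
    rng = np.random.default_rng(0)
    n = 500
    log_price = rng.normal(4.0, 0.5, n)
    rating = rng.uniform(1.0, 5.0, n)
    sentiment = rng.normal(0.0, 2.0, n)
    ship = rng.integers(1, 6, n)  # shipment codes 1-5
    log_rank = (8.0 + 0.2 * log_price - 0.5 * rating - 0.3 * sentiment
                + 0.1 * ship + rng.normal(0.0, 0.5, n))
    X = np.column_stack([log_price, rating, sentiment, ship])
    beta, se, p, f_stat = ols(log_rank, X)
    names = ["const", "log_price", "rating", "sentiment", "ship"]
    for name, b, s, pv in zip(names, beta, se, p):
        print(f"{name:10s} {b:8.3f}{stars(pv):3s} ({s:.3f})")
    print(f"F = {f_stat:.1f}")
```

Because a smaller sales rank means higher sales, negative coefficients on rating and sentiment in such a fit are read exactly as described above. The normal approximation is reasonable only for large samples; with few observations per category, p-values from the t distribution with n − k degrees of freedom (e.g., via scipy.stats.t.sf) would be more appropriate.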

7 Conclusions and Future Work

In the era of Web 2.0, there has been explosive growth in the number of opinionated expressions, such as product reviews, movie reviews, and political comments, posted to the Internet. Sentiment analysis systems open the door to extracting valuable business intelligence from these online opinionated expressions. Nevertheless, existing sentiment analysis systems mainly employ manually constructed


generic sentiment lexicons to identify sentiment words embedded in opinionated texts. This is a classical instance of the knowledge acquisition bottleneck, since manually constructing sentiment lexicons is very time-consuming, and such generic lexicons may not effectively support sentiment analysis across different application domains.

One main contribution of our research is the development of a novel statistical-learning-based computational method for the automatic construction of domain-specific sentiment lexicons to enhance cross-domain sentiment analysis. Our initial experiments show that the domain-specific sentiment lexicons generated by the proposed method improve document-level polarity prediction by 1.79% on average when compared to a baseline system that employs a hand-crafted generic sentiment lexicon. Another contribution of our research is an empirical test, using linear regression, of the feasibility of using the sentiment factor to predict product sales, with sales ranks serving as a proxy for sales. This empirical finding has significant implications for applying the proposed sentiment analysis method to extract business intelligence in practice. Future work will conduct more experiments to examine the effectiveness of the proposed method across more diverse application domains. Moreover, a direct user evaluation of the quality of the system-generated domain-specific sentiment lexicons will be performed.

Acknowledgment

The work reported in this paper has been funded in part

by the HK GRF Research Grant (Project: 9041569).

References

[1] Giambattista Amati, Edgardo Ambrosi, Marco Bianchi, Carlo Gaibisso, and Giorgio Gambosi. Automatic construction of an opinion-term vocabulary for ad hoc retrieval. In Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, and Ryen W. White, editors, Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30-April 3, 2008. Proceedings, volume 4956 of Lecture Notes in Computer Science, pages 89–100. Springer, 2008.

[2] Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeirotis. Show me the money!: deriving the pricing power of product features by mining consumer reviews. In Pavel Berkhin, Rich Caruana, and Xindong Wu, editors, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007, pages 56–65. ACM, 2007.

[3] J.A. Chevalier and D. Mayzlin. The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research, 43:345–354, 2006.

[4] Andrea Esuli and Fabrizio Sebastiani. Determining the semantic orientation of terms through gloss classification. In Otthein Herzog, Hans-Jorg Schek, Norbert Fuhr, Abdur Chowdhury, and Wilfried Teiken, editors, Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31 - November 5, 2005, pages 617–624. ACM, 2005.

[5] Anindya Ghose and Panagiotis G. Ipeirotis. Designing novel review ranking systems: predicting the usefulness and impact of reviews. In Maria L. Gini, Robert J. Kauffman, Donna Sarppo, Chrysanthos Dellarocas, and Frank Dignum, editors, Proceedings of the 9th International Conference on Electronic Commerce, volume 258, pages 303–310. ACM, 2007.

[6] B. He, C. Macdonald, J. He, and I. Ounis. An effective statistical approach to blog post opinion retrieval. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 1063–1072. Association for Computing Machinery (ACM), 2009.

[7] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Won Kim, Ron Kohavi, Johannes Gehrke, and William DuMouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 168–177. ACM, 2004.

[8] T. Kindo, H. Yoshida, T. Morimoto, and T. Watanabe. Adaptive personal information filtering system that organizes personal profiles automatically. In Martha E. Pollack, editor, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 716–721, Nagoya, Japan, August 23–29, 1997. Morgan Kaufmann Publishers.

[9] S. Kullback and R.A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.

[10] R.Y.K. Lau, P. Bruza, and D. Song. Towards a Belief Revision Based Adaptive and Context-Sensitive Information Retrieval System. ACM Transactions on Information Systems, 26(2):8.1–8.38, 2008.

[11] R.Y.K. Lau, C.L. Lai, and Y. Li. Leveraging the web context for context-sensitive opinion mining. In Proceedings of the 2009 IEEE International Conference on Computer Science and Information Technology, pages 467–471. IEEE, 2009.

[12] R.Y.K. Lau, D. Song, Y. Li, C.H. Cheung, and J.X. Hao. Towards A Fuzzy Domain Ontology Extraction Method for Adaptive e-Learning. IEEE Transactions on Knowledge and Data Engineering, 21(6):800–813, 2009.

[13] Bin Li, Feifan Liu, and Yang Liu. UTDallas at TREC 2008 blog track. In Ellen M. Voorhees and Lori P. Buckland, editors, Proceedings of The Seventeenth Text REtrieval Conference, TREC 2008, Gaithersburg, Maryland, USA, November 18-21, 2008, volume Special Publication 500-277. National Institute of Standards and Technology (NIST), 2008.

[14] Tao Li, Vikas Sindhwani, Chris H. Q. Ding, and Yi Zhang. Knowledge transformation for cross-domain sentiment classification. In James Allan, Javed A. Aslam, Mark Sanderson, ChengXiang Zhai, and Justin Zobel, editors, Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, Boston, MA, USA, July 19-23, 2009, pages 716–717. ACM, 2009.

[15] C. Macdonald and I. Ounis. Overview of the TREC 2007 Blog Track. In E.M. Voorhees, editor, Proceedings of the Sixteenth Text REtrieval Conference, Gaithersburg, Maryland, 2007. NIST. Available from http://trec.nist.gov/pubs/trec16/.

[16] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 79–86, May 27 2002.

[17] Ana-Maria Popescu, Bao Nguyen, and Oren Etzioni. OPINE: Extracting product features and opinions from reviews. In Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 339–346. The Association for Computational Linguistics, 2005.

[18] Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. Expanding domain sentiment lexicon through double propagation. In Craig Boutilier, editor, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, pages 1199–1204, 2009.

[19] Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. Opinion word expansion and target extraction through double propagation. Computational Linguistics, 37(1):9–27, 2011.

[20] Ellen M. Riloff, Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi, Claire Cardie, and Siddharth Patwardhan. OpinionFinder: a system for subjectivity analysis. In Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 34–35. Association for Computational Linguistics, 2005.

[21] G. Salton. Developments in automatic text retrieval. Science, 253(5023):974–980, August 1991.

[22] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, New York, 1983.

[23] D. Song, R.Y.K. Lau, P.D. Bruza, K.F. Wong, and D.Y. Chen. An adaptive information agent for document title classification and filtering in document-intensive domains. Decision Support Systems, 44(1):251–265, 2008.

[24] Qi Su, Xinying Xu, Honglei Guo, Zhili Guo, Xian Wu, Xiaoxun Zhang, Bin Swen, and Zhong Su. Hidden sentiment association in Chinese web opinion mining. In Jinpeng Huai, Robin Chen, Hsiao-Wuen Hon, Yunhao Liu, Wei-Ying Ma, Andrew Tomkins, and Xiaodong Zhang, editors, Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008, pages 959–968. ACM, 2008.

[25] V. S. Subrahmanian and Diego Reforgiato Recupero. AVA: Adjective-verb-adverb combinations for sentiment analysis. IEEE Intelligent Systems, 23(4):43–50, 2008.

[26] Songbo Tan, Yuefen Wang, and Xueqi Cheng. Combining learn-based and lexicon-based techniques for sentiment detection without using labeled examples. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 743–744. ACM, 2008.

[27] Songbo Tan, Gaowei Wu, Huifeng Tang, and Xueqi Cheng. A novel scheme for domain-transfer problem in the context of sentiment analysis. In Mario J. Silva, Alberto H. F. Laender, Ricardo A. Baeza-Yates, Deborah L. McGuinness, Bjørn Olstad, Øystein Haug Olsen, and Andre O. Falcao, editors, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 979–982. ACM, 2007.

[28] Peter D. Turney and Michael L. Littman. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4):315–346, October 2003.

[29] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics, 35(3):399–433, 2009.

[30] Theresa Wilson, Janyce Wiebe, and Rebecca Hwa. Just how mad are you? Finding strong and weak opinion clauses. In Deborah L. McGuinness and George Ferguson, editors, Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA, pages 761–769. AAAI Press / The MIT Press, 2004.

[31] X. Xu, Y. Liu, H. Xu, X. Yu, Z. Peng, X. Cheng, and L. Xiao. ICTNET at Blog Track TREC 2010. In E.M. Voorhees, editor, Proceedings of the Nineteenth Text REtrieval Conference, Gaithersburg, Maryland, 2010. NIST. Available from http://trec.nist.gov/pubs/trec19/.

[32] Kiduk Yang, Ning Yu, and Hui Zhang. WIDIT in TREC-2007 Blog Track: Combining lexicon-based Methods to Detect Opinionated Blogs. In E.M. Voorhees, editor, Proceedings of the Sixteenth Text REtrieval Conference, Gaithersburg, Maryland, 2007. NIST. Available from http://trec.nist.gov/pubs/trec16/.

[33] Min Zhang and Xingyao Ye. A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval. In Sung-Hyon Myaeng, Douglas W. Oard, Fabrizio Sebastiani, Tat-Seng Chua, and Mun-Kew Leong, editors, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20-24, 2008, pages 411–418. ACM, 2008.
