[ieee 2011 ieee 8th international conference on e-business engineering (icebe) - beijing, china...
Learning Domain-specific Sentiment Lexicons for Predicting Product Sales
Raymond Y.K. Lau and Wenping Zhang
Department of Information Systems
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong SAR
E-mail: {raylau, wzhang23}@cityu.edu.hk
Peter D. Bruza
Faculty of Science and Technology
Queensland University of Technology
Brisbane QLD 4001, Australia
K.F. Wong
Department of Systems Engineering & Engineering Management
Chinese University of Hong Kong
Shatin, Hong Kong
[email protected]
Abstract
Generic sentiment lexicons have been widely used for sentiment analysis. However, manually constructing sentiment lexicons is very time-consuming, and it may not be feasible for certain application domains where annotation expertise is not available. One contribution of this paper is the development of a statistical learning based computational method for the automatic construction of domain-specific sentiment lexicons to enhance cross-domain sentiment analysis. Our initial experiments show that the proposed methodology can automatically generate domain-specific sentiment lexicons which help improve the effectiveness of opinion retrieval at the document level. Another contribution of our work is that we show the feasibility of applying the sentiment metric derived from the automatically constructed sentiment lexicons to predict product sales of certain product categories. Our research contributes to the development of more effective sentiment analysis systems to extract business intelligence from the numerous opinionated expressions posted to the Web.
Keywords: Sentiment Lexicon, Sentiment Analysis, Statistical Learning, Kullback-Leibler divergence, Business Intelligence.
1 Introduction
Sentiment analysis involves multi-disciplinary research
such as information retrieval [15], text mining [11, 28], machine
learning [16, 26], and natural language processing
(NLP) [29, 30]. In the field of information retrieval (IR),
sentiment analysis is seen as a special kind of document
retrieval and ranking process that aims at retrieving views
on certain entities such as products, people, organizations
rather than simply retrieving topical information of the en-
tities [15, 33]. One sub-task commonly set in sentiment
analysis is to determine the orientation (i.e., polarity) of an
opinionated expression. For research on sentiment analysis,
a Blog Track of the annual TREC conference has been es-
tablished to benchmark the performance of state-of-the-art
opinion retrieval systems [15, 31]. Commercial sentiment
analysis systems such as Reuters NewsScope Sentiment En-
gine 1 can only extract sentiments for a particular domain
(e.g., the stock investment domain). To enable an auto-
mated sentiment analysis process to be applied to a wide
range of application domains and languages, it is desirable
that a sentiment analysis system can learn a set of domain-specific
sentiment indicators (i.e., a sentiment lexicon) with
minimal human intervention. The reason is that constructing
sentiment lexicons manually is very labor-intensive and
the expertise required for their construction may not even
be available for certain application domains.
Nevertheless, automated sentiment lexicon construction
involves several fundamental research challenges, as does
sentiment analysis in general. First, there is inevitably a de-
gree of uncertainty related to the identification of targeted
entities and the associated sentiments expressed in natural
language. Second, it is difficult to accurately determine
the polarity of a sentiment across various domains. For instance,
while the token “small” has a negative orientation in
a hotel review (e.g., “the room is small”), the same
token could have a positive orientation in a computer review,
such as “a small netbook is convenient for a business trip”.
In fact, the token “small” has a strong negative sense as defined
in the SentiWordNet sentiment lexicon [4].
1 http://www.reuters.se/productinfo/newsscopesentiment/description.aspx
2011 Eighth IEEE International Conference on e-Business Engineering
978-0-7695-4518-9/11 $26.00 © 2011 IEEE
DOI 10.1109/ICEBE.2011.55
It is believed that sentiment analysis can extract the busi-
ness intelligence to help business managers or marketers
enhance product design or market planning [2, 7]. By us-
ing an automatically constructed domain-specific sentiment
lexicon, we expect to be able to extract sentiment features
from online reviews or comments to facilitate business pro-
cesses (e.g., predicting product sales for more effective mar-
ket planning). One main contribution of this paper is the
development of a novel computational method to automat-
ically construct domain-specific sentiment lexicons based
on a variant of the Kullback-Leibler (KL) divergence statistical
learning method. Our main intuition is that there are
plenty of user-labeled opinionated documents (e.g., product
reviews) on the Web, and these labeled opinionated doc-
uments can be used as the input training examples for a
statistical learning method to learn which particular senti-
ment word is likely to be positive or negative. As with the
case of personalized information retrieval, an opinionated
document with a positive sentiment orientation is likely to
contain more positive sentiment indicators, whereas a neg-
ative opinionated document is more likely to contain more
negative sentiment indicators. By computing the divergence
of the probabilities of a sentiment word appearing in posi-
tive opinionated documents and negative opinionated doc-
uments of a given domain, it is possible to automatically
extract a set of positive sentiments and a set of negative sen-
timents to build a domain-specific sentiment lexicon.
2 Related Research
Domain-specific sentiment lexicon expansion
has been examined using linguistic rules [18, 19]. The
basic intuition is that sentiment words are often associated
with product features, and sentiment words may also be col-
located with other sentiment words in a sentence. There-
fore, the first propagation can be conducted by locating
some common product features, and then searching for ad-
jacent words with specific Part-Of-Speech (POS) as senti-
ment words. Once initial sentiment words are identified,
the second propagation can be invoked by further extract-
ing the sentiment words which are adjacent to the initially
identified sentiment words. To identify the candidate prod-
uct features and sentiment words, a dependency tree can be
constructed. The polarities of the automatically extracted
sentiment words are determined based on linguistic rules
as well. For instance, for sentiment words connected by
conjunction operators such as “and”, “as well”, “too”, etc.
within the same sentence, usually they should have the same
polarity. The double propagation method is compared to a
supervised machine learning classifier, namely Conditional
Random Fields (CRF); it is shown that the proposed method
is more effective than CRF [18, 19]. The method proposed
in this paper differs from the double propagation method in
that statistical learning rather than linguistic rules is applied
to construct domain-specific sentiment lexicons. Although
linguistic rules are effective for some cases, they may not be
able to cover the variety of sentiment and product feature patterns
embedded in product reviews. Our proposed method
is more general and can be readily applied to different languages
and domains.
Another attempt has been made to build a domain-
specific opinion dictionary based on a statistical learning
method called divergence from randomness (DFR) [6]. The
DFR approach measures the divergence between a term’s
probability distribution in a set of relevant and opinionated
documents and its probability distribution in a set of rele-
vant documents. However, the DFR method was used to
extract opinionated terms only; the orientations of these
terms were not estimated. The Average Opinionated En-
tropy (AOE) function was designed to facilitate the auto-
matic construction of an opinion lexicon for a particular
topic [1]. The AOE function measures the divergence of
the term frequencies in the set of opinionated and rele-
vant documents from those in the set of relevant-only docu-
ments. The computational mechanism for the AOE function
is underpinned by Kullback-Leibler (KL) divergence. In a
more recent study, a constrained non-negative matrix tri-
factorization method has been applied to extract sentiment-
laden terms from a domain, after which these terms are used
to conduct opinion orientation prediction for another do-
main [14].
Semantic orientation (SO) analysis was proposed to ex-
pand a list of seeding sentiments [28]. The orientation of an
arbitrary term was estimated based on the strength of asso-
ciation between the term and fourteen seeding sentiments.
Point-wise mutual information (PMI) and latent semantic
analysis (LSA) were used to estimate the strength of as-
sociation between any pair of words based on a Web cor-
pus. A semantic inference-based method was also explored
to estimate the polarities of sentiments by expanding a list
of seeding sentiments based on the synonym and antonym
sets captured in WordNet [7]. The PMI method was also
used to expand a list of seeding sentiment indicators based
on the training corpus of the 2008 TREC Blog Track [13].
The problem of generating a list of sentiment indicators
across different domains is referred to as the domain-transfer
problem [26, 27]. The relative similarity ranking (RSR)
method was developed to select the most informative and
opinionated documents from a training set to retrain a clas-
sifier [26]. A mutual reinforcement approach underpinned
by PMI was also applied to extract the hidden associations
between entity feature groups and opinion indicator groups
from a Chinese review corpus [24].
3 System Architecture
The general system architecture of the proposed domain-
specific sentiment analysis system is depicted in Figure 1.
A user first selects a product domain for sentiment anal-
ysis. Based on the selected domain, our system will use
dedicated crawlers or external Web services 2 to retrieve
product reviews. Traditional document pre-processing pro-
cedures [22] are then invoked to process the product re-
views and product descriptions. Similar to previous stud-
ies, a product feature is represented by a Noun or a Noun
compound [2, 7, 17], and sentiment words are represented
by Adjective or Adverb [25]. As indicated in previous stud-
ies [19, 18], a sentiment is usually associated with a fea-
ture. By identifying product features, we want to prune
the Adjective or Adverb phrases which are not really sen-
timents. Based on the automatically constructed domain-
specific sentiment lexicons, the sentiment analysis module
can identify sentiment words in product reviews and determine
the polarity of a review or a product. Our prototype
system was developed using Java (J2SE v 1.4.2), Java
Server Pages (JSP) 2.1, and Servlet 2.5. The prototype system
runs under Apache Tomcat 6.0, which is hosted
on a Dell 1950 III server with 8TB disk space.
Figure 1. The System Architecture of a Domain-specific Sentiment Analysis System
2http://ecs.amazonaws.com/AWSECommerceService/
4 Automated Learning of Domain-specific Sentiment Lexicons
Although machine learning techniques have been explored
for the expansion of hand-crafted sentiment lexicons,
a large number of manually labeled training examples at the
sentence level are often required to train an accurate classifier.
Given the sheer volume of user-contributed and labeled
opinionated expressions (e.g., product reviews) in the
era of Web 2.0, our proposed statistical learning approach
does not require additional manual annotation to generate
the training data. Therefore, our method can scale
up to process large volumes of reviews and can readily be
applied to different problem domains. Moreover, our proposed
method also utilizes proven IR techniques [22] and
an adapted statistical learning method [8, 10] for automatic
sentiment identification and polarity prediction. These
methods have been empirically tested and they are efficient
enough to process opinionated documents at Web scale.
First, the TFIDF measure is used to identify the most
representative terms within a training review. Second, the
nouns (candidate features) among the most representative
terms are identified. Low frequency product features are
identified by matching the terms of a training review with
the common features appearing in some product descrip-
tions. Finally, the adjacent Adjective or Adverb of the can-
didate features are treated as candidate sentiment words.
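The last step above can be sketched as follows. This is only an illustration, assuming the review has already been POS-tagged and the candidate feature nouns have already been selected (e.g., by TFIDF); the tag names "NOUN", "ADJ", and "ADV" and the function name are hypothetical and do not come from the prototype system.

```python
from typing import List, Set, Tuple

# Hypothetical coarse tag names; a real POS tagger supplies its own tag set.
SENTIMENT_TAGS = {"ADJ", "ADV"}

def candidate_pairs(tagged: List[Tuple[str, str]],
                    features: Set[str]) -> List[Tuple[str, str]]:
    """Return (feature, sentiment) pairs in which an adjective or adverb
    sits immediately before or after a known candidate feature noun."""
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "NOUN" or word.lower() not in features:
            continue
        for j in (i - 1, i + 1):  # adjacent positions only
            if 0 <= j < len(tagged) and tagged[j][1] in SENTIMENT_TAGS:
                pairs.append((word.lower(), tagged[j][0].lower()))
    return pairs

review = [("the", "DET"), ("small", "ADJ"), ("screen", "NOUN"),
          ("is", "VERB"), ("really", "ADV"), ("sharp", "ADJ")]
pairs = candidate_pairs(review, {"screen", "battery"})
print(pairs)
```

With strict adjacency, only "small" is paired with "screen" in this toy review; "sharp" is missed because it is two tokens away, which illustrates how aggressively the adjacency constraint prunes candidates.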
The specific TFIDF measure we used in our prototype
system is defined according to the formulation presented
in [21]. For each identified candidate product feature, the
adjacent Adjectives or Adverbs are taken as the potential
sentiment words [18, 19]. From our perspective, this is
a kind of pruning method which tries to remove the Ad-
jectives or Adverbs that are not really sentiments. The
proposed methodology of automated domain-specific sen-
timent lexicon construction can be summarized as follows:
1. Domain-independent sentiment words can be extracted
from existing linguistic resources to build the basis of a
new sentiment lexicon; for example, the strong subjec-
tive words defined in OpinionFinder [20] can be used
to infer domain-independent sentiment words;
2. The conjunctive rule [32] is applied to automatically
extract additional sentiment words and predict their
polarities from an arbitrary domain; the conjunctive
rule states that sentiment words connected by standard
conjunction operators such as “and”, “or”, “also”, etc.
in the same sentence should have the same polarity;
3. Apply a statistical learning method to expand domain-
specific sentiment words which cannot be discovered
based on the previous steps.
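A minimal sketch of the conjunctive rule in step 2 is given below. It assumes a tokenized sentence, a seed lexicon with polarities +1 and -1, and a set of candidate sentiment words; the function name and the data are illustrative assumptions rather than the implementation used in the prototype.

```python
from typing import Dict, List, Set

# Conjunction operators named in the text.
CONJUNCTIONS = {"and", "or", "also"}

def propagate_by_conjunction(tokens: List[str],
                             lexicon: Dict[str, int],
                             candidates: Set[str]) -> Dict[str, int]:
    """Copy the polarity of a known sentiment word to an unknown candidate
    that is joined to it by a conjunction within the same sentence."""
    learned: Dict[str, int] = {}
    for i in range(1, len(tokens) - 1):
        if tokens[i] not in CONJUNCTIONS:
            continue
        left, right = tokens[i - 1], tokens[i + 1]
        if left in lexicon and right in candidates and right not in lexicon:
            learned[right] = lexicon[left]
        elif right in lexicon and left in candidates and left not in lexicon:
            learned[left] = lexicon[right]
    return learned

seed = {"excellent": +1, "poor": -1}
sentence = "the lens is excellent and lightweight".split()
learned = propagate_by_conjunction(sentence, seed, {"lightweight", "bulky"})
print(learned)
```

Here "lightweight" inherits the positive polarity of "excellent". Words that never co-occur with a seed word through a conjunction remain unlabeled, which is the gap that the statistical learning step 3 is meant to fill.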
For the discovery of additional domain-dependent senti-
ment words which cannot be discovered by the conjunctive
rule, a Kullback-Leibler (KL) divergence [9] based statisti-
cal learning method is invoked. The main task is to estimate
the polarity of a domain-dependent sentiment word. Instead
of asking human annotators to tag examples to train a
classification function as in previous work [29, 30], we
would like to develop an automated sentiment lexicon construction
method which can leverage the rated reviews already
available at Web 2.0 sites such as amazon.com. In the
era of Web 2.0, user-contributed data is the norm, and there
is a huge amount of human-labeled review data that
we can leverage to conduct statistical learning. For example,
an Amazon rating of 4-5 can be regarded as positive,
and a rating of 1 can be taken as negative; a mid-range rating
of 2-3 is considered neutral. The basic intuition is that a
positive review is more likely to contain positive sentiments
than a negative review does. Therefore, we may use the
label of a product review to infer the sentiment polarity of
an individual product feature and sentiment pair. Figure 2
shows an example of a user labeled product review posted
to amazon.com.
Figure 2. Labeled Amazon Product Reviews
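The rating-to-label convention described above can be written down in a few lines. The sketch below assumes each review is a (text, stars) tuple; the function names are hypothetical. Neutral reviews (2-3 stars) are simply left out of both training sets.

```python
from typing import Iterable, List, Optional, Tuple

def label_from_rating(stars: int) -> Optional[str]:
    """Map a star rating to a training label: 4-5 positive, 1 negative,
    2-3 neutral (returned as None and not used for training)."""
    if stars >= 4:
        return "pos"
    if stars == 1:
        return "neg"
    return None

def split_reviews(reviews: Iterable[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Partition rated reviews into positively and negatively rated sets."""
    d_pos: List[str] = []
    d_neg: List[str] = []
    for text, stars in reviews:
        label = label_from_rating(stars)
        if label == "pos":
            d_pos.append(text)
        elif label == "neg":
            d_neg.append(text)
    return d_pos, d_neg

d_pos, d_neg = split_reviews([("great zoom", 5), ("ok", 3), ("broke fast", 1)])
print(len(d_pos), len(d_neg))
```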
Based on the theory of Kullback-Leibler (KL) diver-
gence [9], an effective measure called Keyword Classifier
(KC) has been developed to identify positive, negative, and
neutral keywords representing an information seeker’s pos-
itive, negative, or neutral information needs [8, 10]. The
KC method has been adapted and successfully applied to
induce positive and negative beliefs in adaptive informa-
tion filtering [10, 23]. Instead of summing the probabili-
ties characterizing the positive and negative events as in the
original KL divergence formulation, the KC measure takes
a subtraction between the conditional probabilities related
to the positive and the negative events. Such a formula-
tion corresponds to our intuition of weighting positive and
negative sentiments presented in product reviews. For our
domain-specific sentiment lexicon construction system, we
apply the adapted KC measure [10, 23] to induce the polar-
ity and the corresponding strength of a candidate sentiment
from a review. Although it is possible that a positively rated
review may contain negative sentiments, these negative sen-
timents may not repeatedly occur in many positively rated
reviews. Accordingly, when a large number of labeled re-
views are provided, the proposed statistical learning method
can still correctly identify the positive and the negative sen-
timent words.
A variant of the KC formulation is shown as follows [10, 12]:
$$\mathrm{score}(t) = \tanh\left(\frac{df(t)}{\alpha}\,\Pr(Pos|t)\,\log_2\frac{\Pr(Pos|t)}{\Pr(Pos)} - \frac{df(t)}{\beta}\,\Pr(Neg|t)\,\log_2\frac{\Pr(Neg|t)}{\Pr(Neg)}\right) \qquad (1)$$
where $-1 \le \mathrm{score}(t) \le 1$ is the sentiment score for a
candidate sentiment; it indicates how likely the candidate
word is to be a relevant sentiment for the specific domain.
The parameters $\alpha$ and $\beta$ are the learning thresholds for positive
sentiment and negative sentiment respectively, and $\tanh$
is the hyperbolic tangent function. The term
$\Pr(Pos|t) = \frac{df(t_{pos})}{df(t)}$ is the estimated conditional probability that a review
is rated positive given that it contains a candidate sentiment
term $t$. It is expressed as the fraction of the number of
positive reviews which contain $t$ (i.e., $df(t_{pos})$) over
the total number of reviews which contain $t$ (i.e., $df(t)$).
Similarly, $\Pr(Neg|t) = \frac{df(t_{neg})}{df(t)}$ is the estimated conditional
probability that a review is rated negative when it
contains $t$. In addition, $\Pr(Pos) = \frac{|D^+|}{|D^+|+|D^-|}$
is the estimated prior probability that a review is positive,
whereas $\Pr(Neg) = \frac{|D^-|}{|D^+|+|D^-|}$ is the estimated prior
probability that a review is negative in the review collection.
$D^+$ ($D^-$) is the set of positively (negatively) rated reviews
extracted from Web 2.0 sites. A positive $\mathrm{score}(t)$ indicates
that $t$ probably carries a positive sentiment, whereas a negative
score reveals that $t$ probably carries a negative sense.
If the magnitude of the score does not exceed a threshold $\lambda$, the polarity of
$t$ is uncertain, and hence it will be treated as neutral.
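As a worked illustration of Eq. 1, the sketch below computes the score from raw document counts. It rests on several assumptions that are not stated in the text: df(t) is divided by the thresholds α and β, df(t) is taken as the sum of the positive and negative document frequencies (neutral reviews are ignored), and a term with zero conditional probability contributes zero (the usual p log p convention). The function name and the example counts are made up.

```python
import math

def kc_score(df_pos: int, df_neg: int, n_pos: int, n_neg: int,
             alpha: float = 300.0, beta: float = 100.0) -> float:
    """Sketch of Eq. 1: tanh of the weighted difference between the
    positive and negative KL-style terms for a candidate sentiment t."""
    df_t = df_pos + df_neg          # assumption: neutral reviews excluded
    if df_t == 0:
        return 0.0
    pr_pos_t = df_pos / df_t        # Pr(Pos|t)
    pr_neg_t = df_neg / df_t        # Pr(Neg|t)
    pr_pos = n_pos / (n_pos + n_neg)  # Pr(Pos)
    pr_neg = n_neg / (n_pos + n_neg)  # Pr(Neg)
    pos_term = 0.0
    if pr_pos_t > 0:
        pos_term = (df_t / alpha) * pr_pos_t * math.log2(pr_pos_t / pr_pos)
    neg_term = 0.0
    if pr_neg_t > 0:
        neg_term = (df_t / beta) * pr_neg_t * math.log2(pr_neg_t / pr_neg)
    return math.tanh(pos_term - neg_term)

# Collection with 700 positive and 300 negative reviews (made-up numbers).
print(kc_score(90, 10, 700, 300))   # term seen mostly in positive reviews
print(kc_score(10, 90, 700, 300))   # term seen mostly in negative reviews
print(kc_score(70, 30, 700, 300))   # term distributed like the prior
```

A term whose distribution over positive and negative reviews matches the prior carries no information, so both logarithms vanish and the score is zero; the further the term's distribution departs from the prior, the closer the score moves toward +1 or -1.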
The following formula converts the initial score of t to the
final polarity value of a feature and sentiment pair captured
in the automatically constructed sentiment lexicon:
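$$\mathrm{polarity}(t) = \begin{cases} \dfrac{\mathrm{score}(t)-\lambda}{1-\lambda} & \text{if } \mathrm{score}(t) > \lambda \\ -\left(\dfrac{|\mathrm{score}(t)|-\lambda}{1-\lambda}\right) & \text{if } \mathrm{score}(t) < -\lambda \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$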
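Eq. 2 rescales scores outside the neutral band so that the polarity again spans the range from -1 to 1. A small sketch follows, using a default λ of 0.26 (the value reported for the experiments); the function name is an assumption.

```python
def polarity(score: float, lam: float = 0.26) -> float:
    """Sketch of Eq. 2: zero inside the neutral band [-lam, lam],
    linearly rescaled to (0, 1] or [-1, 0) outside it."""
    if score > lam:
        return (score - lam) / (1.0 - lam)
    if score < -lam:
        return -((abs(score) - lam) / (1.0 - lam))
    return 0.0

for s in (1.0, 0.63, 0.2, -0.2, -0.63, -1.0):
    print(s, polarity(s))
```

Scores at the extremes map to a polarity of exactly +1 or -1, and any score whose magnitude is at most λ is treated as neutral.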
To predict the polarity of an unseen opinionated expres-
sion (e.g., a product review), the automatically constructed
domain-specific sentiment lexicon is applied. The follow-
ing formula is applied to estimate the polarity strength of
the opinionated expression d:
$$\mathrm{strength}(d) = \sum_{(f,t)\in d} \mathrm{polarity}(t) \qquad (3)$$
where strength(d) is the polarity strength of the opinion-
ated expression d and polarity(t) is the polarity score of the
sentiment indicator captured in the automatically generated
sentiment lexicon. The term (f, t) indicates that only a sen-
timent word t adjacent to a candidate feature f will be taken
into account. The candidate feature f is identified based
on the aforementioned method. This approach can reduce
the chance of mistakenly assigning a sentiment score to some
out-of-context words. For a sentiment to contribute to
the overall document score, it must be associated with
some recognized product features. In other words, we use
product features to filter the noisy sentiments embedded in
product reviews. As a result, the sentiment score of a review
may become more accurate. Similarly, the polarity score of
a product is the summation of the polarity scores of all the
reviews posted for that product.
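The review-level and product-level scores can be sketched as below. The example assumes that each review has already been reduced to its (feature, sentiment) pairs and that the lexicon maps a sentiment word to its polarity from Eq. 2; keying the lexicon by the sentiment word alone is a simplifying assumption, and all names and values are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (feature, sentiment word)

def strength(pairs: Iterable[Pair], lexicon: Dict[str, float]) -> float:
    """Sketch of Eq. 3: sum the polarity of every sentiment word that is
    attached to a recognized feature; unknown words contribute zero."""
    return sum(lexicon.get(t, 0.0) for _f, t in pairs)

def product_score(reviews: List[List[Pair]], lexicon: Dict[str, float]) -> float:
    """Polarity score of a product: sum of the scores of all its reviews."""
    return sum(strength(pairs, lexicon) for pairs in reviews)

lexicon = {"small": 0.5, "slow": -0.75}
review_a = [("screen", "small"), ("disk", "slow"), ("case", "blue")]
review_b = [("body", "small")]
print(strength(review_a, lexicon))
print(product_score([review_a, review_b], lexicon))
```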
5 System Evaluation
Similar to previous studies [2], we retrieved real product
reviews from amazon.com using the Amazon Web services
APIs. In this experiment, we tested whether our system-generated
domain-specific sentiment lexicon is more effective
than a hand-crafted generic sentiment lexicon when they
are applied to sentiment analysis tasks, in particular the
polarity detection sub-task [15]. Our evaluation was
conducted on five Amazon product categories:
Grocery, Cameras, Computers, DVD, and Software. The
details of this data set are shown in Table 1.
Table 1. The Amazon Review Data Set
Category   BrowseNode ID   No. of Products   No. of Reviews
Grocery    3760931         16,008            122,474
Camera     281052          2,372             126,238
Computer   541966          26,466            515,182
DVD        163416          3,677             173,674
Software   409488          2,677             58,599
Total                      51,200            996,167
For each product category, 90% of the product reviews
were used to build a domain-specific sentiment lexicon
first. Then, the remaining 10% of the reviews of a product category
were used to test the effectiveness of our experimental
system. That is, Eq. 3 was used to compute the polarity
score for each test review. If the review score is positive,
the polarity of that review is predicted as positive; other-
wise, it is predicted as negative. For our experiment, we
treated the user review ratings of 4-5 as positive and the
user review rating of 1 as negative; these ratings were as-
sumed to be the ground truth for our experiment. Only two
classes (positive and negative) were evaluated. On average,
around 71% of the reviews are positive in a product cat-
egory. The parameters α, β, and λ were empirically es-
tablished based on a subset of the training collection, and
their values of 300, 100, and 0.26 were used in this experi-
ment. A baseline system was also developed based on Opin-
ionFinder [20] (a widely used generic sentiment lexicon).
Both the polarity prediction results from the experimental
and the baseline system were collected and analyzed. The
common IR evaluation metrics such as precision, recall, and
F-measure [22, 21] were applied.
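The decision rule and the evaluation metrics described above can be sketched as follows. The sketch assumes the balanced F-measure (the harmonic mean of precision and recall) computed for the positive class; the text does not state which variant was used, and the function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

def predict(scores: Sequence[float]) -> List[str]:
    """Positive if the review score (Eq. 3) is above zero, else negative."""
    return ["pos" if s > 0 else "neg" for s in scores]

def precision_recall_f(predicted: Sequence[str], actual: Sequence[str],
                       positive: str = "pos") -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure for the given class."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

scores = [0.4, -0.1, 0.2, 0.0]          # made-up review scores
actual = ["pos", "neg", "neg", "pos"]    # labels derived from star ratings
p, r, f = precision_recall_f(predict(scores), actual)
print(p, r, f)
```

Note that a review with a score of exactly zero (no recognized sentiment) falls into the negative class under this rule, which can lower recall for the positive class.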
Table 2. Comparative Performance of Polarity Prediction
Product     Experimental          Baseline              F-measure
Category    Recall    Precision   Recall    Precision   Δ Chg.
Grocery     0.680     0.663       0.640     0.689       +1.11%
Camera      0.702     0.688       0.660     0.717       +1.10%
Computer    0.712     0.711       0.662     0.726       +2.72%
DVD         0.690     0.683       0.636     0.711       +2.23%
Software    0.672     0.661       0.624     0.689       +1.82%
Average     0.691     0.681       0.640     0.702       +1.79%
The performance of the experimental system and the
baseline system is depicted in Table 2. The columns la-
beled “Experimental” depict the recall and precision fig-
ures of the experimental system. The average improvement
(in terms of F-measure) of sentiment analysis brought by
the domain-specific sentiment lexicon reaches 1.79%. As
shown in Table 2, the system-generated domain-specific
sentiment lexicon enables more effective polarity prediction
at the document level when compared to the baseline system,
which only utilizes a generic sentiment lexicon. The
performance improvement for the “computer” product category
is the largest, probably because sentiment words such as
“small”, “tiny”, “little”, etc. are defined as negative in OpinionFinder
(a generic sentiment lexicon). However, these
sentiment words are generally considered positive in the domain
of notebook computers (e.g., a “small” notebook computer
is good for traveling). As a whole, our experimental
results reveal that the proposed method for automatic con-
struction of domain-specific sentiment lexicon is effective,
and the resulting sentiment lexicon can improve the effec-
tiveness of sentiment analysis in general.
Table 3. Linear Regression for Product Sales Ranks
Variables           Camera                   DVD
ln(Price)           .258*** (.059)           -.069 (.082)
Rating              -.153*** (.056)          .119** (.059)
ln(Nreviews)        -.480*** (.022)          -.370*** (.120)
ln(Sentiment)       -.438*** (.084)          -.470*** (.086)
R2                  .491                     .465
Adj R2              .476                     .446
F                   F(5, 529) = 28.368***    F(5, 574) = 27.164***
# of Observations   535                      580
6 Predicting Sales with Product Sentiments
By using linear regression models, previous studies
showed that sentiments of product reviews might be a pre-
dictor of product sales [2, 5]. For our research, we try to
verify if the proposed sentiment metric (i.e. Eq. 3) is a sig-
nificant predictor for product sales. Similar to the previous
approaches which applied econometric analysis to product
reviews [2, 3, 5], we developed a linear regression model to
estimate the influence of various product related attributes
(e.g., retail price, review rating, review sentiment, etc.) on
the sales of a product (via the proxy of sales rank).
It has been shown that a linear relationship exists between
sales ranks and actual product sales [3]. Our linear regres-
sion model is shown below:
$$\ln(SalesRank_p) = \nu_p + \beta_{1p}\cdot\ln(Price_p) + \beta_{2p}\cdot Rating_p + \beta_{3p}\cdot\ln(Nreviews_p) + \beta_{4p}\cdot\ln(Sentiment_p) + \beta_{5p}\cdot ShipDummies_p + \varepsilon_p \qquad (4)$$
where $\nu_p$ is a fixed effect of product $p$ and it may be related
to factors such as the quality of the product, the brand
loyalty of the product, the popularity of the manufacturer,
etc. The variables $SalesRank_p$ and $Price_p$ are the sales
rank and the retail price of the product as shown on amazon.com.
Moreover, $Rating_p$ represents the average review
rating of the product. The variable $Nreviews_p$ represents
the number of reviews attached to the product. The reason
for applying the log specification (e.g., $\ln(Nreviews_p)$)
instead of levels is that the log specification can estimate
the effect of a change in the independent variable (e.g.,
$\ln(Nreviews_p)$) on the percentage change in the dependent
variable (e.g., sales rank) [2, 3, 5]. The variable $\varepsilon_p$
represents a random disturbance factor which is assumed to
be normally distributed [2]. The dummy factor for shipment
time $ShipDummies_p$ is used to control the product ranking
variability due to the different promised delivery times
of products [3]. We encode this dummy variable by referring
to the estimated shipment time of a product available
on amazon.com. For instance, the estimated shipment time
given by amazon.com can be expressed in terms of hours, days,
weeks, months, or no estimation at all. We then encode the
shipment dummy by the values of 1, 2, 3, 4, and 5 respectively.
We do not incorporate the factor of the total length
(characters) of reviews related to a product because it was
shown not to be a significant factor in previous studies [2, 3].
Instead, we apply our new metric $Sentiment_p$, which represents
the total sentiment score of all the reviews related
to the product $p$. This factor is also the main focus of our
current empirical study. If the sales rank of a product was not
available at amazon.com, that product was not included in
our regression analysis.
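To make the variable encoding concrete, the sketch below builds one row of regressors for Eq. 4 from raw product attributes. The dictionary keys, the function name, and the leading constant column (standing in for the intercept) are assumptions made for illustration; the text also does not say how non-positive sentiment totals were handled before taking the logarithm, so the sketch simply rejects them.

```python
import math
from typing import List

# Shipment-time units mapped to the codes 1-5 described in the text.
SHIP_CODES = {"hours": 1, "days": 2, "weeks": 3, "months": 4, "none": 5}

def regression_row(price: float, rating: float, n_reviews: int,
                   sentiment: float, ship_unit: str) -> List[float]:
    """Return [1, ln(Price), Rating, ln(Nreviews), ln(Sentiment), ShipDummy]."""
    if price <= 0 or n_reviews <= 0 or sentiment <= 0:
        raise ValueError("log-transformed variables must be positive")
    return [
        1.0,                          # constant term (assumption)
        math.log(price),
        rating,
        math.log(n_reviews),
        math.log(sentiment),
        float(SHIP_CODES[ship_unit]),
    ]

row = regression_row(price=100.0, rating=4.5, n_reviews=20,
                     sentiment=3.0, ship_unit="days")
print(row)
```

Stacking one such row per product, with ln(SalesRank) as the response, gives the inputs that any ordinary least squares routine expects.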
The results of our regression analysis are shown in Table 3.
The number of observations represents the products
included in our regression analysis. These products have
the corresponding sales ranks and retail prices made available
at amazon.com, and therefore they can be used in our
regression analysis. The notations used in the table are as
follows: * indicating p < .10, ** indicating p < .05, ***
indicating p < .01; figures in parentheses are standard errors;
dummy variables are not shown in the table. All the
linear regressions are statistically significant as shown by
the F values. The interpretation of the sign of the coeffi-
cient related to an independent variable is that a negative
sign implies a negative relationship between the variable
(e.g., review rating) and sales ranks. According to Table 3,
the signs of the coefficients related to review ratings are neg-
ative (correct). A negative sign of a coefficient implies that
higher review rating (e.g., close to 5) is related to a decre-
ment of the Amazon sales rank (i.e., the product is ranked
toward the top of the Amazon ranked list) and an increment
in actual product sales [3]. According to our regression
analysis, the sentiment factor is a significant predictor of sales
for the product categories of “camera” and “DVD”. According
to our empirical study, it is feasible to use the sentiment
metric (i.e., Eq. 3) derived from product reviews to predict
product sales, at least for the Amazon product categories of
Camera and DVD.
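One way to read the coefficients in Table 3 is as multipliers on the sales rank. For a log-transformed regressor in Eq. 4, scaling the regressor by a factor k multiplies the predicted sales rank by k raised to the coefficient, holding the other variables fixed; for a regressor in levels (Rating), a change of Δ multiplies it by exp(coefficient × Δ). The sketch below applies this arithmetic to the Camera coefficients; the function names are hypothetical, and the result describes an association in the fitted model, not a causal effect.

```python
import math

def rank_multiplier_log(beta: float, factor: float) -> float:
    """Multiplier on sales rank when a log-transformed regressor is
    scaled by `factor`, other variables held fixed."""
    return factor ** beta

def rank_multiplier_level(beta: float, delta: float) -> float:
    """Multiplier on sales rank when a level regressor changes by `delta`."""
    return math.exp(beta * delta)

# Camera category, Table 3: ln(Sentiment) = -0.438, Rating = -0.153.
doubling = rank_multiplier_log(-0.438, 2.0)
one_star = rank_multiplier_level(-0.153, 1.0)
print(round(doubling, 3), round(one_star, 3))
```

Under this reading, doubling the total sentiment score of a camera is associated with a sales rank number roughly a quarter smaller, that is, a position closer to the top of the ranked list.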
7 Conclusions and Future Work
In the era of Web 2.0, there has been an explosive
growth of the number of opinion expressions such as
product reviews, movie reviews, political comments, etc.
posted to the Internet. Sentiment analysis systems open the
door to extract valuable business intelligence from online
opinionated expressions. Nevertheless, existing sentiment
analysis systems mainly employ manually constructed
generic sentiment lexicons to identify sentiment words
embedded in opinionated texts. This is a classical problem
of the knowledge acquisition bottleneck, since manually
constructing sentiment lexicons is very time-consuming
and these generic lexicons may not effectively support
sentiment analysis across different application domains.
One main contribution of our research work is the devel-
opment of a novel statistical learning based computational
method for the automatic construction of domain-specific
sentiment lexicons to enhance cross-domain sentiment
analysis. Our initial experiments show that the proposed
method can generate domain-specific sentiment lexicons
which lead to an average improvement of polarity predic-
tion at the document level by 1.79% when compared to
a baseline system which employs a hand-crafted generic
sentiment lexicon. Through linear regression, another
contribution of our research is that we empirically test the
feasibility of using the sentiment factor to predict product
sales (via the proxy of sales ranks). Such an empirical
finding has significant impact on applying the proposed
sentiment analysis method to extract business intelligence
in practice. Future work will conduct more experiments to
examine the effectiveness of the proposed method across
more diversified application domains. Moreover, direct
user evaluation of the quality of the system-generated
domain-specific sentiment lexicons will be performed.
Acknowledgment
The work reported in this paper has been funded in part
by the HK GRF Research Grant (Project: 9041569).
References
[1] Giambattista Amati, Edgardo Ambrosi, Marco Bianchi,
Carlo Gaibisso, and Giorgio Gambosi. Automatic con-
struction of an opinion-term vocabulary for ad hoc retrieval.
In Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian
Ruthven, and Ryen W. White, editors, Advances in Informa-tion Retrieval, 30th European Conference on IR Research,ECIR 2008, Glasgow, UK, March 30-April 3, 2008. Proceed-ings, volume 4956 of Lecture Notes in Computer Science,
pages 89–100. Springer, 2008.
[2] Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeiro-
tis. Show me the money!: deriving the pricing power of prod-
uct features by mining consumer reviews. In Pavel Berkhin,
Rich Caruana, and Xindong Wu, editors, Proceedings of the13th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, San Jose, California, USA, Au-gust 12-15, 2007, pages 56–65. ACM, 2007.
[3] J.A. Chevalier and D. Mayzlin. The effect of word of mouth
on sales: Online book reviews. Journal of Marketing Re-search, 43:345–354, 2006.
[4] Andrea Esuli and Fabrizio Sebastiani. Determining the
semantic orientation of terms through gloss classification.
In Otthein Herzog, Hans-Jorg Schek, Norbert Fuhr, Abdur
Chowdhury, and Wilfried Teiken, editors, Proceedings of the2005 ACM CIKM International Conference on Informationand Knowledge Management, Bremen, Germany, October31 - November 5, 2005, pages 617–624. ACM, 2005.
[5] Anindya Ghose and Panagiotis G. Ipeirotis. Designing novel
review ranking systems: predicting the usefulness and im-
pact of reviews. In Maria L. Gini, Robert J. Kauffman,
Donna Sarppo, Chrysanthos Dellarocas, and Frank Dignum,
editors, Proceedings of the 9th International Conference onElectronic Commerce, volume 258, pages 303–310. ACM,
2007.
[6] B. He, C. Macdonald, J. He, and I. Ounis. An effective statis-
tical approach to blog post opinion retrieval. In Proceedingsof the 17th ACM Conference on Information and KnowledgeManagement, pages 1063–1072. Association for Computing
Machinery (ACM), 2009.
[7] Minqing Hu and Bing Liu. Mining and summarizing cus-
tomer reviews. In Won Kim, Ron Kohavi, Johannes Gehrke,
and William DuMouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 168–177. ACM, 2004.
[8] T. Kindo, H. Yoshida, T. Morimoto, and T. Watanabe. Adap-
tive personal information filtering system that organizes per-
sonal profiles automatically. In Martha E. Pollack, editor,
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 716–721, Nagoya, Japan,
August 23–29, 1997. Morgan Kaufmann publishers.
[9] S. Kullback and R.A. Leibler. On Information and Suffi-
ciency. The Annals of Mathematical Statistics, 22(1):79–86,
1951.
[10] R.Y.K. Lau, P. Bruza, and D. Song. Towards a Belief Revi-
sion Based Adaptive and Context-Sensitive Information Re-
trieval System. ACM Transactions on Information Systems,
26(2):8.1–8.38, 2008.
[11] R.Y.K. Lau, C.L. Lai, and Y. Li. Leveraging the web context
for context-sensitive opinion mining. In Proceedings of the 2009 IEEE International Conference on Computer Science and Information Technology, pages 467–471. IEEE, 2009.
[12] R.Y.K. Lau, D. Song, Y. Li, C.H. Cheung, and J.X. Hao.
Towards A Fuzzy Domain Ontology Extraction Method for
Adaptive e-Learning. IEEE Transactions on Knowledge and Data Engineering, 21(6):800–813, 2009.
[13] Bin Li, Feifan Liu, and Yang Liu. UTDallas at TREC 2008
blog track. In Ellen M. Voorhees and Lori P. Buckland, editors,
Proceedings of The Seventeenth Text REtrieval Conference, TREC 2008, Gaithersburg, Maryland, USA, November 18-21, 2008, volume Special Publication 500-277. National
Institute of Standards and Technology (NIST), 2008.
[14] Tao Li, Vikas Sindhwani, Chris H. Q. Ding, and Yi Zhang.
Knowledge transformation for cross-domain sentiment clas-
sification. In James Allan, Javed A. Aslam, Mark Sanderson,
ChengXiang Zhai, and Justin Zobel, editors, Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, Boston, MA, USA, July 19-23, 2009, pages 716–717.
ACM, 2009.
[15] C. Macdonald and I. Ounis. Overview of the TREC
2007 Blog Track. In E.M. Voorhees, editor, Proceedings of the Sixteenth Text REtrieval Conference,
Gaithersburg, Maryland, 2007. NIST. Available from
http://trec.nist.gov/pubs/trec16/.
[16] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan.
Thumbs up? sentiment classification using machine learning
techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages
79–86, May 27 2002.
[17] Ana-Maria Popescu, Bao Nguyen, and Oren Etzioni.
OPINE: Extracting product features and opinions from reviews.
In Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 339–346.
The Association for Computational Linguistics, 2005.
[18] Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. Expand-
ing domain sentiment lexicon through double propagation.
In Craig Boutilier, editor, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, pages 1199–1204, 2009.
[19] Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. Opinion
word expansion and target extraction through double propagation.
Computational Linguistics, 37(1):9–27, 2011.
[20] Ellen M. Riloff, Theresa Wilson, Paul Hoffmann, Swapna
Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi,
Claire Cardie, and Siddharth Patwardhan. Opinionfinder:
a system for subjectivity analysis. In Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 34–35.
Association for Computational Linguistics, 2005.
[21] G. Salton. Developments in automatic text retrieval. Science,
253(5023):974–980, August 1991.
[22] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, New York, 1983.
[23] D. Song, R.Y.K. Lau, P.D. Bruza, K.F. Wong, and D.Y. Chen.
An adaptive information agent for document title classification
and filtering in document-intensive domains. Decision Support Systems, 44(1):251–265, 2008.
[24] Qi Su, Xinying Xu, Honglei Guo, Zhili Guo, Xian Wu, Xi-
aoxun Zhang, Bin Swen, and Zhong Su. Hidden sentiment
association in Chinese web opinion mining. In Jinpeng Huai,
Robin Chen, Hsiao-Wuen Hon, Yunhao Liu, Wei-Ying Ma,
Andrew Tomkins, and Xiaodong Zhang, editors, Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008, pages
959–968. ACM, 2008.
[25] V. S. Subrahmanian and Diego Reforgiato Recupero. AVA:
Adjective-verb-adverb combinations for sentiment analysis.
IEEE Intelligent Systems, 23(4):43–50, 2008.
[26] Songbo Tan, Yuefen Wang, and Xueqi Cheng. Combining
learn-based and lexicon-based techniques for sentiment detection
without using labeled examples. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages
743–744. ACM, 2008.
[27] Songbo Tan, Gaowei Wu, Huifeng Tang, and Xueqi Cheng.
A novel scheme for domain-transfer problem in the context
of sentiment analysis. In Mario J. Silva, Alberto H. F. Laen-
der, Ricardo A. Baeza-Yates, Deborah L. McGuinness, Bjørn
Olstad, Øystein Haug Olsen, and Andre O. Falcao, editors,
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 979–982. ACM, 2007.
[28] Peter D. Turney and Michael L. Littman. Measuring praise
and criticism: Inference of semantic orientation from as-
sociation. ACM Transactions on Information Systems,
21(4):315–346, October 2003.
[29] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Rec-
ognizing contextual polarity: An exploration of features for
phrase-level sentiment analysis. Computational Linguistics,
35(3):399–433, 2009.
[30] Theresa Wilson, Janyce Wiebe, and Rebecca Hwa. Just how
mad are you? finding strong and weak opinion clauses. In
Deborah L. McGuinness and George Ferguson, editors, Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA, pages 761–769. AAAI Press / The MIT
Press, 2004.
[31] X. Xu, Y. Liu, H. Xu, X. Yu, Z. Peng, X. Cheng, and L. Xiao.
ICTNET at Blog Track TREC 2010. In E.M. Voorhees, editor,
Proceedings of the Nineteenth Text REtrieval Conference, Gaithersburg, Maryland, 2010. NIST. Available from
http://trec.nist.gov/pubs/trec19/.
[32] Kiduk Yang, Ning Yu, and Hui Zhang. WIDIT in
TREC-2007 Blog Track: Combining lexicon-based Methods
to Detect Opinionated Blogs. In E.M. Voorhees, editor,
Proceedings of the Sixteenth Text REtrieval Conference, Gaithersburg, Maryland, 2007. NIST. Available from
http://trec.nist.gov/pubs/trec16/.
[33] Min Zhang and Xingyao Ye. A generation model to unify
topic relevance and lexicon-based sentiment for opinion re-
trieval. In Sung-Hyon Myaeng, Douglas W. Oard, Fabrizio
Sebastiani, Tat-Seng Chua, and Mun-Kew Leong, editors,
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20-24, 2008, pages
411–418. ACM, 2008.