
2002 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 8, AUGUST 2014

Probabilistic Aspect Mining Model for Drug Reviews

Victor C. Cheng, C.H.C. Leung, Jiming Liu, Fellow, IEEE, and Alfredo Milani

Abstract—Recent findings show that online reviews, blogs, and discussion forums on chronic diseases and drugs are becoming important supporting resources for patients. Extracting information from these substantial bodies of text is both useful and challenging. We developed a generative probabilistic aspect mining model (PAMM) for identifying the aspects/topics relating to class labels or categorical meta-information of a corpus. Unlike many other unsupervised or supervised approaches, PAMM has a unique feature: in each execution it focuses on finding aspects relating to one class only, rather than finding aspects for all classes simultaneously. This reduces the chance of forming aspects that mix concepts from different classes, so the identified aspects are easier for people to interpret. The aspects found are also class distinguishing: they can be used to distinguish a class from other classes. An efficient EM algorithm is developed for parameter estimation. Experimental results on reviews of four different drugs show that PAMM finds better aspects than other common approaches, when measured with mean pointwise mutual information and classification accuracy. In addition, the derived aspects were assessed by humans from different specified perspectives, and PAMM was rated highest.

Index Terms—Drug review, opinion mining, aspect mining, text mining, topic modeling

1 INTRODUCTION

WITH the advent of Web 2.0 [1], [2], people are enabled and encouraged to contribute their content to the Internet. Many user-centered platforms are now available for information sharing and user interaction, such as Epinion, Amazon, Facebook and Twitter. Nowadays, when people are interested in a product or a service, they look not only for official information from manufacturers or service providers; experienced, practical opinions from the customers' and users' points of view are also influential. As a result, online reviews, blogs and forums dedicated to different kinds of products are pervasive, and how to effectively analyze and exploit such an immense online information source is a challenge.

Opinion mining (or sentiment analysis) [3]–[6] deals with the extraction of specified information (e.g., positive or negative sentiments about a product) from a large amount of text opinions or reviews authored by Internet users. In many situations, an overall rating alone cannot reflect the conditions of the different features of a product or a service. For instance, a camera may come with excellent image quality but poor battery life. As a result, more sophisticated aspect-level opinion mining approaches have been proposed to extract and group aspects of a product or service and predict their sentiments or ratings [3], [7]–[11]. Recent state-of-the-art approaches such as the frequency-based approach [5], the relation-based approach [9], [12], supervised learning [13] and topic modeling [7], [14] have shown that favorable results can be obtained.

• V. C. Cheng, C.H.C. Leung, and J. Liu are with the Department of Computer Science, Hong Kong Baptist University, Kowloon 1234, Hong Kong. E-mail: {victor, clement, jiming}@comp.hkbu.edu.hk.

• A. Milani is with the Department of Mathematics & Computer Science, University of Perugia, Perugia 06100, Italy. E-mail: [email protected].

Manuscript received 31 Jan. 2013; revised 13 Nov. 2013; accepted 17 Nov. 2013. Date of publication 2 Dec. 2013; date of current version 10 July 2014. Recommended for acceptance by J. Pei. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier 10.1109/TKDE.2013.175

Previous studies of opinion mining usually deal with popular consumer products or services such as digital cameras, books and electronic gadgets. Entities in the medical domain have received far less attention. This may be because patients are a minority group on the Internet and are concerned only with the specific illnesses or drugs they are experiencing. Furthermore, people tend to solicit opinions from medical professionals rather than from patients. Nevertheless, recent studies have shown that patient-generated content is useful and important [15]–[18], especially for chronic diseases and for drugs with distressing side effects. Many patients hope to get more information from other patients with similar conditions. They can also share their experience and propose practical ways to alleviate symptoms and side effects of drugs. These online communities have been found to have positive impacts on patient health [19]–[21].

Unlike general products or services, drugs have a very limited number of aspect types: price, ease of use, dosage, effectiveness, side effects and people's experiences. There are other, more technical aspects, such as chemical or molecular aspects, but they are rarely mentioned in drug reviews. A difficulty in dealing with drug reviews is that the wording used to describe effectiveness, side effects and people's experiences is very diverse. In particular, side effects are drug dependent: the set of side-effect symptoms of one drug is very unlikely to apply to another drug. This impedes some opinion mining approaches based on lexicons. More importantly, authors sometimes do not indicate which aspects they are describing; they just give descriptions of symptoms, feelings and comments. The following summarizes the features of drug reviews and illustrates them with two samples.

1041-4347 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

CHENG ET AL.: PROBABILISTIC ASPECT MINING MODEL FOR DRUG REVIEWS 2003

• Drug reviews have a small number of aspect types: price, ease of use, dosage, effectiveness, side effects and people's experiences.

• Aspects are usually not mentioned explicitly.

• Descriptions of effectiveness, side effects and people's experiences are diverse.

• Side-effect and effectiveness descriptions differ from drug to drug.

Review sample 1: I have been on this medication since June and it has help my moods drasticly!!! BUT since August I have been experiencing sores in my mouth that look like canker sores. I have been treated for thrush that is not it, been to the dentist for put on a antibiotic and was given a steroid cream which does help, but the sore come back when I stop. Could this be a side effect?? Any feed back would be GREAT!!!!

Review sample 2: I was put on 20mg a day.. It did nothing at all for me. The only thing I noticed was frequent yawning, and headaches. I didn't see a change in my mood or appetite at all. A few times I've noticed that when I fall asleep, I wake up at 2 am or so, and can't fall back asleep.

In this paper, we address opinion mining problems for drug reviews. As many drug review websites are equipped with rating functions, predicting sentiment is not the task. Instead, we propose a model for identifying a set of aspects relating to class labels or meta-information of drug reviews. For example, if the reviews are associated with gender information, people may be interested in studying the aspect differences between female and male patients. Formally, the problem is defined as follows:

Problem definition: Given a corpus C in which each review is labeled with y ∈ {y1, . . . , yM}, identify K clusters of words (i.e., aspects) referring to drug aspects or people's experiences such that they are highly correlated with each specific label yi.

This task is different from general aspect-based opinion mining, which aims to extract all aspects and their sentiments from reviews. Under the problem definition, only the relevant aspects, not all aspects, need to be extracted. Sometimes an aspect may need to be segmented further (at a finer granularity) because only some of its components are required. For instance, considering the side effects of a drug, male patients may be anxious about a specific side effect while other side effects are of less concern.

We propose a novel probabilistic aspect mining model (PAMM) to mine the aspects of drug reviews correlated with categorical information. This can be regarded as a topic model with the derived topics treated as aspects. The proposed model is very useful to patients and pharmaceutical companies because various aspects of a drug can be identified. In addition, the results can be used to compile sentiment lexicons for drug reviews: words of aspects correlating with high satisfaction ratings can be regarded as positive sentiment words, and vice versa. In practice, the model is not limited to drug reviews. It can be applied to other domains, such as product reviews and service reviews, for studying aspects pertaining to different groupings of reviews.

2 RELATED WORKS

Given a corpus of reviews (assuming every review is in bag-of-words format and has a class label), words highly correlated with the class label can be identified by many approaches, such as class-conditional probabilities of words, information gain [22], association rules [23] and pointwise mutual information (PMI) [24]. These approaches, unfortunately, suffer from a severe problem: it is difficult to understand the underlying aspects or concepts from just a set of words correlated with a class label. There are no intuitive algorithms for grouping the words so that each group conveys one or a few easily understandable concepts.

Aspect-based opinion mining has become popular in recent years. The frequency-based approach [5] extracts, as aspects, high-frequency noun phrases from the reviews that meet specified criteria or constraints. The relation-based approach [9], [12], on the other hand, identifies aspects based on the aspect-sentiment relations in the reviews. These two kinds of approaches, however, may not be applicable to drug reviews, as aspects are often not indicated explicitly by authors and descriptions of side effects and people's experiences are diverse. Moreover, grouping the extracted noun phrases is another challenge, as they cannot be grouped based on semantic meaning alone. In contrast, topic modeling identifies aspects based on the co-occurrence of words in reviews. It has the advantage that aspect identification and grouping are performed simultaneously.

Topic modeling [25]–[30] (e.g., LDA [25]) is a popular probabilistic approach to understanding a corpus. With this approach, a set of topics, each represented by a multinomial distribution over vocabulary words, is inferred. When the words of a topic are sorted according to their probabilities, the high-probability words are usually semantically correlated, and the concept or aspect of the topic can be captured manually. Building on this idea, the Topic Sentiment Mixture (TSM) [7], the Joint Sentiment/Topic (JST) [10] model and the Aspect and Sentiment Unification Model (ASUM) [14] were proposed to extract aspects and predict their associated sentiments. Nevertheless, these aspect-based opinion mining methods may not be appropriate for the problem defined in the previous section, as the extracted aspects may not be related to the specified class labels and the performance depends on the manual selection of seed words.

Recently, topic modeling with supervised label information has become a research interest. Blei and McAuliffe [31] proposed supervised LDA (sLDA), which can take different forms of supervised information into account during topic inference. Mimno and McCallum [32] introduced Dirichlet-multinomial regression to handle different kinds of meta-information. Ramage et al. [33] proposed DiscLDA to process discriminative information and find topics specific to individual classes as well as topics shared across different classes. Labeled LDA [34] is another generalization of LDA. It allows multi-label supervision and associates each label with one topic in direct correspondence.

Fig. 1. PAMM for generating observed data x and label y from latent variable z.

Apart from probabilistic algorithms, deterministic methods for topic modeling such as non-negative matrix factorization (NMF) [35]–[37] have also been proposed. These methods identify topics by decomposing the data matrix into two low-rank matrices. Semi-supervised NMF (SSNMF) [38] is a recently proposed extension that incorporates supervised information into NMF, so that the identified topics are more closely related to that information.

We propose a probabilistic model for finding the aspects correlated with class labels. The work differs from previous approaches, however, in that each time the model focuses on finding aspects correlated with one class label only; aspects correlated with different class labels are found separately. This formulation prevents the identified aspects from mixing contents of different classes. By focusing the task on one class, better and more specific aspects can be found. This approach is also different from the intuitive approach in which reviews are first grouped according to their class labels and aspects are then inferred for the individual groups. The proposed model uses all the reviews and finds the aspects that are specific to the target class and helpful in differentiating reviews of different classes. In the intuitive approach, the identified aspects may not relate only to the contents of individual groups; they may be common to all classes and therefore not useful. For example, the dosage of a drug can be an aspect common to all classes, but it may not be useful in differentiating classes.

The rest of this paper is organized as follows. Section 3 introduces the proposed probabilistic aspect mining model (PAMM) and its parameter learning algorithm. Section 4 presents the experimental results of applying the model to four different corpora of user-contributed drug reviews. Comparisons are made between the model and five other topic models: NMF, LDA, SSNMF, DiscLDA and sLDA. Human assessment of the derived aspects from four different perspectives is also undertaken. In Section 5, aspects correlating with different genders are presented. Finally, conclusions and discussion are given in Section 6.

3 PROBABILISTIC ASPECT MINING MODEL

Probabilistic Aspect Mining Model (PAMM) is a generative model which generates the observed data $x \in \mathbb{R}^M$ and the class label $y \in \{0, 1\}$ from the Gaussian latent variable $z = (z_1, \ldots, z_K)^T$ (i.e., $z \in \mathbb{R}^K$) with zero mean and identity covariance matrix, i.e., $z \sim \mathcal{N}(0, I)$. Fig. 1 describes the data and label generation process. Referring to the figure, data points and the associated class labels are generated as follows.

1) Draw $z \sim \mathcal{N}(0, I)$;

2) Draw $x \sim \mathcal{N}(Wz + \mu, \sigma^2 I)$;

3) Draw $y \sim \big(p(y = 0 \mid z), p(y = 1 \mid z)\big)$,

where μ is the mean of the observed data, $\sigma^2$ is the Gaussian noise level on x, $W \in \mathbb{R}_{+}^{M \times K}$ is a matrix with non-negative entries, and $p(y = 1 \mid z)$ and $p(y = 0 \mid z)$ are given by

$$p(y = 0 \mid z) = 1 - p(y = 1 \mid z), \tag{1}$$

$$p(y = 1 \mid z) = \phi(v^T z) = \phi\Big(c \sum_{i=1}^{K} z_i\Big), \quad \text{and} \tag{2}$$

$$\phi(t) = \frac{1}{1 + e^{-t}}, \tag{3}$$

where φ is the logistic function, c is a constant and $v = (c, \ldots, c)^T \in \mathbb{R}^K$. The label y is binary and drawn from the Bernoulli distribution with probabilities $p(y = 1 \mid z)$ and $p(y = 0 \mid z)$. The aspects of the model can be obtained from W, as it can be regarded as the basis for generating the observed data. By inspecting the high-probability/high-value words of the individual columns of W, the underlying concepts of the aspects can be interpreted.
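As an illustration only, the three generative steps can be sketched in Python with NumPy. The function name `sample_pamm` and its argument layout are hypothetical, not from the paper; the sketch draws z, then x, then y exactly as in steps 1)-3), with $v = (c, \ldots, c)^T$ so that $v^T z = c \sum_i z_i$.

```python
import numpy as np

def logistic(t):
    """Logistic function phi(t) = 1 / (1 + exp(-t)), Eq. (3)."""
    return 1.0 / (1.0 + np.exp(-t))

def sample_pamm(W, mu, sigma2, c, n_samples, rng=None):
    """Draw (x, y) pairs from the PAMM generative process (hypothetical helper).

    W: (M, K) non-negative aspect matrix, mu: (M,) data mean,
    sigma2: isotropic noise variance, c: scalar with v = (c, ..., c)^T.
    Returns X (n, M), labels y (n,), and the latent vectors Z (n, K).
    """
    rng = np.random.default_rng(rng)
    M, K = W.shape
    # 1) z ~ N(0, I)
    Z = rng.standard_normal((n_samples, K))
    # 2) x ~ N(W z + mu, sigma^2 I)
    X = Z @ W.T + mu + np.sqrt(sigma2) * rng.standard_normal((n_samples, M))
    # 3) y ~ Bernoulli(phi(v^T z)) with v^T z = c * sum_i z_i, Eq. (2)
    p1 = logistic(c * Z.sum(axis=1))
    y = (rng.random(n_samples) < p1).astype(int)
    return X, y, Z
```

Sampling with a large positive c makes latent vectors with a large component sum receive label 1, which is the behavior that the propositions build on.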

The following propositions give insight into what W consists of and why the model yields aspects of one class only.

Proposition 1. Given a PAMM with $z = (z_1, \ldots, z_K)^T \in \mathbb{R}^K$ as the latent variable and $c > 0$, the probability $p(y = 1 \mid z)$ increases as any $z_i$, $i = 1, \ldots, K$, increases. Furthermore, $p(y = 1 \mid z) > 0.5$ if $c \sum_{i=1}^{K} z_i > 0$.

Proof. Since (3) is a strictly increasing function and $c > 0$, $p(y = 1 \mid z)$ defined in (2) increases with any $z_i$. Furthermore, $\phi(t) > 0.5$ whenever $t > 0$.

Proposition 2. Given a PAMM with $\mu = 0$, $c \to \infty$ and $\sigma \to 0$, let $W = [w_1, \ldots, w_K]$, where $w_i \in \mathbb{R}_{+}^{M}$. A data point $x = w_i$, $1 \le i \le K$, can be generated with label 1 with probability $p(y = 1 \mid z) \to 1$.

Proof. Setting $z = (0, \ldots, 1, \ldots, 0)^T$, i.e., the $i$-th component of $z$ to 1 and the other components to 0, gives $x = w_i$ (since $x = Wz$ when $\mu = 0$ and $\sigma \to 0$). As $\sum_{i=1}^{K} z_i = 1 > 0$ and $c \to \infty$, $p(y = 1 \mid z) \to 1$.

From Proposition 2, the columns of W can be regarded as the data points associated with label 1 when $c \to \infty$ and $\sigma \to 0$. By inferring W, the aspects associated with label 1 can be identified. Fig. 2 shows the generation of a data point and the associated label under this setting.

Similarly, by setting $c \to -\infty$, we have the following proposition.

Proposition 3. Given a PAMM with $\mu = 0$, $c \to -\infty$ and $\sigma \to 0$, let $W = [w_1, \ldots, w_K]$, where $w_i \in \mathbb{R}_{+}^{M}$. A data point $x = w_i$, $1 \le i \le K$, can be generated with label 0 with probability $p(y = 0 \mid z) \to 1$.

Fig. 2. Generation of a data point for $y = 1$: when $\mu = 0$, $c \to \infty$, $\sigma \to 0$ and $z = (0, \ldots, 1, \ldots, 0)^T$, the data point $x = w_i$ with label $y = 1$ will be generated from z. Hence $w_i$ should be a data point having the label $y = 1$.

Proof. Setting $z = (0, \ldots, 1, \ldots, 0)^T$, i.e., the $i$-th component of $z$ to 1 and the other components to 0, gives $x = w_i$. As $\sum_{i=1}^{K} z_i = 1 > 0$ and $c \to -\infty$, $p(y = 1 \mid z) \to 0$. Hence $p(y = 0 \mid z) \to 1$ by (1).

From Proposition 3, the aspects associated with label 0 can be found by setting $c \to -\infty$. In our experiments, $c \ge 1$ and $c \le -1$ are sufficient in practice to obtain aspects relating to label 1 and label 0, respectively. Even in the extreme case $c \to 0^{+}$, $p(y = 1 \mid z) = \phi(c) > 0.5$ in the setting of Proposition 2. Similarly, $p(y = 0 \mid z) > 0.5$ in the setting of Proposition 3 for any $c < 0$, even as $c \to 0^{-}$, as shown in Fig. 3.
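A quick numerical check of this observation, assuming a one-hot latent vector $z = e_i$ as in the proofs, so that $\sum_i z_i = 1$ and $p(y = 1 \mid z) = \phi(c)$. The helper name is hypothetical.

```python
import math

def p_label1_at_basis(c):
    """p(y = 1 | z = e_i) = phi(c * 1) for a one-hot latent z, from Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-c))

# Label probabilities at a column of W for a few values of c
for c in (-10.0, -1.0, 1.0, 10.0):
    p1 = p_label1_at_basis(c)
    print(f"c = {c:+.1f}: p(y=1|z) = {p1:.3f}, p(y=0|z) = {1 - p1:.3f}")
```

With c = 1 this gives φ(1) ≈ 0.731 for label 1, and with c = −1 it gives p(y = 0 | z) ≈ 0.731, so the experimental setting |c| = 1 already favors the target label at every column of W.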

When $\sigma \to 0$ does not hold, Gaussian noise with variance $\sigma^2$ is added to x. This noise is in fact constructive in the data generation process: without any noise, the model can only generate data as linear combinations of the columns of W, and its expressiveness is very limited. Conversely, too high a noise level is harmful, as data generation becomes dominated by noise and the model may not be useful.

In principle, the matrix W need not be non-negative; its entries could be any real numbers. However, a column with both positive and negative entries makes human interpretation of the aspects difficult, just as in latent semantic indexing (LSI). Hence, we explicitly impose the constraint that the entries of W are non-negative.

3.1 Formal Definition of PAMM

Similar to the formulations of probabilistic PCA [39] or factor models [40], [41], the N observed data points $\{x_n\}_{n=1}^{N}$ are assumed to be generated by

$$x_n = W z_n + \mu + \epsilon, \tag{4}$$

where W is an $M \times K$ non-negative matrix with $M \gg K$, $x_n \in \mathbb{R}^M$ and $\mu \in \mathbb{R}^M$. The latent variable $z_n \in \mathbb{R}^K$ is Gaussian distributed with zero mean and identity covariance, i.e., $z_n \sim \mathcal{N}(0, I)$, and the noise $\epsilon$ is assumed to be isotropic and Gaussian, i.e., $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, where $\sigma^2$ sets the noise level.

Fig. 3. Columns of the matrix W are highly correlated with the classlabels.

For each $x_n$ there is a class label $y_n \in \{0, 1\}$. This class label is generated according to the conditional probabilities $p(y_n = 1 \mid z_n)$ and $p(y_n = 0 \mid z_n)$, where $p(y_n = 0 \mid z_n) = 1 - p(y_n = 1 \mid z_n)$ and

$$p(y_n = 1 \mid z_n) = \phi(v^T z_n), \tag{5}$$

where $v^T = (c, \ldots, c)$ and φ is the logistic function

$$\phi(t) = \frac{1}{1 + e^{-t}}. \tag{6}$$

The parameters of the model are $\Theta = \{W, \mu, \sigma^2, c\}$. If $x_n$ is centered (i.e., the mean is subtracted), $\mu = 0$. To simplify the inference process further, $\sigma^2$ and c are user specified and remain constant during inference. To infer W, it is necessary to find $p(z_n \mid x_n, y_n)$, the posterior of $z_n$ given $x_n$ and $y_n$. By Bayes' theorem,

$$p(z_n \mid x_n, y_n) \propto p(x_n, y_n \mid z_n)\, p(z_n), \quad n = 1, \ldots, N. \tag{7}$$

From the model equations (4) and (5), $x_n$ and $y_n$ are conditionally independent given $z_n$, i.e., $p(x_n, y_n \mid z_n) = p(x_n \mid z_n)\, p(y_n \mid z_n)$; hence (7) becomes

$$p(z_n \mid x_n, y_n) \propto p(x_n \mid z_n)\, p(z_n)\, p(y_n \mid z_n), \quad n = 1, \ldots, N. \tag{8}$$

From (4) and the Gaussian identities [42], $p(x_n \mid z_n)$ is Gaussian and given by $\mathcal{N}(W z_n + \mu, \sigma^2 I)$. Applying Bayes' theorem followed by the Gaussian identities, the product $p(x_n \mid z_n)\, p(z_n)$ in (8) is proportional to the posterior probability $p(z_n \mid x_n)$, which is Gaussian as well:

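$$p(z_n \mid x_n) = \mathcal{N}\big(\mu_{z_n \mid x_n}, \Sigma_{z_n \mid x_n}\big), \tag{9}$$

where

$$\Sigma_{z_n \mid x_n} = \Big[\frac{1}{\sigma^2} W^T W + I\Big]^{-1}, \tag{10}$$

and

$$\mu_{z_n \mid x_n} = \Sigma_{z_n \mid x_n} \Big[\frac{1}{\sigma^2} W^T (x_n - \mu)\Big]. \tag{11}$$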

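Substituting (5) and (9) into (8), and noting that $p(y_n = 0 \mid z_n) = 1 - p(y_n = 1 \mid z_n)$, gives

$$p(z_n \mid x_n, y_n) \propto \mathcal{N}\big(z_n;\, \mu_{z_n \mid x_n}, \Sigma_{z_n \mid x_n}\big)\, \phi(v^T z_n)^{y_n} \big(1 - \phi(v^T z_n)\big)^{1 - y_n}. \tag{12}$$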


Let $L(z_n)$ denote the logarithm of the right-hand side of (12). After dropping the terms independent of $z_n$, we get

$$L(z_n) = y_n \log \phi(v^T z_n) + (1 - y_n) \log\big(1 - \phi(v^T z_n)\big) - \frac{1}{2} (z_n - \mu_{z_n \mid x_n})^T \Sigma_{z_n \mid x_n}^{-1} (z_n - \mu_{z_n \mid x_n}). \tag{13}$$

Since the negative quadratic term in (13) is strictly concave and the logarithms of the logistic function and of one minus the logistic function are concave as well, $L(z_n)$ is strictly concave and has a unique maximum. The Newton-Raphson method can be used to obtain the maximizer. The gradient of $L(z_n)$ and the Hessian matrix are given by

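$$\nabla L(z_n) = \big(y_n - \phi(v^T z_n)\big)\, v - \Sigma_{z_n \mid x_n}^{-1} (z_n - \mu_{z_n \mid x_n}), \quad \text{and} \tag{14}$$

$$\nabla\nabla L(z_n) = H - \Sigma_{z_n \mid x_n}^{-1}, \tag{15}$$

where

$$H = -\phi(v^T z_n)\big(1 - \phi(v^T z_n)\big)\, v v^T. \tag{16}$$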

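The Newton-Raphson iteration is then

$$z_n^{(t+1)} = z_n^{(t)} - \big(H - \Sigma_{z_n \mid x_n}^{-1}\big)^{-1} \nabla L\big(z_n^{(t)}\big), \tag{17}$$

with H and the gradient evaluated at $z_n^{(t)}$.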

Let $z_n^{*}$ denote the converged solution; it is the mode (and maximum) of the posterior probability $p(z_n \mid x_n, y_n)$ in (12).
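The update (17) can be sketched as follows, taking the Gaussian parameters of (10) and (11) as inputs. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the step-halving safeguard is our addition (a full step, η = 1, is exactly (17)). It guards against the oscillation that plain Newton steps can show when the logistic term is steep, e.g., for large |c|.

```python
import numpy as np

def log_lik_label(t, y):
    """y*log(phi(t)) + (1 - y)*log(1 - phi(t)), computed without overflow."""
    return -np.logaddexp(0.0, -t) if y == 1 else -np.logaddexp(0.0, t)

def posterior_mode(mean, Sigma, y, c, n_iter=50, tol=1e-10):
    """Maximise L(z) of Eq. (13) with Newton-Raphson, Eqs. (14)-(17).

    mean, Sigma: parameters of the Gaussian p(z | x) from Eqs. (10)-(11).
    y: label in {0, 1}; c: scalar with v = (c, ..., c)^T.
    """
    v = np.full(mean.shape[0], float(c))
    prec = np.linalg.inv(Sigma)                        # Sigma^{-1}

    def L(z):                                          # Eq. (13)
        d = z - mean
        return log_lik_label(v @ z, y) - 0.5 * d @ prec @ d

    z = mean.astype(float).copy()                      # start at the Gaussian mean
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(v @ z)))             # phi(v^T z)
        grad = (y - p) * v - prec @ (z - mean)         # Eq. (14)
        hess = -p * (1.0 - p) * np.outer(v, v) - prec  # Eqs. (15)-(16)
        step = -np.linalg.solve(hess, grad)            # Newton direction of Eq. (17)
        eta, L0 = 1.0, L(z)
        while L(z + eta * step) < L0 and eta > 1e-8:   # step halving (our addition)
            eta *= 0.5
        z = z + eta * step
        if np.linalg.norm(eta * step) < tol:
            break
    return z
```

Because the label term pulls $v^T z$ toward the observed class, the mode for $y_n = 1$ (with c > 0) has a larger $v^T z$ than the Gaussian mean, and the mode for $y_n = 0$ a smaller one.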

3.2 EM Learning for PAMM

The parameter W of PAMM can be learned by the EM algorithm [43]. The E-step involves finding the posterior distribution $p(z_n \mid x_n, y_n)$ for each data point $x_n$ ($n = 1, \ldots, N$), which is given by (12). Since this posterior is not Gaussian, using it directly would make the M-step computation very complex. We therefore use its mode $z_n^{*}$, computed by (17), as a surrogate, since $p(z_n \mid x_n, y_n)$ is unimodal.

In the M-step, the parameter W is updated by maximizing the log likelihood of the data points, which is given by

$$\max_{W} \sum_{n=1}^{N} \log\big(p(z_n^{*})\, p(x_n \mid z_n^{*})\, p(y_n \mid z_n^{*})\big), \tag{18}$$

subject to the constraint that the entries of W are non-negative. Since $p(z_n^{*})$ and $p(y_n \mid z_n^{*})$ are independent of W, (18) reduces to

$$\max_{W} \sum_{n=1}^{N} \log p(x_n \mid z_n^{*}). \tag{19}$$
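For reference, the step from (19) to the least squares problem (20) rests on a standard Gaussian identity, and the split into row-wise problems (21) follows from writing the sum as a Frobenius norm. Under (4), each term of (19) is a Gaussian log-density:

$$\log p(x_n \mid z_n^{*}) = -\frac{M}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\left\| x_n - W z_n^{*} - \mu \right\|^2 .$$

Because $\sigma^2$ is fixed by the user, the constant term and the positive factor $1/(2\sigma^2)$ do not change the maximizer. With centered data ($\mu = 0$), $X = [x_1, \ldots, x_N]^T$ and $Z = [z_1^{*}, \ldots, z_N^{*}]^T$, the objective separates over the columns of X:

$$\sum_{n=1}^{N} \left\| x_n - W z_n^{*} \right\|^2 = \left\| X - Z W^T \right\|_F^2 = \sum_{i=1}^{M} \left\| X_{\cdot,i} - Z W_{i,\cdot}^T \right\|^2 ,$$

so each row $W_{i,\cdot}$ can be fitted independently under its own non-negativity constraint.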

This constrained optimization problem is still inconvenient to solve directly, so we resort to using (4) to update W. As maximizing the likelihood of data points under a linear model with Gaussian noise is equivalent to minimizing the squared error, and $\mu = 0$ for centered data, maximizing (19) with respect to W can be transformed into the following non-negative least squares optimization problem:

$$\min_{W} \sum_{n=1}^{N} \big\| x_n - W z_n^{*} \big\|^2 \quad \text{subject to} \quad W \ge 0, \tag{20}$$

Fig. 4. PAMM parameter inference algorithm.

Here, $W \ge 0$ means that W is non-negative and $\|\cdot\|$ is the Euclidean norm. This optimization problem can be further decomposed into M standard non-negative least squares (nnls) problems, which can be solved readily with off-the-shelf tools such as the nnls() function of MATLAB. The following describes the i-th nnls problem, $i = 1, \ldots, M$:

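$$\min_{W_{i,\cdot}} \big\| X_{\cdot,i} - Z W_{i,\cdot}^T \big\|^2 \quad \text{subject to} \quad W_{i,\cdot} \ge 0, \tag{21}$$

where $W_{i,\cdot}$ is the i-th row of W, $X = [x_1, \ldots, x_N]^T$, $Z = [z_1^{*}, \ldots, z_N^{*}]^T$, and $X_{\cdot,i}$ is the i-th column of X.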

The EM iteration continues until the change in W between consecutive EM iterations falls below a user-specified threshold. This inference process is summarized in Fig. 4.
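Putting the E-step (the mode of (12) via (14)-(17)) and the M-step (the M row-wise problems (21)) together gives the compact Python sketch of the Fig. 4 procedure below. It is a sketch under stated assumptions, not the authors' implementation. It assumes centered data (μ = 0). It uses SciPy's `scipy.optimize.nnls` in place of MATLAB's nnls(), and it initializes W with uniform random non-negative entries because the initialization is not specified in this section. It also adds a step-halving safeguard to Newton's method. The names `fit_pamm` and `_mode` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def _mode(m, prec, v, y, n_iter=50, tol=1e-10):
    """Damped Newton-Raphson (Eqs. (14)-(17)) for the mode of p(z | x, y)."""
    def L(z):                                          # Eq. (13), stable logs
        t = v @ z
        ll = -np.logaddexp(0.0, -t) if y == 1 else -np.logaddexp(0.0, t)
        return ll - 0.5 * (z - m) @ prec @ (z - m)
    z = m.copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(v @ z)))
        grad = (y - p) * v - prec @ (z - m)            # Eq. (14)
        hess = -p * (1.0 - p) * np.outer(v, v) - prec  # Eqs. (15)-(16)
        step = -np.linalg.solve(hess, grad)            # Eq. (17) direction
        eta, L0 = 1.0, L(z)
        while L(z + eta * step) < L0 and eta > 1e-8:   # step halving (our addition)
            eta *= 0.5
        z = z + eta * step
        if np.linalg.norm(eta * step) < tol:
            break
    return z

def fit_pamm(X, y, K, sigma2=5.0, c=1.0, max_iter=100, tol=1e-4, seed=0):
    """Mode-based EM for PAMM on centered X (N x M) with labels y in {0, 1}."""
    N, M = X.shape
    W = np.random.default_rng(seed).random((M, K))    # assumed non-negative init
    v = np.full(K, float(c))
    for _ in range(max_iter):
        # E-step: Sigma^{-1} from Eq. (10), Gaussian means from Eq. (11), mu = 0
        prec = W.T @ W / sigma2 + np.eye(K)
        means = np.linalg.solve(prec, W.T @ X.T / sigma2).T
        Z = np.array([_mode(means[n], prec, v, y[n]) for n in range(N)])
        # M-step: one NNLS problem per row of W, Eq. (21)
        W_new = np.vstack([nnls(Z, X[:, i])[0] for i in range(M)])
        change = np.linalg.norm(W_new - W)
        W = W_new
        if change < tol:                               # user-specified threshold
            break
    return W
```

To obtain aspects for label 0, the same routine would be run with c = −1.0, matching the settings reported in Section 4.1.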

4 EXPERIMENTAL RESULTS

This section describes the experiments and results of applying PAMM to drug review mining. User reviews of four drugs from the WebMD website (citalopram, escitalopram, lisinopril and simvastatin) were crawled as the datasets.1 The first two drugs are antidepressants, the third is for blood pressure control and congestive heart failure, and the fourth is for controlling elevated cholesterol. Since many drugs are sold under both a drug name and one or more brand names, we merged the reviews listed under a drug's name with those listed under its brand names, provided they refer to the same drug. These drugs were selected because they are commonly used long-term treatment drugs, so the information is more significant to patients taking the same or similar drugs. Apart from the text, every review has a set of user-given ratings (with scores 1-5) assessing different aspects of the drug. In this section, we focus on the satisfaction rating.

For each drug, the text was first preprocessed by lemmatization with WordNet. About 40 stop words were removed, and reviews with fewer than 4 words were ignored.

1. http://www.webmd.com/drugs/index-drugs.aspx.


TABLE 1. Summary of the Drug Reviews

Unigrams and bigrams were formed, and terms appearing fewer than 5 times were removed. For the review labels, reviews with scores 1 and 2 were labeled 0 (dissatisfaction), while reviews with scores 4 and 5 were labeled 1 (satisfaction). Reviews with score 3 were ignored because their sentiments were vague. Finally, the data matrices for the drug reviews were formed using a bag-of-words representation followed by standard TFIDF processing and normalization. A summary of the preprocessed drug reviews is given in Table 1.
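The labeling rule and the matrix construction can be sketched as below. The rating-to-label mapping follows the text exactly. Two details are assumptions. The frequency filter reads "appearing frequency" as the total count over the corpus. The TF-IDF variant (log inverse document frequency followed by L2 normalization of each review) is one common choice, since only "standard TFIDF processing and normalization" is stated. Lemmatization, stop-word removal and bigram formation are omitted, and all function names are hypothetical.

```python
import numpy as np

def label_from_rating(score):
    """Satisfaction score 1-5 -> binary label; score 3 is discarded (None)."""
    if score in (1, 2):
        return 0          # dissatisfaction
    if score in (4, 5):
        return 1          # satisfaction
    return None           # vague sentiment: review ignored

def drop_rare_terms(counts, min_count=5):
    """Keep only terms whose total corpus frequency is at least min_count."""
    counts = np.asarray(counts)
    keep = counts.sum(axis=0) >= min_count
    return counts[:, keep], keep

def tfidf_l2(counts):
    """TF-IDF with log inverse document frequency, then unit L2 norm per review."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)                 # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))      # guard against empty columns
    X = counts * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)   # all-zero rows stay zero
```

Under this TF-IDF variant a term that occurs in every review receives zero weight, so it cannot contribute to any aspect.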

To evaluate the performance of the proposed PAMM, five other algorithms were compared: LDA, NMF, sLDA, SSNMF and DiscLDA. Both LDA and NMF are unsupervised topic modeling algorithms, with LDA being a Bayesian approach. These two algorithms are very popular in topic/aspect modeling and illustrate the performance of unsupervised algorithms in aspect mining. On the other hand, sLDA and DiscLDA are supervised extensions of LDA, and SSNMF is an enhancement of NMF for handling supervised information. The quality of the generated aspects was assessed with mean pointwise mutual information (PMI) and classification accuracy on held-out data. PMI is commonly used in text analysis to measure the association between two entities; here it measures the association between the words of aspects and their class labels. Since an aspect is dominated by a few words, only the top 20 words with the highest probabilities/values are used in the calculations. It is impractical to include all the words of an aspect (i.e., the whole vocabulary of a dataset), as most of them have negligible probabilities/values and are irrelevant to the aspect. This formulation also keeps rare words out of the PMI evaluation, as they generally have high PMI but are much less useful. The derived aspects are also evaluated by classification accuracy on held-out data, which measures how well the aspects serve as bases for classifying the held-out data. If the derived aspects are highly related to the class labels, they should be useful discriminating features in classification.

4.1 Experiment Settings

In computing mean PMI, a class label must be assigned to each derived aspect. For the supervised algorithms PAMM, SSNMF and DiscLDA, this information was readily available. For sLDA, the k-th derived aspect was labeled 1 if the k-th entry of η (see [31]) was positive, and 0 if negative. For the unsupervised algorithms LDA and NMF, since it was not clear which class label should be associated with a derived aspect, half of the aspects were labeled 1 and the rest were labeled 0. The overall settings for the mean PMI evaluation are as follows.

1) Number of aspects to be derived (K): 3, 5, 10, 15 and 20.

2) For the supervised algorithms DiscLDA, SSNMF and PAMM,

• Generate K aspects for label 1.

• Generate K aspects for label 0.

3) For the algorithm sLDA,

• Generate two sets of K aspects individually with different initializations.

• Assign label 1 to the k-th aspect if $\eta_k$ is positive, and label 0 if negative.

4) For the unsupervised algorithms NMF and LDA,

• Generate two sets of K aspects individually with different initializations.

• Randomly select K aspects and label them with 1.

• Label the rest with 0.

In the performance evaluation with classification accuracy, there is no need to assign labels to the derived aspects, so it is only necessary to generate K aspects for the algorithms NMF, LDA and sLDA. For the algorithms PAMM, SSNMF and DiscLDA, we generated Round(K/2) aspects labeled 1 and (K − Round(K/2)) aspects labeled 0.

The parameters of PAMM can be determined by following the algorithm shown in Fig. 4. The user-specified parameter $\sigma^2$ in (4) was set to 5.0, and c in (5) was set to 1.0 or −1.0 for finding aspects associated with label 1 or label 0, respectively. For the other algorithms, we used the values suggested in their papers where available, or otherwise tried a few settings and kept the best parameters.

4.2 Evaluation Using Mean Pointwise Mutual Information

Given a set of 2K aspects, with each aspect sorted in descending order of the word probabilities/values assigned by an algorithm, the top 20 words of the k-th aspect ($k = 1, 2, \ldots, 2K$) are selected and denoted by $\{w_{k,i}\}_{i=1}^{20}$. The mean pointwise mutual information (PMI) of this set of aspects is defined as

$$\text{mean PMI} = \frac{1}{40K} \sum_{k=1}^{2K} \sum_{i=1}^{20} \log \frac{p(w_{k,i}, C_k)}{p(w_{k,i})\, p(C_k)}, \tag{22}$$

where $C_k$ is the class label associated with aspect k. The probabilities $p(w_{k,i}, C_k)$, $p(w_{k,i})$ and $p(C_k)$ (all assumed to be > 0) are empirical probabilities obtained by counting the words and the reviews in the corpus. Since 2K aspects with 20 words each give 40K word-aspect pairs, mean PMI is the average PMI between a word in an aspect and the aspect's class label.
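A small sketch of (22) follows. How the empirical probabilities are counted is not fully specified above. This sketch assumes review-level counts: p(w) is the fraction of reviews containing w, p(C) the fraction of reviews with label C, and p(w, C) the fraction of reviews that both contain w and have label C. The function name and argument layout are hypothetical.

```python
import numpy as np

def mean_pmi(doc_term, labels, aspects, aspect_labels, top=20):
    """Mean PMI of Eq. (22) under review-level empirical probabilities.

    doc_term: (N, M) matrix; word j occurs in review n if doc_term[n, j] > 0.
    labels: (N,) review class labels.
    aspects: (2K, M) word weights, one row per derived aspect.
    aspect_labels: (2K,) class label C_k attached to each aspect.
    All probabilities are assumed positive, as in Eq. (22).
    """
    occurs = np.asarray(doc_term) > 0
    labels = np.asarray(labels)
    total, count = 0.0, 0
    for weights, ck in zip(aspects, aspect_labels):
        top_words = np.argsort(weights)[::-1][:top]    # highest-weight words
        in_class = labels == ck
        p_c = in_class.mean()                          # p(C_k)
        for w in top_words:
            p_w = occurs[:, w].mean()                  # p(w_{k,i})
            p_wc = (occurs[:, w] & in_class).mean()    # p(w_{k,i}, C_k)
            total += np.log(p_wc / (p_w * p_c))
            count += 1
    return total / count                               # count = 2K * top
```

With `top=20` and 2K aspects, the number of averaged terms is 40K, matching the normalizing constant in (22).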

Table 2 presents the mean PMI results. It shows that the aspects derived by PAMM have significantly higher association with the class labels than those of the other algorithms. The unsupervised NMF and LDA have similar performance. The three supervised algorithms, sLDA, SSNMF and DiscLDA, also give results comparable to one another. In most cases, SSNMF and DiscLDA perform better than NMF and LDA. This is sensible because class label information is used in deriving the aspects.

TABLE 2. Evaluated Mean PMI of the Derived Aspects

4.3 Evaluation Using Classification Accuracy

First, the reviews of each drug were divided into training data and test data: 80% of the reviews were randomly drawn to form the training data, and the remaining 20% were held out as test data. The training data were used to derive the aspects of the drugs. As before, only the 20 words with the top probabilities/values were preserved for each derived aspect. A subspace was then formed from the aspects, and classification accuracy was evaluated by projecting both the training data and the test data into the subspace.

Assuming there are K aspects, the k-th basis vector $w_k \in \mathbb{R}^M$ of the subspace is formed from the k-th aspect: component i of $w_k$ (i.e., $w_{k,i}$) is set to 1.0 if the associated word is among the preserved words of the k-th aspect, and to 0.0 otherwise. The least squares projection matrix $P \in \mathbb{R}^{K \times M}$ is given by

$$P = (W^T W + \lambda I)^{-1} W^T, \tag{23}$$

where $W = [w_1, \ldots, w_K]$ and λ is a regularization constant that avoids problems in computing the matrix inverse when $W^T W$ is singular or nearly singular. In the experiments, λ = 0.01. Let $x_i$ denote the i-th review; the projected vector $z_i \in \mathbb{R}^K$ is given by

$$z_i = P x_i. \tag{24}$$

The steps for performing the projection described above can be summarized as follows:

1) For each aspect, preserve only the 20 words with the highest probabilities/values and set their values to 1.0. The values of the other words are set to 0.0.

2) Form the matrix W whose columns have the values obtained in step 1.

3) Form the projection matrix P by using (23).

4) Project both the training reviews and the test reviews with P, and train an SVM on the projected training reviews to classify the projected test reviews.
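Steps 1)-3) can be sketched with NumPy as follows. `aspect_projection` is a hypothetical name, and `np.linalg.solve` replaces the explicit inverse in (23). A review matrix X with one review per row is then projected as `X @ P.T`, which applies (24) to every review at once. Step 4) is not shown; the projected features could be passed to any linear-kernel SVM, for example scikit-learn's `SVC(kernel="linear")`.

```python
import numpy as np

def aspect_projection(aspects, top=20, lam=0.01):
    """Binary aspect basis (steps 1-2) and projection matrix P of Eq. (23).

    aspects: (K, M) word weights, one row per derived aspect.
    Returns P with shape (K, M).
    """
    aspects = np.asarray(aspects, dtype=float)
    K, M = aspects.shape
    W = np.zeros((M, K))
    for k in range(K):
        top_words = np.argsort(aspects[k])[::-1][:top]
        W[top_words, k] = 1.0                  # w_k is 1.0 on its top words only
    # Eq. (23): P = (W^T W + lambda I)^{-1} W^T, computed with a linear solve
    return np.linalg.solve(W.T @ W + lam * np.eye(K), W.T)
```

When the preserved word sets of different aspects are disjoint, $W^T W$ is diagonal and each projected coordinate is simply a scaled sum of the review's weights on that aspect's words.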

After both the training data and the test data were projected, a support vector machine (SVM) with a linear kernel was used to classify the test data. The above process was repeated with 5-fold cross-validation, and the mean classification accuracy is shown in Table 3. For reference, the table also gives the classification accuracy (SVM_input) obtained by applying the SVM directly to the input space (i.e., the bag-of-words representation) of the reviews.

Referring to Table 3, the aspects derived from the supervised algorithms clearly perform better than those derived from the unsupervised algorithms. NMF is marginally better than LDA, and SSNMF performs similarly to DiscLDA, with SSNMF giving better results in more cases. PAMM gives the best accuracy in all cases. Compared with SVM_input, all models appear inferior, except PAMM on the drug simvastatin. This is reasonable because only 20 distinct words from each aspect were used to evaluate the projection matrix P, while all the words of the vocabulary were used to train the SVM that gives the SVM_input results. Moreover, an SVM is a classifier dedicated to classification, not to deriving aspects. To test the statistical significance of the results, the Wilcoxon signed-rank test [44] was used rather than the paired t-test, because the latter requires the normality assumption and has serious weaknesses, as discussed in [45]. Table 3 covers four drug datasets, each tested with five different values of K, giving a total of 20 paired comparisons. Since PAMM has the highest accuracy in all cases, the Wilcoxon test statistic is 0 when PAMM is compared with any other algorithm. Hence, PAMM performs significantly better than the others, even at the 0.01 significance level.
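The significance argument above rests on the exact null distribution of the Wilcoxon signed-rank statistic: under the null hypothesis, each rank is equally likely to carry a positive or a negative sign, so with 20 untied, nonzero differences that all favor PAMM, the two-sided p-value is 2/2^20 ≈ 1.9 × 10^-6, well below 0.01. The pure-Python sketch below computes the statistic T = min(W+, W-) and this exact p-value. Dropping zero differences and giving tied |d| their average rank are conventional choices assumed here, not details stated in the paper, and the helper names are hypothetical; `scipy.stats.wilcoxon` offers the same test in practice.

```python
from collections import Counter


def signed_ranks(diffs):
    """Drop zero differences and rank |d| in ascending order (ties get average ranks)."""
    d = [x for x in diffs if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return d, ranks


def wilcoxon_exact(diffs):
    """Return (T, exact two-sided p-value) for the Wilcoxon signed-rank test."""
    d, ranks = signed_ranks(diffs)
    w_plus = sum(r for x, r in zip(d, ranks) if x > 0)
    w_minus = sum(r for x, r in zip(d, ranks) if x < 0)
    t = min(w_plus, w_minus)
    # Null distribution of 2*W+: each rank is positive or negative with probability 1/2.
    # Doubling keeps average (half-integer) ranks as integers.
    dist = Counter({0: 1})
    for r in ranks:
        step = int(round(2 * r))
        nxt = Counter()
        for s, c in dist.items():
            nxt[s] += c  # this rank carries a negative sign
            nxt[s + step] += c  # this rank carries a positive sign
        dist = nxt
    tail = sum(c for s, c in dist.items() if s <= 2 * t)
    p = min(1.0, 2 * tail / 2 ** len(ranks))
    return t, p
```

Feeding in the 20 hypothetical accuracy differences (PAMM minus a competitor, one per drug and K) would give T = 0 whenever every difference is positive, matching the reasoning in the text.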

4.4 Illustration of Derived Aspects and Human Assessment

To save space, we only present the aspects for the drug citalopram.2 Also, only NMF, SSNMF, and PAMM are used to identify the aspects. NMF and SSNMF are included because they are, respectively, the next best unsupervised and supervised algorithms in the classification and mean PMI evaluations described in the previous subsection. Moreover, it is much easier for humans to compare and contrast three tables simultaneously than five. Since NMF is unsupervised, it may be neither fair nor convenient to compare it with the other two supervised algorithms. In

2. The derived aspects for the drugs are illustrated at http://www.comp.hkbu.edu.hk/%7Evictor/results.html.


TABLE 3
Classification Accuracy of Held-Out Reviews Using the Aspects Derived from Different Algorithms (Only the 20 Highest-Probability Words Are Used from Each Aspect; the Values in Parentheses Are the Standard Deviations of 5-Fold Cross Validation)

order to alleviate this situation, the reviews were first grouped according to the satisfaction class labels, and NMF was applied separately to obtain satisfaction aspects and dissatisfaction aspects. For a complete illustration of all the

TABLE 4
Aspects Identified by Using PAMM on the Drug Citalopram

derived aspects in this paper, the number of aspects for all derivations was set to four. Tables 4–6 show the derived aspects relating to patient satisfaction and dissatisfaction for the drug citalopram. For each aspect in the tables, only the ten highest-probability/value words are shown.

Referring to Tables 4–6, all three algorithms appear to find some aspects related to satisfaction and dissatisfaction. Some patients felt great and better with the drug and had no side effects, while their depression was well controlled. In contrast, the dissatisfaction aspects show that other patients had severe side effects, such as different kinds of pain, and some even stopped taking the drug. Although the reviews were pre-grouped for NMF, a few aspects still did not relate to satisfaction. For example, the first satisfaction aspect (the first column in the top part of Table 5) is about the dosage and usage of the drug. The third aspect in the top part of Table 6, derived with SSNMF, may also be unrelated to satisfaction.

Although the quality of the aspects has been assessed with classification accuracy and mean PMI evaluations, it is desirable to obtain a human assessment, as humans are the main users of the aspects. A survey was conducted with a group of thirty-one volunteers to assess the results obtained with the different algorithms for the drug citalopram. Twenty-eight of them were students of the Hong Kong Baptist


TABLE 5
Aspects Identified by Using NMF on the Drug Citalopram

TABLE 6
Aspects Identified by Using SSNMF on the Drug Citalopram

University, and the rest were personnel working in pharmacies. They were presented with Tables 4–6 simultaneously and asked to assign appropriate scores (higher is better) according to the four perspectives described as follows:

• Content relatedness: aspects are related to satisfaction/dissatisfaction (score: 1–10).

• Understandability: the underlying semantic meaning of the aspects is easy for humans to capture (score: 1–10).3

• Word cohesiveness: the words of each aspect are related to each other or correlated with an underlying semantic meaning (score: 1–10).

3. Since each aspect is believed to have one or more underlying topics or themes, understandability refers to the ease of capturing the meaning of these topics.

TABLE 7
Human Assessment on Different Aspect Mining Algorithms

• Content diversity: the contents of the aspects overlap little (score: 1–10).

We consider these four perspectives to be the criteria that good aspects should satisfy. Aspects derived from an algorithm should not only be closely related to the desired context; they should also be easily understood by people. The words identified for an aspect should be informative and closely related to each other and to the main underlying semantic meaning. In addition, the derived aspects should overlap little, so that the diverse contents of the corpus under study can be covered. In assessing understandability and cohesiveness, it may sometimes be necessary to link two or three words in an aspect together to understand it. For example, some of the words in the first column of Table 4 can be linked as “great” + “result”, “self” + “feel great”, etc. A 1–10 rating scale was used instead of a 1–5 scale because the quality of the aspects in the three tables needs to be differentiated, and a finer rating scale is more appropriate for this.

The results of this survey are shown in Table 7. The aspects derived with PAMM consistently have the highest scores in all perspectives, while SSNMF is ranked second. NMF is rated the worst, even though the reviews were grouped according to the satisfaction label before the algorithm was applied. Among the volunteers, about 68% voted for the table derived from PAMM, and about 16% each for the tables derived from NMF and SSNMF.

To assess the inter-annotator agreement of this survey, we used Krippendorff’s alpha [46] rather than the kappa statistic [47], because the former is a popular statistical measure for multiple raters and multiple entities, whereas the kappa statistic is usually used for two raters and two entities. Krippendorff’s alpha for the survey results shown in Table 7 is 0.29. This shows that the agreement is not particularly strong; however, considering that the perspectives are highly abstract and the assessments are highly subjective, we believe that the results are reliable.
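Krippendorff’s alpha is defined as α = 1 − D_o/D_e, one minus the ratio of observed to expected disagreement, so α = 1 indicates perfect agreement and α ≈ 0 indicates chance-level agreement. The sketch below computes it with the interval metric δ(a, b) = (a − b)², which is one reasonable choice for 1–10 scores. The paper does not state which metric was used, so this choice, the data layout (one list of rater scores per rated item, e.g., one table under one perspective), and the function name are assumptions.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric delta(a, b) = (a - b) ** 2.

    `units` holds one list per rated item; each list contains the scores that
    raters gave that item. Missing ratings are simply left out of the list.
    """

    def delta(a, b):
        return (a - b) ** 2

    # Only items rated at least twice carry pairable values.
    units = [list(u) for u in units if len(u) >= 2]
    pooled = [v for u in units for v in u]
    n = len(pooled)

    # Observed disagreement: ordered within-item pairs, each item weighted by 1/(m_u - 1).
    d_o = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: all ordered pairs of pooled values.
    d_e = sum(delta(a, b) for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all pooled scores are identical")
    return 1.0 - d_o / d_e
```

For the survey described here, the input would hold 12 lists (three tables × four perspectives), each with up to 31 scores.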

5 ASPECTS RELATING TO GENDERS

Apart from satisfaction/dissatisfaction, finding aspects relating to different segments of patients is also of interest. For instance, reviews can be labeled according to gender or age group. This section illustrates the aspects that are mostly related to female patients and to male patients.


TABLE 8
Derived Aspects Using PAMM on the Drug Citalopram

TABLE 9
Derived Aspects Using NMF on the Drug Citalopram

In this study, female patients were labeled 0 and male patients were labeled 1. The class size imbalance may pose difficulties for aspect mining algorithms, as 1730 reviews were posted by female patients and only 460 by male patients (reviews with unidentified gender were ignored). As in the previous section, only the results for the drug citalopram using NMF, SSNMF, and PAMM are presented. Tables 8–10 illustrate the aspects derived with the different algorithms. In summary, the aspects mainly correlated with specific genders are given below:

• Female patients:

– mood swing
– sleepiness and tiredness
– weight gain

TABLE 10
Derived Aspects Using SSNMF on the Drug Citalopram

TABLE 11
Human Assessment on Different Aspect Mining Algorithms

• Male patients:

– sex organ
– sex affairs

We searched the corpus for the words appearing in the aspects and confirmed that most of the sex-related words are highly correlated with male patient reviews, while the words in the female patient aspects, such as “mood”, “swing”, and “weight gain”, are highly correlated with female patient reviews.
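The paper does not specify how the correlation between aspect words and gender labels was measured. One simple option, shown below as a hypothetical sketch, is the phi coefficient between the presence of a word in a review and the gender label (0 = female, 1 = male, as above). Because phi compares presence rates within each class rather than raw counts, a word that appears at the same rate in both groups scores 0 regardless of the 1730/460 imbalance; positive values lean towards male reviews and negative values towards female reviews. Reviews are assumed to be pre-tokenized into sets of words.

```python
import math


def phi_word_label(docs, labels, word):
    """Phi coefficient between 'review contains word' and a binary label (1 = male)."""
    # n11: present & male, n10: present & female, n01: absent & male, n00: absent & female
    n11 = n10 = n01 = n00 = 0
    for tokens, y in zip(docs, labels):
        present = word in tokens
        if present and y == 1:
            n11 += 1
        elif present:
            n10 += 1
        elif y == 1:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A word that never (or always) occurs carries no information about the label.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom
```

Multi-word aspect terms such as “weight gain” would need phrase matching rather than single-token membership; that extension is left out of this sketch.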

We also conducted a human assessment of the derived aspects following the arrangement described in Section 4.4, except that the first assessment perspective, “content relatedness”, was changed to “content specificness” (the words are related to a specific aspect/topic). This is because the content of the aspects corresponding to the different genders is not known beforehand. This new perspective is included because the desired aspects should convey specific and clear contents rather than general terms such as “side effects” or “panic attack”. According to the survey results shown in Table 11, PAMM is still consistently preferred in all perspectives, and a large proportion of the survey responses rated its table as having the best aspects. This indicates that PAMM outperforms the other algorithms and performs well in all perspectives.


6 CONCLUSION AND DISCUSSION

Nowadays, online reviews, blogs, and discussion forums for different kinds of products and services are pervasive. Extracting information from these substantial bodies of text is useful and challenging. In particular, it is helpful to identify the aspects of a product that people are happy with, or the aspects that may anger customers. As the human lifespan becomes longer and our living environment becomes increasingly polluted, medical-domain data mining has become one of the focused research areas. In this paper, we propose PAMM for mining aspects relating to specified labels or groupings of drug reviews.

Compared with other supervised topic modeling algorithms, PAMM has the unique feature that it focuses on deriving aspects for one class only. This feature reduces the opportunities for forming aspects from reviews of different classes, and hence the derived aspects are easier for people to interpret. Unlike the intuitive approach, in which reviews are first grouped according to their classes and aspects are then inferred for the individual groups, PAMM uses all the reviews and finds the aspects that help identify the target class. The experimental results in Section 4.3 have shown that the aspects obtained with PAMM give higher classification accuracy.

Parameter estimation in PAMM is not complex, as only one matrix needs to be estimated from the training data. This matrix can be obtained by using the algorithm described in Fig. 4. Experiments on reviews of four different drugs showed that the aspects found were better than those of several other popular unsupervised or supervised algorithms, as measured with mean pointwise mutual information and classification accuracy. Apart from these quantitative assessments, the aspects were assessed by a group of people based on four different perspectives, and PAMM obtained the highest scores. The model was also applied to finding the aspects relating to the genders of patients. Its performance advantage over the other approaches is even more prominent in this setting, as very specific aspects are discovered.

As with other clustering or dimensionality reduction algorithms, the number of aspects needs to be determined manually. If the value is too small, significant aspects may be missed. Conversely, too large a value may cause insignificant or even unrelated aspects to be included. This is an area that would benefit from further research. It is worth noting that the patient reviews should not be treated as random samples. They were reported voluntarily and hence may contain some degree of bias. Moreover, the contents they report may not be comprehensive, as there is no systematic reporting guidance. Patients might be more interested in reporting what they are concerned with. Therefore, results derived from the reviews should not be directly compared with formal clinical trials. On the other hand, clinical trials are costly and very time consuming; they usually take a few years or even over a decade to finish, and their sample sizes are usually not large enough to give significant conclusions. Thus, studying patient reviews provides a valuable reference from the patients’ point of view.

For future work, it would be interesting to apply the model to find aspects relating to different segmentations of the data, such as age groups or other attributes. It would also be useful to work on aspect interpretation, as aspects are currently represented by a list of keywords. If a few sentences could be extracted or generated automatically to summarize the keywords, interpretation and understanding would be greatly improved.

REFERENCES

[1] T. O’Reilly, “What is web 2.0: Design patterns and business models for the next generation of software,” Univ. Munich, Germany, Tech. Rep. 4578, 2007.

[2] D. Giustini, “How web 2.0 is changing medicine,” BMJ, vol. 333, no. 7582, pp. 1283–1284, 2006.

[3] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proc. 10th ACM SIGKDD Int. Conf. KDD, Washington, DC, USA, 2004, pp. 168–177.

[4] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Found. Trends Inf. Ret., vol. 2, no. 1–2, pp. 1–135, Jan. 2008.

[5] A.-M. Popescu and O. Etzioni, “Extracting product features and opinions from reviews,” in Proc. Conf. Human Lang. Technol. Emp. Meth. NLP, Stroudsburg, PA, USA, 2005, pp. 339–346.

[6] L. Zhuang, F. Jing, and X. Zhu, “Movie review mining and summarization,” in Proc. 15th ACM CIKM, New York, NY, USA, 2006, pp. 43–50.

[7] Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai, “Topic sentiment mixture: Modeling facets and opinions in weblogs,” in Proc. 16th Int. Conf. WWW, New York, NY, USA, 2007, pp. 171–180.

[8] S. Moghaddam and M. Ester, “Aspect-based opinion mining from online reviews,” in Proc. Tutorial 35th Int. ACM SIGIR Conf., New York, NY, USA, 2012.

[9] B. Liu, M. Hu, and J. Cheng, “Opinion observer: Analyzing and comparing opinions on the web,” in Proc. 14th Int. Conf. WWW, New York, NY, USA, 2005, pp. 342–351.

[10] C. Lin and Y. He, “Joint sentiment/topic model for sentiment analysis,” in Proc. 18th ACM CIKM, New York, NY, USA, 2009, pp. 375–384.

[11] I. Titov and R. McDonald, “A joint model of text and aspect ratings for sentiment summarization,” in Proc. 46th Annu. Meeting ACL, 2008, pp. 308–316.

[12] S. Baccianella, A. Esuli, and F. Sebastiani, “Multi-facet rating of product reviews,” in Proc. 31st ECIR, Berlin, Germany, 2009, pp. 461–472.

[13] W. Jin, H. Ho, and R. Srihari, “Opinionminer: A novel machine learning system for web opinion mining and extraction,” in Proc. 15th ACM SIGKDD Int. Conf. KDD, New York, NY, USA, 2009, pp. 1195–1204.

[14] Y. Jo and A. Oh, “Aspect and sentiment unification model for online review analysis,” in Proc. 4th ACM Int. Conf. WSDM, New York, NY, USA, 2011, pp. 815–824.

[15] J. Sarasohn-Kahn, “The wisdom of patients: Health care meets online social media,” California Healthcare Foundation, Tech. Rep., 2009.

[16] K. Denecke and W. Nejdl, “How valuable is medical social media data? Content analysis of the medical web,” J. Inform. Sci., vol. 179, no. 12, pp. 1870–1880, 2009.

[17] X. Ma, G. Chen, and J. Xiao, “Analysis on an online health social network,” in Proc. 1st ACM Int. Health Inform. Symp., New York, NY, USA, 2010, pp. 297–306.

[18] A. Névéol and Z. Lu, “Automatic integration of drug indications from multiple health resources,” in Proc. 1st ACM Int. Health Inform. Symp., New York, NY, USA, 2010, pp. 666–673.

[19] J. Leimeister, K. Schweizer, S. Leimeister, and H. Krcmar, “Do virtual communities matter for the social support of patients? Antecedents and effects of virtual relationships in online communities,” Inform. Technol. People, vol. 21, no. 4, pp. 350–374, 2008.

[20] R. Schraefel, R. White, P. André, and D. Tan, “Investigating web search strategies and forum use to support diet and weight loss,” in Proc. 27th CHI EA, New York, NY, USA, 2009, pp. 3829–3834.

[21] J. Zrebiec and A. Jacobson, “What attracts patients with diabetes to an internet support group? A 21-month longitudinal website study,” Diabetic Med., vol. 18, no. 2, pp. 154–158, 2008.

[22] T. Mitchell, Machine Learning. Boston, MA, USA: McGraw-Hill, 1997.

[23] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proc. 20th Int. Conf. VLDB, San Francisco, CA, USA, 1994, pp. 487–499.


[24] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA, USA: MIT Press, 1999.

[25] D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Jan. 2003.

[26] T. Griffiths, M. Steyvers, D. Blei, and J. Tenenbaum, “Integrating topics and syntax,” in Proc. Adv. NIPS, 2005, pp. 537–544.

[27] D. Blei and J. Lafferty, “Correlated topic models,” in Proc. Adv. NIPS, 2006, pp. 147–154.

[28] A. McCallum and X. Wang, “Topic and role discovery in social networks with experiments on Enron and academic email,” J. Artif. Intell. Res., vol. 30, no. 1, pp. 249–272, 2007.

[29] D. Putthividhya, H. Attias, and S. Nagarajan, “Independent factor topic models,” in Proc. 26th Annu. Int. Conf. Mach. Learn., New York, NY, USA, 2009, pp. 833–840.

[30] M. Rosen-Zvi, C. Chemudugunta, T. Griffiths, P. Smyth, and M. Steyvers, “Learning author-topic models from text corpora,” ACM TOIS, vol. 28, no. 1, pp. 1–38, Jan. 2010.

[31] D. Blei and J. McAuliffe, “Supervised topic models,” in Proc. Adv. NIPS, 2007, pp. 121–128.

[32] D. Mimno and A. McCallum, “Topic models conditioned on arbitrary features with Dirichlet-multinomial regression,” in Proc. 24th Conf. Uncertain. Artif. Intell., 2008, pp. 411–418.

[33] S. Lacoste-Julien, F. Sha, and M. Jordan, “DiscLDA: Discriminative learning for dimensionality reduction and classification,” in Proc. Adv. NIPS, 2008, pp. 897–904.

[34] D. Ramage, D. Hall, R. Nallapati, and C. Manning, “Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora,” in Proc. Conf. EMNLP, Stroudsburg, PA, USA, 2009, pp. 248–256.

[35] D. Lee and H. Seung, “Algorithms for non-negative matrix factorization,” in Proc. Adv. NIPS, 2000, pp. 556–562.

[36] W. Xu, X. Liu, and Y. Gong, “Document clustering based on non-negative matrix factorization,” in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Ret., New York, NY, USA, 2003, pp. 267–273.

[37] D. Donoho and V. Stodden, “When does non-negative matrix factorization give a correct decomposition into parts?” in Proc. Adv. NIPS, 2003.

[38] H. Lee, J. Yoo, and S. Choi, “Semi-supervised nonnegative matrix factorization,” IEEE Signal Process. Lett., vol. 17, no. 1, pp. 4–7, Jan. 2010.

[39] M. Tipping and C. Bishop, “Probabilistic principal component analysis,” J. Roy. Statist. Soc., vol. 61, no. 3, pp. 611–622, 1999.

[40] D. J. Bartholomew, Latent Variable Models and Factor Analysis. London, U.K.: Charles Griffin & Co. Ltd., 1987.

[41] A. Basilevsky, Statistical Factor Analysis and Related Methods. New York, NY, USA: Wiley, 1994.

[42] K. Petersen and M. Pedersen. (2012). The Matrix Cookbook, Technical University Denmark [Online]. Available: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274

[43] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Statist. Soc. Ser. B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.

[44] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics, vol. 1, no. 6, pp. 80–83, 1945.

[45] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, Jan. 2006.

[46] Wikipedia. (2013, Aug. 25). Krippendorff’s Alpha [Online]. Available: http://en.wikipedia.org/wiki/Krippendorff’s_alpha

[47] A. Viera and J. M. Garrett, “Understanding interobserver agreement: The kappa statistic,” Family Med., vol. 37, no. 5, pp. 360–363, 2005.

Victor C. Cheng received the PhD degree from Hong Kong Baptist University in 2011 and the BEng and MPhil degrees from The Hong Kong Polytechnic University in 1990 and 1993, respectively. He is currently a postdoctoral researcher in the Department of Computer Science, Hong Kong Baptist University. His research interests include text mining, dimensionality reduction, graphical models, and Bayesian learning.

C.H.C. Leung received the BSc degree (with first class honors) in mathematics from McGill University, the MSc degree in mathematics from the University of Oxford, and the PhD degree in computer science from the University of London. Before joining the university sector, he held various technical positions in industry in the United Kingdom. Before joining Hong Kong Baptist University as a professor, he held the foundation chair in computer science at Victoria University, Melbourne, and prior to that, held an established chair in computer science at the University of London. His services to the research community include serving as a program chair, program cochair, keynote speaker, and panel expert of major international conferences. He also served on the program committee and steering committee of major international conferences. In addition to contributing to the editorship of a number of international journals, he has served as the chairman of the International Association for Pattern Recognition Technical Committee on Multimedia and Visual Information Systems, as well as on the International Standards (ISO) MPEG-7 Committee, which was responsible for generating standards for digital multimedia.

Jiming Liu received the M.Eng. and Ph.D. degrees in Electrical Engineering from McGill University, Montreal, after obtaining the Master of Arts degree from Concordia University and the B.Sc. degree in Physics from East China Normal University, Shanghai. Before 1994, he worked in the IT industry in Canada, including at the Computer Research Institute of Montreal (CRIM), Virtual Prototypes Inc. (VPI), and Knowledge Engineering Technology Inc. (KENTEK). He is currently Chair Professor in the Department of Computer Science and Associate Dean of the Faculty of Science at Hong Kong Baptist University. His research areas include data-driven models, complex systems engineering, data mining in large-scale social/complex networks, multi-agent autonomy-oriented computing (AOC), collective intelligence, emerging web intelligence (WI) services and systems, and self-adaptive and self-organizing systems software. Dr. Liu has served as Editor-in-Chief of Web Intelligence and Agent Systems, Associate Editor of IEEE Transactions on Knowledge and Data Engineering (2005-2009), IEEE Transactions on Systems, Man, and Cybernetics, Part B (2009-), and Computational Intelligence (2007-), among others, and as an Editorial Board Member of several international journals. He is the Chair of the IEEE Computer Society Technical Committee on Intelligent Informatics and Co-Director of the Web Intelligence Consortium. He is a fellow of the IEEE.

Alfredo Milani received a Laurea degree in Scienze dell’Informazione from the University of Pisa in 1987. From 1987, he was a research fellow at the National Research Council Institute for Computational Linguistics, ILC-CNR, Pisa, Italy, and a professional consultant of the University of Pisa. In 1990, he obtained the position of tenured researcher in the Department of Mathematics at the University of Perugia. Currently, he is an Associate Professor of Computer Science in the Department of Mathematics and Computer Science at the University of Perugia. His research interests involve the area of Artificial Intelligence and focus on automated planning and evolutionary algorithms, with applications to web-based adaptive systems, image processing, and, more generally, knowledge-based technologies.

