leveraging structural and semantic correspondence for ...leveraging structural and semantic...

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processingand the 9th International Joint Conference on Natural Language Processing, pages 5528–5538,Hong Kong, China, November 3–7, 2019. c©2019 Association for Computational Linguistics

5528

Leveraging Structural and Semantic Correspondence forAttribute-Oriented Aspect Sentiment Discovery

Zhe ZhangWatson Group

IBM CorporationResearch Triangle Park, NC 27703-9141

[email protected]

Munindar P. SinghDepartment of Computer ScienceNorth Carolina State University

Raleigh, NC [email protected]

AbstractOpinionated text often involves attributes suchas authorship and location that influence thesentiments expressed for different aspects. Weposit that structural and semantic correspon-dence is both prevalent in opinionated text, es-pecially when associated with attributes, andcrucial in accurately revealing its latent aspectand sentiment structure. However, it is not rec-ognized by existing approaches.

We propose Trait, an unsupervised probabilis-tic model that discovers aspects and senti-ments from text and associates them with dif-ferent attributes. To this end, Trait infersand leverages structural and semantic corre-spondence using a Markov Random Field.We show empirically that by incorporating at-tributes explicitly Trait significantly outper-forms state-of-the-art baselines both by gener-ating attribute profiles that accord with our in-tuitions, as shown via visualization, and yield-ing topics of greater semantic cohesion.

1 Introduction

Opinionated text is often associated with differ-ent attributes—latent variables that serve as ref-erence frames relative to which the underlying as-pects and sentiments are expressed. Common at-tributes in consumer reviews include author type(e.g., business traveler or tourist for hotel reviews;location for reviews of music (McDermott et al.,2016); culture on reviews of food (Bahauddin andShaarani, 2015)). Whereas current approachesconsider attributes in a one-off manner in eachapplication, we posit that attributes can be sys-tematically extracted if we can properly capturethe structural and semantic correspondence that isprevalent in opinionated text. We claim that ig-noring attributes may lead to biased inference onaspects and sentiments. As evidence, we demon-strate an approach that outperforms the state of theart and yields intuitive and cohesive topics.

We propose Trait, a general model for discover-ing attribute-oriented aspects and sentiments fromtext. By incorporating attributes, Trait automat-ically generates profiles that describe attributes interms of sentiments and aspects. To leverage struc-tural and semantic correspondence, Trait applies aMarkov Random Field as regularization over sen-tences during inference. We evaluate Trait on fourdatasets from two domains and consider three at-tributes. Trait successfully discovers aspects asso-ciated with sentiments; the generated word clus-ters are more cohesive than the state-of-the-artbaselines; the generated attribute profiles are wellcorrelated with ground truth.

Motivating Example. Figure 1 presents two ho-tel reviews from TripAdvisor. We manually as-sign aspect labels for sentences and calculate pair-wise cosine similarity between sentences usingsentence embedding from a pretrained sentenceencoding model (Cer et al., 2018).

Review A and Review B, which mention as-pects Room, Location, and Type, exhibit structuraland semantic correspondence. We posit that thecorrespondence of Location and Type is a result ofthe attribute value, Las Vegas, common to the tworeviews. Location is a crucial aspect for hotels inLas Vegas. On a randomly selected set of 5,000hotel reviews for Las Vegas, we observe that 4,281sentences from 2,624 reviews mention the location“Strip.” Using a similarity threshold of 0.6, we ob-tain 1,929 sentences similar to sentence A3 from1,519 reviews, including sentence B3 in Review B.We obtain 133 sentences similar to sentence B4from 128 reviews including A4 in Review A. Fig-ure 1 shows some of these sentences. Likewise,using authorship as an attribute, we observe thatusers stick to their writing styles. For example, inhotel reviews, some users describe the conditionof a room and others share travel tips.

5529

Walk distance to the strip

It’s only a few minute walk from the strip

Just off the strip within walk distance

It's within driving distance of the strip

It takes about #TIME to walk to the strip

Nice hotel close enough to strip to walk

We stayed in a 2 bedroom suite, very comfortable.

The staff is very friendly and helpful.

A convenient walking distance from the strip.

Great for families.

<ROOM>

<SERVICE>

<LOCATION>

<TYPE>

The rooms were spacious, … nicely decorated.

The pool area is also very nice.

It's within driving distance of the strip.

It is fantastic for family.

<ROOM>

<AMENITY>

<LOCATION>

<TYPE>

Great for families

Excellent for family with thing for the kid to do

It is perfect for our family of 5 with kid age

It is great for ages

It is great for all kids

As a family this place is perfect

0.76

0.86

0.74

0.72

0.74

0.69

0.64

0.76

0.86

A1:

A2:

A3:

A4:

Sentences similar to A3

Review A Review B

B1:

B2:

B3:

B4:

Sentences similar to B4

0.83

0.76

0.83

0.79

0.73

0.71

Figure 1: A motivating example.

Contributions and Novelty. Our contributionsinclude: (1) a general model that generates at-tribute profiles associating aspects and sentimentswith attributes in text; (2) empirical results demon-strating the benefit of incorporating attributes on amodel’s quality; and (3) empirical results demon-strating generalizability using diverse attributesand the quality of the generated attribute profiles.

Trait’s novelty lies in its ability to accommodateattributes. First, it is general across attributes asopposed to being limited to predefined attributes.Second, the handling of attributes means that Traitavoids overfitting to the more prevalent attributesin a dataset. That is, Trait can learn a more refinedconditional probability distribution that incorpo-rates specific attributes than otherwise possible.Ignoring the observable attribute variables wouldrelax the constraints on the distribution, meaningthat the learned approximate distribution would bebiased toward the majority attribute.

Summary of Findings. We demonstrate that in-corporating attributes into generative models pro-vides a superior, more refined representation ofopinionated text. The resulting model generatestopics with high semantic cohesion. We show thatMarkov Random Field can be used for effectivelycapturing structural and semantic correspondence.

2 Related Work

Generative probabilistic modeling has beenwidely applied for unsupervised text analysis.Given the observed variables, e.g., tokens in doc-uments, a generative probabilistic model defines aset of dependencies between hidden and observedvariables that encodes statistical assumptions

underlying the data. Latent Dirichlet Allocation(LDA) (Blei et al., 2003), a well-known topicmodel, represents a document as a mixture oftopics, each topic being a multinomial distributionover words. The learning process approximatesthe topic and word distributions based on theirco-occurrence in documents.

Many efforts guide the topics learned by incor-porating additional information. Rosen-Zvi et al.’s(2004) Author Topic model (AT) captures author-ship by building a topic distribution for each au-thor. When generating a word in a document, ATconditions the probability of topic assignment onthe author of the document. Kim et al.’s (2012)model captures entities mentioned in documentsand models the probability of generating a wordas conditioned on both entity and topic. Diaoand Jiang (2013) jointly model topics, events, andusers on Twitter. Trait goes beyond these mod-els by incorporating sentiments and attributes in aflexible way, which eliminates the model’s depen-dency on specific attribute types.

Several probabilistic models tackle opinion-ated text. Titov and McDonald (2008b) handleglobal and local topics in documents. JST (Linet al., 2012) and ASUM (Jo and Oh, 2011) modela review via multinomial distributions of topicsand sentiments used to condition the probabilityof generating words. Kim et al. (2013) extendASUM’s probabilistic model to discover a hierar-chical structure of aspect-based sentiments. Wanget al.’s (2016) topic model discovers aspect, senti-ment, and both general and aspect-specific opinionwords. Whereas these models identify aspects andsentiments, they disregard attribute information.

Titov and McDonald (2008a) discover top-

5530

ics using aspect ratings provided by reviewers.Mukherjee et al.’s (2014) JAST considers authorsduring aspect and sentiment discovery. Poddaret al.’s (2017) AATS jointly considers author,aspect, sentiment, and the nonrepetitive genera-tion of aspect sequences via a Bernoulli process.Zhang and Singh (2018)’s model jointly capturesaspect, sentiment, author, and discourse relations.Trait is novel in that, unlike the above models, it isnot tied to a specific attribute.

3 Model and Inference

We now introduce Trait’s model and inferencemechanism.

3.1 Sentence Embeddings

Measuring semantic similarity between sentencesis integral to capturing the structural and seman-tic correspondence among reviews: high similar-ity indicates a high degree of correspondence. Ceret al. (2018) propose a pretrained sentence encod-ing model, Universal Sentence Encoder (USE).USE is based on Vaswani et al.’s (2017) attention-based neural network. Perone et al. (2018) showUSE yields the best results among sentence em-bedding techniques on semantic relatedness andtextual similarity tasks. Trait adopts USE to gen-erate sentence embeddings and cosine similarity tomeasure semantic similarity between sentences.

3.2 Structural and Semantic Correspondence

A Markov Random Field (MRF) defines a jointprobability distribution over a set of variablesgiven the dependencies based on an undirectedgraph. The joint distribution is a factorized prod-uct of potential functions.

To capture structural and semantic correspon-dence, Trait defines an MRF over latent aspectsof sentences. Given a set of reviews Da shar-ing a common attribute a, for sentence l in Da,Trait creates its corresponding sentence set L byadding sentence li in Da if the semantic similar-ity between sentence li and l is larger than a presetthreshold ρ. For each pair of l and li, Trait createsan undirected edge between the aspects associatedwith the two sentences (tl, tli). To promote l andsentences in L having a high probability of associ-ating with the same aspect, Trait defines a binaryedge potential, exp{I(tl, tli)}, where I(·) is an in-dicator function. This binary potential produces alarge value if the two sentences have the same as-

pect and a small value otherwise. Given attributea, sentiment s, and a document consisting of Nsentences, Trait computes the joint probability ofaspect assignments of sentences as:

p(t|ψs,a,λ) =N∏l

p(tl|ψs,a)

exp

{λ

∑(tl,tli )∈El

I(tl = tli)

|El|

},

(1)

where ψs,a is the aspect distribution given senti-ment s and attribute a; parameter λ controls thereinforcing effects of correspondence regulariza-tion; and El is the set of undirected edges for l.

3.3 Generative ProcessTo capture the desired associations, given an at-tribute type, Trait generates a mixture over senti-ments and aspects for each attribute value. Traitassumes that reviews are mixtures of sentimentsand considers sentences the basic unit for asentiment-aspect pair.

D

S

MN

T

w

↵

�

AS

�

✓

�

a

…

Markov Random Field

tln

sl tl

tl1 tl2

Figure 2: Generative process of Trait.

Figure 2 shows Trait’s model. Hyperparameterα is the Dirichlet (Dir(·)) prior of the word distri-bution φ; β is the Dirichlet prior of the sentimentdistribution θ; and γ is the Dirichlet prior of theaspect distribution ψ. Given a set of reviews D as-sociated with a set of attribute values A over a setof aspects T and a set of sentiments S, each reviewcontains M sentences and each sentence containsN words. Trait’s generative process is as follows.

First, for each pair of aspect t and sentiments, draw a word distribution φt,s ∼ Dir(α). Sec-ond, for each attribute value a and each sentiment

5531

s, draw an aspect distribution ψs,a ∼ Dir(γ).Third, given a review d with attribute a, drawa sentiment distribution θd ∼ Dir(β), and foreach sentence in d, (1) choose a sentiment s ∼Multinomial(θd); (2) given s, choose an aspectt ∼ Multinomial(ψs,a); (3) given t and s, sampleword w ∼ Multinomial(φt,s).

Trait estimates p(s, t|w, a), the posterior distri-bution of latent variables, sentiments s, and as-pects t, given all words used in reviews involvingattribute a. We factor the joint probability of as-signments of sentiments, aspects, and words for a:

p(s, t,w|a) = p(w|s, t)p(t|s, a)p(s). (2)

3.4 InferenceWe use collapsed Gibbs sampling (Liu, 1994) forposterior inference. By integrating over Φ ={φi}S×Ti=1 , we obtain Equation 2’s first term (Sec-tion 4.1 explains αv).

p(w|s,t,α)=

∫p(w|s,t,Φ)p(Φ|α)dΦ

=

(Γ(∑W

v=1αv)∏Wv=1Γ(αv)

)S×T

×S∏

s=1

T∏t=1

∏Wv=1Γ(nvs,t+αv)

Γ[∑W

v=1(nvs,t+αv)

],(3)

whereW is the vocabulary size; nvs,t is the numberof occurrences of word v assigned to sentiment sand aspect t; and Γ(·) is the Gamma function.

Next, by integrating over Ψa = {ψi}Si=1, wecalculate the second term in Equation 2 as (Sec-tion 4.1 explains γt):

p(t|s,γ,a)=

∫p(t|s,Ψa,a)p(Ψa|γ)dΨa

=

(Γ(∑T

t=1γt)∏Tt=1Γ(γt)

)S

×S∏

s=1

∏Tt=1Γ(nts,a+γt)

Γ[∑T

t=1(nts,a+γt)

]×

M∏m=1

exp

{λ

∑tj∈Lm

I(tj=t)

|Lm|

},

(4)

where nts,a equals the number of sentences in re-views associated with attribute a, sentiment s, andaspect t; M is the number of sentences in reviews;Lm is the set of sentences corresponding to sen-tence m.

Similarly, for the third term in Equation 2, byintegrating over Θ = {θi}Di=1, we obtain (Sec-tion 4.1 explains βs):

p(s|β)=

∫p(s|Θ)p(Θ|β)dΘ

=

(Γ(∑S

s=1βs)∏Ss=1Γ(βs)

)D

×D∏

d=1

∏Ss=1Γ(nsd+βs)

Γ[∑S

s=1(nsd+βs)

] ,(5)

where D is the number of reviews; nsd is the num-ber of times that a sentence from review d is asso-ciated with sentiment s; and nd is the number ofsentences in review d.

For each sweep of a Gibbs iteration, we samplelatent aspect t and sentiment s as follows:

p(si = s, ti = t|s−i, t−i,w, a)

∝nts,a,−i + γt∑T

t=1(nts,a,−i + γt)

×nsd,−i + βs∑S

s=1(nsd,−i + βs)

×∏

v∈Wi

∏Civ−1

c=0 (nvs,t,−i + αv + c)∏Ci−1c=0 (ns,t,−i +

∑Wv=1 αv + c)

× exp{λ

∑tj∈Li

I(tj = t)

|Li|

},

(6)

where nts,a is the number of sentences from re-views associated with attribute a, sentiment s, andaspect t; nsd is the number of sentences from re-view d associated with sentiment s; Wi is the setof words in sentence i. Ci

v is the count of word v insentence i; Ci is the number of words in sentencei; nvs,t is the number of words v assigned senti-ment s and aspect t; ns,t is the number of wordsassigned sentiment s and aspect t in all reviews; Li

is the set of corresponding sentences of sentence i;and an index of −i indicates excluding sentence ifrom the count.

Equations 7, 8, and 9, respectively, approximatethe probabilities of word w occurring given senti-ment s and aspect t; of aspect t of a sentence oc-curring given sentiment s and attribute a; of senti-ment s occurring given document d.

φs,t,w =nws,t + αw∑W

v=1(nws,t + αv)

, (7)

ψs,t,a =nts,a + γt∑T

t=1(nts,a + γt)

, (8)

θd,s =nsd + βs∑S

s=1(nsd + βs)

. (9)

The generalized Polya Urn model (Mahmoud,2008) has been used for encoding word co-

5532

occurrence information into topic models. Con-sider an urn containing a mixture of balls, eachof which is tagged with a term, for each samplingsweep we draw a ball from the urn. In a stan-dard Polya Urn model, as used in LDA, we returnthe ball to the urn with another ball tagged withthe same term. This process provides burstinessof the probability of seeing a term but ignores thecovariance. The probability increase of one termdecreases the probability of the other words. Inthe generalized Polya Urn model, when a ball isdrawn from the urn, we replace it with two newballs with a set of balls tagged with related terms.Similar to previous models (Mimno et al., 2011;Chen et al., 2013; Fei et al., 2014; Zhang andSingh, 2018), to increase the probability of hav-ing semantically related words appear in the sametopic, Trait applies a generalized Polya Urn modelin each Gibbs sweep and uses weight ε to promoterelated words based on the cosine similarity be-tween their Word2Vec (Mikolov et al., 2013) em-beddings.

4 Evaluation

To assess Trait’s effectiveness, we select the hoteland restaurant domains and prepare four reviewdatasets associated with three attributes: author,trip type, and location. HotelUser, HotelType,and HotelLoc are sets of hotel reviews collectedfrom TripAdvisor. HotelUser contains 28,165 re-views posted by 202 randomly selected review-ers, each of whom contributes at least 100 hotelreviews. HotelType contains reviews associatedwith five trip types including business, couple,family, friend, and solo. HotelLoc contains a totalof 136,446 reviews about seven US cities, split ap-proximately equally. ResUser is a set of restaurantreviews from Yelp Dataset Challenge (2019). Itcontains 23,874 restaurant reviews posted by 144users, each of whom contributes at least 100 re-views. Table 1 summarizes our datasets. Datasetsand source code are available for research pur-poses (Trait, 2019).

Table 1: Summary of the evaluation datasets.

Statistic HotelUser ResUser HotelType HotelLoc

# of reviews 28,165 23,873 22,984 136,446# of sentences 362,153 276,008 302,920 1,428,722Sentence/Review 13 12 13 10Words/Sentence 8 7 7 7

We remove stop words and HTML tags, expandtypical abbreviations, and mark special named en-tities using a rule-based algorithm (e.g., replace aURL by #LINK# and replace a monetary amountby #MONEY#) and the Stanford named entity rec-ognizer (Finkel et al., 2005). We use Porter’s(1980) stemming algorithm. To handle negation,for any word pair whose first word is no, not, ornothing, we replace the word pair by a negatedterm, e.g., producing not work and not quiet. Fi-nally, we split each review into constituent sen-tences.

4.1 Parameter SettingsTrait includes three manually tuned hyperparam-eters that have a smoothing effect on the corre-sponding multinomial distributions. Hyperparam-eter α is the Dirichlet prior of the word distribu-tion. We use asymmetric priors based on a sen-timent lexicon. Table 2 shows Trait’s sentimentword list as prior knowledge to set asymmetric pri-ors. This list extends Turney and Littman’s (2003)list with additional general sentiment words. Forany word in the positive list, we set α to 0 ifthis word appears in a sentence assigned a nega-tive sentiment, and to 5 if this word appears in asentence assigned a positive sentiment, and con-versely for words in the negative list. For all re-maining words, we set α to 0.05. We set hyper-parameter β, the Dirichlet prior of the sentimentdistribution, to 5 for both sentiments. We set hy-perparameter γ, the Dirichlet prior of the aspectdistribution, to 50/T for all models, where T isthe number of aspects. We set the reinforcementweight of structural and semantic correspondenceλ to 1.0; sentence semantic similarity ρ to 0.7; and,related word promoting weights to 0.3 for hotel re-views and 0.1 for restaurant reviews.

4.2 Quantitative EvaluationWhether topics (word clusters) are semanticallycohesive is crucial in assessing topic modeling ap-proaches. As in previous studies (Nguyen et al.,2015a,b; Yang et al., 2017), we adopt NormalizedPointwise Mutual Information (NPMI) (Lau et al.,2014) and W2V (O’Callaghan et al., 2015) as ourevaluation metrics. Higher NPMI and W2V scoresindicate greater semantic cohesion. We compareTrait with four baselines: AT, JST, ASUM, andAATS. We perform our evaluation on HotelUserand ResUser based on the top 20 words in eachsentiment-aspect pair. We split data into five folds

5533

Table 2: Lists of sentiment words.

Positive

amazing, attractive, awesome, best, comfort-able, correct, enjoy, excellent, fantastic, fa-vorite, fortunate, free, fun, glad, good, great,happy, impressive, love, nice, not bad, perfect,positive, recommend, satisfied, superior, thank,worth

Negative

annoying, bad, complain, disappointed, hate,inferior, junk, mess, nasty, negative, not good,not like, not recommend, not worth, poor,problem, regret, slow, small, sorry, terri-ble, trouble, unacceptable, unfortunate, upset,waste, worst, worthless, wrong

and use training split to train all models. Foreach number of aspects, we conduct a two-tailedpaired t-test for each of the pairwise comparisons.Throughout, ∗, †, and ‡ indicate significance at0.05, 0.01, and 0.001, respectively.

Table 3 shows average NMPI and W2V scoresfor different numbers of aspects. AT performsworst, possibly due to missing conditions on sen-timents. ASUM and JST are comparable. Traitoutperforms all others, with the highest NMPI andW2V scores for each number of aspects. Table 4shows similar conclusions for restaurant reviews.

For both datasets, Trait’s improvements of topiccoherence over baseline models are statisticallysignificant for HotelUser (p< 0.001) and ResUser(p = 0.002). Trait allows reviews written by thesame or similar authors to have idiosyncratic pref-erences over aspects and sentiments. Trait assignsaspects to sentences by sampling attribute-specificaspect distributions. These distributions are regu-larized by the Markov Random Fields. Sentenceswith a high degree of correspondence have a highprobability to be assigned the same aspects.

4.3 Sentiment Classification

Automatically detecting the sentiment of a docu-ment is an important task in sentiment analysis.We compare Trait with JST, ASUM, and AATSfor document-level sentiment classification usingHotelUser and ResUser. We use integer ratingsof reviews to collect ground-truth labels. Reviewswith ratings at three and above are labeled as pos-itive and the rest are labeled as negative. Notethat our datasets are imbalanced. We conduct five-

Table 3: Topic coherence: Hotel reviews.

NPMI T=10 T=20 T=30 T=40 T=50 T=60

AT 3.64 4.04 4.37 4.49 4.86 5.14AATS 5.63 9.08 10.41 10.78 11.05 11.00JST 8.99 10.78 11.45 11.54 11.56 11.46ASUM 9.48 10.64 11.02 11.33 11.39 11.56Trait 15.50‡ 16.91‡ 17.31‡ 17.32‡ 16.46‡ 15.34‡

W2V T=10 T=20 T=30 T=40 T=50 T=60


Table 4: Topic coherence: Restaurant reviews.

NPMI T=10 T=20 T=30 T=40 T=50 T=60

AT 5.64 5.21 5.30 5.65 6.54 7.94AATS 6.05 8.02 9.03 9.35 9.90 9.95JST 9.46 11.13 11.73 11.92 12.14 12.31ASUM 8.81 9.7 9.92 10.09 10.07 10.04Trait 11.27‡ 13.02‡ 13.62‡ 13.18‡ 12.36 11.95

W2V T=10 T=20 T=30 T=40 T=50 T=60


fold cross-validation with the two-tailed paired t-test. For each user, we use 80% of reviews fortraining and 20% for testing. For evaluation met-rics, we adopt accuracy (Acc) and area under thecurve (AUC) of the Receiver Operating Character-istic (ROC) curve. ROC plots the true positive rateagainst the false positive rate. AUC-ROC is a stan-dard metric for evaluating classifiers’ performanceon imbalanced data.

Table 5: Accuracy and AUC of sentiment classificationon hotel reviews.

T=20 T=40 T=60

Acc AUC Acc AUC Acc AUC

AATS 0.79 0.45 0.82 0.48 0.84 0.48JST 0.61 0.82 0.64 0.83 0.67 0.84ASUM 0.80 0.83 0.84 0.84 0.87 0.83Trait 0.85† 0.86* 0.87* 0.85* 0.88 0.86†

Table 5 reports the results of sentiment classi-fication on hotel reviews. AATS achieves betteraccuracy but worse AUC scores than JST. ASUMyields better accuracy and comparable AUC scorescompared with JST. Trait consistently outperformsall baseline models given different aspect numberswith an average gain in accuracy of 3%. Incorpo-rating attributes and structural and semantic cor-

5534

respondence into conditional probability distribu-tions greatly benefit the model in capturing depen-dencies among attributes, aspects, and sentiments.

Table 6: Accuracy and AUC of sentiment classificationon restaurant reviews.

T=20 T=40 T=60

Acc AUC Acc AUC Acc AUC

AATS 0.79 0.47 0.80 0.48 0.82 0.50JST 0.59 0.71 0.61 0.73 0.64 0.73ASUM 0.80 0.78 0.84 0.78 0.87 0.74Trait 0.86† 0.79 0.87† 0.78 0.88 0.79*

4.4 Attribute Profile

Given reviews with selected attributes, we expectTrait to generate profiles representing the charac-teristics associated with those attributes. To evalu-ate profiles, we run Trait on HotelUser, HotelLoc,and HotelType, associated with three attributes:authors, locations, and trip types, respectively.

4.4.1 SummarizationTrait outputs profiles that summarize attributes interms of aspects and sentiments in reviews.

Figure 3 shows the profiles of four US citiesgenerated by Trait. We visualize the profiles asaspect-clouds using the top 30 aspects in each sen-timent. The size of the aspect label corresponds toits aspect probabilities. Due to space constraints,we place the profiles of Boston, Chicago, and Or-lando in Table 7.

These profiles yield salient summaries for eachcity. For example, Strip and Casino are the toptwo positive aspects for Las Vegas, a resort cityfor gambling. We see from the reviews that mosthotels with high ratings are located on the Strip.For Boston, Chicago, and New York, Location andPublicTrans are the top positive aspects. Thesecities rank top on the lists of U.S. cities with hightransit ridership (Ridership, 2019) and walkability(Walkability, 2019). Hotels’ proximity to publictransportation, shopping, restaurants, and attrac-tions is appealing to several reviewers. We seeRoomSize appears in the top five negative aspects.These three cities have among the most expensivehotel room rates (Statista, 2019). Assuming con-sumers expect more when they pay more, we con-jecture that a failed expectation could be causedby room size, especially in New York, where roomsizes are smaller than elsewhere in the US (NYC,2019). For Miami, Transportation is attractive,

Table 7: Top five aspects discovered by Trait.

Boston (P) Boston (N) Chicago (P) Chicago (N)

PublicTrans RoomSize Location RoomSizeLocation Service PublicTrans Service

Decor Parking Decor ParkingBar RoomNoise Bar Elevator

Service StreetNoise Service ACHotwater

Orlando (P) Orlando (N) Friend (P) Friend (N)

ThemePark Pool Tour UpgradePool Transportation Value StaffFood Supplies View BreakfastArea

Amenity Frontdesk Breakfast BathroomOverall Bed Location Breakfast

Author A (P) Author A(N) Author B (P) Author B (N)

Helpfulness NotReturn Helpfulness NotReturnView Value View ValueValue Charge Breakfast Checkin

Breakfast Checkout Value HotelSizeComfort Checkin Transportation Checkout

Author C (P) Author C (N) Author D (P) Author D (N)

Internet Elevator Staff ValueComfort Room Room NotReturnTripType Food Recommend Checkout

View StreetNoise Value RoomBreakfast Checkout TripType Charge

presumably because many cruises depart from Mi-ami.

Figure 4 shows the results for HotelType:Cleanliness, Internet, TV, and Upgrade are mostlikely to lead to a negative sentiment for businesstravelers. For couples, Atmosphere and RestAreaare most preferred positive aspects. Family, Tour,and Attraction are most positive aspects for fam-ilies. Solo travelers, on business or tourism, ex-press most opinions toward both Transportation.

Table 7 (bottom rows) lists the top five aspectsfor four authors from HotelUser. We can ob-serve strong commonality between Author A andB, They both like to express positive sentiment onHelpfulness, View, Value, and Breakfast. They arelike to express not returning a hotel and the nega-tive sentiments are mostly toward Value, Checkin,and Checkout. There is little commonality be-tween Authors C and D. For Author C, Internetand Comfort are most attractive whereas Staff andRoom are most appealing aspects for Author D.In terms of negatives, Checkout is the only aspectshared between Authors C and D.

4.4.2 Similarity

Attribute profiles can be used not only for summa-rization, but also for measuring similarity betweenattribute values with respect to aspects and senti-ments, which can support attribute-based applica-

5535

Las Vegas New York Los Angeles Miami

Figure 3: An aspect-cloud visualization of US cities (positive aspects above; negative aspects below).

Business Couple Solo Family

Figure 4: An aspect-cloud visualization of trip types. (positive aspects above; negative aspects below).

tions such as recommender systems. Our metric ofsimilarity between distinct values of the same at-tribute is the Jensen-Shannon distance (JSD) (En-dres and Schindelin, 2003), the square root ofJensen-Shannon divergence. We compute, DJS,the JSD of attribute profiles P and Q as

DJS =

√1

2DKL(P ||M) +

1

2DKL(Q||M)

M =1

2(P +Q),

(10)

where DKL(P ||Q) is the Kullback-Leibler (KL)divergence of probability distributions P ={p1, . . . , pn} and Q = {q1, . . . , qn}:

DKL(P ||Q) =∑i

pi logpiqi

. (11)

As a baseline, we use a vector space modelbased on USE sentence embeddings. We calculatemean sentence embeddings for each review. Then,given two sets of reviews, D = {d1, . . . , dm} andR = {r1, . . . , rn}, we compute their similarity asfollows (here sim(di, rj) is the cosine similaritybetween review di and rj):

SD,R =1

m+ n

m∑i=1

n∑j=1

sim(di, rj). (12)

Figure 5a shows the similarities among the pro-files of the seven cities generated by the baselinemodel. We see that Boston is close to New Yorkand Chicago; Las Vegas is far away from Los An-geles and Miami but close to New York, Boston,and Chicago. Figure 5b shows the results gener-ated by Trait. Here, dissimilarity corresponds todistance normalized to [0, 1]. Boston, Chicago,and New York are close to each other, as are LosAngeles and Miami; Las Vegas is far from eachof the others; and, Orlando is far from all exceptMiami. Trait’s results are arguably more plausiblethan what the baseline approach produces.

As discussed earlier, hotels in Boston, Chicago,and New York have common characteristics; LasVegas differs strongly from the others because itis a resort city and its major hotels are combinedwith casinos; Orlando is a tourism destination butdiffers from Las Vegas in that it is famous for localattractions, such as theme parks. Orlando’s profileexhibits that travelers there tend to be more awareof aspect Attraction than elsewhere. An interest-ing pair is Los Angeles and Miami. We see thatLocation appears as the most important aspect onthe positive side for both of them. Such a simi-larity could be partially explained by the fact thatboth Los Angeles and Miami serve as locations fortaking cruises. Also, the common aspect Safety

5536

Las Vegas

Los Angeles

Miami

Orlando

1 of 1 3/4/19, 2:43 PM

New YorkBoston

Chicago

(a) Baseline.

BostonChicago

Las Vegas

Los AngelesMiami

New York

Orlando

1 of 1 3/4/19, 2:39 PM

(b) Trait.

Figure 5: Force fields of similarities: cities.(negative for both) could increase their similarity.

Figure 6a shows similarities among the five triptypes generated by the baseline model. Businessis closer to Family than Solo and Friend; Solo iscloser to Family than Friend. Trait generates morereasonable results, as shown in Figure 6b. Busi-ness is far from others but is closer to Friend andSolo than to Couple and Family. Couple, Fam-ily, and Friend are relatively close to each other.Business reviewers attend to different aspects fromother reviewers. Further, Solo and Friend containreviews of business trips, although the authors didnot select Business as the trip type. However, thissituation does not happen for Couple and Family.

Figure 7 shows similarities among 20 authors,including the four authors mentioned in Sec-tion 4.4.1. We see that Authors A and B are closeto each other whereas Authors C and D are faraway from each other. The results are aligned withtheir aspect and sentiment profiles.

5 Discussion and Conclusion

Trait not only shows that capturing structural andsemantic correspondence leads to improved per-formance in terms of coherence and naturalnessof the aspects discovered but can also be realizedin an unsupervised framework. Trait outperformscompeting approaches across multiple datasets.

These results open up interesting directions for

Family

Friend

Couple

Business

Solo

1 of 1 3/4/19, 2:49 PM

(a) Baseline.

BusinessCouple

Family

Friend

Solo

1 of 1 3/4/19, 2:27 PM

(b) Trait.

Figure 6: Force fields of similarities: Purpose.

Author AAuthor B

Author C

Author D

1 of 1 3/4/19, 1:22 PM

Figure 7: Force field of similarities: 20 authors.future work. One direction is to learn disentan-gled latent representations for attributes in neuralnetwork’s space, such as for disentangling aspects(Jain et al., 2018), text style (John et al., 2019),and syntax and semantics (Chen et al., 2019; Baoet al., 2019). Another direction is to develop acontent-based recommender based on Trait, sinceit provides an effective unsupervised solution forgenerating profiles based on different attributes.

6 Acknowledgments

We would like to thank the anonymous reviewersfor helpful comments and corrections. MPS wouldlike to acknowledge the US Department of De-fense for partial support through the NCSU Lab-oratory for Analytic Sciences.

5537

ReferencesAhmad Riduan Bahauddin and Sharifudin Md.

Shaarani. 2015. The impact of geographical loca-tion on taste sensitivity and preference. Interna-tional Food Research Journal, 22(2):731–738.

Yu Bao, Hao Zhou, Shujian Huang, Lei Li, Lili Mou,Olga Vechtomova, Xin-yu Dai, and Jiajun Chen.2019. Generating sentences from disentangled syn-tactic and semantic spaces. In Proceedings of the57th Annual Meeting of the Association for Compu-tational Linguistics (ACL), pages 6008–6019, Flo-rence.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan.2003. Latent Dirichlet allocation. Journal of Ma-chine Learning Research, 3:993–1022.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua,Nicole Limtiaco, Rhomni St. John, Noah Constant,Mario Guajardo-Cespedes, Steve Yuan, Chris Tar,Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil.2018. Universal sentence encoder for English.In Proceedings of the 23rd Conference on Em-pirical Methods in Natural Language Processing(EMNLP): System Demonstrations, pages 169–174,Brussels.

Mingda Chen, Qingming Tang, Sam Wiseman, andKevin Gimpel. 2019. A multi-task approach for dis-entangling syntax and semantics in sentence repre-sentations. In Proceedings of the 17th Annual Con-ference of the North American Chapter of the Asso-ciation for Computational Linguistics: Human Lan-guage Technologies (NAACL-HLT), pages 2453–2464, Minneapolis.

Zhiyuan Chen, Arjun Mukherjee, Bing Liu, MeichunHsu, Malu Castellanos, and Riddhiman Ghosh.2013. Exploiting domain knowledge in aspect ex-traction. In Proceedings of the 18th Conference onEmpirical Methods in Natural Language Processing(EMNLP), pages 1655–1667, Seattle.

Qiming Diao and Jing Jiang. 2013. A unified modelfor topics, events and users on Twitter. In Proceed-ings of the 18th Conference on Empirical Methodsin Natural Language Processing (EMNLP), pages1869–1879, Seattle.

Dominik Maria Endres and Johannes E. Schindelin.2003. A new metric for probability distribu-tions. IEEE Transactions on Information Theory,49(7):1858–1860.

Geli Fei, Zhiyuan Chen, and Bing Liu. 2014. Re-view topic discovery with phrases using the PolyaUrn model. In Proceedings of the 25th Inter-national Conference on Computational Linguistics(COLING), pages 667–676, Dublin.

Jenny Rose Finkel, Trond Grenager, and Christo-pher D. Manning. 2005. Incorporating non-localinformation into information extraction systems byGibbs sampling. In Proceedings of the 43rd Annual

Meeting of the Association for Computational Lin-guistics (ACL), pages 363–370, Ann Arbor.

Sarthak Jain, Edward Banner, Jan-Willem van deMeent, Iain J. Marshall, and Byron C. Wallace.2018. Learning disentangled representations of textswith application to biomedical abstracts. In Pro-ceedings of the 23rd Conference on Empirical Meth-ods in Natural Language Processing (EMNLP): Sys-tem Demonstrations, pages 4683–4693, Brussels.

Yohan Jo and Alice Haeyun Oh. 2011. Aspect and sen-timent unification model for online review analysis.In Proceedings of the 4th ACM International Con-ference on Web Search and Data Mining (WSDM),pages 815–824, Hong Kong.

Vineet John, Lili Mou, Hareesh Bahuleyan, and OlgaVechtomova. 2019. Disentangled representationlearning for non-parallel text style transfer. In Pro-ceedings of the 57th Annual Meeting of the Asso-ciation for Computational Linguistics (ACL), pages424–434, Florence.

Hyungsul Kim, Yizhou Sun, Julia Hockenmaier, andJiawei Han. 2012. ETM: Entity topic models formining documents associated with entities. In Pro-ceedings of the 12th IEEE International Conferenceon Data Mining (ICDM), pages 349–358, Brussels.

Suin Kim, Jianwen Zhang, Zheng Chen, Alice H.Oh, and Shixia Liu. 2013. A hierarchical aspect-sentiment model for online reviews. In Proceed-ings of the 27th AAAI Conference on Artificial In-telligence (AAAI), pages 804–812, Bellevue.

Jey Han Lau, David Newman, and Timothy Baldwin.2014. Machine reading tea leaves: Automaticallyevaluating topic coherence and topic model quality.In Proceedings of the 14th Conference of the Euro-pean Chapter of the Association for ComputationalLinguistics (EACL), pages 530–539, Gothenburg.

Chenghua Lin, Yulan He, Richard Everson, and Ste-fan M. Ruger. 2012. Weakly supervised jointsentiment-topic detection from text. IEEE Trans-actions on Knowledge and Data Engineering,24(6):1134–1145.

Jun S. Liu. 1994. The collapsed Gibbs sampler inBayesian computations with applications to a generegulation problem. Journal of the American Statis-tical Association, 89(427):958–966.

Hosam Mahmoud. 2008. Polya Urn Models. Texts inStatistical Science. Chapman & Hall/CRC, London.

Josh H. McDermott, Alan F. Schultz, Eduardo A Un-durraga, and Ricardo A. Godoy. 2016. Indifferenceto dissonance in native Amazonians reveals culturalvariation in music perception. Nature, 535:547–550.

https://doi.org/10.1162/jmlr.2003.3.4-5.993

https://doi.org/10.18653/v1/D18-2029

https://doi.org/10.18653/v1/N19-1254

https://doi.org/10.18653/v1/N19-1254

https://doi.org/10.18653/v1/N19-1254

https://doi.org/10.1109/TIT.2003.813506

https://doi.org/10.1109/TIT.2003.813506

https://doi.org/10.3115/1219840.1219885

https://doi.org/10.3115/1219840.1219885

https://doi.org/10.3115/1219840.1219885

https://doi.org/10.18653/v1/D18-1497

https://doi.org/10.18653/v1/D18-1497

https://doi.org/10.1145/1935826.1935932

https://doi.org/10.1145/1935826.1935932

https://doi.org/10.1109/ICDM.2012.107

https://doi.org/10.1109/ICDM.2012.107

https://doi.org/10.3115/v1/E14-1056

https://doi.org/10.3115/v1/E14-1056

https://doi.org/10.1109/TKDE.2011.48

https://doi.org/10.1109/TKDE.2011.48

https://doi.org/10.2307/2290921

https://doi.org/10.2307/2290921

https://doi.org/10.2307/2290921

https://doi.org/10.1038/nature18635



5538

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S.Corrado, and Jeffrey Dean. 2013. Distributed rep-resentations of words and phrases and their compo-sitionality. In Proceedings of the 27th Annual Con-ference on Neural Information Processing Systems(NIPS), pages 3111–3119, Lake Tahoe.

David M. Mimno, Hanna M. Wallach, Edmund M.Talley, Miriam Leenders, and Andrew McCallum.2011. Optimizing semantic coherence in topic mod-els. In Proceedings of the 16th Conference on Em-pirical Methods in Natural Language Processing(EMNLP), pages 262–272, Edinburgh.

Subhabrata Mukherjee, Gaurab Basu, and SachindraJoshi. 2014. Joint author sentiment topic model. InProceedings of the 14th International Conference onData Mining (SDM), pages 370–378, Philadelphia.

Dat Quoc Nguyen, Richard Billingsley, Lan Du, andMark Johnson. 2015a. Improving topic modelswith latent feature word representations. Transac-tions of the Association for Computational Linguis-tics (TACL), 3:299–313.

Thang Nguyen, Jordan L. Boyd-Graber, Jeffrey Lund,Kevin D. Seppi, and Eric K. Ringger. 2015b. Is youranchor going up or down? Fast and accurate super-vised topic models. In Proceedings of the 14th An-nual Conference of the North American Chapter ofthe Association for Computational Linguistics: Hu-man Language Technologies (NAACL-HLT), pages746–755, Denver.

NYC. 2019. About New York hotels.http://www.nyc.com/visitor guide/about new yorkhotels.703528/editorial review.aspx. Accessed:05/20/2019.

Derek O’Callaghan, Derek Greene, Joe Carthy, andPadraig Cunningham. 2015. An analysis of the co-herence of descriptors in topic modeling. ExpertSystems with Applications, 42(13):5645–5657.

Christian S. Perone, Roberto Silveira, and Thomas S.Paula. 2018. Evaluation of sentence embeddingsin downstream and linguistic probing tasks. CoRR,abs/1806.06259.

Lahari Poddar, Wynne Hsu, and Mong-Li Lee. 2017.Author-aware aspect topic sentiment model to re-trieve supporting opinions from reviews. In Pro-ceedings of the 22nd Conference on Empirical Meth-ods in Natural Language Processing (EMNLP),pages 472–481, Copenhagen.

Martin F. Porter. 1980. An algorithm for suffix strip-ping. Program: Electronic Library and InformationSystems, 14(3):130–137.

Ridership. 2019. List of U.S. cities with high tran-sit ridership. https://en.wikipedia.org/wiki/List ofU.S. cities with high transit ridership. Accessed:08/24/2019.

Michal Rosen-Zvi, Thomas L. Griffiths, MarkSteyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In Pro-ceedings of the 20th Conference in Uncertainty inArtificial Intelligence (UAI), pages 487–494, Banff,Canada.

Statista. 2019. A ranking of the most expensivecities in the US. http://www.statista.com/statistics/214585/most-expensive-cities-in-the-us-ordered-by-hotel-prices-2010/. Accessed: 05/20/2019.

Ivan Titov and Ryan T. McDonald. 2008a. A jointmodel of text and aspect ratings for sentiment sum-marization. In Proceedings of the 46th AnnualMeeting on Association for Computational Linguis-tics (ACL), pages 308–316, Columbus, Ohio.

Ivan Titov and Ryan T. McDonald. 2008b. Modelingonline reviews with multi-grain topic models. InProceedings of the 17th International Conference onWorld Wide Web (WWW), pages 308–316, Beijing.

Trait. 2019. https://research.csc.ncsu.edu/mas/code/trait/. Accessed: 08/17/2019.

Peter D. Turney and Michael L. Littman. 2003. Mea-suring praise and criticism: Inference of semanticorientation from association. ACM Transactions onInformation Systems, 21(4):315–346.

Ashish Vaswani, Noam Shazeer, Niki Parmar, JakobUszkoreit, Llion Jones, Aidan N. Gomez, LukaszKaiser, and Illia Polosukhin. 2017. Attention is allyou need. In Proceedings of the 31st Annual Con-ference on Neural Information Processing Systems(NIPS), pages 6000–6010, Long Beach.

Walkability. 2019. Most walkable cities in theUnited States. https://www.walkscore.com/cities-and-neighborhoods/. Accessed: 08/24/2019.

Shuai Wang, Zhiyuan Chen, and Bing Liu. 2016. Min-ing aspect-specific opinion using a holistic lifelongtopic model. In Proceedings of the 25th Inter-national Conference on World Wide Web (WWW),pages 167–176, Montreal.

Weiwei Yang, Jordan L. Boyd-Graber, and PhilipResnik. 2017. Adapting topic models using lexicalassociations with tree priors. In Proceedings of the22nd Conference on Empirical Methods in NaturalLanguage Processing (EMNLP), pages 1901–1906,Copenhagen.

Yelp. 2019. Yelp dataset challenge. https://www.yelp.com/dataset challenge/. Accessed:05/20/2019.

Zhe Zhang and Munindar P. Singh. 2018. Lim-bic: Author-based sentiment aspect modeling reg-ularized with word embeddings and discourse re-lations. In Proceedings of the 23rd Conference onEmpirical Methods in Natural Language Processing(EMNLP), pages 3412–3422, Brussels.

https://doi.org/10.1137/1.9781611973440.43

https://doi.org/10.3115/v1/N15-1076

https://doi.org/10.3115/v1/N15-1076

https://doi.org/10.3115/v1/N15-1076

http://www.nyc.com/visitor_guide/about_new_york_hotels.703528/editorial_review.aspx

http://www.nyc.com/visitor_guide/about_new_york_hotels.703528/editorial_review.aspx

https://doi.org/10.1016/j.eswa.2015.02.055

https://doi.org/10.1016/j.eswa.2015.02.055

http://arxiv.org/abs/1806.06259

http://arxiv.org/abs/1806.06259

https://doi.org/10.18653/v1/D17-1049

https://doi.org/10.18653/v1/D17-1049

https://doi.org/10.1108/eb046814

https://doi.org/10.1108/eb046814

https://en.wikipedia.org/wiki/List_of_U.S._cities_with_high_transit_ridership

https://en.wikipedia.org/wiki/List_of_U.S._cities_with_high_transit_ridership

http://www.statista.com/statistics/214585/most-expensive-cities-in-the-us-ordered-by-hotel-prices-2010/



https://doi.org/10.1145/1367497.1367513

https://doi.org/10.1145/1367497.1367513

https://research.csc.ncsu.edu/mas/code/trait/

https://research.csc.ncsu.edu/mas/code/trait/

https://doi.org/10.1145/944012.944013

https://doi.org/10.1145/944012.944013

https://doi.org/10.1145/944012.944013

https://www.walkscore.com/cities-and-neighborhoods/

https://www.walkscore.com/cities-and-neighborhoods/

https://doi.org/10.1145/2872427.2883086

https://doi.org/10.1145/2872427.2883086

https://doi.org/10.1145/2872427.2883086

https://doi.org/10.18653/v1/D17-1203

https://doi.org/10.18653/v1/D17-1203

https://www.yelp.com/dataset_challenge/

https://www.yelp.com/dataset_challenge/

https://doi.org/10.18653/v1/D18-1378

https://doi.org/10.18653/v1/D18-1378

https://doi.org/10.18653/v1/D18-1378

https://doi.org/10.18653/v1/D18-1378

leveraging structural and semantic correspondence for ...leveraging structural and semantic...

Documents