

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3198–3210, July 5-10, 2020. © 2020 Association for Computational Linguistics


KinGDOM: Knowledge-Guided DOMain Adaptation for Sentiment Analysis

Deepanway Ghosal†, Devamanyu HazarikaΦ, Abhinaba Roy⊘, Navonil Majumder†, Rada Mihalcea⊓, Soujanya Poria†

†Singapore University of Technology and Design, Singapore
ΦNational University of Singapore, Singapore
⊘Nanyang Technological University, Singapore
⊓University of Michigan, USA

{deepanway_ghosal@mymail., sporia@, nmder@}sutd.edu.sg, {[email protected], abhinaba.roy@ntu}.edu.sg, [email protected]

Abstract

Cross-domain sentiment analysis has received significant attention in recent years, prompted by the need to combat the domain gap between different applications that make use of sentiment analysis. In this paper, we take a novel perspective on this task by exploring the role of external commonsense knowledge. We introduce a new framework, KinGDOM, which utilizes the ConceptNet knowledge graph to enrich the semantics of a document by providing both domain-specific and domain-general background concepts. These concepts are learned by training a graph convolutional autoencoder that leverages inter-domain concepts in a domain-invariant manner. Conditioning a popular domain-adversarial baseline method with these learned concepts helps improve its performance over state-of-the-art approaches, demonstrating the efficacy of our proposed framework.

1 Introduction

Sentiment Analysis (SA) is a popular NLP task used in many applications (Zhang et al., 2018). Current models trained for this task, however, cannot be reliably deployed due to the distributional mismatch between the training and evaluation domains (Daumé III and Marcu, 2006). Domain adaptation, a case of transductive transfer learning, is a widely studied field of research that can be effectively used to tackle this problem (Wilson and Cook, 2018).

Research in the field of cross-domain SA has proposed diverse approaches, which include learning domain-specific sentiment words/lexicons (Sarma et al., 2018; Hamilton et al., 2016b), co-occurrence based learning (Blitzer et al., 2007a), and domain-adversarial learning (Ganin et al., 2016), among others.

Figure 1: ConceptNet provides networks with background concepts that enhance their semantic understanding. For example, for a target sentence from the Electronics domain, "The software came with decent screen savers", comprising domain-specific terms like screen saver or wallpaper, ConceptNet helps connect them to general concepts like design, thus allowing a network to better understand their meaning. Furthermore, an inter-domain conceptual bridge can also be established to connect the source and target domains (wallpaper and sketch have similar conceptual notions under the link design).

In this work, we adopt the domain-adversarial framework and attempt to improve it further by infusing commonsense knowledge using ConceptNet, a large-scale knowledge graph (Speer et al., 2017).

Augmenting neural models with external knowledge bases (KBs) has shown benefits across a range of NLP applications (Peters et al., 2019; Li et al., 2019; Logan IV et al., 2019; Liu et al., 2019; Bi et al., 2019). Despite their popularity, efforts to incorporate KBs into the domain-adaptation framework have been sporadic (Wang et al., 2008; Xiang et al., 2010). To this end, we identify multiple advantages of using commonsense KBs for domain adaptation.

First, KBs help in grounding text to real entities, factual knowledge, and commonsense concepts. Commonsense KBs, in particular, provide a rich source of background concepts, related by commonsense links, which can enhance the semantics of a piece of text by providing both domain-specific and domain-general concepts (Yang et al., 2019; Zhong et al., 2019; Agarwal et al., 2015) (see Fig. 1). For cross-domain SA, word polarities might vary across domains. For example, heavy can be a positive feature for a truck, but a negative feature for a smartphone. It is, however, difficult to assign contextual polarities solely from data, especially when there is no supervision (Boia et al., 2014). In this domain-specific scenario, commonsense knowledge provides a dynamic way to enhance the context and helps models understand sentiment-bearing terms and opinion targets through its structural relations (Cambria et al., 2018). It also often aids in unearthing implicitly expressed sentiment (Balahur et al., 2011).

Second, domains often share relations through latent semantic concepts (Kim et al., 2017a). For example, the notions of wallpaper (from Electronics) and sketch (from Books) can be associated via related concepts such as design (see Fig. 1). Multi-relational KBs provide a natural way to leverage such inter-domain relationships. These connections can help models understand target-specific terms by associating them with known domain-general or even source-specific concepts.

Following these intuitions, we propose a two-step modular framework, KinGDOM (Knowledge-Guided DOMain adaptation), which utilizes a commonsense KB for domain adaptation. KinGDOM first trains a shared graph autoencoder using a graph convolutional network (GCN) on ConceptNet, so as to learn: 1) inter-domain conceptual links through multiple inference steps across neighboring concepts; and 2) domain-invariant concept representations due to shared autoencoding. It then extracts document-specific sub-graph embeddings and feeds them to a popular domain-adversarial model, DANN (Ganin et al., 2016). Additionally, we also train a shared autoencoder on these extracted graph embeddings to promote further domain invariance (Glorot et al., 2011).

Our main contributions in this work are:

1. We propose KinGDOM, a domain-adversarial framework that uses an external KB (ConceptNet) for unsupervised domain adaptation. KinGDOM learns domain-invariant features of KB concepts using a graph autoencoding strategy.

2. We demonstrate, through experiments, that KinGDOM surpasses state-of-the-art methods on the Amazon-reviews dataset (Blitzer et al., 2007b), thus validating our claim that external knowledge can aid the task of cross-domain SA.

In the remainder of the paper, §2 reviews related work and compares KinGDOM to it; §3 presents the task definition and preliminaries; §4 introduces our proposed framework, KinGDOM; §5 describes the experimental setup, followed by results and extensive analyses in §6; finally, §7 concludes the paper.

2 Related Work

Domain adaptation methods can be broadly categorized into three approaches: a) instance selection (Jiang and Zhai, 2007; Chen et al., 2011; Cao et al., 2018), b) self-labeling (He and Zhou, 2011), and c) representation learning (Glorot et al., 2011; Chen et al., 2012; Tzeng et al., 2014). Our focus is on the third category, which has emerged as a popular approach in this deep representation learning era (Ruder, 2019; Poria et al., 2020).

Domain-adversarial Training. Our work deals with domain-adversarial approaches (Kouw and Loog, 2019), where we extend DANN (Ganin et al., 2016). Despite its popularity, DANN cannot model domain-specific information (e.g., indicators such as tasty and delicious for the kitchen domain) (Peng et al., 2018b). Rectifications include shared-private encoders that model both domain-invariant and domain-specific features (Li et al., 2012; Bousmalis et al., 2016a; Kim et al., 2017b; Chang et al., 2019), using adversarial and orthogonality losses (Liu et al., 2017; Li et al., 2018). Although we do not use private encoders, we posit that our model is capable of capturing domain-specificity via the sentence-specific concept graph. Our approach is also flexible enough to be adapted to the setup of shared-private encoders.

External Knowledge. The use of external knowledge has been explored in both inductive and transductive settings (Banerjee, 2007; Deng et al., 2018). A few works have explored external knowledge in domain adaptation, using Wikipedia as auxiliary information through co-clustering (Wang et al., 2008) and semi-supervised learning (SSL) (Xiang et al., 2010). SSL has also been explored by Alam et al. (2018) in the Twitter domain. Although we share a similar motivation, there exist crucial differences. Primarily, we learn graph embeddings at the concept level, not across complete instances. Also, we do not classify each concept node in the graph, which renders SSL inapplicable to our setup.

Domain Adaptation on Graphs. With the advent of graph neural networks, graph-based methods have become a new trend (Ghosal et al., 2019) in diverse NLP tasks such as emotion recognition in conversations (Poria et al., 2019). Graph-based domain adaptation is categorized based on the availability of cross-domain connections. For domain-exclusive graphs, approaches include SSL with GCNs (Shen and Chung, 2019) and domain-adversarial learning (Dai et al., 2019). For cross-domain connected graphs, co-regularized training (Ni et al., 2018) and joint embedding (Xu et al., 2017) have been explored. We also utilize GCNs to learn node representations in our cross-domain ConceptNet graph. However, rather than using explicit divergence measures or domain-adversarial losses for domain invariance, we uniquely adopt a shared-autoencoder strategy on GCNs. Such ideas have been explored in vector-based approaches (Glorot et al., 2011; Chen et al., 2012).

Sentiment Analysis. One line of work models domain-dependent word embeddings (Sarma et al., 2018; Shi et al., 2018; K Sarma et al., 2019) or domain-specific sentiment lexicons (Hamilton et al., 2016a), while others attempt to learn representations based on co-occurrences of domain-specific with domain-independent terms (Blitzer et al., 2007a; Pan et al., 2010; Sharma et al., 2018). Our work is related to approaches that address domain-specificity in the target domain (Peng et al., 2018b; Bhatt et al., 2015). Works like Liu et al. (2018) attempt to model target-specificity by mapping domain-general information to domain-specific representations using domain descriptor vectors. In contrast, we relate domain-specific terms by modeling their relations with other terms in knowledge bases like ConceptNet.

3 Background

3.1 Task Definition

Domain adaptation deals with the training of models that can perform inference reliably in multiple domains. Across domains, it is assumed that the feature and label spaces are the same but with discrepancies in their feature distributions. In our setup, we consider two domains: a source domain $\mathcal{D}_s$ and a target domain $\mathcal{D}_t$ with different marginal data distributions, i.e., $P_{\mathcal{D}_s}(x) \neq P_{\mathcal{D}_t}(x)$. This scenario, also known as covariate shift (Elsahar and Gallé, 2019), is predominant in SA applications and arises primarily with shifts in topics, causing a difference in vocabulary usage and the corresponding semantic and sentiment associations.

We account for unsupervised domain adaptation, where we are provided with labeled instances from the source domain $\mathcal{D}_s^l = \{(x_i, y_i)\}_{i=1}^{N_s}$ and unlabeled instances from the target domain $\mathcal{D}_t^u = \{(x_i)\}_{i=1}^{N_t}$.¹ This is a realistic setting, as curating annotations for the target domain is often expensive as well as time consuming. Given this setup, our goal is to train a classifier that can achieve good classification performance on the target domain.

¹For our case, each instance is a review document.

3.2 Domain-Adversarial Neural Network

We base our framework on the domain-adversarial neural network (DANN) proposed by Ganin et al. (2016). DANN learns a shared mapping of both source and target domain instances $M(x_{s/t})$ such that a classifier $C$ trained for the source domain can be directly applied to the target domain. Training of $C$ is performed using the cross-entropy loss:

$$\mathcal{L}_{cls} = \mathbb{E}_{(x_s, y_s)} \left( -\sum_{k=1}^{K} \mathbb{1}_{[k = y_s]} \log C(M(x_s)) \right),$$

where $K$ is the number of labels. Both the mapping function $M$ and the classifier $C$ are realized using neural layers with parameters $\theta_M$ and $\theta_C$.

Adversarial Loss. The core idea of DANN is to reduce the domain gap by learning common representations that are indistinguishable to a domain discriminator. To learn a domain-invariant mapping, DANN uses an adversarial discriminator $D_{adv}$ with parameters $\theta_D$, whose job is to distinguish between source and target instances, $M(x_s)$ vs. $M(x_t)$. It is trained using the cross-entropy loss:

$$\mathcal{L}_{adv_D} = -\mathbb{E}_{x_s} \left( \log D_{adv}(M(x_s)) \right) - \mathbb{E}_{x_t} \left( \log \left( 1 - D_{adv}(M(x_t)) \right) \right).$$

The mapping function then learns domain invariance by pitting against the discriminator in a minimax optimization with loss $\mathcal{L}_{adv_M} = -\mathcal{L}_{adv_D}$ (Tzeng et al., 2017). This setup forces the features to become discriminative to the main learning task and indistinguishable across domains.


Figure 2: Illustration of KinGDOM: Step 1 uses a GCN to learn concept representations. Step 2 feeds concept features to DANN.

| All domains         | DVD         | Books       | Kitchen     | Electronics |
|---------------------|-------------|-------------|-------------|-------------|
| RelatedTo (580k)    | RelatedTo   | RelatedTo   | RelatedTo   | RelatedTo   |
| HasContext (80k)    | HasContext  | HasContext  | IsA         | IsA         |
| IsA (60k)           | IsA         | IsA         | Synonym     | Synonym     |
| DerivedFrom (42k)   | Synonym     | Synonym     | DerivedFrom | DerivedFrom |
| Synonym (40k)       | DerivedFrom | DerivedFrom | HasContext  | HasContext  |
| AtLocation (14k)    | AtLocation  | CapableOf   | AtLocation  | AtLocation  |
| UsedFor (12k)       | CapableOf   | AtLocation  | UsedFor     | UsedFor     |
| CapableOf (11k)     | UsedFor     | SimilarTo   | SimilarTo   | SimilarTo   |
| SimilarTo (10k)     | SimilarTo   | UsedFor     | CapableOf   | CapableOf   |
| Etymologically (5k) | Antonym     | Antonym     | Antonym     | Antonym     |

Table 1: Top-10 relations of G′ based on frequency. Top relations for each domain are also listed.

The point estimates of the parameters are decided at a saddle point using the minimax objective:

$$\theta^* = \arg\min_{\theta_{M,C}} \max_{\theta_D} \left( \mathcal{L}_{cls} + \lambda \mathcal{L}_{adv_D} \right),$$

where $\lambda$ is a hyper-parameter. The minimax objective is realized by reversing the gradients of $\mathcal{L}_{adv_D}$ when back-propagating through $M$.
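As a minimal illustration of this gradient-reversal trick, assuming PyTorch (this sketch is ours, not taken from the paper's released code): the layer acts as the identity in the forward pass and multiplies incoming gradients by $-\lambda$ in the backward pass, so the same backward step that trains $D_{adv}$ pushes $M$ toward domain invariance.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity forward; multiplies the gradient by -lambda on the way back."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the encoder M; no gradient for lambd.
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```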

4 Our Proposed Method

KinGDOM aims to improve the DANN approach by leveraging an external knowledge source, i.e., ConceptNet. Such a knowledge base is particularly useful for domain adaptation as it contains both domain-specific and domain-general knowledge. Unlike traditional word embeddings and semantic knowledge graphs (e.g., WordNet), ConceptNet is unique in that it contains commonsense-related information. We posit that both these properties of ConceptNet will be highly useful for domain adaptation. KinGDOM follows the two-step approach described below:

Step 1: This step deals with training on a domain-aggregated sub-graph of ConceptNet. In particular, it involves: a) creating a sub-graph of ConceptNet based on all domains (§4.1); b) training a graph-convolutional autoencoder to learn concept embeddings (Schlichtkrull et al., 2018) (§4.2).

Step 2: After the graph autoencoder is trained, a) we extract and pool document-relevant features from the trained graph for each instance in the dataset (§4.3); b) the corresponding graph feature vector is then fed into the DANN architecture for adversarial training (Ganin et al., 2016). To further enforce domain invariance, we also introduce a shared autoencoder to reconstruct the graph features (§4.4).


4.1 Step 1a) Domain-Aggregated Commonsense Graph Construction

We construct our domain-aggregated graph from ConceptNet (Speer et al., 2017). First, we introduce the following notation: the ConceptNet graph is represented as a directed labeled graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{R})$, with concepts/nodes² $v_i \in \mathcal{V}$ and labeled edges $(v_i, r_{ij}, v_j) \in \mathcal{E}$, where $r_{ij} \in \mathcal{R}$ is the relation type of the edge between $v_i$ and $v_j$. The concepts in ConceptNet are unigram words or n-gram phrases. For instance, one such triplet from ConceptNet is [baking-oven, AtLocation, kitchen].

ConceptNet has approximately 34 million edges, from which we first extract a subset. From the training documents of all domains in our dataset, we extract the set of all unique nouns, adjectives, and adverbs.³ These extracted words are treated as the seeds that we use to filter ConceptNet into a sub-graph. In particular, we extract all the triplets from $G$ which are within a distance of 1 from any of those seed concepts, resulting in a sub-graph $G' = (\mathcal{V}', \mathcal{E}', \mathcal{R}')$ with approximately 356k nodes and 900k edges. This sub-graph thus contains concepts across all domains along with inter-concept links. Looking at the sub-graph $G'$ through the lens of each domain, we can observe the top-10 relations within each domain in Table 1. A sketch of this filtering step appears below.

²We use node, concept, and entity interchangeably.
³We use the spaCy POS tagger: https://spacy.io/usage/linguistic-features#pos-tagging
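A minimal sketch of the seed extraction and distance-1 filtering, assuming spaCy for POS tagging and an in-memory list of (head, relation, tail) triplets (the helper names are illustrative, not from the paper):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
SEED_POS = {"NOUN", "ADJ", "ADV"}

def extract_seeds(documents):
    """Collect the unique nouns, adjectives, and adverbs across all domains."""
    seeds = set()
    for doc in nlp.pipe(documents):
        seeds.update(tok.text.lower() for tok in doc if tok.pos_ in SEED_POS)
    return seeds

def filter_conceptnet(triplets, seeds):
    """Keep triplets within distance 1 of a seed, i.e., edges incident to a seed."""
    return [(h, r, t) for (h, r, t) in triplets if h in seeds or t in seeds]
```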

4.2 Step 1b) Knowledge Graph Pre-training

To utilize $G'$ in our task, we first need to compute a representation of its nodes. We do this by training a graph autoencoder model to perform link prediction. The model takes as input an incomplete set of edges $\hat{\mathcal{E}}'$ from $\mathcal{E}'$ in $G'$ and then assigns scores to possible edges $(c_1, r, c_2)$, determining how likely these edges are to be in $\mathcal{E}'$. Following Schlichtkrull et al. (2018), our graph autoencoder model consists of an R-GCN entity encoder and a DistMult scoring decoder.

Encoder Module. We employ the Relational Graph Convolutional Network (R-GCN) encoder from Schlichtkrull et al. (2018) as our graph encoder network. The power of this model comes from its ability to accumulate relational evidence over multiple inference steps from the local neighborhood around a given concept. The neighborhood-based convolutional feature transformation always ensures that distinct domains are connected via underlying concepts and influence each other to create enriched domain-aggregated feature vectors.

Precisely, our encoder module consists of two R-GCN encoders stacked upon one another. The initial concept feature vector $g_i$ is initialized randomly and thereafter transformed into the domain-aggregated feature vector $h_i \in \mathbb{R}^d$ using the two-step graph convolution process detailed below:

$$f(x_i, l) = \sigma \left( \sum_{r \in \mathcal{R}} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} x_j + W_0^{(l)} x_i \right),$$

$$h_i = h_i^{(2)} = f(h_i^{(1)}, 2); \quad h_i^{(1)} = f(g_i, 1),$$

where $N_i^r$ denotes the neighbouring concepts of concept $i$ under relation $r \in \mathcal{R}$; $c_{i,r}$ is a normalization constant which can either be set in advance, such that $c_{i,r} = |N_i^r|$, or be learned in a gradient-based setup; $\sigma$ is an activation function such as ReLU; and $W_r^{(1/2)}$, $W_0^{(1/2)}$ are learnable parameters of the transformation.

This stack of transformations effectively accumulates the normalized sum of the local neighborhood, i.e., the neighborhood information for each concept in the graph. The self-connection ensures a self-dependent feature transformation. A sketch of one such layer appears below.
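As a rough illustration (an assumption on our part, not the authors' released code), one R-GCN layer implementing the transformation $f$ above can be written in PyTorch with dense, pre-normalized per-relation adjacency matrices:

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """One relational graph convolution step, without basis decomposition."""

    def __init__(self, in_dim, out_dim, num_rels):
        super().__init__()
        self.w_rel = nn.Parameter(torch.randn(num_rels, in_dim, out_dim) * 0.01)
        self.w_self = nn.Linear(in_dim, out_dim, bias=False)  # W_0: self-connection

    def forward(self, h, adj):
        # h: (num_nodes, in_dim); adj[r]: (num_nodes, num_nodes) adjacency for
        # relation r, already normalized by c_{i,r} = |N_i^r|.
        out = self.w_self(h)
        for r in range(len(adj)):
            out = out + adj[r] @ h @ self.w_rel[r]
        return torch.relu(out)
```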

Decoder Module. DistMult factorization (Yang et al., 2014) is used as the scoring function. For a triplet $(c_i, r, c_j)$, the score $s$ is obtained as follows:

$$s(c_i, r, c_j) = \sigma(h_{c_i}^T R_r h_{c_j}),$$

where $\sigma$ is the logistic function and $h_{c_i}, h_{c_j} \in \mathbb{R}^d$ are the R-GCN encoded feature vectors for concepts $c_i, c_j$. Each relation $r \in \mathcal{R}$ is also associated with a diagonal matrix $R_r \in \mathbb{R}^{d \times d}$.
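Because $R_r$ is diagonal, the bilinear score reduces to an element-wise product; a one-function sketch consistent with the formula above (the helper name is hypothetical):

```python
import torch

def distmult_score(h_i, h_j, rel_diag):
    """sigma(h_i^T diag(rel_diag) h_j), with rel_diag the diagonal of R_r."""
    return torch.sigmoid((h_i * rel_diag * h_j).sum(dim=-1))
```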

Training. We train our graph autoencoder model using negative sampling (Schlichtkrull et al., 2018). For the triplets in $\hat{\mathcal{E}}'$ (positive samples), we create an equal number of negative samples by randomly corrupting the positive triplets. The corruption is performed by randomly modifying either one of the constituting concepts or the relation, creating the overall set of samples denoted by $\mathcal{T}$.

The task is set up as binary classification between the positive and negative triplets, where the model is trained with the standard cross-entropy loss:

$$\mathcal{L}_{G'} = -\frac{1}{2|\hat{\mathcal{E}}'|} \sum_{(c_i, r, c_j, y) \in \mathcal{T}} \left( y \log s(c_i, r, c_j) + (1 - y) \log \left( 1 - s(c_i, r, c_j) \right) \right).$$
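A small sketch of the corruption and loss computation under these definitions (the function names are illustrative):

```python
import random
import torch
import torch.nn.functional as F

def corrupt(triplet, concepts, relations):
    """Build a negative sample by randomly replacing the head, tail, or relation."""
    h, r, t = triplet
    slot = random.choice(("head", "rel", "tail"))
    if slot == "head":
        return (random.choice(concepts), r, t)
    if slot == "tail":
        return (h, r, random.choice(concepts))
    return (h, random.choice(relations), t)

def link_prediction_loss(pos_scores, neg_scores):
    """Binary cross-entropy over positive and corrupted triplets (L_{G'})."""
    scores = torch.cat([pos_scores, neg_scores])
    labels = torch.cat([torch.ones_like(pos_scores), torch.zeros_like(neg_scores)])
    return F.binary_cross_entropy(scores, labels)
```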


Once we train the graph autoencoder model, it ensures that target domain-specific concepts (crucial for KG) can possibly be explained via domain-general concepts and further via inter-domain knowledge. In other words, the encoded node representations $h_i$ will capture commonsense graph information in the form of domain-specific and domain-general features and thus will be effective for the downstream task when there is a distributional shift during evaluation.

4.3 Step 2a) Commonsense Graph Feature Extraction

The trained graph autoencoder model, as explained in §4.2, can be used for feature extraction. We now describe the methodology to extract the document-specific commonsense graph features for a particular document x:

1) The first step is to extract the set of all unique nouns, adjectives, and adverbs present in the document. We call this set $\mathcal{W}$.

2) Next, we extract a sub-graph from $G'$, where we take all triplets for which both the constituting nodes are either in $\mathcal{W}$ or within a vicinity of radius 1 of any of the words in $\mathcal{W}$. We call this graph $G'_\mathcal{W}$.

3) We then make a forward pass of $G'_\mathcal{W}$ through the encoder of the pre-trained graph autoencoder model. This results in feature vectors $h_j$ for all unique nodes $j$ in $G'_\mathcal{W}$.

4) Finally, we average over the feature vectors $h_j$ of all unique nodes in $G'_\mathcal{W}$ to obtain the commonsense graph features $x_{cg}$ for document x.

We surmise that since most documents will have both domain-specific and domain-general words in $\mathcal{W}$, $x_{cg}$ will inherently capture the commonsense information likely to be helpful during domain adaptation. The whole procedure is sketched below.
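A compact sketch of steps 1-4, assuming hypothetical `neighbors`/`subgraph` helpers on the triplet store and the pre-trained R-GCN encoder from Step 1b returning a node-embedding matrix:

```python
import torch

def extract_graph_features(document_words, graph, encoder):
    """Build the document-specific sub-graph G'_W, encode it, and mean-pool."""
    nodes = set(document_words)
    for w in document_words:
        nodes.update(graph.neighbors(w))   # radius-1 vicinity of W
    g_w = graph.subgraph(nodes)            # document-specific sub-graph G'_W
    h = encoder(g_w)                       # node embeddings h_j, shape (n, d)
    return h.mean(dim=0)                   # commonsense graph feature x_cg
```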

4.4 Step 2b) Domain-adversarial Training

We feed the commonsense graph feature $x_{cg}$, pooled from $G'_\mathcal{W}$ for document x (§4.3), into the DANN architecture (see §3.2). We proceed by learning an encoder function for the graph vector, $z_{grp} = M'_{\theta_G}(x_{cg})$, and combine its representation with the DANN encoder output $z_{dann} = M_{\theta_M}(x)$ to get the final feature representation $[z_{dann}; z_{grp}]$ of the document x. Here, $[a; b]$ represents concatenation.

The task classifier $C$ and the domain discriminator $D_{adv}$ now take this modified representation, $[z_{dann}; z_{grp}]$, as input instead of only $z_{dann}$. To further enforce domain invariance in the encoded graph representation $z_{grp}$, we consider it as the hidden code in a traditional autoencoder and consequently add a shared decoder $D_{recon}$ (with parameters $\theta_R$) with a reconstruction loss (mean squared error):

$$\mathcal{L}_{recon}(X_s, X_t) = \mathcal{L}_{recon}(X_s) + \mathcal{L}_{recon}(X_t),$$
$$\text{s.t.} \quad \mathcal{L}_{recon} = \mathbb{E}_{x_{cg}} \left( \| D_{recon}(z_{grp}) - x_{cg} \|_2^2 \right).$$

We hypothesize that if $\theta_R$ can reconstruct graph features for both domains, then it would enforce stronger domain-invariance constraints on $z_{grp}$. The final optimization of this domain-adversarial setup is based on the minimax objective:

$$\theta^* = \arg\min_{\theta_{G,M,C,R}} \max_{\theta_D} \left( \mathcal{L}_{cls} + \lambda \mathcal{L}_{adv_D} + \gamma \mathcal{L}_{recon} \right),$$

where $\lambda$ and $\gamma$ are hyper-parameters. A sketch of the full loss computation appears below.
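Putting Step 2 together, a minimal sketch of one loss computation, assuming 100-dimensional fully connected layers (§5.2), a 5000-dimensional BOW input (§5.1), and an assumed graph-feature dimension; the module names mirror the paper's notation, but the code itself is illustrative and reuses the `grad_reverse` helper sketched in §3.2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D_BOW, D_CG, D_HID = 5000, 100, 100   # BOW, graph-feature, hidden dims (assumed)

M       = nn.Linear(D_BOW, D_HID)     # DANN encoder M
M_prime = nn.Linear(D_CG, D_HID)      # graph feature encoder M'
C       = nn.Linear(2 * D_HID, 2)     # task classifier on [z_dann; z_grp]
D_adv   = nn.Linear(2 * D_HID, 1)     # domain discriminator
D_recon = nn.Linear(D_HID, D_CG)      # shared graph-feature reconstructor

def kingdom_loss(x_bow, x_cg, y, d_lbl, src_mask, lam=1.0, gamma=1.0):
    """One batch mixing source and target documents; y is used on source rows only."""
    z_dann = torch.relu(M(x_bow))
    z_grp = torch.relu(M_prime(x_cg))
    z = torch.cat([z_dann, z_grp], dim=-1)
    cls = F.cross_entropy(C(z[src_mask]), y[src_mask])   # L_cls, source only
    adv = F.binary_cross_entropy_with_logits(            # L_adv_D; grad_reverse is
        D_adv(grad_reverse(z)).squeeze(-1), d_lbl)       # the Sec. 3.2 sketch
    rec = F.mse_loss(D_recon(z_grp), x_cg)               # L_recon, both domains
    return cls + lam * adv + gamma * rec
```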

5 Experimental Setup

5.1 Dataset

We consider the Amazon-reviews benchmark dataset for domain adaptation in SA (Blitzer et al., 2007b). This corpus consists of Amazon product reviews and spans four domains: Books, DVDs, Electronics, and Kitchen appliances. Each review is associated with a rating denoting its sentiment polarity. Reviews with ratings up to 3 stars are considered to carry negative sentiment, and those with 4 or 5 stars positive sentiment. The dataset follows a balanced distribution between the two labels, yielding 2k training instances for each domain. Testing contains 3k-6k samples for evaluation. We follow the same pre-processing as done by Ganin et al. (2016); Ruder and Plank (2018), where each review is encoded into a 5000-dimensional tf-idf weighted bag-of-words (BOW) feature vector of unigrams and bigrams.
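As an illustration of this pre-processing (not the authors' exact script), the BOW features can be produced with scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["great battery life", "the plot was dull", "crisp screen, loved it"]

# 5000-dimensional tf-idf weighted BOW over unigrams and bigrams; whether the
# vectorizer is fit on source only or jointly on both domains is an assumption.
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(reviews)   # sparse matrix of shape (n_docs, <=5000)
```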

5.2 Training Details

We follow Ganin et al. (2016) in training our network. Our neural layers, i.e., the DANN encoder ($M$), graph feature encoder ($M'$), graph feature reconstructor ($D_{recon}$), task classifier ($C$), and domain discriminator ($D_{adv}$), are implemented as 100-dimensional fully connected layers. We use a cyclic $\lambda$ as per Ganin et al. (2016) and $\gamma = 1$ after validating with $\gamma \in \{0.5, 1, 2\}$. 25% dropout is used in the fully connected layers, and the model is trained with the Adam (Kingma and Ba, 2015) optimizer.


Figure 3: Results of DANN vs. DANN+ vs. KinGDOM across different target domains. Best viewed in colour.

5.3 Baseline Methods

In this paper, to inspect the role of external commonsense knowledge and analyze the improvement in performance it brings, we intentionally use BOW features and compare against other baseline models that also use BOW features. This issue has also been addressed by Poria et al. (2020). The flexibility of KinGDOM allows other approaches, such as mSDA, CNN, etc., to be easily incorporated into it, which we plan to analyze in the future.

We compare KinGDOM with the following unsupervised domain adaptation baseline methods. DANN (Ganin et al., 2016) is the domain-adversarial method on which we build KinGDOM (§3.2). DANN+ is the DANN model with an Adam optimizer instead of the original SGD optimizer; the network architecture and the rest of the hyperparameters are kept the same. Variational Fair Autoencoder (VFAE) (Louizos et al., 2015) learns latent representations independent of sensitive domain knowledge, while retaining enough task information by using an MMD-based loss. Central Moment Discrepancy (CMD) (Zellinger et al., 2017) is a regularization method which minimizes the difference between feature representations by utilizing the equivalent representation of probability distributions by moment sequences. Asym (Saito et al., 2017) is an asymmetric tri-training framework that uses three neural networks asymmetrically for domain adaptation. MT-Tri (Ruder and Plank, 2018) is similar to Asym, but uses multi-task learning. Domain Separation Networks (DSN) (Bousmalis et al., 2016b) learn to extract shared and private components of each domain; as per Peng et al. (2018a), DSN stands as the present state of the art for unsupervised domain adaptation. Task Refinement Learning (TRL) (Ziser and Reichart, 2019) is an unsupervised domain adaptation framework which iteratively trains a Pivot Based Language Model to gradually increase the information exposed about each pivot. TAT (Liu et al., 2019) is a transferable adversarial training setup that generates examples which help in modelling the domain shift; TAT adversarially trains classifiers to make consistent predictions over these transferable examples. CoCMD (Peng et al., 2018a) is a co-training method based on the CMD regularizer which trains a classifier on simultaneously extracted domain-specific and domain-invariant features. CoCMD, however, is SSL-based, as it uses labeled data from the target domain; although it falls outside the regime of unsupervised domain adaptation, we report its results to give the reader a full picture.

6 Results and Analysis

As mentioned in §5.3, we reimplemented the baseline DANN model with the Adam optimizer and observed that its results have been notably under-reported in much of the unsupervised domain adaptation literature for sentiment analysis (see Table 2). In the original DANN implementation (Ganin et al., 2016), Stochastic Gradient Descent (SGD) was used as the optimizer. In DANN+, however, using the Adam optimizer leads to a substantial performance jump that outperforms many of the recent advanced domain adaptation methods: CMD (Zellinger et al., 2017), VFAE (Louizos et al., 2015), Asym (Saito et al., 2017), and MT-Tri (Ruder and Plank, 2018).

We compare the performance of KinGDOM with its base models, DANN and DANN+. As observed in Fig. 3, KinGDOM surpasses DANN+ by 1.4%, which asserts the improvement in domain invariance due to the incorporation of external commonsense knowledge.


| Source → Target | DANN (5k) | DANN+ (5k) | VFAE (5k) | CMD (5k) | Asym (5k) | MT-Tri (5k) | TRL* (5k) | DSN (5k) | CoCMD* (5k) | KinGDOM (5k) | DANN+ (30k) | TAT (30k) | KinGDOM (30k) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B → D | 78.4 | 82.6 | 79.9 | 80.5 | 80.7 | 81.2 | 82.2 | 82.8 | 83.1 | 83.1 | 84.7 | 84.5 | 85.0 |
| B → E | 73.3 | 79.9 | 79.2 | 78.7 | 79.8 | 78.0 | - | 81.9 | 83.0 | 82.2 | 83.0 | 80.1 | 83.9 |
| B → K | 77.9 | 81.8 | 81.6 | 81.3 | 82.5 | 78.8 | 82.7 | 84.4 | 85.3 | 85.0 | 84.0 | 83.6 | 86.6 |
| D → B | 72.3 | 80.3 | 75.5 | 79.5 | 73.2 | 77.1 | - | 80.1 | 81.8 | 81.4 | 82.7 | 81.9 | 82.7 |
| D → E | 75.4 | 79.9 | 78.6 | 79.7 | 77.0 | 81.0 | - | 81.4 | 83.4 | 81.7 | 83.4 | 81.9 | 83.9 |
| D → K | 78.3 | 83.0 | 82.2 | 83.0 | 82.5 | 79.5 | - | 83.3 | 85.5 | 84.6 | 85.3 | 84.0 | 87.1 |
| E → B | 71.3 | 74.9 | 72.7 | 74.4 | 73.2 | 73.5 | - | 75.1 | 76.9 | 76.9 | 77.1 | 83.2 | 78.4 |
| E → D | 73.8 | 78.6 | 76.5 | 76.3 | 72.9 | 75.4 | 75.8 | 77.1 | 78.3 | 78.8 | 79.6 | 77.9 | 80.3 |
| E → K | 85.4 | 88.6 | 85.0 | 86.0 | 86.9 | 87.2 | - | 87.2 | 87.3 | 88.4 | 89.0 | 90.0 | 89.4 |
| K → B | 70.9 | 75.9 | 72.0 | 75.6 | 72.5 | 73.8 | 72.1 | 76.4 | 77.2 | 78.2 | 77.1 | 75.8 | 80.0 |
| K → D | 74.0 | 79.2 | 73.3 | 77.5 | 74.9 | 77.8 | - | 78.0 | 79.6 | 80.7 | 81.3 | 77.7 | 82.3 |
| K → E | 84.3 | 86.9 | 83.8 | 85.4 | 84.6 | 86.0 | - | 86.7 | 87.2 | 87.4 | 88.0 | 88.2 | 88.6 |
| Avg. | 76.3 | 80.9 | 78.4 | 79.8 | 78.4 | 79.1 | - | 81.2 | 82.4 | 82.3 | 82.9 | 82.4 | 84.0 |

Table 2: Comparison with different baseline and state-of-the-art models (§5.3). TRL* reported results on four combinations. CoCMD* is a semi-supervised domain adaptation method. DSN is the current state of the art for unsupervised domain adaptation on the Amazon reviews dataset. Scores for MT-Tri are extrapolated from the graphs illustrated in Ruder and Plank (2018). Note: B: Books, D: DVD, E: Electronics, and K: Kitchen domains. 5k and 30k signify 5000- and 30,000-dimensional BOW features.

Next, we look at Table 2, where comparisons are made with other baselines, including the state-of-the-art DSN approach. As observed, KinGDOM outperforms DSN in all the task scenarios, indicating the efficacy of our approach. Blitzer et al. (2007b), in their original work, noted that domain transfer between the two groups {DVD, Books} and {Electronics, Kitchen} is particularly challenging. Interestingly, in our results, we observe the highest gains when the source and target domains come from these separate groups (e.g., Kitchen → DVD, Kitchen → Books, Electronics → Books).

In Table 2, we also compare KinGDOM against CoCMD and TAT. Although CoCMD is a semi-supervised method, KinGDOM surpasses its performance in several of the twelve domain-pair combinations and matches its overall result without using any labelled samples from the target domain. TAT is the state-of-the-art method for unsupervised domain adaptation on the Amazon reviews dataset when used with 30,000 bag-of-words (BOW) features. Interestingly, KinGDOM with 5000 BOW features can match TAT with 30,000 BOW features, and it outperforms TAT by around 1.6% overall when used with the same 30,000 BOW features. The reimplementation of DANN, DANN+ with 30,000 BOW features, also surpasses TAT by 0.5%. These results indicate that external knowledge, when added to a simple architecture such as DANN, can surpass sophisticated state-of-the-art models such as DSN and TAT. Our primary intention in utilizing DANN as the base model is to highlight the role of knowledge-base infusion in domain adaptation, devoid of sophisticated models and complex neural maneuvering. Nevertheless, the flexibility of KinGDOM allows it to be combined with advanced models too (e.g., DSN, TAT), which we believe could perform even better. We intend to analyze this in the future.

6.1 Ablation Studies

We further analyze our framework and challenge our design choices. Specifically, we consider three variants of our architecture based on alternative ways to condition DANN on the graph features. Each of these variants reveals important clues regarding the invariance properties and task appropriateness of $z_{grp}$. Variant 1 denotes separate decoders $D_{recon}$ for the source and target domains. In Variant 2, the domain classifier $D_{adv}$ takes only $z_{dann}$ as input, whereas the sentiment classifier $C$ takes the concatenated feature $[z_{dann}; z_{grp}]$. Finally, in Variant 3, $D_{adv}$ takes $[z_{dann}; z_{grp}]$ as input, whereas $C$ only takes $z_{dann}$. As seen in Fig. 4, all the variants perform worse than KinGDOM. For Variant 1, the performance drop indicates that having a shared decoder $D_{recon}$ in KinGDOM facilitates learning invariant representations and helps target-domain classification.


Figure 4: Average accuracy (%) on target domains across the different variants defined in §6.1. Best viewed in colour.

For Variant 2, the removal of $z_{grp}$ from the domain classifier diminishes the domain-invariance capabilities, thus making the domain classifier stronger and leading to a drop in sentiment classification performance. For Variant 3, the removal of $z_{grp}$ from the sentiment classifier $C$ degrades performance. This indicates that in KinGDOM, $z_{grp}$ contains task-appropriate features retrieved from external knowledge (see §1).

Besides ablations, we also look at alternatives to the knowledge graph and to the bag-of-words representation used for the documents. For the former, we consider replacing ConceptNet with WordNet (Fellbaum, 2010), a lexical knowledge graph with conceptual-semantic and lexical connections. We find the performance of KinGDOM with WordNet to be 1% worse than with ConceptNet in terms of average accuracy, indicating the compatibility of ConceptNet with our framework. However, the competitive performance with WordNet also suggests the usability of our framework with any structural resource comprising inter-domain connections. For the latter, we use GloVe-averaged embeddings with DANN. GloVe is a popular word embedding method which captures semantics using co-occurrence statistics (Pennington et al., 2014). Results in Fig. 4 show that using only GloVe does not provide the amount of conceptual semantics available in ConceptNet.

6.2 Case Studies

We delve further into our results and qualitatively analyze KinGDOM. We look at a particular test document from the DVD domain for which KinGDOM predicts the correct sentiment, both when the source domain is Electronics and when it is Books.

Figure 5: The domain-general term graphic bridges the commonsense knowledge between domain-specific terms in Electronics, Books, and DVD.

In the same settings, DANN mispredicts this document. Looking at the corresponding document-specific sub-graph, we observe conceptual links both to domain-general concepts and to domain-specific concepts from the source domain. In Fig. 5, we can see the domain-specific terms CGI and film related to the general concept graphic, which is further linked to domain-specific concepts like graphics card and writing from Electronics and Books, respectively. This example shows how KinGDOM might use these additional concepts to enhance the semantics as required for sentiment prediction.

7 Conclusion

In this paper, we explored the role of external commonsense knowledge for domain adaptation. We introduced a domain-adversarial framework called KinGDOM, which relies on an external commonsense KB (ConceptNet) to perform unsupervised domain adaptation. We showed that we can learn domain-invariant features for the concepts in the KB by using a graph convolutional autoencoder. Using the standard Amazon benchmark for domain adaptation in sentiment analysis, we showed that our framework exceeds the performance of previously proposed methods for the same task. Our experiments demonstrate the usefulness of external knowledge for the task of cross-domain sentiment analysis. Our code is publicly available at https://github.com/declare-lab/kingdom.

Acknowledgments

This research is supported by A*STAR under its RIE 2020 Advanced Manufacturing and Engineering (AME) programmatic grant, Award No. A19E2b0098.


References

Basant Agarwal, Namita Mittal, Pooja Bansal, and Sonal Garg. 2015. Sentiment analysis using common-sense and context information. Comp. Int. and Neurosc., 2015:715730:1–715730:9.

Firoj Alam, Shafiq R. Joty, and Muhammad Imran. 2018. Domain adaptation with adversarial training and graph embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 1077–1087.

Alexandra Balahur, Jesus M. Hermida, and Andres Montoyo. 2011. Detecting implicit expressions of sentiment in text based on commonsense knowledge. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, WASSA@ACL 2011, Portland, OR, USA, June 24, 2011, pages 53–60. Association for Computational Linguistics.

Somnath Banerjee. 2007. Boosting inductive transfer for text classification using Wikipedia. In The Sixth International Conference on Machine Learning and Applications, ICMLA 2007, Cincinnati, Ohio, USA, 13-15 December 2007, pages 148–153. IEEE Computer Society.

Himanshu Sharad Bhatt, Deepali Semwal, and Shourya Roy. 2015. An iterative similarity based adaptation technique for cross-domain text classification. In Proceedings of the 19th Conference on Computational Natural Language Learning, CoNLL 2015, Beijing, China, July 30-31, 2015, pages 52–61.

Bin Bi, Chen Wu, Ming Yan, Wei Wang, Jiangnan Xia, and Chenliang Li. 2019. Incorporating external knowledge into machine reading for generative question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2521–2530, Hong Kong, China. Association for Computational Linguistics.

John Blitzer, Mark Dredze, and Fernando Pereira. 2007a. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic.

John Blitzer, Mark Dredze, and Fernando Pereira. 2007b. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447, Prague, Czech Republic. Association for Computational Linguistics.

Marina Boia, Claudiu Cristian Musat, and Boi Faltings. 2014. Acquiring commonsense knowledge for sentiment analysis through human computation. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Quebec City, Quebec, Canada, pages 901–907. AAAI Press.

Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. 2016a. Domain separation networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 343–351.

Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. 2016b. Domain separation networks. In Advances in Neural Information Processing Systems, pages 343–351.

Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 1795–1802. AAAI Press.

Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Michael I. Jordan. 2018. Partial transfer learning with selective adversarial networks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 2724–2732.

Wei-Lun Chang, Hui-Po Wang, Wen-Hsiao Peng, and Wei-Chen Chiu. 2019. All about structure: Adapting structural information across domains for boosting semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 1900–1909.

Minmin Chen, Kilian Q. Weinberger, and John Blitzer. 2011. Co-training for domain adaptation. In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, Granada, Spain, pages 2456–2464.

Minmin Chen, Zhixiang Eddie Xu, Kilian Q. Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012.

Quanyu Dai, Xiao Shen, Xiao-Ming Wu, and Dan Wang. 2019. Network transfer learning via adversarial domain adaptation with graph convolution. CoRR, abs/1909.01541.

Hal Daumé III and Daniel Marcu. 2006. Domain adaptation for statistical classifiers. J. Artif. Intell. Res., 26:101–126.

Yang Deng, Ying Shen, Min Yang, Yaliang Li, Nan Du, Wei Fan, and Kai Lei. 2018. Knowledge as a bridge: Improving cross-domain answer selection with external knowledge. In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018, pages 3295–3305.

Hady Elsahar and Matthias Gallé. 2019. To annotate or not? Predicting performance drop under domain shift. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2163–2173, Hong Kong, China. Association for Computational Linguistics.

Christiane Fellbaum. 2010. WordNet. In Theory and Applications of Ontology: Computer Applications, pages 231–243. Springer.

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35.

Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, and Alexander Gelbukh. 2019. DialogueGCN: A graph convolutional neural network for emotion recognition in conversation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 154–164.

Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 513–520.

William L. Hamilton, Kevin Clark, Jure Leskovec, and Dan Jurafsky. 2016a. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 595–605. The Association for Computational Linguistics.

William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016b. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489–1501, Berlin, Germany. Association for Computational Linguistics.

Yulan He and Deyu Zhou. 2011. Self-training from labeled features for sentiment analysis. Inf. Process. Manage., 47(4):606–616.

Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner, and Sameer Singh. 2019. Barack's wife Hillary: Using knowledge graphs for fact-aware language modeling. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, pages 5962–5971. Association for Computational Linguistics.

Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic.

Prathusha K Sarma, Yingyu Liang, and William Sethares. 2019. Shallow domain adaptive embeddings for sentiment analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5548–5557, Hong Kong, China. Association for Computational Linguistics.

Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. 2017a. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1857–1865.

Young-Bum Kim, Karl Stratos, and Dongchan Kim. 2017b. Adversarial adaptation of synthetic or stale data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1297–1307.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.

Wouter Marco Kouw and Marco Loog. 2019. A review of domain adaptation without target labels. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Lianghao Li, Xiaoming Jin, and Mingsheng Long. 2012. Topic correlation analysis for cross-domain text classification. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada. AAAI Press.

Pengfei Li, Kezhi Mao, Xuefeng Yang, and Qi Li. 2019. Improving relation extraction with knowledge-attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 229–239, Hong Kong, China. Association for Computational Linguistics.

Yitong Li, Timothy Baldwin, and Trevor Cohn. 2018. What's in a domain? Learning domain-robust text representations using adversarial training. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers), pages 474–479.

Hong Liu, Mingsheng Long, Jianmin Wang, and Michael Jordan. 2019. Transferable adversarial training: A general approach to adapting deep classifiers. In International Conference on Machine Learning, pages 4013–4022.

Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial multi-task learning for text classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1–10.

Qi Liu, Yue Zhang, and Jiangming Liu. 2018. Learning domain representation for multi-domain sentiment classification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 541–550.

Zhibin Liu, Zheng-Yu Niu, Hua Wu, and Haifeng Wang. 2019. Knowledge aware conversation generation with explainable reasoning over augmented graphs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1782–1792, Hong Kong, China. Association for Computational Linguistics.

Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2015. The variational fair autoencoder. arXiv preprint arXiv:1511.00830.

Jingchao Ni, Shiyu Chang, Xiao Liu, Wei Cheng, Haifeng Chen, Dongkuan Xu, and Xiang Zhang. 2018. Co-regularized deep multi-network embedding. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 469–478.

Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang, and Zheng Chen. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, pages 751–760.

Minlong Peng, Qi Zhang, Yu-Gang Jiang, and Xuanjing Huang. 2018a. Cross-domain sentiment classification with target domain specific information. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2505–2513.

Minlong Peng, Qi Zhang, Yu-Gang Jiang, and Xuanjing Huang. 2018b. Cross-domain sentiment classification with target domain specific information. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 2505–2513.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, pages 1532–1543. ACL.

Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge enhanced contextual word representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 43–54, Hong Kong, China. Association for Computational Linguistics.

Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, and Rada Mihalcea. 2020. Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. arXiv preprint arXiv:2005.

Soujanya Poria, Navonil Majumder, Rada Mihalcea, and Eduard Hovy. 2019. Emotion recognition in conversation: Research challenges, datasets, and recent advances. IEEE Access, 7:100943–100953.

Sebastian Ruder. 2019. Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway.

Sebastian Ruder and Barbara Plank. 2018. Strong baselines for neural semi-supervised learning under domain shift. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 1044–1054. Association for Computational Linguistics.

Kuniaki Saito, Yoshitaka Ushiku, and Tatsuya Harada. 2017. Asymmetric tri-training for unsupervised domain adaptation. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 2988–2997. JMLR.org.

Prathusha K. Sarma, Yingyu Liang, and Bill Sethares. 2018. Domain adapted word embeddings for improved sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 37–42.

Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, pages 593–607. Springer.

Raksha Sharma, Pushpak Bhattacharyya, Sandipan Dandapat, and Himanshu Sharad Bhatt. 2018. Identifying transferable information across domains for cross-domain sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 968–978.

Xiao Shen and Fu-Lai Chung. 2019. Network embedding for cross-network node classification. CoRR, abs/1901.07264.

Bei Shi, Zihao Fu, Lidong Bing, and Wai Lam. 2018. Learning domain-sensitive and sentiment-aware word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 2494–2504. Association for Computational Linguistics.

Robert Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.

Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. 2017. Adversarial discriminative domain adaptation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 2962–2971. IEEE Computer Society.

Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. 2014. Deep domain confusion: Maximizing for domain invariance. CoRR, abs/1412.3474.

Pu Wang, Carlotta Domeniconi, and Jian Hu. 2008. Using Wikipedia for co-clustering based cross-domain text classification. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy, pages 1085–1090. IEEE Computer Society.

Garrett Wilson and Diane J. Cook. 2018. Adversarial transfer learning. CoRR, abs/1812.02849.

Evan Wei Xiang, Bin Cao, Derek Hao Hu, and Qiang Yang. 2010. Bridging domains using world wide knowledge for transfer learning. IEEE Trans. Knowl. Data Eng., 22(6):770–783.

Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S. Yu. 2017. Embedding of embedding (EOE): Joint embedding for coupled heterogeneous networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6-10, 2017, pages 741–749. ACM.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.

Pengcheng Yang, Lei Li, Fuli Luo, Tianyu Liu, and Xu Sun. 2019. Enhancing topic-to-essay generation with external commonsense knowledge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2002–2012, Florence, Italy. Association for Computational Linguistics.

Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. 2017. Central moment discrepancy (CMD) for domain-invariant representation learning. arXiv preprint arXiv:1702.08811.

Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep learning for sentiment analysis: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 8(4).

Peixiang Zhong, Di Wang, and Chunyan Miao. 2019. Knowledge-enriched transformer for emotion detection in textual conversations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 165–176, Hong Kong, China. Association for Computational Linguistics.

Yftah Ziser and Roi Reichart. 2019. Task refinement learning for improved accuracy and stability of unsupervised domain adaptation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5895–5906.