


Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 2218–2229, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics


Modelling the interplay of metaphor and emotion through multitask learning

Verna Dankers 1, Marek Rei 2,3, Martha Lewis 1, Ekaterina Shutova 1

1 Institute for Logic, Language and Computation, University of Amsterdam
2 The ALTA Institute, Computer Laboratory, University of Cambridge
3 Department of Computing, Imperial College London
[email protected], [email protected], {m.a.f.lewis,e.shutova}@uva.nl

Abstract

Metaphors allow us to convey emotion by connecting physical experiences and abstract concepts. The results of previous research in linguistics and psychology suggest that metaphorical phrases tend to be more emotionally evocative than their literal counterparts. In this paper, we investigate the relationship between metaphor and emotion within a computational framework, by proposing the first joint model of these phenomena. We experiment with several multitask learning architectures for this purpose, involving both hard and soft parameter sharing. Our results demonstrate that metaphor identification and emotion prediction mutually benefit from joint learning and our models advance the state of the art in both of these tasks.

1 Introduction

Metaphors allow us to reason about abstract concepts by linking them to our physical experiences (Lakoff and Johnson, 1983). Metaphorical language arises through systematic association between two distinct semantic domains — the source and the target — as illustrated by the sentence "The news leaked out despite the secrecy", where a term from the source domain of liquids is used to describe information (the target domain). This metaphorical association widely manifests itself in language, e.g. we can similarly talk about "being engulfed by a stream of bad news". Metaphorical associations allow us to project knowledge from the source domain to the target, inviting new reasoning frameworks and connotations to emerge.

Much previous research on metaphorical language in fields such as linguistics (Blanchette et al., 2001; Kövecses, 2003), cognitive psychology (Crawford, 2009; Thibodeau and Boroditsky, 2011) and neuroscience (Aziz-Zadeh and Damasio, 2008; Jabbi et al., 2008) points to its prevalent affective content. Linguistic expressions describing one's emotional state have a relatively high incidence of figurative language and metaphor in particular (Fainsilber and Ortony, 1987; Fussell and Moss, 1998; Gibbs Jr et al., 2002), as illustrated by the phrase "My mind was seething and boiling". On the other hand, a stronger emotion appears to be conveyed through the association of source and target domains more generally. Mohammad et al. (2016) found that metaphorical phrases are consistently perceived as carrying more emotion than their literal paraphrases and the literal uses of the same source domain words. For instance, "leaking information" conveys an implicit judgement, as compared to the more neutral paraphrase "disclosing information". Their results also suggest that the emotional content of the metaphor is not due to the properties of individual source and target domains, but rather arises compositionally through their interaction. These findings are supported through a range of psycholinguistic studies: Citron and Goldberg (2014) find taste metaphors to be more emotionally evocative than their literal counterparts. Citron et al. (2016) show that conventional metaphorical language in short stories from various domains elicits more activation in brain regions involved in emotional processing, compared to literal language.

Computational modelling of metaphor (Shutova et al., 2016; Rei et al., 2017; Gao et al., 2018) and emotion (Wang et al., 2016; Zhang et al., 2018; Wu et al., 2019) are tasks widely addressed in natural language processing (NLP), with a range of applications from machine translation (Fadaee et al., 2018) to opinion mining (Yadollahi et al., 2017). However, the two phenomena have typically been modelled independently. Exceptions include the use of hand-engineered emotion features when training a classifier for metaphor identification (Strzalkowski et al., 2014) and automatic identification of affect carried by metaphors (Kozareva, 2013; Strzalkowski et al., 2014). However, none of this research has attempted to model metaphor and emotion within a unified model of semantic composition. In this paper, we present the first joint model of metaphor and emotion, trained to learn the patterns of their interaction via flexible parameter sharing techniques offered by multitask learning (MTL). Our model is compositional, building meaning representations of words and phrases in context. The intuition is that the meaning of a word is not intrinsically metaphorical or emotional, but both of these phenomena may manifest when the word is used in a particular context.

Specifically, we train deep learning architectures on metaphor identification and emotion prediction tasks jointly. Metaphor identification is performed at word level and sentence level, while emotion prediction is modelled as a regression task, predicting numerical scores for the valence, arousal and dominance dimensions of emotion. We experiment with MTL architectures employing both hard and soft parameter sharing methods. Models employing hard parameter sharing jointly encode the lower-level word representations using layers shared among the tasks. The soft parameter sharing methods have two task-specific networks connected through linear units or gates.

Our models outperform existing approaches to both metaphor identification and emotion prediction tasks, advancing the state of the art in these areas. Moreover, we show that jointly learning both tasks within one model provides stable performance improvements across architectures.

2 Related work

2.1 Computational models of metaphor

Early approaches to metaphor identification used hand-engineered features and a trained classifier, such as logistic regression (Dunn, 2013), random forests (Tsvetkov et al., 2014), decision trees (Mohler et al., 2013; Gargett and Barnden, 2015) or support vector machines (Hovy et al., 2013; Mohler et al., 2013). Examples of linguistic features used are POS tags (Hovy et al., 2013), concreteness or imageability ratings (Turney et al., 2011; Broadwell et al., 2013; Gargett and Barnden, 2015), ontological concepts (Dunn, 2013), WordNet super-senses (Hovy et al., 2013) and synsets (Mohler et al., 2013). To become less reliant on hand-crafted features, corpus-driven approaches emerged, using sparse (Shutova et al., 2010; Gutierrez et al., 2016) or dense word embeddings (Shutova et al., 2016; Bulat et al., 2017).

Recently, the use of deep neural networks for metaphor identification has gained popularity. Rei et al. (2017) presented a network designed to predict the metaphoricity of a word pair, by modelling the words' interaction using a gating function. Other approaches treated metaphor identification as a sequence labelling task. Do Dinh and Gurevych (2016) proposed a multi-layer perceptron acting on word embeddings. Do Dinh et al. (2018) present an MTL approach combining multiple metaphor identification tasks using two architectures: a hard parameter sharing recurrent network and the recurrent Sluice network of Ruder et al. (2019). During a recent shared task on metaphor identification, various deep neural architectures were presented (Leong et al., 2018), among which were several hybrid approaches that incorporated linguistic features in recurrent networks. Gao et al. (2018) presented the current best-performing model for metaphor sequence labelling. They employed GloVe (Pennington et al., 2014) and ELMo (Peters et al., 2018) embeddings as input to a bidirectional LSTM (Bi-LSTM) followed by a classification layer.

2.2 Computational models of emotion

The vast majority of NLP research on affective language analysis has focused on the prediction of emotion categories and sentiment analysis. Early work on emotion prediction assumed categorical models of emotion, such as Ekman's model of six emotions (Ekman, 1992) (anger, disgust, fear, joy, sadness and surprise). A variety of computational models have been proposed for emotion classification, ranging from vector space models (Danisman and Alpkocak, 2008), to machine learning classifiers (Perikos and Hatzilygeroudis, 2016) and deep learning architectures (Zhang et al., 2018).

Recently, multi-dimensional emotion analysis has gained popularity: it represents emotion through a more fine-grained and psychologically-motivated model (Buechel and Hahn, 2017). We employ the Valence-Arousal-Dominance (VAD) model (Mehrabian, 1996) that describes affective states relative to these emotional dimensions. Valence represents the polarity, arousal the degree of excitement, and dominance the perceived degree of control over a situation.

Existing methods for dimensional emotion analysis are either lexicon-based or use supervised learning. Lexicon-based methods assume the emotional value of a sentence to be a composition of per-word values. These values are extracted from an affect lexicon (Warriner et al., 2013) and combined using their mean (Kim et al., 2010), a weighted mean, or a Gaussian Mixture Model (Paltoglou et al., 2013). Other approaches train classifiers using n-gram and sentiment features (Malandrakis et al., 2013; Buechel and Hahn, 2016), and deep learning models.
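The mean-based lexicon scheme above can be sketched as follows. The lexicon entries and function name are invented for illustration and are not taken from Warriner et al. (2013):

```python
# Lexicon-based VAD scoring: a sentence's emotion value is the mean of
# the per-word (valence, arousal, dominance) values found in an affect
# lexicon; out-of-lexicon words are skipped. Toy entries, for illustration.
TOY_VAD_LEXICON = {
    "love": (0.9, 0.7, 0.6),
    "kill": (0.1, 0.8, 0.6),
    "torture": (0.1, 0.7, 0.3),
}

def vad_mean(tokens):
    """Average the VAD values of the in-lexicon tokens of a sentence."""
    hits = [TOY_VAD_LEXICON[t] for t in tokens if t in TOY_VAD_LEXICON]
    if not hits:
        return None  # no lexicon coverage at all
    n = len(hits)
    return tuple(sum(dim) / n for dim in zip(*hits))
```

A weighted mean or a Gaussian Mixture Model, as in the cited work, would replace the plain average in the last line.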

Wang et al. (2016) were among the first to present a deep learning architecture for dimensional emotion analysis using the VA model: they proposed a convolutional network operating on regions within the input, and an LSTM layer acting on the region encodings. Wang et al. (2018) used the VAD-labelled corpus of Buechel and Hahn (2017), but considered only valence, effectively reducing the task to sentiment analysis. They present a deep network of stacked Bi-LSTM layers with residual connections. Akhtar et al. (2018) performed regression for all three dimensions using a convolutional and two recurrent networks combined in an ensemble extended with hand-crafted features. The emotion dimensions were considered separately and in an MTL setup. Most recently, Wu et al. (2019) proposed a variational auto-encoder model including a recurrent module trained to perform emotion prediction. The model was trained in a semi-supervised way, using only the labels of 40% of the training samples.

2.3 Metaphor and emotion

Existing work combining metaphor and emotion either focuses on the inclusion of emotion features in metaphor identification or on the automatic identification of affect carried by metaphors.

Kozareva (2013) and Strzalkowski et al. (2014) modelled the affect carried by metaphors and evaluated their approaches on a metaphor-rich corpus containing data from four languages. Kozareva (2013) performs polarity classification and valence regression using the AdaBoost classifier and support vector regression trained on information from the sentence, its context, and source and target domain annotations. Strapparava and Mihalcea (2007) proposed an affect calculus to estimate the affect expressed by a linguistic metaphor as positive, negative, or neutral. The affect calculus takes into account the metaphor target, the source relation, the relation's arguments and type, and the prior affect of the target.

Gargett and Barnden (2015) considered metaphor identification on nouns, verbs and prepositions using hand-engineered features, including lexicon-based VAD emotion features. The emotion features proved most beneficial for metaphor identification for nouns and verbs.

3 Tasks and datasets

VUA metaphor corpus The VUA metaphor corpus¹ (Steen, 2010) is a subset of the British National Corpus (Leech, 1992) in which each word is annotated as literally or metaphorically used. The corpus contains over ten thousand sentences, sampled from four genres: academic writing, news, conversation and fiction. The reported inter-annotator agreement is 0.84 in terms of Fleiss's κ. For comparability reasons, we use a preprocessed variant of the corpus as provided by Gao et al. (2018), who use 25% of the sentences for testing. We perform metaphor identification at word level, experimenting in a sequence labelling paradigm.

LCC metaphor corpus The Language Computer Corporation (LCC) metaphor dataset (Mohler et al., 2016) is a metaphor-rich corpus containing data in English, Farsi, Spanish and Russian.² We use the English portion of this dataset, which consists of data from the ClueWeb corpus and the Debate Politics online forum. Annotators rated the metaphoricity of sentences from zero (i.e. literal) to three (i.e. clearly metaphorical). Mohler et al. (2016) considered two annotators to agree when their ratings differ by at most 1 (on the range from 0 to 3). With this definition, the inter-annotator agreement on metaphoricity is 92.8%.

We extract nine thousand samples from the freely available portion of the dataset, average the scores assigned by individual annotators and normalise them to the scale from zero to one. We use the data to perform sentence-level regression, employing ten-fold cross-validation using 70-10-20 splits for training, validating and testing, respectively.
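A minimal sketch of this preprocessing step; the function names are ours, while the 0-3 scale and the ≤1 agreement criterion come from Mohler et al. (2016):

```python
def lcc_sentence_score(annotator_scores, max_score=3.0):
    """Average the 0-3 metaphoricity ratings of individual annotators
    and normalise the result to the [0, 1] scale."""
    return sum(annotator_scores) / len(annotator_scores) / max_score

def annotators_agree(rating_a, rating_b, tolerance=1):
    """Mohler et al. (2016) count two annotators as agreeing when their
    ratings differ by at most 1 on the 0-3 scale."""
    return abs(rating_a - rating_b) <= tolerance
```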

¹ Preprocessed variant available at: https://github.com/gao-g/metaphor-in-context.

² Available upon request from Mohler et al. (2016).


Sentence                     Val.   Arous.   Dom.
"Tell her I love her."       .94    .88      .83
Tell me, or I'll kill –      .35    .69      .83
What did you say?            .50    .54      .50
This is torture.             .14    .72      .27

Table 1: EmoBank examples with normalised scores, illustrating the differences among the dimensions.

EmoBank corpus EmoBank³ (Buechel and Hahn, 2017) is one of the most recent corpora developed based on the VAD model. EmoBank contains ten thousand sentences from the manually annotated sub-corpus of American English (Ide et al., 2008) and the Affective Text corpus (Strapparava and Mihalcea, 2007). The corpus balances many genres: news headlines, blogs, essays, fiction, letters, newspapers and travel guides. Each sentence is rated on a scale from one to five for each dimension, from the perspective of the writer and the reader. The inter-annotator agreement rates are 0.61 and 0.63 in terms of Pearson's r for the two perspectives, respectively. We combine the scores of readers and writers, normalised to the scale from zero to one. We use EmoBank to perform sentence-level regression for each of the V, A, and D dimensions separately, using ten-fold cross-validation with 70-10-20 splits for training, validating and testing, respectively. Table 1 lists examples exhibiting a range of VAD values.
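The rescaling step can be sketched as follows. Note that the text does not spell out how reader and writer scores are combined; the mean used here is one plausible reading, flagged as an assumption:

```python
def emobank_vad_score(writer_rating, reader_rating, lo=1.0, hi=5.0):
    """Combine the writer and reader ratings for one VAD dimension
    (here via their mean -- an assumption, as the combination method is
    not specified) and rescale from the 1-5 annotation scale to [0, 1]."""
    combined = (writer_rating + reader_rating) / 2.0
    return (combined - lo) / (hi - lo)
```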

Since we focus on the interaction of metaphor and emotion, we pair up the word-level and sentence-level metaphor tasks with the regression tasks for each separate dimension of V, A, or D one by one, in an MTL setup.

4 Methods

We construct a recurrent neural architecture operating at two different levels. Based on the VUA metaphor corpus, the model learns to detect metaphor at word level in a sequence labelling paradigm. When optimised on the LCC metaphor corpus, the architecture is adapted to predict a metaphoricity score at sentence level. For emotion prediction, the architecture is the same as for the sentence-level metaphor prediction task.

The system receives a tokenised sentence as input and maps it to word embeddings, by concatenating representations from pre-trained GloVe and

³ https://github.com/JULIELab/EmoBank.

[Figure 1: Overview of the MTL architectures: (a) hard parameter sharing, (b) cross-stitch network, (c) gated network. The parameters of the main and auxiliary task are indicated through highlighting, where the metaphor detection task is considered the main task in this setup. For compactness, two of the three Bi-LSTM layers are shown. Legend: ELMo/GloVe embeddings, Bi-LSTM, classification, output, cross-stitch unit, attention, gate.]

ELMo models. Next, these embeddings are passed through a Bi-LSTM, building task- and context-dependent representations for each word. For token labelling, the hidden states from each direction are concatenated and passed through a feedforward layer, followed by a sigmoid activation. We model metaphor detection as a binary task for comparability to the literature. Gao et al. (2018) use a similar model for metaphor detection at word level, but employ a softmax activation.

For the sentence-level score prediction, the concatenated Bi-LSTM hidden states are passed through the attention function, which includes a linear layer and softmax normalisation, in order to construct a sentence representation. The resulting vector is passed through a feedforward layer, then used to predict a sentence-level score with sigmoid activation. Since we used a sigmoid activation function in the token labelling task, both the metaphor and emotion tasks are structurally very similar. We train for metaphor detection using the binary cross-entropy loss function and for regression using the root mean squared error loss function.
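The attention function can be sketched as below; the exact parameterisation (a single learned score vector w and bias b) is our assumption, as the text only states "a linear layer and softmax normalisation":

```python
import numpy as np

def attention_pool(H, w, b=0.0):
    """Attention over Bi-LSTM states H (seq_len x dim): a linear layer
    scores each state, softmax normalises the scores, and the sentence
    representation is the weighted sum of the states."""
    scores = H @ w + b                # one scalar score per token
    scores = scores - scores.max()    # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H                  # (dim,) sentence vector
```

With uniform scores this reduces to mean pooling; a large score on one token concentrates the sentence vector on that token's state.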

We also experiment with fine-tuning a pre-trained BERT architecture (Devlin et al., 2019) for each of the tasks. This validates that the performance differences are due to the task interactions and not specific to the recurrent architecture. The inputs to this network consist of the BERT-specific word and position embeddings. For the word-level sequence labelling task, the outputs of the last Transformer layer are fed to the classification layer. For the sentence-level tasks, an additional attention module is again used to construct sentence representations. An alternative way of using BERT would be to provide contextualised embeddings; we do not consider this in the present work but leave it as an area to be explored. BERT performs labelling on subword units called WordPieces; we consider a word metaphorical if any of these subword units is labelled metaphorical. Although a metaphorically labelled prefix or suffix could then yield an incorrect word-level label, this is unlikely: it is much more likely that a common prefix or suffix is not considered metaphorical while the main piece is.
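The subword-to-word label aggregation amounts to a per-word maximum over piece labels. The helper name and the word_ids encoding below are ours:

```python
def word_labels_from_pieces(piece_labels, word_ids):
    """A word is labelled metaphorical (1) if ANY of its WordPieces is.
    word_ids[i] gives the index of the word that subword piece i
    belongs to (pieces of a word are contiguous)."""
    labels = [0] * (max(word_ids) + 1)
    for label, wid in zip(piece_labels, word_ids):
        labels[wid] = max(labels[wid], label)
    return labels
```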

In the following sections we describe three different approaches to optimising these networks using MTL. We experiment with four setups: one recurrent and one BERT hard parameter sharing setup, one recurrent cross-stitch network and one gated recurrent network. In the MTL setups, the models are trained on two tasks at once, but we distinguish between a main and an auxiliary task by down-weighting the loss of the auxiliary task, to allow the networks to specialise most in one direction. For example, when seeking performance improvements in the word-level metaphor identification task, metaphor identification is the main task and emotion regression is the auxiliary task.

4.1 Hard parameter sharing

We customise the models to jointly perform metaphor detection and emotion prediction. By training the model to identify emotional states in text, the system learns to recognise emotion-related features which can be useful for the task of metaphor detection. In addition, optimising for two different but related tasks helps prevent the model from overfitting to either of them.

Following established work on MTL (Caruana, 1993), we first experiment with hard parameter sharing. In this setting, the architecture shares the word embeddings and lower Bi-LSTM layers between the two tasks, as shown in Figure 1a. On top of these shared components, each task has one separate Bi-LSTM layer, followed by a task-specific output layer. For sentence scoring, the attention function for constructing sentence representations is also learned individually for every task. This setup allows the model to learn shared feature detectors in the lower layers, while top layers are still able to learn task-specific features.
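Structurally, hard parameter sharing looks as follows. This is a sketch only: the dense tanh layers stand in for the Bi-LSTM layers, and all shapes are illustrative rather than the paper's (three layers of size 200):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative hidden size

# Two lower layers shared by both tasks, plus one private layer and an
# output head per task (dense stand-ins for the Bi-LSTM layers).
shared_layers = [rng.standard_normal((DIM, DIM)) for _ in range(2)]
private_layer = {t: rng.standard_normal((DIM, DIM)) for t in ("metaphor", "emotion")}
output_head = {t: rng.standard_normal(DIM) for t in ("metaphor", "emotion")}

def forward(x, task):
    """Hard parameter sharing: the shared stack is reused by both tasks,
    which then branch into task-specific parameters."""
    h = x
    for W in shared_layers:          # shared feature detectors
        h = np.tanh(h @ W)
    h = np.tanh(h @ private_layer[task])   # task-specific layer
    return float(1.0 / (1.0 + np.exp(-h @ output_head[task])))  # sigmoid score
```

Gradients from either task's loss update the shared stack, while each private layer and head is updated only by its own task.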

The hard parameter sharing setup using BERT shares all of BERT's Transformer layers among the tasks, apart from the last layer, to allow for specialisation. Furthermore, the output and attention layers are task-specific as well.

4.2 Cross-stitch network

As an alternative to hard parameter sharing, soft sharing provides parallel models with dedicated parameters for each task, while also connecting them together to allow for information transfer. In the cross-stitch network, the soft sharing mechanism is a cross-stitch unit (Misra et al., 2016). These units contain α-parameters which regulate the information flow in each direction and are optimised during training. We apply cross-stitch sharing after each recurrent layer, computing the updated hidden states as:

\tilde{h}_A = \alpha_{AA} h_A + \alpha_{BA} h_B    (1)

\tilde{h}_B = \alpha_{BB} h_B + \alpha_{AB} h_A    (2)

where h_A and h_B are the concatenated Bi-LSTM hidden states from the parallel networks for tasks A and B, while \tilde{h}_A and \tilde{h}_B are the updated hidden states. Note that the α-parameters are specific to each layer. The α-parameters control the directions of information flow; for example, α_{AB} scales the information passed from network A to network B. The cross-stitch network is shown in Figure 1b.

If both tasks operate at sentence level, an additional cross-stitch sharing unit is placed after the attention module. The α-parameters are initialised with a bias towards favouring the information in the same network, with α_{AA} = α_{BB} = 0.9 and α_{AB} = α_{BA} = 0.1. These values are optimised during training but remain static during testing.
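Equations (1)-(2) amount to the following mixing step; arranging the α-parameters as a 2×2 matrix is our choice of layout:

```python
import numpy as np

def cross_stitch(h_A, h_B, alpha):
    """Cross-stitch unit (Eqs. 1-2): mix each task's hidden state with
    the other task's, weighted by the learned 2x2 alpha matrix."""
    h_A_new = alpha[0, 0] * h_A + alpha[1, 0] * h_B   # Eq. (1)
    h_B_new = alpha[1, 1] * h_B + alpha[0, 1] * h_A   # Eq. (2)
    return h_A_new, h_B_new

# Initialisation biased towards the own network, as in the paper.
alpha_init = np.array([[0.9, 0.1],
                       [0.1, 0.9]])
```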

4.3 Gated network

The cross-stitch network learns a single set of shared values for the α-parameters during optimisation. As an alternative, we can construct a network that calculates these values dynamically for each input sentence, even at testing time. This allows the model greater flexibility and modulates the information flow depending on the particular input sentence.

In this architecture, shown in Figure 1c, the α-parameters are replaced with gates (Liu et al., 2016). Each pair of parallel layers has two gates, where one modulates the information flow from the main to the auxiliary task, while the other controls the information flow in the opposite direction. For two jointly learned sentence-level tasks, two more gates are placed before the classification layer, operating on the sentence representations.

Equations (3)-(6) detail the gating mechanisms:

g_A = \sigma(W_A [h_A; h_B] + b_A)    (3)

\tilde{h}_A = (1 - g_A) \odot h_A + g_A \odot h_B    (4)

g_B = \sigma(W_B [h_A; h_B] + b_B)    (5)

\tilde{h}_B = (1 - g_B) \odot h_B + g_B \odot h_A    (6)

where g_A and g_B are the gates for tasks A and B, W_A and W_B are weight matrices, and b_A and b_B are bias vectors. The bias parameters of the gates are initialised with a bias towards one task.
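Equations (3)-(6) can be implemented directly; a minimal sketch, with ⊙ as element-wise multiplication:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_exchange(h_A, h_B, W_A, b_A, W_B, b_B):
    """Gated sharing (Eqs. 3-6): each task computes an input-dependent
    gate over the concatenated states [h_A; h_B], then mixes its own
    state with the other task's state element-wise."""
    concat = np.concatenate([h_A, h_B])
    g_A = sigmoid(W_A @ concat + b_A)        # Eq. (3)
    h_A_new = (1 - g_A) * h_A + g_A * h_B    # Eq. (4)
    g_B = sigmoid(W_B @ concat + b_B)        # Eq. (5)
    h_B_new = (1 - g_B) * h_B + g_B * h_A    # Eq. (6)
    return h_A_new, h_B_new
```

A strongly negative bias drives a gate towards zero, so that the task keeps its own hidden state; a zero bias with zero weights yields an even mixture.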

5 Experiments and results

5.1 Experimental setup

MTL training procedure We apply pairwise joint learning, where at each step in the training process one of the two tasks is selected at random and a batch is sampled from that task. To distinguish main tasks from auxiliary tasks, the loss of the auxiliary task is down-weighted by a factor λ such that it comprises 10% of the loss of the main task. λ is initialised with 1/10 and computed dynamically as training progresses.
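One way to realise this down-weighting is the rescaling below. The text does not give the exact update rule, so this is a plausible sketch rather than the authors' implementation:

```python
def aux_loss_weight(main_loss, aux_loss, ratio=0.10, eps=1e-8):
    """Choose lambda so that the weighted auxiliary loss makes up
    `ratio` (10%) of the main-task loss: lambda * aux = ratio * main.
    For equal losses this yields ratio = 1/10, matching the stated
    initialisation."""
    return ratio * main_loss / (aux_loss + eps)
```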

Hyperparameters The input to the recurrent network consists of concatenated ELMo and GloVe embeddings, with 1,024 and 300 dimensions respectively. The recurrent encoder contains three Bi-LSTM layers with a dimensionality of 200. The models are trained using a batch size of 64 for 2,000 steps and the Adam optimiser with initial learning rates of 4e-3, 1e-3 and 0.5e-3 for metaphor detection, metaphor regression and emotion regression respectively. Models are selected based on validation data.

Approach                 Word (F1)   Sent. (r)
Gao et al. (2018)          .726         -
LSTM (single task)         .737        .544
Hard Sharing
  + Valence                .740        .559
  + Arousal                .740        .558
  + Dominance              .743        .560
Cross-Stitch Network
  + Valence                .741        .556
  + Arousal                .740        .558
  + Dominance              .743        .563
Gated Network
  + Valence                .742        .561
  + Arousal                .741        .558
  + Dominance              .745        .560
BERT (single task)         .763        .604
Hard Sharing
  + Valence                .769        .614
  + Arousal                .765        .610
  + Dominance              .768        .614

Table 2: System performance for the word- and sentence-level metaphor tasks using the F1-score and Pearson's r respectively. Statistically significant (p < 0.05) differences to the single task models are shown in boldface.

We employ the pretrained BERT Base model, whose inputs and hidden states have 768 dimensions. The model contains 12 Transformer layers and is fine-tuned as described by Devlin et al. (2019), using the Adam optimiser with an initial learning rate of 5e-5 and a batch size of 32. BERT is fine-tuned for 3,000 steps for the regression tasks and for 8,000 steps for the word-level metaphor detection task. The difference is compensated for through down-scaling λ.

Significance testing We test for statistical significance using the one-sided approximate randomisation test (Edgington, 1969) for metaphor detection, and Williams's test (Williams, 1959) for regression tasks. For Williams's test we consider the number of samples to be the number of unique samples in the dataset. All performance measures reported are averages from models initialised with ten random seeds.
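A sketch of the approximate randomisation test; the function names and the smoothed p-value formulation are ours, with the underlying procedure as described by Edgington (1969):

```python
import random

def approx_randomisation_test(preds_a, preds_b, gold, metric, trials=1000, seed=0):
    """One-sided approximate randomisation test: swap the two systems'
    predictions per example with probability 0.5 and count how often the
    shuffled score difference reaches the observed one."""
    rng = random.Random(seed)
    observed = metric(preds_a, gold) - metric(preds_b, gold)
    hits = 0
    for _ in range(trials):
        shuf_a, shuf_b = [], []
        for a, b in zip(preds_a, preds_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if metric(shuf_a, gold) - metric(shuf_b, gold) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)   # smoothed one-sided p-value

def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)
```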


Approach                   Val.   Arous.   Dom.
Akhtar et al. (2018)       .616    .355    .237
  + Val., Arous., Dom.     .635    .375    .277
Wu et al. (2019)†          .620    .508    .333
LSTM (single task)         .728    .557    .373
Hard Sharing
  + Metaphor (Token)       .734    .564    .384
  + Metaphor (Sent.)       .734    .558    .388
Cross-Stitch Network
  + Metaphor (Token)       .737    .564    .388
  + Metaphor (Sent.)       .735    .558    .384
Gated Network
  + Metaphor (Token)       .738    .563    .389
  + Metaphor (Sent.)       .735    .560    .384
BERT (single task)         .771    .565    .403
Hard Sharing
  + Metaphor (Token)       .779    .572    .420
  + Metaphor (Sent.)       .778    .570    .417

Table 3: System performance for emotion regression tasks according to Pearson's r. Statistically significant (p < 0.05) differences to the single task model are shown in boldface. †Used 40% of the gold labels.

5.2 Results

Table 2 presents the results for the two metaphor tasks. The STL setup already improves over the current state of the art, and we see further gains when MTL is introduced. Each MTL setup should be compared to the corresponding STL setup, which involves training the model on the metaphor task only. Regardless of the MTL architecture, the auxiliary task of dominance regression provides statistically significant (p < 0.05) improvements over the STL setup. Valence regression also provides significant improvements in a number of setups. The largest improvement is achieved by replacing the recurrent encoder with BERT. This indicates that the rich contextual information learned by BERT in the pre-training phase is highly relevant for metaphor identification. For sentence-level metaphor regression, MTL setups consistently improve upon STL setups, indicating that the effect is not specific to the VUA metaphor corpus. Our MTL models outperform the previous state of the art in metaphor identification (F1 of 0.726 on the VUA corpus; Gao et al., 2018) with both LSTM (F1 of 0.745) and BERT (F1 of 0.769)

[Figure 2: Illustration of the information flow between the Bi-LSTM layers, for the dominance regression (B) and metaphor identification (A) tasks. Panel (a) shows the cross-stitch unit weights per layer (layer 1: 0.76/0.24; layer 2: 0.79/0.21; layer 3: 0.96/0.04). Panel (b) shows gate saturation (%) per layer, calculated by averaging across the hidden dimensionality for every word in the test set.]

encoders. These results lend support to the hypothesis on the interaction of metaphor and emotion in semantic composition.

Table 3 presents the results for the emotion regression tasks. Again, STL setups perform strongly, and MTL architectures improve on this even further. While the valence and dominance tasks consistently improve with the addition of the metaphor task, the improvements achieved on arousal regression are less stable. Once again, our MTL models outperform the best-performing existing approaches to dimensional emotion modelling (Akhtar et al., 2018; Wu et al., 2019), advancing the state of the art in this task. These results suggest that it may be beneficial to include information about metaphor in emotion analysis and, more broadly, sentiment analysis systems.

The differences between hard and soft parameter sharing manifest most with metaphor identification. This is possibly due to the explicit per-word sharing mechanisms in the top layer. While the bottom layers capture more general information, the top layer captures task-specific information. This can be seen by analysing the models' behaviour: the gating mechanism is most selective at the top layer and most active for emotion-laden words, and the cross-stitch units gradually share less information from the bottom to the top layer. This behaviour is illustrated in Figure 2.

6 Data analysis and discussion

Metaphor identification Most improvements in word-level metaphor identification are achieved by corrections from a literal prediction in STL to a metaphorical one in MTL. To establish this fact,


Auxiliary Task  Sentence                                                       STL  MTL  Gold

Valence         It is sad, and somewhat ominous, that so little of that
                should have been reflected in the sombre statement (. . . )    L    M    M
Arousal         There is still endless dithering on how broad a safety net
                Britain will extend to its citizens.                           L    M    M
Dominance       In a bare, mud-walled cell, sitting on the floor, is Tepilit.  L    M    M

Table 4: Examples of how MTL improves over STL on metaphor identification, with gold metaphors underlined and corrections by MTL bolded. L refers to literal and M to metaphorical.

Main Task   Sentence                                                           STL  MTL  Gold

Valence     Scam lures victims with free puppy offer.                          .48  .41  .40
Valence     (. . . ) looking at me like I was breaking her poor, sweet heart.  .52  .42  .24
Arousal     (. . . ) the authors depict her as bewitchingly beautiful.         .51  .57  .61
Dominance   In fact, I've never felt so out of place in all my life.           .73  .54  .17
Dominance   Frustration rises as North Korea nuclear talks stall.              .46  .41  .30

Table 5: Examples of how MTL using metaphor identification improves over STL emotion prediction, with predicted metaphors underlined.

we pooled predictions on test data from ten models trained with different random seeds. The STL outputs were compared to the MTL outputs to determine whether corrections changed from literal to metaphorical or the other way around. Examining this set of corrected predictions gives us insight into the behaviour of the MTL model.
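This comparison step can be sketched as follows. The helper and its label strings are hypothetical, not the authors' code; we assume per-token predictions aligned with gold labels:

```python
from collections import Counter

def correction_directions(stl_preds, mtl_preds, gold):
    """Count, among tokens where MTL fixes an STL error, how many flips
    go literal -> metaphorical ('L->M') versus the reverse ('M->L').
    Labels are 'L' or 'M'; all three sequences are aligned per token."""
    counts = Counter()
    for s, m, g in zip(stl_preds, mtl_preds, gold):
        if s != g and m == g:  # MTL corrected an STL mistake
            counts[f"{s}->{m}"] += 1
    return counts
```

Running such a tally over the pooled predictions reveals which direction dominates the corrections.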

While some improvements hold across all emotion dimensions, others are unique to each. Among the corrections unique to each dimension we find multiple terms indicative of the dimension: for valence regression we find improvements for attractiveness (wise, attractive) and averseness indicators (severity, drain), and for arousal excitement (flame, crisis) or calmness indicators (empty, rest). Dominance corrects various terms related to control (capable, courtesy) and submissiveness (owe, puny). We selected the presented example terms from the set of corrections described previously, and established the scores using the ANEW lexicon (Warriner et al., 2013).

Table 4 illustrates model decisions corrected through joint learning. The examples for valence and arousal illustrate how emotion-laden words participate in metaphors, such as "a sombre statement" or "a safety net". The example for dominance illustrates that control indicators may seem less related to emotion (e.g. bare), but carry affect through the contexts in which they are embedded.

Overall, introducing emotion improves performance on metaphor identification, as we expected. Dominance appears to be most beneficial, while arousal contributes the least. This might be due to the fact that arousal (and to some extent valence) predictions rely strongly on explicit sentiment markers in text. In contrast, dominance prediction requires the model to learn richer semantic representations. In our models, this manifests in the sparseness of the attention distribution: models trained using arousal have the sparsest attention patterns as measured by the Gini-index sparsity measure (Hurley and Rickard, 2009), while models trained using dominance have the least sparse attention patterns.
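The Gini-index sparsity measure of Hurley and Rickard (2009) can be computed with a short sketch. For non-negative weights sorted in ascending order, the index is 0 for a uniform distribution and approaches 1 for a maximally peaked one; the function below is our own illustration of the standard definition, not the paper's code:

```python
def gini_index(weights):
    """Gini-index sparsity (Hurley and Rickard, 2009):
    1 - 2 * sum_k (c_k / ||c||_1) * ((N - k + 0.5) / N)
    over the magnitudes c_1 <= ... <= c_N sorted ascending."""
    c = sorted(abs(w) for w in weights)
    n = len(c)
    total = sum(c)
    if total == 0:
        return 0.0  # all-zero vector: treat as maximally non-sparse
    return 1.0 - 2.0 * sum(
        (ck / total) * ((n - (k + 1) + 0.5) / n) for k, ck in enumerate(c)
    )
```

A uniform attention distribution yields an index of 0, while attention concentrated on a single word yields a value close to 1, which is how we compare attention sparseness across the arousal- and dominance-trained models.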

Dominance regression is the most complex task and yet the most beneficial performance-wise, despite it sometimes being discarded by previous research in favour of the VA emotion model. Several studies argue for the inclusion of dominance in emotion analysis (Stamps III, 2005; Bakker et al., 2014). Bakker et al. (2014) emphasise that while valence and arousal highlight the affective and cognitive aspects of emotion, dominance is related to environmental factors and social influences. This relates to the role of metaphorical framing in the social world, e.g. in politics. Metaphor allows us to highlight certain aspects of a target domain and mask others, encouraging specific inferences (Lakoff, 1991; Entman, 1993). These in turn activate emotional considerations (Boeynaems et al.,


2017), allowing the metaphor to steer the emotions recalled, which affects the evaluation of the argument made and the persuasiveness of the speaker (Marcus, 2000).

Emotion prediction Table 5 lists examples for which including metaphor identification improved performance on emotion regression. The examples for valence show that metaphor can be used to describe an emotion ("to break one's heart") explicitly or to convey an implicit judgement (e.g. luring). For arousal, example improvements include applying excitement indicators to objects or concepts, such as "being bewitchingly beautiful". The examples for dominance regression indicate the importance of power; one has no control over stalling or "feeling out of place".

While joint learning improves the estimations overall, it also introduces some new errors. Although the presence of metaphor generally makes phrases more emotionally evocative (Mohammad et al., 2016), this does not always hold. Lexicalised metaphors – e.g. up in "grades going up" – are no longer viewed as metaphorical by lay language users and throw the models off. Other common errors introduced are related to misinterpreting the perspective – e.g. confusing "to be knocked out" and "to knock out" – and the direction in which words contribute to emotion – e.g. negative metaphorical terms can contribute to positive sentiment and vice versa, as in "Stop cancer with a shot".

7 Conclusion

In this paper, we introduced the first compositional deep learning model to jointly capture the phenomena of metaphor and emotion. We considered metaphor tasks at the word and sentence level and modelled emotion through the dimensional model of valence, arousal and dominance. We experimented with multiple MTL techniques, regulating the information flow between the two tasks.

We demonstrated that the proposed methods advance the state of the art for the tasks of metaphor identification and emotion regression. Both tasks benefit from joint learning, with the emotion dimension of dominance contributing most to metaphor and benefiting most from metaphor. Our results support the hypothesis on the interaction of metaphor and emotion, and suggest that it may be beneficial to incorporate a model of metaphor into emotion- and sentiment-related NLP applications in the future.

Acknowledgments

Martha Lewis gratefully acknowledges funding from NWO Veni grant 'Metaphorical Meanings for Artificial Agents'. Marek Rei's research is supported by Cambridge English via the ALTA Institute.

References

Md Shad Akhtar, Deepanway Ghosal, Asif Ekbal, and Pushpak Bhattacharyya. 2018. A multi-task ensemble framework for emotion, sentiment and intensity prediction. arXiv preprint arXiv:1808.01216.

Lisa Aziz-Zadeh and Antonio Damasio. 2008. Embodied semantics for actions: Findings from functional brain imaging. Journal of Physiology - Paris, 102(1-3).

Iris Bakker, Theo van der Voordt, Peter Vink, and Jan de Boon. 2014. Pleasure, arousal, dominance: Mehrabian and Russell revisited. Current Psychology, 33(3):405–421.

Isabelle Blanchette, Kevin Dunbar, John Hummel, and Richard Marsh. 2001. Analogy use in naturalistic settings: The influence of audience, emotion and goals. Memory and Cognition, 29(5).

Amber Boeynaems, Christian Burgers, Elly A Konijn, and Gerard J Steen. 2017. The effects of metaphorical framing on political persuasion: A systematic literature review. Metaphor and Symbol, 32(2):118–134.

George Aaron Broadwell, Umit Boz, Ignacio Cases, Tomek Strzalkowski, Laurie Feldman, Sarah Taylor, Samira Shaikh, Ting Liu, Kit Cho, and Nick Webb. 2013. Using imageability and topic chaining to locate metaphors in linguistic corpora. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pages 102–110. Springer.

Sven Buechel and Udo Hahn. 2016. Emotion analysis as a regression problem—dimensional models and their implications on emotion representation and metrical evaluation. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, pages 1114–1122. IOS Press.

Sven Buechel and Udo Hahn. 2017. EmoBank: Studying the impact of annotation perspective and representation format on dimensional emotion analysis. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, volume 2, pages 578–585.


Luana Bulat, Stephen Clark, and Ekaterina Shutova. 2017. Modelling metaphor with attribute-based semantics. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 523–528.

Rich Caruana. 1993. Multitask learning: A knowledge-based source of inductive bias. In Proceedings of the International Conference on Machine Learning.

Francesca MM Citron and Adele E Goldberg. 2014. Metaphorical sentences are more emotionally engaging than their literal counterparts. Journal of Cognitive Neuroscience, 26(11):2585–2595.

Francesca MM Citron, Jeremie Güsten, Nora Michaelis, and Adele E Goldberg. 2016. Conventional metaphors in longer passages evoke affective brain response. NeuroImage, 139:218–230.

Elizabeth Crawford. 2009. Conceptual metaphors of affect. Emotion Review, 1(2).

Taner Danisman and Adil Alpkocak. 2008. Feeler: Emotion classification of text using vector space model. In AISB 2008 Convention Communication, Interaction and Social Intelligence, volume 1, page 53.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186.

Erik-Lân Do Dinh, Steffen Eger, and Iryna Gurevych. 2018. Killing four birds with two stones: Multi-task learning for non-literal language detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1558–1569.

Erik-Lân Do Dinh and Iryna Gurevych. 2016. Token-level metaphor detection using neural networks. In Proceedings of the Fourth Workshop on Metaphor in NLP, pages 28–33.

Jonathan Dunn. 2013. Evaluating the premises and results of four metaphor identification systems. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 471–486. Springer.

Eugene S Edgington. 1969. Approximate randomization tests. The Journal of Psychology, 72(2):143–149.

Paul Ekman. 1992. An argument for basic emotions. Cognition & Emotion, 6(3-4):169–200.

Robert M Entman. 1993. Framing: Toward clarification of a fractured paradigm. Journal of Communication, 43(4):51–58.

Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2018. Examining the tip of the iceberg: A data set for idiom translation. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, pages 925–929.

Lynn Fainsilber and Andrew Ortony. 1987. Metaphorical uses of language in the expression of emotions. Metaphor and Symbol, 2(4):239–250.

Susan R Fussell and Mallie M Moss. 1998. Figurative language in emotional communication. Social and Cognitive Approaches to Interpersonal Communication, pages 113–141.

Ge Gao, Eunsol Choi, Yejin Choi, and Luke Zettlemoyer. 2018. Neural metaphor detection in context. In Conference on Empirical Methods in Natural Language Processing.

Andrew Gargett and John Barnden. 2015. Modeling the interaction between sensory and affective meanings for detecting metaphor. In Proceedings of the Third Workshop on Metaphor in NLP, pages 21–30.

Raymond W Gibbs Jr, John S Leggitt, and Elizabeth A Turner. 2002. What's special about figurative language in emotional communication? In The Verbal Communication of Emotions, pages 133–158. Psychology Press.

E Dario Gutierrez, Ekaterina Shutova, Tyler Marghetis, and Benjamin Bergen. 2016. Literal and metaphorical senses in compositional distributional semantic models. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 183–193.

Dirk Hovy, Shashank Shrivastava, Sujay Kumar Jauhar, Mrinmaya Sachan, Kartik Goyal, Huying Li, Whitney Sanders, and Eduard Hovy. 2013. Identifying metaphorical word use with tree kernels. In Proceedings of the First Workshop on Metaphor in NLP, pages 52–57.

Niall Hurley and Scott Rickard. 2009. Comparing measures of sparsity. IEEE Transactions on Information Theory, 55(10):4723–4741.

Nancy Ide, Collin Baker, Christiane Fellbaum, Charles Fillmore, and Rebecca Passonneau. 2008. MASC: The manually annotated sub-corpus of American English. In 6th International Conference on Language Resources and Evaluation, pages 2455–2460. European Language Resources Association (ELRA).

Mbemba Jabbi, Jojanneke Bastiaansen, and Christian Keysers. 2008. A common anterior insula representation of disgust observation, experience and imagination shows divergent functional connectivity pathways. PLoS ONE, 3(8):e2939.


Sunghwan Mac Kim, Alessandro Valitutti, and Rafael A Calvo. 2010. Evaluation of unsupervised emotion models to textual affect recognition. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 62–70. Association for Computational Linguistics.

Zoltán Kövecses. 2003. Metaphor and Emotion: Language, Culture, and Body in Human Feeling. Cambridge University Press, Cambridge.

Zornitsa Kozareva. 2013. Multilingual affect polarity and valence prediction in metaphor-rich texts. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 682–691.

George Lakoff. 1991. Metaphor and war: The metaphor system used to justify war in the Gulf. Peace Research, 23(2/3):25–32.

George Lakoff and Mark Johnson. 1983. Metaphors We Live By. University of Chicago Press, 59.

Geoffrey Leech. 1992. 100 million words of English: the British National Corpus. Language Research, 28(1):1–13.

Chee Wee Ben Leong, Beata Beigman Klebanov, and Ekaterina Shutova. 2018. A report on the 2018 VUA metaphor detection shared task. In Proceedings of the Workshop on Figurative Language Processing, pages 56–66.

Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2016. Recurrent neural network for text classification with multi-task learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2873–2879. AAAI Press.

Nikolaos Malandrakis, Alexandros Potamianos, Elias Iosif, and Shrikanth Narayanan. 2013. Distributional semantic models for affective text analysis. IEEE Transactions on Audio, Speech, and Language Processing, 21(11):2379–2392.

George E Marcus. 2000. Emotions in politics. Annual Review of Political Science, 3(1):221–250.

Albert Mehrabian. 1996. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4):261–292.

Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. 2016. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3994–4003.

Saif Mohammad, Ekaterina Shutova, and Peter Turney. 2016. Metaphor as a medium for emotion: An empirical study. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, pages 23–33.

Michael Mohler, David Bracewell, Marc Tomlinson, and David Hinote. 2013. Semantic signatures for example-based linguistic metaphor detection. In Proceedings of the First Workshop on Metaphor in NLP, pages 27–35.

Michael Mohler, Mary Brunson, Bryan Rink, and Marc Tomlinson. 2016. Introducing the LCC metaphor datasets. In Proceedings of the Tenth International Conference on Language Resources and Evaluation, pages 4221–4227.

Georgios Paltoglou, Mathias Theunis, Arvid Kappas, and Mike Thelwall. 2013. Predicting emotional responses to long informal text. IEEE Transactions on Affective Computing, 1(4):106–115.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Isidoros Perikos and Ioannis Hatzilygeroudis. 2016. Recognizing emotions in text using ensemble of classifiers. Engineering Applications of Artificial Intelligence, 51:191–201.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237.

Marek Rei, Luana Bulat, Douwe Kiela, and Ekaterina Shutova. 2017. Grasping the finer point: A supervised similarity network for metaphor detection. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1537–1546.

Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, and Anders Søgaard. 2019. Latent multi-task architecture learning. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 4822–4829.

Ekaterina Shutova, Douwe Kiela, and Jean Maillard. 2016. Black holes and white rabbits: Metaphor identification with visual features. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 160–170.

Ekaterina Shutova, Lin Sun, and Anna Korhonen. 2010. Metaphor identification using verb and noun clustering. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1002–1010. Association for Computational Linguistics.

Arthur E Stamps III. 2005. In search of dominance: The case of the missing dimension. Perceptual and Motor Skills, 100(2):559–566.


Gerard Steen. 2010. A Method for Linguistic Metaphor Identification: From MIP to MIPVU, volume 14. John Benjamins Publishing.

Carlo Strapparava and Rada Mihalcea. 2007. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 70–74.

Tomek Strzalkowski, Samira Shaikh, Kit Cho, George Aaron Broadwell, Laurie Feldman, Sarah Taylor, Boris Yamrom, Ting Liu, Ignacio Cases, Yuliya Peshkova, et al. 2014. Computing affect in metaphors. In Proceedings of the Second Workshop on Metaphor in NLP, pages 42–51.

Paul H. Thibodeau and Lera Boroditsky. 2011. Metaphors we think with: The role of metaphor in reasoning. PLoS ONE, 6(2):e16782.

Yulia Tsvetkov, Leonid Boytsov, Anatole Gershman, Eric Nyberg, and Chris Dyer. 2014. Metaphor detection with cross-lingual model transfer. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 248–258.

Peter D Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 680–690. Association for Computational Linguistics.

Jin Wang, Bo Peng, and Xuejie Zhang. 2018. Using a stacked residual LSTM model for sentiment intensity prediction. Neurocomputing, 322:93–101.

Jin Wang, Liang-Chih Yu, K Robert Lai, and Xuejie Zhang. 2016. Dimensional sentiment analysis using a regional CNN-LSTM model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, pages 225–230.

Amy Beth Warriner, Victor Kuperman, and Marc Brysbaert. 2013. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4):1191–1207.

Evan James Williams. 1959. Regression Analysis. Wiley Publication in Applied Statistics. Wiley.

Chuhan Wu, Fangzhao Wu, Sixing Wu, Zhigang Yuan, Junxin Liu, and Yongfeng Huang. 2019. Semi-supervised dimensional sentiment analysis with variational autoencoder. Knowledge-Based Systems, 165:30–39.

Ali Yadollahi, Ameneh Gholipour Shahraki, and Osmar R Zaiane. 2017. Current state of text sentiment analysis from opinion to emotion mining. ACM Computing Surveys (CSUR), 50(2):25.

Yuxiang Zhang, Jiamei Fu, Dongyu She, Ying Zhang, Senzhang Wang, and Jufeng Yang. 2018. Text emotion distribution learning via multi-task convolutional neural network. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4595–4601. AAAI Press.