

arXiv:1606.05545v2 [cs.CL] 5 Jan 2017

Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis∗

David Vilares, Carlos Gómez-Rodríguez and Miguel A. Alonso
Grupo LyS, Departamento de Computación, Universidade da Coruña
Campus de A Coruña s/n, 15071, A Coruña, Spain
{david.vilares,carlos.gomez,miguel.alonso}@udc.es

Abstract

We present a novel unsupervised approach for multilingual sentiment analysis driven by compositional syntax-based rules. On the one hand, we exploit some of the main advantages of unsupervised algorithms: (1) the interpretability of their output, in contrast with most supervised models, which behave as a black box, and (2) their robustness across different corpora and domains. On the other hand, by introducing the concept of compositional operations and exploiting syntactic information in the form of universal dependencies, we tackle one of their main drawbacks: their rigidity on data that are structured differently depending on the language concerned. Experiments show an improvement both over existing unsupervised methods, and over state-of-the-art supervised models when evaluating outside their corpus of origin. Experiments also show how the same compositional operations can be shared across languages. The system is available at http://www.grupolys.org/software/UUUSA/.

∗NOTICE: this is the authors' version of a work that was accepted for publication in Knowledge-Based Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version has been published in Knowledge-Based Systems (http://dx.doi.org/10.1016/j.knosys.2016.11.014). © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

1 Introduction

Sentiment Analysis (SA) is a subfield of natural language processing (NLP) that deals with the automatic comprehension of the opinions shared by users in different media (Pang and Lee, 2008; Cambria, 2016). One of the main challenges addressed by SA focuses on emulating the semantic composition process carried out by humans when understanding the sentiment of an opinion (i.e., if it is favorable, unfavorable or neutral). In the sentence 'He is not very handsome, but he has something that I really like', humans have the ability to infer that the word 'very' emphasizes 'handsome', 'not' affects the whole expression 'very handsome', and 'but' decreases the relevance of 'He is not very handsome' and increases that of 'he has something that I really like'. Based on this, a human could justify a positive overall sentiment for that sentence.

Our main contribution is the introduction of the first universal and unsupervised (knowledge-based) model for compositional sentiment analysis (SA) driven by syntax-based rules. We introduce a formalism for compositional operations, allowing the creation of arbitrarily complex rules to tackle relevant phenomena for SA, for any language and syntactic dependency annotation. We implement and evaluate a set of practical universal operations defined using part-of-speech (PoS) tags and dependency types under the universal guidelines of Petrov et al. (2012) and McDonald et al. (2013): universal annotation criteria that can be used to represent the morphology and syntax of any language in a uniform way. The model outperforms existing unsupervised approaches as well as state-of-the-art compositional supervised models (Socher et al., 2013) on domain-transfer settings, and shows that the operations can be shared across languages, as they are defined using universal guidelines.

The remainder of this article is structured as follows. §2 reviews related work. §3 introduces the formalism for compositional operations, which is used in §4 to define a set of universal rules that can process relevant linguistic phenomena for SA in any language. §5 presents experimental results of our approach on different corpora and languages. Finally, §6 concludes and discusses directions for future work.

2 Related work

In this section we describe previous work relevant to the topics covered in this article: the issue of multilinguality in SA, semantic composition through machine learning models and semantic composition in knowledge-based systems.

2.1 Multilingual SA

Monolingual sentiment analysis systems have been created for languages belonging to a variety of language families, such as Afro-Asiatic (Aldayel and Azmi, Forthcoming), Indo-European (Vilares et al., 2015a; Vilares et al., 2015b; Ghorbel and Jacot, 2011; Scholz and Conrad, 2013; Neri et al., 2012; Habernal et al., 2014; Medagoda et al., 2013), Japonic (Arakawa et al., 2014), Sino-Tibetan (Vinodhini and Chandrasekaran, 2012; Zhang et al., 2009) and Tai-Kadai (Inrak and Sinthupinyo, 2010), among others.

The performance of a given approach for sentiment analysis varies from language to language. In the case of supervised systems, the size of the training set is a relevant factor (Cheng and Zhulyn, 2012; Demirtas and Pechenizkiy, 2013), but performance is also affected by linguistic particularities (Boiy and Moens, 2009; Wan, 2009) and the availability of language processing tools (Klinger and Cimiano, 2014) and resources (Severyn et al., 2016). With respect to the latter point, sentiment lexicons are scarce for languages other than English, and therefore a great deal of effort has been dedicated to building lexical resources for sentiment analysis (Kim et al., 2009; Hogenboom et al., 2014; Cruz et al., 2014; Volkova et al., 2013; Gao et al., 2013; Chen and Skiena, 2014). A common approach for obtaining a lexicon for a new language consists in translating pre-existing English lexicons (Brooke et al., 2009), but it was found that even if the translation is correct, two parallel words do not always share the same semantic orientation across languages due to differences in common usage (Ghorbel and Jacot, 2011).

Another approach for building a monolingual SA system for a new language is based on the use of machine translation (MT) to translate the text into English automatically and then apply a polarity classifier for English, yielding as a result a kind of cross-language sentiment analysis system (Balahur and Turchi, 2012b; Wan, 2009; Perea-Ortega et al., 2013; Martínez Cámara et al., 2014). It was found that text with more sentiment is harder to translate than text with less sentiment (Chen and Zhu, 2014) and that translation errors produce an increase in the sparseness of features, a fact that degrades performance (Balahur and Turchi, 2012a; Balahur and Turchi, 2014). To deal with this issue, several methods have been proposed to reduce translation errors, such as applying both directions of translation simultaneously (Hajmohammadi et al., 2014) or enriching the MT system with sentiment patterns (Hiroshi et al., 2004). In the case of supervised systems, self-training and co-training techniques have also been explored to improve performance (Gui et al., 2013; Gui et al., 2014).

Few multilingual systems for SA tasks have been described in the literature. Banea et al. (Banea et al., 2010; Banea et al., 2014) describe a system for detecting subjectivity (i.e., determining if a text contains subjective or objective information) in English and Romanian texts, finding that 90% of word senses maintained their subjectivity content across both languages. Xiao and Guo (Xiao and Guo, 2012) confirm on the same dataset that boosting on several languages improves performance for subjectivity classification with respect to monolingual methods.

Regarding the few multilingual polarity classification systems described in the literature, they are based on a supervised setting. In this respect, Yan et al. (Yan et al., 2014) describe a supervised multilingual system for SA working on previously tokenized Chinese and English texts. Vilares et al. (2015c) present a multilingual SA system trained on a multilingual dataset that is able to outperform monolingual systems on some monolingual datasets and that can work successfully on code-switching texts, i.e., texts that contain terms written in two or more different languages (Vilares et al., 2016a). Some approaches rely on MT to deal with multilinguality. Balahur et al. (Balahur et al., 2014) build a supervised multilingual SA system by translating the English SemEval 2013 Twitter dataset (Chowdhury et al., 2013) into other languages by means of MT, which improves on the results of monolingual systems due to the fact that, when multiple languages are used to build the classifier, the features that are relevant are automatically selected. They also point out that the performance of the monolingual Spanish SA system trained on Spanish machine-translated data can be improved by adding native Spanish data for training from the Spanish TASS 2013 Twitter dataset (Villena-Román et al., 2014). In contrast, Balahur and Perea-Ortega (Balahur and Perea-Ortega, 2015) report that performance decreases when machine-translated English data is used to enlarge the TASS 2013 training corpus for Spanish sentiment analysis.

Other approaches advocate the use of language-independent indicators of sentiment, such as emoticons (Davies and Ghahramani, 2011), for building language-independent SA systems, although the accuracy of a system built following this approach is worse than the combined accuracy of monolingual systems (Narr et al., 2012). The use of other language-independent indicators, such as character and punctuation repetitions, results in low recall (Cui et al., 2011).

2.2 Composition in machine learning SA systems

A naïve approach to emulating the comprehension of the meaning of multiword phrases for SA consists in using n-grams of words, with n > 1 (Pang et al., 2002). The approach is limited by the curse of dimensionality, although crawling data from the target domain can help to reduce that problem (Kiritchenko et al., 2014). Joshi and Penstein-Rosé (2009) went one step further and proposed generalized dependency triplets as features for subjectivity detection, capturing non-local relations. Socher et al. (2012) modeled a recursive neural network that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Socher et al. (2013) presented an improved recursive deep model for SA over dependency trees, and trained it on a sentiment treebank tagged using Amazon Mechanical Turk, pushing the state of the art up to 85.4% on the Pang and Lee 2005 dataset (Pang and Lee, 2005). Kalchbrenner et al. (2014) showed how convolutional neural networks (CNN) can be used for semantic modeling of sentences. The model implicitly captures local and non-local relations without the need of a parse tree. It can be adapted to any language, as long as enough data is available. Severyn and Moschitti (2015) showed the effectiveness of a CNN in a SemEval sentiment analysis shared task (Rosenthal et al., 2015), although crawling tens of millions of messages was first required to achieve state-of-the-art results. With a different purpose, Poria et al. (2016) presented a deep learning approach for aspect extraction in opinion mining, classifying the terms of a sentence as aspect or non-aspect. The system is then enriched with linguistic patterns specifically defined for aspect-detection tasks, which helps improve the overall performance and shows the utility of combining supervised and rule-based approaches.

In spite of being powerful and accurate, supervised approaches like these also present drawbacks. Firstly, they behave as a black box. Secondly, they do not perform as well in domain-transfer applications (Aue and Gamon, 2005; Pang and Lee, 2008). Finally, feature and hyperparameter engineering can be costly in time and resources.

2.3 Composition in knowledge-based SA systems

When the said limitations of machine learning models need to be addressed, unsupervised approaches are useful. In this line, Turney (2002) proposed an unsupervised learning algorithm to calculate the semantic orientation (SO) of a word. Taboada et al. (2011) presented a lexical rule-based approach to handle relevant linguistic phenomena such as intensification, negation, 'but' clauses and irrealis. Thelwall et al. (2012) released SentiStrength, a multilingual unsupervised system for micro-text SA that handles negation and intensification, among other web linguistic phenomena. It is limited to snippet-based and word-matching rules, since no NLP phases such as part-of-speech tagging or parsing are applied. Regarding syntax-based approaches, the few described in the literature are language-dependent. Jia et al. (2009) define a set of syntax-based rules for handling negation in English. Vilares et al. (2015a) propose a syntactic SA method, but limited to Spanish reviews and Ancora trees (Taulé et al., 2008). Cambria et al. (2014) release SenticNet v3, a resource for performing sentiment analysis on English texts at the semantic level rather than at the syntactic level, by combining existing resources such as ConceptNet (Liu and Singh, 2004) and AffectiveSpace (Cambria et al., 2009). By exploiting artificial intelligence (AI), semantic web technologies and dimensionality reduction techniques, it computes the polarity of multiword common-sense concepts (e.g. buy Christmas present). With a different goal, Liu et al. (2016) automatically select syntactic rules for an unsupervised aspect extraction approach, showing the utility of rule-based systems on opinion mining tasks.

In brief, most unsupervised approaches are language-dependent, and those that can manage multilinguality, such as SentiStrength, cannot apply semantic composition.

3 Unsupervised Compositional SA

In contrast with previous work, we propose a formalism for compositional operations, allowing the creation of arbitrarily complex rules to tackle relevant phenomena for SA, for any language and syntactic dependency annotation.

3.1 Operations for compositional SA

Let w = w1, ..., wn be a sentence, where each word occurrence wi ∈ W.

Definition 1. A tagged sentence is a list of tuples (wi, ti) where each wi is assigned a part-of-speech tag, ti, indicating its grammatical category (e.g. noun, verb or adjective).

Definition 2. A dependency tree for w is an edge-labeled directed tree T = (V, E) where V = {0, 1, 2, ..., n} is the set of nodes and E ⊆ V × D × V is the set of labeled arcs. Each arc, of the form (i, d, j), corresponds to a syntactic dependency between the words wi and wj, where i is the index of the head word, j is the index of the child word and d is the dependency type representing the kind of syntactic relation between them. Following standard practice, we use node 0 as a dummy root node that acts as the head of the syntactic root(s) of the sentence.

Example 1. Figure 1 shows a valid dependency tree for our running example.
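Definition 2 can be made concrete with a small sketch. The encoding below (an arc set of (head, type, child) triples over nodes 0..n, with 0 the dummy root) is our own convention, and the toy tree is a deliberately simplified analysis of a fragment of the running example, not the exact tree of Figure 1:

```python
# Hypothetical encoding of a dependency tree per Definition 2.
# words[j] is w_j for j >= 1; index 0 is the dummy root.
words = [None, "He", "is", "not", "very", "handsome"]

# Arcs (i, d, j): head index i, dependency type d, child index j.
arcs = {
    (0, "root", 5),     # 'handsome' heads the clause
    (5, "nsubj", 1),    # He
    (5, "cop", 2),      # is
    (5, "neg", 3),      # not
    (5, "advmod", 4),   # very
}

def head_of(j):
    """Head index and dependency type of node j (None for the dummy root)."""
    for (i, d, k) in arcs:
        if k == j:
            return (i, d)
    return None
```

Since T is a tree, every node except the dummy root has exactly one incoming arc, so `head_of` is well defined.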

We will write i →d j as shorthand for (i, d, j) ∈ E, and we will omit the dependency types when they are not relevant. Given a dependency tree T = (V, E) and a node i ∈ V, we define a set of functions to obtain the context of node i:

• ancestorT(i, δ) = {k ∈ V : there is a path of length δ from k to i in T}, i.e., the singleton set containing the δth ancestor of i (or the empty set if there is no such node),

• childrenT(i) = {k ∈ V | i → k}, i.e., the set of children of node i,

• lm-branchT(i, d) = min{k ∈ V | i →d k}, i.e., the set containing the leftmost among the children of i whose dependencies are labeled d (or the empty set if there is no such node).
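The three context functions are straightforward to implement. The sketch below assumes a head-array encoding of our own (head[j] is the head of node j, with head[0] = None for the dummy root, and label[j] the dependency type of the arc entering j); the function names follow the paper:

```python
def ancestor(head, i, delta):
    """Singleton {k} with k the delta-th ancestor of i, else the empty set."""
    k = i
    for _ in range(delta):
        if head[k] is None:          # climbed past the dummy root
            return set()
        k = head[k]
    return {k}

def children(head, i):
    """Set of nodes whose head is i."""
    return {k for k, h in enumerate(head) if h == i}

def lm_branch(head, label, i, d):
    """Leftmost child of i whose incoming dependency is labeled d (as a set)."""
    cs = {k for k in children(head, i) if label[k] == d}
    return {min(cs)} if cs else set()
```

All three return sets (possibly empty or singleton), matching the definitions above.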

Our compositional SA system will associate an SO value σi to each node i in the dependency tree of a sentence, representing the SO of the subtree rooted at i. The system will use a set of compositional operations to propagate changes to the semantic orientations of the nodes in the tree. Once all the relevant operations have been executed, the SO of the sentence will be stored as σ0, i.e., the semantic orientation of the root node.

A compositional operation is triggered when a node in the tree matches a given condition (related to its associated PoS tag, dependency type and/or word form); it is then applied to a scope of one or more nodes calculated from the trigger node by ascending a number of levels in the tree and then applying a scope function. More formally, we define our operations as follows:

Definition 3. Given a dependency tree T = (V, E), a compositional operation is a tuple o = (τ, C, δ, π, S) such that:

• τ : ℝ → ℝ is a transformation function to apply on the SO (σ) of nodes,

• C : V → {true, false} is a predicate that determines whether a node in the tree will trigger the operation,

• δ ∈ ℕ is the number of levels that we need to ascend in the tree to calculate the scope of o, i.e., the nodes of T whose SO is affected by the transformation function τ,

• π is a priority that will be used to break ties when several operations coincide on a given node, and

• S is a scope calculation function that will be used to determine the nodes affected by the operation.
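A tuple like Definition 3's can be packaged directly as a named tuple. The field names, the node representation (a dict with 'form' and 'dep' keys) and the β = 0.25 value in the illustrative intensifier rule below are all our own inventions, shown only to make the shape of an operation concrete:

```python
from collections import namedtuple

# Definition 3 as a named tuple o = (tau, C, delta, pi, S).
Operation = namedtuple("Operation", "tau C delta pi S")

# Illustrative intensifier rule: 'very' as an adverbial modifier amplifies
# the SO of its destination node by 25% (beta = 0.25 is a made-up value,
# not taken from the paper's lexicon).
very_rule = Operation(
    tau=lambda so: so * (1 + 0.25),                       # a weighting
    C=lambda n: n["form"] == "very" and n["dep"] == "advmod",
    delta=1,        # ascend one level to find the destination node
    pi=3,           # priority for tie-breaking
    S="dest",       # scope: the destination node itself
)
```

With σ = 3 on the destination node, this rule would update it to 3 × 1.25 = 3.75.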


Figure 1: Example of a valid dependency tree for our introductory sentence: 'He is not very handsome, but he has something that I really like', following the McDonald et al. (2013) guidelines. For simplicity, we omit the dummy root in the figures.

In practice, our system defines C(i) by means of sets of words, part-of-speech tags and/or dependency types such that the operation will be triggered if wi, ti and/or the head dependency of i are in those sets. Compositional operations where C(i) is defined using only universal tags and dependency types, and which therefore do not depend on any specific words of a given language, can be shared across languages, as shown in §5.

We propose two options for the transformation function τ:

• shiftα(σ) = σ − α if σ > 0, and σ + α if σ < 0, where α is the shifting factor and α, σ ∈ ℝ.

• weightingβ(σ) = σ × (1 + β), where β is the weighting factor and β, σ ∈ ℝ.¹
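The two families transcribe directly into code; the closure packaging below is our choice, and since the shift definition leaves σ = 0 unspecified, we return it unchanged in that case:

```python
def shift(alpha):
    """shift_alpha: move sigma toward (and possibly past) zero by alpha."""
    def tau(sigma):
        if sigma > 0:
            return sigma - alpha
        if sigma < 0:
            return sigma + alpha
        return sigma   # sigma = 0 is unspecified in the definition; keep it
    return tau

def weighting(beta):
    """weighting_beta: scale sigma by (1 + beta)."""
    def tau(sigma):
        return sigma * (1 + beta)
    return tau
```

For example, shift(4) applied to σ = 3 yields −1, flipping polarity as negation demands, and weighting(0.15) applied to σ = 3 yields 3.45.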

The scope calculation function, S, allows us to calculate the nodes of T whose SO is affected by the transformation τ. For this purpose, if the operation was triggered by a node i, we apply S to ancestorT(i, δ), i.e., the δth ancestor of i (if it exists), which we call the destination node of the operation. The proposed scopes are as follows (see also Figure 2):

• dest (destination node): The transformation τ is applied directly to the SO of ancestorT(i, δ) (see Figure 2.a).

• lm-branchd (branch of d): The affected nodes are lm-branchT(ancestorT(i, δ), d) (see Figure 2.b).

• rcn (n right children): τ affects the SO of the n smallest indexes of {j ∈ childrenT(ancestorT(i, δ)) | j > i}, i.e., it modifies the global σ of the closest (leftmost) n right children of ancestorT(i, δ) (see Figure 2.c).

• lcn (n left children): The transformation affects the n largest elements of {j ∈ childrenT(ancestorT(i, δ)) | j < i}, i.e., it modifies the global σ of the closest (rightmost) n left children of ancestorT(i, δ) (see Figure 2.d).²

• subjr (first subjective right branch): The affected node is min{j ∈ childrenT(ancestorT(i, δ)) | j > i ∧ σj ≠ 0}, i.e., it modifies the σ of the closest (leftmost) subjective right child of ancestorT(i, δ) (see Figure 2.e).

• subjl (first subjective left branch): The affected node is max{j ∈ childrenT(ancestorT(i, δ)) | j < i ∧ σj ≠ 0}, i.e., it modifies the σ of the closest (rightmost) subjective left child of ancestorT(i, δ) (see Figure 2.f).

¹From a theoretical point of view, β is not restricted to any value. In a practical implementation, β values (which will vary according to the intensifier) should serve to intensify, diminish or even cancel the σ of the affected scope in a useful way. In this article, β's for intensifiers are directly taken from existing lexical resources and are not tuned in any way, as explained in §5.

Compositional operations can be defined for any language or dependency annotation criterion. While it is possible to add rules for language-specific phenomena if needed (see §3.2), in this paper we focus on universal rules to obtain a truly multilingual system. Apart from universal treebanks and PoS tags, the only extra information used by our rules is a short list of negation words, intensifiers, adversative conjunctions and words introducing conditionals (like the English 'if' or 'would'). While this information is language-specific, it is standardly included in multilingual sentiment lexica which are available for many languages (§3.2), so it does not prevent our system from working on a wide set of languages without any adaptation, apart from modifying the subjective lexicon.

²lcn and rcn might be useful in dependency structures where elements such as some coordination forms (e.g. it is 'very expensive and bad') are represented as children of the same node, for example.

[Figure 2 shows one example tree per scope: a) δ=1, s=dest; b) δ=2, s=lm-branchd; c) δ=1, s=rc2; d) δ=1, s=lc2; e) δ=1, s=subjr; f) δ=1, s=subjl.]

Figure 2: Graphical representation of the proposed set of influence scopes S. A circle marks the node that triggers an operation o; the nodes to which it is applied are colored in blue.

3.2 An algorithm for unsupervised SA

Algorithm 1 Compute SO of a node
1: procedure COMPUTE(i, O, T)
⊲ Initialization of queues
2:   Ai ← []
3:   Qi ← []
⊲ Enqueue operations triggered by node i:
4:   for o = (τ, C, δ, π, S) in O do
5:     if C(i) then
6:       if δ > 0 then
7:         push((τ, C, δ, π, S), Qi)
8:       else
9:         push((τ, C, δ, π, S), Ai)
⊲ Enqueue operations coming from child nodes:
10:  for c in childrenT(i) do
11:    for o = (τ, C, δ, π, S) in Qc do
12:      if δ − 1 = 0 then
13:        push((τ, C, δ − 1, π, S), Ai)
14:      else
15:        push((τ, C, δ − 1, π, S), Qi)
⊲ Execute operations that have reached their destination node:
16:  while Ai is not empty do
17:    o = (τ, C, δ, π, S) ← pop(Ai)
18:    for j in S(i) do
19:      σj ← τ(σj)
⊲ Join the SOs for node i and its children:
20:  σi ← σi + Σc∈childrenT(i) σc

To execute the operations and calculate the SO of each node in the dependency tree of the sentence, we start by initializing the SO of each word using a subjective lexicon, in the manner of traditional unsupervised approaches (Turney, 2002).

Then, we traverse the parse tree in postorder, applying Algorithm 1 to update semantic orientations when visiting each node i. In this algorithm, O is the set of compositional operations defined in our system, Ai is a priority queue of the compositional operations to be applied at node i (because i is their destination node); and Qi is another priority queue of compositional operations to be queued for upper levels at node i (as i is not yet their destination node). Push inserts o in a priority queue and pop pulls the operation with the highest priority (ties are broken by giving preference to the operation that was queued earlier). When visiting a node, a push into Qi (Algorithm 1, line 7) is executed when the node i triggers an operation o that must be executed at the ancestor of i located δ levels upward from it. A push into Ai (Algorithm 1, line 9) is executed when the node i triggers an operation that must be executed at that same node i (i.e., δ = 0). On the other hand, at node i, the algorithm must also decide what to do with the operations coming from childrenT(i). Thus, a push into Ai (Algorithm 1, line 13) is made when an operation from a child has reached its destination node (i.e., δ − 1 = 0), so that it must be applied at this level. A push into Qi (Algorithm 1, line 15) is made when the operation has still not reached its destination node and must be spread δ − 1 more levels up.
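The postorder traversal plus per-node queue handling can be sketched compactly. Everything about the encoding below is our assumption (children lists, a σ dictionary, scope resolution delegated to a caller-supplied function, and global insertion order as an approximation of the paper's tie-breaking); only the control flow follows Algorithm 1:

```python
import heapq
from itertools import count

def analyze(root, ops, children, sigma, apply_scope):
    """Sketch of Algorithm 1 applied in postorder over a dependency tree.

    children[i]: list of child ids; sigma[i]: SO initialized from the lexicon;
    ops: list of tuples (tau, C, delta, pi, S); apply_scope(S, i): node ids
    affected when an operation with scope S fires at destination node i.
    """
    seq = count()   # insertion counter: earlier-queued wins among equal pi
    Q = {}          # Q[i]: operations still travelling toward an ancestor

    def compute(i):
        A, Q[i] = [], []
        # Lines 4-9: enqueue operations triggered by node i itself.
        for op in ops:
            tau, C, delta, pi, S = op
            if C(i):
                heapq.heappush(Q[i] if delta > 0 else A, (-pi, next(seq), op))
        # Lines 10-15: re-enqueue operations coming up from the children.
        for c in children[i]:
            for (prio, s, (tau, C, delta, pi, S)) in Q[c]:
                moved = (tau, C, delta - 1, pi, S)
                heapq.heappush(A if delta - 1 == 0 else Q[i], (prio, s, moved))
        # Lines 16-19: execute operations whose destination node is i.
        while A:
            _, _, (tau, C, delta, pi, S) = heapq.heappop(A)
            for j in apply_scope(S, i):
                sigma[j] = tau(sigma[j])
        # Line 20: join the SO of i with those of its children.
        sigma[i] += sum(sigma[c] for c in children[i])

    def walk(i):                      # postorder: children before the node
        for c in children[i]:
            walk(c)
        compute(i)

    walk(root)
    return sigma[root]                # sigma_0: the SO of the whole sentence
```

On a toy 'not handsome' tree (σ = 3 on 'handsome', a shift-style negation rule triggered at 'not' with δ = 1 and scope dest), this sketch returns −1 at the root, as expected from shift4(3).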

At a practical level, the set of compositional operations is specified using a simple XML file:

• <forms>: Indicates the tokens to be taken into account for the condition C that triggers the operation. Regular expressions are supported.

• <dependency>: Indicates the dependency types taken into account for C.

• <postags>: Indicates the PoS tags that must match to trigger the rule.

• <rule>: Defines the operation to be executed when the rule is triggered.

• <levelsup>: Defines the number of levels from i to spread before applying o.

• <priority>: Defines the priority of o when more than one operation needs to be applied over i (a larger number implies a higher priority).
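To illustrate the specification format, here is a hedged sketch of reading such a rule. The element names come from the list above, but the enclosing <operation> wrapper, the exact nesting, and the string payloads (e.g. 'shift' as the rule name, 'not|never' as a form regex) are our guesses, not the system's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of one compositional-operation entry.
RULE_XML = """
<operation>
  <forms>not|never</forms>
  <postags>ADV</postags>
  <dependency>neg</dependency>
  <rule>shift</rule>
  <levelsup>1</levelsup>
  <priority>3</priority>
</operation>
"""

def parse_operation(xml_text):
    """Read one <operation> element into a plain dict."""
    node = ET.fromstring(xml_text)
    return {
        "forms": node.findtext("forms"),            # regexes are supported
        "postags": node.findtext("postags"),
        "dependency": node.findtext("dependency"),
        "rule": node.findtext("rule"),
        "levelsup": int(node.findtext("levelsup")),  # delta: levels to ascend
        "priority": int(node.findtext("priority")),  # pi: tie-breaking priority
    }
```

The parsed dict maps directly onto the (τ, C, δ, π, S) components of Definition 3.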

3.3 NLP tools for universal unsupervised SA

The following resources serve as the starting point to carry out state-of-the-art universal, unsupervised and syntactic sentiment analysis.

The system developed by Gimpel et al. (2011) is used for tokenizing. Although initially intended for English tweets, we have observed that it also performs robustly for many other language families (Romance, Slavic, etc.). For part-of-speech tagging we rely on the free distribution of the Toutanova and Manning (2000) tagger. Dependency parsers are built using MaltParser (Nivre et al., 2007) and MaltOptimizer (Ballesteros and Nivre, 2012). We trained a set of taggers and parsers for different languages using the universal tag and dependency sets (Petrov et al., 2012; McDonald et al., 2013). In particular, we rely on the monolingual models using universal part-of-speech tags presented by Vilares et al. (2016b).

With respect to multilingual lexical resources, there are a number of alternatives: SentiStrength (subjective data for up to 34 languages); the Chen and Skiena (2014) approach, which introduced a method for building sentiment lexicons for 136 languages; or SentiWordNet (Esuli and Sebastiani, 2006), where each synset from WordNet is assigned an objective, positive and negative score. Our implementation supports the lexicon format of SentiStrength, which can be plugged directly into the system. Additionally, we provide the option to create different dictionary entries depending on PoS tags to avoid conflicts between homonymous words (e.g. 'I'm fine' versus 'They gave me a fine').

4 Defining compositional operations

We presented above a formalism to define arbitrarily complex compositional operations for unsupervised SA over a dependency tree. In this section, we show the definition of the most important rules that we used to evaluate our system. In practical terms, this implies studying how syntactic constructions that modify the sentiment of an expression are represented in the annotation formalism used for the training of the dependency parser, in this case, Universal Treebanks. We use examples following those universal guidelines, since they are available for more than 40 languages and, as shown in §5, the same rules can be competitive across different languages.

4.1 Intensification

Intensification amplifies or diminishes the sentiment of a word or phrase. Simple cases of this phenomenon are ‘I have huge problems’ or ‘This is a bit disappointing’. Traditional lexicon-based methods handle most of these cases with simple heuristics (e.g. amplifying or diminishing the sentiment of the word following an intensifier). However, ambiguous cases might appear where such lexical heuristics are not sufficient. For example, ‘huge’ can be a subjective adjective introducing its own SO (e.g. ‘The house is huge’), but also an amplifier when it modifies a subjective noun or adjective (e.g. ‘I have huge problems’, where it makes ‘problems’ more negative).

Universal compositional operations overcome this problem without the need for any heuristic. A dependency tree already shows the behavior of a word within a sentence thanks to its dependency type, and it shows the role of a word independently of the language. Figure 3 shows graphically how universal dependencies represent the cases discussed above. Formally, the operation for these forms of intensification is: (weighting_β, w ∈ intensifiers ∧ t ∈ {ADV, ADJ} ∧ d ∈ {advmod, amod, nmod}, 1, 3, dest ∪ lm-branch_acomp), with the value of β depending on the strength of the intensifier as given by the sentiment lexicon.
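As a concrete illustration of how such a tuple can be matched against a dependency node, the following sketch encodes the intensification operation above in Python. The class and function names, and the toy intensifier list, are ours for illustration; they are not part of the released system:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompositionalOperation:
    """One universal rule: (transformation, predicate C, delta, priority, scopes)."""
    name: str
    words: frozenset      # w: word forms that can trigger the rule
    tags: frozenset       # t: universal PoS tags
    deps: frozenset       # d: universal dependency types
    delta: int            # levels to propagate upward in the tree
    priority: int         # higher-priority operations are applied first
    scopes: tuple         # candidate scopes, tried in order

def matches(op: CompositionalOperation, word: str, tag: str, dep: str) -> bool:
    """Predicate C: the rule fires only if word form, tag and dependency all match."""
    return (word.lower() in op.words) and (tag in op.tags) and (dep in op.deps)

# Intensification as defined above: delta = 1, priority = 3.
INTENSIFICATION = CompositionalOperation(
    name="intensification",
    words=frozenset({"huge", "really", "very"}),   # toy intensifier lexicon
    tags=frozenset({"ADV", "ADJ"}),
    deps=frozenset({"advmod", "amod", "nmod"}),
    delta=1,
    priority=3,
    scopes=("dest", "lm-branch:acomp"),
)

# 'I have huge problems': huge/ADJ/amod triggers intensification.
print(matches(INTENSIFICATION, "huge", "ADJ", "amod"))    # True
# 'The house is huge': huge/ADJ/acomp does not; 'huge' keeps its own SO.
print(matches(INTENSIFICATION, "huge", "ADJ", "acomp"))   # False
```

This mirrors the disambiguation discussed above: the same word form either triggers an operation or contributes its own SO, depending solely on its dependency type.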

4.1.1 ‘But’ clauses

Compositional operations can also be defined to manage more challenging cases, such as clauses introduced by ‘but’, considered as a special case of intensification by authors such as Brooke et al.


[Figure 3 diagrams: a) ‘huge’ (β=0.25, ADJ, amod) intensifies ‘problem’ (σ=-2): σ*(1+β) = -2.50; b) ‘huge’ (σ=3, ADJ, acomp) acts as a non-intensifier, introducing its own SO; c) ‘really’ (β=0.15, ADV, advmod) intensifies the acomp branch rooted at ‘huge’ (σ=3): σ*(1+β) = 3.45.]

Figure 3: Skeleton for intensification compositional operations (3.a, 3.c) and one case without intensification (3.b), together with examples annotated with universal dependencies. Semantic orientation values are for instructional purposes only. In 3.a, ‘huge’ is a term included in a list of intensifiers, labeled as an ADJ, whose dependency type is amod, matching the definition of the intensification compositional operation. As a result, the operation o for intensification is triggered, spreading δ = 1 levels up (i.e., up to ‘problem’) and amplifying the σ of dest (the first scope of the operation that matches, i.e., ‘problem’) by (1+β). In 3.b, ‘huge’ is again a word occurring in the intensifier list and tagged as an ADJ, but its dependency type is acomp, which is not considered among the intensification dependency types. As a result, no operation is triggered and the word is treated as a regular word (introducing its own SO rather than modifying others). In 3.c, ‘really’ is the term acting as intensifier, triggering again an intensification operation on the node δ = 1 levels up from it (the ‘is’ node). Differently from 3.a, in this case the scope dest is not applicable, since the word ‘is’ is not subjective, but there is a match for the second candidate scope, the branch labeled as acomp (the branch rooted at ‘huge’), so the σ associated with that node of the tree is amplified.

(2009) or Vilares et al. (2015a). It is assumed that the main clause connected by ‘but’ becomes less relevant for the reader (e.g. ‘It is expensive, but I love it’). Figure 4 shows our proposed composition operation for this clause, formally: (weighting_β, w ∈ {but} ∧ t ∈ {CONJ} ∧ d ∈ {cc}, 1, 1, subj_l) with β = -0.25. Note that the priority of this operation (π = 1) is lower than that of intensification (π = 3), since we first need to process intensifiers, which are local phenomena, before resolving adversatives, which have a larger scope.

[Figure 4 diagram: ‘It is expensive, but I love it’, with ‘but’ (β=-0.25, CONJ, cc) weighting the left subjective branch (subj_l) rooted at ‘expensive’ (σ=-3), while ‘love’ has σ=3: σ_expensive*(1+β) + σ_love = 0.75.]

Figure 4: Skeleton for the ‘but’ compositional operation, illustrated with one example according to universal dependencies. The term ‘but’ matches the word form, tag and dependency type required to act as a sentence intensifier, so the compositional operation is queued to be applied δ = 1 levels upward (i.e., at the ‘is’ node). The scope of the operation is the first subjective branch that is a left child of said ‘is’ node (i.e., the branch rooted at ‘expensive’). As a result, the σ of the branch rooted at ‘expensive’ is diminished by multiplying it by (1+β) (note that β is negative in this case) and the resulting value is added to the σ computed at ‘is’ for the rest of the subjective children.
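The arithmetic in this example can be reproduced with a short sketch of the weighting transformation (the function name is ours; the σ and β values are those of Figure 4):

```python
def weighting(sigma: float, beta: float) -> float:
    """weighting_beta: scale a semantic orientation by (1 + beta)."""
    return sigma * (1.0 + beta)

# 'It is expensive, but I love it' (values from Figure 4):
sigma_expensive = -3.0   # SO of the left subjective branch, 'expensive'
sigma_love = 3.0         # SO contributed by 'love' in the second clause
beta_but = -0.25         # 'but' diminishes the clause to its left

total = weighting(sigma_expensive, beta_but) + sigma_love
print(total)   # 0.75
```

A negative β thus diminishes the branch in its scope, while a positive β (as in intensification) amplifies it.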

4.2 Negation

Negation is one of the most challenging phenomena to handle in SA, since its semantic scope can be non-local (e.g. ‘I do not plan to make you suffer’). Existing unsupervised lexical approaches are limited to considering a snippet to guess the scope of negation. Thus, it is likely that they consider as part of the scope terms that should not be negated from a semantic point of view. Dependency types help us to determine which nodes should act as negation and which should be its scope of influence. For brevity, we only illustrate some relevant negation cases and instructional examples in Figure 5. Formally, the proposed compositional operation to tackle most forms of negation under universal guidelines is: (shift_α, w ∈ negations ∧ t ∈ U ∧ d ∈ {neg}, 1, 2, dest ∪ lm-branch_attr ∪ lm-branch_acomp ∪ subj_r), where


U represents the universal tag set. The priority of negation (π = 2) is between those of intensification and ‘but’ clauses because its scope can be non-local, but it does not go beyond an adversative conjunction.
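A minimal sketch of the shift_α transformation, consistent with the two cases shown in Figure 5 (the exact handling of edge cases, such as σ = 0, may differ in the released system):

```python
def shift(sigma: float, alpha: float) -> float:
    """shift_alpha: move a semantic orientation by alpha toward the
    opposite polarity (negative SOs increase, positive SOs decrease)."""
    return sigma + alpha if sigma < 0 else sigma - alpha

# Figure 5.a: 'I do n't hate it', where 'hate' has sigma=-4 and alpha=4
print(shift(-4, 4))   # 0
# Figure 5.b: 'This is n't awesome', where 'awesome' has sigma=5 and alpha=4
print(shift(5, 4))    # 1
```

Shifting, rather than simply flipping the sign, reflects the observation that negating a strong sentiment usually yields a mild sentiment of the opposite polarity, not an equally strong one.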

[Figure 5 diagrams: a) ‘I do n’t hate it’, with ‘n’t’ (α=4, neg) shifting the subjective node ‘hate’ (σ=-4): σ + α = 0; b) ‘This is n’t awesome’, with ‘n’t’ (α=4, neg) shifting the attribute branch rooted at ‘awesome’ (σ=5): σ - α = 1.]

Figure 5: Skeleton for negation compositional operations, illustrated together with one example each. In 5.a, the term ‘n’t’ matches the word form of a negator and its dependency type is neg, queuing a negation compositional operation to be applied δ = 1 levels upward (i.e., at the ‘hate’ node). The first candidate scope for that operation matches, because dest is a subjective word (‘hate’), shifting the σ of that word according to the definition of our shift_α(σ) transformation function. In a similar way, in 5.b, ‘n’t’ also acts as a negator term, but in this case the candidate scope that matches is the second one (i.e., lm-branch_attr).

4.3 Irrealis

Irrealis denotes linguistic phenomena used to refer to non-factual actions, such as conditional, subjunctive or desiderative sentences (e.g. ‘He would have died if he hadn’t gone to the doctor’). It is a very complex phenomenon to deal with, and systems are usually either unable to tackle this issue or simply define rules to ignore sentences containing a list of irrealis stop-words (Taboada et al., 2011). We do not address this phenomenon in detail in this study, but only propose a rule to deal with ‘if’ constructions (e.g. ‘if I die [...]’ or ‘if you are happy [...]’), considering that the phrase that contains it should be ignored in the final computation. Formally: (weighting_β, w ∈ {if} ∧ t ∈ U ∧ d ∈ {mark}, 2, 3, dest ∪ subj_r). Its graphical representation would be very similar to intensification (see Figures 2 a) and e)).

4.4 Discussion

Figure 6 represents an analysis of our introductory sentence ‘He is not very handsome, but he has something that I really like’, showing how compositional operations accurately capture semantic composition.3 Additionally, Table 1 illustrates the internal state of the algorithm and the SO updates made at each step for our running example.

Step | Word_index   | A_word(δ,π)              | Q_word(δ,π)     | σ_word | σ_word ← A
 1   | He_1         | [ ]                      | [ ]             | 0      | 0
 2   | not_3        | [ ]                      | [N_not(1,2)]    | 0      | 0
 3   | very_4       | [ ]                      | [I_very(1,3)]   | 0      | 0
 4   | handsome_5   | [I_very(0,3)]            | [ ]             | 4      | 5
 5   | ,_6          | [ ]                      | [ ]             | 0      | 0
 6   | but_7        | [ ]                      | [I_but(1,1)]    | 0      | 0
 7   | he_8         | [ ]                      | [ ]             | 0      | 0
 8   | something_10 | [ ]                      | [ ]             | 0      | 0
 9   | has_9        | [ ]                      | [ ]             | 0      | 0
10   | I_12         | [ ]                      | [ ]             | 0      | 0
11   | that_11      | [ ]                      | [ ]             | 0      | 0
12   | really_13    | [ ]                      | [I_really(1,3)] | 0      | 0
13   | like_14      | [I_really(0,3)]          | [ ]             | 1      | 1.15
14   | is_2         | [N_not(0,2), I_but(0,1)] | [ ]             | 0      | 1.90

Table 1: Internal state and SO updates made by the proposed algorithm for the running example. Each row corresponds to a step in which a node (Word_index) is visited in the postorder traversal. Columns A_word(δ,π) and Q_word(δ,π) show the state of the queues after the enqueuing operations, but before A is emptied (i.e., immediately before line 16 of Algorithm 1). The σ_word column shows the SO of the visited node at that same point in time, and σ_word ← A is the new SO that is assigned by applying compositional operations and joining the SOs of children (lines 16-20 of Algorithm 1). N and I refer to negation and intensification operations.
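The queue mechanics summarized in Table 1 can be sketched as a recursive postorder traversal. This is a deliberate simplification of Algorithm 1: scope selection (dest, lm-branch, subj) is omitted, arrived operations are applied to the joined subtree SO, and all names and data structures are ours:

```python
def analyze(node, lexicon, rules):
    """Compute the SO of the subtree at `node`; also return rules still climbing.

    A rule triggered at a descendant climbs `delta` levels; when its counter
    reaches 0 it is applied at the current node, highest priority first.
    """
    so = lexicon.get(node["form"], 0.0)
    arrived, climbing = [], []
    for child in node.get("children", []):
        child_so, child_ops = analyze(child, lexicon, rules)
        so += child_so                          # join: sum the children's SOs
        for op, delta in child_ops:
            (arrived if delta - 1 == 0 else climbing).append((op, delta - 1))
    for op, _ in sorted(arrived, key=lambda x: -x[0]["priority"]):
        so = op["transform"](so)                # e.g. a weighting or a shift
    triggered = [(r, r["delta"]) for r in rules if r["matches"](node)]
    return so, triggered + climbing

# 'very handsome': 'very' triggers intensification one level below 'handsome'.
lexicon = {"handsome": 4.0}
very_rule = {
    "delta": 1, "priority": 3,
    "matches": lambda n: n["form"] == "very" and n["dep"] == "advmod",
    "transform": lambda so: so * (1 + 0.25),    # weighting with beta = 0.25
}
tree = {"form": "handsome", "dep": "acomp",
        "children": [{"form": "very", "dep": "advmod"}]}
so, _ = analyze(tree, lexicon, [very_rule])
print(so)   # 5.0
```

The priority sort at each node is what guarantees that, e.g., intensification (π = 3) is resolved before negation (π = 2) and ‘but’ clauses (π = 1) when several operations arrive at the same node, as in the last row of Table 1.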

It is hard to measure the coverage of our rules and the potential of these universal compositional operations, since it is possible to define arbitrarily complex operations for as many relevant linguistic phenomena as wished. In this line, Poria et al. (2014) define a set of English sentic patterns to determine how sentiment flows from concept to concept in a variety of situations (e.g. relations of complementation, direct nominal objects, relative clauses, . . . ) over a dependency tree following the De Marneffe and Manning (2008) guidelines. The main difference of our work with respect to Poria et al. (2014) or Vilares et al. (2015a) is that they present predefined sets of linguistic patterns for language-specific SA, whereas our approach is

3The system released together with this paper shows an equivalent ASCII text representation that can be obtained on the command line. It is also possible to check how the system works at https://miopia.grupolys.org/demo/


[Figure 6 diagrams, three phases over ‘He is not very handsome, but he has something that I really like’: a) π=3, intensification: ‘very’ (β=0.25) on ‘handsome’ (σ=4), 4×(1+0.25)=5, and ‘really’ (β=0.15) on ‘like’ (σ=1), 1×(1+0.15)=1.15; b) π=2, negation: ‘not’ (α=-4) shifts the attribute branch, 5-4=1; c) π=1, ‘but’ (β=-0.25) weights the main clause, 1×(1-0.25)+1.15=1.90.]

Figure 6: Analysis of a sentence applying universal unsupervised prediction. For the sake of clarity, the real postorder traversal is not illustrated. Instead we show an (in this case) equivalent computation by applying all operations with a given priority, π, at the same time, irrespective of the node. Semantic orientation, intensification and negation values are extracted from the dictionaries of Taboada et al. (2011). Phase a) shows how the intensification is computed on the branches rooted at ‘handsome’ and ‘like’. Phase b) shows how the negation shifts the semantic orientation of the attribute (again, the branch rooted at ‘handsome’). Phase c) illustrates how the ‘but’ clause diminishes the semantic orientation of the main sentence, in particular the semantic orientation of the attribute, the first left subjective branch of its head. Elements that are not playing a role in a specific phase appear dimmed. One of the interesting points in this example comes from illustrating how three different phenomena involving the same branch (the attribute ‘handsome’) are addressed properly thanks to the assigned π.
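The three phases of Figure 6 reduce to the following arithmetic (σ, β and α values from the Taboada et al. (2011) dictionaries, as in the figure):

```python
# 'He is not very handsome, but he has something that I really like'
handsome, like = 4.0, 1.0              # lexicon SOs

# a) priority 3: intensification (weighting)
handsome *= 1 + 0.25                   # 'very'   -> 5.0
like *= 1 + 0.15                       # 'really' -> 1.15

# b) priority 2: negation (shift) on the attribute branch
handsome -= 4                          # 'not' (alpha=4) -> 1.0

# c) priority 1: 'but' weights the main clause, then the SOs are joined
total = handsome * (1 - 0.25) + like   # 0.75 + 1.15
print(round(total, 2))                 # 1.9
```

Applying the phases in any other order would yield a different (and linguistically wrong) result, which is why the priorities π are part of each operation's definition.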


a theoretical formalism to define arbitrarily complex patterns given tagging and parsing guidelines, which has been implemented and tested on a universal set of syntactic annotation guidelines that work across different languages (see §5).

Under this approach, switching the system from one language to another only requires having a tagger and a parser following the Universal Treebanks (v2.0) guidelines and a subjectivity lexicon; the compositional operations remain unchanged (as shown in §5).4

The performance of the algorithm might vary according to the quality of the resources on which it relies. Mistakes committed by the tagger and the parser might have some influence on the approach. However, preliminary experiments on English texts show that having a parser with a LAS5

over 75% is enough to properly exploit compositional operations. With respect to the lexicalized parsing (and tagging) models, usually a different model is needed per language, even when using universal guidelines. In this respect, recent studies (Vilares et al., 2016b; Ammar et al., 2016; Guo et al., 2016) have shown how it is possible to train a single model on universal treebanks to parse different languages with state-of-the-art results. This makes it possible to universalize one of the most relevant preprocessing steps of our approach. The same steps can be taken to train multilingual tagging models (Vilares et al., 2016b).
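For reference, the LAS metric (defined in footnote 5) can be computed as follows; representing a parse as a list of (head, label) pairs is a simplification of ours:

```python
def las(gold, predicted):
    """Labeled Attachment Score: percentage of tokens whose predicted
    (head, dependency type) pair exactly matches the gold annotation."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)

# Toy parse of a 3-token sentence as (head index, dependency type) per token.
gold = [(2, "nsubj"), (0, "root"), (2, "acomp")]
pred = [(2, "nsubj"), (0, "root"), (2, "advmod")]   # wrong label on token 3
print(las(gold, pred))   # two of three tokens fully correct, ~66.7
```

Note that a correct head with a wrong label still counts as an error, which is why LAS is a stricter measure than unlabeled attachment score.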

Adapting or creating new compositional operations for tagging and parsing guidelines different from Universal Treebanks only requires: (1) becoming familiar with the new tag and dependency sets to determine which tags and dependency types should be included in each C, and (2) manually inspecting sentences parsed with the

4There is a difference between the number of compositional operations that are defined in the system (one for each phenomenon considered: intensification, ‘but’ clauses, negation and irrealis) and the number of compositional operation instances created at runtime given such definitions. The latter depends on the words, tags and dependency types that match each operation's predicate C. While the matching tags and dependency types are fixed and common to all languages, the number of words that can match C depends on the lexicon, so the number of operation instances varies across languages depending on the use of SO-CAL and SentiStrength as lexical resources (e.g. English is the language that generates the most instances, with 1411 compositional operations, due to having the largest intensifier lexicon among the languages and resources considered).

5Labeled Attachment Score (LAS): the percentage of dependencies where both the head and the dependency type have been assigned correctly. The English model used has a LAS of 89.36%.

target guidelines to detect if they give a different structural representation of relevant phenomena. In this case, a new set of S, π or δ values may be needed, so that we can correctly traverse the tree and determine scopes on such dependency structures. At the moment, new practical operations need to be added manually, by defining them in the XML file.

5 Experimental results

We compare our algorithm with respect to existing approaches on three languages: English, Spanish and German. The availability of corpora and other unsupervised SA systems for English and Spanish enables us to perform a richer comparison than in the case of German, where we only have an ad-hoc corpus.

We compare our algorithm with respect to two of the most popular and widely used unsupervised systems: (1) SO-CAL (Taboada et al., 2011), a language-dependent system available for English and Spanish guided by lexical rules at the morphological level, and (2) SentiStrength, a multilingual system that does not apply any PoS tagging or parsing step in order to be able to do multilingual analysis, relying instead on a set of subjectivity lexica, snippet-based rules and treatment of non-grammatical phenomena (e.g. character replication). Additionally, for the Spanish evaluation, we also took into account the system developed by Vilares et al. (2015a), an unsupervised syntax-based approach available for Spanish but, in contrast to ours, heavily language-dependent.

For comparison against state-of-the-art supervised approaches, we consider the deep recursive neural network presented by Socher et al. (2013), trained on a movie sentiment treebank (English). To the best of our knowledge, there are no semantic compositional supervised methods for Spanish and German.

Accuracy is used as the evaluation metric for two reasons: (1) it is adequate for measuring the performance of classifiers when the chosen corpora are balanced, and (2) the selected systems for comparison also report their results using this metric.

5.1 Resources

We selected the following standard English corpora for evaluation:


• Taboada and Grieve (2004) corpus: A general-domain collection of 400 long reviews (200 positive, 200 negative) about hotels, movies, computers or music, among other topics, extracted from epinions.com.

• Pang and Lee 2004 corpus (Pang and Lee, 2004): A corpus of 2 000 long movie reviews (1 000 positive, 1 000 negative).

• Pang and Lee 2005 corpus (Pang and Lee, 2005): A corpus of short movie reviews (sentences). In particular, we used the test split used by Socher et al. (2013), removing the neutral ones, as they did, for the binary classification task (total: 1 821 subjective sentences).

To show the universal capabilities of our system we include an evaluation for Spanish using the corpus presented by Brooke et al. (2009) (200 positive and 200 negative long reviews from ciao.es). For German, we rely on a dataset of 2 000 reviews (1 000 positive and 1 000 negative) extracted from Amazon.

As subjectivity lexica, we use the same dictionaries used by SO-CAL for both English (2 252 adjectives, 1 142 nouns, 903 verbs, 745 adverbs and 177 intensifiers) and Spanish (2 049 adjectives, 1 333 nouns, 739 verbs, 594 adverbs and 165 intensifiers). For German, we use the German SentiStrength dictionaries (Momtazi, 2012) instead (2 677 stems and 39 intensifiers), as the Brooke et al. (2009) dictionaries are not available for languages other than Spanish or English. These are freely available resources that avoid the need to collect subjective words, intensifiers or negators. We just take those resources and directly plug them into our system. The weights were not tuned or changed in any way.6 The list of emoticons from SentiStrength is also used as a lexical resource. If a term does not appear in these dictionaries, it will not have any impact on the computation of the

6To test the soundness of our theoretical formalism and the practical viability and competitiveness of its implementation, it does not matter what resource is chosen. We could have selected other available lexical resources such as SentiWordNet. The motivation for choosing the SentiStrength (and SO-CAL) dictionaries is purely evaluative. We have compared our model with respect to three other state-of-the-art and widely used SA systems that use said resources. Our aim is not to evaluate our algorithm over a variety of different lexical resources, but to check if our universal system and compositional operations can compete with existing unsupervised systems under the same conditions (namely, using the same dictionaries and analogous sets of rules).

SO.7 The content of these dictionaries and their parameters are not modified or tuned.

5.2 Comparison to unsupervised approaches

Table 2 compares the performance of our model with respect to SentiStrength8 and SO-CAL on the Taboada and Grieve (2004) corpus. With respect to SO-CAL, results show that our handling of negation and intensification provides better results (outperforming SO-CAL by 3.25 percentage points overall). With respect to SentiStrength, our system achieves better performance on long reviews.

Table 3 compares these three unsupervised systems on the Pang and Lee 2004 corpus (Pang and Lee, 2004), showing the robustness of our approach across different domains. Our system again performs better than SO-CAL for negation and intensification (although it does not behave as well when dealing with irrealis, probably due to the need for more complex compositional operations to handle this phenomenon), and also better than SentiStrength on long movie reviews.

Rules            | SentiStrength | SO-CAL | Our system
Baseline         | N/A           | 65.50  | 65.00
+negation        | N/A           | 67.75  | 71.75
+intensification | 66.00         | 69.25  | 74.25
+irrealis        | N/A           | 71.00  | 73.75

Table 2: Accuracy (%) on the Taboada and Grieve (2004) corpus. We only provide one row for SentiStrength since we are using the standard configuration for English (which already includes negation and intensification functionalities).

Rules            | SentiStrength | SO-CAL | Our system
Baseline         | N/A           | 68.05  | 67.77
+negation        | N/A           | 70.10  | 71.85
+intensification | 56.90         | 73.47  | 74.00
+irrealis        | N/A           | 74.95  | 74.10

Table 3: Accuracy (%) on the Pang and Lee 2004 test set (Pang and Lee, 2004).

Table 4 compares the performance of our universal approach on a different language (Spanish) with respect to: Spanish SentiStrength (Vilares et al., 2015d), the Spanish SO-CAL (Brooke et al.,

7Out-of-vocabulary words are not given a special treatment at the moment.

8We used the default configuration, which already applies many optimizations. We set the length of the snippet between a negator and its scope to 3, based on empirical evaluation, and applied the configuration to compute sentiment on long reviews.


Rules            | SentiStrength | SO-CAL | Our system | Vilares et al. (2015a)
Baseline         | N/A           | N/A    | 63.00      | 61.80
+negation        | N/A           | N/A    | 71.00      | N/A
+intensification | 73.00         | N/A    | 74.25      | 75.75
+irrealis        | N/A           | 74.50  | 75.75      | N/A

Table 4: Accuracy (%) on the Spanish Brooke et al. (2009) test set.

2009) and a syntactic language-dependent system inspired by the latter (Vilares et al., 2015a). We used exactly the same set of compositional operations as for English (only changing the list of word forms for negation, intensification and ‘but’ clauses, as explained in §3.1). Our universal system again outperforms SentiStrength and SO-CAL

in its Spanish version. The system also obtains results very similar to the ones reported by Vilares et al. (2015a), even though their system is language-dependent and its set of rules is fixed and written specifically for Spanish.

In order to check the validity of our approach for languages other than English and Spanish, we have considered the case of German. It is worth noting that the authors of this article have no knowledge of German at all. In spite of this, we have been able to create a state-of-the-art unsupervised SA system by integrating an existing sentiment lexicon into the framework that we propose in this article.

We use the German SentiStrength system (Momtazi, 2012) for comparison. The use of the German SentiStrength dictionary, as mentioned in Section 5.1, allows us to show how our system remains robust when using different lexica. Experimental results show an accuracy of 72.75% on the Amazon review dataset when all rules are included, while SentiStrength reports 69.95%. Again, adding first negation (72.05%) and then intensification (72.85%) as compositional operations produced relevant improvements over our baseline (69.85%). The results are comparable to those obtained for other languages, using a dataset of comparable size, reinforcing the robustness of our approach across different domains, languages, and base dictionaries.

5.3 Comparison to supervised approaches

Supervised systems are usually unbeatable on the test portion of the corpus with which they have been trained. However, in real applications, a sufficiently large training corpus matching the target texts in terms of genre, style, length, etc. is often

not available; and the performance of supervised systems has proven controversial in domain-transfer applications (Aue and Gamon, 2005).

Table 5 compares our universal unsupervised system to Socher et al. (2013) on a number of corpora: (1) the collection used in the evaluation of the Socher et al. system (Pang and Lee, 2005), (2) a corpus of the same domain, i.e., movies (Pang and Lee, 2004), and (3) the Taboada and Grieve (2004) collection. Socher et al.'s system provides sentence-level polarity classification with five possible outputs: very positive, positive, neutral, negative, very negative. Since the Pang and Lee (2004) and Taboada and Grieve (2004) corpora are collections of long reviews, we needed to compute the global sentiment of each text. For the document-level corpora, we count the number of outputs of each class9 (very positive and very negative count double, positive and negative count one, and neutral counts zero). We take the majority class, and in the case of a tie, the text is classified as negative.10
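The counting scheme just described can be sketched as follows (the function name and the label strings are ours, chosen for illustration):

```python
from collections import Counter

# Weighted vote over sentence-level outputs: strong labels count double,
# neutral counts zero, and ties are resolved as negative, as described above.
WEIGHTS = {"very positive": 2, "positive": 1, "neutral": 0,
           "negative": 1, "very negative": 2}

def document_polarity(sentence_labels):
    score = Counter()
    for label in sentence_labels:
        if label != "neutral":
            polarity = "positive" if "positive" in label else "negative"
            score[polarity] += WEIGHTS[label]
    # Majority class; a tie is classified as negative.
    return "positive" if score["positive"] > score["negative"] else "negative"

print(document_polarity(["very positive", "negative", "neutral"]))  # positive
print(document_polarity(["positive", "negative"]))                  # negative (tie)
```

This turns a sentence-level classifier into a document-level one without any retraining, which is what makes the comparison on the long-review corpora possible.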

The experimental results show that our approach obtains better results on corpora (2) and (3). It is worth mentioning that our unsupervised compositional approach outperformed the supervised model not only on an out-of-domain corpus, but also on another dataset of the same domain (movies) as the one where the neural network was trained and evaluated. This reinforces the usefulness of an unsupervised approach for applications that need to analyze texts coming from different domains, styles or dates, when there is a lack of labeled data to train supervised classifiers for all of them. As expected, Socher et al. (2013) is unbeatable for an unsupervised approach on the test set of the corpus where it was trained. However, our unsupervised algorithm also performs very robustly on this dataset.

6 Conclusions and future work

In this article, we have described, implemented and evaluated a novel model for universal and un-

9When trying to analyze the document-level corpora with Socher et al.'s system, we had out-of-memory problems on a 64-bit Ubuntu server with 128GB of RAM, so we decided to use a counting approach over the sentences of those corpora instead.

10These criteria were selected empirically. Assigning the positive class in the case of a tie was also tested, as well as not doubling the very positive and very negative outputs, but these settings produced similar or worse results with the Socher et al. (2013) system.


Corpora                                  | Socher et al. (2013) | Our system
Origin corpus of the Socher et al. (2013) model:
Pang and Lee 2005 (Pang and Lee, 2005)   | 85.40                | 75.07
Other corpora:
Taboada and Grieve (2004)                | 62.00                | 73.75
Pang and Lee 2004 (Pang and Lee, 2004)   | 63.80                | 74.10

Table 5: Accuracy (%) on different corpora for Socher et al. (2013) and our system. On the Pang and Lee 2005 (Pang and Lee, 2005) collection, our detailed results taking into account different compositional operations were: 73.75 (baseline), 74.13 (+negation), 74.68 (+intensification) and 75.07 (+irrealis).

supervised sentiment analysis driven by a set of syntactic rules for semantic composition. Existing unsupervised approaches are purely lexical, their rules are heavily dependent on the language concerned, or they do not apply any kind of natural language processing step in order to be able to handle different languages, using shallow rules instead.

To overcome these limitations, we introduce, from a theoretical and practical point of view, the concept of compositional operations to define arbitrarily complex semantic relations between different nodes of a dependency tree. Universal part-of-speech tagging and dependency parsing guidelines make it feasible to create multilingual sentiment analysis compositional operations that effectively address semantic composition over natural language sentences. The system is not restricted to any corpus or language, and by simply adapting or defining new operations it can be adapted to any other PoS tag or dependency annotation criteria.

We have compared our universal unsupervised model with state-of-the-art unsupervised and supervised approaches. Experimental results show: (1) that our algorithm outperforms two of the most commonly used unsupervised systems, (2) the universality of the model's compositional operations across different languages, and (3) the usefulness of our approach in domain-transfer applications, especially with respect to supervised models.

As future work, we plan to design algorithms for the automatic extraction of compositional operations that capture the semantic relations between tree nodes. We would also like to collect corpora to extend our evaluation to more languages, since collections that are directly available on the web are scarcer than expected. We plan to pay special attention to Sino-Tibetan and Afro-Asiatic languages. With respect to ungram-

matical texts, we plan to integrate the Tweebo parser (Kong et al., 2014) into our system. Although it does not follow universal guidelines, it will allow us to define compositional operations specifically intended for English tweets and their particular structure. Additionally, the concept of compositional operations is not limited to generic SA and could be adapted for other tasks such as universal aspect extraction. Finally, we plan to adapt the Poria et al. (2014) sentic patterns as compositional operations, so they can be handled universally.

Acknowledgments

This research is supported by the Ministerio de Economía y Competitividad (FFI2014-51978-C2). David Vilares is funded by the Ministerio de Educación, Cultura y Deporte (FPU13/01180). Carlos Gómez-Rodríguez is funded by an Oportunius program grant (Xunta de Galicia). We thank Roman Klinger for his help in obtaining the German data.

References

Haifa K. Aldayel and Aqil M. Azmi. Forthcoming. Arabic tweets sentiment analysis — a hybrid scheme. Journal of Information Science. DOI: 10.1177/0165551515610513.

W. Ammar, G. Mulcaire, M. Ballesteros, C. Dyer, and N. A. Smith. 2016. One Parser, Many Languages. arXiv preprint arXiv:1602.01595.

Yui Arakawa, Akihiro Kameda, Akiko Aizawa, and Takafumi Suzuki. 2014. Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets. Journal of the Association for Information Science and Technology, 65(7):1416–1423, July.

A. Aue and M. Gamon. 2005. Customizing sentiment classifiers to new domains: A case study. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).

Alexandra Balahur and Jose M. Perea-Ortega. 2015. Sentiment analysis system adaptation for multilingual processing: The case of tweets. Information Processing and Management, 51(4):547–556, July.

Alexandra Balahur and Marco Turchi. 2012a. Comparative experiments for multilingual sentiment analysis using machine translation. In Mohamed Medhat Gaber, Mihaela Cocea, Stephan Weibelzahl, Ernestina Menasalvas, and Cyril Labbe, editors, SDAD 2012, The 1st International Workshop on Sentiment Discovery from Affective Data, pages 75–86, Bristol, UK.


Alexandra Balahur and Marco Turchi. 2012b. Multilingual sentiment analysis using machine translation? In WASSA 2012, 3rd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, Proceedings of the Workshop, pages 52–60, Jeju, Republic of Korea.

Alexandra Balahur and Marco Turchi. 2014. Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis. Computer Speech and Language, 28(1):56–75, January.

Alexandra Balahur, Marco Turchi, Ralf Steinberger, José Manuel Perea-Ortega, Guillaume Jacquet, Dilek Küçük, Vanni Zavarella, and Adil El Ghali. 2014. Resource creation and evaluation for multilingual sentiment analysis in social media texts. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May. European Language Resources Association (ELRA).

M. Ballesteros and J. Nivre. 2012. MaltOptimizer: an optimization tool for MaltParser. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 58–62. Association for Computational Linguistics.

Carmen Banea, Rada Mihalcea, and Janyce Wiebe. 2010. Multilingual subjectivity: Are more languages better? In Chu-Ren Huang and Dan Jurafsky, editors, COLING 2010. 23rd International Conference on Computational Linguistics. Proceedings of the Conference, volume 2, pages 28–36, Beijing, August. Tsinghua University Press.

Carmen Banea, Rada Mihalcea, and Janyce Wiebe. 2014. Sense-level subjectivity in a multilingual setting. Computer Speech & Language, 28(1):7–19, January.

Erik Boiy and Marie-Francine Moens. 2009. A machine learning approach to sentiment analysis in multilingual web texts. Information Retrieval, 12(5):526–558, October.

J. Brooke, M. Tofiloski, and M. Taboada. 2009. Cross-Linguistic Sentiment Analysis: From English to Spanish. In Proceedings of RANLP 2009, Recent Advances in Natural Language Processing, pages 50–54, Borovets, Bulgaria, September.

Erik Cambria, Amir Hussain, Catherine Havasi, and Chris Eckl. 2009. AffectiveSpace: blending common sense and affective knowledge to perform emotive reasoning. WOMSA at CAEPIA, Seville, pages 32–41.

Erik Cambria, Daniel Olsher, and Rajagopal Dheeraj. 2014. SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In Twenty-eighth AAAI Conference on Artificial Intelligence, pages 1515–1521.

Erik Cambria. 2016. Affective computing and sentiment analysis. IEEE Intelligent Systems, 31(2):102–107.

Y. Chen and S. Skiena. 2014. Building Sentiment Lexicons for All Major Languages. In The 52nd Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference. Volume 2: Short Papers. ACL 2014, pages 383–389, Baltimore, June. ACL.

Boxing Chen and Xiaodan Zhu. 2014. Bilingual sentiment consistency for statistical machine translation. In The 52nd Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference. Volume 1: Long Papers. ACL 2014, pages 607–615, Baltimore, June. ACL.

Alex Cheng and Oles Zhulyn. 2012. A system for multilingual sentiment learning on large data sets. In Martin Kay and Christian Boitet, editors, COLING 2012. 24th International Conference on Computational Linguistics. Proceedings of COLING 2012: Technical Papers, pages 577–592, Mumbai, India, December.

Md. Faisal Mahbub Chowdhury, Marco Guerini, Sara Tonelli, and Alberto Lavelli. 2013. FBK: Sentiment analysis in Twitter with Tweetsted. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 466–470, Atlanta, Georgia, June. ACL.

Fermín Cruz, José A. Troyano, Beatriz Pontes, and F. Javier Ortega. 2014. Building layered, multilingual sentiment lexicons at synset and lemma levels. Expert Systems with Applications, 41(13):5984–5994, October.

Anqi Cui, Min Zhang, Yiqun Liu, and Shaoping Ma. 2011. Emotion tokens: Bridging the gap among multilingual Twitter sentiment analysis. In Mohamed Vall Mohamed Salem, Khaled Shaalan, Farhad Oroumchian, Azadeh Shakery, and Halim Khelalfa, editors, Information Retrieval Technology. 7th Asia Information Retrieval Societies Conference, AIRS 2011, Dubai, United Arab Emirates, December 18-20, 2011. Proceedings, volume 7097 of Lecture Notes in Computer Science, pages 238–249. Springer, Berlin and Heidelberg.

Alex Davies and Zoubin Ghahramani. 2011. Language-independent Bayesian sentiment mining of Twitter. In The 5th SNA-KDD Workshop '11 (SNA-KDD'11), San Diego, CA, August. ACM.

M. De Marneffe and C. D. Manning. 2008. The Stanford typed dependencies representation. In Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation, pages 1–8. Association for Computational Linguistics.

Erkin Demirtas and Mykola Pechenizkiy. 2013. Cross-lingual polarity detection with machine translation. In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM 2013), page Article No. 9, Chicago, IL, USA, August. ACM.

Andrea Esuli and Fabrizio Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06), volume 6, pages 417–422, Genoa, Italy.

Dehong Gao, Furu Wei, Wenjie Li, Xiaohua Liu, and Ming Zhou. 2013. Cotraining based bilingual sentiment lexicon learning. In Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence. AAAI Conference Late-Breaking Papers, Bellevue, Washington, USA, July. AAAI.

Hatem Ghorbel and David Jacot. 2011. Sentiment analysis of French movie reviews. In Vincenzo Pallotta, Alessandro Soro, and Eloisa Vargiu, editors, Advances in Distributed Agent-Based Retrieval Tools, volume 361 of Studies in Computational Intelligence, pages 97–108. Springer, Berlin and Heidelberg.

K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. 2011. Part-of-speech tagging for Twitter: Annotation, features, and experiments. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 42–47. Association for Computational Linguistics.

Lin Gui, Ruifeng Xu, Jun Xu, Li Yuan, Yuanlin Yao, Jiyun Zhou, Qiaoyun Qiu, Shuwei Wang, Kam-Fai Wong, and Ricky Cheung. 2013. A mixed model for cross lingual opinion analysis. In Guodong Zhou, Juanzi Li, Dongyan Zhao, and Yansong Feng, editors, Natural Language Processing and Chinese Computing, volume 400 of Communications in Computer and Information Science, pages 93–104, Heidelberg, New York, Dordrecht and London. Springer.

Lin Gui, Ruifeng Xu, Qin Lu, Jun Xu, Jiang Xu, Bin Liu, and Xiaolong Wang. 2014. Cross-lingual opinion analysis via negative transfer detection. In The 52nd Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference. Volume 2: Short Papers. ACL 2014, pages 860–865, Baltimore, June. ACL.

Jiang Guo, Wanxiang Che, Haifeng Wang, and Ting Liu. 2016. Exploiting multi-typed treebanks for parsing with deep multi-task learning. arXiv preprint arXiv:1606.01161.

Ivan Habernal, Tomáš Ptáček, and Josef Steinberger. 2014. Supervised sentiment analysis in Czech social media. Information Processing and Management, 50(5):693–707, September.

Mohammad Sadegh Hajmohammadi, Roliana Ibrahim, and Ali Selamat. 2014. Bi-view semi-supervised active learning for cross-lingual sentiment classification. Information Processing and Management, 50(5):718–732, September.

Kanayama Hiroshi, Nasukawa Tetsuya, and Watanabe Hideo. 2004. Deeper sentiment analysis using machine translation technology. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, August.

Alexander Hogenboom, Bas Heerschop, Flavius Frasincar, Uzay Kaymak, and Franciska de Jong. 2014. Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decision Support Systems, 62:43–53, June.

Piyatida Inrak and Sukree Sinthupinyo. 2010. Applying latent semantic analysis to classify emotions in Thai text. In 2nd International Conference on Computer Engineering and Technology (ICCET), volume 6, pages 450–454, Chengdu, April. IEEE.

L. Jia, C. Yu, and W. Meng. 2009. The effect of negation on Sentiment Analysis and Retrieval Effectiveness. In CIKM'09: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 1827–1830, Hong Kong, November. ACM Press.

M. Joshi and C. Penstein-Rosé. 2009. Generalizing dependency features for opinion mining. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, ACLShort '09, pages 313–316, Stroudsburg, PA, USA. Association for Computational Linguistics.

N. Kalchbrenner, E. Grefenstette, and P. Blunsom. 2014. A Convolutional Neural Network for Modelling Sentences. In The 52nd Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference. Volume 1: Long Papers, pages 655–665, Baltimore, Maryland, USA. ACL.

Jungi Kim, Hun-Young Jung, Sang-Hyob Nam, Yeha Lee, and Jong-Hyeok Lee. 2009. Found in translation: Conveying subjectivity of a lexicon of one language into another using a bilingual dictionary and a link analysis algorithm. In Wenjie Li and Diego Mollá-Aliod, editors, Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy, volume 5459 of Lecture Notes in Computer Science, pages 112–121. Springer, Berlin and Heidelberg.

S. Kiritchenko, X. Zhu, and S. M. Mohammad. 2014. Sentiment Analysis of Short Informal Texts. Journal of Artificial Intelligence Research, 50(1):723–762, May.

Roman Klinger and Philipp Cimiano. 2014. The USAGE review corpus for fine-grained, multi-lingual opinion analysis. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May. European Language Resources Association (ELRA).

Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, and Noah A. Smith. 2014. A Dependency Parser for Tweets. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1001–1012, Doha, Qatar, October. ACL.

Hugo Liu and Push Singh. 2004. ConceptNet: a practical commonsense reasoning tool-kit. BT Technology Journal, 22(4):211–226.

Qian Liu, Zhiqiang Gao, Bing Liu, and Yuanlin Zhang. 2016. Automated rule selection for opinion target extraction. Knowledge-Based Systems, 104:74–88, July.

Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, M. Dolores Molina-González, and José M. Perea-Ortega. 2014. Integrating Spanish lexical resources by meta-classifiers for polarity classification. Journal of Information Science, 40(4):538–554, August.

R. McDonald, J. Nivre, Y. Quirmbach-Brundage, Y. Goldberg, D. Das, K. Ganchev, K. Hall, S. Petrov, H. Zhang, O. Täckström, C. Bedini, N. Castelló, and J. Lee. 2013. Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 92–97. Association for Computational Linguistics.

Nishantha Medagoda, Subana Shanmuganathan, and Jacqueline Whalley. 2013. A comparative analysis of opinion mining and sentiment classification in non-English languages. In Proceedings of ICTER 2013, International Conference on Advances in ICT for Emerging Regions, Colombo, Sri Lanka, December. IEEE.

Saeedeh Momtazi. 2012. Fine-grained German sentiment analysis on social media. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1215–1220, Istanbul, Turkey. European Language Resources Association (ELRA).

Sascha Narr, Michael Hülfenhaus, and Sahin Albayrak. 2012. Language-independent Twitter sentiment analysis. In Workshop on Knowledge Discovery, Data Mining and Machine Learning (KDML-2012), Dortmund, Germany, September.

Federico Neri, Carlo Aliprandi, Federico Capeci, Montserrat Cuadros, and Tomas By. 2012. Sentiment analysis on social media. In Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 951–958. IEEE, August.

J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryiğit, S. Kübler, S. Marinov, and E. Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135.

B. Pang and L. Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, pages 271–278. Association for Computational Linguistics.

B. Pang and L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 115–124. Association for Computational Linguistics.

B. Pang and L. Lee. 2008. Opinion Mining and Sentiment Analysis. Now Publishers Inc., Hanover, MA, USA.

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP, pages 79–86.

José M. Perea-Ortega, M. Teresa Martín-Valdivia, L. Alfonso Ureña López, and Eugenio Martínez-Cámara. 2013. Improving polarity classification of bilingual parallel corpora combining machine learning and semantic orientation approaches. Journal of the American Society for Information Science and Technology, 64(9):1864–1877, September.

Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May. European Language Resources Association (ELRA).

Soujanya Poria, Erik Cambria, Grégoire Winterstein, and Guang-Bin Huang. 2014. Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69:45–63, October.

Soujanya Poria, Erik Cambria, and Alexander Gelbukh. 2016. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108:42–49.

S. Rosenthal, P. Nakov, S. Kiritchenko, S. M. Mohammad, A. Ritter, and V. Stoyanov. 2015. SemEval-2015 task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).

Thomas Scholz and Stefan Conrad. 2013. Linguistic sentiment features for newspaper opinion mining. In Natural Language Processing and Information Systems. 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013, Salford, UK, June 19-21, 2013. Proceedings, volume 7934 of Lecture Notes in Computer Science, pages 272–277. Springer, Berlin and Heidelberg.

A. Severyn and A. Moschitti. 2015. UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 464–469, Denver, Colorado. Association for Computational Linguistics.

Aliaksei Severyn, Alessandro Moschitti, Olga Uryupina, Barbara Plank, and Katja Filippova. 2016. Multi-lingual opinion mining on YouTube. Information Processing and Management, 52(1):46–60, January.

R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1201–1211. Association for Computational Linguistics.

R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP 2013. 2013 Conference on Empirical Methods in Natural Language Processing. Proceedings of the Conference, pages 1631–1642, Seattle, Washington, USA, October. ACL.

M. Taboada and J. Grieve. 2004. Analyzing appraisal automatically. In Proceedings of AAAI Spring Symposium on Exploring Attitude and Affect in Text (AAAI Technical Report SS0407), Stanford University, CA, pages 158–161. AAAI Press.

M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

M. Taulé, M. A. Martí, and M. Recasens. 2008. AnCora: Multilevel Annotated Corpora for Catalan and Spanish. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), pages 96–101, Marrakech, Morocco.

M. Thelwall, K. Buckley, and G. Paltoglou. 2012. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology, 63(1):163–173.

K. Toutanova and C. D. Manning. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13, pages 63–70.

P. D. Turney. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 417–424, Stroudsburg, PA, USA. ACL.

D. Vilares, M. A. Alonso, and C. Gómez-Rodríguez. 2015a. A syntactic approach for opinion mining on Spanish reviews. Natural Language Engineering, 21(1):139–163.

David Vilares, Miguel A. Alonso, and Carlos Gómez-Rodríguez. 2015b. On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages. Journal of the Association for Information Science and Technology, 66(9):1799–1816, September.

David Vilares, Miguel A. Alonso, and Carlos Gómez-Rodríguez. 2015c. Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 2–8, Lisboa, Portugal, September. Association for Computational Linguistics.

David Vilares, Mike Thelwall, and Miguel A. Alonso. 2015d. The megaphone of the people? Spanish SentiStrength for real-time analysis of political tweets. Journal of Information Science, 41(6):799–813.

David Vilares, Miguel A. Alonso, and Carlos Gómez-Rodríguez. 2016a. EN-ES-CS: An English-Spanish code-switching Twitter corpus for multilingual sentiment analysis. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 4149–4153, Portorož, Slovenia, May. ELRA.

David Vilares, Carlos Gómez-Rodríguez, and Miguel A. Alonso. 2016b. One model, two languages: training bilingual parsers with harmonized treebanks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 425–431, Berlin, Germany, August. Association for Computational Linguistics.

Julio Villena-Román, Janine García-Morera, Cristina Moreno-García, Sara Lana-Serrano, and José Carlos González-Cristóbal. 2014. TASS 2013: a second step in reputation analysis in Spanish. Procesamiento del Lenguaje Natural, 52:37–44, March.

G. Vinodhini and RM. Chandrasekaran. 2012. Sentiment analysis and opinion mining: A survey. International Journal of Advanced Research in Computer Science and Software Engineering, 2(6):282–292, June.

Svitlana Volkova, Theresa Wilson, and David Yarowsky. 2013. Exploring sentiment in social media: Bootstrapping subjectivity clues from multilingual Twitter streams. In ACL 2013. 51st Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference. Volume 2: Short Papers, pages 505–510, Sofia, Bulgaria, August. ACL.

Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In ACL-IJCNLP 2009. Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP. Proceedings of the Conference, pages 235–243, Suntec, Singapore, August. ACL, World Scientific Publishing Co Pte Ltd.

Min Xiao and Yuhong Guo. 2012. Multi-view AdaBoost for multilingual subjectivity analysis. In Martin Kay and Christian Boitet, editors, COLING 2012. 24th International Conference on Computational Linguistics. Proceedings of COLING 2012: Technical Papers, pages 2851–2866, Mumbai, India, December.

Gongjun Yan, Wu He, Jiancheng Shen, and Chuanyi Tang. 2014. A bilingual approach for conducting Chinese and English social media sentiment analysis. Computer Networks, 75(B):491–503, December.

Changli Zhang, Daniel Zeng, Jiexun Li, Fei-Yue Wang, and Wanli Zuo. 2009. Sentiment analysis of Chinese documents: From sentence to document level. Journal of the American Society for Information Science and Technology, 60(12):2474–2487, December.