
Proceedings of Recent Advances in Natural Language Processing, pages 329–337, Hissar, Bulgaria, Sep 7–9 2015.

Collecting and Evaluating Lexical Polarity with a Game With A Purpose

Mathieu Lafourcade, LIRMM

Univ. Montpellier, [email protected]

Nathalie Le Brun, Imagin@t

34400 Lunel, [email protected]

Alain Joubert, LIRMM

Univ. Montpellier, [email protected]

Abstract

Sentiment analysis of a text requires, among other things, a lexical polarity resource. We designed LikeIt, a GWAP (Game With A Purpose) that allows players to assign a positive, negative or neutral value to a term, and thus to obtain a resulting polarity for most of the terms of the freely available lexical network of the JeuxDeMots project. We present a quantitative analysis of the data obtained through our approach, together with the comparison method we developed to validate them qualitatively.

1 Introduction

Being able to evaluate feelings is essential in natural language processing, whether to analyze political speeches or the opinion of the general public on services, tourism, culture or consumer goods. Whatever the type of approach, supervised statistical or more linguistic (Brun, 2011), this ability requires a lexical polarity resource, in which terms are endowed with positive, negative and neutral values. The polarity can be expressed using a single numerical value (Taboada et al., 2011), or more: two values (positive/negative) are used in Emolex (Saif and Turney, 2013), a lexical polarity / feelings resource in English, produced by crowdsourcing (using Amazon Mechanical Turk, which can be problematic, see Fort et al. (2014)). SentiWordNet (Esuli and Sebastiani, 2006), as well as WordNet Affect (Strapparava and Valitutti, 2004), are extensions of WordNet in which terms are polarized along three values (positive, negative, objective), the last of which is opposed to the first two. Approaches based on propagation from a manually built core (see Gala and Brun (2012) and Lafourcade and Fort (2014)) have also been proposed, but they may not exactly reflect the views of speakers. Learning algorithms and compositional approaches can use such polarity data (Kim and Hovy, 2004; Turney, 2002). The GWAP (Game With A Purpose) JeuxDeMots (JDM) (Lafourcade, 2007) resulted in a constantly expanding lexical network, in which terms are linked by lexical-semantic relations. Contributive approaches with non-experts have been analysed in (Snow et al., 2008) and proven quite efficient. Within the JDM project, some alternative types of games allow players to validate and verify lexical relations produced through the main game (Lafourcade et al., 2015). The JDM project thus provides a suitable context to test various methods of polarity information acquisition.

It may be relevant to assign to words some information in the form of finite sets of values. Thus, the polarity can be defined by three values: positive, negative and neutral. Many semantic features can be characterized in this way, i.e. associated with such variable-sized sets of values: feelings/emotions (anger, fear, joy, love, sadness...), or colors (red, blue, yellow, green, orange, violet, black, white...). Since this type of association cannot be obtained through the main game of the JDM project, we designed several other games for characterizing words according to various criteria (I like/I don’t like, associated feeling, associated color...). Applications of these data are numerous, in discourse analysis as well as in disambiguation. But such an annotation is complex because it is subjective and heavily influenced by the context: for example, the same remark can be considered humor, advice, criticism or a reprimand, depending on the speaker, the interlocutor and the context.

In this article, we first introduce LikeIt, a GWAP designed to collect polarization data, and describe how the polarization of terms spreads within the lexical network. Then, we present the results obtained through a quantitative and qualitative analysis. Our method of qualitative assessment, based on a comparison between the polarity data and the feelings data (i.e. feelings that people spontaneously associate with a given term), is described in detail. Finally, we discuss the prospects opened by this work.

2 LikeIt, a Polarity Game

As on social networks, the game consists in assigning the assessment I like, I do not like or I don’t care to a displayed term. Of course, this assessment is not only very subjective, but also closely linked to the context. However, we hypothesize that many words have an intrinsic polarity that can be gathered by asking enough people. A majority polarity may emerge from the answers, and if so, we can determine which one.

2.1 How Does it Work?

The player has to answer yes, no or I do not care to the question do you like the idea of followed by a term. This framework seems to be the most flexible and most comprehensive way to enrich the lexical network with polarity information. In particular, it makes it possible to distinguish between the terms to which people are mostly indifferent (majority of I do not care, neutral polarity) and those that raise sharply divided opinions (roughly equal amounts of yes and no, polarity equally divided between positive and negative). Within the context of word sense disambiguation, preliminary results show that polarity is sufficient for selecting the correct meaning of a term in about 50% of cases. Polarity data may also be used in opinion analysis, by combining the polarities of highly polarized terms (i.e. those whose highest polarity is greater than 50% of the cumulative values of the three possible polarities). Figure 1 shows screenshots of the LikeIt game. Among the qualities that make this consensus voting game an effective GWAP (Lafourcade et al., 2015), the following can be emphasized:

• simplicity: although the response procedure (yes, no, I don’t care) is identical to that of surveys, the diversity of vocabulary and topics is such that people do not feel they are completing a survey. In addition, answering with a single click makes it possible to play from a smartphone or tablet, to pass the time during relatively short waits (waiting room, queue, transport...). Quantitatively, very short games and immediate rerun make LikeIt a very effective game for collecting data.

• diversity of vocabulary and topics and variability of response: a number of words elicit mixed, even opposing, feelings (e.g., the term operating room, theoretically seen as positive, but negative if we are personally concerned), and the feelings of a player can evolve over time and according to circumstances. Thus the word bachelor or school exam creates a negative feeling among high school students, but a significantly positive one among graduates. The choice to provide some very general vocabulary makes the game interesting and varied.

• reactivity: as soon as he has answered, the player can see the percentage of people who share his opinion, which may induce some emotion about being, or not being, like everyone else. Direct feedback is thus given, while the game is immediately rerun with a new question.

2.2 The Collected Data

For each word, the responses of players generate a triplet of values representing the number of votes for each of the three possible polarities. Their percentage distribution represents what we call the polarization of the term, similar to a three-component vector, whose norm can be calculated. The higher the intensity (i.e. the number of votes for the word) and the vector norm, the more reliable the polarization is. The minimum intensity from which the polarization can be considered reliable is difficult to define because various factors are involved, but we can estimate the minimum number of votes as 20 times the number of poles (i.e. at least 20 votes for a monopolarity, 40 votes for a bipolarity, and 60 votes for a tripolarity). A word is highly polarized when one of the three values is greater than 50%. Table 1 shows some examples of different polarizations, with corresponding intensities and norms.

Although this is outside the scope of this article, we should note that considering polarization as a vector of the three polarities (with possibly null polarities) leads to very interesting manipulation, comparison and combination possibilities pertaining to vectors and norms. Basically, the Manhattan norm is the count of votes, and the p-norm with p = 2 is the Euclidean norm (the one mentioned here). Roughly speaking, in the context of sentiment analysis of a given text, combining the polarizations of the words it contains means adding such normed vectors.
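The quantities defined in this section can be illustrated with a minimal Python sketch. It is not the implementation used in the project: the function names are hypothetical, the norms are computed on raw vote counts (an assumption chosen so that the Manhattan norm equals the count of votes, as stated above), and a "pole" is assumed to be any polarity that received at least one vote.

```python
import math

def polarization(pos, neu, neg):
    """Percentage distribution (positive, neutral, negative) of a vote triplet."""
    total = pos + neu + neg
    if total == 0:
        raise ValueError("term has no votes")
    return tuple(100.0 * v / total for v in (pos, neu, neg))

def manhattan_norm(votes):
    """1-norm of the vote vector: for vote counts, the total number of votes."""
    return sum(abs(v) for v in votes)

def euclidean_norm(votes):
    """2-norm (p-norm with p = 2) of the vote vector."""
    return math.sqrt(sum(v * v for v in votes))

def is_highly_polarized(pos, neu, neg):
    """True when one of the three polarities exceeds 50% of the votes."""
    return max(polarization(pos, neu, neg)) > 50.0

def min_reliable_votes(pos, neu, neg):
    """20 votes per pole (assumption: a pole is any polarity with at least one vote)."""
    poles = sum(1 for v in (pos, neu, neg) if v > 0)
    return 20 * poles

def combine(vote_vectors):
    """Component-wise sum of several (pos, neu, neg) vectors."""
    return tuple(sum(component) for component in zip(*vote_vectors))

# Made-up vote counts, for illustration only.
votes = (45, 10, 5)
print(polarization(*votes))
print(manhattan_norm(votes), euclidean_norm(votes))
print(is_highly_polarized(*votes), manhattan_norm(votes) >= min_reliable_votes(*votes))
```

Under these assumptions, combining the polarizations of the words of a text amounts to calling combine on their vectors and reading off the polarization of the sum.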


Figure 1: Two consecutive screenshots of LikeIt. After answering on the left screen (barrier), the player can immediately see, at the top of the next screen (right image and bottom zoom), the percentage of players who share his view: the game thus provides feedback to the player while being immediately rerun with a new question.

Table 1: Examples of polarizations obtained with LikeIt: the term gift is strongly positively monopolarized, while others show a more heterogeneous distribution. The norm value is the norm of the vector composed of the values of positive, neutral and negative polarities. The higher the norm, the more confident we can be in the representativeness of the polarity distribution.

2.3 The Term Selection Algorithm

Since a very large proportion of words have a neutral overall polarity, the game may be monotonous and boring for the player if terms are selected randomly within the network. In addition, the network includes highly specialized terms, which are interesting if you know them, but discouraging otherwise. For these reasons, the terms are selected within the network via a propagation algorithm whose principle is:

• A term T whose neutral polarity represents less than 50% in the distribution of the three polarities is selected randomly;

• The proposed word is either T, with a probability p of 0.5, or a neighbor N that is randomly selected among the neighbors of T, with probability 1-p;

• In order to accelerate the propagation, the probability p is changed under various conditions (empirically determined). If the total number of votes is under 30 (resp. over 300, over 1000) for N, then p = 0.25 (resp. 0.75, 0.9);

• The propagation algorithm was initiated by manually assigning a positive polarity to the term good (1 positive vote) and a negative polarity to the word bad (1 negative vote).

This simple algorithm propagates through the network between the words for which polarity information is relevant, i.e. those that are not strongly neutral, while partially avoiding the terms that already have a lot of polarity votes. Thus, a neutral term will mostly be selected through its neighbors. A highly linked term (to other terms in the network) will be polarized more quickly than others, as it will be reached more often through its neighbors (at least as long as the number of votes remains below the set threshold).
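The selection principle listed above can be sketched in Python as follows. This is only one possible reading of the description, not the project's code: the data structures (a dictionary of vote triplets and a dictionary of neighbor lists) and the function names are hypothetical, and the sketch assumes that the neighbor N is drawn first so that p can be adjusted from the number of votes N already has.

```python
import random

def keep_probability(neighbor_votes):
    """Probability p of proposing T rather than its neighbor N,
    adjusted from the total number of votes already collected for N."""
    if neighbor_votes > 1000:
        return 0.9
    if neighbor_votes > 300:
        return 0.75
    if neighbor_votes < 30:
        return 0.25
    return 0.5

def select_term(votes, neighbors, rng=random):
    """Pick the next term to propose to the player.

    votes:     dict term -> (positive, neutral, negative) vote counts
    neighbors: dict term -> list of neighboring terms in the network
    """
    # Candidates: terms that have votes and whose neutral share is below 50%.
    candidates = [
        term for term, (pos, neu, neg) in votes.items()
        if pos + neu + neg > 0 and neu / (pos + neu + neg) < 0.5
    ]
    term = rng.choice(candidates)
    linked = neighbors.get(term, [])
    if not linked:
        return term
    neighbor = rng.choice(linked)
    p = keep_probability(sum(votes.get(neighbor, (0, 0, 0))))
    return term if rng.random() < p else neighbor

# Seeding as described in the text: one positive vote for "good", one negative for "bad".
# The neighbor lists are invented for the example.
votes = {"good": (1, 0, 0), "bad": (0, 0, 1)}
neighbors = {"good": ["nice", "kind"], "bad": ["evil"]}
print(select_term(votes, neighbors, random.Random(0)))
```

With this reading, a neighbor that has few votes is favored (p = 0.25), while a neighbor that is already heavily voted is mostly skipped in favor of T.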

2.4 Observed Experimental Biases

A first observed bias is due to polysemy: for a polysemous term, the player’s response may be influenced by a sense that is anecdotal but strongly negative or positive. Thus, the polarization of vache (cow), whose dominant sense, the animal, is broadly neutral (or even slightly positive), can be influenced by the meaning vache (méchant) (nasty), which is strongly negative. Indeed, the players, who are de facto in a polarity context, think first of the most polarized sense and thus assign the term a negative polarity. The same holds for fumier (dominant sense: manure, substituted sense: insult), cellule (cell) (dominant sense: biology, substituted sense: jail cell), etc. However, the refinements of these terms show a polarization consistent with what is expected. A second bias is that some terms are bipolarized because the polarity can vary depending on whether the perspective is diegetic (the player identifies with the character and is involved in the context) or extradiegetic (he adopts an external perspective). Thus dragon, orc, vampire, witch, etc. are both negative (diegetic perspective) and positive (extradiegetic perspective). A third bias, which tends to favour positive polarization, is explained in the next section.

3 Evaluation of Polarity Data

3.1 Quantitative Evaluation

During the first three months, more than 25,000 terms were polarized (i.e. characterized with information on the polarity) with a total of over 150,000 votes. Within 3 years, more than 385,000 words were polarized with more than 100 million votes. Since the network contains about 490,000 words, about 75% were reached by the propagation algorithm¹.

Table 2: Quantitative data of polarity obtained with LikeIt: the average number of votes for terms and polarities.

¹ All data are freely available in real time at the following url: url anonymized

We can clearly see in table 2 that the average number of positive votes is higher than that of neutral and negative votes combined. Hence players seem more reluctant to vote neutral or negative than positive. This could be interpreted by analogy with what happens on social networks, where people are invited to click like to show their approval, but where there is no way to indicate that one does not like, disapproves, or is simply indifferent. Thus, it is possible that many people unconsciously behave in a "socially correct" way, i.e. giving only positive opinions and passing over terms that would generate a negative one. We should note, however, that the mean might not always be a good indicator of the distribution of the votes, especially when the distribution roughly follows a power law. The median value is certainly more meaningful.
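The caveat about the mean can be illustrated with a small Python sketch. The vote counts below are invented for the example and are not taken from the LikeIt data; the function name is hypothetical. A single heavily voted term is enough to pull the mean far above what most terms actually receive, while the median stays representative.

```python
from statistics import mean, median

def summarize(vote_counts):
    """Return (mean, median) of a list of per-term vote counts."""
    return mean(vote_counts), median(vote_counts)

# Made-up counts: six ordinary terms and one very general, heavily voted term.
vote_counts = [12, 15, 20, 25, 30, 40, 5000]
avg, med = summarize(vote_counts)
print(f"mean = {avg:.1f}, median = {med}")
```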

As regards the global distribution of polarities (table 3), there is a slight predominance of neutral polarity, which is not surprising. Although the algorithm is designed not to offer too many neutral terms, common vocabulary remains predominantly neutral. On the other hand, positive polarities are almost twice as numerous as negative ones, which may be explained in different ways, in addition to the above assumption.

Data in tables 3 and 5 thus appear to be biased towards the positive polarity, which represents 55% of votes. Indeed, interviews with players reveal that many terms perceived as rather neutral (e.g. Odonata) are often labeled positively. The bias seems to result from the adage "I love what I do not hate". It is difficult to assess the impact of such a bias because the terms that would be positive or neutral are not known a priori, but this effect would add to the one mentioned above (reluctance to express a negative opinion) to explain the strong predominance of positive votes.

The positive bias can also be explained as an effect of the term selection algorithm: the proposed terms are mostly named entities or words in fields which usually arouse approval. Thus the vast majority of famous people are perceived rather positively, especially actors and actresses; in the same way, named entities of works (films, paintings, novels...) mainly generate positive feelings, as does most of the culinary vocabulary, especially names of culinary specialties and drinks.

The distribution of polarities according to the number of votes in table 4 has a median value around 80 (that is, there are as many polarities with fewer than 80 votes as polarities with more than 80 votes). This is quite enough votes to be statistically meaningful. We could consider that at least 20 votes are needed to define a representative polarity; 811,666 polarities are above this threshold (89% of all polarities).

Table 3: Quantitative data obtained with LikeIt: distribution of polarities (left) and votes (right). We can see that the distribution of polarities does not exactly match the distribution of votes. There is a majority of neutral polarities but a majority of positive votes.

The average (120, see table 2) is shifted to the right due to a number of relations with a very high number of votes. These are the "hub" terms of the network, i.e. the very general terms, which are connected to several tens of thousands of words. For instance, the term animal has more than 26,052 outgoing relations. Such terms are proposed more often than less connected words, and thus rapidly collect a large number of votes.

Figure 2: Distribution of the majority polarities (> 50%). Such polarities are largely positive, and are twice as numerous as neutral and negative polarities combined. However, for higher distributions (from 60% to 80%), the number of positive polarities drops sharply.

In table 5 we see that the dominant polarity combinations are positive/neutral bipolarity (43%, not necessarily with the same weight) and positive/neutral/negative tripolarity (42%, not necessarily evenly distributed). The first confirms that people tend to vote either neutral or positive, or more precisely to vote positively even if they are rather indifferent. Conversely, a negative vote would truly reflect a marked opinion. The second indicates that many words arouse divided opinions, although in these polarity distributions one of the three may strongly dominate. This distribution also shows that unanimity is rare: only 6.4% of terms show a single polarity. Figure 3 is cumulative and shows that there are many (in proportion) negative and neutral polarities with a low number of votes, and significantly more positive polarities over 200 votes. This is consistent with the hypothesis mentioned above: people seem more likely to vote positively than negatively or neutrally. In figure 4, the distribution of polarities according to their weight (linear and log) shows that above approximately 400 votes, negative polarities are more numerous than the others. This is due to the presence in the network of very negative "hubs": highly connected words for which the vote is almost always negative, such as death, illness, accident, cancer... The ups and downs are a combined consequence of the structure of the network, the algorithm and the fact that players can pass.

Table 5: Quantitative data of polarity obtained with LikeIt: the distribution of terms according to mono, bi or tripolarization.

3.2 Qualitative Evaluation Method

The qualitative evaluation of our data is a complex problem insofar as there is no lexical polarity resource to which the polarity data from LikeIt could be compared. A manual assessment, which would consist in checking the relevance of the polarity assigned to a number of terms, is unthinkable due to the size of the data. In addition, how would we select the terms to be checked? Within the JDM project, two games allow associations between terms and the feelings they evoke: terms relative to feelings can be proposed openly via a text field in the main game, and in a semi-open way (chosen by click or given as a free answer in advanced mode) in the Emot game (Lafourcade et al., 2015) (url anonymized).

Table 4: Distribution of polarities depending on the number of votes, whose median value is around 80, the median for each polarity being given in table 3.

Figure 3: Cumulative number of polarities according to the number of votes (weight). The median values for the negative and neutral polarities (resp. 36 and 70) are significantly lower than the median value for positive polarities (about 200).

So, for each term, we get a list of weighted associated feelings as follows:

• gift: joy (1712)(+); surprise (1142)(+); happiness (980)(+); love (780)(+); pleasure (741)(+); friendship (660)(+); gratitude (310)(+); disappointment (260)(-); amazement (222)(+); gratefulness (210)(+); generosity (200)(+); satisfaction (160)(+); contentment (140)(+); enjoyment (120)(+); desire (100)(+); embarrassment (90)(-); emotion (81)(+); delight (80)(+); impatience (70)(-); jealousy (70)(-); happy (60)(+); party (50)(+); liking (50)(+); frustration (50)(-); awkwardness (50)(-);

• policeman: security (1027)(+); fear (1007)(-); violence (817)(-); hatred (357)(-); apprehension (297)(-); anger (186)(-); strength (137)(*); protection (127)(+); repression (127)(-); insecurity (127)(-); anxiety (117)(-); revolt (117)(-); insecurity (127)(-); injustice (97)(-); brutality (97)(-); panic (97)(-); respect (97)(+); terror (87)(-); aggressiveness (117)(-); fury (87)(-); distrust (87)(-); worry (77)(-); pain (77)(-); reject (77)(-); blue funk (67)(-); blindness (66)(-); mistrust (65)(-); shame (63)(-); incomprehension (57)(-); distress (57)(-); relief (57)(+); fright (32)(-); disquietude (32)(-);

• arm: strength (110)(*); protection (100)(+); support (80)(+); union (5)(-); indifference (4)().

The terms concerning feelings were the first to be reached by the propagation algorithm, so they are polarized. In the list above, for each feeling term, following the weight of the relation, a symbol in brackets indicates the majority polarity (which accounts for over 50% of votes) or the absence of a dominant polarity. The (+) corresponds to a positive dominant polarity, (-) indicates a negative dominant polarity, () a predominantly neutral polarity, and (*) indicates the absence of a majority polarity. We thus notice that the term strength, associated with arm and with policeman, does not present any majority polarity.

Figure 4: Distribution of polarities (left) and of the log of polarities (right) according to their number of votes (weight). The number of polarities above 500 votes is low (see table 4); they are not shown in these figures.

A polarization can thus be calculated for a term by summing the polarity vectors of every associated feeling term, and it can be compared to the one stemming from the LikeIt game. We thus compare an inferred polarity to a polarity directly established by the players. This is done via a cosine measure and a max measure (max = 1 if both dominant polarities coincide). The advantage of such an approach is that it can be automated, so we can reserve the effort of manual inspection for divergent cases. We calculated the cos and max values for the first 5,000 terms, ordered by decreasing weight of the feelings relation (thus the most often played for this relation first).
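The comparison between an inferred polarization and the one obtained directly from LikeIt can be sketched in Python as follows. The function names and the sample vectors are hypothetical. The sketch sums the polarity vectors of the feeling terms without weighting them; whether the relation weights enter the sum is not specified in the text, so this is an assumption. Ties between dominant polarities are resolved by taking the first maximal component, which is also an arbitrary choice.

```python
import math

def inferred_polarization(feelings, polarity_of):
    """Sum the (pos, neu, neg) vectors of the feeling terms associated with a word."""
    vectors = [polarity_of[f] for f in feelings if f in polarity_of]
    if not vectors:
        return (0, 0, 0)
    return tuple(sum(component) for component in zip(*vectors))

def cosine(u, v):
    """Cosine similarity between two polarity vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def max_agreement(u, v):
    """1 if both vectors have the same dominant polarity, else 0."""
    return 1 if u.index(max(u)) == v.index(max(v)) else 0

# Invented vote vectors, for illustration only.
polarity_of = {"joy": (90, 8, 2), "surprise": (60, 30, 10), "disappointment": (3, 7, 90)}
inferred = inferred_polarization(["joy", "surprise", "disappointment"], polarity_of)
direct = (80, 15, 5)  # hypothetical LikeIt votes for the same word
print(inferred, cosine(inferred, direct), max_agreement(inferred, direct))
```

Terms for which max_agreement returns 0 would be the divergent cases left for manual inspection.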

The average of the maximal polarities from the game (mpa) can be seen as the maximum rate of agreement reached on average by the general opinion, for the n most played words. Between the first 1,000 and the first 5,000 most played words, the difference between unanimity and mpa (100-mpa) varies between 15 and 12%: it seems logical that the number of divergent opinions increases with the number of votes. The manual review of cases of divergence (max = 0) shows that they mainly concern terms that can be perceived from a diegetic or extradiegetic perspective, such as:

thesis, earwig, analysis, moray, micropenis, woman [agent-of] express something, dragon, custard pie attack...

To associate feelings with a given term, the player seems to adopt a diegetic perspective, while he adopts an external one (extradiegetic perspective) to assign a polarity to a given word with LikeIt. Indeed, all the cases of divergence concern words polarized negatively via the associated feelings, and positively via LikeIt. Note that highly polarized words are not affected by the diegetic / extradiegetic distinction. Moreover, we emphasize that the terms that elicit the most subjective opinions display a heterogeneous polarity, but its distribution into positive/negative/neutral is consistent in both modes of assessment.

4 Conclusion and Future Work

Our results and the method we developed to characterize polarity through various GWAPs open a number of perspectives. The first is to continue the double approach (polarity inferred from associated feelings, and polarity directly assigned through the game LikeIt) to further expand the already abundant lexical polarity resource (385,000 words with polarity information, as a freely available resource).

Second, our approach can be extrapolated: all types of characteristics (size, temperature, weight / balance, temporality, location...) may be characterized and quantified using crowdsourcing through GWAPs. But a preliminary study to identify the most useful and informative ones must be undertaken, to avoid boring and thus demotivating the players by multiplying this type of game. Note that the data generated through these games, which require only knowledge and a good command of the language, are of good quality, which justifies this approach.

It is also necessary to keep in mind that polarity data are not static but potentially fluctuating, especially over time and depending on the circumstances. For example, the term volcano tends to arouse curiosity or indifference, but when an imminent eruption threatens populations or air traffic, anxiety and fear become the majority among the feelings expressed. Similarly, feelings about a celebrity or a work (named entities) can fluctuate widely over time, and if contradictory feelings appear for the same word in the network, introducing a notion of context may be interesting, for example DSK [context] IMF, and DSK [context] Sofitel.

Figure 5: A screenshot of the Emot game. The player is invited to choose one associated feeling aroused by the word surprise. The data obtained with Emot allow us to cross-evaluate those obtained with LikeIt.

Table 6: Qualitative assessment of polarization data from LikeIt compared with those calculated from the associated feelings. There is a significant correlation between the polarization defined by LikeIt and that induced by the associated feelings.

We could polarize the words automatically, based on their relations within the network: for example, the relation characteristic is very polarizing; widow [characteristic] sad allows a negative polarity to be assigned to widow. However, the crowdsourcing approach is generally more reliable and faster, both for highly monopolarized words and those whose polarity is more heterogeneous.

The approach and tools presented in this article are relatively new, and the number of polarized terms represents a significant proportion (70%) of the entire network. It can be assumed that the most interesting common words are those which are the most played in JeuxDeMots, hence the most appropriately linked to other words, as claimed in (Chamberlain et al., 2013). As our propagation algorithm selects the vast majority of such terms, we may conclude that our approach polarizes them effectively. Given the results, we believe we have demonstrated the feasibility, the interest and the prospects of our project, and have largely undertaken to build the corresponding resource.


References

Agarwa C. and Bhattachary P. 2006. Augmenting Wordnet with Polarity Information on Adjectives. 3rd International Wordnet Conference, Jeju Island, Korea, South Jeju (Seogwipo), 2006, 7 p.

Brun C. 2011. Detecting opinions using Deep Syntactic Analysis. Proceedings of Recent Advances in Natural Language Processing (RANLP 2011), Hissar, Bulgaria, pp. 392-398.

Chamberlain J., Fort K., Kruschwitz U., Lafourcade M. and Poesio M. 2013. Using Games to Create Language Resources: Successes and Limitations of the Approach. Theory and Applications of Natural Language Processing. Gurevych, Iryna; Kim, Jungi (Eds.), Springer, ISBN 978-3-642-35084-9, 2013, 42 p.

Esuli A. and Sebastiani F. 2006. SentiWordNet: a publicly available lexical resource for opinion mining. Proceedings of LREC-06, Genoa, Italy, 6 p.

Fort K., Adda G., Sagot B., Mariani J. and Couillaut A. 2014. Crowdsourcing for Language Resource Development: Criticisms about Amazon Mechanical Turk Overpowering Use. Lecture Notes in Artificial Intelligence, Springer, pp. 303-314, 2014, 978-3-319-08957-7.

Gala N. and Brun C. 2012. Propagation de polarités dans des familles de mots : impact de la morphologie dans la construction d’un lexique pour l’analyse d’opinions. Actes de Traitement Automatique des Langues Naturelles (TALN 12), Grenoble, June 2012, pp. 495-502.

Kim S. and Hovy E. 2004. Determining the sentiment of opinions. Proceedings of COLING-04, (TALN 12), Barcelona, Spain, pp. 1367-1373.

Lafourcade M. 2007. Making people play for Lexical Acquisition. Proceedings of 7th Symposium on Natural Language Processing, Thailand, 13-15 December 2007, 8 p.

Lafourcade M., Le Brun N. and Joubert A. 2015. Jeux et intelligence collective – résolution de problèmes et acquisition de données sur le Web. Collection Science cognitive et management des connaissances (edited by Joseph Mariani and Patrick Paroubek), ISTE editions, 2015, 156 p.

Lafourcade M. and Fort K. 2014. Propa-L: a semantic filtering service from a lexical network created using Games With A Purpose. Proceedings of the Ninth International Conference on Language Resources and Evaluation, Reykjavik, Iceland, 26-31 May 2014, pp. 1676-1681.

Saif M. and Turney P. 2013. Crowdsourcing a Word-Emotion Association Lexicon. Computational Intelligence, 29 (3), pp. 436-465.

Snow R., O’Connor B., Jurafsky D. and Ng A. Y. 2008. Cheap and Fast But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, Association for Computational Linguistics, pp. 254-263.

Strapparava C. and Valitutti A. 2004. WordNet Affect: an affective extension of WordNet. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-04), Lisbon, Portugal, pp. 1083-1086.

Taboada M., Brooke J., Tofiloski M., Voll K. and Stede M. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, Volume 37 (2), pp. 267-307.

Turney P. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of ACL-02, Philadelphia, USA, pp. 417-424.

Vegnaduzzo V. 2004. Acquisition of subjective adjectives with limited resources. AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, Stanford, USA.
