

Acquiring Background Knowledge to Improve Moral Value Prediction

Ying Lin 1, Joe Hoover 2, Morteza Dehghani 2,3, Marlon Mooijman 2, Heng Ji 1

1 Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
{liny9,jih}@rpi.edu
2 Department of Psychology, University of Southern California, Los Angeles, CA, USA
{jehoover,mdehghan,mooijman}@usc.edu
3 Department of Computer Science, University of Southern California, Los Angeles, CA, USA

Abstract

In this paper, we address the problem of detecting expressions of moral values in tweets using content analysis. This is a particularly challenging problem because moral values are often only implicitly signaled in language, and tweets contain little contextual information due to length constraints. To address these obstacles, we present a novel approach to automatically acquire background knowledge from an external knowledge base to enrich input texts and thus improve moral value prediction. By combining basic text features with background knowledge, our overall context-aware framework achieves performance comparable to a single human annotator. To the best of our knowledge, this is the first attempt to incorporate background knowledge for the prediction of implicit psychological variables in the area of computational social science.

1 Introduction

Moral values are principles that define right and wrong for a given individual. They influence decision making, social judgments, motivation, and behavior, and are thought of as the glue that holds society together (Haidt, 2012). However, moral values are not universal, and disagreements about what is moral or sacred can give rise to seemingly intractable conflicts (Dehghani et al., 2010; Ginges et al., 2007). Accordingly, public demonstrations and protests often involve moral conflicts between different groups. For example, as Figure 1 shows, during the 2015 Baltimore protests1, users posted their viewpoints about this event on Twitter, demonstrating divergent and even opposite moral values.

1 https://en.wikipedia.org/wiki/2015_Baltimore_protests

Detecting moral values in user-generated content not only can provide insight into these conflicts but can also inform applications that aim to model social phenomena such as voting behavior and public opinion. For example, Koleva et al. (2012) show that moral concerns play an important role in one's attitudes and ideological position across a wide range of issues, such as abortion and same-sex marriage. Moral values have also been used to investigate various political attitudes in the United States. Liberals and conservatives attend to different moral intuitions (Graham et al., 2009): liberals focus on the notions of Harm and Fairness, while conservatives attend to ideas of Loyalty to in-group members, Authority, and Purity.

Figure 1: Tweets related to the 2015 Baltimore protests.

In this work, we predict the moral values expressed in social media text via a suite of Natural Language Processing (NLP) techniques. A given text can contain any one or more moral values, as defined by Moral Foundations Theory (MFT, elaborated in Section 2) (Graham et al., 2013), or it can be non-moral. In previous work, computational linguistic measurements of latent attributes such as moral values, personality, and political orientation have primarily relied on textual features directly derived from target texts; these features have ranged from n-grams, word embeddings, and emoticons



to word categories (Rao et al., 2010; Tumasjan et al., 2010; Golbeck et al., 2011; Conover et al., 2011; Schwartz et al., 2013; Dehghani et al., 2014, 2016). While such approaches can yield powerful representations of text, they fall far short of human representation, which is greatly enhanced by the capacity to actively acquire background knowledge for reasoning and prediction. In the domain of moral value detection, the capacity for external knowledge integration is particularly important. For example, consider the tweet shown in Figure 2. A reader who has no knowledge of "Westboro Baptist" could look it up and learn that it is a church known for anti-LGBT and racist hate speech. This reader might then infer that this tweet conveys moral values concerning Purity/Degradation and Fairness/Cheating. Conversely, an algorithm that lacks access to background knowledge would be unable to exploit this information-rich indicator. Accordingly, we apply Entity Linking (EL) to identify entities in tweets, link them to an external knowledge base (KB; Wikipedia in this work), and acquire their abstract descriptions and properties. From the background knowledge, we extract words showing a strong correlation with each moral foundation as additional discriminative features to improve the prediction.

Figure 2: Example of Westboro Baptist Church.

Overall, this paper makes the following contributions:

1. We introduce various NLP techniques, such as entity linking and part-of-speech tagging, to tackle the problem of moral value prediction, which provides new insight into the inference of latent semantic attributes in social media.

2. In the area of computational social science, most previous work involving applications of NLP to psychological measurement has relied exclusively on features derived directly from input text. Due to the brevity and informality of tweets, however, textual features alone may not be sufficient for high-quality prediction. To address this issue, we acquire and incorporate background knowledge into our language models in order to better represent tweets, and we use moral value prediction as a case study for this approach. To the best of our knowledge, this is the first work to actively acquire background knowledge to enrich contextual information for more precise prediction in computational social science applications.

2 Moral Foundation Theory

What a given person holds to be moral or immoral can vary widely as a function of individual differences and of contextual and cultural factors. Moral Foundations Theory (Graham et al., 2013)2 aims to explain this variability as a function of five core moral factors, or foundations, that appear across cultures, as shown in Table 1. These foundations account for various aspects of morality that serve different but related social functions. Degrees of sensitivity toward them vary across cultures and can change over time.

Care/Harm: Prescriptive moral values such as caring for others, generosity, and compassion, and moral values prohibiting actions that harm others.

Fairness/Cheating: Prescriptive moral values such as fairness, justice, and reciprocity, and moral values prohibiting cheating.

Loyalty/Betrayal: Prescriptive moral values associated with group affiliation and solidarity, and moral values prohibiting betrayal of one's group.

Authority/Subversion: Prescriptive moral values associated with fulfilling social roles and submitting to authority, and moral values prohibiting rebellion against authority.

Purity/Degradation: Prescriptive moral values associated with the sacred and holy, and moral values prohibiting violating the sacred.

Table 1: Moral foundation definitions.

Given the importance of human morality for social functioning (Haidt, 2012), it is perhaps unsurprising that our moral values leave residue in cultural artifacts such as texts. Indeed, research indicates that variation in moral rhetoric can reliably distinguish between cultural groups (Graham et al., 2009), is responsive to environmental disturbances such as terrorism (Sagi and Dehghani, 2014), and predicts psychologically relevant behavior (Dehghani et al., 2016).

2 http://moralfoundations.org/


While classifying the ground-truth moral content of a text is ultimately subjective and imperfect, general sentiment associated with the foundations above has been shown to be a sufficient proxy for models making secondary predictions (Graham et al., 2009; Sagi and Dehghani, 2014; Garten et al., 2016; Dehghani et al., 2016).

In Table 2, we list real tweets on the topic of Hurricane Sandy, extracted from our data set, that reflect each of the five foundations.

Care/Harm: "Loss of material things hurts but loss of people and pets is devastating Sending prayers to all who were affected by Sandy"

Fairness/Cheating: "Complicit lap dog biased corrupt media is saying Obama has done good job w Sandy WHAT LIES Organization & Distribution get double F s"

Loyalty/Betrayal: "Love my fellow brothers and sisters in New Jeersey [sic] And fellow Americans standing strong as a nation Sandy please donate to local shelters"

Authority/Subversion: "I maintain a profound respect for govchristie newjersey sandy AT USER humanitarian"

Purity/Degradation: "God bless these men Truly touched by their dedication AT USER gentTN Tomb guards Incredible Sandy"

Table 2: Tweets reflecting each of the foundations.

3 Approach Overview

In this work, our goal is to predict the moral values expressed in social media text, based on Moral Foundations Theory, via a suite of Natural Language Processing techniques. For example, moral values on Care/Harm and Purity/Degradation are expected to be detected from the following tweet: "The Lord our Shepherd will keep & protect everyone on the East Coast Apply wisdom & be safe. Listen to the Spirits nudge. Love you. #Sandy". Thus, we define the Moral Value Prediction problem as follows:

DEFINITION: Given a set of documents X = {x1, ..., xn} regarding a certain topic and a set of moral foundations F = {f1, ..., fm}, for each x ∈ X, return a binary vector y = {y1, ..., ym}, where yj indicates whether x reflects concern on fj.

3.1 Framework

Figure 3 depicts the overall framework. In the Textual Feature Extraction module, textual features are extracted from the tweet and encoded into separate vectors.

Figure 3: Overall framework.

In the Background Knowledge Extraction module, we apply entity linking to each tweet and acquire abstract descriptions and properties of linked entities from the KB. Next, we select information that is relevant to the target moral foundation from this prior knowledge and represent it as a vector. Lastly, all vectors are concatenated as input to a binary classifier.

We train a separate classifier that returns yj for each foundation fj and merge the classification results from all classifiers as y. A tweet will be predicted as "Non-moral" if all classifiers return False.
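As a concrete illustration of this scheme, the sketch below wires per-foundation binary classifiers into the label vector y (a minimal sketch; the classifier interface and names are ours, not the authors' code):

```python
# Sketch of the per-foundation prediction scheme described above.
# `classifiers` maps each foundation to a trained binary model exposing a
# predict(features) -> bool interface (an illustrative assumption).
FOUNDATIONS = ["care_harm", "fairness_cheating", "loyalty_betrayal",
               "authority_subversion", "purity_degradation"]

def predict_moral_vector(features, classifiers):
    """Return the binary vector y plus the derived Non-moral flag."""
    y = {f: bool(classifiers[f].predict(features)) for f in FOUNDATIONS}
    y["non_moral"] = not any(y.values())  # Non-moral iff all classifiers return False
    return y
```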

3.2 Learning Model

In previous studies on predicting attributes such as gender, personality, power, and political orientation (Gomez et al., 2003; Burger et al., 2011; Schwartz et al., 2013; Park et al., 2015; Katerenchuk and Rosenberg, 2016), a document is usually modeled as a bag of words and represented by counting the frequency of each feature or by aggregating embeddings of words. A major drawback to this approach is that bag-of-words models disregard word order and relationships between words that may serve as important information for classification. Consider the following tweets that mention "governor":

∗ [AUTHORITY] Love our governor's honesty #njsandy
∗ [FAIRNESS] Only 14 months till marriage #Equality comes to NJ, when @CoryBooker is sworn in as next governor.

In the first tweet, the two positive words "love" and "honesty" around "governor" obviously reflect the user's attitude towards him. In the second one, however, "governor" is not closely intertwined with other words and is only modified by the neutral word "next".


Because bag-of-words features ignore such context, the classifier may mistakenly assign Authority/Subversion to tweet 2 if "governor" is selected as a feature.

To address this issue, we experimented with various supervised learning models and found that a Recurrent Neural Network-based classifier with long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) performed best. LSTM is a specific recurrent neural network variant designed to better model long-term dependencies. The LSTM cells take as input a sequence of embeddings of the words {w1, w2, ..., wl} in a tweet and output hidden states {h1, h2, ..., hl} in succession. We use the last output hl, which is expected to encode information of the entire tweet, and pass it through a fully connected layer. Extra features, including background knowledge, are represented as separate vectors and processed by fully connected layers as well. Finally, we concatenate the processed vectors and add a softmax layer on top for classification. To prevent overfitting, we apply dropout to the outputs of the embedding, LSTM, and fully connected layers, and L2 regularization to the weights of the softmax layer. We train a separate classifier for each foundation and merge results from all classifiers.
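The description above translates roughly into the following Keras-style sketch. Hidden sizes, dropout rates, and the extra-feature dimensionality are our assumptions; the paper does not report these hyperparameters.

```python
# Rough sketch of one per-foundation classifier (not the authors' implementation).
from tensorflow.keras import layers, regularizers, Model

MAX_LEN, VOCAB, EMB_DIM, EXTRA_DIM = 30, 20000, 300, 100

tokens = layers.Input(shape=(MAX_LEN,))        # word indices of the tweet
extra = layers.Input(shape=(EXTRA_DIM,))       # e.g., background-knowledge vector

x = layers.Embedding(VOCAB, EMB_DIM)(tokens)   # initialized from Word2Vec in practice
x = layers.Dropout(0.5)(x)
x = layers.LSTM(128)(x)                        # last hidden state h_l encodes the tweet
x = layers.Dropout(0.5)(x)
x = layers.Dense(64, activation="relu")(x)     # fully connected layer over h_l

e = layers.Dense(64, activation="relu")(extra) # fully connected layer over extra features
e = layers.Dropout(0.5)(e)

merged = layers.Concatenate()([x, e])
out = layers.Dense(2, activation="softmax",
                   kernel_regularizer=regularizers.l2(1e-4))(merged)

model = Model([tokens, extra], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```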

3.3 Textual Features

In this paper, we use the following textual features.

Word Embedding: A word embedding is a dense distributed representation that embeds words into a low-dimensional space to encode their semantic and syntactic information. We use 300-dimensional Word2Vec embeddings trained on Google News3. We also carried out experiments using unigrams and/or bigrams as features. As they did not outperform word embeddings, and the latter are a common input to neural networks, we chose embeddings as the basic feature.
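For instance, the pre-trained vectors can be loaded with gensim and used to map a tweet's tokens to a sequence of embeddings (a sketch; the file name is the standard Google News binary and is an assumption about the local setup):

```python
import numpy as np
from gensim.models import KeyedVectors

# Pre-trained 300-dimensional Google News Word2Vec vectors.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def tweet_to_embeddings(tokens, dim=300):
    """Map tweet tokens to Word2Vec vectors, using zeros for OOV words."""
    return np.array([w2v[t] if t in w2v else np.zeros(dim) for t in tokens])
```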

Moral Foundation Dictionary: Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001) is a program that counts the proportion of words in different psychologically meaningful categories. Researchers have reported success applying LIWC to a range of social psychology problems (Pennebaker, 2011; Schwartz et al., 2013). In this work, we use the Moral Foundation Dictionary (Graham et al., 2009), a LIWC dictionary that contains 324 foundation-supporting and foundation-violating words and word stems under 11 categories. It can be regarded as a kind of moral-oriented prior knowledge, though it is not as rich as the knowledge we propose to utilize.

3 https://code.google.com/archive/p/word2vec
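One simple way to use such a dictionary is to count, per category, the proportion of tweet tokens matching its entries, honoring LIWC-style stem wildcards (a sketch; the category names and word stems shown are placeholders, not the actual dictionary content):

```python
import re

# Tiny illustrative excerpt; the real dictionary has 324 entries in 11 categories.
MFD_EXCERPT = {
    "care.virtue": ["care*", "protect*", "compassion*"],
    "harm.vice": ["harm*", "hurt*", "kill*"],
}

def mfd_proportions(tokens, dictionary=MFD_EXCERPT):
    """Proportion of tweet tokens matching each dictionary category."""
    compiled = {cat: [re.compile(w.replace("*", r"\w*") + "$") for w in words]
                for cat, words in dictionary.items()}
    total = max(len(tokens), 1)
    return {cat: sum(1 for t in tokens if any(p.match(t.lower()) for p in pats)) / total
            for cat, pats in compiled.items()}
```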

4 Background Knowledge

Prior knowledge plays a critical role in how a human reader comprehends texts. As discussed above, background knowledge is also important in understanding expressions of moral concern. For example, to perceive the Fairness/Cheating- and Purity/Degradation-related moral concerns in the sentence "we would also like to ban KKK", we need to know that "KKK" refers to the Ku Klux Klan, a hate group opposing the Civil Rights Movement and same-sex marriage.

4.1 Background Knowledge Acquisition

To incorporate background knowledge, we apply entity linking to associate mentions with their referent entities. Next, we develop a set of criteria to automatically remove or correct erroneous linking results based on their types, linking confidence scores, or part-of-speech tags. From the KB, we extract structured and unstructured information about the remaining entities. We elaborate on each step with the example illustrated in Figure 4.

Entity linking. First, we identify and link mentions to entities in the KB using TAGME (Ferragina and Scaiella, 2012), a system developed to link mentions to pertinent Wikipedia pages. We choose this tool because most entity linkers are designed for formal texts such as news articles, while TAGME is intended for short texts and includes a special mode to handle hashtags, usernames, and URLs in tweets. TAGME provides an open API4, which returns a response including identified mentions, offsets, confidence scores, and Wikipedia titles. For tweet 1 in Figure 4, the linker identifies "Booker", "everything", and "him" and associates them with George William Booker, Everything (Michael Buble song), and HIM (Finnish band).
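A hedged sketch of such an API call is shown below; the endpoint, parameter names, and response fields follow the public TAGME REST service as we understand it and may have changed, and YOUR_TOKEN is a placeholder.

```python
import requests

TAGME_URL = "https://tagme.d4science.org/tagme/tag"   # assumed endpoint

def tagme_annotate(text, token="YOUR_TOKEN"):
    """Annotate a tweet with TAGME; returns a list of annotation dicts."""
    resp = requests.get(TAGME_URL, params={
        "gcube-token": token,   # personal access token (placeholder)
        "text": text,
        "lang": "en",
        "tweet": "true",        # special mode for hashtags, usernames, URLs
    })
    resp.raise_for_status()
    # Annotations include the spotted mention, its offset, a confidence
    # score (rho), and the linked Wikipedia title.
    return resp.json().get("annotations", [])
```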

Result refinement. TAGME annotates not only capitalized phrases, thereby covering more mentions in poorly composed tweets, but it does so at the cost of aggressively identifying some non-name words as mentions, such as "everything" and "him" in this example. Additionally, the lack of information that contextualizes a mention is an obstacle to entity disambiguation.

4 https://sobigdata.d4science.org/web/tagme/


Figure 4: Acquiring background knowledge.

For example, with the only clue, "politician", in tweet 1, it is difficult to determine whether "Booker" refers to George William Booker or Cory Booker, as both of them are politicians. To reduce these two types of errors, namely spurious annotations ("everything" and "him") and linking errors ("Booker" → George William Booker), we refine the results based on the following attributes (a filtering sketch follows the list):

1. Linking confidence score. For each annotation, TAGME returns a confidence score (ρ) that estimates the linking quality. We remove entities with a low score (< 0.1), such as George William Booker (ρ = 0.021) in the example.

2. Type of entity. Under most circumstances, concepts incorrectly linked to non-name words are literary or musical work entities, such as songs and books, which are more likely to be titled with common words. We collect all entity types in DBpedia5 and manually discard 113 types.

3. Part-of-speech. In general, a single verb, adjective, adverb, pronoun, determiner, or preposition is unlikely to be a name. Thus, "him", acting as a pronoun in tweet 1, should not be marked as a name. We utilize a tweet-oriented part-of-speech tagger (Owoputi et al., 2013) to annotate the part-of-speech of each word. If no word in a mention matches any nominal tag, we remove the associated entity from the results.
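Combined, the three rules amount to a filter along the following lines (a sketch; the annotation fields, the excerpt of the type blacklist, and the nominal tag set of the Owoputi et al. tagger are our assumptions):

```python
RHO_THRESHOLD = 0.1
# Excerpt of the manually curated DBpedia type blacklist (113 types in the paper).
TYPE_BLACKLIST = {"Song", "Album", "Book", "Film"}
# Nominal tags: common noun, proper noun, and their possessive variants (assumed).
NOMINAL_TAGS = {"N", "^", "S", "Z"}

def keep_annotation(annotation, entity_types, mention_pos_tags):
    """Apply the three refinement rules to one linked mention."""
    if annotation["rho"] < RHO_THRESHOLD:          # 1. low linking confidence
        return False
    if set(entity_types) & TYPE_BLACKLIST:         # 2. blacklisted entity type
        return False
    if not set(mention_pos_tags) & NOMINAL_TAGS:   # 3. no nominal word in the mention
        return False
    return True
```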

Cross-document propagation. In the previous step, we presented rules to reduce spurious annotations and linking errors. For the latter case, however, our goal is to leverage prior knowledge rather than merely eliminating incorrect entities. Therefore, if the linker returns an annotation with a low score, we reject it and try to retrieve the referent entity from annotations of the same mention in other documents.

5 http://wiki.dbpedia.org/

We make the following assumption: within a topic, when people mention the same name, they usually refer to the same entity. For example, in tweets regarding architecture, it is very likely that all mentions of "Zaha" refer to Zaha Hadid, an architect, instead of Wilfried Zaha, a footballer.

Analogously, as it is difficult to determine the referent entity of "Booker" in tweet 1, we check the annotations of other "Booker"s in the entire corpus, find the most confident one ("Cory Booker" in tweet 2 → Cory Booker, ρ = 0.536), and use it as the entity of "Booker" in tweet 1.
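A sketch of this propagation step (the annotation fields "spot", "title", and "rho" are assumed to follow the linker's output format):

```python
from collections import defaultdict

def propagate_entities(annotations, rho_threshold=0.1):
    """Replace low-confidence links with the most confident entity assigned
    to the same surface mention anywhere in the topic corpus."""
    best = defaultdict(lambda: (None, 0.0))        # mention -> (entity, best rho)
    for ann in annotations:
        mention = ann["spot"].lower()
        if ann["rho"] > best[mention][1]:
            best[mention] = (ann["title"], ann["rho"])
    for ann in annotations:
        if ann["rho"] < rho_threshold:
            entity, rho = best[ann["spot"].lower()]
            if entity is not None and rho >= rho_threshold:
                ann["title"] = entity              # borrow the corpus-level best link
    return annotations
```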

Knowledge extraction. Unlike human beings, machines still lack the ability to process and comprehend complicated information (e.g., a man carrying an American flag upside down in an image on the Westboro Baptist Church Wikipedia page, which can be viewed as a political statement or an act of desecration and disrespect) or to disregard information that contributes little to moral value prediction (e.g., the population of New York State). For this reason, we derive from the KB only the following two types of constructive knowledge, which can be processed and utilized by existing techniques and are applicable to most entities.

1. Entity abstract: a summary of an entity, which usually contains useful facts such as its definition, office, party, and purpose.

2. Entity property: structured metadata and facts about an entity. We obtain entity properties from DBpedia and keep the following: purpose, office, background, meaning, orderInOffice, seniority, title, and role.
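One way to retrieve both pieces of knowledge is DBpedia's per-resource JSON interface (a sketch under the assumption that the standard dbpedia.org/data endpoint and predicate naming are used; error handling is omitted):

```python
import requests

KEPT_PROPERTIES = {"purpose", "office", "background", "meaning",
                   "orderInOffice", "seniority", "title", "role"}

def dbpedia_knowledge(wiki_title):
    """Fetch the English abstract and the retained properties of an entity."""
    resource = wiki_title.replace(" ", "_")
    data = requests.get(f"http://dbpedia.org/data/{resource}.json").json()
    triples = data.get(f"http://dbpedia.org/resource/{resource}", {})
    abstract, properties = "", {}
    for predicate, values in triples.items():
        name = predicate.rsplit("/", 1)[-1]        # last segment of the predicate URI
        if name == "abstract":
            abstract = next((v["value"] for v in values if v.get("lang") == "en"), "")
        elif name in KEPT_PROPERTIES:
            properties[name] = [v["value"] for v in values]
    return abstract, properties
```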

4.2 Background Knowledge Incorporation

Many facts in the retrieved background knowledge, however, are irrelevant to the prediction of moral values (e.g., the term of office and education information of Cory Booker).


In addition, unlike tweets, which are limited to 140 characters, an abstract can either be very concise or contain up to hundreds of words. In order to extract discriminative information from the background knowledge, we design a pointwise mutual information (PMI)-based approach as follows.

We first merge the abstract and property values of an entity into a document. For example, the merged document of Cory Booker is "Cory Anthony Booker (born April 27, 1969) is an American politician ... Mayor of Newark, New Jersey. Democratic Party (United States)". After that, we calculate the document-based PMI with corpus-level significance (cPMId) (Damani, 2013) between each word in the document and the target foundation. We rank all words with respect to their cPMId scores and choose the top k (k = 100 in our experiments). Thus, we extract words strongly related to each moral foundation as features.
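A simplified version of this selection step is sketched below using plain document-level PMI; the cPMId of Damani (2013) used in the paper adds a corpus-level significance correction to the same document counts.

```python
import math
from collections import Counter

def select_knowledge_words(documents, labels, k=100):
    """Rank words from merged abstract+property documents by document-level
    PMI with the target foundation (labels are 0/1) and keep the top k."""
    n_docs, n_pos = len(documents), sum(labels)
    word_docs, word_pos_docs = Counter(), Counter()
    for doc, y in zip(documents, labels):
        for w in set(doc.lower().split()):
            word_docs[w] += 1
            if y:
                word_pos_docs[w] += 1

    def pmi(w):
        p_joint = word_pos_docs[w] / n_docs
        return math.log(p_joint / ((word_docs[w] / n_docs) * (n_pos / n_docs)))

    return sorted(word_pos_docs, key=pmi, reverse=True)[:k]
```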

Lastly, we encode the background knowledge as a vector consisting of k elements, each of which represents the term frequency of a selected word. However, consider the case where "hurt" is a feature while "injury" is the word used in a document: we would ignore "injury" although it is a synonym of "hurt". To encode the background knowledge in a "softer" and more generalizable way, we calculate the cosine similarity sim(u, w) between the embeddings of feature u and each word w. If sim(u, w) exceeds a chosen threshold (0.6 in this work), we regard w as an occurrence of u.
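The soft term-frequency encoding can then be sketched as follows (w2v is any word-to-vector map, e.g., the Word2Vec model loaded earlier):

```python
import numpy as np

def soft_tf_vector(document_tokens, feature_words, w2v, threshold=0.6):
    """Term-frequency vector over the k selected feature words, where a token
    also counts as an occurrence of a feature when the cosine similarity
    between their embeddings exceeds the threshold."""
    vec = np.zeros(len(feature_words))
    for token in document_tokens:
        for i, feat in enumerate(feature_words):
            if token == feat:
                vec[i] += 1
            elif token in w2v and feat in w2v:
                u, w = w2v[feat], w2v[token]
                sim = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
                if sim > threshold:
                    vec[i] += 1
    return vec
```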

5 Experiments

5.1 Data Set

For this work, we use a corpus of 4,191 tweets randomly sampled from a larger corpus of 7 million tweets containing hashtags relevant to Hurricane Sandy, a hurricane that caused major damage to the Eastern seaboard of the United States in 2012. All tweets included in these analyses were processed to strip user mentions, URLs, and punctuation.

To establish ground truth for our analyses, three trained annotators coded the 4,191 sampled tweets6. Coder training consisted of multiple rounds of annotation and discussion. After completing training, annotators coded for the presence or absence of each moral foundation dimension. Additionally, tweets that contained no moral rhetoric were coded as "Non-moral". Gold-standard classes for each tweet were then generated by taking the majority vote for each class across all three coders. Each tweet can be annotated with more than one moral concern at the same time.

6 The data set and the annotation guidelines, designed based on Moral Foundations Theory, will be published in a separate paper.

Foundation             Positive   Negative   Pos:Neg
Care/Harm              1,802      2,389      1:1.33 (0.75)
Fairness/Cheating      667        3,524      1:5.28 (0.19)
Loyalty/Betrayal       574        3,617      1:6.30 (0.16)
Authority/Subversion   935        3,246      1:3.47 (0.29)
Purity/Degradation     159        4,032      1:25.4 (0.04)
Non-moral              713        3,478      1:4.88 (0.21)

Table 3: Data set statistics. Note that "positive" and "negative" do not refer to the virtue and vice of a foundation. Rather, they indicate whether moral concern on a foundation (e.g., Fairness/Cheating) is reflected in a tweet or not.

Class frequency analyses of the coded corpus revealed considerable negative bias, such that the absence of each class occurred with greater frequency than its presence (see Table 3). However, this is unsurprising, as there is no reason to expect half or even close to half of the texts in this corpus to evoke a given moral domain. Nonetheless, extreme imbalance like this can inhibit classifier performance by inducing classification bias and failing to sufficiently represent the population of the infrequent class. To account for this in our experiments, we up-sample positive classes to prevent bias toward the majority class.
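One simple way to up-sample is to randomly duplicate positive examples until the two classes are balanced; the paper does not specify the exact scheme, so the sketch below is an assumption.

```python
from sklearn.utils import resample

def upsample_positive(X, y, random_state=0):
    """Randomly duplicate positive examples until the classes are balanced."""
    pos = [(x, 1) for x, label in zip(X, y) if label == 1]
    neg = [(x, 0) for x, label in zip(X, y) if label == 0]
    pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=random_state)
    data = pos_up + neg
    return [x for x, _ in data], [label for _, label in data]
```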

To evaluate the annotation quality of this corpus, we measure inter-annotator agreement (IAA) using prevalence-adjusted bias-adjusted kappa (PABAK) (Sim and Wright, 2005), which is suitable for imbalanced data. Based on the widely referenced standards for kappa proposed by Landis and Koch (1977), the IAA scores of this data set range from moderate (0.41-0.60) to almost perfect (0.81-1.00).
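For a pair of coders on a binary label, PABAK reduces to 2 * Po - 1, where Po is the observed agreement (Sim and Wright, 2005); a minimal sketch:

```python
def pabak(labels_a, labels_b):
    """Prevalence- and bias-adjusted kappa for two coders on binary labels:
    PABAK = 2 * observed_agreement - 1."""
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return 2 * observed - 1
```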

5.2 Overall Results

We evaluate our model with three feature sets: word embedding alone (E), the combination of word embedding and background knowledge (E+BK), and the combination of all features (E+BK+MFD). Model performance is evaluated using F-scores generated from 10-fold cross-validation, as shown in Table 4.

Our experimental results provide evidence that integrating background knowledge into the representation of tweets improves the detection of moral values.


Foundation             E      E+BK   E+BK+MFD
Care/Harm              81.2   82.3   81.9
Fairness/Cheating      66.1   70.7   70.8
Loyalty/Betrayal       47.2   50.3   50.3
Authority/Subversion   68.3   69.3   69.9
Purity/Degradation     34.7   37.4   37.0
Non-moral              61.7   64.2   63.5

Table 4: Overall results (%, F-score). E, BK, and MFD represent embedding, background knowledge, and the Moral Foundation Dictionary, respectively.

The following example demonstrates this process for a tweet that contains Authority/Subversion rhetoric:

∗ [AUTHORITY] Holy shit Chris Christie is asking for federal funds Sounds like a self hating republican to me hurricanesandy

After linking "Chris Christie" and "republican" to Chris Christie and Republican Party (United States), we know that the former is the 55th Governor of New Jersey and the latter is a major political party in the United States. As our automatic approach selects politics-related words, including "governor" and "party", as features, such background knowledge effectively confirms the moral sentiment on Authority/Subversion in this tweet.

In another example:

∗ [PURITY] Hurricane Sandy is an opportunity for believers to embody the perfect peace Isaiah 26 3 talks about as we trust in HIM hurricanesandy

Although we successfully link "Isaiah" and use the prior knowledge to correct the prediction, the linker fails to associate "HIM" with God, which illustrates the limitations of existing techniques. Humans are able to make a quick inference about the referent of "HIM" from its distinct uppercase form, because pronouns referring to God are often capitalized or uppercased. In contrast, it is difficult for machines to distinguish different "HIM"s (e.g., a common yet uppercased pronoun, a pronoun referring to God, the Finnish rock band, etc.), especially in poorly composed texts such as tweets.

It should also be noted that there seems to be a relationship between the prevalence of the positive class for a given dimension and performance on that dimension. For example, we observe that all models perform well on Care/Harm, for which the data is relatively balanced (see Tables 3 and 4), while they produce particularly low scores on the most imbalanced foundation, Purity/Degradation.

In addition, we observe that adding the Moral Foundation Dictionary does not further improve performance once background knowledge is included. Our framework automates the extraction of the latter, hence saving much manual effort.

5.3 Comparison with the Human Annotator

While we have demonstrated the viability of our approach for classifying moral rhetoric, to truly evaluate the performance of these models it is necessary to compare them to human coder performance. To do this, we had a minimally trained fourth coder annotate a sample of 300 tweets and used both this coder's annotations and the predictions from our model to predict moral concerns on these tweets. This enables us to compare the performance of the model to the performance of an independent human annotator.

Foundation             4th Coder   Our Model
Care/Harm              76.0        76.3
Fairness/Cheating      76.6        72.3
Loyalty/Betrayal       62.2        69.5
Authority/Subversion   68.5        67.8
Purity/Degradation     61.8        54.8
Non-moral              77.9        69.2

Table 5: A comparison of performance between the human annotator and our model (%, F-score).

On most categories, our model performs comparably to the human annotator (see Table 5), though, notably, the model again obtains a low score on Purity/Degradation. We also observe a large gap in the prediction of Non-moral, which may indicate that humans have a stronger ability to recognize tweets without moral content.

We also observe that although our model achieves comparable performance to the human annotator, the latter is superior at understanding deeper information in text to make inferences. For example, consider the following tweet:

∗ [LOYALTY] There needs to be a proper balance between individual responsibility and collective obligation Superstorm Sandy has shown us that

Although "individual responsibility" and "collective obligation" are not typical words for Loyalty/Betrayal, a human reader is able to understand that the author's concern on this foundation is reflected in discussing the balance between "individual responsibility" and "collective obligation". The model, however, is unable to capture their relationship and make the correct prediction.


5.4 Remaining Challenges

Despite the effectiveness of our proposed model, we encountered some unsolved problems over the course of this study. We summarize the main remaining challenges as follows.

Tweets are often too short to provide contextual cues sufficient for entity disambiguation. For example, for the tweet "Willard is a Frickin Lying Hypocrite", it is hard for the entity linking system to determine which entity "Willard" refers to. Additionally, tweets are often poorly composed and need to be normalized. People extensively use elements such as hashtags, abbreviations, slang, and emoticons in tweets, which affects the performance of the entity linker and the classifiers.

Further, knowledge from KBs is relatively static and limited. Consider the following tweet: "Sandy could be God s answer to Obama letting his countrymen die in Benghazi and then lying about it". The entity linker can easily link "Benghazi" to the city of Benghazi. However, the knowledge that actually matters here is the 2012 attack against United States government facilities in Benghazi, rather than other facts in the KB, such as the population of the city. To address this issue, we need to exploit more comprehensive knowledge of other types or from other sources, such as news and tweets.

Additionally, in this work, the entity types to remove and the property types to keep in the background knowledge extraction step are manually selected due to the limited data size. Manual selection may introduce individual biases and weaken the generalization ability of the model on other corpora and domains. With enough occurrences of entity and property types, a number of automatic feature selection methods become applicable, such as mutual information, chi-square, and information gain.

6 Related Work

Recently, NLP techniques have been successfully applied to computational social science. Combined with social network analysis, textual content analysis has shown promise in applications such as prediction of moral values (Sagi and Dehghani, 2014; Dehghani et al., 2016), power (Gomez et al., 2003; Prabhakaran and Rambow, 2013; Katerenchuk and Rosenberg, 2016), expertise (Horne et al., 2016), leadership roles (Tyshchuk et al., 2013), personality (Golbeck et al., 2011; Schwartz et al., 2013), gender (Burger et al., 2011; Rao et al., 2010), hate speech (Waseem and Hovy, 2016; Nobata et al., 2016), and social interaction (Althoff et al., 2014; Tan et al., 2016). This work has extensively studied textual features (e.g., n-grams and LIWC) and structural features (e.g., Twitter relationships) on a variety of online platforms.

Nevertheless, to the best of our knowledge, our work is the first attempt to incorporate background knowledge through entity linking to enhance implicit content analysis in the area of computational social science. It should be noted that although there is a study on incorporating background knowledge into movie review classification (Boghrati et al., 2015), their "background knowledge" refers to articles describing the target movies, which act much like the Moral Foundation Dictionary and are entirely different from the background knowledge we actively extract from the knowledge base.

7 Conclusions and Future Work

Moral value prediction is a critical task for predicting psychological variables and events. Using it as a case study, we demonstrate the importance of acquiring background knowledge for extracting implicit information through our new framework. Our framework can also be adapted for other implicit sentiment prediction tasks that are convertible to a multi-label classification problem, such as detecting personality types through text analysis (Goldberg, 1990).

In the future, we will exploit more up-to-date background knowledge from wider sources such as news articles. We will also detect specific moral value holders and the target issues associated with each moral concern (e.g., women's rights is the issue of the moral concern on Fairness/Cheating in "oppression of women must be tackled"). Moreover, we are interested in uniting moral value prediction with a variety of applications, such as detection of implicit community membership and leadership roles in social networks, and event prediction.

We believe computational social science research can establish a bridge between NLP techniques and social science theories. We apply computational methods to analyze social phenomena supported by social theories, while more complex and accurate models can help theory adjudication in social science as well.


References

Tim Althoff, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2014. How to ask for a favor: A case study on the success of altruistic requests. In ICWSM.

Reihane Boghrati, Justin Garten, Aleksandra Litvinova, and Morteza Dehghani. 2015. Incorporating background knowledge into text classification. In CogSci.

John D. Burger, John C. Henderson, George Kim, and Guido Zarrella. 2011. Discriminating gender on Twitter. In EMNLP.

Michael Conover, Jacob Ratkiewicz, Matthew R. Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In ICWSM.

Om P Damani. 2013. Improving pointwise mutual information (PMI) by incorporating significant co-occurrence. In CoNLL.

Morteza Dehghani, Scott Atran, Rumen Iliev, Sonya Sachdeva, Douglas Medin, and Jeremy Ginges. 2010. Sacred values and conflict over Iran's nuclear program. Judgment and Decision Making 5(7):540.

Morteza Dehghani, Kate Johnson, Joe Hoover, Eyal Sagi, Justin Garten, Niki Jitendra Parmar, Stephen Vaisey, Rumen Iliev, and Jesse Graham. 2016. Purity homophily in social networks. Journal of Experimental Psychology: General 145(3).

Morteza Dehghani, Kenji Sagae, Sonya Sachdeva, and Jonathan Gratch. 2014. Analyzing political rhetoric in conservative and liberal weblogs related to the construction of the "Ground Zero Mosque". Journal of Information Technology & Politics 11(1):1–14.

Paolo Ferragina and Ugo Scaiella. 2012. Fast and accurate annotation of short texts with Wikipedia pages. IEEE Software 29(1):70–75.

Justin Garten, Reihane Boghrati, Joe Hoover, Kate M Johnson, and Morteza Dehghani. 2016. Morality between the lines: Detecting moral sentiment in text. In IJCAI.

Jeremy Ginges, Scott Atran, Douglas Medin, and Khalil Shikaki. 2007. Sacred bounds on rational resolution of violent political conflict. Proceedings of the National Academy of Sciences 104(18):7357–7360.

Jennifer Golbeck, Cristina Robles, Michon Edmondson, and Karen Turner. 2011. Predicting personality from Twitter. In SocialCom/PASSAT.

Lewis R Goldberg. 1990. An alternative "description of personality": The big-five factor structure. Journal of Personality and Social Psychology 59(6):1216.

Daniel Gomez, Enrique Gonzalez-Aranguena, Conrado Manuel, Guillermo Owen, Monica del Pozo, and Juan Tejada. 2003. Centrality and power in social networks: A game theoretic approach. Mathematical Social Sciences 46(1):27–54.

Jesse Graham, Jonathan Haidt, Sena Koleva, Matt Motyl, Ravi Iyer, Sean P Wojcik, and Peter H Ditto. 2013. Moral foundations theory: The pragmatic validity of moral pluralism. Advances in Experimental Social Psychology 47:55–130.

Jesse Graham, Jonathan Haidt, and Brian A Nosek. 2009. Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology 96(5):1029.

Jonathan Haidt. 2012. The Righteous Mind: Why Good People Are Divided by Politics and Religion. Vintage.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Benjamin D. Horne, Dorit Nevo, Jesse Freitas, Heng Ji, and Sibel Adali. 2016. Expertise in social networks: How do experts differ from other users? In ICWSM.

Denys Katerenchuk and Andrew Rosenberg. 2016. Hierarchy prediction in online communities. In AAAI.

Spassena P Koleva, Jesse Graham, Ravi Iyer, Peter H Ditto, and Jonathan Haidt. 2012. Tracing the threads: How five moral concerns (especially purity) help explain culture war attitudes. Journal of Research in Personality 46(2):184–194.

J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In WWW.

Olutobi Owoputi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A Smith. 2013. Improved part-of-speech tagging for online conversational text with word clusters. In NAACL.

Gregory Park, H Andrew Schwartz, Johannes C Eichstaedt, Margaret L Kern, Michal Kosinski, David J Stillwell, Lyle H Ungar, and Martin EP Seligman. 2015. Automatic personality assessment through social media language. Journal of Personality and Social Psychology 108(6):934.

James W Pennebaker. 2011. The Secret Life of Pronouns: What Our Words Say About Us. Bloomsbury Press.

James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: LIWC 2001.


Vinodkumar Prabhakaran and Owen Rambow. 2013. Written dialog and social power: Manifestations of different types of power in dialog behavior. In IJCNLP.

Delip Rao, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta. 2010. Classifying latent user attributes in Twitter. In SMUC.

Eyal Sagi and Morteza Dehghani. 2014. Measuring moral rhetoric in text. Social Science Computer Review 32(2):132–144.

H Andrew Schwartz, Johannes C Eichstaedt, Margaret L Kern, Lukasz Dziurzynski, Stephanie M Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin EP Seligman, et al. 2013. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLOS ONE 8(9):e73791.

Julius Sim and Chris C. Wright. 2005. The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy 85(3):257–268.

Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In WWW.

Andranik Tumasjan, Timm Oliver Sprenger, Philipp G Sandner, and Isabell M Welpe. 2010. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM 10(1):178–185.

Yulia Tyshchuk, Hao Li, Heng Ji, and William A Wallace. 2013. Evolution of communities on Twitter and the role of their leaders during emergencies. In ASONAM.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In NAACL Student Research Workshop.