
Detecting Fake News with Weak Social Supervision

Kai Shu∗, Ahmed Hassan Awadallah†, Susan Dumais†, and Huan Liu∗
∗Arizona State University, {kai.shu, huan.liu}@asu.edu
†Microsoft Research, {hassanam, sdumais}@microsoft.com

Abstract—Limited labeled data is becoming one of the largest bottlenecks for supervised learning systems. This is especially the case for many real-world tasks where large-scale labeled examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be effective in mitigating the scarcity of labeled data by leveraging weak labels or injecting constraints from heuristic rules and/or extrinsic knowledge sources. Social media has little labeled data but possesses unique characteristics that make it suitable for generating weak supervision, resulting in a new type of weak supervision: weak social supervision. In this article, we illustrate how various aspects of social media can be used as weak social supervision. Specifically, we use recent research on fake news detection as the use case, where social engagements are abundant but annotated examples are scarce, to show that weak social supervision is effective when facing the labeled data scarcity problem. This article opens the door to learning with weak social supervision for similar emerging tasks when labeled data is limited.

Index Terms: social media, weak supervision, social networking

SOCIAL MEDIA has become an important means of large-scale information sharing and communication in all occupations, including marketing, journalism, public relations, and more. Due to the increased usage and convenience of social media, more people seek out and receive timely news information online. For example, the Pew Research Center reported that approximately 68% of US adults got news from social media in 2018, whereas only 49% reported seeing news on social media in 2012. However, social media also proliferates a plethora of misinformation and disinformation, including fake news, i.e., news stories with intentionally false information [1]. For example, during the 2016 U.S. election, the most frequently discussed false stories generated 8,711,000 shares, reactions, and comments on Facebook, more than the 7,367,000 generated by the most-discussed true stories. Detecting fake news on social media is critical to prevent people from consuming false information and to cultivate a healthy and trustworthy news ecosystem.

However, detecting fake news on social media poses several unique challenges [1]. First, the data challenge has been a major roadblock for researchers in their attempts to develop effective defenses against disinformation and fake news. The content of fake news and disinformation is highly diverse in terms of topics, styles, and media platforms, and fake news attempts to distort the truth with diverse linguistic styles while mimicking true news. Thus, obtaining labeled fake news data does not scale, and data-specific embedding methods are insufficient for fake news detection with little labeled data. Second, the evolving nature of disinformation and fake news makes it non-trivial to exploit rich auxiliary information signals directly. Fake news is usually related to newly emerging, time-critical events, which may not have been properly verified by existing knowledge bases (KBs) due to the lack of corroborating evidence or claims. Moreover, detecting fake news at an early stage requires prediction models to utilize minimal information from user engagements, because extensive user engagements indicate that more users have already been affected by the fake news.

Recently, learning with weak supervision has been of great interest to the research community as a way to mitigate the data scarcity problem for various tasks. Social media data has unique properties that make it suitable for deriving weak supervision. First, social media data is big. We have limited data for each individual, but the social property of social media data links individuals' data together, providing a new type of big data. Second, social media data is linked. The availability of social relations means that social media data is inherently linked, i.e., not independent and identically distributed. Third, social media data is noisy. Users on social media can be both passive content consumers and active content producers, causing the quality of user-generated content to vary, and social networks are also noisy due to malicious users such as spammers and bots. Therefore, social media data provides a new type of weak supervision, weak social supervision, which has great potential to advance a wide range of applications including fake news detection.

In this article, we propose a new type of weak supervision derived from multi-faceted social media data, i.e., weak social supervision, and illustrate how to effectively derive and exploit this weak supervision for learning with little labeled data. We discuss three major perspectives of social media data from which to derive weak social supervision for fake news detection: users, posts, and networks. Further, we introduce recent work on exploiting weak social supervision for effective and explainable fake news detection. First, we illustrate how we can model the relationships among publishers, news pieces, and social media users with user-based and network-based weak social supervision to detect fake news effectively. Second, we show how to leverage post-based weak social supervision for discovering explainable comments while detecting fake news. Finally, we discuss several open issues and provide future directions for learning with weak social supervision.

1. Learning with Weak Supervision

Learning with weak supervision is an important and newly emerging research area, and there are different ways of defining and approaching the problem. One definition of weak supervision is leveraging higher-level and/or noisier input from subject matter experts (SMEs). The supervision from SMEs is represented in the form of weak label distributions, which mainly come from the following sources: 1) inexact supervision: higher-level and coarse-grained supervision; 2) inaccurate supervision: low-quality and noisy supervision; and 3) existing resources: using existing resources to provide supervision. Another definition categorizes weak supervision into inexact supervision, inaccurate supervision, and incomplete supervision [2]. Incomplete supervision means that only a subset of the training data is given labels, which essentially includes active learning and semi-supervised learning techniques. Weak supervision can be formed in deterministic (e.g., in the form of weak labels) and non-deterministic (e.g., in the form of constraints) ways.

Incorporating Weak Labels. Learning with noisy (inaccurate) labels has been of great interest to the research community for various tasks. Some existing works attempt to rectify the weak labels by incorporating a loss correction mechanism. Patrini et al. [3] use loss correction to estimate a label corruption matrix without making use of clean labels. Other works consider the scenario where a small set of clean labels is available. For example, Veit et al. use human-verified labels to train a label cleaning network in a multi-label classification setting [4]. In some cases, weak supervision is obtained with inexact labels such as coarse-grained labels. For example, object detectors can be trained with images collected from the web using their associated tags as weak supervision instead of locally-annotated data sets.
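To make the loss-correction idea concrete, the sketch below shows forward correction in the spirit of Patrini et al. [3], assuming the corruption matrix T has already been estimated; PyTorch is used for illustration, and the tensor shapes and the helper name are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, weak_labels, T):
    """Forward loss correction in the spirit of Patrini et al. [3].

    logits:      (batch, num_classes) raw model outputs
    weak_labels: (batch,) noisy class indices
    T:           (num_classes, num_classes) estimated corruption matrix,
                 T[i, j] = P(observed label j | true label i)

    The clean-class probabilities are pushed through T so the loss is
    computed against the noisy label distribution the model actually sees.
    """
    clean_probs = F.softmax(logits, dim=1)   # p(true class | x)
    noisy_probs = clean_probs @ T            # p(weak label | x)
    return F.nll_loss(torch.log(noisy_probs + 1e-12), weak_labels)
```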


Injecting Constraints. Directly learning with weak labels may suffer from label noise. Instead, representing weak supervision as constraints avoids noisy labels and encodes domain knowledge into the learning process of the prediction function. The constraints can be injected over the output space and/or the input representation space. For example, Stewart et al. [5] model prior physics knowledge on the outputs to penalize "structures" that are not consistent with the prior knowledge. For relation extraction tasks, label-free distant supervision can be achieved by encoding entity representations under the translation law from knowledge bases (KBs). This type of weak supervision, i.e., injecting constraints, is often based on prior knowledge from domain experts and is jointly optimized with the primary learning objective of the prediction task.
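A minimal sketch of such a joint objective, assuming a user-supplied penalty function h that scores violations of prior knowledge; the form of h, the weight lam, and the data handles are all illustrative rather than taken from [5]:

```python
import torch
import torch.nn.functional as F

def constrained_loss(model, x_labeled, y_labeled, x_unlabeled, h, lam=0.1):
    """Supervised loss on the few labeled examples plus a constraint
    penalty on unlabeled data, in the spirit of Stewart et al. [5].
    h(x, y_hat) should return a per-example violation score that is
    zero when the prediction is consistent with prior knowledge."""
    supervised = F.cross_entropy(model(x_labeled), y_labeled)
    y_hat = model(x_unlabeled)
    penalty = h(x_unlabeled, y_hat).mean()   # knowledge-violation penalty
    return supervised + lam * penalty
```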

2. Learning with Weak Social Supervision

In the previous section, we introduced the definitions and techniques for learning with weak supervision. In this section, we formally define the problem of learning with weak social supervision and introduce how to derive weak social supervision and exploit it for fake news detection.

2.1. From Weak Supervision to Weak Social Supervision

With the rise of social media, the web has become a vibrant and lively realm where billions of individuals all around the globe interact, share, post, and conduct numerous daily activities. Social media enables us to be connected and interact with anyone, anywhere, at any time, allowing us to observe human behaviors at an unprecedented scale with a new lens. However, significantly different from traditional data, social media data is big, incomplete, noisy, and unstructured, with abundant social relations. This new type of data contains rich social interactions that can provide additional signals for obtaining weak supervision. Next, we formally define the problem of learning with weak social supervision.

A training example consists of two parts: a feature vector (or instance) describing the event/object, and a label indicating the ground truth. Let $D = \{x_i, y_i\}_{i=1}^{n}$ denote a set of $n$ examples, with $X = \{x_i\}_{i=1}^{n}$ denoting the instances and $Y = \{y_i\}_{i=1}^{n}$ the corresponding labels. In addition, there is a large set of unlabeled examples. Usually the size $n$ of the labeled set is much smaller than that of the unlabeled set due to labeling costs or privacy concerns.

For the widely available unlabeled samples, we generate weak social supervision by generating weak labels or incorporating constraints based on social media data. For weak labels, we aim to learn a labeling function $g: \tilde{X} \to \tilde{Y}$, where $\tilde{X} = \{\tilde{x}_j\}_{j=1}^{N}$ denotes the set of $N$ unlabeled messages to which the labeling function is applied and $\tilde{Y} = \{\tilde{y}_j\}_{j=1}^{N}$ the resulting set of weak labels. This weakly labeled data is then denoted by $\tilde{D} = \{\tilde{x}_j, \tilde{y}_j\}_{j=1}^{N}$, and often $n \ll N$. For formulating constraints, we aim to model prior knowledge from social signals on the representation learning of examples with a constraint function $h: X \times Y \to \mathbb{R}$ that penalizes structures inconsistent with our prior knowledge. Note that $g$ can also be applied to $X$ to regularize the representation learning. Regardless of the form the weak social supervision takes, we are ultimately aiming to estimate a label distribution $p(y|x)$ from weak social supervision. We give the following problem formulation of learning with weak social supervision.

Learning with Weak Social Supervision: Given a small set of data with ground-truth labels $D$ and a label distribution $p(y|x)$ derived from weak social supervision, learn a prediction function $f: X \to Y$ that generalizes well to unseen samples.
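As a minimal sketch of this formulation, the training step below mixes the small clean set $D$ with weakly labeled data $\tilde{D}$ produced by a labeling function $g$; the discount weight alpha is an assumption, typically tuned on held-out data, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_step(model, optimizer, x_l, y_l, x_w, y_w, alpha=0.5):
    """One optimization step of f on clean examples (x_l, y_l) from D and
    weak examples (x_w, y_w) from D-tilde, where y_w = g(x_w).
    alpha < 1 down-weights the noisier weak social supervision."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_l), y_l) \
         + alpha * F.cross_entropy(model(x_w), y_w)
    loss.backward()
    optimizer.step()
    return loss.item()
```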

2.2. Deriving Weak Social Supervision

Next, we illustrate how to derive weak social supervision for fake news detection. Generally, there are three major aspects of the social media context: users, generated posts, and networks.

User-based: Fake news pieces are likely to be created and spread by non-human accounts, such as social bots or cyborgs [6]. Thus, capturing users' profiles and characteristics as weak social supervision can provide useful information for fake news detection. User behaviors can indicate the characteristics of the users who interact with the news on social media [7]. These signals can be categorized at different levels: individual-level and group-level [1]. Individual-level signals are extracted to infer the credibility and reliability of each user from various aspects of user demographics, such as registration age, number of followers/followees, number of tweets the user has authored, etc. Group-level signals capture the overall characteristics of groups of users related to the news. The injected constraint of weak supervision is that the spreaders of fake news and real news may form different communities with unique characteristics that can be depicted by group-level signals.
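For illustration, a sketch of individual-level signals assembled into a feature vector; the attribute names are hypothetical, since each platform exposes different profile fields.

```python
def individual_level_features(user):
    """Individual-level weak signals from a user profile, used to infer
    credibility and reliability. Field names are illustrative."""
    return [
        user.registration_age_days,   # account age
        user.num_followers,
        user.num_followees,
        user.num_tweets_authored,     # volume of authored posts
        int(user.is_verified),
    ]
```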

Post-based: Users involved in the news dissemination process express their opinions and emotions via posts and comments, which provide helpful signals related to the veracity of news claims [8]. Recent research looks into stance, emotion, and credibility to improve the performance of fake news detection [1]. First, stances (or viewpoints) indicate users' opinions toward the news, such as supporting or opposing it. Typically, fake news provokes tremendously controversial views among social media users, and denying and questioning stances are found to play a crucial role in signaling claims as fake [1]. Second, fake news publishers often aim to spread disinformation extensively and draw wide public attention. Longstanding social science studies demonstrate that news evoking high-arousal, or activating, emotions (awe, anger, or anxiety) is more viral on social media [9]. Third, post credibility aims to infer the veracity of news pieces from the credibility of the related posts on social media. The injected constraint of weak supervision is that the credibility of the news is highly related to the credibility of its relevant social media posts.

Network-based: Users form different networks on social media in terms of interests, topics, and relations. Fake news dissemination processes tend to form echo chamber cycles, highlighting the value of extracting network-based weak social supervision to represent these network patterns for fake news detection [1]. Different types of networks can be constructed, such as friendship networks, diffusion networks, and interaction networks. First, the friendship network plays an important role in fake news diffusion. The fact that users are likely to form echo chambers [10] strengthens the need to model user social representations and to explore their added value for studying fake news. Second, the news diffusion process involves abundant temporal user engagements on social media. Fake news may exhibit a sudden increase in the number of posts that then levels off after a short time, whereas for real news the number of posts increases more steadily [1]. In addition, an important problem along the temporal diffusion dimension is the early detection of fake news with a limited amount of user engagements. Third, interaction networks describe the relationships among different entities such as publishers, news pieces, and users. For example, user-news interactions are often modeled by relating user representations to news veracity values. Intuitively, users with low credibility are more likely to spread fake news, while users with high credibility scores are less likely to do so [11].

2.3. Exploiting Weak Social Supervision

Earlier, we illustrated the different aspects from which we can derive weak social supervision. It is worth mentioning that the extracted weak social supervision can involve single or multiple aspects of the information related to users, content, and networks. In this section, we discuss learning with weak social supervision for fake news detection in two settings: effective fake news detection and explainable fake news detection. First, we illustrate how we can model user-based and network-based weak social supervision to detect fake news effectively. Second, we show how to leverage post-based weak social supervision for discovering explainable comments while detecting fake news.

Effective Fake News Detection. We aim to leverage weak social supervision as auxiliary information to perform fake news detection effectively. As an example, we demonstrate how we can utilize interaction networks, modeling the entities and their relationships, to detect fake news (see Figure 1). Interaction networks describe the relationships among different entities such as publishers, news pieces, and users. Given the interaction networks, the goal is to embed the different types of entities into the same latent space by modeling the interactions among them. We can leverage the resulting feature representations of news to perform fake news detection; we term this framework Tri-relationship for Fake News detection (TriFN) [11].

Figure 1. An illustration of the relationships among publishers, news pieces, and users, which can be modeled as weak social supervision to detect fake news [11].

Social science research provides observations that motivate rules of weak social supervision: people tend to form relationships with like-minded friends rather than with users who have opposing preferences and interests [10]. Thus, connected users are more likely to share similar latent interests in news pieces. For the publishing relationship, we exploit the following weak social supervision: publishers with a high degree of political bias are more likely to publish fake news. For the spreading relationship, we have: users with low credibility are more likely to spread fake news, while users with high credibility scores are less likely to do so. We utilize nonnegative matrix factorization (NMF) to learn the news representations by encoding the weak social supervision. Specifically, the label distribution $p(y|x)$ is estimated by injecting constraints into a heterogeneous network embedding framework for learning the news representations: (1) for the publishing relationship, we enforce that the news representation should be good at predicting the partisan bias of its publisher; (2) for the spreading relationship, we constrain the news representation and the user representation to be close to each other if the news is fake and the user is less credible, and vice versa.
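Schematically, the resulting objective couples an NMF term on news content with the two constraints above; this is an illustrative form with generic weights $\lambda_1, \lambda_2$ and placeholder functions, not the exact objective of [11]:

$$
\min_{\Theta \ge 0} \; \underbrace{\lVert X - D V^{\top} \rVert_F^2}_{\text{news content factorization}} \;+\; \lambda_1\, \underbrace{\ell_{\text{bias}}\big(B,\, q(D)\big)}_{\text{publisher partisan bias}} \;+\; \lambda_2 \underbrace{\sum_{(i,j)} w_{ij}\, \lVert d_i - u_j \rVert_2^2}_{\text{user--news credibility closeness}} \;+\; \mathcal{L}_{\text{sup}}
$$

Here $X$ is the news-word matrix, $D$ and $U$ hold news and user representations, $q(D)$ predicts the publisher bias labels $B$ from the representations of published news, $w_{ij}$ is large when news $i$ is fake and user $j$ is less credible (and vice versa), and $\mathcal{L}_{\text{sup}}$ is the supervised detection loss on the few labeled examples.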

Empirical Results. To illustrate whether the weak social supervision in TriFN helps detect fake news effectively, we show empirical comparison results on the public benchmark Politifact dataset from FakeNewsNet (github.com/KaiDMML/FakeNewsNet) in Figure 2. The dataset consists of 120 true news and 120 fake news pieces, with 91 publishers and 23,865 users. We compare TriFN with baselines that 1) only extract features from news contents, such as RST [12] and LIWC [13]; 2) only construct features from social supervision, such as Castillo [14]; and 3) consider both news content and social supervision, such as RST+Castillo and LIWC+Castillo. In particular, (1) RST [12] stands for Rhetorical Structure Theory, which builds a tree structure to represent rhetorical relations among the words in the text; (2) LIWC [13] stands for Linguistic Inquiry and Word Count, which is widely used to extract lexicons falling into psycholinguistic categories; it is based on large sets of words representing psycholinguistic processes, summary categories, and part-of-speech categories; (3) Castillo [14] extracts various features from the users who have shared a news item on social media, drawn from user profiles and the friendship network; we also include the credibility score of users as an additional social context feature.

We can see that TriFN achieves around 0.75 accuracy even with a limited amount of weak social supervision (within 12 hours after the news is published), and reaches accuracy as high as 0.87. In addition, with the help of weak social supervision from publisher bias and user credibility, the detection performance is better than without it. Moreover, within a certain range, more weak social supervision leads to a larger performance increase, which shows the benefit of using weak social supervision.

Figure 2. The performance of fake news detection with different amounts of weak social supervision (x-axis: hours since publication; y-axis: accuracy). No weak social supervision is incorporated in RST and LIWC, while Castillo encodes only weak social supervision. TriFN, which utilizes both labeled data and weak social supervision, achieves the best performance.

Explainable Fake News Detection. In recent years, computational detection of fake news has been producing promising early results. However, a critical piece of the study is missing: the explainability of such detection, i.e., why a particular piece of news is detected as fake. Here, we introduce how we can derive explanation factors from weak social supervision.

We observe that not all sentences in news contents are fake; in fact, many sentences are true and serve only to support the false claim sentences. Thus, news sentences may not be equally important in determining and explaining whether a piece of news is fake. Similarly, user comments may contain relevant information about the important aspects that explain why a piece of news is fake, but they may also be less informative and noisy. For example, in Figure 3, users discuss different aspects of the news in comments, such as "St. Nicholas was white? Really?? Lol," which directly responds to the claim in the news content "The Holy Book always said Santa Claus was white."

Figure 3. A piece of fake news with related user comments on social media. Some explainable comments correspond directly to sentences in the news contents.

Therefore, we use the following weak social supervision: user comments that are related to the content of the original news pieces are helpful for detecting fake news and explaining the prediction results. The label distribution $p(y|x)$ is likewise estimated by injecting constraints such that semantically related news sentences and user comments are attended to when predicting and explaining fake news. Thus, we aim to select news sentences and user comments that can explain why a piece of news is fake. Because they provide a good explanation, they should also be helpful in detecting fake news. This suggests designing attention mechanisms that give high weights to the representations of news sentences and comments that are beneficial to fake news detection. Specifically, we first use a bidirectional LSTM with attention to learn sentence and comment representations, and then utilize a sentence-comment co-attention neural network framework called dEFEND to exploit both news contents and user comments to jointly capture explainable factors [8].
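A schematic of the sentence-comment co-attention at the core of this design; the dimensions, the affinity matrix, and the max-pooling choices below are illustrative assumptions rather than the exact dEFEND architecture [8].

```python
import torch
import torch.nn as nn

class SentenceCommentCoAttention(nn.Module):
    """Couples sentence and comment representations through an affinity
    matrix; the resulting attention weights (a_s, a_c) are the scores
    used to rank explainable sentences and comments."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(dim, dim))

    def forward(self, S, C):
        # S: (batch, n_sentences, dim), C: (batch, n_comments, dim)
        A = torch.tanh(C @ self.W @ S.transpose(1, 2))    # (batch, n_comments, n_sentences)
        a_s = torch.softmax(A.max(dim=1).values, dim=-1)  # attention over sentences
        a_c = torch.softmax(A.max(dim=2).values, dim=-1)  # attention over comments
        s_vec = (a_s.unsqueeze(-1) * S).sum(dim=1)        # attended sentence summary
        c_vec = (a_c.unsqueeze(-1) * C).sum(dim=1)        # attended comment summary
        return torch.cat([s_vec, c_vec], dim=-1), a_s, a_c
```

The concatenated summary vector feeds a fake/real classifier, while a_s and a_c serve double duty as rankings of explainable sentences and comments.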

Figure 4. Assessing the effects of news contents and weak social supervision based on user comments (metrics: accuracy and F1).

Empirical Results. We show empirical results on the Politifact data from FakeNewsNet in Figure 4. This dataset consists of 145 true news and 270 fake news pieces, with 89,999 comments from 68,523 users. The labels are manually assigned by journalist experts from fact-checking websites such as PolitiFact.com, and the social interactions, i.e., the users and their comments, are collected from Twitter. We can see that dEFEND achieves very high performance in terms of accuracy (∼0.90) and F1 (∼0.92). We compare dEFEND with three variants: 1) dEFEND\C, which does not consider information from user comments; 2) dEFEND\N, which does not consider information from news contents; and 3) dEFEND\Co, which eliminates the sentence-comment co-attention. We observe that eliminating the news content component, the user comment component, or the co-attention between them reduces performance. This indicates that capturing the semantic relations between the weak social supervision from user comments and the news contents is important. The evaluation of explainability covers both news sentences and user comments. First, Mean Average Precision (MAP) is adopted as the metric to evaluate how explainable the news sentences are; the results indicate that dEFEND achieves better MAP scores than baselines such as HAN [15]. Second, we use Amazon Mechanical Turk to perform human evaluation on ranking the explainable comments, with Normalized Discounted Cumulative Gain (NDCG) as the metric; we observe that dEFEND captures explainable comments better than baselines in terms of NDCG. Moreover, we illustrate a case study of using weak social supervision as an explanation in Figure 5. dEFEND ranks explainable comments higher than non-explainable ones. For example, the comment "...president does not have the power to give citizenship..." is ranked at the top, and it explains exactly why the sentence "granted U.S. citizenship to 2500 Iranians including family members of government officials" in the news content is fake. In addition, explainable comments receive higher attention weights than interfering and unrelated comments, which helps select more related comments for detecting fake news. For example, the unrelated comment "Walkaway from their..." has an attention weight of 0.0080, less than the weight 0.0086 of the explainable comment "Isn't graft and payoffs normally a offense."

Figure 5. The case study of leveraging weak social supervision for explanation: a fake news piece claiming the Obama administration granted U.S. citizenship to 2,500 Iranians as part of the Iran Deal, shown alongside 148 user comments ranked by attention weight.

3. Open Issues and Future Research

In this section, we present some open issues in weak social supervision and future research directions.

3.1. Weak Social Supervision for Fake News Detection

Most current methods exploit weak social supervision as constraints to help fake news detection. We can also generate weak labels from the aforementioned social signals (user-based, post-based, and network-based) via labeling functions for early fake news detection. Some representative labeling rules for extracting weak labels are as follows [16], illustrated in the sketch after this paragraph: 1) Credibility-based: users with low credibility are more likely to spread fake news, while users with high credibility are less likely to do so; 2) Bias-based: publishers with more partisan bias are more likely to publish fake news than mainstream publishers with less bias; 3) Sentiment-based: news with more conflicting viewpoints in the related media posts is more likely to be fake than news with less conflicting viewpoints. Empirical results show that exploiting these multiple sources of weak social supervision can significantly improve detection performance with limited labeled fake news data. The advantage of leveraging weak social supervision for early fake news detection is that we can jointly learn feature representations from little labeled data and weakly labeled data, and when predicting on unseen news pieces, we can perform prediction with few or no social signals, which satisfies the requirement of early detection. In addition, in the extreme case when no labeled data is available, we can utilize weak social supervision for unsupervised fake news detection. One idea is to extract users' opinions on the news by exploiting the auxiliary information of the users' engagements from posts on social media, and to aggregate their opinions in a well-designed unsupervised way to generate the estimation results.
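A sketch of the three rules as labeling functions over hypothetical engagement records; the attribute names and thresholds are assumptions, and each function abstains (returns None) when its rule does not fire.

```python
FAKE = 1

def credibility_lf(news):
    """Credibility-based: mostly low-credibility spreaders => weakly fake."""
    scores = [u.credibility for u in news.engaged_users]
    if not scores:
        return None                                       # abstain
    return FAKE if sum(scores) / len(scores) < 0.3 else None

def bias_lf(news):
    """Bias-based: a highly partisan publisher => weakly fake."""
    return FAKE if abs(news.publisher.partisan_bias) > 0.8 else None

def sentiment_lf(news):
    """Sentiment-based: strongly conflicting viewpoints in posts => weakly fake."""
    return FAKE if news.viewpoint_conflict_score > 0.7 else None

def weak_label(news, lfs=(credibility_lf, bias_lf, sentiment_lf)):
    """Naive aggregation: label fake if any rule fires. A data programming
    system would instead learn to weight and denoise the rules jointly."""
    votes = [lf(news) for lf in lfs]
    return FAKE if any(v == FAKE for v in votes) else None
```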

3.2. Techniques for Learning Weak Social Supervision

We expect that, along the direction of learning with weak social supervision, more research will emerge in the near future. First, leveraging weak social supervision for computational social science research is promising. Since computational social science research usually relies on relatively limited offline survey data, weak social supervision can serve as a powerful online resource for understanding and studying social computing problems. Second, existing approaches utilize a single source or combine multiple sources of weak social supervision, but to what extent, and in which aspects, the weak social supervision helps is important to explore. Third, the capacity of ground-truth labels and weak social supervision, and the relative importance among the sources, are essential considerations for developing learning methodologies in practical scenarios. Moreover, the weak supervision rules may carry complementary information, since they capture social signals from different perspectives. An interesting future direction is to explore multiple sources of weak social supervision in a principled way to model their mutual benefits through data programming.

4. Conclusion

In many machine learning applications, labeled data is scarce and obtaining more labels is expensive. Motivated by the promising early results of learning with weak supervision, we propose a new type of weak supervision, i.e., weak social supervision, focusing on the use case of detecting fake news on social media. We demonstrate that weak social supervision provides a new way to represent the social information uniquely available where a detection warning is sought, with promising results and great potential for detecting fake news, including the challenging settings of effective fake news detection and explainable fake news detection. We also discuss promising future directions in fake news detection research and in expanding the field of learning with weak social supervision to other applications.

REFERENCES

1. K. Shu and H. Liu, "Detecting fake news on social media," Synthesis Lectures on Data Mining and Knowledge Discovery, vol. 11, no. 3, pp. 1–129, 2019.

2. Z.-H. Zhou, "A brief introduction to weakly supervised learning," National Science Review, vol. 5, no. 1, pp. 44–53, 2017.

3. G. Patrini, A. Rozza, A. Krishna Menon, R. Nock, and L. Qu, "Making deep neural networks robust to label noise: A loss correction approach," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1944–1952.

4. A. Veit, N. Alldrin, G. Chechik, I. Krasin, A. Gupta, and S. Belongie, "Learning from noisy large-scale datasets with minimal supervision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 839–847.

5. R. Stewart and S. Ermon, "Label-free supervision of neural networks with physics and domain knowledge," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.

6. S. Kumar, R. West, and J. Leskovec, "Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes," in Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 591–602.

7. V. Subrahmanian and S. Kumar, "Predicting human behavior: The next frontiers," Science, vol. 355, no. 6324, pp. 489–489, 2017.

8. K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, "dEFEND: Explainable fake news detection," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 395–405.

9. J. Berger and K. L. Milkman, "What makes online content viral?" Journal of Marketing Research, vol. 49, no. 2, pp. 192–205, 2012.

10. W. Quattrociocchi, A. Scala, and C. R. Sunstein, "Echo chambers on Facebook," Available at SSRN 2795110, 2016.

11. K. Shu, S. Wang, and H. Liu, "Beyond news contents: The role of social context for fake news detection," in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019, pp. 312–320.

12. V. L. Rubin, N. J. Conroy, and Y. Chen, "Towards news verification: Deception detection methods for news discourse," in Hawaii International Conference on System Sciences, 2015, pp. 5–8.

13. J. W. Pennebaker, R. L. Boyd, K. Jordan, and K. Blackburn, "The development and psychometric properties of LIWC2015," Tech. Rep., 2015.

14. C. Castillo, M. Mendoza, and B. Poblete, "Information credibility on Twitter," in Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 675–684.

15. Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, "Hierarchical attention networks for document classification," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480–1489.

16. K. Shu, S. Wang, D. Lee, and H. Liu, "Mining disinformation and fake news: Concepts, methods, and recent advancements," arXiv preprint arXiv:2001.00623, 2020.

Kai Shu is a PhD student and research assistant at the Data Mining and Machine Learning (DMML) Lab at Arizona State University. His research interests include artificial intelligence, social computing, and data mining. Contact him at kai.shu@asu.edu.

Ahmed Hassan Awadallah is a principal research manager at Microsoft Research (MSR). He leads the Language and Information Technology team at MSR, focusing on creating language understanding and user modeling technologies to enable intelligent experiences in multiple products. Contact him at hassanam@microsoft.com.

Susan Dumais is a Technical Fellow and Managing Director of Microsoft Research New England, New York City, and Montreal. Her research interests are in algorithms and interfaces for improved information retrieval, as well as general issues in human-computer interaction. Contact her at sdumais@microsoft.com.

Huan Liu is a professor of Computer Science and Engineering at Arizona State University. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating problems that arise in real-world applications with high-dimensional data of disparate forms. Contact him at huan.liu@asu.edu.
