Suspicious News Detection Using Micro Blog Text
Tsubasa Tagami (1), Hiroki Ouchi (2,1), Hiroki Asano (1,2), Kazuaki Hanawa (1), Kaori Uchiyama (1), Kaito Suzuki (1), Kentaro Inui (1,2), Atsushi Komiya (3), Atsuo Fujimura (3), Hitofumi Yanai (4), Ryo Yamashita (5), Akinori Machino (6)
(1) Graduate School of Information Sciences, Tohoku University, Japan
(2) RIKEN
(3) SmartNews, Inc.
(4) FactCheck Initiative Japan
(5) Watchdog for Accuracy in News-reporting, Japan
(6) Hi-Ether Japan
PACLIC2018
Outline
■ Proposed a new task: suspicious news detection using micro blog text
■ This task aims to detect suspicious news articles that need to be fact-checked
■ Developed a human-machine hybrid fact-checking system
■ Applied it to a real-world situation, the Okinawa governor election, and detected 21 pieces of Fake News
[Figure: posts that mention news articles (e.g., "http://www.news1~I suspect it is fake news. Read WSJ...", "http://www.news2~This is completely misinformation ...") are used to predict suspicious articles, which are passed to a fact-checker]
The Post-Truth Era
■ “Fake News” is considered to be a significant problem
□ Researchers reported that fake news on social media influenced US election voters [Bovet+, 2018]
□ Fake News led a young man to murder nine people at a historic African-American church in Charleston
□ A drama featuring Fake News was produced in Japan by the national broadcasting company
https://www.nhk.or.jp/dodra/fakenews/
https://www.eurweb.com/2018/01/trump-reveals-winners-controversial-fake-news-awards/
https://theundefeated.com/features/how-fake-news-led-to-dylann-roof-to-murder-nine-people/
What is Fake News
■ Definition
□ News articles that are intentionally false and could mislead readers [Shu et al., 2017]
■ Problematic Issue
□ The spread of Fake News has a negative impact on our society and the news industry
[Figure: Fake News can negatively affect an election and cause a conflict]
Difficulty of Fact-Checking
■ Fact-checking is a time-consuming task; it sometimes takes a whole day to research and write an article
■ Fact-checkers cannot keep up with the amount of misinformation generated every day
■ Human fact-checking is an intellectually demanding and laborious process
Narrowing down the number of articles that require human fact-checking is necessary
Difficulty of Narrowing Down Articles
■ Simply filtering with specific keywords such as ‘misinformation’ and ‘fake’ cannot efficiently find the articles that need fact-checking
□ Some posts just state a personal impression of the article
□ In some posts, the target of the mention is not the content of the news
http://www.news1~I really can not believe it. I wish it were a misinformation. I’m lost for words, but I’ll send my prayers!
http://www.news1~Does anybody feel she is trying to talk around false teeth because of all those implants in her cheeks and chin?
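The weakness of keyword filtering can be seen in a minimal sketch. The keyword list and the `keyword_filter` helper are hypothetical; the slides name only 'misinformation' and 'fake', and 'false' is added here as an assumption so that the second example post is covered.

```python
# Hypothetical keyword list: the slides name only 'misinformation' and 'fake';
# 'false' is an assumption added for this illustration.
KEYWORDS = ("misinformation", "fake", "false")


def keyword_filter(post: str) -> bool:
    """Flag a post if it contains any keyword (case-insensitive substring match)."""
    text = post.lower()
    return any(keyword in text for keyword in KEYWORDS)


posts = [
    # Personal impression: the writer wishes the news were untrue.
    "I really can not believe it. I wish it were a misinformation. "
    "I'm lost for words, but I'll send my prayers!",
    # 'false' refers to teeth, not to the content of the news.
    "Does anybody feel she is trying to talk around false teeth "
    "because of all those implants in her cheeks and chin?",
]

flags = [keyword_filter(p) for p in posts]
print(flags)
```

Both posts are flagged even though neither casts suspicion on the article, so every keyword hit would still need manual review.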
Our Goal
■ Automating suspicious news detection using posts on SNS that cast suspicion on news articles
1. Collect posts on SNS (collect → database)
2. Predict suspicious or not using posts (predict → suspicious)
Definitions of Terms
■ Suspicion casting posts (SCP)
□ Posts on SNS that refer to and cast suspicion on certain news articles
■ Suspicious articles (SA)
□ News articles to be verified by human fact-checkers
□ We define SA as news articles mentioned by at least one SCP
[Figure: a citizen writes a suspicion casting post (SCP), "http://www.news.~I suspect it is fake news. Read WSJ article ‒ says ..."; the article it mentions is a suspicious article (SA), which a fact-checker then fact-checks]
Proposed Task
■ Proposed and formalized two tasks
■ (1) Suspicion Casting Post Detection
□ Input: a post on SNS that refers to a news article
□ Output: a judgement of whether it is an SCP or just mentions a personal impression of the article
■ (2) Suspicious Article Detection
□ Given a set of posts that refer to the same article, judge whether the set includes an SCP or not
Examples:
http://www.news1.~This article denotes misinformation, doesn’t it? → Suspicion casting post (SCP)
http://www.news2.~I really can not believe it. I wish it were a lie. I‘ll send my prayers! → Not suspicion casting post
Dataset
■ Created two datasets for our tasks
■ (1) Suspicion Casting Post Dataset
1. Collected posts on SNS that include the URL of an article and specific keywords, such as misinformation and fake
2. Removed noise such as article titles, URLs, mentions and hashtags from the posts
3. Annotated each collected post with 1 if the post casts suspicion and 0 otherwise
Example:
Before preprocessing: http:www.news.~ #pleaserepost This article is completely misinformation because …
After preprocessing: This article is completely misinformation because …
Label: Suspicion casting post (SCP)
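The noise-removal step can be sketched with regular expressions. This is a minimal illustration and not the authors' actual preprocessing: the patterns, the `preprocess` helper, and the example URL and mention are all assumptions.

```python
import re

# Assumed patterns; the slides do not specify how noise was detected.
URL_RE = re.compile(r"https?:\S+")   # URLs, including malformed ones such as "http:www..."
MENTION_RE = re.compile(r"@\w+")     # @user mentions
HASHTAG_RE = re.compile(r"#\w+")     # #hashtags


def preprocess(post: str, article_title: str = "") -> str:
    """Remove the article title, URLs, mentions and hashtags, then tidy whitespace."""
    if article_title:
        post = post.replace(article_title, " ")
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        post = pattern.sub(" ", post)
    return " ".join(post.split())


# Hypothetical raw post modelled on the example from the slides.
raw = ("http://www.news.example/a1 @someone #pleaserepost "
       "This article is completely misinformation because ...")
clean = preprocess(raw)
print(clean)
```

URLs are removed first so that a `#fragment` inside a URL is not mistaken for a hashtag.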
Dataset
■ Created two datasets for our tasks
■ (2) Suspicious Article Dataset
1. Collected sets of posts that refer to the same news article and preprocessed these posts in the same way
2. Annotated each set with 1 if the posts that refer to the same article include at least one SCP and 0 otherwise
[Figure: a set of posts on one article ("This is completely false …", "This fiscal policy is wrong …") is annotated; because the set contains a suspicion casting post (SCP), the article is labeled a suspicious article]
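The article-level annotation rule (an article is positive if at least one of its posts is an SCP) can be written as a short aggregation. The `label_articles` helper and the example URLs are hypothetical; only the "at least one SCP" rule comes from the slides.

```python
from collections import defaultdict


def label_articles(posts):
    """posts: iterable of (article_url, scp_label) pairs with scp_label in {0, 1}.

    Returns {article_url: 1 if any of its posts is an SCP, else 0}.
    """
    labels = defaultdict(int)
    for url, scp_label in posts:
        labels[url] = max(labels[url], scp_label)
    return dict(labels)


# Hypothetical post-level annotations grouped by the article they mention.
annotated_posts = [
    ("http://news.example/a", 1),
    ("http://news.example/a", 0),
    ("http://news.example/b", 0),
    ("http://news.example/b", 0),
]
article_labels = label_articles(annotated_posts)
print(article_labels)
```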
Dataset
■ Statistics of datasets
■ (1) Suspicion Casting Post Dataset
□ Number of samples: 7,775 posts (pos: 1,036 / neg: 6,739)
□ Average length of posts: 56.6 characters
■ (2) Suspicious Article Dataset
□ Number of samples: 1,836 articles (pos: 564 / neg: 1,272)
□ Average length of posts: 60.4 characters
□ Average number of posts per article: 2.75
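As a quick check of the class balance implied by these counts, the positive rates can be computed directly. The total number of posts in the article dataset is only an estimate derived from the rounded average of 2.75 posts per article.

```python
counts = {
    "Suspicion Casting Post Dataset": {"pos": 1036, "neg": 6739},
    "Suspicious Article Dataset": {"pos": 564, "neg": 1272},
}

summary = {}
for name, c in counts.items():
    total = c["pos"] + c["neg"]
    positive_rate = c["pos"] / total
    summary[name] = (total, positive_rate)
    print(f"{name}: {total} samples, {positive_rate:.1%} positive")

# Rough estimate only: the reported average (2.75) is itself rounded.
approx_sa_posts = round(1836 * 2.75)
print(f"Approximate posts in the article dataset: {approx_sa_posts}")
```

The positive classes make up about 13.3% of posts and 30.7% of articles, so both datasets are imbalanced, which is consistent with the use of stratified cross-validation.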
Experimental Setup
■ Models
□ Logistic Regression (LR)
□ SVM
□ Decision Tree (DT)
□ Random Forest (RF)
□ LSTM
■ Settings
□ Word embeddings: 300 dimensions (learned from 4.5M tweets)
□ Vocabulary size: 80K
■ Evaluation
□ Precision, Recall, Micro-F1, Recall@K (SA detection only)
□ Stratified 5-fold cross-validation
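Stratified k-fold cross-validation keeps the positive/negative ratio roughly equal in every fold. The following dependency-free sketch illustrates the idea on toy labels; it is not the authors' code, and in practice a library routine such as scikit-learn's `StratifiedKFold` would be the usual choice. The `stratified_folds` helper is hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold has a similar class ratio.

    Returns a list of k lists of indices.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class out to the folds in round-robin order.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy labels: 10 positive and 40 negative samples.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    n_pos = sum(labels[i] for i in test_idx)
    print(f"fold {fold_id}: train={len(train_idx)} test={len(test_idx)} positives={n_pos}")
```

With these toy labels every fold holds 10 samples, 2 of them positive. When class counts are not divisible by k, this simple dealing scheme gives the extra samples to the first folds.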
Results
■ Results for SCP detection
Overall, the LR, SVM and LSTM models yielded higher Micro-F1 scores than the DT and RF models
■ Results for SA detection
Similarly, the LR, SVM and LSTM models achieved higher scores than the other two models
Error Analysis
■ Analyzed posts that all models judged incorrectly
□ It is difficult for the basic models to properly capture sentence-level meanings, since the models mainly used word-level features
• Answer: SCP, Prediction: not SCP
http://www.news1~The description that a part ... is not wrong, but since the level of ~ , this title can mislead readers.
• Answer: not SCP, Prediction: SCP
http://www.news1~At last, the news source has got clear... I wished it had been misinformation
Results
■ Recall@K curve of SA detection task
Most of the models achieved 80% recall within the top 40% of ranked articles
We can collect 80% of the suspicious articles by checking only the top 40% of ranked articles
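Recall@K can be computed by ranking articles by model score and counting how many suspicious articles fall in the top portion. This is a minimal sketch that assumes K is given as a fraction of the ranked list, as in the "top 40%" statement; the `recall_at_k` helper, the scores and the labels are toy values, not results from the experiments.

```python
def recall_at_k(scores, labels, fraction):
    """Recall over the top `fraction` of articles, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = round(len(ranked) * fraction)
    found = sum(label for _, label in ranked[:k])
    total = sum(labels)
    return found / total if total else 0.0


# Toy data: 10 articles, 5 of them suspicious (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

for fraction in (0.2, 0.4, 1.0):
    print(f"Recall@{fraction:.0%}: {recall_at_k(scores, labels, fraction):.2f}")
```

On this toy ranking, checking the top 40% of articles recovers 3 of the 5 suspicious ones (recall 0.60); plotting the value for every fraction gives the Recall@K curve.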
Application
■ Created an application, named Fact-checking Console, to support manual fact-checking
[Screenshots: the Fact-checking Console, showing suspicious articles together with their suspicion casting posts]
Fact-check Project
■ Our project used the Fact-checking Console during the Okinawa governor election held in Sep. 2018
(2018.9.1~10.3) http://fij.info/project/okinawa2018
6 media outlets and 26 volunteers participated in this project as fact-checkers
Fact-check Project Outline
1. Collect posts on SNS (collect → database)
2. Predict suspicious or not using posts (predict → suspicious)
3. Check suspicious news articles (the fact-checker checks suspicious articles in the Fact-checking Console)
Example of detected Fake News
■ Some media reported that a famous female singer, Namie Amuro, supported a candidate, Denny Tamaki
[Figure: Denny Tamaki and Namie Amuro, marked FAKE]
Suspicion casting post (SCP): "A misinformation as if Namie Amuro is supporting Denny Tamaki is spreading."
Conclusion
■ Summary
□ Formalized and tackled a task: suspicious news detection using microblog text
□ Applied our system to fact-checking activities in a real-world situation and succeeded in detecting fake news
■ Future work
□ Create more sophisticated models for suspicious news detection
□ Evaluate the difference between fact-checking with and without our application
□ Consider using information from the news articles themselves for prediction