![Page 1: Constraint-Aware Dynamic Truth Discovery in Big Data ...dyzhang.net/proj-images/truth.pdf · TPDS 16 - EM Hadoop Bigdata 17 - CADTD Batch Extended Dynamic Distributed Our Dynamic](https://reader034.vdocuments.net/reader034/viewer/2022050118/5f4ea0b9bfa96a615f332655/html5/thumbnails/1.jpg)
Constraint-Aware Dynamic Truth Discovery in Big Data Social Sensing
IEEE Bigdata 17, Boston, MA, USA
Daniel Zhang, Dong Wang, Yang Zhang
Department of Computer Science and Engineering
University of Notre Dame
What is Social Sensing? (1/2)
• A new sensing paradigm of collecting observations about the physical environment from humans (social sensors) or devices on their behalf.
Example applications:
• Social Media Sensing for Disaster Report
• Water/Air Quality Sensing
• Traffic Monitoring
• Personalized Recommendation
What is Social Sensing? (2/2)
Advantages Compared to Physical Sensors:
• Infrastructure-free
• Economical
• Versatile
• Mobile
Truth Discovery Problem in Social Sensing
Sources:
• People
• Smart Devices
Measurements (Claims):
• Numeric data
• Images
• Text
What to believe? Who to believe?
Truth Discovery Problem in Social Media Sensing - A Twitter Example
• Social sensors are subjective
• Social sensors’ reliabilities are unknown a priori
Related Work
Model categories: Classic Batch Model, Extended Batch Model, Dynamic Model, Distributed Model
Prior work (venue - method):
• IPSN 12 - Basic EM
• TKDE 08 - Truth Finder
• ACL 10 - Invest
• BigData 16 - RTD
• VLDB 14 - CATD
• IPSN 14 - EM Source
• KDD 15 - DynaTD
• ICDCS 13 - EM Recursive
• ICDCS 17 - SSTD
• TPDS 16 - EM Hadoop
This work: BigData 17 - CADTD (Constraint-Aware Dynamic Truth Discovery)
We Address Three Challenges

| Challenge | Batch | Extended | Dynamic | Distributed | Ours |
|---|---|---|---|---|---|
| Dynamic Truth | × | × | √ | √ | √ |
| Incomplete & Noisy Data | × | √ | × | × | √ |
| Physical Constraints | × | × | × | × | √ |
Challenges (1/3)
Dynamic Truth Challenge:
When the truth changes dynamically, how can we track it effectively?
Example - Suspect’s Escape Path
Example - Impact Area of Hurricane
• When will the truth change? How?
• How to handle rumors and conflicting information?
Challenges (2/3)
Noisy and Incomplete Data Challenge:
Social media data is incomplete and noisy by nature. How can we get enough information to estimate the truth accurately?
Incomplete: 86% of users post only one tweet, and more than 91% post at most two tweets, during a terrorist attack event.
This leaves inadequate evidence for estimating users’ reliabilities.
Noisy: rumors, misinformation, spam …
Challenges (3/3)
Physical Constraints Challenge:
How to incorporate prior knowledge and physical constraints into the truth discovery framework?
Common sense and prior knowledge can help improve dynamic truth discovery performance.
Example 1 - the suspect cannot travel 80 miles within 10 minutes during a terrorist attack.
Example 2 - the number of casualties can only be non-decreasing.
[Figure: # of casualties vs. time, with a segment annotated "Wrong!"]
Proposed Solution - Summary
• Dynamic Truth Discovery: designed a Hidden Markov Model (HMM) based algorithm
• Incomplete and Noisy Data: data fusion of traditional news and online social media
• Physical Constraints: extended the Viterbi algorithm to consider the difficulty of state transitions
Proposed Solution - Modules
[Figure: system modules; data sources shown are Twitter and News Media]
HMM for Dynamic Truth
Idea: use crowd intelligence to infer the hidden truth.
Issue: how to measure each individual's contribution to a claim?
Contribution Score (CS) = Attitude Score * (1 - Uncertainty Score) * Independence Score
• Disagree, Assertive, Independent: CS = -1 * 1 * 1
• Agree, Assertive, Dependent: CS = 1 * 1 * 0.1
• Disagree, Uncertain, Independent: CS = -1 * 0.5 * 1
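The Contribution Score formula can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and argument names are assumptions, and the inputs follow the slide's examples, where the (1 - Uncertainty Score) term is 1 for an assertive post and 0.5 for an uncertain one.

```python
def contribution_score(attitude, uncertainty, independence):
    """CS = Attitude Score * (1 - Uncertainty Score) * Independence Score.

    attitude:     +1 if the post agrees with the claim, -1 if it disagrees
    uncertainty:  0.0 for an assertive post, larger for hedged posts (assumed scale)
    independence: 1.0 for an original post, smaller for copied/dependent posts
    """
    return attitude * (1.0 - uncertainty) * independence


# The three examples from the slide.
disagree_assertive_independent = contribution_score(-1, 0.0, 1.0)
agree_assertive_dependent = contribution_score(1, 0.0, 0.1)
disagree_uncertain_independent = contribution_score(-1, 0.5, 1.0)
```

A dependent post (for example, a repost) keeps its sign but contributes much less weight than an independent one.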
External Source Fusion for Incomplete & Noisy Data
Adopt Modified HITS Algorithm
[Figure: sources linked to Claim 1 and Claim 2; labels shown: 1, 1, 0.8, False, True -> 1, -> 0.5, -> 0.4]
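The slide does not spell out how HITS is modified, so the following is only a sketch of the standard HITS iteration applied to a source-claim graph, not the authors' variant. Sources play the role of hubs and claims the role of authorities; the function name and the toy source and claim names are hypothetical.

```python
import math


def hits_source_claim(endorsements, iterations=50):
    """Plain HITS on a bipartite graph: endorsements maps source -> claims it supports."""
    sources = list(endorsements)
    claims = sorted({c for supported in endorsements.values() for c in supported})
    hub = {s: 1.0 for s in sources}      # source trustworthiness
    auth = {c: 1.0 for c in claims}      # claim credibility
    for _ in range(iterations):
        # A claim is credible if trustworthy sources support it.
        auth = {c: sum(hub[s] for s in sources if c in endorsements[s]) for c in claims}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {c: v / norm for c, v in auth.items()}
        # A source is trustworthy if it supports credible claims.
        hub = {s: sum(auth[c] for c in endorsements[s]) for s in sources}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {s: v / norm for s, v in hub.items()}
    return hub, auth


# Toy input: a news outlet and one user back claim_1; another user backs claim_2.
hub, auth = hits_source_claim({
    "news_a": ["claim_1"],
    "user_b": ["claim_1"],
    "user_c": ["claim_2"],
})
```

In this toy input the claim with more supporting sources ends up with the higher credibility score, and the sources backing it end up with higher trust scores.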
Extended Viterbi for Physical Constraints
Define 4 types of constraints:
• Global order constraints: the number of casualties can only increase (↑)
• Spatial-temporal constraints: traveling between cities within 10 mins
• Frequency constraints: five tornadoes in a row within 3 days is less likely
• Global path constraints: it is barely possible to snow in Florida
[Figure: # of casualties vs. time, with a segment annotated "Wrong!"]
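Two of these constraint types, global order and spatial-temporal, can be sketched as simple validity checks. This is a minimal illustration under assumptions: the function names and the `max_speed_mph` parameter are hypothetical and do not come from the slides.

```python
def violates_global_order(values):
    """True if a sequence that must be non-decreasing (e.g. casualties) ever drops."""
    return any(later < earlier for earlier, later in zip(values, values[1:]))


def violates_spatial_temporal(distance_miles, elapsed_minutes, max_speed_mph):
    """True if covering the distance in the elapsed time needs more than max_speed_mph."""
    if elapsed_minutes <= 0:
        # No time has passed, so any movement at all is impossible.
        return distance_miles > 0
    required_mph = distance_miles / (elapsed_minutes / 60.0)
    return required_mph > max_speed_mph


# A casualty count that drops from 5 to 4 breaks the global order constraint.
casualty_drop = violates_global_order([3, 5, 4])
# 80 miles in 10 minutes needs 480 mph; the 100 mph limit is an assumed value.
impossible_trip = violates_spatial_temporal(80, 10, max_speed_mph=100)
```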
Extended Viterbi for Physical Constraints
We propose an extended Viterbi algorithm that considers the “difficulty” of each truth transition.
If the accumulated difficulty score exceeds a threshold, the transition is invalid.
[Figure: state-transition trellis over Boston, NYC, and Houston; edge labels shown: 0.7, 0.5, 0.7, 0, 0]
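The idea of pruning transitions by accumulated difficulty can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it keeps only the most probable surviving path per state, and all probabilities, difficulty scores, and the threshold are made-up values for a toy three-city example.

```python
import math


def constrained_viterbi(obs, states, start_p, trans_p, emit_p, difficulty, threshold):
    """Viterbi decoding that rejects a transition once accumulated difficulty > threshold."""
    # best[s] = (log-probability, accumulated difficulty, path) for the best path ending in s
    best = {}
    for s in states:
        p = start_p[s] * emit_p[s][obs[0]]
        if p > 0:
            best[s] = (math.log(p), 0.0, [s])
    for o in obs[1:]:
        nxt = {}
        for s in states:
            for prev, (logp, acc, path) in best.items():
                p = trans_p[prev][s] * emit_p[s][o]
                new_acc = acc + difficulty[prev][s]
                if p <= 0 or new_acc > threshold:
                    continue  # invalid transition: impossible or too "difficult"
                cand = (logp + math.log(p), new_acc, path + [s])
                if s not in nxt or cand[0] > nxt[s][0]:
                    nxt[s] = cand
        best = nxt
    if not best:
        return None  # every path was pruned
    return max(best.values(), key=lambda item: item[0])[2]


# Toy model (all numbers are assumptions).
cities = ["Boston", "NYC", "Houston"]
start = {c: 1.0 / 3 for c in cities}
trans = {a: {b: 0.4 if a == b else 0.3 for b in cities} for a in cities}
emit = {a: {b: 0.8 if a == b else 0.1 for b in cities} for a in cities}
hard = {
    "Boston": {"Boston": 0.0, "NYC": 0.5, "Houston": 0.7},
    "NYC": {"Boston": 0.5, "NYC": 0.0, "Houston": 0.7},
    "Houston": {"Boston": 0.7, "NYC": 0.7, "Houston": 0.0},
}
reports = ["Boston", "Houston", "Boston"]

unconstrained = constrained_viterbi(reports, cities, start, trans, emit, hard, float("inf"))
constrained = constrained_viterbi(reports, cities, start, trans, emit, hard, 1.0)
```

Without a threshold the decoder follows the reports through Houston and back. With the threshold of 1.0, the Boston to Houston to Boston round trip accumulates a difficulty of 1.4 and is rejected, so the decoder keeps the suspect in Boston.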
Data Description
Primary (Twitter): two real-world data traces collected using the Twitter Search API.
Complementary (Traditional News Media): 228 reports relevant to the events, crawled from six major news media outlets using Google Search’s custom time frame feature.
Evaluation Results (1/2)
72% of the measured variables in the Boston Bombing dataset and 75% of the measured variables in the Hurricane Matthew dataset evolve at least once.
Evaluation Results (2/2)
Future Work
• Collusion Attack - A group of users can collude to intentionally craft fake social media news.
• Knowledge Transfer - The current HMM-based model is event-specific. In the future we will explore more generic/transferable solutions.
• Cyclic Dependency - Social media and news media can cite each other, which creates source dependency.
Thank You!
Social Sensing Lab at Univ. Notre Dame
http://www3.nd.edu/~sslab/