![Page 1: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware](https://reader034.vdocuments.net/reader034/viewer/2022042806/5f6dd64078e57d13e40c86ed/html5/thumbnails/1.jpg)
Man vs. Machine: Adversarial Detection of Malicious Crowdsourcing Workers
Gang Wang, Tianyi Wang, Haitao Zheng, Ben Y. Zhao
UC Santa Barbara
Machine Learning for Security
• Machine learning (ML) to solve security problems
– Email spam detection
– Intrusion/malware detection
– Authentication
– Identifying fraudulent accounts (Sybils) and content
• Example: ML for Sybil detection in social networks
[Diagram labels: Known samples, Training, Classifier, Unknown Accounts]
Adversarial Machine Learning
• Key vulnerabilities of machine learning systems
– ML models are derived from fixed datasets
– They assume training data and real-world data follow similar distributions
• Strong adversaries in ML systems
– Aware of how ML is used; can reverse engineer ML systems
– Adaptive evasion; tampering with the trained model
• Practical adversarial attacks
– What are the practical constraints for adversaries?
– Given these constraints, how effective are adversarial attacks?
Context: Malicious Crowdsourcing
• New threat: malicious crowdsourcing = crowdturfing
– Hiring a large army of real users for malicious attacks
– Fake customer reviews, rumors, targeted spam
– Most existing defenses (e.g. CAPTCHA) fail against real users
Online Crowdturfing Systems
• Online crowdturfing systems (services)
– Connect customers with online users willing to spam for money
– Sites located across the globe, e.g. China, US, India
• Crowdturfing in China
– Largest crowdturfing sites: ZhuBaJie (ZBJ) and SanDaHa (SDH)
– Million-dollar industry, tens of millions of tasks finished
[Diagram labels: Customer, Crowdturfing site, Crowd workers, Target Network]
Machine Learning vs. Crowdturfing
• Machine learning to detect crowdturfing workers
– Simple methods usually fail (e.g. CAPTCHA, rate limit)
– Machine learning: more sophisticated modeling of user behaviors
o “You are how you click” [USENIX’13]
• Perfect context to study adversarial machine learning
1. Highly adaptive workers seeking evasion
2. Crowdturfing site admins tamper with training data by changing all worker behaviors
Goals and Questions
• Our goals
– Develop defenses against crowdturfing on Weibo (Chinese Twitter)
– Understand the impact of adversarial countermeasures and the robustness of machine learning classifiers
• Key questions
– What ML algorithms can accurately detect crowdturfing workers?
– What are possible ways for adversaries to evade classifiers?
– Can adversaries attack ML models by tampering with training data?
Outline
• Motivation
• Detection of Crowdturfing
• Adversarial Machine Learning Attacks
• Conclusion
Methodology
• Detect crowdturfing workers on Weibo
• Adversarial machine learning attacks
– Evasion Attack: workers evade classifiers
– Poisoning Attack: crowdturfing admins tamper with training data
[Diagram labels: Training Data, Training (e.g. SVM), Classifier, Poison Attack, Evasion Attack]
Ground-truth Dataset
• Crowdturfing campaigns targeting Weibo
– Two largest crowdturfing sites: ZBJ and SDH
– Complete historical transaction records for 3 years (2009-2013)
– 20,416 Weibo campaigns: > 1M tasks, 28,947 Weibo accounts
• Collect Weibo profiles and their latest tweets
– Workers: 28K Weibo accounts used by ZBJ and SDH workers
– Baseline users: 371K users collected by snowball sampling
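Snowball sampling grows a sample by starting from seed accounts and repeatedly following social links. The slides do not say how the Weibo crawl was implemented, so the following is only a minimal sketch: the `followees` lookup, the seed list, and the toy graph are all hypothetical stand-ins for a real crawler.

```python
from collections import deque


def snowball_sample(seeds, followees, max_users):
    """Breadth-first snowball sample of up to max_users account IDs.

    seeds:     iterable of starting account IDs (hypothetical)
    followees: dict mapping an account ID to the IDs it follows
               (stand-in for a real crawler; an assumption)
    """
    sampled = []
    seen = set()
    queue = deque()
    for s in seeds:
        if s not in seen:
            seen.add(s)
            queue.append(s)
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        sampled.append(user)
        # Enqueue neighbors we have not seen yet.
        for neighbor in followees.get(user, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sampled


# Toy follow graph, invented for illustration.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
sample = snowball_sample(["a"], toy_graph, max_users=4)
print(sample)
```

The `max_users` cap plays the role of the sample-size budget; a real crawl would also need rate limiting and error handling, which this sketch omits.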
Features to Detect Crowd-workers
• Search for behavioral features to detect workers
• Observations
– Aged, well-established accounts
– Balanced follower-followee ratio
– Using cover traffic
• Final set of useful features: 35
– Baseline profile fields (9)
– User interaction (comment, retweet) (8)
– Tweeting device and client (5)
– Burstiness of tweeting (12)
– Periodical patterns (1)
[Callouts: Task-driven nature; Active at posting but have fewer bidirectional interactions]
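The slide names feature groups but does not define them. To make two of them concrete, here is a hedged sketch of a follower-followee ratio and one possible burstiness measure (the coefficient of variation of gaps between tweets). Both formulas and all example values are assumptions, not the definitions used in the talk.

```python
from statistics import mean, pstdev


def follower_followee_ratio(followers, followees):
    """Followers divided by followees; returns 0.0 if the account follows no one."""
    return followers / followees if followees else 0.0


def burstiness(timestamps):
    """Coefficient of variation of the gaps between consecutive tweets.

    timestamps: tweet times in seconds. Values near 0 mean evenly spaced
    tweets; larger values mean tweets arrive in bursts. This definition is
    an assumption, not the feature used in the talk.
    """
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return 0.0
    return pstdev(gaps) / mean(gaps)


# Invented example accounts.
steady = [0, 100, 200, 300, 400]
bursty = [0, 1, 2, 3, 1000]
print(follower_followee_ratio(120, 100))
print(burstiness(steady), burstiness(bursty))
```

Under this definition the evenly spaced account scores 0, while the account that posts four tweets in quick succession and then goes quiet scores well above 1.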
Performance of Classifiers
• Building classifiers on ground-truth data
– Random Forests (RF)
– Decision Tree (J48)
– SVM with radial basis function kernel (SVMr)
– SVM with polynomial kernel (SVMp)
– Naïve Bayes (NB)
– Bayes Network (BN)
• Classifiers dedicated to detecting “professional” workers
– Workers who performed > 100 tasks
– Responsible for 90% of total spam
– Professionals are detected more accurately → 99% accuracy
[Figure: false positive rate and false negative rate (0% to 60%) for RF, J48, SVMr, SVMp, BN, NB. Callout: Random Forests: 95% accuracy]
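The chart compares classifiers by false positive rate and false negative rate. As a reminder of how these are computed from predictions, here is a small sketch using the standard definitions, FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The label convention (1 = worker, 0 = baseline user) and the example data are assumptions for illustration, not results from the talk.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate, accuracy).

    Labels: 1 = crowdturfing worker, 0 = baseline user (a convention
    chosen for this sketch).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    acc = (tp + tn) / len(y_true) if y_true else 0.0
    return fpr, fnr, acc


# Invented labels and predictions, not results from the talk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(error_rates(y_true, y_pred))
```

A false positive here is a legitimate user flagged as a worker, which is the error the poisoning attack later in the talk tries to inflate.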
Outline
• Motivation
• Detection of Crowdturfing
• Adversarial Machine Learning Attacks
– Evasion attack
– Poisoning attack
• Conclusion
[Diagram, Detection Model Training: Training Data, Training (e.g. SVM), Classifier, Evasion Attack]
Attack #1: Adversarial Evasion
• Individual workers as adversaries
– Workers seek to evade a classifier by mimicking normal users
– Identify the key set of features to modify for evasion
• Attack strategy depends on the worker’s knowledge of the classifier
– Learning algorithm, feature space, training data
• What knowledge is practically available? How do different knowledge levels affect workers’ evasion?
A Set of Evasion Models
• Optimal evasion scenarios
– Per-worker optimal: each worker has perfect knowledge about the classifier
– Global optimal: knows the direction of the boundary
– Feature-aware evasion: knows feature ranking
• Practical evasion scenario
– Only knows normal users’ statistics
– Estimate which of their features are most “abnormal”
[Diagram labels: Optimal, Practical, Classification boundary]
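In the practical scenario, a worker knows only normal users' statistics and guesses which of their own features stand out. One way this could look in code is sketched below: features are ranked by absolute z-score against normal-user means and standard deviations, and the top k are set to the normal mean. The z-score ranking, the set-to-mean rule, and the feature names are assumptions for this sketch; the slides do not specify the exact procedure.

```python
def practical_evasion(worker, normal_mean, normal_std, k):
    """Return a copy of `worker` with its k most abnormal features altered.

    worker, normal_mean, normal_std: dicts keyed by feature name.
    Abnormality is the absolute z-score against normal users' statistics;
    altered features are set to the normal-user mean. Both choices are
    assumptions made for this sketch.
    """
    def z_score(name):
        std = normal_std[name]
        return abs(worker[name] - normal_mean[name]) / std if std else 0.0

    ranked = sorted(worker, key=z_score, reverse=True)
    altered = dict(worker)
    for name in ranked[:k]:
        altered[name] = normal_mean[name]
    return altered, ranked[:k]


# Invented feature values for illustration.
normal_mean = {"tweets_per_day": 5.0, "comment_ratio": 0.4, "account_age_days": 900.0}
normal_std = {"tweets_per_day": 2.0, "comment_ratio": 0.1, "account_age_days": 300.0}
worker = {"tweets_per_day": 25.0, "comment_ratio": 0.1, "account_age_days": 1000.0}

evaded, changed = practical_evasion(worker, normal_mean, normal_std, k=2)
print(changed)
print(evaded)
```

Note that this worker never consults the classifier. The features that look most abnormal to the worker need not be the ones the classifier relies on, which is one plausible reason the practical attack requires altering more features than the optimal one.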
Evasion Attack Results
• Evasion is highly effective with perfect knowledge, but less effective in practice
• Most classifiers are vulnerable to evasion
– Random Forests are slightly more robust (J48 Tree the worst)
[Figure, Optimal Attack: worker evasion rate (%) vs. number of features altered (0 to 30) for J48, SVMp, RF, SVMr. Callout: 99% of workers succeed with 5 feature changes]
[Figure, Practical Attack: worker evasion rate (%) vs. number of features altered (0 to 30) for J48, SVMp, RF, SVMr. Callout: Need to alter 20 features]
No single classifier is robust against evasion. The key is to limit adversaries’ knowledge
[Diagram, Detection Model Training: Training Data, Training (e.g. SVM), Classifier, Poison Attack]
Attack #2: Poisoning Attack
• Crowdturfing site admins as adversaries
– Highly motivated to protect their workers, centrally control workers
– Tamper with the training data to manipulate model training
• Two practical poisoning methods
– Inject mislabeled samples into training data → wrong classifier
– Alter worker behaviors uniformly by enforcing system policies → harder to train accurate classifiers
[Diagram, Injection Attack: inject normal accounts, but labeled as worker. Wrong model, false positives!]
[Diagram, Altering Attack: difficult to classify!]
Injecting Poison Samples
• Injecting benign accounts as “workers” into training data
– Aim to trigger false positives during detection
[Figure: false positive rate (%, 0 to 20) vs. ratio of poison-to-turfing (0 to 1) for Tree, SVMp, RF, SVMr. Callouts: J48-Tree is more vulnerable than others; 10% of poison samples → boost false positives by 5%]
Poisoning attack is highly effective
More accurate classifiers can be more vulnerable
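The injection attack adds benign accounts to the training data with a "worker" label, and the plot's x-axis is the ratio of poison samples to real turfing samples. A minimal sketch of building such a poisoned training set follows. The function name, parameters, sampling scheme, and data are all assumptions for illustration, not the experimental setup from the talk.

```python
import random


def inject_poison(workers, benign_train, benign_pool, ratio, seed=0):
    """Build a poisoned training set of (features, label) pairs.

    workers:      feature records of real workers (label 1)
    benign_train: feature records of baseline users (label 0)
    benign_pool:  extra benign records the attacker mislabels as workers
    ratio:        number of poison samples per real worker sample
    All names and the sampling scheme are assumptions for illustration.
    """
    rng = random.Random(seed)
    n_poison = min(int(round(ratio * len(workers))), len(benign_pool))
    poison = rng.sample(benign_pool, n_poison)
    data = [(x, 1) for x in workers]
    data += [(x, 0) for x in benign_train]
    data += [(x, 1) for x in poison]  # benign behavior, worker label
    rng.shuffle(data)
    return data


# Invented one-feature records.
workers = [[9.0], [8.5], [9.5], [8.0], [9.2], [8.8], [9.1], [8.7], [9.3], [8.9]]
benign_train = [[1.0], [1.5], [2.0], [0.5], [1.2]]
benign_pool = [[1.1], [1.4], [0.9], [1.8]]

poisoned = inject_poison(workers, benign_train, benign_pool, ratio=0.2)
print(len(poisoned), sum(label for _, label in poisoned))
```

A classifier trained on `poisoned` sees benign-looking records under the worker label, which pushes its decision boundary toward normal users and is the mechanism behind the extra false positives.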
Discussion
• Key observations
– Accurate machine learning classifiers can be highly vulnerable
– No single classifier excels in all attack scenarios; Random Forests and SVM are more robust than Decision Tree
– The impact of an adversarial attack depends heavily on the adversary’s knowledge
• Moving forward: improve robustness of ML classifiers
– Multiple classifiers in one detector (ensemble learning)
– Adversarial analysis in unsupervised learning
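One simple reading of "multiple classifiers in one detector" is majority voting over independent detectors. The sketch below illustrates that idea only; the three rule-based detectors, their thresholds, and the feature names are hypothetical, and this is not a defense evaluated in the talk.

```python
def majority_vote(classifiers, sample):
    """Label `sample` as worker (1) if more than half of the classifiers say so."""
    votes = sum(clf(sample) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0


# Three hypothetical rule-based detectors over an invented feature dict.
def bursty_rule(s):
    return 1 if s["burstiness"] > 1.0 else 0


def interaction_rule(s):
    return 1 if s["comment_ratio"] < 0.2 else 0


def device_rule(s):
    return 1 if s["num_clients"] > 3 else 0


detectors = [bursty_rule, interaction_rule, device_rule]
evasive_worker = {"burstiness": 0.5, "comment_ratio": 0.1, "num_clients": 5}
normal_user = {"burstiness": 0.3, "comment_ratio": 0.5, "num_clients": 1}

print(majority_vote(detectors, evasive_worker))
print(majority_vote(detectors, normal_user))
```

In this toy case the worker has already evaded the burstiness rule but is still flagged by the other two, so an adversary would need to defeat a majority of detectors at once.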
Thank You! Questions?