forecasting suspicious account activity at large-scale online …matei/papers/fc2019slides.pdf ·...

February 2019 Forecasting Suspicious Account Activity at Large-Scale Online Service Providers Hassan Halawa 1 , Konstantin Beznosov 1 , Baris Coskun 2 , Meizhu Liu 3 , Matei Ripeanu 1 1 University of British Columbia 2 Amazon Web Services 3 Yahoo! Research


Page 1: Forecasting Suspicious Account Activity at Large-Scale Online …matei/papers/fc2019slides.pdf · 2019. 3. 4. · Detection Campaigns (1) Email Classification Anomaly Detection (2)

February 2019

Forecasting Suspicious Account Activity at Large-Scale Online Service Providers

Hassan Halawa 1, Konstantin Beznosov 1, Baris Coskun 2, Meizhu Liu 3, Matei Ripeanu 1

1 University of British Columbia

2 Amazon Web Services

3 Yahoo! Research

Automated attacks operating on a large scale, exploiting unsafe decisions made by individual users

■ Phishing

■ Phishing □ Online Services


■ Phishing ■ Overview □ Current vs. Proposed

Current Defenses (reactive; identifying attack/attacker patterns)

■ Signatures

■ Anomalies

■ Evolving Attacks

■ False Positives

Proposed (proactive; mining behavioral/usage patterns)

■ Forecasting

■ Early Warning

■ Phishing ■ Overview □ Current vs. Proposed □ Highlights

■ Experiment at a Large-Scale Online Service Provider (4 months production data / 100+ million users / 100+ billion login events)

■ Promising Performance as an Early Warning System (AUROC ~ 0.92 / FPR ~ 0.5% / ACC ~ 99.5% / REC ~ 50.6% / PRE ~ 18.3% using only a 1 week historical trace and predicting 1 month in advance)

■ Supervised ML Pipeline for Forecasting (predict future suspicious account activity from historical traces)

■ Evaluation Across Varied Classification Exercises (1 week trace → [7, 90] day forecast / 3 weeks → [21, 34] days)
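The reported figures can be cross-checked with a little arithmetic. The sketch below assumes (this is not stated on the slide) that PRE, REC, and FPR were all measured at the same operating point on the same test population. Under that assumption it solves PRE = REC·p / (REC·p + FPR·(1 − p)) for the fraction p of accounts labeled suspicious. The function names are illustrative and not taken from the talk.

```python
def implied_base_rate(precision: float, recall: float, fpr: float) -> float:
    """Solve PRE = REC*p / (REC*p + FPR*(1-p)) for the positive-class rate p."""
    numerator = precision * fpr
    denominator = recall * (1.0 - precision) + precision * fpr
    return numerator / denominator


def implied_accuracy(recall: float, fpr: float, base_rate: float) -> float:
    """Accuracy = true-positive share + true-negative share of the population."""
    return recall * base_rate + (1.0 - fpr) * (1.0 - base_rate)


# Values quoted on the slide (all approximate).
p = implied_base_rate(precision=0.183, recall=0.506, fpr=0.005)
acc = implied_accuracy(recall=0.506, fpr=0.005, base_rate=p)
print(f"implied base rate: {p:.4%}, implied accuracy: {acc:.2%}")
```

Under that assumption, the implied base rate is roughly 0.2% of accounts and the implied accuracy is about 99.4%, which is in line with the reported ACC ~ 99.5% given rounding. With a positive class this rare, a precision of 18.3% still corresponds to a very low false positive rate.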


■ Phishing ■ Overview ■ Approach □ Account Lifecycle

[Figure: Overview of the lifecycle of a compromised account. Time axis with events: Account Registration, Account Compromised, Account Flagged, Account Remediation. Actors shown along the axis: Legitimate Owner, Attacker.]

■ Phishing ■ Overview ■ Approach □ Account Lifecycle □ ML Pipeline

Goal: Forecast suspicious account activity using supervised machine learning

[Figure: ML pipeline. Stages: Data Source (Ground Truth, User Activity), Data Pre-Processing, Model Selection, Model Evaluation. Data: Unstructured Data, Structured & Labeled Data. Outputs: Susp. Acct. Classifier, Susp. Acct. Population & Susp. Acct. Scores, passed to Defense Systems. Metrics: AUROC, BTR, PRE, REC, FPR.]
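To make the evaluation metrics concrete, here is a minimal pure-Python sketch of AUROC, PRE, REC, and FPR computed from classifier scores. It is not the authors' evaluation code. The function names, the toy labels and scores, and the 0.5 threshold are all assumptions for illustration. BTR is omitted because the slides do not define it.

```python
from typing import Sequence, Tuple


def precision_recall_fpr(labels: Sequence[int], scores: Sequence[float],
                         threshold: float) -> Tuple[float, float, float]:
    """Threshold the scores and return (PRE, REC, FPR)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return pre, rec, fpr


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = account later flagged as suspicious.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
pre, rec, fpr = precision_recall_fpr(labels, scores, threshold=0.5)
area = auroc(labels, scores)
```

The pairwise AUROC above is quadratic in the number of accounts, so it only suits small samples. At the scale described in the talk, a rank-based computation or a library routine would be the practical choice.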


■ Phishing ■ Overview ■ Approach □ Account Lifecycle □ ML Pipeline □ Classification Exercise

[Figure: Overview of a classification exercise along a time axis. A Training Interval and a Testing Interval each draw on the Data Source through a Data Window (DW, User Activity) and a Label Window (LW, Ground Truth); a Buffer Window (BW) is also marked. The Susp. Acct. Classifier outputs the Susp. Acct. Population & Susp. Acct. Scores to the Defense Systems.]
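The window layout can be sketched in a few lines. The ordering used here is an assumption, since the transcript does not preserve the figure: features come from the Data Window, the Buffer Window follows it, and labels come from the Label Window after that. The names `Exercise` and `layout_exercise` and the example dates and lengths are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Tuple

Interval = Tuple[date, date]  # half-open: [start, end)


@dataclass(frozen=True)
class Exercise:
    data_window: Interval    # DW: user activity used to build features
    buffer_window: Interval  # BW: gap between features and labels
    label_window: Interval   # LW: ground truth used to assign labels


def layout_exercise(start: date, dw_days: int, bw_days: int, lw_days: int) -> Exercise:
    """Lay out DW, BW, LW back to back starting at `start` (assumed ordering)."""
    if min(dw_days, lw_days) <= 0 or bw_days < 0:
        raise ValueError("DW and LW must be positive; BW must be non-negative")
    dw_end = start + timedelta(days=dw_days)
    bw_end = dw_end + timedelta(days=bw_days)
    lw_end = bw_end + timedelta(days=lw_days)
    return Exercise((start, dw_end), (dw_end, bw_end), (bw_end, lw_end))


# Hypothetical example: 1 week of activity, 3 week buffer, 1 week of labels.
train = layout_exercise(date(2019, 1, 1), dw_days=7, bw_days=21, lw_days=7)
# A testing interval would use the same window lengths, shifted later in time
# so that its label window does not overlap the training labels.
test = layout_exercise(train.label_window[1], dw_days=7, bw_days=21, lw_days=7)
```

Keeping the intervals half-open avoids counting a boundary day in two windows. It also lets the testing interval start exactly where the training label window ends.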


■ Phishing ■ Overview ■ Approach ■ Evaluation □ Classification Exercises

Notation: DW - Data Window, BW - Buffer Window, LW - Label Window, H - Prediction Horizon

[Figure: classification exercises laid out along a time axis, labeled by purpose: Hyperparameter Optimization, Overfit Check, Preprocessing Impact, Performance for Wider Windows.]

■ Phishing ■ Overview ■ Approach ■ Evaluation □ Classification Exercises □ AUROC

■ Phishing ■ Overview ■ Approach ■ Evaluation □ Classification Exercises □ AUROC □ PRE/REC vs. Horizon

■ Phishing ■ Overview ■ Approach ■ Evaluation ■ Recap

■ Experiment at a Large-Scale Online Service Provider (4 months production data / 100+ million users / 100+ billion login events)

■ Promising Performance as an Early Warning System (AUROC ~ 0.92 / FPR ~ 0.5% / ACC ~ 99.5% / REC ~ 50.6% / PRE ~ 18.3% using only a 1 week historical trace and predicting 1 month in advance)

■ Supervised ML Pipeline for Forecasting (predict future suspicious account activity from historical traces)

■ Evaluation Across Varied Classification Exercises (1 week trace → [7, 90] day forecast / 3 weeks → [21, 34] days)


Backup/Discussion Slides

■ Discussion □ Account Suspiciousness vs. Vulnerability

[Figure: time axis relating "Vulnerable at Present" to "Suspicious in Future (Forecast)" through "Mining Historical Behavioral/Usage Patterns".]

■ Discussion □ Current vs. Proposed

Attack stages and filters: Attack Launched (phishing emails) → (1) Operator Filter → System Infiltrated (email in inbox) → (2) User Filter → User Victimized (credentials stolen) → (3) Remediation Filter → Compromise Detected

Current defenses (feedback based on identifying attack patterns):

(1) Email Classification, Anomaly Detection

(2) HTTPS Browser Lock, Two Factor Auth.

(3) Incident Response, User Reports

Proposed defenses (feedback based on identifying vulnerable users):

(1) Throttled Outbox, Delayed Inbox

(2) Personalized Controls, Targeted Education

(3) Efficient Compromise-Detection Campaigns

User population labels: V, Robust

■ Discussion □ Proposed Defense Mechanisms

Design of new defense mechanisms

[Figure: grid of proposed mechanisms. Filter labels: (1) Operator Filters, (2) User Filters, (3) Remediation Filters. Group labels: Honeypots, Differential Defenses, Prioritization. Entries: Defense Resource Prioritization, Targeted User Education, Efficient Inspection, Effective Exercises, Throttling During Emergencies, Captive Portals, Mitigate Adversarial Learning, Personalised Control & Advice, Infer Attack Origin, Identify New Attacks.]

■ Discussion □ Evaluation of Mechanisms

[Figure: Evaluation of proposed defense mechanisms. The ML Pipeline feeds a Vulnerability Classifier, which produces the Vuln. Population & Vuln. Scores used by the Proposed Defenses. Evaluation methods: Simulation, Analytical Models, Practical Experiments. Input: Attack Propagation, Population Distribution, Defense Parameters. Output Metrics: Cost, Effectiveness. Other labels: S I R (shown three times), V, Robust, A/B Test, Defense Application, Targeted Education, Defense Evaluation, Security Exercise.]
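The "S I R" labels suggest the standard susceptible/infected/recovered compartmental model from epidemiology. The sketch below is a textbook discrete-time SIR simulation, not the authors' simulator. Reading S as not-yet-compromised accounts, I as compromised accounts, and R as remediated accounts is an interpretation. The parameters `beta` (compromise rate) and `gamma` (remediation rate) and their example values are hypothetical.

```python
from typing import List, Tuple


def simulate_sir(s0: float, i0: float, r0: float, beta: float, gamma: float,
                 steps: int, dt: float = 1.0) -> List[Tuple[float, float, float]]:
    """Forward-Euler SIR: dS = -beta*S*I/N, dI = beta*S*I/N - gamma*I, dR = gamma*I."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    history = [(s, i, r)]
    for _ in range(steps):
        # Clamp each flow so no compartment can go negative with a coarse step.
        new_infections = min(beta * s * i / n * dt, s)
        new_recoveries = min(gamma * i * dt, i)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical run: 10,000 accounts, 10 initially compromised.
trace = simulate_sir(s0=9_990, i0=10, r0=0, beta=0.3, gamma=0.1, steps=100)
peak_compromised = max(i for _, i, _ in trace)
```

A defense such as targeted education could be explored in this kind of model by lowering `beta` for part of the population. That would require splitting S into separate vulnerable and robust compartments, which this sketch does not do.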


■ Discussion □ Context-Specific Defenses

[Figure labels: Vulnerable, Robust, Long-Term Vulnerability Scores, Context-Specific Vulnerability Scores, Proposed Defenses.]

■ Discussion □ Results Presented as Lower Bounds

■ Limited Access to User Data

■ Limited Computational Resources

■ Imperfect Ground Truth

■ Aggressive Pruning Heuristics

■ Discussion □ Buffer Window Sizing


“Social engineering, in the context of information security, refers to psychological manipulation of people into performing actions or divulging confidential information. A type of confidence trick for the purpose of information gathering, fraud, or system access, it differs from a traditional "con" in that it is often one of many steps in a more complex fraud scheme.”

■ Discussion □ Social Engineering

■ Discussion □ Focusing on the Vulnerable Population as a Key Defense Element

■ Cost of attack

■ Multi-Stage Attacks

■ Similar dynamics to epidemics

■ Discussion □ Advantages of Proposed Paradigm

■ Targeted

■ Efficient

■ Proactive

■ Robust

■ Discussion □ Robustness

■ Current defenses are attack/attacker centric

■ Based on attacker-controlled behavior/features

■ Attackers can employ adversarial strategies

■ Discussion □ Reactive Defenses

Focus on identifying attacks/attackers

[Figure: adversarial cycle. Phases: Attack, Detect, Defense, Mutate. Events: Begin Attack, Initial Detection, Defender Responds, Attacker Detects. Regions: Attacker Controls, Defender Controls.]

[SNS’11] Tao Stein, Erdong Chen, and Karan Mangla. 2011. Facebook immune system. In Proceedings of the 4th Workshop on Social Network Systems (SNS'11). ACM, pp. 8, New York, NY, USA.

■ Discussion □ User Education

■ First line of defense

■ Direct cost (attack) vs. Indirect cost (effort)

■ Distribute cost proportional to user vulnerability
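One way to read "distribute cost proportional to user vulnerability" is as a proportional split of a fixed effort budget. The sketch below illustrates that reading only. The function name, the notion of a numeric "budget" (for example, minutes of training), and the example scores are assumptions, not details from the talk.

```python
from typing import Dict


def allocate_effort(vulnerability: Dict[str, float], budget: float) -> Dict[str, float]:
    """Split `budget` across users in proportion to their vulnerability scores."""
    if any(score < 0 for score in vulnerability.values()):
        raise ValueError("vulnerability scores must be non-negative")
    total = sum(vulnerability.values())
    if total == 0:
        # No signal at all: fall back to an even split.
        return {user: budget / len(vulnerability) for user in vulnerability}
    return {user: budget * score / total for user, score in vulnerability.items()}


# Hypothetical scores from a vulnerability classifier.
shares = allocate_effort({"alice": 0.6, "bob": 0.3, "carol": 0.1}, budget=100.0)
```

With these scores the most vulnerable user receives the largest share, while users the classifier considers robust bear little of the indirect cost.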

■ Discussion □ Legal/Ethical Considerations

■ Paternalism

■ Fairness (Service Discrimination)

■ Discussion □ Adoption Challenges

■ Feasibility of developing a vulnerability classifier

■ Inaccuracies in predicting the vulnerable population

■ Some defense mechanisms may violate user expectations

■ Targeted protection may be confusing or complex

■ Discussion □ Related Work

■ Offline Worlds

■ Online Worlds

■ Our Experience

■ Discussion □ Our Experience (Integro)

■ Large-scale social-bot infiltration is feasible

■ Defense system leveraging the proposed paradigm

■ Deployed at Telefonica’s OSN Tuenti (50+ million users)

■ Discussion □ Integro

[ECS’16] Boshmaf, Y., Logothetis, D., Siganos, G., Lería, J., Lorenzo, J., Ripeanu, M., Beznosov, K., and Halawa, H. (2016). Íntegro: Leveraging Victim Prediction for Robust Fake Account Detection in Large Scale OSNs. Elsevier Computers & Security, 61: 142-168.