situational awareness, botnet and malware detection in the modern era - davide papini - codemotion...

SITUATIONAL AWARENESS, BOTNET AND MALWARE DETECTION IN THE MODERN ERA. Machine Learning Enabled Advanced Security. CodeMotion Milan 2016. Davide Papini

Upload: codemotion

Post on 09-Jan-2017


TRANSCRIPT

Page 1: Situational Awareness, Botnet and Malware Detection in the Modern Era  - Davide Papini - Codemotion Milan 2016

SITUATIONAL AWARENESS, BOTNET AND MALWARE DETECTION IN THE MODERN ERA
Machine Learning Enabled Advanced Security

CodeMotion Milan 2016

Davide Papini


Introduction ML for Cyber Security Final Remarks

ABOUT ME

x Research & Innovation @ Elettronica S.p.A.
x Postdoc @ ISG, Royal Holloway, UK, on ML applied to cyber situational awareness.
x M.Sc. in Telecommunication Engineering @ Politecnico di Milano:
→ Erasmus @ Danmarks Tekniske Universitet
→ Master Thesis on "Anomaly Based Wireless Intrusion Detection Systems"
x Ph.D. @ Danmarks Tekniske Universitet:
→ "Attacker Modeling in Ubiquitous Computing Systems"
→ External stay at COSIC, KU Leuven


WHAT THIS TALK IS ABOUT

Topics:

x Applications of ML in Cybersecurity research.
x Successful research: botnets, DGAs, early malware detection.
x ML traps.
x Evaluation metrics.

NOT about:

x New ML algorithms.
x Showing one specific Security-ML based application.
x Wearing you out with math.


MOTIVATIONAL SLIDE


x Control of the Torpig botnet for 10 days: 180,000 infections, over 70 GB of data recorded.
x Torpig intercepts and records keystroke information at a low level, targeting a wide variety of applications and websites.
x It steals financial and personal information, login credentials for social networking sites, etc.
x Torpig periodically uploads any newly captured data to a central server.
x The researchers were able to infiltrate the botnet by registering one of the domains from the list of candidate domains that infected machines use.


SOME STATISTICS

https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-sep-2016.pdf


x 450,000 new malware samples per day.
x 20,000 are mobile malware.
x Includes: ransomware, botnets, rootkits, trojans …


NEED A GAME CHANGER

Modern malware/intrusions are difficult to detect/block:

x Code obfuscation, polymorphism and packing.
x Malware written ad hoc for specific targets.
x AVs are mainly signature-based.
x URL blacklists cannot be updated fast enough.
x Local changes are often too small/subtle to be detected.
x Logs contain a lot of noise (≃ 90%).

Need for intelligent approaches:

x Adapt to unforeseen "events".
x Learn from data, i.e. extract behaviours, NOT signatures.
x Leverage global knowledge.
x Can be quasi-real-time.


ML FOR CYBER SECURITY


Machine learning has been applied to many fields in security:

x Botnet detection and classification
x Mobile application analysis
x Spam detection and campaigns analysis
x Situational awareness through network traffic analysis
x Download malware detection
x and many more...

Also in many flavours:

x Supervised
x Unsupervised
x Combinations of those


BOTNETS

x Situational awareness: knowledge of the health status of a network (e.g. malware infections, intrusions and data exfiltration).
x Botnet: a network of bots (drones), i.e. programs installed on the machines of unwitting Internet users and receiving commands from a bot controller.


BOTNETS C&C CHANNEL

Bots connect to the C&C server in three ways:

x Hard-coded IP: Bot → 1.2.3.4
x Hard-coded domain: Bot → badguy.ru → 1.2.3.4
x Automatically Generated Domains:
→ Bot cycles through time-dependent domains.
→ Domain names are generated using a Domain Generation Algorithm.
→ The botmaster needs to register only one of those domains.

jhhfghf7.tk faukiijjj25.tk pvgvy.tk cvq.com epu.org bwn.org
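The time-dependent generation described above can be illustrated with a toy DGA in Python. This is a minimal sketch, not the algorithm of Torpig or of any real botnet: the function name `toy_dga`, the seed string, the SHA-256 construction and the `.example` TLD are all assumptions made for the example.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(day: date, seed: str = "demo-seed", count: int = 5,
            tld: str = ".example") -> List[str]:
    """Return `count` pseudo-random domains that depend only on the date and seed.

    Hypothetical illustration: real DGAs differ in seed, alphabet and TLDs.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to the letters a..p to get a name-like label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga(date(2016, 11, 25)):
        print(name)
```

Bot and botmaster run the same function on the same date, so they obtain the same candidate list; the botmaster only has to register one entry of it.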


[Figure: DNS lookups of generated domains, with labels NXDOMAIN, ahj.info, NXDOMAIN, sjq.info and Torpig. Reference on the slide: http://krebsonsecurity.com/wp-content/uploads/…/rogue_registrars_…_DRAFT.pdf. Courtesy of E. Colombo - Cerberus.]


Sinkholing: if the domain has already been registered by someone else, the botmaster loses control of the botnet!


PHOENIX AND CERBERUS

Developed at Polimi and ISG@RHUL

System that relies on Machine Learning to identify DGAs:
x Leverages known malicious and benign domain names to build a classifier:
→ Distinguishes Human Generated Domains from AGDs.
→ Identifies the DGA used: botnets might share the same DGA.
x Uses unsupervised learning to identify new DGAs.
x Traffic comes from a national authoritative DNS server.
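One way to separate human-generated names from AGDs is to measure how word-like a label is. The sketch below is a toy illustration of that idea only; it is not the feature set or the classifier used by Phoenix or Cerberus. The benign word list, the function names and the bigram-ratio feature are assumptions made for the example.

```python
from typing import Iterable, List, Set

# Hypothetical benign labels used to learn which character pairs look "human".
BENIGN_LABELS = [
    "google", "facebook", "wikipedia", "amazon", "youtube",
    "twitter", "linkedin", "microsoft", "codemotion", "security",
]


def bigrams(label: str) -> List[str]:
    """Split a label into overlapping two-character substrings."""
    return [label[i:i + 2] for i in range(len(label) - 1)]


def learn_bigrams(labels: Iterable[str]) -> Set[str]:
    """Collect every bigram seen in the benign training labels."""
    return {bg for label in labels for bg in bigrams(label.lower())}


def known_bigram_ratio(label: str, known: Set[str]) -> float:
    """Fraction of the label's bigrams that also occur in benign labels.

    Values near 1.0 suggest a word-like name, values near 0.0 a random-looking one.
    """
    grams = bigrams(label.lower())
    if not grams:
        return 0.0
    return sum(g in known for g in grams) / len(grams)


if __name__ == "__main__":
    known = learn_bigrams(BENIGN_LABELS)
    for label in ("motion", "jhhfghf7", "pvgvy"):
        print(label, known_bigram_ratio(label, known))
```

A real system would combine several such features and train on far larger sets of labeled domains; a single ratio over ten words is only meant to show the shape of the approach.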

S. Schiavoni et al., Phoenix: DGA-Based Botnet Tracking and Intelligence. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 2014.
E. Colombo, Cerberus: Detection and Characterization of Automatically-Generated Malicious Domains. Master Thesis, Politecnico di Milano 2014.


[Figure: Cerberus architecture, with blocks labeled DNS Stream, Filtering, Classifier, Time Detective, Suspicious Domains, Malicious Domains and Phoenix Clusters, and phases labeled Bootstrap, Filtering and Detection. Courtesy of E. Colombo - Cerberus.]


CERBERUS FINDINGS

x 187 malicious domains detected and labeled
x 3,576 suspicious domains collected
x 47 clusters of DGA-generated domains discovered
x 319 new domains detected in the next 24 hours


MASTINO: REALTIME MALWARE DETECTION

Developed at Trend Micro and presented at Defcon London 2016

System for advanced realtime malware detection:

x Leverages global knowledge on download events
x Classifies malware from goodware
x Based on statistical evidence and graph analysis:
x Tripartite graph: URLs, Files, Machines
x Intrinsic features e.g.
→ file: size, obfuscated, signed;
→ url: FQD, e2LD, query path
→ machine: malware download history, processes
x Behaviour-based features:
→ Consider reputation of neighboring nodes
→ Help to classify unknown nodes
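The neighbor-reputation idea above can be sketched in a few lines of Python. This is an illustrative toy, not Mastino's implementation: the event tuples, the labels in `KNOWN`, the choice of linking each file to its URL and machine, and the function names are all assumptions.

```python
from collections import defaultdict

# Hypothetical download events: (url, file_hash, machine_id).
EVENTS = [
    ("http://a.example/x.exe", "f1", "m1"),
    ("http://a.example/x.exe", "f1", "m2"),
    ("http://b.example/y.exe", "f2", "m2"),
    ("http://c.example/z.exe", "f3", "m3"),
]

# Hypothetical ground truth for some nodes: True = malicious, False = benign.
KNOWN = {"http://a.example/x.exe": True, "m1": True, "m3": False}


def build_graph(events):
    """Link every file to the URL it came from and the machine that fetched it."""
    neighbors = defaultdict(set)
    for url, file_hash, machine in events:
        neighbors[file_hash].update((url, machine))
        neighbors[url].add(file_hash)
        neighbors[machine].add(file_hash)
    return neighbors


def malicious_neighbor_ratio(node, neighbors, known):
    """Share of a node's labeled neighbors that are malicious, or None if none are labeled."""
    labeled = [known[n] for n in neighbors[node] if n in known]
    if not labeled:
        return None
    return sum(labeled) / len(labeled)


if __name__ == "__main__":
    graph = build_graph(EVENTS)
    for file_hash in ("f1", "f2", "f3"):
        print(file_hash, malicious_neighbor_ratio(file_hash, graph, KNOWN))
```

A ratio like this would be one behaviour-based feature among many, fed to a classifier together with the intrinsic features of the file, URL and machine.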


Huge work on feature engineering!


MASTINO SYSTEM OVERVIEW

[Figure: Mastino system overview. Copyright 2016 Trend Micro Inc.; courtesy of M. Balduzzi - Trend Micro.]


MASTINO TRAINING AND DETECTION

courtesy of M.Balduzzi - TrendMicro


MASTINO RESULTS

Mastino evaluation:

x On testing dataset: 95.8% TP, 0.5% FP
x Early detection experiment, deployed in the wild for 6 months:
→ Detected 84% of future malware
→ Verified later through VirusTotal

Detection time ≃ 0.16 s!
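For readers less used to TP and FP percentages, the sketch below shows how such rates are computed from labeled predictions. The sample counts are hypothetical, chosen only so that the output matches the percentages quoted above; they are not Mastino's dataset sizes.

```python
def tp_fp_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for boolean labels.

    True means malware, False means goodware.
    """
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical test set: 1000 malware samples followed by 1000 goodware samples.
    y_true = [True] * 1000 + [False] * 1000
    # 958 malware samples detected, 5 goodware samples wrongly flagged.
    y_pred = [True] * 958 + [False] * 42 + [True] * 5 + [False] * 995
    tpr, fpr = tp_fp_rates(y_true, y_pred)
    print(f"TP rate: {tpr:.1%}, FP rate: {fpr:.1%}")
```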


ISSUES

Traditional ML was developed for "natural" objects:

x Natural Language Processing.
x Image analysis, e.g. picture text search.
x Classification of plants and animals.
x Economic laws.

Metrics like ROC, FP and FN work very well in these cases; however, the cyberworld is not natural:

x Things change abruptly, e.g. updates, new malware, new technologies.
x There is no clear evolutionary law.
x Change is deterministic and unpredictable.
x Behaviours change/slide over time.


ML TRAPS

Machine learning often seen as a black-box panacea:

x Little is understood.
x Results with high accuracy are taken without questioning their quality.

However:

x Overfitting: if training and testing are not done carefully.
x Validity of results: a system that works on paper may not work in the field.
x Datasets: Variety vs Chronology
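One reading of the chronology point is that a random train/test split lets a model train on samples that appeared after the ones it is tested on, which cannot happen in the field. The sketch below shows a time-aware split as a hedge against that; the tuple layout, the dates and the function name are assumptions made for the example.

```python
from datetime import date


def chronological_split(samples, cutoff):
    """Split (first_seen, features, label) tuples so that training strictly precedes testing."""
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


if __name__ == "__main__":
    # Hypothetical samples: (first seen date, feature placeholder, label).
    samples = [
        (date(2016, 9, 20), "d", 0),
        (date(2016, 1, 10), "a", 0),
        (date(2016, 6, 1), "c", 1),
        (date(2016, 3, 5), "b", 1),
    ]
    train, test = chronological_split(samples, cutoff=date(2016, 6, 1))
    print(len(train), "training samples,", len(test), "test samples")
```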

Need for novel metrics!


CONFORMAL EVALUATOR

Library developed at the Information Security Group at Royal Holloway:
x Evaluates algorithms in terms of confidence and credibility.
x Core is a Non-Conformity measure, elicited directly from the algorithm, which in essence tells the difference between a sample and a set of samples.
x Builds decision and alpha assessments to evaluate the algorithm.
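To make confidence and credibility concrete, here is a minimal sketch based on the standard conformal prediction definitions: the p-value of a label is the fraction of reference samples of that label that are at least as non-conforming as the test sample, credibility is the largest p-value, and confidence is one minus the second largest. It is not the Conformal Evaluator library's API; the function names, the score values and the simplified p-value (which does not count the test sample itself) are assumptions.

```python
def p_value(test_score, reference_scores):
    """Fraction of reference non-conformity scores that are >= the test sample's score."""
    return sum(s >= test_score for s in reference_scores) / len(reference_scores)


def assess(test_scores, reference_scores):
    """Return (predicted label, credibility, confidence) for one test sample.

    test_scores: {label: non-conformity of the sample with respect to that label}
    reference_scores: {label: non-conformity scores of known samples of that label}
    Requires at least two labels.
    """
    p = {label: p_value(test_scores[label], scores)
         for label, scores in reference_scores.items()}
    ranked = sorted(p.values(), reverse=True)
    predicted = max(p, key=p.get)
    credibility = ranked[0]          # how well the sample fits the chosen label
    confidence = 1.0 - ranked[1]     # how clearly the runner-up label is rejected
    return predicted, credibility, confidence


if __name__ == "__main__":
    # Hypothetical non-conformity scores (higher = more different from the class).
    reference = {
        "malware": [0.1, 0.2, 0.3, 0.4],
        "goodware": [0.2, 0.5, 0.6, 0.9],
    }
    sample = {"malware": 0.25, "goodware": 0.95}
    print(assess(sample, reference))
```

A prediction with high confidence but low credibility, as in this toy case, means the alternatives are clearly rejected while the sample still does not look much like the chosen class.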

R. Jordaney, Z. Wang, D. Papini, I. Nouretdinov and L. Cavallaro, Misleading Metrics: On Evaluating Machine Learning for Malware with Confidence, Technical Report 2016-1, Royal Holloway University of London.


[Figure: Conformal Evaluator Overview, with blocks labeled Training and Testing Dataset, Similarity Based Classification/Clustering Algorithm, Non-Conformity Measure, Conformal Evaluator, Alpha Assessment and Decision Assessment.]


CE: EXAMPLE 1

System for Botnet detection and classification

[Figure: Decision Assessment. Two bar charts of average algorithm credibility and average algorithm confidence per botnet family (bifrose, sasfis, blackenergy, banbra, pushdo), y-axis from 0.0 to 1.0.
Correct choices, bar labels in plot order: 0.86 0.27 0.29 0.31 0.9 0.2 0.84 0.18 0.95 0.15
Incorrect choices, bar labels in plot order: 0.42 0.53 0.58 0.17 0.68 0.29 0.62 0.29 0.73 0.12]


[Figure: Alpha Assessment. P-values (y-axis from 0.0 to 1.0) for the samples of each family (bifrose, sasfis, blackenergy, banbra, pushdo), with one series of P-values per family.]


Although the algorithm has reasonably good results on paper, CE shows the quality of the results is not good!

We ran experiments on another dataset to confirm, and the classifier got worse.


CE: EXAMPLE 2

Mobile App classification: Malware vs Goodware

[Figure: two plots, y-axes from 0.0 to 1.0. Decision assessment: average algorithm credibility and confidence for correct choices and for incorrect choices. Alpha assessment: P-values (MALICIOUS, BENIGN) for MALICIOUS's samples and BENIGN's samples.]


FINAL REMARKS


FINAL REMARKS

Getting your hands in the game, what you need:

x You need to study a bit of ML
x You need a problem
x You need data
x You need good metrics
x In the wild analysis is a plus
x You need tools:
→ We did everything in Python: NumPy, SciPy
→ ML libraries: scikit-learn, shogun-toolbox.org


FINAL REMARKS

Machine Learning is great for Cyber Security!

Thanks for listening: Questions?
