CISC 879 - Machine Learning for Solving Systems Problems

Presented by: Satyajeet
Dept of Computer & Information Sciences
University of Delaware

Automatic Analysis of Malware Behavior using Machine Learning
Authors: Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz

Post on 17-Mar-2016

Abstract & Introduction

• Malware
  • Poses a major threat to the security of computer systems.
  • Very diverse: viruses, Internet worms, Trojan horses.
  • Large in volume: millions of hosts infected.
• Obfuscation and polymorphism impede detection at the file level.
• Dynamic analysis helps in characterizing malware and defending against it.

Abstract & Introduction Contd..

• Framework for automatic analysis of malware behavior using machine learning
  • Clustering: automatically identifies novel classes of malware with similar behavior.
  • Classification: assigns unknown malware to these discovered classes.
  • An incremental approach that combines both for behavior-based analysis.

Automatic analysis of Malware Behavior

• Framework steps and procedure
  • Malware binaries are executed and monitored in a sandbox environment. A report is generated on the system calls and their arguments.
  • The sequential reports are embedded in a vector space where each dimension is associated with a behavioral pattern.
  • ML techniques are then applied to the embedded reports to identify and classify malware.
  • Incremental analysis proceeds by alternating between clustering and classification.

Report representation

• Can be textual or XML
  • Human readable and suitable for computing general statistics
  • But not efficient for automatic analysis
• Hence MIST (Malware Instruction Set)
  • Inspired by instruction sets used in processor design.

MIST

• Category: the category of system calls
• Operation: reflects a particular system call
• Arguments: given as argblocks
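To make the three parts of a MIST instruction concrete, here is a minimal Python sketch. The class name `MistInstruction`, the method `at_level`, and the example values are hypothetical. The rule that level 1 keeps only category and operation, with each further level adding one argblock, is an assumption made for illustration rather than something stated on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MistInstruction:
    """Hypothetical in-memory form of one MIST instruction."""
    category: str               # category of system calls, e.g. "filesystem"
    operation: str              # the particular system call, e.g. "open_file"
    argblocks: Tuple[str, ...]  # arguments, grouped into argblocks

    def at_level(self, level: int) -> Tuple[str, ...]:
        # Assumption: level 1 keeps category and operation only,
        # and each further level adds one more argblock.
        return (self.category, self.operation) + self.argblocks[: level - 1]


# Made-up example values, not taken from a real sandbox report.
instr = MistInstruction("filesystem", "open_file", ("C:\\boot.ini", "read"))
print(instr.at_level(1))
print(instr.at_level(2))
```

Truncating at a lower level makes two calls that differ only in their arguments look identical, which is one way a coarser view of behavior could be obtained.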

Sandbox and MIST representation

Representation

• These sequential reports capture typical malware behavior, such as changing registry keys or modifying system files.
• But they are still not suitable for efficient analysis techniques. Hence the need to embed behavior reports in a vector space using instruction q-grams.
• This embedding allows the similarity of behavior to be expressed geometrically, by calculating distances.
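The q-gram embedding and the distance computation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each distinct q-gram is treated as a binary dimension, vectors are scaled to unit length, and the distance is Euclidean. The function names `embed` and `distance` and the toy instruction names are made up.

```python
import math
from typing import Dict, List, Tuple

QGram = Tuple[str, ...]


def embed(report: List[str], q: int = 2) -> Dict[QGram, float]:
    """Embed an instruction sequence as a sparse, unit-length q-gram vector."""
    # Every distinct window of q consecutive instructions is one dimension.
    grams = {tuple(report[i:i + q]) for i in range(len(report) - q + 1)}
    if not grams:
        return {}
    weight = 1.0 / math.sqrt(len(grams))  # scale so the vector has norm 1
    return {g: weight for g in grams}


def distance(a: Dict[QGram, float], b: Dict[QGram, float]) -> float:
    """Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


# Toy reports; the instruction names are made up for illustration.
x = embed(["load_dll", "open_key", "set_value", "open_file"])
y = embed(["load_dll", "open_key", "set_value", "write_file"])
z = embed(["connect", "send", "recv", "send"])

print(distance(x, y))  # x and y share two of their three 2-grams
print(distance(x, z))  # x and z share none
```

Reports with overlapping instruction patterns end up closer together than reports with nothing in common, which is the geometric notion of similarity the slide refers to.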

Clustering and Classification

• Once the reports are embedded in a vector space, they are ready for ML techniques.
• Clustering of behavior: classes of malware with similar behavior are identified.
• Classification of behavior: malware is assigned to known classes of behavior.
• What allows us to do this?
  • Malware binaries usually belong to families of similar variants with similar behavior patterns!

Contd..

Algorithms

• Prototype extraction
  • Iterative algorithm
  • Extracts a small set of prototypes from the set of reports. The first one is chosen at random.
• Clustering using prototypes
  • At the beginning, each prototype is an individual cluster.
  • The algorithm repeatedly determines and merges the nearest pair of clusters.
• Classification using prototypes
  • Learns to discriminate between classes of malware.
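The first two algorithms can be sketched in a few lines of Python. The slide only says that extraction is iterative and starts from a random report. Choosing each further prototype as the report farthest from all current prototypes, stopping at a distance threshold, and using complete linkage when merging are assumptions made here for illustration. The function names and the toy 2-D points (standing in for embedded reports) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def extract_prototypes(points: List[Vector], max_dist: float,
                       rng: random.Random) -> List[int]:
    """Return indices of prototypes so every point is within max_dist of one."""
    protos = [rng.randrange(len(points))]  # first prototype: chosen at random
    # nearest[i] = distance from point i to its closest prototype so far
    nearest = [math.dist(p, points[protos[0]]) for p in points]
    while max(nearest) > max_dist:
        far = nearest.index(max(nearest))  # assumption: farthest-first choice
        protos.append(far)
        nearest = [min(d, math.dist(p, points[far]))
                   for d, p in zip(nearest, points)]
    return protos


def cluster_prototypes(protos: List[Vector], min_dist: float) -> List[List[int]]:
    """Merge the nearest pair of clusters until no pair is close enough."""
    clusters = [[i] for i in range(len(protos))]  # each prototype on its own

    def linkage(a: List[int], b: List[int]) -> float:
        # assumption: complete linkage (largest pairwise distance)
        return max(math.dist(protos[i], protos[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > min_dist:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
idx = extract_prototypes(points, max_dist=0.5, rng=random.Random(0))
groups = cluster_prototypes([points[i] for i in idx], min_dist=1.0)
print(len(idx), "prototypes in", len(groups), "clusters")
```

Clustering only the prototypes instead of every report keeps the number of pairwise distance computations small, which is the point of extracting prototypes first.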

Algorithms Contd..

• For each report, the algorithm determines the nearest prototype among the clusters in the training data. If the report lies within the radius, it is assigned to that cluster.
  • Otherwise it is rejected and held back for later incremental analysis.
• Incremental analysis
  • Reports to be analyzed are received from a source.
  • They are first classified using prototypes of known clusters.
  • Variants of known malware are thereby identified for further analysis.
  • Prototypes are extracted from the remaining reports, which are then clustered again.
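A minimal sketch of classification with rejection and the incremental loop might look like this. The names `classify`, `leader_groups` and `incremental_analysis`, the label `worm-a`, and the 2-D toy points are all hypothetical. The `leader_groups` helper is a deliberately simple stand-in for the real prototype extraction and clustering steps, and a single shared radius is an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]
Prototypes = Dict[str, List[Vector]]  # cluster label -> its prototypes


def classify(report: Vector, prototypes: Prototypes,
             radius: float) -> Optional[str]:
    """Return the label of the nearest prototype, or None to reject."""
    best_label, best_dist = None, math.inf
    for label, protos in prototypes.items():
        for p in protos:
            d = math.dist(report, p)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label if best_dist <= radius else None


def leader_groups(reports: List[Vector], radius: float) -> List[List[Vector]]:
    """Toy stand-in for prototype extraction plus clustering."""
    groups: List[List[Vector]] = []
    for r in reports:
        for g in groups:
            if math.dist(r, g[0]) <= radius:
                g.append(r)
                break
        else:
            groups.append([r])  # r becomes the leader of a new group
    return groups


def incremental_analysis(batches: List[List[Vector]], prototypes: Prototypes,
                         radius: float) -> Dict[str, int]:
    """Alternate classification and clustering over batches of reports."""
    counts: Dict[str, int] = {label: 0 for label in prototypes}
    for batch in batches:
        rejected = []
        for r in batch:
            label = classify(r, prototypes, radius)
            if label is None:
                rejected.append(r)  # held back for clustering
            else:
                counts[label] += 1  # variant of a known cluster
        for group in leader_groups(rejected, radius):
            label = f"new-{len(prototypes)}"
            prototypes[label] = [group[0]]  # leader acts as the prototype
            counts[label] = len(group)
    return counts


known = {"worm-a": [(0.0, 0.0)]}
batches = [[(0.1, 0.0), (5.0, 5.0)], [(5.1, 5.0), (0.0, 0.2)]]
print(incremental_analysis(batches, known, radius=0.5))
```

Note that `incremental_analysis` adds the newly found clusters to the `prototypes` dictionary it is given, so a report rejected in one batch can make later variants of the same behavior classifiable in the next.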

Experiments and Results

Evaluating components

• Prototype extraction
  • Evaluated using precision, recall and compression.
  • Precision of 0.99 when the corpus is compressed by 2.9% and 7%.
• Clustering
  • Evaluated using F-measure.
  • F-measure in the experiments: MIST 1 = 0.93 and MIST 2 = 0.95, better than the 0.881 of previous related work.
• Classification
  • F-measure in the experiments: MIST 1 = 0.96 and MIST 2 = 0.99.
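For readers unfamiliar with these metrics, the sketch below shows one way to compute precision, recall and F-measure for a clustering against reference classes. It assumes purity-style definitions: precision counts the dominant class in each cluster, recall counts the dominant cluster for each class, and the F-measure is their harmonic mean. The slide does not spell out the definitions, so treat this as an assumption. The function name and toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def precision_recall_f(clusters: List[Hashable],
                       classes: List[Hashable]) -> Tuple[float, float, float]:
    """Score a clustering against reference classes."""
    n = len(clusters)
    pairs = Counter(zip(clusters, classes))  # (cluster, class) -> count
    best_in_cluster: Counter = Counter()     # size of dominant class per cluster
    best_in_class: Counter = Counter()       # size of dominant cluster per class
    for (c, y), k in pairs.items():
        best_in_cluster[c] = max(best_in_cluster[c], k)
        best_in_class[y] = max(best_in_class[y], k)
    p = sum(best_in_cluster.values()) / n    # precision
    r = sum(best_in_class.values()) / n      # recall
    return p, r, 2 * p * r / (p + r)         # F-measure: harmonic mean


# Toy example: six reports, three discovered clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 2]
classes = ["a", "a", "b", "b", "b", "b"]
p, r, f = precision_recall_f(clusters, classes)
print(round(p, 3), round(r, 3), round(f, 3))
```

In this toy case class "b" is split over three clusters, which lowers recall more than precision. A clustering that lumps everything together would show the opposite pattern.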

Experiments and Results Contd..

Experiments and Results Contd..

Conclusion

• A new framework is introduced that overcomes several deficiencies of previous approaches.
• The framework is learning based.
• The framework can be implemented in practice.
• Steps: collect malware, analyze it in a sandbox environment, embed the observed behavior in a vector space, and apply learning algorithms (clustering and classification).
• The process is efficient and learns automatically after the initial setup and run.

Thank you !