malware detection using machine learning

Post on 16-Apr-2017


MALWARE DETECTION USING MACHINE LEARNING

ABHIJIT MOHANTA

ABOUT PRESENTER

• Worked as a security researcher for Symantec, McAfee, and Cyphort
• Experience in reverse engineering, malware analysis, and detection
• Worked on antivirus engines and sandbox engines

DISCLAIMER
I have used some content from the following sites.

Reference:
• analyticsvidhya.com
• datadrivensecurity.info
• home.agh.edu.pl
• neuralnetworksanddeeplearning.com
• http://www.astroml.org
• YouTube
• Google Images

Malware Detection in Antivirus

How do antiviruses detect malware?
• Traditional AVs use pattern matching on static files
• They partially decrypt samples using techniques like emulation

How does malware evade antivirus?
• It uses polymorphic packers, which evade static pattern matching
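A minimal sketch of why static pattern matching breaks once a sample is packed. The byte signature and the single-byte XOR "packer" are hypothetical stand-ins chosen for illustration. Real signatures and polymorphic packers are far more elaborate.

```python
# Hypothetical byte signature (assumption, not a real AV pattern).
SIGNATURE = b"EVIL_PAYLOAD_MARKER"


def matches_signature(data: bytes, signature: bytes = SIGNATURE) -> bool:
    """Static pattern matching: look for the raw byte pattern in the file."""
    return signature in data


def xor_pack(data: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every byte with a one-byte key (assumed scheme)."""
    return bytes(b ^ key for b in data)


original = b"\x90\x90" + SIGNATURE + b"\xcc"
packed = xor_pack(original, 0x5A)

print(matches_signature(original))  # the pattern is present in the raw bytes
print(matches_signature(packed))    # same payload, but the pattern is hidden
```

Applying `xor_pack` again with the same key restores the original bytes, which is roughly what an emulator recovers by letting the unpacking stub run.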

Why Machine Learning?
• Too many types of malware: bots, viruses
• Grouped by target: stealers, POS malware, banking malware
• Too much data for humans to process

MACHINE LEARNING INTRO
• Some prerequisites: statistics, calculus, vectors, algebra
• Problems solved: classification, regression
• Types: supervised, semi-supervised, unsupervised
• What is our problem? Classification

Supervised Learning
• What is it?
• Steps:
– Feature selection
– Training (provide labelled data)
– Prediction

FEATURE SELECTION
• How are features selected in classification?
• A feature is some property with which you can distinguish two classes
• A feature can be represented as a vector, a Boolean, etc.
• Apple vs. orange classes:
– Features: colour, weight, shape
– Labels: apple, orange
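As an illustration of the apple vs. orange slide, the sketch below turns each fruit into a numeric feature vector plus a label. The `Fruit` class, the colour list, and the sample values are assumptions made up for this example.

```python
from dataclasses import dataclass

# Assumed colour vocabulary used for one-hot encoding.
COLOURS = ["red", "green", "orange"]


@dataclass
class Fruit:
    colour: str
    weight_g: float
    is_round: bool  # crude stand-in for the "shape" feature


def to_vector(fruit: Fruit) -> list:
    """Encode colour as one-hot, weight as a number, shape as a Boolean 0/1."""
    one_hot = [1.0 if fruit.colour == c else 0.0 for c in COLOURS]
    return one_hot + [fruit.weight_g, 1.0 if fruit.is_round else 0.0]


# Labelled data: (features, label) pairs, as needed for supervised training.
samples = [
    (Fruit("red", 150.0, False), "apple"),
    (Fruit("orange", 130.0, True), "orange"),
]
X = [to_vector(fruit) for fruit, _ in samples]
y = [label for _, label in samples]
```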

MODEL SELECTION
Models covered (supervised unless noted):
• K-Nearest Neighbours (KNN): classification
• K-Means clustering (unsupervised)
• SVM
• Decision Tree
• Random Forest
• Naive Bayes

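K-Nearest Neighbours (KNN)
• Supervised learning
• Classification algorithm
• Similarity to neighbours is measured with a distance metric (Euclidean, Manhattan, Minkowski)
• Euclidean distance is a common choice
• Picture a circle around the point to be classified that contains its k nearest points; the majority class among them is the prediction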
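The KNN idea can be sketched in a few lines of plain Python. The training points and the "benign"/"malware" labels are invented for illustration. Minkowski distance with p=1 gives Manhattan distance and p=2 gives Euclidean distance.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train, query, k=3, p=2):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(train, key=lambda item: minkowski(item[0], query, p))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up 2-D feature vectors with labels (assumption for this sketch).
train = [
    ((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"), ((0.9, 1.1), "benign"),
    ((5.0, 5.0), "malware"), ((5.2, 4.9), "malware"), ((4.8, 5.1), "malware"),
]

print(knn_predict(train, (1.1, 1.0)))            # near the first group
print(knn_predict(train, (5.1, 5.0), k=3, p=1))  # near the second group
```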

K-Means
• Unsupervised learning
• Clustering algorithm
• Given some data, we cluster it into K groups
• In each iteration the mean value (centre) of each cluster is updated
• Each point is assigned to its nearest centre using Euclidean distance
• ref video: https://www.youtube.com/watch?v=aiJ8II94qck
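A minimal K-Means sketch in plain Python. It alternates between assigning each point to its nearest centre and recomputing each centre as the cluster mean. Taking the first k points as initial centres is a simplification assumed here to keep the run deterministic. Practical implementations usually initialise more carefully.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centres, clusters)."""
    centres = [tuple(p) for p in points[:k]]  # naive initialisation (assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre (Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre becomes the mean of its cluster.
        new_centres = []
        for i, members in enumerate(clusters):
            if members:
                new_centres.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centres.append(centres[i])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged: the centres stopped moving
        centres = new_centres
    return centres, clusters


# Made-up 2-D data with two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (9.0, 10.0), (10.0, 9.0)]
centres, clusters = kmeans(points, k=2)
```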

Support Vector Machines
• Classifier
• What are support vectors?
• Linearly separating hyperplane
• Margins with maximum separation
• ref: http://www.saedsayad.com/support_vector_machine.htm
• videos:
– https://www.youtube.com/watch?v=1NxnPkZM9bc
– https://www.youtube.com/watch?v=5zRmhOUjjGY
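To make the hyperplane, margin, and support-vector terms concrete, the sketch below works with an already-known linear separator. The weights `w`, the bias `b`, and the four points are assumptions chosen by hand, and no training is performed here. Under the usual scaling, the margin width is 2/||w|| and the support vectors are the points lying exactly on the margin, where y * (w·x + b) = 1.

```python
import math

# Assumed separating hyperplane w.x + b = 0 (hand-picked, not learned).
w = (1.0, 1.0)
b = -3.0


def decision(x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x):
    return 1 if decision(x) >= 0 else -1


# Width of the margin between the two classes.
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Made-up labelled points: class -1 and class +1.
points = [((1.0, 1.0), -1), ((0.0, 1.0), -1), ((2.0, 2.0), 1), ((3.0, 3.0), 1)]

# Support vectors sit exactly on the margin boundary.
support_vectors = [x for x, y in points if abs(y * decision(x) - 1.0) < 1e-9]
```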

Decision Tree

Ref: https://databricks.com/blog/2014/09/29/scalable-decision-trees-in-mllib.html

Random Forest
• Ensemble learning method
• Combines the output of multiple decision trees

Ref: https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
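The "output of multiple decision trees" step can be sketched as a majority vote. The three hand-written one-rule "trees" and their thresholds are hypothetical placeholders. A real random forest learns its trees from bootstrapped samples and random feature subsets.

```python
from collections import Counter


# Hypothetical one-rule trees; the thresholds are invented for illustration.
def tree_entropy(sample):
    return "malware" if sample["entropy"] > 7.0 else "benign"


def tree_signed(sample):
    return "benign" if sample["signed"] else "malware"


def tree_size(sample):
    return "malware" if sample["size_kb"] < 100 else "benign"


FOREST = [tree_entropy, tree_signed, tree_size]


def forest_predict(sample, trees=FOREST):
    """Ensemble prediction: every tree votes, the most common label wins."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


print(forest_predict({"entropy": 7.8, "signed": False, "size_kb": 40}))
print(forest_predict({"entropy": 5.0, "signed": True, "size_kb": 50}))
```

In the second call one tree votes "malware" but is outvoted by the other two, which is the point of the ensemble.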

Features for Malware Detection
• Static:
– Size
– Signed/unsigned
– Icon: exe file without icons
– Entropy
• Behaviour:
– Process executed from %appdata% and %temp%
– Dropped file has a random name, e.g. xszsde.exe
– Process creating run entries
– Code injection
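A small sketch of how the static features above might be turned into numbers. Entropy is computed here as Shannon entropy over the file's bytes (0 to 8 bits per byte). Packed or encrypted files tend to score near the top of that range. The `static_features` helper and its flag arguments are assumptions for this example. Extracting signing and icon information from a real executable is not shown.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def static_features(data: bytes, is_signed: bool, has_icon: bool) -> list:
    """Hypothetical feature vector: [size, signed flag, icon flag, entropy]."""
    return [
        float(len(data)),
        1.0 if is_signed else 0.0,
        1.0 if has_icon else 0.0,
        shannon_entropy(data),
    ]


print(shannon_entropy(b"aaaa"))            # a single repeated byte
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```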

Training Sets for malware

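Some Applications for Malware Traffic Detection
• DGA detection
• What is a DGA? A domain generation algorithm, which malware uses to generate many pseudo-random domain names for reaching its command-and-control servers
• Features:
– N-grams
– Entropy
– Dictionary
– Reference: http://datadrivensecurity.info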
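The three DGA features can be sketched for a single domain name as follows. The tiny word list and the `domain_features` helper are hypothetical. A real detector would use a full dictionary and feed these values into one of the classifiers above.

```python
import math
from collections import Counter

# Tiny hypothetical word list; a real system would use a full dictionary.
DICTIONARY = {"mail", "google", "book", "face", "shop", "news"}


def char_ngrams(text, n=2):
    """All overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def entropy(text):
    """Shannon entropy of the characters, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(text).values()
    )


def domain_features(domain, words=DICTIONARY):
    """Features for the leftmost label of a domain name."""
    label = domain.lower().split(".")[0]
    return {
        "length": len(label),
        "entropy": entropy(label),
        "bigrams": char_ngrams(label, 2),
        "dictionary_hits": sum(1 for word in words if word in label),
    }


print(domain_features("facebook.com")["dictionary_hits"])
print(domain_features("xkqzjwpv.net")["dictionary_hits"])
```

Random-looking names tend to show higher character entropy and fewer dictionary hits than human-chosen names, which is what makes these features useful.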

ADVANCED TOPICS
• NEURAL NETWORKS
• DEEP NEURAL NETWORKS

PYTHON LIBRARIES
• scikit-learn
• NumPy
• pandas
