MALWARE DETECTION USING MACHINE LEARNING
ABHIJIT MOHANTA
ABOUT PRESENTER
• Worked as a security researcher for Symantec, McAfee, and Cyphort
• Experience in reverse engineering, malware analysis, and detection
• Worked on antivirus engines and sandbox engines
DISCLAIMER
I have used some content from the following sites.
References:
• analyticsvidhya.com
• datadrivensecurity.info
• home.agh.edu.pl
• neuralnetworksanddeeplearning.com
• http://www.astroml.org
• YouTube
• Google Images
Malware Detection in Antivirus
How do antiviruses detect malware?
• Traditional AVs use pattern matching on static files
• They partially decrypt files using techniques like emulation
How does malware evade antivirus?
• It uses polymorphic packers, which evade static pattern matching
Why Machine Learning?
• Too many types of malware: bots, viruses
• Further types based on target: stealers, POS malware, banking malware
• Too much data for a human to process
MACHINE LEARNING INTRO
• Some prerequisites: statistics, calculus, vectors, algebra
• Problems solved: classification / regression
• Types: supervised, semi-supervised, unsupervised
• What is our problem? Classification
Supervised Learning
• What is it?
• Steps:
– Feature selection
– Training (provide labelled data)
– Prediction
FEATURE SELECTION
• How are features selected in classification?
• A feature is some property with which you can distinguish two classes
• A feature can be represented as a vector, a Boolean, etc.
• Apple vs. Orange classes:
– Features: colour, weight, shape
– Labels: apple, orange
MODEL SELECTION
Models for supervised learning:
• K-Nearest Neighbours (KNN), for classification
• K-Means clustering (an unsupervised method)
• SVM
• Decision Tree
• Random Forest
• Naive Bayes algorithm
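K-Nearest Neighbours (KNN)
• Supervised learning
• Classification algorithm
• Classifies by similarity to neighbours, measured with a distance metric (Euclidean, Manhattan, Minkowski)
• Euclidean distance
• Picture a circle around the point to be classified that contains its k nearest points; the majority class among them is the prediction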
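The neighbour-voting idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a production classifier: the function names `euclidean` and `knn_predict`, the fruit measurements, and the choice of k = 3 are all assumptions made up for this example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, made-up data: (weight in grams, redness score from 0 to 1)
fruit = [
    ((150, 0.90), "apple"),
    ((160, 0.80), "apple"),
    ((170, 0.85), "apple"),
    ((140, 0.20), "orange"),
    ((130, 0.30), "orange"),
    ((135, 0.25), "orange"),
]

print(knn_predict(fruit, (155, 0.82)))
print(knn_predict(fruit, (133, 0.28)))
```

Note that the weight column dominates the distance here because the features are on very different scales; in practice features are usually normalised before distances are computed.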
K-Means
• Unsupervised learning
• Clustering algorithm
• Given some data, we cluster the data into K groups
• In each iteration the mean value of each cluster is updated
• Distance from each point to a cluster centre is calculated using Euclidean distance
• Ref video: https://www.youtube.com/watch?v=aiJ8II94qck
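A minimal sketch of the K-Means loop, assuming Python 3.8+ for `math.dist`. The function name `kmeans`, the sample points, and the hand-picked starting centres are illustrative assumptions; real implementations usually choose the starting centres more carefully.

```python
import math

def kmeans(points, centres, iterations=10):
    """Plain K-Means on tuples of floats, starting from the given centres.

    Returns the final centres and the clusters from the last assignment step.
    """
    centres = [tuple(c) for c in centres]
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        new_centres = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
        if new_centres == centres:  # nothing moved, so the loop has converged
            break
        centres = new_centres
    return centres, clusters

# Made-up 2-D points forming two visually obvious groups (K = 2)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, clusters = kmeans(points, centres=[points[0], points[3]])
print(centres)
print(clusters)
```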
Support Vector Machines
• Classifier
• What are support vectors?
• Linearly separating hyperplane
• Margins with maximum separation
Support Vector Machines
• Ref: http://www.saedsayad.com/support_vector_machine.htm
• Videos:
– https://www.youtube.com/watch?v=1NxnPkZM9bc
– https://www.youtube.com/watch?v=5zRmhOUjjGY
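For reference, the idea of a linearly separating hyperplane with maximum margin is usually written as the standard hard-margin problem. The symbols are introduced here and do not appear in the slides: $\mathbf{w}$ is the normal vector of the hyperplane, $b$ its offset, $\mathbf{x}_i$ a training point, and $y_i \in \{-1, +1\}$ its class label.

```latex
\begin{aligned}
&\text{hyperplane:} && \mathbf{w}^\top \mathbf{x} + b = 0 \\
&\text{maximise the margin:} && \min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
  \quad \text{subject to} \quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\quad i = 1, \dots, n \\
&\text{margin width:} && \frac{2}{\lVert \mathbf{w} \rVert}
\end{aligned}
```

The support vectors are the training points for which the constraint holds with equality, that is, the points lying exactly on the margin.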
Decision Tree
Ref: https://databricks.com/blog/2014/09/29/scalable-decision-trees-in-mllib.html
Random Forest
• Ensemble learning method
• Uses the output of multiple decision trees
Ref: https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
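A short sketch of training and querying a random forest with scikit-learn's `RandomForestClassifier`. The feature rows are invented for illustration (file size, signed flag, icon flag, byte entropy) and are far too few for a real model; `n_estimators=50` and `random_state=0` are arbitrary choices, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up rows: [file size in KB, is_signed (0/1), has_icon (0/1), byte entropy]
X_train = [
    [120, 0, 0, 7.8],
    [300, 0, 0, 7.5],
    [90, 0, 1, 7.9],
    [200, 0, 0, 7.6],
    [2048, 1, 1, 5.1],
    [512, 1, 1, 4.8],
    [1024, 1, 1, 5.5],
    [4096, 1, 1, 6.0],
]
y_train = ["malware"] * 4 + ["clean"] * 4

# Each of the 50 trees is trained on a bootstrap sample of the rows;
# the forest combines the trees' outputs into one prediction.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

sample = [[150, 0, 0, 7.7]]  # small, unsigned, no icon, high entropy
print(forest.predict(sample))
print(forest.predict_proba(sample))  # columns follow forest.classes_
```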
Features for Malware Detection
• Static:
– Size
– Signed/unsigned
– Icon: exe file without icons
– Entropy
• Behaviour:
– Process executed from %appdata% or %temp%
– Dropped file has a random name, e.g. xszsde.exe
– Process creating run entries
– Code injection
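Two of these features are easy to sketch in plain Python: the byte entropy of a file's contents and a Boolean flag for a process path under AppData or Temp. Both helper names (`byte_entropy`, `runs_from_user_temp`) and the example path are hypothetical, and the path check is a simplification that only looks at folder names rather than resolving the real %appdata% and %temp% locations.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def runs_from_user_temp(path: str) -> bool:
    """Simplified check: does the path contain an AppData or Temp folder?"""
    parts = path.lower().replace("/", "\\").split("\\")
    return "appdata" in parts or "temp" in parts

print(byte_entropy(b"AAAAAAAA"))        # one repeated byte value
print(byte_entropy(bytes(range(256))))  # every byte value exactly once
print(runs_from_user_temp(r"C:\Users\bob\AppData\Roaming\xszsde.exe"))
```

Packed or encrypted data tends towards the upper end of the entropy range, which is why entropy is a common static feature.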
Training Sets for malware
Some applications for Malware Traffic Detection
• DGA (Domain Generation Algorithm) detection
• DGA: what is a DGA?
• Features:
– N-grams
– Entropy
– Dictionary
– Reference: http://datadrivensecurity.info
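A rough sketch of how these three features might be computed for a single domain name. Everything here is an assumption for illustration: the helper names, the tiny `KNOWN_GOOD` and `TOY_WORDS` lists, and the made-up domain `xszsde.net`. It is not the method used at the referenced site, which works from much larger reference data.

```python
import math
from collections import Counter

KNOWN_GOOD = ["google", "facebook", "wikipedia", "youtube", "amazon"]  # toy reference names
TOY_WORDS = {"face", "book", "mail", "tube", "news", "shop"}           # toy dictionary

def char_entropy(text):
    """Shannon entropy of the characters in `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())

def ngrams(text, n=2):
    """All overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def bigram_overlap(label, reference=KNOWN_GOOD):
    """Fraction of the label's bigrams that also occur in the reference names."""
    seen = {g for name in reference for g in ngrams(name, 2)}
    grams = ngrams(label, 2)
    return sum(g in seen for g in grams) / len(grams) if grams else 0.0

def dga_features(domain):
    """Feature dictionary for the left-most label of a domain name."""
    label = domain.lower().split(".")[0]
    return {
        "entropy": char_entropy(label),
        "bigram_overlap": bigram_overlap(label),
        "dictionary_hits": sum(1 for word in TOY_WORDS if word in label),
    }

good = dga_features("facebook.com")
bad = dga_features("xszsde.net")
print(good)
print(bad)
```

In this toy run the legitimate label actually scores higher on entropy than the random-looking one, so entropy is best read alongside the n-gram and dictionary features rather than on its own.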
ADVANCED TOPICS
• NEURAL NETWORKS
• DEEP NEURAL NETWORKS
PYTHON LIBRARIES
• scikit-learn
• NumPy
• pandas