
Page 1: Entropy and Malware Detection ITEC808 – Final Project Presentation

Entropy and Malware Detection

ITEC808 – Final Project Presentation

Vithushan Sivalingam
Student No: 42413753

Supervisors: Prof. Vijay Varadharanjan & Dr Udaya Tupakula

11th November 2011

1/30

Contents
• Introduction
• Project Aims
• Shannon’s Entropy Review
• Malware
• Entropy techniques with malware
• Analysis of the schemes
• Discussion
• Conclusion
• Future Works

2/30

Introduction

Entropy quantifies the uncertainty involved in predicting the value of a random variable.

The outcome of a fair coin flip (two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (six equally likely outcomes).
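As a worked check of this comparison, taking base-2 logarithms so that entropy is measured in bits, n equally likely outcomes give an entropy of log2 n:

$$H_{\text{coin}} = \sum_{i=1}^{2} \frac{1}{2}\log_2 2 = 1\ \text{bit}, \qquad H_{\text{die}} = \sum_{i=1}^{6} \frac{1}{6}\log_2 6 = \log_2 6 \approx 2.585\ \text{bits}$$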

In the real world, most collections of data fall somewhere in between, and a detector working on them can give false information:

False positive: the software is identified as malicious, but it turns out not to be.

False negative: the software is malicious, but the detector fails to identify it and misses it.
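The two error types can be counted directly from labelled results. The sketch below is a minimal illustration, assuming each sample carries a boolean label (True = malicious); the function name `error_rates` and the five sample labels are hypothetical.

```python
def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate).

    `predicted` and `actual` are equal-length sequences of booleans,
    where True means "malicious".
    """
    pairs = list(zip(predicted, actual))
    fp = sum(1 for p, a in pairs if p and not a)  # flagged, but actually benign
    fn = sum(1 for p, a in pairs if a and not p)  # malicious, but missed
    negatives = sum(1 for _, a in pairs if not a)
    positives = sum(1 for _, a in pairs if a)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Hypothetical labels for five samples (True = malicious).
actual = [True, True, False, False, False]
predicted = [True, False, True, False, False]
print(error_rates(predicted, actual))  # (0.3333333333333333, 0.5)
```

Rates rather than raw counts are used so that results stay comparable when benign samples far outnumber malicious ones.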

3/30

Malware detection plays a significant role in protecting against attacks launched across communication networks.

However, malware detection tools still cannot fully protect against encrypted and packed malware.

This project explores how entropy techniques can improve malware detection.

4/30

Project Aims

The main goal of this project was to investigate the development of suitable entropy techniques to detect malware.

The ITEC808 literature review components are:
• Reviewing Shannon’s entropy method.
• Identifying malware attributes and functionality.
• Developing a detailed understanding of entropy techniques and malware detection.
• Studying entropy-based malware detection schemes.
• Analysing and reasoning about the efficiency of the proposed schemes.

5/30

Problems and Significance

Understanding the entropy theorem.

Malware growth, and identifying malware attributes and functionality.

Understanding the statistical variation in malware executables.

6/30

Investigate the development of suitable entropy techniques to detect malware.

This could help security analysts identify malware samples (packed or encrypted) more efficiently.

7/30

Shannon’s Entropy Review

Point to Point Communication.

• Given two random variables, what can we say about one when we know the other? This is the central problem in information theory.

• Keywords: choice, uncertainty and entropy

8/30

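The entropy of a random variable X is defined by

$$H(X) = -K\sum_{i=1}^{n} p_i \log p_i$$

which, taking the constant K = 1, can also be written as

$$H(X) = \sum_{i=1}^{n} p_i \log\frac{1}{p_i}$$

• X: information source, whose n possible outcomes occur with probabilities p_1 to p_n

The entropy is non-negative. It is zero when the value of the random variable is certain to be predicted.

9/30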
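The definition translates directly into code. This is a minimal sketch that assumes K = 1 and base-2 logarithms, so the result is in bits; `shannon_entropy` is an illustrative name, not taken from the schemes reviewed here.

```python
import math

def shannon_entropy(probs, base=2.0):
    """H(X) = sum_i p_i * log(1 / p_i), i.e. Shannon's formula with K = 1.

    Outcomes with p_i = 0 are skipped, since p * log(1/p) tends to 0 as p -> 0.
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin
print(shannon_entropy([1/6] * 6))   # fair die
print(shannon_entropy([1.0]))       # certain outcome
```

Changing `base` only rescales the result (base e gives nats instead of bits), which is the role the constant K plays in the first form of the formula.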

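Fair coin flip {0.5, 0.5} (fair distribution)
◦ $H(X) = 0.5\log_2\frac{1}{0.5} + 0.5\log_2\frac{1}{0.5} = 1$ bit (receive 1 bit of information)

Double-headed coin {1} (known distribution)
◦ $H(X) = 1\cdot\log_2\frac{1}{1} = 0$ bits

Unfair coin {0.75, 0.25} (unfair distribution)
◦ $H(X) = 0.75\log_2\frac{1}{0.75} + 0.25\log_2\frac{1}{0.25} \approx 0.811$ bits

10/30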

11/30

[Figure: entropy H(X), in bits, plotted against probability]

Fair distribution: entropy reaches its highest level (1 bit) at P = 0.5.

Known distribution (P = 1 or 0): entropy is 0 bits; no information is gained.

Unfair (unbalanced) distribution: entropy is lower than the maximum.
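The curve on this slide is the binary entropy function. The sketch below samples it at a few probabilities, assuming base-2 logarithms; it reproduces the three cases above: 1 bit at P = 0.5, about 0.811 bits at P = 0.25 or 0.75, and 0 bits at P = 0 or 1.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-outcome source with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # known distribution: the outcome is certain
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

# Sample the curve plotted on the slide.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P = {p:.2f}  H(X) = {binary_entropy(p):.3f} bits")
```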

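Joint entropy: for two random variables X and Y, the joint entropy is defined by

$$H(X,Y) = -\sum_{x}\sum_{y} p(x,y)\log p(x,y)$$

Conditional entropy: when two random variables X and Y are dependent, the conditional entropy H(X|Y) is the extra information X contains once Y is disclosed:

$$H(X|Y) = -\sum_{x}\sum_{y} p(x,y)\log p(x|y)$$

These definitions continue with the chain rules of entropy.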
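Both definitions can be evaluated from a joint probability table. This minimal sketch assumes the table is a Python dict mapping (x, y) pairs to probabilities and uses base-2 logarithms; the function names and the example distribution are hypothetical.

```python
import math

def joint_entropy(pxy):
    """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y); pxy maps (x, y) -> probability."""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

def conditional_entropy(pxy):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y) / p(y)."""
    py = {}
    for (_, y), p in pxy.items():
        py[y] = py.get(y, 0.0) + p  # marginal distribution of Y
    return -sum(p * math.log2(p / py[y]) for (_, y), p in pxy.items() if p > 0)

# Hypothetical joint distribution: X and Y agree 75% of the time.
pxy = {(0, 0): 0.375, (1, 1): 0.375, (0, 1): 0.125, (1, 0): 0.125}
print(joint_entropy(pxy), conditional_entropy(pxy))
```

For this table Y is a fair bit, so H(Y) = 1 and the two printed values differ by 1 bit, consistent with the chain rule H(X,Y) = H(Y) + H(X|Y).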

12/30

◦ $H(X) - H(X|Y) = H(Y) - H(Y|X)$ (the mutual information)
◦ $H(X,Y) = H(X) + H(Y)$ when X and Y are independent
◦ $H(X,Y) < H(X) + H(Y)$ when X and Y are dependent
◦ $H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$ (chain rule)

These entropy measures help to build the detection models.

13/30

[Figure: relationship between entropy, joint entropy, conditional entropy and mutual information (information gain)]
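Mutual information (information gain) follows from the identities above as I(X;Y) = H(X) + H(Y) - H(X,Y). A minimal sketch, assuming a joint table given as a dict from (x, y) pairs to probabilities (a hypothetical input format), checks the independent and dependent cases:

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), which equals H(X) - H(X|Y) by the chain rule."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px.values()) + entropy(py.values()) - entropy(pxy.values())

# Independent: p(x, y) = p(x) p(y), so H(X,Y) = H(X) + H(Y) and I(X;Y) = 0.
independent = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}
# Dependent (hypothetical): X and Y agree 75% of the time, so H(X,Y) < H(X) + H(Y).
dependent = {(0, 0): 0.375, (1, 1): 0.375, (0, 1): 0.125, (1, 0): 0.125}
print(mutual_information(independent), mutual_information(dependent))
```

A positive value means knowing one variable reduces uncertainty about the other, which is what makes it usable for ranking features.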

Malware

Malware is labelled by its attributes, behaviours and attack patterns.

14/30

15/30

It has been reported that among 20,000 malware samples, more than 80% were packed by packers from 150 different families.

Malware that is modified by runtime encryption or compression is known as packed malware.

This process compresses an executable file and modifies it so that the file contains the code to decompress it at runtime.

A packed executable is built from two main parts.

First, the original executable is compressed and stored inside the packed executable.

Second, a decompression section is added to the packed executable. (This section is used to restore the original executable at runtime.)

16/30

Entropy techniques with malware

The entropy of packed information is higher than that of the original information. Compression removes redundancy, so the resulting series of bits becomes more unpredictable, which is equivalent to higher uncertainty.

◦ Packed information: higher uncertainty, higher entropy

◦ Original information: lower uncertainty, lower entropy

False alarms play a big role: legitimately compressed and encrypted files could trigger false positives.
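The claim that packing raises entropy can be checked on a small scale. The sketch below works under stated assumptions: it uses zlib compression as a stand-in for a real packer, generates hypothetical instruction-like text as the "original" data, and measures entropy over byte values (0 to 8 bits per byte).

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

# Hypothetical "original" data: instruction-like text, redundant and partly predictable.
rng = random.Random(0)
words = [b"MOV", b"PUSH", b"POP", b"CALL", b"JMP", b"EAX", b"EBX", b"ECX", b"RET", b"0x401000"]
original = b" ".join(rng.choice(words) for _ in range(20000))
packed = zlib.compress(original, 9)  # zlib stands in for a real runtime packer

print(f"original: {len(original):6d} bytes, {byte_entropy(original):.2f} bits/byte")
print(f"packed:   {len(packed):6d} bytes, {byte_entropy(packed):.2f} bits/byte")
```

The same measurement would also give a high score to a legitimately compressed archive, which is exactly the false-positive risk described above.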

17/30

However, entropy can still be used to determine whether a file is an anomaly or not.

Establish categories based on different entropy values. If the entropy is above a threshold, the software is categorised as malicious; below that value, it is categorised as not malicious.

In other words, entropy can be used as a measure to classify software as malware.

18/30

[Figure: software classified as Not Malicious or Malicious according to its entropy relative to a threshold]
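A threshold rule of this kind can be sketched in a few lines. The cut-off of 7.0 bits per byte is a hypothetical value chosen only for illustration, not one taken from the reviewed schemes, and `classify` is an illustrative name.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Bits per byte of the byte-value distribution (0 = constant, 8 = uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

# Hypothetical cut-off for illustration only; a real threshold would have to be
# calibrated on labelled samples to trade false positives against false negatives.
THRESHOLD_BITS = 7.0

def classify(data: bytes, threshold: float = THRESHOLD_BITS) -> str:
    return "Malicious" if byte_entropy(data) > threshold else "Not Malicious"

print(classify(b"plain, repetitive text " * 100))  # Not Malicious
print(classify(bytes(range(256)) * 16))            # Malicious (exactly 8 bits/byte)
```

Because the rule looks only at entropy, any compressed or encrypted benign file would land on the "Malicious" side as well; the threshold alone cannot separate them.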

Analysis of the schemes

Information-theoretic Measures for Anomaly Detection

Objective: provide a theoretical foundation as well as useful tools that can facilitate the IDS development process and improve the effectiveness of intrusion detection (ID) technologies.

Experiments on:
• University of New Mexico (UNM) sendmail system call data
• MIT Lincoln Lab sendmail BSM data
• MIT Lincoln Lab tcpdump data

19/30

Approach:
• Entropy and conditional entropy measure regularity.
◦ Determine how to build a model.
• Joint (conditional) entropy measures how the regularities of the training and test datasets relate.
◦ Determine the performance of a model on test data.
• A classification approach: given the first k system calls, predict the (k+1)th system call.
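The regularity measure behind this approach is the conditional entropy of the next system call given the k calls before it, estimated from all sliding windows of length k + 1. The sketch below is a minimal empirical version under that assumption; the function name is illustrative and the system-call trace is hypothetical, not drawn from the UNM data.

```python
import math
from collections import Counter

def conditional_entropy_next_call(trace, k):
    """Empirical H(next call | previous k calls) over all sliding windows of length k + 1."""
    windows = [tuple(trace[i:i + k + 1]) for i in range(len(trace) - k)]
    n = len(windows)
    if n == 0:
        return 0.0
    joint = Counter(windows)                    # counts of (k-call context, next call)
    context = Counter(w[:-1] for w in windows)  # counts of the k-call context alone
    # H = -sum p(context, next) * log2 p(next | context)
    h = -sum(c / n * math.log2(c / context[w[:-1]]) for w, c in joint.items())
    return h if h > 0 else 0.0  # avoid reporting -0.0 for fully regular traces

# Hypothetical system-call trace (illustrative names, not the UNM data).
trace = ["open", "read", "mmap", "read", "close"] * 20 + ["open", "write", "close"]
for k in range(1, 6):
    print(f"k = {k}: {conditional_entropy_next_call(trace, k):.4f} bits")
```

On this trace the value drops sharply from k = 1 to k = 2, because two preceding calls resolve whether "read" is followed by "mmap" or "close"; the remaining uncertainty comes from the single unusual "write".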

20/30

Conditional Entropy of Training Data (UNM)

21/30

[Figure: conditional entropy (0 to 0.6) against sliding window size (1 to 17) for the UNM training traces bounce-1.int, bounce.int, queue.int, plus.int and sendmail.int, with their total and mean]

• The more information is included (a longer sliding window), the more regular the dataset, i.e. the lower its conditional entropy.

Misclassification Rate: Training Data

22/30

• Misclassification means that the classification process classifies an item to be in class A while the actual class is B.

• The misclassification rate is used to measure anomaly detection performance.

[Figure: misclassification rate (0 to 50) against sliding window size (1 to 17) for the training traces bounce-1.int, bounce.int, queue.int, plus.int and sendmail.int, with their total and mean]
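The definition on this slide reduces to comparing predicted and actual classes item by item. In this minimal sketch the classes are next-system-call predictions; the labels and the function name are hypothetical.

```python
def misclassification_rate(predicted, actual):
    """Fraction of items placed in class A while their actual class is some other B."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical next-system-call predictions against the calls actually observed.
predicted = ["read", "mmap", "close", "open"]
actual = ["read", "read", "close", "open"]
print(misclassification_rate(predicted, actual))  # 0.25
```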

Conditional Entropy vs. Misclassification Rate

23/30

[Figure: conditional entropy and misclassification rate (0 to 1.2) against sliding window size (1 to 17); series total-CondEnt, total-MisClass, mean-CondEnt and mean-MisClass]

• The misclassification rate moves in step with the conditional entropy.

• The conditional entropy can therefore be used to estimate the movement of the misclassification rate and to select a sequence length for the detection model.

• For example, length 6 is better than 4, and 14 is better than 6.

Misclassification Rate of Testing Data and Intrusion Data

24/30

[Figure: misclassification rate (0 to 50) against sliding window size (1 to 17) for the normal testing traces bounce-1.int, bounce.int, queue.int, plus.int, sendmail.int and their total, and for the intrusion traces sm-10763.int, syslog-local-1.int and fwd-loops-1.int to fwd-loops-5.int]

• The misclassification rate is used as an indicator to determine whether a trace is abnormal or normal.
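Used as an indicator, the misclassification rate flags a trace as abnormal when a model trained on normal data predicts it poorly. This sketch makes two assumptions beyond the slide: a simple most-frequent-next-call table stands in for the classifier used in the original scheme, and the 0.1 threshold is hypothetical.

```python
from collections import Counter, defaultdict

def train(trace, k):
    """Map each k-call context to its most frequent next call in a normal trace."""
    table = defaultdict(Counter)
    for i in range(len(trace) - k):
        table[tuple(trace[i:i + k])][trace[i + k]] += 1
    return {ctx: nxt.most_common(1)[0][0] for ctx, nxt in table.items()}

def misclassification_rate(model, trace, k):
    """Fraction of windows whose (k+1)th call differs from the prediction (unseen contexts count as wrong)."""
    windows = range(len(trace) - k)
    wrong = sum(model.get(tuple(trace[i:i + k])) != trace[i + k] for i in windows)
    return wrong / len(windows) if len(windows) else 0.0

def is_abnormal(model, trace, k, threshold):
    """Flag a trace whose misclassification rate exceeds the (hypothetical) threshold."""
    return misclassification_rate(model, trace, k) > threshold

normal = ["open", "read", "mmap", "read", "close"] * 20
suspect = ["open", "exec", "socket", "write", "close"] * 4
model = train(normal, k=2)
threshold = 0.1  # hypothetical; would be calibrated on held-out normal traces
print(is_abnormal(model, normal, 2, threshold), is_abnormal(model, suspect, 2, threshold))
```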

Other Schemes

Objectives

• “Using Entropy Analysis to Find Encrypted and Packed Malware”: how to use entropy to quickly and efficiently identify packed or encrypted malware executables, with results from a testing methodology.
◦ Bintropy technique

• “Entropy Estimation for Real-Time Encrypted Traffic Identification”: describes a novel approach to classifying network traffic as encrypted or unencrypted.
◦ Real-time encrypted traffic detector (RTETD)
◦ The classifier is able to operate in real time, as only the first packet of each flow is processed.
◦ Encrypted Skype traffic was used in the evaluation.
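An entropy scan can work on fixed-size blocks rather than on the whole file, so that a small high-entropy region is not averaged away. The sketch below follows that idea in the spirit of the bintropy technique; the 256-byte block size, the (average, maximum) summary and the sample data are assumptions for illustration, not the tool's exact algorithm.

```python
import math
import os
from collections import Counter

def block_entropies(data: bytes, block_size: int = 256):
    """Byte entropy (bits/byte) of each non-overlapping block; a trailing partial block is ignored."""
    result = []
    for start in range(0, len(data) - block_size + 1, block_size):
        counts = Counter(data[start:start + block_size])
        result.append(-sum(c / block_size * math.log2(c / block_size) for c in counts.values()))
    return result

def summarize(data: bytes, block_size: int = 256):
    """Return (average, maximum) block entropy, the kind of summary a scan might threshold on."""
    ents = block_entropies(data, block_size)
    if not ents:
        return 0.0, 0.0
    return sum(ents) / len(ents), max(ents)

# Hypothetical file: a plain-text header followed by a random (encrypted-looking) region.
sample = b"This program cannot be run in DOS mode. " * 32 + os.urandom(4096)
avg, peak = summarize(sample)
print(f"average {avg:.2f}, peak {peak:.2f} bits/byte")
```

With 256-byte blocks even random data scores below 8 bits per byte, because a block is too short to contain every byte value equally often; any threshold therefore has to be calibrated for the chosen block size.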

25/30

Discussion

Through studying the schemes and information theory, I found the following.

Entropy can be used to measure the regularity of datasets consisting of a mixture of records.

Conditional entropy can be used to measure the regularity of sequential dependencies in datasets of structured records.

Relative entropy can be used to measure the relationship between the regularity (consistency) measures of two datasets.

Information gain can be used to measure how useful a feature is for classifying data items.
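Relative entropy (Kullback-Leibler divergence) compares the distribution of events in one dataset with that of another: it is 0 when the two match and grows as they diverge. A minimal sketch over empirical event frequencies, with hypothetical system-call datasets and an illustrative function name:

```python
import math
from collections import Counter

def relative_entropy(p_data, q_data):
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)) over the empirical event
    frequencies of two datasets. Returns math.inf if p contains an event q lacks."""
    if not p_data:
        return 0.0
    if not q_data:
        return math.inf
    p_counts, q_counts = Counter(p_data), Counter(q_data)
    d = 0.0
    for event, count in p_counts.items():
        p = count / len(p_data)
        q = q_counts[event] / len(q_data)
        if q == 0:
            return math.inf
        d += p * math.log2(p / q)
    return d

# Hypothetical system-call records.
train = ["open", "read", "read", "close"] * 25
test_ok = ["open", "read", "close", "read"] * 10     # same mix of events as train
test_odd = ["open", "exec", "socket", "close"] * 10  # contains unseen events
print(relative_entropy(test_ok, train), relative_entropy(test_odd, train))  # 0.0 inf
```

Returning infinity for an event never seen in the reference data follows the textbook definition; a practical detector might smooth the counts instead.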

26/30

Conclusion

Review and analysis of Shannon’s entropy, with examples.

Research and identification of the functionality, characteristics and attributes of (packed) malware.

Analysis of entropy-based schemes.

These findings will be followed up in future work.

27/30

28/30

• Investigation of entropy analysis for selected software samples.
◦ Use the entropy techniques to compute entropy scores for the selected malware executable samples.

• Identify the experimental tools.
◦ We plan to analyse the malware samples using commercial tools, e.g. the PECompact executable compressor.

References

1. C. E. Shannon. A Mathematical Theory of Communication. Reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948.

2. M. Morgenstern and A. Marx. Runtime packer testing experiences. In Proceedings of the 2nd International CARO Workshop, 2008.

3. *W. Lee and D. Xiang. Information-theoretic Measures for Anomaly Detection. In IEEE Symposium on Security and Privacy, Oakland, CA, pp. 130–143, 2001.

4. M. Morgenstern and H. Pilz (AV-Test GmbH, Magdeburg, Germany). Useful and useless statistics about viruses and anti-virus programs. Presented at CARO 2010, Helsinki.

5. *R. Lyda and J. Hamrock. Using Entropy Analysis to Find Encrypted and Packed Malware. IEEE Security & Privacy, Vol. 5, Issue 2, pp. 40–45, March–April 2007. DOI: 10.1109/MSP.2007.48.

6. G. Jeong, E. Choo, J. Lee, M. Bat-Erdene and H. Lee. Generic Unpacking using Entropy Analysis. Div. of Computer & Communication Engineering, Korea University, Seoul, Republic of Korea, 2010.

7. *P. Dorfinger, G. Panholzer and W. John. Entropy Estimation for Real-Time Encrypted Traffic Identification. Salzburg Research, Salzburg, Austria, 2010.

29/30

Thank you.

30