Hunting for Metamorphic Engines

Wing Wong and Mark Stamp

In This Paper, We…

Analyze metamorphic malware
  o Hacker-produced metamorphic code
Measure similarity of software
  o Based on n-gram analysis
Compute scores
  o Based on n-grams and
  o Based on HMMs
This paper is a baseline for future work

Motivation

Many virus construction kits available
  o Many can produce metamorphic code
So anybody can create a “new” version of existing malware
  o Virtually no technical expertise required
How “effective” is the resulting metamorphic code?
Can we detect metamorphic malware?

Background

Encrypted, polymorphic, metamorphic
  o Metamorphic == body polymorphic
Metamorphic vs cloned software
  o Clone is the norm, but metamorphic could offer advantages to the good guy too…
From the theory, we know malware detection is NP-complete
  o And metamorphic is at least as hard
  o But what about the practical situation?

Metamorphism

Metamorphic code changes its “shape”
Well-known examples
  o W95/Regswap
  o W32/Ghost
  o W95/Zperm
  o MetaPHOR

Metamorphism

General techniques available
  o Insertion
  o Substitution
  o Transposition
  o Deletion

Some easier to implement than others

Some more effective against certain detection strategies

Virus Construction Kits

In this paper, we consider
  o PS-MPC (Phalcon/Skism Mass Produced Code generator)
  o G2 (Second Generation virus generator)
  o MPCGEN (Mass Produced Code GENerator)
  o NGVCK (Next Generation Virus Construction Kit)
  o VCL32 (Virus Creation Lab for Win32)

Virus Construction Kits

Did not consider MetaPHOR
  o Difficult to work with, finicky
All of these claim to be metamorphic
Are they really?
  o How can we measure “metamorphism”?
If they are highly metamorphic, can we still detect them?

Brief Review of Malware Detection

First generation
  o Signature scanning, wildcards OK
Second generation
  o Approximate signature scanning; e.g., ignore NOP instructions
Code emulation
Heuristic analysis
  o Static or dynamic, false positives…
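
As a rough illustration of signature scanning with wildcards, and of the approximate variant that ignores NOPs, here is a minimal sketch. The function name `scan`, the use of `None` as a wildcard, and the sample bytes are assumptions made for this example. Skipping raw 0x90 bytes is a simplification: a real scanner would decode instructions first, since 0x90 can also appear as an operand byte.

```python
NOP = 0x90  # single-byte x86 NOP opcode

def scan(data, signature, skip_nops=False):
    """Return the offset of the first signature match in data, or -1.

    signature is a sequence of byte values; None acts as a wildcard
    that matches any single byte. With skip_nops=True, NOP bytes in
    data are ignored before each signature byte is compared.
    """
    for start in range(len(data)):
        pos = start
        matched = True
        for want in signature:
            if skip_nops:
                # Approximate matching: step over inserted NOP bytes.
                while pos < len(data) and data[pos] == NOP:
                    pos += 1
            if pos >= len(data) or (want is not None and data[pos] != want):
                matched = False
                break
            pos += 1
        if matched:
            return start
    return -1

# Hypothetical sample: two NOPs inserted in the middle of the signature.
sample = bytes([0x55, 0x8B, 0xEC, 0x90, 0x90, 0xB8, 0x01, 0x00])
sig = [0x8B, 0xEC, 0xB8, None, 0x00]
exact_hit = scan(sample, sig)                   # exact scan misses
approx_hit = scan(sample, sig, skip_nops=True)  # NOP-tolerant scan hits
```

Here the exact scan fails because of the inserted NOPs, while the NOP-tolerant scan finds the signature at offset 1.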

Machine Learning

Consider the following
  o Data Mining, Neural Networks, HMMs
Data Mining
  o Malware-related previous work
  o Generic approach
Neural Networks
  o Previous work based on byte trigrams
  o Developed and used at IBM

Hidden Markov Models

Train HMM on metamorphic family
Then we can score any file to see how “close” it is to the family
What to use to train such an HMM?
  o Raw bytes in exe?
  o Disassembled code?
  o Opcode sequence?

More on this later…

Software Similarity

How to quantify metamorphism?
In general, how to measure similarity of software?
Given program 1 and program 2…
We develop a score
  o Score of 0 means “no similarity”
  o Score of 1 means “virtually identical”

N-gram Similarity

Given executable files X and Y
Extract opcode sequences from each
  o Suppose X has n opcodes
  o Suppose Y has m opcodes
How to compare the sequences?
Many possible ways; here we use n-gram analysis
  o That is, we compare subsequences

N-gram Similarity

Extracted opcode sequences
  o X = (x_0, x_1, …, x_{n-1}) and Y = (y_0, y_1, …, y_{m-1})
Compare subsequences of length k
  o Then x_i, x_{i+1}, …, x_{i+k-1} matches y_j, y_{j+1}, …, y_{j+k-1} if they are the same in any order
  o For each such match, plot the point (i, j)
  o Remove any segments of fewer than p points
Then score = (fraction of x axis covered + fraction of y axis covered) / 2
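
The scoring steps on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a "segment" is a run of consecutive matches along a diagonal of the plot, that a match at (i, j) covers k opcodes on each axis, and that coverage is measured as a fraction of sequence length. The function name `similarity` and the defaults `k=3` and `p=5` are illustrative choices.

```python
from collections import defaultdict

def similarity(x, y, k=3, p=5):
    """N-gram similarity score in [0, 1] for opcode sequences x and y."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0.0
    # Index each length-k window of y by its sorted contents, so two
    # windows match when they hold the same opcodes in any order.
    windows = defaultdict(list)
    for j in range(m - k + 1):
        windows[tuple(sorted(y[j:j + k]))].append(j)
    # Every match becomes a point (i, j) of the plot.
    points = set()
    for i in range(n - k + 1):
        for j in windows.get(tuple(sorted(x[i:i + k])), ()):
            points.add((i, j))
    # Keep only diagonal runs with at least p points (assumed meaning
    # of "segment") and record which opcode positions they cover.
    covered_x, covered_y = set(), set()
    for (i, j) in points:
        if (i - 1, j - 1) in points:
            continue  # not the start of a run
        length = 0
        while (i + length, j + length) in points:
            length += 1
        if length >= p:
            # A run of `length` windows spans length + k - 1 opcodes.
            covered_x.update(range(i, i + length + k - 1))
            covered_y.update(range(j, j + length + k - 1))
    return (len(covered_x) / n + len(covered_y) / m) / 2

# Hypothetical opcode sequence compared with itself.
ops = ["mov", "push", "call", "add", "pop"] * 4
self_score = similarity(ops, ops)
```

With these assumptions, a sequence compared with itself is fully covered by the main diagonal, and two sequences with no opcodes in common score 0.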

N-gram Similarity Example

N-gram Similarity

Score is between 0 and 1
If program X is identical to program Y
  o Main diagonal is a solid line
  o And score = 1
Minimum score is 0
The smaller the score, the less similar are the programs

Typical N-gram Similarity

[Figure: normal (cygwin utility) files]

Typical N-gram Similarity

[Figure: NGVCK]

Typical N-gram Similarity

[Figure: G2]

N-gram Similarity

Compare members of a “family” with each other

N-gram Similarity

In graphical form…

N-gram Similarity Conclusion?

G2 viruses more similar to each other than expected
  o So, they are not very metamorphic
  o Ditto for most of the other generators
But, NGVCK viruses more different from each other than expected
  o So, they are highly metamorphic

Implication wrt signature detection?

NGVCK Similarity

Compare NGVCK to other families…

NGVCK Similarity Conclusion?

NGVCK viruses very different from each other
  o Implies highly metamorphic…
  o …so, signature detection will fail
But NGVCK viruses are even more different from normal files
  o Then what about detection?

Aside: Similar Similarity Measures to Consider?

Given opcode sequences
  o Edit distance
  o Other sequence comparison techniques
  o Statistical measures
Considering raw bytes
  o Statistical measures
  o Entropy and other “structural” measures
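
As one example of the alternatives listed here, edit distance between two opcode sequences could be computed with the standard Levenshtein dynamic program. This is only a sketch of that generic technique. The slides do not say which variant, if any, was evaluated, and the function name `edit_distance` and the sample opcodes are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b.

    Counts the minimum number of single-element insertions,
    deletions, and substitutions needed to turn a into b.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, 1):
        cur = [i]
        for j, item_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # delete item_a
                cur[j - 1] + 1,                    # insert item_b
                prev[j - 1] + (item_a != item_b),  # substitute or keep
            ))
        prev = cur
    return prev[-1]

# Hypothetical opcode sequences: the second has one inserted NOP.
original = ["mov", "add", "ret"]
variant = ["mov", "nop", "add", "ret"]
distance = edit_distance(original, variant)
```

Because it works on any sequence, the same function applies to lists of opcodes or to raw byte strings.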

Hidden Markov Models

Generic view of HMM

HMM Notation

HMM for Metamorphic Detection

Train HMM
  o Extract opcodes from family executables
  o Append opcode sequences
  o Train a model, i.e., determine matrices
Use trained HMM to score files
  o Given a file, extract opcode sequence
  o Score sequence against the model
  o Compare to predetermined threshold
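
The scoring step can be sketched with the standard scaled forward algorithm. This is a minimal sketch assuming the model has already been trained (training, for example with Baum-Welch, is not shown) and that opcodes are encoded as integer symbol indices. The names `A`, `B`, and `pi` follow common HMM notation for the transition matrix, observation matrix, and initial distribution; the function name, the toy model, and the threshold value are hypothetical.

```python
import math

def log_likelihood_per_opcode(obs, A, B, pi):
    """Score obs against the HMM (A, B, pi), normalized by length.

    A[i][j] is the probability of moving from state i to state j,
    B[i][o] is the probability of emitting symbol o in state i,
    pi[i]   is the probability of starting in state i.
    """
    if not obs:
        raise ValueError("need at least one observation")
    n_states = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences;
        # the log of the sequence probability is the sum of log scales.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob / len(obs)

# Hypothetical two-state toy model over two symbols.
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
pi = [1.0, 0.0]
THRESHOLD = -1.0  # assumed value; in practice chosen from test data
is_family = log_likelihood_per_opcode([0, 1, 0, 1], A, B, pi) >= THRESHOLD
```

A file would be classified as a family member when its score is at or above the chosen threshold.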

HMM Scoring: Fine Points

Score computed as log likelihood of the scored sequence
  o Normalize as “log likelihood per opcode” (LLPO)
  o Why LLPO? So that scores of sequences of different lengths are comparable
How to quantify effectiveness?
  o ROC curves are very useful
  o Specifically, area under ROC curve (AUC)
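
The area under the ROC curve can be computed directly from two sets of scores, since it equals the probability that a randomly chosen family file scores higher than a randomly chosen non-family file, with ties counted as one half. The sketch below uses that pairwise formulation; the function name `auc` and the sample scores are made up for illustration.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    positive_scores: scores of files that belong to the family
    negative_scores: scores of files that do not
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical LLPO scores: family files score higher than benign files.
family = [-2.0, -2.5]
benign = [-5.0, -6.0]
perfect = auc(family, benign)
```

An AUC of 1.0 means some threshold separates the two sets perfectly, while 0.5 is no better than guessing.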

Results

HMM scoring for NGVCK family

HMM Scoring: Bottom Line

Signature detection works for the metamorphic families, except NGVCK
For NGVCK, we can use HMM
  o Classification is 100% when compared to normal (benign) files
  o Some misclassifications of other malware (is that good or bad?)
Should include ROC curves, AUC, …

HMM States: 3 State Model

N-gram Score

Can also score files using N-grams
Randomly select NGVCK file
  o Extract its opcode sequence
Given a file we want to score
  o Extract its opcode sequence
  o N-gram similarity to NGVCK sequence
  o Higher similarity, classify as NGVCK
  o Lower similarity, classify as “not NGVCK”

N-gram Score Results?

For NGVCK, obtain ideal separation
  o There exists a threshold for which…
  o …we can separate NGVCK from normal
Surprisingly strong results
  o For such a simple similarity score
Why does this work?
  o We come back to this at the end…
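
"Ideal separation" means every family score lies above every normal score, so a threshold can be placed anywhere in the gap. A minimal sketch of that check follows; the function name `separating_threshold`, the midpoint choice, and the sample scores are assumptions, not values from the paper.

```python
def separating_threshold(family_scores, normal_scores):
    """Return a threshold that perfectly separates the two score sets,
    or None if the sets overlap.

    Files scoring above the threshold would be classified as family
    members, and files scoring below it as normal.
    """
    lowest_family = min(family_scores)
    highest_normal = max(normal_scores)
    if lowest_family > highest_normal:
        # Any value in the gap works; the midpoint is one simple choice.
        return (lowest_family + highest_normal) / 2
    return None

# Hypothetical similarity scores against a reference NGVCK file.
threshold = separating_threshold([0.5, 0.75], [0.125, 0.25])
```

If the two sets overlap, no single threshold classifies every file correctly and the function returns None.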

Compare to Commercial AV

Tested the following scanners on our virus sets
  o eTrust, avast!, AVG
These scanners detected most of the viruses from weak families
  o That is, G2, VCL32, etc.
But none of the NGVCK viruses were detected by any of the 3 scanners

Conclusion

HMM effective at detecting the highly metamorphic NGVCK malware family
N-gram similarity also effective
NGVCK not detected by commercial AV
So, this detection improves the state of the art
Practical considerations?

Lessons Learned?

Why can we detect NGVCK family?
In spite of high metamorphism, code is statistically different from normal
“Improved” metamorphic malware?
Metamorphism must be sufficient to evade signature detection
But, metamorphic family must be statistically similar to normal

Future Work

Build a better metamorphic generator
  o Some progress here, but still detectable using other detection methods
  o Still need better generators…
Develop and test other detection strategies
  o Lots of work done here too
  o But lots more to do

References

W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology 2(3):211-229, 2006

M. Stamp, A revealing introduction to hidden Markov models