
CLASSIFICATION OF MALWARE PERSISTENCE MECHANISMS USING LOW-ARTIFACT DISK INSTRUMENTATION

A Dissertation Presented

by

Jennifer Mankin

to

The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in

Electrical and Computer Engineering

in the field of

Computer Engineering

Northeastern University

Boston, Massachusetts

September 2013

Abstract

The proliferation of malware in recent years has motivated the need for tools to analyze, classify, and understand intrusions. Current research in analyzing malware focuses either on labeling malware by its maliciousness (e.g., malicious or benign) or on classifying it by the variant to which it belongs. We argue that, in addition to providing coarse family labels, it is useful to label malware by the capabilities it employs. Capabilities can include keystroke logging, downloading a file from the internet, modifying the Master Boot Record, and trojanizing a system binary. Unfortunately, labeling malware by capability requires a descriptive, high-integrity trace of malware behavior, which is challenging given the complex stealth techniques that malware employs in order to evade analysis and detection. In this thesis, we present Dione, a flexible rule-based disk I/O monitoring and analysis infrastructure. Dione interposes between a system-under-analysis and its hard disk, intercepting disk accesses and reconstructing high-level file system and registry changes as they occur. We evaluate the accuracy and performance of Dione, and show that it can achieve 100% accuracy in reconstructing file system operations, with a performance penalty of less than 2% in many cases.


Given the trustworthy behavioral traces obtained by Dione, we convert file system-level events to high-level capabilities. For this, we use model checking, a formal verification approach that compares a model extracted from a behavioral trace to a given specification. Since we use Dione traces of file system and registry events, we aim to label persistence capabilities; that is, we label a sample by the mechanism it uses not only to persist on disk, but also to restart after a system boot. We model the Windows service, a mechanism commonly employed by malware to persist, load a binary after reboot, and even load dangerous code into the kernel. We model the installation of a Windows service, the system boot, and the file access of the service binary. We test our models on over 1000 real-world malware samples, and show that they successfully identify service-installing malware samples over 99% of the time, and malware that loads that service over 98% of the time. Moreover, we demonstrate that we are able to use traces of disk reads to differentiate between two types of file accesses. We show that we can not only detect when a persistence mechanism is installed, but also determine that the persistence mechanism is successful, because we detect the automatic load of the program binary after a system reboot. We correctly identify file access types from disk access patterns with fewer than 4% of samples mislabeled, and demonstrate that even an expert analyst would have difficulty correctly identifying the mislabeled accesses.


Acknowledgements

First and foremost, I would like to thank my husband Dana. Not only would it have been nearly impossible to complete this work without his love and support, but it most definitely would not have been this much fun! I would also like to thank my family for everything they’ve done for me and for supporting me throughout the years. I specifically owe my success to my parents for instilling in me a love of learning and logic, and for emphasizing to me that the most important thing is to try.

The insightful and inspiring help from both my academic and industry advisors was critical throughout this entire process, culminating in this dissertation. I would like to acknowledge the tremendous support of my advisor at Northeastern, Dr. David Kaeli, and thank him for his many years of dedication to helping his students achieve great things. I also want to thank my technical supervisors at MIT Lincoln Laboratory, Charles Wright and Graham Baker, for developing this exciting research and guiding me throughout the process. Finally, I would like to thank my colleagues at Northeastern and MIT Lincoln Laboratory for their invaluable feedback and discussions.


Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organization of Dissertation
2 Background
  2.1 Malicious Software
    2.1.1 Malware Types
    2.1.2 Anti-Forensics Techniques
    2.1.3 Evasion Techniques
  2.2 Malware Analysis
    2.2.1 Static Binary Analysis
    2.2.2 Dynamic Analysis
  2.3 Windows Concepts
    2.3.1 The Windows Registry
    2.3.2 NTFS File System
    2.3.3 Performance Optimizations for Disk Accesses
  2.4 Formal Verification and Model Checking
    2.4.1 Predicate Logic
    2.4.2 Temporal Logic
    2.4.3 Linear Temporal Predicate Logic
  2.5 Summary
3 Related Work
  3.1 Malware Analysis and Instrumentation
  3.2 Characterizing Malware Behavior
    3.2.1 Characterizing Malware with Machine Learning
    3.2.2 Characterizing Malware Using Modeling
4 Dione: A Disk Instrumentation Framework
  4.1 Threat Model and Assumptions
  4.2 Dione Operation
    4.2.1 Dione Policy Commands
    4.2.2 Dione State Commands
  4.3 Live Updating
    4.3.1 Live Updating Challenges
    4.3.2 Live Updating Operation
  4.4 Disk Sensor Integration
  4.5 Experimental Results
    4.5.1 Experimental Setup
    4.5.2 Evaluation of Live Updating Accuracy
    4.5.3 Evaluation of Performance
  4.6 Registry Monitoring
5 Labeling Malware Persistence Mechanisms with Dione
  5.1 Modeling Persistence Mechanisms with LTPL
    5.1.1 System Boot
    5.1.2 Service Install
    5.1.3 File Access
    5.1.4 Persistent Service Load
  5.2 Dione Capability Labeler Implementation
  5.3 Experimental Setup
    5.3.1 Testbeds
    5.3.2 Malware Corpus
    5.3.3 Assignment of “Truth” Labels
    5.3.4 Model Checker Results
  5.4 Labeling File Access Type
    5.4.1 Motivation
    5.4.2 Program Binary Load Classifier
    5.4.3 SVM Classifier Implementation
    5.4.4 Results
6 Directions for Future Work
7 Thesis Summary and Contributions
8 Appendix
  8.1 Tables
Bibliography


Chapter 1

Introduction

The past decade has been marked by an ongoing arms race between malicious software creators and security researchers. Not only are security companies and researchers overwhelmed by the several million new unique samples discovered each month, but the sophistication of malicious software continues to increase as well [46].

Malicious software, or malware, can take many forms. While the amount of harm caused by a malware sample can vary, all malware shares the property of having been installed without the full consent and knowledge of the user. Spyware or adware can be installed on a user’s system, causing annoying pop-ups or violating privacy expectations by tracking user habits [54]. Alternatively, malware may force the system to become part of a network of hijacked machines used to send spam, hijack other systems, or perpetrate Distributed Denial of Service (DDoS) attacks on banks or targets of political protest [10]. Increasingly, malware is used for financial gain. For example, banking threats seek to steal credentials from users or banking systems in order to perpetrate financial crimes, while fake-alert and ransomware threats trick the user into paying either for impostor security software or for the safe return of their “ransomed” data [45].

Rootkits can be particularly dangerous, as they exist to provide additional stealth measures that prevent the user or security products from detecting the presence of the rootkit and any other malware it is packaged with [10]. Rootkits can execute with administrator privilege by attacking and patching the code of the operating system. Though the number of new rootkits discovered in the wild has been decreasing since 2011, tens of thousands of new samples are still discovered every month [46]. Furthermore, there is a common adage in security that, between malware and a security product, the winner is whichever is loaded first. As a result, rootkits are increasingly turning to infecting the Master Boot Record (MBR); since the MBR performs key startup operations, infecting it is a devastating attack on the system [45]. Once a rootkit has breached kernel-level code, it is difficult to trust any security product or malware analyzer running on the infected system.

Over the past couple of decades, research into labeling malware has focused on identifying malware by family or variant. While having labels available for new samples is useful as a coarse-grained identification, we argue that labeling the behavior of the malware could be more useful than identifying the family it belongs to. Capability labeling is a promising approach to understanding how malware behaves. Instead of identifying malware by its family or strain, identifying malware by the capabilities it possesses allows security products to identify the high-level behaviors that new malware is employing.

There are several benefits to labeling or identifying capabilities present in malware or software. A system equipped with on-the-fly capability detection could notify users when software or malware with certain malicious capabilities is installed. The information could also be used by security researchers and products to outline the steps necessary to clean a system of the infection and prevent intrusions. Furthermore, capability labels would allow security researchers to build up large corpora of labeled samples for future research and experimentation, identifying what each sample actually does.

Unfortunately, identifying high-level malware capabilities is a challenging problem. First, it is difficult to obtain a descriptive, high-integrity trace of system events, since malware writers employ a variety of techniques in order to prevent their malware from being analyzed. Second, it is difficult to derive useful high-level behaviors from the trace of events that has been obtained, as high-level behaviors can manifest themselves in a variety of ways.

1.1 Motivation

Currently, state-of-the-art research into malware labeling focuses predominantly on one of two areas: labeling new samples as either malicious or benign, or labeling new samples by family or variant. Early anti-virus technologies relied on signature matching to identify and label software samples as malicious; these signatures contained unique byte patterns, such as sequences of instructions, with each signature typically covering only a single malware variant [17]. In order to counter attempts at obfuscation, researchers and AV vendors introduced the ability to use regular expressions over the byte sequences, for example to skip over arbitrarily inserted nop instructions, though these too are easily evaded with polymorphic and metamorphic obfuscation techniques [12].
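To make the distinction concrete, the following Python sketch contrasts an exact byte signature with a regular-expression signature that tolerates inserted nop bytes. The byte values are ordinary 32-bit x86 encodings (31 C0 is xor eax, eax; 50 is push eax; 90 is nop; 40 and 48 are inc eax and dec eax), but the signature itself is a hypothetical example, not one taken from any AV product.

```python
import re

# Hypothetical signature: "xor eax, eax" (31 C0) immediately followed by "push eax" (50).
EXACT_SIG = re.compile(rb"\x31\xc0\x50")

# Regex-extended signature: allow any number of single-byte NOPs (0x90)
# between the two instructions, as junk-insertion obfuscation might add.
NOP_TOLERANT_SIG = re.compile(rb"\x31\xc0(?:\x90)*\x50")


def matches(signature: re.Pattern, code: bytes) -> bool:
    """Return True if the signature occurs anywhere in the byte string."""
    return signature.search(code) is not None


original = b"\x55\x31\xc0\x50\xc3"                # push ebp; xor eax,eax; push eax; ret
obfuscated = b"\x55\x31\xc0\x90\x90\x90\x50\xc3"  # same code with three NOPs inserted
junk_inc_dec = b"\x55\x31\xc0\x40\x48\x50\xc3"    # inc eax; dec eax inserted instead

for name, sample in [("original", original),
                     ("obfuscated", obfuscated),
                     ("junk_inc_dec", junk_inc_dec)]:
    print(name, matches(EXACT_SIG, sample), matches(NOP_TOLERANT_SIG, sample))
```

The last case shows why such signatures are easily evaded: semantically neutral junk other than nop (here an inc/dec pair) defeats the regular expression just as inserted nops defeat the exact pattern.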


Instead of using syntactic signatures (that is, raw byte patterns or regular expressions), researchers have developed semantic models of malicious behavior based on instruction sequences [18]. These models abstract instruction sequences by replacing concrete registers and values with variable names and symbolic constants. Templates of malicious behaviors are then compared against potentially malicious binaries to detect instruction sequences that are semantically equivalent, rather than identical at the byte level.
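The sketch below illustrates the idea of templates with variables and symbolic constants under simplifying assumptions: instructions are (mnemonic, operands...) tuples, operands beginning with `?` are template variables that must bind consistently, and unrelated instructions may be interleaved between template steps. Unlike the semantics-aware approach cited above, it does not check that the interleaved instructions leave the bound variables unchanged, so it only approximates semantic equivalence.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, ...]  # (mnemonic, operand, operand, ...)


def unify(pattern: Instr, instr: Instr, env: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match one template instruction against a concrete instruction.
    Operands starting with '?' are variables and must bind consistently."""
    if len(pattern) != len(instr) or pattern[0] != instr[0]:
        return None
    new_env = dict(env)
    for p, c in zip(pattern[1:], instr[1:]):
        if p.startswith("?"):
            if p in new_env and new_env[p] != c:
                return None  # variable already bound to a different operand
            new_env[p] = c
        elif p != c:
            return None      # literal operand mismatch
    return new_env


def match_template(template: List[Instr], code: List[Instr],
                   env: Optional[Dict[str, str]] = None,
                   start: int = 0) -> Optional[Dict[str, str]]:
    """Find the template as an ordered subsequence of code, backtracking
    over choices. Returns the variable bindings, or None if no match."""
    env = {} if env is None else env
    if not template:
        return env
    for i in range(start, len(code)):
        bound = unify(template[0], code[i], env)
        if bound is not None:
            result = match_template(template[1:], code, bound, i + 1)
            if result is not None:
                return result
    return None


# Hypothetical template: load a constant, XOR it with a key, push the result.
TEMPLATE = [("mov", "?R", "?C"), ("xor", "?R", "?K"), ("push", "?R")]

v1 = [("mov", "eax", "0x10"), ("xor", "eax", "0x55"), ("push", "eax")]
# Same behavior with a renamed register and junk instructions interleaved.
v2 = [("mov", "ebx", "0x10"), ("nop",), ("xor", "ebx", "0x55"),
      ("add", "ecx", "1"), ("push", "ebx")]
# Different behavior: the XOR targets a different register.
v3 = [("mov", "ebx", "0x10"), ("xor", "ecx", "0x55"), ("push", "ebx")]

for name, code in [("v1", v1), ("v2", v2), ("v3", v3)]:
    print(name, match_template(TEMPLATE, code))
```

Both v1 and v2 match with different register bindings, while v3 does not, which is the behavior a byte-level signature cannot express.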

Abstracting semantic awareness to an even higher level, recent work has focused on behavioral signatures. These behavioral signatures often examine sequences of system calls, or even higher-level behaviors represented by semantically equivalent system calls [4, 6, 52]. After building behavioral representations of malware samples, both formal verification and machine learning techniques can be applied to label samples by their maliciousness, or to divide them into classes based on their family or variant.

Unfortunately, deriving family-based labels to identify malware samples presents some significant challenges. Bailey et al. performed a detailed study of anti-virus (AV) products and found that not only do different AV vendors assign different labels to the same malware samples, but these vendors actually disagree on the number and granularity of unique labels in general [4]. The goal of applying familial labels to malware samples is to obtain a concise clustering of samples, with similar items grouped into clusters that reflect appropriate differences while avoiding so many labels that the labels become meaningless. With too coarse a clustering, malware samples may be labeled as belonging to the same family when in reality they do not share all functionality or capabilities. With too fine a granularity, similar variants within the same family could be labeled as individual families, resulting in a clustering that becomes less distinct as clusters blur together. The problem of labeling by family is further exacerbated by the lack of “ground truth” labels. When researchers attempt to assess the quality of their clustering algorithms, they often choose samples that many AV vendors can easily label. This results in a malware corpus of “easy-to-label” samples, and thus the effectiveness of labeling algorithms cannot be extrapolated to larger datasets for which ground truth is not known [49].

The blending and merging of malware samples arises from the relative ease with which bad actors can generate new malware samples. Malware writers can use obfuscation techniques to produce samples with unique hashes and signatures. For example, polymorphic techniques encrypt the body of the code, decrypting it on the fly during execution [20]. Meanwhile, metamorphic techniques change the structure of the code (for example, using instruction reordering, insertion of junk instructions, and register renaming) while ensuring that the semantics of the code remain the same [20]. Additionally, malware can be written in high-level programming languages, and source code and malware kits can be found on the Internet for little or no cost, allowing even those with minimal programming skills to generate malware. As a result, malware writers can create new variants by adding new functionality to old variants or by mixing existing components. The consequence of these techniques is that the differentiation between malware families and variants begins to blur.
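As a toy illustration of why encrypted bodies defeat hash- and byte-based signatures, the following sketch encodes the same stand-in payload with three different single-byte XOR keys. Real polymorphic engines use stronger transformations and also mutate the decryption stub itself; the single-byte XOR and the payload string are assumptions chosen only for brevity.

```python
import hashlib


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


# Stand-in for a malicious code body; every variant carries the same content.
payload = b"identical malicious functionality"

# One "generation" per key: same payload, different bytes on disk.
variants = {key: xor_bytes(payload, key) for key in (0x21, 0x5A, 0xC3)}
digests = {key: hashlib.sha256(body).hexdigest() for key, body in variants.items()}

for key, body in variants.items():
    restored = xor_bytes(body, key)  # what a decryptor stub would recover at run time
    print(f"key={key:#04x} sha256={digests[key][:16]}... decodes_ok={restored == payload}")
```

Each variant has a distinct hash, yet all decode to identical functionality, which is exactly the situation in which hash-based identification and family labels start to diverge from behavior.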

As the number of unique malware samples found in the wild continues to increase, we posit that it is more useful to identify a malware sample by the behavioral characteristics it possesses than by a variant label. A “capability” can be broadly defined as any intended feature of the software. Keystroke logging, downloading a file from the internet, trojanizing a system binary, and overwriting the Master Boot Record are all examples of malware capabilities. Instead of applying a single family label to a malware sample, each sample would be labeled with all of the capabilities it employs.

Labeling a sample by the capabilities it possesses, rather than by its family, provides several benefits. The first is the opportunity to alert a user or administrator when benign or malicious software installs or employs a potentially dangerous or intrusive capability. Capability labeling can also identify how malware infects a system, how it propagates to other systems, how it survives and restarts after a reboot, or how it hides from the user or security products. Understanding each of these characteristics is critical in developing products or advisories to clean systems after infection, and in preventing malware from spreading to other systems.

A secondary benefit of capability labeling is in assisting security and malware researchers, as it would allow them to build up large corpora of malware for which the high-level behaviors of each sample are known. If a researcher needs to test a malware removal tool on real-world samples, for example, they could simply query the corpus for all samples labeled as having the specific capabilities that allow them to persist on a system.

The first challenge in labeling malware based on behavior, whether by capability, by family, or by maliciousness, is obtaining a descriptive, high-integrity behavioral trace, as even the best malware labeling algorithm cannot be accurate if it is processing an incomplete or inaccurate behavioral trace [20]. Behavioral traces can be acquired using both static and dynamic techniques. Static analysis, in which the binary is examined without actually executing it, is scalable but can be thwarted by malware looking to escape analysis [6, 43, 79]. For example, disassembling a malware sample to obtain an instruction trace is useful for extracting the control and data flow of a malware sample. However, in practice, malware often uses techniques that prevent disassembly from succeeding in the first place, transforming the binary while maintaining the code’s original functionality.

Dynamic analysis attempts to understand malware behavior by observing the malware as it runs, collecting system call information, instruction traces, or other events. While dynamic analysis avoids the obfuscation problems of static analysis, it too has limitations. Manual analysis, performed by attaching a piece of malware to a debugger, is too time-consuming to be scalable; furthermore, malware can detect when it is running under a debugger and evade analysis [34, 57]. Similarly, analyzers running in-host can be detected by malware and uninstalled or misled. Running malware in a virtualization or emulation layer to collect event traces provides protection from the malware and, as a result, provides the broadest coverage of malware analysis. However, even these techniques can be detected by malware, which can then voluntarily exit to avoid analysis. Recent work suggested that nearly 25% of malware used techniques to detect dynamic analyzers and evaded analysis by exiting [50]. Thus, the biggest problem of dynamic analysis is that it only reveals what was actually executed, not all potential behaviors that could manifest on a given system.

In order to address this challenge, the acquisition of a descriptive, high-integrity trace, it is important for malware analyzers to work at either a higher privilege level or a lower semantic level than the malware. In this dissertation, we present a disk instrumentation and analysis infrastructure that does both. Dione, the Disk I/O aNalysis Engine, is a flexible, portable, policy-based disk monitoring infrastructure that facilitates the collection and analysis of disk I/O. It uses information from a sensor interposed between a System Under Analysis (SUA) and its hard disk. Since it monitors I/O outside the reach of the operating system, it is resilient to stealth measures employed by rootkits, including those with administrator-level privilege. Instead of relying on constructs that can be manipulated by malware, Dione reconstructs high-level file system and Windows registry operations using only low-level intercepted metadata and disk sector addresses.
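One core lookup behind this kind of reconstruction can be sketched as follows: given file extents already parsed from file system metadata, map an intercepted sector address back to a file and byte offset. The sketch assumes 512-byte sectors, 4 KiB clusters, and sector numbers relative to the start of the NTFS volume; the `Extent` and `SectorMap` names and the example driver path are hypothetical and do not describe Dione’s actual data structures.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

SECTOR_SIZE = 512         # bytes per sector (assumed)
SECTORS_PER_CLUSTER = 8   # i.e., 4 KiB clusters (assumed)


@dataclass
class Extent:
    start_cluster: int  # first logical cluster number (LCN) of the run on disk
    length: int         # number of clusters in the run
    path: str           # file that owns the run
    file_vcn: int       # virtual cluster number of the run within the file


class SectorMap:
    """Hypothetical sector-to-file lookup built from already-parsed extents."""

    def __init__(self, extents: List[Extent]):
        self.extents = sorted(extents, key=lambda e: e.start_cluster)
        self.starts = [e.start_cluster for e in self.extents]

    def lookup(self, sector: int) -> Optional[Tuple[str, int]]:
        """Return (path, byte offset within file), or None if unowned."""
        lcn, sector_in_cluster = divmod(sector, SECTORS_PER_CLUSTER)
        i = bisect_right(self.starts, lcn) - 1  # last extent starting at or before lcn
        if i < 0:
            return None
        e = self.extents[i]
        if lcn >= e.start_cluster + e.length:
            return None  # falls in a gap between known extents
        vcn = e.file_vcn + (lcn - e.start_cluster)
        offset = (vcn * SECTORS_PER_CLUSTER + sector_in_cluster) * SECTOR_SIZE
        return e.path, offset


DRIVER = r"\WINDOWS\system32\drivers\example.sys"
# A fragmented file: clusters 0-3 at LCN 100, clusters 4-5 at LCN 500.
smap = SectorMap([Extent(100, 4, DRIVER, 0), Extent(500, 2, DRIVER, 4)])

for sector in (803, 4000, 832, 0):
    print(sector, smap.lookup(sector))
```

A binary search over sorted extent start clusters keeps each lookup logarithmic in the number of extents, which matters when every intercepted disk request must be attributed as it occurs.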

The second challenge to capability labeling is that, even after a high-integrity, descriptive event trace has been obtained, it is necessary to convert the lower-level events of the trace into higher-level behaviors or capabilities. The type of trace that is available informs the types of capabilities that can be labeled. Since Dione provides comprehensive, high-integrity records of events taking place at the disk level, we show that we can infer high-level properties relating to the persistence capabilities of malware. That is, we use the traces generated by Dione to demonstrate not only how malware persists on disk, but also how malware automatically restarts after a system is rebooted. Persistence capabilities include trojanizing a system binary, overwriting the MBR to force malicious code to load, using the Windows service mechanism to automatically load code or drivers at boot time, and pointing special auto-start registry keys to the malicious code.

Given the descriptive, high-integrity traces produced by Dione, we set out to label malware samples possessing certain capabilities that are used to persist and restart upon system reboot. For our Dione Capability Labeler (DCL), we use model checking, an algorithmic formal verification method used to verify properties of software. Model checking is a property verification approach in which a property is specified using a description language, resulting in a logic formula [28]. The system (in this case, a trace of malware execution) is likewise modeled using a description language. The model of the system is then compared to the logic formula to determine whether the model satisfies the specification.


In the context of detecting a property (in this case, a capability) in malware, the capability is specified in the description language, or logic, of the model checker. This specification describes the behaviors (and the temporal ordering between behaviors) that would be present in the behavioral trace if the malware possessed that capability. A model is then extracted from each behavioral trace and compared to the specification to determine whether the model satisfies it. If so, the malware is labeled as having the specified capability.

In this work, we use the specification language Linear Temporal Predicate Logic (LTPL) [7, 42, 71] to model our capability specifications. We model three phases of program behavior based on events gathered from Dione traces: (1) installation of the persistence capability, (2) system boot, and (3) file access (program load) after reboot. We chose these three stages because the automatic loading of a program after reboot demonstrates both persistence and successful automatic loading. While the models of system reboot and program load are shared across all persistence capabilities, each type of persistence capability requires a separate model for the installation phase.
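Schematically, and in LTL-style notation rather than the dissertation’s exact LTPL formulas, the combined property has the shape ∃p. F(install(p) ∧ F(boot ∧ F(read(p)))), where F means “eventually” and p is the service binary. The Python sketch below checks that shape with a single pass over a trace. It is not an LTPL model checker, and the event tuples, the `SERVICES_PREFIX` key path, and the assumption that the ImagePath value has already been normalized to the same form as file paths in the trace are all hypothetical simplifications.

```python
from typing import Iterable, List, Optional, Set, Tuple

Event = Tuple[str, ...]  # hypothetical schema, e.g. ("reg_set", key, value)

# Simplified key path inside the on-disk SYSTEM hive (assumption).
SERVICES_PREFIX = "ControlSet001\\Services\\"


def service_install(ev: Event) -> Optional[str]:
    """Phase 1 predicate: a service ImagePath was written; returns the binary path."""
    if ev[0] == "reg_set" and ev[1].startswith(SERVICES_PREFIX) \
            and ev[1].endswith("\\ImagePath"):
        return ev[2].lower()
    return None


def check_persistent_service(trace: Iterable[Event]) -> List[str]:
    """Return binaries p with install(p), then a later boot, then a later read of p."""
    installed: Set[str] = set()  # installed, but no boot seen since
    booted: Set[str] = set()     # installed, and a boot has occurred since
    loaded: List[str] = []
    for ev in trace:
        path = service_install(ev)
        if path is not None:
            installed.add(path)
        elif ev[0] == "boot":
            booted |= installed      # phase 2 reached for every pending install
            installed.clear()
        elif ev[0] == "file_read":
            p = ev[1].lower()
            if p in booted and p not in loaded:
                loaded.append(p)     # phase 3: binary read after reboot
    return loaded


DRIVER = "C:\\WINDOWS\\system32\\drivers\\demo.sys"
trace = [
    ("file_write", DRIVER),
    ("reg_set", SERVICES_PREFIX + "demo\\ImagePath", DRIVER),
    ("boot",),
    ("file_read", DRIVER),
]
print(check_persistent_service(trace))
```

Reordering the same events so that the boot precedes the install, or dropping the boot entirely, makes the check fail, which mirrors the role of temporal ordering in the specification.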

We also demonstrate that we can use the patterns detected in the file-read events recorded in Dione traces to differentiate between two types of high-level file accesses: file copy and program binary load. As a result, we can label malware not only as having the capability to install a service, but also as having successfully used this mechanism to load automatically after a system boot.

We chose the Windows service as the persistence mechanism to model, since it is a common mechanism used by malware to persist and restart after reboot [66]. The service mechanism can cause significant damage because it allows malware to load malicious code into kernel space, it can be set to run automatically when the system boots, and it may not show up in Task Manager as a process [10, 66].

Using domain knowledge about malware behavior, the NTFS file system, and the relationship between the Windows XP operating system and the corresponding disk behaviors, we generate models for a service installation, a system reboot, and a program load; we then combine the stages into a specification that detects the automatic loading of the service after reboot. Because the pattern of disk accesses for a file load can vary dramatically, we generalize the model for program load so that it specifies any type of file access, to avoid false negatives. We then bolster the file access model with a supervised learning approach that differentiates between a program binary load and another type of file read operation, a file copy. We generate features based on the file content read pattern of the Dione trace and use a Support Vector Machine (SVM) algorithm [22] to classify a series of disk accesses as either a file load or a file copy.
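The following sketch shows the general shape of such a classifier under explicitly stated assumptions. The two features (fraction of the file’s content that was read, and fraction of reads that were sequential) and the six training points are hypothetical, chosen only on the intuition that a copy reads a whole file in order while a loader may read only some pages. The linear SVM is trained with Pegasos-style subgradient updates, cycled in a fixed order for determinism; this is a minimal stand-in and not necessarily the SVM implementation, kernel, or feature set used in this work.

```python
from typing import List, Sequence, Tuple

Features = Tuple[float, float]  # (fraction_of_file_read, fraction_sequential_reads)
COPY, LOAD = 1, -1


def train_linear_svm(xs: Sequence[Features], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 2000) -> List[float]:
    """Pegasos-style subgradient training of a linear SVM (hinge loss plus
    L2 penalty). A constant 1.0 feature is appended so the last weight acts
    as a bias term."""
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:                          # hinge-loss subgradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Features) -> int:
    """Return COPY (+1) or LOAD (-1) from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return COPY if score >= 0 else LOAD


# Hypothetical training data: copies read nearly everything sequentially.
train_x = [(1.0, 0.95), (0.98, 1.0), (1.0, 0.9),
           (0.35, 0.4), (0.2, 0.3), (0.5, 0.45)]
train_y = [COPY, COPY, COPY, LOAD, LOAD, LOAD]

w = train_linear_svm(train_x, train_y)
print([predict(w, x) for x in train_x])
```

A linear model keeps the sketch dependency-free; with richer, less separable access-pattern features, a kernel SVM from an established library would be the more natural choice.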

We demonstrate that we not only detect the persistence mechanism being installed, but also verify that the persistence mechanism is successful, because we can detect the program binary being loaded automatically after a system reboot.

1.2 Contributions

With this dissertation, we provide the following contributions to malware analysis and disk forensics:

• We present Dione: the Disk I/O aNalysis Engine. Dione is the first portable disk and file system analyzer to analyze disk traffic outside the system under analysis to provide comprehensive, high-integrity traces for the NTFS file system. We detail the challenges, the design, and the implementation of Dione, explaining how we bridge both the semantic and temporal gaps in reconstructing high-level operations from raw low-level metadata.


• We analyze the accuracy and performance of Dione, demonstrating that it reconstructs file system operations with 100% accuracy and a performance overhead generally less than 10%, and often below 2%.

• We present DCL: the Dione Capability Labeler. We detail our models for three properties: the Windows service installation, system reboot, and file access. We model these properties using the logic language LTPL [71], and implement a problem-specific model checker that checks the events of a Dione disk trace against the specifications for each property. We demonstrate that DCL can process a large number of samples in a short amount of time, labeling each sample based on whether it exhibits a service persistence capability.

• We present a machine learning classifier that identifies a program binary load given a disk access pattern, and we use this classifier to bolster our model for service persistence. Our classifier mislabels fewer than 4% of traces, and we show that correctly labeling the mislabeled traces would be difficult for even an expert analyst. By demonstrating that we can detect a program binary being automatically loaded after a system boot, we can decisively label a sample as having a successful persistence mechanism.

• We create an automated malware analysis testbed, which can automatically instrument malware samples using Dione and the Volatility memory introspection framework [76]. We run DCL with the integrated file access pattern classifier on over 1,000 real-world malware samples, detecting Windows service installation over 99% of the time and service persistence over 97% of the time. Furthermore, we show that, using Dione’s ability to generate on-the-fly traces of malware behavior, we can label more service installs and loads than a memory introspection framework operating on a single snapshot in time.

1.3 Organization of Dissertation

The rest of the dissertation is organized as follows. In Chapter 2, we provide relevant background material. This includes a discussion of malware types, as well as an in-depth explanation of the techniques that malware can use in order to hide from analysis. We discuss the techniques of static and dynamic malware analysis, including the advantages and disadvantages of each. We explain relevant Windows concepts; specifically, we describe the structure of the NTFS file system as it pertains to disk instrumentation, as well as the optimizations used by the Windows operating system that make instrumentation more challenging. Additionally, we discuss model checking and the logic language LTPL.

In Chapter 3, we discuss the related research in this area. This chapter includes discussions of previous research on disk instrumentation, malware analysis, and the use of machine learning and model checking to perform intrusion detection, malware identification, and capability labeling.

In Chapter 4, we describe the Dione infrastructure. We detail the implementation of Dione, including its design challenges and their solutions. We evaluate the accuracy of the Dione live updating engine and the performance of full disk instrumentation using the Xen hypervisor. Finally, we conclude with an explanation of the limitations of Dione.

In Chapter 5, we describe our behavioral models for service install, system boot, and service load; additionally, we model these properties in the logic language LTPL. We describe the classification algorithm used to bolster the file access model, and we detail the results of our integrated model checker and file access classifier on over a thousand real-world samples.

Finally, we conclude the dissertation with directions for future research in Chapter 6 and a summary of contributions in Chapter 7.


Chapter 2

Background

In this chapter, we will outline relevant background information. We will begin with a discussion of malware, including malware types, the common anti-forensics techniques used by malware to avoid detection, and the evasion techniques it uses to hide from malware analyzers and security products. We will then discuss the ways in which malware can be analyzed, including both static and dynamic analysis techniques. Understanding how these analyzers can be misled by stealthy malware will motivate the need for an analyzer that provides descriptive yet high-integrity traces. Before we introduce Dione, our file system and disk I/O analysis infrastructure, in Chapter 4, we will provide a thorough introduction to relevant Windows concepts, including the NTFS file system and the optimizations used by the Windows operating system that make file system instrumentation more challenging. Since the Dione Capability Labeler relies on model checking using formal specifications, we will discuss model checking and common description logic languages, including the Linear Temporal Predicate Logic that we use to model persistence capabilities in Chapter 5.


2.1 Malicious Software

The term malware, or malicious software, can be used to describe a variety of unwanted or undesirable software or scripts. Generally, malware includes anything that causes harm to a user, a computer system, or a network, though the amount of harm can vary [54]. In this section, we will define and describe malware types, and detail the anti-forensics and evasion techniques they may employ in order to hide from malware analysis tools.

2.1.1 Malware Types

Viruses and worms can both be categorized as infectious agents; they are similar in that they not only serve some nefarious purpose, but are also capable of replicating themselves [10]. A virus, however, requires explicit user interaction, such as double-clicking an executable or opening a corrupted email attachment, whereas a worm can propagate on its own, automatically transmitting itself over the network.

A trojan or trojan horse is malicious software that a user downloads or installs believing that the software serves some benevolent, useful function [66]. The trojan may indeed be bundled with useful software, or the software may be entirely malicious. The verb trojanizing is also increasingly used, and refers to malware hijacking and patching an executable that already exists on the system so that the malicious code will execute when the previously benign program is loaded or run. Spyware and adware may be used separately or together, and range from merely annoying to malicious [10]. Adware exists to display advertisements on the user’s computer, while spyware tracks the user’s habits, usernames, passwords, or keystrokes.

Once a machine has been compromised by a worm, virus, or trojan, additional types of malware may be installed. A backdoor is a method of bypassing standard authentication to allow the attacker remote access to the compromised machine in the future [54]. A botnet is a collection of compromised machines that are commanded and controlled by a bot herder [10]. Machines in a botnet may wait for orders from the bot herder; these orders could include sending spam, perpetrating Distributed Denial of Service (DDoS) attacks, and harvesting usernames and passwords to commit financial crimes [10].

A rootkit is a particularly interesting component of malware. It exists to conceal

itself and other components, and to command and control a system remotely [10].

A rootkit’s most important quality is that it is stealthy: a good rootkit will go

undetected by the user to ensure that it stays present on the system as long as

possible. A rootkit may attain administrator-level privilege, either by exploiting a

program that is running with supervisor privilege, or by tricking an administrator

into installing malicious software. If the rootkit has unmitigated access to the kernel

code and data, then it can be difficult or impossible to detect.

Intuitively, it follows that malware will often be an amalgam of multiple malicious

components. For example, a virus may be packaged with a rootkit, so that the

rootkit can hide the presence of the virus. A rootkit may install a backdoor, so that

the attacker can command and control the compromised system. A botnet may be

composed of systems that have all been compromised by a rootkit.

2.1.2 Anti-Forensics Techniques

In order to thwart post-mortem forensic analysis of a compromised system, malware may utilize anti-forensics techniques [66]. On the simplest level, malware could download and then subsequently delete any file-based payload, possibly overwriting the sectors that held its contents, to avoid any signature-based antivirus disk scan. At a lower level, malware can manipulate the properties of a file. The hidden property specifies whether a file or directory is hidden from the user, both in the graphical explorer and through the command line. Another set of properties are the MAC timestamps. Though “MAC” actually stands for Modified, Accessed, and Creation times, NTFS utilizes a fourth timestamp as well, the Change time, which indicates that metadata was changed. If any of these timestamps is set to an unreasonably low value, Windows Explorer will not display it [10]. Alternatively, the malware could set the MAC times of a newly created malicious file to the same timestamps as system files, so that the file appears to have been there since the operating system was first installed.

Instead of hiding through the use of the hidden property, malware can also hide through more sophisticated mechanisms. The first technique is called In-Band Hiding, as it involves hiding in spaces that are specified by the file system. An example of this is hiding in Alternate Data Streams (ADSs) [10]. As will be detailed further in Section 2.3.2, the contents of a file in NTFS are stored in an attribute called $DATA. However, a file can have multiple $DATA attributes. These ADSs are a way to persistently store information on disk, but they will not appear in Windows Explorer or in command line listings unless explicitly requested. Furthermore, the data stored in an ADS is not included in the total size property of a file; this is because sizes are associated with attributes, so the stated file size is actually just the size of the default $DATA attribute. Conversely, Out-of-Band Hiding utilizes space not specified by the file system. Malware may hide in the Master Boot Record (MBR), discussed in further detail in Section 2.1.3. Alternatively, malware can hide in slack space. Because file content is always allocated in clusters (commonly 4 KB, or 8 sectors), there may be up to 7 unused sectors for a given file. For certain versions of Windows, writing to this slack space only requires repositioning the logical End-of-File (EOF) pointer, writing to the space, and then non-destructively truncating the file by resetting the logical EOF [10].
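The slack-space arithmetic can be made concrete with a small calculation. The sketch below assumes a non-resident $DATA attribute (very small NTFS files can instead be stored directly inside their MFT entry and then occupy no clusters), 512-byte sectors, and the common 4 KB cluster size mentioned above; the function name is our own.

```python
from typing import Tuple

SECTOR_SIZE = 512    # bytes per sector (assumed)
CLUSTER_SIZE = 4096  # bytes per cluster: the common 4 KB / 8-sector case (assumed)


def slack_space(file_size: int, cluster_size: int = CLUSTER_SIZE,
                sector_size: int = SECTOR_SIZE) -> Tuple[int, int, int]:
    """Return (allocated bytes, slack bytes, wholly unused sectors) for a file
    whose content is stored in clusters (a non-resident $DATA attribute)."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    clusters = -(-file_size // cluster_size)      # ceiling division
    allocated = clusters * cluster_size
    slack_bytes = allocated - file_size           # bytes past the logical EOF
    unused_sectors = slack_bytes // sector_size   # sectors holding no file data
    return allocated, slack_bytes, unused_sectors


# A 1-byte file still occupies a whole cluster: 7 of its 8 sectors are unused.
print(slack_space(1))     # (4096, 4095, 7)
print(slack_space(5000))  # (8192, 3192, 6)
```

The partially used final sector also holds slack bytes, which is why the slack byte count can exceed the number of wholly unused sectors times 512.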

2.1.3 Evasion Techniques

Dione can be useful in instrumenting and analyzing the intrusion and presence of each of these various types of malware, assuming that there is some symptom of compromise which percolates to the disk. However, Dione’s particular strength is in instrumenting and analyzing “hard” malware; that is, malware which uses rootkits or rootkit-like technology to hide itself and any other malicious software with which it is packaged. Once a rootkit has attained kernel-level privilege, in-host analysis (and even some virtualization-based analysis infrastructures) cannot be trusted, as the rootkit could thwart or misdirect any attempts to analyze it. The evasion techniques such a rootkit uses can be broadly divided into three categories: altering control flow, system call patching, and modifying kernel objects.

The Windows System Call Mechanism

Since the system call provides the interface into kernel space, many of the methods

used by rootkits to hide themselves or associated malware occur within the steps

used when a system call is invoked [10]. In this section, we describe the system call

interface in Windows. The steps taken for a system call in Windows running on a

modern x86 processor are summarized in Figure 2.1 [10, 64].

First, a user application calls a Windows API function such as CreateFile. The address of the function is obtained through the Import Address Table in the executable. This address points to a function in the kernel32.dll dynamic linked library, which in turn calls a native API function exported by ntdll.dll (the native API implements the system call interface in Windows). The dynamic linked library ntdll.dll routes system calls between the user mode and kernel mode interfaces [64].

[Figure 2.1 diagram. User mode: application PE file (.text, .data, .idata; import entry CreateFile 0x12345678) -> kernel32.dll CreateFile() -> ntdll.dll NtCreateFile(), KiFastSystemCall() -> SYSENTER (EAX = 0x------3D, IA32_SYSENTER_CS = 0x00000008, IA32_SYSENTER_EIP = 0x81864880). Kernel mode: ntoskrnl.exe KiFastCallEntry() -> KiSystemService() -> SSDT (NtCreateFile entry) -> NtCreateFile() -> I/O Manager -> ntfs.sys -> malfilt.sys -> disk.sys -> Disk. Hooking points are numbered 1 to 4.]

Figure 2.1: System call mechanism for Windows running on a modern x86 processor. Four mechanisms by which a rootkit can alter control flow are: (1) Import Address Table, (2) SYSENTER Machine Specific Registers, (3) System Service Dispatch Table, and (4) Filter Driver.

Before entering the kernel, the ntdll.dll code stores a system call dispatch ID in the lower 12 bits of the EAX register. For example, the CreateFile system call stores a number containing 0x3D as its 12 least-significant bits in EAX (Figure 2.1). The ntdll.dll function KiFastSystemCall then executes the SYSENTER instruction, which is used by modern processors to switch from user mode (Ring 3) to kernel mode (Ring 0). SYSENTER relies on three Machine Specific Registers (MSRs), which are configured by the kernel. Of the three MSRs, two (IA32_SYSENTER_CS and IA32_SYSENTER_EIP) contain the Ring 0 code segment and the offset into that code segment, respectively, at which the processor will start executing [29]. For Windows, this address points to the KiFastCallEntry function.

Once in Ring 0, execution begins in the ntoskrnl.exe executable [10]. Execution proceeds to the function KiSystemService, which obtains the system call dispatch ID from EAX and uses it to index into the System Service Dispatch Table (SSDT). An SSDT is an array of addresses, in which each address is a pointer to the entry point of a function in kernel space. There are two SSDTs; one is for Windows GUI functions, and the other is for the Windows Native API (i.e., system calls).
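To make the indexing step concrete, the sketch below models how a dispatch ID in EAX selects an SSDT entry. It assumes the 32-bit Windows convention in which bits 0 to 11 give the index into a service table and bits 12 and 13 select the table (0 for the Native API table, 1 for the GUI table). The table contents are hypothetical placeholders, except for the NtCreateFile entry at 0x3D, which follows Figure 2.1; a real SSDT holds kernel addresses rather than names.

```python
from typing import Dict, List, Tuple

INDEX_MASK = 0xFFF   # bits 0-11: index into a service table
TABLE_SHIFT = 12     # bits 12-13: which service table (assumed convention)
TABLE_MASK = 0x3

# Hypothetical tables: table 0 = Native API, table 1 = GUI functions.
NATIVE_TABLE: Dict[int, str] = {0x03D: "NtCreateFile"}  # per Figure 2.1
GUI_TABLE: Dict[int, str] = {}
SERVICE_TABLES: List[Dict[int, str]] = [NATIVE_TABLE, GUI_TABLE]


def decode_dispatch_id(eax: int) -> Tuple[int, int]:
    """Split the EAX value into (table number, index within that table)."""
    return (eax >> TABLE_SHIFT) & TABLE_MASK, eax & INDEX_MASK


def dispatch(eax: int) -> str:
    """Look up the routine KiSystemService would call for this EAX value."""
    table, index = decode_dispatch_id(eax)
    try:
        return SERVICE_TABLES[table][index]
    except (IndexError, KeyError):
        raise LookupError(f"no service for EAX={eax:#x}") from None


print(decode_dispatch_id(0x3D))   # (0, 61)
print(dispatch(0x3D))             # NtCreateFile
```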

Once the kernel-mode function is obtained from the SSDT, control flow proceeds

to the appropriate kernel mode component. For disk I/O commands, control flow

proceeds to the I/O Manager, which then uses the appropriate driver stack to execute

the I/O command.


Altering Control Flow Through Hooking

A rootkit may have many motivations for modifying control flow in its attempt to hide itself and its actions from the user and to collect information from the machine it has compromised. It may block system calls to disrupt the work done by a program (e.g., security software), replace kernel functions altogether, track all system calls made and their input parameters (e.g., to instrument a system or application), or filter output parameters (e.g., to hide a file or process).

One such way to alter control flow is to modify a call table; a call table is simply an

array of addresses, where each address points to a function or routine. By swapping

out the address with a new address, the system will call the attacker’s function instead

of the correct kernel function. The process of swapping out function pointers is

referred to as hooking, and there are several call tables that can be hooked [10].

The Import Address Table (IAT) is an application-level, user-space call table. Each entry in the IAT contains the address of a library routine that the program imports from a Dynamic Linked Library (DLL); the IAT is populated when the DLL is linked at load time. While a rootkit could hook a function in any DLL, the user-space functions that implement part of the system call interface are particularly dangerous. For example, a rootkit could hook the IAT entries pointing to the user-space library kernel32.dll in order to hide newly created malicious files; this scenario is labeled (1) in the system call diagram of Figure 2.1. While any exported library routine can be hooked in this manner, the disadvantage of this approach is that each hook only applies to the given application, and since it hooks a user-space call table, any program (such as security software) running in kernel space could easily detect it.

Unfortunately, several other call tables reside in kernel space; hooking any of these tables results in a system-wide, rather than application-level, hook. The first option is to hook hardware call tables at the system call interface. On old processors (e.g., pre-Pentium II [10]), Windows jumped to kernel space to handle system calls via an interrupt (specifically, INT 0x2E). Contemporary processors use the dedicated SYSENTER instruction. To hook the former, a rootkit would hook the interrupt handler corresponding to interrupt 0x2E in the Interrupt Descriptor Table (IDT). To hook the SYSENTER instruction, an attacker would hook an MSR. Given the flat memory model, the code segment MSR need not be modified: it is enough to hook only the IA32_SYSENTER_EIP register. This is done by swapping the original pointer (which points to the kernel function KiFastCallEntry) with a pointer to a new function. This hooking location is labeled (2) in Figure 2.1. The unique disadvantage of the hardware-based approaches is that these call gates are passthrough: control passes through the hook to the system call interface, but does not return through the hook. It is therefore possible to instrument or block any system call, but not to filter output results, which eliminates the opportunity for a rootkit to hide processes or files.

Instead of hooking the hardware system call interface, a rootkit could instead

hook a Windows-specific table: the System Service Dispatch Table (SSDT). With

this approach, the attacker can both instrument and monitor input, and filter output,

since control can return to the hook after the system call execution completes. The

391 functions of the Windows API comprise the kernel-mode system call interface,

and thus provide a dangerous path for control flow modification. Hooking the SSDT

is performed by obtaining the index of the function in the SSDT, and swapping the

function pointer with one that points to its nefarious replacement. This hook is

labeled (3) in Figure 2.1.
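The difference between a passthrough hook and an SSDT hook can be sketched by modeling the SSDT as a Python list of callables. In this simplified, hypothetical model, `query_directory` stands in for a directory-listing system call. The rootkit swaps that table entry for a wrapper which calls the original and then filters the result, while a passthrough hook (like an IDT or SYSENTER hook) only observes the arguments before control continues to the original routine.

```python
from typing import Callable, List

Syscall = Callable[[str], List[str]]


def query_directory(path: str) -> List[str]:
    """Hypothetical stand-in for a directory-listing system call."""
    return ["boot.ini", "evil.sys", "ntldr"]


QUERY_DIRECTORY = 0
ssdt: List[Syscall] = [query_directory]   # index 0 -> query_directory
seen_paths: List[str] = []


def passthrough_hook(path: str) -> None:
    """Like an IDT or SYSENTER hook: sees (or could block) the input only."""
    seen_paths.append(path)


def make_filtering_hook(original: Syscall, hidden: str) -> Syscall:
    """Like an SSDT hook: control returns here, so the output can be edited."""
    def hook(path: str) -> List[str]:
        return [name for name in original(path) if name != hidden]
    return hook


def system_call(index: int, path: str) -> List[str]:
    passthrough_hook(path)     # runs before dispatch; never sees the result
    return ssdt[index](path)   # dispatch through the (possibly hooked) table


print(system_call(QUERY_DIRECTORY, "C:\\"))   # ['boot.ini', 'evil.sys', 'ntldr']
original = ssdt[QUERY_DIRECTORY]
ssdt[QUERY_DIRECTORY] = make_filtering_hook(original, "evil.sys")  # swap the pointer
print(system_call(QUERY_DIRECTORY, "C:\\"))   # ['boot.ini', 'ntldr']
```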

Hooking, whether performed in the IAT, IDT, or SSDT, always suffers from the same disadvantage: it is relatively easy to detect. In order to detect any of these hooks, security software would need to iterate through each of the pointers in the various tables to ensure that each points to a location in memory that falls within the library or executable that implements it. In other words, the pointers in the IAT should point to the region of memory containing the corresponding DLL, and the pointers in the IDT, the IA32_SYSENTER_EIP register, and the SSDT should all point to memory corresponding to ntoskrnl.exe. However, determining the address ranges of these libraries and modules itself requires using a system call, such as ZwQuerySystemInformation. As a result, if a rootkit can hook this system call before being detected, it can deflect hooking countermeasures.
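The range check described above amounts to a couple of comparisons per table entry. The sketch below assumes that each module's base address and image size have already been obtained from a trustworthy source (which, as noted, is exactly what a rootkit may subvert); all addresses are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Module(NamedTuple):
    name: str
    base: int   # load address (assumed to come from a trusted source)
    size: int   # size of the loaded image in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def find_hooks(table: List[int], owner: Module) -> List[Tuple[int, int]]:
    """Return (index, address) for every entry that points outside its owner."""
    return [(i, addr) for i, addr in enumerate(table) if not owner.contains(addr)]


# Hypothetical values: the ntoskrnl.exe image and a three-entry SSDT.
ntoskrnl = Module("ntoskrnl.exe", base=0x80400000, size=0x200000)
ssdt = [0x80401000, 0x80512340, 0xF7A01234]   # last entry points into a driver
for index, addr in find_hooks(ssdt, ntoskrnl):
    print(f"SSDT[{index:#x}] -> {addr:#010x} lies outside {ntoskrnl.name}")
```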

System Call Patching

Given that hooking can be detected by confirming that critical function pointers point

within the bounds of the library executable it is expected to reside in, it makes sense

that a rootkit may attempt to modify the executable code itself. This technique

is more challenging for the malware creator, but also more difficult to detect [10].

Patching is a technique in which the raw bytes of an executable are overwritten in

order to, for example, mask or replace instructions. Patching can be performed in two

locations: in memory or on disk. Binary patching modifies the bytes of the executable

on disk; while the patch is permanent and persistent, it can be detected by looking

at binary file checksums. A run-time patch, on the other hand, modifies the binary

while it resides in memory, and thus would not survive a reboot.

Whether performed in memory or on disk, patching requires overwriting the machine code of system calls or other useful kernel routines. The simplest example of patching would be to perform in-place modification of bytes. For example, the attacker could replace instructions with NOPs to prevent the execution of the original instructions. This approach is severely limited by the number of bytes that can be patched. A far more flexible approach is to overwrite the original instructions with a control-transfer instruction (JMP, CALL, or RET) that redirects the control flow into another region of code, called trampoline or detour code [10]. The trampoline code has more space in which to do things, including executing the original, overwritten instructions. With this approach, a rootkit could instrument system calls and parameters by placing the trampoline at the start of the patched routine; the trampoline would execute this prologue before optionally calling the original instructions to perform the stated system call’s operation. By placing a trampoline after the system call executes, as an epilogue, a rootkit could filter output parameters.
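A common form of the redirecting instruction is the 5-byte x86 near jump (opcode 0xE9), whose signed 32-bit operand is relative to the address of the following instruction. The sketch below computes that encoding and the bytes a patcher would save before overwriting them. It assumes the first five bytes of the target routine end on an instruction boundary, which a real patcher would have to verify with a disassembler; the function names are our own.

```python
import struct
from typing import Tuple

JMP_REL32 = 0xE9   # x86 near JMP with a 32-bit relative displacement
JMP_LEN = 5        # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode 'jmp dst' for an instruction placed at address src."""
    rel = dst - (src + JMP_LEN)   # relative to the instruction after the jump
    return bytes([JMP_REL32]) + struct.pack("<i", rel)  # struct.error if out of range


def plan_detour(code: bytes, src: int, trampoline: int) -> Tuple[bytes, bytes]:
    """Return (saved original bytes, patched code) for a detour at src.

    Assumes code[:JMP_LEN] holds whole instructions, so the trampoline can
    re-execute the saved bytes before jumping back to src + JMP_LEN.
    """
    saved = code[:JMP_LEN]
    patched = encode_jmp_rel32(src, trampoline) + code[JMP_LEN:]
    return saved, patched


print(encode_jmp_rel32(0x1000, 0x2000).hex())   # e9fb0f0000
```

The trampoline's jump back into the original routine is simply another `encode_jmp_rel32` call, with the trampoline address as the source.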

The steps of patching broadly consist of: (a) saving the original code that will be patched, (b) injecting trampoline code, and then (c) performing in-place patching of the original code to force execution to jump to the address of the injected trampoline code. The primary means for security software to detect patching would be to look for suspicious jump instructions at the start of a function. Even this heuristic is not foolproof, as a rootkit creator could simply move the jump patch farther from the start of the function.
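A minimal version of that prologue heuristic is sketched below. It checks only three well-known x86 redirection patterns (near JMP 0xE9, short JMP 0xEB, and PUSH imm32 followed by RET, 0x68 ... 0xC3) within a window at the start of a function. The window size and pattern list are our own assumptions, and as noted above a patch placed beyond the window evades the check. Scanning raw bytes can also produce false positives, since these byte values may occur as operands of unrelated instructions.

```python
from typing import List, Tuple


def find_redirects(code: bytes, window: int = 16) -> List[Tuple[int, str]]:
    """Flag byte patterns that commonly implement a detour near a function start."""
    hits: List[Tuple[int, str]] = []
    for off in range(min(window, len(code))):
        op = code[off]
        if op == 0xE9 and off + 5 <= len(code):
            hits.append((off, "jmp rel32"))
        elif op == 0xEB and off + 2 <= len(code):
            hits.append((off, "jmp rel8"))
        elif op == 0x68 and off + 6 <= len(code) and code[off + 5] == 0xC3:
            hits.append((off, "push imm32; ret"))
    return hits


# mov edi, edi; push ebp; mov ebp, esp; sub esp, 0x10
clean = bytes([0x8B, 0xFF, 0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x10])
patched = bytes([0xE9, 0xFB, 0x0F, 0x00, 0x00]) + clean[5:]   # first 5 bytes detoured
print(find_redirects(clean), find_redirects(patched))
```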

To be particularly stealthy, a rootkit could patch code in the Master Boot Record (MBR). The MBR is located in the first sector of the disk. The code of the MBR is loaded by the BIOS at system startup; the MBR then loads the boot sector of the active disk partition, which in turn loads the operating system. This method uses a combination of run-time and binary patching; it patches MBR boot code on disk in order to have it alter system code in memory. The advantage of this approach is that it is performed before any security software is loaded, and the winner of a battle between security software and malware is often whichever embeds itself in the kernel first.
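Because the MBR is a single sector with a fixed layout, checking it for tampering is straightforward from outside the host. In the classic MBR layout, bytes 0 to 445 hold boot code, bytes 446 to 509 hold four 16-byte partition entries, and bytes 510 and 511 hold the signature 0x55 0xAA. The sketch below assumes 512-byte sectors and a previously recorded clean baseline, and compares the boot code region by hash; the sector contents are toy values.

```python
import hashlib

SECTOR_SIZE = 512
BOOT_CODE_END = 446       # the partition table begins here
SIGNATURE = b"\x55\xaa"   # bytes 510-511 of a valid MBR


def boot_code(sector0: bytes) -> bytes:
    """Extract the boot code region from the first sector of a disk."""
    if len(sector0) != SECTOR_SIZE or sector0[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    return sector0[:BOOT_CODE_END]


def boot_code_digest(sector0: bytes) -> str:
    return hashlib.sha256(boot_code(sector0)).hexdigest()


def mbr_modified(baseline_digest: str, sector0: bytes) -> bool:
    """True if the boot code differs from the recorded clean baseline."""
    return boot_code_digest(sector0) != baseline_digest


clean = bytes(BOOT_CODE_END) + bytes(64) + SIGNATURE   # toy clean sector
infected = b"\xeb\x5e" + clean[2:]                      # first bytes patched
baseline = boot_code_digest(clean)
print(mbr_modified(baseline, clean), mbr_modified(baseline, infected))  # False True
```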

Modifying Kernel Objects

A third rootkit evasion technique addresses some of the limitations of the previous methods, though it has some limitations of its own. This technique is called Direct Kernel Object Modification (DKOM), and it involves modifying kernel data structures representing processes, drivers, and authentication tokens [10]. A similar method is used to hide either processes or drivers from the user or security software. Both processes and drivers are maintained in doubly-linked lists. Therefore, to hide a particular process or driver, a rootkit needs only to traverse the appropriate list to find the process or driver to be hidden, and adjust the forward and backward links of it and its immediate neighbors. A rootkit can also elevate the privileges of a process by modifying the privilege substructures in the process object. There are a few disadvantages to DKOM. First, not everything is represented by such a kernel object; for example, files on disk are not tracked in a kernel object list, so DKOM could not be used to hide a file. Second, these data structures are undocumented, so Microsoft can adjust the fields of a structure between major and even minor releases, which could break patches that depend on specific field offsets.
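The unlinking step can be illustrated on a circular doubly-linked list with a sentinel head, the shape used by Windows LIST_ENTRY lists. The `Entry` class below is a hypothetical stand-in for a process object, not the real (undocumented) kernel structure layout. After unlinking, a walk of the list no longer reaches the hidden entry, even though the object itself still exists.

```python
from typing import List


class Entry:
    """Hypothetical list node standing in for a kernel process object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.flink: "Entry" = self   # forward link
        self.blink: "Entry" = self   # backward link


def insert_tail(head: Entry, entry: Entry) -> None:
    last = head.blink
    entry.flink, entry.blink = head, last
    last.flink = entry
    head.blink = entry


def walk(head: Entry) -> List[str]:
    """Enumerate the list the way a process-listing tool would."""
    names, cur = [], head.flink
    while cur is not head:
        names.append(cur.name)
        cur = cur.flink
    return names


def dkom_unlink(entry: Entry) -> None:
    """Splice an entry out by rewriting its neighbors' links."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink
    entry.flink = entry.blink = entry   # self-loop so a repeated unlink is harmless


head = Entry("<list head>")
procs = {n: Entry(n) for n in ["System", "smss.exe", "evil.exe", "explorer.exe"]}
for proc in procs.values():
    insert_tail(head, proc)
dkom_unlink(procs["evil.exe"])
print(walk(head))   # ['System', 'smss.exe', 'explorer.exe']
```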

Filter Drivers

The next rootkit evasion technique moves past the system call interface and into the device driver stack. This kernel-mode technique takes advantage of the layered device driver architecture supported by Windows. A Windows device driver does not have a monolithic structure; rather, it follows a modular approach by which a series of drivers each perform some work and pass along the I/O Request Packet (IRP) to the next driver in the chain. This is advantageous in that new drivers can be added to the series and still leverage the work done by other drivers in the chain. A Filter Driver is a driver that intercepts and modifies information as it makes its way through the driver stack [10]. While this can be a good thing (for example, filter drivers could be used to encrypt and decrypt data as it passes to persistent storage), it can also be used for malicious purposes. A filter driver could be used for keylogging, to filter network traffic, and to hide files and directories. This scenario is labeled (4) in Figure 2.1, in which a malicious filter driver is inserted above the disk driver. As a result, any analyzer or security software examining files by hooking the system call interface will still be deceived, as the filter driver will hide any files before they reach the system call boundary.
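The layering can be modeled abstractly as a chain of handlers, each of which may inspect or alter a request's result before passing it up. In the sketch below, the layer names are hypothetical and requests are reduced to directory listings (a real stack exchanges IRPs, and a driver below the file system would see sectors rather than file names). The point is only that a tool querying the top of the stack receives whatever the filter lets through.

```python
from functools import partial
from typing import Callable, List

Handler = Callable[[str], List[str]]


def storage(path: str) -> List[str]:
    """Bottom of the hypothetical stack: the 'true' directory contents."""
    return ["boot.ini", "malfilt.sys", "ntldr"]


def pass_through_layer(path: str, next_handler: Handler) -> List[str]:
    """A benign layer that forwards the request unchanged."""
    return next_handler(path)


def hiding_filter(path: str, next_handler: Handler) -> List[str]:
    """A malicious filter: forwards the request, then drops its own file."""
    return [name for name in next_handler(path) if name != "malfilt.sys"]


def build_stack(layers: List[Callable[..., List[str]]], bottom: Handler) -> Handler:
    """Chain layers (listed top first) so that each calls the one beneath it."""
    handler = bottom
    for layer in reversed(layers):
        handler = partial(layer, next_handler=handler)
    return handler


top = build_stack([pass_through_layer, hiding_filter], storage)
print(top("C:\\"))       # ['boot.ini', 'ntldr']
print(storage("C:\\"))   # the contents as seen below the filter
```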

2.2 Malware Analysis

Once a sample of malware has been obtained, there are several methods that can be used to learn about the malware’s behavior. Given the sophisticated techniques that malware can employ to prevent security software from detecting it or other malware, analyzing malware behavior is a difficult job. Any malware analysis solution will face several tradeoffs. The closer the analyzer is to the malware, the more semantic information, and the more types of semantic information, there will be to analyze. However, if an analyzer operates at the same or lower privilege level than the malware, it can be evaded, thwarted, or misled. This section discusses various options for malware analysis, which can be roughly divided into two categories: static binary analysis and dynamic analysis.


2.2.1 Static Binary Analysis

In static binary analysis, the binary is analyzed without being run, in order to learn more about how the malware might behave. On a basic level, static analysis can yield the architecture the binary was compiled for, the executable type, and the operating system on which it would run. Static analysis can also extract strings, such as passwords, paths, and file names, as well as imported library functions and symbols (though this task is easier if the binary was dynamically linked).

Another step of static analysis is disassembly of the binary. In this step, the raw machine code bytes are translated into assembly instructions. This step is difficult for x86 binaries because code and data can be interleaved, and because instructions are variable length. Additionally, both compiler optimizations and a crafty malware writer may take steps to further obfuscate the code. There are two common techniques for disassembling a binary. The first, linear sweep, iteratively disassembles one instruction at a time [66]. An attacker could complicate this process by inserting junk bytes between instructions that are never executed, and so do not alter control flow, but that may cause the disassembler to fall out of step with the instructions. The second technique uses a flow-oriented approach, disassembling instructions until a branch instruction is encountered, then building a list of locations to disassemble (for example, the locations of both the true and false branches of a conditional branch). Anti-disassembly techniques take advantage of the assumptions that a disassembler makes, resulting in inaccurate disassembly [66]. Additionally, disassembly can be difficult or impossible if the binary is compressed or encrypted, and will not capture the program’s behavior if it is unpacked as it runs. In short, static analysis can yield some good first observations about a binary, but it will not always provide a detailed evaluation of the malware’s behavior.
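The junk-byte trick can be demonstrated with a toy disassembler that understands only four real x86 encodings: NOP (0x90), RET (0xC3), short JMP (0xEB rel8), and MOV EAX, imm32 (0xB8 imm32); any other byte is shown as data. The byte sequence is a hypothetical example in which a jump skips a single junk 0xB8 byte. Linear sweep decodes the junk byte as the start of a 5-byte MOV and falls out of step, while the flow-oriented pass follows the jump and recovers the real instruction.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Insn:
    offset: int
    text: str
    length: int
    successors: List[int] = field(default_factory=list)  # control-flow targets


def decode(code: bytes, off: int) -> Insn:
    """Decode one instruction from a tiny subset of x86."""
    op = code[off]
    if op == 0x90:
        return Insn(off, "nop", 1, [off + 1])
    if op == 0xC3:
        return Insn(off, "ret", 1, [])                  # no fall-through
    if op == 0xEB and off + 2 <= len(code):
        rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
        dest = off + 2 + rel                            # relative to the next insn
        return Insn(off, f"jmp {dest:#x}", 2, [dest])   # unconditional
    if op == 0xB8 and off + 5 <= len(code):
        imm = int.from_bytes(code[off + 1:off + 5], "little")
        return Insn(off, f"mov eax, {imm:#x}", 5, [off + 5])
    return Insn(off, f"db {op:#04x}", 1, [off + 1])     # unknown byte: show as data


def linear_sweep(code: bytes) -> List[Insn]:
    """Decode back to back from offset 0, ignoring control flow."""
    out, off = [], 0
    while off < len(code):
        insn = decode(code, off)
        out.append(insn)
        off += insn.length
    return out


def flow_oriented(code: bytes, entry: int = 0) -> List[Insn]:
    """Decode only the offsets reachable from the entry point."""
    seen: Dict[int, Insn] = {}
    worklist = [entry]
    while worklist:
        off = worklist.pop()
        if off in seen or not 0 <= off < len(code):
            continue
        insn = decode(code, off)
        seen[off] = insn
        worklist.extend(insn.successors)
    return [seen[k] for k in sorted(seen)]


# jmp +1 ; junk 0xB8 ; mov eax, 0x2a ; ret
code = bytes([0xEB, 0x01, 0xB8, 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])
print([i.text for i in linear_sweep(code)])   # ['jmp 0x3', 'mov eax, 0x2ab8', 'db 0x00', 'ret']
print([i.text for i in flow_oriented(code)])  # ['jmp 0x3', 'mov eax, 0x2a', 'ret']
```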

2.2.2 Dynamic Analysis

Unlike static analysis, dynamic analysis watches the malware as it runs in order to observe its control flow and behavior. One dynamic analysis method is to use a debugger. A debugger is flexible; the analyst can set breakpoints to pause execution at any point in order to construct a control flow for the program, as well as examine memory and CPU registers. Unfortunately, malware writers have developed ways to check for the presence of a debugger: through an API, by looking for telltale signs of breakpoints (e.g., an INT 0x03 instruction, which implements a software breakpoint), by using the hardware breakpoint registers to stymie hardware breakpoints, or even by performing timing analysis to determine whether execution is taking too long. If any of these checks succeeds, the malware can simply quit or perform other benign activities in order to prevent the analyzer from learning anything useful about its behavior.

Dynamic analyzers can also utilize the same techniques that rootkits themselves use to monitor or change control flow. For example, the analyzer could hook into the system call interface using one of the methods described in Section 2.1.3, and then create a trace of system calls and their parameters in order to understand the behavior of the malware. Host-based tools could also use the Windows API to track processes, registry modifications, and file system operations. In exchange for the rich semantics that such approaches provide, the analyzer sacrifices fidelity, as malware operating at the same privilege level could use evasion techniques to undermine the analysis.

In order to operate at a higher privilege level than the malware, analyzers can run the malware in a sandbox, such as a virtualization solution (e.g., VMware [75], Xen [5]) or an emulation solution (e.g., Qemu [8]). This solution is also better logistically, since the analyzer can run the malware in an uncompromised sandbox, and then quickly and easily revert to a previously obtained clean snapshot to be ready for the next analysis. Furthermore, the analyzer can utilize Virtual Machine Introspection (VMI) techniques to understand what is occurring inside the VM, without hooking directly into the kernel structures. This way, the analyzer both operates at a higher privilege level (a so-called Ring -1 level) and looks at lower-level but higher-integrity data. For example, since a system call ID is stored in the EAX register before the SYSENTER instruction is executed, the VMI could create a system call trace by directly examining the EAX register when SYSENTER is executed. Since the analyzer does not rely on host-level interfaces (such as the Windows API) to obtain information, it is not as easily misdirected by malware.

It is in malware’s best interest to remain unanalyzable as long as possible, so that it can continue to survive in the wild, perhaps adding additional systems to its botnet or continuing to acquire financial gains. Because a virtualization layer allows analysis and security software to essentially run at a higher privilege level than the operating system, malware may test whether it is being run in an emulated or virtualized environment. If the test is positive, the malware may gracefully exit, or perform some benign operations, in order to hide its true malicious operation. These tests are referred to as red pills, and they can take many forms. Brute-force, high-level red pills include checking hardware adapters (for example, the VGA adapter in the VMware environment) for a well-known device string [15], or even checking whether the disk serial number or user name corresponds to those used by a well-known dynamic analysis emulator such as Anubis [50]. Red pills can also operate at a low level, such as checking for a well-known bug in common emulators. For example, malware can test whether it is being run in the popular emulator Qemu [8] by executing the following instruction [57]:

or %bh, 0x04(%ebx)

Due to a Qemu bug, the instruction will reference the wrong memory address, so the malware can detect that it is being run in an emulator based on the result of this operation. Even in the presence of perfect emulation, timing or secondary hardware effects (such as TLB flushes on VM exit operations) will still serve as red pills for malware [23]. It is generally agreed that it is impossible to guarantee perfect transparency for virtualization or emulation solutions.

2.3 Windows Concepts

Because the majority of malware attacks are directed at the Windows operating system [53], Dione performs disk instrumentation for the Windows NTFS file system.¹ In this section, we discuss relevant Windows-specific concepts, including the Windows registry, the particulars of the NTFS file system, and the disk optimizations made by the Windows cache manager.

2.3.1 The Windows Registry

The Windows registry is a centralized database for configuration data, storing information about hardware, device configuration, drivers, user preferences, network and firewall configuration, and program startup information [63]. The hierarchy of the Windows registry can be thought of in terms similar to a file system. At the top level, there are root keys; below each root key are more keys, or subkeys. Thus, keys can be thought of as directories, and each key will have a path, which can be fully qualified from the root key. Just like a directory in a file system, a key will also have a name. Figure 2.2 breaks down the creation of a new key. The key has path HKLM\system\CurrentControlSet\Services and name Beep. The key has no value; for clarity, we also list the type KEY.

¹ Plans to expand Dione to instrument other file systems, such as ext3, are left for future work.

Figure 2.2: Breakdown of a Windows Registry key.

Below each key are values. A value is analogous to a file. It has a path, which is composed of every key above it in the hierarchy. It has a name, and, just as with a file, the combination of a path and name uniquely identifies the registry value. Finally, just as a file often stores contents, a value stores contents as well. The common terminology is to (confusingly) refer to the data the value is storing as its value, referring to the value structure itself by its name. Keeping this terminology, we refer to the value contents as the value, though we may also refer to the value-name and value-value for clarity. The value may be one of several types, including an integer (DWORD), a string (SZ), or even arbitrary binary data (BINARY). In Figure 2.3, we show some new values created under the Beep key created in Figure 2.2. The Beep key that was previously created now becomes part of the path, and two values are created below it: Start, which is an integer type and stores the value 0x02, and DisplayName, which stores the string “BeepService”.

Figure 2.3: Breakdown of two Windows Registry values; each value is associated with a key (in this case, Beep) and has both a name and a value.
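The key, value-name, and value-value terminology maps naturally onto nested dictionaries. The `Registry` class below is a hypothetical in-memory model: it is not the on-disk hive format, and it ignores the case-insensitivity of real registry names. It recreates the Beep key of Figure 2.2 and the two values of Figure 2.3.

```python
from typing import Dict, Tuple, Union

Data = Union[int, str, bytes]


class Registry:
    """Hypothetical in-memory model: key path -> {value-name: (type, value)}."""

    def __init__(self) -> None:
        self.keys: Dict[str, Dict[str, Tuple[str, Data]]] = {}

    def create_key(self, path: str, name: str) -> str:
        """Create a key under path and return its fully qualified path."""
        full_path = f"{path}\\{name}"
        self.keys.setdefault(full_path, {})
        return full_path

    def set_value(self, key: str, name: str, vtype: str, data: Data) -> None:
        """(key, value-name) uniquely identifies a value, like (directory, file name)."""
        self.keys[key][name] = (vtype, data)

    def get_value(self, key: str, name: str) -> Tuple[str, Data]:
        return self.keys[key][name]


reg = Registry()
beep = reg.create_key(r"HKLM\system\CurrentControlSet\Services", "Beep")
reg.set_value(beep, "Start", "DWORD", 0x02)
reg.set_value(beep, "DisplayName", "SZ", "BeepService")
print(beep)                                 # HKLM\system\CurrentControlSet\Services\Beep
print(reg.get_value(beep, "DisplayName"))   # ('SZ', 'BeepService')
```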

There are six root keys in the registry; each has a long name, but is more commonly referred to by an acronym:

• HKEY_USERS (HKU): Stores configuration data for all users with accounts on the machine
• HKEY_CURRENT_USER (HKCU): Stores configuration data for the user that is currently logged in (and is actually just a link to the subkey in HKU for the logged-in user)
• HKEY_CLASSES_ROOT (HKCR): Stores file association and Component Object Model (COM) object registration information
• HKEY_LOCAL_MACHINE (HKLM): Stores system-related information
• HKEY_PERFORMANCE_DATA (HKPD): Stores performance information
• HKEY_CURRENT_CONFIG (HKCC): Stores a current hardware profile (and is actually just a link to a subkey under the HKLM root key)

Some of the data stored in the registry is populated on system startup and resides only in memory. Other data is stored on disk and is loaded into memory when the system starts up. On disk, the registry is stored in five (extensionless) hive files in the path WINDOWS\system32\config: default, SAM, SECURITY, software, and system. These files do not correspond to the root keys; most of the information stored in the hive files appears in the Windows registry under the HKLM root key. For example, the registry key of Figure 2.2 is persistent data stored in a hive file, yet in the registry hierarchy it falls below the HKLM root key. The data stored in the registry hive files has its own file system-like structure; open source tools including regfi [55] can parse the hive files, outputting registry key paths, names, and values.
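The relationship between hive files and the registry hierarchy can be sketched as a lookup table from each file to the key under which Windows mounts it. The dictionary, the function name registry_path, and the assumption that a hive parser reports paths relative to the hive's own root are all illustrative; per-user hives stored outside WINDOWS\system32\config are not covered.

```python
# Where each hive file in WINDOWS\system32\config appears in the registry.
HIVE_MOUNT_POINTS = {
    "sam": r"HKLM\SAM",
    "security": r"HKLM\SECURITY",
    "software": r"HKLM\SOFTWARE",
    "system": r"HKLM\SYSTEM",
    "default": r"HKU\.DEFAULT",
}


def registry_path(hive_file: str, path_in_hive: str) -> str:
    """Translate a hive-relative key path (hypothetical parser output) into a
    fully qualified registry path."""
    mount = HIVE_MOUNT_POINTS[hive_file.lower()]  # file names are case-insensitive
    path_in_hive = path_in_hive.strip("\\")
    return f"{mount}\\{path_in_hive}" if path_in_hive else mount
```

For example, registry_path("system", r"ControlSet001\Services\Beep") yields HKLM\SYSTEM\ControlSet001\Services\Beep. On disk the system hive holds numbered control sets such as ControlSet001; CurrentControlSet is a link that Windows creates in memory at boot, which is one instance of the memory-only registry data described above.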

2.3.2 NTFS File System

Many of the challenges of interpreting NTFS arise from its design goals of being scalable and reliable. Scalability is achieved through multiple levels of indirection. Reliability is accomplished through redundancy and by ordering writes in a systematic way to ensure a consistent result. Unfortunately, from the perspective of instrumentation and operation reconstruction, this ordering is often the least convenient one.

The primary metadata structure of NTFS is the Master File Table, or MFT [14].

The MFT is composed of entries, which are each 1 KB in size. Each file or directory

has at least one MFT entry to describe it. The MFT entry is flexible: The first 42

bytes are the MFT entry header and have a defined purpose and format, but the rest

of the bytes store only what is needed for the particular file it describes. Among other

things, the MFT header contains a sequence number (which is incremented whenever that entry is reused for a new file) and flags indicating whether the entry is currently in use and whether it describes a file or a directory.
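The fixed 42-byte header can be decoded with Python's struct module. The sketch below assumes the commonly documented little-endian field layout (signature, fixup offset and count, $LogFile sequence number, sequence number, link count, first-attribute offset, flags, used and allocated sizes, base-entry file reference, next attribute ID) and the flag bits 0x01 (in use) and 0x02 (directory); the class and function names are hypothetical. It skips the update-sequence (fixup) array, which only patches the last two bytes of each sector and so never touches these header fields.

```python
import struct
from dataclasses import dataclass

# Little-endian layout of the first 42 bytes of an MFT entry ("<" means no padding).
MFT_HEADER = struct.Struct("<4sHHQHHHHIIQH")
FLAG_IN_USE = 0x0001
FLAG_DIRECTORY = 0x0002


@dataclass
class MftHeader:
    sequence_number: int
    in_use: bool
    is_directory: bool
    first_attribute_offset: int
    base_entry_index: int  # 0 when this entry is itself a base entry


def parse_mft_header(entry: bytes) -> MftHeader:
    (sig, _fixup_off, _fixup_count, _lsn, seq, _links, attr_off,
     flags, _used, _alloc, base_ref, _next_attr_id) = MFT_HEADER.unpack_from(entry, 0)
    if sig != b"FILE":
        raise ValueError(f"bad MFT signature: {sig!r}")
    return MftHeader(
        sequence_number=seq,
        in_use=bool(flags & FLAG_IN_USE),
        is_directory=bool(flags & FLAG_DIRECTORY),
        first_attribute_offset=attr_off,
        # A file reference packs a 48-bit MFT index with a 16-bit sequence number.
        base_entry_index=base_ref & 0xFFFFFFFFFFFF,
    )
```

Parsing an entry like the one expanded in Figure 2.4 would report sequence number 1, in use, not a directory, and a base reference of 0. Tracking the in-use flag and sequence number across writes is what lets an observer notice that an entry has been freed and reused for a new file.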

In NTFS, everything is a file, even file system administrative metadata. This means that the MFT itself is a file called $MFT; its contents are the entries of the MFT (therefore, the MFT contains an entry describing itself). Figure 2.4 shows a representation

of the MFT file, and expands $MFT’s entry (which always resides at index 0 in the

MFT). Like any other file, the $MFT file expands and contracts as needed, and if the

disk is fragmented, the $MFT can expand into fragmented, non-consecutive clusters

anywhere on disk. This is shown in Figure 2.4, whereby the contents of $MFT are

stored in two non-contiguous runs of clusters.

[Figure 2.4 diagram: the MFT Entry Header (Signature: FILE, Seq Num: 1, In-Use: 1, Is-Directory: 0, Base Ref: 0), followed by the $STANDARD_INFORMATION attribute (Type ID: 16, Resident: 1, with Created, File Modified, MFT Modified, and Accessed timestamps), the $FILE_NAME attribute (Name: $MFT, Parent MFT: 5), and the $DATA attribute (Run 0: Start 104, Count 4; Run 1: Start 220, Count 2), whose runs point to clusters 104-107 and 220-221. Legend: Unused Space, Attribute Headers, Attribute Content.]

Figure 2.4: Representation of the MFT, which is saved in a file called $MFT. The first entry holds the information to describe $MFT itself; the contents of this entry are expanded to show the structure and relevant information of a typical MFT entry.

Everything associated with a file is stored in an attribute. The attribute types are predefined by NTFS to serve specific purposes. For example, the $STANDARD_INFORMATION attribute contains access times and permissions, and the $FILE_NAME attribute contains the file name and the parent directory's MFT index. Even the contents of a file (after all, a file's purpose for existing is to store contents) are stored in an attribute, called the $DATA attribute. The contents of a directory are references to its children; these too are stored in attributes (the $INDEX_ROOT and $INDEX_ALLOCATION attributes).

Each attribute consists of the standard attribute header, a type-specific header, and the contents of the attribute. If the contents of an attribute are small, then the contents follow the headers and reside in the MFT entry itself. Such attributes are called resident; a flag in the attribute header indicates whether the attribute contents are resident or not. In Figure 2.4, the contents of the $STANDARD_INFORMATION and $FILE_NAME attributes are resident. If the contents are large, then an additional level of indirection is used. In this case, a runlist follows the attribute header. A runlist describes all the disk clusters which actually contain the contents of the attribute, where a run is defined as a starting cluster address and a length of consecutive clusters. (In NTFS terminology, a cluster is the minimum unit of disk allocation, and is generally eight sectors long.) In the example MFT of Figure 2.4, since the contents of the MFT file are large, $DATA's contents are not resident; its runlist indicates that the contents of $MFT can be found in clusters 104-107 and 220-221.
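The runlist itself is stored compactly on disk, and decoding it is the step that maps a file's contents to the clusters (and therefore sectors) a disk monitor actually observes. The sketch below follows the widely documented NTFS encoding: a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of a signed offset relative to the previous run's start, a zero-size offset marking a sparse run, and a 0x00 byte ending the list. The function names and the eight-sectors-per-cluster default are our own assumptions, and sector numbers are relative to the start of the volume.

```python
from typing import List, Optional, Tuple


def decode_runlist(data: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode an NTFS runlist into (start_cluster, cluster_count) pairs.

    A start of None marks a sparse run (no clusters allocated on disk).
    """
    runs = []
    pos = 0
    prev_start = 0
    while pos < len(data) and data[pos] != 0:  # a 0x00 header byte ends the list
        header = data[pos]
        len_size = header & 0x0F  # low nibble: bytes in the length field
        off_size = header >> 4  # high nibble: bytes in the offset field
        pos += 1
        count = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, count))
        else:
            # Offsets are signed and relative to the previous run's start cluster.
            prev_start += int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            runs.append((prev_start, count))
        pos += off_size
    return runs


def clusters_to_sectors(start: int, count: int, sectors_per_cluster: int = 8) -> range:
    """Volume-relative sector numbers covered by one run."""
    return range(start * sectors_per_cluster, (start + count) * sectors_per_cluster)
```

Under this encoding, the runlist of Figure 2.4 would be the bytes 11 04 68 11 02 74 00, which decode_runlist turns back into [(104, 4), (220, 2)]; the second run's offset byte 0x74 (116) is added to the first run's start, 104, to give 220.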

It follows that a small file whose contents are resident occupies only the two sectors of its 1 KB MFT entry. A large file occupies the two sectors of its MFT entry, plus the content clusters themselves. Consider, then, the problem of a very large file on a highly fragmented disk: the content runlist alone might not fit in the 1,024-byte entry. In this case, NTFS scales with another level of indirection and another attribute ($ATTRIBUTE_LIST), and multiple MFT entries are allocated (in addition to the base entry) to store all attributes. Each of the non-base MFT entries contains the MFT index of the base entry; this reference is 0 in the base entry itself.

2.3.3 Performance Optimizations for Disk Accesses

Disk accesses are expensive in terms of performance. While accesses to disk may take

upwards of 5-10 ms, accesses to RAM in a modern computer may take 50-100 ns (with

cache speeds even faster) [25]. Therefore, the operating system uses optimizations to

minimize unnecessary disk accesses. One such optimization is the page cache. The

page cache is a buffer of disk-backed pages that are stored in main memory; as a

result, frequently accessed disk clusters are available more quickly. Disk contents are paged into the page cache at the granularity of clusters; this is convenient in modern systems because a cluster is typically the same size as a page (4 KB).
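Using the latency figures quoted above, the gap between disk and main memory spans roughly four to five orders of magnitude, which is why avoiding even a fraction of disk accesses pays off:

$$\frac{5\ \text{ms}}{100\ \text{ns}} = \frac{5\times10^{-3}\ \text{s}}{10^{-7}\ \text{s}} = 5\times10^{4}, \qquad \frac{10\ \text{ms}}{50\ \text{ns}} = \frac{10^{-2}\ \text{s}}{5\times10^{-8}\ \text{s}} = 2\times10^{5}$$

That is, a single disk access costs on the order of 5×10^4 to 2×10^5 RAM accesses.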

Windows has different policies for reads and writes as they relate to the page cache; these policies are carried out by the cache manager. The multi-threaded cache manager utilizes a thread for intelligent read-ahead. The goal of intelligent read-ahead is that the data will already be in faster main memory before it is needed. With

intelligent read-ahead, spatial locality is used to prefetch data from disk according

to some perceived pattern of read accesses. For example, if the reads are streaming

through the disk, the operating system will prefetch the next sequential clusters; if

the reads follow a strided pattern, the operating system will prefetch the next clusters

that follow the strided pattern. The size of data that is prefetched is double the size

of the last access.
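The read-ahead policy described above can be illustrated with a toy predictor. This is a minimal sketch under our own simplifying assumptions, not the actual Windows algorithm: the hypothetical function predict_read_ahead looks only at the last three reads, treats evenly spaced forward offsets as a sequential or strided pattern, and prefetches twice the size of the last request.

```python
from typing import List, Optional, Tuple

Access = Tuple[int, int]  # (byte offset, length)


def predict_read_ahead(history: List[Access]) -> Optional[Access]:
    """Toy read-ahead predictor (hypothetical, not the Windows implementation).

    Returns the (offset, length) to prefetch, or None if no forward
    sequential or strided pattern is visible in the last three reads.
    """
    if len(history) < 3:
        return None
    (o1, _), (o2, _), (o3, l3) = history[-3:]
    stride = o3 - o2
    if stride <= 0 or o2 - o1 != stride:
        return None  # reads are not evenly spaced moving forward
    # Sequential reads are the special case where the stride equals the read size.
    return (o3 + stride, 2 * l3)
```

For three sequential 4 KB reads at offsets 0, 4096, and 8192, the sketch prefetches 8 KB starting at offset 12288; for reads at 0, 64 KB, and 128 KB it prefetches at 192 KB instead, following the stride.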

For write accesses, Windows uses Lazy Writing, courtesy of the cache manager’s

delay thread [64]. Instead of immediately flushing writes to disk, writes are buffered


and flushed as a burst to disk. When a page is written to, it is marked dirty. Every

second, Windows flushes one-eighth of the dirty pages to disk; therefore, it could take

as long as 8 seconds for a write to be flushed to disk from RAM. The advantage of this

scheme is that it reduces contention on the disk; this happens because the number of

disk I/O operations is reduced when multiple writes occur to the same cluster within

a short time frame. Instead of flushing the write to disk every time a change is made,

it will only perform one write at the end of the interval. The performance advantage

comes at the cost of reliability; while the user is under the impression that a change

has been committed to persistent storage, it may actually still be in volatile memory

for several seconds more, and would be lost in the event of a hard shutdown.
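The worst-case delay and the coalescing benefit of lazy writing follow from a simple model. The sketch below is a simplification we assume for illustration (all writes arrive at once, no new writes during flushing, oldest pages flushed first); the function name lazy_flush_schedule is hypothetical and does not reflect the cache manager's real bookkeeping.

```python
import math
from typing import Dict, Hashable, List


def lazy_flush_schedule(writes: List[Hashable], fraction: int = 8) -> Dict[Hashable, int]:
    """Simplified lazy-writer model (not the actual Windows implementation).

    All writes arrive at time 0. Repeated writes to the same page coalesce into
    a single dirty page. Each second, 1/fraction of the dirty pages (oldest
    first) are flushed. Returns the second in which each page reaches disk.
    """
    dirty = list(dict.fromkeys(writes))  # de-duplicate, keeping first-write order
    if not dirty:
        return {}
    per_second = math.ceil(len(dirty) / fraction)
    return {page: i // per_second + 1 for i, page in enumerate(dirty)}
```

With 16 dirty pages, two are flushed per second and the last one reaches disk in second 8, matching the eight-second bound above; four writes to pages a, b, a, a produce only two disk writes, illustrating the reduced contention.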

2.4 Formal Verification and Model Checking

Formal verification is a technique that has been used to specify and validate sequential

circuit designs, communication protocols, and software correctness. Due to its ability

to model software behaviors in the face of obfuscated code, it has also been recently

used to model malware behaviors and capabilities.

Model checking is a property verification approach that compares a model ex-

tracted from a behavioral trace to a given specification [28]. The specification of the

property to be detected is represented by a formula φ, which is written using the

description language, or logic, of the model checker. Additionally, a model M is ex-

tracted from each behavioral trace, and represented in the same description language.

Then, the model from each behavioral trace is compared to the specification in order

to determine whether the model M satisfies φ. The model checker outputs true or

false, indicating whether the property is verified for the system.


The description languages used to describe models and property formulas are

based on propositional logic. Verifying a property requires constructing a declara-

tive statement, or proposition, about that system, and then determining whether

that proposition is true or false. Propositional logic provides a formal language for

describing these declarative statements, and includes the familiar operators of not (¬), and (∧), or (∨), and implication (⇒). For example, the proposition "(¬p ∨ r) ⇒ (p ∧ q)" can be translated as 'if not p or r, then p and q'.
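Because a propositional formula has finitely many atoms, its truth can be checked by enumerating every assignment. The short Python sketch below does this for the example proposition; the helper names implies and phi are our own.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    # Material implication: a ⇒ b is false only when a is true and b is false.
    return (not a) or b


def phi(p: bool, q: bool, r: bool) -> bool:
    # (¬p ∨ r) ⇒ (p ∧ q)
    return implies((not p) or r, p and q)


# Enumerate all 2^3 truth assignments and keep those that satisfy phi.
satisfying = [(p, q, r) for p, q, r in product([False, True], repeat=3) if phi(p, q, r)]
print(satisfying)  # [(True, False, False), (True, True, False), (True, True, True)]
```

The proposition can only hold when p is true: if p is false, the antecedent ¬p ∨ r is true while the consequent p ∧ q is false.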

To create a running example of a type of malware behavior that we want to provide

a formal specification for, let’s assume that we have a trace of all x86 instructions

executed, gathered using dynamic instrumentation. Our specification formalizes the

following behavior:

In the program execution path, at some point in time a register

is set to zero, and after that, this same register is eventually

pushed on the stack before any other modification occurs to

that register.

By looking at key words of that statement, we can determine the ideal way to

represent it. The word and indicates that we will need propositional operators, the

phrases at some point and after that imply a preferred temporal ordering, and the

phrase this same register implies that we care not just about the operations, but the

inputs to the operations.

Using pure propositional logic, we cannot be very specific in formulating this

specification. For simplification, we’ll focus on a single register, eax. We can only

specify that the trace should contain both an instruction that sets register eax to 0

(mov(eax,0)), and an instruction that pushes register eax onto the stack (push(eax)).

Each combination of instruction opcode and parameter(s) forms a single propositional atom; for example, mov(eax,0) and mov(eax,1) are as distinct as the atoms p and q above.

Therefore, our propositional statement is specified as:

$$\varphi = \mathit{mov}(\mathit{eax}, 0) \wedge \mathit{push}(\mathit{eax}) \tag{2.1}$$

and this statement only evaluates to true if both of the instruction opcode/parameter

events appear in the trace. Note that this statement does not specify an ordering

between the instructions, merely that they both must appear in the trace in order for

the statement to evaluate to true (a binary bag-of-words-like specification).
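Evaluating Equation 2.1 over a trace amounts to a set-membership test. In this Python sketch each trace event is assumed to be a tuple of opcode and operands, such as ("mov", "eax", 0); that encoding and the name phi_2_1 are illustrative.

```python
def phi_2_1(trace: list) -> bool:
    # φ = mov(eax,0) ∧ push(eax): each atom is true iff that exact event
    # occurs somewhere in the trace; the order of events is ignored.
    events = set(trace)
    return ("mov", "eax", 0) in events and ("push", "eax") in events
```

Because the check uses a set, a trace that pushes eax before zeroing it satisfies the formula just as well as one in the intended order.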

2.4.1 Predicate Logic

Predicate logic extends propositional logic, satisfying the need for a richer language. It includes quantifiers such as there exists (∃) and for all (∀). It also allows for the use of variables to generalize a statement, working as placeholders for concrete values.

In order to use Predicate Logic as a formal language, we define two types of

“objects” that can appear in a predicate logic statement: Terms and Formulas.

A term is an object; it can refer to a variable, or a function. Consider a formula

that will describe file properties. We can refer generically to our file as variable f ,

and we can describe certain properties of our file with functions of f , such as p(f),

the path of f , and n(f), the name of f . A term can be recursive: If x is a variable (a

term), and f(x) is a function of that variable (also a term), then g(f(x)), a function

of the function, is also a term.

Conversely, a formula is a predicate: a statement that resolves to true or false. For example, we can use predicates to describe whether a certain type of operation occurred on a certain file: C(f) is true if file f was created. Formulas can be connected using propositional symbols, such as ¬, ∧, ∨, and ⇒. For example, if φ1 and φ2 are formulas, then φ1 ∨ φ2 is also a formula. Formulas can also be combined with variables using the quantifiers ∃ and ∀. For example, if φ is a formula and f is a variable, then ∃fφ is also a formula, and reads as there exists some f for which the formula φ evaluates to true.

Given these two types of objects, we can define a vocabulary for Predicate Logic

(as a formal language) as having three sets: A set of predicate (or formula) symbols

P , a set of function symbols F , and a set of constant symbols C (since a constant is

a function without arguments, C can also be treated as a part of the function set F).

For the instruction trace example, our vocabulary of predicates consists of: P =

{mov(x, y), push(x)}, where x and y are variables (terms) over the set of Functions

F .

Using predicates allows us to write specifications that differentiate between the

type of operation (for example, the high level behavior, or the instruction opcode),

and the parameters of that operation.

For example, in Equation 2.1, we simplified the original statement, which referred

to “some register”, to refer only to register eax. If we were to keep the original

statement, the specification would be:

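$$\varphi = \big(\mathit{mov}(\mathit{eax}, 0) \wedge \mathit{push}(\mathit{eax})\big) \vee \big(\mathit{mov}(\mathit{ebx}, 0) \wedge \mathit{push}(\mathit{ebx})\big) \vee \big(\mathit{mov}(\mathit{ecx}, 0) \wedge \mathit{push}(\mathit{ecx})\big) \vee \cdots \tag{2.2}$$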

In this case, it is more succinct to generalize the statement using predicate logic, introducing a variable r that ranges over the finite set of registers. We combine the variable with the existential quantifier, there exists. Equation 2.2 can be rewritten in predicate logic as:

$$\varphi = \exists r\,\big(\mathit{mov}(r, 0) \wedge \mathit{push}(r)\big) \tag{2.3}$$

This translates to: In the program execution path, some register is set to zero, and this same register is pushed on the stack. However, this statement still says nothing about the ordering between the instructions.
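Because r ranges over a finite set, the existential quantifier in Equation 2.3 can be evaluated as exactly the finite disjunction of Equation 2.2. The sketch below assumes the eight 32-bit x86 general-purpose registers as the domain of r and the same tuple encoding of events as before; the name phi_2_3 is illustrative.

```python
# Assumed finite domain of r: the eight 32-bit x86 general-purpose registers.
REGISTERS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp")


def phi_2_3(trace: list) -> bool:
    # φ = ∃r (mov(r,0) ∧ push(r)); any() over the finite domain is the
    # disjunction of Equation 2.2, one disjunct per register.
    events = set(trace)
    return any(("mov", r, 0) in events and ("push", r) in events for r in REGISTERS)
```

The same register must appear in both events: zeroing ebx and pushing eax does not satisfy the formula, although ordering is still ignored.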

2.4.2 Temporal Logic

Model checking is based on temporal logic; that is, the system is represented as a sequence of states. A formula describing a property need not be true in every state of the model; rather, it may become true only at some point in time, once the system has moved through the right series of states.

There are two ways to think of time in a temporal formula. The first way to think

of time is as branching. With branching time, time is represented as a tree, with an

initial state as a single node at the root and possible future paths branching out from

that state. Branching time is useful when there are many possible paths, but not all

will occur, such as in a trace consisting of statically disassembled instructions of a

program. There are multiple possible paths of execution, though not all are taken.

For example, at a conditional statement, there are two different branches that are possible depending on whether the if or else branch is followed. The second way to think of time is as linear; that is, time is a set of paths, where each path is a sequence of states. For example, a trace of instructions intercepted during the course of execution of a program would be represented in linear time, as each event occurred sequentially in the path of execution.


Due to the temporal nature of model checkers, the language used to specify a

behavior or capability uses temporal operators, which define how the different states

connect to each other in time. As our dynamically-obtained traces of disk events occur

in linear time, we focus here on the logic specification language Linear Temporal Logic,

or LTL [28]. LTL contains the expected propositional operators, but also includes

temporal operators: X, F, G, U, and R. Respectively, these temporal operators

stand for neXt state, some Future state, Globally in all future states, Until, and

Release. More formally, for formulas p and q evaluated along a linear path π:

• Xp is true on a path π if p holds in the next state of π

• Fp is true if p holds at some point in the future on path π

• Gp is true if p holds globally, in every future state on path π

• pUq is true if q eventually holds on path π, and p holds in every state until then

• pRq is true if q holds on path π up to and including the first state in which p holds; the requirement on q is released once p holds. It is possible that p never holds, in which case q must hold in every state.
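These definitions translate directly into a small recursive evaluator. The sketch below is our own illustration, not the model checker used in this thesis: formulas are nested tuples such as ("U", p, q), atomic propositions are Python callables on a state, and, because recorded traces are finite, it assumes finite-trace semantics (X is false at the last state, and F, G, U, and R range only over recorded states). R is implemented through its standard duality with U.

```python
def holds(f, trace, i=0):
    """Evaluate LTL formula f at position i of a finite trace."""
    if callable(f):  # atomic proposition
        return i < len(trace) and f(trace[i])
    op, *args = f
    if op == "not":
        return not holds(args[0], trace, i)
    if op == "and":
        return holds(args[0], trace, i) and holds(args[1], trace, i)
    if op == "or":
        return holds(args[0], trace, i) or holds(args[1], trace, i)
    if op == "X":  # strong next: there must be a next state
        return i + 1 < len(trace) and holds(args[0], trace, i + 1)
    if op == "F":
        return any(holds(args[0], trace, j) for j in range(i, len(trace)))
    if op == "G":
        return all(holds(args[0], trace, j) for j in range(i, len(trace)))
    if op == "U":  # q at some j >= i, and p at every state in [i, j)
        p, q = args
        return any(holds(q, trace, j) and all(holds(p, trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    if op == "R":  # p R q is equivalent to not (not p U not q)
        p, q = args
        return not holds(("U", ("not", p), ("not", q)), trace, i)
    raise ValueError(f"unknown operator: {op!r}")


def atom(name):
    # An atomic proposition that is true in states (sets) containing `name`.
    return lambda state: name in state


p, q = atom("p"), atom("q")
```

On the trace [{"p"}, {"p"}, {"q"}], for instance, pUq holds but Gp does not, and qRp fails on [{"p"}, {"q"}] because p stops holding in the very state where q first holds.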

Using LTL, we can write a more precise specification by forcing a temporal ordering between the instructions. The following LTL formula translates to "in the program execution path, at some point, eax is set to zero, and immediately after that, eax is pushed on the stack."

$$\varphi = \mathbf{F}\big(\mathit{mov}(\mathit{eax}, 0) \wedge \mathbf{X}\,\mathit{push}(\mathit{eax})\big) \tag{2.4}$$

Alternatively, the following LTL specification allows other instructions to appear between the mov and push instructions, specifying: In the program execution path, at some point, eax is set to zero, and after that, eax is eventually pushed on the stack.

$$\varphi = \mathbf{F}\big(\mathit{mov}(\mathit{eax}, 0) \wedge \mathbf{F}\,\mathit{push}(\mathit{eax})\big) \tag{2.5}$$
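The difference between Equations 2.4 and 2.5 is easiest to see by evaluating both over the same traces. This Python sketch hard-codes the two formulas as index checks over a finite trace, using the same illustrative tuple encoding of instruction events; the names phi_2_4 and phi_2_5 are ours.

```python
MOV_EAX_0 = ("mov", "eax", 0)
PUSH_EAX = ("push", "eax")


def phi_2_4(trace: list) -> bool:
    # F(mov(eax,0) ∧ X push(eax)): the push must be the very next event.
    return any(trace[i] == MOV_EAX_0 and trace[i + 1] == PUSH_EAX
               for i in range(len(trace) - 1))


def phi_2_5(trace: list) -> bool:
    # F(mov(eax,0) ∧ F push(eax)): the push may occur at any later point.
    return any(trace[i] == MOV_EAX_0 and PUSH_EAX in trace[i:]
               for i in range(len(trace)))
```

A trace with an unrelated instruction such as ("add", "ebx", 1) between the two events satisfies Equation 2.5 but not Equation 2.4, and neither formula accepts a push that precedes the mov.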

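Again, LTL shows its limits when we generalize the statement over all registers, as the statement becomes unnecessarily complex:

$$\varphi = \mathbf{F}\big(\mathit{mov}(\mathit{eax}, 0) \wedge \mathbf{F}\,\mathit{push}(\mathit{eax})\big) \vee \mathbf{F}\big(\mathit{mov}(\mathit{ebx}, 0) \wedge \mathbf{F}\,\mathit{push}(\mathit{ebx})\big) \vee \mathbf{F}\big(\mathit{mov}(\mathit{ecx}, 0) \wedge \mathbf{F}\,\mathit{push}(\mathit{ecx})\big) \vee \cdots \tag{2.6}$$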

2.4.3 Linear Temporal Predicate Logic

Formulas in Linear Temporal Logic are composed only of propositions (other formulas connected by propositional and temporal operators), while predicate logic lacks a notion of time. Therefore, in order to write robust yet succinct specifications, it makes sense to combine the two to form Linear Temporal Predicate Logic (LTPL). As with predicate logic, LTPL allows us to differentiate between operations and parameters, while maintaining the temporal operators that specify the ordering between operations. Using LTPL, we can write:

$$\varphi = \exists r\,\Big(\mathbf{F}\big(\mathit{mov}(r, 0) \wedge \mathbf{F}\,\mathit{push}(r)\big)\Big) \tag{2.7}$$

Translating to: There exists some register r, which is assigned the value

0, and at some point in the future, that same register r is pushed onto the

stack.


Using LTPL, we can even succinctly specify the original statement, using multiple variables and multiple temporal operators:

$$\varphi = \exists r\,\Big(\mathbf{F}\big(\mathit{mov}(r, 0) \wedge \mathbf{X}\big((\neg \exists t\,\mathit{mov}(r, t))\,\mathbf{U}\,\mathit{push}(r)\big)\big)\Big) \tag{2.8}$$

Translating to: In the program execution path, at some point, some register

is set to zero, and after that, this same register is eventually pushed on

the stack before any other modification occurs to that register.
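Equations 2.7 and 2.8 can be checked over a finite instruction trace by iterating over the registers that actually appear in it, since a register that never occurs cannot satisfy mov(r, 0). The sketch below is illustrative only: it uses the same tuple encoding of events, reads Equation 2.8 with X applying to the entire Until subformula, and, like the formula, treats only mov instructions as modifications of r.

```python
def registers_in(trace):
    # Finite domain for the quantified variable r: every first operand in the trace.
    return {event[1] for event in trace if len(event) > 1}


def phi_2_7(trace):
    # ∃r F(mov(r,0) ∧ F push(r))
    return any(trace[i] == ("mov", r, 0) and ("push", r) in trace[i:]
               for r in registers_in(trace) for i in range(len(trace)))


def phi_2_8(trace):
    # ∃r F(mov(r,0) ∧ X((¬∃t mov(r,t)) U push(r)))
    for r in registers_in(trace):
        for i, event in enumerate(trace):
            if event != ("mov", r, 0):
                continue
            # From the next state on, push(r) must occur before any mov(r, t).
            for later in trace[i + 1:]:
                if later == ("push", r):
                    return True
                if later[:2] == ("mov", r):
                    break  # r was modified first; try another mov(r, 0)
    return False
```

Overwriting eax with a nonzero constant between zeroing and pushing it still satisfies Equation 2.7 but not Equation 2.8, whereas a mov into a different register does not break Equation 2.8.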

2.5 Summary

In this chapter, we outlined the background information that is relevant to this the-

sis. In discussing the types of malware and the techniques employed by malware to

hide from security products and analyzers, we motivated the need for an analyzer

that provides a high-integrity, behavioral trace. We discussed the techniques used

by malware analyzers to understand malware behavior, including static and dynamic

techniques. To prepare for our description of Dione, our file system analysis frame-

work, we introduced relevant Windows concepts, including the NTFS file system and

the optimizations used by Windows to increase disk access performance. Finally, in

advance of our technique to detect malware persistence capabilities, we introduced

model checking, and described the logic languages to generate specifications of mal-

ware behaviors.


Chapter 3

Related Work

3.1 Malware Analysis and Instrumentation

The ability to instrument disk accesses and file system operations is useful in many

security fields, including intrusion detection and prevention and malware analysis.

However, Dione is the first disk analysis infrastructure to provide live, up-to-date

instrumentation for Windows NTFS file systems.

Research on Intrusion Detection Systems (IDSs) has frequently included tech-

niques to monitor disk accesses or modifications to the file system [3, 31, 33, 35, 58,

59, 73, 77, 81]. Kim and Spafford demonstrated that malware intrusions could be de-

tected by monitoring Unix systems for unauthorized modifications to the file system

with Tripwire [35]. Tripwire performed file-level integrity checks and compared the

result to a reference database. While it worked well to discover modifications to files, it did not discover changes that were reverted before the utility was run again. Furthermore, it inherently produced many false positives. Stolfo et al. also developed a host-based anomaly detection system which monitored changes to the file system [73]. Their File Wrapper Anomaly Detection System (FWRAP) consisted

of a host-based sensor which wrapped around a modified file system to extract infor-

mation about each file access. Their anomaly detection algorithm then determined

the probability that a file access was abnormal and generated an alert based on the

score. Both of these host-based solutions require a trusted OS, whereas Dione does not require the host to be uncompromised.

On the other end of the spectrum from host-based solutions, Pennington et al.

implemented a rule-based IDS that resided on an NFS server [59]. The authors

enumerated the specific ways in which malware modifies data on disk, such as modi-

fications to system administration files, log scrubbing, and timestamp reversal. Their

IDS was effective at catching rootkits that modified persistent data on disk. It was,

however, implemented for Linux (which has a far lower share of malware intrusions),

and it resided on a separate storage processor, and thus could not be easily utilized for

a desktop computer. Dione addresses both of these issues, as it monitors Windows

systems with NTFS file systems, and it can monitor either a virtual machine or any

desktop with an interposing hardware sensor.

While host-based IDSs are problematic because a privileged rootkit can override

or misdirect malware detectors, and network-based IDSs lack visibility into the host

events, virtualization-based IDSs offer both high visibility and isolation from com-

promised operating systems. Garfinkel and Rosenblum introduced the first IDS to

leverage virtualization technology, thus revolutionizing malware detection [24]. Their

IDS, Livewire, utilized Virtual Machine Introspection (VMI) techniques, such as the

monitoring of memory and register contents and events such as interrupts, memory

accesses, and device state changes. However, it did not incorporate disk accesses,

thus missing out on additional system information.


Payne et al. proposed requirements that should guide any virtual machine mon-

itoring infrastructure, and implemented XenAccess to incorporate VMI capabili-

ties [58]. We observed their requirements in our implementation of Dione, as they

provide an excellent guide for the proper design of an infrastructure for monitoring VMs. The disk-monitoring capabilities of their proof-of-concept implementation, however, can only be used for paravirtualized guest OSes, a much simpler problem than interpreting a complex file system like NTFS for fully virtualized Windows guest OSes.

Azmandian et al. used low-level architectural events and disk and network accesses

in their machine learning-based VMI-IDS [3]. While their instrumentation platform

captured more types of events in addition to disk accesses, providing a rich set of

features for their IDS, their disk instrumentation lacked the higher-level file system

semantics provided by Dione.

The work of Zhang et al. is very similar to ours; they presented a VMI-IDS that

monitored the disk accesses of the virtual machine under analysis [81]. Their IDS

creates a mapping between files and their sectors and monitors accesses to these sectors. Their system allows for the creation of rules that watch for the types of accesses, discussed in [59], that might indicate an intrusion. However, their monitoring framework depends on virtualization technology, and it only supports FAT32 file systems, a significantly simpler challenge.

Jiang et al. also implemented a VMI-IDS, called VMwatcher, which incorporates

disk, memory, and events [31]. However, they too cannot analyze the ubiquitous

NTFS file system, and instead their Windows VMs must use the Linux ext2/ext3

file systems. The VMI-IDS of Joshi et al. detects intrusions before the vulnerability is disclosed [33]. Unfortunately, their solution to inspecting disk accesses requires invoking code in the address space of an application process within the guest operating system itself, so to undo the effects of this intrusive action, their heavyweight solution must checkpoint and roll back.

With Ghostbuster, Wang et al. [77] present a cross-view diff-based approach to de-

tecting rootkits. By enumerating files, configuration settings, and processes at both a

high-level (Windows APIs) and low-level (examining the data structures themselves),

Ghostbuster can determine whether a stealthy rootkit is hiding evidence of infection.

Their host-based solution has the advantage of being able to check for more than

just file system operations; however, it will not provide a ground truth. While their

approach will catch many file-evasion techniques (including many described in Sec-

tion 2.1.3), it can still be evaded by particularly stealthy malware that interposes

between the calls to obtain the raw metadata used to construct the low-level view.

Furthermore, it only detects file hiding, as opposed to other file system operations, and it performs detection on dedicated snapshot views of the file system. Dione, in contrast, continually updates its view as metadata is written to disk, and thus remains up to date.

Other researchers have acknowledged the role of disk accesses in malware intru-

sions by providing rootkit prevention solutions [11, 21]. With Rootkit Resistant Disks,

Butler et al. provide a means to block accesses to directories containing sensitive op-

erating system configuration files and executables [11]. Their hardware-based solution

requires that all sensitive directories reside on a separate partition from the rest of

the file system, and they physically block access to that partition unless a secure

token is present. Chubachi et al. also provide a mechanism to block accesses to disk, and it can operate at file-level granularity [21]. Unfortunately, they collect their mappings of files and sectors before the VM boots, and do not update them live as files are created, deleted, or changed in size. As a result, their sector watch list becomes inaccurate as the VM executes. Sundararaman et al. protect disks with a different approach: they developed a new disk format which provides data versioning for rollback in the event of an intrusion [74]. They selectively version all metadata and user-specified content, giving users block-level protection of their disks guided by high-level semantics. However, it requires modifying the file system, and thus is only applicable to open-source file systems.

Previous work has also addressed the need for malware analyzers, with different

solutions operating at different levels of semantics and isolation from malware. Sev-

eral solutions perform malware analysis in-host [23, 62, 78]. DiskMon, part of the

SysInternals tools for Windows, is an in-host solution which uses kernel event tracing

to track file system operations [62]. Another solution for dynamically analyzing malware samples (including file system operations) is CWSandbox, which uses hooks in the Windows API to obtain the information it needs and to hide from malware [78]. Since both solutions reside in-host, malware could detect their presence (for example, by checking for hooks in the Windows API) and attempt to deceive the analyzers by providing their own in-host hooks.

Many dynamic analyzers instrument the behavior of malware by tracing system

calls [23, 38, 43, 41, 68]. These analyzers can use an emulation or virtualization layer

to achieve isolation from the malware, and perform low-level semantic reconstruction

by introspecting on registers and the VM’s memory.

King and Chen’s BackTracker uses a virtualized environment to gather process

and file system-related events that led to a system compromise of a Linux guest [38].

Despite BackTracker's residence in the virtualization layer, the authors concede that malware can hide from it, preventing live analysis. Sitaraman and Venkatesan extended the functionality of BackTracker, providing improvements that reduce the size of the dependency graph it generates [68]. However, their event logger is compiled into the kernel or implemented as a loadable kernel module, and as such is not isolated from the malware.

The work of Krishnan et al. creates a whole-system analysis by monitoring disk

accesses, physical memory, and system calls, and reconstructing their intertwined

relationships to provide a complete post-mortem forensic analysis [41]. Their disk

monitoring infrastructure logs accesses to disk blocks and periodically performs a

scan of the disk to connect blocks to files. The result is that their mappings are only

accurate at the time of the scan, and do not reflect the file system changes that may

occur between scans. Dione, on the other hand, uses live updating to maintain a perpetually up-to-date view of the file system for accurate file system analysis.

Kruegel et al. developed TTAnalyze (later renamed Anubis) to profile malware

behavior, including file system activities, of Windows systems emulated in Qemu [43].

This approach has many advantages. Their instrumentation provides a rich opportu-

nity to track all system calls and their parameters, and also all Windows API functions

and parameters. This allows them to have a full-system, on-the-fly reconstruction.

They can also identify the running process to limit the trace to only the functions

called by the malware sample.

Ether is another dynamic malware analyzer which isolates itself from malware

through the virtualization layer [23]. Ether monitors malware at different levels of

granularity: a fine-grained (instruction-level) granularity, or a coarse-grained (system

call level) granularity. The goal of Ether is complete transparency, so that the malware

cannot detect that it is being analyzed. Unfortunately, the performance cost of Ether


is steep (approximately 3000 times slowdown for single-step instrumentation [79]).

Chow et al. introduced the idea of a replay approach with Aftersight [16]. Though Aftersight was built with bug detection in mind, rather than malware analysis, it provided an interesting foundation for future work. Yan et al. expanded on the heterogeneous replay approach; their V2E records the malware’s behavior in a transparent virtual machine, then replays that behavior in a software-based dynamic binary analysis platform [79]. V2E provides both transparency and strong instrumentation support, without the high overhead seen in Ether. The authors demonstrated that V2E could defeat common anti-emulation attacks.

While Dione provides only file system-level instrumentation traces, and many of these analyzers provide multi-faceted analysis information, Dione offers some advantages not found in the other analyzers. First, the other analyzers are inextricably tied to the platforms they were developed for (for example, Qemu and Xen). Therefore, they cannot be ported to other environments in order to provide side-by-side comparisons of environment-sensitive malware. Dione, by contrast, can be ported to a variety of virtualization, emulation, and bare-hardware platforms, and will produce comparable output reports.

Additionally, these analyzers cannot provide the same level of ground truth that Dione can provide. In-host solutions can be misled by malware that uses lower-level call table hooking or filter drivers; analysis could also be bypassed if the analyzer installs its own hooks and the malware restores the call tables to eliminate them. Even analyzers that reside in the virtualization or emulation layer can, in theory, be intentionally misled by malware. For example, consider an analyzer that identifies library calls by comparing the executing instruction pointer with the exported addresses of library functions. Malware could hook a call table (such as the SSDT, as described in Section 2.1.3), diverting system calls to a different location in memory that does not correspond to any exported library address. As a result, the system call would not be recorded unless the hook eventually returned control to the original library function. Analyzers that rely on other indicators of the system call ID (for example, recording the value in EAX at each SYSENTER invocation) might even be misled by hypothetical malware that encodes the system call dispatch ID before the SYSENTER transition and decodes it afterward.¹ Since Dione intercepts raw disk accesses, and relies only on state changes and the actual intercepted disk sectors and their contents, it cannot be misled by kernel-level malware.

Finally, many other analyzers can obtain information only from intercepted system calls and possibly their parameters (Anubis, by contrast, can obtain more information both by also monitoring Windows API calls and by injecting new function calls into the executing instruction stream). Dione intercepts all of a file’s raw metadata, and can therefore determine every property of that file, whether or not that property can be read or modified through a Windows API function or system call.

3.2 Characterizing Malware Behavior

Though capability labeling of malware behaviors is a relatively recent discipline, it draws upon work from several related areas. Specifically, previous work has used behavioral traces and profiling to label malware by its family or variant (malware classification and clustering) or by its maliciousness (intrusion detection). Since the templates or specifications of malicious behavior used in these areas tend to identify specific malware capabilities and techniques, research in these areas is directly related to labeling based on the capabilities themselves. Behavioral traces can be generated either statically or dynamically. From these traces, researchers can characterize malware using one of several techniques: machine learning, informal modeling, or formal verification.

¹ It is worth noting that such deceptive system call obfuscation would be performed solely to thwart VMI- or emulator-based analyzers; it is therefore far more likely that malware would simply detect the analyzer and exit.

3.2.1 Characterizing Malware with Machine Learning

Malware clustering and classification follow naturally from the generation of malware analysis traces. Lee et al. conducted early research on behavioral, rather than signature-based, clustering and classification of malware samples [48]. However, they used simple system call traces to describe malware behavior, and more recent research has shown that better results can be achieved with a high-level, behavior-based approach than with system call traces [4, 6].

Bailey et al. also used malware behavior to create a fingerprint [4]. Rather than system calls, they focus on higher-level descriptions of what the malware is doing on the system. They showed that existing antivirus solutions for characterizing malware are inconsistent across products, incomplete across malware strains, and lack concise semantics. With a classification technique that focuses on system state changes, instead of low-level system calls or binary signatures, they classified malware more effectively than existing antivirus products, including malware that had not been seen before and therefore had no signature.

Similarly, Bayer et al. use a behavioral profile, rather than just a system call trace, to cluster malware [6]. They introduce taint analysis to Anubis to track dependencies between both native API and Windows API functions, and also track control flow dependencies and network traffic, in order to generate the behavioral profile. This work achieved better clustering than that of Bailey et al., and with its Locality-Sensitive Hashing (LSH)-based clustering algorithm it can scale to real-world data sets. The authors also achieved significantly better results than a purely system call-based approach, which they attribute to excessive noise in system call traces.

Jang et al. present BitShred, a clustering technique for malware triage [30]. They use feature hashing to reduce the high-dimensional feature space drawn from behavioral profiles, and use the Jaccard and BitVector Jaccard distances to measure similarity.
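To make the combination of feature hashing and bit-vector Jaccard similarity concrete, the following Python sketch hashes each behavioral feature into one bit of a fixed-width vector and compares two vectors with bitwise AND and OR. It illustrates the general technique only and is not BitShred's implementation; the feature strings, the vector width `m=1024`, and the use of SHA-1 as the hash function are assumptions chosen for the example.

```python
import hashlib


def feature_bits(features, m=1024):
    """Feature hashing: map each feature string to one bit of an m-bit vector.

    The vector is stored as a Python int; distinct features may collide.
    """
    bits = 0
    for feature in features:
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def bitvector_jaccard(a, b):
    """Jaccard similarity of two bit vectors: popcount(a & b) / popcount(a | b)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty profiles are treated as identical
    return bin(a & b).count("1") / union


def bitvector_jaccard_distance(a, b):
    return 1.0 - bitvector_jaccard(a, b)


# Hypothetical behavioral features for two samples.
profile_1 = {"file_create:system32/wbem/sysmgr.dll", "reg_set:Services/sysmgr/ImagePath", "file_delete:self"}
profile_2 = {"file_create:system32/wbem/sysmgr.dll", "reg_set:Services/sysmgr/ImagePath", "net_connect:tcp/80"}
print(bitvector_jaccard(feature_bits(profile_1), feature_bits(profile_2)))
```

Without hash collisions the printed value equals the exact Jaccard similarity of the two feature sets (2 shared features out of 4 distinct ones, or 0.5); collisions can shift the estimate in either direction. The benefit is that every profile becomes a fixed-width vector whose pairwise similarity costs only a few bitwise operations.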

Rather than focusing on clustering known malware variants, Rieck et al. developed a classification scheme that can determine whether a new malware instance belongs to a known malware family or is a new malware strain [60]. Behavior traces are obtained from the in-host CWSandbox [78] dynamic analysis platform. They use a Support Vector Machine (SVM) model to classify new behavior; with two variants of the model, they can either perform multi-class classification or predict and detect novel malware behavior. Additionally, for each malware family they obtain a feature ranking in order to gain additional insight into its typical behavior patterns.

Rieck et al. later argued that batch clustering of malware samples can be extended to include the iterative classification of new samples [61]. However, their approach is closer to system call analysis: they capture system call traces, encode them in their Malware Instruction Set (MIST), and create behavioral patterns by sliding a window over the resulting instruction stream. They bridge clustering and classification with prototypes: reports that demonstrate typical behavior for homogeneous groups. These prototypes maintain intermediate results, such as cluster assignments from previous iterations of the algorithm, for use in incrementally analyzing new samples.
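The sliding-window step can be illustrated with a short Python sketch. It assumes the system call trace has already been encoded as a list of instruction tokens (the token names below are hypothetical and far simpler than MIST, which also encodes argument information), extracts every n-gram with a window of length n, and normalizes the n-gram counts into a frequency profile that a clustering or classification step could compare. The frequency normalization is an illustrative choice, not necessarily the embedding used by Rieck et al.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Slide a window of length n over the instruction stream."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def behavior_profile(tokens, n=2):
    """Map each n-gram to its relative frequency within the trace."""
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


# Hypothetical encoded trace made of "category.operation" tokens.
trace = ["file.open", "file.read", "file.read", "reg.set", "proc.create"]
for gram, freq in sorted(behavior_profile(trace, n=2).items()):
    print(gram, round(freq, 2))
```

Each of the four distinct bigrams in this trace receives a weight of 0.25. Larger windows capture longer behavioral patterns at the cost of a sparser feature space.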

Though this work had some success in identifying malicious code behavior, it depended on static analysis using disassembly, which is known to be easily defeated by hand-tuned malicious assembly code [66].

Interestingly, there has been enough work in malware classification and clustering that additional research now seeks to verify, analyze, and even constructively criticize the research and evaluation of previous work. Since classification and clustering algorithms require some sort of distance metric to determine how similar two pieces of malware are, Apel et al. devoted research to evaluating different types of distance metrics [2], and found that the Manhattan distance best satisfied their criterion. Li et al. attempt to shed light on the inherent problems of using machine learning for classification by constructively criticizing the evaluation of previous work [49], including the state of the art by Bayer et al. [6]. They conclude that the problem arises from the lack of “ground truth”: using malware samples that can be identified by antivirus scanners biases the corpus in favor of easy-to-cluster instances. While the problem is still not solved, it is useful to consider such effects when evaluating clustering algorithms.

3.2.2 Characterizing Malware Using Modeling

Instead of applying machine learning techniques to behavioral profiles, complementary research aims to use formal [7, 9, 20, 19, 36, 37, 67, 70, 69, 71] and informal [39, 44, 72] verification techniques to label or classify malware samples.

Kruegel et al. aimed to determine whether a Loadable Kernel Module (LKM) in Linux behaves like a rootkit when loaded into kernel space [44]. They created an abstract model of program behavior using static analysis, generating a control flow graph of preprocessed kernel module code, and compared it to an informally defined specification of rootkit behavior. Kirda et al. also generated informal specifications of spyware behavior, using a combined static and dynamic analysis approach [39] and a customized browser instrumentation infrastructure.

In their work on AccessMiner, Lanzi et al. analyze and model benign program behavior to better understand malicious behavior [47]. After first attempting a machine learning-based approach and demonstrating that n-gram sliding windows over system calls do not produce sufficiently accurate results, they model benign behavior with an access activity model, in which benign activity is expressed in terms of access tokens to system resources (e.g., files and registry keys).

Researchers have observed that a common behavior of malicious programs relates to the way sensitive data is treated, and have developed informal policies to define this behavior [72, 80]. Stinson et al. informally define malicious bot behavior as behavior in which data received from the network is subsequently used as an input parameter to a system call; that is, an untrusted source is fed into a trusted sink [72]. They use system call interposition and tainting to perform their dynamic analysis. With Panorama, a whole-system information flow tracking system, Yin et al. also used taint propagation to detect malicious behavior. Panorama can detect when a malware sample accesses sensitive data that it should not have access to, and can track what it does with that data [80].
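The source-to-sink policy of Stinson et al. can be illustrated with a toy taint tracker in Python. Each value carries a taint bit: data received from the network (the untrusted source) is tainted, taint propagates through string concatenation, and a check at the system call boundary (the trusted sink) flags any tainted argument. The `Value` type, the helper names, and the single propagation rule are assumptions made for illustration; the systems cited above track taint at a much finer granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    data: str
    tainted: bool = False


def from_network(data):
    """Untrusted source: everything received from the network is tainted."""
    return Value(data, tainted=True)


def from_constant(data):
    """Trusted data embedded in the program itself."""
    return Value(data, tainted=False)


def concat(a, b):
    """Propagation rule: the result is tainted if either operand is."""
    return Value(a.data + b.data, a.tainted or b.tainted)


def syscall_is_suspicious(name, *args):
    """Trusted sink: report whether any argument of system call `name` carries network taint."""
    return any(arg.tainted for arg in args)


command = from_network("update.exe")
path = concat(from_constant("C:\\TEMP\\"), command)
print(syscall_is_suspicious("NtCreateFile", path))                      # True
print(syscall_is_suspicious("NtCreateFile", from_constant("app.log")))  # False
```

The first call is flagged because a file name that arrived over the network reached a system call argument, which is exactly the bot-like pattern described above.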

Many researchers have chosen static analysis to obtain the traces used in their behavioral analysis. Bergeron et al. were among the first researchers to use formal verification techniques to detect malicious code patterns in malware [9]. The authors used static analysis to generate a control flow graph of security-critical API calls, and then used model checking to verify these graphs against a malicious code specification. Likewise, Singh et al. identify fundamental functionality that sufficiently captures the malicious properties of a virus; they call these components organs, and they include survey, concealment, propagation, injection, and self-identification [67]. They use Linear Temporal Logic (LTL) formulas to encode malicious behaviors. However, these early works do not provide comprehensive evaluations of their methodologies on sufficiently large sets of real-world malware samples.
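To show what checking a temporal-logic specification against a behavioral trace involves, the Python sketch below evaluates propositional LTL formulas over a finite trace, where each position is the set of events observed at that step. It is deliberately simpler than the logics discussed in this section: it has no predicates or quantified variables (as in CTPL, LTPL, or FOLTL), no stack semantics, and it uses finite-trace semantics in which X (next) is false at the last position. The event names are hypothetical.

```python
def holds(f, trace, i=0):
    """Evaluate LTL formula f at position i of a finite, non-empty trace.

    Formulas are nested tuples, e.g. ("F", ("ap", "execute")).
    """
    op = f[0]
    if op == "ap":                      # atomic proposition
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "X":                       # next (false at the final position)
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "F":                       # eventually
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "G":                       # always
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "U":                       # f[1] holds until f[2] holds
        for j in range(i, len(trace)):
            if holds(f[2], trace, j):
                return True
            if not holds(f[1], trace, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op!r}")


# "Eventually a file is downloaded, and at or after that point it is executed."
download_then_execute = ("F", ("and", ("ap", "download"), ("F", ("ap", "execute"))))

print(holds(download_then_execute, [{"download"}, {"write"}, {"execute"}]))  # True
print(holds(download_then_execute, [{"execute"}, {"download"}]))            # False
```

The second trace fails because the only execution precedes the download, which illustrates why temporal operators, and not just the presence of events, are needed to specify behaviors.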

Kinder et al. also sought to describe and identify malware based on behavioral signatures using model checking [36]. In order to describe these behaviors succinctly and comprehensively, they developed and demonstrated the use of a new temporal logic, Computation Tree Predicate Logic (CTPL), on statically generated instruction traces. They demonstrated that the same specification of malicious behavior could be used to identify several different real-world worms [37]. Song and Touili extended CTPL to express stack operations [70]. The resulting logic, called SCTPL, allowed them to model a program as a Pushdown System with predicates over the stack. They further expanded this work to produce SCTPL formulas that consider the values, rather than the names, of registers and memory locations [69], and also improved the efficiency of the detection algorithm. Finally, they abandoned the branching-time CTL variants in favor of a linear temporal logic [71]. In doing so, they describe LTPL, a linear temporal logic with predicates, and then extend it with stack semantics for their Pushdown System, yielding SLTPL. Similarly, Beaucamps et al. used a variation of LTL with predicates [7]. Their two contributions were to abstract static traces into high-level behaviors, and then to use model checking to compare these behaviors against a malware specification expressed in First-Order Linear Temporal Logic (FOLTL). While these papers used model checking in a novel way, they were more concerned with intrusion detection (labeling a sample as malicious or benign) than with generating specifications and detecting capabilities in samples, both malicious and benign, as our work does.


Christodorescu et al. provided a richer specification of malware behavior [20]. They developed formal templates of malicious behavior consisting of instruction sequences with variables and symbolic constants; malicious behavior is detected when a sample’s instruction sequence matches a template.
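A simplified form of template matching with variables can be sketched in Python. A template is a list of instructions whose operands may be variables (written here with a leading `?`), and a match requires every occurrence of a variable to bind to the same concrete operand. This sketch matches only contiguous, syntactically identical sequences, whereas the approach of Christodorescu et al. is semantics-aware and tolerates reordered or interleaved instructions; the instruction encoding and the example template are hypothetical.

```python
def unify(pattern, operand, bindings):
    """Bind a template variable (leading '?') or compare a literal operand."""
    if pattern.startswith("?"):
        if pattern not in bindings:
            bindings[pattern] = operand
            return True
        return bindings[pattern] == operand
    return pattern == operand


def match_at(template, instrs, start):
    """Return variable bindings if the template matches at instrs[start:], else None."""
    bindings = {}
    window = instrs[start:start + len(template)]
    for (t_op, t_args), (op, args) in zip(template, window):
        if t_op != op or len(t_args) != len(args):
            return None
        if not all(unify(p, a, bindings) for p, a in zip(t_args, args)):
            return None
    return bindings


def find_template(template, instrs):
    """Return (offset, bindings) for the first match, or None."""
    for start in range(len(instrs) - len(template) + 1):
        bindings = match_at(template, instrs, start)
        if bindings is not None:
            return start, bindings
    return None


# Hypothetical "load, decrypt, store" template.
decrypt_step = [("mov", ("?R", "?SRC")), ("xor", ("?R", "?KEY")), ("mov", ("?DST", "?R"))]
code = [("push", ("ebp",)), ("mov", ("eax", "[esi]")), ("xor", ("eax", "0x5a")),
        ("mov", ("[edi]", "eax")), ("ret", ())]
print(find_template(decrypt_step, code))
```

Here the template matches at offset 1, binding `?R` to `eax`, `?SRC` to `[esi]`, `?KEY` to `0x5a`, and `?DST` to `[edi]`. Because the variables abstract away register choice and constants, the same template would also match a variant that uses different registers.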

Since many of these formal and informal modeling methodologies require handwritten malware specifications, Christodorescu et al. developed an automated system to generate malware specifications, or malspecs [19]. A malspec is generated from the system call-based dependence graphs of malicious programs, and is itself represented as a dependence graph. Likewise, in their preventive system, Kolbitsch et al. represent malicious behavior as a dependency graph of relevant system calls [40]. Their online scanner then monitors system call invocations and parameters, and determines on the fly whether the program matches one of the behavior graphs.

Much of the previous work in malware detection demonstrated high detection rates, but without a common testing methodology and without large datasets on which to test the algorithms (in some cases, models were tested with only a dozen or two benign and malicious samples). Canali et al. performed a detailed study of previously researched malware detectors [12]. They explored the design space of hundreds of models, and tested the models on hundreds of thousands of samples. They demonstrated that analytical reasoning alone does not establish a detector’s utility; it must be supplemented by rigorous evaluation on a sufficiently large dataset.

Finally, the research of Martignoni et al. into capability labeling aligns most closely with the work described in this dissertation [52]. They create high-level behavior specifications from domain knowledge. From malware samples, they generate behavior graphs, and use a behavior matching algorithm to determine whether the sample exhibits each high-level behavior. However, they target different types of capabilities (generally, network-related capabilities and keylogging), and they represent their specifications as behavior graphs (that is, and/or graphs) instead of the more succinct LTPL. Additionally, they tested their approach on a mere 25 samples (11 of them benign); given the variety of ways in which malware can manifest a given behavior, such a small evaluation does not sufficiently demonstrate the effectiveness of their solution.


Chapter 4

Dione: A Disk Instrumentation Framework

Dione is a flexible, policy-based disk I/O monitoring and analysis infrastructure [51]. Dione maintains a view of the file system under analysis. A disk sensor intercepts all accesses from the System-Under-Analysis (SUA) to its disk, and passes that low-level information to Dione. The toolkit then reconstructs the operation, updates its view of the file system (if necessary), and passes a high-level summary of the disk access to an analysis engine, as specified by the user-defined policies. The rest of this chapter discusses Dione in more detail.

4.1 Threat Model and Assumptions

Our threat model does not require that the SUA be trusted or uncompromised. The SUA can be compromised by malware with administrator-level privileges that can hide its presence from host-level detection mechanisms.

The attacker may access, modify, create, or delete files anywhere in the file system. However, we assume that the malware infection leaves some disk-level artifact. This means that the malware must download files to the hard disk, create new files, or modify existing files. We can still observe these operations even if a kernel-level rootkit has attempted to hide them, and their artifacts, from a host-based detection mechanism.

We assume that there is a sensor that interposes between the SUA and its hard disk and provides disk access information. This sensor can be a software sensor (e.g., a virtualization layer) or a hardware sensor. We assume that both the sensor providing the disk access information and the Analysis Machine (that is, the machine that runs Dione) are trusted. Therefore, in a virtualization-based solution, we assume that neither the hypervisor nor the virtual domain serving as the Analysis Machine has been compromised. In a physical solution, we assume that the separate machine running Dione has not been compromised.

4.2 Dione Operation

There are four discrete components to Dione: a sensor, a processing engine, an analysis engine, and the Dione Manager. The Dione architecture is shown in Figure 4.1.

The Sensor interposes between the SUA and its disk. It intercepts each disk

access, and summarizes the access in terms of a Logical Block Address (LBA, or

simply sector), a sector count, the operation (read/write), and the actual contents

of the disk access. The sensor type is flexible. It can be a physical sensor, which

interposes between a physical SUA and the analysis machine, or a virtual sensor,

such as a hypervisor, which intercepts disk I/O of a virtual SUA.
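The summary that the sensor hands to Dione can be pictured as a small record. The Python sketch below is one hypothetical representation; the class and field names are ours, and it assumes 512-byte logical sectors. It also shows how a multi-sector access can be split into per-sector pieces, the granularity at which the processing engine classifies accesses.

```python
from dataclasses import dataclass
from enum import Enum

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class DiskAccess:
    lba: int          # first Logical Block Address (sector) of the access
    count: int        # number of consecutive sectors
    op: Op            # read or write
    contents: bytes   # raw bytes read from or written to the disk

    def __post_init__(self):
        if self.lba < 0 or self.count <= 0:
            raise ValueError("invalid sector range")
        if len(self.contents) != self.count * SECTOR_SIZE:
            raise ValueError("contents length must equal count * SECTOR_SIZE")

    def sectors(self):
        """Yield (lba, sector_bytes) for each sector covered by this access."""
        for i in range(self.count):
            yield self.lba + i, self.contents[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]


access = DiskAccess(lba=100, count=2, op=Op.WRITE, contents=bytes(2 * SECTOR_SIZE))
print([lba for lba, _ in access.sectors()])  # [100, 101]
```

Validating the length up front means that later stages can trust that every sector they receive is complete.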

Figure 4.1: High-level overview of Dione Architecture. [Diagram labels: Analysis Machine; Processing Engine (Dione Daemon) with Disk Access Classification, Live Updating, and Policy Engine; Dione Manager; Analysis Engine; System Under Analysis (SUA); Disk; Sensor.]

The Processing Engine is a multithreaded daemon on the analysis machine that interacts with both the user and the sensor. It receives disk access information from the sensor and performs three steps. The first step is Disk Access Classification: for each sector, it determines which file the sector belongs to (if known) and whether the access was to file content or metadata. In the Live Updating step, it compares the intercepted metadata to its view of the file system to determine whether any high-level changes occurred. It then passes the high-level access summary to the Policy Engine, which determines whether any policies apply to the accessed file. If so, the Policy Engine passes the information along to the Analysis Engine.
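The order of the three processing steps can be summarized in a Python sketch. The sector-to-file map, the placeholder `live_update` method, and the path-based policy check are simplifications invented for illustration. In particular, this sketch routes only known metadata sectors to live updating, whereas Dione also inspects unknown sectors that look like MFT entries (Section 4.3.2).

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingEngine:
    # sector -> (file path, "content" or "metadata"), from the pre-populated view
    sector_map: dict = field(default_factory=dict)
    # paths that currently have an active policy
    policies: set = field(default_factory=set)
    # stands in for the Analysis Engine, which currently logs accesses
    analysis_log: list = field(default_factory=list)

    def classify(self, sector):
        """Step 1: Disk Access Classification."""
        return self.sector_map.get(sector, (None, "unknown"))

    def live_update(self, sector, data):
        """Step 2 placeholder: parse metadata and diff it against the stored view."""
        return []  # a real implementation would return detected high-level events

    def handle(self, sector, is_write, data=b""):
        path, kind = self.classify(sector)
        events = self.live_update(sector, data) if kind == "metadata" and is_write else []
        summary = (path, kind, "write" if is_write else "read", events)
        if path in self.policies:  # Step 3: Policy Engine decides what to forward
            self.analysis_log.append(summary)
        return summary


dll = "WINDOWS/system32/wbem/sysmgr.dll"
engine = ProcessingEngine({4096: (dll, "content")}, {dll})
engine.handle(4096, is_write=False)  # covered by a policy: forwarded
engine.handle(8192, is_write=True)   # unknown sector: not forwarded
print(engine.analysis_log)
```

Only the access to the policy-covered file reaches the analysis log; everything else is still classified and tracked but not forwarded.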

The Analysis Engine performs some action on the information it receives from the processing engine. Currently, the analysis engine logs the accesses to a file; future work will extend its capabilities. A portion of a sample analysis log is shown in Figure 4.2.

The Dione Manager is a command-line program that the user invokes to send commands to the Dione daemon. The commands can be roughly divided into two categories: Policy Commands and State Commands. A summary of all commands is presented in Table 4.1.

Command        Description
declare-rule   Declare a new rule for instrumentation. Types of rules include:
               • access: Record an access to file content/metadata
               • operation: Record a high-level file system operation (e.g., file creation, deletion, move)
               • anti-forensics: Record an anti-forensics operation (e.g., file hiding, timestamp reversal, Alternate Data Stream (ADS) creation/deletion)
               • MBR Alert: Record read/write access to the Master Boot Record (MBR)
delete-rule    Delete a previously declared rule
list           List all rules
apply          Bulk-apply declared rules to file record data structures
scan           Perform a full scan of a disk image (or mounted disk partition), creating all file records from the raw bytes and automatically applying all declared rules
save           Save the state of the Dione file record hierarchy to a file to be loaded later
load           Load the Dione file record hierarchy from a previously saved configuration file

Table 4.1: Commands used for communication with the Dione daemon.


<Jul 16 11:27:13> <REGISTRY_CREATION> <HKLM/system/ControlSet00X/Services/sysmgr/> <KEY> <>
<Jul 16 11:27:13> <REGISTRY_CREATION> <HKLM/system/ControlSet00X/Services/sysmgr/Type> <DWORD> <0x00000110>
<Jul 16 11:27:13> <REGISTRY_CREATION> <HKLM/system/ControlSet00X/Services/sysmgr/Start> <DWORD> <0x00000002>
<Jul 16 11:27:13> <REGISTRY_CREATION> <HKLM/system/ControlSet00X/Services/sysmgr/ErrorControl> <DWORD> <0x00000000>
<Jul 16 11:27:13> <REGISTRY_CREATION> <HKLM/system/ControlSet00X/Services/sysmgr/ImagePath> <EXPAND_SZ> <%SystemRoot%\System32\svchost.exe -k sysmgr>
<Jul 16 11:27:13> <REGISTRY_CREATION> <HKLM/system/ControlSet00X/Services/sysmgr/Parameters/ServiceDll> <EXPAND_SZ> <C:\WINDOWS\system32\wbem\sysmgr.dll>
<Jul 16 11:27:16> <FILE_DELETION> <Documents and Settings/jenny/Desktop/> <abcdefg.exe>
<Jul 16 11:27:16> <FILE_CREATION> <WINDOWS/system32/wbem/> <sysmgr.dll> <664>
<Jul 16 11:27:16> <META_WRITE> <WINDOWS/system32/wbem/> <sysmgr.dll>
<Jul 16 11:27:16> <FILE_CREATION> <WINDOWS/Prefetch/> <ABCDEFG.EXE-06005E9D.pf> <40>
<Jul 16 11:35:31> <META_READ> <WINDOWS/system32/wbem/> <sysmgr.dll>
<Jul 16 11:35:31> <CONTENT_READ> <WINDOWS/system32/wbem/> <sysmgr.dll> <0> <64> <664>
Figure 4.2: Sample Dione Disk Trace.
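Because every field in the trace of Figure 4.2 is enclosed in angle brackets, the log is straightforward to post-process. The Python sketch below is a hypothetical helper, not part of Dione. It splits one line into its timestamp, event type, and remaining fields, and assumes that no field itself contains a `>` character.

```python
import re

FIELD = re.compile(r"<([^>]*)>")


def parse_trace_line(line):
    """Split a Dione trace line into timestamp, event type, and remaining fields."""
    fields = FIELD.findall(line)
    if len(fields) < 2:
        raise ValueError(f"not a trace line: {line!r}")
    timestamp, event, *rest = fields
    return {"time": timestamp, "event": event, "fields": rest}


trace = [
    "<Jul 16 11:27:16> <FILE_DELETION> <Documents and Settings/jenny/Desktop/> <abcdefg.exe>",
    "<Jul 16 11:27:16> <FILE_CREATION> <WINDOWS/system32/wbem/> <sysmgr.dll> <664>",
    "<Jul 16 11:35:31> <META_READ> <WINDOWS/system32/wbem/> <sysmgr.dll>",
]
# Directory and file name are the first two fields of a FILE_CREATION entry.
created = [r["fields"][0] + r["fields"][1] for r in map(parse_trace_line, trace)
           if r["event"] == "FILE_CREATION"]
print(created)  # ['WINDOWS/system32/wbem/sysmgr.dll']
```

The same pattern extends to filtering by time window or grouping events per file, which is the kind of preprocessing needed before converting file system events into higher-level capabilities.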

4.2.1 Dione Policy Commands

As Dione instruments the file system under analysis, the user can specify policies that determine whether the instrumentation data should be passed along to the Analysis Engine. A policy specifies an action to be taken on a file for a given operation. The Policy Engine is a flexible framework for declaring new policies. Currently, we have implemented four types of policies: Record, Timestamp Alert, Hide-File Alert, and MBR Alert. Policies can be declared or deleted at any time while Dione is running, including while it is actively monitoring a live system.

The Record policy specifies whether accesses should be recorded to a log file. When an access is recorded, Dione specifies whether it was to file content or metadata, whether it was a read or a write, and whether it was a special operation such as a file creation, deletion, or rename. A special annotation is provided for files that are created with their hidden property set, which hides them from the user. The Timestamp Alert detects a specific symptom of intrusion: the reversal of any of a file’s timestamp properties (the so-called Modification, Access, and Creation (MAC) times). The Hide-File Alert detects the hiding of a file. For each of these three policy types, optional arguments specify whether the policy should apply to reads, writes, or both. If the specified file is a directory, the policy can optionally apply to all of its descendants. If a file does not exist when the policy is declared, the policy remains in the system and is automatically applied when the file is created in the SUA.
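The scoping rules just described (read/write selection, optional recursion over a directory’s descendants, and policies declared before their file exists) can be sketched in Python. This is an illustrative model only, and the class names are hypothetical. It matches policies by path at lookup time, so a policy for a not-yet-created file applies automatically once the file appears; Dione, by contrast, applies declared rules to its file record data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    kind: str                 # e.g. "record", "timestamp-alert", "hide-file-alert"
    path: str                 # file or directory path in the SUA
    on_read: bool = True
    on_write: bool = True
    recursive: bool = False   # for a directory: also cover all descendants

    def applies(self, path, is_write):
        if not (self.on_write if is_write else self.on_read):
            return False
        if path == self.path:
            return True
        # Require a path separator so "system32" does not match "system32x".
        return self.recursive and path.startswith(self.path.rstrip("/") + "/")


class PolicyTable:
    """Declared policies; a policy may name a file that does not exist yet."""

    def __init__(self):
        self._policies = []

    def declare(self, policy):
        self._policies.append(policy)

    def delete(self, policy):
        self._policies.remove(policy)

    def matching(self, path, is_write):
        return [p for p in self._policies if p.applies(path, is_write)]


table = PolicyTable()
table.declare(Policy("record", "WINDOWS/system32", on_read=False, recursive=True))
print(len(table.matching("WINDOWS/system32/wbem/sysmgr.dll", is_write=True)))   # 1
print(len(table.matching("WINDOWS/system32/wbem/sysmgr.dll", is_write=False)))  # 0
```

A write-only, recursive Record policy on `system32` therefore covers a DLL dropped deep inside that directory without requiring a separate declaration for the new file.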

The MBR Alert looks for an access to a specific region of disk: the sectors on

the partition containing the Master Boot Record (MBR). This policy, when applied,

records reads and writes to the sectors on the MBR partition.

In the Policy Command category, the user can declare, delete, list, or bulk-apply

policies.

4.2.2 Dione State Commands

In the State Command category, Dione loads and saves a view of the state of the file system under analysis. The load step pre-populates Dione’s data structures with this state, and it is required before Dione will begin monitoring I/O. The goal of this step is for Dione to know everything about the file system before the SUA boots, so that it can immediately begin monitoring and analyzing disk I/O. The state can be obtained either with a disk scan, which reconstructs the file system from the raw bytes of the disk, or by loading a previously saved configuration file. The advantage of the load/save functionality is that a disk scan needs to be performed only once, which is useful for very large disks with many files, for which a raw scan takes longer than a load.


4.3 Live Updating

As the SUA boots and runs, files are created, deleted, moved, expanded, shrunk, and renamed. As a result, the pre-populated view of the SUA’s file system, including the mappings between sectors and files, quickly becomes out of date, reducing the accuracy of the monitoring and logging of disk I/O. The solution to this problem is Live Updating: an on-the-fly reconstruction of disk events based solely on the intercepted disk access information.

In the next sections, we will detail the challenges and solutions to live updating.

As our implementation is initially geared toward Windows systems with the NTFS

file system, and NTFS is particularly susceptible to the challenges inherent to live

updating, we will begin with an introduction to those NTFS concepts which will aid

in the understanding of the live updating implementation.

4.3.1 Live Updating Challenges

There are two major challenges to live updating: overcoming the Semantic Gap and the Temporal Gap. The Semantic Gap is a well-studied problem in which low-level data must be mapped to high-level data. In our case, we need to map the raw byte contents of a disk access to files and their properties. Fortunately, existing tools, such as the open-source toolkit The Sleuth Kit (TSK) [13], do much of the work to bridge the semantic gap.

The Temporal Gap occurs when low-level behaviors occurring at different points in time must be pieced together to reconstruct high-level operations. The high-level operations that Dione monitors include file creation, deletion, expansion, move/rename, and updates to MAC times and the hidden property.


The first challenge of live updating is identifying the fields in an intercepted MFT entry whose changes indicate a high-level operation. Often it is not a single change in an intercepted MFT entry that indicates a high-level operation, but a combination of changes across multiple intercepted MFT entries. Because of reliability requirements, these changes are propagated to disk in an inconvenient order. As a result, Dione must piece together the low-level changes across time in order to reconstruct high-level events.

The biggest challenge resulting from the temporal gap is the detection of file creation. An intercepted MFT entry lacks two critical pieces of information: the MFT index of that entry, and the full path of the file it describes. For a static image, calculating both is not a challenge. In live analysis, however, the new file’s metadata is written before the $MFT file’s runlist is updated, and, just like any other file, $MFT can expand to a non-contiguous location on disk. Therefore, in certain cases it can be impossible to determine (at the time of interception) the index of a newly created file. In fact, it can be impossible to determine at interception time whether a file creation actually occurred in the first place.

A similar challenge arises in determining the absolute path of a file. The MFT entry contains only the MFT index of that file’s parent, not its entire path. If the parent’s creation has not yet been intercepted, or if the intercepted parent did not have a known MFT index when its creation was intercepted (due to the previously described problem), Dione has no way to identify the parent and thus reconstruct the path. This situation occurs quite frequently whenever an application is being installed. In this case, many files (up to hundreds or thousands) are created in a very short amount of time. Since the OS batches writes to disk into one delayed burst, many hierarchical directory levels are created in which files cannot determine their paths.
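The path problem can be made concrete with a Python sketch that rebuilds an absolute path by following parent MFT indices up to the root directory, which is MFT entry 5 on NTFS. The `records` mapping is a hypothetical stand-in for Dione’s File Records. When any ancestor is unknown, the function returns `None`, which is exactly the situation in which a newly created file cannot yet determine its path.

```python
ROOT_INDEX = 5  # MFT entry 5 is the NTFS root directory


def resolve_path(index, records, max_depth=512):
    """Rebuild an absolute path by following parent MFT indices to the root.

    records maps MFT index -> (name, parent_index). Returns None if any
    ancestor is unknown, e.g., because its creation has not been resolved yet.
    """
    parts = []
    for _ in range(max_depth):  # bound the walk in case of a corrupt parent cycle
        if index == ROOT_INDEX:
            return "/" + "/".join(reversed(parts))
        record = records.get(index)
        if record is None:
            return None
        name, index = record
        parts.append(name)
    return None


records = {40: ("WINDOWS", 5), 41: ("system32", 40), 42: ("wbem", 41), 300: ("sysmgr.dll", 42)}
print(resolve_path(300, records))  # /WINDOWS/system32/wbem/sysmgr.dll
without_system32 = {k: v for k, v in records.items() if k != 41}
print(resolve_path(300, without_system32))  # None
```

Removing a single intermediate directory from the map leaves every file beneath it unresolved, which is why a burst of directory creations during an installation can leave many paths pending at once.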


The temporal gap also proves a challenge when a file’s attributes are divided over

multiple MFT entries. As Dione will only intercept one MFT entry at a time, it will

never see the full picture at once. Therefore, it needs to account for the possibility of

only intercepting a partial view of metadata.

4.3.2 Live Updating Operation

Live updating in Dione occurs in three steps. First, file metadata is intercepted as

it is written to disk. Next, the pertinent properties of the file are parsed from the

metadata, resulting in a reconstructed description of the file whose metadata was

intercepted. Finally, Dione uses the intercepted sector, the existing view of the file

system, and the reconstructed file description from the second step to determine what

event occurred. It updates the data structures to represent the file system change.

After intercepting an access to disk, Dione looks at the intercepted disk contents

and approximates whether the disk contents “look like” metadata (i.e., whether the

contents appear to be an intercepted MFT entry). If it looks like metadata, Dione

parses the raw bytes and extracts the NTFS attributes. It also attempts to calculate

the MFT index by determining where the intercepted sector falls within Dione’s

copy of the MFT runlist. With this calculated index, it can attempt to retrieve a

File Record. There are two outcomes of this lookup: either a valid File Record is

retrieved, or no File Record matches the index.
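The two tests described above can be sketched in Python. The heuristic checks for the `FILE` signature with which NTFS MFT entries begin, and the index calculation walks a copy of the $MFT runlist to turn a disk LBA into an entry index. The constants (512-byte sectors, 8 sectors per cluster, 1024-byte MFT entries) are common NTFS defaults assumed for the example; a real implementation would read them from the volume’s boot sector. Because a 1024-byte entry spans two sectors, only its first sector starts with the signature.

```python
SECTOR_SIZE = 512          # assumption: 512-byte sectors
SECTORS_PER_CLUSTER = 8    # assumption: 4 KiB clusters
MFT_ENTRY_SIZE = 1024      # assumption: 1 KiB MFT entries


def looks_like_mft_entry(sector_bytes):
    """Heuristic: the first sector of an MFT entry starts with b"FILE"."""
    return sector_bytes[:4] == b"FILE"


def mft_index_for_sector(lba, partition_start_lba, mft_runlist):
    """Map a disk LBA to an MFT entry index using the known $MFT runlist.

    mft_runlist lists (start_cluster, cluster_count) runs in $MFT order, with
    cluster numbers relative to the start of the partition. Returns None when
    the sector is not inside any known run (e.g., $MFT has grown but its new
    run has not been intercepted yet).
    """
    rel = lba - partition_start_lba   # sector offset within the volume
    clusters_before = 0               # $MFT clusters contained in earlier runs
    for start_cluster, count in mft_runlist:
        run_start = start_cluster * SECTORS_PER_CLUSTER
        run_end = run_start + count * SECTORS_PER_CLUSTER
        if run_start <= rel < run_end:
            sector_in_mft = clusters_before * SECTORS_PER_CLUSTER + (rel - run_start)
            return sector_in_mft * SECTOR_SIZE // MFT_ENTRY_SIZE
        clusters_before += count
    return None


runlist = [(786432, 64), (1000, 16)]  # hypothetical, fragmented $MFT
start = 2048                          # hypothetical partition start LBA
print(mft_index_for_sector(start + 786432 * 8 + 2, start, runlist))  # 1
print(mft_index_for_sector(start + 1000 * 8, start, runlist))        # 256
print(mft_index_for_sector(start + 50, start, runlist))              # None
```

The second call shows why the runlist order matters: the second run begins after the 64 clusters (256 entries) of the first run, even though it sits at a lower cluster number on disk.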

If a valid File Record is found, Dione will compare the extracted attributes to

those attributes found in the existing File Record. If any changes are detected, it

will modify the File Record to reflect the changes. A summary of the semantic and

temporal artifacts of each type of file operation is presented in Table 4.2.
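The comparisons summarized in Table 4.2 for the case in which a File Record already exists can be sketched as a single Python function. `FileView` is a hypothetical, already-parsed view of the relevant properties (the sketch does not parse raw MFT bytes). Two design choices are ours: a detected replacement short-circuits the other checks, and ADS creation and deletion are distinguished by which $DATA stream names were added or removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileView:
    in_use: bool          # In-Use flag from the MFT entry header
    sequence: int         # MFT entry sequence number
    is_base: bool         # base vs. nonbase MFT entry
    name: str
    parent_index: int     # parent directory's MFT index
    runlist: tuple        # ((start_cluster, length), ...)
    mac_times: tuple      # (modified, accessed, created) as comparable values
    hidden: bool
    streams: frozenset    # names of $DATA attributes ("" is the unnamed stream)


def classify_change(old, new):
    """Compare a stored File Record (old) with an intercepted MFT entry (new)."""
    if not new.in_use:
        return ["file_deletion"]
    if (new.mac_times[2] > old.mac_times[2] or new.sequence > old.sequence
            or new.is_base != old.is_base):
        return ["file_replacement"]
    ops = []
    if new.name != old.name:
        ops.append("file_rename")
    if new.parent_index != old.parent_index:
        ops.append("file_move")
    if new.runlist != old.runlist:
        ops.append("file_shrink_expand")
    if any(n < o for n, o in zip(new.mac_times, old.mac_times)):
        ops.append("timestamp_reversal")
    if new.hidden and not old.hidden:
        ops.append("file_hidden")
    if new.streams - old.streams:
        ops.append("ads_creation")
    if old.streams - new.streams:
        ops.append("ads_deletion")
    return ops


old = FileView(True, 1, True, "sysmgr.dll", 42, ((100, 1),), (30, 30, 30), False, frozenset({""}))
new = FileView(True, 1, True, "sysmgr.dll", 42, ((100, 1),), (10, 30, 30), True, frozenset({"", "x"}))
print(classify_change(old, new))  # ['timestamp_reversal', 'file_hidden', 'ads_creation']
```

A single intercepted entry can thus reveal several anti-forensic actions at once, here a backdated modification time, a newly hidden file, and an added alternate data stream.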

However, if a valid File Record is not found, one of three situations has occurred.


Operation             Artifacts
File Creation         • No existing File Record for calculated index
                      • Sector falls within MFT runlist; otherwise, buffer until MFT runlist expands to include sector
File Deletion         • File Record exists for calculated index
                      • In-Use flag off in intercepted MFT entry header
File Replacement∗     • File Record exists for calculated index
                      • Creation Time: Intercepted > FileRecord, OR
                        MFT Entry Sequence Number: Intercepted > FileRecord, OR
                        MFT Entry type (base vs. nonbase) changed
File Rename           • File Record exists for calculated index
                      • File Name: Intercepted ≠ FileRecord
File Move             • File Record exists for calculated index
                      • Parent’s MFT Index: Intercepted ≠ FileRecord
File Shrink/Expand    • File Record exists for calculated index
                      • Runlist: Intercepted ≠ FileRecord
Timestamp Reversal    • File Record exists for calculated index
                      • MAC Times: Intercepted < FileRecord
File Hidden           • File Record exists for calculated index
                      • Hidden flag: Intercepted = 1 && FileRecord = 0
ADS Creation          • File Record exists for calculated index
                      • List of $Data attributes: Intercepted ≠ FileRecord
ADS Deletion          • File Record exists for calculated index
                      • List of $Data attributes: Intercepted ≠ FileRecord

Table 4.2: Summary of the artifacts for each file system operation. An MFT indexis computed based on the intercepted sector and the known MFT runlist. If a filerecord is found with the calculated index, properties of the file record are comparedwith properties parsed from the intercepted metadata.∗ A replacement is characterized by a file deletion and creation within the same flushto disk, whereby the same MFT entry is reused.


In the first case, a new file has just been created and inserted into a "hole" in the MFT. The file creation can be verified because the intercepted sector falls within the known runlist of the MFT. In the second case, a new file has just been created, but the MFT was full, so the file could not be inserted into a hole. Dione buffers a reference to this file in a list called the Wait Buffer.¹ Eventually, Dione will intercept the expansion of the $MFT file, at which point the file creation can be validated and the path constructed. In the final case, the intercepted data had the format of metadata (i.e., the data looked like an MFT entry), but it actually turned out to be the contents of another file. This happens for redundant copies of metadata and for the file system's $LogFile; additionally, a malicious user could create file contents that mimic the format of an MFT entry. In either of the latter two cases, a reference to the suspected file, and the sector at which it was discovered, is saved in the Wait Buffer. The Wait Buffer is periodically purged of any File Records whose corresponding sectors are verified as belonging to a file other than $MFT.

4.4 Disk Sensor Integration

In order to be portable to any type of sensor, the Dione instrumentation infrastructure is compiled as a library; the Dione daemon is an executable built from this library. Communication between a sensor and the Dione daemon requires two corresponding components: a sensor-side API and a Dione receiver. The receiver is compiled into the Dione library, whereas the sensor-side API is compiled separately. Therefore, an interprocess (in the case of a virtualization- or emulation-based sensor) or inter-system (in the case of a physical sensor) communication protocol is required.

¹ A newly-created file will also be placed in the Wait Buffer if it has a valid MFT index but its path cannot be constructed because its parent has yet to be intercepted.


The virtualization- and emulation-based sensors (using Xen and Qemu, respectively) use an interprocess communication protocol to pass disk access information from the hypervisor or emulator to the Dione daemon. We have implemented a producer/consumer communication protocol using shared memory and semaphores. A sensor-side API (called the DiskMonitor) provides two externally available functions. The first is an initialize function; it is called from the Xen or Qemu I/O initialization function, and it sets up the shared memory region and semaphores. The second is a disk access function; it marshals the disk access information (LBA, count, operation, and access contents) into the shared memory region and is called once per multi-sector disk access.
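A minimal single-producer, single-consumer version of this protocol is sketched below. It is an illustration under assumptions, not the DiskMonitor source. A Python list stands in for the shared memory region, and threads stand in for the hypervisor and daemon processes. The names `DiskMonitorChannel`, `disk_access`, and `receive` are hypothetical. Two counting semaphores track free and filled slots, so the producer blocks when the ring is full and the consumer blocks when it is empty.

```python
import threading
from collections import namedtuple

DiskAccess = namedtuple("DiskAccess", "lba count op data")


class DiskMonitorChannel:
    """Bounded ring buffer guarded by two counting semaphores (one producer, one consumer)."""

    def __init__(self, slots: int = 8) -> None:  # plays the role of the initialize function
        self.ring = [None] * slots
        self.slots = slots
        self.free = threading.Semaphore(slots)  # empty slots available to the producer
        self.filled = threading.Semaphore(0)    # filled slots available to the consumer
        self.head = 0  # next slot the consumer reads (touched only by the consumer)
        self.tail = 0  # next slot the producer writes (touched only by the producer)

    def disk_access(self, lba: int, count: int, op: str, data: bytes) -> None:
        """Producer side: marshal one multi-sector access into the ring."""
        self.free.acquire()
        self.ring[self.tail] = DiskAccess(lba, count, op, bytes(data))
        self.tail = (self.tail + 1) % self.slots
        self.filled.release()

    def receive(self) -> DiskAccess:
        """Consumer side (Dione daemon): take the oldest access from the ring."""
        self.filled.acquire()
        access = self.ring[self.head]
        self.ring[self.head] = None
        self.head = (self.head + 1) % self.slots
        self.free.release()
        return access
```

With more than one producer, the tail update would additionally need a mutex; the single-producer assumption is what lets the sketch omit it.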

The Xen-based implementation calls these functions from within the block device

driver. The Qemu-based implementation calls these functions from within the dma-

helpers device driver. The Xen implementation works for raw disk images, whereas

the Qemu implementation works for both raw and the new Qemu Copy-on-Write

(QCOW2) disk image formats.

The physical sensor, built on a custom FPGA board, interposes between a system and its hard disk; it therefore allows Dione to instrument a physical SUA, preventing the malware from detecting that it is being analyzed. The physical sensor parses the disk access information (LBA, count, operation, and access contents) from the SATA commands. It then passes this information over Ethernet to the Dione daemon, which runs on a separate physical system. For this configuration, the Dione library is compiled with a client-side receiver that opens a socket on the given network interface and waits for packets on that socket. The disk access information in each packet is unmarshalled and passed to the rest of the Dione daemon.
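The receiver's unmarshalling step can be illustrated with Python's `struct` module. The wire format below is a hypothetical layout chosen for the example: a network-byte-order header carrying a 64-bit LBA, a 32-bit sector count, and a one-byte operation code, followed by the sector contents. The physical sensor's actual packet format is not specified here, and 512-byte sectors are assumed.

```python
from __future__ import annotations

import struct

SECTOR_SIZE = 512
HEADER = struct.Struct("!QIB")  # hypothetical: LBA (u64), sector count (u32), op (u8)
OPS = {0: "read", 1: "write"}


def marshal(lba: int, count: int, op: int, contents: bytes) -> bytes:
    """Sensor side: pack one disk access into a packet payload."""
    if len(contents) != count * SECTOR_SIZE:
        raise ValueError("contents must span exactly `count` sectors")
    return HEADER.pack(lba, count, op) + contents


def unmarshal(packet: bytes) -> tuple[int, int, str, bytes]:
    """Receiver side: recover (lba, count, operation, contents) from a packet."""
    lba, count, op = HEADER.unpack_from(packet)
    contents = packet[HEADER.size:HEADER.size + count * SECTOR_SIZE]
    if len(contents) != count * SECTOR_SIZE:
        raise ValueError("truncated packet")
    return lba, count, OPS[op], contents
```

Because a standard Ethernet frame carries at most 1,500 bytes of payload, a multi-sector access would in practice need to be split across several packets; the sketch ignores this.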


4.5 Experimental Results

Next, we evaluate the accuracy and performance of Dione and demonstrate its utility using real-world malware. Although Dione is a flexible instrumentation framework capable of collecting and analyzing data from both physical and virtual sensors, for these experiments we use a hypervisor-based solution that uses the virtualization layer as a data-collecting sensor.

4.5.1 Experimental Setup

Our virtualization-based solution uses the Xen 4.0.1 hypervisor. Our host system contains a dual-core Intel Xeon 3060 processor with 4 GB RAM and Intel VMX hardware virtualization extensions to enable full virtualization. The 160 GB, 7200 RPM SATA disk was partitioned with a 25 GB partition for the root directory and an 80 GB partition for the home directory. The virtual machine SUA ran Windows XP Service Pack 3 with the NTFS file system.

4.5.2 Evaluation of Live Updating Accuracy

To gauge the accuracy of live updating, we ran a series of tests to determine whether Dione correctly reconstructed file system operations as they occurred. For our tests, we chose installation and uninstallation programs, as they perform many file system operations very quickly and stress the live updating system. We chose three open source applications (OpenOffice, Gimp, and Firefox), and performed both an installation and an uninstallation for each. We also ran an all-inclusive test that installed all three, then uninstalled all three.


Program                 Creations  (Delayed)  Deletions  Moves  Errors
OpenOffice Install           3934       3930          1      0       0
Gimp Install                 1380       1380          0      0       0
Firefox Install               152        135         71      0       0
OpenOffice Uninstall          353         62       3788   3836       0
Gimp Uninstall                  5          0       1388      0       0
Firefox Uninstall               6          0         80      0       0
All                          6500       6114       5986   3815       0

Table 4.3: Breakdown of file system operations for each benchmark. The subset of file creations that wait for the delayed expansion of the MFT is also indicated.

These benchmarks make varying numbers of changes to the file system hierarchy. Table 4.3 lists each of the seven benchmarks and the number of file creations, deletions, and moves.² As discussed in Section 2.3.1, if many new files are created at once and the MFT does not have enough free space to describe them, there is a delay between when the file creation is intercepted and when the MFT expands to fit the new file metadata (at which point the file creation can be verified). We also include the number of delayed-verification file creations in Table 4.3, as these place additional stress on Dione's live updating accuracy.

For each test, we started from a clean Windows XP SP3 disk image. We executed one of the seven programs in a VM while instrumenting the file system. We then shut down the VM and dumped Dione's view of the dynamically generated file system state to a file. Next, we ran a disk scan on the raw static disk image and compared the results of this static scan to the results of the dynamic instrumentation. An error is defined as any difference between the dynamically generated state and the static disk scan. This includes a missing file (one that was not reported created), an extraneous file (one that was not reported deleted), a misnamed file, a file with the wrong parent ID or path, a file mislabeled as a file or directory, a file mislabeled as hidden, a file with an incorrect timestamp (for any of the four timestamps maintained by Windows), or a file with an incorrect runlist. Table 4.3 shows the results of the accuracy tests. In each case, Dione maintained a 100% accurate view of the file system, with no differences between the dynamically generated view and the static disk scan.

² The "All" test is not a sum of the individual tests, because the operating system also creates, deletes, and moves files, and the number of these operations may vary slightly between tests.
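The comparison between the dynamic view and the static scan amounts to a keyed difference of two sets of file records. The sketch below assumes each view is a dictionary from MFT index to a record dictionary with hypothetical field names. It reports missing and extraneous files and any field that disagrees, mirroring the error categories listed above.

```python
from __future__ import annotations

# Fields compared for each file (hypothetical names for the properties listed above).
FIELDS = ("name", "parent", "path", "is_dir", "hidden",
          "created", "modified", "accessed", "mft_changed", "runlist")


def compare_views(dynamic: dict[int, dict], static: dict[int, dict]) -> list[tuple]:
    """Return every difference between Dione's live view and a static disk scan."""
    errors = []
    for idx in sorted(static.keys() - dynamic.keys()):
        errors.append(("missing", idx))     # on disk, never reported created
    for idx in sorted(dynamic.keys() - static.keys()):
        errors.append(("extraneous", idx))  # in the live view, never reported deleted
    for idx in sorted(static.keys() & dynamic.keys()):
        for field in FIELDS:
            if dynamic[idx].get(field) != static[idx].get(field):
                errors.append(("mismatch", idx, field))
    return errors
```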

4.5.3 Evaluation of Performance

In order to gauge the performance degradation associated with disk I/O instrumen-

tation using Dione, we ran two classes of benchmarks: one high in file content reads

and writes, and one high in file metadata reads and writes.

Iozone Benchmark

Iozone generates and measures a variety of file operations. It varies both the file size and the record size (i.e., the amount of data read or written in a given transaction). Because it creates very large files and reads and writes the same file throughout each test, it is a content-heavy benchmark with very little metadata to process.

We ran all Iozone tests on a Windows XP virtual machine with a 16 GB virtual

disk and 512 MB of virtual RAM. We used the Write and Read tests (which stream

accesses through the file), and Random Write and Random Read (which perform

random accesses). We varied the file size from 32 MB to 4 GB, and chose two record

sizes: 64 KB and 16 MB. We ran each test 50 times to average out some of the

variability that is inherent with running a user-space program in a virtual machine.

For each test, we ran three different instrumentation configurations. For the Baseline configuration, we ran all the tests without instrumentation (that is, with Dione turned off). In the second configuration, called Inst, Dione is on and performing full instrumentation of the system. There are, however, no rules in the system, so it does not log any of these accesses. This configuration measures the minimum cost of instrumentation, including live updating. The final configuration is called Inst+Log. For these tests, Dione is on and providing instrumentation; additionally, a rule is set to record every access to every file on the disk. Figure 4.3 shows the results of the tests. Each line represents the performance with instrumentation, relative to the baseline configuration.
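Normalizing to the baseline is a ratio of means. In the sketch below, relative performance is the baseline's mean run time divided by the instrumented configuration's mean run time. A value of 1.0 means no overhead, and 0.97 means the instrumented run achieves 97% of baseline performance. For throughput-based Iozone results, the equivalent ratio is instrumented throughput over baseline throughput. The normalized standard deviation (standard deviation divided by the mean) is included because run-to-run variability matters for these measurements. The text does not say whether a sample or population standard deviation was used; the sketch assumes the sample version.

```python
from __future__ import annotations

from statistics import mean, stdev


def relative_performance(baseline_times: list[float], inst_times: list[float]) -> float:
    """Baseline mean run time over instrumented mean run time (1.0 = no overhead)."""
    return mean(baseline_times) / mean(inst_times)


def normalized_stdev(times: list[float]) -> float:
    """Sample standard deviation divided by the mean (coefficient of variation)."""
    return stdev(times) / mean(times)
```

For instance, a configuration whose runs average 2.5 s against a 2.0 s baseline has a relative performance of 0.8.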

For the Read Iozone tests (Figures 4.3(a) and 4.3(b)), the slowdown attributed to

instrumentation is near 0 for files 512 MB and smaller. Since the virtual machine has

512 MB of RAM, Windows prefetches and keeps data in the page cache for nearly

the entire test. Practically, this means that the accesses rarely go to the virtual disk.

Since Dione only instruments actual I/O to the virtual disk—and not file I/O within

the guest OS’s page cache—Dione is infrequently invoked.

At larger file sizes, Windows needs to fetch data from the virtual disk, and Xen intercepts these accesses and communicates them to Dione. At this point, the performance with instrumentation drops relative to the baseline case. In the worst case for streaming reads, Dione instrumentation without logging (Inst) achieves 97% of the performance of the uninstrumented execution.

For the random read tests with large file sizes, there is a larger penalty for instrumentation. Recall that Dione incurs a penalty proportional to the amount of data accessed on the virtual disk; the penalty is therefore higher when more accesses are performed than are necessary. Windows XP uses intelligent read-ahead, in which the cache manager prefetches data from a file according to a perceived access pattern. For random reads, the prefetched data may be evicted from the cache before it is used, resulting in more accesses than necessary. This also explains why the penalty is lower for the tests using the larger record size (for a given file size). Windows adjusts the amount of data to be prefetched based on the size of the access, so the ratio of prefetched data to file size is higher with a larger record size. With more prefetched data, there is a higher likelihood that the data will be used before it is evicted from the cache. Fortunately, this overhead is unlikely to be incurred in practice, as random access to a 2 GB file is rarely performed.

[Figure 4.3: four panels, each plotting Performance Relative to Baseline against File Size (MB) for file sizes from 32 MB to 4096 MB, with series Inst (Stream), Inst (Random), Inst+Log (Stream), and Inst+Log (Random). (a) Read Test, 64 KB Record Size; (b) Read Test, 16 MB Record Size; (c) Write Test, 64 KB Record Size; (d) Write Test, 16 MB Record Size.]

Figure 4.3: Performance of instrumentation, normalized to the baseline (no instrumentation) configuration, for the Iozone streaming and random read and write benchmarks.

Another observation is that the performance of Dione actually improves for streaming and random reads as file sizes grow beyond 1 GB and 2 GB, respectively. This is explained by considering the multiple levels of the memory hierarchy in a virtualized system. As the file size grows larger than the VM's RAM, I/O must go to the virtual disk. However, the file may still be small enough to fit in the RAM of the host, as the host naturally caches files (in this case, the VM's disk image) in its own page cache. Thus, reads are not served from the physical disk until the working set of the file becomes larger than the available physical RAM. Since physical disk accesses are very slow, any cost associated with Dione instrumentation is negligible compared to the cost of going to disk.

The Iozone Write tests (Figures 4.3(c) and 4.3(d)) show some performance degradation at small file sizes, because Windows must periodically flush writes to the virtual disk even if the working set fits in the page cache. However, the performance impact is minimal for all file sizes: the worst-case degradation is 10%, and it is generally closer to 3%. Additionally, the random write tests do not show the penalty associated with random reads. Since Windows writes only dirty blocks to disk, there are fewer unnecessary accesses to disk.

It is also noticeable that speedup values are sometimes greater than 1 for the 32 MB file size write tests. This would imply that the benchmark runs faster with instrumentation than without. In reality, this effect is explained by an optimization Windows uses when writing to disk. Instead of immediately flushing writes to disk, Windows buffers them and flushes them to disk in bursts. With this lazy writing, one eighth of the dirty pages are flushed to disk every second, meaning that a flush could be delayed by up to eight seconds. From the perspective of the user (and therefore of the timer), the benchmark has completed, even though the writes are still stored in the page cache and have yet to be flushed to disk. The long-running benchmarks will have flushed the majority of their writes to disk before the process returns. However, a short-running benchmark, such as the Iozone benchmarks operating on a 32 MB file, may still have outstanding writes to flush, and the time it takes to flush them varies randomly from run to run. We observed a 21-24% standard deviation (normalized to the mean) for the baseline, instrumentation, and logging tests. This effect is examined in more detail in the next section.

For all tests, the cost of logging all accesses is relatively low, ranging from 0% to 8%. For these tests, the root directory (under which the logs were stored) was on a different partition from the disk image under instrumentation. Logging therefore introduced an overhead, as the disk alternated between writing to the log file and accessing the VM's disk image. This performance penalty can be reduced by storing the log on the same partition as the disk image. Future work can also reduce the overhead by buffering log messages in memory and performing a burst write to the log, reducing the physical movements of the disk.
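The buffering proposed as future work could take the following shape. This is a hypothetical sketch, not part of Dione. Messages accumulate in memory and reach the underlying file in a single write once a size threshold is crossed, or on an explicit flush, so the disk head alternates between the log and the disk image less often. The 64 KB default threshold is an arbitrary illustrative value.

```python
from __future__ import annotations

from typing import TextIO


class BufferedLog:
    """Accumulate log lines in memory and write them to the sink in bursts."""

    def __init__(self, sink: TextIO, threshold: int = 64 * 1024) -> None:
        self.sink = sink
        self.threshold = threshold  # characters buffered before a burst write
        self.pending: list[str] = []
        self.size = 0

    def log(self, message: str) -> None:
        self.pending.append(message + "\n")
        self.size += len(message) + 1
        if self.size >= self.threshold:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sink.write("".join(self.pending))  # one burst write
            self.sink.flush()
            self.pending.clear()
            self.size = 0
```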


[Figure 4.4: two bar-chart panels over the benchmarks OOInstall, GimpInstall, FirefoxInstall, OOUninstall, GimpUninstall, and FirefoxUninstall. (a) Normalized Instrumentation Performance: Performance Relative to Baseline for Inst and Inst + Log. (b) Execution Time (s) for No Inst, Inst, and Inst+Log.]

(a) Performance of Dione instrumentation (error bar equals one standard deviation).
(b) Average execution time with and without Dione instrumentation.

Figure 4.4: Evaluation of Dione instrumentation for the OpenOffice, Gimp, and Firefox Install/Uninstall benchmarks.

Installation Benchmarks

In the second set of performance experiments, we evaluated the overhead of benchmarks that are high in metadata accesses. These tests heavily stress the live updating part of Dione's execution. We ran the same six install/uninstall benchmarks as in the accuracy tests; the numbers of creations (including delayed), deletions, and moves are listed in Table 4.3. We ran each test ten times to average out the variation inherent in running a user-space application on a virtual machine; for each run, we started from the same clean disk image snapshot. We used a Windows XP SP3 virtual machine with an 8 GB virtual disk and 512 MB of virtual RAM.

We compared the baseline execution (with no instrumentation) to full instrumentation with Dione, with and without logging. Figure 4.4 graphs the execution times of the three configurations, as well as the performance of Dione instrumentation relative to the baseline execution.

As Figure 4.4 shows, even when the workload requires frequent metadata analysis for live updating, the overhead of instrumentation is low. Without logging, full instrumentation of the benchmarks causes between 1% and 5% performance degradation.

The three benchmarks with the smallest penalty are OpenOffice installation, OpenOffice uninstallation, and Gimp installation. These show 1-2% performance degradation for instrumentation without logging, compared to 5% for Firefox Install and Gimp Uninstall (Firefox Uninstall is excluded for now and explained in more detail below). Figure 4.4(b), which graphs the average execution times of the six benchmarks, provides more insight. These three benchmarks are the longest-running of the six, which matters because of how Windows performs writes to disk. As described in the previous section, Windows performs burst flushes to disk, and writes can be delayed by as much as eight seconds before they are flushed from the page cache. Although the program is reported to have completed, there are still outstanding writes that need to be flushed to disk. This effect is especially pronounced for any program whose runtime is on the same order of magnitude as the write delay.

We can see this effect in Figure 4.4, which includes error bars showing the normalized standard deviation for the 10 runs of each benchmark. The three longest-running benchmarks also have the lowest standard deviations. This means that the results of these three tests are the most precise, and their averages most closely reflect the true cost of instrumentation. While two of the three shortest-running benchmarks have the highest reported cost of instrumentation, their standard deviation between runs is greater than the reported performance penalty. The execution time of Firefox Uninstall is dwarfed by the time for which Windows may delay its writes, as reflected in its high standard deviation. In practice, this means that a user is unlikely ever to notice a slowdown attributable to disk instrumentation for short bursts of disk activity.

With logging enabled (Inst+Log), these tests show between a 0% and 9% performance decrease. In these tests, the disk image resided on the same partition as the log file; therefore, the cost of logging to a file was lower than in the content tests.

4.6 Registry Monitoring

As discussed in Section 2.3.1, Windows stores configuration data for the operating system, users, and applications in the Windows registry. While some of the registry is created at system boot and exists only in memory, much of it is stored on disk in the form of Windows registry hive files. There are five registry hive files: system, security, software, default, and SAM.

To have a dynamic view of the Windows registry hive files that is always up to date, we integrated registry monitoring into Dione. In addition to the file system operations it already tracks, Dione also tracks when registry keys are created, deleted, or changed. We keep track of the registry hive files in the same way that Windows does: by mapping the files into memory. The registry hive files can be initialized in one of two ways. If the file system state is obtained through a scan of the raw disk (via the scan command), then an optional argument carves the registry files out of the raw disk and saves them in memory. These files can also be saved to disk, so that on future system starts they can be loaded automatically from the saved state rather than carved from the raw disk. We detail the additional commands needed for registry monitoring, as well as new arguments for existing commands, in Table 4.4.


New Commands

save-registry: Save a given raw registry hive file from Dione to disk.
load-registry: Load Dione with a previously saved raw registry hive file stored on disk.

New Parameters to Existing Commands

declare-rule: Declare a new rule for instrumentation. New rule type includes:
• registry: Save all registry key creations, deletions, and changes to existing keys and values.
scan: Perform a full scan of a disk image (or mounted disk partition), creating all file records from the raw bytes and automatically applying all declared rules. Optionally carve the registry files from the raw disk, saving them in memory for Dione use.

Table 4.4: New commands, as well as new arguments to existing commands, used to communicate with the Dione daemon to perform registry monitoring.

When a disk write comes across the wire, Dione determines whether that write is to a content sector of one of the registry hive files. If so, it patches its in-memory view of the hive file using the sector number, sector count, and associated raw file content. Though writes to other files may be interleaved with writes to the registry, there are never more than three files written to simultaneously; this means that we can judge a series of writes to a hive file to be complete once there have been writes to three other consecutive files. Once a series of writes to the hive file is complete, we parse the hive file using the regfi open source library [55], storing the information for each key and subkey in a list. We then compare the previous view of the hive file to this newly parsed view, and look for any differences. We use a naive algorithm, originally described by Johnson et al. [32] in an internal document and summarized in [27].
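The patch-then-parse cycle described above can be sketched as follows. The class is hypothetical. `sector_map` stands in for however Dione maps a disk sector to a sector within the hive file. We read "writes to three other consecutive files" as three distinct other files written since the hive's last write, which is an interpretation. Parsing with regfi and the differencing step are left out.

```python
from __future__ import annotations

SECTOR_SIZE = 512


class HiveMonitor:
    """Keep an in-memory hive image current and detect when a write burst ends."""

    def __init__(self, name: str, image: bytes, sector_map: dict[int, int],
                 quiet_files: int = 3) -> None:
        self.name = name
        self.image = bytearray(image)
        self.sector_map = sector_map   # disk LBA -> sector index inside the hive file
        self.quiet_files = quiet_files
        self.dirty = False             # unparsed writes are pending
        self.others: set[str] = set()  # other files written since our last write

    def on_write(self, filename: str, lba: int, data: bytes) -> bool:
        """Apply a write; return True when the hive's write series looks complete."""
        if filename == self.name:
            for i in range(len(data) // SECTOR_SIZE):
                offset = self.sector_map[lba + i] * SECTOR_SIZE
                chunk = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
                self.image[offset:offset + SECTOR_SIZE] = chunk
            self.dirty = True
            self.others.clear()
            return False
        if self.dirty:
            self.others.add(filename)
            if len(self.others) >= self.quiet_files:
                self.dirty = False
                self.others.clear()
                return True  # time to re-parse the hive and diff it
        return False
```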

The differencing algorithm we used consists of three steps. First, it walks through both lists item by item until two items disagree. Second, it compares the kth item ahead in each list with the k items following the mismatch in the other list, incrementing k in each round until a match is found. The advantage of this approach is that a match lying close to the mismatch is found quickly. Third, once the match is found, the algorithm resumes from that point and continues to the next disagreement. This algorithm is known to be naive, though it works well in practice when there are relatively few differences between the items being compared and relatively few duplicated items [27]. With registry modifications, the first condition is often true (there are few modifications relative to the list of all keys), and the second is always true (there are no duplicates).
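A compact version of this look-ahead differencing is sketched below. It follows the three-step description above rather than any particular published code. Items are assumed to be unique strings such as registry key paths, and the output lists deletions and creations in the order they are discovered. A changed value appears as a deletion of the old item followed by a creation of the new one.

```python
from __future__ import annotations


def naive_diff(old: list[str], new: list[str]) -> list[tuple[str, str]]:
    """Look-ahead diff of two item lists; returns ('delete'|'create', item) pairs."""
    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:  # step 1: walk while the lists agree
            i += 1
            j += 1
            continue
        resync = None
        k = 1
        # Step 2: widen the window until the kth item ahead in one list
        # matches one of the items following the mismatch in the other list.
        while resync is None and (i + k < len(old) or j + k < len(new)):
            if i + k < len(old):
                for t in range(j, min(j + k + 1, len(new))):
                    if old[i + k] == new[t]:
                        resync = (i + k, t)
                        break
            if resync is None and j + k < len(new):
                for s in range(i, min(i + k + 1, len(old))):
                    if new[j + k] == old[s]:
                        resync = (s, j + k)
                        break
            k += 1
        if resync is None:
            break  # nothing left in common
        s, t = resync  # step 3: record the gap, then resume at the match
        ops += [("delete", x) for x in old[i:s]]
        ops += [("create", x) for x in new[j:t]]
        i, j = s, t
    ops += [("delete", x) for x in old[i:]]
    ops += [("create", x) for x in new[j:]]
    return ops
```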

Unfortunately, the algorithm occasionally produced an inaccurate, noisy trace. In approximately 22 of 1,084 samples, an error in the diff algorithm resulted in an event trace listing the deletion of every key in the registry, followed by the creation of every key in the registry. This error is easily detectable, so we discarded the traces in which it occurred. Future work will implement a more robust algorithm to avoid this problem.
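One plausible way to flag these degenerate traces automatically is shown below. The heuristic is our assumption, since the text does not say how the error was detected. A trace is treated as suspicious when it deletes and re-creates the same set of keys and that set covers nearly the whole hive; the 95% coverage threshold is an arbitrary illustrative value.

```python
from __future__ import annotations


def is_degenerate(ops: list[tuple[str, str]], total_keys: int,
                  coverage: float = 0.95) -> bool:
    """Flag diff output that deletes and re-creates (nearly) every key."""
    deleted = {item for op, item in ops if op == "delete"}
    created = {item for op, item in ops if op == "create"}
    # Same keys removed and re-added, and they span (almost) the whole hive.
    return deleted == created and len(deleted) >= coverage * total_keys > 0
```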


Chapter 5

Labeling Malware Persistence Mechanisms with Dione

In this chapter, we discuss persistence capability labeling with DCL, the Dione Capability Labeler. We generate specifications for properties, including service install, service load, system boot, and file access, using Linear Temporal Predicate Logic (LTPL). We support our file loading model with a machine learning classifier that differentiates between two types of file access patterns. We implement an automated testbed, generate Dione traces from over one thousand real-world malware samples, and evaluate the accuracy of our models in detecting persistence mechanisms.

5.1 Modeling Persistence Mechanisms with LTPL

To demonstrate the successful use of a persistence mechanism to survive and automatically restart after a reboot, we broke each persistence capability into three phases, and DCL models each of the phases. The first phase is installation, whereby the malware makes the necessary changes to the file system (creating new files and modifying existing files) and to the registry (adding new keys and values, and modifying the contents of existing subkeys). The second phase is system boot, for which we model the sequence of disk operations that is indicative of a system boot; without the reboot, we cannot test whether the persistence mechanism was successful. Finally, we model the service load, whereby the binary associated with a service, if one was installed, is automatically loaded after reboot. This stage incorporates another model, the file access, which demonstrates that the file associated with the persistence mechanism was accessed after the system booted. To eliminate false negatives (file accesses going unlabeled), we keep this model sufficiently generic. In Section 5.4, we bolster the file access model with a machine learning algorithm that differentiates between types of file accesses, to ensure that we correctly label the loading of the program binary associated with the persistence mechanism.

In Section 2.4, we discussed model checking and the specification language Linear Temporal Predicate Logic (LTPL), using examples from an x86 instruction trace. In this section, we model persistence capabilities using LTPL, replacing the x86 instruction predicates with seven predicates representing operations obtained from a Dione trace, plus a predicate that performs a regular expression match on two strings. The predicate vocabulary used to model the persistence capabilities from Dione events is provided in Table 5.1.¹

¹ Recall from Section 2.3.1 that, since keys and values are hierarchically organized, it is useful to think of the hierarchy as analogous to a file system. Each key or value has a path (the concatenation of all keys higher in the hierarchy) and a name, and, just like a file, it may optionally hold contents (which we also refer to as the value). Consequently, we can use similar terminology for files and registry keys.


Set    Name: Description

P (Predicates)
RegCreate2(p, n): Event is creation of registry key or value with path p and name n
RegCreate3(p, n, v): Event is creation of registry key or value with path p, name n, and value v
MBRRead(s): Event is read of sector offset s of the Master Boot Record
ContentRead(f, s): Event is read of sector offset s of file f
MetaRead(f): Event is read of metadata associated with file f
FileMove(f): Event is move of a file to destination file f
FileCreate(f): Event is creation of file f
RegExMatch(re, s): String s matches the regular expression provided in string re

F (Functions)
path_join: Returns the concatenation of an absolute path with a key or file, resulting in a new path

C (Constants)
ServicePath: The path under which all service subkeys are kept: HKLM\system\ControlSet00X\Services
RegExSvcHostEvent: The regular expression for an event in which a registry value is created for a service run by svchost.exe (a system process that hosts multiple services): REGISTRY CREATION.*ImagePath.*WINDOWS\system32\svchost.*-k
RegExSvcHostFile: The regular expression matched against the value of the ImagePath registry value when a service is run by svchost.exe: WINDOWS\system32\svchost.*-k

Table 5.1: Function (F), Predicate (P), and Constant (C) symbols for property specifications.


We developed the property specifications from domain knowledge, gained by observing both synthetic and real-world software samples, including hand-coded benign software, real-world benign software, and real-world malware samples. We evaluated the models' accuracy on a set of samples entirely separate from those used to develop the models.

5.1.1 System Boot

We first model the specification for a system boot, as a detected boot implies that the system was shut down or restarted. A system boot is characterized by a read of Master Boot Record sector 0, followed immediately by a read of the 0th sector of the file content of $Boot, followed immediately by a read of the 0th sector of the file content of $MFT. Equation 5.1 gives the LTPL specification of a system boot.

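$$\varphi_{SB} = \mathbf{F}\Big(\mathit{MBRRead}(0) \land \mathbf{X}\big(\mathit{ContentRead}(\text{“\$Boot”}, 0) \land \mathbf{X}\,\mathit{ContentRead}(\text{“\$MFT”}, 0)\big)\Big) \tag{5.1}$$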
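Over a finite trace, checking φSB reduces to finding three consecutive events. The sketch assumes a hypothetical event encoding, tuples such as ("MBRRead", 0) and ("ContentRead", "$Boot", 0), rather than the actual Dione log format. φSB holds whenever the returned list is non-empty.

```python
from __future__ import annotations

BOOT_PATTERN = (("MBRRead", 0), ("ContentRead", "$Boot", 0), ("ContentRead", "$MFT", 0))


def boot_positions(trace: list[tuple]) -> list[int]:
    """Indices i where events i, i+1, i+2 match the boot pattern, i.e. F(a ∧ X(b ∧ X c))."""
    n = len(BOOT_PATTERN)
    return [i for i in range(len(trace) - n + 1)
            if tuple(trace[i:i + n]) == BOOT_PATTERN]
```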

5.1.2 Service Install

Next, we model the installation of the service. Several events must occur within the trace to satisfy the specification for service installation. At some point in the trace, there must be a creation of a key with name k and path equal to the constant string ServicePath. There must be a creation of three values, all with a path that is the concatenation of the constant string ServicePath and the key name k, and with names Type, Start, and ImagePath, respectively. If any event e in the trace matches the regular expression RegExSvcHostEvent, there must also be, somewhere in the trace, a creation of a registry value with name ServiceDll and a path that is the concatenation of ServicePath, the key name, and the string Parameters. Finally, we require that all the previous events occur before a system boot.

The LTPL specification of a service installation is given in Equation 5.2.

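$$
\begin{aligned}
\varphi_{sinst} = \exists k\, \mathbf{F}\big(&\mathit{RegCreate2}(\mathit{ServicePath}, k)\\
&\land \mathit{RegCreate2}(\mathit{path\_join}(\mathit{ServicePath}, k), \text{“Type”})\\
&\land \mathit{RegCreate2}(\mathit{path\_join}(\mathit{ServicePath}, k), \text{“Start”})\\
&\land \mathit{RegCreate2}(\mathit{path\_join}(\mathit{ServicePath}, k), \text{“ImagePath”})\\
&\land \big(\mathit{RegExMatch}(\mathit{RegExSvcHostEvent}, e) \implies\\
&\qquad \mathit{RegCreate2}(\mathit{path\_join}(\mathit{ServicePath}, k, \text{“Parameters”}), \text{“ServiceDll”})\big)\\
&\land \mathbf{F}\,\varphi_{SB}\big)
\end{aligned}
\tag{5.2}
$$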
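The specification can be checked against a finite trace with a direct search over candidate service keys. This sketch follows the prose reading (each required creation appears somewhere before a later system boot) rather than evaluating the formula literally. It uses a hypothetical event encoding, ("RegCreate", path, name) or ("RegCreate", path, name, value), with forward-slash paths. It assumes ServicePath has already been resolved to a concrete control set (ControlSet001 here). It also ties the svchost condition to the candidate service's own ImagePath value, which is our interpretation of the RegExMatch clause.

```python
from __future__ import annotations

import re

SERVICE_PATH = "HKLM/system/ControlSet001/Services"  # assumed concrete control set
SVCHOST_FILE = re.compile(r"WINDOWS/system32/svchost.*-k", re.IGNORECASE)
BOOT = (("MBRRead", 0), ("ContentRead", "$Boot", 0), ("ContentRead", "$MFT", 0))


def path_join(*parts: str) -> str:
    return "/".join(parts)


def installed_services(trace: list[tuple]) -> set[str]:
    """Service key names whose install events all precede a system boot."""
    boots = [i for i in range(len(trace) - 2) if tuple(trace[i:i + 3]) == BOOT]
    if not boots:
        return set()
    before = trace[:boots[-1]]  # every event here has a boot after it
    created = {(e[1], e[2]): (e[3] if len(e) > 3 else None)
               for e in before if e[0] == "RegCreate"}
    services = set()
    for (path, k) in created:
        if path != SERVICE_PATH:
            continue
        svc = path_join(SERVICE_PATH, k)
        if not all((svc, v) in created for v in ("Type", "Start", "ImagePath")):
            continue
        image_path = created[(svc, "ImagePath")] or ""
        needs_dll = SVCHOST_FILE.search(image_path) is not None
        if needs_dll and (path_join(svc, "Parameters"), "ServiceDll") not in created:
            continue
        services.add(k)
    return services
```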

Note that we preprocess the paths provided in ImagePath, ServiceDll, and other path-related registry entries to normalize them. For example, we account for the environment variables that Windows allows in such paths and for paths that rely on Windows' default directory, and we ensure that slashes match the style of the Dione log (Unix-style forward slashes).
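A normalization pass of this kind might look like the following. The variable table, the C:/WINDOWS system root, and the handling of the \SystemRoot\ and \??\ prefixes are illustrative assumptions rather than the exact rules DCL applies. Relative paths are resolved against the system root. Command-line arguments such as svchost's -k group are left in place so that the svchost regular expressions can still match.

```python
from __future__ import annotations

import re

SYSTEM_ROOT = "C:/WINDOWS"  # assumed system root of the guest
ENV = {"systemroot": SYSTEM_ROOT, "windir": SYSTEM_ROOT,
       "programfiles": "C:/Program Files"}  # assumed variable values


def normalize_path(raw: str) -> str:
    """Normalize a registry-supplied path to the forward-slash style of the Dione log."""
    path = raw.strip().strip('"').replace("\\", "/")
    # Expand %VAR% references case-insensitively; leave unknown variables untouched.
    path = re.sub(r"%([^%]+)%",
                  lambda m: ENV.get(m.group(1).lower(), m.group(0)), path)
    if path.startswith("/??/"):                  # NT object-manager prefix
        path = path[len("/??/"):]
    if path.lower().startswith("/systemroot/"):  # kernel-style system root
        path = SYSTEM_ROOT + path[len("/systemroot"):]
    if not re.match(r"^[A-Za-z]:/", path) and not path.startswith("/"):
        path = SYSTEM_ROOT + "/" + path          # relative: default directory
    return path
```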

5.1.3 File Access

A file access is defined for a given file f. It consists of a read of the file's metadata at some point in the trace, followed eventually by a read of the 0th sector of the same file's contents. The LTPL specification of a file access is given in Equation 5.3.

$$
\varphi_{FA} = \exists f\, \mathbf{F}\big(\text{MetaRead}(f) \wedge \mathbf{F}\,\text{ContentRead}(f, 0)\big) \tag{5.3}
$$


As previously noted, this generalization of a file load only checks whether a file was

accessed, and does not differentiate between different types of file accesses. In Section 5.4,

we will differentiate between a load and another type of access (a file copy) using a machine

learning classifier.
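Equation 5.3 can be evaluated in a single pass over the trace. The Python sketch below illustrates this and is not DCL's implementation. It assumes hypothetical events of the form ("MetaRead", f) and ("ContentRead", f, sector). A file is reported as accessed once a read of its sector 0 follows an earlier read of its metadata.

```python
from typing import List, Set, Tuple

def accessed_files(trace: List[Tuple]) -> Set[str]:
    """Files f satisfying F(MetaRead(f) and F ContentRead(f, 0)).

    Hypothetical events: ("MetaRead", f) and ("ContentRead", f, sector).
    """
    meta_seen: Set[str] = set()
    accessed: Set[str] = set()
    for ev in trace:
        if ev[0] == "MetaRead":
            meta_seen.add(ev[1])
        elif ev[0] == "ContentRead" and ev[2] == 0 and ev[1] in meta_seen:
            # Content read of sector 0 after a metadata read of the same file.
            accessed.add(ev[1])
    return accessed
```

Note that, matching the discussion above, this check cannot tell a program load from any other kind of read, such as a copy.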

5.1.4 Persistent Service Load

In this section, we model the persistent service load, which incorporates several stages. Because the service is installed for persistence, the specified operations must occur before system reboot so that the service binary can be loaded automatically. Additionally, a file access of that service binary must be detected after the reboot.

The specification for a service load relies on the idea of a relevant binary. During service installation, there must be a registry creation event with name ImagePath and value f_ip. If the service runs in its own process, f_ip will be the absolute path and filename of its executable binary. A service load incorporates the registry events required by the service install, plus a creation or move of a file whose path equals f_ip. Later in the trace, there must appear a system boot (specified by φSB), followed in turn by a file access of the same file f_ip.

Alternatively, if the service is going to be run by the SvcHost service, f_ip must match the regular expression given by the constant RegExSvcHostFile. Furthermore, there must be a creation of a registry value called ServiceDll. The contents of this value, f_dll, specify the path of the executable binary that must be created and then loaded after system reboot.

Notice that Equation 5.2 required registry creation events for ImagePath and/or ServiceDll but placed no constraint on the contents of those values. In this specification, we check the values of those registry creation events and ensure that a file is created to match that executable binary.

The LTPL specification to perform a service load given a persistent service installation


is given in Equation 5.4.

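$$
\begin{aligned}
\varphi_{sload} = \exists k\, \mathbf{F}\Big(&\text{RegCreate2}(\text{ServicePath}, k) \\
&\wedge \text{RegCreate2}(\text{path\_join}(\text{ServicePath}, k), \text{“Type”}) \\
&\wedge \text{RegCreate2}(\text{path\_join}(\text{ServicePath}, k), \text{“Start”}) \\
&\wedge \exists f_{ip}\Big(\text{RegCreate3}(\text{path\_join}(\text{ServicePath}, k), \text{“ImagePath”}, f_{ip}) \wedge \Big( \\
&\qquad \big(\text{RegExMatch}(\text{RegExSvcHostFile}, f_{ip}) \\
&\qquad\quad \wedge \exists f_{dll}\big(\text{RegCreate3}(\text{path\_join}(\text{ServicePath}, k, \text{“Parameters”}), \text{“ServiceDll”}, f_{dll}) \\
&\qquad\qquad \wedge (\text{FileCreate}(f_{dll}) \vee \text{FileMove}(f_{dll})) \\
&\qquad\qquad \wedge \mathbf{F}(\varphi_{SB} \wedge \mathbf{F}(\text{MetaRead}(f_{dll}) \wedge \mathbf{F}\,\text{ContentRead}(f_{dll}, 0)))\big)\big) \\
&\qquad \vee \big(\neg\text{RegExMatch}(\text{RegExSvcHostFile}, f_{ip}) \wedge (\text{FileCreate}(f_{ip}) \vee \text{FileMove}(f_{ip})) \\
&\qquad\qquad \wedge \mathbf{F}(\varphi_{SB} \wedge \mathbf{F}(\text{MetaRead}(f_{ip}) \wedge \mathbf{F}\,\text{ContentRead}(f_{ip}, 0)))\big)\Big)\Big)\Big)
\end{aligned}
\tag{5.4}
$$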
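The Python sketch below shows one way to evaluate Equation 5.4 under the ordering described in the prose. The registry writes and the creation or move of the binary must precede a boot. The binary's metadata read and its sector-0 content read must follow that boot. Several details are assumptions for illustration: the event tuples, the SERVICE_PATH and REGEX_SVCHOST_FILE values, registry values being already normalized, and boots being collapsed into a ("Boot",) event.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical constants; the actual values used by DCL are not given here.
SERVICE_PATH = "ControlSet001/Services"
REGEX_SVCHOST_FILE = re.compile(r"svchost\.exe", re.IGNORECASE)

def _first(trace: List[Tuple], pred: Callable[[Tuple], bool], start: int = 0) -> Optional[int]:
    """Index of the first event at or after `start` satisfying `pred` (None if absent)."""
    return next((i for i in range(start, len(trace)) if pred(trace[i])), None)

def _reg_value(trace: List[Tuple], path: str, name: str) -> Optional[Tuple[int, str]]:
    """(index, value) of the first RegCreate3(path, name, value) event, if any."""
    i = _first(trace, lambda e: e[0] == "RegCreate" and e[1] == path and e[2] == name)
    return None if i is None else (i, trace[i][3])

def persistent_service_loads(trace: List[Tuple]) -> Set[str]:
    """Service keys whose binary is created before a boot and accessed after it.

    Hypothetical events: ("RegCreate", path, name, value), ("FileCreate", f),
    ("FileMove", f) with f the destination, ("Boot",), ("MetaRead", f),
    ("ContentRead", f, sector). Registry values are assumed already normalized.
    """
    loaded = set()
    keys = {e[2] for e in trace if e[0] == "RegCreate" and e[1] == SERVICE_PATH}
    for k in keys:
        key_path = f"{SERVICE_PATH}/{k}"
        regs = [_reg_value(trace, key_path, n) for n in ("Type", "Start", "ImagePath")]
        if any(r is None for r in regs):
            continue
        f_ip = regs[2][1]
        if REGEX_SVCHOST_FILE.search(f_ip):  # SvcHost-hosted: the binary is ServiceDll
            dll = _reg_value(trace, f"{key_path}/Parameters", "ServiceDll")
            if dll is None:
                continue
            regs.append(dll)
            target = dll[1]
        else:  # own process: the binary is the ImagePath value itself
            target = f_ip
        created = _first(trace, lambda e: e[0] in ("FileCreate", "FileMove") and e[1] == target)
        if created is None:
            continue
        # The boot must follow both the registry writes and the binary's creation.
        boot = _first(trace, lambda e: e == ("Boot",), max([created] + [i for i, _ in regs]) + 1)
        if boot is None:
            continue
        meta = _first(trace, lambda e: e[0] == "MetaRead" and e[1] == target, boot + 1)
        if meta is None:
            continue
        read0 = _first(trace, lambda e: e[0] == "ContentRead" and e[1] == target and e[2] == 0, meta + 1)
        if read0 is not None:
            loaded.add(k)
    return loaded
```

Under this reading, a binary that is created only after the boot does not satisfy the specification.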

5.2 Dione Capability Labeler Implementation

DCL is implemented as a behavioral model checker using custom Python code. The behav-

ioral model checker implements the specifications of Section 5.1 by hard-coding the states

of the model to fit the specification. After the model checker preprocesses the events of the trace, it moves through the states detailed by the specifications of Section 5.1, outputting true if the events satisfy a specification and false otherwise. For each malware sample, it outputs an XML file that lists which of the properties were present in the trace (service installation, reboot, load of service binary). We chose a behavioral implementation instead of an exhaustive model checker because it can be specialized to the problem being modeled. Since the three phases occur sequentially, we could break the traces down into smaller sub-traces, so the resulting state machine contains programmatically simple transitions between states. Because of this property, a hand-coded model checker achieves higher performance than a generic-but-exhaustive model checker would. Such scalability is necessary in a malware analysis environment in which hundreds of thousands of samples are discovered each day.
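The sequential-phase idea can be illustrated with a small Python sketch: a single pass over the trace in which each phase is tested only after the previous one has succeeded, followed by an XML report. The phase predicates are simplified stand-ins for the full specifications. The XML element and attribute names are hypothetical, because the actual DCL output schema is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Tuple

Phase = Tuple[str, Callable[[tuple], bool]]

def run_phases(trace: Iterable[tuple], phases: List[Phase]) -> Dict[str, bool]:
    """Single pass over the trace; each phase is only checked after the previous one succeeds."""
    found = {name: False for name, _ in phases}
    current = 0
    for ev in trace:
        if current == len(phases):
            break  # all phases satisfied; stop early
        name, pred = phases[current]
        if pred(ev):
            found[name] = True
            current += 1
    return found

def to_xml(sample_md5: str, found: Dict[str, bool]) -> str:
    """Serialize the per-sample results (hypothetical schema)."""
    root = ET.Element("sample", {"md5": sample_md5})
    for name, value in found.items():
        ET.SubElement(root, "property", {"name": name}).text = "true" if value else "false"
    return ET.tostring(root, encoding="unicode")
```

In this sketch a trace in which the boot precedes the install reports `reboot` as false, because the reboot phase is never reached after the install.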

5.3 Experimental Setup

We created a testbed infrastructure in order to automatically load and run a malware

sample, observe it with Dione and other tools, and automatically save instrumentation

data. The automation of our testbed was instrumental in analyzing more than a thousand

real-world samples, so that we could comprehensively evaluate our model checker.

5.3.1 Testbeds

Our testbed consisted of two servers, each running the Xen 4.1.2 hypervisor with an Ubuntu 12.04 Domain 0 and controlled using the virsh command-line tool of libvirt. The

SUA—the system on which the malware was loaded and instrumented—was 32-bit Windows

XP Service Pack 3 with an NTFS file system. Each Windows XP SUA had 512 MB of

physical memory, a 16 GB disk, and a secondary 2 GB disk. Two different VMs were

loaded with differing amounts and types of software, and had seen varying amounts of use

so that they had different disk allocation patterns and fragmentation rates. Our two testbed

VMs can be described by the following information:

• VM0: This VM was used to generate the traces that provided us with enough domain

knowledge to develop our models. It was also used to train, test, and evaluate our

models on over 1,000 samples. This VM was generally clean; it had only enough soft-

ware installed to tempt the malware to run. Of the 16 GB disk, it had approximately

9.5 GB of free space, and had a total fragmentation of 7% and a file fragmentation


of 16%.

• VM1: The second testbed was used purely for additional testing; we tested approx-

imately 350 samples on this testbed, including many samples also run on VM0. We

used this testbed to demonstrate that our models work for other 32-bit Windows XP SP3 systems besides the one trained on. As such, no traces generated from this testbed were used in the acquisition of domain knowledge, in the creation of our persistence mechanism specifications, or in the training of our machine learning classifier. This VM was not as clean as VM0; as a result, it had more files on disk and more

programs installed, and thus had less free space and higher fragmentation. VM1 had

1.5 GB of free space, 15% total fragmentation, and 30% file fragmentation.

Additionally, we provided a simulated internet to our VMs through the INetSim framework [26]. INetSim simulates common network services, including HTTP, SMTP, DNS, and FTP. Simulating internet services is necessary for two reasons. First, it prevents the malware from causing harm to other systems on the network, since the VM is not connected to the actual internet. Second, it is not enough to simply disable internet access to the VM, as many malware samples will not run unless they receive responses to simple queries (e.g., a DNS lookup). Malware commonly attempts to connect to an internet service to coarsely detect whether it is running in a sandboxed environment, and if so, it exits to avoid being analyzed. As a result, using INetSim causes more malware samples to run, allowing us to obtain meaningful traces.

In addition to integrating Dione with the Xen Domain 0, as detailed in Section 4.4, we

also integrated the Volatility memory forensics framework [76] into our testbed. Volatility is

an open source framework that can extract digital artifacts, including process lists, drivers,

and services, from the physical memory of a virtual machine. Volatility can easily integrate with Xen, reading the memory of the VM. Our testbed used Volatility to report process lists, lists of DLLs loaded by each process, loaded modules, loaded drivers, and details of installed services.

Before running any tests, the VM was booted and warmed up; then the VM was paused and a snapshot of its memory was taken. For each malware sample, the sample was loaded onto the paused SUA. The SUA was then restored from the checkpoint using a copy-on-write (COW) disk (in the Xen VHD disk format), so that any modifications the malware made to the system could be discarded after instrumentation was complete. After the malware had run for three minutes, allowing it to install itself, the system was restarted. Approximately three minutes after the system restarted, we used Volatility to extract information from the VM’s memory, saved the Dione and Volatility logs, and replaced the COW disk with a clean image for the next sample. Due to an artifact of the Xen libvirt library, shutdown of the VM takes on the order of five minutes. As a result, it takes approximately eleven minutes to run and instrument each sample. This is the most time-consuming part of the analysis. In future work, it can be alleviated by parallelizing Dione so that multiple instances can run, each monitoring a single VM, allowing multiple VMs to run simultaneously.
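The eleven-minute figure is the sum of the phases described above. The short extraction and log-saving step, whose duration is not given, is ignored. The Python sketch below works through that arithmetic and shows how wall-clock time would scale if several VMs could run at once. The perfect linear scaling across VMs is a hypothetical assumption, not a measured result.

```python
import math

# Approximate per-sample phase durations in minutes, as reported above.
PHASE_MINUTES = {
    "malware runs before restart": 3,
    "system runs after restart": 3,
    "VM shutdown (libvirt artifact)": 5,
}

def minutes_per_sample() -> int:
    return sum(PHASE_MINUTES.values())

def corpus_hours(num_samples: int, parallel_vms: int = 1) -> float:
    """Wall-clock hours, assuming (hypothetically) perfect scaling across parallel VMs."""
    rounds = math.ceil(num_samples / parallel_vms)
    return rounds * minutes_per_sample() / 60
```

At eleven minutes per sample, running 1,084 samples sequentially on one VM takes roughly 198.7 hours (about 8.3 days). Under the ideal-scaling assumption, four parallel VMs would bring this to roughly 49.7 hours.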

5.3.2 Malware Corpus

Most of the samples were acquired from Anubis [1], an online malware analysis platform.

Anubis temporarily stores the samples submitted by the general public for analysis. We

acquired these samples from their database over the course of October to December 2012 and

took a random sampling. In order to ensure that our corpus contained a sufficient number

of service-installing samples to fully evaluate our model checker, we specifically targeted

samples which were known to create a file under the C:\WINDOWS\system32 directory, as

many (but not all) services tend to install their executable into this directory. We also

manually downloaded numerous samples from Open Malware [56]. In the end, we had

obtained 1,084 real-world malware samples with unique MD5 checksums.
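Deduplicating a corpus by MD5 digest is straightforward. A minimal Python sketch using the standard hashlib module is shown below. It assumes the samples are available as raw bytes and keeps the first sample seen for each digest. MD5 serves here only as an identifier, not as a security measure.

```python
import hashlib
from typing import Dict, Iterable

def unique_by_md5(samples: Iterable[bytes]) -> Dict[str, bytes]:
    """Map each MD5 hex digest to the first sample with that digest."""
    unique: Dict[str, bytes] = {}
    for blob in samples:
        digest = hashlib.md5(blob).hexdigest()
        unique.setdefault(digest, blob)  # later duplicates are ignored
    return unique
```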


5.3.3 Assignment of “Truth” Labels

Having obtained samples found in-the-wild, we were tasked with determining ground truth

of each malware sample’s behavior in order to evaluate the correctness of our models.

Unfortunately, the ground truth problem is nontrivial to solve [4, 49]. As discussed in

Section 1.1 and in more detail by Bailey et al. [4], anti-virus companies agree on neither

names nor scope and granularity of labels, and clustering algorithms can only evaluate the

effectiveness of their own models using these AV labels. Even if trustworthy family/variant labels could be obtained for each sample, there is no repository describing the behavior of each sample, so we could not, for example, verify whether a sample was known to install a service. Finally, we face the challenge of environment-sensitive malware.

There are many reasons a malware sample may not run in our environment; it may detect

it is running in a virtualized platform, it may attempt to download a specific file from the

internet and be unable to, it may require a library on the system that is not present, or it

may be written for another version of Windows. Malware is also often fragile;

it is common for either the malware or the entire OS to crash (resulting in the dreaded

BSOD, or “Blue Screen of Death”). Thus, even if a malware sample were known to exhibit

the capabilities we are modeling in another environment, we could not be certain that it

exhibited that capability in our environment.

For these reasons, we used Cross View Detection, a technique in which two very different views of the same system are compared, to establish the truth for samples running in our environment. First, we used Volatility to extract installed services, processes, DLLs, modules, and drivers from the memory space of our SUA. Then we compared the results of our models to the results extracted using Volatility and applied a label based on one of two definitions of truth: (1) if Volatility and DCL agree on labels, those labels are deemed truthful, or (2) if DCL and Volatility do not agree, we performed manual analysis to break the tie and apply a truthful

label. Manual analysis consisted of both static and dynamic analysis techniques, including


disassembly, analysis with IDA Pro, running the samples using in-host analysis tools like

ProcMon and RegShot, and using the WinDbg kernel debugger on a connected system to

step through and analyze kernel code and data structures.
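The two definitions of truth amount to a simple decision rule. A minimal Python sketch follows. The label pair (service installing, service loading) and the manual_review callback are hypothetical stand-ins. The callback represents the human analysis with IDA Pro, ProcMon, RegShot, and WinDbg, which cannot be automated this way.

```python
from typing import Callable, Dict, Tuple

Labels = Tuple[bool, bool]  # (service_installing, service_loading)

def assign_truth(dcl: Dict[str, Labels],
                 volatility: Dict[str, Labels],
                 manual_review: Callable[[str], Labels]) -> Dict[str, Tuple[Labels, str]]:
    """Rule (1): agreement is accepted as truth. Rule (2): disagreement goes to manual analysis."""
    truth = {}
    for md5, dcl_labels in dcl.items():
        if dcl_labels == volatility[md5]:
            truth[md5] = (dcl_labels, "agree")
        else:
            truth[md5] = (manual_review(md5), "manual")
    return truth
```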

We ran the 1,084 samples on VM0. We discarded 59 traces, leaving us with 1,025 viable

traces. We discarded these traces for one of three reasons. First, occasionally Volatility

was unable to parse data structures from the SUA memory, which prevented us from establishing truth for those samples. Second, some samples did not reboot (either because the malware blocked shutdown, or because the malware corrupted kernel space so badly that boot caused a BSOD), which prevented us from evaluating persistence. Third, we discarded

traces due to the error in the Dione registry monitoring infrastructure discussed in Sec-

tion 4.6, since this reflects an issue with the instrumentation infrastructure, not the model

checker, and is easily detectable.

Of the 1,025 samples that had valid traces, the DCL and Volatility outputs agreed on the labels applied to 973 samples, satisfying our first definition of truth. Of the remaining 52 samples, DCL and Volatility disagreed on whether a service installation occurred for 27. For the other 25, they agreed that the service installation took place but disagreed on whether a load after reboot occurred. We manually analyzed these 52 samples to determine the cause of the disagreement in order to assign truthful labels.

There were three causes for disagreement between DCL and Volatility on service installation. Manual analysis revealed that DCL had applied the correct labels in the first two cases; in the last case, it revealed that DCL mislabeled the samples for their service installing capability.

1. Uninstall Before Snapshot (UBS): A subset of samples were labeled as service

installing by DCL, but no service was detected in memory by Volatility. The reason

behind this discrepancy is that Volatility only detects the services installed at the

time of the single snapshot during which it parses the SUA’s memory. Dione, on the

other hand, continually monitors the disk, detecting every action. In these samples,


the service mechanism is used to load driver code into kernel space. Once the driver is

loaded, the service is uninstalled. Evidence of this can be found in the other Volatility

logs, which show loaded drivers and modules matching the service name. As such, the

resulting truth label applied to these 24 samples is service installing, and the DCL

labels are correct.

2. Delete Before Reboot (DBR): This sample was labeled as service installing by

DCL, but no service was detected in memory by Volatility. This sample used the

service mechanism only to load code into memory, not for persistence; it created

and deleted the service in rapid succession upon installation. This sample is labeled

service installing, and the DCL labels are correct.

3. Installation After Reboot (IAR): These samples were labeled as service installing

by Volatility, but not by DCL. These samples installed the services after reboot.

They used the service mechanism to load code, then deleted the service immediately

thereafter. As they do indeed install services, these samples are labeled as service

installing, and DCL labels are incorrect.

There were four causes for disagreement between DCL and Volatility, present in 25

sample traces, regarding whether the installed service was automatically loaded after reboot.

If the service successfully loads after reboot, it is considered a service-persisting sample.

4. Unload Before Snapshot (ULBS): These samples were labeled by DCL

as service loading, but not by Volatility. The reason is similar to “Uninstall Before

Snapshot”, above: Volatility operates on a single snapshot taken after the system

boots up, and these samples had loaded and unloaded their service before the snapshot

was taken. As a result, these samples are labeled persistent service loading, and DCL

labels are correct.


5. Fast File Deletion (FFD): These samples were labeled as service loading by Volatil-

ity, but not by DCL. Analysis showed that these samples created and deleted the files

so quickly that the evidence is not flushed to disk. However, system call analysis

shows the files are indeed created, the services loaded into memory, and then the

original files deleted. As a result, these samples are labeled persistent service loading,

and DCL labels are incorrect.

6. File Creation After Boot (FCAB): This sample was labeled service loading by

Volatility, but not by DCL. This sample installed its service before reboot (verified by

both DCL and Volatility), but didn’t create a service binary to correspond with the

service until after reboot. Without a valid binary creation before system shutdown,

the service cannot be used for persistence, and thus we label this not persistent service

loading, and DCL labels are correct. However, we acknowledge that the definition of

service persistence prevents us from labeling samples that load a service for any other

reason, and address this in Section 5.3.4.

7. Temporary Service Creation (TSC): This sample was labeled service loading

by Volatility, but not by DCL. This sample does not use the service mechanism to

persist, only to load driver code into the kernel. It installs and loads the service,

then deletes the service and accompanying file. After boot, it repeats the process:

Installing and starting the service, loading the binary (the service is started by a third

mechanism), then deleting all trace of its existence on disk. Since the service installed

is not reloaded automatically after reboot, we label this sample not persistent service

loading, and DCL labels are correct. Again, we acknowledge a missed opportunity to

label a service loading for non-persistence, and discuss this in Section 5.3.4.


5.3.4 Model Checker Results

Figure 5.1 presents the results of our experiments on VM0 in the form of a confusion matrix,

with the labels applied by DCL compared to the actual labels. Of the 1,025 samples that

produced valid traces on VM0, 197 of them were service installing. Of these 197 samples,

there were 63 unique malware variants; 14 variants appeared multiple times in the corpus,

with 5 malware variants appearing more than 10 times in the corpus. Table 8.1 in the

Appendix lists all 197 samples that install services.

(a) Service Installing

Actual \ DCL       p     n Total
p                196     2   198
n                  0   827   827
Total            196   829  1025

(b) Service Loading

Actual \ DCL       p     n Total
p                152     7   159
n                  0   866   866
Total            152   873  1025

Figure 5.1: Confusion matrix for service-installing and service-loading labels applied to samples run on the VM0 testbed. Rows give the actual label; columns give the label applied by DCL.

DCL correctly labeled 98.9% (196/198) of the service installing samples; of the 827 non-

service installing samples, there were no false positives. This means that DCL correctly

labeled 99.8% (1,023/1,025) of all samples in our corpus for the service install capability.

This included 25 samples that were not correctly labeled by Volatility, and thus would not

be labeled in an approach that relies purely on a single memory snapshot in time.
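The rates quoted above are ratios of counts from the confusion matrices in Figure 5.1, with rows as actual labels and columns as DCL labels. The small Python sketch below makes those ratios explicit. The class and method names are ours, chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # actual p, labeled p by DCL
    fn: int  # actual p, labeled n by DCL
    fp: int  # actual n, labeled p by DCL
    tn: int  # actual n, labeled n by DCL

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def detection_rate(self) -> float:
        """Fraction of actually positive samples that DCL labeled positive."""
        return self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        """Fraction of all samples labeled correctly."""
        return (self.tp + self.tn) / self.total
```

For the service-installing matrix, `Confusion(tp=196, fn=2, fp=0, tn=827)` gives a detection rate of 196/198 and an accuracy of 1023/1025.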

Likewise, of the 1,025 samples that produced valid traces, 159 were service loading and relied on the Windows service mechanism for persistence. Our model successfully labeled 95.6% (152/159) of service loading samples, and did not falsely label any of the remaining 866 non-service loading samples. All together, DCL correctly applied the service loading label to 99.3% (1018/1025) of samples. This included 43 samples not caught by our Volatility-based memory forensics (including the 25 for which Volatility did not catch the service installation). Table 5.2 lists all samples that were mislabeled by DCL on VM0.

MD5                               Service Name(s)      Install  Load  Cause
0038ee2524f8bc7cb329e01cad411f0f  Forter               1        0     FFD
0061d7b4c7db34437695853252a82474  wowsub               1        0     FFD
08cdc80a346508e6d57efe4a782a9531  PSSdk21              1        0     FFD
0ed50455c7ddece3b8989ff5f02dc442  abp470n5             1        0     FFD
0072c5497f7eae033ee9934492f17180  abp470n5             1        0     FFD
015c976f05bdf3942b9f998b9b1eb7e5  asc3360pr            0        0     IAR
025f8ecd28e85da68eb73b58b0d1b1c7  NdisFileServices32   0        0     IAR

Table 5.2: Samples (run on VM0) whose service capabilities were mislabeled by DCL, plus the labels assigned by DCL and the cause of the mislabel (according to Section 5.3.3).

In order to demonstrate the effectiveness of our models on any Windows XP system—not

just the particular system in which we tested our samples—we also evaluated the samples

on an entirely different testbed, VM1. This Windows XP VM had a different usage history,

and as such, it had different software updates applied, different allocation on disk, different

programs, and different fragmentation patterns. No traces from VM1 were considered in

obtaining the domain knowledge to develop the service models. We reused most of the

service-installing samples, as well as many non-service-installing samples, from VM0.

Out of 362 samples, we obtained 353 valid traces. Figure 5.2 shows the confusion matrix

of our results from those samples run on VM1.

Of the 353 samples, 153 installed a service. Some malware variants appeared more than once in the corpus; altogether, there were 51 unique variants that installed a service, with 3 variants appearing more than 10 times. Of the samples that installed at least one service, 133 loaded that service after reboot. DCL correctly labeled 99.3% (152/153) of the service installing samples and 86.4% (115/133) of the service loading samples.

(a) Service Installing

Actual \ DCL       p     n Total
p                152     1   153
n                  0   200   200
Total            152   201   353

(b) Service Loading

Actual \ DCL       p     n Total
p                115    18   133
n                  0   220   220
Total            115   238   353

Figure 5.2: Confusion matrix for service-installing and service-loading labels applied to samples run on the VM1 testbed. Rows give the actual label; columns give the label applied by DCL.

There were no false positives in either category, meaning that DCL correctly applied the

service installing label 99.7% of the time, and the service loading label 94.9% of the time.

Table 8.2 in the Appendix lists all 153 samples that install at least one service.

Table 5.3 lists all samples that were mislabeled by DCL on VM1. As Table 5.3 shows, the low detection rate for service loading samples is largely due to one malware strain, which loads a service called amsint32. This strain appeared in the corpus 11 times, and a fast creation-deletion cycle prevented detection of its load. As a result, it accounted for the majority of the false negatives.

Altogether, across all 1,378 traces generated on VM0 and VM1, DCL correctly applied the service installing label to 99.8% (1375/1378) of traces, and correctly applied the service loading label to 98.2% (1353/1378) of traces. The confusion matrix for the results of all traces is shown in Figure 5.3.
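The pooled matrix in Figure 5.3 is the element-wise sum of the per-testbed matrices in Figures 5.1 and 5.2. The number of correctly labeled traces is the diagonal sum (true positives plus true negatives). A minimal Python check of that arithmetic, using the counts from those figures, is shown below.

```python
from typing import Tuple

Matrix = Tuple[int, int, int, int]  # (tp, fn, fp, tn); rows = actual label

def pool(*matrices: Matrix) -> Matrix:
    """Element-wise sum of confusion matrices from different testbeds."""
    return tuple(sum(cells) for cells in zip(*matrices))

def correct(m: Matrix) -> int:
    """Diagonal of the matrix: correctly labeled traces."""
    tp, _fn, _fp, tn = m
    return tp + tn

# Counts read from Figures 5.1 (VM0) and 5.2 (VM1).
vm0_install, vm1_install = (196, 2, 0, 827), (152, 1, 0, 200)
vm0_load, vm1_load = (152, 7, 0, 866), (115, 18, 0, 220)
```

Summing the installing matrices gives (348, 3, 0, 1027), so 1375 of 1378 traces are correct. Summing the loading matrices gives (267, 25, 0, 1086), so 1353 of 1378 are correct.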

Discussion of Results

Though DCL did not attain a 100% detection rate for service installations and service loads, the false negatives provide useful insight for malware researchers into stealthy malware behavior.

MD5                               Service Name(s)      Install  Load  Cause
0038ee2524f8bc7cb329e01cad411f0f  Forter               1        0     FFD
0061d7b4c7db34437695853252a82474  wowsub               1        0     FFD
08cdc80a346508e6d57efe4a782a9531  PSSdk21              1        0     FFD
0ed50455c7ddece3b8989ff5f02dc442  abp470n5             1        0     FFD
0072c5497f7eae033ee9934492f17180  abp470n5             1        0     FFD
01cf51be1a4bacb550c35e165d4453d4  amsint32             1        0     FFD
026896a7449afcdac48323afcd71d3c0  amsint32             1        0     FFD
02d275b6110444732f9bef39218d1997  amsint32             1        0     FFD
0685b2bf04e60c65be9bd1f667c07c4a  amsint32             1        0     FFD
0bc7b598af22bd4c7a496b25811dc362  amsint32             1        0     FFD
0d2970588384bed4bdb1221003b0a45a  amsint32             1        0     FFD
0eb98691c031f995c054375a1ebf89b7  amsint32             1        0     FFD
0f068f1d2b014b217773ffaaf79abec2  amsint32             1        0     FFD
0f36825bea6bf4967403dd9dd5f10a11  amsint32             1        0     FFD
0f5adb96c2a975648667451a50c13f28  amsint32             1        0     FFD
12ada88d49498a43cf4b9274f3fb586c  amsint32             1        0     FFD
02fe132fbd9657a60e9bca1b5a3fe747  aic32p               1        0     FFD
144cd9ace18008807329e5c6e7c336e6  aic32p               0        0     IAR

Table 5.3: Samples (run on VM1) whose service capabilities were mislabeled by DCL, plus the labels assigned by DCL and the cause of the mislabel (according to Section 5.3.3).

(a) Service Installing

Actual \ DCL       p     n Total
p                348     3   351
n                  0  1027  1027
Total            348  1030  1378

(b) Service Loading

Actual \ DCL       p     n Total
p                267    25   292
n                  0  1086  1086
Total            267  1111  1378

Figure 5.3: Confusion matrix for service-installing and service-loading labels applied to all traces generated on VM0 and VM1. Rows give the actual label; columns give the label applied by DCL.

As Tables 5.2 and 5.3 show, the most common reason for DCL to mislabel a sample is Fast File Deletion (FFD). The loading of services was missed in 23 sample traces: Forter, wowsub, PSSdk21 (two traces each), abp470n5 (four traces), amsint32 (eleven traces), and aic32p (two traces).

This labeling error occurs when the malware sample creates and deletes a file so quickly that the OS does not have a chance to flush the file to disk. Recall from Section 2.3.3 that it may take up to 8 seconds for data to be flushed to disk. By contrast, the lifetime of the malicious driver files for these samples is less than one second, sometimes even less than 0.2 seconds.

The malware uses the service mechanism not only to persist, but also because it provides a

simple way to load malicious driver code into kernel space. Then, knowing that this leaves

forensic evidence in the Windows registry and on disk, it deletes the file associated with the

driver (the code is hidden somewhere else on disk, so that at the next reboot, the file can

again be created, mapped to memory, and deleted). Since the driver code has been mapped

to main memory, the driver remains loaded, but without the forensic evidence.
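The timing argument can be expressed as a simple filter. Given per-file creation and deletion times from some other source (for example, a system-call sensor, which is hypothetical in this setting), any file whose lifetime is shorter than the write-back delay is at risk of never reaching the disk. Such a file may be invisible to a disk-only sensor, although an early flush is still possible. The 8-second bound is the value recalled from Section 2.3.3. The function name and input format are illustrative assumptions.

```python
from typing import Dict, List, Tuple

FLUSH_DELAY_S = 8.0  # upper bound on write-back delay, as recalled from Section 2.3.3

def possibly_unflushed(lifetimes: Dict[str, Tuple[float, float]],
                       flush_delay: float = FLUSH_DELAY_S) -> List[str]:
    """Files whose create-to-delete lifetime is shorter than the flush delay.

    `lifetimes` maps a path to a hypothetical (create_time, delete_time) pair in
    seconds. Such files may never be written to disk, so a disk sensor can miss them.
    """
    return sorted(path for path, (created, deleted) in lifetimes.items()
                  if deleted - created < flush_delay)
```

For example, a driver file that lives 0.2 seconds is flagged, while a file that survives for 30 seconds is not.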

Unfortunately, this is a shortcoming of using a disk sensor as the only source of events,

since it will, by definition, only catch traffic that is flushed to disk. However, the models

are still valid in these cases, so combining a disk sensor with another type of sensor (such

as a system call interposer) would be able to catch these quick file creations and thus detect

the service load.
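Combining sensors in this way requires interleaving their event streams into one time-ordered trace before the models are evaluated. A minimal Python sketch follows, using the standard `heapq.merge`. It assumes each sensor already emits events in timestamp order and that both sensors share a common clock. Both assumptions are ours, as is the (timestamp, sensor, description) event shape.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Event = Tuple[float, str, str]  # (timestamp, sensor, description): hypothetical shape

def merge_sensors(*streams: Iterable[Event]) -> Iterator[Event]:
    """Interleave per-sensor event streams (each already time-ordered) by timestamp."""
    return heapq.merge(*streams, key=lambda ev: ev[0])
```

With a system-call stream merged in, a short-lived driver file that never reaches the disk would still appear in the combined trace.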

The mislabeling of the two remaining samples, asc3360pr and NdisFileServices32, was due to a model that was overly precise for our goal. While we aimed to model any service install, adding the requirement that the installation happen before the system boot limited the number of service-installing samples we could find. In future work, we will eliminate the reboot requirement from the service install model and test for service persistence only in the service load model.


In the analysis of disagreements between Volatility and DCL, we encountered some in-

teresting observations, even when the samples were correctly labeled by DCL. We discovered

that several samples were not using the service mechanism for persistence, but rather they

were using it because it provides a convenient mechanism to load malicious code into the

kernel. Thus, we sometimes saw a service binary being loaded, even though this was not

a service being used to persist across reboot. Even though our model correctly labeled these samples as not loading a persistent service binary, they point to a missed opportunity in capability labeling. Therefore, in future work, we will generalize the specification that detects persistent service loads so that it detects a service load under any circumstance. Then, we will use two specialized models to divide these service-loading samples according to how they use the service capability. The result will be two labels for the service loading capability: one for persistence, and the other for loading code into memory.

Performance of Model Checker

We timed our model checker using the Linux time command and report the user time (the CPU time spent executing the process’s code in user space). We ran DCL on the system

described in Section 5.3.1. Obtaining the capability labels for each of the 1,062 samples of

VM0 (including those whose traces would be discarded) took 330 seconds, or 0.31 seconds

per sample. Running DCL on the 362 samples of VM1 took 128 seconds, or 0.35 seconds

per sample.

5.4 Labeling File Access Type

The successful use of a service as a persistence mechanism implies that the file associated with the service is loaded after a system boot. Analysis of Dione traces for program binary loads yielded an interesting observation. Because the Windows loader fetches different parts of the binary at different times, the ordering of disk sector accesses in a program binary load looks very different from a standard read of that same file (for example, a read for a copy operation). Given this observation, we labeled a series of disk accesses as a program binary load, as opposed to another type of disk read, the file copy.

5.4.1 Motivation

The intuition behind labeling a series of reads is apparent from a visualization of disk reads. Figure 5.4 visualizes the disk reads issued when four executables are loaded for execution. The x-axis represents (unitless) time, while the y-axis represents the file layout, expressed as the offset (in sectors) within the file. Each bar represents a single disk read (there is one disk read per time unit), and each bar spans the sectors of

the file that were read. The bars are overlaid across a visual representation of the sections

of the binary—each background bar represents the offset range within the file in which the

section resides.

As Figure 5.4 shows, individual disk reads tend to fall on boundaries corresponding with

binary sections. The first access always encompasses the first two sectors of the binary, as these contain the binary header and thus a mapping between the offsets in the file and the

other sections. As shown in Figure 5.4 (a) and (b), the loader tends to request the resource

(.rsrc) section after reading the header. Often this is succeeded by reads to a .rdata

section, before reading the .text section. These trends are present in many binary loading

patterns; unfortunately, as shown in Figure 5.4 (c) and (d), this is not always the case.

Nevertheless, several common traits remain. First, a disk read often does not start where the last access stopped. Also, there tends to be some overlap in the sectors read;

some sectors are read more than once. Additionally, the accesses tend to be non-uniform

in size; that is, the standard deviation of the access size is fairly large. Finally, there may

be sectors (even sectors in the middle of the binary) which are not read at all.

(a) Hydraq: Load. Hydraq malware service executable (Rasmon.dll).

(b) Darkshell: Load. Darkshell malware service executable (regedit32.exe).

(c) Dishingy.F: Load. Dishingy.F malware service executable (alg.exe).

(d) Solitaire: Load. Solitaire executable.

Figure 5.4: Visualization of disk access patterns for loading program binaries (x-axis: time, unitless; y-axis: offset in file, in sectors; background bands mark the binary's sections, e.g., Headers, .text, .rdata, .data, .rsrc, .reloc).

By contrasting Figure 5.4 with Figure 5.5, which visualizes the common disk read pattern for a program copy, it is apparent how different a program load looks from a copy of that same program. In a copy, the accesses do not correspond to section boundaries. The reads tend to occur as a linear sweep across the file, with each disk access picking up precisely where the last one left off. There is no overlap between accesses. Aside from the initial read of the header, the reads are fairly uniform in size, and the number of accesses required to copy the whole file is fairly small. Finally, there are no redundant reads: each sector is read precisely once.

Figure 5.5: Visualization of disk access patterns for copying program binaries. Panels: (a) copy of Rasmon.dll, the executable associated with the service of the Hydraq malware; (b) copy of the Solitaire game executable. Each panel plots offset in file (sector) against time, with section boundaries (headers, .text, .rdata, .data, .rsrc, and .reloc where present) marked.

These observations motivated our model of disk access patterns. Unfortunately, as is clear from even the few examples in Figures 5.4 and 5.5, the patterns vary enough that a model checking approach alone cannot distinguish loads from copies accurately. While there are trends in the differences between loads and copies, the patterns are still varied enough that if we were to specify a load using a specific, even common, pattern, we would miss the loading of other samples. However, if we generalize the model in order to avoid false negatives, as we eventually did in Section 5.1.3, we end up with a specification so generic that it actually models any file access. For that reason, we chose to bolster our file access model with a machine learning classifier that classifies a series of disk reads as belonging to either a file load or a file copy.

5.4.2 Program Binary Load Classifier

In order to classify a series of disk reads as a program binary load, as opposed to a program

copy, we chose the two-class Support Vector Machine (SVM) classifier [22]. The SVM

classifier is a supervised learning algorithm; that is, it trains on a set of labeled data,

generating a model that will then be used to classify new, unlabeled samples. The problem

can be modeled easily as a two-class classifier: our first class is a program load, the second

class is a program copy.

The Linear SVM algorithm is a simplified version of the SVM problem. In this algorithm, each data point $x_i$ is represented by a $d$-dimensional vector (where each value in the vector is a feature), and the goal of the training phase is to find a hyperplane of dimension $d - 1$ that separates the points into two classes. Any hyperplane can be written as the set of points $x$ satisfying $w \cdot x - b = 0$, where $\cdot$ is the dot product and $w$ is the normal vector to the hyperplane. The best hyperplane is the one that results in the largest margin between the two classes; the points lying on the margin are referred to as support vectors. An optimal hyperplane separating two classes of labeled data with a maximum margin is shown in Figure 5.6.
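To make the decision rule concrete, the following minimal Python sketch evaluates the score $w \cdot x - b$ for a toy two-dimensional hyperplane and computes the margin width. It assumes the usual canonical scaling in which the support vectors satisfy $w \cdot x - b = \pm 1$, so that the margin is $2 / \lVert w \rVert$; the weights and points are made-up illustrative values, not a model trained on Dione data.

```python
import math


def decision(w, b, x):
    """Signed score w . x - b; a positive score means class +1, a negative one class -1."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def margin_width(w):
    """Distance between the hyperplanes w . x - b = +1 and w . x - b = -1, i.e. 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Illustrative hyperplane x1 + x2 - 3 = 0 (w = (1, 1), b = 3) and two made-up points.
w, b = (1.0, 1.0), 3.0
for x in [(0.5, 1.0), (3.0, 2.5)]:
    score = decision(w, b, x)
    print(x, score, "+1" if score > 0 else "-1")
print(round(margin_width(w), 3))  # 2 / sqrt(2) -> 1.414
```

Training consists of choosing $w$ and $b$ so that this margin is as large as possible while the labeled points fall on their correct sides.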

Because the Linear SVM algorithm can only find a linear decision boundary, the nonlinear SVM extends it to find a nonlinear decision boundary in the original feature space. In this method, the dot product is replaced with a nonlinear kernel function, and the algorithm determines the maximum-margin hyperplane in a transformed, high-dimensional feature space; that hyperplane corresponds to a nonlinear boundary in the original space. One such kernel is the Gaussian Radial Basis Function (RBF), defined by Equation 5.5.


Figure 5.6: The optimal hyperplane of an SVM classifier on labeled data.

$$k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0 \qquad (5.5)$$
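A direct Python transcription of Equation 5.5 follows, as a minimal sketch; the vectors and $\gamma$ values are arbitrary illustrative inputs. For reference, scikit-learn's `SVC(kernel="rbf")` uses this same kernel form, with $\gamma$ exposed as its `gamma` parameter.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """Gaussian RBF kernel k(xi, xj) = exp(-gamma * ||xi - xj||^2), with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([1, 2], [1, 2], gamma=0.5))  # identical vectors -> 1.0
print(rbf_kernel([0, 0], [3, 4], gamma=0.1))  # squared distance 25, so exp(-2.5), about 0.0821
```

The kernel is 1 for identical vectors and decays toward 0 as they move apart; a larger $\gamma$ makes the decay faster, which makes the resulting decision boundary more local.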

5.4.3 SVM Classifier Implementation

We used the SVM classifier from scikit-learn, an open-source machine learning library for Python [65]. We wrote Python code to process Dione traces of disk reads and, for each series of reads to the same file, to generate a feature vector. The scikit-learn library provides several kernels for its SVM classifier; we found that the Radial Basis Function (RBF) kernel, described in Equation 5.5, worked best on our data.

We generated labeled training data from several sources on VM0. For the training data generated from load disk accesses, we created a list of binaries that were loaded at boot time (including drivers and services) and extracted their disk reads from boot logs generated by Dione. We also created a corpus of executables, including Windows executables, malware, and third-party software, and wrote a script to run each of these executables from the command line. Using the command line is important because, when a program is launched from Windows Explorer, Windows may prefetch binary data from disk in anticipation of read requests. For training data accessed via a copy, we assembled a corpus of Windows system executables, Windows drivers, and third-party applications, and again wrote a script that copies each of these files via the command line. Each trace was processed with our Python script and labeled as either load or copy.

Each file access is composed of a series of one or more disk reads, as visualized in Figures 5.4 and 5.5. After processing the traces into objects representing disk accesses, we constructed a feature vector for each file access. The feature vector comprised the following features:

1. Consecutive Accesses: We count the number of accesses that start at the next consecutive sector, that is, the sector immediately following the last sector accessed. We normalize this count by dividing it by the total number of accesses, giving the percentage of accesses that go to the next consecutive sector, and then bucketize the value into one of 10 buckets, each representing 10 percentage points.

2. Skip Amount: We calculate the average number of sectors skipped between accesses, that is, how far each access starts ahead of or behind the next consecutive sector. We normalize this value by dividing it by the total number of sectors in the file, giving the average percentage of the file that is skipped over between accesses. We then bucketize the value into one of 20 buckets, each representing 5 percentage points.

3. Average Access Size: We calculate the average size of all accesses, as a percent of

the whole file. We bucketize the value into one of 10 buckets, each representing 10

percentage points.

4. Access Size Standard Deviation: We calculate the standard deviation of the

access size over all accesses, with each access represented as a percentage of the whole


file. We bucketize the value into one of 20 buckets, each representing 5 percentage

points.

5. Percent of File Accessed: We calculate the percentage of the file that was accessed

at least once. We bucketize the value into one of 10 buckets, each representing 10

percentage points.

6. Overlapping Accesses: We count the number of accesses that read at least one sector that has already been accessed. We normalize this count by dividing it by the total number of accesses, giving the percentage of accesses that touch a previously accessed sector. We then bucketize the value into one of 20 buckets, each representing 5 percentage points.
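The six features above can be sketched in a few lines of Python. This is a hypothetical reconstruction from the descriptions, not the implementation used in this work. It assumes that each file access is given as a list of reads in the order issued, each with a starting sector relative to the start of the file and a length in sectors; that the first read has no predecessor (so it is never counted as consecutive or skipped); and that the standard deviation is the population standard deviation. The `Read` class and function names are illustrative.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Read:
    """One disk read within a file access (sector numbers relative to the file start)."""
    start: int   # first sector read
    length: int  # number of sectors read

    @property
    def end(self) -> int:
        # One past the last sector read, i.e. the "next consecutive sector".
        return self.start + self.length


def bucketize(percent: float, width: float, n_buckets: int) -> int:
    """Map a percentage onto equal-width buckets, clamping values at or above the top."""
    return min(int(percent // width), n_buckets - 1)


def feature_vector(reads: list, file_sectors: int) -> list:
    """Compute the six bucketized features for one file access."""
    if not reads:
        raise ValueError("a file access needs at least one read")
    n = len(reads)
    pairs = list(zip(reads, reads[1:]))  # (previous, current) read pairs

    # 1. Consecutive accesses: reads that begin exactly where the previous one ended.
    consecutive = sum(1 for prev, cur in pairs if cur.start == prev.end)
    f1 = bucketize(100 * consecutive / n, 10, 10)

    # 2. Skip amount: mean distance, forward or backward, from the next consecutive sector.
    skips = [abs(cur.start - prev.end) for prev, cur in pairs]
    f2 = bucketize(100 * (mean(skips) if skips else 0) / file_sectors, 5, 20)

    # 3-4. Mean and (population) standard deviation of access size, as % of the file.
    sizes = [100 * r.length / file_sectors for r in reads]
    f3 = bucketize(mean(sizes), 10, 10)
    f4 = bucketize(pstdev(sizes), 5, 20)

    # 5-6. Coverage and overlap, tracking every sector read so far.
    seen = set()
    overlapping = 0
    for r in reads:
        sectors = set(range(r.start, r.end))
        if sectors & seen:
            overlapping += 1
        seen |= sectors
    f5 = bucketize(100 * len(seen) / file_sectors, 10, 10)
    f6 = bucketize(100 * overlapping / n, 5, 20)

    return [f1, f2, f3, f4, f5, f6]


copy_like = [Read(0, 8), Read(8, 32), Read(40, 32), Read(72, 28)]
load_like = [Read(0, 2), Read(60, 20), Read(0, 4), Read(20, 30)]
print(feature_vector(copy_like, 100))  # [7, 0, 2, 1, 9, 0]
print(feature_vector(load_like, 100))  # [0, 10, 1, 2, 5, 5]
```

The two made-up accesses mirror the contrast drawn from Figures 5.4 and 5.5: the copy-like sweep scores high on consecutive accesses and coverage with no skipping or overlap, while the load-like pattern jumps around, rereads the header, and leaves part of the file unread.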

5.4.4 Results

Training and Testing

We created 918 labeled samples gathered from traces on VM0 for training and evaluation. The training data comprises 737 samples labeled copy and 179 samples labeled load. In order to evaluate our algorithm, we performed 10-fold cross-validation: we randomly partitioned the sample set into 10 subsets, used 9 of the subsets for training, and used the remaining subset for testing. We performed this test 10 times, with a different random partition each time. Each resulting label is categorized as one of the following: True Load (TL), False Load (FL), True Copy (TC), or False Copy (FC), where a False Load is a copy labeled as a load and a False Copy is a load labeled as a copy. The results of the 10 rounds of cross-validation (10CV) testing are shown in the first 10 rows of Table 5.4.
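The partitioning step can be sketched with the standard library alone. This minimal version shuffles the sample indices with a seeded random generator and deals them into k folds, so that each sample is tested exactly once per partition; calling it again with a different `seed` yields a fresh random partition, as described above. The function name and the round-robin fold assignment are illustrative choices, not the original implementation (scikit-learn offers equivalent utilities, such as `KFold`).

```python
import random


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds of one random partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Train on the other k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(kfold_splits(918, k=10, seed=1))
print([len(test) for _, test in splits])  # [92, 92, 92, 92, 92, 92, 92, 92, 91, 91]
```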

Additionally, we generated labeled traces on VM1, as specified in Section 5.3.1, to be used purely for testing. That is, we trained our classifier on all 918 samples gathered from VM0, then tested the classifier on traces from VM1. This dataset, labeled VM1, consists of 530 samples, of which 396 are labeled copy and 134 are labeled load. The goal of this experiment is to show that the classifier does not depend on the testbed on which the training traces were generated, and that it can classify new samples instrumented on other Windows XP SP3 systems. The results of the evaluation on the VM1 traces are also shown in Table 5.4, with TestID VM1.

TestID     TL     FL    TC     FC    % Mislabeled
10CV-0     12     0     78     1     1%
10CV-1     16     0     72     3     3%
10CV-2     21     1     67     2     3%
10CV-3     14     0     73     4     4%
10CV-4     18     0     72     1     1%
10CV-5     13     1     74     3     4%
10CV-6     17     0     70     4     4%
10CV-7     19     0     67     5     5%
10CV-8     12     0     73     6     7%
10CV-9     15     0     72     4     4%
10CV-Avg   15.7   0.2   71.8   3.3   3.5%
VM1        122    8     388    12    3.8%

Table 5.4: Results of the SVM classifier for 10-fold cross-validation (10CV) on the Testbed 0 dataset and for the VM1 dataset.

As Table 5.4 shows, an average of 3.5% of samples were mislabeled in the rounds of 10-fold cross-validation, and 3.8% of samples were mislabeled in the VM1 dataset, with mislabels occurring predominantly as loads mislabeled as copies.
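The counts in Table 5.4 can be tallied mechanically. In the minimal sketch below, a False Load is a copy labeled as a load and a False Copy is a load labeled as a copy, which is consistent with the VM1 row (TL + FC = 134 loads, TC + FL = 396 copies); the function names are illustrative. Summing FL + FC over the ten cross-validation rows and the VM1 row gives the 55 mislabeled samples examined in the following paragraphs.

```python
from collections import Counter

# (true label, predicted label) -> cell of Table 5.4
CELLS = {("load", "load"): "TL", ("copy", "load"): "FL",
         ("copy", "copy"): "TC", ("load", "copy"): "FC"}


def confusion(y_true, y_pred):
    """Tally TL, FL, TC and FC for parallel lists of 'load'/'copy' labels."""
    counts = Counter(CELLS[(t, p)] for t, p in zip(y_true, y_pred))
    return {cell: counts[cell] for cell in ("TL", "FL", "TC", "FC")}


def mislabel_pct(c):
    """Percentage of samples whose predicted label differs from the true label."""
    return 100 * (c["FL"] + c["FC"]) / sum(c.values())


vm1 = {"TL": 122, "FL": 8, "TC": 388, "FC": 12}  # VM1 row of Table 5.4
print(round(mislabel_pct(vm1), 1))  # 20 / 530 -> 3.8

# (FL, FC) for rows 10CV-0 through 10CV-9 of Table 5.4
cv_errors = [(0, 1), (0, 3), (1, 2), (0, 4), (0, 1), (1, 3), (0, 4), (0, 5), (0, 6), (0, 4)]
total = sum(fl + fc for fl, fc in cv_errors) + vm1["FL"] + vm1["FC"]
print(total)  # 55
```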

Across both tests, 55 samples were mislabeled. For the samples in which loads were mislabeled as copies, the reasons behind the mislabel fit what we would intuitively expect. The three causes of mislabeling are:

1. Single or Double Read: This is by far the most common source of mislabeled samples: 80% (44/55) of the mislabeled samples had an access pattern consisting of only one (33) or two (11) disk reads. Intuitively, one or two reads do not provide enough information to reliably classify an access pattern.

2. Large Files: The second category of mislabeled samples consisted of very large files, which accounted for 7.3% (4/55) of the mislabeled samples. Large files require so many disk accesses that, even with some non-sequential and overlapping reads at the beginning of the file access, the loader eventually resorts to many uniform sequential reads as it reads in the bulk of the program binary. As a result, the access as a whole looks more like a copy: among other things, the percentage of non-sequential and overlapping accesses approaches zero, and the standard deviation of the access size is small. This effect is visualized in Figure 5.7.

3. Windows Drivers: The last category of mislabeled samples is Windows drivers, which accounted for 9.1% (5/55) of the mislabeled samples. Each of these samples had between three and five disk reads, and yet each read was sequential through the file, roughly uniform in size, did not overlap, and did not respect section boundaries. The drivers presenting this problem have a Start value of 0x1, or SERVICE_SYSTEM_START. These driver services, which are set to load at system start, are loaded by the I/O Manager in ntoskrnl.exe. By contrast, services and drivers with a Start value of SERVICE_AUTO_START or SERVICE_DEMAND_START are loaded later in the boot process by a different loader, in services.exe. Unfortunately, the loader in ntoskrnl.exe appears to load files by grabbing sequential blocks of data from disk, so the load looks exactly like a common copy operation. Thus it would be very difficult for any classifier to correctly label this as a load. An example of this case, the loading of driver MRxSmb, is visualized in Figure 5.8.

Given the difficulty of identifying the file access type from a pattern of only one or two accesses, or when the loader itself uses an algorithm that resembles a standard file copy, we believe that even manual labeling by an expert analyst would result in a similar number of mislabeled accesses.

Figure 5.7: Visualization of disk access patterns for loading program binaries. Panels: (a) file load pattern for malware sample MD5 0x809705b2f15f33193fb29d204efd7736; (b) file load pattern for malware sample MD5 0x815350b4f362b7fa0fd192b5a173ce5f. Each panel plots offset in file (sector) against time.

Figure 5.8: Visualization of disk access patterns for Windows driver mrxsmb.sys (panel title "MRxSmb: Load"). The plot shows offset in file (sector) against time, with section boundaries marked.

Evaluating Malware Service Loads

Finally, having demonstrated that it is possible to classify a file access based on its disk read pattern, we set out to demonstrate that DCL can label malware that uses the service persistence mechanism to start automatically on system boot. We accomplished this by bolstering our model for service load with our SVM classifier, with the goal of showing that the service does indeed load at the start of system boot.

We trained the classifier using the same training data discussed in Section 5.4.4, and then

integrated the SVM classifier with DCL. After the model checker has determined whether

a service has been installed, which binary is associated with the service, and whether that

binary has been accessed during boot, the classifier then attempts to label that file access

as a load. This procedure was used on all samples from VM0 and VM1.

Ideally, the classifier would detect every access as a load, since none of these malware samples use the copy mechanism (though one could conceive of a malware sample that performs a file copy/delete operation in order to move its malicious payload around). In actuality, there were 28 samples on VM0 and 27 samples on VM1 for which the service binary was labeled as accessed but not labeled as a binary load. However, the two testbeds had many samples in common, and it turned out that the same 27 samples were mislabeled in both datasets (with one additional mislabel present in VM0 but not VM1). In summary, the load classifier mislabeled 28 distinct samples.

For VM0, 124/152 samples were correctly labeled a load, and for VM1, 88/115 were correctly labeled a load. Of the 28 mislabeled samples, 19 were the same variant (Koutodoor), and 3 others were another variant (Sality). So, more precisely, only 8 malware variants were incorrectly labeled.
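The per-testbed results above reduce to simple proportions. This short sketch recomputes them from the reported counts; the pooled figure simply adds the traces from both testbeds together, so the 27 samples common to both are counted once per testbed. The helper name is illustrative.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# (correctly labeled as load, samples evaluated) per testbed
results = {"VM0": (124, 152), "VM1": (88, 115)}

for testbed, (loads, evaluated) in results.items():
    missed = evaluated - loads
    print(testbed, missed, round(pct(loads, evaluated), 1))
# VM0 28 81.6
# VM1 27 76.5

# Pooling the traces from both testbeds
pooled = pct(sum(ld for ld, _ in results.values()), sum(ev for _, ev in results.values()))
print(round(pooled, 1))  # 212 / 267 -> 79.4
```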

As with the previously mislabeled samples, there were two causes of mislabeling. Of the 28 mislabeled samples, 8 were mislabeled because of the Single Read problem described in Section 5.4.4. The other 20 samples were mislabeled because of a combination of the Double Read and Windows Driver problems. Of these 20 samples, 19 were the malware variant Koutodoor. Each of these 19 Koutodoor samples had a file access pattern consisting of only two or three reads; these two access patterns are visualized in Figure 5.9. Having so few reads would already present a challenge to the classifier, but the problem is made even harder because the file being loaded is a driver, and, as discussed above, Windows driver loads can follow the pattern of a file copy rather than a file load. Again, we posit that an expert would be unlikely to perform better, given the disk access patterns present in the mislabeled samples.

Figure 5.9: Visualization of the two observed disk access patterns of the load of the driver associated with malware Koutodoor. Panels: (a) the 2-read disk load pattern for a Koutodoor sample; (b) the 3-read disk load pattern for a Koutodoor sample. Each panel plots offset in file (sector) against time.

The remaining sample, called Hupigon, is also installed as a driver. Its file access consists

of only two reads, which are both sequential and non-overlapping, as shown in Figure 5.10.

Although each of these samples is mislabeled by the SVM classification alone, the algorithm could be modified to automatically label as loads both file accesses that consist of a single read and drivers with a Start value of 0x1 (SERVICE_SYSTEM_START), since this information is already available to DCL when performing the classification. This would eliminate these false negatives at the risk of future false positives.
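The proposed modification amounts to a small rule applied after the SVM. The sketch below is hypothetical: the function name and argument layout are illustrative and not part of DCL. The numeric Start values are the standard Windows service start types stored under each service's registry key.

```python
from enum import IntEnum


class ServiceStart(IntEnum):
    """Standard Windows service Start values (Services\\<name>\\Start in the registry)."""
    BOOT_START = 0x0
    SYSTEM_START = 0x1
    AUTO_START = 0x2
    DEMAND_START = 0x3
    DISABLED = 0x4


def label_service_access(num_reads, start_value, svm_label):
    """Override the SVM label in the two failure cases identified above."""
    if num_reads == 1:
        # A single read carries too little information for the classifier.
        return "load"
    if start_value == ServiceStart.SYSTEM_START:
        # ntoskrnl.exe reads system-start drivers sequentially, so they resemble copies.
        return "load"
    return svm_label


print(label_service_access(2, 0x1, "copy"))  # load
print(label_service_access(4, 0x2, "copy"))  # copy
```

Because the rule only ever turns a copy into a load, it can remove false negatives but can also introduce false positives, for example if a malware sample genuinely copies a system-start driver's binary.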

Figure 5.10: Visualization of the disk access pattern for the driver file associated with malware Hupigon. The plot shows offset in file (sector) against time.

Chapter 6

Directions for Future Work

This dissertation introduces the concept of capability labeling for persistence mechanisms.

There are many directions in which this work could continue. First, many other persistence mechanisms could be modeled from Dione logs. Additional persistence capabilities to model include, but are not limited to, auto-start locations, DLL search-order hijacking, and slack-space detection.

Auto-start locations are used by Windows to automatically load binaries on system start; the Windows service mechanism can be considered one type of auto-start location. In addition to the service mechanism, there are dozens of registry keys that store the paths of binaries that should be loaded on system boot. By modeling the creation of these specific registry keys, a matching file creation, the system reboot, and the post-boot loading of the relevant binary, this persistence capability could be labeled in malware samples. Another auto-start technique involves simply storing the binary, or a soft link to the binary, in certain directories; on startup, Windows loads any binaries in these directories. A model to detect this capability would consist of a file creation or move into one of these directories, followed by a reboot and a successful load.
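As a rough illustration of the event ordering such a model would check, the sketch below scans a trace for binaries whose path is written to an auto-start registry key and whose file is created (in either order), followed by a reboot and then a load of that binary. It is a simple sequence matcher, not the LTPL model checker used by DCL. The `Event` record, its `kind` strings, and the example binary path are hypothetical stand-ins for Dione's actual event format; the Run key is one real auto-start location.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str       # hypothetical kinds: "reg_set", "file_create", "reboot", "load"
    path: str = ""  # binary path (for reg_set, the value data pointing at the binary)
    key: str = ""   # registry key, used only by "reg_set"


def autostart_loads(trace, autostart_keys):
    """Return binaries registered in an auto-start key and created, then loaded after a reboot."""
    keys = {k.lower() for k in autostart_keys}
    registered, created, armed, hits = set(), set(), set(), []
    for e in trace:
        path = e.path.lower()  # Windows paths are case-insensitive
        if e.kind == "reg_set" and e.key.lower() in keys:
            registered.add(path)
        elif e.kind == "file_create":
            created.add(path)
        elif e.kind == "reboot":
            # Only binaries both registered and present before the reboot can auto-start.
            armed |= registered & created
        elif e.kind == "load" and path in armed and path not in hits:
            hits.append(path)
    return hits


RUN = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run"
trace = [
    Event("file_create", r"C:\Windows\evil.exe"),
    Event("reg_set", r"C:\Windows\evil.exe", RUN),
    Event("reboot"),
    Event("load", r"C:\Windows\evil.exe"),
]
print(autostart_loads(trace, [RUN]))  # ['c:\\windows\\evil.exe']
```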

DLL search-order hijacking is another technique malware writers use to ensure that their code gets loaded, though it may occur at any time, not necessarily at system boot. The problem arises because many programs do not specify absolute paths for the dynamic libraries they depend on. When a program starts, Windows searches for each library in a predefined order: for example, it first looks in the directory from which the application was loaded, then in a series of predefined system directories. If malware installs a file with the name of the expected DLL in a directory that appears earlier in the search order than the directory containing the legitimate DLL, Windows will load the malicious DLL instead. A model for DLL search-order hijacking could be developed based on the names and absolute paths of files created or moved to a new location.
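One possible core check for such a model is sketched below: given a newly created file, decide whether it would shadow a DLL of the same name that currently sits in a directory later in the search order. The search order passed in, the set of existing files, and the function names are illustrative assumptions, and the sketch ignores Windows special cases such as the KnownDLLs list.

```python
from pathlib import PureWindowsPath
from typing import Iterable, Optional


def _norm(path) -> str:
    # Windows paths are case-insensitive; normalize separators and case for comparison.
    return str(PureWindowsPath(path)).lower()


def shadowed_dll(created: str, search_order: Iterable[str], existing: Iterable[str]) -> Optional[str]:
    """Return the DLL that `created` would shadow, or None if it shadows nothing."""
    dirs = [_norm(d) for d in search_order]
    files = {_norm(f) for f in existing}
    new = PureWindowsPath(created)
    parent = _norm(new.parent)
    if parent not in dirs:
        return None  # not on the search path at all
    # Any same-named DLL in a directory searched after this one would be shadowed.
    for d in dirs[dirs.index(parent) + 1:]:
        candidate = _norm(PureWindowsPath(d) / new.name)
        if candidate in files:
            return candidate
    return None


order = [r"C:\Apps\Game", r"C:\Windows\system32", r"C:\Windows"]
system = [r"C:\Windows\system32\version.dll"]
print(shadowed_dll(r"C:\Apps\Game\version.dll", order, system))  # c:\windows\system32\version.dll
print(shadowed_dll(r"C:\Temp\version.dll", order, system))       # None
```

In a full model, a hit from this check would be one state in the specification, to be followed by evidence that the shadowing DLL was actually loaded.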

Another area ripe for modeling is detecting slack-space writes. Malware often writes its malicious code into unused sectors of the disk; some samples even build entire file systems to store their malicious code. This code is then loaded by another mechanism (for example, the malware may modify the Master Boot Record, forcing it to load the code from the slack-space locations on disk). Modeling this persistence capability would first require extending Dione to detect reads and writes to slack space. The specification would then model the write to slack space, likely a mechanism to load the code from slack space (such as a write to the MBR), and finally a reboot and the loading of code from slack-space sectors.
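Once Dione knows which sectors are allocated (on NTFS, cluster allocation is recorded in the `$Bitmap` metadata file), flagging a slack-space write reduces to an interval lookup. The sketch below assumes that the allocated regions are supplied as sorted, non-overlapping (start, end) sector extents with exclusive ends; the function name and extent representation are illustrative. It treats any unallocated sector as slack, which is broader than file slack in the strict sense.

```python
import bisect


def touches_unallocated(start, count, extents):
    """True if any sector in [start, start + count) lies outside every allocated extent.

    `extents` is a sorted list of non-overlapping (first_sector, end_sector) pairs,
    where end_sector is exclusive.
    """
    starts = [first for first, _ in extents]
    pos, end = start, start + count
    while pos < end:
        # Rightmost extent that begins at or before `pos`.
        i = bisect.bisect_right(starts, pos) - 1
        if i < 0 or pos >= extents[i][1]:
            return True  # sector `pos` is not covered by any allocated extent
        pos = extents[i][1]  # jump to the end of the covering extent
    return False


allocated = [(0, 100), (200, 300)]
print(touches_unallocated(50, 20, allocated))  # False: sectors 50-69 are allocated
print(touches_unallocated(90, 20, allocated))  # True: sectors 100-109 are unallocated
```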

Another avenue of future research is to develop models for these same persistence capabilities on other Windows systems. Our models were verified on, and worked very well for, 32-bit Windows XP SP3. Given that Windows XP will be phased out in favor of newer versions (e.g., Windows 7, Windows 8), it makes sense to evaluate and, if necessary, modify the models to detect persistence capabilities of malware running on those operating systems as well. Additionally, the models could be evaluated on 64-bit systems, to determine whether they behave differently there.

Chapter 7

Thesis Summary and

Contributions

In this work, we described persistence capabilities and how malware uses them not only to

persist on disk, but to automatically start once the system boots. We argued that labeling

malware samples by the capabilities they possess, rather than by their family or variant,

is useful in providing valuable information for developing a strategy for removing malware

and preventing its spread to other systems, and also provides a tangible benefit to security

researchers in need of a labeled corpus of malware samples.

The primary goal of this dissertation was to demonstrate the effectiveness of persistence

capability labeling using high-integrity traces of file system and registry events. To that

end, the contributions of this dissertation are as follows:

• We developed and implemented Dione, a disk I/O monitoring and analysis infras-

tructure that provides descriptive, high-integrity traces of file system and registry

activity for a system-under-analysis (SUA). We discussed the challenges of develop-

ing an interposer that would convert raw, low-level metadata to high-level file system

and registry operations. We discussed the particular challenges of instrumenting


NTFS, the notoriously complex closed-source file system used by modern Microsoft

Windows computing systems, and we explained how we bridged the semantic and

temporal gaps to result in descriptive file system traces.

• We evaluated both the performance and the accuracy of Dione, demonstrating that

Dione provides 100% accuracy in reconstructing file system operations. Despite this

powerful instrumentation capability, Dione has a minimal effect on the performance

of the system. For most tests, Dione results in a performance penalty of less than

10%—in many cases less than 2%—even when processing complex sequences of file

system operations.

• We implemented DCL: The Dione Capability Labeler. We modeled the three phases

needed to demonstrate persistence: Installation, system boot, and binary loading. We

chose the Windows service mechanism as our persistence capability to model, due to

its common use, its dangerous side effects, and its complexity to model. Using domain

knowledge, we generated specifications for the install of a Windows service, a reboot,

and a file access, and expressed these specifications using Linear Temporal Predicate

Logic (LTPL). We implemented a behavioral model checker, which extracted events

from a Dione trace and compared them to the states of the modeled specification.

• We evaluated our model checker on over one thousand real-world malware samples,

and found that it correctly applied the service installing label to over 99% of the

traces, and the service loading label to over 98% of the traces. And yet, even when

DCL mislabeled a sample, it yielded new and interesting insight into the stealth

behaviors employed by malware.

• Understanding that our model for a program load was too general, we supported it with a Support Vector Machine (SVM) classifier. We visualized disk access patterns and showed that disk reads present different patterns depending on whether the file access is a binary load or a program copy. We used this information to generate feature vectors for our SVM classifier. We evaluated the classifier using 10-fold cross-validation on labeled samples generated from one testbed, then tested it on data generated from an entirely different testbed. In both cases, fewer than 4% of samples were mislabeled. We applied the same classifier to our malware corpus and showed that it correctly labeled 79% of the file accesses as loads. We showed that most of the mislabels were due to one particular variant of malware that appeared multiple times. Finally, we discussed the causes behind all mislabeled samples in all three tests, and explained why even an expert analyst performing manual analysis of the traces would be unlikely to achieve a lower error rate.

Chapter 8

Appendix

8.1 Tables

Table 8.1: Traces gathered from VM0 labeled by Dione as service installing; that is,

they install at least one service.

MD5 Service Name(s) Loaded

08ced09a00dd0940fde58c06aebc7ce1 6to4 1

0cc18acc6d1d65d638b1fa3842761cd5 servernabs4 1

10b155861d8db8bb4f5974b1221207c2 wmicucltsvc 1

0b886590d1a62ffd93583993971d22c8 nwcworkstation 0

a43b0d7a6cf8bd85beebffd85cc56740 winhelp32 1

0038ee2524f8bc7cb329e01cad411f0f forter 0

0e0555bafe4fd3c04dab4ac94c65c602 npf 1

a2769b11fb509d5d136f5b0d8f1765d4 kabsctch28278241470203 1

kapfa 1


a220f4d07d56e2ef6b9dfccd1cd20543 npf 1

089c2785dc08ae217bd0b6f796c10551 mswindows 1

a5e1533b7c58a1b66cb5579c95a3d3c8 mswindows 1

a2d1fb9c7ae9442635ec1c09a8ce72e2 asc3360pr 0

0a852ff18a07539b18f3bf0e50577d66 npf 1

a20d082368334bda4e9724bf13d22002 261d7905 1

069c3ee1c2251f36633b24312fbab119 lxrqlvb 0

jmtst 1

11a5b416c137601753cef2af6e0e81f3 npf 1

0a417e05aacdff2d6f18670e2cf465a1 wmicucltsvc 1

006cd94a0d2b6506f924d7062c7f7b19 lxrqlvb 0

jmtst 1

09040735f1fc5acc8805583d869ddf49 npf 1

153623fcff9d5e57a098d0ce09637d1c npf 1

012d45a4d6fae317ea11bab576bf8633

eywqojgbztrljdbw 0

gbytqljd 0

nthook 0

0a5c8fab8537fe4804c6d485307e1064 sshnas 1

0b6b39890e1b60a4a4b134277431384d nwcworkstation 0

11c8b2d430cf08612be40015e5986775 lxrqlvb 0

jmtst 1

a11dd7c18905e40551123d7ae2bfece6 serahost 1

1500fe465ce684b153b34d771a1d48e4 dnservice 1





0c8ac3a3c592409b34108ece2c37cd3f npf 1

006f8e5ccbee29a2cd5dab8a43f8a496 lxrqlvb 0

jmtst 1

136e57e2213cc8d9a614f3daabf64c34 rpclookup 1

0813d5fa325caa7cd932b4bd1ddec3b8 npf 1

0138a435b6d6eae4429afe3cc84a0cb5

wuomgezwrojhbztr 0

nthook 0

trljdbvt 0

3a3d624f78c306b200ff4e05247cf66e

eywqojgbztrljdbw 0

nthook 0

trljdbvt 0

12502c798a1f2c86bce55f5662befe30 6to4 1

1167ad095a5167613db6f1772d78e0db ndisfileservices32 1

0925c10addf72ab0aa79ddf7b0e3da16 npf 1

11080175b453b16a79ffad80f1463d44 svkp 1

adsl 1

07095643ce2e77d0cf95e3da4a3b0fad lxrqlvb 0

jmtst 1

0bc17bd9bea675932a93cff5c81b92fd cdralw 0

0e8403a5ba76118e898a4ff77087f862 e3046863 1

a12c759a 1

3a28f6dd414827004b3f91d9c1e17ced

eywqojgbztrljdbw 0

nthook 0





olgeywqo 0

a30a8c8a9d73ae58d2188d32293445ac npf 1

07f98c9f11744ef73c991cf320eecc35 npf 0

09762d12f1cc882436990f4188308f7c npf 1

0b3e46bb919dac00845a8b6941de11ce 6to4 1

156160432e696b0008db0f954544deb8 system information n321 1

a19389ca75ca11d72442662e43c18541 npf 1

0061d7b4c7db34437695853252a82474 wowsub 0

000fcab138a015cc63b6ad84fb6c0f67 npf 1

0bfea159ebff886a381158dc5ec2b841 17872078 1

a19f08b61c0330da67e8ca1bfb6859a0 nvmini 1

0b65ee8ded5c9fd4306efa11324a7105 network adapter events 1

072b7a011d60593e647eb885ee76316a npf 1

a1d1479ca0a4ae87d7c381538210563e mswindows 1

02fd70e0c08aac516b545d790a770846 nwcworkstation 1

0c623f30e5d60b938afc05c7572eee83 npf 1

13b4fd1595decd18fd0029749f6b0635 npf 1

12515cf32218016a3010805241440dd8 svrwsc 1

06af40767d2e2b63dd40333fed474aa0 mswindows 1

0aa70c1dea42cf425e298d5a71553c17 npf 1

3b452fd0cefabc61d8af7de26f432f6b

bztrljebwuomgeyw 0

nthook 0

olgeywqo 0





0c7a64bc4b7371bd5e428fc797e46ab1 ndisfileservices32 0

016e17c8670eed81d03119d968905057 lxrqlvb 0

jmtst 1

12608baf20f111a91c473cbc57fae9ad bord 007 1

0fecf8dae7b58c012e8d0ed816f186c1 mswindows 0

13757d8fd8ee42a20b21f5bd8e56ee82 mswindows 1

01ab74292433b59b8edbfdba8ba51f17 npf 1

09285219e6f361f89ecd25abec02a4d0 osevent 1

0f89ef3397417ba569336a6bbe9a3ba9 npf 1

14fe7fa2c3cc308087b4b09b2ea05751 allowstop 1

a62d37e832fb9b291d8385d043bae510 03f4745e 1

a52a4ee6f29cf99cccb19b7517553ea9 6e4779c5 1

0072c5497f7eae033ee9934492f17180 abp470n5 0

06f9added57c0987bdea3b9196393cff uu0tjp9k 1

0f068f1d2b014b217773ffaaf79abec2 amsint32 0

a01135e7260cf8397bc2f4bb9b8210e4 npf 1

0f4a96ba2ec5ee4cf6ea456dece4eee7 lfdl 0

102306eda101676228af65e6c1a0c8f5 lxrqlvb 0

jmtst 1

106c89ac834064c6957f2c8b97777355

ljebwtomgeywrojg 0

dbvtolge 0

nthook 0

a46ed9d45e9fff42cc4a817d42c9719d npf 1





a63a40a09137e06fc2ae40d40ca22f00 npf 1

0ed50455c7ddece3b8989ff5f02dc442 abp470n5 0

0093c98d8f04d3bc60041783dc63e221 cdralw 0

ndisfileservices32 1

a0fc9b78a843e90c777123146dcd921b npf 1

09eb5b25f156bb0df7873d7615249212 lxrqlvb 0

jmtst 1

107bab0f05ce3ceba6d1a05f062e20d6 ndisfileservices32 1

06e5be21c9f649e24d37969bf15c367c driver 0

077065b0da8266eac9aae4715fe70245 6to4 1

008fed18ab661bfcd26f422c26f1789e rpcremote 1

Table 8.2: Traces gathered from VM1 labeled by Dione as service installing; that is,

they install at least one service.


0ada410bc95d8d3dbe2e143f01edd617 npf 1

0061d7b4c7db34437695853252a82474 wowsub 0

11a5b416c137601753cef2af6e0e81f3 npf 1

111428cf955378b63f7b593f9fc80833 npf 1

025d4c909c5e6b83c011fc331f62f0c8 svschost.dll 1

0925c10addf72ab0aa79ddf7b0e3da16 npf 1





0903221e156b71bb50e80990d3fa5abc npf 1

0283667e42491442a6cba01187b3db3a xkqef 0

tig 1

13a9d4896dd289a72750d6ee847d9356 npf 1

3ba8960f956cd77a6566d943a175bc8e

ezwrpjhbztrmjecw 0

nthook 0

wrojhbzt 0

0a053b10ea328aae5a9a12e464f1b4ab npf 1

136e57e2213cc8d9a614f3daabf64c34 rpclookup 1

0bc17bd9bea675932a93cff5c81b92fd cdralw 0

1148bb3fc00f3f62523e798fd7e3e055 npf 1

09bdb377e700d0d50bae68ee528f561a 1c10007c 1

069c3ee1c2251f36633b24312fbab119 xkqef 0

tig 1

3b79532a38ebb1c0de2de639a9e1398f microsoft device manager 1

0c3d0d2e90dc3531cfcffcaca6347e05 6to4 1

0f29b8855a64c171c4ffdc2612c91f9e class file redirector discovery 1

012d45a4d6fae317ea11bab576bf8633

urmkecwuomhezxrp 0

ljebwuom 0

nthook 0

3a28f6dd414827004b3f91d9c1e17ced

ecwuomhezwrpjhbz 0

mgeywroj 0

nthook 0





0b6b39890e1b60a4a4b134277431384d nwcworkstation 0

0b886590d1a62ffd93583993971d22c8 nwcworkstation 0

09040735f1fc5acc8805583d869ddf49 npf 1

0bfea159ebff886a381158dc5ec2b841 17872078 1

02d275b6110444732f9bef39218d1997 amsint32 0

0cf7a24db3a4d3abc10299d76038cab6 oyglqecx 1

07b8c7c08ca21865ad6d1cfbd1fc37a6 ndisfileservices32 1

001cd9f69812b1f164c2a463055a7aca mswindows 1

06f9added57c0987bdea3b9196393cff i6rfdkoe 1

07095643ce2e77d0cf95e3da4a3b0fad uwnoyab 0

gngf 1

02fd70e0c08aac516b545d790a770846 nwcworkstation 1

0ecd7673fbecf130c7063de126b344b6 mswindows 1

0cc18acc6d1d65d638b1fa3842761cd5 servernabs4 1

0aa70c1dea42cf425e298d5a71553c17 npf 1

0093c98d8f04d3bc60041783dc63e221 cdralw 0

ndisfileservices32 1

09285219e6f361f89ecd25abec02a4d0 osevent 1

0d2970588384bed4bdb1221003b0a45a amsint32 0

0f068f1d2b014b217773ffaaf79abec2 amsint32 0

3a3d624f78c306b200ff4e05247cf66e

ezxrpjhbzurmkecw 0

nthook 0

wrojhbzt 0





113579812e50bb160088d9aa30721feb uwnoyab 0

gngf 1

153623fcff9d5e57a098d0ce09637d1c npf 1

027bc198706f0f0f3fde0f20b29e3a72 uwnoyab 0

gngf 1

06af40767d2e2b63dd40333fed474aa0 mswindows 0

09eb5b25f156bb0df7873d7615249212 uwnoyab 0

gngf 1

13bfc2e87e9aac39d2b91ef0f719a50b xkqef 0

tig 1

10b0f432a915e9ccda134d3be14ab9d5 npf 1

02fe132fbd9657a60e9bca1b5a3fe747 aic32p 0

09f6f4c52870deebb4ead267b7a90329 42e44980 1

0f5adb96c2a975648667451a50c13f28 amsint32 0

0eb98691c031f995c054375a1ebf89b7 amsint32 0

0cf4ded67ff076c840fff0b30ed4a423 npf 1

0e29e9f12006fc7534b0e10c66d49005 abp470n5 1

mcidrv 2600 6 0 1

00abea875f3260d4430b717062d31258 6to4 0

023b1621a8945f43eaf0c320120cde3c zbsvc 1

06e5be21c9f649e24d37969bf15c367c driver 0

13b83d9fb880bbd3f63ec18613e41e3a npf 1

156160432e696b0008db0f954544deb8 system information n321 1





12608baf20f111a91c473cbc57fae9ad bord 007 1

1164c0b138b04ad5cb28f8eca4c12098 npf 1

000fcab138a015cc63b6ad84fb6c0f67 npf 1

017013adb36bd021ddcd651aa54de1af 6to4 1

0714d23d108c582444d812525b53d131 uwnoyab 0

gngf 1

0f89ef3397417ba569336a6bbe9a3ba9 npf 1

06822e5d8d318fa244e354e668d9d394 6to4 1

016e17c8670eed81d03119d968905057 uwnoyab 0

gngf 1

08ced09a00dd0940fde58c06aebc7ce1 6to4 1

13757d8fd8ee42a20b21f5bd8e56ee82 mswindows 1

102306eda101676228af65e6c1a0c8f5 xkqef 0

tig 1

1167ad095a5167613db6f1772d78e0db ndisfileservices32 1

002c1e1520db09cfa07a1adf43bf3dc2 uwnoyab 0

gngf 1

0b3e46bb919dac00845a8b6941de11ce 6to4 1

0c623f30e5d60b938afc05c7572eee83 npf 1

0cf08dd774107ee5bd49ab71dc7b5a1f npf 1

0bb4c544c6cf1d758fc26816c2856c95 mswindows 1

14fe7fa2c3cc308087b4b09b2ea05751 allowstop 1

07f547673f8be16535625ba1e076b765 nwcworkstation 1


3b452fd0cefabc61d8af7de26f432f6b

hbztrmjecwuomgez 0

jgbztrlj 0

nthook 0

13b4fd1595decd18fd0029749f6b0635 npf 1

107bab0f05ce3ceba6d1a05f062e20d6 ndisfileservices32 1

09762d12f1cc882436990f4188308f7c npf 1

01fb28e02f21e99ec1102b73a5f89874

bzurmkecwuomhezx 0

ljebwuom 0

nthook 0

080eee3fea8212fd8db2709c574171fe npf 1

112eb493b0e7699dad5e13cad88138b5 vryhsoftebosew 0

0b65ee8ded5c9fd4306efa11324a7105 network adapter events 1

109a5a7f19a9531cba90bfaed04de250 npf 1

071069c80b36d8ac51ebaea70a568a40 oreans32 1

windowsinfo 0

0e0555bafe4fd3c04dab4ac94c65c602 npf 1

01e1e1b626693032d5c8fed4df5e4c09 npf 1

006cd94a0d2b6506f924d7062c7f7b19 xkqef 0

tig 1

0ed50455c7ddece3b8989ff5f02dc442 abp470n5 0

072b7a011d60593e647eb885ee76316a npf 1

12502c798a1f2c86bce55f5662befe30 6to4 1

0197632dfa58d9f60fe97f120536a751 npf 1


06e4d5f5bc1edf5b791c8ea28deeffe3 uwnoyab 0

gngf 1

0bc7b598af22bd4c7a496b25811dc362 amsint32 0

0e8403a5ba76118e898a4ff77087f862 e3046863 1

a12c759a 1

016b4b33d080858e4bf043b997bcfbc4 xkqef 0

tig 1

0c7a64bc4b7371bd5e428fc797e46ab1 ndisfileservices32 0

073cb375c3ed2bb46af89f587627f0e3 micorsoft windows service 0

0f36825bea6bf4967403dd9dd5f10a11 amsint32 0

0cf4108444b6c3eaa475fcd3c10c0db5 npf 1

0169402bf2554abf52528a703a4461bc npf 1

01cf51be1a4bacb550c35e165d4453d4 amsint32 0

0d4a8a75f6d260f75aaa1fefbe65eb3d npf 1

14719bcf0d5ea15702283da32c57b34e ntptdb 1

14413b24225013394dca3564c7974e6d npf 1

09519d7988057272c196cf34c1d0cba7 npf 1

0fecf8dae7b58c012e8d0ed816f186c1 mswindows 1

1160efbf0de0792c08f99d46883a19ca npf 1

07f98c9f11744ef73c991cf320eecc35 npf 0

0a852ff18a07539b18f3bf0e50577d66 npf 1

11c8b2d430cf08612be40015e5986775 uwnoyab 0

gngf 1


090eb88b5a44bb1be4f68a28c04accc9 nwcworkstation 1

0b7fd47a3fe835d6d464a2b429929be5 npf 1

11100da9b9fa6b383671ff8af31d6603 npf 1

dc3fdfde66fffb6cfbec946a237787d8 sysmgr 1

1035cb9f322146a15346716fb68c42a4 kaseyaagent 1

kapfa 1

0138a435b6d6eae4429afe3cc84a0cb5

xrpjhbzurmkecwuo 0

nthook 0

rojhbztr 0

01a08b10703e21f150f17e01828f7119 6to4 1

0685b2bf04e60c65be9bd1f667c07c4a amsint32 0

0de2ef8dc54647b0bc3c2d767a2909ba npf 1

078899307e12aa35227c9d9a465bbf91 npf 0

089c2785dc08ae217bd0b6f796c10551 mswindows 1

10c7b70b011f4151b8725c720b178875 mswindows 1

0813d5fa325caa7cd932b4bd1ddec3b8 npf 1

0270b04226c583414450aea969c5a937 uwnoyab 0

gngf 1

096a915fd3803433b07734b6c13dcf6a kaseyaagent 1

kapfa 1

01ab74292433b59b8edbfdba8ba51f17 npf 1

006f8e5ccbee29a2cd5dab8a43f8a496 uwnoyab 0

gngf 1


0038ee2524f8bc7cb329e01cad411f0f forter 0

0af1b4ebedea4bcdeebca05053e64882 network confg system 1

0c8ac3a3c592409b34108ece2c37cd3f npf 1

1500fe465ce684b153b34d771a1d48e4 dnservice 1

077065b0da8266eac9aae4715fe70245 6to4 1

08cdc80a346508e6d57efe4a782a9531 pssdk21 0

09889986f3605b9bbf5acca56637c238 uwnoyab 0

gngf 1

11080175b453b16a79ffad80f1463d44 svkp 1

adsl 1

0e128dd0e16b50431afb51c1d55d84c1 registry system service 1

0e9f0635c60f7225d10397cdd7d22e4d svrwsc 1

0072c5497f7eae033ee9934492f17180 abp470n5 0

12515cf32218016a3010805241440dd8 svrwsc 1

0f4a96ba2ec5ee4cf6ea456dece4eee7 lfdl 0

02936b913d1688fee664ea02c09bcc03 nwcworkstation 1

106c89ac834064c6957f2c8b97777355

zwrpjhbztrmjecwu 0

nthook 0

ojhbztrm 0

0a5c8fab8537fe4804c6d485307e1064 sshnas 1

026896a7449afcdac48323afcd71d3c0 amsint32 0

025778f4315812baadef07fa35b1a443 uwnoyab 0

gngf 1


11d7b18173a7f131b5f59b92c2d985ea npf 1

12ada88d49498a43cf4b9274f3fb586c amsint32 0

008fed18ab661bfcd26f422c26f1789e rpcremote 1

09a20d0147e3b2fd6ac712e6f88496c5 winxzrssf 1

Bibliography

[1] Anubis: Analyzing Unknown Binaries. http://anubis.iseclab.org. Accessed on September 1, 2013.

[2] Apel, M., Bockermann, C., and Meier, M. Measuring similarity of malware behavior. In Local Computer Networks (LCN) (2009).

[3] Azmandian, F., Moffie, M., Alshawabkeh, M., Dy, J., Aslam, J., and Kaeli, D. Virtual machine monitor-based lightweight intrusion detection. SIGOPS Operating Systems Review 45 (July 2011).

[4] Bailey, M., Oberheide, J., Andersen, J., Mao, Z. M., Jahanian, F., and Nazario, J. Automated classification and analysis of internet malware. In Recent Advances in Intrusion Detection (RAID) (2007), Springer-Verlag.

[5] Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (2003), SOSP ’03, ACM, pp. 164–177.

[6] Bayer, U., Kruegel, C., Kirda, E., Comparetti, P. M., and Hlauschek, C. Scalable, behavior-based malware clustering. In Network and Distributed System Security Symposium (NDSS) (2009).

[7] Beaucamps, P., Gnaedig, I., and Marion, J.-Y. Abstraction-based malware analysis using rewriting and model checking. In Computer Security – ESORICS 2012, S. Foresti, M. Yung, and F. Martinelli, Eds., vol. 7459 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 806–823.

[8] Bellard, F. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference (2005), USENIX Association.

[9] Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M., Lavoie, Y., and Tawbi, N. Static detection of malicious code in executable programs. In Symposium on Requirements Engineering for Information Security (2001).

[10] Blunden, B. The Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System. Wordware Publishing, Inc., 2009.

[11] Butler, K. R., McLaughlin, S., and McDaniel, P. D. Rootkit-resistant disks. In Computer and Communications Security (CCS) (2008), ACM, pp. 403–416.

[12] Canali, D., Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., and Kirda, E. A quantitative study of accuracy in system call-based malware detection. In Proceedings of the 2012 International Symposium on Software Testing and Analysis (New York, NY, USA, 2012), ISSTA 2012, ACM, pp. 122–132.

[13] Carrier, B. The Sleuth Kit (TSK). http://www.sleuthkit.org. Accessed on October 1, 2011.

[14] Carrier, B. File System Forensic Analysis. Addison-Wesley, 2005.

[15] Chen, X., Andersen, J., Mao, Z., Bailey, M., and Nazario, J. Towards an understanding of anti-virtualization and anti-debugging behavior in modern malware. In Dependable Systems and Networks (DSN) (2008), pp. 177–186.

[16] Chow, J., Garfinkel, T., and Chen, P. M. Decoupling dynamic program analysis from execution in virtual environments. In USENIX Annual Technical Conference (2008), USENIX Association.

[17] Christodorescu, M., and Jha, S. Testing malware detectors. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2004) (Boston, MA, USA, July 2004), ACM Press, pp. 34–44.

[18] Christodorescu, M., and Jha, S. Static analysis of executables to detect malicious patterns. Tech. rep., DTIC Document, 2006.

[19] Christodorescu, M., Jha, S., and Kruegel, C. Mining specifications of malicious behavior. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (New York, NY, USA, 2007), ESEC-FSE ’07, ACM, pp. 5–14.

[20] Christodorescu, M., Jha, S., Seshia, S., Song, D., and Bryant, R. Semantics-aware malware detection. In 2005 IEEE Symposium on Security and Privacy (May 2005), pp. 32–46.

[21] Chubachi, Y., Shinagawa, T., and Kato, K. Hypervisor-based prevention of persistent rootkits. In Symposium on Applied Computing (SAC) (2010), ACM.

[22] Cortes, C., and Vapnik, V. Support-vector networks. Machine Learning 20, 3 (1995), 273–297.

[23] Dinaburg, A., Royal, P., Sharif, M., and Lee, W. Ether: malware analysis via hardware virtualization extensions. In Computer and Communications Security (CCS) (2008), ACM.

[24] Garfinkel, T., and Rosenblum, M. A virtual machine introspection based architecture for intrusion detection. In Network and Distributed System Security Symposium (NDSS) (2003).

[25] Hennessy, J. L., and Patterson, D. A. Computer Architecture: A Quantitative Approach, 5 ed. Morgan Kaufmann, 2012.

[26] Hungenberg, T., and Eckert, M. INetSim: Internet Services Simulation Suite. http://www.inetsim.org. Accessed on September 25, 2012.

[27] Hunt, J., and McIlroy, M. An algorithm for differential file comparison. Tech. Rep. 41, Bell Laboratories, July 1976.

[28] Huth, M., and Ryan, M. Logic in Computer Science: Modelling and Reasoning about Systems, 2 ed. Cambridge University Press, 2004.

[29] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, September 2013.

[30] Jang, J., Brumley, D., and Venkataraman, S. BitShred: feature hashing malware for scalable triage and semantic analysis. In Computer and Communications Security (CCS) (2011), ACM.

[31] Jiang, X., Wang, X., and Xu, D. Stealthy malware detection through VMM-based “out-of-the-box” semantic view reconstruction. In Computer and Communications Security (CCS) (2007), ACM, pp. 128–138.

[32] Johnson, S. ALTER: A Comdeck comparing program. Tech. rep., Bell Laboratories Internal Memorandum, 1971.

[33] Joshi, A., King, S. T., Dunlap, G. W., and Chen, P. M. Detecting past and present intrusions through vulnerability-specific predicates. In ACM Symposium on Operating Systems Principles (SOSP ’05) (2005), pp. 91–104.

[34] Kang, M. G., Yin, H., Hanna, S., McCamant, S., and Song, D. Emulating emulation-resistant malware. In Workshop on Virtual Machine Security (2009), ACM.

[35] Kim, G. H., and Spafford, E. H. The design and implementation of Tripwire: a file system integrity checker. In Computer and Communications Security (CCS) (1994), ACM, pp. 18–29.

[36] Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. Detecting malicious code by model checking. In Detection of Intrusions and Malware, and Vulnerability Assessment, K. Julisch and C. Kruegel, Eds., vol. 3548 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2005.

[37] Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. Proactive detection of computer worms using model checking. IEEE Transactions on Dependable and Secure Computing 7, 4 (2010).

[38] King, S. T., and Chen, P. M. Backtracking intrusions. In Symposium on Operating Systems Principles (SOSP) (2003), ACM.

[39] Kirda, E., Kruegel, C., Banks, G., Vigna, G., and Kemmerer, R. A. Behavior-based spyware detection. In Proceedings of the 15th Conference on USENIX Security Symposium - Volume 15 (Berkeley, CA, USA, 2006), USENIX-SS’06, USENIX Association.

[40] Kolbitsch, C., Comparetti, P. M., Kruegel, C., Kirda, E., Zhou, X., and Wang, X. Effective and efficient malware detection at the end host. In Proceedings of the 18th Conference on USENIX Security Symposium (Berkeley, CA, USA, 2009), SSYM’09, USENIX Association, pp. 351–366.

[41] Krishnan, S., Snow, K. Z., and Monrose, F. Trail of bytes: efficient support for forensic analysis. In Computer and Communications Security (CCS) (2010), ACM.

[42] Kröger, F., and Merz, S. Temporal Logic and State Systems. Springer, 2008.

[43] Kruegel, C., Kirda, E., and Bayer, U. TTAnalyze: A tool for analyzing malware. In European Institute for Computer Antivirus Research (EICAR) (2006).

[44] Kruegel, C., Robertson, W., and Vigna, G. Detecting kernel-level rootkits through binary analysis. In 20th Annual Computer Security Applications Conference (Dec. 2004), pp. 91–100.

[45] McAfee Labs. McAfee threats report: First quarter 2013. Report, McAfee Inc., May 2013.

[46] McAfee Labs. McAfee threats report: Second quarter 2013. Report, McAfee Inc., May 2013.

[47] Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., and Kirda, E. AccessMiner: using system-centric models for malware protection. In Proceedings of the 17th ACM Conference on Computer and Communications Security (New York, NY, USA, 2010), CCS ’10, ACM, pp. 399–412.

[48] Lee, T., and Mody, J. J. Behavioral classification. In European Institute for Computer Antivirus Research (2006).

[49] Li, P., Liu, L., Gao, D., and Reiter, M. K. On challenges in evaluating malware clustering. In Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection (2010), Springer-Verlag.

[50] Lindorfer, M., Kolbitsch, C., and Comparetti, P. M. Detecting environment-sensitive malware. In Recent Advances in Intrusion Detection (2011).

[51] Mankin, J., and Kaeli, D. Dione: A flexible disk monitoring and analysis framework. In Proceedings of the 15th International Conference on Research in Attacks, Intrusions, and Defenses (Berlin, Heidelberg, 2012), RAID’12, Springer-Verlag.

[52] Martignoni, L., Stinson, E., Fredrikson, M., Jha, S., and Mitchell, J. C. A layered architecture for detecting malicious behaviors. In Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection (Berlin, Heidelberg, 2008), RAID ’08, Springer-Verlag, pp. 78–97.

[53] Rootkits, part 1 of 3: The growing threat. White paper, McAfee Avert Labs, 2006.

[54] Microsoft security intelligence report, June 2012.

[55] Morgan, T. D., and Carter, G. regfi: Windows NT read-only registry library. http://projects.sentinelchicken.org/data/doc/reglookup/regfi/. Accessed on September 1, 2013.

[56] Open Malware. http://www.offensivecomputing.net. Accessed on September 1, 2013.

[57] Paleari, R., Martignoni, L., Roglia, G. F., and Bruschi, D. A fistful of red-pills: How to automatically generate procedures to detect CPU emulators. In USENIX Workshop on Offensive Technologies (WOOT) (2009), USENIX Association.

[58] Payne, B. D., de A. Carbone, M. D. P., and Lee, W. Secure and flexible monitoring of virtual machines. In Annual Computer Security Applications Conference (ACSAC) (2007).

[59] Pennington, A. G., Strunk, J. D., Griffin, J. L., Soules, C. A. N., Goodson, G. R., and Ganger, G. R. Storage-based intrusion detection: Watching storage activity for suspicious behavior. In USENIX Security Symposium (2003).

[60] Rieck, K., Holz, T., Willems, C., Düssel, P., and Laskov, P. Learning and classification of malware behavior. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) (2008), Springer-Verlag.

[61] Rieck, K., Trinius, P., Willems, C., and Holz, T. Automatic analysis of malware behavior using machine learning. Journal of Computer Security 19 (December 2011).

[62] Russinovich, M. DiskMon for Windows v2.01. http://technet.microsoft.com/en-us/sysinternals/bb896646. Accessed on November 24, 2011.

[63] Russinovich, M. Inside the registry. http://technet.microsoft.com/library/cc750583.aspx. Accessed on September 25, 2013.

[64] Russinovich, M. E., and Solomon, D. A. Microsoft Windows Internals, 4 ed. Microsoft Press, 2005.

[65] scikit-learn: Machine Learning in Python. http://www.scikit-learn.org. Accessed on September 2, 2013.

[66] Sikorski, M., and Honig, A. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press, 2012.

[67] Singh, P., and Lakhotia, A. Static verification of worm and virus behavior in binary executables using model checking. In Information Assurance Workshop, 2003. IEEE Systems, Man and Cybernetics Society (June 2003), pp. 298–300.

[68] Sitaraman, S., and Venkatesan, S. Forensic analysis of file system intrusions using improved backtracking. In Proceedings of the Third IEEE International Workshop on Information Assurance (Washington, DC, USA, 2005), IEEE Computer Society, pp. 154–163.

[69] Song, F., and Touili, T. Efficient malware detection using model-checking. In FM 2012: Formal Methods, D. Giannakopoulou and D. Méry, Eds., vol. 7436 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 418–433.

[70] Song, F., and Touili, T. Pushdown model checking for malware detection. In Tools and Algorithms for the Construction and Analysis of Systems, C. Flanagan and B. König, Eds., vol. 7214 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 110–125.

[71] Song, F., and Touili, T. LTL model-checking for malware detection. In Proceedings of the 19th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (Berlin, Heidelberg, 2013), TACAS’13, Springer-Verlag, pp. 416–431.

[72] Stinson, E., and Mitchell, J. C. Characterizing bots’ remote control behavior. In Proceedings of the 4th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (Berlin, Heidelberg, 2007), DIMVA ’07, Springer-Verlag, pp. 89–108.

[73] Stolfo, S. J., Hershkop, S., Bui, L. H., Ferster, R., and Wang, K. Anomaly detection in computer security and an application to file system accesses. In Foundations of Intelligent Systems (ISMIS) (2005).

[74] Sundararaman, S., Sivathanu, G., and Zadok, E. Selective versioning in a secure disk system. In Proceedings of the 17th Conference on Security Symposium (2008), USENIX Association.

[75] VMware. http://www.vmware.com. Accessed on May 10, 2012.

[76] The Volatility Framework: Volatile memory artifact extraction utility framework. http://www.volatilesystems.com/default/volatility. Accessed on May 17, 2012.

[77] Wang, Y.-M., Beck, D., Roussev, R., and Verbowski, C. Detecting stealth software with Strider GhostBuster. In 2005 International Conference on Dependable Systems and Networks (DSN ’05) (2005), IEEE, pp. 368–377.

[78] Willems, C., Holz, T., and Freiling, F. Toward automated dynamic malware analysis using CWSandbox. IEEE Security & Privacy 5, 2 (March–April 2007).

[79] Yan, L.-K., Jayachandra, M., Zhang, M., and Yin, H. V2E: Combining hardware virtualization and software emulation for transparent and extensible malware analysis. In Virtual Execution Environments (VEE) (2012).

[80] Yin, H., Song, D., Egele, M., Kruegel, C., and Kirda, E. Panorama: capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security (New York, NY, USA, 2007), CCS ’07, ACM, pp. 116–127.

[81] Zhang, Y., Gu, Y., Wang, H., and Wang, D. Virtual-machine-based intrusion detection on file-aware block level storage. In Symposium on Computer Architecture and High Performance Computing (2006), IEEE Computer Society.
