
† ICSI, 1947 Center St., Ste. 600, Berkeley, CA 94704 * UC San Diego, San Diego, CA § Microsoft Research, Redmond, Washington ‡ UC Berkeley, Berkeley, CA 94704 This work was partially supported by funding provided to ICSI through National Science Foundation grant CNS-0433702 (“CCIED: Collaborative Center for Internet Epidemiology and Defenses”). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation.

GQ: Practical Containment for Measuring Modern Malware Systems

Christian Kreibich,† Nicholas Weaver,† Chris Kanich,* Weidong Cui,§ and Vern Paxson†‡

TR-11-002

May 2011

Abstract

Measurement and analysis of modern malware systems such as botnets relies crucially on execution of specimens in a setting that enables them to communicate with other systems across the Internet. Ethical, legal, and technical constraints, however, demand containment of resulting network activity in order to prevent the malware from harming others while still ensuring that it exhibits its inherent behavior. Current best practices in this space are sorely lacking: measurement researchers often treat containment superficially, sometimes ignoring it altogether. In this paper we present GQ, a malware execution “farm” that uses explicit containment primitives to enable analysts to develop containment policies naturally, iteratively, and safely. We discuss GQ’s architecture and implementation, our methodology for developing containment policies, and our experiences gathered from six years of development and operation of the system.

1 Introduction

Despite strenuous efforts of security vendors and researchers, the Internet is not becoming a safer place. Recent industry reports state that 60,000 new samples of malware (software created with the intent to conduct malicious activity) appear every day [19]. In this environment, malware analysts face an increasingly challenging task of separating chaff (uninteresting rehashes of known malware strains) from truly novel behavior, in order to enable close study of the novelty to understand its goals and determine how to combat it. At the same time, analysts must also follow the activity in known, major outbreaks, akin to infiltrating criminal organizations, to stay abreast of recent developments. These tasks rely crucially on execution: “letting loose” malware samples in an execution environment to study their behavior, sometimes only for seconds at a time (e.g., to understand the bootstrapping behavior of a binary, perhaps in tandem with static analysis), but potentially also for weeks on end (e.g., to conduct long-term botnet monitoring via “infiltration” [13]).

This need to execute malware samples in a laboratory setting exposes a dilemma. On the one hand, unconstrained execution of the malware under study will likely enable it to operate fully as intended, including embarking on a large array of possible malicious activities, such as pumping out spam via email or the web, contributing to denial-of-service floods, conducting click fraud, or obscuring other attacks by proxying malicious traffic. On the other hand, if executed in complete isolation, the malware will almost surely fail to operate as intended, since it cannot contact external servers via its command-and-control (C&C) channel in order to obtain input data or execution instructions.

Thus, industrial-strength malware analysis requires containment: execution environments that on the one hand allow malware to engage with the external world to the degree required to manifest their core behavior, but on the other hand do so in a highly controlled fashion to prevent the malware from inflicting harm on others. Despite the critical importance of proper containment, researchers often treat the matter at best superficially, sometimes ignoring it altogether. We perceive the root cause for this shortcoming as arising largely from a lack of technical tools to realize sufficiently rich containment, along with a perception of containment as a chore rather than an opportunity. To improve the state of affairs, we present GQ, a malware execution “farm” we have developed and operated regularly for six years. GQ’s design explicitly focuses on enabling the development of precise-yet-flexible containment policies for a wide range of malware. Drawing upon GQ’s development and operation over this period, we make three contributions:

First, we develop an architecture for a malware execution platform that provides a natural way to configure and enforce containment policies by means of containment servers, establishing first-order primitives for effecting containment on a per-flow granularity. We began our work on GQ during the peak of the “worm era” [9], and initially focused on capturing self-propagating network worms. Accordingly, we highlight both the system’s original architectural features that have “stood the test of time,” as well as the shortcomings that emerged as our usage of GQ began to shift towards broader forms of malware.

Second, we describe a principled (albeit largely manual) approach to the development of containment policies to ensure prevention of harm to others. Our goal here is to draw attention to the issue, highlight potential avenues for future research, and point out that far from being a chore, containment development can actively support the process of studying malware behavior.

Third, we present insights from our extensive operational experience [4, 9, 13, 17, 18, 24] in developing containment policies that serve to illustrate the effectiveness, importance, and utility of a principled approach to containment.

We begin by reviewing the approaches to containment framed in previous work (§ 2), which provides further context for our discussion of the issue of containment (§ 3). We then present GQ’s design goals (§ 4), architecture (§ 5), and implementation (§ 6), illustrating both initial as well as eventual features as a consequence of our continued desire to separate policy from mechanism and increase general applicability of the system. Finally, we report on our operational experiences over GQ’s lifetime (§ 7), discuss our current methodology for developing containment policies and containment’s general feasibility (§ 8), and offer concluding thoughts (§ 9).

2 Related Work

A large body of prior work focuses on malware execution for the purpose of understanding the modi operandi of binary specimens. Historically, “honeyfarms” were the first platforms supporting large-scale malware execution. These systems focused in particular on worm execution, with the “honey” facet of the name referring to a honeypot-style external appearance of presenting a set of infectible systems in order to trap specimens of global worm outbreaks early in the worm’s propagation. Vrable et al. designed the Potemkin honeyfarm as a (highly) purpose-specific prototype of a worm honeyfarm that explored the scalability constraints present when simulating hundreds of victim machines on a single physical machine [28]. Their work also framed the core issue of containment, but leveraged a major simplification that (simple) worms can potentially provide, namely that one can observe worm propagation even when employing a very conservative containment policy of redirecting outbound connections to additional analysis machines in the honeyfarm. Jiang and Xu’s Collapsar, a virtualization-based honeyfarm, targeted the transparent integration of honeypots into a distributed set of production networks [11]. The authors focused on the functional aspects of the system, and while Collapsar’s design supports the notion of assurance modules to implement containment policies, in practice these simply relied on throttling and use of “Inline Snort” to block known attacks.

With the end of the “worm era,” researchers shifted focus from worm-like characteristics to understanding malware activity more broadly. Bayer et al. recognized the significance of the containment problem in a paper introducing the Anubis platform [2], which has served as the basis for numerous follow-up studies: “Of course, running malware directly on the analyst’s computer, which is probably connected to the Internet, could be disastrous as the malicious code could easily escape and infect other machines.” The remainder of the paper, however, focuses on technical aspects of monitoring the specimen’s interaction with the OS; they do not address the safety aspect further.

The CWSandbox environment shares the dynamic analysis goals of Anubis, using virtualization or potentially “raw iron” (non-virtualized) execution instead of full-system emulation [30]. The authors state that CWSandbox’s reports include “the network connections it opened and the information it sent”, and simply acknowledge that “using the CWSandbox might cause some harm to other machines connected to the network.”

The Norman SandBox [22] similarly relies on malware execution in an instrumented environment to understand behavior. As a proprietary system, we cannot readily infer its containment model, but according to its description it emulates the “network and even the Internet.”

Chen et al. [7] presented a “universe” management abstraction for honeyfarms based on Potemkin, allowing multiple infections to spread independently in a farm. Their universes represent sets of causally related honeypots created due to mutual communication. GQ provides such functionality in a more general fashion via its support for independently operating subfarms. In contrast to Potemkin’s universes, GQ’s subfarms do not imply that malware instances executing within the same subfarm necessarily can communicate.

More recently, John et al. presented Botlab [12], an analysis platform for studying spamming botnets. Botlab shares commonalities with GQ, including long-term execution of bot binaries, in their case particularly for the purpose of collecting spam campaign intelligence. They also recognize the importance of executing malware safely. In contrast to GQ, the authors designed Botlab specifically to study email spam, for which they employed a static containment policy: “[T]raffic destined to privileged ports, or ports associated with known vulnerabilities, is automatically dropped, and limits are enforced on connections rates, data transmission, and the total window of time in which we allow a binary to execute.” In addition, the authors ultimately concluded that containment poses an intractable problem, and ceased pursuing their study: “Moreover, even transmitting a “benign” C&C message could cause other, non-Botlab bot nodes to transmit harmful traffic. Given these concerns, we have disabled the crawling and network fingerprinting aspects of Botlab, and therefore are no longer analyzing or incorporating new binaries.”

Researchers have also explored malware execution in the presence of complete containment (no external connectivity). These efforts rely on mechanisms that aim to mimic the fidelity of communication between bots and external machines. The SLINGbot system emulates bot behavior rather than allowing the traffic of live malware on the commodity Internet [10]; clearly, such an approach can impose significant limits on the obtainable depth of analysis. The DETER testbed [21] relies upon experimenters and testbed operators to come to a consensus on the specific containment policy for a given experiment. Barford and Blodgett’s Botnet Mesocosms run real bot binaries, but emulate the C&C infrastructure in order to evaluate bot behavior without Internet connectivity [1]. Most recently, Calvet et al. described their in-the-lab botnet experimentation testbed with a similar containment policy to Botnet Mesocosms, but with the capability of running thousands of bot binaries at one time [5]. While all of these systems guarantee that malicious traffic does not interact with the commodity Internet, they require substantial effort for enabling a useful level of operational fidelity and, in the process, necessarily sacrifice the fidelity that precisely controlled containment can offer.

3 The Problem of Containment

The frequently light (at best) treatment of the central problem of executing malware safely is to some extent understandable. Established tool-chains for developing containment do not exist, so the resulting containment mechanisms are ad-hoc or suboptimal, leaking undesirable activity or preventing desired activity. This deficiency renders containment implementation and verification a difficult, time-consuming problem, one that frequently gets in the way of conducting the intended experiment. GQ aims to fill this gap by making containment policy development a natural component of the malware execution process.

Development of precise containment requires a balance of (i) allowing the outside interactions necessary to provide the required realism, (ii) containing any malicious activity, all while (iii) convincingly pretending to the sample under study that it enjoys unhampered interaction with external systems. Lacking rigor in this process raises ethical issues, since knowingly permitting such activity can cost third parties significant resources in terms of cleaning up infections, filtering spam, fending off attacks, or even financial loss. It also may expose the analyst to potential legal consequences, or at the very least to abuse complaints. Finally, technical considerations matter as well: prolonged uncontained malicious activity can lead to blacklisting of the analysis machines by the existing malware-fighting infrastructure. Such countermeasures can prevent the analyst from acquiring unbiased measurements as they may filter key activity elsewhere in the network, or because the malware itself changes its behavior once it detects suspiciously diminishing connectivity. We might consider full containment as a viable option for experiments such as “dry-runs” of botnet takedowns [5], but we find that frequently the dynamically changing nature of real-world interactions and the actual content of delivered instructions holds central interest for analysis.

Thus, we argue for the importance of avoiding the perception of containment as a resource-draining chore. Indeed, we find that developing containment from a default-deny stance, iteratively permitting understood activity, in fact provides an excellent way to understand the behavioral envelope of a sample under study. We discuss this further in § 8.

Finally, we emphasize that we do not mean for GQ to replace the use of static or dynamic malware analysis tools. Rather, we argue that researchers should deploy cutting-edge malware analysis tools inside a system such as GQ, to provide a mature containment environment in which they can employ sophisticated malware instrumentation safely and robustly.

4 Initial & Eventual Design Goals

As we mentioned above, our initial design for GQ targeted a classic, worm-focused honeyfarm architecture. Over the past years, we focused on evolving the system into a generic malware farm: a platform suitable for hosting all manner of malware-driven research safely, routinely, and at scale. In the following we outline original design goals as well as those we came to embrace at later stages of building the system.

Versatility. In contrast to previous work, GQ must serve as a platform for a broad range of malware experimentation, with no built-in assumptions about inmates belonging to certain classes of malware, or acting exclusively as servers (realizing traditional honeyfarms) or clients (realizing honeycrawlers [29]). Our initial focus on worms imposed design constraints that became impractical once we wanted to experiment with non-self-propagating malware. Similarly, focus on a particular class of botnets (say those using IRC as C&C, or domain generation algorithms for locating C&C servers via DNS) restricts versatility.

Separation of policy and mechanism. This goal closely relates to the former, but we needed to invest considerable thought in order to achieve it in a satisfying fashion. In our original honeyfarm architecture, we tightly interwove mechanism (a highly customized packet forwarder) and policy (worm containment) in forwarding code that we could adapt only with difficulty and by imposing system downtime. GQ must clearly separate a comparatively stable containment enforcement mechanism from an adaptable containment policy definition.

Verifiable containment. GQ must provide complete control over all inbound and outbound communication. This remains as true now as it was at the outset of GQ’s development. Depending on a flow’s source, destination, and content, the system must allow dropping, reflecting, redirecting, rewriting, and throttling the flow flexibly depending on current execution context. Moreover, mechanisms should exist to verify that developed policies operate as intended.

Multiple execution platforms. Malware frequently checks the nature of its execution platform in order to determine whether it likely finds itself running in an analysis environment or as a successful infection in the wild. For example, malware often considers evidence of virtualization a tell-tale sign for operation in a malware analysis environment [8] and ceases normal operation. GQ must support virtualized, emulated, and raw iron execution as desired, on a per-inmate granularity, in a fashion transparent to the containment process.

Multiple experiments. While extending GQ we increasingly noticed the need for running multiple malware-driven setups at the same time. The original single-experiment design made it difficult to accommodate this. For GQ, we aim to provide a platform for independently operating experiments involving different sets of machines, containment policies, and operational importance. For example, we have found it exceedingly useful to run a “deployment” and a “development” setup for running spambots: one for continuously harvesting spam from specimens for which we have developed mature containment policies, the other for developing such policies on freshly obtained samples.

Threat model. We need to consider the way in which machines on our inmate network become infected, as well as the behavioral envelope once a machine has become infected. Regarding the former, if an experiment so desires, GQ fully supports the classic honeyfarm model in which external traffic directly infects honeypot machines. Today, we find intentional infection with malware samples collected from external sources more relevant and useful. We describe our mechanism for facilitating such infections in § 6.6. Regarding post-infection behavior, we adopt the position of other systems executing malware in virtualized environments and assume that the program cannot break out of the virtual environment. The malware may corrupt user and kernel space on the infected host arbitrarily and conduct arbitrary network I/O.

Figure 1: GQ’s overall architecture.

5 GQ’s Architecture

5.1 Overall layout

We show GQ’s conceptually simple architecture in Figure 1. A central gateway sits between the outside network and the internal machinery, separating it into an inmate network on which we host infected machines (inmates) and a management network that we use to access virtual machine monitors (VMMs) and other farm control infrastructure.

Within the gateway, custom packet forwarding logic enables and controls traffic flow between the farm and the outside network, as well as among machines within the farm. This takes several forms. First, a learning VLAN bridge incrementally understands and enables crosstalk among machines on the inmate network as required, subject to the containment policy in effect. Second, we devote the majority of the forwarding logic to redirecting outgoing and incoming flows to a containment server, which decides what containment policy to apply to a given flow, and to enforcing the resulting decisions. Finally, a safety filter ensures that the rate of connections across destinations and to a given destination never exceeds a configurable threshold.
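The safety filter's behavior can be illustrated with a small Python sketch. This is only an assumption-laden model: the class name `SafetyFilter`, its parameters, and the sliding-window approach are hypothetical, and GQ's actual filter lives in the gateway's packet forwarding logic and may work differently. The sketch caps the number of new connections both across all destinations and to any single destination within a time window.

```python
import time
from collections import defaultdict, deque


class SafetyFilter:
    """Sliding-window connection limiter (illustrative, not GQ's code)."""

    def __init__(self, max_total, max_per_dest, window=60.0):
        self.max_total = max_total        # limit across all destinations
        self.max_per_dest = max_per_dest  # limit to any single destination
        self.window = window              # window length in seconds
        self.total = deque()              # timestamps of all allowed flows
        self.per_dest = defaultdict(deque)

    @staticmethod
    def _expire(stamps, cutoff):
        # Drop timestamps that have fallen out of the window.
        while stamps and stamps[0] <= cutoff:
            stamps.popleft()

    def allow(self, dest, now=None):
        """Return True if a new connection to dest stays within both limits."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window
        dest_stamps = self.per_dest[dest]
        self._expire(self.total, cutoff)
        self._expire(dest_stamps, cutoff)
        if (len(self.total) >= self.max_total
                or len(dest_stamps) >= self.max_per_dest):
            return False
        self.total.append(now)
        dest_stamps.append(now)
        return True
```

A filter of this kind acts as a last line of defense: even if a containment policy mistakenly forwards a flood, the number of connections leaving the farm stays bounded.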

5.2 Inmate hosting & isolation

We employ three different inmate-hosting technologies: full-system virtualization (using VMware ESX servers), unvirtualized “raw iron” execution (more on this in § 6.4), and QEMU for customized emulation [4]. The hosting technology we employ for a given inmate remains transparent to the gateway.

GQ enforces inmate isolation at the link layer: each inmate sends and receives traffic on a VLAN ID unique on the inmate network. VLAN IDs thus serve as handy identifiers for individual inmates. Physical and, where feasible, virtual switches (for simplicity not shown in Figure 1) behind the gateway enforce the per-inmate VLAN assignment, which our inmate creation/deletion procedure automatically picks and releases from the available VLAN ID pool.
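The pick-and-release handling of VLAN IDs can be sketched as a simple pool. The class `VlanPool` and its method names are hypothetical and only illustrate the bookkeeping; the text does not describe how GQ's inmate creation/deletion procedure implements it.

```python
class VlanPool:
    """Hands out unique VLAN IDs to inmates (illustrative sketch)."""

    def __init__(self, first, last):
        self._free = set(range(first, last + 1))
        self._by_inmate = {}

    def acquire(self, inmate):
        """Assign the lowest free VLAN ID to a newly created inmate."""
        if inmate in self._by_inmate:
            raise ValueError(f"{inmate} already has a VLAN ID")
        if not self._free:
            raise RuntimeError("VLAN ID pool exhausted")
        vlan = min(self._free)
        self._free.remove(vlan)
        self._by_inmate[inmate] = vlan
        return vlan

    def release(self, inmate):
        """Return a deleted inmate's VLAN ID to the pool."""
        vlan = self._by_inmate.pop(inmate)
        self._free.add(vlan)
        return vlan
```

Because each ID belongs to at most one inmate at a time, the ID can double as the inmate's identifier elsewhere in the system.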

The gateway’s packet forwarding logic enables connectivity to the outside Internet as permissible, while selectively enabling inmate crosstalk as required.

5.3 Network layout

We use network address translation for all inmates. The inmate network uses non-routable RFC 1918 addresses, which the gateway maps to and from a configurable global address space. This provides one level of indirection and allows us to keep the inmate configuration both simple and dynamic at inmate boot time. To facilitate this, the inmate network provides a small number of infrastructure services for the inmates, including DHCP and recursive DNS resolvers. The gateway places these services into a restricted broadcast domain, comprising all machines the inmates can reach by default. In addition, infrastructure services include experiment-specific requirements such as SMTP sinks, though these do not belong to the broadcast domain.
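To make the address indirection concrete, the following Python sketch maps inmate-side RFC 1918 addresses to addresses from a global range and back. It is a minimal model under stated assumptions: the class `AddressMapper`, the first-come allocation strategy, and the example address range are all hypothetical, and the text does not say how the gateway chooses global addresses.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


class AddressMapper:
    """Maps inmate-side private addresses to global ones and back."""

    def __init__(self, global_net):
        # Pool of global addresses available for mapping (assumed layout).
        self._free = list(ipaddress.ip_network(global_net).hosts())
        self._to_global = {}
        self._to_internal = {}

    def to_global(self, internal):
        addr = ipaddress.ip_address(internal)
        if not any(addr in net for net in RFC1918):
            raise ValueError(f"{addr} is not an RFC 1918 address")
        if addr not in self._to_global:
            if not self._free:
                raise RuntimeError("global address space exhausted")
            ext = self._free.pop(0)
            self._to_global[addr] = ext
            self._to_internal[ext] = addr
        return self._to_global[addr]

    def to_internal(self, external):
        return self._to_internal[ipaddress.ip_address(external)]
```

With such a table in the gateway, inmates never need to know their global address, which keeps their own configuration down to what DHCP provides at boot time.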

5.4 Containment servers

GQ’s design hinges crucially on the notion of an explicit containment server that determines each flow’s containment policy, and that constitutes one of the major improvements over existing systems. Our original design included no equivalent; instead, the packet router in the gateway alone implemented the containment policy. The containment server’s design results from our desire to reduce the gateway’s forwarding logic to an infrequently changing mechanism to the greatest extent possible, while keeping containment policies configurable and adaptable. This contrasts starkly with our initial design, which intermixed a fixed nine-step containment policy with the mechanisms for enforcing that policy, all inside the gateway.

Figure 2: GQ’s flow manipulation modes, illustrated on flows initiated by an inmate: (a) Forward, (b) Rate-limit, (c) Drop, (d) Redirect, (e) Reflect, (f) Rewrite.

The containment server is both a machine (typically a VM, for convenience) and a standard application server running on the machine. Conceptually, the combination of the gateway’s packet router and the containment server realizes a transparent application-layer proxy for all traffic entering and leaving the inmate network. The gateway’s packet router informs the containment server of all new flows to or from inmates, enabling it to issue containment verdicts on the flows. The router subsequently enforces these verdicts, which enable both endpoint control and content control. Endpoint control happens at the beginning of a flow’s lifespan and allows us to (i) forward flows unchanged, (ii) drop them entirely, (iii) reflect them to a sink server or alternative custom proxies, (iv) redirect them to a different target, or (v) rate-limit them. Once the gateway has established connectivity between the intended endpoints, it alone enforces endpoint control, conserving resources on the containment server. Content control, on the other hand, remains feasible throughout a flow’s lifespan and works by fully proxying the flow content through the containment server. We thus can (i) rewrite a flow, (ii) terminate it when it would normally continue still, or (iii) prolong it by injecting additional content into a flow that would otherwise already terminate. Note that the connection’s destination need not necessarily exist: the containment server can simply impersonate one by creating response traffic as needed. Endpoint and content control need not mutually exclude each other; for example, it can make sense to redirect a flow to a different destination while also rewriting some of its contents. Figure 2 illustrates the flow manipulation modes. Note how the containment server remains passive during containment enforcement except in case of content rewriting (Figure 2(f)).
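The relationship between endpoint control and content control can be modeled with combinable flags. The sketch below is illustrative only: the verdict names follow the modes listed above, but the `Verdict` type, the helper functions, and in particular the feasibility rules (a drop stands alone, and at most one of forward, redirect, and reflect selects the destination) are assumptions, since the text does not spell out which combinations GQ accepts.

```python
from enum import Flag, auto


class Verdict(Flag):
    FORWARD = auto()
    LIMIT = auto()
    DROP = auto()
    REDIRECT = auto()
    REFLECT = auto()
    REWRITE = auto()


def needs_proxy(verdict):
    """Content control keeps the containment server in the data path."""
    return bool(verdict & Verdict.REWRITE)


def is_feasible(verdict):
    """Assumed rules, not taken from the source."""
    if not verdict:
        return False
    if Verdict.DROP in verdict:
        return verdict == Verdict.DROP
    targets = (Verdict.FORWARD, Verdict.REDIRECT, Verdict.REFLECT)
    return sum(1 for t in targets if t in verdict) <= 1
```

For example, `Verdict.REDIRECT | Verdict.REWRITE` expresses the combination mentioned above: the flow goes to a different destination and its content passes through the containment server for rewriting.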

Besides flow-related containment management, the containment server also controls the inmates’ life-cycle. As the containment server witnesses all network-level activity of an inmate, it can react to the presence (and absence) of such network events using activity triggers. These triggers can terminate the inmate, reboot it, or revert it to a clean state for subsequent reinfection. For example, a typical life-cycle policy for a spambot might automatically restart the bot once it has ceased spamming activity for more than an hour. Another standard policy is to terminate an inmate that has begun to flood a particular recipient with more than a certain number of connection requests per minute.
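The two example life-cycle policies can be sketched as small trigger objects, one reacting to the absence of activity and one to its excess. The class names, the string-valued actions, and the polling interface are hypothetical; they illustrate the idea of activity triggers rather than GQ's event trigger logic.

```python
from collections import defaultdict, deque


class IdleTrigger:
    """Fires when no activity has been observed for more than `timeout` seconds."""

    def __init__(self, timeout, action="restart"):
        self.timeout = timeout
        self.action = action
        self.last_seen = None

    def observe(self, now):
        self.last_seen = now

    def poll(self, now):
        if self.last_seen is None:
            self.last_seen = now          # start the clock on first poll
            return None
        if now - self.last_seen > self.timeout:
            self.last_seen = now          # re-arm after firing
            return self.action
        return None


class FloodTrigger:
    """Fires when one destination receives too many requests per minute."""

    def __init__(self, max_per_minute, action="terminate"):
        self.max_per_minute = max_per_minute
        self.action = action
        self._stamps = defaultdict(deque)

    def observe(self, dest, now):
        stamps = self._stamps[dest]
        stamps.append(now)
        while stamps[0] <= now - 60.0:    # keep only the last minute
            stamps.popleft()
        return self.action if len(stamps) > self.max_per_minute else None
```

In this model, a spambot policy would call `IdleTrigger(3600)` with each observed SMTP flow and poll it periodically, while `FloodTrigger` is consulted on every new connection request.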

5.5 Inmate control

To realize inmate life-cycle control, we require a component in the system that can create inmates, expire them, or adjust their life-cycle state by starting, stopping, or reverting them to a clean state. In GQ’s architecture, the containment servers issue life-cycle actions to an inmate controller located centrally on the gateway. Since we keep containment servers logically and physically separate from the gateway, we require a mechanism allowing them to communicate with the inmate controller. We realize this by maintaining an additional network interface only on the containment servers that allows them to interact directly, out-of-band of the inmate-related network activity, with the gateway. The inmate controller understands the inmate hosting infrastructure and abstracts physical details of the inmates such as their hosting server and whether they run virtualized or on raw iron. The controller requires only the inmate’s VLAN ID in order to identify the target of a life-cycle action.

Figure 3: A possible subfarm scenario in GQ: three independent routers handle disjoint sets of VLAN IDs, thus enabling parallel experiments. For the sake of clarity, we do not show per-subfarm infrastructure services individually.

5.6 Subfarms

The combination of central gateway and inmate network parallelizes naturally: instead of a single packet forwarding logic handling the entire range of VLAN IDs on the inmate network, multiple instances of the packet router handle disjoint sets of VLAN IDs. This creates independent, self-contained, non-interfering habitats for different experiments. We call these habitats subfarms. Subfarms generally have their own containment server (thereby providing a subfarm-specific containment policy) and infrastructure services, but can vary widely in experiment-specific additional services. Shared network resources, as mentioned by Chen et al. [7], are possible, but we have not found them necessary or useful in practice, since the very reason for creating independent subfarms is typically the need to configure systems differently or record separate data sets. Figure 3 illustrates subfarms.
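The partitioning of the VLAN ID space among subfarms can be illustrated with a dispatch table that rejects overlapping ranges. The `Subfarm` record, the function `build_dispatch`, and the subfarm names are hypothetical; in GQ each subfarm is a separate packet router instance rather than an entry in a Python dictionary.

```python
from dataclasses import dataclass


@dataclass
class Subfarm:
    name: str
    vlan_ranges: list  # list of (first, last) tuples, inclusive


def build_dispatch(subfarms):
    """Map each VLAN ID to its subfarm, insisting on disjoint ranges."""
    table = {}
    for farm in subfarms:
        for first, last in farm.vlan_ranges:
            for vlan in range(first, last + 1):
                if vlan in table:
                    raise ValueError(
                        f"VLAN {vlan} claimed by both {table[vlan]} "
                        f"and {farm.name}")
                table[vlan] = farm.name
    return table
```

Disjointness is what makes the habitats non-interfering: a frame's VLAN ID alone determines which router, and hence which containment policy, handles it.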

5.7 Monitoring and packet trace archival

We use a two-pronged packet trace recording strategy. First, the packet routers record each subfarm’s activity from the inmate network’s perspective, i.e., with unroutable addresses for the inmates. This has the benefit of providing some extent of immediate anonymity in the packet traces, simplifying data sharing without the risk of accidentally leaking our global address ranges. Second, system-wide trace recording happens at the upstream interface to the outside network, thus capturing all activity globally and prior to any gateway-induced filtering or rewriting.

6 Implementation

6.1 Packet routing

We implement the subfarm’s packet routers on the gateway using the Click modular router [15]. We separate each subfarm’s Click configuration into a module containing invariant, reusable forwarding elements shared across all subfarms (400 lines) and a small (40 lines) configuration module that specifies each subfarm’s unique aspects, including the external IP address ranges, set of VLAN ID ranges from the inmate network to route, and logfile naming. The custom Click elements for the VLAN bridge, NAT, containment server flow handling, and trace recording comprise around 5000 lines of C++.

6.2 Containment server

We realize the containment server in Python, using a pre-forked, multi-threaded service model. The server logic comprises roughly 2200 lines, with 600 lines for the event trigger logic. The containment policies, including content rewriters, add up to 1000 lines.

Shimming protocol. To couple the gateway’s packet router to the containment server, we need a way to map/unmap arbitrary flows to/from the single address and port of the containment server. We realize this using a shimming protocol that shares conceptual similarity with the SOCKS protocol [14]: upon redirection to the containment server, the gateway injects a containment request shim message with flow meta-information into the flow. The containment server expects this meta-information and uses it to assign a containment policy to the flow. The containment server similarly conveys the containment verdict back to the gateway using a containment response shim, which the packet router strips from the flow before relaying subsequent content back to the endpoint. For TCP, the shim injection and removal requires bumping and unbumping of sequence and acknowledgement numbers; for UDP it requires padding the datagrams with the respective shims.¹ The request shim requires 24 bytes and begins with a preamble of 8 bytes containing a magic number (4 bytes), the message length (2 bytes), a message type indicator (1 byte), and a shim protocol version number (1 byte), followed by the original flow’s endpoint four-tuple (2 · 4 bytes plus 2 · 2 bytes), the VLAN ID of the sending/receiving inmate (2 bytes), and a nonce port (2 bytes) on which the gateway will expect a possible subsequent outbound connection from the containment server, in case it needs to rewrite the flow continuously. The response shim can vary in length and requires at least 56 bytes, consisting of a similar preamble (8 bytes), the resulting endpoint four-tuple (12 bytes), the containment verdict (FORWARD, LIMIT, DROP, REDIRECT, REFLECT, or REWRITE as per Figure 2, possibly in feasible combination) as a numeric opcode (4 bytes), a name tag for the resulting containment policy (32 bytes), and a possible additional annotation string to clarify the context in which the containment server decided the containment verdict. Figure 4 summarizes the message structure.

Figure 4: Shim protocol message structure. (a) Request shim (24 bytes), with fields: Magic number, Length, Type, Vers., Orig. IP, Resp. IP, Orig. Port, Resp. Port, VLAN ID, Nonce Port. (b) Response shim (at least 56 bytes), with fields: Magic number, Length, Type, Vers., Orig. IP, Resp. IP, Orig. Port, Resp. Port, Containment Verdict, Policy Name, Textual Annotation.
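The field sizes given above are enough to sketch the two shim messages with Python's `struct` module. Several details are assumptions made for illustration and are not stated in the text: the magic number value, the type and version codes, network byte order, the order of the four-tuple fields, the interpretation of the length field as total message length, and NUL padding of the policy name.

```python
import socket
import struct

MAGIC = 0x47510001                    # hypothetical magic number
VERSION = 1                           # hypothetical version
TYPE_REQUEST, TYPE_RESPONSE = 1, 2    # hypothetical type codes

# Preamble (4+2+1+1), four-tuple (4+4+2+2), then message-specific fields.
REQ = struct.Struct("!IHBB4s4sHHHH")      # + VLAN ID, nonce port = 24 bytes
RESP = struct.Struct("!IHBB4s4sHHI32s")   # + verdict, policy name = 56 bytes


def pack_request(orig_ip, orig_port, resp_ip, resp_port, vlan, nonce_port):
    return REQ.pack(MAGIC, REQ.size, TYPE_REQUEST, VERSION,
                    socket.inet_aton(orig_ip), socket.inet_aton(resp_ip),
                    orig_port, resp_port, vlan, nonce_port)


def unpack_request(data):
    (magic, length, mtype, _version, orig_ip, resp_ip,
     orig_port, resp_port, vlan, nonce_port) = REQ.unpack(data[:REQ.size])
    if magic != MAGIC or mtype != TYPE_REQUEST or length != REQ.size:
        raise ValueError("not a request shim")
    return {"orig": (socket.inet_ntoa(orig_ip), orig_port),
            "resp": (socket.inet_ntoa(resp_ip), resp_port),
            "vlan": vlan, "nonce_port": nonce_port}


def pack_response(orig_ip, orig_port, resp_ip, resp_port,
                  verdict, policy, note=""):
    name, note_bytes = policy.encode(), note.encode()
    if len(name) > 32:
        raise ValueError("policy name exceeds 32 bytes")
    head = RESP.pack(MAGIC, RESP.size + len(note_bytes), TYPE_RESPONSE,
                     VERSION, socket.inet_aton(orig_ip),
                     socket.inet_aton(resp_ip), orig_port, resp_port,
                     verdict, name)      # "32s" pads the name with NULs
    return head + note_bytes


def unpack_response(data):
    (magic, length, mtype, _version, orig_ip, resp_ip,
     orig_port, resp_port, verdict, name) = RESP.unpack(data[:RESP.size])
    if magic != MAGIC or mtype != TYPE_RESPONSE or length < RESP.size:
        raise ValueError("not a response shim")
    return {"orig": (socket.inet_ntoa(orig_ip), orig_port),
            "resp": (socket.inet_ntoa(resp_ip), resp_port),
            "verdict": verdict,
            "policy": name.rstrip(b"\0").decode(),
            "note": data[RESP.size:length].decode()}
```

The fixed-size part of each format matches the stated totals (24 and 56 bytes), which offers a quick consistency check of the field sizes listed in the text.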

An example of this containment procedure is shown in Figure 5. An inmate initiates the TCP handshake for an upcoming HTTP request (1), which the gateway redirects to the containment server’s fixed address and port, synthesizing a full TCP handshake. Upon completion, the gateway injects into the TCP stream the containment request shim (2). The containment server in turn sends a containment response shim (3) including the containment verdict for the flow, in this case a REWRITE. To serve as a transparent proxy rewriting the flow content, it also establishes a second TCP connection to the target via the gateway and the nonce port received as part of the containment request shim. The gateway forwards this TCP SYN to the target and relays the handshake between target and inmate (4). The inmate completes its connection establishment and sends the HTTP request, which the gateway relays on to the containment server as part of the same connection that it used to exchange containment information, bumping the sequence number accordingly (5). The containment server rewrites the request as needed (here changing the requested resource to another one) and forwards it on to the target, via the gateway (6). The target’s response travels in the opposite direction and arrives at the containment server, which again rewrites it (here to create the illusion of a non-existing resource) and relays it back to the inmate (7). (For brevity, we do not show the subsequent connection tear-downs in the figure.)

¹ For large UDP datagrams this can introduce the need for fragmentation.

Policy structure. We codify containment policies in Python classes, which the containment server instantiates by keying on VLAN ID ranges and applies on a per-flow basis. We base endpoint control upon the flow’s four-tuple, and content control depends on the actual data sent in a given flow. Object-oriented implementation reuse and specialization lend themselves well to the establishment of a hierarchy of containment policies. From a base class implementing a default-deny policy we derive classes for each endpoint control verdict, and from these specialize further, for example to a base class for spambots that reflects all outbound SMTP traffic. The containment server simply copies the name of the class implementing the applicable containment policy into the response shim (recall Figure 4(b)) in order to convey it to the gateway.
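A minimal sketch of such a hierarchy follows. The class names, the Flow record, the ExampleBot family policy, and the VLAN ranges are hypothetical; only the overall shape (default-deny base class, a class per endpoint verdict, a spambot base class reflecting SMTP, selection by VLAN ID range, and the class name as policy tag) comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    vlan_id: int
    orig: tuple   # (ip, port) of the inmate
    resp: tuple   # (ip, port) the inmate tried to reach
    proto: str = "tcp"


class DefaultDeny:
    """Base policy: drop anything not explicitly allowed."""

    def decide(self, flow):
        return "DROP"

    @property
    def name(self):
        # The class name is what would travel in the shim's policy field.
        return type(self).__name__


class Reflect(DefaultDeny):
    """Endpoint-control verdict class: send every flow to a sink."""

    def decide(self, flow):
        return "REFLECT"


class Spambot(Reflect):
    """Reflect outbound SMTP; fall back to default-deny for the rest."""

    def decide(self, flow):
        if flow.proto == "tcp" and flow.resp[1] == 25:
            return super().decide(flow)
        return DefaultDeny.decide(self, flow)


class ExampleBot(Spambot):
    """Hypothetical family policy: additionally forward HTTPS C&C."""

    def decide(self, flow):
        if flow.proto == "tcp" and flow.resp[1] == 443:
            return "FORWARD"
        return super().decide(flow)


# Hypothetical VLAN ID ranges mapped to policy classes.
POLICY_BY_VLAN = [(range(16, 18), ExampleBot), (range(18, 20), Spambot)]


def policy_for(vlan_id):
    for vlans, cls in POLICY_BY_VLAN:
        if vlan_id in vlans:
            return cls()
    return DefaultDeny()
```

An inmate on an unlisted VLAN falls through to DefaultDeny, so forgetting to configure an inmate errs on the side of containment.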

Configuration. The codified containment policies are customizable through a configuration file. This file serves four purposes: (i) it determines the initial assignment of a policy to a given inmate’s traffic, (ii) it typically specifies the individual or set of malware binaries with which we would like to infect a given inmate over the course of its life-cycles,2 (iii) it specifies activity triggers (e.g., revert and reinfect the inmate once the containment server has observed no outbound activity for 30 minutes), and (iv) it specifies IP addresses and port numbers of infrastructure services in a subfarm (e.g., where to find a particular SMTP sink or HTTP proxy). Figure 6 shows a sample configuration snippet.

2 As indicated earlier, we typically specify precisely which sample to infect an inmate with. However, GQ equally supports classic honeypot constellations in which dynamic circumstances (such as a web drive-by) determine the nature of the infection.
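The configuration in Figure 6 happens to be readable by Python's standard configparser module. The sketch below assumes that the file is INI-like, that a section applying to an overlapping VLAN range adds to earlier settings, and that non-VLAN sections describe services; the paper does not say how GQ actually parses the file, and load_config is a hypothetical helper.

```python
import configparser
import re

SAMPLE = """\
[VLAN 16-17]
Decider = Rustock
Infection = rustock.100921.*.exe

[VLAN 18-19]
Decider = Grum
Infection = grum.100818.*.exe

[VLAN 16-19]
Trigger = *:25/tcp / 30min < 1 -> revert

[Autoinfect]
Address = 10.9.8.7
Port = 6543

[BannerSmtpSink]
Address = 10.3.1.4
Port = 2526
"""

VLAN_SECTION = re.compile(r"VLAN (\d+)(?:-(\d+))?")


def load_config(text):
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    per_vlan, services = {}, {}
    for section in parser.sections():
        match = VLAN_SECTION.fullmatch(section)
        if match:
            low = int(match.group(1))
            high = int(match.group(2) or low)
            for vlan in range(low, high + 1):
                # Later sections add to earlier ones for overlapping ranges.
                per_vlan.setdefault(vlan, {}).update(parser[section])
        else:
            services[section] = (parser[section]["Address"],
                                 parser[section].getint("Port"))
    return per_vlan, services
```

Calling load_config(SAMPLE) yields one merged settings dictionary per VLAN ID (16 through 19) and an address/port pair for each named service.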


[Figure 5: sequence diagram with a time axis. Recoverable labels: SYN, SYN-ACK, and ACK segments; endpoint pairs 10.0.0.23:1234 / 192.150.187.12:80, 10.0.0.23:1234 / 10.3.0.1:6666, and 10.4.0.1:42 / 10.3.0.1:2345; REQ SHIM (10.0.0.23:1234, 192.150.187.12:80, VLAN 12, nonce port 42); RSP SHIM (10.0.0.23:1234, 192.150.187.12:80, REWRITE, Policy ID, Annotations); sequence-number adjustments SEQ += |REQ SHIM| and SEQ -= |RSP SHIM|; requests GET bot.exe HTTP/1.1 and GET cleanup.exe HTTP/1.1; responses HTTP/1.1 200 OK and HTTP/1.1 404 NOT FOUND.]

Figure 5: TCP packet flow through gateway and containment server in a REWRITE containment. See § 5.4 for details.

6.3 Inmate controller & sinks

We realize the inmate controller as a simple message receiver that interprets the life-cycle control instructions coming in from the containment servers. We use a simple text-based message format to convey the instructions. At startup, the controller scans the VMMs deployed on the management network to assemble an inventory of inmates and their VLAN IDs. We implement the controller in 470 lines of Python. The support scripts for managing inmate VMs and raw iron machines abstract concrete virtualization platforms (such as VMware’s ESX control scripts), in 2000 lines of shell code. We likewise implement GQ’s sink servers in Python. Our simplest catch-all server accepts arbitrary input and requires a mere 100 lines of code, while our most complex sink constitutes a fidelity-adjustable SMTP server that can grab greeting banners from the actual target and drop a configurable number of connections. It needs 800 lines, using the same pre-forked multithreaded back-end as the containment server.
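To illustrate what a catch-all sink does, here is a deliberately small sketch built on Python's socketserver module. It is not GQ's sink: it uses one thread per connection rather than the pre-forked multithreaded back-end mentioned above, and the idle timeout and the in-memory log are assumptions made for the example.

```python
import socket
import socketserver


class CatchAllHandler(socketserver.BaseRequestHandler):
    """Accept whatever the inmate sends and never answer."""

    def handle(self):
        self.request.settimeout(30.0)  # assumed idle timeout
        received = 0
        try:
            while True:
                chunk = self.request.recv(4096)
                if not chunk:
                    break
                received += len(chunk)
        except (socket.timeout, ConnectionError):
            pass
        # Record who connected and how much they sent, for later reporting.
        self.server.log.append((self.client_address[0], received))


class CatchAllSink(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

    def __init__(self, address):
        super().__init__(address, CatchAllHandler)
        self.log = []
```

A long-running deployment would call serve_forever() on a CatchAllSink instance bound to the sink's address; because the handler never writes to the socket, the sink reveals nothing about itself beyond accepting the connection.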

6.4 Raw iron management

Researchers have long known of the use of VM-detecting anti-forensics capabilities in malware toolkits [8]. Using a variety of techniques, the malware attempts to identify virtualized execution, which it perceives as a tell-tale sign of attempted forensic malware analysis. Rather than attempt to develop specific countermeasures (such as renaming interfaces or hiding other VM artifacts), GQ bypasses this problem by providing a group of identically configured small form-factor x86 systems running on a network-controlled power sequencer to enable remote, OS-independent reboots. Like our virtualized inmates, each raw iron system uses its own exclusive VLAN. From the gateway’s viewpoint, the only difference in the inmate hosting platform is the presence of an additional DHCP service and a separate reimaging process, executed by a dedicated raw iron controller.

[VLAN 16-17]
Decider = Rustock
Infection = rustock.100921.*.exe

[VLAN 18-19]
Decider = Grum
Infection = grum.100818.*.exe

[VLAN 16-19]
Trigger = *:25/tcp / 30min < 1 -> revert

[Autoinfect]
Address = 10.9.8.7
Port = 6543

[BannerSmtpSink]
Address = 10.3.1.4
Port = 2526

Figure 6: Example of a containment server configuration file: GQ will infect the inmates on VLAN IDs 16 and 17 iteratively with binaries from the rustock.100921.*.exe batch, using the “Rustock” containment policy. Similarly, the containment server will deploy Grum binaries on inmates with VLAN IDs 18 and 19 and contain the resulting infections accordingly. On all four VLAN IDs a lifecycle trigger reverts inmates to a clean state whenever the number of flows to TCP port 25 in 30 minutes hits zero. The last two sections specify the location in the subfarm of an auto-infection server and of an SMTP sink, respectively.
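The lifecycle trigger in Figure 6 (`*:25/tcp / 30min < 1 -> revert`) can be read as a sliding-window flow counter. The sketch below is one possible interpretation: the trigger grammar is inferred from that single example, the host part is parsed but ignored, and the rule of not firing before one full window has elapsed is an assumption.

```python
import re
from collections import deque

# Grammar inferred from the single example in Figure 6; treat as an assumption.
TRIGGER_RE = re.compile(
    r"(?P<host>\S+):(?P<port>\d+)/(?P<proto>\w+)\s*/\s*"
    r"(?P<minutes>\d+)min\s*<\s*(?P<threshold>\d+)\s*->\s*(?P<action>\w+)"
)


class ActivityTrigger:
    def __init__(self, spec, started_at):
        match = TRIGGER_RE.fullmatch(spec.strip())
        if match is None:
            raise ValueError(f"unrecognized trigger: {spec!r}")
        self.port = int(match["port"])
        self.proto = match["proto"]
        self.window = int(match["minutes"]) * 60   # seconds
        self.threshold = int(match["threshold"])
        self.action = match["action"]
        self.started_at = started_at
        self.flows = deque()                       # timestamps of matching flows

    def observe(self, now, port, proto):
        """Record a new outbound flow if it matches the trigger's port/proto."""
        if port == self.port and proto == self.proto:
            self.flows.append(now)

    def check(self, now):
        """Return the action name if the trigger fires at time `now`, else None."""
        while self.flows and self.flows[0] <= now - self.window:
            self.flows.popleft()
        if now - self.started_at < self.window:
            return None  # do not fire before one full window has elapsed
        return self.action if len(self.flows) < self.threshold else None
```

Timestamps are plain seconds supplied by the caller, which keeps the logic testable without real clocks; the containment server would call observe() per flow and check() periodically.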

The raw iron systems’ boot configuration alternates between booting over the network (leading to a subsequent OS image transfer and installation) and booting from local disk when network booting fails (leading to normal inmate execution). The raw iron controller has a network interface on a VLAN trunk which includes all raw iron VLAN IDs. The controller runs a separate Click configuration which creates a virtual network interface that multiplexes and demultiplexes traffic across the VLAN trunk, enabling unmodified Linux servers, including DHCP, TFTP, and NFS, to access the raw iron VLANs transparently.

To reimage a system, we configure the controller’s DHCP server to send PXE boot information in reply to DHCP requests from the inmate. The power sequencer then reboots the inmate and the subsequent network boot installs a small Linux boot image. This image, once booted, downloads a compressed Windows image and writes it to disk using NTFS-aware imaging tools. Next, the controller disables network booting in the DHCP server and again power-cycles the raw iron inmate. The subsequent reboot yields a local boot of the locally installed OS image. This process takes around 6 minutes per reimaging cycle, sufficiently fast for hosting bots or other malcode. We use a similar mechanism to capture disk images from a suitably configured OS image.

Inmate Activity
===============

Active subfarms: Botfarm, Clickbots

Subfarm 'Botfarm' [Containment server VLAN 11]
------------------------------------------------------

Grum [xxx.yyy.0.170/10.3.9.241, VLAN 18]
----------------------------------------------------
FORWARD
- C&C port
    target                      port   #flows
    50.8.207.91.SteepHost.Net   http   682

REFLECT
- full SMTP containment
    target                      port   #flows
    *.*.*.*                     smtp   144997

REWRITE
- autoinfection 6f007d640b3d5786a84dedf026c1507c
    target                      port   #flows
    10.9.8.7                    6543   6

SMTP sessions         93340
SMTP DATA transfers   93112

Rustock [xxx.yyy.0.164/10.3.9.247, VLAN 7]
----------------------------------------------------
FORWARD
- C&C port
    target                      port   #flows
    *.*.*.*                     https  287

REFLECT
- simple SMTP containment
    target                      port   #flows
    *.*.*.*                     smtp   280620

REWRITE
- C&C filtering
    target                      port   #flows
    *.*.*.*                     http   788
- autoinfection 0740eb0e408ea0b6cfa95981273f89bb
    target                      port   #flows
    10.9.8.7                    6543   21

SMTP sessions         182653
SMTP DATA transfers   284765

Figure 7: Section of actual report generated by the monitoring setup, for the “Botfarm” subfarm. GQ contains one inmate using the “Grum” policy, the other using the “Rustock” one. The REWRITEs show that the containment server employs auto-infection (§ 6.6) for both, along with the MD5 hashes of the actual executables. The number of SMTP flows REFLECTed differs from the total number of SMTP sessions because we configured the SMTP sink to drop connections probabilistically. (Global inmate IP addresses anonymized.)

6.5 Reporting

We realize the reporting component using Bro [23]. We developed an analyzer for the shimming protocol to keep track of all containment activity on the inmate network, and track specific additional classes of traffic as needed (for example, we leverage Bro’s SMTP analyzer to track attempted and succeeding message delivery for our spambots). Bro’s log-rotation functionality then initiates activity reports on an hourly and daily basis. The reports break down activity by subfarm, inmate, and containment decision, allowing us to verify that the gateway enforces these decisions as expected (for example, an unusual number of FORWARD verdicts might indicate a bug in the policy, and absence of any C&C REWRITEs would indicate lack of botnet activity). We also pull in external information to help us verify containment (for example, we check all global IP addresses currently used by inmates against relevant IP blacklists). Figure 7 shows part of a report. The reporting setup extracts all of the per-inmate textual information from network activity.
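The breakdown by subfarm, inmate, and containment decision amounts to a nested tally over per-flow records. GQ implements this in Bro; the Python sketch below only illustrates the aggregation, and the record layout as well as the forward_share sanity signal are assumptions made for the example.

```python
from collections import Counter, defaultdict


def summarize(records):
    """records: iterable of (subfarm, inmate, verdict, annotation, target, port)."""
    report = defaultdict(lambda: defaultdict(Counter))
    for subfarm, inmate, verdict, annotation, target, port in records:
        report[(subfarm, inmate)][(verdict, annotation)][(target, port)] += 1
    return report


def render(report):
    """Produce a plain-text listing: inmate, then verdicts, then flow counts."""
    lines = []
    for (subfarm, inmate), verdicts in sorted(report.items()):
        lines.append(f"{inmate} [subfarm {subfarm}]")
        for (verdict, annotation), flows in sorted(verdicts.items()):
            lines.append(f"{verdict} - {annotation}")
            for (target, port), count in sorted(flows.items()):
                lines.append(f"    {target} {port} {count}")
    return "\n".join(lines)


def forward_share(report, key):
    """Fraction of an inmate's flows that were FORWARDed (a sanity signal)."""
    total = forwarded = 0
    for (verdict, _annotation), flows in report[key].items():
        count = sum(flows.values())
        total += count
        if verdict == "FORWARD":
            forwarded += count
    return forwarded / total if total else 0.0
```

A sudden rise in forward_share for an inmate is the kind of anomaly that, per the text, might point to a policy bug and deserves a closer look.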

6.6 Auto-infection & batch processing

Automating the infection (and re-infection, after a revert to clean state) of inmates is a key ingredient for saving us time operating GQ. We have implemented an auto-infection routine as follows. We configured auto-infection inmate master images to run a custom infection script at first boot (subsequent reboots should not trigger reinfection, as some malware intentionally triggers reboots itself). This script then contacts an HTTP server at a preconfigured address and port, requests the malware sample, and executes it, thus completing the infection. Note that we can realize the HTTP server as a REWRITE containment, simplifying the implementation substantially: the containment server observes the attempted HTTP connection anyway, and can thus proceed to impersonate the simple HTTP server needed to serve the infection. We realize this in a separate containment class that serves as a base class for all policies that operate using auto-infection. Again, VLAN IDs drive selection of a malware sample, as the configuration in Figure 6 illustrates. Processing batches of malware samples follows as a simple generalization: instead of serving the same sample repeatedly, we maintain the batch as a list of files and serve them sequentially.
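The batch generalization can be sketched as a per-VLAN cursor over the files matching that VLAN's Infection pattern. BatchServer is a hypothetical name, and two details are assumptions not stated in the text: each VLAN keeps its own cursor, and the cursor wraps around once the batch is exhausted.

```python
import fnmatch


class BatchServer:
    def __init__(self, infections, available):
        """infections: VLAN ID -> glob pattern; available: sample file names."""
        self.batches = {vlan: sorted(fnmatch.filter(available, pattern))
                        for vlan, pattern in infections.items()}
        self.cursor = {vlan: 0 for vlan in infections}

    def next_sample(self, vlan):
        """Return the file name to serve for this VLAN's next infection."""
        batch = self.batches[vlan]
        if not batch:
            raise LookupError(f"no samples match for VLAN {vlan}")
        index = self.cursor[vlan]
        # Wrap-around after the last sample is an assumption.
        self.cursor[vlan] = (index + 1) % len(batch)
        return batch[index]
```

Serving a single fixed sample is then just the special case of a batch with one file.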

6.7 External network address space

In practice we operate GQ using a set of five /24 networks. We use one network only to make our experiments’ control infrastructure (at least each subfarm’s containment servers, potentially also sink servers and other machinery) available externally, leaving four networks for our inmate population. The size of these allocations forms an upper bound on the number of inmates we can accommodate at any given time, but has so far proved sufficient: we experience no address tainting, i.e., the need to move around in our address range to preserve inmate functionality.

7 Operational Experiences

While containment policy development can easily feel like a chore, we have found that developing tight containment often forms a key ingredient for understanding the behavior of a sample under study. We now show examples of such insights learned during containment development.

Absence of activity. During our operation of GQ as a worm-capturing honeyfarm at the beginning of 2006, we captured 66 distinct worms belonging to 14 different malware families (according to classification by Symantec’s anti-virus products at the time). Table 1 summarizes the captures. Note the high variation in incubation times: nine infection classes required more than three minutes on average for a propagation. This illustrates the importance of long-duration execution to observe the desired behavior, even in infections that supposedly act quickly.

Unexpected visitors. During our infiltration of the Storm botnet in 2008 [13, 17, 18] we developed a sufficiently complete understanding of this particular malware family that a containment policy seemed to follow naturally. For the C&C-relaying proxy bots in the middle of the Storm hierarchy, we preserved outside reachability of the bots (the requirement for their becoming relay agents as opposed to spam-sourcing drones) and redirected all outgoing activity other than the HTTP-borne C&C protocol to our standard sink server. In June, we noticed apparent FTP connection attempts arriving at our sink server. Closer investigation revealed the attempted use of our Storm bots for iframe injection. An upstream botmaster used the bots’ ability to receive and process SOCKS [14] message headers to initiate FTP connections to specific hosts, authenticate and log in with known credentials, download a specific HTML file, and re-upload it with a malicious iframe tag inserted. At the time, articles on Storm frequently stated that its proxy bots did not themselves engage in malicious activity, and a correspondingly loose containment policy would have allowed these attacks to proceed unhindered.

Mysterious blacklisting. While deploying spambots of the Waledac family [26] in June 2009, we noticed to our great surprise that the Composite Blocking List (CBL) [6] listed our inmates, a strong indication of a possible containment failure. However, the only outside interaction we had permitted was a single test SMTP message, sent to a GMail server, because redirection to our default SMTP sink seemed to cause the bots to cease further activity. Subsequent independent testing revealed that the bots used a recognizable HELO greeting string (wergvan), that Google detected its presence, and that Google informed blacklist providers of the IP addresses of SMTP speakers employing the string. This caused us to abandon the policy of allowing even seemingly innocuous non-spam test SMTP exchanges.

Satisfying fidelity. It soon turned out that Waledac was not the only spambot family whose members paid close attention to the greeting banners served by the SMTP servers. As a consequence, we upgraded our SMTP sink to support banner grabbing for select connections: SMTP requests to a hitherto unseen host now caused the sink to actually connect out to the SMTP server and obtain the greeting message, relaying it back to the spambot. The more closely malware tracks whether the responding entity behaves as expected, the more likely we are to get drawn into an arms race of emulating and testing fidelity.

EXECUTABLE      WORM NAME           # EVENTS / # CONNS  INCUBATION PERIOD (S)
a####.exe       W32.Zotob.E         4 / 3               29.0
a####.exe       W32.Zotob.H         9 / 3               25.2
a####.exe       -                   1 / 3               223.2
cpufanctrl.exe  Backdoor.Sdbot      1 / 4               111.2
chkdisk32.exe   -                   1 / 4               134.7
dllhost.exe     W32.Welchia.Worm    297 / 4 or 6        24.5
enbiei.exe      W32.Blaster.F.Worm  1 / 3               28.9
msblast.exe     W32.Blaster.Worm    1 / 3               43.8
lsd             W32.Poxdar          11 / 8              32.4
NeroFil.EXE     W32.Spybot.Worm     1 / 5               237.5
sysmsn.exe      W32.Spybot.Worm     3 / 3               79.6
MsUpdaters.exe  W32.Spybot.Worm     1 / 5               57.0
ReaIPlayer.exe  W32.Spybot.Worm     2 / 5               95.4
WinTemp.exe     W32.Spybot.Worm     1 / 5               178.4
wins.exe        W32.Spybot.Worm     1 / 5               118.2
msnet.exe       W32.Spybot.Worm     1 / 7               189.4
msgupdates.exe  W32.Spybot.Worm     2 / 5               125.3
ntsf.exe        -                   1 / 5               459.4
scardsvr32.exe  W32.Femot.Worm      4 / 3               46.2
scardsvr32.exe  W32.Femot.Worm      1 / 3               66.5
scardsvr32.exe  W32.Femot.Worm      55 / 3              96.6
scardsvr32.exe  W32.Femot.Worm      2 / 3               179.6
scardsvr32.exe  W32.Femot.Worm      1 / 5               49.3
scardsvr32.exe  W32.Femot.Worm      4 / 3               41.4
scardsvr32.exe  W32.Femot.Worm      1 / 3               41.1
scardsvr32.exe  W32.Valla.2048      1 / 5               32.2
scardsvr32.exe  -                   1 / 7               54.8
scardsvr32.exe  W32.Pinfi           1 / 3               180.8
x.exe           W32.Korgo.Q         17 / 2              6.6
x.exe           W32.Korgo.T         7 / 2               9.5
x.exe           W32.Korgo.V         102 / 2             6.0
x.exe           W32.Korgo.W         31 / 2              5.9
x.exe           W32.Korgo.Z         20 / 2              6.6
x.exe           W32.Korgo.S         169 / 2             6.6
x.exe           W32.Korgo.S         15 / 2              8.6
x.exe           W32.Korgo.V         2 / 2               24.4
x.exe           W32.Licum           2 / 2               7.9
x.exe           W32.Korgo.S         3 / 2               10.4
x.exe           W32.Pinfi           1 / 2               329.7
x.exe           W32.Korgo.V         6 / 2               11.3
x.exe           W32.Pinfi           5 / 2               20.1
x.exe           W32.Pinfi           5 / 2               24.9
x.exe           W32.Pinfi           2 / 2               27.5
x.exe           W32.Korgo.V         1 / 2               27.5
x.exe           W32.Pinfi           1 / 2               63.1
x.exe           W32.Korgo.W         1 / 2               76.1
x.exe           W32.Korgo.S         1 / 2               18.0
x.exe           W32.Pinfi           1 / 2               58.2
x.exe           W32.Korgo.S         1 / 2               210.9
xxxx...x        Backdoor.Berbew.N   844 / 2             9.4
xxxx...x        W32.Info.A          34 / 2              7.2
xxxx...x        Trojan.Dropper      685 / 3             10.0
xxxx...x        W32.Pinfi           1 / 3               32.5
xxxx...x        W32.Pinfi           3 / 2               34.2
n/a             W32.Korgo.C         3 / 2               4.8
n/a             W32.Korgo.L         1 / 2               7.0
n/a             W32.Korgo.G         8 / 2               4.1
n/a             W32.Korgo.N         2 / 2               5.3
n/a             W32.Korgo.G         3 / 2               5.4
n/a             W32.Korgo.E         1 / 2               5.6
n/a             W32.Korgo.gen       1 / 2               5.0
n/a             W32.Korgo.I         15 / 2              4.3
n/a             W32.Korgo.I         1 / 2               5.4
multiple        W32.Muma.A          2 / 7               186.7
multiple        W32.Muma.B          2 / 7               208.9
multiple        BAT.Boohoo.Worm     1 / 72              384.9

Table 1: Self-propagating worms caught by GQ in early 2006. Events correspond to infections with the same executable. Next come the numbers of connections needed per infection to complete the propagation, and incubation times (the delay from initial infection in our farm to subsequent infection of the next inmate). We show delays of over 3 minutes in bold.
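Banner grabbing for hitherto unseen hosts is essentially a cache in front of a one-time outbound lookup. The sketch below is an illustration only: BannerCache, fetch_banner, and the fallback greeting are hypothetical, and only the first line of a (possibly multi-line) greeting is read.

```python
import socket


def fetch_banner(host, port=25, timeout=10.0):
    """Connect to the real server and return the first greeting line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as stream:
            line = stream.readline(512)
    return line.decode("ascii", "replace").rstrip("\r\n")


class BannerCache:
    def __init__(self, fetch=fetch_banner, fallback="220 mail.example.com ESMTP"):
        self.fetch = fetch
        self.fallback = fallback  # assumed generic greeting
        self.banners = {}

    def banner_for(self, host):
        """Return the greeting to present for `host`, contacting it at most once."""
        if host not in self.banners:
            try:
                self.banners[host] = self.fetch(host)
            except OSError:
                self.banners[host] = self.fallback
        return self.banners[host]
```

Passing the fetch function in as a parameter keeps the outbound connection explicit, so it can itself be routed through containment or replaced by a stub during testing.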

Protocol violations. On several occasions during our ongoing extraction of spam from spambots [17], our spam harvest accounting looked healthy at the connection level (since lots of connections ensued), but upon closer inspection meager at the content level (since for some bot families no actual message body transmission occurred). Closer investigation revealed that our SMTP sink protocol engine followed the SMTP specification [25] too closely, preventing the protocol state machine from ever reaching the DATA stage. The protocol discrepancies included such seemingly mundane details as repeated HELO/EHLO greetings or the format of email addresses in MAIL FROM and RCPT TO stanzas (with or without colons, with or without angle brackets).
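The discrepancies listed above suggest a deliberately lenient envelope parser and session handler. The following sketch shows one way to tolerate them; the regular expression, the reply strings, and the LenientSession class are assumptions for illustration and do not reproduce GQ's sink.

```python
import re

# Accept "MAIL FROM" / "RCPT TO" with or without a colon and angle brackets.
ENVELOPE_RE = re.compile(
    r"^(MAIL\s+FROM|RCPT\s+TO)\s*:?\s*<?\s*([^<>\s]*)\s*>?", re.IGNORECASE)


def parse_envelope(line):
    """Return ("MAIL" | "RCPT", address) or None if the line is neither."""
    match = ENVELOPE_RE.match(line.strip())
    if match is None:
        return None
    verb = match.group(1).split()[0].upper()
    return verb, match.group(2)


class LenientSession:
    def __init__(self):
        self.sender = None
        self.recipients = []

    def handle(self, line):
        """Process one command line and return the reply to send."""
        words = line.split()
        verb = words[0].upper() if words else ""
        if verb in ("HELO", "EHLO"):
            return "250 ok"      # tolerated at any point, any number of times
        envelope = parse_envelope(line)
        if envelope and envelope[0] == "MAIL":
            self.sender = envelope[1]
            return "250 ok"
        if envelope and envelope[0] == "RCPT":
            self.recipients.append(envelope[1])
            return "250 ok"
        if verb == "DATA":
            return "354 go ahead" if self.recipients else "503 need RCPT first"
        return "250 ok"
```

The point of the leniency is that a bot repeating its greeting or omitting colons and brackets still reaches the DATA stage, so the message body can be harvested.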

Exploratory containment. Containment policy development need not always try to facilitate continuous malware operation. To better understand a sample, we frequently find it equally important to create specific environmental conditions under which the sample will exhibit new behavior. Before our infiltration of Storm, we tried to understand the meaning of the error codes returned in Storm’s delivery reports [17] using a dual approach of live experimentation, in which we exposed the samples to specific error conditions during SMTP transactions, and binary analysis. Interestingly, neither approach could individually explain all behavior, but by alternating between the two (live experimentation confirming conjectures developed during binary analysis, and vice versa) we could solve the case. We used this approach again during our infiltration of the MegaD botnet [4]. Here, live experimentation allowed us to confirm the correct functionality of the extracted C&C protocol engine. During a recent investigation of clickbots [20] we used the approach to understand the precise HTTP context of some of the bots’ C&C requests.

Unclear phylogenies. When we take a set of specimens supposedly belonging to malware family X and subject it to a corresponding containment policy, we implicitly assume that the notion of a malware family X is valid and that samples belonging to family X will exhibit behavioral similarity. We find this assumption increasingly violated in practice. We have repeatedly noticed that third parties label samples inconsistently, and have even observed cases of split personalities in malware: in February 2010 we encountered a specimen that at times showed MegaD’s C&C behavior, and at other times behaved like a member of the Grum family. While tight containment ensures that we cause no harm when containment policy and observed behavior mismatch, it puts into question the idea of developing a library of containment policies from which we can easily pick the appropriate one. A batch-processing setup that enables some extent of automated family classification is thus an important tool to have. To this end, we reflect all outgoing network activity to our catch-all sink and apply network-level fingerprinting on the samples’ initial activity trace. We have successfully used this approach to classify over one million malware samples that we harvested from pay-per-install distribution servers [3].

7.1 Scalability

Traditional honeyfarms such as Potemkin [28] place strong emphasis on scalability of the inmate population: these systems quickly “flash-clone” new inmates to provide additional infectees as needed. For the worm model this made sense: it requires infectee chains to reliably identify infections, and the assumption that any incoming flow may constitute a potential worm infection implies the need to set aside resources potentially for each sender of traffic received in the honeyfarm’s IP address range. We feel today’s requirements for malware execution have shifted. Self-propagating malware infections are not extinct, but much less relevant and typically use entirely different vectors (such as the user’s activity on specific websites, as in the case of Koobface [27]). Correspondingly, careful filtering and resource management related to unsolicited, inbound traffic often remains a non-issue, particularly when considering the typical scenario of home-user machines deployed behind network address translators. We find it more important to achieve scalability in terms of providing independent experiments at varying stages of “production-readiness” (including development of the farm architecture itself) with moderate requirements of inmate population size, and convenient mechanisms for intentional, controlled infection of the inmate population. GQ’s architecture reflects these considerations.

The architecture we outlined in § 5 constrains scalability as follows. First, VLAN IDs are a limited resource. The IEEE 802.1Q standard limits the VLAN ID to twelve bits, allowing up to 4096 inmates. However, physical switches frequently support fewer. We can avoid this limitation naturally by moving to multiple inmate networks and prepending a gateway-internal network identifier to VLAN IDs for purposes of identifying inmates. Second, with a large number of inmates in a single subfarm, a single containment server becomes a bottleneck, as it has to interpose on all flows in the subfarm. We can defuse the situation straightforwardly by moving to a cluster of containment servers, managed by the subfarm’s packet router. We would merely need to extend the router’s flow state table entries to include an identifier of the containment server responsible for the flow. Several containment server selection policies come to mind, such as random selection under the constraint that the same containment server always handles the same inmate. Third, similarly to the containment server, the central gateway itself becomes a bottleneck as the number of inmates and subfarms grows. To date, we have not found this problematic: the same 3 GHz quad-core Xeon machine with 5 GB of memory that we deployed six years ago still serves us well today, running 5-6 subfarms in parallel with populations ranging from a handful to a dozen inmates. Fourth, our globally routable IP address space is of limited size, creating an incentive to avoid blacklisting and leaking of our actively used addresses at all cost to reduce the rate at which we “burn through” this address space. Currently we find our address allocations sufficient. Should this change, we may opt to use GRE tunnels in order to connect additional routable address space available in other networks (provided by colleagues or interested third parties) to the system.
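The first two remedies can be made concrete in a few lines. In this sketch, inmate_key prepends a hypothetical gateway-internal network identifier to the 12-bit VLAN ID, and pick_server uses a hash of that key in place of true random selection, which is a substitution made here so that the same inmate always maps to the same containment server without extra state.

```python
import hashlib

VLAN_BITS = 12  # IEEE 802.1Q VLAN ID width


def inmate_key(network_id, vlan_id):
    """Combine a gateway-internal network ID with a VLAN ID into one key."""
    if not 0 <= vlan_id < (1 << VLAN_BITS):
        raise ValueError("VLAN ID must fit in 12 bits")
    return (network_id << VLAN_BITS) | vlan_id


def pick_server(key, servers):
    """Deterministically map an inmate key to one server in the cluster."""
    digest = hashlib.sha256(str(key).encode("ascii")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

One limitation of this simple modulo mapping is that changing the number of servers reassigns many inmates; a router that stores the chosen server in its flow state table, as the text proposes, would not have that problem.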

8 A Methodology for Developing Containment Policies

GQ currently does not by itself help to automate the development of containment policies. However, it provides the basis for a principled approach to containment policy development. When deploying new malware samples, we employ the following strategy. Beginning from a complete default-deny of interaction with the outside world, we execute the specimen in a subfarm providing a sink server that accepts arbitrary traffic without meaningfully responding to it, and to which the containment server reflects all outbound traffic. We thus understand the extent to which the specimen comes alive, and can inspect the nature of the attempted communication and in the best case infer the precise intent, or at least attempt to distinguish C&C traffic from malicious activity. We can then whitelist suspected-safe traffic for outside interaction, in the most narrow fashion possible. For example, we only allow an HTTP GET request, seemingly for C&C instructions, with similar URL path structure. Opening up HTTP in general would be too permissive, as malware might use HTTP both for C&C and for a burst of SQL injections. We then iterate the process over repeated executions of the specimen until we arrive at a containment policy that allows just the C&C lifeline onto the Internet, while containing malicious activity inside GQ.
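The narrow HTTP whitelist from the example above might look as follows. The notion of "similar URL path structure" is made concrete here by a hypothetical shape function that collapses letter runs and digit runs; the class name, the host check, and the choice of REFLECT as the fallback verdict are likewise assumptions of this sketch.

```python
import re


def shape(request_target):
    """Generalize a request target: letter runs -> 'A', digit runs -> 'N'."""
    generalized = re.sub(r"[A-Za-z]+", "A", request_target)
    return re.sub(r"[0-9]+", "N", generalized)


class NarrowHttpWhitelist:
    def __init__(self, cnc_host, observed_target):
        self.cnc_host = cnc_host.lower()
        self.allowed_shape = shape(observed_target)

    def verdict(self, method, host, request_target):
        """FORWARD only GETs to the C&C host whose target has the known shape."""
        if (method == "GET" and host.lower() == self.cnc_host
                and shape(request_target) == self.allowed_shape):
            return "FORWARD"
        return "REFLECT"  # everything else stays inside the farm, at the sink
```

Because the shape covers the whole request target, a GET carrying an unexpected query string (for instance an injection attempt) no longer matches and stays contained.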

We fully acknowledge the manual and (depending on the specimen at hand) time-consuming nature of this process, but crucially, GQ makes this process both explicit and feasible. We see substantial potential for facilitating automation. For example, leveraging recent results of automated C&C protocol extraction from binaries [4, 16] could aid in understanding the significance of individual flows. The language in which we express containment policies in the containment server forms another area of future work. The primary reason for our current use of Python is experience and convenience, but the general-purpose nature of the language complicates the creation of a tool-chain for processing policies. For example, a traffic generation tool that can automatically produce test cases for a given concrete containment policy would strengthen confidence in the policy’s correctness significantly. A more domain-specific, abstract language (like in Bro [23]) could simplify this.

It behooves us to contemplate the consequences of a containment arms race, in which malware authors explicitly try to defeat our approach to containment. It is easy to see, as mentioned by John et al. [12], that botmasters can construct circumstances in which malware will cease to function as desired, despite our best efforts. For example, spam campaigners could scatter control groups of email addresses among the spam target lists and require successful delivery of messages to these addresses to keep a bot spamming. However, note the conceptual difference between achieving the desired goal of executing malware (say, longitudinal harvesting of spam messages) and tight containment preventing harm to others. While the former may fail, in our experience we can guarantee the latter, given tight containment. We are thus optimistic that we can operate malware safely up to the point where attack and control traffic become so blended that we can no longer meaningfully distinguish them.

9 Conclusion

The GQ malware farm introduces containment as a first-order component in support of malware analysis. In building and operating GQ for six years, we have come to treat containment itself as a tool that can improve our understanding of malware under study. While modern malware increasingly resists longitudinal analysis in completely contained environments, GQ not only allows but encourages flexible and precise containment policies while maintaining acceptable safety. With ongoing development, we anticipate it continuing to provide core functionality for malware and botnet analysis well into the future.

References

[1] P. Barford and M. Blodgett. Toward botnet mesocosms.In Proceedings of the First Workshop on Hot Topics inUnderstanding Botnets, Berkeley, CA, USA, 2007.USENIX Association.

[2] U. Bayer, C. Kruegel, and E. Kirda. TTAnalyze: A toolfor analyzing malware. In 15th Annual Conference of theEuropean Institute for Computer Antivirus Research(EICAR), 2006.

[3] J. Caballero, C. Grier, C. Kreibich, and V. Paxson.Measuring Pay-per-Install: The Commoditization ofMalware Distribution. In Proceedings of the 20thUSENIX Security Symposium, San Francisco, CA, USA,August 2011.

[4] J. Caballero, P. Poosankam, C. Kreibich, and D. Song.Dispatcher: Enabling active botnet infiltration usingautomatic protocol reverse-engineering. In Proceedingsof the 16th ACM CCS, pages 621–634, Chicago, IL,USA, November 2009.

[5] J. Calvet, C. R. Davis, J. M. Fernandez, J.-Y. Marion,P.-L. St-Onge, W. Guizani, P.-M. Bureau, andA. Somayaji. The case for in-the-lab botnetexperimentation: creating and taking down a 3000-nodebotnet. In Proceedings of the 26th ACSAC Conference,pages 141–150, New York, NY, USA, 2010. ACM.

[6] CBL. Composite Blocking List.http://cbl.abuseat.org, 2003.

[7] J. Chen, J. McCullough, and A. C. Snoeren. UniversalHoneyfarm Containment. Technical ReportCS2007-0902, UCSD, September 2007.

[8] X. Chen, J. Andersen, Z. Mao, M. Bailey, and J. Nazario.Towards an understanding of anti-virtualization andanti-debugging behavior in modern malware. InProceedings of the 38th Conference on DependableSystems and Networks (DSN), pages 177–186. IEEE,2008.

[9] W. Cui, V. Paxson, and N. Weaver. GQ: Realizing aSystem to Catch Worms in a Quarter Million Places.Technical Report TR-06-004, International ComputerScience Institute, September 2006.

[10] A. W. Jackson, D. Lapsley, C. Jones, M. Zatko,C. Golubitsky, and W. T. Strayer. SLINGbot: A Systemfor Live Investigation of Next Generation Botnets. InProceedings of the 2009 Cybersecurity Applications &Technology Conference for Homeland Security, pages313–318, Washington, DC, USA, 2009. IEEE ComputerSociety.

[11] X. Jiang and D. Xu. Collapsar: A VM-based architecturefor network attack detention center. In Proceedings ofthe 13th USENIX Security Symposium, page 2. USENIXAssociation, 2004.

[12] J. John, A. Moshchuk, S. Gribble, andA. Krishnamurthy. Studying spamming botnets usingBotlab. In Proceedings of the 6th USENIX Symposium

13

http://cbl.abuseat.org

on Networked Systems Design and Implementation,pages 291–306. USENIX Association, 2009.

[13] C. Kanich, C. Kreibich, K. Levchenko, B. Enright, G. M.Voelker, V. Paxson, and S. Savage. Spamalytics: Anempirical analysis of spam marketing conversion. InProceedings of the 15th ACM Conference on Computerand Communications Security, pages 3–14, Alexandria,Virginia, USA, October 2008.

[14] D. Koblas. SOCKS. In Proceedings of the 3rd USENIXSecurity Symposium. USENIX Association, September1992.

[15] E. Kohler, R. Morris, B. Chen, J. Jannotti, andM. Kaashoek. The Click modular router. ACMTransactions on Computer Systems (TOCS),18(3):263–297, 2000.

[16] C. Kolbitsch, T. Holz, C. Kruegel, and E. Kirda.Inspector Gadget: Automated extraction of proprietarygadgets from malware binaries. In 2010 IEEESymposium on Security and Privacy, pages 29–44. IEEE,2010.

[17] C. Kreibich, C. Kanich, K. Levchenko, B. Enright, G. M.Voelker, V. Paxson, and S. Savage. On the SpamCampaign Trail. In Proceedings of the First USENIXWorkshop on Large-scale Exploits and Emergent Threats(LEET), San Francisco, USA, April 2008.

[18] C. Kreibich, C. Kanich, K. Levchenko, B. Enright, G. M. Voelker, V. Paxson, and S. Savage. Spamcraft: An inside look at spam campaign orchestration. In Proceedings of the Second USENIX Workshop on Large-scale Exploits and Emergent Threats (LEET), Boston, USA, April 2009.

[19] McAfee Labs. Threats Report. http://www.mcafee.com/us/resources/reports/rp-quarterly-threat-q3-2010.pdf, October 2010.

[20] B. Miller, P. Pearce, C. Grier, C. Kreibich, and V. Paxson. What’s Clicking What? Techniques and Innovations of Today’s Clickbots. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), volume 6739 of Lecture Notes in Computer Science. Springer, July 2011.

[21] J. Mirkovic, T. V. Benzel, T. Faber, R. Braden, J. T. Wroclawski, and S. Schwab. The DETER project: Advancing the science of cyber security experimentation and test. In IEEE Intl. Conference on Technologies for Homeland Security (HST), page 7, November 2010.

[22] Norman. Norman SandBox. http://www.norman.com/security_center/security_tools/.

[23] V. Paxson. Bro: A System for Detecting Network Intruders in Real-Time. In Proceedings of the 7th USENIX Security Symposium, pages 31–51, 1998.

[24] A. Pitsillidis, K. Levchenko, C. Kreibich, C. Kanich, G. Voelker, V. Paxson, N. Weaver, and S. Savage. Botnet Judo: Fighting Spam with Itself. In Proceedings of the 17th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, March 2010.

[25] J. Postel. Simple Mail Transfer Protocol. RFC 821, August 1982.

[26] G. Tenebro. W32.Waledac Threat Analysis. http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/W32_Waledac.pdf, 2009.

[27] N. Villeneuve. Koobface: Inside a Crimeware Network. http://www.infowar-monitor.net/reports/iwm-koobface.pdf, November 2010.

[28] M. Vrable, J. Ma, J. Chen, D. Moore, E. Vandekieft, A. Snoeren, G. Voelker, and S. Savage. Scalability, fidelity, and containment in the Potemkin virtual honeyfarm. ACM SIGOPS Operating Systems Review, 39(5):148–162, 2005.

[29] Y. Wang, D. Beck, X. Jiang, and R. Roussev. Automated Web Patrol with Strider Honeymonkeys: Finding Web Sites that Exploit Browser Vulnerabilities. In Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, March 2006.

[30] C. Willems, T. Holz, and F. Freiling. Toward automated dynamic malware analysis using CWSandbox. IEEE Security & Privacy, pages 32–39, 2007.
