Defending Network-Centric Systems using Backdoors

Liviu Iftode, Arati Baliga, Aniruddha Bohra, Stephen Smaldone, and Florin Sultan

Department of Computer Science, Rutgers University, Piscataway, NJ 08854-8019, U.S.A.

{iftode, aratib, bohra, smaldone, sultan}@cs.rutgers.edu

As computing systems increasingly depend on networking, they are also becoming more vulnerable to network malfunction or misuse. Human intervention is not a solution when computer system monitoring and repair must be done fast and reliably regardless of scale, network availability, or system impairment. Future network-centric systems must be built around a defensive architecture that allows computers to take care of themselves.

In this paper, we argue that the solution to building self-defending computer architectures is a Backdoor, which can support automated observation and intervention on a computer system’s memory without involving its operating system. Backdoors can therefore execute even when the functionality of the operating system of a critical system has been severely compromised and the system is no longer accessible through the primary network. Backdoors can be realized in hardware over a programmable network interface or in software over a virtual machine monitor.

I. Motivation and Goals

Despite decades of work on verification techniques, fault tolerance, and security, computers and networks remain vulnerable to failures and attacks. Increasing system complexity certainly does not help; even worse, it makes human-assisted monitoring, maintenance, and intervention not only prohibitively costly but also unacceptably slow and sometimes ineffective. This problem will only become worse as the number of computing devices per person increases and the vision of pervasive computing approaches its realization. In response to these trends, research focus has recently started to shift from eliminating failures towards designing systems that can withstand them and recover fast [1].

What makes computers ultimately and persistently vulnerable? We believe that the ultimate cause of computer vulnerability is the computer architecture itself. Placing trust in the operating system does not guarantee dependability as long as the operating system executes from main memory, which can be accidentally or intentionally violated. Moreover, although operating systems have been extensively crafted to be safe, holes continue to exist and will always exist. A common source of failures, operating system extensibility, has recently been addressed by projects like Nooks [2]. More recent research has suggested moving the trust from software to hardware [3]. The problem with this approach is that it requires a modified processor architecture, which is not trivial to deploy across the huge computing infrastructure operational today. The question is whether we can improve computer and network dependability in the presence of failures and attacks by strengthening rather than replacing the existing architectural framework.

A recent example of technological prowess, the remote repair of one of the Mars rover computers, holds, in our opinion, the answer. In the Mars rover case, a computer hung and then entered a pattern of periodic reboots because it ran out of flash memory due to an excessive, unanticipated accumulation of logged data. What made repair possible was a provision that allowed remapping of the memory used during reboot from flash memory to RAM, creating an alternate execution path that bypassed the faulty computer resource [4]. This example suggests building systems capable of automated monitoring and intervention when their main functionality is compromised by faults or attacks. However, unlike the rover computer, the solution must be cost-effective and easy to deploy on existing computing platforms, using only conventional processor, system, and OS architectures.

In this paper, we argue for augmenting existing computer systems and networks with trusted, multi-layered “last lines of defense.” A last line of defense provides a last resort for access to and action on resources of a computer even when processors or I/O devices are not available due to failures or attacks. We describe our vision of defensive architectures built around the existing computing infrastructure.

This work is supported in part by NSF CAREER Award CCR 0133366.

American Institute of Aeronautics and Astronautics

The basic building block of Defensive Architectures (DAs) is the Backdoor (BD), a trusted intelligent device that sits on the I/O bus of the system and can connect to other similar devices through a local fabric or a secure private network. Backdoors can be programmed to execute various defensive activities such as monitoring, logging, diagnosis, and healing, using local and remote access operations to memory and I/O devices. To be secure, a backdoor should not be observable or controllable by the local processor or by any single remote backdoor. In particular, it must be resistant to tampering by malicious or faulty software running on its host computer.

In DA, the backdoors will be programmed to execute defensive activities to protect computers and networks of computers in three defensive architectures: (i) Defensive Computer Architecture (DCA) - a conventional computer architecture augmented with a backdoor interface; (ii) Defensive Network Architecture (DNA) - a local network of backdoors, each of them attached to a computer; (iii) Defensive Inter-Network Architecture (DINA) - a private network of DNAs, potentially distributed over the wide area.

Achieving these goals requires solutions to two significant research challenges. First, we must be able to build backdoor capabilities into an existing intelligent network interface (I-NIC). In addition to local and remote memory read/write operations, the I-NIC must be capable of (i) initiating I/O operations with other local or remote I/O devices without involving the host processors, and (ii) preventing local processors or other I/O devices from turning off or compromising BD functionality. Second, we must design a software architecture that can be programmed to execute a wide range of defensive actions within the existing hardware and software platforms.

We believe that DA will radically change the way we design survivable systems, leading to a new area of research with a rich problem space. At the same time, because a Defensive Architecture can be implemented using off-the-shelf technology, it is immediately applicable to real-world systems that must be resilient to failures and attacks. This will address the dependability problem and reduce the human cost that now confronts owners and system administrators who have to monitor, configure, and maintain large-scale systems.

II. The Backdoor

A. Backdoor Architecture

The Backdoor architecture is a combination of hardware and low-level software (firmware). The BD hardware is essentially a programmable network interface (I-NIC), i.e., a NIC equipped with its own processor and memory, that (i) sits on the I/O bus of the computer, and (ii) can connect to backdoors of other computers through a trusted, secure network fabric/backplane.

Placing processing power on a NIC has long been used in commercial implementations to increase communication performance by implementing low-latency, high-bandwidth communication models [5, 6], accelerating and/or offloading execution of IP stack protocols [7, 8], offloading remote storage protocols [6, 7, 9], etc. While in all these examples some functionality is provided by a NIC, they differ in how flexibly they can be programmed. In BD, we specifically rely on the programmability of the NIC to implement access operations on the resources of its host computer.

Figure 1 depicts the BD software architecture. The “door” in the top left corner represents a (logical) access port controlled by the BD firmware. The door status (open/closed) indicates whether the BD accepts accesses to its resources initiated by the local processors or other I/O devices. When a BD is powered on, the door is open. This allows the local OS to program the BD to perform the desired operations. During normal operation (after initialization), the BD can leave the door open or it can close it. Once the door is closed (at the end of the initialization phase), it cannot be opened again by any action performed by the local processors or I/O devices. Accesses to local memory or I/O devices initiated by the BD are always possible, regardless of the status of the door.

The BD software architecture consists of the following components: (i) an operation queue where BD operations sent by the local processors and/or by the remote BDs are stored, (ii) an operation scheduler that dispatches queued operations repeatedly, according to their specified frequency, (iii) a basic engine that executes local or remote operations, (iv) an inter-BD protocol module, and (v) specialized functional units implementing complex actions (e.g., operation agreement for validating remote operations).
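
The interplay between the operation queue and the scheduler can be illustrated with a minimal sketch. It is not the BD firmware: the class name `OperationScheduler`, the use of abstract integer ticks as the time base, and the operation names in the usage note are assumptions made only for this example.

```python
import heapq


class OperationScheduler:
    """Dispatch queued operations repeatedly, each at its own period (in ticks)."""

    def __init__(self):
        # Heap entries: (next_due_tick, sequence_number, period, operation).
        # The unique sequence number breaks ties between equal due ticks.
        self._heap = []
        self._seq = 0

    def enqueue(self, operation, period, now=0):
        """Add an operation that must run every `period` ticks, starting after `now`."""
        if period <= 0:
            raise ValueError("period must be positive")
        heapq.heappush(self._heap, (now + period, self._seq, period, operation))
        self._seq += 1

    def run_until(self, tick):
        """Return (due_tick, operation) pairs for everything due up to `tick`."""
        dispatched = []
        while self._heap and self._heap[0][0] <= tick:
            due, seq, period, op = heapq.heappop(self._heap)
            dispatched.append((due, op))  # a real engine would execute op here
            heapq.heappush(self._heap, (due + period, seq, period, op))
        return dispatched
```

For instance, after enqueuing a hypothetical `"check_irq"` operation with period 2 and `"check_fs"` with period 5, `run_until(6)` dispatches `check_irq` at ticks 2, 4, and 6 and `check_fs` at tick 5.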

Figure 1. The Backdoor software architecture.

Remote operations are forwarded to the BD of a corresponding peer. The inter-BD protocol must include the door status with each remote request. On receiving an operation from a remote peer, a BD can decide to service it right away (if the door status is closed) or validate it before servicing it (if the door status is open). This is necessary because an operation received from an open backdoor could be initiated or corrupted by its host OS, which the BD may not trust. To handle this case, the BD must be programmed to validate “insecure” remote operations through an agreement scheme that requires a select set of backdoors to send the same operation.
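
The validation policy for remote operations can be sketched as follows. This is an illustration under stated assumptions, not the authors' protocol: the names `RemoteOp` and `AgreementValidator`, the `quorum` parameter, the list of trusted peers, and the choice to count distinct senders of an identical payload are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteOp:
    """A remote BD operation as it might arrive over the inter-BD protocol (assumed layout)."""
    sender: str         # identity of the sending backdoor
    door_closed: bool   # door status carried with the request
    payload: str        # the operation itself


class AgreementValidator:
    """Service closed-door requests at once; require agreement for open-door ones."""

    def __init__(self, trusted_peers, quorum):
        self.trusted_peers = set(trusted_peers)
        self.quorum = quorum  # assumed: number of distinct peers that must agree
        self.votes = {}       # payload -> set of senders that requested it

    def receive(self, op):
        """Return True if the operation may be serviced now."""
        if op.sender not in self.trusted_peers:
            return False
        if op.door_closed:
            # The sender's host OS could not have initiated or altered the request.
            return True
        # Open door: wait until enough distinct peers send the same operation.
        senders = self.votes.setdefault(op.payload, set())
        senders.add(op.sender)
        if len(senders) >= self.quorum:
            del self.votes[op.payload]
            return True
        return False
```

Counting distinct senders rather than messages means a single compromised host cannot reach the quorum by repeating its own request.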

To implement the backdoor functionality, the I-NIC must support the following operations:

1. Access local and remote memory in cooperation with a remote BD without involving the remote CPU. This capability is present in implementations of remote memory communication in commercial I-NICs [5].

2. Access local and remote I/O devices without involving the local or remote CPU. This capability is supported by device-to-device communication over the I/O bus, present in switch-based I/O interconnects [10].

3. Control accesses to local resources such as the BD control registers and memory. This capability is expected to be incorporated soon in commercial NICs, e.g., Myrinet [11].

B. Backdoor Programmability

To program a BD for defensive activities, a sequence of operations generated by the local host is loaded into the BD operation queue. For convenience, we call this sequence a BD program, not to be confused with the immutable and protected firmware controlling the BD.

Operations in a BD program are (condition, action) tuples (written in a simple scripting language) that define rules of the form “if condition then action”, to be executed periodically by the execution engine of the BD. These rules are “programmed” into the BD by the local host during its safe initialization, or by remote BDs through the inter-BD protocol (without involving local processors). Note that programming a rule does not mean downloading code into the BD. Rather, rules are expressed as parameterized “macro-operations” already implemented in BD functional units. Rules may contain symbolic references to kernel variables, which are translated into physical references when the program is loaded.
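
A rule of this kind can be sketched as data that selects and parameterizes existing macro-operations. Everything concrete below is an assumption made for illustration: the macro-operation names, the symbol table with its addresses, the kernel variable names `irq_count` and `irq_enable`, and the use of a Python dict to stand in for host memory. The paper does not specify the scripting language.

```python
# Hypothetical symbol table: kernel variable name -> physical address.
SYMBOL_TABLE = {"irq_count": 0x1000, "irq_enable": 0x1004}

# Macro-operations assumed to be built into the BD functional units.
# Each takes the host memory (a dict here) followed by its parameters.
MACRO_OPS = {
    "greater_than": lambda mem, addr, limit: mem[addr] > limit,
    "write_word": lambda mem, addr, value: mem.__setitem__(addr, value),
}


def resolve(args, symbols):
    """Translate symbolic references (strings) into physical addresses."""
    return tuple(symbols[a] if isinstance(a, str) else a for a in args)


def load_rule(rule, symbols=SYMBOL_TABLE):
    """Bind a (condition, action) tuple at load time; no code is downloaded."""
    (cond_name, cond_args), (act_name, act_args) = rule
    cond, act = MACRO_OPS[cond_name], MACRO_OPS[act_name]  # unknown names are rejected
    cond_args, act_args = resolve(cond_args, symbols), resolve(act_args, symbols)

    def run(mem):
        """Evaluate the rule once; return True if the action fired."""
        if cond(mem, *cond_args):
            act(mem, *act_args)
            return True
        return False

    return run


# "if irq_count > 1000 then irq_enable := 0"
rule = (("greater_than", ("irq_count", 1000)), ("write_word", ("irq_enable", 0)))
```

Because a rule only names macro-operations and supplies parameters, a rule that refers to an operation the BD does not implement fails at load time instead of introducing new code.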

Programming the BD can be done by the OS through a special BD driver that has only an initialization routine. Its sole purpose is to load the BD operation queue once, at system initialization time, through a secure and protected sequence that does not access any other BD resource. For example: (i) the driver issues commands to the BD to load a program into its operation queue; (ii) when done, the driver signals the end of the sequence; (iii) in response, the BD closes the door to prevent any further accesses and starts execution of the program.
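
The three-step loading sequence and the one-way door can be modeled in a few lines. The class and method names (`Backdoor`, `load`, `end_of_sequence`, `DoorClosedError`) are hypothetical; the sketch only captures the ordering constraint described above, not an actual driver interface.

```python
class DoorClosedError(Exception):
    """Raised when the host touches the BD after the door has closed."""


class Backdoor:
    """Toy model of the one-way door guarding the BD operation queue."""

    def __init__(self):
        self.door_open = True  # the door is open at power-on
        self.queue = []
        self.running = False

    def load(self, operation):
        """Step (i): the driver appends one operation to the queue."""
        if not self.door_open:
            raise DoorClosedError("host access rejected: door is closed")
        self.queue.append(operation)

    def end_of_sequence(self):
        """Steps (ii)-(iii): close the door, then start executing the program."""
        if not self.door_open:
            raise DoorClosedError("host access rejected: door is closed")
        self.door_open = False  # no host-side method reopens it
        self.running = True
```

The model deliberately offers no method that sets `door_open` back to `True`, mirroring the requirement that nothing the local processors or I/O devices do can reopen a closed door.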

A challenging task is guaranteeing the integrity of the BD program, which depends on the degree of trust the BD can place in the OS during initialization. In a trusted environment, having the OS run the driver in single-user mode would suffice to prevent corruption of the loaded BD program. However, in the extreme case when the integrity of the OS, of the driver, and/or of the program cannot be trusted, cryptographic techniques may be required to validate the BD program. Possible solutions rely on the ability to store immutable cryptographic hashes of the valid programs in the NIC only once, in a secure environment. This is acceptable, given that the BD program does not change unless changes are made to the hardware or software configuration and the defensive activities must be reprogrammed. The BD will then check the integrity of the program on every (untrusted) load by the driver, by recomputing and validating the hash over the loaded program.
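
The hash check on every untrusted load can be sketched with Python's standard `hashlib`. The paper does not name a hash function or a program encoding, so SHA-256, the length-prefixed encoding, the `ProgramStore` class, and the example rule strings in the usage note are assumptions; a `frozenset` merely stands in for write-once storage on the NIC.

```python
import hashlib
import hmac


def program_digest(operations):
    """SHA-256 over the program; each operation is length-prefixed to avoid ambiguity."""
    h = hashlib.sha256()
    for op in operations:
        data = op.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()


class ProgramStore:
    """Holds digests of valid programs, recorded once in a secure environment."""

    def __init__(self, trusted_programs):
        # Assumption: this set models immutable storage inside the NIC.
        self._valid = frozenset(program_digest(p) for p in trusted_programs)

    def accept(self, operations):
        """Recompute the digest of a loaded program and compare it to the stored ones."""
        digest = program_digest(operations)
        return any(hmac.compare_digest(digest, good) for good in self._valid)
```

A driver-side load would then be accepted only if `accept` returns `True` for the exact sequence of operations that was recorded, so dropping, reordering, or altering a single rule is detected.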

III. Defensive Architectures

Figure 2. The Defensive Computer Architecture augments an existing computer system with a trusted backdoor.

The backdoor is a basic building block in three Defensive Architectures.

Defensive Computer Architecture (DCA). The DCA (Figure 2) is essentially a computer system augmented with a trusted backdoor placed on the system I/O bus. DCA logically partitions a computer system into “frontdoor” components (processors, memory, I/O controllers, etc.) and its “backdoor” component. Frontdoor components, which provide the base functionality of the system, are usually exposed to interactions with the outside world (e.g., through networking) and are under the control of the OS and application software. This makes them and the functions they provide vulnerable to failures, attacks, software crashes, human errors, etc. In contrast, the backdoor runs trusted and protected firmware that controls the backdoor's interactions both with the host and with other backdoors.

Defensive Network Architecture (DNA). The DNA (Figure 3) is a localized cluster of DCA nodes whose BDs are connected over a high-bandwidth, low-latency interconnection fabric. For a BD to be part of the DNA, it must (i) allow protected remote access to resources of its host (memory, I/O devices, etc.), and (ii) adhere to the inter-BD protocols to authenticate and validate requests.

Although DNA is intended to be under the complete control of secured BDs, it is possible to design a DNA where some or all hosts are involved in the defense. A host can provide, for instance, additional resources (persistent storage, large memory, and a fast processor) for monitoring and integrity constraint validation. When hosts are involved in the defense, the door between them and the local BD might be left open after initialization. In this case, the BDs can be programmed to perform agreement on operations and additional integrity checking.

Figure 3. The Defensive Network Architecture connects BDs of computers in a LAN-based cluster through a specialized interconnect.

Defensive Inter-Network Architecture (DINA). The DINA (Figure 4) is an internetwork of DNA clusters and trusted hosts which are potentially geographically distributed. In DINA, the backdoors of each DNA cluster connect through a DNA Gateway to a secure private inter-cluster network.

The DNA Gateway is a trusted system, i.e., one that is not connected to the WAN or to other I/O devices through which attacks or faults can be generated. The DNA Gateway plays two roles: (i) it tunnels BD operations from one DNA cluster to another over the private network, and (ii) it executes meta-operations such as BD operation multicast.

Figure 4. The Defensive Inter-Network Architecture uses DNA gateways interconnected on a private network to access BDs in nodes/clusters dispersed over the wide area.

IV. Defensive Activities

We describe several examples of Defensive Activities that can be implemented as BD programs.

A. Applications of the DCA

Smart Watchdog. A local BD can be programmed to monitor OS invariants in the host memory or the I/O system and take immediate actions when they are violated. Invariants can be as simple as statistics on a certain OS event, such as the frequency of an interrupt, or as complicated as a file system consistency test. Actions may include turning off the interrupt bit for a malfunctioning device, overwriting a memory region to refresh software state and/or clean up corrupted state of an OS subsystem, halting processors or I/O devices, and, most drastically, rebooting the system.

The idea of using dedicated hardware that is not under the control of the host OS for monitoring, or to augment the functionality of a system, has been used in several previous projects. Custom hardware for sensing, diagnosing, and controlling computer hardware components in a large-scale cluster has been proposed in [12]. Self-securing intelligent devices are proposed in [13] to provide better system security through protected access control. A programmable secure coprocessor that provides highly specialized support for complex secure applications has been developed by IBM [14]. Finally, [15] describes a file integrity verification system based on an independent auditor implemented as an intelligent PCI card that can securely execute integrity checkers like AIDE or Tripwire.

Flight Data Recorder. Event logging by system loggers is usually used for post-mortem analysis of system crashes. However, the tail of the log, which holds the richest information about the succession of events that led to the crash, is often lost because it is kept in system memory. To prevent loss of the event log tail, the BD can act as a “flight data recorder” that keeps a short log over a window of the most recent events. Moreover, the BD can be programmed to dump logs to a local disk by executing direct BD-to-disk DMA transfers on the I/O bus.
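
The “window of most recent events” amounts to a bounded buffer in which new events displace the oldest ones. A minimal sketch, assuming a fixed window size and using a hypothetical `FlightDataRecorder` class (the actual BD would copy events out of host memory and write the dump to disk by DMA, which is not modeled here):

```python
from collections import deque


class FlightDataRecorder:
    """Keep only the most recent `window` events, discarding the oldest first."""

    def __init__(self, window):
        self._events = deque(maxlen=window)

    def record(self, event):
        """Store one event; once the window is full, the oldest event is dropped."""
        self._events.append(event)

    def dump(self):
        """Return the retained log tail, oldest event first."""
        return list(self._events)
```

With a window of 3, recording five events leaves exactly the last three available to `dump`, which is the part of the log a crash would otherwise destroy.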

Buffer Cache Synchronization. Buffer cache synchronization in the event of a crash can be implemented as a BD program that detects the crash (e.g., by monitoring system activity) and writes the buffer cache back to disk to save unsynced data. To prevent the machine from being rebooted while syncing to disk is still in progress, the BD can either block or intercept the host reboot sequence and stall it by synchronizing with the processor commanding the reboot.

B. Applications of the DNA

OS Integrity Verification. An OS subject to failure or attack cannot be fully trusted with accurate self-monitoring and self-checking. DNA solves this problem by performing integrity checks through the secure path established between peer BDs. The BD can verify the integrity of the OS running on its host by periodic comparisons with reference copies running or stored on other machines. On discovering a violation, the BD can take actions to (i) protect other hosts from a corrupted or compromised host by blocking its access to the DNA network, or (ii) refresh or patch the OS state, halt, or reboot the host.
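
One way to picture the periodic comparison is to digest selected memory regions of the host and of the reference copies and report any region that disagrees. This is only a sketch under assumptions: the function names, the use of SHA-256, the representation of memory as byte arrays, and the idea of comparing fixed `(start, length)` regions are not taken from the paper.

```python
import hashlib


def region_digest(memory, start, length):
    """Digest of one memory region (memory is a bytes-like object)."""
    return hashlib.sha256(bytes(memory[start:start + length])).hexdigest()


def check_integrity(local_memory, reference_memories, regions):
    """Return the regions whose local contents differ from any reference copy.

    With no reference copy available, every region is reported, because
    nothing can be verified.
    """
    violations = []
    for start, length in regions:
        local = region_digest(local_memory, start, length)
        refs = {region_digest(m, start, length) for m in reference_memories}
        if refs != {local}:
            violations.append((start, length))
    return violations
```

In a DNA, the reference regions would be read by peer BDs over the secure fabric and only the digests would need to cross the network; a non-empty result would trigger the actions listed above.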

Remote Logging. Logging the system activity in an OS relies on resources of the system for initiating and recording the logs. This limits the usefulness of logging when scarcity of resources (e.g., due to CPU, memory, or disk contention) delays or prevents logged information from reaching its consumers. Worse, an attacker/intruder that takes control of a system can simply wipe out the logs to hide illicit activities. DNA solves this problem by remote logging, i.e., storing the logs in remote memory or on remote disks. The BD can retrieve the OS-produced logs from system memory (where they are stored before being flushed to disk) and send them to another logger BD that stores them.

Fast Reboot and State Refreshing. Rebooting is often performed to bring a system to a clean state. Its high downtime cost (reading the OS image from disk, loss of useful state, consistency checks, application restart, etc.) is often paid just to re-initialize a single component subsystem [16]. DNA can support fast reboot by saving the in-memory state of critical OS data structures in the memory of other nodes, and restoring it during reboot. The buffer cache can be made persistent across reboots without using non-volatile memory and without involving the processors or the OS in saving/restoring the state. DNA also enables fast refresh of OS subsystem state, without a full system restart, using consistent state snapshots periodically taken by the local BD.

C. Applications of the DINA

A News Agency. DINA can support a global secure information network, subscribed to by critical system controllers (routers, GRID control nodes, PlanetLab peers, etc.) interested in information about the Internet, individual networks, or hosts. DNA clusters produce information that is summarized and disseminated by their DNA Gateways, to be consumed by other DNA clusters or controller systems. Interested controllers can then request additional, fine-grained information from DNA Gateways about their clusters and use it to diagnose problems and initiate repair.

Early Warning System. Internet worms spread too fast for human operators to protect networks against them: information and awareness come too late to support effective action. DINA offers a solution by propagating information generated by the DCA systems (integrity reports, logs, etc.), through the DNA Gateway, to specialized hosts that analyze it, identify threats, and warn and protect other clusters against similar attacks. For proprietary systems, e.g., multiple instances of an Internet service, the system can also be used to monitor cluster performance, e.g., to detect a load spike. The controller services, e.g., a CDN, can globally identify hot spots and balance the load among their clusters.

Recover Critical State. Long-running, performance-critical Internet systems (e.g., content distribution servers, caching proxies, routers) build up their state over months of operation. This state, albeit soft, is critical for the service they provide and is therefore sensitive to failures. Saving it to disk for persistence hurts system performance. DINA can solve this problem by nonintrusive state storage and retrieval on remote nodes.

V. Preliminary Results

We have implemented a BD prototype using Myrinet NICs [5] for remote memory access, and used it for remote healing within a cluster of Internet servers running RUBiS (a complex, multi-tier, transactional auction service similar to eBay). Preliminary results [17–19] show that nonintrusive remote access to the memory of a computer system can achieve accurate monitoring and efficient repair/recovery actions without using processors or OS resources of a failed system. Under a moderate workload with 200 clients, the system resumed service to the clients in under two seconds (quiescent time mainly due to cluster reconfiguration), without losing any requests, while maintaining exactly-once execution semantics across the crash.

VI. Virtual Backdoor

In previous sections, we described a Backdoor instantiated over a programmable Intelligent Network Interface Card (I-NIC) to monitor, recover, and repair systems from software faults and attacks. While it is easily deployable, built out of commodity components, and does not require any modification to the OS or applications, it requires additional hardware. As an alternative, the Backdoor can be realized over Virtual Machine Monitors (VMMs) [20, 21]. VMMs are implemented in software and execute at the highest privilege level in the system. The OS and applications are executed on top of these VMMs at a lower privilege level and cannot directly access hardware. Instead, all hardware access is controlled through the VMM. Apart from providing performance isolation, VMMs have been previously used for intrusion detection, software debugging, and fault isolation [22, 23].

A virtual backdoor can be implemented as a software component in a privileged domain executing over the VMM, with hooks to execute in the privileged VMM domain. A specialized application running inside this privileged domain can use the virtual backdoor to access the guest OS and application state. The design principles of the virtual backdoor are similar to those described previously. While using a virtual BD enables easier development and deployment of the defensive activities, it relies on the correct execution and integrity of the VMM in addition to the BD programs that have been deployed on it. However, several commercial and academic research projects have demonstrated that highly reliable VMMs can be built without compromising performance. We believe that, with the growth of VMM technology and hardware capabilities, the virtual BD will be an important component for monitoring, recovery, and repair of the system. We are currently developing an implementation of a virtual Backdoor as an enhancement to an existing VMM.

References

[1] Patterson, D. et al., “Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies,” Tech. Rep. UCB//CSD-02-1175, UC Berkeley Computer Science, March 2002.

[2] Swift, M. M., Bershad, B. N., and Levy, H. M., “Improving the Reliability of Commodity Operating Systems,” Proc. 19th Symp. on Operating Systems Principles (SOSP), Oct. 2003.

[3] Lie, D. et al., “Architectural Support for Copy and Tamper Resistant Software,” Proc. 9th ASPLOS, 2000.

[4] “NASA JPL, Jan. 24-26, 2004 Press Releases,” http://marsrovers.jpl.nasa.gov/newsroom/pressreleases.

[5] “Myricom: Creators of Myrinet,” http://www.myri.com.

[6] “Emulex, Inc.,” http://www.emulex.com.

[7] “Alacritech Storage and Network Acceleration,” http://www.alacritech.com.

[8] “Intel(R) IXP Network Processor Family,” http://www.intel.com/design/network/products/npfamily.

[9] “Cyclone Intelligent I/O,” http://www.cyclone.com.

[10] “PCI-SIG - PCI Express,” http://www.pcisig.com.

[11] Seitz, C. L., “Personal communication,” Feb. 2004.

[12] Oppenheimer, D., Brown, A., Beck, J., Hettena, D., Kuroda, J., Treuhaft, N., Patterson, D., and Yelick, K., “ROC-1: Hardware Support for Recovery-Oriented Computing,” IEEE Trans. Comput., Vol. 51, No. 2, 2002, pp. 100–107.

[13] Ganger, G. R. and Nagle, D. F., “Better Security via Smarter Devices,” Proc. HotOS VIII Workshop on Hot Topics in Operating Systems, 2001.

[14] Dyer, J. G., Lindemann, M., Perez, R., Sailer, R., van Doorn, L., Smith, S. W., and Weingart, S., “Building the IBM 4758 Secure Coprocessor,” Computer, Vol. 34, No. 10, Oct. 2001, pp. 57–66.

[15] Molina, J. and Arbaugh, W. A., “Using Independent Auditors as Intrusion Detection Systems,” 4th Information and Communications Security International Conference, Dec. 2002.

[16] Candea, G. and Fox, A., “Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel,” Proc. HotOS VIII Workshop on Hot Topics in Operating Systems, 2001.

[17] Sultan, F., Bohra, A., Neamtiu, I., and Iftode, L., “Nonintrusive Remote Healing Using Backdoors,” Proc. First Workshop on Algorithms and Architectures for Self-Managing Systems, Self-Manage 2003, ACM, San Diego, June 2003.

[18] Sultan, F., Bohra, A., Gallard, P., Neamtiu, I., Smaldone, S., Pan, Y., and Iftode, L., “Recovering Internet Service Sessions from Operating System Failures,” IEEE Internet Computing, Special Issue - Recovery Oriented Approaches to Dependability, Vol. 9, No. 2, March-April 2005, pp. 17–25.

[19] Bohra, A., Neamtiu, I., Gallard, P., Sultan, F., and Iftode, L., “Remote Repair of OS State Using Backdoors,” Proc. Int’l. Conference on Autonomic Computing, May 2004.

[20] Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A., “Xen and the Art of Virtualization,” SOSP ’03: Proceedings of the nineteenth ACM symposium on Operating systems principles, ACM Press, New York, NY, USA, 2003, pp. 164–177.

[21] Waldspurger, C. A., “Memory Resource Management in VMware ESX Server,” Proc. Symposium on Operating System Design and Implementation, OSDI’02, 2002.

[22] King, S. T. and Chen, P. M., “Backtracking intrusions,” ACM Trans. Comput. Syst., Vol. 23, No. 1, 2005, pp. 51–76.

[23] Garfinkel, T., Pfaff, B., Chow, J., Rosenblum, M., and Boneh, D., “Terra: a virtual machine-based platform for trusted computing,” SOSP ’03: Proceedings of the nineteenth ACM symposium on Operating systems principles, ACM Press, New York, NY, USA, 2003, pp. 193–206.
