making byzantine fault tolerant systems tolerate byzantine ...· making byzantine fault tolerant...
Post on 22-Jun-2018
Embed Size (px)
USENIX Association NSDI 09: 6th USENIX Symposium on Networked Systems Design and Implementation 153
Making Byzantine Fault Tolerant SystemsTolerate Byzantine Faults
Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike DahlinThe University of Texas at Austin
Mirco MarchettiThe University of Modena and Reggio Emilia
AbstractThis paper argues for a new approach to building Byzan-tine fault tolerant replication systems. We observe thatalthough recently developed BFT state machine replica-tion protocols are quite fast, they dont tolerate Byzantinefaults very well: a single faulty client or server is capa-ble of rendering PBFT, Q/U, HQ, and Zyzzyva virtuallyunusable. In this paper, we (1) demonstrate that exist-ing protocols are dangerously fragile, (2) define a set ofprinciples for constructing BFT services that remain use-ful even when Byzantine faults occur, and (3) apply theseprinciples to construct a new protocol, Aardvark. Aard-vark can achieve peak performance within 40% of that ofthe best existing protocol in our tests and provide a sig-nificant fraction of that performance when up to f serversand any number of clients are faulty. We observe usefulthroughputs between 11706 and 38667 requests per sec-ond for a broad range of injected faults.
1 IntroductionThis paper is motivated by a simple observation: al-though recently developed BFT state machine replica-tion protocols have driven the costs of BFT replicationto remarkably low levels [1, 8, 12, 18], the reality is thatthey dont tolerate Byzantine faults very well. In fact, asingle faulty client or server can render these systems ef-fectively unusable by inflicting multiple orders of mag-nitude reductions in throughput and even long periodsof complete unavailability. Performance degradations ofsuch degree are at odds with what one would expect froma system that calls itself Byzantine fault tolerantafterall, if a single fault can render a system unavailable, canthat system truly be said to tolerate failures?
To illustrate the the problem, Table 1 shows the mea-sured performance of a variety of systems both in theabsence of failures and when a single faulty client sub-mits a carefully crafted series of requests. As we showlater, a wide range of other behaviorsfaulty primaries,recovering replicas, etc.can have a similar impact. We
believe that these collapses are byproducts of a single-minded focus on designing BFT protocols with evermore impressive best-case performance. While this fo-cus is understandableafter years in which BFT repli-cation was dismissed as too expensive to be practical,it was important to demonstrate that high-performanceBFT is not an oxymoronit has led to protocols whosecomplexity undermines robustness in two ways: (1) theprotocols design includes fragile optimizations that al-low a faulty client or server to knock the system off ofthe optimized execution path to an expensive alternativepath and (2) the protocols implementation often fails tohandle properly all of the intricate corner cases, so thatthe implementations are even more vulnerable than theprotocols appear on paper.
The primary contribution of this paper is to advocate anew approach, robust BFT (RBFT), to building BFT sys-tems. Our goal is to change the way BFT systems are de-signed and implemented by shifting the focus from con-structing high-strung systems that maximize best caseperformance to constructing systems that offer accept-able and predictable performance under the broadest pos-sible set of circumstancesincluding when faults occur.
System Peak Throughput Faulty ClientPBFT  61710 0Q/U  23850 0
HQ  7629 N/A
Zyzzyva  65999 0Aardvark 38667 38667
Table 1: Observed peak throughput of BFT systems in afault-free case and when a single faulty client submits acarefully crafted series of requests. We detail our mea-surements in Section 7.2. The result reported for Q/U isfor correct clients issuing conflicting requests. The HQprototype demonstrates fault-free performance and doesnot implement many of the error-handling steps requiredto handle inconsistent MACs.
154 NSDI 09: 6th USENIX Symposium on Networked Systems Design and Implementation USENIX Association
RBFT explicitly considers performance during bothgracious intervalswhen the network is synchronous,replicas are timely and fault-free, and clients correctand uncivil execution intervals in which network linksand correct servers are timely, but up to f = n13 servers and any number of clients are faulty. The lastrow of Table 1 shows the performance of Aardvark, anRBFT state machine replication protocol whose designand implementation are guided by this new philosophy.
In some ways, Aardvark is very similar to traditionalBFT protocols: clients send requests to a primary whorelays requests to the replicas who agree (explicitly orimplicitly) on the sequence of requests and the corre-sponding resultsnot unlike PBFT , High through-put BFT , Q/U , HQ , Zyzzyva , ZZ ,Scrooge , etc.
In other ways, Aardvark is very different and chal-lenges conventional wisdom. Aardvark utilizes signa-tures for authentication, even though, as Castro correctlyobserves, eliminating signatures and using MACs in-stead eliminates the main performance bottleneck in pre-vious systems . Aardvark performs regular viewchanges, even though view changes temporarily preventthe system from doing useful work. Aardvark utilizespoint to point communication, even though renouncingIP-multicast gives up throughput deliberately.
We reach these counter-intuitive choices by followinga simple and systematic approach: without ever compro-mising safety, we deliberately refocus both the designof the system and the engineering choices involved inits implementation on the stress that failures can imposeon performance. In applying this strategy for RBFT toconstruct Aardvark, we choose an extreme position in-spired by maxi-min strategies in game theory : wereject any optimization for gracious executions that candecrease performance during uncivil executions.
Surprisingly, these counter-intuitive choices imposeonly a modest cost on its peak performance. As Table 1illustrates, Aardvark sustains peak throughput of 38667requests/second, which is within 40% of the best perfor-mance we measure on the same hardware for four stat-of-the-art protocols. At the same time, Aardvarks faulttolerance is dramatically improved. For a broad rangeof client, primary, and server misbehaviors we prove thatAardvarks performance remains within a constant fac-tor of its best case performance. Testing of the prototypeshows that these changes significantly improve robust-ness under a range of injected faults.
Once again, however, the main contribution of this pa-per is neither the Aardvark protocol nor implementation.It is instead a new approach that canand we believeshouldbe applied to the design of other BFT protocols.In particular, we (1) demonstrate that existing protocolsand their implementations are fragile, (2) argue that BFT
protocols should be designed and implemented with a fo-cus on robustness, and (3) use Aardvark to demonstratethat the RBFT approach is viable: we gain qualitativelybetter performance during uncivil intervals at only mod-est cost to performance during gracious intervals.
In Section 2 we describe our system model and theguarantees appropriate for high assurance systems. InSection 3 we elaborate on the need to rethink Byzan-tine fault tolerance and identify a set of design principlesfor RBFT systems. In Section 4 we present a system-atic methodology for designing RBFT systems and anoverview of Aardvark. In Section 5 we describe in detailthe important components of the Aardvark protocol. InSection 6 we present an analysis of Aardvarks expectedperformance. In Section 7 we present our experimentalevaluation. In Section 8 we discuss related work.
2 System modelWe assume the Byzantine failure model where faultynodes (servers or clients) can behave arbitrarily  anda strong adversary can coordinate faulty nodes to com-promise the replicated service. We do, however, assumethe adversary cannot break cryptographic techniques likecollision-resistant hashing, message authentication codes(MACs), encryption, and signatures. We denote a mes-sage X signed by principal ps public key as Xp . Wedenote a message X with a MAC appropriate for princi-pals p and r as Xr,p . We denote a message containinga MAC authenticatoran array of MACs appropriate forverification by every replicaas Xr
Our model puts no restriction on clients, except thattheir number be finite: in particular, any number ofclients can be arbitrarily faulty. However, the systemssafety and liveness properties are guaranteed only if atmost f = n13 servers are faulty.
Finally, we assume an asynchronous network wheresynchronous intervals, during which messages are deliv-ered with a bounded delay, occur infinitely often.
Definition 1 (Synchronous interval). During a syn-chronous interval any message sent between correct pro-cesses is delivered within a bounded delay T if the senderretransmits according to some schedule until it is deliv-ered.
3 Recasting the problemThe foundation of modern BFT state machine replicationrests on an impossibility result and on two principles thatassist us in dealing with it. The impossibility result, ofcourse, is FLP , which states that no solution to con-sensus can be both safe and live in an asynchronous sys-tems if nodes can fail. The two principles, first appliedby Lamport to his Paxos protocol , are at the coreof Castro and Liskovs seminal work on PBFT . Thefirst states that synchrony must not be needed for safety:
USENIX Association NSDI 09: 6th USENIX Symposium on Networked Systems Design and Implementation 155
as long as a threshold of faulty servers is