Page 1: XFT: Practical Fault Tolerance beyond Crashes

© 2016 IBM Corporation

OSDI 2016, November 3 2016, Savannah, GA, USA

Shengyun Liu (NUDT, China)

Paolo Viotti (EURECOM, France)

Christian Cachin (IBM Research – Zurich)

Vivien Quéma (INP Grenoble, France)

Marko Vukolić (IBM Research – Zurich)

Page 2: Fault tolerance

Building systems that tolerate machine and network faults

What machine faults?

Crash faults (CFT)

Non-crash (a.k.a. Byzantine) faults (BFT)

Data losses, omissions, data corruptions, bugs, misconfigurations, hardware faults, cosmic rays, incorrect firmware, operator errors, …

… and malicious behavior

Page 3: Network faults (a.k.a. network partitions, asynchrony)

Reflect the inability of correct machines to communicate in a timely manner (i.e., synchronously)

Figure label: a partitioned machine

Page 4: This paper in one slide: XFT (cross fault tolerance)

In the absence of non-crash faults:

the same fault-tolerance guarantees as asynchronous CFT (i.e., with the same thresholds)

In the presence of non-crash faults, i.e., (non-crash-faulty machines) > 0:

fault-tolerance guarantees as long as the number of faulty or partitioned machines is within a threshold:

(crash-faulty) + (non-crash-faulty) + (partitioned) ≤ t

XFT showcase: XPaxos

The first state-machine replication (SMR) protocol in the XFT model

(Almost) as efficient as optimized CFT Paxos
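The condition on this slide can be read as a simple predicate. What follows is a minimal Python sketch under stated assumptions: the function and argument names are hypothetical, and the guarantee it checks is read as consistency only (availability has its own threshold, which the sketch does not model).

```python
def xft_consistency_holds(crash: int, non_crash: int, partitioned: int, t: int) -> bool:
    """Hypothetical helper: does the slide's XFT condition hold for these counts?"""
    if min(crash, non_crash, partitioned, t) < 0:
        raise ValueError("counts and threshold must be non-negative")
    if non_crash == 0:
        # No non-crash faults: same guarantees as asynchronous CFT, where
        # consistency does not depend on how many machines fail or are partitioned.
        return True
    # With non-crash faults present, all faulty or partitioned machines
    # together must stay within the threshold t.
    return crash + non_crash + partitioned <= t


if __name__ == "__main__":
    # Crash faults and partitions only: the CFT-style guarantee applies.
    print(xft_consistency_holds(crash=1, non_crash=0, partitioned=2, t=1))
    # One non-crash fault and nothing else, with t = 1: within the threshold.
    print(xft_consistency_holds(crash=0, non_crash=1, partitioned=0, t=1))
    # One non-crash fault plus one partitioned machine, with t = 1: outside it.
    print(xft_consistency_holds(crash=0, non_crash=1, partitioned=1, t=1))
```

The last call corresponds to the case the slide excludes: a non-crash fault combined with more than t faulty or partitioned machines in total.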

Page 5: CFT vs. BFT deterministic SMR

CFT SMR state-of-the-art             BFT SMR state-of-the-art
(e.g., Paxos, Raft, Zab)             (e.g., PBFT, Zyzzyva, Chain/Aliph)
2t+1 replicas                        3t+1 replicas
Efficient message patterns           Inefficient message patterns

Figure (t=1): common-case message patterns between client, leader and replicas for Paxos, PBFT [TOCS’02], Zyzzyva [TOCS’09] and Chain/Aliph [TOCS’15]

NB: These are only common-case message patterns
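To make the replica counts concrete, here is a small Python sketch that evaluates the two formulas from the slide for a few values of t. The function name and the "CFT"/"BFT" string labels are illustrative choices, not part of the talk.

```python
def replicas_needed(t: int, model: str) -> int:
    """Replicas needed to tolerate t faults, per the formulas on the slide."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if model == "CFT":
        return 2 * t + 1  # e.g., Paxos, Raft, Zab
    if model == "BFT":
        return 3 * t + 1  # e.g., PBFT, Zyzzyva, Chain/Aliph
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    for t in (1, 2, 3):
        cft = replicas_needed(t, "CFT")
        bft = replicas_needed(t, "BFT")
        print(f"t={t}: CFT needs {cft} replicas, BFT needs {bft} ({bft - cft} more)")
```

The gap between the two grows by one replica for every additional tolerated fault.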

Page 6: FT guarantees: CFT SMR

Crash faults, no network faults:

Consistency with any number of machine faults

Availability with up to n/2 machine faults

Crash faults, with network faults:

Consistency with any number of faulty or partitioned machines

Availability with up to n/2 faulty or partitioned machines

Non-crash faults, no network faults: none

Non-crash faults, with network faults: none

Performance/cost: very good (production use)

Page 7: FT guarantees: BFT SMR

Crash faults, no network faults:

Consistency with any number of machine faults

Availability with up to n/3 machine faults

Crash faults, with network faults:

Consistency with any number of faulty or partitioned machines

Availability with up to n/3 faulty or partitioned machines

Non-crash faults, no network faults:

Consistency & Availability with up to n/3 machine faults

Non-crash faults, with network faults:

Consistency with up to n/3 machine faults and any number of partitioned machines

Availability with up to n/3 faulty or partitioned machines

Performance/cost: poor (compared to CFT)

Page 8: The Cost of Asynchronous BFT (Infamous 3t+1)

Figure (t = 1): a client issues write(“I like you”); the replicas shown hold x = “I like you” or x = null; a later read() returns null

The cost of BFT comes from providing consistency when

n/3 > (non-crash-faulty machines) > 0 and (crash-faulty) + (non-crash-faulty) + (partitioned) ≥ n/2

Such a particular adversary is in many use cases irrelevant
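One common way to see where 3t+1 comes from is quorum-intersection arithmetic. The Python sketch below is an illustration under assumptions that are not stated on the slide: the quorum sizes used (t+1 out of 2t+1, and 2t+1 out of 3t+1) are the standard textbook choices, and the helper names are hypothetical.

```python
def min_quorum_overlap(n: int, q: int) -> int:
    """Smallest possible overlap of any two quorums of size q among n replicas."""
    if not 0 < q <= n:
        raise ValueError("need 0 < q <= n")
    return max(0, 2 * q - n)


def overlap_has_correct_replica(n: int, q: int, t: int) -> bool:
    """True if every quorum overlap holds at least one replica that is not
    among the t possibly non-crash-faulty ones."""
    return min_quorum_overlap(n, q) > t


if __name__ == "__main__":
    t = 1
    # Assumed CFT-style sizing: n = 2t+1 replicas, quorums of t+1.
    print(overlap_has_correct_replica(n=2 * t + 1, q=t + 1, t=t))
    # Assumed BFT-style sizing: n = 3t+1 replicas, quorums of 2t+1.
    print(overlap_has_correct_replica(n=3 * t + 1, q=2 * t + 1, t=t))
```

With three replicas and t = 1, a write quorum and a read quorum may share a single replica; if that replica is non-crash-faulty it can answer null, which matches the read() in the figure.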

Page 9: XFT (cross fault tolerance)

In the absence of non-crash faults:

the same fault-tolerance guarantees as asynchronous CFT (i.e., with the same thresholds)

In the presence of non-crash faults, i.e., (non-crash-faulty machines) > 0:

fault-tolerance guarantees as long as the number of faulty or partitioned machines is within a threshold:

(crash-faulty) + (non-crash-faulty) + (partitioned) ≤ t

XFT showcase: XPaxos

The first state-machine replication (SMR) protocol in the XFT model

(Almost) as efficient as optimized CFT Paxos

Page 10: XPaxos: XFT SMR

Crash faults, no network faults:

Consistency with any number of machine faults

Availability with up to n/2 machine faults

Crash faults, with network faults:

Consistency with any number of faulty or partitioned machines

Availability with up to n/2 faulty or partitioned machines

Non-crash faults, no network faults:

Consistency & Availability with up to n/2 machine faults

Non-crash faults, with network faults:

Consistency with up to n/2 faulty or partitioned machines

Availability with up to n/2 faulty or partitioned machines

Performance/cost: very good (compared to CFT)

Page 11: XPaxos message pattern (common case)

Figure: four message-pattern diagrams, each with lanes for the client, the leader and the other replicas (3 replicas in total for t=1, 5 for t=2):

Paxos (t=1)

XPaxos (t=1)

Paxos (t>1, here t=2)

XPaxos (t>1, here t=2)

Figure annotation: digitally signed messages

Page 12: View-change sketch: a problem

view 1: t+1 replicas

view 2: t+1 replicas

What happened in view 1?

Nothing!

Page 13: View-change sketch: XPaxos solution

view 1: t+1 replicas

view 2: t+1 replicas

What happened in view 1?

Nothing!

Wait for at least 1 response + connection timeout to all t+1 replicas (including at least one correct)
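The waiting rule on this slide can be sketched as a small predicate. This is only one reading of the slide's wording, not the XPaxos specification: it assumes a new-view replica may stop waiting once it has at least one response from the view-1 group and every other member of that group has either responded or hit the connection timeout. The function name and the replica identifiers are hypothetical.

```python
def can_stop_waiting(prev_group: set, responded: set, timed_out: set) -> bool:
    """One reading of the slide's rule for querying the t+1 replicas of the old view."""
    got_response = len(responded & prev_group) >= 1
    # Every old-view replica is accounted for: it answered or its connection timed out.
    all_accounted_for = prev_group <= (responded | timed_out)
    return got_response and all_accounted_for


if __name__ == "__main__":
    view1 = {"r1", "r2"}  # t+1 replicas for t = 1 (names are made up)
    # r1 answered, r2 still pending: keep waiting.
    print(can_stop_waiting(view1, responded={"r1"}, timed_out=set()))
    # r1 answered, r2 timed out: done.
    print(can_stop_waiting(view1, responded={"r1"}, timed_out={"r2"}))
    # Nobody answered: a lone "Nothing!" cannot be inferred from silence.
    print(can_stop_waiting(view1, responded=set(), timed_out={"r1", "r2"}))
```

The point of waiting for the timeout rather than the first answer is the parenthesis on the slide: the t+1 replicas include at least one correct one, so a single faulty "Nothing!" should not be the only answer taken into account.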

Page 14: Deployment: geo-replication playground

Page 15: Choosing the timeout (for view-change)

Machines TCP-pinging each other every 100 ms for 3 months

Amazon AWS EC2 micro VMs in 6 regions: US West (CA), US East (VA), Ireland (EU), Brazil (BR), Tokyo (JP), Sydney (AU)

Round-trip latency [ms], Amazon EC2
       avg            99%            99.9%           99.99%          max
min    85 [CA-VA]     130 [CA-JP]    1082 [CA-VA]    1097 [CA-VA]    5208 [JP-AU]
max    401 [AU-BR]    516 [AU-BR]    1474 [AU-BR]    2495 [JP-BR]    169749 [VA-EU]

< 2.5 s

IBM SoftLayer: Mexico (MX), San Jose (CA), Washington (DC), London (UK), Tokyo (JP), Sydney (AU)

Round-trip latency [ms], IBM SoftLayer
       avg            99.99%          max
min    65 [CA-MX]     1077 [CA-DC]    3476 [UK-DC]
max    305 [UK-AU]    1440 [UK-AU]    127869 [JP-DC]

< 1.5 s
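The two bounds on this slide appear to sit just above the worst per-link 99.99th-percentile latency in each table. The Python sketch below checks that reading against the slide's own numbers and also shows, as a generic illustration, how a percentile could be taken from raw ping samples. The nearest-rank method and all names are assumptions; the talk does not say how its percentiles were computed.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(samples, p):
    """p-th percentile (0 < p <= 100) of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids off-by-one ranks from float rounding.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


# Worst per-link 99.99th-percentile round-trip latency [ms], from the tables above.
WORST_P9999_MS = {"Amazon EC2": 2495, "IBM SoftLayer": 1440}
# Bounds annotated on the slide, converted to milliseconds.
SLIDE_BOUND_MS = {"Amazon EC2": 2500, "IBM SoftLayer": 1500}


def bound_covers(deployment: str) -> bool:
    """True if the slide's bound exceeds the worst 99.99th-percentile latency."""
    return WORST_P9999_MS[deployment] < SLIDE_BOUND_MS[deployment]


if __name__ == "__main__":
    for name in WORST_P9999_MS:
        print(name, bound_covers(name))
```

Note that the maximum latencies in both tables are far above these bounds, so a timeout chosen this way covers the 99.99th percentile but not the rare worst-case outliers.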

Page 16: Performance (ZooKeeper on Amazon EC2 testbed)

Setup: t=1, write-only workload, 1 kB requests, closed-loop

Plot legend: XPaxos; Zab (CFT); (optimized) Paxos (CFT); state-of-the-art BFT

Page 17: Where/when to use XFT?

Tolerating “accidental” non-crash (Byzantine) faults

Wide-area networks and geo-replicated systems

When adversary cannot control the network at will

“Permissioned” blockchain

www.hyperledger.org

Page 18: Thank you!

IBM Research - Zurich is hiring

Keywords: distributed systems, fault-tolerance, consistency, blockchain