Relative Power of Models in Distributed Computing

Petr Kuznetsov, TU Berlin/DT-Labs

What makes distributed computing special?

Failures and lack of synchrony

Computing units are unreliable and unsynchronized

(otherwise ≅ centralized system)

What makes distributed systems hard?

Multitude of abstractions and models
Lack of categorization, complexity classes

Unnatural?

[Figure: Blind Men and the Elephant]

Distributed modeling jumble

CAS, LL/SC?

RW shared memory?

Message passing?

Snapshot memory?

Sub-consensus objects?

Clouds, data centers…?

t-resilience ?

Today

t-resilience ≅ wait-freedom [BG simulation]

A (t+1)-process wait-free system is equivalent to an n-process t-resilient system (t<n)

A colorless task T is solvable t-resiliently iff T is solvable wait-free by t+1 processes

Non-uniform fault models ≅ wait-freedom
Set consensus power of an adversary
Colorless tasks
Colored conjectures

Model

n processes p1,…,pn
Read-write shared memory
Atomic snapshots [AADGMS90]
Crash failures

[Figure: example run in which p1, p2, and p3 perform W(R1,1), W(R2,1), and W(R3,1), and each then takes Snapshot(R); the snapshots returned are (1,0,0), (1,0,1), and (1,1,1)]
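
The snapshot memory of this model can be imitated in a few lines. The following is a minimal sketch, not the wait-free construction of [AADGMS90]: it assumes a single lock is an acceptable stand-in for atomicity, and the class name `SnapshotMemory` and the function `run_example` are hypothetical. It replays one sequential interleaving that produces the three snapshots of the example run.

```python
import threading

class SnapshotMemory:
    """Array of registers with atomic write and atomic snapshot.
    Assumption: a lock provides atomicity (this is NOT wait-free)."""

    def __init__(self, n):
        self._regs = [0] * n
        self._lock = threading.Lock()

    def write(self, i, value):
        with self._lock:
            self._regs[i] = value

    def snapshot(self):
        with self._lock:
            return tuple(self._regs)

def run_example():
    """One interleaving: each process writes 1 to its register, then snapshots."""
    mem = SnapshotMemory(3)
    views = []
    mem.write(0, 1)
    views.append(mem.snapshot())   # first process sees only its own write
    mem.write(2, 1)
    views.append(mem.snapshot())   # third process sees two writes
    mem.write(1, 1)
    views.append(mem.snapshot())   # second process sees all three writes
    return views
```

Because every snapshot is atomic, any two returned views are related by containment, which later slides rely on when they speak of "the smallest snapshot".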

Distributed tasks

Functions in distributed computing

A task (I,O,Δ):
I – set of input vectors
O – set of output vectors
Task specification Δ: I→2^O

k-set consensus

Processes start with inputs in V (|V|>k)

Safety:
• The set of outputs is a subset of inputs of size at most k

Liveness:
• Every correct process eventually outputs (wait-freedom)

k=1: consensus
Wait-free (k+1)-process k-set consensus is impossible
Colorless: a process is free to adopt inputs or outputs
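
The safety condition can be checked mechanically. A minimal sketch, assuming inputs and outputs are given as plain Python lists with crashed processes simply omitted from the outputs (the function name is hypothetical):

```python
def satisfies_k_set_consensus(inputs, outputs, k):
    """Safety of k-set consensus: outputs are inputs, at most k distinct ones."""
    decided = set(outputs)
    return decided <= set(inputs) and len(decided) <= k
```

For example, with inputs 1, 2, 3 the outputs 1, 1, 2 are fine for k=2 but violate consensus (k=1).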

The wait-free model: 2 processes

[Figure: processes P and Q; the possible views are "P reads before Q writes", "P reads after Q writes", "Q reads after P writes", and "Q reads before P writes"]

Full-information protocol:

while not done
    write(view)
    view := snapshot(memory)

Wait-free consensus is impossible!
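
To see where the four views come from, one round of the full-information protocol can be enumerated by brute force. This is a sketch under simplifying assumptions: a view is just the set of processes whose write is visible, and the function name is hypothetical.

```python
from itertools import permutations

def one_round_views():
    """Enumerate all interleavings of (write, snapshot) for P and Q and
    return the distinct pairs (view of P, view of Q)."""
    ops = ["P", "P", "Q", "Q"]  # per process: first op is its write, second its snapshot
    outcomes = set()
    for order in set(permutations(ops)):
        written, views, done = set(), {}, {"P": 0, "Q": 0}
        for proc in order:
            if done[proc] == 0:
                written.add(proc)                  # the write step
            else:
                views[proc] = frozenset(written)   # the snapshot step
            done[proc] += 1
        outcomes.add((views["P"], views["Q"]))
    return outcomes
```

The six interleavings collapse into three outcomes: P runs first and sees only itself, both see each other, or Q runs first and sees only itself. Neighboring outcomes differ in the view of exactly one process, so the other process cannot tell them apart.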

[Figure: three-process case with vertices P, Q, R]

Sperner’s Lemma: wait-free (k+1)-process k-set consensus is impossible [BG93,SZ93,HS93]

What about k-resilience?

Assume 1 out of a million processes may fail. Can we solve consensus?

Can we solve k-set consensus in a k-resilient system for some n>k?

No! (Otherwise we could do it wait-free)

BG agreement

Part I

BG agreement [Borowsky, Gafni, 1993]

Safety as in consensus:
Every output is an input
No two outputs are different

Liveness:
Every correct process outputs, if no participating process fails

BG-agreement: protocol

Code for pi:

write(Ai,inputi)
S:=snapshot(A)
write(Bi,S)
wait until for all pj in S, Bj≠⊥
decide on the smallest input in the smallest Bj
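
The protocol can be exercised in a small simulation. The sketch below is illustrative only: it assumes processes are Python generators that perform one shared-memory step per `yield`, that a random scheduler (hypothetical helper `run_bg`) picks who moves next, that no process fails, and that `None` plays the role of ⊥.

```python
import random

def bg_agreement(i, value, A, B):
    """Code for p_i; every yield marks the end of one atomic step."""
    A[i] = value                                          # write(Ai, input_i)
    yield
    S = {j: v for j, v in enumerate(A) if v is not None}  # S := snapshot(A)
    yield
    B[i] = S                                              # write(Bi, S)
    yield
    while any(B[j] is None for j in S):                   # wait for all pj in S
        yield
    smallest = min((b for b in B if b is not None), key=len)
    return min(smallest.values())                         # smallest input in smallest Bj

def run_bg(inputs, seed=0):
    """Run all processes to completion under a random schedule (no failures)."""
    rng = random.Random(seed)
    n = len(inputs)
    A, B = [None] * n, [None] * n
    active = {i: bg_agreement(i, v, A, B) for i, v in enumerate(inputs)}
    decided = {}
    while active:
        i = rng.choice(sorted(active))
        try:
            next(active[i])
        except StopIteration as stop:
            decided[i] = stop.value
            del active[i]
    return decided
```

Removing a process from `active` after it wrote `A[i]` but before it wrote `B[i]` would leave the others spinning in the wait loop, which is exactly the blocking behavior the liveness condition allows.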

BG agreement: correctness

Liveness:
Suppose each participant takes 3 steps: every wait terminates
(If a participant “dies” between the writes – block)

Safety:
Consider pt that wrote the smallest snapshot S to Bt
for all Bj≠⊥, pt is in Bj
every pi waits until pt writes
every pi decides on the smallest input in S

BG simulation

k+1 simulators q1,…,qk+1
n simulated processes p1,..,pn (n>k)

[Figure: simulators q1,…,qk+1 and simulated processes p1,…,pn]

Simulation

Every simulator qi takes a snapshot to get the “most recent” view of pj
Run 3 steps of BG-agreement to agree on the view of pj
If the view is decided, register the view
If not, proceed to the next process in round-robin

Safe: the simulated views
Live? No! What if a BG-agreement blocks?

Simulation order

Run the BG agreements in round-robin
When done with the “write-phase” of a BG-agreement for a step of pj – proceed to p(j+1 mod n)

[Figure: simulators q1, q2, q3 and simulated processes p1, p2, p3, p4]

Progress

A faulty simulator may block at most one process
At most k simulators can fail
At most k simulated processes fail – a k-resilient run!
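
The counting argument can be illustrated with a deliberately abstract model; it is not the full BG simulation. Assumptions made here: an agreement on a step of pj counts as decided once some simulator has finished its write phase and no simulator is stuck inside it, a crashing simulator stops right after entering a write phase, and all names (`bg_progress`, `crashers`) are hypothetical.

```python
import random

def bg_progress(n, k, crashers, steps=5000, seed=1):
    """Return the number of agreed steps for each simulated process p_0..p_{n-1}."""
    rng = random.Random(seed)
    pos = {q: q % n for q in range(k + 1)}  # simulated process each simulator visits next
    inside = {}                             # simulator -> agreement it is in the middle of
    unsafe, joined = {}, {}                 # agreement -> simulators mid-write / done writing
    progress = [0] * n
    alive = set(range(k + 1))
    for _ in range(steps):
        if not alive:
            break
        q = rng.choice(sorted(alive))
        if q in inside:                     # finish the write phase, go to the next process
            key = inside.pop(q)
            unsafe[key].discard(q)
            joined.setdefault(key, set()).add(q)
            pos[q] = (key[0] + 1) % n
            continue
        j = pos[q]
        key = (j, progress[j])
        if joined.get(key) and not unsafe.get(key):
            progress[j] += 1                # decided: register the step of p_j
        elif q not in joined.get(key, set()):
            unsafe.setdefault(key, set()).add(q)
            inside[q] = key
            if q in crashers:               # crash in the middle of the write phase
                alive.discard(q)
            continue
        pos[q] = (j + 1) % n                # decided or blocked: round-robin onward
    return progress
```

With n=5, k=2 and two crashing simulators, at most two simulated processes stop advancing while the remaining ones keep taking steps, matching the claim that the simulated run is k-resilient.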

Application: colorless tasks

Suppose T=(I,O,Δ) is solvable k-resiliently, let A be the corresponding (full-information) protocol
Each qi starts with an input in VI
The first view of each pj is its input (each qi proposes its own input)
In the resulting k-resilient run of A, some pj outputs
qi adopts the first output value it sees

k-resilient k-set consensus is impossible!

Works both ways

n simulators p1,…,pn can k-resiliently simulate a wait-free run on q1,..,qk+1
What matters is the number of failures, not the number of simulators

All k-resilient systems are equivalent! (with respect to colorless tasks)

On Non-Uniform Fault Models

Part II

Uniform fault models

Process failures are IID
P(pi fails) = ε
P(no faults) = (1-ε)^n
P(t or fewer faults) = I_{1-ε}(n-t, t+1)

t-resilience: at most t faults
wait-freedom: at most n-1 faults
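
The probability of at most t faults is the binomial CDF, which is what the regularized incomplete beta function I expresses in closed form. A minimal sketch that evaluates it by direct summation (the function name is hypothetical):

```python
from math import comb

def prob_at_most_t_faults(n, t, eps):
    """P(at most t of n processes fail) when each fails independently with probability eps."""
    return sum(comb(n, f) * eps**f * (1 - eps)**(n - f) for f in range(t + 1))
```

Setting t=0 recovers (1-ε)^n, and t=n gives probability 1.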

But…

Processes may fail
In a correlated way
In a non-identical way

Non-identical faults

Processes p,q,r
p and r fail independently
q is unlikely to fail

Possible runs:
pqr (no faults)
pq (r fails)
qr (p fails)
q (p and r fail)

Correlated faults

Processes p,q,r
p and q share unreliable hardware
q and r share unreliable software
It is unlikely that both hardware and software fail

Possible runs:
pqr (no faults)
p (software fault)
r (hardware fault)

Generic adversaries [Delporte et al., 2009]

• A – set of process subsets
• The model = all runs with correct sets in A

A = {p,qr,rs}

Hitting sets

A can solve 2-set consensus
Can it solve consensus?

A={p,qr,rs}

[Figure: processes p, q, r, s with a hitting set of A marked]

Yes: for all S in A, h(A_S)=1
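
The hitting-set claims for this example can be verified by brute force. The sketch assumes that h(A) denotes the size of a smallest hitting set of A (a set of processes intersecting every set in A) and that A_S denotes the sets of A contained in S; both function names are hypothetical.

```python
from itertools import combinations

def hitting_set_size(adversary):
    """Size of a smallest set of processes that intersects every set in the adversary."""
    sets = [frozenset(s) for s in adversary]
    if not sets:
        return 0
    universe = sorted(set().union(*sets))
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            if all(s & set(candidate) for s in sets):
                return size

def restrict(adversary, S):
    """A_S: the sets of the adversary that are subsets of S."""
    return [s for s in adversary if set(s) <= set(S)]
```

For A={p,qr,rs} the smallest hitting set has size 2 (for instance {p,r}), while every restriction A_S has a hitting set of size 1.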

Commit-adopt [Gafni, 1998]

Liveness:
Every correct process returns

Safety:
Return (adopt,v) or (commit,v) where v was proposed
If only one value is proposed, only (commit,*) is returned
If one returns (commit,v), then only (*,v) is returned

Commit-adopt: wait-free protocol

Code for pi:

write(Ai,inputi)
S:=collect(A)
if |vals(S)|=1 then
    write(Bi,inputi)
else
    write(Bi,fail)
S:=collect(B)
if there are no fails in S then
    return (commit,inputi)
if for some j, inputj is in S
    return (adopt,inputj)
return (adopt,inputi)
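
A small simulation helps to check the safety properties under many schedules. The sketch makes several assumptions: processes are Python generators taking one shared-memory step per `yield`, a collect reads one register per step, a random scheduler (hypothetical helper `run_ca`) drives the run, and inputs are never `None`. In this sketch a process that sees a fail and no other value adopts its own input.

```python
import random

FAIL = object()  # marker written to B by a process that saw more than one value

def collect(registers):
    """Read the registers one at a time; each read is a separate step."""
    values = []
    for index in range(len(registers)):
        values.append(registers[index])
        yield
    return values

def commit_adopt(i, value, A, B):
    A[i] = value                                    # write(Ai, input_i)
    yield
    seen = yield from collect(A)
    vals = {v for v in seen if v is not None}
    B[i] = value if len(vals) == 1 else FAIL        # write(Bi, input_i) or write(Bi, fail)
    yield
    seen = yield from collect(B)
    seen = [v for v in seen if v is not None]
    if all(v is not FAIL for v in seen):
        return ("commit", value)
    for v in seen:
        if v is not FAIL:
            return ("adopt", v)
    return ("adopt", value)

def run_ca(inputs, seed):
    """Run all processes to completion under a random schedule."""
    rng = random.Random(seed)
    n = len(inputs)
    A, B = [None] * n, [None] * n
    active = {i: commit_adopt(i, v, A, B) for i, v in enumerate(inputs)}
    results = {}
    while active:
        i = rng.choice(sorted(active))
        try:
            next(active[i])
        except StopIteration as stop:
            results[i] = stop.value
            del active[i]
    return results
```

Calling `run_ca([7, 7, 7], seed)` should yield only commits, while mixed inputs may produce adopts; whenever some process commits v, every other returned value is also v.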

Commit-adopt: correctness

Liveness: immediate

Safety:
If all proposals are the same – every process commits
At most one non-fail value in B
If a process commits: the value is seen by every terminated process in B

Leader-based consensus

vi := inputi
while true
    r++
    (u,vi) := CommitAdopt_r(vi)
    if u = commit then return vi
    vi := get the estimate from the Leaderi

Safety provided by Commit-Adopt
Liveness if, eventually, the same correct leader is elected

Electing a leader using A

for all S in A, h(A_S)=1

shared
    C[1],…,C[n]    // shared counters
    R[1],…,R[n]    // the most recent rounds

while true
    r++
    R[i] := r
    wait until for some S in A: for all j in S, R[j]≥r
    for all j not in S: C[j]++    // increment the counter
    Leaderi := argmin(C[1],…,C[n])

Set consensus number of A

setcon(A) = 0, if A is empty
setcon(A) = max_{S in A} min_{a in S} setcon(A_{S,a}) + 1, otherwise

A_S – all S’ in A, subsets of S
A_{S,a} – all S’ in A_S, not containing a

A = {pqr,pq,pr,p,q,r}, setcon(A)=2: for S = pqr and a = p, we have A_{S,a} = {q, r} and setcon(A_{S,a}) = 1
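
The recursive definition translates directly into code. A minimal sketch, assuming an adversary is given as an iterable of non-empty process sets (here strings of one-letter process names); the helper `_setcon` and the memoization are implementation choices, not part of the definition.

```python
from functools import lru_cache

def setcon(adversary):
    """Set consensus number of an adversary (iterable of non-empty process sets)."""
    return _setcon(frozenset(frozenset(s) for s in adversary))

@lru_cache(maxsize=None)
def _setcon(A):
    if not A:
        return 0
    best = 0
    for S in A:
        A_S = [T for T in A if T <= S]              # sets of A contained in S
        worst = min(
            _setcon(frozenset(T for T in A_S if a not in T))   # setcon(A_{S,a})
            for a in S
        )
        best = max(best, worst + 1)
    return best
```

On the example, `setcon(["pqr", "pq", "pr", "p", "q", "r"])` evaluates to 2, and the adversary {p,qr,rs} from the hitting-set slide evaluates to 1.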

A = {pqr, pq, pr, p, q, r}

for S=pqr, a=p, A_{S,a}={q,r} and setcon(A_{S,a}) = 1

setcon(A) = setcon(A_{S,a})+1 = 2

A_1={pqr,pq,pr,p}

A_2={q,r}

Partitioned adversary

A = A_1,…,A_k

setcon(A_i)=1

there exists S in A, such that for all a in S: A_{S,a} is in A_{i+1},…,A_k

setcon(A_{S,a})=k-i

Characterizing setcon

setcon(A)=k if and only if A solves k-set consensus but not (k-1)-set consensus

A with setcon(A)=k solves a colorless task T if and only if T is solvable (k-1)-resiliently

n-process system with A ≅ k-process wait-free

Sufficiency

Solving k-set consensus with A, setcon(A)=k

Split A into A_1,..,A_k, each of setcon 1
Run k parallel leader-based consensuses – at least one terminates
Adopt the first returned value

Necessity

Suppose A can solve (k-1)-set consensus
Then k processes solve (k-1)-set consensus as follows:

“Leveled” BG-simulation, starting from level k
At current level L, simulate steps of S such that setcon(A_S)=L
If blocked simulating a step of p on level L, go to simulating S’ in A_{S,p} such that setcon(A_{S’})=L-1
If a higher level “unblocks”, return to it

[Figure: simulation levels k, k-1, …, 2, 1]

Asymmetric progress conditions [Imbs et al., DISC 2010]

Make progress if p1 participates (wait-free for p1), or at most one process is eventually up (obstruction-free for the rest)

A={p1*,p2,…pn}
setcon(A)=2
A_1={p1*}
A_2={p2,…pn}

What if we use stronger objects?

Suppose we can use k-process consensus objects
What is the min k such that consensus is solvable?

Suppose k≤n-1
Then 2 processes solve wait-free consensus as follows:

BG-simulation (a slow simulator blocks up to n-1 simulated processes)
Start with simulating all in round-robin
If blocked – simulate the first unblocked process
If blocked – go back to simulating all

Eventually, either all advance, or exactly one runs solo

Summary

t-resilience ≅ wait-freedom [BG93]
Adversaries ≅ wait-freedom [GK10a]

What about complexity?
Task solvability undecidable for >2 processes [HR97,GK99]

What about colored tasks?
Extended to generic tasks [GK10b]

It’s wait-free!

References

E. Borowsky and E. Gafni. Generalized FLP impossibility result for t-resilient asynchronous computations. STOC 1993

E. Gafni and P. Kuznetsov. Turning Adversaries into Friends: Simplified, Made Constructive, and Extended. OPODIS 2010

E. Gafni and P. Kuznetsov. Relating L-Resilience and Wait-Freedom via Hitting Sets. ICDCN 2011

D. Imbs, M. Raynal, G. Taubenfeld. On Asymmetric Progress Conditions. DISC 2010

QUESTIONS?

Colorless distributed tasks

VI – set of input values
VO – set of output values
Val(U) denotes the set of values in vector U

If Out is in Δ(In), Val(In) is a subset of Val(In’), and Val(Out’) is a subset of Val(Out), then Out’ is in Δ(In’)
