Failure Detectors

Upload: gregory-jennings

Post on 31-Dec-2015


Page 1: Failure Detectors

Failure Detectors

Page 2: Failure Detectors

Can we do anything in asynchronous systems?

• Reliable broadcast
  – Process j sends a message m to all processes in the system
  – Requirement:
    • If m is delivered by any correct process, then it should be delivered by all correct processes

• Intuition: a process may receive message m but deliver it at a later point

• Assumptions
  – A single message send is atomic
  – If a message is sent, it is received as long as the receiving process does not fail

Page 3: Failure Detectors

Is this algorithm correct?

• Let 1..n be processes in the system

  – For x = 1 to n
    • Send m to x

  – Upon receiving m
    • Deliver m

• What is wrong?
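One way to see the problem is to let the sender crash partway through its loop. The sketch below is a toy model, not part of the slides: `naive_broadcast` and its `sends_before_crash` parameter are hypothetical names, and receipt is assumed to lead to immediate delivery.

```python
def naive_broadcast(n, sends_before_crash):
    """Sender loops over 1..n sending m, but may crash partway through.

    sends_before_crash: number of sends that complete before the sender
    crashes (use n for a crash-free run). Returns the set of processes
    that received, and therefore delivered, m.
    """
    delivered = set()
    for x in range(1, n + 1):
        if x > sends_before_crash:
            break  # the sender has crashed; the remaining sends never happen
        delivered.add(x)  # x receives m and delivers it immediately
    return delivered


delivered = naive_broadcast(n=5, sends_before_crash=2)
missed = set(range(1, 6)) - delivered
print("delivered:", sorted(delivered))  # delivered: [1, 2]
print("never delivered:", sorted(missed))  # never delivered: [3, 4, 5]
```

If process 2 is correct, it has delivered m while the correct processes 3, 4 and 5 never will, which violates the requirement.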

Page 4: Failure Detectors

How to fix it?

• Let 1..n be processes in the system

• We will use this algorithm in our work with failure detectors
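The fix itself is not spelled out in the text above. One standard approach, offered here as an assumption rather than as the slides' own algorithm, is eager relaying: on first receipt of m, a process forwards m to everyone and only then delivers it. The names `relay_broadcast` and `sends_before_crash` are hypothetical, and every process other than the sender is assumed correct.

```python
from collections import deque


def relay_broadcast(n, sender, sends_before_crash):
    """Eager-relay broadcast: on first receipt, relay m to all, then deliver.

    The sender completes only `sends_before_crash` of its n sends and then
    crashes. Returns the set of correct processes that delivered m.
    """
    crashed = {sender}
    inbox = deque(range(1, sends_before_crash + 1))  # sends that completed
    seen = set()
    delivered = set()
    while inbox:
        p = inbox.popleft()
        if p in crashed or p in seen:
            continue  # crashed processes do nothing; duplicates are ignored
        seen.add(p)
        inbox.extend(range(1, n + 1))  # relay m to every process first
        delivered.add(p)  # then deliver
    return delivered


print(sorted(relay_broadcast(n=5, sender=1, sends_before_crash=2)))  # [2, 3, 4, 5]
```

In this model, either no correct process delivers m or all of them do, at the cost of every process sending n messages.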

Page 5: Failure Detectors

Main Requirements

• Accuracy
  – When a process is suspected to have failed, it actually has

• Completeness
  – When a process fails, it is suspected

• Assumption in this work: no repairs are possible

Page 6: Failure Detectors

Different Completeness Requirements

• Strong completeness
  – Eventually every process that crashes is permanently suspected by all correct processes

• Weak completeness
  – Eventually every process that crashes is permanently suspected by some correct process
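The difference between the two definitions can be checked mechanically on a finished run. The sketch below is a finite-run approximation: it treats the final suspect sets as the "permanent" ones and quantifies over correct processes only. The function names and the data layout are assumptions made for illustration.

```python
def strong_completeness(final_suspects, crashed, correct):
    """Every crashed process is suspected by every correct process."""
    return all(c in final_suspects[p] for c in crashed for p in correct)


def weak_completeness(final_suspects, crashed, correct):
    """Every crashed process is suspected by at least one correct process."""
    return all(any(c in final_suspects[p] for p in correct) for c in crashed)


# Process 3 has crashed; only process 1 suspects it.
final_suspects = {1: {3}, 2: set()}
print(strong_completeness(final_suspects, crashed={3}, correct={1, 2}))  # False
print(weak_completeness(final_suspects, crashed={3}, correct={1, 2}))    # True
```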

Page 7: Failure Detectors

Different Accuracy Requirements

• Strong accuracy
  – No process is suspected before it crashes

• Weak accuracy
  – Some correct process is never suspected

• Eventual strong accuracy
  – There is a time (unknown to the processes themselves) after which no process is suspected before it crashes

• Eventual weak accuracy
  – There is a time (unknown to the processes themselves) after which some correct process is never suspected
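A small trace checker can make the four accuracy properties concrete. This is a sketch under stated assumptions: `trace[t][p]` is the set that process p suspects at time t, `crash_time` maps each crashed process to its crash time, and "eventually" is approximated as "on some non-empty suffix of the finite trace". None of these names come from the slides.

```python
def strong_accuracy(trace, crash_time, start=0):
    """No process is suspected before it crashes (from time `start` on)."""
    for t in range(start, len(trace)):
        for suspects in trace[t].values():
            for q in suspects:
                if q not in crash_time or t < crash_time[q]:
                    return False
    return True


def weak_accuracy(trace, crash_time, start=0):
    """Some correct process is never suspected (from time `start` on)."""
    correct = [p for p in trace[0] if p not in crash_time]
    return any(
        all(c not in suspects for snap in trace[start:] for suspects in snap.values())
        for c in correct
    )


def eventually(check, trace, crash_time):
    """Finite-run stand-in for 'there is a time after which `check` holds'."""
    return any(check(trace, crash_time, start=t) for t in range(len(trace)))


crash_time = {3: 2}  # process 3 crashes at time 2
trace = [
    {1: {2}, 2: set(), 3: set()},  # t=0: process 1 wrongly suspects 2
    {1: set(), 2: {3}, 3: set()},  # t=1: process 2 suspects 3 too early
    {1: {3}, 2: {3}},              # t=2: 3 has crashed and is suspected
]
print(strong_accuracy(trace, crash_time))              # False
print(weak_accuracy(trace, crash_time))                # True
print(eventually(strong_accuracy, trace, crash_time))  # True
```

A finite trace can refute the non-eventual properties, but it can only suggest that the eventual ones hold.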

Page 8: Failure Detectors

Classification of Failure Detectors

Completeness   Accuracy
               Strong       Weak       Eventually Strong        Eventually Weak
Strong         Perfect P    Strong S   Eventually Perfect ◇P    Eventually Strong ◇S
Weak           Q            Weak W     ◇Q                       Eventually Weak ◇W

Page 9: Failure Detectors

Reducibility of Detectors

• Given a failure detector P, can we implement Q?

• Given a failure detector Q, can we implement P?

Page 10: Failure Detectors

Reducibility of Detectors

[Figure: failure detector D → transformation algorithm T(D→D’) → emulated failure detector D’]

Page 11: Failure Detectors

Reducibility of Detectors

Repeat forever
    { p queries its local failure detector D_p }
    suspect_p = D_p
    send (p, suspect_p) to all
[]
When receive (q, suspect_q)
    output_p = (output_p ∪ suspect_q) – { q }
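The loop above can be simulated in lock-step rounds to watch weak completeness turn into strong completeness. This is a minimal sketch, assuming that each local detector returns a fixed set, that every message sent in a round is received in that round, and that output sets start empty. `run_transformation` and `local_detector` are hypothetical names.

```python
def run_transformation(processes, crashed, local_detector, rounds=1):
    """Simulate the gossip transformation; return output[p] for correct p.

    local_detector[p] is the fixed set that D_p returns when p queries it.
    """
    correct = [p for p in processes if p not in crashed]
    output = {p: set() for p in correct}
    for _ in range(rounds):
        # Each correct p queries D_p and sends (p, suspect_p) to all.
        messages = [(p, set(local_detector[p])) for p in correct]
        # Each correct p handles every received (q, suspect_q).
        for p in correct:
            for q, suspect_q in messages:
                output[p] = (output[p] | suspect_q) - {q}
    return output


# Process 4 has crashed, but only process 1's local detector suspects it.
output = run_transformation(
    processes=[1, 2, 3, 4],
    crashed={4},
    local_detector={1: {4}, 2: set(), 3: set()},
)
print(output)  # {1: {4}, 2: {4}, 3: {4}}
```

After one round every correct process suspects process 4. Removing q on receipt of a message from q is what discards a suspicion of a process that is evidently still alive.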

Page 12: Failure Detectors

Reducibility of Detectors

Completeness   Accuracy
               Strong       Weak       Eventually Strong        Eventually Weak
Strong         Perfect P    Strong S   Eventually Perfect ◇P    Eventually Strong ◇S
Weak           Q            Weak W     ◇Q                       Eventually Weak ◇W

Page 13: Failure Detectors

Solving Consensus with Weak Failure Detector S

Phase 1
    for x = 1 to n – 1
        report the new votes you learnt in the previous round
        wait until you receive votes from everyone you do not suspect to have failed
    end for

Phase 2
    report all the votes you have learnt
    wait until you receive votes from everyone you do not suspect to have failed

Phase 3
    Consider only those votes that are known to everyone
    Choose the vote of the smallest ID process as the decision
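The three phases can be exercised in a lock-step simulation. This is a sketch under simplifying assumptions that are not in the slides: rounds are synchronous, a crashed process is suspected by everyone from the round in which it crashes, and crashes happen only during Phase 1. The name `consensus_with_S` and the `crash_plan` format are hypothetical.

```python
def consensus_with_S(votes, crash_plan=None):
    """votes: pid -> proposed value.

    crash_plan: pid -> (round, recipients); pid crashes during that Phase 1
    round after sending only to `recipients`. Returns pid -> decision for
    the surviving processes.
    """
    crash_plan = crash_plan or {}
    pids = sorted(votes)
    n = len(pids)
    alive = set(pids)
    known = {p: {p: votes[p]} for p in pids}  # all votes p has learnt
    new = {p: {p: votes[p]} for p in pids}    # votes learnt in the last round

    # Phase 1: n - 1 rounds of reporting newly learnt votes.
    for r in range(1, n):
        outbox = []
        for p in sorted(alive):
            crashing = p in crash_plan and crash_plan[p][0] == r
            recipients = crash_plan[p][1] if crashing else pids
            outbox.append((recipients, dict(new[p])))
        alive -= {p for p in crash_plan if crash_plan[p][0] == r}
        learnt = {p: {} for p in alive}
        for recipients, report in outbox:
            for q in recipients:
                if q in alive:
                    for pid, value in report.items():
                        if pid not in known[q]:
                            learnt[q][pid] = value
        for p in alive:
            known[p].update(learnt[p])
            new[p] = learnt[p]

    # Phase 2: report all learnt votes; keep those known to every reporter.
    common = {p: set(known[p]) for p in alive}
    for p in alive:
        for q in alive:
            common[p] &= set(known[q])

    # Phase 3: decide the vote of the smallest-ID process among those.
    return {p: known[p][min(common[p])] for p in alive}


votes = {1: "a", 2: "b", 3: "c"}
# Process 1 crashes in round 1 after reaching only process 2.
print(consensus_with_S(votes, {1: (1, [2])}))  # {2: 'a', 3: 'a'}
```

In this run process 3 learns process 1's vote only through process 2's report in the second round, which illustrates why Phase 1 runs for n – 1 rounds.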

Page 14: Failure Detectors

Solving Consensus with Weak Failure Detector ◇S

• Assume that the number of failed processes is strictly less than n/2

• Round-based computation
  – Coordinator in round x is (x mod n) + 1

• The coordinator is just a process that follows a protocol that differs slightly from the others

• Otherwise, there are no other assumptions about it
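The rotation rule and the failure bound are easy to tabulate. The helper names below are hypothetical; the formulas are the ones stated above, with processes numbered 1..n.

```python
def coordinator(x, n):
    """Coordinator of round x among processes 1..n."""
    return (x % n) + 1


def max_failures(n):
    """Largest number of failures that is strictly less than n/2."""
    return (n - 1) // 2


print([coordinator(x, 3) for x in range(1, 7)])  # [2, 3, 1, 2, 3, 1]
print([max_failures(n) for n in (3, 4, 5)])      # [1, 1, 2]
```

Because the coordinator role rotates through all n processes, every correct process is coordinator once in any n consecutive rounds.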

Page 15: Failure Detectors

Solving Consensus with Weak Failure Detector ◇S

• In each round
  – Phase 1
    • Send your estimate to the coordinator

  – Phase 2: at coordinator
    • Wait until at least (n+1)/2 messages are received
    • Use them to choose a tentative decision
    • Send the tentative decision to all

  – Phase 3
    • Wait until the tentative decision is received from the coordinator or the coordinator is suspected
      – In the former case, send an ack and revise your estimate to be the tentative decision
      – In the latter case, send a nack

  – Phase 4: at coordinator
    • If (n+1)/2 acks are received, then make a final decision and send it using reliable broadcast

Page 16: Failure Detectors

Solving Consensus with Weak Failure Detector ◇S

• Upon receiving the reliable broadcast message
  – Decide on the value proposed in it
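A single round of the coordinator protocol can be sketched as a function from one state to the next. The slides do not say how the coordinator picks its tentative decision; the sketch assumes the timestamp rule from Chandra and Toueg's published algorithm, where each estimate carries the round in which it was last adopted and the coordinator picks the estimate with the largest timestamp. `run_round`, `state` and `suspecting` are hypothetical names, and message passing is collapsed into direct reads.

```python
def run_round(x, state, crashed=frozenset(), suspecting=frozenset()):
    """Run round x. state: pid -> (estimate, timestamp).

    suspecting: processes whose detector suspects the coordinator this round.
    Returns (decision or None, new state).
    """
    n = len(state)
    c = (x % n) + 1
    majority = n // 2 + 1  # smallest integer that is at least (n+1)/2
    live = [p for p in sorted(state) if p not in crashed]
    new_state = dict(state)
    if c in crashed:
        return None, new_state  # everyone ends up suspecting c and nacks

    # Phases 1 and 2: the coordinator collects estimates from a majority.
    reports = [state[p] for p in live]
    if len(reports) < majority:
        raise ValueError("fewer than a majority of processes are alive")
    tentative = max(reports, key=lambda r: r[1])[0]  # assumed timestamp rule

    # Phase 3: ack and adopt, or nack if the coordinator is suspected.
    acks = 0
    for p in live:
        if p != c and p in suspecting:
            continue  # nack: keep the old estimate
        new_state[p] = (tentative, x)
        acks += 1

    # Phase 4: decide once a majority has acked.
    return (tentative if acks >= majority else None), new_state


state = {1: ("a", 0), 2: ("b", 0), 3: ("c", 0)}
d1, state = run_round(1, state, suspecting={1, 3})  # 1 and 3 wrongly suspect 2
d2, state = run_round(2, state)
print(d1, d2)  # None a
```

Round 1 fails because only the coordinator acks, and round 2 succeeds once nobody suspects the coordinator. That matches eventual weak accuracy: wrong suspicions may block some rounds, but eventually a round has a coordinator that no correct process suspects.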

Page 17: Failure Detectors

Other Results

• ◇S (or ◇W) is the weakest failure detector that can be used for solving consensus

• P is the weakest failure detector that can be used to solve leader election
  – The goal of the proposed survey in this area is to study this issue further.
