UPV / EHU
Distributed Algorithms for Failure Detection in Crash Environments
R. Cortiñas, A. Lafuente, M. Larrea
Distributed Systems Group, University of the Basque Country UPV/EHU
Master SIA – Distributed Systems
Guest Stars: ◇P, ◇S and Omega
• ◇P: strong completeness, eventual strong accuracy
– Eventually every process that crashes is permanently suspected by every correct process
– There is a time after which correct processes are not suspected by any correct process
• ◇S: strong completeness, eventual weak accuracy
– There is a time after which some correct process is never suspected by any correct process
• Omega: eventual leader election
– There is a time after which all the correct processes always trust the same correct process
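The properties above can be made concrete with a small check. The following Python sketch is only an illustration. It assumes a single snapshot of the detectors' outputs taken after the system has stabilized, because "eventual" properties hold from some unknown time onward and cannot be decided on a finite prefix of a run. All function and parameter names are hypothetical.

```python
from typing import Dict, Set


def strong_completeness(suspects: Dict[int, Set[int]],
                        correct: Set[int], crashed: Set[int]) -> bool:
    """Every crashed process is suspected by every correct process."""
    return all(crashed <= suspects[p] for p in correct)


def strong_accuracy(suspects: Dict[int, Set[int]], correct: Set[int]) -> bool:
    """No correct process is suspected by any correct process."""
    return all(not (suspects[p] & correct) for p in correct)


def weak_accuracy(suspects: Dict[int, Set[int]], correct: Set[int]) -> bool:
    """Some correct process is suspected by no correct process."""
    return any(all(q not in suspects[p] for p in correct) for q in correct)


def omega_agreement(trusted: Dict[int, int], correct: Set[int]) -> bool:
    """All correct processes trust the same process, and it is correct."""
    leaders = {trusted[p] for p in correct}
    return len(leaders) == 1 and leaders <= correct
```

On a stable snapshot, the first two checks together correspond to the first class on the slide, the first and third to the second class, and the last one to Omega.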
The First ◇P Algorithm [CT96]
Communication Optimality
A ring arrangement of processes
[Figure: six processes, p1 to p6, arranged in a ring]
• Communication-efficient algorithms: n links are used forever (n: total number of processes)
• Communication-optimal algorithms: C links are used forever (C: number of correct processes)
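To make the two link counts concrete, here is a minimal Python sketch. It assumes that n is the total number of processes, that C is the number of correct ones, and that a link is a directed pair (sender, receiver) between ring neighbours. `ring_links` is a hypothetical helper used only for counting; it is not part of the presented algorithms.

```python
from typing import FrozenSet, List, Tuple


def ring_links(ring: List[str],
               crashed: FrozenSet[str] = frozenset()) -> List[Tuple[str, str]]:
    """Directed links of the ring once crashed processes are skipped."""
    alive = [p for p in ring if p not in crashed]
    if len(alive) < 2:
        return []  # a single process has nobody to talk to
    # Each alive process is linked to the next alive process, wrapping around.
    return [(alive[i], alive[(i + 1) % len(alive)]) for i in range(len(alive))]


ring = ["p1", "p2", "p3", "p4", "p5", "p6"]
full = ring_links(ring)                                 # n = 6 links
reduced = ring_links(ring, frozenset({"p3", "p5"}))     # C = 4 links
```

With two crashed processes the ring closes over the four survivors, so the number of links used forever drops from n to C.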
Communication-optimal ◇P
Communication-optimal Omega
• We also propose an optimal implementation of ◇S, the weakest failure detector for solving Consensus:
– processes ordered: p1, ..., pn
– heartbeat strategy
– communication pattern: one-to-successors
– based on a trusted process (instead of a list of suspected processes)
i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1
[Figure: processes p1 to p5; trusted1 = trusted2 = trusted3 = trusted4 = trusted5 = p1]
ii) If a process does not receive a message within some timeout period from its trusted process pi, then it suspects pi and takes the next process pi+1 as its new trusted process
[Figure: p4 times out on p1 and sets trusted4 = p2; trusted1 = trusted2 = trusted3 = trusted5 = p1]
iii) If a process trusts itself, then it starts sending messages periodically to its successors
[Figure: p2 times out on p1 and sets trusted2 = p2; trusted1 = p1, trusted3 = p1, trusted4 = p2, trusted5 = p1]
iv) If a process receives a message from a process pi preceding its trusted process, then it will trust pi again, increasing its timeout period with respect to pi
[Figure: p2 and p4 receive a message from p1, set trusted2 = p1 and trusted4 = p1, and apply timeout_period21++ and timeout_period41++; trusted1 = p1, trusted3 = p2, trusted5 = p1]
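Steps i) to iv) can be sketched as the local state of one process. This is a minimal Python illustration under stated assumptions, not the authors' code: the class and method names are hypothetical, processes are identified by their index 1..n, message transport and timers are left out, and "increasing the timeout period" is modelled as adding one abstract time unit.

```python
class OmegaProcess:
    """Local state of process p_pid among p_1..p_n (hypothetical sketch)."""

    def __init__(self, pid: int, n: int, initial_timeout: int = 1):
        self.pid = pid
        self.n = n
        self.trusted = 1  # step i: every process initially trusts p1
        # One timeout period per monitored process, in abstract time units.
        self.timeout = {q: initial_timeout for q in range(1, n + 1)}

    def trusts_itself(self) -> bool:
        """Steps i and iii: only a self-trusting process sends heartbeats."""
        return self.trusted == self.pid

    def successors(self) -> list:
        """Destinations of the one-to-successors communication pattern."""
        return list(range(self.pid + 1, self.n + 1))

    def on_timeout(self) -> None:
        """Step ii: no message from the trusted process in time."""
        if self.trusted < self.pid:  # a process never times out on itself
            self.trusted += 1

    def on_message(self, sender: int) -> None:
        """Step iv: a message from a process preceding the trusted one."""
        if sender < self.trusted:
            self.trusted = sender
            self.timeout[sender] += 1  # be more patient with it next time


# Replaying the example in the figures with five processes.
p2, p4 = OmegaProcess(2, 5), OmegaProcess(4, 5)
p4.on_timeout()   # trusted4 = p2
p2.on_timeout()   # trusted2 = p2, so p2 now sends to p3, p4 and p5
p2.on_message(1)  # trusted2 = p1 again, with a longer timeout on p1
p4.on_message(1)  # trusted4 = p1 again, with a longer timeout on p1
```

Growing the timeout on every mistaken suspicion is what lets the number of mistakes stay finite once message delays are bounded, even though the bound itself is unknown.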
• Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p1, ..., pn
• This property trivially allows us to provide the properties of ◇S:
– Eventual weak accuracy: by not suspecting the trusted process
– Strong completeness: by suspecting all the processes except the trusted process
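The lemma and the derivation of a suspect list can be illustrated with a toy simulation. The Python sketch below makes strong simplifying assumptions that are not in the slides: execution proceeds in synchronous rounds, messages are never late (so timeout adaptation is not modelled), and all crashes happen before the first round. The function names `run_rounds` and `suspected` are hypothetical.

```python
from typing import Dict, Set


def run_rounds(n: int, crashed: Set[int], rounds: int) -> Dict[int, int]:
    """Return the trusted process of each correct process after some rounds."""
    alive = [p for p in range(1, n + 1) if p not in crashed]
    trusted = {p: 1 for p in alive}  # everyone starts by trusting p1
    for _ in range(rounds):
        # Processes that trust themselves send to all their successors.
        senders = [p for p in alive if trusted[p] == p]
        for p in alive:
            heard = [s for s in senders if s < p]
            if heard and min(heard) <= trusted[p]:
                trusted[p] = min(heard)  # keep or return to an earlier process
            elif trusted[p] < p:
                trusted[p] += 1          # timeout: move to the next process
    return trusted


def suspected(n: int, trusted_process: int) -> Set[int]:
    """Suspect list built from the trusted process: everyone else."""
    return set(range(1, n + 1)) - {trusted_process}


result = run_rounds(n=5, crashed={1, 2}, rounds=5)
leader = result[3]
suspects = suspected(5, leader)
```

In this run p3 is the first correct process, so all survivors end up trusting it. The derived suspect list contains the crashed p1 and p2 (strong completeness) and never contains p3 (eventual weak accuracy); it also contains the correct p4 and p5, which the weak accuracy property permits.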