Download - SD T 05 ToleranciaFalhas.pdf
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
1/51
Distributed SystemsPrinciples and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer ScienceRoom R4.20, [email protected]
Chapter 08: Fault Tolerance
Version: December 2, 2010
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
2/51
Contents
Chapter01: Introduction02: Architectures03: Processes04: Communication
05: Naming06: Synchronization07: Consistency & Replication08: Fault Tolerance09: Security10: Distributed Object-Based Systems11: Distributed File Systems12: Distributed Web-Based Systems13: Distributed Coordination-Based Systems
2/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
3/51
Fault Tolerance
Introduction
Basic conceptsProcess resilienceReliable client-server communicationReliable group communicationDistributed commitRecovery
3/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
4/51
Fault Tolerance 8.1 Introduction
Dependability
BasicsA component provides services to clients . To provide services, thecomponent may require the services from other components a componentmay depend on some other component.
SpecicallyA component C depends on C if the correctness of C s behavior depends onthe correctness of C s behavior. Note : components are processes orchannels.
Availability Readiness for usageReliability Continuity of service deliverySafety Very low probability of catastrophesMaintainability How easy can a failed system be repaired
4/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
5/51
Fault Tolerance 8.1 Introduction
Terminology
Subtle differencesFailure: When a component is not living up to its specications, a failureoccursError: That part of a components state that can lead to a failureFault: The cause of an error
What to do about faults
Fault prevention: prevent the occurrence of a faultFault tolerance: build a component such that it can mask the presence offaultsFault removal: reduce presence, number, seriousness of faultsFault forecasting: estimate present number, future incidence, andconsequences of faults
5/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
6/51
Fault Tolerance 8.1 Introduction
Failure models
Failure semantics
Crash failures: Component halts, but behaves correctly before haltingOmission failures: Component fails to respondTiming failures: Output is correct, but lies outside a specied real-time
interval ( performance failures: too slow)Response failures: Output is incorrect (but can at least not be accountedto another component)
Value failure: Wrong value is producedState transition failure: Execution of component brings it into awrong state
Arbitrary failures: Component produces arbitrary output and be subjectto arbitrary timing failures
6/65
l l d
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
7/51
Fault Tolerance 8.1 Introduction
Crash failures
Problem
Clients cannot distinguish between a crashed component and one that is justa bit slow
Consider a server from which a client is expecting output
Is the server perhaps exhibiting timing or omission failures?Is the channel between client and server faulty?
Assumptions we can make
Fail-silent: The component exhibits omission or crash failures; clientscannot tell what went wrongFail-stop: The component exhibits crash failures, but its failure can bedetected (either through announcement or timeouts)Fail-safe: The component exhibits arbitrary, but benign failures (theycant do any harm)
7/65
F l T l 8 2 P R ili
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
8/51
Fault Tolerance 8.2 Process Resilience
Process resilience
Basic issueProtect yourself against faulty processes by replicating and distributingcomputations in a group.
Flat groups: Good for fault tolerance as information exchangeimmediately occurs with all group members; however, may imposemore overhead as control is completely distributed (hard toimplement).
Hierarchical groups: All communication through a single coordinatornot really fault tolerant and scalable, but relatively easy to
implement.
8/65
Fault Tolerance 8 2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
9/51
Fault Tolerance 8.2 Process Resilience
Process resilience
(a) (b)
Flat group Hierarchical group Coordinator
Worker
9/65
Fault Tolerance 8 2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
10/51
Fault Tolerance 8.2 Process Resilience
Groups and failure masking
K-fault tolerant groupWhen a group can mask any k concurrent member failures ( k is calleddegree of fault tolerance).
How large does a k -fault tolerant group need to be?
Assume crash/performance failure semantics a total of k + 1members are needed to survive k member failures.Assume arbitrary failure semantics, and group output dened by voting
a total of 2 k + 1 members are needed to survive k member failures.
Assumption
All members are identical, and process all input in the same order onlythen are we sure that they do exactly the same thing.
10/65
Fault Tolerance 8 2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
11/51
Fault Tolerance 8.2 Process Resilience
Groups and failure masking
ScenarioGroup members are not identical, i.e., we have a distributedcomputation Nonfaulty group members should reach agreement onthe same value.
1 13
2 2
b b
a ab a
Process 2 tellsdifferent things
Process 3 passesa different value
3
(a) (b)
11/65
Fault Tolerance 8.2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
12/51
Fault Tolerance 8.2 Process Resilience
Groups and failure masking
ScenarioAssuming arbitrary failure semantics, we need 3 k + 1 group membersto survive the attacks of k faulty members. This is also known asByzantine failures .
EssenceWe are trying to reach a majority vote among the group of loyalists, in
the presence of k traitors need 2 k + 1 loyalists.
12/65
Fault Tolerance 8.2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
13/51
Groups and failure masking
1 2
3 4
1
2
2 4
z
41 x
1
4
y
2
1234
Got(Got(Got(Got(
1, 2, x, 41, 2, y, 41 ,2 ,3 ,41, 2, z, 4
))))
1 Got 2 Got 4 Got( ( (( ( (( ( (
1, 1, 1,a, e, 1,1, 1, i,
2, 2, 2,b, f, 2,2, 2, j,
y, x, x,c, g, y,z, z, k,
4 4 4d h 44 4 l
) ) )) ) )) ) )
(a)
(b) (c)
Faulty process
(a) what they send to each other(b) what each one got from the
other(c) what each one got in second
step
13/65
Fault Tolerance 8.2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
14/51
Groups and failure masking
1
23
121
x
y
2
123
Got(Got(Got(
1, 2, x1, 2, y1, 2, 3
)))
1 Got 2 Got( (( (1, 1,a, d,
2, 2,b, e,
y xc f
) )) )
(a)
(b) (c)
Faulty process(a) what they send to each other(b) what each one got from the
other(c) what each one got in second
step
14/65
Fault Tolerance 8.2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
15/51
Groups and failure masking
IssueWhat are the necessary conditions for reaching agreement?
Synchronous
Asynchronous
OrderedUnordered
Bounded
Bounded
Unbounded
Unbounded
Unicast UnicastMulticast Multicast
X X
X
X
X
X
X
X
C omm
uni c
a t i on
d el a
y
P r o c e s s
b e
h a v
i o r
Message ordering
Message transmission
Process: Synchronous operate in lockstepDelays: Are delays on communication bounded?Ordering: Are messages delivered in the order they were sent?Transmission: Are messages sent one-by-one, or multicast?
15/65
Fault Tolerance 8.2 Process Resilience
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
16/51
Failure detection
EssenceWe detect failures through timeout mechanisms
Setting timeouts properly is very difcult and applicationdependentYou cannot distinguish process failures from network failuresWe need to consider failure notication throughout the system:
Gossiping (i.e., proactively disseminate a failure detection)On failure detection, pretend you failed as well
16/65
Fault Tolerance 8.3 Reliable Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
17/51
Reliable communication
So farConcentrated on process resilience (by means of process groups).What about reliable communication channels?
Error detectionFraming of packets to allow for bit error detectionUse of frame numbering to detect packet loss
Error correctionAdd so much redundancy that corrupted packets can beautomatically corrected Request retransmission of lost, or last N packets
17/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
18/51
Fault Tolerance 8.4 Reliable Group Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
19/51
Reliable multicasting
ObservationIf we can stick to a local-area network, reliable multicasting is easy
PrincipleLet the sender log messages submitted to channel c :
If P sends message m , m is stored in a history bufferEach receiver acknowledges the receipt of m , or requestsretransmission at P when noticing message lostSender P removes m from history buffer when everyone has
acknowledged receipt
QuestionWhy doesnt this scale?
24/65
Fault Tolerance 8.4 Reliable Group Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
20/51
Scalable reliable multicasting: Feedback suppression
Basic ideaLet a process P suppress its own feedback when it notices anotherprocess Q is already asking for a retransmission
AssumptionsAll receivers listen to a common feedback channel to whichfeedback messages are submittedProcess P schedules its own feedback message randomly , andsuppresses it when observing another feedback message
25/65
Fault Tolerance 8.4 Reliable Group Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
21/51
Scalable reliable multicasting: Feedback suppression
NACK
NACK
NACK NACK NACKT=3 T=4 T=1 T=2
Sender Receiver Receiver Receiver Receiver
Network
Receivers suppress their feedbackSender receivesonly one NACK
QuestionWhy is the random schedule so important?
26/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
22/51
Fault Tolerance 8.4 Reliable Group Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
23/51
Scalable reliable multicasting: Hierarchical solutions
CC
S(Long-haul) connection
Sender
Coordinator
RootRReceiver
Local-area network
QuestionWhats the main problem with this solution?
28/65
Fault Tolerance 8.4 Reliable Group Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
24/51
Atomic multicast
P1 joins the group P3 crashes P3 rejoins
G = {P1,P2,P3,P4} G = {P1,P2,P4} G = {P1,P2,P3,P4}
Partial multicastfrom P3 is discarded
P1
P2
P3
P4
Time
Reliable multicast by multiplepoint-to-point messages
IdeaFormulate reliable multicasting in the presence of process failures interms of process groups and changes to group membership.
29/65
Fault Tolerance 8.4 Reliable Group Communication
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
25/51
Atomic multicast
P1 joins the group P3 crashes P3 rejoins
G = {P1,P2,P3,P4} G = {P1,P2,P4} G = {P1,P2,P3,P4}
Partial multicastfrom P3 is discarded
P1
P2
P3
P4
Time
Reliable multicast by multiplepoint-to-point messages
GuaranteeA message is delivered only to the nonfaulty members of the currentgroup. All members should agree on the current group membershipVirtually synchronous multicast .
30/65
Fault Tolerance 8.5 Distributed Commit
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
26/51
Distributed commit
Two-phase commitThree-phase commit
Essential issueGiven a computation distributed across a process group, how can weensure that either all processes commit to the nal result, or none ofthem do ( atomicity )?
39/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
27/51
Fault Tolerance 8.5 Distributed Commit
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
28/51
Two-phase commit
ModelThe client who initiated the computation acts as coordinator;processes required to commit are the participants
Phase 1a: Coordinator sends vote-request to participants (alsocalled a pre-write )Phase 1b: When participant receives vote-request it returns eithervote-commit or vote-abort to coordinator. If it sends vote-abort , itaborts its local computationPhase 2a: Coordinator collects all votes; if all are vote-commit , it
sends global-commit to all participants, otherwise it sendsglobal-abort Phase 2b: Each participant waits for global-commit or global-abort and handles accordingly.
40/65
Fault Tolerance 8.5 Distributed Commit
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
29/51
Two-phase commit
COMMIT
INIT
WAIT
ABORT
CommitVote-request
Vote-abortGlobal-abort
Vote-commitGlobal-commit
(a)
COMMIT
INIT
READY
ABORT
Vote-requestVote-commit
Vote-requestVote-abort
Global-abort ACK
Global-commit ACK
(b)
Coordinator Participant
41/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
30/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
31/51
Fault Tolerance 8.5 Distributed Commit
l
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
32/51
2PC Failing participant
Scenario
Participant crashes in state S , and recovers to S
Initial state: No problem: participant was unaware of protocolReady state: Participant is waiting to either commit or abort. Afterrecovery, participant needs to know which state transition it should make
log the coordinators decisionAbort state: Merely make entry into abort state idempotent , e.g.,removing the workspace of resultsCommit state: Also make entry into commit state idempotent, e.g.,copying workspace to storage.
ObservationWhen distributed commit is required, having participants use temporaryworkspaces to keep their results allows for simple recovery in the presence offailures.
42/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
33/51
Fault Tolerance 8.5 Distributed Commit
2PC F ili i i
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
34/51
2PC Failing participant
Scenario
Participant crashes in state S , and recovers to S Initial state: No problem: participant was unaware of protocolReady state: Participant is waiting to either commit or abort. Afterrecovery, participant needs to know which state transition it should make
log the coordinators decisionAbort state: Merely make entry into abort state idempotent , e.g.,removing the workspace of resultsCommit state: Also make entry into commit state idempotent, e.g.,copying workspace to storage.
ObservationWhen distributed commit is required, having participants use temporaryworkspaces to keep their results allows for simple recovery in the presence offailures.
42/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
35/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
36/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
37/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
38/51
Fault Tolerance 8.5 Distributed Commit
3PC Failing participant
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
39/51
3PC Failing participant
Basic issue
Can P nd out what it should it do after crashing in the ready or pre-commitstate, even if other participants or the coordinator failed?
Reasoning
Essence: Coordinator and participants on their way to commit, never differ bymore than one state transitionConsequence: If a participant timeouts in ready state, it can nd out at the
coordinator or other participants whether it should abort, or enterpre-commit state
Observation: If a participant already made it to the pre-commit state, it canalways safely commit (but is not allowed to do so for the sake of failingother processes)
Observation: We may need to elect another coordinator to send off the nalCOMMIT
47/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
40/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
41/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
42/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
43/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
44/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
45/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
46/51
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
47/51
Fault Tolerance 8.6 Recovery
Coordinated checkpointing
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
48/51
p g
Simple solutionUse a two-phase blocking protocol:
A coordinator multicasts a checkpoint request messageWhen a participant receives such a message, it takes acheckpoint, stops sending (application) messages, and reportsback that it has taken a checkpointWhen all checkpoints have been conrmed at the coordinator, thelatter broadcasts a checkpoint done message to allow allprocesses to continue
ObservationIt is possible to consider only those processes that depend on therecovery of the coordinator, and ignore the rest
57/65
Fault Tolerance 8.6 Recovery
Coordinated checkpointing
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
49/51
p g
Simple solutionUse a two-phase blocking protocol:
A coordinator multicasts a checkpoint request messageWhen a participant receives such a message, it takes acheckpoint, stops sending (application) messages, and reportsback that it has taken a checkpointWhen all checkpoints have been conrmed at the coordinator, thelatter broadcasts a checkpoint done message to allow allprocesses to continue
ObservationIt is possible to consider only those processes that depend on therecovery of the coordinator, and ignore the rest
57/65
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
50/51
Fault Tolerance 8.6 Recovery
Coordinated checkpointing
-
8/2/2019 SD T 05 ToleranciaFalhas.pdf
51/51
Simple solutionUse a two-phase blocking protocol:
A coordinator multicasts a checkpoint request messageWhen a participant receives such a message, it takes acheckpoint, stops sending (application) messages, and reports
back that it has taken a checkpointWhen all checkpoints have been conrmed at the coordinator, thelatter broadcasts a checkpoint done message to allow allprocesses to continue
ObservationIt is possible to consider only those processes that depend on therecovery of the coordinator, and ignore the rest
57/65