snapshot algorithm

Snapshot Algorithm. A paper by K. Mani Chandy and Leslie Lamport. Presented by Einat Zuker.

Upload: waldo

Post on 24-Feb-2016


What is a Snapshot - intuition

Given a system of processors and communication channels between them, we want each processor to have a "picture" of the global system state.

Each processor, however, can only take a small picture of the global system (only of itself).

But if we put together all the small pictures, we have a complete description of the global state of the system.

The big picture we put together must be meaningful and informative to be called a snapshot of the system.

Snapshot - why do we want it

Stability detection

A stable system - if the system holding a certain property in a given state means that all the possible next states of the system will hold that property too, then we can call the system stable (with respect to that property).

Examples of stability:
- Deadlock
- No tokens in a token ring
- Computation has terminated

The distributed system model

Representation - a directed graph.
- Vertices represent the processors
- Edges represent the communication channels

Assumptions:
- No synchronization (no clocks)
- Channels have infinite buffers
- Channels are error-free
- Channels deliver messages in the order sent (FIFO)
- A message in a channel can be delayed for an arbitrary but finite time (all messages will eventually arrive at their destination)

The distributed system model - Definitions

State of a channel - the sequence of messages sent along the channel, excluding the messages received along the channel.

State of a processor - a single element of some finite set.

Example, for a channel c from processor p to processor q:
- no messages sent: the state of c is empty
- processor p sent M1: the state of c is M1
- processor p then sent M2: the state of c is M2 M1 (tail on the left, head on the right)

[Figure: p connected to q by channel c, one diagram per case.]
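The channel-state definition above can be illustrated with a small sketch. The `Channel` class and its method names are hypothetical, introduced only for this illustration; the state is listed head first, which is the reverse of the tail-first notation used on the slide.

```python
from collections import deque


class Channel:
    """Hypothetical FIFO channel: its state is the messages sent but not yet received."""

    def __init__(self):
        self._queue = deque()

    def send(self, message):
        # A sent message is added at the tail of the channel.
        self._queue.append(message)

    def receive(self):
        # A received message is removed from the head of the channel.
        return self._queue.popleft()

    def state(self):
        # Head first: the next message to be received comes first.
        return list(self._queue)


c = Channel()
states = [c.state()]      # no messages sent
c.send("M1")
states.append(c.state())  # p sent M1
c.send("M2")
states.append(c.state())  # p sent M2
c.receive()
states.append(c.state())  # q received M1
print(states)
```

Receiving a message shortens the state again, which is the "excluding the messages received" part of the definition.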


The distributed system model - Definitions (cont'd)

Event - an event e is the tuple <p, s, s', c, M> where:
- p - the processor in which the event occurs
- s - the state of p before the event
- s' - the state of p after the event
- c - the channel whose state was changed by the event (can be null)
- M - the message sent (or received) by p through the channel c (can be null)

Less formally: an event is an atomic action of a processor p that may change the state of p and the state of at most one channel connected to p.

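Example - the single token conservation system

The system properties:
- two processors (p and q), two communication channels (c from p to q, c' from q to p), one token
- processor states: s0 - no token, s1 - has token
- initial state of p: s1; initial state of q: s0; initial state of the channels: empty
- events in the system are the sends and receives of the token, for example p sending the token along c (p moves from s1 to s0) or q receiving it from c (q moves from s0 to s1)

[Figure: the system before and after events e1 and e2, with the token moving from a processor into a channel and on to the other processor.]

The distributed system model - Definitions (cont'd)

Global state - the set of the processors' states and the channels' states.

Initial global state - a global state in which each processor is in its initial state and each channel is empty.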

Next(S, e) - a function whose value is the global state immediately after the occurrence of the event e in the global state S.
- next(S, e) is defined only if event e can occur in the global state S.
- for a global state S and an event e = <p, s, s', c, M>, if next(S, e) = S' then:
  - the state of p in S' is s'
  - the state of the channel c in S' is its state in S with the message M added to its tail or removed from its head

Example - the single token conservation system

The possible global states of the single token conservation system:
- S0: p is in s1, q is in s0, both channels empty (token in p)
- S1: p is in s0, q is in s0 (token in c)
- S2: p is in s0, q is in s1 (token in q)
- S3: p is in s0, q is in s0 (token in c')

The events connect the states in a cycle:
- next(S0, e0) = S1
- next(S1, e1) = S2
- next(S2, e2) = S3
- next(S3, e3) = S0

The distributed system model - Definitions (cont'd)

Computation of the system - a sequence of events in the system.

More formally: given a sequence of events seq = (e0, e1, ..., ei, ..., en), seq is a computation of the system iff, for every i, event ei can occur in state Si and next(Si, ei) = Si+1 (S0 is the initial global state).

In the previous example, the computation of the system was (e0, e1, e2, e3), but the sequence (e0, e2) cannot be a computation.

The Algorithm requirements

The snapshot algorithm must run concurrently with the system computation.

The snapshot algorithm cannot alter the computation in any way.

Any messages sent for recording purposes must not interfere with the computation of the system.
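Returning to the definitions of next(S, e) and of a computation given earlier, here is a minimal sketch for the single token conservation system. The `Event` fields, the explicit `kind` field ("send" or "recv"), the tuple layout of a global state and the channel names "c" and "c'" are all assumptions made for this illustration, not notation from the paper.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    proc: str                # p: processor in which the event occurs
    before: str              # s: state of p before the event
    after: str               # s': state of p after the event
    channel: Optional[str]   # c: channel changed by the event (may be None)
    message: Optional[str]   # M: message sent or received (may be None)
    kind: str                # "send" or "recv" (made explicit here; an assumption)


# Global state: (state of p, state of q, contents of c, contents of c'), head first.
S0 = ("s1", "s0", (), ())


def next_state(S, e):
    """Return next(S, e), or None if e cannot occur in S."""
    procs = {"p": S[0], "q": S[1]}
    chans = {"c": S[2], "c'": S[3]}
    if procs[e.proc] != e.before:
        return None
    if e.channel is not None:
        content = chans[e.channel]
        if e.kind == "send":
            chans[e.channel] = content + (e.message,)   # add M at the tail
        elif content and content[0] == e.message:
            chans[e.channel] = content[1:]              # remove M from the head
        else:
            return None
    procs[e.proc] = e.after
    return (procs["p"], procs["q"], chans["c"], chans["c'"])


def is_computation(events, start=S0):
    """Check that every event can occur in the state reached so far."""
    S = start
    for e in events:
        S = next_state(S, e)
        if S is None:
            return False
    return True


e0 = Event("p", "s1", "s0", "c", "token", "send")
e1 = Event("q", "s0", "s1", "c", "token", "recv")
e2 = Event("q", "s1", "s0", "c'", "token", "send")
e3 = Event("p", "s0", "s1", "c'", "token", "recv")

print(is_computation([e0, e1, e2, e3]))  # True
print(is_computation([e0, e2]))          # False: q has no token to send after e0
```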

Snapshot Algorithm - first idea

Each processor adds its state to the recorded snapshot at some point of the computation (let's assume we can also see the channels' states and record them in the same fashion).

What can happen?

First idea - the problem

Let's take a look at the single token conservation system:
- the system is in global state S0 - token in p.
- p decides to record itself (it records s1).
- the system moves to global state S1 - token in c.
- c, c', q decide to record themselves.
- the snapshot received, S*, has p in s1 and q in s0, with the token also recorded in c.

There is no such global state reachable from S0!

First idea - the problem (cont'd)

What happened?
- p was recorded before it sent a message.
- c was recorded after p sent a message.
- the snapshot has too many messages in it.

Let us denote:
- n - number of messages in the channel right before its source was recorded
- n' - number of messages in the channel right before the channel was recorded

In our case: n = 0, n' = 1.

Can we conclude that if n < n' the snapshot is inconsistent?

First idea - the problem (cont'd)

Let's take a look again at the single token conservation system:
- the system is in global state S0 - token in p.
- c decides to record itself (it records the empty sequence).
- the system moves to global state S1 - token in c.
- p, c', q decide to record themselves.
- the snapshot received, S*, has p in s0, q in s0 and both channels empty.

There is no such global state reachable from S0!

First idea - the problem (cont'd)

What happened?
- c was recorded before p sent a message.
- p was recorded after it sent a message.
- we lost messages in the snapshot.

Remember the notation:
- n - number of messages in the channel right before its source was recorded
- n' - number of messages in the channel right before the channel was recorded

In our case: n = 1, n' = 0.

Can we conclude that if n > n' the snapshot is inconsistent?
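The two failures of the first idea can be reproduced by counting tokens in the recorded pieces. This is only a sketch: the dictionary layout and the helper `token_count` are assumptions made for the illustration.

```python
def token_count(snapshot):
    """Count the tokens in a (recorded) global state of the single token system."""
    in_processors = sum(1 for s in (snapshot["p"], snapshot["q"]) if s == "s1")
    in_channels = len(snapshot["c"]) + len(snapshot["c'"])
    return in_processors + in_channels


# The two global states the computation actually went through.
S0 = {"p": "s1", "q": "s0", "c": [], "c'": []}          # token in p
S1 = {"p": "s0", "q": "s0", "c": ["token"], "c'": []}   # token in c

# Case 1: p is recorded in S0; c, c' and q are recorded in S1.
too_many = {"p": S0["p"], "q": S1["q"], "c": S1["c"], "c'": S1["c'"]}

# Case 2: c is recorded in S0; p, c' and q are recorded in S1.
lost = {"p": S1["p"], "q": S1["q"], "c": S0["c"], "c'": S1["c'"]}

print(token_count(S0), token_count(S1))          # 1 1
print(token_count(too_many), token_count(lost))  # 2 0
```

Every real global state holds exactly one token, so a recording with two tokens or none cannot be a state reachable from S0.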

First idea - conclusions

The problem in both cases was that we didn't have a means to monitor the messages that went through the channel while the recording was done.

We need the algorithm to ensure that the snapshot we take reflects the messages passing through the channel.

The snapshot algorithm conditions

Notation: for two processors p, q and a channel c from p to q:
- n - number of messages sent through c before p was recorded
- n' - number of messages sent through c before c was recorded
- m - number of messages received from c before q was recorded
- m' - number of messages received from c before c was recorded

The following conditions are required from the snapshot:
- n = n'
- m = m'
- n' ≥ m'
- n ≥ m
- if n = m, the recorded state of c must be the empty sequence
- if n > m, the recorded state of c must contain the (m+1)-th through n-th messages sent by p along c, that is: [tail] (n), ..., (m+1) [head]
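The last two conditions can be read as a slice over the messages p sent along c. The function name `recorded_channel_state` and its parameters are hypothetical, and the result is listed head first.

```python
def recorded_channel_state(sent, n, m):
    """Return the state c must be recorded with, head first.

    sent: all messages p sent along c, in sending order
    n:    number of messages sent along c before p was recorded
    m:    number of messages received from c before q was recorded
    """
    if not 0 <= m <= n <= len(sent):
        raise ValueError("a consistent snapshot requires 0 <= m <= n")
    # Messages m+1 .. n; the slice is empty when n == m.
    return sent[m:n]


sent = ["M1", "M2", "M3", "M4", "M5", "M6"]
print(recorded_channel_state(sent, n=5, m=2))  # ['M3', 'M4', 'M5']
print(recorded_channel_state(sent, n=3, m=3))  # []
```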

[Figure: messages M1 to M6 in channel c, marking the m-th message (where q was recorded), the n-th message (where p was recorded), and the recording of c as the (m+1)-th through n-th messages.]

The snapshot algorithm conditions (cont'd)

In a less formal way: the recorded state of c must be the sequence of messages sent along c before the state of p is recorded, excluding the sequence of messages received along c before the state of q is recorded.

The algorithm outline

p will send a special message called a marker after the n-th message it sends (and before sending any other message).

q will record the state of channel c. The recorded state will be the messages received by q after q recorded its state and before q received the marker.

q will record its state spontaneously, or immediately after the marker is received, that is, before receiving (or sending) any other messages.

The algorithm creators

K. Mani Chandy

Leslie Lamport

E. W. Dijkstra

The algorithm

Marker-Sending Rule for a processor p: for each channel c directed away from p, p sends one marker along c right after p records its state and before p sends further messages along c.

Marker-Receiving Rule for a processor q: on receiving a marker along a channel c:
- if q has not recorded its state, then q records its state and records the state of c as the empty sequence
- else q records the state of c as the sequence of messages received along c after q's state was recorded and before q received the marker along c

The algorithm - Running example

- p sends the token, then records itself (s0, no token).
- p sends a marker.
- q receives the token, and then receives the marker.
- q records itself (s1, has token) and the incoming channel c (empty).
- q sends a marker.
- p receives the marker. It already recorded itself, so it only needs to record the state of its incoming channel c' (empty).

The snapshot: p = s0 (no token), c = empty, q = s1 (has token), c' = empty.

Some notes about the algorithm

The algorithm can be initiated by one or more processors: each initiating processor records its state spontaneously (without receiving markers from other processors).

The collection of the snapshot pieces from each processor is a topic for a separate discussion. But if we recall the synchronization algorithm for asynchronous systems (with some variations), we can come up with ways to form the big picture at each processor.
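The marker-sending and marker-receiving rules given earlier can be sketched on the running example. This is a minimal illustration, not the authors' implementation: the `Process` class, the `MARKER` value, the method names and the token-specific state changes are all assumptions.

```python
from collections import deque

MARKER = "<marker>"  # hypothetical marker value


class Process:
    def __init__(self, name, state, incoming, outgoing):
        self.name, self.state = name, state
        self.incoming = incoming      # {channel name: deque}
        self.outgoing = outgoing      # {channel name: deque}
        self.recorded_state = None
        self.recording = {}           # channel -> messages seen since recording began
        self.channel_states = {}      # channel -> finished channel recording

    def record(self):
        """Record own state, then send one marker on every outgoing channel."""
        self.recorded_state = self.state
        self.recording = {ch: [] for ch in self.incoming}
        for queue in self.outgoing.values():
            queue.append(MARKER)      # marker-sending rule

    def receive(self, ch):
        msg = self.incoming[ch].popleft()
        if msg == MARKER:             # marker-receiving rule
            if self.recorded_state is None:
                self.record()         # the channel is then recorded as empty
            self.channel_states[ch] = self.recording.pop(ch)
        else:
            if ch in self.recording:  # received after recording, before the marker
                self.recording[ch].append(msg)
            self.state = "s1"         # application logic of the token system

    def send_token(self, ch):
        self.state = "s0"             # application logic of the token system
        self.outgoing[ch].append("token")


c, c_prime = deque(), deque()
p = Process("p", "s1", incoming={"c'": c_prime}, outgoing={"c": c})
q = Process("q", "s0", incoming={"c": c}, outgoing={"c'": c_prime})

p.send_token("c")   # p sends the token,
p.record()          # then records itself and sends a marker
q.receive("c")      # q receives the token
q.receive("c")      # q receives the marker: records itself and c, sends a marker
p.receive("c'")     # p receives the marker: records c'

snapshot = {
    "p": p.recorded_state,
    "q": q.recorded_state,
    "c": q.channel_states["c"],
    "c'": p.channel_states["c'"],
}
print(snapshot)  # {'p': 's0', 'q': 's1', 'c': [], "c'": []}
```

The recorded pieces contain exactly one token, unlike the recordings produced by the first idea.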

Termination of the algorithm

Do we have a snapshot of the system in finite time? That is, do we have a recording of each processor and channel in finite time?

Lemma 1: if there is a path in the system from p to q, and p recorded itself, then q will record itself in finite time.

Proof: if p is directly connected to q, then p will send a marker to q, and q will record itself once the marker arrives, if it has not done so already (remember that all messages sent through a channel reach their destination in finite time). So, if p records its state and there is a path from p to q, then q will record its state in finite time because, by induction, every processor along the path will record its state in finite time and will send a marker on all of its outgoing channels.

Lemma 2: the algorithm terminates in finite time, with a recording of each processor and channel.

Proof:
- all the processors will eventually record their state (spontaneously, or because some other processor recorded itself, as we know from Lemma 1)
- this means every processor will send a marker through all of its outgoing channels, so a marker will be sent through all channels
- once the marker reaches its destination, the channel is recorded. This is true for all channels, since all of them had a marker sent through them
- thus, all the channels are recorded in finite time too
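Lemma 1 can be illustrated by following the markers through the graph. The sketch below assumes every marker is eventually delivered and ignores application messages; the function name `processors_that_record` and the adjacency-list layout are hypothetical.

```python
from collections import deque


def processors_that_record(graph, initiators):
    """Return the processors that record their state.

    graph:      {processor: [processors at the end of its outgoing channels]}
    initiators: processors that record spontaneously
    """
    recorded = set(initiators)
    in_flight = deque((p, q) for p in initiators for q in graph[p])
    while in_flight:
        _, q = in_flight.popleft()       # a marker arrives at q
        if q not in recorded:
            recorded.add(q)              # q records itself ...
            in_flight.extend((q, r) for r in graph[q])  # ... and sends markers
    return recorded


ring = {"p": ["q"], "q": ["r"], "r": ["p"]}
print(sorted(processors_that_record(ring, ["p"])))     # ['p', 'q', 'r']

one_way = {"p": ["q"], "q": [], "r": ["p"]}
print(sorted(processors_that_record(one_way, ["p"])))  # ['p', 'q']
```

In the second graph there is no path from p to r, so r records only if it initiates the algorithm itself.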

Example - non-deterministic system

The system properties:
- two processors: p, q. Two communication channels: c (from p to q) and c' (from q to p)
- p has 2 states {A, B}
- q has 2 states {C, D}
- p can send the message M while in state A. Sending the message causes it to move to state B.
- p can receive the message N while in state B. Receiving the message causes it to move back to state A.
- q works symmetrically to p.

A possible computation of the system, starting from the initial global state S0 (p in A, q in C, both channels empty):
- e0: p sends M; the system goes to S1 (p in B, q in C, M in c)
- e1: q sends N; the system goes to S2 (p in B, q in D, M in c, N in c')
- e2: p receives N; the system goes to S3 (p in A, q in D, M in c)

Note that the computation in this case is not deterministic. For example, from S0 the first event could also have been q sending N.

The algorithm - Running example 2

- the system is in global state S0. p records itself (A) and sends the marker.
- the system goes to global state S1.
- the system goes to global state S2.
- the system goes to global state S3.
- q receives the marker. q records itself (D) and the incoming channel c (empty). q sends the marker.
- p receives the marker. It already recorded itself, so it only needs to record the state of c' (N).

The snapshot: p = A, c = empty, q = D, c' = N.

What is strange in this snapshot?

The non-deterministic example - analysis

The snapshot the algorithm takes is not necessarily a global state the system was in.

So what does the snapshot represent, then?

The answer is that the snapshot is a reachable global state of the system. In addition, if the events were to occur in a different order, the snapshot would be one of the global states reached.

This makes the snapshot consistent with its system.

Theorem

Given:
- seq = (ei, i ≥ 0) - a computation of some system
- Si - the global state of the system before event ei
- Sj - the global state of the system when the algorithm was initiated
- Sk - the global state of the system when the algorithm terminated (0 ≤ j ≤ k)
- S* - the global state the algorithm recorded (the snapshot)

Then there is a computation of the system seq' = (e'i, i ≥ 0) such that:
- for all i, i < j or i ≥ k: e'i = ei
- for all i, i ≤ j or i ≥ k: S'i = Si
- the subsequence (e'i, j ≤ i < k) is a permutation of the subsequence (ei, j ≤ i < k)
- there exists some t, j ≤ t ≤ k, such that S* = S't

[Figure: the sequence seq = e0, e1, ..., ej-1, ej, ..., ei, ..., ek-1, ek, with Sj and Sk marked.]

Proof - definitions

Pre-recording event - an event that occurred in processor p before p recorded its state.

Post-recording event - an event that occurred in processor p after p recorded its state.

Note: for an event ei in seq:
- if i < j then ei is a pre-recording event
- if i ≥ k then ei is a post-recording event

Note: for an event ei in seq such that j < i < k, the event ei-1 can be a post-recording event and the event ei can be a pre-recording event if they occurred in different processors. If they occurred in the same processor and ei-1 is a post-recording event, then both must be post-recording events.

Proof - details

Let's denote ei-1 = <p, s, s', c, M> and ei = <q, t, t', c', M'>.

Let's assume:
- ei-1 is a post-recording event
- ei is a pre-recording event

Can M = M' and c = c'? That is, can q be receiving the message p sent?

The answer is no.

ei-1 is a post-recording event, which means that a marker was sent along c before M was sent. The same marker was received by q before M reached it. When q received the marker it recorded itself, so if ei is the receipt of M along c, it can only be a post-recording event, in contradiction to the fact that ei is a pre-recording event.

We saw that ei-1 and ei are independent of each other. This means we can swap their order in the computation seq: the new computation ..., ei-2, ei, ei-1 ends in the same global state as the original computation ..., ei-2, ei-1, ei.

[Figure: the sequences ej, ej+1, ..., ei-1, ei, ..., ek-1, ek and ej, ej+1, ..., ei, ei-1, ..., ek-1, ek; the global states Si-1 and Si+1 are the same in both, and only the state between the two swapped events may differ.]

Let seq' be the computation obtained from seq by swapping every post-recording event that occurs right before a pre-recording event. We repeat the swapping until seq' has all pre-recording events before all post-recording events.

Note:
- seq' is a computation of the system
- for all i, i < j or i ≥ k: e'i = ei
- for all i, i ≤ j or i ≥ k: S'i = Si

Let's look at the global system state after the last pre-recording event and before the first post-recording event. We denote this state S't (j ≤ t ≤ k).

For some processor p, let us assume the last state p was in before recording is a (that means p recorded a as its state):
- in the global state S't, p is in state a
- in the snapshot S*, the state of p is also a (because p recorded a)

We conclude that the state of each processor in S't is the same as in S*.

For some channel c from p to q:
- in S't, the messages in c are the ones p sent before sending a marker along c (before p recorded itself), without the messages q received before recording itself
- in the snapshot S*, c contains all the messages q received along c after it recorded itself and before it received a marker along c

We conclude that the messages in c in the global state S't and in the snapshot S* are the same.

Proof - conclusions

It is now clear that we have proven our theorem: there is a computation of the system seq' such that:
- for all i, i < j or i ≥ k: e'i = ei
- for all i, i ≤ j or i ≥ k: S'i = Si
- the subsequence (e'i, j ≤ i < k) is a permutation of the subsequence (ei, j ≤ i < k)
- there exists some t, j ≤ t ≤ k, such that S* = S't

Example - permute a computation

Recall the non-deterministic example. The computation we saw was:
- e0 (p sends M): next(S0, e0) = S1, post-recording
- e1 (q sends N): next(S1, e1) = S2, pre-recording
- e2 (p receives N): next(S2, e2) = S3, post-recording

The recorded global state S* was: p = A, c = empty, q = D, c' = N.

Now let's swap the events so that all pre-recording events precede all post-recording events:
- e'0 (q sends N): next(S0, e'0) = S'1 = S*, pre-recording
- e'1 (p sends M): next(S'1, e'1) = S'2, post-recording
- e'2 (p receives N): next(S'2, e'2) = S'3, post-recording

The global state S'1 of this computation is exactly the snapshot of the original computation.

The algorithm - final conclusions

We saw that S't = S*. From this we can see:
- that the snapshot S* is reachable from Sj
- that Sk is reachable from the snapshot S*

We saw that S* could have been a global state of the computation if events had occurred in a different order.

This means the snapshot is indeed valuable and informative when judging the stability of a system.
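The permutation argument can be checked on the non-deterministic example. The sketch below is written under assumptions: a global state is a tuple (state of p, state of q, contents of c, contents of c'), the three event functions are reconstructed from the example (p sends M, q sends N, p receives N), and the snapshot is the one recorded in running example 2.

```python
# Global state: (state of p, state of q, contents of c, contents of c'), head first.
S0 = ("A", "C", (), ())


def p_sends_M(S):
    p, q, c, c2 = S
    assert p == "A"                       # p can send M only in state A
    return ("B", q, c + ("M",), c2)


def q_sends_N(S):
    p, q, c, c2 = S
    assert q == "C"                       # q works symmetrically to p
    return (p, "D", c, c2 + ("N",))


def p_receives_N(S):
    p, q, c, c2 = S
    assert p == "B" and c2[:1] == ("N",)  # N must be at the head of c'
    return ("A", q, c, c2[1:])


def run(events, S=S0):
    """Return the list of global states visited by the computation."""
    states = [S]
    for event in events:
        S = event(S)
        states.append(S)
    return states


snapshot = ("A", "D", (), ("N",))  # S*: p = A, q = D, c empty, c' = N

original = run([p_sends_M, q_sends_N, p_receives_N])  # post, pre, post
permuted = run([q_sends_N, p_sends_M, p_receives_N])  # pre, post, post

print(snapshot in original)          # False: the system was never in S*
print(permuted[1] == snapshot)       # True: S'1 = S*
print(original[-1] == permuted[-1])  # True: both computations end in the same state
```

Both orders start in S0 and end in the same global state, and only the permuted order passes through the recorded snapshot.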

References

Chandy, K. M. and Lamport, L. Distributed Snapshots: Determining Global States of Distributed Systems.

Dijkstra, E. W. The distributed snapshot of K. M. Chandy and L. Lamport.