Project 2 Review (Part 2) Ananth Rao


Post on 16-Dec-2015


Page 1: Project 2 Review (Part 2) Ananth Rao. Overview Stabilize and Notify Join (slides stolen from lecture) Coding Trivia Bootstrapping and debugging

Project 2 Review (Part 2)

Ananth Rao


Overview

• Stabilize and Notify

• Join (slides stolen from lecture)

• Coding Trivia

• Bootstrapping and debugging


Identifier to Node Mapping Example

• Node 8 maps [5, 8]

• Node 15 maps [9, 15]

• Node 20 maps [16, 20]

• …

• Node 4 maps [59, 4]

[Figure: identifier ring with nodes 4, 8, 15, 20, 32, 35, 44, 58]
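The mapping rule above (each node owns the identifiers after its predecessor up to and including itself, wrapping past zero) can be sketched as follows. This is a minimal illustration, not the project's code: `inHalfOpenRange` and `responsibleNode` are hypothetical names, and the node list is the one from the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if id lies in the circular interval (lo, hi].
// A node with predecessor lo and ID hi owns exactly this interval.
bool inHalfOpenRange(int id, int lo, int hi) {
    if (lo < hi) return id > lo && id <= hi;
    return id > lo || id <= hi;  // interval wraps past zero
}

// Hypothetical helper: the node responsible for id is the first node whose
// ID is at or after id, wrapping to the smallest node if there is none.
int responsibleNode(std::vector<int> nodes, int id) {
    assert(!nodes.empty());
    std::sort(nodes.begin(), nodes.end());
    auto it = std::lower_bound(nodes.begin(), nodes.end(), id);
    return it == nodes.end() ? nodes.front() : *it;
}
```

With the ring from the slide, identifier 59 falls past the largest node (58), so it wraps around to node 4.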


Routing

• Each node maintains its successor

• Route packet (ID, data) to the node responsible for ID using successor pointers

[Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58; a send(34, data) request is routed along successor pointers]
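Routing with successor pointers alone can be sketched as a hop-by-hop walk: each node forwards the packet until the ID falls between itself and its successor. The code below is an in-memory illustration under assumed names (`exampleRing`, `routeToOwner`); a real node would forward a packet over the network instead of looking up a map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// True if id lies in the circular interval (lo, hi].
bool inHalfOpenRange(int id, int lo, int hi) {
    if (lo < hi) return id > lo && id <= hi;
    return id > lo || id <= hi;  // interval wraps past zero
}

// The ring from the slides, stored as node ID -> successor ID.
std::map<int, int> exampleRing() {
    return {{4, 8}, {8, 15}, {15, 20}, {20, 32},
            {32, 35}, {35, 44}, {44, 58}, {58, 4}};
}

// Follows successor pointers from `start` until the current node's successor
// is responsible for `id`. Returns that successor, or -1 if the pointers do
// not form a consistent ring.
int routeToOwner(const std::map<int, int>& successorOf, int start, int id) {
    int current = start;
    for (std::size_t hops = 0; hops < successorOf.size(); ++hops) {
        auto it = successorOf.find(current);
        assert(it != successorOf.end());
        int next = it->second;
        if (inHalfOpenRange(id, current, next)) return next;
        current = next;  // forward to the successor
    }
    return -1;
}
```

For example, `routeToOwner(exampleRing(), 4, 34)` walks 4, 8, 15, 20, 32 and stops at node 35, which owns identifier 34 on this ring.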


Stabilize

• Sent to the current successorNode periodically

• “Request” for a notify packet from the successor


Notify

• Sent in reply to the stabilize packet.

• Helps build a list of k-successors at the predecessor.


Stabilize-Notify

• Direct communication only with immediate successor and predecessor

• You receive only “nth-hand” info about the nth successor

• It takes n*STABILIZE_PERIOD for a change in the nth successor to propagate to you
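One way to see why the information is "nth-hand" is to look at how a successor list could be rebuilt from a notify. The sketch below assumes the notify packet carries the successor's own successor list (the slides do not spell out the packet contents); `mergeSuccessorList` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds this node's k-successor list from a notify, ASSUMING the notify
// carries the successor's own list. Entry n of the result was copied from
// entry n-1 of the successor's list, which is why a change at the nth
// successor needs about n stabilize rounds to reach this node.
std::vector<int> mergeSuccessorList(int self, int successor,
                                    const std::vector<int>& successorsList,
                                    std::size_t k) {
    assert(k > 0);
    std::vector<int> merged;
    merged.push_back(successor);
    for (int id : successorsList) {
        if (merged.size() == k) break;
        if (id == self) break;  // small ring: the list wrapped back to us
        merged.push_back(id);
    }
    return merged;
}
```

For example, node 44 with successor 58 and a reported list of {4, 8, 15} would keep {58, 4, 8} when k is 3.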


Dealing with failures

• What happens when successorNode fails?

– Timeout while waiting to receive a notify

– Shift successorNode list by one

• What happens when predecessorNode fails?

– Timeout on receiving a stabilize from the predecessor


Dealing with failures (cont.)

• We use fine-grained timers for detecting successor failures

• We use a coarse-grained timer for detecting a predecessor failure

– Predecessor is not useful for forwarding anyway

– A fine-grained timer is not useful unless we maintain a list of predecessors
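The "shift successorNode list by one" step for a failed successor might look like the sketch below. `dropFailedSuccessor` is a hypothetical name, and falling back to the node itself when the list runs out is an assumption made here for illustration, not something the slides specify.

```cpp
#include <cassert>
#include <vector>

// Called after the notify from successors[0] has timed out: discard the
// failed head so the next entry becomes the new successorNode.
// ASSUMPTION: if no backup is left, the node points at itself.
std::vector<int> dropFailedSuccessor(int self, std::vector<int> successors) {
    assert(!successors.empty());
    successors.erase(successors.begin());
    if (successors.empty()) successors.push_back(self);
    return successors;
}
```

For example, node 44 with the list {58, 4, 8} would continue with {4, 8} and send its next stabilize to node 4.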


Joining Operation

• Node 50 asks node 15 to forward a join message

• When join(50) reaches the destination (i.e., node 58), node 58 returns a notify message to node 50

• Node 50 updates its successor to 58

[Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58 and joining node 50; labels: join(50), notify(58), succ=58]


Joining Operation (cont’d)

• Node 50 sends a stabilize to node 58. The predecessor gets updated at node 58

• Node 44 sends a stabilize message to its successor, node 58

• Node 58 replies with a notify message

• Node 44 updates its successor to 50

[Figure: same ring with node 50; labels: succ=58, stabilize(), notify(predecessor=50), succ=50, pred=50]


Joining Operation (cont’d)

• Node 44 sends a stabilize message to its new successor, node 50

• Node 50 sets its predecessor to node 44

[Figure: same ring with node 50; labels: succ=58, succ=50, Stabilize(), pred=44, pred=50]


Joining Operation (cont’d)

• This completes the joining operation!

[Figure: same ring with node 50; labels: succ=58, succ=50, pred=44, pred=50]
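The join walk-through above can be replayed in memory to check that the pointers end up where the slides say. This is a sketch under assumptions: the handler names, the `Node` struct, and the rule for when to adopt a new predecessor or successor are illustrative choices that reproduce the slide sequence, not the project's required interface.

```cpp
#include <cassert>

constexpr int kNoNode = -1;  // hypothetical marker for "not known yet"

struct Node {
    int id;
    int succ;
    int pred;
};

// True if id lies strictly between lo and hi going around the ring.
bool inOpenRange(int id, int lo, int hi) {
    if (lo < hi) return id > lo && id < hi;
    return id > lo || id < hi;  // interval wraps past zero
}

// Receiver side of stabilize: adopt the sender as predecessor if it is
// closer than the current one, then report the predecessor in the notify.
int handleStabilize(Node& receiver, int senderId) {
    assert(senderId != receiver.id);
    if (receiver.pred == kNoNode ||
        inOpenRange(senderId, receiver.pred, receiver.id)) {
        receiver.pred = senderId;
    }
    return receiver.pred;
}

// Sender side, on receiving the notify: if the successor reports a
// predecessor that sits between us and it, that node is our new successor.
void handleNotify(Node& sender, int reportedPred) {
    if (reportedPred != kNoNode &&
        inOpenRange(reportedPred, sender.id, sender.succ)) {
        sender.succ = reportedPred;
    }
}

struct JoinResult {
    int succ44, succ50, pred50, pred58;
};

// Replays the slide sequence for node 50 joining between 44 and 58.
JoinResult simulateJoinOf50() {
    Node n44{44, 58, 35};
    Node n58{58, 4, 44};
    Node n50{50, kNoNode, kNoNode};

    n50.succ = n58.id;                                // join(50) answered by 58
    handleNotify(n50, handleStabilize(n58, n50.id));  // 58 sets pred=50
    handleNotify(n44, handleStabilize(n58, n44.id));  // 44 learns succ=50
    handleNotify(n44, handleStabilize(n50, n44.id));  // 50 sets pred=44
    return JoinResult{n44.succ, n50.succ, n50.pred, n58.pred};
}
```

The result matches the final slide: node 44 has succ=50, node 50 has succ=58 and pred=44, and node 58 has pred=50.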


Stabilize-Notify-Join

• Very simple

• Easy to code

• Can handle concurrent joins and failures

– Try a few examples. It may take a few more STABILIZE_PERIODs to converge, but it will eventually converge


Stabilize-Notify-Join (cont.)

• Not easy to understand

– When you get it.. you get it.

• Very hard to debug

• Hard to bootstrap

– Lots of corner cases when there are fewer than k nodes in the ring


Coding Advice

• Checkpoint submissions better than expected :-)

• No major flaws

• Be careful with timers

– “select” returns “no sooner than the requested timeout period”

– Each function call takes time!!

– Be careful when dealing with negative struct timeval values

• More feedback coming soon..

– Watch the newsgroup over the weekend :-(.
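On the negative struct timeval point: because select returns no sooner than requested and every call takes time, "due time minus current time" can come out negative, and select is documented to reject an invalid timeout. A minimal sketch of a clamped subtraction follows; `makeTimeval` and `timeUntil` are hypothetical helper names.

```cpp
#include <cassert>
#include <sys/time.h>

// Hypothetical convenience constructor for a normalized timeval.
timeval makeTimeval(long seconds, long microseconds) {
    timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = microseconds;
    return tv;
}

// Returns due - now, clamped to zero when the due time has already passed,
// so the result is always safe to hand to select() as a timeout.
timeval timeUntil(const timeval& due, const timeval& now) {
    assert(due.tv_usec >= 0 && due.tv_usec < 1000000);
    assert(now.tv_usec >= 0 && now.tv_usec < 1000000);
    timeval remaining;
    remaining.tv_sec = due.tv_sec - now.tv_sec;
    remaining.tv_usec = due.tv_usec - now.tv_usec;
    if (remaining.tv_usec < 0) {  // borrow one second
        remaining.tv_usec += 1000000;
        remaining.tv_sec -= 1;
    }
    if (remaining.tv_sec < 0) {   // already overdue: do not wait at all
        remaining.tv_sec = 0;
        remaining.tv_usec = 0;
    }
    return remaining;
}
```

A zero timeout makes select poll and return immediately, which is the desired behavior for an event that is already late.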


Problems with timers

• After handling the event at the head of the queue..

– Get current time again

– Check the “due time” of the next event in the queue
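The re-check described above can be sketched as a loop that reads the clock again on every pass, since a handler may run long enough for the next event to become due. The event queue layout, the millisecond clock passed in as a function, and the name `runDueEvents` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

struct TimedEvent {
    long long dueMs;               // absolute due time in milliseconds
    std::function<void()> action;  // what to do when the event fires
};

// Orders the priority queue so the earliest due time is at the top.
struct LaterFirst {
    bool operator()(const TimedEvent& a, const TimedEvent& b) const {
        return a.dueMs > b.dueMs;
    }
};

using EventQueue =
    std::priority_queue<TimedEvent, std::vector<TimedEvent>, LaterFirst>;

// Runs every event that is due. Returns the number of milliseconds until
// the next pending event, or -1 if the queue is empty.
long long runDueEvents(EventQueue& queue,
                       const std::function<long long()>& nowMs) {
    assert(nowMs != nullptr);
    while (!queue.empty()) {
        long long now = nowMs();  // get the current time again
        if (queue.top().dueMs > now) return queue.top().dueMs - now;
        TimedEvent event = queue.top();
        queue.pop();
        event.action();           // may take a noticeable amount of time
    }
    return -1;
}
```

The returned value is a remaining duration, so it can be converted directly into the timeout for the next select call.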


Timers for stabilize

• Time out for receiving a notify

• When to send the next stabilize

– Keep track of lastStabilizeSentTime

– Use MIN(lastStabilizeSentTime+STABILIZE_PERIOD-currTime, nextEventDueTime) for timeout to select

– Careful when the successorNode changes
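The MIN(...) rule above can be sketched as follows. The sketch assumes all times are in milliseconds and that nextEventDueTime is an absolute time, so it is converted to a remaining duration before taking the minimum; if your nextEventDueTime is already a duration, skip that subtraction. `selectTimeoutMs` and `toTimeval` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <sys/time.h>

// Time to wait in select: whichever comes first, the next stabilize or the
// next queued event. Clamped at zero so an overdue deadline never produces
// a negative timeout.
long long selectTimeoutMs(long long lastStabilizeSentMs,
                          long long stabilizePeriodMs,
                          long long nowMs,
                          long long nextEventDueMs) {
    assert(stabilizePeriodMs > 0);
    long long untilStabilize = lastStabilizeSentMs + stabilizePeriodMs - nowMs;
    long long untilNextEvent = nextEventDueMs - nowMs;  // ASSUMES absolute due time
    return std::max(0LL, std::min(untilStabilize, untilNextEvent));
}

// Converts a non-negative millisecond count to the timeval select expects.
timeval toTimeval(long long ms) {
    assert(ms >= 0);
    timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}
```

For example, with the last stabilize sent at 1000 ms, a 500 ms period, the clock at 1200 ms, and the next event due at 1450 ms, the timeout is 250 ms.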


Debugging Tips

• Most problems occur when bootstrapping the ring

• Prefer cerr/fprintf debugging to using gdb

– If you set a breakpoint in gdb, every other program on the ring is going to time out for one reason or another

• In the beginning, you may want to increase timers to large values


Testing with lost packets

• With large timeouts

– Use keyboard input to determine whether or not to send a packet

– Make sure STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES+1) * STABILIZE_TIMEOUT

• Use randomized drops with a small drop percentage
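Both checks above can be sketched in a few lines: a compile-time check of the timer inequality and a randomized drop decision to call just before sending. The constant values and the function name `shouldDropPacket` are placeholders chosen for illustration, not values from the project.

```cpp
#include <cassert>
#include <random>

// Placeholder values in milliseconds; substitute the project's own.
constexpr int STABILIZE_PERIOD = 1000;
constexpr int STABILIZE_TIMEOUT = 200;
constexpr int MAX_STABILIZE_RETRIES = 3;

// All retries of one stabilize must fit inside a single period, otherwise
// the next stabilize is due before the previous one has given up.
static_assert(STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES + 1) * STABILIZE_TIMEOUT,
              "stabilize retries do not fit inside one STABILIZE_PERIOD");

// Returns true if the packet should be silently discarded instead of sent.
bool shouldDropPacket(std::mt19937& rng, double dropProbability) {
    assert(dropProbability >= 0.0 && dropProbability <= 1.0);
    std::bernoulli_distribution drop(dropProbability);
    return drop(rng);
}
```

Seeding the generator with a fixed value makes a failing run reproducible, which helps when a lost packet exposes a bug.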


Go step-by-step

• Before implementing join, try to implement stabilize and notify

– Start with a predetermined ring

– Start with only one successor on the command line, but the list should soon grow (because of stabilize-notify)

– Detect failures only (no new nodes)

– Use a large (1s) timeout so you don’t have to start all “chatpeers” at exactly the same time

• Helps get rid of bootstrapping artifacts in the first step