Local-Spin Algorithms
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler
This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman
Remote and local memory accesses
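In a DSM system: an access is local if the variable is stored in the accessing process's own memory segment, and remote otherwise.
In a cache-coherent (CC) system: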
An access of v by p is remote if it is the first access of v or if v has been written by another process since p’s last access of it.
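The cache-coherent rule above can be turned into a small RMR counter. The following is a minimal sketch, not part of the original slides: the class name CCModel and its methods are hypothetical, and the model follows the stated definition literally rather than any particular coherence protocol.

```python
from collections import defaultdict


class CCModel:
    """Counts remote memory references (RMRs) under the CC rule stated above."""

    def __init__(self):
        # variable -> processes that accessed it and have seen no foreign write since
        self.fresh = defaultdict(set)
        # process -> number of remote accesses charged so far
        self.rmrs = defaultdict(int)

    def _access(self, p, v):
        # Remote if this is p's first access of v, or if another process
        # wrote v after p's last access of it.
        if p not in self.fresh[v]:
            self.rmrs[p] += 1
            self.fresh[v].add(p)

    def read(self, p, v):
        self._access(p, v)

    def write(self, p, v):
        self._access(p, v)
        # A write makes every other process's next access of v remote.
        self.fresh[v] = {p}


if __name__ == "__main__":
    model = CCModel()
    for _ in range(1000):          # p spins on v: only the first read is remote
        model.read("p", "v")
    model.write("q", "v")          # q's write invalidates p's copy
    model.read("p", "v")
    print(dict(model.rmrs))
```

In this model a read-only spin loop costs one RMR per write performed by another process, which is why busy waiting on a cached variable counts as local spinning.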
Local-spin algorithms
• In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses that do not cause interconnect traffic.
• The same algorithm may be local-spin on one architecture (DSM or CC) and non-local-spin on the other.
For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs)
Peterson’s 2-process algorithm
Program for process 1
  b[1] := true
  turn := 1
  await (b[0] = false or turn = 0)
  CS
  b[1] := false
Program for process 0
  b[0] := true
  turn := 0
  await (b[1] = false or turn = 1)
  CS
  b[0] := false
Is this algorithm local-spin on a DSM machine? No
Is this algorithm local-spin on a CC machine? Yes
Recall the following simple test-and-set based algorithm
Shared lock initially 0
  while (! lock.test-and-set())     // entry section
  Critical Section
  lock := 0                         // exit section
This algorithm is not local-spin on either a DSM or a CC machine
(An RMW operation always incurs an RMR)
A better algorithm: test-and-test-and-set
Shared lock initially 0
  while (! lock.test-and-set())     // entry section
      await (lock == 0)
  Critical Section
  lock := 0                         // exit section
Creates less traffic in CC machines, but is still not local-spin.
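The test-and-test-and-set idea can be tried out with threads. The following is a minimal sketch under stated assumptions, not the lecture's code: TASBit, TTASLock and run_ttas_demo are hypothetical names, the atomicity of test-and-set is simulated with a threading.Lock, and test_and_set is assumed to return True when it acquires the bit (matching the `while (! lock.test-and-set())` convention above).

```python
import threading
import time


class TASBit:
    """A bit with test-and-set; hardware atomicity is simulated by a guard lock."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        with self._guard:
            old = self._value
            self._value = 1
            return old == 0        # True means the caller acquired the bit

    def read(self):
        return self._value

    def reset(self):
        with self._guard:
            self._value = 0


class TTASLock:
    def __init__(self):
        self._bit = TASBit()

    def acquire(self):
        # Retry the RMW operation only after a read-only wait saw the lock free.
        while not self._bit.test_and_set():
            while self._bit.read() != 0:
                time.sleep(0)      # yield so other threads can run

    def release(self):
        self._bit.reset()


def run_ttas_demo(num_threads=4, rounds=200):
    lock = TTASLock()
    counter = [0]

    def worker():
        for _ in range(rounds):
            lock.acquire()
            counter[0] += 1        # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_ttas_demo())
```

The inner loop only reads, so on a CC machine it spins on a cached copy; each release still triggers a burst of test-and-set attempts, which is why the RMR count is not bounded.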
Local Spinning Mutual Exclusion Using Strong Primitives
Anderson’s queue-based algorithm (Anderson, 1990)
Shared:
  integer ticket: an RMW object, initially 0
  bit valid[0..n-1], initially valid[0]=1 and valid[i]=0 for i ∈ {1,..,n-1}
Local:
  integer myTicket
Program for process i
  myTicket := fetch-and-inc-modulo-n(ticket)   ; take a ticket
  await valid[myTicket] = 1                    ; wait for your turn
  CS
  valid[myTicket] := 0                         ; dequeue
  valid[(myTicket+1) mod n] := 1               ; signal successor
[Figure: the valid array, indexed 0..n-1, and the ticket counter]
Anderson’s queue-based algorithm (cont’d)
Initial configuration:
  ticket = 0
  valid = 1 0 0 0 0
After entry section of p3:
  ticket = 1
  valid = 1 0 0 0 0
  myTicket of p3 = 0
After p1 performs entry section:
  ticket = 2
  valid = 1 0 0 0 0
  myTicket of p3 = 0, myTicket of p1 = 1
After p3 exits:
  ticket = 2
  valid = 0 1 0 0 0
  myTicket of p1 = 1
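The configurations above can be replayed step by step. The following is a minimal sequential sketch, assuming n = 5 as in the example; the class AndersonQueue and its method names are hypothetical, and fetch-and-inc-modulo-n is executed as one ordinary step because nothing runs concurrently here.

```python
class AndersonQueue:
    """Sequential model of the ticket counter and valid array for n processes."""

    def __init__(self, n):
        self.n = n
        self.ticket = 0
        self.valid = [1] + [0] * (n - 1)
        self.my_ticket = {}            # process id -> ticket it holds

    def take_ticket(self, p):
        # fetch-and-inc-modulo-n(ticket), modelled as a single step
        self.my_ticket[p] = self.ticket
        self.ticket = (self.ticket + 1) % self.n
        return self.my_ticket[p]

    def may_enter(self, p):
        # the condition awaited in the entry section
        return self.valid[self.my_ticket[p]] == 1

    def exit(self, p):
        t = self.my_ticket.pop(p)
        self.valid[t] = 0                      # dequeue
        self.valid[(t + 1) % self.n] = 1       # signal successor


if __name__ == "__main__":
    q = AndersonQueue(5)
    q.take_ticket(3)                   # p3 takes ticket 0 and may enter
    q.take_ticket(1)                   # p1 takes ticket 1 and must wait
    print(q.ticket, q.valid, q.may_enter(1))
    q.exit(3)                          # p3 exits and hands over to ticket 1
    print(q.ticket, q.valid, q.may_enter(1))
```

Each process spins on its own slot of valid, and that slot is written only once more before the process enters, which is what keeps the number of remote accesses on a CC machine constant.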
Anderson’s queue-based algorithm (cont’d)
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
The MCS queue-based algorithm (Mellor-Crummey and Scott, 1991)
Type:
  Qnode: structure {bit locked, Qnode *next}
Shared:
  Qnode nodes[0..n-1]
  Qnode *tail, initially null
Local:
  Qnode *myNode, initially &nodes[i]
  Qnode *successor
• Has constant RMR complexity under both the DSM and CC models
• Uses swap and CAS
[Figure: the nodes array, the tail pointer, and queued nodes with locked bits F, T, T]
The MCS queue-based algorithm (cont’d)
Program for process i
  myNode->next := null                       ; prepare to be last in queue
  pred := swap(&tail, myNode)                ; tail now points to myNode
  if (pred ≠ null)                           ; I need to wait for a predecessor
      myNode->locked := true                 ; prepare to wait
      pred->next := myNode                   ; let my predecessor know it has to unlock me
      await (myNode->locked = false)
  CS
  if (myNode->next = null)                   ; if not sure there is a successor
      if (compare-and-swap(&tail, myNode, null) = false)   ; if there is a successor
          await (myNode->next ≠ null)        ; spin until successor lets me know its identity
          successor := myNode->next          ; get a pointer to my successor
          successor->locked := false         ; unlock my successor
  else                                       ; for sure, I have a successor
      successor := myNode->next              ; get a pointer to my successor
      successor->locked := false             ; unlock my successor
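The MCS entry and exit code above can be exercised with threads. The following is a minimal sketch under stated assumptions, not the authors' implementation: QNode, MCSLock and run_mcs_demo are hypothetical names, and swap and compare-and-swap on tail are simulated with a threading.Lock because Python has no such primitives.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self._tail = None
        self._guard = threading.Lock()     # stands in for atomic swap / CAS

    def _swap_tail(self, node):
        with self._guard:
            old, self._tail = self._tail, node
            return old

    def _cas_tail(self, expected, new):
        with self._guard:
            if self._tail is expected:
                self._tail = new
                return True
            return False

    def acquire(self, my):
        my.next = None                     # prepare to be last in queue
        pred = self._swap_tail(my)         # tail now points to my node
        if pred is not None:               # there is a predecessor to wait for
            my.locked = True
            pred.next = my                 # tell the predecessor whom to unlock
            while my.locked:               # spin on my own node only
                time.sleep(0)

    def release(self, my):
        if my.next is None:                # not sure there is a successor
            if self._cas_tail(my, None):
                return                     # queue is empty again
            while my.next is None:         # successor has not linked itself yet
                time.sleep(0)
        my.next.locked = False             # unlock my successor


def run_mcs_demo(num_threads=4, rounds=200):
    lock = MCSLock()
    counter = [0]

    def worker():
        node = QNode()                     # one queue node per thread, reused
        for _ in range(rounds):
            lock.acquire(node)
            counter[0] += 1                # critical section
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_mcs_demo())
```

Every waiting thread spins on the locked field of its own node, so if that node is placed in the waiter's local memory the spin is local under both the DSM and the CC model.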
Local Spinning Mutual Exclusion Using reads and writes
A local-spin tournament-tree algorithm (Anderson, Yang, 1993)
O(log n) RMR complexity for both DSM and CC systems
This is optimal (Attiya, Hendler, Woelfel, 2008)
Uses O(n log n) registers
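[Figure: a binary tournament tree for 8 processes. Node numbers per row, top to bottom: 0; 0 1; 0 1 2 3; the processes 0..7 are at the bottom. The tree levels are labeled Level 0, Level 1, Level 2.]
Each node is identified by (level, number)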
A local-spin tournament-tree algorithm (cont’d)
Shared:
  - For each node, v, there are 3 registers:
      name[level, 2node], name[level, 2node+1], initially -1
      turn[level, node]
  - For each level l and process i, a spin flag: flag[level, i], initially 0
Local:
  level, node, id
A local-spin tournament-tree algorithm (cont’d)
Program for process i
1.  node := i
2.  for level = 0 to log n - 1 do                  ; from leaf to root
3.      id := node mod 2                           ; compute ID for 2-process mutex algorithm (0 or 1)
4.      node := ⌊node/2⌋                           ; compute node in new level
5.      name[level, 2node + id] := i               ; identify yourself
6.      turn[level, node] := i                     ; update the tie-breaker
7.      flag[level, i] := 0                        ; initialize my locally-accessible spin flag
8.      rival := name[level, 2node + 1 - id]
9.      if ((rival ≠ -1) and (turn[level, node] = i))   ; if not sure I should precede rival
10.         if (flag[level, rival] = 0)            ; if rival may get to wait at line 14
11.             flag[level, rival] := 1            ; release rival by letting it know I updated tie-breaker
12.         await flag[level, i] ≠ 0               ; await until signaled by rival (so it updated tie-breaker)
13.         if (turn[level, node] = i)             ; if I lost
14.             await flag[level, i] = 2           ; wait till rival notifies me it's my turn
15.     id := node                                 ; move to the next level
16. EndFor
17. CS
18. for level = log n - 1 downto 0 do              ; begin exit code
19.     id := ⌊i/2^level⌋, node := ⌊id/2⌋          ; set node and id
20.     name[level, 2node + (id mod 2)] := -1      ; erase name
21.     rival := turn[level, node]                 ; find who rival is (if there is one)
22.     if rival ≠ i                               ; if there is a rival
23.         flag[level, rival] := 2                ; notify rival
Local-Spin Leader Election
• Exactly one process is elected
• All other processes are not-elected
• Processes may busy-wait
Choy and Singh's filter
[Figure: m processes enter a filter; between 1 and ⌈m/2⌉ of them “exit”, the rest are “halted”]
Filter guarantees:
• Safety: if m processes enter a filter, at most ⌈m/2⌉ exit.
• Progress: if some processes enter a filter, at least one exits.
Choy and Singh's filter (cont’d)
Shared:
  integer turn
  Boolean b, initially false
Program for process i
  turn := i
  await (b = false)        // wait for barrier to open
  b := true                // close barrier
  if turn ≠ i              // not last to cross the barrier
      b := false           // open barrier
      halt
  else
      exit
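The filter guarantees can be checked empirically on random interleavings. The following is a minimal sketch under stated assumptions, not part of the lecture: filter_process and run_filter are hypothetical names, each shared-memory access is treated as one atomic step, the barrier is taken to be open when b is false, and the safety bound is checked as ⌈m/2⌉ so that m = 1 and odd m are covered.

```python
import random


def filter_process(i, shared):
    """One process of the filter; every `yield` separates two shared-memory steps."""
    shared["turn"] = i                     # turn := i
    yield "ran"
    while shared["b"]:                     # wait for the barrier to open
        yield "blocked"
    yield "ran"                            # read b = false: crossed the barrier
    shared["b"] = True                     # close barrier
    yield "ran"
    lost = shared["turn"] != i             # not last to write turn?
    yield "ran"
    if lost:
        shared["b"] = False                # open barrier
        return "halted"
    return "exited"


def run_filter(m, seed):
    """Runs m processes under a random schedule; returns {process: outcome}."""
    rng = random.Random(seed)
    shared = {"turn": None, "b": False}
    procs = {i: filter_process(i, shared) for i in range(m)}
    status = {i: "ran" for i in procs}
    outcome = {}
    while procs:
        # A blocked process can only make progress while the barrier is open.
        runnable = [i for i in procs
                    if status[i] != "blocked" or not shared["b"]]
        if not runnable:
            break                          # the rest wait at the barrier forever
        i = rng.choice(runnable)
        try:
            status[i] = next(procs[i])
        except StopIteration as done:
            outcome[i] = done.value
            del procs[i]
    return outcome


if __name__ == "__main__":
    for m in range(1, 7):
        exits = [sum(r == "exited" for r in run_filter(m, s).values())
                 for s in range(200)]
        print(m, min(exits), max(exits))
```

Processes missing from the returned dictionary are still waiting at a closed barrier when the run stops. Random schedules give evidence only, not a proof of the two guarantees.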
Why are filter guarantees satisfied?
Why does the barrier have to be re-opened?
Choy and Singh’s filter algorithm
[Figure: a chain of filters: Filter #1, Filter #2, ..., Filter #i]
Choy and Singh’s filter algorithm (cont’d)
Shared:
  typedef struct {integer turn, boolean b, c initially false} filter
  filter A[log n + 1]
Program for process i
  for (curr = 0; curr < log n + 1; curr++)
      A[curr].turn := i
      await (A[curr].b = false)
      A[curr].b := true
      if (A[curr].turn ≠ i)
          A[curr].c := true          // mark that some process failed on filter
          A[curr].b := false
          return not-elected
      else if (curr > 0) A[curr-1].c
          return elected             // other processes will never exit this filter
      else
          curr := curr + 1
  EndFor
Do you see any problem with this algorithm? How can this be fixed?
Choy and Singh’s filter algorithm (cont’d)
• What is the DSM RMR complexity? Unbounded
Choy and Singh’s filter algorithm (cont’d)
• What is the CC RMR complexity?
A process may incur a linear number of RMRs while busy-waiting on A[curr].b
• What is the worst-case CC RMR complexity? Linear
• Any ideas for an O(log n)-RMR algorithm?
A simple modification of the tournament-tree algorithm
Is there an O(1)-RMR leader election algorithm from reads and writes?
Yes [Golab, Hendler and Woelfel, 2006]
Conditional primitives (e.g. compare-and-swap) are no stronger than reads & writes for RMR complexity [Golab, Hadzilacos, Hendler and Woelfel, 2007]