
Local-Spin Algorithms

Multiprocessor synchronization algorithms (20225241)

Lecturer: Danny Hendler

This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman

Remote and local memory accesses

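In a DSM system: [Figure: each process has its own memory module; an access to a variable in the process’s own module is local, and an access to a variable in another process’s module is remote.]

In a cache-coherent (CC) system:

An access of v by p is remote if it is p’s first access of v or if v has been written by another process since p’s last access of it.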
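The two accounting rules can be written down as a small RMR counter. This is a bookkeeping sketch in Python of the rules stated above, not a model of any real cache protocol or interconnect; the names CCModel, DSMModel and spin_then_signal are hypothetical. It charges a process that spins on a flag that another process writes once.

```python
from collections import Counter


class CCModel:
    """Charges RMRs by the CC rule: an access of v by p is remote if it is
    p's first access of v, or if another process wrote v since p's last
    access of it."""

    def __init__(self):
        self.fresh = set()        # (p, v) pairs whose copy is still valid
        self.rmrs = Counter()     # RMRs charged to each process

    def _access(self, p, v):
        if (p, v) not in self.fresh:
            self.rmrs[p] += 1
            self.fresh.add((p, v))

    def read(self, p, v):
        self._access(p, v)

    def write(self, p, v):
        self._access(p, v)
        # The write invalidates every other process's copy of v.
        self.fresh = {(q, w) for (q, w) in self.fresh if w != v or q == p}


class DSMModel:
    """Charges RMRs by the DSM rule: an access is remote iff the variable
    resides in another process's memory module (home[v] is v's owner)."""

    def __init__(self, home):
        self.home = home
        self.rmrs = Counter()

    def _access(self, p, v):
        if self.home[v] != p:
            self.rmrs[p] += 1

    def read(self, p, v):
        self._access(p, v)

    def write(self, p, v):
        self._access(p, v)


def spin_then_signal(model, spins=50):
    """Process 0 spins on 'flag'; process 1 writes it once halfway through."""
    for _ in range(spins):
        model.read(0, "flag")
    model.write(1, "flag")
    for _ in range(spins):
        model.read(0, "flag")
    return model.rmrs
```

Under CCModel, process 0 is charged 2 RMRs (its first read, and its first read after process 1’s write). Under DSMModel({"flag": 1}) it is charged all 100 reads, and under DSMModel({"flag": 0}) none: on DSM, where the spin variable lives decides whether spinning is local.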

Local-spin algorithms

• In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses, which do not cause interconnect traffic.

• The same algorithm may be local-spin on one architecture (DSM or CC) and non-local spin on the other.

For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs)

Peterson’s 2-process algorithm

Program for process 1

1. b[1] := true
2. turn := 1
3. await (b[0] = false or turn = 0)
4. CS
5. b[1] := false

Program for process 0

1. b[0] := true
2. turn := 0
3. await (b[1] = false or turn = 1)
4. CS
5. b[0] := false

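Is this algorithm local-spin on a DSM machine? No: both processes spin on turn, which can reside in only one of their memory modules, so at least one of them spins remotely.

Is this algorithm local-spin on a CC machine? Yes: a waiting process re-reads its cached copies of b and turn, and incurs an RMR only after the other process writes one of them.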
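A minimal Python sketch of the 2-process algorithm, assuming CPython with the global interpreter lock, under which these plain list and attribute accesses behave sequentially consistently (on hardware with a weaker memory model, Peterson’s algorithm needs fences). Spinning calls time.sleep(0) to yield the interpreter. PetersonLock and stress are hypothetical names; stress runs both processes and counts mutual-exclusion violations.

```python
import threading
import time


class PetersonLock:
    """Peterson's algorithm for processes 0 and 1 (sketch, CPython + GIL)."""

    def __init__(self):
        self.b = [False, False]
        self.turn = 0

    def acquire(self, i):
        other = 1 - i
        self.b[i] = True                          # line 1
        self.turn = i                             # line 2: last writer of turn waits
        # line 3: await (b[other] = false or turn = other)
        while self.b[other] and self.turn == i:
            time.sleep(0)                         # yield while spinning

    def release(self, i):
        self.b[i] = False                         # line 5


def stress(iterations=200):
    lock = PetersonLock()
    state = {"inside": 0, "violations": 0, "entries": 0}

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            time.sleep(0)                         # invite a switch inside the CS
            state["entries"] += 1
            state["inside"] -= 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```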

Recall the following simple test-and-set based algorithm

Shared lock initially 0

1. while (! lock.test-and-set()) // entry section
2. Critical Section
3. lock := 0 // exit section

(Here test-and-set returns true if it changed lock from 0 to 1.)

This algorithm is local-spin on neither a DSM nor a CC machine

(An RMW operation always incurs an RMR, and every iteration of the entry loop applies one.)
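A Python sketch of this lock, assuming CPython’s GIL. Python has no hardware test-and-set, so the sketch simulates the atomic RMW with an internal threading.Lock; the exit-section write goes through the same guard so it cannot interleave with an RMW. The names TestAndSetBit, TASLock and stress, and the rmw_ops counter (one per test-and-set call, each of which is an RMR), are hypothetical.

```python
import threading
import time


class TestAndSetBit:
    """Shared bit, initially 0. test_and_set() sets it to 1 and returns True
    iff it changed the bit from 0 to 1 (the slide's convention). Atomicity is
    simulated with an internal threading.Lock."""

    def __init__(self):
        self.value = 0
        self.rmw_ops = 0                  # every RMW operation is an RMR
        self._atomic = threading.Lock()

    def test_and_set(self):
        with self._atomic:
            self.rmw_ops += 1
            old, self.value = self.value, 1
            return old == 0

    def reset(self):
        with self._atomic:                # must not interleave with an RMW
            self.value = 0


class TASLock:
    def __init__(self):
        self.lock = TestAndSetBit()

    def acquire(self):
        while not self.lock.test_and_set():   # line 1: spin using RMWs only
            time.sleep(0)

    def release(self):
        self.lock.reset()                     # line 3


def stress(lock, threads=4, iterations=100):
    state = {"inside": 0, "violations": 0, "entries": 0}

    def worker():
        for _ in range(iterations):
            lock.acquire()
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            time.sleep(0)
            state["entries"] += 1
            state["inside"] -= 1
            lock.release()

    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return state
```

After a contended run, lock.lock.rmw_ops is typically well above the number of acquisitions: every failed attempt is an extra RMR.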

A better algorithm: test-and-test-and-set

Shared lock initially 0

1. while (! lock.test-and-set()) // entry section
2.     await (lock == 0)
3. Critical Section
4. lock := 0 // exit section

Creates less traffic on CC machines, but is still not local-spin: each failed test-and-set is an RMR, and a process may fail an unbounded number of times.
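The same kind of Python sketch for test-and-test-and-set (CPython’s GIL assumed; the RMW is simulated with an internal threading.Lock; TestAndSetBit, TTASLock and stress are hypothetical names). The only change from the plain test-and-set lock is the inner read-only loop, which on a CC machine spins on a cached copy until the holder releases the lock.

```python
import threading
import time


class TestAndSetBit:
    """Shared bit, initially 0; test_and_set() returns True iff it changed
    the bit from 0 to 1. Atomicity is simulated with a threading.Lock;
    rmw_ops counts test-and-set calls."""

    def __init__(self):
        self.value = 0
        self.rmw_ops = 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        with self._atomic:
            self.rmw_ops += 1
            old, self.value = self.value, 1
            return old == 0

    def reset(self):
        with self._atomic:                    # must not interleave with an RMW
            self.value = 0


class TTASLock:
    def __init__(self):
        self.lock = TestAndSetBit()

    def acquire(self):
        while not self.lock.test_and_set():   # line 1: one RMW per attempt
            while self.lock.value != 0:       # line 2: read-only wait
                time.sleep(0)

    def release(self):
        self.lock.reset()                     # line 4


def stress(lock, threads=4, iterations=100):
    state = {"inside": 0, "violations": 0, "entries": 0}

    def worker():
        for _ in range(iterations):
            lock.acquire()
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            time.sleep(0)
            state["entries"] += 1
            state["inside"] -= 1
            lock.release()

    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return state
```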

Local Spinning Mutual Exclusion Using Strong Primitives

Anderson’s queue-based algorithm (Anderson, 1990)

Shared:
integer ticket: an RMW object, initially 0
bit valid[0..n-1], initially valid[0] = 1 and valid[i] = 0 for i ∈ {1,..,n-1}
Local:
integer myTicket

Program for process i
1. myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
2. await valid[myTicket] = 1 ; wait for your turn
3. CS
4. valid[myTicket] := 0 ; dequeue
5. valid[(myTicket+1) mod n] := 1 ; signal successor

[Figure: the valid array (indices 0..n-1), with valid[0] = 1 and all other entries 0, and the shared ticket counter.]

Anderson’s queue-based algorithm (cont’d)

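Example (valid has five entries):

Initial configuration: ticket = 0, valid = 1 0 0 0 0

After entry section of p3: ticket = 1, valid = 1 0 0 0 0, myTicket3 = 0 (p3 enters the CS, since valid[0] = 1)

After p1 performs entry section: ticket = 2, valid = 1 0 0 0 0, myTicket3 = 0, myTicket1 = 1 (p1 waits on valid[1])

After p3 exits: ticket = 2, valid = 0 1 0 0 0, myTicket1 = 1 (p1 may now enter the CS)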
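A Python sketch of Anderson’s lock, assuming CPython’s GIL and at most n concurrently competing processes. The atomic fetch-and-increment-modulo-n is simulated with an internal threading.Lock; FetchAndIncModN, AndersonLock, replay_example and stress are hypothetical names. The entry section is split into take_ticket and the spin so that replay_example can reproduce the example above sequentially.

```python
import threading
import time


class FetchAndIncModN:
    """Shared integer with atomic fetch-and-increment modulo n. Atomicity is
    simulated with an internal threading.Lock."""

    def __init__(self, n):
        self.n = n
        self.value = 0
        self._atomic = threading.Lock()

    def fetch_and_inc(self):
        with self._atomic:
            old = self.value
            self.value = (old + 1) % self.n
            return old


class AndersonLock:
    """Array-based queue lock for at most n concurrently competing processes."""

    def __init__(self, n):
        self.n = n
        self.ticket = FetchAndIncModN(n)
        self.valid = [1] + [0] * (n - 1)

    def take_ticket(self):                          # line 1
        return self.ticket.fetch_and_inc()

    def acquire(self):
        my_ticket = self.take_ticket()
        while self.valid[my_ticket] != 1:           # line 2: spin on my slot
            time.sleep(0)
        return my_ticket

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                   # line 4: dequeue
        self.valid[(my_ticket + 1) % self.n] = 1    # line 5: signal successor


def replay_example():
    """Replays the example above sequentially, with n = 5."""
    lock = AndersonLock(5)
    t3 = lock.take_ticket()     # p3's entry section; valid[0] = 1, so p3 enters
    t1 = lock.take_ticket()     # p1's entry section; p1 would wait on valid[1]
    lock.release(t3)            # p3 exits and signals slot 1
    return t3, t1, lock.ticket.value, lock.valid


def stress(n=4, iterations=100):
    lock = AndersonLock(n)
    state = {"inside": 0, "violations": 0, "entries": 0}

    def worker():
        for _ in range(iterations):
            t = lock.acquire()
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            time.sleep(0)
            state["entries"] += 1
            state["inside"] -= 1
            lock.release(t)

    ts = [threading.Thread(target=worker) for _ in range(n)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return state
```

Each waiting process spins on its own slot of valid, which is what keeps the spinning local on a CC machine.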

Anderson’s queue-based algorithm (cont’d)

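What is the RMR complexity on a DSM machine?

Unbounded: the entry of valid a process spins on depends on the ticket it draws, so in general it does not reside in that process’s memory module, and every read of the await loop may be remote.

What is the RMR complexity on a CC machine? Constant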


The MCS queue-based algorithm (Mellor-Crummey and Scott, 1991)

Type:
Qnode: structure {bit locked, Qnode *next}
Shared:
Qnode nodes[0..n-1]
Qnode *tail, initially null
Local:
Qnode *myNode, initially &nodes[i]
Qnode *successor, *pred

• Has constant RMR complexity under both the DSM and CC models

• Uses swap and CAS

[Figure: the tail pointer and the queue formed by the nodes array, showing the locked bit (T/F) of queued nodes.]

The MCS queue-based algorithm (cont’d)

Program for process i
1. myNode->next := null ; prepare to be last in queue
2. pred := swap(&tail, myNode) ; tail now points to myNode
3. if (pred ≠ null) ; I need to wait for a predecessor
4.     myNode->locked := true ; prepare to wait
5.     pred->next := myNode ; let my predecessor know it has to unlock me
6.     await myNode->locked = false
7. CS
8. if (myNode->next = null) ; if not sure there is a successor
9.     if (compare-and-swap(&tail, myNode, null) = false) ; if there is a successor
10.        await (myNode->next ≠ null) ; spin until successor lets me know its identity
11.        successor := myNode->next ; get a pointer to my successor
12.        successor->locked := false ; unlock my successor
13. else ; for sure, I have a successor
14.     successor := myNode->next ; get a pointer to my successor
15.     successor->locked := false ; unlock my successor
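A Python sketch of the MCS lock, assuming CPython’s GIL. swap and compare-and-swap on tail are simulated with an internal threading.Lock; every other field is accessed with plain reads and writes. QNode, AtomicRef, MCSLock and stress are hypothetical names, and each thread allocates its own QNode in place of nodes[i].

```python
import threading
import time


class QNode:
    """Queue node: `locked` is the spin flag, `next` links to the successor."""

    def __init__(self):
        self.locked = False
        self.next = None


class AtomicRef:
    """Shared pointer supporting swap and compare-and-swap. Atomicity is
    simulated with an internal threading.Lock."""

    def __init__(self, value=None):
        self._value = value
        self._atomic = threading.Lock()

    def swap(self, new):
        with self._atomic:
            old, self._value = self._value, new
            return old

    def compare_and_swap(self, expected, new):
        with self._atomic:
            if self._value is expected:
                self._value = new
                return True
            return False


class MCSLock:
    def __init__(self):
        self.tail = AtomicRef(None)

    def acquire(self, my_node):
        my_node.next = None                        # line 1
        pred = self.tail.swap(my_node)             # line 2
        if pred is not None:                       # line 3
            my_node.locked = True                  # line 4
            pred.next = my_node                    # line 5
            while my_node.locked:                  # line 6: spin on my own node
                time.sleep(0)

    def release(self, my_node):
        if my_node.next is None:                   # line 8
            if self.tail.compare_and_swap(my_node, None):
                return                             # no successor: queue is empty
            while my_node.next is None:            # line 10: successor is linking in
                time.sleep(0)
        my_node.next.locked = False                # lines 11-12 and 14-15


def stress(threads=4, iterations=100):
    lock = MCSLock()
    state = {"inside": 0, "violations": 0, "entries": 0}

    def worker():
        node = QNode()                             # plays the role of nodes[i]
        for _ in range(iterations):
            lock.acquire(node)
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            time.sleep(0)
            state["entries"] += 1
            state["inside"] -= 1
            lock.release(node)

    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return state
```

Because every process spins only on the locked field of its own node, which can be allocated in its own memory module, the spinning stays local on DSM as well as CC.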


Local Spinning Mutual Exclusion Using Reads and Writes

A local-spin tournament-tree algorithm (Anderson, Yang, 1993)

O(log n) RMR complexity for both DSM and CC systems

This is optimal (Attiya, Hendler, Woelfel, 2008)

Uses O(n log n) registers

[Figure: tournament tree for n = 8 processes]

Level 2: node 0 (root)
Level 1: nodes 0 1
Level 0: nodes 0 1 2 3
Processes: 0 1 2 3 4 5 6 7

Each node is identified by (level, number)

A local-spin tournament-tree algorithm (cont’d)

Shared:
- For each node v = (level, node), there are 3 registers: name[level, 2node] and name[level, 2node+1], initially -1, and turn[level, node]
- For each level l and process i, a spin flag: flag[level, i], initially 0

Local: level, node, id, rival

A local-spin tournament-tree algorithm (cont’d)

Program for process i
1. node := i
2. for level = 0 to log n - 1 do ; from leaf to root
3.     id := node mod 2 ; compute ID for 2-process mutex algorithm (0 or 1)
4.     node := ⌊node/2⌋ ; compute node in new level
5.     name[level, 2node+id] := i ; identify yourself
6.     turn[level, node] := i ; update the tie-breaker
7.     flag[level, i] := 0 ; initialize my locally-accessible spin flag
8.     rival := name[level, 2node+1-id]
9.     if ((rival ≠ -1) and (turn[level, node] = i)) ; if not sure I should precede rival
10.        if (flag[level, rival] = 0) ; if rival may get to wait at line 12
11.            flag[level, rival] := 1 ; release rival by letting it know I updated tie-breaker
12.        await flag[level, i] ≠ 0 ; await until signaled by rival (so it updated tie-breaker)
13.        if (turn[level, node] = i) ; if I lost
14.            await flag[level, i] = 2 ; wait till rival notifies me it's my turn
15. endfor
16. CS
17. for level = log n - 1 downto 0 do ; begin exit code
18.     node := ⌊i/2^(level+1)⌋, id := ⌊i/2^level⌋ mod 2 ; set node and id
19.     name[level, 2node+id] := -1 ; erase name
20.     rival := turn[level, node] ; find who rival is (if there is one)
21.     if rival ≠ i ; if there is a rival
22.         flag[level, rival] := 2 ; notify rival
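A Python sketch of the tournament-tree lock, assuming n is a power of two and CPython’s GIL for sequentially consistent reads and writes. Only plain reads and writes of shared lists are used; the registers are indexed [level][index] rather than [level, index], and TournamentLock and stress are hypothetical names. The child index 2node+id at a level equals ⌊i/2^level⌋, which the exit code uses directly.

```python
import threading
import time


class TournamentLock:
    """Tournament tree of 2-process locks for processes 0..n-1 (n a power of two)."""

    def __init__(self, n):
        assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
        self.levels = n.bit_length() - 1                  # log2(n)
        # name[lvl][c]: process competing from child c at level lvl (-1 = none)
        self.name = [[-1] * (n >> lvl) for lvl in range(self.levels)]
        # turn[lvl][v]: tie-breaker of node v at level lvl
        self.turn = [[-1] * (n >> (lvl + 1)) for lvl in range(self.levels)]
        # flag[lvl][i]: spin flag (0, 1 or 2) of process i at level lvl
        self.flag = [[0] * n for _ in range(self.levels)]

    @staticmethod
    def _spin():
        time.sleep(0)                                     # yield while spinning

    def acquire(self, i):
        node = i
        for level in range(self.levels):                  # from leaf to root
            side = node % 2                               # id in this node's 2-process lock
            node //= 2
            self.name[level][2 * node + side] = i
            self.turn[level][node] = i
            self.flag[level][i] = 0
            rival = self.name[level][2 * node + 1 - side]
            if rival != -1 and self.turn[level][node] == i:
                if self.flag[level][rival] == 0:
                    self.flag[level][rival] = 1
                while self.flag[level][i] == 0:
                    self._spin()
                if self.turn[level][node] == i:           # I lost the tie-breaker
                    while self.flag[level][i] != 2:
                        self._spin()

    def release(self, i):
        for level in reversed(range(self.levels)):        # from root back to leaf
            child = i >> level                            # = 2node + id at this level
            node = child >> 1
            self.name[level][child] = -1
            rival = self.turn[level][node]
            if rival != i:
                self.flag[level][rival] = 2


def stress(n=4, iterations=50):
    lock = TournamentLock(n)
    state = {"inside": 0, "violations": 0, "entries": 0}

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            time.sleep(0)
            state["entries"] += 1
            state["inside"] -= 1
            lock.release(i)

    ts = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return state
```

Each process spins only on flag[level][i], which can be placed in its own memory module; this is what makes the algorithm local-spin on DSM machines too.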

Local-Spin Leader Election

• Exactly one process is elected

• All other processes are not-elected

• Processes may busy-wait

Choy and Singh’s filter

[Figure: m processes enter the filter; between 1 and m/2 processes “exit”, and the rest are “halted”.]

Filter guarantees:

• Safety: if m processes enter a filter, at most m/2 exit.
• Progress: if some processes enter a filter, at least one exits.

Choy and Singh’s filter (cont’d)

Shared:
integer turn
boolean b, initially false

Program for process i
1. turn := i
2. await (b = false) // wait for barrier to open
3. b := true // close barrier
4. if turn ≠ i // not last to cross the barrier
5.     b := false // open barrier
6.     halt
7. else
8.     exit

Why are filter guarantees satisfied?

Why does the barrier have to be re-opened?
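One way to explore these questions is a step-by-step simulation. The Python sketch below (filter_process and run_filter are hypothetical names) runs m processes of the filter as generators under a seeded random scheduler; each yield ends one atomic step, and each step performs at most one shared-memory access. It reports for each process whether it exited, halted, or was still spinning at the barrier when the step budget ran out.

```python
import random


def filter_process(i, shared, outcome):
    """One process running the filter; every step between yields performs at
    most one shared-memory access."""
    shared["turn"] = i                      # line 1
    yield
    while True:                             # line 2: await barrier open (b = false)
        closed = shared["b"]
        yield
        if not closed:
            break
    shared["b"] = True                      # line 3: close barrier
    yield
    last = shared["turn"] == i              # line 4
    yield
    if last:
        outcome[i] = "exit"                 # line 8
    else:
        shared["b"] = False                 # line 5: re-open barrier
        outcome[i] = "halt"                 # line 6


def run_filter(m, seed, max_steps=2000):
    """Runs m processes under a random schedule; processes still spinning at
    the barrier after max_steps are reported as 'spinning'."""
    rng = random.Random(seed)
    shared = {"turn": None, "b": False}
    outcome = {}
    live = {i: filter_process(i, shared, outcome) for i in range(m)}
    for _ in range(max_steps):
        if not live:
            break
        i = rng.choice(sorted(live))
        try:
            next(live[i])
        except StopIteration:
            del live[i]
    for i in live:
        outcome.setdefault(i, "spinning")
    return outcome
```

For an even number m of processes, every run should show at least one and at most m/2 exits. Some processes may remain spinning: if the last process to close the barrier exits, nobody re-opens it.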

Choy and Singh’s filter algorithm

[Figure: a sequence of filters: Filter #1, Filter #2, ..., Filter #i, ...]

Choy and Singh’s filter algorithm (cont’d)

Shared:
typedef struct {integer turn; boolean b, c, initially false} filter
filter A[log n + 1]

Program for process i
1. for (curr = 0; curr < log n + 1; curr++)
2.     A[curr].turn := i
3.     await (A[curr].b = false)
4.     A[curr].b := true
5.     if (A[curr].turn ≠ i)
6.         A[curr].c := true // mark that some process failed on filter
7.         A[curr].b := false
8.         return not-elected
9.     else if (curr > 0) A[curr-1].c
10.        return elected // Other processes will never exit this filter
11.    else
12.        curr := curr+1
13. endfor

Do you see any problem with this algorithm?

How can this be fixed?

Choy and Singh’s filter algorithm (cont’d)

• What is the DSM RMR complexity? Unbounded



• What is the CC RMR complexity?

A process may incur a linear number of RMRs while awaiting the barrier (line 3): its cached copy of A[curr].b is invalidated each time another process closes or re-opens it.

• What is the worst-case CC RMR complexity?

Linear

• Any ideas for an O(log n)-RMR algorithm?

A simple modification of the tournament-tree algorithm

Is there an O(1)-RMR leader election algorithm from reads and writes?

Yes [Golab, Hendler and Woelfel, 2006]

Conditional primitives (e.g. compare-and-swap) are no stronger than reads & writes for RMR complexity [Golab, Hadzilacos, Hendler and Woelfel, 2007]
