Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014
Chapter 5: Barrier Synchronization
Version: June 2014
This presentation is a modified version of a presentation that Itai Avrian and Shachar Gidron prepared for my Seminar in Concurrent and Distributed Computing, 2012.
A note on the use of these ppt slides:
I am making these slides freely available to all (faculty, students, readers).
They are in PowerPoint form so you can add, modify, and delete slides and slide
content to suit your needs. They obviously represent a lot of work on my part.
In return for use, I only ask the following:
That you mention their source (after all, I would like people to use my book!)
That you note that they are adapted from (or perhaps identical to) my slides, and note my copyright of this material.
Thanks and enjoy! Gadi Taubenfeld
All material copyright 2014 Gadi Taubenfeld, All Rights Reserved
To get the most updated version of these slides go to: http://www.faculty.idc.ac.il/gadi/book.htm
Synchronization Algorithms and Concurrent Programming
ISBN: 0131972596, 1st edition
5.1 Barriers
5.2 Atomic Counter
5.3 Test-and-set Bits
5.4 Combining Tree Barrier*
5.5 A Tree-based Barrier
5.6 The Dissemination Barrier*
5.7 The See-Saw Barrier
5.8 Semaphores
5.9 Bibliographic Notes*
5.10 Problems*
Chapter 5 Barrier Synchronization
*Not covered in this presentation
Definition and Motivation
Barrier Synchronization
What is a Barrier?
[Figure: four processes P1 to P4 shown at three points in time. First, the four processes approach the barrier. Next, all except P4 have arrived and wait. Once all have arrived, they continue.]
A barrier is a coordination mechanism (an algorithm) that forces processes which participate in a concurrent (or distributed) algorithm to wait until each one of them has reached a certain point in its program.
The collection of these coordination points is called the barrier.
Once all the processes have reached the barrier, they are all permitted to continue past the barrier.
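As a rough illustration of the definition, the sketch below uses Python's standard `threading.Barrier` rather than any of the algorithms in this chapter. The function name `run_phases` and the log format are assumptions made for the example.

```python
import threading

def run_phases(n_threads, n_phases):
    """Run n_threads workers through n_phases phases separated by a barrier."""
    barrier = threading.Barrier(n_threads)
    log = []                      # (phase, thread id), in the order the work was done
    log_lock = threading.Lock()

    def worker(tid):
        for phase in range(n_phases):
            with log_lock:
                log.append((phase, tid))   # the "work" of this phase
            barrier.wait()                 # wait until every thread has reached this point

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

if __name__ == "__main__":
    print(run_phases(4, 3))
```

Because no thread can pass the barrier of phase p before all threads have logged phase p, the phase numbers in the returned log never decrease.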
Example: Parallel Prefix Sum
begin: a, b, c, d, e, f
end: a, a+b, a+b+c, a+b+c+d, a+b+c+d+e, a+b+c+d+e+f

Sequential computation (one new prefix per step; time runs downward):
a  b    c      d        e          f
a  a+b  c      d        e          f
a  a+b  a+b+c  d        e          f
a  a+b  a+b+c  a+b+c+d  e          f
a  a+b  a+b+c  a+b+c+d  a+b+c+d+e  f
a  a+b  a+b+c  a+b+c+d  a+b+c+d+e  a+b+c+d+e+f

Parallel computation (all elements are updated in each step; time runs downward):
a  b    c      d        e          f
a  a+b  b+c    c+d      d+e        e+f
--- barrier ---
a  a+b  a+b+c  a+b+c+d  b+c+d+e    c+d+e+f
--- barrier ---
a  a+b  a+b+c  a+b+c+d  a+b+c+d+e  a+b+c+d+e+f

The processes wait at a barrier between consecutive steps, so that no process reads a value that another process has not yet finished computing.
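The parallel computation above can be sketched with one thread per element. This is only an illustration under assumptions: it uses Python's `threading.Barrier`, the name `parallel_prefix_sum` is made up for the example, and it uses two barrier waits per step (one after reading, one after writing) so that a value is never read while a neighbor is overwriting it.

```python
import threading

def parallel_prefix_sum(values):
    """Prefix sums with one thread per element, synchronized by a barrier."""
    n = len(values)
    data = list(values)
    barrier = threading.Barrier(n)

    def worker(i):
        step = 1
        while step < n:
            # Read phase: fetch the element `step` positions to the left.
            left = data[i - step] if i >= step else 0
            barrier.wait()          # everyone has read the old values
            data[i] += left         # write phase
            barrier.wait()          # everyone has written the new values
            step *= 2

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data

if __name__ == "__main__":
    print(parallel_prefix_sum([1, 2, 3, 4, 5, 6]))
```

With six elements the loop runs for step = 1, 2, 4, matching the three transitions shown on the slide.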
Example: Video, Single Thread
Assume we have a video application.
Each frame needs to be calculated before being displayed.
The frame is prepared for display by the graphics processor.
while (true)
{
frame = prepare_next_frame();
frame.display();
}
Example: Video, Multiple Threads
Now we have n threads running in parallel.
It makes sense to split the frame into n disjoint parts.
Each thread prepares its own part in parallel with the others.
Each thread may run on a different graphics processor.
Barrier globalBarrier;
i = getThreadID();
while (true)
{
frame[ i ].prepare();
globalBarrier.await();
frame[ i ].display();
}
Where it is needed
Scientific & numeric computation
Computer graphics
Garbage collection
Parallel computing in general
Various Barrier Goals
Ideally, when designing barriers, we would like to have the following properties:
Low shared memory space complexity
Low contention on shared objects
Low number of shared memory references per process
No need for shared memory initialization
Symmetry (same amount of work for all processes)
Algorithm simplicity
Simple basic primitive
Minimal propagation time
Reusability of the barrier (a must!)
Data Objects in Use
Atomic bit
Atomic register
Fetch-and-increment register
Test-and-set bits
Read-modify-write register
Semaphores
Barriers using atomic counters
Section 5.2
Objects used: atomic bit, atomic register, fetch-and-increment register (atomic counter)
Fetch-and-increment Register
A shared register that supports a fetch-and-increment (F&I) operation.
Input: register r
Atomic operation:
r is incremented by 1
the old value of r is returned
function fetch-and-increment (r : register)
orig_r := r;
r:= r + 1;
return (orig_r);
end-function
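A minimal sketch of such a register in Python follows. Python has no hardware fetch-and-increment exposed to user code, so this example assumes a `threading.Lock` is an acceptable way to make the read and the increment one atomic step. The class and method names are made up for illustration.

```python
import threading

class FetchAndIncrement:
    """Shared register with an atomic fetch-and-increment (simulated with a lock)."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:              # read and increment happen as one step
            orig = self._value
            self._value = orig + 1
            return orig

    def read(self):
        return self._value

    def write(self, value):
        with self._lock:
            self._value = value
```

If several threads call `fetch_and_increment` concurrently, every caller receives a distinct old value.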
await macro
For clarity, we use the await macro.
It is not an operation of an object.
This is also called “spinning”.
macro await (condition : boolean condition)
repeat
cond = eval(condition);
until (cond)
end-macro
Simple Barrier Using an Atomic Counter
Program of a Process
shared  counter: fetch-and-increment register, ranges over {0, ..., n}, initially 0
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register
local.go := go
local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
    go := 1 - go
else await(local.go ≠ go) fi
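The program above can be tried out with the following sketch. It is an illustration under assumptions, not the book's code: the fetch-and-increment is simulated with a lock, plain Python attributes stand in for atomic registers, and `time.sleep(0)` is used only to yield while spinning. The class name `CounterBarrier` is made up.

```python
import threading
import time

class CounterBarrier:
    """Reusable barrier for n threads built from a counter and a go bit."""

    def __init__(self, n):
        self.n = n
        self._counter = 0
        self._counter_lock = threading.Lock()   # makes fetch-and-increment atomic
        self.go = 0                             # initial value is immaterial

    def _fetch_and_increment(self):
        with self._counter_lock:
            orig = self._counter
            self._counter += 1
            return orig

    def wait(self):
        local_go = self.go
        local_counter = self._fetch_and_increment()
        if local_counter + 1 == self.n:
            # Last to arrive: reset the counter first, then release the others.
            self._counter = 0
            self.go = 1 - self.go
        else:
            while local_go == self.go:          # await(local.go != go)
                time.sleep(0)                   # yield while spinning
```

Resetting the counter before flipping go is what makes the barrier reusable: a released thread that re-enters immediately already finds the counter at 0.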
Simple Barrier Using an Atomic Counter
Run for n=2 Processes
[Figure: animated run of the program above, shown twice. Shared memory starts with counter = 0 and go = 0. P1 reads go = 0 and obtains local.counter = 0; since 0+1 ≠ 2, P1 busy-waits. P2 reads go = 0 and obtains local.counter = 1; since 1+1 = 2, P2 resets counter to 0 and flips go to 1, which releases P1. The counter is a fetch-and-increment register.]
Another Algorithm Using an Atomic Counter
Program of a Process
shared  counter: fetch-and-increment register, ranges over {0, ..., n}, initially 0
local   local.counter: register
local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
else await(counter = 0) fi
Is this implementation incorrect?
Simple Barrier Using an Atomic Counter
There is high memory contention on the go bit.
Reducing the contention:
Replace the go bit with n bits: go[1],…,go[n]
Process pi spins only on the bit go[i]
A Local Spinning Counter Barrier
Program of a Process i
shared  counter: fetch-and-increment register, ranges over {0, ..., n}, initially 0
        go[1..n]: array of atomic bits, initial values are immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register
local.go := go[i]
local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
    for j = 1 to n do go[j] := 1 - go[j] od
else await(local.go ≠ go[i]) fi
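A sketch of the local spinning variant follows, under the same assumptions as before (lock-simulated fetch-and-increment, plain attributes as atomic bits). Indices are 0-based here, and the name `LocalSpinBarrier` is made up. In Python the go array does not actually live in per-process local memory, so the sketch shows only the logic, not the contention benefit.

```python
import threading
import time

class LocalSpinBarrier:
    """Counter barrier in which thread i spins only on its own bit go[i]."""

    def __init__(self, n):
        self.n = n
        self._counter = 0
        self._counter_lock = threading.Lock()   # makes fetch-and-increment atomic
        self.go = [0] * n                       # initial values are immaterial

    def wait(self, i):
        local_go = self.go[i]
        with self._counter_lock:                # fetch-and-increment(counter)
            local_counter = self._counter
            self._counter += 1
        if local_counter + 1 == self.n:
            self._counter = 0
            for j in range(self.n):             # flip every go bit, one by one
                self.go[j] = 1 - self.go[j]
        else:
            while local_go == self.go[i]:       # spin on the private bit only
                time.sleep(0)
```

The last thread to arrive does O(n) work to flip all the bits, which is the price paid for spreading the spinning over n locations.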
A Local Spinning Counter Barrier
Example Run for n=3 Processes
[Figure: animated run of the program above. Shared memory starts with counter = 0 and go[1..3] = 0, 0, 0. The first two processes to arrive obtain local.counter values 0 and 1 (0+1 ≠ 3 and 1+1 ≠ 3) and busy-wait, each on its own go bit. P3 arrives last and obtains local.counter = 2; since 2+1 = 3, it resets counter to 0 and flips go[1..3] to 1, 1, 1.]
Comparison of fetch-and-increment Barriers

Simple Barrier
Pros:
Very simple
Shared memory: O(log n) bits
Takes O(1) until the last waiting process is awakened
Cons:
High contention on the go bit
Contention on the counter register (*)

Simple Barrier with go array
Pros:
Low contention on the go array
In some models: spinning is done on local memory, and the number of remote memory references is O(1)
Cons:
Shared memory: O(n)
Still contention on the counter register (*)
Takes O(n) until the last waiting process is awakened

(*) One technique for solving this contention is the Combining Tree Barrier (page 210).
A Barrier without Memory Initialization
A barrier is a basic synchronization method.
To initialize shared memory, processes need to be synchronized.
Thus, a barrier may be a prerequisite for shared memory initialization and cannot assume one.
Processes may not be implemented in the same way.
So it is desirable to reduce the dependency between them.
A Barrier without Memory Initialization
Program of a Process
shared  counter: atomic counter, ranges over {0, ..., n-1}, initial value is immaterial
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register, initial value is immaterial
local.go := go                          // remember current value
local.counter := counter                // remember current value
counter := counter + 1 (mod n)          // atomic increment mod n
repeat
    if counter = local.counter          // all processes have arrived
    then go := 1 - local.go fi          // notify all
until (local.go ≠ go)
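The program above can be exercised with arbitrary starting values using the sketch below. As in the earlier sketches, the atomic increment is simulated with a lock and plain attributes stand in for atomic registers. The class name and the constructor parameters `counter` and `go` (used to inject arbitrary initial values) are assumptions of this example.

```python
import threading
import time

class UninitializedCounterBarrier:
    """Counter barrier that works for any initial counter and go values."""

    def __init__(self, n, counter=0, go=0):
        self.n = n
        self.counter = counter % n          # any value in {0, ..., n-1}
        self.go = go & 1                    # any bit
        self._inc_lock = threading.Lock()   # makes the increment atomic

    def wait(self):
        local_go = self.go                  # remember current value
        local_counter = self.counter        # remember current value
        with self._inc_lock:                # atomic increment mod n
            self.counter = (self.counter + 1) % self.n
        while True:
            if self.counter == local_counter:   # n increments: all have arrived
                self.go = 1 - local_go          # notify all
            if local_go != self.go:
                break
            time.sleep(0)                       # yield while spinning
```

After exactly n increments the counter returns to the value it had when the episode began, so the process that remembered that value detects that everyone has arrived.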
Using Test-and-Set Bits
Section 5.3
Test-and-Set Bit
Input: bit b
Test-and-set is an atomic operation:
b is set to 1
the old value of b (i.e., 0 or 1) is returned
An atomic reset operation, which sets the value to 0, is also supported.
function test-and-set (b : bit)
orig_b := b;
b:= 1;
return (orig_b);
end-function
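A small Python sketch of a test-and-set bit, with the atomicity simulated by a lock, is shown below. The class name `TestAndSetBit` and the extra `read` method (the "test" of a test-and-test-and-set bit) are assumptions of this example.

```python
import threading

class TestAndSetBit:
    """A bit with atomic test-and-set and reset (simulated with a lock)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:          # set to 1 and return the old value in one step
            orig = self._value
            self._value = 1
            return orig

    def reset(self):
        with self._lock:
            self._value = 0

    def read(self):
        return self._value
```

When several threads race on a bit that is 0, exactly one of them gets 0 back; this is what lets the barrier that follows elect a leader.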
Test-and-Test-and-Set Bit
Operations supported:
Test-and-set (like a test-and-set bit)
Reset (like a test-and-set bit)
Atomic read (test)
Test-and-set based Barrier
shared  leader: test-and-set bit, initial value = 0
        countflag: test-and-test-and-set bit, initial value = 0
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register, initial value is immaterial
local.go := go
if test-and-set(leader) = 0 then        // the leader
    local.counter := 0
    repeat
        await(countflag = 1)            // a test operation
        local.counter := local.counter + 1
        reset(countflag)
    until (local.counter = n - 1)
    reset(leader)
    go := 1 - go
else                                    // the other processes
    await(test-and-set(countflag) = 0)
    await(local.go ≠ go)
fi
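The test-and-set barrier can be sketched in Python as follows. The bits are simulated with a lock-protected helper class, and the names `TASBit` and `TestAndSetBarrier` are made up for the example. One deliberate deviation: the leader's repeat-until loop is written as a while loop, so that the degenerate case n = 1 terminates.

```python
import threading
import time

class TASBit:
    """Bit with atomic test-and-set, reset, and read (simulated with a lock)."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()
    def test_and_set(self):
        with self._lock:
            orig, self._value = self._value, 1
            return orig
    def reset(self):
        with self._lock:
            self._value = 0
    def read(self):
        return self._value

class TestAndSetBarrier:
    def __init__(self, n):
        self.n = n
        self.leader = TASBit(0)
        self.countflag = TASBit(0)
        self.go = 0                                  # initial value is immaterial

    def wait(self):
        local_go = self.go
        if self.leader.test_and_set() == 0:          # first to set the bit is the leader
            local_counter = 0
            while local_counter < self.n - 1:        # count the other n - 1 processes
                while self.countflag.read() != 1:    # await(countflag = 1)
                    time.sleep(0)
                local_counter += 1
                self.countflag.reset()
            self.leader.reset()                      # ready for the next episode
            self.go = 1 - self.go                    # release everybody
        else:
            while self.countflag.test_and_set() != 0:    # announce arrival
                time.sleep(0)
            while local_go == self.go:                   # await(local.go != go)
                time.sleep(0)
```

Each non-leader succeeds in setting countflag exactly once per episode, so the leader counts exactly n - 1 arrivals before flipping go.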
Test-and-Set Barrier
[Figure: four processes P1 to P4 perform an atomic test-and-set operation on the leader bit in shared memory; the first to set the leader bit is the leader. In the example P4 is the leader: it repeats await(countflag = 1), counting local.counter up from 0 to 3, until local.counter = n - 1. Each of the other processes awaits a successful test-and-set on countflag and then awaits a change of the go bit. Once all processes have arrived, the leader changes the go bit and all processes exit the barrier.]
A Barrier without Memory Initialization
Two new techniques
1. The leader counts each process twice
   It needs only to count to 2n - 2
   This tolerates off-by-one mistakes
   Thus it makes memory initialization redundant
2. Asymmetry
   Each process has a role according to its index i
   Pros: saves bits and operations
   Cons: different processes differ in their tasks
Asymmetric Test-and-set based Barrier without Memory Initialization
Program of process i
shared  countflag: test-and-test-and-set bit, initial value is immaterial
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: atomic register, initial value is immaterial
No need for the leader test-and-set bit.
local.go := go
if i = 1 then                           // the leader
    local.counter := 0
    repeat
        await(countflag = 1)            // a test operation
        local.counter := local.counter + 1
        reset(countflag)
    until (local.counter = 2n - 2)
    go := 1 - go
else                                    // the other processes
    await(test-and-set(countflag) = 0)
    await(test-and-set(countflag) = 0)
    await(local.go ≠ go)
fi
Test-and-Set based Barriers
Properties
Different object (T&S instead of F&I)
Pros:
Shared memory: only bits, O(1) space (as opposed to the counter-based barriers, which require O(log n))
Does not require memory initialization (in the second version)
Cons:
Asymmetric (in the second version)
Still high contention on the countflag and go bits
Tree Based Barriers
Section 5.5
A Tree-based Barrier
The processes are organized in a binary tree.
Each node is owned by a predetermined process.
Each process waits until its 2 children arrive, combines the results, and passes them on to its parent.
When the root learns that its 2 children have arrived, it tells its children that they can move on.
The signal propagates down the tree until all the processes get the message.
[Figure: binary tree with root 1, children 2 and 3, and leaves 4, 5, 6, 7.]
A Tree-based Barrier
[Figure: binary tree of 15 processes numbered 1 to 15, with root 1 and node 2i drawn as a child of node i, together with two shared arrays, arrive and go, indexed 2 to 15. The text that followed "Assume" on the slide is not recoverable.]
A Tree-based Barrier
Program of process i
shared  arrive[2..n]: array of atomic bits, initial values = 0
        go[2..n]: array of atomic bits, initial values = 0
if i = 1 then                           // root
    await(arrive[2] = 1); arrive[2] := 0
    await(arrive[3] = 1); arrive[3] := 0
    go[2] := 1; go[3] := 1
else if i ≤ (n-1)/2 then                // internal node
    await(arrive[2i] = 1); arrive[2i] := 0
    await(arrive[2i+1] = 1); arrive[2i+1] := 0
    arrive[i] := 1
    await(go[i] = 1); go[i] := 0
    go[2i] := 1; go[2i+1] := 1
else                                    // leaf
    arrive[i] := 1
    await(go[i] = 1); go[i] := 0
fi
fi
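The tree-based program can be sketched in Python as below. Plain list entries stand in for the atomic bits, process indices are 1-based as on the slides, and the class name `TreeBarrier` and helper `_await_and_clear` are made up. The check that n is odd and at least 3 is an assumption derived from the program itself, since every node i ≤ (n-1)/2 must have both children 2i and 2i+1.

```python
import time

class TreeBarrier:
    """Barrier over a binary tree; node i has children 2i and 2i+1."""

    def __init__(self, n):
        if n < 3 or n % 2 == 0:
            raise ValueError("this sketch assumes an odd n >= 3")
        self.n = n
        self.arrive = [0] * (n + 1)     # entries 2..n are used
        self.go = [0] * (n + 1)         # entries 2..n are used

    @staticmethod
    def _await_and_clear(bits, j):
        while bits[j] != 1:             # await(bits[j] = 1)
            time.sleep(0)
        bits[j] = 0

    def wait(self, i):
        if i == 1:                                      # root
            self._await_and_clear(self.arrive, 2)
            self._await_and_clear(self.arrive, 3)
            self.go[2] = 1
            self.go[3] = 1
        elif i <= (self.n - 1) // 2:                    # internal node
            self._await_and_clear(self.arrive, 2 * i)
            self._await_and_clear(self.arrive, 2 * i + 1)
            self.arrive[i] = 1                          # report to the parent
            self._await_and_clear(self.go, i)
            self.go[2 * i] = 1                          # release the children
            self.go[2 * i + 1] = 1
        else:                                           # leaf
            self.arrive[i] = 1
            self._await_and_clear(self.go, i)
```

Every bit is written by one process and cleared by one other process, which is why no bit is shared by more than two processes.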
A Tree-based Barrier
Example Run for n=7 Processes
[Figure: animated run on the tree with root 1, internal nodes 2 and 3, and leaves 4 to 7, showing the arrive[2..7] and go[2..7] bits at each step. Leaves set their arrive bits and wait for their go bits (waiting for go[4], go[5], go[6], go[7]). P2 waits for p4 to arrive, zeros arrive[4] and arrive[5], and then waits for go[2]. P3 zeros arrive[6] and arrive[7], and then waits for go[3]. P1 waits for p3 to arrive, zeros arrive[2] and arrive[3], and sets go[2] and go[3]. The go bits then propagate down the tree until every process has been released. Finished!]
A Tree-based Barrier
Pros:
Low shared memory contention
No bit is shared by more than 2 processes
Good for larger n
Fast (in comparison to local spinning): information from the root propagates after log(n) steps
Uses only atomic bits (no special objects)
On some models: each process spins on a locally accessible bit, and the number of remote memory references is O(1) per process
Cons:
Shared memory space complexity: O(n)
Asymmetric: not all the processes do the same amount of work (*)
(*) There is a similar barrier which is symmetric, but at the cost of more shared memory consumption: O(n log n) as opposed to O(n). See the Dissemination Barrier in Section 5.6, page 213.
The See-Saw Barrier
Section 5.7
See-Saw Barrier
Now we’ll use a read-modify-write object.
It allows us to construct a symmetric barrier that requires only a few shared bits.
This algorithm can also be used to solve the leader election and the consensus problems.
The See-Saw barrier is based on a solution to the wake-up problem which was proposed by M. J. Fischer, S. Moran, S. Rudich, and G. Taubenfeld (1996).
Read-Modify-Write Register
Input: register r with n bits, function f(r)
Atomic operation:
Reads the register
Calls function f on r; the return value is written into r
The old value of r is returned
Usually f is custom made for the algorithm.
function read-modify-write (r : register, f : function)
orig_r := r;
r := f(r);
return (orig_r);
end-function
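A sketch of a read-modify-write register in Python, with atomicity simulated by a lock, is given below. The class name `RMWRegister` is made up, and the function f is assumed to be quick and free of side effects because it runs while the lock is held.

```python
import threading

class RMWRegister:
    """Register with an atomic read-modify-write operation (simulated with a lock)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read_modify_write(self, f):
        with self._lock:              # read, apply f, and write back as one step
            orig = self._value
            self._value = f(orig)
            return orig

    def read(self):
        return self._value
```

The earlier objects are special cases: `r.read_modify_write(lambda v: v + 1)` behaves like fetch-and-increment, and `r.read_modify_write(lambda v: 1)` behaves like test-and-set.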
Data Flow
Tokens:
Each process starts with 2 tokens
The total number of tokens doesn’t change
Each process can absorb one token or emit one token at a time
See-Saw:
One see-saw
Can be left-up-right-down OR left-down-right-up
Each process that enters the playground needs to get on the see-saw
Each process that is on the see-saw is either on the left side or on the right side
Tokens are weightless
Data Representation
Using a 2-bit read-modify-write register
Token bit, two states:
1. no-token-present
2. token-present
See-saw bit, two states:
1. left-side-down
2. right-side-down
[Figure: P1 and P2 on the see-saw; P1's token count goes from 2 to 1 and P2's from 2 to 3.]
Process State
Each process is in one of four states:
Never been on
On left side
On right side
Got-off
[Figure: processes P1 to P7 in the various states, each labeled with its token count; P1 holds 0 tokens and each of the others holds 2.]
Runtime Flow
Each process loops until it gets off the see-saw
After it gets off, it waits for the go flag
The algorithm is based on 5 rules
On each loop iteration:
According to its state, one rule is performed
Only one process at a time performs a rule
A rule is done atomically, using the RMW register
Each rule changes the tokens and/or the state of the see-saw
There can be many processes on each side (up to )
When one of the processes gets 2n tokens, it gets off and sets the go flag
Rule #1 – Start of Algorithm
Applicable if: the scheduled process is “never-been-on”
Operation:
Saves the go bit locally
Gets on the up side, and swings the see-saw
[Figure: P1 and P2, each with 2 tokens; the RMW register shows token-state = no-token-present and see-saw-state changing from left-side-down to right-side-down.]
Rule #2 – Emitter
Applicable if: the scheduled process is “down-side”, has tokens, and token-state = no-token-present
Operation:
Deposits one token in the shared token-state
If it remains without tokens, it gets off the see-saw and swings it
[Figure: P2 goes from 2 tokens to 1; token-state changes from no-token-present to token-present; see-saw-state is left-side-down.]
Rule #3 – Absorber
Applicable if: the scheduled process is “up-side”, and token-state = token-present
Operation:
Takes the token from token-state
[Figure: P1 goes from 2 tokens to 3; token-state changes from token-present to no-token-present; see-saw-state is left-side-down.]
Rule #2 – Emitter (continued)
[Figure: P2 deposits its last token (1 to 0), so token-state changes from no-token-present to token-present. P2 gets off the see-saw and swings it, so see-saw-state changes from left-side-down to right-side-down. The process that got off now awaits the go flag.]
Rule #4 – Leader
Applicable if: the scheduled process is on the see-saw, and sees at least 2n tokens
Operation:
Gets off the see-saw, and flips the shared go bit
[Figure: P1 goes from 3 tokens to 4 as token-state changes from token-present to no-token-present, while P2, with 0 tokens, sleeps; see-saw-state is right-side-down. P1 then signals go.]
Rule #5 – End of the Algorithm
Applicable if: the scheduled process notices that the go bit has been flipped (relative to its local.go)
Operation:
Everybody has arrived; continue past the barrier
[Figure: after go is signaled, the sleeping P2 (0 tokens) wakes up and continues; P1 holds 4 tokens.]
Important Invariants
Token Invariant
During a single episode of the see-saw barrier, the number of tokens in the system
is either 2n or 2n+1 (like in the test-and-set barrier)
never changes
Balance Invariant
During a single episode of the see-saw barrier, the number of processes on the left and on the right side of the see-saw is
either perfectly balanced
or favors the down side by 1
Correctness
When all processes are on the see-saw:
Tokens are given from the down side, until one process gets off
By induction, at some point one process will see 2n tokens
So there is no deadlock.
2n tokens can only be accumulated if all processes have arrived, so this is a barrier.
Remarks
All the logic is done inside the atomic modify function of the RMW register.
It needs to read and modify all three bits atomically, to prevent race conditions.
Before a process applies a rule, it first checks whether the go bit has been flipped relative to its local.go (regardless of its current state)!
Question
How many times does the state of the shared memory change during one episode of the see-saw barrier?
O(n) in the best case
O(n^2) in the worst case
The See-Saw Barrier
Pros:
O(1) shared memory space complexity
No need to initialize shared memory
Symmetric
Cons:
Uses a custom read-modify-write register
High memory contention on the RMW bits
Worst case O(n^2) total shared memory references
Complex
The code:
A Barrier using Semaphores
Section 5.8
Semaphore
Shared object
Takes a non-negative integer value
Supports two operations:
Down: if value > 0, the value is decremented by 1; otherwise, the process is blocked until the value becomes > 0
Up: the value is incremented by 1
Incrementing, decrementing, and testing the semaphore are executed atomically.
Binary Semaphore
A semaphore whose value is only 0 or 1
Decrementing is identical to a general semaphore
Incrementing is equal to setting the value to 1
The initial value is assumed to be 1
Can be used to implement deadlock-free mutual exclusion:
down(S)
critical-section
up(S)
Barrier using Semaphores
Algorithm for n processes
shared  arrival: binary semaphore, initially 1
        departure: binary semaphore, initially 0
        counter: atomic register, ranges over {0, …, n}, initially 0
down(arrival)
counter := counter + 1                  // atomic register
if counter < n then up(arrival) else up(departure) fi
down(departure)
counter := counter - 1
if counter > 0 then up(departure) else up(arrival) fi
Question: Would this barrier be correct if the shared counter were not an atomic register?
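The semaphore-based program maps almost line by line onto Python's `threading.Semaphore`, where `acquire` plays the role of down and `release` the role of up. This is a sketch under assumptions: Python's semaphores are general (counting) semaphores, and the sketch relies on the fact that in this algorithm their values never exceed 1. The class name `SemaphoreBarrier` is made up.

```python
import threading

class SemaphoreBarrier:
    """Reusable barrier for n threads built from two semaphores and a counter."""

    def __init__(self, n):
        self.n = n
        self.arrival = threading.Semaphore(1)     # used as a binary semaphore
        self.departure = threading.Semaphore(0)   # used as a binary semaphore
        self.counter = 0

    def wait(self):
        # Arrival phase: threads pass one at a time and count themselves in.
        self.arrival.acquire()                    # down(arrival)
        self.counter += 1
        if self.counter < self.n:
            self.arrival.release()                # let the next thread arrive
        else:
            self.departure.release()              # last one opens the departure gate

        # Departure phase: threads leave one at a time and count themselves out.
        self.departure.acquire()                  # down(departure)
        self.counter -= 1
        if self.counter > 0:
            self.departure.release()              # let the next thread leave
        else:
            self.arrival.release()                # last one reopens the arrival gate
```

The counter is only ever touched by the thread currently holding one of the two semaphores, and threads leave one after another, which is where the O(n) propagation delay comes from.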
Barrier using Semaphores
Properties
Pros:
Very simple
Space complexity O(1)
Symmetric
Cons:
Requires a strong object
Requires some central manager
High contention on the semaphores if there is no central manager
Propagation delay O(n)
Summary
Barrier Synchronization
Barriers we’ve seen
Simple barrier
Based on an atomic fetch-and-increment counter
Local spinning barrier
Based on an atomic fetch-and-increment counter and a go array
Test-and-set barriers
Based on test-and-test-and-set objects
One version without memory initialization
Tree-based barrier
See-Saw barrier
Semaphore-based barrier
Conclusions
Many possible algorithms for barrier synchronization
Each has pros and cons
Different shared objects allow various algorithms
Choosing the correct barrier is application/platform dependent (benchmarking is needed to know for sure).