
Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014

Chapter 5: Barrier Synchronization

Version: June 2014

This presentation is a modified version of a presentation that Itai Avrian and Shachar Gidron prepared for my Seminar in Concurrent and Distributed Computing, 2012.


A note on the use of these ppt slides:

I am making these slides freely available to all (faculty, students, readers). They are in PowerPoint form so you can add, modify, and delete slides and slide content to suit your needs. They obviously represent a lot of work on my part. In return for use, I only ask the following:

That you mention their source (after all, I would like people to use my book!).
That you note that they are adapted from (or perhaps identical to) my slides, and note my copyright of this material.

Thanks and enjoy! Gadi Taubenfeld

All material copyright 2014 Gadi Taubenfeld, All Rights Reserved


To get the most updated version of these slides go to: http://www.faculty.idc.ac.il/gadi/book.htm

Synchronization Algorithms and Concurrent Programming

ISBN: 0131972596, 1st edition


Chapter 5 Barrier Synchronization

5.1 Barriers
5.2 Atomic Counter
5.3 Test-and-set Bits
5.4 Combining Tree Barrier*
5.5 Tree-based Barriers
5.6 The Dissemination Barrier*
5.7 The See-Saw Barrier
5.8 Semaphores
5.9 Bibliographic Notes*
5.10 Problems*

*Not covered in this presentation

Definition and Motivation

Barrier Synchronization



What is a Barrier?

[Figure: over time, four processes P1 to P4 approach the barrier; all except P4 arrive and wait; once all arrive, they continue.]


What is a Barrier?

A barrier is a coordination mechanism (an algorithm) that forces processes which participate in a concurrent (or distributed) algorithm to wait until each one of them has reached a certain point in its program.
The collection of these coordination points is called the barrier.
Once all the processes have reached the barrier, they are all permitted to continue past the barrier.
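As a small illustration of this definition, the sketch below uses Python's standard threading.Barrier (Python is chosen only for illustration; the function name barrier_demo is hypothetical). Each thread records that it reached the coordination point and then calls wait(); after wait() returns, it checks that all participants had already arrived.

```python
import threading


def barrier_demo(n):
    """Start n threads that meet at a barrier; report who passed too early."""
    barrier = threading.Barrier(n)
    arrived = []        # ids of threads that reached the coordination point
    passed_early = []   # ids of threads that got past before all had arrived

    def worker(i):
        arrived.append(i)          # reach the barrier
        barrier.wait()             # block until all n threads have called wait()
        if len(arrived) < n:       # by now every thread must have arrived
            passed_early.append(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(arrived), passed_early


print(barrier_demo(4))  # ([0, 1, 2, 3], [])
```

The empty second list shows that no thread passed the barrier before all four had arrived.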


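Example: Parallel Prefix Sum

Input: a, b, c, d, e, f. Output (the prefix sums): a, a+b, a+b+c, a+b+c+d, a+b+c+d+e, a+b+c+d+e+f.

Sequential computation, one new element per step (time flows downward):
begin:  a   b     c       d         e           f
        a   a+b   c       d         e           f
        a   a+b   a+b+c   d         e           f
        a   a+b   a+b+c   a+b+c+d   e           f
        a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f

Parallel computation, one process per element; in successive rounds each element adds the value 1, 2, and then 4 positions to its left (time flows downward):
begin:  a   b     c       d         e           f
        a   a+b   b+c     c+d       d+e         e+f
        ------------------- barrier -------------------
        a   a+b   a+b+c   a+b+c+d   b+c+d+e     c+d+e+f
        ------------------- barrier -------------------
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f

The barriers are needed because each round reads the values written in the previous round: no process may start a round until all processes have finished the previous one.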
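The parallel computation above can be sketched in Python with one thread per element and the standard threading.Barrier placed at the end of each round. This is a minimal sketch under assumptions not in the slides: the function name parallel_prefix_sum is hypothetical, and two buffers alternate between rounds so that the reads of one round and the writes of the next never touch the same list.

```python
import threading


def parallel_prefix_sum(values):
    """Inclusive prefix sum with one thread per element (a sketch).

    Round r uses distance d = 1, 2, 4, ...: element i becomes
    x[i - d] + x[i] (or stays x[i] when i < d).
    """
    n = len(values)
    if n == 0:
        return []
    distances = []
    d = 1
    while d < n:
        distances.append(d)
        d *= 2
    bufs = [list(values), list(values)]  # bufs[r % 2] is the input of round r
    barrier = threading.Barrier(n)

    def worker(i):
        for r, dist in enumerate(distances):
            src, dst = bufs[r % 2], bufs[(r + 1) % 2]
            # left operand first, so non-commutative "+" (e.g. strings) works
            dst[i] = src[i - dist] + src[i] if i >= dist else src[i]
            barrier.wait()  # end of round r: all reads of src and writes of dst done

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[len(distances) % 2]


print(parallel_prefix_sum(list("abcdef")))
# ['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef']
```

With a single shared array, a second barrier would be needed between the read step and the write step of each round; double buffering lets one barrier per round suffice, because every thread finishes reading a round's input before it reaches that round's barrier.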



Example: Video (Single Thread)

Assume we have a video application. Each frame needs to be calculated (prepared for display by the graphics processor) before being displayed:

while (true)
{
    frame = prepare_next_frame();
    frame.display();
}


Example: Video (Multiple Threads)

Now, we have n threads running in parallel. It makes sense to split the frame into n disjoint parts.
Each thread prepares its own part in parallel with the others.
Each thread may run on a different graphics processor.

Barrier globalBarrier;
i = getThreadID();
while (true)
{
    frame[i].prepare();
    globalBarrier.await();
    frame[i].display();
}
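The multi-threaded loop resembles Java's CyclicBarrier.await(); a comparable Python sketch with threading.Barrier is below. It assumes a fixed number of frames instead of while (true), uses strings as stand-ins for the hypothetical frame[i].prepare(), and, as a design choice not taken from the slides, "displays" each frame once through the barrier's action callback (which runs in one thread after all threads arrive and before any of them is released) instead of having each thread display its own part.

```python
import threading


def run_video(n_threads, n_frames):
    """Each thread prepares its own part of every frame; a frame is shown
    only when all parts are ready. Returns the list of displayed frames."""
    parts = [None] * n_threads   # part i of the current frame, written by thread i
    shown = []                   # stand-in for the display: one tuple per frame

    def display_frame():
        # Barrier action: runs once per barrier cycle, after all threads
        # have arrived and before any of them is released.
        shown.append(tuple(parts))

    barrier = threading.Barrier(n_threads, action=display_frame)

    def worker(i):
        for f in range(n_frames):
            parts[i] = f"frame{f}-part{i}"   # hypothetical frame[i].prepare()
            barrier.wait()                    # plays the role of globalBarrier.await()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown
```

Each recorded frame contains only parts with the same frame number, because no thread can start preparing frame f+1 until all parts of frame f have been displayed.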


Where it is needed

Scientific and numeric computation
Computer graphics
Garbage collection
Parallel computing in general


Various Barrier Goals

Ideally, when designing barriers, we would like to have the following properties:
Low shared memory space complexity
Low contention on shared objects
Low number of shared memory references per process
No need for shared memory initialization
Symmetry (the same amount of work for all processes)
Algorithm simplicity
Simple basic primitive
Minimal propagation time
Reusability of the barrier (a must!)


Data Objects in Use

Atomic bit
Atomic register
Fetch-and-increment register
Test-and-set bits
Read-modify-write register
Semaphores


Barriers Using Atomic Counters

Section 5.2

Objects used: atomic bit, atomic register, fetch-and-increment register (atomic counter)


Fetch-and-increment Register

A shared register that supports a fetch-and-increment (F&I) operation. Input: register r. Atomic operation: r is incremented by 1, and the old value of r is returned.

function fetch-and-increment (r : register)
    orig_r := r;
    r := r + 1;
    return (orig_r);
end-function
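Python has no hardware fetch-and-increment instruction, so this sketch simulates the register with a lock that makes "read, add 1, return the old value" a single atomic step, mirroring the pseudocode. The class name and the helper collect_tickets are hypothetical.

```python
import threading


class FetchAndIncrementRegister:
    """Simulated fetch-and-increment register (a lock stands in for hardware)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:
            orig = self._value        # orig_r := r
            self._value = orig + 1    # r := r + 1
            return orig               # return (orig_r)

    def read(self):
        with self._lock:
            return self._value

    def write(self, value):
        with self._lock:
            self._value = value


def collect_tickets(n_threads, per_thread):
    """Let several threads call F&I concurrently; return sorted results and final value."""
    reg = FetchAndIncrementRegister()
    tickets = []

    def worker():
        for _ in range(per_thread):
            tickets.append(reg.fetch_and_increment())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(tickets), reg.read()
```

Because each call returns a distinct old value, concurrent callers obtain the tickets 0, 1, ..., total - 1 with no duplicates; the counter-based barriers below rely on this property.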


await macro

For clarity, we use the await macro. It is not an operation of an object. This is also called "spinning".

macro await (condition : boolean condition)
    repeat
        cond := eval(condition);
    until (cond)
end-macro


Simple Barrier Using an Atomic Counter: Program of a Process

shared counter: fetch-and-increment register – {0,..,n}, initially = 0
       go: atomic bit, initial value is immaterial
local  local.go: bit, initial value is immaterial
       local.counter: register

1 local.go := go
2 local.counter := fetch-and-increment(counter)
3 if local.counter + 1 = n then
4    counter := 0
5    go := 1 - go
6 else await(local.go ≠ go) fi
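A minimal Python sketch of the program above, assuming: the F&I register is simulated with a lock, the go bit is a plain attribute whose reads and writes are treated as atomic (true under CPython's global interpreter lock, but an assumption in general), and time.sleep(0) yields the CPU while spinning. SimpleCounterBarrier and the test helper check_barrier are hypothetical names; the helper runs several episodes to exercise reusability.

```python
import threading
import time


class FetchAndIncrementRegister:
    """Lock-based stand-in for an atomic fetch-and-increment register."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:
            orig = self._value
            self._value = orig + 1
            return orig

    def write(self, value):
        with self._lock:
            self._value = value


class SimpleCounterBarrier:
    """Sketch of the simple barrier above for n processes."""

    def __init__(self, n):
        self.n = n
        self.counter = FetchAndIncrementRegister(0)  # initially 0
        self.go = 0                                  # initial value is immaterial

    def wait(self, i):  # i is unused: every process runs the same program
        local_go = self.go                                  # line 1
        local_counter = self.counter.fetch_and_increment()  # line 2
        if local_counter + 1 == self.n:                     # line 3: last to arrive
            self.counter.write(0)                           # line 4
            self.go = 1 - self.go                           # line 5: release the others
        else:
            while local_go == self.go:                      # line 6: await(local.go != go)
                time.sleep(0)                               # yield while spinning


def check_barrier(barrier, n, episodes):
    """True if no thread ever left the barrier before all n had arrived."""
    arrived = [[False] * n for _ in range(episodes)]
    results = []

    def worker(i):
        for e in range(episodes):
            arrived[e][i] = True        # reached the barrier in episode e
            barrier.wait(i)
            results.append(all(arrived[e]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == n * episodes and all(results)
```

Resetting counter (line 4) before flipping go (line 5) is what makes the barrier reusable: a released process can re-enter and increment counter only after go has changed.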


Simple Barrier Using an Atomic Counter: Run for n=2 Processes

[Figure: shared memory holds counter and go (initially counter = 0 and go = 0); each process has local.go and local.counter. P1 executes line 1 (local.go = 0) and line 2 (fetch-and-increment returns 0, so counter becomes 1); since 0+1 ≠ 2, P1 busy-waits. P2 executes line 1 (local.go = 0) and line 2 (fetch-and-increment returns 1, so counter becomes 2); since 1+1 = 2, P2 resets counter to 0 (line 4) and flips go to 1 (line 5), which releases P1.]

Simple Barrier Using an Atomic Counter: Another Run for n=2 Processes

[Figure: the same run, stressing that counter is a fetch-and-increment register: the two processes obtain different values (P1: 0+1 ≠ 2, P2: 1+1 = 2), so exactly one of them resets counter and flips go.]


Another Algorithm Using an Atomic Counter: Program of a Process

shared counter: fetch-and-increment register – {0,..,n}, initially = 0
local  local.counter: register

1 local.counter := fetch-and-increment(counter)
2 if local.counter + 1 = n then
3    counter := 0
4 else await(counter = 0) fi

Is this implementation incorrect?


Simple Barrier Using an Atomic Counter

There is high memory contention on the go bit. Reducing the contention:
Replace the go bit with n bits: go[1],…,go[n].
Process i may spin only on the bit go[i].


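A Local Spinning Counter Barrier: Program of Process i

shared counter: fetch-and-increment register – {0,..,n}, initially = 0
       go[1..n]: array of atomic bits, initial values are immaterial
local  local.go: bit, initial value is immaterial
       local.counter: register

1 local.go := go[i]
2 local.counter := fetch-and-increment(counter)
3 if local.counter + 1 = n then
4    counter := 0
5    for j=1 to n do go[j] := 1 - go[j] od
6 else await(local.go ≠ go[i]) fi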
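The same kind of Python sketch adapted to the local spinning variant, under the same assumptions (lock-based F&I, plain list elements standing in for atomic bits under CPython, time.sleep(0) while spinning). Python threads share one address space, so the sketch can only show that process i spins on its own bit go[i]; it says nothing about local versus remote memory. The go bits start with arbitrary values because their initial values are immaterial; all names are hypothetical.

```python
import threading
import time


class FetchAndIncrementRegister:
    """Lock-based stand-in for an atomic fetch-and-increment register."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:
            orig = self._value
            self._value = orig + 1
            return orig

    def write(self, value):
        with self._lock:
            self._value = value


class LocalSpinningCounterBarrier:
    """Sketch of the local spinning counter barrier: process i spins on go[i] only."""

    def __init__(self, n):
        self.n = n
        self.counter = FetchAndIncrementRegister(0)
        self.go = [j % 2 for j in range(n)]   # arbitrary initial values

    def wait(self, i):
        local_go = self.go[i]                               # line 1
        local_counter = self.counter.fetch_and_increment()  # line 2
        if local_counter + 1 == self.n:                     # line 3: last to arrive
            self.counter.write(0)                           # line 4
            for j in range(self.n):                         # line 5: flip every go bit
                self.go[j] = 1 - self.go[j]
        else:
            while local_go == self.go[i]:                   # line 6: spin on own bit
                time.sleep(0)


def check_barrier(barrier, n, episodes):
    """True if no thread ever left the barrier before all n had arrived."""
    arrived = [[False] * n for _ in range(episodes)]
    results = []

    def worker(i):
        for e in range(episodes):
            arrived[e][i] = True
            barrier.wait(i)
            results.append(all(arrived[e]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == n * episodes and all(results)
```

The price of the per-process bits shows up in line 5: the last process to arrive performs n writes, so releasing everyone takes O(n) steps instead of one.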


A Local Spinning Counter Barrier: Example Run for n=3 Processes

[Figure: initially counter = 0 and go[1..3] = 0 0 0. P1 arrives: local.go = 0, fetch-and-increment returns 0 (counter = 1); since 0+1 ≠ 3, P1 busy-waits on go[1]. P2 arrives: local.go = 0, fetch-and-increment returns 1 (counter = 2); since 1+1 ≠ 3, P1 and P2 busy-wait. P3 arrives: local.go = 0, fetch-and-increment returns 2 (counter = 3); since 2+1 = 3, P3 resets counter to 0 and flips go[1..3] to 1 1 1, releasing P1 and P2.]


Comparison of fetch-and-increment Barriers

Simple barrier
Pros: Very simple. Shared memory: O(log n) bits. Takes O(1) time until the last waiting process is awakened.
Cons: High contention on the go bit. Contention on the counter register (*).

Simple barrier with go array
Pros: Low contention on the go array. In some models, spinning is done on local memory, and the number of remote memory references is O(1).
Cons: Shared memory: O(n). Still contention on the counter register (*). Takes O(n) time until the last waiting process is awakened.

(*) One technique for solving this contention is the Combining Tree Barrier (page 210).


A Barrier without Memory Initialization

A barrier is a basic synchronization method. To initialize shared memory, processes need to be synchronized. Thus, a barrier may be a prerequisite for shared memory initialization, so it cannot assume that shared memory has already been initialized.
Processes may not be implemented in the same way, so it is desirable to reduce the dependency between them.


A Barrier without Memory Initialization: Program of a Process

shared counter: atomic counter – {0,..,n-1}, initial value is immaterial
       go: atomic bit, initial value is immaterial
local  local.go: bit, initial value is immaterial
       local.counter: register, initial value is immaterial

1 local.go := go                       // remember current value
2 local.counter := counter             // remember current value
3 counter := counter + 1 (mod n)       // atomic increment mod n
4 repeat
5    if counter = local.counter        // all processes have arrived
6    then go := 1 - local.go fi        // notify all
7 until (local.go ≠ go)


Using Test-and-Set Bits

Section 5.3



Test-and-Set Bit

Input: bit b. Test-and-set is an atomic operation: b is set to 1, and the old value of b (i.e., 0 or 1) is returned.
An atomic reset operation, which sets the value to 0, is also supported.

function test-and-set (b : bit)
    orig_b := b;
    b := 1;
    return (orig_b);
end-function
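A lock-based Python simulation of a test-and-set bit with reset and read operations (the read makes it a test-and-test-and-set bit). The class name and the helper count_winners are hypothetical; a real system would use a hardware atomic instruction instead of a lock.

```python
import threading


class TestAndSetBit:
    """Simulated test-and-set bit with reset and an atomic read."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig = self._value    # orig_b := b
            self._value = 1       # b := 1
            return orig           # return (orig_b)

    def reset(self):
        with self._lock:
            self._value = 0

    def read(self):
        with self._lock:
            return self._value


def count_winners(n_threads):
    """Apply test-and-set concurrently to a 0 bit; count results 0 and 1."""
    bit = TestAndSetBit()
    results = []
    threads = [threading.Thread(target=lambda: results.append(bit.test_and_set()))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(0), results.count(1)
```

When several threads apply test-and-set to a bit that is initially 0, exactly one of them gets 0; the test-and-set barrier below uses this to elect a leader.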


Test-and-Test-and-Set Bit

Like a test-and-set bit, but it also supports an atomic read. Operations supported:
Test-and-set
Reset
Atomic read (test)


Test-and-set based Barrier

shared leader: test-and-set bit, initial value = 0
       countflag: test-and-test-and-set bit, initial value = 0
       go: atomic bit, initial value is immaterial
local  local.go: bit, initial value is immaterial
       local.counter: register, initial value is immaterial


Test-and-set based Barrier

1 local.go := go
2 if test-and-set(leader) = 0 then       // the leader
3    local.counter := 0
4    repeat
5       await(countflag = 1)             // a test operation
6       local.counter := local.counter + 1
7       reset(countflag)
8    until (local.counter = n - 1)
9    reset(leader)
10   go := 1 - go
11 else                                  // the other processes
12   await(test-and-set(countflag) = 0)
13   await(local.go ≠ go)
14 fi
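A Python sketch of the test-and-set based barrier, with a lock-based TestAndSetBit repeated so the block is self-contained. Assumptions: the go bit is a plain attribute treated as atomic (CPython), spin_until is a hypothetical helper implementing the await macro with time.sleep(0), and the repeat ... until loop is written as a while loop, which is equivalent for n ≥ 2. Class and helper names are hypothetical.

```python
import threading
import time


class TestAndSetBit:
    """Lock-based stand-in for a test-and-test-and-set bit."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig, self._value = self._value, 1
            return orig

    def reset(self):
        with self._lock:
            self._value = 0

    def read(self):
        with self._lock:
            return self._value


def spin_until(condition):
    """The await macro: spin until condition() holds, yielding the CPU."""
    while not condition():
        time.sleep(0)


class TestAndSetBarrier:
    """Sketch of the test-and-set based barrier (first version)."""

    def __init__(self, n):
        self.n = n
        self.leader = TestAndSetBit(0)     # initial value = 0
        self.countflag = TestAndSetBit(0)  # initial value = 0
        self.go = 0                        # initial value is immaterial

    def wait(self, i):  # i is unused: every process runs the same program
        local_go = self.go                                          # line 1
        if self.leader.test_and_set() == 0:                         # line 2: the leader
            local_counter = 0                                       # line 3
            while local_counter != self.n - 1:                      # lines 4-8
                spin_until(lambda: self.countflag.read() == 1)      # line 5
                local_counter += 1                                  # line 6
                self.countflag.reset()                              # line 7
            self.leader.reset()                                     # line 9
            self.go = 1 - self.go                                   # line 10
        else:                                                       # the other processes
            spin_until(lambda: self.countflag.test_and_set() == 0)  # line 12
            spin_until(lambda: local_go != self.go)                 # line 13


def check_barrier(barrier, n, episodes):
    """True if no thread ever left the barrier before all n had arrived."""
    arrived = [[False] * n for _ in range(episodes)]
    results = []

    def worker(i):
        for e in range(episodes):
            arrived[e][i] = True
            barrier.wait(i)
            results.append(all(arrived[e]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == n * episodes and all(results)
```

Resetting leader (line 9) before flipping go (line 10) ensures that a new leader can be elected in the next episode, since no other process can re-enter before go changes.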


Test-and-Set Barrier

[Figure: processes P1 to P4 apply the atomic test-and-set operation to the shared leader bit (which goes from 0 to 1); the first process to set the leader bit is the leader.]


Test-and-Set Barrier

[Figure: P4 is the leader. It runs the loop repeat ... until (local.counter = n - 1): it awaits countflag = 1, increments local.counter (0, 1, 2, 3), and resets countflag. Each of P1, P2, and P3 awaits a successful test-and-set operation on countflag and then waits for go to change. Once all processes have arrived, the leader changes the go bit ("go!") and exits the barrier.]


A Barrier without Memory Initialization: Two New Techniques

1. The leader counts each process twice, so it needs only to count to 2n - 2. This tolerates off-by-one errors (for example, a countflag bit whose initial value happens to be 1), and thus makes memory initialization redundant.
2. Asymmetry: each process has a role according to its index i. Pros: saves bits and operations. Cons: different processes perform different tasks.


Asymmetric Test-and-set based Barrier without Memory Initialization: Program of Process i

shared countflag: test-and-test-and-set bit, initial value is immaterial
       go: atomic bit, initial value is immaterial
local  local.go: bit, initial value is immaterial
       local.counter: atomic register, initial value is immaterial

No need for the leader test-and-set bit.


Asymmetric Test-and-set based Barrier without Memory Initialization: Program of Process i

1 local.go := go
2 if i = 1 then                          // the leader
3    local.counter := 0
4    repeat
5       await(countflag = 1)             // a test operation
6       local.counter := local.counter + 1
7       reset(countflag)
8    until (local.counter = 2n - 2)
9    go := 1 - go
10 else                                  // the other processes
11   await(test-and-set(countflag) = 0)
12   await(test-and-set(countflag) = 0)
13   await(local.go ≠ go)
14 fi
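A Python sketch of the asymmetric version (process 0 plays the role of process 1), started from arbitrary initial values. The test starts countflag at 1 on purpose: the leader then counts one spurious signal, but reaching 2n - 2 still requires at least 2n - 3 real signals, which is impossible unless every other process has signaled at least once (n - 2 processes can produce at most 2n - 4). The remaining assumptions and hypothetical names are as in the previous sketch.

```python
import threading
import time


class TestAndSetBit:
    """Lock-based stand-in for a test-and-test-and-set bit."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig, self._value = self._value, 1
            return orig

    def reset(self):
        with self._lock:
            self._value = 0

    def read(self):
        with self._lock:
            return self._value


def spin_until(condition):
    while not condition():
        time.sleep(0)


class AsymmetricTestAndSetBarrier:
    """Sketch of the asymmetric barrier; initial values are deliberately arbitrary."""

    def __init__(self, n, countflag=1, go=1):
        self.n = n
        self.countflag = TestAndSetBit(countflag)
        self.go = go

    def wait(self, i):
        local_go = self.go                                          # line 1
        if i == 0:                                                  # line 2: the leader
            local_counter = 0                                       # line 3
            while local_counter != 2 * self.n - 2:                  # lines 4-8
                spin_until(lambda: self.countflag.read() == 1)      # line 5
                local_counter += 1                                  # line 6
                self.countflag.reset()                              # line 7
            self.go = 1 - self.go                                   # line 9
        else:                                                       # the other processes
            for _ in range(2):                                      # lines 11-12: signal twice
                spin_until(lambda: self.countflag.test_and_set() == 0)
            spin_until(lambda: local_go != self.go)                 # line 13


def check_barrier(barrier, n, episodes):
    """True if no thread ever left the barrier before all n had arrived."""
    arrived = [[False] * n for _ in range(episodes)]
    results = []

    def worker(i):
        for e in range(episodes):
            arrived[e][i] = True
            barrier.wait(i)
            results.append(all(arrived[e]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == n * episodes and all(results)
```

When the leader finishes one signal early, the last outstanding second signal is absorbed in the next episode instead; since the slack is still at most one, the same argument applies there.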


Test-and-Set based Barriers: Properties

They use a different object (T&S instead of F&I).
Pros: Shared memory: only bits, O(1) space, as opposed to the counter-based barriers, which require O(log n). Does not require memory initialization (in the second version).
Cons: Asymmetric (in the second version). Still high contention on the countflag and go bits.

Tree-based Barriers

Section 5.5


A Tree-based Barrier

The processes are organized in a binary tree.
Each node is owned by a predetermined process.
Each process waits until its 2 children arrive, combines the results, and passes them on to its parent.
When the root learns that its 2 children have arrived, it tells its children that they can move on.
The signal propagates down the tree until all the processes get the message.

[Figure: a binary tree of 7 processes: root 1, its children 2 and 3, and leaves 4, 5, 6, 7.]


[Figure: a binary tree of 15 processes (root 1; nodes 2 and 3; nodes 4 to 7; leaves 8 to 15) with shared arrays arrive[2..15] and go[2..15]; the children of node i are nodes 2i and 2i+1.]



A Tree-based Barrier: Program of Process i

shared arrive[2..n]: array of atomic bits, initial values = 0
       go[2..n]: array of atomic bits, initial values = 0

1 if i = 1 then                                // root
2    await(arrive[2] = 1); arrive[2] := 0
3    await(arrive[3] = 1); arrive[3] := 0
4    go[2] := 1; go[3] := 1
5 else if i ≤ (n-1)/2 then                     // internal node
6    await(arrive[2i] = 1); arrive[2i] := 0
7    await(arrive[2i+1] = 1); arrive[2i+1] := 0
8    arrive[i] := 1
9    await(go[i] = 1); go[i] := 0
10   go[2i] := 1; go[2i+1] := 1
11 else                                        // leaf
12   arrive[i] := 1
13   await(go[i] = 1); go[i] := 0 fi
14 fi
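A Python sketch of the tree-based barrier, assuming n = 2^k - 1 with k ≥ 2 (a complete binary tree), processes numbered 1..n, list elements standing in for atomic bits (CPython), and a hypothetical spin_until helper for the await macro. The test helper check_tree_barrier (also hypothetical) runs several episodes to check that the barrier is reusable.

```python
import threading
import time


def spin_until(condition):
    while not condition():
        time.sleep(0)


class TreeBarrier:
    """Sketch of the tree-based barrier; node i has children 2i and 2i+1."""

    def __init__(self, n):
        self.n = n
        self.arrive = [0] * (n + 1)   # only arrive[2..n] is used; initially 0
        self.go = [0] * (n + 1)       # only go[2..n] is used; initially 0

    def wait(self, i):
        arrive, go = self.arrive, self.go
        if i == 1:                                   # root
            for c in (2, 3):                          # lines 2-3
                spin_until(lambda: arrive[c] == 1)
                arrive[c] = 0
            go[2] = 1                                 # line 4
            go[3] = 1
        elif i <= (self.n - 1) // 2:                 # internal node
            for c in (2 * i, 2 * i + 1):              # lines 6-7
                spin_until(lambda: arrive[c] == 1)
                arrive[c] = 0
            arrive[i] = 1                             # line 8: tell the parent
            spin_until(lambda: go[i] == 1)            # line 9
            go[i] = 0
            go[2 * i] = 1                             # line 10: wake the children
            go[2 * i + 1] = 1
        else:                                        # leaf
            arrive[i] = 1                             # line 12
            spin_until(lambda: go[i] == 1)            # line 13
            go[i] = 0


def check_tree_barrier(n, episodes):
    """True if no process ever left the barrier before all n had arrived."""
    barrier = TreeBarrier(n)
    arrived = [[False] * (n + 1) for _ in range(episodes)]
    results = []

    def worker(i):
        for e in range(episodes):
            arrived[e][i] = True
            barrier.wait(i)
            results.append(all(arrived[e][1:]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(1, n + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == n * episodes and all(results)
```

Reusability follows from the order of the resets: a parent zeros arrive[c] before it sets go[c], and a child zeros go[i] before it sets arrive[i] again in the next episode.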

A Tree-based Barrier: Example Run for n=7 Processes

[Figure: a step-by-step run on the 7-node tree, showing the arrive[2..7] and go[2..7] bits after each step. The leaves set their arrive bits and wait for their go bits; P2 waits for P4 to arrive, zeros arrive[4] and arrive[5], sets arrive[2], and waits for go[2]; P3 zeros arrive[6] and arrive[7], sets arrive[3], and waits for go[3]; P1 (the root) zeros arrive[2] and arrive[3] and sets go[2] and go[3]. The go signal then propagates down the tree until every process has been released, and the run is finished.]


Pros: Low shared memory contention: no bit is shared by more than 2 processes, which makes it good for larger n. Fast (in comparison to local spinning): information from the root propagates after log(n) steps. Uses only atomic bits (no special objects). In some models, each process spins on a locally accessible bit, and the number of remote memory references is O(1) per process.
Cons: Shared memory space complexity is O(n). Asymmetric: not all the processes do the same amount of work (*).

(*) There is a similar barrier which is symmetric, but at the cost of more shared memory: O(n log n) as opposed to O(n). See the Dissemination Barrier in Section 5.6, page 213.


The See-Saw Barrier

Section 5.7



See-Saw Barrier

Now, we will use a read-modify-write object. It allows us to construct a symmetric barrier that requires only a few shared bits. This algorithm can also be used to solve the leader election and consensus problems.
The See-Saw barrier is based on a solution to the wake-up problem proposed by M. J. Fischer, S. Moran, S. Rudich, and G. Taubenfeld (1996).


Read-Modify-Write Register

Input: register r with n bits, and a function f(r). Atomic operation: reads the register, calls function f on r and writes the returned value into r, and returns the old value of r.
Usually f is custom made for the algorithm.

function read-modify-write (r : register, f : function)
    orig_r := r;
    r := f(r);
    return (orig_r);
end-function


Data Flow

Tokens:
Each process starts with 2 tokens.
The total number of tokens doesn’t change.
Each process can absorb one token or emit one token at a time.
See-saw:
There is one see-saw; it can be left-up-right-down or left-down-right-up.
Each process that enters the playground needs to get on the see-saw.
Each process which is on the see-saw is either on the left side or on the right side.
Tokens are weightless.


Data Representation

Using a 2-bit read-modify-write register:
Token bit, two states:
1. no-token-present
2. token-present
See-saw bit, two states:
1. left-side-down
2. right-side-down

[Figure: token counts (T) of two processes on the see-saw: P1 T: 2 and P2 T: 2, then P1 T: 1 and P2 T: 3.]


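Process State

Each process is in one of four states: never been on, on the left side, on the right side, or got-off.

[Figure: processes P1 to P7 placed by state, with their token counts (T); P1 has T: 0, and P2 to P7 each have T: 2.]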


Runtime Flow

Each process loops until it gets off the see-saw. After it gets off, it waits for the go flag.
The algorithm is based on 5 rules. On each loop iteration:
According to its state, the process performs one rule.
Only one process at a time performs a rule.
A rule is done atomically, using the RMW register.
Each rule changes the tokens and/or the state of the see-saw.
There can be many processes on each side (up to ).
When one of the processes gets 2n tokens, it gets off and sets the go flag.

Rule #1: Start of Algorithm

Applicable if: the scheduled process is “never-been-on”.
Operation: saves the go bit locally, gets on the up side, and swings the see-saw.

[Figure: P1 and P2 (T: 2 each); in the RMW register the token-state is no-token-present, and the see-saw state changes from left-side-down to right-side-down.]

Rule #2: Emitter

Applicable if: the scheduled process is “down-side”, has tokens, and token-state = no-token-present.
Operation: deposits one token in the shared token-state. If it remains without tokens, it gets off the see-saw and swings it.

[Figure: P2 goes from T: 2 to T: 1, and the token-state changes from no-token-present to token-present.]

Rule #3: Absorber

Applicable if: the scheduled process is “up-side”, and token-state = token-present.
Operation: takes the token from the token-state.

[Figure: P1 goes from T: 2 to T: 3, and the token-state changes from token-present to no-token-present.]

Rule #2: Emitter (applied again)

[Figure: P2 deposits its last token (T: 1 to T: 0), and the token-state changes from no-token-present to token-present; since P2 remains without tokens, it gets off the see-saw and swings it. P1 has T: 3.]

The process that got off now awaits the go flag.

Rule #4: Leader

Applicable if: the scheduled process is on the see-saw and sees at least 2n tokens.
Operation: gets off the see-saw and flips the shared go bit.

[Figure: with n = 2, P1 absorbs the remaining token (T: 3 to T: 4, and the token-state becomes no-token-present) while P2 (T: 0) sleeps; P1 now holds 2n = 4 tokens, so it gets off and flips the go bit ("go!").]

Rule #5: End of the Algorithm

Applicable if: the scheduled process notices that the go bit has been flipped (relative to its local.go).
Operation: everybody has arrived; continue past the barrier.

[Figure: P2 (T: 0) wakes up, sees that go has been flipped, and continues past the barrier.]


Important Invariants

Token invariant: during a single episode of the see-saw barrier, the number of tokens in the system is either 2n or 2n+1 (like in the test-and-set barrier), and it never changes.
Balance invariant: during a single episode of the see-saw barrier, the numbers of processes on the left and on the right side of the see-saw are either perfectly balanced or favor the down side by 1.


Correctness

When all processes are on the see-saw, tokens are given from the down side until one process gets off. By induction, at some point one process will see 2n tokens, so there is no deadlock.
2n tokens can only be accumulated if all processes have arrived, so this is a barrier.


Remarks

All the logic is done inside the atomic modify function of the RMW register.
It needs to read and modify all three bits atomically, to prevent race conditions.
Before a process applies a rule, it first checks whether the go bit has been flipped relative to its local.go (regardless of its current state).


Question

How many times does the state of the shared memory change during one episode of the see-saw barrier?
O(n) in the best case, O(n²) in the worst case.


The See-Saw Barrier: Properties

Pros: O(1) shared memory space complexity. No need to initialize shared memory. Symmetric.
Cons: Uses a custom read-modify-write register. High memory contention on the RMW bits. Worst case O(n²) total shared memory references. Complex.


A Barrier using Semaphores

Section 5.8



Semaphore

A shared object that takes a non-negative integer value and supports two operations:
Down: if the value is > 0, it is decremented by 1; otherwise, the process is blocked until the value becomes > 0.
Up: the value is incremented by 1.
Incrementing, decrementing, and testing the semaphore are executed atomically.


Binary Semaphore

A semaphore whose value is only 0 or 1. Decrementing is identical to the general semaphore; incrementing is equal to setting the value to 1. The initial value is assumed to be 1.
It can be used to implement deadlock-free mutual exclusion:

down(S)
critical-section
up(S)
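The mutual exclusion pattern can be sketched with Python's threading.Semaphore, whose acquire() and release() play the roles of down and up. Note that threading.Semaphore is a general (counting) semaphore; it behaves as a binary one here only because the holder is the one calling release(), so the value never exceeds 1. The helper name count_with_mutex is hypothetical.

```python
import threading


def count_with_mutex(n_threads, per_thread):
    """down(S); critical section; up(S), with S initially 1."""
    S = threading.Semaphore(1)     # binary use: value stays 0 or 1
    shared = {"count": 0}

    def worker():
        for _ in range(per_thread):
            S.acquire()                    # down(S)
            value = shared["count"]        # critical section: a non-atomic
            shared["count"] = value + 1    # read-then-write of shared data
            S.release()                    # up(S)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

Every update survives because at most one thread is between down(S) and up(S) at a time.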


Barrier using Semaphores: Algorithm for n Processes

shared arrival: binary semaphore, initially 1
       departure: binary semaphore, initially 0
       counter: atomic register ranges over {0, …, n}, initially 0

1 down(arrival)
2 counter := counter + 1              // atomic register
3 if counter < n then up(arrival) else up(departure) fi
4 down(departure)
5 counter := counter - 1
6 if counter > 0 then up(departure) else up(arrival) fi

Question: would this barrier be correct if the shared counter were not an atomic register?
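A Python sketch of the semaphore barrier, using threading.Semaphore for down (acquire) and up (release). In this sketch the counter is a plain attribute that is only updated by the process currently holding arrival or departure. The class name and the test helper check_barrier are hypothetical.

```python
import threading


class SemaphoreBarrier:
    """Sketch of the semaphore barrier above for n processes."""

    def __init__(self, n):
        self.n = n
        self.arrival = threading.Semaphore(1)    # binary semaphore, initially 1
        self.departure = threading.Semaphore(0)  # binary semaphore, initially 0
        self.counter = 0                         # ranges over {0, ..., n}

    def wait(self, i):  # i is unused: every process runs the same program
        self.arrival.acquire()                   # 1 down(arrival)
        self.counter += 1                        # 2
        if self.counter < self.n:                # 3
            self.arrival.release()               #   let the next process arrive
        else:
            self.departure.release()             #   last arrival opens departure
        self.departure.acquire()                 # 4 down(departure)
        self.counter -= 1                        # 5
        if self.counter > 0:                     # 6
            self.departure.release()             #   let the next process depart
        else:
            self.arrival.release()               #   last departure reopens arrival


def check_barrier(barrier, n, episodes):
    """True if no thread ever left the barrier before all n had arrived."""
    arrived = [[False] * n for _ in range(episodes)]
    results = []

    def worker(i):
        for e in range(episodes):
            arrived[e][i] = True
            barrier.wait(i)
            results.append(all(arrived[e]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == n * episodes and all(results)
```

The barrier is reusable because arrival stays closed during the whole departure phase: a fast process that re-enters blocks on down(arrival) until the last process has departed.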


Barrier using Semaphores: Properties

Pros: Very simple. Space complexity O(1). Symmetric.
Cons: Requires a strong object. Requires some central manager; high contention on the semaphores if there is no central manager. Propagation delay O(n).


Summary

Barrier Synchronization



Barriers we’ve seen

Simple barrier: based on an atomic fetch-and-increment counter
Local spinning barrier: based on an atomic fetch-and-increment counter and a go array
Test-and-set barriers: based on test-and-test-and-set objects; one version works without memory initialization
Tree-based barrier
See-Saw barrier
Semaphore-based barrier


Conclusions

There are many possible algorithms for barrier synchronization.
Each has pros and cons.
Different shared objects allow various algorithms.
Choosing the right barrier is application- and platform-dependent (benchmarking is needed to know for sure).
