Synchronization Algorithms and Concurrent Programming
Gadi Taubenfeld © 2014

Chapter 5: Barrier Synchronization
Version: June 2014

This presentation is a modified version of a presentation that Itai Avrian and Shachar Gidron prepared for my Seminar in Concurrent and Distributed Computing, 2012.


A note on the use of these ppt slides:

I am making these slides freely available to all (faculty, students, readers). They are in PowerPoint form so you can add, modify, and delete slides and slide content to suit your needs. They obviously represent a lot of work on my part. In return for use, I only ask the following:

That you mention their source; after all, I would like people to use my book!
That you note that they are adapted from (or perhaps identical to) my slides, and note my copyright of this material.

Thanks and enjoy! Gadi Taubenfeld

All material copyright 2014 Gadi Taubenfeld, All Rights Reserved

To get the most updated version of these slides go to: http://www.faculty.idc.ac.il/gadi/book.htm

Synchronization Algorithms and Concurrent Programming
ISBN: 0131972596, 1st edition

Chapter 5: Barrier Synchronization

5.1 Barriers
5.2 Atomic Counter
5.3 Test-and-set Bits
5.4 Combining Tree Barrier*
5.5 A Tree-based Barrier
5.6 The Dissemination Barrier*
5.7 The See-Saw Barrier
5.8 Semaphores
5.9 Bibliographic Notes*
5.10 Problems*

*Not covered in this presentation

What is a Barrier?

[Figure: three snapshots over time of processes P1 to P4 and a barrier. (1) Four processes approach the barrier. (2) All except P4 arrive. (3) Once all arrive, they continue.]

A barrier is a coordination mechanism (an algorithm) that forces processes which participate in a concurrent (or distributed) algorithm to wait until each one of them has reached a certain point in its program.

The collection of these coordination points is called the barrier.

Once all the processes have reached the barrier, they are all permitted to continue past the barrier.

Example: Parallel Prefix Sum

Goal: starting from the inputs a, b, c, d, e, f, compute all prefix sums over time.

begin:  a   b     c       d         e           f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f

Sequential computation (one new prefix sum per step):

begin:  a   b     c       d         e           f
        a   a+b   c       d         e           f
        a   a+b   a+b+c   d         e           f
        a   a+b   a+b+c   a+b+c+d   e           f
        a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f

Parallel computation (in each round every position adds the value at distance 1, then 2, then 4 to its left):

begin:  a   b     c       d         e           f
        a   a+b   b+c     c+d       d+e         e+f
        a   a+b   a+b+c   a+b+c+d   b+c+d+e     c+d+e+f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f

A barrier is placed between consecutive rounds of the parallel computation.
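The parallel rounds above can be sketched in Python. This is only an illustration under stated assumptions: the function name `parallel_prefix_sum`, the use of one thread per element, and the two-buffer scheme are not from the slides; `threading.Barrier` is the standard library barrier and plays the role of the barrier between rounds.

```python
import threading

def parallel_prefix_sum(values):
    """Return all prefix sums of values, using one thread per element (illustrative)."""
    n = len(values)
    if n == 0:
        return []
    # Two buffers (an assumption of this sketch): each round reads one, writes the other.
    bufs = [list(values), [0] * n]
    barrier = threading.Barrier(n)
    rounds = (n - 1).bit_length()  # distances 1, 2, 4, ... that are smaller than n

    def worker(i):
        src, d = 0, 1
        for _ in range(rounds):
            cur, nxt = bufs[src], bufs[1 - src]
            nxt[i] = (cur[i] + cur[i - d]) if i >= d else cur[i]
            barrier.wait()  # nobody starts the next round before all finished this one
            src, d = 1 - src, 2 * d

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[rounds % 2]
```

Without the `barrier.wait()` call, a fast thread could overwrite a buffer entry that a slower thread still needs to read in the current round.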

Example: Video
Single thread

Assume we have a video application.
Each frame needs to be calculated before being displayed.
Prepare frame for display by graphics processor:

while (true)
{
    frame = prepare_next_frame();
    frame.display();
}

Example: Video
Multiple threads

Now, we have n threads running in parallel.
It makes sense to split the frame into n disjoint parts.
Each thread prepares its own part in parallel with the others.
Each thread may run on a different graphics processor.

Barrier globalBarrier;
i = getThreadID();
while (true)
{
    frame[ i ].prepare();
    globalBarrier.await();
    frame[ i ].display();
}

Where it is needed

Scientific & numeric computation
Computer graphics
Garbage collection
Parallel computing in general

Various Barrier Goals

Ideally, when designing barriers, we would like to have the following properties:

Low shared memory space complexity
Low contention on shared objects
Low number of shared memory references per process
No need for shared memory initialization
Symmetry (same amount of work for all processes)
Algorithm simplicity
Simple basic primitive
Minimal propagation time
Reusability of the barrier (must!)

Data Objects in Use

Atomic bit
Atomic register
Fetch-and-increment register
Test-and-set bits
Read-modify-write register
Semaphores

Section 5.2
Barriers using atomic counters

Atomic bit
Atomic register
Fetch-and-increment register / atomic counter

Fetch-and-increment Register

A shared register that supports a fetch-and-increment (F&I) operation:
Input: register r
Atomic operation:
    r is incremented by 1
    the old value of r is returned

function fetch-and-increment (r : register)
    orig_r := r;
    r := r + 1;
    return (orig_r);
end-function
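As a rough illustration of what atomicity buys here, the register can be emulated in Python. The class name `FetchAndIncrementRegister`, the helper `collect_tickets`, and the use of a lock to stand in for a hardware atomic instruction are assumptions of this sketch, not part of the slides.

```python
import threading

class FetchAndIncrementRegister:
    """Emulated F&I register; the lock stands in for hardware atomicity (assumption)."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        # The read and the write happen as one indivisible step.
        with self._lock:
            orig = self._value
            self._value = orig + 1
            return orig

def collect_tickets(num_threads=8, per_thread=100):
    """Hypothetical demo: every call must return a distinct old value."""
    reg = FetchAndIncrementRegister()
    seen = []

    def worker():
        for _ in range(per_thread):
            seen.append(reg.fetch_and_increment())

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)
```

Because the operation is atomic, no two callers ever receive the same old value, which is what the counter-based barriers rely on to identify the last process to arrive.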

await macro

For clarity, we use the await macro.
It is not an operation of an object.
This is also called "spinning".

macro await (condition : boolean condition)
    repeat
        cond = eval(condition);
    until (cond)
end-macro

Simple Barrier Using an Atomic Counter
Program of a Process

shared  counter: fetch-and-increment register, range {0,...,n}, initially 0
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register

local.go := go
local.counter := fetch-and-increment (counter)
if local.counter + 1 = n then
    counter := 0
    go := 1 - go
else await(local.go ≠ go) fi
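A minimal Python sketch of this barrier follows, under several assumptions: the names `CounterBarrier` and `run_phases` are hypothetical, a lock emulates the atomic fetch-and-increment, plain attribute reads and writes stand in for atomic bits, and `time.sleep(0)` is added inside the spin loop only to yield the processor.

```python
import threading
import time

class CounterBarrier:
    """Sketch of the simple counter barrier; not a production implementation."""

    def __init__(self, n):
        self.n = n
        self.counter = 0
        self.go = 0
        self._lock = threading.Lock()  # emulates atomicity of fetch-and-increment

    def _fetch_and_increment(self):
        with self._lock:
            old = self.counter
            self.counter = old + 1
            return old

    def wait(self):
        local_go = self.go                      # remember the current go value
        if self._fetch_and_increment() + 1 == self.n:
            with self._lock:
                self.counter = 0                # last to arrive resets the counter
            self.go = 1 - self.go               # and releases everyone
        else:
            while self.go == local_go:          # await(local.go != go)
                time.sleep(0)

def run_phases(n=4, rounds=3):
    """Hypothetical check: after each barrier, all n threads must have arrived."""
    barrier = CounterBarrier(n)
    arrived = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker():
        for r in range(rounds):
            with lock:
                arrived[r] += 1
            barrier.wait()
            ok.append(arrived[r] == n)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(ok) == n * rounds and all(ok)
```

Flipping `go` rather than setting it to a fixed value is what lets the same object be used for the next round without re-initialization.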

Simple Barrier Using an Atomic Counter
Run for n=2 Processes

[Animation: shared memory starts with counter = 0 and go = 0. P1 reads local.go = 0 and obtains local.counter = 0; since 0+1 ≠ 2, P1 busy-waits. P2 reads local.go = 0 and obtains local.counter = 1; since 1+1 = 2, P2 resets counter to 0 and flips go to 1, which releases P1. The counter takes the values 0, 1, 2, 0 because it is a fetch-and-increment register.]

Another Algorithm Using an Atomic Counter
Program of a Process

shared  counter: fetch-and-increment register, range {0,...,n}, initially 0
local   local.counter: register

local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
else await(counter = 0) fi

Is this implementation incorrect?
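One way to probe the question is to replay a single schedule by hand, assuming the barrier is meant to be reused. The Python sketch below is not from the slides: `replay_reuse_interleaving` is a hypothetical name, and the schedule (a fast P2 re-entering the barrier before a slow P1 re-reads the counter) is one chosen interleaving for n = 2.

```python
def replay_reuse_interleaving(n=2):
    """Replay one schedule for two processes that use the barrier twice."""
    counter = 0

    # Round 1: P1 arrives first.
    p1_local = counter          # fetch-and-increment returns the old value 0
    counter += 1
    assert p1_local + 1 != n    # P1 is not last, so it must await(counter = 0)

    # Round 1: P2 arrives last and resets the counter.
    p2_local = counter
    counter += 1
    assert p2_local + 1 == n
    counter = 0                 # P2 passes the barrier

    # P1 is slow and has not yet observed counter = 0.
    # Round 2: P2 re-enters the barrier and increments the counter again.
    p2_local = counter
    counter += 1
    assert p2_local + 1 != n    # P2 now also awaits counter = 0

    # Both processes are spinning on counter = 0; P1 cannot arrive at round 2
    # because it is still stuck in round 1, so nobody will reset the counter.
    p1_released = (counter == 0)
    p2_released = (counter == 0)
    return counter, p1_released, p2_released
```

In this schedule both processes wait forever, which suggests the algorithm is fine for a single use but not reusable. The go bit of the previous algorithm avoids this because each process waits for a change relative to the value it saved on entry.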

Simple Barrier Using an Atomic Counter

There is high memory contention on the go bit.
Reducing the contention:
    Replace the go bit with n bits: go[1],...,go[n]
    Process p_i spins only on the bit go[i]

A Local Spinning Counter Barrier
Program of a Process i

shared  counter: fetch-and-increment register, range {0,...,n}, initially 0
        go[1..n]: array of atomic bits, initial values are immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register

local.go := go[i]
local.counter := fetch-and-increment (counter)
if local.counter + 1 = n then
    counter := 0
    for j=1 to n do go[j] := 1 - go[j] od
else await(local.go ≠ go[i]) fi
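The same idea can be sketched in Python. As before, this is an illustration under assumptions: `LocalSpinBarrier` and `run_local_spin_phases` are hypothetical names, processes are indexed from 0 instead of 1, a lock emulates the atomic counter, and a Python list does not model the local versus remote memory distinction that motivates the algorithm.

```python
import threading
import time

class LocalSpinBarrier:
    """Sketch of the counter barrier with one go bit per process."""

    def __init__(self, n):
        self.n = n
        self.counter = 0
        self.go = [0] * n
        self._lock = threading.Lock()  # emulates atomicity of fetch-and-increment

    def _fetch_and_increment(self):
        with self._lock:
            old = self.counter
            self.counter = old + 1
            return old

    def wait(self, i):
        local_go = self.go[i]
        if self._fetch_and_increment() + 1 == self.n:
            with self._lock:
                self.counter = 0
            for j in range(self.n):             # O(n) work for the last arrival
                self.go[j] = 1 - self.go[j]
        else:
            while self.go[i] == local_go:       # spin only on this process's own bit
                time.sleep(0)

def run_local_spin_phases(n=4, rounds=3):
    """Hypothetical check: after each barrier, all n threads must have arrived."""
    barrier = LocalSpinBarrier(n)
    arrived = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker(i):
        for r in range(rounds):
            with lock:
                arrived[r] += 1
            barrier.wait(i)
            ok.append(arrived[r] == n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(ok) == n * rounds and all(ok)
```

The trade-off is visible in the loop: contention on a single go bit is replaced by n writes performed by the last process to arrive.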

A Local Spinning Counter Barrier
Example Run for n=3 Processes

[Animation: shared memory starts with counter = 0 and go = 0 0 0. P1 obtains local.counter = 0; since 0+1 ≠ 3, P1 busy-waits. P2 obtains local.counter = 1; since 1+1 ≠ 3, P1 and P2 busy-wait. P3 obtains local.counter = 2; since 2+1 = 3, P3 resets counter to 0 and flips the go array to 1 1 1.]

Comparison of fetch-and-increment Barriers

Simple Barrier
Pros:
    Very simple
    Shared memory: O(log n) bits
    Takes O(1) until the last waiting process is awakened
Cons:
    High contention on the go bit
    Contention on the counter register (*)

Simple Barrier with go array
Pros:
    Low contention on the go array
    In some models: spinning is done on local memory
    Remote memory references: O(1)
Cons:
    Shared memory: O(n)
    Still contention on the counter register (*)
    Takes O(n) until the last waiting process is awakened

(*) One technique for solving this contention is the Combining Tree Barrier (page 210).

A Barrier without Memory Initialization

A barrier is a basic synchronization method.
To initialize shared memory, processes need to be synchronized.
Thus, a barrier may be a prerequisite for shared memory initialization and cannot assume one.
Processes may not be implemented in the same way,
so it is desirable to reduce the dependency between them.

A Barrier without Memory Initialization
Program of a Process

shared  counter: atomic counter, range {0,...,n-1}, initial value is immaterial
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register, initial value is immaterial

local.go := go                      // remember current value
local.counter := counter            // remember current value
counter := counter + 1 (mod n)      // atomic increment mod n
repeat
    if counter = local.counter      // all processes have arrived
    then go := 1 - local.go fi      // notify all
until (local.go ≠ go)

Section 5.3
Using Test-and-Set Bits

Test-and-Set Bit

Input: bit b
Test-and-set is an atomic operation:
    b is set to 1
    the old value of b (i.e., 0 or 1) is returned
An atomic reset operation, which sets the value to 0, is supported.

function test-and-set (b : bit)
    orig_b := b;
    b := 1;
    return (orig_b);
end-function
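A small Python emulation may help to see why exactly one caller "wins" a test-and-set bit. The names `TestAndSetBit` and `count_winners` are hypothetical, and the lock only stands in for the atomicity that hardware would provide.

```python
import threading

class TestAndSetBit:
    """Emulated test-and-set bit; the lock stands in for hardware atomicity (assumption)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig = self._value
            self._value = 1
            return orig

    def reset(self):
        with self._lock:
            self._value = 0

def count_winners(num_threads=8):
    """Hypothetical demo: how many threads see the old value 0?"""
    bit = TestAndSetBit()
    results = []

    def worker():
        results.append(bit.test_and_set())

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(0)
```

Only the first caller after a reset observes 0; every later caller observes 1. The barrier in this section uses that property to pick a leader.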

Test-and-Test-and-Set Bit

Operations supported:
    Test-and-set (like a test-and-set bit)
    Reset (like a test-and-set bit)
    Atomic read (test)

Test-and-set based Barrier

shared  leader: test-and-set bit, initial value = 0
        countflag: test-and-test-and-set bit, initial value = 0
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: register, initial value is immaterial

local.go := go
if test-and-set(leader) = 0 then            // the leader
    local.counter := 0
    repeat
        await(countflag = 1)                // a test operation
        local.counter := local.counter + 1
        reset(countflag)
    until (local.counter = n - 1)
    reset(leader)
    go := 1 - go
else                                        // the other processes
    await(test-and-set(countflag) = 0)
    await(local.go ≠ go)
fi
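The program above can be sketched in Python as follows. The names `TasBit`, `TasBarrier` and `run_tas_phases` are hypothetical, a lock emulates the atomic test-and-set, plain attribute access stands in for the atomic go bit, and `time.sleep(0)` is added in the spin loops only to yield.

```python
import threading
import time

class TasBit:
    """Emulated test-and-test-and-set bit (test-and-set, reset, read)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig, self.value = self.value, 1
            return orig

    def reset(self):
        with self._lock:
            self.value = 0

class TasBarrier:
    """Sketch of the test-and-set based barrier (first version, with a leader bit)."""

    def __init__(self, n):
        self.n = n
        self.leader = TasBit()
        self.countflag = TasBit()
        self.go = 0

    def wait(self):
        local_go = self.go
        if self.leader.test_and_set() == 0:         # the leader
            count = 0
            while count < self.n - 1:
                while self.countflag.value != 1:    # a test operation
                    time.sleep(0)
                count += 1
                self.countflag.reset()              # admit the next arrival
            self.leader.reset()
            self.go = 1 - self.go
        else:                                       # the other processes
            while self.countflag.test_and_set() != 0:
                time.sleep(0)
            while self.go == local_go:
                time.sleep(0)

def run_tas_phases(n=4, rounds=3):
    """Hypothetical check: after each barrier, all n threads must have arrived."""
    barrier = TasBarrier(n)
    arrived = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker():
        for r in range(rounds):
            with lock:
                arrived[r] += 1
            barrier.wait()
            ok.append(arrived[r] == n)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(ok) == n * rounds and all(ok)
```

Each non-leader announces itself by winning `countflag` once, and the leader counts these announcements one at a time, so only single bits are shared.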

Test-and-Set Barrier

[Animation with processes P1 to P4: the first process to set the leader bit with the atomic test-and-set operation is the leader (here P4). Each other process awaits a successful test-and-set on countflag, then awaits a change of the go bit. The leader repeats await(countflag = 1), incrementing local.counter (0, 1, 2, 3) and resetting countflag, until local.counter = n - 1. Once all processes have arrived, the leader changes the go bit and all exit the barrier.]

A Barrier without Memory Initialization
Two new techniques

1. The leader counts each process twice
    Needs only to count to 2n - 2
    Allows off-by-one mistakes
    Thus makes memory initialization redundant
2. Asymmetry
    Each process has a role according to its index i
    Pros: saves bits and operations
    Cons: different processes differ in their tasks

Asymmetric Test-and-set based Barrier without Memory Initialization
Program of process i

shared  countflag: test-and-test-and-set bit, initial value is immaterial
        go: atomic bit, initial value is immaterial
local   local.go: a bit, initial value is immaterial
        local.counter: atomic register, initial value is immaterial

No need for the leader test-and-set bit.

local.go := go
if i = 1 then                               // the leader
    local.counter := 0
    repeat
        await(countflag = 1)                // a test operation
        local.counter := local.counter + 1
        reset(countflag)
    until (local.counter = 2n - 2)
    go := 1 - go
else                                        // the other processes
    await(test-and-set(countflag) = 0)
    await(test-and-set(countflag) = 0)
    await(local.go ≠ go)
fi

Test-and-Set based Barriers
Properties

Different object (T&S instead of F&I)

Pros:
    Shared memory: only bits, O(1) space
        As opposed to the counter-based barrier, which requires O(log n)
    Does not require memory initialization (in the second version)
Cons:
    Asymmetric (in the second version)
    Still high contention on the countflag and go bits

Section 5.5
Tree Based Barriers

A Tree-based Barrier

The processes are organized in a binary tree.
Each node is owned by a predetermined process.
Each process waits until its 2 children arrive, combines the results and passes them on to its parent.
When the root learns that its 2 children have arrived, it tells its children that they can move on.
The signal propagates down the tree until all the processes get the message.

[Figure: binary tree with root 1, internal nodes 2 and 3, and leaves 4, 5, 6, 7.]

Assume a complete binary tree (n = 15 in the figure).

[Figure: complete binary tree of processes 1 to 15, where the children of node i are nodes 2i and 2i+1, together with the shared arrays arrive[2..15] and go[2..15].]

A Tree-based Barrier
Program of process i

shared  arrive[2..n]: array of atomic bits, initial values = 0
        go[2..n]: array of atomic bits, initial values = 0

if i = 1 then                                   // root
    await(arrive[2] = 1); arrive[2] := 0
    await(arrive[3] = 1); arrive[3] := 0
    go[2] := 1; go[3] := 1
else if i ≤ (n-1)/2 then                        // internal node
    await(arrive[2i] = 1); arrive[2i] := 0
    await(arrive[2i+1] = 1); arrive[2i+1] := 0
    arrive[i] := 1
    await(go[i] = 1); go[i] := 0
    go[2i] := 1; go[2i+1] := 1
else                                            // leaf
    arrive[i] := 1
    await(go[i] = 1); go[i] := 0 fi
fi
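A compact Python sketch of the same program is given below. It folds the three cases into shared steps (wait for children, report to parent, release children), which is a restructuring of this sketch and not the slide's layout. The names `TreeBarrier` and `run_tree_phases` are hypothetical, processes are numbered 1..n as in the slide, and n is assumed to be odd so that every internal node has exactly two children.

```python
import threading
import time

class TreeBarrier:
    """Sketch of the tree-based barrier; assumes n is odd (e.g. 7 or 15)."""

    def __init__(self, n):
        self.n = n
        self.arrive = [0] * (n + 1)   # index 0 and 1 are unused
        self.go = [0] * (n + 1)

    @staticmethod
    def _await_and_clear(bits, idx):
        while bits[idx] != 1:         # await(bits[idx] = 1)
            time.sleep(0)
        bits[idx] = 0                 # reset so the barrier can be reused

    def wait(self, i):
        has_children = i <= (self.n - 1) // 2
        if has_children:              # root and internal nodes wait for both children
            self._await_and_clear(self.arrive, 2 * i)
            self._await_and_clear(self.arrive, 2 * i + 1)
        if i != 1:                    # everyone but the root reports to its parent
            self.arrive[i] = 1
            self._await_and_clear(self.go, i)
        if has_children:              # pass the go signal down the tree
            self.go[2 * i] = 1
            self.go[2 * i + 1] = 1

def run_tree_phases(n=7, rounds=3):
    """Hypothetical check: after each barrier, all n threads must have arrived."""
    barrier = TreeBarrier(n)
    arrived = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker(i):
        for r in range(rounds):
            with lock:
                arrived[r] += 1
            barrier.wait(i)
            ok.append(arrived[r] == n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(1, n + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(ok) == n * rounds and all(ok)
```

Every bit is written by one process and cleared by one other process, which matches the low-contention property claimed for this barrier.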

A Tree-based Barrier
Example Run for n=7 Processes

[Animation over the tree with root 1, internal nodes 2 and 3, and leaves 4 to 7, showing the arrays arrive[2..7] and go[2..7]. Leaves set their arrive bits and wait for go[4], go[5], go[6] and go[7]. P2 waits for p4 to arrive, zeros arrive[4] and arrive[5], and waits for go[2]. P3 zeros arrive[6] and arrive[7] and waits for go[3]. P1 waits for p3 to arrive, zeros arrive[2] and arrive[3], and sets go[2] and go[3]. The go signal then propagates to the leaves, and all arrive and go bits are back to 0. Finished!]

A Tree-based Barrier

Pros:
    Low shared memory contention
        No bit is shared by more than 2 processes
        Good for larger n
    Fast (in comparison to local spinning): information from the root propagates after log(n) steps
    Uses only atomic bits (no special objects)
    On some models:
        each process spins on a locally accessible bit
        # (remote memory ref.) = O(1) per process
Cons:
    Shared memory space complexity: O(n)
    Asymmetric: not all the processes do the same amount of work (*)

(*) There is a similar barrier which is symmetric, but at the cost of more shared memory consumption: O(n log n) as opposed to O(n). See the Dissemination Barrier from Section 5.6, page 213.

Section 5.7
The See-Saw Barrier

See-Saw Barrier

Now, we'll use a Read-Modify-Write object.
It allows us to construct a symmetric barrier that requires only a few shared bits.
This algorithm can also be used to solve the leader election and the consensus problems.

The See-Saw barrier is based on a solution to the wake-up problem which was proposed by M. J. Fischer, S. Moran, S. Rudich, G. Taubenfeld (1996).

Read-Modify-Write Register

Input: register r with n bits, function f(r)
Atomic operation:
    Reads the register
    Calls function f on r; the return value is written into r
    The old value of r is returned
Usually f is custom made for the algorithm.

function read-modify-write (r : register, f : function)
    orig_r := r;
    r := f(r);
    return (orig_r);
end-function
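To make the role of f concrete, here is a small Python emulation. The names `ReadModifyWriteRegister` and `rmw_examples` are hypothetical, and the lock stands in for hardware atomicity. The two choices of f show that fetch-and-increment and test-and-set can both be seen as special cases of read-modify-write.

```python
import threading

class ReadModifyWriteRegister:
    """Emulated RMW register; the lock stands in for hardware atomicity (assumption)."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def read_modify_write(self, f):
        # Read, apply f, and write back as one indivisible step.
        with self._lock:
            orig = self._value
            self._value = f(orig)
            return orig

def rmw_examples():
    """Hypothetical demo returning the old values seen by four operations."""
    counter = ReadModifyWriteRegister(0)
    first = counter.read_modify_write(lambda v: v + 1)   # acts as fetch-and-increment
    second = counter.read_modify_write(lambda v: v + 1)

    bit = ReadModifyWriteRegister(0)
    winner = bit.read_modify_write(lambda v: 1)          # acts as test-and-set
    loser = bit.read_modify_write(lambda v: 1)
    return first, second, winner, loser
```

In the See-Saw barrier, f would instead encode the rules applied to the token bit and the see-saw bit.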

Data Flow

Tokens:
    Each process starts with 2 tokens
    Total number of tokens doesn't change
    Each process can absorb one token or emit one token at a time
See-Saw:
    One see-saw
    Can be left-up-right-down OR left-down-right-up
    Each process that enters the playground needs to get on the see-saw
    Each process which is on the see-saw is either on the left side or the right side
Tokens are weightless

Data Representation

Using a 2-bit read-modify-write register

Token bit, two states:
    1. no-token-present
    2. token-present
See-saw bit, two states:
    1. left-side-down
    2. right-side-down

[Figure: processes P1 and P2 on the see-saw with their token counts; P1 goes from T: 2 to T: 1 and P2 goes from T: 2 to T: 3.]

Process State

Never been on
On left side
On right side
Got-off

[Figure: processes P1 to P7 in the four states, each labeled with its token count (T: 2 for all except P1, which has T: 0).]

Runtime Flow

Each process loops until it gets off the see-saw
    After it gets off, it waits for the go flag
The algorithm is based on 5 rules
On each loop iteration:
    According to its state, one rule is performed
    Only one process at a time performs a rule
    A rule is done atomically, using the RMW register
    Each rule changes the tokens and/or the state of the see-saw
There can be many processes on each side (up to )
When one of the processes gets 2n tokens, it gets off and sets the go flag

Rule #1 – Start of Algorithm

Applicable if:
    scheduled process is "never-been-on"
Operation:
    Saves the go bit locally
    Gets on the up side, and swings the see-saw

[Animation: P1 (T: 2) and P2 (T: 2); the RMW register shows token-state = no-token-present and a see-saw-state that switches between left-side-down and right-side-down.]

Rule #2 – Emitter

Applicable if:
    scheduled process is "down-side", has tokens, and token-state = no-token-present
Operation:
    Deposits one token in the shared token-state
    If it remains without tokens, it gets off the see-saw and swings it

[Animation: see-saw-state = left-side-down; P2 deposits a token (T: 2 becomes T: 1) and token-state changes from no-token-present to token-present; P1 has T: 2.]

Page 56: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Rule #3 – Absorber

Applicable if: the scheduled process is “up-side”, and token-state = token-present.

Operation: takes the token from token-state.

[Figure: RMW register with see-saw-state = left-side-down; token-state changes from token-present to no-token-present; P1 goes from T: 2 to T: 3; P2 (T: 1)]

Page 57: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Rule #2 – Emitter

Applicable if: the scheduled process is “down-side”, has tokens, and token-state = no-token-present.

Operation:

Deposits one token in the shared token-state

If it remains without tokens, it gets off the see-saw and swings it

The process that got off now awaits the go flag.

[Figure: token-state changes from no-token-present to token-present; P2 goes from T: 1 to T: 0 and gets off; see-saw-state swings from left-side-down to right-side-down; P1 (T: 3)]

Page 58: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Rule #4 – Leader

Applicable if: the scheduled process is on the see-saw, and sees at least 2n tokens.

Operation: gets off the see-saw, and flips the shared go bit.

[Figure: RMW register with see-saw-state = right-side-down; token-state shown as token-present and no-token-present; P1 (T: 3); P2 (T: 0) waiting (“Z Z Z…”)]

Page 59: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Rule #4 – Leader

[Figure: RMW register with see-saw-state = right-side-down and token-state = no-token-present; P1 (T: 4) signals “go!”; P2 (T: 0) waiting (“Z Z Z…”)]

Page 60: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Rule #5 – End of the Algorithm

Applicable if: the scheduled process notices that the go bit has been flipped (relative to its local.go).

Operation: everybody has arrived, so the process continues past the barrier.

[Figure: RMW register with see-saw-state = right-side-down and token-state = no-token-present; P1 (T: 4) has signaled “go!”; P2 (T: 0), previously waiting (“Z Z Z…”)]

Page 61: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Important Invariants

Token Invariant

During a single episode of the see-saw barrier, the number of tokens in the system is either 2n or 2n+1 (like in the test-and-set barrier), and never changes.

Balance Invariant

During a single episode of the see-saw barrier, the numbers of processes on the left and on the right side of the see-saw are either perfectly balanced, or favor the down side by 1.

Page 62: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Correctness

When all processes are on the see-saw:

Tokens are given from the down side, until one process gets off

By induction, at some point one process will see 2n tokens

So there is no deadlock.

2n tokens can only be accumulated if all processes have arrived, so this is a barrier.
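Why 2n tokens imply that everyone has arrived can be checked with a short count. This is a sketch under two assumptions read off the figures rather than stated in the text: every process steps on with 2 tokens, and the shared token bit contributes at most one extra token (its initial value is arbitrary). The symbols k (processes arrived so far) and t (initial value of the token bit) are introduced here for illustration only.

```latex
\[
\text{tokens in the system} = 2k + t, \qquad t \in \{0, 1\}
\]
\[
k \le n - 1 \;\Longrightarrow\; 2k + t \le 2(n - 1) + 1 = 2n - 1 < 2n
\]
```

So a process can see 2n tokens only when k = n, which agrees with the token invariant's totals of 2n or 2n+1.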


Page 63: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Remarks

All the logic is done inside the atomic Modify function of the RMW register.

It needs to read and modify all three bits atomically, to prevent race conditions.

Before a process applies a rule, it first checks whether the go bit has been flipped relative to its local.go (regardless of its current state)!
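The five rules can be exercised with a small single-threaded simulation. This is only a sketch, not the book's code: a seeded random scheduler stands in for real concurrency (each loop iteration plays the role of one atomic RMW operation), every process is assumed to step on with 2 tokens, and "sees at least 2n tokens" is assumed to mean the process's own tokens plus the token in the shared token bit, as the Rule #4 figure (T: 3 with token-present, then T: 4) suggests. The function name `seesaw_episode` and the field names are hypothetical.

```python
import random

def seesaw_episode(n, seed=0, token_present=False, left_down=True, go=False):
    """Simulate one barrier episode; return (arrived_when_go_flipped, shared_changes)."""
    rng = random.Random(seed)
    # The three shared bits of the RMW register; their initial values are arbitrary.
    token, left_is_down, go_bit = token_present, left_down, go
    procs = [{"state": "never", "side": None, "tokens": 0,
              "local_go": None, "passed": False} for _ in range(n)]
    arrived = changes = 0
    arrived_at_go = None
    while not all(p["passed"] for p in procs):
        p = rng.choice(procs)          # scheduler picks one process; its rule runs atomically
        if p["passed"]:
            continue
        if p["state"] == "never":
            # Rule 1: save go locally, get on the up side, swing the see-saw.
            p["local_go"] = go_bit
            p["side"] = "right" if left_is_down else "left"
            p["tokens"] = 2            # assumption: each process starts with 2 tokens
            p["state"] = "on"
            left_is_down = not left_is_down
            arrived += 1
            changes += 1
        elif go_bit != p["local_go"]:
            # Rule 5: go bit was flipped, continue past the barrier (any state).
            p["passed"] = True
        elif p["state"] == "on":
            down = "left" if left_is_down else "right"
            if p["tokens"] + token >= 2 * n:
                # Rule 4 (leader): take any pending token, get off, flip go.
                p["tokens"] += token
                token = False
                p["state"] = "off"
                go_bit = not go_bit
                p["passed"] = True
                arrived_at_go = arrived
                changes += 1
            elif p["side"] == down and not token:
                # Rule 2 (emitter): deposit one token; leave and swing if empty.
                token = True
                p["tokens"] -= 1
                changes += 1
                if p["tokens"] == 0:
                    p["state"] = "off"
                    left_is_down = not left_is_down
            elif p["side"] != down and token:
                # Rule 3 (absorber): take the token from the token bit.
                token = False
                p["tokens"] += 1
                changes += 1
    return arrived_at_go, changes
```

Calling `seesaw_episode(5, seed=1)` returns the number of processes that had arrived when the go bit was flipped, together with a count of rule applications that changed the shared bits; varying `seed` gives different schedules.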


Page 64: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Question

How many times does the state of the shared memory change during one episode of the see-saw barrier?

O(n) in the best case

O(n^2) in the worst case

Page 65: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

The See-Saw Barrier

Pros:

O(1) shared memory space complexity

No need to initialize shared memory

Symmetric

Cons:

Uses custom Read-Modify-Write register

High memory contention on the RMW bits

Worst case O(n^2) total shared memory references

Complex

Page 66: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

The code:


Page 67: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

A Barrier using Semaphores

Section 5.8

Page 68: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Semaphore

Shared object

Takes a non-negative integer value

Supports two operations:

Down: if value > 0, the value is decremented by 1; otherwise, the process is blocked until the value becomes > 0

Up: the value is incremented by 1

Incrementing, decrementing and testing the semaphore are executed atomically
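The definition above can be sketched in Python with a condition variable. This is an illustration only, not how any particular system implements semaphores; the class name `CountingSemaphore` and its `value()` helper are hypothetical names chosen for this example.

```python
import threading

class CountingSemaphore:
    """Minimal sketch of a general semaphore with down/up operations."""

    def __init__(self, value=0):
        if value < 0:
            raise ValueError("semaphore value must be non-negative")
        self._value = value
        # The condition's internal lock makes test + decrement/increment atomic.
        self._cond = threading.Condition()

    def down(self):
        with self._cond:
            # Block while the value is 0, then decrement by 1.
            while self._value == 0:
                self._cond.wait()
            self._value -= 1

    def up(self):
        with self._cond:
            # Increment by 1 and wake one blocked process, if any.
            self._value += 1
            self._cond.notify()

    def value(self):
        # Helper for inspection only; not part of the slide's definition.
        with self._cond:
            return self._value
```

In practice Python's own `threading.Semaphore` offers the same behaviour through `acquire()` and `release()`.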


Page 69: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Binary Semaphore

Semaphore whose value is only 0 or 1

Decrementing is identical to the general semaphore

Incrementing is equal to setting the value to 1

Initial value is assumed to be 1

Can be used to implement deadlock-free mutual exclusion:

down(S)

critical-section

up(S)
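The down / critical-section / up pattern can be tried out as follows. This is a minimal sketch: Python's `threading.Semaphore` is a counting semaphore, so it only behaves like a binary one here because every `release()` follows a matching `acquire()`. The function name `count_with_mutex` and its parameters are made up for the example.

```python
import threading

def count_with_mutex(num_threads=4, increments=10000):
    """Each thread increments a shared total inside down(S) ... up(S)."""
    S = threading.Semaphore(1)   # plays the role of the binary semaphore, initially 1
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            S.acquire()          # down(S)
            total += 1           # critical section
            S.release()          # up(S)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Because only one thread at a time is between `acquire()` and `release()`, the returned total equals `num_threads * increments`.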


Page 70: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Barrier using Semaphores: Algorithm for n processes

shared
arrival: binary semaphore, initially 1
departure: binary semaphore, initially 0
counter: atomic register ranges over {0, …, n}, initially 0

down(arrival)
counter := counter + 1 // atomic register
if counter < n then up(arrival) else up(departure) fi
down(departure)
counter := counter - 1
if counter > 0 then up(departure) else up(arrival) fi

Question: Would this barrier be correct if the shared counter were not an atomic register?
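The six-line algorithm translates almost directly into Python. The following is a sketch, assuming `threading.Semaphore` as a stand-in for the binary semaphores (its value stays within 0 and 1 here) and a plain integer attribute as the counter; the names `SemaphoreBarrier` and `run_rounds` are hypothetical.

```python
import threading

class SemaphoreBarrier:
    """Reusable barrier for n threads, following the arrival/departure scheme."""

    def __init__(self, n):
        self.n = n
        self.arrival = threading.Semaphore(1)    # initially 1
        self.departure = threading.Semaphore(0)  # initially 0
        self.counter = 0                          # ranges over 0..n

    def wait(self):
        self.arrival.acquire()                    # down(arrival)
        self.counter += 1
        if self.counter < self.n:
            self.arrival.release()                # let the next process arrive
        else:
            self.departure.release()              # last arrival opens departure
        self.departure.acquire()                  # down(departure)
        self.counter -= 1
        if self.counter > 0:
            self.departure.release()              # let the next process depart
        else:
            self.arrival.release()                # last departure reopens arrival

def run_rounds(n=4, rounds=3):
    """Each thread checks, after every barrier, that all n threads had arrived."""
    barrier = SemaphoreBarrier(n)
    arrivals = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker():
        for r in range(rounds):
            with lock:
                arrivals[r] += 1
            barrier.wait()
            with lock:
                ok.append(arrivals[r] == n)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ok
```

`run_rounds` reuses one barrier object across several rounds, which exercises the hand-back of the `arrival` semaphore by the last departing thread.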


Page 71: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Barrier using Semaphores: Properties

Pros:

Very simple

Space complexity O(1)

Symmetric

Cons:

Requires a strong object

Requires some central manager

High contention on the semaphores if no central manager

Propagation delay O(n)

Page 73: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Barriers we’ve seen

Simple barrier: based on atomic fetch-and-increment counter

Local spinning barrier: based on atomic fetch-and-increment counter and go array

Test-and-Set barriers: based on test-and-test-and-set objects; one version without memory initialization

Tree-based barrier

See-Saw barrier

Semaphore-based barrier

Page 74: 1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld

Conclusions

Many possible algorithms for Barrier Synchronization

Each has pros/cons

Different shared objects allow various algorithms

Choosing the correct barrier is application/platform dependent (need to do benchmarking to know for sure).