concurrent programming

Written by

מאיר בכור (027382977)

אביתר שרעבי (32033946)

Model

The model we are talking about is a computer with multiple processors but only one memory unit. All the processors are synchronized by the same clock, and they are all connected to each other and to the memory. If more than one processor writes the same value to the same memory address at the same time, the value is written correctly. If the values are not the same, any value can be written.


More than one processor can read the same memory address at the same time.

Other models: the processors are on different computers, there is no memory shared by all the processors, and the processors do not use the same clock.

Array Maximum Problem

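On a computer with one processor: Time: O(N). Algorithm: go over the array and keep the maximum seen so far.

On a computer with K processors: Time: O(N/K + log(K)), which is O(N/K) when K <= N/log(N). Algorithm: each processor finds the maximum of its own N/K elements of the array, and then the K partial maxima are combined into the maximum of the whole array (for example with the tree scheme described next, in O(log(K)) time).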


On a computer with O(N) processors: Time: O(log(N)). Algorithm: in the first round, N/2 processors each compare 2 items and keep the larger one, so after the first round we have N/2 numbers. In the next round N/4 processors each take 2 of these numbers and keep the larger, so only N/4 results remain after the second round. After log(N) rounds a single number remains: the maximum of the array.


Example: for 8 elements (1 2 3 4 5 6 7 8) the algorithm takes 3 = log(8) rounds.


The number of comparisons performed is 7 (4 in the first round, 2 in the second and 1 in the last). This is the same number of operations as in the serial algorithm (N - 1), but here they are done in less time.

This algorithm works for many other functions besides Max, such as Min and Sum (and Avg, computed as Sum / N). It works for every associative function.
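A minimal Python sketch of the round-by-round reduction, simulated on one machine: each pass of the `while` loop stands for one parallel round, and each `op` call inside it would run on its own processor. The name `tree_reduce` and the returned round and operation counters are illustrative additions, not part of the notes.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(items: List[T], op: Callable[[T, T], T]) -> Tuple[T, int, int]:
    """Simulate the parallel tree reduction of a non-empty list.

    Returns (result, number of rounds, number of op applications).
    op must be associative (max, min, +, ...).
    """
    if not items:
        raise ValueError("tree_reduce needs at least one item")
    values = list(items)
    rounds = operations = 0
    while len(values) > 1:
        next_values = []
        # One parallel round: each pair is combined by its own processor.
        for i in range(0, len(values) - 1, 2):
            next_values.append(op(values[i], values[i + 1]))
            operations += 1
        if len(values) % 2 == 1:
            next_values.append(values[-1])  # an unpaired value waits for the next round
        values = next_values
        rounds += 1
    return values[0], rounds, operations
```

For example, `tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], max)` returns `(8, 3, 7)`: 3 rounds and 7 comparisons, matching the 8-element example. Passing an addition function instead of `max` gives the sum with the same counts, since the scheme only relies on associativity.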

Finding The Two Greatest Numbers

Simple solution for O(N) processors. Algorithm: find the maximum, remove it from the array, and find the maximum of what remains. Time: 2 log(N).

Smart algorithm for O(N) processors. Algorithm:

First round: each processor handles 2 items, finds the max, and puts the other item into the candidate group of that max.

Rounds 2..log(N): each processor handles 2 of the results of the previous round. It compares the 2 max values and takes the larger as the new max, then takes the candidate group of the new max and adds the max of the other group to it; this is the new candidate group.

After the last round, the remaining max is the maximum of the array and the second maximum is the maximum of its candidate group.

Sample: Array: 7, 10, 1, 3, 100, 8, 55, 6.

Round 1: maxima 10, 3, 100, 55; candidate groups {7}, {1}, {8}, {6}
Round 2: maxima 10, 100; candidate groups {7, 3}, {8, 55}
Round 3: maximum 100; candidate group {8, 55, 10}

Result: the maximum is the maximum of the array (100) and the second maximum is the maximum of the candidate group (55).


Time: log(N) + loglog(N). log(N) to find the first maximum and its candidate group, and loglog(N) to find the maximum of the candidate group. The candidate group grows by 1 in each round (the maximum of the other group is added), so at the end its size is log(N), and the tree algorithm finds its maximum in loglog(N) rounds.
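The tournament with candidate groups can be sketched as follows. This is a simulation under the assumption that N is a power of two; the function name `two_largest` is hypothetical, and the final `max(candidates)` is done sequentially here although in the parallel algorithm it is another tree reduction.

```python
from typing import List, Tuple


def two_largest(items: List[int]) -> Tuple[int, int, List[int]]:
    """Tournament with candidate groups.

    Assumes len(items) is a power of two and at least 2.
    Returns (maximum, second maximum, final candidate group).
    """
    # Each entry: (max of this group, values that lost directly to that max).
    groups = [(x, []) for x in items]
    while len(groups) > 1:
        next_groups = []
        # One parallel round: each pair of groups is handled by its own processor.
        for i in range(0, len(groups), 2):
            (a, cand_a), (b, cand_b) = groups[i], groups[i + 1]
            if a >= b:
                next_groups.append((a, cand_a + [b]))  # b joins a's candidates
            else:
                next_groups.append((b, cand_b + [a]))  # a joins b's candidates
        groups = next_groups
    maximum, candidates = groups[0]
    # The candidate group has log2(N) items; in parallel, its max is a
    # tree reduction taking loglog(N) rounds.
    return maximum, max(candidates), candidates
```

On the sample array this returns `(100, 55, [8, 55, 10])`, the same candidate group as in the walkthrough above.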

Merge problem

Description: We have 2 sorted arrays B, C of size N, and we need to divide their elements into 2 new arrays A1, A2 of size N, so that the N largest items of B and C are in A1 and the N smallest are in A2.

Simple solution: merge B and C into one sorted (ascending) array A, copy the last N elements to A1 and the first N elements to A2. But this algorithm cannot make use of multiple processors, so the cost is still O(N).


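Smart algorithm for O(N) processors. Processor i compares B[i] with C[N+1-i]; the larger of the two goes to A1 and the other to A2.

Correctness proof. If B[i] > C[N+1-i], then B[i] >= B[1..i-1] and B[i] > C[N+1-i] >= C[1..N-i], so B[i] is at least as large as N other elements (i - 1 from B and N - i + 1 from C), and therefore B[i] belongs in A1.

If C[N+1-i] > B[i], then C[N+1-i] >= C[1..N-i] and C[N+1-i] > B[i] >= B[1..i-1], so C[N+1-i] is at least as large as N other elements (N - i from C and i from B), and therefore C[N+1-i] belongs in A1.

Each of the N pairs sends exactly one element to A1, so A1 receives N elements that all belong among the N largest, and the N elements sent to A2 are the N smallest.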


Example: B: 1, 8, 10, 17 and C: 9, 12, 67, 100.
Pairs: (B1, Cn), (B2, Cn-1), (B3, Cn-2), (B4, Cn-3) = (1, 100), (8, 67), (10, 12), (17, 9).
A1: 100, 67, 12, 17.
A2: 1, 8, 10, 9.

Time: We can do all the comparisons at the same time so the cost will be O(1).
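A sketch of the one-comparison-per-processor split, assuming B and C are sorted in ascending order and have the same length; the loop body is what each processor does, so all iterations are independent. `split_largest_smallest` is a hypothetical name. Note that A1 and A2 come out unsorted, as in the example.

```python
from typing import List, Tuple


def split_largest_smallest(b: List[int], c: List[int]) -> Tuple[List[int], List[int]]:
    """b and c are sorted ascending and have the same length N.

    Returns (a1, a2): the N largest and the N smallest elements of b + c.
    """
    n = len(b)
    if len(c) != n:
        raise ValueError("b and c must have the same length")
    a1, a2 = [], []
    # Processor i (1-based in the text) compares B[i] with C[N+1-i];
    # with 0-based lists that is b[i] and c[n - 1 - i].
    for i in range(n):
        x, y = b[i], c[n - 1 - i]
        a1.append(max(x, y))
        a2.append(min(x, y))
    return a1, a2
```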

Prefix Problem

Description: Compute all the prefix sums of the elements X1..Xn:
S11 = X1
S12 = X1 + X2
…
S1n = X1 + X2 + … + Xn-1 + Xn

Simple solution: compute each of the N sums separately with the N processors. Each sum takes O(log(N)) with the tree algorithm, so computing the N sums takes O(N log(N)) time.


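Algorithm:
for i = 0 to n-1 doip
    S[i] = X[i]
for j = 0 to log(n) - 1 do
    for i = 2^j to n-1 doip
        S[i] = S[i] + S[i - 2^j]

doip means "do in parallel" on the different processors. In each parallel step, all processors read the old values of S before any of them writes. At the end the results are in the array S: S[i] is the sum of X[0]..X[i].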


Example: with 8 numbers X1..X8, where Sij is Xi + Xi+1 + … + Xj:

Start:          X1   X2   X3   X4   X5   X6   X7   X8
After round 1:  S11  S12  S23  S34  S45  S56  S67  S78
After round 2:  S11  S12  S13  S14  S25  S36  S47  S58
After round 3:  S11  S12  S13  S14  S15  S16  S17  S18


Time: the number of finished results S1i doubles in every round (S11; then S11..S12; then S11..S14; then S11..S18), so after log(n) rounds we have all the results.

In order to use this algorithm each processor needs to be connected to log(n) other processors.
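A sequential simulation of the doubling prefix algorithm, with 0-based indices as in the pseudocode. The copy `old = s[:]` models the synchronous step in which every processor reads before anyone writes; updating `s` in place inside the loop would read values already changed in the same round. `parallel_prefix` is a hypothetical name, and it also returns the number of rounds.

```python
from typing import List, Tuple


def parallel_prefix(x: List[int]) -> Tuple[List[int], int]:
    """Simulate the doubling prefix-sum algorithm.

    Returns (s, rounds) where s[i] = x[0] + ... + x[i] (0-based).
    """
    n = len(x)
    s = list(x)  # first doip: S[i] = X[i]
    rounds = 0
    step = 1  # step = 2^j
    while step < n:
        old = s[:]  # synchronous step: all reads happen before any write
        for i in range(step, n):  # doip over i = 2^j .. n-1
            s[i] = old[i] + old[i - step]
        step *= 2
        rounds += 1
    return s, rounds
```

With 8 inputs the loop runs for step = 1, 2, 4, i.e. log(8) = 3 rounds, as in the example table.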


Usage example

Problem: we have an arithmetic expression and we need to test whether its bracket arrangement is legal.

Algorithm: create an array X with 1 for each "(", -1 for each ")" and 0 for any other character, and run the prefix algorithm. The arrangement is legal exactly when S1i >= 0 for every i (no prefix closes more brackets than it opened) and S1n = 0.

Time with N processors: O(log(N)): log(N) for the prefix algorithm and O(1) for the test.
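A sketch of the bracket test on top of the same doubling prefix computation (repeated here so the block stands alone). It assumes characters other than brackets contribute 0; `prefix_sums` and `brackets_legal` are hypothetical names.

```python
from typing import List


def prefix_sums(x: List[int]) -> List[int]:
    """Doubling prefix sums: same rounds as the parallel algorithm."""
    s = list(x)
    step = 1
    while step < len(s):
        old = s[:]  # every processor reads before anyone writes
        for i in range(step, len(s)):
            s[i] = old[i] + old[i - step]
        step *= 2
    return s


def brackets_legal(expr: str) -> bool:
    # +1 for "(", -1 for ")", 0 for every other character.
    x = [1 if ch == "(" else -1 if ch == ")" else 0 for ch in expr]
    s = prefix_sums(x)
    # Legal iff no prefix closes more brackets than it opened and the total is 0.
    # With N processors, each S1i >= 0 check is done by its own processor.
    return all(v >= 0 for v in s) and (len(s) == 0 or s[-1] == 0)
```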

Partition Problem

Description: We have an array X in which some of the elements are marked ("signed"). We need to move all the marked elements to one array and the unmarked elements to another array.

Simple solution: take 2 stacks, push the marked elements onto one stack and the unmarked elements onto the other. It takes O(N) time.

Simple solution 2: take two indexes, one at the start of the array and one at the end. The first searches for a marked element and the second for an unmarked one; when both have found one, they exchange the items they point to, and they move on until they meet. This also takes O(N) time, but it is more parallel.


Smart algorithm for O(N) processors: create a new array B where B[i] = 1 if element i is marked and B[i] = 0 otherwise.

Create an array C with the prefix sums of B, that is C[i] = B[1] + B[2] + … + B[i].

If X[i] is marked then Y1[C[i]] = X[i]. If X[i] is not marked then Y2[i - C[i]] = X[i].


Example:
X = 2, 4, 7, 8, 1, 3, 10, 12, 15
B = 0, 1, 0, 0, 0, 1, 1, 0, 1
C = 0, 1, 1, 1, 1, 2, 3, 3, 4
Y1 = 4, 3, 10, 15
Y2 = 2, 7, 8, 1, 12


Time with O(N) processors:
Computing B: O(1).
Computing C: O(log(n)) using the prefix algorithm.
Computing Y1 and Y2: O(1).
Total: O(log(n)).
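A minimal sketch of the partition step, assuming the signed (marked) elements are given as a list of booleans `marked`. `itertools.accumulate` stands in for the O(log(n)) prefix algorithm, and the 1-based formulas Y1[C[i]] and Y2[i - C[i]] are shifted to 0-based list indices. The function name is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def parallel_partition(x: List[int], marked: List[bool]) -> Tuple[List[int], List[int]]:
    """Split x into (marked elements, unmarked elements), keeping their order."""
    n = len(x)
    b = [1 if m else 0 for m in marked]  # B[i]: one processor per i, O(1)
    c = list(accumulate(b))              # C: prefix sums of B (stand-in for the parallel prefix)
    total = c[-1] if c else 0
    y1 = [0] * total
    y2 = [0] * (n - total)
    for i in range(n):  # every processor writes its own distinct slot, O(1)
        if marked[i]:
            y1[c[i] - 1] = x[i]  # text: Y1[C[i]] (1-based)
        else:
            y2[i - c[i]] = x[i]  # text: Y2[i - C[i]] (1-based)
    return y1, y2
```

Because C[i] counts the marked elements up to position i, no two processors write to the same slot, so this step needs no concurrent writes.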

Sorting Algorithm

Description: Sort array A using O(N^2) processors and put the result into array C.

Simple algorithm: a serial comparison-based sorting algorithm needs Ω(N log(N)) time.

Smart algorithm: create an N*N matrix B and initialize all its cells to zero. We view the N^2 processors as a matrix of processors. Processor Pi,j computes A[i] >= A[j] and, if it is true, sets B[i,j] = 1.


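For each i from 1 to N, set C[Sum(i)] = A[i], where Sum(i) = B[i,1] + … + B[i,N] is the number of elements of A that are <= A[i]. (This assumes the elements of A are distinct; otherwise equal elements would get the same position.)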

Example: A = 3, 5, 2, 9, 1

Matrix B (B[i,j] = 1 if A[i] >= A[j]):

           j=1  j=2  j=3  j=4  j=5   Sum(i)
i=1 (3)     1    0    1    0    1      3
i=2 (5)     1    1    1    0    1      4
i=3 (2)     0    0    1    0    1      2
i=4 (9)     1    1    1    1    1      5
i=5 (1)     0    0    0    0    1      1


C = 1, 2, 3, 5, 9.

Time: using O(N^2) processors, finding the matrix B takes O(1), and finding C takes O(log(N)) (each row sum is computed by N processors with the tree algorithm). So the total time of the algorithm is O(log(N)).

Using O(N) processors, finding B takes O(N) time and finding C takes O(N) time, so the total is O(N).
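A sketch of this counting sort, simulated sequentially and assuming distinct elements: each cell of `b` is what one processor Pi,j computes, and each row sum is what the tree algorithm would compute in O(log(N)). The function name is hypothetical.

```python
from typing import List


def matrix_rank_sort(a: List[int]) -> List[int]:
    """Sort by counting; assumes the elements of a are distinct."""
    n = len(a)
    # Processor P[i][j] fills one cell: B[i][j] = 1 if A[i] >= A[j] (O(1) in parallel).
    b = [[1 if a[i] >= a[j] else 0 for j in range(n)] for i in range(n)]
    c = [0] * n
    for i in range(n):
        rank = sum(b[i])    # Sum(i): a row sum, O(log N) with N processors
        c[rank - 1] = a[i]  # C[Sum(i)] = A[i], shifted to 0-based
    return c
```

With duplicates, two elements would get the same `rank` and one write would overwrite the other, which is why distinct elements are assumed.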

Sorting Algorithm

Description: Sort array A using O(N) processors and put the result into array C.

Algorithm: merge sort. The most expensive part of merge sort is the merge. With a serial algorithm, merging 2 sorted arrays costs O(N), and the whole merge sort costs O(N log(N)). We will use the regular merge sort but with a smarter merge algorithm.


Smart merge algorithm

Description: we need to merge two sorted arrays A, B of the same length into a sorted array R.

Algorithm: we describe a recursive algorithm Merge: C = Merge(odd(A), even(B)) and D = Merge(even(A), odd(B)), where odd(A) is all the items in A with an odd index and even(A) is all the items in A with an even index (indices start at 1). When A and B have a single element each, Merge is a single comparison.


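If A and B each have n elements, then C = C0, C1, …, Cn-1 and D = D0, D1, …, Dn-1. Interleave them into E = C0, D0, C1, D1, …, Cn-1, Dn-1, then compare each pair Ci, Di (all pairs in parallel) and, if Ci > Di, swap them in E. The resulting array E is the merge of A and B.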

Example: A = 3, 5, 8, 10 and B = 4, 7, 9, 12.
Odd(A) = 3, 8    Even(A) = 5, 10
Odd(B) = 4, 9    Even(B) = 7, 12
C = 3, 7, 8, 12
D = 4, 5, 9, 10
E = 3, 4, 7, 5, 8, 9, 12, 10
After swapping in E: E = 3, 4, 5, 7, 8, 9, 10, 12

Time: using O(N) processors, the merge takes O(log(N)) time: the recursion has log(N) levels, and each level does all of its comparisons in parallel in O(1). Merge sort runs the merge algorithm on log(N) levels, so the total time of the merge sort is O(log^2(N)).
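The odd/even merge and the merge sort built on it can be sketched as below, assuming the array lengths are powers of two. Following the worked example, C merges odd(A) with even(B) and D merges even(A) with odd(B); swapping which one is called C and which D does not change the result, since each pair Ci, Di is compare-swapped anyway. The function names are hypothetical, and the two recursive calls that would run in parallel are made one after the other.

```python
from typing import List


def odd_items(xs: List[int]) -> List[int]:
    return xs[0::2]  # 1-based odd positions are 0-based slots 0, 2, 4, ...


def even_items(xs: List[int]) -> List[int]:
    return xs[1::2]  # 1-based even positions are 0-based slots 1, 3, 5, ...


def odd_even_merge(a: List[int], b: List[int]) -> List[int]:
    """Merge sorted a and b; assumes len(a) == len(b) is a power of two."""
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(odd_items(a), even_items(b))  # these two merges
    d = odd_even_merge(even_items(a), odd_items(b))  # run in parallel
    e: List[int] = []
    for ci, di in zip(c, d):  # E = C0, D0, C1, D1, ...; one processor per pair
        e.extend((min(ci, di), max(ci, di)))  # swap when Ci > Di
    return e


def parallel_merge_sort(xs: List[int]) -> List[int]:
    """Merge sort using odd_even_merge; assumes len(xs) is a power of two."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return odd_even_merge(parallel_merge_sort(xs[:mid]), parallel_merge_sort(xs[mid:]))
```

`odd_even_merge([3, 5, 8, 10], [4, 7, 9, 12])` reproduces the example: C = [3, 7, 8, 12], D = [4, 5, 9, 10], and the final result is [3, 4, 5, 7, 8, 9, 10, 12].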

Find Algorithm

Description: If array X contains the value Val, Res must be True; otherwise Res must be False.

Simple Algorithm: a serial algorithm takes O(N) time.

Smart Algorithm: using O(N) processors. Set Res = False. Each processor i tests whether X[i] = Val, and if so it sets Res = True. Several processors may write to Res at the same time, which is allowed in this model because they all write the same value.

Time: O(1).
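A small simulation of one clock cycle of Find, including the write rule of the model. The helper `resolve_concurrent_write` is hypothetical; for conflicting writes it assumes the cell ends up holding one of the written values, chosen at random, which is one possible reading of "any value can be written". In Find all writers write True, so that case never arises.

```python
import random
from typing import Any, List


def resolve_concurrent_write(old: Any, writes: List[Any]) -> Any:
    """Value of one memory cell after one clock cycle of simultaneous writes."""
    if not writes:
        return old
    if all(w == writes[0] for w in writes):
        return writes[0]  # everyone wrote the same value: written correctly
    # Assumption of this sketch: one of the conflicting values wins, arbitrarily.
    return random.choice(writes)


def parallel_find(x: List[Any], val: Any) -> bool:
    res = False
    # One clock cycle: processor i compares X[i] with Val, and every
    # processor that found a match writes True into Res at the same time.
    writes = [True for xi in x if xi == val]
    return bool(resolve_concurrent_write(res, writes))
```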

Model Description

Many processors. The processors can send messages to each other over communication channels. We want each processor to have a unique identification (Id). Since we have O(n) processors, we need O(log(n)) bits to represent an Id.


Clean Net: a processor doesn't know anything about its neighbors, not even their Ids; it only knows how many neighbors it has.

We will explicitly mention when we deal with a Clean Net; otherwise every processor has a unique Id.


A message includes the sender and receiver Ids and some information, O(log(n)) bits in total.

If X wants to send a message to Y through Z (X → Z → Y), sending the message takes 2 steps.


Local computation doesn’t take time.

We will analyze:
- time complexity: the number of steps the algorithm takes in the worst case.
- communication complexity: the total number of messages sent during the execution of the algorithm, in the worst case.

Distributed vs. Sequential

Communication: needed in the distributed model but not in the sequential one.

Partial knowledge: together the processors know everything, but no single processor necessarily knows everything.

Processors or communication channels may be down.


Synchronization: the processors need to be synchronized.

Synchronous Model

There is a global clock. In each clock cycle each processor:
- sends messages to its neighbors.
- receives messages from its neighbors.
- performs local computation in 0 time.
- changes state.

Asynchronous Model

There is no global clock. A message that was sent eventually arrives at its destination (assuming no failures), but we can't assume anything about the arrival time.

We measure time from the beginning of the execution until the last processor stops.


For time complexity calculations, we assume that every message arrives within one time unit in the worst case.

Model Representation

We can represent the processors net with a graph.

Each node in the graph is a processor.

There is an edge between two nodes if there is a direct communication channel between the two processors they represent.

Complexity

C(π, G, I) - communication complexity: the total number of messages sent in the execution, in the worst case.

T(π, G, I) - time complexity: the number of clock cycles the execution takes, in the worst case.

Here π is the protocol, G is the graph and I is the input.

Complexity - examples

The following examples are on a complete graph with nodes 1, 2, …, n.

Complexity - example 1

Protocol A: node 1 sends the message m to node 2.

C(A, G, I) = 1. T(A, G, I) = 1.


Protocol B: node 1 sends the message mi to node i, for every node i in G.

C(B, G, I) = n. T(B, G, I) = 1.


Protocol C: every node i in G sends the message mi to node i+1.

C(C, G, I) = n. T(C, G, I) = 1.


Protocol D: node i sends the message m to node i+1 in cycle i (node 1 sends m to node 2 in cycle 1, node 2 sends it to node 3 in cycle 2, and so on).

C(D, G, I) = n. T(D, G, I) = n.

Transmission Problem

Input: there is a message m in the node V0.

Output: the message m is written in all the nodes in the graph.

dG(x,y) - the length of the shortest path from x to y in graph G.

D = Diameter(G) = max{ dG(x,y) : x, y ∈ V }.

Algorithms for the Transmission Problem: Direct Delivery, Spanning Tree, DFS, Flooding.

Direct Delivery

Based on the assumptions:
- there is a routing system such that messages are sent along shortest paths.
- V0 knows the addresses of all other nodes in the graph.

V0 sends the message m n-1 times, each time to a different node.

DD Communication Complexity

V0 sends n-1 messages, and each of them takes O(D) steps (one message per edge it crosses).

C(DD, G, I) = O(n*D).
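A sketch for checking the communication count on small graphs, with the graph given as adjacency lists (an assumed representation). BFS gives dG(V0, v); with shortest-path routing, the copy for v crosses dG(V0, v) edges, so the total number of messages is the sum of these distances, which is at most (n-1)·D. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists (assumed format)


def bfs_distances(graph: Graph, src: int) -> Dict[int, int]:
    """dG(src, v) for every node v reachable from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(graph: Graph) -> int:
    """D = max of dG(x, y) over all pairs; assumes the graph is connected."""
    return max(max(bfs_distances(graph, v).values()) for v in graph)


def direct_delivery_messages(graph: Graph, v0: int) -> int:
    """Messages used by Direct Delivery: each hop of each copy is one message."""
    # dG(v0, v0) = 0, so summing over all nodes counts only the n-1 real copies.
    return sum(bfs_distances(graph, v0).values())
```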

DD Time Complexity

Under the assumptions:
1. synchronous model.
2. V0 sends one new message in each clock cycle.

There won't be collisions between messages: messages go along shortest paths and V0 starts one per cycle, so at any moment no two messages are at the same distance from V0.


The last message is sent in cycle n-1.

It takes O(D) steps for the last message to arrive.

T(DD, G, I) = O(n + D).


We can show the same time complexity even without assumption 2.

If two messages at a node compete for the same edge, we send the one addressed to the node with the smaller Id.

Then at time t, the message for node i is at distance at least t - i + 1 from V0 (or has already reached Vi), so it arrives within O(i + D) steps.

Spanning Tree

Assumptions: we have a spanning tree of the graph that all nodes are aware of (each node knows which of its edges are part of the spanning tree).

Each node that receives the message sends it on its spanning tree edges (except the edge on which it arrived).

Spanning Tree Complexity

We send the message once over each spanning tree edge: C(ST) = n-1.

We need as many rounds as the depth of the tree (rooted at V0) until the last node receives the message: T(ST) = O(Depth(tree, V0)).

If we choose a BFS tree rooted at V0: T(ST) = O(D).
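A synchronous simulation of broadcasting over the spanning tree, assuming the tree is given as adjacency lists of tree edges only. It counts one message per tree edge and one round per level of the tree rooted at V0. `tree_broadcast` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]


def tree_broadcast(tree: Graph, v0: int) -> Tuple[int, int]:
    """Synchronous broadcast over spanning-tree edges.

    tree holds only the spanning-tree edges (each node knows its own tree edges).
    Returns (messages sent, rounds until the last node has m).
    """
    got_from: Dict[int, Optional[int]] = {v0: None}
    frontier = [v0]
    messages = rounds = 0
    while frontier:
        newly: List[int] = []
        for v in frontier:
            for w in tree[v]:
                if w == got_from[v]:
                    continue  # never send m back over the edge it arrived on
                messages += 1
                got_from[w] = v
                newly.append(w)
        if newly:
            rounds += 1
        frontier = newly
    return messages, rounds
```

On a path of 4 nodes this gives 3 messages from either end, with 3 rounds from an end node but only 2 from an inner node, since the depth of the tree depends on where V0 is.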

Building a Spanning Tree

If we don't have a spanning tree, we can build one using any algorithm A for the Transmission Problem.

Execute algorithm A. Each node V chooses as its parent the node W from which it received the message for the first time.


V informs W that W is its parent, and the edge E(W,V) is marked as a spanning tree edge. Since the transmission algorithm delivers the message to all nodes, all the nodes are in the spanning tree.

There are no cycles: each node V chooses only one parent, and its parent received the message before V did, so following parent links always leads back to V0.
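A sketch of building the tree with flooding as the transmission algorithm A, simulated round by round. When several copies reach a node in the same round, the sketch keeps the first one it processes; that stands in for an arbitrary choice. The message in which V informs W is not simulated. `build_spanning_tree` is a hypothetical name.

```python
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]


def build_spanning_tree(graph: Graph, v0: int) -> Dict[int, Optional[int]]:
    """Run flooding from v0 and record parent[v] = node v first heard m from."""
    parent: Dict[int, Optional[int]] = {v0: None}
    frontier = [v0]
    while frontier:
        newly: List[int] = []
        for w in frontier:
            for v in graph[w]:
                if v not in parent:  # first time v receives m
                    parent[v] = w    # v picks w as its parent (and would tell w)
                    newly.append(v)
        frontier = newly
    return parent
```

The parent links give n - 1 tree edges, and since a parent always received the message in an earlier round, following them from any node ends at V0.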

DFS

We traverse the graph in DFS order.

When the traversal reaches a new node, we leave a copy of the message there, mark the node and continue the traversal.

When the traversal reaches a marked node, we go back.

DFS Complexity

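In the DFS algorithm we move over each edge exactly twice, once in each direction (a node that sent the token back over an edge remembers not to use that edge again).

C(DFS) = T(DFS) = O(E).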
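A sketch of the token-based DFS transmission, assuming each node remembers the edges the token already crossed so that no edge is used twice; with that bookkeeping the token makes exactly 2E moves. The function name is hypothetical, and the recursion depth limits it to small graphs.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, List[int]]


def dfs_transmission(graph: Graph, v0: int) -> Tuple[List[int], int]:
    """Move a token carrying m in DFS order.

    Returns (order in which nodes got m, number of token moves).
    """
    marked = {v0}
    used: Set[FrozenSet[int]] = set()  # edges the token already crossed there and back
    order = [v0]
    moves = 0

    def visit(v: int) -> None:
        nonlocal moves
        for w in graph[v]:
            edge = frozenset((v, w))
            if edge in used:
                continue
            used.add(edge)
            moves += 2  # token goes v -> w and later comes back w -> v
            if w not in marked:
                marked.add(w)  # new node: leave a copy of m and mark it
                order.append(w)
                visit(w)       # continue the traversal from w
            # marked node: w sends the token straight back to v

    visit(v0)
    return order, moves
```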

Flooding

Each node that receives the message for the first time sends it to all of its neighbors.

When a node receives the message again, it just discards it.

Flooding also works in a Clean Net.

Flooding Complexity

The message passes over each edge twice, once in each direction.

C(Flood) = O(E).

After t time units the message has reached all the nodes whose distance from V0 is at most t.

T(Flood) = O(D).
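A synchronous simulation of flooding on an adjacency-list graph (an assumed representation). Every node forwards the message once to all of its neighbors, so the total is the sum of the degrees, 2E messages, and the round in which the last node first receives the message is its distance from V0, at most D. `flood` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]


def flood(graph: Graph, v0: int) -> Tuple[Dict[int, int], int]:
    """Returns (round in which each node first got m, total messages sent)."""
    first_round = {v0: 0}
    frontier = [v0]  # nodes that got m for the first time in the previous round
    messages = 0
    t = 0
    while frontier:
        t += 1
        newly: List[int] = []
        for v in frontier:
            for w in graph[v]:  # first receipt: forward to *all* neighbors
                messages += 1
                if w not in first_round:
                    first_round[w] = t
                    newly.append(w)
                # otherwise w just discards this copy
        frontier = newly
    return first_round, messages
```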
