Concurrent Programming


Upload: kinsey

Post on 15-Jan-2016


Page 1: Concurrent  Programming

Written by

מאיר בכור 027382977

אביתר שרעבי 32033946

Page 2: Concurrent  Programming

Model

The model we discuss is a computer with multiple processors but only one memory unit.

All the processors are synchronized by the same clock.

The processors are all connected to each other and to the memory.

If more than one processor writes the same value to the same memory address at the same time, the value is written correctly. If the values differ, any one of them may be written.

Page 3: Concurrent  Programming

Model

More than one processor can read the same memory address at the same time.

Other models:
- The processors are on different computers.
- There is no shared memory for all the processors.
- The processors do not use the same clock.

Page 4: Concurrent  Programming

Array Maximum Problem

On a computer with one processor:
Time: O(N).
Algorithm: Go over the array and keep the maximum seen so far.

On a computer with K processors:
Time: O(N/K) for the local scans, plus the time to combine the K partial results.
Algorithm: Each processor finds the maximum of its own N/K elements, and the K partial maxima are then combined into the overall maximum.

Page 5: Concurrent  Programming

Array Maximum Problem

On a computer with O(N) processors:
Time: O(log(N)).
Algorithm: In the first round, each of N/2 processors compares 2 items and keeps the larger, so after the first round we have N/2 numbers. In the next round, N/4 processors each take 2 of those numbers and keep the larger, so only N/4 results remain after the second round. After log(N) rounds we have the maximum of the array.
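The pairwise rounds described above can be simulated sequentially. The following is a minimal sketch, not the slides' own code: the function name `tree_reduce` and the `op` parameter are assumptions made for illustration, and each list comprehension stands in for one parallel round.

```python
from typing import Callable, List, Tuple


def tree_reduce(values: List[int],
                op: Callable[[int, int], int] = max) -> Tuple[int, int]:
    """Simulate the pairwise reduction; return (result, number_of_rounds)."""
    if not values:
        raise ValueError("values must be non-empty")
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Every pair below would be handled by a different processor
        # during the same parallel round.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired element moves up unchanged
        level = nxt
        rounds += 1
    return level[0], rounds


print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # (8, 3)
```

Passing a different associative `op` (for example addition) reuses the same rounds for Sum or Min.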

Page 6: Concurrent  Programming

Array Maximum Problem

Example: 8 elements (1 2 3 4 5 6 7 8), arranged as the leaves of a comparison tree. Time: 3 = log(8) rounds.

Page 7: Concurrent  Programming

Array Maximum Problem

The number of computations performed is 7 (4 in the first round, 2 in the second and 1 in the last). This is the same number of computations as in the serial algorithm, but they are done in less time.

This algorithm works for many other functions besides Max, such as Min and Sum (and Avg, computed as Sum divided by N). It works for every associative function.

Page 8: Concurrent  Programming

Finding The Two Greatest Numbers

Simple solution for O(N) processors.
Algorithm: Find the maximum, remove it from the array, and find the maximum again.
Time: 2 log(N).

Smart algorithm for O(N) processors.
Algorithm:

First round: each processor handles 2 items, finds the max, and puts the other item in a candidate group attached to that max.

Rounds 2..log(N): each processor handles 2 of the results of the previous round. It compares the 2 max values and takes the larger as the new max. It then takes the candidate group of the new max and adds the max of the other group to it; this is the new candidate group.

Page 9: Concurrent  Programming

Finding The Two Greatest Numbers

After the last round, the surviving max is the maximum of the array, and the second maximum is the maximum of its candidate group.

Sample array: 7, 10, 1, 3, 100, 8, 55, 6.

Page 10: Concurrent  Programming

Finding The Two Greatest Numbers

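Array: 7 10 1 3 100 8 55 6

Round 1: maxima 10, 3, 100, 55; candidate groups {7}, {1}, {8}, {6}

Round 2: maxima 10, 100; candidate groups {7, 3}, {8, 55}

Round 3: maximum 100; candidate group {8, 55, 10}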

Results: The maximum is the maximum of the array (100) and the second maximum is the maximum of the candidate group (55).
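The candidate-group tournament can be sketched as follows. This is an illustrative sequential simulation only; the name `two_greatest` and the list-of-tuples representation are assumptions, and each pass of the inner loop stands in for one parallel round.

```python
from typing import List, Tuple


def two_greatest(values: List[int]) -> Tuple[int, int]:
    """Return (maximum, second maximum) using candidate groups."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Each entry is (current max, values that lost directly to that max).
    level = [(v, []) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):  # one processor per pair
            (m1, c1), (m2, c2) = level[i], level[i + 1]
            if m1 >= m2:
                nxt.append((m1, c1 + [m2]))  # loser joins the winner's group
            else:
                nxt.append((m2, c2 + [m1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired entry moves up unchanged
        level = nxt
    maximum, candidates = level[0]
    # The slides find this with another parallel maximum over log(N) items.
    return maximum, max(candidates)


print(two_greatest([7, 10, 1, 3, 100, 8, 55, 6]))  # (100, 55)
```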

Page 11: Concurrent  Programming

Finding The Two Greatest Numbers

Time: log(N) + log(log(N)).
- log(N) to find the first maximum and the candidate group.
- log(log(N)) to find the maximum in the candidate group.

The candidate group grows by 1 in each round (the maximum of the other group is added), so at the end its size is log(N).

Page 12: Concurrent  Programming

Merge problem

Description: We have 2 sorted arrays B and C, each of size N, and we need to divide their elements into 2 new arrays A1 and A2 of size N, so that the N largest items from B and C together are in A1 and the N smallest are in A2.

Simple solution: Merge B and C into one sorted array A, then copy the N largest elements to A1 and the N smallest elements to A2. The serial merge does not benefit from multiple processors, so the cost is still O(N).

Page 13: Concurrent  Programming

Merge problem

Smart algorithm for O(N) processors.
Processor i compares B[i] with C[n+1-i]; the larger of the two goes to A1 and the other to A2.

Correctness proof.
If B[i] > C[n+1-i], then B[i] >= B[1..i-1] and C[n+1-i] >= C[1..n-i], so B[i] is at least as large as N other elements (i-1 from B and n-i+1 from C) and belongs in A1. C[n+1-i] is then no larger than N other elements (n-i+1 from B and i-1 from C) and belongs in A2.

If C[n+1-i] > B[i], then C[n+1-i] is at least as large as N other elements (n-i from C and i from B), so C[n+1-i] belongs in A1 and, symmetrically, B[i] belongs in A2.

Page 14: Concurrent  Programming

Merge problem

Example:
B: 1, 8, 10, 17
C: 9, 12, 67, 100
Pairs compared: (B1, Cn), (B2, Cn-1), (B3, Cn-2), (B4, Cn-3).
A1: 100, 67, 12, 17.
A2: 1, 8, 10, 9.

Time: We can do all the comparisons at the same time so the cost will be O(1).
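A minimal sketch of the one-step split, assuming both inputs are sorted in ascending order and have the same length. The function name `split_largest_smallest` is hypothetical, and indexes here are 0-based, so the partner of `b[i]` is `c[n-1-i]`.

```python
from typing import List, Tuple


def split_largest_smallest(b: List[int], c: List[int]) -> Tuple[List[int], List[int]]:
    """Return (a1, a2): the n largest and the n smallest items of b and c."""
    n = len(b)
    if len(c) != n:
        raise ValueError("b and c must have the same length")
    a1 = [0] * n
    a2 = [0] * n
    # Each iteration is independent, so with n processors all of them
    # could run in the same parallel step.
    for i in range(n):
        x, y = b[i], c[n - 1 - i]
        a1[i], a2[i] = max(x, y), min(x, y)
    return a1, a2


print(split_largest_smallest([1, 8, 10, 17], [9, 12, 67, 100]))
# ([100, 67, 12, 17], [1, 8, 10, 9])
```

Note that A1 and A2 come out unsorted; only the division into larger and smaller halves is guaranteed.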

Page 15: Concurrent  Programming

Prefix Problem

Description: Find the sums of all the prefixes of the sequence X1..Xn.
S11 = X1
S12 = X1 + X2
…
S1n = X1 + X2 + … + Xn-1 + Xn

Simple solution: Compute the N sums one after another with N processors. Time: O(N log N), since there are N sums and each one takes O(log N).

Page 16: Concurrent  Programming

Prefix Problem

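Algorithm:
for i = 0 to n-1 doip
    S[i] = X[i]
for j = 0 to log(n)-1 do
    for i = 2^j to n-1 doip
        S[i] = S[i] + S[i-2^j]

"doip" means "do in parallel" on the different processors; in each round all processors read the old values of S before any of them writes.
At the end the results are in the array S.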
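The doubling rounds of the prefix algorithm can be simulated sequentially as below. This is a sketch under one stated assumption: a snapshot copy of the array models all processors reading before any of them writes. The name `parallel_prefix_sums` is made up for the example.

```python
from typing import List, Tuple


def parallel_prefix_sums(x: List[int]) -> Tuple[List[int], int]:
    """Return (prefix sums of x, number of parallel rounds used)."""
    s = list(x)
    n = len(s)
    step = 1  # plays the role of 2^j
    rounds = 0
    while step < n:
        prev = list(s)  # snapshot: every processor reads the old values
        for i in range(step, n):  # these updates are the "doip" loop
            s[i] = prev[i] + prev[i - step]
        step *= 2
        rounds += 1
    return s, rounds


print(parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```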

Page 17: Concurrent  Programming

Prefix Problem

Example: with 8 numbers X1..X8, where Sij denotes Xi + Xi+1 + … + Xj. Each row below shows the array S after one more round.

X1 X2 X3 X4 X5 X6 X7 X8

S11 S12 S23 S34 S45 S56 S67 S78

S11 S12 S13 S14 S25 S36 S47 S58

S11 S12 S13 S14 S15 S16 S17 S18

Page 18: Concurrent  Programming

Prefix Problem

Time: each round doubles the number of finished results S1i, so after log(n) rounds we have all the results.

In order to use this algorithm each processor needs to be connected to log(n) other processors.

Page 19: Concurrent  Programming

Prefix Problem

Usage example
Problem: we have an arithmetic expression and we need to test whether its brackets are arranged legally.
Algorithm: create an array X with 1 for each "(" and -1 for each ")", and run the prefix algorithm on it. The brackets are legal if and only if S11 = 1, S11..S1n-1 >= 0 and S1n = 0.
Time with N processors: O(log N), which is O(log N) for the prefix algorithm and O(1) for the test.
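As a rough illustration of the bracket test, the sketch below uses `itertools.accumulate` as a sequential stand-in for the parallel prefix algorithm. The function name `brackets_balanced` is an assumption, and characters other than brackets are simply ignored.

```python
from itertools import accumulate


def brackets_balanced(expr: str) -> bool:
    """Check bracket legality using prefix sums of +1 / -1."""
    x = [1 if ch == "(" else -1 for ch in expr if ch in "()"]
    if not x:
        return True  # no brackets at all is trivially legal
    s = list(accumulate(x))  # stand-in for the parallel prefix step
    # No prefix may go negative, and the total must be zero.
    return min(s) >= 0 and s[-1] == 0


print(brackets_balanced("(a+b)*(c-(d))"))  # True
print(brackets_balanced("(a+b))*(c"))      # False
```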

Page 20: Concurrent  Programming

Partition Problem

Description: We have an array X in which some of the elements are marked ("signed"). We need to move all the marked elements to one array and the unmarked elements to another array.

Simple solution: Take 2 stacks, push the marked elements onto one stack and the unmarked elements onto the other. It takes O(N) time.

Simple solution 2: Take two indexes, one at the start of the array and one at the end. The first searches for a marked element and the second for an unmarked one; when both have found one they exchange the items they point to and move on, until they meet. This takes O(N) time too, but the two searches can run in parallel.

Page 21: Concurrent  Programming

Partition Problem

Smart algorithm for O(N) processors:
Create a new array B: if element i is marked (signed) then B[i] = 1, else B[i] = 0.

Create an array C with the prefix sums of B, that is C[i] = B[1] + B[2] + … + B[i].

If X[i] is marked then Y1[C[i]] = X[i]. If X[i] is not marked then Y2[i-C[i]] = X[i].

Page 22: Concurrent  Programming

Partition Problem

Example:
X = 2, 4, 7, 8, 1, 3, 10, 12, 15
B = 0, 1, 0, 0, 0, 1, 1, 0, 1
C = 0, 1, 1, 1, 1, 2, 3, 3, 4
Y1 = 4, 3, 10, 15
Y2 = 2, 7, 8, 1, 12
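The partition steps can be sketched as follows, assuming the marks are given as a separate 0/1 list (the slides do not say how marks are stored). `partition_marked` is a hypothetical name, `itertools.accumulate` stands in for the parallel prefix algorithm, and the index arithmetic is shifted to 0-based.

```python
from itertools import accumulate
from typing import List, Tuple


def partition_marked(x: List[int], b: List[int]) -> Tuple[List[int], List[int]]:
    """Split x into (marked, unmarked), keeping the original order."""
    c = list(accumulate(b))  # C[i] = number of marked items up to i
    total = c[-1] if c else 0
    y1 = [0] * total
    y2 = [0] * (len(x) - total)
    # Every iteration writes to a distinct cell, so with one processor
    # per element this loop is a single parallel step.
    for i in range(len(x)):
        if b[i]:
            y1[c[i] - 1] = x[i]   # 1-based Y1[C[i]]
        else:
            y2[i - c[i]] = x[i]   # 1-based Y2[i - C[i]]
    return y1, y2


x = [2, 4, 7, 8, 1, 3, 10, 12, 15]
b = [0, 1, 0, 0, 0, 1, 1, 0, 1]
print(partition_marked(x, b))  # ([4, 3, 10, 15], [2, 7, 8, 1, 12])
```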

Page 23: Concurrent  Programming

Partition Problem

Time with O(N) processors:
Computing B: O(1).
Computing C: O(log(n)) using the prefix algorithm.
Computing Y1 and Y2: O(1).
Total: O(log(n)).

Page 24: Concurrent  Programming

Sorting Algorithm

Description: Sort array A using O(N^2) processors and put the result into array C.

Simple algorithm: A serial comparison-based algorithm for sorting an array takes at least on the order of N log(N) time.

Smart algorithm:
Create a matrix B of size N*N and initialize all its cells with zeroes.
We look at the N^2 processors as a matrix of processors.
Processor P[i,j] tests whether A[i] >= A[j]; if true then B[i,j] = 1.

Page 25: Concurrent  Programming

Sorting Algorithm

For each i from 1 to N, C[Sum(i)] = A[i], where Sum(i) is the sum of B[i,1] to B[i,N]. (This places every element correctly when the elements of A are distinct.)

Example: A = 3, 5, 2, 9, 1

Matrix B:
     1  2  3  4  5
1    1  0  1  0  1
2    1  1  1  0  1
3    0  0  1  0  1
4    1  1  1  1  1
5    0  0  0  0  1

Page 26: Concurrent  Programming

Sorting Algorithm

C = 1, 2, 3, 5, 9.

Time:
Using O(N^2) processors, finding the matrix B takes O(1) and finding C costs O(log(N)), so the total cost of the algorithm is O(log(N)).
Using O(N) processors, finding B takes O(N) time and finding C takes O(N) time, so the total is O(N).
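The comparison-matrix sort can be sketched sequentially as below, assuming the elements of A are distinct (with duplicates, two elements would get the same row sum). The name `rank_sort` is an assumption; the nested comprehension plays the role of the N^2 processors.

```python
from typing import List, Tuple


def rank_sort(a: List[int]) -> Tuple[List[List[int]], List[int]]:
    """Return (matrix B, sorted array C) for an array of distinct values."""
    n = len(a)
    # B[i][j] = 1 when a[i] >= a[j]; one processor per cell in the slides.
    b = [[1 if a[i] >= a[j] else 0 for j in range(n)] for i in range(n)]
    c = [0] * n
    for i in range(n):
        rank = sum(b[i])      # how many elements a[i] is >= to (itself included)
        c[rank - 1] = a[i]    # 1-based C[Sum(i)] becomes 0-based here
    return b, c


b, c = rank_sort([3, 5, 2, 9, 1])
print(c)  # [1, 2, 3, 5, 9]
```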

Page 27: Concurrent  Programming

Sorting Algorithm

Description: Sorting array A using O(N^2) processors and put the result into array C.

Algorithm: Merge sort. The largest cost in the merge sort algorithm is the cost of the merge. Using a serial algorithm, the cost of merging 2 sorted arrays is O(N) and the cost of the merge sort algorithm is O(N log(N)). We will use the regular algorithm but with a smarter merge algorithm.

Page 28: Concurrent  Programming

Sorting Algorithm

Smart merge algorithm
Description: We need to merge two sorted arrays A and B into a sorted array R.
Algorithm: We describe a recursive algorithm Merge.
C = merge(odd(A), even(B)).
D = merge(even(A), odd(B)).
Here odd(A) is all the items in A with an odd index, and even(A) is all the items in A with an even index (indexes start at 1).

Page 29: Concurrent  Programming

Sorting Algorithm

Let C = C0, C1, C2, …, Cn and D = D0, D1, D2, …, Dn.
E = C0, D0, C1, D1, …, Cn, Dn.
Compare each pair Ci, Di, and if Ci > Di then swap Ci and Di in array E.
Array E is then the merge of C and D, which is the sorted merge of A and B.

Page 30: Concurrent  Programming

Sorting Algorithm

Example:
A = 3, 5, 8, 10
B = 4, 7, 9, 12
Even(A) = 5, 10    Odd(A) = 3, 8
Even(B) = 7, 12    Odd(B) = 4, 9
C = 3, 7, 8, 12
D = 4, 5, 9, 10
E = 3, 4, 7, 5, 8, 9, 12, 10
After swapping in E:
E = 3, 4, 5, 7, 8, 9, 10, 12

Time: Using O(N) processors the merge takes O(log(N)) time. The merge sort has log(N) levels of merging, so the total cost of the merge sort is O(log^2(N)).
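A sketch of the recursive merge, following the labeling of the worked example (C built from the odd positions of A and the even positions of B, D from the other halves). It assumes both arrays are sorted and share a length that is a power of two; the name `smart_merge` is hypothetical, and the recursion and the final loop run sequentially here although both are parallel in the slides.

```python
from typing import List


def smart_merge(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted arrays of equal power-of-two length."""
    n = len(a)
    if len(b) != n or n == 0 or n & (n - 1):
        raise ValueError("need equal lengths that are a power of two")
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    # 1-based odd positions are the 0-based even indexes.
    odd_a, even_a = a[0::2], a[1::2]
    odd_b, even_b = b[0::2], b[1::2]
    c = smart_merge(odd_a, even_b)  # the two sub-merges are independent
    d = smart_merge(even_a, odd_b)
    e = []
    for ci, di in zip(c, d):        # one compare-and-swap per pair
        e.extend((ci, di) if ci <= di else (di, ci))
    return e


print(smart_merge([3, 5, 8, 10], [4, 7, 9, 12]))
# [3, 4, 5, 7, 8, 9, 10, 12]
```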

Page 31: Concurrent  Programming

Find Algorithm

Description: If array X contains the value Val then Res needs to be True, else Res needs to be False.

Simple Algorithm: Using a serial algorithm it takes O(N) time.

Smart Algorithm: Using O(N) processors. Res = False. Each processor i tests whether X[i] = Val; if true it sets Res = True. Several processors may write True at the same time, which the model allows because they all write the same value.

Time: O(1).

Page 32: Concurrent  Programming

Model Description

- Many processors.
- Processors can send messages to each other through communication channels.
- We want each processor to have a unique identification (Id).
- Since we have O(n) processors, we need O(log n) bits to represent an Id.

Page 33: Concurrent  Programming

Model Description

Clean Net: a network in which a processor doesn’t know anything about its neighbors, not even their Ids. It only knows how many neighbors it has.

We will explicitly mention when we are dealing with a Clean Net; otherwise every processor has a unique Id.

Page 34: Concurrent  Programming

Model Description

A message should include the sender Id, the receiver Id and some information, O(log n) bits in total.

If X wants to send a message to Y through Z, it costs 2 steps to send the message.

X → Z → Y

Page 35: Concurrent  Programming

Model Description

Local computation doesn’t take time.

We will analyze:
- time complexity: the number of steps the algorithm takes in the worst case.
- communication complexity: the total number of messages sent during the execution of the algorithm in the worst case.

Page 36: Concurrent  Programming

Distributed vs. Sequential

Communication - needed in the distributed model but not in the sequential one.

Partial knowledge - together the processors know everything, but no single processor necessarily knows everything.

Processors or communication channels can be down.

Page 37: Concurrent  Programming

Distributed vs. Sequential

Synchronization - we need to synchronize the processors.

Page 38: Concurrent  Programming

Synchronous Model

There is a global clock.
In every clock cycle each processor can:
- send messages to its neighbors.
- receive messages from its neighbors.
- make local computation in 0 time.
- change state.

Page 39: Concurrent  Programming

Asynchronous Model

There is no global clock.
If a message was sent it will eventually arrive at its destination (assuming no failures), but we can't assume anything about the arrival time.

We measure time from the beginning of the execution until the last processor has stopped.

Page 40: Concurrent  Programming

Asynchronous Model

For time complexity calculations we assume that every message arrives within one time unit in the worst case.

Page 41: Concurrent  Programming

Model Representation

We can represent the processors net with a graph.

Each node in the graph is a processor.

There is an edge between two nodes if there is a direct communication channel between the two processors they represent.

Page 42: Concurrent  Programming

Complexity

C(π, G, I) - communication complexity: the total number of messages sent during the execution in the worst case.

T(π, G, I) - time complexity: the number of clock cycles that the execution takes in the worst case.

Where π is the protocol, G is the graph and I is the input.

Page 43: Concurrent  Programming

Complexity - examples

The following examples are in a full (complete) graph on the nodes 1, 2, …, n.

Page 44: Concurrent  Programming

Complexity - example 1

Protocol A: node 1 sends the message m to node 2.

C(A, G, I) = 1.
T(A, G, I) = 1.

1 → 2 (message m)

Page 45: Concurrent  Programming

Complexity - example 2

Protocol B: node 1 sends the message mi to node i, for every i ∈ G.

C(B, G, I) = n.
T(B, G, I) = 1.

1 → i (message mi)

Page 46: Concurrent  Programming

Complexity - example 3

Protocol C: node i sends the message mi to node i+1, for every i ∈ G.

C(C, G, I) = n.
T(C, G, I) = 1.

i → i+1 (message mi)

Page 47: Concurrent  Programming

Complexity - example 4

Protocol D: node i sends the message m to node i+1 in cycle i.

C(D, G, I) = n.
T(D, G, I) = n.

1 → 2 (message m), then 2 → 3 (message m), and so on.

Page 48: Concurrent  Programming

Transmission Problem

Input: there is a message m in the node V0.

Output: the message m is written in all the nodes in the graph.

dG(x,y) - the length of the shortest path from x to y in graph G.

D = Diameter(G) = max { dG(x,y) : x, y ∈ V }.
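To make the definition of D concrete, here is a small sketch that computes the diameter of a connected, undirected graph by running a breadth-first search from every node. The adjacency-dictionary format and the function names are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List


def bfs_distances(graph: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(graph: Dict[int, List[int]]) -> int:
    """Largest shortest-path length over all pairs (graph must be connected)."""
    return max(max(bfs_distances(graph, v).values()) for v in graph)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter(path))  # 3
```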

Page 49: Concurrent  Programming

Algorithms for the Transmission Problem

- Direct Delivery.
- Spanning Tree.
- DFS.
- Flooding.

Page 50: Concurrent  Programming

Direct Delivery

Based on the assumptions:
- there is a routing system, such that messages are sent along the shortest path.
- V0 knows the addresses of all the other nodes in the graph.

V0 sends the message m n-1 times, each time to a different node.

Page 51: Concurrent  Programming

DD Communication Complexity

V0 sends n-1 messages.
Each message takes O(D) steps.
C(DD, G, I) = O(n*D).

Page 52: Concurrent  Programming

DD Time Complexity

Under the assumptions:
1. synchronous model.
2. V0 sends one new message in every clock cycle.

There won’t be collisions between messages, because every message travels along a shortest path, and therefore we can’t have more than one message at a given distance from V0.

Page 53: Concurrent  Programming

DD Time Complexity

The last message is sent in cycle n-1.

It will take O(D) steps for the last message to arrive.

T(DD, G, I) = O( n+D ).

Page 54: Concurrent  Programming

DD Time Complexity

We can show the same time complexity even without assumption 2.

If two messages in a node compete for the same edge, we send first the message destined for the node with the smaller Id.

The message for node i, at time t, must be at distance t-i+1 from V0 (or already in Vi).

Page 55: Concurrent  Programming

Spanning Tree

Assumptions: We have a spanning tree in the graph that all the nodes are aware of (each node knows which of its edges are part of the spanning tree).

Each node that receives the message sends it on along its spanning tree edges.

Page 56: Concurrent  Programming

Spanning Tree Complexity

We send the message once over each spanning tree edge.
C(ST) = n-1.

We need as many rounds as the depth of the tree until the last node receives the message.
T(ST) = O( Depth( tree, V0 ) ).

If we choose a BFS tree: T(ST) = O(D).
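The two counts above can be checked with a small synchronous simulation. This is only a sketch: the tree is assumed to be given as a dictionary from each node to its children, and `broadcast_on_tree` is a made-up name.

```python
from typing import Dict, List, Tuple


def broadcast_on_tree(children: Dict[int, List[int]], root: int) -> Tuple[int, int]:
    """Return (messages sent, rounds until the last node has the message)."""
    messages = 0
    rounds = 0
    frontier = [root]  # nodes that received m in the previous round
    while True:
        # In one round every frontier node forwards m to all its children.
        nxt = [c for u in frontier for c in children.get(u, [])]
        if not nxt:
            break
        messages += len(nxt)
        rounds += 1
        frontier = nxt
    return messages, rounds


tree = {0: [1, 2], 1: [3], 2: [], 3: []}
print(broadcast_on_tree(tree, 0))  # (3, 2): n-1 messages, depth 2
```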

Page 57: Concurrent  Programming

Building a Spanning Tree

If we don’t have a spanning tree, we can build one using any algorithm A for Transmission.

Execute algorithm A.
Each node V chooses as its parent the node W from which it received the message for the first time.

Page 58: Concurrent  Programming

Building a Spanning Tree

V informs W that W is its parent.
The edge E(W,V) is marked as a spanning tree edge.

Since the transmission algorithm delivers the message to all nodes, we know that all the nodes are in the spanning tree.

We have no cycles since each node V chooses only one parent.
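The parent-selection rule can be sketched with a synchronous, round-by-round broadcast playing the role of algorithm A. The choice of that broadcast, the adjacency-dictionary format and the name `build_spanning_tree` are all assumptions for illustration; with this particular choice the result is a BFS tree.

```python
from typing import Dict, List, Optional


def build_spanning_tree(graph: Dict[int, List[int]], root: int) -> Dict[int, Optional[int]]:
    """Return a parent map; the root's parent is None."""
    parent: Dict[int, Optional[int]] = {root: None}
    frontier = [root]
    while frontier:
        nxt = []
        for w in frontier:           # w already holds the message
            for v in graph[w]:
                if v not in parent:  # v receives the message for the first time
                    parent[v] = w    # so v chooses w as its parent
                    nxt.append(v)
        frontier = nxt
    return parent


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(build_spanning_tree(cycle, 0))  # {0: None, 1: 0, 3: 0, 2: 1}
```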

Page 59: Concurrent  Programming

DFS

We traverse the graph in DFS order.

If we reached a new node we leave a copy of the message, mark the node and continue the traversal.

If we reached a marked node we go back.

Page 60: Concurrent  Programming

DFS Complexity

In the DFS algorithm we move on each edge exactly twice.

C(DFS) = T(DFS) = O(E).
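The claim that every edge is crossed exactly twice can be illustrated with a small token-passing simulation. It assumes a connected undirected graph given as an adjacency dictionary, and it assumes both endpoints remember which edges the token has already crossed; `dfs_broadcast` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def dfs_broadcast(graph: Dict[int, List[int]], root: int) -> Tuple[Set[int], int]:
    """Return (nodes holding the message, number of token moves)."""
    marked = {root}
    used = set()   # undirected edges the token has already crossed
    messages = 0

    def visit(u: int) -> None:
        nonlocal messages
        for v in graph[u]:
            edge = frozenset((u, v))
            if edge in used:
                continue
            used.add(edge)
            messages += 1      # token moves u -> v
            if v not in marked:
                marked.add(v)  # new node: leave a copy of m and go deeper
                visit(v)
            messages += 1      # token goes back v -> u

    visit(root)
    return marked, messages


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(dfs_broadcast(cycle, 0)[1])  # 8, twice the 4 edges
```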

Page 61: Concurrent  Programming

Flooding

Each node that receives the message for the first time sends it to all of its neighbors.

When a node receives the message again later, it just discards it.

Flooding is also effective in a Clean Net.

Page 62: Concurrent  Programming

Flooding Complexity

Over each edge the message passes at most twice, once in each direction.
C(Flood) = O(E).

After t time units the message has reached all the nodes whose distance from V0 is smaller than or equal to t.
T(Flood) = O(D).
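Both bounds can be observed in a short synchronous simulation of flooding. This sketch assumes a connected undirected graph stored as an adjacency dictionary and a node that forwards the message to every neighbor, including the one it came from; the name `flood` is an assumption.

```python
from typing import Dict, List, Tuple


def flood(graph: Dict[int, List[int]], source: int) -> Tuple[int, int]:
    """Return (messages sent, time at which the last node is informed)."""
    informed_at = {source: 0}
    frontier = [source]  # nodes that got m for the first time last round
    messages = 0
    t = 0
    while frontier:
        t += 1
        nxt = []
        for u in frontier:
            for v in graph[u]:
                messages += 1             # u sends m to every neighbor
                if v not in informed_at:  # later copies are discarded
                    informed_at[v] = t
                    nxt.append(v)
        frontier = nxt
    return messages, max(informed_at.values())


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(flood(cycle, 0))  # (8, 2): 2*E messages, last node informed at time 2
```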