parallel prefix and data parallel operations


Upload: sokanon-brown

Post on 31-Dec-2015


Parallel Prefix and Data Parallel Operations

Motivation: basic parallel operations that occur repeatedly.

Let $\oplus$ be an associative operation:

$(a_1 \oplus a_2) \oplus a_3 = a_1 \oplus (a_2 \oplus a_3)$

How can we compute

$a_1 \oplus a_2 \oplus \cdots \oplus a_n$ in parallel in $O(\log n)$ time?

Approach 1

Each entry [l:r] denotes the combination of a_l through a_r.

        a0     a1     a2     a3     a4     a5     a6     a7
d=1   [0:0]  [0:1]  [1:2]  [2:3]  [3:4]  [4:5]  [5:6]  [6:7]
d=2   [0:0]  [0:1]  [0:2]  [0:3]  [1:4]  [2:5]  [3:6]  [4:7]
d=4   [0:0]  [0:1]  [0:2]  [0:3]  [0:4]  [0:5]  [0:6]  [0:7]

Assume that n = 2^k.

for i = 0 to k-1
    for j = 0 to n-1-2^i do in parallel
        x[j + 2^i] = x[j] + x[j + 2^i]
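The doubling loop above can be simulated sequentially. The following is a minimal sketch, not a parallel implementation: the function name `prefix_scan` and the snapshot copy `old` (which stands in for all processors reading before any of them writes) are assumptions made for illustration.

```python
import operator


def prefix_scan(values, op=operator.add):
    """Inclusive prefix scan by recursive doubling (log2(n) rounds)."""
    x = list(values)
    n = len(x)
    d = 1
    while d < n:
        # One parallel round: x[j + d] is combined with x[j] for every valid j.
        # The snapshot mimics a synchronous step where reads precede writes.
        old = list(x)
        for j in range(n - d):
            x[j + d] = op(old[j], old[j + d])
        d *= 2
    return x


print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Because only associativity is used, the same sketch works for any associative `op`, for example string concatenation or `max`.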

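How to do it on a tree architecture?

For each node:

• if there are signals Sl and Sr from the left and right children, St <- Sl + Sr
• if there is a signal R, send R to both its children
• if the node is a leaf and there is a signal R, X <- X + R

[Figure: a tree node with signals Sl and Sr arriving from its children, St sent upward, and R arriving from above.]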
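A sequential sketch of one way to realize the tree scheme is given here. It uses the common upward/downward sweep, and it assumes a detail the slide does not spell out: a node forwards R unchanged to its left child and R plus the left child's sum to its right child. The heap-style array layout and the name `tree_prefix` are also illustrative assumptions, and the code is written for addition only.

```python
def tree_prefix(leaves):
    """Inclusive prefix sums on a complete binary tree (len(leaves) a power of two)."""
    n = len(leaves)
    # Heap layout: node t has children 2t and 2t+1; leaves occupy n .. 2n-1.
    s = [0] * n + list(leaves)
    # Upward sweep: St <- Sl + Sr.
    for t in range(n - 1, 0, -1):
        s[t] = s[2 * t] + s[2 * t + 1]
    # Downward sweep: r[t] is the sum of all leaves to the left of t's subtree.
    r = [0] * (2 * n)
    for t in range(1, n):
        r[2 * t] = r[t]                  # left child receives R unchanged
        r[2 * t + 1] = r[t] + s[2 * t]   # assumption: right child receives R + Sl
    # Leaf rule: X <- X + R.
    return [s[n + j] + r[n + j] for j in range(n)]


print(tree_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Both sweeps touch each tree level once, so on a real tree machine the whole computation takes time proportional to the tree height.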

How to do on a Hypercube

A complete binary tree can be embedded into a hypercube.

Simpler solution: each node computes both its prefix and the total sum.

for i = 0 to k-1
    for j = 0 to n-1 do in parallel
        x[j] = x[j] + sum[j^(i)]    if the i-th bit of j = 1
        sum[j] = sum[j] + sum[j^(i)]

where j^(i) and j have the same binary representation except in their i-th bit: the i-th bit of j^(i) is the complement of the i-th bit of j.
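The hypercube loop can be simulated as follows. This is a sketch under stated assumptions: the neighbor across dimension i is computed as `j ^ (1 << i)`, the array named `total` plays the role of `sum` in the pseudocode, and snapshot copies model the simultaneous exchange between neighbors.

```python
def hypercube_prefix(values):
    """Simulate the hypercube prefix; len(values) must be a power of two.

    Returns (x, total): x[j] is the inclusive prefix at node j and
    total[j] is the sum over all nodes, known to every node at the end.
    """
    n = len(values)
    x = list(values)
    total = list(values)
    i = 0
    while (1 << i) < n:
        old_total = list(total)      # values exchanged across dimension i
        for j in range(n):
            partner = j ^ (1 << i)   # j with its i-th bit complemented
            if j & (1 << i):
                x[j] = x[j] + old_total[partner]
            total[j] = old_total[j] + old_total[partner]
        i += 1
    return x, total


x, total = hypercube_prefix([1, 2, 3, 4, 5, 6, 7, 8])
print(x)      # [1, 3, 6, 10, 15, 21, 28, 36]
print(total)  # [36, 36, 36, 36, 36, 36, 36, 36]
```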

Prefix on Hypercube

Trace of the loop for n = 8. Each entry [l:r] denotes a_l + ... + a_r; X is the prefix held by the node and SUM is its running total.

node        0      1      2      3      4      5      6      7
input      a0     a1     a2     a3     a4     a5     a6     a7

d=1  X    [0:0]  [0:1]  [2:2]  [2:3]  [4:4]  [4:5]  [6:6]  [6:7]
     SUM  [0:1]  [0:1]  [2:3]  [2:3]  [4:5]  [4:5]  [6:7]  [6:7]

d=2  X    [0:0]  [0:1]  [0:2]  [0:3]  [4:4]  [4:5]  [4:6]  [4:7]
     SUM  [0:3]  [0:3]  [0:3]  [0:3]  [4:7]  [4:7]  [4:7]  [4:7]

d=4  X    [0:0]  [0:1]  [0:2]  [0:3]  [0:4]  [0:5]  [0:6]  [0:7]
     SUM  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]

Applications of Data Parallel Operations

Any associative operation can be used.

Examples:
– min, max, add
– adding two binary numbers
– finite state automata
– radix sort
– segmented prefix sum
– routing
    • packing
    • unpacking
    • broadcast (copy-scan)
– solving recurrence equations
– straight line computation (parallel arithmetic evaluation)

Adding two n bit numbers as parallel prefix

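• $a = a_{n-1} \ldots a_0$

• $b = b_{n-1} \ldots b_0$

• $s = a + b$

• note that $s_i = a_i \oplus b_i \oplus c_{i-1}$, where $\oplus$ here denotes exclusive or

• to compute $c_i$, define $g$ (generate) and $p$ (propagate) as:

$g_i = a_i \wedge b_i, \quad p_i = a_i \oplus b_i$

• define $\circ$ as: $(g, p) \circ (g', p') = (g \vee (p \wedge g'),\; p \wedge p')$

Then the carry bit $c_i$ can be computed by:

$(G_i, P_i) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_0, p_0)$

and $G_i = c_i$.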
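The carry computation can be sketched as a prefix scan over (generate, propagate) pairs. The code below assumes generate = AND and propagate = XOR, stores bits least significant first, and reuses a doubling scan. The names `carry_op` and `add_bits` are illustrative only.

```python
def carry_op(hi, lo):
    """(g, p) o (g', p') = (g or (p and g'), p and p'); hi is the more significant pair."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def add_bits(a, b):
    """Add two equal-length bit lists (least significant bit first)."""
    n = len(a)
    x = [(ai & bi, ai ^ bi) for ai, bi in zip(a, b)]  # (g_i, p_i)
    d = 1
    while d < n:
        old = list(x)  # snapshot models one synchronous parallel round
        for j in range(n - d):
            # old[j + d] covers the more significant bits, so it goes first.
            x[j + d] = carry_op(old[j + d], old[j])
        d *= 2
    carries = [g for g, _ in x]  # G_i = c_i, the carry out of bit i
    s = [a[i] ^ b[i] ^ (carries[i - 1] if i > 0 else 0) for i in range(n)]
    s.append(carries[-1] if n else 0)  # final carry becomes the top bit
    return s


# 11 + 6 = 17, written least significant bit first
print(add_bits([1, 1, 0, 1], [0, 1, 1, 0]))  # [1, 0, 0, 0, 1]
```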

Hardware circuit of recursive look-ahead adder

[Figure: circuit of a 16-bit recursive look-ahead adder with inputs a0 ... a15 and b0 ... b15.]

Parsing a regular language

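[Figure: transition diagram over the states q0, q1, q2 with edges labeled b and c.]

δ(q0,b) = q2, δ(q0,c) = q1,
δ(q1,b) = q0, δ(q1,c) = qr,
δ(q2,b) = qr, δ(q2,c) = q0
qr: reject state

Each input symbol induces a map on states. For b: q0->q2, q1->q0, q2->qr.

[Figure: tree of composed state maps for the input bccb c. Writing each map as the images of (q0, q1, q2): b = (q2, q0, qr), c = (q1, qr, q0), bc = (q0, q1, qr), cb = (q0, qr, q2), bccb = (q0, qr, qr).]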
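Since composition of state maps is associative, the maps of the individual symbols can be combined pairwise in a balanced tree. The sketch below uses the transition function given above and assumes that the reject state qr maps to itself on every symbol. The helper names are hypothetical.

```python
STATES = ("q0", "q1", "q2", "qr")
DELTA = {
    ("q0", "b"): "q2", ("q0", "c"): "q1",
    ("q1", "b"): "q0", ("q1", "c"): "qr",
    ("q2", "b"): "qr", ("q2", "c"): "q0",
}


def symbol_map(sym):
    """State map induced by one symbol; assumption: qr is absorbing."""
    return {q: DELTA.get((q, sym), "qr") for q in STATES}


def compose(first, second):
    """Apply `first`, then `second` (an associative operation on maps)."""
    return {q: second[first[q]] for q in STATES}


def reduce_tree(maps):
    """Combine a non-empty list of maps pairwise, level by level."""
    while len(maps) > 1:
        nxt = [compose(maps[i], maps[i + 1]) for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2:
            nxt.append(maps[-1])  # odd element is carried to the next level
        maps = nxt
    return maps[0]


def run(word, start="q0"):
    """State reached from `start` after reading `word`."""
    if not word:
        return start
    return reduce_tree([symbol_map(s) for s in word])[start]


m = reduce_tree([symbol_map(s) for s in "bccb"])
print(m["q0"], m["q1"], m["q2"])  # q0 qr qr
```

Each level of the reduction halves the number of maps, so a word of length n needs about log2(n) rounds of independent compositions.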

Segmented Prefix operation

before:  1  2 |  3  4  5  6 |  7  8

after:   1  3 |  3  7 12 18 |  7 15

( | marks a segment boundary)

Segmented Prefix computation

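Let $\oplus$ be any associative operation. For the segmented operation of $\oplus$, define $\oplus'$ as follows, where $|x$ denotes an element $x$ that starts a new segment:

$$
\begin{array}{c|cc}
\oplus' & b & |b \\
\hline
a & a \oplus b & |b \\
|a & |(a \oplus b) & |b
\end{array}
$$

Then $\oplus'$ is associative, and we can compute the segmented operation in $O(\log n)$ time.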
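The primed operator can be sketched on (flag, value) pairs, where the flag records that a segment start has been seen. This is a minimal sketch using addition as the underlying operation. The pair encoding and the names `seg_op` and `segmented_scan` are assumptions for illustration.

```python
def seg_op(left, right):
    """Segmented addition on (starts_segment, value) pairs, following the table."""
    fa, a = left
    fb, b = right
    # If the right operand starts a segment, the left operand is discarded.
    return (fa or fb, b if fb else a + b)


def segmented_scan(values, starts):
    """Inclusive segmented prefix sum; starts[i] is truthy at each segment head."""
    x = [(bool(f), v) for f, v in zip(starts, values)]
    n = len(x)
    d = 1
    while d < n:
        old = list(x)  # snapshot models one synchronous parallel round
        for j in range(n - d):
            x[j + d] = seg_op(old[j], old[j + d])
        d *= 2
    return [v for _, v in x]


print(segmented_scan([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 1, 0, 0, 0, 1, 0]))
# [1, 3, 3, 7, 12, 18, 7, 15]
```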

Enumerating

Data = [5 6 3 1 8 3 7 5 9 2]

active procs = [1 0 1 1 0 0 1 0 1 0]

enumerated = [0 x 1 2 x x 3 x 4 x]

packing

data = [5 6 3 1 8 3 7 5 9 2]

active procs = [1 0 1 1 0 0 1 0 1 0]

enumerated = [0 x 1 2 x x 3 x 4 x]

packed data =[5 3 1 7 9 x x x x x]
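Enumeration is an exclusive prefix sum over the activity flags, and packing sends each active element to its enumerated position. A minimal sketch, assuming `None` stands in for the "x" entries and using hypothetical function names:

```python
from itertools import accumulate


def enumerate_active(active):
    """Rank of each active processor among the active ones; None if inactive."""
    inclusive = list(accumulate(active))
    # Exclusive prefix sum = inclusive prefix sum minus the element itself.
    return [inc - a if a else None for inc, a in zip(inclusive, active)]


def pack(data, active):
    """Move the data of active processors to the lowest addresses."""
    packed = [None] * len(data)
    for value, rank in zip(data, enumerate_active(active)):
        if rank is not None:
            packed[rank] = value
    return packed


data = [5, 6, 3, 1, 8, 3, 7, 5, 9, 2]
active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(enumerate_active(active))  # [0, None, 1, 2, None, None, 3, None, 4, None]
print(pack(data, active))        # [5, 3, 1, 7, 9, None, None, None, None, None]
```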

Packing and Unpacking on Hypercube

Packing
• adjust bit 0
• adjust bit 1
• adjust bit 2
• ...
• adjust bit k-1

Unpacking
• adjust bit k-1
• adjust bit k-2
• ...
• adjust bit 1
• adjust bit 0

How about in the order of adjust bit 0, 1, ..., k-1 for packing?

Unpacking

Address 0 1 2 3 4 5 6 7 8 9

data = [6 2 3 5 9 x x x x x]

active procs = [1 0 1 1 0 0 1 0 1 0]

enumerated = [0 x 1 2 x x 3 x 4 x]

destination = [0 2 3 6 8 x x x x x]

unpacked data = [6 x 2 3 x x 5 x 9 x]
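Unpacking reverses packing: the element with rank r goes to the address of the r-th active processor. The sketch below derives the destination row from the enumeration and then scatters the data. `None` stands in for "x" and the function name is an assumption.

```python
from itertools import accumulate


def unpack(packed, active):
    """Scatter packed[r] to the address of the r-th active processor."""
    inclusive = list(accumulate(active))
    enumerated = [inc - a if a else None for inc, a in zip(inclusive, active)]
    # destination[r] is the address whose enumerated rank equals r.
    destination = [addr for addr, rank in enumerate(enumerated) if rank is not None]
    result = [None] * len(active)
    for rank, addr in enumerate(destination):
        result[addr] = packed[rank]
    return result


packed = [6, 2, 3, 5, 9, None, None, None, None, None]
active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(unpack(packed, active))  # [6, None, 2, 3, None, None, 5, None, 9, None]
```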

Copy Scan (broadcast)

address 0 1 2 3 4 5 6 7 8 9

data = [ 6 2 3 5 9 4 1 7 8 10]

segmented bit = [ 1 0 1 1 0 0 1 0 1 0]

result = [ 6 6 3 5 5 5 1 1 8 8]
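Copy scan can be viewed as a segmented scan whose underlying operation keeps its left operand. The following sketch makes that assumption explicit and, like the example above, expects the first element to carry a segment bit. The names are illustrative.

```python
def copy_op(left, right):
    """Keep the left value unless the right operand starts a new segment."""
    fa, a = left
    fb, b = right
    return (fa or fb, b if fb else a)


def copy_scan(data, segment_bits):
    """Broadcast the first value of each segment to the whole segment."""
    x = [(bool(f), v) for f, v in zip(segment_bits, data)]
    n = len(x)
    d = 1
    while d < n:
        old = list(x)  # snapshot models one synchronous parallel round
        for j in range(n - d):
            x[j + d] = copy_op(old[j], old[j + d])
        d *= 2
    return [v for _, v in x]


print(copy_scan([6, 2, 3, 5, 9, 4, 1, 7, 8, 10], [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]))
# [6, 6, 3, 5, 5, 5, 1, 1, 8, 8]
```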

Radix Sort

for j = 0 to k-1    // x has k bits; least significant bit first
    for all i in [0 .. n-1] do in parallel {
        if j-th bit of x[i] is 0 {
            y[i] = enumerate
            c = count
        }
        if j-th bit of x[i] is 1
            y[i] = enumerate + c
        x[y[i]] = x[i]
    }

Radix sort another code

for j = 0 to k-1    // x has k bits; least significant bit first
    for all i in [0 .. n-1] do in parallel {
        pack left x[i] if j-th bit of x[i] is 0
        pack right x[i] if j-th bit of x[i] is 1
    }
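Both radix sort variants amount to a stable split on one bit per round. The sketch below simulates this with prefix sums, processing bits from least significant to most significant (bit 0 first), which is what a stable split needs to produce a sorted result. The names `split_by_bit` and `radix_sort` are assumptions.

```python
from itertools import accumulate


def split_by_bit(x, j):
    """Stable split: keys with bit j = 0 go left, keys with bit j = 1 go right."""
    ones = [(v >> j) & 1 for v in x]
    zeros = [1 - o for o in ones]
    # "enumerate" = exclusive prefix sum of the flags within each group.
    zero_rank = [inc - z for inc, z in zip(accumulate(zeros), zeros)]
    one_rank = [inc - o for inc, o in zip(accumulate(ones), ones)]
    c = sum(zeros)  # "count": number of keys whose bit j is 0
    out = [None] * len(x)
    for i, v in enumerate(x):
        y = zero_rank[i] if zeros[i] else one_rank[i] + c
        out[y] = v
    return out


def radix_sort(x, k):
    """Sort non-negative integers of at most k bits."""
    for j in range(k):  # least significant bit first
        x = split_by_bit(x, j)
    return x


print(radix_sort([5, 6, 3, 1, 8, 3, 7, 5, 9, 2], 4))
# [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```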

Quick Sort

1. Pick a pivot p

2. Broadcast p

3. For all PE i, compare A[i] with p

{ if A[i] <p, pack left A[i] in the segment

if A[i] >= p, pack right A[i] in the segment

}

4. Mark the segment boundary

5. Quicksort each segment recursively

Solving Linear Recurrence Equations

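$f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}$

In matrix form:

$\begin{pmatrix} f_n \\ f_{n-1} \end{pmatrix} = \begin{pmatrix} a_{n-1} & a_{n-2} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_{n-1} \\ f_{n-2} \end{pmatrix}$

Matrix multiplication is associative, so all $f_n$ can be obtained from a prefix product of these 2x2 matrices.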
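A second-order linear recurrence of this shape can be rewritten as a product of 2x2 matrices, and the prefix products then give every term. The sketch below assumes the coefficients are supplied as a list `a` with `a[i]` standing for a_i, and that f_0 and f_1 are given. The function names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
        [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]],
    ]


def solve_recurrence(a, f0, f1):
    """Return [f_0, ..., f_m] for f_n = a[n-1]*f_{n-1} + a[n-2]*f_{n-2}, m = len(a)."""
    # M_n maps (f_{n-1}, f_{n-2}) to (f_n, f_{n-1}).
    x = [[[a[n - 1], a[n - 2]], [1, 0]] for n in range(2, len(a) + 1)]
    m = len(x)
    d = 1
    while d < m:
        old = list(x)  # snapshot models one synchronous parallel round
        for j in range(m - d):
            # Later matrices multiply from the left: M_n ... M_2.
            x[j + d] = mat_mul(old[j + d], old[j])
        d *= 2
    return [f0, f1] + [P[0][0] * f1 + P[0][1] * f0 for P in x]


# With all coefficients equal to 1 this yields the Fibonacci numbers.
print(solve_recurrence([1] * 7, 0, 1))  # [0, 1, 1, 2, 3, 5, 8, 13]
```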

Pointer Jumping and Tree Computation

How to compute a prefix on a linked list?

initial X:      1  2  3  4  5  6  7

In each step, for every i in parallel:

If NEXT[i] != NIL then
    X[i] <- X[i] + X[NEXT[i]]
    NEXT[i] <- NEXT[NEXT[i]]

after step 1:   3  5  7  9 11 13  7

after step 2:  10 14 18 22 18 13  7

after step 3:  28 27 25 22 18 13  7
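Pointer jumping on a linked list can be simulated with arrays. In this sketch `None` plays the role of NIL, and snapshot copies model the synchronous parallel step. The function name is an assumption.

```python
def pointer_jump(x, nxt):
    """Suffix sums on a linked list by pointer jumping.

    x[i] is the value at node i; nxt[i] is the index of its successor or None.
    """
    x = list(x)
    nxt = list(nxt)
    while any(p is not None for p in nxt):
        old_x, old_nxt = list(x), list(nxt)  # all reads precede all writes
        for i in range(len(x)):
            if old_nxt[i] is not None:
                x[i] = old_x[i] + old_x[old_nxt[i]]
                nxt[i] = old_nxt[old_nxt[i]]
    return x


print(pointer_jump([1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, None]))
# [28, 27, 25, 22, 18, 13, 7]
```

Each round doubles the distance a pointer spans, so a list of n nodes is finished after about log2(n) rounds. The result accumulates toward the tail of the list, giving suffix sums.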

How can we obtain the order 1 3 6 10 15 21 28 instead?

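Application: Tree computation

Pre-order numbering

[Figure: tree annotated for pre-order numbering, with labels "Each node" and "Leaf node", each marked with the value 1.]

Can also be applied to in-order and post-order numbering, number of children, depth, etc., and to bi-components.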

Recurrence Equation

Example: LU decomposition on a triangular matrix