Lecture 8 – Collective Pattern Collectives Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science



Page 2:

Outline

❍ What are Collectives?
❍ Reduce Pattern
❍ Scan Pattern
❍ Sorting

Introduction to Parallel Computing, University of Oregon, IPCC

Page 3:

Collectives

Collective operations deal with a collection of data as a whole, rather than as separate elements. Collective patterns include:

❍ Reduce
❍ Scan
❍ Partition
❍ Scatter
❍ Gather


Page 4:

Collectives

Collective operations deal with a collection of data as a whole, rather than as separate elements. Collective patterns include:

❍ Reduce
❍ Scan
❍ Partition
❍ Scatter
❍ Gather

Reduce and Scan will be covered in this lecture


Page 5:

Reduce

Reduce is used to combine a collection of elements into one summary value. A combiner function combines elements pairwise. A combiner function only needs to be associative to be parallelizable. Example combiner functions:

❍ Addition
❍ Multiplication
❍ Maximum / Minimum

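As a sketch of the two reduction shapes (plain Python; the function names are ours, not from the slides), a serial left-to-right fold and a pairwise tree reduction might look like:

```python
def serial_reduce(combine, xs, identity):
    """Fold left to right: ((x0 + x1) + x2) + ... one element at a time."""
    acc = identity
    for x in xs:
        acc = combine(acc, x)
    return acc

def tree_reduce(combine, xs):
    """Combine elements pairwise; each level halves the list.
    Valid whenever `combine` is associative."""
    if len(xs) == 1:
        return xs[0]
    pairs = [combine(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
    if len(xs) % 2:            # odd leftover element carries to the next level
        pairs.append(xs[-1])
    return tree_reduce(combine, pairs)

data = [1, 2, 4, 5, 9, 7, 0, 1]
assert serial_reduce(lambda a, b: a + b, data, 0) == 29
assert tree_reduce(lambda a, b: a + b, data) == 29
assert tree_reduce(max, data) == 9
```

The serial version takes n-1 dependent steps; the tree version takes only O(log n) levels of independent pairwise combines, which is where the parallelism comes from.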

Page 6:

Reduce

[Figure: serial reduction vs. parallel (tree) reduction]


Page 7:

Reduce

[Figure: reduce vectorization]


Page 8:

Reduce

Tiling is used to break the work up into chunks that workers reduce serially.

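A minimal sketch of a tiled reduction (our own Python; `tile_size` is an assumed tuning parameter, and the per-tile loop is what each worker would run):

```python
def tiled_reduce(combine, xs, identity, tile_size):
    # Phase 1: each "worker" reduces its own tile serially.
    partials = []
    for start in range(0, len(xs), tile_size):
        acc = identity
        for x in xs[start:start + tile_size]:
            acc = combine(acc, x)
        partials.append(acc)
    # Phase 2: combine the per-tile partial results into one value.
    result = identity
    for p in partials:
        result = combine(result, p)
    return result

assert tiled_reduce(lambda a, b: a + b, [1, 2, 4, 5, 9, 7, 0, 1], 0, 3) == 29
```

In a real implementation phase 1 runs in parallel across workers and phase 2 is itself a (small) reduction; associativity of the combiner is what makes the per-tile regrouping legal.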

Page 9:

Reduce – Add Example

1 2 4 5 9 7 0 1


Page 10:

Reduce – Add Example

1 2 4 5 9 7 0 1

[Figure: serial reduction, accumulating left to right: running sums 3, 7, 12, 21, 28, 28, 29 → result 29]


Page 11:

Reduce – Add Example

1 2 4 5 9 7 0 1


Page 12:

Reduce – Add Example

1 2 4 5 9 7 0 1

[Figure: parallel (tree) reduction: pairwise sums (3, 9, 16, 1) → (12, 17) → result 29]


Page 13:

Reduce

We can “fuse” the map and reduce patterns

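A sketch of what fusion buys (plain Python, names ours): the fused version makes a single pass and never materializes the intermediate mapped list.

```python
def map_then_reduce(f, combine, xs, identity):
    mapped = [f(x) for x in xs]      # materializes an intermediate list
    acc = identity
    for m in mapped:
        acc = combine(acc, m)
    return acc

def fused_map_reduce(f, combine, xs, identity):
    acc = identity
    for x in xs:                     # one pass, no intermediate storage
        acc = combine(acc, f(x))
    return acc

xs = [1, 2, 3, 4]
assert map_then_reduce(lambda x: x * x, lambda a, b: a + b, xs, 0) == 30
assert fused_map_reduce(lambda x: x * x, lambda a, b: a + b, xs, 0) == 30
```

Both compute the same answer; fusion saves memory traffic, which usually dominates the cost of a reduction.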

Page 14:

Reduce

Precision can become a problem with reductions on floating-point data. Different orderings of floating-point data can change the reduction value.

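A concrete illustration, assuming IEEE-754 double precision (standard Python floats): floating-point addition is not associative, so two valid reduction orders can give different answers.

```python
# Summing the same three values in two different (both legal) orders:
left_to_right = (1e20 + -1e20) + 1.0   # the big terms cancel first, 1.0 survives
right_to_left = 1e20 + (-1e20 + 1.0)   # 1.0 is absorbed: -1e20 + 1.0 rounds to -1e20

assert left_to_right == 1.0
assert right_to_left == 0.0
```

This is why a parallel (tree-ordered) float reduction can differ from a serial one, even though both are “correct” reductions.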

Page 15:

Reduce Example: Dot Product

Given 2 vectors of the same length, map (*) to multiply the components, then reduce with (+) to get the final answer.

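A minimal Python sketch of this map-then-reduce structure:

```python
def dot(u, v):
    # map: multiply components pairwise; reduce: sum the products
    return sum(a * b for a, b in zip(u, v))

assert dot([1, 2, 3], [4, 5, 6]) == 32   # 1*4 + 2*5 + 3*6
assert dot([1, 0], [0, 1]) == 0          # perpendicular vectors
```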

Page 16:

Dot Product – Example Uses

An essential operation in physics, graphics, video games, …

Gaming analogy: in Mario Kart, there are “boost pads” on the ground that increase your speed:
❍ the red vector is your speed (x and y direction)
❍ the blue vector is the orientation of the boost pad (x and y direction); larger numbers are more power

How much boost will you get? For the analogy, imagine the pad multiplies your speed:
• If you come in going 0, you’ll get nothing
• If you cross the pad perpendicularly, you’ll get 0 — the pad gives you 0x boost in the perpendicular direction

Ref: http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/


Page 17:

Scan

The scan pattern produces the partial reductions of an input sequence, generating a new sequence. It is trickier to parallelize than reduce. Inclusive scan vs. exclusive scan:

❍ Inclusive scan: includes the current element in the partial reduction

❍ Exclusive scan: excludes the current element; the partial reduction is over all elements prior to the current element

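The two variants, sketched serially in plain Python (function names ours):

```python
def inclusive_scan(combine, xs):
    out, acc = [], None
    for i, x in enumerate(xs):
        acc = x if i == 0 else combine(acc, x)
        out.append(acc)               # current element is included
    return out

def exclusive_scan(combine, xs, identity):
    out, acc = [], identity
    for x in xs:
        out.append(acc)               # reduction of all *prior* elements
        acc = combine(acc, x)
    return out

xs = [1, 2, 4, 5]
assert inclusive_scan(lambda a, b: a + b, xs) == [1, 3, 7, 12]
assert exclusive_scan(lambda a, b: a + b, xs, 0) == [0, 1, 3, 7]
```

Note the two outputs are just shifts of each other: the exclusive scan is the inclusive scan shifted right by one, with the identity in front.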

Page 18:

Scan – Example Uses

❍ Lexical comparison of strings – e.g., determine that “strategy” should appear before “stratification” in a dictionary
❍ Add multi-precision numbers (those that cannot be represented in a single machine word)
❍ Evaluate polynomials
❍ Implement radix sort or quicksort
❍ Delete marked elements in an array
❍ Dynamically allocate processors
❍ Lexical analysis – parsing programs into tokens
❍ Searching for regular expressions
❍ Labeling components in 2-D images
❍ Some tree algorithms – e.g., finding the depth of every vertex in a tree

Page 19:

Scan

[Figure: serial scan vs. parallel scan]


Page 20:

Scan

One algorithm for parallelizing scan is to perform an “up sweep” and a “down sweep”: reduce the input on the up sweep, then produce the intermediate results on the down sweep.

❍ Up sweep – compute the reduction
❍ Down sweep – compute the intermediate values

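A sketch of the up-sweep/down-sweep (Blelloch-style) exclusive scan, written serially in Python to expose the access pattern; it assumes the input length is a power of two. Within each `for` loop, the iterations are independent and could run in parallel.

```python
def up_down_sweep_scan(combine, xs, identity):
    """Exclusive scan via an up sweep (tree reduction in place)
    followed by a down sweep that pushes partial results back down."""
    a = list(xs)
    n = len(a)                       # assumed to be a power of two
    # Up sweep: build a reduction tree in place; a[n-1] ends up as the total.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            a[i] = combine(a[i - step], a[i])
        step *= 2
    # Down sweep: replace the root with the identity, then at each level
    # swap the left child up and combine it into the right child.
    a[n - 1] = identity
    step //= 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]
            a[i] = combine(left, a[i])
        step //= 2
    return a

assert up_down_sweep_scan(lambda a, b: a + b,
                          [1, 2, 4, 5, 9, 7, 0, 1], 0) == [0, 1, 3, 7, 12, 21, 28, 28]
```

This does O(n) total work across O(log n) levels, which is why it is the standard work-efficient parallel scan.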

Page 21:

Scan – Maximum Example

Input: 1 2 7 2 4 3 4 0


Page 22:

Scan – Maximum Example

Input: 1 2 7 2 4 3 4 0

[Figure: up sweep and down sweep of the maximum-scan; the resulting running maximum is 1 2 7 7 7 7 7 7]


Page 23:

Scan

Three-phase scan with tiling

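One way the three phases might be sketched (our own Python, written serially as a stand-in for the parallel tiles; `tile_size` is an assumed parameter):

```python
def three_phase_scan(combine, xs, identity, tile_size):
    """Inclusive scan with tiling:
    1) scan each tile independently (conceptually, one tile per worker);
    2) exclusive-scan the per-tile totals to get each tile's offset;
    3) combine each tile's offset into its local results."""
    tiles = [xs[i:i + tile_size] for i in range(0, len(xs), tile_size)]
    # Phase 1: local inclusive scan of each tile.
    local = []
    for t in tiles:
        acc, out = None, []
        for i, x in enumerate(t):
            acc = x if i == 0 else combine(acc, x)
            out.append(acc)
        local.append(out)
    # Phase 2: exclusive scan of the tile totals gives each tile's offset.
    offsets, acc = [], identity
    for out in local:
        offsets.append(acc)
        acc = combine(acc, out[-1])
    # Phase 3: apply each tile's offset to its elements.
    result = []
    for off, out in zip(offsets, local):
        result.extend(combine(off, v) for v in out)
    return result

xs = [1, 2, 4, 5, 9, 7, 0, 1]
assert three_phase_scan(lambda a, b: a + b, xs, 0, 3) == [1, 3, 7, 12, 21, 28, 28, 29]
```

Phases 1 and 3 parallelize perfectly across tiles; only the small phase-2 scan over tile totals is sequential (or recursively scanned).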


Page 25:

Scan

Just like reduce, we can also fuse the map pattern with the scan pattern.



Page 27:

Merge Sort as a Reduction

We can sort an array via a pair of a map and a reduce: map each element into a vector containing just that element, then reduce with merge.

❍ <> is the merge operation: [1,3,5,7] <> [2,6,15] = [1,2,3,5,6,7,15]
❍ [] is the empty list

How fast is this?

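A Python sketch of the map (singleton lists) plus a left-to-right reduce using the merge operation `<>` (spelled `merge` here; the names are ours):

```python
def merge(a, b):
    """The <> operation: merge two sorted lists, O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def sort_by_reduction(xs):
    singletons = [[x] for x in xs]   # map: each element -> one-element list
    acc = []                         # [] is the identity for <>
    for s in singletons:             # reduce with <>, left to right
        acc = merge(acc, s)
    return acc

assert merge([1, 3, 5, 7], [2, 6, 15]) == [1, 2, 3, 5, 6, 7, 15]
assert sort_by_reduction([14, 3, 4, 8, 7, 52, 1]) == [1, 3, 4, 7, 8, 14, 52]
```

As the next slides show, the answer is correct regardless of reduction shape, but the cost depends heavily on the shape of the fold.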

Page 28:

Right Biased Sort

Start with [14,3,4,8,7,52,1]

Map to [[14],[3],[4],[8],[7],[52],[1]]

Reduce:

[14] <> ([3] <> ([4] <> ([8] <> ([7] <> ([52] <> [1])))))

= [14] <> ([3] <> ([4] <> ([8] <> ([7] <> [1,52]))))

= [14] <> ([3] <> ([4] <> ([8] <> [1,7,52])))

= [14] <> ([3] <> ([4] <> [1,7,8,52]))

= [14] <> ([3] <> [1,4,7,8,52])

= [14] <> [1,3,4,7,8,52]

= [1,3,4,7,8,14,52]


Page 29:

Right Biased Sort (cont.)

How long did that take? We did O(n) merges, but each one took O(n) time: O(n²) overall. We wanted merge sort, but instead we got insertion sort!


Page 30:

Tree Shape Sort

Start with [14,3,4,8,7,52,1]

Map to [[14],[3],[4],[8],[7],[52],[1]]

Reduce:

(([14] <> [3]) <> ([4] <> [8])) <> (([7] <> [52]) <> [1])

= ([3,14] <> [4,8]) <> ([7,52] <> [1])

= [3,4,8,14] <> [1,7,52]

= [1,3,4,7,8,14,52]


Page 31:

Tree Shaped Sort Performance

Even if we only had a single processor, this is better:
❍ We do O(log n) levels of merges
❍ Each level is O(n) work
❍ So O(n log n) total

But the opportunity for parallelism is not so great:
❍ The span is O(n), assuming sequential merges
❍ Takeaway: the shape of the reduction matters!

Introduction to Parallel Computing, University of Oregon, IPCC
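The tree-shaped reduction from the last slides, sketched in Python (names ours): pairing up lists level by level reproduces merge sort's O(n log n) behavior.

```python
def merge(a, b):
    """The <> operation: merge two sorted lists, O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def tree_sort(xs):
    """Tree-shaped reduce with <>: O(log n) levels, O(n) work per level,
    so O(n log n) total -- this is exactly merge sort."""
    lists = [[x] for x in xs]            # map: singleton lists
    while len(lists) > 1:
        nxt = [merge(lists[i], lists[i + 1])
               for i in range(0, len(lists) - 1, 2)]
        if len(lists) % 2:               # odd leftover list carries over
            nxt.append(lists[-1])
        lists = nxt
    return lists[0] if lists else []

assert tree_sort([14, 3, 4, 8, 7, 52, 1]) == [1, 3, 4, 7, 8, 14, 52]
```

The merges within one level are independent, so they can run in parallel; the final merge is the O(n) sequential bottleneck mentioned on the slide.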