MIT 6.189 IAP 2007
Lecture 6: Design Patterns for Parallel Programming I
Dr. Rodric Rabbah, IBM



4 Common Steps to Creating a Parallel Program

[Figure: a sequential computation is partitioned in four steps: decomposition breaks it into tasks, assignment groups the tasks into processes (p0–p3), orchestration coordinates the processes, and mapping places them on processors (P0–P3), yielding the parallel program.]


Decomposition (Amdahl’s Law)

● Identify concurrency and decide at what level to exploit it

● Break up the computation into tasks to be divided among processes
– Tasks may become available dynamically
– The number of tasks may vary with time

● Create enough tasks to keep processors busy
– The number of tasks available at a time is an upper bound on the achievable speedup
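The speedup bound behind this slide is Amdahl’s Law; a minimal numerical sketch (the function name is mine, not from the lecture):

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when a fraction f of the work
    can be spread over p processors (Amdahl's Law)."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / processors)

# Even with an enormous processor count, a 10% sequential part
# caps the speedup just below 1 / 0.1 = 10x.
print(amdahl_speedup(0.9, 4))
print(amdahl_speedup(0.9, 10**9))
```

The second call shows why the number of available tasks (the parallelizable fraction) bounds speedup no matter how many processors are added.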


Assignment (Granularity)

● Specify the mechanism to divide work among cores
– Balance work and reduce communication

● Structured approaches usually work well
– Code inspection or understanding of the application
– Well-known design patterns

● As programmers, we worry about partitioning first
– Independent of architecture or programming model
– But complexity often affects decisions!


Orchestration and Mapping (Locality)

● Computation and communication concurrency

● Preserve locality of data

● Schedule tasks to satisfy dependences early


Parallel Programming by Pattern

● Provides a cookbook to systematically guide programmers
– Decompose, Assign, Orchestrate, Map
– Can lead to high quality solutions in some domains

● Provides a common vocabulary for the programming community
– Each pattern has a name, providing a vocabulary for discussing solutions

● Helps with software reusability, malleability, and modularity
– Written in a prescribed format so the reader can quickly understand the solution and its context

● Otherwise, parallel programming is too difficult for programmers, and software will not fully exploit parallel hardware


History

● Berkeley architecture professor Christopher Alexander

● In 1977, published patterns for city planning, landscaping, and architecture in an attempt to capture principles for “living” design


Example 167 (p. 783): 6ft Balcony


Patterns in Object-Oriented Programming

● Design Patterns: Elements of Reusable Object-Oriented Software (1995)
– Gang of Four (GoF): Gamma, Helm, Johnson, Vlissides
– Catalogue of patterns
– Creational, structural, behavioral


Patterns for Parallelizing Programs

4 Design Spaces

Algorithm Expression

● Finding Concurrency
– Expose concurrent tasks

● Algorithm Structure
– Map tasks to processes to exploit the parallel architecture

Software Construction

● Supporting Structures
– Code and data structuring patterns

● Implementation Mechanisms
– Low-level mechanisms used to write parallel programs

Patterns for Parallel Programming. Mattson, Sanders, and Massingill (2005).


Here’s my algorithm. Where’s the concurrency?

[Figure: MPEG decoder block diagram. The MPEG bit stream enters VLD, producing macroblocks and motion vectors. Frequency-encoded macroblocks flow through split → ZigZag → IQuantization → IDCT → Saturation → join; differentially coded motion vectors flow through Motion Vector Decode → Repeat. Spatially encoded macroblocks and motion vectors feed Motion Compensation; the recovered picture goes through Picture Reorder → Color Conversion → Display.]


● Task decomposition
– Independent coarse-grained computation
– Inherent to the algorithm

● Sequence of statements (instructions) that operate together as a group
– Corresponds to some logical part of the program
– Usually follows from the way the programmer thinks about a problem

[Figure: the same MPEG decoder diagram, repeated for this slide.]


[Figure: the same MPEG decoder diagram, repeated for this slide.]

● Task decomposition
– Parallelism in the application

● Data decomposition
– The same computation is applied to small data chunks derived from a large data set


[Figure: the same MPEG decoder diagram, repeated for this slide.]

● Task decomposition
– Parallelism in the application

● Data decomposition
– Same computation, many data elements

● Pipeline decomposition
– Data assembly lines
– Producer-consumer chains
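Task and data decomposition can be contrasted in a small sketch (plain Python with threads; the stage functions are hypothetical stand-ins for independent decoder tasks, not from the lecture):

```python
from concurrent.futures import ThreadPoolExecutor

def decode_motion_vectors(data):   # hypothetical stand-in task
    return [d * 2 for d in data]

def decode_macroblocks(data):      # hypothetical stand-in task
    return [d + 1 for d in data]

data = list(range(8))

with ThreadPoolExecutor() as pool:
    # Task decomposition: two *different* computations run concurrently.
    mv = pool.submit(decode_motion_vectors, data)
    mb = pool.submit(decode_macroblocks, data)
    task_results = (mv.result(), mb.result())

    # Data decomposition: the *same* computation on separate chunks.
    chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
    chunk_results = list(pool.map(decode_macroblocks, chunks))

print(task_results)
print(chunk_results)
```

Pipeline decomposition, the third option, chains such stages so different data items occupy different stages at the same time.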


Guidelines for Task Decomposition

● Algorithms start with a good understanding of the problem being solved

● Programs often naturally decompose into tasks
– Two common decompositions are function calls and distinct loop iterations

● It is easier to start with many tasks and later fuse them than to start with too few and later try to split them
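Decomposing distinct loop iterations into tasks, and then fusing them into coarser tasks, can be sketched as follows (a minimal illustration; the per-iteration function and chunk size are mine):

```python
from concurrent.futures import ThreadPoolExecutor

def work(i):
    # Hypothetical per-iteration computation; in a real program this
    # would be the loop body being parallelized.
    return i * i

n = 16
sequential = [work(i) for i in range(n)]

# One task per iteration: many fine-grained tasks.
with ThreadPoolExecutor(max_workers=4) as pool:
    fine = list(pool.map(work, range(n)))

# Fusing iterations into fewer, coarser tasks amortizes task overhead.
def fused(chunk):
    return [work(i) for i in chunk]

chunks = [range(c, c + 4) for c in range(0, n, 4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    coarse = [r for part in pool.map(fused, chunks) for r in part]

print(fine == sequential and coarse == sequential)  # both match
```

Starting fine-grained and fusing (as the slide advises) only changes `fused` and `chunks`; splitting an overly coarse task would mean rewriting `work` itself.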


Guidelines for Task Decomposition

● Flexibility
– Program design should afford flexibility in the number and size of tasks generated
– Tasks should not be tied to a specific architecture
– Fixed tasks vs. parameterized tasks

● Efficiency
– Tasks should have enough work to amortize the cost of creating and managing them
– Tasks should be sufficiently independent that managing dependencies doesn’t become the bottleneck

● Simplicity
– The code has to remain readable and easy to understand and debug


Guidelines for Data Decomposition

● Data decomposition is often implied by task decomposition

● Programmers need to address both task and data decomposition to create a parallel program
– Which decomposition to start with?

● Data decomposition is a good starting point when
– The main computation is organized around manipulation of a large data structure
– Similar operations are applied to different parts of the data structure


Common Data Decompositions

● Array data structures
– Decomposition of arrays along rows, columns, or blocks

● Recursive data structures
– Example: decomposition of trees into sub-trees

[Figure: divide-and-conquer decomposition. A problem is split into subproblems, which are split again; the leaf subproblems are computed, then merged pairwise back into the solution.]
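The split/compute/merge structure in the figure can be sketched as recursive divide-and-conquer over an array (a sequential sketch of the pattern; in a parallel program the two recursive calls would run as independent tasks):

```python
def divide_and_conquer(data, compute, merge):
    # Split until the subproblem is small enough to compute directly.
    if len(data) <= 2:
        return compute(data)
    mid = len(data) // 2
    # The two subproblems are independent: candidates for parallel tasks.
    left = divide_and_conquer(data[:mid], compute, merge)
    right = divide_and_conquer(data[mid:], compute, merge)
    return merge(left, right)

# Example: summing an array with this pattern.
total = divide_and_conquer(list(range(10)), sum, lambda a, b: a + b)
print(total)  # 45
```

The recursion tree mirrors the figure: splits on the way down, computes at the leaves, merges on the way back up.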


Guidelines for Data Decomposition

● Flexibility
– The size and number of data chunks should support a wide range of executions

● Efficiency
– Data chunks should generate comparable amounts of work (for load balancing)

● Simplicity
– Complex data decompositions can get difficult to manage and debug


Case for Pipeline Decomposition

● Data is flowing through a sequence of stages
– An assembly line is a good analogy

● What’s a prime example of pipeline decomposition in computer architecture?
– The instruction pipeline in modern CPUs

● What’s an example pipeline you may use in your UNIX shell?
– Pipes in UNIX: cat foobar.c | grep bar | wc

● Other examples
– Signal processing
– Graphics

[Figure: pipeline fragment from the MPEG decoder: ZigZag → IQuantization → IDCT → Saturation.]
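A pipeline decomposition can be sketched as producer-consumer stages connected by queues (a minimal sketch with threads; the stage functions are mine, standing in for decoder stages like those in the figure):

```python
import queue
import threading

def stage(fn, inq, outq):
    """Apply fn to items from inq, passing results downstream until a
    None sentinel arrives."""
    while True:
        item = inq.get()
        if item is None:
            outq.put(None)  # propagate shutdown to the next stage
            return
        outq.put(fn(item))

# Hypothetical stages standing in for ZigZag -> IQuantization -> IDCT.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
           for i, fn in enumerate(stages)]
for t in threads:
    t.start()

for item in [1, 2, 3]:          # producer feeds the first stage
    queues[0].put(item)
queues[0].put(None)

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)
for t in threads:
    t.join()

print(results)  # [1, 3, 5]
```

While item 3 is in the first stage, items 1 and 2 occupy later stages: that overlap is the pipeline parallelism, just as in a CPU instruction pipeline.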


Re-engineering for Parallelism


Reengineering for Parallelism

● Parallel programs often start as sequential programs
– Easier to write and debug
– Legacy codes

● How to reengineer a sequential program for parallelism: survey the landscape
– The pattern provides a list of questions to help assess existing code
– Many are the same as in any reengineering project
– Is the program numerically well-behaved?

● Define the scope and get users’ acceptance
– Required precision of results
– Input range
– Performance expectations
– Feasibility (back-of-envelope calculations)


Reengineering for Parallelism

● Define a testing protocol

● Identify program hot spots: where is most of the time spent?
– Look at the code
– Use profiling tools

● Parallelization
– Start with the hot spots first
– Make sequences of small changes, each followed by testing
– The pattern provides guidance
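Hot spots can be located with a standard profiler; for instance, in Python (a minimal illustration, not from the slides; `hot` and `program` are made-up names):

```python
import cProfile
import io
import pstats

def hot(n):
    # Deliberately expensive loop: the hot spot we want the profiler to find.
    return sum(i * i for i in range(n))

def program():
    return [hot(10_000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Report the functions where most of the (cumulative) time is spent.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue().strip().splitlines()[0])  # summary line of the report
```

The report ranks functions by cumulative time, so `hot` surfaces immediately; that is where parallelization effort should go first.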


Example: Molecular dynamics

● Simulate motion in a large molecular system
– Used, for example, to understand drug-protein interactions

● Forces
– Bonded forces within a molecule
– Long-range forces between atoms

● The naïve algorithm has n² interactions: not feasible

● Use the cutoff method: only consider forces from neighbors that are “close enough”


Sequential Molecular Dynamics Simulator

// pseudo code
real[3,n] atoms
real[3,n] force
int [2,m] neighbors

function simulate(steps)
  for time = 1 to steps and for each atom
    Compute bonded forces
    Compute neighbors
    Compute long-range forces
    Update position
  end loop
end function
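The cutoff idea in the long-range-force step can be sketched in runnable form (plain Python with 1-D positions for brevity; the force law and cutoff radius are mine, not from the lecture):

```python
def long_range_forces(positions, cutoff):
    """Accumulate pairwise forces, but only for pairs within the cutoff
    distance -- the contributions a cutoff method keeps."""
    n = len(positions)
    forces = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = positions[j] - positions[i]
            if abs(r) > cutoff:
                continue  # too far away: contribution ignored
            f = 1.0 / (r * r)            # hypothetical inverse-square magnitude
            direction = 1.0 if r > 0 else -1.0
            forces[i] += f * direction   # equal and opposite contributions
            forces[j] -= f * direction
    return forces

print(long_range_forces([0.0, 1.0, 10.0], cutoff=2.0))  # [1.0, -1.0, 0.0]
```

This sketch still scans all pairs; a real simulator uses the neighbor list from the pseudocode’s "Compute neighbors" step so pairs beyond the cutoff are never visited at all.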


Finding Concurrency Design Space

[Figure: the Finding Concurrency design space contains Decomposition Patterns, Dependency Analysis Patterns, and Design Evaluation.]


Decomposition Patterns

● The main computation is a loop over atoms

● Suggests a task decomposition
– A task corresponds to a loop iteration: update a single atom
– Additional tasks: calculate bonded forces, calculate long-range forces, find neighbors, update position

● There is data shared between the tasks

for time = 1 to steps and for each atom
  Compute bonded forces
  Compute neighbors
  Compute long-range forces
  Update position
end loop


Understand Control Dependences

[Figure: control dependence graph. Bonded forces and the neighbor list can be computed independently; the neighbor list feeds the long-range forces; the forces feed the position update, which leads to the next time step.]


Understand Data Dependences

[Figure: the same task graph annotated with data dependences on atoms[3,n], forces[2,n], and neighbors[2,m]; each access is marked Read, Write, or Accumulate (the force arrays, for example, are accumulated into by the force tasks).]


Evaluate Design

● What is the target architecture?
– Shared memory, distributed memory, message passing, …

● Does data sharing have enough special properties (read-only, accumulate, temporal constraints) that we can deal with the dependences efficiently?

● If the design seems OK, move on to the next design space