ece 669 parallel computer architecture lecture 5 grid computations

22
ECE669 L5: Grid Computations February 12, 2004 ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

Upload: brigid

Post on 11-Feb-2016

41 views

Category:

Documents


0 download

DESCRIPTION

ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations. Outline. Motivating Problems (application case studies) Classifying problems Parallelizing applications Examining tradeoffs Understanding communication costs Remember: software and communication! . Jacobi, ... - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

ECE 669

Parallel Computer Architecture

Lecture 5

Grid Computations

Page 2: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Outline

° Motivating Problems (application case studies)

° Classifying problems

° Parallelizing applications

° Examining tradeoffs

° Understanding communication costs

• Remember: software and communication!

Page 3: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Current status° We saw how to set up a system of equations° How to solve them° Poisson: Basic idea

° In iterative methods

• Iterate till no difference• The ultimate parallel method

IterativeDirect

Jacobi, ...Multigrid... . . .

Or

0 for Laplace

0

1s2 i 1, jA i 1, jA i , j1A i ,j 1

kA i ,j4A i , jB

Ai,j

Ai1,j Ai1, j i, j1A i, j1A4 i,jC

i , jk1A i 1, j

kA i 1, jkA i ,j 1

kA i ,j 1kA

4 i , jC

Page 4: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Examining Optimizations

• Optimizations • SOR• Gauss-Seidel

•Use recent values ASAP

Slow!

Jacobi relaxation

nO(n2)

Page 5: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Parallel Implementation

° Q: How would you partition the problem? • Say, on 4 processors?

° Communication! ° What about synchronization?

Ai, j

4

i: 0 1 2 3 4 5 6 7j: 0

654321

7

Page 6: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Machine model

• Data is distributed among memories (ignore initial I/O costs)• Communication over network-explicit• Processor can compute only on data in local memory.

- To effect communication, processor sends data to other node

Interconnectionnetwork

M

P

M M

P P

Page 7: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Turbo charging – Iterative methods

° SOR: Successive Over Relaxation• Accelerate towards direction of change

Difference (new-old)0(n2)

n

2A B 0

i,jk1A i1, j

kA i1, jkA i,j1

kA i,j 1kA

4 i,jb

i,jk1A i, j

kA w i1, jkA i1,j

kA i,j1kA i, j1

kA4 i, j

kA

Page 8: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

° Basic idea ---> Solve on coarse grid ---> then on fine grid

• In practice -- solve for errors in next finer grid. But communication and computation patterns stay the same.

Multigrid

8, 1

1, 1

8, 8

1, 8

X k+1

Page 9: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

°

Multigrid

Basic idea ---> Solve on coarse grid ---> then on fine grid

8, 1

1, 1

8, 8

1, 8

X k+1

Page 10: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

8, 1

1, 1

8, 8

1, 8

Basic idea ---> Solve on coarse grid ---> then on fine grid

(Note: In practice -- solve for errors in next finer grid. But communication and computation patterns stay the same.)

8, 1

1, 1

8, 8

1, 8

°

Multigrid

X k+1 i, j

Page 11: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Example: iterative equation solver

° Simplified version of a piece of Ocean simulation° Illustrate program in low-level parallel language

• C-like pseudocode with simple extensions for parallelism• Expose basic comm. and synch. primitives• State of most real parallel programming today

A[i,j ] = 0.2 (A[i,j ] + A[i,j – 1] + A[i – 1, j] +A[i,j + 1] + A[i + 1, j ])

Expression for updating each interior point:

Page 12: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Grid Solver

° Gauss-Seidel (near-neighbor) sweeps to convergence• Interior n-by-n points of (n+2)-by-(n+2) updated in each sweep• Updates done in-place in grid• Difference from previous value computed• Check if has converged

- to within a tolerance parameter

A[i,j ] = 0.2 (A[i,j ] + A[i,j – 1] + A[i – 1, j] +A[i,j + 1] + A[i + 1, j ])

Expression for updating each interior point:

Page 13: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Exploit Application Knowledge

• Different ordering of updates: may converge quicker or slower • Red sweep and black sweep are each fully parallel: • Global synchronization between them

Red point

Black point

•Reorder grid traversal: red-black ordering

Page 14: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Point to Point Event Synchronization

° One process notifies another of an event so it can proceed

• Common example: producer-consumer (bounded buffer)• Concurrent programming on uniprocessor: semaphores• Shared address space parallel programs: semaphores, or use

ordinary variables as flags

•Busy-waiting or spinning

P1 P2

A = 1;a: while (flag is 0) do nothing; b: flag = 1;

print A;

Page 15: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Group Event Synchronization

° Subset of processes involved• Can use flags or barriers (involving only the subset)• Concept of producers and consumers

° Major types:• Single-producer, multiple-consumer• Multiple-producer, single-consumer• Multiple-producer, single-consumer

Page 16: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Message Passing Grid Solver

° Cannot declare A to be global shared array• compose it logically from per-process private arrays• usually allocated in accordance with the assignment of work

- process assigned a set of rows allocates them locally

° Transfers of entire rows between traversals ° Structurally similar to shared memory° Orchestration different

• data structures and data access/naming• communication• synchronization

° Ghost rows

Page 17: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Data Layout and Orchestration

P0

P1

P2

P4

P0

P2

P4

P1

Data partition allocated per processor

Add ghost rows to hold boundary data

Send edges to neighbors

Receive into ghost rows

Compute as in sequential program

Page 18: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Notes on Message Passing Program

° Use of ghost rows° Communication done at beginning of iteration, so no

asynchrony° Communication in whole rows, not element at a time° Core similar, but indices/bounds in local rather than global

space° Synchronization through sends and receives

• Could implement locks and barriers with messages

Page 19: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Send and Receive Alternatives

• Affect event synch (mutual excl. by fiat: only one process touches data)• Affect ease of programming and performance

° Synchronous messages provide built-in synch. through match

Can extend functionality: stride, scatter-gather, groups

Semantic flavors: based on when control is returnedAffect when data structures or buffers can be reused at either end Send/Receive

Synchronous Asynchronous

Blocking asynch. Nonblocking asynch.

Page 20: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Orchestration: Summary

° Shared address space• Shared and private data explicitly separate• Communication implicit in access patterns• No correctness need for data distribution• Synchronization via atomic operations on shared data• Synchronization explicit and distinct from data communication

° Message passing• Data distribution among local address spaces needed• No explicit shared structures (implicit in comm. patterns)• Communication is explicit• Synchronization implicit in communication (at least in synch.

case)

Page 21: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Correctness in Grid Solver Program

° Decomposition and Assignment similar in SAS and message-passing

° Orchestration is different• Data structures, data access/naming, communication,

synchronization• Performance?

SAS Msg-Passing

Explicit global data structure? Yes No

Assignment indept of data layout? Yes No

Communication Implicit Explicit

Synchronization Explicit Implicit

Explicit replication of border rows? No Yes

Page 22: ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004

Summary

° Several techniques to parallelizing grid problems

° Specify as a series of difference relations

° Use of currently computer values can speed convergence

° Multigrid methods require specialized communication

° Understanding shared memory and message passing constraints for grid computation

• Remember: software and communication!