CS 770G - Parallel Algorithms in Scientific Computing, May 9, 2001. Lecture 2: Message-Passing I: Communication


Page 1

CS 770G - Parallel Algorithms in Scientific Computing

May 9, 2001, Lecture 2

Message-Passing I: Communication

Page 2

References

• Parallel Computer Architecture: A Hardware/Software Approach. Culler, Singh, Gupta. Morgan Kaufmann.

• Introduction to Parallel Computing: Design and Analysis of Algorithms. Kumar, Grama, Gupta, Karypis. Benjamin/Cummings.

Page 3

Routing Mechanism for Static Networks

• The routing mechanism determines the path a message takes through the network to get from the source proc to the destination proc.

• Minimal routing: selects one of the shortest paths.

• Nonminimal routing: may use a longer path to avoid network congestion.

• Deterministic routing: determines a unique path based only on the source and destination.

• Adaptive routing: determines a path dynamically to avoid network congestion.

Page 4

Examples of Deterministic Minimal Routing

• XY-routing for a 2D mesh: route along the X-dimension first, then along the Y-dimension.

• E-cube routing for a hypercube: based on the Hamming distance between source and destination, correcting differing bits starting with the least significant bit.
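E-cube routing can be made concrete with a short sketch (the function name and labels are illustrative, not from the slides): XOR the current node label with the destination and always traverse the least significant differing dimension.

```python
def ecube_route(src: int, dst: int) -> list:
    """Sequence of hypercube node labels visited by E-cube routing:
    repeatedly flip the least significant bit in which the current
    node differs from the destination. Illustrative sketch."""
    path = [src]
    cur = src
    diff = cur ^ dst
    while diff:
        low_bit = diff & -diff   # least significant differing dimension
        cur ^= low_bit           # traverse that dimension
        path.append(cur)
        diff = cur ^ dst
    return path
```

The number of hops equals the Hamming distance between `src` and `dst`, so the route is minimal as well as deterministic.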

Page 5

Communication Cost

• Latency
– Sum of the time to prepare a message for transmission and the time taken by the message to traverse the network to its destination.

• Principal parameters:
– Startup time (ts): time required to handle a message (prepare the message, execute the routing algorithm, establish an interface between proc & router).

– Per-hop time (th): time taken by the header of a message to travel between two directly connected procs.

– Per-word transfer time (tw): if the channel bandwidth is r words/s, then tw = 1/r.

Page 6

Switching Techniques I

Store-and-forward routing:
• When a message traverses a path with multiple links, each intermediate proc forwards the message only after it has received the entire message.

• Suppose the message size is m words and the path has l links.
• Total traversal time = (th + m tw) l.
• Total communication time (tcomm):

tcomm = ts + (th + m tw) l

Page 7

Switching Techniques II

Cut-through routing:
• A message travels in small units (packets) called flits (flow-control digits).

• As soon as a flit is received at an intermediate proc, it is passed on to the next proc.
• No need for a (large) buffer to store the entire message.
• Requires less memory bandwidth.
• Deadlock may occur! It can be avoided by using, e.g., XY-routing or E-cube routing.

• tcomm = ts + l th + m tw
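The two cost models can be compared directly with a small sketch (function names and the example parameter values are illustrative):

```python
def t_store_and_forward(ts, th, tw, m, l):
    """tcomm = ts + (th + m*tw) * l: each of the l links carries
    the entire m-word message before the next hop starts."""
    return ts + (th + m * tw) * l

def t_cut_through(ts, th, tw, m, l):
    """tcomm = ts + l*th + m*tw: only the header pays the per-hop
    cost; the message body is pipelined through the links."""
    return ts + l * th + m * tw
```

For long messages over many links the m tw term is paid once instead of l times, which is why cut-through dominates store-and-forward as m and l grow.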

Page 8

Basic Communication Operations

• Point-to-point communication.

• One-to-all broadcast.

• All-to-all broadcast.

• One-to-all personalized.

• All-to-all personalized.

Page 9

Point-to-Point Communication

• Sending a message from one proc to another.
• Store-and-forward routing (tcomm = ts + tw m l). Single-message transfer time with p procs:
– Ring: ts + tw m p/2.
– Mesh: ts + 2 tw m √p/2.
– Hypercube: ts + tw m log p.

• Cut-through routing: tcomm = ts + tw m + th l.
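The topology-specific store-and-forward times above can be sketched as one helper (the function name is illustrative; p is assumed to be a perfect square for the mesh and a power of 2 for the hypercube):

```python
import math

def p2p_time_sf(topology, ts, tw, m, p):
    """Store-and-forward transfer time between the two most distant
    procs, following the slide's formulas. Illustrative sketch."""
    if topology == "ring":
        # farthest proc is p/2 hops away
        return ts + tw * m * (p // 2)
    if topology == "mesh":
        # sqrt(p)/2 hops in each of the two dimensions
        return ts + 2 * tw * m * (math.isqrt(p) // 2)
    if topology == "hypercube":
        # diameter is log p
        return ts + tw * m * math.log2(p)
    raise ValueError("unknown topology: " + topology)
```

With ts = 0, tw = m = 1 and p = 16, the ring takes 8 units while the mesh and hypercube take 4, illustrating how diameter drives the cost.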

Page 10

One-to-All Broadcast

• One proc sends a message to all (or a subset) of the procs.
• Reverse direction → all-to-one (collective) communication.
• Store-and-forward routing (tcomm = ts + tw m l):

• Ring
– Each proc receives the message on one of its links and passes it on through its other link.
– Time = (ts + tw m) p/2.

• Mesh
– Ring broadcast along the rows, then along the columns.
– Time = 2 (ts + tw m) √p/2.

Page 11

One-to-All Broadcast (cont.)

• Hypercube
– The message is sent along one dimension at a time.
– Time = (ts + tw m) log p.

• Note: (under certain assumptions) one-to-all broadcast cannot be performed in less than (ts + tw m) log p time. One reason is that, on a hypercube, every opportunity to send the message is used.
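The hypercube broadcast can be simulated in a few lines (a sketch; the function name is illustrative): in step i, every proc that already holds the message forwards it across dimension i, so the set of informed procs doubles each step.

```python
def hypercube_broadcast(p, root=0):
    """Simulate one-to-all broadcast on a p-proc hypercube
    (p a power of 2): one dimension per step, informed set doubles.
    Returns the number of steps, illustrating the log p bound."""
    has_msg = {root}
    steps = 0
    dim = 1
    while dim < p:
        # every informed proc sends across the current dimension
        has_msg |= {node ^ dim for node in has_msg}
        dim <<= 1
        steps += 1
    assert has_msg == set(range(p))  # everyone informed
    return steps
```

Since each step costs ts + tw m and there are log p steps, the total matches the slide's (ts + tw m) log p.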

Page 12

All-to-All Broadcast

• All p procs simultaneously initiate a broadcast.
• Reverse direction → reduction communication.
• Store-and-forward routing (tcomm = ts + tw m l):

• Ring
– Each proc first sends the data it needs to broadcast to one of its neighbors.

– Then, in each subsequent step, it forwards the data received from one neighbor to its other neighbor.

– Time = (ts + tw m) (p - 1).

Page 13

All-to-All Broadcast (cont.)

• Mesh
– First phase: all-to-all ring broadcast along the rows.
– time_x = (ts + tw m) (√p - 1).
– Second phase: all-to-all ring broadcast along the columns.
– time_y = (ts + tw m √p) (√p - 1).
– Total time = 2 ts (√p - 1) + tw m (p - 1).

Page 14

All-to-All Broadcast (cont.)

• Hypercube
– log p steps.
– In every step, pairs of procs exchange their data and double the size of the message to be transmitted in the next step.

– time_i = ts + tw m 2^(i-1).
– Total time = ts log p + tw m (p - 1).
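The doubling pattern above can be checked with a short simulation (a sketch; the function name is illustrative): in step i, each proc exchanges everything it has accumulated with its partner across dimension i.

```python
def hypercube_all_to_all(p):
    """Simulate all-to-all broadcast on a p-proc hypercube
    (p a power of 2). Returns the message size (in units of the
    original m-word block) sent in each of the log p steps."""
    data = {node: {node} for node in range(p)}  # each proc starts with its own block
    sizes = []
    dim = 1
    while dim < p:
        sizes.append(len(data[0]))              # blocks sent this step: 2^(i-1)
        # pairs across the current dimension exchange accumulated data
        data = {node: data[node] | data[node ^ dim] for node in range(p)}
        dim <<= 1
    assert all(blocks == set(range(p)) for blocks in data.values())
    return sizes
```

The per-step sizes 1, 2, 4, ... sum to p - 1 blocks, which is exactly the tw m (p - 1) term in the total time.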

Page 15

One-to-All Personalized Comm.

• Scatter.
• A single proc sends a unique message to every other proc.
• Reverse direction → gather communication.
• Similar to all-to-all broadcast.
• In all-to-all broadcast, each proc receives m(p - 1) words. In one-to-all personalized comm, the source proc sends m words to each of the other p - 1 procs.

Page 16

All-to-All Personalized Comm.

• All scatter.
• Each proc sends a unique message to every other proc.
• Used in the parallel fast Fourier transform, matrix transpose, etc.