Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers

Barry Wilkinson and Michael Allen

Prentice Hall, 1998

Figure 1.1 Astrophysical N-body simulation by Scott Linssen (undergraduate University of North Carolina at Charlotte [UNCC] student).



Figure 1.2 Conventional computer having a single processor and memory.
(Diagram labels: Main memory; Processor; Instructions (to processor); Data (to or from processor).)


Figure 1.3 Traditional shared memory multiprocessor model.
(Diagram labels: Processors; Interconnection network; Memory modules; One address space.)


Figure 1.4 Message-passing multiprocessor model (multicomputer).
(Diagram labels: Computers, each with a Processor and Local memory; Interconnection network; Messages.)


Figure 1.5 Shared memory multiprocessor implementation.
(Diagram labels: Computers, each with a Processor and Shared memory; Interconnection network; Messages.)


Figure 1.6 MPMD structure.
(Diagram labels, each shown twice: Program; Instructions; Processor; Data.)


Figure 1.7 Static link multicomputer.
(Diagram labels: Computers, each drawn with P, M, and C blocks; Network with direct links between computers.)


Figure 1.8 Node with a switch for internode message transfers.
(Diagram labels: Computer (node) containing Processor, Memory, and Switch; Links to other nodes.)


Figure 1.9 A link between two nodes with separate wires in each direction.
(Diagram labels: Node; Link; Node.)


Figure 1.10 Ring.


Figure 1.11 Two-dimensional array (mesh).
(Diagram labels: Links; Computer/processor.)


Figure 1.12 Tree structure.
(Diagram labels: Root; Processing element; Links.)


Figure 1.13 Three-dimensional hypercube.
(Node addresses: 000, 001, 010, 011, 100, 101, 110, 111.)


Figure 1.14 Four-dimensional hypercube.
(Node addresses: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111.)
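The node addresses in Figures 1.13 and 1.14 follow the usual hypercube rule: two nodes are linked exactly when their binary addresses differ in one bit. The sketch below illustrates that rule in C. The function names `hypercube_neighbors` and `hypercube_distance` are hypothetical helpers introduced here for illustration, not taken from the slides.

```c
/* Write the d neighbours of `node` in a d-dimensional hypercube into out[].
   Each neighbour differs from `node` in exactly one address bit.
   (Hypothetical helper; out[] must have room for d entries.) */
void hypercube_neighbors(unsigned node, unsigned d, unsigned out[])
{
    for (unsigned bit = 0; bit < d; bit++)
        out[bit] = node ^ (1u << bit);   /* flip one address bit */
}

/* Minimum number of links between nodes a and b: the number of address
   bits in which they differ (the Hamming distance). */
unsigned hypercube_distance(unsigned a, unsigned b)
{
    unsigned diff = a ^ b, hops = 0;
    while (diff) {
        hops += diff & 1u;
        diff >>= 1;
    }
    return hops;
}
```

For example, node 101 in the three-dimensional hypercube has neighbours 100, 111, and 001, and nodes 0000 and 1111 in the four-dimensional hypercube are four links apart.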


Figure 1.15 Embedding a ring onto a torus.
(Diagram label: Ring.)


Figure 1.16 Embedding a mesh into a hypercube.
(Diagram labels: x and y mesh coordinates, each labeled 00, 01, 11, 10; example Nodal address 1011.)
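The coordinate labels 00, 01, 11, 10 in Figure 1.16 are a two-bit Gray code sequence, so adjacent mesh positions get labels that differ in one bit and therefore land on directly linked hypercube nodes. A minimal sketch in C follows. The function names are hypothetical, and placing the x label in the high bits and the y label in the low bits is an assumption made here to reproduce the example address 1011; the figure residue does not state the order.

```c
/* Reflected binary Gray code of i: consecutive values differ in one bit.
   For i = 0, 1, 2, 3 this gives 00, 01, 11, 10. */
unsigned gray(unsigned i)
{
    return i ^ (i >> 1);
}

/* Hypercube address for mesh position (x, y) on a 2^bits by 2^bits mesh.
   Assumption: the x label forms the high bits and the y label the low bits. */
unsigned mesh_to_hypercube(unsigned x, unsigned y, unsigned bits)
{
    return (gray(x) << bits) | gray(y);
}
```

Under that assumption, the mesh position with x label 10 and y label 11 (x = 3, y = 2) maps to nodal address 1011.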


Figure 1.17 Embedding a tree into a mesh.
(Diagram labels: Root; six nodes marked A.)


Figure 1.18 Distribution of flits.
(Diagram labels: Packet; Head; Flit buffer; Movement; Request/Acknowledge signal(s).)


Figure 1.19 A signaling method between processors for wormhole routing (Ni and McKinley, 1993).
(Diagram labels: Source processor; Destination processor; Data; R/A.)


Figure 1.20 Network delay characteristics.
(Plot: Network latency against Distance (number of nodes between source and destination); curves for Packet switching, Circuit switching, and Wormhole routing.)


Figure 1.21 Deadlock in store-and-forward networks.
(Diagram labels: Node 1; Node 2; Node 3; Node 4; Messages.)


Figure 1.22 Multiple virtual channels mapped onto a single physical channel.
(Diagram labels: Node; Route buffer; Virtual channel; Physical link; Node.)


Figure 1.23 Ethernet-type single wire network.
(Diagram labels: Ethernet; Workstations; Workstation/file server.)


Figure 1.24 Ethernet frame format.
(Fields, in the Direction of transmission: Preamble (64 bits); Destination address (48 bits); Source address (48 bits); Type (16 bits); Data (variable); Frame check sequence (32 bits).)
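The field widths in Figure 1.24 make the fixed per-frame overhead easy to compute. The sketch below, in C, adds up the fixed fields and reports how much of a frame is payload. The constant and function names are hypothetical, and the calculation deliberately ignores anything the figure does not show, such as minimum frame sizes or gaps between frames.

```c
/* Fixed field widths from Figure 1.24, in bits. */
enum {
    ETH_PREAMBLE_BITS = 64,
    ETH_DEST_BITS     = 48,
    ETH_SRC_BITS      = 48,
    ETH_TYPE_BITS     = 16,
    ETH_FCS_BITS      = 32
};

/* Total bits on the wire for a frame carrying data_bytes of payload
   (hypothetical helper; counts only the fields shown in the figure). */
unsigned long eth_frame_bits(unsigned long data_bytes)
{
    return ETH_PREAMBLE_BITS + ETH_DEST_BITS + ETH_SRC_BITS
         + ETH_TYPE_BITS + ETH_FCS_BITS + 8ul * data_bytes;
}

/* Fraction of the frame that is payload, between 0 and 1. */
double eth_payload_fraction(unsigned long data_bytes)
{
    return (8.0 * (double)data_bytes) / (double)eth_frame_bits(data_bytes);
}
```

The fixed fields total 208 bits (26 bytes), so short messages are dominated by overhead: a 26-byte payload fills only half of its frame.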


Figure 1.25 Network of workstations connected via a ring.
(Diagram labels: Network; Workstations; Workstation/file server.)


Figure 1.26 Star connected network.
(Diagram labels: Workstation/file server; Workstations.)


Figure 1.27 Overlapping connectivity Ethernets.
(Panels: (a) Using specially designed adaptors; (b) Using separate Ethernet interfaces. Diagram label: Parallel programming cluster.)


Figure 1.28 Space-time diagram of a message-passing program.
(Diagram labels: Process 1 to Process 4 against Time; Computing; Waiting to send a message; Message; Slope indicating time to send message.)


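Figure 1.29 Parallelizing sequential problem: Amdahl's law.
(Panels: (a) One processor, total time t_s, made up of a Serial section taking f t_s and Parallelizable sections taking (1 − f)t_s; (b) Multiple processors, total time t_p, with the parallelizable sections taking (1 − f)t_s/n on n processors.)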
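The quantities labeled in Figure 1.29 combine into the standard statement of Amdahl's law. The short derivation below uses only the figure's symbols: t_s is the sequential time, f the serial fraction, n the number of processors, and t_p the parallel time.

```latex
t_p = f\,t_s + \frac{(1 - f)\,t_s}{n}
\qquad\Longrightarrow\qquad
S(n) = \frac{t_s}{t_p}
     = \frac{t_s}{f\,t_s + (1 - f)\,t_s / n}
     = \frac{n}{1 + (n - 1)\,f},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{f}.
```

With f = 5%, for instance, the speedup can never exceed 20 however many processors are used.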


Figure 1.30 (a) Speedup against number of processors. (b) Speedup against serial fraction, f.
(Panel (a): Speedup factor, S(n), from 4 to 20, against Number of processors, n, from 4 to 20; curves for f = 0%, f = 5%, f = 10%, and f = 20%. Panel (b): Speedup factor, S(n), from 4 to 20, against Serial fraction, f, from 0.2 to 1.0; curves for n = 16 and n = 256.)
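The curves in Figure 1.30 can be regenerated from Amdahl's law, S(n) = n / (1 + (n − 1) f). The C sketch below evaluates that expression. The function names are hypothetical, and `amdahl_limit` assumes f is greater than zero.

```c
/* Speedup predicted by Amdahl's law for serial fraction f (0 <= f <= 1)
   on n processors: S(n) = n / (1 + (n - 1) f). */
double amdahl_speedup(double f, unsigned n)
{
    return (double)n / (1.0 + ((double)n - 1.0) * f);
}

/* Upper bound on speedup as n grows without limit; requires f > 0. */
double amdahl_limit(double f)
{
    return 1.0 / f;
}
```

Calling `amdahl_speedup` with f = 0.0, 0.05, 0.10, and 0.20 for n from 4 to 20 gives the four curves of panel (a); fixing n at 16 or 256 and sweeping f gives panel (b).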


Figure 2.1 Single program, multiple data operation.
(Diagram labels: Source file; Compile to suit processor; Executables; Processor 0 to Processor n − 1.)


Figure 2.2 Spawning a process.
(Diagram labels: Process 1 calls spawn(); Start execution of process 2; Process 2; Time.)


Figure 2.3 Passing a message between processes using send() and recv() library calls.
(Diagram: Process 1 executes send(&x, 2); Process 2 executes recv(&y, 1); Movement of data from x to y.)


Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol.
(a) When send() occurs before recv(): Process 1 calls send(), issues a Request to send, and suspends; when Process 2 reaches recv(), it returns an Acknowledgment, the Message is transferred, and both processes continue.
(b) When recv() occurs before send(): Process 2 calls recv() and suspends; when Process 1 reaches send(), it issues a Request to send, an Acknowledgment is returned, the Message is transferred, and both processes continue.
(Vertical axis in both panels: Time.)


Figure 2.5 Using a message buffer.
(Diagram: Process 1 calls send(), the message goes to the Message buffer, and Process 1 continues; Process 2 later calls recv() and reads the message buffer. Vertical axis: Time.)


Figure 2.6 Broadcast operation.
(Diagram labels: Process 0, Process 1, ..., Process n − 1. Code row: every process calls bcast(). Action row: buf in the root process is copied to data in every process.)


Figure 2.7 Scatter operation.
(Diagram labels: Process 0, Process 1, ..., Process n − 1. Code row: every process calls scatter(). Action row: a different element of buf in the root process is sent to data in each process.)


Figure 2.8 Gather operation.
(Diagram labels: Process 0, Process 1, ..., Process n − 1. Code row: every process calls gather(). Action row: data from each process is collected into buf in the root process.)


Figure 2.9 Reduce operation (addition).
(Diagram labels: Process 0, Process 1, ..., Process n − 1. Code row: every process calls reduce(). Action row: data from each process is combined with + into buf in the root process.)


Figure 2.10 Message passing between workstations using PVM.
(Diagram labels: three Workstations, each running a PVM daemon and an Application program (executable); Messages sent through network.)


Figure 2.11 Multiple processes allocated to each processor (workstation).
(Diagram labels: three Workstations, each running a PVM daemon; Application program (executable); Messages sent through network.)


Figure 2.12 pvm_psend() and pvm_precv() system calls.
(Diagram: Process 1 calls pvm_psend(), which packs the Array holding data into the Send buffer, and then continues; Process 2 calls pvm_precv() and waits for the message, which arrives in the Array to receive data.)


Figure 2.13 PVM packing messages, sending, and unpacking.
Process_1: pvm_initsend(); pvm_pkint( … &x …); pvm_pkstr( … &s …); pvm_pkfloat( … &y …); pvm_send(process_2 … );
Process_2: pvm_recv(process_1 …); pvm_upkint( … &x …); pvm_upkstr( … &s …); pvm_upkfloat( … &y … );
(Diagram: x, s, and y are packed into the Send buffer, travel as one Message, and are unpacked from the Receive buffer.)


Figure 2.14 Sample PVM program.

Master (broadcasts the data, then receives the results):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

int main(void)
{
    int mytid, tids[PROC];
    int n = NELEM, nproc = PROC;
    int no, i, who, msgtype;
    int data[NELEM], result[PROC], tot = 0;
    char fn[255];
    FILE *fp;

    mytid = pvm_mytid();                /* Enroll in PVM */

    /* Start slave tasks */
    no = pvm_spawn(SLAVE, (char **)0, 0, "", nproc, tids);
    if (no < nproc) {
        printf("Trouble spawning slaves\n");
        for (i = 0; i < no; i++) pvm_kill(tids[i]);
        pvm_exit();
        exit(1);
    }

    /* Open input file and initialize data */
    strcpy(fn, getenv("HOME"));
    strcat(fn, "/pvm3/src/rand_data.txt");
    if ((fp = fopen(fn, "r")) == NULL) {
        printf("Can't open input file %s\n", fn);
        exit(1);
    }
    for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

    /* Broadcast data to slaves */
    pvm_initsend(PvmDataDefault);
    msgtype = 0;
    pvm_pkint(&nproc, 1, 1);
    pvm_pkint(tids, nproc, 1);
    pvm_pkint(&n, 1, 1);
    pvm_pkint(data, n, 1);
    pvm_mcast(tids, nproc, msgtype);

    /* Get results from slaves */
    msgtype = 5;
    for (i = 0; i < nproc; i++) {
        pvm_recv(-1, msgtype);
        pvm_upkint(&who, 1, 1);
        pvm_upkint(&result[who], 1, 1);
        printf("%d from %d\n", result[who], who);
    }

    /* Compute global sum */
    for (i = 0; i < nproc; i++) tot += result[i];
    printf("The total is %d.\n\n", tot);

    pvm_exit();                         /* Program finished. Exit PVM */
    return 0;
}

Slave:

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

int main(void)
{
    int mytid;
    int tids[PROC];
    int n, me = 0, i, msgtype;
    int x, nproc, master;
    int low, high;
    int data[NELEM], sum = 0;           /* sum must start at zero */

    mytid = pvm_mytid();

    /* Receive data from master */
    msgtype = 0;
    pvm_recv(-1, msgtype);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&n, 1, 1);
    pvm_upkint(data, n, 1);

    /* Determine my position in the list of task ids */
    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    /* Add my portion of data (assumes nproc divides n evenly) */
    x = n / nproc;
    low = me * x;
    high = low + x;
    for (i = low; i < high; i++)
        sum += data[i];

    /* Send result to master */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_pkint(&sum, 1, 1);
    msgtype = 5;
    master = pvm_parent();
    pvm_send(master, msgtype);

    /* Exit PVM */
    pvm_exit();
    return 0;
}


Figure 2.15 Unsafe message passing with libraries.
(Panels: (a) Intended behavior; (b) Possible behavior. In each panel, Process 0 and Process 1 both run user code and a library routine lib(); the user code and lib() each contain a send(…,1,…) in Process 0 and a recv(…,0,…) in Process 1, where the marked arguments are the Destination and the Source.)


Figure 2.16 Sample MPI program.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char **argv)
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {    /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            MPI_Abort(MPI_COMM_WORLD, 1);   /* stop every process, not only rank 0 */
        }
        for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
    }

    /* Broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

    /* Add my portion of data (assumes numprocs divides MAXSIZE evenly) */
    x = MAXSIZE / numprocs;
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);

    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);

    MPI_Finalize();
    return 0;
}


Figure 2.17 Theoretical communication time.
(Plot: Time against Number of data items (n); the intercept on the time axis is the Startup time.)
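A straight line with a nonzero intercept, as plotted in Figure 2.17, corresponds to the usual linear model of communication cost. The symbols t_comm, t_startup, and t_data below are names assumed here for illustration; the figure residue itself only labels the axes and the startup time.

```latex
t_{\text{comm}} = t_{\text{startup}} + n\,t_{\text{data}}
```

Here t_startup is the fixed cost of sending any message at all, t_data is the time to transfer one data item, and n is the number of data items. When n is small the startup term dominates, which is why many short messages tend to cost more than one long message carrying the same data.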


Figure 2.18 Growth of function f(x) = 4x^2 + 2x + 12.
(Plot: vertical axis from 0 to 160, x from 0 to 5; curves c_2 g(x) = 6x^2, f(x) = 4x^2 + 2x + 12, and c_1 g(x) = 2x^2; x_0 marked on the x axis.)
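The two bounding curves in Figure 2.18 illustrate that f(x) grows as x^2. The short check below works out where the bounds hold, taking g(x) = x^2, c_1 = 2, and c_2 = 6 from the plotted curves; the value of x_0 is derived here rather than read from the figure.

```latex
\begin{aligned}
c_1 g(x) \le f(x):&\quad 2x^2 \le 4x^2 + 2x + 12 \ \text{ for all } x \ge 0,\\
f(x) \le c_2 g(x):&\quad 4x^2 + 2x + 12 \le 6x^2
  \iff x^2 - x - 6 \ge 0
  \iff (x - 3)(x + 2) \ge 0
  \iff x \ge 3 \ \ (x \ge 0).
\end{aligned}
```

So both bounds hold for every x at or beyond x_0 = 3, which gives f(x) = \Theta(x^2).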


Figure 2.19 Broadcast in a three-dimensional hypercube.
(Node addresses: 000, 001, 010, 011, 100, 101, 110, 111; arrows marked 1st step, 2nd step, and 3rd step.)


Figure 2.20 Broadcast as a tree construction.
(Tree of Messages. Step 1: P000 sends to P001. Step 2: P000 sends to P010 and P001 sends to P011. Step 3: P000, P010, P001, and P011 send to P100, P110, P101, and P111 respectively.)
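The three-step broadcast of Figures 2.19 and 2.20 generalizes to d steps on a d-dimensional hypercube: in each step, every node that already holds the message forwards it across one more dimension. The C sketch below simulates that schedule starting from node 0. The function name `hypercube_broadcast` and its output arrays are hypothetical, and using the lowest address bit first is an assumption chosen to match the step order in Figure 2.20.

```c
/* Simulate a one-to-all broadcast from node 0 of a d-dimensional hypercube.
   On return, parent[i] is the node that sent the message to node i and
   step[i] is the 1-based step in which node i received it. Node 0, the
   source, gets parent 0 and step 0. Both arrays need 2^d entries. */
void hypercube_broadcast(unsigned d, unsigned parent[], unsigned step[])
{
    parent[0] = 0;
    step[0] = 0;
    for (unsigned s = 0; s < d; s++) {
        unsigned bit = 1u << s;
        /* Nodes with addresses below 2^s already hold the message;
           each forwards it to the neighbour across dimension s. */
        for (unsigned src = 0; src < bit; src++) {
            unsigned dst = src | bit;
            parent[dst] = src;
            step[dst] = s + 1;
        }
    }
}
```

With d = 3 this reproduces the tree in Figure 2.20: node 001 receives from 000 in step 1, and node 111 receives from 011 in step 3. The number of nodes holding the message doubles each step, so 2^d nodes are reached in d steps.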


Figure 2.21 Broadcast in a mesh.
(Diagram: mesh annotated with the Steps of the broadcast, numbered 1 to 6.)


Figure 2.22 Broadcast on an Ethernet network.
(Diagram labels: Source; Message; Destinations.)


Figure 2.23 1-to-N fan-out broadcast.
(Diagram labels: Source; Sequential; N destinations.)


Figure 2.24 1-to-N fan-out broadcast on a tree structure.
(Diagram labels: Source; Sequential message issue; Destinations.)


Figure 2.25 Space-time diagram of a parallel program.
(Diagram labels: Process 1 to Process 3 against Time; Computing; Waiting; Message-passing system routine; Message.)


Figure 2.26 Program profile.
(Plot: Number of repetitions or time against Statement number or regions of program, 1 to 10.)


Figure 3.1 Disconnected computational graph (embarrassingly parallel problem).
(Diagram labels: Input data; Processes; Results.)


Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach.

Labels: Master: spawn(), send() (Send initial data), recv() (Collect results); Slaves: recv(), send().


Figure 3.3 Partitioning into regions for individual processes.

(a) Square region for each process: a 640 × 480 area divided into 80 × 80 squares.
(b) Row region for each process: a 640 × 480 area divided into strips 10 rows high.
Labels: Process; Map; axes x and y.


Figure 3.4 Mandelbrot set.

Axes: Real, from −2 to +2; Imaginary, from −2 to +2.
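
The computation behind a picture such as Figure 3.4 can be sketched as follows. This is a minimal illustration, not code from the slides: the function names, the iteration cap of 256, and the 640 × 480 default display size (borrowed from Figure 3.3) are assumptions. The escape test |z| > 2 is the standard one for the iteration z = z² + c.

```python
def mandelbrot_iterations(c: complex, max_iter: int = 256) -> int:
    """Count iterations of z = z*z + c until |z| exceeds 2 or max_iter is reached."""
    z = 0j
    count = 0
    while abs(z) <= 2.0 and count < max_iter:
        z = z * z + c
        count += 1
    return count


def compute_row(y, width=640, height=480, lo=-2.0, hi=2.0, max_iter=256):
    """Compute one display row: the unit of work a single process might be given."""
    scale_re = (hi - lo) / width    # width of one pixel along the real axis
    scale_im = (hi - lo) / height   # height of one pixel along the imaginary axis
    imag = lo + y * scale_im
    return [mandelbrot_iterations(complex(lo + x * scale_re, imag), max_iter)
            for x in range(width)]
```

Each row depends only on its own coordinates, which is why the problem partitions so easily into independent regions.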


Figure 3.5 Work pool approach.

Labels: Work pool holding tasks (xa, ya), (xb, yb), (xc, yc), (xd, yd), (xe, ye); Task; Return results/request new task.


Figure 3.6 Counter termination.

Labels: 0; disp_height; Rows outstanding in slaves (count); Row sent: Increment; Row returned: Decrement; Terminate.
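
The counter scheme in Figure 3.6 can be sketched as a master loop. The version below is a sequential simulation under stated assumptions: a deque stands in for the messages travelling between master and slaves, and the names `master_work_pool`, `num_slaves`, and `compute_row` are hypothetical. Only the increment/decrement/terminate logic is taken from the figure.

```python
from collections import deque


def master_work_pool(disp_height, num_slaves, compute_row):
    """Hand out rows, collect results, and terminate when no rows are outstanding."""
    results = {}
    in_flight = deque()   # stand-in for rows currently held by slaves
    count = 0             # rows outstanding in slaves
    row = 0
    # Send one initial row to each slave.
    while row < disp_height and count < num_slaves:
        in_flight.append(row)
        count += 1        # row sent: increment
        row += 1
    while count > 0:
        done = in_flight.popleft()           # a slave returns a row
        results[done] = compute_row(done)
        count -= 1                           # row returned: decrement
        if row < disp_height:                # more work: give the idle slave a new row
            in_flight.append(row)
            count += 1
            row += 1
    return results                           # count reached 0: terminate
```

For example, `master_work_pool(5, 2, lambda y: y * y)` returns one result per row, keyed by row number.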


Figure 3.7 Computing π by a Monte Carlo method.

Labels: circle of Area = π inside a square of side 2 (Total area = 4).


Figure 3.8 Function being integrated in computing π by a Monte Carlo method.

Labels: curve f(x), with y = √(1 − x²), plotted against x; both axes marked at 1.
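
The two Monte Carlo approaches pictured in Figures 3.7 and 3.8 can be sketched as below. This is an illustration only: the function names and the use of Python's `random.Random` generator are assumptions, and the code is sequential (in a parallel version each slave would accumulate its own partial sum).

```python
import math
import random


def pi_by_circle(samples, rng):
    """Fraction of random points in the 2 x 2 square that land in the unit circle, times 4."""
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def pi_by_integration(samples, rng):
    """The mean of sqrt(1 - x^2) over random x in [0, 1) estimates pi / 4."""
    total = sum(math.sqrt(1.0 - rng.random() ** 2) for _ in range(samples))
    return 4.0 * total / samples
```

Both estimates converge slowly, with an error that shrinks roughly as one over the square root of the number of samples.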


Figure 3.9 Parallel Monte Carlo integration.

Labels: Master; Slaves; Random number process; Request; Random number; Partial sum.


x1 x2 xk-1 xk xk+1 xk+2 x2k-1 x2k

Figure 3.10 Parallel computation of a sequence.


Figure 4.1 Partitioning a sequence of numbers into parts and adding the parts.

Parts: x0 … x(n/m)−1; xn/m … x(2n/m)−1; …; x(m−1)n/m … xn−1.
Labels: each part is added (+) to give the Partial sums, which are added (+) to give the Sum.
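
A minimal sketch of the partitioning in Figure 4.1, assuming for simplicity that m divides n exactly. The function name is hypothetical, and the parts are added one after another here rather than by separate processes.

```python
def partitioned_sum(numbers, m):
    """Split the list into m equal parts, add each part, then add the partial sums."""
    n = len(numbers)
    if m <= 0 or n % m != 0:
        raise ValueError("this sketch assumes m divides n exactly")
    size = n // m
    # Part i covers x[i*n/m] .. x[(i+1)*n/m - 1]; each could go to its own process.
    partial_sums = [sum(numbers[i * size:(i + 1) * size]) for i in range(m)]
    return sum(partial_sums), partial_sums
```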


Figure 4.2 Tree construction.

Labels: Initial problem; Divide problem; Final tasks.


Figure 4.3 Dividing a list into parts.

Original list: x0 … xn−1.
Tree of processes, top to bottom:
P0
P0 P4
P0 P2 P4 P6
P0 P1 P2 P3 P4 P5 P6 P7


Figure 4.4 Partial summation.

Numbers x0 … xn−1 are held by the bottom row of processes.
Tree of processes, bottom to top:
P0 P1 P2 P3 P4 P5 P6 P7
P0 P2 P4 P6
P0 P4
P0 (Final sum)
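
The combining pattern of Figure 4.4 can be sketched as a loop over tree levels. This is a sequential stand-in under stated assumptions: `values[i]` plays the role of the partial sum held by process Pi, the number of processes is a power of two, and the function name is hypothetical.

```python
def tree_sum(values):
    """Pairwise (binary-tree) summation; len(values) must be a power of two."""
    partial = list(values)
    p = len(partial)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of processes")
    stride = 1
    while stride < p:
        # At this level, process i receives from process i + stride and adds.
        for i in range(0, p, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
    return partial[0]   # P0 ends up holding the final sum
```

With eight processes the loop runs three times, matching the three levels above the bottom row of the tree.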


Figure 4.5 Part of a search tree.

Labels: OR nodes; Found/Not found.


Figure 4.6 Quadtree.


Figure 4.7 Dividing an image.

Labels: Image area; First division into four parts; Second division.


Figure 4.8 Bucket sort.

Labels: Unsorted numbers; Buckets; Sort contents of buckets; Merge lists; Sorted numbers.
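
The stages named in Figure 4.8 can be sketched as below. The assumptions are that the numbers lie in a known range [lo, hi) and that the buckets have equal width; the function name and the use of Python's built-in `sorted` for each bucket are illustrative choices.

```python
def bucket_sort(numbers, m, lo, hi):
    """Sort numbers in [lo, hi) using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    width = (hi - lo) / m
    for x in numbers:
        # Clamp the index to guard against rounding at the top edge of the range.
        index = min(int((x - lo) / width), m - 1)
        buckets[index].append(x)
    result = []
    for bucket in buckets:          # sort contents of buckets, then merge lists
        result.extend(sorted(bucket))
    return result
```

Because the buckets cover disjoint, ordered ranges, merging is plain concatenation, and each bucket could be sorted by a different processor.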


Figure 4.9 One parallel version of bucket sort.

Labels: Unsorted numbers; Buckets; Sort contents of buckets; Merge lists; Sorted numbers; p processors.


Figure 4.10 Parallel version of bucket sort.

Labels: Unsorted numbers; n/m numbers; Small buckets; Empty small buckets; Large buckets; Sort contents of buckets; Merge lists; Sorted numbers; p processors.


Figure 4.11 “All-to-all” broadcast.

Labels: each process has a Send buffer and a Receive buffer with entries 0 to n − 1. Process 0 exchanges with Process 1 to Process n − 1, and so on up to Process n − 1, which exchanges with Process 0 to Process n − 2.


Figure 4.12 Effect of “all-to-all” on an array.

Before “all-to-all”:
P0: A0,0 A0,1 A0,2 A0,3
P1: A1,0 A1,1 A1,2 A1,3
P2: A2,0 A2,1 A2,2 A2,3
P3: A3,0 A3,1 A3,2 A3,3

After “all-to-all”:
P0: A0,0 A1,0 A2,0 A3,0
P1: A0,1 A1,1 A2,1 A3,1
P2: A0,2 A1,2 A2,2 A3,2
P3: A0,3 A1,3 A2,3 A3,3
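
The effect shown in Figure 4.12 is a transpose of the distributed array. A small sketch, in which nested lists stand in for the per-process buffers and the function name is hypothetical:

```python
def all_to_all(send_buffers):
    """send_buffers[i][j] is what process i sends to process j; returns the receive buffers."""
    p = len(send_buffers)
    # Process dst receives element dst from every source process, in source order.
    return [[send_buffers[src][dst] for src in range(p)] for dst in range(p)]
```

Applying the function to rows of labels `A{i},{j}` gives process 0 the list A0,0, A1,0, A2,0, A3,0, as in the figure.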


Figure 4.13 Numerical integration using rectangles.

Labels: curve f(x) plotted against x between a and b; one region from p to q of width δ, with heights f(p) and f(q).


Figure 4.14 More accurate numerical integration using rectangles.

Labels: curve f(x) plotted against x between a and b; one region from p to q of width δ, with heights f(p) and f(q).


Figure 4.15 Numerical integration using the trapezoidal method.

Labels: curve f(x) plotted against x between a and b; one region from p to q of width δ, with heights f(p) and f(q).
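
The trapezoidal method of Figure 4.15, combined with a static division of [a, b] into one region per process, might look like the sketch below. The function names and the parameter `intervals_per_region` are assumptions, and the regions are evaluated one after another here rather than concurrently.

```python
def trapezoidal(f, a, b, intervals):
    """Composite trapezoidal rule on [a, b] using intervals of width delta."""
    delta = (b - a) / intervals
    area = 0.5 * (f(a) + f(b))          # end points count half
    for i in range(1, intervals):
        area += f(a + i * delta)        # interior points count once
    return area * delta


def integrate_in_regions(f, a, b, processes, intervals_per_region):
    """Static assignment: each region is integrated independently, then the areas are summed."""
    region = (b - a) / processes
    return sum(
        trapezoidal(f, a + k * region, a + (k + 1) * region, intervals_per_region)
        for k in range(processes)
    )
```

Each trapezoid has area δ(f(p) + f(q))/2, so a function that is linear over a region is integrated exactly.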


Figure 4.16 Adaptive quadrature construction.

Labels: curve f(x) plotted against x; areas A, B, and C.


Figure 4.17 Adaptive quadrature with false termination.

Labels: curve f(x) plotted against x; areas A and B; C = 0.


Figure 4.18 Clustering distant bodies.

Labels: Distant cluster of bodies; Center of mass; distance r.


Figure 4.19 Recursive division of two-dimensional space.

Labels: Particles; Subdivision direction; Partial quadtree.


Figure 4.20 Orthogonal recursive bisection method.


Figure 4.21 Process diagram for Problem 4-12(b).

Labels: Binary Tree of adders (+); log n numbers; Result.


Figure 4.22 Bisection method for finding the zero crossing location of a function.

Labels: curve f(x) plotted as y against x; end points a and b, with function values f(a) and f(b).
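
A minimal sketch of the bisection method named in Figure 4.22, assuming a < b and that f(a) and f(b) have opposite signs. The function name and the tolerance parameter `tol` are illustrative.

```python
def bisection(f, a, b, tol=1e-9):
    """Locate a zero crossing of f in [a, b] by repeatedly halving the interval."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2.0
        fm = f(mid)
        if fa * fm <= 0:
            b = mid               # the sign change lies in the left half
        else:
            a, fa = mid, fm       # the sign change lies in the right half
    return (a + b) / 2.0
```

For example, `bisection(lambda x: x * x - 2, 0.0, 2.0)` converges to the square root of 2.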


Figure 4.23 Convex hull (Problem 4-22).


P0 P1 P2 P3 P4 P5

Figure 5.1 Pipelined processes.


Figure 5.2 Pipeline for an unfolded loop.

Labels: five stages, one for each of a[0], a[1], a[2], a[3], a[4]; each stage has input sin, output sout, and its own element a; the final output is sum.


Figure 5.3 Pipeline for a frequency filter.

Labels: input signal f(t); five stages for frequencies f0, f1, f2, f3, f4, each with input fin and output fout; intermediate outputs: Signal without frequency f0, Signal without frequency f1, Signal without frequency f2, Signal without frequency f3; final output: Filtered signal.


Figure 5.4 Space-time diagram of a pipeline.

Rows: processes P0 to P5, plotted against Time. Each process executes Instance 1, Instance 2, Instance 3, … in turn, and each process starts one time step after the process before it. Time-axis spans: p − 1; m.


Figure 5.5 Alternative space-time diagram.

Rows: Instance 0 to Instance 4, plotted against Time. Each instance passes through P0, P1, P2, P3, P4, P5 in turn, and each instance starts one time step after the previous one.


Figure 5.6 Pipeline processing 10 data elements.

(a) Pipeline structure: Input sequence d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 entering P0, followed by P1 to P9.
(b) Timing diagram: rows P0 to P9 plotted against Time; the elements d0 to d9 pass through the processes in turn. Time-axis spans: p − 1; n.


Figure 5.7 Pipeline processing where information passes to next stage before end of process.

(a) Processes with the same execution time.
(b) Processes not with the same execution time.
Labels: rows P0 to P5 plotted against Time; Information passed to next stage; Information transfer sufficient to start next process.


Figure 5.8 Partitioning processes onto processors.

Labels: processes P0 to P11 in a line, grouped onto Processor 0, Processor 1, and Processor 2.


Figure 5.9 Multiprocessor system with a line configuration.

Labels: Host computer; Multiprocessor.


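Figure 5.10 Pipelined addition.

Labels: P0 to P4 in a line; the partial sums passed along are Σ(i = 1 to 1) i, Σ(i = 1 to 2) i, Σ(i = 1 to 3) i, Σ(i = 1 to 4) i, and Σ(i = 1 to 5) i.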


Figure 5.11 Pipelined addition of numbers with a master process and ring configuration.

Labels: Master process; numbers dn−1 … d2 d1 d0 sent to the Slaves P0, P1, P2, …, Pn−1; Sum returned to the master.


Figure 5.12 Pipelined addition of numbers with direct access to slave processes.

Labels: Master process; Numbers d0, d1, …, dn−1 sent directly to the Slaves P0, P1, P2, …, Pn−1; Sum returned to the master.


Figure 5.13 Steps in insertion sort with five numbers.

Columns: P0 to P4; rows: Time (cycles) 1 to 10. The sequence 4, 3, 1, 2, 5 enters P0 one number per cycle, starting with 5. The final row shows P0 to P4 holding 5, 4, 3, 2, 1.


Figure 5.14 Pipeline for sorting using insertion sort.

Labels: Series of numbers xn−1 … x1 x0 entering P0, P1, P2; Compare (with the stored xmax); Smaller numbers passed on; P0 ends with the Largest number, P1 with the Next largest number.
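
The behaviour labelled in Figure 5.14 (compare, keep the larger number, pass the smaller one on) can be sketched as a sequential simulation. The function name is hypothetical, and the stages here run one after another rather than overlapping in time as they would in a real pipeline.

```python
def pipeline_insertion_sort(numbers):
    """Each stage keeps the largest number it has seen and passes the smaller one on."""
    stages = []                       # stages[i] is the number held by process Pi
    for x in numbers:                 # numbers enter P0 one at a time
        moving = x
        for i in range(len(stages)):
            if moving > stages[i]:    # compare: keep the larger, forward the smaller
                stages[i], moving = moving, stages[i]
        stages.append(moving)         # the next idle process keeps what reaches it
    return stages
```

The result is held in decreasing order along the pipeline, so P0 ends with the largest number.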


Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration.

Labels: Master process; numbers dn−1 … d2 d1 d0 sent to P0, P1, P2, …, Pn−1; Sorted sequence returned to the master.


Figure 5.16 Insertion sort with results returned.

Labels: rows P0 to P4 plotted against Time; Sorting phase (2n − 1) followed by Returning sorted numbers (n). Shown for n = 5.


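Figure 5.17 Pipeline for sieve of Eratosthenes.

Labels: Series of numbers xn−1 … x1 x0 entering P0, P1, P2; P0 holds the 1st prime number, P1 the 2nd prime number, P2 the 3rd prime number; each process compares incoming numbers for multiples of its prime; P0 passes on the numbers that are not multiples of the 1st prime number.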
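
The sieve pipeline of Figure 5.17 can be sketched as follows. This is a sequential simulation with a hypothetical function name: `primes[i]` stands for the prime held by process Pi, and each number is pushed through the stages until one of them discards it.

```python
def pipeline_sieve(limit):
    """Stage i holds the i-th prime and forwards only numbers that are not multiples of it."""
    primes = []                        # primes[i] is the prime held by process Pi
    for x in range(2, limit + 1):      # series of numbers fed into P0
        for prime in primes:
            if x % prime == 0:         # a multiple: this stage discards it
                break
        else:
            primes.append(x)           # survived every stage: the next process keeps it
    return primes
```

A number that passes every existing stage has no smaller prime factor, so it becomes the prime held by the next stage.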


Figure 5.18 Solving an upper triangular set of linear equations using a pipeline.

Labels: P0 Compute x0; P1 Compute x1; P2 Compute x2; P3 Compute x3. Each computed value is passed along the pipeline to the later stages.


Figure 5.19 Pipeline processing using back substitution.

Labels: Processes P0 to P5 plotted against Time; First value passed onward; Final computed value.


Figure 5.20 Operations in back substitution pipeline.

Columns: P0 to P4, with Time running downward; send(x) ⇒ recv(x) marks a transfer to the next process.
P0: divide; send(x0); end.
Each later process Pi: for each earlier value xj (j < i): recv(xj); send(xj) onward; multiply/add. Then: divide/subtract; send(xi); end.
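
The per-process work listed for Figure 5.20 can be sketched as below. The indexing is an assumption made to match the figures, where x0 is computed first: equation i is taken to involve only x0 … xi. The function name is hypothetical, and the stages run in sequence here instead of overlapping.

```python
def pipeline_back_substitution(a, b):
    """Solve a[i][0]*x0 + ... + a[i][i]*xi = b[i] for i = 0..n-1, one stage per unknown."""
    n = len(b)
    x = []
    for i in range(n):                        # the work done by process Pi
        total = 0.0
        for j in range(i):                    # recv(x[j]), forward it, then multiply/add
            total += a[i][j] * x[j]
        x.append((b[i] - total) / a[i][i])    # divide/subtract, then send(x[i]) onward
    return x
```

For the system 2·x0 = 4, x0 + x1 = 5, x0 + 2·x1 + 4·x2 = 12 this yields x0 = 2, x1 = 3, x2 = 1.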


Figure 5.21 Pipeline for Problem 5-9.

Labels: four stages, each with inputs x, a, and yin and output yout; stage inputs x1 to x4 and a1 to a4; Output: y4 y3 y2 y1.


Figure 5.22 Audio histogram display.

(a) Pipeline solution: Audio input (digitized); Pipeline; Display.
(b) Direct decomposition: Audio input (digitized); Display.


Figure 6.1 Processes reaching the barrier at different times.

Labels: Processes P0, P1, P2, …, Pn−1 plotted against Time; Barrier. Legend: Active; Waiting.


Figure 6.2 Library call barriers.

Labels: Processes P0, P1, …, Pn−1, each calling Barrier(); Processes wait until all reach their barrier call.


Figure 6.3 Barrier using a centralized counter.

Labels: Processes P0, P1, …, Pn−1, each calling Barrier(); Counter, C; Increment and check for n.


Figure 6.4 Barrier implementation in a message-passing system.

Master:
  Arrival phase: for(i=0;i<n;i++) recv(Pany);
  Departure phase: for(i=0;i<n;i++) send(Pi);
Slave processes:
  Barrier: send(Pmaster); recv(Pmaster);
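
The arrival and departure phases of a counter-based barrier can be sketched with threads in place of message-passing processes. This swaps the send/recv mechanism of the figure for a shared counter guarded by a condition variable, so it illustrates the idea rather than the slide's implementation. The class name is hypothetical and the barrier is single-use.

```python
import threading


class CounterBarrier:
    """Single-use barrier: the last of n arrivals releases everyone."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            self.count += 1                  # arrival phase: increment and check for n
            if self.count == self.n:
                self.cond.notify_all()       # departure phase: release the waiters
            else:
                self.cond.wait_for(lambda: self.count == self.n)
```

A reusable barrier needs extra care (for example, a second counter or a phase flag) so that a fast thread cannot re-enter before the others have left. Python's standard library also provides `threading.Barrier` for production use.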


Figure 6.5 Tree barrier.

Labels: P0 to P7; Arrival at barrier; Departure from barrier; Synchronizing message.


Figure 6.6 Butterfly construction.

Labels: P0 to P7 plotted against Time; 1st stage; 2nd stage; 3rd stage.
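
One way to generate a butterfly pairing is sketched below. It assumes the usual scheme, in which process i synchronizes with process i XOR 2^s at stage s and the number of processes is a power of two; the slide itself shows only the stage labels, so the pairing rule and the function name are assumptions.

```python
def butterfly_partners(p):
    """Return, for each stage, the partner of every process (p must be a power of two)."""
    if p <= 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of processes")
    stages = []
    distance = 1
    while distance < p:
        # At this stage, process i pairs with the process whose index differs in one bit.
        stages.append([i ^ distance for i in range(p)])
        distance *= 2
    return stages
```

For eight processes this gives three stages, with partners at distance 1, then 2, then 4.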


Figure 6.7 Data parallel computation.

Labels: Instruction a[] = a[] + k; issued to the Processors, which execute a[0]=a[0]+k; a[1]=a[1]+k; …; a[n-1]=a[n-1]+k; on the elements a[0], a[1], …, a[n-1].


Figure 6.8 Data parallel prefix sum operation.

Numbers: x0 x1 x2 ... x15
Step 1 (j = 0), Add: element i holds x(i−1) + xi; element 0 holds x0.
Step 2 (j = 1), Add: element i holds the sum of x(i−3) to xi; elements 0 to 3 hold the sum of x0 to xi.
Step 3 (j = 2), Add: element i holds the sum of x(i−7) to xi; elements 0 to 7 hold the sum of x0 to xi.
Final step (j = 3), Add: element i holds the sum of x0 to xi, for i = 0 to 15.
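The steps of Figure 6.8 can be sketched sequentially as follows. This is a minimal illustration, not the book's code: the inner loop stands in for processors that would act simultaneously, and the snapshot array, the MAX_N limit, and the function name are assumptions made for the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 1024   /* assumed upper limit for the sketch */

/* In-place inclusive prefix sum of x[0..n-1].
 * At step j (stride = 2^j) every element i >= stride adds x[i - stride]. */
void prefix_sum(int *x, int n)
{
    int old[MAX_N];
    assert(n >= 0 && n <= MAX_N);
    for (int stride = 1; stride < n; stride *= 2) {
        /* Snapshot so that all reads of a step precede all writes,
         * as they would on a synchronous data parallel machine. */
        memcpy(old, x, (size_t)n * sizeof x[0]);
        for (int i = stride; i < n; i++)
            x[i] = old[i] + old[i - stride];
    }
}
```

For 16 numbers the outer loop runs four times (strides 1, 2, 4, 8), matching j = 0 to 3 in the figure.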


Figure 6.9 Convergence rate. (Labels: computed value; exact value; error; iteration axis with t and t+1 marked.)


Figure 6.10 Allgather operation. (Labels: Process 0, Process 1, ..., Process n − 1, each calling Allgather(); data items x0, x1, ..., xn−1; send buffer; receive buffer.)


Figure 6.11 Effects of computation and communication in Jacobi iteration. (Plot of execution time (τ = 1), from 0 to 2 × 10^6, against number of processors, p, from 0 to 32; curves: overall, communication, computation.)


Figure 6.12 Heat distribution problem. (Metal plate with indices i and j; enlarged view of point h_{i,j} and its four neighbors h_{i−1,j}, h_{i+1,j}, h_{i,j−1}, h_{i,j+1}.)
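One iteration over the points of Figure 6.12 might look like the sketch below. It assumes the usual Jacobi update for this problem, in which each interior point becomes the average of its four neighbors from the previous iteration; the grid size GRID and the function name are placeholders chosen for the example.

```c
#include <assert.h>

#define GRID 4   /* assumed size: GRID x GRID points including the fixed edges */

/* One Jacobi sweep: reads the old values in h, writes the new values to g.
 * Edge points are treated as fixed boundary temperatures and copied. */
void jacobi_step(double h[GRID][GRID], double g[GRID][GRID])
{
    assert(h != g);   /* separate arrays keep old and new values apart */
    for (int i = 0; i < GRID; i++) {
        for (int j = 0; j < GRID; j++) {
            if (i == 0 || i == GRID - 1 || j == 0 || j == GRID - 1)
                g[i][j] = h[i][j];
            else
                g[i][j] = 0.25 * (h[i - 1][j] + h[i + 1][j] +
                                  h[i][j - 1] + h[i][j + 1]);
        }
    }
}
```

Repeating the sweep, swapping the roles of h and g each time, continues until the values stop changing by more than a chosen tolerance.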


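Figure 6.13 Natural ordering of heat distribution problem. (Points are numbered row by row: x1, x2, ..., xk−1, xk in the first row; xk+1, xk+2, ..., x2k−1, x2k in the second; up to xk² in the last. Point xi has neighbors xi−1, xi+1, xi−k, and xi+k.)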


Figure 6.14 Message passing for heat distribution problem. (Labels: row i; column j.) The figure repeats the same code in each of the five processes shown:

    send(g, Pi-1,j);
    send(g, Pi+1,j);
    send(g, Pi,j-1);
    send(g, Pi,j+1);
    recv(w, Pi-1,j);
    recv(x, Pi+1,j);
    recv(y, Pi,j-1);
    recv(z, Pi,j+1);


Figure 6.15 Partitioning heat distribution problem. (Two panels: blocks, and strips (columns); each divides the points among processes P0, P1, ..., Pp−1.)


Figure 6.16 Communication consequences of partitioning. (Labels: square blocks, with edge length n/√p; strips, with edge length n.)


Figure 6.17 Startup times for block and strip partitions. (Plot of t_startup, from 0 to 2000, against processors, p, from 1 to 1000 on a logarithmic scale; regions labeled "Strip partition best" and "Block partition best".)
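The trade-off plotted in Figure 6.17 can be explored with a small cost model. The sketch below is an assumption-laden illustration rather than the book's formula: it supposes an interior square block exchanges n/√p points across each of four edges (four sends and four receives), an interior strip exchanges n points across each of two edges (two sends and two receives), and that messages are not overlapped. The parameter q stands for √p, and all names are hypothetical.

```c
#include <assert.h>

/* Communication time per iteration for a square block partition.
 * q is the number of blocks along one side, so p = q * q. */
double block_comm_time(double t_startup, double t_data, int n, int q)
{
    assert(q > 0 && n % q == 0);
    return 8.0 * (t_startup + (double)(n / q) * t_data);
}

/* Communication time per iteration for a strip (column) partition. */
double strip_comm_time(double t_startup, double t_data, int n)
{
    assert(n > 0);
    return 4.0 * (t_startup + (double)n * t_data);
}
```

Under this model a large startup time favors strips, because they need half as many messages, while a small startup time favors blocks, because each message is shorter.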


Figure 6.18 Configurating array into contiguous rows for each process, with ghost points. (Labels: array held by process i; array held by process i+1; one row of points; ghost points; copy.)


Figure 6.19 Room for Problem 6-14. (Labels: 20°C; 100°C; dimensions 10 ft, 10 ft, and 4 ft.)


Figure 6.20 Road junction for Problem 6-16. (Label: vehicle.)


Figure 6.21 Figure for Problem 6-23. (Labels: airflow; actual dimensions selected at will.)


Figure 7.1 Load balancing. (a) Imperfect load balancing leading to increased execution time. (b) Perfect load balancing. (Each panel shows processors P0 to P5 against time, with time t marked.)


Figure 7.2 Centralized work pool. (Labels: master process; work pool; queue; tasks; slave "worker" processes; request task; send task (and possibly submit new tasks).)


Figure 7.3 A distributed work pool. (Labels: master, Pmaster; initial tasks; process M0 to process Mn−1; slaves.)


Figure 7.4 Decentralized work pool. (Labels: four processes; requests/tasks.)


Figure 7.5 Decentralized selection algorithm requesting tasks between slaves. (Labels: slave Pi and slave Pj, each with requests and a local selection algorithm.)


Figure 7.6 Load balancing using a pipeline structure. (Labels: master process; P0, P1, P2, P3, ..., Pn−1.)


Figure 7.7 Using a communication process in line load balancing. (Labels: Ptask; Pcomm; "If buffer empty, make request"; "Receive task from request"; "If free, request task"; "Receive task from request"; "If buffer full, send task"; "Request for task".)


Figure 7.8 Load balancing using a tree. (Labels: processes P0 to P6; task when requested.)


Figure 7.9 Termination using message acknowledgments. (Labels: process; parent; other processes; inactive; active; first task; task; acknowledgment; final acknowledgment.)


Figure 7.10 Ring termination detection algorithm. (Labels: P0, P1, P2, ..., Pn−1; token passed to next processor when reached local termination condition.)


Figure 7.11 Process algorithm for local termination. (Labels: token; terminated; AND.)


Figure 7.12 Passing task to previous processes. (Labels: P0, Pi, Pj, Pn−1; task.)


Figure 7.13 Tree termination. (Labels: "Terminated" and "AND", repeated at three points of the tree.)


Figure 7.14 Climbing a mountain. (Labels: base camp; summit; possible intermediate camps; points A, B, C, D, E, F.)


Figure 7.15 Graph of mountain climb. (Vertices A, B, C, D, E, F; edge weights 10, 13, 17, 51, 8, 24, 9, 14.)


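Figure 7.16 Representing a graph.

(a) Adjacency matrix: a 6 × 6 array with rows indexed by source (A to F) and columns by destination (A to F). Entries hold the edge weights 10, 8, 13, 24, 51, 14, 9, and 17, with ∞ where there is no edge.

(b) Adjacency list (source, then destination and weight of each edge):
A: B 10
B: C 8, D 13, E 24, F 51
C: D 14
D: E 9
E: F 17
F: NULL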


Figure 7.17 Moore's shortest-path algorithm. (Labels: vertex i with distance di; vertex j with distance dj; edge weight wi,j.)
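The relation between di, dj, and wi,j in Figure 7.17 can be sketched sequentially as below: a vertex i is taken from a queue, and each vertex j reachable from it has its distance replaced by di + wi,j when that is smaller, in which case j is queued for examination. The fixed vertex count NV, the NO_EDGE marker, and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

#define NV 6              /* assumed number of vertices (A to F) */
#define NO_EDGE INT_MAX   /* marks a missing edge or an unreached vertex */

/* Shortest distances from `source` over adjacency matrix w (weights >= 0). */
void moore_shortest_paths(int w[NV][NV], int source, int dist[NV])
{
    int queue[NV];            /* circular queue of vertices to examine */
    int in_queue[NV] = {0};   /* a vertex is queued at most once at a time */
    int head = 0, count = 0;

    assert(source >= 0 && source < NV);
    for (int v = 0; v < NV; v++)
        dist[v] = NO_EDGE;
    dist[source] = 0;
    queue[0] = source;
    count = 1;
    in_queue[source] = 1;

    while (count > 0) {
        int i = queue[head];
        head = (head + 1) % NV;
        count--;
        in_queue[i] = 0;
        for (int j = 0; j < NV; j++) {
            if (w[i][j] == NO_EDGE)
                continue;
            int candidate = dist[i] + w[i][j];
            if (candidate < dist[j]) {          /* found a shorter route to j */
                dist[j] = candidate;
                if (!in_queue[j]) {
                    queue[(head + count) % NV] = j;
                    count++;
                    in_queue[j] = 1;
                }
            }
        }
    }
}
```

Filling the matrix with the edges listed in Figure 7.16 (A to B 10; B to C 8, D 13, E 24, F 51; C to D 14; D to E 9; E to F 17) and starting from A gives a distance of 49 to F, by way of B, D, and E.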


Figure 7.18 Distributed graph search. (Labels: master process; start at source vertex; process A, process B, process C, and other processes, each holding a vertex with w[] and dist; new distance.)


Figure 7.19 Sample maze for Problem 7-9. (Labels: entrance; exit; search path.)


Figure 7.20 Plan of rooms for Problem 7-10. (Labels: entrance; gold.)


Figure 7.21 Graph representation for Problem 7-10. (Labels: room A; room B; door.)


Figure 8.1 Shared memory multiprocessor using a single bus. (Labels: processors; cache; bus; memory modules.)


TABLE 8.1 SOME EARLY PARALLEL PROGRAMMING LANGUAGES

Language            Originator/date                    Comments
Concurrent Pascal   Brinch Hansen, 1975 (a)            Extension to Pascal
Ada                 U.S. Dept. of Defense, 1979 (b)    Completely new language
Modula-P            Bräunl, 1986 (c)                   Extension to Modula 2
C*                  Thinking Machines, 1987 (d)        Extension to C for SIMD systems
Concurrent C        Gehani and Roome, 1989 (e)         Extension to C
Fortran D           Fox et al., 1990 (f)               Extension to Fortran for data parallel programming

a. Brinch Hansen, P. (1975), "The Programming Language Concurrent Pascal," IEEE Trans. Software Eng., Vol. 1, No. 2 (June), pp. 199–207.
b. U.S. Department of Defense (1981), "The Programming Language Ada Reference Manual," Lecture Notes in Computer Science, No. 106, Springer-Verlag, Berlin.
c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ. Stuttgart, Germany.
d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Documentation.
e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New Jersey.
f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.


Figure 8.2 FORK-JOIN construct. (Labels: main program; spawned processes; FORK and JOIN points.)


Figure 8.3 Differences between a process and threads. (a) Process: code, heap, files, and interrupt routines, with one IP and one stack. (b) Threads: code, heap, files, and interrupt routines, with a separate IP and stack for each thread.


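Figure 8.4 pthread_create() and pthread_join().

Main program:
    pthread_create(&thread1, NULL, proc1, &arg);
    pthread_join(thread1, &status);   /* status is a void * that receives the value returned by proc1 */

thread1:
    proc1(&arg)
    {
        return(status);               /* a void * handed back to pthread_join() */
    }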


Figure 8.5 Detached threads. (Labels: main program calling pthread_create() three times; three threads, each running to its own termination.)


Figure 8.6 Conflict in accessing shared variable. (Labels: shared variable, x; process 1 and process 2, each with read, +1, and write.)


Figure 8.7 Control of critical sections through busy waiting.

Process 1:
    while (lock == 1) do_nothing;
    lock = 1;
    Critical section
    lock = 0;

Process 2:
    while (lock == 1) do_nothing;
    lock = 1;
    Critical section
    lock = 0;
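In Figure 8.7 the test of lock and the assignment lock = 1 are separate statements, so for the scheme to work they must be carried out as one indivisible operation. The sketch below shows one way to get that in C, using the C11 atomic_flag type as a stand-in for a hardware test-and-set instruction; the type and function names other than the <stdatomic.h> calls are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_flag held;   /* set while some thread is in the critical section */
} spin_lock_t;

void spin_init(spin_lock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* Atomically test and set the flag; true if the caller now holds the lock. */
bool spin_try_lock(spin_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy wait, as in "while (lock == 1) do_nothing;" followed by "lock = 1;". */
void spin_lock(spin_lock_t *l)
{
    while (!spin_try_lock(l)) {
        /* do nothing */
    }
}

/* Corresponds to "lock = 0;". */
void spin_unlock(spin_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A thread would call spin_lock() before its critical section and spin_unlock() after it. Busy waiting of this kind occupies the processor while it waits, so it suits only short critical sections.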


Figure 8.8 Deadlock (deadly embrace). (a) Two-process deadlock: processes P1 and P2; resources R1 and R2. (b) n-process deadlock: processes P1, P2, ..., Pn−1, Pn; resources R1, R2, ..., Rn−1, Rn.


Figure 8.9 False sharing in caches. (Labels: main memory; block with locations numbered 7 to 0; address tag; block in cache; cache of processor 1; cache of processor 2.)


Figure 8.10 Shared memory locations for Section 8.4.1 program example. (Labels: sum; array a[]; addr.)


Figure 8.11 Shared memory locations for Section 8.4.2 program example. (Labels: global_index; sum; array a[]; addr.)


Figure 8.12 Sample logic circuit. (Labels: inputs Test1, Test2, Test3; gates 1, 2, 3; outputs Output1, Output2.)

TABLE 8.2 LOGIC CIRCUIT DESCRIPTION FOR FIGURE 8.12

Gate   Function   Input 1   Input 2   Output
1      AND        Test1     Test2     Gate1
2      NOT        Gate1               Output1
3      OR         Test3     Gate1     Output2
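The circuit described in Table 8.2 is small enough to evaluate directly. The sketch below follows the three table rows in order; the struct, the function names, and the bit packing of test patterns are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool output1;
    bool output2;
} circuit_outputs;

/* Evaluate the three gates of Table 8.2 for one set of inputs. */
circuit_outputs evaluate_circuit(bool test1, bool test2, bool test3)
{
    circuit_outputs out;
    bool gate1 = test1 && test2;     /* gate 1: AND of Test1, Test2 */
    out.output1 = !gate1;            /* gate 2: NOT of Gate1        */
    out.output2 = test3 || gate1;    /* gate 3: OR of Test3, Gate1  */
    return out;
}

/* Evaluate a packed test pattern: bit 0 = Test1, bit 1 = Test2, bit 2 = Test3. */
circuit_outputs evaluate_pattern(unsigned pattern)
{
    assert(pattern < 8u);
    return evaluate_circuit((pattern & 1u) != 0, (pattern & 2u) != 0,
                            (pattern & 4u) != 0);
}
```

Calling evaluate_pattern() for patterns 0 to 7 covers every input combination; each call is independent of the others, so the patterns could be evaluated in parallel.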


Figure 8.13 River and frog for Problem 8-23. (Labels: river; frog; log; movement of logs.)


Figure 8.14 Thread pool for Problem 8-24. (Labels: master; slaves; pool of threads; request; signal; request serviced.)


Figure 9.1 Finding the rank in parallel. (Labels: compare a[i] with a[0], ..., a[i] with a[n-1]; increment counter, x; b[x] = a[i].)
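The compare, increment, and b[x] = a[i] steps of Figure 9.1 can be sketched as a sequential rank sort. The sketch assumes that the counter x counts the numbers smaller than a[i] and that all numbers are distinct (duplicates would collide in b[]); the function name is hypothetical.

```c
#include <assert.h>

/* Rank sort: place each a[i] at the position given by its rank.
 * Assumes the n values in a[] are all different. */
void rank_sort(const int *a, int *b, int n)
{
    assert(a != b);
    for (int i = 0; i < n; i++) {      /* each i could run on its own processor */
        int x = 0;                     /* counter: numbers smaller than a[i]    */
        for (int j = 0; j < n; j++) {
            if (a[i] > a[j])
                x++;
        }
        b[x] = a[i];
    }
}
```

The iterations of the outer loop do not depend on one another, which is what allows the rank of every number to be found in parallel.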


Figure 9.2 Parallelizing the rank computation. (Labels: compare a[i] with a[0], a[1], a[2], and a[3], each giving 0/1; a tree of adds giving 0/1/2 at the next level and 0/1/2/3/4 at the root.)


Figure 9.3 Rank sort using a master and slaves. (Labels: master; slaves; arrays a[] and b[]; read numbers; place selected number.)


Figure 9.4 Compare and exchange on a message-passing system: Version 1. (P1 holds A and P2 holds B. Sequence of steps: 1, Send(A) from P1 to P2; 2, P2 compares: if A > B load A else load B; 3, P2 replies: if A > B send(B) else send(A).)


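Figure 9.5 Compare and exchange on a message-passing system: Version 2. (P1 holds A and P2 holds B. Sequence of steps: 1, Send(A) from P1 to P2; 2, Send(B) from P2 to P1; 3, both processes compare, with the actions "If A > B load A" and "If A > B load B".)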


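Figure 9.6 Merging two sublists: Version 1. (P1 holds original numbers 88, 50, 28, 25 and P2 holds 98, 80, 43, 42. P2 receives P1's numbers and merges the two lists, keeps the higher numbers 98, 88, 80, 50, and returns the lower numbers 43, 42, 28, 25 to P1 as its final numbers.)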
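The merge step of Figure 9.6 can be sketched as a function that merges two sorted sublists and keeps only one half of the result. The sketch assumes both sublists hold n numbers in ascending order; the function name and the keep_high flag are hypothetical.

```c
#include <assert.h>

/* Merge two ascending lists of n numbers each and keep n of the 2n numbers:
 * the lower n if keep_high is 0, otherwise the higher n.
 * kept[] receives the result in ascending order. */
void merge_keep(const int *mine, const int *theirs, int n, int keep_high, int *kept)
{
    assert(n >= 0);
    if (!keep_high) {
        int i = 0, j = 0;                 /* walk up from the smallest numbers */
        for (int k = 0; k < n; k++)
            kept[k] = (mine[i] <= theirs[j]) ? mine[i++] : theirs[j++];
    } else {
        int i = n - 1, j = n - 1;         /* walk down from the largest numbers */
        for (int k = n - 1; k >= 0; k--)
            kept[k] = (mine[i] >= theirs[j]) ? mine[i--] : theirs[j--];
    }
}
```

With the numbers in the figure, the process keeping the lower half ends with 25, 28, 42, 43 and the process keeping the higher half ends with 50, 80, 88, 98.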


Figure 9.7 Merging two sublists: Version 2. (P1 holds original numbers 88, 50, 28, 25 and P2 holds 98, 80, 43, 42. Each process receives the other's numbers and merges the two lists; P1 keeps the lower numbers 43, 42, 28, 25 and P2 keeps the higher numbers 98, 88, 80, 50 as their final numbers.)


Figure 9.8 Steps in bubble sort. (Time runs down the rows.)

Original sequence: 4 2 7 8 5 1 3 6

Phase 1 (place largest number):
4 2 7 8 5 1 3 6
2 4 7 8 5 1 3 6
2 4 7 8 5 1 3 6
2 4 7 8 5 1 3 6
2 4 7 5 8 1 3 6
2 4 7 5 1 8 3 6
2 4 7 5 1 3 8 6

Phase 2 (place next largest number):
2 4 7 5 1 3 6 8
2 4 7 5 1 3 6 8
2 4 7 5 1 3 6 8
2 4 5 7 1 3 6 8
2 4 5 1 7 3 6 8
2 4 5 1 3 7 6 8

Phase 3:
2 4 5 1 3 6 7 8
2 4 5 1 3 6 7 8


Figure 9.9 Overlapping bubble sort actions in a pipeline. (Labels: phases 1 to 4, shown overlapping along the time axis.)


4 2 7 5 1 68 3

2 4 7 1 5 68 3

2 4 7 8 3 61 5

2 4 1 3 8 67 5

2 1 4 7 5 63 8

1 2 3 5 7 84 6

1 2 3 5 6 84 7

1 2 3 5 6 84 7

Step

1

2

3

4

5

6

7

0

Figure 9.10 Odd-even transposition sort sorting eight numbers.

P0 P1 P2 P3 P4 P5 P6 P7

Time
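The alternation between even and odd exchange steps in Figure 9.10 can be sketched sequentially. This is only an illustrative simulation on a single list (each list index stands in for one processor); the function name is hypothetical.

```python
def odd_even_transposition_sort(values):
    """Simulate odd-even transposition sort; index i plays the role of processor Pi."""
    a = list(values)
    n = len(a)
    for step in range(n):
        # Even steps pair (P0,P1), (P2,P3), ...; odd steps pair (P1,P2), (P3,P4), ...
        start = step % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([4, 2, 7, 8, 5, 1, 3, 6]))
```

In a parallel version, all pairs within one step would be exchanged at the same time; here the inner loop visits them one after another, which gives the same result because the pairs do not overlap.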

Figure 9.11 Snakelike sorted list.

[Figure: a mesh traversed in snakelike order from the smallest number to the largest number.]

Figure 9.12 Shearsort.

(a) Original placement of numbers
4 14 8 2
10 3 13 16
7 15 1 5
12 6 11 9

(b) Phase 1: row sort
2 4 8 14
16 13 10 3
1 5 7 15
12 11 9 6

(c) Phase 2: column sort
1 4 7 3
2 5 8 6
12 11 9 14
16 13 10 15

(d) Phase 3: row sort
1 3 4 7
8 6 5 2
9 11 12 14
16 15 13 10

(e) Phase 4: column sort
1 3 4 2
8 6 5 7
9 11 12 10
16 15 13 14

(f) Final phase: row sort
1 2 3 4
8 7 6 5
9 10 11 12
16 15 14 13
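The alternating row and column phases of Figure 9.12 can be sketched as follows. This is a sequential illustration under stated assumptions: the grid is square, rows are sorted in alternating directions (snakelike), and the number of phases is taken as log2(n) + 1 for n numbers, which matches the five phases shown in the figure. The function name `shearsort` is hypothetical.

```python
import math


def shearsort(grid):
    """Sort a square grid into snakelike order using alternating row/column phases."""
    g = [list(row) for row in grid]
    size = len(g)
    phases = math.ceil(math.log2(size * size)) + 1  # assumed phase count
    for phase in range(1, phases + 1):
        if phase % 2 == 1:
            # Row sort: even rows ascending, odd rows descending.
            for r in range(size):
                g[r].sort(reverse=(r % 2 == 1))
        else:
            # Column sort: every column ascending from top to bottom.
            for c in range(size):
                column = sorted(g[r][c] for r in range(size))
                for r in range(size):
                    g[r][c] = column[r]
    return g


original = [[4, 14, 8, 2], [10, 3, 13, 16], [7, 15, 1, 5], [12, 6, 11, 9]]
for row in shearsort(original):
    print(row)
```

Each row or column sort is done here with the built-in sort; on a mesh, those sorts would themselves be carried out in parallel, for example by odd-even transposition.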

Figure 9.13 Using the transpose operation to maintain operations in rows.

[Figure: (a) operations between elements in rows; (b) transpose operation; (c) operations between elements in rows (originally columns).]

Figure 9.14 Mergesort using tree allocation of processes.

[Figure: the unsorted list is divided down a tree of processes and merged back up to give the sorted list.]

Divide list (process allocation)
4 2 7 8 5 1 3 6 (P0)
4 2 7 8 | 5 1 3 6 (P0, P4)
4 2 | 7 8 | 5 1 | 3 6 (P0, P2, P4, P6)
4 | 2 | 7 | 8 | 5 | 1 | 3 | 6 (P0 to P7)

Merge (process allocation)
2 4 | 7 8 | 1 5 | 3 6 (P0, P2, P4, P6)
2 4 7 8 | 1 3 5 6 (P0, P4)
1 2 3 4 5 6 7 8 (P0)

Figure 9.15 Quicksort using tree allocation of processes.

[Figure: the unsorted list 4 2 7 8 5 1 3 6 is split about a pivot into 3 2 1 and 4 5 7 8 6, and the sublists are split again at each level of a tree of processes (P0, P1, P2, P4, P6, and P7 appear in the process allocation) until the sorted list is obtained.]

Figure 9.16 Quicksort showing pivot withheld in processes.

[Figure: the unsorted list 4 2 7 8 5 1 3 6 is split into 3 2 1 and 5 7 8 6 with the pivot 4 withheld in the process; further pivots are withheld at each level until the sorted list is formed.]

Figure 9.17 Work pool implementation of quicksort.

[Figure: slave processes request sublists from a work pool of sublists and return sublists to it.]

Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000.

[Figure: nodes 000, 001, 010, 011, 100, 101, 110, 111 shown in three phases.]

(a) Phase 1: numbers divided into ≤ p1 and > p1
(b) Phase 2: numbers divided into ≤ p2, > p2, ≤ p3, > p3
(c) Phase 3: numbers divided into ≤ p4, > p4, ≤ p5, > p5, ≤ p6, > p6, ≤ p7, > p7

Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes.

[Figure: nodes 000, 001, 010, 011, 100, 101, 110, 111 shown in three phases.]

(a) Phase 1: broadcast pivot p1; numbers divided into ≤ p1 and > p1
(b) Phase 2: broadcast pivots p2 and p3; numbers divided into ≤ p2, > p2, ≤ p3, > p3
(c) Phase 3: broadcast pivots p4, p5, p6, and p7; numbers divided into ≤ p4, > p4, ≤ p5, > p5, ≤ p6, > p6, ≤ p7, > p7

Figure 9.20 Hypercube quicksort communication.

[Figure: three drawings of a three-dimensional hypercube with nodes 000 to 111: (a) Phase 1 communication; (b) Phase 2 communication; (c) Phase 3 communication.]

Figure 9.21 Quicksort hypercube algorithm with Gray code ordering.

[Figure: nodes in Gray code order 000, 001, 011, 010, 110, 111, 101, 100, shown in three phases.]

(a) Phase 1: broadcast pivot p1; numbers divided into ≤ p1 and > p1
(b) Phase 2: broadcast pivots p2 and p3; numbers divided into ≤ p2, > p2, ≤ p3, > p3
(c) Phase 3: broadcast pivots p4, p5, p6, and p7; numbers divided into ≤ p4, > p4, ≤ p5, > p5, ≤ p6, > p6, ≤ p7, > p7
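The node order used in Figure 9.21 (000, 001, 011, 010, 110, 111, 101, 100) is the reflected binary Gray code. A small sketch, using the standard formula `i ^ (i >> 1)`, generates that order; the helper name `gray_code` is chosen here for illustration.

```python
def gray_code(i):
    """Return the i-th reflected binary Gray code value."""
    return i ^ (i >> 1)


order = [format(gray_code(i), "03b") for i in range(8)]
print(order)
```

Consecutive entries differ in exactly one bit, so neighbouring positions in the ordering are also directly connected nodes in the hypercube.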

Figure 9.22 Odd-even merging of two sorted lists.

[Figure: the elements at odd indices of both lists are merged into c[], the elements at even indices into d[], and a final compare-and-exchange stage produces e[].]

Sorted lists: a[] = 2 4 5 8, b[] = 1 3 6 7
Merge of odd indices: c[] = 1 2 5 6
Merge of even indices: d[] = 3 4 7 8
Compare and exchange
Final sorted list: e[] = 1 2 3 4 5 6 7 8
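The merge in Figure 9.22 can be written recursively. The sketch below is an illustration only, assuming both input lists are sorted, have the same length, and that length is a power of two; the function name `odd_even_merge` is hypothetical. Indices are 1-based in the figure, so "odd indices" correspond to Python positions 0, 2, 4, and so on.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length by odd-even merging."""
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])  # odd-indexed elements a1, a3, ... and b1, b3, ...
    d = odd_even_merge(a[1::2], b[1::2])  # even-indexed elements a2, a4, ... and b2, b4, ...
    e = [c[0]]                            # the smallest element needs no comparison
    for i in range(1, len(c)):
        low, high = sorted((c[i], d[i - 1]))  # compare and exchange c[i] with d[i-1]
        e.extend([low, high])
    e.append(d[-1])                       # the largest element needs no comparison
    return e


print(odd_even_merge([2, 4, 5, 8], [1, 3, 6, 7]))
```

For the lists in the figure, the two recursive merges give 1 2 5 6 and 3 4 7 8 before the final compare-and-exchange stage.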

Figure 9.23 Odd-even mergesort.

[Figure: inputs a1, a2, a3, a4, …, an−1, an and b1, b2, b3, b4, …, bn−1, bn feed an odd mergesort and an even mergesort; their outputs pass through compare-and-exchange units to produce c1, c2, c3, c4, c5, c6, c7, …, c2n−2, c2n−1, c2n.]

Figure 9.24 Bitonic sequences.

[Figure: two plots of value against a0, a1, a2, a3, …, an−2, an−1: (a) single maximum; (b) single maximum and single minimum.]

Figure 9.25 Creating two bitonic sequences from one bitonic sequence.

Bitonic sequence: 3 5 8 9 7 4 2 1
Compare and exchange
Two bitonic sequences: 3 4 2 1 | 7 5 8 9

Figure 9.26 Sorting a bitonic sequence.

Unsorted numbers: 3 5 8 9 7 4 2 1
Compare and exchange: 3 4 2 1 7 5 8 9
Compare and exchange: 2 1 3 4 7 5 8 9
Sorted list: 1 2 3 4 5 7 8 9

Figure 9.27 Bitonic mergesort.

[Figure: unsorted numbers pass through successive bitonic sorting operations, with arrows showing the direction of increasing numbers, to give the sorted list.]

Figure 9.28 Bitonic mergesort on eight numbers.

[Figure: each step is a compare and exchange of ai with ai+n/2 (n numbers), with lower and higher outputs; shaded groups mark bitonic lists as in Fig. 9.24 (a) or (b).]

Unsorted numbers: 8 3 4 7 9 2 1 5

Form bitonic lists of four numbers
Step 1 (n = 2, ai with ai+1): 3 8 7 4 2 9 5 1

Form bitonic list of eight numbers
Step 2 (split, n = 4, ai with ai+2): 3 4 7 8 5 9 2 1
Step 3 (sort, n = 2, ai with ai+1): 3 4 7 8 9 5 2 1

Sort bitonic list
Step 4 (split, n = 8, ai with ai+4): 3 4 2 1 9 5 7 8
Step 5 (split, n = 4, ai with ai+2): 2 1 3 4 7 5 9 8
Step 6 (sort, n = 2, ai with ai+1): 1 2 3 4 5 7 8 9
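The six compare-and-exchange steps of Figure 9.28 can be traced with a sequential sketch. This is an illustrative simulation, assuming the list length is a power of two; the function name is hypothetical, and the index arithmetic (`i ^ distance` for the partner, `i & block` for the direction) is one common way to express the network, not necessarily the book's formulation.

```python
def bitonic_mergesort_steps(values):
    """Return the list contents after every compare-and-exchange step."""
    a = list(values)
    n = len(a)
    steps = []
    block = 2                      # size of the bitonic lists being sorted
    while block <= n:
        distance = block // 2      # a[i] is compared with a[i + distance]
        while distance >= 1:
            for i in range(n):
                partner = i ^ distance
                if partner > i:
                    ascending = (i & block) == 0  # direction alternates per block
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            steps.append(list(a))
            distance //= 2
        block *= 2
    return steps


for row in bitonic_mergesort_steps([8, 3, 4, 7, 9, 2, 1, 5]):
    print(row)
```

For eight numbers the outer loop runs for block sizes 2, 4, and 8, giving 1 + 2 + 3 = 6 steps, the same count as in the figure.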

Figure 9.29 Compare-and-exchange algorithm for Problem 9-5.

[Figure: Steps 1 to 3 of exchanging numbers between two lists. The list pairs shown are 88 50 28 25 with 98 80 43 42; 43 42 28 25 with 98 88 80 50; and 50 42 28 25 with 98 88 80 43. Terminates when insertions are at the top/bottom of the lists.]

Figure 10.1 An n × m matrix.

[Figure: elements arranged by row and column.]

a0,0 a0,1 … a0,m−2 a0,m−1
a1,0 a1,1 … a1,m−2 a1,m−1
…
an−2,0 an−2,1 … an−2,m−2 an−2,m−1
an−1,0 an−1,1 … an−1,m−2 an−1,m−1

Figure 10.2 Matrix multiplication, C = A × B.

[Figure: A × B = C. Row i of A is multiplied element by element with column j of B, and the results are summed to give ci,j.]

Figure 10.3 Matrix-vector multiplication, c = A × b.

[Figure: A × b = c. Row i of A is multiplied with b and summed (row sum) to give ci.]

Figure 10.4 Block matrix multiplication.

[Figure: A × B = C with A, B, and C divided into submatrices. Submatrix row p of A is multiplied with submatrix column q of B, and the results are summed.]

Figure 10.5 Submatrix multiplication.

(a) Matrices

A =
a0,0 a0,1 a0,2 a0,3
a1,0 a1,1 a1,2 a1,3
a2,0 a2,1 a2,2 a2,3
a3,0 a3,1 a3,2 a3,3

B =
b0,0 b0,1 b0,2 b0,3
b1,0 b1,1 b1,2 b1,3
b2,0 b2,1 b2,2 b2,3
b3,0 b3,1 b3,2 b3,3

(b) Multiplying A0,0 × B0,0 to obtain C0,0

A0,0 × B0,0 + A0,1 × B1,0, where

A0,0 =
a0,0 a0,1
a1,0 a1,1

B0,0 =
b0,0 b0,1
b1,0 b1,1

A0,1 =
a0,2 a0,3
a1,2 a1,3

B1,0 =
b2,0 b2,1
b3,0 b3,1

A0,0 × B0,0 =
a0,0b0,0+a0,1b1,0   a0,0b0,1+a0,1b1,1
a1,0b0,0+a1,1b1,0   a1,0b0,1+a1,1b1,1

A0,1 × B1,0 =
a0,2b2,0+a0,3b3,0   a0,2b2,1+a0,3b3,1
a1,2b2,0+a1,3b3,0   a1,2b2,1+a1,3b3,1

C0,0 =
a0,0b0,0+a0,1b1,0+a0,2b2,0+a0,3b3,0   a0,0b0,1+a0,1b1,1+a0,2b2,1+a0,3b3,1
a1,0b0,0+a1,1b1,0+a1,2b2,0+a1,3b3,0   a1,0b0,1+a1,1b1,1+a1,2b2,1+a1,3b3,1
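The submatrix scheme of Figure 10.5 can be sketched with plain nested lists. This is a minimal illustration, assuming square matrices whose size is divisible by the number of submatrix rows/columns `s`; the function name `block_matmul` and the parameter `s` are hypothetical.

```python
def block_matmul(a, b, s):
    """Multiply n x n matrices by treating them as s x s grids of submatrices."""
    n = len(a)
    m = n // s  # each submatrix is m x m
    c = [[0] * n for _ in range(n)]
    for p in range(s):              # submatrix row of C
        for q in range(s):          # submatrix column of C
            for r in range(s):      # accumulate C[p][q] += A[p][r] x B[r][q]
                for i in range(p * m, (p + 1) * m):
                    for j in range(q * m, (q + 1) * m):
                        for k in range(r * m, (r + 1) * m):
                            c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
for row in block_matmul(a, b, 2):
    print(row)
```

With `s = 2`, the top-left 2 × 2 block of the result is A0,0 × B0,0 + A0,1 × B1,0, as in part (b) of the figure. Each of the `s * s` submatrix products for a given `(p, q)` pair is independent work that could be assigned to a separate process.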

Figure 10.6 Direct implementation of matrix multiplication.

[Figure: processor Pi,j receives row i of A, a[i][], and column j of B, b[][j], and computes c[i][j].]

Figure 10.7 Accumulation using a tree construction.

[Figure: P0, P1, P2, and P3 compute the products a0,0 × b0,0, a0,1 × b1,0, a0,2 × b2,0, and a0,3 × b3,0; P0 and P2 add pairs of products; P0 adds the two partial sums to give c0,0.]

Figure 10.8 Submatrix multiplication and summation.

[Figure: A, B, and C are each partitioned into four submatrices (App, Apq, Aqp, Aqq; Bpp, Bpq, Bqp, Bqq; Cpp, Cpq, Cqp, Cqq). Processors P0 to P7 multiply submatrices, and the results are summed in pairs: P0 + P1, P2 + P3, P4 + P5, and P6 + P7.]

Figure 10.9 Movement of A and B elements.

[Figure: mesh of processors; elements of A and of B move through processor Pi,j at row i, column j.]

Figure 10.10 Step 2: Alignment of elements of A and B.

[Figure: row i of A is shifted i places and column j of B is shifted j places, so that the processor at row i, column j holds ai,j+i and bi+j,j.]

Figure 10.11 Step 4: One-place shift of elements of A and B.

[Figure: the elements of A and of B at processor Pi,j (row i, column j) are each shifted one place.]
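The alignment and one-place shifts in Figures 10.10 and 10.11 can be simulated on a single machine. The sketch below follows the scheme commonly known as Cannon's algorithm, with one element per simulated processor; it assumes square matrices and wraparound (modulo n) shifts, and the function name `cannon_matmul` is hypothetical.

```python
def cannon_matmul(a, b):
    """Simulate mesh matrix multiplication with alignment and one-place shifts."""
    n = len(a)
    # Alignment: processor (i, j) starts with a[i][j+i] and b[i+j][j] (indices mod n).
    held_a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    held_b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every processor multiplies the pair it currently holds and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += held_a[i][j] * held_b[i][j]
        # One-place shift: A moves along rows, B moves along columns (with wraparound).
        held_a = [[held_a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        held_b = [[held_b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
for row in cannon_matmul(a, b):
    print(row)
```

After the alignment, processor (i, j) sees the pair a[i][k], b[k][j] with k = (i + j + t) mod n at shift t, so over n shifts it accumulates every term of c[i][j] exactly once.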

Figure 10.12 Matrix multiplication using a systolic array.

[Figure: a 4 × 4 array of cells accumulating c0,0 to c3,3. The rows of A (ai,0 entering first, then ai,1, ai,2, ai,3) and the columns of B (b0,j entering first, then b1,j, b2,j, b3,j) are pumped through the array, with each successive row and column delayed by one cycle.]

Figure 10.13 Matrix-vector multiplication using a systolic array.

[Figure: a column of four cells accumulating c0 to c3. The vector elements b0, b1, b2, b3 and the rows of A (ai,0 entering first, then ai,1, ai,2, ai,3) are pumped through the cells.]

Figure 10.14 Gaussian elimination.

[Figure: matrix with rows and columns marked. Columns to the left of column i are already cleared to zero below the diagonal. Using row i, the algorithm steps through each row j below it and clears aji to zero in column i.]
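The elimination step pictured in Figure 10.14 can be sketched sequentially as below. This is a minimal illustration, assuming a square system A x = b with nonzero diagonal entries at every stage (no pivoting is performed, which a robust solver would need); the function name `gaussian_solve` is hypothetical, and the back substitution is added only so the result can be checked.

```python
def gaussian_solve(a, b):
    """Solve A x = b by Gaussian elimination without pivoting (assumes nonzero pivots)."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies
    b = list(b)
    for i in range(n - 1):              # row i is used to clear column i
        for j in range(i + 1, n):       # step through the rows below row i
            m = a[j][i] / a[i][i]
            for k in range(i, n):
                a[j][k] -= m * a[i][k]  # a[j][i] becomes zero
            b[j] -= m * b[i]
    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - known) / a[i][i]
    return x


print(gaussian_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The updates to different rows j in the inner loop do not depend on one another, which is what allows row i to be broadcast and the rows below it to be processed in parallel.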

Figure 10.15 Broadcast in parallel implementation of Gaussian elimination.

[Figure: matrix with rows and columns marked; the part already cleared to zero lies to the left. The ith row is broadcast: n − i + 1 elements (including b[i]).]

Figure 10.16 Pipeline implementation of Gaussian elimination.

[Figure: processors P0, P1, P2, …, Pn−1 in a pipeline, each holding a row; broadcast rows are passed along the pipeline.]

Figure 10.17 Strip partitioning.

[Figure: the rows are divided into strips at rows 0, n/p, 2n/p, and 3n/p, with one strip assigned to each of the processors P0, P1, P2, and P3.]

Figure 10.18 Cyclic partitioning to equalize workload.

[Figure: rows assigned cyclically to processors P0, P1, and so on; row markers at 0, n/p, 2n/p, and 3n/p.]

Figure 10.19 Finite difference method.

[Figure: solution space f(x, y) over the x and y axes, divided into a grid with spacing ∆ in each direction.]

Figure 10.20 Mesh of points numbered in natural order.

[Figure: a 10 × 10 mesh of points numbered row by row, with boundary points marked (see text).]

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
x11 x12 x13 x14 x15 x16 x17 x18 x19 x20
x21 x22 x23 x24 x25 x26 x27 x28 x29 x30
x31 x32 x33 x34 x35 x36 x37 x38 x39 x40
x41 x42 x43 x44 x45 x46 x47 x48 x49 x50
x51 x52 x53 x54 x55 x56 x57 x58 x59 x60
x61 x62 x63 x64 x65 x66 x67 x68 x69 x70
x71 x72 x73 x74 x75 x76 x77 x78 x79 x80
x81 x82 x83 x84 x85 x86 x87 x88 x89 x90
x91 x92 x93 x94 x95 x96 x97 x98 x99 x100

Figure 10.21 Sparse matrix for Laplace's equation.

[Figure: A × x = right-hand side vector, with x = (x1, x2, …, xN−1, xN). In the ith equation (row i of A), the diagonal entry ai,i is −4 and the entries ai,i−n, ai,i−1, ai,i+1, and ai,i+n are 1; all other entries are zero. The right-hand side is zero, apart from changes to include boundary values and some zero entries (see text). Those equations with a boundary point on the diagonal are unnecessary for the solution.]

Figure 10.22 Gauss-Seidel relaxation with natural order, computed sequentially.

[Figure: mesh showing the sequential order of computation, distinguishing points already computed from points still to be computed.]

Figure 10.23 Red-black ordering.

[Figure: mesh of points coloured alternately red and black, as on a checkerboard.]
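Red-black ordering can be illustrated with a small relaxation sketch for Laplace's equation, in which each interior point is replaced by the average of its four neighbours. This is a minimal sequential illustration, assuming fixed boundary values and taking "red" to mean points where `i + j` is even (the colour assignment is a convention chosen here); the function name is hypothetical.

```python
def red_black_iteration(grid):
    """Perform one red sweep followed by one black sweep, updating grid in place."""
    rows, cols = len(grid), len(grid[0])
    for colour in (0, 1):  # 0: points with i + j even ("red"); 1: i + j odd ("black")
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if (i + j) % 2 == colour:
                    grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                         + grid[i][j - 1] + grid[i][j + 1])


# 5 x 5 mesh: boundary held at 100.0, interior initially 0.0.
grid = [[100.0] * 5 for _ in range(5)]
for i in range(1, 4):
    for j in range(1, 4):
        grid[i][j] = 0.0
for _ in range(200):
    red_black_iteration(grid)
print(grid[2][2])
```

Every red point depends only on black neighbours and vice versa, so all points of one colour can be updated simultaneously, which is the property that makes this ordering attractive for parallel Gauss-Seidel relaxation.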


Figure 10.24 Nine-point stencil.

Figure 10.25 Multigrid processor allocation.

[Figure: processors allocated to grids of coarsest grid points and finer grid points.]

Figure 10.26 Printed circuit board for Problem 10-18.

[Figure: board with regions at 40°C, 50°C, and 60°C. Ambient temperature at edges of board = 20°C.]

Figure 11.1 Pixmap.

[Figure: image with origin (0, 0) and axes i and j; p(i, j) is a picture element (pixel).]

Figure 11.2 Image histogram.

[Figure: number of pixels plotted against gray level, from 0 to 255.]

Figure 11.3 Pixel values for a 3 × 3 group.

x0 x1 x2
x3 x4 x5
x6 x7 x8

Figure 11.4 Four-step data transfer for the computation of mean.

Step 1: Each pixel adds pixel from left
Step 2: Each pixel adds pixel from right
Step 3: Each pixel adds pixel from above
Step 4: Each pixel adds pixel from below

Figure 11.5 Parallel mean data accumulation.

[Figure: four panels showing the 3 × 3 group x0 to x8 and the sums held in the middle column.]

(a) Step 1: the middle pixel of each row holds x0 + x1, x3 + x4, and x6 + x7 respectively.
(b) Step 2: the middle pixel of each row holds x0 + x1 + x2, x3 + x4 + x5, and x6 + x7 + x8 respectively.
(c) Step 3: the centre pixel adds the row sum from above, giving (x0 + x1 + x2) + (x3 + x4 + x5).
(d) Step 4: the centre pixel adds the row sum from below, giving (x0 + x1 + x2) + (x3 + x4 + x5) + (x6 + x7 + x8).
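The four-step accumulation of Figures 11.4 and 11.5 can be sketched for a whole image. This is a sequential illustration in which every interior pixel performs the same four additions; border pixels are simply left unchanged here, which is an assumption of this sketch rather than something the figures specify. The function name `four_step_mean` is hypothetical.

```python
def four_step_mean(image):
    """Compute the 3 x 3 mean of each interior pixel using row sums then column sums."""
    rows, cols = len(image), len(image[0])
    # Steps 1 and 2: each pixel adds the original values of its left and right neighbours.
    row_sum = [row[:] for row in image]
    for i in range(rows):
        for j in range(1, cols - 1):
            row_sum[i][j] = image[i][j - 1] + image[i][j] + image[i][j + 1]
    # Steps 3 and 4: each pixel adds the row sums held by the pixels above and below.
    mean = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = row_sum[i - 1][j] + row_sum[i][j] + row_sum[i + 1][j]
            mean[i][j] = total / 9
    return mean


image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(four_step_mean(image)[1][1])
```

Accumulating along rows first and then along columns needs four neighbour transfers per pixel instead of eight, which is the point of the four-step scheme.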

Figure 11.6 Approximate median algorithm requiring six steps.

[Figure: 3 × 3 group with markers for the largest in row, the next largest in row, and the next largest in column.]

Figure 11.7 Using a 3 × 3 weighted mask.

Mask:
w0 w1 w2
w3 w4 w5
w6 w7 w8

Pixels:
x0 x1 x2
x3 x4 x5
x6 x7 x8

Result: mask ⊗ pixels = x4'
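Applying a 3 × 3 weighted mask as in Figure 11.7 can be sketched as below: the new value x4' is the weighted sum of the nine pixels under the mask, multiplied by a scale factor. This is a minimal illustration; the function name `apply_mask` and the parameter name `scale` are hypothetical, and leaving border pixels unchanged is an assumption of the sketch.

```python
def apply_mask(image, mask, scale):
    """Return a new image in which each interior pixel is the scaled weighted sum."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]  # border pixels are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # mask[0][0] is w0 (top-left weight), mask[2][2] is w8.
                    total += mask[di + 1][dj + 1] * image[i + di][j + dj]
            result[i][j] = scale * total
    return result


mean_mask = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(apply_mask(image, mean_mask, 1 / 9)[1][1])
```

With all weights equal to 1 and a scale of 1/9 the mask computes the mean; other weights and scale factors give different filters, and every pixel can be computed independently of the others.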

Figure 11.8 Mask to compute mean.

1 1 1
1 1 1
1 1 1

k = 1/9

Figure 11.9 A noise reduction mask.

1 1 1
1 8 1
1 1 1

k = 1/16

Figure 11.10 High-pass sharpening filter mask.

−1 −1 −1
−1 8 −1
−1 −1 −1

k = 1/9

Figure 11.11 Edge detection using differentiation.

[Figure: three plots: intensity transition, first derivative, and second derivative.]

Figure 11.12 Gray level gradient and direction.

[Figure: image f(x, y) with x and y axes, showing a line of constant intensity and the gradient at angle φ.]

Figure 11.13 Prewitt operator.

[Figure: two 3 × 3 masks containing the values −1, 0, and 1. The layout below is the conventional form of the operator; the exact orientation and sign arrangement in the original figure cannot be recovered from the extracted text.]

−1 0 1
−1 0 1
−1 0 1

−1 −1 −1
0 0 0
1 1 1

Figure 11.14 Sobel operator.

−1  0  1        −1 −2 −1
−2  0  2         0  0  0
−1  0  1         1  2  1
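To illustrate how a pair of masks like those in Figure 11.14 yields an edge strength and direction, here is a hedged Python sketch. It assumes the commonly used Sobel weights (1, 2, 1 on either side of a zero row or column) and approximates the gradient magnitude as |gx| + |gy|; the names SOBEL_X, SOBEL_Y, weighted_sum, and sobel_at are illustrative only.

```python
import math

# Assumed Sobel masks: SOBEL_X responds to left-right intensity change,
# SOBEL_Y to top-bottom intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def weighted_sum(image, mask, i, j):
    """Weighted sum of the 3 x 3 neighborhood centered on pixel (i, j)."""
    return sum(mask[di + 1][dj + 1] * image[i + di][j + dj]
               for di in (-1, 0, 1) for dj in (-1, 0, 1))


def sobel_at(image, i, j):
    """Return (approximate gradient magnitude, gradient angle in radians)
    for an interior pixel (i, j)."""
    gx = weighted_sum(image, SOBEL_X, i, j)
    gy = weighted_sum(image, SOBEL_Y, i, j)
    return abs(gx) + abs(gy), math.atan2(gy, gx)
```

Thresholding the magnitude returned for every interior pixel gives an edge map of the kind shown for the Sobel operator.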

Figure 11.15 Edge detection with Sobel operator.

(a) Original image (Annabel) (b) Effect of Sobel operator

Figure 11.16 Laplace operator.

 0 −1  0
−1  4 −1
 0 −1  0

Figure 11.17 Pixels used in Laplace operator: x1 (upper pixel), x3 (left pixel), x4 (center pixel), x5 (right pixel), x7 (lower pixel).
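With the pixel numbering of Figure 11.17, and assuming the usual Laplace weights of 4 at the center and −1 at the four neighbors, the operator at one pixel reduces to a single expression. The sketch below is illustrative; laplace_at is a hypothetical helper name.

```python
def laplace_at(image, i, j):
    """Laplace response at interior pixel (i, j):
    4*x4 - (x1 + x3 + x5 + x7)."""
    x4 = image[i][j]        # center pixel
    x1 = image[i - 1][j]    # upper pixel
    x7 = image[i + 1][j]    # lower pixel
    x3 = image[i][j - 1]    # left pixel
    x5 = image[i][j + 1]    # right pixel
    return 4 * x4 - (x1 + x3 + x5 + x7)
```

The response is zero wherever the center pixel equals the average of its four neighbors, so flat regions and uniform ramps are suppressed.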

Figure 11.18 Effect of Laplace operator.

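Figure 11.19 Mapping a line into (a, b) space.
(a) (x, y) plane: the line y = ax + b passes through the image pixel (x1, y1).
(b) Parameter space: the pixel (x1, y1) maps to the line b = −x1a + y1 (in general, b = −xa + y), which passes through the point (a, b).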

Figure 11.20 Mapping a line into (r, θ) space.
(a) (x, y) plane: the line y = ax + b, written in normal form as r = x cos θ + y sin θ.
(b) (r, θ) plane: the line maps to the point (r, θ).

Figure 11.21 Normal representation using image coordinate system. Labels: axes x and y, r, θ.

Figure 11.22 Accumulators, acc[r][θ], for the Hough transform. A grid of accumulators indexed by r (ticks at 0, 5, 10, 15) and θ (ticks at 0°, 10°, 20°, 30°).
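The accumulator array can be filled by letting every edge pixel vote, for each sampled θ, for the value r = x cos θ + y sin θ. The Python sketch below is one possible arrangement, not the book's code. It assumes θ is sampled every 10° from 0° to 350°, r is rounded to the nearest integer, and only votes with 0 ≤ r ≤ r_max are kept; hough_accumulate and its parameters are hypothetical names.

```python
import math


def hough_accumulate(edge_pixels, r_max, theta_step_deg=10):
    """Return acc[r][t], where t indexes theta = t * theta_step_deg degrees."""
    n_theta = 360 // theta_step_deg
    acc = [[0] * n_theta for _ in range(r_max + 1)]
    for x, y in edge_pixels:
        for t in range(n_theta):
            theta = math.radians(t * theta_step_deg)
            # Normal form of a line through (x, y) with normal angle theta.
            r = round(x * math.cos(theta) + y * math.sin(theta))
            if 0 <= r <= r_max:
                acc[r][t] += 1
    return acc
```

Collinear pixels all vote for the same (r, θ) cell, so a large count in acc marks a likely line. For ten pixels on the vertical line x = 5, the cell for r = 5, θ = 0° receives all ten votes.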

Figure 11.23 Two-dimensional DFT. The array x_jk, indexed by j and k, is transformed along its rows to give X_jm, which is then transformed along its columns to give X_lm.
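The row-then-column scheme of Figure 11.23 can be sketched with a direct one-dimensional DFT applied twice. This is an illustrative Python version under stated assumptions: the forward transform uses the kernel e^(−2πi jk/N) with no 1/N scaling (conventions differ on where the scaling goes), and dft and dft2d are names chosen here.

```python
import cmath


def dft(x):
    """Direct one-dimensional DFT (no 1/N scaling, an assumed convention)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft2d(image):
    """Two-dimensional DFT: transform every row, then every column."""
    rows_done = [dft(row) for row in image]               # x_jk -> X_jm
    cols_done = [dft(list(col)) for col in zip(*rows_done)]  # X_jm -> X_lm
    # cols_done is indexed [m][l]; transpose back to [l][m].
    return [list(row) for row in zip(*cols_done)]
```

The row transforms are independent of one another, as are the column transforms, which is what makes this decomposition attractive for parallel execution.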

Figure 11.24 Convolution using Fourier transforms.
(a) Direct convolution: the image f_j,k is convolved with the filter/image g_j,k to give h_j,k.
(b) Using Fourier transform: f(j, k) and g(j, k) are transformed to F(j, k) and G(j, k), multiplied to give H(j, k), and inverse transformed to give h(j, k).
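The equivalence in Figure 11.24 can be checked numerically. The sketch below uses one-dimensional sequences and circular convolution to keep it short (the figure itself concerns two-dimensional images), and it assumes a forward DFT without scaling and an inverse with a 1/N factor. All function names are illustrative.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT; the inverse uses the opposite exponent sign and 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
               for j in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def circular_convolve(f, g):
    """Path (a): direct circular convolution of two equal-length sequences."""
    n = len(f)
    return [sum(f[j] * g[(i - j) % n] for j in range(n)) for i in range(n)]


def convolve_via_dft(f, g):
    """Path (b): transform, multiply element by element, inverse transform."""
    F, G = dft(f), dft(g)
    H = [a * b for a, b in zip(F, G)]
    return [v.real for v in dft(H, inverse=True)]
```

Both paths give the same result up to rounding error; the transform path pays off once the transforms are computed with a fast algorithm.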

Figure 11.25 Master-slave approach for implementing the DFT directly. The master process supplies w^0, w^1, …, w^(n−1) to the slave processes, one slave per output element X[0], X[1], …, X[n−1].

Figure 11.26 One stage of a pipeline implementation of DFT algorithm. Process j holds x[j], receives X[k], a, and w^k, and passes on X[k] + a × x[j], a × w^k, and w^k as the values for the next iteration.

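Figure 11.27 Discrete Fourier transform with a pipeline.
(a) Pipeline structure: stages P0, P1, P2, P3, …, PN−1 hold x[0], x[1], x[2], x[3], …, x[N−1]. The values 0, 1, and w^k enter the first stage as X[k], a, and w^k, and the output sequence X[0], X[1], X[2], X[3], …, X[k] leaves the last stage.
(b) Timing diagram: pipeline stages P0, P1, P2, …, PN−2, PN−1 plotted against time, with X[0], X[1], X[2], X[3], X[4], X[5], X[6] moving through successive stages.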
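The pipeline stage of Figure 11.26 can be simulated sequentially to see that chaining N stages produces one DFT output. The following is a rough Python sketch, not a parallel implementation: the inner loop stands in for stages P0 to PN−1, and it assumes w = e^(−2πi/N) with no 1/N scaling. pipeline_stage and dft_via_pipeline are hypothetical names.

```python
import cmath


def pipeline_stage(X_k, a, w_k, x_j):
    """One stage: accumulate a * x[j] into X[k] and advance a by a factor w^k."""
    return X_k + a * x_j, a * w_k


def dft_via_pipeline(x):
    """Compute every X[k] by passing (X[k], a, w^k) through all stages."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    outputs = []
    for k in range(n):
        # Values entering the first stage: X[k] = 0, a = 1, and w^k.
        X_k, a, w_k = 0, 1, w ** k
        for j in range(n):          # stage j corresponds to process Pj
            X_k, a = pipeline_stage(X_k, a, w_k, x[j])
        outputs.append(X_k)
    return outputs
```

In a real pipeline the outer loop disappears: successive values of k enter the first stage one after another, so different stages work on different X[k] at the same time.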

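Figure 11.28 Decomposition of N-point DFT into two N/2-point DFTs. The input sequence x0, x1, x2, x3, …, xN−2, xN−1 is split between two N/2-point DFTs, which produce Xeven and Xodd. The transform outputs are Xk = Xeven + w^k Xodd and Xk+N/2 = Xeven − w^k Xodd, for k = 0, 1, …, N/2 − 1.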
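Applying the decomposition of Figure 11.28 recursively gives the familiar radix-2 fast Fourier transform. The sketch below is a minimal recursive Python version under stated assumptions: the input length is a power of two, w = e^(−2πi/N), and there is no 1/N scaling. The function name fft is chosen here for illustration.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # N/2-point DFT of even-indexed inputs
    odd = fft(x[1::2])    # N/2-point DFT of odd-indexed inputs
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # w^k * Xodd
        out[k] = even[k] + t            # Xk
        out[k + n // 2] = even[k] - t   # Xk+N/2
    return out
```

Each level of recursion halves the problem size, which is where the O(N log N) operation count comes from.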

Figure 11.29 Four-point discrete Fourier transform. Flow diagram with inputs x0, x1, x2, x3 and outputs X0, X1, X2, X3, connected through add (+) and subtract (−) operations.

Figure 11.30 Sixteen-point DFT decomposition.

Xk = Σ(0,2,4,6,8,10,12,14) + w^k Σ(1,3,5,7,9,11,13,15)
   = {Σ(0,4,8,12) + w^k Σ(2,6,10,14)} + w^k {Σ(1,5,9,13) + w^k Σ(3,7,11,15)}
   = {[Σ(0,8) + w^k Σ(4,12)] + w^k [Σ(2,10) + w^k Σ(6,14)]} + w^k {[Σ(1,9) + w^k Σ(5,13)] + w^k [Σ(3,11) + w^k Σ(7,15)]}

Resulting input order, with each index written in binary:

x0   x8   x4   x12  x2   x10  x6   x14  x1   x9   x5   x13  x3   x11  x7   x15
0000 1000 0100 1100 0010 1010 0110 1110 0001 1001 0101 1101 0011 1011 0111 1111
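The input ordering in Figure 11.30 is the bit-reversal permutation: the element at position p is the one whose index is p with its binary digits reversed. A short Python sketch follows; bit_reverse and bit_reversed_order are illustrative names, and n is assumed to be a power of two.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` binary digits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # shift the lowest bit of i onto r
        i >>= 1
    return r


def bit_reversed_order(n):
    """Input indices of an n-point FFT in bit-reversed order (n a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(p, bits) for p in range(n)]


print(bit_reversed_order(16))
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

For example, position 7 is 0111 in binary; reversed it is 1110, so x14 occupies that position.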

Figure 11.31 Sixteen-point FFT computational flow. Inputs, in bit-reversed order: x0, x8, x4, x12, x2, x10, x6, x14, x1, x9, x5, x13, x3, x11, x7, x15. Outputs, in natural order: X0 through X15.

Figure 11.32 Mapping processors onto 16-point FFT computation. Inputs x0, x8, x4, x12, x2, x10, x6, x14, x1, x9, x5, x13, x3, x11, x7, x15 and outputs X0 through X15 occupy sixteen rows labeled 0000 to 1111. Each label is read as process and row (P/r), and the rows are divided among processes P0, P1, P2, and P3.

Figure 11.33 FFT using transpose algorithm: first two steps. Array elements as laid out across the processors:

P0: x0, x4, x8, x12
P1: x1, x5, x9, x13
P2: x2, x6, x10, x14
P3: x3, x7, x11, x15

Figure 11.34 Transposing array for transpose algorithm. Array elements as laid out across the processors:

P0: x0, x4, x8, x12
P1: x1, x5, x9, x13
P2: x2, x6, x10, x14
P3: x3, x7, x11, x15

Figure 11.35 FFT using transpose algorithm: last two steps. Array elements as laid out across the processors:

P0: x0, x1, x2, x3
P1: x4, x5, x6, x7
P2: x8, x9, x10, x11
P3: x12, x13, x14, x15

Figure 11.36 Image for Problem 11-3. A grid with rows and columns numbered 1 to 7, with a mask marked on it.

Figure 12.1 State space tree. Levels: first choice, second choice, third choice. Branch labels: C0, C1, …, Cn−1, followed by "not including C0", "not including C1", …, "not including Cn−1".

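Figure 12.2 Single-point crossover. Each string has positions 1 to m and is cut between positions p and p + 1.

Parent A: A1 (positions 1 to p) | A2 (positions p + 1 to m)
Parent B: B1 (positions 1 to p) | B2 (positions p + 1 to m)
Child 1:  A1 | B2
Child 2:  B1 | A2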
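Single-point crossover as drawn in Figure 12.2 amounts to swapping the tails of the two parents after position p. A minimal Python sketch follows; the function name and the use of Python sequences (strings or lists) to represent individuals are assumptions for illustration.

```python
def single_point_crossover(parent_a, parent_b, p):
    """Cut both parents after the first p positions and swap the tails.

    Returns (child 1, child 2) = (A1 + B2, B1 + A2).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child1 = parent_a[:p] + parent_b[p:]
    child2 = parent_b[:p] + parent_a[p:]
    return child1, child2


print(single_point_crossover("11111", "00000", 2))
# ('11000', '00111')
```

In a genetic algorithm the cut point p would typically be chosen at random for each pair of parents.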

Figure 12.3 Island model. Subpopulations are connected by migration paths; every island sends to every other island.

Figure 12.4 Stepping stone model. Island subpopulations are connected by limited migration paths.

Figure D.1 PRAM model. Labels: program, instructions, clock, processors with local memory, data, shared memory.

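Figure D.2 List ranking by pointer jumping. Each list element i holds a value d[i] and a pointer s[i]; the last element points to Null. Values of d[0] to d[7] at each step:

Initially:    1 1 1 1 1 1 1 0
After step 1: 2 2 2 2 2 2 1 0
After step 2: 4 4 4 4 3 2 1 0
After step 3: 7 6 5 4 3 2 1 0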
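Pointer jumping as in Figure D.2 can be simulated sequentially by computing every element's new d and s from the old values, which mimics the synchronous steps of the PRAM. The Python sketch below is an illustration under stated assumptions: the list is given as an array of successor indices with None for the null pointer, and list_rank is a hypothetical name.

```python
def list_rank(succ):
    """Return d[i], the number of links from element i to the end of the list."""
    n = len(succ)
    s = list(succ)
    d = [0 if s[i] is None else 1 for i in range(n)]
    while any(p is not None for p in s):
        # Compute all new values from the old ones (one synchronous step).
        new_d, new_s = d[:], s[:]
        for i in range(n):          # conceptually, all i in parallel
            if s[i] is not None:
                new_d[i] = d[i] + d[s[i]]
                new_s[i] = s[s[i]]
        d, s = new_d, new_s
    return d


print(list_rank([1, 2, 3, 4, 5, 6, 7, None]))
# [7, 6, 5, 4, 3, 2, 1, 0]
```

Because each step doubles the distance a pointer spans, an eight-element list is ranked in three steps rather than seven.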

Figure D.3 A view of the bulk synchronous parallel model. Threads or processes each perform local computation (maximum time w), then communication (maximum of h sends or receives), then a barrier synchronization.

Figure D.4 LogP parameters. Labels: processors Pi and Pk, time axis, message, next message, and the parameters o (marked twice), L, and g.
