
Page 1:

12c.1

Collective Communication in MPI

UNC-Wilmington, C. Ferner, 2008

Nov 4, 2008

Page 2:

12c.2

Barrier

• A barrier is a way to synchronize all (or a subset) of the processors.

• When processors reach MPI_Barrier(), they block until all processors have reached the same barrier.

• All processors should call the barrier function, or else you have a deadlock.

• Syntax: MPI_Barrier(MPI_COMM_WORLD);

Page 3:

12c.3

Barrier

• Example:

MPI_Barrier(MPI_COMM_WORLD);
if (mypid == 0) {
    gettimeofday(&tv1, NULL);
}

... // Do some work

MPI_Barrier(MPI_COMM_WORLD);
if (mypid == 0) {
    gettimeofday(&tv2, NULL);
}
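A minimal, self-contained sketch of this timing idiom (the dummy work loop and the elapsed-time arithmetic are illustrative assumptions, not from the slide):

/* Barrier-based timing sketch; compile with mpicc, run with mpirun. */
#include <stdio.h>
#include <sys/time.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int mypid;
    struct timeval tv1, tv2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    MPI_Barrier(MPI_COMM_WORLD);            /* everyone starts the timed region together */
    if (mypid == 0) gettimeofday(&tv1, NULL);

    /* Do some work (placeholder busy loop standing in for real computation) */
    volatile double s = 0.0;
    for (long i = 0; i < 10000000L; i++) s += i * 0.5;

    MPI_Barrier(MPI_COMM_WORLD);            /* wait until the slowest process is done */
    if (mypid == 0) {
        gettimeofday(&tv2, NULL);
        double elapsed = (tv2.tv_sec - tv1.tv_sec)
                       + (tv2.tv_usec - tv1.tv_usec) / 1e6;
        printf("Elapsed time: %f seconds\n", elapsed);
    }

    MPI_Finalize();
    return 0;
}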

Page 4:

12c.4

Broadcast

• A broadcast is when one processor needs to send the same information to all (or a subset) of the other processors.

• Syntax: MPI_Bcast(buffer, count, datatype, root, MPI_COMM_WORLD)

• buffer, count, and datatype are the same as with MPI_Send()

• root is the rank (id) of the process initiating the broadcast

Page 5:

12c.5

Broadcast

• Example:

int N = ___;
float b = ____;
float a[N];

MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Bcast (a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);
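A minimal runnable sketch of the same pattern, with concrete values filled in purely for illustration (the slide leaves N and b blank). Note that every rank, root and non-root alike, makes the identical MPI_Bcast calls:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int mypid, N = 0;
    float b = 0.0f;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    if (mypid == 0) {        /* only the root knows the values initially */
        N = 8;               /* illustrative value, not from the slides */
        b = 3.14f;           /* illustrative value, not from the slides */
    }

    /* Every process calls MPI_Bcast with the same arguments. */
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);

    float *a = malloc(N * sizeof(float));
    if (mypid == 0)
        for (int i = 0; i < N; i++) a[i] = (float) i;  /* root fills the array */
    MPI_Bcast(a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);     /* now every rank has it */

    printf("<pid %d>: N = %d, b = %f, a[N-1] = %f\n", mypid, N, b, a[N - 1]);

    free(a);
    MPI_Finalize();
    return 0;
}

Broadcasting N first lets the non-root processes size their buffer before the array broadcast; the slide's float a[N] declaration assumes N is already known on every rank.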

Page 6:

12c.6

Broadcast

• All processors participating in the broadcast (whether they are the source or a destination) must call the broadcast function with the same parameters, or else it won't work correctly.

• The runtime of a broadcast is O(log(p)), where p is the number of processors, instead of the O(p) it would take if the root sent the data to each processor in turn.

Page 7:

12c.7

Broadcast

[Figure: tree-structured broadcast among eight processors. Initially only processor 0 has the data; after step 1 processors 0 and 4 have it; after step 2 processors 0, 2, 4, and 6; after step 3 all of 0-7, so the broadcast completes in log2(8) = 3 communication steps. The legend distinguishes communication from non-communication edges.]

Page 8:

12c.8

Reduction

• A reduction is where an array of values (one per processor) is reduced to a single value by applying a binary (usually commutative) operator:

MPI_MAX    maximum
MPI_MIN    minimum
MPI_SUM    sum
MPI_PROD   product
MPI_LAND   logical and
MPI_BAND   bit-wise and
MPI_LOR    logical or
MPI_BOR    bit-wise or
MPI_LXOR   logical xor
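These constants are passed as the MPI_Op argument of MPI_Reduce, whose syntax appears on slide 12c.10. As a small preview, here is a minimal sketch (all values purely illustrative) that uses MPI_MAX to find the largest per-process value on the root:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int mypid, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process contributes a different value (here derived from its rank). */
    int x = (mypid * 37) % 19;
    int biggest = 0;

    MPI_Reduce(&x, &biggest, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    if (mypid == 0)
        printf("Largest value across %d processes: %d\n", nprocs, biggest);

    MPI_Finalize();
    return 0;
}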

Page 9:

12c.9

Reduction

[Figure: sum reduction across eight processors P0-P7 holding the initial values 37, 16, 48, 2, 73, 44, 32, 11. In the first step neighboring pairs are added (37+16=53, 48+2=50, 73+44=117, 32+11=43), in the second step the partial sums combine (53+50=103, 117+43=160), and in the third step 103+160=263, the grand total, ends up on P0. The legend distinguishes communication from non-communication edges.]

Page 10:

12c.10

Reduction

• Syntax:

MPI_Reduce(sendbuf, recvbuf, count, MPI_Datatype, MPI_Op, root, MPI_Comm)

• sendbuf, count, datatype, and MPI_Comm are the same as with MPI_Send() and MPI_Bcast()

• root is the rank (id) of the process which will possess the final value

• MPI_Op is one of the constants listed on slide 12c.8

Page 11:

12c.11

Reduction

• Example:

int x, y;
// Each processor has a different
// value for x

MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

• The root process (0) has the sum of all x's in the variable y
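A complete, runnable version of this example; the choice of x = mypid and the printed check are illustrative assumptions. With p processes the root should see 0 + 1 + ... + (p-1):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int mypid, nprocs, x, y = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each processor has a different value for x (here simply its own rank). */
    x = mypid;

    MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Only the root's y is meaningful; it should equal p*(p-1)/2. */
    if (mypid == 0)
        printf("Sum of ranks 0..%d = %d (expected %d)\n",
               nprocs - 1, y, nprocs * (nprocs - 1) / 2);

    MPI_Finalize();
    return 0;
}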

Page 12:

12c.12

Reduction

• Example:

int x[N], y[N];
// Each processor has different
// values in the array x

MPI_Reduce(x, y, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

• The root process (0) has the sum of all x[0]'s in y[0], the sum of all x[1]'s in y[1], ...
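A runnable sketch of this element-wise behavior; the value of N and the initialization x[i] = mypid + i are illustrative assumptions. With p processes, y[i] on the root should equal p*(p-1)/2 + p*i:

#include <stdio.h>
#include <mpi.h>

#define N 4   /* illustrative size */

int main(int argc, char *argv[])
{
    int mypid, p, i;
    int x[N], y[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Each processor has different values in the array x. */
    for (i = 0; i < N; i++)
        x[i] = mypid + i;

    /* y[i] on the root becomes the sum of the x[i]'s over all processes. */
    MPI_Reduce(x, y, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (mypid == 0)
        for (i = 0; i < N; i++)
            printf("y[%d] = %d (expected %d)\n", i, y[i], p * (p - 1) / 2 + p * i);

    MPI_Finalize();
    return 0;
}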

Page 13:

12c.13

Reduction

• All processors participating in the reduction (whether they are the source or a destination) must call the Reduce function with the same parameters, or else it won't work correctly.

• The runtime of a reduction is O(log(p)) instead of O(p), where p is the number of processors

Page 14:

12c.14

Reduction

[Figure: the reduction uses the same tree as the broadcast on slide 12c.7, with data flowing toward processor 0: partial results are combined pairwise at each step until, after log2(8) = 3 steps, processor 0 holds the final value. The legend distinguishes communication from non-communication edges.]

Page 15:

12c.15

Scatter/Gather

• Scatter sends parts of an array from the root to each processor

• Syntax: MPI_Scatter(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)

• Gather brings together parts of an array from different processors to the root

• Syntax: MPI_Gather(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)

Page 16:

12c.16

Scatter

[Figure: processor P0 holds the array 37 16 48 2 73 44 32 11. MPI_Scatter splits it into equal blocks of two elements and delivers one block per processor: P0 keeps 37 16, P1 receives 48 2, P2 receives 73 44, and P3 receives 32 11.]

Page 17:

12c.17

Gather

[Figure: the reverse of the scatter on slide 12c.16. Each of P0-P3 holds a two-element block (37 16, 48 2, 73 44, 32 11), and MPI_Gather assembles the blocks in rank order into the full array 37 16 48 2 73 44 32 11 on processor P0.]

Page 18:

12c.18

Scatter/Gather

float a[N], localA[N];
...

if (mypid == 0) {
    printf ("<pid %d>: a = ", mypid);
    for (i = 0; i < N; i++)
        printf ("%f ", a[i]);
    printf ("\n");
}

Page 19:

12c.19

Scatter/Gather

blksz = (int) ceil (((float) N)/P);

MPI_Scatter(a, blksz, MPI_FLOAT, &localA[0], blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

Page 20:

12c.20

Scatter/Gather

for (i = 0; i < blksz; i++)
    printf ("<pid%d>: localA = %.2f\n", mypid, localA[i]);

for (i = 0; i < blksz; i++)
    localA[i] += mypid;

for (i = 0; i < blksz; i++)
    printf ("<pid%d>: new localA = %.2f\n", mypid, localA[i]);

Page 21:

12c.21

Scatter/Gather

MPI_Gather(&localA[0], blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

if (mypid == 0) {
    printf ("<pid %d>: A = ", mypid);
    for (i = 0; i < N; i++)
        printf ("%f ", a[i]);
    printf ("\n");
}
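Assembled, the fragments on slides 12c.18 through 12c.21 give roughly the following complete program. It is a sketch, not the original source: the MPI setup, the command-line handling implied by the mpiGatherScatter 6 run on the next slide, and the random initialization of a are assumptions filling in what the slides elide, and it assumes N divides evenly by P, since plain MPI_Scatter sends equal-size blocks.

/* Sketch assembled from slides 12c.18-12c.21; compile with mpicc (link -lm for ceil). */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int mypid, P, N, blksz, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    N = (argc > 1) ? atoi(argv[1]) : 8;     /* array size from the command line (assumption) */

    float *a = malloc(N * sizeof(float));
    float *localA = malloc(N * sizeof(float));

    if (mypid == 0) {
        for (i = 0; i < N; i++)             /* illustrative initialization, not from the slides */
            a[i] = (rand() % 10000) / 100.0f;
        printf ("<pid %d>: a = ", mypid);
        for (i = 0; i < N; i++) printf ("%f ", a[i]);
        printf ("\n");
    }

    blksz = (int) ceil (((float) N) / P);

    /* Root splits a into blocks of blksz elements, one block per process. */
    MPI_Scatter(a, blksz, MPI_FLOAT, &localA[0], blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (i = 0; i < blksz; i++)
        printf ("<pid%d>: localA = %.2f\n", mypid, localA[i]);

    for (i = 0; i < blksz; i++)             /* each process modifies its own block */
        localA[i] += mypid;

    for (i = 0; i < blksz; i++)
        printf ("<pid%d>: new localA = %.2f\n", mypid, localA[i]);

    /* Root collects the modified blocks back into a, in rank order. */
    MPI_Gather(&localA[0], blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (mypid == 0) {
        printf ("<pid %d>: A = ", mypid);
        for (i = 0; i < N; i++) printf ("%f ", a[i]);
        printf ("\n");
    }

    free(a);
    free(localA);
    MPI_Finalize();
    return 0;
}

Run as on slide 12c.22 (mpirun -np 3 mpiGatherScatter 6) to get output of the form shown on the next two slides; the exact numbers depend on how a is initialized.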

Page 22:

12c.22

Scatter/Gather

$ mpirun -nolocal -np 3 mpiGatherScatter 6
<pid 0>: A = 84.019997 39.439999 78.309998 79.839996 91.160004 19.760000
<pid0>: localA = 84.02
<pid0>: localA = 39.44
<pid0>: new localA = 84.02
<pid0>: new localA = 39.44

Page 23:

12c.23

Scatter/Gather

<pid1>: localA = 78.31
<pid1>: localA = 79.84
<pid1>: new localA = 79.31
<pid1>: new localA = 80.84

<pid2>: localA = 91.16
<pid2>: localA = 19.76
<pid2>: new localA = 93.16
<pid2>: new localA = 21.76

<pid 0>: A = 84.019997 39.439999 79.309998 80.839996 93.160004 21.760000

Page 24:

12c.24

For further reading

• Man pages on MPI Routines: http://www-unix.mcs.anl.gov/mpi/www/www3/

• Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, Upper Saddle River, NJ, 1999, ISBN 0-13-671710-1

• Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1997, ISBN 1-55860-339-5