
Page 1: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


High Performance Computing: Concepts, Methods & Means

MPI: The Message Passing Interface

Prof. Daniel S. Katz
Department of Electrical and Computer Engineering

Louisiana State University

February 22nd, 2007

Page 2: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 3: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 4: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Opening Remarks

• Context: distributed memory parallel computers
• We have communicating sequential processes, each with their own memory, and no access to another process's memory
  – A fairly common scenario from the mid-1980s (Intel Hypercube) to today
  – Processes interact (exchange data, synchronize) through message passing
  – Initially, each computer vendor had its own library and calls
  – First standardization was PVM
    • Started in 1989, first public release in 1991
    • Worked well on distributed machines
    • A library, not an API
  – Next was MPI

Page 5: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


What you’ll Need to Know

• What a standard API is
• How to build and run an MPI-1.x program
• Basic MPI functions
  – 4 basic environment functions
    • Including the idea of communicators
  – Basic point-to-point functions
    • Blocking and nonblocking
    • Deadlock and how to avoid it
    • Datatypes
  – Basic collective functions
• The advanced MPI-1.x material may be required for the problem set
• The MPI-2 highlights are just for information

Page 6: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 7: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


MPI Standard

• From 1992 to 1994, a group of people representing both vendors and users got together and decided to create a standard interface to message-passing calls
  – In the context of distributed memory parallel computers (MPPs; there weren't really clusters yet)
• MPI-1 was the result
  – "Just" an API
  – FORTRAN77 and C bindings
  – Reference implementation (MPICH) also developed
  – Vendors also kept their own internals (behind the API)
  – Vendor interfaces faded away over about 2 years

Page 8: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


MPI Standard

• Since then:
  – MPI-1.1
    • Fixed bugs, clarified issues
  – MPI-2
    • Included MPI-1.2
      – Fixed more bugs, clarified more issues
    • Extended MPI without new functionality
      – New datatype constructors, language interoperability
    • New functionality
      – One-sided communication
      – MPI I/O
      – Dynamic processes
    • FORTRAN90 and C++ bindings
• Best MPI reference
  – The MPI Standard, on-line at: http://www.mpi-forum.org/

Page 9: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 10: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Building an MPI Executable

• Not specified in the standard
  – Two normal options, dependent on implementation
  – Library version:
      cc -Iheaderdir -Llibdir mpicode.c -lmpi
    • User knows where the header file and library are, and tells the compiler
  – Wrapper version:
      mpicc -o executable mpicode.c
    • Does the same thing, but hides the details from the user
  – You can do either one, but don't try to do both!
  – On Celeritas (celeritas.cct.lsu.edu), the latter is easier

Page 11: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


MPI Model

• Some number of processes are started somewhere
  – Again, the standard doesn't talk about this
  – Implementation and interface varies
  – Usually, some sort of mpirun command starts some number of copies of an executable according to a mapping
  – Example: mpirun -np 2 ./a.out
    • Runs two copies of ./a.out where the system specifies
  – Most production supercomputing resources wrap the mpirun command with higher-level scripts that interact with scheduling systems such as PBS / LoadLeveler for efficient resource management and multi-user support
  – Sample PBS / LoadLeveler job submission scripts:

PBS file:
#!/bin/bash
#PBS -l walltime=120:00:00,nodes=8:ppn=4
cd /home/cdekate/S1_L2_Demos/adc/
pwd
date
mpirun -np 32 -machinefile $PBS_NODEFILE ./padcirc
date

LoadLeveler file:
#!/bin/bash
#@ job_type = parallel
#@ job_name = SIMID
#@ wall_clock_limit = 120:00:00
#@ node = 8
#@ total_tasks = 32
#@ initialdir = /scratch/cdekate/
#@ executable = /usr/bin/poe
#@ arguments = /scratch/cdekate/padcirc
#@ queue

Page 12: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


MPI Communicators

• Communicator is an internal object
  – MPI provides functions to interact with it
• Default communicator is MPI_COMM_WORLD
  – All processes are members of it
  – It has a size (the number of processes)
  – Each process has a rank within it
  – Can think of it as an ordered list of processes
• Additional communicators can co-exist
• A process can belong to more than one communicator
• Within a communicator, each process has a unique rank

Page 13: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
INCLUDE 'mpif.h'
...
CALL MPI_INIT(IERR)
...
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, SIZE, IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
...
CALL MPI_FINALIZE(IERR)
...

Page 14: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

Page 15: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

• Mandatory in any MPI code
• Defines MPI-related parameters

Page 16: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

• Must be called in any MPI code by all processes, once and only once, before any other MPI calls

Page 17: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

• Must be called in any MPI code by all processes, once and only once, after all other MPI calls

Page 18: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

• Returns the number of processes (size) in the communicator (MPI_COMM_WORLD)

Page 19: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

• Returns the rank of this process (rank) in the communicator (MPI_COMM_WORLD)

• Has unique return value per process

Page 20: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Sample MPI program

...
#include <mpi.h>
...
err = MPI_Init(&Argc, &Argv);
...
err = MPI_Comm_size(MPI_COMM_WORLD, &size);
err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
...
err = MPI_Finalize();
...

(Type callouts from the slide: err, size, rank, and Argc are int; Argv is char **)

Page 21: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


A Complete MPI Example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int err, size, rank;

  err = MPI_Init(&argc, &argv); /* Initialize MPI */
  if (err != MPI_SUCCESS) {
    printf("MPI initialization failed!\n");
    exit(1);
  }
  err = MPI_Comm_size(MPI_COMM_WORLD, &size);
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) { /* root process */
    printf("I am the root\n");
  } else {
    printf("I am not the root\n");
  }
  printf("My rank is %d\n", rank);
  err = MPI_Finalize();
  exit(0);
}

Output (with 3 processes):

I am not the root
My rank is 2
I am the root
My rank is 0
I am not the root
My rank is 1


Page 22: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 23: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Point-to-point Communication

• How two processes interact
• Most flexible communication in MPI
• Two basic varieties
  – Blocking and nonblocking
• Two basic functions
  – Send and receive
• With these two functions, and the four functions we already know, you can do everything in MPI
  – But there's probably a better way to do a lot of things, using other functions

Page 24: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Basic concept (buffered)

[Figure: buffered message transfer between two processes. Process 0 (user mode) calls the send subroutine and the data is copied from sendbuf to a system buffer (Step 1); the kernel sends the data to the system buffer at the receiving end (Step 2); Process 1 calls the receive subroutine and the data is copied from its system buffer to recvbuf (Step 3); each call then returns.]

Page 25: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Blocking (buffered)

[Figure: the same buffered transfer diagram as on the previous slide.]

• Calls do not return until the data transfer is done
  – Send doesn't return until sendbuf can be reused
  – Receive doesn't return until recvbuf can be used

Page 26: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Nonblocking (buffered)

• Calls return after data transfer is started

• Faster than blocking calls

• Could cause problems if sendbuf or recvbuf is changed during call

• Need new call (wait) to know if nonblocking call is done

[Figure: the buffered transfer diagram again; with nonblocking calls, the send and receive subroutines return once the transfer has been started rather than when it completes.]

Page 27: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Datatypes

• Basic datatypes

– You can also define your own (derived datatypes), such as an array of ints of size 100, or more complex examples, such as a struct or an array of structs (a short sketch follows the tables below)

MPI datatype C datatype

MPI_CHAR signed char

MPI_SHORT signed short int

MPI_INT signed int

MPI_LONG signed long int

MPI_UNSIGNED_CHAR unsigned char

MPI_UNSIGNED_SHORT unsigned short

MPI_UNSIGNED unsigned int

MPI_UNSIGNED_LONG unsigned long int

MPI_FLOAT float

MPI_DOUBLE double

MPI_LONG_DOUBLE long double

MPI_BYTE

MPI_PACKED

MPI datatype Fortran datatype

MPI_INTEGER INTEGER

MPI_DOUBLE_PRECISION DOUBLE PRECISION

MPI_COMPLEX COMPLEX

MPI_LOGICAL LOGICAL

MPI_CHARACTER CHARACTER(1)

MPI_BYTE

MPI_PACKED
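As promised above, a minimal sketch of a derived datatype (my illustration, not from the slides; data, dest, and tag are assumed to exist as in the other snippets): a contiguous block of 100 ints that can then be sent as a single unit.

/* Build, commit, use, and free a derived datatype for 100 consecutive ints */
MPI_Datatype block100;
MPI_Type_contiguous(100, MPI_INT, &block100);   /* describe the layout */
MPI_Type_commit(&block100);                     /* must commit before use */
err = MPI_Send(data, 1, block100, dest, tag, MPI_COMM_WORLD);
MPI_Type_free(&block100);                       /* release when no longer needed */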

Page 28: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Point-to-Point Syntax

• Blocking

err = MPI_Send(sendbuf, count, datatype, destination, tag, comm);

err = MPI_Recv(recvbuf, count, datatype, source, tag, comm, &status);

• Nonblocking

err = MPI_Isend(sendbuf, count, datatype, destination, tag, comm, &req);

err = MPI_Irecv(recvbuf, count, datatype, source, tag, comm, &req);

err = MPI_Wait(&req, &status);
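Putting the nonblocking calls together, a minimal sketch (my illustration, not from the slides; neighbor and the buffer names are assumptions): post the receive first, start the send, do other work, then wait on both requests.

MPI_Request reqs[2];
MPI_Status stats[2];

err = MPI_Irecv(recvbuf, count, datatype, neighbor, tag, comm, &reqs[0]); /* post the receive early */
err = MPI_Isend(sendbuf, count, datatype, neighbor, tag, comm, &reqs[1]);
/* ... useful computation that touches neither buffer ... */
err = MPI_Waitall(2, reqs, stats);   /* both buffers are now safe to reuse */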

Page 29: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Deadlock

• Something to avoid
• A situation where the dependencies between processors are cyclic
  – One processor is waiting for a message from another processor, but that processor is waiting for a message from the first, so nothing happens
    • Until your time in the queue runs out and your job is killed
• MPI does not have timeouts

Page 30: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Deadlock Example

• If the message sizes are small enough, this should work because of system buffers
• If the messages are too large, or system buffering is not used, this will hang

if (rank == 0) {
  err = MPI_Send(sendbuf, count, datatype, 1, tag, comm);
  err = MPI_Recv(recvbuf, count, datatype, 1, tag, comm, &status);
} else {
  err = MPI_Send(sendbuf, count, datatype, 0, tag, comm);
  err = MPI_Recv(recvbuf, count, datatype, 0, tag, comm, &status);
}

Page 31: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Deadlock Example Solutions

if (rank == 0) {
  err = MPI_Send(sendbuf, count, datatype, 1, tag, comm);
  err = MPI_Recv(recvbuf, count, datatype, 1, tag, comm, &status);
} else {
  err = MPI_Recv(recvbuf, count, datatype, 0, tag, comm, &status);
  err = MPI_Send(sendbuf, count, datatype, 0, tag, comm);
}

or

if (rank == 0) {
  err = MPI_Isend(sendbuf, count, datatype, 1, tag, comm, &req);
  err = MPI_Recv(recvbuf, count, datatype, 1, tag, comm, &status);
  err = MPI_Wait(&req, &status);
} else {
  err = MPI_Isend(sendbuf, count, datatype, 0, tag, comm, &req);
  err = MPI_Recv(recvbuf, count, datatype, 0, tag, comm, &status);
  err = MPI_Wait(&req, &status);
}
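A third fix, not shown on the slide, is MPI_Sendrecv, which lets the MPI library order the paired send and receive safely; a minimal sketch (the variable other is an assumption, the buffers are as above):

int other = (rank == 0) ? 1 : 0;
err = MPI_Sendrecv(sendbuf, count, datatype, other, tag,
                   recvbuf, count, datatype, other, tag,
                   comm, &status);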

Page 32: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 33: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Collective Communication

• How a group of processes interact
  – "group" here means the processes in a communicator
  – A group can be as small as one or two processes
    • One process communicating with itself isn't interesting
    • Two processes communicating are probably better handled through point-to-point communication
• Most efficient communication in MPI
• All collective communication is blocking

Page 34: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Collective Communication Types

• Three types of collective communication
  – Synchronization
    • Example: barrier
  – Data movement
    • Examples: broadcast, gather
  – Reduction (computation)
    • Example: reduce
• All of these could also be done with point-to-point communications
  – Collective operations give better performance and better productivity

Page 35: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Synchronization: Barrier

• Called by all processes in a communicator
• Each calling process blocks until all processes have made the call

err = MPI_Barrier(comm);

• My opinions
  – Barriers are not needed in MPI-1.x codes unless based on external events
    • Examples: signals from outside, I/O
  – Barriers are a performance bottleneck
  – One-sided communication in MPI-2 codes is different
  – Barriers can help in printf-style debugging
  – Barriers may be needed for accurate timing (see the sketch below)
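For the timing use just mentioned, a common pattern (my sketch, not from the slides; assumes rank was obtained as in the earlier examples) is to synchronize before reading MPI_Wtime so all ranks enter the timed region together:

double t0, t1;

err = MPI_Barrier(MPI_COMM_WORLD);   /* line all ranks up */
t0 = MPI_Wtime();
/* ... communication or computation being timed ... */
err = MPI_Barrier(MPI_COMM_WORLD);   /* wait for the slowest rank */
t1 = MPI_Wtime();
if (rank == 0) printf("elapsed: %f seconds\n", t1 - t0);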

Page 36: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Data Movement: Broadcast

• Send data from one process, called the root (P0 here), to all other processes in the communicator
• Called by all processes in the communicator with the same arguments

err = MPI_Bcast(buf, count, datatype, root, comm);

[Figure: Broadcast]

Page 37: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Broadcast Example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int err, rank, data[100];

  err = MPI_Init(&argc, &argv); /* Initialize MPI */
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) { /* root process */
    /* fill all of the data array here, with data[99] as xyz */
  }
  err = MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD);
  if (rank == 2)
    printf("My data[99] is %d\n", data[99]);
  err = MPI_Finalize();
  exit(0);
}

Output (with 3 processes):

My data[99] is xyz


Page 38: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Data Movement: Gather

• Collect data from all processes in a communicator to one process, called the root (P0 here)
• Called by all processes in the communicator with the same arguments

err = MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm);

  – As if all n processes called
    MPI_Send(sendbuf, sendcount, sendtype, root, …);
  – And the root made n calls (i = 0 … n-1)
    MPI_Recv(recvbuf + i*recvcount*extent(recvtype), recvcount, recvtype, i, …);

[Figure: Gather]

Page 39: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Gather Example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int err, size, rank, data[100], *rbuf;

  MPI_Init(&argc, &argv); /* Initialize MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  /* all processes fill their data array here */
  /* make data[0] = xyz on process 1 */
  if (rank == 0) { /* root process */
    rbuf = malloc(size*100*sizeof(int));
  }
  MPI_Gather(data, 100, MPI_INT, rbuf, 100, MPI_INT, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("rbuf[100] is %d\n", rbuf[100]);
  MPI_Finalize();
  exit(0);
}

Output (with >1 processes):

rbuf[100] is xyz


Page 40: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Reduction: Reduce

• Similar to gather:
  – Collect data from all processes in a communicator to one process, called the root (P0 here)
• But the data is operated upon
• Called by all processes in the communicator with the same arguments

err = MPI_Reduce(sendbuf, recvbuf, count, datatype, operation, root, comm);

[Figure: Reduce]

Page 41: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Reduction: Reduce Operations

• Predefined operations:

– MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC

• MAXLOC and MINLOC are tricky (see the sketch after this list)

– Read the spec

• Also can define your own operation

– Read the spec
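As a hedged illustration of MPI_MAXLOC (my sketch, not from the slides; my_value is an assumed local variable): each process contributes a (value, rank) pair using the predefined MPI_DOUBLE_INT type, and the root learns the global maximum and which rank owned it.

struct { double val; int loc; } in, out;   /* layout expected by MPI_DOUBLE_INT */

in.val = my_value;   /* this process's local value */
in.loc = rank;
err = MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
if (rank == 0)
  printf("max %f found on rank %d\n", out.val, out.loc);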

Page 42: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Reduce Example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int err, rank, data, datasum;

  MPI_Init(&argc, &argv); /* Initialize MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  data = rank*3 + 1; /* {1, 4, 7} */
  MPI_Reduce(&data, &datasum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("datasum is %d\n", datasum);
  MPI_Finalize();
  exit(0);
}

Output (with 3 processes):

datasum is 12


Page 43: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Collective Operations

Page 44: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 45: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Communicators and Groups

• Group is an MPI term related to communicators
• MPI functions exist to define a group from a communicator or vice versa, to split communicators, and to work with groups (see the sketch below)
• You can build communicators to fit your application, not just use MPI_COMM_WORLD
  – Communicators can be built to match a logical process topology, such as Cartesian
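For example, MPI_Comm_split builds new communicators from an existing one; a minimal sketch (my illustration, not from the slides; ncols and the idea of a row communicator are assumptions) that groups processes by row of a 2-D layout:

int row, row_rank;
MPI_Comm row_comm;

row = rank / ncols;   /* processes with the same "color" end up in the same new communicator */
MPI_Comm_split(MPI_COMM_WORLD, row, rank, &row_comm);
MPI_Comm_rank(row_comm, &row_rank);   /* each process now also has a rank within its row */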

Page 46: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Communication Modes

• MPI_Send uses standard mode
  – May be buffered, depending on message size
  – Details are implementation dependent
• Other modes exist:
  – Buffered - requires message buffering (MPI_Bsend)
  – Synchronous - forbids message buffering (MPI_Ssend)
  – Ready - Recv must be posted before Send (MPI_Rsend)
• Only one MPI_Recv
• This is independent of blocking/nonblocking

Page 47: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


More Collective Communication

• What if the data to be communicated is not the same size in each process?
• Varying (v) calls exist:
  – MPI_Gatherv, MPI_Scatterv, MPI_Allgatherv
  – Additional arguments include information about the data on each process (see the sketch below)
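A minimal MPI_Gatherv sketch (my illustration, not from the slides; mycount, recvcounts, and displs are assumed to have been set up beforehand): each process contributes mycount ints, and the root says how many items come from each rank and where they land in rbuf.

/* On the root: recvcounts[i] = number of ints from rank i,
   displs[i] = offset in rbuf where rank i's data starts */
err = MPI_Gatherv(sendbuf, mycount, MPI_INT,
                  rbuf, recvcounts, displs, MPI_INT,
                  0, MPI_COMM_WORLD);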

Page 48: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Persistent Communication

• Used when a communication with the same argument list is repeatedly executed within the inner loop of a parallel computation
  – Bind the list of communication arguments to a persistent communication request once, and then repeatedly use the request to initiate and complete messages (see the sketch below)
  – Allows reduction of the overhead for communication between the process and the communication controller, not the overhead for communication between one communication controller and another
  – Not necessary that messages sent with a persistent request be received by a receive operation with a persistent request, or vice versa
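A minimal sketch of the persistent pattern (my illustration, not from the slides; dest, niter, and the buffer contents are assumptions):

MPI_Request req;
MPI_Status status;
int i;

MPI_Send_init(sendbuf, count, datatype, dest, tag, comm, &req);  /* bind arguments once */
for (i = 0; i < niter; i++) {
  /* ... fill sendbuf for this iteration ... */
  MPI_Start(&req);            /* initiate the send */
  MPI_Wait(&req, &status);    /* complete it; req remains reusable */
}
MPI_Request_free(&req);       /* release the persistent request */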

Page 49: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 50: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


MPI-2 Status

• All vendors have complete MPI-1, and have for 5-10 years
• Free implementations (MPICH, LAM) support heterogeneous workstation networks
• MPI-2 implementations are being undertaken by all vendors
  – Fujitsu and NEC have complete MPI-2 implementations
  – Other vendors generally have all but dynamic process management
  – MPICH-2 is complete
  – Open MPI (new MPI from LAM and other MPIs) is becoming complete

Page 51: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


MPI-2 Dynamic Processes

• MPI-2 supports dynamic processes
• An application can start, and later more processes can be added to it, through a complex process
• Intracommunicators - everything we talked about so far
• Intercommunicators - for multiple sets of processes to work together

Page 52: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


One-sided communication

• One process can put data into another process's memory, or get data from another process's memory, without the other process being involved
• If the hardware supports it, this allows the second process to compute while the communication is happening
• Separates data transfer and synchronization
• Barriers become essential (see the sketch below)
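A minimal MPI-2 one-sided sketch (my illustration, not from the slides; winbuf, localbuf, and the target rank are assumptions): expose a window of memory, then put data into another rank's window between fences.

MPI_Win win;

MPI_Win_create(winbuf, count*sizeof(int), sizeof(int),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);   /* expose winbuf to the other ranks */
MPI_Win_fence(0, win);                                  /* open an access epoch */
if (rank == 0)
  MPI_Put(localbuf, count, MPI_INT, 1, 0, count, MPI_INT, win);  /* write into rank 1's window */
MPI_Win_fence(0, win);                                  /* data is now visible on rank 1 */
MPI_Win_free(&win);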

Page 53: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Parallel I/O (1)

• Old ways to do I/O
  – Process 0 does all the I/O to a single file and broadcasts/scatters/gathers to/from the other processes
  – All processes do their own I/O to separate files
  – All tasks read from the same file
  – All tasks write to the same file, using seeks to get to the right place
  – One task at a time appends to a single file, using barriers to prevent overlapping writes

Page 54: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Parallel I/O (2)

• The new way is to use a parallel I/O library, such as MPI I/O
  – Multiple tasks can simultaneously read or write a single file (possibly on a parallel file system) using the MPI I/O API
  – A parallel file system usually looks like a single file system, but has multiple I/O servers to permit high bandwidth from multiple processes
  – MPI I/O is part of MPI-2
  – Allows single or collective operations to/from contiguous or non-contiguous regions/data using MPI datatypes, including derived datatypes, blocking or nonblocking
• Sound familiar? Writing is like sending a message, reading is like receiving one (see the sketch below)
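A minimal MPI I/O sketch (my illustration, not from the slides; the file name and layout are assumptions): every rank writes its own block of one shared file at an offset computed from its rank.

MPI_File fh;
MPI_Status status;
MPI_Offset offset;

MPI_File_open(MPI_COMM_WORLD, "out.dat",
              MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
offset = (MPI_Offset)rank * count * sizeof(int);               /* each rank owns one block */
MPI_File_write_at(fh, offset, data, count, MPI_INT, &status);  /* independent write */
MPI_File_close(&fh);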

Page 55: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Parallel I/O (3)

• Uses high-level access
  – Given complete access information, an implementation can perform optimizations such as:
    • Data sieving: read large chunks and extract what is really needed
    • Collective I/O: merge requests of different processes into larger requests
    • Improved prefetching and caching

Page 56: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Topics

• Introduction
• MPI Standard
• MPI-1.x Model and Basic Calls
• Point-to-point Communication
• Collective Communication
• Advanced MPI-1.x Highlights
• MPI-2 Highlights
• Summary

Page 57: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface


Summary – Material for the Test

• MPI standard: slides 4, 7
• Compile and run an MPI program: slides 10, 11
• Environment functions: slides 12, 14
• Point-to-point functions: slides 27, 28
• Blocking vs. nonblocking: slides 25, 26
• Deadlock: slides 29-31
• Basic collective functions: slides 33, 34, 36, 38, 40, 41, 43

Page 58: High Performance Computing: Concepts, Methods & Means MPI: The Message Passing Interface
