
MPI and High Performance Computing: Systems and Programming

Barry Britt, Systems Administrator
Department of Computer Science
Iowa State University

Purpose

To give you:
… an overview of some new system-level MPI functions
… access to tools that you need to compile and run MPI jobs
… some instruction in the creation and use of Makefiles
… some instruction on how to tell time in C programs.

Makefiles

GNU Make:
- Enables the end user to build and install a package without worrying about the details.
- Automatically figures out which files it needs to update based on which source files have changed.
- Not language dependent.
- Not limited to building a package; can be used to install or uninstall.

Makefile Rules

A rule tells Make how to execute a series of commands in order to build a target from source files.
- A rule specifies a list of dependencies.
- The dependencies should include ALL files that the target depends on.

target: dependencies ...
	commands
	...

Example Makefile for C Source

CC=gcc
CFLAGS=-Wall
INCLUDES=
BINARIES=rand test

.SUFFIXES: .c .o

.c.o:
	$(CC) $(CFLAGS) -c $*.c

all: $(BINARIES)

rand.o: rand.c
test.o: test.c

rand: rand.o
	$(CC) $(CFLAGS) -o rand rand.o

test: test.o
	$(CC) $(CFLAGS) -o test test.o

clean:
	rm -f a.out core *.o $(BINARIES)

Example Makefile for C Source

CC=gcc
CFLAGS=-Wall
INCLUDES=
BINARIES=rand test

Variables:
- CC is set to use the GCC compiler. For MPI programs, set it to mpicc, not gcc.
- CFLAGS:
  -c: compile only, without linking
  -Wall: set warnings to all

Example Makefile for C Source

Target "clean": use it by typing make clean.
The rule states: in the current directory, run
	rm -f a.out core *.o $(BINARIES)
which expands to
	rm -f a.out core *.o rand test

clean:
	rm -f a.out core *.o $(BINARIES)

Example Makefile for C Source

Makefile instruction on how to handle .c files and turn them into object (.o) files:
- Compile using the $(CC) value with $(CFLAGS).
- Compile each individual file into its appropriate .o file.

.SUFFIXES: .c .o

.c.o:
	$(CC) $(CFLAGS) -c $*.c

Example Makefile for C Source

Target rand (or test):
- Runs $(CC) $(CFLAGS) -o rand rand.o, which expands to gcc -Wall -o rand rand.o.
- If you were going to include external libraries to link, they would be linked at the end of the rule (see the sketch after these rules).

rand.o: rand.c
test.o: test.c

rand: rand.o
	$(CC) $(CFLAGS) -o rand rand.o

test: test.o
	$(CC) $(CFLAGS) -o test test.o
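For instance, a hypothetical variant of the rand rule (assuming, purely for illustration, that rand called math-library functions) would append the library at the end:

rand: rand.o
	$(CC) $(CFLAGS) -o rand rand.o -lm   # -lm links the C math library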

Random Matrix Generation

Random Generator for Matrices

rand command-line options:
- -f: filename to which to write the matrix
- -c: number of matrix columns
- -r: number of matrix rows
- -h: help documentation
- -s: seed
- -m: max integer in matrix cells

Random Generator for Matrices

- Completely random generation for an m by n matrix.
- Uses a random seed to create the matrix.
- Output file (see the example below):
  The first line contains the number of rows and the number of columns.
  Subsequent lines contain matrix cell values, one per line.
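For illustration, a hypothetical output file for a 2 x 2 matrix (cell values invented) would look like:

	2 2
	7
	3
	9
	1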

Random Generator for Matrices

For a matrix with row length m, cell A[i, j] is on line:

	m * i + j + 2

Lines are not zero-indexed for the purpose of this calculation. Therefore, for a 5 x 5 matrix (zero-indexed), as the sketch below verifies:
- A[0, 0] is on line 2
- A[0, 1] is on line 3
- A[4, 4] is on line 26
- A[2, 3] is on line 15
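A minimal sketch in C of this line-number calculation (the helper name line_of_cell is hypothetical, not part of the generator):

#include <stdio.h>

/* Line number (1-indexed) of cell A[i][j] in the output file,
   for a matrix whose rows have length m. */
static int line_of_cell(int m, int i, int j) {
    return m * i + j + 2;
}

int main(void) {
    /* Reproduces the 5 x 5 examples from above. */
    printf("A[0,0] -> line %d\n", line_of_cell(5, 0, 0)); /* 2  */
    printf("A[0,1] -> line %d\n", line_of_cell(5, 0, 1)); /* 3  */
    printf("A[4,4] -> line %d\n", line_of_cell(5, 4, 4)); /* 26 */
    printf("A[2,3] -> line %d\n", line_of_cell(5, 2, 3)); /* 15 */
    return 0;
}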

Calculating Running Time in C

#include <stdio.h>
#include <unistd.h>    /* for sleep() */
#include <sys/time.h>

int main() {
    struct timeval begin, end;
    double time;

    gettimeofday(&begin, NULL);
    sleep(10);
    gettimeofday(&end, NULL);

    time = (end.tv_sec - begin.tv_sec) +
           ((end.tv_usec - begin.tv_usec) / 1000000.0);
    printf("This program ran for %f seconds\n", time);

    return 0;
}

C Time

The timeval struct:
- Includes seconds and microseconds.
- Is used by the gettimeofday() system call.

gettimeofday():
- Returns the number of seconds (and microseconds) since the UNIX Epoch.
- Is this completely accurate? No, but it's VERY close (within a few microseconds).

C Time

- You MUST use the timeval struct for the gettimeofday() call.
- On UNIX systems, you need to include sys/time.h to use this.
- Calculation of time is:
	(end seconds - begin seconds) + ((end microseconds - begin microseconds) / 1000000)
- You can calculate program run time or algorithm execution time, as sketched below.
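A minimal sketch of timing just an algorithm rather than the whole program (the summation loop is a hypothetical workload):

#include <stdio.h>
#include <sys/time.h>

int main(void) {
    struct timeval begin, end;
    double elapsed;
    long long sum = 0;
    int i;

    gettimeofday(&begin, NULL);   /* start timing just the algorithm */
    for (i = 1; i <= 100000000; i++)
        sum += i;                 /* hypothetical workload */
    gettimeofday(&end, NULL);     /* stop timing */

    elapsed = (end.tv_sec - begin.tv_sec) +
              ((end.tv_usec - begin.tv_usec) / 1000000.0);
    printf("Algorithm ran for %f seconds (sum = %lld)\n", elapsed, sum);
    return 0;
}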

Using the PBS Job Submission System

PBS (Torque/Maui)

- hpc-class job submission system: qsub
- All queues are managed by the scheduler.
- PBS scripts can be created at:
  http://hpcgroup.public.iastate.edu/HPC/hpc-class/hpc-class_script_writer.html

Example script

#!/bin/csh

#PBS -o BATCH_OUTPUT
#PBS -e BATCH_ERRORS

#PBS -l vmem=256Mb,pmem=256Mb,mem=256Mb,nodes=16:ppn=2,cput=2:00:00,walltime=1:00:00

# Change to directory from which qsub was executed
cd $PBS_O_WORKDIR

time mpirun -np 32 <program>

PBS Variables

-l (resources):
- vmem: total virtual memory
- pmem: per-task memory
- mem: total aggregate memory
- nodes: total number of nodes
- ppn: processors per node
- cput: total CPU time, summed across processors
- walltime: wall-clock (elapsed) time for the job

PBS Variables

- vmem = pmem = mem
- total CPUs = nodes * ppn
- cput = walltime * ppn (worked check below)
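Plugging the example script's values into these relations (a worked check using only the numbers already shown above):

	nodes=16, ppn=2          ->  total CPUs = 16 * 2 = 32  (matching mpirun -np 32)
	walltime=1:00:00, ppn=2  ->  cput = 1:00:00 * 2 = 2:00:00
	vmem = pmem = mem = 256Mb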

PBS (Torque/Maui)

Based on the previous script:
- BATCH_OUTPUT contains the output from the batch job.
- BATCH_ERRORS contains the error information from the batch job.

Some other important information

- Max CPU: 32 for classwork
- Max memory: 2.0 GB
- Max swap: 2.0 GB
- Short queue:
  4 nodes per job; 16 total CPUs
  1 hour per job
  2 total jobs per user

MPI Blocking vs. Non-Blocking Communication

MPI Communication

Blocking Communication: MPI_Send | MPI_Recv

MPI_Send → Basic blocking send operation. Routine returns only after the application buffer in the sending task is free for reuse.

MPI_Recv → Receive a message and block until the requested data is available in the application buffer in the receiving task.

MPI Communication

Non-blocking Communication

MPI_Isend | MPI_Irecv
MPI_Wait | MPI_Test

MPI_Isend → Identifies an area in memory to serve as a send buffer. Processing continues without waiting for the message to be copied out from the buffer.

MPI_Irecv → Identifies an area in memory to serve as a receive buffer. Processing continues immediately without waiting for the message to be received and copied into the buffer.

MPI_Test → check the status of a non-blocking send or receive

MPI_Wait → block until a specified non-blocking send or receive operation has completed
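A minimal sketch of this pattern (a hypothetical two-rank exchange, not from the original slides): rank 1 sends one integer to rank 0, and each rank can keep working until it actually needs the buffer.

#include <stdio.h>
#include <mpi.h>

#define TAG 0

int main(int argc, char **argv) {
    int rank, value = 42, recvbuf = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Post the receive, then keep computing; recvbuf must not
           be read until MPI_Wait says the data has arrived. */
        MPI_Irecv(&recvbuf, 1, MPI_INT, 1, TAG, MPI_COMM_WORLD, &req);
        /* ... useful work (disk I/O, processing) could overlap here ... */
        MPI_Wait(&req, &status);   /* block until recvbuf is valid */
        printf("Rank 0 received %d\n", recvbuf);
    } else if (rank == 1) {
        MPI_Isend(&value, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD, &req);
        /* ... work that does not modify 'value' can overlap here ... */
        MPI_Wait(&req, &status);   /* now 'value' may be reused */
    }

    MPI_Finalize();
    return 0;
}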

Why non-blocking communication?

- In some cases, it can increase performance.
- If there is an expensive operation you need to do, it helps speed up the program:
  Disk I/O
  Heavy processing on already-received data
- BE CAREFUL!!! If you try to access a buffer when it isn't there, your program WILL fail.

Example: a master/slave summation program using blocking communication.

#include <stdio.h>
#include <mpi.h>

#define TAG 0   /* message tag; the original fragment assumes TAG is defined elsewhere */

int master(void);
int slave(void);

int main(int argc, char **argv) {
    int myRank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    if (myRank == 0)
        master();
    else
        slave();

    MPI_Finalize();
    return 0;
}

/* Rank 0: collect one partial sum from every slave and print the total. */
int master(void) {
    int i, size, my_answer = 0, their_work = 0;
    MPI_Status status;

    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 1; i < size; i++) {
        MPI_Recv(&their_work, 1, MPI_INT, i, TAG, MPI_COMM_WORLD, &status);
        my_answer += their_work;
    }
    printf("The answer is: %d\n", my_answer);

    return 0;
}

/* Ranks 1..size-1: add one contiguous range of the numbers 1..100 and
   send the partial sum to the master. */
int slave(void) {
    int i, myRank, size, namelength, work = 0;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelength);

    printf("[%s]: Adding the numbers %d to %d = ", name,
           (100 / (size-1)) * (myRank-1) + 1,
           (100 / (size-1)) * myRank);

    for (i = (100 / (size-1)) * (myRank-1) + 1; i <= myRank * (100 / (size-1)); i++) {
        work = work + i;
    }
    printf("%d\n", work);

    MPI_Send(&work, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);

    return 0;
}
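A sketch of building and running this program (the file name sum.c and the process count are hypothetical): compile with mpicc, as recommended in the Makefile section, and launch with mpirun.

	mpicc -Wall -o sum sum.c
	mpirun -np 4 ./sum

With 4 processes, each of the 3 slaves adds one block of 100 / 3 = 33 numbers (1-33, 34-66, 67-99; integer division drops 100), and the master prints the combined total, 4950.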