Distributed Shared Memory
Distributed Shared Memory (DSM) systems build the shared memory abstraction on top of distributed memory machines
Users see a virtual global address space; the underlying message passing is handled by the DSM system transparently to the users
Then we can use shared memory programming techniques
Software implementing DSM: http://www.ics.uci.edu/~javid/dsm/page.html
Three types of DSM implementations
Page-based technique
The virtual global address space is divided into equal sized chunks (pages) which are spread over the machines
Page is the minimal sharing unit
The request by a process to access a non-local piece of memory results in a page fault
a trap occurs and the DSM software fetches the required page of memory and restarts the instruction
a decision has to be made whether to replicate pages or maintain only one copy of any page and move it around the network
The granularity of the pages has to be decided before implementation
Three types of DSM implementations
Shared-variable based technique
only the variables and data structures required by more than one process are shared.
Variable is minimal sharing unit
Trade-off between consistency and network traffic
Three types of DSM implementations
Object-based technique
memory can be conceptualized as an abstract space filled with objects (including data and methods)
Object is minimal sharing unit
Trade-off between consistency and network traffic
OpenMP
OpenMP stands for Open specification for Multi-processing
used to assist compilers in understanding and parallelising serial code
Can be used to specify shared memory parallelism in Fortran, C and C++ programs
OpenMP is a specification for
a set of compiler directives,
run-time library routines, and
environment variables
Started in the mid-to-late 1980s with the emergence of shared memory parallel computers, each with its own proprietary directive-driven programming environment
OpenMP is an industry standard
OpenMP
OpenMP specifications include:
OpenMP 1.0 for Fortran, 1997
OpenMP 1.0 for C/C++, 1998
OpenMP 2.0 for Fortran, 2000
OpenMP 2.0 for C/C++, 2002
OpenMP 2.5 for C/C++ and Fortran, 2005
OpenMP Architecture Review Board: Compaq, HP, IBM, Intel, SGI, SUN
OpenMP programming model
Shared Memory, thread-based parallelism
Explicit parallelism
Fork-join model
OpenMP code structure in C
#include <omp.h>

main () {
  int var1, var2, var3;

  /* Serial code */
  ...

  /* Beginning of parallel section. Fork a team of threads. Specify variable scoping */
  #pragma omp parallel private(var1, var2) shared(var3)
  {
    /* Parallel section executed by all threads */
    ...
  } /* All threads join master thread and disband */

  /* Resume serial code */
}
OpenMP code structure in Fortran
PROGRAM HELLO
  INTEGER VAR1, VAR2, VAR3

  ! Serial code
  . . .

  ! Beginning of parallel section. Fork a team of threads. Specify variable scoping
  !$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)
    ! Parallel section executed by all threads
    . . .
  ! All threads join master thread and disband
  !$OMP END PARALLEL

  ! Resume serial code
  . . .
END
OpenMP Directives Format
C/C++:   #pragma omp directive-name [clause, ...]
Fortran: !$OMP directive-name [clause, ...]  (block directives are closed with !$OMP END directive-name)
OpenMP features
OpenMP directives are ignored by compilers that don't support OpenMP, so the same code can also be run on sequential machines (see the sketch below)
Compiler directives used to specify
sections of code that can be executed in parallel
critical sections
Scope of variables (private or shared)
Mainly used to parallelize loops, e.g. separate threads handle separate iterations of the loop
There is also a run-time library that has several useful routines for checking the number of threads and number of processors, changing the number of threads, etc
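A minimal sketch of this portability, assuming a compiler that defines the standard _OPENMP macro when OpenMP is enabled (the messages are illustrative):

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
    /* a compiler without OpenMP support simply ignores the pragma,
       so the same source also builds and runs as a serial program */
    #pragma omp parallel
    {
#ifdef _OPENMP
        printf("Hello from thread %d\n", omp_get_thread_num());
#else
        printf("Hello from the single, serial thread\n");
#endif
    }
    return 0;
}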
Fork-Join Model
Multiple threads are created using the parallel construct
For C and C++:
  #pragma omp parallel
  {
    ... do stuff
  }

For Fortran:
  !$OMP PARALLEL
  ... do stuff
  !$OMP END PARALLEL
How many threads are generated
The number of threads in a parallel region is determined by the following factors, in order of precedence:
Use of the omp_set_num_threads() library function
Setting of the OMP_NUM_THREADS environment variable
Implementation default - the number of CPUs on a node
Threads are numbered from 0 (master thread) to N-1
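A small sketch of this precedence, using the standard library routines introduced later in these notes (the team size of 2 is an illustrative choice):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* this library call takes precedence over any OMP_NUM_THREADS
       environment setting for the parallel regions that follow */
    omp_set_num_threads(2);

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)   /* thread 0 is the master */
            printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}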
Parallelizing loops in OpenMP – Work Sharing construct
Compiler directive specifies that the loop can be done in parallel

For C and C++:
  #pragma omp parallel for
  for (i = 0; i < N; i++) {
    value[i] = compute(i);
  }

For Fortran:
  !$OMP PARALLEL DO
  DO i = 1, N
    value(i) = compute(i)
  END DO
  !$OMP END PARALLEL DO
Can use thread scheduling to specify partition and allocation of iterations to threads
#pragma omp parallel for schedule(static,4)
schedule(static [,chunk])
Deal out blocks of iterations of size chunk to each thread
schedule(dynamic [,chunk])
Each thread grabs a chunk of iterations off a queue until all are done
schedule(runtime)
Schedule is taken at run time from the OMP_SCHEDULE environment variable
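As an illustration of dynamic scheduling, a minimal sketch (the chunk size of 4, the value of N and the printf are illustrative choices):

#include <omp.h>
#include <stdio.h>

#define N 16

int main(void)
{
    int i;
    /* each thread takes the next chunk of 4 iterations from a
       shared queue as soon as it becomes free */
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < N; i++) {
        printf("iteration %2d done by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}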
Synchronisation in OpenMP
Critical construct
Barrier construct
Example of Critical Section in OpenMP
#include <omp.h>
main() {
int x;
x = 0;
#pragma omp parallel shared(x)
{
#pragma omp critical
x = x+1;
} /* end of parallel section */
}
Example of Barrier in OpenMP
#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
  int th_id, nthreads;
  #pragma omp parallel private(th_id)
  {
    th_id = omp_get_thread_num();
    printf("Hello World from thread %d\n", th_id);
    #pragma omp barrier
    if ( th_id == 0 ) {
      nthreads = omp_get_num_threads();
      printf("There are %d threads\n", nthreads);
    }
  }
  return 0;
}
Data Scope Attributes in OpenMP
OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped
These clauses are used in conjunction with several directives (e.g. PARALLEL, DO/for) to control the scoping of enclosed variables
Three often encountered clauses:
Shared
Private
Reduction
Shared and private data in OpenMP
private(var) creates a local copy of var for each thread
shared(var) states that var is a global variable to be shared among threads
Default data storage attribute is shared
!$OMP PARALLEL DO
!$OMP& PRIVATE(xx,yy) SHARED(u,f)
DO j = 1, m
  DO i = 1, n
    xx = -1.0 + dx * (i-1)
    yy = -1.0 + dy * (j-1)
    u(i,j) = 0.0
    f(i,j) = -alpha * (1.0-xx*xx) * (1.0-yy*yy)
  END DO
END DO
!$OMP END PARALLEL DO
Reduction Clause
reduction (op : var)
op is a reduction operator, e.g. add (+) or logical OR. A local copy of the variable is made for each thread; the reduction operation is applied to each thread's local copy, and the local values are then combined to create the global value
double ZZ, res = 0.0;
#pragma omp parallel for reduction(+:res) private(ZZ)
for (i = 1; i <= N; i++) {
  ZZ = i;
  res = res + ZZ;
}
Run-Time Library Routines
Can perform a variety of functions, including
Query the number of threads/thread no.
Set number of threads
…
Run-Time Library Routines
Query routines allow you to get the number of threads and the ID of the calling thread
id = omp_get_thread_num(); //thread no.
Nthreads = omp_get_num_threads(); //number of threads
Can specify number of threads at runtime
omp_set_num_threads(Nthreads);
Environment Variables
Controlling the execution of parallel code
Four environment variables
OMP_SCHEDULE: how iterations of a loop are scheduled
OMP_NUM_THREADS: maximum number of threads
OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads
OMP_NESTED: enable or disable nested parallelism
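For example, in a typical Unix shell these would be set before running the program (the values shown are illustrative):

export OMP_NUM_THREADS=8
export OMP_SCHEDULE="dynamic,4"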
OpenMP compilers
Since parallelism is mostly achieved by parallelising loops using shared memory, OpenMP compilers work well for multiprocessor SMPs and vector machines
OpenMP could work for distributed memory machines, but would need to use a good distributed shared memory (DSM) implementation
For more information on OpenMP, see
www.openmp.org
High Performance Computing – Course Notes 2007-2008
Message Passing Programming I
Message Passing Programming
Message Passing is the most widely used parallel programming model
Message passing works by creating a number of tasks, uniquely named, that interact by sending and receiving messages to and from one another (hence the message passing)
Generally, processes communicate by sending data from the address space of one process to that of another
Communication between processes (via files, pipes, sockets)
Communication between threads within a process (via the global data area)
Programs based on message passing can be based on standard sequential language programs (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages
Message Passing Interface (MPI)
MPI is a specification, not a particular implementation
Does not specify process startup, error codes, amount of system buffering, etc
MPI is a library, not a language
The goals of MPI: functionality, portability and efficiency
Message passing model → MPI specification → MPI implementation
OpenMP vs MPI
In a nutshell
MPI is used on distributed-memory systems
OpenMP is used for code parallelisation on shared-memory systems
Both provide explicit parallelism
High-level control (OpenMP), lower-level control (MPI)
A little history
Message-passing libraries developed for a number of early distributed memory computers
By 1993 there were many vendor-specific implementations
By 1994 MPI-1 came into being
By 1996 MPI-2 was finalized
The MPI programming model
MPI standards -
MPI-1 (1.1, 1.2), MPI-2 (2.0)
Forwards compatibility preserved between versions
Standard bindings - for C, C++ and Fortran. Have seen MPI bindings for Python, Java etc (all non-standard)
We will stick to the C binding for the lectures and coursework. More info on MPI: www.mpi-forum.org
Implementations - for your laptop, pick up MPICH, a free portable implementation of MPI (http://www-unix.mcs.anl.gov/mpi/mpich/index.htm)
Coursework will use MPICH
MPI
MPI is a complex system comprising 129 functions with numerous parameters and variants
Six of them are indispensable, but with just these six one can already write a large number of useful programs
Other functions add flexibility (datatype), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topology).
In the lectures, we are going to cover the most commonly encountered functions
The MPI programming model
Computation comprises one or more processes that communicate by calling library routines to send and receive messages to and from other processes
(Generally) a fixed set of processes created at outset, one process per processor
Different from PVM
Intuitive Interfaces for sending and receiving messages
Send(data, destination), Receive(data, source)
minimal interface
Not enough in some situations, we also need
Message matching – add message_id at both send and receive interfaces
They become Send(data, destination, msg_id) and Receive(data, source, msg_id)
Message_id:
is expressed using an integer, termed the message tag
allows the programmer to deal with the arrival of messages in an orderly fashion (messages can be queued and then dealt with in order)
How to express the data in the send/receive interfaces
Early stages: (address, length) for the send interface
(address, max_length) for the receive interface
These are not always adequate:
The data to be sent may not be in contiguous memory locations
The storage format for data may not be the same, or known in advance, on heterogeneous platforms
Eventually, a triple (address, count, datatype) was adopted to express the data to be sent, and (address, max_count, datatype) for the data to be received
This reflects the fact that a message contains much more structure than just a string of bits. For example: (vector_A, 300, MPI_REAL)
Programmers can construct their own datatype
Now, the interfaces become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id)
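As an illustration of a user-constructed datatype, a hedged sketch using the standard MPI_Type_vector routine (the 4x4 array, ranks and tag are illustrative choices; run with at least two processes):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    double a[4][4];              /* row-major 4x4 array */
    MPI_Datatype column_t;
    int rank, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4 blocks of 1 element, 4 elements apart: one column of the array */
    MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column_t);
    MPI_Type_commit(&column_t);

    if (rank == 0) {
        for (i = 0; i < 4; i++)
            for (j = 0; j < 4; j++)
                a[i][j] = i * 4 + j;
        /* the non-contiguous column is sent as a single message */
        MPI_Send(&a[0][1], 1, column_t, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&a[0][1], 1, column_t, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received column: %.0f %.0f %.0f %.0f\n",
               a[0][1], a[1][1], a[2][1], a[3][1]);
    }

    MPI_Type_free(&column_t);
    MPI_Finalize();
    return 0;
}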
How to distinguish messages
Message tag is necessary, but not sufficient
So, communicator is introduced …
Communicators
Messages are put into contexts
Contexts are allocated at run time by the system in response to programmer requests
The system can guarantee that each generated context is unique
The processes belong to groups
The notions of context and group are combined in a single object, which is called a communicator
A communicator identifies a group of processes and a communication context
The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system
The messages from different process groups can have the same tag
So the send interface becomes send(address, count, datatype, destination, tag, comm)
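The following hedged sketch shows contexts at work, using the standard MPI_Comm_split routine (the even/odd split is an illustrative choice); identical tags used in the two resulting groups can never be confused:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, sub_rank;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* split MPI_COMM_WORLD into two groups (even and odd world ranks);
       each sub-communicator carries its own communication context */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);
    MPI_Comm_rank(sub_comm, &sub_rank);
    printf("world rank %d is rank %d in its sub-communicator\n",
           world_rank, sub_rank);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}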
Status of the received messages
The structure of the message status is added to the receive interface
Status holds the information about source, tag and actual message size
In the C language, source can be retrieved by accessing status.MPI_SOURCE,
tag can be retrieved by status.MPI_TAG and
actual message size can be retrieved by calling the function MPI_Get_count(&status, datatype, &count)
The receive interface becomes receive(address, maxcount, datatype, source, tag, communicator, status)
How to express source and destination
The processes in a communicator (group) are identified by ranks
If a communicator contains n processes, process ranks are integers from 0 to n-1
Source and destination processes in the send/receive interface are the ranks
Some other issues
In the receive interface, tag can be a wildcard (MPI_ANY_TAG), which means a message with any tag will be received
In the receive interface, source can also be a wildcard (MPI_ANY_SOURCE), which matches any source
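Putting status and wildcards together, a minimal sketch (the tag value 99 and the single-integer payload are illustrative; run with at least two processes):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, data = 42, count;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        MPI_Send(&data, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* wildcards: accept a message from any source, with any tag */
        MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        /* the status object reveals who actually sent it, and how much */
        MPI_Get_count(&status, MPI_INT, &count);
        printf("got %d element(s) from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }
    MPI_Finalize();
    return 0;
}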
MPI basics
First six functions (C bindings)
MPI_Send (buf, count, datatype, dest, tag, comm)
Send a message
buf address of send buffer
count no. of elements to send (>=0)
datatype type of elements in send buffer
dest process id of destination
tag message tag
comm communicator (handle)
MPI basics
First six functions (C bindings)
MPI_Send (buf, count, datatype, dest, tag, comm)
Calculating the size of the data to be sent …
buf address of send buffer
count * sizeof (datatype) bytes of data
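For instance, a hypothetical fragment (assuming MPI has been initialised and a process with rank 1 exists):

double vector_A[300];
/* message payload: count * sizeof(datatype), i.e. 300 * sizeof(double)
   = 2400 bytes on typical platforms */
MPI_Send(vector_A, 300, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);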
MPI basics
First six functions (C bindings)
MPI_Recv (buf, count, datatype, source, tag, comm, status)
Receive a message
buf address of receive buffer (var param)
count max no. of elements in receive buffer (>=0)
datatype of receive buffer elements
source process id of source process, or MPI_ANY_SOURCE
tag message tag, or MPI_ANY_TAG
comm communicator
status status object
MPI basics
First six functions (C bindings)
MPI_Init (int *argc, char ***argv)
Initiate a computation
argc (number of arguments) and argv (argument vector) are main program’s arguments
Must be called first, and once per process
MPI_Finalize ( )
Shut down a computation
Must be the last MPI call in the program
MPI basics
First six functions (C bindings)
MPI_Comm_size (MPI_Comm comm, int *size)
Determine number of processes in comm
comm is communicator handle, MPI_COMM_WORLD is the default (including all MPI processes)
size holds number of processes in group
MPI_Comm_rank (MPI_Comm comm, int *pid)
Determine id of current (or calling) process
pid holds id of current process
#include "mpi.h" #include <stdio.h> int main(int argc, char *argv[]) { int rank, nprocs;
MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&nprocs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); printf("Hello, world. I am %d of %d\n", rank, nprocs); MPI_Finalize(); }
MPI basics – a basic exampleMPI basics – a basic example
mpirun -np 4 myprog
Hello, world. I am 1 of 4
Hello, world. I am 3 of 4
Hello, world. I am 0 of 4
Hello, world. I am 2 of 4
MPI basics – send and recv example (1)
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int buffer[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size < 2) {
        printf("Please run with two processes.\n");
        MPI_Finalize();
        return 0;
    }
    if (rank == 0) {
        for (i = 0; i < 10; i++)
            buffer[i] = i;
        MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
    }
MPI basics – send and recv example (2)
    if (rank == 1) {
        for (i = 0; i < 10; i++)
            buffer[i] = -1;
        MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
        for (i = 0; i < 10; i++) {
            if (buffer[i] != i)
                printf("Error: buffer[%d] = %d but is expected to be %d\n",
                       i, buffer[i], i);
        }
    }
    MPI_Finalize();
    return 0;
}
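To build and run this two-process example with MPICH, something like the following would typically be used (mpicc is MPICH's compiler wrapper; the file name is illustrative):

mpicc sendrecv.c -o sendrecv
mpirun -np 2 sendrecv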
MPI language bindings
Standard (accepted) bindings for Fortran, C and C++
Java bindings are work in progress
JavaMPI - Java wrapper to native calls
mpiJava - JNI wrappers
jmpi - pure Java implementation of MPI library
MPIJ - same idea
Java Grande Forum trying to sort it all out
We will use the C bindings