Message-Passing Computing
Dr. Tim McGuire, Sam Houston State University
ACET 2002, Corpus Christi, TX
Motivation
So, you attended my 2000 talk in Austin* (or read a similar article), built a Beowulf cluster from castoff computers, and now you’re wondering what you can do with it, right?
Well, that’s the motivation for this talk.
* T. McGuire, “Building a Low-Cost Supercomputer,” ACET 2000, Austin, Texas, September 2000.
The Target Machine
For the purpose of this talk, we will look at Beowulf clusters.
All the techniques we discuss can also be extended to a network of workstations.
The differences are that a Beowulf cluster uses:
  dedicated processors (rather than scavenging cycles from idle workstations)
  a private system area network (an enclosed SAN rather than an exposed LAN)
How Does One Program a Beowulf?
The short answer is message passing, a technique originally developed for distributed computing.
The Beowulf architecture makes message passing more efficient: it doesn't have to compete with other traffic on the network.
Other techniques are being explored; Java is a popular topic at this time.
A Typical Uniprocessor System
Consists of a processor executing a program stored in main memory
Types of Parallel Computers
Two principal types:
  Shared memory multiprocessor
  Distributed memory multicomputer
Shared Memory Multiprocessor System
A natural way to extend the single-processor model is to connect multiple processors to multiple memory modules, such that each processor can access any memory module. This is the so-called shared memory configuration.
Shared memory multiprocessor system
Any memory location is accessible by any of the processors.
A single address space exists, meaning that each memory location is given a unique address within a single range of addresses.
Generally, shared memory programming is more convenient, although it does require the programmer to control access to shared data (using critical sections, etc.).
Message-Passing Multicomputer
Complete computers connected through an interconnection network:
Message Passing Software
PVM (Parallel Virtual Machine) was the first widely used API
Developed at Oak Ridge National Laboratory (late 1980s)
Very widely used (free)
Berkeley NOW (Network of Workstations) project
Has task scheduling and other advanced features
http://www.epm.ornl.gov/pvm/
More Recent Message Passing Work
MPI (Message-Passing Interface)
Standard for message passing libraries
Defines routines but not implementation
Has adequate features for most parallel applications
Version 1 released in 1994 with 120+ routines defined
Version 2 now available
Both PVM and MPI provide a set of user-level libraries for message passing with normal programming languages (C, C++, Fortran)
Basics of Message-Passing
Programming using user-level message passing libraries:
Two primary mechanisms needed:
1. A method of creating separate processes for execution on different computers
2. A method of sending and receiving messages
Single Program Multiple Data (SPMD) Model
Different processes merged into one program. Within program, control statements select different parts for each processor to execute. All executables started together - static process creation.
[Figure: one source file is compiled to suit each processor, giving executables for Processor 0 through Processor n-1]
Basic MPI model
Basic “point-to-point” Send and Receive Routines
Passing a message between processes using send() and recv() library calls:
MPI (Message Passing Interface)
Standard developed by group of academics and industrial partners to foster more widespread use and portability
Defines routines, not implementation
Several free implementations exist
A Simple MPI Example
The first C program most of us saw was the “Hello, World!” program in K&R.
We’ll look at a variant that makes some use of multiple processes to have each process send a greeting to another process.
We will assume we have p processes identified by their ranks 0, 1, …, p-1.
First MPI Program
/* From Peter Pacheco, University of San Francisco */
#include <stdio.h>
#include <string.h>   /* for strlen() */
#include "mpi.h"

int main(int argc, char *argv[]) {
    int my_rank;          /* rank of process */
    int p;                /* number of processes */
    int source;           /* rank of sender */
    int dest;             /* rank of receiver */
    int tag = 0;          /* tag for messages */
    char message[100];    /* storage for message */
    MPI_Status status;    /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);

    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;
        /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else { /* my_rank == 0 */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        } /* end for */
    } /* end if */

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
} /* main */
The details of compilation and execution depend on the system you’re using.
On Bubbawulf: gcc -o greetings greetings.c -lmpi
To run with two processors: mpirun -np 2 greetings
Running the first program
When the program is compiled and run with 4 processes, the output should be:
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
This is an example of a special type of MIMD programming called SPMD (single-program, multiple-data) programming
Different processes execute different statements by branching within the program based on their process ranks
MPI
The program consists entirely of C statements
MPI is simply a library of definitions and functions (C or Fortran)
General MPI Programs
Every MPI program contains the directive
#include "mpi.h"
which includes the definitions and declarations necessary for compiling an MPI program
MPI uses a consistent scheme for identifiers – all begin with “MPI_”
MPI uses communicators (collections of processes that can send messages to each other) – MPI_COMM_WORLD is the default
Often 1 process per processor, but not necessarily
MPI Program Skeleton
...
#include "mpi.h"
...
int main(int argc, char* argv[]) {
    ...
    /* No MPI functions called before this */
    MPI_Init(&argc, &argv);   /* initialize MPI system */
    ...
    /* No MPI functions called after this */
    MPI_Finalize();           /* clean up MPI memory, etc. */
    ...
} /* main */
Essential MPI Functions
MPI_Comm_size()
  Used to find out how many processes are involved in the execution of a program
MPI_Comm_rank() lets a process find out its rank
  Essential since we are using SPMD
MPI_Send() and MPI_Recv() are used to accomplish the actual message passing
The Killer App
Every paradigm shift in computing needs a motivation
The typical applications for parallel and distributed processing are not as accessible to the general undergraduate
  Large matrix operations, etc.
I propose a simple yet interesting application, using synchronous computations
Cellular Automata
The problem space is divided into cells.
Each cell can be in one of a finite number of states.
Cells are affected by their neighbors according to certain rules, and all cells are affected simultaneously in a “generation.”
Rules are re-applied in subsequent generations so that cells evolve, or change state, from generation to generation.
Heat Distribution Problem
An area has known temperatures along each of its edges. Find the temperature distribution within.
Divide the area into a fine mesh of points, h_{i,j}.
The temperature at an inside point is taken to be the average of the temperatures of its four neighboring points. It is convenient to describe the edges by points.
The temperature of each point is found by iterating the equation
h_{i,j} = (h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}) / 4    (0 < i < n, 0 < j < n)
for a fixed number of iterations or until the difference between iterations is less than some very small amount.
Parallel Code
w = x = y = z = initial temp;
for (iteration = 0; iteration < limit; iteration++) {
    g = 0.25 * (w + x + y + z);
    send(&g, P_{i-1,j});   /* non-blocking sends */
    send(&g, P_{i+1,j});
    send(&g, P_{i,j-1});
    send(&g, P_{i,j+1});
    recv(&w, P_{i-1,j});   /* synchronous recvs */
    recv(&x, P_{i+1,j});
    recv(&y, P_{i,j-1});
    recv(&z, P_{i,j+1});
}
It is important to use send()s that do not block while waiting for the recv()s; otherwise the processes would deadlock, each waiting for a recv() before moving on. The recv()s must be synchronous and wait for the send()s.
The Game of Life
The most famous cellular automaton is the “Game of Life,” devised by John Conway (Scientific American, October 1970)
Also a good assignment for graphical output, if available
Board game: theoretically infinite two-dimensional array of cells.
Each cell can hold one “organism” and has eight neighboring cells, including those diagonally adjacent. Initially, some cells are occupied.
The Rules of Life
1. Every organism with two or three neighboring organisms survives for the next generation.
2. Every organism with four or more neighbors dies from overpopulation.
3. Every organism with one neighbor or none dies from isolation.
4. Each empty cell adjacent to exactly three occupied neighbors will give birth to an organism.
These rules were derived by Conway “after a long period of experimentation.”
How to Solve Life
Each block can be represented as a process
Initialization can be done by giving some blocks one organism and other blocks none. This can be done randomly or using a heuristic approach.
Outline of the Code
do {
    iteration++;
    current_neighbors = 0;
    send(current value (0 or 1) to all neighbors);
    recv(current values of all neighbors);
    current_neighbors = sum of received values;
    if (current_neighbors >= 4) organism = 0;        /* dead from overcrowding */
    else if (current_neighbors <= 1) organism = 0;   /* dead from isolation */
    else if (current_neighbors == 3) organism = 1;   /* new organism created */
    /* with exactly two neighbors the cell keeps its current state */
} while (!converged() && (iteration < limit));
Some Other Fun Examples
Foxes and Rabbits
  Rabbits move around happily (reproducing) while foxes eat any rabbits they come across
  Also based on a 2-D board
Sharks and Fishes
  Ocean modeled as a 3-D array of cells
  Each cell holds one fish or one shark
Serious Applications for Cellular Automata
Diffusion of gases
Airflow across an airplane wing
Erosion/movement of sand at a beach
Biological growth
IEEE Task Force on Cluster Computing
Aim to foster the use and development of clusters
Has been in operation since 1999
Main home page: http://www.ieeetfcc.org
Conclusions
• Cluster computing can be effectively taught at the undergraduate level
• Excellent and fun examples of applications exist
Quote: Gill wrote in 1958 (quoting papers back to 1953):
“ … There is therefore nothing new in the basic idea of parallel programming, but only its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation.”
Gill, S. (1958), “Parallel Programming,” The Computer Journal (British) Vol. 1, pp. 2-10.