A Short Introduction to PVM and MPI
Philip Papadopoulos
University of California, San Diego
Department of CSE
San Diego Supercomputer Center
Outline
• What is message passing? Why do I care?
• “Hello-World” for message passing
• Level 0 issues
• What are PVM and MPI?
• MPI implementations
• Inner workings of PVM
But First …
• Please ask questions at any time
• Things will be more interesting when you do
• I’d rather answer questions.
• Got it?
What is Message Passing? Why Do I Care?
• Message passing allows two processes to:
  – Exchange information
  – Synchronize with each other
• Message passing is “Sockets for Dummies”
• So?
  – Applications need much more power and/or memory than a single machine can deliver
  – Large parallel programs need well-defined mechanisms to coordinate and exchange information
Message Passing in the HPC World
• Large scientific applications routinely scale to hundreds of processors and, in rare cases, to thousands
  – Climate/ocean modeling
  – Molecular physics (QCD, dynamics, materials, …)
  – Computational fluid dynamics
  – And many more …
• Message passing and the SPMD programming style have been key infrastructure enablers
  – Why not shared memory?
How Does Message Passing Differ from Socket Programming?
• Socket programming (OS 101) is a type of message passing
  – Open, bind, connect, accept are too arcane
  – sendto, recvfrom (UDP) are not reliable
  – Point-to-point is good; multicast and broadcast are limited
• Message passing usually means (pt-2-pt)
  – Low latency
  – High performance
  – Reliable, in-sequence delivery
+ Group operations
  + Broadcast
  + Reduce (e.g., sum an array whose parts are held in different processes)
  + Group synchronize (barrier)
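The reduce operation listed above can be modeled in a few lines. This is a single-process Python sketch, not MPI or PVM code: each list element stands for the partial result held by one process, and the hypothetical `tree_reduce` helper combines them pairwise. Pairwise (tree) combining is an assumption made for illustration; real libraries choose their own reduction algorithms.

```python
from operator import add


def tree_reduce(values, op):
    """Combine one value per 'process' pairwise until rank 0 holds the result."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    stride = 1
    while stride < len(values):
        # In each round, rank i combines its value with the one from rank i + stride.
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] = op(values[i], values[i + stride])
        stride *= 2
    return values[0]


# Four "processes", each holding one part of the array (illustrative data).
parts = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
partial_sums = [sum(p) for p in parts]   # local work, no communication
total = tree_reduce(partial_sums, add)   # the group operation
```

With P participants the loop runs about log2(P) rounds, which is why group operations are worth providing in the library rather than hand-coding them from point-to-point sends.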
Hello World – MP Style
Process A
• Initialize
• Send(B, “Hello World”)
• Recv(B, String)
• Print String
  – “Hi There”
• Finalize
Process B
• Initialize
• Recv(A, String)
• Print String
  – “Hello World”
• Send(A, “Hi There”)
• Finalize
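The exchange between Process A and Process B can be tried without any message passing library. The sketch below is only a model: two Python threads stand in for the two processes, and a pair of `queue.Queue` objects stands in for Send/Recv. The function names are hypothetical and nothing here is PVM or MPI API.

```python
import queue
import threading


def process_a(send_to_b, recv_from_b, log):
    send_to_b.put("Hello World")                     # Send(B, "Hello World")
    log.append(("A", recv_from_b.get(timeout=5)))    # Recv(B, String); Print String


def process_b(send_to_a, recv_from_a, log):
    log.append(("B", recv_from_a.get(timeout=5)))    # Recv(A, String); Print String
    send_to_a.put("Hi There")                        # Send(A, "Hi There")


def run_hello():
    a_to_b, b_to_a = queue.Queue(), queue.Queue()    # one channel per direction
    log = []
    workers = [
        threading.Thread(target=process_a, args=(a_to_b, b_to_a, log)),
        threading.Thread(target=process_b, args=(b_to_a, a_to_b, log)),
    ]
    for w in workers:
        w.start()                                    # "Initialize"
    for w in workers:
        w.join()                                     # "Finalize"
    return log


log = run_hello()
```

B records its message before replying, and A records only after the reply arrives, so the blocking receives also synchronize the two sides.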
Message Addressing
• Identify an endpoint
• Use a tag to distinguish a particular message
  – pvm_send(dest, tag)
  – MPI_Send(buf, count, datatype, dest, tag, comm)
• Receiving
  – recv(src, tag); recv(*, tag); recv(src, *); recv(*, *);
• What if you want to build a library that uses message passing? Is (src, tag) safe in all instances?
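The four receive forms, and the library-safety question, can be illustrated with a small model. The sketch below is plain Python, not PVM or MPI: `recv` is a hypothetical helper that scans a list of buffered messages, and `ANY` stands in for the `*` wildcard.

```python
ANY = None  # stands in for the "*" wildcard on the slide


def recv(buffer, src=ANY, tag=ANY):
    """Remove and return the earliest buffered message matching (src, tag)."""
    for i, msg in enumerate(buffer):
        if (src is ANY or msg["src"] == src) and (tag is ANY or msg["tag"] == tag):
            return buffer.pop(i)
    return None


# Messages in arrival order; the second one belongs to a library, not the app.
buffer = [
    {"src": 1, "tag": 7, "data": "app result"},
    {"src": 2, "tag": 99, "data": "library internal"},
    {"src": 1, "tag": 8, "data": "app status"},
]

by_tag = recv(buffer, tag=8)   # recv(*, tag): skips ahead of earlier messages
by_src = recv(buffer, src=1)   # recv(src, *)
stolen = recv(buffer)          # recv(*, *): also matches the library's message
```

The last call shows why (src, tag) alone is not safe: a fully wildcarded receive in the application can consume a message that a library running in the same process was waiting for.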
Level 0 Issues
• Basic pt-2-pt message passing is straightforward, but how does one …
  – Make it go fast
     • Eliminate extra memory copies
     • Take advantage of specialized hardware
  – Move complex data structures (packing)
  – Receive from one-of-many (wildcarding)
  – Synchronize a group of tasks
  – Recover from errors
  – Start tasks
  – Build safe libraries
  – Monitor tasks
  – …
MPI-1 addresses many of the level 0 issues
(but not all)
A long history of research efforts in message passing
• P4
• Chameleon
• Parmacs
• TCGMSG
• CHIMP
• NX (Intel i860, Paragon)
• PVM
• …
And these begot MPI
So What is MPI?
• It is a standard message passing API
  – Specifies many variants of send/recv
     • 9 send interface calls
        – E.g., synchronous send, asynchronous send, ready send, asynchronous ready send
  – Plus other defined APIs
     • Process topologies
     • Group operations
     • Derived data types
     • Profiling API (standard way to instrument MPI code)
  – Implemented and optimized by machine vendors
  – Should you use it? Absolutely!
So What’s Missing in MPI-1?
• Process control
  – How do you start 100 tasks?
  – How do you kill/signal/monitor remote tasks?
• I/O
  – Addressed in MPI-2
• Fault tolerance
  – If one MPI process dies, the rest eventually hang
• Interoperability
  – No standard set for running a single parallel job across architectures (e.g., cannot split a computation between x86 Linux and Alpha)
What is PVM?
Heterogeneous Virtual Machine support for:
• Resource Management
  – add/delete hosts from a virtual machine
• Process Control
  – spawn/kill tasks dynamically
• Message Passing
  – blocking send, blocking and non-blocking receive, mcast
• Dynamic Task Groups
  – task can join or leave a group at any time
• Fault Tolerance
  – VM automatically detects faults and adjusts
Popular PVM Uses
• Poor man’s supercomputer
  – Beowulf (PC) clusters, Linux, Solaris, NT
  – Cobble together whatever resources you can get
• Metacomputer linking multiple supercomputers
  – ultimate performance: e.g., have combined nearly 3000 processors and up to 53 supercomputers
• Education tool
  – teaching parallel programming
  – academic and thesis research
PVM In a Nutshell
• Each host (could be an MPP or SMP) runs a PVMD
• A collection of PVMDs defines a virtual machine
• Once configured, tasks can be started (spawned), killed, and signaled from a console
• Basic message passing
• Performance is OK, but API semantics limit optimizations
MPI Design Goals
• Make it go as fast as possible
• Operate in a serverless (daemonless) environment
• Specify portability but not interoperability
• Standardize best practices of research environments
• Encourage competing implementations
• Enable the building of safe libraries
• The “assembly language” of message passing
MPI in the Marketplace
• MPICH (Mississippi State/Argonne), open source
  – A top-quality reference implementation
  – http://www-unix.mcs.anl.gov/mpi/mpich/
• High-performance cluster MPIs
  – AM-MPI, FM-MPI, PM-MPI, GM-MPI, BIP-MPI
     • 10 µs latency, 100 MB/s on Myrinet
• Vendor-supported MPI
  – SGI, Cray, IBM, Fujitsu, Sun, Hitachi, …
• MPI vendors
  – ScaMPI, MPI Soft-Tech, Genias, …
Comparisons
PVM (best for distributed computing)
• interoperability
• fault tolerance
• heterogeneity
• resource control
• dynamic model
MPI (best for large multiprocessors)
• MPP performance
• many communication methods
• topology
• static model (SPMD)
Each API has its unique strengths
Evaluate the needs of your application, then choose
PVM? MPI?
• PVM is easy to use, especially on a network of workstations. Its message passing API is relatively simple
• MPI is a standard, has a steeper learning curve, and doesn’t have a standard way to start tasks
  – MPICH does have an “mpirun” command
• If you are building a new scalable production code, use MPI (widely supported now)
• If you are experimenting with message passing or are interested in dynamics, use PVM
Some Inner Workings of PVM
• Every process has a unique, virtual-machine-wide identifier called a task ID (TID)
• PVMDs run on each host and act as points of presence
• A single master PVMD disseminates current virtual machine configuration and holds something called the PVM mailbox.
• The VM can grow and shrink around the master (if the master dies, the machine falls apart)
• Dynamic configuration is used whenever practical
How PVM is Designed
[Figure: PVM architecture. Each host (one per IP address) runs one PVM daemon (pvmd), and the pvmds are fully connected using UDP. Each task is linked to the PVM library (libpvm). Tasks on the same host exchange messages over Unix domain sockets through the OS network interface; tasks can also connect directly to each other over TCP. On a shared-memory multiprocessor (P0, P1, P2) tasks use shared memory; on a distributed-memory MPP they use the internal interconnect.]
PVM Tasks Can Use Multiple Transports
• Uses sockets mostly
  – Unix-domain on host
  – TCP between tasks on different hosts
  – UDP between daemons (custom reliability)
• SysV shared memory transport for SMPs
  – Tasks still use pvm_send(), pvm_recv()
• Native MPP
  – PVM can ride atop a native MPI implementation
Task ID (tid)
• PVM uses tid to identify pvmd, tasks, groups
• Fits into a 32-bit integer
• S bit addresses pvmd, G bit forms mcast address
• Local part defined by each pvmd, e.g., for PGON
Generic layout: S (1 bit) | G (1 bit) | host ID (12 bits) | local part (18 bits)
PGON layout: S (1 bit) | G (1 bit) | host ID (12 bits) | process (7 bits) | node ID (11 bits)
4096 hosts, each with 2048 nodes
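The generic layout can be checked with a few bit operations. The sketch below assumes the fields are packed from the most significant bit down in the order shown (S, G, host ID, local part); the helper names are hypothetical and are not part of the PVM library.

```python
# Field positions assumed from the generic layout: S | G | host (12) | local (18).
S_BIT = 1 << 31
G_BIT = 1 << 30
HOST_BITS, LOCAL_BITS = 12, 18
HOST_MAX = (1 << HOST_BITS) - 1      # 4095, i.e. 4096 possible hosts
LOCAL_MAX = (1 << LOCAL_BITS) - 1
HOST_FIELD = HOST_MAX << LOCAL_BITS  # mask selecting only the host ID bits


def make_tid(host, local, s=False, g=False):
    """Pack the four fields into one 32-bit task ID."""
    if not 0 <= host <= HOST_MAX or not 0 <= local <= LOCAL_MAX:
        raise ValueError("field out of range")
    return (S_BIT if s else 0) | (G_BIT if g else 0) | (host << LOCAL_BITS) | local


def split_tid(tid):
    """Unpack a task ID into its fields."""
    return {
        "s": bool(tid & S_BIT),
        "g": bool(tid & G_BIT),
        "host": (tid >> LOCAL_BITS) & HOST_MAX,
        "local": tid & LOCAL_MAX,
    }


def same_host(tid_a, tid_b):
    """A single mask comparison tells whether two tasks share a host."""
    return (tid_a & HOST_FIELD) == (tid_b & HOST_FIELD)


tid = make_tid(host=1, local=2)
fields = split_tid(tid)
```

Because the host ID sits in fixed bits, deciding whether a destination is local or remote needs only a mask and a comparison, with no lookup of the task itself.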
Things to note about PVM Addressing
• Addresses contain routing information by virtue of the host part
  – Transport selection at runtime is simplified
     • Bit-mask + table lookup
• Moving a PVM task is very difficult
  – Possible with effort, e.g., Condor (U. Wisc)
• Group/multicast bit makes it straightforward to implement multicast within pt-2-pt infrastructure
Communication Context in MPI
• MPI wraps together group and context into a single entity called a communicator
• An MPI program starts with one communicator
  – MPI_COMM_WORLD
• All communicators are derived from this
• Library implementers are passed a communicator (group) and derive a new communicator -> safe comm envelope
• Messages have a 3-tuple to identify them
  – (comm, src, tag)
  – Comm cannot be wildcarded
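The effect of the 3-tuple can be seen in a small model. The sketch below is plain Python, not MPI: integers stand in for communicator contexts, `derive_context` is a hypothetical stand-in for deriving a new communicator, and `recv` allows wildcards on src and tag but never on comm.

```python
import itertools

ANY = object()                      # wildcard, allowed for src and tag only
WORLD = 0                           # stand-in for MPI_COMM_WORLD's context
_context_ids = itertools.count(1)


def derive_context():
    """Return a fresh context id, as a library would when handed a communicator."""
    return next(_context_ids)


def recv(buffer, comm, src=ANY, tag=ANY):
    """Return the payload of the earliest message matching (comm, src, tag)."""
    for i, (m_comm, m_src, m_tag, payload) in enumerate(buffer):
        if m_comm != comm:
            continue                # comm must match exactly, no wildcard
        if (src is ANY or m_src == src) and (tag is ANY or m_tag == tag):
            del buffer[i]
            return payload
    return None


lib_comm = derive_context()
buffer = [
    (lib_comm, 2, 99, "library internal"),   # arrived first
    (WORLD, 1, 7, "app result"),
]
app_msg = recv(buffer, WORLD)                    # wildcards on src and tag
lib_msg = recv(buffer, lib_comm, src=2, tag=99)
```

Even though the application wildcards both src and tag, it cannot see the library's message, because the message was sent in a different context.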
Communication Context in PVM
• One task gets and distributes a new globally unique context
  – newcontext = pvm_newcontext();
  – broadcast newcontext to all tasks, or put it in a persistent message
• All tasks switch to the safe context
  – oldcontext = pvm_setcontext(newcontext);
• Safe communication for your application or library
• When done, switch back and free the context
  – newcontext = pvm_setcontext(oldcontext);
  – pvm_freecontext(newcontext);
• Be aware: unlike MPI, the current context is not explicit in the send/recv API
Receiving a message (library viewpoint)
• Messages arrive into a process and must be discriminated
  – Message header contains src, tag, context, length, flags
  – Library “buffers” incoming messages until the task receives them
• Available messages must be matched against match criteria
  – Tasks may ask to process messages in a different order than they are actually received
     • (MPI has many variants of send/recv to handle various cases for optimization)
• PVM allows message handlers, so that when a message meeting particular match criteria arrives, a subroutine is called
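Buffering, out-of-order matching, and handlers fit together roughly as follows. The sketch below is a toy model in Python: the `Inbox` class and its method names are hypothetical and do not reproduce any PVM or MPI interface, and the message fields follow the header fields listed above.

```python
ANY = object()  # wildcard for any header field


def _match(criteria, msg):
    return all(want is ANY or msg[key] == want for key, want in criteria.items())


class Inbox:
    def __init__(self):
        self.buffer = []     # messages waiting for a matching receive
        self.handlers = []   # (criteria, function) pairs

    def add_handler(self, func, src=ANY, tag=ANY, context=ANY):
        self.handlers.append(({"src": src, "tag": tag, "context": context}, func))

    def deliver(self, msg):
        """Called when a message arrives: run a handler or buffer the message."""
        for criteria, func in self.handlers:
            if _match(criteria, msg):
                func(msg)
                return
        self.buffer.append(msg)

    def recv(self, src=ANY, tag=ANY, context=ANY):
        """Return the earliest buffered message meeting the criteria, if any."""
        criteria = {"src": src, "tag": tag, "context": context}
        for i, msg in enumerate(self.buffer):
            if _match(criteria, msg):
                return self.buffer.pop(i)
        return None


inbox = Inbox()
control_log = []
inbox.add_handler(control_log.append, tag=0)   # tag 0 assumed to mean "control"
inbox.deliver({"src": 1, "tag": 5, "context": 0, "data": "first"})
inbox.deliver({"src": 2, "tag": 0, "context": 0, "data": "shutdown"})
inbox.deliver({"src": 2, "tag": 6, "context": 0, "data": "second"})
late = inbox.recv(src=2)   # taken out of arrival order; "first" stays buffered
```

The control message never reaches the buffer: the handler runs as soon as it arrives, while ordinary messages wait until the task asks for them.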
Message Handlers
[Figure: an incoming message (data or control message) is matched on source, tag, and context and passed to a handler function. Handlers cover VM control messages and user-defined handlers; the diagram also labels this path “active mesg.”]
Persistent Messages
• Tasks can store and retrieve messages by name
• Distributed information database for dynamic programs
  – provides rendezvous, attachment, groups, many uses
• Multiple messages per “name” possible
index = pvm_putinfo( name, msgbuf, flag )
pvm_recvinfo( name, index, flag )
pvm_delinfo( name, index, flag )
pvm_getmboxinfo( pattern, #names, array of struct )
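The store/retrieve/delete pattern can be modeled with a dictionary. The sketch below is a toy in-memory model in Python, not the PVM mailbox: the `MessageBox` class is hypothetical, the method names only echo the calls above, and the flag arguments are left out entirely.

```python
from fnmatch import fnmatch


class MessageBox:
    def __init__(self):
        self._entries = {}   # name -> {index: message}

    def putinfo(self, name, message):
        """Store a message under name and return the index it was given."""
        slots = self._entries.setdefault(name, {})
        index = 0
        while index in slots:          # first free index (a modeling choice)
            index += 1
        slots[index] = message
        return index

    def recvinfo(self, name, index=0):
        """Return the stored message, or None if nothing is stored there."""
        return self._entries.get(name, {}).get(index)

    def delinfo(self, name, index=0):
        """Remove one entry; return True if something was removed."""
        slots = self._entries.get(name, {})
        removed = slots.pop(index, None) is not None
        if not slots:
            self._entries.pop(name, None)
        return removed

    def names(self, pattern="*"):
        """List stored names matching a shell-style pattern."""
        return sorted(n for n in self._entries if fnmatch(n, pattern))


box = MessageBox()
first = box.putinfo("solver/contact", "host-a:4000")    # illustrative values
second = box.putinfo("solver/contact", "host-b:4000")   # same name, new index
box.putinfo("load/forecast", 0.42)
found = box.recvinfo("solver/contact", second)
```

A task that starts later needs to know only the name to find its peers, which is the rendezvous use mentioned above.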
Persistent Messages
[Figure: Task 1 and Task 2 exchange a message through the message box, where it is stored under a key.]
• Message box storage is coordinated across pvmds
• A task stores information, e.g., how to contact an application, a network load forecast, etc.
• Later, another task can request this message and receive it normally
• A task can specify when and who can replace a message it has placed in the message box
Monitoring Performance
• PVM allows messages to be “traced” so that flows can be debugged
• MPI provides a standard profiling interface to build profiling tools
  – Nupshot, Jumpshot, MPITrace, VaMPIr, …
• XPVM (screen shot next slide) provides visual information about machine utilization, flows, configuration
![Page 33: A Short Introduction to PVM and MPI](https://reader030.vdocuments.net/reader030/viewer/2022012908/5681584e550346895dc5a8e3/html5/thumbnails/33.jpg)
XPVM Screen Shot
Wrapping Up
• MPI has a very rich messaging interface and is designed for efficiency
  – http://www-unix.mcs.anl.gov/mpi
• PVM has a simple messaging interface, plus
  – Process control, interoperability, dynamics
  – http://www.epm.ornl.gov/pvm
• The two perform comparably on Ethernet
• MPI outperforms PVM on an MPP
• Both are still popular, but MPI is an accepted community standard with many support chains.
Questions?