IBM Blue Gene/P Job Submission
© 2007 IBM Corporation, IBM Global Engineering Solutions
IBM Blue Gene/P System Administration
Job submission
Basic procedure
1. Create a block
2. Allocate a block
3. Boot a block
4. Run a job
5. Free the block, or run another job
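The five steps above can be read as a small state machine over a block. The Python sketch below is illustrative only: the state names and the transition table are assumptions chosen to mirror the slide, not MMCS terminology.

```python
# Hypothetical state names mirroring the five-step procedure; not MMCS terms.
TRANSITIONS = {
    "none": {"created"},             # 1. create a block
    "created": {"allocated"},        # 2. allocate the block
    "allocated": {"booted"},         # 3. boot the block
    "booted": {"running", "freed"},  # 4. run a job, or 5. free the block
    "running": {"booted"},           # job ends; the block can run another job
    "freed": {"allocated"},          # assumption: a freed block can be allocated again
}


def is_valid_sequence(states):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(states, states[1:]))
```

A sequence that skips a step, such as going from "created" straight to "running", is rejected.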
Block creation
Two ways of creating blocks
– Block builder
– mmcs_db_console
The use of Block builder is recommended
– Block builder can create any available block
– Block builder is a lot easier to use
Block creation
Block builder
Available via Navigator
Able to create any block with a valid block size
– 16, 32, 64, 128 and 256 nodes (mesh)
– 512 and multiples of 512 nodes (torus/mesh)
Starting card
– 16 node : J00, J01
– 64 node : N00, N02, N04, N08, N10, N12 or N14
– 128 node : N00, N04, N08 or N12
– 256 node : N00 or N08
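The block-size rule above can be expressed as a short check. This is a minimal Python sketch, not part of Block builder: the function name `classify_block_size` and its return strings are assumptions made for illustration, and only the sizes themselves come from the slide.

```python
SMALL_BLOCK_SIZES = (16, 32, 64, 128, 256)  # sub-midplane sizes listed on the slide
MIDPLANE_NODES = 512                        # larger blocks come in multiples of 512


def classify_block_size(nodes):
    """Return the connectivity the slide lists for a block size, or None if invalid."""
    if nodes in SMALL_BLOCK_SIZES:
        return "mesh"
    if nodes >= MIDPLANE_NODES and nodes % MIDPLANE_NODES == 0:
        return "torus/mesh"
    return None
```

For example, a 64-node request is classified as mesh only, while a 48-node request is rejected because it is not one of the listed sizes.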
Block creation
mmcs_db_console
Able to create most of the available blocks
Provides a set of commands to create blocks
– genblock : a base partition
– genblocks : each base partition on the system
– gensmallblock : a sub-base partition
– genBPblock : a set of base partitions
– genfullblock : the entire system
Use Navigator for pass-through or split cables
Block deletion
Available via mmcs_db_console
mmcs$ delete bgpblock R00-M0
Type ‘help delete’ at the mmcs shell prompt for usage
Block deletion is not available via Navigator’s GUI
– The mmcs_db_console within the Navigator is available
Exercise
Create a block from the block builder
Create a block from the mmcs_db_console
Delete a block from the mmcs_db_console
Job modes
There are three job modes: virtual node mode (VNM), SMP mode, and dual mode
Mode   MPI ranks (processes) per node   Threads per process
VNM    4                                1
SMP    1                                4
Dual   2                                2
[Figure: core assignment on a four-CPU node in each mode. Virtual Node Mode: ranks 0 to 3 on CPUs 0 to 3. SMP Mode: rank 0 on CPU 0, threads on CPUs 1 to 3. Dual Mode: rank 0 on CPU 0 and rank 1 on CPU 2, with a thread on CPU 1 and on CPU 3.]
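As a worked example of the mode table, the number of MPI ranks in a job is the node count multiplied by the processes per node. The Python sketch below is only an illustration: the dictionary keys and the helper name `total_ranks` are assumptions, while the per-mode numbers are taken from the table.

```python
# (processes per node, threads per process) for each job mode, from the table
JOB_MODES = {"VN": (4, 1), "SMP": (1, 4), "DUAL": (2, 2)}


def total_ranks(nodes, mode):
    """Return the number of MPI ranks a block of `nodes` nodes provides in `mode`."""
    procs_per_node, _threads_per_proc = JOB_MODES[mode.upper()]
    return nodes * procs_per_node
```

On a 64-node block this gives 256 ranks in virtual node mode, 128 in dual mode and 64 in SMP mode. In every mode, processes times threads equals 4, so all four CPUs of a node are used.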
Job submission
Ways to submit a job
mmcs_db_console
mpirun
LoadLeveler
Job submission
mmcs_db_console
A console for the Midplane Management Control System (MMCS)
Used to configure and allocate blocks of compute nodes and I/O nodes and to run programs on the BG/P system
Intended mainly for administrator use
Requires access to the service node
Environment variables need to be set
– /etc/profile.d/bgp.sh
Caveats when submitting jobs from the console
– No stdin support
– stdout & stderr are sent to files
Job submission
mmcs_db_console
1. $ cd /bgsys/drivers/ppcfloor/bin
2. $ ./mmcs_db_console
3. mmcs$ allocate_block R00-M0
4. mmcs$ boot_block
5. mmcs$ submit_job /bghome/test/hello.rts /bghome/test
6. mmcs$ free R00-M0
7. mmcs$ quit
Type ‘help’ at the mmcs shell prompt for available commands
Job submission
mmcs commands
allocate_block : marks the block as allocated, but does not boot it
boot_block : initializes, loads and starts the block's resources
submit_job : starts an executable running on the currently selected block
free : releases the resources associated with the block ID
Job submission
mpirun
Launches jobs on the BG/P hardware and acts as a job monitor
– mpirun continually monitors the status of the job and terminates when the job is done
– Transparently forwards stdin and receives stdout and stderr
Acts as a gateway for debuggers such as gdb and TotalView
Each job requires a partition
– Can be allocated on the fly (-np or -shape)
– Or a predefined partition can be used
Can boot partitions from their initial state
– Disable this feature with -noallocate
– User should verify that no overlapping hardware is busy
Can optionally leave booted partitions in place with -nofree
Job submission
mpirun
$ mpirun -partition R00-M0 -mode SMP -cwd /bghome/test -exe /bghome/test/hello.rts
-partition : specifies which block to use
-mode : specifies the processor execution mode
-cwd : specifies the current working directory
-exe : specifies the program to run
Type mpirun -h for available options
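When mpirun is launched from a wrapper script, it can help to assemble the argument list in one place. The following Python sketch covers only the four options shown on this slide. The helper name `build_mpirun_command` is hypothetical, and the sketch does not validate the values or run anything.

```python
import shlex


def build_mpirun_command(partition, mode, cwd, exe):
    """Assemble an mpirun argument list from the four options shown on the slide."""
    return [
        "mpirun",
        "-partition", partition,  # which block to use
        "-mode", mode,            # processor execution mode, e.g. SMP
        "-cwd", cwd,              # working directory for the job
        "-exe", exe,              # program to run
    ]


argv = build_mpirun_command("R00-M0", "SMP", "/bghome/test", "/bghome/test/hello.rts")
# Print a copy-pastable command line; quoting is applied only where needed.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if a path ever contains spaces.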
Job submission
LoadLeveler
Allocates machine resources to run jobs
Scheduling of jobs depends on the availability of resources within the system
A user submits a job using a job command file
Maximizes the efficiency of the cluster by maximizing the utilization of resources
Job submission
LoadLeveler
Some of the tasks it can perform:
– Choosing the next job to run
– Examining the job requirements
– Collecting available resources in its cluster
– Dispatching the job to the selected machine
– Controlling running jobs
– Creating reservations and scheduling jobs to run in the reservations
– Job preemption to enable high-priority jobs to run immediately
– Fair share scheduling to automatically balance resources among users or groups of users
– Co-scheduling to enable several jobs to be scheduled to run at the same time
Example code
1. Write a simple hello world program (hello.c):
/* Hello World program */
#include <stdio.h>

int main(void)
{
    /* Print the greeting to stdout and report success to the caller */
    printf("Hello World!\n");
    return 0;
}
2. Compile the program:
/bgsys/drivers/ppcfloor/comm/bin/mpicc -o hello hello.c
3. Run the program:
Assuming that the program lives in /bgsys/apps and you want the results (STDOUT and STDERR) to be written to /bgsys/apps/results:
At the mmcs_db_console prompt:
mmcs$ submit_job /bgsys/apps/hello /bgsys/apps/results/
Exercise
Submit a job using mmcs_db_console
Free the block after the job finishes
Submit a job using mpirun
Submit a job using LoadLeveler
Job termination
mmcs_db_console
killjob, kill_job
1. mmcs$ killjob R00-M0 124
2. mmcs$ wait_job
Terminating a job can take a while
– Default timeout is 5 minutes
Job termination
mpirun
Control-C
– mpirund will do a cleanup
– Do not send multiple Control-Cs
• A second Control-C will force termination
• A third Control-C is almost equivalent to kill -9, which may leave the block state in limbo
Scripting
A list of commands for mmcs_db_console can be written to a file for scripted use
$ mmcs_db_console < script_file
script_file is a plain ASCII text file containing a list of mmcs_db_console commands
Scripting
Sample script_file
Create and test several blocks
$ cat script_file
genblock R00-M0 R00-M0 64
allocate R00-M0
free R00-M0
genblock R00-M1 R00-M1 64
allocate R00-M1
free R00-M1
…
quit
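A script_file like the one above is repetitive enough to generate. The Python sketch below is one possible way to do it. It reuses the three commands exactly in the form shown in the sample (including the sample's convention that the block ID equals the midplane name), and the function name `make_block_test_script` is hypothetical.

```python
def make_block_test_script(block_ids, size=64):
    """Return console script text that creates, allocates and frees each block."""
    lines = []
    for block_id in block_ids:
        # Same three commands, in the same form, as the sample script_file
        lines.append(f"genblock {block_id} {block_id} {size}")
        lines.append(f"allocate {block_id}")
        lines.append(f"free {block_id}")
    lines.append("quit")  # leave the console at the end of the script
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the generated commands to a file that can be fed to the console
    with open("script_file", "w", encoding="ascii") as handle:
        handle.write(make_block_test_script(["R00-M0", "R00-M1"]))
```

The resulting file can then be passed to the console with `mmcs_db_console < script_file`, as described on the previous slide.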
Bridge API
Public API used by job schedulers
– LoadLeveler, SLURM, Altair PBS Pro, Platform LSF, Cobalt
– Used by mpirun too
Has interfaces to manage various Blue Gene resources
– Create, destroy, query logical constructs such as jobs and partitions
– Query physical entities such as midplanes, node cards, switches, and cables
– Essentially a thin abstraction layer over the database
Requires a polling model to obtain machine state, for example:
– Grab a snapshot of the machine state
– Create a partition based on free resources
– Boot the partition
– Poll the partition state until it is INITIALIZED
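The last step of the polling model can be sketched as a loop with a timeout. This Python example does not call the Bridge API. `get_state` is a hypothetical callable standing in for whatever query a scheduler would issue, and the default timeout and interval are arbitrary assumptions. Only the target state name INITIALIZED comes from the slide.

```python
import time


def wait_until_initialized(get_state, timeout_s=300.0, interval_s=5.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "INITIALIZED"; return False on timeout.

    get_state is a hypothetical stand-in for a partition-state query.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "INITIALIZED":
            return True
        if clock() >= deadline:
            return False  # give up; the caller decides whether to free the partition
        sleep(interval_s)
```

A caller would typically treat a False result as a boot failure and release the partition rather than keep polling indefinitely.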