PBSpro Advanced
Information Systems & Technology, Advanced Campus Services
Prepared by Chao “Bill” Xie, PhD student, Computer Science
Fall 2005
Fall 2005 Using PBSpro Advanced, IS&T Advanced Campus Services

Syllabus
- Environment Variables
- Checkpointing

Environment variables
- The environment variables available to a job are:
  - Taken from the user’s environment
  - Created by PBS
  - Created by users
- Most names of variables created by PBS start with “PBS_”
- Some names start with “PBS_O_”, indicating that the variable comes from the job’s originating environment
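
The prefix convention above can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell; `classify_pbs_var` is a hypothetical helper written for this illustration and is not part of PBS.

```shell
#!/bin/sh
# Classify a variable name using the naming convention from the slide.
# classify_pbs_var is a hypothetical helper, not a PBS command.
classify_pbs_var() {
    case "$1" in
        PBS_O_*) echo "originating-environment" ;;  # copied from the submission environment
        PBS_*)   echo "pbs" ;;                      # created by PBS for the job
        *)       echo "other" ;;                    # user or system variable
    esac
}

# Label every PBS-related variable visible to the current shell.
env | cut -d= -f1 | while read -r name; do
    kind=$(classify_pbs_var "$name")
    if [ "$kind" != "other" ]; then
        echo "$name: $kind"
    fi
done
```

Run inside a job, the loop lists only the PBS-related names; run outside PBS, it normally prints nothing.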

Important variables
- PBS_O_HOME: Value of HOME from the submission environment
- PBS_O_HOST: Host name on which the qsub command was executed
- PBS_O_PATH: Value of PATH from the submission environment
- PBS_O_QUEUE: Original queue name to which the job was submitted
- PBS_O_SHELL: Value of SHELL from the submission environment
- PBS_O_SYSTEM: Operating system name where qsub was executed
- PBS_O_WORKDIR: Absolute path of the directory where qsub was executed
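
A common use of these variables is to return to the submission directory at the start of a job script, since a batch job does not necessarily start there. The sketch below assumes a POSIX shell; the fallbacks to `$PWD` and `unknown` are additions for this example so that the script also runs outside PBS.

```shell
#!/bin/sh
# Return to the directory where qsub was run. Fall back to the current
# directory when PBS_O_WORKDIR is unset (for example, outside PBS).
submit_dir="${PBS_O_WORKDIR:-$PWD}"
cd "$submit_dir" || exit 1

# Report where the job came from, using the PBS_O_* variables.
echo "Submitted from host ${PBS_O_HOST:-unknown} to queue ${PBS_O_QUEUE:-unknown}"
echo "Working in $(pwd)"
```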

Important variables (cont1)
- PBS_DEFAULT: Name of the default PBS server
- PBS_ENVIRONMENT: Indicates the job type: PBS_BATCH or PBS_INTERACTIVE
- PBS_JOBID: Job identifier assigned to the job or job array
- PBS_JOBNAME: Job name supplied by the user
- PBS_MOMPORT: Port number on which this job’s MOMs will communicate
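
A script can use PBS_ENVIRONMENT to behave differently in batch and interactive jobs. This is a minimal sketch, assuming a POSIX shell; `job_mode` is a hypothetical helper, and the `not-under-pbs` result is an addition for the case where the variable is unset.

```shell
#!/bin/sh
# Report how the current shell was started, based on PBS_ENVIRONMENT.
# job_mode is a hypothetical helper, not a PBS command.
job_mode() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_BATCH)       echo "batch" ;;
        PBS_INTERACTIVE) echo "interactive" ;;
        *)               echo "not-under-pbs" ;;  # variable unset or unexpected value
    esac
}

echo "Job ${PBS_JOBID:-none} (${PBS_JOBNAME:-unnamed}) mode: $(job_mode)"
```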

Important variables (cont2)
- PBS_NODEFILE: Name of the file containing the list of nodes assigned to the job
- PBS_NODENUM: Logical node number of this node allocated to the job
- PBS_QUEUE: Name of the queue from which the job is executed
- PBS_TASKNUM: Task (process) number for the job on this node
- TMPDIR: Job-specific temporary directory for this job
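
As an illustration of PBS_NODEFILE and TMPDIR, the sketch below counts the distinct hosts assigned to a job and picks a scratch directory. It assumes a POSIX shell and a node file with one host name per line; `count_nodes` is a hypothetical helper, and the `/tmp` fallback is an assumption for running outside PBS.

```shell
#!/bin/sh
# Print the number of distinct host names listed in a node file.
# count_nodes is a hypothetical helper, not a PBS command.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Only inspect the node file when PBS has provided one.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Job runs on $(count_nodes "$PBS_NODEFILE") distinct node(s)"
fi

# Use the job-specific temporary directory when it is set.
scratch="${TMPDIR:-/tmp}"
echo "Scratch directory: $scratch"
```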

Checkpointing
- Two methods of checkpoint / restart:
  - OS-specific method
    - SGI IRIX and Cray UNICOS
  - Generic site-specific method
- Specify the checkpointing directory
  - “-C path” command line option to pbs_mom
  - PBS_CHECKPOINT_PATH environment variable
  - “$checkpoint_path path” option in MOM’s config file
  - default value
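
The four sources for the checkpointing directory can be read as a fallback chain. The sketch below assumes that the list is in order of precedence, which the slide does not state explicitly; `resolve_checkpoint_path` and the default path shown are hypothetical and only illustrate the lookup, they are not how pbs_mom is implemented.

```shell
#!/bin/sh
# Pick the first non-empty value among the four sources, in the order the
# slide lists them (assumed here to be the order of precedence):
#   $1 = value given with "-C path"
#   $2 = PBS_CHECKPOINT_PATH
#   $3 = "$checkpoint_path" from MOM's config file
#   $4 = default value
# resolve_checkpoint_path is a hypothetical helper, not a PBS command.
resolve_checkpoint_path() {
    for candidate in "$1" "$2" "$3" "$4"; do
        if [ -n "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: no -C option and no config entry; the default path is a placeholder.
resolve_checkpoint_path "" "${PBS_CHECKPOINT_PATH:-}" "" "/hypothetical/default/checkpoint"
```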

Checkpointing (cont)
- Manually checkpointing a job
  - Use the qhold command
- Checkpointing jobs during PBS shutdown
  - Append the -t immediate option to the qterm statement in the PBS start/stop script
- Suspending/checkpointing multi-node jobs
  - Save the complete session state in a file
  - An open socket will cause the operation to fail
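
The commands named above can be sketched as follows. To keep the example safe, it prints each command instead of running it unless RUN=1 is set. The `pbs_cmd` wrapper and the job ID are hypothetical, and `qrls` (releasing a held job) does not appear in the slides; it is included only to show the usual counterpart of `qhold`.

```shell
#!/bin/sh
# Dry run by default: print the command. Set RUN=1 to execute it.
# pbs_cmd is a hypothetical wrapper, not a PBS command.
pbs_cmd() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

job_id="${1:-123.hypothetical-server}"   # hypothetical job ID

pbs_cmd qhold "$job_id"      # hold the job; it is checkpointed where supported
pbs_cmd qrls "$job_id"       # later, release the hold so the job can resume
pbs_cmd qterm -t immediate   # administrator only: shut down, checkpointing jobs
```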

Site-specific method
- Modify file mom_priv/config
- “Periodic” job checkpoint action (during job execution)
  - $action checkpoint TIME_OUT SCRIPT_PATH ARGS [...]
- Checkpoint just before the job is to be terminated
  - $action checkpoint_abort TIME_OUT SCRIPT_PATH ARGS [...]
- Job restart action
  - $action restart TIME_OUT SCRIPT_PATH ARGS [...]
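
To make the template concrete, the sketch below prints the three lines with placeholder values filled in. The timeouts, script paths, and arguments are hypothetical, and `action_line` is a helper written for this example. The exact syntax that MOM expects for the script path and its arguments should be taken from the Administration Guide; this only follows the shape shown on the slide.

```shell
#!/bin/sh
# Print one line following the slide's template:
#   $action NAME TIME_OUT SCRIPT_PATH ARGS [...]
# action_line is a hypothetical helper, not a PBS command.
action_line() {
    printf '%s\n' "\$action $*"
}

# Hypothetical timeouts (60, 120), script paths, and arguments.
action_line checkpoint       60  /hypothetical/bin/ckpt.sh    periodic
action_line checkpoint_abort 60  /hypothetical/bin/ckpt.sh    abort
action_line restart          120 /hypothetical/bin/restart.sh
```

The output could be reviewed and then pasted into mom_priv/config by an administrator.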

Site-specific method (cont)
- $restart_background (true|false)
  - A boolean flag that modifies how MOM performs a restart
  - “false” (the default): MOM runs the restart operation and waits for the result
  - “true”: restart operations are done by a child of MOM, which returns only when all the restarts for all the local tasks of a job are done, while the parent (main) MOM continues processing without being blocked
- $restart_transmogrify (true|false)
  - A boolean flag that controls how MOM launches the restart script/program
  - “false” (the default): MOM runs the restart script and blocks until the restart operation is complete
  - “true”: MOM runs the restart script/program in such a way that the script “becomes” the task it is restarting

Specify checkpoint in job
- “-c interval” option defines the checkpoint interval (in minutes)
- The interval argument is specified as:
  - n: No checkpointing is to be performed.
  - s: Checkpointing is to be performed only when the server executing the job is shut down.
  - c: Checkpointing is to be performed at the default minimum time for the server executing the job.
  - c=minutes: Checkpointing is to be performed at an interval of the given number of minutes.
  - u: Checkpointing is unspecified, resulting in the same behavior as “s”.
- If “-c” is not specified, the checkpoint attribute is set to the value “u”.
- Example: qsub -c c=10 myjob
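
The meaning of each interval value can be summarized in a small lookup. This is a minimal sketch, assuming a POSIX shell; `describe_checkpoint` is a hypothetical helper that only mirrors the list above and does not call PBS.

```shell
#!/bin/sh
# Describe what a "-c interval" argument requests, following the list above.
# describe_checkpoint is a hypothetical helper, not a PBS command.
describe_checkpoint() {
    case "$1" in
        n)   echo "no checkpointing" ;;
        s|u) echo "checkpoint only at server shutdown" ;;   # "u" behaves like "s"
        c)   echo "checkpoint at the server's default minimum interval" ;;
        c=*)
            minutes=${1#c=}                 # strip the leading "c="
            case "$minutes" in
                ''|*[!0-9]*) echo "invalid interval"; return 1 ;;
                *) echo "checkpoint every $minutes minute(s)" ;;
            esac
            ;;
        *)   echo "invalid interval"; return 1 ;;
    esac
}

# The value used in the qsub example on this slide.
describe_checkpoint "c=10"
```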