More SLURM: Advanced UPPMAX usage
Using Bash to manage jobs
Job efficiency
More SLURM and other advanced UPPMAX techniques
● A closer look at SLURM
● GPUs on Snowy
● Jobstats: our mutual friend in the fight for efficiency
● Advanced job submission
SLURM
● Free, popular, lightweight
● Open source: https://github.com/SchedMD/slurm
● UPPMAX Slurm user guide: https://www.uppmax.uu.se/support/user-guides/slurm-user-guide/
More on sbatch
● A recap:
● sbatch -A snic2021-1-123 -t 10:00 -p core -n 10 myjob.sh
− sbatch: Slurm batch submission
− -A snic2021-1-123: project name
− -t 10:00: maximum runtime
− -p core: “partition” (“job type”)
− -n 10: number of cores
− myjob.sh: job script
More on time limits
● -t dd-hh:mm:ss
● 0-00:10:00 = 00:10:00 = 10:00 = 10
● 0-12:00:00 = 12:00:00
● 3-00:00:00 = 3-0
● 3-12:10:15
Recall from “Intro to UPPMAX”
● Q: When you have no idea how long a program will run, what should you book?
− A: a very long time, e.g. 10-00:00:00
● Q: When you do have an idea of how long a program will run, what should you book?
− A: overbook by 150%
More on partitions
● -p core
● The default
● < 20 cores on Rackham
● < 16 cores on Snowy or Bianca
● A script or program written without any thought to parallelism will use 1 core
Quick testing
● The “devel” partition
− 2 nodes
− Up to 1 hour in length
− Only 1 at a time
− -p devcore, -p devel
● High-priority short jobs
− 4 nodes
− Up to 15 minutes
− --qos=short
● Interactive jobs
− Up to 12 hours
− Handy for debugging a script by executing it manually line by line
When a job goes wrong
● scancel
− <jobid>
− -u <username>: cancel all your jobs
− -t <state>: cancel pending or running jobs
− -n <name>: cancel jobs with that name
− -i: ask for confirmation
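For example (the job ID and job name here are made up):

```shell
scancel 1234567                 # cancel one job by ID
scancel -u "$USER"              # cancel all of your jobs
scancel -u "$USER" -t pending   # only the ones still waiting in the queue
scancel -i -n myjob             # cancel jobs named "myjob", asking first
```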
Parameters in job script or on command line?
● Command-line parameters override script parameters
● A typical script might be:

#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 24:00:00

● Just a quick test:
● $ sbatch -p devcore -t 00:15:00 job.sh
Memory in core or devcore jobs
● -n X
● On Rackham: get 6.4 GB per core
● On Snowy/Bianca: get 8 GB per core
● Slurm reports the available memory when starting an interactive job
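Since memory scales with the core count, you can work backwards from the memory you need to the -n value to request. A tiny sketch using the Rackham figure above (the 20 GB is an example value; the integer arithmetic computes ceil(mem_gb / 6.4)):

```shell
#!/bin/bash
mem_gb=20                               # memory your program needs (example value)
cores=$(( (mem_gb * 10 + 63) / 64 ))    # ceil(mem_gb / 6.4) with integer math
echo "Request: -p core -n $cores"       # → Request: -p core -n 4
```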
More flags
● -J <jobname>
● Email:
− --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80
− --mail-user: don’t use. Set your email correctly in SUPR instead.
● Output redirection:
− --output=my.output.file
− --error=my.error.file
More flags
● Memory:
− -C thin / -C 128GB
− -C fat / -C 256GB / -C 1TB
● Dependencies: --dependency
● Job arrays: --array
● More at https://slurm.schedmd.com/sbatch.html
− Or just man sbatch
− (though not all flags work on all systems!)
GPU nodes on Snowy
● Nodes with 1 NVIDIA T4
● Available to everyone, with priority for the groups that paid for them
● Request with:
#SBATCH -M snowy
#SBATCH --gres=gpu:1 or --gpus=1 or --gpus-per-node=1
● There is a system installation of CUDA 11; other versions are available via modules
● https://www.uppmax.uu.se/support/user-guides/using-the-gpu-nodes-on-snowy/
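Put together, a minimal GPU job script for Snowy might look like this (the project ID is the course example; my_gpu_program is a hypothetical placeholder):

```shell
#!/bin/bash -l
#SBATCH -A snic2021-22-606     # your project
#SBATCH -M snowy               # send the job to the Snowy cluster
#SBATCH -p core -n 2
#SBATCH -t 01:00:00
#SBATCH --gpus=1               # one T4; --gres=gpu:1 also works

nvidia-smi                     # confirm the GPU is visible
./my_gpu_program               # hypothetical GPU program
```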
Time for a break?
● That’s enough about sbatch
● Next up: monitoring jobs and job efficiency
Monitoring
● jobinfo: a wrapper around squeue
− jobinfo -u <username>
− jobinfo -A snic2021-22-606
● You can also use squeue directly
Priority
● Roughly:
− The first job of the day gets elevated priority
− Other normal jobs run in order of submission (subject to scheduling)
− Projects exceeding their allocation get a successively lower priority category
− Bonus jobs run after higher priority categories
Priority
● In practice:
− Submit early, run early
− Bonus jobs always run eventually, but sometimes wait until night or the weekend
● In detail:
− https://www.uppmax.uu.se/support/faq/running-jobs-faq/your-priority-in-the-waiting-job-queue/
Job efficiency
● jobstats: our mutual friend in the fight for productivity
− Only works for jobs longer than 5-15 minutes
− -r: check running jobs
− -A <project>: check all recent jobs for a project
− -p: produce CPU & memory usage plots
− -M <cluster>: check jobs on another cluster
Jobstats exercise
● Generate jobstats plots for your jobs
− First, find some job IDs from this month:
− $ finishedjobinfo -m <yourusername>
− Note the job IDs of some interesting jobs, then generate the images:
− $ jobstats -p id1 id2 id3
● Look at the images. I have put some interesting ones in /proj/introtouppmax/labs/moreslurm/jobstatsplots
− $ eog *.png &
Jobstats plots
● Which of the plots in labs/moreslurm/jobstatsplots:
− Show good CPU/memory usage?
− Show a job that needs a fat node?
Time for a break?
● Next up: advanced job submission
Multicore jobs
● (Efficient) multicore jobs need either high memory utilisation or multiple execution threads. Either:
− The job script launches multiple programs
− One program runs multithreaded
Multi-program script

#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 00:15:00
#SBATCH -J 4commands

cd /proj/introtouppmax/labs/moreslurm
work.sh 1 10000000 &
work.sh 2 15000000 &
work.sh 3 20000000 &
work.sh 4 10000000 &
wait
Multithreaded program
● The program may mention “OpenMP”, “MPI”, “pthreads”, or other parallel programming technologies

#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 00:15:00
#SBATCH -J multithreaded

cd /proj/introtouppmax/labs/moreslurm
./work_threaded.sh
Dependencies
● --dependency=afterok:<jobid>: the job is added to the queue, but starts only after job <jobid> ends successfully
● Very handy for “fire and forget” workloads
● Potentially lots of time spent in the queue
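A minimal sketch of such a two-step chain (step1.sh and step2.sh are hypothetical scripts; --parsable makes sbatch print only the job ID so it can be captured):

```shell
# Submit the first step and capture its job ID
jid=$(sbatch --parsable -A snic2021-22-606 -p core -n 1 -t 1:00:00 step1.sh)

# Queued now, but starts only if step1 finishes successfully
sbatch --dependency=afterok:"$jid" -A snic2021-22-606 -p core -n 1 -t 1:00:00 step2.sh
```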
Exercise
● Look at /proj/introtouppmax/labs/moreslurm/dependency/
● Run dependency_submit.sh and see how it works
● Read man sbatch for more information
− When might you use afterany instead of afterok? − When might singleton be a good idea?
● Discuss in HackMD
Dividing up a big chunk of work
● A common question is how to divide up a really big job and manage the chunks
● The best approach depends on specifics, but is usually one of:
− Job arrays
− Some Bash scripting that submits lots of reasonably sized jobs
− A workflow manager such as Snakemake or Nextflow
Snakemake and Nextflow
● Conceptually similar, but with different flavours
● First define steps, each with an input, an output, and a command that transforms the input into the output
● Then just ask for the desired output, and the system handles the rest
Job arrays
● Submit many jobs at once with the same parameters
● Use $SLURM_ARRAY_TASK_ID in the script to find the correct part of the workload
● You can find a simple example in moreslurm/jobarrays
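As an illustration (not the moreslurm/jobarrays example itself; work.sh and the input file names are hypothetical), one input file per array task might look like:

```shell
#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core -n 1
#SBATCH -t 00:15:00
#SBATCH --array=1-10      # spawns 10 tasks with SLURM_ARRAY_TASK_ID = 1..10

# Each task picks its own piece of the workload by index:
# task 1 processes input_1.txt, task 2 processes input_2.txt, and so on.
./work.sh "input_${SLURM_ARRAY_TASK_ID}.txt"
```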
Exercise
● Suppose you have to do 1000 runs of a program and want to do 50 runs per job.
● Modify the jobarrays example to submit 20 1-core jobs in an array, each of which runs “echo” 50 times.
DIY workflows
● For middle-of-the-road situations, some simple Bash (or Python) will suffice.
● labs/moreslurm/manyjobs/ contains an example:
− job.sh does a “chunk” of work
− jobsubmit.sh submits the jobs to Slurm
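A minimal sketch of what a jobsubmit.sh-style loop could look like (the chunk size and job.sh arguments are made up; the sbatch command is echoed so the loop can be dry-run anywhere — on the cluster you would drop the echo):

```shell
#!/bin/bash
# Split items 1..1000 into chunks of 100 and submit one job per chunk.
total=1000
chunk=100
for start in $(seq 1 "$chunk" "$total"); do
    end=$(( start + chunk - 1 ))
    echo sbatch job.sh "$start" "$end"   # drop the echo to actually submit
done
```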
THE END
Now you know everything there is to know about using Slurm at UPPMAX