Practical introduction to Anselm: environment, jobs, ...
TRANSCRIPT
Branislav Jansík
Practical introduction to Anselm: environment, jobs, software and libs
Accessing HPC resources
Grant competitions
● Open Access: 2x per year
● Internal Access (via IT4I): 4x per year
● Director's discretion
Obtaining login credentials
[Diagram: Authorization chain. The Allocation Committee allocates resources to the PI; the PI authorizes Collaborator 1 through Collaborator n to utilize the resources.]
Obtaining login credentials
Contact support to get the login credentials:
To: [email protected]
Subject: Access to Anselm

Dear support,

Please open a user account for me and attach the account to OPEN-0-0.

Name and affiliation: John Smith, [email protected], Department of Chemistry, MIT, US
I have read and accept the Acceptable use policy document (attached).

Preferred username: johnsm

Thank you,
John Smith
(Digitally signed)
Obtaining login credentials
Authorization by the PI:
To: [email protected]
Subject: Authorization to Anselm

Dear support,

Please include my collaborators to project OPEN-0-0.

John Smith, [email protected], Department of Chemistry, MIT, US
Jonas Johansson, [email protected], Department of Physics, Royal Institute of Technology, Sweden
Luisa Rossi, [email protected], Department of Mathematics, National Research Council, Italy

Thank you,
PI
(Digitally signed)
Anselm cluster HPC infrastructure

[Diagram: storage, compute nodes and login nodes connected through a switch.]

Storage
● HOME: 300 TB (shared)
● SCRATCH: 130 TB (shared)

Interconnect
● InfiniBand, non-blocking
● 40 Gb/s

Compute
● 209 nodes
● SandyBridge 2.4 GHz x86-64
● 16 cores, 256-bit AVX instructions
● 64 GB RAM
● 300 GB local disk
● 27x accelerator
Utilization
You need to carefully consider how to utilize all 16 cores available on the node, and how to use multiple nodes at the same time.
● Run the right way
● Parallelize your code
Explore the login node
● Shell configuration
● Software modules
Logging in

[Diagram: user0 logs in to the login node through the switch.]
Environment and modules

Environment
● Linux operating system
● Bash shell
● Gnome GUI
● The .bashrc file: store your aliases and other settings here

Modules
● Set up the application paths, library paths and environment variables for a particular application
Environment customizations
● Define aliases
● Define useful functions
● Run commands
● Load modules
● Save your settings to the .bashrc file
$ alias qs='qstat -a'
$ swd ()
> {
>   WDIR=$(pwd)
> }

$ wd ()
> {
>   cd $WDIR
> }
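For instance, the two helper functions above can be used to bookmark a working directory and jump back to it later (the directory path is illustrative):

$ cd /scratch/$USER/project
$ swd          # remember the current directory in WDIR
$ cd ~
$ wd           # return to the remembered directory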
$ date
$ hostname
$ alias ch='rspbs --get-node-ncpu-chart'
Modules
● Set up the application paths, library paths and environment variables for a particular application
● A convenient way to set up the whole environment in one command
$ module avail
$ module load matlab
$ module unload matlab

$ module list
$ module load impi
$ module swap impi openmpi
$ module whatis matlab
Module versions

$ module avail
------------ /opt/modules/modulefiles/mpi ------------
bullxmpi/bullxmpi-1.2.4.1    mvapich2/1.9-gcc46
impi/4.1.0.024               mvapich2/1.9-icc
impi/4.1.0.030               openmpi/1.6.5-gcc(default)
impi/4.1.1.036(default)      openmpi/1.6.5-gcc46
mvapich2/1.9-gcc(default)    openmpi/1.6.5-icc
● Modules come in many variants: version variant, compiler variant, library variant, etc.
● Pick a variant
$ module load openmpi/1.6.5-icc
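After loading a variant, you can verify what the module changed (a minimal sketch; the exact paths depend on the module):

$ module load openmpi/1.6.5-icc
$ which mpicc               # should point into the OpenMPI 1.6.5 icc installation
$ echo $LD_LIBRARY_PATH     # library paths prepended by the module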
Inside of a module
$ less /opt/modules/modulefiles/mpi/openmpi/.common
$ less /opt/modules/modulefiles/mpi/openmpi/1.6.5-icc
Save settings in .bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
  . /etc/bashrc
fi

# User specific aliases and functions
alias qs='qstat -a'
module load PrgEnv-gnu

# Display information to standard output - only in interactive ssh sessions
if [ -n "$SSH_TTY" ]
then
  module list # Display loaded modules
fi
Allocation and execution

Resource allocation and execution via the PBS queue system
The queue system

Queue                                     Active project  Resources          Priority  Permit  Walltime
Express queue (qexp)                      no              8 nodes            1         no      1h
Production queue (qprod)                  yes             209 nodes          3         no      48h
Long queue (qlong)                        yes             60 nodes           3         no      3*48h
Dedicated queues (qnvidia, qmic, qfat)    yes             Nvidia, MIC, Fat   2         yes     48h
Free resource queue (qfree)               yes             180 nodes          4         no      12h
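For example, a job that fits the free resource queue limits could be submitted like this (project ID and jobscript name are illustrative; qsub is covered in detail below):

$ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=16,walltime=12:00:00 ./myjob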
Queue status
● Check the queue status on the command line

$ qstat -q
$ qstat -a
$ rspbs --summary
$ rspbs --get-node-ncpu-chart

● Check the queue status on the web (coming soon!)
Resource accounting policy
● Core-hours accounting
- accounted on a wall clock basis
- runs whenever the cores are allocated or blocked

Example 1: Running for 10 hours on 160 cores (10 nodes) costs 10*160 = 1600 core-hours
Example 2: Running for 10 hours on 16 cores using scatter:excl (10 nodes) also costs 10*160 = 1600 core-hours, since the exclusive allocation blocks all 160 cores

● Check the consumed core-hours
$ it4ifree
Job submission

Allocation and execution via qsub
Job submission
● Use the qsub command to submit your job to a queue
- it will allocate the nodes
- it will execute the jobscript

$ qsub -A ProjectID -q queue -l select=x:ncpus=y jobscript
What are chunks?
● The select statement requests resources in chunks: each chunk is a set of resources (here, ncpus cores) allocated together on a single node
● select=2:ncpus=4 requests two chunks of 4 cores each; select=2:ncpus=16 requests two full 16-core nodes (see the sketch below)

$ qsub -A ProjectID -q queue -l select=2:ncpus=4 jobscript
$ qsub -A ProjectID -q queue -l select=2:ncpus=16 jobscript
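Inside a running job you can inspect which chunks you were given: $PBS_NODEFILE typically lists one entry per chunk (with the default mpiprocs=1). The node names cn10 and cn11 are illustrative:

$ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=4 -I
$ cat $PBS_NODEFILE
cn10
cn11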
Job submission examples
$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16,walltime=03:00:00 ./myjob
$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16:cpu_freq=24 -I
$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16:host=cn204+1:ncpus=16:host=cn205 -I
$ qsub ./myjob
Job management
● Use the qstat and check-pbs-jobs commands to check job status

$ qstat -a
$ qstat -an
$ qstat -an -u username
$ qstat -f jobid

$ check-pbs-jobs --check-all
$ check-pbs-jobs --print-job-out
$ check-pbs-jobs --ls-lscratch
Job management
● Use the qhold, qrls, qdel, qsig or qalter commands to manage jobs

$ qsub -A OPEN-0-0 -l select=100 ./jobscript
$ qhold jobid
$ qrls jobid
$ qalter -l select=101,walltime=00:15:00 jobid
$ qsig jobid
$ qdel jobid
Job execution

$ qsub -A ProjectID -q queue -l select=x:ncpus=y jobscript
$ qsub -A ProjectID -q queue -l select=x:ncpus=y -I

● The jobscript is executed on the first node of the allocation
● The jobscript is executed in the HOME directory
● The file $PBS_NODEFILE contains the list of allocated nodes
● The allocated nodes are accessible to the user (see the sketch below)
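Since all allocated nodes accept logins from the job owner, you can run a command across the whole allocation, for example (a minimal sketch using $PBS_NODEFILE):

# run hostname on every allocated node
$ for host in $(cat $PBS_NODEFILE); do ssh $host hostname; done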
Job execution

$ qsub -A ProjectID -q queue -l select=x:ncpus=y jobscript

#!/bin/bash

# change to the local scratch directory
cd /lscratch/$PBS_JOBID || exit

# copy input files to scratch
cp $PBS_O_WORKDIR/input .
cp $PBS_O_WORKDIR/myprog.x .

# execute the calculation
./myprog.x

# copy the output file to home
cp output $PBS_O_WORKDIR/.

# exit
exit
Job execution

$ qsub jobscript

#!/bin/bash
#PBS -q qprod
#PBS -N MYJOB
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16
#PBS -A OPEN-0-0

# change to the scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the MPI module
module load openmpi

# execute the calculation
mpiexec ./mympiprog.x

# exit
exit
Job execution
● Anselm nodes are NUMA nodes

$ numactl --membind=0 --cpunodebind=0 command

[Diagram: a node's memory is split across two NUMA domains; runs that ignore NUMA placement achieve about 60% efficiency.]
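For example, to pin two independent instances of a program to the two NUMA domains of a node, one might do the following (a minimal sketch; myprog.x is illustrative):

# bind each instance's CPUs and memory to one NUMA domain
$ numactl --cpunodebind=0 --membind=0 ./myprog.x &
$ numactl --cpunodebind=1 --membind=1 ./myprog.x &
$ wait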
Software environment
• Programming environment
‒ GNU compilers: gfortran, gcc, g++, gdb
‒ Intel compilers: ifort, icc, idb
‒ PGAS compilers: upc
‒ Interpreters: Perl, Python, Java, Ruby, Bash
• HPC libraries: Intel MKL suite, ATLAS, GOTO, PETSc, ScaLAPACK, PLASMA and MAGMA
• Communication libraries: bullx MPI, OpenMPI, OpenSHMEM
• Performance analysis
‒ gprof
‒ PAPI, Scalasca
‒ HPCToolkit, Open|SpeedShop
System environment
• Commercial products
‒ Comsol
‒ Matlab
‒ Ansys
‒ ...
• Check the available modules for the list of software

$ module avail
$ module load bullxde
$ module load bullxde papi
Octave, R and Matlab
• Octave and R are linked to the HPC libraries FFTW3 and MKL (run in parallel on 16 cores)
• GUI
$ module load octave/hg-20130730
$ module load Rstudio
• Matlab licences
$ module load matlab
$ module load matlab/R2013a-COM
• Batch execution
$ matlab -nosplash -nodisplay -r mscript > moutput.out
$ octave -q --eval oscript > ooutput.out
$ R CMD BATCH rscript.R routput.out
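These batch commands are meant to run inside a jobscript; a minimal sketch (the project ID, queue and script name mscript.m are illustrative):

#!/bin/bash
#PBS -q qprod
#PBS -l select=1:ncpus=16
#PBS -A OPEN-0-0

# run Matlab in batch mode from the submit directory
cd $PBS_O_WORKDIR || exit
module load matlab
matlab -nosplash -nodisplay -r mscript > moutput.out
exit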
ISV Licenses
● Check available licenses in the license state file

Ansys       /apps/user/licenses/ansys_features_state.txt
Comsol      /apps/user/licenses/comsol_features_state.txt
comsol-edu  /apps/user/licenses/comsol-edu_features_state.txt
Matlab      /apps/user/licenses/matlab_features_state.txt
matlab-edu  /apps/user/licenses/matlab-edu_features_state.txt

● Tell PBS about the license you need
● Licenses are not monolithic, they are split into features

$ qsub … -l feature__matlab__Image_Toolbox

● Grab the license ASAP in your job
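A full submission requesting a license feature might look like this (a sketch; the project, queue and jobscript name are illustrative):

$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16 -l feature__matlab__Image_Toolbox ./myjob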
Programming
● GNU C, C++, Fortran 77/90/95
● Intel C, C++, Fortran 77/90/95
● GNU UPC
● Berkeley UPC
● Nvidia nvcc

Programming environments:
● module load PrgEnv-gnu
● module load PrgEnv-intel
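A typical compile cycle then looks like this (a minimal sketch, assuming PrgEnv-intel provides the Intel compilers; hello.c is an illustrative source file):

$ module load PrgEnv-intel
$ icc -O2 -o hello hello.c
$ ./hello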
Intel Parallel Studio
● Intel Compilers: C, C++, Fortran 77/90/95
● Intel Debugger
$ idb
● Intel MKL
$ icc myprog.c -mkl
● Intel IPP (whatever function you can think of)
● Intel TBB (task-based threaded parallelism programming API)
MPI
Available implementations:
● OpenMPI 1.6.5
● BullxMPI 1.2.4
● Intel MPI 4.1
● MPICH2 1.9

● The implementations differ by thread support level
● Freely combine MPI library and compiler
● Compile MPI programs using the MPI wrappers (mpicc, mpif90, etc.)
● Do not mix MPI implementations
● Choose the right way to run an MPI program (see below)
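Putting the wrapper rule into practice (a minimal sketch; hello.c is an illustrative MPI source file):

$ module load openmpi
$ mpicc -o hello hello.c      # compile with the wrapper of the loaded MPI
$ mpiexec ./hello             # launch with the matching runtime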
Ways to run MPI programs
1 process per node, 16 threads per process
Best for memory demanding apps with good cache data use
$ qsub -l select=xx:ncpus=16:mpiprocs=1:ompthreads=16
Ways to run MPI programs
2 processes per node, 8 threads per process
Best for memory-bound apps with scalable memory demand
$ qsub -l select=xx:ncpus=16:mpiprocs=2:ompthreads=8
Ways to run MPI programs
16 processes per node, 1 thread per process
Best for highly scalable applications with low communication demand
$ qsub -l select=xx:ncpus=16:mpiprocs=16:ompthreads=1
MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16

# change to the scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the MPI module
module load openmpi

# execute the calculation
mpiexec ./mympiprog.x

# exit
exit

1 process per node, 16 threads per process
MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=2:ompthreads=8

# change to the scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the MPI module
module load openmpi

# execute the calculation
mpiexec -bysocket -bind-to-socket ./mympiprog.x

# exit
exit

2 processes per node, 8 threads per process
MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=16:ompthreads=1

# change to the scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the MPI module
module load openmpi

# execute the calculation
mpiexec -bycore -bind-to-core ./mympiprog.x

# exit
exit

16 processes per node, 1 thread per process
Tips and tricks

Data transfers in and out of Anselm
● Use ssh with the fast block cipher aes128-ctr: 160 MB/s
● Use multiple ssh connections to bypass the 160 MB/s boundary
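For example, a single transfer with the fast cipher (the hostname anselm.it4i.cz and file names are illustrative):

$ scp -c aes128-ctr bigdata.tar user@anselm.it4i.cz:/scratch/$USER/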
File system access
● Set up the stripe size and stripe count
● Use local scratch or ramdisk for small files
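Striping is configured per directory with the Lustre lfs tool (a sketch; the 1 MB stripe size and count of 10 are illustrative values):

$ lfs setstripe -s 1m -c 10 /scratch/$USER/mydir    # set stripe size and count
$ lfs getstripe /scratch/$USER/mydir                # verify the settings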
Respect the NUMA
● Consider using numactl
● Consider using MPI binding
Conclusions

Computational resources available to the general academic community are allocated in a competition based on scientific and technical quality.

Read the documentation, contact support, contact me!

The IT4Innovations Supercomputer Center is here to run the computer and assist you in using it.