Branislav Jansík Practical introduction to Anselm: environment, jobs, software and libs


Post on 05-Feb-2018


Page 1: Practical introduction to Anselm: environment, jobs, …prace.it4i.cz/sites/prace.it4i.cz/files/files/prezentace_b.jansik.pdf · Subject: Access to Anselm Dear support, Please open

Branislav Jansík

Practical introduction to Anselm: environment, jobs, software and libs


Accessing HPC resources

Grant competitions
● Open Access: 2x per year
● Internal Access (via IT4I): 4x per year
● Director's discretion


Obtaining login credentials

Authorization chain (diagram): the Allocation Committee grants a resource allocation to the PI; the PI authorizes Collaborator 1, Collaborator 2, …, Collaborator n to utilize the resources.


Obtaining login credentials

Contact support to get the login credentials:

To: [email protected]
Subject: Access to Anselm

Dear support,

Please open a user account for me and attach the account to OPEN-0-0.
Name and affiliation: John Smith, [email protected], Department of Chemistry, MIT, US
I have read and accept the Acceptable use policy document (attached).

Preferred username: johnsm

Thank you,
John Smith
(Digitally signed)


Obtaining login credentials

Authorization by PI:

To: [email protected]
Subject: Authorization to Anselm

Dear support,

Please include my collaborators to project OPEN-0-0.

John Smith, [email protected], Department of Chemistry, MIT, US
Jonas Johansson, [email protected], Department of Physics, Royal Institute of Technology, Sweden
Luisa Rossi, [email protected], Department of Mathematics, National Research Council, Italy

Thank you,
PI
(Digitally signed)


Anselm cluster HPC infrastructure

Storage
● HOME: 300 TB (shared)
● SCRATCH: 130 TB (shared)

Interconnect
● InfiniBand, non-blocking
● 40 Gb/s

Compute
● 209 nodes
● Sandy Bridge 2.4 GHz x86-64
● 16 cores, 256-bit AVX instructions
● 64 GB RAM
● 300 GB local disk
● 27 nodes with accelerators



Utilization

You need to carefully consider how to utilize all 16 cores available on a node and how to use multiple nodes at the same time.
● Run the right way.
● Parallelize your code.



Logging in

Explore the login node:
● Shell configuration
● Software modules


Environment and modules

Environment
● Linux operating system
● Bash shell
● GNOME GUI
● The .bashrc file: store your aliases and other settings here

Modules
● Set up the application paths, library paths and environment variables for a particular application


Environment customizations

● Define aliases
● Define useful functions
● Run commands
● Load modules
● Save your settings to the .bashrc file

$ alias qs='qstat -a'

$ swd () { WDIR=$(pwd); }    # save the current working directory
$ wd () { cd "$WDIR"; }      # return to the saved directory

$ date
$ hostname

$ alias ch='rspbs --get-node-ncpu-chart'


Modules

● Sets up the application paths, library paths and environment variables for a particular application

● A convenient way to set up the whole environment in one command

$ module avail

$ module load matlab

$ module unload matlab

$ module list
$ module load impi
$ module swap impi openmpi

$ module whatis matlab


Module versions
$ module avail
----- /opt/modules/modulefiles/mpi -------
bullxmpi/bullxmpi-1.2.4.1    mvapich2/1.9-gcc46
impi/4.1.0.024               mvapich2/1.9-icc
impi/4.1.0.030               openmpi/1.6.5-gcc(default)
impi/4.1.1.036(default)      openmpi/1.6.5-gcc46
mvapich2/1.9-gcc(default)    openmpi/1.6.5-icc

● Modules come in many variants: version variant, compiler variant, library variant, etc.

● Pick a variant:
$ module load openmpi/1.6.5-icc


Inside of a module
● Module files are plain text; read them to see what a module sets:

$ less /opt/modules/modulefiles/mpi/openmpi/.common

$ less /opt/modules/modulefiles/mpi/openmpi/1.6.5-icc


Save settings in .bashrc

# ~/.bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
alias qs='qstat -a'
module load PrgEnv-gnu

# Display information on standard output - only in interactive ssh sessions
if [ -n "$SSH_TTY" ]; then
    module list # Display loaded modules
fi


Allocation and execution

Resource allocation and execution via the PBS queue system


The queue system

Queue                                    Active project  Resources (nodes)  Priority  Permit  Walltime
Express queue (qexp)                     no              8                  1         no      1h
Production queue (qprod)                 yes             209                3         no      48h
Long queue (qlong)                       yes             60                 3         no      3*48h = 144h
Dedicated queues (qnvidia, qmic, qfat)   yes             Nvidia, MIC, Fat   2         yes     48h
Free resource queue (qfree)              yes             180                4         no      12h


Queue status
● Check the queue status on the command line:
$ qstat -q
$ qstat -a
$ rspbs --summary
$ rspbs --get-node-ncpu-chart

● Check the queue status on the web (coming soon!)


Resource accounting policy
● Core-hour accounting
- based on wall-clock time
- the clock runs whenever the cores are allocated or blocked

Example 1: Running for 10 hours on 160 cores (10 nodes) costs 10*160 = 1600 core hours.

Example 2: Running for 10 hours on 16 cores spread over 10 nodes with scatter:excl placement blocks all 160 cores of those nodes, so it also costs 10*160 = 1600 core hours.

● Check the consumed core hours:
$ it4ifree
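The accounting rule can be sketched as a small Bash function. This is an illustration under an assumption taken from the two examples: every allocated node is charged in full (16 cores) for the whole wall-clock time. `core_hours` is a hypothetical helper, not an IT4I tool; `it4ifree` remains the authoritative source for consumed core hours.

```shell
#!/bin/bash
# Hypothetical helper: estimate the core hours charged for a job.
# Assumption: each allocated node is charged for all of its 16 cores
# for the full wall-clock time, busy or not. Hours are whole numbers.
CORES_PER_NODE=16

core_hours() {
    local nodes="$1" hours="$2"
    echo $(( nodes * CORES_PER_NODE * hours ))
}

core_hours 10 10   # Example 1, 160 cores on 10 nodes: prints 1600
core_hours 1 10    # 16 cores packed onto one node: prints 160
core_hours 10 10   # Example 2, 16 cores scattered over 10 nodes: prints 1600
```

The middle call shows why placement matters: the same 16 cores cost ten times less when they share one node.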


Job submission

Allocation and execution via qsub


Job submission
● Use the qsub command to submit your job to a queue:
- it will allocate the nodes
- it will execute the jobscript

$ qsub -A Project_ID -q queue -l select=x:ncpus=y jobscript


What are chunks?
● In the select statement, select=x:ncpus=y requests x chunks of y cores each; PBS places each chunk on a single node.

● $ qsub -A Project_ID -q queue -l select=2:ncpus=4 jobscript


● $ qsub -A Project_ID -q queue -l select=2:ncpus=16 jobscript
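The chunk arithmetic can be made concrete with a short sketch that adds up the cores a select statement asks for. `total_ncpus` is a hypothetical helper written for illustration; it understands only the simple `count:ncpus=n` terms used on these slides (joined by `+`), not the full PBS resource grammar, and the default of 1 core for a chunk without `ncpus` is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: total number of cores requested by a PBS select
# statement such as "select=2:ncpus=4" or "1:ncpus=16:host=cn204+1:ncpus=16".
# Each "+"-separated term is [count:]ncpus=n[:other=value...], meaning
# <count> chunks (default 1) of <n> cores each. Other attributes are ignored.
total_ncpus() {
    local spec="${1#select=}"        # accept "select=..." or the bare value
    local total=0 term count ncpus attr
    local -a terms attrs
    IFS='+' read -r -a terms <<< "$spec"
    for term in "${terms[@]}"; do
        IFS=':' read -r -a attrs <<< "$term"
        if [[ ${attrs[0]} =~ ^[0-9]+$ ]]; then
            count=${attrs[0]}
            attrs=("${attrs[@]:1}")  # drop the count, keep the attributes
        else
            count=1
        fi
        ncpus=1                      # assumed default when ncpus is omitted
        for attr in "${attrs[@]}"; do
            if [[ $attr == ncpus=* ]]; then
                ncpus=${attr#ncpus=}
            fi
        done
        total=$(( total + count * ncpus ))
    done
    echo "$total"
}

total_ncpus select=2:ncpus=4    # prints 8: two chunks of 4 cores
total_ncpus select=2:ncpus=16   # prints 32: two chunks of 16 cores
```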


Job submission examples:

$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16,walltime=03:00:00 ./myjob

$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16:cpu_freq=24 -I

$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16:host=cn204+1:ncpus=16:host=cn205 -I

$ qsub ./myjob


Job management
● Use qstat and check-pbs-jobs to check the job status:

$ qstat -a
$ qstat -an
$ qstat -an -u username
$ qstat -f jobid

$ check-pbs-jobs --check-all
$ check-pbs-jobs --print-job-out
$ check-pbs-jobs --ls-lscratch


Job management
● Use qhold, qrls, qdel, qsig or qalter to manage jobs:

$ qsub -A OPEN-0-0 -l select=100 ./jobscript
$ qhold jobid
$ qrls jobid
$ qalter -l select=101,walltime=00:15:00 jobid

$ qsig jobid
$ qdel jobid


Job execution
$ qsub -A Project_ID -q queue -l select=x:ncpus=y jobscript

$ qsub -A Project_ID -q queue -l select=x:ncpus=y -I

● With -I, an interactive shell is started instead of a jobscript
● The jobscript is executed on the first node of the allocation
● The jobscript is executed in the HOME directory
● The file $PBS_NODEFILE contains the list of allocated nodes
● The allocated nodes are accessible to the user
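A sketch of how a jobscript might use `$PBS_NODEFILE`: list the distinct allocated nodes and run a command on each of them over ssh. It assumes the file holds one hostname per line (a host may repeat, for example once per MPI process) and that ssh between allocated nodes works without a password; `list_nodes` and `on_each_node` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for use inside a jobscript.
# Assumption: the node file lists one hostname per line, and a host may
# appear several times (for example once per MPI process).

# Print each allocated node once, in order of first appearance.
# Reads the file given as $1, or $PBS_NODEFILE when no argument is given.
list_nodes() {
    awk '!seen[$0]++' "${1:-$PBS_NODEFILE}"
}

# Run the same command on every allocated node in parallel over ssh,
# then wait until all of them have finished.
on_each_node() {
    local node
    for node in $(list_nodes); do
        ssh "$node" "$@" &
    done
    wait
}
```

Inside a jobscript, `nnodes=$(list_nodes | wc -l)` would give the number of distinct nodes, and `on_each_node hostname` would print the name of every node from that node itself.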


Job execution
$ qsub -A Project_ID -q queue -l select=x:ncpus=y jobscript

#!/bin/bash

# change to local scratch directory, exit on failure
cd "/lscratch/$PBS_JOBID" || exit

# copy input file and executable to scratch
cp "$PBS_O_WORKDIR/input" .
cp "$PBS_O_WORKDIR/myprog.x" .

# execute the calculation
./myprog.x

# copy output file back to the submission directory
cp output "$PBS_O_WORKDIR/."

# exit
exit


Job execution
$ qsub jobscript

#!/bin/bash
#PBS -q qprod
#PBS -N MYJOB
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16
#PBS -A OPEN-0-0

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation
mpiexec ./mympiprog.x

# exit
exit


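Job execution
● Anselm nodes are NUMA nodes; bind a process and its memory to one socket with numactl:
$ numactl --membind=0 --cpunodebind=0 command

(Figure: NUMA node with a separate memory bank attached to each socket; label: "60% efficiency")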
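One way to respect NUMA is to start one copy of a program per NUMA node, binding both its cores and its memory to that node with the numactl options shown above. The sketch assumes NUMA nodes numbered from 0 (two on a dual-socket node); `run_per_numa_node` is a hypothetical helper, and setting `NUMACTL=echo` prints the commands instead of running them.

```shell
#!/bin/bash
# Hypothetical helper: start one copy of a command per NUMA node, binding
# both its cores (--cpunodebind) and its memory (--membind) to that node
# so that memory accesses stay local.
# Assumption: NUMA nodes are numbered 0..N-1 (2 on a dual-socket node).
# Set NUMACTL=echo to print the commands instead of running them.
run_per_numa_node() {
    local nnodes="$1"
    shift
    local launcher=${NUMACTL:-numactl}
    local n
    for (( n = 0; n < nnodes; n++ )); do
        "$launcher" --membind="$n" --cpunodebind="$n" "$@" &
    done
    wait   # return only when every copy has finished
}
```

For example, `run_per_numa_node 2 ./myprog.x` would run two copies, one per socket. Every copy receives the same arguments, so the program itself has to decide which part of the work each copy does.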


Software environment
• Programming environment
‒ GNU compilers: gfortran, gcc, g++, gdb
‒ Intel compilers: ifort, icc, idb
‒ PGAS compilers: upc
‒ Interpreters: Perl, Python, Java, Ruby, Bash
● HPC libraries: Intel MKL suite, ATLAS, GOTO, PETSc, ScaLAPACK, PLASMA and MAGMA
● Comm libraries: bullx MPI, OpenMPI, OpenSHMEM

• Performance analysis
‒ gprof
‒ PAPI, Scalasca
‒ HPCToolkit, Open|SpeedShop


System environment
• Commercial products
‒ Comsol
‒ Matlab
‒ Ansys
‒ …

• Check the available modules for a list of software:
$ module avail
$ module load bullxde
$ module load bullxde papi


Octave, R and Matlab
• Octave and R are linked to the HPC libraries FFTW3 and MKL (they run in parallel on 16 cores)
• GUI:
$ module load octave/hg-20130730
$ module load Rstudio
• Matlab licences:
$ module load matlab
$ module load matlab/R2013a-COM
• Batch execution:
$ matlab -nosplash -nodisplay -r mscript > moutput.out
$ octave -q --eval oscript > ooutput.out
$ R CMD BATCH rscript.R routput.out


ISV licenses
● Check available licenses in the license state files:
Ansys: /apps/user/licenses/ansys_features_state.txt
Comsol: /apps/user/licenses/comsol_features_state.txt
comsol-edu: /apps/user/licenses/comsol-edu_features_state.txt
Matlab: /apps/user/licenses/matlab_features_state.txt
matlab-edu: /apps/user/licenses/matlab-edu_features_state.txt

● Tell PBS about the license you need

● Licenses are not monolithic; they are split into features:
$ qsub … -l feature__matlab__Image_Toolbox

● Grab the license as soon as possible in your job


Programming
● GNU C, C++, Fortran 77/90/95
● Intel C, C++, Fortran 77/90/95
● GNU UPC
● Berkeley UPC
● Nvidia nvcc

Programming environments:
● module load PrgEnv-gnu
● module load PrgEnv-intel


Intel Parallel Studio
● Intel Compilers: C, C++, Fortran 77/90/95
● Intel Debugger: $ idb
● Intel MKL: $ icc myprog.c -mkl
● Intel IPP (whatever function you can think of)
● Intel TBB (task-based threaded parallelism programming API)


MPI
Two families of MPI implementations are available:
● OpenMPI family: OpenMPI 1.6.5, bullx MPI 1.2.4
● MPICH2 family: Intel MPI 4.1, MVAPICH2 1.9

● The implementations differ by thread support level
● Freely combine the MPI library and the compiler
● Compile MPI programs using the MPI wrappers (mpicc, mpif90, etc.)
● Do not mix MPI implementations
● Choose the right way to run an MPI program


Ways to run MPI programs
1 process per node, 16 threads per process

Best for memory-demanding apps with good cache data use

$ qsub -l select=xx:ncpus=16:mpiprocs=1:ompthreads=16


Ways to run MPI programs
2 processes per node, 8 threads per process

Best for memory-bound apps with scalable memory demand

$ qsub -l select=xx:ncpus=16:mpiprocs=2:ompthreads=8


Ways to run MPI programs
16 processes per node, 1 thread per process

Best for highly scalable applications with low communication demand

$ qsub -l select=xx:ncpus=16:mpiprocs=16:ompthreads=1
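All three layouts follow the same rule: MPI processes per node times threads per process equals the 16 cores of a node. A sketch of a hypothetical helper, `mpi_select`, that builds the matching select statement; rejecting process counts that do not divide 16 is a design choice of this sketch, not a PBS rule.

```shell
#!/bin/bash
# Hypothetical helper: build a PBS select statement for hybrid MPI/OpenMP
# runs, keeping mpiprocs * ompthreads equal to the cores of one node.
# Usage: mpi_select NODES PROCS_PER_NODE
CORES_PER_NODE=16

mpi_select() {
    local nodes="$1" procs="$2"
    # Reject layouts that would leave cores idle or oversubscribe them.
    if (( procs < 1 || CORES_PER_NODE % procs != 0 )); then
        echo "mpi_select: processes per node must divide $CORES_PER_NODE" >&2
        return 1
    fi
    local threads=$(( CORES_PER_NODE / procs ))
    echo "select=${nodes}:ncpus=${CORES_PER_NODE}:mpiprocs=${procs}:ompthreads=${threads}"
}

mpi_select 100 1    # prints select=100:ncpus=16:mpiprocs=1:ompthreads=16
mpi_select 100 2    # prints select=100:ncpus=16:mpiprocs=2:ompthreads=8
mpi_select 100 16   # prints select=100:ncpus=16:mpiprocs=16:ompthreads=1
```

A submission could then read `qsub -A OPEN-0-0 -q qprod -l "$(mpi_select 100 2)" ./jobscript`.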


MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation
mpiexec ./mympiprog.x

# exit
exit

1 process per node, 16 threads per process


MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=2:ompthreads=8

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation, one process bound to each socket
mpiexec -bysocket -bind-to-socket ./mympiprog.x

# exit
exit

2 processes per node, 8 threads per process


MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=16:ompthreads=1

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation, one process bound to each core
mpiexec -bycore -bind-to-core ./mympiprog.x

# exit
exit

16 processes per node, 1 thread per process


Tips and tricks

Data transfers to and from Anselm
● Use ssh with the fast block cipher aes128-ctr (about 160 MB/s)
● Use multiple ssh connections to bypass the 160 MB/s limit
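A sketch of the multiple-connection tip: start several scp processes at once, each using the aes128-ctr cipher, with xargs limiting how many run concurrently. `parallel_scp` is a hypothetical helper; it assumes key-based ssh access to the destination, and how much the aggregate rate actually improves depends on the network and file systems at both ends.

```shell
#!/bin/bash
# Hypothetical helper: copy files over several concurrent ssh connections.
# Usage: parallel_scp NCONN DEST FILE...
# Each file is copied by its own scp process using the aes128-ctr cipher;
# xargs keeps at most NCONN of them running at the same time.
# Set SCP=echo for a dry run that prints the scp arguments instead.
parallel_scp() {
    local nconn="$1" dest="$2"
    shift 2
    printf '%s\0' "$@" |
        xargs -0 -P "$nconn" -I{} "${SCP:-scp}" -c aes128-ctr {} "$dest"
}
```

For example, `parallel_scp 4 username@remote-host:target_dir/ *.dat` (host and directory are placeholders) would keep four transfers running until all `.dat` files are copied.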

File system access
● Set up the stripe size and stripe count
● Use local scratch or ramdisk for small files

Respect the NUMA
● Consider using numactl
● Consider using MPI binding


Conclusions

Computational resources available to the general academic community are allocated in competitions judged on scientific and technical quality

Read the documentation, contact support, contact me!

IT4Innovations SuperComputer Center is here to run the computer and assist you in using it

[email protected]