
Page 1: Cluster Computing in Frankfurt (csc.uni-frankfurt.de/wiki/lib/exe/fetch.php?media=public:clustercomputing...)

Cluster Computing in Frankfurt

Anja Gerbes

Goethe University in Frankfurt/Main
Center for Scientific Computing

December 12, 2017

Page 2

General Information · Cluster Facts · Batch Usage

Center for Scientific Computing: What can we provide you?

CSC Center for Scientific Computing

Capability computing
Capacity computing
Access to licensed software
Introductory Courses

HKHLR Hessisches Kompetenzzentrum für Hochleistungsrechnen (Hessian Competence Center for High-Performance Computing)

Access to Hessian clusters
HiPerCH Workshops

Anja Gerbes Cluster Computing in Frankfurt

Page 3

Center for Scientific Computing

Capability computing is thought of as using the maximum computing power to solve a single large problem in the shortest amount of time.

Capacity computing, in contrast, is thought of as using efficient, cost-effective computing power to solve a small number of somewhat large problems or a large number of small problems.

Access to licensed software covers commercial packages: TotalView Debugger, Vampir Profiler, Intel Compilers, Tools and Libraries.

Access to Hessian clusters at the universities of Darmstadt, Frankfurt, Giessen, Kassel, and Marburg.

Page 4

Center for Scientific Computing

Introductory Courses cover UNIX, shell scripting, software tools, cluster computing (for MPI/OpenMP & Matlab users), Python, C++, TotalView, and Make (build-management tool).

HiPerCH Workshops offer users, twice a year, an insight into high-performance computing, covering different HPC topics.

Page 5

Introduction to LOEWE-CSC & FUCHS

Cluster Computing in Frankfurt

Anja Gerbes

Goethe University in Frankfurt/Main
Center for Scientific Computing

December 12, 2017

Page 6

Cluster-Usage · Environment Modules · Partitions

HPC Terminology

Cluster A group of identical computers connected by a high-speed network, forming a supercomputer.

Node Currently, a typical compute node is equivalent to a high-end workstation and is part of a cluster, with two sockets, each with a single CPU, volatile working memory (RAM), and a hard drive.

CPU A Central Processing Unit (CPU) is a processor which may have one or more cores to perform tasks at a given time.

Core A core is the basic computation unit of the CPU, with its own computing pipeline, logical units, and memory controller.

Thread Each CPU core services a number of CPU threads, each having an independent instruction stream but sharing the core's memory controller & other logical units.

FLOPS Performance is measured in FLoating-point Operations Per Second (FLOPS).

Anja Gerbes Introduction to LOEWE-CSC & FUCHS

Page 7

Formula

The full and complete sample formula, using dimensional analysis:

GFLOPS = #chassis · (#nodes / chassis) · (#sockets / node) · (#cores / socket) · (GHz / core) · (FLOPs / cycle)

TFLOPS = TeraFLOPS = 10^12 FLOPS = 1000 GFLOPS

GFLOPS = GigaFLOPS = 10^9 FLOPS = 1000 MFLOPS

MFLOPS = MegaFLOPS = 10^6 FLOPS

Note: The use of a GHz processor yields GFLOPS of theoretical performance.
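The dimensional-analysis formula above can be sketched as a small calculation. The node configuration and the 4 FLOPs/cycle figure below are illustrative assumptions, not the clusters' actual values; FLOPs/cycle depends on the CPU microarchitecture.

```python
# A minimal sketch of the peak-performance formula above.
# flops_per_cycle = 4 is an assumption typical of older x86 cores.

def peak_gflops(chassis, nodes_per_chassis, sockets_per_node,
                cores_per_socket, ghz_per_core, flops_per_cycle):
    """Theoretical peak in GFLOPS (GHz x FLOPs/cycle = GFLOPS per core)."""
    return (chassis * nodes_per_chassis * sockets_per_node *
            cores_per_socket * ghz_per_core * flops_per_cycle)

# hypothetical dual-socket node with two 12-core 2.1 GHz CPUs:
node_gflops = peak_gflops(1, 1, 2, 12, 2.1, 4)  # about 201.6 GFLOPS
node_tflops = node_gflops / 1000                # GFLOPS -> TFLOPS
```

Multiplying up by nodes per chassis and chassis per cluster gives the whole machine's theoretical peak, exactly as the formula states.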

Page 8

HPC Terminology

The past, but times change . . .

1 A chassis contained a single node.

2 A single node contained a single processor.

3 A processor contained a single CPU core and fit into a single socket.

. . . with recent computer systems:

1 A single chassis contains multiple nodes.

2 Those nodes contain multiple sockets.

3 The processors in those sockets contain multiple CPU cores.

Page 9

HPC Terminology

On current computer systems:

1 A chassis houses one or more compute nodes.

2 A node contains one or more sockets.

3 A socket holds one processor.

4 A processor contains one or more CPU cores.

5 The CPU cores perform the actual mathematical computations.

6 One class of these mathematical operations works exclusively on floating-point numbers; their rate is measured in FLOPS.

7 One or more racks of such computers build the complete computer system.

Page 10

HPC Terminology

(Diagram: a dual-core CPU and a quad-core CPU, each a single processor with its cores attached to memory.)

Page 11

HPC Terminology

(Diagram: a node with two quad-core processors sharing 24 GB of memory.)

Page 12

Setting of the LOEWE-CSC Cluster

(Diagram: the LOEWE-CSC cluster consists of AMD, GPU, and Intel node types. Each node holds a multi-core CPU, RAM, an HDD, and input/output; the GPU nodes additionally carry two graphics cards. An interconnect fabric links all nodes to the shared storage.)

Page 13

Access to the Cluster of Frankfurt

LOEWE-CSC: ssh <username>@loewe-csc.hhlr-gu.de

FUCHS: ssh <username>@hhlr.csc.uni-frankfurt.de

- Go to CSC-Website/Access/LOEWE & CSC-Website/Access/FUCHS to get an account for the clusters.

- The project manager has to send a request to Prof. Lüdde to get CPU time for research projects.

- Please download the file & use a regular PDF viewer to open the forms.

Page 14

Organization of a Cluster

(Diagram: from your PC, an ssh connection reaches 2-4 general login nodes (loewe-csc.hhlr-gu.de, hhlr.csc.uni-frankfurt.de); batch jobs are passed on to 600+ compute nodes, which are connected by an InfiniBand network.)

Page 15

Idea behind Batch Processing

- Whatever you would normally type at the command line ⇒ goes into your batch script.

- Output that would normally go to the screen ⇒ goes into a log file.

- The system runs your job when resources become available.

- Very efficient in terms of resource utilization.

Page 16

Hardware Resources of the LOEWE-CSC Cluster

#Nodes  CPU                      GHz   Cores/CPU  Cores/Node  Threads/Node  RAM [GB]  GPU
438     2x AMD Opteron 6172      2.10  12         24          24            64        1x AMD HD 5800 1GB
198     2x Intel Xeon E5-2670v2  2.50  10         20          40            128       -
139     2x Intel Xeon E5-2640v4  2.40  10         20          40            128       -
50      2x Intel Xeon E5-2630v2  2.60  6          12          24            128       2x AMD FirePro S10000 12GB

Page 17

Filesystem of the Clusters

Warning

Use the /scratch directory instead of /home to write out standard output and error.

LOEWE-CSC

mountpoint   /home           /scratch    /local  /data0[1|2]
size         10 GB per user  764 TB      1.4 TB  500 TB each
access time  slow            fast        fast    slow
system       NFS             FhGFS       ext3    NFS
network      Ethernet        InfiniBand  -       Ethernet

FUCHS

Page 18

Environment Modules

Definition: Environment Modules provide software for specific purposes.

Syntax: module <command> <modulename>

avail                  display all available modules
list                   display all loaded modules
load | add <module>    load a module
load unstable          load a deprecated or unstable module
unload | rm <module>   unload a module

Page 19

Environment Modules

Definition: Environment Modules provide software for specific purposes.

Syntax: module <command> <modulename>

switch | swap <old-module> <new-module>   first unloads the old module, then loads the new one
purge                                     unload all currently loaded modules

Page 20

Environment Modules

Syntax: module load <modulename>

If you (un)load an MPI module, the matching compiler is automatically (un)loaded with it:

No  MPI module                      Compiler
1   mpi/mvapich2/gcc/2.0            gcc
2   mpi/mvapich2/intel-14.0.3/2.0   intel
3   mpi/mvapich2/pgi-14.7/2.0       pgi
1   openmpi/gcc/1.8.1               gcc
2   openmpi/intel-14.0.3/1.8.1      intel

No  Compiler module
1   intel/compiler/64/14.0.3
2   pgi/14.7

An MPI module name combines the generic term (the flavour of MPI), the compiler (and version) with which it was compiled, and the version of the MPI library; the 64 in the Intel compiler module marks the 64-bit software.

Page 21

Environment Modules: Using Custom Modules

1 write a module file in Tcl to set environment variables

2 module load use.own enables you to load your own modules

3 module load ~/privatemodules/modulename

4 use the facilities provided by module

Page 22

Partitions of the Cluster

Cluster  Partition  Max runtime  Max nodes  Max nodes PU  Max jobs PU  Max submits PU
LOEWE    parallel   30d          750        150           40           50
LOEWE    gpu        30d          50         50            40           50
LOEWE    test       1h           2-12                     10           10
FUCHS    parallel   30d          60         100           60           100
FUCHS    test       12h

(PU = per user.) The maximum array size on the LOEWE-CSC cluster is 1001.
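As an illustration, the LOEWE limits from the table can be encoded in a small pre-submission sanity check. This is a sketch: the per-user (PU) columns are omitted, and the test partition's node cap is read as 12 from the 2-12 entry.

```python
# Hypothetical helper encoding the LOEWE partition limits from the table:
# partition name -> (max runtime in hours, max nodes per job).
LOEWE_LIMITS = {
    "parallel": (30 * 24, 750),
    "gpu":      (30 * 24, 50),
    "test":     (1, 12),
}

def fits(partition, nodes, hours):
    """True if a job of `nodes` nodes and `hours` wall time fits the partition."""
    max_hours, max_nodes = LOEWE_LIMITS[partition]
    return hours <= max_hours and nodes <= max_nodes

fits("test", 4, 1)        # a 4-node, 1-hour job fits the test partition
fits("parallel", 800, 24) # too large: over the 750-node limit
```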

Page 23

Architecture of the LOEWE-Partitions

- partition = parallel
  - constraint = dual (AMD)
  - constraint = intel20 (Intel)
  - constraint = broadwell (Intel)

- partition = gpu

(Diagram: the cluster as on the earlier slide; the AMD and Intel nodes form the parallel partition, the GPU nodes the gpu partition.)

Page 24

Architecture of the FUCHS-Partitions

Processor    Type  #Nodes  Socket  #CPU  RAM [GB]
Magny-Cours  AMD   72      dual    24    64
Magny-Cours  AMD   36      quad    48    128
Istanbul     AMD   250     dual    12    32/64

- The architecture is selected with '--constraint':
  magnycours = 72 dual-socket AMD Magny-Cours nodes
  dual = 250 dual-socket AMD Istanbul nodes
  quad = 36 quad-socket AMD Magny-Cours nodes

- Use '--constraint=magnycours|dual' to avoid the quad nodes.

Page 25

Batch Usage on LOEWE-CSC & FUCHS

Cluster Computing in Frankfurt

Anja Gerbes

Goethe University in Frankfurt/Main
Center for Scientific Computing

December 12, 2017

Page 26

Definitions & Theory · HOW-TO SLURM · Appendix (Scripts, Closing Remarks, SLURM Glossary)

Batch System Concepts

Cluster consists of a set of tightly connected identical computers that are presented as a single system & work together to solve computation-intensive problems.

Resource Manager is responsible for managing the resources of a cluster, like tasks, nodes, CPUs, memory & network.

Scheduler controls users' jobs on a cluster.

Batch System combines all the features of a scheduler & a resource manager in an efficient way.

SLURM offers both functionalities, scheduling & resource management.

Anja Gerbes Batch Usage on LOEWE-CSC & FUCHS

Page 27

Batch System Concepts

Batch Processing executes programs or jobs without the user's intervention.

Job consists of a description of required resources & job steps, i.e. user-defined work-flows run by the batch system.

Job Steps describe tasks that must be done.

Page 28

Batch System Concepts

Cluster

- consists of a set of tightly connected identical computers

- the computers are presented as a single system & work together to solve computation-intensive problems

- nodes are connected through a high-speed local network

- nodes have access to shared resources like shared file systems

Resource Manager

- responsible for managing the resources of a cluster, like tasks, nodes, CPUs, memory & network

- manages the execution of jobs

- makes sure that jobs are not overlapping on the resources & also handles their I/O

Page 29

Batch System Concepts

Scheduler

- receives jobs from the users

- controls users' jobs on a cluster

- controls the resource manager to make sure that the jobs are completed successfully

- handles the job submissions & puts jobs into queues

- offers many features like:
  - user commands for managing the jobs (start, stop, hold)
  - interfaces for defining work-flows & job dependencies
  - interfaces for job monitoring & profiling (accounting)
  - partitions & queues to control jobs according to policies & limits
  - scheduling mechanisms, like backfilling according to priorities

Page 30

Batch System Concepts

Batch System

- is the combination of a scheduler & a resource manager

- combines all the features of these two parts in an efficient way

- SLURM offers both functionalities, scheduling & resource management

Batch Processing

- the composition of programs, so-called jobs, is achieved by batch processing & realized by batch systems

- execution of programs or jobs without the user's intervention

Page 31

Batch System Concepts

Job

- execution of user-defined work-flows by the batch system

- a job consists of a description of required resources & job steps

Job Steps

- job steps describe tasks that must be done

- resource requests consist of a number of CPUs, the expected computing duration, and amounts of RAM or disk space

- the script itself is a job step

- other job steps are created with the srun command

- when a job starts, it runs the script as its first job step

Page 32

SLURM: Resource Manager on the Cluster

- SLURM stands for Simple Linux Utility for Resource Management.

- The user sends a job via sbatch to SLURM.

- SLURM calculates the priority of each job.

- SLURM starts a job according to its priority & the resource availability.

- There is an exclusive node assignment per job.

- SLURM allocates the resources of the jobs.

- SLURM provides a framework for starting & monitoring the jobs.

Page 33

SLURM Commands

1 Job submission & execution

  salloc    requests interactive jobs/allocations
  sbatch    submits a batch script
  srun      runs jobs interactively (implicit resource allocation)

2 Managing a job

  scancel   cancels a pending or running job
  sinfo     shows information about nodes & partitions
  squeue    queries the list of pending & running jobs
  scontrol  shows detailed information about compute nodes

3 Accounting information

  sacct     displays accounting data for all jobs & job steps
  sacctmgr  shows SLURM account information

Page 34

Backfilling Scheduling

The backfilling scheduling algorithm may schedule jobs with lower priorities that can fit in the gap created by freeing resources for the next highest-priority jobs.

(Timeline, nodes over time from -1 to 4: job A occupies one of the two nodes.)

A is a 1-node job & starts at -1. Consider a 2-node cluster: job A is running and will take until time point 2.

Page 35

Backfilling Scheduling

(Timeline: job B, needing both nodes, is scheduled to start at time 2, after A ends.)

B is submitted at -1 and scheduled. It will start after A, as it will take all 2 nodes.

Page 36

Backfilling Scheduling

(Timeline: job C is queued behind B.)

C is submitted at 0. It will start after B if the scheduler has to assume it would run past time point 2.

Page 37

Backfilling Scheduling

(Timeline: job C backfills onto the free node and runs alongside A.)

However, if C promises to end before B starts, it will start now. This is backfilling, simplified; the actual process takes all resources into account.
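The A/B/C walkthrough above can be sketched as a toy backfilling scheduler. This is an illustration of the idea only, not SLURM's actual algorithm; all names, times, and durations are the example's.

```python
# Toy backfilling on a cluster of `total_nodes` nodes, at time now = 0.
# running: [(name, nodes, end_time)]; queue: [(name, nodes, duration)]
# in priority order. Returns {name: start_time}.
def schedule(total_nodes, running, queue):
    free = total_nodes - sum(n for _, n, _ in running)
    starts = {}
    head, head_nodes, head_dur = queue[0]
    if head_nodes <= free:                      # highest priority starts now
        starts[head] = 0
        free -= head_nodes
        reservation = None
    else:                                       # reserve nodes as running jobs end
        avail, t = free, 0
        for end, n in sorted((e, n) for _, n, e in running):
            avail, t = avail + n, end
            if avail >= head_nodes:
                break
        starts[head] = t
        reservation = t
    for name, nodes, dur in queue[1:]:
        # backfill only if the job fits now AND ends before the reservation
        if nodes <= free and (reservation is None or dur <= reservation):
            starts[name] = 0
            free -= nodes
        else:
            starts[name] = starts[head] + head_dur  # simplified: run after the head job
    return starts

# B (2 nodes) must wait for A until t=2; C (1 node, duration 1) backfills:
schedule(2, [("A", 1, 2)], [("B", 2, 2), ("C", 1, 1)])  # {'B': 2, 'C': 0}
```

If C instead promised to run for 3 time units, it could no longer fit the gap and would be scheduled after B, exactly as on the previous slide.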

Page 38

Job-Submission

1 Log in to the cluster
  ssh <username>@loewe-csc.hhlr-gu.de (LOEWE)
  ssh <username>@hhlr.csc.uni-frankfurt.de (FUCHS)

2 Create a job script, e.g. with the .slurm extension
  Example script name: workshop_batch_script.slurm

3 Submit this script to the cluster with sbatch
  sbatch workshop_batch_script.slurm (indirect)
  salloc (interactive mode)

4 Use the allocated resources

Page 39

Job-Submission

Commands for job allocation:

- sbatch is used to submit batch jobs to the queue
  sbatch [options] jobscript [args...]

- salloc is used to allocate resources for interactive jobs
  salloc [options] [<command> [command args]]

Command for job execution:

- with srun, users can spawn any kind of application, process or task inside a job allocation:
  1 inside a job script submitted by sbatch (starts a job step)
  2 after calling salloc (execute programs interactively)
  srun [options] executable [args...]

Page 40

Indirect Job-Submission

sbatch

Encapsulation of job parameters and the user program call in a job script, which is handed over to the submit command.

Features:

- prefabricated job scripts with the important parameters eliminate operator error

- additional functionality is simple to add

- allows transfer of additional parameters to the submit command

- one-time extra effort to draft the job scripts

Page 41

Direct Job-Submission

salloc

Transfer of job parameters and the user program directly to the submit command.

Features:

- allows simple, quick and flexible changes of job parameters

- preferred when running many similar jobs that differ only in very few parameters (e.g. benchmarks: the same program with different numbers of CPUs)

- prone to faulty operation

- additional functionality only via encapsulation in self-written scripts (e.g. loading libraries)

Page 42

Job Execution

srun

- used to initiate job steps, mainly within a job

- starts interactive jobs

- a job can contain multiple job steps

- job steps execute sequentially or in parallel on independent nodes within the job's node allocation

After modulefiles are loaded and resources have been allocated, an application can be started on the assigned nodes by preceding it with:
  srun (MVAPICH2)
  mpirun (Open MPI)
  mpiexec (Open MPI)

In this shell window further applications can be started. For running jobs interactively, resources are allocated (implicitly) with salloc.

Page 43

Job-Submission

List of the submission/allocation options for sbatch and salloc:

-p, --partition       partition to be used for the job
-c, --cpus-per-task   logical CPUs (hardware threads) per task
-N, --nodes           compute nodes used by the job
-n, --ntasks          total number of processes (MPI processes)
--ntasks-per-node     tasks per compute node
-t, --time            max wall-clock time of the job
-J, --job-name        sets the name of the job
-o, --output          path of the job's standard output
-e, --error           path of the job's standard error

srun accepts almost all allocation options of sbatch and salloc.

Note: Option --partition has to be set.

Page 44

Job Scripts Toy Examples

Listing 1: A naive script

#!/bin/bash

#SBATCH --job-name=TestJobSerial
#SBATCH --nodes=1
#SBATCH --output=TestJobSerial-%j.out
#SBATCH --error=TestJobSerial-%j.err
#SBATCH --time=30

hostname

Page 45

Job Scripts Toy Examples

Listing 2: Going parallel

#!/bin/bash

#SBATCH --job-name=TestJobParallel
#SBATCH --nodes=1
#SBATCH --output=TestJobParallel-%j.out
#SBATCH --error=TestJobParallel-%j.err
#SBATCH --time=60

srun --ntasks-per-node=2 hostname

Page 46

Job Scripts Toy Examples

Listing 3: Going parallel across nodes

#!/bin/bash

#SBATCH --job-name=TestJobParallel
#SBATCH --nodes=4
#SBATCH --output=TestJobParallel-%j.out
#SBATCH --error=TestJobParallel-%j.err
#SBATCH --time=60

srun --ntasks-per-node=2 hostname


Creating a Parallel Job in SLURM

There are several ways to create a parallel job, i.e. one whose tasks are run simultaneously:

1. by running several instances of a single-threaded program

2. by running a multi-process program (MPI)

3. by running a multi-threaded program (OpenMP or pthreads)


- In the SLURM context, a task is to be understood as a process.
- A multi-threaded program consists of one task that uses several CPUs.
  - The option --cpus-per-task is meant for multi-threaded programs.
  - Multi-threaded jobs run on a single node, but use more than one processor on the node.
  - Tasks cannot be split across several compute nodes, so requesting several CPUs with the --cpus-per-task option ensures that all CPUs are allocated on the same compute node.
- A multi-process program is made up of several tasks.
  - The option --ntasks is meant for multi-process programs.
  - By contrast, requesting the same number of CPUs with the --ntasks option may lead to CPUs being allocated on several distinct compute nodes.
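The distinction can be sketched with two illustrative request fragments: both ask for four CPUs, but only the first guarantees that they all end up on the same node:

```
# One task with four CPUs: all four land on the same node
# (suited to OpenMP/pthreads programs).
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Four tasks with one CPU each: may be spread over several nodes
# (suited to MPI programs).
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
```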


Trivial Parallelization: Simple Loop

Listing 4: scriptexp.sh

#!/bin/bash

echo `hostname` $1
exit 0

Listing 5: Serial Job, simpleloop.slurm

#!/bin/bash
#SBATCH --job-name=oneforloop
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200
#SBATCH --mail-type=FAIL

for N in `seq 1 48`; do
    srun -N 1 -n 1 ./scriptexp.sh $N &
done
wait
sleep 300


Trivial Parallelization: Nested Loop

Listing 6: scriptexp.sh

#!/bin/bash

echo `hostname` $1
sleep 2
exit 0

Listing 7: Serial Job, nestedloop.slurm

#!/bin/bash
#SBATCH --job-name=twoforloops
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200
#SBATCH --mail-type=FAIL

for i in `seq 0 3`; do
    for M in `seq 1 48`; do
        let N=$i*48+$M
        srun -N 1 -n 1 ./scriptexp.sh $N &
    done
    wait
done
wait
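The index arithmetic `let N=$i*48+$M` maps the chunk number i (0-3) and the inner index M (1-48) to a global task number N (1-192). It can be checked locally with plain bash, no SLURM required (the helper function is only for illustration):

```shell
#!/bin/bash
# Global task number for chunk i (0-based) and inner index M (1-based),
# mirroring `let N=$i*48+$M` from nestedloop.slurm.
task_index() {
    local i=$1 M=$2
    echo $(( i * 48 + M ))
}

task_index 0 1    # first task of the first chunk -> 1
task_index 3 48   # last task of the last chunk   -> 192
```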


Job Scripts Toy Examples

Listing 8: Naive OpenMP Job

#!/bin/bash

#SBATCH -J NaiveOMP
#SBATCH -N 1

#SBATCH -o TestOMP-%j.out
#SBATCH -e TestOMP-%j.err
#SBATCH --time=2:00:00

#SBATCH --constraint=dual

# AMD nodes
export OMP_NUM_THREADS=24   # 24 = max. cores per node

/home/user/omp-prog



Listing 10: MPI Job

#!/bin/bash

#SBATCH -J TestMPI
#SBATCH --nodes=4
#SBATCH --ntasks=96
#SBATCH -o TestMPI-%j.out
#SBATCH -e TestMPI-%j.err
#SBATCH --time=0:15:00

#SBATCH --partition=parallel

# implied --ntasks-per-node=24 (4 × 24 = 96)
srun ./mpi-prog


Listing 11: Multiple Job Steps

#!/bin/bash

#SBATCH -J TestJobSteps
#SBATCH -N 32
#SBATCH --partition=parallel
#SBATCH -o TestJobSteps-%j.out
#SBATCH -e TestJobSteps-%j.err
#SBATCH --time=6:00:00

srun -N 16 -n 32 -t 00:50:00 ./mpi-prog_1
srun -N 2  -n 4  -t 00:10:00 ./mpi-prog_2
srun -N 32 --ntasks-per-node=2 -t 05:00:00 ./mpi-prog_3


Job Scripts Toy Examples

Listing 14: Job Arrays

#!/bin/bash

#SBATCH -J TestJobArrays
#SBATCH --nodes=1
#SBATCH -o TestJobArrays-%A_%a.out
#SBATCH -e TestJobArrays-%A_%a.err
#SBATCH --time=2:00:00
#SBATCH --array=1-20

srun -N 1 --ntasks-per-node=1 ./prog input_${SLURM_ARRAY_TASK_ID}.txt

- SLURM supports job arrays with the option --array.
- This script will cause 20 array tasks (numbered 1, 2, ..., 20); the array tasks are simply copies of this master script.
- SLURM_ARRAY_JOB_ID (%A): base array job id
- SLURM_ARRAY_TASK_ID (%a): array index
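Inside a real array task, SLURM sets SLURM_ARRAY_TASK_ID automatically; the file-name mapping used above can be tried out locally by setting the variable by hand (the helper function is only for illustration):

```shell
#!/bin/bash
# Reproduce the input-file naming from the job-array script:
# input_${SLURM_ARRAY_TASK_ID}.txt
input_file() {
    echo "input_${1}.txt"
}

SLURM_ARRAY_TASK_ID=7               # set by SLURM inside a real array task
input_file "$SLURM_ARRAY_TASK_ID"   # -> input_7.txt
```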


Job Array Support

- Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily.
- Job arrays with many tasks can be submitted in milliseconds.
- All jobs have the same initial options (e.g. size, time limit).
- Users may limit how many such jobs run simultaneously.
- Job arrays are only supported for batch jobs.
- To address a job array, SLURM provides a base array ID and an array index for each job; specify a task with <base job id>_<array index>.
- SLURM exports environment variables (SLURM_ARRAY_JOB_ID, SLURM_ARRAY_TASK_ID).


CPU Management with SLURM

1 Selection of Nodes

2 Allocation of CPUs from Selected Nodes

3 Distribution of Tasks to Selected Nodes

4 Optional Distribution & Binding of Tasks to Allocated CPUs within a Node


CPU Management

Allocation: assignment of a specific set of CPU resources (nodes, sockets, cores and/or threads) to a specific job or job step.

Distribution:
1. assignment of a specific task to a specific node
2. assignment of a specific task to a specific set of CPUs within a node (used for optional task-to-CPU binding)

Core binding: confinement/locking of a specific set of tasks to a specific set of CPUs within a node.


CPU Management with SLURM

Selection of resources with sbatch:

#SBATCH --partition=parallel
#SBATCH --nodes=6
#SBATCH --constraint=intel20
#SBATCH --mem=512
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=3

Allocation of resources with sbatch.

Distribution with srun:

srun --distribution=block:cyclic ./my_program

Core binding (process/task binding) with srun:

srun --cpu_bind=cores ./my_program


HOW-TO

1. First, log in to one of the login nodes.
2. Prepare a batch script with your requirements.
3. Execute the batch script to run your application.
4. Monitor the batch script on the terminal.


Login to one of the Login Nodes
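For example, for the LOEWE-CSC cluster (both login addresses are listed again in the quick reference at the end):

```
ssh <username>@loewe-csc.hhlr-gu.de
```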


Executing the Batch Script to run your Application
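In the simplest case this is a single command; sbatch replies with the ID of the newly queued job (the job ID shown is illustrative):

```
sbatch myBatchScript.sh
# Submitted batch job 123456
```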


Monitoring the Batch Script on the Terminal

squeue -u <user>    list your queued and running jobs

scancel <jobid>     cancel a job


scontrol show job <jobid>    show detailed information about a job


Setting of the LOEWE-CSC Cluster

(Diagram: the LOEWE-CSC cluster combines three node types: AMD nodes, GPU nodes, and Intel nodes. Each node has CPU cores, RAM, a local HDD, and I/O; the GPU nodes additionally hold graphics cards. All nodes are connected through an interconnect fabric to the shared storage.)

Important parameters for sbatch:

- -p, --partition
- -C, --constraint
- -J, --job-name
- -t, --time
- -N, --nodes
- --mem-per-cpu
- -n, --ntasks
- -c, --cpus-per-task


Job Scripts Examples

Listing 15: Parallel MPI Job

#!/bin/bash

#SBATCH --job-name=parallelmpi
#SBATCH --output=expscript-%j.out
#SBATCH --error=expscript-%j.err
#
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=100

module load mpi/mvapich2/gcc/2.0
mpiexec helloworld.mpi


Listing 16: OpenMP Job

#!/bin/bash

#SBATCH --job-name=parallelopenmp
#SBATCH --output=expscript-%j.out
#SBATCH --error=expscript-%j.err
#
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=100

export OMP_NUM_THREADS=4
./helloworld.omp


Listing 17: OpenMP Job

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=16

srun -n 1 --cpus-per-task $OMP_NUM_THREADS ./application

(Diagram, physical view: one node with two 8-core sockets, cores 0-7 on socket 1 and cores 8-15 on socket 2; the single task's 16 threads occupy cores 0-15.)


Listing 18: MPI Job

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1

srun -n 32 ./application

(Diagram: ranks 0-15 run on node 1 and ranks 16-31 on node 2; within each node, even ranks sit on socket 1 and odd ranks on socket 2.)


Listing 19: Hybrid MPI/OpenMP Job

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=4

srun -n 8 --cpus-per-task $OMP_NUM_THREADS ./application

(Diagram: ranks 0-3 run on node 1 and ranks 4-7 on node 2; each rank's four threads are split across the two sockets, e.g. rank 0 on cores 0, 1, 8, 9.)


Submitting a Batch Script

Suppose you need 16 cores.

You control how the cores are allocated using the --cpus-per-task & --ntasks-per-node options. With these options, there are several ways to get the same allocation.

Example for --cpus-per-task & --ntasks-per-node

Equivalent in terms of resource allocation:

- --nodes=4 --ntasks=4 --cpus-per-task=4
- --ntasks=16 --ntasks-per-node=4

but with srun, 4 processes are launched, whereas with mpirun, 16 processes are launched.


- You use MPI and don't care where those cores are distributed: --ntasks=16
- You launch 16 independent processes (no communication): --ntasks=16
- You want those cores spread across distinct nodes: --ntasks=16 --ntasks-per-node=1 or --ntasks=16 --nodes=16
- You want those cores spread across distinct nodes with no interference from other jobs: --ntasks=16 --nodes=16 --exclusive


- You want 16 processes spread across 8 nodes, two processes per node: --ntasks=16 --ntasks-per-node=2
- You want 16 processes to stay on the same node: --ntasks=16 --ntasks-per-node=16
- You want one process that can use 16 cores for multithreading: --ntasks=1 --cpus-per-task=16
- You want 4 processes that can use 4 cores each for multithreading: --ntasks=4 --cpus-per-task=4


Submitting a Batch Script

Example for --mem & --mem-per-cpu

- If you request two cores (-n 2) and 4 GB with --mem, each core will receive 2 GB of RAM.
- If you instead specify 4 GB with --mem-per-cpu, each core will receive 4 GB, for a total of 8 GB.
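The arithmetic behind the two options can be sketched in plain bash (the helper names are illustrative):

```shell
#!/bin/bash
# --mem gives a per-node total, so each of n cores gets mem/n.
per_core_from_mem() { echo $(( $1 / $2 )); }        # $1 = --mem (MB), $2 = cores
# --mem-per-cpu is per core, so n cores get n * mem-per-cpu in total.
total_from_mem_per_cpu() { echo $(( $1 * $2 )); }   # $1 = --mem-per-cpu (MB), $2 = cores

per_core_from_mem 4096 2        # 4 GB shared by 2 cores -> 2048 MB each
total_from_mem_per_cpu 4096 2   # 4 GB per core x 2 cores -> 8192 MB total
```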


Different Batch-Scripts

Anja Gerbes, Goethe University in Frankfurt/Main, Center for Scientific Computing
December 12, 2017


OpenMP example

Listing 20: OpenMP

#!/bin/bash
#SBATCH --job-name=openmpexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=200
#SBATCH --time=48:00:00
#SBATCH --mail-type=ALL
#
export OMP_NUM_THREADS=24
./example_program

If your application needs 4800 MB and you want to run 24 threads, set --mem-per-cpu=200 (4800/24 = 200).
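The same rule of thumb works for any memory footprint; a small bash helper (illustrative) that emits the matching #SBATCH line:

```shell
#!/bin/bash
# Emit the --mem-per-cpu line for an application needing a total of
# $1 MB across $2 threads (total divided by thread count).
mem_per_cpu_line() {
    echo "#SBATCH --mem-per-cpu=$(( $1 / $2 ))"
}

mem_per_cpu_line 4800 24   # -> #SBATCH --mem-per-cpu=200
```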


MPI example

MPI example

Listing 21: MPI

#!/bin/bash
#SBATCH --job-name=mpiexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=96
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200
#SBATCH --time=48:00:00
#SBATCH --mail-type=ALL
#
module load openmpi/gcc/1.8.1
export OMP_NUM_THREADS=1
mpirun -np 96 ./example_program

--ntasks (SLURM_NTASKS) equals the number of OpenMPI ranks; 1200 MB of RAM are allocated for each rank.



small MPI example

Listing 23: small MPI example

#!/bin/bash
#SBATCH --job-name=smallmpiexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=24
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
#SBATCH --time=48:00:00
#SBATCH --mail-type=FAIL
#
export OMP_NUM_THREADS=1
mpirun -np 12 ./program input01 >& 01.out &
sleep 3
mpirun -np 12 ./program input02 >& 02.out &
wait

If you have several 12-rank MPI runs, you can start more than one computation inside a single job.


Hybrid MPI+OpenMP example

Listing 24: Hybrid MPI+OpenMP

#!/bin/bash
#SBATCH --job-name=hybridexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=6
#SBATCH --mem-per-cpu=200
#SBATCH --time=48:00:00
#SBATCH --mail-type=ALL
#
export OMP_NUM_THREADS=6
export MV2_ENABLE_AFFINITY=0
srun -n 24 ./example_program

24 ranks with 6 threads each and 200 MB per thread; 24 × 6 = 144 CPUs, so you will get six 24-core nodes.

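The node count follows from the total CPU demand divided by the cores per node, 24 on the dual-socket AMD nodes. A small bash check (the helper name is illustrative):

```shell
#!/bin/bash
# Nodes needed = ceil(ntasks * cpus_per_task / cores_per_node),
# computed with integer arithmetic.
nodes_needed() {
    local ntasks=$1 cpus_per_task=$2 cores_per_node=$3
    echo $(( (ntasks * cpus_per_task + cores_per_node - 1) / cores_per_node ))
}

nodes_needed 24 6 24   # 144 CPUs on 24-core nodes -> 6
```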


Job Scripts Toy Examples

Listing 28: Hybrid Job with Simultaneous Multithreading (SMT)

#!/bin/bash

#SBATCH -J TestHybrid
#SBATCH --ntasks=6
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=24
#SBATCH -o TestMPI-%j.out
#SBATCH -e TestMPI-%j.err
#SBATCH --time=0:20:00
#SBATCH --partition=parallel

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./hybrid-prog


Closing Remarks


Checklist for a successful Cluster Usage

[ ] Account exists?
[ ] I know how to access the cluster.
[ ] I know the parallel behavior of my software (and I know whether it is parallel at all).
[ ] I can estimate the runtime behavior and memory usage of my software.
[ ] I know how to run my software on the operating system of the cluster.
[ ] I know where to find help when I have problems → the HKHLR members.

Anja Gerbes Closing Remarks


Summary: Resource Allocation Specifications

Syntax: sbatch myBatchScript.sh

Features of a Cluster -C, --constraint

Node count -N, --nodes

Node restrictions -w, --nodelist

Task count -n, --ntasks

Task specifications --ntasks-per-node

--ntasks-per-socket

--ntasks-per-core

--cpus-per-task

Memory per node --mem

per CPU --mem-per-cpu
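As a sketch, the allocation options summarized above might be combined in a single batch script like the following; the job name, all values, and the program name are hypothetical:

```shell
#!/bin/bash
#SBATCH -J TestAlloc             # hypothetical job name
#SBATCH --nodes=2                # node count (-N)
#SBATCH --ntasks=8               # total task count (-n)
#SBATCH --ntasks-per-node=4      # task placement per node
#SBATCH --cpus-per-task=2        # CPUs for each task
#SBATCH --mem=4000               # memory per node, in MB
#SBATCH --constraint=intel20     # cluster feature (-C)

srun ./my-program                # hypothetical executable
```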


Cluster Quick Reference Guide

Version 1.0, February 14, 2017 · Cluster | quick reference | Frankfurt

1 Cluster Usage

Access Cluster Frankfurt

ssh <username>@loewe-csc.hhlr-gu.de        (LOEWE)
ssh <username>@hhlr.csc.uni-frankfurt.de   (FUCHS)

Go to CSC-Website/Access to get an account on the clusters. For LOEWE, the project manager has to send a request to Prof. Lüdde to obtain CPU time for research projects.

Getting Help Cluster Frankfurt

You will find further information about the commands available on the clusters with man <command>.

How-To execute myBatchScript.sh

1. Log in to one of the login nodes.
2. Prepare a batch script with your requirements.
3. Submit the batch script to run your application.

Module: Setting Program Environments

Syntax: module <command> <modulename>

avail                 display all available modules
list                  display all loaded modules
load | add <module>   load a module
load unstable         load a deprecated or unstable module
unload | rm <module>  unload a module
switch | swap <old-module> <new-module>
                      first unload an old module, then load a new module
purge                 unload all currently loaded modules
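A typical module session might look as follows; the module names are examples and may differ on the Frankfurt clusters:

```shell
module avail              # what can be loaded?
module load intel         # load a compiler module (name is an example)
module list               # verify the currently loaded modules
module switch intel gcc   # swap one compiler module for another
module purge              # return to a clean environment
```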

How-To use custom modules

1. Write a module file in Tcl* to set environment variables.
2. module load use.own enables you to load your own modules.
3. module load ~/privatemodules/modulename
4. Use the facilities provided by module.

* Look for examples in /cm/shared/modulefiles.

Architecture & Constraints Cluster Frankfurt

LOEWE Cluster Frankfurt

#nodes  CPU                              GHz   #CPUs/Cores  RAM    GPU
438     AMD Magny-Cours Opteron 6172     2.10  2/24         64GB   1x ATI Radeon HD5870 1GB
198     Intel Xeon E5-2670v2 Ivy Bridge  2.50  2/20         128GB
139     Intel Xeon E5-2640v2 Broadwell         2/20         128GB
50      Intel Xeon E5-2650v2 Ivy Bridge  2.60  2/12         128GB  2x AMD FirePro S10000 12GB

The architecture is selectable via the '--constraint' option:
dual      = dual-socket AMD Magny-Cours CPU/GPU nodes
intel20   = dual-socket Intel Ivy Bridge CPU nodes
broadwell = dual-socket Intel Broadwell CPU nodes

FUCHS Cluster Frankfurt

Processor Type  #AMD  Socket  #CPU  RAM [in GB]
Magny-Cours     72    dual    24    64
Magny-Cours     36    quad    48    128
Istanbul        250   dual    12    32/64

The architecture is called with '--constraint':
magnycours = 72 dual-socket AMD Magny-Cours nodes
dual       = 250 dual-socket Istanbul nodes
quad       = 36 quad-socket AMD Magny-Cours nodes
'--constraint=magnycours|dual' to avoid quad nodes
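For illustration, that last constraint can be passed on the command line when submitting; the script name is a placeholder:

```shell
# run on Magny-Cours or Istanbul dual-socket nodes, avoiding the quad nodes
sbatch --constraint="magnycours|dual" myBatchScript.sh
```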

Contact HPC Frankfurt

If you have any HPC questions about SLURM and want help with debugging & optimizing your program, please write to [email protected].
Also, you can contact the system administrators if you need software to be installed: [email protected].
Detailed documentation on using the cluster can be found on the CSC-Website.

Partitions Cluster Frankfurt

cluster  partition  runtime  MaxNodes  MaxNodesPU  MaxJobsPU  MaxSubmitPU
LOEWE    parallel   30d      750       150         40         50
LOEWE    gpu        30d      50        50          40         50
LOEWE    test       1h       2-12      10          10
FUCHS    parallel   30d      60        100         60         100
FUCHS    test       12h

The maximum array size of the cluster is 1001.
To view this information on the cluster, use the commands:

sacctmgr list QOS partition format=maxnodes,maxnodesperuser,maxjobsperuser,maxsubmitjobsperuser
scontrol show partition
sinfo -p <partition>
squeue -p <partition>

partition  description (LOEWE)

parallel  a mix of AMD Magny-Cours nodes, Intel Xeon Ivy Bridge & Broadwell nodes
gpu       dual-socket Intel Xeon Ivy Bridge E5-2650v2 CPU/GPU nodes, each with two AMD FirePro S10000 dual-GPU cards

'--constraint=gpu' has become obsolete; use '--partition=gpu' instead.
Mixed node types ('gpu*3&intel20*2') are possible. Ensure that the number of nodes you request matches the number of nodes in your constraints.

Per-User Resource Limits Cluster Frankfurt

limit         description
MaxNodes      max number of nodes
MaxNodesPU    max number of nodes in use at the same time
MaxJobsPU     max number of jobs running simultaneously
MaxSubmitPU   max number of jobs in running or pending state
MaxArraySize  max job array size

File Systems: Storage Systems

mountpoint   /home     /scratch    /local  /data0[1|2]
size         10GB PU   764TB       1.4TB   500TB each
access time  slow      fast        fast    slow
system       NFS       FhGFS       ext3    NFS
network      Ethernet  InfiniBand          Ethernet

http://csc.uni-frankfurt.dehttp://www.hpc-hessen.de

Center for Scientific ComputingHessisches Kompetenzzentrum für Hochleistungsrechnen


Resource Manager Cluster Frankfurt

On our systems, compute jobs are managed by SLURM. On the clusters, node allocation is exclusive. You can find more examples on our CSC-Website/ClusterUsage. SlurmCommands contains a detailed summary of the different options.

2 Job Submission & Execution

sbatch batch mode | salloc interactive | allocate resources

Syntax: salloc [options] [<command> [command args]]
        sbatch myBatchScript.sh

-a, --array=<indexes>            submit a job array
-C, --constraint=<feature>       specify features of a cluster
-c, --cpus-per-task=<ncpus>      threads: how many threads run on the node? (with OpenMP)
-J, --job-name=<job-name>        specify a name for the allocation
-m, --distribution=<block|cyclic|arbitrary|plane>
                                 mapping of processes
--mem=<MB>                       specify real memory required per node
--mem-per-cpu=<MB>               minimum memory required per allocated CPU
--mem_bind=<type>                bind tasks to memory
-N, --nodes=<min[-max]>          nodes: how many nodes will be allocated to this job?
-n, --ntasks=<number>            tasks: how many processes are started? (important for MPI)
-p <partition>                   request a specific partition for the resource
-t <time>                        set a limit on the total run time of the job
-w, --nodelist=<node_name_list>  request a specific list of node names

sbatch execute myBatchScript.sh | batch mode

#!/bin/bash
#SBATCH -p parallel       # partition (queue)
#SBATCH -C dual|intel20   # class of nodes
#SBATCH -N|-n|-c 1        # number of nodes|processes|cores
#SBATCH --mem 100         # memory pool for all cores
#SBATCH -t 0-2:00         # time (D-HH:MM)

srun helloworld.sh        # start program

srun run parallel jobs | interactive mode

After modulefiles are loaded and resources have been allocated, an application can be started on the assigned node with a preceding srun (run parallel jobs) or mpiexec (run an MPI program). In this shell window, more applications can be started.
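A minimal interactive session might look like this; the node count, time limit, and program name are hypothetical:

```shell
salloc -N 1 -t 0:30:00   # allocate one node interactively for 30 minutes
srun ./helloworld.sh     # run a program on the allocated node
exit                     # release the allocation
```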

process binding — constrains each process to run on specific processors

srun    --cpu_bind                  process binding to cores & CPUs
mpirun  --bind-to core|socket|none
        --cpus-per-proc <#perproc>  bind each process to the specified number of cpus
        --report-bindings           report any bindings for launched processes
        --slot-list <id>            list of processor IDs to be used for binding MPI processes

3 Accounting

sacct display accounting data

Syntax: sacct [options]

-b, --brief       display jobid, status, exitcode
-e, --helpformat  print a list of available fields
-o, --format      comma-separated list of fields

sacctmgr view Slurm account information

Syntax: sacctmgr [options] [command]

list | show  display information about the specified entity

4 Job Management

scancel cancel a job

Syntax: scancel <jobid>

-u <username>        cancel all the jobs of a user
-t PD -u <username>  cancel all the pending jobs of a user

sinfo view info about nodes and partitions

Syntax: sinfo [options]

-i <seconds>        print state on a periodic basis
-l, --long          print more detailed information
-n <nodes>          print info only about the specified node(s)
-p <partition>      print info about the specified partition
-R, --list-reasons  list reasons why nodes are in the down, drained, fail, or failing state
-s, --summarize     list only a partition state summary with no node state details

squeue view job info located in scheduling queue

Syntax: squeue [options]

-i <seconds>      report requested information every <seconds>
-j <job_id_list>  print a list of job IDs
-r                print one job array element per line
--start           report expected start time & resources to be allocated for pending jobs
-t <state_list>   print jobs in the specified states
-u <user_list>    print jobs from a list of users

scontrol view state of specified entity

Syntax: scontrol [options] [command] ENTITY_ID

options:
-d, --details   print more details with the show command
-o, --oneliner  print information one line per record

command <ENTITY_ID>:
hold <jobid>     pause a particular job
resume <jobid>   resume a particular job
requeue <jobid>  requeue (cancel & rerun) a particular job
suspend <jobid>  suspend a running job

scontrol show ENTITY_ID:
job <job_id>      print job information
node <name>       print node information
partition <name>  print partition information
reservation       print list of reservations


ISC STEM STUDENT DAY & STEM GALA

Purpose? HPC skills can positively shape STEM students' future careers; an introduction to the current & predicted HPC job landscape, and to how the European HPC workforce will look in 2020.

Audience? Undergraduate and graduate students pursuing STEM degrees

When? Wednesday, June 27; 9:30am – 9:30pm

What? | Where? Day Program & Evening Program in Frankfurt

Fee? Free admission for STEM Students

Registration? Registration will open in spring 2018, limited to 70 attendees (first come, first served).

Infos? Announcement www.isc-hpc.com


Day Program & Evening Program in Frankfurt

- Tutorial on HPC Applications, Systems & Programming Languages (Dr.-Ing. Bernd Mohr)
- Tutorial on Machine Learning & Data Analytics (Prof. Dr.-Ing. Morris Riedel)
- Guided Tour of the ISC Exhibition & Student Cluster Competition
- Keynote by Thomas Sterling at Room Konstant, Forum, Messe Frankfurt
- Welcome by Kim McMahon
- Introduction by Addison Snell
- Job Fair & Dinner at the Marriott Frankfurt


Why Attend the STEM Day?

- Science, technology, engineering, and mathematics rely on HPC.
- STEM degree programs do not include HPC courses in their curricula.
- HPC is the technological foundation of machine learning, AI, and the Internet of Things.
- Well-paying HPC-related jobs are not being filled due to a shortage of HPC skills.
- Depending on your skills, the salary ranges from $80K to $150K.
- A free introduction to HPC & its role in STEM careers.
- An introduction to the organizations that offer training in HPC.


Feedback

[Course evaluation questionnaire (Cluster Computing course, 14.06.2017): participants rate the total impression, content & targets, topicality, comprehensibility, relevance of content, practical relevance, handout, instructor competence, presentation, didactic structure, participant orientation, and equipment on a six-point scale from very good to very bad; mark which courses they attended or would join (UNIX, TOOLS, SHELL, CLUSTER, PYTHON, CPP, TOTALVIEW, MAKE, HPC, LIKWID, VAMPIR); judge course length, depth, and subject; state interest in follow-up Python courses (TDD, project development); answer open questions on what they liked most and least, suggestions, desired content, the exercises, and how they learned of the course; and give their affiliation (student, PhD student, employee), employer (Hessian universities, GSI, German federal research labs, MPI, FhG, other university, other institute/company), and faculty (Physics, Computer Science, Mathematics, Chemistry, Biology, Engineering, other).]


General Contact Information

Anja Gerbes, M. Sc. Computer Science
Physics Building, room 1.127

Thank you for your attention. Questions?

Tel.: +49 69 798-47356
Fax: +49 69 798-47360
E-Mail: [email protected]
Support: [email protected]
Software requests: [email protected]
HPC questions: [email protected]
Website: csc.uni-frankfurt.de

Public CSC meeting: every first Wednesday of the month at 10:00am in the physics building, room 1.101.


SLURM Glossary


sbatch: Submitting a Batch Script

- Only exclusive nodes are available on the LOEWE-CSC cluster.
- The following options are therefore not considered:

  --exclusive  exclusive nodes
  -s, --share  shared nodes

Anja Gerbes SLURM Glossary


Syntax: sbatch myBatchScript.sh

-C, --constraint=<feature>  specify features of a cluster

Intel  --constraint=intel20    Intel Ivy Bridge
Intel  --constraint=broadwell  Intel Broadwell
AMD    --constraint=dual       AMD Magny-Cours with AMD Radeon HD 5800
GPU    --partition=gpu         Intel with AMD FirePro S10000


-p, --partition=<partition>  specify partition for the resource
-J, --job-name=<jobname>     specify a name for the allocation
-w, --nodelist=<list>        specify a list of node names
-A, --account=<account>      select a project
-t, --time=<time>            set a limit on the total run time of the job
-a, --array=<indexes>        submit a job array
-o, --output=<name-%j>.out   save the output to a file
-e, --error=<name-%j>.err    save the error log to a file
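As a sketch, the output, error, and array options could be combined as follows; the job name, index range, program, and input naming are hypothetical:

```shell
#!/bin/bash
#SBATCH -J TestArray          # job name
#SBATCH -a 1-10               # job array with indices 1..10
#SBATCH -o TestArray-%j.out   # output file, %j = job ID
#SBATCH -e TestArray-%j.err   # error log
#SBATCH -t 0:10:00            # run-time limit

# each array element processes its own input file
./my-prog "input.${SLURM_ARRAY_TASK_ID}"
```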


-N, --nodes=<min[-max]>      number of nodes requested
                             How many nodes will be allocated to this job?

-n, --ntasks=<number>        total number of processes
--ntasks-per-node            ... per node
--ntasks-per-socket          ... per socket
--ntasks-per-core            ... per core
                             How many processes are started?

-c, --cpus-per-task=<ncpus>  number of processors per task
                             How many threads run on the node?
                             Controls the number of CPUs allocated to the task.

--tasks-per-node             processes per node
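The interplay of these options can be sketched as follows: 2 nodes × 4 tasks per node = 8 tasks, each with 6 CPUs for its threads (all values hypothetical):

```shell
#!/bin/bash
#SBATCH --nodes=2             # 2 nodes
#SBATCH --ntasks=8            # 8 processes in total
#SBATCH --ntasks-per-node=4   # 4 processes on each node
#SBATCH --cpus-per-task=6     # 6 threads per process

# each process picks up its thread count from SLURM
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./hybrid-prog            # hybrid binary as in the listings above
```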


--mem=<MB>          sets the total memory across all cores (per node)
--mem-per-cpu=<MB>  sets the value for each requested core
--mem_bind=<type>   bind tasks to memory


--mail-user=<email>  send an e-mail
--mail-type=<mode>   status information

Choose a mail type: BEGIN, END, FAIL, REQUEUE, and ALL.
You receive an e-mail notification informing you about status changes of the job.
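For example (the address is a placeholder):

```shell
#SBATCH [email protected]   # placeholder address
#SBATCH --mail-type=END,FAIL          # notify when the job ends or fails
```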


-m, --distribution=<block|cyclic|arbitrary|plane>  mapping of processes

The tasks are distributed as follows:

block      they share a node
cyclic     they are distributed over consecutive nodes (round-robin concept)
arbitrary  as given in the environment variable SLURM_HOSTFILE
plane      in blocks of a predetermined size


Process Binding: Constrains each Process to run on specific Processors

srun    --cpu_bind                  process binding to cores & CPUs
mpirun  --bind-to core|socket|none
        --cpus-per-proc <#perproc>  bind each process to the specified number of cpus
        --report-bindings           report any bindings for launched processes
        --slot-list <id>            list of processor IDs to be used for binding MPI processes
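A brief sketch of both variants (the program name is a placeholder):

```shell
# SLURM: bind each task to a core
srun --cpu_bind=cores ./a.out

# Open MPI: bind each process to a core and print the resulting bindings
mpirun --bind-to core --report-bindings ./a.out
```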


sacct: Displaying Accounting Data for all Jobs and Job Steps

Syntax: sacct [options]

-b, --brief       display jobid, status, exitcode
-e, --helpformat  print a list of available fields
-o, --format      comma-separated list of fields
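For example:

```shell
sacct -b                                        # brief: jobid, status, exitcode
sacct -o jobid,jobname,elapsed,state,exitcode   # custom field list (see sacct -e)
```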


sacctmgr: Showing Slurm Account Information

Syntax: sacctmgr [options] [command]

list | show display information about the specified entity

sacctmgr list QOS partition format=maxnodes,maxnodesperuser,maxjobsperuser,maxsubmitjobsperuser

In addition to QOS, there are the following entities: Account, Association, Cluster, Configuration, Event, Problem, Transaction, User & WCKey.


scancel: Deleting Your Own Batch Jobs

Syntax: scancel <jobid>

-u <username>        cancel all the jobs of a user
-t PD -u <username>  cancel all the pending jobs of a user
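For example (the job ID is a placeholder):

```shell
scancel 12345             # cancel one job
scancel -u "$USER"        # cancel all of your jobs
scancel -t PD -u "$USER"  # cancel only your pending jobs
```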


sinfo: Showing Information about Nodes and Partitions

Syntax: sinfo [options]

-l, --long          print more detailed information
-n <nodes>          print info only about the specified node(s)
-p <partition>      print info about the specified partition
-R, --list-reasons  list reasons why nodes are in the down, drained, fail, or failing state
-s, --summarize     list only a partition state summary with no node state details

sinfo -p <partition>
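For example (partition name as on LOEWE):

```shell
sinfo -s -p parallel   # summary of the parallel partition
sinfo -R               # reasons for down or drained nodes
```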


squeue: Querying the List of Pending and Running Jobs

Syntax: squeue [options]

-i <seconds>      report requested information every <seconds>
-j <job_id_list>  print a list of job IDs
--start           report expected start time & resources to be allocated for pending jobs
-t <state_list>   print jobs in the specified states
-u <user_list>    print jobs from a list of users
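For example:

```shell
squeue -u "$USER" -i 10          # your jobs, refreshed every 10 seconds
squeue -u "$USER" -t PD --start  # expected start of your pending jobs
```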


scontrol: Showing Detailed Information about Compute Nodes

Syntax: scontrol [options] [command]

-d, --details   additional information with the show command
-o, --oneliner  all information of a record on one line


Syntax: scontrol show [options] [command]

job <job_id>      print job information
node <name>       print node information
partition <name>  print partition information
reservation       print list of reservations

scontrol show partition

scontrol -d show job <job_id>


scontrol: Controlling Jobs (hold | resume | requeue)

Syntax: scontrol hold|resume|requeue

hold <job_id>     pause a particular job
resume <job_id>   resume a particular job
requeue <job_id>  requeue (cancel & rerun) a particular job
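For example (the job ID is a placeholder):

```shell
scontrol hold 12345      # pause a pending job
scontrol requeue 12345   # cancel & rerun the job
```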
