
INTRODUCTION TO HIGH PERFORMANCE COMPUTING

Course material:

http://rcg.group.shef.ac.uk/courses/hpcintro/

GETTING STARTED

Getting an Account

Before you can start using Bessemer you need to register for an account.

Students can also have an account on Bessemer with the permission of their supervisors.

Accounts are available by emailing helpdesk@sheffield.ac.uk

Connecting to Bessemer: Windows - PuTTY / MobaXterm

Download and install PuTTY, MobaXterm, or another SSH client.

Hostname: <username>@bessemer.shef.ac.uk

Connecting to Bessemer: Linux / macOS

Linux and macOS both have a terminal emulator pre-installed.

Once you have a terminal open, run one of the following commands:

ssh -X <username>@bessemer.shef.ac.uk    # X11 forwarding
ssh -Y <username>@bessemer.shef.ac.uk    # trusted X11 forwarding

where you replace <username> with your CICS username.

Connecting to the Training Cluster

Host: traininghpc
Username: Muse username
Password: Muse password

Connecting to ShARC: Platform independent

Open a browser and type:

https://myapps.shef.ac.uk

Log in with your university account.

Click on Connect via myAPPs Portal and log in.

INTRODUCTION

A supercomputer is a computer with a high level of computing performance compared to a general-purpose computer.

Bessemer specifications:
• CPU cores: 1040
• Memory: 5184 GiB
• GPUs: 4
• Storage: 460 TiB

Machine:
• Dell PowerEdge C6420

Central Processing Units:
• 2 x Intel Xeon Gold 6138 @ 2.00 GHz

Memory:
• 192 GB, 2666 MHz, DDR4

General CPU node specifications
25 nodes are publicly available.

Operating System:
• CentOS 7.x
• Interactive and batch job scheduling software: Slurm
• Many applications, compilers, libraries and parallel processing tools

[Cluster diagram: login nodes #1 and #2 and shared user file storage, fronting worker nodes #1 to #25]

Two Bessemer head-nodes are gateways to the cluster of worker nodes.

The head-nodes' main purpose is to provide access to the worker nodes, NOT to run CPU-intensive programs.

All CPU-intensive computations must be performed on the worker nodes. This is achieved by running:

srun --pty bash -i
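A hypothetical session might look like this (hostnames and prompts are illustrative):

[user@bessemer-login1 ~]$ srun --pty bash -i
[user@bessemer-node001 ~]$ hostname    # confirm you are now on a worker node
bessemer-node001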

RUNNING SIMPLE PROGRAMS

Setting up your software development environment

You can set up your software environment for a job with the command

module

All the available software environments can be listed by using

module avail

You can then select the ones you wish to use by using

module add

Using modules:
• List loaded modules: module list
• Available modules: module avail
• Load a module: module add
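As a minimal sketch (the R module name is taken from a later slide in this course):

module avail                      # list all available software environments
module add apps/R/3.6.1/binary    # load R 3.6.1
module list                       # confirm which modules are loaded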

Write a simple “Hello World!” application and run it!

Demonstration
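A minimal sketch of the demonstration in the shell, run from an interactive session (the Python module name is taken from a later slide; the file name is illustrative):

echo 'print("Hello World!")' > hello.py    # create a one-line Python program
module add apps/python/3.6/binary          # load a Python environment
python hello.py                            # prints: Hello World!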

PRACTICE SESSION

Start an interactive session on the training machine using:

srun --pty bash -i

On the LOGIN NODE, extract the course examples:

tar -xvf /usr/local/courses/hpc_intro_long.tgz    # x: extract, v: verbose, f: archive file

We are studying inflammation in patients who have been given a new treatment for arthritis, and we need to analyse the first set of inflammation data. The data sets are held in comma-separated values (CSV) format. Each row holds the observations for one patient; each column holds the inflammation measured on one day.

For this practice session we will run the R application. The latest version of R can be loaded with:

module load apps/R/3.6.1/binary

Change to the 'hpc_intro_long/data' directory and run R with:

$ R

From the R session you can run a series of commands to plot the inflammation data:

dat <- read.csv(file = "inflammation-01.csv", header = FALSE)  # one row per patient, one column per day
avg_day_inflammation <- apply(dat, 2, mean)                    # mean inflammation across patients for each day
plot(avg_day_inflammation)                                     # plot the daily means

To exit R, type:

q()

MANAGING YOUR JOBS

Slurm is the resource management, job scheduling and batch control system. It:

• Starts up interactive jobs on available workers
• Schedules all batch-oriented (i.e. non-interactive) jobs
• Attempts to create a fair-share environment
• Optimises resource utilisation

Difference between interactive and non-interactive jobs

Until now, you have used interactive jobs. However, there are certain facts that cannot be ignored:

• Maximum time limit for interactive jobs is 8 hours.

• You must keep your connection alive!

This makes it inconvenient or impossible to solve time-consuming problems.

Solution? Non-interactive jobs

NON-INTERACTIVE JOBS

1) Write a job-submission shell script

You can submit your job using a shell script. A general job-submission shell script contains the "shebang" line in the first row:

#!/bin/bash

2) Next you may specify some options, such as memory limit.

#SBATCH --"OPTION"="VALUE"

3) Load the appropriate modules if necessary.

module load "MODULE NAME"

4) Run your program by using the Slurm “srun” command.

srun "PROGRAM"

Save the script (“submission.sh”) and use

sbatch submission.sh

Note the job submission number. For example:

Submitted batch job 1226

Check your output file when the job is finished.

cat "JOB_NAME"-1226.out

JOB SUBMISSION

Jobs typically pass through several states in the course of their execution. The typical states are PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED.

Display the job queue.

squeue

Shows job details:

sacct -v

Deletes job from queue:

scancel "JOB_ID"

Managing Jobs: monitoring and controlling your jobs

Additional options for job submission

Name your submission:

#SBATCH --comment=test_job

Specify nodes and tasks for MPI jobs:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16

Memory allocation (in MB by default):

#SBATCH --mem=16000

Additional options for job submission

Specify the output file name:

#SBATCH --output=output.%j.test.out

Request run time (HH:MM:SS):

#SBATCH --time=00:30:00

Email notification:

#SBATCH --mail-user=username@sheffield.ac.uk

For the full list of the available options please visit the Slurm manual webpage at https://slurm.schedmd.com/pdfs/summary.pdf.

EXAMPLE

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=64000
#SBATCH --mail-user=username@sheffield.ac.uk

module load OpenMPI/3.1.3-GCC-8.2.0-2.31.1

srun program

Maximum 40 cores can be requested per node in the general use queues.

DEMONSTRATION

Write a single script!

#!/bin/bash
module add apps/python/3.6/binary
srun python hello.py

You simply type:

sbatch myjob.sh

PRACTICE SESSION

Change directory to the r folder of the course examples.

Inspect the script file rslurm.sh and check that it will execute the R job for computing the means of the inflammation data sets.

Submit your R job using the command:

sbatch rslurm.sh

JOB ARRAY

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily.

All jobs in the array must have the same initial options (e.g. size, time limit, etc.).

Job arrays are only supported for batch jobs, and the array index values are specified using the --array or -a option of the sbatch command:

#SBATCH --array=0-4

The option argument can be specific array index values, a range of index values, and an optional step size, as shown below.
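For example, all of the following are valid index specifications:

#SBATCH --array=1,3,5     # specific index values
#SBATCH --array=0-15      # a range of index values
#SBATCH --array=0-15:4    # a range with a step size of 4 (indices 0, 4, 8, 12)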

JOB ARRAY

Job ID and Environment Variables

Job arrays will have two additional environment variables set.

SLURM_ARRAY_JOB_ID will be set to the first job ID of the array.
SLURM_ARRAY_TASK_ID will be set to the job array index value.

srun ./fish < fish${SLURM_ARRAY_TASK_ID}.in > fish${SLURM_ARRAY_TASK_ID}.out
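As a quick hypothetical check, a task inside the array could print its own identifiers:

srun echo "array job ${SLURM_ARRAY_JOB_ID}, task ${SLURM_ARRAY_TASK_ID}"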

Submitting a Job Array

Job submission script (named submit.sh):

#!/bin/bash
#SBATCH --array=0-4
srun ./fish < fish${SLURM_ARRAY_TASK_ID}.in > fish${SLURM_ARRAY_TASK_ID}.out

Job submission:

sbatch submit.sh
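With --array=0-4 this submits five tasks. If sbatch reports "Submitted batch job 1226", the tasks appear in the queue as 1226_0 to 1226_4, and task 3, for example, reads fish3.in and writes fish3.out.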

Getting help

• Web site
  - http://www.shef.ac.uk/cics/research
• Iceberg Documentation
  - http://www.sheffield.ac.uk/cics/research/hpc/iceberg
• Training (also uses the learning management system)
  - http://www.shef.ac.uk/cics/research/training
• Discussion Group (based on Google Groups)
  - https://groups.google.com/a/sheffield.ac.uk/forum/?hl=en-GB#!forum/hpc
• E-mail the group: hpc@sheffield.ac.uk
• Help on Google Groups
  - http://www.sheffield.ac.uk/cics/groups
• Contacts
  - research-it@sheffield.ac.uk
