

Introduction to HPC Workshop

October 9 2014

Introduction

Rob Lane

HPC Support

Research Computing Services

CUIT

Introduction

HPC Basics

Introduction

First HPC Workshop

Yeti

• 2 head nodes

• 101 execute nodes

• 200 TB storage

Yeti

• 101 execute nodes
  – 38 x 64 GB
  – 8 x 128 GB
  – 35 x 256 GB
  – 16 x 64 GB + Infiniband
  – 4 x 64 GB + nVidia K20 GPU

Yeti

• CPU
  – Intel E5-2650L
  – 1.8 GHz
  – 8 cores
  – 2 per execute node

Yeti

• Expansion Round
  – 66 new systems
  – Faster CPU
  – More Infiniband
  – More GPU (nVidia K40)
  – ETA January 2015

Yeti

HP S6500 Chassis

HP SL230 Server

Job Scheduler

• Manages the cluster

• Decides when a job will run

• Decides where a job will run

• We use Torque/Moab

Job Queues

• Jobs are submitted to a queue

• Jobs sorted in priority order

• Not a FIFO
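Because the scheduler orders jobs by priority rather than arrival time, the queue the scheduler sees is the authoritative one. A quick way to inspect it, assuming the Moab client tools are on your path (standard Moab commands, not shown on the slides):

$ showq              # all jobs known to the scheduler, in priority order
$ showq -u hpc2108   # only jobs belonging to that user (replace with your UNI)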

Access

Mac Instructions

1. Run terminal

Access

Windows Instructions

1. Search for putty on Columbia home page

2. Select first result

3. Follow link to Putty download page

4. Download putty.exe

5. Run putty.exe

Access

Mac (Terminal)

$ ssh UNI@yetisubmit.cc.columbia.edu

Windows (Putty)

Host Name: yetisubmit.cc.columbia.edu

Work Directory

$ cd /vega/free/users/your UNI

• Replace “your UNI” with your UNI

$ cd /vega/free/users/hpc2108

Copy Workshop Files

• Files are in /tmp/workshop

$ cp /tmp/workshop/* .

Editing

No single obvious choice for editor

• vi – simple but difficult at first
• emacs – powerful but complex
• nano – simple but not really standard

nano

$ nano hellosubmit

“^” means “hold down control”

^a: go to beginning of line

^e: go to end of line

^k: delete line

^o: save file

^x: exit

hellosubmit

#!/bin/sh

# Directives

#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

# Set output and error directories

#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

# Print "Hello World"

echo "Hello World"

# Sleep for 10 seconds

sleep 10

# Print date and time

date
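As a condensed reference, here is the same script with a comment on each line. The annotations are mine and follow standard Torque/PBS option meanings; they are not part of the original file.

#!/bin/sh
#PBS -N HelloWorld                                 # job name, used for the output file names
#PBS -W group_list=yetifree                        # group/account the job runs under
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb   # 1 node, 1 core, 1 minute, 20 MB of memory
#PBS -M UNI@columbia.edu                           # where job email is sent (replace UNI)
#PBS -m abe                                        # mail on abort, begin, end ("-m n" sends no mail)
#PBS -V                                            # export your current environment to the job
#PBS -o localhost:/vega/free/users/UNI             # directory for standard output
#PBS -e localhost:/vega/free/users/UNI             # directory for standard error

echo "Hello World"   # the job's actual work
sleep 10             # pause so the job stays visible in qstat for a moment
date                 # print date and time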


hellosubmit

$ qsub hellosubmit
298151.elk.cc.columbia.edu

qstat

$ qsub hellosubmit
298151.elk.cc.columbia.edu
$ qstat 298151
Job ID          Name          User        Time Use S Queue
--------------- ------------  ----------  -------- - ------
298151.elk      HelloWorld    hpc2108            0 Q batch1


$ qstat 298151
qstat: Unknown Job Id Error 298151.elk.cc.columbia.edu

Once the job has finished, qstat no longer reports it.
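If you need to stop a job that is still queued or running, Torque's qdel takes the job ID that qsub printed (standard Torque usage, not covered on the slides):

$ qdel 298151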

hellosubmit

$ ls -l
total 4
-rw------- 1 hpc2108 yetifree 398 Oct  8 22:13 hellosubmit
-rw------- 1 hpc2108 yetifree   0 Oct  8 22:44 HelloWorld.e298151
-rw------- 1 hpc2108 yetifree  41 Oct  8 22:44 HelloWorld.o298151


hellosubmit

$ cat HelloWorld.o298151
Hello World
Thu Oct 9 12:44:05 EDT 2014


Any Questions?

Interactive

• Most jobs run as “batch”
• Can also run interactive jobs
• Get a shell on an execute node
• Useful for development, testing, troubleshooting

Interactive

$ cat interactive
qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb


Interactive

$ qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb
qsub: waiting for job 298158.elk.cc.columbia.edu to start

Interactive

qsub: job 298158.elk.cc.columbia.edu ready

[ASCII-art Yeti banner]

+--------------------------------+
|                                |
| You are in an interactive job. |
|                                |
| Your walltime is 00:05:00      |
|                                |
+--------------------------------+

Interactive

$ hostname
charleston.cc.columbia.edu

Interactive

$ exit
logout

qsub: job 298158.elk.cc.columbia.edu completed
$

GUI

• Can run GUIs in interactive jobs

• Need X Server on your local system

• See user documentation for more information
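A typical pattern, assuming your local X server is running and that X11 forwarding is permitted on the cluster (check the Yeti user documentation; this example is not from the slides), is to forward X through ssh and then request X forwarding for the interactive job:

$ ssh -X UNI@yetisubmit.cc.columbia.edu
$ qsub -I -X -W group_list=yetifree -l walltime=5:00,mem=100mb
$ xterm    # any X program started in the job now displays on your local screen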

User Documentation

• hpc.cc.columbia.edu

• Go to “HPC Support”

• Click on Yeti user documentation

Job Queues

• Scheduler puts all jobs into a queue

• Queue selected automatically

• Queues have different settings

Queue         Time Limit   Memory Limit   Max. User Run
Batch 1       12 hours     4 GB           512
Batch 2       12 hours     16 GB          128
Batch 3       5 days       16 GB          64
Batch 4       3 days       None           8
Interactive   4 hours      None           4

Job Queues

qstat -q

$ qstat -q

server: elk.cc.columbia.edu

Queue            Memory CPU Time Walltime Node  Run  Que Lm  State
---------------- ------ -------- -------- ---- ---- ---- --  -----
batch1              4gb       -- 12:00:00   --   42   15 --   E R
batch2             16gb       -- 12:00:00   --  129   73 --   E R
batch3             16gb       -- 120:00:0   --  148  261 --   E R
batch4               --       -- 72:00:00   --   11   12 --   E R
interactive          --       -- 04:00:00   --    0    1 --   E R
interlong            --       -- 48:00:00   --    0    0 --   E R
route                --       --       --   --    0    0 --   E R
                                               ---- ----
                                                330  362
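To look at your own jobs rather than a per-queue summary (standard qstat options, not shown on the slides):

$ qstat -u hpc2108   # one line per job belonging to that user (replace with your UNI)
$ qstat -f 298151    # full details for a single job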

yetifree

• Maximum processors limited
  – Currently 4 maximum

• Storage quota
  – 16 GB

• No email support

yetifree

$ quota -s
Disk quotas for user hpc2108 (uid 242275):
     Filesystem                                blocks   quota   limit   grace   files   quota   limit   grace
hpc-cuit-storage-2.cc.columbia.edu:/free/        122M  16384M  16384M               8   4295m   4295m


email

from: root <hpc-noreply@columbia.edu>
to: hpc2108@columbia.edu
date: Wed, Oct 8, 2014 at 11:41 PM
subject: PBS JOB 298161.elk.cc.columbia.edu

PBS Job Id: 298161.elk.cc.columbia.edu
Job Name: HelloWorld
Exec host: dublin.cc.columbia.edu/4
Execution terminated
Exit_status=0
resources_used.cput=00:00:02
resources_used.mem=8288kb
resources_used.vmem=304780kb
resources_used.walltime=00:02:02
Error_Path: localhost:/vega/free/users/hpc2108/HelloWorld.e298161
Output_Path: localhost:/vega/free/users/hpc2108/HelloWorld.o298161


Intern

• Research Computing Services (RCS) is looking for an intern

• Paid position
• ~10 hours a week
• Will be on LionShare next week

MPI

• Message Passing Interface

• Allows applications to run across multiple computers

MPI

• Edit MPI submit file

• Load MPI environment module

• Compile sample program

MPI

#!/bin/sh

# Directives

#PBS -N MpiHello
#PBS -W group_list=yetifree
#PBS -l nodes=3:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

# Set output and error directories

#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

# Load mpi module.

module load openmpi

# Run mpi program.

mpirun mpihello

MPI

$ module load openmpi
$ which mpicc
/usr/local/openmpi/bin/mpicc
$ mpicc -o mpihello mpihello.c

MPI

$ qsub mpisubmit
298501.elk.cc.columbia.edu
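A quick sanity check of my own (not on the slides): inside an interactive job with the openmpi module loaded, or as an extra line in the submit script, run a trivial command through mpirun. It launches once per allocated core, so nodes=3:ppn=1 should print three hostnames.

$ mpirun hostname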

Questions?

Any questions?
