Advanced High Performance Computing Workshop HPC 201
Charles J Antonelli, Seth Meyer, LSAIT ARS, June 2014


DESCRIPTION

Advanced High Performance Computing Workshop HPC 201. Charles J Antonelli and Seth Meyer, LSAIT ARS, June 2014. Roadmap: Flux review; Globus Connect; advanced PBS (array & dependent scheduling); tools; GPUs on Flux; scientific applications (R, Python, MATLAB); parallel programming.

TRANSCRIPT

Page 1: Advanced High  Performance Computing Workshop HPC  201

Advanced High Performance Computing Workshop HPC 201

Charles J Antonelli
Seth Meyer
LSAIT ARS
June 2014

Page 2: Advanced High  Performance Computing Workshop HPC  201

Roadmap
• Flux review
• Globus Connect
• Advanced PBS
  • Array & dependent scheduling
• Tools
• GPUs on Flux
• Scientific applications
  • R, Python, MATLAB
• Parallel programming
• Debugging & profiling

Page 3: Advanced High  Performance Computing Workshop HPC  201

Flux review

Page 4: Advanced High  Performance Computing Workshop HPC  201

The Flux cluster
• Login nodes
• Compute nodes
• Data transfer node
• Storage…

Page 5: Advanced High  Performance Computing Workshop HPC  201

A Flux node
• 12-40 Intel cores
• 48 GB – 1 TB RAM
• Local disk
• 8 GPUs (GPU Flux), each containing 2,688 GPU cores

Page 6: Advanced High  Performance Computing Workshop HPC  201

Programming Models

Two basic parallel programming models:

Message passing
• The application consists of several processes running on different nodes and communicating with each other over the network
• Used when the data are too large to fit on a single node, and simple synchronization is adequate
• "Coarse parallelism" or "SPMD"
• Implemented using MPI (Message Passing Interface) libraries

Multi-threaded
• The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives
• Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable
• "Fine-grained parallelism" or "shared-memory parallelism"
• Implemented using OpenMP (Open Multi-Processing) compilers and libraries

Both
• The two models can be combined in a single application
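To make the two models concrete, here is a minimal sketch (not from the original slides) of how each is typically built and run on Flux. The program names are hypothetical, and the OpenMP flag shown is the Intel compiler's -openmp (gcc users would use -fopenmp):

  # Message passing (MPI): many processes, possibly spread across several nodes
  mpicc mpi_program.c -o mpi_program   # compile and link against the MPI library
  mpirun ./mpi_program                 # inside a PBS job, mpirun starts one rank
                                       # per core listed in $PBS_NODEFILE

  # Multi-threaded (OpenMP): one process, many threads, on a single node
  icc -openmp omp_program.c -o omp_program
  export OMP_NUM_THREADS=12            # match the ppn= value requested from PBS
  ./omp_program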

Page 7: Advanced High  Performance Computing Workshop HPC  201

Using Flux
Three basic requirements:
• A Flux login account
• A Flux allocation
• An MToken (or a Software Token)

Logging in to Flux:
  ssh flux-login.engin.umich.edu
• Works directly from campus wired or MWireless networks
• Otherwise: use the VPN, or ssh login.itd.umich.edu first (see the example below)
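For example, the off-campus two-hop path looks like this (youruniqname is a placeholder):

  ssh [email protected]    # first hop: campus login service
  ssh flux-login.engin.umich.edu           # second hop: Flux login node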

Page 8: Advanced High  Performance Computing Workshop HPC  201

Cluster batch workflow
• You create a batch script and submit it to PBS
• PBS schedules your job, and it enters the flux queue
• When its turn arrives, your job executes the batch script
• Your script has access to all Flux applications and data
• When your script completes, anything it sent to standard output and error is saved in files stored in your submission directory
• You can ask that email be sent to you when your job starts, ends, or aborts
• You can check on the status of your job at any time, or delete it if it's not doing what you want
• A short time after your job completes, it disappears from PBS
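A hedged sketch of that life cycle from the command line; the script name is made up, and qstat and qdel are standard Torque commands:

  qsub myjob.pbs          # submit the batch script; PBS prints the new job's ID
  qstat -u youruniqname   # check the status of all your jobs
  qdel 1234567            # delete a job (by ID) if it's not doing what you want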

Page 9: Advanced High  Performance Computing Workshop HPC  201

Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01

Page 10: Advanced High  Performance Computing Workshop HPC  201

Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=4gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script

Page 11: Advanced High  Performance Computing Workshop HPC  201

GPU batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:gpus=1,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
matlab -nodisplay -r gpuscript
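Two quick checks you could add inside a GPU job to confirm what was assigned (not part of the original script; $PBS_GPUFILE is Torque's GPU counterpart of $PBS_NODEFILE):

  nvidia-smi           # show the GPU(s) visible to this job
  cat $PBS_GPUFILE     # list the GPU(s) Torque assigned to the job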

Page 12: Advanced High  Performance Computing Workshop HPC  201

Copying data
Three ways to copy data to/from Flux:

• Use scp from the login server:
    scp flux-login.engin.umich.edu:hpc201cja/example563.png .
• Use scp from the transfer host:
    scp flux-xfer.engin.umich.edu:hpc201cja/example563.png .
• Use Globus Online
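Copying in the other direction simply reverses the arguments; the local file name here is only an example:

  scp mydata.tar.gz flux-xfer.engin.umich.edu:hpc201cja/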

Page 13: Advanced High  Performance Computing Workshop HPC  201

Globus Online

Features
• High-speed data transfer, much faster than SCP or SFTP
• Reliable & persistent
• Minimal client software: Mac OS X, Linux, Windows

GridFTP endpoints
• Gateways through which data flow
• Exist for XSEDE, OSG, …
• UMich: umich#flux, umich#nyx
• Add your own client endpoint!
• Add your own server endpoint: contact [email protected]

More information
http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp

Page 14: Advanced High  Performance Computing Workshop HPC  201

Job Arrays
• Submit copies of identical jobs
• Invoked via qsub -t:
    qsub -t array-spec pbsbatch.txt
  where array-spec can be
    m-n
    a,b,c
    m-n%slotlimit
  e.g. qsub -t 1-50%10 submits fifty jobs, numbered 1 through 50, of which only ten can run simultaneously
• $PBS_ARRAYID records the array identifier (see the sketch below)
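A minimal sketch of an array-aware batch script; the program name and the input-file naming scheme are hypothetical:

  #PBS -N arrayexample
  #PBS -A youralloc_flux
  #PBS -l qos=flux
  #PBS -q flux
  #PBS -l procs=1,walltime=00:10:00
  #PBS -j oe

  # Each task of 'qsub -t 1-50%10 arrayexample.pbs' runs this same script,
  # but with its own value of $PBS_ARRAYID (1 through 50)
  cd $PBS_O_WORKDIR
  ./myprogram input.$PBS_ARRAYID > output.$PBS_ARRAYID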

Page 15: Advanced High  Performance Computing Workshop HPC  201

Dependent scheduling
• Submit jobs whose scheduling depends on other jobs
• Invoked via qsub -W:
    qsub -W depend=type:jobid[:jobid]…
  where type can be
    after        Schedule this job after jobids have started
    afterany     Schedule this job after jobids have finished
    afterok      Schedule this job after jobids have finished with no errors
    afternotok   Schedule this job after jobids have finished with errors
    before       jobids scheduled after this job starts
    beforeany    jobids scheduled after this job completes
    beforeok     jobids scheduled after this job completes with no errors
    beforenotok  jobids scheduled after this job completes with errors
  (a worked example follows)
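For example, a hedged sketch of chaining a post-processing job after a main job (the script names are made up):

  # qsub prints the new job's ID on standard output; capture it so that
  # postprocess.pbs runs only if main.pbs finishes without errors
  JOBID=$(qsub main.pbs)
  qsub -W depend=afterok:$JOBID postprocess.pbs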

Page 16: Advanced High  Performance Computing Workshop HPC  201

Troubleshooting
showq [-r][-i][-b][-w user=uniq]   # running/idle/blocked jobs
qstat -f jobno                     # full info, incl. gpu
qstat -n jobno                     # nodes/cores where job running
diagnose -p                        # job prio and components
pbsnodes                           # nodes, states, properties
pbsnodes -l                        # list nodes marked down
checkjob [-v] jobno                # why job jobno not running
mdiag -a                           # allocs & users (flux)
freenodes                          # aggregate node/core busy/free
mdiag -u uniq                      # allocs for uniq (flux)
mdiag -a alloc_flux                # cores active, alloc (flux)

Page 17: Advanced High  Performance Computing Workshop HPC  201

Scientific applications

Page 18: Advanced High  Performance Computing Workshop HPC  201

Scientific Applications
• R (incl. snow and multicore)
• R with GPU (GpuLm, dist)
• SAS, Stata
• Python, SciPy, NumPy, BioPy
• MATLAB with GPU
• CUDA Overview
• CUDA C (matrix multiply)

Page 19: Advanced High  Performance Computing Workshop HPC  201

Python
Python software available on Flux:

EPD
  The Enthought Python Distribution provides scientists with a comprehensive set of tools to perform rigorous data analysis and visualization.
  https://www.enthought.com/products/epd/
biopython
  Python tools for computational molecular biology
  http://biopython.org/wiki/Main_Page
numpy
  Fundamental package for scientific computing
  http://www.numpy.org/
scipy
  Python-based ecosystem of open-source software for mathematics, science, and engineering
  http://www.scipy.org/
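These stacks are reached through environment modules; the module name below is an assumption, so check module avail first:

  module avail python    # see which Python-related modules are installed
  module load python     # load one (the exact name varies by site and version)
  python -c 'import numpy, scipy; print(numpy.__version__)'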

Page 20: Advanced High  Performance Computing Workshop HPC  201

Debugging & profiling

Page 21: Advanced High  Performance Computing Workshop HPC  201

Debugging with GDB
• Command-line debugger
• Start programs or attach to running programs
• Display source program lines
• Display and change variables or memory
• Plant breakpoints, watchpoints
• Examine stack frames

Excellent tutorial documentation:
http://www.gnu.org/s/gdb/documentation/

Page 22: Advanced High  Performance Computing Workshop HPC  201

Compiling for GDB
Debugging is easier if you ask the compiler to generate extra source-level debugging information: add the -g flag to your compilation,
  icc -g serialprogram.c -o serialprogram
or
  mpicc -g mpiprogram.c -o mpiprogram

GDB will work without symbols, but you need to be fluent in machine instructions and hexadecimal.

Be careful using -O with -g:
• Some compilers won't optimize code when debugging
• Most will, but you sometimes won't recognize the resulting source code at optimization level -O2 and higher

Page 23: Advanced High  Performance Computing Workshop HPC  201

Running GDB
Two ways to invoke GDB:

Debugging a serial program:
  gdb ./serialprogram

Debugging an MPI program:
  mpirun -np N xterm -e gdb ./mpiprogram
This gives you N separate GDB sessions, each debugging one rank of the program.

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start xterms there.

Page 24: Advanced High  Performance Computing Workshop HPC  201

Useful GDB commands
gdb exec              start gdb on executable exec
gdb exec core         start gdb on executable exec with core file core
l [m,n]               list source
disas                 disassemble function enclosing current instruction
disas func            disassemble function func
b func                set breakpoint at entry to func
b line#               set breakpoint at source line#
b *0xaddr             set breakpoint at address addr
i b                   show breakpoints
d bp#                 delete breakpoint bp#
r [args]              run program with optional args
bt                    show stack backtrace
c                     continue execution from breakpoint
step                  single-step one source line
next                  single-step, don't step into functions
stepi                 single-step one instruction
p var                 display contents of variable var
p *var                display value pointed to by var
p &var                display address of var
p arr[idx]            display element idx of array arr
x 0xaddr              display hex word at addr
x *0xaddr             display hex word pointed to by addr
x/20x 0xaddr          display 20 words in hex starting at addr
i r                   display registers
i r ebp               display register ebp
set var = expression  set variable var to expression
q                     quit gdb
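An illustrative session tying a few of these commands together (the program and variable names are made up):

  gdb ./serialprogram
  (gdb) b main          # stop at the entry to main
  (gdb) r               # run the program
  (gdb) next            # step over one source line
  (gdb) p myvar         # print a variable
  (gdb) bt              # show the stack backtrace
  (gdb) c               # continue to the next breakpoint or to completion
  (gdb) q               # quit gdb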

Page 25: Advanced High  Performance Computing Workshop HPC  201

Debugging with DDT
Allinea's Distributed Debugging Tool is a comprehensive graphical debugger designed for the complex task of debugging parallel code. Advantages include:
• Provides a GUI interface to debugging, with capabilities similar to, e.g., Eclipse or Visual Studio
• Supports parallel debugging of MPI programs
• Scales much better than GDB

Page 26: Advanced High  Performance Computing Workshop HPC  201

Running DDT
Compile with -g:
  mpicc -g mpiprogram.c -o mpiprogram

Load the DDT module:
  module load ddt

Start DDT:
  ddt mpiprogram

This starts a DDT session, debugging all ranks concurrently.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start ddt there.

http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
http://content.allinea.com/downloads/userguide.pdf

Page 27: Advanced High  Performance Computing Workshop HPC  201

Application Profiling with MAP
Allinea's MAP tool is a statistical application profiler designed for the complex task of profiling parallel code. Advantages include:
• Provides a GUI interface to profiling; observe cumulative results, drill down for details
• Supports parallel profiling of MPI programs
• Handles most of the details under the covers

Page 28: Advanced High  Performance Computing Workshop HPC  201

Running MAP
Compile with -g:
  mpicc -g mpiprogram.c -o mpiprogram

Load the MAP module (MAP is provided by the same Allinea module as DDT):
  module load ddt

Start MAP:
  map mpiprogram

This starts a MAP session: it runs your program, gathers profile data, and displays summary statistics.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start MAP there.

http://content.allinea.com/downloads/userguide.pdf

Page 29: Advanced High  Performance Computing Workshop HPC  201

Resources
• Supported Flux software
  http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
• U-M Advanced Research Computing Flux pages
  http://arc.research.umich.edu/resources-services/flux/
• CAEN HPC Flux pages
  http://cac.engin.umich.edu/
• CAEN HPC YouTube channel
  http://www.youtube.com/user/UMCoECAC
• For assistance: [email protected]
  Read by a team of people including unit support staff
  Cannot help with programming questions, but can help with operational Flux and basic usage questions

Page 30: Advanced High  Performance Computing Workshop HPC  201

References
1. Supported Flux software, http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/ (accessed June 2014).
2. Free Software Foundation, Inc., "GDB User Manual," http://www.gnu.org/s/gdb/documentation/ (accessed June 2014).
3. Intel C and C++ Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_c (accessed June 2014).
4. Intel Fortran Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_f (accessed June 2014).
5. Torque Administrator's Guide, http://docs.adaptivecomputing.com/torque/4-2-8/torqueAdminGuide-4.2.8.pdf (accessed June 2014).
6. Submitting GPGPU Jobs, https://sites.google.com/a/umich.edu/engin-cac/resources/systems/flux/gpgpus (accessed June 2014).