Advanced High Performance Computing Workshop: HPC 201
Charles J Antonelli, Seth Meyer, LSAIT ARS, June 2014
cja 2014 2
Roadmap
• Flux review
• Globus Connect
• Advanced PBS: array & dependent scheduling
• Tools
• GPUs on Flux
• Scientific applications: R, Python, MATLAB
• Parallel programming
• Debugging & profiling
6/14
Flux review
The Flux cluster
• Login nodes
• Compute nodes
• Storage
• Data transfer node
A Flux node
• 12-40 Intel cores
• 48 GB - 1 TB RAM
• Local disk
• 8 GPUs (GPU Flux nodes); each GPU contains 2,688 GPU cores
Programming Models
Two basic parallel programming models:

Message-passing: the application consists of several processes running on different nodes and communicating with each other over the network. Used when the data are too large to fit on a single node and simple synchronization is adequate ("coarse parallelism" or "SPMD"). Implemented using MPI (Message Passing Interface) libraries.

Multi-threaded: the application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives. Used when the data can fit into a single process and the communication overhead of the message-passing model is intolerable ("fine-grained parallelism" or "shared-memory parallelism"). Implemented using OpenMP (Open Multi-Processing) compilers and libraries.

Both models can be combined in a single application.
Using Flux
Three basic requirements:
• A Flux login account
• A Flux allocation
• An MToken (or a Software Token)

Logging in to Flux:
ssh flux-login.engin.umich.edu
This works from campus wired networks or MWireless. Otherwise, use the VPN, or ssh login.itd.umich.edu first.
Cluster batch workflow
• You create a batch script and submit it to PBS
• PBS schedules your job, and it enters the flux queue
• When its turn arrives, your job executes the batch script
• Your script has access to all Flux applications and data
• When your script completes, anything it sent to standard output and error is saved in files stored in your submission directory
• You can ask that email be sent to you when your job starts, ends, or aborts
• You can check on the status of your job at any time, or delete it if it's not doing what you want
• A short time after your job completes, it disappears from PBS
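The steps above can be sketched as a short session on a Flux login node; the script name and job id below are placeholders:

```shell
# Submit a batch script to PBS; qsub prints the new job's id
qsub yourjobname.pbs

# Check the job's status while it is queued or running
# (replace 1234567 with the id qsub printed)
qstat 1234567

# Delete the job if it is not doing what you want
qdel 1234567

# After the job completes, its standard output and error are saved
# in the submission directory as yourjobname.oNNNNNNN / yourjobname.eNNNNNNN
ls yourjobname.o* yourjobname.e*
```

These commands only work on a system with PBS/Torque installed, such as the Flux login nodes.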
Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01
Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=4gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script
GPU batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:gpus=1,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
matlab -nodisplay -r gpuscript
Copying data
Three ways to copy data to/from Flux:
• Use scp from the login server:
  scp flux-login.engin.umich.edu:hpc201cja/example563.png .
• Use scp from the transfer host:
  scp flux-xfer.engin.umich.edu:hpc201cja/example563.png .
• Use Globus Online
Globus Online
Features:
• High-speed data transfer, much faster than SCP or SFTP
• Reliable & persistent
• Minimal client software: Mac OS X, Linux, Windows

GridFTP endpoints are the gateways through which data flow:
• Exist for XSEDE, OSG, …
• UMich: umich#flux, umich#nyx
• Add your own client endpoint!
• Add your own server endpoint: contact [email protected]

More information: http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp
Job Arrays
• Submit copies of identical jobs
• Invoked via qsub -t:
  qsub -t array-spec pbsbatch.txt
  where array-spec can be:
    m-n
    a,b,c
    m-n%slotlimit
  e.g. qsub -t 1-50%10 submits fifty jobs, numbered 1 through 50, of which only ten can run simultaneously
• $PBS_ARRAYID records the array identifier
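As an illustration, an array job's batch script can branch on $PBS_ARRAYID so each copy processes its own input; the program and file names below are hypothetical:

```shell
#PBS -N arrayjob
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=1,pmem=1gb,walltime=00:05:00
#PBS -j oe

# Each array member sees its own $PBS_ARRAYID (1..50 for -t 1-50%10),
# so it can select a distinct input and output file
cd $PBS_O_WORKDIR
./myprogram input.$PBS_ARRAYID > output.$PBS_ARRAYID
```

Submitting this with qsub -t 1-50%10 arrayjob.pbs runs fifty independent copies, ten at a time.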
Dependent scheduling
• Submit jobs whose scheduling depends on other jobs
• Invoked via qsub -W:
  qsub -W depend=type:jobid[:jobid]…
  where type can be:
    after       Schedule this job after jobids have started
    afterany    Schedule this job after jobids have finished
    afterok     Schedule this job after jobids have finished with no errors
    afternotok  Schedule this job after jobids have finished with errors
    before      jobids scheduled after this job starts
    beforeany   jobids scheduled after this job completes
    beforeok    jobids scheduled after this job completes with no errors
    beforenotok jobids scheduled after this job completes with errors
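Since qsub prints the id of the job it submits, a pipeline of dependent jobs can be chained by capturing that output; the stage script names below are placeholders:

```shell
# Submit the first stage and capture its job id
first=$(qsub stage1.pbs)

# Run stage2 only if stage1 finishes with no errors
second=$(qsub -W depend=afterok:$first stage2.pbs)

# Run a cleanup job after stage2 finishes, whether or not it succeeded
qsub -W depend=afterany:$second cleanup.pbs
```

If stage1 fails, stage2 is never released, but the cleanup job still runs once stage2 is disposed of.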
Troubleshooting
showq [-r][-i][-b][-w user=uniq]  # running/idle/blocked jobs
qstat -f jobno                    # full info incl gpu
qstat -n jobno                    # nodes/cores where job running
diagnose -p                       # job prio and components
pbsnodes                          # nodes, states, properties
pbsnodes -l                       # list nodes marked down
checkjob [-v] jobno               # why job jobno not running
mdiag -a                          # allocs & users (flux)
freenodes                         # aggregate node/core busy/free
mdiag -u uniq                     # allocs for uniq (flux)
mdiag -a alloc_flux               # cores active, alloc (flux)
Scientific applications
Scientific Applications
• R (incl snow and multicore)
• R with GPU (GpuLm, dist)
• SAS, Stata
• Python, SciPy, NumPy, BioPy
• MATLAB with GPU
• CUDA Overview
• CUDA C (matrix multiply)
Python
Python software available on Flux:

EPD: The Enthought Python Distribution provides scientists with a comprehensive set of tools to perform rigorous data analysis and visualization. https://www.enthought.com/products/epd/
biopython: Python tools for computational molecular biology. http://biopython.org/wiki/Main_Page
numpy: Fundamental package for scientific computing. http://www.numpy.org/
scipy: Python-based ecosystem of open-source software for mathematics, science, and engineering. http://www.scipy.org/
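A quick way to confirm the scientific Python stack works after logging in is a one-line NumPy calculation; the module name is site-specific and shown only as an assumption:

```shell
# module load python   # exact module name varies by site

# Sum the integers 0..5 with NumPy as a smoke test
python3 -c "import numpy as np; print(int(np.arange(6).sum()))"
# prints: 15
```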
Debugging & profiling
Debugging with GDB
• Command-line debugger
• Start programs or attach to running programs
• Display source program lines
• Display and change variables or memory
• Plant breakpoints, watchpoints
• Examine stack frames

Excellent tutorial documentation: http://www.gnu.org/s/gdb/documentation/
Compiling for GDB
Debugging is easier if you ask the compiler to generate extra source-level debugging information.

Add the -g flag to your compilation:
icc -g serialprogram.c -o serialprogram
or
mpicc -g mpiprogram.c -o mpiprogram

GDB will work without symbols, but then you need to be fluent in machine instructions and hexadecimal.

Be careful using -O with -g: some compilers won't optimize code when debugging; most will, but you sometimes won't recognize the resulting source code at optimization level -O2 and higher.
Running GDB
Two ways to invoke GDB:
• Debugging a serial program:
  gdb ./serialprogram
• Debugging an MPI program:
  mpirun -np N xterm -e gdb ./mpiprogram
  This gives you N separate GDB sessions, each debugging one rank of the program.

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start xterms there.
Useful GDB commands
gdb exec          start gdb on executable exec
gdb exec core     start gdb on executable exec with core file core
l [m,n]           list source
disas             disassemble function enclosing current instruction
disas func        disassemble function func
b func            set breakpoint at entry to func
b line#           set breakpoint at source line#
b *0xaddr         set breakpoint at address addr
i b               show breakpoints
d bp#             delete breakpoint bp#
r [args]          run program with optional args
bt                show stack backtrace
c                 continue execution from breakpoint
step              single-step one source line
next              single-step, don't step into functions
stepi             single-step one instruction
p var             display contents of variable var
p *var            display value pointed to by var
p &var            display address of var
p arr[idx]        display element idx of array arr
x 0xaddr          display hex word at addr
x *0xaddr         display hex word pointed to by addr
x/20x 0xaddr      display 20 words in hex starting at addr
i r               display registers
i r ebp           display register ebp
set var = expr    set variable var to expression expr
q                 quit gdb
Debugging with DDT
Allinea's Distributed Debugging Tool is a comprehensive graphical debugger designed for the complex task of debugging parallel code. Advantages include:
• Provides a GUI interface to debugging, with capabilities similar to, e.g., Eclipse or Visual Studio
• Supports parallel debugging of MPI programs, and scales much better than GDB
Running DDT
Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram
Load the DDT module:
module load ddt
Start DDT:
ddt mpiprogram

This starts a DDT session, debugging all ranks concurrently.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start ddt there.
http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
http://content.allinea.com/downloads/userguide.pdf
Application Profiling with MAP
Allinea's MAP tool is a statistical application profiler designed for the complex task of profiling parallel code. Advantages include:
• Provides a GUI interface to profiling: observe cumulative results, drill down for details
• Supports parallel profiling of MPI programs, and handles most of the details under the covers
Running MAP
Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram
Load the module (MAP is provided by the DDT module):
module load ddt
Start MAP:
map mpiprogram

This starts a MAP session: it runs your program, gathers profile data, and displays summary statistics.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start MAP there.
http://content.allinea.com/downloads/userguide.pdf
Resources
• U-M Advanced Research Computing Flux pages:
  http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
  http://arc.research.umich.edu/resources-services/flux/
• CAEN HPC Flux pages: http://cac.engin.umich.edu/
• CAEN HPC YouTube channel: http://www.youtube.com/user/UMCoECAC
• For assistance: [email protected]
  Read by a team of people including unit support staff. They cannot help with programming questions, but can help with operational Flux and basic usage questions.
References
1. Supported Flux software, http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/ (accessed June 2014).
2. Free Software Foundation, Inc., "GDB User Manual," http://www.gnu.org/s/gdb/documentation/ (accessed June 2014).
3. Intel C and C++ Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_c (accessed June 2014).
4. Intel Fortran Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_f (accessed June 2014).
5. Torque Administrator's Guide, http://docs.adaptivecomputing.com/torque/4-2-8/torqueAdminGuide-4.2.8.pdf (accessed June 2014).
6. Submitting GPGPU Jobs, https://sites.google.com/a/umich.edu/engin-cac/resources/systems/flux/gpgpus (accessed June 2014).