Introduction to Scientific Computing on BU’s Linux Cluster
Doug Sondak
Linux Clusters and Tiled Display Walls
Boston University
July 30 – August 1, 2002
Outline
• hardware
• parallelization
• compilers
• batch system
• profilers
Hardware
BU’s Cluster
• 52 2-processor nodes
• specifications
  – 2 Pentium III processors per node
  – 1 GHz
  – 1 GB memory per node
  – 32 KB L1 cache per CPU
  – 256 KB L2 cache per CPU
BU’s Cluster (2)
• Myrinet 2000 interconnects
  – sustained 1.96 Gb/s
• Linux
Some Timings
• CFD code, MPI, 4 procs.
Machine                      Sec.
Origin2000                    495
SP                            329
Cluster, 2 procs. per box     174
Cluster, 1 proc. per box      153
Regatta                        78
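The relative standing of the machines is easier to read as speedups. The short sketch below (bash calling awk) divides the Origin2000 time by each machine's time, using only the seconds from the table; choosing the Origin2000 as the baseline is an assumption made for illustration, not something the slide specifies.

```bash
#!/bin/bash
# Speedup of each machine relative to the Origin2000 for the 4-process
# CFD run, computed from the wall-clock seconds in the timings table.
# The Origin2000 baseline is an arbitrary choice; any row would do.
speedups() {
    awk -F'|' '
        NR == 1 { base = $2 }                        # first row is the baseline
        { printf "%-26s %5.2f\n", $1, base / $2 }    # speedup = base time / time
    ' <<'EOF'
Origin2000|495
SP|329
Cluster, 2 procs. per box|174
Cluster, 1 proc. per box|153
Regatta|78
EOF
}

speedups
```

On these numbers the cluster with one process per box comes out about 3.2 times faster than the Origin2000, and somewhat faster than the two-processes-per-box layout.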
Parallelization
Parallelization
• MPI is the recommended method
  – PVM may also be used
• some MPI tutorials
  – Boston University: http://scv.bu.edu/Tutorials/MPI/
  – NCSA: http://pacont.ncsa.uiuc.edu:8900/public/MPI/
Parallelization (2)
• OpenMP is available for SMP within a node
• mixed MPI/OpenMP not presently available
  – we’re working on it!
Compilers
Compilers
• Portland Group
  – pgf77
  – pgf90
  – pgcc
  – pgCC
Compilers (2)
• gnu
  – g77
  – gcc
  – g++
Compilers (3)
• Intel
  – Fortran: ifc
  – C/C++: icc
Compilers (4)
Polyhedron F77 Benchmarks, http://www.polyhedron.com/ (run times; lower is better)
Benchmark      PG     gnu   Intel
AC           8.66   12.38    6.13
ADI          8.48    9.27    6.83
AIR         16.41   15.65   13.45
CHESS       11.67   10.06   10.16
DODUC       21.35   36.23   18.18
LP8          4.31    7.88    4.16
MDB          3.62    3.81    2.94
MOLENR      11.66   12.72    7.61
PI          24.58   41.95    7.08
PNPOLY       3.81    5.24    4.86
RO          10.75   10.31    3.92
TFFT        18.84   20.24   20.18
Compilers (5)
• Portland Group
  – pgf77 generally faster than g77
• Intel
  – ifc generally faster than pgf77
Compilers (6)
• Linux C/C++ compilers
  – gcc/g++ seems to be the standard, and is usually described as a good compiler
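The “generally faster” claims can be checked against the Polyhedron table itself. The sketch below (bash calling awk) counts, benchmark by benchmark, how often pgf77 beats g77 and how often ifc beats pgf77, treating the lower number as the faster run; the data are copied from the table and nothing else is assumed.

```bash
#!/bin/bash
# Head-to-head counts from the Polyhedron F77 table:
# column 2 = PG, column 3 = gnu, column 4 = Intel; a lower time wins.
compare() {
    awk '
        {
            n++
            if ($2 < $3) pg_beats_gnu++      # pgf77 faster than g77
            if ($4 < $2) ifc_beats_pg++      # ifc faster than pgf77
        }
        END {
            printf "pgf77 faster than g77: %d of %d\n", pg_beats_gnu, n
            printf "ifc faster than pgf77: %d of %d\n", ifc_beats_pg, n
        }
    ' <<'EOF'
AC 8.66 12.38 6.13
ADI 8.48 9.27 6.83
AIR 16.41 15.65 13.45
CHESS 11.67 10.06 10.16
DODUC 21.35 36.23 18.18
LP8 4.31 7.88 4.16
MDB 3.62 3.81 2.94
MOLENR 11.66 12.72 7.61
PI 24.58 41.95 7.08
PNPOLY 3.81 5.24 4.86
RO 10.75 10.31 3.92
TFFT 18.84 20.24 20.18
EOF
}

compare
```

With these numbers pgf77 wins 9 of the 12 benchmarks against g77 and ifc wins 10 of 12 against pgf77, which fits “generally faster” rather than “always faster”.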
Portland Group
-O2
  – highest level of optimization
-fast
  – same as -O2 -Munroll -Mnoframe
-Minline
  – function inlining
Portland Group (2)
-Mbyteswapio
  – swaps byte order between big endian and little endian in unformatted I/O
  – useful for reading and writing files created on our SP, Regatta, or Origin2000
-Ktrap=fp
  – traps floating point invalid operation, divide by zero, or overflow
  – slows code down, only use for debugging
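What byte swapping does to a single data item can be seen with plain shell arithmetic. The sketch below is illustrative only: swap32 is a hypothetical helper, not a PGI tool, that reverses the four bytes of a 32-bit word written as eight hex digits. That is the conversion a 4-byte integer needs when it was written on a big-endian machine such as the SP and is read on a little-endian Pentium III.

```bash
#!/bin/bash
# Illustration only (swap32 is hypothetical, not part of the PGI compilers):
# reverse the byte order of a 32-bit word given as 8 hex digits.
swap32() {
    local h=$1
    printf '%s%s%s%s\n' "${h:6:2}" "${h:4:2}" "${h:2:2}" "${h:0:2}"
}

# The integer 1 as written by a big-endian machine: bytes 00 00 00 01.
bytes_on_disk=00000001

# A little-endian CPU treats the first byte as least significant, so
# without conversion it sees 0x01000000, i.e. 16777216.
echo "unconverted: $(( 16#$(swap32 "$bytes_on_disk") ))"

# Reading the bytes in the order they were written (the effect of
# swapping on input) recovers the intended value, 1.
echo "converted:   $(( 16#$bytes_on_disk ))"
```

Larger items, such as 8-byte reals, are swapped across all eight bytes in the same way.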
Portland Group (3)
-Mbounds
  – array bounds checking
  – slows code down, only use for debugging
-mp
  – process OpenMP directives
-Mconcur
  – automatic SMP parallelization
Intel
• Need to set some environment variables
  – contained in /usr/local/IT/intel6.0/compiler60/ia32/bin/iccvars.csh
  – source this file, copy it into your .cshrc file, or source it in .cshrc
  – there’s an identical file called ifcvars.csh to avoid (create?) confusion
Intel (2)
-O3
  – highest level of optimization
-ipo
  – interprocedural optimization
-unroll
  – loop unrolling
Intel (3)
-openmp -fpp
  – process OpenMP directives
-parallel
  – automatic SMP parallelization
-CB
  – array bounds checking
Intel (4)
-CU
  – check for use of uninitialized variables
• Endian conversion by way of an environment variable
  setenv F_UFMTENDIAN big
• all reads will be converted from big to little endian, all writes from little to big endian
Intel (5)
• Can specify units for endian conversion
  setenv F_UFMTENDIAN big:10,20
• Can mix endian conversions (quote the value, since csh treats ; as a command separator)
  setenv F_UFMTENDIAN "little;big:10,20"
• all units are little endian except for 10 and 20, which will be converted
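The rule for a mixed setting can be sketched as a small bash function. endian_for_unit is hypothetical (the real parsing happens inside the Intel Fortran run-time library) and handles only the simplified forms shown on these slides, not every syntax the library may accept. It prints the byte order assumed for a given unit number, where "big" means that unit's data are converted.

```bash
#!/bin/bash
# Hypothetical helper: byte order for a Fortran unit under a simplified
# F_UFMTENDIAN value of the forms
#   big   little   big:10,20   little;big:10,20
# "big" means the unit is converted; "little" means native (unconverted) I/O.
endian_for_unit() {
    local spec=$1 unit=$2
    local default=little part mode u
    local -a parts units
    IFS=';' read -ra parts <<< "$spec"             # split at ';'
    for part in "${parts[@]}"; do
        mode=${part%%:*}                           # "big" or "little"
        if [[ $part == *:* ]]; then
            IFS=',' read -ra units <<< "${part#*:}"    # unit list after ':'
            for u in "${units[@]}"; do
                if [[ $u == "$unit" ]]; then
                    echo "$mode"
                    return
                fi
            done
        else
            default=$mode                          # no unit list: sets the default
        fi
    done
    echo "$default"
}

for u in 5 10 20; do
    echo "unit $u: $(endian_for_unit 'little;big:10,20' "$u")"
done
```

Under this reading, "big" alone converts every unit, while "big:10,20" and "little;big:10,20" both leave all units except 10 and 20 unconverted.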
Batch System
Batch System
• PBS
  – different from LSF on the O2k’s, SP’s, and Regattas
• there’s only one queue: dque
qsub
• job submission is done through a script
  – script details will follow
  qsub scriptname
• returns job ID
• output files appear in the working directory
  – std. out - scriptname.ojobid
  – std. err - scriptname.ejobid
[sondak@hn003 run]$ qsub corrun
808.hn003.nerf.bu.edu
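Because the output file names are built from the script name and the job ID, they can be worked out as soon as qsub returns. The sketch below assumes the default job name (the script's file name) and that PBS uses the numeric part of the job ID as the suffix, which is what turns job 808.hn003.nerf.bu.edu into corrun.o808; pbs_output_names is a hypothetical helper, not a PBS command.

```bash
#!/bin/bash
# Hypothetical helper: given a script name and the job ID printed by qsub,
# print the stdout and stderr file names PBS is expected to write in the
# working directory (scriptname.o<number> and scriptname.e<number>).
pbs_output_names() {
    local script=${1##*/} jobid=$2     # job name defaults to the script's file name
    local seq=${jobid%%.*}             # "808.hn003.nerf.bu.edu" -> "808"
    echo "${script}.o${seq} ${script}.e${seq}"
}

pbs_output_names corrun 808.hn003.nerf.bu.edu
```

In a real session one would capture the ID first, e.g. jobid=$(qsub corrun), and then call pbs_output_names corrun "$jobid".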
qstat
• Check status of all your jobs
  qstat
• lies about run time
  – often (always?) zero
[sondak@hn003 run]$ qstat
Job id           Name             User                 Time Use S Queue
---------------- ---------------- ---------------- ------------ - --------
808.hn003        corrun           sondak                      0 R dque
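Since the Time Use column is unreliable, the state column is the useful part of a plain qstat listing. A one-line awk filter can pull out the job ID and state. The sketch below runs it on the sample listing from this slide rather than calling qstat, and assumes the layout shown: two header lines, the job ID first, the state (Q/R/E) next to last, and the queue name last. job_states is a hypothetical helper name.

```bash
#!/bin/bash
# Print "jobid state" for each job in a plain qstat listing read on stdin.
# Assumes two header lines, job ID in the first column, state in the
# next-to-last column, and queue name in the last column.
job_states() {
    awk 'NR > 2 && NF > 0 { print $1, $(NF-1) }'
}

# In practice: qstat | job_states
# Here the sample listing from the slide is used instead.
job_states <<'EOF'
Job id           Name             User                 Time Use S Queue
---------------- ---------------- ---------------- ------------ - --------
808.hn003        corrun           sondak                      0 R dque
EOF
```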
qstat (2)
• S - job status
  – Q - queued
  – R - running
  – E - exiting (finishing up)
• qstat -f gives detailed status, e.g.,
  exec_host = nodem019/0+nodem018/0+nodem017/0+nodem016/0
• to specify jobid
  qstat jobid
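Each +-separated entry in exec_host is a host/CPU-slot pair, so the line shows where every process landed. The bash sketch below splits the value from this slide into host names and counts the slots; host_list is a hypothetical helper, and the format is assumed to be exactly host/cpu entries joined by +, as shown above.

```bash
#!/bin/bash
# Split a PBS exec_host value ("host/cpu+host/cpu+...") into host names,
# one per line; each entry corresponds to one process slot.
host_list() {
    local -a slots
    local s
    IFS='+' read -ra slots <<< "$1"
    for s in "${slots[@]}"; do
        echo "${s%%/*}"                # drop the "/cpu" index
    done
}

exec_host='nodem019/0+nodem018/0+nodem017/0+nodem016/0'
mapfile -t hosts < <(host_list "$exec_host")
printf '%s\n' "${hosts[@]}"
echo "processes: ${#hosts[@]}"
```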
Other PBS Commands
• kill job
  qdel jobid
• some less-important PBS commands
  – qalter, qhold, qrls, qmsg, qrerun
  – man pages are available for all commands
PBS Script
• For serial runs
#!/bin/bash
# Set the default queue
#PBS -q dque
# ppn is CPUs per node
#PBS -l nodes=1:ppn=1,walltime=00:30:00
cd $PBS_O_WORKDIR
myrun
PBS/MPI
• For MPI, set up a GM configuration file in the PBS script
test -d ~/.gmpi || mkdir ~/.gmpi
GMCONF=~/.gmpi/conf.$PBS_JOBID
/usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
cd $PBS_O_WORKDIR
NP=$(head -1 $GMCONF)
PBS/MPI (2)
• To run MPI, end PBS script with (all on one line)
mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog
PBS/MPI (3)
• mpirun.ch_gm
  – version of mpirun that uses Myrinet
• --gm-f $GMCONF
  – access configuration file constructed above
• --gm-recv polling
  – poll continually to check for completion of sends and receives
  – most efficient for dedicated procs.
    • That’s us!
PBS/MPI (4)
• --gm-use-shmem
  – enable shared-memory support
  – may improve or degrade performance
  – try your code with and without it
• --gm-kill 5
  – if one MPI process aborts, kill others after 5 sec.
PBS/MPI (5)
• -np $NP
  – run on NP procs as computed earlier in script
  – equals “nodes x ppn” from PBS -l option
• PBS_JOBID=$PBS_JOBID
  – seems redundant
  – do it anyway
• myprog
  – run the darn code already!
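The relation NP = nodes x ppn can be checked directly against the -l resource string. np_from_resources below is a hypothetical bash helper, not part of PBS, and it assumes the simple "nodes=N:ppn=P,..." form used in these scripts, with ppn taken as 1 when it is omitted.

```bash
#!/bin/bash
# Hypothetical helper: number of MPI processes implied by a PBS -l
# resource string of the form "nodes=N:ppn=P,...".
np_from_resources() {
    local res=$1 nodes ppn=1           # ppn assumed to be 1 when omitted
    [[ $res =~ nodes=([0-9]+) ]] || return 1
    nodes=${BASH_REMATCH[1]}
    if [[ $res =~ ppn=([0-9]+) ]]; then
        ppn=${BASH_REMATCH[1]}
    fi
    echo $(( nodes * ppn ))
}

np_from_resources 'nodes=1:ppn=1,walltime=00:30:00'   # the serial script
np_from_resources 'nodes=4:ppn=2,walltime=00:30:00'   # hypothetical MPI job
```

Inside a running job, counting the lines of the node file (e.g. wc -l < "$PBS_NODEFILE") should normally give the same number, since PBS writes one line per allocated processor.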
Profilers
Portland Group
• Portland Group compiler flag
  – function level
    -Mprof=func
  – line level
    -Mprof=lines
    • much larger file
• creates pgprof.out file in working directory
PG (2)
• At unix prompt, type pgprof command
• will pop up window with bar chart of timing results
• can take file name argument in case you’ve renamed the pgprof.out file
  pgprof pgprof.lines
PG (3)
• option to specify source directory
  pgprof -I sourcedir pgprof.lines
  – can specify multiple directories with multiple -I flags
• also can use GUI menu
  – Options > Source Directory...
PG (5)
• Calls - number of times routine was called
• Time - time spent in specified routine
• Cost - time spent in specified routine plus time spent in called routines
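The difference between Time and Cost is easiest to see on a small call tree. In the bash sketch below the routine names and times are invented for illustration, and each routine is assumed to be called from exactly one place; cost adds a routine's own time to the cost of everything it calls, following the Cost definition above.

```bash
#!/bin/bash
# Hypothetical profile: exclusive time ("Time" column) per routine, in
# seconds, and the routines each one calls. Names and numbers are made up.
declare -A time=([main]=1 [solve]=6 [flux]=10 [output]=2)
declare -A calls=([main]="solve output" [solve]="flux")

# Cost = own time + Cost of every routine it calls.
cost() {
    local r=$1 total=${time[$1]} c
    for c in ${calls[$r]}; do          # unquoted on purpose: split the callee list
        total=$(( total + $(cost "$c") ))
    done
    echo "$total"
}

for r in main solve flux output; do
    printf '%-7s Time %3d  Cost %3d\n' "$r" "${time[$r]}" "$(cost "$r")"
done
```

A leaf routine such as flux has equal Time and Cost, while main has a small Time but the largest Cost, since everything else runs beneath it.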
PG (6)
• Lines profiling
  – with optimization, may not be able to identify many (most?) lines in source code
    • reports results for blocks of code, e.g., loops
  – without optimization, doesn’t measure what you really want
  – initial screen looks like “func” screen
  – double-click function/subroutine name to get line-level listing
Questions/Comments
• Feel free to contact us directly with questions about the cluster or parallelization/optimization issues
Doug Sondak [email protected]
Tseng [email protected]