Introduction to Scientific Computing on BU’s Linux Cluster

Doug Sondak
Linux Clusters and Tiled Display Walls
Boston University
July 30 – August 1, 2002

Outline

• hardware
• parallelization
• compilers
• batch system
• profilers

Hardware

BU’s Cluster

• 52 2-processor nodes
• specifications
  – 2 Pentium III processors per node
  – 1 GHz
  – 1 GB memory per node
  – 32 KB L1 cache per CPU
  – 256 KB L2 cache per CPU

BU’s Cluster (2)

• Myrinet 2000 interconnects
  – sustained 1.96 Gb/s
• Linux

Some Timings

• CFD code, MPI, 4 procs.

Machine                      Sec.
Origin2000                    495
SP                            329
Cluster, 2 procs. per box     174
Cluster, 1 proc. per box      153
Regatta                        78
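
The timings are easier to compare as ratios. The sketch below recomputes three of them from the numbers in the table using awk; the helper name ratio is made up for this example and is not part of any cluster tooling.

```shell
# Ratio of two timings, rounded to two decimal places.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

origin_vs_cluster=$(ratio 495 153)    # Origin2000 vs. cluster, 1 proc. per box
packed_vs_spread=$(ratio 174 153)     # 2 procs. per box vs. 1 proc. per box
cluster_vs_regatta=$(ratio 153 78)    # cluster, 1 proc. per box vs. Regatta

echo "Origin2000 / cluster (1 per box):   $origin_vs_cluster"
echo "cluster 2 per box / 1 per box:      $packed_vs_spread"
echo "cluster (1 per box) / Regatta:      $cluster_vs_regatta"
```

On these numbers the cluster with one process per box runs this code about 3.2 times faster than the Origin2000, placing two processes in each box costs about 14%, and the Regatta is still roughly twice as fast as the cluster.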

Parallelization

Parallelization

• MPI is the recommended method
  – PVM may also be used
• some MPI tutorials
  – Boston University: http://scv.bu.edu/Tutorials/MPI/
  – NCSA: http://pacont.ncsa.uiuc.edu:8900/public/MPI/

Parallelization (2)

• OpenMP is available for SMP within a node
• mixed MPI/OpenMP not presently available
  – we’re working on it!

Compilers

Compilers

• Portland Group
  – pgf77
  – pgf90
  – pgcc
  – pgCC

Compilers (2)

• gnu
  – g77
  – gcc
  – g++

Compilers (3)

• Intel
  – Fortran: ifc
  – C/C++: icc

Compilers (2)

Polyhedron F77 Benchmarks
http://www.polyhedron.com/

             PG     gnu   Intel
AC         8.66   12.38    6.13
ADI        8.48    9.27    6.83
AIR       16.41   15.65   13.45
CHESS     11.67   10.06   10.16
DODUC     21.35   36.23   18.18
LP8        4.31    7.88    4.16
MDB        3.62    3.81    2.94
MOLENR    11.66   12.72    7.61
PI        24.58   41.95    7.08
PNPOLY     3.81    5.24    4.86
RO        10.75   10.31    3.92
TFFT      18.84   20.24   20.18

Compilers (3)

• Portland Group
  – pgf77 generally faster than g77
• Intel
  – ifc generally faster than pgf77

Compilers (4)

• Linux C/C++ compilers
  – gcc/g++ seems to be the standard, usually described as a good compiler

Portland Group

-O2
  – highest level of optimization
-fast
  – same as -O2 -Munroll -Mnoframe
-Minline
  – function inlining

Portland Group (2)

-Mbyteswapio
  – swaps between big endian and little endian
  – useful for using files created on our SP, Regatta, or Origin2000
-Ktrap=fp
  – trap floating point invalid operation, divide by zero, or overflow
  – slows code down, only use for debugging

Portland Group (3)

-Mbounds
  – array bounds checking
  – slows code down, only use for debugging
-mp
  – process OpenMP directives
-Mconcur
  – automatic SMP parallelization
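
Because the checking flags slow code down, one common arrangement is to keep separate debug and production flag sets. The sketch below shows one way that might look using only the Portland Group flags listed on these slides; the function name pg_flags, the mode names, and the file names myprog.f90 and myprog are hypothetical, and the compile commands are only printed, not run.

```shell
# Print the Portland Group flags for a given build mode.
# The flags come from the slides; the mode names are made up for this sketch.
pg_flags() {
    case "$1" in
        debug) echo "-Mbounds -Ktrap=fp" ;;  # bounds checking + FP traps (slow)
        fast)  echo "-fast -Minline" ;;      # -fast is -O2 -Munroll -Mnoframe
        omp)   echo "-fast -mp" ;;           # also process OpenMP directives
        *)     echo "usage: pg_flags debug|fast|omp" >&2; return 1 ;;
    esac
}

# Show the compile lines that would be used (hypothetical file names).
echo "pgf90 $(pg_flags debug) -o myprog_debug myprog.f90"
echo "pgf90 $(pg_flags fast) -o myprog myprog.f90"
```

Running the debug build first on a small case, then switching to the fast build for production runs, keeps the slow checks out of timed runs.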

Intel

• Need to set some environment variables
  – contained in /usr/local/IT/intel6.0/compiler60/ia32/bin/iccvars.csh
  – source this file, copy it into your .cshrc file, or source it in .cshrc
  – there’s an identical file called ifcvars.csh to avoid (create?) confusion

Intel (2)

-O3
  – highest level of optimization
-ipo
  – interprocedural optimization
-unroll
  – loop unrolling

Intel (3)

-openmp -fpp
  – process OpenMP directives
-parallel
  – automatic SMP parallelization
-CB
  – array bounds checking

Intel (3)

-CU
  – check for use of uninitialized variables

• Endian conversion by way of environment variables
    setenv F_UFMTENDIAN big
  – all reads will be converted from big to little endian, all writes from little to big endian

Intel (4)

• Can specify units for endian conversion
    setenv F_UFMTENDIAN big:10,20
• Can mix endian conversions
    setenv F_UFMTENDIAN "little;big:10,20"
  – quotes are needed so the shell does not treat the semicolon as a command separator
  – all units are little endian except for 10 and 20, which will be converted
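
The setenv lines are csh syntax, but the PBS job scripts later in this talk use bash. A minimal sketch of the bash equivalents follows, assuming the variable is read the same way regardless of which shell set it. Each assignment replaces the previous one, so a real script would keep only the line it needs.

```shell
# bash/sh equivalents of the csh setenv lines.

# Treat every unformatted unit as big endian:
export F_UFMTENDIAN=big

# Treat only units 10 and 20 as big endian:
export F_UFMTENDIAN="big:10,20"

# Everything little endian except units 10 and 20.
# The quotes matter: an unquoted ';' would end the command early.
export F_UFMTENDIAN="little;big:10,20"

echo "F_UFMTENDIAN=$F_UFMTENDIAN"
```

Setting the variable inside the job script, rather than in a login file, keeps the conversion tied to the run that needs it.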

Batch System

Batch System

• PBS
  – different from LSF on the O2ks, SPs, and Regattas
• there’s only one queue
    dque

qsub

• job submission done through script
  – script details will follow
    qsub scriptname
• returns job ID
• in working directory
  – std. out - scriptname.ojobid
  – std. err - scriptname.ejobid

[sondak@hn003 run]$ qsub corrun
808.hn003.nerf.bu.edu
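
To find the output files for a job, the job ID can be turned into the expected file names. The sketch below does this for the example above. It assumes the "jobid" in the file names is the numeric part before the first dot, and it uses a literal string in place of a real qsub call.

```shell
# Job ID as printed by qsub in the example; in a real session this
# would be captured with:  jobid=$(qsub corrun)
jobid="808.hn003.nerf.bu.edu"
script="corrun"

# Assumption: the file names use only the number before the first dot.
seq="${jobid%%.*}"

stdout_file="$script.o$seq"
stderr_file="$script.e$seq"

echo "std. out will be in: $stdout_file"
echo "std. err will be in: $stderr_file"
```

Here the names work out to corrun.o808 and corrun.e808, both in the directory the job was submitted from.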

qstat

• Check status of all your jobs
    qstat
• lies about run time
  – often (always?) zero

[sondak@hn003 run]$ qstat
Job id           Name             User             Time Use     S Queue
---------------- ---------------- ---------------- ------------ - --------
808.hn003        corrun           sondak                      0 R dque

qstat (2)

• S - job status
  – Q - queued
  – R - running
  – E - exiting (finishing up)
• qstat -f gives detailed status
    exec_host = nodem019/0+nodem018/0+nodem017/0+nodem016/0
• to specify jobid
    qstat jobid
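
For scripting, the status letter can be pulled out of the qstat listing with awk. The sketch below assumes the six-column layout shown in the sample output, with the "Time Use" value as a single token, and runs on a copy of that sample rather than on live qstat output. The function name job_status is made up for this example.

```shell
# Print the status letter (column "S") for one job.
# $1 = job id as shown in the first column; qstat output is read from stdin.
job_status() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Sample listing copied from the slide; on the cluster, pipe qstat instead:
#   qstat | job_status 808.hn003
sample='Job id           Name             User             Time Use     S Queue
---------------- ---------------- ---------------- ------------ - --------
808.hn003        corrun           sondak                      0 R dque'

status=$(printf '%s\n' "$sample" | job_status 808.hn003)
echo "job 808.hn003 status: $status"
```

A wrapper could call this in a loop with sleep to wait until the job no longer appears in the listing.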

Other PBS Commands

• kill job
    qdel jobid
• some less-important PBS commands
  – qalter, qhold, qrls, qmsg, qrerun
  – man pages are available for all commands

PBS Script

• For serial runs

    #!/bin/bash
    # Set the default queue
    #PBS -q dque
    # ppn is CPUs per node
    #PBS -l nodes=1:ppn=1,walltime=00:30:00
    cd $PBS_O_WORKDIR
    myrun

PBS/MPI

• For MPI, set up the GM configuration file in PBS script

    test -d ~/.gmpi || mkdir ~/.gmpi
    GMCONF=~/.gmpi/conf.$PBS_JOBID
    /usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
    cd $PBS_O_WORKDIR
    NP=$(head -1 $GMCONF)
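
The script takes NP from the first line of the GM configuration file. A possible cross-check is to count entries in the PBS node file directly. The sketch below assumes PBS writes one line per allocated processor to $PBS_NODEFILE, and it uses a temporary stand-in file with node names borrowed from the qstat -f example so that it can run outside a job.

```shell
# Stand-in for $PBS_NODEFILE as it might look for nodes=2:ppn=2.
# Assumption: one line per allocated processor.
nodefile=$(mktemp)
printf '%s\n' nodem019 nodem019 nodem018 nodem018 > "$nodefile"

# Number of processes = number of lines; number of boxes = distinct names.
NP=$(wc -l < "$nodefile" | tr -d ' ')
NODES=$(sort -u "$nodefile" | wc -l | tr -d ' ')

echo "processes: $NP on $NODES nodes"
rm -f "$nodefile"
```

Inside a real job, replacing the stand-in with "$PBS_NODEFILE" should give the same count as head -1 $GMCONF if the assumption holds.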

PBS/MPI (2)

• To run MPI, end PBS script with (all on one line)

mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog

PBS/MPI (3)

• mpirun.ch_gm
  – version of mpirun that uses Myrinet
• --gm-f $GMCONF
  – access configuration file constructed above
• --gm-recv polling
  – poll continually to check for completion of sends and receives
  – most efficient for dedicated procs.
    • That’s us!

PBS/MPI (4)

• --gm-use-shmem
  – enable shared-memory support
  – may improve or degrade performance
  – try your code with and without it
• --gm-kill 5
  – if one MPI process aborts, kill others after 5 sec.

PBS/MPI (5)

• -np $NP
  – run on NP procs as computed earlier in script
  – equals “nodes x ppn” from PBS -l option
• PBS_JOBID=$PBS_JOBID
  – seems redundant redundant
  – do it anyway
• myprog
  – run the darn code already!
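
Putting the pieces from the PBS/MPI slides together, a complete job script might look like the sketch below. Every line inside the script comes from the slides except the resource request nodes=2:ppn=2, which is an illustrative choice, and the file name mpijob.pbs, which is hypothetical. The surrounding commands only write the file and check its syntax; nothing is submitted.

```shell
# Write the assembled job script to a file (hypothetical name).
cat > mpijob.pbs <<'EOF'
#!/bin/bash
# Set the default queue
#PBS -q dque
# ppn is CPUs per node; 2 nodes x 2 CPUs is an example request (NP = 4)
#PBS -l nodes=2:ppn=2,walltime=00:30:00

# Build the GM configuration file from the PBS node list
test -d ~/.gmpi || mkdir ~/.gmpi
GMCONF=~/.gmpi/conf.$PBS_JOBID
/usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
cd $PBS_O_WORKDIR
NP=$(head -1 $GMCONF)

# Launch the MPI program over Myrinet (all on one line)
mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog
EOF

# Parse the script without executing it; submit later with: qsub mpijob.pbs
bash -n mpijob.pbs && echo "mpijob.pbs: syntax OK"
```

Since --gm-use-shmem may help or hurt, keeping a second copy of the script without that option makes the suggested with/without comparison easy to run.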

Profiling

Portland Group

• Portland Group Compiler flag
  – function level
      -Mprof=func
  – line level
      -Mprof=lines
    • much larger file
• creates pgprof.out file in working directory

PG (2)

• At unix prompt, type pgprof command
• will pop up window with bar chart of timing results
• can take file name argument in case you’ve renamed the pgprof.out file
    pgprof pgprof.lines
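
Renaming is useful when collecting both function-level and line-level profiles, since each run presumably writes a fresh pgprof.out in the working directory. The sketch below is a small helper for that; the function name save_profile and the file names in the commented commands are hypothetical, and the compile and run steps are left as comments because they need the Portland Group tools.

```shell
# Rename the profile from the last run so the next run does not replace it.
# $1 is a label of your choosing, e.g. func or lines.
save_profile() {
    if [ -f pgprof.out ]; then
        mv pgprof.out "pgprof.$1"
    else
        echo "no pgprof.out in $(pwd)" >&2
        return 1
    fi
}

# Possible sequence on the cluster (hypothetical file names):
#   pgf90 -fast -Mprof=lines -o myprog myprog.f90
#   ./myprog
#   save_profile lines
#   pgprof pgprof.lines
```

Repeating the sequence with -Mprof=func and save_profile func leaves both profiles side by side for comparison.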

PG (3)

• option to specify source directory
    pgprof -I sourcedir pgprof.lines
  – can specify multiple directories with multiple -I flags
• also can use GUI menu
  – Options → Source Directory...

PG (4)

PG (5)

• Calls - number of times routine was called
• Time - time spent in specified routine
• Cost - time spent in specified routine plus time spent in called routines

PG (6)

• Lines profiling
  – with optimization, may not be able to identify many (most?) lines in source code
    • reports results for blocks of code, e.g., loops
  – without optimization, doesn’t measure what you really want
  – initial screen looks like “func” screen
  – double-click function/subroutine name to get line-level listing

PG (7)

Questions/Comments

• Feel free to contact us directly with questions about the cluster or parallelization/optimization issues

Doug Sondak [email protected]
Tseng [email protected]
