Computing services at the computing centre
Scientific Computing Group
Computing services at the computing centre Scientific Computing Group | 4. Februar 2013 1
Contents

1 Access
2 Workflow
3 Available computing power
4 Applications and Software
5 The modules environment
6 Using the batch system
7 Example batch scripts
8 File systems
9 When problems occur. . .
10 Links to further information
11 Contact information
Access–Applying for an account
A “project” already exists: your “project leader” can apply for a new account for the existing “project” on the BIAS website: https://bias.rrzn.uni-hannover.de
No existing project: fill out an ORG.BEN4; afterwards, apply for accounts via BIAS
Tips:
the requested username should reflect the user’s name
please provide the user’s email address
Access to the system
Orac and Avon are the login nodes. Here it is possible to:
Prepare computing jobs
Prepare batch scripts
Send computing jobs into the queue
Run small tests
Show the queue status
View simulation results
Copy files into the archive system
Copy files to your desktop computer
These nodes are NOT for production simulations: processes will be killed after a maximum run time of 30 minutes
All applications, compilers and tools are available
Access to the system (Windows; graphical interface)
NX
Freely downloadable client
Very fast; can be used on slow connections
http://www.rrzn.uni-hannover.de/cluster-zugang.html#c15446
X-Win32
Price: 10 €
http://www.rrzn.uni-hannover.de/cluster-zugang.html#c12246
XMing
Free (Open Source Software)
Uses PuTTY for a secure connection to the login nodes
http://www.rrzn.uni-hannover.de/cluster-zugang.html#c12252
Access to the system (Windows; other software)
PuTTY: command-line access to the login nodes
Free (Open Source Software)
Uses SSH (the Secure Shell) for an encrypted connection
http://www.rrzn.uni-hannover.de/cluster-zugang.html#c14762
FileZilla: file transfer
Free (Open Source Software)
Has an easy-to-use graphical user interface
http://filezilla-project.org/
Documentation:http://www.rrzn.uni-hannover.de/cluster-zugang.html?&L=1
Sketch of the cluster system
Normal workflow
Copy files from desktop PC to one of the login nodes
Make sure that the program works as expected
Write a batch script for the simulation
Submit the batch script into the queue ⇒ qsub <batchscript>
Check the job status ⇒ qstat -a
http://www.rrzn.uni-hannover.de/betriebsstatus.html
Delete unnecessary files
Optional: copy simulation results to desktop PC
Optional: copy simulation results into the archive system ⇒ account access to the archive needs to be set via BIAS
http://www.rrzn.uni-hannover.de/batchsystem.html?&L=1
Available computing power

3 clusters (for jobs needing lots of CPUs):
Tane: 96 nodes; 12 cores each @ 2.9 GHz; 48 GB
Paris: 11 nodes; 8 cores each @ 3 GHz; 64 GB
Taurus: 54 nodes; 12 cores each @ 2.66 GHz; 48 GB

14 SMP computers (for jobs needing lots of RAM):
Estragon: 16 cores @ 2.4 GHz; 96 GB
Vladimir, Lucky: 24 cores each @ 2.6 GHz; 256 GB
SMP: 9 nodes; 24 cores each @ 2.0 GHz; 256 GB
Centaurus: 32 cores @ 2.7 GHz; 512 GB
Helena: 160 cores @ 2.0 GHz; 640 GB
http://www.rrzn.uni-hannover.de/clustersystem.html?&L=1
Available computing power (cont.)
1 GPU workstation (for jobs able to use GPGPUs):
Tesla: 8 CPU cores @ 2.5 GHz; 24 GB
4 »Tesla« NVIDIA GPU cards; each card:
Total memory: 4 GB
Number of multiprocessors: 30
Number of GPU cores: 240
Very large projects can use the HLRN
The application process is separate from that of the computing centre
»Test accounts« are available
http://www.hlrn.de
Increasing computing power
[Figure: Rechenleistung des Clustersystems — compute performance of the cluster system (GFlops) and performance per watt (MFlops/W), plotted against date from 01.2008 to 05.2012]
Applications and Software
There is a large range of applications and software on the cluster system:
Chemistry, Biology, Engineering, Numerics, Mathematics, Physics, Statistics, Economics, Parallelisation tools, Software development, Visualisation
The »modules« environment makes things easier
The module command initialises the correct environment for the relevant application
Applications—Physics
Application Field Module name
Comsol       Multiphysics                   comsol
Matlab       Matrix and general numerics    matlab
Octave       Matrix and general numerics    octave
QuTiP        Quantum information            qutip
Image: Trajectories for a new 3D electron microscope (COMSOL) | Renke Scheuer, Institut für Mess- und Regelungstechnik, LUH
Applications—Mathematics
Application Field Module name
GAMS         Mathematical optimisation      gams
Maple        Symbolic mathematics           maple
Mathematica  Symbolic mathematics           mathematica
Applications—Economics and Statistics
Application Field Module name
Ox   Econometrics                          ox
R    Statistics and graphics               R
SAS  Statistical analysis and data mining  sas
Applications—Chemistry
Application Field Module name
CPMD       Molecular dynamics                   cpmd
Crystal    Electronic structure calculation     crystal
Gaussian   Electronic structure calculation     gaussian
GAMESS-US  General ab initio quantum chemistry  gamess_us
Gromacs    Molecular dynamics                   gromacs
MSINDO     Molecular dynamics                   msindo
Applications—Biology
Application Field Module name
Biopython         Python tools for biological computation       biopython
Blat              Fast sequence search                          blat
Bowtie            Gene sequence assembly                        bowtie
BWA               Burrows-Wheeler alignment tool                bwa
Edena             Very short reads assembler                    edena
Oases             Transcriptome assembler for very short reads  oases
PySam             Python module for SAM files                   pysam
Samtools          Sequence alignment/map format                 samtools
TGI Cluster Tool  Cluster large EST/mRNA datasets               tgicl
Velvet            Gene sequence assembly                        velvet
Applications—Engineering
Applications Field Module name
Abaqus   Finite element simulation         abaqus
ANSYS    Multiphysics                      ansys
CFX      Fluid dynamics                    cfx
Fluent   Fluid dynamics                    fluent
Gambit   Preprocessing (geometry/meshes)   gambit
HFSS     Electromagnetic field simulation  hfss
Marc     Finite element simulation         marc
Maxwell  Electromagnetic field simulation  maxwell
Applications—Engineering (cont.)
Application Field Module name
Nastran   Finite element simulation                   nastran
OpenFOAM  Fluid dynamics                              openfoam
Patran    Pre- and postprocessor for CAE simulations  patran
Creo      CAD/CAE design and development tool         creo/proe
SELFE     3D ocean modelling                          selfe
StarCCM+  Fluid dynamics                              starccm
StarCD    Fluid dynamics                              starcd
Image: Water level and salinity fluctuation in the Weser river estuary | Anna Zorndt, Franzius Institut, LUH
Applications—Simulation tools
Application Description Module name
Harminv  Waveform harmonic inversion  harminv
Meep     FDTD simulations             meep
Applications—Visualisation
Application Field Module name
Blender   3D graphics                        blender
Gnuplot   General data visualisation         gnuplot
Paraview  General data and 3D visualisation  paraview
Povray    »Persistence of Vision« raytracer  povray
QtiPlot   Fast data visualisation            qtiplot
VTK       The Visualisation Toolkit          vtk
Compilers
Application Languages Module names
GNU             C/C++, Java, Fortran  gcc, g++, gcj, gfortran
Intel           C/C++, Fortran        icc, ifort, intel.compiler
PGI             C/C++, Fortran        pgi
Sun Java        Java                  sun-java
Solaris Studio  C/C++, Fortran        solstudio
Nvidia          CUDA C                cudatoolkit
Numerical libraries
Library Field Module names
Geospatial Data Abstraction Library (GDAL)  Data processing            gdal
GNU Scientific Library (GSL)                General numerics           gsl
FFTW                                        Fourier transforms         fftw
Intel Math Kernel Library (MKL)             General numerics           imkl
LAPACK/BLAS                                 Linear algebra             lapack, blas
Multi-precision complex arithmetic          Complex arithmetic         mpc
Multi-precision floating point              Floating-point arithmetic  mpfr
Qhull                                       Computational geometry     qhull
General libraries and applications
Library Field Module names
LaTeX   Text processing                latex
libctl  Flexible control file library  libctl
PROJ    Cartographic projection        proj
Xerces  XML processing                 xerces-c
MPI implementations
MPI—the Message Passing Interface—is a library for communication between processes, used to run programs in parallel in arbitrary configurations
Implementation Description Module name
MPICH2       Standard MPICH installation              mpich2
MVAPICH      MPICH optimised for InfiniBand           mvapich2
Intel MPI    Intel’s implementation of MPI            impi
OpenMPI      OpenMPI                                  openmpi
PGI-MPICH    MPICH compiled with the PGI compilers    pgi-mpich
PGI-MVAPICH  MVAPICH compiled with the PGI compilers  pgi-mvapich
MPI does not work automatically; programs must be written to use it
Debuggers and Profilers
Application Description Module name
Valgrind              Call graph and memory analysis   valgrind
kCachegrind           Graphical interface to Valgrind  kcachegrind
Valkyrie              Graphical interface to Valgrind  valkyrie
Totalview             Parallel debugger                totalview
Intel Trace Analyser  Profiler                         itac
VTune                 Profiler                         vtune
MPE                   MPI program profiler             mpe2-impi
Scalasca              Parallel program profiler        scalasca
Intel Debugger        Debugger                         idb
GNU Debugger          Debugger                         gdb
Using the modules environment
First the modules environment needs to be initialised
Initialisation normally happens automatically when logging in
Inside a batch job, use the following as the first line of the batch script:
#!/bin/bash -login
In case the module command is unknown, explicitly initialise the environment using one of the following commands
In general:
source $MODULESHOME/init/`basename $SHELL`
Or with ksh, bash and csh respectively:
. $MODULESHOME/init/ksh
. $MODULESHOME/init/bash
. $MODULESHOME/init/csh
The full stop (period) and the space at the front of the line are important!
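As a quick sanity check, the `basename $SHELL` substitution above can be tried on its own; the `/bin/bash` value below is only an assumed example of what login sets:

```shell
# Assumed value for illustration; on the cluster, $SHELL is set at login
SHELL=/bin/bash
# basename strips the directory part, leaving just the shell's name,
# which selects the matching init file under $MODULESHOME/init
basename $SHELL    # prints "bash"
```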
http://www.rrzn.uni-hannover.de/modules.html?&L=1
Modules environment commands
Show all available modules
$ module avail
Load one or more modules
$ module load <modulename> <...>
Unload a module
$ module unload <modulename>
Show already loaded modules
$ module list
Show information about a module
$ module show <modulename>
Show help and detailed information about a module
$ module help <modulename>
Batch system
A batch system is an automated system that gives computing jobs fair access to a cluster’s resources. Jobs submitted to the batch system are put into a queue to wait to be run. A scheduler allocates the jobs to the available computing resources according to a predefined priority algorithm in order to achieve as much throughput as possible.
Simulations (compute jobs) are sent into the batch queue with the help of a batch script: a text file that describes the resources (computing time, main memory and number of CPUs) the simulation needs for its run. The batch script contains the commands one would enter at the command line in order to run the job; these commands are then executed automatically on the relevant compute node.
http://www.rrzn.uni-hannover.de/batchsystem.html?&L=1
Anatomy of a batch script
#!/bin/bash describes the shell to be used to run the script
Options given to the batch system:
Option lines start with #PBS
Describe such things as the job requirements, which queue should be used, etc.
Commands which prepare and run compute jobs, e.g.:
If necessary, initialise the modules environment (not needed with #!/bin/bash -login)
Load modules
Set environment variables
Change into the directory where the simulation shall be run
Run the program
In general, everything after a comment character (#) is ignored
Exceptions:
#! in the first line of the script
#PBS options
A basic batch script
#!/bin/bash -login
#PBS -N job_name
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=3gb

# show which computer the job ran on
echo "Job ran on:" $(hostname)

# change to work dir:
cd $BIGWORK

# run the program
./hello
PBS options
#PBS -N <name> Name of the job
#PBS -M <email> User’s email address
#PBS -m ae Send email at the end of a job (’e’), or if it aborts (’a’)
#PBS -j oe Join standard output and standard error streams into one file
#PBS -l nodes=<x>:ppn=<y> Request <x> nodes, <y> processor cores per node
#PBS -l walltime=<time> Maximum run time of the job (HH:MM:SS)
#PBS -l mem=<RAM> Total main memory (RAM) of the job e.g. 3600mb, 10gb
#PBS -q <queue name> Queue name e.g. all, test, helena
#PBS -W x="PARTITION:<partition name>" Name of the cluster partition (optional), e.g. paris, smp, tane, taurus
#PBS -v <variable_list> List of environment variables to export to the job
#PBS -V Export all environment variables in the current shell to the job
Batch system commands
Send jobs into the queue
$ qsub <options> <name of job script>
Interactive batch jobs
$ qsub -I -X (opens a shell on a compute node)
Show all jobs
$ qstat -a
Show all jobs with the respective nodes
$ qstat -n
Show the full output for a particular job
$ qstat -f <jobid>
Delete a job from the queue
$ qdel <jobid>
Move a job from one queue into another
$ qmove <queuename> <jobid>
Extended batch system commands
Show all jobs with a split view (RUNNING, IDLE, BLOCKED); »show queue«
$ showq
Show all existing reservations on the cluster system; »show reservations«
$ showres
Show the number of processors and respective runtimes currently available at this point in time; »show backfill«
$ showbf
Queues
all—for all kinds of jobs¹ (#PBS -q all; default setting)
test—for short test jobs (#PBS -q test)
only one node can be used; jobs which request more than one node in this queue will not run
helena—for large SMP jobs (#PBS -q helena)
only one node can be used; jobs which request more than one node in this queue will not run

¹ except jobs intended for the test or helena queues
Queues (limits)
Maximum resource requirements for the all and helena queues:
Number of simultaneously running jobs per user: 64
Number of cores per user: 768
Maximum wallclock limit: 200 hours (#PBS -l walltime=200:00:00)
Maximum resource requirements for the test queue:
47 GB main memory (#PBS -l mem=47gb)
6 hours of wallclock time (#PBS -l walltime=6:00:00)
1 node, 12 cores (#PBS -l nodes=1:ppn=12)
Available main memory resources
The operating system needs some main memory for itself; therefore the maximum possible resources are reduced
Available resources:
Tesla: 23 GB
Tane node: 47 GB
Taurus node: 47 GB
test-n001 (the test queue): 47 GB
Paris standard node: 62 GB
Estragon: 94 GB
Paris »fat« node: 125 GB
Vladimir and Lucky: 252 GB
SMP node: 252 GB
Centaurus: 504 GB
Helena: 630 GB
Things to note about job requirements
Please choose your job requirements carefully!
The wallclock time, main memory value and CPU/core number are important
Advantages of accurate specifications:
Better total throughput due to accurate planning
Jobs start sooner and are finished earlier
Jobs which use far fewer resources than those requested generate a warning email
Example batch scripts
serial program
parallel program
serial Matlab
parallel Matlab
serial Comsol
parallel Comsol
serial ANSYS
shared mem ANSYS
distr mem ANSYS
SAS
R
serial GAMS
parallel GAMS
serial OpenFOAM (airFoil2D)
serial OpenFOAM (motorBike)
MSINDO
Abaqus
Gaussian
Example batch script (serial program)
An example of a »serial« program, hello (only uses one processor).
#!/bin/bash -login
#PBS -N moin
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=3600mb

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# change to work dir:
cd $PBS_O_WORKDIR
# the program to run
./hello

⇐ Back to batch script list
Example batch script (serial Matlab program)
Run a Matlab program on one processor core.
#!/bin/bash -login
#PBS -N serialMatlab
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1:matlab
#PBS -l walltime=00:10:00
#PBS -l mem=3gb

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load matlab
# change to work dir:
cd $PBS_O_WORKDIR
# log file name
LOGFILE=$(echo $PBS_JOBID | cut -d"." -f1).log
# the program to run
matlab -nojvm -nosplash < hello.m > $LOGFILE 2>&1

⇐ Back to batch script list
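The log file name in the script above is built by cutting the job ID at the first dot. A standalone sketch, using a made-up job ID in the usual <number>.<server> form:

```shell
# Made-up job ID; the batch system sets $PBS_JOBID to something like this
PBS_JOBID="12345.batch.example"
# keep only the numeric part and append .log
LOGFILE=$(echo $PBS_JOBID | cut -d"." -f1).log
echo $LOGFILE    # prints "12345.log"
```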
Example batch script (parallel Matlab program)
Run a Matlab program on four (4) processor cores.
#!/bin/bash -login
#PBS -N ParallelMatlab
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=4:matlab
#PBS -l walltime=00:10:00,mem=4gb
# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load matlab
# change to work dir
cd $PBS_O_WORKDIR
# log file name
LOGFILE=$(echo $PBS_JOBID | cut -d"." -f1).log
# the program to run
matlab -nodesktop < lin_solve.m > $LOGFILE 2>&1

⇐ Back to batch script list
Example parallel Matlab program

function lin_solve(n)
fprintf('=============== START =================\n')
if nargin ~= 1
    n = 10000;
    fprintf('Using default matrix size: n = %d\n', n)
else
    n = str2num(n); % argument is a string; convert to num
    fprintf('Using the matrix size: n = %d\n', n)
end

tic

% set up the matrix to solve
A = rand(n);
y = rand(n,1);

% solve the matrix
x = A\y;

toc
fprintf('=============== END =================\n')
Example batch script (serial Comsol program)
Run Comsol on one processor.
#!/bin/bash -login
#PBS -N comsol_micromixer
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
#PBS -l mem=10gb
# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load comsol
# change to work dir:
cd $PBS_O_WORKDIR
# the program to run
comsol batch -inputfile micromixer.mph

⇐ Back to batch script list
Example batch script (parallel Comsol program)
Run Comsol on eight (8) CPU cores.
#!/bin/bash -login
#PBS -N comsol_micromixer_parallel
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l mem=10gb
# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load comsol
# change to work dir:
cd $PBS_O_WORKDIR
# work out the number of threads
export NUM_THREADS=$(wc -l $PBS_NODEFILE | cut -d" " -f1)
# the program to run
comsol batch -inputfile micromixer.mph -np $NUM_THREADS

⇐ Back to batch script list
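The NUM_THREADS line above counts the lines of $PBS_NODEFILE, which the batch system writes with one line per allocated processor core. A sketch with a temporary stand-in file (its contents are an assumption for illustration):

```shell
# Stand-in for $PBS_NODEFILE: one line per allocated processor core
PBS_NODEFILE=$(mktemp)
printf 'node1\nnode1\nnode1\nnode1\n' > $PBS_NODEFILE
# count the lines, exactly as the batch script does
NUM_THREADS=$(wc -l $PBS_NODEFILE | cut -d" " -f1)
echo $NUM_THREADS    # prints "4"
rm -f $PBS_NODEFILE
```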
Example batch script (serial ANSYS program)
Run ANSYS on one processor core.
#!/bin/bash -login
#PBS -N testcase.serial
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:20:00
#PBS -l mem=2gb
# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load ansys
# change to work dir
cd $PBS_O_WORKDIR
# start program for serial run;
# (assuming that an input file testcase.dat has been created before):
ansys130 -i testcase.dat -o serial.out

⇐ Back to batch script list
Example batch script (shared ANSYS program)
Run ANSYS on one node with several processor cores and shared memory.
#!/bin/bash -login
#PBS -N testcase.shared
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:20:00
#PBS -l mem=2gb
# load the relevant modules
module load ansys
# change to work dir
cd $PBS_O_WORKDIR
# calculate number of threads for shared memory computation
nthr=$(cat $PBS_NODEFILE | wc -l)
echo "nthreads = "$nthr
# start program
ansys130 -b -np $nthr -i testcase.dat -o shared.out

⇐ Back to batch script list
Example batch script (distributed ANSYS program)
Run ANSYS on several nodes.
#!/bin/bash -login
#PBS -N testcase.distr
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=2:ppn=4
#PBS -l walltime=00:20:00
#PBS -l mem=2gb
# load the relevant modules
module load ansys
# change to work dir
cd $PBS_O_WORKDIR
# set stacksize
ulimit -s 300000
# create correct HOST string for ANSYS call
create_ansys_machine_file machines
read HOST < machines
echo $HOST
# start program in distributed memory mode
ansys130 -b -dis -machines $HOST -mpi hpmpi -i testcase.dat -o distr.out

⇐ Back to batch script list
Example batch script (SAS program)
#!/bin/bash -login
#PBS -N seriellSAS
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
#PBS -l mem=5gb

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load sas
# change to work dir:
cd $PBS_O_WORKDIR
# the program to run
sas Simulation.sas

⇐ Back to batch script list
Example batch script (R program)
#!/bin/bash -login
#PBS -N seriellR
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=3600mb

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load R
# change to work dir:
cd $PBS_O_WORKDIR
# the program to run
R --slave < fanta22_korrektur.R

⇐ Back to batch script list
Example batch script (serial GAMS program)
Run a GAMS program on one (1) processor core.
#!/bin/bash -login
#PBS -N GAMS_trnsport
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=4gb

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load gams
# change to work dir
cd $PBS_O_WORKDIR
# the program to run
gams trnsport.gms lo=2 lf=transport_log.log

⇐ Back to batch script list
Example batch script (parallel GAMS program)
Run a GAMS program on four (4) processor cores.
#!/bin/bash -login
#PBS -N CLSP_Optimal
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:10:00
#PBS -l mem=4gb

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load gams
# change to work dir
cd $PBS_O_WORKDIR
# correctly specify the number of cores in cplex.opt!!
# the program to run
gams CLSP_Optimal.gms lo=2 lf=CLSP_Optimal.log

⇐ Back to batch script list
Example batch script (serial OpenFOAM program)
Run simpleFoam on one (1) processor core.
#!/bin/bash -login
#PBS -N airFoil2D
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=4gb
# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load openfoam/1.7.1
# initialise the OpenFOAM environment
source $foamDotFile
# change to work dir:
cd $PBS_O_WORKDIR/airFoil2D
# clean up from possible previous runs
./Allclean
# the program to run
simpleFoam

⇐ Back to batch script list
Example batch script (serial OpenFOAM program, motorBike)
Run simpleFoam on one (1) processor core.
#!/bin/bash -login
#PBS -N motorBike
####PBS -M [email protected]
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
#PBS -l mem=4gb
# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load openfoam/1.7.1
# initialise the OpenFOAM environment
source $foamDotFile
# change to work dir:
cd $PBS_O_WORKDIR/motorBike
# clean up from previous runs
./Allclean
# set up the mesh and the simulation
cp system/fvSolution.org system/fvSolution
cp -r 0.org 0 > /dev/null 2>&1
blockMesh
snappyHexMesh -overwrite
sed -i 's/\(nNonOrthogonalCorrectors\).*;/\1 10;/g' system/fvSolution
potentialFoam -writep
sed -i 's/\(nNonOrthogonalCorrectors\).*;/\1 0;/g' system/fvSolution
# the program to run
simpleFoam

⇐ Back to batch script list
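The two sed lines above rewrite the nNonOrthogonalCorrectors entry in system/fvSolution in place. A sketch of the same substitution on a made-up one-line fvSolution fragment:

```shell
# Made-up fvSolution fragment for illustration
f=$(mktemp)
echo 'nNonOrthogonalCorrectors 2;' > $f
# raise the corrector count to 10, as done before the potentialFoam step
sed -i 's/\(nNonOrthogonalCorrectors\).*;/\1 10;/g' $f
cat $f    # prints "nNonOrthogonalCorrectors 10;"
rm -f $f
```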
Example batch script (parallel program)
A parallel MPI program called ping_pong_advanced_send is run on two (2) processor cores.
#!/bin/bash -login
#PBS -N pingpong
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=2,walltime=00:10:00,mem=4gb
#PBS -W x=PARTITION:tane

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load impi
# change to work dir:
cd $PBS_O_WORKDIR
# the program to run in parallel
mpirun --rsh=ssh -machinefile $PBS_NODEFILE -np 2 -env I_MPI_DEVICE shm \
./ping_pong_advanced_send_c

⇐ Back to batch script list
Example batch script (MSINDO)
Example of running MSINDO on 6 processor cores.
#!/bin/bash -login
#PBS -N MgO_444
#PBS -M [email protected]
#PBS -m ae
#PBS -j oe
#PBS -l nodes=1:ppn=6
#PBS -l walltime=00:10:00
#PBS -l mem=16gb
#PBS -W x=PARTITION:tane:paris:kuh

# show which computer the job ran on
echo "Job ran on:" $(hostname)
# load the relevant modules
module load msindo

export KMP_STACKSIZE=64M
export OMP_DYNAMIC=.FALSE.
export OMP_NUM_THREADS=$(cat $PBS_NODEFILE | wc -l)

INPUTFILE="MgO_444.inp"

# create and change to work dir:
TEMPDIR=$BIGWORK/$(basename $INPUTFILE .inp).$$
mkdir -p $TEMPDIR
cd $TEMPDIR
LOGFILE=$PBS_O_WORKDIR/$INPUTFILE.out.$$
echo "Running on $OMP_NUM_THREADS cores" >> $LOGFILE

# the program to run
time msindo < $INPUTFILE >> $LOGFILE 2>&1
# clean up output files
if [ -s "fort.9" ]; then cp fort.9 $PBS_O_WORKDIR/$INPUTFILE.f9.$$; fi
if [ -n "$(ls *.dat)" ]; then cp *.dat $PBS_O_WORKDIR/; fi
if [ -n "$(ls *.molden)" ]; then cp *.molden $PBS_O_WORKDIR/; fi
if [ -n "$(ls *.xyz)" ]; then cp *.xyz $PBS_O_WORKDIR/; fi

⇐ Back to batch script list
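The scratch directory above is named after the input file with the .inp suffix stripped; appending .$$ (the shell's process ID) then makes the name unique per job. The suffix handling can be checked on its own:

```shell
INPUTFILE="MgO_444.inp"
# basename with a second argument also removes that suffix
basename $INPUTFILE .inp    # prints "MgO_444"
```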
Example batch script (Abaqus)
Abaqus running on 4 processor cores.
#!/bin/bash -login
#PBS -N llbeam
#PBS -M [email protected]
#PBS -j oe
#PBS -m ae
#PBS -l nodes=1:ppn=4
#PBS -l mem=15GB
#PBS -l walltime=00:50:00

# show which computer the job ran on
echo "Job ran on:" $(hostname)

# load the relevant modules
module load abaqus

# change to working directory
cd $PBS_O_WORKDIR

# set up simulation parameters
np=$(cat $PBS_NODEFILE | wc -l)
mnp=$(sort -u $PBS_NODEFILE | wc -l)
cp $PBS_NODEFILE hostfile
echo $np >> hostfile
echo $mnp >> hostfile
create_abaqus_host_list

# run the program
abaqus job=llbeam cpus=$np domains=$np parallel=domain mp_mode=mpi double interactive

⇐ Back to batch script list
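The np and mnp values above are the total core count and the number of distinct nodes, both derived from $PBS_NODEFILE. A sketch with a stand-in file (two nodes with two cores each is an assumed layout):

```shell
# Stand-in node file: two nodes, two cores each (assumption)
PBS_NODEFILE=$(mktemp)
printf 'node1\nnode1\nnode2\nnode2\n' > $PBS_NODEFILE
np=$(cat $PBS_NODEFILE | wc -l)       # total processor cores
mnp=$(sort -u $PBS_NODEFILE | wc -l)  # distinct nodes
echo "$np $mnp"    # prints "4 2"
rm -f $PBS_NODEFILE
```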
Example batch script (Gaussian)
Gaussian running on 4 processor cores.
#!/bin/bash -login
#PBS -N gaussian
#PBS -M [email protected]
#PBS -j eo
#PBS -m ae
#PBS -l nodes=1:ppn=4
#PBS -l mem=10gb
#PBS -l walltime=00:30:00

# show which computer the job ran on
echo "Job ran on:" $(hostname)

# load the relevant modules
module load gaussian

# change to working directory
cd $PBS_O_WORKDIR

# run the program
g09 < input.com > g09job.out

⇐ Back to batch script list
File system structure
Details of the various file systems
$HOME
Globally available home directory
Data are backed up; unlimited lifetime
Data volume is limited via the Unix quota system
Intended for scripts, programs and small final simulation results
$BIGWORK
Globally available work directory
Intended for large »work« files
69 TB disk space in total
The $BIGWORK variable points to /bigwork/<username>
Data are NOT backed up; data have a lifetime of 28 days (after the last change)
Details of the file systems (cont.)
Archive system
Long-term storage of files and data, including large amounts of data
Data are backed up to the university’s tape backup archive
Can be reached from the login nodes with the lftp command
3.3 PB disk space
More information (in German):
http://www.rrzn.uni-hannover.de/datensicherung.html
http://www.rrzn.uni-hannover.de/archiva1.html
When problems occur. . .
1 Check the batch script and, if necessary, also the program
2 Read the cluster documentation:
http://www.rrzn.uni-hannover.de/clustersystem.html?&L=1
3 Have you checked what Google has to say?
4 Ask a question in the cluster system forum:
http://www.rrzn.uni-hannover.de/forum.html
5 Send an error report with the following information to the help mailing list: [email protected]
Your username
The job ID number
When the job ran
The compute node upon which the job ran
The batch script
A short description of the problem
Any error output from the program, if available
The job output report (e.g. myjob.o12345) as an attachment
Useful links

http://www.rrzn.uni-hannover.de/clustersystem.html?&L=1
http://www.rrzn.uni-hannover.de/cluster-zugang.html?&L=1
http://www.rrzn.uni-hannover.de/batchsystem.html?&L=1
http://www.rrzn.uni-hannover.de/rechnerressourcen.html?&L=1
http://www.rrzn.uni-hannover.de/installierte_software.html?&L=1
http://www.rrzn.uni-hannover.de/handbuecher.html
Contact information
Questions, suggestions and problem reports:
[email protected]
Specialist consulting
Dr. Gerd Brand: [email protected]
Dr. Andreas Gerdes: [email protected]
Heimbrock: [email protected]
Dr. Holger Naundorf: [email protected]
Administration, general consulting
Dr. Paul Cochrane: [email protected]
Dobrindt: [email protected]
Njofang: [email protected]
Would you like a tour of the computers at the computing centre? Just ask!
Thank you!
:-)
Never work at Home!
Why it is a bad idea to have a link in /home that points to /bigwork:
root@avon:/home/nhXXXXXX# ls -l
total 4
lrwxrwxrwx 1 nhXXXXXX nhXX 17 Oct 17 2010 bigwork -> /bigwork/nhXXXXXX
root@avon:/home/nhXXXXXX#
/home is mounted via NFS
Data transfer via NFS to /home and via NFS back to BIGWORK
Long distance! Huge load for /home fileserver!