Before We Start
• Sign in
• hpcXX account slips
• Laptops if you need them (Mac OS only)
Research Computing at Virginia Tech
Advanced Research Computing
Compute Resources
Blue Ridge Computing Cluster
• Resources for running jobs:
  – 408 dual-socket nodes with 16 cores/node
  – Intel Sandy Bridge-EP Xeon (8 cores) in each socket
  – 4 GB/core, 64 GB/node
  – Total: 6,528 cores, 27.3 TB memory
• Special nodes:
  – 130 nodes with 2 Intel Xeon Phi accelerators
  – 18 nodes with 128 GB
  – 4 nodes with 2 NVIDIA K40 GPUs
• Quad-data-rate (QDR) InfiniBand
HokieSpeed – CPU/GPU Cluster
• 206 nodes, each with:
  – Two 6-core 2.40 GHz Intel Xeon E5645 CPUs and 24 GB of RAM
  – Two NVIDIA M2050 Fermi GPUs (448 cores/socket)
• Total: 2,472 CPU cores, 412 GPUs, 5 TB of RAM
• Top500 #221, Green500 #43 (November 2012)
• 14-foot by 4-foot 3D visualization wall
• Recommended uses:
  – Large-scale GPU computing
  – Visualization
HokieOne – SGI UV SMP System
• 492 Intel Xeon X7542 (2.66 GHz) cores
  – Sockets: six-core Intel Xeon X7542 (Westmere)
  – 41 dual-socket blades for computation
  – One blade for system + login
• 2.6 TB of shared memory (NUMA)
  – 64 GB/blade, blades connected with NUMAlink
• Resources scheduled by blade (6 cores, 32 GB)
• Recommended uses:
  – Memory-heavy applications (up to 1 TB)
  – Shared-memory (e.g. OpenMP) applications
Ithaca – IBM iDataPlex
• 84 dual-socket quad-core Intel Nehalem 2.26 GHz nodes (672 cores in all)
  – 66 nodes available for general use
• Memory (2 TB total):
  – 56 nodes have 24 GB (3 GB/core)
  – 10 nodes have 48 GB (6 GB/core)
• Quad-data-rate (QDR) InfiniBand
• Operating system: CentOS 6
• Recommended uses:
  – Parallel MATLAB (28 nodes/224 cores)
  – Beginning users
Preview: Hardware Arriving Summer 2015
Node Type | Quantity | CPU | Memory | Local Disk | Other Features | Network
General | 100 | 2 x E5-2680 v3 (Haswell, 24 cores) | 128 GB | 1.8 TB | - | EDR IB
Large Direct Attached Storage | 16 | 2 x E5-2680 v3 (Haswell, 24 cores) | 512 GB | 43.2 TB (24 x 1.8 TB) | 2 x 200 GB SSD | EDR IB
GPU | 8 | 2 x E5-2680 v3 (Haswell, 24 cores) | 512 GB | 3.6 TB (2 x 1.8 TB) | NVIDIA K80 GPU | EDR IB
Large Memory | 2 | 4 x E7-4890 v2 (Ivy Bridge, 60 cores) | 3 TB | 10.8 TB (6 x 1.8 TB) | - | EDR IB
Storage Resources
Name | Intent | File System | Environment Variable | Per-User Maximum | Data Lifespan | Available On
Home | Long-term storage of files | NFS | $HOME | 100 GB | Unlimited | Login and compute nodes
Work | Fast I/O, temporary storage | Lustre (BlueRidge), GPFS (other clusters) | $WORK | 14 TB, 3 million files | 120 days | Login and compute nodes
Archive | Long-term storage for infrequently accessed files | CXFS | $ARCHIVE | - | Unlimited | Login nodes
Local Scratch | - | Local disk (hard drives) | $TMPDIR | Size of node hard drive | Length of job | Compute nodes
Memory (tmpfs) | Very fast I/O | Memory (RAM) | $TMPFS | Size of node memory | Length of job | Compute nodes
Old Home | Access to legacy files | NFS | - | Read-only | TBD | Login nodes
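As a sketch of how these environment variables are typically used inside a job (the file and program names below are illustrative, not part of the ARC examples):

cp $HOME/input.dat $TMPDIR/      # stage a hypothetical input file onto fast node-local scratch
cd $TMPDIR
./myprogram input.dat            # hypothetical program; runs against the local copy
cp results.out $WORK/            # keep results on Work for post-processing (purged after 120 days)
cp results.out $HOME/            # or on Home for long-term storage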
GETTING STARTED ON ARC’S SYSTEMS
Getting Started Steps
1. Apply for an account
2. Log in to the system via SSH
3. System examples:
   a. Compile
   b. Test (interactive job)
   c. Submit to scheduler
4. Compile and submit your own programs
ARC Accounts
1. Review ARC's system specifications and choose the right system(s) for you
   a. Specialty software
2. Apply for an account online
3. When your account is ready, you will receive confirmation from ARC's system administrators within a few days
Log In
• Log in via SSH
  – Mac/Linux have a built-in client
  – Windows users need to download a client (e.g. PuTTY)
System | Login Address (xxx.arc.vt.edu)
BlueRidge | blueridge1 or blueridge2
HokieSpeed | hokiespeed1 or hokiespeed2
HokieOne | hokieone
Ithaca | ithaca1 or ithaca2
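For example, from a Mac or Linux terminal the hpcXX training account would log in to BlueRidge with:

ssh [email protected]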
BLUERIDGE ALLOCATION SYSTEM
Blue Ridge Allocation System
Goals of the allocation system:
• Ensure that the needs of computationally intensive research projects are met
• Document hardware and software requirements for individual research groups
• Facilitate tracking of research
http://www.arc.vt.edu/userinfo/allocations.php
Allocation Eligibility
To qualify for an allocation, you must meet at least one of the following:
• Be a Ph.D.-level researcher (post-docs qualify)
• Be an employee of Virginia Tech and the PI for research computing
• Be an employee of Virginia Tech and the co-PI for a research project led by a non-VT PI
Allocation Application Process
1. Create a research project in the ARC database
2. Add grants and publications associated with the project
3. Create an allocation request using the web-based interface
4. Allocation review may take several days
5. Users may be added to run jobs against your allocation once it has been approved
Allocation Tiers
Research allocations fall into three tiers:
• Less than 200,000 system units (SUs)
  – 200-word abstract
• 200,000 to 1 million SUs
  – 1-2 page justification
• More than 1 million SUs
  – 3-5 page justification
Allocation Management
• Users can be added to the project: https://portal.arc.vt.edu/am/research/my_research.php
• glsaccount: allocation name and membership
• gbalance -h -a <name>: allocation size and amount remaining
• gstatement -h -a <name>: usage (by job)
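For example, to see how large an allocation is, how much remains, and how it has been used (the allocation name myalloc is a hypothetical placeholder; glsaccount reports your real allocation names):

glsaccount
gbalance -h -a myalloc
gstatement -h -a myalloc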
USER ENVIRONMENT
Environment
• Consistent user environment across systems
  – Using the modules environment
  – Hierarchical module tree for system tools and applications
Modules
• Modules are used to set the PATH and other environment variables
• Modules provide the environment for building and running applications:
  – Multiple compiler vendors (Intel vs. GCC) and versions
  – Multiple software stacks: MPI implementations and versions
  – Multiple applications and their versions
• An application is built with a certain compiler and a certain software stack (MPI, CUDA)
  – Modules for software stack, compiler, applications
• The user loads the modules associated with an application, compiler, or software stack
  – Modules can be loaded in job scripts
Modules
• Modules are used to set up your PATH and other environment variables
% module {lists options}
% module list {lists loaded modules}
% module avail {lists available modules}
% module load <module> {add a module}
% module unload <module> {remove a module}
% module swap <mod1> <mod2> {swap two modules}
% module help <mod1> {module-specific help}
Module commands
module | list options
module list | list loaded modules
module avail | list available modules
module load <module> | add a module
module unload <module> | remove a module
module swap <mod1> <mod2> | swap two modules
module help <module> | module-specific help
module show <module> | module description
module reset | reset to defaults
module purge | unload all modules
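A minimal interactive sequence using these commands might look like the following (the intel and gcc module names match the compiler defaults described below; exact module names and versions vary by system):

% module list                  {see what is loaded by default}
% module swap intel gcc        {replace the Intel compiler with GCC}
% module avail                 {see which modules can now be loaded}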
Modules
• Available modules depend on:
  – The compiler (e.g. Intel, GCC) selected, and
  – The MPI stack selected
• Defaults:
  – BlueRidge: Intel + mvapich2
  – HokieOne: Intel + MPT
  – HokieSpeed, Ithaca: Intel + OpenMPI
Hierarchical Module Structure
Modules
• The default modules are provided for minimum functionality.
• Module dependencies on the choice of compiler and MPI stack are taken care of automatically.
module purge
module load <compiler>
module load <mpi stack>
module load <high-level software, e.g. PETSc>
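A concrete instance of this pattern (the petsc module name is an assumption for illustration; run module avail to see the actual names and versions on each system):

module purge
module load intel
module load mvapich2
module load petsc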
JOB SUBMISSION & MONITORING
Job Submission
• Submission via a shell script
  – Job description: nodes, processes, run time
  – Modules & dependencies
  – Execution statements
• Submit the job script via the qsub command:
  qsub <job_script>
Batch Submission Process
• Queue: the job script waits for resources.
• Master: the compute node that executes the job script and launches all MPI processes.
Job Monitoring
• Determine job status and, if it is pending, when it will run
Command | Meaning
checkjob -v JOBID | Get the status and resources of a job
qstat -f JOBID | Get the status of a job
showstart JOBID | Get the expected job start time
qdel JOBID | Delete a job
mdiag -n | Show the status of cluster nodes
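For example, after submitting a job and receiving a job number from the scheduler (25770 is the job number used in the walkthrough later in this session):

checkjob -v 25770      # detailed status and resources for the job
showstart 25770        # estimated start time if the job is still queued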
Job Execution
• Order of job execution depends on a variety of parameters:
  – Submission time
  – Queue priority
  – Backfill opportunities
  – Fairshare priority
  – Advanced reservations
  – Number of actively scheduled jobs per user
Examples: ARC Website
• See the Examples section of each system page for sample submission scripts and step-by-step examples:
  – http://www.arc.vt.edu/resources/hpc/blueridge.php
  – http://www.arc.vt.edu/resources/hpc/hokiespeed.php
  – http://www.arc.vt.edu/resources/hpc/hokieone.php
  – http://www.arc.vt.edu/resources/hpc/ithaca.php
Getting Started
• Find your training account (hpcXX)
• Log into BlueRidge
  – Mac: ssh [email protected]
  – Windows: use PuTTY
    • http://www.chiark.greenend.org.uk/~sgtatham/putty/
    • Host Name: blueridge2.arc.vt.edu
Example: Running MPI_Quad
• Source file: http://www.arc.vt.edu/resources/software/mpi/docs/mpi_quad.c
• Copy the file to the cluster
  – wget command (see the sketch below)
  – Could also use scp or sftp
• Build the code
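A sketch of the copy step (run wget on the cluster login node; for scp the destination path shown is illustrative):

wget http://www.arc.vt.edu/resources/software/mpi/docs/mpi_quad.c

or, from your own machine:

scp mpi_quad.c [email protected]:~/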
Compile the Code
• The Intel compiler is already loaded:
  module list
• Compile command (the executable is mpiqd):
  mpicc -o mpiqd mpi_quad.c
• To use GCC instead, swap it out:
  module swap intel gcc
Prepare Submission Script
1. Copy the sample script:
   cp /home/TRAINING/ARC_Intro/it.qsub .
2. Edit the sample script:
   a. Walltime
   b. Resource request (nodes/ppn)
   c. Module commands (add Intel & mvapich2)
   d. Command to run your job
3. Save it (e.g., mpiqd.qsub)
Submission Script (Typical)
#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -q normal_q
#PBS -W group_list=ithaca
#PBS -A AllocationName    <-- only for BlueRidge
module load intel mvapich2
cd $PBS_O_WORKDIR
echo "MPI Quadrature!"
mpirun -np $PBS_NP ./mpiqd
exit;
Submission Script (Today)
#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -q normal_q
#PBS -W group_list=training
#PBS -A training
#PBS -l advres=NLI_ARC_Intro.11
module load intel mvapich2
cd $PBS_O_WORKDIR
echo "MPI Quadrature!"
mpirun -np $PBS_NP ./mpiqd
exit;
Submit the job
1. Copy the files to $WORK:
   cp mpiqd $WORK
   cp mpiqd.qsub $WORK
2. Navigate to $WORK:
   cd $WORK
3. Submit the job:
   qsub mpiqd.qsub
4. The scheduler returns a job number:
   25770.master.cluster
Wait for job to complete
1. Check job status:
   qstat -f 25770   or   qstat -u hpcXX
   checkjob -v 25770
2. When complete:
   a. Job output: mpiqd.qsub.o25770
   b. Errors: mpiqd.qsub.e25770
3. Copy results back to $HOME:
   cp mpiqd.qsub.o25770 $HOME
Resources
• ARC Website: http://www.arc.vt.edu
• ARC Compute Resources & Documentation: http://www.arc.vt.edu/resources/hpc/
• New Users Guide: http://www.arc.vt.edu/userinfo/newusers.php
• Frequently Asked Questions: http://www.arc.vt.edu/userinfo/faq.php
• Unix Introduction: http://www.arc.vt.edu/resources/software/unix/