Research Computing with Newton Gerald Ragghianti Newton HPC workshop Sept. 3, 2010

Page 1

Research Computing with Newton

Gerald Ragghianti

Newton HPC workshop

Sept. 3, 2010

Page 2

What is the Newton Program?

• Research computing support
  • Infrastructure management
  • Consultation
  • Training

• Research Objectives
  • Effectiveness
  • Efficiency
  • Capability

Diagram layers, in the order shown on the slide:

• User applications
• Computational environment (OS, cluster management, software)
• Computing hardware
• Computing infrastructure (space, network, power, cooling)
• Community organization (policies, membership)

Page 3

The Newton cluster

• “Normal” Linux compute cluster
• 295 computers
• 2500 processors
• 5 TB RAM
• 40 Gbit/sec Infiniband
• 80 TB storage

[Diagram of the cluster layout. Labels: head node, interactive nodes, compute nodes, storage servers, Lustre storage, Ethernet network, Infiniband network, external network.]

Page 4

Newton cluster machines

[Rack diagram: Rack 1 through Rack 8. Equipment shown:]

• Dell R410: tao000 to tao119
• Dell 1950: zeta00 to zeta31, gamma00 to gamma31
• X2200M2: alpha00, alpha02 to alpha11, lustre0 to lustre4
• Sun X2200M2 (head)
• Dell R510: nfs-mrail0, lustre-oss-0 to lustre-oss-3, lustre-mds
• Dell 1850: isaac, admin, login0, login1
• Dell R900: epsilon0, epsilon1
• SunFire X4540 (thumper-spanier), SunFire X4500 (thumper)
• C6100 (multiple units)
• EMC CX300 SAN (multiple units)
• Qlogic IB 122000, Qlogic IB 123000, Cisco Infiniband
• Dell 6248 Ethernet, PC 6248 switch, PC 3548 switch
• KVM, console, PDU

Legend: server, storage server, compute node, login compute node, Infiniband switch, Ethernet switch, management, power distribution, empty

Page 5

Getting started

• SSH to login.newton.utk.edu using NetID
• Transfer files with scp, sftp, or FileZilla
• Display graphics with X11, xorg, or Xming
  • Requires X11 “tunneling” through SSH client

$ ssh [email protected]: ***************
[gragghia@newton1 ~]$ ls
Test.sge filename.txt
[gragghia@newton1 ~]$ w
10:36:49 up 32 days, 15:07, 20 users, load average: 1.98, 1.81, 1.88
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
gragghia pts/0 poltth Tue05 1:05 1.39s 1.39s -bash
mkzadd pts/1 bkg.engr.utk.edu Thu18 15:16m 0.06s 0.06s -bash
Krrrccc pts/2 ares.bio.utk.edu 03Aug10 3days 0.03s 0.03s -bash
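The connection steps above might look like the following from a workstation terminal. This is a minimal sketch: `yournetid` and `results.dat` are placeholders, not names from the workshop, and `-X` is the OpenSSH option that requests X11 forwarding (the “tunneling” mentioned above). The commands are only printed here so they can be reviewed before being run.

```shell
#!/bin/bash
# Placeholders (assumptions): substitute your own NetID and file name.
NETID="yournetid"
HOST="login.newton.utk.edu"   # login host named on the slide

# Log in with X11 forwarding so graphical programs display locally.
LOGIN_CMD="ssh -X ${NETID}@${HOST}"

# Copy a local file to your home directory on the cluster.
UPLOAD_CMD="scp results.dat ${NETID}@${HOST}:"

# Print the commands for review instead of executing them.
echo "$LOGIN_CMD"
echo "$UPLOAD_CMD"
```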

Page 6

Environment management

• Modules utility
  • Manages environment variables and aliases
  • User chooses applications and libraries to use
  • Allows multiple versions to be available

• Example use:
  • See available modules: “module avail”
  • Load a module: “module add R”
  • Unload a module: “module unload R”
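Since modules work by changing environment variables and aliases, the effect of loading one can be seen by comparing `PATH` before and after. A small sketch, assuming the `module` command is defined in the current shell and that an `R` module exists as in the example above; on a machine without the Modules utility the script only reports that and stops.

```shell
#!/bin/bash
# Compare PATH before and after loading a module.
# Assumption: an "R" module is installed, as in the slide's example.
if type module >/dev/null 2>&1; then
    path_before="$PATH"
    module add R                 # load the module
    module list                  # show what is currently loaded
    if [ "$PATH" != "$path_before" ]; then
        echo "PATH changed after loading R"
    fi
    module unload R              # undo the change
    MODULES_FOUND=yes
else
    echo "module command not found; run this on the cluster"
    MODULES_FOUND=no
fi
```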

Page 7

Resource Management: The Grid Engine

1. Accepts job requests
  • Executable to run
  • Execution time
  • Parallelization
  • RAM needed

2. Finds available resources (compute nodes)

3. Reserves and uses resources

4. Returns output

Page 8

A simple job

1. Create a job request file:

#$ -q short*
#$ -cwd
#$ -N Test
uname -a
sleep 30

2. Submit job
$ qsub job.sge

3. Monitor job
$ qstat -g t

4. View result log files
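The four steps above can be strung together in one script. This is a sketch rather than the workshop's own procedure: it writes the job request file shown on the slide to `job.sge` and submits it only if `qsub` is on the path. By default Grid Engine names the log files after the job name and job ID (for example `Test.o<jobid>` for standard output and `Test.e<jobid>` for standard error), so with `-cwd` they appear in the submission directory.

```shell
#!/bin/bash
# Step 1: write the job request file from the slide.
cat > job.sge <<'EOF'
#$ -q short*
#$ -cwd
#$ -N Test
uname -a
sleep 30
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub job.sge        # Step 2: submit the job
    qstat -g t          # Step 3: monitor it
    # Step 4: list the job's log files, if any have been written yet.
    ls Test.o* Test.e* 2>/dev/null || true
else
    echo "qsub not found; job.sge written but not submitted"
fi
```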

Page 9

More Sophistication: Array jobs
» Run the same job multiple times

1. Create data files (optional)
$ ~gragghia/workshop/make_datafiles.sh

2. Create a job request file with “-t” option:

#$ -q short*
#$ -cwd
#$ -N Array
#$ -t 1-10
md5sum data-$SGE_TASK_ID.dat

3. Submit job
$ qsub job.sge

4. Monitor job
$ qstat -g t

5. View result log files
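Grid Engine runs the script once per task and sets `SGE_TASK_ID` to a different value from the `-t` range each time. The effect can be previewed without a cluster by looping over the same range by hand. A sketch under assumptions: the data files here are throwaway stand-ins created in a temporary directory, not the output of `make_datafiles.sh`.

```shell
#!/bin/bash
# Local dry run of the array job: emulate tasks 1-10 with a loop.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in data files (hypothetical contents).
for i in $(seq 1 10); do
    echo "sample data $i" > "data-$i.dat"
done

# Each iteration does what one array task would do on the cluster.
for SGE_TASK_ID in $(seq 1 10); do
    md5sum "data-$SGE_TASK_ID.dat"
done > checksums.txt

cat checksums.txt
```

On the cluster the loop is unnecessary: the scheduler starts the ten tasks itself, possibly at the same time on different nodes.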

Page 10

A parallel job: MPI

1. Download the software:
$ wget http://newton.utk.edu/workshop/hello.tar

2. Extract the software:
$ tar -vxf hello.tar

3. Select MPI version:
$ module add openmpi/1.4.2/intel

4. Compile the application:
$ cd hello
$ make

5. Create a batch submit file:

#$ -N Hello
#$ -q short*
#$ -cwd -V
#$ -pe openmpi* 16
mpirun hello
sleep 30

6. Submit the job
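Changing the core count means editing the `-pe` line, so it can help to generate the submit file from a small function. The sketch below is illustrative only: `make_mpi_job` is a hypothetical helper, and the queue and parallel environment names are copied from the slide.

```shell
#!/bin/bash
# make_mpi_job NAME SLOTS PROGRAM: write NAME.sge (hypothetical helper).
make_mpi_job() {
    local name="$1" slots="$2" program="$3"
    {
        printf '#$ -N %s\n' "$name"
        printf '#$ -q short*\n'
        printf '#$ -cwd -V\n'
        printf '#$ -pe openmpi* %s\n' "$slots"
        printf 'mpirun %s\n' "$program"
    } > "$name.sge"
}

# Recreate the slide's request for 16 slots, then show it.
make_mpi_job Hello 16 hello
cat Hello.sge
```

Submit the result with `qsub Hello.sge`. The `-V` flag exports the current environment to the job, so the MPI module selected in step 3 is visible when the job runs.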

Page 11

Compiling and Installing Software

Example: Fractal generator

1. Find the software

2. Transfer to Newton
  • Direct: wget http://newton.utk.edu/workshop/gmandel.tgz
  • Indirect: Download to workstation and scp (sftp)

3. Extract the source code
  1. Uncompressed: tar
  2. Compressed: gunzip or unzip

4. Configure the software:
$ ./configure --prefix=$HOME/gmandel

5. Compile:
$ make

6. Install:
$ make install

$ wget http://newton.utk.edu/workshop/gmandel.tgz
$ tar -vzxf gmandel.tgz
$ ./configure --prefix=$HOME/gmandel
$ make install
…
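Installing under `$HOME/gmandel` keeps everything inside the user's own directory, but the shell will not find the program until that location is on `PATH`. A minimal sketch, assuming the package follows the common convention of installing executables into `bin` under the prefix (the slides do not say where gmandel puts its files):

```shell
#!/bin/bash
# Prefix passed to ./configure on the slide.
PREFIX="$HOME/gmandel"

# Assumption: executables are installed in $PREFIX/bin.
export PATH="$PREFIX/bin:$PATH"

# Show the first entry on the search path to confirm the change.
echo "$PATH" | cut -d: -f1
```

Adding the `export` line to a shell startup file would make the change persist across logins.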

Page 12

Commercial Applications

• Matlab
  • Graphical (interactive)
  • Batch mode (parallel): matlab -r <Function>
• SAS
• SPSS

$ module load matlab
$ matlab
$ matlab -r 'TestFunction'

Page 13

More Information

• Newton Program website: http://newton.utk.edu/
  • Program policies
  • Documentation
  • Meetings / support / consulting schedule

• Research Computing Mailing List: [email protected]

Visit http://oit.utk.edu/workshops/eval/
• Section ID: Newton_Cluster-5