TRANSCRIPT
U.S. Department of Energy
Pacific Northwest National Laboratory
9/18/00
Herding Penguins, or
Clusters for a Multi-User, Production Computing Environment
Robert Eades
Molecular Science Computing Facility
William R. Wiley Environmental Molecular Sciences Laboratory
[email protected] or 509-375-2279
Outline
• Who we are, where we are, and what we do (PNNL-EMSL-MSCF)
• Strategies for Linux
• Objectives for the MSCF’s Colony Linux Cluster
• The software space one needs to address
• Drivers for Computational Science and Engineering
• Overview of Colony
• Hardware and System Architecture
• System Operations Infrastructure and Software (Done & To Do)
• HPC Parallel Tools and Libraries – ARMCI & PeIGS
• HPC Scientific Applications
• MS3 (NWChem and Ecce) – Geosciences
• NWGrid & NWPhys – The Virtual Lung
• Usage profile on Colony and what it has cost
• Conclusions and Future Directions
• Team Members and Collaborators
• References
Pacific Northwest National Laboratory
Richland, Washington
As a national scientific user facility and a research organization, the mission of EMSL is to:
• Provide advanced resources to scientists engaged in fundamental research on the physical, chemical and biological processes that underpin critical scientific issues.
• Conduct fundamental research in molecular and computational sciences to achieve a better understanding of biological and environmental effects associated with energy technologies; to provide a basis for new and improved energy technologies; and in support of DOE's other missions.
• Educate scientists in the molecular and computational sciences to meet the demanding challenges of the future.
William R. Wiley Environmental Molecular Sciences Laboratory
EMSL Molecular Science Computing Facility
A National Scientific User Facility
• High Performance Computing Resources
• Scientific Data and Model Repository
• Graphics & Visualization Laboratory
• Molecular Science Software Suite
• Parallel Tools & Libraries
• Scientific Computing Consulting
Linux Strategies
The Napoleon Strategy – only two elements
• Show up
• See what’s happening
The Wellington Strategy
The Field of Waterloo 'Hougoumont' 1819 by J.M.W. Turner. Fitzwilliam Museum, Cambridge
• Show up
• See what’s happening
• Bring partners (Wellington had Blucher and the Prussians)
• Don’t delay taking decisive actions (Napoleon delayed his attack, due to heavy rains the night before, and lost the element of surprise)
Objectives for the MSCF’s Colony Linux Cluster
Computational resource to support several R&D projects
• PNNL Computational Science and Engineering Initiative
• DOE SC/BES Geosciences
• DOE SC/ASCR MICS ACTS (Advanced Computational Testing and Simulation) Toolkit

Q: Would you want a large-scale Linux cluster as the primary production HPC system, and if so, when?
A: Everyone had their own response. The MSCF differs from some other centers in that we do the software first and then buy the box, not vice versa.

Support development of an effective multi-user, production high performance computing environment
• System operations infrastructure
• HPC tools and libraries
• HPC applications
• Collaborative Problem Solving Environments
• Grid
The software space one needs to address for a multi-user, production computing environment
• Resource Management
• Communications/IO Libraries & Tools
• Scheduling and Meta-scheduling
• Allocations Management and Exchange
• System Management and Monitoring
• Collaborative Problem Solving Environments
• Parallel Solvers
• HPC Applications
• Grid Middleware
Problem Drivers for Computational Science and Engineering
Bioremediation
Colony
196 Pentium III processors
Giganet high-speed switch
97 gigaflops theoretical peak
Installed January 2000
Currently 27 users
Overview of Colony’s Hardware
• 98 dual Pentium III 500 MHz nodes.
• 128-node Giganet cLAN 5300/5000 switch fabric.
• 97 nodes connected to the Giganet switch.
• 2 home-directory NFS servers / login nodes, each with 1 GB of RAM and a 35 GB RAID 5 home file system.
• 3 scratch-filesystem servers / cluster management nodes, each with 1 GB of RAM and ~60 GB of RAID 0 scratch file system space.
• Compute nodes with 512 MB RAM and ~6 GB local scratch file system.
• 4 dual-boot nodes – Linux / Windows 2000, with both Giganet and Myrinet.
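As a sanity check, the quoted 97 gigaflops theoretical peak is consistent with the 97 Giganet-connected dual nodes, assuming one floating-point result per clock cycle per Pentium III (the flops-per-cycle figure is our assumption, not stated on the slide):

```python
# Back-of-envelope theoretical peak for Colony.  Assumes 1 FP result per
# cycle per processor (an assumption); counts only the 97 dual nodes
# attached to the Giganet fabric.
nodes = 97
cpus_per_node = 2
clock_hz = 500e6          # 500 MHz Pentium III
flops_per_cycle = 1       # assumed

peak = nodes * cpus_per_node * clock_hz * flops_per_cycle
print(peak / 1e9)         # 97.0 gigaflops, matching the quoted figure
```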
Scalable Hierarchical Infrastructure
• Control node: configuration database, scheduling, power management, allocation management, installs hierarchical nodes, collection point
• Login nodes: external network connection, NFS home filesystem server, user login (submit jobs, compiling)
• Hierarchical server nodes: install compute nodes, serial console management, parallel filesystem servers, resource management, monitoring/event management
• Compute nodes: workhorses for batch and interactive jobs
S. Jackson, R. Braby, G. Skouson, & R. Wescott, PNNL
System Operations Infrastructure and Software – Done so far
• Installed cfengine to handle propagation of account information and configuration files.
• Installed PBS to handle resource management.
• Installed the Maui Scheduler for job scheduling.
• Ported Qbank to Linux for resource allocation management.
• Installed PVFS to replace the NFS scratch directories.
• Installed Baytech RPC-3 power control units.
• Installed Cyclades multi-port serial adapters and set up serial consoles from all the nodes.
S. Jackson, R. Braby, G. Skouson, R. Wescott, & D. Jackson, PNNL
System Operations Infrastructure and Software – What’s in process
• Automated network installation – currently working on adapting IBM’s LIM (Linux Installation Manager) to fit our needs.
• Optimized compute kernel – working on stripping out unused device drivers and compiling for Pentium III. Also working on benchmarking before and after kernels to quantify any performance improvement.
• System event monitoring
• Enhanced capabilities and scaling for scheduling and allocation management
• Meta-scheduling
• Multi-site resource allocation exchange
• New resource manager
S. Jackson, R. Braby, D. Jackson, & G. Skouson (PNNL) and K. Yoshimoto (SDSC)
Meta-scheduling (Silver)
Successful demonstration of meta-scheduling with Silver achieved between PNNL (NWecs1) and SDSC (Blue Horizon) on July 8, 2000
• Earliest start-time (constraints: 16 tasks on 1 system at the earliest start-time)
• Intelligent job partitioning (intelligently partition a job to allow improved completion time: ran a 230-processor job in 4 hours rather than 10)
• Co-allocation (co-allocate resources of different types)
• Large job support (simulated, since advance approval was not obtained from SDSC to make reservations for an excess of 1150 processors)
In progress
• Implementing the Silver meta-scheduler on top of Colony and the IBM SPs in the MSCF
Next (this fall)
• Early user demonstration of meta-scheduling within the MSCF with production applications (MS3)
• PNNL-SDSC-others? demonstration of meta-scheduling with production applications (MS3)
D. Jackson & S. Jackson (PNNL) and K. Yoshimoto (SDSC)
MSCF Meta-scheduling with Silver – last night
• Demonstration was done on two computer systems running in production, using a production application, NWChem:
• Colony Linux cluster (PBS-Maui-Qbank)
• NWecs1 IBM SP (LoadLeveler-Maui-Qbank)
• Submission of an NWChem job capable of running on either NWecs1 or Colony (i.e., automatically selecting node requirements, executable, command-line arguments, etc., as needed to run the job).
• The meta job provides resource requirement information, allowing Silver to determine which system possesses the best resources.
• Maui reserves Qbank allocations as well as resources when it makes a reservation for the NWChem job. This guarantees that when the job is staged to Colony or NWecs1 it will have the needed Qbank allocations to run to completion.
D. Jackson & S. Jackson (PNNL)
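The selection-and-reservation logic described above can be sketched as a toy function. This is only an illustration of the idea (all names, fields, and the accounting model here are hypothetical; the real Silver queries Maui on each system and reserves Qbank allocations through the resource manager):

```python
# Toy sketch of Silver-style meta-scheduling: pick the system that can
# start the job earliest, filtering out systems lacking either the
# processors or the allocation balance, and reserve the allocation at
# reservation time so the staged job can run to completion.
# Hypothetical names and data model - not the real Silver/Maui/Qbank API.

def pick_system(job, systems):
    """Return the name of the system offering the earliest start."""
    needed_allocation = job["procs"] * job["hours"]
    candidates = [
        s for s in systems
        if s["free_procs"] >= job["procs"]
        and s["qbank_balance"] >= needed_allocation
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: s["earliest_start"])
    # Reserve the allocation along with the resources (mirrors the
    # Maui + Qbank behavior described on the slide):
    best["qbank_balance"] -= needed_allocation
    return best["name"]

job = {"procs": 16, "hours": 4}       # needs 64 processor-hours
systems = [
    {"name": "Colony", "free_procs": 64, "earliest_start": 0,
     "qbank_balance": 40},            # too little allocation left
    {"name": "NWecs1", "free_procs": 32, "earliest_start": 2,
     "qbank_balance": 500},
]
chosen = pick_system(job, systems)
print(chosen)   # NWecs1 - Colony starts sooner but lacks allocation
```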
HPC Parallel Tools and Libraries (some examples)
• MPI/Pro from MPI Software Technology, Inc. – http://www.mpi-softtech.com/
• Communications protocols and libraries: ARMCI, Global Arrays, Distant I/O
• Parallel solvers: PeIGS, NWPhys
ARMCI – portable one-sided communication library
Functionality
• put, get, accumulate (also with noncontiguous interfaces)
• atomic read-modify-write, mutexes and locks
• memory allocation operations
• fence operations
Characteristics
• simple progress rules (truly one-sided)
• operations ordered w.r.t. target (ease of use)
• compatible with message-passing libraries (MPI, PVM)
• simpler and less restrictive than MPI-2 one-sided (performance)
Applications
• distributed array libraries: Global Arrays (PNNL), Adlib (U. Syracuse)
• GPSHMEM – generalized portable SHMEM (Ames, PNNL), based on the original Cray T3D SHMEM interfaces
Jarek Nieplocha, PNNL
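The one-sided semantics listed above can be mimicked in a toy, single-process model. This is only a conceptual sketch: the real ARMCI is a C library whose put/get/accumulate move data between the address spaces of different processes with no action by the target; the class and method names below are illustrative stand-ins, not the ARMCI API.

```python
# Toy single-process model of ARMCI-style one-sided operations.
# The key property illustrated: the "target" side takes no action -
# the initiator alone completes put, get, accumulate, and
# read-modify-write against the target's memory.

class OneSidedWindow:
    """Stand-in for a remote memory region exposed to one-sided access."""

    def __init__(self, size):
        self.mem = [0.0] * size          # the "remote" buffer

    def put(self, offset, data):
        """Write data into the target buffer (no receive call needed)."""
        self.mem[offset:offset + len(data)] = data

    def get(self, offset, nelem):
        """Read nelem elements from the target buffer."""
        return self.mem[offset:offset + nelem]

    def accumulate(self, offset, data, scale=1.0):
        """mem[i] += scale * data[i]; atomic in the real library."""
        for i, v in enumerate(data):
            self.mem[offset + i] += scale * v

    def read_modify_write(self, offset, increment):
        """Atomic fetch-and-add: return the old value, then add."""
        old = self.mem[offset]
        self.mem[offset] = old + increment
        return old

win = OneSidedWindow(8)
win.put(0, [1.0, 2.0, 3.0])
win.accumulate(0, [10.0, 10.0, 10.0], scale=0.5)   # adds 5.0 to each
old = win.read_modify_write(0, 1.0)                # fetch-and-add
print(win.get(0, 3))                               # [7.0, 7.0, 8.0]
```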
Performance of the ARMCI get operation on Giganet (initial VIA version, un-optimized)
[Figure: bandwidth (MB/s) vs. message size in bytes (log scale), comparing VIA/cLAN and IP/cLAN]
Asymptotic bandwidth can be doubled by avoiding extra memory copies.
Jarek Nieplocha, PNNL
PeIGS – Parallel Eigensolver for Dense Real Symmetric Generalized and Standard Eigensystem Problems
[Figure: PeIGS 3 on Colony vs. NWmpp1, preliminary results, 12-Sep-2000; time to solution (s) vs. number of CPUs for a matrix of order n = 966]
George Fann, PNNL
• Inverse iteration using Dhillon-Parlett-Fann’s parallel algorithm (fast uniprocessor performance and good parallel scaling)
• Guaranteed orthonormal eigenvectors in the presence of large clusters of degenerate eigenvalues
• True packed storage
• Smaller scratch space requirements
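For readers unfamiliar with the problem PeIGS solves, a serial sketch of the standard real symmetric eigenproblem, including the orthonormality property PeIGS guarantees, can be written with NumPy. This is only a stand-in illustration of the mathematics (NumPy's serial `eigh` is not PeIGS, which is a distributed-memory library):

```python
# Serial illustration of the standard real symmetric eigenproblem
# A x = lambda x.  PeIGS solves this (and the generalized form) in
# parallel; numpy.linalg.eigh here just demonstrates the two properties
# the slide highlights: a correct decomposition and orthonormal
# eigenvectors.
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2.0                  # symmetrize to get real symmetric A

w, V = np.linalg.eigh(A)             # eigenvalues w, eigenvector columns V

# Residual of the decomposition: A V = V diag(w)
residual = np.linalg.norm(A @ V - V * w)

# Orthonormality of eigenvectors: V^T V = I (what PeIGS guarantees even
# for large clusters of degenerate eigenvalues)
ortho_err = np.linalg.norm(V.T @ V - np.eye(n))

print(residual < 1e-12, ortho_err < 1e-12)
```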
Molecular Science Software Suite: MS3
[Chart: monthly total downloads, Jun-98 through May-00, rising toward 1200 per month; milestones marked for a 1999 award and the 2000 FLC Award]
April 2000: NWChem reaches its 1000th download.
Parallel Scaling of NWChem
[Figure: parallel scaling of NWChem on Colony]
Edo Aprá, PNNL
New Tools Being Developed for Geochemistry
Natural mineral surfaces have highly variable and complex morphologies.
Goal: develop computational approaches that will allow us to investigate the degradation of environmental pollutants on natural mineral surfaces.
Exhibited: Mechanism of chromate reduction by natural biotite
Support:
• DOE Office of Basic Energy Sciences, Geosciences Research Program
• PNNL Computational Science & Engineering Initiative
E.J. Bylaska, J.R. Rustad, M. Dupuis, & A.R. Felmy, PNNL
Ab Initio Calculations of Semiconductors and Insulators
[Figure: perfect silicon-carbide crystal, a = 4.38 Å]
F. Gao, E. J. Bylaska, W. J. Weber and L. R. Corrales, PNNL

Native Defect Properties in ß-SiC
Silicon carbide (SiC) has been regarded as a potential structural component for use in gas-cooled fission reactors and fusion environments. Determination of defect formation and energetics is crucial for understanding the response of SiC to radiation damage and ion implantation.

Formation energy (eV) and formation volume (in units of Ω0):

Defect         TPA-MD   TMB-MD   Ab initio [1]   Ab initio (present work)   Formation volume (Ω0)
CTC             4.86     8.34      11.0             6.72                     1.06
CTS             2.51     6.68       8.6             6.83                     1.92
SiTC           15.18    10.52      14.7              -                       1.83
SiTS           14.64    14.33      15.0              -                       2.87
C+-Si<100>      9.03     5.88       -                3.59                    0.18
C-C <100>       6.41     5.97       -                3.16                    0.43
C-Si+ <100>     9.10    11.28       -               10.05                    1.27
Si-Si <100>    15.44    10.03       -                 -                      0.95

[Figure: C--Si<100> defect, Ef = 3.59 eV]
Extensible Computational Chemistry Environment (Ecce)
• Graphical user interfaces
• Calculation management
• Visualization
• Integrated scientific data management
• Security
• Automated distribution of client software to users
• Production support
July 2000: Implemented and operational on the Colony Linux Cluster with NWChem
Gary Black, PNNL
Virtual Lung from PNNL's Virtual Biology Center
NWGrid & NWPhys are designed to simulate coupled fluid dynamics and continuum mechanics in complex geometries using 3-D, hybrid, adaptive, unstructured grids.
• NWGrid – grid generation & setup toolbox
• NWPhys – collection of computational physics solvers
Harold Trease, PNNL
Particle Distribution in the Flow Airways (particle occurs in right branch of bifurcation)
[Figure labels: membrane wall, airway passage, particle]
Harold Trease, PNNL
Pressure Contours of the Flow Field throughout the Lung Airways
Particles occur in every right branch of a bifurcation.
Harold Trease, PNNL
MSCF Colony Linux Cluster Usage
May-Aug 2000
[Chart: percent of processor-hours used, by number of processors used in a job]
1-7: 1.4%   8-15: 0.8%   16-31: 4.9%   32-63: 77.6%   64-127: 12.5%   128+: 2.9%
93% of the usage is in parallel jobs with 32 or more processors.
Running at an 80% usage level for May-Aug 2000.
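The 93% figure follows directly from the chart's processor-hour shares (the two smallest bin labels are read here as spreadsheet-mangled "1-7" and "8-15"):

```python
# Processor-hour shares by job size, May-Aug 2000, from the usage chart.
usage = {"1-7": 1.4, "8-15": 0.8, "16-31": 4.9,
         "32-63": 77.6, "64-127": 12.5, "128+": 2.9}

# Share of usage in jobs of 32 or more processors:
large = sum(v for k, v in usage.items()
            if k in ("32-63", "64-127", "128+"))
print(round(large, 1))   # 93.0, matching the quoted 93%
```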
The Cost
Total cost of ownership to date – $476,933
• 97 Gflops Colony Linux Cluster – $416,118 (3 years maintenance & support included)
• Install (racks, electricians, etc.) – $10,511
• Staff labor over 10 months – $50,304
• Lots of free stuff
Assuming a 3-year system lifetime – ~$17K/month
• Cluster + maintenance – ~$12K per month
• Labor – ~$5K per month
• ~25% of our labor has been R&D, not system administration
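The monthly figures reconcile as follows (assuming, as a reading of the slide, that hardware and install are amortized over the 3-year lifetime and the labor run-rate of the first 10 months is projected forward):

```python
# Monthly cost sketch behind the ~$17K/month figure.
cluster = 416_118          # includes 3 years maintenance & support
install = 10_511
labor_10_months = 50_304

cluster_monthly = (cluster + install) / 36   # hardware over 3 years
labor_monthly = labor_10_months / 10         # observed labor run-rate
total_monthly = cluster_monthly + labor_monthly

print(round(cluster_monthly), round(labor_monthly), round(total_monthly))
# roughly 11851, 5030, 16881 -> the quoted ~$12K + ~$5K = ~$17K/month
```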
Conclusions and Future Directions
Would you want a large-scale Linux cluster as the primary production HPC system, and if so, when?
• Probably, but it’s about a year out still.
• Looks to be pretty cost-effective for HPC, if you have the staff and partners who know how to put it all together.
Still a lot of work to accomplish across the full software space. But unlike previous experiences, it’s mostly in the “infrastructure” and “libraries/tools” areas, not applications. Unfortunately, no one is significantly funding these efforts yet.
For now, find partners whose talents and interests complement each other, share common goals, and proceed.
• PNNL, SDSC, BYU, NCSA, OSU, others?
Team Members and Collaborators
Pacific Northwest National Laboratory
• Edo Aprá, Eric Bylaska, Gary Black, Ryan Braby, George Fann, David Jackson, Scott Jackson, Jarek Nieplocha, Gary Skouson, Harold Trease, Ralph Wescott
Brigham Young University
• Mark Clement, Quinn Snell
Ohio State University
• D.K. Panda
San Diego Supercomputer Center
• Victor Hazlewood, Philip Papadopoulos, Kenneth Yoshimoto
NCSA
• Rob Pennington
Dell
• Jenwei Hsieh, Anders Snorteland
Giganet
• Stephen Hird
MPI Software Technology
• Anthony Skjellum
Myricom
• Nan Boden
Sponsors & Industry Partners
U.S. Department of Energy, Office of Science
• Office of Biological and Environmental Research, EMSL Operations
• Office of Basic Energy Sciences, Geosciences
• Office of Advanced Scientific Computing Research, Mathematical, Information, and Computational Sciences
PNNL Laboratory Directed Research and Development
• Computational Science and Engineering Initiative
• Environmental Health Initiative
Industry Partners
• Dell, Giganet, Myricom, MPI Software Technology, IBM
References
• Computational Science and Engineering at PNNL – http://www.pnl.gov/cse/
• NWGrid – http://www.pnl.gov/cse/products/detailed.html#nwgrid
• NWPhys – http://www.pnl.gov/cse/products/detailed.html#nwphys
• Environmental Health Initiative at PNNL – http://www.pnl.gov/ehi/
• Virtual Biology Center – http://www.pnl.gov/ehi/4.2.stm
• EMSL – http://www.emsl.pnl.gov/
• MSCF – http://www.emsl.pnl.gov:2080/mscf/
• MS3 (Ecce-NWChem-ParSoft) – http://www.emsl.pnl.gov:2080/capabs/mscf/?/capabs/mscf/software/ms3-1999.html
• Parallel Tools and Libraries (ParSoft, ARMCI, Global Arrays, etc.) – http://www.emsl.pnl.gov:2080/docs/parsoft
• PeIGS – http://www.emsl.pnl.gov:2080/docs/nwchem/nwchem.html
• Qbank – http://www.emsl.pnl.gov:2080/docs/mscf/qbank-2.8/
• Colony – http://www.emsl.pnl.gov:2080/capabs/mscf/hardware/config_colony.html
References, continued
• Maui Scheduler – http://www.mhpcc.edu/maui/doc/
• Supercluster.org – http://www.supercluster.org
• Meta-scheduling (Silver) – http://www.supercluster.org/projects/silver/index.html and http://www.supercluster.org/projects/silver/msdemooverview.html
• Brigham Young University – http://www.byu.edu/home5.html
• San Diego Supercomputer Center – http://www.sdsc.edu/
• Dell – http://www.dell.com/us/en/gen/default.htm
• Giganet – http://www.giganet.com/index.asp
• Myricom – http://www.myri.com/
• MPI Software Technology – http://www.mpi-softtech.com/
• Napoleon, Wellington, and the Battle of Waterloo – http://website.lineone.net/~carpenter9/artist/turner-waterloo.htm
• Chukar Cherries – http://www.chukar.com/frames/welcome.html