TRANSCRIPT
HA HPC with OpenNebula
Eliot Eshelman - Microway
2015-06-29
HPC - what is it good for?
Lattice Quantum Chromodynamics (QCD)
RBC/UKQCD collaboration; Research Team: Dirk Broemmel, Thomas Rae, Ben Samways; Investigators: Jonathan Flynn
Physics
HPC - what is it good for?
Tech-X VORPAL for the DOE and NNSA
Physics
HPC - what is it good for?
Astrophysics
Simulation of a supernova
Courtesy of Oak Ridge National Laboratory, U.S. Dept. of Energy
HPC - what is it good for?
Planetary Science
WRF 0.5 km simulation of Hurricane Sandy
NCAR CISL VAPOR visualizations
HPC - what is it good for?
Life Science
NAMD & GROMACS; visualized with VMD
HPC - what is it good for?
All Science!
https://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Documents/NERSCWorkloadAnalysisFeb2013.pdf
HPC - what is it good for?
Engineering: FEA, CFD, Multi-Physics
ALTAIR AcuSolve
HPC - what is it good for?
Machine Learning
NVIDIA DIGITS with Caffe from UC Berkeley
HPC - what is it good for?
Big Data
First: a discussion of scale
What types of HPC systems do we design?
● up to ~512 nodes
● budgets of $50K to $3M
Most leadership-class HPC sites use similar designs, but source from the big vendors.
10,000-foot view of an HPC cluster
HPC clusters ready to ship
Microway's Test Drive cluster
● Owned and maintained by Microway
● Used by customers for benchmarking
● Used by employees for testing, replicating customer issues & software development
● Not actually mission-critical, but designed to emulate those that are...
The Hardware
● (3) OpenNebula hosts
● (4) Parallel storage servers
● (6) Bare-metal CPU + GPU compute nodes
● Gigabit Ethernet
● 56 Gbps FDR InfiniBand
Physical Network Topology
Logical Infrastructure
HPC Cluster Services
Compute Nodes
● Remaining bare metal for now
○ Virtualizing GPUs has caveats
● Virtualizing the nodes gives admins and users far more flexibility
○ HPC users have very specific software needs
○ VMs can enable reproducibility
○ Some sites are trying out containers (Docker)
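To illustrate the container approach mentioned above: pinning an exact application version in an image gives every user an identical environment. This is a hypothetical sketch — the image name, tag, and paths are assumptions for illustration, not anything Microway ships:

```shell
# Hypothetical sketch: everyone pulling this tag gets the same
# compiler, MPI, and GROMACS build, so a run is reproducible on
# any compute node. Image name and mount paths are assumed.
docker run --rm \
    -v /shared/data:/data \
    hpc-apps/gromacs:5.0.4 \
    gmx mdrun -s /data/topol.tpr
```

The same idea applies to VM images: capture the full software stack once, then reuse it unchanged across nodes and over time.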
End Goal
● Each employee/customer can be assigned their own private HPC cluster
● Multiple cluster instances for:
○ Development
○ QA
○ Production
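Once the head-node and compute-node images are saved as OpenNebula templates, stamping out a private cluster instance is just template instantiation. The template names and node count below are assumptions for illustration:

```shell
# Hypothetical sketch: create a per-user "dev" cluster from saved
# templates — one head node plus four compute nodes. Template
# names ("hpc-head", "hpc-compute") are assumed, not actual.
onetemplate instantiate "hpc-head" --name "dev-head"
onetemplate instantiate "hpc-compute" --multiple 4
```

Repeating this with different names yields the separate Development, QA, and Production instances listed above.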
What we gain
Flexibility:
● Easy backups
● Easy restores
● Easy upgrades
● Easy rollbacks
● Faster software development

Customer sees:
● Better uptime
● Quicker upgrades
● Fewer bugs
● Better performance
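The "easy upgrades / easy rollbacks" point comes down to VM snapshots: snapshot a service VM before changing it, and revert if the change misbehaves. A hedged sketch — the VM ID and snapshot index are assumptions, and exact subcommands vary between OpenNebula releases:

```shell
# Hypothetical sketch: protect a service VM (ID 42 assumed) before
# an upgrade, then roll back if testing fails.
onevm snapshot-create 42 "pre-upgrade"
# ... perform the upgrade and test it ...
onevm snapshot-revert 42 0
```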
What we lose
Not much!
● a little bit of performance (~1% on CPU; up to 10% on I/O)
● no more direct access to InfiniBand (HPC folks like having access to bare metal)
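Direct InfiniBand access need not be lost forever: more recent OpenNebula releases can pass a host PCI device, such as an SR-IOV virtual function of the HCA, through to a guest. A hedged sketch of the VM-template fragment — the vendor/device/class IDs below are illustrative (a Mellanox ConnectX-3 VF) and must match what the actual host reports:

```
# Hypothetical VM template fragment: request passthrough of a
# Mellanox (vendor 15b3) InfiniBand virtual function. The IDs
# must match the host's monitored PCI inventory.
PCI = [
  VENDOR = "15b3",
  DEVICE = "1004",
  CLASS  = "0280"
]
```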
Other tools to investigate...
What's next?
● Got a project in mind?● Inspired to speak at our next meetup?
Get in touch!