TRANSCRIPT
OpenStack for HPC: Best Practices for Optimizing Software-Defined Infrastructure
SC16 Birds of a Feather session. Organized by Mike Lowe, Blair Bethwaite, Stig Telfer, Robert Budden, Tim Randles, and Jonathan Mills. (Chris Hoge from the OpenStack Foundation is also present and available for questions during the conference: [email protected])
OpenStack Scientific Working Group information: https://wiki.openstack.org/wiki/Scientific_working_group
OpenStack for Scientific Research PDF: http://openstack.org/assets/science/OpenStack-CloudandHPC6x9Booklet-v4-online.pdf
http://www.openstack.org/science
https://etherpad.openstack.org/p/SC16_BOF_Openstack_for_HPC
http://bit.ly/2fdXAdJ
Description
OpenStack is becoming increasingly capable of managing HPC infrastructure and supporting HPC workloads. However, performance overheads, network integration, and system complexity combine to pose a daunting challenge. The optimal outcome is to achieve all the benefits of software-defined infrastructure without paying any of the penalties. How can we get closer to achieving this? This BoF is aimed at architects, administrators, and software engineers who are interested in designing and deploying OpenStack infrastructure for HPC but are unsure how to begin. Speakers from OpenStack's Scientific Working Group will share their experiences and engage the audience in discussion.
Anyone willing to share deployment info and/or use cases: Jonathan Mills, Robert Budden, Blair Bethwaite, Tim Randles, Mike Lowe, Stig Telfer, Clyde Jones, Andrew Young.
System Complexity ● Why not everything in Docker? / Shifter / Singularity / rkt / Charliecloud
○ Containers can make upgrades simpler, but from the operator's standpoint, moving all of the guests to containers doesn't simplify operations: the operator still has an image repository and the underlying infrastructure to maintain
● When to use one or other (or both)?
DIY Approach vs Turnkey Solution
Moving beyond the 5 core services (Nova, Keystone, Neutron, Glance, Cinder)
● Murano as a method to disseminate scientific applications across clouds
● Ironic
● Magnum?
● Swift for object storage
● Cyborg (formerly Nomad) - new project for adding devices (accelerators, SR-IOV, etc.)
● Integration / interaction with Puppet / Ansible / Chef / SaltStack / whatever
Caching and optimizing the control plane
● Install - Kolla?
● Job-based architecture comment - look at CloudLab: https://www.cloudlab.us/
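For context on the Kolla question above: Kolla's containerized control plane is driven by kolla-ansible. A deployment sketch might look roughly like the following (the inventory file name and host layout are assumptions, not from the session):

```shell
# Illustrative kolla-ansible workflow (inventory name "multinode" is an assumption)
kolla-genpwd                                  # generate service passwords
kolla-ansible -i multinode bootstrap-servers  # prepare target hosts
kolla-ansible -i multinode prechecks          # validate before deploying
kolla-ansible -i multinode deploy             # deploy containerized services
kolla-ansible post-deploy                     # write the admin openrc file
```

This is a deployment fragment, not a runnable script; it assumes target hosts and a populated /etc/kolla/globals.yml.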
User interfaces used in production
● Horizon (Scientific Cloud)
● DC2 from George Mason University
● Atmosphere / iPlant / CyVerse
● HPC resource manager integration (HPC + Cloud)
● Cluster-Cloud convergence
○ Slurm's Elastic Computing
○ MonARCH
● API documentation is incomplete and implementations are partial
○ Shade may be a better alternative to OSC (the OpenStackClient)
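On Slurm's Elastic Computing (mentioned above): the cloud-bursting hooks live in slurm.conf, with resume/suspend scripts that would call the OpenStack API to boot and delete instances. A minimal sketch, in which the script paths, node names, and sizes are all assumptions:

```
# slurm.conf fragment (illustrative; paths, node names, and sizes are assumptions)
SuspendProgram=/usr/local/sbin/slurm-suspend.sh   # deletes idle cloud instances
ResumeProgram=/usr/local/sbin/slurm-resume.sh     # boots instances via the OpenStack API
SuspendTime=600        # seconds idle before a node is powered down
ResumeTimeout=300      # seconds allowed for an instance to boot and register

NodeName=cloud[001-100] State=CLOUD CPUs=16 RealMemory=65536
PartitionName=cloud Nodes=cloud[001-100] MaxTime=INFINITE State=UP
```

The State=CLOUD marker tells Slurm these nodes may not exist until the resume program creates them.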
Staff support required to keep OpenStack running / up to date?
Performance Penalties
The impact of not doing proper compute and memory tuning for HPC workloads is high - as bad as 2-5x for mixed I/O and GPU compute-centric workflows. Using the Linux KVM hypervisor, local compute and memory are readily optimized through:
● CPU core pinning
● NUMA pinning
● CPU and NUMA topology exposure (for large guests)
● CPU model and extension pass-through
● Hugepages backing guest memory
Nova allows these parameters to be configured on a fairly granular basis via flavor or image metadata/properties. There is some anecdotal evidence that emulated machine types and CPU+memory topology confuse HPC helper libraries such as hwloc in Open MPI. Using the deadline I/O scheduler on the hypervisor can improve guest I/O. https://wiki.openstack.org/wiki/Enhanced-platform-awareness-pcie
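As a sketch of the flavor-metadata approach, the tuning knobs above map onto Nova flavor extra specs roughly as follows (the flavor name and sizes are assumptions):

```shell
# Create a dedicated HPC flavor (name and sizes are illustrative)
openstack flavor create hpc.large --vcpus 16 --ram 65536 --disk 40

# Pin each vCPU to a dedicated host core
openstack flavor set hpc.large --property hw:cpu_policy=dedicated

# Expose NUMA topology to the guest
openstack flavor set hpc.large --property hw:numa_nodes=2

# Back guest memory with hugepages
openstack flavor set hpc.large --property hw:mem_page_size=large
```

CPU model pass-through is set on the hypervisor side instead, via cpu_mode = host-passthrough in the [libvirt] section of nova.conf. These commands are a fragment against a live cloud, not a runnable script.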
Identity and Federation
Keystone to Keystone
● Massachusetts Open Cloud: WIP - utilize storage across cloud boundaries
● OpenID Connect
● Shibboleth
● Integration with existing account systems
● Alternate Keystone backends
○ Kerberos
○ Active Directory (already works) - slow lookups (Kilo)?
○ Red Hat IdM / IPA / FreeIPA (already works) - then Kerberos too, no?
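For reference, pointing Keystone's identity backend at an existing LDAP / Active Directory tree looks roughly like the following keystone.conf fragment (the server name, DNs, and attribute choices are assumptions):

```
# keystone.conf fragment (illustrative; server and DNs are assumptions)
[identity]
driver = ldap

[ldap]
url = ldap://ad.example.org
user_tree_dn = ou=People,dc=example,dc=org
user_objectclass = person
user_id_attribute = sAMAccountName
user_name_attribute = sAMAccountName
```

In practice, service accounts (nova, glance, ...) are usually kept in SQL via Keystone's domain-specific backend support, with only the human-user domain backed by LDAP.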
Network
Integration
● L2 or L3 based (e.g. host-based routing)
○ More than one way to do L3 - host-based routing or tunnel-endpoints on ToR switches
● Provider networks, gateways
○ Provider networks are a good option for long-lived clusters, as the cloud admin can have more control, e.g. of address space, when the provider net is integrated with cloud-external networks - e.g., the VLAN mechanism driver with L3 gateways on core routers
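A provider network of the kind described above might be created roughly as follows (the network names, VLAN ID, and subnet range are assumptions):

```shell
# Create a VLAN-backed provider network (admin-only; names and IDs are illustrative)
openstack network create cluster-net \
    --provider-network-type vlan \
    --provider-physical-network physnet1 \
    --provider-segment 200

# Attach a subnet whose default gateway lives on the core routers
openstack subnet create cluster-subnet \
    --network cluster-net \
    --subnet-range 10.20.0.0/16 \
    --gateway 10.20.0.1
```

Instances attached to cluster-net then reach the rest of the datacenter through the physical L3 gateway rather than a Neutron router.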
● Floating IPs, or no?
Neutron network design
● VLAN, VXLAN
○ Mixed reports regarding VXLAN performance in the wild
● Linux bridge, Open vSwitch
● Router modes: legacy, DVR, DVR_SNAT
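For concreteness, the VLAN-vs-VXLAN choice is made in the Neutron ML2 plugin configuration; a fragment might look like this (the VLAN/VNI ranges and physnet name are assumptions):

```
# ml2_conf.ini fragment (illustrative; ranges and physnet names are assumptions)
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = openvswitch,l2population

[ml2_type_vlan]
network_vlan_ranges = physnet1:1000:1999

[ml2_type_vxlan]
vni_ranges = 10000:19999
```

Here tenant networks default to VXLAN overlays while VLANs remain available for provider networks.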
Overlay replacements for Neutron (Replacements or drivers/plugins for alternate technology options?)
● Midokura, OpenContrail, etc.
○ Pluribus network OS
● Is this a rabbit hole? A distraction?
Storage
Access to storage from within tenants
Access to parallel filesystems
● Requires the parallel filesystem to support multi-tenancy:
○ Bleeding-edge feature in Lustre 2.9(?)
○ GPFS?
○ CephFS now supports this
■ http://sc16.supercomputing.org/presentation/?id=pan120&sess=sess185
○ Manila?
● Converged storage paradigms on the one technology stack:
○ GPFS
■ Swift integration is built-in, so it requires running IBM's version, but otherwise (Cinder and Manila) can use stock.
○ Ceph
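Where Manila is used as the multi-tenant front end to CephFS, exporting a share might look roughly like this (the share type name, share name, size, and cephx identity are assumptions):

```shell
# Create a CephFS-backed share via Manila (names and size are illustrative)
manila create --name scratch0 --share-type cephfstype CEPHFS 100

# Grant a tenant client access by cephx identity
manila access-allow scratch0 cephx alice
```

Tenant isolation then comes from CephFS path restriction plus per-client cephx keys, rather than from network segmentation alone.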
Pros/Cons of Hyper-converged Storage Schemes? (Are there any pros beyond toy/lab sized deployments?)
Persistent on-demand user storage
● Cinder in an HPC environment?
● Swift?
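As a sketch of the Cinder option, persistent per-user storage can be a block volume attached to a long-lived instance (the volume name, size, and server name are assumptions):

```shell
# Create a persistent volume and attach it to a running instance
# (names and size are illustrative)
openstack volume create scratch-vol --size 500
openstack server add volume my-login-node scratch-vol
```

The volume outlives the instance, so it can be detached and re-attached as cluster nodes are rebuilt.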
Accounting
● How to integrate HPC WLM and OpenStack accounting
○ Use Slurm to do it all (+1)
● Some alternate user interfaces, such as DC2, have their own accounting databases for tracking billable units
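If Slurm is made the single source of accounting truth as suggested above, usage can be pulled from the Slurm accounting database with sreport; for example (the cluster name and date range are assumptions):

```shell
# Summarize per-user utilization from slurmdbd (cluster and dates are illustrative)
sreport cluster AccountUtilizationByUser cluster=openstack \
    start=2016-11-01 end=2016-12-01 -t hours
```

This only covers work submitted through Slurm; instances launched directly via Nova would still need a separate accounting path (e.g. Ceilometer or the UI's own database).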