TRANSCRIPT
OpenStack for HPC: Best Practices for Optimizing Software-Defined Infrastructure
SC16 Birds of a Feather session. Organized by Mike Lowe, Blair Bethwaite, Stig Telfer, Robert Budden, Tim Randles, and Jonathan Mills. (Chris Hoge from the OpenStack Foundation is also present and available for questions during the conference: [email protected])
OpenStack Scientific Working Group information: https://wiki.openstack.org/wiki/Scientific_working_group
OpenStack for Scientific Research PDF: http://openstack.org/assets/science/OpenStack-CloudandHPC6x9Booklet-v4-online.pdf
http://www.openstack.org/science
https://etherpad.openstack.org/p/SC16_BOF_Openstack_for_HPC
http://bit.ly/2fdXAdJ
Description
OpenStack is becoming increasingly capable of managing HPC infrastructure and supporting HPC workloads. However, performance overheads, network integration, and system complexity combine to pose a daunting challenge. The optimal outcome is to achieve all the benefits of software-defined infrastructure without paying any of the penalties. How can we get closer to achieving this? This BoF is aimed at architects, administrators, and software engineers who are interested in designing and deploying OpenStack infrastructure for HPC but are unsure how to begin. Speakers from OpenStack's Scientific Working Group will share their experiences and engage the audience in discussion.
Anyone willing to share deployment info and/or use cases: Jonathan Mills, Robert Budden, Blair Bethwaite, Tim Randles, Mike Lowe, Stig Telfer, Clyde Jones, Andrew Young.
System Complexity ● Why not everything in Docker? / Shifter / Singularity / rkt / Charliecloud
○ Containers can make upgrades simpler, but from the operator's standpoint, moving all of the guests to containers doesn't simplify operations: the operator still has an image repository and the underlying infrastructure to maintain
● When to use one or other (or both)?
DIY Approach vs Turnkey Solution
Moving beyond the 5 core services (Nova, Keystone, Neutron, Glance, Cinder)
● Murano as a method to disseminate scientific applications across clouds
● Ironic
● Magnum?
● Swift for object storage
● Cyborg (formerly Nomad) - new project for adding devices (accelerators, SR-IOV, etc.)
● Integration / interaction with Puppet / Ansible / Chef / SaltStack / whatever
Caching and optimizing the control plane
● Install - Kolla?
● Job-based architecture comment - look at CloudLab: https://www.cloudlab.us/
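For context on the Kolla question above: Kolla's containerized control plane is driven by kolla-ansible. A deployment sketch might look roughly like the following (the inventory file name and host layout are assumptions, not from the session):

```shell
# Illustrative kolla-ansible workflow (inventory name "multinode" is an assumption)
kolla-genpwd                                  # generate service passwords
kolla-ansible -i multinode bootstrap-servers  # prepare target hosts
kolla-ansible -i multinode prechecks          # validate before deploying
kolla-ansible -i multinode deploy             # deploy containerized services
kolla-ansible post-deploy                     # write the admin openrc file
```

This is a deployment fragment, not a runnable script; it assumes target hosts and a populated /etc/kolla/globals.yml.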
User interfaces used in production
● Horizon (Scientific Cloud)
● DC2 from George Mason University
● Atmosphere / iPlant / CyVerse
● HPC resource manager integration (HPC + Cloud)
● Cluster-Cloud convergence
○ Slurm's Elastic Computing
○ MonARCH
● API documentation is incomplete and implementations are partial
○ Shade may be a better alternative to OSC (the OpenStackClient)
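On Slurm's Elastic Computing (mentioned above): the cloud-bursting hooks live in slurm.conf, with resume/suspend scripts that would call the OpenStack API to boot and delete instances. A minimal sketch, in which the script paths, node names, and sizes are all assumptions:

```
# slurm.conf fragment (illustrative; paths, node names, and sizes are assumptions)
SuspendProgram=/usr/local/sbin/slurm-suspend.sh   # deletes idle cloud instances
ResumeProgram=/usr/local/sbin/slurm-resume.sh     # boots instances via the OpenStack API
SuspendTime=600        # seconds idle before a node is powered down
ResumeTimeout=300      # seconds allowed for an instance to boot and register

NodeName=cloud[001-100] State=CLOUD CPUs=16 RealMemory=65536
PartitionName=cloud Nodes=cloud[001-100] MaxTime=INFINITE State=UP
```

The State=CLOUD marker tells Slurm these nodes may not exist until the resume program creates them.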
Staff support required to keep OpenStack running / up to date?
Performance Penalties
The impact of not doing proper compute and memory tuning for HPC workloads is high - as bad as 2-5x for mixed I/O and GPU compute-centric workflows. Using the Linux KVM hypervisor, local compute and memory are readily optimized through:
● CPU core pinning
● NUMA pinning
● CPU and NUMA topology exposure (for large guests)
● CPU model and extension pass-through
● Hugepages backing guest memory
Nova allows these parameters to be configured on a fairly granular basis via flavor or image metadata/properties. There is some anecdotal evidence that emulated machine types and CPU+memory topology confuse HPC helper libraries such as hwloc in Open MPI. Using the deadline I/O scheduler on the hypervisor can improve guest I/O. https://wiki.openstack.org/wiki/Enhanced-platform-awareness-pcie
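As a sketch of the flavor-metadata approach, the tuning knobs above map onto Nova flavor extra specs roughly as follows (the flavor name and sizes are assumptions):

```shell
# Create a dedicated HPC flavor (name and sizes are illustrative)
openstack flavor create hpc.large --vcpus 16 --ram 65536 --disk 40

# Pin each vCPU to a dedicated host core
openstack flavor set hpc.large --property hw:cpu_policy=dedicated

# Expose NUMA topology to the guest
openstack flavor set hpc.large --property hw:numa_nodes=2

# Back guest memory with hugepages
openstack flavor set hpc.large --property hw:mem_page_size=large
```

CPU model pass-through is set on the hypervisor side instead, via cpu_mode = host-passthrough in the [libvirt] section of nova.conf. These commands are a fragment against a live cloud, not a runnable script.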
Identity and Federation
Keystone to Keystone
● Massachusetts Open Cloud: WIP - utilize storage across cloud boundaries
● OpenID Connect
● Shibboleth
● Integration with existing account systems
● Alternate Keystone backends
○ Kerberos
○ Active Directory (already works) - slow lookups (Kilo)?
○ Red Hat IdM / IPA / FreeIPA (already works) - then Kerberos too, no?
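For reference, pointing Keystone's identity backend at an existing LDAP / Active Directory tree looks roughly like the following keystone.conf fragment (the server name, DNs, and attribute choices are assumptions):

```
# keystone.conf fragment (illustrative; server and DNs are assumptions)
[identity]
driver = ldap

[ldap]
url = ldap://ad.example.org
user_tree_dn = ou=People,dc=example,dc=org
user_objectclass = person
user_id_attribute = sAMAccountName
user_name_attribute = sAMAccountName
```

In practice, service accounts (nova, glance, ...) are usually kept in SQL via Keystone's domain-specific backend support, with only the human-user domain backed by LDAP.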
Network
Integration
● L2 or L3 based (e.g. host-based routing)
○ More than one way to do L3 - host-based routing or tunnel-endpoints on ToR switches
● Provider networks, gateways
○ Provider networks are a good option for long-lived clusters, as the cloud admin can have more control, e.g. of address space, when the provider net is integrated with cloud-external networks - e.g., the VLAN mechanism driver with L3 gateways on core routers
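A provider network of the kind described above might be created roughly as follows (the network names, VLAN ID, and subnet range are assumptions):

```shell
# Create a VLAN-backed provider network (admin-only; names and IDs are illustrative)
openstack network create cluster-net \
    --provider-network-type vlan \
    --provider-physical-network physnet1 \
    --provider-segment 200

# Attach a subnet whose default gateway lives on the core routers
openstack subnet create cluster-subnet \
    --network cluster-net \
    --subnet-range 10.20.0.0/16 \
    --gateway 10.20.0.1
```

Instances attached to cluster-net then reach the rest of the datacenter through the physical L3 gateway rather than a Neutron router.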
● Floating IPs, or no?
Neutron network design
● VLAN, VXLAN
○ Mixed reports regarding VXLAN performance in the wild
● Linux bridge, Open vSwitch
● Router modes: legacy, DVR, DVR_SNAT
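For concreteness, the VLAN-vs-VXLAN choice is made in the Neutron ML2 plugin configuration; a fragment might look like this (the VLAN/VNI ranges and physnet name are assumptions):

```
# ml2_conf.ini fragment (illustrative; ranges and physnet names are assumptions)
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = openvswitch,l2population

[ml2_type_vlan]
network_vlan_ranges = physnet1:1000:1999

[ml2_type_vxlan]
vni_ranges = 10000:19999
```

Here tenant networks default to VXLAN overlays while VLANs remain available for provider networks.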
Overlay replacements for Neutron (Replacements or drivers/plugins for alternate technology options?)
● Midokura, OpenContrail, etc.
○ Pluribus network OS
● Is this a rabbit hole? A distraction?
Storage
Access to storage from within tenants
Access to parallel filesystems
● Requires the parallel filesystem to support multi-tenancy:
○ Bleeding-edge feature in Lustre 2.9(?)
○ GPFS?
○ CephFS now supports this
■ http://sc16.supercomputing.org/presentation/?id=pan120&sess=sess185
○ Manila?
● Converged storage paradigms on the one technology stack:
○ GPFS
■ Swift integration is built-in, so it requires running IBM's version, but otherwise (Cinder and Manila) can use stock.
○ Ceph
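Where Manila is used as the multi-tenant front end to CephFS, exporting a share might look roughly like this (the share type name, share name, size, and cephx identity are assumptions):

```shell
# Create a CephFS-backed share via Manila (names and size are illustrative)
manila create --name scratch0 --share-type cephfstype CEPHFS 100

# Grant a tenant client access by cephx identity
manila access-allow scratch0 cephx alice
```

Tenant isolation then comes from CephFS path restriction plus per-client cephx keys, rather than from network segmentation alone.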
Pros/Cons of Hyper-converged Storage Schemes? (Are there any pros beyond toy/lab sized deployments?)
Persistent on-demand user storage
● Cinder in an HPC environment?
● Swift?
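As a sketch of the Cinder option, persistent per-user storage can be a block volume attached to a long-lived instance (the volume name, size, and server name are assumptions):

```shell
# Create a persistent volume and attach it to a running instance
# (names and size are illustrative)
openstack volume create scratch-vol --size 500
openstack server add volume my-login-node scratch-vol
```

The volume outlives the instance, so it can be detached and re-attached as cluster nodes are rebuilt.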
Accounting
● How to integrate HPC WLM and OpenStack accounting
○ Use Slurm to do it all (+1)
● Some alternate user interfaces, such as DC2, have their own accounting databases for tracking billable units
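If Slurm is made the single source of accounting truth as suggested above, usage can be pulled from the Slurm accounting database with sreport; for example (the cluster name and date range are assumptions):

```shell
# Summarize per-user utilization from slurmdbd (cluster and dates are illustrative)
sreport cluster AccountUtilizationByUser cluster=openstack \
    start=2016-11-01 end=2016-12-01 -t hours
```

This only covers work submitted through Slurm; instances launched directly via Nova would still need a separate accounting path (e.g. Ceilometer or the UI's own database).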