GPU Accelerated Signal Processing in OpenStack
John Paul Walters, Computer Scientist, USC Information Sciences Institute
jwalters@isi.edu
Outline
§ Motivation
§ OpenStack Background
§ Heterogeneous OpenStack
§ GPU Performance across hypervisors
§ Current Status
§ Future work
Motivation
§ Scientific workloads demand increasing performance with greater power efficiency
  – Architectures have been driven towards specialization and heterogeneity
§ Infrastructure-as-a-Service (IaaS) clouds can democratize access to the latest, most powerful accelerators
  – Then why are most of today's clouds homogeneous?
  – Of the major providers, only Amazon offers virtual machine access to GPUs in the public cloud
Cloud Computing and GPUs
§ GPU passthrough has historically been hard
  – Specific to particular GPUs, hypervisors, host OS
  – Legacy VGA BIOS support, etc.
§ Today we can access GPUs through most of the major hypervisors
  – KVM, VMWare ESXi, Xen, LXC (see the passthrough sketch below)
§ Combine this with a heterogeneous cloud
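For a concrete sense of what GPU passthrough looks like under KVM, the libvirt domain XML below hands one PCI function (a GPU) to a guest. This is a minimal sketch; the PCI address is a placeholder and must match the GPU reported by lspci on the host.

  <!-- minimal sketch: assign host PCI device 0000:84:00.0 (placeholder address) to the guest -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
    </source>
  </hostdev>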
OpenStack Background
§ OpenStack founded by Rackspace and NASA
§ In use by Rackspace, HP, and others for their public clouds
§ Open source with hundreds of participating companies
§ In use for both public and private clouds
§ Current stable release: OpenStack Havana
  – OpenStack Icehouse to be released in April
[Chart: Google Trends searches for common open source IaaS projects (openstack, cloudstack, opennebula, eucalyptus cloud)]
OpenStack Architecture
Image source: http://docs.openstack.org/training-guides/content/module001-ch004-openstack-architecture.html
Supporting GPUs
§ We're pursuing multiple approaches for GPU support in OpenStack
  – LXC support for container-based VMs
  – Xen support for fully virtualized guests
  – KVM support for fully virtualized guests, SR-IOV
§ Also compare against VMWare ESXi
§ Our OpenStack work currently supports GPU-enabled LXC containers (see the configuration sketch below)
  – Xen prototype implementation as well
§ Given widespread support for GPUs across hypervisors, does hypervisor choice impact performance?
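As an illustration of the container-based approach, a GPU-enabled LXC container is usually built by whitelisting the NVIDIA character devices in the container's device cgroup and bind-mounting the device nodes. The lines below are a minimal sketch of such a container config; the exact set of /dev/nvidia* nodes depends on the driver version.

  # sketch of LXC container config entries exposing the host GPU (driver-dependent)
  lxc.cgroup.devices.allow = c 195:* rwm                                     # NVIDIA char devices
  lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
  lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file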
GPU Performance Across Hypervisors
§ 2 CPU architectures, 2 GPU architectures, 4 hypervisors
  – Sandy Bridge + Kepler, Westmere + Fermi
  – KVM, Xen, LXC, VMWare ESXi
§ Standardize on a common CentOS 6.4 base system for comparison
  – Same 2.6.32-358.23.2 kernel across all guests and LXC
Hardware Setup
                Sandy Bridge + Kepler    Westmere + Fermi
CPU (cores)     2x E5-2670 (16)          2x X5660 (12)
Clock Speed     2.6 GHz                  2.6 GHz
RAM             48 GB                    192 GB
NUMA Nodes      2                        2
GPU             1x K20m                  2x C2075
Hypervisor Configuration
Hypervisor            Linux Kernel       Linux Distro
KVM                   3.12               Arch 2013.10.01
Xen 4.3.0-7           3.12 (dom0)        Arch 2013.10.01
VMWare ESXi 5.5.0     N/A                N/A
LXC                   2.6.32-358.23.2    CentOS 6.4
Benchmarks
§ 3 benchmarks
  – SHOC OpenCL: signal processing
  – GPU-LIBSVM: big data, machine learning
  – HOOMD: molecular dynamics, GPUDirect
§ Virtual machines: CentOS 6.4 with 2.6.32-358.23.2 kernel, 20 GB RAM, and 1 CPU socket
  – Control for NUMA effects (see the pinning sketch below)
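One common way to control for NUMA effects, assuming numactl is installed on the host, is to pin the guest (or the benchmark process inside an LXC container) to a single socket and its local memory. The node number and the ./run_benchmark command below are placeholders.

  # confine a benchmark or VM process to NUMA node 0 (CPUs and memory)
  numactl --cpunodebind=0 --membind=0 ./run_benchmark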
K20 Results - SHOC
[Chart: SHOC performance for common signal processing kernels; relative performance (~0.96-1.01) for KVM, Xen, LXC, and VMWare]
K20 Results – SHOC Outliers
[Chart: SHOC OpenCL Level 1 and Level 2 outliers; relative performance (~0.94-1.06) for KVM, Xen, LXC, and VMWare]
C2075 Results - SHOC
[Chart: SHOC performance for common signal processing kernels; relative performance (~0.82-1.02) for KVM, Xen, LXC, and VMWare]
C2075 Results – SHOC Outliers
[Chart: SHOC OpenCL Level 1 and Level 2 outliers; relative performance (~0.6-1.1) for KVM, Xen, LXC, and VMWare]
SHOC Observations
§ Overall, both the Fermi and Kepler systems perform near-native
  – This is especially true for KVM and LXC
§ Xen on the C2075 system shows some overhead
  – Likely because Xen couldn't activate large page tables
§ Some unexpected performance improvement for Kepler Spmv
K20 Results – GPU-LIBSVM
[Chart: GPU-LIBSVM relative performance vs. number of training instances (1800, 3600, 4800, 6000); ~0.88-1.02 for KVM, Xen, LXC, and VMWare]
C2075 Results – GPU-LIBSVM
[Chart: GPU-LIBSVM relative performance vs. number of training instances (1800, 3600, 4800, 6000); ~0-1.4 for KVM, Xen, LXC, and VMWare]
GPU-LIBSVM Observations
§ Unexpected performance improvement for KVM on both systems
  – Most pronounced on the Westmere/Fermi platform
§ This is due to the use of transparent hugepages (THP)
  – Back the entire guest memory with hugepages
  – Improves TLB performance
§ Disabling hugepages on the Westmere/Fermi platform reduces performance to 80-87% of the base system (see the THP sketch below)
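For reference, THP can be inspected and toggled on the KVM host through the standard sysfs knob shown below; whether guest memory actually ends up backed by hugepages also depends on khugepaged and memory fragmentation, so this is only a sketch of the experiment.

  # check the current THP policy on the host
  cat /sys/kernel/mm/transparent_hugepage/enabled     # e.g. [always] madvise never

  # disable / re-enable THP (as root) to reproduce the comparison above
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo always > /sys/kernel/mm/transparent_hugepage/enabled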
Multi-GPU with GPUDirect
§ Many real applications extend beyond a single node's capabilities
§ Test multi-node performance with InfiniBand SR-IOV and GPUDirect
§ 2 Sandy Bridge nodes equipped with K20 GPUs
  – ConnectX-3 IB with SR-IOV enabled (see the sketch below)
  – Ported Mellanox OFED 2.1-1 to the 3.13 kernel
  – KVM hypervisor
§ Test with HOOMD, a commonly used GPUDirect-enabled particle dynamics simulator
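As a rough sketch of the InfiniBand side, SR-IOV virtual functions on ConnectX-3 are typically enabled through mlx4_core module options on the host (after SR-IOV has been enabled in the HCA firmware); the VF count below is illustrative.

  # /etc/modprobe.d/mlx4_core.conf  (illustrative values)
  options mlx4_core num_vfs=4 probe_vf=1

  # after reloading the driver, virtual functions show up as additional PCI devices
  lspci | grep Mellanox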
GPUDirect Advantage
Image source: http://old.mellanox.com/content/pages.php?pg=products_dyn&product_family=116
HOOMD Performance
[Chart: HOOMD Lennard-Jones liquid MD relative performance vs. number of particles (16k-512k); ~0.92-1.02]
Current Status
§ Source code is available now
  – https://github.com/usc-isi/nova
§ Includes support for heterogeneity
  – GPU-enabled LXC instances (boot example below)
  – Bare-metal provisioning
  – Architecture-aware scheduler
  – Prototype Xen with GPU passthrough implementation
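From the user's side, requesting a GPU-enabled LXC instance looks like any other nova boot, with a flavor that the architecture-aware scheduler maps to a GPU-capable host; the flavor and image names below are hypothetical.

  # hypothetical flavor and image names
  nova boot --flavor gpu.lxc.large --image centos-6.4-cuda my-gpu-instance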
Future Work
§ Primary focus: multi-node
  – Greater range of applications, larger systems
§ Integrate GPU passthrough support for KVM
  – This might come free with the existing OpenStack PCI passthrough work (see the sketch below)
§ NUMA support
  – This work assumes perfect NUMA mapping
  – OpenStack should be NUMA-aware
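For a sense of what reusing the existing PCI passthrough work could look like, the Havana-era configuration exposes devices through a whitelist and alias in nova.conf plus a flavor extra spec; the vendor/product IDs, alias, and flavor name below are illustrative (10de is NVIDIA's PCI vendor ID, the product ID depends on the GPU).

  # nova.conf on the compute node (illustrative IDs)
  pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "1028"}
  pci_alias = {"vendor_id": "10de", "product_id": "1028", "name": "k20"}

  # request one GPU through a flavor extra spec
  nova flavor-key gpu.passthrough set "pci_passthrough:alias"="k20:1"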